Enterprise AI Analysis: Energy- and Quantization-aware DNN Partitioning in the Edge-Cloud Continuum (In Progress Paper)

Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

Enterprise Process Flow for Edge-Cloud DL Inference

Model Profiling → Network Profiling → Execution Profiling → Plan Generation → Plan Actuation → Inference
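The pipeline above can be sketched in code. The following is a minimal, hypothetical illustration of the "Plan Generation" step: given per-layer latency profiles for device and edge, plus intermediate tensor sizes, it picks the split point that minimizes a weighted sum of inference time and energy. All function names, parameters, and profile numbers are assumptions for illustration, not the paper's actual algorithm.

```python
# Hypothetical sketch of plan generation: layers [0, split) run on the
# device, layers [split, n) run on the edge, and the intermediate tensor
# at the split is transferred over the network.

def best_split(device_ms, edge_ms, out_bytes, bandwidth_bps,
               tx_j_per_byte, dev_j_per_ms, alpha=0.5):
    """Return (split_index, cost) minimizing alpha*time + (1-alpha)*energy."""
    n = len(device_ms)
    best = None
    for split in range(n + 1):
        t_dev = sum(device_ms[:split])
        t_edge = sum(edge_ms[split:])
        # Transfer cost applies only when the model is actually split.
        if 0 < split < n:
            tx_bytes = out_bytes[split - 1]
            tx_ms = tx_bytes * 8 / bandwidth_bps * 1000
        else:
            tx_bytes, tx_ms = 0, 0.0
        time_ms = t_dev + tx_ms + t_edge
        energy_j = t_dev * dev_j_per_ms + tx_bytes * tx_j_per_byte
        cost = alpha * time_ms + (1 - alpha) * energy_j * 1000  # J scaled to ms-like units
        if best is None or cost < best[1]:
            best = (split, cost)
    return best

# Illustrative 3-layer profile: slow device, fast edge, 50 Mbit/s link.
split, cost = best_split(
    device_ms=[5.0, 5.0, 5.0], edge_ms=[1.0, 1.0, 1.0],
    out_bytes=[100_000, 100_000, 100_000],
    bandwidth_bps=50e6, tx_j_per_byte=1e-7, dev_j_per_ms=2e-3)
print(split, round(cost, 2))
```

With these numbers the edge is strictly faster, so the search favors offloading early; the real framework additionally folds quantization choices into the same search.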
R² Score for Quantization Noise Modeling (Degree-4 Polynomial Fit)
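To make the metric concrete, here is a minimal sketch of fitting a degree-4 polynomial to a noise curve and scoring it with R². The data is synthetic and the modeling choice is only assumed to mirror the paper's approach.

```python
# Degree-4 least-squares polynomial fit with an R^2 goodness-of-fit score.
# The quantization-noise profile below is synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)                            # e.g. normalized activation magnitude
y = 0.5 * x**4 - x**2 + 0.1 * rng.normal(size=x.size)    # quartic signal plus noise

coeffs = np.polyfit(x, y, deg=4)     # fit a degree-4 polynomial
y_hat = np.polyval(coeffs, x)        # evaluate the fit

ss_res = np.sum((y - y_hat) ** 2)                # residual sum of squares
ss_tot = np.sum((y - np.mean(y)) ** 2)           # total sum of squares
r2 = 1.0 - ss_res / ss_tot
print(f"R^2 = {r2:.3f}")
```

An R² near 1 means the polynomial explains nearly all variance in the measured quantization noise.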
| Feature | Our Approach | Previous Works [10, 11] |
|---|---|---|
| Consideration of quantization effects | Execution time and data transfer | Only on the partition point [10]; mainly linear classifiers [11] |
| Optimization scope | Jointly minimizes inference time and energy; jointly considers model splitting and quantization | Focus on inference time [10]; focus on precision and bit-width [11] |
| DNN topology support | Complex DNNs (e.g., YOLO11) | Less complex topologies [10]; mainly linear classifiers [11] |
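The comparison highlights that quantization affects data transfer, not just compute. A small sketch makes the intuition concrete: quantizing an intermediate feature map from fp32 to int8 shrinks it 4x, directly cutting transfer time over a fixed link. The tensor shape and bandwidth are illustrative assumptions, not values from the paper.

```python
# Effect of fp32 -> int8 quantization on the size and transfer time of an
# intermediate tensor sent between device and edge (illustrative numbers).
import numpy as np

tensor = np.zeros((1, 64, 80, 80), dtype=np.float32)  # hypothetical YOLO-style feature map
fp32_bytes = tensor.nbytes
int8_bytes = tensor.astype(np.int8).nbytes            # 1 byte per element instead of 4

bandwidth_bps = 50e6  # assumed 50 Mbit/s link
tx_fp32_ms = fp32_bytes * 8 / bandwidth_bps * 1000
tx_int8_ms = int8_bytes * 8 / bandwidth_bps * 1000
print(f"fp32: {fp32_bytes} B ({tx_fp32_ms:.1f} ms), "
      f"int8: {int8_bytes} B ({tx_int8_ms:.1f} ms)")
```

The 4x size reduction translates one-to-one into transfer time and, at a fixed radio power, into transmission energy, which is why the framework treats quantization and the split point as a joint decision.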

YOLO11 Model Evaluation & Performance Gains

The framework was evaluated on complex YOLO11 vision models (652 layers), achieving significant performance gains. Compared to device-only execution without quantization, the approach reduced inference times by up to 33.5% (from 3.64s to 2.42s in device+edge setup) and energy consumption by up to 35.0% (from 14.93J to 9.70J in device+edge setup). Quantization significantly contributed to these improvements, especially in device and device+edge configurations.
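The reported percentages can be checked directly from the absolute numbers in the text:

```python
# Verify the reported reductions for the device+edge setup:
# inference time 3.64 s -> 2.42 s, energy 14.93 J -> 9.70 J.
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100

time_red = reduction_pct(3.64, 2.42)     # ~33.5 %
energy_red = reduction_pct(14.93, 9.70)  # ~35.0 %
print(f"time: -{time_red:.1f}%, energy: -{energy_red:.1f}%")
```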

Ready to Optimize Your AI Inference?

Book a free 30-minute consultation with our experts to explore how distributed and quantized DNN inference can benefit your enterprise.
