Enterprise AI Analysis

Phoenix: Thermal-Aware On-Device Inference of Multi-Instance DNNs for Mobile Video Applications

Running multiple deep neural networks (DNNs) simultaneously on mobile devices is challenging due to constrained computing resources. Previous research has explored heterogeneous processors for accelerating DNN inference but often overlooks thermal issues, which degrade computing power. In this article, we propose Phoenix, a system designed to enhance the performance of multi-instance DNNs in video applications by maximizing accuracy while guaranteeing a required frame rate. Phoenix allocates DNN tasks to the most suitable hardware processors, using reinforcement learning to capture complex thermal dynamics and postpone the onset of thermal throttling. Even with optimized task allocation, continuous inference of multiple DNNs can still trigger thermal throttling. To manage the resulting performance degradation, Phoenix employs multi-exit networks, adaptively executing inference tasks to sustain consistent frame rates, and minimizes the accuracy loss from early exits by optimally generating and operating those networks. We evaluated Phoenix on two benchmarks and a Virtual YouTuber streaming application. The results demonstrate that Phoenix effectively enhances device performance by delaying thermal throttling and achieving optimal accuracy while maintaining a consistent frame rate.

Authors: Seunghyeok Jeon, Jiwon Kim, Jeho Lee, Hojung Cha

Publication Date: March 2026

Executive Impact: Transforming Mobile AI Performance

Phoenix directly addresses critical performance bottlenecks in multi-instance DNNs on mobile devices, ensuring high accuracy and consistent frame rates even under thermal stress. This proactive approach significantly enhances user experience and extends the operational lifespan of AI applications.

Key metrics: thermal throttling delay (seconds), sustained FPS improvement (%), and accuracy maintained (%).

Deep Analysis & Enterprise Applications


Problem Definition & Core Challenges

On-device artificial intelligence (AI) is gaining popularity for its benefits in privacy, real-time responses, and reduced cloud costs. Notably, many emerging immersive applications such as virtual YouTubers (VTuber) and virtual and augmented reality require the execution of multi-instance deep neural networks (DNNs) within a single app. However, the limited computing resources of mobile devices make running such computationally intensive tasks challenging. This work addresses the critical need to balance latency constraints and accuracy goals amidst thermal issues in multi-instance DNNs on mobile devices, proposing a system to mitigate performance degradation from thermal throttling and ensure a smooth user experience with consistent frame rates.
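The frame-rate constraint at the heart of this problem is simple arithmetic: a 30 FPS target leaves roughly 33 ms per frame, and all concurrent DNNs must fit inside that interval. A minimal sketch of the budget check (the function names and the parallel/serial simplification are illustrative, not from the paper):

```python
def frame_budget_ms(target_fps):
    """Per-frame latency budget implied by a target frame rate."""
    return 1000.0 / target_fps

def meets_budget(dnn_latencies_ms, target_fps, parallel=False):
    """Check whether concurrent DNN inferences fit one frame interval.

    parallel=True assumes each DNN runs on its own processor, so the
    slowest one dominates; otherwise they serialize on shared hardware.
    """
    total = max(dnn_latencies_ms) if parallel else sum(dnn_latencies_ms)
    return total <= frame_budget_ms(target_fps)
```

For example, three DNNs at 10, 12, and 8 ms serialize to 30 ms and still fit a 30 FPS budget, but one 40 ms model alone cannot, which is why task allocation and early exiting both matter.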

Phoenix RL-Based Task Allocation Flow

Monitor Device & Thermal State
RL Agent Decides Task Allocation
Configure Hardware & Execute DNNs
Observe Performance & Thermals
Update Policy with Rewards
Deploy Optimized Policy
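The loop above can be sketched in Python. This is a simplified tabular sketch under assumed state features (temperature bins), an assumed action space (DNN-to-processor assignments), and an assumed reward shape; Phoenix's actual RL formulation is not reproduced here:

```python
import random

# Hypothetical action space: which processor runs each DNN instance.
PROCESSORS = ["big_cpu", "little_cpu", "gpu", "npu"]

def thermal_state(temps_c):
    """Discretize sensor temperatures into a coarse state (assumed 5C bins)."""
    return tuple(int(t // 5) for t in temps_c)

def reward(fps, target_fps, temp_rise_c):
    """Illustrative reward: meet the frame rate, penalize heating."""
    return min(fps / target_fps, 1.0) - 0.1 * temp_rise_c

def choose_allocation(q_table, state, n_dnns, epsilon=0.1):
    """Epsilon-greedy over joint DNN-to-processor assignments."""
    if random.random() < epsilon or state not in q_table:
        return tuple(random.choice(PROCESSORS) for _ in range(n_dnns))
    return max(q_table[state], key=q_table[state].get)

def update(q_table, state, action, r, alpha=0.5):
    """Tabular value update toward the observed reward (no bootstrapping)."""
    q_table.setdefault(state, {}).setdefault(action, 0.0)
    q_table[state][action] += alpha * (r - q_table[state][action])
```

Once the policy converges offline, the deployment step amounts to freezing the learned table (or network) and consulting it at runtime instead of exploring.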

Comparison of Multi-Exit Network Design Approaches

Number of Exits
Phoenix: Dynamically configured from the deployed RL rules (processor frequency pairs), adapting to thermal throttling and latency constraints.
Existing NAS (e.g., OmniLive [37]): Arbitrarily chosen (e.g., 3 or 5 exits), often fixed.

Exit Placement
Phoenix: Selectively positioned after any network block, excluding blocks that do not meet latency criteria under thermal conditions.
Existing NAS: Follows the general network structure, with less emphasis on real-time latency under throttling.

Architecture Configuration
Phoenix: Customized Neural Architecture Search (NAS) considering processor-specific slowdown patterns and hardware-in-the-loop measurements.
Existing NAS: Based on the highest and lowest frequencies, dividing the resulting latency interval into equal parts.

Thermal Awareness
Phoenix: Explicitly designed to handle thermal throttling, optimizing for sustained performance and minimizing accuracy degradation from early exiting.
Existing NAS: Less explicit, or primarily reactive to thermal events rather than proactive in architecture design.
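The latency-aware exit placement described above can be illustrated with a small sketch: given per-block latencies measured under a (possibly throttled) processor frequency, keep only exit points that still meet the frame budget. The numbers and helper names below are illustrative assumptions, not Phoenix's actual NAS procedure:

```python
def feasible_exits(block_latencies_ms, exit_overhead_ms, budget_ms):
    """Return indices of blocks after which an exit still meets the budget.

    block_latencies_ms[i] is the cost of running block i at the measured
    (possibly throttled) frequency; exits add a fixed classifier overhead.
    """
    feasible = []
    elapsed = 0.0
    for i, cost in enumerate(block_latencies_ms):
        elapsed += cost
        if elapsed + exit_overhead_ms <= budget_ms:
            feasible.append(i)
    return feasible
```

With four 5 ms blocks, a 2 ms exit head, and a 13 ms budget, only exits after the first two blocks are feasible; the deepest feasible exit is the one that preserves the most accuracy.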

Adaptive Runtime Scheduler

Phoenix employs a sophisticated runtime scheduler with two strategies: Best-Effort and Sustained Speed. The best-effort method maximizes accuracy at each instance, enforcing shallower exits only when latency constraints are violated. The sustained speed method proactively applies early exiting based on real-time temperature and latency feedback, using control theory and a probabilistic exit mechanism to ensure long-term performance stability and delay thermal throttling. This adaptive approach ensures consistent frame rates while optimizing for accuracy under dynamic thermal conditions.
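The sustained-speed strategy can be sketched as a simple proportional controller that maps temperature and latency headroom to an early-exit probability. The gains, thresholds, and function names here are illustrative assumptions, not the paper's actual controller:

```python
def exit_probability(temp_c, temp_limit_c, latency_ms, budget_ms,
                     k_temp=0.05, k_lat=0.02):
    """Illustrative proportional control: the closer the device is to its
    thermal limit, and the closer latency is to the frame budget, the
    more likely a shallow exit becomes."""
    p = k_temp * max(0.0, temp_c - (temp_limit_c - 10.0)) \
        + k_lat * max(0.0, latency_ms - 0.8 * budget_ms)
    return min(1.0, max(0.0, p))

def select_exit(p_early, deepest_exit, rng):
    """Probabilistic exit: with probability p_early, take one exit
    shallower than the deepest feasible one."""
    if deepest_exit > 0 and rng.random() < p_early:
        return deepest_exit - 1
    return deepest_exit
```

Probabilistic (rather than hard) switching spreads shallow exits across frames, trading a small, smooth accuracy loss for a stable frame rate instead of abrupt quality drops.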

Real-World Impact: VTuber Application

Phoenix was rigorously tested in a real-world Virtual YouTuber (VTuber) application, a complex scenario involving four concurrent DNNs (3D pose estimation, face detection, face recognition, facial mesh) on mobile devices. The system demonstrated its capability to adaptively manage multi-instance DNNs even in highly complex situations, such as co-running with a screen recorder and handling random touch interactions. Phoenix successfully maintained the target FPS, delivering optimal performance by intelligently adjusting DNN task allocations and early exit depths in real time. This ensures a smooth, high-quality user experience critical for the growing VTuber market, even under dynamic thermal and computational stresses.

Calculate Your Potential AI ROI

Estimate the economic impact of optimizing your enterprise AI operations with advanced thermal-aware inference.


Your Path to Optimized AI: Implementation Timeline

We guide you through a structured process to integrate Phoenix's thermal-aware inference capabilities into your mobile AI ecosystem.

Phase 1: Discovery & Strategy (2-4 Weeks)

Comprehensive assessment of your existing mobile AI workloads, hardware, and performance objectives. Definition of target frame rates, accuracy goals, and thermal constraints for multi-instance DNNs.

Phase 2: Learning & Architecture Design (4-8 Weeks)

Deployment of Phoenix's RL agent for offline learning of optimal task allocation policies. Customized Neural Architecture Search (NAS) to design processor-specific multi-exit networks for your DNNs, considering thermal slowdown patterns.

Phase 3: Integration & Testing (3-6 Weeks)

Seamless integration of Phoenix's learned policies and multi-exit networks into your mobile applications. Rigorous testing under real-world conditions, including long-running video workloads and diverse thermal environments.

Phase 4: Deployment & Optimization (Ongoing)

Full deployment of Phoenix with adaptive runtime schedulers. Continuous monitoring and fine-tuning to ensure sustained peak performance, maximize accuracy, and maintain a consistent user experience as device conditions evolve.

Ready to Revolutionize Your Mobile AI?

Book a free consultation with our AI experts to discuss how Phoenix can elevate your enterprise's mobile video applications.
