Enterprise AI Analysis

FaceGuide-DEIM: A Dual-Branch Facial Expression Detector with Fused Landmark Priors based on Detection with Improved Matching Transformers

Authors: MINGQI SUN, XIAO YANG, CHENG PENG, KUN ZOU

Publication: ICCSMT 2025: Proceedings of the 2025 6th International Conference on Computer Science and Management Technology (December 2025)

Published Date: 01 April 2026 | DOI: https://doi.org/10.1145/3795154.3795293 | ISBN: 9798400719981

Executive Impact Summary

This research introduces FaceGuide-DEIM, a novel facial expression detector that leverages geometric priors and enhanced feature fusion. It advances the state of the art in accurately identifying subtle emotional cues, a critical capability for advanced human-computer interaction systems and behavioral analytics. The method's dual-branch architecture, dual-gate prior modulation, and progressive channel attention fusion address limitations of conventional detectors, yielding more robust and precise facial expression analysis even in complex, real-world scenarios.


Deep Analysis & Enterprise Applications


Introduction

Facial expression detection simultaneously involves object localization and fine-grained discrimination. Unlike conventional object detection, expressions are typically small in scale, subtle in magnitude, and spatially concentrated, and they are often accompanied by pose variations, occlusions, and unstable illumination. This requires a detector that can stably capture the geometric structures of key facial regions such as the eyebrows, eyes, and mouth, while maintaining multi-scale transmission and aggregation of discriminative information (FPN [20], PAN [13]). However, under limited data and complex scenarios, relying solely on the backbone to learn these structures and attention patterns from scratch often leads to slow convergence and insensitivity to hard examples (the COCO [11] evaluation also emphasizes stability under high IoU thresholds). Existing improvements mostly focus on local adjustments to the detection head or pyramid structures (YOLO [17], SSD [2], RetinaNet [10], FPN [20], PAN [13], DETR [5], DEIM [9]), but they lack an explicit and stable geometric alignment pathway, making it difficult for the model to anchor attention to expression-related regions early on. Meanwhile, the cross-level "channel-stacking" integration commonly seen in DEIM's CCFF [9] tends to inject redundant and mutually exclusive information into subsequent structures, diluting truly discriminative channel signals and undermining robustness in complex scenarios. Overall, the absence of (i) stable geometric prior injection (e.g., face landmarks and face recognition priors: ArcFace [7], POSTERV2 [15], FAN [4]) and (ii) channel-level filtering before fusion are the two key factors that limit further improvements in expression detection.

In the evolution of detection architectures, the transition from anchor-based CNNs to end-to-end Transformers has accelerated significantly.
Following the original DETR, recent works have focused on bridging the gap between real-time efficiency and transformer-based accuracy. Notably, in 2024, advanced models like RT-DETR [12] and YOLOv10 [18] redefined the trade-off by integrating efficient hybrid encoders and NMS-free training. However, as pointed out in a recent comprehensive survey on facial expression analysis [1], these general-purpose detectors optimize primarily for box-level Intersection over Union (IoU) rather than the fine-grained deformation of facial muscles. They treat expressive faces merely as generic objects, lacking the specialized geometric sensitivity required to distinguish subtle affective states. This limitation becomes particularly pronounced for hard examples where the semantic difference between expressions (e.g., "Fear" vs. "Surprise") is minimal but geometrically distinct, necessitating a model design that goes beyond generic object detection paradigms.
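For reference, box-level IoU (the criterion these general-purpose detectors optimize) measures only the overlap of bounding boxes, so two faces with near-identical boxes but very different muscle deformations score the same. A minimal sketch:

```python
def box_iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])  # intersection bottom-right
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

A box around a neutral face and the same box around a fearful face give IoU = 1.0, which is exactly why box-level supervision alone cannot separate subtle expressions.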

Methodology

To this end, we propose FaceGuide-DEIM, which follows the design principle of "geometric prior guidance + controllable modulation + progressive fusion." First, we build a dual-branch backbone composed of a Landmark Prior Branch and an Expression Backbone Branch, aligning frozen face-geometry features with semantic features to provide stable geometric priors. Second, we design a Dual-Gate Face Prior Modulator, which maps priors into controllable modulation signals via channel and spatial gates to amplify key regions on demand (related to SE [6], CBAM [3], ECA [8], but customized with dual gating for expression priors). Third, we introduce Progressive Channel Attention Fusion, which—before features from the top-down path (FPN [20]) and bottom-up path (PAN [13]) enter the fusion block—replaces naive stacking with progressive channel re-calibration and scale alignment (MPCA [19]) to strengthen complementarity.
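The dual-gate idea can be illustrated with a minimal, framework-agnostic NumPy sketch: a channel gate derived from the globally pooled prior features and a spatial gate derived from the prior map jointly modulate the semantic features, and zero-initialized gate projections make the module an exact identity at the start of training. Class and variable names here are our assumptions; the paper's actual PyTorch implementation may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DualGateModulator:
    """Illustrative sketch of dual-gate (channel + spatial) prior modulation."""

    def __init__(self, channels):
        # Zero-initialized projections: sigmoid(0) = 0.5, so the residual
        # form in __call__ makes the initial modulation an exact identity.
        self.w_ch = np.zeros((channels, channels))  # channel-gate weights
        self.w_sp = np.zeros((channels,))           # spatial-gate weights

    def __call__(self, semantic, prior):
        # semantic, prior: (C, H, W) feature maps at the same resolution
        pooled = prior.mean(axis=(1, 2))                       # (C,) global pool
        ch_gate = sigmoid(self.w_ch @ pooled)                  # (C,) channel gate
        sp_gate = sigmoid(np.tensordot(self.w_sp, prior, axes=1))  # (H, W) spatial gate
        # Residual modulation: (2*sigmoid(0) - 1) = 0 at init, so the
        # module passes semantic features through unchanged until trained.
        mod = (2 * ch_gate[:, None, None] - 1) * (2 * sp_gate[None] - 1)
        return semantic * (1 + mod)
```

The zero-initialization mirrors the paper's stated design choice: the prior pathway starts as a no-op and only amplifies expression-related regions as training progresses.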

Experiments

Our proposed FaceGuide-DEIM is implemented in the PyTorch framework. For the dual-branch backbone, we utilize a pre-trained MobileFaceNet (from POSTERV2 [15]) as the Landmark Prior Branch (LPB), which remains frozen during the initial stages to provide stable geometric guidance. The model is trained on a server equipped with 4x NVIDIA A40 GPUs.

On RAF-DB, FaceGuide-DEIM improves mAP50 by 1.70 points over the DEIM baseline, as shown in Table 1 and Table 2. At the class level, Disgust shows the most significant gain; Anger, Surprise, Neutral, Sad, and Happy also improve slightly, while Fear remains roughly unchanged. This indicates that the dual-gated prior and PCAF's pre-fusion channel re-calibration and scale alignment effectively suppress cross-layer redundancy and accentuate geometry-related cues, delivering stable gains without modifying the detection head or the number of pyramid levels.

On SFEW, mAP50 improves by 0.77 points over the baseline. Deformation-sensitive categories (Disgust, Sad, Happy, Surprise) benefit more, indicating enhanced robustness under occlusion and in weak-texture scenarios. While a few categories fluctuate, the overall trend is positive.

As reported in Table 3, without changing the head or pyramid configuration, PCAF alone raises AP50 from 84.10% to 84.40% and AP75 from 77.10% to 77.70%, with concurrent gains at medium and large scales (APm 57.90%→58.70%, APl 62.40%→62.50%), validating the effectiveness of channel re-calibration and scale alignment before the fusion block. Adding LPB alone significantly improves AP50 and AP75 and clearly boosts the medium scale (APm 57.90%→61.40%), showing that stable geometric priors consistently regularize the semantic stream. With DFPM, overall AP reaches its highest value (62.10%) while maintaining multi-scale consistency, demonstrating the stable benefit of zero-initialized dual-gate modulation throughout training.
The full system (LPB×EBB + DFPM + PCAF) achieves the highest AP50 (85.80%) and excels at small scales (APs 58.30%), indicating that intra-level prior modulation (DFPM) and inter-level attentive fusion (PCAF) are mutually reinforcing and jointly drive overall performance.
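The "filter-before-fusion" behavior ablated here can be sketched as SE-style channel re-calibration of each pyramid level, followed by scale alignment and summation in place of naive channel stacking. This NumPy sketch is illustrative only; function names, the bottleneck MLP, and nearest-neighbour upsampling are our assumptions, not the authors' implementation (which builds on MPCA [19]).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_recalibrate(feat, w1, w2):
    """SE-style channel re-weighting of one pyramid level: (C, H, W) -> (C, H, W)."""
    squeezed = feat.mean(axis=(1, 2))                       # global average pool -> (C,)
    excite = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))   # bottleneck MLP -> (C,)
    return feat * excite[:, None, None]

def progressive_fusion(levels, weights):
    """Re-calibrate each level's channels, align scales to the finest level,
    then fuse by summation rather than naive channel stacking."""
    target_h, target_w = levels[0].shape[1:]
    fused = np.zeros_like(levels[0])
    for feat, (w1, w2) in zip(levels, weights):
        feat = channel_recalibrate(feat, w1, w2)
        # nearest-neighbour scale alignment (assumes integer upsampling ratios)
        ry, rx = target_h // feat.shape[1], target_w // feat.shape[2]
        feat = np.repeat(np.repeat(feat, ry, axis=1), rx, axis=2)
        fused += feat
    return fused
```

The point of re-calibrating before fusing is that redundant or mutually exclusive channels are down-weighted per level, so they never dilute the summed representation, which is what the Table 3 ablation attributes the AP50/AP75 gains to.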

Conclusion

We present FaceGuide-DEIM, which introduces stable geometric priors via LPB×EBB, achieves controllable dual-gated (channel, spatial) modulation through DFPM, and performs filter-before-fusion via PCAF to reduce cross-layer redundancy and enhance complementarity. The method attains SOTA on RAF-DB and SFEW (mAP50: 85.8% and 76.2%). Future work will further improve robustness and generalization in complex, in-the-wild scenarios.

85.8% mAP50 on RAF-DB Achieved

Enterprise Process Flow

Dual-Branch Backbone (LPB + EBB)
Landmark Prior Branch (LPB) for Geometric Priors
Expression Backbone Branch (EBB) for Semantic Features
Dual-Gate Face Prior Modulator (DFPM) for Controlled Fusion
Progressive Channel Attention Fusion (PCAF) for Optimized Integration
State-of-the-Art Facial Expression Detection

FaceGuide-DEIM vs. Baseline (DEIM)

Feature | DEIM (Baseline) | FaceGuide-DEIM (Ours)
Geometric Prior Injection | Lacking explicit pathway | Stable, controlled via LPB
Feature Fusion | Naive 'channel-stacking' (CCFF) | Progressive Channel Attention Fusion (PCAF)
Modulation | Limited | Dual-Gate Face Prior Modulator (DFPM)
RAF-DB mAP50 | 84.10% | 85.80% (+1.70 pts)
SFEW mAP50 | 75.40% | 76.17% (+0.77 pts)
Robustness to Hard Examples | Moderate | Enhanced for deformation-sensitive categories

Impact of Dual-Gate Face Prior Modulator (DFPM)

The DFPM module improves performance by allowing controlled injection of geometric priors. Its dual-gating mechanism (channel and spatial) adaptively amplifies key regions, addressing earlier limitations in handling subtle expression differences, especially for deformation-sensitive categories such as 'Disgust' and 'Surprise'. This adaptive modulation yields robust mAP50 gains, and in the ablation overall AP reaches its highest value (62.10%) while maintaining multi-scale consistency, validating the benefit of zero-initialized dual-gate modulation throughout training.


Your AI Implementation Roadmap

Our structured approach ensures seamless integration and maximum value realization for your enterprise AI initiatives.

Phase 01: Discovery & Strategy

In-depth analysis of current operations, identification of AI opportunities, and development of a tailored strategic roadmap aligned with your business objectives.

Phase 02: Pilot & Proof-of-Concept

Deployment of AI models in a controlled environment, demonstrating tangible results and refining the solution based on real-world feedback.

Phase 03: Scaled Implementation

Full-scale integration of the AI solution across relevant departments, including infrastructure setup, data migration, and comprehensive training.

Phase 04: Optimization & Monitoring

Continuous performance monitoring, iterative model refinement, and ongoing support to ensure sustained efficiency and evolving AI capabilities.

Ready to Transform Your Enterprise with AI?

Leverage cutting-edge research and our expert implementation to unlock new efficiencies and drive innovation. Schedule a free consultation today.
