WHEAT-Net: Real-time Food Object Detection Network Based on Efficient Feature Extraction and Fusion

An Enterprise AI Analysis

Tracking dietary intake is crucial for effective health management. The rapid advancement of portable devices has enabled automated food recognition from images, making food classification a popular research area in artificial intelligence. However, problems persist, such as limited access to public datasets, high computational costs, and poor accuracy when models are deployed on portable devices. Improving detection accuracy and reducing model size are necessary for the development of food recognition systems for health monitoring. This paper explores the development and optimization of a five-category food classification and detection model based on YOLOv8. It first integrates the WTConv module into the original C2f module within the YOLOv8 backbone, creating the novel C2f-WTConv module. It then introduces the Hierarchical Scale-based Feature Pyramid Network (HS-FPN) and the Efficient Local Attention (ELA) module, further improving the model's performance. The study presents the following contributions: a 5-class food dataset (staples, vegetables, fruits, meat, and soups) with 10,000 samples; experiments conducted on this dataset with several mainstream neural networks; and an improved YOLOv8 model called WHEAT-Net that outperforms multiple benchmark models and state-of-the-art methods. The proposed model achieved mAP@50 and mAP@50-95 scores of 0.937 and 0.623, respectively.


Key Executive Impact Metrics

The WHEAT-Net model significantly enhances food object detection accuracy and efficiency, achieving mAP@50 of 0.937 and mAP@50-95 of 0.623. It outperforms other YOLO models with fewer parameters and lower GFLOPs, making it suitable for real-time applications on portable devices. The integration of C2f-WTConv, HS-FPN, and ELA modules addresses challenges like complex backgrounds, occlusion, and multi-scale detection.

0.937 mAP@50 Score

2.3M Parameters

6.3 GFLOPs

+2.2 points mAP@0.5 over YOLOv8n (0.915 → 0.937)
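
For readers less familiar with these scores: mAP@50 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.50, while mAP@50-95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO convention). The sketch below only illustrates that matching rule. The function names and the example boxes are assumptions for illustration, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds behind mAP@50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou_value):
    """Thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in IOU_THRESHOLDS if iou_value >= t]


# Hypothetical boxes: the prediction covers 72% of the ground-truth area.
predicted = (0.0, 0.0, 10.0, 7.2)
ground_truth = (0.0, 0.0, 10.0, 10.0)
overlap = iou(predicted, ground_truth)
print(f"IoU = {overlap:.2f}")
print("Counts as a match at thresholds:", thresholds_passed(overlap))
```

A box like this one contributes to mAP@50 but stops counting as a match at the stricter thresholds, which is why mAP@50-95 (0.623) is lower than mAP@50 (0.937).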

Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

Computer Vision & Object Detection in Enterprise

Object detection models like WHEAT-Net are pivotal for automating visual tasks in enterprises, from quality control in manufacturing to inventory management in retail and health monitoring in consumer apps. This paper specifically addresses challenges in real-time food recognition, a subset of computer vision with direct implications for dietary tracking, automated food services, and nutritional analysis. By optimizing for accuracy and efficiency, WHEAT-Net sets a new standard for deploying advanced AI on resource-constrained devices, fostering innovations in personalized health and operational logistics.

Enhanced Feature Extraction with C2f-WTConv

The paper introduces the C2f-WTConv module, which integrates Wavelet Convolution (WTConv) into the YOLOv8 backbone's C2f module. This significantly expands the receptive field and improves the model's ability to extract multi-scale features without a substantial increase in trainable parameters. WTConv's multi-frequency analysis enhances detection in complex backgrounds and varying scales, crucial for diverse food images.
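
To make the multi-frequency idea concrete, the sketch below applies one level of a 2D Haar wavelet transform to a tiny grid and then inverts it. It is not the paper's WTConv implementation: the choice of the Haar wavelet, the function names, and the sample values are assumptions for illustration. As we understand the WTConv design, small convolutions applied to half-resolution sub-bands like these cover a larger area of the original image, which is how the receptive field can grow without large kernels.

```python
def haar_decompose(image):
    """One-level 2D Haar transform of an even-sized grid (list of rows).

    Returns four half-resolution sub-bands:
      ll: local average (low frequency)
      lh: left-minus-right detail, hl: top-minus-bottom detail,
      hh: diagonal detail
    """
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(image), 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            c_, d = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + c_ + d) / 2)
            lh_row.append((a - b + c_ - d) / 2)
            hl_row.append((a + b - c_ - d) / 2)
            hh_row.append((a - b - c_ + d) / 2)
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def haar_reconstruct(ll, lh, hl, hh):
    """Invert haar_decompose, recovering the original grid."""
    image = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            s, x, y, z = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            top.extend([(s + x + y + z) / 2, (s - x + y - z) / 2])
            bottom.extend([(s + x - y - z) / 2, (s - x - y + z) / 2])
        image.append(top)
        image.append(bottom)
    return image


# Hypothetical 4x4 grayscale patch.
patch = [
    [4, 4, 8, 8],
    [4, 4, 8, 8],
    [1, 3, 5, 5],
    [1, 3, 5, 5],
]
ll, lh, hl, hh = haar_decompose(patch)
print("low-frequency band:", ll)
print("left/right detail band:", lh)
print("round trip matches:", haar_reconstruct(ll, lh, hl, hh) == patch)
```

The transform is lossless, so splitting a feature map into sub-bands discards no information; it only separates coarse structure from fine detail.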

0.937 Achieved mAP@50

Hierarchical Multi-scale Feature Fusion (HS-FPN & ELA)

The YOLOv8 neck module is enhanced with a Hierarchical Scale-based Feature Pyramid Network (HS-FPN) structure and an Efficient Local Attention (ELA) module. HS-FPN improves multi-scale feature fusion, especially for small and occluded objects, while ELA refines the attention mechanism for local details. This combination boosts detection accuracy across different scales, critical for real-time food monitoring.
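
The sketch below illustrates the general idea of strip-pooled local attention on a single-channel feature map: average along each axis, smooth each strip with a 1D convolution, squash the result to (0, 1) with a sigmoid, and rescale the map. It is a deliberate simplification and not the ELA module from the paper: the fixed smoothing kernel stands in for learned weights, normalization is omitted, and all names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_same(values, kernel):
    """1D convolution with zero padding so the output length equals the input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(values))
    ]


def strip_attention(fmap, kernel=(0.25, 0.5, 0.25)):
    """Rescale a 2D map by per-row and per-column gates (simplified sketch)."""
    height, width = len(fmap), len(fmap[0])
    # Strip pooling: one average per row and one per column.
    row_means = [sum(row) / width for row in fmap]
    col_means = [sum(fmap[r][c] for r in range(height)) / height for c in range(width)]
    # Assumed fixed kernel; a real module would learn these weights.
    row_gate = [sigmoid(v) for v in conv1d_same(row_means, kernel)]
    col_gate = [sigmoid(v) for v in conv1d_same(col_means, kernel)]
    return [
        [fmap[r][c] * row_gate[r] * col_gate[c] for c in range(width)]
        for r in range(height)
    ]


# Hypothetical feature map with one strongly activated row.
feature_map = [
    [0, 0, 0],
    [3, 3, 3],
    [0, 0, 0],
]
attended = strip_attention(feature_map)
for row in attended:
    print([round(v, 3) for v in row])
```

Because each gate depends on a whole row or column, the rescaling carries positional information along both axes while staying cheap to compute.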

Feature	Standard YOLOv8	WHEAT-Net (Proposed)
Multi-scale Feature Extraction	C2f module	C2f-WTConv module (Wavelet Convolution for multi-frequency analysis); expanded receptive field; improved feature capture in complex backgrounds
Feature Fusion (Neck)	Standard FPN	HS-FPN (Hierarchical Scale-based FPN); ELA (Efficient Local Attention) for fine-grained detail; enhanced small object detection
Computational Efficiency	Good	Optimized with fewer parameters and GFLOPs; faster inference for real-time applications
Detection Accuracy	High	Superior mAP@50 and mAP@50-95; robustness to lighting/occlusion conditions

Enterprise Process Flow

The proposed WHEAT-Net integrates several innovations into the YOLOv8 framework for superior food object detection. The process begins with an enhanced backbone using C2f-WTConv for multi-scale feature extraction, followed by a refined neck network incorporating HS-FPN and ELA for optimal feature fusion and attention, culminating in highly accurate predictions.

Input Image → C2f-WTConv Backbone (Multi-scale Feature Extraction) → HS-FPN Neck (Hierarchical Feature Fusion) → ELA Module (Local Attention Refinement) → Detection Head (Prediction) → Output: Classified & Detected Food Objects

WHEAT-Net: Outperforming Benchmarks

WHEAT-Net improves on existing YOLO models in food object detection, surpassing YOLOv5, YOLOv6, YOLOv8n, and YOLOv10n in mAP@0.5. It achieves higher mean Average Precision (mAP) across multiple food categories while keeping a lightweight architecture with fewer parameters and GFLOPs, which suits resource-constrained environments.

mAP@0.5: WHEAT-Net achieved 0.937, compared to YOLOv8n's 0.915.
Parameters: WHEAT-Net uses only 2.3 million parameters, fewer than YOLOv8n's 3.0 million.
GFLOPs: WHEAT-Net operates at 6.3 GFLOPs, lower than YOLOv8n's 8.1 GFLOPs.

These metrics highlight WHEAT-Net's balance between accuracy and computational efficiency, which is crucial for real-time applications and deployment on edge devices.
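
The YOLOv8n comparison above reduces to simple arithmetic. The short sketch below derives the accuracy gain and the relative savings from the reported figures. Only the six numbers come from the source; the dictionary keys and helper name are illustrative.

```python
# Figures as reported for the two models.
WHEAT_NET = {"map50": 0.937, "params_m": 2.3, "gflops": 6.3}
YOLOV8N = {"map50": 0.915, "params_m": 3.0, "gflops": 8.1}


def relative_reduction(baseline, value):
    """Percentage by which `value` is smaller than `baseline`."""
    return (baseline - value) / baseline * 100.0


# Absolute accuracy gain, expressed in percentage points of mAP@0.5.
map_gain_points = (WHEAT_NET["map50"] - YOLOV8N["map50"]) * 100.0
param_cut = relative_reduction(YOLOV8N["params_m"], WHEAT_NET["params_m"])
gflops_cut = relative_reduction(YOLOV8N["gflops"], WHEAT_NET["gflops"])

print(f"mAP@0.5 gain: {map_gain_points:.1f} points")
print(f"Parameter reduction: {param_cut:.1f}%")
print(f"GFLOPs reduction: {gflops_cut:.1f}%")
```

This works out to a gain of about 2.2 points in mAP@0.5 with roughly 23.3% fewer parameters and 22.2% fewer GFLOPs than YOLOv8n.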


Your AI Implementation Roadmap

Our phased approach ensures a smooth and effective integration of advanced AI, tailored to your enterprise's unique needs and existing infrastructure.

Phase 1: Initial Assessment & Data Preparation

Review current food detection systems, define scope, and prepare a diverse 5-class food dataset (staples, vegetables, fruits, meat, soups) with 10,000 augmented samples for training and validation.
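
As a starting point for the data preparation step, the sketch below defines the five class labels and a reproducible train/validation split over 10,000 sample IDs. The class order, the 80/20 split ratio, the seed, and the function name are assumptions for illustration; the source does not state how the dataset was partitioned.

```python
import random

# The five categories named in the dataset description; the ID order is assumed.
CLASS_NAMES = ["staples", "vegetables", "fruits", "meat", "soups"]
CLASS_TO_ID = {name: idx for idx, name in enumerate(CLASS_NAMES)}


def split_samples(sample_ids, val_fraction=0.2, seed=0):
    """Shuffle IDs with a fixed seed and split into (train, validation)."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]


train_ids, val_ids = split_samples(range(10_000))
print(f"{len(train_ids)} training samples, {len(val_ids)} validation samples")
print("class IDs:", CLASS_TO_ID)
```

Fixing the seed keeps the split identical across runs, so later accuracy comparisons are made on the same validation images.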

Phase 2: Model Backbone Integration (C2f-WTConv)

Integrate the C2f-WTConv module into the YOLOv8 backbone, replacing standard convolutions with wavelet convolutions to enhance multi-scale feature extraction while optimizing computational efficiency.

Phase 3: Neck Network Enhancement (HS-FPN & ELA)

Implement the Hierarchical Scale-based Feature Pyramid Network (HS-FPN) and the Efficient Local Attention (ELA) module into the YOLOv8 neck. Focus on improving feature fusion across scales and refining local object attention.

Phase 4: Training & Optimization

Train the WHEAT-Net model on the prepared dataset. Conduct hyperparameter tuning, analyze performance across categories, and optimize for robust detection under various conditions (lighting, occlusion).

Phase 5: Deployment & Integration Strategy

Plan for deployment on target platforms (e.g., edge devices, mobile apps). Develop APIs for integration into existing diet management or food inventory systems. Establish monitoring for real-time performance.
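
Real-time monitoring usually starts with latency and throughput measurements on the target device. The sketch below times an arbitrary inference callable and reports mean latency, worst-case latency, and frames per second. The `fake_infer` function is a placeholder standing in for a deployed model, and the function names and warm-up count are assumptions.

```python
import statistics
import time


def measure_latency(infer, inputs, warmup=2):
    """Time `infer` on each input; return mean/max latency (ms) and FPS."""
    # Warm-up calls are excluded so one-time setup cost does not skew results.
    for item in inputs[:warmup]:
        infer(item)
    timings_ms = []
    for item in inputs:
        start = time.perf_counter()
        infer(item)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(timings_ms)
    fps = 1000.0 / mean_ms if mean_ms > 0 else float("inf")
    return {"mean_ms": mean_ms, "max_ms": max(timings_ms), "fps": fps}


def fake_infer(image):
    """Placeholder for a real model call; it only sums pixel values."""
    return sum(image)


# Hypothetical batch of tiny flattened "images".
frames = [[1, 2, 3, 4]] * 20
stats = measure_latency(fake_infer, frames)
print(f"mean {stats['mean_ms']:.4f} ms, max {stats['max_ms']:.4f} ms, {stats['fps']:.0f} FPS")
```

Tracking the worst-case latency alongside the mean helps catch occasional stalls that an average alone would hide.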


WHEAT-Net provides a robust solution for automated food recognition, enabling more effective diet management and health monitoring. Its high efficiency allows deployment on edge devices, expanding use in mobile health apps and IoT. This technology can be leveraged for automated food ordering, restaurant inventory management, and nutritional analysis platforms, leading to improved operational efficiency and informed decision-making.
