Enterprise AI Analysis: DATASET COLOR QUANTIZATION: A TRAINING-ORIENTED FRAMEWORK FOR DATASET-LEVEL COMPRESSION

Published as a conference paper at ICLR 2026

DATASET COLOR QUANTIZATION: A TRAINING-ORIENTED FRAMEWORK FOR DATASET-LEVEL COMPRESSION

Authors: Chenyue Yu, Lingao Xiao, Jinhong Deng, Ivor W. Tsang, Yang He

Large-scale image datasets are fundamental to deep learning, yet their immense storage demands create significant challenges for deployment, especially in resource-constrained settings. Traditional compression methods primarily focus on reducing sample counts, overlooking the substantial color-space redundancy within each image. This paper introduces a novel framework that addresses this critical gap, enabling effective training on highly compressed visual data.

Executive Impact

Revolutionizing Dataset Compression for Deep Learning Efficiency

The "Dataset Color Quantization (DCQ)" framework delivers unprecedented compression efficiency for visual datasets without compromising model training performance. By intelligently reducing color redundancy and preserving semantic integrity, DCQ enables robust deep learning in resource-limited environments.

+31.15% CIFAR-10 Accuracy Gain (2-bit CQ)
DCQ Outperforms Pruning Baselines at 96% Comp.
~0.86 Hrs CIFAR-10/100 Processing Time

Deep Analysis & Enterprise Applications

The sections below explore the specific findings from the research across four areas:

Problem & Motivation
DCQ Framework
Quantitative Impact
Scalability & Future

The Inadequacy of Current Compression for Training

Traditional dataset compression methods, such as pruning and distillation, aim to reduce the *number* of training samples. However, they frequently overlook the significant *per-sample color redundancy* inherent in images, which constitutes a major storage and transmission cost. Existing Color Quantization (CQ) techniques, while effective for image compression and visualization, are fundamentally *inference-oriented* and fall short in scenarios requiring model training on the quantized data.

Image-Property-based CQ (e.g., K-Means) lacks neural network guidance, leading to ambiguous semantic boundaries and inefficient color contrast, wasting bits on backgrounds (Figure 1b). Conversely, Model-Perception-based CQ, which uses pre-trained networks to guide color allocation, often introduces abrupt texture and edge discontinuities that distort visual features, severely degrading downstream training performance (Figure 1c). This critical challenge necessitates a new, training-oriented approach to color quantization.
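To make the contrast concrete, below is a minimal sketch of the image-property-based CQ baseline described above: a plain K-Means palette fit purely from pixel statistics. It has no notion of which regions matter for a downstream model, which is exactly the shortcoming the paper identifies. The helper name and parameter choices are illustrative, not taken from the paper.

```python
# Minimal sketch of an image-property-based CQ baseline (K-Means palette).
# The palette is fit to raw pixel values only, with no neural-network guidance,
# so background and foreground pixels compete equally for the few colors.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_quantize(image: np.ndarray, n_colors: int = 4) -> np.ndarray:
    """Quantize an HxWx3 uint8 image to n_colors with plain K-Means."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float32)
    km = KMeans(n_clusters=n_colors, n_init=4, random_state=0).fit(pixels)
    palette = km.cluster_centers_          # (n_colors, 3) learned colors
    indices = km.labels_                   # one palette index per pixel
    return palette[indices].reshape(h, w, 3).astype(np.uint8)
```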

The Hidden Cost of "Smart" Compression

Existing Model-Perception-based Color Quantization methods, while attempting to preserve recognition accuracy, often introduce significant visual distortions. For instance, ColorCNN quantizes CIFAR-10 images to 4 colors, achieving 77% inference accuracy on a pre-trained ResNet-18. However, when a model is trained from scratch on this same quantized dataset, the accuracy plummets to just 58% due to these distorted textures and edge discontinuities (Figure 1c). This critical gap highlights the necessity of a training-oriented approach like DCQ that prioritizes both semantic and structural fidelity for downstream learning.

The Dataset Color Quantization (DCQ) Framework

The Dataset Color Quantization (DCQ) framework compresses visual datasets by simultaneously addressing dataset-level consistency, model-aware color significance, and visual structure preservation. Unlike conventional methods that quantize images independently, DCQ operates at the dataset level to create compact, quantization-aware datasets optimized for training efficacy.

Key components include: Chromaticity-Aware Clustering (CAC) to group images with similar color distributions for shared palette learning; Attention-Guided Palette Allocation to prioritize colors in semantically important regions based on model perception; and Texture-Preserved Palette Optimization using differentiable quantization to maintain edge and texture fidelity.
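The sketch below illustrates two of these ingredients in simplified form: a differentiable (soft) palette assignment, so the palette can be optimized by gradient descent, and an attention map that re-weights the reconstruction error toward semantically important pixels. The loss form, attention source, and hyperparameters here are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch: soft (differentiable) palette quantization plus an
# attention-weighted reconstruction loss. Names and weighting are illustrative.
import torch
import torch.nn.functional as F

def soft_quantize(images, palette, tau=0.1):
    """images: (B, 3, H, W) in [0, 1]; palette: (K, 3) learnable colors."""
    B, C, H, W = images.shape
    pix = images.permute(0, 2, 3, 1).reshape(-1, 3)      # flatten to (B*H*W, 3)
    d2 = torch.cdist(pix, palette) ** 2                  # squared distance to each color
    soft = F.softmax(-d2 / tau, dim=-1)                  # soft assignment, differentiable
    recon = soft @ palette                               # soft-quantized pixels
    return recon.reshape(B, H, W, 3).permute(0, 3, 1, 2)

def attention_weighted_loss(images, recon, attention):
    """attention: (B, 1, H, W), larger where the model 'looks'."""
    weights = 1.0 + attention                            # keep some weight on background
    return (weights * (images - recon) ** 2).mean()

# Usage: optimize a shared 4-color palette (2-bit indices) for a batch of images.
palette = torch.nn.Parameter(torch.rand(4, 3))
opt = torch.optim.Adam([palette], lr=1e-2)
images = torch.rand(8, 3, 32, 32)                        # stand-in mini-batch
attention = torch.rand(8, 1, 32, 32)                     # stand-in saliency maps
for _ in range(100):
    recon = soft_quantize(images, palette)
    loss = attention_weighted_loss(images, recon, attention)
    opt.zero_grad(); loss.backward(); opt.step()
```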

Enterprise Process Flow

Image Feature Extraction
Chromaticity-Aware Clustering (CAC)
Shared Cluster-Level Palette Learning
Attention-Guided Palette Allocation
Texture-Preserved Palette Optimization
Quantized Image Reconstruction
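For the clustering step at the start of this flow, one plausible reading is to summarize each image with a coarse chromaticity histogram and then group images with similar color distributions so each group can learn a shared palette. The feature choice and clustering algorithm below are assumptions for illustration, not the paper's exact recipe.

```python
# Hedged sketch of chromaticity-based dataset clustering: each image becomes
# an intensity-invariant (r, g) histogram, and images with similar color
# distributions are grouped so a palette can be shared per group.
import numpy as np
from sklearn.cluster import KMeans

def chromaticity_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """image: HxWx3 uint8. Returns a normalized (bins*bins,) chromaticity histogram."""
    rgb = image.reshape(-1, 3).astype(np.float64) + 1e-6
    s = rgb.sum(axis=1)
    r, g = rgb[:, 0] / s, rgb[:, 1] / s                  # chromaticity coordinates
    hist, _, _ = np.histogram2d(r, g, bins=bins, range=[[0, 1], [0, 1]])
    return (hist / hist.sum()).ravel()

def cluster_dataset(images, n_clusters: int = 64) -> np.ndarray:
    """images: iterable of HxWx3 uint8 arrays. Returns a cluster id per image."""
    feats = np.stack([chromaticity_histogram(im) for im in images])
    return KMeans(n_clusters=n_clusters, n_init=4, random_state=0).fit_predict(feats)
```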

Unprecedented Performance Under Aggressive Compression

DCQ demonstrates significant performance improvements across diverse datasets (CIFAR-10, CIFAR-100, Tiny-ImageNet, ImageNet-1K) and various compression rates. It consistently outperforms both traditional color quantization methods and leading dataset pruning algorithms, particularly under aggressive compression scenarios.

For example, at a mere 2 bits per pixel (a 4-color palette), DCQ achieves substantial accuracy gains on CIFAR-10, CIFAR-100, and Tiny-ImageNet compared to model-perception-based CQ (Table 7 / Figure 5). When combined with dataset pruning, DCQ enables extreme compression ratios (e.g., 99.2% total compression for CIFAR-10) while maintaining high accuracy, highlighting its effectiveness as a holistic compression strategy (Table 4).
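As a back-of-the-envelope illustration of how such a combined ratio can arise (assuming 24-bit RGB originals, 2 bits per pixel after quantization, a pruning keep-rate of roughly 10%, and ignoring per-image palette overhead; these assumptions are ours, not the paper's exact accounting):

```python
# Rough arithmetic for combining dataset pruning with 2-bit color quantization.
bits_original = 24      # 8 bits x 3 channels per pixel
bits_quantized = 2      # index into a 4-color palette
keep_rate = 0.10        # fraction of samples kept by pruning (assumed)

color_ratio = bits_quantized / bits_original     # ~8.3% of the original pixel bytes
remaining = keep_rate * color_ratio              # fraction of the dataset left
print(f"total compression ~ {100 * (1 - remaining):.1f}%")   # ~99.2%
```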

89.15% Accuracy on CIFAR-10 with 2-bit (4 color) Quantization
Performance Comparison: DCQ vs. Leading Baselines
| Method | CIFAR-10 (2-bit, 4 colors) | CIFAR-100 (2-bit, 4 colors) | Tiny-ImageNet (2-bit, 4 colors) | ImageNet-1K (96% Comp., 1-bit) |
| --- | --- | --- | --- | --- |
| DCQ (Ours) | 89.15% | 57.69% | 50.51% | 35.95% |
| ColorCNN | 59.48% | 22.81% | 19.42% | - |
| ColorCNN+ | 49.66% | 19.52% | 16.38% | - |
| CQFormer | 52.53% | 20.01% | 20.01% | - |
| MedianCut | 78.15% | 35.15% | 31.87% | - |
| OCTree | 68.95% | 30.15% | 26.89% | - |
| CCS (Pruning) | - | - | - | 31.31% |
| TDDS (Pruning) | - | - | - | 29.56% |

Scalability, Generalization, and Future Directions

DCQ exhibits strong generalization across various network architectures, including ResNet-34, ResNet-50, Swin Transformer, and ViT-Small (Table 3, Table 8). It can also be seamlessly combined with existing dataset pruning techniques to achieve even higher compression rates, making it a versatile tool for enterprise AI. The computational cost of generating quantized datasets is practical: processing ImageNet-1K takes approximately 154 minutes, and CIFAR-10/100 approximately 52 minutes (~0.86 hours) (Table 22).

While DCQ offers a robust solution, future research could explore more adaptive, per-image quantization strategies for even greater flexibility. Additionally, developing neural architectures specifically optimized for color-quantized data, rather than full-color inputs, could further enhance performance on compressed datasets, opening new avenues for efficient and robust deep learning.

154 Mins ImageNet-1K Full Dataset Processing Time

Calculate Your Potential AI ROI

See how leveraging advanced AI solutions like Dataset Color Quantization can translate into tangible savings and reclaimed productivity for your organization.


Your Journey to Optimized AI

Our structured approach ensures a seamless integration of Dataset Color Quantization, tailored to your enterprise needs and existing infrastructure.

Phase 1: Discovery & Assessment

Comprehensive analysis of your current dataset management, storage overheads, and deep learning pipeline to identify key optimization opportunities for color quantization.

Phase 2: Custom DCQ Framework Design

Development of a bespoke DCQ solution, configuring clustering, palette learning, attention mechanisms, and texture preservation specific to your data types and model training objectives.

Phase 3: Pilot Implementation & Validation

Deployment of the DCQ framework on a subset of your data, rigorous testing, and validation of training performance and storage efficiency against established benchmarks.

Phase 4: Full-Scale Integration & Optimization

Seamless integration of the DCQ pipeline into your enterprise data infrastructure, followed by continuous monitoring and iterative optimization for peak performance and maximum ROI.

Ready to Optimize Your Deep Learning Datasets?

Book a personalized consultation with our AI specialists to explore how Dataset Color Quantization can unlock new efficiencies and capabilities for your enterprise.
