CAUKER: CLASSIFICATION TIME SERIES FOUNDATION MODELS CAN BE PRETRAINED ON SYNTHETIC DATA
Synthetic data generated by CAUKER is sufficient to pre-train state-of-the-art Time Series Foundation Models, and models trained on it follow clear scaling laws that real-world datasets do not reliably produce.
CAUKER is a novel algorithm for generating diverse, causally coherent synthetic time series, specifically tailored for classification. By integrating Gaussian Process kernel composition with Structural Causal Models, CAUKER allows for sample-efficient pre-training of Time Series Foundation Models (TSFMs). Our research shows that TSFMs trained on CAUKER-generated data achieve state-of-the-art performance and exhibit clear scaling laws for both dataset size (10K to 10M samples) and model capacity (1M to 783M parameters), a consistency often lacking in real-world datasets.
How CAUKER Works
CAUKER (Causal-Kernel generation) is designed to generate diverse, causally coherent synthetic time series for classification tasks. It blends common time series patterns (seasonality, periodicity, trend) with a meaningful clustering structure, allowing models to disentangle underlying clusters.
The core approach draws from three banks of functions: kernels, mean functions, and activation functions. Sampled kernels are composed and paired with sampled mean functions to form Gaussian Process (GP) priors. Draws from these priors serve as the root nodes of a directed acyclic graph (DAG); each edge applies a nonlinear activation function, so signals propagate from the roots through the graph and produce complex, realistic time series.
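The generation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the contents of the three banks, the sum/product composition rule, the number of parents per node, and the names `compose_kernel`, `sample_gp`, and `generate_series` are all hypothetical. It uses only the standard library, so the Cholesky factorization is written out by hand.

```python
import math
import random

# Hypothetical banks: the source names the three banks but not their contents.
KERNEL_BANK = {
    "rbf": lambda s, t: math.exp(-0.5 * ((s - t) / 0.2) ** 2),
    "periodic": lambda s, t: math.exp(-2.0 * math.sin(math.pi * abs(s - t) / 0.25) ** 2),
    "linear": lambda s, t: 0.1 + s * t,
}
MEAN_BANK = {
    "zero": lambda t: 0.0,
    "trend": lambda t: 2.0 * t,
    "sine": lambda t: math.sin(2.0 * math.pi * t),
}
ACTIVATION_BANK = {
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
    "sin": math.sin,
}


def compose_kernel(rng):
    """Combine two sampled kernels by sum or product (both keep the kernel valid)."""
    k1, k2 = (KERNEL_BANK[rng.choice(sorted(KERNEL_BANK))] for _ in range(2))
    if rng.random() < 0.5:
        return lambda s, t: k1(s, t) + k2(s, t)
    return lambda s, t: k1(s, t) * k2(s, t)


def cholesky(a):
    """Lower-triangular factor L with L @ L.T == a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / low[j][j]
    return low


def sample_gp(ts, rng):
    """Draw one path from a GP prior with a sampled mean and a composed kernel."""
    kernel = compose_kernel(rng)
    mean = MEAN_BANK[rng.choice(sorted(MEAN_BANK))]
    # A small diagonal jitter keeps the covariance numerically positive definite.
    cov = [[kernel(s, t) + (1e-4 if i == j else 0.0) for j, t in enumerate(ts)]
           for i, s in enumerate(ts)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in ts]
    return [mean(t) + sum(low[i][k] * z[k] for k in range(i + 1))
            for i, t in enumerate(ts)]


def generate_series(length=48, n_roots=3, n_children=4, seed=0):
    """Return all node series of a random DAG whose roots are GP draws."""
    rng = random.Random(seed)
    ts = [i / (length - 1) for i in range(length)]
    nodes = [sample_gp(ts, rng) for _ in range(n_roots)]
    for _ in range(n_children):
        # Parents are chosen among earlier nodes only, so the graph stays acyclic.
        parents = rng.sample(range(len(nodes)), k=2)
        child = [0.0] * length
        for p in parents:
            # Each edge applies its own activation and a random weight (assumed).
            act = ACTIVATION_BANK[rng.choice(sorted(ACTIVATION_BANK))]
            weight = rng.gauss(0.0, 1.0)
            for i in range(length):
                child[i] += weight * act(nodes[p][i])
        nodes.append(child)
    return nodes


series = generate_series(seed=0)
print(len(series), "series of length", len(series[0]))
```

Sum and product are used for composition because both preserve positive semi-definiteness, so any composed kernel still defines a valid GP. How CAUKER assigns class labels to the generated series is not covered by this sketch.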
Predictable Growth in Performance
Our experiments reveal that TSFMs trained on CAUKER data exhibit clear and consistent scaling laws. Accuracy steadily improves with increasing dataset size (from 10K to 10M samples) and model capacity (from 1M to 783M parameters). This contrasts sharply with real-world datasets, which show irregular scaling behavior due to limited diversity and domain mismatch.
These findings suggest that the diversity and causal structure of CAUKER-generated data are what allow accuracy to keep improving as dataset size and model capacity grow, rather than plateauing early as it can on narrower real-world corpora.
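One common way to check whether results follow a scaling law is to fit a power law, error ≈ a · N^(-b), by linear regression in log-log space. The sketch below assumes that functional form, which the source does not specify, and the helper name `fit_power_law` is hypothetical. The error values are placeholders generated from a known law over the 10K to 10M range purely to exercise the fit; they are not measurements from the research.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(a) - b * log(size); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


# Placeholder points drawn from a known law, NOT results from the research.
sizes = [10_000, 100_000, 1_000_000, 10_000_000]
errors = [0.9 * n ** -0.1 for n in sizes]

a, b = fit_power_law(sizes, errors)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real measurements, points that fall close to a straight line in log-log space indicate regular scaling, while the irregular behavior attributed to real-world datasets would show up as large residuals around the fitted line.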
State-of-the-Art Zero-Shot Classification
CAUKER enables TSFMs to reach state-of-the-art zero-shot classification performance without real pre-training data. For instance, Mantis and MOMENT pre-trained on CAUKER synthetic data nearly match, and in some cases slightly exceed, the performance of their counterparts trained on millions of real-world samples.
Furthermore, models pre-trained on CAUKER data generalize well out of distribution across diverse domains, including the EEG-heavy tasks in the WOODS benchmark, which indicates that the learned representations are robust and transferable.
CAUKER's Causal-Kernel Generation Pipeline
Our novel pipeline generates diverse, causally coherent synthetic time series by combining Gaussian Processes with Structural Causal Models to produce data ideal for classification TSFMs.
| Model | SCM | FPFN | KernelSynth | Mean-KernelSynth | CAUKER (ours) |
|---|---|---|---|---|---|
| Mantis | 73.49% | 77.52% | 77.70% | 78.20% | 78.31% |
| MOMENT | 59.23% | 70.85% | 69.31% | 72.56% | 74.24% |
Achieving State-of-the-Art with Less Data
Pre-training with CAUKER's synthetic data allows models like Mantis and MOMENT to achieve comparable performance using datasets that are ~20x and ~1.3x smaller, respectively, than their original real-world counterparts, making TSFM development highly sample-efficient.
Your Path to Advanced Time Series AI
A structured roadmap to integrate CAUKER-powered Time Series Foundation Models into your enterprise.
Data Strategy & Pipeline Design
Define requirements and customize CAUKER for specific domain patterns, ensuring optimal synthetic data characteristics.
Synthetic Data Generation
Generate large-scale, diverse datasets using the CAUKER pipeline, validating for causal coherence and quality.
TSFM Pre-training & Benchmarking
Train or fine-tune Time Series Foundation Models on the CAUKER-generated synthetic data, evaluating zero-shot performance on target tasks.
Integration & Deployment
Deploy the pre-trained TSFMs into production, monitoring performance and continuously refining the synthetic data generation process for ongoing improvement.