CAUKER: CLASSIFICATION TIME SERIES FOUNDATION MODELS CAN BE PRETRAINED ON SYNTHETIC DATA

Pre-training on CAUKER's synthetic data yields state-of-the-art classification Time Series Foundation Models that follow clear scaling laws, which pre-training on real-world datasets often fails to show.

CAUKER is a novel algorithm for generating diverse, causally coherent synthetic time series, specifically tailored for classification. By integrating Gaussian Process kernel composition with Structural Causal Models, CAUKER allows for sample-efficient pre-training of Time Series Foundation Models (TSFMs). Our research shows that TSFMs trained on CAUKER-generated data achieve state-of-the-art performance and exhibit clear scaling laws for both dataset size (10K to 10M samples) and model capacity (1M to 783M parameters), a consistency often lacking in real-world datasets.

Executive Impact & Key Findings

Understand the quantifiable benefits and strategic implications for your enterprise.

Max Synthetic Samples Tested: 10M

Max Model Parameters Tested: 783M

MOMENT Accuracy Gain (100K to 10M Samples)

Mantis Accuracy Gain (10K to 10M Samples)

Deep Analysis & Enterprise Applications

How CAUKER Works

CAUKER (Causal-Kernel generation) is designed to generate diverse, causally coherent synthetic time series for classification tasks. It blends common time series patterns (seasonality, periodicity, trend) with a meaningful clustering structure, allowing models to disentangle underlying clusters.

The core approach relies on three banks of functions: kernels, mean functions, and activation functions. Kernels and mean functions are sampled and composed to form Gaussian Process (GP) priors. Draws from these GP priors act as root nodes in a directed acyclic graph (DAG), and each edge applies a nonlinear activation function sampled from the activation bank, so that signals propagating through the graph produce complex, realistic time series.
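To make the kernel and mean banks concrete, here is a minimal NumPy sketch of composing kernels and drawing one root series from the resulting GP prior. It is an illustration under stated assumptions, not CAUKER's actual implementation: the three kernels, the three mean functions, the random choice between `+` and `*` as composition operators, and all hyperparameter values are hypothetical picks.

```python
import numpy as np

# Hypothetical kernel bank: each entry returns a covariance matrix on grid t.
def rbf(t, length=0.2):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def periodic(t, period=0.25, length=1.0):
    d = np.abs(t[:, None] - t[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length ** 2)

def linear(t):
    return t[:, None] * t[None, :]

KERNEL_BANK = [rbf, periodic, linear]

# Hypothetical mean bank: flat, linear trend, one seasonal cycle.
MEAN_BANK = [
    lambda t: np.zeros_like(t),
    lambda t: 2.0 * t,
    lambda t: np.sin(2.0 * np.pi * t),
]

def sample_composite_kernel(rng, t, n_terms=3):
    """Combine randomly drawn kernels with random + or * operators.

    Sums and elementwise products of valid kernels are valid kernels,
    so the result is still a covariance matrix.
    """
    K = KERNEL_BANK[rng.integers(len(KERNEL_BANK))](t)
    for _ in range(n_terms - 1):
        K_next = KERNEL_BANK[rng.integers(len(KERNEL_BANK))](t)
        K = K + K_next if rng.random() < 0.5 else K * K_next
    return K

def sample_gp_root(rng, length=128):
    """Draw one series from a GP prior with a sampled mean and kernel."""
    t = np.linspace(0.0, 1.0, length)
    K = sample_composite_kernel(rng, t)
    mean = MEAN_BANK[rng.integers(len(MEAN_BANK))](t)
    # Eigendecomposition with clipping tolerates tiny negative eigenvalues
    # caused by floating-point error in near-singular covariance matrices.
    w, V = np.linalg.eigh(K)
    w = np.clip(w, 0.0, None)
    return mean + V @ (np.sqrt(w) * rng.standard_normal(length))

rng = np.random.default_rng(0)
root = sample_gp_root(rng)  # one root-node series of length 128
```

Sampling through an eigendecomposition rather than a Cholesky factor is a robustness choice for this sketch: composite kernels on a dense grid are often numerically rank-deficient.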

Predictable Growth in Performance

Our experiments reveal that TSFMs trained on CAUKER data exhibit clear and consistent scaling laws. Accuracy steadily improves with increasing dataset size (from 10K to 10M samples) and model capacity (from 1M to 783M parameters). This contrasts sharply with real-world datasets, which show irregular scaling behavior due to limited diversity and domain mismatch.

These findings suggest that the diversity and causal structure of CAUKER-generated data are what keep accuracy improving as dataset size and model capacity grow, letting foundation models continue to benefit from scale instead of plateauing early.
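One simple way to check whether accuracy scales regularly with dataset size is to fit a straight line to accuracy against the logarithm of the sample count and inspect the fit quality. The sketch below assumes a log-linear form, which the text above does not specify, and the numbers in the usage example are made-up placeholders that lie exactly on a line; they are not results from the research.

```python
import numpy as np

def fit_log_linear(sizes, accuracies):
    """Fit accuracy = intercept + slope * log10(size).

    Returns (slope, intercept, r_squared). An r_squared close to 1 means
    accuracy grows regularly with each tenfold increase in data.
    """
    x = np.log10(np.asarray(sizes, dtype=float))
    y = np.asarray(accuracies, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - np.sum(residuals ** 2) / ss_tot
    return slope, intercept, r_squared

# Placeholder values for illustration only (NOT measured accuracies).
sizes = [1e4, 1e5, 1e6, 1e7]
placeholder_acc = [0.60, 0.63, 0.66, 0.69]
slope, intercept, r2 = fit_log_linear(sizes, placeholder_acc)
```

Irregular scaling, as described for real-world pre-training corpora, would show up here as a low `r_squared` or a slope that changes sign between adjacent sizes.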

State-of-the-Art Zero-Shot Classification

CAUKER enables TSFMs to achieve state-of-the-art zero-shot classification performance while competing with models pre-trained on much larger real-world datasets. For instance, Mantis and MOMENT pre-trained on CAUKER synthetic data nearly match, and in some cases slightly exceed, the performance of their counterparts trained on millions of real-world samples.

Furthermore, CAUKER pre-trained models demonstrate strong out-of-distribution generalization across diverse domains, including the EEG-heavy tasks of the WOODS benchmark, which indicates that the learned representations are robust and transferable.
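The evaluation protocol is not spelled out here. A common reading of "zero-shot" for classification TSFMs is that the pre-trained encoder stays frozen and only a lightweight classifier is fit on its embeddings. The sketch below follows that assumption. `frozen_encoder` is a hypothetical stand-in built from summary statistics and low-frequency spectrum magnitudes, not Mantis or MOMENT, the nearest-centroid classifier is an arbitrary simple choice, and `make_toy_split` produces toy data only.

```python
import numpy as np

def frozen_encoder(X):
    """Hypothetical stand-in for a pre-trained encoder; never updated."""
    spectrum = np.abs(np.fft.rfft(X, axis=1))[:, 1:9]
    return np.column_stack([X.mean(axis=1), X.std(axis=1), spectrum])

def fit_centroids(Z, y):
    """Fit the lightweight classifier: one mean embedding per class."""
    classes = np.unique(y)
    return classes, np.stack([Z[y == c].mean(axis=0) for c in classes])

def predict(Z, classes, centroids):
    """Assign each embedding to the class of its nearest centroid."""
    dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dist.argmin(axis=1)]

def zero_shot_accuracy(X_train, y_train, X_test, y_test):
    """Embed with the frozen encoder, fit centroids, score on test data."""
    classes, centroids = fit_centroids(frozen_encoder(X_train), y_train)
    y_pred = predict(frozen_encoder(X_test), classes, centroids)
    return float(np.mean(y_pred == y_test))

def make_toy_split(rng, n_per_class=20, length=64):
    """Toy two-class data: noisy sinusoids at two different frequencies."""
    t = np.arange(length) / length
    X, y = [], []
    for label, freq in enumerate((2, 5)):
        noise = 0.1 * rng.standard_normal((n_per_class, length))
        X.append(np.sin(2.0 * np.pi * freq * t) + noise)
        y.append(np.full(n_per_class, label))
    return np.concatenate(X), np.concatenate(y)

rng = np.random.default_rng(0)
X_train, y_train = make_toy_split(rng)
X_test, y_test = make_toy_split(rng)
accuracy = zero_shot_accuracy(X_train, y_train, X_test, y_test)
```

The point of the structure is that no encoder weights change between pre-training and evaluation, so downstream accuracy reflects the quality of the pre-trained representation.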

CAUKER's Causal-Kernel Generation Pipeline

Our novel pipeline generates diverse, causally coherent synthetic time series by combining Gaussian Processes with Structural Causal Models to produce data ideal for classification TSFMs.

Kernel Bank Sampling → Kernel Composition → Root Nodes Generation (GP Priors) → Activation Bank Sampling → Causal Graph Propagation
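The last two stages, activation sampling and propagation through the causal graph, can be sketched as follows. This is a simplified illustration and not the reference implementation: the activation bank contents, the Gaussian edge weights, the limit of two parents per node, and the plain RBF root generator (`sample_root`, a stand-in for the composed GP priors) are all assumptions.

```python
import numpy as np

# Hypothetical activation bank applied along the edges of the graph.
ACTIVATION_BANK = [np.tanh, np.sin, lambda x: np.maximum(x, 0.0), lambda x: x]

def sample_root(rng, length):
    """Stand-in root node: a zero-mean GP draw with an RBF kernel."""
    t = np.linspace(0.0, 1.0, length)
    d = t[:, None] - t[None, :]
    K = np.exp(-0.5 * (d / rng.uniform(0.05, 0.5)) ** 2)
    w, V = np.linalg.eigh(K)
    return V @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(length))

def propagate(rng, n_roots=3, n_nodes=8, length=128, max_parents=2):
    """Generate node series in topological order.

    Each non-root node takes a weighted sum of earlier nodes and passes it
    through a sampled activation. Parents always precede children, so the
    graph is acyclic by construction.
    """
    nodes = [sample_root(rng, length) for _ in range(n_roots)]
    for _ in range(n_nodes - n_roots):
        k = min(int(rng.integers(1, max_parents + 1)), len(nodes))
        parents = rng.choice(len(nodes), size=k, replace=False)
        weights = rng.standard_normal(k)
        activation = ACTIVATION_BANK[rng.integers(len(ACTIVATION_BANK))]
        signal = sum(w * nodes[p] for w, p in zip(weights, parents))
        nodes.append(activation(signal))
    return np.stack(nodes)  # shape: (n_nodes, length)

series = propagate(np.random.default_rng(0))
```

Any row of `series` can serve as a synthetic sample; rows that share ancestors are causally related, which is one way the clustering structure described earlier could arise.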

Superior Zero-Shot Classification Accuracy
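On the UCR benchmark, CAUKER yields the highest zero-shot accuracy among the compared synthetic data generation methods for pre-training classification Time Series Foundation Models (TSFMs). The margin over the next-best method, Mean-KernelSynth, is small for Mantis (78.31% vs. 78.20%) and larger for MOMENT (74.24% vs. 72.56%).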
Model	SCM	FPFN	KernelSynth	Mean-KernelSynth	CAUKER (ours)
Mantis	73.49%	77.52%	77.70%	78.20%	78.31%
MOMENT	59.23%	70.85%	69.31%	72.56%	74.24%
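The margins in the table can be checked with a few lines of Python. The accuracies are copied from the table above; the helper name `margin_over_best_baseline` is made up for this example.

```python
# Zero-shot UCR accuracy (%) per pre-training data generator, from the table.
RESULTS = {
    "Mantis": {"SCM": 73.49, "FPFN": 77.52, "KernelSynth": 77.70,
               "Mean-KernelSynth": 78.20, "CAUKER": 78.31},
    "MOMENT": {"SCM": 59.23, "FPFN": 70.85, "KernelSynth": 69.31,
               "Mean-KernelSynth": 72.56, "CAUKER": 74.24},
}

def margin_over_best_baseline(scores, ours="CAUKER"):
    """Return (best baseline name, gap in percentage points)."""
    baselines = {name: acc for name, acc in scores.items() if name != ours}
    best = max(baselines, key=baselines.get)
    return best, round(scores[ours] - baselines[best], 2)

for model, scores in RESULTS.items():
    print(model, margin_over_best_baseline(scores))
# Mantis ('Mean-KernelSynth', 0.11)
# MOMENT ('Mean-KernelSynth', 1.68)
```

In both rows the strongest baseline is Mean-KernelSynth, so the gap is about 0.1 points for Mantis and about 1.7 points for MOMENT.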

Achieving State-of-the-Art with Less Data

Pre-training with CAUKER's synthetic data allows models like Mantis and MOMENT to achieve comparable performance using datasets that are ~20x and ~1.3x smaller, respectively, than their original real-world counterparts, making TSFM development highly sample-efficient.

Your Path to Advanced Time Series AI

A structured roadmap to integrate CAUKER-powered Time Series Foundation Models into your enterprise.

Data Strategy & Pipeline Design

Define requirements and customize CAUKER for specific domain patterns, ensuring optimal synthetic data characteristics.

Synthetic Data Generation

Generate large-scale, diverse datasets using the CAUKER pipeline, validating for causal coherence and quality.

TSFM Pre-training & Benchmarking

Train or fine-tune Time Series Foundation Models on the CAUKER-generated synthetic data, evaluating zero-shot performance on target tasks.

Integration & Deployment

Deploy the pre-trained TSFMs into production, monitoring performance and continuously refining the synthetic data generation process for ongoing improvement.

Ready to Transform Your Time Series Analysis?

Leverage the power of CAUKER and synthetic data to build scalable, high-performing Time Series Foundation Models. Our experts are ready to guide you.
