Cutting-Edge AI Research Analysis
Causal Masking on Spatial Data: An Information-Theoretic Case for Learning Spatial Datasets with Unimodal Language Models
Authors: Jared Junkin, Samuel Nathanson
Publication Date: October 30, 2025
Abstract: Language models are traditionally designed around causal masking. In domains with spatial or relational structure, causal masking is often viewed as inappropriate, and sequential linearizations are instead used. Yet the question of whether it is viable to accept the information loss introduced by causal masking on nonsequential data has received little direct study, in part because few domains offer both spatial and sequential representations of the same dataset. In this work, we investigate this issue in the domain of chess, which naturally supports both representations. We train language models with bidirectional and causal self-attention mechanisms on both spatial (board-based) and sequential (move-based) data. Our results show that models trained on spatial board states - even with causal masking - consistently achieve stronger playing strength than models trained on sequential data. While our experiments are conducted on chess, our results are methodological and may have broader implications: applying causal masking to spatial data is a viable procedure for training unimodal LLMs on spatial data, and in some domains is even preferable to sequentialization.
Executive Impact: Unlocking New LLM Capabilities for Spatial Data
Our research demonstrates that training unimodal language models with causal masking directly on spatial data, such as chess board states encoded as FEN strings, yields substantially stronger performance (2630 ELO) than training on traditional sequential move data (PGN, 2000 ELO). This challenges the conventional view that causal masking is unsuitable for spatial data and opens new avenues for efficient LLM training on structured domains.
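To make the two data representations concrete, the snippet below prints both views of the same position. This is a minimal sketch using the third-party python-chess library, not code from the paper:

```python
# pip install python-chess
import chess

# Play the opening moves of a Ruy Lopez and view both representations.
board = chess.Board()
moves = ["e4", "e5", "Nf3", "Nc6", "Bb5"]
for san in moves:
    board.push_san(san)

# Sequential view: the move history (simplified PGN, without move numbers).
print("Moves:", " ".join(moves))

# Spatial view: the complete board state as a single FEN string.
print("FEN:  ", board.fen())
# -> r1bqkbnr/pppp1ppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3
```

The move list encodes the position only implicitly, as the cumulative effect of the sequence; the FEN string states it explicitly, square by square.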
Deep Analysis & Enterprise Applications
Information Processing Differences
Our core hypothesis is that models process spatial information more efficiently when they receive it directly, even under causal masking, than when they must infer spatial structure from sequential move histories.
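As an illustration of what causal masking withholds, the toy numpy sketch below (our own illustration, not the paper's model code) contrasts the two attention patterns over a short flattened sequence:

```python
import numpy as np

# A causal mask lets position i attend only to positions <= i (lower
# triangle); a bidirectional mask lets every position attend to every
# other position. On a linearized board, causal masking hides "later"
# squares from "earlier" ones: the information loss the paper measures.
n = 8  # toy sequence length, e.g., one rank of a flattened board
causal = np.tril(np.ones((n, n), dtype=bool))
bidirectional = np.ones((n, n), dtype=bool)

removed = 1.0 - causal.sum() / bidirectional.sum()
print(f"Causal masking removes {removed:.0%} of attention pairs")  # 44% for n=8
```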
Achieving Grandmaster-Level Chess Play
Our Llama model, even when applying causal masking directly to spatial FEN data, reached an estimated 2630 ELO, placing it firmly within the grandmaster tier (roughly 2500 and above) of human chess players.
Comparative Performance of Masking Strategies
A direct comparison of models trained with different data representations and masking strategies shows that spatial FEN inputs outperform sequential PGN inputs even under causal masking, with bidirectional attention adding only a modest further gain.
| Metric | PGN (Causal Masking) | FEN (Causal Masking) | FEN (Bidirectional) |
|---|---|---|---|
| Estimated ELO Rating | 2000 | 2630 | 2680 |
| Best-Move Accuracy (vs. Stockfish) | 40.7% | 58.2% | 61.6% |
| Syntactically Valid Move Rate | 99.7% | 99.945% | 100.0% |
| Legal Move Rate | 99.7% | 99.914% | 100.0% |
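For context on how a metric like best-move accuracy can be computed, here is a hedged sketch of an evaluation loop built on python-chess's UCI engine bindings. The `model_best_move` function and the Stockfish binary path are placeholders, not the authors' released code:

```python
# pip install python-chess; requires a local Stockfish binary.
import chess
import chess.engine

def model_best_move(board: chess.Board) -> chess.Move:
    """Placeholder: query the trained language model for its chosen move."""
    raise NotImplementedError

def best_move_accuracy(fens, stockfish_path="stockfish", depth=15):
    """Fraction of positions where the model's move matches Stockfish's."""
    matches = 0
    with chess.engine.SimpleEngine.popen_uci(stockfish_path) as engine:
        for fen in fens:
            board = chess.Board(fen)
            reference = engine.play(board, chess.engine.Limit(depth=depth)).move
            if model_best_move(board) == reference:
                matches += 1
    return matches / len(fens)
```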
Key Lessons for Spatial LLM Development
Our findings provide crucial insights for adapting pretrained LLMs to structured, spatial domains, emphasizing the importance of aligning tokenization and prompting with the underlying data structure.
The Challenge
On FEN strings, default subword tokenizers often produce ambiguous merges (e.g., fusing a pawn 'p' and a king 'k' into a single 'pk' token), which hinders training stability and performance. Improper prompting can likewise limit the model's ability to exploit spatial information.
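One quick way to see the problem is to inspect how an off-the-shelf subword tokenizer splits a FEN string. The checkpoint name below is illustrative (a hedged sketch, not the paper's exact tokenizer):

```python
# pip install transformers
from transformers import AutoTokenizer

fen = "r1bqkbnr/pppp1ppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3"
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative
print(tok.tokenize(fen))
# Subword merges can fuse adjacent piece letters into single tokens, so the
# same board feature receives different, context-dependent token boundaries
# from one position to the next.
```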
Our Solution
We implemented character-level tokenization for FEN strings and flattened their run-length encodings so that every square maps to exactly one character, ensuring a consistent representation. Templated prompts embedding the FEN, the legal moves, and the best move significantly stabilized training and improved convergence, allowing the model to exploit explicit spatial features.
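The sketch below shows one plausible implementation of both steps; the flattening follows the standard FEN run-length rule, while the exact prompt template is our illustration rather than the paper's published format:

```python
import re

def flatten_fen_board(fen: str) -> str:
    """Expand FEN run-length digits so every square is one character.
    For example, '8' becomes '........', yielding a fixed-width board
    that keeps character-level tokens aligned with squares."""
    board_field = fen.split()[0]
    return re.sub(r"\d", lambda m: "." * int(m.group()), board_field)

def build_prompt(fen: str, legal_moves: list[str], best_move: str) -> str:
    """Templated training example embedding board, legal moves, and target."""
    return (
        f"Board: {flatten_fen_board(fen)}\n"
        f"Legal moves: {' '.join(legal_moves)}\n"
        f"Best move: {best_move}"
    )

fen = "r1bqkbnr/pppp1ppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3"
print(flatten_fen_board(fen))
# -> r.bqkbnr/pppp.ppp/..n...../.B..p.../....P.../.....N../PPPP.PPP/RNBQK..R
```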
The Result
These methodological choices enabled our causal-masked Llama model to achieve grandmaster-level performance and demonstrated that careful preprocessing and prompt engineering are critical, not just technical afterthoughts, when adapting LLMs to new domains.
Your AI Implementation Roadmap
Our phased approach ensures a smooth and effective integration of advanced AI solutions tailored to your enterprise needs, from strategy to sustained optimization.
Phase 1: Discovery & Strategy
In-depth assessment of your current infrastructure, business goals, and data landscape. Collaborative strategy formulation to identify high-impact AI opportunities.
Phase 2: Pilot & Development
Design and development of a proof-of-concept. Iterative testing and refinement to ensure alignment with defined objectives and performance benchmarks.
Phase 3: Integration & Deployment
Seamless integration of the AI solution into your existing systems. Comprehensive training for your teams and robust deployment protocols.
Phase 4: Optimization & Scaling
Continuous monitoring, performance tuning, and scalable expansion of AI capabilities across your enterprise to maximize long-term value.
Ready to Transform Your Enterprise with AI?
Unlock the full potential of your spatial data and elevate your operational intelligence. Schedule a complimentary consultation with our AI strategists today.