Enterprise AI Analysis: K-Gen: A Multimodal Language-Conditioned Approach for Interpretable Keypoint-Guided Trajectory Generation

AI in Autonomous Driving

K-Gen: A Multimodal Language-Conditioned Approach for Interpretable Keypoint-Guided Trajectory Generation

Authors: Mingxuan Mu, Guo Yang, Lei Chen, Ping Wu, Jianxun Cui

Published Date: March 5, 2026

Generating realistic and diverse trajectories is a critical challenge in autonomous driving simulation. While Large Language Models (LLMs) show promise, existing methods often rely on structured data like vectorized maps, which fail to capture the rich, unstructured visual context of a scene. To address this, we propose K-Gen, an interpretable keypoint-guided multimodal framework that leverages Multimodal Large Language Models (MLLMs) to unify rasterized BEV map inputs with textual scene descriptions. Instead of directly predicting full trajectories, K-Gen generates interpretable keypoints along with reasoning that reflects agent intentions, which are subsequently refined into accurate trajectories by a refinement module. To further enhance keypoint generation, we apply T-DAPO, a trajectory-aware reinforcement fine-tuning algorithm. Experiments on WOMD and nuPlan demonstrate that K-Gen outperforms existing baselines, highlighting the effectiveness of combining multimodal reasoning with keypoint-guided trajectory generation.

Executive Impact: K-Gen in Action

K-Gen introduces a novel multimodal framework for interpretable, high-fidelity trajectory generation in autonomous driving. By integrating rasterized map inputs and separating strategic intent from motion execution, K-Gen provides a new paradigm for language-guided modeling. Extensive experiments on WOMD and nuPlan validate its superior accuracy, safety, and generalization in diverse autonomous driving scenarios.

Deep Analysis & Enterprise Applications

Multimodal Trajectory Generation

K-Gen unifies rasterized BEV map inputs with textual scene descriptions using MLLMs. This approach captures the rich, unstructured visual context of a scene, enabling more flexible and faithful understanding of complex driving environments compared to traditional vectorized map inputs. The key innovation is moving beyond structured data to embrace raw visual and textual data for a holistic scene understanding.
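To make the idea of a rasterized BEV input concrete, the sketch below marks the cells of an ego-centred grid that a lane polyline passes through. It is an illustration only: the grid size, spatial extent, sampling step, and the function name `rasterize_polyline` are assumptions, and the actual rendering used by K-Gen (resolution, channels, colours) is not described on this page.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rasterize_polyline(
    polyline: Sequence[Point],
    size: int = 64,        # grid is size x size cells (assumed value)
    extent: float = 40.0,  # grid covers [-extent, extent] metres on both axes (assumed)
    step: float = 0.5,     # sampling distance along each segment, in metres (assumed)
) -> List[List[int]]:
    """Return a binary grid with 1 in every cell the polyline passes through."""
    grid = [[0] * size for _ in range(size)]
    cell = 2 * extent / size  # edge length of one cell in metres
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        length = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        n = max(1, int(length / step))
        for k in range(n + 1):
            t = k / n
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            col = int((x + extent) // cell)
            row = int((extent - y) // cell)  # row 0 is the top of the image
            if 0 <= row < size and 0 <= col < size:
                grid[row][col] = 1
    return grid
```

In a multimodal setup, a grid like this would be rendered as an image and passed to the MLLM together with the textual scene description.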

Interpretable Keypoint Generation

Keypoint-Guided Methodology

Instead of direct full trajectory prediction, K-Gen generates sparse, interpretable keypoints along with reasoning reflecting agent intentions. These keypoints are then refined into accurate full trajectories by a specialized refinement module. This two-step process enhances accuracy, stability, and interpretability, providing human-readable explanations of agent behavior.
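The two-step idea can be illustrated with a deliberately simple stand-in for the refinement module: linear interpolation between timestamped keypoints. This is not K-Gen's refinement module, whose internals are not described on this page; the `(t, x, y)` keypoint layout, the `densify` name, and the sampling interval `dt` are assumptions made for the sketch.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (time in seconds, x, y); layout is an assumption
Point = Tuple[float, float]


def densify(keypoints: Sequence[Keypoint], dt: float = 0.5) -> List[Point]:
    """Linearly interpolate sparse keypoints into a trajectory sampled every dt seconds."""
    pts = sorted(keypoints)
    if len(pts) < 2:
        raise ValueError("need at least two keypoints")
    t_start, t_end = pts[0][0], pts[-1][0]
    n_steps = int(round((t_end - t_start) / dt))
    out: List[Point] = []
    seg = 0  # index of the keypoint that starts the current segment
    for i in range(n_steps + 1):
        t = t_start + i * dt
        # Advance to the segment that contains time t.
        while seg < len(pts) - 2 and t > pts[seg + 1][0]:
            seg += 1
        t0, x0, y0 = pts[seg]
        t1, x1, y1 = pts[seg + 1]
        a = (t - t0) / (t1 - t0)
        out.append((x0 + a * (x1 - x0), y0 + a * (y1 - y0)))
    return out
```

For example, three keypoints describing "drive forward, then turn left" expand into a five-point path at one-second spacing with `densify([(0, 0, 0), (2, 10, 0), (4, 10, 10)], dt=1.0)`. A learned refiner would replace the straight segments with smooth, physically plausible motion.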

K-Gen's Trajectory Generation Flow

Scene Data (Map Images & Textual Inputs)
MLLM (Multimodal Reasoning & Keypoints)
Sparse Keypoints
Trajectory Refinement Module
Complete Trajectories

The system processes multimodal scene data, uses an MLLM for reasoning and sparse keypoint generation, and then refines the keypoints into complete, accurate trajectories. This modular design ensures both high-level intent capture and fine-grained motion control.
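The flow above can be sketched as a small pipeline in which the MLLM and the refinement module are interchangeable callables. All names here (`SceneInput`, `KeypointPlan`, `generate_trajectory`) are hypothetical and chosen for illustration; they are not taken from the K-Gen codebase.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class SceneInput:
    bev_image: Sequence[Sequence[int]]  # rasterized map; the format is an assumption
    description: str                    # textual scene description


@dataclass
class KeypointPlan:
    reasoning: str          # human-readable explanation of the agent's intent
    keypoints: List[Point]  # sparse waypoints proposed by the model


def generate_trajectory(
    scene: SceneInput,
    propose: Callable[[SceneInput], KeypointPlan],  # stands in for the MLLM stage
    refine: Callable[[List[Point]], List[Point]],   # stands in for the refinement module
) -> Tuple[KeypointPlan, List[Point]]:
    """Run the two stages in order and return the plan alongside the dense trajectory."""
    plan = propose(scene)
    if not plan.keypoints:
        raise ValueError("the proposer returned no keypoints")
    return plan, refine(plan.keypoints)
```

Returning the plan together with the trajectory keeps the reasoning text available for inspection, which is the point of separating intent from motion execution.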

T-DAPO: Enhanced Reinforcement Fine-Tuning

Feature          | Standard DAPO         | T-DAPO (K-Gen)
Focus            | General LLM Alignment | Trajectory Generation & Interpretability
Reward Signals   | Preference-based      | Trajectory-centric (Accuracy, CoT Length, Format Correctness)
Sample Weighting | Uniform               | Challenging Samples Emphasis (top 30% mADE/mFDE)
Output Control   | Coarse-grained        | Fine-grained, physically consistent motion reconstruction
Benefits
  • Improved alignment
  • Stable training
  • High-fidelity trajectories
  • Interpretable reasoning
  • Enhanced safety & physical compliance

T-DAPO (Trajectory-aware Decoupled Clip and Dynamic Sampling Policy Optimization) is a specialized reinforcement fine-tuning algorithm designed for trajectory generation. It incorporates trajectory-centric reward signals and emphasizes challenging samples, leading to higher fidelity and more interpretable outputs than standard DAPO.
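A rough sketch of the two ingredients named above follows: a reward that combines accuracy, chain-of-thought length, and format correctness, and a selector for the most challenging 30% of samples. The functional forms, weights, and target length are assumptions made for illustration; the exact reward definition and the way T-DAPO uses the challenging samples are not given on this page.

```python
import math
from typing import List, Sequence


def trajectory_reward(
    ade: float,                # displacement error of the generated trajectory, in metres
    cot_tokens: int,           # length of the generated reasoning, in tokens
    format_ok: bool,           # whether the output parsed into valid keypoints
    w_acc: float = 1.0,        # hypothetical weight
    w_len: float = 0.1,        # hypothetical weight
    w_fmt: float = 0.5,        # hypothetical weight
    target_tokens: int = 128,  # hypothetical preferred reasoning length
    scale: float = 2.0,        # hypothetical error scale, in metres
) -> float:
    """Weighted sum of three terms, each in [0, 1]; higher is better."""
    acc_term = math.exp(-ade / scale)
    len_term = 1.0 - min(1.0, abs(cot_tokens - target_tokens) / target_tokens)
    fmt_term = 1.0 if format_ok else 0.0
    return w_acc * acc_term + w_len * len_term + w_fmt * fmt_term


def hardest_indices(errors: Sequence[float], fraction: float = 0.3) -> List[int]:
    """Indices of the samples with the largest errors (top `fraction` of the batch)."""
    k = max(1, int(round(fraction * len(errors))))
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return sorted(order[:k])
```

The selected indices could then be upweighted or resampled during fine-tuning; which of these T-DAPO does is left open here.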

Impact on Autonomous Driving

Customer: Changan Automobile Co., Ltd

Problem: Traditional autonomous driving simulations struggle with generating diverse, realistic, and interpretable traffic scenarios due to reliance on structured, abstract map data.

Solution: K-Gen's multimodal approach, combining visual maps with text descriptions, generates accurate and human-interpretable trajectories. Its keypoint-guided strategy and T-DAPO algorithm produce physically consistent and safe motions.

Results: K-Gen outperformed existing baselines in trajectory quality (mADE, mFDE, SCR) on the WOMD and nuPlan datasets. It also improved interpretability through Chain-of-Thought reasoning and attention heatmaps that highlight safety-critical regions, which supports more robust and explainable autonomous driving systems.
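For readers unfamiliar with the displacement metrics, the sketch below computes average and final displacement error for one agent and averages a metric over several agents. Reading the "m" prefix as a mean over agents is an assumption; the page does not define mADE or mFDE, and the collision-based SCR metric is not covered here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def ade(pred: Trajectory, gt: Trajectory) -> float:
    """Average Euclidean distance between matching timesteps."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)) / len(pred)


def fde(pred: Trajectory, gt: Trajectory) -> float:
    """Euclidean distance at the final timestep."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    (px, py), (gx, gy) = pred[-1], gt[-1]
    return math.hypot(px - gx, py - gy)


def mean_over_agents(
    metric: Callable[[Trajectory, Trajectory], float],
    preds: List[Trajectory],
    gts: List[Trajectory],
) -> float:
    """Average a per-agent metric across all agents in a scene."""
    return sum(metric(p, g) for p, g in zip(preds, gts)) / len(preds)
```

Lower values are better for both metrics: ADE reflects how closely the whole path is followed, while FDE isolates the error at the end of the horizon.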

Your K-Gen Implementation Roadmap

Our phased approach ensures a seamless integration of K-Gen into your existing systems, maximizing impact while minimizing disruption. Each step is designed for clarity, efficiency, and measurable results.

Discovery & Strategy

Understand your current simulation workflows, identify key challenges, and define success metrics. Develop a tailored strategy for K-Gen's integration.

Data Integration & Model Fine-tuning

Integrate your specific map data and driving scenarios. Fine-tune K-Gen with your proprietary datasets using T-DAPO for optimal performance and domain adaptation.

Pilot Deployment & Validation

Deploy K-Gen in a controlled environment and validate the generated trajectories against real-world data and benchmarks. Gather feedback and iterate on refinements.

Full-Scale Rollout & Optimization

Integrate K-Gen into your full simulation pipeline. Provide ongoing support, monitoring, and continuous optimization to ensure sustained high performance and ROI.

Ready to Transform Your Autonomous Driving Simulations?

Don't let the complexities of trajectory generation hinder your progress. K-Gen offers a powerful, interpretable, and multimodal solution to accelerate your autonomous driving development. Schedule a free consultation with our AI experts to explore how K-Gen can specifically address your organization's needs.
