Enterprise AI Analysis
Semi-Markov Reinforcement Learning for City-Scale EV Ride-Hailing with Feasibility-Guaranteed Actions
This research presents PD-RSAC, a groundbreaking framework for city-scale Electric Vehicle (EV) ride-hailing fleet management. It uniquely combines reinforcement learning with mixed-integer linear programming and robust optimization to ensure physically feasible actions and mitigate uncertainties, directly addressing critical operational challenges for urban mobility providers.
Executive Impact & Strategic Imperatives
PD-RSAC offers a paradigm shift for EV fleet operators, providing a robust, profitable, and safe operational model previously unattainable with traditional methods.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Managing large-scale Electric Vehicle (EV) ride-hailing fleets involves complex dispatch, repositioning, and charging decisions. These decisions must strictly adhere to physical constraints like battery State-of-Charge (SoC) limits, charger port capacities, and grid power caps. Compounding this challenge are the stochastic and spatially correlated variations in demand and travel times, making it a difficult constrained sequential decision-making problem under uncertainty.
PD-RSAC formulates the problem as a hex-grid semi-Markov Decision Process (semi-MDP) with mixed actions (discrete service/reposition/charge and continuous charging power) and variable action durations. A key innovation is the feasible-action projection layer, which uses a time-limited rolling Mixed-Integer Linear Program (MILP) to guarantee physical feasibility at every decision step. To handle distributional shifts, PD-RSAC optimizes a Soft Actor-Critic (SAC) agent against a Wasserstein-1 ambiguity set, employing a novel graph-aligned Mahalanobis ground metric to capture spatial correlations in uncertainty. This robust backup utilizes a Kantorovich-Rubinstein dual with a projected subgradient inner loop for primal-dual risk budget updates. The architecture integrates a two-layer Graph Convolutional Network (GCN) encoder for state representation, twin critics, and a value network driving the adversary.
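To make the feasible-action projection concrete, here is a minimal sketch of projecting a policy's desired per-station charging powers onto the feasible set defined by port capacities and a feeder power cap. This is a simplified LP relaxation solved with SciPy's HiGHS backend, not the paper's full time-limited rolling MILP (which also includes discrete dispatch variables); the function name `project_charging_powers` is illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def project_charging_powers(p_desired, port_caps, feeder_cap):
    """Project desired charging powers onto {0 <= p <= port_caps,
    sum(p) <= feeder_cap}, minimizing L1 distance to the policy's proposal.
    Variables: x = [p, t], with slack t >= |p - p_desired|."""
    p_desired = np.asarray(p_desired, dtype=float)
    n = len(p_desired)
    c = np.concatenate([np.zeros(n), np.ones(n)])  # minimize sum(t)
    eye = np.eye(n)
    A_ub = np.vstack([
        np.hstack([eye, -eye]),    # p - t <= p_desired
        np.hstack([-eye, -eye]),   # -p - t <= -p_desired
        np.concatenate([np.ones(n), np.zeros(n)])[None, :],  # feeder cap
    ])
    b_ub = np.concatenate([p_desired, -p_desired, [feeder_cap]])
    bounds = [(0.0, cap) for cap in port_caps] + [(0.0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]
```

Because the projection runs after the actor's output, any action the environment actually executes is feasible by construction, regardless of what the neural network proposes.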
Experiments on a large-scale EV fleet simulator built from NYC taxi data demonstrate that PD-RSAC significantly outperforms strong heuristic, single-agent RL, and multi-agent RL baselines. It achieved the highest net profit of $1.22M, compared to $0.58M-$0.70M for baselines, while critically maintaining zero feeder-limit violations. The ablation study confirmed that all components—MILP projection, Wasserstein DRO, and graph-aligned metric—contribute positively to the robust performance. The MILP layer had the largest impact, underscoring the importance of feasibility guarantees.
This research offers a powerful framework for managing complex cyber-physical systems where hard constraints and real-world uncertainties are prevalent. For enterprise EV fleet operators, PD-RSAC promises not only enhanced profitability through optimized dispatch and charging but also guaranteed operational safety, preventing costly infrastructure overloads. The method's ability to adapt to unseen demand patterns makes it highly robust for real-world deployment, reducing risks associated with distributional shifts and providing a strong competitive advantage in urban mobility services. Future work will focus on optimizing the computational overhead of the MILP layer for even greater scalability.
Unlocking Peak Profitability
$1.22M Highest Net Profit Achieved
PD-RSAC achieved the highest net profit, outperforming all baselines through better dispatching and charging decisions that generate substantially more revenue while maintaining strict operational safety.
| Feature | PD-RSAC (Proposed) | Baselines (SAC, MAPPO, MADDPG) |
|---|---|---|
| Feasibility Guarantee | Hard guarantee via time-limited rolling MILP projection at every step | None; soft penalties allow constraint violations |
| Uncertainty Handling | Wasserstein-1 distributionally robust optimization | Assumes training and deployment dynamics match |
| Spatial Correlation | Captured via graph-aligned Mahalanobis ground metric | Not modeled |
| Action Space | Mixed: discrete serve/reposition/charge plus continuous charging power | No explicit mixed-action handling |
| Net Profit | $1.22M | $0.58M–$0.70M |
A comparative overview highlights PD-RSAC's superior handling of complex constraints and uncertainties.
Enterprise Process Flow: PD-RSAC
The core methodology integrates learning and optimization to ensure feasible and robust actions.
Real-world Safety: A Key Differentiator
Ensuring Grid Safety: Zero Feeder Violations
A critical operational constraint in EV fleet management is the feeder power limit, which prevents overloading the local electrical grid. Traditional RL methods often struggle with hard constraints, leading to violations in real-world scenarios.
PD-RSAC's embedded time-limited rolling MILP directly enforces these constraints. Experiments showed that while baselines like SAC produced substantial feeder-limit violations (e.g., peak charging demand of 14,465 kW against a 7,000 kW limit), PD-RSAC consistently kept the charging load below the limit (peak of 6,999 kW).
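As a simple illustration of the safety metric reported above, a feeder-violation counter over an aggregate charging-load trace might look like the following (the function name is hypothetical; the 7,000 kW default matches the limit cited in the experiments):

```python
def feeder_violations(load_kw, cap_kw=7000.0):
    """Count time intervals where aggregate charging load exceeds the feeder cap."""
    return sum(1 for p in load_kw if p > cap_kw)
```

Under this metric, a baseline trace peaking at 14,465 kW registers violations, while PD-RSAC's trace (peaking at 6,999 kW) registers zero.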
This capability ensures that the system operates not only profitably but also safely and reliably within infrastructure limitations, preventing costly outages and regulatory penalties.
Adapting to Real-world Volatility
Robustness to Uncertainty: Graph-Aligned WDRO
City-scale demand and travel times are inherently stochastic and spatially correlated. Standard Reinforcement Learning assumes identical training and testing dynamics, making it brittle to distributional shifts.
PD-RSAC addresses this with a Wasserstein-1 ambiguity set and a novel graph-aligned Mahalanobis metric. This metric captures the underlying spatial topology, penalizing spatially disjoint perturbations more heavily than smooth, local variations.
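A minimal sketch of such a graph-aligned metric, assuming the Mahalanobis matrix is built from the hex-grid's graph Laplacian (a common construction; the paper's exact matrix may differ): differences that oscillate sharply across neighboring cells incur a larger distance than spatially smooth shifts.

```python
import numpy as np

def graph_mahalanobis(x, y, adjacency, eps=1e-3):
    """Mahalanobis distance with M = L + eps*I, where L is the graph
    Laplacian of the cell adjacency. The quadratic form d @ L @ d sums
    (d_i - d_j)^2 over edges, so jagged perturbations cost more than
    smooth ones."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    M = L + eps * np.eye(len(A))
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ M @ d))
```

For example, on a 4-cell path graph, a uniform demand shift of +1 everywhere yields a near-zero distance, while an alternating +1/-1 perturbation of the same magnitude is penalized heavily.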
This robust optimization approach ensures that the learned policy performs well even under unseen demand patterns and changing conditions, providing principled robustness with provable guarantees against real-world uncertainties.
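The adversarial inner loop can be sketched as a projected subgradient descent over perturbations confined to a Mahalanobis ball, a simplification of the paper's Kantorovich-Rubinstein dual with primal-dual risk budget updates; `worst_case_value` and its arguments are illustrative names.

```python
import numpy as np

def worst_case_value(value_fn, grad_fn, x0, radius, M, steps=200, lr=0.1):
    """Approximate min_{||d||_M <= radius} value_fn(x0 + d) by subgradient
    descent, projecting d back onto the M-norm ball after each step."""
    d = np.zeros_like(np.asarray(x0, dtype=float))
    for _ in range(steps):
        d -= lr * np.asarray(grad_fn(x0 + d))  # step toward lower value
        norm = np.sqrt(d @ M @ d)
        if norm > radius:                      # radial projection in M-norm
            d *= radius / norm
    return float(value_fn(x0 + d))
```

Training the critic against this worst-case perturbation, rather than the nominal state, is what makes the learned policy robust to demand patterns not seen during training.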
Quantify Your Potential ROI
Use our calculator to estimate the efficiency gains and cost savings your enterprise could realize by implementing advanced AI solutions like PD-RSAC.
Your AI Implementation Roadmap
A typical phased approach to integrate advanced AI solutions into your enterprise operations.
Phase 01: Discovery & Strategy
Comprehensive assessment of your current infrastructure, operational workflows, and strategic objectives to define clear AI integration pathways.
Phase 02: Pilot & Proof-of-Concept
Develop and deploy a small-scale pilot project, demonstrating the feasibility and initial ROI of the AI solution in a controlled environment.
Phase 03: Iterative Development & Integration
Expand the solution based on pilot feedback, integrating it into broader systems with continuous refinement and performance tuning.
Phase 04: Full-Scale Deployment & Optimization
Roll out the AI solution across your enterprise, establishing monitoring frameworks and ongoing optimization strategies for sustained impact.
Ready to Transform Your Operations?
Explore how PD-RSAC or similar cutting-edge AI solutions can drive profitability, safety, and efficiency in your enterprise.