Enterprise AI Analysis: COOL-MC: VERIFYING AND EXPLAINING RL POLICIES FOR MULTI-BRIDGE NETWORK MAINTENANCE


Reinforcement Learning Verification

This paper introduces COOL-MC, a tool for formally verifying and explaining Reinforcement Learning policies for multi-bridge network maintenance, combining probabilistic model checking with explainability methods.

Executive Impact: Key Metrics

COOL-MC provides a novel approach to verify and explain Reinforcement Learning (RL) policies for multi-bridge network maintenance. By constructing a discrete-time Markov chain (DTMC) from the RL policy and the underlying MDP, it enables probabilistic model checking and explainability analyses. This allows for formal safety guarantees, identification of policy biases, and understanding of decision-making, addressing critical challenges in adopting RL for infrastructure management.


Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

Feature Lumping on Bridge 1 Condition

Feature lumping coarsened bridge 1's condition ratings (0-3 → 2, 4-6 → 5, 7-9 → 7). The resulting failure probability changed negligibly, from 0.03547 to 0.03542. This confirms that a coarse categorical assessment is sufficient to maintain safety performance, reducing the need for fine-grained inspection precision.
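The lumping map above can be sketched as a small abstraction function; `lump_condition` is a hypothetical helper for illustration, not part of the COOL-MC API.

```python
def lump_condition(rating: int) -> int:
    """Coarsen a 0-9 bridge condition rating into three representative
    values, as in the lumping experiment: 0-3 -> 2, 4-6 -> 5, 7-9 -> 7."""
    if 0 <= rating <= 3:
        return 2
    if 4 <= rating <= 6:
        return 5
    if 7 <= rating <= 9:
        return 7
    raise ValueError(f"rating out of range: {rating}")

# Every raw rating collapses to one of three buckets the policy sees instead.
print([lump_condition(r) for r in range(10)])
# -> [2, 2, 2, 2, 5, 5, 5, 7, 7, 7]
```

Because the lumped policy's failure probability is nearly unchanged, the verification result is robust to this loss of input resolution.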

Budget Sensitivity to Bmax

Varying the budget cap Bmax shows that the budget-exhaustion probability remains extremely low (< 2.1 × 10⁻⁶) and decreases monotonically as Bmax increases. This indicates a conservative spending strategy: the policy avoids budget exhaustion even under tighter constraints.

Bmax   P_?(◊ 'budget_empty')   States   Transitions
9      2.02 × 10⁻⁶             26,377   212,593
10     1.17 × 10⁻⁶             27,033   216,187
11     4.69 × 10⁻⁷             27,987   219,568
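The monotone trend can be checked mechanically from the figures above (values transcribed from the table; this is a worked check, not a model-checking run):

```python
# Budget-exhaustion probabilities reported above, keyed by Bmax.
p_empty = {9: 2.02e-6, 10: 1.17e-6, 11: 4.69e-7}

caps = sorted(p_empty)
# Probability decreases monotonically as the budget cap grows,
# and every value stays below the 2.1e-6 bound quoted above.
assert all(p_empty[a] > p_empty[b] for a, b in zip(caps, caps[1:]))
assert max(p_empty.values()) < 2.1e-6
print("monotone decrease confirmed")
```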

Action Replacement: Minor to Major Maintenance

Globally replacing all Minor Maintenance actions (cost 1) with Major Maintenance (cost 2) raises the budget-exhaustion probability from 1.17 × 10⁻⁶ to 2.20 × 10⁻⁵. This quantifies how strongly the policy's conservative spending depends on cheap interventions.

Configuration                          P_?(◊ 'budget_empty')
Baseline (no replacement, Bmax = 10)   1.17 × 10⁻⁶
MN → MJ (37 actions)                   2.20 × 10⁻⁵
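Action replacement can be pictured as a wrapper around the policy, after which the induced DTMC is rebuilt and re-checked. The function and action names below are illustrative assumptions, not the tool's interface.

```python
MINOR, MAJOR = "MN", "MJ"  # minor (cost 1) vs. major (cost 2) maintenance

def replace_actions(policy, src=MINOR, dst=MAJOR):
    """Wrap a state -> action policy so every `src` action becomes `dst`.
    This mirrors the MN -> MJ counterfactual: model checking is then
    repeated on the DTMC induced by the modified policy."""
    def modified(state):
        action = policy(state)
        return dst if action == src else action
    return modified

base = lambda state: MINOR if state["cond"] < 5 else "DN"  # toy policy
mod = replace_actions(base)
print(mod({"cond": 3}), mod({"cond": 8}))  # MJ DN
```

The ~19× jump in exhaustion probability then falls directly out of re-running the same PCTL query on the modified model.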
3.5% Safety Violation Probability (P<0.05(◊ 'failed'))

Model checking revealed that the trained policy has a safety-violation probability of 3.5% over the 20-year planning horizon, slightly above the theoretical minimum of 0%.

Model Size Comparison (States/Transitions)

Model Type           States    Transitions
Induced DTMC (min)   5,174     31,494
Induced DTMC (max)   27,856    227,468
Full MDP             156,579   50,915,203

The induced Discrete-Time Markov Chain (DTMC) significantly reduces the state and transition space compared to the full MDP, making formal verification tractable and scalable for bridge networks.
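The scale of the reduction follows directly from the counts above (a worked calculation on the table's numbers):

```python
# State/transition counts from the model size comparison above.
full_mdp = {"states": 156_579, "transitions": 50_915_203}
dtmc_max = {"states": 27_856, "transitions": 227_468}  # worst case

state_reduction = 1 - dtmc_max["states"] / full_mdp["states"]
trans_reduction = 1 - dtmc_max["transitions"] / full_mdp["transitions"]
# Even the largest induced DTMC eliminates roughly 82% of the states
# and over 99% of the transitions of the full MDP.
print(f"states:      {state_reduction:.1%} smaller")
print(f"transitions: {trans_reduction:.1%} smaller")
```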

Enterprise Process Flow

1. Encode MDP in PRISM
2. Train Deep RL Policy (PPO)
3. Verify Policy with PCTL Queries
4. Explain Policy Decisions (COOL-MC)

Our methodology outlines a four-stage process for developing and analyzing RL maintenance policies, integrating formal verification and explainability techniques like feature lumping and saliency ranking.
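The verification stage poses queries like those reported throughout this page. A minimal sketch in PRISM's property syntax (the labels come from this case study; the exact property set is an assumption):

```python
# PCTL queries in PRISM property notation, as used at the verification stage.
queries = [
    'P=? [ F "failed" ]',        # probability any bridge ever fails
    'P=? [ F "budget_empty" ]',  # probability the budget is exhausted
    'P<0.05 [ F "failed" ]',     # bounded safety requirement
]
for q in queries:
    print(q)
```

Each query is evaluated on the DTMC induced by the trained policy, yielding exact probabilities rather than empirical estimates.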

Global Feature Importance Ranking

Rank   Feature      Mean |∇|
1      year         2.530
2      cycle_year   2.410
3      cond_b1      2.089
4      budget       1.457
5      cond_b2      1.278
6      cond_b3      0.882
7      init_done    0.636

Temporal features (year, cycle_year) and bridge 1 condition (cond_b1) are the most influential, indicating policy decisions are strongly shaped by planning horizon and bridge 1's state, rather than symmetric treatment of all bridges.
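A gradient-saliency score of this kind can be sketched with finite differences; a real implementation would use autodiff on the policy network, and the toy policy below is an assumption chosen to echo the reported ranking.

```python
import numpy as np

FEATURES = ["year", "cycle_year", "cond_b1", "budget",
            "cond_b2", "cond_b3", "init_done"]

def saliency(policy_logit, states, eps=1e-3):
    """Mean absolute finite-difference gradient of a scalar policy output
    with respect to each input feature, averaged over sampled states."""
    grads = np.zeros(len(FEATURES))
    for s in states:
        for i in range(len(FEATURES)):
            up, down = s.copy(), s.copy()
            up[i] += eps
            down[i] -= eps
            grads[i] += abs(policy_logit(up) - policy_logit(down)) / (2 * eps)
    return grads / len(states)

# Toy scalar "policy output" that leans on year and cond_b1, loosely
# mimicking the dominance those features show in the real ranking.
toy = lambda x: 2.5 * x[0] + 2.0 * x[2] + 0.5 * x[3]
rng = np.random.default_rng(0)
scores = saliency(toy, list(rng.normal(size=(32, len(FEATURES)))))
print(dict(zip(FEATURES, scores.round(2))))
```

For a linear toy policy the scores recover the coefficients exactly; for the actual network they vary by state, which is why the table reports means.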

0.07535 Failure Probability with Horizon Remap (P_?(◊ 'failed'))

When the policy is made to believe the episode is always ending soon (horizon remap), the failure probability rises from a baseline of 0.0355 to 0.07535, revealing reward hacking: maintenance spending is cut back near the planning horizon at the expense of structural safety.
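The horizon-remap probe can be pictured as a wrapper that pins the temporal feature before the policy sees the state. The feature index, horizon length, and action names below are assumptions for illustration.

```python
def horizon_remap(policy, year_index=0, late_year=19):
    """Counterfactual probe: feed the policy a state whose `year` feature
    is pinned to the end of the 20-year horizon, leaving all else intact."""
    def remapped(state):
        probe = list(state)
        probe[year_index] = late_year
        return policy(tuple(probe))
    return remapped

# Toy policy that skips maintenance once it believes the horizon is near.
base = lambda s: "do_nothing" if s[0] >= 18 else "minor_maintenance"
probe = horizon_remap(base)
print(base((5, 3)), probe((5, 3)))  # minor_maintenance do_nothing
```

Re-checking P_?(◊ 'failed') on the DTMC induced by the remapped policy is what exposes the end-of-horizon behavior as a safety cost.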

Conditional Saliency Ranking (Worst Bridge Focus)

Worst Bridge   Rank 1               Rank 2           Rank 3
Bridge 1       cond_b1 (2.741)      year (1.713)     budget (1.492)
Bridge 2       cond_b1 (1.433)      budget (1.161)   cond_b2 (1.112)
Bridge 3       cycle_year (4.026)   year (3.059)     cond_b1 (1.624)

The policy exhibits an attention bias toward bridge 1. Even when bridge 2 or bridge 3 is in the worst condition, bridge 1's condition (cond_b1) often remains a top influencing feature, indicating a potential coverage gap in the learned policy for the other bridges.

Advanced ROI Calculator

Estimate your potential cost savings and efficiency gains by implementing verified AI policies in your enterprise.


Implementation Roadmap

A structured approach to integrating verified RL policies into your operational framework.

Phase 1: MDP Modeling & Baseline RL

Formally encode your multi-bridge network as an MDP in PRISM, capturing states, actions, transitions, and rewards. Train an initial deep RL policy (PPO) to establish a performance baseline.

Phase 2: Induced DTMC Construction & Verification

COOL-MC automatically constructs the induced Discrete-Time Markov Chain (DTMC) from your trained RL policy. Apply PCTL queries to formally verify safety and performance properties, such as bridge failure probability and budget utilization.

Phase 3: Explainability Analysis & Policy Profiling

Utilize COOL-MC's explainability methods (feature lumping, saliency ranking, action labeling, counterfactual analysis) to understand policy decision-making, identify biases, and characterize behaviors like horizon-gaming.

Phase 4: Policy Refinement & Re-verification

Based on the insights from verification and explainability, iteratively refine your MDP model or RL policy architecture. Re-verify the updated policy with COOL-MC to confirm that anomalies are resolved and safety properties are maintained.

Ready to Transform Your Operations with Verified AI?

Book a personalized consultation to discuss how COOL-MC can integrate into your existing infrastructure management systems and enhance safety with explainable, verifiable RL policies.
