Enterprise AI Analysis: Search Planning
Probe-then-Plan: Environment-Aware Planning for Industrial E-commerce Search
Modern e-commerce search is evolving from keyword matching toward serving complex user intents. Existing LLM-based search paradigms face a blindness-latency dilemma: query rewriting is often environment-agnostic and therefore produces invalid plans, while deep search agents are too slow. We propose Environment-Aware Search Planning (EASP), a novel paradigm that reformulates search planning as a grounded, single-step reasoning process. EASP introduces a Probe-then-Plan mechanism, in which a lightweight Retrieval Probe exposes the retrieval environment so that the Planner can diagnose execution gaps and generate grounded search plans. Our three-stage methodology comprises Offline Data Synthesis, Planner Training and Alignment, and Adaptive Online Serving. Extensive evaluations and A/B testing on JD.com demonstrate that EASP significantly improves relevant recall, UCVR, and GMV at low latency, making it suitable for industrial deployment.
Key Business Impacts
Our analysis of Probe-then-Plan: Environment-Aware Planning for Industrial E-commerce Search reveals significant gains in critical enterprise metrics, ensuring both performance and efficiency in AI-powered search.
Deep Analysis & Enterprise Applications
Enterprise Process Flow
The Probe-then-Plan mechanism is central to EASP. Unlike environment-agnostic rewriting, EASP first runs a lightweight Retrieval Probe that returns a real-time snapshot of the retrieval environment (O_init). With this snapshot, the Planner makes an informed, single-step decision: it generates a grounded search plan that directly addresses the identified execution gaps, such as entity drift or attribute misalignment, without slow iterative interaction. This resolves the 'blindness-latency' dilemma of other LLM-based search paradigms, delivering both accuracy and speed.
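The control flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `probe_then_plan`, `SearchPlan`, `toy_probe`, `toy_planner`, and the tiny `CATALOG` are all hypothetical names, and the toy planner's "drop the last term" rule merely stands in for a trained Planner model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchPlan:
    query: str  # the query that will actually be executed
    note: str   # short explanation of the decision


Probe = Callable[[str], List[str]]
Planner = Callable[[str, List[str]], SearchPlan]


def probe_then_plan(query: str, probe: Probe, planner: Planner, top_k: int = 5) -> SearchPlan:
    """One cheap probe call, then one planner call; no iterative loop."""
    o_init = probe(query)[:top_k]  # snapshot of the retrieval environment
    return planner(query, o_init)


# Hypothetical stand-ins for a real retrieval system and a trained Planner.
CATALOG = ["running shoes men", "trail running shoes", "leather office shoes"]


def toy_probe(query: str) -> List[str]:
    terms = set(query.lower().split())
    return [title for title in CATALOG if terms <= set(title.split())]


def toy_planner(query: str, o_init: List[str]) -> SearchPlan:
    if o_init:
        return SearchPlan(query, "snapshot looks usable; keep the query")
    # Assumption: relax the query by dropping its last term.
    relaxed = " ".join(query.split()[:-1]) or query
    return SearchPlan(relaxed, "empty snapshot; relaxed the query")


plan = probe_then_plan("running shoes waterproof", toy_probe, toy_planner)
print(plan.query, "|", plan.note)
```

The point of the sketch is the shape of the call graph: the planner sees the snapshot before it commits to a plan, yet the whole decision costs one probe and one planner call.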
The first stage generates a diverse dataset of execution-validated plans. A Teacher Agent (a state-of-the-art LLM) analyzes user queries (q) together with retrieval snapshots (O_init) from the Retrieval Probe. It performs perceptual diagnosis to classify the retrieval state (Effective, Recall Failure, or Precision Failure) and then selects an adaptive planning strategy (Preservation, Sanitization, or Concretization) to formulate a search plan. Stochastic decoding yields a diverse set of candidate reformulations, which are then validated against the retrieval environment.
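The diagnose-then-select step might look like the sketch below. Two things here are assumptions rather than facts from the source: the pairing of each retrieval state with a strategy (the text lists both sets but does not spell out the mapping), and the threshold rule in `diagnose`, which is a simple stand-in for the Teacher LLM's judgment. All names and threshold values are hypothetical.

```python
from enum import Enum
from typing import List


class State(Enum):
    EFFECTIVE = "effective"
    RECALL_FAILURE = "recall_failure"
    PRECISION_FAILURE = "precision_failure"


class Strategy(Enum):
    PRESERVATION = "preservation"
    SANITIZATION = "sanitization"
    CONCRETIZATION = "concretization"


# Assumed pairing: keep what works, strip noise when too little comes back,
# add specifics when too much irrelevant material comes back.
STRATEGY_FOR = {
    State.EFFECTIVE: Strategy.PRESERVATION,
    State.RECALL_FAILURE: Strategy.SANITIZATION,
    State.PRECISION_FAILURE: Strategy.CONCRETIZATION,
}


def diagnose(relevance_flags: List[bool], min_results: int = 3,
             min_precision: float = 0.5) -> State:
    """Classify a snapshot from per-item relevance flags (hypothetical rule)."""
    if len(relevance_flags) < min_results:
        return State.RECALL_FAILURE
    precision = sum(relevance_flags) / len(relevance_flags)
    if precision < min_precision:
        return State.PRECISION_FAILURE
    return State.EFFECTIVE


snapshot = [True, False, False, False]  # four items returned, one relevant
state = diagnose(snapshot)
print(state.name, "->", STRATEGY_FOR[state].name)
```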
The Planner, a lightweight LLM, undergoes two phases: Supervised Fine-Tuning (SFT) on the offline dataset to internalize the Teacher's diagnostic and planning patterns, and Reinforcement Learning (RL) using Group Relative Policy Optimization (GRPO). RL aligns the Planner with critical business outcomes, specifically conversion rate (UCVR). The reward function incorporates a 'Hard Relevance Gate', ensuring that high-priced but irrelevant items are penalized, thus grounding conversion optimization in semantic accuracy. This stage ensures the Planner not only generates effective plans but also drives business value.
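To make the reward shaping concrete, here is a small sketch of a gated reward followed by the group-relative advantage normalization that GRPO is known for (each sampled plan's reward is standardized against the other plans sampled for the same query). The gate's penalty value, the conversion scores, and the function names are illustrative assumptions; the source does not give the exact reward formula.

```python
from statistics import mean, pstdev
from typing import List


def gated_reward(is_relevant: bool, conversion_signal: float,
                 penalty: float = -1.0) -> float:
    """Hard relevance gate: an irrelevant plan never earns conversion reward.

    The penalty value is an assumption for illustration.
    """
    return conversion_signal if is_relevant else penalty


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of plans sampled for the same query."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled plans for one query: (passes relevance gate, conversion signal).
group = [(True, 0.8), (False, 0.9), (True, 0.2)]
rewards = [gated_reward(rel, conv) for rel, conv in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Note how the second plan has the highest raw conversion signal but receives the lowest advantage, because the gate replaces its reward before normalization.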
In the online serving stage, a Complexity-Aware Router (an even smaller language model) selectively activates the EASP pipeline. For simple queries (e.g., 'iPhone 17'), a Fast Path bypasses the Planner entirely, adding near-zero latency for the majority (~80%) of traffic. Only complex queries trigger the full Complex Path with the Retrieval Probe and Planner, which balances latency against effectiveness. This routing is crucial for industrial deployment, where sub-second response times are paramount.
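The two serving paths can be illustrated as follows. In the described system the router is a small language model; the keyword-and-length heuristic in `is_complex` below is only a hypothetical stand-in so the example runs, and `serve`, `CUE_WORDS`, and the callback signatures are likewise assumptions.

```python
from typing import Callable, List

# Hypothetical cue words hinting at intent that plain keyword matching may miss.
CUE_WORDS = {"for", "under", "without", "like"}


def is_complex(query: str, max_simple_tokens: int = 3) -> bool:
    """Stand-in for the learned router: long or cue-bearing queries are complex."""
    tokens = query.lower().split()
    return len(tokens) > max_simple_tokens or any(t in CUE_WORDS for t in tokens)


def serve(query: str,
          retrieve: Callable[[str], List[str]],
          plan: Callable[[str, List[str]], str]) -> List[str]:
    if not is_complex(query):
        return retrieve(query)              # Fast Path: the planner is never called
    o_init = retrieve(query)                # Complex Path: probe first...
    return retrieve(plan(query, o_init))    # ...then execute the planned query


print(is_complex("iPhone 17"))
print(is_complex("gift for dad who likes fishing"))
```

Keeping the routing decision cheap matters because it runs on every request, while the probe and planner run only on the minority that needs them.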
| Method | REL@30 | HR@30 (%) | Latency |
|---|---|---|---|
| Blind Rewriter | 20.7 | 28.6 | low (ms level) |
| EASP w/o RL | 23.0 | 29.5 | low (ms level) |
| ReAct Agent | 24.1 | 30.2 | high (s level) |
| EASP (ours) | 23.3 | 31.0 | low (ms level) |

Note: On the figures above, EASP achieves the highest HR@30 and trails only the ReAct Agent on REL@30 (23.3 vs. 24.1), while keeping millisecond-level latency where the ReAct Agent needs seconds.
Online A/B testing on JD.com's live traffic showed statistically significant lifts: +0.89% UCVR (p<0.05) and +0.57% GMV (p<0.05) on overall traffic. For requests that triggered the new model (complex path), UCVR increased by 4.10% and GMV by 2.59%. End-to-end planner latency was under 20ms at p75 and under 700ms at p99, well within industrial constraints. These results confirm EASP's industrial viability and its successful deployment in JD.com's AI-Search system.
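As a rough consistency check on these figures, one can ask what share of the baseline metric the complex-path traffic would need to account for if the entire overall lift came from triggered requests. This is back-of-envelope arithmetic under that stated assumption (fast-path traffic unchanged), not an analysis from the source; the function name is hypothetical.

```python
def implied_share(overall_lift_pct: float, triggered_lift_pct: float) -> float:
    """Share of the baseline metric attributable to triggered traffic.

    Assumes overall_lift = share * triggered_lift, i.e. untriggered
    traffic contributes no lift.
    """
    return overall_lift_pct / triggered_lift_pct


ucvr_share = implied_share(0.89, 4.10)
gmv_share = implied_share(0.57, 2.59)
print(f"UCVR: {ucvr_share:.1%}, GMV: {gmv_share:.1%}")
```

Both ratios come out at about 22%, which is in line with the statement that roughly 80% of traffic takes the Fast Path.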
Our Proven Implementation Roadmap
A structured approach to ensure seamless integration and maximum impact.
Phase 1: Foundation & Data Integration
Integrate EASP with existing search infrastructure, establish data pipelines for Retrieval Probe, and synthesize initial offline training data.
Phase 2: Model Training & Business Alignment
Initiate supervised fine-tuning for the Planner, followed by reinforcement learning (GRPO) to align with conversion objectives and business KPIs.
Phase 3: Adaptive Deployment & Monitoring
Deploy the Complexity-Aware Router and EASP pipeline with A/B testing, continuous monitoring, and iterative refinement based on online performance.
Phase 4: Personalization & Advanced Features
Extend EASP to incorporate user behavioral signals for personalized planning, adapting strategies based on individual preferences and interaction history.