Enterprise AI Analysis

Multi-Faceted Self-Consistent Preference Alignment for Query Rewriting in Conversational Search

Authored by Zhiyu Cao, Peifeng Li, Qiaoming Zhu* from the School of Computer Science and Technology, Soochow University, Suzhou, China.

Abstract

Conversational Query Rewriting (CQR) aims to rewrite ambiguous queries to achieve more efficient conversational search. Early studies have predominantly focused on rewriting in isolation, ignoring feedback from query rewriting, passage retrieval, and response generation during the rewriting process. To address this issue, we propose Multi-Faceted Self-Consistent Preference Aligned CQR (MSPA-CQR). Specifically, we first construct self-consistent preference alignment data along three dimensions (rewriting, retrieval, and response) to generate more diverse rewritten queries. Then we propose prefix-guided multi-faceted direct preference optimization to learn preference information from the three dimensions. The experimental results show that our MSPA-CQR is effective in both in- and out-of-distribution scenarios.

Executive Impact: Key Performance Indicators

Our analysis highlights significant advancements in conversational search driven by multi-faceted preference alignment.

+2.9% MAX MRR GAIN (DENSE)
95.2% RECALL@100 (SPARSE)
3 PREFERENCE DIMENSIONS

Deep Analysis & Enterprise Applications

The sections below walk through the specific findings from the research, framed for enterprise applications.

Introduction
Methodology
Experimental Results
Analysis
Conclusion

The Challenge of Conversational Search

Conversational Question Answering (CQA) often struggles with ambiguous user queries, leading to irrelevant passage retrieval. Conversational Query Rewriting (CQR) is a crucial intermediate step to decontextualize queries and improve retrieval. However, traditional CQR methods often lack feedback from subsequent stages like retrieval and response generation, limiting their effectiveness beyond human readability.

Limitations of Existing CQR Approaches

Early CQR studies primarily trained on human-labeled rewritten queries, which are costly to produce and don't necessarily benefit retrieval directly. While some recent works incorporate retrieval supervision, they rely heavily on annotated gold passages, which limits generalization to unlabeled data and makes data preparation time-consuming. This paper addresses these limitations by integrating feedback from rewriting, retrieval, and response without relying on manual annotations.

MSPA-CQR Enterprise Process Flow

LLM Samples Multiple Rewritten Queries
Retrieve Passages & Generate Responses
Self-Consistency Scoring (Rewriting, Retrieval, Response)
Select Chosen & Rejected Samples
Prefix-Guided MDPO Training (Preference Alignment)
Generate Multi-Faceted Rewritten Queries for Retrieval
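The staged flow above can be sketched as a single data-construction loop. This is a minimal, hypothetical outline: the component functions (`sample_rewrites`, `retrieve`, `generate_response`, `score`) are placeholders standing in for the paper's LLM sampling, retriever, response generator, and self-consistency scorers, not the actual implementation.

```python
# Hypothetical sketch of the MSPA-CQR preference-data construction flow.
# All component callables are assumptions supplied by the caller.

def build_preference_pairs(dialogue, query, sample_rewrites, retrieve,
                           generate_response, score):
    """Sample rewrites, score them per dimension, pick chosen/rejected pairs."""
    rewrites = sample_rewrites(dialogue, query)                    # step 1
    passages = {r: retrieve(r) for r in rewrites}                  # step 2
    responses = {r: generate_response(r, passages[r]) for r in rewrites}
    pairs = {}
    for dim in ("rewrite", "retrieval", "response"):               # step 3
        ranked = sorted(rewrites,
                        key=lambda r: score(dim, r, passages[r], responses[r]))
        # step 4: best-scoring rewrite is "chosen", worst is "rejected"
        pairs[dim] = {"chosen": ranked[-1], "rejected": ranked[0]}
    return pairs
```

The resulting per-dimension pairs feed directly into the preference-optimization stage described next.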

Multi-Faceted Preference Data Construction

The first stage generates diverse rewritten queries with LLMs and scores them along three dimensions: rewriting, retrieval, and response. A self-consistency scoring method is designed for each preference: for rewriting and response, NLI models measure semantic similarity; for retrieval, the intersection of retrieved passage sets indicates consistency. A key aspect of the rewriting score is a penalty term for overly concise queries. The chosen and rejected samples are then used for preference optimization.
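These scoring ideas can be illustrated with a toy sketch. Here retrieval consistency is approximated as the average Jaccard overlap between one rewrite's retrieved passage set and the others', and the rewriting score subtracts a length penalty for overly concise queries. The paper's exact formulas may differ; `min_len` and `alpha` are made-up parameters.

```python
def retrieval_consistency(passage_sets, i):
    """Average Jaccard overlap between rewrite i's retrieved passages and
    every other rewrite's: agreement with the majority scores high."""
    mine = passage_sets[i]
    others = [s for j, s in enumerate(passage_sets) if j != i]
    return sum(len(mine & s) / max(len(mine | s), 1) for s in others) / len(others)

def rewrite_score(similarity, query_len, min_len=8, alpha=0.1):
    """Semantic-similarity score with a penalty for overly concise rewrites
    (min_len and alpha are illustrative, not the paper's values)."""
    penalty = alpha * max(min_len - query_len, 0)
    return similarity - penalty
```

A rewrite that retrieves passages nobody else retrieves, or that is suspiciously short, ends up low-ranked and becomes a candidate "rejected" sample.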

Prefix-Guided Multi-Faceted Direct Preference Optimization (MDPO)

Building on the DPO framework, MSPA-CQR uses MDPO to align the model with different preferences. Distinct prefix tags like '[REWRITE]', '[RETRIEVAL]', and '[RESPONSE]' are prepended to instructions during training. This mechanism allows the model to differentiate and learn from the various preference types, generating rewritten queries tailored to specific objectives during inference. This adaptive approach enhances the model's ability to capture nuanced information for better conversational search.
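A hedged sketch of how the prefix tags and the DPO objective fit together: only the three prefix strings come from the paper, while the instruction wording, `beta`, and the example log-probabilities in the usage test are illustrative.

```python
import math

PREFIXES = {"rewrite": "[REWRITE]",
            "retrieval": "[RETRIEVAL]",
            "response": "[RESPONSE]"}

def make_dpo_example(dimension, dialogue, query, chosen, rejected):
    """Prepend the dimension's prefix tag to the instruction so the model
    can tell the three preference types apart during MDPO training."""
    prompt = (f"{PREFIXES[dimension]} Rewrite the final query so it is "
              f"self-contained.\nContext: {dialogue}\nQuery: {query}\nRewrite:")
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]).
    Lower when the policy prefers the chosen rewrite more than the reference does."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At inference, supplying a particular prefix steers the model toward that dimension's style of rewrite; mixing all three yields the multi-faceted queries used for retrieval.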

Performance Comparison on TopiOCQA (Dense Retrieval)

MSPA-CQR demonstrates superior performance across key metrics compared to state-of-the-art baselines in dense retrieval on the TopiOCQA dataset, highlighting the effectiveness of multi-faceted preference alignment.

System MRR NDCG@3 R@100
MSPA-CQR 41.4 39.5 77.4
AdaCQR 38.5 37.6 75.0
RETPO 30.0 28.9 68.7
CHIQ 38.0 37.0 61.6
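For reference, the metrics in the table follow their standard definitions and can be computed as below; this is a generic sketch, not the paper's evaluation code, and assumes 1-based gold-passage ranks with `None` meaning the gold passage was not retrieved.

```python
def mrr(ranks):
    """Mean reciprocal rank over queries; rank is 1-based,
    None means the gold passage was not retrieved at all."""
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k=100):
    """Fraction of queries whose gold passage appears within the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)
```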
0.36 Kendall-Tau Correlation (Rewrite vs. Retrieval)

Low Kendall-Tau correlation between rewriting and retrieval preferences (TopiOCQA) highlights significant differences in query preferences, underscoring the need for a multi-faceted approach.
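Kendall's tau between two preference rankings can be computed directly. The sketch below implements tau-a (no tie correction), which may differ slightly from the variant used in the paper; values near 1 mean the two preferences rank rewrites almost identically, while values near 0, as reported here, mean they largely disagree.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total pairs, no tie handling."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```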

Zero-Shot Generalization

The MSPA-CQR model exhibits strong zero-shot generalization capabilities, outperforming all strong baselines on TREC CAsT 19-21 datasets. This indicates that the multi-faceted preference alignment successfully mitigates overfitting and allows the model to perform well in out-of-distribution scenarios, which is crucial for real-world applications.

Complementary Nature of Preferences

Ablation studies reveal that utilizing individual or pairwise preferences leads to a decline in performance, demonstrating the complementary effects of combining rewriting, retrieval, and response preferences. The low intersection size of passages retrieved under different preferences (consistently below 70%) further supports that each preference contributes unique and valuable information.
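The passage-overlap statistic can be measured as the share of top-k results two preference-specific retrievals have in common; a toy sketch (the paper's exact overlap definition is not reproduced here):

```python
def overlap_ratio(ranked_a, ranked_b, k=100):
    """Fraction of the top-k passage ids retrieved under both preferences.
    Values well below 1.0 indicate each preference surfaces distinct passages."""
    return len(set(ranked_a[:k]) & set(ranked_b[:k])) / k
```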

Case Study: Ned Kelly Article (TopiOCQA)

Dialogue Context

Q: What did Ned Kelly do?

A: An Australian bushranger, outlaw, gang leader and convicted police murderer.

Q: What happened after he was locked up for the night?

A: While he was escorted by four policemen, he absconded and ran, taking refuge in a shoemaker's shop.

Q: Did it say what happened in the shoemakers shop?

A:

Ground Truth: Did the Ned Kelly article say what happened to Kelly in the shoemakers shop? (Rank: 79)

Rewritten Queries

Rewrite Preferred Query: Did it say what happened in the shoemaker's shop when Ned Kelly was there? (Rank: Not Found)

Retrieval Preferred Query: Did it say what happened in the shoemakers shop? (Rank: Not Found)

Response Preferred Query: Did the Ned Kelly article say what happened in the shoemakers shop after Ned Kelly took refuge there? (Rank: Not Found)

Rewrite+Retrieval+Response: Rank: 19

Key Insight: This case highlights how individual preferences can lead to missing crucial information or introducing misleading context. The combined MSPA-CQR query integrates key information from all three dimensions ('shoemakers shop', 'Kelly', 'took refuge'), successfully retrieving the gold passage (ranked 19th), while individual preferences failed to find it.

Impact of Prefix Guidance

Removing preference prefixes during MDPO training results in a significant performance decrease across all metrics. This underscores the importance of distinguishing between the three preferences, as prefixes enable the model to capture the unique characteristics and requirements of each dimension, leading to more effective query rewriting.

Summary of MSPA-CQR Contributions

The paper introduces Multi-Faceted Self-Consistent Preference Aligned CQR (MSPA-CQR), a novel framework that enhances conversational search by operating along three dimensions: rewriting, retrieval, and response. Its two main stages, multi-faceted preference data construction and prefix-guided MDPO, enable the model to generate more comprehensive and effective rewritten queries. Experimental results consistently show MSPA-CQR's superior performance in both in- and out-of-distribution scenarios.

Future Directions

While MSPA-CQR significantly advances CQR, there are avenues for further optimization. Future work could explore more advanced heuristic scoring methods for preference data consistency, investigate mutual promotion strategies between the three preferences rather than simple mixing, and experiment with other effective alignment methods beyond DPO to further refine the optimization process.


Implementation Roadmap

A typical phased approach to integrate MSPA-CQR within your existing enterprise search infrastructure.

Phase 1: Discovery & Integration Assessment (2-4 Weeks)

Detailed analysis of current CQR and retrieval systems. Evaluate data readiness and integration points for MSPA-CQR. Define custom preference alignment metrics and data sources relevant to your business objectives.

Phase 2: Model Customization & Data Generation (4-8 Weeks)

Fine-tune MSPA-CQR base models (LLaMA2-7B, Qwen2.5-32B) with your domain-specific data. Implement automated self-consistency scoring for your unique rewriting, retrieval, and response preferences. Generate initial preference-aligned training data.

Phase 3: Preference Optimization & Evaluation (6-10 Weeks)

Apply prefix-guided MDPO to align the CQR model with your custom multi-faceted preferences. Conduct rigorous A/B testing and offline evaluations against existing baselines using your enterprise datasets. Iterate on preference scoring and model parameters for optimal performance.

Phase 4: Deployment & Monitoring (Ongoing)

Integrate the optimized MSPA-CQR model into your production conversational search environment. Establish continuous monitoring for retrieval performance and user satisfaction. Implement feedback loops to further refine preference alignment and model adaptation over time.

Ready to Transform Your Conversational Search?

Schedule a no-obligation strategy session to explore how MSPA-CQR can deliver precise, self-contained, and contextually rich query rewriting for your enterprise.
