Enterprise AI Analysis: Probing Persona-Dependent Preferences in Language Models

This paper investigates how large language models (LLMs) represent preferences internally, especially when adopting different personas. It identifies a 'preference vector' that acts as an evaluative representation: it controls pairwise task choices and tracks preference shifts, even under an 'evil' persona. Crucially, this preference vector is largely shared across personas, suggesting common underlying machinery for preferences.

Deep Analysis & Enterprise Applications

Do Language Models Use Evaluative Representations?

The paper identifies a 'preference vector' in LLMs that acts as an evaluative representation: a direction in the model's activations that reliably predicts which tasks and outputs the model chooses. The vector shifts when preferences change and keeps a consistent meaning across contexts, indicating that it encodes valuations rather than merely descriptive features. Citation: Section 2

To What Extent Do Personas Share Preference Machinery?

The research indicates that the preference vector is largely shared across personas. A probe trained on the Assistant persona can predict and steer the choices of qualitatively different personas, including an 'evil' persona whose preferences anti-correlate with the Assistant's, suggesting a shared underlying preference mechanism. Citation: Section 3

Preference Vector Controls Choice and Generalizes

The preference vector causally controls pairwise choices in Gemma-3-27B. It tracks preference shifts on unseen topics and even on out-of-distribution preference types (such as discriminating true from false statements). It also generalizes to open-ended steering, where it amplifies the active persona's traits. Citation: Sections 2.2, 2.4, 3.2
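
As a rough illustration of what steering along a preference direction means, the sketch below adds a scaled unit vector to one task's activation and reads out the effect through a linear probe. Everything here is an assumption made for illustration: the activations are random, the helper names `steer` and `p_choose_a` are hypothetical, and the logistic choice model is a stand-in rather than the paper's procedure. In a real model the addition would happen inside the residual stream during the forward pass.

```python
import numpy as np

def steer(activation, direction, alpha):
    """Add alpha times the unit-normalized direction to an activation vector."""
    unit = direction / np.linalg.norm(direction)
    return activation + alpha * unit

def p_choose_a(score_a, score_b, temperature=1.0):
    """Toy choice model (an assumption): logistic in the probe-score difference."""
    return 1.0 / (1.0 + np.exp(-(score_a - score_b) / temperature))

rng = np.random.default_rng(0)
d_model = 16                                   # toy width, not a real model size
w = rng.normal(size=d_model)                   # stand-in for a trained probe direction
act_a = rng.normal(scale=0.1, size=d_model)    # synthetic activation for task A
act_b = rng.normal(scale=0.1, size=d_model)    # synthetic activation for task B

baseline = p_choose_a(act_a @ w, act_b @ w)
steered_act_a = steer(act_a, w, alpha=1.0)
steered = p_choose_a(steered_act_a @ w, act_b @ w)

# Steering by alpha along the unit probe direction moves the probe
# readout by exactly alpha * ||w||.
shift = steered_act_a @ w - act_a @ w
print(f"P(choose A): baseline={baseline:.3f}, steered={steered:.3f}")
```

In this toy setup a positive `alpha` can only raise the probe score of the steered task, so the modeled probability of choosing it goes up; a negative `alpha` would lower it.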

Max P(chose steered task) with preference vector steering

Enterprise Process Flow

Pairwise choices
Activations (Residual Stream)
Utilities (Thurstonian μ)
Train Ridge Probe (μ = Xω)
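
The last step of the flow, fitting a ridge probe so that μ ≈ Xω, can be sketched as follows. This is a minimal sketch under stated assumptions: the activations X and utilities μ are synthetic, the regularization strength `lam` is arbitrary, and the intercept term is an addition not shown in the flow above. The function names `fit_ridge_probe` and `probe_scores` are hypothetical.

```python
import numpy as np

def fit_ridge_probe(X, mu, lam=1.0):
    """Closed-form ridge regression on centered data.

    Solves w = (Xc^T Xc + lam * I)^-1 Xc^T (mu - mean(mu)),
    then recovers an intercept b so that predictions are X @ w + b.
    """
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    x_mean = X.mean(axis=0)
    mu_mean = mu.mean()
    Xc = X - x_mean
    d = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ (mu - mu_mean))
    b = mu_mean - x_mean @ w
    return w, b

def probe_scores(X, w, b):
    """Predicted utilities for each row of activations."""
    return np.asarray(X, dtype=float) @ w + b

# Synthetic stand-ins: one activation row per task, one utility per task.
rng = np.random.default_rng(0)
n_tasks, d_model = 200, 16
true_direction = rng.normal(size=d_model)
X = rng.normal(size=(n_tasks, d_model))
mu = X @ true_direction + 0.1 * rng.normal(size=n_tasks)

w, b = fit_ridge_probe(X, mu, lam=1.0)
r = np.corrcoef(probe_scores(X, w, b), mu)[0, 1]
cosine = w @ true_direction / (np.linalg.norm(w) * np.linalg.norm(true_direction))
print(f"in-sample r={r:.3f}, cosine to planted direction={cosine:.3f}")
```

The fitted weight vector `w` plays the role of the preference vector: a single direction in activation space whose projection approximates each task's utility.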

Probe Performance: Preference Vector vs. Text Encoder Baseline

Evaluation setting             Preference Vector (r)   Text Encoder Baseline (r)
In-Distribution (Gemma)        0.867                   0.73
Leave-One-Topic-Out (Gemma)    0.834                   0.56
In-Distribution (Qwen)         0.943                   0.89
Leave-One-Topic-Out (Qwen)     0.872                   0.71

The preference vector probe outperforms the text-encoder baseline in every setting, with the largest gaps on held-out topics (0.834 vs. 0.56 for Gemma; 0.872 vs. 0.71 for Qwen).
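
A leave-one-topic-out evaluation of the kind reported above could be sketched as follows. This is a hedged illustration, not the paper's evaluation code: the data are synthetic, the function names are hypothetical, and pooling all held-out predictions into a single correlation r is an assumption about how the score is aggregated.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights for already-centered inputs."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def leave_one_topic_out_r(X, mu, topics, lam=1.0):
    """Hold out each topic in turn, fit on the rest, and correlate the
    pooled held-out predictions with the true utilities."""
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    topics = np.asarray(topics)
    preds = np.empty_like(mu)
    for topic in np.unique(topics):
        held = topics == topic
        x_mean = X[~held].mean(axis=0)
        y_mean = mu[~held].mean()
        w = ridge_fit(X[~held] - x_mean, mu[~held] - y_mean, lam)
        preds[held] = (X[held] - x_mean) @ w + y_mean
    return np.corrcoef(preds, mu)[0, 1]

# Synthetic data: 5 topics with different activation offsets, 40 tasks each.
rng = np.random.default_rng(0)
n_topics, per_topic, d_model = 5, 40, 16
topics = np.repeat(np.arange(n_topics), per_topic)
topic_offsets = rng.normal(size=(n_topics, d_model))
X = rng.normal(size=(n_topics * per_topic, d_model)) + topic_offsets[topics]
direction = rng.normal(size=d_model)
mu = X @ direction + 0.1 * rng.normal(size=X.shape[0])

loto_r = leave_one_topic_out_r(X, mu, topics)
print(f"leave-one-topic-out r = {loto_r:.3f}")
```

Because no task from the held-out topic is seen during fitting, a high score here indicates that the probe direction transfers across topics rather than memorizing topic-specific features.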

Steering Preferences in an 'Evil' Persona

Under an 'evil' persona system prompt, the preference vector's harm-vs-benign rating flips sign, so that harmful tasks are rated higher than benign ones. Descriptive features alone do not capture this shift, which supports the vector's evaluative nature and its ability to track persona-dependent preference shifts. In open-ended generation, positive steering amplifies the evil persona's traits, while negative steering pulls the output back towards the Assistant voice, demonstrating causal control.

  • Harm-vs-benign rating flips under evil persona.
  • Not explainable by descriptive features.
  • Causal control demonstrated in open-ended generation.
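
The sign flip summarized above can be expressed as a simple probe readout: the mean probe score of harmful tasks minus the mean score of benign tasks, computed separately under each persona. The sketch below is a toy under stated assumptions. The activations are synthetic and are constructed so that the two personas move harmful tasks in opposite directions along the probe; the function name `harm_benign_gap` is hypothetical, and none of the numbers correspond to results from the paper.

```python
import numpy as np

def harm_benign_gap(activations, w, is_harmful):
    """Mean probe score on harmful tasks minus mean score on benign tasks."""
    scores = np.asarray(activations, dtype=float) @ w
    is_harmful = np.asarray(is_harmful, dtype=bool)
    return scores[is_harmful].mean() - scores[~is_harmful].mean()

rng = np.random.default_rng(0)
d_model = 16
w = rng.normal(size=d_model)                 # stand-in for an Assistant-trained probe
unit_w = w / np.linalg.norm(w)

is_harmful = np.arange(100) < 50             # first 50 tasks labeled harmful
base = rng.normal(size=(100, d_model))       # persona-independent task features
harm_mask = np.where(is_harmful, 1.0, 0.0)[:, None]

# Assumed construction: the Assistant persona pushes harmful tasks down
# the probe direction, the evil persona pushes them up.
assistant_acts = base - 2.0 * harm_mask * unit_w
evil_acts = base + 2.0 * harm_mask * unit_w

gap_assistant = harm_benign_gap(assistant_acts, w, is_harmful)
gap_evil = harm_benign_gap(evil_acts, w, is_harmful)
print(f"assistant gap={gap_assistant:.2f}, evil gap={gap_evil:.2f}")
```

The same probe `w` is used for both personas, which mirrors the claim that a single shared direction can register opposite valuations depending on the active persona.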

Strategic Implementation Roadmap

Our structured approach ensures a seamless integration of advanced AI capabilities into your existing enterprise architecture.

Phase 1: Discovery & Assessment

Identify key business processes, data sources, and potential AI integration points. Conduct a thorough preference modeling audit.

Phase 2: Preference Vector Probing & Fine-Tuning

Train linear probes on internal model activations to identify and extract preference vectors tailored to your enterprise's specific values and persona requirements.

Phase 3: Causal Steering & Behavior Modulation

Implement targeted interventions to causally steer LLM preferences, aligning model behavior with desired enterprise outcomes and safety protocols.

Phase 4: Continuous Monitoring & Adaptation

Establish real-time monitoring of preference vector shifts and model behavior, with mechanisms for adaptive fine-tuning and persona management.
