Enterprise AI Analysis

Probing Persona-Dependent Preferences in Language Models

This paper investigates how Large Language Models (LLMs) implement preferences internally, especially when adopting different personas. We find a 'preference vector' that is an evaluative representation, controlling pairwise task choices and tracking preference shifts, even under an 'evil' persona. Crucially, this preference vector is largely shared across different personas, suggesting a common underlying machinery for preferences.

Deep Analysis & Enterprise Applications

Do Language Models Use Evaluative Representations?

We found a 'preference vector' in LLMs that acts as an evaluative representation: it reliably predicts which tasks and outputs the model picks. The vector shifts when preferences change and keeps a consistent meaning across contexts, indicating that it encodes valuations rather than merely descriptive features. Citation: Section 2

To What Extent Do Personas Share Preference Machinery?

Our research indicates that the preference vector is largely shared across different personas. A probe trained on the Assistant persona can predict and steer the choices of qualitatively different personas, including an 'evil' persona whose preferences anti-correlate with the Assistant's, suggesting a largely shared underlying preference mechanism. Citation: Section 3
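To make the cross-persona claim concrete, here is a toy sketch in Python. Everything in it is synthetic and hypothetical: the shared direction `w_true`, the way the 'evil' activations are constructed, and the ridge strength are assumptions chosen for illustration, not data or code from the paper. It only shows what "one shared vector, anti-correlated preferences" means operationally: a probe fit on Assistant data can still read out the other persona's utilities if that persona's activations move along the same direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 16
w_true = rng.normal(size=d)  # hypothetical shared preference direction

# Synthetic Assistant data: activations and the utilities they encode.
X_assistant = rng.normal(size=(n, d))
mu_assistant = X_assistant @ w_true

# Synthetic 'evil' persona: utilities are (noisily) flipped ...
mu_evil = -mu_assistant + 0.1 * rng.normal(size=n)
# ... and activations shift along w_true so the SAME direction encodes them.
shift = (mu_evil - mu_assistant) / (w_true @ w_true)
X_evil = X_assistant + np.outer(shift, w_true)

# Ridge probe trained on Assistant data only (mu = X w, no intercept).
lam = 1e-3
w_probe = np.linalg.solve(
    X_assistant.T @ X_assistant + lam * np.eye(d),
    X_assistant.T @ mu_assistant,
)

# How well does the Assistant-trained probe read out the other persona?
r_transfer = float(np.corrcoef(X_evil @ w_probe, mu_evil)[0, 1])
# How do the two personas' utilities relate to each other?
r_personas = float(np.corrcoef(mu_assistant, mu_evil)[0, 1])
```

In this construction `r_transfer` is strongly positive while `r_personas` is strongly negative. The toy data builds that outcome in by design, so it illustrates the claim rather than providing evidence for it.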

Preference Vector Controls Choice and Generalizes

The preference vector causally controls pairwise choices in Gemma-3-27B, and it tracks preference shifts on unseen topics and even on out-of-distribution preference types (such as discriminating true from false statements). It also generalizes to open-ended steering, where it amplifies the active persona's traits. Citation: Sections 2.2, 2.4, 3.2

Max P(chose steered task) with preference vector steering

Enterprise Process Flow

Pairwise choices → Activations (Residual Stream) → Utilities (Thurstonian μ) → Train Ridge Probe (μ = Xω)
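The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the page names Thurstonian utilities and a ridge probe but does not say how either is fit, so the sketch uses the classical Thurstone Case V closed form for the utilities and the standard closed-form ridge solution for ω. The function names, the clipping constant `eps`, the ridge strength `lam`, and the example `wins` matrix are all hypothetical.

```python
import numpy as np
from statistics import NormalDist

def thurstone_case_v(wins: np.ndarray, eps: float = 0.01) -> np.ndarray:
    """Estimate utilities from pairwise choices (classical Case V scaling).

    wins[i, j] is the number of times task i was chosen over task j.
    Returns one utility per task, centered at zero; the scale is arbitrary.
    """
    totals = wins + wins.T
    # Fraction of comparisons that i won against j; 0.5 where never compared.
    p = np.where(totals > 0, wins / np.maximum(totals, 1), 0.5)
    p = np.clip(p, eps, 1 - eps)  # keep the z-scores finite
    z = np.vectorize(NormalDist().inv_cdf)(p)
    np.fill_diagonal(z, 0.0)
    return z.mean(axis=1)

def fit_ridge_probe(X: np.ndarray, mu: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression for mu = X @ omega (no intercept)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ mu)

# Hypothetical choice counts for three tasks.
wins = np.array([[0, 9, 10],
                 [1, 0, 8],
                 [0, 2, 0]])
mu = thurstone_case_v(wins)
```

In practice `X` would hold one residual-stream activation vector per task and `fit_ridge_probe(X, mu)` would return the probe direction ω; which layer and token position to read from is not specified on this page.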

Probe Performance: Preference Vector vs. Text Encoder Baseline

Metric	Preference Vector (r)	Text Encoder Baseline (r)
In-Distribution (Gemma)	0.867	0.73
Leave-One-Topic-Out (Gemma)	0.834	0.56
In-Distribution (Qwen)	0.943	0.89
Leave-One-Topic-Out (Qwen)	0.872	0.71

The preference-vector probe outperforms the descriptive text-encoder baseline in every setting, and the gap widens on held-out topics (Gemma: 0.834 vs. 0.56; Qwen: 0.872 vs. 0.71).
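A leave-one-topic-out score of the kind reported in the table could be computed roughly as follows. This is a hedged sketch: it assumes r is a Pearson correlation and that held-out predictions are pooled across topics before correlating, neither of which this page states. The helper names and the ridge strength `lam` are hypothetical.

```python
import numpy as np

def fit_ridge(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge regression without an intercept."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def leave_one_topic_out_r(X: np.ndarray, mu: np.ndarray,
                          topics: np.ndarray, lam: float = 1.0) -> float:
    """Hold out each topic in turn, predict its utilities, then correlate.

    X: (n_tasks, d) activations; mu: (n_tasks,) utilities;
    topics: (n_tasks,) topic label per task.
    """
    preds = np.empty(len(mu), dtype=float)
    for topic in np.unique(topics):
        held = topics == topic
        w = fit_ridge(X[~held], mu[~held], lam)  # train on the other topics
        preds[held] = X[held] @ w                # predict the held-out topic
    return float(np.corrcoef(preds, mu)[0, 1])
```

Running the same function on text-encoder embeddings in place of `X` would give the baseline column, assuming the baseline is scored the same way.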

Steering Preferences in an 'Evil' Persona

Under an 'evil' persona system prompt, the preference vector's harm-vs-benign rating flips sign, so harmful tasks are rated higher than benign ones. Descriptive features alone do not capture this shift, which supports the vector's evaluative nature and its ability to track persona-dependent preference shifts. In open-ended generation, positive steering amplifies the evil persona's traits, while negative steering pulls the output back towards the Assistant voice, demonstrating causal control.

Harm-vs-benign rating flips under evil persona.
Not explainable by descriptive features.
Causal control demonstrated in open-ended generation.
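Steering of the kind summarized above is commonly implemented by adding a scaled copy of the direction to the residual stream. The sketch below shows only that arithmetic on a NumPy array; it is an assumption about the general technique, not the paper's code. The function name `steer`, the strength `alpha`, and the choice to steer every token position are hypothetical, and in a real model the addition would happen inside a forward hook at a chosen layer, which this page does not specify.

```python
import numpy as np

def steer(resid: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Add alpha times the unit-normalized direction to every position.

    resid: (seq_len, d_model) residual-stream activations.
    direction: (d_model,) preference vector (for example, probe weights).
    alpha > 0 pushes along the vector; alpha < 0 pushes against it.
    """
    unit = direction / np.linalg.norm(direction)
    return resid + alpha * unit

rng = np.random.default_rng(0)
resid = rng.normal(size=(4, 8))   # toy activations: 4 tokens, width 8
direction = rng.normal(size=8)    # toy preference vector

pushed = steer(resid, direction, alpha=3.0)    # positive steering
pulled = steer(resid, direction, alpha=-3.0)   # negative steering
```

Because the direction is unit-normalized, each position's projection onto it moves by exactly `alpha`, which makes steering strengths comparable across vectors of different norm.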

Strategic Implementation Roadmap

Our structured approach ensures a seamless integration of advanced AI capabilities into your existing enterprise architecture.

Phase 1: Discovery & Assessment

Identify key business processes, data sources, and potential AI integration points. Conduct a thorough preference modeling audit.

Phase 2: Preference Vector Probing & Fine-Tuning

Train linear probes on internal model activations to identify and extract preference vectors tailored to your enterprise's specific values and persona requirements.

Phase 3: Causal Steering & Behavior Modulation

Implement targeted interventions to causally steer LLM preferences, aligning model behavior with desired enterprise outcomes and safety protocols.

Phase 4: Continuous Monitoring & Adaptation

Establish real-time monitoring of preference vector shifts and model behavior, with mechanisms for adaptive fine-tuning and persona management.
