Enterprise AI Analysis: Multi-Personality Generation of LLMs at Decoding-time

Research & Analysis

Multi-Personality Generation of LLMs at Decoding-time

Authored by Rongxin Chen, YunFan Li, Yige Yuan, Bingbing Xu, Huawei Shen and published in WSDM '26: The Nineteenth ACM International Conference on Web Search and Data Mining, February 22-26, 2026. This paper introduces a novel approach for dynamically controlling multiple personality attributes in Large Language Models (LLMs) without costly retraining.

Multi-personality generation for LLMs, enabling simultaneous embodiment of multiple personalization attributes, is a fundamental challenge. Existing retraining-based approaches are costly and poorly scalable, while decoding-time methods often rely on external models or heuristics, limiting flexibility and robustness. In this paper, we propose a novel Multi-Personality Generation (MPG) framework under the decoding-time combination paradigm. It flexibly controls multi-personality without relying on scarce multi-dimensional models or extra training, leveraging implicit density ratios in single-dimensional models as a "free lunch" to reformulate the task as sampling from a target strategy aggregating these ratios. To implement MPG efficiently, we design Speculative Chunk-level based Rejection sampling (SCR), which generates responses in chunks and validates them in parallel via estimated thresholds within a sliding window. This significantly reduces computational overhead while maintaining high-quality generation. Experiments on MBTI personality and Role-Playing demonstrate the effectiveness of MPG, showing improvements of up to 16%-18%. Code and data are available at https://github.com/Libra117/MPG.

Executive Impact

This research introduces MPG with Speculative Chunk-level based Rejection Sampling (SCR), a flexible and robust decoding-time framework for multi-preference LLM generation. It offers significant gains in controllability and personalization while maintaining practical efficiency. This enables LLMs to transition from general-purpose tools into personalized intelligences, critical for applications like personalized question answering and intelligent customer service.


Deep Analysis & Enterprise Applications


Introduction

In recent years, Large Language Models (LLMs) [34] have witnessed rapid development, with parameter scales surging from billions to trillions and training data covering diverse domains such as text and code [4, 21], thereby endowing them with capabilities in reasoning and semantic understanding [12]. Within this context, multi-personality generation has emerged as a critical task, which requires LLMs to transform from general-purpose tools into personalized intelligences capable of understanding and addressing the nuanced needs of individual users, and is widely applied in scenarios such as personalized question answering, artificial assistants, and intelligent customer service [6, 10, 16, 21].

Multi-personality generation requires generated text to simultaneously embody multiple, potentially interacting personalization attributes. Figure 1 illustrates multi-personality generation in role-play, where the personality set specifies multi-dimensional personality attributes.

Related Work

This section introduces the related work from three aspects: RLHF, which serves as the basis of the single-dimensional model, decoding-time alignment, and Multi-dimensional Personalization.

Reinforcement Learning from Human Feedback.

RLHF [1, 2, 9, 23, 37] serves as the dominant paradigm for aligning LLMs with human preferences. It traditionally employs a three-stage pipeline: Supervised Fine-Tuning (SFT), training a reward model from human feedback, and then enhancing the policy through a reinforcement learning algorithm, most notably PPO [27]. More recent approaches, such as DPO [25], offer an alternative to this final stage by directly optimizing the policy on preference data, bypassing the need for an explicit reward model and RL optimization.
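DPO's key move is to express preference learning directly in terms of log density ratios against the reference model. The sketch below shows the per-pair DPO loss; the `beta` temperature and the example log-probability values are illustrative assumptions, not numbers from the paper.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (ratio_w - ratio_l)),
    where each ratio is the log density ratio log pi(y|x) - log pi_ref(y|x).
    Arguments are sequence log-probabilities (illustrative values below)."""
    ratio_w = logp_w - ref_logp_w  # log density ratio of the preferred response
    ratio_l = logp_l - ref_logp_l  # log density ratio of the dispreferred response
    margin = beta * (ratio_w - ratio_l)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# A policy that raises the preferred response's ratio above the
# dispreferred one's (relative to the reference) incurs a lower loss.
loss = dpo_loss(logp_w=-10.0, logp_l=-14.0, ref_logp_w=-12.0, ref_logp_l=-12.0)
```

The optimized policy's density ratio thus ends up encoding the learned preference, which is exactly the property the MPG framework later exploits.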

Decoding-time Alignment.

Decoding-time alignment methods emerged as an effective approach to satisfy diverse user requirements, which can adapt to different objectives without retraining. PAD [5] introduces token-level personalized rewards by decoupling text generation dynamics from user preferences, enabling dynamic adjustment of base model predictions during inference. It leverages a single policy and reward model to generalize to unseen preferences. MOD [28] employs a closed-form solution derived via Legendre transformation to combine predictions from multiple base models, each aligned to distinct objectives, during decoding. By weighting model outputs algebraically, MOD achieves precise control over objective weights. Drift [18] extends this direction by inferring user preferences through attribute-level decomposition and combining attribute signals directly at decoding time, enabling effective few-shot personalization without explicit reward models.

Multi-dimensional Personalization.

For the more complex multi-dimensional personalization alignment challenge of having models satisfy multiple (or even conflicting) objectives simultaneously (e.g., balancing usefulness and harmlessness, or simulating a specific MBTI/Role-Playing persona), the research community has explored different strategies. The main approaches include training a single model to optimize a weighted multi-objective function (e.g., MORLHF, MODPO [36]), combining the parameters of independently-trained single-preference models (e.g., DPO Soups [15]) during decoding, and bootstrapping by combining reward signals or model predictions at decoding (e.g., PAD [5], MOD [28]). Drift [18] decomposes user preferences into interpretable attributes and composes them at decoding time, providing a new perspective on multi-dimensional personalization.

Methodology

This section details the MPG framework by first formally defining the multi-personality generation problem, then establishing the theoretical underpinnings of our method based on the density ratio principle, and finally introducing our efficient implementation, Speculative Chunk-level based Rejection sampling (SCR).

Problem Definition and Formulation

Our paper addresses the problem of generating text that exhibits multiple personality attributes at decoding time for LLMs. Specifically, given an input x, a set of N desired personality attributes {d₁, ..., d_N}, and a preference weight vector α = [α₁, ..., α_N] (with each αᵢ ≥ 0 and ∑ᵢ αᵢ = 1), the goal is to generate a text sequence y that simultaneously embodies these personality attributes.
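To make the weight constraint concrete, here is a tiny helper (hypothetical, not part of the paper's framework) that maps nonnegative attribute scores onto the required simplex:

```python
def normalize_weights(raw):
    """Project raw nonnegative scores onto the preference simplex:
    alpha_i >= 0 and sum(alpha_i) = 1, as the problem definition requires.
    (Hypothetical helper for illustration only.)"""
    if any(w < 0 for w in raw):
        raise ValueError("preference weights must be nonnegative")
    total = sum(raw)
    return [w / total for w in raw]

# Three attributes, e.g. MBTI type weighted twice as heavily as age and profession.
alpha = normalize_weights([2.0, 1.0, 1.0])
```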

MPG Based on Density Ratio

The core principle guiding our approach stems from a key feature observed in LLM preference alignment training (e.g., DPO [25]): the probability density ratio of an optimized policy model π_θ(y|x) relative to the reference model π_ref(y|x) used in its training, r(y|x) = π_θ(y|x) / π_ref(y|x), implicitly encodes the preference information learned by the model. This ratio quantifies how strongly the model prefers a particular output sequence y compared to its likelihood under π_ref(y|x). We take advantage of this principle to sample from our target multi-personality distribution π_MPG(y|x).
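One natural way to aggregate the per-attribute ratios, assumed here purely for illustration (the paper's exact aggregation may differ), weights each log density ratio by its preference weight αᵢ on top of the reference model's log-probability. All inputs below are hypothetical log-probabilities:

```python
def mpg_log_score(ref_logp, attr_logps, alphas):
    """Unnormalized log-score of the target multi-personality policy, assuming
    (as an illustrative sketch) that pi_MPG(y|x) is proportional to
    pi_ref(y|x) * prod_i r_i(y|x) ** alpha_i, where r_i = pi_i / pi_ref is the
    density ratio of the i-th single-attribute model."""
    log_ratios = [lp - ref_logp for lp in attr_logps]  # log r_i(y|x)
    return ref_logp + sum(a * lr for a, lr in zip(alphas, log_ratios))

# Two attribute models, equally weighted: one favors y, one disfavors it,
# so their weighted ratios cancel and the score falls back to the reference.
score = mpg_log_score(ref_logp=-10.0, attr_logps=[-8.0, -12.0], alphas=[0.5, 0.5])
```

Working in log space keeps the product of ratios numerically stable and makes the weighted combination a simple dot product.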

SCR Algorithm

We implement MPG via a decoding-time algorithm, Speculative Chunk-level based Rejection sampling (SCR).

Unlike sequence-level rejection sampling, SCR proposes and validates k-token chunks at once, integrating speculative decoding [20]. By validating multi-token speculative proposals against the aggregated density ratio, SCR preserves the correctness of rejection sampling while amortizing expensive target-policy validation over longer spans, reducing the frequency of preference model evaluations and enabling efficient parallelization.

Enterprise Process Flow

Input Prompt
Generate k-token Speculative Chunk (ref model)
Score Chunk with Multi-Preference Models (density ratios)
Aggregate Scores & Apply Acceptance Test
Accept/Reject Chunk (with prefix salvage)
Update LLM Output
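The flow above can be sketched as a simple accept/reject loop. Both callables (`propose_chunk`, drafting from the reference model, and `accept_prob`, the aggregated density-ratio test) are hypothetical placeholders, and the prefix-salvage step on rejection is only noted in a comment:

```python
import random

def scr_generate(propose_chunk, accept_prob, max_chunks=8, chunk_len=4, seed=0):
    """Sketch of chunk-level rejection sampling: the reference model proposes
    k-token chunks, and each chunk is accepted with a probability derived from
    the aggregated density ratios. Placeholder callables stand in for the
    actual models."""
    rng = random.Random(seed)
    output = []
    for _ in range(max_chunks):
        chunk = propose_chunk(output, chunk_len)       # draft k tokens from the ref model
        if rng.random() < accept_prob(output, chunk):  # validate against the target policy
            output.extend(chunk)                       # accept the whole chunk
        # On rejection, a real implementation would salvage any accepted prefix
        # of the chunk instead of discarding it entirely.
    return output

# Degenerate demo: a proposer that always drafts the same token and an
# acceptance test that always passes, so every chunk is kept.
tokens = scr_generate(lambda out, k: ["tok"] * k, lambda out, c: 1.0)
```

Validating whole chunks rather than full responses is what lets SCR amortize the cost of target-policy evaluation while keeping per-chunk acceptance decisions parallelizable.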

SCR vs. Baselines: Key Advantages

Scalability to New Preferences
  • Traditional methods: require full retraining; high cost, poor scalability
  • MPG (SCR): only needs a new preference model; highly scalable, no costly retraining
External Models/Heuristics
  • Traditional methods: heavy reliance on external reward models/aligners; limited flexibility and robustness
  • MPG (SCR): leverages implicit density ratios from single-dimensional models; robust, flexible control
Computational Overhead
  • Traditional methods: repeated full-response sampling with low response utilization (traditional rejection sampling); time-consuming
  • MPG (SCR): speculative chunk-level rejection sampling with parallel multi-preference scoring; fewer model evaluations
Up to 18% improvement over baselines in effectiveness.

Role-Playing in Multi-Personality Generation

Scenario: A user wants an LLM to role-play as an ESTP, 36-year-old woman, a school teacher who loves hiking. The LLM is instructed to introduce itself.

MPG (SCR) Output: "As an ESTP, I thrive on excitement and new experiences. I am a woman who is currently 36 years old. By profession, I am a dedicated school teacher. In my free time, I absolutely love going hiking outdoors."

This demonstrates MPG's ability to seamlessly integrate multiple personality attributes (MBTI type, age, gender, profession, hobby) into a coherent and contextually appropriate response at decoding time.

Calculate Your Potential ROI

Estimate the efficiency gains and cost savings your enterprise could achieve by implementing multi-personality LLM generation.


Your Implementation Roadmap

A typical journey to integrate multi-personality LLM generation into your enterprise.

Phase 1: Initial Consultation & Strategy

Understanding your unique enterprise needs, existing LLM infrastructure, and desired multi-personality use cases. Defining key performance indicators and success metrics.

Phase 2: Model Integration & Customization

Integrating your single-attribute DPO models with the MPG framework. Fine-tuning parameters for optimal performance across desired personality dimensions.

Phase 3: Deployment & Iterative Optimization

Deployment of the MPG-enabled LLM into your enterprise environment. Ongoing monitoring, performance analysis, and iterative adjustments to maximize efficacy and user satisfaction.

Ready to Personalize Your LLMs?

Unlock the full potential of personalized AI with our cutting-edge multi-personality generation framework. Our experts are ready to help you integrate these capabilities seamlessly into your enterprise applications.
