Enterprise AI Analysis
Using Large Language Models to Detect Insufficient Effort Responding in Open-Ended Survey Questions
NICK VON FELTEN, University of St. Gallen, St Gallen, SG, Switzerland
Revolutionizing Data Quality in HCI Research
Careless responses pose a challenge for data quality in online survey research, a core method in human-computer interaction (HCI). Open-ended answers can reveal such insufficient effort responding (IER), but are costly to evaluate manually. I explore two large language model (LLM) pipelines to automate IER detection in a dataset of 1,551 open-text responses: feature extraction with open-source embedding models and standard classifiers, and text-generation labelling with GPT-4o-mini. Embedding-based models achieved higher precision but missed inattentive responses, whereas text generation showed better accuracy yet tended to overpredict IER. These patterns were explained by severe class imbalance, identified as a typical feature of high-quality crowdsourced samples and thus a central challenge for automated IER detection. I discuss how such pipelines could be integrated into human-in-the-loop workflows and emphasize the need for curated, openly available datasets and improved model engineering to advance reliable IER detection.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
This paper investigated the effectiveness of Large Language Models (LLMs) in automatically detecting Insufficient Effort Responding (IER) in open-ended survey questions. Two primary pipelines were evaluated: feature extraction using open-source embedding models with standard classifiers, and text-generation labelling using GPT-4o-mini. While both approaches performed well on the majority class (valid responses), they struggled with the minority class (IER) due to severe class imbalance. The study highlights the potential of LLMs for data quality in HCI research, but emphasizes the need for human-in-the-loop workflows and advanced model engineering to overcome current limitations.
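As a rough illustration of the first pipeline, the sketch below embeds each open-text response with an open-source sentence-embedding model and trains a standard classifier on the vectors. The specific embedding model, classifier, file name, and column names are illustrative assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch of the feature-extraction pipeline: embed open-text responses
# with an open-source model, then train a standard classifier on the vectors.
# Model choice, classifier, and data layout are illustrative assumptions.
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("responses.csv")  # hypothetical file: one open-text answer per row
texts = df["response"].tolist()
labels = df["ier_label"].values    # assumed coding: 1 = IER, 0 = valid

# Encode each response into a fixed-length embedding vector.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(texts, show_progress_bar=True)

# Hold out a stratified test split and fit a simple linear classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=42
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Report per-class precision/recall; the minority IER class is the one of interest.
print(classification_report(y_test, clf.predict(X_test), target_names=["valid", "IER"]))
```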
Enterprise Process Flow
The study utilized a dataset of 1,551 open-ended responses from HCI player experience research. Participants were instructed to describe a digital game in at least 50 words to encourage genuine engagement. A human rater manually classified responses based on adherence to instructions, memory of the game, and validity of the description, establishing a "ground truth" for IER detection. This contextualized dataset provided an ideal test case for evaluating LLM-based approaches, moving beyond superficial anomaly detection.
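For illustration only, the sketch below encodes the three rating criteria as a small data record; the field names and the rule that any failed criterion marks a response as IER are assumptions, not the paper's exact coding scheme.

```python
# Hypothetical encoding of the manual rating rubric described above.
from dataclasses import dataclass

@dataclass
class GroundTruthRating:
    followed_instructions: bool  # e.g., wrote at least 50 words about the game
    remembered_game: bool        # response shows memory of the described game
    valid_description: bool      # description is coherent and on-topic

    @property
    def is_ier(self) -> bool:
        # Assumed combination rule: any failed criterion counts as IER.
        return not (self.followed_instructions and self.remembered_game and self.valid_description)

rating = GroundTruthRating(followed_instructions=True, remembered_game=False, valid_description=True)
print(rating.is_ier)  # True
```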
| Approach | Key Strengths (IER Detection) | Challenges (IER Detection) | Notable Metrics |
|---|---|---|---|
| Feature Extraction (Embedding Models + Classifiers) | Higher precision on flagged IER; inexpensive to run; well suited to exploratory analysis | Low sensitivity: many inattentive responses are missed | Minority-class F1-score and average precision limited by severe class imbalance |
| Text Generation (GPT-4o-mini via API) | Higher sensitivity to IER; better overall accuracy | Tends to overpredict IER, flagging many valid responses | Minority-class precision reduced by overclassification; same class-imbalance ceiling on F1 |
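The second pipeline can be approximated with a few lines against the OpenAI API, as in the hedged sketch below; the prompt wording and one-word output format are assumptions rather than the prompt used in the study.

```python
# Minimal sketch of the text-generation labelling pipeline with GPT-4o-mini.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You label open-ended survey answers about a digital game. "
    "Answer with exactly one word: IER if the response shows insufficient effort "
    "(off-topic, copied, nonsensical, or ignoring the instructions), otherwise VALID."
)

def label_response(text: str) -> str:
    """Return 'IER' or 'VALID' for a single open-text response."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # deterministic labelling
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    return completion.choices[0].message.content.strip().upper()

# Example usage on a hypothetical careless answer.
print(label_response("good game i liked it"))
```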
The Class Imbalance Dilemma in IER Detection
A major challenge identified was the strong class imbalance inherent in the dataset: valid responses vastly outnumber insufficient effort responses. This is typical of high-quality crowdsourced samples; while desirable for overall data quality, it leaves little minority-class data for training robust IER detection models. As a result, models systematically favor the majority class and struggle to identify the minority (IER) class, which depressed F1-score and average precision across both pipelines (common mitigations are sketched after the list below).
- Crowdsourced data often presents severe class imbalance: IER is a small minority.
- Limits training data for effective IER detection models.
- Models show a systematic tendency to predict the majority (valid) class.
- Directly impacts key metrics like F1-score and Average Precision for the minority class.
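A minimal sketch of two common mitigations, class re-weighting during training and evaluation with average precision rather than accuracy, is shown below on synthetic data whose size matches the 1,551-response dataset; the 95/5 class split is illustrative, not the study's actual ratio.

```python
# Minimal sketch of handling class imbalance on synthetic, imbalanced data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import average_precision_score, f1_score

# Synthetic stand-in: 1,551 samples, roughly 5% minority (IER) class.
X, y = make_classification(n_samples=1551, weights=[0.95], random_state=0)

# class_weight="balanced" up-weights IER examples inversely to their frequency.
clf = LogisticRegression(max_iter=1000, class_weight="balanced")

# Out-of-fold probability estimates give an honest picture despite the small minority class.
probs = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
preds = (probs >= 0.5).astype(int)

print("Average precision (IER):", average_precision_score(y, probs))
print("F1 at default 0.5 threshold:", f1_score(y, preds))
```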
Integrating LLMs into Human-in-the-Loop Workflows
Because the text-generation models tend to overpredict IER, they are better suited to supporting rather than replacing human raters, and can be productively integrated into human-in-the-loop workflows to enhance research rigor and efficiency. Two main approaches are available. In full human-in-the-loop annotation, LLMs act as independent annotators for inter-rater reliability checks, helping to surface potential IER cases that human researchers missed. In semi-automatic annotation, LLMs pre-screen responses and flag likely IER cases for human review, reducing workload. The semi-automatic approach still requires validation studies to ensure quality control, so the full human-in-the-loop route is currently the safer choice for rigorous research (an agreement-check sketch follows the list below).
- LLMs can act as independent annotators, identifying discrepancies in human labeling.
- Pre-screening by LLMs can reduce human workload in large datasets.
- Rigorous validation is crucial for semi-automatic annotation workflows.
- Current recommendation: Full human-in-the-loop for maximum rigor and transparency.
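The full human-in-the-loop variant can be operationalized as an agreement check between the human rater and the LLM, as in the sketch below; the toy label arrays are illustrative, and in practice they would come from the human codebook and the LLM labelling step.

```python
# Minimal sketch: treat the LLM as a second annotator, measure agreement with the
# human rater, and route disagreements to a human review queue.
from sklearn.metrics import cohen_kappa_score

human_labels = ["VALID", "VALID", "IER", "VALID", "IER", "VALID"]  # toy data
llm_labels   = ["VALID", "IER",   "IER", "VALID", "VALID", "VALID"]

# Chance-corrected agreement between the two "raters".
print("Cohen's kappa:", cohen_kappa_score(human_labels, llm_labels))

# Disagreements become the review queue for a second human pass.
review_queue = [i for i, (h, m) in enumerate(zip(human_labels, llm_labels)) if h != m]
print("Responses to re-check:", review_queue)
```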
Key Takeaways and Engineering Advancements for HCI Researchers
This exploratory study indicates that off-the-shelf LLMs do not yet offer a robust, standalone solution for automated IER detection in open-ended survey data. Embedding-based classifiers are inexpensive and useful for exploratory analysis but tend to miss IER; text-generation approaches are more sensitive but significantly overclassify it. Current research practice should therefore prioritize human-in-the-loop annotation to ensure rigor. Future advances will likely require more intricate workflows, better handling of class imbalance, and fine-tuning of LLMs on curated, domain-specific datasets to make IER detection in HCI research more reliable and scalable (one such engineering lever, decision-threshold tuning, is sketched after the list below).
- Off-the-shelf LLMs are not a robust standalone solution for IER detection.
- Embedding models: inexpensive, good for exploration, but may miss IER.
- Text generation models: sensitive but prone to overclassification.
- Prioritize human-in-the-loop annotation for rigorous research.
- Future work: intricate workflows, class imbalance strategies, fine-tuning LLMs on curated datasets.
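As one example of such engineering work (not a method from the paper), the sketch below tunes a classifier's decision threshold on the precision-recall curve so that the volume of responses flagged for human review matches a chosen precision target; the data and the 0.5 target are synthetic and purely illustrative.

```python
# Minimal sketch of decision-threshold tuning on synthetic, imbalanced data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve

X, y = make_classification(n_samples=1551, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_te, probs)

# Pick the lowest threshold whose precision meets the target, i.e. cap how many
# valid responses are wrongly flagged for human review.
target_precision = 0.5  # illustrative choice
ok = np.where(precision[:-1] >= target_precision)[0]
if ok.size:
    print("Suggested threshold:", thresholds[ok[0]])
else:
    print("No threshold reaches the target precision.")
```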
Your AI Implementation Roadmap
A structured approach to integrating advanced AI solutions for optimal enterprise impact.
Phase 1: Discovery & Strategy
In-depth analysis of current workflows, identification of AI opportunities, and development of a tailored implementation strategy with clear KPIs.
Phase 2: Solution Design & Prototyping
Custom AI model design, system architecture planning, and rapid prototyping to validate concepts and gather early feedback.
Phase 3: Development & Integration
Full-scale development of AI solutions, seamless integration with existing enterprise systems, and rigorous testing.
Phase 4: Deployment & Optimization
Go-live, continuous monitoring of performance, iterative optimization based on real-world data, and ongoing support.