
Enterprise AI Analysis

Complementary Learning Approach for Text Classification using Large Language Models

This study proposes a structured methodology for using Large Language Models (LLMs) in a cost-efficient, parsimonious manner. It integrates the strengths of human scholars and machines while offsetting their respective weaknesses, extending best practices from qualitative research to human-machine teams in quantitative research. The method employs "chain of thought" and "few-shot learning" prompting, allowing humans to apply abductive reasoning and natural language to interrogate not just what the machine has done but also what the human has done. Our method highlights how scholars can manage LLMs' inherent weaknesses using careful, low-cost techniques. We demonstrate how to use the methodology to interrogate human-machine rating discrepancies for a sample of 1,934 press releases announcing pharmaceutical alliances (1990–2017).

Executive Impact Snapshot

Drawing on "chain of thought" and "few-shot learning" prompting from computer science, we introduce a method of using large language models (LLMs) to efficiently tackle large text corpora that require subject matter expertise to classify properly. The approach combines the contextual understanding, creativity, and ethical judgment of humans with the computational efficiency, rule-derived consistency, and large-dataset processing capacity of LLMs. We draw upon and extend best practices for co-author teams in qualitative research to human-machine teams in quantitative research, allowing humans to use natural language to interrogate not just what the machine has done but also what the human has done. We demonstrate the methodology on a sample of 1,934 press releases announcing pharmaceutical alliances (1990–2017).

1,934 Documents Processed in Study

Deep Analysis & Enterprise Applications

The modules below explore specific findings from the research, framed for enterprise application.

Methodology
Theory
LLM Capabilities

Enhanced Reliability & Human-Machine Collaboration

Our Complementary Learning Approach (CLA) significantly bolsters the reliability and trustworthiness of text classification. By engaging human scholars and LLMs in an iterative, dialectical process, CLA serves as a "cognitive mirror," forcing humans to confront their own cognitive biases, implicit biases, and attention deficits. This process allows for the explicit articulation of the classification schema and helps identify previously unrecognized complications in source data, such as irrelevant boilerplate text, leading to more robust and objective scholarship. The methodology ensures that irrelevant data is filtered out, that classifications rest on complete descriptions, and that human raters are consistently held accountable.

Advancing Epistemology, Socio-Technical Systems, & Meaning-Making

CLA contributes to theories of epistemics and socio-technical systems by asserting that knowledge is co-constructed through multiple sources and perspectives, embracing a pluralistic view. It facilitates a synergistic interplay between human expertise and LLM capabilities, transcending the limitations of either operating in isolation. The approach also advances understanding of meaning-making and vocabularies: by building on scholarly insights to teach LLMs and letting LLMs highlight overlooked issues, CLA provides a powerful tool for theory-building, especially when tackling unique vocabularies across vast text corpora. This aligns with Peirce's philosophy of fallibilism, in which knowledge is continually refined through inquiry and feedback.

Leveraging LLM Strengths and Mitigating Weaknesses

The CLA harnesses LLMs' advanced capabilities, particularly their transformer architectures and attention mechanisms, which enable deep contextual understanding and nuanced classification without extensive pre-labeled datasets. Through "chain of thought" (COT) prompting, complex problems are broken into manageable steps, enhancing LLM reasoning and transparency. "Few-shot learning" (FSL) reinforces specific network pathways with multiple examples, obviating the need for burdensome fine-tuning and preventing "catastrophic forgetting." Used this way, LLMs function as reliable, rule-based agents, resistant to human biases when properly prompted and capable of dynamic context adjustment for coherent output.
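
To make the prompting strategy concrete, here is a minimal sketch in Python of how a COT-plus-FSL classification prompt might be assembled. The rating criteria, few-shot examples, and helper names are hypothetical illustrations, not the study's actual prompt.

```python
# Minimal sketch of a chain-of-thought (COT) plus few-shot (FSL)
# classification prompt. The criteria and examples are hypothetical
# illustrations, not the study's actual prompt.

CRITERIA = (
    "Rate the technicality of the press release on a scale of 1, 2, 4, or 5.\n"
    "Consider specialized terminology, explanations of mechanism of action,\n"
    "and the inherently technical nature of the pharmaceutical industry."
)

# Few-shot examples (FSL): worked cases reinforce the desired pattern.
FEW_SHOT_EXAMPLES = [
    {"text": "Alpha Pharma and Beta Bio announce a co-marketing agreement...",
     "reasoning": "Commercial focus; no mechanism of action is described.",
     "rating": 1},
    {"text": "The alliance will develop a monoclonal antibody that inhibits...",
     "reasoning": "Names a molecular target and explains a mechanism of action.",
     "rating": 5},
]

def build_prompt(press_release: str) -> str:
    """Assemble a COT + FSL prompt for one document."""
    parts = [CRITERIA, ""]
    for ex in FEW_SHOT_EXAMPLES:
        parts += [f"Text: {ex['text']}",
                  f"Reasoning: {ex['reasoning']}",  # COT: show the steps
                  f"Rating: {ex['rating']}", ""]
    # Ask the model to reason step by step before committing to a rating.
    parts += [f"Text: {press_release}",
              "Reasoning: think step by step, then state the rating.",
              "Rating:"]
    return "\n".join(parts)

print(build_prompt("GammaRx licenses its kinase inhibitor to DeltaBio..."))
```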


Enterprise Process Flow

1. Manual Classification (Subset)
2. Develop Initial Prompt (A+B+C)
3. LLM Classification (Remainder Subset)
4. Comparison (Identify Discrepancies)
5. Hypothesis-Building (Cause/Remedy)
6. Reconciliation & Refinement
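
The six steps above form an iterative loop that repeats until human and machine ratings converge. The skeleton below sketches one way that loop might look in Python; `classify_with_llm` and `revise_prompt` are hypothetical stand-ins for the study's actual procedure.

```python
# Skeleton of the iterative human-machine loop (steps 1-6 above). The
# callables `classify_with_llm` and `revise_prompt` are hypothetical
# stand-ins for the study's actual procedure, not a published API.

def complementary_learning_loop(docs, human_labels, prompt,
                                classify_with_llm, revise_prompt,
                                max_rounds=3):
    """Iterate until human and LLM ratings converge or rounds run out."""
    llm_labels = {}
    for _ in range(max_rounds):
        # Step 3: the LLM classifies the documents with the current prompt.
        llm_labels = {doc_id: classify_with_llm(prompt, text)
                      for doc_id, text in docs.items()}
        # Step 4: compare machine and human ratings.
        discrepancies = [doc_id for doc_id in docs
                         if llm_labels[doc_id] != human_labels[doc_id]]
        if not discrepancies:
            break
        # Steps 5-6: humans hypothesize a cause for each discrepancy
        # (human error, LLM error, or an ambiguous schema) and refine the
        # prompt and/or correct the human labels before the next round.
        prompt, human_labels = revise_prompt(prompt, human_labels,
                                             discrepancies)
    return prompt, llm_labels
```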

Comparison to Other Text Classification Methodologies

Method | Definition & What It Does | Limitations | How Our Approach Addresses Limitations
Only Human | Relies solely on human raters to classify texts; uses inter-rater reliability. | Subjective, inconsistent classifications; high cost and time for large datasets. | Integrates human insight with LLMs to enhance accuracy and efficiency, leveraging the best of both human judgment and AI.
Lexicon-based | Ignores word order and relies on statistics of word counts or co-occurrences. | Requires pre-made dictionaries; ignores context. | Evaluates texts organically in context, with no need for pre-made dictionaries.
Topic Modeling | An unsupervised ML technique that identifies latent themes from word co-occurrences and frequencies. | Ignores word order and context; can misinterpret nuanced or complex content. | Uses LLMs, which preserve word order and leverage context throughout the text.
Support Vector Machines (SVM) | A machine learning model that uses a hyperplane to separate data into classes. | Requires extensive feature engineering; struggles with high-dimensional, non-linear data; needs large labeled datasets. | LLMs process raw text without manual feature engineering; FSL and COT minimize the need for large labeled datasets.
Random Forest | An ensemble learning method that builds multiple decision trees for accurate, stable prediction. | Can overfit complex text data; feature selection is crucial and challenging; needs large labeled datasets. | LLMs' pre-trained knowledge extracts relevant features automatically, avoiding overfitting; FSL and COT minimize the need for large labeled datasets.

Resolving Discrepancies in Pharmaceutical Text Classification with CLA

The iterative process of CLA revealed several root causes for discrepancies between human and LLM classifications, leading to significant prompt refinement:

  • Unexpected Output 1: Insufficient Contextual Information. The LLM initially classified texts as highly technical based on keywords without understanding the pharmaceutical industry's inherently high technicality. Remedy: The prompt was updated to explicitly include the technical nature of the industry context. (Related: Contextualism)
  • Unexpected Output 2: Ontological Error in Classification Type. A binary (high/low) scale proved insufficient to capture the nuanced technicality of press releases, leading to "borderline cases" and unstable classifications (Sorites Paradox). Remedy: The classification type was changed from a dichotomous scale to a multi-category scale (1, 2, 4, or 5). (Related: Sorites Paradox, Borderline cases)
  • Unexpected Output 3: Inappropriate Criterion. A human-defined criterion regarding the "spatial distribution of technical concepts" was found to be ignored by the LLM, and upon reflection, deemed preposterous by the human rater. Remedy: This criterion was removed from the prompt. (Related: Fallibilism)
  • Unexpected Output 4: Omission of a Salient Criterion. The LLM's reasoning for classifying a press release as non-technical highlighted that the human rater had implicitly relied on the absence of "mechanism of action" explanations without explicitly codifying it. The LLM effectively acted as a "cognitive mirror." Remedy: The prompt was updated to include explanations of the "mechanism of action" as a key criterion for technicality. (Related: "Cognitive mirror", Tacit knowledge, Vocabularies)
  • Unexpected Output 5: Ingestion of Irrelevant Text. The LLM gave high ratings based on boilerplate end material commonly found in press releases, which corporate communication scholars typically remove as irrelevant. Remedy: A pre-processing step was added to strip boilerplate text from press releases before inputting them to the LLM; a minimal sketch of this step follows the list. (Related: Stopwords and stoplists)
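
As a rough illustration of that pre-processing remedy, the sketch below truncates a press release at the first boilerplate marker it finds. The marker patterns are hypothetical examples of common press-release boilerplate, not the study's actual stoplist.

```python
import re

# Sketch of the boilerplate-stripping pre-processing step (Unexpected
# Output 5). The marker patterns are hypothetical examples of common
# press-release boilerplate, not the study's actual list.
BOILERPLATE_MARKERS = [
    r"\bAbout [A-Z]\w+",                    # "About <Company>" closing sections
    r"\bForward-Looking Statements\b",      # safe-harbor legal language
    r"\bFor (further|more) information\b",  # media-contact footers
]

def strip_boilerplate(text: str) -> str:
    """Truncate the document at the earliest boilerplate marker found."""
    cut = len(text)
    for pattern in BOILERPLATE_MARKERS:
        match = re.search(pattern, text)
        if match:
            cut = min(cut, match.start())
    return text[:cut].rstrip()

release = ("GammaRx and DeltaBio will co-develop a kinase inhibitor...\n\n"
           "About GammaRx\nGammaRx is a leader in oncology...\n"
           "For more information, contact press@gammarx.example.")
print(strip_boilerplate(release))  # keeps only the substantive body text
```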

Calculate Your Potential ROI

Estimate the efficiency gains and cost savings your enterprise could achieve by integrating our Complementary Learning Approach for text classification.

The calculator estimates two outputs: Annual Cost Savings and Annual Hours Reclaimed.
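
For a back-of-the-envelope version of that calculation, the sketch below uses purely hypothetical input values; substitute your own document volumes, rating times, and labor rates.

```python
# Back-of-the-envelope ROI sketch for the calculator above. All inputs
# are hypothetical placeholders, not figures from the study.

docs_per_year = 50_000       # documents classified annually
minutes_per_doc_manual = 5   # human-only classification time per document
hourly_rate = 60.0           # fully loaded cost per rater hour (USD)
automation_share = 0.8       # fraction of documents the LLM handles

# Hours no longer spent on manual classification, and their dollar value.
hours_reclaimed = docs_per_year * automation_share * minutes_per_doc_manual / 60
cost_savings = hours_reclaimed * hourly_rate

print(f"Annual hours reclaimed: {hours_reclaimed:,.0f}")
print(f"Annual cost savings: ${cost_savings:,.0f}")
```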

Your Implementation Roadmap

A typical rollout involves strategic phases to ensure seamless integration and maximum impact within your organization.

Phase 01: Strategic Alignment & Pilot

Identify critical text classification workflows, define objectives, and run a pilot program using CLA on a subset of your data to demonstrate initial ROI and gather feedback.

Phase 02: Prompt Engineering & Refinement

Collaborate with subject matter experts to develop and iteratively refine LLM prompts, ensuring high precision and consistency for your specific classification tasks.

Phase 03: Scaled Deployment & Integration

Integrate the refined LLM-powered classification into your existing systems, automating large-scale text analysis and freeing up human resources for higher-value tasks.

Phase 04: Continuous Learning & Optimization

Establish feedback loops for ongoing monitoring, prompt updates, and performance optimization, ensuring the system adapts to evolving needs and maintains peak accuracy.

Ready to Transform Your Text Classification?

Our Complementary Learning Approach offers a unique blend of human expertise and AI efficiency. Let's discuss how it can be tailored to your enterprise needs.

Ready to Get Started?

Book Your Free Consultation.


