CiteFusion: an ensemble framework for citation intent classification harnessing dual-model binary couples and SHAP analyses
Unlocking Deeper Insights into Scholarly Communication with AI
Understanding the motivations underlying scholarly citations is essential to evaluating research impact and promoting transparent scholarly communication. This study introduces CiteFusion, an ensemble framework designed to address the multi-class Citation Intent Classification task on two benchmark datasets: SciCite and ACL-ARC. The framework employs a one-vs-all decomposition of the multi-class task into class-specific binary sub-tasks, leveraging a complementary pair of independently tuned SciBERT and XLNet models for each citation intent. The outputs of these base models are aggregated through a feedforward neural network meta-classifier to reconstruct the original classification task. To enhance interpretability, SHAP (SHapley Additive exPlanations) is employed to analyze token-level contributions and interactions among base models, providing transparency into CiteFusion's classification dynamics and insight into the kinds of misclassifications the ensemble makes. In addition, this work investigates the semantic role of structural context by incorporating section titles, as framing devices, into input sentences, and assesses their positive impact on classification accuracy. CiteFusion ultimately demonstrates robust performance in imbalanced and data-scarce scenarios: experimental results show that it achieves state-of-the-art performance, with Macro-F1 scores of 89.60% on SciCite and 76.24% on ACL-ARC. Furthermore, to ensure interoperability and reusability, citation intents from both datasets' schemas are mapped to Citation Typing Ontology (CiTO) object properties, highlighting some overlaps between them. Finally, we describe and release a web-based application that classifies citation intents using the CiteFusion models developed on SciCite.
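The one-vs-all decomposition and FFNN meta-classifier described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a toy keyword heuristic stands in for each fine-tuned SciBERT/XLNet couple, and the identity-style weights are illustrative rather than trained parameters.

```python
import numpy as np

CLASSES = ["background", "method", "result"]  # SciCite intents

def binary_pair_scores(sentence: str, intent: str) -> np.ndarray:
    """Stand-in for one SciBERT/XLNet couple tuned for `intent`:
    returns [p_scibert, p_xlnet], each the probability that the
    sentence expresses `intent` (placeholder heuristic, not the
    actual fine-tuned transformers)."""
    cue = {"background": "prior", "method": "we use", "result": "improves"}[intent]
    base = 0.9 if cue in sentence.lower() else 0.1
    return np.array([base, base])

def meta_classifier(features: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Single-layer FFNN meta-classifier: softmax over the stacked
    binary scores (6 features for SciCite: 2 models x 3 intents)."""
    logits = features @ W + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

def classify(sentence: str, W: np.ndarray, b: np.ndarray) -> str:
    # one-vs-all: concatenate each couple's scores, then recombine
    feats = np.concatenate([binary_pair_scores(sentence, c) for c in CLASSES])
    return CLASSES[int(np.argmax(meta_classifier(feats, W, b)))]

# illustrative weights: each class attends to its own couple's scores
W = np.zeros((6, 3))
for i in range(3):
    W[2 * i, i] = W[2 * i + 1, i] = 1.0
b = np.zeros(3)

print(classify("We use the parser of [1] to segment sentences.", W, b))  # method
```

In the actual framework, the meta-classifier is trained on the base models' outputs, letting it learn how much to trust each couple rather than relying on fixed weights as above.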
Driving Impact: Key Performance Indicators
Our CiteFusion framework significantly enhances Citation Intent Classification, setting new benchmarks in accuracy and interpretability across diverse datasets.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Achieving New SOTA Benchmarks
CiteFusion consistently outperforms previous state-of-the-art models on key datasets for Citation Intent Classification, demonstrating superior performance in complex, imbalanced data scenarios.
Impact of Structural Context: WS vs WoS Settings
Incorporating section titles as framing devices significantly enhances model performance and interpretability, particularly for domain-specific models like SciBERT.
| Feature | With Section Titles (WS) | Without Section Titles (WoS) |
|---|---|---|
| SciCite Macro-F1 (FFNN) | 89.60% | 88.22% |
| ACL-ARC Macro-F1 (FFNN) | 76.24% | 71.46% |
| SciBERT Token Focus | Specialized, Intent-Specific | General, Less Discriminative |
| XLNet Token Focus | Broader Linguistic Cues | Broader Linguistic Cues |
| Interpretability (SHAP) | Enhanced Alignment | Ambiguous Attribution |
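The WS/WoS distinction in the table comes down to how the model input is constructed. A minimal sketch, assuming a simple "title: sentence" template (the paper's exact concatenation format is an assumption here):

```python
def build_input(citation_sentence: str, section_title: str = "") -> str:
    """Prepend the section title as a framing device (WS setting);
    omit it for the WoS setting. The 'title: sentence' template is
    an assumption, not necessarily the paper's verbatim format."""
    if section_title:  # WS: With Section titles
        return f"{section_title}: {citation_sentence}"
    return citation_sentence  # WoS: Without Section titles

ws = build_input("We adopt the attention mechanism of [3].", "Methods")
wos = build_input("We adopt the attention mechanism of [3].")
print(ws)   # Methods: We adopt the attention mechanism of [3].
print(wos)  # We adopt the attention mechanism of [3].
```

Prepending the title gives the encoder an explicit structural cue, which is consistent with the sharper, intent-specific token focus reported for SciBERT in the WS setting.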
CiteFusion Ensemble Framework
The CiteFusion framework leverages a multi-stage approach to robustly classify citation intents.
Enterprise Process Flow
Understanding Misclassifications: Method-to-Background
Our SHAP analysis reveals that common misclassifications, such as a 'Method' citation being predicted as 'Background', often occur when Method-tuned models fail to provide strong positive evidence. This suggests ambiguity in the citation context where methodological descriptions might be interpreted as general background information, especially in sections like 'Introduction' or 'Discussion'.
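The token-level attribution behind this analysis can be illustrated with an exact Shapley computation over a toy scorer. This is a self-contained sketch of what the SHAP library approximates for the ensemble's base models, not the paper's actual pipeline; the cue-word scorer is hypothetical:

```python
from itertools import combinations
from math import factorial

def shapley_token_values(tokens: list[str], score) -> list[float]:
    """Exact Shapley values for token-level contributions to a scoring
    function `score(subset_of_tokens) -> float`. Exhaustive over all
    token subsets, so only feasible for short inputs (toy illustration,
    not the shap library)."""
    n = len(tokens)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for subset in combinations(others, r):
                # classic Shapley weight for a coalition of size r
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                with_i = score([tokens[j] for j in sorted(subset + (i,))])
                without = score([tokens[j] for j in subset])
                values[i] += w * (with_i - without)
    return values

# hypothetical 'Method' scorer: methodological cue words add evidence
def method_score(toks: list[str]) -> float:
    cues = {"use", "apply", "algorithm"}
    return float(sum(t.lower() in cues for t in toks))

tokens = ["We", "use", "this", "algorithm"]
vals = shapley_token_values(tokens, method_score)
```

With a weak scorer like this, non-cue tokens receive near-zero attributions, mirroring the case described above: when the Method-tuned couple finds no strong positive evidence, the meta-classifier falls back to the more general 'Background' class.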
Case Study: Ambiguous Citation Intent
Description: Example from Table 12, ID 1. A citation in the 'Introduction' section, originally labeled 'Method', was predicted as 'Background'. This highlights ambiguity where methodological descriptions might be interpreted as providing general context.
Key Takeaway: Misclassifications often stem from ambiguous textual cues and lack of strong class-specific signals, rather than simple model failure. Section titles can help resolve some of this ambiguity.
Quantify Your AI Impact
Estimate the potential efficiency gains and cost savings your enterprise could achieve by implementing advanced AI solutions like CiteFusion.
ROI Projection for Citation Intent Classification
Your AI Implementation Roadmap
A phased approach to integrate CiteFusion into your enterprise workflow, ensuring seamless transition and maximum impact.
Phase 1: Discovery & Customization
Comprehensive analysis of existing citation workflows, data sources, and specific classification needs. Customization of CiteFusion models to align with your proprietary datasets and semantic requirements.
Phase 2: Integration & Pilot Deployment
Seamless integration of CiteFusion API into your digital library, research assessment platform, or document management system. Pilot deployment with a selected team to gather feedback and optimize performance.
Phase 3: Full-Scale Rollout & Continuous Optimization
Company-wide deployment of the CiteFusion framework. Ongoing monitoring, performance tuning, and regular updates to leverage the latest advancements in AI and NLP, ensuring sustained high accuracy and interpretability.
Ready to Transform Your Scholarly Data Analysis?
Connect with our AI specialists to explore how CiteFusion can enhance your research impact assessment and academic communication.