Enterprise AI Analysis: Learning the Style via Mixed SN-Grams: An Evaluation in Authorship Attribution

This study introduces a method for authorship attribution based on mixed syntactic n-grams (sn-grams), which combine words, POS tags, and dependency relation (DR) tags to model writing style. Experiments on the PAN-CLEF 2012 and CCAT50 datasets show that mixed sn-grams achieve higher accuracy than homogeneous sn-grams, with the POS-Word category performing best on CCAT50. The method is interpretable and efficient on small to moderate datasets, and it runs without specialized hardware, which makes it practical for real-world applications.

Deep Analysis & Enterprise Applications

74.36% Peak Accuracy on CCAT50

Mixed vs. Homogeneous SN-Grams Accuracy (PAN 12)

| SN-Gram Type | Homogeneous (n=4) | Mixed (n=3) |
|--------------|-------------------|-------------|
| DR           | 50.00%            | 92.85%      |
| POS          | 67.85%            | 82.14%      |
| Word         | 64.28%            | 78.57%      |

Mixed sn-grams (n=3) outperform homogeneous sn-grams (n=4) in all three PAN 12 categories shown, with the largest gain for DR (50.00% to 92.85%). This suggests that combining feature types captures writing style more fully than any single type. The DR-POS combination yielded the best results overall.
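
The size of each gain can be checked with a few lines of arithmetic. The sketch below only restates the PAN 12 figures reported above and computes the difference in percentage points; the variable names are illustrative.

```python
# PAN 12 accuracies (%) as reported above: (homogeneous n=4, mixed n=3).
pan12 = {
    "DR": (50.00, 92.85),
    "POS": (67.85, 82.14),
    "Word": (64.28, 78.57),
}

# Gain in percentage points for each sn-gram type, rounded to two decimals.
gains = {kind: round(mixed - homog, 2) for kind, (homog, mixed) in pan12.items()}

for kind, gain in gains.items():
    print(f"{kind}: +{gain:.2f} percentage points")
```

Under these figures, DR gains 42.85 points, while POS and Word each gain 14.29 points.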

Enterprise Process Flow

Text Preprocessing
Syntactic Parsing
Syntactic Dependency Representation
Feature Selection
Machine Learning Algorithm
Authorship Attribution
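
The middle steps of this flow can be illustrated with a minimal sketch. It assumes that an sn-gram is a head-to-dependent path in a dependency tree and that a mixed sn-gram renders each position as a word, a POS tag, or a DR tag according to a pattern. The hand-written parse, the function names, and the pattern tuples are all hypothetical; a real pipeline would take the parse from a syntactic parser, and the study's exact sn-gram definition (for example, its handling of branching subtrees) may differ.

```python
from collections import Counter

# Hypothetical hand-written dependency parse of "The cat chased a mouse".
# Each token: (index, word, POS tag, head index, dependency relation); head 0 = root.
PARSE = [
    (1, "The", "DET", 2, "det"),
    (2, "cat", "NOUN", 3, "nsubj"),
    (3, "chased", "VERB", 0, "root"),
    (4, "a", "DET", 5, "det"),
    (5, "mouse", "NOUN", 3, "obj"),
]

def children_of(parse):
    """Map each head index to the indices of its direct dependents."""
    kids = {}
    for idx, _word, _pos, head, _dr in parse:
        kids.setdefault(head, []).append(idx)
    return kids

def paths(parse, n):
    """Yield every linear head-to-dependent path of exactly n tokens."""
    kids = children_of(parse)

    def extend(path):
        if len(path) == n:
            yield tuple(path)
            return
        for child in kids.get(path[-1], []):
            yield from extend(path + [child])

    for idx, *_rest in parse:
        yield from extend([idx])

def mixed_sn_grams(parse, pattern):
    """Render each path with one feature type per position.

    pattern is a tuple such as ("pos", "word"); this encoding is an
    assumption made for illustration only.
    """
    by_idx = {tok[0]: tok for tok in parse}
    field = {"word": 1, "pos": 2, "dr": 4}
    for path in paths(parse, len(pattern)):
        yield tuple(by_idx[i][field[kind]] for i, kind in zip(path, pattern))

# POS-Word sn-grams of size n=2: the head contributes its POS tag,
# the dependent contributes its word form.
pos_word = Counter(mixed_sn_grams(PARSE, ("pos", "word")))

# Homogeneous DR sn-grams of size n=3, for comparison.
dr_only = list(mixed_sn_grams(PARSE, ("dr", "dr", "dr")))
```

Counts such as `pos_word` would then serve as the feature vector passed to feature selection and a standard classifier. Because each feature is a readable tuple like `("VERB", "cat")`, the resulting model stays inspectable.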

CCAT50 Corpus Evaluation Insights

The CCAT50 corpus, with its 50 authors and thematic diversity, proved more demanding. Mixed sn-grams of size n=2, particularly the POS-Word combination, achieved the highest accuracy, outperforming homogeneous sn-grams by about 5%. This highlights the effectiveness of combining lexical and grammatical information for robust authorship attribution.

Key Takeaway: Mixed sn-grams, especially POS-Word, enhance attribution accuracy on diverse, smaller datasets.

O(n³) Time Complexity for SN-Gram Generation

Comparison with Deep Learning Approaches (CCAT50)

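| Method Type                         | Features Used                    | Accuracy |
|-------------------------------------|----------------------------------|----------|
| Deep Learning (Syntactic Only)      | Syntax Tree Embeddings           | 10.08%   |
| Deep Learning (Syntactic + Lexical) | Syntax Tree + Lexical Embeddings | 81.00%   |
| Proposed Mixed SN-Grams             | Lexical, POS, DR Tags            | 74.36%   |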

While advanced deep learning models combining multiple feature types achieve higher accuracy, the proposed mixed sn-grams approach offers competitive performance based solely on feature engineering. It also excels in interpretability and resource efficiency, making it suitable for scenarios where deep learning models might overfit or require significant computational power.
