Enterprise AI Analysis
Learning the Style via Mixed SN-Grams: An Evaluation in Authorship Attribution
This study introduces a method for authorship attribution based on mixed syntactic n-grams (sn-grams), which combine words, POS tags, and dependency relation (DR) tags to model writing style. Experiments on the PAN-CLEF 2012 and CCAT50 datasets show that mixed sn-grams achieve higher accuracy than homogeneous sn-grams, with the POS-Word combination performing especially well. The method is interpretable and efficient on small to moderate datasets, which makes it suitable for real-world applications that have no access to specialized hardware.
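To make the feature types concrete, here is a minimal Python sketch of homogeneous and mixed sn-gram extraction. It assumes a toy, hand-written dependency parse (no parser library is used), treats an sn-gram as a path that follows head-to-dependent arcs, and assumes that a "mixed" sn-gram is any assignment of two element types (for example POS and word) to the path positions other than the two all-one-type assignments. The sentence, the `FIELDS` mapping and all helper names are hypothetical, and the paper's exact mixing scheme may differ.

```python
from itertools import product

# Hypothetical toy parse of "The old man reads books".
# Each token is (word, POS tag, dependency relation, index of head); -1 marks the root.
SENTENCE = [
    ("The", "DET", "det", 2),
    ("old", "ADJ", "amod", 2),
    ("man", "NOUN", "nsubj", 3),
    ("reads", "VERB", "root", -1),
    ("books", "NOUN", "obj", 3),
]

# Which tuple position holds each element type.
FIELDS = {"word": 0, "pos": 1, "dr": 2}


def syntactic_paths(tokens, n):
    """Return every head-to-dependent path of n tokens, as lists of token indices."""
    children = {i: [] for i in range(len(tokens))}
    for i, tok in enumerate(tokens):
        if tok[3] >= 0:
            children[tok[3]].append(i)

    def extend(path):
        if len(path) == n:
            yield path
            return
        for child in children[path[-1]]:
            yield from extend(path + [child])

    paths = []
    for start in range(len(tokens)):
        paths.extend(extend([start]))
    return paths


def homogeneous_sngrams(tokens, n, kind):
    """Sn-grams whose elements are all of one type: 'word', 'pos' or 'dr'."""
    f = FIELDS[kind]
    return [tuple(tokens[i][f] for i in p) for p in syntactic_paths(tokens, n)]


def mixed_sngrams(tokens, n, kinds):
    """Sn-grams that mix two element types along the same syntactic path.

    Assumption: every assignment of the two types to path positions is
    generated, except the assignments that use only one type.
    """
    results = []
    for p in syntactic_paths(tokens, n):
        for assignment in product(kinds, repeat=n):
            if len(set(assignment)) < 2:
                continue  # skip the homogeneous (single-type) assignments
            results.append(tuple(tokens[i][FIELDS[k]] for i, k in zip(p, assignment)))
    return results


print(homogeneous_sngrams(SENTENCE, 2, "pos"))
print(mixed_sngrams(SENTENCE, 2, ("pos", "word"))[:2])
```

With the toy parse, the first mixed POS-Word bigrams are `('NOUN', 'The')` and `('man', 'DET')`: the same head-dependent arc is encoded once with the head as a POS tag and once with the head as a word, which is how mixing lets lexical and grammatical information share one feature.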
Deep Analysis & Enterprise Applications
| SN-Gram Type | Homogeneous (n=4) | Mixed (n=3) |
|---|---|---|
| DR | 50.00% | 92.85% |
| POS | 67.85% | 82.14% |
| Word | 64.28% | 78.57% |
In every category in the table, mixed sn-grams score higher than homogeneous sn-grams, which suggests that they capture finer-grained aspects of writing style. The DR-POS combination yielded the best results overall.
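The size of the improvement in each row can be checked directly. This short sketch copies the accuracies from the table and computes the gain of mixed (n=3) over homogeneous (n=4) sn-grams in percentage points; the dictionary and function names are only illustrative.

```python
# Accuracies (%) copied from the table: (homogeneous n=4, mixed n=3).
RESULTS = {
    "DR": (50.00, 92.85),
    "POS": (67.85, 82.14),
    "Word": (64.28, 78.57),
}


def gains(results):
    """Percentage-point gain of mixed over homogeneous sn-grams for each row."""
    return {row: round(mixed - homo, 2) for row, (homo, mixed) in results.items()}


for row, gain in gains(RESULTS).items():
    print(f"{row}: +{gain} points")
```

The DR row gains about 42.85 points, while the POS and Word rows each gain about 14.29 points.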
CCAT50 Corpus Evaluation Insights
The CCAT50 corpus, with its 50 authors and thematic diversity, proved more demanding. Mixed sn-grams of size n=2, particularly the POS-Word combination, achieved the highest accuracy, outperforming homogeneous sn-grams by about 5%. This highlights the effectiveness of combining lexical and grammatical information for robust authorship attribution.
Key Takeaway: Mixed sn-grams, especially POS-Word, enhance attribution accuracy on diverse, smaller datasets.
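As an illustration of how POS-Word sn-grams of size n=2 could feed an attribution decision, here is a minimal sketch that builds a relative-frequency profile per author and assigns a test document to the author with the most similar profile (cosine similarity). The nearest-profile classifier is a swapped-in stand-in chosen for brevity, not necessarily the classifier used in the study, and the author names and sn-gram data are hypothetical.

```python
from collections import Counter
from math import sqrt


def profile(sngrams):
    """Relative-frequency profile of a document's sn-grams."""
    counts = Counter(sngrams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(v * b.get(g, 0.0) for g, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(test_sngrams, training):
    """Return the author whose profile is most similar to the test document."""
    test = profile(test_sngrams)
    scores = {author: cosine(test, profile(grams)) for author, grams in training.items()}
    return max(scores, key=scores.get)


# Hypothetical POS-Word sn-grams (n=2): (head POS tag, dependent word).
TRAINING = {
    "author_a": [("VERB", "that"), ("NOUN", "the"), ("VERB", "that"), ("NOUN", "of")],
    "author_b": [("VERB", "which"), ("NOUN", "a"), ("VERB", "which"), ("ADJ", "very")],
}
TEST_DOC = [("VERB", "that"), ("NOUN", "the")]

print(attribute(TEST_DOC, TRAINING))  # author_a
```

Relative frequencies keep long and short documents comparable; a real system would typically add feature selection and a stronger classifier on top of the same sn-gram counts.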
| Method Type | Features Used | Accuracy |
|---|---|---|
| Deep Learning (Syntactic Only) | Syntax Tree Embeddings | 10.08% |
| Deep Learning (Syntactic + Lexical) | Syntax Tree + Lexical Embeddings | 81.00% |
| Proposed Mixed SN-Grams | Lexical, POS, DR Tags | 74.36% |
The deep learning model that combines syntactic and lexical embeddings achieves the highest accuracy (81.00%), but the proposed mixed sn-gram approach, which relies solely on feature engineering, comes within about 6.6 percentage points of it and far outperforms the syntax-only deep learning model (10.08%). The approach is also more interpretable and less resource-intensive, which makes it suitable for scenarios where deep learning models might overfit or require significant computational power.
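The interpretability argument rests on the features being explicit, human-readable sn-grams. As a sketch of what that enables, the function below ranks the sn-grams an author uses more often than the other authors do on average. The scoring rule (own relative frequency minus the mean relative frequency across other authors) is an illustrative assumption, not a method from the study, and the corpus is hypothetical.

```python
from collections import Counter


def relative_freqs(sngrams):
    """Map each sn-gram to its share of all sn-grams in the list."""
    counts = Counter(sngrams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def distinctive_sngrams(author, corpus, k=3):
    """Rank an author's sn-grams by how much more often that author uses them
    than the other authors do on average (a simple, human-readable score)."""
    own = relative_freqs(corpus[author])
    others = [relative_freqs(grams) for name, grams in corpus.items() if name != author]
    scores = {}
    for gram, freq in own.items():
        # Average usage by the other authors; 0.0 if there is no one to compare with.
        baseline = sum(o.get(gram, 0.0) for o in others) / len(others) if others else 0.0
        scores[gram] = freq - baseline
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical POS-Word sn-grams (n=2) per author.
CORPUS = {
    "author_a": [("VERB", "that"), ("VERB", "that"), ("NOUN", "the"), ("NOUN", "of")],
    "author_b": [("VERB", "which"), ("NOUN", "the"), ("NOUN", "the"), ("ADJ", "very")],
}

for gram, score in distinctive_sngrams("author_a", CORPUS, k=2):
    print(gram, round(score, 2))
```

Here the top-ranked feature for `author_a` is `('VERB', 'that')`, a pattern an analyst can read and verify in the source texts, which is much harder to do with dense learned embeddings.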
Implementation Roadmap
Our structured approach ensures a seamless integration of AI, maximizing your return on investment with minimal disruption.
Phase 1: Discovery & Strategy
Comprehensive analysis of existing workflows and identification of AI integration points.
Phase 2: Pilot & Validation
Deployment of a prototype solution on a limited scale for performance validation.
Phase 3: Full-Scale Deployment
Seamless integration of the validated AI solution across all relevant operations.