Enterprise AI Analysis
Detecting GenAI assistance in programming assessments with over-uniqueness and sample matching
In engineering education, Generative Artificial Intelligence (GenAI) might be misused to complete assessments with limited understanding. In courses that allow the use of GenAI, students might also forget to acknowledge its assistance. There is a need to identify such assistance. We present an automated detector built on two mechanisms: over-uniqueness and sample matching. GenAI-assisted submissions are identified based on their uniqueness relative to other submissions and their similarity to a GenAI-generated sample. Unlike existing approaches, our detector requires neither training data nor dedicated rules for each programming/scripting language. Further, the method can be integrated into any existing similarity detector used to identify plagiarism. The detector covers five similarity measurements, two similarity modes, and eight programming/scripting languages. Our evaluation of four data sets with thousands of submissions shows that our detector is effective (71% MAP). However, many factors can affect its effectiveness, including submission length and student attempts to align GenAI-generated code with their own style. Combining both mechanisms does not result in higher effectiveness, yet it takes longer to process.
Executive Impact at a Glance
This research introduces a novel, practical GenAI detection method, designed to integrate seamlessly into existing academic integrity frameworks while offering robust performance across diverse programming contexts.
Deep Analysis & Enterprise Applications
The Rise of GenAI in Academia and the Need for Detection
Generative Artificial Intelligence (GenAI) is rapidly transforming academia, offering unprecedented ways to access information and complete tasks. While GenAI can be a powerful learning aid, helping students understand program flow and error messages and providing feedback on code, it also introduces significant challenges, particularly in programming assessments. Because GenAI can generate solutions so easily, students might complete assessments without genuine understanding or forget to acknowledge its assistance, leading to academic integrity issues.
Current GenAI detectors often fall short for programming tasks: they require extensive training data, rely on language-specific syntax rules, or lack integration with existing plagiarism detection systems. This creates a practical barrier for instructors who need to identify unacknowledged GenAI assistance efficiently. This research addresses the gap with a novel, automated GenAI detector that is both practical and effective.
Our Novel GenAI Detection Methodology
Our detector identifies GenAI-assisted submissions through two primary mechanisms: over-uniqueness and sample matching. GenAI-generated code often exhibits a unique style, distinct from typical student submissions, and produces similar solutions for common prompts. By analyzing these characteristics, we can effectively flag potentially assisted code. A key advantage is its independence from training data and language-specific rules, making it highly adaptable across various programming contexts.
The detector integrates with existing code similarity analysis tools such as SSTRANGE and supports five similarity measurements (Cosine, Jaccard, MinHash, Super-Bit, and RKRGST), each available in a standard and a sensitive mode; the sensitive mode additionally accounts for identifier names and constants. This range of options lets instructors balance detection accuracy against processing time.
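To make the two modes concrete, here is a minimal Python sketch that tokenizes code and compares two submissions with Cosine and Jaccard similarity. The tokenizer, the keyword list, and the `<ID>`/`<LITERAL>` placeholders are simplified assumptions for illustration; they do not reproduce SSTRANGE's actual preprocessing.

```python
import math
import re
from collections import Counter

# Rough tokenizer: identifiers/keywords, numbers, string literals, multi-char operators, other symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\"[^\"]*\"|'[^']*'|==|!=|<=|>=|\S")

# Illustrative (non-exhaustive) keyword list kept verbatim in standard mode.
KEYWORDS = {"def", "return", "if", "else", "for", "while", "import", "class",
            "public", "static", "void", "int", "new"}

def tokenize(code, sensitive):
    """Sensitive mode keeps identifier names and constants; standard mode generalizes them."""
    tokens = TOKEN_RE.findall(code)
    if sensitive:
        return tokens
    generalized = []
    for tok in tokens:
        if tok in KEYWORDS:
            generalized.append(tok)
        elif tok[0].isdigit() or tok[0] in "\"'":
            generalized.append("<LITERAL>")   # numbers and string literals
        elif tok[0].isalpha() or tok[0] == "_":
            generalized.append("<ID>")        # identifiers
        else:
            generalized.append(tok)           # operators and punctuation
    return generalized

def cosine_similarity(a, b):
    """Cosine similarity over token frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def jaccard_similarity(a, b):
    """Jaccard similarity over sets of distinct tokens."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

# Two submissions that differ only in identifier names: identical in standard mode,
# noticeably less similar in sensitive mode.
code_a = "def total(xs):\n    return sum(xs)"
code_b = "def add_all(values):\n    return sum(values)"
for sensitive in (False, True):
    ta, tb = tokenize(code_a, sensitive), tokenize(code_b, sensitive)
    print(sensitive, round(cosine_similarity(ta, tb), 2), round(jaccard_similarity(ta, tb), 2))
```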
Enterprise Process Flow
The process begins with preprocessing submissions to normalize code and generalize identifiers. Next, similarity scores are calculated. These scores then feed into our over-uniqueness and sample matching algorithms. Finally, a detailed report highlights suspicious submissions, enabling educators to make informed decisions about academic integrity.
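One plausible way to turn those similarity scores into a ranked report is sketched below, reusing `tokenize` and `cosine_similarity` from the previous example. The use of maximum peer similarity for over-uniqueness, the max-of-both-scores ranking, and the `report` helper are illustrative assumptions, not the paper's exact algorithm.

```python
def similarity_matrix(token_lists):
    """Pairwise similarity between all submissions (any of the five measurements could be plugged in)."""
    n = len(token_lists)
    sims = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine_similarity(token_lists[i], token_lists[j])
            sims[i][j] = sims[j][i] = s
    return sims

def over_uniqueness_scores(token_lists):
    """Over-uniqueness: a submission is suspicious when it resembles no peer,
    so score = 1 - (highest similarity to any other submission)."""
    sims = similarity_matrix(token_lists)
    return [1.0 - max(row) for row in sims]

def sample_matching_scores(token_lists, genai_samples):
    """Sample matching: a submission is suspicious when it closely resembles
    any GenAI-generated sample, so score = highest similarity to a sample."""
    return [max(cosine_similarity(t, s) for s in genai_samples) for t in token_lists]

def report(names, codes, genai_codes, sensitive=True, top_k=10):
    """Rank submissions so the most likely GenAI-assisted ones appear first."""
    tokens = [tokenize(c, sensitive) for c in codes]
    samples = [tokenize(c, sensitive) for c in genai_codes]
    uniqueness = over_uniqueness_scores(tokens)
    matching = sample_matching_scores(tokens, samples)
    # Illustrative combination: a submission is flagged if either mechanism scores it highly.
    ranked = sorted(zip(names, uniqueness, matching),
                    key=lambda row: max(row[1], row[2]), reverse=True)
    return ranked[:top_k]
```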
Robust Evaluation & Performance Metrics
Our detector was evaluated across four diverse Python and Java data sets, encompassing thousands of student submissions and GenAI-assisted examples. Effectiveness was assessed with Mean Average Precision (MAP), which focuses on the ranked position of identified GenAI-assisted submissions; efficiency was measured by processing time.
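For context, MAP rewards a detector that ranks genuinely GenAI-assisted submissions near the top of its report. The snippet below computes it using the standard information-retrieval definition; the function names and example data are illustrative and not taken from the paper.

```python
def average_precision(ranked_ids, genai_ids):
    """Average precision for one assessment: precision is sampled at the rank of
    each correctly identified GenAI-assisted submission."""
    relevant = set(genai_ids)
    hits, precision_sum = 0, 0.0
    for rank, sub_id in enumerate(ranked_ids, start=1):
        if sub_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over several assessments; each run is (ranked_ids, genai_ids)."""
    return sum(average_precision(r, g) for r, g in runs) / len(runs)

# Two GenAI-assisted submissions ranked 1st and 3rd: AP = (1/1 + 2/3) / 2 ≈ 0.83
print(average_precision(["s07", "s02", "s09", "s04"], ["s07", "s09"]))
```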
The over-uniqueness mechanism showed strong performance, especially on assessments expecting longer solutions (Exam dataset: 82% average MAP). The sensitive mode, which accounts for identifier names and constants, often improved detection, particularly for distinct GenAI styles. However, its effectiveness was significantly reduced when GenAI-generated code was explicitly aligned to student styles (Weekly Align dataset: 26% average MAP).
The sample matching mechanism proved highly effective, achieving an 80% average MAP on the Weekly dataset. Its strength lies in identifying code segments similar to known GenAI samples, with sensitive mode often yielding statistically significant improvements. This mechanism is particularly strong when students are less fluent in disguising GenAI output.
Combining both mechanisms yielded an overall average of 72% MAP. While sometimes outperforming individual mechanisms, the combination did not consistently result in higher effectiveness than sample matching alone, suggesting that not all GenAI-assisted submissions are simultaneously unique and similar to a GenAI sample. Efficiency analysis consistently showed MinHash and Super-Bit as the fastest measurements due to their locality-sensitive hashing and binning mechanisms, while RKRGST was the slowest due to its quadratic complexity.
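The speed advantage of MinHash (and similarly Super-Bit) comes from replacing exhaustive pairwise comparison with compact signatures and binning, so only candidate pairs that share a bin are compared in full. The sketch below shows a textbook MinHash with banding over the tokenized submissions from the earlier examples; the shingle size, hash count, and band count are illustrative choices, and Super-Bit, an LSH scheme suited to cosine similarity, is omitted for brevity.

```python
import hashlib

def shingles(tokens, k=4):
    """k-token shingles (n-grams) from a tokenized submission."""
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def _hash(seed, value):
    """Deterministic seeded hash via MD5 (illustrative; real implementations use cheaper hashes)."""
    return int(hashlib.md5(f"{seed}:{value}".encode()).hexdigest(), 16)

def minhash_signature(shingle_set, num_hashes=64):
    """For each seeded hash function, keep the smallest hash over the set:
    a compact fingerprint whose positions tend to match for similar sets."""
    if not shingle_set:
        return [0] * num_hashes
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions approximates the true Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def lsh_bins(signature, bands=16):
    """Split the signature into bands and hash each band; submissions sharing any
    band hash fall into the same bin, so only those candidate pairs are compared in full."""
    rows = len(signature) // bands
    return {hashlib.md5(str(signature[b * rows:(b + 1) * rows]).encode()).hexdigest()
            for b in range(bands)}

# Candidate pairs are those whose bin sets intersect, e.g.:
# if lsh_bins(sig_a) & lsh_bins(sig_b): compare the pair with a full similarity measurement.
```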
Mean Average Precision (MAP) by similarity measurement, mode, and data set:

| Similarity Measurement | Mode | Weekly (%) | Weekly Alt (%) | Weekly Align (%) | Exam (%) |
|---|---|---|---|---|---|
| Cosine | Sensitive | 71 | 80 | 25 | 79 |
| Cosine | Standard | 66 | 73 | 27 | 79 |
| Jaccard | Sensitive | 80 | 88 | 22 | 100 |
| Jaccard | Standard | 76 | 82 | 25 | 95 |
| MinHash | Sensitive | 71 | 83 | 25 | 83 |
| MinHash | Standard | 66 | 70 | 26 | 79 |
| RKRGST | Sensitive | 63 | 74 | 25 | 80 |
| RKRGST | Standard | 56 | 66 | 27 | 71 |
Case Study: Exam Data Set Highlights
The Exam data set presented unique characteristics, featuring longer submissions due to four tasks per exam and a strict "no discussion" policy. In this context, our approach achieved its highest effectiveness, with an 82% average MAP. Notably, Jaccard in sensitive mode reached 100% MAP, demonstrating exceptional precision when GenAI-generated code had minimal external influence and clear stylistic differences. This outcome underscores the detector's power in controlled, high-stakes assessment environments where GenAI assistance is less disguised and solutions are more complex.
Limitations & Future Research Directions
While effective, our current GenAI detector has certain limitations. It was primarily evaluated on introductory programming courses and Python/Java submissions, suggesting a need for replication across diverse programming languages, course levels, and institutional settings. The chosen metrics (MAP, processing time) provide strong indicators, but exploring precision, recall, and ROC curves could offer a more nuanced understanding of performance trade-offs.
Future work will involve testing additional similarity measurements like Winnowing or local alignment, and comparing our detector against existing text-based and programming-specific GenAI detectors in controlled environments. Investigating the impact of submission length, content variation, assessment design, and the proportion of aligned content on detection performance is crucial. We also plan to develop a script to leverage results from popular plagiarism detectors (MOSS, JPlag, Sherlock) and explore additional mechanisms for identifying heavily disguised or aligned GenAI-assisted submissions, potentially by monitoring the creation process for anomalous behaviors. Enhancements to token weighting and the combination of over-uniqueness and sample matching mechanisms are also planned to maximize effectiveness.
Ultimately, this research serves as a stepping stone towards more robust and adaptive tools for maintaining academic integrity in the evolving landscape of AI-assisted learning.
Your AI Implementation Roadmap
A phased approach to integrating advanced AI solutions for academic integrity, ensuring a smooth transition and maximum impact.
Phase 1: Discovery & Strategy
Comprehensive assessment of your current academic integrity challenges, existing systems, and institutional goals. Define key performance indicators and tailor an AI strategy to your unique needs.
Phase 2: Solution Design & Customization
Design and customize the GenAI detection framework, integrating it with your learning management systems and existing plagiarism detectors. Develop custom rules and thresholds based on your assessment types.
Phase 3: Pilot Implementation & Testing
Roll out the solution in a pilot program with selected courses. Gather feedback, conduct rigorous testing, and fine-tune the system for optimal performance and user experience.
Phase 4: Full-Scale Deployment & Training
Deploy the AI integrity solution across your institution. Provide comprehensive training for instructors and administrators on system usage, report interpretation, and best practices.
Phase 5: Continuous Optimization & Support
Ongoing monitoring, performance analysis, and iterative improvements. Benefit from continuous updates, dedicated support, and adaptation to new GenAI models and academic policies.
Ready to Implement AI in Your Enterprise?
Our team specializes in leveraging advanced AI solutions to enhance academic integrity and operational efficiency. Book a free consultation to see how our expertise can benefit your institution.