Enterprise AI Analysis
LLM-based Vulnerable Code Augmentation: Generate or Refactor?
This research explores Large Language Model (LLM)-based data augmentation techniques to address the severe class imbalance in vulnerable-code datasets, which limits the effectiveness of Deep Learning classifiers. It compares controlled generation of new vulnerable samples against semantics-preserving refactoring of existing ones, finding that a hybrid strategy combining both significantly boosts vulnerability classification performance.
Executive Impact & Key Findings
LLM-based code augmentation offers a promising avenue for improving vulnerability detection. A hybrid approach, combining both generation and refactoring, yields the most significant performance gains for Deep Learning classifiers, making it a critical strategy for enhancing software security.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Explore the two distinct LLM-based augmentation strategies: controlled generation of new vulnerable code and semantics-preserving refactoring of existing functions.
LLM-based Code Augmentation Process
Generation-based Data Augmentation
This strategy involves synthesizing entirely new vulnerable functions using a few-shot prompting scheme. The LLM (Qwen2.5-Coder-32B) is provided with examples from the training set and instructed to generate new, independent functions per vulnerability type.
Strict system and user messages instruct the model to act as an expert developer, follow project-style conventions, produce realistic logic containing the target vulnerability, and adhere to output constraints (e.g., 20-150 non-empty lines, no comments). Generated samples then undergo syntax parsing and label quality verification (using GPT-5.1).
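The paper does not reproduce its exact prompts, so the following is only a minimal sketch of the few-shot generation step, assuming Qwen2.5-Coder-32B is served behind an OpenAI-compatible endpoint (e.g., a local vLLM server); the prompt wording, `base_url`, and helper names are illustrative assumptions.

```python
from openai import OpenAI

# Assumption: Qwen2.5-Coder-32B is exposed via an OpenAI-compatible API (e.g., vLLM).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # hypothetical endpoint

SYSTEM_MSG = (
    "You are an expert C/C++ developer. Write realistic, project-style functions "
    "that contain the requested vulnerability. Output 20-150 non-empty lines, "
    "no comments, and return only the function."
)  # illustrative wording, not the paper's exact system message

def build_user_msg(cwe_id: str, few_shot_examples: list[str]) -> str:
    """Compose a few-shot user message from training-set examples of one CWE."""
    shots = "\n\n".join(f"Example {i + 1}:\n{code}" for i, code in enumerate(few_shot_examples))
    return (
        f"Here are vulnerable functions labeled {cwe_id}:\n\n{shots}\n\n"
        f"Generate one new, independent function containing a {cwe_id} vulnerability."
    )

def generate_sample(cwe_id: str, few_shot_examples: list[str]) -> str:
    """Request one synthetic vulnerable function for the given CWE."""
    response = client.chat.completions.create(
        model="Qwen/Qwen2.5-Coder-32B-Instruct",
        messages=[
            {"role": "system", "content": SYSTEM_MSG},
            {"role": "user", "content": build_user_msg(cwe_id, few_shot_examples)},
        ],
        temperature=0.8,  # some sampling diversity across generated functions
    )
    return response.choices[0].message.content
```

Each returned sample would then pass through the syntax and label checks described above before joining the training pool.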
Refactoring-based Data Augmentation
Here, augmented samples are produced by refactoring existing vulnerable functions from the dataset. For each vulnerability type and function, the LLM generates n refactored variants, applying a selection drawn from 18 common refactoring techniques (e.g., renaming, dead code insertion, logic-preserving rewrites).
The prompting emphasizes preserving original semantics, parameter lists, return types, and the vulnerability itself, while strictly forbidding dangerous operations. Each refactored function must apply at least two distinct transformations. Quality checks focus on syntax and refactoring integrity.
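As a hedged sketch of this step (the exact prompt and the full list of 18 techniques are not reproduced here), one variant loop per vulnerable function might look as follows; the technique subset, endpoint, and prompt text are assumptions.

```python
import random
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # hypothetical endpoint

# Illustrative subset of the 18 refactoring techniques mentioned in the study.
REFACTORINGS = [
    "rename local variables",
    "insert dead code that never executes",
    "rewrite loops in an equivalent form",
    "extract a repeated expression into a temporary variable",
]

REFACTOR_SYSTEM_MSG = (
    "Refactor the given vulnerable function. Preserve its semantics, parameter list, "
    "return type, and the vulnerability itself. Apply at least two distinct "
    "transformations. Never introduce dangerous operations."
)  # illustrative wording

def refactor_variants(function_code: str, cwe_id: str, n: int) -> list[str]:
    """Ask the LLM for n refactored variants of one vulnerable function."""
    variants = []
    for _ in range(n):
        chosen = random.sample(REFACTORINGS, k=2)  # at least two distinct techniques
        user_msg = (
            f"Vulnerability: {cwe_id}\n"
            f"Apply these refactorings: {', '.join(chosen)}\n\n{function_code}"
        )
        response = client.chat.completions.create(
            model="Qwen/Qwen2.5-Coder-32B-Instruct",
            messages=[
                {"role": "system", "content": REFACTOR_SYSTEM_MSG},
                {"role": "user", "content": user_msg},
            ],
        )
        variants.append(response.choices[0].message.content)
    return variants
```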
Details on the dataset, models, and evaluation metrics used to assess the effectiveness of LLM-based code augmentation.
Dataset and Models
The study utilizes the SVEN Dataset [11], a carefully curated collection of security-related commits and critical CWE types from 2023, split into 80% training and 20% validation. This dataset is known for its quality and focus on critical vulnerabilities.
For augmented data generation, Qwen2.5-Coder-32B was selected due to its high rank in code LLM benchmarks for C/C++ and Python. CodeBERT served as the vulnerability classifier, chosen for its established code representation capabilities and its lightweight footprint for fine-tuning.
The technical setup included 3 A100-SXM4 GPUs and 16GB RAM for efficient processing.
Vulnerable samples per CWE in the SVEN dataset:

| CWE | CWE-89 | CWE-125 | CWE-78 | CWE-476 | CWE-416 | CWE-22 | CWE-787 | CWE-79 | CWE-190 |
|---|---|---|---|---|---|---|---|---|---|
| Samples | 141 | 107 | 69 | 60 | 45 | 42 | 41 | 39 | 32 |
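On the classifier side, a minimal fine-tuning sketch is shown below, assuming the Hugging Face `transformers` checkpoint `microsoft/codebert-base` and a nine-class labeling scheme matching the CWEs in the table above; the hyperparameters and the toy example are illustrative, not the paper's setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CWE_LABELS = ["CWE-89", "CWE-125", "CWE-78", "CWE-476", "CWE-416",
              "CWE-22", "CWE-787", "CWE-79", "CWE-190"]

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=len(CWE_LABELS)
)

def encode(functions: list[str], labels: list[int]):
    """Tokenize source functions and attach integer CWE labels."""
    batch = tokenizer(functions, truncation=True, padding=True,
                      max_length=512, return_tensors="pt")
    batch["labels"] = torch.tensor(labels)
    return batch

# One illustrative optimization step over a tiny batch (toy out-of-bounds write).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
batch = encode(["void f(char *s) { char b[8]; strcpy(b, s); }"],
               [CWE_LABELS.index("CWE-787")])
loss = model(**batch).loss
loss.backward()
optimizer.step()
```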
Key findings from the assessment of augmentation approaches and their impact on vulnerability classifier performance.
| Approach | No. of samples | % of augmentation | Avg. time per sample | Syntax quality | Label quality | Refactor quality |
|---|---|---|---|---|---|---|
| Generation | 3348 | 581% | 13.38s | 98.5% | 0% | N/A |
| Refactoring | 1224 | 213% | 59.08s | 79.7% | N/A | 100% |
| Training data | Original data | Generation augmented | Refactoring augmented | Both augmentations |
|---|---|---|---|---|
| Macro F1 | 0.62 | 0.64 | 0.60 | 0.67 |
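Macro F1 averages the per-CWE F1 scores with equal weight, so gains on minority classes are not drowned out by the majority ones. A minimal computation with scikit-learn, using hypothetical labels for illustration:

```python
from sklearn.metrics import f1_score

# Hypothetical ground-truth and predicted CWE labels for a handful of samples.
y_true = ["CWE-89", "CWE-125", "CWE-22", "CWE-22", "CWE-787"]
y_pred = ["CWE-89", "CWE-125", "CWE-22", "CWE-787", "CWE-787"]

# Macro F1: compute F1 per class, then take the unweighted mean across classes.
print(f1_score(y_true, y_pred, average="macro"))
```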
RQ1: Effectiveness & Quality of LLM-based Augmentation
LLM-based augmentation is effective in enriching vulnerable code-bases: Generation increased the dataset by 581% (3348 new samples on top of the 576 original vulnerable ones listed above) and Refactoring by 213% (1224 samples).
Syntactic quality was high for Generation (98.5%) and lower for Refactoring (79.7%), while refactoring integrity was perfect (100%). However, label quality for generated samples was rated at a surprising 0% when verified by GPT-5.1; the same verifier also rejected labels in the original dataset for many CWEs, so this result requires further investigation.
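The paper only states that samples undergo syntax parsing and LLM-based label verification, so the following sketch of how such checks might be wired up rests on assumptions: the `gcc -fsyntax-only` call and the verifier prompt are illustrative choices, not the authors' tooling.

```python
import subprocess
import tempfile

def passes_syntax_check(function_code: str) -> bool:
    """Illustrative syntax check: ask the C compiler front-end to parse the snippet."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".c", delete=False) as f:
        f.write(function_code)
        path = f.name
    result = subprocess.run(["gcc", "-fsyntax-only", path], capture_output=True)
    return result.returncode == 0

def passes_label_check(function_code: str, cwe_id: str, verifier_client) -> bool:
    """Illustrative label check: ask a verifier LLM whether the claimed CWE is present."""
    response = verifier_client.chat.completions.create(
        model="gpt-5.1",  # verifier model named in the study; access details are assumed
        messages=[{
            "role": "user",
            "content": f"Does this function contain a {cwe_id} vulnerability? "
                       f"Answer YES or NO.\n\n{function_code}",
        }],
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")
```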
RQ2: Boosting DL Performance
Generation-based augmentation alone improved overall Macro F1 from 0.62 to 0.64 (+0.02). Refactoring-based augmentation alone slightly lowered the overall score (0.60), though it helped some minority CWEs (e.g., CWE-22 by 18%).
The most effective strategy was a hybrid approach combining both augmentations, leading to an overall Macro F1 score of 0.67 (+0.05), and boosts across all CWEs (up to 18%).
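The hybrid strategy amounts to training on the union of the original data and both augmented pools. A minimal sketch, where the file names and record fields are assumptions:

```python
import json

def load_samples(path: str) -> list[dict]:
    """Each JSONL record is assumed to hold a 'code' string and a 'cwe' label."""
    with open(path) as f:
        return [json.loads(line) for line in f]

# Hybrid training set: original samples plus both augmentation pools.
train_set = (
    load_samples("sven_train.jsonl")      # original SVEN training split (assumed filename)
    + load_samples("generated.jsonl")     # generation-based augmentation
    + load_samples("refactored.jsonl")    # refactoring-based augmentation
)
print(f"{len(train_set)} training samples after hybrid augmentation")
```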
Calculate Your Potential ROI
Estimate the impact of advanced AI integration on your operational efficiency and cost savings.
Your AI Implementation Roadmap
A structured approach to integrating cutting-edge AI for maximum impact and minimal disruption.
Phase 01: Discovery & Strategy
Comprehensive assessment of your current infrastructure, identification of key pain points, and definition of strategic AI objectives tailored to your business goals. This phase culminates in a detailed proposal and a clear roadmap.
Phase 02: Pilot & Proof-of-Concept
Deployment of a small-scale AI solution in a controlled environment to validate the technology, demonstrate tangible value, and gather initial feedback. This iterative process ensures alignment with expectations.
Phase 03: Full-Scale Integration
Seamless integration of the AI solution across relevant departments, including data migration, system customization, and comprehensive training for your teams. We ensure minimal disruption and maximum adoption.
Phase 04: Optimization & Scaling
Continuous monitoring, performance tuning, and iterative enhancements based on real-world usage and evolving business needs. We support scaling the solution to new areas for sustained competitive advantage.
Ready to Transform Your Enterprise with AI?
Unlock the full potential of your data and operations. Our experts are ready to guide you through a tailored AI strategy and implementation.