Software Engineering Research
A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Code Smell Detection
This research provides a comprehensive evaluation of Parameter-Efficient Fine-Tuning (PEFT) methods for code smell detection using Large Language Models (LLMs) and Small Language Models (SLMs). We construct a high-quality, source-code-centric benchmark and systematically evaluate four PEFT methods across a range of LMs, comparing them against traditional heuristics, DL-based approaches, and In-Context Learning (ICL) with general-purpose LLMs. Our findings show that PEFT methods match or exceed full fine-tuning performance while significantly reducing peak GPU memory usage, and that PEFT-tuned models outperform all baselines. We offer actionable insights into PEFT method selection based on the model, data, and computational resources available.
Executive Impact & Key Findings
Our study reveals significant advancements in automated code smell detection, offering tangible benefits for software development teams.
Deep Analysis & Enterprise Applications
PEFT Methods vs. Full Fine-tuning
Our analysis shows that all four PEFT methods (prompt tuning, prefix tuning, LoRA, and (IA)³) achieve effectiveness comparable to or better than full fine-tuning on most small models for code smell detection, while updating far fewer parameters and significantly reducing computational overhead and peak GPU memory usage. This makes PEFT a highly efficient approach for code smell detection. For LLMs, (IA)³ consistently outperforms the other PEFT techniques.
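To make the "far fewer parameters" claim concrete, here is a minimal, illustrative sketch of the LoRA idea (not the paper's code): the pretrained weight is frozen, and only a low-rank update B·A is trained, costing r·(d_in + d_out) parameters instead of d_in·d_out. Dimensions are tiny for readability; at a realistic scale (d = 768, r = 8) this is 12,288 trainable parameters versus 589,824, roughly 2%.

```python
import random

# Illustrative LoRA sketch with toy dimensions (assumed values, not the paper's).
random.seed(0)
d_in, d_out, rank, alpha = 6, 6, 2, 4

def matvec(M, v):
    """Plain matrix-vector product over nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen pretrained weight
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable down-projection
B = [[0.0] * rank for _ in range(d_out)]                                 # trainable up-projection, zero-init

def lora_forward(x):
    """Forward pass: frozen weight plus scaled low-rank adapter update."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * d for b, d in zip(base, delta)]

x = [1.0] * d_in
full_params = d_in * d_out            # 36 here; 589,824 at d = 768
lora_params = rank * (d_in + d_out)   # 24 here; 12,288 at d = 768, r = 8
# Zero-initialised B makes the adapter a no-op before training begins:
assert lora_forward(x) == matvec(W, x)
```

The zero-initialised up-projection is the standard LoRA trick: fine-tuning starts exactly at the pretrained model and only gradually departs from it.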
Dataset Construction Pipeline
Effectiveness of Different PEFT Methods
For SLMs, both (IA)³ and prefix tuning achieve strong performance, while for LLMs, (IA)³ consistently delivers the best results across all four types of code smells. The optimal PEFT method for SLMs depends on the specific model. LoRA, despite updating more parameters than the other PEFT methods, sometimes shows instability and weaker effectiveness.
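(IA)³'s efficiency comes from how little it changes: instead of low-rank weight updates, it learns per-dimension scaling vectors that elementwise rescale keys, values, and intermediate FFN activations, only about 3·d parameters per layer. A minimal sketch of that intervention (illustrative, not the paper's implementation):

```python
# Illustrative (IA)^3 sketch: learned vectors l_k, l_v, l_ff rescale frozen
# activations elementwise. Dimension d is an assumed toy value.
d = 8
l_v = [1.0] * d  # trainable scaling vector; ones-init => identity at the start

def rescale(activations, scale):
    """(IA)^3 intervention: elementwise product of activations and learned vector."""
    return [a * s for a, s in zip(activations, scale)]

values = [0.5] * d
# Ones-initialisation leaves the frozen model's behaviour unchanged before training:
assert rescale(values, l_v) == values

ia3_params = 3 * d  # l_k + l_v + l_ff per layer: far fewer even than LoRA's r*(d_in + d_out)
```

Because only these vectors receive gradients, both the optimizer state and peak GPU memory stay small, which is consistent with (IA)³'s strong showing on LLMs above.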
Challenges in Code Smell Detection
Traditional heuristics-based tools suffer from high sensitivity to threshold selection and limited semantic understanding, while ML/DL models deliver unsatisfactory detection performance. Large Language Models (LLMs) are promising but are held back by prohibitive full fine-tuning costs and the lack of LM-ready benchmarks: existing datasets rely primarily on software metrics or contain noisy, unverified labels.
| Method Category | Key Advantages | Limitations |
|---|---|---|
| PEFT-tuned LMs | Match or exceed full fine-tuning; far fewer updated parameters; lower peak GPU memory | Best method varies by model and code smell type |
| Heuristics-based Detectors | Lightweight; no training data required | Highly sensitive to threshold selection; limited semantic understanding |
| DL-based Approaches | Learn detection patterns from labeled code | Unsatisfactory detection performance |
| LLMs with ICL (Zero/Few-shot) | No task-specific training needed | Outperformed by PEFT-tuned models |
LLMs vs. SLMs Performance
For Complex Conditional (CC) and Complex Method (CM) detection, SLMs and LLMs perform comparably. Notably, GraphCodeBERT (an SLM) significantly outperforms all other models, including LLMs, on Data Class detection. For Feature Envy, however, LLMs fine-tuned with PEFT methods substantially outperform SLMs: the smell's complex semantics benefit from the richer contextual representations of larger models.
Impact of Low-Resource Scenarios
Challenge: When training data is limited (e.g., 50 samples), the effectiveness of PEFT methods drops. However, performance significantly improves as training samples increase to 250 or 500, with several PEFT techniques outperforming full fine-tuning.
Solution: LoRA tends to be the most effective PEFT method in scarce data scenarios. Starting fine-tuning experiments with a smaller dataset is an effective strategy to optimize resource usage, evaluate models quickly, and identify suitable PEFT methods for specific tasks, ultimately saving computational effort.
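The staged strategy above can be sketched as a small pilot loop: draw stratified subsets of 50, 250, and 500 samples and trial each candidate PEFT method on them before committing to the full corpus. The corpus, labels, and the `fine_tune` step are hypothetical placeholders, not the paper's dataset or code.

```python
import random

# Hypothetical pilot-study sketch: trial PEFT methods on growing stratified
# subsets before full-scale fine-tuning. Data below is synthetic.
random.seed(42)
corpus = [{"id": i, "smelly": i % 4 == 0} for i in range(2000)]  # toy labeled examples

def stratified_subset(data, n):
    """Sample n examples while preserving the smelly/clean ratio."""
    pos = [d for d in data if d["smelly"]]
    neg = [d for d in data if not d["smelly"]]
    k_pos = round(n * len(pos) / len(data))
    return random.sample(pos, k_pos) + random.sample(neg, n - k_pos)

for n in (50, 250, 500):
    subset = stratified_subset(corpus, n)
    share = sum(d["smelly"] for d in subset) / n
    # fine_tune(model, peft_method, subset) would run here for each candidate method
    print(f"n={n}: {share:.0%} smelly")
```

Stratification matters in this setting because code smell data is typically imbalanced; a 50-sample subset drawn uniformly could easily contain almost no positive examples.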
Estimate Your AI ROI for Code Quality
Discover the potential savings and efficiency gains your organization could achieve by implementing AI-driven code smell detection with PEFT.
Your Implementation Roadmap
A phased approach to integrate PEFT-tuned LMs into your development workflow for enhanced code quality.
Phase 1: Initial Assessment & Setup
Evaluate existing code quality processes, identify critical code smells, and set up the PEFT environment. This phase involves data collection, model selection (SLM/LLM), and initial training on a smaller dataset for rapid prototyping.
Phase 2: PEFT Fine-Tuning & Optimization
Apply selected PEFT methods (e.g., (IA)³ for LLMs, Prefix Tuning for SLMs) using curated datasets. Optimize hyper-parameters based on code smell type, model, and available resources. Benchmark against baselines.
Phase 3: Integration & Real-time Deployment
Integrate PEFT-tuned LMs into CI/CD pipelines for real-time code smell detection. Provide immediate feedback to developers, enhancing code quality throughout the development lifecycle.
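One common integration shape is a CI gate that scans changed files and fails the pipeline when smells are detected. In this sketch, `detect_smells` is a hypothetical stand-in for a call to a PEFT-tuned model (e.g. served behind an internal endpoint); it is not an API from the paper, and the length heuristic inside it is purely a placeholder.

```python
# Hypothetical CI-gate sketch; `detect_smells` stubs out the model call.
def detect_smells(source: str) -> list:
    """Placeholder detector: flag suspiciously long files as 'Complex Method'."""
    return ["Complex Method"] if source.count("\n") > 80 else []

def ci_gate(changed_files: dict) -> int:
    """Report smells per changed file; return a nonzero exit code if any are found."""
    findings = {path: smells
                for path, source in changed_files.items()
                if (smells := detect_smells(source))}
    for path, smells in findings.items():
        print(f"{path}: {', '.join(smells)}")
    return 1 if findings else 0

# Example invocation with two toy files; one trips the placeholder detector.
exit_code = ci_gate({"src/order.py": "\n" * 120, "src/util.py": "x = 1\n"})
```

Returning a process exit code keeps the gate tool-agnostic: any CI system (GitHub Actions, GitLab CI, Jenkins) can fail the build on a nonzero status without bespoke integration.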
Phase 4: Monitoring & Continuous Improvement
Monitor model performance, identify new code smell patterns, and continuously refine PEFT strategies. Explore advanced techniques like Retrieval-Augmented Generation (RAG) for further accuracy enhancements and scalability.
Ready to Transform Your Code Quality?
Schedule a personalized strategy session with our AI experts to explore how PEFT-tuned LMs can benefit your organization.