Software Engineering Research
A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Code Smell Detection
This research provides a comprehensive evaluation of Parameter-Efficient Fine-Tuning (PEFT) methods for code smell detection using Large Language Models (LLMs) and Small Language Models (SLMs). We construct a high-quality, source-code-centric benchmark and systematically evaluate four PEFT methods across a range of LMs, comparing them against traditional heuristics, DL-based approaches, and In-Context Learning (ICL) with general-purpose LLMs. Our findings show that PEFT methods match or exceed full fine-tuning performance while significantly reducing peak GPU memory usage, and that PEFT-tuned models outperform all baselines. We offer actionable insights into PEFT method selection based on the model, data, and computational resources available.
Executive Impact & Key Findings
Our study reveals significant advancements in automated code smell detection, offering tangible benefits for software development teams.
Deep Analysis & Enterprise Applications
PEFT Methods vs. Full Fine-tuning
Our analysis shows that all four PEFT methods (prompt tuning, prefix tuning, LoRA, and (IA)³) achieve effectiveness comparable to or better than full fine-tuning on most small models for code smell detection, while updating far fewer parameters and significantly reducing computational overhead and peak GPU memory usage. This makes PEFT a highly efficient approach for code smell detection. For LLMs, (IA)³ consistently outperforms the other PEFT techniques.
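To make the "far fewer parameters" claim concrete, here is a minimal, illustrative sketch of the LoRA idea (not the paper's code): the pretrained weight is frozen, and only a low-rank update B·A is trained, costing r·(d_in + d_out) parameters instead of d_in·d_out. Dimensions are tiny for readability; at a realistic scale (d = 768, r = 8) this is 12,288 trainable parameters versus 589,824, roughly 2%.

```python
import random

# Illustrative LoRA sketch with toy dimensions (assumed values, not the paper's).
random.seed(0)
d_in, d_out, rank, alpha = 6, 6, 2, 4

def matvec(M, v):
    """Plain matrix-vector product over nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen pretrained weight
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable down-projection
B = [[0.0] * rank for _ in range(d_out)]                                 # trainable up-projection, zero-init

def lora_forward(x):
    """Forward pass: frozen weight plus scaled low-rank adapter update."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * d for b, d in zip(base, delta)]

x = [1.0] * d_in
full_params = d_in * d_out            # 36 here; 589,824 at d = 768
lora_params = rank * (d_in + d_out)   # 24 here; 12,288 at d = 768, r = 8
# Zero-initialised B makes the adapter a no-op before training begins:
assert lora_forward(x) == matvec(W, x)
```

The zero-initialised up-projection is the standard LoRA trick: fine-tuning starts exactly at the pretrained model and only gradually departs from it.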
Dataset Construction Pipeline
Effectiveness of Different PEFT Methods
For SLMs, both (IA)³ and prefix tuning achieve strong performance, while for LLMs, (IA)³ consistently delivers the best results across all four types of code smells. The optimal PEFT method for SLMs depends on the specific model. LoRA, despite updating more parameters than the other PEFT methods, sometimes shows instability and weaker effectiveness.
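(IA)³'s efficiency comes from how little it changes: instead of low-rank weight updates, it learns per-dimension scaling vectors that elementwise rescale keys, values, and intermediate FFN activations, only about 3·d parameters per layer. A minimal sketch of that intervention (illustrative, not the paper's implementation):

```python
# Illustrative (IA)^3 sketch: learned vectors l_k, l_v, l_ff rescale frozen
# activations elementwise. Dimension d is an assumed toy value.
d = 8
l_v = [1.0] * d  # trainable scaling vector; ones-init => identity at the start

def rescale(activations, scale):
    """(IA)^3 intervention: elementwise product of activations and learned vector."""
    return [a * s for a, s in zip(activations, scale)]

values = [0.5] * d
# Ones-initialisation leaves the frozen model's behaviour unchanged before training:
assert rescale(values, l_v) == values

ia3_params = 3 * d  # l_k + l_v + l_ff per layer: far fewer even than LoRA's r*(d_in + d_out)
```

Because only these vectors receive gradients, both the optimizer state and peak GPU memory stay small, which is consistent with (IA)³'s strong showing on LLMs above.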
Challenges in Code Smell Detection
Traditional heuristics-based tools suffer from high sensitivity to threshold selection and limited semantic understanding, while ML/DL models deliver unsatisfactory detection performance. Large Language Models (LLMs) are promising but are held back by prohibitive full fine-tuning costs and the lack of LM-ready benchmarks: existing datasets rely primarily on software metrics or contain noisy, unverified labels.
| Method Category | Key Advantages | Limitations |
|---|---|---|
| PEFT-tuned LMs | Match or exceed full fine-tuning; far fewer updated parameters; lower peak GPU memory | Best method varies by model and code smell type |
| Heuristics-based Detectors | Lightweight; no training data required | Highly sensitive to threshold selection; limited semantic understanding |
| DL-based Approaches | Learn detection patterns from labeled code | Unsatisfactory detection performance |
| LLMs with ICL (Zero/Few-shot) | No task-specific training needed | Outperformed by PEFT-tuned models |
LLMs vs. SLMs Performance
For Complex Conditional (CC) and Complex Method (CM) detection, SLMs and LLMs perform comparably. Notably, GraphCodeBERT (an SLM) significantly outperforms all other models, including LLMs, on Data Class detection. For Feature Envy, however, LLMs fine-tuned with PEFT methods substantially outperform SLMs: the smell's complex semantics benefit from the richer contextual representations of larger models.
Impact of Low-Resource Scenarios
Challenge: When training data is limited (e.g., 50 samples), the effectiveness of PEFT methods drops. However, performance significantly improves as training samples increase to 250 or 500, with several PEFT techniques outperforming full fine-tuning.
Solution: LoRA tends to be the most effective PEFT method in scarce data scenarios. Starting fine-tuning experiments with a smaller dataset is an effective strategy to optimize resource usage, evaluate models quickly, and identify suitable PEFT methods for specific tasks, ultimately saving computational effort.
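The staged strategy above can be sketched as a small pilot loop: draw stratified subsets of 50, 250, and 500 samples and trial each candidate PEFT method on them before committing to the full corpus. The corpus, labels, and the `fine_tune` step are hypothetical placeholders, not the paper's dataset or code.

```python
import random

# Hypothetical pilot-study sketch: trial PEFT methods on growing stratified
# subsets before full-scale fine-tuning. Data below is synthetic.
random.seed(42)
corpus = [{"id": i, "smelly": i % 4 == 0} for i in range(2000)]  # toy labeled examples

def stratified_subset(data, n):
    """Sample n examples while preserving the smelly/clean ratio."""
    pos = [d for d in data if d["smelly"]]
    neg = [d for d in data if not d["smelly"]]
    k_pos = round(n * len(pos) / len(data))
    return random.sample(pos, k_pos) + random.sample(neg, n - k_pos)

for n in (50, 250, 500):
    subset = stratified_subset(corpus, n)
    share = sum(d["smelly"] for d in subset) / n
    # fine_tune(model, peft_method, subset) would run here for each candidate method
    print(f"n={n}: {share:.0%} smelly")
```

Stratification matters in this setting because code smell data is typically imbalanced; a 50-sample subset drawn uniformly could easily contain almost no positive examples.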
Estimate Your AI ROI for Code Quality
Discover the potential savings and efficiency gains your organization could achieve by implementing AI-driven code smell detection with PEFT.
Your Implementation Roadmap
A phased approach to integrate PEFT-tuned LMs into your development workflow for enhanced code quality.
Phase 1: Initial Assessment & Setup
Evaluate existing code quality processes, identify critical code smells, and set up the PEFT environment. This phase involves data collection, model selection (SLM/LLM), and initial training on a smaller dataset for rapid prototyping.
Phase 2: PEFT Fine-Tuning & Optimization
Apply selected PEFT methods (e.g., (IA)³ for LLMs, Prefix Tuning for SLMs) using curated datasets. Optimize hyper-parameters based on code smell type, model, and available resources. Benchmark against baselines.
Phase 3: Integration & Real-time Deployment
Integrate PEFT-tuned LMs into CI/CD pipelines for real-time code smell detection. Provide immediate feedback to developers, enhancing code quality throughout the development lifecycle.
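One common integration shape is a CI gate that scans changed files and fails the pipeline when smells are detected. In this sketch, `detect_smells` is a hypothetical stand-in for a call to a PEFT-tuned model (e.g. served behind an internal endpoint); it is not an API from the paper, and the length heuristic inside it is purely a placeholder.

```python
# Hypothetical CI-gate sketch; `detect_smells` stubs out the model call.
def detect_smells(source: str) -> list:
    """Placeholder detector: flag suspiciously long files as 'Complex Method'."""
    return ["Complex Method"] if source.count("\n") > 80 else []

def ci_gate(changed_files: dict) -> int:
    """Report smells per changed file; return a nonzero exit code if any are found."""
    findings = {path: smells
                for path, source in changed_files.items()
                if (smells := detect_smells(source))}
    for path, smells in findings.items():
        print(f"{path}: {', '.join(smells)}")
    return 1 if findings else 0

# Example invocation with two toy files; one trips the placeholder detector.
exit_code = ci_gate({"src/order.py": "\n" * 120, "src/util.py": "x = 1\n"})
```

Returning a process exit code keeps the gate tool-agnostic: any CI system (GitHub Actions, GitLab CI, Jenkins) can fail the build on a nonzero status without bespoke integration.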
Phase 4: Monitoring & Continuous Improvement
Monitor model performance, identify new code smell patterns, and continuously refine PEFT strategies. Explore advanced techniques like Retrieval-Augmented Generation (RAG) for further accuracy enhancements and scalability.
Ready to Transform Your Code Quality?
Schedule a personalized strategy session with our AI experts to explore how PEFT-tuned LMs can benefit your organization.