
LLMs for Cybersecurity

LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering

LLM4CodeRE addresses the challenge of malware reverse engineering using domain-adaptive large language models (LLMs). It introduces the first malware-aware causal language modeling (CLM) pretraining framework and a bidirectional reverse engineering framework supporting both assembly-to-source decompilation and source-to-assembly translation within a unified model. The framework utilizes Multi-Adapters and Seq2Seq Unified prefixing for task adaptation. Experimental results demonstrate superior performance over existing tools in semantic similarity, structural fidelity, and re-executability, crucial for real-world malware analysis.

Quantifiable Impact & Key Metrics

LLM4CodeRE sets new benchmarks in code reverse engineering, delivering superior performance across critical dimensions.

86% Re-executability Rate (Asm→Src)
0.85 Semantic Similarity (Asm→Src)
0.63 Edit Similarity (Asm→Src)

Deep Analysis & Enterprise Applications

The modules below break down the paper's specific findings with an enterprise focus.

LLM4CodeRE formalizes code transformation as learning the conditional distribution P(y | x, t), where x is the input code, y is the output code, and t indicates the task; generated outputs are evaluated for semantic similarity, structural fidelity, and re-executability. The framework combines malware-aware CLM pretraining with task-specific adapters and LoRA updates for parameter-efficient fine-tuning, supporting both assembly-to-source and source-to-assembly transformations.
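
To make the P(y | x, t) formulation concrete, the sketch below conditions a decoder-only model on a task token t prepended to the input x. The task-token strings and prompt layout are illustrative assumptions, not the paper's exact format.

```python
# Hedged sketch of task-conditioned generation, y ~ P(y | x, t), with a
# decoder-only LLM. Task tokens and prompt layout are assumed, not the
# paper's exact scheme. Assumes `model` and `tokenizer` are a Hugging Face
# causal-LM pair loaded elsewhere.
TASK_TOKENS = {
    "asm2src": "<ASM2SRC>",  # assembly -> source decompilation
    "src2asm": "<SRC2ASM>",  # source -> assembly translation
}

def transform(model, tokenizer, x: str, task: str, max_new_tokens: int = 512) -> str:
    """Prepend the task token t to input x, then decode the output y."""
    prompt = f"{TASK_TOKENS[task]}\n{x}\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the generated continuation y, dropping the prompt tokens.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```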

86% Re-executability Achieved for Asm→Src

LLM4CodeRE (S2S) demonstrates superior functional correctness by generating code that recompiles and executes successfully, significantly outperforming other models.

LLM4CodeRE System Pipeline

Input Data (Malware Binaries)
Data Preparation (Disassembly, Tokenization)
Domain Pretraining (Malware-aware CLM)
Task Adaptation (LoRA, Adapters/Prefixes)
Inference (Unified LLM4PE-Mal)
Output (Asm2Src / Src2Asm)
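
As an illustration of the Data Preparation stage above, the following hedged sketch disassembles a raw x86-64 code section into an instruction-token stream with the Capstone library; the normalization (one "mnemonic operands" string per instruction) is an assumption rather than the paper's exact tokenization.

```python
# Sketch of the disassembly/tokenization step using Capstone. The
# normalization below is assumed; LLM4CodeRE's tokenizer may differ.
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

def disassemble_to_tokens(code: bytes, base_addr: int = 0x1000) -> list[str]:
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    return [f"{insn.mnemonic} {insn.op_str}".strip()
            for insn in md.disasm(code, base_addr)]

# Example: machine code for "mov eax, 1; ret".
print(disassemble_to_tokens(b"\xb8\x01\x00\x00\x00\xc3"))
# -> ['mov eax, 1', 'ret']
```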

The framework applies malware-aware CLM pretraining on a curated corpus of real-world malware samples to learn domain-specific knowledge. It then performs hierarchical adaptation for parameter-efficient fine-tuning, combining task-specific adapters with LoRA low-rank weight deltas.
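
A minimal sketch of that LoRA setup using the Hugging Face PEFT library follows; the rank, scaling, target modules, and base checkpoint are illustrative assumptions, not the paper's reported hyperparameters.

```python
# Hedged sketch of parameter-efficient fine-tuning with LoRA via PEFT.
# All hyperparameters and the base checkpoint are assumed for illustration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")

lora_cfg = LoraConfig(
    r=16,                                 # rank of the low-rank weight deltas
    lora_alpha=32,                        # scaling factor for the deltas
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the low-rank deltas are trained
```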

Reduced Perplexity Across Datasets

Domain adaptation consistently reduces perplexity across all datasets and backbone models, indicating that the adapted models fit malware-domain code more closely.
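
Perplexity here is the exponential of the mean token-level cross-entropy, so lower values mean the model assigns higher probability to held-out code. A hedged sketch of the measurement, assuming a causal LM and tokenizer are already loaded:

```python
# Sketch: perplexity = exp(mean negative log-likelihood per token).
# Assumes `model` and `tokenizer` are a Hugging Face causal-LM pair.
import torch

@torch.no_grad()
def perplexity(model, tokenizer, text: str) -> float:
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    # Passing labels makes the model return the mean cross-entropy loss
    # over predicted tokens; exponentiating it yields perplexity.
    loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()
```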

| Feature | Multi-Adapter (MA) | Seq2Seq Unified (S2S) |
| --- | --- | --- |
| Mechanism | Modular task heads attached to a shared backbone. | Decoder-only LLM augmented with task-specific prefix tokens. |
| Advantages | Reduces fine-tuning cost, flexible task specialization, avoids catastrophic forgetting. | Unifies tasks under a single autoregressive framework; simpler architecture in some cases. |
| Performance (Asm→Src Semantic) | 0.85 (highest) | 0.81 |
| Performance (Asm→Src Re-executability) | 53% | 86% (highest) |
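
To make the contrast concrete, the sketch below approximates the Multi-Adapter style with named LoRA adapters swapped per task, while the Seq2Seq Unified style keeps one set of adapted weights and selects the task with a prompt prefix; the adapter names, config, and checkpoint are illustrative assumptions.

```python
# Hedged sketch contrasting the two adaptation styles using PEFT's named
# adapters. Adapter names, config, and checkpoint are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

cfg = LoraConfig(r=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")

# Multi-Adapter (MA): one named LoRA adapter per task, routed at inference.
ma_model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base"),
    cfg,
    adapter_name="asm2src",
)
ma_model.add_adapter("src2asm", cfg)  # second task head on the same backbone
ma_model.set_adapter("asm2src")       # activate the Asm->Src adapter

# Seq2Seq Unified (S2S): a single adapted model with no per-task routing;
# the task is chosen by a prefix token in the prompt instead (see the
# P(y | x, t) sketch above).
```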

LLM4CodeRE significantly outperforms baselines on both Asm→Src and Src→Asm in semantic and edit similarity. Crucially, the Seq2Seq Unified variant achieves the highest re-executability rate (86%) for Asm→Src, demonstrating functional correctness.

Asm→Src results:

| Model | Semantic Similarity | Edit Similarity |
| --- | --- | --- |
| LLM4CodeRE (MA) | 0.85 | 0.63 |
| LLM4CodeRE (S2S) | 0.81 | 0.61 |
| DeepSeek (MA) | 0.42 | 0.45 |
| LLM4Decompile | 0.78 | 0.80 |

Src→Asm results:

| Model | Semantic Similarity | Edit Similarity |
| --- | --- | --- |
| LLM4CodeRE (MA) | 0.64 | 0.27 |
| LLM4CodeRE (S2S) | 0.48 | 0.26 |
| DeepSeek (MA) | 0.47 | 0.15 |
| LLM4Decompile | 0.42 | 0.11 |
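
For clarity on how results like these can be measured, here is a hedged sketch of two of the metrics: edit similarity as a normalized character-level match, and re-executability as a recompile-and-run check; the paper's exact metric definitions may differ.

```python
# Plausible implementations of two evaluation metrics; not necessarily
# the paper's exact definitions.
import difflib
import os
import subprocess
import tempfile

def edit_similarity(pred: str, ref: str) -> float:
    """Normalized character-level similarity in [0, 1]."""
    return difflib.SequenceMatcher(None, pred, ref).ratio()

def re_executable(c_source: str, timeout_s: int = 10) -> bool:
    """True if generated C source recompiles and runs without error."""
    with tempfile.TemporaryDirectory() as tmp:
        src, binary = os.path.join(tmp, "pred.c"), os.path.join(tmp, "pred")
        with open(src, "w") as f:
            f.write(c_source)
        if subprocess.run(["gcc", src, "-o", binary]).returncode != 0:
            return False  # failed to recompile
        try:
            return subprocess.run([binary], timeout=timeout_s).returncode == 0
        except subprocess.TimeoutExpired:
            return False
```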

Current limitations include focus on Windows PE malware, potential label noise from automated decompilation, and limited behavioral coverage in sandboxed environments. Future work aims for cross-platform generalization (Android malware, ELF) and symbolic execution-based evaluation.

Future Directions: Expanding Scope

Future research will extend LLM4CodeRE to Android malware analysis, supporting representations like APKs, Dalvik bytecode, and smali code. It will also model Android framework APIs and permission-based behaviors, aiming for cross-platform generalization and symbolic execution-based evaluation.

Emphasis: Expanding to Android malware and cross-platform generalization is a key next step.


Your AI Implementation Roadmap

A typical enterprise deployment journey, tailored for maximum impact and smooth integration.

Phase 1: Discovery & Strategy

In-depth analysis of current workflows, identification of high-impact AI opportunities, and development of a tailored implementation strategy.

Phase 2: Data & Model Adaptation

Preparation of enterprise-specific data, fine-tuning of LLMs for domain-specific tasks, and initial model validation.

Phase 3: Integration & Pilot

Seamless integration of AI solutions into existing systems, pilot deployment with a subset of users, and continuous feedback collection.

Phase 4: Scaling & Optimization

Full-scale deployment across the organization, performance monitoring, and iterative optimization for sustained ROI.

Ready to Transform Your Enterprise with AI?

Book a personalized consultation with our AI specialists to explore how these insights can drive innovation and efficiency in your organization.
