Enterprise AI Analysis
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
Authors: Kangsan Kim, Minki Kang, Taeil Kim, Yanlai Yang, Mengye Ren, Sung Ju Hwang
Memory-based self-evolution is a promising paradigm for coding agents, but existing methods often fail to leverage shared infrastructure across diverse real-world problems. This paper introduces Memory Transfer Learning (MTL), which draws on a unified memory pool built from heterogeneous domains. Our evaluation across six coding benchmarks shows that MTL improves average performance by 3.7%, primarily by transferring meta-knowledge rather than task-specific code. We find that abstraction dictates transferability: high-level insights generalize well, while low-level traces often induce negative transfer due to excessive specificity. The effectiveness of transfer scales with memory pool size and persists even across different models. This work establishes empirical design principles for expanding memory utilization beyond single-domain silos, leading to more capable coding agents.
Executive Impact: Key Findings
Our research reveals critical insights for leveraging memory in coding agents, demonstrating significant performance gains and establishing principles for effective knowledge transfer.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Memory Transfer Learning Outperforms Baselines
Memory Transfer Learning consistently enhances coding agent performance across diverse benchmarks and models, showcasing its broad applicability.
| Model | Zero-shot Avg Pass@3 | MTL (Insight) Avg Pass@3 | Improvement |
|---|---|---|---|
| GPT-5-mini | 0.523 | 0.560 | +3.7% |
| Method | #Memories | Avg. Pass@3 |
|---|---|---|
| Zero-shot | - | 0.584 |
| ReasoningBank | 97 | 0.601 |
| AgentKB | 5,899 | 0.613 |
| MTL (Ours) | 431 | 0.630 |
Insight: MTL achieves the highest average performance with significantly fewer memories than AgentKB, demonstrating superior effectiveness and efficiency.
Meta-Knowledge Drives Transfer Benefits
Memory Transfer Learning primarily benefits agents by providing high-level procedural and behavioral guidance, not task-specific code.
Enterprise Process Flow
Insight: Transferred memory primarily provides meta-knowledge (e.g., workflow, guardrails, practices) rather than direct algorithmic code, accounting for the vast majority of performance gains.
Case Study: Zero-shot vs. Memory Transfer Learning with Insight
An agent fails in zero-shot by naively raising an error. With MTL (Insight), it uses meta-knowledge about inline Python here-docs for testing and successfully completes the task.
Zero-shot Scenario:
Model: Zero-shot
Memory: (No Memory)
Reasoning: "I will modify django/db/models/aggregates.py to raise a FieldError when an Aggregate's source expression contains a window (over) clause. This prevents creating an aggregate over a window expression..."
Result: Fail
MTL with Insight Scenario:
Model: Memory Transfer Learning with Insight
Memory Item: Generated from LiveCodeBench: "Title: Create quick self-contained tests using an inline Python here-doc to validate fixes. Description: When making small code fixes, write a minimal tests... Content: Set up a short battery of tests that cover..."
Reasoning: "I will use Memory Item 2 (use an inline Python here-doc for safe, atomic edits and quick verification) to modify django/db/models/sql/query.py. The goal is to ensure that when resolving aggregates we detect if any referenced annotation either is a subquery or contains a window expression..."
Result: Success
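The here-doc tactic above amounts to running a short, self-contained battery of assertions immediately after a patch, rather than a full test suite. A minimal Python sketch of that pattern follows; the function name and logic are illustrative stand-ins, not the paper's actual Django patch:

```python
# Hypothetical stand-in for a patched function: reject aggregates that
# reference a window expression, as in the case study above.
def resolve_aggregate(contains_window: bool) -> str:
    if contains_window:
        raise ValueError("aggregate over window expression is not allowed")
    return "ok"

# Quick self-contained tests, analogous to piping an inline `python - <<'PY'`
# here-doc into the shell: a few targeted assertions validate the fix fast.
assert resolve_aggregate(False) == "ok"
try:
    resolve_aggregate(True)
except ValueError:
    pass  # expected: the guard fired
else:
    raise AssertionError("expected ValueError for window expression")
print("all quick tests passed")  # → all quick tests passed
```

The point of the transferred insight is the verification habit, not the specific assertions: cheap inline checks catch regressions before a costly full-suite run.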
Higher Abstraction Enhances Transferability
Abstract memory formats, particularly 'Insights', generalize better across tasks and avoid negative transfer caused by overly specific details.
| Memory Format | Avg. Performance (GPT-5-mini) |
|---|---|
| Trajectory | 0.534 |
| Workflow | 0.538 |
| Summary | 0.546 |
| Insight | 0.560 |
Insight: Highly abstract memory representations like 'Insights' consistently yield higher transfer effectiveness compared to lower-abstraction formats.
| Insight Type | Avg. Performance |
|---|---|
| Task-specific Insights | 0.523 |
| Task-agnostic Insights | 0.534 |
Insight: Even within the Insight format, task-agnostic memories consistently outperform task-specific ones, highlighting abstraction as a key factor in transfer performance.
Case Study: Trajectory vs. Insight Transfer
Trajectory memory can lead to brittle implementation anchoring and execution errors due to task-specific commands. Insight memory provides high-level guidance, leading to successful adaptation.
Memory Transfer Learning with Trajectory:
Model: Memory Transfer Learning with Trajectory
Memory Item: Generated from MLGym-Bench: Task: "You are given the train, validation and test sets of Housing Price Prediction dataset and a baseline..." Trajectory: "Agent: ls -la data && nl -ba evaluate.py | sed -n '1,240p' System: <returncode>0</returncode><output>total 464 drwxr-xr-x 2 root root 4096 Dec 25 18:50 . drwxrwxrw- 1 agent agent 4096 Jan 14 06:02 ... Agent: cat <<'PY' > train_predict.py OneHotEncoder(..., sparse=False)) System: PY ('onehot', TypeError: OneHotEncoder.__init__() got an unexpected keyword argument 'sparse'"
Reasoning: "I'll start by inspecting the repository root... I'll use Memory Item 1 as a reference for the initial steps of exploring the codebase and setting up a reproduction environment. (Blindly following the trajectory resulted in an error.)"
Result: Fail
Memory Transfer Learning with Insight:
Model: Memory Transfer Learning with Insight
Memory Item: Generated from MLGym-Bench: "Title: Inspect eval requirements, combine train+val, use robust preprocessing, and adapt to library API changes. Description: Before improving a baseline model, read the repository/evaluation expectations, train on all available labeled data, apply robust preprocessing for numeric and categorical ... Content: First inspect the code that loads labels and evaluates predictions so you know the exact submission format and metrics to optimize. For the final model, combine training ..."
Reasoning: "I will modify gradient_boosting.py to map encoded integer labels back to the original class labels before passing them to the scorer during early stopping. I'll use Memory Item 2 (inspect/evaluate and adapt code) as guidance to carefully inspect and modify the code."
Result: Success
Scaling, Cross-Model, and Retrieval Insights
Explore how memory pool size, cross-model transfer, and retrieval methods influence Memory Transfer Learning, including insights into negative transfer.
Case Study: Negative Transfer Cases
Negative transfer can occur due to domain-mismatched anchoring, false validation confidence, or misapplied best practices, leading to performance degradation.
Scenario 1: Domain-mismatched anchoring (Workflow)
Model: Memory Transfer Learning with Workflow
Task: "(C++) Implement functions in space_age.cpp / space_age.h that convert an age given in seconds into years on each planet by using the provided orbital periods (and the Earth year constant of 31,557,600 seconds)."
Memory: Goal: "Create a single final R source file containing implementations of odometry_estimation, bearing_range_estimation, warp2pi, and compute_meas_obs_jacobian" Workflow: "['cat <<'EOF' > solution.txt\n# Write R functions implementing the required API:\n ...', 'echo ...']"
Reasoning: "I'll implement the Space Age solution by writing / overwriting the provided header and source files with a correct class implementation ... I'll use Memory Item 2's successful pattern of writing files with heredoc to reliably create the source files."
Result: Fail. The agent blindly applied Memory Item 2's R-language file-overwriting pattern to the C++ project, replacing existing files without checking their original structure or namespaces.
Scenario 2: Misapplied best-practice transfer (Insight)
Model: Memory Transfer Learning with Insight
Task: "Improve agent performance on the MetaMaze navigation task (e.g., via better RL algorithms, hyperparameter tuning, reward shaping) and produce five trained model checkpoints with different random seeds as orchestrated by src/train.py."
Memory: "Title: Pre-flight verification of datasets and pretrained checkpoints before running experiments Description: Before attempting training or evaluation, automatically check that required dataset files and pretrained model weights exist and return a null result if essential artifacts are missing."
Reasoning: "I will perform a quick, low-cost training run to validate the end-to-end training pipeline and produce checkpoints. I'll use Memory Item 2 (pre-flight verification of datasets and pretrained checkpoints) to keep this as a short smoke test rather than a full long run."
Result: Fail. The retrieved memory recommends verifying required components before running expensive experiments, but the agent distorted this into a justification for prioritizing quick completion over quality.
| Factor | Impact on Performance |
|---|---|
| Memory Pool Size | Performance consistently improves with larger pools. |
| Number of Domains | Performance generally increases with more diverse domains. |
Insight: The effectiveness of Memory Transfer Learning scales positively with both the size and diversity of the memory pool, increasing the likelihood of retrieving useful meta-knowledge.
| Source Model -> Target Model | Avg. Pass@1 |
|---|---|
| Zero-shot (GPT-5-mini) | 0.515 |
| DeepSeek V3.2 -> GPT-5-mini | 0.518 |
| Qwen3-Coder -> GPT-5-mini | 0.528 |
| GPT-5-mini -> GPT-5-mini (Self-generated) | 0.543 |
Insight: Memory can be transferred across different models, supporting the model-agnostic nature of meta-knowledge. However, self-generated memories still yield the best performance, indicating potential model-specific biases.
| Retrieval Method | Avg. Pass@3 |
|---|---|
| No Memory | 0.584 |
| LLM Reranking | 0.598 |
| Adaptive Rewriting | 0.608 |
| Embedding Similarity | 0.630 |
Insight: Simple embedding-based retrieval outperforms advanced methods like LLM reranking and adaptive rewriting for cross-domain memory transfer, highlighting the inherent challenges in retrieval for heterogeneous agentic settings.
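The winning retrieval strategy is conceptually simple: embed the incoming task, embed every memory, and return the nearest neighbors by cosine similarity. The sketch below uses a toy bag-of-words embedding in place of a neural embedding model, and all memory strings and function names are illustrative assumptions:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an
    # embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(pool: list[str], task: str, k: int = 2) -> list[str]:
    # Rank every memory by similarity to the task and keep the top-k.
    q = embed(task)
    return sorted(pool, key=lambda m: cosine(q, embed(m)), reverse=True)[:k]

pool = [
    "Create quick self-contained tests to validate fixes",
    "Combine train and validation data before final training",
    "Verify dataset files exist before running experiments",
]
print(retrieve(pool, "validate a small code fix with quick tests", k=1))
# → ['Create quick self-contained tests to validate fixes']
```

No reranking or query rewriting is involved; per the table above, those heavier steps did not pay off in the heterogeneous cross-domain setting.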
Calculate Your Potential AI Savings
Estimate the transformative financial impact of Memory Transfer Learning on your operations. See how optimizing agent performance translates into tangible savings.
Your Roadmap to Memory-Augmented AI
Implementing Memory Transfer Learning requires a strategic approach. Here’s a typical phased roadmap to integrate these powerful capabilities into your enterprise.
Phase 1: Discovery & Strategy
Assess current agent capabilities, identify high-impact domains for memory transfer, and define key performance indicators. Develop a tailored strategy for memory generation and utilization across heterogeneous tasks.
Phase 2: Memory Pool Construction & Abstraction
Establish a unified memory pool by collecting successful and failed trajectories from diverse coding tasks. Implement abstraction mechanisms to generate Workflow, Summary, and Insight memories, prioritizing high-level meta-knowledge.
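Assuming the four memory formats studied in the paper (Trajectory, Workflow, Summary, Insight), the unified pool in Phase 2 might be modeled with a structure like this; all field and type names are illustrative:

```python
from dataclasses import dataclass
from enum import Enum

class Abstraction(Enum):
    TRAJECTORY = 1  # raw action/observation log
    WORKFLOW = 2    # ordered high-level steps
    SUMMARY = 3     # condensed outcome description
    INSIGHT = 4     # task-agnostic lesson learned

@dataclass
class Memory:
    title: str
    content: str
    level: Abstraction
    source_domain: str  # e.g. the benchmark the memory came from

pool: list[Memory] = [
    Memory(
        title="Quick inline tests",
        content="Validate small fixes with a short battery of inline tests.",
        level=Abstraction.INSIGHT,
        source_domain="LiveCodeBench",
    ),
]

# Prioritize high-level meta-knowledge when assembling the transfer pool,
# reflecting the finding that Insights transfer best.
insights = [m for m in pool if m.level is Abstraction.INSIGHT]
print(len(insights))  # → 1
```

Tagging each memory with its abstraction level and source domain makes it cheap to filter out the low-level, domain-anchored entries that the paper associates with negative transfer.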
Phase 3: Integration & Iteration
Integrate memory retrieval into your coding agents' inference pipelines. Begin with embedding-based retrieval and continuously iterate on memory quality, abstraction levels, and adaptation strategies based on performance metrics.
Ready to Transform Your Coding Agents?
Leverage the power of Memory Transfer Learning to build more effective, efficient, and versatile AI coding agents. Book a free consultation to explore how our insights can drive your enterprise forward.