Enterprise AI Analysis
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
Authors: Kangsan Kim, Minki Kang, Taeil Kim, Yanlai Yang, Mengye Ren, Sung Ju Hwang
Memory-based self-evolution is a promising paradigm for coding agents, but existing methods often fail to leverage shared infrastructure across diverse real-world problems. This paper introduces Memory Transfer Learning (MTL), which draws on a unified memory pool built from heterogeneous domains. Our evaluation across six coding benchmarks shows that MTL improves average performance by 3.7%, primarily by transferring meta-knowledge rather than task-specific code. We find that abstraction dictates transferability: high-level insights generalize well, while low-level traces often induce negative transfer due to excessive specificity. The effectiveness of transfer scales with memory pool size and persists even across different models. This work establishes empirical design principles for expanding memory utilization beyond single-domain silos, leading to more capable coding agents.
Executive Impact: Key Findings
Our research reveals critical insights for leveraging memory in coding agents, demonstrating significant performance gains and establishing principles for effective knowledge transfer.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Memory Transfer Learning Outperforms Baselines
Memory Transfer Learning consistently enhances coding agent performance across diverse benchmarks and models, showcasing its broad applicability.
| Model | Zero-shot Avg Pass@3 | MTL (Insight) Avg Pass@3 | Improvement |
|---|---|---|---|
| GPT-5-mini | 0.523 | 0.560 | +3.7% |
| Method | #Memories | Avg. Pass@3 |
|---|---|---|
| Zero-shot | - | 0.584 |
| ReasoningBank | 97 | 0.601 |
| AgentKB | 5,899 | 0.613 |
| MTL (Ours) | 431 | 0.630 |
Insight: MTL achieves the highest average performance with significantly fewer memories than AgentKB, demonstrating superior effectiveness and efficiency.
Meta-Knowledge Drives Transfer Benefits
Memory Transfer Learning primarily benefits agents by providing high-level procedural and behavioral guidance, not task-specific code.
Enterprise Process Flow
Insight: Transferred memory primarily provides meta-knowledge (e.g., workflow, guardrails, practices) rather than direct algorithmic code, accounting for the vast majority of performance gains.
Case Study: Zero-shot vs. Memory Transfer Learning with Insight
An agent fails in zero-shot by naively raising an error. With MTL (Insight), it uses meta-knowledge about inline Python here-docs for testing and successfully completes the task.
Zero-shot Scenario:
Model: Zero-shot
Memory: (No Memory)
Reasoning: "I will modify django/db/models/aggregates.py to raise a FieldError when an Aggregate's source expression contains a window (over) clause. This prevents creating an aggregate over a window expression..."
Result: Fail
MTL with Insight Scenario:
Model: Memory Transfer Learning with Insight
Memory Item: Generated from LiveCodeBench: "Title: Create quick self-contained tests using an inline Python here-doc to validate fixes. Description: When making small code fixes, write a minimal tests... Content: Set up a short battery of tests that cover..."
Reasoning: "I will use Memory Item 2 (use an inline Python here-doc for safe, atomic edits and quick verification) to modify django/db/models/sql/query.py. The goal is to ensure that when resolving aggregates we detect if any referenced annotation either is a subquery or contains a window expression..."
Result: Success
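The here-doc tactic above amounts to running a short, self-contained battery of assertions immediately after a patch, rather than a full test suite. A minimal Python sketch of that pattern follows; the function name and logic are illustrative stand-ins, not the paper's actual Django patch:

```python
# Hypothetical stand-in for a patched function: reject aggregates that
# reference a window expression, as in the case study above.
def resolve_aggregate(contains_window: bool) -> str:
    if contains_window:
        raise ValueError("aggregate over window expression is not allowed")
    return "ok"

# Quick self-contained tests, analogous to piping an inline `python - <<'PY'`
# here-doc into the shell: a few targeted assertions validate the fix fast.
assert resolve_aggregate(False) == "ok"
try:
    resolve_aggregate(True)
except ValueError:
    pass  # expected: the guard fired
else:
    raise AssertionError("expected ValueError for window expression")
print("all quick tests passed")  # → all quick tests passed
```

The point of the transferred insight is the verification habit, not the specific assertions: cheap inline checks catch regressions before a costly full-suite run.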
Higher Abstraction Enhances Transferability
Abstract memory formats, particularly 'Insights', generalize better across tasks and avoid negative transfer caused by overly specific details.
| Memory Format | Avg. Performance (GPT-5-mini) |
|---|---|
| Trajectory | 0.534 |
| Workflow | 0.538 |
| Summary | 0.546 |
| Insight | 0.560 |
Insight: Highly abstract memory representations like 'Insights' consistently yield higher transfer effectiveness compared to lower-abstraction formats.
| Insight Type | Avg. Performance |
|---|---|
| Task-specific Insights | 0.523 |
| Task-agnostic Insights | 0.534 |
Insight: Even within the Insight format, task-agnostic memories consistently outperform task-specific ones, highlighting abstraction as a key factor in transfer performance.
Case Study: Trajectory vs. Insight Transfer
Trajectory memory can lead to brittle implementation anchoring and execution errors due to task-specific commands. Insight memory provides high-level guidance, leading to successful adaptation.
Memory Transfer Learning with Trajectory:
Model: Memory Transfer Learning with Trajectory
Memory Item: Generated from MLGym-Bench: Task: "You are given the train, validation and test sets of Housing Price Prediction dataset and a baseline..." Trajectory: "Agent: ls -la data && nl -ba evaluate.py | sed -n '1,240p' System: <returncode>0</returncode><output>total 464 drwxr-xr-x 2 root root 4096 Dec 25 18:50 . drwxrwxrw- 1 agent agent 4096 Jan 14 06:02 ... Agent: cat <<'PY' > train_predict.py OneHotEncoder(..., sparse=False)) System: PY ('onehot', TypeError: OneHotEncoder.__init__() got an unexpected keyword argument 'sparse'"
Reasoning: "I'll start by inspecting the repository root... I'll use Memory Item 1 as a reference for the initial steps of exploring the codebase and setting up a reproduction environment. (Blindly following the trajectory resulted in an error.)"
Result: Fail
Memory Transfer Learning with Insight:
Model: Memory Transfer Learning with Insight
Memory Item: Generated from MLGym-Bench: "Title: Inspect eval requirements, combine train+val, use robust preprocessing, and adapt to library API changes. Description: Before improving a baseline model, read the repository/evaluation expectations, train on all available labeled data, apply robust preprocessing for numeric and categorical ... Content: First inspect the code that loads labels and evaluates predictions so you know the exact submission format and metrics to optimize. For the final model, combine training ..."
Reasoning: "I will modify gradient_boosting.py to map encoded integer labels back to the original class labels before passing them to the scorer during early stopping. I'll use Memory Item 2 (inspect/evaluate and adapt code) as guidance to carefully inspect and modify the code."
Result: Success
Scaling, Cross-Model, and Retrieval Insights
Explore how memory pool size, cross-model transfer, and retrieval methods influence Memory Transfer Learning, including insights into negative transfer.
Case Study: Negative Transfer Cases
Negative transfer can occur due to domain-mismatched anchoring, false validation confidence, or misapplied best practices, leading to performance degradation.
Scenario 1: Domain-mismatched anchoring (Workflow)
Model: Memory Transfer Learning with Workflow
Task: "(C++) Implement functions in space_age.cpp / space_age.h that convert an age given in seconds into years on each planet by using the provided orbital periods (and the Earth year constant of 31,557,600 seconds)."
Memory: Goal: "Create a single final R source file containing implementations of odometry_estimation, bearing_range_estimation, warp2pi, and compute_meas_obs_jacobian" Workflow: "['cat <<'EOF' > solution.txt\n# Write R functions implementing the required API:\n ...', 'echo ...']"
Reasoning: "I'll implement the Space Age solution by writing / overwriting the provided header and source files with a correct class implementation ... I'll use Memory Item 2's successful pattern of writing files with heredoc to reliably create the source files."
Result: Fail. The agent blindly applied Memory Item 2's R-language file-overwriting pattern to the C++ project, replacing existing files without checking their original structure or namespaces.
Scenario 2: Misapplied best-practice transfer (Insight)
Model: Memory Transfer Learning with Insight
Task: "Improve agent performance on the MetaMaze navigation task (e.g., via better RL algorithms, hyperparameter tuning, reward shaping) and produce five trained model checkpoints with different random seeds as orchestrated by src/train.py."
Memory: "Title: Pre-flight verification of datasets and pretrained checkpoints before running experiments Description: Before attempting training or evaluation, automatically check that required dataset files and pretrained model weights exist and return a null result if essential artifacts are missing."
Reasoning: "I will perform a quick, low-cost training run to validate the end-to-end training pipeline and produce checkpoints. I'll use Memory Item 2 (pre-flight verification of datasets and pretrained checkpoints) to keep this as a short smoke test rather than a full long run."
Result: Fail. The retrieved memory recommends verifying required components before running expensive experiments, but the agent distorted this into a justification for prioritizing quick completion over quality.
| Factor | Impact on Performance |
|---|---|
| Memory Pool Size | Performance consistently improves with larger pools. |
| Number of Domains | Performance generally increases with more diverse domains. |
Insight: The effectiveness of Memory Transfer Learning scales positively with both the size and diversity of the memory pool, increasing the likelihood of retrieving useful meta-knowledge.
| Source Model -> Target Model | Avg. Pass@1 |
|---|---|
| Zero-shot (GPT-5-mini) | 0.515 |
| DeepSeek V3.2 -> GPT-5-mini | 0.518 |
| Qwen3-Coder -> GPT-5-mini | 0.528 |
| GPT-5-mini -> GPT-5-mini (Self-generated) | 0.543 |
Insight: Memory can be transferred across different models, supporting the model-agnostic nature of meta-knowledge. However, self-generated memories still yield the best performance, indicating potential model-specific biases.
| Retrieval Method | Avg. Pass@3 |
|---|---|
| No Memory | 0.584 |
| LLM Reranking | 0.598 |
| Adaptive Rewriting | 0.608 |
| Embedding Similarity | 0.630 |
Insight: Simple embedding-based retrieval outperforms advanced methods like LLM reranking and adaptive rewriting for cross-domain memory transfer, highlighting the inherent challenges in retrieval for heterogeneous agentic settings.
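The winning retrieval strategy is conceptually simple: embed the incoming task, embed every memory, and return the nearest neighbors by cosine similarity. The sketch below uses a toy bag-of-words embedding in place of a neural embedding model, and all memory strings and function names are illustrative assumptions:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an
    # embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(pool: list[str], task: str, k: int = 2) -> list[str]:
    # Rank every memory by similarity to the task and keep the top-k.
    q = embed(task)
    return sorted(pool, key=lambda m: cosine(q, embed(m)), reverse=True)[:k]

pool = [
    "Create quick self-contained tests to validate fixes",
    "Combine train and validation data before final training",
    "Verify dataset files exist before running experiments",
]
print(retrieve(pool, "validate a small code fix with quick tests", k=1))
# → ['Create quick self-contained tests to validate fixes']
```

No reranking or query rewriting is involved; per the table above, those heavier steps did not pay off in the heterogeneous cross-domain setting.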
Calculate Your Potential AI Savings
Estimate the transformative financial impact of Memory Transfer Learning on your operations. See how optimizing agent performance translates into tangible savings.
Your Roadmap to Memory-Augmented AI
Implementing Memory Transfer Learning requires a strategic approach. Here’s a typical phased roadmap to integrate these powerful capabilities into your enterprise.
Phase 1: Discovery & Strategy
Assess current agent capabilities, identify high-impact domains for memory transfer, and define key performance indicators. Develop a tailored strategy for memory generation and utilization across heterogeneous tasks.
Phase 2: Memory Pool Construction & Abstraction
Establish a unified memory pool by collecting successful and failed trajectories from diverse coding tasks. Implement abstraction mechanisms to generate Workflow, Summary, and Insight memories, prioritizing high-level meta-knowledge.
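Assuming the four memory formats studied in the paper (Trajectory, Workflow, Summary, Insight), the unified pool in Phase 2 might be modeled with a structure like this; all field and type names are illustrative:

```python
from dataclasses import dataclass
from enum import Enum

class Abstraction(Enum):
    TRAJECTORY = 1  # raw action/observation log
    WORKFLOW = 2    # ordered high-level steps
    SUMMARY = 3     # condensed outcome description
    INSIGHT = 4     # task-agnostic lesson learned

@dataclass
class Memory:
    title: str
    content: str
    level: Abstraction
    source_domain: str  # e.g. the benchmark the memory came from

pool: list[Memory] = [
    Memory(
        title="Quick inline tests",
        content="Validate small fixes with a short battery of inline tests.",
        level=Abstraction.INSIGHT,
        source_domain="LiveCodeBench",
    ),
]

# Prioritize high-level meta-knowledge when assembling the transfer pool,
# reflecting the finding that Insights transfer best.
insights = [m for m in pool if m.level is Abstraction.INSIGHT]
print(len(insights))  # → 1
```

Tagging each memory with its abstraction level and source domain makes it cheap to filter out the low-level, domain-anchored entries that the paper associates with negative transfer.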
Phase 3: Integration & Iteration
Integrate memory retrieval into your coding agents' inference pipelines. Begin with embedding-based retrieval and continuously iterate on memory quality, abstraction levels, and adaptation strategies based on performance metrics.
Ready to Transform Your Coding Agents?
Leverage the power of Memory Transfer Learning to build more effective, efficient, and versatile AI coding agents. Book a free consultation to explore how our insights can drive your enterprise forward.