Enterprise AI Analysis
Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization
Authors: Chaewon Moon, Chulhee Yun, Dongkuk Si | Date: March 9, 2026
This research explores the implicit bias of Sharpness-Aware Minimization (SAM) in deep learning. While SAM and gradient descent (GD) behave similarly for shallow networks (L=1), deeper networks (L≥2) exhibit distinct patterns. Specifically, l∞-SAM's limit direction becomes highly sensitive to initialization, sometimes favoring minor features. l2-SAM, while asymptotically matching the l1 max-margin solution, shows 'sequential feature amplification' in its finite-time dynamics, where minor features are amplified early before major ones dominate. This highlights that infinite-time bias analysis alone is insufficient and that a finite-time perspective is crucial.
Our analysis uncovers critical insights into the dynamics of Sharpness-Aware Minimization (SAM), especially its depth-induced implicit biases. These findings offer pathways for more robust and effective AI deployments in complex enterprise environments.
Deep Analysis & Enterprise Applications
The sections below examine the specific findings of the research and their implications for enterprise AI deployments.
Implicit Bias of SAM
The implicit bias of SAM is analyzed for L-layer diagonal linear networks on linearly separable binary classification with logistic loss. For L=1, both l∞-SAM and l2-SAM recover the l2 max-margin classifier, matching GD. For L≥2, however, the behaviors diverge.
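To make the setting concrete, here is a minimal sketch of the setup, assuming the common depth-L diagonal parameterization f(x) = ⟨w_1 ⊙ ⋯ ⊙ w_L, x⟩ and mean logistic loss; the function names and this exact parameterization are illustrative assumptions rather than the paper's definitions.

```python
import numpy as np

def predict(ws, X):
    """Depth-L diagonal linear network: f(x) = <w_1 * ... * w_L, x>,
    where * is the elementwise (Hadamard) product. `ws` is a list of
    L weight vectors; their product is the effective linear predictor."""
    beta = np.prod(np.stack(ws), axis=0)
    return X @ beta

def logistic_loss(ws, X, y):
    """Mean logistic loss on separable data with labels y in {-1, +1}."""
    margins = y * predict(ws, X)
    return np.mean(np.logaddexp(0.0, -margins))  # log(1 + exp(-m)), stable
```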
Depth-Induced Bias Evolution
l∞-SAM Characteristics
For depth L=2, l∞-SAM's limit direction depends critically on initialization, converging either to 0 or to a standard basis vector. For L≥3, the bias is even more sensitive to initialization and can favor minor features. This contrasts sharply with GD; a minimal update sketch follows the comparison table below.
| Aspect | l∞-SAM (L≥2) | Gradient Descent (GD) |
|---|---|---|
| Limit Direction | Initialization-dependent; 0 or a standard basis vector | l1 max-margin classifier (for L=2) |
| Sensitivity to Init. | High, and increasingly so for L≥3 | Low; the limit direction is initialization-independent |
| Minor Feature Bias | Present; can favor minor features | Absent; major features dominate |
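As referenced above, the following is a minimal sketch of one l∞-SAM update, assuming the standard ascent-then-descent SAM scheme in which the worst-case perturbation over an l∞ ball of radius ρ is ρ · sign(∇L); `grad_fn` and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

def linf_sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One l∞-SAM step: the l∞-ball ascent direction is sign(grad), so
    perturb by rho * sign(grad), then take the descent step using the
    gradient evaluated at the perturbed point."""
    g = grad_fn(w)
    w_adv = w + rho * np.sign(g)     # worst case over the l-infinity ball
    return w - lr * grad_fn(w_adv)   # descend with the perturbed gradient
```

Note that sign(·) perturbs every coordinate by the same magnitude regardless of its gradient, which offers one intuition for why small (minor) coordinates can be disproportionately affected.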
l2-SAM Dynamics
For L=2, l2-SAM's limit direction matches the l1 max-margin solution, just as GD's does. However, its finite-time dynamics exhibit 'sequential feature amplification': the model initially relies on minor coordinates before its weight gradually shifts to the major ones.
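For contrast with the l∞ variant, here is a minimal sketch of one l2-SAM update under the same assumed ascent-then-descent scheme, where the l2-ball ascent direction is the normalized gradient; again, `grad_fn` and the hyperparameters are illustrative.

```python
import numpy as np

def l2_sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One l2-SAM step: perturb along the normalized gradient (the
    worst case over an l2 ball of radius rho), then descend with the
    gradient evaluated at the perturbed point."""
    g = grad_fn(w)
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)  # eps avoids 0-division
    return w - lr * grad_fn(w_adv)
```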
Sequential Feature Amplification in l2-SAM
This phenomenon highlights a crucial aspect of l2-SAM's training dynamics: which features the model relies on shifts with both training time and initialization scale. Three regimes emerge; a toy simulation sketch follows the regime descriptions.
Regime 1: Collapse to Origin
For small initialization scales, the predictor remains near the origin, failing to express features, and the loss does not vanish.
Regime 2: Time-wise Amplification
At intermediate initialization scales, the dominant coordinate shifts over training from minor to major features, a 'minor-first, major-last' behavior that produces an early plateau in the training loss.
Regime 3: Major Feature Dominance
For large initialization scales, the major feature dominates from the outset and maintains this alignment throughout training.
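One way to probe these regimes is to sweep the initialization scale on toy data and watch which coordinate of the effective predictor dominates over training. The sketch below uses a depth-2 network in a w ⊙ w parameterization, synthetic two-feature data, and hand-picked hyperparameters; all of these are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy separable data: coordinate 0 is the "major" feature (large margin),
# coordinate 1 the "minor" one (weakly informative). Illustrative only.
n = 200
y = rng.choice([-1.0, 1.0], size=n)
X = np.stack([2.0 * y + 0.1 * rng.standard_normal(n),
              0.5 * y + 0.1 * rng.standard_normal(n)], axis=1)

def grad(w):
    """Gradient of the mean logistic loss for the depth-2 diagonal net
    f(x) = <w * w, x>, by the chain rule through beta = w * w."""
    beta = w * w
    s = -y / (1.0 + np.exp(y * (X @ beta)))       # dloss/dprediction
    return (X * s[:, None]).mean(axis=0) * 2.0 * w

def l2_sam_step(w, lr=0.1, rho=0.05, eps=1e-12):
    """One l2-SAM step, as in the sketch above."""
    g = grad(w)
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)
    return w - lr * grad(w_adv)

for alpha in (1e-3, 1e-1, 1.0):                   # initialization scales
    w = alpha * np.ones(2)
    for t in range(5001):
        if t % 1000 == 0:
            print(f"alpha={alpha:g} t={t}: "
                  f"dominant coord = {np.argmax(w * w)}")
        w = l2_sam_step(w)
```

Printing the dominant coordinate at checkpoints lets you see whether, for a given scale, the predictor collapses toward the origin, switches coordinates mid-training, or locks onto the major feature from the start.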
Quantify Your Enterprise AI Savings
Estimate the potential annual cost savings and reclaimed hours by implementing advanced AI strategies in your organization.
Your AI Implementation Roadmap
A structured approach to integrating advanced AI, designed for clarity and maximum impact within your enterprise.
Phase 1: Discovery & Strategy
Initial consultations, needs assessment, and AI strategy formulation. (1-2 Weeks)
Phase 2: Pilot & Proof-of-Concept
Develop and deploy a small-scale AI solution to validate impact and gather initial data. (4-6 Weeks)
Phase 3: Full-Scale Integration
Expand the AI solution across relevant departments, integrate with existing systems, and provide user training. (8-12 Weeks)
Phase 4: Optimization & Scaling
Continuous monitoring, performance tuning, and identification of new AI opportunities for further growth. (Ongoing)
Ready to Transform Your Enterprise with AI?
Unlock the full potential of advanced AI and achieve unprecedented operational efficiency and strategic advantage.