
Enterprise AI Analysis

Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization

Authors: Chaewon Moon, Chulhee Yun, Dongkuk Si | Date: March 9, 2026

This research explores the implicit bias of Sharpness-Aware Minimization (SAM) in deep learning. While SAM and gradient descent (GD) behave similarly for shallow networks (L=1), deeper networks (L=2) exhibit distinct patterns. Specifically, l∞-SAM's behavior becomes highly sensitive to initialization, sometimes favoring minor features. l2-SAM, while asymptotically matching l1 max-margin, shows 'sequential feature amplification' in finite-time dynamics, where minor features are amplified early before major ones dominate. This highlights that infinite-time bias analysis alone is insufficient and a finite-time perspective is crucial.

Tags: Sharpness-Aware Minimization, Implicit Bias, Deep Learning, Diagonal Networks, Feature Amplification, Logistic Loss

Our analysis uncovers critical insights into the dynamics of Sharpness-Aware Minimization (SAM), especially its depth-induced implicit biases. These findings offer pathways for more robust and effective AI deployments in complex enterprise environments.


Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

Implicit Bias of SAM

The implicit bias of SAM is analyzed for L-layer linear diagonal networks on linearly separable binary classification with logistic loss. For L=1, both l∞-SAM and l2-SAM recover the l2 max-margin classifier, matching GD. However, for L=2, the behavior diverges.
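To make the setup concrete, here is a minimal NumPy sketch (an illustration, not code from the paper) of a depth-L linear diagonal network trained on logistic loss. The network collapses to a linear predictor beta = w_1 ⊙ w_2 ⊙ ... ⊙ w_L, and each layer's gradient follows from the chain rule; the function names and shapes are our own assumptions.

```python
import numpy as np

def effective_predictor(W):
    """W has shape (L, d): one weight vector per layer of a depth-L
    diagonal linear network. The network collapses to the linear map
    x -> x @ beta with beta = w_1 * w_2 * ... * w_L (elementwise)."""
    return np.prod(W, axis=0)

def logistic_loss(W, X, y):
    margins = y * (X @ effective_predictor(W))
    return float(np.mean(np.logaddexp(0.0, -margins)))

def grads(W, X, y):
    """Gradient of the logistic loss w.r.t. every layer, via the chain
    rule: dL/dW_l = (dL/dbeta) * prod_{k != l} W_k (elementwise)."""
    L, _ = W.shape
    beta = np.prod(W, axis=0)
    m = y * (X @ beta)
    coef = -y / (1.0 + np.exp(m)) / len(y)   # y_i * dLoss/dmargin_i
    dbeta = X.T @ coef
    return np.stack([dbeta * np.prod(np.delete(W, l, axis=0), axis=0)
                     for l in range(L)])
```

Running plain GD with these gradients on separable toy data reproduces the L=1 baseline against which the SAM variants are compared.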

L=1: SAM's implicit bias matches that of GD for shallow networks.

Depth-Induced Bias Evolution

  • Linear (L=1): l2 max-margin.
  • Deeper (L=2): l∞-SAM becomes initialization-sensitive with a minor-feature bias; l2-SAM shows sequential feature amplification.

l∞-SAM Characteristics

For depth L=2, the l∞-SAM's limit direction critically depends on initialization, converging to 0 or standard basis vectors. For L≥3, the bias is even more sensitive to initialization, favoring minor features. This contrasts sharply with GD.

l∞-SAM vs. GD for Deeper Networks

Limit Direction
  • l∞-SAM (L≥2): initialization-sensitive; can favor minor features or converge to zero.
  • GD: always aligns with the major feature (l1 max-margin for L=2, l2 max-margin for L=1).

Sensitivity to Initialization
  • l∞-SAM (L≥2): high, especially for L≥3.
  • GD: low.

Minor Feature Bias
  • l∞-SAM (L≥2): can favor minor features depending on initialization.
  • GD: consistently biases toward major features.
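The two variants compared above differ only in how the adversarial perturbation is normalized: l2-SAM rescales the gradient to a fixed l2 norm, while l∞-SAM takes its sign. A hedged sketch of one update step (the generic `grads` callback and hyperparameter values are our assumptions, not from the paper):

```python
import numpy as np

def sam_step(W, X, y, grads, lr=0.1, rho=0.05, norm="l2"):
    """One SAM update: ascend with a norm-constrained perturbation,
    then descend using the gradient taken at the perturbed point.
    `grads(W, X, y)` is assumed to return dLoss/dW, same shape as W."""
    g = grads(W, X, y)
    if norm == "l2":                  # l2-SAM: eps = rho * g / ||g||_2
        eps = rho * g / (np.linalg.norm(g) + 1e-12)
    else:                             # l∞-SAM: eps = rho * sign(g)
        eps = rho * np.sign(g)
    g_pert = grads(W + eps, X, y)     # gradient at the worst-case point
    return W - lr * g_pert
```

Both branches reduce to GD as rho → 0; the sign-based perturbation is what makes l∞-SAM's limit direction sensitive to the initialization.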

l2-SAM Dynamics

For L=2, l2-SAM's limit direction matches the l1 max-margin solution, just like GD. Its finite-time dynamics, however, exhibit 'sequential feature amplification': the predictor initially relies on minor coordinates and only gradually shifts its weight onto the major ones.

Sequential Feature Amplification in l2-SAM

This phenomenon highlights a crucial aspect of l2-SAM's training dynamics, where the model's focus shifts over time and with initialization scale.

Regime 1: Collapse to Origin

For small initialization scales, the predictor remains near the origin, failing to express features, and the loss does not vanish.

Regime 2: Time-wise Amplification

With increasing time (or initialization scale), the dominant coordinate index shifts from minor to major features, a 'minor-first, major-last' behavior. This results in an early plateau in training loss.

Regime 3: Major Feature Dominance

For large initialization scales, the major feature dominates from the outset and maintains this alignment throughout training.
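One way to inspect these regimes is to train a depth-2 diagonal network (beta = u ⊙ v) with l2-SAM on toy separable data and log which coordinate of beta dominates at each step. The sketch below is a toy illustration; the data, initialization scale, and hyperparameters are assumptions, not values from the paper.

```python
import numpy as np

def track_dominant_coordinate(X, y, steps=200, lr=0.05, rho=0.1, init=0.1):
    """Train a depth-2 diagonal network with l2-SAM on logistic loss and
    record argmax_j |beta_j| after every step, so any minor-first,
    major-last shift can be inspected by eye."""
    d = X.shape[1]
    W = np.full((2, d), init)          # small, uniform initialization

    def grads(W):
        beta = W[0] * W[1]
        m = y * (X @ beta)
        coef = -y / (1.0 + np.exp(m)) / len(y)   # y_i * dLoss/dmargin_i
        dbeta = X.T @ coef
        return np.stack([dbeta * W[1], dbeta * W[0]])

    history = []
    for _ in range(steps):
        g = grads(W)
        eps = rho * g / (np.linalg.norm(g) + 1e-12)  # l2-SAM ascent step
        W = W - lr * grads(W + eps)
        history.append(int(np.argmax(np.abs(W[0] * W[1]))))
    return history
```

Sweeping `init` over several scales then exposes the three regimes: collapse near the origin for tiny scales, an index shift over time for intermediate scales, and immediate major-feature dominance for large scales.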

The Finite-Time View Is Crucial: implicit bias analyses must account for training dynamics, not just asymptotic limits.

Quantify Your Enterprise AI Savings

Estimate the potential annual cost savings and reclaimed hours by implementing advanced AI strategies in your organization.


Your AI Implementation Roadmap

A structured approach to integrating advanced AI, designed for clarity and maximum impact within your enterprise.

Phase 1: Discovery & Strategy

Initial consultations, needs assessment, and AI strategy formulation. (1-2 Weeks)

Phase 2: Pilot & Proof-of-Concept

Develop and deploy a small-scale AI solution to validate impact and gather initial data. (4-6 Weeks)

Phase 3: Full-Scale Integration

Expand the AI solution across relevant departments, integrate with existing systems, and provide user training. (8-12 Weeks)

Phase 4: Optimization & Scaling

Continuous monitoring, performance tuning, and identification of new AI opportunities for further growth. (Ongoing)

Ready to Transform Your Enterprise with AI?

Unlock the full potential of advanced AI and achieve unprecedented operational efficiency and strategic advantage.

Ready to Get Started?

Book Your Free Consultation.

Let's Discuss Your AI Strategy!


