Enterprise AI Analysis
Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization
Authors: Chaewon Moon, Chulhee Yun, Dongkuk Si | Date: March 9, 2026
This research explores the implicit bias of Sharpness-Aware Minimization (SAM) in deep learning. While SAM and gradient descent (GD) behave similarly for shallow networks (L=1), deeper networks (L≥2) exhibit distinct patterns. Specifically, l∞-SAM's limit direction becomes highly sensitive to initialization, sometimes favoring minor features. l2-SAM, while asymptotically matching the l1 max-margin solution, shows 'sequential feature amplification' in its finite-time dynamics, where minor features are amplified early before major ones dominate. This highlights that infinite-time bias analysis alone is insufficient and that a finite-time perspective is crucial.
Our analysis uncovers critical insights into the dynamics of Sharpness-Aware Minimization (SAM), especially its depth-induced implicit biases. These findings offer pathways for more robust and effective AI deployments in complex enterprise environments.
Deep Analysis & Enterprise Applications
The sections below examine the specific findings of the research and their implications for enterprise AI deployments.
Implicit Bias of SAM
The implicit bias of SAM is analyzed for L-layer diagonal linear networks on linearly separable binary classification with logistic loss. For L=1, both l∞-SAM and l2-SAM recover the l2 max-margin classifier, matching GD. For L≥2, however, the behaviors diverge.
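To make the setting concrete, here is a minimal sketch of the setup, assuming the common depth-L diagonal parameterization f(x) = ⟨w_1 ⊙ ⋯ ⊙ w_L, x⟩ and mean logistic loss; the function names and this exact parameterization are illustrative assumptions rather than the paper's definitions.

```python
import numpy as np

def predict(ws, X):
    """Depth-L diagonal linear network: f(x) = <w_1 * ... * w_L, x>,
    where * is the elementwise (Hadamard) product. `ws` is a list of
    L weight vectors; their product is the effective linear predictor."""
    beta = np.prod(np.stack(ws), axis=0)
    return X @ beta

def logistic_loss(ws, X, y):
    """Mean logistic loss on separable data with labels y in {-1, +1}."""
    margins = y * predict(ws, X)
    return np.mean(np.logaddexp(0.0, -margins))  # log(1 + exp(-m)), stable
```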
Depth-Induced Bias Evolution
l∞-SAM Characteristics
For depth L=2, l∞-SAM's limit direction depends critically on initialization, converging either to 0 or to a standard basis vector. For L≥3, the bias is even more sensitive to initialization and can favor minor features. This contrasts sharply with GD; a minimal update sketch follows the comparison table below.
| Aspect | l∞-SAM (L≥2) | Gradient Descent (GD) |
|---|---|---|
| Limit Direction | Initialization-dependent; 0 or a standard basis vector | l1 max-margin classifier (for L=2) |
| Sensitivity to Init. | High, and increasingly so for L≥3 | Low; the limit direction is initialization-independent |
| Minor Feature Bias | Present; can favor minor features | Absent; major features dominate |
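As referenced above, the following is a minimal sketch of one l∞-SAM update, assuming the standard ascent-then-descent SAM scheme in which the worst-case perturbation over an l∞ ball of radius ρ is ρ · sign(∇L); `grad_fn` and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

def linf_sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One l∞-SAM step: the l∞-ball ascent direction is sign(grad), so
    perturb by rho * sign(grad), then take the descent step using the
    gradient evaluated at the perturbed point."""
    g = grad_fn(w)
    w_adv = w + rho * np.sign(g)     # worst case over the l-infinity ball
    return w - lr * grad_fn(w_adv)   # descend with the perturbed gradient
```

Note that sign(·) perturbs every coordinate by the same magnitude regardless of its gradient, which offers one intuition for why small (minor) coordinates can be disproportionately affected.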
l2-SAM Dynamics
For L=2, l2-SAM's limit direction matches the l1 max-margin solution, just as GD's does. However, its finite-time dynamics exhibit 'sequential feature amplification': the model initially relies on minor coordinates before its weight gradually shifts to the major ones.
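For contrast with the l∞ variant, here is a minimal sketch of one l2-SAM update under the same assumed ascent-then-descent scheme, where the l2-ball ascent direction is the normalized gradient; again, `grad_fn` and the hyperparameters are illustrative.

```python
import numpy as np

def l2_sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One l2-SAM step: perturb along the normalized gradient (the
    worst case over an l2 ball of radius rho), then descend with the
    gradient evaluated at the perturbed point."""
    g = grad_fn(w)
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)  # eps avoids 0-division
    return w - lr * grad_fn(w_adv)
```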
Sequential Feature Amplification in l2-SAM
This phenomenon highlights a crucial aspect of l2-SAM's training dynamics: which features the model relies on shifts with both training time and initialization scale. Three regimes emerge; a toy simulation sketch follows the regime descriptions.
Regime 1: Collapse to Origin
For small initialization scales, the predictor remains near the origin, failing to express features, and the loss does not vanish.
Regime 2: Time-wise Amplification
At intermediate initialization scales, the dominant coordinate shifts over training from minor to major features, a 'minor-first, major-last' behavior that produces an early plateau in the training loss.
Regime 3: Major Feature Dominance
For large initialization scales, the major feature dominates from the outset and maintains this alignment throughout training.
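One way to probe these regimes is to sweep the initialization scale on toy data and watch which coordinate of the effective predictor dominates over training. The sketch below uses a depth-2 network in a w ⊙ w parameterization, synthetic two-feature data, and hand-picked hyperparameters; all of these are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy separable data: coordinate 0 is the "major" feature (large margin),
# coordinate 1 the "minor" one (weakly informative). Illustrative only.
n = 200
y = rng.choice([-1.0, 1.0], size=n)
X = np.stack([2.0 * y + 0.1 * rng.standard_normal(n),
              0.5 * y + 0.1 * rng.standard_normal(n)], axis=1)

def grad(w):
    """Gradient of the mean logistic loss for the depth-2 diagonal net
    f(x) = <w * w, x>, by the chain rule through beta = w * w."""
    beta = w * w
    s = -y / (1.0 + np.exp(y * (X @ beta)))       # dloss/dprediction
    return (X * s[:, None]).mean(axis=0) * 2.0 * w

def l2_sam_step(w, lr=0.1, rho=0.05, eps=1e-12):
    """One l2-SAM step, as in the sketch above."""
    g = grad(w)
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)
    return w - lr * grad(w_adv)

for alpha in (1e-3, 1e-1, 1.0):                   # initialization scales
    w = alpha * np.ones(2)
    for t in range(5001):
        if t % 1000 == 0:
            print(f"alpha={alpha:g} t={t}: "
                  f"dominant coord = {np.argmax(w * w)}")
        w = l2_sam_step(w)
```

Printing the dominant coordinate at checkpoints lets you see whether, for a given scale, the predictor collapses toward the origin, switches coordinates mid-training, or locks onto the major feature from the start.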
Quantify Your Enterprise AI Savings
Estimate the potential annual cost savings and reclaimed hours by implementing advanced AI strategies in your organization.
Your AI Implementation Roadmap
A structured approach to integrating advanced AI, designed for clarity and maximum impact within your enterprise.
Phase 1: Discovery & Strategy
Initial consultations, needs assessment, and AI strategy formulation. (1-2 Weeks)
Phase 2: Pilot & Proof-of-Concept
Develop and deploy a small-scale AI solution to validate impact and gather initial data. (4-6 Weeks)
Phase 3: Full-Scale Integration
Expand the AI solution across relevant departments, integrate with existing systems, and provide user training. (8-12 Weeks)
Phase 4: Optimization & Scaling
Continuous monitoring, performance tuning, and identification of new AI opportunities for further growth. (Ongoing)
Ready to Transform Your Enterprise with AI?
Unlock the full potential of advanced AI and achieve unprecedented operational efficiency and strategic advantage.