Enterprise AI Analysis

SoK: Agentic Skills — Beyond Tool Use in LLM Agents

Large Language Model (LLM) agents have advanced rapidly, yet a fundamental inefficiency persists: each new task requires re-deriving an execution strategy from scratch. This paper introduces the concept of agentic skills, which are reusable, callable modules that encapsulate procedural knowledge. We present a unified definition, a lifecycle model, design patterns, and taxonomies for skills. We also analyze their security implications, including supply-chain risks illustrated by a case study of the ClawHavoc campaign, and survey evaluation methods. Curated skills are shown to significantly improve agent success rates, which positions skills as critical components of robust, verifiable, and certifiable autonomous agents.


Executive Impact: Key Findings

Insights from cutting-edge research demonstrate the transformative potential of agentic skills in enterprise AI.

+16.2 pp: Curated Skills Boost Success Rate

Self-Generated Skills May Degrade Performance

Malicious Skill Rate (ClawHavoc)

Healthcare Domain Improvement


Deep Analysis & Enterprise Applications


Definition

Agentic skills are reusable, callable modules that encapsulate a sequence of actions or a policy for achieving a specific goal under recurring conditions. Formally, a skill is a four-tuple S = (C, π, T, R), where C is the applicability condition, π is the executable policy, T is the termination condition, and R is the reusable callable interface. This definition separates skills from atomic tools, one-time plans, and episodic memories, and it makes skills first-class units of procedural knowledge. The table below compares these abstractions.

Abstraction	Unit of Reuse	Execution Semantics	Verification Surface	Composability	Governance Surface
Tool	Single API call	Stateless, single invocation	Input/output schema	Sequential chaining	Permission per tool
Plan	Task decomposition	One-time reasoning scaffold	Step consistency	Hierarchical decomposition	N/A (ephemeral)
Episodic memory	Stored observation	Retrieval, no direct execution	Relevance, recency	Indirect (informs reasoning)	Access control on store
Prompt template	Text fragment	Injected into context window	Output quality	String concatenation	Template authorship
Agentic skill	Procedural module	Callable workflow with termination	Outcome correctness, safety	Hierarchical, DAG, recursive	Trust tier, sandboxing, provenance
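The following minimal Python sketch makes the four-tuple concrete. It rests on three assumptions. First, a skill operates on a dictionary-shaped state. Second, π is modeled as a step function applied repeatedly until T holds. Third, R is exposed as an ordinary function call. The `Skill` class, the `max_steps` bound and the example skill are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]


@dataclass
class Skill:
    """Hypothetical encoding of a skill S = (C, π, T, R)."""
    name: str
    applicable: Callable[[State], bool]  # C: applicability condition
    step: Callable[[State], State]       # π: executable policy, modeled here as one step
    done: Callable[[State], bool]        # T: termination condition
    max_steps: int = 100                 # assumption: safety bound, not part of the tuple

    def __call__(self, state: State) -> State:
        """R: the reusable callable interface."""
        if not self.applicable(state):
            raise ValueError(f"skill {self.name!r} is not applicable to this state")
        steps = 0
        while not self.done(state):
            if steps == self.max_steps:
                raise RuntimeError(f"skill {self.name!r} did not terminate in {self.max_steps} steps")
            state = self.step(state)
            steps += 1
        return state


# Toy skill (hypothetical): increment a counter until it reaches a target.
increment_to_target = Skill(
    name="increment_to_target",
    applicable=lambda s: "count" in s and "target" in s,
    step=lambda s: {**s, "count": s["count"] + 1},
    done=lambda s: s["count"] >= s["target"],
)

print(increment_to_target({"count": 0, "target": 3}))  # {'count': 3, 'target': 3}
```

A tool, by contrast, is a single stateless invocation. Here the call first checks C, then loops π until T is satisfied, which is what the table above calls a callable workflow with termination.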

Lifecycle

The skill lifecycle models the stages a skill passes through, from discovery and refinement to storage, retrieval, execution, and evaluation/update. It treats skills as evolving system components shaped by interaction and feedback. The stages are:

- Discovery: identifying recurring task patterns.
- Practice/Refinement: iteratively improving a skill.
- Distillation: packaging the procedure into a reusable form.
- Storage: persisting the skill.
- Retrieval/Composition: selecting and combining skills for a task.
- Execution: running the policy.
- Evaluation/Update: monitoring performance and updating the skill.

Enterprise Process Flow

Discovery → Practice/Refinement → Distillation → Storage → Retrieval/Composition → Execution → Evaluation/Update
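The flow above can be encoded as an ordered set of stages. This sketch adds one feedback rule that the flow does not spell out: a skill that fails evaluation returns to Practice/Refinement, and one that passes is stored again as an updated version. That rule, along with the names `Stage` and `next_stage`, is a hypothetical illustration.

```python
from enum import Enum


class Stage(Enum):
    DISCOVERY = "Discovery"
    REFINEMENT = "Practice/Refinement"
    DISTILLATION = "Distillation"
    STORAGE = "Storage"
    RETRIEVAL = "Retrieval/Composition"
    EXECUTION = "Execution"
    EVALUATION = "Evaluation/Update"


ORDER = list(Stage)  # members in definition order, i.e. the forward flow


def next_stage(stage: Stage, passed_evaluation: bool = True) -> Stage:
    """Return the stage that follows `stage` in the lifecycle.

    Assumption (not stated in the flow): after Evaluation/Update, a skill that
    failed goes back to Practice/Refinement; one that passed is stored again.
    """
    if stage is Stage.EVALUATION:
        return Stage.STORAGE if passed_evaluation else Stage.REFINEMENT
    return ORDER[ORDER.index(stage) + 1]


# Walk one pass of the forward flow.
stage = Stage.DISCOVERY
trace = [stage.value]
while stage is not Stage.EVALUATION:
    stage = next_stage(stage)
    trace.append(stage.value)
print(" -> ".join(trace))
```

The loop-back makes the lifecycle a cycle rather than a pipeline, matching the idea that skills keep evolving with feedback.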

Design Patterns

We identify seven design patterns that describe how skills are packaged, loaded, and executed:

- Metadata-Driven Disclosure (P1), for efficient loading.
- Code-as-Skill (P2), for determinism.
- Workflow Enforcement (P3), for reliability.
- Self-Evolving Skill Libraries (P4), for autonomous growth.
- Hybrid NL+Code Macros (P5), for flexibility.
- Meta-Skills (P6), for skill generation.
- Plugin/Marketplace Distribution (P7), for ecosystem scaling.

Each pattern strikes a different trade-off among context cost, determinism, composability, and governance.

7 Design Patterns Identified in Agentic Skills
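Metadata-Driven Disclosure (P1) keeps only short skill descriptions in the agent's context and loads a skill's full body when it is selected. The minimal sketch below assumes each skill is stored with a one-line description and a loader for its full body. `SkillEntry`, `SkillRegistry` and the two example skills are hypothetical. In a real agent the LLM itself would decide which skill to disclose.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SkillEntry:
    name: str
    description: str              # short metadata, always visible to the agent
    load_body: Callable[[], str]  # full instructions or code, fetched on demand


@dataclass
class SkillRegistry:
    entries: Dict[str, SkillEntry] = field(default_factory=dict)
    cache: Dict[str, str] = field(default_factory=dict)  # bodies already disclosed

    def register(self, entry: SkillEntry) -> None:
        self.entries[entry.name] = entry

    def metadata_prompt(self) -> str:
        """Cheap context: one line per skill, no bodies."""
        return "\n".join(f"- {e.name}: {e.description}" for e in self.entries.values())

    def disclose(self, name: str) -> str:
        """Load a skill's full body the first time the agent selects it."""
        if name not in self.cache:
            self.cache[name] = self.entries[name].load_body()
        return self.cache[name]


# Record which bodies were actually read, to show that loading is lazy.
reads: List[str] = []


def make_loader(name: str) -> Callable[[], str]:
    def load() -> str:
        reads.append(name)
        return f"<full instructions for {name}>"
    return load


registry = SkillRegistry()
registry.register(SkillEntry("pdf_tables", "Extract tables from PDF files", make_loader("pdf_tables")))
registry.register(SkillEntry("csv_clean", "Normalize messy CSV exports", make_loader("csv_clean")))

print(registry.metadata_prompt())
registry.disclose("csv_clean")
registry.disclose("csv_clean")  # served from cache, no second read
print(reads)  # ['csv_clean']
```

The trade-off is context cost against latency. The agent pays a few tokens per installed skill up front and the full body cost only for the skills it actually uses.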

Security & Governance

The skill layer introduces new attack surfaces:

- poisoned skill retrieval;
- malicious skill payloads (prompt injection or code injection);
- cross-tenant leakage;
- skill-drift exploitation;
- confused-deputy attacks via environmental injection;
- applicability-condition poisoning.

The proposed mitigations are a four-tier trust model (Metadata Only, Instruction Access, Supervised Execution, Autonomous Execution) and sandboxing. The ClawHavoc campaign serves as a stark case study, demonstrating large-scale credential and asset theft through malicious skills.
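The four-tier trust model can be read as an ordered permission ladder. The sketch below makes three assumptions that are not taken from the paper:

- each tier inherits the rights of the tiers beneath it;
- loading instructions requires at least Instruction Access;
- Supervised Execution means execution gated on human approval.

The action names and the `authorize` function are hypothetical.

```python
from enum import IntEnum


class TrustTier(IntEnum):
    METADATA_ONLY = 1
    INSTRUCTION_ACCESS = 2
    SUPERVISED_EXECUTION = 3
    AUTONOMOUS_EXECUTION = 4


# Hypothetical minimum tier for each action a host could gate.
REQUIRED_TIER = {
    "read_metadata": TrustTier.METADATA_ONLY,
    "load_instructions": TrustTier.INSTRUCTION_ACCESS,
    "execute": TrustTier.SUPERVISED_EXECUTION,
}


def authorize(tier: TrustTier, action: str, human_approved: bool = False) -> bool:
    """Return True if a skill granted `tier` may perform `action`.

    Higher tiers inherit lower-tier rights; below AUTONOMOUS_EXECUTION,
    execution also needs explicit human approval.
    """
    if tier < REQUIRED_TIER[action]:
        return False
    if action == "execute" and tier < TrustTier.AUTONOMOUS_EXECUTION:
        return human_approved
    return True


print(authorize(TrustTier.METADATA_ONLY, "load_instructions"))                    # False
print(authorize(TrustTier.SUPERVISED_EXECUTION, "execute"))                       # False
print(authorize(TrustTier.SUPERVISED_EXECUTION, "execute", human_approved=True))  # True
print(authorize(TrustTier.AUTONOMOUS_EXECUTION, "execute"))                       # True
```

Using `IntEnum` lets tiers be compared with `<`, so "at least this tier" checks stay one line. Under this reading, an unvetted skill would start at the lowest tier and be promoted only after review.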

ClawHavoc: A Real-World Agent Supply-Chain Attack

The ClawHavoc campaign seeded OpenClaw's skill registry, ClawHub, with nearly 1,200 malicious skills, leading to widespread credential and asset theft. The incident shows that agent skill ecosystems face the same severe supply-chain risks as traditional software package registries.

The attack vectors map onto the design patterns listed above:

- Poisoned skill retrieval and applicability-condition poisoning targeted P1 (Metadata-Driven Disclosure).
- Malicious code payloads carrying reverse shells and credential exfiltration targeted P2 (Code-as-Skill).
- Prompt injection via skill documentation targeted P5 (Hybrid NL+Code Macros).

Harvested assets included API keys, crypto wallets, browser credentials, SSH keys, and local files.
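One standard supply-chain control is to pin each reviewed skill to a cryptographic digest and refuse any package that no longer matches. This control is borrowed from package-manager lockfiles rather than proposed in the paper. The minimal sketch below uses Python's standard `hashlib`. `verify_skill`, the pinned-digest dictionary and the package contents are hypothetical.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_skill(name: str, package: bytes, pinned: Dict[str, str]) -> None:
    """Raise unless `package` matches the digest recorded when `name` was reviewed."""
    expected = pinned.get(name)
    if expected is None:
        raise PermissionError(f"{name}: no pinned digest, skill has not been reviewed")
    if sha256_hex(package) != expected:
        raise PermissionError(f"{name}: digest mismatch, package changed since review")


# Pin the digest at review time (hypothetical skill package contents).
reviewed = b"name: csv_clean\nentrypoint: clean.py\n"
pinned = {"csv_clean": sha256_hex(reviewed)}

verify_skill("csv_clean", reviewed, pinned)  # passes silently

tampered = reviewed + b"# appended after review\n"
try:
    verify_skill("csv_clean", tampered, pinned)
except PermissionError as err:
    print(err)  # csv_clean: digest mismatch, package changed since review
```

Pinning only detects changes made after review. A skill that was already malicious when first published, as in ClawHavoc, passes this check. The control therefore complements review, trust tiers and sandboxing rather than replacing them.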

Evaluation

Evaluating agentic skills means assessing five dimensions: Correctness, Robustness, Efficiency, Generalization, and Safety. Deterministic evaluation harnesses such as SkillsBench check the final environment state against expected outcomes, which makes assessment scalable and reproducible. On SkillsBench, curated skills raise success rates by 16.2 percentage points (pp). Self-generated skills, by contrast, may degrade performance, which underscores the need to verify skills before relying on them.

+16.2 pp Average Pass Rate Improvement with Curated Skills
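A few lines show both the harness idea (compare the final environment state against expected outcomes) and the difference between percentage points and relative percent. The tasks and the two agents are toy stand-ins, not SkillsBench tasks. `pass_rate`, `delta_pp` and `relative_pct` are hypothetical helpers.

```python
from typing import Any, Callable, Dict, List, Tuple

Env = Dict[str, Any]
Task = Tuple[Env, Env]        # (initial state, expected final state)
Agent = Callable[[Env], Env]


def pass_rate(agent: Agent, tasks: List[Task]) -> float:
    """Fraction of tasks where every expected key ends with the expected value."""
    passed = 0
    for initial, expected in tasks:
        final = agent(dict(initial))  # give the agent a copy of the initial state
        if all(final.get(key) == value for key, value in expected.items()):
            passed += 1
    return passed / len(tasks)


def delta_pp(baseline: float, treated: float) -> float:
    """Absolute change in percentage points."""
    return 100 * (treated - baseline)


def relative_pct(baseline: float, treated: float) -> float:
    """Relative change in percent; depends on the baseline (must be nonzero)."""
    return 100 * (treated - baseline) / baseline


# Toy tasks: increment x by one.
tasks = [({"x": x}, {"x": x + 1}) for x in range(1, 6)]


def baseline_agent(env: Env) -> Env:  # only succeeds on small inputs
    return {**env, "x": env["x"] + 1} if env["x"] < 3 else env


def skilled_agent(env: Env) -> Env:   # stand-in for an agent with a curated skill
    return {**env, "x": env["x"] + 1}


base = pass_rate(baseline_agent, tasks)
skilled = pass_rate(skilled_agent, tasks)
print(f"{base:.0%} -> {skilled:.0%}: +{delta_pp(base, skilled):.1f} pp, "
      f"+{relative_pct(base, skilled):.1f}% relative")
```

A gain reported in pp is an absolute change in pass rate. The relative gain in percent depends on the baseline and is usually a different number.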

Calculate Your Potential AI ROI

Estimate the efficiency gains and cost savings your enterprise could achieve with advanced AI agentic skills.

Inputs:

- your industry;
- the number of employees directly affected by AI;
- the average hours saved per employee per week on AI-enabled tasks;
- the average fully loaded hourly cost per employee ($).

Outputs: estimated annual savings ($) and annual hours reclaimed.
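The calculator's outputs presumably follow simple arithmetic. This sketch makes two assumptions:

- 52 working weeks per year, since the page does not state its figure;
- no adjustment for industry, which the calculator accepts as an input but whose effect is not described.

The function names are hypothetical.

```python
def annual_hours_reclaimed(employees: int, hours_saved_per_week: float,
                           working_weeks: int = 52) -> float:
    """Hours saved across the affected workforce in one year.

    working_weeks=52 is an assumption; the calculator does not state its figure.
    """
    return employees * hours_saved_per_week * working_weeks


def estimated_annual_savings(employees: int, hours_saved_per_week: float,
                             hourly_cost: float, working_weeks: int = 52) -> float:
    """Reclaimed hours valued at the fully loaded hourly cost (industry is ignored here)."""
    return annual_hours_reclaimed(employees, hours_saved_per_week, working_weeks) * hourly_cost


# Example inputs: 50 employees, 3 hours saved per week each, $60/hour fully loaded.
hours = annual_hours_reclaimed(50, 3)
savings = estimated_annual_savings(50, 3, 60)
print(f"{hours:,.0f} hours reclaimed, ${savings:,.0f} saved per year")
# 7,800 hours reclaimed, $468,000 saved per year
```

Both outputs scale linearly in every input. Halving the assumed hours saved per week therefore halves the projected savings.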

Your AI Implementation Roadmap

Our phased approach ensures a smooth and effective integration of agentic AI skills into your operations.

Phase 1: Discovery & Strategy

Assess current workflows, identify high-impact areas for AI integration, and define clear objectives and KPIs.

Phase 2: Pilot & Development

Develop and test initial agentic skills on a small scale, gathering feedback and refining for optimal performance.

Phase 3: Scaled Deployment

Expand AI agent deployment across relevant departments, ensuring seamless integration and user adoption.

Phase 4: Optimization & Governance

Continuously monitor performance, update skills, and establish robust governance frameworks for long-term reliability and security.

Ready to Transform Your Enterprise with AI?

Unlock the full potential of agentic AI skills. Schedule a personalized consultation with our experts to design your custom AI strategy.




Contact Us

1 888 985 3025

Solutions@OwnYourAi.com
