create-skill

Creates new agent skills following the Agent Skills specification. Investigates the repo for conventions, designs the skill around progressive disclosure, writes SKILL.md with effective trigger descriptions, and validates with representative prompts. Use when the user wants to create a skill, build a SKILL.md, turn a workflow into a reusable skill, teach the agent a new task, scaffold a new agent capability, has a repeated workflow they want to codify, is frustrated by inconsistent agent behavior, or wants to package expertise for a team.

Create agent skills that are portable, easy to trigger, and cheap to load. A skill is a folder containing a SKILL.md file with YAML frontmatter and markdown instructions.

Workflow

  1. Intake (mandatory gate) — Understand what the skill should do. Ask at least 3 targeted questions before drafting anything. Collect:

    • A short name (lowercase, hyphenated)
    • What the skill enables the agent to do
    • When it should activate (trigger conditions)
    • What success looks like

    Summarize your understanding and get explicit confirmation before proceeding. Do not write SKILL.md until the user confirms.

  2. Investigate the repo — Before asking further questions, search the repo for:

    • Existing skills, conventions, and workflow docs
    • Scripts, templates, schemas relevant to the target workflow
    • Tool or dependency requirements
    • Whether the conversation already contains a workflow to capture
  3. Clarify — Ask only questions that materially affect the skill. Push until these are clear:

    • Required workflow steps and their order
    • Required inputs and expected outputs
    • Dependencies on tools, scripts, or services
    • Whether the skill needs references/, scripts/, or assets/
  4. Design the package — Structure:

    skill-name/
    ├── SKILL.md        # Metadata + core workflow
    ├── references/     # Detailed docs, loaded on demand
    ├── scripts/        # Executable code
    └── assets/         # Templates, resources
    
    • Keep SKILL.md under 500 lines
    • Move bulky detail into references/
    • Put deterministic execution in scripts/
    • Don't duplicate guidance across files
  5. Write SKILL.md — Structure:

    ---
    name: skill-name
    description: >
      What the skill does and produces. Use when the user wants
      to <scenario>, mentions <keyword>, or asks about <topic>.
    ---
    

    Frontmatter rules:

    • Required: name and description in frontmatter. Other fields depend on your project's conventions.

    Writing rules:

    • Description optimizes activation, not teaching. State the job and when to use it in words a user would actually say. Include both actions and situations. Keep workflow details out of the description.

      Bad: Follows a 7-step process to generate SKILL.md files with YAML frontmatter.

      Good: Creates agent skills. Use when the user wants to build a SKILL.md, turn a workflow into a reusable skill, or is frustrated by inconsistent agent behavior.

    • Body is procedural and imperative. Tell the agent exactly how to proceed. Don't restate trigger criteria from the description — a "When to use" section in the body duplicates the description.

    • Use imperative form. "Do not", "Use", "Run" — not "prefer" or "consider".

    • Be concise. Terse reminders, not tutorials.

    • Include a complete example. One full, copy-paste-ready artifact beats scattered snippets.

    • Include a Boundaries section (mandatory). List what the skill DOES and Does NOT do.

    • Include a Common Failures section. List 2–3 domain-specific mistakes an agent would make without this guidance.

    See references/example-skill.md for a complete finished skill demonstrating these principles.

  6. Validate — Test the skill with representative prompts:

    • 2–3 realistic positive prompts (things users would say)
    • At least 1 negative prompt (adjacent but shouldn't trigger)

    Write a brief validation report noting:

    • Which prompts triggered correctly
    • Which failed and why (trigger wording, workflow ambiguity, or missing resources)
    • What was fixed based on the failures

    Skip validation only for trivial skills where the trigger surface is obvious.

    Portability check — For distributable skills, verify:

    • No hardcoded project-specific paths (use discovery)
    • No project-specific terminology (internal jargon)
    • No references to rules or tools that exist only in your repo
    • Instructions work in any repo with any directory layout
  7. Acknowledge sources — If the skill draws on external practices, create references/ACKNOWLEDGMENTS.md listing each source with a link, license, what was adapted, and the version it was adopted in.

  8. Confirm — Show the user the created skill and ask if adjustments are needed.
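Steps 4 and 5 of the workflow above can be sketched as a small scaffolding helper. This is a minimal illustration, not part of the skill itself: the function names `render_skill_md` and `scaffold_skill` are hypothetical, and the name pattern is one assumed reading of "lowercase, hyphenated".

```python
import re
from pathlib import Path

# Assumption: "lowercase, hyphenated" means lowercase words or digits joined by single hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
MAX_SKILL_LINES = 500  # "Keep SKILL.md under 500 lines" (step 4)


def render_skill_md(name: str, description: str, body: str) -> str:
    """Return SKILL.md text: YAML frontmatter followed by the markdown body."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"name must be lowercase and hyphenated: {name!r}")
    # Write the description as a folded block scalar (`>`), indented two spaces.
    folded = "\n".join(f"  {line.strip()}" for line in description.strip().splitlines())
    text = f"---\nname: {name}\ndescription: >\n{folded}\n---\n\n{body.strip()}\n"
    if len(text.splitlines()) >= MAX_SKILL_LINES:
        raise ValueError("SKILL.md must stay under 500 lines; move detail to references/")
    return text


def scaffold_skill(root: Path, name: str, description: str, body: str) -> Path:
    """Create <root>/<name>/ with SKILL.md plus references/, scripts/, and assets/."""
    text = render_skill_md(name, description, body)  # validate before touching disk
    skill_dir = root / name
    for sub in ("references", "scripts", "assets"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    (skill_dir / "SKILL.md").write_text(text, encoding="utf-8")
    return skill_dir
```

Validating before creating directories means a rejected name or an oversized body leaves no half-built package behind. In practice the three subfolders would be created only when the skill needs them; the sketch creates all of them for simplicity.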

Boundaries

  • DOES create skill directories, SKILL.md, references/, scripts/
  • DOES validate with representative prompts
  • Does NOT modify existing skills
  • Does NOT create rules or profiles (separate workflows)

Example Scenario

User: "Turn my database migration steps into a skill." → Investigate repo (Flyway config) → ask about rollback scope → create migrate-database/SKILL.md → validate with prompts.

Common Failures

  • Description leaks workflow — the agent reads the summary and skips the body, following a shortcut instead of the full procedure.
  • Body too abstract to act on — "investigate the problem" isn't actionable. "Run git log --oneline -20 to check recent patterns" is.
  • Weak enforcement in instructions — If evals show the agent ignoring a step, add it to a Common Failures section with NEVER/MUST language. Explicit failure modes with strong directives are more effective than polite workflow steps.

Quality Checklist

Before finalizing, work through references/skill-design-checklist.md, references/skill-validation.md, and references/token-optimization.md.

├── references/
│ ├── example-skill.md
│ ├── skill-validation.md
│ └── token-optimization.md
└── SKILL.md

references/example-skill.md

Example Skill

A finished skill for running the test suite:

---
name: run-tests
description: >
  Runs the project's test suite and reports results. Use when
  the user wants to run tests, check if tests pass, verify
  changes don't break anything, or asks about test failures.
---

# Run Tests

Run the full test suite, surface failures clearly, and suggest
fixes when the cause is obvious.

## Workflow

1. **Detect the test runner** — Check package.json scripts,
   Makefile targets, or pyproject.toml for the test command.
   Try `npm test`, `make test`, then `pytest`, in that order.

2. **Run the suite** — Execute the detected command. Capture
   stdout and stderr.

3. **Report results** — If all tests pass, confirm with a
   one-line summary. If tests fail, list each failing test
   with its error message and the file:line reference.

4. **Suggest fixes** — For failures with an obvious cause
   (import error, missing env var, typo), propose a concrete
   fix. For ambiguous failures, ask the user before changing
   anything.

## Conventions

- Never modify test files to make tests pass.
- Run the full suite unless the user explicitly scopes to a
  subset.

This example shows:

  • A description with natural trigger phrases and situations
  • A tight four-step workflow
  • Conventions that constrain behavior without rigid rules
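
If the "Detect the test runner" step of the example were moved into scripts/ as deterministic code, it might look roughly like the sketch below. The function name `detect_test_command` is hypothetical, and the Makefile and pyproject.toml checks are deliberately crude text heuristics rather than real parsers.

```python
import json
import re
from pathlib import Path
from typing import Optional


def detect_test_command(repo: Path) -> Optional[str]:
    """Return the first test command found, checking npm, then make, then pytest."""
    package_json = repo / "package.json"
    if package_json.is_file():
        try:
            data = json.loads(package_json.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            data = {}
        scripts = data.get("scripts", {}) if isinstance(data, dict) else {}
        if isinstance(scripts, dict) and "test" in scripts:
            return "npm test"

    makefile = repo / "Makefile"
    if makefile.is_file():
        # Heuristic: a line starting with "test:" declares a test target.
        if re.search(r"^test\s*:(?!=)", makefile.read_text(encoding="utf-8"), re.MULTILINE):
            return "make test"

    pyproject = repo / "pyproject.toml"
    if pyproject.is_file():
        # Heuristic: any mention of pytest in pyproject.toml counts as configured.
        if "pytest" in pyproject.read_text(encoding="utf-8"):
            return "pytest"

    return None
```

Returning `None` instead of guessing leaves the decision to the agent, which can then ask the user which command to run.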

references/skill-design-checklist.md

Skill Design Checklist

Use this before finalizing a generated or revised skill.

Problem and Scope

  • Is the target problem concrete and repeatable?
  • Is the skill solving one coherent job rather than several?
  • Is the audience clear?
  • Are out-of-scope cases obvious from the wording?

Trigger Quality

  • Does the description say what the skill does?
  • Does it say when to use it?
  • Does it include phrases a user would actually say?
  • Does it avoid vague language like "helps with" or "handles"?
  • If over-triggering is a risk, does the description narrow scope clearly?
  • Does the body avoid restating trigger criteria already covered by the description?
  • Does the description answer both "what" and "when" in a single read? (completeness)
  • Could this skill accidentally trigger instead of another skill in the same workspace? Are the trigger terms unique to this skill's domain? (distinctiveness)
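
Some of the trigger-quality questions above can be approximated with a rough lint pass over the description. The sketch below is only a heuristic aid, assuming that a "Use when" clause signals the "when" half and that phrases such as "7-step" signal a workflow leak; `lint_description` and its patterns are hypothetical, and a clean result does not prove the description triggers well.

```python
import re
from typing import List

# Vague phrases named in the checklist.
VAGUE_PHRASES = ("helps with", "handles")

# Assumed signals that the description is summarizing the procedure.
LEAK_PATTERNS = (
    re.compile(r"\b\d+[- ]step\b", re.IGNORECASE),
    re.compile(r"\bstep \d+\b", re.IGNORECASE),
    re.compile(r"\bfirst\b.*\bthen\b", re.IGNORECASE | re.DOTALL),
)


def lint_description(description: str) -> List[str]:
    """Return human-readable warnings for a skill description."""
    warnings = []
    lowered = description.lower()
    if "use when" not in lowered:
        warnings.append("no 'Use when' clause: the description does not say when to activate")
    for phrase in VAGUE_PHRASES:
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            warnings.append(f"vague phrase: {phrase!r}")
    if any(pattern.search(description) for pattern in LEAK_PATTERNS):
        warnings.append("possible workflow leak: keep procedure details in the body")
    return warnings
```

Run against the "Bad" and "Good" descriptions from the writing rules, this flags the first for a missing "Use when" clause and a workflow leak, and returns no warnings for the second.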

Workflow Quality

  • Are the steps in the correct order?
  • Does the skill investigate local context before asking avoidable questions?
  • Are decision points and defaults explicit?
  • Are required inputs and expected outputs stated?
  • Are external dependencies named only when necessary?

Packaging

  • Is SKILL.md sufficient on its own?
  • If not, are extra details in references/ instead of bloating the main file?
  • Are scripts/ included only for deterministic or fragile tasks?
  • Are assets/ included only when they materially improve execution?

Validation

  • Are there concrete examples or scenarios?
  • Does the skill define what success looks like?
  • Does it describe how to catch obvious failure modes?
  • If assumptions were needed, are they stated explicitly?

Portability

  • Is the wording generic unless the user explicitly asked for a repo-bound skill?
  • Does the skill avoid environment assumptions it cannot justify?
  • If the skill is repo-bound, does it say so plainly?
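
Hardcoded absolute paths are one environment assumption that is easy to scan for mechanically. The following sketch assumes a small, hypothetical set of path roots (`/home`, `/Users`, and so on, plus Windows drive letters); it will miss other forms and is not a substitute for reading the skill.

```python
import re
from typing import List, Tuple

# Assumption: these roots cover the most common machine-specific absolute paths.
HARDCODED_PATH = re.compile(
    r"(?<![\w.~])(?:/(?:home|Users|opt|srv|var|etc)/[\w./-]+|[A-Za-z]:\\[\w\\.-]+)"
)


def find_hardcoded_paths(text: str) -> List[Tuple[int, str]]:
    """Return (line number, path) pairs for absolute paths found in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in HARDCODED_PATH.finditer(line):
            # Drop sentence punctuation that the character class picked up.
            hits.append((lineno, match.group(0).rstrip(".,")))
    return hits
```

The lookbehind keeps relative paths such as `references/home/notes.md` and URLs from being reported, since their slash follows a word character.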

Consistency

  • Does the name follow existing patterns (lowercase-hyphenated)?
  • Does the description avoid leaking workflow steps that would let the agent skip the body?
  • Is there a companion rule or skill that should be referenced?
  • Does the CHANGELOG format match other artifacts?

Validation

  • Have you tested with at least one near-miss negative prompt?
  • Could someone verify the output is correct without re-reading the whole skill?

Token Budget

  • Is the front-loaded cost justified for activation frequency?
  • Could any section move to references/ without losing workflow clarity?

Boundaries

  • Does the skill state what it DOES?
  • Does it state what it Does NOT do?
  • Are the boundaries specific enough to prevent scope creep?
  • Would an agent know which files and actions are off-limits?
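
The first two Boundaries questions lend themselves to an automated check. A minimal sketch, assuming the skill uses markdown `#` headings and bullets that begin with "DOES" or "Does NOT"; the helper name `check_boundaries` is hypothetical.

```python
import re
from typing import List

HEADING = re.compile(r"^#{1,6}\s+(.*\S)\s*$")
BULLET_PREFIX = "-*\u2022 \t"  # common bullet markers plus whitespace


def check_boundaries(skill_md: str) -> List[str]:
    """Report a missing Boundaries section or missing DOES / Does NOT entries."""
    found_section = False
    in_section = False
    does = 0
    does_not = 0
    for line in skill_md.splitlines():
        heading = HEADING.match(line)
        if heading:
            in_section = heading.group(1).strip().lower() == "boundaries"
            found_section = found_section or in_section
            continue
        if not in_section:
            continue
        item = line.lstrip(BULLET_PREFIX)
        if re.match(r"does\s+not\b", item, re.IGNORECASE):
            does_not += 1
        elif re.match(r"does\b", item, re.IGNORECASE):
            does += 1
    if not found_section:
        return ["missing Boundaries section"]
    problems = []
    if does == 0:
        problems.append("Boundaries lists nothing the skill DOES")
    if does_not == 0:
        problems.append("Boundaries lists nothing the skill Does NOT do")
    return problems
```

Whether the boundaries are specific enough to prevent scope creep still needs a human or agent judgment; the check only confirms that both halves are present.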

references/skill-validation.md

Skill Validation

Use this when a skill needs prompt-level validation before shipping.

Goal

Prove that the skill:

  • Triggers for obvious requests
  • Triggers for paraphrased requests
  • Does not trigger for nearby but unrelated work
  • Gives another agent enough detail to act without guessing

Minimum Prompt Set

Write at least 3 prompts:

  • positive-obvious — direct request using the most likely trigger words
  • positive-paraphrased — same job, different wording
  • negative-adjacent — close enough to confuse a weak description, but should not load the skill. The best negatives are near-misses: queries that share keywords or domain with the skill but need something different. "Write a fibonacci function" is too easy as a negative for a deploy skill — "set up a staging environment" is a real near-miss that tests discrimination.

Add more prompts only when the surface area is large or the user explicitly wants deeper validation.
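
One way to keep the minimum prompt set and its results together is a small record type with a report function. This is a sketch under stated assumptions: `PromptCase`, `has_minimum_set`, and `validation_report` are hypothetical names, and `would_trigger` stands in for whatever judgment is actually used (a manual read or a real agent run), since activation is decided by the agent and not by this code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

REQUIRED_KINDS = {"positive-obvious", "positive-paraphrased", "negative-adjacent"}


@dataclass(frozen=True)
class PromptCase:
    kind: str             # one of REQUIRED_KINDS
    prompt: str           # what the user would say
    should_trigger: bool  # expected activation


def has_minimum_set(cases: Sequence[PromptCase]) -> bool:
    """True when every required prompt kind appears at least once."""
    return REQUIRED_KINDS <= {case.kind for case in cases}


def validation_report(
    cases: Sequence[PromptCase], would_trigger: Callable[[str], bool]
) -> List[str]:
    """Compare expected and observed activation for each prompt."""
    lines = []
    for case in cases:
        actual = would_trigger(case.prompt)
        status = "PASS" if actual == case.should_trigger else "FAIL"
        lines.append(
            f"{status} [{case.kind}] {case.prompt!r}: "
            f"expected trigger={case.should_trigger}, got {actual}"
        )
    return lines
```

For a deploy skill, the negative case would be the near-miss from the text, for example `PromptCase("negative-adjacent", "Set up a staging environment", False)`. Any FAIL line then points at trigger wording, workflow ambiguity, or missing resources to fix.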

Comparison Modes

Choose the lightest comparison that answers the risk:

  • manual simulation — read the prompt against the skill and judge whether the trigger and workflow would work
  • before vs after — compare the current skill against the revised skill
  • trigger wording A vs B — use when the main risk is activation quality rather than body content

When Validation Can Be Skipped

Skip only when all of these are true:

  • The edit does not change the trigger surface
  • The edit does not change the workflow meaning
  • The edit does not add or remove important resources

If any of those changed, run at least a lightweight prompt simulation.
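
The skip rule above is a conjunction of three conditions, which can be written down directly. A minimal sketch; the `EditImpact` record and its field names are hypothetical, and filling them in still requires judging the edit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditImpact:
    changes_trigger_surface: bool   # description or trigger wording changed
    changes_workflow_meaning: bool  # steps, order, or decision rules changed
    changes_resources: bool         # important references, scripts, or assets added or removed


def can_skip_validation(impact: EditImpact) -> bool:
    """Validation may be skipped only when none of the three areas changed."""
    return not (
        impact.changes_trigger_surface
        or impact.changes_workflow_meaning
        or impact.changes_resources
    )
```

A typo fix in the body yields `EditImpact(False, False, False)` and can skip validation; rewording the description sets the first field and calls for at least a lightweight prompt simulation.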

Review Checklist

For each prompt, record:

  • Should the skill trigger?
  • Which words or phrases should cause activation?
  • Which part of the body should guide the next step?
  • Where could an agent take a shortcut or misread?

Common failures:

  • Description too vague to trigger
  • Description so broad that it over-triggers
  • Description summarizes workflow, tempting the agent to skip the body
  • Body assumes repo facts it never tells the reader to discover
  • Examples are longer than the rules they clarify

references/token-optimization.md

Token Optimization

Use this after the skill works. Optimize for lower context cost without reducing execution quality.

Keep in SKILL.md

  • The trigger-bearing frontmatter
  • The core workflow
  • Critical decision rules
  • Short examples that anchor the workflow
  • Direct links to bundled references

Move Out of SKILL.md

  • Long domain primers
  • Exhaustive edge-case catalogs
  • Variant-specific instructions
  • Large examples
  • Detailed command references
  • Documentation discoverable from the repo at runtime

Compression Rules

  • Delete repeated ideas before rewriting sentences
  • Prefer short checklists over explanatory paragraphs
  • Replace generic advice with workflow-specific rules
  • Keep examples only if they teach something not already obvious from the instructions
  • Avoid motivational or narrative text
  • Prefer one sharp sentence over two soft ones

Smell Tests

The main file is probably too large if:

  • Multiple sections repeat the same workflow in different words
  • The body restates trigger criteria already in the description (e.g., a "When to use" section that duplicates the description)
  • Examples are longer than the instructions they illustrate
  • Reference material dominates the core procedure
  • The skill explains common concepts instead of workflow-specific guidance
  • Multiple sections serve the same purpose (e.g., a quality checklist and a common failures section that overlap)

Final Pass

Ask:

  1. What text can be deleted with no loss of behavior?
  2. What text belongs in references/?
  3. What assumptions should be stated once instead of repeated?
  4. Is the description still strong enough to trigger correctly after trimming?
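
A few of the smell tests can be checked mechanically before the final pass. The sketch below is a rough proxy, assuming markdown `#` headings and fenced examples; `smell_test` is a hypothetical helper, and comparing fenced lines to prose lines only approximates "examples are longer than the instructions they illustrate".

```python
import re
from typing import List

MAX_LINES = 500  # SKILL.md line budget from the workflow
FENCE = "`" * 3  # markdown code fence marker
WHEN_TO_USE = re.compile(r"^\s*#{1,6}\s*when to use\b", re.IGNORECASE)


def strip_frontmatter(lines: List[str]) -> List[str]:
    """Return the body lines after a leading '---' ... '---' block, if present."""
    if lines and lines[0].strip() == "---":
        for index in range(1, len(lines)):
            if lines[index].strip() == "---":
                return lines[index + 1:]
    return lines


def smell_test(skill_md: str) -> List[str]:
    """Return warnings for size, duplicated trigger criteria, and example bloat."""
    warnings = []
    lines = skill_md.splitlines()
    if len(lines) >= MAX_LINES:
        warnings.append(f"SKILL.md has {len(lines)} lines; keep it under {MAX_LINES}")
    body = strip_frontmatter(lines)
    if any(WHEN_TO_USE.match(line) for line in body):
        warnings.append("body has a 'When to use' section that duplicates the description")
    example_lines = 0
    prose_lines = 0
    in_fence = False
    for line in body:
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence  # toggle on opening and closing fences
            continue
        if not stripped:
            continue
        if in_fence:
            example_lines += 1
        else:
            prose_lines += 1
    if example_lines > prose_lines:
        warnings.append("fenced examples outweigh the instructions; move them to references/")
    return warnings
```

An empty result means none of these three smells were detected. The four questions in the final pass still need to be answered by reading the skill.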

create-skill is a meta-skill that creates other agent skills following the Agent Skills specification.

Why a mandatory intake gate before writing

Jumping to writing without understanding requirements produces generic skills that don't match the user's actual workflow. A minimum of three questions forces the agent to understand before acting.

Why descriptions optimize for activation, not teaching

LLMs use the description field for routing — deciding which skill to activate. A description that teaches the workflow instead of describing triggers causes mis-activation. The body teaches; the description matches.

Why skills are validated with negative prompts

A skill that activates on everything is useless. Negative prompts ("this should NOT trigger the skill") test that the trigger surface has boundaries. Without them, over-eager activation degrades the full system.

[1.3.0] - 2026-06-11

Added

Changed

Removed

[1.2.1] - 2026-05-26

Changed

[1.2.0] - 2026-05-24

Added

Changed

[1.1.0] - 2026-05-05

Added

Changed

[1.0.0] - 2026-04-28

Added