A single sheet with faint diagram lines on a desk, one element in sharp focus, the rest blurred

We Rebuilt the Same Skill Four Times. Here’s What Finally Made It Idiot-Proof for AI

Posted by:

ebizapple

On:

June 23, 2026

We had a small skill in our content system: take an article, give it a hero image, attach it. Nothing exotic. And yet, lined up in our library, there were four versions of it. v1, v2, v3, v4. Each time an agent picked it up, it got something subtly wrong, and someone built another version to patch the gap.

Four rebuilds of one trivial task is not a coding problem. It’s a writing problem. The skill worked fine for whoever wrote it. It just couldn’t survive being handed to anyone else.

So we stopped asking “does this skill run?” and started asking a harder question: could the next agent, with none of our context, possibly misread this? The answer was yes, in five places.

The five ways it got misread

The first was invisible. The skill was named like it made the image. It didn’t. It quietly expected a finished image at a public web address. An agent reading the name would generate nothing, pass an empty value, and watch it fail for reasons the skill never mentioned.

The second was a silent corruption. If you didn’t state the file type, it assumed JPEG. Hand it a PNG without saying so, and it saved a file the site couldn’t render. No error. A broken image that looked, from the logs, like a success.

The third and fourth were quieter: it was wired to one specific site, so an agent elsewhere published to the wrong place; and it was a sub-routine other skills couldn’t reliably call, so half the time it refused to start.

The fifth mattered most. It verified nothing. It uploaded the image, set it, and reported done, without ever checking the image had attached. A skill that doesn’t read back its own result will tell you it worked while it’s failing.

What we changed

We rewrote it around one rule: a skill is finished when it cannot be misread and it checks its own work.

That meant one self-contained routine that generates the image and attaches it in one pass, with no hidden precondition. Stating the file type as a fact and matching it to the actual bytes. Naming the target instead of assuming it. And a final step that re-reads the article, confirms the image is attached, and fails loudly if it isn’t.

Then we ran it. And it failed on the first try. The image model the skill named had been retired since the last version was written. That was the fifth ambiguity in disguise: a skill that hardcodes a tool version has a silent expiry date. We swapped in the current model and rewrote the skill to name the capability it needs, not the version it was born with. On the next run it generated the image, attached it, and its own check confirmed the attachment.

The rule that fell out

The test of a skill is not “did it run for the person who wrote it.” Everything runs for them; they have the context the skill leaves out. The real test is whether the next reader, with none of that context, lands in the same place. That reader is increasingly not a person. It’s another model, taking every word literally and supplying nothing you forgot to say.

There’s a limit worth naming. You can over-correct. A skill buried under two thousand words of caveats is also misread, because nobody reads it. The craft isn’t adding all the words. It’s finding the few that remove the ambiguity and cutting the rest. Our hardened version is shorter than v4, not longer.

How we know

This isn’t a tidy theory. We watched one skill get rebuilt four times, found the five places it leaked, hardened it, and watched the hardened version catch a real failure on its first run before fixing it and passing its own check. The proof is a single article that now carries a hero image it attached and verified by itself.

If you write skills for agents to reuse, assume the reader is literal, contextless, and slightly unlucky. Say the precondition. State the type. Name the target. Don’t pin a version. And make the last step read back the result, so the skill can’t lie about whether it worked.

Posted by

ebizapple

Systems & Mental Models