Notes on writing prompts you can actually delete
A good prompt is not the one that gets the model to answer well today. A good prompt is the one you’ll know how to delete a year from now, when the model is twice as capable and your prompt has accumulated three rounds of band-aids.
The “delete test”
Before adding a sentence to a prompt, I ask: will I know which problem this sentence was trying to fix, six months from now? If not, write that problem into a comment above the sentence. If you can’t even articulate the problem, the sentence probably isn’t earning its place.
Hard to delete
"Be very careful with the output format. Make sure to include all required fields. Output should be valid JSON. Do not add commentary."
Easy to delete
// gpt-4o-mini used to drop the
// 'confidence' field ~5% of the time
// (eval run 2024-11-12)
"Return all keys in the schema below, even if the value is null."
The second version is delete-ready. When a future model stops dropping fields, you can re-run the eval that motivated the line, confirm it passes without the line, and then delete it.
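The delete test can be automated once each clause has an eval behind it. Here is a minimal sketch, under some assumptions: `can_delete`, `failure_rate`, and the two stand-in eval functions are hypothetical names, and the first clause in the list is invented for illustration. A real `failure_rate` would call the model and score its outputs.

```python
from typing import Callable, Sequence


def can_delete(
    clauses: Sequence[str],
    index: int,
    failure_rate: Callable[[str], float],
    threshold: float = 0.0,
) -> bool:
    """True if the motivating eval still passes with clause `index` removed.

    `failure_rate` is a hypothetical hook: it takes prompt text, runs the eval
    that motivated the clause, and returns the fraction of failing cases.
    """
    without = "\n".join(c for i, c in enumerate(clauses) if i != index)
    return failure_rate(without) <= threshold


# Stand-in evals for illustration only; a real one would call the model.
def old_model_eval(prompt: str) -> float:
    # Pretend the old model drops the field ~5% of the time without the clause.
    return 0.0 if "Return all keys" in prompt else 0.05


def new_model_eval(prompt: str) -> float:
    # Pretend the newer model never drops the field.
    return 0.0


clauses = [
    "Classify the ticket and report a confidence score.",  # invented example
    "Return all keys in the schema below, even if the value is null.",
]

print(can_delete(clauses, 1, old_model_eval))  # False: the clause still earns its place
print(can_delete(clauses, 1, new_model_eval))  # True: the clause can go
```

The check only means something if the eval is the one that motivated the clause, which is why the comment has to name it.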
Comments are not lipstick
I keep a comment above every guardrail clause saying:
- which model version it was added for,
- which eval was failing,
- when I last re-ran that eval.
This sounds like over-engineering. It is not. It’s the difference between a prompt I can refactor and a prompt that has become a sedimentary rock.
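One way to keep those three facts from drifting away from the clause is to store them together. The following is a sketch, not a prescribed format: the `Guardrail` class, its field names, the eval name `missing-confidence-field`, and the 180-day staleness window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Guardrail:
    """One prompt clause plus the provenance needed to delete it later."""
    text: str        # the sentence that goes into the prompt
    added_for: str   # model version the clause was added for
    eval_name: str   # eval that was failing without it
    last_run: date   # when that eval was last re-run


def render_annotated(clauses: list[Guardrail]) -> str:
    """Source-file view: each clause preceded by its provenance comment."""
    blocks = []
    for c in clauses:
        blocks.append(
            f"// added for: {c.added_for}\n"
            f"// failing eval: {c.eval_name}\n"
            f"// last re-run: {c.last_run.isoformat()}\n"
            f"{c.text}"
        )
    return "\n\n".join(blocks)


def render_prompt(clauses: list[Guardrail]) -> str:
    """What the model sees: clause text only, comments stripped."""
    return "\n".join(c.text for c in clauses)


def stale(clauses: list[Guardrail], today: date, max_age_days: int = 180) -> list[Guardrail]:
    """Clauses whose eval has not been re-run within max_age_days."""
    return [c for c in clauses if (today - c.last_run).days > max_age_days]


clause = Guardrail(
    text="Return all keys in the schema below, even if the value is null.",
    added_for="gpt-4o-mini",
    eval_name="missing-confidence-field",  # hypothetical eval name
    last_run=date(2024, 11, 12),
)

print(render_annotated([clause]))
```

Calling `stale` with the current date lists the clauses that are due for a re-run, which is usually where deletion candidates turn up.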
What this changes day-to-day
I write fewer instructions per pass. I let the model fail in eval, then add exactly the clause that fixes the failure, with the comment attached. The prompt grows linearly with the number of distinct failures I’ve cared enough to fix, not with my anxiety on a given Tuesday.
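A rough sketch of how that discipline could be enforced in code, assuming the prompt is kept as a mapping from failing eval to clause. The `add_fix` helper, the eval name, and the first-attempt clause text are hypothetical; keying by eval is one design choice among several.

```python
def add_fix(prompt: dict[str, str], failing_eval: str, clause: str) -> dict[str, str]:
    """Return a copy of the prompt with one clause recorded for one failing eval.

    Keying by eval name means a second attempt at the same failure replaces the
    earlier clause instead of stacking on top of it.
    """
    if not failing_eval:
        raise ValueError("no failing eval named; the clause has nothing to fix")
    updated = dict(prompt)
    updated[failing_eval] = clause
    return updated


prompt: dict[str, str] = {}

# First attempt at the failure (invented wording).
prompt = add_fix(prompt, "missing-confidence-field", "Include the 'confidence' key.")

# Second attempt at the same failure replaces the first.
prompt = add_fix(
    prompt,
    "missing-confidence-field",
    "Return all keys in the schema below, even if the value is null.",
)

print(len(prompt))  # 1: one distinct failure, one clause
```

With this shape the clause count cannot exceed the number of distinct failures, and a clause with no named failure is rejected at the door.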