
> Break down sessions into separate clear, actionable tasks.

What this misses, of course, is that you can just have the agent do this too. Agents are great at making project plans, especially if you give them a template to follow.





It sounds to me like the goal there is to spell out everything you don't want the agent to make assumptions about. If you let the agent make the plan, it'll still make those assumptions for you.

If you've got a plan for the plan, what else could you possibly need!

You joke, but the more I iterate on a plan before any code, the more successful the first pass is.

1) Tell Claude my idea with as much detail as I have, and ask it to ask me questions. This could go on for a few rounds. (Opus)

2) Run a validation skill on the plan, then a reviewer with a different prompt (Opus)

3) Codex reviews the plan; it always finds a few small items even after the above two.

4) Claude Opus implements in one shot, usually ~99% accurate, then I manually test.

If I stay on target with those steps I always have good outcomes, but it is time consuming.
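The four steps above can be sketched as a simple orchestration loop. This is a hypothetical illustration, not the commenter's actual setup: `ask_model` is a stand-in for whatever CLI or API call you use, and the model names are just labels.

```python
# Sketch of the plan-iterate-review-implement workflow described above.
# ask_model is a stub; in practice it would shell out to claude / codex etc.

def ask_model(model, prompt):
    # Stand-in for a real model call (hypothetical).
    return f"[{model}] response to: {prompt[:40]}"

def refine_plan(idea, rounds=3):
    # Step 1: a few rounds of the model asking clarifying questions.
    plan = idea
    for _ in range(rounds):
        questions = ask_model("opus", f"Ask clarifying questions about: {plan}")
        plan = f"{plan}\n(answered: {questions})"
    return plan

def review(plan):
    # Steps 2 and 3: validate with one prompt, then a second reviewer.
    return [
        ask_model("opus-validator", f"Validate plan: {plan}"),
        ask_model("codex", f"Review plan: {plan}"),
    ]

def implement(plan, notes):
    # Step 4: one-shot implementation from the reviewed plan.
    return ask_model("opus", "Implement in one shot: " + plan + "\n".join(notes))

plan = refine_plan("my idea")
notes = review(plan)
result = implement(plan, notes)
```

The point of the structure is that each stage only sees the refined artifact from the previous one, so cheap review rounds happen before any expensive implementation pass.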


I do something very similar. I have an "outside expert" script I tell my agent to use as the reviewer. It only bothers me when neither it nor the expert can figure out what the heck it is I actually wanted.

In my case I have Gemini CLI, so I tell Gemini to use a little Python script called gatekeeper.py to validate its plan before each phase with Qwen, Kimi, or (if nothing else is getting good results) ChatGPT 5.2 Thinking. Qwen & Kimi are via fireworks.ai, so it's much cheaper than ChatGPT. The agent is not allowed to start work until one of the "experts" approves the plan via gatekeeper. Similarly, it can't mark a phase as complete until the gatekeeper approves the code as bug-free, up to standards, and passing all unit tests & linting.
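The actual gatekeeper.py isn't shown, but the gate idea can be sketched in a few lines. Everything here is an assumption: the expert names, the APPROVE/REVISE protocol, and the stubbed `consult` call (the real script would hit the fireworks.ai / OpenAI APIs).

```python
# Hypothetical sketch of a gatekeeper gate: the agent may not proceed
# until an outside "expert" model approves its plan or code.

EXPERTS = ["qwen", "kimi", "chatgpt"]  # cheapest first, escalate on failure

def consult(expert, artifact):
    # Stub for a real API call (hypothetical protocol: APPROVE / REVISE: ...).
    # Pretend kimi approves anything that mentions tests.
    if expert == "kimi" and "tests" in artifact:
        return "APPROVE"
    return "REVISE: add unit tests"

def gatekeeper(artifact):
    """Escalate up the expert list; return (approved, expert, verdict)."""
    for expert in EXPERTS:
        verdict = consult(expert, artifact)
        if verdict.startswith("APPROVE"):
            return True, expert, verdict
        # Fold the feedback into the artifact before the next, pricier expert.
        artifact += "\n# revised per: " + verdict
    return False, None, "needs human input"

ok, who, verdict = gatekeeper("phase 1 plan with tests")
```

Escalating from cheap to expensive reviewers means the pricey model is only consulted when the cheap ones keep rejecting, which matches the cost pattern described above.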

Lately Kimi is good enough, but when it's really stuck it will sometimes bother ChatGPT. Seldom does it get all the way to the bottom of the pile and need my input. Usually it's when my instructions turned out to be vague.

I also have it use those larger thinking models for "expert consultation" when it has spent more than 100 turns on any problem and hasn't made progress by its own estimation.




