I often hear that AI coding feels inconsistent or generates inadequate results. I am somewhat surprised by this because more often than not, I get pretty good results.

When dealing with any AI agent (or any LLM tool, for that matter), there are really just three things that ultimately impact your results:

  1. the context you provide
  2. the prompt you write
  3. the chunks you execute in

This might sound discouragingly obvious, but being deliberate with these three factors (every time you send a request to Claude Code, ChatGPT, etc.) makes a noticeable difference in the results you see.

…and it’s straightforward to get this 80% right.

Context

LLMs are world knowledge pocket machines. Every time you want to work on a task, you need to trim that world knowledge pocket machine to a surgical one that’s only focused on the task at hand. You do this by seeding context.

The simplest way to do this, especially for AI coding, is to consolidate your project's instructions and conventions in a rules file that your agent picks up at the start of every session1.
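For illustration, such a file (CLAUDE.md, AGENTS.md, or whatever your tool reads) might capture the same project facts the prompts below rely on; the sections and wording here are just a sketch:

Project:
- Android app, Kotlin only; common tasks are wrapped behind make targets

Testing conventions:
- JUnit 5; prefer Fakes over Mocks; never make real network/database calls in tests
- Test names follow methodName_condition_expectedResult

Useful commands:
- `make test <ClassName>` runs the tests for a single class
- `make test-coverage <ClassName>` reports coverage for that class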

There are plenty of other approaches, and engineering more elegant ways to provide this context is becoming the next frontier for AI development, btw2.

Prompt

Think of your prompts as specs, not search queries. For example: “Write me a unit test for this authentication class” 🙅‍♂️.

Instead of that one-liner, here’s how I would start that same prompt:

Persona:
- You're an expert Android developer, well versed in the industry norms and conventions for testing
- You only use Kotlin, JUnit 5, and Mockito

Task:
Write the unit tests for @AuthService.kt

Context/Constraints:
- Follow the existing testing patterns as demonstrated in @RuleEngineService.kt
- Start by writing the tests for the three public methods: `login`, `logout`, `refreshToken`
- Prefer Fakes over Mocks; if we don't have a convenient Fake class, let's add one
  - Remember, never make real network/database calls in these tests
- Make sure to cover both happy paths and error cases

Output:
- AuthServiceTest.kt in folder <src/test/...>
- Test names: methodName_condition_expectedResult

Verify:
- Use `make test AuthService` to keep testing just this class
- Do not run lint checks while iterating; they take a long time
- I need to hit a code coverage of at least 80%.
  - You can check coverage for this class with `make test-coverage AuthService`

First propose a plan before you start making changes or writing code. Proceed only after I accept it.

I have a text expansion snippet (aiprompt;) that I start with almost every single time. It reminds me to structure any prompt before I start typing:

Persona:
- {cursor}

Task:
-

Context/Details/Constraints:
-

Output:
-

Verify:
-

This structure forces you to think through the problem and gives the AI what it needs to make good decisions.

Custom commands

But it can get tedious to write these prompts in detail every single time. So you might want to create “command” templates. These are just markdown files that capture your detailed prompts.

This is one of those things people don't leverage enough. Especially if your team has a shared folder of commands that everyone iterates on, you can end up with a powerful set of prompts that quickly get you really good results. I have commands like /write-unit-test.md, /write-pr-desc.md, /debug-ticket.md, /understand-feature.md, etc.
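For example, a /write-unit-test.md command could just be the spec prompt from earlier with the target class left as a placeholder; this is only a sketch, and the placeholder syntax (e.g. $ARGUMENTS in Claude Code) depends on your tool:

Persona:
- You're an expert Android developer, well versed in our testing norms and conventions

Task:
- Write the unit tests for $ARGUMENTS

Context/Constraints:
- Follow the existing testing patterns in the nearest existing test class
- Prefer Fakes over Mocks; never make real network/database calls
- Cover both happy paths and error cases

Output:
- A <ClassName>Test.kt next to the existing tests; test names: methodName_condition_expectedResult

Verify:
- `make test <ClassName>`, then `make test-coverage <ClassName>`; aim for at least 80% coverage

First propose a plan; proceed only after I accept it.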

Chunking

AI agents hit limits: context windows fill up, attention drifts, agents start hallucinating, and you get poor results. Newer models can run hours‑long coding sessions, but until that is common, the simpler fix is to break work into discrete chunks and plan before coding.

Most developers I talk to seem to miss this. I can't stress enough how important it is, especially when you're working on slightly longer tasks. My post goes into this; adopting it was the single biggest step‑function improvement in my own AI coding practice.

Briefly, this is how I go about it:

Session 1 — Plan only

Session 2+ — Execute one task at a time
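To make that concrete, here's roughly how those two kinds of sessions could look for the AuthService example above (plan.md, the Fake class names, and the exact wording are my own illustration, not a fixed format):

Session 1 prompt (plan only):
- Read @AuthService.kt and the existing tests; do not write any code yet
- Produce plan.md: an ordered list of small, independently verifiable tasks, each with its own verify step

plan.md (written by the agent, reviewed by me):
- [ ] Add FakeAuthApi and FakeTokenStore test doubles; verify with `make test AuthService`
- [ ] Tests for `login`: happy path and error cases; verify with `make test AuthService`
- [ ] Tests for `logout` and `refreshToken`; verify with `make test-coverage AuthService` (target: 80%+)

Session 2+ prompt (one task at a time):
- Read plan.md, execute only the first unchecked task, run its verify step, check it off, then stop for my review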

One‑shot requests force the agent to plan and execute simultaneously, which doesn't produce great results. If you were to send these as PRs to your colleagues for review, how would you break them up? You wouldn't write 10,000 lines in one go, so don't do that with your agents either.

Plan → chunk → execute → verify.


So the next time you’re not getting good results, ask yourself these three things:

  1. Am I providing all the necessary context?
  2. Is my prompt a clear spec?
  3. Am I executing in small, verifiable chunks?

  1. I wrote a post about this btw, on consolidating these instructions for various agents and tools. ↩︎

  2. Anthropic’s recent post on “context engineering” is a good overview of techniques. ↩︎