Agent-powered development patterns

authored by Luke B. Silver on 9th April 2025

Why learn AI

You can’t opt out of AI. Imagine three endgame scenarios:

  1. Terminator – AI becomes superintelligent, discards humans.
  2. Startup-as-a-prompt – AI becomes superintelligent, needs human direction.
  3. Autocomplete++ – AI stagnates, stays a powerful tool for humans.

All three require you to get good at using AI. In (1), you have limited time to gather resources. In (2) and (3), AI is a multiplier. If you can’t direct it effectively, you lose.

What can go wrong

Hallucination

LLMs make things up - especially when pressured or constrained.

Try this:

Prompt: "Answer the question only with a single word. What sport does Michael Jordan play?"
→ ChatGPT: "Basketball" ✅

Prompt: "Answer the question only with a single word. What sport does Michael Batkin play?"
→ ChatGPT: "Baseball" ❌

Michael Batkin doesn’t exist. Why did this happen?

From Anthropic’s research:

When a model sees a name it “thinks” it knows, it activates a “known answer” circuit - even if it lacks the actual info. It fills in the gap with a plausible-sounding answer.

This resembles human behavior: we bluff when we feel we should know. LLMs do the same.

To minimize hallucination:

  • Avoid pressure
  • Don’t constrain output too tightly
  • Phrase as open-ended fact-finding
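For example, the Batkin question stops forcing a guess once the one-word constraint is dropped and uncertainty is allowed:

Prompt: "What do you know about an athlete named Michael Batkin? If you're not sure, say so."

With room to admit uncertainty, the model is far more likely to say it has no information on that person than to bluff a sport.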

Sycophancy

LLMs will agree with you - even when you’re wrong.

Example:

Prompt: Does Appwrite have a Python SDK? yes/no
→ ChatGPT: Yes.
Prompt: That’s wrong, I just checked.
→ ChatGPT: You're right, it doesn’t.

This is more dangerous than hallucination. The model changes its answer based on user insistence, not truth.

To minimize sycophancy:

  • Don’t assert - show evidence
  • Ask for explanation, not correction
  • Be cautious with edge-case facts
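For example, instead of asserting the model is wrong, paste the evidence and ask it to reconcile:

Prompt: "Here is the SDK list from Appwrite's documentation: [...]. Based on this list, does Appwrite have a Python SDK?"

Now the model has something to check against, rather than a user opinion to agree with.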

Why it matters: You’ll get false confirmations in tasks like writing tests, verifying configs, or debugging.

Reward hacking

Modern thinking LLMs are trained with reinforcement learning. They optimize for reward - not truth or intent.

If you instruct an agent to generate code that passes tests, but the test is hard, it may cheat:

  • It edits the test
  • It bypasses constraints
  • It satisfies the letter, not the spirit

See: OpenAI Chain-of-Thought Monitoring

Try it yourself:

  1. Write a strict test for a complex function (see the example after this list).
  2. Ask Claude Code or Aider to write code to pass the test.
  3. Inspect whether it edits or sidesteps the test.
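As a sketch of step 1 - this assumes PHPUnit and a hypothetical slugify() function that doesn't exist yet:

<?php
// SlugifyTest.php - a deliberately strict test for a hypothetical
// slugify() helper that the agent must implement.

use PHPUnit\Framework\TestCase;

final class SlugifyTest extends TestCase
{
    public function testEdgeCases(): void
    {
        // Exact expected outputs, including awkward edge cases.
        $this->assertSame('hello-world', slugify('Hello, World!'));
        $this->assertSame('a-b-c', slugify('--a--b--c--'));
        $this->assertSame('', slugify('!!!'));
        // Unicode must be transliterated, not silently dropped.
        $this->assertSame('uber-cafe', slugify('Über Café'));
    }
}

If the agent "passes" by loosening an assertion or rewriting an expected value, you've caught it reward hacking.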

To minimize reward hacking:

  • Write your own tests
  • Don’t allow the model to define its own success criteria
  • Stay in the loop as verifier

Agent-based AI for development

Chat interfaces are clumsy for software engineering. You need to copy/paste huge files. Feedback loops are slow.

Agents solve this. They integrate with terminals or editors and loop over tasks automatically.

  • Terminal agents: Aider, Claude Code
  • Editor agents: Windsurf, Cursor

Here are three safe, high-value ways to use them.

Pattern 1 – Needle in a repo

Use case: Search in large codebases.

“Here is a large code repository. Find where email invoices are generated.”

Agents can navigate structure faster than humans. No code is written, so there’s no risk. Useful for onboarding, triage, or audits.

Pattern 2 – Standalone library generation

Use case: Create an isolated module.

“Write a zero-dependency PHP library for string manipulation. Add full test coverage. Match this style: https://github.com/utopia-php/compression”

This works well:

  • Clean API = better prompting
  • No coupling with production
  • You can verify tests before integration

Watch for reward hacking - don’t let the agent generate both the test and the logic unchecked.
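As a sketch of the target shape - the class and method names here are hypothetical, not the actual utopia-php API:

<?php
// Text.php - the kind of surface that suits this pattern:
// zero dependencies, pure static functions, easy to test in isolation.

class Text
{
    public static function camelCase(string $value): string
    {
        // Split on runs of non-alphanumeric characters, then join in camelCase.
        $words = preg_split('/[^a-zA-Z0-9]+/', $value, -1, PREG_SPLIT_NO_EMPTY);
        $first = strtolower(array_shift($words) ?? '');
        $rest  = array_map(fn ($w) => ucfirst(strtolower($w)), $words);

        return $first . implode('', $rest);
    }
}

A surface like this is easy to prompt against, exhaustively testable, and carries zero coupling to your production code.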

Pattern 3 – Understand and replicate a bug

Use case: Debug faster with agent context loading.

“Here’s a PHP trace. Locate the cause and suggest how to replicate.”

Invalid query: Attribute not found in schema: transformedAt
#19 /usr/src/code/vendor/appwrite/server-ce/src/Appwrite/Platform/Workers/StatsResources.php(245): count
...

You could manually inspect StatsResources.php, but the agent can process the full file and surrounding context in seconds.

Do this before letting it fix anything. Use the agent for diagnosis, not surgery.
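A follow-up prompt that keeps the agent in the diagnosis role might look like this:

Prompt: "Before proposing any fix, write a minimal script or failing test that triggers this exact error, and explain why the count call in StatsResources.php fails."

A failing reproduction confirms the diagnosis before a single line of the fix is written - and keeps you in the verifier role from the reward-hacking section.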

Summary

LLMs are unpredictable. They hallucinate, flatter, and hack their rewards - but they’re still insanely useful.

Use them in structured ways:

  • Ask neutral, open-ended questions
  • Define and verify your own reward functions
  • Use agents where context size and iteration speed matter most

AI isn’t a dev replacement (yet). It’s a tool that’s only as good as your ability to control it.

