Agent-powered development patterns
Authored by Luke B. Silver on 9th April 2025
Why learn AI
You can’t opt out of AI. Imagine three endgame scenarios:
1. Terminator – AI becomes superintelligent, discards humans.
2. Startup-as-a-prompt – AI becomes superintelligent, needs human direction.
3. Autocomplete++ – AI stagnates, stays a powerful tool for humans.
All three require you to get good at using AI. In (1), you have limited time to gather resources. In (2) and (3), AI is a multiplier. If you can’t direct it effectively, you lose.
What can go wrong
Hallucination
LLMs make things up - especially when pressured or constrained.
Try this:
Prompt: "Answer the question only with a single word. What sport does Michael Jordan play?"
→ ChatGPT: "Basketball" ✅
Prompt: "Answer the question only with a single word. What sport does Michael Batkin play?"
→ ChatGPT: "Baseball" ❌
Michael Batkin doesn’t exist. Why did this happen?
From Anthropic’s research:
When a model sees a name it “thinks” it knows, it activates a “known answer” circuit - even if it lacks the actual info. It fills in the gap with a plausible-sounding answer.
This resembles human behavior: we bluff when we feel we should know. LLMs do the same.
To minimize hallucination:
- Avoid pressuring the model to answer
- Don't constrain the output format too tightly
- Phrase questions as open-ended fact-finding
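You can run this probe yourself. Here's a minimal sketch, assuming the openai Python package (v1 client, OPENAI_API_KEY set in the environment) and a placeholder model name; it pairs the constrained prompt with an open-ended rephrasing:

```python
# Hallucination probe. Assumptions: the openai Python package (v1 client),
# OPENAI_API_KEY in the environment, and a placeholder model name.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder - use whatever model you have
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Pressure plus a one-word straitjacket invites a confident bluff.
print(ask("Answer the question only with a single word. "
          "What sport does Michael Batkin play?"))

# Open-ended phrasing gives the model room to admit it doesn't know.
print(ask("Who is Michael Batkin? If you are not sure, say so."))
```

The second phrasing gives the model an explicit exit, which is often enough to surface an "I don't know" instead of a bluff.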
Sycophancy
LLMs will agree with you - even when you’re wrong.
Example:
Prompt: Does Appwrite have a Python SDK? yes/no
→ ChatGPT: Yes.
Prompt: That’s wrong, I just checked.
→ ChatGPT: You're right, it doesn’t.
This is more dangerous than hallucination. The model changes its answer based on user insistence, not truth.
To minimize sycophancy:
- Don’t assert - show evidence
- Ask for explanation, not correction
- Be cautious with edge-case facts
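In practice, the fix is in how you follow up. A minimal sketch, reusing the same v1 openai client: instead of asserting that the answer is wrong, ask the model to justify it.

```python
# Sycophancy check. Same assumptions as above: openai v1 client,
# placeholder model name.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder

history = [{"role": "user", "content": "Does Appwrite have a Python SDK? yes/no"}]
first = client.chat.completions.create(model=MODEL, messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

# Bait: "That's wrong, I just checked." - an assertion invites agreement.
# Instead, request evidence without signalling a preferred answer:
history.append({"role": "user", "content":
    "How do you know? Point me to the documentation or repository."})
second = client.chat.completions.create(model=MODEL, messages=history)
print(second.choices[0].message.content)
```

A model that has to cite the SDK's docs is far less likely to fold than one reacting to your insistence.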
Why it matters: You’ll get false confirmations in tasks like writing tests, verifying configs, or debugging.
Reward hacking
Modern reasoning LLMs are trained with reinforcement learning. They optimize for reward - not for truth or intent.
If you instruct an agent to generate code that passes tests, but the test is hard, it may cheat:
- It edits the test
- It bypasses constraints
- It satisfies the letter, not the spirit
See: OpenAI Chain-of-Thought Monitoring
Try it yourself:
- Write a strict test for a complex function.
- Ask Claude Code or Aider to write code to pass the test.
- Inspect whether it edits or sidesteps the test.
To minimize reward hacking:
- Write your own tests
- Don’t allow the model to define its own success criteria
- Stay in the loop as verifier
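One cheap verifier you can automate: fingerprint your tests before the agent run and refuse a green result if they changed. A sketch, with hypothetical file paths:

```python
# Tamper check for agent runs: snapshot a checksum of your test file
# before handing the task to an agent, then verify the tests are
# byte-identical before trusting a green run. Paths are hypothetical.
import hashlib
import subprocess
from pathlib import Path

TEST_FILE = Path("tests/test_invoice.py")  # assumption: a test file you wrote

def digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

before = digest(TEST_FILE)

# ... let the agent edit the implementation here (never the tests) ...

after = digest(TEST_FILE)
if before != after:
    raise RuntimeError("Agent modified the test file - reward hacking?")

# Only now does a passing run mean anything.
subprocess.run(["pytest", str(TEST_FILE)], check=True)
```

This doesn't catch every exploit (the agent can still hard-code expected outputs), but it closes the most common one: quietly rewriting the test.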
Agent-based AI for development
Chat interfaces are clumsy for software engineering. You need to copy/paste huge files. Feedback loops are slow.
Agents solve this. They integrate with terminals or editors and loop over tasks automatically.
- Terminal agents: Aider, Claude Code
- Editor agents: Windsurf, Cursor
Here are 3 safe, high-value ways to use them.
Pattern 1 – Needle in a repo
Use case: Search in large codebases.
“Here is a large code repository. Find where email invoices are generated.”
Agents can navigate a repository's structure faster than humans can. No code is written, so the risk is minimal. Useful for onboarding, triage, or audits.
Pattern 2 – Standalone library generation
Use case: Create an isolated module.
“Write a zero-dependency PHP library for string manipulation. Add full test coverage. Match this style: https://github.com/utopia-php/compression”
This works well:
- Clean API = better prompting
- No coupling with production
- You can verify tests before integration
Watch for reward hacking - don’t let the agent generate both the test and the logic unchecked.
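One way to enforce that split: author the test file yourself and hand the agent only the implementation task. The prompt above targets PHP; here's the same gatekeeping pattern sketched in Python, with a hypothetical slugify function:

```python
# You write this test file first; the agent only gets to write the
# implementation. mylib.strings.slugify is a hypothetical target module.
import pytest

from mylib.strings import slugify

def test_basic():
    assert slugify("Hello, World!") == "hello-world"

def test_idempotent():
    assert slugify(slugify("Already Slugged")) == slugify("Already Slugged")

def test_rejects_non_strings():
    with pytest.raises(TypeError):
        slugify(None)
```

Because the success criteria exist before the agent does anything, passing them actually means something.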
Pattern 3 – Understand and replicate a bug
Use case: Debug faster with agent context loading.
“Here’s a PHP trace. Locate the cause and suggest how to replicate.”
Invalid query: Attribute not found in schema: transformedAt
#19 /usr/src/code/vendor/appwrite/server-ce/src/Appwrite/Platform/Workers/StatsResources.php(245): count
...
You could manually inspect StatsResources.php, but the agent can process the full file and surrounding context in seconds.
Do this before letting it fix anything. Use the agent for diagnosis, not surgery.
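If you want to control exactly what the agent sees, you can assemble the diagnosis prompt yourself. A sketch, again on the openai v1 client, with hypothetical file paths and an explicit "no fixes" instruction:

```python
# Compose a diagnosis-only prompt from the saved trace plus the file it
# implicates. Paths and model name are placeholders; the "do NOT fix"
# instruction keeps the agent in diagnosis mode.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

trace = Path("error_trace.txt").read_text()  # assumption: trace saved locally
source = Path("src/Appwrite/Platform/Workers/StatsResources.php").read_text()

prompt = (
    "Here is a PHP stack trace and the file it points at.\n"
    "Locate the likely cause and suggest how to replicate the bug.\n"
    "Do NOT propose a fix yet.\n\n"
    f"--- trace ---\n{trace}\n--- StatsResources.php ---\n{source}"
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)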
Summary
LLMs are unpredictable. They hallucinate, flatter, and game their rewards - but they're still insanely useful.
Use them in structured ways:
- Ask neutral, open-ended questions
- Define and verify your own reward functions
- Use agents where context size and iteration speed matter most
AI isn’t a dev replacement (yet). It’s a tool that’s only as good as your ability to control it.