Stop using AI as a chat box
Use AI for investigation, review, and QA reports, while keeping architecture, product risk, and ownership with the human.
Last week, I was fixing a simple bug that looked like one wrong config value.
Then I found the value was correct for some flows and wrong for others, which meant it was not as easy as replacing the value.
This is where AI would have just swapped the value if I had prompted “implement the fix for ticket JIRA-1234”. Instead, thanks to AI, I found the different flows, the backend service that owns these configurations, and the impact that changing those values has on other parts of the system.
My AI ran an investigation and an adversarial review, decided which follow-ups we needed and with which people, created a plan with multiple alternatives, and then handled implementation and QA.
So what’s the difference between one AI and the other?
In this article, we’ll see how software engineers can use AI. I checked, and I consumed $20k in June on my AI workflows at work, so I experimented a lot before I landed on this workflow.
When the output is an investigation report, a review checklist, a test-gap list, a PR summary, a failing endpoint trace, or a set of reproducible steps, I can inspect it. I can compare it against the code. I can throw it away. I can ask a sharper follow-up.
When the output is a decision about architecture, product behavior, ownership, risk, or tradeoffs, I slow down. That is still my job.
The best AI work is work you can verify
Humans are bad linters. Machines are good at it.
And I want to stress the parts we’re good at and the parts the machine is better at. Since we are bad linters, it makes no sense to ask AI to generate thousands of lines of code and review them later. That’s vibe coding on steroids: after a few prompts, you can’t comprehend what the code does, and AI starts fixing one thing only to break another.
So what are humans good at?
Decision-making when presented with the right information. The best use of AI for any software engineer is to have it do the automated work and provide evidence.
Think about how any big company works: executives have assistants who handle their admin work. They have middle managers who surface only the relevant details and make sure their teams do their work. We have to turn into executives when using AI.
I was reading about how Kun Chen, an ex-L8 at multiple big tech companies like Meta and Microsoft, uses AI. He started abstracting his work so he could send prompts to a single agent and trust that the agent would follow the steps and surface only the relevant information.
I’ve started doing the same in my work. Instead of prompting and waiting for completion (or multitasking, which is worse), I shifted to an async-first workflow. I let AI work in the background and produce an artifact that I can inspect.
For example:
New ticket: Explore the code and documentation, generate a report, and propose different alternatives.
Reviewing someone’s code: Explore the code and documentation, and find inefficiencies and bugs.
2nd+ review of someone’s code: Check how my previous comments were addressed, and whether any of them needs a follow-up.
Implementation: Write the code according to our agreed plan and execute a QA pipeline.
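One way to picture the list above is a small prompt registry that gets filled before the human picks up the work. This is only a sketch under assumptions: the `TaskType` names, the template wording, and the `QueuedTask` and `build_prompt` helpers are hypothetical, and how a prompt actually reaches an agent is left out.

```python
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    NEW_TICKET = "new_ticket"
    FIRST_REVIEW = "first_review"
    FOLLOW_UP_REVIEW = "follow_up_review"
    IMPLEMENTATION = "implementation"


# Hypothetical templates. Each one asks for an artifact to inspect later,
# not for a decision.
TEMPLATES = {
    TaskType.NEW_TICKET: (
        "Explore the code and documentation for {ref}. "
        "Write a report and propose different alternatives."
    ),
    TaskType.FIRST_REVIEW: (
        "Explore the code and documentation for {ref}. "
        "Report inefficiencies and bugs."
    ),
    TaskType.FOLLOW_UP_REVIEW: (
        "For {ref}, check how my previous comments were addressed "
        "and list any that need a follow-up."
    ),
    TaskType.IMPLEMENTATION: (
        "Implement {ref} according to the agreed plan, run the QA pipeline, "
        "and report what was run."
    ),
}


@dataclass(frozen=True)
class QueuedTask:
    task_type: TaskType
    ref: str  # ticket or PR identifier, for example "JIRA-1234"


def build_prompt(task: QueuedTask) -> str:
    """Fill the template for this kind of work with the task reference."""
    return TEMPLATES[task.task_type].format(ref=task.ref)


def build_queue(tasks: list[QueuedTask]) -> list[str]:
    """Prepare every prompt up front so the agents can run in the background."""
    return [build_prompt(task) for task in tasks]
```

The point of the structure is that every entry produces a report, so nothing in the queue needs attention until the report exists.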
The important point is that this AI work is done BEFORE I shift my attention to the unit of work. If I start a ticket and prompt, then start a code review while waiting, then start another, and then check yet another terminal tab, I’m just multitasking. After a couple of hours, my brain is fried.
What I want with AI is to reduce the mechanical work around thinking, so I arrive at a better starting point.
It’s what executives do. They are not worse employees because someone feeds them the information and they don’t do the work directly. They can think critically and spot errors in the reasoning of their reports or of other executives.
So stop thinking that if you don’t write the code yourself, you’re a worse software engineer. Quite the opposite. Only the great engineers figure out how to use AI instead of producing AI slop.
Use AI to investigate before you implement
The sloppy workflow is common because it’s easy, and it feels productive.
In the pre-AI world, writing more lines of code seemed better, and AI now gives you that for free. It’s like centuries ago, when having food for the winter was a sign of success. Fast-forward to today: you have a supermarket on every corner, so access to food is no longer the bottleneck, and we start caring about things like eating healthy.
So what does writing healthy code look like?
For me, it looks like this:
Understand the steps required for completion
Be aware of where you are at any point in time
For this, we need to split the work into 2 parts:
Mechanical work
Decision-making
The decision-making belongs to humans. The mechanical work belongs to AI. And making sure the AI does the right mechanical work also belongs to the human.
In the example at the beginning of the article about the bug I was working on, the whole investigation of where these configs are used was done by AI. AI would have solved the issue at whatever layer I told it to, but I had to make the decisions and align with the stakeholders on the right path.
This is a perfect AI use case because the first deliverable that AI provided was understanding, not code.
Pre-AI, and also post-AI, a Senior/Staff/Principal Engineer would ask another engineer what the problem is, what the metrics are, and what the alternatives are, instead of reviewing a PR. I’ve heard, and I’ve said myself, that code reviews are a knowledge-sharing mechanism, but they also add the artificial complexity of specific syntax to something we should understand through natural language rather than code.
Now you know the lesson: ask the AI for information before you ask for code.
Use AI to handle mechanical review work
Code review has two different jobs that we often mix together.
One job is mechanical, and it’s about finding repeated work, missing tests, unnecessary network calls, convention drift, obvious edge cases, duplicated logic, strange data structures, risky null handling, and file-level inconsistencies.
The other job is judgment: does this design fit the system, is the behavior correct, are we creating the right abstraction, did we preserve the product intent, and are we accepting the right operational risk?
AI is very useful for the first job, and humans are pretty bad at finding those issues when they are buried in hundreds of lines of code.
So instead of asking AI to decide whether the code fits, ask again for information. Have it generate a code review report that includes all the findings and helps you understand the code. This helps you make a decision without being blocked on understanding the syntax.
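A review report like this is easier to check when each finding points at a place in the code. The sketch below shows one possible shape for it. The `Finding` fields, the category strings, and the helper names are assumptions for illustration, not the format of any real tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    category: str  # for example "missing tests" or "duplicated logic"
    location: str  # file and line, so the claim can be compared with the code
    note: str      # what the agent observed, with no verdict attached


def group_by_category(findings: list[Finding]) -> dict[str, list[Finding]]:
    """Collect findings under their mechanical category."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[finding.category].append(finding)
    return dict(grouped)


def render_report(findings: list[Finding]) -> str:
    """Render a plain-text report, one section per category."""
    lines = []
    for category, items in sorted(group_by_category(findings).items()):
        lines.append(f"{category} ({len(items)})")
        for item in items:
            lines.append(f"  {item.location}: {item.note}")
    return "\n".join(lines)
```

The report deliberately carries no approve or reject field. Whether the design fits the system stays a human call.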
Use AI to check whether feedback was addressed
There is a boring part of code review that nobody likes admitting is expensive.
You leave comments. The author says, “Updated.” Then you reopen the PR and manually check whether each comment was actually addressed. Sometimes it was fully addressed, sometimes it was partially addressed, sometimes the author changed so much of the code that it’s hard to tell whether it was fixed, and sometimes the author forgot. I’ve forgotten many times myself.
This is tedious work. It’s very mechanical, and it’s scoped and targeted enough that AI can do it reliably.
So, as part of the information report that I ask AI for, I also ask for a table with all my comments so I can verify they are fixed.
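Such a table could look like the sketch below. Everything here is hypothetical: the `Status` values, the `CommentCheck` fields, and the rendering are just one way to lay it out, and the agent would be the one filling in the rows.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ADDRESSED = "addressed"
    PARTIAL = "partially addressed"
    NOT_ADDRESSED = "not addressed"


@dataclass(frozen=True)
class CommentCheck:
    comment: str   # the original review comment, shortened
    status: Status
    evidence: str  # where to look: file, line, or commit


def render_table(checks: list[CommentCheck]) -> str:
    """Render the checks as an aligned plain-text table."""
    rows = [("Comment", "Status", "Evidence")]
    rows += [(c.comment, c.status.value, c.evidence) for c in checks]
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


def needs_follow_up(checks: list[CommentCheck]) -> list[CommentCheck]:
    """Keep only the comments that still deserve the reviewer's attention."""
    return [c for c in checks if c.status is not Status.ADDRESSED]
```

The evidence column matters most, because it lets the reviewer jump straight to the change and confirm the status instead of trusting it.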
Build async workflows, not fake multitasking
A week ago, after 3 hours of work, my brain couldn’t work anymore.
I’ve talked a lot with peers about how we are more tired and drained since we started using AI. There’s too much context switching from one thing to another while the AI works, and too much information in the agent’s conversation.
But do we really need the multitasking? Are executives constantly going to the desks of each of their reports and checking on whether the teams are working and writing code?
Nah. Their real job is to delegate downwards, and their reports’ job is to report upwards, just as the name indicates. Their work is pretty much asynchronous: I task you with something, you tell me when the work is done, and I review a final version of the work, not the intermediates.
Multitasking only splits your attention and drains your energy. Async AI workflows let work advance without your attention, so you arrive at a better starting point.
This is the part that changed my own workflow the most. I am increasingly setting up loops that generate reports before I start working on a given unit of work. I am not trying to review five things at once. I go sequentially, but I’ve created systems so that by the time I sit down to review one thing, I have a report, and I can ask questions with that knowledge as context.
There are three levels of depth here:
Waiting on a prompt: You are single-threaded with a lot of idle time. Most likely, you’ll fill the idle time with something else, which leads to the next level.
Multitasking: You split attention and switch frenetically between different contexts. Your judgment quality drops, and you end up more tired.
Async AI workflow: You are single-threaded yourself, but you have other agents doing work before you arrive at them.
The risk here is queuing low-quality work and creating more review burden for yourself. If the reports are noisy, vague, or impossible to verify, you have created more trouble for yourself. So iterate on your workflows to make sure they are useful instead of creating artificial work for you.
Create a strong QA pipeline
I am comfortable letting agents implement code as long as there’s a strong verification workflow. This QA pipeline needs to be defined before the agent starts implementing. The test scenarios need to come from the design. Otherwise, the agent will just create tests that fit the implementation.
I let the agent create unit tests, but I try to make each commit include some functionality that can be tested end-to-end. If the task changes an endpoint, the tests hit the endpoint. If the real dependency cannot run locally, I mock it before starting the implementation. This lets the agent verify that the code does what the specification says.
I also make my agents write a report of the tests that ran and their results. This is why delegation works: I can understand what was run and check whether there was any misunderstanding.
It’s not enough for tests to be green. You must check what was tested.
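The pipeline described above can be sketched in a few lines. Treat this as an illustration under assumptions: the `Scenario` fields, the `checkout` flow, the `fake_config_service` stand-in, and the `handle_get_config` handler are all made up, and the handler is called in-process instead of over HTTP to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    request: dict
    expected_status: int


# Written from the design, before any implementation exists.
SCENARIOS = [
    Scenario("known flow returns its config", {"flow": "checkout"}, 200),
    Scenario("unknown flow is rejected", {"flow": "nope"}, 404),
]


def fake_config_service(flow: str):
    """Stand-in for a dependency that cannot run locally."""
    return {"checkout": {"timeout_s": 30}}.get(flow)


def handle_get_config(request: dict, lookup=fake_config_service) -> dict:
    """Hypothetical endpoint handler that the agent implements."""
    config = lookup(request["flow"])
    if config is None:
        return {"status": 404, "body": None}
    return {"status": 200, "body": config}


def run_scenarios(handler, scenarios: list[Scenario]) -> list[dict]:
    """Run each scenario and record what was sent, expected, and returned."""
    report = []
    for scenario in scenarios:
        response = handler(scenario.request)
        report.append({
            "scenario": scenario.name,
            "request": scenario.request,
            "expected": scenario.expected_status,
            "actual": response["status"],
            "passed": response["status"] == scenario.expected_status,
        })
    return report


def format_report(report: list[dict]) -> str:
    """One line per scenario, so a reader sees what was tested, not just a color."""
    return "\n".join(
        f"{'PASS' if r['passed'] else 'FAIL'} {r['scenario']}: "
        f"expected {r['expected']}, got {r['actual']}"
        for r in report
    )
```

Because the scenarios exist before the handler does, the agent has to make the code fit them rather than the other way around, and the formatted report shows exactly which behaviors were exercised.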
Conclusion: Your new job is orchestration
Congrats, you’ve been promoted to executive developer.
Everyone in your team was promoted.
You now define the task, provide the context, and review your reports’ output. You own the result, but what concerns you is the system that creates the software.
An executive makes many more decisions than any other role in the company, because they are presented with the information, and their job is to make decisions and own them.
Coming back to the example I opened the article with: bad AI usage is making AI fix the bug with the first and easiest solution. Good AI usage is having AI investigate and present the information, while I am the one who makes the decision.
This is my new mental model. Let me know if it sounds relatable to you.
If you want to go deeper on building reliable AI workflows, read about harness engineering next:
Harness Engineering: Turning AI Agents Into Reliable Engineers
Most AI coding agents can write impressive demos. Few can ship production code without breaking everything around it. The difference is harness engineering: the discipline of building systems that make AI agents reliable.








