Is AI making you a worse developer?
Two 2026 studies say AI-assisted devs score 17% lower and ship 41% more complex code. Here's how to prevent it.
Is AI making developers lazy? Anthropic ran a randomized controlled trial in 2026 to answer that. Their finding: AI-assisted engineers scored 17% lower on code comprehension than engineers who worked manually.
That’s too large to be acceptable.
The code is there. The feature works. The tests pass.
It runs. What else is there to check?
AI tools don’t make developers lazy by force. They create conditions where laziness is the path of least resistance, starting with the assumption that working code is good code.
Working is not the same as correct. Correct is not the same as maintainable. And AI is not in the room when the edge case hits production and someone has to dig in without a mental model of what was built.
I’ve been building software and working with engineering teams at Amazon for years. I know this. I also watch myself do it anyway. Once AI generates code that does what I asked, I don’t want to spend time inside it. The review feels like a chore.
That is exactly where the real bugs live. In the code that looks fine until it breaks.
I’ve been noticing this pattern in myself and on my team for months. Then Anthropic published the study.
The question isn’t whether AI is making developers lazy. It does. The better question is which kind of lazy, and whether you’re the one steering it.
In this post, you’ll learn
What Anthropic’s 2026 randomized controlled trial found about AI-assisted developer comprehension and why the results are worse than most people realize
How Carnegie Mellon’s 807-repo study shows AI tools increase code complexity by 41% without delivering sustained velocity gains
The three modes of AI use at work, and why only one of them actually makes you a better engineer
How the Expertise Reversal Effect explains why AI hurts junior developers far more than seniors, and what that means for the talent pipeline
The Explainability Gap metric and how to use it to know whether you actually own the code you ship
What Anthropic’s 2026 Study Actually Found
In January 2026, Anthropic published a randomized controlled trial that directly answers the question everyone is arguing about in comment sections.
52 junior Python engineers, all with one to three years of experience. Their task was to learn Trio, an async Python library none of them had seen before. Half the group used Claude throughout the learning process. Half worked manually, the same way engineers learned before AI tools existed.
Here is what happened.
The AI group finished roughly two minutes faster. That gap was not statistically significant. Then came the comprehension quiz.
The manual group scored 67%. The AI group scored 50%. That is two full letter grades. This was not a rounding error or a noisy result. The gap was real, measurable, and significant.
Debugging was the worst-performing category for the AI group. The section where you most need to understand what you built was exactly where the AI-assisted engineers fell furthest behind.
Anthropic gave this effect a name: the illusion of competence. Developers feel fast and capable while failing to internalize what they actually built. The code works. The mental model does not exist.
The Code Itself Gets Worse: Carnegie Mellon’s 807-Repo Study
If the Anthropic study is about what happens in your head, the Carnegie Mellon study is about what happens in your codebase.
Researchers ran a difference-in-differences analysis across 807 repositories that adopted Cursor between January 2024 and August 2025, comparing them to 1,380 matched control repositories that did not. They measured code quality using SonarQube, a static analysis tool most engineering teams already use for CI pipelines.
Month one after adopting AI tooling: a 3 to 5x spike in lines of code written. Engineers were shipping more code than they ever had. By month three, velocity had reverted to baseline. No sustained gain.
But the quality numbers kept climbing in the wrong direction.
Security and static-analysis warnings increased by 30%. Code complexity increased by 41%, growing disproportionately faster than the actual amount of code. The researchers called this complexity debt. AI speed is a loan against future maintenance hours. The interest does not announce itself. It compounds slowly, invisibly, until the day someone has to touch that code and the whole session turns into archaeology.
The pattern is consistent with what you would expect if a tool lets people write code faster than they understand it. More code, more warnings, more complexity, same velocity after the initial burst. Faster input, worse output.
The industry is already reacting: the press reported, for example, that Amazon was adding more human review for AI-generated code.
AI Does Make Developers Lazy — But Which Kind Depends on You
The yes/no framing is wrong. It makes for good arguments on social media and produces nothing useful for the engineers actually trying to work well.
I know, I started the article with a yes/no question. Now it’s time to reframe it.
The Anthropic study, if you read past the headline, is not an indictment of AI. It is an indictment of one specific way of using it. The effect depends entirely on how you interact with the tool.
Three modes exist. Most engineers use the first one almost exclusively. Only one of them makes you smarter while you work.
3 Ways to Use AI at Work. Only One Makes You Smarter.
Blind Delegation (the Self-Automator)
This is the default mode. You describe what you need, AI generates it, you check that it runs, and you ship it.
It is the fastest interaction pattern. It is also the one that produced the worst comprehension outcomes in the Anthropic study, under 40% for engineers who relied on it heavily. A Boston Consulting Group study found a parallel result: engineers in pure delegation mode developed neither domain skills nor meaningful AI skills. They got faster at prompting and slower at thinking. The effect on deep work and focused output runs deeper than most engineers realize until it is too late to course-correct.
I have seen this play out in specific ways at work:
Solutions built before defining the problem.
Documents that don’t contain any decisions.
Code that solves a problem we don’t have.
With AI, it’s common to think “it compiles, so it works.”
That assumption is exactly the illusion of competence the Anthropic study quantified.
This framework of 3 AI coding loops helped me fix my AI slop:
Manual (No AI)
The manual group in the Anthropic study scored 67% on comprehension. That is the baseline. It is slow, metabolically expensive, and cognitively demanding.
It is also the only mode that forces schema formation. Schema formation is the cognitive process where you build a mental structure for how something works, not just what the output looks like. Senior-level intuition, architecture judgment, and debugging ability: these come from schemas built through repeated, difficult cognitive effort.
Cognitive psychologists call this the “desirable difficulties” principle. The struggle is not a bug in the learning process. It is the feature. When things are hard to process, the brain encodes them more durably. When AI removes the friction, it also removes the encoding.
Manual mode is not the goal for every task. But for any concept you need to own, it is non-negotiable.
AI for Learning (Socratic Mode)
This is the mode almost nobody uses by default, and it is the one that produces better outcomes than working without AI at all.
When you use AI as a Socratic tutor instead of a code generator, restricting it from writing the solution and instead asking it to explain, question, and guide, you get a 19.6% improvement in learning outcomes compared to traditional methods. A LearnLM experiment found an additional 5.5% improvement on novel problem-solving tasks.
The key constraint is that the AI does not write the code you ship. You iterate toward your own solution using AI as a thinking partner. This is different from copy-paste. You ask AI to explain a concept, form your own hypothesis, write the code, and then ask AI to critique it.
The difference between this and Blind Delegation is the same as the difference between using Stack Overflow to check your approach versus using it to replace your thinking. It’s the same as checking your proposal with peers versus waiting for your peers to solve everything for you.
Good intentions aren’t enough on their own; you also need to make AI give you right answers instead of hallucinations. This is how:
The 6 Interaction Patterns from the Anthropic Study
The Anthropic study identified six interaction patterns. Three of them hurt comprehension. Three of them help it.
Three detrimental patterns:
AI Delegation → Prompt, copy, ship. No comprehension transfer.
Progressive Reliance → Starts manual, escalates to full AI when stuck. Rewires the problem-solving instinct.
Iterative Debugging → Paste the error back to AI without reading it. Trains helplessness, not debugging skills.
Three beneficial patterns:
Generation-Then-Comprehension → Generate code, then explain every line before shipping. Forces schema formation.
Hybrid Code-Explanation → Alternate between writing code and explaining what you built. Keeps understanding in-head.
Conceptual Inquiry → Ask AI to explain the concept, not write the solution. Builds the mental model first.
Most engineers drift into the detrimental column without noticing. The beneficial patterns require deliberate choice every session.
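Generation-Then-Comprehension is concrete enough to sketch. Below is a hypothetical example (not from the study): a small function an AI assistant might plausibly generate, where every comment is the reviewer’s own explanation, written after generation and before shipping.

```python
# Hypothetical AI-generated function. The comments are the reviewer's
# comprehension pass: each line explained in their own words.

def dedupe(items):
    seen = set()              # a set gives O(1) membership checks
    result = []
    for item in items:        # one pass: the first occurrence wins,
        if item not in seen:  # so the original order is preserved
            seen.add(item)
            result.append(item)
    return result             # assumes items are hashable; a list of
                              # dicts would raise TypeError -- an edge
                              # case the explanation pass surfaces
```

If you cannot write those comments without asking the AI, you do not own the function yet.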
7 Laziness Patterns I Have Seen in Software Engineers
Auto-Generating Documentation Nobody Will Read
A document gets circulated for a real operational need. You can tell if it’s AI-generated because the author skipped any step that would require actual thinking. The structure is there. The analysis is not.
If the person who wrote it did not invest the time to reason through what was actually needed, nobody will invest the time to engage with the output.
I’m someone interested in productivity. Some time ago, I learned to evaluate which content was worth my time and which was not. The best content to consume is the content that took the author the most effort to create:
A TikTok takes 1 hour to produce
A YouTube video or newsletter takes ~10 hours to produce
A book takes ~1 year to produce
The same person can produce all three, but their book will be the better source to consume.
The same goes for any document at work. If the author doesn’t spend time creating the doc, they are simply shifting the effort onto the reviewer.
Output without ownership. AI-generated documents without thinking are not documents. They are noise.
Building Proposals Without Investigating the Problem
A proposal for a new internal tool arrives with an impressive scope. UI, knowledge base, progressive disclosure of information, automated code reviews, a one-stop app for all software engineering work. Ambitious.
But none of the foundational questions are answered. Who owns this data? Who keeps it current? What problem does this solve that existing tools do not already cover? The proposal, written with AI, generated tens of features in minutes.
The real question is: how much of that is really needed? Building is cheaper with AI, but it is still not free. Maintaining software still requires human work. You don’t want to solve problems you don’t have.
At any point in time, there’s a single bottleneck in your team. Only one. You just need to find it and put your efforts into solving it. Once you solve it, the bottleneck will be in another area. Only then do you move to the next one.
Scope without substance. AI can generate ten features in minutes. It cannot decide which problem is worth solving.
Skipping the Review of AI-Generated Code
This one is personal.
Once AI-generated code appears to work, the instinct is to move on. The code is there. The tests pass. The feature runs. Reviewing the code feels like going backward.
But reviewing AI-generated code is not optional. AI makes errors at the conceptual level, not just the syntactic one. A function can compile, pass tests, and still be fundamentally wrong in how it handles the edge case that matters. You will not find that in the happy path.
The mental effort of reviewing code you did not write, code that already appears to work, is exactly the effort that keeps you from shipping the wrong thing to production.
Redefine correct. Working is not the same as correct. Correct is not the same as maintainable.
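Here is what a conceptual-level error can look like. This is a hypothetical illustration, not code from any study: a function that is clean, idiomatic, passes the obvious test, and is still wrong at the edge.

```python
def moving_average(values, window):
    # Plausible AI output: compiles, reads well, passes the happy path.
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Happy path looks fine: moving_average([1, 2, 3, 4], 2)
# gives [1.5, 2.5, 3.5].
#
# The conceptual bug: when window > len(values), range() is empty and
# the function silently returns [] instead of raising. A caller that
# alerts on the result, stores it, or averages it again never sees
# an error -- only a quietly missing data point.
```

No happy-path test finds that. Only a reviewer with a mental model of the real inputs does.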
Trusting AI Output Without Verification
AI’s confidence is dangerous. It does not hesitate when it is wrong. The output looks identical whether AI got it right or got it subtly incorrect in a way that will only surface under specific conditions. The plausibility of the output is not evidence of its correctness.
Because AI communicates in natural language like another engineer, we start trusting it like we’d trust a human. “AI handled it, I can move on,” is exactly what the Anthropic study labeled the illusion of competence. You feel like you understood what was built. You did not. You saw it run.
And the worst part is this: You can’t push responsibility to AI. AI makes code writing faster, but you’re still responsible for it.
Responsibility is still yours. Read the output before you ship it. AI’s confidence is not your confidence.
Auto-Generating Comparatives Without Understanding What Is Being Compared
I worked on a proposal comparing database technology alternatives. AI made it incredibly fast to produce a comparison. The keyword is “I.” I used AI to sharpen a comparison I had already reasoned through. I understood the trade-offs before the table existed.
I have seen the inverse. A technology comparison gets produced fast. The result is a table of bullet points with no understanding behind it. You can’t explain the trade-offs because you did not reason about them; the AI did. It’s intelligence borrowed from a model.
The borrowed opinion. A comparative you cannot explain is not yours. It belongs to the AI, and the AI does not go to the design review.
Generating Code Too Fast to Understand the System
The lesson I keep relearning: the temptation is to generate the code as quickly as possible and move on. The cost shows up later. When something breaks, and you do not have the mental model to debug it, you are starting from zero in a system you think you understand.
The better path is to dig deep into how things work before reaching for generation. Understand the system first. Automate writing code on the parts you already understand.
Velocity debt. Generating code fast is not the same as moving fast. The debt comes due when something breaks.
Using AI as an Excuse for Low-Quality Work
I see the complaint from many people that AI means lower quality. I see it at work too, but the problem isn’t AI. I saw the same thing before AI. It happens when leaders demand unreasonable deadlines for projects.
Some people say, “If we are going to do it with AI, it is going to have bad quality.” I’d rephrase that as, “If we are doing it this fast, we’re going to have bad quality, or we need more people.”
It’s wrong to blame the tool for a decision made by a person. And it ignores the fact that you can still do it the old way where that’s arguably better. If the tool had only downsides, we’d simply stop using it.
AI gets used as an excuse to demand faster work. Fine, but the speed shouldn’t come at the cost of other constraints, like quality standards.
Outsourced accountability. The tool does not own the output. You do.
Read more about how to maintain deep work while using AI
Why This Is Worse for Junior Developers: the Expertise Reversal Effect
The same tool has opposite effects depending on your level of experience. This is one of the most important findings from the research, and almost nobody talks about it.
Why Vibe Coding Hurts Junior Developers Most
For novice developers, AI acts as a separate brain. It stores the outputs that the junior engineer never built internally. It skips the cognitive struggle that forms schemas. The result is fast output and shallow understanding, which is the illusion of competence in its purest form. No schema means no debugging intuition. No debugging intuition means no path to senior.
For experienced developers, AI acts as an extension of their brains. It offloads boilerplate and syntax, the artificial cognitive load that does not require judgment. The expert keeps architecture, logic, and system understanding in their head, where those things belong. AI removes friction. It does not remove thinking, because the thinking already happened before the prompt.
The consequence of this split is already visible. Entry-level compensation is under pressure as the supply of AI-assisted juniors grows. Meanwhile, the shortage of engineers who can genuinely architect and own complex systems is intensifying.
The risk is this: if AI prevents juniors from doing the hard cognitive work that produces seniors, the talent pipeline breaks at the source. Faster juniors who never become seniors. That’s a structural problem for every organization that needs experienced technical leadership five years from now.
Read this system for learning in the age of AI, including how to decide what to learn yourself and what to delegate to AI
The Explainability Gap: the One Metric Worth Watching
The Explainability Gap is the distance between the complexity of AI-generated code and your own conceptual understanding of it.
It measures whether you are shipping code you do not own.
The self-check takes two minutes. After any AI-assisted session, close the AI tool. Explain what you just built back to yourself in plain language, out loud or in writing. Not the output. The logic. Why does this function behave this way? What happens when this edge case hits? What assumption is baked into this design?
If you cannot do that, the gap is real.
The MIT Media Lab EEG study on ChatGPT use found something that matches this: brain activity scales inversely with AI autonomy. The more the AI drives, the less the brain engages. Researchers measured reduced neural signal in the prefrontal cortex, the region responsible for reasoning and decision-making, during high-AI-autonomy sessions. The feeling of flow during AI-assisted coding may be the brain switching off, not speeding up.
Most people don’t realize you can run this check while AI is still generating the output. This is how:
The Stack Overflow Test: Are You Iterating or Copy-Pasting?
Most engineers have used Stack Overflow at some point. The question was never whether to look things up. It was always how you used what you found.
Two modes existed with Stack Overflow. The first: find a solution, copy it, paste it, move on. The code is in your file now. The understanding is not. The second: find a solution, read it, understand why it works, adapt it to your context, and own the result.
AI is the new Stack Overflow. The same reasoning applies.
Auto-accepting everything from AI is Blind Delegation. It is the detrimental pattern. Iterating with AI, generating code, and then working through it until you can explain it, is Generation-Then-Comprehension. It is a beneficial pattern.
The test is simple. Before you ship, ask yourself: Can I explain this code back to myself without looking at the AI output?
If yes, you are iterating. You own it.
If no, you are copy-pasting. The AI understands the code. You are shipping someone else’s work under your name.
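The two modes are easy to see side by side. A hypothetical example: a snippet pasted verbatim from a search result, and the version an engineer writes after understanding why it works and noticing it doesn’t fit their data.

```python
# Copy-paste mode: a snippet dropped in verbatim from a search result.
# It works -- for exactly the shape of data in the original answer
# (one level of nesting, every element a list).
def flatten(lst):
    return [x for sub in lst for x in sub]

# Iterate mode: same idea, adapted after understanding *why* it works.
# Our data is ragged -- leaves mixed with arbitrarily nested lists --
# so the one-liner's hidden assumption would crash or corrupt output.
def flatten_ragged(items):
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(flatten_ragged(item))  # recurse into sublists
        else:
            out.append(item)                  # leaf value: keep as-is
    return out
```

The first function is in your file. Only the second one is in your head.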
How to Use AI Every Day Without Getting Lazy
These are the habits I return to when I notice myself sliding into the detrimental patterns.
When the concept is new, ask AI to explain, not to generate. If you have never worked with a library or pattern before, Socratic mode is the only mode that builds a usable mental model.
After generating, rewrite the key logic in your own words before shipping. One paragraph. Natural language. If you cannot write it, you do not understand it.
Never paste an error back to AI without reading it first. Read the error. Form a hypothesis about what caused it. Then ask AI to help you test the hypothesis. Do not delegate your diagnostic process. Even if AI is faster at solving it, you’re not understanding the edge cases.
Treat AI confidence as marketing, not signal. The model does not know when it is wrong. You have to be the one who checks.
Write something without AI from time to time. This is not about productivity. It is about keeping the schema-formation pathways active. It’s like owning a calculator but still playing Brain Training on your Nintendo to keep your arithmetic sharp.
Watch your Explainability Gap. If you cannot explain it, you do not own it. Close the AI window and try. Just like when you were a student: re-reading the notes didn’t mean you understood the concepts.
In a new codebase or concept, restrict AI to explanation mode. This is the period when schemas form. Don’t skip the struggle. Think by yourself.
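The error-reading habit above can be sketched in code. A hypothetical scenario: production throws a `KeyError`, you form a hypothesis and write a minimal reproduction yourself, and only then is there anything worth asking AI about.

```python
# Habit: read the error, form a hypothesis, confirm it yourself --
# only then bring AI in. Hypothetical KeyError scenario.

def extract_user(event: dict) -> str:
    return event["user_id"]  # the line the traceback points at

# Hypothesis: some events arrive without a 'user_id' field.
# Minimal reproduction, written before any prompt is sent:
try:
    extract_user({"action": "login"})
    hypothesis_confirmed = False
except KeyError:
    hypothesis_confirmed = True  # reproduced: the field can be absent

# With a confirmed hypothesis, the remaining question is a product
# decision -- default, reject, or log? -- not something to delegate.
def extract_user_safe(event: dict, default: str = "anonymous") -> str:
    return event.get("user_id", default)
```

The diagnosis stays yours; AI only helps once you know what you are asking it to help with.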
Common Questions
Does AI make developers lazy?
Per the Anthropic 2026 randomized controlled trial, AI-assisted junior developers scored 17% lower on comprehension quizzes than peers who worked without AI. The effect is real and statistically significant with a Cohen’s d of 0.738. Whether it makes any individual developer lazy depends on how they use the tool, but the default interaction patterns trend toward the detrimental outcomes.
What did the Anthropic AI coding study actually find?
52 junior Python engineers were split into two groups. Half used Claude to learn a novel async library. Half worked manually. The AI group finished approximately two minutes faster, a difference that was not statistically significant. On the comprehension quiz, the AI group scored 50%, and the manual group scored 67%. Debugging was the weakest category for the AI group. Anthropic called the underlying effect the illusion of competence.
Is vibe coding bad for junior developers?
Yes, and the research explains why. This is called the Expertise Reversal Effect. AI helps expert developers offload boilerplate while keeping architecture and logic in-head. For junior developers, it skips the cognitive struggle that builds the judgment and schema formation they need to reach the senior level. Vibe coding for juniors is not a shortcut. It is a detour around the work that creates expertise.
How do senior developers use AI differently?
Seniors use AI to handle extraneous cognitive load, boilerplate, syntax, and formatting, while keeping intrinsic load in their own heads. The mental model is theirs. The code structure, the architectural judgment, the edge case reasoning: all internal. AI removes friction from work that the senior already understands. It does not replace the understanding.
Can you still learn to code with AI?
Yes, but only in Socratic mode. Using AI to explain concepts, guide your thinking, and critique your approach produces better learning outcomes than working without AI at all, around a 19.6% improvement. Using AI to generate code that you then copy produces worse outcomes. The tool is identical. The question is whether you ask it to show you the answer or help you find it.
What is the Explainability Gap?
The Explainability Gap is the distance between the complexity of AI-generated code and your own understanding of it. The self-check: close the AI window and explain what you built in plain language. If you cannot, the gap is open and you are shipping code you do not own.
Conclusion: The Developers Who Stay Sharp Choose to Look
The illusion of competence is not something that happens to you. It is a decision. You make it the moment the code runs and you decide to move on.
“The feature worked. I moved on.” That is the whole pattern. The research quantifies it, the examples make it visible, and the fix is simple enough to be annoying.
You just have to decide to look, every time, at the code you are about to ship.
The developers who stay sharp are not the ones who use AI less. They are the ones who never stop owning what they build.
Key Takeaways
AI-assisted junior developers scored 17% lower on comprehension in a 2026 Anthropic randomized controlled trial, with a large and statistically significant effect size.
Carnegie Mellon’s 807-repo study found AI tools increase code complexity by 41% and security warnings by 30%, with no sustained velocity gain after month three.
Three modes of AI use exist: Blind Delegation (detrimental), Manual (builds schemas), and Socratic (19.6% better learning outcomes). Most engineers default to the first.
The Expertise Reversal Effect means AI helps senior developers amplify their judgment while preventing junior developers from building judgment in the first place.
The Explainability Gap, the distance between code complexity and your understanding of it, is the one metric that tells you whether you own what you ship.
If you want to go deeper into using AI, and to become the engineer AI can’t replace, this next article is for you: