I heard about this study too. Here’s my take after not writing a single line of code for six months:
1. For any given problem, bugfix, or feature, agents can generate endless solutions that fit your problem description perfectly, but these can be fragile, unmaintainable, or simply wrong in the broader context. It’s difficult to spot these issues, which is why I’m returning to what I used to do before Claude Code:
2. I want to write the code myself, or at least decide where the entry point for the implementation is going to be. Sometimes models hallucinate; sometimes they don’t use the right terms in their search tool, still find some place to begin the implementation, but maybe not the best one. Even though I don’t plan on coding entire features, I would at least search the codebase manually, or do some basic evaluation, before handing it over to the agent.
It’s as if we were a senior engineer constantly reviewing a junior’s code. We may need to micromanage, spelling out in our own words what they have to do. We may even need to paste a code snippet of what we expect.
People say the 2 hardest problems are cache invalidation and naming. But prompting is mostly about naming, so it’s a hard problem too :)
Very good article!
I do have a question though, a few times you mentioned that using AI as a partner to understand things (Socratic mode) gives around 19.6% better learning outcomes. Where did you get that from?
Thanks Noel!
That number comes from https://iacis.org/iis/2025/4_iis_2025_233-247.pdf
Maybe I should reframe the article to clarify that the number doesn’t come from a general study about using any AI in Socratic mode, but from a study in Indonesian high schools.
This other study claims a 5.5% improvement on a tool called LearnLM, which used Socratic questions to make students reflect:
https://storage.googleapis.com/deepmind-media/LearnLM/learnLM_nov25.pdf
From my point of view, the gains from studying with the Socratic method are real. They were already there before AI, but now the need for better study techniques is greater.