The Smallest Reliable Working Set
Bigger context windows won’t fix bad workflows. The real skill is deciding what an agent should load, forget, and persist.
I think we are making the same mistake with AI agents that junior engineers make with memory. When a program runs out of RAM, the naive fix is to buy a bigger machine. Sometimes that helps. If the machine has 8GB and the workload needs 32GB, there is no philosophical lesson to extract. You need more memory. But in most interesting cases, the better question is not how to buy more RAM, but why the program is loading so much at once.
Agents have the same problem. When they forget something, lose the thread, or start making strange decisions after a long session, our instinct is to ask for a bigger context window. If the model could see more code, more logs, more everything, then maybe it would behave better. And often it does. But context is not intelligence. Context is closer to working memory. In the same way we have memory leaks, we can have context leaks. Context is where current computation happens. It is not where the whole system should live.
Context Is Working Memory
This is why “just give it more context” feels right and still fails. A prompt full of requirements, logs, design notes, failed attempts, tool output, meeting context, preferences, and old chat history is like a process with a bloated working set. Everything is technically available, but relevance gets harder to judge. The agent may remember a stale decision, miss the current constraint, or treat a temporary workaround as truth. The problem is not a lack of information. The agent has too much information in the wrong place, and some of it may be the wrong information.
I have noticed this in my own use of agents. The best sessions are not the ones where I paste the most context. They are the ones where the current task is small enough that the relevant context is obvious. “Look at these three files and explain the failure.” “Given this plan, implement only this part.” These prompts are not big or complex. They do not look like an “uber-agent” that replaces software engineers. But they work because the agent does not have to search through irrelevant material before doing the next useful thing.
This is also how good software systems work. We do not process a large dataset by loading the whole database into RAM and hoping the machine survives. We stream. We paginate. We index. We cache. We keep durable data in durable places with many 9s and load only what we need for the current operation. The trick is not merely having memory. The trick is having the right boundaries between memory, storage, indexes, logs, state, and temporary buffers.
Agents need the same boundaries. The prompt is the working set. Source control, tests, issues, and docs are durable truth. Persistent memory should contain selected facts worth reloading, not every thought the agent ever had. State should tell the agent where the workflow is. Scratch notes and tool output should expire. If everything goes into the same bucket, the system becomes harder to reason about. It is the AI version of invisible global state.
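One way to picture these boundaries is a small Python sketch. Everything in it is hypothetical and only for illustration: the names AgentStores, build_working_set, and finish_task, the character budget standing in for a token limit, and the dictionary standing in for files under source control. It is not how any particular agent framework works.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentStores:
    """Hypothetical buckets, one per kind of information."""
    durable_refs: Dict[str, str] = field(default_factory=dict)  # path -> content; stands in for the repo and docs
    memory: List[str] = field(default_factory=list)             # selected facts worth reloading
    state: Dict[str, str] = field(default_factory=dict)         # where the workflow currently is
    scratch: List[str] = field(default_factory=list)            # notes and tool output for the current task only


def build_working_set(stores: AgentStores, task: str,
                      needed_paths: List[str], max_chars: int = 2000) -> str:
    """Assemble a prompt from only what the current task needs."""
    parts = [f"TASK: {task}"]
    parts.append("STATE: " + ", ".join(f"{k}={v}" for k, v in sorted(stores.state.items())))
    parts.extend(f"MEMORY: {fact}" for fact in stores.memory)
    parts.extend(f"SCRATCH: {note}" for note in stores.scratch)
    for path in needed_paths:
        # Only the files named for this task are loaded, not the whole repo.
        if path in stores.durable_refs:
            parts.append(f"FILE {path}:\n{stores.durable_refs[path]}")
    prompt = "\n".join(parts)
    if len(prompt) > max_chars:
        # Assumption: a character count is a crude stand-in for a token budget.
        raise ValueError("working set too large; narrow the task or load fewer files")
    return prompt


def finish_task(stores: AgentStores, new_state: Dict[str, str]) -> None:
    """Advance the workflow state and let scratch material expire."""
    stores.state.update(new_state)
    stores.scratch.clear()
```

The design choice worth noticing is that the budget check fails loudly instead of silently truncating. An oversized working set is treated as a sign that the task should be narrowed, not as something to trim.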
Persistent memory is especially tempting because forgetting is annoying. We want the agent to remember our preferences, our codebase conventions, the architecture decisions we already explained, and the mistakes it made last time. That is reasonable. But memory that remembers too much becomes dangerous. A stale decision can be applied with confidence for months. A temporary workaround can become a permanent assumption. A private piece of context can leak into a task where it does not belong.
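The three failure modes above suggest that a memory entry needs more than its text. A minimal sketch, assuming each entry carries a scope and an optional expiry: the names MemoryEntry and recall, and the idea of scoping by project label, are my own illustrations and not a description of any existing memory system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    scope: str                        # hypothetical label, e.g. a project name
    recorded_at: datetime
    ttl: Optional[timedelta] = None   # None means "keep until explicitly removed"


def recall(entries: List[MemoryEntry], scope: str, now: datetime) -> List[str]:
    """Return the texts that belong to this scope and have not expired."""
    live = []
    for entry in entries:
        if entry.scope != scope:
            continue  # context from another scope should not leak into this task
        if entry.ttl is not None and now - entry.recorded_at > entry.ttl:
            continue  # a temporary workaround should not become a permanent assumption
        live.append(entry.text)
    return live
```

A temporary workaround would be stored with a short ttl, while a long-lived convention would be stored without one. Even then, a convention with no expiry can still go stale, so this sketch reduces the risk without removing it.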
The goal is not an agent that remembers everything. The goal is an agent that can cheaply reload the right thing.

