When marginal gains lie to you: knowing when to switch tech as a productive software engineer
Engineers chase percent gains and costly rewrites. Learn a system to price ROI in money, compare A/B/current, set stop-losses, and switch only when it pays.
Note: Template for paid subs at the end of the article, scroll down to check it out!
I almost rewrote a Go service in Java because internal tooling is stronger in Java at Amazon. The rewrite looked like a good option on paper. Then I did the math, and it was the classic case of paying 80% of the cost for 20% of the results. That is not real progress. That is creating our own busywork.
A productive engineer does not chase percent gains. A productive engineer converts gains into money, headcount, or alarm counts. When you do that, fake ROI shows up fast. The more you walk in one direction, the bigger the gain has to be to switch.
This is the system I use to evaluate options and make a decision. It prevents shiny object syndrome. It also puts decisions on paper so they can be used for a promotion or performance review, because leaders understand risk, cost, and time.
In this post, you’ll learn
How to convert performance gains into money, incidents, and headcount saved.
How to compare A vs B vs current.
How to use AI for speed while keeping judgment in your hands.
We don’t evaluate things clearly from the inside. Sunk cost makes a change in direction feel like throwing work away, which distorts the comparison.
Expose fake ROI with money math
“30% faster” without cost is a lie. Tie every improvement to infra money per month, on-call hours, or user impact. If incidents drop, you buy back nights and weekends, you buy back work hours that can be spent on something else. Those hours are real money. If your infrastructure has a flat cost despite usage, reducing resource consumption doesn’t bring money back.
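As a minimal sketch of that conversion (all numbers and names here are hypothetical, not from any real bill), a “30% faster” claim only becomes an argument once it is priced per month:

```python
# Hypothetical numbers: convert a "30% faster" claim into money saved per month.

def monthly_savings(infra_cost, infra_reduction,
                    oncall_hours, oncall_reduction, hourly_rate,
                    flat_infra=False):
    """Return the real money a proposed gain buys back each month."""
    # If infra is billed at a flat rate, cutting resource usage saves nothing.
    infra_saved = 0.0 if flat_infra else infra_cost * infra_reduction
    # Fewer incidents means engineer hours bought back, and those hours are money.
    oncall_saved = oncall_hours * oncall_reduction * hourly_rate
    return infra_saved + oncall_saved

# "30% faster" on a $4,000/month bill, plus 20 on-call hours/month cut in half:
print(monthly_savings(4000, 0.30, 20, 0.50, 100))        # usage-based billing -> 2200.0
print(monthly_savings(4000, 0.30, 20, 0.50, 100, True))  # flat-rate billing -> 1000.0
```

Note how the flat-rate case drops the infra term entirely: the same “30% faster” claim is worth less than half as much, which is exactly the fake ROI the money math exposes.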
I used this bar to justify a PostgreSQL database migration to AWS DynamoDB. The migration would remove months of KTLO. I also used it to reject a personal productivity tool switch from Alfred to Raycast: the cost was re-learning the same things in another tool, and there was no clear benefit for my workflow. That time should be spent shipping, not re-learning productivity tools.
Alternatives only matter in comparison
An option is never good or bad alone. You compare A vs B vs current, and each candidate must beat today on a clear metric. Pick one primary metric for the decision, then list the second-order effects. Latency, headcount capacity, operational load, and blast radius cover most cases.
Do not hide uncertainty. When a line in your table is unknown, mark it. Unknowns need time-boxed research, not blind faith. Include who will own it and who wakes up at 2 am. Ownership and support reveal the true cost more than benchmark numbers do.
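A toy sketch of that table (service names, metrics, and numbers are all invented for illustration) shows the one rule that matters: an option with unknown lines gets research, not a score.

```python
# Toy A-vs-B-vs-current comparison. All values are invented; None marks an
# unknown that needs time-boxed research, not blind faith.
options = {
    "current (PostgreSQL)": {"p99_ms": 120, "ktlo_hours_month": 30, "owner": "team-db"},
    "A (DynamoDB)":         {"p99_ms": 40,  "ktlo_hours_month": 5,  "owner": "team-db"},
    "B (rewrite in Java)":  {"p99_ms": None, "ktlo_hours_month": 10, "owner": None},
}

def unknowns(option):
    """List the lines that still need time-boxed research."""
    return [k for k, v in option.items() if v is None]

for name, metrics in options.items():
    missing = unknowns(metrics)
    if missing:
        print(f"{name}: research needed on {missing}, do not score yet")
    else:
        print(f"{name}: ready to compare on the primary metric")
```

In this invented example, option B cannot be compared at all yet: nobody has measured its latency and nobody owns it, so it gets a time-boxed spike before it is allowed into the decision.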
Capacity budget in your roadmap instead of random initiatives
This is more for managers. Still, if you’re an IC, it’s a topic to bring to your manager. Set a fixed capacity bucket in the team’s roadmap for tech investments. Align this with leadership once, then protect it. The point is simple: these tech investments must unblock roadmap velocity for the future quarters. If not, they are a whim from engineers.
This kills random refactors that only help one repo and keeps the team in control. Going back to the service rewrite from Go to Java, I parked that idea because the impact was a single service, the cost was high, and there were only marginal gains. It did not unblock anyone else.
If a proposal has a clear payback and still does not fit the budget, schedule it for next quarter. There is always more work than capacity, but that doesn’t mean the proposals below the line are bad. A good idea that waits is still a good idea.
To be a productive engineer, make small changes that speed delivery for others, not only for yourself.
Define stop loss and a two-way door before you start
Treat the switch like a trade. Set a time box in engineer weeks. When you hit the cap and you haven’t finished, and it doesn’t look like you will finish soon, you stop. No “one more sprint.” This rule protects you from zombie projects that stay alive only because it feels wasteful to abandon the time already sunk into them.
Set clear fail points: latency targets, error budget, and so on. If a release crosses your thresholds, revert the change, write down what you learned, and decide whether to iterate on a fix or kill the idea. Numbers remove debate. You do not argue feelings when the charts say stop.
Plan the rollback before you need it. Use feature flags or parallel stacks so the rollback is cheap. Rehearse the revert to make sure it actually works. Even with data migrations, keep them reversible until guardrail tests pass.
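The fail points and rollback above can be encoded as a tiny guardrail check (the thresholds and flag name here are invented), so the revert decision is a number, not a debate:

```python
# Invented thresholds, agreed before the release: cross either one and the answer is revert.
LATENCY_P99_MS_MAX = 200
ERROR_BUDGET_REMAINING_MIN = 0.10

def should_revert(p99_ms, error_budget_remaining):
    """Return True when the charts say stop; no feelings involved."""
    return (p99_ms > LATENCY_P99_MS_MAX
            or error_budget_remaining < ERROR_BUDGET_REMAINING_MIN)

# Behind a feature flag, the rollback is one boolean flip, cheap to rehearse:
new_stack_enabled = True
if should_revert(p99_ms=250, error_budget_remaining=0.30):
    new_stack_enabled = False  # guardrail breached: flip the flag back
print(new_stack_enabled)  # False
```

The point is that the thresholds are written down before the release, so reverting is mechanical rather than a negotiation under pressure.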
Reduce risk with AI, without outsourcing judgment
Use AI for speed, not to shut down your brain. Draft your alternatives table, your checklists, and your rollback plan. Then verify every line with the primary docs and your internal wikis. Ask AI for failure modes you might miss, then test them. The loop is ask, verify, and spike.
One example: I copied code from a similar repo with my brain switched off, then had to fix it after prompting the AI with the right documentation. Using AI grounded in the right docs was faster than blindly copying, whether from another repo or from the AI itself.
Coming back to the database example, I built a DynamoDB-versus-keep-PostgreSQL comparison table in less than an hour. Then I validated each line with the docs and our internal wikis. That is not enough to implement and ship to prod in a few days, but it is a good signal for whether we should keep investing in this direction or discard it.
But I’ve also seen the opposite: auto-generated comparisons and docs that did not focus on the points that matter for our service, named no owner for the actions, and missed the point. The AI-generated document looked tidy and still lied. AI is a speed tool, not a source of truth. It helps you avoid wasting a day in the wrong direction; it does not replace your judgment.
Turn a risky switch into a promotion story
Leaders care about the cost of delay, risk, and user impact. Write the story in that order. Show what each week of indecision costs in money or a metric about users. Show the guardrails and the line you would not cross.
Surface your artifacts. Include the ROI math, the comparison table, the stop loss information, and the rollback plan. Show the hard call and the number that drove it. Proceed or stop, both can be the right answer when the math says so. Saying “no” is part of the story. The right “no” may not be your flagship project in a promotion, but it is the decision that makes people trust you.
When I started working at Amazon, I paid a lot of attention to putting the right data in a doc, creating an artifact as well as making a point. Now I lean more into iterating and shipping code instead of polishing a doc forever. Iterations are faster with AI, and they have built my credibility.
Conclusion
The more you walk in one direction, the bigger the gain must be to switch. That is why percent speed on its own is a trap. Tie every claim to money, incidents, and on-call hours. You will cut through noise faster.
Use techniques like a capacity budget or a stop loss. You will run fewer bad bets and keep momentum.
The real win is not shipping something shiny, getting promoted, and deprecating the project afterwards. The win is building a team that delivers smoothly.
👋 Download the one-pager template
This content is only available for paid subscribers. If you’re serious about leveling up your productivity and being more impactful, subscribe below.
Download the one-pager template I’ve prepared in the paid subscribers’ resources. Use it with your next proposal. No need to wait for a big project; use AI to quickly apply it to your existing work.
When to use
Percent gains are proposed without checking costs → convert % into money and time to force clarity
You want paper evidence of decision-making for promotion → managers can defend a written decision when you’re not in the room
When not to use
You can’t measure impact → fix observability before deciding
There is no rollback plan → build a two-way door first
Leadership only wants a quick spike or demo → run a fast experiment, document learnings, move on
Failure modes and fixes
Sunk cost creep → set a time box and a hard stop
Table theater → validate every number at the source
AI over-trust → ground prompts in docs, test risks, then decide with judgment
This is an article inside our system’s transition from phase 2 to phase 3: earn career capital. I’m building this system for paid subscribers. Thanks for your continued support!
🗞️ Other articles people like
👏 Weekly applause
Here are some articles I read during the last week:
Choosing Between Normalization Or Denormalization. Normalization keeps your data clean, but if you’re read-heavy, don’t be afraid of denormalizing.
Scaling the data storage layer in system design. Scaling the stateful stuff is the real challenge.
I Studied How Top 0.1% Engineering Teams Do Code Reviews. High-performing teams nail code reviews by keeping PRs small and focused.
P.S. This may interest you:
Are you in doubt whether the paid version of the newsletter is for you? Discover the benefits here
Could you take one minute to answer a quick, anonymous survey to help me improve this newsletter? Take the survey here
Are you a brand looking to advertise to engaged engineers and leaders? Book your slot now
Give a like ❤️ to this post if you found it useful, and share it with a friend to get referral rewards