How Software Engineers Make Productive Decisions (without slowing the team down)
Engineers stall by overthinking reversible choices. Learn a simple 3-question filter to move faster, avoid bottlenecks, and grow your career
Most teams don’t get stuck because problems are impossible. They get stuck because every choice is treated like it’s irreversible. In reality, lots of calls are two-way doors: you can walk through, check the room, and walk back out. Save the caution for the true one-way doors: data migrations, security posture, customer-visible changes with real blast radius.
When I’m unsure, I run a fast, risk-aware filter. If the downside is small, the change is reversible, or I can mitigate quickly, I ship with guardrails. That’s how you move fast without being sloppy.
⭐ In this post, you'll learn:
How to tell if a decision is reversible or not
The 3 questions I ask before slowing down
How to move fast without being sloppy
Why speed compounds into career growth
Stop treating every choice like a one-way door
Not every door leads to a cliff; some just swing back open. Two-way doors are things like toggling a feature flag, shipping a non-consumed response field, or swapping an internal library behind an abstraction. If it goes sideways, you flip the switch or roll back.
One-way doors are different. Think data migrations, schema changes, or decisions that can silently corrupt data or take a core service down. At my job, when a migration touches the database and could risk data loss, we slow down on purpose: rehearsals in non-prod environments, snapshot plans, read-only windows if needed, and crisp rollback playbooks.
The productivity benefit is knowing the difference before you start. Over-investing in reversible decisions burns time and morale. Under-investing in high-stakes calls burns trust and customer goodwill.
A fast, risk-aware framework (ask these 3 questions)
When a decision lands in your lap, take 1-2 minutes and ask:
1) What’s the impact if I’m wrong?
Is the effect invisible, annoying, or catastrophic? User-visible errors, security regressions, and data integrity issues are “slow-down” territory. On the other hand, shipping a field the client doesn’t yet consume is low risk. I’ve green-lit rollouts like this with smoke tests + feature flag, skipping a day or two of heavy testing because there was effectively no customer impact and rollback was trivial.
2) How hard is it to reverse?
Reversal options change everything. If I can roll back in ~10 minutes because I have alarms, canary checks, and a pre-wired rollback, I bias toward speed. When reversal is painful (e.g., a destructive migration), I do design notes, peer review, and a rehearsal.
3) Can we mitigate fast with a small blast radius?
Sometimes you can’t prevent every issue, but you can limit the blast radius. Canaries, partial rollouts, and scoped feature flags mean we learn quickly without harming many users. A line I actually use with stakeholders:
“Do we need to focus on prevention here, or can we move forward and mitigate fast with a small blast radius if something goes wrong? If mitigation is fast and contained, let’s go.”
A tiny decision matrix you can use
The matrix falls out of the first two questions: low impact plus easy reversal means ship with guardrails; high impact plus hard-to-reverse means slow down. If your situation is somewhere in the middle, pick the stricter option to err on the safe side.
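If you prefer code to a grid, here is the same logic as a minimal sketch. The names (`Decision`, `route`) are illustrative, not from any framework, and the coarse true/false split is deliberate:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    """Answers to the three questions, reduced to booleans for illustration."""
    high_impact: bool        # user-visible errors, security, data integrity at risk?
    hard_to_reverse: bool    # no flag, no quick rollback, destructive change?
    fast_mitigation: bool    # canary / partial rollout / scoped flag available?

def route(d: Decision) -> str:
    """Pick the stricter path whenever the answers disagree."""
    if d.high_impact and d.hard_to_reverse:
        return "slow down: design doc, peer review, rehearsal"
    if not d.high_impact and not d.hard_to_reverse:
        return "ship with guardrails: flag, smoke tests, alarms"
    # Middle ground: mitigation speed breaks the tie, stricter by default.
    return ("ship with guardrails and a small blast radius"
            if d.fast_mitigation else "slow down: treat it as a one-way door")

# Non-consumed API field vs. destructive migration:
print(route(Decision(high_impact=False, hard_to_reverse=False, fast_mitigation=True)))
print(route(Decision(high_impact=True, hard_to_reverse=True, fast_mitigation=False)))
```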
Turn one-way doors into two-way doors (tactics that actually work)
Feature flags and small PRs
Flags are the safest way to keep a single mainline branch deployable at all times. Merge early and often, even incomplete work, because the flag hides it. That enables smaller PRs, faster reviews, and quicker rollbacks. In my experience, many features ship before the client has built their side of the change, which makes these deployments extremely low risk: nobody is consuming the new code path yet. (A minimal code sketch follows the checklist.)
Checklist for feature flags:
Default-off flag per risk domain (UI, backend path, integration).
One-line rationale in the PR (“why now, why safe”).
Smoke tests for both on/off states.
Exit plan: when and how to delete the flag.
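Here is a rough sketch of the checklist in practice. The flag helper is a stand-in for whatever flag service you use, and the flag and function names are made up for illustration:

```python
import os

def flag_enabled(name: str, default: bool = False) -> bool:
    """Stand-in flag check; swap in your flag service (LaunchDarkly, Unleash, a config table)."""
    return os.getenv(f"FLAG_{name.upper()}", str(default)).lower() == "true"

def checkout_total(cart: list[float]) -> float:
    # Incomplete work can live on mainline: the new rounding path is invisible
    # until the default-off flag is flipped.
    total = sum(cart)
    if flag_enabled("NEW_ROUNDING"):
        total = round(total, 2)
    return total

def smoke_test_both_states() -> None:
    """Checklist item 3: exercise the on and off paths before any rollout."""
    for state in ("false", "true"):
        os.environ["FLAG_NEW_ROUNDING"] = state
        assert abs(checkout_total([3.10, 4.25]) - 7.35) < 0.01

if __name__ == "__main__":
    smoke_test_both_states()
    print("both flag states behave")
```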
Canary checks, alarms, and 10-minute rollbacks
Even without a feature flag, speed is safe if your observability and rollback are tight (a rough sketch follows this list):
Before: canary tests succeeding, metrics emitted.
After: alarms on errors, latency, saturation, and key business metrics.
Abort: scripted rollback (or deploy previous artifact) within ~10 minutes.
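A rough sketch of what that abort loop can look like, assuming you already emit an error-rate metric and have a scripted redeploy; `canary_error_rate` and `deploy.sh` are placeholders, not real tooling:

```python
import subprocess
import time

ERROR_RATE_THRESHOLD = 0.02   # abort if more than 2% of canary requests fail
CHECK_INTERVAL_S = 60
BAKE_TIME_S = 600             # ~10 minutes of canary observation

def canary_error_rate() -> float:
    """Placeholder: read the canary's error rate from your metrics backend."""
    raise NotImplementedError

def rollback() -> None:
    """Placeholder: redeploy the previous artifact; this must finish within ~10 minutes."""
    subprocess.run(["./deploy.sh", "--previous-artifact"], check=True)

def watch_canary() -> None:
    deadline = time.time() + BAKE_TIME_S
    while time.time() < deadline:
        if canary_error_rate() > ERROR_RATE_THRESHOLD:
            rollback()                      # scripted abort: no human debate mid-incident
            raise RuntimeError("Canary tripped the alarm; rolled back.")
        time.sleep(CHECK_INTERVAL_S)
    print("Canary healthy; promote to full rollout.")
```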
I’ve shipped features knowing that if anything trips alarms, the change is reverted quickly. That confidence changes the cost/benefit calculus.
When to timebox vs. slow down
Timebox reversible decisions to 30-60 minutes of research. Make a call, document trade-offs, and move.
Slow down for one-way doors: destructive DB changes, non-backward-compatible API changes, payment logic. Run shadow tests against production-like traffic before you commit.
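One low-tech way to do that shadow testing, sketched here with illustrative handlers: run the new code path on real inputs, compare the outputs, and log mismatches without ever returning the shadow result to the user.

```python
import logging

log = logging.getLogger("shadow")

def current_handler(payload: dict) -> dict:
    """Existing production logic (placeholder)."""
    return {"total": sum(payload["items"])}

def new_handler(payload: dict) -> dict:
    """Candidate logic you want to validate against real traffic (placeholder)."""
    return {"total": round(sum(payload["items"]), 2)}

def handle_request(payload: dict) -> dict:
    result = current_handler(payload)            # the user only ever sees this
    try:
        shadow = new_handler(payload)            # run the new path, discard the result
        if shadow != result:
            log.warning("shadow mismatch for payload %s", payload)
    except Exception:
        log.exception("shadow path failed")      # the shadow must never break prod
    return result
```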
Two-minute safety-net before shipping:
Write a one-line rationale.
Identify the kill switch (flag or rollback).
Ping the right stakeholder if risk > medium.
Confirm alarms cover the critical path.
Examples you can reuse (from my day job)
Shipping non-consumed fields safely in a REST API
I added a new field to an HTTP response that clients weren’t consuming yet. A full regression pass would have cost 1-2 days. Instead, I agreed with my team to:
Ship behind a feature flag.
Run smoke tests on the endpoint.
Set up alarms and a canary to verify no unexpected 4xx/5xx patterns.
Communicate “proceed unless blocked.”
No customer impact, tiny blast radius, trivial rollback. That’s a textbook two-way door.
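A minimal sketch of what that kind of change looks like in code (field and flag names are illustrative): the field is additive, so existing clients simply ignore it, and flipping the flag off restores the exact old payload.

```python
def order_response(order: dict, eta_enabled: bool) -> dict:
    """Additive, non-breaking change: clients that don't know the field ignore it."""
    response = {"id": order["id"], "status": order["status"]}
    if eta_enabled:                               # scoped, default-off flag
        response["estimated_delivery"] = order["eta"]
    return response

order = {"id": "o-123", "status": "shipped", "eta": "2025-09-20"}
assert "estimated_delivery" not in order_response(order, eta_enabled=False)
assert order_response(order, eta_enabled=True)["estimated_delivery"] == "2025-09-20"
```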
Handling high-stakes database changes
For a risky database migration (possible customer data loss if wrong, service down for 1+ hours), we did the opposite:
Wrote a design doc with trade-offs and risk analysis.
Rehearsed the migration in staging with production-like data.
Booked a change window, took snapshots, confirmed restore steps.
Assigned an on-call with a printed execution and rollback playbook.
One-way doors require us to slow down, for good reason.
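To make “confirmed restore steps” concrete, here is a rough rehearsal sketch for a Postgres-backed staging copy; the database name and dump path are placeholders, and the point is simply to know snapshot and restore times before the change window.

```python
"""Time a snapshot/restore rehearsal on a production-like staging database."""
import subprocess
import time

DB = "orders_staging"                 # production-like copy, never the real thing
DUMP = "/tmp/pre_migration.dump"

def timed(cmd: list[str]) -> float:
    start = time.time()
    subprocess.run(cmd, check=True)
    return time.time() - start

snapshot_s = timed(["pg_dump", "--format=custom", f"--file={DUMP}", DB])
restore_s = timed(["pg_restore", "--clean", f"--dbname={DB}", DUMP])
print(f"snapshot {snapshot_s:.0f}s, restore {restore_s:.0f}s -> record both in the playbook")
```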
Conclusion: speed compounds when you manage downside
You don’t grow by being right once. You grow by making many decisions and handling the wrong ones well. Use the 3-question filter:
Impact if wrong
Ease of reversal
Fast mitigation with small blast radius
Turn as many calls as possible into two-way doors with flags, canaries, alarms, and quick rollbacks. Slow down only for the truly irreversible. That’s how software engineers make decisions—fast, but not sloppy.
Bonus: Stakeholder alignment in 15 minutes
Book a 15-minute huddle where you state the problem, the options, the trade-offs, the risk level, and your recommendation. Close with: “I’ll proceed unless you see blockers.” Silence becomes alignment, and you avoid approval ping-pong.
You don’t need a five-page RFC for every call. A micro-ADR keeps history without ceremony:
# ADR: Add field (server response)
- Context: Mobile clients currently ignore this field, used by future device rollout.
- Decision: Ship behind flag, smoke test, canary 5%, alarms on 4xx/5xx and latency.
- Alternatives: Delay until client change, ship without flag.
- Consequences: If wrong, toggle flag off, rollback build, cleanup flag in 2 weeks.
- Author: <your-name> | Date: 2025-09-13
This takes two minutes and prevents “why did we do this?” archaeology months later.
Bonus 2: Common questions answered
How do I know if a decision is reversible? If you can turn it off, roll it back quickly, or hide it (flag) without customer harm, it’s reversible. If it threatens data integrity, security, or customer trust and can’t be undone cleanly, treat it as a one-way door.
When should I write a full RFC vs. a micro-ADR? Use a micro-ADR for low/medium-risk calls to keep momentum. Use an RFC or longer design doc for high-risk, hard-to-reverse changes (migrations, auth, billing).
What’s a good rollback target? Aim for ~10 minutes from alarm to safe state for medium-risk changes. For high-risk changes, rehearse rollback in staging and ensure snapshot/restore times are known.
How do I communicate speed safely? Frame decisions in risk terms: “Impact small, reversible in 10 minutes, mitigation plan ready.” If mitigation is fast and the blast radius is tiny, bias toward speed.
Do feature flags add tech debt? Only if you don’t clean them up. Track flags, add an “expiry” note in your micro-ADR, and remove them once the decision is proven.
🗞️ Other articles people like
👏 Weekly applause
Some great articles I read last week:
Operating Principles That Guided Me to Staff Engineer (Part 1: Driving Impact). I was happy to read this post about Jordan’s promotion. Don’t wait for tickets; hunt for problems and solve them early. That’s how you create visible impact and grow faster.
How to Use AI to Improve Teamwork in Engineering Teams. AI won’t fix teamwork on its own, but with trust and autonomy in place it removes friction and lets teams move faster.
If You Write 80% Less Code as Tech Lead. As a tech lead, your leverage comes from enabling others: teach through reviews, own key components, and carve time for prototypes.
GenAI for Engineers (Part 1: The Foundations). I wrote recently about context engineering. LLMs are just prediction machines. To build production-ready systems, we need to focus on the context.
P.S. This may interest you:
Are you in doubt whether the paid version of the newsletter is for you? Discover the benefits here
Could you take one minute to answer a quick, anonymous survey to help me improve this newsletter? Take the survey here
Are you a brand looking to advertise to engaged engineers and leaders? Book your slot now
Give a like ❤️ to this post if you found it useful, and share it with a friend to get referral rewards