The most expensive product bugs I have worked on did not throw exceptions. They passed QA. They shipped. Everyone involved could point to working code.
They were still wrong.
A domain bug is what happens when the software is internally consistent, but the organization never agreed on what the thing actually is. Legal means one thing. Product means another. Engineering encodes a third. Support or operations end up discovering the gap after the workflow starts breaking in real life.
At Zendesk, “agreement” was one of those words. Legal, product, and engineering were all working in good faith, but not from the same definition. That mattered because the workflow crossed policy, approvals, system states, and real customer commitments. We were fully capable of shipping technically correct changes that still optimized the wrong part of the process. The cost showed up as rework, ambiguous handoffs, and more translation than a healthy product system should need.
That is why I care about Domain-Driven Design. Not as architecture theatre. As a way to stop teams from confidently shipping the wrong thing.
The bug is upstream of the code
CPOs usually notice domain bugs late because nothing is visibly on fire. There is no outage. There is just a growing pile of signals that do not quite line up:
- roadmap items that keep getting re-scoped after kickoff
- support or operations teams doing manual interpretation between systems
- the same noun meaning different things in planning docs, UI copy, and data models
- engineering asking “which version of this do we actually mean?” halfway through delivery
When those signals show up together, the problem is rarely backlog hygiene. It is usually semantic debt.
Semantic debt compounds faster than most technical debt because it spreads through decisions. One vague noun in a planning meeting becomes a misleading API name. That API name becomes a data field. That data field becomes reporting logic. Then a quarter later you are debating a KPI built on a definition nobody would defend out loud.
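That propagation chain is easy to sketch. The example below is hypothetical (the field and record names are mine, not from any real system): one "approved" flag quietly serves two definitions, and a KPI built on top of it counts both without anyone deciding that it should.

```python
# Hypothetical sketch of semantic debt: one vague noun doing double duty.
# Legal reads "approved" as "contract signed"; operations reads it as
# "workflow step cleared". Both write the same field.
records = [
    {"id": 1, "approved": True},   # legal's meaning: contract signed
    {"id": 2, "approved": True},   # ops' meaning: step cleared, NOT signed
    {"id": 3, "approved": False},
]

# A quarter later, a KPI counts "approved agreements" and silently
# mixes both definitions.
kpi_approved = sum(1 for r in records if r["approved"])
print(kpi_approved)  # 2 -- but only one is approved in the legal sense
```

Nothing here throws an exception, which is exactly the point: the bug lives in the definition, not the code.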
The smell test I actually use
Instead of teaching DDD as a set of patterns, this is the diagnostic I use before a workflow gets expensive:
| Early smell | What it usually means | Business cost | Product move |
|---|---|---|---|
| One noun needs different hidden definitions in the same meeting | Two contexts are pretending to be one | Roadmap churn and misbuilds | Force one explicit definition or split the workflow |
| A workflow crosses legal, support, product, and engineering with no obvious owner | Ownership boundary is unclear | Slow approvals, manual workarounds, trust loss | Define the boundary and owning team before scoping features |
| Teams keep sharing the same table, event, or status for convenience | Different lifecycles are being collapsed together | Silent regressions and reporting drift | Put an explicit interface or handoff between contexts |
| Metrics look stable while the frontline says the flow is broken | Reporting model and operational reality disagree | False confidence and delayed fixes | Reconcile the domain language before changing the dashboard |
That is the part of DDD I find most useful. It gives the team a way to name the ambiguity before it leaks into delivery.
Three checks matter more than the jargon
I do not start with architecture patterns. I start with three questions.
1. Can one term survive every surface without changing meaning?
If a term cannot survive legal copy, UI labels, roadmap docs, API names, and reporting without changing meaning, the team is not ready to ship around it. That is usually the first sign the domain model is still fuzzy.
This is what people mean by ubiquitous language, but the operational point is simpler: shared language reduces translation work. Translation work is where expensive misunderstandings hide. It is also why I treat documentation as a product surface, not an afterthought.
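One cheap way to make a term survive every surface is to give it exactly one encoding that every layer imports. This is an illustrative sketch (the enum and its states are assumptions, not a real schema): UI labels, API payloads, and reports all derive from the same definition, so a renamed or added state surfaces everywhere at once instead of drifting per layer.

```python
from enum import Enum

# Hypothetical single source of truth for one domain term.
# Every surface imports this; no layer re-derives its own meaning.
class AgreementStatus(Enum):
    DRAFT = "draft"
    PENDING_APPROVAL = "pending_approval"
    ACTIVE = "active"
    TERMINATED = "terminated"

def ui_label(status: AgreementStatus) -> str:
    # UI copy is derived from the canonical value, not hand-maintained.
    return status.value.replace("_", " ").title()

print(ui_label(AgreementStatus.PENDING_APPROVAL))  # Pending Approval
```

The design point is not the enum itself; it is that adding a fifth status forces a conversation in one place instead of four silent divergences.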
2. Where does this workflow stop being the same thing?
A “user” in access control is not the same as a “user” in billing. An “agreement” in a contractual system is not automatically the same thing as an agreement step inside an operational workflow. Good teams stop pretending everything belongs in one universal model.
This is what bounded contexts are for. Not because engineers love boxes. Because product teams need cleaner ownership, cleaner handoffs, and fewer accidental dependencies.
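A minimal sketch of that split, with hypothetical names throughout: each context keeps its own model of "user", and the only shared knowledge is an explicit translation at the boundary.

```python
from dataclasses import dataclass
from typing import Callable

# Access-control context: a user is an identity with permissions.
@dataclass
class AccessUser:
    user_id: str
    roles: list

# Billing context: a "user" is really an account that gets invoiced.
@dataclass
class BillingAccount:
    account_id: str
    payer_email: str

# The deliberate handoff: one translation function at the boundary,
# instead of both contexts sharing a universal User model.
def billing_account_for(user: AccessUser,
                        email_lookup: Callable[[str], str]) -> BillingAccount:
    return BillingAccount(account_id=user.user_id,
                          payer_email=email_lookup(user.user_id))

acct = billing_account_for(AccessUser("u1", ["admin"]),
                           lambda uid: f"{uid}@example.com")
print(acct.payer_email)  # u1@example.com
```

The payoff is organizational, not technical: billing can change what an account means without a meeting with the access-control team.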
3. What has to stay true together or the user stops trusting the system?
Engineers would describe this as an aggregate boundary. I think about it in more practical terms: what must change together, and what absolutely cannot drift apart?
That question matters in internal platforms, compliance workflows, and trust-sensitive systems because partial truth is often worse than visible failure. A broken page gets escalated quickly. A system that looks right while carrying the wrong status spreads bad decisions.
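In code, "what must change together" means one object owns both facts and exposes no way to mutate them separately. A hypothetical sketch (the two-approval rule is an invented example, not a real policy):

```python
# Hypothetical aggregate: an agreement's status and its approval record
# must move together, or the audit trail and reporting drift apart.
class Agreement:
    def __init__(self, agreement_id: str):
        self.agreement_id = agreement_id
        self.status = "draft"
        self.approvals: list[str] = []

    def approve(self, approver: str) -> None:
        # One method mutates both facts. Callers cannot flip the status
        # without recording who approved, so partial truth is impossible.
        self.approvals.append(approver)
        if len(self.approvals) >= 2:  # assumed rule: two sign-offs required
            self.status = "active"

a = Agreement("AG-1")
a.approve("legal")
a.approve("product")
print(a.status)  # active
```

If `status` were a plain field any service could write, the system could look "active" while carrying zero approvals, which is exactly the silent wrongness described above.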
What changed once I started treating this as product work
The fix at Zendesk was not memorizing DDD terminology. It was forcing the shared language to hold across the workflow and refusing to keep scoping around fuzzy definitions.
In practice, that meant fewer arguments about whether engineering had “implemented the requirement” and better conversations about whether we were modeling the right thing in the first place. It meant less rework after discovery. It meant the work moved out of translation overhead and into clearer ownership.
I use the same discipline in Meitheal. The domain areas live separately: tasks, auth, strategy, observability. More importantly, the language and responsibilities stay separate too. Cross-domain communication happens through deliberate interfaces instead of deep imports and implicit coupling. The useful outcome is not architectural purity. It is being able to evolve one area without casually breaking another.
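The shape of a deliberate interface is small. This is an illustrative sketch, not Meitheal's actual code: one domain depends on a narrow protocol it defines, and the other domain plugs in behind it, so neither imports the other's internals.

```python
from typing import Protocol

# Narrow contract the tasks domain owns; observability implements it.
class EventSink(Protocol):
    def record(self, event: str) -> None: ...

# A concrete sink (here in-memory; a real one might ship to a log pipeline).
class InMemorySink:
    def __init__(self) -> None:
        self.events: list[str] = []

    def record(self, event: str) -> None:
        self.events.append(event)

def complete_task(task_id: str, sink: EventSink) -> None:
    # ... tasks-domain logic here ...
    # Then one deliberate, narrow handoff across the boundary.
    sink.record(f"task_completed:{task_id}")

sink = InMemorySink()
complete_task("t-42", sink)
print(sink.events)  # ['task_completed:t-42']
```

Swapping the observability implementation now touches one class, not every call site inside the tasks domain.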
The operating rules I keep coming back to are straightforward:
- If the team cannot agree on a term, we are not ready to write code.
- If a workflow crosses contexts, the handoff needs to be explicit.
- If ownership is vague, the architecture will not rescue us later.
- If reporting, UI language, and operational behavior disagree, I treat that as a product bug.
Where this rigor earns its keep
I care most about this in the kinds of product areas I keep gravitating toward: internal platforms, trust and safety, compliance-heavy workflows, AI in production, and data systems leadership uses to make real decisions.
Those domains punish confident ambiguity.
On an internal platform, a vague model turns into more manual support, slower contribution flow, and teams working around the system, which is one reason I apply a domain-aware lens when prioritizing internal platform work. In a trust-sensitive workflow, it turns into audit risk and inconsistent decisions. In AI or data-heavy surfaces, it turns into dashboards or automations that look precise while resting on the wrong definitions.
That is why DDD matters to product leaders. It is not an engineering preference. It is a way to reduce rework, protect trust, and keep different parts of the organization from shipping different interpretations of the same product.
When the overhead is not worth it
I would not use this level of rigor on a marketing microsite or a simple one-team feature with obvious rules. Sometimes the right move is to ship the page, learn, and move on.
But once a workflow spans multiple teams, carries policy or trust implications, or feeds downstream reporting and automation, the overhead stops being overhead. It becomes the cost of not lying to yourself about what the system means.
If one noun needs three hidden definitions in the same meeting, I stop calling that a naming problem. It is usually two contexts pretending to be one.
If your team keeps arguing over the same noun, the model is usually wrong before the roadmap is. Get in touch.
Related thinking
- Documentation is a Product Surface - the documentation discipline that keeps shared language from drifting across code, docs, and handoffs
- Why RICE Fails on Internal Platforms - where I use domain-weighted prioritization when ambiguous internal work gets scored badly
- Your AI Demo Is Not Production Ready - why trust-sensitive systems fail in production when the operational model is fuzzier than the demo suggests
Further Reading
- Eric Evans - Domain-Driven Design - the original book
- Martin Fowler - DDD bliki - concise summary
- Vaughn Vernon - Implementing Domain-Driven Design - practical companion