The most expensive product bugs I have worked on did not throw exceptions. They passed QA. They shipped. Everyone involved could point to working code.

They were still wrong.

A domain bug is what happens when the software is internally consistent, but the organization never agreed on what the thing actually is. Legal means one thing. Product means another. Engineering encodes a third. Support or operations end up discovering the gap after the workflow starts breaking in real life.

At Zendesk, “agreement” was one of those words. Legal, product, and engineering were all working in good faith, but not from the same definition. That mattered because the workflow crossed policy, approvals, system states, and real customer commitments. We were fully capable of shipping technically correct changes that still optimized the wrong part of the process. The cost showed up as rework, ambiguous handoffs, and more translation than a healthy product system should need.

That is why I care about Domain-Driven Design. Not as architecture theatre. As a way to stop teams from confidently shipping the wrong thing.

The bug is upstream of the code

CPOs usually notice domain bugs late because nothing is visibly on fire. There is no outage. There is just a growing pile of signals that do not quite line up:

  • roadmap items that keep getting re-scoped after kickoff
  • support or operations teams doing manual interpretation between systems
  • the same noun meaning different things in planning docs, UI copy, and data models
  • engineering asking “which version of this do we actually mean?” halfway through delivery

When those signals show up together, the problem is rarely backlog hygiene. It is usually semantic debt.

Semantic debt compounds faster than most technical debt because it spreads through decisions. One vague noun in a planning meeting becomes a misleading API name. That API name becomes a data field. That data field becomes reporting logic. Then a quarter later you are debating a KPI built on a definition nobody would defend out loud.
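That compounding chain can be sketched in a few lines. This is an illustrative example, not code from any real system: a hypothetical "agreement" record whose single `active` boolean silently collapses two different definitions, and a KPI that inherits the collapse.

```python
# A sketch of how one vague noun compounds. "Active" here silently
# conflates two definitions: legally signed vs. operationally in use.
agreements = [
    {"id": 1, "signed": True,  "in_use": False},
    {"id": 2, "signed": False, "in_use": True},
]

# The API shipped a single boolean, collapsing both meanings.
def is_active(a):
    return a["signed"] or a["in_use"]   # which definition did we mean?

# A quarter later, a KPI rests on that collapsed definition.
active_rate = sum(is_active(a) for a in agreements) / len(agreements)
# Legal would report 50%. Operations would report 50%. This reports 100%.
```

Nobody wrote a bug. Each step was a reasonable local decision, and the result is a number no one would defend out loud.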

The smell test I actually use

Instead of teaching DDD as a set of patterns, I use this diagnostic before a workflow gets expensive:

  • Early smell: One noun needs different hidden definitions in the same meeting
    What it usually means: Two contexts are pretending to be one
    Business cost: Roadmap churn and misbuilds
    Product move: Force one explicit definition or split the workflow

  • Early smell: A workflow crosses legal, support, product, and engineering with no obvious owner
    What it usually means: Ownership boundary is unclear
    Business cost: Slow approvals, manual workarounds, trust loss
    Product move: Define the boundary and owning team before scoping features

  • Early smell: Teams keep sharing the same table, event, or status for convenience
    What it usually means: Different lifecycles are being collapsed together
    Business cost: Silent regressions and reporting drift
    Product move: Put an explicit interface or handoff between contexts

  • Early smell: Metrics look stable while the frontline says the flow is broken
    What it usually means: Reporting model and operational reality disagree
    Business cost: False confidence and delayed fixes
    Product move: Reconcile the domain language before changing the dashboard

That is the part of DDD I find most useful. It gives the team a way to name the ambiguity before it leaks into delivery.

Three checks matter more than the jargon

I do not start with architecture patterns. I start with three questions.

1. Can one term survive every surface without changing meaning?

If a term cannot survive legal copy, UI labels, roadmap docs, API names, and reporting without changing meaning, the team is not ready to ship around it. That is usually the first sign the domain model is still fuzzy.

This is what people mean by ubiquitous language, but the operational point is simpler: shared language reduces translation work. Translation work is where expensive misunderstandings hide. It is also why I treat documentation as a product surface, not an afterthought.
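One concrete way to make a definition hold across surfaces is to give it exactly one home in code. A minimal sketch, with hypothetical names throughout, of a single shared definition that UI copy and reporting both reference instead of re-deriving:

```python
from enum import Enum

# One explicit definition of agreement state. Hypothetical names:
# the point is that there is exactly one place where "signed" means
# something, and every surface imports it.
class AgreementState(str, Enum):
    DRAFT = "draft"
    SIGNED = "signed"            # legal's definition of done
    OPERATIONAL = "operational"  # the workflow's definition of done

# UI labels reference the enum rather than restating the strings,
# so the language cannot quietly drift between surfaces.
UI_LABELS = {
    AgreementState.DRAFT: "Draft",
    AgreementState.SIGNED: "Signed",
    AgreementState.OPERATIONAL: "In use",
}

def report_row(state: AgreementState) -> dict:
    return {"state": state.value, "label": UI_LABELS[state]}
```

The enum is trivial; the discipline is not. Any surface that needs a new meaning has to add it here, out loud, instead of overloading an existing term.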

2. Where does this workflow stop being the same thing?

A “user” in access control is not the same as a “user” in billing. An “agreement” in a contractual system is not automatically the same thing as an agreement step inside an operational workflow. Good teams stop pretending everything belongs in one universal model.

This is what bounded contexts are for. Not because engineers love boxes. Because product teams need cleaner ownership, cleaner handoffs, and fewer accidental dependencies.
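In code, a bounded context shows up as separate models that share a noun but not a schema, with an explicit translation at the boundary. A sketch under illustrative names, not a real system:

```python
from dataclasses import dataclass

@dataclass
class AccessUser:
    """A 'user' in the access-control context: identity plus roles."""
    user_id: str
    roles: list

@dataclass
class BillingAccount:
    """The same person in the billing context: an account on a plan."""
    account_id: str
    plan: str

# The handoff between contexts is an explicit translation function,
# not a shared table or a shared class. Changing one model cannot
# silently change the other.
def to_billing(user: AccessUser, plan: str) -> BillingAccount:
    return BillingAccount(account_id=f"acct-{user.user_id}", plan=plan)
```

The translation function looks like overhead until the first time billing needs a field access control does not have, and nobody has to argue about whose model wins.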

3. What has to stay true together or the user stops trusting the system?

Engineers would describe this as an aggregate boundary. I think about it in more practical terms: what must change together, and what absolutely cannot drift apart?

That question matters in internal platforms, compliance workflows, and trust-sensitive systems because partial truth is often worse than visible failure. A broken page gets escalated quickly. A system that looks right while carrying the wrong status spreads bad decisions.
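The "must change together" rule can be enforced directly: one object owns the related state and refuses partial updates. A sketch with hypothetical names:

```python
class ApprovalRecord:
    """One object owns status and approver, so they cannot drift apart."""

    def __init__(self):
        self.status = "pending"
        self.approved_by = None

    def approve(self, approver):
        # The invariant: an "approved" status with no approver is
        # exactly the partial truth that erodes trust. Refuse it.
        if not approver:
            raise ValueError("approval requires a named approver")
        self.status = "approved"
        self.approved_by = approver
```

Callers cannot set `status` to "approved" halfway; the aggregate either records a complete approval or fails loudly, which is the visible failure the surrounding systems can actually react to.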

What changed once I started treating this as product work

The fix at Zendesk was not memorizing DDD terminology. It was forcing the shared language to hold across the workflow and refusing to keep scoping around fuzzy definitions.

In practice, that meant fewer arguments about whether engineering had “implemented the requirement” and better conversations about whether we were modeling the right thing in the first place. It meant less rework after discovery. It meant the work moved out of translation overhead and into clearer ownership.

I use the same discipline in Meitheal. The domain areas live separately: tasks, auth, strategy, observability. More importantly, the language and responsibilities stay separate too. Cross-domain communication happens through deliberate interfaces instead of deep imports and implicit coupling. The useful outcome is not architectural purity. It is being able to evolve one area without casually breaking another.
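The shape of a deliberate interface between domains can be sketched like this. This is an illustration of the pattern, not Meitheal's actual code; every name is hypothetical:

```python
from typing import Protocol

class TaskReader(Protocol):
    """The narrow interface one domain exposes to another."""
    def open_task_count(self) -> int: ...

# The strategy domain depends only on the interface, never on the
# tasks domain's internals or storage.
def capacity_summary(tasks: TaskReader) -> str:
    return f"{tasks.open_task_count()} open tasks"

# The tasks domain supplies an implementation behind the interface
# and is free to change everything else without breaking callers.
class TaskService:
    def __init__(self, tasks):
        self._tasks = tasks

    def open_task_count(self) -> int:
        return sum(1 for t in self._tasks if not t.get("done"))
```

The value is in what the consumer cannot see: as long as `open_task_count` keeps its meaning, the tasks domain can be rewritten without a cross-domain ripple.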

The operating rules I keep coming back to are straightforward:

  • If the team cannot agree on a term, we are not ready to write code.
  • If a workflow crosses contexts, the handoff needs to be explicit.
  • If ownership is vague, the architecture will not rescue us later.
  • If reporting, UI language, and operational behavior disagree, I treat that as a product bug.

Where this rigor earns its keep

I care most about this in the kinds of product areas I keep gravitating toward: internal platforms, trust and safety, compliance-heavy workflows, AI in production, and data systems leadership uses to make real decisions.

Those domains punish confident ambiguity.

On an internal platform, a vague model turns into more manual support, slower contribution flow, and teams working around the system. It is one of the reasons I use a more domain-aware lens in prioritization for internal platforms. In a trust-sensitive workflow, it turns into audit risk and inconsistent decisions. In AI or data-heavy surfaces, it turns into dashboards or automations that look precise while resting on the wrong definitions.

That is why DDD matters to product leaders. It is not an engineering preference. It is a way to reduce rework, protect trust, and keep different parts of the organization from shipping different interpretations of the same product.

When the overhead is not worth it

I would not use this level of rigor on a marketing microsite or a simple one-team feature with obvious rules. Sometimes the right move is to ship the page, learn, and move on.

But once a workflow spans multiple teams, carries policy or trust implications, or feeds downstream reporting and automation, the overhead stops being overhead. It becomes the cost of not lying to yourself about what the system means.

If one noun needs three hidden definitions in the same meeting, I stop calling that a naming problem. It is usually two contexts pretending to be one.

If your team keeps arguing over the same noun, the model is usually wrong before the roadmap is. Get in touch.

Further Reading