Most prioritisation failures do not start in roadmap planning. They start earlier, when every signal looks roughly the same size.

At Zendesk, the most useful tool I built for that problem was a spreadsheet. Four dimensions, a weighted total, and just enough structure to stop us mistaking noise for demand.

It changed what rose to the top of the roadmap that quarter.

If you want the reusable framework version of this thinking, start with The Loudest Request Is Rarely the Most Important. This note is the scar tissue behind it.

When every dashboard says “medium”

We had a support queue. Thousands of tickets per month flowing into a well-organised triage system with tags, priorities, escalation paths, and routing rules. On paper, we knew exactly what customers were experiencing.

In practice, we were blind.

The same five issues kept surfacing across different customer segments, phrased differently each time, tagged differently by different support agents. A mid-market account in EMEA would report a workflow friction as a bug. An enterprise customer in North America would describe the same friction as a feature request. A trial user would just leave.

Each instance looked like a one-off in the dashboard. The aggregated view showed a flat distribution of ticket categories — everything roughly equal in urgency. Nothing screamed.

That’s the trap. When everything looks medium priority, nothing gets prioritised. The roadmap ends up driven by whoever’s got the loudest voice in the room.

The scorecard I actually used

I pulled a month of ticket data, around 3,000 tickets, into a spreadsheet and scored the emerging themes across four dimensions:

| Dimension | Question | What counted as high signal | What it stopped us doing |
| --- | --- | --- | --- |
| Account weight | What kind of customer context sits behind this report? | Long-tenured accounts, renewed customers, or accounts with real workflow depth | Treating every complaint as if it carried the same product meaning |
| Behaviour change | Did the customer change what they did? | Workarounds, drop-off, repeated tickets, abandoned flows | Rewarding verbosity over action |
| Frequency across accounts | How many different accounts hit the same wall? | Repetition across segments and regions | Mistaking scattered phrasing for unrelated issues |
| Specificity | Can we point to a concrete failure mode? | Reproducible steps, clear breakdown points, tight descriptions | Filling the roadmap with vague dissatisfaction |

Each theme got a simple score on each dimension and a weighted total. The exact weighting mattered less than forcing the criteria into the open. Once the criteria were visible, the conversation improved immediately. We stopped arguing about whose anecdote felt most convincing and started arguing about what actually counted as evidence.
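
If it helps to see the mechanics, here is a minimal sketch of that scoring step. The dimension names mirror the table above; the weights, scores, and theme names are illustrative assumptions, not the real figures from the spreadsheet.

```python
# Minimal sketch of the scorecard: each theme gets a 0-3 score per dimension,
# and a weighted total decides what rises to the top. Weights are illustrative.
WEIGHTS = {
    "account_weight": 0.30,
    "behaviour_change": 0.30,
    "frequency_across_accounts": 0.25,
    "specificity": 0.15,
}

# Hypothetical example themes, scored 0-3 on each dimension.
themes = {
    "cross-segment workflow friction": {
        "account_weight": 2, "behaviour_change": 3,
        "frequency_across_accounts": 3, "specificity": 2,
    },
    "single-account exec escalation": {
        "account_weight": 3, "behaviour_change": 1,
        "frequency_across_accounts": 1, "specificity": 3,
    },
}

def weighted_total(scores):
    """Collapse the four dimension scores into one comparable number."""
    return sum(WEIGHTS[dim] * score for dim, score in scores.items())

# Rank themes highest-scoring first.
for name, scores in sorted(themes.items(), key=lambda kv: weighted_total(kv[1]), reverse=True):
    print(f"{weighted_total(scores):.2f}  {name}")
```

With these example numbers, the quieter cross-segment theme outranks the loud single-account escalation, which is exactly the behaviour the split is meant to produce.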

A single high-touch escalation still mattered. It just no longer outweighed a quieter issue that showed the same workaround behaviour across multiple unrelated accounts.

What made the spreadsheet useful was not the spreadsheet itself. It was the split. Four different questions produced a better picture than one generic priority number.

What changed once the themes were visible

Three of the five issues that rose to the top were things the product team already vaguely knew about. The difference was evidence. Instead of “support says this keeps coming up,” we could say “this issue shows up across segments, changes customer behaviour, and keeps getting disguised as different ticket categories.”

That is a very different conversation. It moves the room from “should we look into this?” to “why is this still unresolved?”

The other two themes were not brand-new in the sense that nobody had ever seen them. They were new as coherent product problems. They had been hiding in the handoff between features, teams, and ticket taxonomies, so nobody had treated them as one thing.

This is the real job of signal scoring. Not to prove support is right. Not to create a prettier backlog. It is to make the real pattern legible early enough that the roadmap can still respond.

The second axis is what made it work

Looking back, that scorecard was solving the same class of problem Meitheal tackles more explicitly in its Eisenhower planning feature: urgency and importance are different signals, and treating them as one priority number hides the real decision.

A loud issue can be urgent without being strategically important. A strategic issue can be important without being urgent. The value of an urgency × importance matrix is not the boxes. It is the forcing function. It makes the team say what kind of signal they are actually looking at.
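
To make that forcing function concrete, here is a rough sketch of the separation. The thresholds and quadrant labels are assumptions for illustration, not anything taken from Meitheal.

```python
# Illustrative only: keep urgency and importance as separate signals and let
# the quadrant, not a single merged number, name the decision.
def quadrant(urgency, importance, threshold=2):
    """Classify a signal scored 0-3 on urgency and 0-3 on importance."""
    urgent = urgency >= threshold
    important = importance >= threshold
    if urgent and important:
        return "do now"
    if important:
        return "schedule"              # important but not urgent: easiest to lose
    if urgent:
        return "contain or delegate"   # loud, but not strategically important
    return "drop or batch"

print(quadrant(urgency=3, importance=1))  # loud escalation -> "contain or delegate"
print(quadrant(urgency=1, importance=3))  # quiet strategic issue -> "schedule"
```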

The spreadsheet did something similar on the signal side. It refused to flatten customer context, behaviour, repetition, and specificity into one fuzzy sense of priority. That is why it worked. Once you keep the dimensions visible for long enough, louder stops beating clearer.

What I got wrong

The model still had a blind spot. It rewarded customers who generated tickets, which meant it undercounted people who hit friction and quietly disappeared.

I tried to compensate by cross-referencing ticket data with usage analytics and looking for places where adoption dropped without a matching spike in support. That helped, but it was a different analysis. The spreadsheet was good at exposing visible recurring pain. It was weaker at exposing silent abandonment.
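
For what it is worth, that cross-reference looked roughly like the sketch below: flag accounts whose usage drops sharply without a matching ticket spike. The data shapes, column names, and threshold are made up for illustration; the real exports will differ.

```python
import pandas as pd

# Hypothetical per-account, per-week exports; real schemas will differ.
usage = pd.DataFrame({
    "account_id":   ["a1", "a1", "a2", "a2"],
    "week":         [1, 2, 1, 2],
    "active_users": [40, 12, 35, 36],
})
tickets = pd.DataFrame({
    "account_id":   ["a2"],
    "week":         [2],
    "ticket_count": [4],
})

# Week-over-week change in usage per account.
usage = usage.sort_values(["account_id", "week"])
usage["usage_change"] = usage.groupby("account_id")["active_users"].diff()

merged = usage.merge(tickets, on=["account_id", "week"], how="left")
merged["ticket_count"] = merged["ticket_count"].fillna(0)

# Silent-abandonment candidates: sharp usage drop with no matching support spike.
# The -20 threshold is arbitrary; tune it against your own baseline.
silent = merged[(merged["usage_change"] <= -20) & (merged["ticket_count"] == 0)]
print(silent[["account_id", "week", "usage_change"]])
```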

The other thing I would change is social, not analytical. I should have brought the support team into the method earlier. They knew which tickets were one account escalating three times and which were genuinely fresh reports from different customers. That would have made the signal cleaner faster.

When this approach is worth using

Use this when:

  • the same product problem is being described differently by support, sales, and customers
  • you have enough signal volume that raw ticket counts are becoming misleading
  • the roadmap is getting pulled toward whoever escalated most recently
  • you need a bridge between anecdotal support pain and planning-quality evidence

Use something else when:

  • you are dealing with an incident and need response, not scoring
  • you are at zero-to-one and do not have enough recurring signal to rank themes
  • the real risk is silent churn or adoption drop, where behavioural analytics should lead

This kind of scorecard is especially useful in messy B2B SaaS environments, internal platforms, and trust-sensitive surfaces where the product problem often sits in the handoff between systems rather than inside one obvious feature.

If your feedback system keeps producing a flat list of medium-priority issues, the problem is probably not volume. It is the model.