I kept watching the same failure mode: a team spends a quarter building the feature that came from the most senior stakeholder, ignoring the signal that appeared consistently across 40 support tickets. The loudest request and the most important request are rarely the same thing. I built the Signal Scorecard because I needed a way to quantify the difference.
## Why informal models break
Most product teams have an informal mental model for evaluating feedback. The issue is that informal models are inconsistent, invisible, and unchallengeable. Two PMs on the same team will weigh the same customer email differently because they’re applying different criteria, and neither has written those criteria down.
The Signal Scorecard makes the evaluation criteria explicit. Once they’re explicit, the team can debate the criteria instead of debating individual signals. That’s a far more productive conversation.
## How I score signals
| Dimension | Question | Weight | Why This Weight |
|---|---|---|---|
| Frequency | How often does this signal appear across sources? | 25% | Repeated signals are more likely to represent real patterns than one-offs |
| Severity | How much does this hurt the user when it happens? | 25% | A rare but catastrophic problem can matter more than a frequent annoyance |
| Revenue Signal | Is this tied to retention, expansion, or churn? | 20% | Signals connected to money get executive attention and funding |
| Strategic Alignment | Does addressing this move us toward our stated vision? | 20% | Prevents optimising for the wrong direction; even valid signals can be distractions |
| Effort to Validate | Can we cheaply verify this signal is real? | 10% | Cheap-to-validate signals get investigated first |
Each dimension is scored 1-5; apply the weights to get a composite score. The composite ranks themes, not individual signals, so group related signals into themes before scoring.
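The weighting step can be sketched in a few lines of Python. This is illustrative only: the function and dictionary names are mine, and the example theme and its scores are invented.

```python
# Weights mirror the table above: frequency 25%, severity 25%,
# revenue signal 20%, strategic alignment 20%, effort to validate 10%.
WEIGHTS = {
    "frequency": 0.25,
    "severity": 0.25,
    "revenue_signal": 0.20,
    "strategic_alignment": 0.20,
    "effort_to_validate": 0.10,
}

def composite(scores: dict[str, int]) -> float:
    """Weighted average of a theme's 1-5 dimension scores."""
    assert scores.keys() == WEIGHTS.keys(), "score every dimension"
    assert all(1 <= s <= 5 for s in scores.values()), "scores are 1-5"
    return sum(WEIGHTS[d] * s for d, s in scores.items())

# An invented theme: common and painful, but weakly tied to revenue.
slow_login = {
    "frequency": 5,
    "severity": 4,
    "revenue_signal": 2,
    "strategic_alignment": 3,
    "effort_to_validate": 4,
}
print(round(composite(slow_login), 2))  # 3.65
```

Because the weights sum to 100%, the composite stays on the same 1-5 scale as the inputs, which makes scores comparable across quarters.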
## The process that surfaces real priorities
- Collect - Pull signals from every source into one view. Support tickets, sales call notes, NPS comments, feature requests, social media, internal stakeholder inputs. Don’t curate yet.
- Tag - Group related signals into themes. “Login is slow” and “authentication takes forever” are the same theme. Don’t merge too aggressively. If you’re unsure whether two signals are the same problem, keep them separate. You can merge later.
- Score - Rate each theme 1-5 on every dimension. Do this with at least two people to avoid one person’s bias dominating. Where you disagree on a score, discuss. The conversation is as valuable as the number.
- Weight - Apply the weights above to get a composite score. Rank themes by composite.
- Validate - High-scoring themes get research investment: customer interviews, data pulls, prototype tests. The scorecard tells you where to look, not what to build.
- Act - Feed validated themes into RICE/DRICE for prioritisation. The Signal Scorecard decides what deserves attention. RICE/DRICE decides what gets built.
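As a sketch, the Score and Weight steps might look like this. Everything here is invented for illustration: the theme names, the scores, and the two-scorer setup (each dimension holds a pair of ratings that get averaged before weighting).

```python
# Dimension weights from the scorecard table.
WEIGHTS = {"frequency": 0.25, "severity": 0.25, "revenue_signal": 0.20,
           "strategic_alignment": 0.20, "effort_to_validate": 0.10}

# Two scorers rate each theme independently; each tuple is (scorer_a, scorer_b).
themes = {
    "slow_authentication": {
        "frequency": (5, 4), "severity": (3, 4), "revenue_signal": (2, 2),
        "strategic_alignment": (3, 3), "effort_to_validate": (5, 4),
    },
    "billing_export_errors": {
        "frequency": (2, 2), "severity": (5, 5), "revenue_signal": (4, 5),
        "strategic_alignment": (4, 4), "effort_to_validate": (3, 3),
    },
}

def composite(theme: dict[str, tuple[int, int]]) -> float:
    # Average the scorers' ratings per dimension, then apply the weights.
    return sum(w * sum(theme[d]) / len(theme[d]) for d, w in WEIGHTS.items())

ranked = sorted(themes, key=lambda t: composite(themes[t]), reverse=True)
for name in ranked:
    print(f"{name}: {composite(themes[name]):.2f}")
```

In this invented data, the rare-but-severe billing theme outranks the frequent-but-milder authentication theme, which is exactly the behaviour the severity weight is there to produce.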
## Where this changed the roadmap: Zendesk escalation deflection
The most impactful application was at Zendesk, where we needed to reduce support escalations.
Quarterly planning. Before each quarter, we pulled thousands of macros, ticket tags, and raw support conversations into a spreadsheet. Instead of looking at a “Top 10 feature requests” list, we grouped the raw friction points into themes and scored them. Frequency got a 30% weight, but “Time Spent by Support” (a blend of effort and severity) got 40%. This gave us a ranked list of what customers were actually bleeding time on, which we could compare against the roadmap. The delta between “features we wanted to build” and “friction customers needed removed” was always the most interesting conversation, and it usually changed the roadmap.
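A minimal sketch of that re-weighting. Only the 30% frequency and 40% support-time figures come from the story above; the remaining 30% split, the mapping of support time onto the severity slot, and the example scores are all invented.

```python
# Default scorecard weights vs. the escalation-deflection variant.
default = {"frequency": 0.25, "severity": 0.25, "revenue_signal": 0.20,
           "strategic_alignment": 0.20, "effort_to_validate": 0.10}
# "severity" here stands in for Time Spent by Support; the 30% left over
# after frequency and support time is split arbitrarily for the sketch.
escalation = {"frequency": 0.30, "severity": 0.40, "revenue_signal": 0.10,
              "strategic_alignment": 0.10, "effort_to_validate": 0.10}

# An invented theme: moderate volume, but each occurrence eats agent time.
theme = {"frequency": 3, "severity": 5, "revenue_signal": 2,
         "strategic_alignment": 2, "effort_to_validate": 4}

def score(weights: dict[str, float]) -> float:
    return sum(weights[d] * theme[d] for d in weights)

print(f"default: {score(default):.2f}, escalation: {score(escalation):.2f}")
```

The same theme scores higher under the escalation weighting, which is the point of tuning weights per goal: the scorecard stays the same, only the definition of “important” moves.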
After a major release. Every significant launch changes the signal landscape. When we shipped a new compliance surface, we ran the scorecard specifically on tickets generated in the first 30 days. It caught a confusing UI pattern that was generating high-urgency, high-anxiety tickets, even though the total volume was low. Because “severity” was weighted equally to “frequency,” it jumped to the top of the list and we fixed it in the next sprint, stopping the bleed before it became a narrative.
## What keeps surprising me
Frequency and severity are inversely correlated more often than you’d expect. The most common feedback is usually a minor annoyance; the most severe feedback is usually rare. The scorecard forces you to weight both, which prevents you from chasing high-frequency, low-severity issues just because they’re loud.
“Effort to Validate” is the tiebreaker. When two themes score similarly, the one you can validate cheaply should go first. A single data pull that confirms a signal is worth more than an expensive research project that might.
Internal stakeholder opinions are signals too. Score them the same way. The CEO’s pet feature gets the same 1-5 scoring as a support ticket theme. This isn’t about ignoring leadership input. It’s about evaluating it consistently. If the CEO’s idea genuinely scores high, great. If it doesn’t, you have data to push back with.
Don’t skip the tagging step. The most common shortcut is going straight from raw signals to scoring. But ungrouped signals produce misleading scores. You end up with 50 individually low-scoring items that are actually 5 high-scoring themes.
## When to use something else
The Signal Scorecard is designed for ongoing product development where you have a continuous flow of customer signal. It’s less useful for:
- Zero-to-one products where you don’t have enough signal volume to score themes. In early stages, you’re better off with direct customer discovery.
- Platform / API products where the “customer” is a developer and the signal comes through GitHub issues, SDK analytics, and integration patterns. The dimensions still work, but the sources are different.
- Emergency triage. If there’s a production incident or a critical bug, don’t score it. Fix it. The scorecard is for planning, not firefighting.
The tool doesn’t matter. What matters is that your team has an explicit, challengeable model for deciding what to work on. When the loudest voice in the room can be overruled by data, you’ve built a healthy product culture.
Want to discuss how I apply signal scoring at scale? Get in touch.
## Related thinking
- Validated themes from the scorecard feed directly into RICE/DRICE for prioritisation
- For measuring whether your response to a signal actually improved the user experience, I use a modified HEART approach
- AI Production Readiness includes a user feedback loop item that creates the signal this scorecard evaluates
## Further Reading
- Teresa Torres - Continuous Discovery Habits - the best framework for turning signal into opportunity trees
- Marty Cagan - Inspired - foundational thinking on how product teams should evaluate customer input