Building the wrong product is the most expensive way to learn what customers don't want. We help you test ideas cheaply before you invest millions in development. You'll know what works and what doesn't before your first line of code.
Product discovery is the systematic process of validating that people want what you plan to build. It's not guesswork. It's research, prototyping, testing, and iteration. We'll help you define your hypothesis, design tests, gather data, and decide: should we build this? Should we build something different? Should we not build anything at all?
Define falsifiable hypotheses about your idea.
Design experiments to test hypotheses.
Run interviews, surveys, and tests. Collect data.
Synthesize insights. Make a go/no-go decision and set the direction for the next step.
Product discovery is the discipline of validating assumptions before committing engineering resources to them. A well-run discovery sprint costs $15,000–$30,000. The development waste it prevents consistently exceeds $200,000. Despite that ROI, Reforge data from 2024 (based on 10,000+ PM participants) found that 61% of product teams have inadequate discovery before building — and 68% prioritize without a clear outcome. The problem is not that PMs don't know discovery is valuable; it is that the organizational incentives reward shipping features, not learning. What follows is a structured breakdown of the discovery methods, measurement frameworks, tool stack, and failure modes that separate teams that ship products customers use from teams that ship products customers ignore.
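The ROI claim above can be checked with simple arithmetic. The sketch below is illustrative only: it treats the $200,000 figure as a floor on waste prevented, and the function name `discovery_roi` is hypothetical.

```python
def discovery_roi(sprint_cost: float, waste_prevented: float) -> float:
    """Net return per dollar spent on discovery: (benefit - cost) / cost."""
    if sprint_cost <= 0:
        raise ValueError("sprint_cost must be positive")
    return (waste_prevented - sprint_cost) / sprint_cost


# Figures from the paragraph above; waste prevented is taken at its stated floor.
LOW_COST, HIGH_COST = 15_000, 30_000
WASTE_FLOOR = 200_000

roi_at_high_cost = discovery_roi(HIGH_COST, WASTE_FLOOR)  # about 5.67
roi_at_low_cost = discovery_roi(LOW_COST, WASTE_FLOOR)    # about 12.33

print(f"ROI at ${HIGH_COST:,}: {roi_at_high_cost:.2f}x")
print(f"ROI at ${LOW_COST:,}: {roi_at_low_cost:.2f}x")
```

Even at the top of the cost range and the bottom of the savings range, the sprint returns more than five times its cost under these assumptions.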
The choice of discovery method is determined by two variables: how much you know about the problem space, and how much time you have before a decision must be made. Using a concierge MVP to validate a problem hypothesis that hasn't been researched yet is putting the cart before the horse. Using a 6-week diary study to validate a UI decision that could be answered with a 3-day usability test is burning budget on precision you don't need. The matrix below maps five core discovery methods to the questions they answer, the resources they require, and the conditions under which each is appropriate.

DeepLearnHQ take: the most common discovery failure we see is teams using the methods they are most comfortable with rather than the methods that answer the question at hand. Teams default to JTBD interviews for everything, even when a smoke test or concierge MVP would give faster, more actionable signal.
Jobs-to-be-Done Interviews (1–2 weeks, $5K–$15K, 8–15 sessions). Semi-structured interviews using the Bob Moesta switch interview technique: reconstructing the timeline of a specific recent customer decision to uncover the causal mechanism behind it. Tells you: what job the customer was trying to accomplish, what triggered their search for a solution, what they tried before, and what made them choose or reject your type of solution. Output: documented struggling moment map, JTBD statement, four forces model (push, pull, anxiety, habit). Use when: exploring a new problem space, validating strategic assumptions about customer motivation, or understanding why customers churn.

User Interviews / Problem Interviews (1–2 weeks, $4K–$10K, 10–20 sessions). Rob Fitzpatrick's Mom Test protocol: ask about the customer's life, not your idea. "Tell me about the last time you had to deal with [problem]." Goal: confirm that the problem exists, understand the frequency and severity, identify the customer segment that suffers most acutely. Use when: pre-product at pre-seed stage, or when pivoting into a new customer segment.

Prototype Testing / Usability Testing (3–5 days, $4K–$12K, 5–8 sessions). 5–8 users attempt specific tasks with a mid-fidelity Figma prototype. Jakob Nielsen's 5-user rule: testing with 5 users in a qualitative session reveals approximately 85% of usability problems. Output: task completion rates, friction points, misunderstandings, and a prioritized list of improvements. Use when: you have a designed solution and need to identify friction before committing to development.

Smoke Test / Landing Page Test (3–7 days, $1K–$5K). Build the simplest possible representation of the product value proposition, a landing page with a CTA, and drive targeted traffic to it. Measure: click-through rate on the CTA (indicating interest), email sign-ups or waitlist registrations (indicating intent), and conversion to a paid waitlist (indicating willingness to pay). Output: a behavioral signal (what people do) rather than a stated preference (what people say). Use when: you need to validate demand before building anything.

Concierge MVP (2–6 weeks, $5K–$30K). Manually deliver the product experience without building the technology. If you're building a tax preparation product, prepare taxes manually for 10 customers. Measures: willingness to use the service, satisfaction with the outcome, and whether the economics of the full product are viable. Output: real customer feedback from real service delivery, including the edge cases that would have broken the automated version. Use when: the value proposition involves a complex service that can be manually delivered to a small number of early customers before automating.
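The 5-user rule cited under prototype testing comes from a simple model published by Nielsen and Landauer: the share of usability problems found by n users is 1 − (1 − L)^n, where L is the share a single user uncovers. The sketch below assumes L = 0.31, the average Nielsen reports across his projects. That value is not from this page, and it varies by product and task.

```python
def share_of_problems_found(n_users: int, per_user_rate: float = 0.31) -> float:
    """Share of usability problems found by n_users, per 1 - (1 - L)^n.

    per_user_rate (L) defaults to 0.31, Nielsen's reported average.
    Treat it as an assumption, not a constant.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= per_user_rate <= 1.0:
        raise ValueError("per_user_rate must be between 0 and 1")
    return 1.0 - (1.0 - per_user_rate) ** n_users


# Coverage for each session count in the 5-8 range quoted above.
for n in range(5, 9):
    print(f"{n} users: {share_of_problems_found(n):.1%}")
```

With five users the model gives roughly 84%, which is where the "approximately 85%" figure comes from. Each added session finds fewer new problems than the one before, which is why the recommended range stops at 5–8 sessions.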
| Method | Cost Range | Time Required | Primary Output | Signal Type | Best For | Key Limitation |
|---|---|---|---|---|---|---|
| JTBD Interviews | $5K–$15K | 1–2 weeks | Struggling moment map, JTBD statement | Qualitative causal | New problem space; strategic pivots; churn analysis | Cannot scale; requires trained interviewer |
| Problem Interviews (Mom Test) | $4K–$10K | 1–2 weeks | Problem validation, frequency/severity data | Qualitative behavioral | Pre-product stage; new customer segment exploration | Stated vs. actual behavior divergence |
| Prototype / Usability Testing | $4K–$12K | 3–5 days | Task completion rates, friction points | Behavioral (UI-level) | Pre-development UI validation; design iteration | Tests interface, not problem-solution fit |
| Smoke Test / Landing Page | $1K–$5K | 3–7 days | CTR, sign-up rate, paid waitlist conversion | Behavioral (demand signal) | Demand validation before building anything | Traffic quality determines signal quality |
| Concierge MVP | $5K–$30K | 2–6 weeks | Service validation, economic viability, edge cases | Behavioral (real service delivery) | Complex service products; B2B marketplace validation | Not scalable; founder/team time-intensive |
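As a rough illustration, the matrix above can be encoded as data and filtered by the budget and time you actually have. This is a minimal sketch: the `Method` class and `methods_within` function are hypothetical names, the filter conservatively uses the upper end of each range, and weeks are converted to 7-day weeks. It narrows the shortlist by resources only. The question you need answered still decides which method on the shortlist is appropriate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    max_cost_usd: int  # upper end of the cost range in the matrix
    max_days: int      # upper end of the time range, in calendar days
    signal: str


# Figures copied from the matrix above.
METHODS = [
    Method("JTBD Interviews", 15_000, 14, "Qualitative causal"),
    Method("Problem Interviews (Mom Test)", 10_000, 14, "Qualitative behavioral"),
    Method("Prototype / Usability Testing", 12_000, 5, "Behavioral (UI-level)"),
    Method("Smoke Test / Landing Page", 5_000, 7, "Behavioral (demand signal)"),
    Method("Concierge MVP", 30_000, 42, "Behavioral (real service delivery)"),
]


def methods_within(budget_usd: int, days_available: int) -> list[str]:
    """Names of methods whose worst-case cost and duration both fit."""
    return [
        m.name
        for m in METHODS
        if m.max_cost_usd <= budget_usd and m.max_days <= days_available
    ]


# Example: $12K and one week before the decision is due.
shortlist = methods_within(budget_usd=12_000, days_available=7)
print(shortlist)
```

With $12K and seven days, only prototype testing and a smoke test fit. Those two answer different questions (interface friction versus demand), so the filter is a starting point, not a recommendation.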
The "Lean Startup vs. Continuous Discovery" debate is more nuanced than it appears. Both advocate for learning before building — the difference is in cadence and integration. Lean Startup's periodic "build-measure-learn" loops are most appropriate for novel business model exploration (H3 horizons, pre-product stage). Teresa Torres's Continuous Discovery — weekly customer interviews maintained as an ongoing practice — is the right operating mode for teams iterating on existing products with existing customers. The current practitioner consensus, articulated on Lenny's Newsletter and in the Reforge curriculum, is that sprints are most valuable for exploring fundamentally new territory while continuous discovery is the right operating mode for teams maintaining and improving existing products. Most mature product organizations need both, operating at different levels of their portfolio.
Teresa Torres's Continuous Discovery Habits (2021) advocates for a minimum of one 30-minute customer interview per week, per product team. The goal is not to run formal research studies but to maintain a continuous connection to customer reality. Over 12 weeks, this generates 12 data points that collectively surface patterns no single study would find.

The central artifact is the Opportunity Solution Tree (OST): a visual representation of the desired outcome (a specific, measurable change in customer behavior), the opportunity space (customer needs, pain points, and desires that if addressed would move the metric), and the solution space (specific product ideas and experiments that address each opportunity). The OST forces a separation between problem space and solution space. Most product teams skip directly to solutions, and the OST creates a visual accountability structure that exposes when a team is building solutions without a validated opportunity.

At 6 months of effective continuous discovery, a team will have: 24+ customer interviews logged and synthesized in Dovetail, a mature OST with 3–5 validated opportunities and 8–15 solution ideas at various testing stages, and a measurable improvement in time-to-decision, from 3–4 weeks to 3–5 days per product question, per Reforge 2024 benchmark data.

DeepLearnHQ take: the biggest barrier to continuous discovery is not methodology. It is customer access. In B2B companies where Sales controls customer relationships, PMs must negotiate for research access. The solution we recommend: formalize a research partnership model where Customer Success introduces researchers to customers, and research findings are shared back to CS as a benefit to that relationship.
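The three levels of the Opportunity Solution Tree described above (outcome, opportunities, solutions) map naturally onto a small nested data structure. The sketch below is one possible representation, not a format Torres prescribes. The class names, the `validated` flag, and the example content are all hypothetical. It shows the accountability check in code: listing solutions that hang off an opportunity nobody has validated yet.

```python
from dataclasses import dataclass, field


@dataclass
class Solution:
    name: str


@dataclass
class Opportunity:
    description: str
    validated: bool = False  # set True once research supports it
    solutions: list[Solution] = field(default_factory=list)


@dataclass
class Outcome:
    metric: str  # a specific, measurable change in customer behavior
    opportunities: list[Opportunity] = field(default_factory=list)


def solutions_without_validated_opportunity(outcome: Outcome) -> list[str]:
    """Names of solutions attached to opportunities not yet validated."""
    return [
        solution.name
        for opportunity in outcome.opportunities
        if not opportunity.validated
        for solution in opportunity.solutions
    ]


# Hypothetical tree for illustration only.
tree = Outcome(
    metric="Increase weekly active report creators",
    opportunities=[
        Opportunity(
            "Controllers re-key data by hand",
            validated=True,
            solutions=[Solution("CSV import")],
        ),
        Opportunity(
            "CFOs want board-ready charts",
            validated=False,
            solutions=[Solution("Chart templates"), Solution("Slide export")],
        ),
    ],
)

flagged = solutions_without_validated_opportunity(tree)
print(flagged)  # ['Chart templates', 'Slide export']
```

Anything in `flagged` is a solution being pursued ahead of its evidence, which is the pattern the OST is meant to expose.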
| Dimension | Lean Startup (Ries) | Continuous Discovery (Torres) | Waterfall Discovery (Traditional) |
|---|---|---|---|
| Discovery mode | Periodic build-measure-learn loops | Continuous weekly research | Phased (discovery then delivery) |
| Primary artifact | Minimum Viable Product | Opportunity Solution Tree | Requirements specification document |
| Learning metric | Validated learning (pivot or persevere) | Assumption invalidation rate | Sign-off milestones |
| Integration with delivery | Sequential (discover then build) | Parallel (discovery and delivery simultaneous) | Sequential; long feedback loop |
| Best for | Novel business model exploration (H3); pre-product | Iterating on existing products (H1/H2) | Fixed-scope regulated systems; known problem/solution pairs |
| Risk profile | High learning speed; build costs money | Low risk; continuous validation | High risk; late-cycle feedback too expensive to act on |
Product-market fit is the most important milestone in a product's early life and the most poorly measured. The classic Sean Ellis threshold — 40% of users saying they would be "very disappointed" if the product went away — is a useful benchmark but not a sufficient definition. True PMF is behavioral, not attitudinal: retention curves that flatten rather than decline to zero, organic word-of-mouth growth without paid acquisition, and customer expansion within accounts (for B2B). Before PMF, the only honest metrics are engagement rate with the core value action, retention at 30 and 90 days, and the percentage of users who can articulate the specific job the product is hired to do. After PMF, standard growth metrics (acquisition, activation, retention, referral, revenue) become meaningful. Before it, they produce confident-looking data that says nothing useful about whether the product will sustain itself. The consequence of misidentifying PMF: teams scale acquisition and infrastructure before the product retains customers, which is the most expensive possible way to discover that the retention problem still exists.
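Both signals described above, the Sean Ellis survey share and a retention curve that flattens, are easy to compute once the data exists. The sketch below is a simplified illustration with made-up numbers. The function names are hypothetical, and the 2-point `tolerance` used to decide that a curve has "flattened" is an assumption chosen for the example, not an industry standard.

```python
def sean_ellis_score(responses: list[str]) -> float:
    """Share of respondents answering 'very disappointed'."""
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(r == "very disappointed" for r in responses) / len(responses)


def retention_flattens(curve: list[float], tolerance: float = 0.02) -> bool:
    """True if retention ends above zero and the final period's drop
    is no larger than `tolerance` (an assumed cutoff)."""
    if len(curve) < 2:
        raise ValueError("curve needs at least two periods")
    return curve[-1] > 0 and (curve[-2] - curve[-1]) <= tolerance


# Illustrative survey: 9 of 20 respondents would be very disappointed.
survey = (
    ["very disappointed"] * 9
    + ["somewhat disappointed"] * 8
    + ["not disappointed"] * 3
)
score = sean_ellis_score(survey)
print(f"Sean Ellis score: {score:.0%} (benchmark: 40%)")

# Illustrative cohort retention by period, as fractions of the cohort.
flattening = [1.0, 0.55, 0.42, 0.38, 0.37]
declining = [1.0, 0.50, 0.30, 0.15, 0.05]
print(retention_flattens(flattening), retention_flattens(declining))
```

A product can clear the 40% survey benchmark and still fail the retention check, which is the gap between attitudinal and behavioral evidence that the paragraph above warns about.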
Every feature, experiment, and product bet rests on a stack of assumptions that must be true for the bet to succeed. A common discovery failure: teams identify their desired outcome but never map the assumptions underneath it. An Opportunity Solution Tree makes this visible. Before any solution is built, the team should document four assumptions:

Desirability assumption: customers want this, in the form the product delivers it.

Usability assumption: customers can use the product without friction that causes abandonment.

Feasibility assumption: the team can build this within reasonable time and resource constraints.

Viability assumption: the economics of the feature or product work at the scale required.

The RICE scoring framework (Reach × Impact × Confidence / Effort) can structure assumption prioritization, but its primary value is making assumptions explicit and comparable, not producing objectively correct prioritization. Teresa Torres's consistent critique of RICE/ICE: these are output-oriented tools that reinforce feature factory thinking. Her alternative: use the OST to prioritize outcomes and opportunities first, before generating solutions, so that prioritization happens at the right level of abstraction.

DeepLearnHQ take: the most valuable discovery sessions we have run are those where the team maps the assumptions behind their top three roadmap items and then identifies which single assumption, if wrong, would kill the entire initiative. That assumption is always the first thing to test.
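The RICE formula mentioned above (Reach × Impact × Confidence / Effort) takes only a few lines to compute. In the sketch below the item names, numbers, and units are invented for illustration: reach as users per quarter, impact on a small ordinal scale, confidence as a fraction, and effort in person-months. Teams define these scales differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bet:
    name: str
    reach: float       # e.g. users affected per quarter (assumed unit)
    impact: float      # ordinal scale chosen by the team
    confidence: float  # 0.0-1.0, how sure the team is of the estimates
    effort: float      # e.g. person-months (assumed unit)

    def rice(self) -> float:
        """Reach x Impact x Confidence / Effort."""
        if self.effort <= 0:
            raise ValueError("effort must be positive")
        return self.reach * self.impact * self.confidence / self.effort


# Hypothetical roadmap items.
bets = [
    Bet("Bulk import", reach=2000, impact=2, confidence=0.8, effort=4),
    Bet("Audit trail", reach=500, impact=3, confidence=0.5, effort=2),
    Bet("Onboarding tour", reach=5000, impact=1, confidence=0.5, effort=5),
]

ranked = sorted(bets, key=lambda b: b.rice(), reverse=True)
for bet in ranked:
    print(f"{bet.name}: {bet.rice():.0f}")
```

The ranking is only as good as the inputs. The confidence term is where discovery pays off: testing an assumption changes that number with evidence instead of opinion.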
Discovery-to-delivery ratio benchmark: Marty Cagan's recommended ratio for healthy product teams is approximately 20–30% of PM and designer capacity dedicated to discovery work, with the remainder on delivery. Reforge 2024 data found that 78% of PMs spend less than 20% of their time on discovery, despite being aware that discovery should occupy 30–50% of their time. The gap is structural, not individual: organizations reward shipping features, not learning.

The most common discovery failure modes, based on Doblin/Deloitte research and Reforge curriculum analysis:

The "insights with no authority" problem. Research surfaces important findings but the Decider lacks authority or willingness to act on them. Design thinking fails when it operates as a research function disconnected from decision-making power.

The "wrong problem" trap. Discovery sprints can be beautifully executed on the wrong problem, one that is technically interesting but not commercially important. Sprint goal-setting must include explicit business case validation.

The "prototype as deliverable" failure. Teams treat the sprint prototype as a product artifact rather than a learning tool, skipping the testing step or using internal stakeholders as test participants instead of real customers.

The "one-and-done" failure. A discovery sprint generates insights that are never acted on because there is no integration with the product development process. Discovery must always connect to a defined next step in the development pipeline.

The "build trap" (Melissa Perri, 2018). A product organization that measures success by features shipped, not by outcomes produced. Symptoms: a roadmap filled with feature requests from Sales and executives; no time allocated to discovery; success defined as on-time, on-scope delivery. The escape requires changing the metric from features shipped to outcomes achieved, a change that requires executive alignment and typically takes 12–18 months to fully operationalize.
The original idea was a financial forecasting tool. CFOs wanted it, but controllers wouldn't use it. The product pivoted to controller-first and reached 30% sign-up conversion at launch.
Three use cases were validated. One showed 70% willingness to pay, so it was built first, giving a faster path to revenue.
Usually $10K–$30K depending on scope. Much cheaper than building the wrong product.
6–8 weeks for a solid discovery phase. You can get preliminary signals in 2–3 weeks.
We can validate it. We'll help you understand if it's solving the right problem and positioning it correctly.
That's the best outcome. You saved millions. We'll help you understand what to build instead.
Tell us about your problem. We'll give you an honest read on scope, approach, and whether we're the right team.