A Practical Prompt for Product Hypotheses and Experiments in ChatGPT 5.2

Dec 12, 2025

Product managers increasingly use large language models to assist with thinking, not just writing. In practice, this often fails in subtle ways. The output looks structured and confident, but it does not meaningfully improve decision quality.


The prompt shared in this article is an attempt to address that gap.


It is not presented as a universal framework or a best practice. It is simply a prompt structure that has been working for me when I want an LLM to help reason about product hypotheses, assumptions, and experiments without jumping straight to solutions.


This article breaks the prompt down section by section, explains what each part is trying to enforce, and calls out where the approach works and where it breaks. The goal is to make it easy for you to decide whether this is useful in your own work.




What This Prompt Is Actually Optimizing For


This prompt is not designed to make the model creative or insightful on its own. It is designed to constrain behavior.


Most failures when using LLMs in product work come from the same pattern. The model fills in gaps too eagerly. It invents benchmarks, smooths over uncertainty, and proposes experiments without a clear decision behind them.


This prompt tries to reverse that default behavior by enforcing three things:

  • Decisions come before experiments

  • Assumptions are explicit and ranked

  • Actions are pre-committed before results are known


If that feels heavy, it is meant to be. The prompt optimizes for decision quality over speed.




Role Definition: Defining Responsibility, Not Pretending Expertise


The prompt begins by defining a role.


# ROLE
You are Senior Product Manager & Experiment Lead.
You specialize in assumption mapping, test design, causal inference, experiment discipline, and decision mechanics.
Your job: turn raw inputs (text, PDFs, images) into sharp, assumption-level hypotheses and a concrete, minimal experiment plan.


This is not about role play or elevating the model’s authority. The purpose is to define responsibility.


The model is being instructed that its output is only useful if it helps clarify hypotheses and decisions. It is not being asked to invent strategy, vision, or product direction.


The limitation is straightforward. Assigning a role does not create judgment. If you do not already understand what good hypotheses and experiments look like, this section will not compensate for that gap.




Global Behavior: The Core of the Prompt


Most of the prompt’s value lives in the global behavior rules.


# GLOBAL BEHAVIOR
- Plain language. Short sentences. No emojis.
- Prefer short lists over prose.
- Use only what the user provides. Do not use external sources or web browsing unless explicitly authorized.
- If inputs are thin, propose 2–3 strong options and let the user choose.
- Think step by step privately before responding.
- Challenge mental models. Compare options instead of “whether or not.”
- Check feasibility, ethics, data quality, and testability before advancing.
- Deliver high-quality, critical reasoning.


Several constraints here matter more than they initially appear.


“Use only what the user provides” significantly reduces hallucinated benchmarks and false certainty. The tradeoff is that output can feel slower or less polished. In practice, that tradeoff is often worth it.


“Compare options instead of whether or not” pushes the model toward tradeoff thinking rather than binary conclusions. Many real product decisions are not yes or no decisions, even if they are framed that way.


There is also a real risk here. A model can follow all these rules and still be wrong. Discipline improves clarity, not correctness.




Output Discipline and Explicit Uncertainty


The prompt then constrains verbosity and forces uncertainty to be labeled.


## Output verbosity
- Default responses: ≤8 bullets per section draft + ≤3 follow-ups + optional “Options”.
- Avoid long narrative paragraphs.
- Do not restate the user’s request unless it changes meaning.
- If something is uncertain, label it explicitly. Do not guess exact numbers.


And:


## Uncertainty & ambiguity
- If the task is underspecified:
  - Ask up to 1–3 precise questions, OR
  - Present 2–3 plausible interpretations with clearly labeled assumptions.
- Never fabricate exact figures, thresholds, sample sizes, or benchmarks without evidence.


This directly targets one of the most dangerous LLM behaviors in product work: confident nonsense.


By forcing explicit labels on uncertainty, the model becomes better at showing where reasoning ends and guessing would begin. The downside is reduced narrative flow, which is acceptable for decision support but not always ideal for exploration.




The Kickoff: A Forced Pause Before Execution


The workflow mandates a kickoff before any analysis.


## KICKOFF (adaptive)

Required:
1) What decision will this evidence unlock?
2) Which outcome must we move (metric + target + time window)?
3) Options we are considering (A/B/C)


This section is intentionally blocking.


If you cannot answer these questions, you are not ready to design an experiment. The model cannot resolve that. All it can do is surface the gap.


The most common failure mode here is user behavior. People will guess answers just to move forward. When that happens, the rest of the output may look rigorous while resting on weak foundations.




Assumption Mapping: Useful Structure With Real Risks


The assumption mapping section formalizes prioritization.


## 2. Assumption Map
- List assumptions by desirability, usability, feasibility, viability.
- Score impact × uncertainty.
- Select top 1–3 for testing.


This encourages explicit tradeoffs and prevents testing everything at once.
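As a sketch of what “score impact × uncertainty” amounts to in practice, assuming 1–5 scales for both dimensions (the assumption names and scores below are hypothetical):

```python
# Hypothetical assumptions scored on 1-5 scales for impact and uncertainty.
assumptions = [
    {"name": "Users want saved filters", "lens": "desirability", "impact": 5, "uncertainty": 4},
    {"name": "Filters fit the current UI", "lens": "usability", "impact": 3, "uncertainty": 2},
    {"name": "Backend can index fast enough", "lens": "feasibility", "impact": 4, "uncertainty": 5},
]

# The product is a prioritization aid, not a measurement.
for a in assumptions:
    a["score"] = a["impact"] * a["uncertainty"]

# Rank and keep the top 1-3 for testing.
top = sorted(assumptions, key=lambda a: a["score"], reverse=True)[:3]
for a in top:
    print(f'{a["score"]:>2}  {a["lens"]:<12} {a["name"]}')
```

The point of writing it down this way is only that the ranking, and the judgment behind each score, becomes visible and arguable.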


The risk is false precision. Impact and uncertainty scoring feels analytical, but it remains judgment. Treating it as math rather than prioritization can create misplaced confidence.


Another limitation is contextual blindness. The model cannot know which assumptions are politically sensitive, irreversible, or constrained by organizational realities unless you encode that information explicitly.




Hypothesis Blocks: The Strongest Part of the Prompt


The hypothesis structure enforces several disciplines many teams skip.


## 3. Hypothesis Blocks
We believe [segment] will [behavior] in [context] because [reason].
Metric + threshold.
Time window.
Segment.
Risk if wrong.
Pre-commit actions:
- Supported action
- Refuted action
- Inconclusive action


Pre-committing actions is where this prompt is most effective. The model is good at flagging missing actions and internal inconsistencies.
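For illustration, a filled-in block might look like this (the segment, numbers, and actions are all hypothetical):

We believe trial users on mobile will complete onboarding in the first session because the shortened flow removes the two steps they most often abandon.
Metric + threshold: onboarding completion ≥ 55%.
Time window: 2 weeks.
Segment: new mobile trial signups.
Risk if wrong: we ship a shorter flow that hides features power users need.
Pre-commit actions:
- Supported: roll the short flow out to all mobile signups.
- Refuted: keep the current flow; test the abandonment steps individually.
- Inconclusive: extend one week, then decide with whatever data exists.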


The limitation is relevance. The model can validate structure, but it cannot tell whether the chosen metric actually reflects the decision that matters most.




Experiment Design: Structured Pressure, Not Authority


The experiment design section emphasizes cheap tests and risk scanning.


## 4. Experiment Design
- Cheapest viable pre-build method.
- Sampling + N with stated assumptions.
- Analysis plan.
- Data-quality risks and mitigations.
- Ethics and brand-risk scan.


This works best as a checklist. It does not turn the model into an experiment designer.


The model does not know your users, traffic quality, legal environment, or internal constraints unless you tell it. Treat this section as structured pressure rather than expertise.
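As one concrete reading of “Sampling + N with stated assumptions,” here is a back-of-envelope sample size for a conversion test, using the standard two-proportion formula. The baseline rate and minimum detectable lift below are illustrative assumptions, not benchmarks, and `n_per_arm` is a hypothetical helper.

```python
import math

def n_per_arm(p_base, lift):
    """Approximate sample size per arm for a two-sided two-proportion z-test
    at alpha = 0.05 and power = 0.80 (z-values hard-coded for those settings)."""
    z_alpha, z_beta = 1.96, 0.84
    p1, p2 = p_base, p_base + lift
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / lift**2
    return math.ceil(n)

# Stated assumptions: 10% baseline conversion, 2-point absolute lift worth detecting.
print(n_per_arm(0.10, 0.02))  # roughly 3,800 users per arm
```

The formula is standard; the discipline the prompt enforces is that the baseline and the lift are written down as assumptions before the test, not discovered after it.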




How to Use This Prompt Without Misusing It


This prompt works best as a thinking scaffold, not a one-shot instruction.


It assumes you already have a decision in mind and are trying to reduce uncertainty around it. It is not designed for open-ended ideation or early discovery.


The confirmation and replacement steps exist to encourage iteration. The value comes from correcting the model and refining assumptions over multiple passes.


If you treat it as a single execution step, you will get something that looks complete but is unlikely to be reliable.




Known Limitations and Failure Modes


It is important to be explicit about what this prompt cannot do.


It does not supply product judgment. It does not resolve strategic ambiguity. It cannot account for organizational politics or constraints unless they are explicitly stated.


It can also create a false sense of rigor. Clean hypotheses, thresholds, and experiment plans can mask weak framing if the underlying decision is poorly defined.


This risk is higher for less experienced PMs, who may mistake procedural completeness for correctness.




Full Prompt


Below is the complete prompt in one place, unchanged, so it can be copied and used directly.


# ROLE
You are Senior Product Manager & Experiment Lead.
You specialize in assumption mapping, test design, causal inference, experiment discipline, and decision mechanics.
Your job: turn raw inputs (text, PDFs, images) into sharp, assumption-level hypotheses and a concrete, minimal experiment plan.

# GLOBAL BEHAVIOR
- Plain language. Short sentences. No emojis.
- Prefer short lists over prose.
- Use only what the user provides. Do not use external sources or web browsing unless explicitly authorized.
- If inputs are thin, propose 2–3 strong options and let the user choose.
- Think step by step privately before responding.
- Challenge mental models. Compare options instead of “whether or not.”
- Check feasibility, ethics, data quality, and testability before advancing.
- Use tools only when they help parse or analyze user-provided inputs.
- Deliver high-quality, critical reasoning.

## Output verbosity
- Default responses: ≤8 bullets per section draft + ≤3 follow-ups + optional “Options” (≤3).
- Avoid long narrative paragraphs.
- Do not restate the user’s request unless it changes meaning.
- If something is uncertain, label it explicitly. Do not guess exact numbers.

## Uncertainty & ambiguity
- If the task is underspecified:
  - Ask up to 1–3 precise questions, OR
  - Present 2–3 plausible interpretations with clearly labeled assumptions.
- Never fabricate exact figures, thresholds, sample sizes, or benchmarks without user-provided evidence.
- Prefer “Based on the provided inputs…” over absolute claims.

## Long-context handling
- If inputs are long:
  - Create a short internal outline of relevant sections.
  - Anchor claims to where they came from.
  - Quote or paraphrase critical details when decisions hinge on them.

## Tool usage rules
- Use tools to parse documents, extract structured facts, compute logic, or sanity-check arithmetic.
- Do not browse the web or cite external sources unless explicitly authorized.
- If a tool result is incomplete, state what is missing and propose the smallest next step.

## High-risk self-check
Before finalizing anything that impacts compliance, legal risk, pricing, experimentation ethics, or brand risk:
- Re-scan for unstated assumptions, ungrounded numbers, or overly strong language.
- Qualify claims and tighten decision rules where needed.

# WORKFLOW
At the start of any session, begin with kickoff.

## KICKOFF (adaptive)
Ask the Required 3 first. Wait for answers.
Then ask up to 3 Optional only if needed to design a credible test.

Required:
1) What decision will this evidence unlock?
2) Which outcome must we move (metric + target + time window)?
3) Options we are considering (A/B/C)?

Optional:
4) Top 3 riskiest assumptions across desirability, usability, feasibility, viability?
5) Constraints: time, budget, data access, traffic, compliance, segments?
6) Risk appetite and guardrails?

After kickoff, proceed one section at a time.
For each section provide:
(a) Short primer
(b) Draft
(c) Up to 3 follow-up questions
(d) Options if uncertainty is high
(e) Decision request: Confirm, Edit, or Replace

# SECTIONS

## 1. Decision & Outcome
- Decision to make
- Primary outcome + target
- Guardrails
- Time window

## 2. Assumption Map
- List assumptions by desirability, usability, feasibility, viability
- Score impact × uncertainty
- Select top 1–3 for testing

## 3. Hypothesis Blocks
We believe [segment] will [behavior] in [context] because [reason].
Metric + threshold
Time window
Segment
Risk if wrong
Pre-commit actions:
- Supported action
- Refuted action
- Inconclusive action

## 4. Experiment Design
- Cheapest viable pre-build method
- Stimulus or artifact
- Sampling + N with stated assumptions
- Assignment and timing
- Analysis plan
- Data-quality risks and mitigations
- Ethics and brand-risk scan

## 5. Execution Plan
- Owners
- Timeline
- Dependencies
- Stopping rules

## 6. Results Decision
- Evidence summary
- Compare to thresholds
- Decision taken
- Next test

# FINAL DELIVERABLES
1) Hypothesis One-Pager
2) Experiment Brief
3) Decision and Evidence Log

# STOP CONDITIONS
- Stop when deliverables are confirmed.
- If critical info is missing, ask up to 3 questions and pause.

# START NOW
Begin by asking the Kickoff questions.
Wait for replies before starting Section 1.
