A Practical Prompt for Product Hypotheses and Experiments in ChatGPT 5.2
Dec 12, 2025
Large language models are increasingly being used by product managers to assist with thinking, not just writing. In practice, this often fails in subtle ways. The output looks structured and confident, but it does not meaningfully improve decision quality.
The prompt shared in this article is an attempt to address that gap.
It is not presented as a universal framework or a best practice. It is simply a prompt structure that has been working for me when I want an LLM to help reason about product hypotheses, assumptions, and experiments without jumping straight to solutions.
This article breaks the prompt down section by section, explains what each part is trying to enforce, and calls out where the approach works and where it breaks. The goal is to make it easy for you to decide whether this is useful in your own work.
What This Prompt Is Actually Optimizing For
This prompt is not designed to make the model creative or insightful on its own. It is designed to constrain behavior.
Most failures when using LLMs in product work come from the same pattern. The model fills in gaps too eagerly. It invents benchmarks, smooths over uncertainty, and proposes experiments without a clear decision behind them.
This prompt tries to reverse that default behavior by enforcing three things:
Decisions come before experiments
Assumptions are explicit and ranked
Actions are pre-committed before results are known
If that feels heavy, it is meant to be. The prompt optimizes for decision quality over speed.
Role Definition: Defining Responsibility, Not Pretending Expertise
The prompt begins by defining a role.
This is not about role play or elevating the model’s authority. The purpose is to define responsibility.
The model is being instructed that its output is only useful if it helps clarify hypotheses and decisions. It is not being asked to invent strategy, vision, or product direction.
The limitation is straightforward. Assigning a role does not create judgment. If you do not already understand what good hypotheses and experiments look like, this section will not compensate for that gap.
Global Behavior: The Core of the Prompt
Most of the prompt’s value lives in the global behavior rules.
Several constraints here matter more than they initially appear.
“Use only what the user provides” significantly reduces hallucinated benchmarks and false certainty. The tradeoff is that output can feel slower or less polished. In practice, that tradeoff is often worth it.
“Compare options instead of whether or not” pushes the model toward tradeoff thinking rather than binary conclusions. Many real product decisions are not yes or no decisions, even if they are framed that way.
There is also a real risk here. A model can follow all these rules and still be wrong. Discipline improves clarity, not correctness.
Output Discipline and Explicit Uncertainty
The prompt then constrains verbosity and forces uncertainty to be labeled.
And:
This directly targets one of the most dangerous LLM behaviors in product work. Confident nonsense.
By forcing explicit labels on uncertainty, the model becomes better at showing where reasoning ends and guessing would begin. The downside is reduced narrative flow, which is acceptable for decision support but not always ideal for exploration.
The Kickoff: A Forced Pause Before Execution
The workflow mandates a kickoff before any analysis.
This section is intentionally blocking.
If you cannot answer these questions, you are not ready to design an experiment. The model cannot resolve that. All it can do is surface the gap.
The most common failure mode here is user behavior. People will guess answers just to move forward. When that happens, the rest of the output may look rigorous while resting on weak foundations.
Assumption Mapping: Useful Structure With Real Risks
The assumption mapping section formalizes prioritization.
This encourages explicit tradeoffs and prevents testing everything at once.
The risk is false precision. Impact and uncertainty scoring feels analytical, but it remains judgment. Treating it as math rather than prioritization can create misplaced confidence.
Another limitation is contextual blindness. The model cannot know which assumptions are politically sensitive, irreversible, or constrained by organizational realities unless you encode that information explicitly.
Hypothesis Blocks: The Strongest Part of the Prompt
The hypothesis structure enforces several disciplines many teams skip.
Pre-committing actions is where this prompt is most effective. The model is good at flagging missing actions and internal inconsistencies.
The limitation is relevance. The model can validate structure, but it cannot tell whether the chosen metric actually reflects the decision that matters most.
Experiment Design: Structured Pressure, Not Authority
The experiment design section emphasizes cheap tests and risk scanning.
This works best as a checklist. It does not turn the model into an experiment designer.
The model does not know your users, traffic quality, legal environment, or internal constraints unless you tell it. Treat this section as structured pressure rather than expertise.
How to Use This Prompt Without Misusing It
This prompt works best as a thinking scaffold, not a one-shot instruction.
It assumes you already have a decision in mind and are trying to reduce uncertainty around it. It is not designed for open-ended ideation or early discovery.
The confirmation and replacement steps exist to encourage iteration. The value comes from correcting the model and refining assumptions over multiple passes.
If you treat it as a single execution step, you will get something that looks complete but is unlikely to be reliable.
Known Limitations and Failure Modes
It is important to be explicit about what this prompt cannot do.
It does not supply product judgment. It does not resolve strategic ambiguity. It cannot account for organizational politics or constraints unless they are explicitly stated.
It can also create a false sense of rigor. Clean hypotheses, thresholds, and experiment plans can mask weak framing if the underlying decision is poorly defined.
This risk is higher for less experienced PMs, who may mistake procedural completeness for correctness.
Full Prompt
Below is the complete prompt in one place, unchanged, so it can be copied and used directly.

