Why Great ML Products Start With Better Questions: A Product Manager’s Guide to Machine Learning (Part II)
The core idea
A machine learning model is a mathematical approximation of a relationship in the world:
y = f(X) + ε
y is the target you want to predict or decide.
X is the set of features you use as inputs.
ε is irreducible noise: randomness, missing context, measurement error, factors you can't observe.
Two things follow from this. The goal is not perfection. The goal is the best usable approximation under real constraints. And some error is permanent: you can't eliminate noise that is inherent to the problem.
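The noise floor is easy to see in a simulation. The sketch below (all numbers invented for illustration) generates data from a known relationship plus noise, then scores the true relationship itself: even a perfect model cannot beat the noise variance.

```python
# Sketch: even the "true" f leaves irreducible error behind.
# The relationship and noise level here are made up for illustration.
import random

random.seed(0)

def true_f(x):
    return 2.0 * x + 1.0  # the real relationship a model tries to approximate

# Generate observations y = f(x) + epsilon
data = []
for _ in range(10_000):
    x = random.uniform(0, 10)
    eps = random.gauss(0, 1.0)  # irreducible noise, standard deviation 1
    data.append((x, true_f(x) + eps))

# Score the true f itself -- the best any model could possibly do:
mse = sum((y - true_f(x)) ** 2 for x, y in data) / len(data)
print(round(mse, 2))  # hovers around 1.0, the noise variance
```

No amount of modeling effort pushes the error below that floor; it is a property of the problem, not the algorithm.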
The modeling process exists to find an approximation that is good enough to ship, safe enough to trust, and cheap enough to run. Whether a model creates durable value or fails quietly in production usually comes down to process, not algorithm choice.
The lifecycle: a loop, not a straight line
Model development has six phases. It's not a waterfall. It's normal to loop back to earlier phases after learning something during evaluation or deployment.
Phase 1: Business understanding. This is where most ML projects fail, often politely. "Predict churn" is not a target. It's a wish. A target becomes real only when it's measurable: what is the exact definition of churn, over what time window, for which segment, and what action will the product take based on the prediction?
A strong framing includes the decision and action (what changes in the product if the model is right), the constraints (latency, cost per prediction, privacy, regulatory requirements), success metrics at both the model level and the product level, and a baseline: what happens today without ML, or with a simple heuristic. If you can't articulate the baseline, you can't quantify value.
Phase 2: Data understanding. Map what exists against what the problem needs. What data sources do you have, how complete are they, where are the blind spots, how stable are definitions over time, and what biases are baked into how the data was collected? This is where many teams discover that their "labels" are proxies, and proxies come with compromises.
Phase 3: Data preparation. This is where most real effort lands. It includes cleaning duplicates and missing values, creating consistent identifiers across systems, joining events and users and sessions, building features that are available at prediction time, and setting up reproducible pipelines so training datasets can be rebuilt. A simple litmus test: if you can't reproduce a training dataset tomorrow from raw sources, you have a demo, not a product-grade pipeline.
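One way to make the litmus test concrete: the prep step should be a pure, deterministic function of the raw sources. A minimal sketch, with hypothetical field names:

```python
# Sketch of a reproducible prep step: the same raw events always yield
# the same training rows. Field names are hypothetical.
def build_training_set(raw_events):
    # 1. Deduplicate by event id, keeping the first occurrence
    seen, events = set(), []
    for e in raw_events:
        if e["event_id"] not in seen:
            seen.add(e["event_id"])
            events.append(e)
    # 2. Fill missing values with an explicit, documented default
    for e in events:
        e.setdefault("country", "unknown")
    # 3. Sort deterministically so output never depends on source order
    return sorted(events, key=lambda e: e["event_id"])

raw = [
    {"event_id": 2, "user": "a"},
    {"event_id": 1, "user": "b", "country": "DE"},
    {"event_id": 2, "user": "a"},  # duplicate row from an upstream retry
]
# Rebuilding from the same raw sources gives an identical dataset
assert build_training_set(list(map(dict, raw))) == build_training_set(list(map(dict, raw)))
```

If any step depends on wall-clock time, mutable lookup tables, or unordered source reads, tomorrow's rebuild will silently differ from today's.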
Phase 4: Modeling. Only now are you ready to train. The typical sequence: establish a baseline model, train candidate model families, tune hyperparameters, compare performance on validation data, and iterate based on failure analysis (what does it get wrong, and why). This is also where you decide how much complexity you can afford, technically and operationally.
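The "establish a baseline first" step can be as small as this sketch (labels and the candidate rule are invented): a majority-class guesser sets the bar any real model must clearly beat.

```python
# Sketch: always compare a candidate model against a dumb baseline.
# Labels and the one-feature rule below are invented for illustration.
labels   = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]                      # 30% positives
features = [0.1, 0.2, 0.1, 0.9, 0.3, 0.8, 0.2, 0.1, 0.7, 0.2]  # one signal

# Baseline: predict the majority class for everyone
majority = max(set(labels), key=labels.count)
baseline_acc = sum(majority == y for y in labels) / len(labels)

# Candidate "model": a single-threshold rule on the feature
model_acc = sum((x > 0.5) == y for x, y in zip(features, labels)) / len(labels)

print(baseline_acc, model_acc)  # 0.7 vs 1.0 on this toy data
```

If a complex model only matches the majority-class baseline, the extra cost and latency buy nothing.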
Phase 5: Evaluation. Evaluation is where you decide whether the model is good enough for the intended use. That includes offline metrics on unseen data, error analysis by segment (who is it worse for?), robustness checks (what happens when inputs drift?), risk checks (does it create harmful failure modes?), and calibration and threshold setting (when do you actually take action?).
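Error analysis by segment is mostly bookkeeping, but it is the step teams skip. A minimal sketch with invented outcomes, showing how a healthy overall number can hide a badly served segment:

```python
# Sketch: segment-level error analysis. Outcomes are invented.
from collections import defaultdict

# (segment, prediction was correct?) for each scored example
results = [
    ("mobile", True), ("mobile", True), ("mobile", True), ("mobile", True),
    ("desktop", True), ("desktop", False), ("desktop", False), ("desktop", True),
]

by_segment = defaultdict(list)
for segment, correct in results:
    by_segment[segment].append(correct)

overall = sum(c for _, c in results) / len(results)
print("overall:", overall)  # 0.75 looks fine...
for segment, outcomes in by_segment.items():
    print(segment, sum(outcomes) / len(outcomes))  # ...mobile 1.0, desktop 0.5
```

The overall metric averages away exactly the information a product decision needs: who is this model worse for, and is that acceptable?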
The standard split: training set to fit model parameters, validation set to tune and compare versions, test set to evaluate once at the end as a clean estimate of real-world performance. The biggest pitfall is data leakage: if information that wouldn't exist at prediction time slips into training or selection, results will look great and collapse later. If performance feels suspiciously high, assume leakage until proven otherwise.
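Mechanically, the split is simple; the discipline is in never peeking at the test slice. A sketch with typical (not mandatory) proportions:

```python
# Sketch of the standard split. 70/15/15 is a common choice, not a rule.
import random

rows = list(range(1000))   # stand-in for your labeled examples
random.seed(42)
random.shuffle(rows)       # shuffle once, before any modeling work

train = rows[:700]         # 70%: fit model parameters here
val   = rows[700:850]      # 15%: tune and compare versions here
test  = rows[850:]         # 15%: touch exactly once, at the very end

# The three sets must not overlap -- overlap is the simplest form of leakage
assert not (set(train) & set(val) or set(train) & set(test) or set(val) & set(test))
```

For time-dependent problems, split by time rather than shuffling, so the model is always evaluated on data from after its training window.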
Phase 6: Deployment. Deployment is not "push to prod." It's the beginning of running an ML system. A production deployment needs inference infrastructure (batch or real-time), monitoring for data drift and prediction drift, feedback loops for outcomes and labels, alerting and rollback plans, and a retraining strategy with clear ownership. The world changes. If you don't plan for drift, the model will decay quietly while dashboards keep looking fine.
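Drift monitoring does not have to start sophisticated. A minimal sketch (values and the alert threshold are illustrative): compare a live feature's distribution against a training-time reference and alert when the gap is large.

```python
# Sketch: the simplest drift check -- compare a live feature's mean against
# its training-time reference. All values and thresholds are illustrative.
def mean_shift(reference, live):
    """How many reference standard deviations the live mean has moved."""
    n = len(reference)
    ref_mean = sum(reference) / n
    ref_std = (sum((v - ref_mean) ** 2 for v in reference) / n) ** 0.5
    live_mean = sum(live) / len(live)
    return abs(live_mean - ref_mean) / ref_std

reference = [10, 12, 11, 13, 12, 11, 10, 12]  # feature values at training time
live      = [18, 19, 17, 20, 18, 19, 18, 17]  # feature values in production

if mean_shift(reference, live) > 3:           # alert threshold: 3 sigma
    print("data drift detected: investigate and consider retraining")
```

Real systems track full distributions (population stability index, KL divergence) per feature and per prediction, but even this check catches the quiet decay the section warns about.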
Features: the highest-leverage step
Features are the measurable signals you feed into the model. Good features are available at prediction time, stable enough to generalize, and have a plausible connection to the target. Feature work is often the largest driver of performance and the easiest place to make costly mistakes.
Four practical ways to find strong features: talk to domain experts who know what changes before the target changes; visualize candidate features against the target and look for structure; use correlations and statistical tests to rank candidates early; train a baseline model and inspect which signals matter, then iterate.
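The third tactic, ranking candidates by correlation, takes a few lines. A sketch with invented data and hypothetical feature names:

```python
# Sketch: rank candidate features by absolute correlation with the target.
# Data and feature names are invented for illustration.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

target = [0, 1, 0, 1, 1, 0, 1, 0]             # e.g. churned yes/no
candidates = {
    "sessions_last_week": [1, 5, 2, 6, 7, 1, 5, 2],  # plausibly related
    "signup_day_of_week": [3, 1, 5, 2, 4, 6, 0, 3],  # weaker, likely spurious
}

ranked = sorted(
    candidates,
    key=lambda name: abs(pearson(candidates[name], target)),
    reverse=True,
)
print(ranked)  # strongest candidate first
```

Correlation only catches linear, pairwise relationships, which is why the fourth tactic (a baseline model plus inspection) belongs in the loop too.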
More features are not always better. Too few increase bias and can make the task unsolvable. Too many careless features increase variance, add noise, and raise leakage risk. A practical approach: start with a broad but defensible set, then prune aggressively once you understand what is real signal and what is noise.
The bias-variance trade-off
Model error is a blend of three things:
Bias: error from being too simplistic. The model is consistently wrong in the same direction.
Variance: error from being too sensitive to noise in the training data. Great on training data, disappointing in the real world.
Irreducible error: noise you can't eliminate regardless of model complexity.
High bias is underfitting. High variance is overfitting. The goal is not to maximize complexity. It's to find the point where performance generalizes and operational costs stay manageable.
Why "better model" can mean "worse product"
A model with higher offline accuracy can still be a poor product decision if it costs significantly more to run, is too slow for your UX, can't be explained when users or regulators ask why, increases support burden, or introduces unacceptable failure modes.
The "best" model is the one that creates product value reliably within cost, latency, and trust constraints. Those constraints are product requirements, not engineering footnotes.
Before you move to the next phase: questions to be able to answer
Before Phase 1 → 2: Can you define the target precisely and measurably? Do you know what action the product will take based on the prediction? Have you defined what a baseline looks like?
Before Phase 2 → 3: Do you understand where your labels come from and what compromises they carry? Have you identified the blind spots in your data?
Before Phase 3 → 4: Can you reproduce your training dataset from raw sources? Are your features actually available at prediction time?
Before Phase 4 → 5: Have you established a baseline to compare against? Are you evaluating on held-out data you haven't touched during feature engineering?
Before Phase 5 → 6: Have you run error analysis by segment? Have you checked for leakage? Do you have a threshold decision and a rollback plan?
Before Phase 6 → ongoing: Is monitoring in place for data drift and prediction drift? Is there an owner for retraining?
Takeaways
Modeling success is primarily process, not algorithm choice.
A model is an approximation of reality. Irreducible noise is always present.
Feature work is often the highest-leverage step and the easiest place to make costly mistakes.
Keep training, validation, and test sets strictly separated. Guard against leakage.
Deployment turns a model into a system. Monitoring, drift, retraining, and ownership determine long-term value.
The best model is not the most accurate one. It's the one that creates product value reliably within real constraints.
