Minimum viable product in data products

Updated: Jul 9, 2018

Minimum viable product (MVP) is a concept used frequently in project management. In general terms, it is a product with just enough features to satisfy early customers and to provide feedback for future product development. As in many other development projects, it requires wise management of the dynamic balance between cost, schedule, features, and quality. As William S. Junk points out, cost, schedule, features, and quality form a four-dimensional self-regulating system that seeks a balance or equilibrium among the four dimensions. A balance will be achieved with or without project manager intercession. But, as all of us know, without management intervention the equilibrium point could be very detrimental to the business.

Information, information, information

The key is recognizing that, in the early stages, the focus has to be on information or, more specifically, on the ratio of information gained per unit of effort or investment. This concept is, in fact, applicable to many dimensions besides product development, particularly in startups, where it is at the heart of most strategic decisions.

Yesterday I came across an interesting post by Dat Tran, Head of Data Science @Idealo. He reflected on the MVP concept applied to data products. Some of the ideas he introduced are not specific to data products, like the required focus on validated information gathering per unit of effort mentioned earlier (although “validated” in data science may be a statistically well-defined property), but other insights are very specific to data products and spot on.


I found his reflections on interpretability particularly interesting, in connection with MVPs and the idea of optimizing information gathering. If information gathering is our top priority, we will most probably favor interpretability over accuracy. Our own experience is that reasonable accuracy can be obtained with simpler, interpretable models that deliver great insights and, additionally, help with management “buy in”. Clustering with a not too complex distance metric, or a shallow classification/regression tree, can be an eye-opener and can be delivered to the client with limited implementation costs.
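To make the point about shallow trees concrete, here is a minimal sketch of the shallowest possible classification tree, a one-split "decision stump", written in plain Python. Everything in it is an assumption for illustration: the feature names (stops_per_route, route_km), the toy data, and the helper names fit_stump and predict are hypothetical and do not come from Dat's post or from any client project. In practice a library implementation would normally be used; the sketch only shows why such a model is easy to read back to a client.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the lowest weighted Gini impurity.

    rows: list of equal-length numeric feature lists.
    Returns (feature_index, threshold, left_label, right_label).
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        raise ValueError("no split possible: all feature values are identical")
    return best[1:]

def predict(stump, row):
    """Apply the one-split rule to a single row."""
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label

# Hypothetical toy data: [stops_per_route, route_km] -> whether the route ran late.
rows = [[12, 30.0], [15, 42.0], [30, 35.0], [34, 28.0], [11, 50.0], [29, 45.0]]
labels = ["on_time", "on_time", "late", "late", "on_time", "late"]

stump = fit_stump(rows, labels)
feature_names = ["stops_per_route", "route_km"]
j, t, left_label, right_label = stump
# The whole model is one sentence that can be shown to management as-is.
print(f"if {feature_names[j]} <= {t}: {left_label} else: {right_label}")
```

The fitted model is a single rule, which is the property that matters at the MVP stage: it can be checked against domain knowledge in a meeting, and a surprising rule is itself a piece of information.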

Baseline solution

Another idea discussed in the post was the early establishment of a baseline solution. In our experience, the baseline is very frequently an MVP in itself: it is a feasible solution that can be put into production, and it brings useful insights besides serving as a reference for further improvement. We were working with a waste collector that already had a geolocation solution installed in its trucks, but was not using the data. The company was using the classical (though suboptimal) dispatching solution of assigning drivers to different urban areas. We went on site to gather data and to check that the policy was being implemented correctly. Descriptive analytics of this baseline model showed that part of the problem was that drivers were doing significant unreported “off-hours” deliveries (most probably for personal benefit). A bit of on-site management cleared up the problem and brought relevant cost reductions.
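The kind of descriptive check described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the analysis we actually ran: the shift window, the ping format (driver, ISO timestamp, stopped flag), the reported-stops log, and the function name off_hours_stops are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, time

# Hypothetical shift window; the real working hours are not given in this post.
SHIFT_START = time(6, 0)
SHIFT_END = time(14, 0)

def off_hours_stops(pings, reported_stops):
    """Count, per driver, stops outside the shift that are missing from the official log.

    pings: iterable of (driver_id, iso_timestamp, is_stopped) tuples from the trucks.
    reported_stops: set of (driver_id, iso_timestamp) pairs from the official log.
    """
    counts = defaultdict(int)
    for driver_id, ts, is_stopped in pings:
        if not is_stopped:
            continue  # a moving truck is not a stop
        t = datetime.fromisoformat(ts).time()
        in_shift = SHIFT_START <= t < SHIFT_END
        if not in_shift and (driver_id, ts) not in reported_stops:
            counts[driver_id] += 1
    return dict(counts)

# Hypothetical geolocation pings for two drivers on one day.
pings = [
    ("driver_a", "2018-07-02T07:15:00", True),
    ("driver_a", "2018-07-02T16:40:00", True),
    ("driver_a", "2018-07-02T17:05:00", True),
    ("driver_b", "2018-07-02T09:30:00", True),
    ("driver_b", "2018-07-02T15:10:00", False),
    ("driver_b", "2018-07-02T15:20:00", True),
]
reported = {("driver_b", "2018-07-02T15:20:00")}

result = off_hours_stops(pings, reported)
print(result)
```

A simple count like this involves no model at all, which is the point of a baseline: it is cheap to produce, easy to verify on site, and can already surface a management problem.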

Path dependency

Implicit but not discussed in Dat’s post is the path dependency of decisions, including the development of an MVP. Together with the information-gathering potential, we also have to be aware of the doors that close, even with well-thought-out experiments. For one of our clients, using real-time routing costs for dynamic pricing was not a possibility, because it was too difficult to change the pricing architecture that had been implemented early on and that the client's customers were used to.

How do you handle data MVPs in your organization?

Contact @Innitium at or join our blog at


© 2020 by Innitium Analytics Consulting S.L.