From Analysis to Action by Way of Experiments

The analysts at Gartner describe a hierarchy of data analytics that roughly matches the one proposed by Leek and Peng, which I discussed in the previous post.


[Figure: Gartner’s hierarchy of data analytics. Source accessed 5 April 2015.]

Here’s my comparison of the two frameworks:

| Gartner | Leek and Peng |
|---|---|
| Descriptive Analytics | Descriptive and Exploratory Analysis |
| Diagnostic Analytics | Inferential and Predictive Analysis, applied to the current situation |
| Predictive Analytics | Inferential and Predictive Analysis, applied to future events |
| Prescriptive Analytics | Causal and Mechanistic Analysis, applied to discover the predictor variables that drive desired, business-relevant outcomes |

In contrast to Leek and Peng’s scientific perspective, Gartner emphasizes analysis applied to business problems. For example, in their list of attributes of predictive analytics, they insist on “the business relevance of the resulting insights (no ivory tower analyses)”.

Prescriptive analytics necessarily moves from passive analysis of data to interventions that change a system to achieve desired performance. At this stage, it seems inevitable that we must experiment to build confidence in causal relationships between drivers and outcomes. That is a good thing, because data analysis abstracted from the underlying physical or social system can go badly wrong, no matter how sophisticated our tools are.

George Box described the basic challenges of data analysis almost 50 years ago in “The Use and Abuse of Regression” (Technometrics, 8(4), 625-629). His insights apply not just to regression but to any modeling method, and they deserve study today; the article is short and accessible, with just a few equations. George emphasized that any data analysis used to predict future performance is always conditional on the underlying physical or social system remaining roughly the same in the future as during the period analyzed. Related to this point, some variables in an operating system may be controlled within such a narrow range that modeling methods will not find them important, and yet these variables are crucial to good performance.
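
Box’s point about tightly controlled variables can be illustrated with a small simulation. This is a hypothetical sketch, not an example from Box’s article: the process, the effect size `TRUE_EFFECT`, the noise level `NOISE_SD`, and the 0.02 control band are all made-up assumptions. Temperature strongly affects yield, but in routine operation it is held in such a narrow band that a regression on operating data attributes almost none of the yield variation to it; setting temperature deliberately to low and high levels reveals the effect.

```python
import random

def simple_ols(xs, ys):
    """Return (slope, intercept, r_squared) for y ~ a + b*x by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - sse / syy

# Hypothetical process: yield depends strongly on temperature deviation from setpoint.
TRUE_EFFECT = 8.0   # assumed yield change per unit of temperature deviation
NOISE_SD = 2.0      # assumed spread from all other sources of variation

def process_yield(temp_dev, rng):
    return 50.0 + TRUE_EFFECT * temp_dev + rng.gauss(0.0, NOISE_SD)

rng = random.Random(42)
n = 200

# Passive observation: the control system holds temperature in a narrow band.
obs_temp = [rng.gauss(0.0, 0.02) for _ in range(n)]
obs_yield = [process_yield(t, rng) for t in obs_temp]

# Designed experiment: temperature is deliberately set to low and high levels.
exp_temp = [-1.0 if i % 2 == 0 else 1.0 for i in range(n)]
exp_yield = [process_yield(t, rng) for t in exp_temp]

_, _, r2_observed = simple_ols(obs_temp, obs_yield)
slope_exp, _, r2_experiment = simple_ols(exp_temp, exp_yield)

print(f"R^2 from routine operating data: {r2_observed:.3f}")
print(f"R^2 from deliberate experiment:  {r2_experiment:.3f} (slope {slope_exp:.2f})")
```

With the passive data, the fitted slope is not systematically wrong under this model, but the tiny spread in temperature makes it very imprecise and its R^2 close to zero, so a screening analysis would likely discard temperature. The deliberate low and high settings spread the predictor out and make the same effect obvious.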

As we move beyond description to understanding why things happen, we are driven to experiments. As George concluded his article: “To find out what happens to a system when you interfere with it you have to interfere with it (not just passively observe it.)"
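
Box’s closing line can also be illustrated with a standard confounding simulation, again a hypothetical sketch rather than anything from his article. Assume a hidden factor `H` (some unmeasured condition of the process) influences both the setting `X` that operators choose and the outcome `Y`. Regressing `Y` on passively observed `X` mixes the effect of `X` with the effect of `H`; assigning `X` at random, as a designed experiment would, breaks that link and recovers the assumed causal effect. All names and coefficients (`TRUE_EFFECT`, `H_EFFECT`, the operator behavior) are assumptions chosen for illustration.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

TRUE_EFFECT = 1.0   # assumed causal effect of X on Y
H_EFFECT = 3.0      # assumed effect of the hidden factor H on Y

def outcome(x, h, rng):
    # Y responds to the setting X and, separately, to the hidden factor H.
    return TRUE_EFFECT * x + H_EFFECT * h + rng.gauss(0.0, 1.0)

rng = random.Random(7)
n = 5000

# Passive observation: operators tend to raise X when H is high.
h_obs = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_obs = [h + rng.gauss(0.0, 0.5) for h in h_obs]
y_obs = [outcome(x, h, rng) for x, h in zip(x_obs, h_obs)]

# Intervention: X is assigned at random, independently of H.
h_exp = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_exp = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_exp = [outcome(x, h, rng) for x, h in zip(x_exp, h_exp)]

slope_observed = ols_slope(x_obs, y_obs)
slope_experiment = ols_slope(x_exp, y_exp)

print(f"Slope from passive observation: {slope_observed:.2f}")
print(f"Slope from randomized settings: {slope_experiment:.2f}")
print(f"Assumed true effect:            {TRUE_EFFECT:.2f}")
```

Under these assumed coefficients the observational slope is expected to land near 3.4 (cov(X, Y) / var(X) = (1.25 + 3) / 1.25) rather than 1.0, because X carries information about H; only by interfering with the system do we learn what X itself does.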
