From Analysis to Action by Way of Experiments
The analysts at Gartner describe a hierarchy of data analytics that roughly matches the one proposed by Leek and Peng, which I discussed in the previous post.
Here’s my comparison of the two frameworks:
| Gartner | Leek and Peng |
|---|---|
| Descriptive Analytics | Descriptive and Exploratory Analysis |
| Diagnostic Analytics | Inferential and Predictive Analysis, applied to the current situation |
| Predictive Analytics | Inferential and Predictive Analysis, applied to future events |
| Prescriptive Analytics | Causal and Mechanistic Analysis, used to discover the predictor variables that drive business-relevant outcomes |
In contrast to Leek and Peng’s scientific perspective, Gartner emphasizes analysis applied to business problems. For example, in its list of attributes of predictive analytics, Gartner insists on “the business relevance of the resulting insights (no ivory tower analyses)”.
Prescriptive analytics necessarily moves from a passive analysis of data to interventions that change systems to achieve desired performance. In this phase, it seems inevitable that we must experiment to build confidence in causal relationships between drivers and outcomes. And that’s a good thing, because data analysis abstracted from the underlying physical or social system can go badly wrong no matter how sophisticated our tools.
George Box described the basic challenges for data analysis almost 50 years ago in “The Use and Abuse of Regression” (Technometrics, vol. 8, no. 4, pp. 625–629). His insights apply not just to regression but to any modeling method and deserve study today. The article is short and accessible, with just a few equations. George emphasized that data analysis used to predict future performance is always conditional on the underlying physical or social system remaining roughly the same in the future as it was during the period analyzed. A related point: some variables in an operating process may be controlled within such a narrow range that modeling methods will not flag them as important, and yet these variables are crucial to good performance.
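The narrow-range point can be made concrete with a small simulation. This is a sketch under invented assumptions, not an example from Box's article: a hypothetical process yield rises 2 units per degree of temperature, plus noise. In routine operation, tight control keeps temperature within half a degree of its setpoint; in a designed experiment, temperature is deliberately moved over a wide range. The names `process_yield`, `simulate`, and all coefficients and ranges are made up for illustration.

```python
import random
import statistics

# Hypothetical process (assumed, not from Box): yield rises TRUE_SLOPE units
# per degree of temperature, plus normally distributed noise.
TRUE_SLOPE = 2.0
SETPOINT = 150.0
NOISE_SD = 3.0


def process_yield(temp: float, rng: random.Random) -> float:
    """Simulated yield for a given temperature under the assumed linear model."""
    return 50.0 + TRUE_SLOPE * (temp - SETPOINT) + rng.gauss(0.0, NOISE_SD)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient, computed directly from sums of squares."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(half_range: float, n: int = 500, seed: int = 1) -> float:
    """Return r^2 between temperature and yield when temperature varies
    uniformly within +/- half_range degrees of the setpoint."""
    rng = random.Random(seed)
    temps = [SETPOINT + rng.uniform(-half_range, half_range) for _ in range(n)]
    yields = [process_yield(t, rng) for t in temps]
    return pearson_r(temps, yields) ** 2


# Routine operation: tight control keeps temperature within +/- 0.5 degrees.
routine_r2 = simulate(half_range=0.5)
# Designed experiment: temperature deliberately moved over +/- 10 degrees.
experiment_r2 = simulate(half_range=10.0)

print(f"routine operation   r^2 = {routine_r2:.2f}")
print(f"designed experiment r^2 = {experiment_r2:.2f}")
```

The true effect of temperature is identical in both runs; only the range over which it moved differs. Judged by explained variation in the routine data alone, temperature looks nearly irrelevant, even though it dominates the yield once it is allowed to vary.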
As we move beyond description to understand why things happen, we are driven to experiments. As George concluded his article: “To find out what happens to a system when you interfere with it you have to interfere with it (not just passively observe it).”
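Box's closing remark can be illustrated with a toy simulation, again a sketch under invented assumptions rather than anything from his article. A hypothetical hidden variable `z` drives the outcome, and in routine operation the setting `x` also tracks `z`, even though `x` itself has no effect. Regression on passively observed data finds a strong slope; once we interfere by setting `x` at random, the slope collapses toward zero. The functions `observe` and `intervene`, and all coefficients and noise levels, are assumptions for illustration.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def outcome(z, rng):
    # Assumed structure: the outcome depends only on the hidden variable z,
    # NOT on the setting x. The coefficient 3.0 is arbitrary.
    return 3.0 * z + rng.gauss(0.0, 1.0)


def observe(n, rng):
    """Passive observation: in routine operation, x follows the hidden z."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)        # lurking variable, never recorded
        x = z + rng.gauss(0.0, 0.5)    # setting tracks z during routine operation
        xs.append(x)
        ys.append(outcome(z, rng))
    return ols_slope(xs, ys)


def intervene(n, rng):
    """Experiment: x is set at random, independently of z."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = rng.gauss(0.0, 1.0)        # we interfere: x no longer follows z
        xs.append(x)
        ys.append(outcome(z, rng))
    return ols_slope(xs, ys)


rng = random.Random(42)
observed_slope = observe(2000, rng)
experimental_slope = intervene(2000, rng)
print(f"slope from passive observation:   {observed_slope:.2f}")
print(f"slope from randomized experiment: {experimental_slope:.2f}")
```

Randomizing the setting breaks the link between `x` and the unrecorded `z`, which is why the experiment recovers the true (zero) effect while the observational regression does not.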