Data Science Across Disciplines
Session-06: RECOGNISING AND AVOIDING TRAPS
Data analysis and statistical procedures come with several pitfalls and limitations. These range from methodological pitfalls in the processes and data that one can use, to cognitive and behavioural pitfalls that one can encounter when making inferences from data and data artefacts. This week we discuss such theoretical and practical traps, how we can become aware of them, and what approaches we have to avoid them.
## Highlights of the lecture
In the session, we will discuss causality, and when and to what extent it can be expected and observed, along with the notion of confounding. We will then discuss statistical traps such as Simpson’s paradox and regression to the mean, and touch upon the debates around the null-hypothesis testing process and the new statistics. We will also look at how visualisations can deceive and what we need to be careful about when representing data visually, as well as some of the cognitive biases that can affect how inferences and decisions based on data are made.
## Practical Lab Session
In the practical session, we will explore some examples of how such traps can be encountered in practice. We will then spend considerable time exploring data visualisations and making better decisions when designing effective visual representations to support our reasoning.
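As a warm-up for the kind of trap explored in the lab, here is a minimal sketch of Simpson’s paradox in Python. The counts are hypothetical and chosen purely for illustration (they do not come from any of the readings), and the names `counts`, `stratum_rates` and `pooled_rates` are our own. The sketch shows a treatment A that has the higher success rate within each stratum and yet the lower success rate once the strata are pooled.

```python
# Hypothetical (successes, trials) counts per treatment within each stratum.
# These numbers are made up for illustration only.
counts = {
    "easy cases": {"A": (18, 20), "B": (64, 80)},
    "hard cases": {"A": (40, 80), "B": (8, 20)},
}


def stratum_rates(counts):
    """Success rate of each treatment within each stratum."""
    return {
        stratum: {arm: s / n for arm, (s, n) in arms.items()}
        for stratum, arms in counts.items()
    }


def pooled_rates(counts):
    """Success rate of each treatment after summing over all strata."""
    totals = {}
    for arms in counts.values():
        for arm, (s, n) in arms.items():
            prev_s, prev_n = totals.get(arm, (0, 0))
            totals[arm] = (prev_s + s, prev_n + n)
    return {arm: s / n for arm, (s, n) in totals.items()}


by_stratum = stratum_rates(counts)
pooled = pooled_rates(counts)

for stratum, rates in by_stratum.items():
    print(f"{stratum}: A={rates['A']:.2f}, B={rates['B']:.2f}")
print(f"pooled:     A={pooled['A']:.2f}, B={pooled['B']:.2f}")
```

The reversal arises because, in these made-up counts, treatment A is mostly given to hard cases and treatment B mostly to easy ones, so case difficulty acts as a confounder when the strata are pooled.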
## Reading lists & Resources
Required reading
- The New Statistics Debate: Here is an easy read from Geoff Cumming on the “New Statistics”, which also links to his fun-to-watch “Dance of p-values” video: Cumming, Geoff. “Mind Your Confidence Interval: How Statistics Skew Research Results.” The Conversation, 22 Oct. 2020. [link] (see the optional reading for Cumming’s transformational paper)
- Simpson’s paradox: You can read the first two sections of this rather practical paper, which has very good examples of how one can observe the paradox both in research settings and in real-world settings: Kievit, R., Frankenhuis, W.E., Waldorp, L. and Borsboom, D., 2013. Simpson’s paradox in psychological science: a practical guide. Frontiers in psychology, 4, p.513. [online paper]
- On confounding: An accessible introduction to confounding effects with some practical advice on how to approach and avoid them: Skelly, A.C., Dettori, J.R. and Brodt, E.D., 2012. Assessing bias: the importance of considering confounding. Evidence-based spine-care journal, 3(1), p.9. [pdf]
- On cognitive biases in visualisation: Valdez, A.C., Ziefle, M. and Sedlmair, M., 2018. Studying Biases in Visualization Research: Framework and Methods. In Cognitive Biases in Visualizations (pp. 13-27). Springer, Cham. [pdf]
- This paper refers to the cognitive bias cheat sheet, which is a useful visual exploration of the biases: https://busterbenson.com/piles/cognitive-biases/
Optional reading
- A causal modelling explanation of Simpson’s paradox by Judea Pearl: Pearl, J., 2014. Comment: understanding Simpson’s paradox. The American Statistician, 68(1), pp.8-13. [pdf]
- For a more informative and academic coverage of the New Statistics, this is Cumming’s very influential article which also forms the basis of his book: Cumming, G. (2014) ‘The New Statistics: Why and How’, Psychological Science, 25(1), pp. 7–29. doi: 10.1177/0956797613504966. [pdf]
- On regression to the mean: Barnett, A.G., Van Der Pols, J.C. and Dobson, A.J., 2005. Regression to the mean: what it is and how to deal with it. International journal of epidemiology, 34(1), pp.215-220. [pdf]
Further reading
- A good read on p-values from a highly interesting book: Reinhart, A., 2015. Statistics Done Wrong: The Woefully Complete Guide. No Starch Press. https://www.statisticsdonewrong.com/data-analysis.html
- A fun and short example of the relation between correlation and causation: Matthews, R., 2000. Storks deliver babies (p= 0.008). Teaching Statistics, 22(2), pp.36-38. [pdf]