Data Science Across Disciplines
Session-01: INTRODUCTION, HISTORICAL PERSPECTIVES & BASIC CONCEPTS
This week discusses data science as a field that cuts across disciplines and provides a historical perspective on the subject. We discuss the terms Data Science and Data Scientists, reflect on examples of Data Science projects, and discuss the research process at a methodological level. We will also use the examples as probes to think broadly about the potential influence of data-intensive scientific approaches on knowledge, industry and wider society.
The practical lab session helps students get acquainted with the analytical platform that will be used throughout the term and provides a first experience of working with data sets within a data science approach.
Highlights of the lecture
This week we start with an introduction that looks at how the module operates and discusses its basic objectives and definitions.
Some of the key concepts you should remember from this week are …
- the discussion about the terms Data Science and Data Scientists
- the DS process and basic concepts from each step of the process
- the importance of being critical and inquisitive in data science
- different analyst types and skills
Practical Lab Session
This week is mainly a setup week where you get introduced to the coding environment and to Python.
At the end of the session, you should …
- have installed Anaconda and run it from your account
- have tried out basic Python commands and reflected on how they operate
- have loaded your first data file into Python and read the data in it
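As a rough idea of what the last step can look like, here is a minimal sketch that uses only Python's standard library. The file name `sample.csv`, the column names and the values are all made up for illustration, and the helper `load_rows` is hypothetical; the lab itself may use a different data set and different tools.

```python
import csv
import tempfile
from pathlib import Path

# Made-up sample data, written to a temporary file so the example runs anywhere.
SAMPLE = "item,count\napples,3\npears,5\n"


def load_rows(path):
    """Read a CSV file and return its rows as a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "sample.csv"  # hypothetical file name
    data_file.write_text(SAMPLE, encoding="utf-8")
    rows = load_rows(data_file)

# Every value is read as a string, so convert before doing arithmetic.
counts = [int(row["count"]) for row in rows]
total = sum(counts)

print(len(rows), "rows loaded")
print("Column names:", list(rows[0].keys()))
print("Total count:", total)
```

With your own data, you would skip the temporary file and pass the path of the file you downloaded to `load_rows`.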
Reading lists & Resources
Required reading
- On the origins of Data Science and Data Analysis (first 10 pages): Tukey, J.W., 1962. The future of data analysis. The annals of mathematical statistics, 33(1), pp.1-67. [pdf]
- A formal look at data science: Dhar, V., 2013. Data science and prediction. Communications of the ACM, 56(12), pp.64-73. [library pdf link]
- A systematic study of enterprise analysts, their daily tasks, and challenges: Kandel, Sean, et al. “Enterprise data analysis and visualization: An interview study.” Visualization and Computer Graphics, IEEE Transactions on 18.12 (2012): 2917-2926.
- On Google’s influenza epidemic application: Ginsberg, Jeremy, et al. “Detecting influenza epidemics using search engine query data.” Nature (2008) (need to search this through our library)
- On the critique of the Google Flu Trends project: Lazer, D., Kennedy, R., King, G. and Vespignani, A., 2014. The parable of Google Flu: traps in big data analysis. Science, 343(6176), pp.1203-1205. [pdf]
- An applied Data Science example: Quercia, D., Schifanella, R. and Aiello, L.M., 2014, September. The shortest path to happiness: Recommending beautiful, quiet, and happy routes in the city. In Proceedings of the 25th ACM conference on Hypertext and social media (pp. 116-125). [pdf]
Optional reading and resources
- On the information pyramid: Ackoff, R.L., 1989. From data to wisdom. Journal of applied systems analysis, 16(1), pp.3-9. [a pdf link to a short extract]
- A critique of the information pyramid: https://hbr.org/2010/02/data-is-to-info-as-info-is-not
- The survey on analyst types and skills: Analyzing the Analyzers by Harlan Harris, Sean Murphy, Marck Vaisman
- A public-facing intro to Data Science: Data Science: A guide for society by Sense about Science - [pdf link]
- On data biography: D’Ignazio, C., 2017. Creative data literacy: Bridging the gap between the data-haves and data-have nots. Information Design Journal, 23(1), pp.6-18. [pdf]
Although we try to cover the basics of Python programming in this tutorial, some of you, especially those who are new to Python, might benefit from external tutorials that cover the basics. There are many resources online but here are some good links:
And here are some books that can help you with your learning:
- Data science from scratch : first principles with Python / Joel Grus – http://encore.lib.warwick.ac.uk/iii/encore/record/C__Rb3426067
- Especially Chapter 2: A Crash course in Python to help with the preparations – https://ebookcentral.proquest.com/lib/warw/reader.action?docID=5750897&ppg=33
- Data science for business : what you need to know about data mining and data-analytic thinking / Foster Provost & Tom Fawcett – http://encore.lib.warwick.ac.uk/iii/encore/record/C__Rb3141595
- Hunt, J., 2019. A Beginners Guide to Python 3 Programming. Springer. http://encore.lib.warwick.ac.uk/iii/encore/record/C__Rb3404588
- Python for Data Analysis : Data Wrangling with Pandas, NumPy, and IPython