Foundations of data sciences

The data sciences have a common concern: How can others be confident that our statistical approaches have been brought to bear on appropriate datasets? This course focuses on the ‘data’ of the data sciences. It develops in students an appreciation for the many ways in which dealing with a dataset can get out of hand, and establishes approaches to ensure that data science is conducted in ways that engender trusted findings.

Preamble

Overview

This course will be an enormous amount of work and will cause you some stress. This is unfortunate, but there’s little way around it. All I can tell you is that, once you have done this course, this kind of work will be easier in the future.

The purpose of this course is to develop students who appreciate, and can iterate on, the foundations of the data sciences.

This course will require students to:

Essentially, this course provides students with everything they need to know to do the most exciting thing in the world: use data to tell convincing stories.

FAQ

Learning objectives

The purpose of the course is to develop the core skills common to all of the data sciences across academia and industry. By the end of the course, you should be able to:

  1. Engage critically with ideas and readings in data science.
  2. Conduct research in data science in a reproducible and ethical way.
  3. Write and present your research.
  4. Understand what constitutes ethical, high-quality data science practice, especially reproducibility and respect for those who underpin our data.
  5. Respectfully identify strengths and weaknesses in the data science research conducted by others.
  6. Appropriately choose and apply statistical models to real-world situations.
  7. Conduct all aspects of the typical data science workflow including deployment.
  8. Reflect effectively on your own learning and professional development.

Pre-requisites

Textbook

Alexander, 2022, Telling Stories with Data, CRC Press.

Content

Week 1

‘Drinking from a fire hose’.

Week 2

‘Science-ing’.

Week 3

‘Communicating’.

Week 4

‘Gathering data’.

Week 5

‘Hunting data’.

Week 6

‘Cleaning data’.

Week 7

‘Store, retrieve, disseminate and protect’.

Week 8

‘Share, but not too much’.

Week 9

‘Whoops, I forgot EDA’.

Week 10

‘IJALM - It’s Just A Linear Model’.

Week 11

‘Lorem ipsum’.

Week 12

‘Deployment’.

Assessment

Summary

Item                               Weight (%)   Due date
Weekly quiz                        10           Weekly before the lecture
Tutorial                           10           Weekly before the lecture
Professional conduct               1            Anytime during the teaching term
Paper 1                            25           End of Week 3
Paper 2                            25           End of Week 6
Paper 3                            25           End of Week 9
Final Paper (initial submission)   1            End of Week 12
Final Paper (peer review)          3            Three days after that
Final Paper                        25           Ten days after that

Weekly quizzes

Tutorial

Professional conduct

Paper #1

Paper #2

Paper #3

Final Paper