Surveys, Sampling, and Observational Data

STA304 is an upper-level undergraduate course at the University of Toronto’s Department of Statistical Sciences.

Preamble

Overview

The best thing about being a statistician, is that you get to play in everyone’s backyard.

John Tukey

The work of applied statisticians, regardless of their specific job title and area of application, is the most important and exciting work in the world right now. The ability to gather data, analyse it, and communicate your understanding of the underlying process is incredibly valuable. In this course you will learn and apply the essentials of this.

We focus on surveys, sampling and observational data. The very stuff of statistical science! We will approach these topics from a practical perspective. You will actually run surveys and learn how messy it is to put one together. You will learn how to think about sampling, how to implement it, and why the details matter. You will forecast an election. And you will conduct original research. More generally, you will learn how to obtain and analyse data and use it to make sensible claims about the world.

To work as an applied statistician requires you to be able to, as part of a small team:

You likely have some of these skills already. This course will further develop them. At the end of the course you will have a portfolio of work focused on surveying, sampling, and observational data that you could show off to a potential employer.

Each week you will read relevant papers and books and engage with them through discussion with each other, me, and the TA. You will bring this all together and show off how much you have learnt through practical, ongoing assessment.

It is important to recognise that putting together everything that you have learnt to this point in this way will be difficult. It is not possible to cover everything that you will need to know. You should proactively identify and address aspects where you are weak by seeking additional information and resources. This course acts as a guide to what is important; it does not contain everything that is important.

This course is different to many other courses at the University of Toronto. At the end of this course, you will have a portfolio of work that you could show off to a potential employer. You will have developed the skills to work successfully as an applied statistician or data scientist. And you will know how to fill gaps in your knowledge yourself. A lot of scholarships and jobs these days ask for GitHub and blog links, etc., to show off a portfolio of your work. This is the class that gives you a chance to develop these. It’s very important to have something to show that goes beyond what is done in a normal class.

How to succeed

In this course you will work in a self-directed, open-ended manner. Identify relevant areas of interest and then learn the skills that you need to explore those areas.

To successfully complete this course, you should expect to spend a large portion of your time reading and writing (both code and text). Deeply engage with the materials. Find a small study group and keep each other motivated and focused. At the start of the week, read the course notes, all compulsory materials, and some recommended materials based on your interest. After doing that, but before the ‘lecture’ time, you should complete the weekly quiz. During ‘lectures’ I’ll live-code, discuss materials in the course notes, talk about an experiment, and you’ll have a chance to discuss the materials with me.

You need to be more active in your learning in this course than in others: read the notes and related materials, and then go out there and teach yourself more and apply it. You will not be spoon-fed in this course. Each week try to write reproducible, understandable R code surrounded by beautifully crafted text that motivates, backgrounds, explains, discusses and criticises. Make steady progress toward the assessment.

This is not a ‘bird course’. Typically, after the term is finished, students say that the course is difficult but rewarding. The TAs and I are always available to answer any questions. Please come to office hours!

How we’ll work

This webpage will provide almost all the guiding materials that you need and links to the relevant parts of the notes. The course notes are available here: https://www.tellingstorieswithdata.com. Those contain notes and other material that you could go over. There is a course Slack for discussion. We’ll use Quercus really only for assessment submission and grading. I expect you to work professionally, and so we’ll try to use professional tools to the extent possible.

A rough weekly flow for the course would be something like:

  1. Read the week’s course notes.
  2. Read/watch/listen to the compulsory materials.
  3. Complete the weekly quiz.
  4. Attend the lecture.
  5. Attend the lab.
  6. Make progress on a paper.

Advice from past students

Successful past students have the following advice (completely unedited by me):

Acknowledgements

Thank you to the following people for generously providing comments, references, suggestions, and thoughts that directly contributed to this outline: Bethany White, Dan Simpson, Jesse Gronsbell, Kelly Lyons, Lauren Kennedy, and Monica Alexander. Thank you especially to Samantha-Jo Caetano who influenced all aspects of this and co-taught the first version in Fall 2020.

Content

Each week you should go through the course notes and all compulsory materials. During the lecture I will live-code various aspects. I will also discuss a case study, typically a paper. During the lab, a TA will lead either small-group discussions or other work. The lecture will be recorded and posted here, but again, it’s not enough to just watch that: you need to read and write yourself.

Week 1

‘Drinking from a fire hose’.

Week 2

‘Science-ing’.

Week 3

‘Why, if ever I did fall off—which there’s no chance of—but if I did–’.

Week 4

‘Stratified, systematic, and cluster sampling’.

Week 5

‘Gathering data’.

Week 6

‘Whoops, I forgot EDA’.

Week 7

‘IJALM - It’s Just A Linear Model’.

Week 8

‘Celestial Navigation’.

Week 9

‘Multilevel regression with post-stratification’.

Week 10

‘Such a shame they’ll never meet’.

Week 11

‘Why does it always rain on me?’.

Week 12

‘Lorem ipsum’.

Assessment

Summary

Item                               Weight (%)   Due date
Weekly quiz                        20           Weekly before the lecture
Professional conduct               3            Anytime during the teaching term
Paper 1                            24           End of Week 3
Paper 2                            24           End of Week 6
Paper 3                            24           End of Week 9
Final Paper (initial submission)   1            End of Week 12
Final Paper (peer review)          3            Three days after that
Final Paper                        25           Ten days after that

Weekly quizzes

Professional conduct

Paper #1

Paper #2

Paper #3

Final Paper