INF2178: Experimental Design

INF2178 is a master’s-level course at the University of Toronto’s Faculty of Information.

Preamble

Overview

Experimental design has a long and robust tradition in fields such as agriculture, medicine, physics, and chemistry. It allows us to speak of causality with confidence. Typically, these are situations in which control groups can be established, randomization is appropriate, and ethical concerns can be assuaged. Unfortunately, such a set-up is rarely possible across the full range of modern applications where we want to understand causality.

Source: https://xkcd.com/2400/

This course covers the traditional approaches and statistical methods, but focuses on what to do when traditional experimental design methods cannot be implemented or are not appropriate (i.e. what feels like most of the time these days). We cover experiments in their modern guise, especially the concerns that we might have when we can run them, but also methods that can provide some causal understanding even when we cannot conduct traditional experiments. Importantly, these approaches do not rely on ‘big data’ or fancy statistics, but instead on thoroughly interrogating the data that are available to gain understanding through the simplest means possible.

This is a hands-on course in which you will conduct research projects using real-world data. This means that you will: obtain and clean relevant datasets; develop your own research questions; use the statistical techniques that you are introduced to in class to answer those questions; and finally communicate your results in a meaningful way. This course is designed around approaches that are used extensively in academia, government, and industry. Furthermore, it includes many aspects, such as data cleaning and preparation, that are critical, but rarely taught.

This course is different to many other courses at the University of Toronto. At the end of this course, you will have a portfolio of work that you could show off to a potential employer. You will have developed the skills to work successfully as an applied statistician or data scientist. And you will know how to fill gaps in your knowledge yourself. A lot of scholarships and jobs these days ask for GitHub and blog links, etc., to show off a portfolio of your work. This is the class that gives you a chance to develop these. It’s very important to have something that shows off what you can do, and that needs to go beyond what is done in a normal class.

How to succeed

In this course you will work in a self-directed, open-ended manner. Identify relevant areas of interest and then learn the skills that you need to explore those areas.

To successfully complete this course, you should expect to spend a large portion of your time reading and writing (both code and text). Deeply engage with the materials. Find a small study group and keep each other motivated and focused. At the start of the week, read the course notes, all compulsory materials, and some recommended materials based on your interest. After doing that, but before the ‘lecture’ time, you should complete the weekly quiz. During ‘lectures’ I’ll live-code, discuss materials in the course notes, talk about an experiment, and you’ll have a chance to discuss the materials with me.

You need to be more active in your learning in this course than in others: read the notes and related materials, and then go out there and teach yourself more and apply it. You will not be spoon-fed in this course. Each week, try to write reproducible, understandable R code surrounded by beautifully crafted text that motivates, gives background, explains, discusses, and criticizes. Make steady progress toward the assessment.

This is not a ‘bird course’. Typically, after the term is finished, students say that the course is difficult but rewarding. The TAs and I are always available to answer any questions. Please come to office hours!

How we’ll work

This webpage will provide almost all the guiding materials that you need and links to the relevant parts of the notes. The course notes, along with other material that you could go over, are available here: https://www.tellingstorieswithdata.com. There is a course Slack for discussion. We’ll use Quercus really only for assessment submission and grading. I expect you to work professionally, and so we’ll try to use professional tools to the extent possible.

A rough weekly flow for the course would be something like:

  1. Read the week’s course notes.
  2. Read/watch/listen to the compulsory materials.
  3. Complete the weekly quiz.
  4. Attend the lecture.
  5. Attend the lab.
  6. Make progress on a paper.

Advice from past students

Successful past students have the following advice (completely unedited by me):

Acknowledgements

Thanks to the following who helped develop this course: Monica Alexander, Kelly Lyons, Sharla Gelfand, Faria Khandaker, Hidaya Ismail, A Mahfouz, Paul Hodgetts, Thomas Rosenthal.

Content

Each week you should go through the course notes and all compulsory materials. During the lecture I will live-code various aspects. I will also discuss a case study, typically a paper. During the lab, a TA will either lead small-group discussions or lead other work. The lecture will be recorded and posted here, but again, it’s not enough to just watch that: you need to read and write yourself.

Week 1

‘Drinking from a fire hose’.

Week 2

‘Science-ing’.

Week 3

‘Why, if ever I did fall off—which there’s no chance of—but if I did–’.

Week 4

‘Gathering data’.

Week 5

‘Whoops, I forgot EDA’.

Reading Week

Week 6

‘IJALM - It’s Just A Linear Model’.

Week 7

‘Celestial Navigation’.

Week 8

‘Such a shame they’ll never meet’.

Week 9

‘Why does it always rain on me?’.

Week 10

‘Post Hoc, Ergo Propter Hoc’.

Week 11

‘But it works on my machine’.

Week 12

‘Lorem ipsum’.

Assessment

Summary

Item                             | Weight (%) | Due date
Weekly quiz                      | 20         | Weekly before the lecture
Professional conduct             | 1          | Anytime during the teaching term
Paper 1                          | 25         | End of Week 3
Paper 2                          | 25         | End of Week 6
Paper 3                          | 25         | End of Week 9
Final Paper (initial submission) | 1          | End of Week 12
Final Paper (peer review)        | 3          | Three days after that
Final Paper                      | 25         | Ten days after that

Weekly quizzes

Professional conduct

Paper #1

Paper #2

Paper #3

Final Paper