Introduction to Modeling

Preamble

Overview

After developing comfort with how our world becomes data, and with everything that goes into creating a nice, tidy analysis dataset, we may be interested in modeling those data. This course builds on “Data science foundations” by developing the modeling side of things. Many of the results that we will obtain from our models are determined well before we ever contemplate regression, and the credibility of any conclusions relies on a robust workflow.

This course takes those skills as given and focuses on doing a good job of the modeling itself. The typical person completing this course is a PhD student looking to apply statistics in another discipline.

Learning objectives

By the end of the course you should have three quantitative papers that can be submitted to a journal/conference. More generally you should be able to:

  1. Engage critically with ideas and readings related to statistical modeling (demonstrated in all papers and the notebook).
  2. Conduct quantitative research in a high-quality, reproducible and ethical way (demonstrated in all papers).
  3. Clearly communicate what was done, what was found, and why in writing (demonstrated in all papers).
  4. Respectfully identify strengths and weaknesses in the data science research conducted by others (demonstrated in quizzes and the peer review).
  5. Develop the ability to appropriately choose and apply statistical models to real-world situations (demonstrated in the Final Paper).
  6. Conduct all aspects of the typical data science workflow (demonstrated in all papers).

Pre-requisites

Textbooks

  • McElreath, 2020, Statistical Rethinking, 2nd edition, Chapman & Hall/CRC. Please purchase a print copy, which will be about $70. The author provides a free set of lectures that we will go through, which helps to justify the cost of the book.
  • Johnson, Ott, and Dogucu, 2021, Bayes Rules!, Chapman & Hall/CRC. You are welcome to purchase a print copy, but there is also a free version online.

Acknowledgements

This course—especially the key feature of combining the two books—is largely based on Andrew Heiss’ ‘Bayesian Statistics’ course. I have tweaked the order of content a little, added articles, and changed the assessment. I am grateful for Andrew’s ideas, suggestions, and generosity.

Content

Each week you should read the chapters, watch the relevant lectures, and attempt the exercises.

Week 1

  • Read Statistical Rethinking Chapter 1, “The Golem of Prague”
  • Watch Statistical Rethinking 2023 videos, Lecture 1, “The Golem of Prague”
  • Read Statistical Rethinking Chapter 2, “Small Worlds and Large Worlds”
  • Read Bayes Rules! Chapter 1, “The (Big) Bayesian Picture”
  • Read Bayes Rules! Chapter 2, “Bayes’ Rule”
  • Read Freedman, David, 1991, “Statistical Models and Shoe Leather”, Sociological Methodology, Vol. 21, pp. 291-313.
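
If you would like a concrete warm-up for the Bayes’ rule readings, the following minimal Python sketch updates a discrete prior over a handful of candidate proportions using a binomial likelihood. The candidate values, prior, and data are made up purely for illustration (the textbooks themselves work in R).

```python
# A minimal Bayes' rule calculation: the posterior is proportional to
# prior times likelihood, over a handful of candidate proportions.
# All numbers here are made up purely for illustration.
from math import comb

candidates = [0.1, 0.3, 0.5, 0.7, 0.9]   # candidate values of the unknown proportion p
prior = [0.2] * len(candidates)           # a flat prior over the candidates

# Hypothetical data: 6 successes in 9 binomial trials.
successes, trials = 6, 9
likelihood = [
    comb(trials, successes) * p**successes * (1 - p)**(trials - successes)
    for p in candidates
]

# Bayes' rule: multiply the prior by the likelihood, then normalise.
unnormalised = [pr * lik for pr, lik in zip(prior, likelihood)]
posterior = [u / sum(unnormalised) for u in unnormalised]

for p, post in zip(candidates, posterior):
    print(f"p = {p:.1f}: posterior probability = {post:.3f}")
```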

Week 2

  • Watch Statistical Rethinking 2023 videos, Lecture 2, “The Garden of Forking Data”
  • Read Statistical Rethinking Chapter 3, “Sampling the Imaginary”
  • Read Bayes Rules! Chapter 3, “The Beta-Binomial Bayesian Model”
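
As an optional companion to the Beta-Binomial chapter, here is a minimal sketch of the conjugate update and of sampling from the resulting posterior, assuming a made-up Beta(2, 2) prior and the same toy data as the Week 1 sketch (6 successes in 9 trials). It uses Python with SciPy rather than the R packages that accompany the textbooks.

```python
# Beta-Binomial conjugacy: a Beta(alpha, beta) prior combined with k successes
# in n binomial trials yields a Beta(alpha + k, beta + n - k) posterior.
# The prior and the data below are invented for illustration.
from scipy import stats

alpha_prior, beta_prior = 2, 2   # Beta(2, 2) prior on the unknown proportion
k, n = 6, 9                      # hypothetical data: 6 successes in 9 trials

alpha_post = alpha_prior + k
beta_post = beta_prior + (n - k)
posterior = stats.beta(alpha_post, beta_post)

print(f"Posterior: Beta({alpha_post}, {beta_post})")
print(f"Posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: ({posterior.ppf(0.025):.3f}, {posterior.ppf(0.975):.3f})")

# "Sampling the Imaginary": draws from the posterior let us summarise it however we like.
draws = posterior.rvs(size=10_000, random_state=853)
print(f"Pr(proportion > 0.5) is roughly {(draws > 0.5).mean():.3f}")
```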

Week 3

  • Read Statistical Rethinking Chapter 4, “Geocentric Models”
  • Watch Statistical Rethinking 2023 videos, Lecture 3, “Geocentric Models”
  • Watch Statistical Rethinking 2023 videos, Lecture 4, “Categories & Curves”
  • Read Bayes Rules! Chapter 4, “Balance and Sequentiality in Bayesian Analyses”
  • Read Bayes Rules! Chapter 5, “Conjugate Families”

Week 4

  • Read Statistical Rethinking Chapter 5, “The Many Variables & The Spurious Waffles”
  • Watch Statistical Rethinking 2023 videos, Lecture 5, “Elemental Confounds”
  • Read Statistical Rethinking Chapter 6, “The Haunted DAG & The Causal Terror”
  • Watch Statistical Rethinking 2023 videos, Lecture 6, “Good and Bad Controls”
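
If it helps to see an elemental confound before the readings, the short simulation below sketches a “fork”: a common cause Z induces an association between X and Y even though neither causes the other, and the association largely disappears once you stratify by Z. The data are entirely simulated and the effect sizes are arbitrary.

```python
# A "fork" confound: Z causes both X and Y, so X and Y are correlated
# even though neither causes the other. Simulated data only.
import numpy as np

rng = np.random.default_rng(853)
n = 100_000
z = rng.normal(size=n)        # the common cause
x = z + rng.normal(size=n)    # X depends only on Z
y = z + rng.normal(size=n)    # Y depends only on Z

print(f"Correlation of X and Y overall:        {np.corrcoef(x, y)[0, 1]:.2f}")

# Conditioning on (a narrow slice of) Z removes most of the association.
near_zero = np.abs(z) < 0.1
print(f"Correlation of X and Y given Z near 0: {np.corrcoef(x[near_zero], y[near_zero])[0, 1]:.2f}")
```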

Week 5

  • Read Statistical Rethinking Chapter 7, “Ulysses’ Compass”
  • Watch Statistical Rethinking 2023 videos, Lecture 7, “Fitting over and under”
  • Read Bayes Rules! Chapter 9, “Simple Normal Regression”
  • Read Bayes Rules! Chapter 10, “Evaluating Regression Models”
  • Read Bayes Rules! Chapter 11, “Extending the Normal Regression Model”
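
As an optional illustration of the overfitting theme in “Ulysses’ Compass”, the sketch below fits a straight line and an overly flexible polynomial to simulated data: the flexible model fits the training sample better but typically predicts a fresh sample worse. The sample size, polynomial degree, and seed are arbitrary choices.

```python
# Overfitting: in-sample fit always improves with model flexibility,
# but out-of-sample prediction typically gets worse. Simulated data only.
import numpy as np

rng = np.random.default_rng(853)

def simulate(n=30):
    """The true relationship is linear with noise."""
    x = rng.uniform(-2, 2, size=n)
    y = 0.5 * x + rng.normal(scale=1.0, size=n)
    return x, y

x_train, y_train = simulate()
x_test, y_test = simulate()

for degree in (1, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"Degree {degree}: training MSE = {mse_train:.2f}, test MSE = {mse_test:.2f}")
```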

Week 6

  • Read Statistical Rethinking Chapter 9, “Markov Chain Monte Carlo”
  • Watch Statistical Rethinking 2023 videos, Lecture 8, “Markov Chain Monte Carlo”
  • Read Bayes Rules! Chapter 6, “Approximating the Posterior”
  • Read Bayes Rules! Chapter 7, “MCMC under the Hood”
  • Read Bayes Rules! Chapter 8, “Posterior Inference & Prediction”
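
To demystify what is happening “under the hood” in the MCMC readings, here is a bare-bones Metropolis sampler written in plain Python, targeting the posterior from the toy example used in earlier weeks (a flat prior and 6 successes in 9 trials). In practice you would let a probabilistic programming language such as Stan do this for you; the proposal step size, number of draws, and warm-up length below are arbitrary.

```python
# A minimal Metropolis sampler for the posterior of a proportion p,
# with a flat prior and 6 successes in 9 binomial trials.
# The exact posterior here is Beta(7, 4), so we can check the answer.
import math
import random

def log_posterior(p, k=6, n=9):
    """Log of (flat prior x binomial likelihood), up to an additive constant."""
    if p <= 0 or p >= 1:
        return -math.inf
    return k * math.log(p) + (n - k) * math.log(1 - p)

random.seed(853)
current = 0.5
draws = []
for _ in range(20_000):
    proposal = current + random.gauss(0, 0.1)               # random-walk proposal
    log_ratio = log_posterior(proposal) - log_posterior(current)
    if random.random() < math.exp(min(0.0, log_ratio)):     # accept with prob min(1, ratio)
        current = proposal
    draws.append(current)

kept = draws[5_000:]                                        # discard warm-up draws
print(f"Posterior mean is roughly {sum(kept) / len(kept):.3f}")  # exact value is 7/11, about 0.636
```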

Week 7

  • Read Statistical Rethinking Chapter 10, “Big Entropy and the Generalized Linear Model”
  • Read Statistical Rethinking Chapter 11, “God Spiked the Integers”
  • Watch Statistical Rethinking 2023 videos, Lecture 9, “Modeling events”
  • Watch Statistical Rethinking 2023 videos, Lecture 10, “Counts and Hidden Confounds”
  • Read Bayes Rules! Chapter 12, “Poisson & Negative Binomial Regression”
  • Read Bayes Rules! Chapter 13, “Logistic Regression”
  • Read Burch, Tyler James, 2023, “2023 NHL Playoff Predictions”, April.
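
As a small warm-up for the GLM readings, the sketch below shows the core idea: a linear predictor pushed through a link function gives an event probability (logistic regression) or a positive rate (Poisson regression). The intercept and slope are hypothetical values chosen only to show the mechanics.

```python
# The GLM idea: a linear predictor eta = a + b * x, passed through a link
# function, gives the mean of the outcome distribution.
# The coefficients below are hypothetical.
import numpy as np

a, b = -1.0, 0.8                 # hypothetical intercept and slope
x = np.linspace(-2, 2, 5)        # a few predictor values
eta = a + b * x                  # the linear predictor

prob = 1 / (1 + np.exp(-eta))    # inverse logit: logistic regression's event probability
rate = np.exp(eta)               # inverse log: Poisson regression's expected count

for xi, p, lam in zip(x, prob, rate):
    print(f"x = {xi:+.1f}: Pr(event) = {p:.3f}, expected count = {lam:.3f}")
```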

Week 8

  • Read Statistical Rethinking Chapter 12, “Monsters and Mixtures”
  • Watch Statistical Rethinking 2023 videos, Lecture 11, “Ordered Categories”

Week 9

  • Read Statistical Rethinking Chapter 13, “Models with Memory”
  • Watch Statistical Rethinking 2023 videos, Lecture 12, “Multilevel Models”
  • Read Bayes Rules! Chapter 15, “Hierarchical Models are Exciting”
  • Read Bayes Rules! Chapter 16, “(Normal) Hierarchical Models without Predictors”
  • Read Bayes Rules! Chapter 17, “(Normal) Hierarchical Models with Predictors”
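
If you would like a feel for partial pooling before the hierarchical-models readings, the toy calculation below shrinks noisy group means toward the grand mean using the standard precision-based weight. The group means, standard errors, and between-group standard deviation are all invented for illustration.

```python
# Partial pooling in a toy normal hierarchical model: each group's mean is
# pulled toward the grand mean, with noisier groups pulled further.
# All numbers are invented for illustration.
import numpy as np

group_mean = np.array([4.0, 6.0, 9.0, 12.0])   # observed group averages
group_se = np.array([0.5, 2.0, 0.5, 3.0])      # standard error of each average
between_sd = 1.5                               # assumed between-group standard deviation
grand_mean = group_mean.mean()                 # treat the simple average as the grand mean

# Shrinkage weight: how much to trust the group's own average versus the grand mean.
weight = between_sd**2 / (between_sd**2 + group_se**2)
partially_pooled = weight * group_mean + (1 - weight) * grand_mean

for i, (raw, pooled) in enumerate(zip(group_mean, partially_pooled), start=1):
    print(f"Group {i}: raw mean = {raw:4.1f}, partially pooled mean = {pooled:5.2f}")
```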

Week 10

  • Read Statistical Rethinking Chapter 14, “Adventures in Covariance”
  • Watch Statistical Rethinking 2023 videos, Lecture 13, “Multilevel adventures”
  • Watch Statistical Rethinking 2023 videos, Lecture 14, “Correlated features”
  • Read Bayes Rules! Chapter 18, “Non-Normal Hierarchical Regression & Classification”
  • Read Bayes Rules! Chapter 19, “Adding More Layers”

Week 11

  • Read Statistical Rethinking Chapter 15, “Missing Data and Other Opportunities”
  • Watch Statistical Rethinking 2023 videos, Lecture 17, “Measurement and misclassification”
  • Watch Statistical Rethinking 2023 videos, Lecture 18, “Missing Data”
  • Read Rubin, D.B., 1976, “Inference and missing data”, Biometrika, Vol. 63, pp. 581-592.
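
To connect Rubin’s framework to something concrete, the short simulation below shows how complete-case analysis can be biased when missingness depends on an observed variable (missing at random). The data-generating process and all coefficients are invented for illustration.

```python
# When Y is more likely to be missing for large X (missing at random given X),
# simply dropping incomplete rows biases the estimated mean of Y.
# Simulated data only.
import numpy as np

rng = np.random.default_rng(853)
n = 100_000
x = rng.normal(size=n)
y = 2 + x + rng.normal(size=n)              # the true mean of Y is 2

# The probability that Y is missing increases with X.
p_missing = 1 / (1 + np.exp(-2 * x))
observed = rng.random(n) > p_missing

print(f"Mean of Y in the full sample:    {y.mean():.2f}")
print(f"Mean of Y among complete cases:  {y[observed].mean():.2f}")   # biased downward
```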

Week 12

  • Read Statistical Rethinking Chapter 16, “Generalized Linear Madness”
  • Read Statistical Rethinking Chapter 17, “Horoscopes”
  • Watch Statistical Rethinking 2023 videos, Lecture 19, “Generalized Linear Madness”
  • Watch Statistical Rethinking 2023 videos, Lecture 20, “Horoscopes”

Assessment

Notebook

  • Due date: Try to keep this updated weekly, on average, over the course of the term.
  • Task: Use Quarto to keep a notebook of what you read in the style of this one by Andrew Heiss. In particular, please add notes as you read/watch, and attempt at least half the exercises.
  • Weight: 20 per cent.

Paper I

  • Due date: Thursday, noon, Week 1.
    • Marking starts at noon on the following Monday, and you can update your submission until then to incorporate peer review comments. Please do not make any changes after marking starts.
  • Task: Donaldson Paper
  • Weight: 10 per cent.
  • You must do this individually.
  • You do not need to do this again if you already did it as part of “Data science foundations”. In that case, the weight will be redistributed to Papers II and III.

Paper II

  • Due date: Thursday, noon, Week 4.
    • Marking starts at noon on the following Monday, and you can update your submission until then to incorporate peer review comments. Please do not make any changes after marking starts.
  • Task: Murrumbidgee Paper
  • You are welcome to work in teams up to size three.
  • Weight: 20 per cent.
  • The mark is conditional on the paper being submitted to a journal/conference within two weeks of marking (being rejected is fine—it just has to have been submitted).

Paper III

  • Due date: Thursday, noon, Week 8.
    • Marking starts at noon on the following Monday, and you can update your submission until then to incorporate peer review comments. Please do not make any changes after marking starts.
  • Task: Spadina Paper or Spofforth Paper
  • You are welcome to work in teams up to size three.
  • Weight: 20 per cent.
  • The mark is conditional on the paper being submitted to a journal/conference within two weeks of marking (being rejected is fine—it just has to have been submitted).

Final Paper

  • Due date: Thursday, noon, Week 12.
    • Marking starts at noon two weeks after this date, and you can update your submission until then to incorporate peer review comments. Please do not make any changes after marking starts.
  • Task: Final Paper
  • You must do this individually.
  • Weight: 30 per cent.
  • The mark is conditional on the paper being submitted to a journal/conference within two weeks of marking (being rejected is fine—it just has to have been submitted).