Data Science Foundations

Quantitative approaches have a common concern: How can others be confident that our statistical models have been brought to bear on appropriate datasets? This course focuses on the ‘data’ of data science. It develops in students an appreciation for the many ways in which dealing with a dataset can get out of hand, and establishes approaches to ensure data science is conducted in ways that engender trusted findings. It touches on statistical modelling, but focuses on everything that comes before and after modelling, and in doing so ensures modelling and analysis are placed on a firmer foundation. In assessment, students will conduct end-to-end data science projects using real-world data, enabling them to fully understand potential pitfalls, and build a portfolio.

Preamble

Overview

The purpose of this course is to develop students who appreciate, and can iterate on, the foundations of data science.

The focus of the learning will be on:

  1. actively reading and considering relevant literature;
  2. actively using the statistical programming language R in real-world conditions;
  3. gathering, cleaning, and preparing datasets; and
  4. choosing and implementing statistical models and evaluating their estimates.

Essentially this course provides students with everything that they need to know to be able to do the most exciting thing in the world: use data to tell convincing stories.

FAQ

Learning objectives

The purpose of the course is to develop the core skills of data science that are applicable across academia and industry. By the end of the course, you should be able to:

  1. Engage critically with ideas and readings in data science (demonstrated in all papers but also tutorials and quizzes).
  2. Conduct research in data science in a reproducible and ethical way (demonstrated in all papers).
  3. Clearly communicate what was done, what was found, and why in writing (demonstrated in all papers).
  4. Understand what constitutes ethical high-quality data science practice, especially reproducibility and respect for those that underpin our data (demonstrated in all papers and selected quizzes).
  5. Respectfully identify strengths and weaknesses in the data science research conducted by others (demonstrated in quizzes, and the peer review).
  6. Develop the ability to appropriately choose and apply statistical models to real-world situations (demonstrated in the final paper).
  7. Conduct all aspects of the typical data science workflow (demonstrated in all papers).
  8. Reflect effectively on your own learning and professional development (demonstrated in some tutorials and quizzes).

Pre-requisites

Textbook

Telling Stories with Data

Content

Before class starts you should go through Chapter 1 and Appendix A of Telling Stories with Data.

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Week 8

Week 9

Week 10

Week 11

Week 12

Assessment

Summary

Item                               Weight (%)   Due date
Quiz                               8            Weekly, end of each week
SQL quiz                           1            Friday, noon, Week 11
Personal website                   1            Friday, noon, Week 11
Tutorial                           10           Weekly, end of each week
Paper 1                            25           Friday, noon, Week 3
Paper 2                            25           Friday, noon, Week 6
Paper 3                            25           Friday, noon, Week 8
Paper 4                            25           Friday, noon, Week 10
Final Paper (initial submission)   2            Wednesday, noon, Week 12
Conduct peer review                3            Friday, noon, Week 12
Final Paper                        25           Two weeks after that

You must submit Paper 1. You must submit the Final Paper. You must submit and get at least 70 per cent on both the SQL quiz and the Personal website.

Beyond that, you have scope to pick an assessment schedule that works for you. We will take your best three of the twelve tutorials for that 10 per cent, and your best five of the twelve quizzes for that 8 per cent. And we will take your two best papers from Papers 1-4 for that 50 per cent (25 per cent for each). The remainder is made up of 1 per cent each for the SQL quiz and the Personal website, 2 per cent for submitting a draft of the Final Paper, 3 per cent for conducting peer review of other people’s drafts of the Final Paper, and 25 per cent for the Final Paper.
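To make the arithmetic concrete, here is a minimal sketch in Python of how an overall mark could be assembled from the weights in the summary table. It is an illustration only, not the official calculation. The function names are hypothetical, every score is assumed to be a fraction between 0 and 1, an unsubmitted item (None) is assumed to count as zero, and the best tutorials and quizzes are assumed to be averaged with equal weight. The syllabus does not say what happens to the mark when a compulsory requirement is missed, so the sketch only reports whether the requirements were met.

```python
def best_mean(scores, k):
    """Mean of the k highest scores; missing items (None) count as 0 (assumption)."""
    filled = sorted((s or 0.0 for s in scores), reverse=True)
    top = (filled + [0.0] * k)[:k]  # pad in case fewer than k items exist
    return sum(top) / k


def course_mark(tutorials, quizzes, papers, sql_quiz, website,
                draft, peer_review, final_paper):
    """Return (mark out of 100, whether the compulsory requirements were met).

    All scores are fractions in [0, 1]; None means not submitted.
    `papers` holds the scores for Papers 1-4, in order.
    """
    top_papers = sorted((p or 0.0 for p in papers), reverse=True)[:2]
    mark = (
        10 * best_mean(tutorials, 3)    # best three tutorials
        + 8 * best_mean(quizzes, 5)     # best five quizzes
        + 25 * sum(top_papers)          # best two of Papers 1-4, 25 each
        + 1 * (sql_quiz or 0.0)
        + 1 * (website or 0.0)
        + 2 * (draft or 0.0)            # Final Paper (initial submission)
        + 3 * (peer_review or 0.0)
        + 25 * (final_paper or 0.0)
    )
    requirements_met = (
        papers[0] is not None           # Paper 1 must be submitted
        and final_paper is not None     # Final Paper must be submitted
        and (sql_quiz or 0.0) >= 0.7    # at least 70 per cent on the SQL quiz
        and (website or 0.0) >= 0.7     # at least 70 per cent on the website
    )
    return round(mark, 2), requirements_met


# A student who scored 0 on Paper 1 and skipped Paper 3.
print(course_mark(
    tutorials=[1.0] * 3 + [None] * 9,
    quizzes=[0.8] * 5 + [None] * 7,
    papers=[0.0, 0.9, None, 0.8],
    sql_quiz=1.0, website=0.8, draft=1.0, peer_review=1.0, final_paper=0.85,
))  # prints (86.95, True)
```

In this example the zero on Paper 1 does not affect the mark, because only the two best papers count, but Paper 1 still had to be submitted.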

Additional details:

Quiz

SQL quiz

Personal website

Tutorial

Paper #1

Paper #2

Paper #3

Paper #4

Final Paper

Other

Children in the classroom

Babies (bottle-feeding, nursing, etc) are welcome in class as often as necessary. You are welcome to take breaks to feed your infant or express milk as needed, either in the classroom or elsewhere including here. A list of baby change stations is also available here. Please communicate with me so that I can make sure that we have regular breaks to accommodate this.

For older children, I understand that unexpected disruptions in childcare can happen. You are welcome to bring your child to class in order to cover unforeseeable gaps in childcare.

Accommodations with regard to assessment

Please do not reveal your personal or medical information to me. I understand that illness or personal emergencies can happen from time to time. The following accommodations to assessment requirements exist to provide for those situations.

Straightforward (will automatically apply to all students - there’s no need to ask for these):

So for those (with the exception of Paper #1), if you have a situation, then just don’t submit.

Slightly more involved:

Re-grading

Requests to have your work re-graded will not be accepted within 24 hours of the release of grades. This is to give you a chance to reflect. Similarly, requests to have your work re-graded more than seven days after the release of the grades will not be accepted. This is to ensure the course runs smoothly.

Inside that 1-7 day period if you would like to request a re-grade, please email . Please specify where the marking error was made in relation to the marking guide. The entire assessment will be re-marked and it is possible that your grade could reduce.

Plenty of students get 0 on the first paper, but go on to get an A+ overall in the course. The nature of the work in this course requires students to adjust from what is expected in other courses, and the forgiving assessment weighting is designed to allow this.

Plagiarism and integrity

Please do not plagiarize. In particular, be careful to acknowledge the source of code - if it is extensive then through proper citation and if it is just a couple of lines from Stack Overflow then in a comment immediately next to the code.

You are responsible for knowing the content of the University of Toronto’s Code of Behaviour on Academic Matters.

Academic offenses include (but are not limited to) plagiarism, cheating, copying R code, communication or use of extra resources during closed-book assessments, and purchasing labor for assessments (of any kind). Academic offenses will be taken seriously and dealt with accordingly. If you have any questions about what is or is not permitted in this course, please contact me.

Please consult the University’s site on Academic Integrity. Please also see the definition of plagiarism in section B.I.1.(d) of the University’s Code of Behaviour on Academic Matters available here. Please read the Code. Please review Cite it Right and if you require further clarification, consult the site How Not to Plagiarize.

Late policy

You are expected to manage your time effectively. If no extension has been granted and no accommodation applies, then the late submission of an assessment item carries a penalty of 10 percentage points per day, up to a maximum of one week, after which it will no longer be accepted. For example, a problem set that would otherwise have received 8/10 will receive 7/10 if submitted one day late, and 6/10 if submitted two days late.
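The penalty can be sketched as a small Python function. This is an illustration under stated assumptions, not the official rule: the function name is hypothetical, `days_late` is assumed to be a whole number of days (how part-days are rounded is not specified above), and the mark is assumed not to go below zero.

```python
def late_adjusted(score, out_of, days_late):
    """Mark after the late penalty, or None if the item is no longer accepted.

    Assumes whole days late and a floor of zero; neither is stated in the policy.
    """
    if days_late <= 0:
        return score                        # on time: no penalty
    if days_late > 7:
        return None                         # more than one week late: not accepted
    penalty = 0.10 * out_of * days_late     # 10 percentage points per day
    return max(score - penalty, 0.0)


print(late_adjusted(8, 10, 1))  # prints 7.0
print(late_adjusted(8, 10, 2))  # prints 6.0
```

Because the penalty is in percentage points of the total available, each day costs the same amount regardless of the mark the work would otherwise have received.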

Writing

Papers and reports should be well-written, well-organized, and easy to follow. They should flow easily from one point to the next. They should have proper sentence structure, spelling, vocabulary, and grammar. Each point should be articulated clearly and completely without being overly verbose. Papers should demonstrate your understanding of the topics you are studying in the course and your confidence in using the terms, techniques and issues you have learned. As always, references must be properly included and cited. If you have concerns about your ability to do any of this then please make use of the writing support provided by the faculty, colleges and the SGS Graduate Centre for Academic Communication.

Minimum submission requirement

If you will be unable to submit at least two term papers, or unable to submit the Final Paper, then it would be unfair to the other students to allow you to pass the course. Please ensure you and your college registrar or faculty/department advisor get in touch with me as early as possible if this may be the case for you.

Relationship to PhD Student Learning Outcomes