Foundations of data sciences

The data sciences have a common concern: How can others be confident that our statistical approaches have been brought to bear on appropriate datasets? This course focuses on the ‘data’ of the data sciences. It develops in students an appreciation for the many ways in which dealing with a dataset can get out of hand, and establishes approaches to ensure data science is conducted in ways that engender trusted findings. It focuses on everything that comes before modelling, and by focusing on those steps places modelling and analysis on a firmer foundation.

Preamble

Overview

The course will be an enormous amount of work and cause you some amount of stress. This is unfortunate, but there’s little way around it. All I can tell you is that, having done this course, you’ll find this kind of work easier in the future.

The purpose of this course is to develop students who appreciate, and can iterate on, the foundations of the data sciences.

This course will require students to:

Essentially this course provides students with everything that they need to know to be able to do the most exciting thing in the world: use data to tell convincing stories.

FAQ

Learning objectives

The purpose of the course is to develop the core skills common to all of the data sciences across academia and industry. By the end of the course, you should be able to:

  1. Engage critically with ideas and readings in data science.
  2. Conduct research in data science in a reproducible and ethical way.
  3. Write and present your research.
  4. Understand what constitutes ethical, high-quality data science practice, especially reproducibility and respect for those who underpin our data.
  5. Respectfully identify strengths and weaknesses in the data science research conducted by others.
  6. Develop the ability to appropriately choose and apply statistical models to real-world situations.
  7. Conduct all aspects of the typical data science workflow.
  8. Reflect effectively on your own learning and professional development.

Pre-requisites

Textbook

Telling Stories with Data

Content

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Week 8

Week 9

Week 10

Week 11

Week 12

Assessment

Summary

Item | Weight (%) | Due date
Quiz | 10 | Weekly, end of each week
Tutorial | 10 | Weekly, end of each week
Paper 1 | 25 | Friday, noon, Week 4
Paper 2 | 25 | Friday, noon, Week 6
Paper 3 | 25 | Friday, noon, Week 8
Paper 4 | 25 | Friday, noon, Week 10
Final Paper (initial submission) | 1 | Wednesday, noon, Week 12
Final Paper (peer review) | 4 | Friday, noon, Week 12
Final Paper | 25 | Two weeks after that

You must submit Paper 1. And you must submit the Final Paper.

Beyond that, you have scope to pick an assessment schedule that works for you. We will take your best two of the eleven tutorials for that 10 per cent, and your best four of eleven quizzes for that 10 per cent. And we will take your two best papers from Papers 1-4 for that 50 per cent (25 per cent for each). The remainder is made up of 1 per cent for submitting a draft of the Final Paper, 4 per cent for peer reviewing other people’s drafts of the Final Paper, and 25 per cent for the Final Paper.
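The weighting above can be sketched as a short calculation. This is only an illustration, not the official grading procedure: the function names are hypothetical, and it assumes every item is scored out of 100, that a missing submission counts as zero, and that the peer-review component is scored rather than awarded all-or-nothing. It also does not enforce the rule that Paper 1 and the Final Paper must be submitted.

```python
def best_mean(scores, k):
    """Mean of the k highest scores (assumption: missing items count as 0)."""
    top = sorted(scores, reverse=True)[:k]
    top += [0.0] * (k - len(top))  # pad if fewer than k items were submitted
    return sum(top) / k


def course_grade(quizzes, tutorials, papers, draft_submitted, peer_review, final_paper):
    """Combine component scores (each out of 100) into a course grade out of 100."""
    quiz_part = 10 * best_mean(quizzes, 4) / 100       # best four quizzes, 10 per cent
    tutorial_part = 10 * best_mean(tutorials, 2) / 100  # best two tutorials, 10 per cent
    paper_part = 50 * best_mean(papers, 2) / 100        # best two of Papers 1-4, 25 per cent each
    draft_part = 1.0 if draft_submitted else 0.0        # 1 per cent for submitting a draft
    review_part = 4 * peer_review / 100                 # 4 per cent for peer review
    final_part = 25 * final_paper / 100                 # 25 per cent for the Final Paper
    return quiz_part + tutorial_part + paper_part + draft_part + review_part + final_part


# Hypothetical student: three papers submitted, only the best two count.
example = course_grade(
    quizzes=[80, 60, 90, 70, 50],
    tutorials=[100, 50, 0],
    papers=[60, 80, 70],
    draft_submitted=True,
    peer_review=100,
    final_paper=80,
)
print(example)  # 77.5
```

In this made-up example the lowest quiz, the lowest tutorial, and the lowest paper are all dropped, which is why skipping an item does not necessarily reduce the grade.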

Additional details:

Quiz

Tutorial

Paper #1

Paper #2

Paper #3

Paper #4

Final Paper

Other

Children in the classroom

Babies (bottle-feeding, nursing, etc.) are welcome in class as often as necessary. You are welcome to take breaks to feed your infant or express milk as needed, either in the classroom or elsewhere, including the locations listed at: https://familycare.utoronto.ca/childcare/breastfeeding-at-u-of-t/. A list of baby change stations is also available: https://familycare.utoronto.ca/childcare/baby-change-stations-at-u-of-t/. Please communicate with me so that I can make sure that we have regular breaks to accommodate this. For older children, I understand that unexpected disruptions in childcare can happen. You are welcome to bring your child to class in order to cover unforeseeable gaps in childcare.

Accommodations with regard to assessment

Please do not reveal your personal or medical information to me. I understand that illness or personal emergencies can happen from time to time. The following accommodations to assessment requirements exist to provide for those situations.

Straightforward (these apply automatically to all students, so there’s no need to ask for them):

So for those (with the exception of Paper #1), if you have a situation, then just don’t submit.

Slightly more involved:

Re-grading

Requests to have your work re-graded will not be accepted within 24 hours of the release of grades. This is to give you a chance to reflect. Similarly, requests to have your work re-graded more than seven days after the release of the grades will not be accepted. This is to ensure the course runs smoothly.

Inside that 1-7 day period if you would like to request a re-grade, please email . Please specify where the marking error was made in relation to the marking guide. The entire assessment will be re-marked and it is possible that your grade could reduce.
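As a rough sketch of the re-grade window, the check below accepts a request only if it arrives at least 24 hours and at most seven days after grades are released. The function name is hypothetical, and treating both boundaries as inclusive is an assumption; the policy text does not say what happens at exactly 24 hours or exactly seven days.

```python
from datetime import datetime, timedelta


def regrade_request_allowed(released_at, requested_at):
    """Return True if the request falls inside the 1-7 day window.

    Assumption: both ends of the window are inclusive.
    """
    elapsed = requested_at - released_at
    return timedelta(hours=24) <= elapsed <= timedelta(days=7)


# Hypothetical release time, for illustration only.
released = datetime(2024, 1, 1, 12, 0)
too_early = regrade_request_allowed(released, released + timedelta(hours=12))
in_window = regrade_request_allowed(released, released + timedelta(days=3))
too_late = regrade_request_allowed(released, released + timedelta(days=8))
```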

Plagiarism and integrity

Please do not plagiarize. In particular, be careful to acknowledge the source of code - if it’s extensive then through proper citation and if it’s just a couple of lines from Stack Overflow then in a comment immediately next to the code.

You are responsible for knowing the content of the University of Toronto’s Code of Behaviour on Academic Matters.

Academic offenses include (but are not limited to) plagiarism, cheating, copying R code, communicating with others or using extra resources during closed-book assessments, and purchasing labor for assessments of any kind. Academic offenses will be taken seriously and dealt with accordingly. If you have any questions about what is or is not permitted in this course, please contact me.

Please consult the University’s site on Academic Integrity http://academicintegrity.utoronto.ca/. Please also see the definition of plagiarism in section B.I.1.(d) of the University’s Code of Behaviour on Academic Matters http://www.governingcouncil.utoronto.ca/Assets/Governing+Council+Digital+Assets/Policies/PDF/ppjun011995.pdf. Please read the Code. Please review Cite it Right and if you require further clarification, consult the site How Not to Plagiarize http://advice.writing.utoronto.ca/wp-content/uploads/sites/2/how-not-to-plagiarize.pdf.

Late policy

You are expected to manage your time effectively. If no extension has been granted and no accommodation applies, then the late submission of an assessment item carries a penalty of 10 percentage points per day, up to a maximum of one week, after which it will no longer be accepted. For example, a problem set submitted a day late that would have otherwise received 8/10 will receive 7/10. If that same problem set were submitted two days late, then it would receive 6/10.
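The late penalty can be illustrated with a small calculation. This is a sketch under stated assumptions rather than the official rule: the function name is hypothetical, scores are expressed as percentages, lateness is given in whole days, and the score is assumed not to fall below zero.

```python
def late_score(raw_score, days_late):
    """Apply a 10-percentage-point penalty per day late.

    raw_score is a percentage (0-100). Returns None when the item is more
    than seven days late, because it is then no longer accepted.
    Assumption: the penalised score is floored at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 7:
        return None
    return max(0.0, raw_score - 10.0 * days_late)


# The example from the policy: 8/10 is 80 per cent.
one_day = late_score(80.0, 1)     # 70.0, i.e. 7/10
two_days = late_score(80.0, 2)    # 60.0, i.e. 6/10
over_a_week = late_score(80.0, 8)  # None: no longer accepted
```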

Writing

Papers and reports should be well-written, well-organized, and easy to follow. They should flow easily from one point to the next. They should have proper sentence structure, spelling, vocabulary, and grammar. Each point should be articulated clearly and completely without being overly verbose. Papers should demonstrate your understanding of the topics you are studying in the course and your confidence in using the terms, techniques, and issues you have learned. As always, references must be properly included and cited. If you have concerns about your ability to do any of this, then please make use of the writing support provided by the faculty, the colleges, and the SGS Graduate Centre for Academic Communication.

Minimum submission requirement

If you will not be able to submit at least two term papers, or will be unable to submit the final paper, then it would be unfair to the other students to allow you to pass the course. Please ensure you and your registrar get in touch with me as early as possible if this may be the case for you.