Toronto Data Workshop

Overview

The Toronto Data Workshop (TDW) brings together academia and industry to share data science best practice. We are broadly interested, but especially in the data-centric aspects of a data science project. We meet weekly for an hour, typically Fridays at noon (Toronto time), hybrid (in-person at the Faculty of Information and online via Zoom), with most talks recorded and shared. For an invitation please sign up here. Anyone is welcome to attend—it is free and you don’t need to be affiliated with the university.

Current organizing committee:

Past committee members:

  • Amy Farrow (2021-22)
  • Faria Khandaker (2020-21)
  • Lorena Almaraz De La Garza (2021-22)

The TDW is a joint initiative between the Faculty of Information and the Department of Statistical Sciences at the University of Toronto and we especially thank Wendy Duff and Radu Craiu for their support.

Fall 2023

  • Friday 22 September 2023, noon - 1pm
    • Saloni Dattani, Our World in Data
    • Missing data in global health
    • Saloni Dattani is a researcher on global health at Our World in Data, and an editor at Works in Progress. She’s interested in everything related to global health, epidemiology and meta-science.
  • Friday 29 September 2023, noon - 1pm
    • Nima Sarajpoor, Manulife
    • STUMPY: A powerful tool for modern time series analysis
    • Nima Sarajpoor is a data scientist in Manulife, working in Fraud Detection. He has been contributing to the software STUMPY for about two years
  • Friday 6 October 2023, noon - 1pm
    • Fabrizio Dell’Acqua, Harvard Business School
    • Experimental evidence from a study of Boston Consulting Group consultants on the effect of access to GPT-4 on performance. Paper here.
    • Fabrizio Dell’Acqua is a postdoctoral research fellow and teaching fellow at Harvard Business School and the Laboratory for Innovation Science at Harvard (LISH). He received a Ph.D. in Management from Columbia Business School. His research focuses on the areas of automation, human/AI collaboration, and business ethics.
  • Friday 13 October 2023, noon - 1pm
    • Tom Cardoso, The Globe and Mail
    • Secret Canada
    • Tom Cardoso is a member of The Globe and Mail’s investigations team based in Toronto. Since the fall of 2021, he has been investigating Canada’s broken freedom of information systems.
  • Friday 20 October 2023, noon - 1pm
    • Marzieh Fadaee, Cohere for AI
    • Aya: An Open Science Initiative to Accelerate Multilingual AI Progress
  • Friday 27 October 2023, noon - 1pm
    • Wendy Foster, Shopify
    • Socio-technical processes for data integrity
    • Wendy Foster is Director of Engineering and Data, Optimize, at Shopify
  • Friday 3 November 2023, noon - 1pm
    • Apoorva Lal, Netflix
    • Modern balancing methods for causal inference
    • Data scientist on the experimentation team at Netflix working on observational causal inference with spatial and panel data
  • Friday 17 November 2023, noon - 1pm
    • Kevin Wilson, Borealis AI

Past talks

Fall 2022

  • Friday 9 September 2022, noon - 1pm
    Ryan Briggs, University of Guelph
    Statistical power in political science
    Ryan Briggs is a social scientist at the University of Guelph
  • Thursday 22 September 2022, 5pm - 6pm
    Melina Vidoni, Australian National University
    Dr Vidoni is a Lecturer at the Australian National University in the CECS School of Computing, where she continues her domestic and international collaborations with Canada and Germany. Dr Vidoni’s main research interests are mining software repositories, technical debt, software development, and empirical software engineering when applied to data science and scientific software.
  • Friday 30 September 2022, noon - 1pm
    Emily Giambalvo and Ence Morse, The Washington Post
    How the NFL blocks Black coaches
    Emily Giambalvo covers University of Maryland athletics for The Washington Post, where she has worked since June 2018. Emily grew up in South Carolina and graduated from the University of Georgia.
    Clara Ence Morse is an Investigative Reporting Workshop intern with The Washington Post’s data desk. She is a student at Columbia University and the editor in chief of the Columbia Daily Spectator.
  • Friday 7 October 2022, noon - 1pm
    Rohan Alexander, University of Toronto
    Rethinking data science
  • Friday 14 October 2022, noon - 1pm
    April Wang, University of Michigan
    Reimagining Tools for Collaborative Data Science
    April Wang is a Ph.D. candidate at University of Michigan School of Information, advised by Dr. Steve Oney and Dr. Christopher Brooks. Her research in human-computer interaction (HCI) explores barriers in real-world data science programming practices, and reimagines the workflow and interfaces for collaborative data science environments.
  • Friday 21 October 2022, noon - 1pm
    Meg Risdal, Kaggle (Google)
    Meg Risdal is a lead Product Manager for Kaggle (a Google company) where she works with software developers, designers, and researchers to create great experiences for people learning ML, ML practitioners, and ML researchers.
  • Friday 4 November 2022, noon - 1pm
    Maitreyee Sidhaye and Meggie Debnath, St. Michael’s Hospital, Unity Health Toronto
    The Things We Learned from Deploying AI in Healthcare
    Maitreyee and Meggie are data scientists in the Data Science & Advanced Analytics (DSAA) unit at St. Michael’s Hospital in Toronto. DSAA is a multiteam unit in the hospital that provides data sciecne and machine learning solutions across a variety of problems: clinical prediction tools, staffing optimization, imaging, and more. DSAA works very closely with others in the hospital to collaborate on understanding problems and providing solutions. Maitreyee and Meggie will share their experiences and learnings from building and deploying machine learning tools in the hospital.
  • Friday 18 November 2022, noon - 1pm
    Lindsay Katz, University of Toronto
    A new, comprehensive database of all proceedings of the Australian Parliamentary Debates
    Lindsay Katz holds a Masters of Statistics from the University of Toronto and a Bachelor of Arts and Science from the University of Guelph where she specialized in Mathematical Science and International Development. At Guelph she worked with Professor Ryan Briggs to explore lived poverty in Africa using Afrobarometer data. At Toronto she works with Professor Monica Alexander to research demographic variation in short-term migration patterns using Facebook data, and with Professor Rohan Alexander to digitize the Australian parliamentary debates from 1901 to present. As an interdisciplinary researcher, she is interested in using statistics to better understand social processes in the world.
  • Friday 25 November 2022, noon - 1pm
    Marcel Fortin and Leanne Trimble, U of T Map & Data Library
    Recent additions to the Map and Data Library
    The Map and Data Library’s data collections, software & support, with a focus on recently acquired datasets.
    Leanne Trimble is a Data & Statistics Librarian, and Marcel Fortin is Head, Map and Data Library.
  • Friday 16 December 2022, noon - 1pm
    Zane Schwartz, Investigative Journalism Foundation
    Zane Schwartz is the editor-in-chief of the Investigative Journalism Foundation.

Winter 2022

  • Fri 28 Jan 2022, noon - 1pm
    Ashok Chaurasia, University of Waterloo
    Multiple Imputation: Old and New Combining Rules for Statistical Inference
    Dr. Ashok Chaurasia is an Assistant Professor (of Statistics) in the School of Public Health Sciences at University of Waterloo. His background/training is in Statistics, with research interests in topics of Missing Data, Data Imputation, Model Selection, and Longitudinal Data Analysis Methodology.
  • Fri 4 Feb 2022, noon - 1pm
    Nick Huntington-Klein, Seattle University
    The Influence of Hidden Researcher Decisions in Applied Microeconomics
    I am an economics professor at Seattle University, with research that focuses on higher education, econometrics, and metascience.
  • Fri 11 Feb 2022, noon - 1pm
    Silvia Canelón, University of Pennsylvania
    Lessons Learned from EHR Research
    Silvia Canelón is a postdoctoral research scientist in the Department of Biostatistics, Epidemiology, and Informatics at the University of Pennsylvania where she applies biomedical informatics to population health research. She uses R to work on projects that develop novel data mining methods to extract pregnancy-related information from Electronic Health Records (EHR) and that study the relationship between environment and disease.
  • Fri 18 Feb 2022, noon - 1pm
    Vincent Arel-Bundock, Université de Montréal
    What modelsummary taught me about R package development
    I am a political science professor at the Université de Montréal.
  • Fri 4 Mar 2022, noon - 1pm
    Maria Kamenetsky, University of Wisconsin-Madison
    Spatial clustering
    I am a PhD candidate in Epidemiology at the University of Wisconsin-Madison, where I also completed my MS in Statistics. My research focuses on methods in spatial epidemiology, specifically working on statistical methods and applications in spatial cluster detection.
  • Fri 11 Mar 2022, noon - 1pm
    Irena Papst, McMaster University
    Using models to help guide pandemic response
    I’m a postdoctoral fellow in McMaster’s Mathematics & Statistics department, where I work on mathematical modelling, especially of infectious disease dynamics. I did my PhD in Cornell’s Center for Applied Mathematics. I care deeply about reproducible research, clear scientific communication, good teaching, and big salads.
  • Fri 18 Mar 2022, noon - 1pm
    May Chan and Ramses Van Zon
  • Thu 24 Mar 2022, 5pm - 6pm
    Emi Tanaka, Monash University
    An anthology of experimental designs
    Dr. Emi Tanaka is an assistant professor in statistics at Monash University whose primary interest is to develop impactful statistical methods and tools that can readily be used by practitioners. Her research area include data visualisation, mixed models and experimental designs, motivated primarily by problems in bioinformatics and agricultural sciences. She is currently the President of the Statistical Society of Australia Victorian Branch and the recipient of the Distinguished Presenter’s Award from the Statistical Society of Australia for her delivery of a wide-range of R workshops.
  • Fri 1 Apr 2022, noon - 1pm
    Brittany Witham, Geopolitica
    Data science at a startup
    Originally from Melbourne, Australia, Brittany received her B.A. in International Studies from the University of Saskatchewan and started her career in economic development, equipping her with comprehensive knowledge of foreign direct investment and international business early in her career. She went on to obtain an M.A. in European and Russian Affairs from the University of Toronto in 2018, where she first discovered the potential of programming for political science and became fascinated with artificial intelligence (AI). Over the past three years, Brittany has worked in many facets of the AI industry, from leading research and development of new AI products for video game developers to building automated data pipelines for business intelligence and managing software engineering and client engagement teams. In that time, she has honed technical skills in full-stack development, machine learning, and data engineering. She recently struck out on her own to launch an online global event monitoring tool and deliver novel solutions to clients in the political risk and social enterprise sectors. Brittany is a firm believer in the potential for data-driven technologies for geopolitics and is excited to contribute to the many discoveries to be made in this space.

Fall 2021

  • Friday, 24 September 2021, noon - 1pm
    Karen Chapple, Department of Geography and Planning/School of Cities
    Data science and geography, planning, cities
    Karen Chapple is the inaugural Director of the School of Cities and Professor of Geography and Planning at the University of Toronto. Her research uses data science methods to identify and predict gentrification and displacement in cities. She is Professor Emerita at the University of California, Berkeley, where she helped to launch the undergraduate data science program.
  • Friday, 1 October 2021, noon - 1pm
    The 2021 Canadian Election
    David Andrews
    Daniel Rubenson
    Johnson Vo
    Eric Zhu, Brian Diep, Ashely (Jing Yuan) Zhang, Kristin (Xi Yu) Huang, and Tanvir Hyder
  • Friday, 8 October 2021, noon - 1pm
    Fedor Dokshin, Department of Sociology
    Data science and sociology
    Fedor Dokshin is an Assistant Professor of Sociology at the University of Toronto. He is a computational social scientist with research interests in social networks, organizations, and energy and the environment. Across these domains, Fedor leverages data science methods and novel data sources to improve existing measurement strategies.
  • Friday, 15 October 2021, noon - 1pm
    Drew Stommes, Department of Political Science, Yale University
    On the reliability of published findings using the regression discontinuity design in political science
    Drew Stommes is a doctoral candidate in the Department of Political Science at Yale University, where he researches democracy, political violence, and quantitative methods. He will talk about a recent working paper.
  • Friday, 22 October 2021, noon - 1pm
    Tegan Maharaj, Faculty of Information
    Data science and information
    I study AI systems and “what goes into” them, e.g. their real-world deployment context, and the effects that has on learning behaviour and generalization. I do that because I want to be able to use AI systems responsibly for problems I think are important, like impact and risk assessments for climate change, AI alignment, ecological management and other common-good problems. My website is: http://www.teganmaharaj.com/.
  • Friday, 29 October 2021, noon - 1pm
    Josh Speagle, Astronomy & Astrophysics, Dunlap Institute, Statistical Sciences
    Data science and astronomy
    Josh is a Banting & Dunlap Postdoctoral Fellow at the University of Toronto whose research focuses on using astrostatistics and “data science” to understand how galaxies like our own Milky Way form, behave, and evolve.
  • Friday, 5 November 2021, noon - 1pm
    Yun William Yu, Math Department, UofT; UTSC Computer & Mathematical Sciences
    Data science and math
    Yun William Yu is an assistant professor in the math department at UofT whose research focuses on algorithmic methods for computational biology and medical informatics.
  • Friday, 12 November 2021, noon - 1pm
    Ann Glusker, Doe Library, University of California Berkeley
    “Supporting Big Data Research at the University of California, Berkeley”
    Dr Ann Glusker is Librarian for Sociology, Demography, Public Policy, Psychology (fall 2021) & Quantitative Research at the Doe Library, University of California, Berkeley. She will discuss a recently released report ‘Supporting Big Data Research at the University of California, Berkeley’. This report provides insights on researcher practices and challenges in six thematic areas: data collection & processing; analysis: methods, tools, infrastructure; research outputs; collaboration; training; and balancing domain vs data science expertise.
  • Friday, 19 November 2021, noon - 1pm
    Radu Craiu, Statistical Sciences, Univeristy of Toronto
    Data science and statistical sciences
    Dr Radu V. Craiu is Professor and Chair of Statistical Sciences at the University of Toronto. His main research interests are in computational methods in statistics, especially, Markov chain Monte Carlo algorithms (MCMC), Bayesian inference, copula models, model selection procedures and statistical genetics.
  • Friday, 26 November 2021, noon - 1pm
    Kieran Campbell, Lunenfeld Tanenbaum Research Institute
    Data science and Biomedicine
    Dr Kieran Campbell is an investigator at the Lunenfeld-Tanenbaum Research Institute and an assistant professor at the Departments of Molecular Genetics and Statistical Sciences, University of Toronto. His research focusses on Bayesian models and machine learning for high dimensional biomedical data, including single-cell and cancer genomics. Recently, he has led efforts to develop statistical machine learning methodology to integrate single-cell RNA and DNA sequencing data to uncover the effects of tumour clonal identity on gene expression, as well as methods to automatically delineate the tumour microenvironment from single-cell RNA-sequencing data. Such findings can improve our understanding of cancer progression and of why certain tumours are resistant to therapies, leading to relapse.
  • Friday, 3 December 2021, noon - 1pm
    Leanne Trimble, UofT Libraries
    Data science and libraries
    Leanne Trimble is a data librarian at the Map & Data Library, University of Toronto Libraries.
  • Friday, 10 December 2021, noon - 1pm
    Nathan Taback, Departments of Statistical Sciences
    Teaching data science
    Nathan Taback is the director of Data Science programs and an Associate Professor, Teaching Stream in the Department of Statistical Sciences, and Computer Science (cross-appointed) at the University of Toronto. He currently serves as a Special Advisor to the Dean of Arts and Science on Computational and Data Science Education.

Summer 2021

  • Friday, 21 May 2021, noon - 1pm
    David Shor, OpenLabs
    Political data science. David is an American data scientist who tries to elect Democrats. He is known for analyzing political polls and currently serves as head of data science with OpenLabs, a progressive nonprofit, and also as a Senior Fellow with the Center for American Progress Action Fund.
  • Friday, 28 May 2021, noon - 1pm
    Samantha Pierre, University of Toronto
    And The Nominees Are… An Empirical Study of the Effects of a Tony Award Win and Nomination on a Show’s Success
    Samantha is a fourth-year statistics student studying at the University of Toronto. Throughout the past year she has combined her love for theatre and statistics to analyze trends in the theatre community. She volunteers as a member of PAIR-CG to create a representational framework for the international theatre community. She currently works at WOMBO, an app developed by former U of T students, as head of music content.
  • Friday, 4 June 2021, noon - 1pm
    Heather Krause, We All Count
    Equity in Data (or, how not to accidentally use data like a racist, sexist, colonialist, etc)
    Heather remains unconvinced. As a statistician with decades of global experience working on complex data problems and producing real-world knowledge, she has developed the Data Equity Framework to address the equity issues in data products and research projects. Her emphasis is on combining strong statistical analysis with clear and meaningful communication. She is currently working on implementing tools for equity and ethics in data. As the founder of two successful data science companies, she attacks the largest questions facing societies today, working with both civic and corporate organizations to improve outcomes and lives. Her relentless pursuit of clarity and realism in these projects pushed her beyond pure analysis to mastering the entire data ecosystem including award-winning work in data sourcing, modeling, and data storytelling, each incorporating bleeding edge theory and technologies. Heather is the founder of We All Count, a project for equity in data working with teams across the globe to embed a lens of ethics into their data products from funding to data collection to statistical analysis and algorithmic accountability. Her unique set of tools and contributions have been sought across a range of clients from MasterCard and Volkswagen to the United Nations, the Syrian Refugee Resettlement Secretariat, Airbnb, and the Bill and Melinda Gates Foundation. She is on the Data Advisory Board of the UNHCR.
  • Friday, 11 June 2021, noon - 1pm
    Laura Bronner, Data scientist
    Quantitative editing
    Laura is a data scientist, who most recently worked as the quantitative editor at FiveThirtyEight. More generally, she is a data scientist with an interest in causal inference, political science and quantitative text analysis. Before FiveThirtyEight, she was a Senior Analyst at the Analyst Institute, designing and analyzing field experiments for the 2018 election cycle. In September 2018, she completed a PhD in Political Science at the London School of Economics’ Department of Government.
  • Friday, 18 June 2021, noon - 1pm
    Jacob Matson, Simetric
    From data ask to dashboard
    Jacob is VP of Finance & Operations at Simetric, Inc..
  • Friday, 25 June 2021, noon - 1pm
    Laura Derksen, University of Toronto Mississauga
    The impact of student access to Wikipedia
    Laura is the Amgen Canada Professor in Health System Strategy at the University of Toronto Mississauga and assistant professor in Strategic Management at the Rotman School of Management. Her research interests are development and global health education, information and networks.
  • Friday, 2 July 2021, noon - 1pm
    Zachary McCaw, Google
    Zachary McCaw is a data scientist at Google.
  • Friday, 16 July 2021, noon - 1pm
    Kamilah Ebrahim, University of Toronto
    Trust in contact tracing apps
    Kamilah Ebrahim received a B.A. in Economics from the University of Waterloo in 2019 and is currently pursuing a Masters of Information in Human Centred Data Science at the University of Toronto. Kamilah is a 2020-21 Graduate Fellow at the University of Toronto Centre for Ethics focusing on the intersection between race, economics and data monopolies in Canada. Prior to joining the University of Toronto she held roles at the United Nation Economic and Social Commission for Asia and the Pacific (UN ESCAP), as well as the Canadian federal government.
  • Friday, 23 July 2021, noon - 1pm
    Annie Collins and Rohan Alexander, University of Toronto
    Reproducibility of COVID-19 pre-prints
    Annie Collins is an undergraduate student at the University of Toronto specializing in applied mathematics and statistics with a minor in history and philosophy of science. In her free time, she focusses her efforts on student governance, promoting women’s representation in STEM, and working with data in the non-profit and charitable sector.
    Rohan Alexander is an assistant professor at the University of Toronto in Information and Statistical Sciences, and a faculty affiliate at the Schwartz Reisman Institute for Technology and Society. He holds a PhD in Economics from the Australian National University.
  • Friday, 30 July 2021, noon - 1pm
    Keli Chiu, University of Toronto
    Detecting sexist and racist text contents with explanations accompanied with GPT-3
    Keli Chiu is a recent graduate of master in Information at the University of Toronto with the concentration in Human-Centred Data Science. Prior to pursuing her master in the information fields, she worked as a web developer and fell in love with data. Her research interests are natural language processing applications, text analysis and ethics in AI and machine learning. She received rstudio::global Diversity Scholarships in the year of 2021.
  • Friday, 6 August 2021, 12:30pm - 1pm
    Ijeamaka Anyene, Kaiser Permanente Division of Research
    Taking the next step past standard charts
    Ijeamaka is a data analyst working in healthcare research. She specializes in using R and SAS for data analysis, epidemiological research, and data visualizations. She is also passionate about computational art, knowledge sharing / dissemination, and how to mix the two.
  • Friday, 13 August 2021, noon - 1pm
    Independent Summer Statistics Community projects
    ‘Prospective Analytics’: Ashley Zhang, Eric Zhu, Muhammad Tsany and Sergio Zheng Zhou.
    ‘Statistically Significant’: Aliza Lakho, Chloris Jiang, Janhavi Agarwal, and José Casas on whether young professionals should move to Toronto.
    ‘Point Zero Five’: Pan Chen, Xiaoxuan Han, Yi Qin, and Yini Mao on the livability of Toronto for newcomers.
  • Friday, 20 August 2021, noon - 1pm
    Jessica Long, Simone Collier, Vinky Wang, Sophie Berkowitz, and Yun-Hsiang Chan, University of Toronto
    Using statistical model to analyzing shark, lizard, and basketball movement data
    Jessica Long, Simone Collier, Vinky Wang, Sophie Berkowitz, and Yun-Hsiang Chan are undergraduate students at the University of Toronto.

Winter 2021

Thanks to Paul Hodgetts for the Jays-inspired sticker.

  • Thursday, 14 January, 4:30-5:30pm
    Andrew Miles, University of Toronto
    Code, plots, and values.
    Jointly hosted with Elizabeth Parke and the UTM Collaborative Digital Research Space.
    Andrew Miles is Assistant Professor of Sociology at the University of Toronto and Director of the Morality, Action, and Cognition Lab.
  • Wednesday, 20 January, 4:30-5:30pm
    Zia Babar, University of Toronto
    Derivative data security
    Zia Babar obtained his PhD from the University of Toronto where his research studies focused on the analysis and design of data-centered information systems for enabling enterprise transformation. He is engaged in a multi-year research engagement with IBM Research Labs and is a startup technical mentor at WeWork Labs. He is the organizer of technology meetup groups in both Toronto and Waterloo, and a course instructor at the Faculty of Information, University of Toronto.
  • Thursday, 28 January, 4:30-5:30pm
    Irene Duah-Kessie, University of Toronto
    Exploring algorithmic bias and fairness and its impact on health outcomes faced by racialized communities
    Irene Duah-Kessie is a graduate of the University of Toronto’s Master of Science in Sustainability Management program. Throughout her studies, Irene published her research on racial income inequality in Toronto with the Wellesley Institute and is currently a part of the Turtle Island Journal of Indigenous Health Editorial Team. Irene is a Project Manager at Across Boundaries leading an initiative to address food security and mental health challenges in Toronto’s Black community. She is also the founder of Rise In STEM, a grassroots organization that aims to increase access to STEM learning opportunities in Black and marginalized communities.
  • Thursday, 4 February, 4:30-5:30pm
    Kathy Ge, Uber
    How data insights and experimentation help drive product design and intelligent recommendations on the Uber Eats platform
    Kathy is a data scientist with Uber Eats primarily focused on the shopping experience including ranking and recommendations throughout the order flow. She received her M.Sc. in Computer Science and B.Sc in Computer Science and Statistics from the University of Toronto.
  • Thursday, 11 February, 4:30-5:30pm
    Garrick Aden-Buie, R Studio
    Using R Markdown in general and in some specific projects
    Garrick is a Data Science Educator at RStudio who lives in sunny St. Petersburg, Florida. His passion is combining creative coding with programming education, using code to build tools that teach coding to new and advanced R users alike. Like tidyexplain: a project that used ggplot2 and gganimate to reimagine database operations as colorful flying boxes instead of the typical Venn diagrams. Garrick has developed a number of open source addins and packages for RStudio—such regexplain, shrtcts and rsthemes—and is always easily distracted by projects that combine R Markdown and online learning or teaching.
  • Monday, 15 February, Noon-1:00pm
    Emily Riederer, Capital One
    Causal design patterns for data analysts
    Emily is a Senior Analytics Manager at Capital One. Emily’s team focuses on reimagining analytical infrastructure by building data products, elevating business analysis with novel data sources and statistical methods, and providing consultation and training to partner teams.
  • Thursday, 18 February, 4:30-5:30pm
    University of Toronto DoSS toolkit launch
    Special guest Bethany White (Department of Statistical Sciences).
    Annie Collins, Haoluan Chen, Isaac Ehrlich, Mariam Walaa, Marija Pejcinovska, Mathew Wankiewicz, Michael Chong, Paul Hodgetts, Rohan Alexander, Samantha-Jo Caetano, Shirley Deng, and Yena Joo, University of Toronto
    The DoSS toolkit is a series of self-paced lessons that students can go through ahead of class, to achieve badges for various levels of accomplishment with R. Instructors can use the badges to work out the level of the class and either direct students to the toolkit to address deficiencies or cover missing aspects themselves.
  • Thursday, 4 March, 4:30-5:30pm
    Petros Pechlivanoglou, The Hospital for Sick Children (SickKids) Research Institute
    Simulation and retrospective data for health economic decision making
    Petros Pechlivanoglou, PhD, is a Scientist at The Hospital for Sick Children (SickKids) Research Institute and an Assistant Professor at the University of Toronto, Institute of Health Policy Management and Evaluation. He studied economics in his native country, Greece, econometrics at the University of Groningen, the Netherlands and obtained a PhD in health econometrics from the same university. He completed a post-doctoral fellowship at the University of Toronto, within the Toronto Health Economics and Technology Assessment (THETA) Collaborative where he focused on methodological aspects around the application of decision analysis in health-care policy.
  • Thursday, 11 March, 4:30-5:30pm
    Lucas Cherkewski, Canadian Digital Service
    Using publicly-available data to better understand the government’s operations
    Lucas Cherkewski is a policy advisor at the Canadian Digital Service (CDS). He helps delivery teams improve government services. From that experience, he advises on structural changes to make better services the default. This work includes plenty of data-enabled research and analysis—Lucas is in a happy place when his work leads him to spend an afternoon poking around a dataset, trying to better understand government so he can help change it.
  • Monday, 15 March, 4:00-5:00pm
    Todd Feathers, Freelance reporter
    Major Universities are Using Race as a “High Impact Predictor” of Student Success
    Jointly hosted with Maryclare Griffin, University of Massachusetts Amherst.
    Todd Feathers is a freelance journalist covering artificial intelligence, surveillance, and the technologies changing our world. He spent years at daily newspapers reporting on politics, criminal justice, and health care. On every beat, new tech is solving problems and creating them. His goal is to use data, scientific research, and inside sources to cut through the hype and examine what our gadgets and algorithms really do. Writing in Vice, OneZero, The Wall Street Journal, and others.
  • Thursday, 18 March, 4:30-5:30pm
    Sofia Ruiz-Suarez, National University of Comahue
    Animal tracking data
    Sofia Ruiz-Suarez holds an undergraduate degree in mathematics from the University of Buenos Aires and now is a PhD candidate at the institute for Research on Biodiversity and Environment. She also teaches mathematics at the University of Comahue and leads R-Ladies at her local city. Her research is focused on Bayesian statistics with applications in animal behaviour and movement.
  • Thursday, 25 March, 4:30-5:30pm
    Alex Cookson, Muse
    The power of great datasets Alex Cookson is a Data Scientist at Muse, where he helps make the most of their data. In his spare time, you can find him participating in Tidy Tuesday or thinking up cool datasets to explore. And when he’s not doing that, he’s probably cycling around Toronto or doting on his two cats, Tom Tom and Ruby.
    Alex will explore the power of great datasets, and discuss the importance of interesting, fun datasets as a way to guide and motivate learning R.
  • Thursday, 1 April, 4:30-5:30pm
    Vik Pant, Natural Resources Canada
    Supporting the Integration of Science and Policy through Data Science and Artificial Intelligence
    Dr. Vik Pant is the Chief Scientist and Chief Science Advisor of Natural Resources Canada (NRCan). He leads the Office of the Chief Scientist at NRCan and reports directly to the Deputy Minister. His office oversees the development and implementation of evidence-based science policy across NRCan sectors and agencies. His office also manages NRCan’s enterprise-wide technology strategy and portfolio of science products. He also runs the Digital Accelerator, which is an innovation platform for designing and launching AI-driven software products in NRCan. Vik is also the Founder of Synthetic Intelligence Forum, which is a leading community of practice focused on the industrial application of Artificial Intelligence (AI). He earned a doctorate from the Faculty of Information (iSchool) in the University of Toronto, a master’s degree in business administration with distinction from the University of London, a master’s degree in information technology from Harvard University, where he received the Dean’s List Academic Achievement Award, and an undergraduate degree in business administration from Villanova University. Vik serves as an Adjunct Professor in the Faculty of Information (iSchool) at the University of Toronto.
  • Thursday, 8 April, 4:30-5:30pm
    Mining Process Models from Email Data
    Faria Khandaker, University of Toronto
    Faria is a 2nd year student of the Information Systems and Design Concentration at the Faculty of Information and is one of the co-hosts of the Toronto Data Workshop. She holds an Honour’s Bachelor of Science Degree in Anthropology and Human Biology from the University of Toronto Scarborough. Since starting her masters, she became interested in research related to data-driven decision making within organizations. Under the supervision of Professor Arik Senderovich, she is researching topics related to the application of Machine Learning within the field of Process Mining and exploring various methodologies for gaining insights from email driven business processes.
    Faria will discuss mining process models from email data.
  • Thursday, 15 April, 4:30-5:30pm
    Emily A. Sellars, Yale University
    Missing data and mis-measurement in Mexico’s 1900 census and the Historical Archive of Localities (AHL)
    Emily A. Sellars is an assistant professor in the Department of Political Science at Yale University. Before coming to Yale, she was an assistant professor in the Bush School of Government and Public Service at Texas A&M University and a postdoctoral scholar at the University of Chicago’s Harris School of Public Policy. She received her Ph. D. in Political Science and Agricultural and Applied Economics from the University of Wisconsin–Madison in 2015. Her research interests are at the intersection of political economy and development economics. Her research examines the political economy of emigration and population.
  • Thursday, 22 April, 4:30-5:30pm
    Panel on teaching data-focused topics
    Aimee Schwab-McCoy, Creighton University
    Ashley Juavinett, UC San Diego
    Chris Papalia, St. Andrew’s College
    Samantha-Jo Caetano, University of Toronto
    Aimee Schwab-McCoy is an Assistant Professor of Statistics at Creighton University. Ashley Juavinett is a neuroscientist, an educator, and a writer, currently working as an Assistant Teaching Professor at UC San Diego. Chris Papalia is a Mathematics and Science Teacher and Head of House at St. Andrew’s College. Samantha-Jo Caetano is an Assistant Professor, Teaching Stream, at the University of Toronto.

Fall 2020

Thanks to Hidaya Ismail for the hex sticker.

  • Thursday, 3 September 2020, 4-5pm
    Using hospital data
    Erik Drysdale, The Hospital for Sick Children
    Erik works as a Machine Learning Specialist at the Hospital for Sick Children (SickKids) for the Goldenberg Lab and AI in Medicine (AIM) initiative. His professional responsibilities include the development and training of the machine learning models for various pediatric data science projects. His research interests are focused on the intersection of statistics and machine learning methods such as high-dimensional inference, survival analysis, and optimization methods. Erik will talk about the challenges of applying ML in hospital data and why statistics still matters in ML.
  • Tuesday, 8 September 2020, 3:30-4:30pm
    UK A levels algorithm issues Jointly hosted with Gillian Hadfield and the Schwartz Reisman Institute for Technology and Society.
    Sophie Bennett, Industry data scientist
    Sophie Bennett holds an undergraduate degree in Experimental Psychology from the University of Oxford and a PhD in Neuroscience from King’s College. She is the lead data scientist at Up Learn, a London-based online learning platform specialising in A levels. In this role, she conducts evaluations of course effectiveness and uses data to improve instruction and curriculum design. She is passionate about increasing the use of responsible evidence and statistics to guide social policy, and, in her spare time, enjoys working with publicly available datasets to explore London demographics, social issues and infrastructure. Sophie will discuss A Levels, Ofqual and algorithms.
  • Thursday, 10 September 2020, 4-5pm
    Toronto Data Lab launch event.
    A Mahfouz, University of Toronto
    Diego Mamanche Castellanos, University of Toronto
    Hidaya Ismail, University of Toronto
    Ke-Li Chiu, University of Toronto
    Paul Hodgetts, University of Toronto
    Various Toronto Data Lab projects will be presented including arxivdl, aRianna, cesR, and more!
  • Thursday, 17 September 2020, 4-5pm
    Cancer and AI
    Amber Simpson, Queen’s University
    Dr. Simpson is the Canada Research Chair in Biomedical Computing and Informatics and Associate Professor in the School of Computing (Faculty of Arts and Science) and Department of Biomedical and Molecular Sciences (Faculty of Health Sciences). She specializes in biomedical data science and computer-aided surgery. Her research group is focused on developing novel computational strategies for improving human health. Dr Simpson will discuss cancer and AI.
  • Thursday, 24 September 2020, 4-5pm
    Talking to non-statisticians about statistics
    Chelsea Parlett-Pelleriti, Chapman University
    Chelsea is a PhD candidate and full-time instructional faculty at Chapman University where her research focuses on using novel statistical and Machine Learning methods (mostly Bayesian statistics, IRT models, and clustering) to behavioral data. As an instructor she teaches Python, R and Data Science, and loves using novel technology (like TikTok, Twitch, and flipped classes) to better engage and inspire students. Chelsea will discuss talking to non-DS team members about DS topics.
  • Thursday, 1 October 2020, 4-5pm
    Canadian demographics by riding (1991-2015)
    Florence Vallée-Dubois, Université de Montréal
    Florence Vallée-Dubois is a Ph.D. candidate at the department of political science of the University of Montreal. She is also a member of the Centre for the Study of Democratic Citizenship and Canada Research Chair in Electoral Democracy. Her research interests focus on Quebec and Canadian politics, political behaviour and quantitative methods. Her doctoral project focuses on the political behaviour and democratic representation of seniors in Canada. Florence will discuss Canadian Demographics by Electoral Riding (1991-2015).
  • Thursday, 8 October 2020, 4-5pm
    Yim Register, University of Washington Data Lab
    Self-advocacy within machine learning systems
    Yim Register (they/them) is a radical optimist, child advocate, and PhD student at the University of Washington Data Lab exploring what self-advocacy looks like within machine learning systems. They study how empowering novices with Data Science knowledge can impact their participation and joy in an AI-driven world! Their passion project right now is writing a book called Life Lessons from Algorithms, a book that teaches how machine learning algorithms work through trauma recovery skills. Yim will discuss self-advocacy within machine learning systems.
  • Thursday, 22 October 2020, 4-5pm
    Panel discussion on data-focused resources at the University of Toronto.
    Jeff Waldman, University of Toronto
    Leanne Trimble, University of Toronto
    Leslie Barnes, University of Toronto
    Lisa Strug, University of Toronto
    Jeff Waldman is the Manager, Institutional Data Governance; Leslie Barnes is the Digital Scholarship Librarian at UTL; Leanne Trimble is the Data and Statistics Librarian at UTL; Lisa Strug is a Senior Scientist in the Program of Genetics and Genome Biology, Associate Director of The Centre for Applied Genomics, Professor of Statistical Sciences and Biostatistics at the University of Toronto, and Director of CANSSI Ontario.
  • Thursday, 29 October 2020, 4-5pm
    Fei Chiang, McMaster University
    Data currency and applications
    Fei Chiang is an Associate Professor in the Department of Computing and Software (Faculty of Engineering), the Director of the Data Science Research Group, and a Faculty Fellow at the IBM Centre for Advanced Studies. She served as an inaugural Associate Director of the MacData Institute. Her research interests and industrial experience is in data management, spanning data cleaning, data quality, data privacy, data fusion, and database systems. Professor Chiang will discuss data currency and its applications.
  • Thursday, 5 November 2020, 4-5pm
    Andrew Whitby, Industry data scientist
    Censuses: The sum of the people
    Andrew is a data scientist and economist currently looking for his next challenge. He is particularly interested in the economics of technology, creativity, innovation and growth. He wrote The Sum of the People: How the Census Has Shaped Nations from the Ancient World to the Modern Age which was published in March 2020. Previously, he worked as a Data Scientist at the World Bank, and at Nesta, the UK’s innovation think tank. His academic background combines economics, statistics and computer science. He completed his doctoral research in the Department of Economics at the University of Oxford. Andrew will discuss censuses.
  • Monday, 9 November 2020, 4-5pm
    Tom Cardoso, Globe and Mail
    Bias Behind Bars
    Tom Cardoso is a crime and justice reporter and data journalist for The Globe and Mail. Tom will discuss his Bias Behind Bars series of articles which show Black and Indigenous inmates in Canada are more likely to get worse scores than white inmates, based solely on their race.
  • Thursday, 12 November 2020, 4-5pm
    Kevin Armstrong, University of Toronto
    Measuring poverty for NGOs
    Kevin Armstrong is a Masters of Information student at the University of Toronto, and a data consultant for ‘Women’s Integrated Sexual Health’ (WISH) - a three-year program delivering integrated health care in 16 countries in Africa and South Asia. Kevin will discuss methods for measuring poverty and use cases for NGOs.
  • Thursday, 19 November 2020, 4-5pm
    High-throughput Bayesian modelling workflow
    Michael Chong, University of Toronto
    Michael is a PhD student in the Department of Statistical Sciences at the University of Toronto building models for demographic estimation. Previously, he completed his BSc in Integrated Science at McMaster University. Michael will discuss lessons from a high-throughput Bayesian modelling workflow
  • Thursday, 3 December 2020, 5-6pm
    Monica Alexander, University of Toronto
    Using Facebook advertising data to estimate migration Monica Alexander is an Assistant Professor in Statistical Sciences and Sociology at the University of Toronto. She received her PhD in Demography from the University of California, Berkeley. Her research interests include statistical demography, mortality and health inequalities, and computational social science. Monica will talk about using Facebook advertising data to estimate migration. A recording of the talk is available here: https://youtu.be/xM1vf_KT76g.
  • Thursday, 10 December 2020, 4-5pm
    Population data estimation
    Shabrina Mardevi, United Nations Population Fund and University of Toronto
    Romesh Silva, United Nations Population Fund
    Shabrina is a Masters of Information student at the University of Toronto and a Population Data Estimation and Analysis Intern at the United Nations Population Fund. Romesh holds a PhD in Demography from the University of California, Berkeley, and is a Technical Specialist, Health & Social Inequalities, at the United Nations Population Fund.
  • Thursday, 17 December 2020, 4-5pm
    Panel discussion on teaching data-focused topics.
    Liza Bolton, University of Toronto
    Maria Tackett, Duke University
    Nathalie Moon, University of Toronto
    Teon Brooks, Mozilla Firefox
    Liza Bolton is an Assistant Professor, Teaching Stream, at the University of Toronto. Maria Tackett is an Assistant Professor of the Practice in the Department of Statistical Science at Duke University. Nathalie Moon is an Assistant Professor, Teaching Stream, University of Toronto. Teon L. Brooks, holds a PhD in experimental psychology from NYU, and now works as a data scientist for Mozilla Firefox. He also serves as the technical advisor and President of BrainWaves, an NIH-funded project to teach experimentation and cognitive neuroscience to high school students in NYC, and has co-founded Computation in Education Labs (CIEL), a nonprofit that aims to further the mission of the BrainWaves project while focusing on data science and computational thinking.

Summer 2020

  • Thursday, 21 May 2020, 4-5pm
    Rohan Alexander, University of Toronto, Faculty of Information
    Rohan Alexander is a post-doctoral fellow at the Faculty of Information, University of Toronto. He holds a PhD in Economics from the Australian National University. He will talk about getting data from PDFs into R, with an application to the Kenyan census.
  • Thursday, 28 May 2020, 4-5pm
    Shiro Kuriwaki, Harvard University, Government
    Shiro is a Ph.D. Candidate at the Department of Government, Harvard University. His research focuses on democratic representation in American Politics, for instance cast vote records, public opinion, survey methods, and applied statistics more generally. Shiro will bring together best practices for organizing data and code in the social sciences that experts have proposed with some of his own experience. He will propose a project-oriented workflow that adopts a minimal and consistent file organization structure within a single project, using RStudio Projects and GitHub. He will then discuss how to organize multiple projects that share common components, and propose the use of custom R packages to share code and Dataverse to share large datasets. He will use some of his own projects involving the Cooperative Congressional Election Study (CCES), one of the largest political surveys of American Politics, as a demonstration.
    Recording available here.
  • Thursday, 4 June 2020, 4-5pm
    Marija Pejcinovska, University of Toronto, Department of Statistical Sciences
    Marija is a second-year Ph.D. student in Statistics at the University of Toronto. Her research interests are in applied statistics, specifically the application of Bayesian methods to data and modeling challenges that arise in demography, public health, and certain areas of the social sciences. Marija will talk about a current project with the World Health Organization (WHO) focused on estimating global maternal mortality to share her R workflow and the different tools and packages she’s found helpful in the data processing stage. More specifically, she’ll be sharing a few ways of dealing with text and date data in R.
  • Thursday, 11 June 2020, 4-5pm
    Harrison Jones, Deloitte
    Harrison is a Manager at Deloitte in Toronto, where he focuses on data analytics and machine learning in the property & casualty insurance, life insurance, health insurance, pensions, and the public sector. Harrison will talk about using R with actuarial data.
  • Thursday, 18 June 2020
    Cancelled in support of the Black Lives Matter movement and to provide an opportunity for reflection and learning. We also announced stipends in support of University of Toronto students or recent graduates who identify as a member of a visible minority group, racialized group, or as a person of colour.
  • Thursday, 25 June 2020, 4-5pm
    A Mahfouz, University of Toronto, Information
    A is a Master of Information student at the University of Toronto with a background in geography. Their prior work has been largely concerned with data pipelines. A will talk about geographic data cleaning, extracting mappable data from Google Directions API results in Python.
    Slides available here.
  • Thursday, 2 July 2020, 4-5pm
    Heather McBrien, University of Toronto, Department of Statistical Sciences
    Heather just graduated from the Statistics BSc program at the University of Toronto, and is interested in modelling in population health research, particularly using novel data sources to answer questions where traditional data is lacking. Heather will talk about how the data that we collect can bias the results that we obtain and our knowledge of the problem.
    Slides available here.
  • Thursday, 9 July 2020, 4-5pm
    Roxanne Chui, University of Toronto, Faculty of Information
    Roxanne is an emerging anthropological data science professional. She did her BSc program in Forensic anthropology and worked in the pharmaceutical industry before doing her Masters in data science. She is passionate about excavating context from data for predicting future patterns of human behaviour. Roxanne will talk about an EDA approach to Tokyo AirBnB datasets and pattern discovery in listing prices using R - ‘What do we have here among millions of observations?’
    Slides available here.
  • Thursday, 16 July 2020, 4-5pm
    Casey Breen, University of California, Berkeley, Demography
    Casey is a PhD student in the Demography Department at Berkeley. He previously worked at the Institute for Social Research and Data Innovation, home of IPUMS. Casey will talk about CenSoc, which is a project to link 1940 Census data with Social Security Administration mortality records in the US.
  • Thursday, 23 July 2020, 11am-noon
    Marta Kołczyńska, Institute of Political Studies of the Polish Academy of Sciences
    Marta is an Assistant Professor at the Institute of Political Studies of the Polish Academy of Sciences and a visiting researcher in the Probabilistic Machine Learning Group, Department of Computer Science, Aalto University. Her research interests include comparative analyses of political attitudes and behavior across nations and over time, as well as the methodology of comparative research, in particular cross-national surveys. Marta will talk about cleaning survey data, in particular a project in which she gathers political trust items from different cross-national survey datasets to model time trends, and the tools she has developed to facilitate this work.
  • Thursday, 30 July 2020, 4-5pm
    Alex Luscombe, University of Toronto, Criminology and Sociolegal Studies
    Alexander McClelland, Carleton University, Criminology and Criminal Justice
    Alex Luscombe is a PhD student in the Centre for Criminology & Sociolegal Studies at the University of Toronto and a Junior Fellow at Massey College. Alexander McClelland is an Assistant Professor at the Institute of Criminology and Criminal Justice, Carleton University. They will talk about Policing the Pandemic, which is a project that was launched on 4 April, 2020, to track and visualize the massive and extraordinary expansions of police power in response to the COVID-19 Pandemic and the unequal patterns of enforcement that may arise as a result.
    Slides available here.
  • Thursday, 6 August 2020, 4-5pm
    Sharla Gelfand, Freelance R Developer
    Sharla is a freelance R developer specializing in enabling easy access to data and replacing manual, redundant processes with ones that are automated, reproducible, and repeatable. They will talk about creativity in R.
    Slides available here.
  • Thursday, 13 August 2020, 4-5pm
    Richard Iannone, R Studio
    Rich is a Software Engineer at R Studio. Rich will talk about pointblank, which is an R package that allows workflows involving nice and easy data validation in reproducible documents.
  • Thursday, 20 August 2020, 4-5pm
    Aije Egwaikhide, IBM
    Aije Egwaikhide holds an undergraduate degree in Economics and Statistics from the University of Manitoba, and a post-graduate degree in Business Analytics from St. Lawrence College, Kingston. She works at IBM where she is a Lead Data Scientist on the System Enablement group. Aije will talk about preparing data for optical character recognition (OCR).

Winter 2020

  • Friday, 24 January 2020, noon
    Steven Pimentel, University of Toronto, business intelligence
  • Friday, 31 January 2020, noon
    Arik Senderovich, University of Toronto, Information
  • Friday, 7 February 2020, noon
    Kathy Chung, University of Toronto, Records of Early English Drama
  • Friday, 14 February 2020, noon
    Josh Harris, KOHO
  • Friday, 28 February 2020, noon
    Eugene Joh, St. Michael’s Hospital
  • Friday, 6 March 2020, noon
    Fatemeh Nargesian, University of Rochester, Computer Science

Fall 2019

  • Thursday, 26 September 2019, noon
    Periklis Andritsos, ODAIA & University of Toronto, Information
  • Thursday, 10 October 2019, noon
    Hassan Teimoori, Deloitte, Omnia AI
    Ludovic Rheault, University of Toronto, Political Science
  • Wednesday, 16 October 2019, noon
    Lauren Kennedy, Columbia University
  • Thursday, 24 October 2019, noon
    Sharla Gelfand, Freelance R and Shiny developer
  • Thursday, 7 November 2019, noon
    Maria D’Angelo, Delphia
    Hareem Naveed, Munich Re
  • Thursday, 21 November 2019, noon
    Michelle Alexopoulos, University of Toronto, Economics
    Paraskevi Massara, University of Toronto, Medicine