Toronto Data Workshop
Overview
The Toronto Data Workshop (TDW) brings together academia and industry to share data science best practice. Our interests are broad, but we focus especially on the data-centric aspects of a data science project. We meet weekly for an hour, typically Fridays at noon (Toronto time), in a hybrid format (in person at the Faculty of Information and online via Zoom), with most talks recorded and shared. For an invitation please sign up here. Anyone is welcome to attend: it is free and you don’t need to be affiliated with the university.
Current organizing committee:
- Kelly Lyons,
- Michaela Drouillard,
- Rohan Alexander.
Past committee members:
- Amy Farrow (2021-22)
- Faria Khandaker (2020-21)
- Lorena Almaraz De La Garza (2021-22)
The TDW is a joint initiative between the Faculty of Information and the Department of Statistical Sciences at the University of Toronto and we especially thank Wendy Duff and Radu Craiu for their support.
Fall 2023
- Friday 22 September 2023, noon - 1pm
- Saloni Dattani, Our World in Data
- Missing data in global health
- Saloni Dattani is a researcher on global health at Our World in Data, and an editor at Works in Progress. She’s interested in everything related to global health, epidemiology and meta-science.
- Friday 29 September 2023, noon - 1pm
- Nima Sarajpoor, Manulife
- STUMPY: A powerful tool for modern time series analysis
- Nima Sarajpoor is a data scientist at Manulife, working in Fraud Detection. He has been contributing to the software STUMPY for about two years.
- Friday 6 October 2023, noon - 1pm
- Fabrizio Dell’Acqua, Harvard Business School
- Experimental evidence from a study of Boston Consulting Group consultants on the effect of access to GPT-4 on performance. Paper here.
- Fabrizio Dell’Acqua is a postdoctoral research fellow and teaching fellow at Harvard Business School and the Laboratory for Innovation Science at Harvard (LISH). He received a Ph.D. in Management from Columbia Business School. His research focuses on the areas of automation, human/AI collaboration, and business ethics.
- Friday 13 October 2023, noon - 1pm
- Tom Cardoso, The Globe and Mail
- Secret Canada
- Tom Cardoso is a member of The Globe and Mail’s investigations team based in Toronto. Since the fall of 2021, he has been investigating Canada’s broken freedom of information systems.
- Friday 20 October 2023, noon - 1pm
- Marzieh Fadaee, Cohere for AI
- Aya: An Open Science Initiative to Accelerate Multilingual AI Progress
- Friday 27 October 2023, noon - 1pm
- Wendy Foster, Shopify
- Socio-technical processes for data integrity
- Wendy Foster is Director of Engineering and Data, Optimize, at Shopify.
- Friday 3 November 2023, noon - 1pm
- Apoorva Lal, Netflix
- Modern balancing methods for causal inference
- Apoorva Lal is a data scientist on the experimentation team at Netflix, working on observational causal inference with spatial and panel data.
- Friday 17 November 2023, noon - 1pm
- Kevin Wilson, Borealis AI
Past talks
Fall 2022
- Friday 9 September 2022, noon - 1pm
Ryan Briggs, University of Guelph
Statistical power in political science
Ryan Briggs is a social scientist at the University of Guelph.
- Thursday 22 September 2022, 5pm - 6pm
Melina Vidoni, Australian National University
Dr Vidoni is a Lecturer at the Australian National University in the CECS School of Computing, where she continues her domestic and international collaborations with Canada and Germany. Dr Vidoni’s main research interests are mining software repositories, technical debt, software development, and empirical software engineering when applied to data science and scientific software.
- Friday 30 September 2022, noon - 1pm
Emily Giambalvo and Ence Morse, The Washington Post
How the NFL blocks Black coaches
Emily Giambalvo covers University of Maryland athletics for The Washington Post, where she has worked since June 2018. Emily grew up in South Carolina and graduated from the University of Georgia.
Clara Ence Morse is an Investigative Reporting Workshop intern with The Washington Post’s data desk. She is a student at Columbia University and the editor in chief of the Columbia Daily Spectator.
- Friday 7 October 2022, noon - 1pm
Rohan Alexander, University of Toronto
Rethinking data science
- Friday 14 October 2022, noon - 1pm
April Wang, University of Michigan
Reimagining Tools for Collaborative Data Science
April Wang is a Ph.D. candidate at University of Michigan School of Information, advised by Dr. Steve Oney and Dr. Christopher Brooks. Her research in human-computer interaction (HCI) explores barriers in real-world data science programming practices, and reimagines the workflow and interfaces for collaborative data science environments.
- Friday 21 October 2022, noon - 1pm
Meg Risdal, Kaggle (Google)
Meg Risdal is a lead Product Manager for Kaggle (a Google company) where she works with software developers, designers, and researchers to create great experiences for people learning ML, ML practitioners, and ML researchers.
- Friday 4 November 2022, noon - 1pm
Maitreyee Sidhaye and Meggie Debnath, St. Michael’s Hospital, Unity Health Toronto
The Things We Learned from Deploying AI in Healthcare
Maitreyee and Meggie are data scientists in the Data Science & Advanced Analytics (DSAA) unit at St. Michael’s Hospital in Toronto. DSAA is a multiteam unit in the hospital that provides data science and machine learning solutions across a variety of problems: clinical prediction tools, staffing optimization, imaging, and more. DSAA works very closely with others in the hospital to collaborate on understanding problems and providing solutions. Maitreyee and Meggie will share their experiences and learnings from building and deploying machine learning tools in the hospital.
- Friday 18 November 2022, noon - 1pm
Lindsay Katz, University of Toronto
A new, comprehensive database of all proceedings of the Australian Parliamentary Debates
Lindsay Katz holds a Masters of Statistics from the University of Toronto and a Bachelor of Arts and Science from the University of Guelph where she specialized in Mathematical Science and International Development. At Guelph she worked with Professor Ryan Briggs to explore lived poverty in Africa using Afrobarometer data. At Toronto she works with Professor Monica Alexander to research demographic variation in short-term migration patterns using Facebook data, and with Professor Rohan Alexander to digitize the Australian parliamentary debates from 1901 to present. As an interdisciplinary researcher, she is interested in using statistics to better understand social processes in the world.
- Friday 25 November 2022, noon - 1pm
Marcel Fortin and Leanne Trimble, U of T Map & Data Library
Recent additions to the Map and Data Library
The Map and Data Library’s data collections, software & support, with a focus on recently acquired datasets.
Leanne Trimble is a Data & Statistics Librarian, and Marcel Fortin is Head, Map and Data Library.
- Friday 16 December 2022, noon - 1pm
Zane Schwartz, Investigative Journalism Foundation
Zane Schwartz is the editor-in-chief of the Investigative Journalism Foundation.
Winter 2022
- Fri 28 Jan 2022, noon - 1pm
Ashok Chaurasia, University of Waterloo
Multiple Imputation: Old and New Combining Rules for Statistical Inference
Dr. Ashok Chaurasia is an Assistant Professor (of Statistics) in the School of Public Health Sciences at University of Waterloo. His background/training is in Statistics, with research interests in topics of Missing Data, Data Imputation, Model Selection, and Longitudinal Data Analysis Methodology.
- Fri 4 Feb 2022, noon - 1pm
Nick Huntington-Klein, Seattle University
The Influence of Hidden Researcher Decisions in Applied Microeconomics
I am an economics professor at Seattle University, with research that focuses on higher education, econometrics, and metascience.
- Fri 11 Feb 2022, noon - 1pm
Silvia Canelón, University of Pennsylvania
Lessons Learned from EHR Research
Silvia Canelón is a postdoctoral research scientist in the Department of Biostatistics, Epidemiology, and Informatics at the University of Pennsylvania where she applies biomedical informatics to population health research. She uses R to work on projects that develop novel data mining methods to extract pregnancy-related information from Electronic Health Records (EHR) and that study the relationship between environment and disease.
- Fri 18 Feb 2022, noon - 1pm
Vincent Arel-Bundock, Université de Montréal
What modelsummary taught me about R package development
I am a political science professor at the Université de Montréal.
- Fri 4 Mar 2022, noon - 1pm
Maria Kamenetsky, University of Wisconsin-Madison
Spatial clustering
I am a PhD candidate in Epidemiology at the University of Wisconsin-Madison, where I also completed my MS in Statistics. My research focuses on methods in spatial epidemiology, specifically working on statistical methods and applications in spatial cluster detection.
- Fri 11 Mar 2022, noon - 1pm
Irena Papst, McMaster University
Using models to help guide pandemic response
I’m a postdoctoral fellow in McMaster’s Mathematics & Statistics department, where I work on mathematical modelling, especially of infectious disease dynamics. I did my PhD in Cornell’s Center for Applied Mathematics. I care deeply about reproducible research, clear scientific communication, good teaching, and big salads.
- Fri 18 Mar 2022, noon - 1pm
May Chan and Ramses Van Zon
- Thu 24 Mar 2022, 5pm - 6pm
Emi Tanaka, Monash University
An anthology of experimental designs
Dr. Emi Tanaka is an assistant professor in statistics at Monash University whose primary interest is to develop impactful statistical methods and tools that can readily be used by practitioners. Her research areas include data visualisation, mixed models and experimental designs, motivated primarily by problems in bioinformatics and agricultural sciences. She is currently the President of the Statistical Society of Australia Victorian Branch and the recipient of the Distinguished Presenter’s Award from the Statistical Society of Australia for her delivery of a wide range of R workshops.
- Fri 1 Apr 2022, noon - 1pm
Brittany Witham, Geopolitica
Data science at a startup
Originally from Melbourne, Australia, Brittany received her B.A. in International Studies from the University of Saskatchewan and started her career in economic development, equipping her with comprehensive knowledge of foreign direct investment and international business early in her career. She went on to obtain an M.A. in European and Russian Affairs from the University of Toronto in 2018, where she first discovered the potential of programming for political science and became fascinated with artificial intelligence (AI). Over the past three years, Brittany has worked in many facets of the AI industry, from leading research and development of new AI products for video game developers to building automated data pipelines for business intelligence and managing software engineering and client engagement teams. In that time, she has honed technical skills in full-stack development, machine learning, and data engineering. She recently struck out on her own to launch an online global event monitoring tool and deliver novel solutions to clients in the political risk and social enterprise sectors. Brittany is a firm believer in the potential for data-driven technologies for geopolitics and is excited to contribute to the many discoveries to be made in this space.
Fall 2021
- Friday, 24 September 2021, noon - 1pm
Karen Chapple, Department of Geography and Planning/School of Cities
Data science and geography, planning, cities
Karen Chapple is the inaugural Director of the School of Cities and Professor of Geography and Planning at the University of Toronto. Her research uses data science methods to identify and predict gentrification and displacement in cities. She is Professor Emerita at the University of California, Berkeley, where she helped to launch the undergraduate data science program.
- Friday, 1 October 2021, noon - 1pm
The 2021 Canadian Election
David Andrews
Daniel Rubenson
Johnson Vo
Eric Zhu, Brian Diep, Ashely (Jing Yuan) Zhang, Kristin (Xi Yu) Huang, and Tanvir Hyder
- Friday, 8 October 2021, noon - 1pm
Fedor Dokshin, Department of Sociology
Data science and sociology
Fedor Dokshin is an Assistant Professor of Sociology at the University of Toronto. He is a computational social scientist with research interests in social networks, organizations, and energy and the environment. Across these domains, Fedor leverages data science methods and novel data sources to improve existing measurement strategies.
- Friday, 15 October 2021, noon - 1pm
Drew Stommes, Department of Political Science, Yale University
On the reliability of published findings using the regression discontinuity design in political science
Drew Stommes is a doctoral candidate in the Department of Political Science at Yale University, where he researches democracy, political violence, and quantitative methods. He will talk about a recent working paper.
- Friday, 22 October 2021, noon - 1pm
Tegan Maharaj, Faculty of Information
Data science and information
I study AI systems and “what goes into” them, e.g. their real-world deployment context, and the effects that has on learning behaviour and generalization. I do that because I want to be able to use AI systems responsibly for problems I think are important, like impact and risk assessments for climate change, AI alignment, ecological management and other common-good problems. My website is: http://www.teganmaharaj.com/.
- Friday, 29 October 2021, noon - 1pm
Josh Speagle, Astronomy & Astrophysics, Dunlap Institute, Statistical Sciences
Data science and astronomy
Josh is a Banting & Dunlap Postdoctoral Fellow at the University of Toronto whose research focuses on using astrostatistics and “data science” to understand how galaxies like our own Milky Way form, behave, and evolve.
- Friday, 5 November 2021, noon - 1pm
Yun William Yu, Math Department, UofT; UTSC Computer & Mathematical Sciences
Data science and math
Yun William Yu is an assistant professor in the math department at UofT whose research focuses on algorithmic methods for computational biology and medical informatics.
- Friday, 12 November 2021, noon - 1pm
Ann Glusker, Doe Library, University of California Berkeley
“Supporting Big Data Research at the University of California, Berkeley”
Dr Ann Glusker is Librarian for Sociology, Demography, Public Policy, Psychology (fall 2021) & Quantitative Research at the Doe Library, University of California, Berkeley. She will discuss a recently released report ‘Supporting Big Data Research at the University of California, Berkeley’. This report provides insights on researcher practices and challenges in six thematic areas: data collection & processing; analysis: methods, tools, infrastructure; research outputs; collaboration; training; and balancing domain vs data science expertise.
- Friday, 19 November 2021, noon - 1pm
Radu Craiu, Statistical Sciences, University of Toronto
Data science and statistical sciences
Dr Radu V. Craiu is Professor and Chair of Statistical Sciences at the University of Toronto. His main research interests are in computational methods in statistics, especially Markov chain Monte Carlo (MCMC) algorithms, Bayesian inference, copula models, model selection procedures and statistical genetics.
- Friday, 26 November 2021, noon - 1pm
Kieran Campbell, Lunenfeld Tanenbaum Research Institute
Data science and Biomedicine
Dr Kieran Campbell is an investigator at the Lunenfeld-Tanenbaum Research Institute and an assistant professor at the Departments of Molecular Genetics and Statistical Sciences, University of Toronto. His research focusses on Bayesian models and machine learning for high dimensional biomedical data, including single-cell and cancer genomics. Recently, he has led efforts to develop statistical machine learning methodology to integrate single-cell RNA and DNA sequencing data to uncover the effects of tumour clonal identity on gene expression, as well as methods to automatically delineate the tumour microenvironment from single-cell RNA-sequencing data. Such findings can improve our understanding of cancer progression and of why certain tumours are resistant to therapies, leading to relapse.
- Friday, 3 December 2021, noon - 1pm
Leanne Trimble, UofT Libraries
Data science and libraries
Leanne Trimble is a data librarian at the Map & Data Library, University of Toronto Libraries.
- Friday, 10 December 2021, noon - 1pm
Nathan Taback, Departments of Statistical Sciences
Teaching data science
Nathan Taback is the director of Data Science programs and an Associate Professor, Teaching Stream in the Department of Statistical Sciences, and Computer Science (cross-appointed) at the University of Toronto. He currently serves as a Special Advisor to the Dean of Arts and Science on Computational and Data Science Education.
Summer 2021
- Friday, 21 May 2021, noon - 1pm
David Shor, OpenLabs
Political data science
David is an American data scientist who tries to elect Democrats. He is known for analyzing political polls and currently serves as head of data science with OpenLabs, a progressive nonprofit, and also as a Senior Fellow with the Center for American Progress Action Fund.
- Friday, 28 May 2021, noon - 1pm
Samantha Pierre, University of Toronto
And The Nominees Are… An Empirical Study of the Effects of a Tony Award Win and Nomination on a Show’s Success
Samantha is a fourth-year statistics student studying at the University of Toronto. Throughout the past year she has combined her love for theatre and statistics to analyze trends in the theatre community. She volunteers as a member of PAIR-CG to create a representational framework for the international theatre community. She currently works at WOMBO, an app developed by former U of T students, as head of music content.
- Friday, 4 June 2021, noon - 1pm
Heather Krause, We All Count
Equity in Data (or, how not to accidentally use data like a racist, sexist, colonialist, etc)
Heather remains unconvinced. As a statistician with decades of global experience working on complex data problems and producing real-world knowledge, she has developed the Data Equity Framework to address the equity issues in data products and research projects. Her emphasis is on combining strong statistical analysis with clear and meaningful communication. She is currently working on implementing tools for equity and ethics in data. As the founder of two successful data science companies, she attacks the largest questions facing societies today, working with both civic and corporate organizations to improve outcomes and lives. Her relentless pursuit of clarity and realism in these projects pushed her beyond pure analysis to mastering the entire data ecosystem including award-winning work in data sourcing, modeling, and data storytelling, each incorporating bleeding edge theory and technologies. Heather is the founder of We All Count, a project for equity in data working with teams across the globe to embed a lens of ethics into their data products from funding to data collection to statistical analysis and algorithmic accountability. Her unique set of tools and contributions have been sought across a range of clients from MasterCard and Volkswagen to the United Nations, the Syrian Refugee Resettlement Secretariat, Airbnb, and the Bill and Melinda Gates Foundation. She is on the Data Advisory Board of the UNHCR.
- Friday, 11 June 2021, noon - 1pm
Laura Bronner, Data scientist
Quantitative editing
Laura is a data scientist, who most recently worked as the quantitative editor at FiveThirtyEight. More generally, she is a data scientist with an interest in causal inference, political science and quantitative text analysis. Before FiveThirtyEight, she was a Senior Analyst at the Analyst Institute, designing and analyzing field experiments for the 2018 election cycle. In September 2018, she completed a PhD in Political Science at the London School of Economics’ Department of Government.
- Friday, 18 June 2021, noon - 1pm
Jacob Matson, Simetric
From data ask to dashboard
Jacob is VP of Finance & Operations at Simetric, Inc.
- Friday, 25 June 2021, noon - 1pm
Laura Derksen, University of Toronto Mississauga
The impact of student access to Wikipedia
Laura is the Amgen Canada Professor in Health System Strategy at the University of Toronto Mississauga and assistant professor in Strategic Management at the Rotman School of Management. Her research interests are development and global health education, information and networks.
- Friday, 2 July 2021, noon - 1pm
Zachary McCaw, Google
Zachary McCaw is a data scientist at Google.
- Friday, 16 July 2021, noon - 1pm
Kamilah Ebrahim, University of Toronto
Trust in contact tracing apps
Kamilah Ebrahim received a B.A. in Economics from the University of Waterloo in 2019 and is currently pursuing a Masters of Information in Human Centred Data Science at the University of Toronto. Kamilah is a 2020-21 Graduate Fellow at the University of Toronto Centre for Ethics focusing on the intersection between race, economics and data monopolies in Canada. Prior to joining the University of Toronto she held roles at the United Nations Economic and Social Commission for Asia and the Pacific (UN ESCAP), as well as the Canadian federal government.
- Friday, 23 July 2021, noon - 1pm
Annie Collins and Rohan Alexander, University of Toronto
Reproducibility of COVID-19 pre-prints
Annie Collins is an undergraduate student at the University of Toronto specializing in applied mathematics and statistics with a minor in history and philosophy of science. In her free time, she focusses her efforts on student governance, promoting women’s representation in STEM, and working with data in the non-profit and charitable sector.
Rohan Alexander is an assistant professor at the University of Toronto in Information and Statistical Sciences, and a faculty affiliate at the Schwartz Reisman Institute for Technology and Society. He holds a PhD in Economics from the Australian National University.
- Friday, 30 July 2021, noon - 1pm
Keli Chiu, University of Toronto
Detecting sexist and racist text contents with explanations accompanied with GPT-3
Keli Chiu is a recent graduate of the Master of Information program at the University of Toronto, with a concentration in Human-Centred Data Science. Prior to pursuing her master’s in the information field, she worked as a web developer and fell in love with data. Her research interests are natural language processing applications, text analysis and ethics in AI and machine learning. She received an rstudio::global Diversity Scholarship in 2021.
- Friday, 6 August 2021, 12:30pm - 1pm
Ijeamaka Anyene, Kaiser Permanente Division of Research
Taking the next step past standard charts
Ijeamaka is a data analyst working in healthcare research. She specializes in using R and SAS for data analysis, epidemiological research, and data visualizations. She is also passionate about computational art, knowledge sharing / dissemination, and how to mix the two.
- Friday, 13 August 2021, noon - 1pm
Independent Summer Statistics Community projects
‘Prospective Analytics’: Ashley Zhang, Eric Zhu, Muhammad Tsany and Sergio Zheng Zhou.
‘Statistically Significant’: Aliza Lakho, Chloris Jiang, Janhavi Agarwal, and José Casas on whether young professionals should move to Toronto.
‘Point Zero Five’: Pan Chen, Xiaoxuan Han, Yi Qin, and Yini Mao on the livability of Toronto for newcomers.
- Friday, 20 August 2021, noon - 1pm
Jessica Long, Simone Collier, Vinky Wang, Sophie Berkowitz, and Yun-Hsiang Chan, University of Toronto
Using statistical models to analyze shark, lizard, and basketball movement data
Jessica Long, Simone Collier, Vinky Wang, Sophie Berkowitz, and Yun-Hsiang Chan are undergraduate students at the University of Toronto.
Winter 2021
Thanks to Paul Hodgetts for the Jays-inspired sticker.
- Thursday, 14 January, 4:30-5:30pm
Andrew Miles, University of Toronto
Code, plots, and values.
Jointly hosted with Elizabeth Parke and the UTM Collaborative Digital Research Space.
Andrew Miles is Assistant Professor of Sociology at the University of Toronto and Director of the Morality, Action, and Cognition Lab.
- Wednesday, 20 January, 4:30-5:30pm
Zia Babar, University of Toronto
Derivative data security
Zia Babar obtained his PhD from the University of Toronto where his research studies focused on the analysis and design of data-centered information systems for enabling enterprise transformation. He is engaged in a multi-year research engagement with IBM Research Labs and is a startup technical mentor at WeWork Labs. He is the organizer of technology meetup groups in both Toronto and Waterloo, and a course instructor at the Faculty of Information, University of Toronto.
- Thursday, 28 January, 4:30-5:30pm
Irene Duah-Kessie, University of Toronto
Exploring algorithmic bias and fairness and its impact on health outcomes faced by racialized communities
Irene Duah-Kessie is a graduate of the University of Toronto’s Master of Science in Sustainability Management program. Throughout her studies, Irene published her research on racial income inequality in Toronto with the Wellesley Institute and is currently a part of the Turtle Island Journal of Indigenous Health Editorial Team. Irene is a Project Manager at Across Boundaries leading an initiative to address food security and mental health challenges in Toronto’s Black community. She is also the founder of Rise In STEM, a grassroots organization that aims to increase access to STEM learning opportunities in Black and marginalized communities.
- Thursday, 4 February, 4:30-5:30pm
Kathy Ge, Uber
How data insights and experimentation help drive product design and intelligent recommendations on the Uber Eats platform
Kathy is a data scientist with Uber Eats primarily focused on the shopping experience, including ranking and recommendations throughout the order flow. She received her M.Sc. in Computer Science and B.Sc. in Computer Science and Statistics from the University of Toronto.
- Thursday, 11 February, 4:30-5:30pm
Garrick Aden-Buie, RStudio
Using R Markdown in general and in some specific projects
Garrick is a Data Science Educator at RStudio who lives in sunny St. Petersburg, Florida. His passion is combining creative coding with programming education, using code to build tools that teach coding to new and advanced R users alike. One example is tidyexplain, a project that used ggplot2 and gganimate to reimagine database operations as colorful flying boxes instead of the typical Venn diagrams. Garrick has developed a number of open source addins and packages for RStudio, such as regexplain, shrtcts and rsthemes, and is always easily distracted by projects that combine R Markdown and online learning or teaching.
- Monday, 15 February, Noon-1:00pm
Emily Riederer, Capital One
Causal design patterns for data analysts
Emily is a Senior Analytics Manager at Capital One. Emily’s team focuses on reimagining analytical infrastructure by building data products, elevating business analysis with novel data sources and statistical methods, and providing consultation and training to partner teams.
- Thursday, 18 February, 4:30-5:30pm
University of Toronto DoSS toolkit launch
Special guest Bethany White (Department of Statistical Sciences).
Annie Collins, Haoluan Chen, Isaac Ehrlich, Mariam Walaa, Marija Pejcinovska, Mathew Wankiewicz, Michael Chong, Paul Hodgetts, Rohan Alexander, Samantha-Jo Caetano, Shirley Deng, and Yena Joo, University of Toronto
The DoSS toolkit is a series of self-paced lessons that students can go through ahead of class, to achieve badges for various levels of accomplishment with R. Instructors can use the badges to work out the level of the class and either direct students to the toolkit to address deficiencies or cover missing aspects themselves.
- Thursday, 4 March, 4:30-5:30pm
Petros Pechlivanoglou, The Hospital for Sick Children (SickKids) Research Institute
Simulation and retrospective data for health economic decision making
Petros Pechlivanoglou, PhD, is a Scientist at The Hospital for Sick Children (SickKids) Research Institute and an Assistant Professor at the University of Toronto, Institute of Health Policy Management and Evaluation. He studied economics in his native country, Greece, econometrics at the University of Groningen, the Netherlands and obtained a PhD in health econometrics from the same university. He completed a post-doctoral fellowship at the University of Toronto, within the Toronto Health Economics and Technology Assessment (THETA) Collaborative where he focused on methodological aspects around the application of decision analysis in health-care policy.
- Thursday, 11 March, 4:30-5:30pm
Lucas Cherkewski, Canadian Digital Service
Using publicly-available data to better understand the government’s operations
Lucas Cherkewski is a policy advisor at the Canadian Digital Service (CDS). He helps delivery teams improve government services. From that experience, he advises on structural changes to make better services the default. This work includes plenty of data-enabled research and analysis—Lucas is in a happy place when his work leads him to spend an afternoon poking around a dataset, trying to better understand government so he can help change it.
- Monday, 15 March, 4:00-5:00pm
Todd Feathers, Freelance reporter
Major Universities are Using Race as a “High Impact Predictor” of Student Success
Jointly hosted with Maryclare Griffin, University of Massachusetts Amherst.
Todd Feathers is a freelance journalist covering artificial intelligence, surveillance, and the technologies changing our world. He spent years at daily newspapers reporting on politics, criminal justice, and health care. On every beat, new tech is solving problems and creating them. His goal is to use data, scientific research, and inside sources to cut through the hype and examine what our gadgets and algorithms really do. His writing has appeared in Vice, OneZero, The Wall Street Journal, and others.
- Thursday, 18 March, 4:30-5:30pm
Sofia Ruiz-Suarez, National University of Comahue
Animal tracking data
Sofia Ruiz-Suarez holds an undergraduate degree in mathematics from the University of Buenos Aires and is now a PhD candidate at the Institute for Research on Biodiversity and Environment. She also teaches mathematics at the University of Comahue and leads R-Ladies in her local city. Her research is focused on Bayesian statistics with applications in animal behaviour and movement.
- Thursday, 25 March, 4:30-5:30pm
Alex Cookson, Muse
The power of great datasets
Alex Cookson is a Data Scientist at Muse, where he helps make the most of their data. In his spare time, you can find him participating in Tidy Tuesday or thinking up cool datasets to explore. And when he’s not doing that, he’s probably cycling around Toronto or doting on his two cats, Tom Tom and Ruby.
Alex will explore the power of great datasets, and discuss the importance of interesting, fun datasets as a way to guide and motivate learning R.
- Thursday, 1 April, 4:30-5:30pm
Vik Pant, Natural Resources Canada
Supporting the Integration of Science and Policy through Data Science and Artificial Intelligence
Dr. Vik Pant is the Chief Scientist and Chief Science Advisor of Natural Resources Canada (NRCan). He leads the Office of the Chief Scientist at NRCan and reports directly to the Deputy Minister. His office oversees the development and implementation of evidence-based science policy across NRCan sectors and agencies. His office also manages NRCan’s enterprise-wide technology strategy and portfolio of science products. He also runs the Digital Accelerator, which is an innovation platform for designing and launching AI-driven software products in NRCan. Vik is also the Founder of Synthetic Intelligence Forum, which is a leading community of practice focused on the industrial application of Artificial Intelligence (AI). He earned a doctorate from the Faculty of Information (iSchool) in the University of Toronto, a master’s degree in business administration with distinction from the University of London, a master’s degree in information technology from Harvard University, where he received the Dean’s List Academic Achievement Award, and an undergraduate degree in business administration from Villanova University. Vik serves as an Adjunct Professor in the Faculty of Information (iSchool) at the University of Toronto.
- Thursday, 8 April, 4:30-5:30pm
Mining Process Models from Email Data
Faria Khandaker, University of Toronto
Faria is a 2nd year student of the Information Systems and Design Concentration at the Faculty of Information and is one of the co-hosts of the Toronto Data Workshop. She holds an Honour’s Bachelor of Science Degree in Anthropology and Human Biology from the University of Toronto Scarborough. Since starting her masters, she became interested in research related to data-driven decision making within organizations. Under the supervision of Professor Arik Senderovich, she is researching topics related to the application of Machine Learning within the field of Process Mining and exploring various methodologies for gaining insights from email driven business processes.
Faria will discuss mining process models from email data.
- Thursday, 15 April, 4:30-5:30pm
Emily A. Sellars, Yale University
Missing data and mis-measurement in Mexico’s 1900 census and the Historical Archive of Localities (AHL)
Emily A. Sellars is an assistant professor in the Department of Political Science at Yale University. Before coming to Yale, she was an assistant professor in the Bush School of Government and Public Service at Texas A&M University and a postdoctoral scholar at the University of Chicago’s Harris School of Public Policy. She received her Ph.D. in Political Science and Agricultural and Applied Economics from the University of Wisconsin–Madison in 2015. Her research interests are at the intersection of political economy and development economics. Her research examines the political economy of emigration and population.
- Thursday, 22 April, 4:30-5:30pm
Panel on teaching data-focused topics
Aimee Schwab-McCoy, Creighton University
Ashley Juavinett, UC San Diego
Chris Papalia, St. Andrew’s College
Samantha-Jo Caetano, University of Toronto
Aimee Schwab-McCoy is an Assistant Professor of Statistics at Creighton University. Ashley Juavinett is a neuroscientist, an educator, and a writer, currently working as an Assistant Teaching Professor at UC San Diego. Chris Papalia is a Mathematics and Science Teacher and Head of House at St. Andrew’s College. Samantha-Jo Caetano is an Assistant Professor, Teaching Stream, at the University of Toronto.
Fall 2020
Thanks to Hidaya Ismail for the hex sticker.
- Thursday, 3 September 2020, 4-5pm
Using hospital data
Erik Drysdale, The Hospital for Sick Children
Erik works as a Machine Learning Specialist at the Hospital for Sick Children (SickKids) for the Goldenberg Lab and AI in Medicine (AIM) initiative. His professional responsibilities include the development and training of the machine learning models for various pediatric data science projects. His research interests are focused on the intersection of statistics and machine learning methods such as high-dimensional inference, survival analysis, and optimization methods. Erik will talk about the challenges of applying ML in hospital data and why statistics still matters in ML.
- Tuesday, 8 September 2020, 3:30-4:30pm
UK A levels algorithm issues
Jointly hosted with Gillian Hadfield and the Schwartz Reisman Institute for Technology and Society.
Sophie Bennett, Industry data scientist
Sophie Bennett holds an undergraduate degree in Experimental Psychology from the University of Oxford and a PhD in Neuroscience from King’s College. She is the lead data scientist at Up Learn, a London-based online learning platform specialising in A levels. In this role, she conducts evaluations of course effectiveness and uses data to improve instruction and curriculum design. She is passionate about increasing the use of responsible evidence and statistics to guide social policy, and, in her spare time, enjoys working with publicly available datasets to explore London demographics, social issues and infrastructure. Sophie will discuss A Levels, Ofqual and algorithms.
- Thursday, 10 September 2020, 4-5pm
Toronto Data Lab launch event.
A Mahfouz, University of Toronto
Diego Mamanche Castellanos, University of Toronto
Hidaya Ismail, University of Toronto
Ke-Li Chiu, University of Toronto
Paul Hodgetts, University of Toronto
Various Toronto Data Lab projects will be presented, including arxivdl, aRianna, cesR, and more!
- Thursday, 17 September 2020, 4-5pm
Cancer and AI
Amber Simpson, Queen’s University
Dr. Simpson is the Canada Research Chair in Biomedical Computing and Informatics and Associate Professor in the School of Computing (Faculty of Arts and Science) and Department of Biomedical and Molecular Sciences (Faculty of Health Sciences). She specializes in biomedical data science and computer-aided surgery. Her research group is focused on developing novel computational strategies for improving human health. Dr Simpson will discuss cancer and AI.
- Thursday, 24 September 2020, 4-5pm
Talking to non-statisticians about statistics
Chelsea Parlett-Pelleriti, Chapman University
Chelsea is a PhD candidate and full-time instructional faculty at Chapman University where her research focuses on applying novel statistical and Machine Learning methods (mostly Bayesian statistics, IRT models, and clustering) to behavioral data. As an instructor she teaches Python, R and Data Science, and loves using novel technology (like TikTok, Twitch, and flipped classes) to better engage and inspire students. Chelsea will discuss talking to non-DS team members about DS topics.
- Thursday, 1 October 2020, 4-5pm
Canadian demographics by riding (1991-2015)
Florence Vallée-Dubois, Université de Montréal
Florence Vallée-Dubois is a Ph.D. candidate at the department of political science of the University of Montreal. She is also a member of the Centre for the Study of Democratic Citizenship and Canada Research Chair in Electoral Democracy. Her research interests focus on Quebec and Canadian politics, political behaviour and quantitative methods. Her doctoral project focuses on the political behaviour and democratic representation of seniors in Canada. Florence will discuss Canadian Demographics by Electoral Riding (1991-2015).
- Thursday, 8 October 2020, 4-5pm
Yim Register, University of Washington Data Lab
Self-advocacy within machine learning systems
Yim Register (they/them) is a radical optimist, child advocate, and PhD student at the University of Washington Data Lab exploring what self-advocacy looks like within machine learning systems. They study how empowering novices with Data Science knowledge can impact their participation and joy in an AI-driven world! Their passion project right now is writing a book called Life Lessons from Algorithms, a book that teaches how machine learning algorithms work through trauma recovery skills. Yim will discuss self-advocacy within machine learning systems.
- Thursday, 22 October 2020, 4-5pm
Panel discussion on data-focused resources at the University of Toronto.
Jeff Waldman, University of Toronto
Leanne Trimble, University of Toronto
Leslie Barnes, University of Toronto
Lisa Strug, University of Toronto
Jeff Waldman is the Manager, Institutional Data Governance; Leslie Barnes is the Digital Scholarship Librarian at UTL; Leanne Trimble is the Data and Statistics Librarian at UTL; Lisa Strug is a Senior Scientist in the Program of Genetics and Genome Biology, Associate Director of The Centre for Applied Genomics, Professor of Statistical Sciences and Biostatistics at the University of Toronto, and Director of CANSSI Ontario.
- Thursday, 29 October 2020, 4-5pm
Fei Chiang, McMaster University
Data currency and applications
Fei Chiang is an Associate Professor in the Department of Computing and Software (Faculty of Engineering), the Director of the Data Science Research Group, and a Faculty Fellow at the IBM Centre for Advanced Studies. She served as an inaugural Associate Director of the MacData Institute. Her research interests and industrial experience are in data management, spanning data cleaning, data quality, data privacy, data fusion, and database systems. Professor Chiang will discuss data currency and its applications.
- Thursday, 5 November 2020, 4-5pm
Andrew Whitby, Industry data scientist
Censuses: The sum of the people
Andrew is a data scientist and economist currently looking for his next challenge. He is particularly interested in the economics of technology, creativity, innovation and growth. He wrote The Sum of the People: How the Census Has Shaped Nations from the Ancient World to the Modern Age which was published in March 2020. Previously, he worked as a Data Scientist at the World Bank, and at Nesta, the UK’s innovation think tank. His academic background combines economics, statistics and computer science. He completed his doctoral research in the Department of Economics at the University of Oxford. Andrew will discuss censuses.
- Monday, 9 November 2020, 4-5pm
Tom Cardoso, Globe and Mail
Bias Behind Bars
Tom Cardoso is a crime and justice reporter and data journalist for The Globe and Mail. Tom will discuss his Bias Behind Bars series of articles which show Black and Indigenous inmates in Canada are more likely to get worse scores than white inmates, based solely on their race.
- Thursday, 12 November 2020, 4-5pm
Kevin Armstrong, University of Toronto
Measuring poverty for NGOs
Kevin Armstrong is a Masters of Information student at the University of Toronto, and a data consultant for ‘Women’s Integrated Sexual Health’ (WISH) - a three-year program delivering integrated health care in 16 countries in Africa and South Asia. Kevin will discuss methods for measuring poverty and use cases for NGOs.
- Thursday, 19 November 2020, 4-5pm
High-throughput Bayesian modelling workflow
Michael Chong, University of Toronto
Michael is a PhD student in the Department of Statistical Sciences at the University of Toronto building models for demographic estimation. Previously, he completed his BSc in Integrated Science at McMaster University. Michael will discuss lessons from a high-throughput Bayesian modelling workflow.
- Thursday, 3 December 2020, 5-6pm
Monica Alexander, University of Toronto
Using Facebook advertising data to estimate migration
Monica Alexander is an Assistant Professor in Statistical Sciences and Sociology at the University of Toronto. She received her PhD in Demography from the University of California, Berkeley. Her research interests include statistical demography, mortality and health inequalities, and computational social science. Monica will talk about using Facebook advertising data to estimate migration. A recording of the talk is available here: https://youtu.be/xM1vf_KT76g.
- Thursday, 10 December 2020, 4-5pm
Population data estimation
Shabrina Mardevi, United Nations Population Fund and University of Toronto
Romesh Silva, United Nations Population Fund
Shabrina is a Masters of Information student at the University of Toronto and a Population Data Estimation and Analysis Intern at the United Nations Population Fund. Romesh holds a PhD in Demography from the University of California, Berkeley, and is a Technical Specialist, Health & Social Inequalities, at the United Nations Population Fund.
- Thursday, 17 December 2020, 4-5pm
Panel discussion on teaching data-focused topics.
Liza Bolton, University of Toronto
Maria Tackett, Duke University
Nathalie Moon, University of Toronto
Teon Brooks, Mozilla Firefox
Liza Bolton is an Assistant Professor, Teaching Stream, at the University of Toronto. Maria Tackett is an Assistant Professor of the Practice in the Department of Statistical Science at Duke University. Nathalie Moon is an Assistant Professor, Teaching Stream, at the University of Toronto. Teon L. Brooks holds a PhD in experimental psychology from NYU, and now works as a data scientist for Mozilla Firefox. He also serves as the technical advisor and President of BrainWaves, an NIH-funded project to teach experimentation and cognitive neuroscience to high school students in NYC, and has co-founded Computation in Education Labs (CIEL), a nonprofit that aims to further the mission of the BrainWaves project while focusing on data science and computational thinking.
Summer 2020
- Thursday, 21 May 2020, 4-5pm
Rohan Alexander, University of Toronto, Faculty of Information
Rohan Alexander is a post-doctoral fellow at the Faculty of Information, University of Toronto. He holds a PhD in Economics from the Australian National University. He will talk about getting data from PDFs into R, with an application to the Kenyan census.
- Thursday, 28 May 2020, 4-5pm
Shiro Kuriwaki, Harvard University, Government
Shiro is a Ph.D. Candidate at the Department of Government, Harvard University. His research focuses on democratic representation in American Politics, for instance cast vote records, public opinion, survey methods, and applied statistics more generally. Shiro will bring together best practices for organizing data and code in the social sciences that experts have proposed with some of his own experience. He will propose a project-oriented workflow that adopts a minimal and consistent file organization structure within a single project, using RStudio Projects and GitHub. He will then discuss how to organize multiple projects that share common components, and propose the use of custom R packages to share code and Dataverse to share large datasets. He will use some of his own projects involving the Cooperative Congressional Election Study (CCES), one of the largest political surveys of American Politics, as a demonstration.
Recording available here.
- Thursday, 4 June 2020, 4-5pm
Marija Pejcinovska, University of Toronto, Department of Statistical Sciences
Marija is a second-year Ph.D. student in Statistics at the University of Toronto. Her research interests are in applied statistics, specifically the application of Bayesian methods to data and modeling challenges that arise in demography, public health, and certain areas of the social sciences. Marija will talk about a current project with the World Health Organization (WHO) focused on estimating global maternal mortality to share her R workflow and the different tools and packages she’s found helpful in the data processing stage. More specifically, she’ll be sharing a few ways of dealing with text and date data in R.
- Thursday, 11 June 2020, 4-5pm
Harrison Jones, Deloitte
Harrison is a Manager at Deloitte in Toronto, where he focuses on data analytics and machine learning in the property & casualty insurance, life insurance, health insurance, pensions, and the public sector. Harrison will talk about using R with actuarial data.
- Thursday, 18 June 2020
Cancelled in support of the Black Lives Matter movement and to provide an opportunity for reflection and learning. We also announced stipends in support of University of Toronto students or recent graduates who identify as a member of a visible minority group, racialized group, or as a person of colour.
- Thursday, 25 June 2020, 4-5pm
A Mahfouz, University of Toronto, Information
A is a Master of Information student at the University of Toronto with a background in geography. Their prior work has been largely concerned with data pipelines. A will talk about geographic data cleaning, extracting mappable data from Google Directions API results in Python.
Slides available here.
- Thursday, 2 July 2020, 4-5pm
Heather McBrien, University of Toronto, Department of Statistical Sciences
Heather just graduated from the Statistics BSc program at the University of Toronto, and is interested in modelling in population health research, particularly using novel data sources to answer questions where traditional data is lacking. Heather will talk about how the data that we collect can bias the results that we obtain and our knowledge of the problem.
Slides available here.
- Thursday, 9 July 2020, 4-5pm
Roxanne Chui, University of Toronto, Faculty of Information
Roxanne is an emerging anthropological data science professional. She did her BSc program in Forensic anthropology and worked in the pharmaceutical industry before doing her Masters in data science. She is passionate about excavating context from data for predicting future patterns of human behaviour. Roxanne will talk about an EDA approach to Tokyo AirBnB datasets and pattern discovery in listing prices using R - ‘What do we have here among millions of observations?’
Slides available here.
- Thursday, 16 July 2020, 4-5pm
Casey Breen, University of California, Berkeley, Demography
Casey is a PhD student in the Demography Department at Berkeley. He previously worked at the Institute for Social Research and Data Innovation, home of IPUMS. Casey will talk about CenSoc, which is a project to link 1940 Census data with Social Security Administration mortality records in the US.
- Thursday, 23 July 2020, 11am-noon
Marta Kołczyńska, Institute of Political Studies of the Polish Academy of Sciences
Marta is an Assistant Professor at the Institute of Political Studies of the Polish Academy of Sciences and a visiting researcher in the Probabilistic Machine Learning Group, Department of Computer Science, Aalto University. Her research interests include comparative analyses of political attitudes and behavior across nations and over time, as well as the methodology of comparative research, in particular cross-national surveys. Marta will talk about cleaning survey data, in particular a project in which she gathers political trust items from different cross-national survey datasets to model time trends, and the tools she has developed to facilitate this work.
- Thursday, 30 July 2020, 4-5pm
Alex Luscombe, University of Toronto, Criminology and Sociolegal Studies
Alexander McClelland, Carleton University, Criminology and Criminal Justice
Alex Luscombe is a PhD student in the Centre for Criminology & Sociolegal Studies at the University of Toronto and a Junior Fellow at Massey College. Alexander McClelland is an Assistant Professor at the Institute of Criminology and Criminal Justice, Carleton University. They will talk about Policing the Pandemic, which is a project that was launched on 4 April, 2020, to track and visualize the massive and extraordinary expansions of police power in response to the COVID-19 Pandemic and the unequal patterns of enforcement that may arise as a result.
Slides available here.
- Thursday, 6 August 2020, 4-5pm
Sharla Gelfand, Freelance R Developer
Sharla is a freelance R developer specializing in enabling easy access to data and replacing manual, redundant processes with ones that are automated, reproducible, and repeatable. They will talk about creativity in R.
Slides available here.
- Thursday, 13 August 2020, 4-5pm
Richard Iannone, RStudio
Rich is a Software Engineer at RStudio. Rich will talk about pointblank, which is an R package that allows workflows involving nice and easy data validation in reproducible documents.
- Thursday, 20 August 2020, 4-5pm
Aije Egwaikhide, IBM
Aije Egwaikhide holds an undergraduate degree in Economics and Statistics from the University of Manitoba, and a post-graduate degree in Business Analytics from St. Lawrence College, Kingston. She works at IBM where she is a Lead Data Scientist on the System Enablement group. Aije will talk about preparing data for optical character recognition (OCR).
Winter 2020
- Friday, 24 January 2020, noon
Steven Pimentel, University of Toronto, business intelligence
- Friday, 31 January 2020, noon
Arik Senderovich, University of Toronto, Information
- Friday, 7 February 2020, noon
Kathy Chung, University of Toronto, Records of Early English Drama
- Friday, 14 February 2020, noon
Josh Harris, KOHO
- Friday, 28 February 2020, noon
Eugene Joh, St. Michael’s Hospital
- Friday, 6 March 2020, noon
Fatemeh Nargesian, University of Rochester, Computer Science
Fall 2019
- Thursday, 26 September 2019, noon
Periklis Andritsos, ODAIA & University of Toronto, Information
- Thursday, 10 October 2019, noon
Hassan Teimoori, Deloitte, Omnia AI
Ludovic Rheault, University of Toronto, Political Science
- Wednesday, 16 October 2019, noon
Lauren Kennedy, Columbia University
- Thursday, 24 October 2019, noon
Sharla Gelfand, Freelance R and Shiny developer
- Thursday, 7 November 2019, noon
Maria D’Angelo, Delphia
Hareem Naveed, Munich Re
- Thursday, 21 November 2019, noon
Michelle Alexopoulos, University of Toronto, Economics
Paraskevi Massara, University of Toronto, Medicine