We share best practice for data science, and are especially
interested in the data munging/cleaning/prep/comms stages that folks
typically don’t talk about. We meet for an hour each week and are
interested in anything that is data-focused across academia and
industry. Typically, Fridays at noon (Toronto time) via Zoom, with most
talks recorded and shared. All welcome. Free. Sign up here.
Overview
The Toronto Data Workshop (TDW) brings together academia and industry
to consider, collate, share, and disseminate best practices in doing
data science, especially in the data-centric steps of a data science
project: collection; cleaning; storage; retrieval; dissemination;
protection; and communication. We meet weekly for an hour and aim to
have a roughly even split of participants from academia and industry
over the course of each term. For an invitation please sign up here. Anyone is welcome
to attend - you don’t need to be affiliated with the university.
The TDW is a joint initiative between the Faculty of Information and
the Department of Statistical Sciences at the University of Toronto and
we especially thank Dean Wendy Duff and Chair Radu Craiu for their
support.
Fri 28 Jan 2022, noon - 1pm, Ashok Chaurasia,
University of Waterloo, ‘Multiple Imputation: Old and New Combining
Rules for Statistical Inference’
Dr. Ashok Chaurasia is an Assistant Professor (of Statistics) in the
School of Public Health Sciences at University of Waterloo. His
background/training is in Statistics, with research interests in topics
of Missing Data, Data Imputation, Model Selection, and Longitudinal Data
Analysis Methodology.
Fri 4 Feb 2022, noon - 1pm, Nick Huntington-Klein,
Seattle University
I am an economics professor at Seattle University, with research that
focuses on higher education, econometrics, and metascience.
Fri 11 Feb 2022, noon - 1pm, Silvia Canelón,
University of Pennsylvania, ‘Topic: Lessons Learned from EHR
Research’
Silvia Canelón is a postdoctoral research scientist in the Department of
Biostatistics, Epidemiology, and Informatics at the University of
Pennsylvania where she applies biomedical informatics to population
health research. She uses R to work on projects that develop novel data
mining methods to extract pregnancy-related information from Electronic
Health Records (EHR) and that study the relationship between environment
and disease.
Fri 18 Feb 2022, noon - 1pm, Vincent Arel-Bundock,
Université de Montréal, ‘What modelsummary taught me
about R package development’
I am a political science professor at the Université de Montréal.
Fri 25 Feb 2022, noon - 1pm, Reading week
break
Fri 4 Mar 2022, noon - 1pm, Maria Kamenetsky,
University of Wisconsin-Madison, ‘Spatial clustering’
I am a PhD candidate in Epidemiology at the University of
Wisconsin-Madison, where I also completed my MS in Statistics. My
research focuses on methods in spatial epidemiology, specifically
working on statistical methods and applications in spatial cluster
detection.
Fri 11 Mar 2022, noon - 1pm, Irena Papst, McMaster
University, ‘Some or all of: COVID-19 data, modelling, experiences
using models to help guide pandemic response’
I’m a postdoctoral fellow in McMaster’s Mathematics & Statistics
department, where I work on mathematical modelling, especially of
infectious disease dynamics. I did my PhD in Cornell’s Center for
Applied Mathematics. I care deeply about reproducible research, clear
scientific communication, good teaching, and big salads.
Fri 18 Mar 2022, noon - 1pm, May Chan and Ramses Van
Zon
Thu 24 Mar 2022, 5pm - 6pm, Emi Tanaka, Monash
University, ‘An anthology of experimental designs’
Dr. Emi Tanaka is an assistant professor in statistics at Monash
University whose primary interest is to develop impactful statistical
methods and tools that can readily be used by practitioners. Her
research area include data visualisation, mixed models and experimental
designs, motivated primarily by problems in bioinformatics and
agricultural sciences. She is currently the President of the Statistical
Society of Australia Victorian Branch and the recipient of the
Distinguished Presenter’s Award from the Statistical Society of
Australia for her delivery of a wide-range of R workshops.
Fri 1 Apr 2022, noon - 1pm, Brittany Witham,
Geopolitica, ‘Data science at a startup’
Originally from Melbourne, Australia, Brittany received her B.A. in
International Studies from the University of Saskatchewan and started
her career in economic development, equipping her with comprehensive
knowledge of foreign direct investment and international business early
in her career. She went on to obtain an M.A. in European and Russian
Affairs from the University of Toronto in 2018, where she first
discovered the potential of programming for political science and became
fascinated with artificial intelligence (AI). Over the past three years,
Brittany has worked in many facets of the AI industry, from leading
research and development of new AI products for video game developers to
building automated data pipelines for business intelligence and managing
software engineering and client engagement teams. In that time, she has
honed technical skills in full-stack development, machine learning, and
data engineering. She recently struck out on her own to launch an online
global event monitoring tool and deliver novel solutions to clients in
the political risk and social enterprise sectors. Brittany is a firm
believer in the potential for data-driven technologies for geopolitics
and is excited to contribute to the many discoveries to be made in this
space.
Past schedules
Fall 2021
This term is mostly a special series of talks featuring University of
Toronto speakers on the relationship between data science and their
other field of expertise.
Friday, 24 September 2021, noon - 1pm Karen Chapple, Department of Geography and
Planning/School of Cities Karen Chapple is the inaugural
Director of the School of Cities and Professor of Geography and Planning
at the University of Toronto. Her research uses data science methods to
identify and predict gentrification and displacement in cities. She is
Professor Emerita at the University of California, Berkeley, where she
helped to launch the undergraduate data science program.
Friday, 1 October 2021, noon - 1pm Special meeting on 2021 Canadian Election
Discussion and presentations by:
Eric Zhu, Brian Diep, Ashely (Jing Yuan) Zhang, Kristin (Xi Yu
Huang), and Tanvir Hyder on their model of the 2021 election.
Friday, 8 October 2021, noon - 1pm Fedor Dokshin, Department of Sociology Fedor Dokshin is an Assistant
Professor of Sociology at the University of Toronto. He is a
computational social scientist with research interests in social
networks, organizations, and energy and the environment. Across these
domains, Fedor leverages data science methods and novel data sources to
improve existing measurement strategies.
Friday, 15 October 2021, noon - 1pm Drew Stommes, Department of Political Science, Yale
University Drew Stommes is a doctoral
candidate in the Department of Political Science at Yale University,
where he researches democracy, political violence, and quantitative
methods. He will talk about a recent working paper, ‘On the
reliability of published findings using the regression discontinuity
design in political science’.
Friday, 22 October 2021, noon - 1pm Tegan Maharaj, Faculty of Information
I study AI systems and “what goes into” them, e.g. their real-world
deployment context, and the effects that has on learning behaviour and
generalization. I do that because I want to be able to use AI systems
responsibly for problems I think are important, like impact and risk
assessments for climate change, AI alignment, ecological management and
other common-good problems. My website is: http://www.teganmaharaj.com/.
Friday, 29 October 2021, noon - 1pm Josh Speagle, Astronomy & Astrophysics, Dunlap
Institute, Statistical Sciences Josh is a Banting &
Dunlap Postdoctoral Fellow at the University of Toronto whose research
focuses on using astrostatistics and “data science” to understand how
galaxies like our own Milky Way form, behave, and evolve.
Friday, 5 November 2021, noon - 1pm Yun William Yu, Math Department, UofT; UTSC Computer
& Mathematical Sciences Yun
William Yu is an assistant professor in the math department at UofT
whose research focuses on algorithmic methods for computational biology
and medical informatics.
Friday, 12 November 2021, noon - 1pm Ann Glusker, Doe Library, University of California
Berkeley
Dr Ann Glusker is Librarian for Sociology, Demography, Public Policy,
Psychology (fall 2021) & Quantitative Research at the Doe Library,
University of California, Berkeley. She will discuss a recently released
report
‘Supporting Big Data Research at the University of California,
Berkeley’. This report provides insights on researcher practices and
challenges in six thematic areas: data collection & processing;
analysis: methods, tools, infrastructure; research outputs;
collaboration; training; and balancing domain vs data science
expertise.
Friday, 19 November 2021, noon - 1pm Radu Craiu, Statistical Sciences @ U of T Dr. Radu V. Craiu is
Professor and Chair of Statistical Sciences at the University of
Toronto. His main research interests are in computational methods in
statistics, especially, Markov chain Monte Carlo algorithms (MCMC),
Bayesian inference, copula models, model selection procedures and
statistical genetics.
Friday, 26 November 2021, noon - 1pm Kieran Campbell, Lunenfeld Tanenbaum Research
Institute Dr. Kieran Campbell is an
investigator at the Lunenfeld-Tanenbaum Research Institute and an
assistant professor at the Departments of Molecular Genetics and
Statistical Sciences, University of Toronto. His research focusses on
Bayesian models and machine learning for high dimensional biomedical
data, including single-cell and cancer genomics. Recently, he has led
efforts to develop statistical machine learning methodology to integrate
single-cell RNA and DNA sequencing data to uncover the effects of tumour
clonal identity on gene expression, as well as methods to automatically
delineate the tumour microenvironment from single-cell RNA-sequencing
data. Such findings can improve our understanding of cancer progression
and of why certain tumours are resistant to therapies, leading to
relapse.
Friday, 3 December 2021, noon - 1pm Leanne Trimble, UofT Libraries
Leanne Trimble is a data librarian at the Map & Data Library,
University of Toronto Libraries.
Friday, 10 December 2021, noon - 1pm Nathan Taback, Departments of Statistical
Sciences Nathan Taback is the
director of Data Science programs and an Associate Professor, Teaching
Stream in the Department of Statistical Sciences, and Computer Science
(cross-appointed) at the University of Toronto. He currently serves as a
Special Advisor to the Dean of Arts and Science on Computational and
Data Science Education.
Friday, 21 May 2021, Noon - 1pm David Shor, OpenLabs,
Bio: David is an American data
scientist who tries to elect Democrats. He is known for analyzing
political polls and currently serves as head of data science with
OpenLabs, a progressive nonprofit, and also as a Senior Fellow with the
Center for American Progress Action Fund.
Topic: Political data science.
Recording: https://youtu.be/_IEPKapa9_0
Friday, 28 May 2021, Noon - 1pm Samantha Pierre, University of Toronto,
Bio: Samantha is a fourth-year statistics student studying at the
University of Toronto. Throughout the past year she has combined her
love for theatre and statistics to analyze trends in the theatre
community. She volunteers as a member of PAIR-CG to create a
representational framework for the international theatre community. She
currently works at WOMBO, an app developed by former U of T students, as
head of music content.
Topic: And The Nominees Are… An Empirical Study of the Effects of a Tony
Award Win and Nomination on a Show’s Success.
Recording: https://youtu.be/rFojvBN0qGk
Friday, 4 June 2021, Noon - 1pm Heather Krause, We All Count,
Bio: Heather remains unconvinced.
As a statistician with decades of global experience working on complex
data problems and producing real-world knowledge, she has developed the
Data Equity Framework to address the equity issues in data products and
research projects. Her emphasis is on combining strong statistical
analysis with clear and meaningful communication. She is currently
working on implementing tools for equity and ethics in data. As the
founder of two successful data science companies, she attacks the
largest questions facing societies today, working with both civic and
corporate organizations to improve outcomes and lives. Her relentless
pursuit of clarity and realism in these projects pushed her beyond pure
analysis to mastering the entire data ecosystem including award-winning
work in data sourcing, modeling, and data storytelling, each
incorporating bleeding edge theory and technologies. Heather is the
founder of We All Count, a project for equity in data working with teams
across the globe to embed a lens of ethics into their data products from
funding to data collection to statistical analysis and algorithmic
accountability. Her unique set of tools and contributions have been
sought across a range of clients from MasterCard and Volkswagen to the
United Nations, the Syrian Refugee Resettlement Secretariat, Airbnb, and
the Bill and Melinda Gates Foundation. She is on the Data Advisory Board
of the UNHCR.
Topic: Equity in Data (or, how not to accidentally use data like a
racist, sexist, colonialist, etc).
Recording: https://youtu.be/Yu_l8MpKK-E
Friday, 11 June 2021, Noon - 1pm Laura Bronner, Data scientist,
Bio: Laura is a data
scientist, who most recently worked as the quantitative editor at
FiveThirtyEight. More generally, she is a data scientist with an
interest in causal inference, political science and quantitative text
analysis. Before FiveThirtyEight, she was a Senior Analyst at the
Analyst Institute, designing and analyzing field experiments for the
2018 election cycle. In September 2018, she completed a PhD in Political
Science at the London School of Economics’ Department of
Government.
Topic: Quantitative editing.
Recording: https://youtu.be/LI5m9RzJgWc
Friday, 18 June 2021, Noon - 1pm Jacob Matson, Simetric,
Bio: Jacob is VP
of Finance & Operations at Simetric,
Inc..
Topic: From data ask to dashboard.
Recording: https://youtu.be/U8-6QKtWXCQ
Friday, 25 June 2021, Noon - 1pm Laura Derksen, University of Toronto Mississauga,
Bio: Laura is
the Amgen Canada Professor in Health System Strategy at the University
of Toronto Mississauga and assistant professor in Strategic Management
at the Rotman School of Management. Her research interests are
development and global health education, information and networks.
Topic: The impact of student access to Wikipedia. Recording: https://youtu.be/Coz-HFesTsw
Friday, 2 July 2021, Noon - 1pm Zachary McCaw, Google
Bio: Zachary McCaw is a data scientist at Google.
Friday, 16 July 2021, Noon - 1pm Kamilah Ebrahim, University of Toronto,
Bio: Kamilah Ebrahim received a B.A. in Economics from the University of
Waterloo in 2019 and is currently pursuing a Masters of Information in
Human Centred Data Science at the University of Toronto. Kamilah is a
2020-21 Graduate Fellow at the University of Toronto Centre for Ethics
focusing on the intersection between race, economics and data monopolies
in Canada. Prior to joining the University of Toronto she held roles at
the United Nation Economic and Social Commission for Asia and the
Pacific (UN ESCAP), as well as the Canadian federal government.
Topic: Trust in contact tracing apps.
Recording: https://youtu.be/f_3bpEeRdhI
Friday, 23 July 2021, Noon - 1pm Annie Collins & Rohan Alexander, University of
Toronto
Bio: Annie Collins is an undergraduate student at the University of
Toronto specializing in applied mathematics and statistics with a minor
in history and philosophy of science. In her free time, she focusses her
efforts on student governance, promoting women’s representation in STEM,
and working with data in the non-profit and charitable sector. Rohan
Alexander is an assistant professor at the University of Toronto in
Information and Statistical Sciences, and a faculty affiliate at the
Schwartz Reisman Institute for Technology and Society. He holds a PhD in
Economics from the Australian National University.
Topic: Reproducibility of COVID-19 pre-prints.
Recording: https://youtu.be/_ncpTbhe8qA
Friday, 30 July 2021, Noon - 1pm Keli Chiu, University of Toronto
Bio: Keli Chiu is a recent graduate
of master in Information at the University of Toronto with the
concentration in Human-Centred Data Science. Prior to pursuing her
master in the information fields, she worked as a web developer and fell
in love with data. Her research interests are natural language
processing applications, text analysis and ethics in AI and machine
learning. She received rstudio::global Diversity Scholarships in the
year of 2021.
Topic: Detecting sexist and racist text contents with explanations
accompanied with GPT-3
Recording: https://youtu.be/xmmoVD5zTOQ
Friday, 6 August 2021, 12:30pm - 1pm Ijeamaka Anyene, Kaiser Permanente Division of
Research,
Bio: Ijeamaka is a
data analyst working in healthcare research. She specializes in using R
and SAS for data analysis, epidemiological research, and data
visualizations. She is also passionate about computational art,
knowledge sharing / dissemination, and how to mix the two.
Topic: Taking the next step past standard charts.
Recording: https://youtu.be/LlVf8foXUmM
Friday, 13 August 2021, Noon - 1pm Student groups from the Independent Summer Statistics
Community, University of Toronto
‘Prospective Analytics’ comprised of Ashley Zhang, Eric Zhu, Muhammad
Tsany and Sergio Zheng Zhou.
‘Statistically Significant’ comprised of Aliza Lakho, Chloris Jiang,
Janhavi Agarwal, and José Casas on whether young professionals should
move to Toronto.
‘Point Zero Five’ comprised of Pan Chen, Xiaoxuan Han, Yi Qin, and Yini
Mao on the livability of Toronto for newcomers.
Recording: https://youtu.be/zkuMedB23f8
Friday, 20 August 2021, Noon - 1pm Students from Vianey Leos Barajas’ research group,
University of Toronto
Bio: Jessica Long, Simone Collier, Vinky Wang, Sophie Berkowitz, and
Yun-Hsiang Chan are undergraduate students at the University of
Toronto.
Topic: The presentations will use statistical model to analyzing shark,
lizard, and basketball movement data. The data was collected through
drones, accelerometers or video tracking software.
Recording: https://youtu.be/p697exbcMZE
Winter 2021
Thanks to Paul Hodgetts for the
Jays-inspired sticker.
Date
Speaker
Topic
Recording
Thu 14 Jan, 4:30-5:30pm
Andrew Miles, University of Toronto (jointly hosted with
the UTM Collaborative Digital Research Space.)
Annie Collins, Haoluan Chen, Isaac Ehrlich, Mariam Walaa, Marija
Pejcinovska, Mathew Wankiewicz, Michael Chong, Paul Hodgetts, Rohan
Alexander, Samantha-Jo Caetano, Shirley Deng, and Yena Joo,
University of Toronto
Aimee Schwab-McCoy, Creighton University, Ashley Juavinett,
UC San Diego, Chris Papalia, St. Andrew’s College,
Samantha-Jo Caetano, University of Toronto
Panel on teaching data-focused topics.
Thursday, 14 January, 4:30-5:30pm Andrew Miles, University of Toronto Jointly hosted with Elizabeth Parke and the UTM Collaborative
Digital Research Space. Andrew Miles is Assistant
Professor of Sociology at the University of Toronto and Director of the
Morality, Action, and Cognition Lab.
Topic: Code, plots, and values.
Recording: https://youtu.be/mdjOoKT-f7E
Wednesday, 20 January, 4:30-5:30pm Zia Babar, University of Toronto
Zia Babar obtained his PhD from the University of Toronto where his
research studies focused on the analysis and design of data-centered
information systems for enabling enterprise transformation. He is
engaged in a multi-year research engagement with IBM Research Labs and
is a startup technical mentor at WeWork Labs. He is the organizer of
technology meetup groups in both Toronto and Waterloo, and a course
instructor at the Faculty of Information, University of Toronto.
Zia will provide a background on data security approaches, and a
demonstration of machine learning and deep learning techniques that can
be used for providing derivative data security.
Thursday, 28 January, 4:30-5:30pm Irene Duah-Kessie, University of Toronto
Irene Duah-Kessie is a graduate of the University of Toronto’s Master of
Science in Sustainability Management program. Throughout her studies,
Irene published her research on racial income inequality in Toronto with
the Wellesley Institute and is currently a part of the Turtle Island
Journal of Indigenous Health Editorial Team. Irene is a Project Manager
at Across Boundaries leading an initiative to address food security and
mental health challenges in Toronto’s Black community. She is also the
founder of Rise In STEM, a grassroots organization that aims to increase
access to STEM learning opportunities in Black and marginalized
communities.
Topic: Exploring algorithmic bias and fairness and its impact on health
outcomes faced by racialized communities.
Thursday, 4 February, 4:30-5:30pm Kathy Ge, Uber
Kathy is a data scientist with Uber Eats primarily focused on the
shopping experience including ranking and recommendations throughout the
order flow. She received her M.Sc. in Computer Science and B.Sc in
Computer Science and Statistics from the University of Toronto.
Topic: How data insights and experimentation help drive product design
and intelligent recommendations on the Uber Eats platform.
Thursday, 11 February, 4:30-5:30pm Garrick Aden-Buie, R Studio Garrick is a Data Science
Educator at RStudio who lives in sunny St. Petersburg, Florida. His
passion is combining creative coding with programming education, using
code to build tools that teach coding to new and advanced R users alike.
Like tidyexplain: a project that used ggplot2 and gganimate to reimagine
database operations as colorful flying boxes instead of the typical Venn
diagrams. Garrick has developed a number of open source addins and
packages for RStudio—such regexplain, shrtcts and rsthemes—and is always
easily distracted by projects that combine R Markdown and online
learning or teaching.
Topic: Using R Markdown in general and in some specific projects.
Monday, 15 February, Noon-1:00pm Emily Riederer, Capital One Emily is a Senior
Analytics Manager at Capital One. Emily’s team focuses on reimagining
analytical infrastructure by building data products, elevating business
analysis with novel data sources and statistical methods, and providing
consultation and training to partner teams.
Topic: Causal
design patterns for data analysts.
Thursday, 18 February, 4:30-5:30pm Special guest Bethany White (Department of Statistical
Sciences). Annie Collins, Haoluan Chen, Isaac Ehrlich, Mariam Walaa, Marija
Pejcinovska, Mathew Wankiewicz, Michael Chong, Paul Hodgetts, Rohan
Alexander, Samantha-Jo Caetano, Shirley Deng, and Yena Joo,
University of Toronto
University of Toronto DoSS toolkit launch.
The DoSS toolkit is a series of self-paced lessons that students can go
through ahead of class, to achieve badges for various levels of
accomplishment with R. Instructors can use the badges to work out the
level of the class and either direct students to the toolkit to address
deficiencies or cover missing aspects themselves.
Thursday, 4 March, 4:30-5:30pm Petros Pechlivanoglou, The Hospital for Sick Children
(SickKids) Research Institute
Petros Pechlivanoglou, PhD, is a Scientist at The Hospital for Sick
Children (SickKids) Research Institute and an Assistant Professor at the
University of Toronto, Institute of Health Policy Management and
Evaluation. He studied economics in his native country, Greece,
econometrics at the University of Groningen, the Netherlands and
obtained a PhD in health econometrics from the same university. He
completed a post-doctoral fellowship at the University of Toronto,
within the Toronto Health Economics and Technology Assessment (THETA)
Collaborative where he focused on methodological aspects around the
application of decision analysis in health-care policy.
Petros will talk about marrying simulation modeling and retrospective
data for health economic decision making.
Thursday, 11 March, 4:30-5:30pm Lucas Cherkewski, Canadian Digital Service Lucas Cherkewski is a policy
advisor at the Canadian Digital Service (CDS). He helps delivery teams
improve government services. From that experience, he advises on
structural changes to make better services the default. This work
includes plenty of data-enabled research and analysis—Lucas is in a
happy place when his work leads him to spend an afternoon poking around
a dataset, trying to better understand government so he can help change
it.
Lucas will talk about a few of these small CDS research projects, using
publicly-available data to better understand the government’s
operations.
Monday, 15 March, 4:00-5:00pm Todd Feathers, Freelance reporter
Jointly hosted with Maryclare
Griffin, University of Massachusetts Amherst. Todd Feathers is a freelance
journalist covering artificial intelligence, surveillance, and the
technologies changing our world. He spent years at daily newspapers
reporting on politics, criminal justice, and health care. On every beat,
new tech is solving problems and creating them. His goal is to use data,
scientific research, and inside sources to cut through the hype and
examine what our gadgets and algorithms really do. Writing in Vice,
OneZero, The Wall Street Journal, and others.
Todd will talk about ‘Major
Universities are Using Race as a “High Impact Predictor” of Student
Success’ published in The Markup on 2 March 2021.
Thursday, 18 March, 4:30-5:30pm Sofia Ruiz-Suarez, National University of Comahue Sofia Ruiz-Suarez holds
an undergraduate degree in mathematics from the University of Buenos
Aires and now is a PhD candidate at the institute for Research on
Biodiversity and Environment. She also teaches mathematics at the
University of Comahue and leads R-Ladies at her local city. Her research
is focused on Bayesian statistics with applications in animal behaviour
and movement.
Sofia will talk about animal tracking data. She will explain how sensors
are used to track animal movement and activities, the characteristics of
this type of data and how to deal with it.
Thursday, 25 March, 4:30-5:30pm Alex Cookson, Muse Alex Cookson is a Data
Scientist at Muse, where he helps
make the most of their data. In his spare time, you can find him
participating in Tidy Tuesday or thinking up cool datasets to explore.
And when he’s not doing that, he’s probably cycling around Toronto or
doting on his two cats, Tom Tom and Ruby.
Alex will explore the power of great datasets, and discuss the
importance of interesting, fun datasets as a way to guide and motivate
learning R.
Thursday, 1 April, 4:30-5:30pm Vik Pant, Natural Resources Canada.
Dr. Vik Pant is the Chief Scientist and Chief Science Advisor of Natural
Resources Canada (NRCan). He leads the Office of the Chief Scientist at
NRCan and reports directly to the Deputy Minister. His office oversees
the development and implementation of evidence-based science policy
across NRCan sectors and agencies. His office also manages NRCan’s
enterprise-wide technology strategy and portfolio of science products.
He also runs the Digital Accelerator, which is an innovation platform
for designing and launching AI-driven software products in NRCan. Vik is
also the Founder of Synthetic Intelligence Forum, which is a leading
community of practice focused on the industrial application of
Artificial Intelligence (AI). He earned a doctorate from the Faculty of
Information (iSchool) in the University of Toronto, a master’s degree in
business administration with distinction from the University of London,
a master’s degree in information technology from Harvard University,
where he received the Dean’s List Academic Achievement Award, and an
undergraduate degree in business administration from Villanova
University. Vik serves as an Adjunct Professor in the Faculty of
Information (iSchool) at the University of Toronto.
Vik will discuss supporting the Integration of Science and Policy
through Data Science and Artificial Intelligence.
Thursday, 8 April, 4:30-5:30pm Faria Khandaker, University of Toronto
Faria is a 2nd year student of the Information Systems and Design
Concentration at the Faculty of Information and is one of the co-hosts
of the Toronto Data Workshop. She holds an Honour’s Bachelor of Science
Degree in Anthropology and Human Biology from the University of Toronto
Scarborough. Since starting her masters, she became interested in
research related to data-driven decision making within organizations.
Under the supervision of Professor Arik Senderovich, she is researching
topics related to the application of Machine Learning within the field
of Process Mining and exploring various methodologies for gaining
insights from email driven business processes.
Faria will discuss mining process models from email data.
Thursday, 15 April, 4:30-5:30pm Emily A. Sellars, Yale University Emily A. Sellars is an assistant
professor in the Department of Political Science at Yale University.
Before coming to Yale, she was an assistant professor in the Bush School
of Government and Public Service at Texas A&M University and a
postdoctoral scholar at the University of Chicago’s Harris School of
Public Policy. She received her Ph. D. in Political Science and
Agricultural and Applied Economics from the University of
Wisconsin–Madison in 2015. Her research interests are at the
intersection of political economy and development economics. Her
research examines the political economy of emigration and
population.
Emily will discuss missing data and mis-measurement in Mexico’s 1900
census and the Historical Archive of Localities (AHL).
Thursday, 22 April, 4:30-5:30pm Aimee Schwab-McCoy, Creighton University,
Ashley Juavinett, UC San Diego, Chris
Papalia, St. Andrew’s College, Samantha-Jo
Caetano, University of Toronto Aimee Schwab-McCoy is
an Assistant Professor of Statistics at Creighton University. Ashley Juavinett is a
neuroscientist, an educator, and a writer, currently working as an
Assistant Teaching Professor at UC San Diego. Chris Papalia is a
Mathematics and Science Teacher and Head of House at St. Andrew’s
College. Samantha-Jo Caetano is an Assistant Professor, Teaching Stream,
at the University of Toronto.
Panel on teaching data-focused topics.
Fall 2020
Thanks to Hidaya Ismail for the
brilliant maple leaf and dinosaur hex stickers.
Date (Toronto time)
Speaker
Topic
Recording
Thu, 3 Sep, 4-5pm
Erik Drysdale (The Hospital for Sick Children)
Using hospital data
Tue, 8 Sep, 3:30-4:30pm
Sophie Bennett (Industry data scientist)
UK A levels algorithm issues (jointly hosted with SRI)
Thu, 10 Sep, 4-5pm
A Mahfouz, Diego Mamanche Castellanos, Hidaya Ismail, Ke-Li Chiu
& Paul Hodgetts (University of Toronto)
Various R packages and research developed by students
Thu, 17 Sep, 4-5pm
Amber Simpson (Queen’s University)
Cancer and AI
Thu, 24 Sep, 4-5pm
Chelsea Parlett-Pelleriti (Chapman University)
Talking to non-statisticians about statistics
Thu, 1 Oct, 4-5pm
Florence Vallée-Dubois (Université de Montréal)
Canadian demographics by riding (1991-2015)
Thu, 8 Oct, 4-5pm
Yim Register (University of Washington Data Lab)
Self-advocacy within machine learning systems
Thu, 22 Oct, 4-5pm
Jeff Waldman, Leanne Trimble, Leslie Barnes, & Lisa Strug
(University of Toronto)
Thursday, 3 September 2020, 4-5pm Special guests Dean Wendy Duff (Faculty of Information) and
Department Chair Radu Craiu (Department of Statistical
Sciences). Erik Drysdale (The Hospital for Sick Children) Erik works as a Machine Learning
Specialist at the Hospital for Sick Children (SickKids) for the
Goldenberg Lab and AI in Medicine (AIM) initiative. His professional
responsibilities include the development and training of the machine
learning models for various pediatric data science projects. His
research interests are focused on the intersection of statistics and
machine learning methods such as high-dimensional inference, survival
analysis, and optimization methods. Erik will talk about the challenges
of applying ML in hospital data and why statistics still matters in
ML.
Tuesday, 8 September 2020, 3:30-4:30pm Jointly hosted with Gillian Hadfield and the Schwartz Reisman Institute for
Technology and Society. Sophie Bennett (Industry data scientist) Sophie Bennett holds
an undergraduate degree in Experimental Psychology from the University
of Oxford and a PhD in Neuroscience from King’s College. She is the lead
data scientist at Up Learn, a London-based online learning platform
specialising in A levels. In this role, she conducts evaluations of
course effectiveness and uses data to improve instruction and curriculum
design. She is passionate about increasing the use of responsible
evidence and statistics to guide social policy, and, in her spare time,
enjoys working with publicly available datasets to explore London
demographics, social issues and infrastructure. Sophie will discuss A
Levels, Ofqual and algorithms.
Thursday, 10 September 2020, 4-5pm Toronto Data Lab launch event. A Mahfouz (University of Toronto) Diego Mamanche Castellanos (University of
Toronto) Hidaya Ismail (University of Toronto) Ke-Li Chiu (University of Toronto) Paul Hodgetts (University of Toronto)
Various Toronto Data Lab projects will be presented including
arxivdl, aRianna, cesR, and
more!
Thursday, 17 September 2020, 4-5pm Amber Simpson (Queen’s University) Dr. Simpson is the Canada Research
Chair in Biomedical Computing and Informatics and Associate Professor in
the School of Computing (Faculty of Arts and Science) and Department of
Biomedical and Molecular Sciences (Faculty of Health Sciences). She
specializes in biomedical data science and computer-aided surgery. Her
research group is focused on developing novel computational strategies
for improving human health. Dr Simpson will discuss cancer and AI.
Thursday, 24 September 2020, 4-5pm Chelsea Parlett-Pelleriti (Chapman University) Chelsea is a PhD
candidate and full-time instructional faculty at Chapman University
where her research focuses on using novel statistical and Machine
Learning methods (mostly Bayesian statistics, IRT models, and
clustering) to behavioral data. As an instructor she teaches Python, R
and Data Science, and loves using novel technology (like TikTok, Twitch,
and flipped classes) to better engage and inspire students. Chelsea will
discuss talking to non-DS team members about DS topics.
Thursday, 1 October 2020, 4-5pm Florence Vallée-Dubois (Université de Montréal) Florence Vallée-Dubois
is a Ph.D. candidate at the department of political science of the
University of Montreal. She is also a member of the Centre for the Study
of Democratic Citizenship and Canada Research Chair in Electoral
Democracy. Her research interests focus on Quebec and Canadian politics,
political behaviour and quantitative methods. Her doctoral project
focuses on the political behaviour and democratic representation of
seniors in Canada. Florence will discuss Canadian Demographics by
Electoral Riding (1991-2015).
Thursday, 8 October 2020, 4-5pm Yim Register (University of Washington Data Lab) Yim Register (they/them)
is a radical optimist, child advocate, and PhD student at the University
of Washington Data Lab exploring what self-advocacy looks like within
machine learning systems. They study how empowering novices with Data
Science knowledge can impact their participation and joy in an AI-driven
world! Their passion project right now is writing a book called Life
Lessons from Algorithms, a book that teaches how machine learning
algorithms work through trauma recovery skills. Yim will discuss
self-advocacy within machine learning systems.
Thursday, 22 October 2020, 4-5pm Panel discussion on data-focused resources at the University of
Toronto. Jeff Waldman (University of Toronto) Leanne Trimble (University of Toronto) Leslie Barnes (University of Toronto) Lisa Strug (University of Toronto)
Jeff Waldman is the Manager, Institutional Data Governance; Leslie
Barnes is the Digital Scholarship Librarian at UTL; Leanne Trimble is
the Data and Statistics Librarian at UTL; Lisa Strug is a Senior
Scientist in the Program of Genetics and Genome Biology, Associate
Director of The Centre for Applied Genomics, Professor of Statistical
Sciences and Biostatistics at the University of Toronto, and Director of
CANSSI Ontario.
Thursday, 29 October 2020, 4-5pm Fei Chiang (McMaster University) Fei Chiang is an
Associate Professor in the Department of Computing and Software (Faculty
of Engineering), the Director of the Data Science Research Group, and a
Faculty Fellow at the IBM Centre for Advanced Studies. She served as an
inaugural Associate Director of the MacData Institute. Her research
interests and industrial experience is in data management, spanning data
cleaning, data quality, data privacy, data fusion, and database systems.
Professor Chiang will discuss data currency and its applications.
Thursday, 5 November 2020, 4-5pm Andrew Whitby (Industry data scientist) Andrew is a data scientist and
economist currently looking for his next challenge. He is particularly
interested in the economics of technology, creativity, innovation and
growth. He wrote The Sum of the People: How the Census Has Shaped
Nations from the Ancient World to the Modern Age which was published in
March 2020. Previously, he worked as a Data Scientist at the World Bank,
and at Nesta, the UK’s innovation think tank. His academic background
combines economics, statistics and computer science. He completed his
doctoral research in the Department of Economics at the University of
Oxford. Andrew will discuss censuses.
Monday, 9 November 2020, 4-5pm Tom Cardoso (Globe and Mail) Tom
Cardoso is a crime and justice reporter and data journalist for The
Globe and Mail. Tom will discuss his Bias
Behind Bars series of articles which show Black and Indigenous
inmates in Canada are more likely to get worse scores than white
inmates, based solely on their race.
Thursday, 12 November 2020, 4-5pm Kevin Armstrong (University of Toronto)
Kevin Armstrong is a Masters of Information student at the University of
Toronto, and a data consultant for ‘Women’s Integrated Sexual Health’
(WISH)
- a three-year program delivering integrated health care in 16 countries
in Africa and South Asia. Kevin will discuss methods for measuring
poverty and use cases for NGOs.
Thursday, 19 November 2020, 4-5pm Michael Chong (University of Toronto) Michael is a PhD student
in the Department of Statistical Sciences at the University of Toronto
building models for demographic estimation. Previously, he completed his
BSc in Integrated Science at McMaster University. Michael will discuss
lessons from a high-throughput Bayesian modelling workflow
Thursday, 26 November 2020, 4-5pm Postponed
Thursday, 3 December 2020, 5-6pm Monica Alexander (University of Toronto) Monica Alexander is
an Assistant Professor in Statistical Sciences and Sociology at the
University of Toronto. She received her PhD in Demography from the
University of California, Berkeley. Her research interests include
statistical demography, mortality and health inequalities, and
computational social science. Monica will talk about using Facebook
advertising data to estimate migration. A recording of the talk is
available here: https://youtu.be/xM1vf_KT76g.
Thursday, 10 December 2020, 4-5pm Shabrina Mardevi (United Nations Population Fund and
University of Toronto) Romesh Silva (United Nations Population Fund)
Shabrina is a Masters of Information student at the University of
Toronto and a Population Data Estimation and Analysis Intern at the
United Nations Population Fund. Romesh holds a PhD in Demography from
the University of California, Berkeley, and is a Technical Specialist,
Health & Social Inequalities, at the United Nations Population
Fund.
Thursday, 17 December 2020, 4-5pm Panel discussion on teaching data-focused topics. Liza Bolton (University of Toronto) Maria Tackett (Duke University) Nathalie Moon (University of Toronto) Teon Brooks (Mozilla Firefox) Liza
Bolton is an Assistant Professor, Teaching Stream, at the University
of Toronto. Maria Tackett is
an Assistant Professor of the Practice in the Department of Statistical
Science at Duke University. Nathalie
Moon is an Assistant Professor, Teaching Stream, University of
Toronto. Teon L. Brooks, holds a
PhD in experimental psychology from NYU, and now works as a data
scientist for Mozilla Firefox. He also
serves as the technical advisor and President of BrainWaves, an NIH-funded
project to teach experimentation and cognitive neuroscience to high
school students in NYC, and has co-founded Computation in Education Labs
(CIEL), a nonprofit that aims to
further the mission of the BrainWaves project while focusing on data
science and computational thinking.
Summer 2020
Thursday, 21 May 2020, 4-5pm Rohan Alexander (University of Toronto,
Information) Rohan is a post-doctoral fellow at the
Faculty of Information, University of Toronto. He holds a PhD in
Economics from the Australian National University. He will talk about
getting data from PDFs into R, with an application to the Kenyan
census.
Thursday, 28 May 2020, 4-5pm Shiro Kuriwaki (Harvard University, Government) Shiro is a Ph.D. Candidate
at the Department of Government, Harvard University. His research
focuses on democratic representation in American Politics, for instance
cast vote records, public opinion, survey methods, and applied
statistics more generally. Shiro will bring together best practices for
organizing data and code in the social sciences that experts have
proposed with some of his own experience. He will propose a
project-oriented workflow that adopts a minimal and consistent file
organization structure within a single project, using RStudio Projects
and GitHub. He will then discuss how to organize multiple projects that
share common components, and propose the use of custom R packages to
share code and Dataverse to share large datasets. He will use some of
his own projects involving the Cooperative Congressional Election Study
(CCES), one of the largest political surveys of American Politics, as a
demonstration.
Recording available here.
Thursday, 4 June 2020, 4-5pm Marija Pejcinovska (University of Toronto, Statistical
Sciences)
Marija is a second-year Ph.D. student in Statistics at the University of
Toronto. Her research interests are in applied statistics, specifically
the application of Bayesian methods to data and modeling challenges that
arise in demography, public health, and certain areas of the social
sciences. Marija will talk about a current project with the World Health
Organization (WHO) focused on estimating global maternal mortality to
share her R workflow and the different tools and packages she’s found
helpful in the data processing stage. More specifically, she’ll be
sharing a few ways of dealing with text and date data in R.
Thursday, 11 June 2020, 4-5pm Harrison Jones (Deloitte) Harrison
is a Manager at Deloitte in Toronto, where he focuses on data analytics
and machine learning in the property & casualty insurance, life
insurance, health insurance, pensions, and the public sector. Harrison
will talk about using R with actuarial data.
Thursday, 18 June 2020 Cancelled in support of the Black Lives Matter movement
and to provide an opportunity for reflection and learning. We also announced
stipends in support of University of Toronto students or recent
graduates who identify as a member of a visible minority group,
racialized group, or as a person of colour.
Thursday, 25 June 2020, 4-5pm A Mahfouz (University of Toronto, Information)
A is a Master of Information student at the University of Toronto with a
background in geography. Their prior work has been largely concerned
with data pipelines. A will talk about geographic data cleaning,
extracting mappable data from Google Directions API results in
Python.
Slides available here.
Thursday, 2 July 2020, 4-5pm Heather McBrien (University of Toronto, Statistical
Sciences)
Heather just graduated from the Statistics BSc program at the University
of Toronto, and is interested in modelling in population health
research, particularly using novel data sources to answer questions
where traditional data is lacking. Heather will talk about how the data
that we collect can bias the results that we obtain and our knowledge of
the problem.
Slides available here.
Thursday, 9 July 2020, 4-5pm Roxanne Chui (University of Toronto, Information)
Roxanne is an emerging anthropological data science professional. She
did her BSc program in Forensic anthropology and worked in the
pharmaceutical industry before doing her Masters in data science. She is
passionate about excavating context from data for predicting future
patterns of human behaviour. Roxanne will talk about an EDA approach to
Tokyo AirBnB datasets and pattern discovery in listing prices using R -
‘What do we have here among millions of observations?’
Slides available here.
Thursday, 16 July 2020, 4-5pm Casey Breen (University of California, Berkeley,
Demography)
Casey is a PhD student in the Demography Department at Berkeley. He
previously worked at the Institute for Social Research and Data
Innovation, home of IPUMS. Casey will talk about CenSoc, which is a project to
link 1940 Census data with Social Security Administration mortality
records in the US.
Thursday, 23 July 2020, 11am-noon Marta Kołczyńska (Institute of Political Studies of the
Polish Academy of Sciences) Marta is an Assistant
Professor at the Institute of Political Studies of the Polish Academy of
Sciences and a visiting researcher in the Probabilistic Machine Learning
Group, Department of Computer Science, Aalto University. Her research
interests include comparative analyses of political attitudes and
behavior across nations and over time, as well as the methodology of
comparative research, in particular cross-national surveys. Marta will
talk about cleaning survey data, in particular a project in which she
gathers political trust items from different cross-national survey
datasets to model time trends, and the tools she has developed to
facilitate this work.
Thursday, 30 July 2020, 4-5pm Alex Luscombe (University of Toronto, Criminology and
Sociolegal Studies) Alexander McClelland (Carleton University, Criminology
and Criminal Justice) Alex Luscombe is a PhD student in
the Centre for Criminology & Sociolegal Studies at the University of
Toronto and a Junior Fellow at Massey College. Alexander McClelland is
an Assistant Professor at the Institute of Criminology and Criminal
Justice, Carleton University. They will talk about Policing the Pandemic,
which is a project that was launched on 4 April, 2020, to track and
visualize the massive and extraordinary expansions of police power in
response to the COVID-19 Pandemic and the unequal patterns of
enforcement that may arise as a result.
Slides available here.
Thursday, 6 August 2020, 4-5pm Sharla Gelfand (Freelance R Developer) Sharla is a freelance R developer
specializing in enabling easy access to data and replacing manual,
redundant processes with ones that are automated, reproducible, and
repeatable. They will talk about creativity in R. Slides available here.
Thursday, 13 August 2020, 4-5pm Richard Iannone (R Studio) Rich is a Software Engineer at R
Studio. Rich will talk about pointblank,
which is an R package that allows workflows involving nice and easy data
validation in reproducible documents.
Thursday, 20 August 2020, 4-5pm Aije Egwaikhide (IBM)
Aije Egwaikhide holds an undergraduate degree in Economics and
Statistics from the University of Manitoba, and a post-graduate degree
in Business Analytics from St. Lawrence College, Kingston. She works at
IBM where she is a Lead Data Scientist on the System Enablement group.
Aije will talk about preparing data for optical character recognition
(OCR).
Winter 2020
Noon, Friday, 24 January 2020 Steven
Pimentel (U of T, business intelligence)
Noon, Friday, 31 January 2020 Arik
Senderovich (U of T, Information)
Noon, Friday, 7 February 2020 Kathy
Chung (U of T, Records of Early English Drama)