Toronto Data Workshop

Overview

The Toronto Data Workshop brings together academia and industry to share data science and AI best practice. Our interests are broad, but we focus especially on the code- and data-centric aspects of a project that are often glossed over. We meet weekly for an hour and most talks are recorded.

If you would like to attend, please sign up here: https://rohanalexander.substack.com

Everyone is welcome—it is free and you do not need to be affiliated with the university.

The current organizing committee is Kelly Lyons and Rohan Alexander. Past committee members: Michaela Drouillard (2022-24), Amy Farrow (2021-22), Lorena Almaraz De La Garza (2021-22), and Faria Khandaker (2020-21).

2024

Past

Friday 29 November 2024, noon (EST)
- Caroline Weis, gsk.ai
- “MALDI-TOF MS based clinical antimicrobial resistance prediction with machine learning”
- Caroline Weis is a Senior AI/ML Engineer and team lead at gsk.ai. In 2021, she completed her PhD in Machine Learning for Computational Biology and Healthcare at ETH Zurich. Her research interests lie in the development of personalized healthcare through data analysis and machine learning on medical and biological data.
Friday 22 November 2024, noon (EST)
- Yiqin Fu, Stanford University
- “Understanding Chinese Industrial Policy Through Government Investments and Patents”
- Born and raised in China, Yiqin Fu spent many of her formative years in the U.S. and the U.K. Yiqin (pronounced ee-ching) is studying towards a Ph.D. in political science at Stanford University, after having worked as a research associate at Yale Law School’s Paul Tsai China Center in New Haven, Connecticut and Beijing, China. She holds a B.A. in Philosophy, Politics, and Economics from the University of Oxford and is broadly interested in innovation, U.S.-China relations, and comparative political and electoral systems.
Wednesday 13 November 2024, noon (EST)
- Sayash Kapoor, Princeton
- “Five surprises in evaluating AI agents”
- Sayash Kapoor is a computer science Ph.D. candidate at Princeton University’s Center for Information Technology Policy. His research focuses on the societal impact of AI. He has previously worked on AI in industry and academia, including Facebook, Columbia University, and EPFL Switzerland. He received a best paper award at ACM FAccT, an impact recognition award at ACM CSCW, and was included in TIME’s inaugural list of the 100 most influential people in AI. His recently released book with Arvind Narayanan, AI Snake Oil, looks critically at what AI can and cannot do.
Friday 8 November 2024, noon (EST)
- Jacob Baldwin, Pro Football Focus (PFF)
- “Pass Coverage Metrics in the NFL”
- Jacob Baldwin is a Senior Data Scientist at PFF. He holds an online M.S. degree in Applied Mathematics from the University of Washington, and graduated from Clarkson University with a B.S. in Physics, a B.S. in Applied Mathematics, and a minor in Computer Science.
Friday 1 November 2024, noon (EDT)
- Xiaojun Su, Unilever
- “Unlocking Business Value through Machine Learning Innovation”
- Xiaojun Su is a Machine Learning Lead at Horizon 3 Labs, Unilever, where she leads cross-functional teams of data engineers, software developers, data scientists, postgraduate researchers, and third-party vendors to launch in-house models that drive significant ROI. She holds an M.Sc. from the University of Toronto.
Friday 25 October 2024, noon (EDT)
- Jay Alammar, Cohere
- “Hands-On Large Language Models: Language Understanding and Generation”
- Jay Alammar is Director and Engineering Fellow at Cohere (pioneering provider of large language models as an API). In this role, he advises and educates enterprises and the developer community on using language models for practical use cases.
Friday 11 October 2024, noon (EDT)
- Rona Fang-Yu Hu, University of Michigan
- “Predicting the Unobserved: Integrating ‘Something Else’ Sexual Identity Responses in Health Disparity Studies Using Machine Learning and Resampling Techniques”
- Rona Hu is a second-year Master’s student in the Michigan Program in Survey and Data Science. She graduated from National Chengchi University with a B.S. in Psychology. Before coming to the United States for her graduate studies, she was a Research Associate and Chief Operating Officer at Quanthon Corporation in Taiwan. She has presented papers at the conference of the American Association for Public Opinion Research and the Joint Statistical Meetings since 2022.
Friday 4 October 2024, noon (EDT)
- Sean Taylor, Motif
- “Causal Discovery for Product Analytics”
- I will discuss leveraging causal discovery in product analytics to uncover new insights and improve product development. First, I introduce a framework for categorizing causal questions, highlighting common challenges in product analytics, and emphasizing the need for discovering new causes that drive outcomes. I will present a process for generating hypotheses about causal relationships that are likely to lead to successful ideas for product improvements that can be validated through experiments. The key methodological insight is to model event data before they are aggregated, which helps us isolate causal relationships between events and allows for bias reduction to be automated under certain assumptions.
- Sean Taylor is a data scientist, social scientist, statistician, and software developer. He mostly specializes in methods for solving causal inference and business decision problems, and is particularly interested in building tools for practitioners working on real-world problems. He is a co-founder and chief scientist at Motif.
Friday 27 September 2024, noon (EDT)
- Annie Collins, GivingTuesday
- “Leveraging Data for Generosity: GivingPulse and the GivingTuesday Data Commons”
- The American non-profit sector faces significant challenges in fully understanding the complex landscape in which it operates, particularly when it comes to accessing reliable data on individual giving in the U.S. At the same time, there is increasing interest in gaining a more comprehensive view of the generosity ecosystem—one that encompasses not only monetary donations to formal non-profits but also volunteerism, in-kind contributions, advocacy, and peer-to-peer giving. To this end, this presentation will explore the objectives of the GivingTuesday Data Commons, its core features, and potential use cases. Additionally, it will introduce the GivingPulse project, an ongoing weekly survey that tracks a broad spectrum of individual giving behaviours and related attitudes.
- Annie Collins is a Data Scientist at GivingTuesday, a US-based nonprofit focused on researching generosity and charitable giving behaviours. Beyond GivingTuesday, Annie has spent several years in data management and research roles within the Canadian nonprofit sector. She holds a Bachelor of Science in applied mathematics and statistics from the University of Toronto, and uses her experience to provide data for the public good and support a more data-driven social sector worldwide.
Friday 20 September 2024, noon (EDT)
- Sana Shams, University of British Columbia
- Waris Bhatia, University of British Columbia
- “Accessible Investigative Journalism: Navigating Canada’s Largest Corpus of Government Documents”
- Sana is a research fellow through UBC’s Data Science for Social Good Program. She is currently pursuing a BSc in cognitive systems with a minor in data science and is passionate about the intersection of technological development and ethical compliance, particularly in the fields of data science and machine learning. Waris is a Data Science Intern with the University of British Columbia’s (UBC) Data Science Institute. He is a senior undergraduate studying Computer Science at UBC.
- “Open By Default” (OBD) is a dataset from the Investigative Journalism Foundation which is Canada’s largest collection of government documents, comprising over 4.5 million pages of Access To Information and Privacy (ATIP) requests and corresponding government documentation. This project enhanced data capture using optical character recognition (OCR), improved search performance through Large Language Model (LLM) vectorization, and applied topic modelling to reveal the high-level subject matter represented in the OBD dataset. The final development of the project was a Retrieval Augmented Generation (RAG) LLM pipeline, which enables a chatbot to provide tailored, context-rich responses to user queries, paired with follow-up research directions.
Friday 19 July 2024, noon (EDT)
- Majeed Kazemitabaar, University of Toronto
- “Improving Steering and Verification in AI-Assisted Data Analysis”
- Majeed Kazemitabaar is a fourth-year Ph.D. student in Computer Science at the University of Toronto, working with Professor Tovi Grossman. His research in Human-Computer Interaction and Computer Science Education focuses on creating engaging computational learning experiences and has been published in CHI, UIST, and IDC, and earned Best Paper and Best Late-Breaking Work awards.
Thursday 18 July 2024, noon (EDT)
- Hannah Rose Kirk, Oxford Internet Institute, University of Oxford
- “What does it mean for AI to be ‘aligned’?”
- Alignment has become a buzzword that frequently surfaces throughout the AI ecosystem. But what does it mean to align LLMs to the hundreds of millions of people who now interact with their outputs, each with different preferences for language, conversational norms, value systems, and political beliefs? This talk will explore the gnarly concept of alignment, specifically for large language models (LLMs). Using insights from the recently released PRISM Alignment Dataset, we’ll cover key questions around empirical alignment efforts, such as how to collect human feedback using various scales and signals, where to focus human labour, who to ask for feedback, and whether to target objectives of personalised or collective alignment. Through more comprehensive and representative data, PRISM highlights the challenges of aligning LLMs to meet the diverse expectations and beliefs of a global audience.
- Hannah Rose Kirk is a PhD student at the University of Oxford, UK. Hannah’s research centres on the role of granular and diverse human feedback for aligning large language models. Her body of published work spans computational linguistics, computer vision, ethics and sociology, addressing a broad range of issues such as AI safety, bias, fairness, and hate speech from a multidisciplinary perspective. Hannah holds degrees from the University of Oxford, the University of Cambridge and Peking University. Alongside academia, she collaborates often with industry projects at Google, OpenAI and MetaAI, and previously worked as a Data Scientist in the Public Policy Team at The Alan Turing Institute.
Friday 12 July 2024, noon (EDT)
- Naman Jain, UC Berkeley
- “LiveCodeBench: Holistic and contamination free evaluation of large language models for code”
- In this talk we introduce LiveCodeBench, a comprehensive and contamination-free benchmark for LLMs in code, which continuously collects new problems from LeetCode, AtCoder, and CodeForces. LiveCodeBench evaluates a wide range of capabilities, including self-repair, code execution, and test output prediction. It currently hosts 400 coding problems published between May 2023 and May 2024. We evaluated 18 base LLMs and 34 instruction-tuned LLMs, presenting findings on contamination, performance comparisons, and potential overfitting.
- Naman Jain is a CS Ph.D. student at UC Berkeley, focusing on using machine learning to enhance developer productivity tools like program analysis, synthesis, and repair. He also explores how synthesis and verification can improve algorithm generalizability and explainability. He holds an undergraduate degree from IIT Bombay, where he researched NLP robustness and computer vision. Before his Ph.D., he was a predoctoral research fellow at Microsoft Research India, working on program repair, improving large language models, and learning decision trees.
Thursday 11 July 2024, 9am (EDT)
- Terry Yue Zhuo, Monash University
- “BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions”
- In this talk we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks. Our evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
- Terry Yue Zhuo is a PhD candidate in Computer Science at Monash University and the CSIRO’s Data61. He holds a Bachelor of Computer Science (Honours) from Monash University. He is additionally an associate member of the Sea AI Lab, a visiting scholar at Singapore Management University, and a research technician at CSIRO’s Data61. His research has been published at venues including EMNLP, ICLR, EACL, and TMLR.
Friday 5 July 2024, noon (EDT)
- Kobi Hackenburg, Oxford Internet Institute, University of Oxford
- “Evaluating the persuasive influence of political microtargeting with large language models”
- Advances in LLMs have raised concerns over scalable, personalized political persuasion. In this talk, based on a paper recently published in PNAS, we integrate user data into GPT-4 prompts in real-time, facilitating the live creation of messages tailored to persuade individual users on political issues. We then deploy this application at scale to test whether personalized, microtargeted messaging offers a persuasive advantage compared to nontargeted messaging. We find that while messages generated by GPT-4 were persuasive, in aggregate, the persuasive impact of microtargeted messages was not statistically different from that of nontargeted messages. These findings suggest—contrary to widespread speculation—that the influence of current LLMs may reside not in their ability to tailor messages to individuals but rather in the persuasiveness of their generic, nontargeted messages.
- Kobi Hackenburg is a PhD candidate in Social Data Science at the Oxford Internet Institute, University of Oxford. His doctoral research, funded by a Clarendon Scholarship and supervised by Helen Margetts and Scott Hale, investigates the persuasive influence of personalized AI systems. More broadly, his work lies at the intersection of computation, language, and society. Alongside his PhD, he works as a Doctoral Researcher in the Public Policy Programme at The Alan Turing Institute, the UK’s national institute for AI and data science.
Friday 28 June 2024, noon (EDT)
- Sasha Issenberg
- “The Lie Detectives: In Search of a Playbook for Winning Elections in the Disinformation Age”
- Sasha Issenberg is a journalist and author of “The Lie Detectives: In Search of a Playbook for Winning Elections in the Disinformation Age” and four previous books, most recently “The Engagement: America’s Quarter-Century Struggle Over Same-Sex Marriage.” He teaches in the UCLA Department of Political Science and is a correspondent for Monocle. His work has also appeared in New York, The New York Times Magazine and George, where he was a contributing editor.
Wednesday 26 June 2024, 1pm (EDT)
- Matthew Gentzkow, Stanford University
- “The effects of Facebook and Instagram on the 2020 election: A deactivation experiment”
- The talk will be based on a paper that studies the effect of Facebook and Instagram access on political beliefs, attitudes, and behavior by randomizing a subset of 19,857 Facebook users and 15,585 Instagram users to deactivate their accounts for 6 weeks before the 2020 U.S. election.
- Matthew Gentzkow is the Landau Professor of Technology and the Economy at Stanford University. He studies applied microeconomics with a focus on media and technology industries. He received the 2014 John Bates Clark Medal, given by the American Economic Association to the American economist under the age of forty who has made the most significant contribution to economic thought and knowledge. He is a member of the National Academy of Sciences, a fellow of the American Academy of Arts and Sciences, a fellow of the Econometric Society, a senior fellow at the Stanford Institute for Economic Policy Research, and the Editor of American Economic Review: Insights.
Friday 14 June 2024, noon (EDT)
- Jae Yeon Kim, SNF Agora Institute at Johns Hopkins University
- “Field experimentation in the U.S. safety net”
- Jae Yeon Kim is an incoming assistant research scientist at the SNF Agora Institute at Johns Hopkins University and a research fellow at the Center for Public Leadership at Harvard Kennedy School. Previously, he worked as a senior data scientist at Code for America, where he collaborated with all levels of the U.S. government to improve access to safety net programs. He completed his PhD in Political Science at the University of California, Berkeley in 2021.
Friday 7 June 2024, noon (EDT)
- Lars Vilhuber, Cornell University
- “Privacy protection in RCTs: The challenge of privacy protection in the field”
- Lars Vilhuber holds a Ph.D. in Economics from Université de Montréal, Canada, and is currently on the faculty of the Cornell University Economics Department. He has interests in labor economics, statistical disclosure limitation and data dissemination, and reproducibility and replicability in the social sciences. He is the Data Editor of the American Economic Association, and Managing Editor of the Journal of Privacy and Confidentiality.
Friday 31 May 2024, noon (EDT)
- Ethan Busby, Brigham Young University
- “AI-Enabled Persuasion Research: Experimenting with Effective Political Messaging”
- Ethan Busby is an Assistant Professor of Political Science at Brigham Young University, specializing in political psychology, extremism, artificial intelligence, and computational social science. His research relies on various methods, using lab experiments, quasi-experiments, survey experiments, text-as-data, surveys, artificial intelligence, and large language models. He studies extremism in democracies, including what extremism is, who people blame for extremism, and what encourages and discourages extremism.
Friday 24 May 2024, 10am (EDT)
- Belinda Li, MIT
- “Eliciting Human Preferences with Language Models”
- Belinda Li is a PhD candidate at MIT CSAIL, affiliated with the language & intelligence (LINGO) lab @ MIT. Her work focuses on improving the human-interpretability, reliability, and usability of language models: examining and improving representations of both (objective) world states and (subjective) human preferences in language models. She is funded by an NDSEG Fellowship and Clare Boothe Luce Graduate Fellowship. Previously, she spent a year at Facebook AI Applied Research, and before that, obtained her B.S. in Computer Science at the University of Washington.
Friday 17 May 2024, noon (EDT)
- Victoria Angelova, Harvard University
- “Algorithmic Recommendations and Human Discretion”
- Victoria Angelova is a fourth-year PhD student in Economics at Harvard University interested in Applied Microeconomics. She received an AB in Economics from Wellesley College in 2018. Prior to starting her PhD, she was a Research Assistant at the Industrial Relations Section at Princeton University.
Friday 10 May 2024, noon (EDT)
- Amanda Coston, Microsoft Research and Berkeley
- “Addressing validity in decision-making algorithms”
- Amanda Coston is a Postdoc at Microsoft Research in the Machine Learning and Statistics Team. In fall 2024 she will join the Department of Statistics at UC Berkeley as an Assistant Professor. Her work considers how – and when – machine learning and causal inference can improve decision-making in societally high-stakes settings. Her research addresses real-world data problems that challenge the validity, equity, and reliability of algorithmic decision support systems and data-driven policy-making. A central focus of her research is identifying when algorithms, data used for policy-making, and human decisions disproportionately impact marginalized groups. Amanda earned her PhD in Machine Learning and Public Policy at Carnegie Mellon University (CMU) where she was advised by Alexandra Chouldechova and Edward H. Kennedy. Amanda is a Rising Star in EECS, Machine Learning and Data Science, Meta Research PhD Fellow, NSF GRFP Fellow, K & L Gates Presidential Fellow in Ethics and Computational Technologies, and Tata Consultancy Services Presidential Fellow. Her work has been recognized by best paper awards and featured in The Wall Street Journal and VentureBeat.
Friday 3 May 2024, noon (EDT)
- Kosuke Imai, Harvard University
- “Does AI help humans make better decisions? A methodological framework for experimental evaluation”
- Kosuke Imai is Professor in the Department of Government and the Department of Statistics at Harvard University. He is also an affiliate of the Institute for Quantitative Social Science where his office is located. Before moving to Harvard in 2018, Imai taught at Princeton University for 15 years where he was the founding director of the Program in Statistics and Machine Learning. Imai specializes in the development of statistical methods and machine learning algorithms and their applications to social science research. His areas of expertise include causal inference, computational social science, and survey methodology. Imai leads the Algorithm-Assisted Redistricting Methodology Project (ALARM) and served as an expert witness for several high-profile legislative redistricting cases. In addition, he is the author of Quantitative Social Science: An Introduction (Princeton University Press, 2017). Outside of Harvard, Imai served as the President of the Society for Political Methodology from 2017 to 2019. His current research interests include: data-driven policy learning and evaluation, causal inference with high-dimensional and unstructured treatments (e.g., texts, images, videos, and maps), fairness and racial disparity analysis, algorithmic redistricting analysis, data fusion and record linkage, census and privacy.
Friday 26 April 2024, noon (EDT)
- Abel Brodeur, University of Ottawa
- “Mass Reproducibility and Replicability: A New Hope”
- Abel Brodeur is an Associate Professor in the Department of Economics at the University of Ottawa. He earned a Ph.D. from the Paris School of Economics in 2015, and participated in the European Doctoral Program at the London School of Economics. His research interests include applied microeconomics, with a focus on reproductions and replications in economic research. He has served as a guest editor for special issues dedicated to these topics in Economic Inquiry and Research & Politics. Brodeur is also the founder and chair of the Institute for Replication (I4R) and co-directs the Ottawa Applied Microeconomics Lab.
- The talk will be based on a recent paper available here.
Thursday 4 April 2024, noon - 1pm
- Lenny Bronner, The Washington Post
- “Election Modeling at The Washington Post”
- Lenny Bronner is a data scientist at The Washington Post, specializing in elections.
Thursday 28 March 2024, noon - 1pm
- Cameron Buckner, University of Houston
- “The philosophy of Large Language Models”
- Cameron Buckner is an Associate Professor of Philosophy at the University of Houston, and author of From Deep Learning to Rational Machines.
Thursday 21 March 2024, noon - 1pm
- Laura Plein, University of Luxembourg
- “Can LLMs demystify Bug Reports and translate them into Test Cases?”
- Laura Plein conducted this research as an intern in the TruX (Trustworthy Software Engineering) Team at the SnT, University of Luxembourg. Her research focuses on leveraging large language models to automate software engineering processes such as automatic test case generation and test input extraction in the context of automated program repair. She is currently affiliated with Saarland University.
Thursday 14 March 2024, noon - 1pm
- Tom Davidson, Rutgers University
- “Start Generating: Harnessing Generative Artificial Intelligence for Sociological Research”
- How can generative artificial intelligence (GAI) be used for sociological research? This talk explores applications to the study of text and images across multiple domains, including computational, qualitative, and experimental research. Drawing upon recent research and stylized experiments with DALL-E and GPT-4, I discuss the potential applications of text-to-text, image-to-text, and text-to-image models for sociological research. Across these areas, GAI can make advanced computational methods more efficient, flexible, and accessible. The paper also emphasizes several challenges raised by these technologies, including interpretability, transparency, reliability, reproducibility, ethics, and privacy, as well as the implications of bias and bias mitigation efforts and the trade-offs between proprietary models and open-source alternatives. When used with care, these technologies can help advance many different areas of sociological methodology, complementing and enhancing our existing toolkits. See: https://osf.io/preprints/socarxiv/u9nft.
- Thomas Davidson is an Assistant Professor of Sociology at Rutgers University—New Brunswick. He specializes in using computational methods and data from social media to analyze far-right activism, populism, and hate speech. He is currently researching the relationship between ranking and recommendation algorithms and activism, the applications of generative AI to sociological research, and the social context of content moderation. His work has been published in venues including Social Forces, Mobilization, and Socius.
Friday 8 March 2024, noon - 1pm
- Jonathan Mellon, West Point
- “Using LLMs to code open-text social survey responses at scale”
- We compare the accuracy of six LLMs using a few-shot approach, three supervised learning algorithms (SVM, DistilRoBERTa, and a neural network trained on BERT embeddings), and a second human coder on the task of categorizing “most important issue” responses from the British Election Study Internet Panel into 50 categories. For the scenario where a researcher lacks existing training data, the accuracy of the highest-performing LLM (Claude-1.3: 93.9%) neared human performance (94.7%) and exceeded the highest-performing supervised approach trained on 1000 randomly sampled cases (neural network: 93.5%). In a scenario where previous data has been labeled but a researcher wants to label novel text, the best LLM’s (Claude-1.3: 80.9%) few-shot performance is only slightly behind the human (88.6%) and exceeds the best supervised model trained on 576,000 cases (DistilRoBERTa: 77.8%). PaLM-2, Llama-2, and the SVM all performed substantially worse than the best LLMs and supervised models across all metrics and scenarios. Our results suggest that LLMs may allow for greater use of open-ended survey questions in the future. The open access paper, published in Research & Politics by Jonathan Mellon, Jack Bailey, Ralph Scott, James Breckwoldt, Marta Miori, and Phillip Schmedeman, comes with replication code and data (including all prompts, API calls, and local models).
- Jonathan Mellon is an Associate Professor at West Point’s Department of Systems Engineering and co-director of the British Election Study. His research focuses on improving measurement and causal inference in social science. He studies electoral behavior, online citizen engagement, and measuring public opinion. He holds a DPhil in Sociology from Nuffield College, University of Oxford.
Thursday 7 March 2024, noon - 1pm
- Matheus Facure, Nubank
- “Why Banking has the Coolest Stats/Data Science Problems”
- Matheus Facure has a background in Economics and is a Staff Data Scientist at Nubank. He works mostly with credit underwriting and causal inference. He is the author of Causal Inference in Python (O’Reilly) and Causal Inference for the Brave and True (Online, Open Source).
Friday 1 March 2024, noon - 1pm
- Jacob Austin, Google DeepMind
- “Resolving Code Review Comments with Machine Learning”
- Code reviews are a critical part of the software development process, taking a significant amount of the code authors’ and the code reviewers’ time. As part of this process, the reviewer inspects the proposed code and asks the author for code changes through comments written in natural language. With machine learning, we have an opportunity to automate and streamline the code-review process, e.g., by proposing code changes based on a comment’s text.
- Jacob Austin is a Senior Research Engineer at Google DeepMind, working on program synthesis and large language models. He was previously an AI Resident at Google Brain, and a Research Intern at NVIDIA with Anima Anandkumar. He holds a bachelor’s degree in computer science and mathematics from Columbia University. He studied machine learning and robotics at the Columbia Creative Machines Lab. He is also a pianist who plays a lot of chamber music, and has performed at Carnegie Hall, Music Mountain, Apple Hill, and Kinhaven.
Thursday 29 February 2024, noon - 1pm
- Sky CH-Wang, Columbia University
- “Do Androids Know They’re Only Dreaming of Electric Sheep?”
- Sky CH-Wang is a PhD Candidate in Computer Science at Columbia University advised by Zhou Yu and Smaranda Muresan. His research primarily revolves around Natural Language Processing (NLP). He is broadly interested in the area where NLP meets Computational Social Science (CSS). Here, his work spans three major areas: (1) revealing and designing for social difference and inequality, (2) cross-cultural NLP, and (3) mechanistic interpretability. His research is supported by an NSF Graduate Research Fellowship.
Thursday 22 February 2024, noon - 1pm
- Jessica Otis, George Mason University
- “By the Numbers: Numeracy, Religion, and the Quantitative Transformation of Early Modern England”
- Dr Jessica Marie Otis is Assistant Professor of History and Director of Public Projects at the Roy Rosenzweig Center for History and New Media at George Mason University. She is the author of the new book By the Numbers: Numeracy, Religion, and the Quantitative Transformation of Early Modern England published by Oxford University Press.
Thursday 15 February 2024, noon - 1pm
- Richard Iannone, Posit, PBC
- “Using Great Tables to Make Presentable Tables in Python”
- Rich is a software engineer at Posit, PBC (formerly RStudio). He focuses on making useful R and (more lately) Python packages for data analysis and presentation workflows.
Thursday 8 February 2024, noon - 1pm
- Andreas Florath, Deutsche Telekom
- “LLM Interactive Optimization of Open Source Python Libraries – Case Studies and Generalization”
- Andreas Florath is a Cloud Architect at Deutsche Telekom AG, with a focus on Edge-Cloud Continuum Architecture. His tenure of nearly 25 years at the company has seen him contribute significantly to the development of innovative technology solutions, particularly through his work with the Open Grid and EU Cloud Alliances. Andreas is proficient in complex system analysis and programming languages such as C, C++, and Python, which he leverages to directly implement and test architectures, especially in the realms of cloud federation and AI/ML applications within the telecommunications industry. His expertise is further underscored by his involvement in numerous projects and his commitment to advancing the field through research and collaboration.
Tuesday 6 February 2024, 6:10pm - 7pm
- Bradley Congelio, Kutztown University of Pennsylvania
- “Introduction to NFL Analytics”
- Bradley Congelio is an Assistant Professor in the College of Business at Kutztown University of Pennsylvania. His main area of instruction & research is in Data Analytics and Sport Analytics. He is the author of Introduction to NFL Analytics with R, which was published by CRC Press in December 2023. His research focuses on using big data, the R programming language, and analytics to explore the impact of professional stadiums on neighboring communities. He uses the proprietary Zillow ZTRAX database as well as the U.S. Census and other forms of data to create robust, applied, and useful insight into how best to protect those living in areas where stadiums are proposed for construction. His work in sport analytics, specifically the NFL, has been featured on numerous media outlets, including USA Today and Sports Illustrated.
Thursday 1 February 2024, noon - 1pm
- Oliver Giesecke, Stanford University
- “AI at the Frontiers of Economic Research”
- Oliver Giesecke is a research fellow at the Hoover Institution at Stanford University. Giesecke works on topics related to asset pricing and public finance. His recent work studies the finances of state and local governments across the United States. This includes the capital structure of state governments, the book and market equity position of city governments, and the status quo and trend of public pension obligations. For his work on city governments’ finances, he was awarded the NASDAQ OMX Award for the best paper on asset pricing. His work on pension obligations was instrumental to shaping state legislation. In addition, Giesecke has conducted a large-scale survey that elicits the retirement plan preferences of public sector employees across the United States. He is the author of the Stanford municipal finances dashboard which provides, for the first time, credit spreads and fiscal fundamentals for many state and local governments in the United States. The dashboard has received national media coverage in The Bond Buyer. Prior to his academic career, he worked for Germany’s Federal Agency for Financial Market Stabilization (FMSA) and as a senior quantitative finance consultant. Giesecke received a Ph.D. in finance and economics from Columbia University, a Master’s in economics from the Graduate Institute in Geneva, Switzerland, and a BA from Frankfurt University, Germany.
Thursday 25 January 2024, noon - 1pm
- Kristina Gligorić, Stanford University
- “In-class Data Analysis Replications: Teaching Students while Testing Science”
- Kristina Gligorić is a Postdoctoral Scholar at Stanford University Computer Science Department, advised by Dan Jurafsky at the NLP group. Previously she obtained her Ph.D. in Computer Science at EPFL, where she was advised by Robert West. Her research focuses on developing computational approaches to solve burning societal issues, understand and improve human well-being, and promote social good. She leverages large-scale textual data and digital behavioral traces and tailors computational methods drawn from AI, NLP, and causal inference. Her work has been published in top computer science conferences (such as ACM CSCW, AAAI ICWSM, and TheWebConf) and broad audience journals (Nature Communications and Nature Medicine). She is a Swiss National Science Foundation Fellow and University of Chicago Rising Star in Data Science. She received awards for her work, including EPFL Thesis Distinction, CSCW 2021 Best Paper Honorable Mention Award, ICWSM 2021 and 2023 Best Reviewer Award, and EPFL Best Teaching Assistant Award.
Thursday 18 January 2024, noon - 1pm
- Gregory Zuckerman, Wall Street Journal
- “Renaissance Technologies, data, and Wall Street”
- Gregory Zuckerman is a special writer at The Wall Street Journal and non-fiction author. His non-fiction books include “The Greatest Trade Ever: The Behind-the-Scenes Story of How John Paulson Defied Wall Street and Made Financial History”, “The Frackers: The Outrageous Inside Story of the New Billionaire Wildcatters”, two books for children, “The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution” and “A Shot to Save the World: The Inside Story of the Life-or-Death Race for a COVID-19 Vaccine”. He has been awarded the Gerald Loeb Award, the highest honor in business journalism, three times.

2023

Friday 24 November 2023, noon - 1pm
- Wendy Foster, Shopify
- “Socio-technical processes for data integrity”
- Wendy Foster is Director of Engineering and Data, Optimize, at Shopify.
Friday 17 November 2023, 1pm - 2pm
- Chiara Alcantara, University of Waterloo
- Franklin Ramirez, University of Waterloo
- Helena Xu, University of Waterloo
- “Exploring Alternatives to REST for Accessing Public Data Sets”
- In the past decade, governments, non-profits, and other civic institutions have increasingly been sharing data on a wide range of topics, supporting various sectors of society from private enterprise to academic research. However, using this data in a programmatic and reproducible fashion is hampered by the myriad nature of the APIs provided to access it. In this talk, we explore one potential alternative to the predominant REST paradigm: storing tabular data in the open Parquet file format on publicly accessible endpoints. By using this standardized file format and a simple backing infrastructure, new tools based on Apache Arrow, such as DuckDB and Polars, can be used to build reusable front ends and other cost-effective solutions. We present a case study based on Statistics Canada’s library of public data sets, showcasing how such a system can simplify initial selection and filtering through common SQL queries.
- Chiara, Franklin, and Helena are second-year computer science students at the University of Waterloo who are interested in applying new software and data engineering tooling to increase the accessibility of public data sets.
Friday 10 November 2023, noon - 1pm
- Alexander Coppock, Yale University
- “Research design in the social sciences”
- Alex Coppock is an Associate Professor (on term) of Political Science at Yale University. He is the author of Persuasion in Parallel: How Information Changes Minds about Politics and a member of the DeclareDesign team. He is a co-author of the recently published book Research Design in the Social Sciences: Declaration, Diagnosis, and Redesign.
Friday 3 November 2023, noon - 1pm
- Apoorva Lal, Netflix
- “Modern balancing methods for causal inference”
- Apoorva Lal is a data scientist on the experimentation team at Netflix, working on observational causal inference with spatial and panel data.
Friday 3 November 2023, 11am - noon
- John Yang, Princeton University
- “SWE-bench: Can Language Models Resolve Real-World GitHub Issues?”
- John Yang is a research assistant working at Princeton University, advised by Professor Karthik Narasimhan. He is interested in Language Grounding & Interaction, Benchmarks for LLMs, Software Engineering, and Code Generation.
Friday 27 October 2023, 11am - noon
- James Zou, Stanford University
- “Can large language models provide useful feedback on research papers? A large-scale empirical analysis”
- James Zou is an Assistant Professor of Biomedical Data Science and, by courtesy, of Computer Science and Electrical Engineering at Stanford University. He works on making machine learning more reliable, human-compatible and statistically rigorous, and is especially interested in applications in human disease and health. He is a two-time Chan-Zuckerberg Investigator and the faculty director of the university-wide Stanford Data4Health hub. His research is supported by the Sloan Fellowship, the NSF CAREER Award, and Google, Amazon and Adobe AI awards.
Friday 20 October 2023, noon - 1pm
- Marzieh Fadaee, Cohere for AI
- “Aya: An Open Science Initiative to Accelerate Multilingual AI Progress”
- Marzieh Fadaee is a Senior Research Scientist at Cohere For AI. She holds a PhD from the Language Technology Lab at the University of Amsterdam, where she developed models to understand and utilize interesting phenomena in data, and was advised by Christof Monz and Arianna Bisazza. She received a B.Sc. from Sharif University majoring in Computer Engineering and an M.Sc. from University of Tehran majoring in Artificial Intelligence.
Friday 20 October 2023, 11am - noon
- Yangjun Ruan, University of Toronto
- “Identifying the Risks of Language Model Agents with a Language Model Emulated Sandbox”
- Yangjun Ruan is a Ph.D. Candidate in Computer Science at the University of Toronto, where he is advised by Chris Maddison and Jimmy Ba. Previously, he was a student researcher at Google Research and a research intern at Microsoft Research. In summer 2019, he was a visiting student at UCLA, where he worked with Cho-Jui Hsieh. He obtained a Bachelor’s degree in Information Engineering from Zhejiang University.
Friday 13 October 2023, noon - 1pm
- Tom Cardoso, The Globe and Mail
- “Secret Canada: An investigation into Canada’s freedom of information systems”
- Tom Cardoso is a member of The Globe and Mail’s investigations team based in Toronto. Since the fall of 2021, he has been investigating Canada’s broken freedom of information systems.
Friday 6 October 2023, noon - 1pm
- Fabrizio Dell’Acqua, Harvard Business School
- “Experimental evidence on the effect of access to GPT-4 on performance”
- Fabrizio Dell’Acqua is a postdoctoral research fellow and teaching fellow at Harvard Business School and the Laboratory for Innovation Science at Harvard (LISH). He received a Ph.D. in Management from Columbia Business School. His research focuses on the areas of automation, human/AI collaboration, and business ethics.
Friday 29 September 2023, noon - 1pm
- Nima Sarajpoor, Manulife
- “STUMPY: A powerful tool for modern time series analysis”
- Nima Sarajpoor is a data scientist at Manulife, working in Fraud Detection. He has been contributing to the software STUMPY for about two years.
Friday 22 September 2023, noon - 1pm
- Saloni Dattani, Our World in Data
- “Missing data in global health”
- Saloni Dattani is a researcher on global health at Our World in Data, and an editor at Works in Progress. She’s interested in everything related to global health, epidemiology and meta-science.

2022

Friday, 16 December 2022, noon - 1pm
- Zane Schwartz, Investigative Journalism Foundation
- Zane Schwartz is the editor-in-chief of the Investigative Journalism Foundation.
Friday, 25 November 2022, noon - 1pm
- Marcel Fortin, U of T Map & Data Library
- Leanne Trimble, U of T Map & Data Library
- Recent additions to the Map and Data Library
- The Map and Data Library’s data collections, software & support, with a focus on recently acquired datasets.
- Leanne Trimble is a Data & Statistics Librarian, and Marcel Fortin is Head, Map and Data Library.
Friday, 18 November 2022, noon - 1pm
- Lindsay Katz, University of Toronto
- A new, comprehensive database of all proceedings of the Australian Parliamentary Debates
- Lindsay Katz holds a Masters of Statistics from the University of Toronto and a Bachelor of Arts and Science from the University of Guelph where she specialized in Mathematical Science and International Development. At Guelph she worked with Professor Ryan Briggs to explore lived poverty in Africa using Afrobarometer data. At Toronto she works with Professor Monica Alexander to research demographic variation in short-term migration patterns using Facebook data, and with Professor Rohan Alexander to digitize the Australian parliamentary debates from 1901 to present. As an interdisciplinary researcher, she is interested in using statistics to better understand social processes in the world.
Friday, 4 November 2022, noon - 1pm
- Maitreyee Sidhaye, St. Michael’s Hospital, Unity Health Toronto
- Meggie Debnath, St. Michael’s Hospital, Unity Health Toronto
- The Things We Learned from Deploying AI in Healthcare
- Maitreyee and Meggie are data scientists in the Data Science & Advanced Analytics (DSAA) unit at St. Michael’s Hospital in Toronto. DSAA is a multiteam unit in the hospital that provides data science and machine learning solutions across a variety of problems: clinical prediction tools, staffing optimization, imaging, and more. DSAA works very closely with others in the hospital to collaborate on understanding problems and providing solutions. Maitreyee and Meggie will share their experiences and learnings from building and deploying machine learning tools in the hospital.
Friday, 21 October 2022, noon - 1pm
- Meg Risdal, Kaggle (Google)
- Meg Risdal is a lead Product Manager for Kaggle (a Google company) where she works with software developers, designers, and researchers to create great experiences for people learning ML, ML practitioners, and ML researchers.
Friday, 14 October 2022, noon - 1pm
- April Wang, University of Michigan
- Reimagining Tools for Collaborative Data Science
- April Wang is a Ph.D. candidate at University of Michigan School of Information, advised by Dr. Steve Oney and Dr. Christopher Brooks. Her research in human-computer interaction (HCI) explores barriers in real-world data science programming practices, and reimagines the workflow and interfaces for collaborative data science environments.
Friday, 7 October 2022, noon - 1pm
- Rohan Alexander, University of Toronto
- Rethinking data science
Friday, 30 September 2022, noon - 1pm
- Emily Giambalvo, The Washington Post
- Clara Ence Morse, The Washington Post
- How the NFL blocks Black coaches
- Emily Giambalvo covers University of Maryland athletics for The Washington Post, where she has worked since June 2018. Emily grew up in South Carolina and graduated from the University of Georgia.
- Clara Ence Morse is an Investigative Reporting Workshop intern with The Washington Post’s data desk. She is a student at Columbia University and the editor in chief of the Columbia Daily Spectator.
Thursday, 22 September 2022, 5pm - 6pm
- Melina Vidoni, Australian National University
- Dr Vidoni is a Lecturer at the Australian National University in the CECS School of Computing, where she continues her domestic and international collaborations with Canada and Germany. Dr Vidoni’s main research interests are mining software repositories, technical debt, software development, and empirical software engineering when applied to data science and scientific software.
Friday, 9 September 2022, noon - 1pm
- Ryan Briggs, University of Guelph
- Statistical power in political science
- Ryan Briggs is a social scientist at the University of Guelph.
Friday, 1 April 2022, noon - 1pm
- Brittany Witham, Geopolitica
- Data science at a startup
- Originally from Melbourne, Australia, Brittany received her B.A. in International Studies from the University of Saskatchewan and started her career in economic development, equipping her with comprehensive knowledge of foreign direct investment and international business early in her career. She went on to obtain an M.A. in European and Russian Affairs from the University of Toronto in 2018, where she first discovered the potential of programming for political science and became fascinated with artificial intelligence (AI). Over the past three years, Brittany has worked in many facets of the AI industry, from leading research and development of new AI products for video game developers to building automated data pipelines for business intelligence and managing software engineering and client engagement teams. In that time, she has honed technical skills in full-stack development, machine learning, and data engineering. She recently struck out on her own to launch an online global event monitoring tool and deliver novel solutions to clients in the political risk and social enterprise sectors. Brittany is a firm believer in the potential for data-driven technologies for geopolitics and is excited to contribute to the many discoveries to be made in this space.
Thursday, 24 March 2022, 5pm - 6pm
- Emi Tanaka, Monash University
- An anthology of experimental designs
- Dr. Emi Tanaka is an assistant professor in statistics at Monash University whose primary interest is to develop impactful statistical methods and tools that can readily be used by practitioners. Her research areas include data visualisation, mixed models and experimental designs, motivated primarily by problems in bioinformatics and agricultural sciences. She is currently the President of the Statistical Society of Australia Victorian Branch and the recipient of the Distinguished Presenter’s Award from the Statistical Society of Australia for her delivery of a wide range of R workshops.
Friday, 18 March 2022, noon - 1pm
- May Chan, University of Toronto, Library
- Ramses Van Zon, University of Toronto, SciNet
Friday, 11 March 2022, noon - 1pm
- Irena Papst, McMaster University
- Using models to help guide pandemic response
- I’m a postdoctoral fellow in McMaster’s Mathematics & Statistics department, where I work on mathematical modelling, especially of infectious disease dynamics. I did my PhD in Cornell’s Center for Applied Mathematics. I care deeply about reproducible research, clear scientific communication, good teaching, and big salads.
Friday, 4 March 2022, noon - 1pm
- Maria Kamenetsky, University of Wisconsin-Madison
- Spatial clustering
- I am a PhD candidate in Epidemiology at the University of Wisconsin-Madison, where I also completed my MS in Statistics. My research focuses on methods in spatial epidemiology, specifically working on statistical methods and applications in spatial cluster detection.
Friday, 18 February 2022, noon - 1pm
- Vincent Arel-Bundock, Université de Montréal
- What modelsummary taught me about R package development
- I am a political science professor at the Université de Montréal.
Friday, 11 February 2022, noon - 1pm
- Silvia Canelón, University of Pennsylvania
- Lessons Learned from EHR Research
- Silvia Canelón is a postdoctoral research scientist in the Department of Biostatistics, Epidemiology, and Informatics at the University of Pennsylvania where she applies biomedical informatics to population health research. She uses R to work on projects that develop novel data mining methods to extract pregnancy-related information from Electronic Health Records (EHR) and that study the relationship between environment and disease.
Friday, 4 February 2022, noon - 1pm
- Nick Huntington-Klein, Seattle University
- The Influence of Hidden Researcher Decisions in Applied Microeconomics
- I am an economics professor at Seattle University, with research that focuses on higher education, econometrics, and metascience.
Friday, 28 January 2022, noon - 1pm
- Ashok Chaurasia, University of Waterloo
- Multiple Imputation: Old and New Combining Rules for Statistical Inference
- Dr. Ashok Chaurasia is an Assistant Professor (of Statistics) in the School of Public Health Sciences at the University of Waterloo. His background/training is in Statistics, with research interests in topics of Missing Data, Data Imputation, Model Selection, and Longitudinal Data Analysis Methodology.

2021

Friday, 10 December 2021, noon - 1pm
- Nathan Taback, Departments of Statistical Sciences
- “Teaching data science”
- Nathan Taback is the director of Data Science programs and an Associate Professor, Teaching Stream in the Department of Statistical Sciences, and Computer Science (cross-appointed) at the University of Toronto. He currently serves as a Special Advisor to the Dean of Arts and Science on Computational and Data Science Education.
Friday, 3 December 2021, noon - 1pm
- Leanne Trimble, UofT Libraries
- “Data science and libraries”
- Leanne Trimble is a data librarian at the Map & Data Library, University of Toronto Libraries.
Friday, 26 November 2021, noon - 1pm
- Kieran Campbell, Lunenfeld-Tanenbaum Research Institute
- “Data science and Biomedicine”
- Dr Kieran Campbell is an investigator at the Lunenfeld-Tanenbaum Research Institute and an assistant professor at the Departments of Molecular Genetics and Statistical Sciences, University of Toronto. His research focusses on Bayesian models and machine learning for high dimensional biomedical data, including single-cell and cancer genomics. Recently, he has led efforts to develop statistical machine learning methodology to integrate single-cell RNA and DNA sequencing data to uncover the effects of tumour clonal identity on gene expression, as well as methods to automatically delineate the tumour microenvironment from single-cell RNA-sequencing data. Such findings can improve our understanding of cancer progression and of why certain tumours are resistant to therapies, leading to relapse.
Friday, 19 November 2021, noon - 1pm
- Radu Craiu, Statistical Sciences, University of Toronto
- “Data science and statistical sciences”
- Dr Radu V. Craiu is Professor and Chair of Statistical Sciences at the University of Toronto. His main research interests are in computational methods in statistics, especially, Markov chain Monte Carlo algorithms (MCMC), Bayesian inference, copula models, model selection procedures and statistical genetics.
Friday, 12 November 2021, noon - 1pm
- Ann Glusker, Doe Library, University of California Berkeley
- “Supporting Big Data Research at the University of California, Berkeley”
- Dr Ann Glusker is Librarian for Sociology, Demography, Public Policy, Psychology (fall 2021) & Quantitative Research at the Doe Library, University of California, Berkeley. She will discuss a recently released report ‘Supporting Big Data Research at the University of California, Berkeley’. This report provides insights on researcher practices and challenges in six thematic areas: data collection & processing; analysis: methods, tools, infrastructure; research outputs; collaboration; training; and balancing domain vs data science expertise.
Friday, 5 November 2021, noon - 1pm
- Yun William Yu, Math Department, UofT; UTSC Computer & Mathematical Sciences
- “Data science and math”
- Yun William Yu is an assistant professor in the math department at UofT whose research focuses on algorithmic methods for computational biology and medical informatics.
Friday, 29 October 2021, noon - 1pm
- Josh Speagle, Astronomy & Astrophysics, Dunlap Institute, Statistical Sciences
- “Data science and astronomy”
- Josh is a Banting & Dunlap Postdoctoral Fellow at the University of Toronto whose research focuses on using astrostatistics and “data science” to understand how galaxies like our own Milky Way form, behave, and evolve.
Friday, 22 October 2021, noon - 1pm
- Tegan Maharaj, Faculty of Information
- “Data science and information”
- I study AI systems and “what goes into” them, e.g. their real-world deployment context, and the effects that has on learning behaviour and generalization. I do that because I want to be able to use AI systems responsibly for problems I think are important, like impact and risk assessments for climate change, AI alignment, ecological management and other common-good problems. My website is: http://www.teganmaharaj.com/.
Friday, 15 October 2021, noon - 1pm
- Drew Stommes, Department of Political Science, Yale University
- “On the reliability of published findings using the regression discontinuity design in political science”
- Drew Stommes is a doctoral candidate in the Department of Political Science at Yale University, where he researches democracy, political violence, and quantitative methods. He will talk about a recent working paper.
Friday, 8 October 2021, noon - 1pm
- Fedor Dokshin, Department of Sociology
- “Data science and sociology”
- Fedor Dokshin is an Assistant Professor of Sociology at the University of Toronto. He is a computational social scientist with research interests in social networks, organizations, and energy and the environment. Across these domains, Fedor leverages data science methods and novel data sources to improve existing measurement strategies.
Friday, 1 October 2021, noon - 1pm
- “The 2021 Canadian Election”
- David Andrews
- Daniel Rubenson
- Johnson Vo
- Eric Zhu
- Brian Diep
- Ashley (Jing Yuan) Zhang
- Kristin (Xi Yu) Huang
- Tanvir Hyder
Friday, 24 September 2021, noon - 1pm
- Karen Chapple, Department of Geography and Planning/School of Cities
- “Data science and geography, planning, cities”
- Karen Chapple is the inaugural Director of the School of Cities and Professor of Geography and Planning at the University of Toronto. Her research uses data science methods to identify and predict gentrification and displacement in cities. She is Professor Emerita at the University of California, Berkeley, where she helped to launch the undergraduate data science program.
Friday, 20 August 2021, noon - 1pm
- Jessica Long, Simone Collier, Vinky Wang, Sophie Berkowitz, and Yun-Hsiang Chan, University of Toronto
- “Using statistical models to analyze shark, lizard, and basketball movement data”
- Jessica Long, Simone Collier, Vinky Wang, Sophie Berkowitz, and Yun-Hsiang Chan are undergraduate students at the University of Toronto.
Friday, 13 August 2021, noon - 1pm
- “Independent Summer Statistics Community projects”
- ‘Prospective Analytics’: Ashley Zhang, Eric Zhu, Muhammad Tsany and Sergio Zheng Zhou.
- ‘Statistically Significant’: Aliza Lakho, Chloris Jiang, Janhavi Agarwal, and José Casas on whether young professionals should move to Toronto.
- ‘Point Zero Five’: Pan Chen, Xiaoxuan Han, Yi Qin, and Yini Mao on the livability of Toronto for newcomers.
Friday, 6 August 2021, 12:30pm - 1pm
- Ijeamaka Anyene, Kaiser Permanente Division of Research
- “Taking the next step past standard charts”
- Ijeamaka is a data analyst working in healthcare research. She specializes in using R and SAS for data analysis, epidemiological research, and data visualizations. She is also passionate about computational art, knowledge sharing / dissemination, and how to mix the two.
Friday, 30 July 2021, noon - 1pm
- Keli Chiu, University of Toronto
- “Detecting sexist and racist text contents with explanations accompanied with GPT-3”
- Keli Chiu is a recent graduate of the Master of Information program at the University of Toronto, with a concentration in Human-Centred Data Science. Prior to pursuing her master’s in the information field, she worked as a web developer and fell in love with data. Her research interests are natural language processing applications, text analysis, and ethics in AI and machine learning. She received an rstudio::global Diversity Scholarship in 2021.
Friday, 23 July 2021, noon - 1pm
- Annie Collins and Rohan Alexander, University of Toronto
- “Reproducibility of COVID-19 pre-prints”
- Annie Collins is an undergraduate student at the University of Toronto specializing in applied mathematics and statistics with a minor in history and philosophy of science. In her free time, she focusses her efforts on student governance, promoting women’s representation in STEM, and working with data in the non-profit and charitable sector.
- Rohan Alexander is an assistant professor at the University of Toronto in Information and Statistical Sciences, and a faculty affiliate at the Schwartz Reisman Institute for Technology and Society. He holds a PhD in Economics from the Australian National University.
Friday, 16 July 2021, noon - 1pm
- Kamilah Ebrahim, University of Toronto
- “Trust in contact tracing apps”
- Kamilah Ebrahim received a B.A. in Economics from the University of Waterloo in 2019 and is currently pursuing a Masters of Information in Human Centred Data Science at the University of Toronto. Kamilah is a 2020-21 Graduate Fellow at the University of Toronto Centre for Ethics focusing on the intersection between race, economics and data monopolies in Canada. Prior to joining the University of Toronto she held roles at the United Nations Economic and Social Commission for Asia and the Pacific (UN ESCAP), as well as the Canadian federal government.
Friday, 2 July 2021, noon - 1pm
- Zachary McCaw, Google
- Zachary McCaw is a data scientist at Google.
Friday, 25 June 2021, noon - 1pm
- Laura Derksen, University of Toronto Mississauga
- “The impact of student access to Wikipedia”
- Laura is the Amgen Canada Professor in Health System Strategy at the University of Toronto Mississauga and assistant professor in Strategic Management at the Rotman School of Management. Her research interests are development and global health education, information and networks.
Friday, 18 June 2021, noon - 1pm
- Jacob Matson, Simetric
- “From data ask to dashboard”
- Jacob is VP of Finance & Operations at Simetric, Inc.
Friday, 11 June 2021, noon - 1pm
- Laura Bronner, Data scientist
- “Quantitative editing”
- Laura is a data scientist, who most recently worked as the quantitative editor at FiveThirtyEight. More generally, she is a data scientist with an interest in causal inference, political science and quantitative text analysis. Before FiveThirtyEight, she was a Senior Analyst at the Analyst Institute, designing and analyzing field experiments for the 2018 election cycle. In September 2018, she completed a PhD in Political Science at the London School of Economics’ Department of Government.
Friday, 4 June 2021, noon - 1pm
- Heather Krause, We All Count
- “Equity in Data (or, how not to accidentally use data like a racist, sexist, colonialist, etc)”
- Heather remains unconvinced. As a statistician with decades of global experience working on complex data problems and producing real-world knowledge, she has developed the Data Equity Framework to address the equity issues in data products and research projects. Her emphasis is on combining strong statistical analysis with clear and meaningful communication. She is currently working on implementing tools for equity and ethics in data. As the founder of two successful data science companies, she attacks the largest questions facing societies today, working with both civic and corporate organizations to improve outcomes and lives. Her relentless pursuit of clarity and realism in these projects pushed her beyond pure analysis to mastering the entire data ecosystem including award-winning work in data sourcing, modeling, and data storytelling, each incorporating bleeding edge theory and technologies. Heather is the founder of We All Count, a project for equity in data working with teams across the globe to embed a lens of ethics into their data products from funding to data collection to statistical analysis and algorithmic accountability. Her unique set of tools and contributions have been sought across a range of clients from MasterCard and Volkswagen to the United Nations, the Syrian Refugee Resettlement Secretariat, Airbnb, and the Bill and Melinda Gates Foundation. She is on the Data Advisory Board of the UNHCR.
Friday, 28 May 2021, noon - 1pm
- Samantha Pierre, University of Toronto
- “And The Nominees Are… An Empirical Study of the Effects of a Tony Award Win and Nomination on a Show’s Success”
- Samantha is a fourth-year statistics student studying at the University of Toronto. Throughout the past year she has combined her love for theatre and statistics to analyze trends in the theatre community. She volunteers as a member of PAIR-CG to create a representational framework for the international theatre community. She currently works at WOMBO, an app developed by former U of T students, as head of music content.
Friday, 21 May 2021, noon - 1pm
- David Shor, OpenLabs
- “Political data science”
- David is an American data scientist who tries to elect Democrats. He is known for analyzing political polls and currently serves as head of data science with OpenLabs, a progressive nonprofit, and also as a Senior Fellow with the Center for American Progress Action Fund.
Thursday, 22 April, 4:30-5:30pm
- “Panel on teaching data-focused topics”
- Aimee Schwab-McCoy, Creighton University
- Ashley Juavinett, UC San Diego
- Chris Papalia, St. Andrew’s College
- Samantha-Jo Caetano, University of Toronto
- Aimee Schwab-McCoy is an Assistant Professor of Statistics at Creighton University. Ashley Juavinett is a neuroscientist, an educator, and a writer, currently working as an Assistant Teaching Professor at UC San Diego. Chris Papalia is a Mathematics and Science Teacher and Head of House at St. Andrew’s College. Samantha-Jo Caetano is an Assistant Professor, Teaching Stream, at the University of Toronto.
Thursday, 15 April, 4:30-5:30pm
- Emily A. Sellars, Yale University
- “Missing data and mis-measurement in Mexico’s 1900 census and the Historical Archive of Localities (AHL)”
- Emily A. Sellars is an assistant professor in the Department of Political Science at Yale University. Before coming to Yale, she was an assistant professor in the Bush School of Government and Public Service at Texas A&M University and a postdoctoral scholar at the University of Chicago’s Harris School of Public Policy. She received her Ph.D. in Political Science and Agricultural and Applied Economics from the University of Wisconsin–Madison in 2015. Her research interests are at the intersection of political economy and development economics. Her research examines the political economy of emigration and population.
Thursday, 8 April, 4:30-5:30pm
- “Mining Process Models from Email Data”
- Faria Khandaker, University of Toronto
- Faria is a second-year student in the Information Systems and Design concentration at the Faculty of Information and is one of the co-hosts of the Toronto Data Workshop. She holds an Honours Bachelor of Science degree in Anthropology and Human Biology from the University of Toronto Scarborough. Since starting her master’s, she has become interested in research related to data-driven decision making within organizations. Under the supervision of Professor Arik Senderovich, she is researching topics related to the application of Machine Learning within the field of Process Mining and exploring various methodologies for gaining insights from email-driven business processes.
- Faria will discuss mining process models from email data.
Thursday, 1 April, 4:30-5:30pm
- Vik Pant, Natural Resources Canada
- “Supporting the Integration of Science and Policy through Data Science and Artificial Intelligence”
- Dr. Vik Pant is the Chief Scientist and Chief Science Advisor of Natural Resources Canada (NRCan). He leads the Office of the Chief Scientist at NRCan and reports directly to the Deputy Minister. His office oversees the development and implementation of evidence-based science policy across NRCan sectors and agencies. His office also manages NRCan’s enterprise-wide technology strategy and portfolio of science products. He also runs the Digital Accelerator, which is an innovation platform for designing and launching AI-driven software products in NRCan. Vik is also the Founder of Synthetic Intelligence Forum, which is a leading community of practice focused on the industrial application of Artificial Intelligence (AI). He earned a doctorate from the Faculty of Information (iSchool) in the University of Toronto, a master’s degree in business administration with distinction from the University of London, a master’s degree in information technology from Harvard University, where he received the Dean’s List Academic Achievement Award, and an undergraduate degree in business administration from Villanova University. Vik serves as an Adjunct Professor in the Faculty of Information (iSchool) at the University of Toronto.
Thursday, 25 March, 4:30-5:30pm
- Alex Cookson, Muse
- “The power of great datasets”
- Alex Cookson is a Data Scientist at Muse, where he helps make the most of their data. In his spare time, you can find him participating in Tidy Tuesday or thinking up cool datasets to explore. And when he’s not doing that, he’s probably cycling around Toronto or doting on his two cats, Tom Tom and Ruby.
- Alex will explore the power of great datasets, and discuss the importance of interesting, fun datasets as a way to guide and motivate learning R.
Thursday, 18 March, 4:30-5:30pm
- Sofia Ruiz-Suarez, National University of Comahue
- “Animal tracking data”
- Sofia Ruiz-Suarez holds an undergraduate degree in mathematics from the University of Buenos Aires and is now a PhD candidate at the Institute for Research on Biodiversity and Environment. She also teaches mathematics at the University of Comahue and leads R-Ladies in her local city. Her research is focused on Bayesian statistics with applications in animal behaviour and movement.
Monday, 15 March, 4:00-5:00pm
- Todd Feathers, Freelance reporter
- “Major Universities are Using Race as a ‘High Impact Predictor’ of Student Success”
- Jointly hosted with Maryclare Griffin, University of Massachusetts Amherst.
- Todd Feathers is a freelance journalist covering artificial intelligence, surveillance, and the technologies changing our world. He spent years at daily newspapers reporting on politics, criminal justice, and health care. On every beat, new tech is solving problems and creating them. His goal is to use data, scientific research, and inside sources to cut through the hype and examine what our gadgets and algorithms really do. His writing has appeared in Vice, OneZero, The Wall Street Journal, and others.
Thursday, 11 March, 4:30-5:30pm
- Lucas Cherkewski, Canadian Digital Service
- “Using publicly-available data to better understand the government’s operations”
- Lucas Cherkewski is a policy advisor at the Canadian Digital Service (CDS). He helps delivery teams improve government services. From that experience, he advises on structural changes to make better services the default. This work includes plenty of data-enabled research and analysis—Lucas is in a happy place when his work leads him to spend an afternoon poking around a dataset, trying to better understand government so he can help change it.
Thursday, 4 March, 4:30-5:30pm
- Petros Pechlivanoglou, The Hospital for Sick Children (SickKids) Research Institute
- “Simulation and retrospective data for health economic decision making”
- Petros Pechlivanoglou, PhD, is a Scientist at The Hospital for Sick Children (SickKids) Research Institute and an Assistant Professor at the University of Toronto, Institute of Health Policy Management and Evaluation. He studied economics in his native country, Greece, econometrics at the University of Groningen, the Netherlands and obtained a PhD in health econometrics from the same university. He completed a post-doctoral fellowship at the University of Toronto, within the Toronto Health Economics and Technology Assessment (THETA) Collaborative where he focused on methodological aspects around the application of decision analysis in health-care policy.
Thursday, 18 February, 4:30-5:30pm
- University of Toronto DoSS toolkit launch
- Special guest Bethany White (Department of Statistical Sciences).
- Annie Collins, Haoluan Chen, Isaac Ehrlich, Mariam Walaa, Marija Pejcinovska, Mathew Wankiewicz, Michael Chong, Paul Hodgetts, Rohan Alexander, Samantha-Jo Caetano, Shirley Deng, and Yena Joo, University of Toronto
- The DoSS toolkit is a series of self-paced lessons that students can go through ahead of class, to achieve badges for various levels of accomplishment with R. Instructors can use the badges to work out the level of the class and either direct students to the toolkit to address deficiencies or cover missing aspects themselves.
Monday, 15 February, noon-1:00pm
- Emily Riederer, Capital One
- “Causal design patterns for data analysts”
- Emily is a Senior Analytics Manager at Capital One. Emily’s team focuses on reimagining analytical infrastructure by building data products, elevating business analysis with novel data sources and statistical methods, and providing consultation and training to partner teams.
Thursday, 11 February, 4:30-5:30pm
- Garrick Aden-Buie, RStudio
- “Using R Markdown in general and in some specific projects”
- Garrick is a Data Science Educator at RStudio who lives in sunny St. Petersburg, Florida. His passion is combining creative coding with programming education, using code to build tools that teach coding to new and advanced R users alike. One example is tidyexplain, a project that used ggplot2 and gganimate to reimagine database operations as colorful flying boxes instead of the typical Venn diagrams. Garrick has developed a number of open source addins and packages for RStudio (such as regexplain, shrtcts and rsthemes) and is always easily distracted by projects that combine R Markdown and online learning or teaching.
Thursday, 4 February, 4:30-5:30pm
- Kathy Ge, Uber
- “How data insights and experimentation help drive product design and intelligent recommendations on the Uber Eats platform”
- Kathy is a data scientist with Uber Eats primarily focused on the shopping experience, including ranking and recommendations throughout the order flow. She received her M.Sc. in Computer Science and B.Sc. in Computer Science and Statistics from the University of Toronto.
Thursday, 28 January, 4:30-5:30pm
- Irene Duah-Kessie, University of Toronto
- “Exploring algorithmic bias and fairness and its impact on health outcomes faced by racialized communities”
- Irene Duah-Kessie is a graduate of the University of Toronto’s Master of Science in Sustainability Management program. Throughout her studies, Irene published her research on racial income inequality in Toronto with the Wellesley Institute and is currently a part of the Turtle Island Journal of Indigenous Health Editorial Team. Irene is a Project Manager at Across Boundaries leading an initiative to address food security and mental health challenges in Toronto’s Black community. She is also the founder of Rise In STEM, a grassroots organization that aims to increase access to STEM learning opportunities in Black and marginalized communities.
Wednesday, 20 January, 4:30-5:30pm
- Zia Babar, University of Toronto
- “Derivative data security”
- Zia Babar obtained his PhD from the University of Toronto where his research studies focused on the analysis and design of data-centered information systems for enabling enterprise transformation. He is engaged in a multi-year research engagement with IBM Research Labs and is a startup technical mentor at WeWork Labs. He is the organizer of technology meetup groups in both Toronto and Waterloo, and a course instructor at the Faculty of Information, University of Toronto.
Thursday, 14 January, 4:30-5:30pm
- Andrew Miles, University of Toronto
- “Code, plots, and values”
- Jointly hosted with Elizabeth Parke and the UTM Collaborative Digital Research Space.
- Andrew Miles is Assistant Professor of Sociology at the University of Toronto and Director of the Morality, Action, and Cognition Lab.

2020

Thursday, 17 December 2020, 4-5pm
- Liza Bolton, University of Toronto
- Maria Tackett, Duke University
- Nathalie Moon, University of Toronto
- Teon Brooks, Mozilla Firefox
- “Panel discussion on teaching data-focused topics”
- Liza Bolton is an Assistant Professor, Teaching Stream, at the University of Toronto. Maria Tackett is an Assistant Professor of the Practice in the Department of Statistical Science at Duke University. Nathalie Moon is an Assistant Professor, Teaching Stream, at the University of Toronto. Teon L. Brooks holds a PhD in experimental psychology from NYU, and now works as a data scientist for Mozilla Firefox. He also serves as the technical advisor and President of BrainWaves, an NIH-funded project to teach experimentation and cognitive neuroscience to high school students in NYC, and has co-founded Computation in Education Labs (CIEL), a nonprofit that aims to further the mission of the BrainWaves project while focusing on data science and computational thinking.
Thursday, 10 December 2020, 4-5pm
- Shabrina Mardevi, United Nations Population Fund and University of Toronto
- Romesh Silva, United Nations Population Fund
- “Population data estimation”
- Shabrina is a Master of Information student at the University of Toronto and a Population Data Estimation and Analysis Intern at the United Nations Population Fund. Romesh holds a PhD in Demography from the University of California, Berkeley, and is a Technical Specialist, Health & Social Inequalities, at the United Nations Population Fund.
Thursday, 3 December 2020, 5-6pm
- Monica Alexander, University of Toronto
- “Using Facebook advertising data to estimate migration”
- Monica Alexander is an Assistant Professor in Statistical Sciences and Sociology at the University of Toronto. She received her PhD in Demography from the University of California, Berkeley. Her research interests include statistical demography, mortality and health inequalities, and computational social science. Monica will talk about using Facebook advertising data to estimate migration.
Thursday, 19 November 2020, 4-5pm
- Michael Chong, University of Toronto
- “High-throughput Bayesian modelling workflow”
- Michael is a PhD student in the Department of Statistical Sciences at the University of Toronto building models for demographic estimation. Previously, he completed his BSc in Integrated Science at McMaster University.
- Michael will discuss lessons from a high-throughput Bayesian modelling workflow.
Thursday, 12 November 2020, 4-5pm
- Kevin Armstrong, University of Toronto
- “Measuring poverty for NGOs”
- Kevin Armstrong is a Master of Information student at the University of Toronto, and a data consultant for ‘Women’s Integrated Sexual Health’ (WISH), a three-year program delivering integrated health care in 16 countries in Africa and South Asia.
Monday, 9 November 2020, 4-5pm
- Tom Cardoso, Globe and Mail
- “Bias Behind Bars”
- Tom Cardoso is a crime and justice reporter and data journalist for The Globe and Mail.
- Tom will discuss his Bias Behind Bars series of articles which show Black and Indigenous inmates in Canada are more likely to get worse scores than white inmates, based solely on their race.
Thursday, 5 November 2020, 4-5pm
- Andrew Whitby, Industry data scientist
- “Censuses: The sum of the people”
- Andrew is a data scientist and economist currently looking for his next challenge. He is particularly interested in the economics of technology, creativity, innovation and growth. He wrote The Sum of the People: How the Census Has Shaped Nations from the Ancient World to the Modern Age which was published in March 2020. Previously, he worked as a Data Scientist at the World Bank, and at Nesta, the UK’s innovation think tank. His academic background combines economics, statistics and computer science. He completed his doctoral research in the Department of Economics at the University of Oxford.
Thursday, 29 October 2020, 4-5pm
- Fei Chiang, McMaster University
- “Data currency and applications”
- Fei Chiang is an Associate Professor in the Department of Computing and Software (Faculty of Engineering), the Director of the Data Science Research Group, and a Faculty Fellow at the IBM Centre for Advanced Studies. She served as an inaugural Associate Director of the MacData Institute. Her research interests and industrial experience are in data management, spanning data cleaning, data quality, data privacy, data fusion, and database systems.
Thursday, 22 October 2020, 4-5pm
- Jeff Waldman, University of Toronto
- Leanne Trimble, University of Toronto
- Leslie Barnes, University of Toronto
- Lisa Strug, University of Toronto
- “Panel discussion on data-focused resources at the University of Toronto”
- Jeff Waldman is the Manager, Institutional Data Governance; Leslie Barnes is the Digital Scholarship Librarian at UTL; Leanne Trimble is the Data and Statistics Librarian at UTL; Lisa Strug is a Senior Scientist in the Program of Genetics and Genome Biology, Associate Director of The Centre for Applied Genomics, Professor of Statistical Sciences and Biostatistics at the University of Toronto, and Director of CANSSI Ontario.
Thursday, 8 October 2020, 4-5pm
- Yim Register, University of Washington Data Lab
- “Self-advocacy within machine learning systems”
- Yim Register (they/them) is a radical optimist, child advocate, and PhD student at the University of Washington Data Lab exploring what self-advocacy looks like within machine learning systems. They study how empowering novices with Data Science knowledge can impact their participation and joy in an AI-driven world! Their passion project right now is writing a book called Life Lessons from Algorithms, which teaches how machine learning algorithms work through trauma recovery skills.
Thursday, 1 October 2020, 4-5pm
- Florence Vallée-Dubois, Université de Montréal
- “Canadian demographics by riding (1991-2015)”
- Florence Vallée-Dubois is a Ph.D. candidate at the department of political science of the University of Montreal. She is also a member of the Centre for the Study of Democratic Citizenship and Canada Research Chair in Electoral Democracy. Her research interests focus on Quebec and Canadian politics, political behaviour and quantitative methods. Her doctoral project focuses on the political behaviour and democratic representation of seniors in Canada.
Thursday, 24 September 2020, 4-5pm
- Chelsea Parlett-Pelleriti, Chapman University
- “Talking to non-statisticians about statistics”
- Chelsea is a PhD candidate and full-time instructional faculty at Chapman University where her research focuses on applying novel statistical and Machine Learning methods (mostly Bayesian statistics, IRT models, and clustering) to behavioral data. As an instructor she teaches Python, R and Data Science, and loves using novel technology (like TikTok, Twitch, and flipped classes) to better engage and inspire students.
Thursday, 17 September 2020, 4-5pm
- Amber Simpson, Queen’s University
- “Cancer and AI”
- Dr. Simpson is the Canada Research Chair in Biomedical Computing and Informatics and Associate Professor in the School of Computing (Faculty of Arts and Science) and Department of Biomedical and Molecular Sciences (Faculty of Health Sciences). She specializes in biomedical data science and computer-aided surgery. Her research group is focused on developing novel computational strategies for improving human health.
Thursday, 10 September 2020, 4-5pm
- A Mahfouz, University of Toronto
- Diego Mamanche Castellanos, University of Toronto
- Hidaya Ismail, University of Toronto
- Ke-Li Chiu, University of Toronto
- Paul Hodgetts, University of Toronto
- “arxivdl, aRianna, and cesR”
Tuesday, 8 September 2020, 3:30-4:30pm
- Sophie Bennett, Industry data scientist
- “UK A levels algorithm issues”
- (Jointly hosted with Gillian Hadfield and the Schwartz Reisman Institute for Technology and Society.)
- Sophie Bennett holds an undergraduate degree in Experimental Psychology from the University of Oxford and a PhD in Neuroscience from King’s College. She is the lead data scientist at Up Learn, a London-based online learning platform specialising in A levels. In this role, she conducts evaluations of course effectiveness and uses data to improve instruction and curriculum design. She is passionate about increasing the use of responsible evidence and statistics to guide social policy, and, in her spare time, enjoys working with publicly available datasets to explore London demographics, social issues and infrastructure.
Thursday, 3 September 2020, 4-5pm
- Erik Drysdale, The Hospital for Sick Children
- “Using hospital data”
- Erik works as a Machine Learning Specialist at the Hospital for Sick Children (SickKids) for the Goldenberg Lab and AI in Medicine (AIM) initiative. His professional responsibilities include the development and training of the machine learning models for various pediatric data science projects. His research interests are focused on the intersection of statistics and machine learning methods such as high-dimensional inference, survival analysis, and optimization methods.
Thursday, 20 August 2020, 4-5pm
- Aije Egwaikhide, IBM
- “Preparing data for optical character recognition (OCR)”
- Aije Egwaikhide holds an undergraduate degree in Economics and Statistics from the University of Manitoba, and a post-graduate degree in Business Analytics from St. Lawrence College, Kingston. She works at IBM where she is a Lead Data Scientist in the System Enablement group.
Thursday, 13 August 2020, 4-5pm
- Richard Iannone, RStudio
- “pointblank”
- Rich is a Software Engineer at RStudio.
- Rich will talk about pointblank, an R package that supports easy data validation workflows in reproducible documents.
Thursday, 6 August 2020, 4-5pm
- Sharla Gelfand, Freelance R Developer
- “Creativity in R”
- Sharla is a freelance R developer specializing in enabling easy access to data and replacing manual, redundant processes with ones that are automated, reproducible, and repeatable.
Thursday, 30 July 2020, 4-5pm
- Alex Luscombe, University of Toronto, Criminology and Sociolegal Studies
- Alexander McClelland, Carleton University, Criminology and Criminal Justice
- “Policing the Pandemic”
- Alex Luscombe is a PhD student in the Centre for Criminology & Sociolegal Studies at the University of Toronto and a Junior Fellow at Massey College. Alexander McClelland is an Assistant Professor at the Institute of Criminology and Criminal Justice, Carleton University.
- Policing the Pandemic is a project that was launched on 4 April 2020 to track and visualize the massive and extraordinary expansions of police power in response to the COVID-19 pandemic and the unequal patterns of enforcement that may arise as a result.
Thursday, 23 July 2020, 11am-noon
- Marta Kołczyńska, Institute of Political Studies of the Polish Academy of Sciences
- “Cleaning survey data to measure political trust”
- Marta is an Assistant Professor at the Institute of Political Studies of the Polish Academy of Sciences and a visiting researcher in the Probabilistic Machine Learning Group, Department of Computer Science, Aalto University. Her research interests include comparative analyses of political attitudes and behavior across nations and over time, as well as the methodology of comparative research, in particular cross-national surveys.
- Marta will talk about cleaning survey data, in particular a project in which she gathers political trust items from different cross-national survey datasets to model time trends, and the tools she has developed to facilitate this work.
Thursday, 16 July 2020, 4-5pm
- Casey Breen, University of California, Berkeley, Demography
- “CenSoc: A project to link US 1940 Census data with Social Security Administration mortality records”
- Casey is a PhD student in the Demography Department at Berkeley. He previously worked at the Institute for Social Research and Data Innovation, home of IPUMS.
Thursday, 9 July 2020, 4-5pm
- Roxanne Chui, University of Toronto, Faculty of Information
- “What do we have here among millions of observations? EDA for Tokyo AirBnB data and pattern discovery in listing prices using R”
- Roxanne is an emerging anthropological data science professional. She completed a BSc in forensic anthropology and worked in the pharmaceutical industry before doing her Master’s in data science. She is passionate about excavating context from data for predicting future patterns of human behaviour.
Thursday, 2 July 2020, 4-5pm
- Heather McBrien, University of Toronto, Department of Statistical Sciences
- “How the data that we collect can bias the results that we obtain and our knowledge of the problem”
- Heather just graduated from the Statistics BSc program at the University of Toronto, and is interested in modelling in population health research, particularly using novel data sources to answer questions where traditional data is lacking.
Thursday, 25 June 2020, 4-5pm
- A Mahfouz, University of Toronto, Information
- “Geographic data cleaning, extracting mappable data from Google Directions API results in Python”
- A is a Master of Information student at the University of Toronto with a background in geography. Their prior work has been largely concerned with data pipelines.
Thursday, 11 June 2020, 4-5pm
- Harrison Jones, Deloitte
- “Using R with actuarial data”
- Harrison is a Manager at Deloitte in Toronto, where he focuses on data analytics and machine learning in property & casualty insurance, life insurance, health insurance, pensions, and the public sector.
Thursday, 4 June 2020, 4-5pm
- Marija Pejcinovska, University of Toronto, Department of Statistical Sciences
- “Estimating global maternal mortality”
- Marija is a second-year Ph.D. student in Statistics at the University of Toronto. Her research interests are in applied statistics, specifically the application of Bayesian methods to data and modeling challenges that arise in demography, public health, and certain areas of the social sciences.
- Marija will talk about a current project with the World Health Organization (WHO) focused on estimating global maternal mortality to share her R workflow and the different tools and packages she’s found helpful in the data processing stage. More specifically, she’ll be sharing a few ways of dealing with text and date data in R.
Thursday, 28 May 2020, 4-5pm
- Shiro Kuriwaki, Harvard University, Government
- “Project-oriented workflow”
- Shiro is a Ph.D. Candidate at the Department of Government, Harvard University. His research focuses on democratic representation in American Politics, for instance cast vote records, public opinion, survey methods, and applied statistics more generally. Shiro will bring together best practices for organizing data and code in the social sciences that experts have proposed with some of his own experience. He will propose a project-oriented workflow that adopts a minimal and consistent file organization structure within a single project, using RStudio Projects and GitHub. He will then discuss how to organize multiple projects that share common components, and propose the use of custom R packages to share code and Dataverse to share large datasets. He will use some of his own projects involving the Cooperative Congressional Election Study (CCES), one of the largest political surveys of American Politics, as a demonstration.
Thursday, 21 May 2020, 4-5pm
- Rohan Alexander, University of Toronto, Faculty of Information
- “OCR with applications to the Kenyan census”
- Rohan Alexander is a post-doctoral fellow at the Faculty of Information, University of Toronto. He holds a PhD in Economics from the Australian National University.
Friday, 6 March 2020, noon
- Fatemeh Nargesian, Computer Science, University of Rochester
Friday, 28 February 2020, noon
- Eugene Joh, St. Michael’s Hospital
Friday, 14 February 2020, noon
- Josh Harris, KOHO
Friday, 7 February 2020, noon
- Kathy Chung, Records of Early English Drama, University of Toronto
Friday, 31 January 2020, noon
- Arik Senderovich, Information, University of Toronto
Friday, 24 January 2020, noon
- Steven Pimentel, Business intelligence, University of Toronto

2019

Thursday, 21 November 2019, noon
- Michelle Alexopoulos, University of Toronto, Economics
- Paraskevi Massara, University of Toronto, Medicine
Thursday, 7 November 2019, noon
- Maria D’Angelo, Delphia
- Hareem Naveed, Munich Re
Thursday, 24 October 2019, noon
- Sharla Gelfand, Freelance R and Shiny developer
Wednesday, 16 October 2019, noon
- Lauren Kennedy, Columbia University
Thursday, 10 October 2019, noon
- Hassan Teimoori, Deloitte, Omnia AI
- Ludovic Rheault, University of Toronto, Political Science
Thursday, 26 September 2019, noon
- Periklis Andritsos, ODAIA & University of Toronto, Information