Supporters
We gratefully acknowledge the support of the Faculty of Information and the Department of Statistical Sciences at the University of Toronto, and CANSSI Ontario. In particular, we thank Dean Wendy Duff, Chair Radu Craiu, and Professor Lisa Strug for their support.
Overview
The Faculty of Information and the Department of Statistical Sciences at the University of Toronto are excited to host a two-day conference bringing together academic and industry participants on the critical issue of reproducibility in applied statistics and related areas. The conference is free and will be hosted online on Thursday and Friday, 25-26 February 2021. Everyone is welcome; you don't need to be affiliated with a university. You can register here.
The conference has three broad areas of focus:
- Evaluating reproducibility: Systematically examining the extent of reproducibility of a paper, or even of a whole field, is important for understanding where weaknesses exist. Does, say, economics fall flat while demography shines? How should we approach these reproductions? What aspects contribute to the extent of reproducibility?
- Practices of reproducibility: We need new tools and approaches that encourage us to think more deeply about reproducibility and integrate it into everyday practice.
- Teaching reproducibility: While it is probably too late for most of us, how can we ensure that today’s students don’t repeat our mistakes? What are some case studies that show promise? How can we ensure this doesn’t happen again?
We intend to record the presentations and will add links here after the conference. Again, the conference is free and online via Zoom, and everyone is welcome; you don't need to be affiliated with a university. If you would like to attend, please sign up here.
Draft schedule
All times are Toronto / US east coast. 9am in Toronto 🇨🇦 is: 3pm in Berlin 🥨; 2pm in London 💂; 11am in Santiago 🇨🇱; 6am in Vancouver 🎿; and 1am the next day in Melbourne 🦘.
Thursday, 25 February, 2021
| Time | Presenter | Topic |
|------|-----------|-------|
| 9:00-9:10am | Rohan Alexander, University of Toronto | Welcome |
| 9:10-9:20am | Radu Craiu, University of Toronto | Opening remarks |
| 9:20-9:30am | Wendy Duff, University of Toronto | Opening remarks |
| 9:30-10:25am | Mine Çetinkaya-Rundel, University of Edinburgh | Keynote - Teaching |
| 10:30-10:55am | Tyler Girard, University of Western Ontario | Teaching |
| 11:00-11:25am | Shiro Kuriwaki, Harvard University | Teaching |
| 11:30-11:55am | Tiffany Timbers, University of British Columbia | Teaching |
| Noon-12:55pm | Riana Minocher, Max Planck Institute for Evolutionary Anthropology | Keynote - Evaluating |
| 1:00-1:25pm | Break | |
| 1:30-1:55pm | Tom Barton, Royal Holloway, University of London | Evaluating |
| 2:00-2:25pm | Wijdan Tariq, University of Toronto | Evaluating |
| 2:30-2:55pm | Mauricio Vargas, Catholic University of Chile & Nicolas Didier, Arizona State University | Evaluating |
| 3:00-3:25pm | Jake Bowers, University of Illinois & The Policy Lab | Practices |
| 3:30-3:55pm | Danielle Smalls-Perkins, Google | |
| 4:00-4:25pm | Garret Christensen, US FDIC | Evaluating |
| 4:30-4:55pm | Yanbo Tang, University of Toronto | Practices |
| 5:00-5:25pm | Lauren Kennedy, Monash University | |
| 5:30-6:00pm | Fiona Fidler, University of Melbourne | Evaluating |
| 6:00-6:30pm | Lisa Strug, University of Toronto & CANSSI Ontario | Closing remarks |
Friday, 26 February, 2021
| Time | Presenter | Topic |
|------|-----------|-------|
| 8:00-8:30am | Nick Radcliffe, Global Open Finance Centre of Excellence & University of Edinburgh | Practices |
| 8:30-9:00am | Julia Schulte-Cloos, LMU Munich | Practices |
| 9:00-9:25am | Simeon Carstens, Tweag I/O | Practices |
| 9:30-9:55am | Heidi Seibold, Helmholtz AI Cooperation Unit | Practices |
| 10:00-10:55am | Eva Vivalt, University of Toronto | Keynote - Practices |
| 11:00-11:25am | Andrés Cruz, Pontificia Universidad Católica de Chile | Practices |
| 11:30-11:55am | Emily Riederer, Capital One | Practices |
| Noon-12:25pm | Florencia D'Andrea, National Institute of Agricultural Technology | |
| 12:30-12:55pm | John Blischak, Freelance scientific software developer | Practices |
| 1:00-1:25pm | Tania Allard, Quansight Labs | |
| 1:30-1:55pm | Shemra Rizzo, Genentech | Practices |
| 2:00-2:25pm | Amber Simpson, Queen's University | Practices |
| 2:30-2:55pm | John McLevey, University of Waterloo | |
| 3:00-3:25pm | Sharla Gelfand, Freelance R Developer | Practices |
| 3:30-3:55pm | Ryan Briggs, University of Guelph | Practices |
| 4:00-4:25pm | Monica Alexander, University of Toronto | Practices |
| 4:30-4:55pm | Annie Collins, University of Toronto | Practices |
| 5:00-5:25pm | Nancy Reid, University of Toronto | |
| 5:30-6:00pm | Rohan Alexander, University of Toronto | Closing remarks |
Presenter biographies and abstracts
Keynotes:
- Eva Vivalt is an Assistant Professor in the Department of Economics at the University of Toronto. Her main research interests are in cash transfers, reducing barriers to evidence-based decision-making, and global priorities research.
Abstract: TBA
- Mine Çetinkaya-Rundel is a Senior Lecturer in Statistics and Data Science in the School of Mathematics at the University of Edinburgh, currently on leave from her role as Associate Professor of the Practice in the Department of Statistical Science at Duke University. She is also a Professional Educator and Data Scientist at RStudio. She is the author of three open source statistics textbooks and is an instructor for Coursera. She is the chair-elect of the Statistical Education Section of the American Statistical Association. Her work focuses on innovation in statistics pedagogy, with an emphasis on student-centered learning, computation, reproducible research, and open-source education.
Abstract: TBA
- Riana Minocher is a doctoral student at the Max Planck Institute for Evolutionary Anthropology in Leipzig. She is an evolutionary biologist with broad interests. She has worked on a range of projects on human and non-human primate behaviour and ecology. She is particularly interested in the evolutionary processes that create and shape diversity between and within groups. Through her PhD research, she is keen on exploring the dynamics of cultural transmission and learning in human populations, to better understand the diverse patterns of behaviour we observe.
Abstract: TBA
Invited presentations:
- Amber Simpson is the Canada Research Chair in Biomedical Computing and Informatics and Associate Professor in the School of Computing (Faculty of Arts and Science) and the Department of Biomedical and Molecular Sciences (Faculty of Health Sciences) at Queen's University. She specializes in biomedical data science and computer-aided surgery. Her research group is focused on developing novel computational strategies for improving human health. She joined the Queen's University faculty in 2019, after four years on the faculty at Memorial Sloan Kettering Cancer Center in New York and three years as a Research Assistant Professor in Biomedical Engineering at Vanderbilt University in Nashville. She is an American Association for Cancer Research award winner and the holder of multiple National Institutes of Health grants. She received her PhD in Computer Science from Queen's University.
Abstract: The development of predictive and prognostic biomarkers is a major area of investigation in cancer research. Our lab specializes in the development of quantitative imaging markers for personalized treatment of cancer. Progress in developing these novel markers is limited by a lack of optimization, standardization, and validation, all critical barriers to clinical use. This talk will describe our work in the repeatability and reproducibility of imaging biomarkers.
- Andrés Cruz is an adjunct instructor at Pontificia Universidad Católica de Chile, where he teaches computational social science. He holds a BA and MA in Political Science, and is the co-editor of “R for Political Data Science: A Practical Guide” (CRC Press, 2020), an R manual for social science students and practitioners.
Abstract: `inexact` is an RStudio addin to supervise fuzzy joins. Merging data sets is a simple procedure in most statistical software packages. However, applied researchers frequently face problems when dealing with data in which ID variables are not properly standardized. For instance, politicians' names can be spelled differently in multiple sources (press reports, official documents, etc.), causing regular merging methods to fail. The most common approach to fixing this issue when working with small and medium data sets is manually correcting the problematic values before merging. However, this solution is time-consuming and not reproducible. The `inexact` addin was created to help with this. The package draws on approximate string matching algorithms, which quantify the distance between two given strings. When merging data sets with non-standardized ID variables, `inexact` users benefit from automatic match suggestions, while also being able to override the automatic choices when needed, using a user-friendly graphical user interface (GUI). The output is simply code to perform the corrected merging procedure, which records the chosen algorithm and any corrections made by the user, ensuring reproducibility. A development version of `inexact` is available on GitHub.
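The underlying technique, approximate string matching before a merge, can be sketched in a few lines of base R. This illustrates the idea only, not `inexact`'s actual interface; the data and names below are hypothetical.

```r
# Hypothetical messy IDs: the same politicians spelled differently in two sources
official <- c("Michelle Bachelet", "Sebastián Piñera")
press    <- c("M. Bachelet", "Sebastian Pinera")

# adist() (base R) computes generalized Levenshtein distances between strings
d <- adist(press, official, ignore.case = TRUE)

# For each press spelling, suggest the closest official spelling
suggested <- official[apply(d, 1, which.min)]
data.frame(press, suggested)
```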
- Annie Collins is an undergraduate student in the Department of Mathematics at the University of Toronto, specializing in applied mathematics and statistics with a minor in history and philosophy of science. In her free time, she focuses her efforts on student governance, promoting women's representation in STEM, and working with data in the non-profit and charitable sector.
Abstract: TBA
- Danielle Smalls-Perkins works at Google in Natural Language Processing. She is a co-founder of MiR, which exists to support and grow the community of underrepresented minority users of R.
Abstract: TBA
- Emily Riederer is a Senior Analytics Manager at Capital One, where her team focuses on reimagining analytical infrastructure by building data products, elevating business analysis with novel data sources and statistical methods, and providing consultation and training to partner teams.
Abstract: Complex software systems make performance guarantees through documentation and unit tests, and they communicate these to users with conscientious interface design. However, published data tables exist in a gray area; they are static enough not to be considered a 'service' or 'software', yet too raw to earn attentive user interface design. This ambiguity creates a disconnect between data producers and consumers and poses a risk for analytical correctness and reproducibility. In this talk, I will explain how controlled vocabularies can be used to form contracts between data producers and data consumers. Explicitly embedding meaning in each component of variable names is a low-tech and low-friction approach which builds a shared understanding of how each field in the dataset is intended to work. Doing so can lighten the burden on data producers by facilitating automated data validation and metadata management. At the same time, data consumers benefit from a reduced cognitive load to remember names, a deeper understanding of variable encoding, and opportunities to more efficiently analyze the resulting dataset. After discussing the theory of controlled vocabulary column-naming and related workflows, I will illustrate these ideas with a demonstration of the `convo` R package, which aids in the creation, upkeep, and application of controlled vocabularies. This talk is based on my related blog post and R package.
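As a rough illustration of the contract idea (not the `convo` package's actual API), a controlled vocabulary can be as simple as a set of allowed name stubs plus a check that every column matches one of them. The vocabulary and column names below are hypothetical.

```r
# Controlled vocabulary: each prefix encodes meaning (type, units, handling)
vocab <- c(
  ID  = "unique identifier",
  IND = "binary indicator (0/1)",
  N   = "count",
  AMT = "dollar amount",
  DT  = "date"
)

# Validate that every column name starts with a known stub
check_names <- function(nms, vocab) {
  prefixes <- sub("_.*$", "", nms)
  bad <- nms[!prefixes %in% names(vocab)]
  if (length(bad) > 0) {
    stop("Columns violate the controlled vocabulary: ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

check_names(c("ID_account", "AMT_balance", "N_logins"), vocab)  # passes
check_names(c("ID_account", "balance"), vocab)                  # errors
```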
- Fiona Fidler is a professor at the University of Melbourne, with a joint appointment in the Schools of BioSciences and History and Philosophy of Science. She is broadly interested in how experts, including scientists, make decisions and change their minds. Her past research has examined how methodological change occurs in different disciplines, including psychology, medicine, and ecology, and developed methods for eliciting reliable expert judgements to improve decision making. She originally trained as a psychologist, and maintains a strong interest in psychological methods. She also has an abiding interest in statistical controversies, for example, the ongoing debate over Null Hypothesis Significance Testing. She is a current Australian Research Council Future Fellow, leads the University of Melbourne's Interdisciplinary MetaResearch Group (IMeRG), and is the lead PI of the repliCATS project.
Abstract: The repliCATS (Collaborative Assessments for Trustworthy Science) project is a meta-research project about scientific research practices and replication rates in science. We can't afford to test and replicate every piece of research before it's published, so the project aims to crowdsource group predictions of the replicability of 3,000 published research claims in the social and behavioural sciences. We use a structured group discussion method developed at the University of Melbourne, the Investigate, Discuss, Estimate, Aggregate (IDEA) protocol, to develop accurate methods for predicting replicability that could be especially useful for end-users of research, including policymakers.
- Florencia D'Andrea is a post-doc at the Argentine National Institute of Agricultural Technology, where she develops computer tools to assess the risk that pesticide applications pose to aquatic ecosystems. She holds a PhD in Biological Sciences from the University of Buenos Aires, Argentina.
Abstract: TBA
- Garret Christensen received his economics PhD from UC Berkeley in 2011. He is an economist with the FDIC. Before that he worked for the Census Bureau, and he was a project scientist with the Berkeley Initiative for Transparency in the Social Sciences and a Data Science Fellow with the Berkeley Institute for Data Science.
Abstract: Adoption of Open Science Practices is Increasing: Survey Evidence on Attitudes, Norms and Behavior in the Social Sciences. Has there been meaningful movement toward open science practices within the social sciences in recent years? Discussions about changes in practices such as posting data and pre-registering analyses have been marked by controversy, including controversy over the extent to which change has taken place. This study, based on the State of Social Science (3S) Survey, provides the first comprehensive assessment of awareness of, attitudes towards, perceived norms regarding, and adoption of open science practices within a broadly representative sample of scholars from four major social science disciplines: economics, political science, psychology, and sociology. We observe a steep increase in adoption: as of 2017, over 80% of scholars had used at least one such practice, rising from one quarter a decade earlier. Attitudes toward research transparency are on average similar between older and younger scholars, but the pace of change differs by field and methodology. Consistent with theories of normal science and scientific change, the timing of increases in adoption coincides with technological innovations and institutional policies. Patterns are consistent with most scholars underestimating the trend toward open science in their discipline.
- Heidi Seibold is a team leader at Helmholtz AI. Her group combines open science, AI, and health research, focusing on reproducible and open research as well as machine learning methods for personalized medicine that improve patient treatment.
Abstract: In this talk I will discuss some practical steps for making your work reproducible. I will also share my thoughts on what role data scientists and statisticians can play in the reproducibility crisis.
- Jake Bowers is a Senior Scientist at The Policy Lab and a member of the Lab’s data science practice. Jake is Associate Professor of Political Science and Statistics at the University of Illinois Urbana-Champaign. He has served as a Fellow in the Office of Evaluation Sciences in the General Services Administration of the US Federal Government and is Methods Director for the Evidence in Governance and Politics network. Jake holds a PhD in Political Science from the University of California, Berkeley, and a BA in Ethics, Politics and Economics from Yale University.
Abstract: For evidence-based public policy to grow in impact and importance, practices to enhance scientific credibility should be brought into governmental contexts and also should be modified for those contexts. For example, few analyses of governmental data allow data sharing (in contrast with most scientific studies); and many analyses of governmental administrative data inform high stakes immediate decisions (in contrast with the slow accumulation of scientific knowledge). We make several proposals to adjust scientific norms of reproducibility and pre-registration to the policy context.
- John Blischak is a freelance scientific software developer for the life sciences industry. He is the primary author of the R package workflowr and the co-maintainer of the CRAN Task View on Reproducible Research. He received his PhD in Genetics from the University of Chicago.
Abstract: The `workflowr` R package helps organize computational research in a way that promotes effective project management, reproducibility, collaboration, and sharing of results. `workflowr` combines literate programming (knitr and rmarkdown) and version control (Git, via git2r) to generate a website containing time-stamped, versioned, and documented results. Any R user can quickly and easily adopt `workflowr`, which includes four key features: (1) `workflowr` automatically creates a directory structure for organizing data, code, and results; (2) `workflowr` uses the version control system Git to track different versions of the code and results without the user needing to understand Git syntax; (3) to support reproducibility, `workflowr` automatically includes code version information in webpages displaying results; and (4) `workflowr` facilitates online web hosting (e.g. GitHub Pages) to share results. Our goal is that `workflowr` will make it easier for researchers to organize and communicate reproducible results. Documentation and source code are available.
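For orientation, a typical session with the package's main entry points might look like the sketch below; the project name and commit message are hypothetical.

```r
library(workflowr)

# Create a new project with the standard directory structure and a Git repo
wflow_start("myproject")

# Build the website locally to preview the results pages
wflow_build()

# Commit the analysis files and publish the versioned, time-stamped results
wflow_publish(c("analysis/index.Rmd"), message = "Add initial analysis")
```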
- John McLevey is an Associate Professor in the Department of Knowledge Integration at the University of Waterloo, and cross-appointed to Sociology & Legal Studies, the School of Environment, Resources, & Sustainability (SERS), and Geography & Environmental Management (GEM). He primarily works in the areas of social network analysis and computational social science, with substantive interests in the sociology of science, environmental sociology, political sociology, and cognitive sociology.
Abstract: TBA
- Julia Schulte-Cloos is a Marie Skłodowska-Curie funded research fellow at LMU Munich. She earned her PhD in Political Science from the European University Institute. Julia is passionate about developing tools and templates for generating reproducible workflows and creating reproducible research outputs with R Markdown.
Abstract: We present a template package in R that allows users without any prior knowledge of R Markdown to implement reproducible research practices in their scientific workflows. We provide a single Rmd-file that is fully optimized for two different output formats, HTML and PDF. While in the stage of exploratory analysis and when focusing on content only, researchers may rely on the 'draft mode' of our template, which knits to HTML. When in the stage of research dissemination and when focusing on the presentation of results, in contrast, researchers may rely on the 'manuscript mode', which knits to PDF. Our template outlines the basics for successfully writing a reproducible paper in R Markdown by showing how to include citations, figures, and cross-references. It also provides examples of the use of `ggplot2` to include plots, both in static and animated outputs, and it shows how to present the most commonly used tables in scientific research (descriptive statistics and regression tables). Finally, in our template, we discuss some more advanced features of literate programming and helpful tweaks in R Markdown.
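As a rough sketch of the two-mode idea, assuming standard rmarkdown tooling rather than the authors' specific template (and a hypothetical file name), the same source file can be knit to either output:

```r
library(rmarkdown)

# 'Draft mode': fast HTML output while exploring and focusing on content
render("paper.Rmd", output_format = "html_document")

# 'Manuscript mode': polished PDF output for dissemination
render("paper.Rmd", output_format = "pdf_document")
```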
- Lauren Kennedy is a lecturer at Monash University (Melbourne, Australia), where she works on creating survey weights, non-representative data, multilevel modelling, poststratification, causal inference, Bayesian inference, and all manner of other fields related to making inferences in the social sciences.
Abstract: TBA
- Mauricio Vargas Sepúlveda loves working with data and statistical programming, and is constantly learning new skills and tooling in his spare time. He mostly works in R due to its huge number of libraries and emphasis on reproducible analysis.
Abstract: Evidence-based policymaking has become a high priority for governments across the world. The possibility of gaining efficiencies in public expenditure and linking policy design to desired outcomes has been presented as a significant advantage for the field of comparative policy. However, the same movement that supports the use of evidence in public policy decision making has brought great concern about the sources of the supposed evidence. How should policymakers evaluate the evidence? The possibilities are open and depend on the institutional arrangements that support governmental operation and the possibility of properly judging the nature of the evidence. The scientific reproducibility movement could inform the discussion about the quality of evidence by providing a structured approach to assessing a source's validity, based on the possibility of reproducing the logic and analysis of scientific communication. This paper analyzes the nature and quality of civil society organizations' contributions to developing evidence for the policymaking process from a reproducibility perspective.
- Monica Alexander is an Assistant Professor in Statistical Sciences and Sociology at the University of Toronto. She received her PhD in Demography from the University of California, Berkeley. Her research interests include statistical demography, mortality and health inequalities, and computational social science.
Abstract: TBA
- Nancy Reid is Professor of Statistical Sciences at the University of Toronto and Canada Research Chair in Statistical Theory and Applications. Her main area of research is theoretical statistics, which treats the foundations and properties of methods of statistical inference. She is interested in how best to use information in the data to construct inferential statements about quantities of interest. A very simple example of this is the widely quoted 'margin of error' in the reporting of polls; another is the ubiquitous 'p-value' reported in medical and health studies. Much of her research considers how to ensure that these inferential statements are both accurate and effective at summarizing complex sets of data.
Abstract: TBA
- Nick Radcliffe is the founder of the data science consulting and software firm Stochastic Solutions Limited, the Interim Chief Scientist at the Global Open Finance Centre of Excellence, and a Visiting Professor in Maths and Stats at the University of Edinburgh, Scotland. His background combines theoretical physics, operations research, machine learning, and stochastic optimization. Nick's current research interests include test-driven data analysis (an approach to improving the correctness of analytical results that combines ideas from reproducible research and test-driven development) and privacy-respecting analysis. He is the lead author of the open-source Python tdda package, which provides practical tools for testing analytical software and data, and also of the Miró data analysis suite.
Abstract: The Global Open Finance Centre of Excellence is currently engaged in analysis of the financial impact of COVID-19 on the citizens and businesses of the UK. This research uses non-consented but de-identified financial data on individuals and businesses, on the basis of legitimate interest. All analysis is carried out in a highly locked-down analytical environment known as a Safe Haven. This talk will explain our approach to the challenges of ensuring the correctness and robustness of results in an environment where neither code nor input data can be opened up for review and even outputs need to be subject to disclosure control to reduce further any risks to privacy. Topics will include: testing input data for conformance and lack of personal identifiers using constraints; multiple implementations and verification of equivalence of results; regression tests and reference tests; verification of output artefacts; verification of output disclosure controls; data provenance and audit trails; test-driven data analysis—the underlying philosophy (and library) that we use to underpin this work.
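tdda itself is a Python library, but the constraint idea it implements is language-agnostic. The sketch below is my own R illustration of verifying an input extract against simple constraints, with hypothetical field names; it is not tdda's API.

```r
# tdda-style idea in plain R: verify a new data extract against constraints
# "discovered" from earlier, known-good data. Field names are hypothetical.
check_constraints <- function(df) {
  stopifnot(
    # type and range constraints
    is.numeric(df$amount),
    all(df$amount >= 0, na.rm = TRUE),
    # allowed-values constraint
    all(df$account_type %in% c("personal", "business")),
    # privacy constraint: no direct personal identifiers in the extract
    !any(c("name", "email", "phone") %in% names(df))
  )
  invisible(TRUE)
}

ok <- data.frame(amount = c(10.5, 0), account_type = c("personal", "business"))
check_constraints(ok)  # passes silently
```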
- Nicolas Didier is a PhD student in Public Administration and Policy at Arizona State University. During his doctoral and previous studies, he has worked extensively on developing evidence to inform policy on labour markets and public expenditure.
Abstract: Please see Mauricio Vargas Sepúlveda.
- Ryan Briggs is a social scientist who studies the political economy of poverty alleviation. Most of his research focuses on the spatial targeting of foreign aid. He is an Assistant Professor in the Guelph Institute of Development Studies and Department of Political Science at the University of Guelph. Before that, he taught at Virginia Tech and American University.
Abstract: TBA
- Sharla Gelfand is a freelance R and Shiny developer specializing in enabling easy access to data and replacing manual, redundant processes with ones that are automated, reproducible, and repeatable. They also co-organize R-Ladies Toronto and the Greater Toronto Area R User Group. They like R (of course), dogs, learning Spanish, playing bass, and punk.
Abstract: Getting stuck, looking around for a solution, and eventually asking for help is an inevitable and constant aspect of being a programmer. If you’ve ever looked up a question only to find some brave soul getting torn apart on Stack Overflow for not providing a minimum working example, you know it’s also one of the most intimidating parts! A minimum working example, or a reproducible example as it’s more often called in the R world, is one of the best ways to get help with your code - but what exactly is a reproducible example? How do you create one, and do it efficiently? Why is it so scary? This talk will cover what components are needed to make a good reproducible example to maximize your ability to get help (and to help yourself!), strategies for coming up with an example and testing its reproducibility, and why you should care about making one. We will also discuss how to extend the concept of reproducible examples beyond “Help! my code doesn’t work” to other environments where you might want to share code, like teaching and blogging.
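One widely used tool for this in the R community (my addition; the abstract does not name specific tooling) is the reprex package, which renders a snippet, together with its actual output, into a format ready to paste into GitHub or Stack Overflow. A minimal sketch:

```r
# A good reproducible example is self-contained: packages loaded explicitly,
# and data either built-in (like mtcars) or constructed inside the snippet.
reprex::reprex({
  library(ggplot2)
  ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point()
})
```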
- Shemra Rizzo is a senior data scientist in Genentech’s Personalized Healthcare group. Shemra’s role includes research on COVID-19 using electronic health records, and the development of data-driven approaches to evaluate clinical trial eligibility criteria. Shemra obtained her PhD in Biostatistics from UCLA. Before joining Genentech, Shemra was an assistant professor of statistics at UC Riverside, where her research covered topics in mental health, health disparities, and nutrition. In her free time, Shemra enjoys spending time with her family and running.
Abstract: Reproducibility when working with licensed EHR data.
- Shiro Kuriwaki is a PhD Candidate in the Department of Government at Harvard University. His research focuses on democratic representation in American politics. In an ongoing project, he studies the structure of voters' choices across levels of government and the political economy of local elections, using cast vote records and surveys. His other projects also examine the mechanics of representation, including public opinion and Congress, modern survey statistics and causal inference, and election administration. Prior to and during graduate school, he worked at the Analyst Institute in Washington, D.C.
Abstract: TBA
- Simeon Carstens is a Data Scientist at Tweag I/O, a software innovation lab and consulting company. Originally a physicist, Simeon completed a PhD and postdoctoral research in computational biology, focusing on Bayesian determination of three-dimensional chromosome structures.
Abstract: Data analysis often requires a complex software environment containing one or several programming languages, language-specific modules and external dependencies, all in compatible versions. This poses a challenge to reproducibility: what good is a well-designed, tested and documented data analysis pipeline if it is difficult to replicate the software environment required to run it? Standard tools such as Python / R virtual environments solve part of the problem, but do not take into account external and system-level dependencies. Nix is a fully declarative, open-source package manager solving this problem: a program packaged with Nix comes with a complete description of its full dependency tree, down to system libraries. In this presentation, I will give an introduction to Nix, show in a live demo how to set up a fully reproducible software environment and compare Nix to existing solutions such as virtual environments and Docker.
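To make the idea concrete, a Nix environment is just a declarative file. The sketch below is a minimal, hypothetical `shell.nix` (my illustration, not from the talk); running `nix-shell` in its directory provides a shell with exactly the listed dependencies. Full bit-for-bit reproducibility additionally requires pinning the nixpkgs version rather than using the `<nixpkgs>` channel.

```nix
# Minimal sketch of a declarative environment (hypothetical shell.nix)
{ pkgs ? import <nixpkgs> {} }:

pkgs.mkShell {
  # Everything the analysis needs, down to system libraries, is declared here
  buildInputs = [
    pkgs.R
    pkgs.rPackages.ggplot2
    pkgs.rPackages.dplyr
  ];
}
```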
- Tania Allard is the co-director of Quansight Labs. In her current role she is mainly focused on all things full-stack data science, scientific computing, and open source. As a co-director, her priorities are driving Quansight Labs' direction and vision and supporting the overall growth of the PyData ecosystem. She is particularly interested in: reproducible research; reproducible machine learning; data engineering; MLOps / DataOps; all things automation; creative coding; serverless computing; and open tech knowledge.
- Tiffany Timbers is an Assistant Professor of Teaching in the Department of Statistics and a Co-Director of the Master of Data Science program (Vancouver option) at the University of British Columbia. In these roles she teaches and develops curriculum around the responsible application of data science to solve real-world problems. One of her favourite courses to teach is a graduate course on collaborative software development, which focuses on how to create R and Python packages using modern tools and workflows.
Abstract: TBA
- Tom Barton is a PhD student in Politics at Royal Holloway, University of London. His PhD focuses on the impact of Voter Identification laws on political participation and attitudes. More generally his interests include elections, public opinion (particularly social values) and quantitative research methods.
Abstract: I reproduce Surridge, 2016, 'Education and liberalism: pursuing the link', Oxford Review of Education, 42:2, pp. 146-164, using the 1970 British Cohort Study (BCS70), but with a difference-in-differences regression approach and more waves of data. I find that whilst there is evidence for both the socialisation and self-selection models, self-selection dominates the link between social values and university attendance. This runs counter to what Surridge (2016) concluded. The need for re-specification was two-fold: first, Surridge's methodology did not fully test for causality; and second, data from later waves have become available since.
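For readers unfamiliar with the design, a difference-in-differences estimate of this kind reduces, in its simplest form, to one interaction term in a linear model. The sketch below uses simulated data and hypothetical variable names; it is not the author's actual specification.

```r
# Simulated stand-in for a two-period panel with a binary treatment
set.seed(1)
n <- 1000
university <- rbinom(n, 1, 0.3)   # eventual university attendance
post <- rbinom(n, 1, 0.5)         # wave after attendance could have occurred
# Liberal-values scale with a built-in socialisation effect of 0.2
liberal <- 0.5 * university + 0.1 * post +
  0.2 * university * post + rnorm(n)

did_fit <- lm(liberal ~ university * post)

# The university:post coefficient is the difference-in-differences estimate:
# the change in values among attendees relative to the change among
# non-attendees; here it should recover roughly 0.2.
summary(did_fit)
```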
- Tyler Girard is a PhD Candidate in political science at the University of Western Ontario (London, Ontario, Canada). His dissertation research seeks to explain the origins and diffusion of the global financial inclusion agenda by focusing on the role of ambiguous ideas in mobilizing and consolidating transnational coalitions. More generally, his work also explores new approaches to conceptual measurement in international relations.
Abstract: In what ways can we incorporate reproducible practices in pedagogy for social science courses? I discuss how individual and group exercises centered around the replication of existing datasets and analyses offer a flexible tool for experiential learning. However, maximizing the benefits of such an approach requires customizing the activity to the students and the availability of instructor support. I offer several suggestions for effectively using replication exercises in both undergraduate and graduate level courses.
- Wijdan Tariq is an undergraduate student in the Department of Statistical Sciences at the University of Toronto.
Abstract: I undertake a narrow replication of Caicedo, 2019, 'The Mission: Human Capital Transmission, Economic Persistence, and Culture in South America', Quarterly Journal of Economics, 134:1, pp. 507-556. Caicedo reports on a remarkable, religiously inspired human capital intervention that took place in remote parts of South America 250 years ago and whose positive economic effects, he claims, persist to this day. I replicate some of the paper's key results using data files that are available on the Harvard Dataverse portal. I discuss some lessons learned in the process of replicating this paper and share some reflections on the state of reproducibility in economics.
- Yanbo Tang is a PhD candidate at the University of Toronto in the Department of Statistical Sciences, under the joint supervision of Nancy Reid and Daniel Roy. He is interested in the study and application of methods in higher order asymptotics and statistical inference in the presence of many nuisance parameters. Nowadays, he works under the careful gaze of his pet parrot.
Abstract: Hypothesis testing results often rely on simple, yet important, assumptions about the behavior of the distribution of p-values under the null and alternative. We show that commonly held beliefs regarding the distribution of p-values are misleading when the variance or location of the test statistic is not well-calibrated or when the higher order cumulants of the test statistic are not negligible. We further examine the impact of these misleading p-values on the reproducibility of scientific studies, with some examples focused on GWAS studies. Certain corrected tests are proposed and are shown to perform better than their traditional counterparts in certain settings.
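The calibration point can be seen in a few lines of simulation. The sketch below is my illustration, not the speaker's code: p-values computed assuming unit variance are no longer uniform under the null when the statistic's true variance is larger.

```r
# Null p-values under a miscalibrated test statistic
set.seed(1)
n_sims <- 10000

# True null distribution has sd 1.2, but p-values are computed assuming sd 1
z <- rnorm(n_sims, mean = 0, sd = 1.2)
p <- 2 * pnorm(-abs(z))

# With correct calibration this would be ~0.05; here it is inflated
mean(p < 0.05)

# The histogram piles up near 0 instead of being flat (uniform)
hist(p, main = "Null p-values under a miscalibrated test")
```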