4 July 2022: Annie Collins and I had ‘Reproducibility of COVID-19 pre-prints’ published in Scientometrics.
I am an assistant professor at the University of Toronto in Information and Statistical Sciences. I am also the assistant director of CANSSI Ontario, a senior fellow at Massey College, a faculty affiliate at the Schwartz Reisman Institute for Technology and Society, and a co-lead of the DSI Thematic Program in Reproducibility.
My book on foundational data skills, tentatively titled Telling Stories With Data, was accepted for publication by CRC Press in May 2021. I am also co-editor (alongside Lauren Kennedy and Andrew Gelman) of a book tentatively titled Multilevel Regression and Poststratification: A Practical Guide and New Developments, which was accepted for publication by Cambridge University Press in July 2021.
I use statistical models to try to understand the world. I am particularly interested in: how we get the data that go into those models; whose data are systematically missing; how we clean, prepare, and tidy data before they are modelled; the effects of all this on the implications of our models; and how we can reproducibly share the totality of this process. This research interest has a few different applications. One of those is Natural Language Processing (NLP), where I am interested in understanding the effects of bringing together large, biased datasets and enormous models, and how this can be improved. Another is Multilevel Regression with Post-stratification (MRP), where I examine the effects of trying to establish a correspondence between two datasets.
Students in my research group not only develop skills in using statistical methods in reproducible ways across various disciplines, but also learn to appreciate the limitations of those methods and to think deeply about the broader context of their work. Some recent papers include: ‘Reproducibility of COVID-19 pre-prints’, ‘heapsofpapers’, ‘Detecting Hate Speech with GPT-3’, and ‘On consistency scores in text data with an implementation in R’.
I enjoy teaching and aim to help students from a wide range of backgrounds learn how to use data to tell convincing stories. In the Faculty of Information, I have taught ‘Experimental Design’ and lead reading courses in ‘Ethics and Data Science’, ‘Information Management in Interdisciplinary Research’, and ‘Reproducible Data Science’. In Statistical Sciences, I have taught ‘Surveys, Sampling, and Observational Data’ and lead a reading course in ‘Natural Language Processing’. I am an RStudio Certified Tidyverse Trainer and an associate editor of the Journal of Statistics and Data Science Education.
I am married to Monica Alexander, and we have two young children. I probably spend too much money on books, and certainly too much time at libraries (in a pre-COVID world). You can see some of the books that I recommend here. If you have any book recommendations of your own, then I’d love to hear them.