Courses

Generative AI and Social Science

Last updated: 2026-04-08

This course may be taken as a reading course in DoSS.

Overview

The purpose of this course is to get up to speed with generative AI in the social sciences, and then use that knowledge to make some small contribution of your own.

This course would suit students interested in self-driven learning who are considering graduate school. You should have familiarity with a programming language, such as Python or R, confidence using GitHub, and comfort being self-sufficient and self-motivated.

Thank you to Jessica Hullman for sharing her course on Gen AI for behavioral science.

Learning objectives

  • Develop an understanding of the literature on generative AI and the social sciences.
  • Critically evaluate when and where LLMs are appropriate to use instead of real humans.
  • Understand how to evaluate validity, reliability, and generalizability of AI-based social science methods.
  • Contribute, in some small way, to the literature at the intersection of AI and social science.

Content

Week 1: Overview

Technical foundations:

  • Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al., (2020) "Language Models Are Few-Shot Learners", arXiv, 10.48550/arXiv.2005.14165.
  • Grattafiori, Aaron, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, et al., (2024) "The Llama 3 Herd of Models", arXiv, 10.48550/arXiv.2407.21783.
  • Anthropic, (2026) "System Card: Claude Mythos Preview", anthropic.com.

Broader issues:

  • Chambliss, Daniel, (1989) "The Mundanity of Excellence: An Ethnographic Report on Stratification and Olympic Swimmers", Sociological Theory 7 (1): 70--86, 10.2307/202063.

Core:

  • Bail, Christopher A., (2024) "Can Generative AI Improve Social Science?", Proceedings of the National Academy of Sciences, 121 (21), 10.1073/pnas.2314021121.
  • Chiu, Ke-Li, Annie Collins, and Rohan Alexander, (2022) "Detecting Hate Speech with GPT-3", arXiv, 10.48550/arXiv.2103.12407.
  • Davidson, Thomas, (2024) "Start Generating: Harnessing Generative Artificial Intelligence for Sociological Research", Socius: Sociological Research for a Dynamic World 10 (January), 10.1177/23780231241259651.
  • Horton, John J., Apostolos Filippas, and Benjamin Manning, (2026) "Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?", Working Paper 31122, National Bureau of Economic Research, 10.3386/w31122.

Week 2: Silicon samples

Technical foundations:

  • Alammar, Jay, and Maarten Grootendorst, (2024) Hands-on Large Language Models, O'Reilly, Chapters 1--2.

Core:

  • Alexander, Rohan and Annie Collins, (2026) "Simulating Gun Control Attitudes After the 2025 Bondi Beach Shooting Using Persona-Conditioned LLMs" [PDF will be provided].
  • Argyle, Lisa P., Ethan C. Busby, Nancy Fulda, Joshua R. Gubler, Christopher Rytting, and David Wingate, (2023) "Out of One, Many: Using Language Models to Simulate Human Samples", Political Analysis, 31 (3), 337--51. 10.1017/pan.2023.2.
  • Bisbee, James, Joshua D. Clinton, Cassy Dorff, Brenton Kenkel, and Jennifer Larson, (2024) "Synthetic Replacements for Human Survey Data? The Perils of Large Language Models", Political Analysis, 32 (4), 401--16. 10.1017/pan.2024.5.
  • Dillion, Danica, Niket Tandon, Yuling Gu, and Kurt Gray, (2023) "Can AI language models replace human participants?", Trends in Cognitive Sciences, 27 (7), 597--600. 10.1016/j.tics.2023.04.008.
  • Kozlowski, Austin and James Evans, (2025) "Simulating subjects: The promise and peril of artificial intelligence stand-ins for social agents and interactions", Sociological Methods & Research, 54 (3), 10.1177/00491241251337316.
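The silicon-samples papers this week condition a language model on a demographic "persona" and treat its completions as simulated survey responses. A minimal sketch of how such a prompt is typically assembled, in the spirit of Argyle et al. (2023) — the persona fields and template wording here are illustrative assumptions, not taken from any of the papers above:

```python
# Illustrative persona-conditioned prompt construction. The persona keys
# ("age", "gender", "region", "party") and the backstory template are
# assumptions for the sketch, not from any paper on this list.

def persona_prompt(persona: dict, question: str) -> str:
    """Build a first-person backstory prompt for a simulated respondent."""
    backstory = (
        f"I am a {persona['age']}-year-old {persona['gender']} from "
        f"{persona['region']}. Politically, I identify as {persona['party']}."
    )
    return f"{backstory}\n\nInterviewer: {question}\nMe:"

prompt = persona_prompt(
    {"age": 45, "gender": "woman", "region": "Ohio", "party": "a Republican"},
    "Do you support stricter gun control laws?",
)
print(prompt)
```

The completion the model writes after "Me:" is the simulated answer; the debate in Bisbee et al. and Dillion et al. is about whether such answers track what actual respondents with those attributes would say.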

Week 3: Agents

Technical foundations:

  • Alammar, Jay, and Maarten Grootendorst, (2024) Hands-on Large Language Models, O'Reilly, Chapters 3--4.

Core:

  • Binz, Marcel, Elif Akata, Matthias Bethge, Franziska Brändle, Fred Callaway, Julian Coda-Forno, Peter Dayan, et al., (2025) "A Foundation Model to Predict and Capture Human Cognition", Nature, 644 (8078): 1002--9, 10.1038/s41586-025-09215-4.
  • Cui, Ziyan, Ning Li, and Huaikang Zhou, (2025) "A Large-Scale Replication of Scenario-Based Experiments in Psychology and Management Using Large Language Models", Nature Computational Science, 5 (8): 627--34, 10.1038/s43588-025-00840-7.
  • Manning, Benjamin S., and John J. Horton, (2025) "General Social Agents", arXiv, 10.48550/arXiv.2508.17407.
  • Peng, Tianyi, George Gui, Melanie Brucks, Daniel J. Merlau, Grace Jiarui Fan, Malek Ben Sliman, Eric J. Johnson, et al., (2026) "Digital Twins as Funhouse Mirrors: Five Key Distortions", arXiv, 10.48550/arXiv.2509.19088.
  • Vafa, Keyon, Peter G. Chang, Ashesh Rambachan, and Sendhil Mullainathan, (2025) "What Has a Foundation Model Found? Inductive Bias Reveals World Models", In Forty-Second International Conference on Machine Learning, https://openreview.net/forum?id=i9npQatSev.

Week 4: Failures, bias, and other concerns

Technical foundations:

  • Alammar, Jay, and Maarten Grootendorst, (2024) Hands-on Large Language Models, O'Reilly, Chapters 5--6.

Core:

  • Asadi, Mohammad, Jack W. O'Sullivan, Fang Cao, Tahoura Nedaee, Kamyar Fardi, Fei-Fei Li, Ehsan Adeli, and Euan Ashley, (2026) "MIRAGE: The Illusion of Visual Understanding", arXiv, 10.48550/arXiv.2603.21687.
  • Barrie, Christopher, and Roberto Cerina, (2026) "Synthetic Personas Distort the Structure of Human Belief Systems", SocArXiv, 10.31235/osf.io/n7fq8_v1.
  • Cummins, Jamie, (2025) "The threat of analytic flexibility in using large language models to simulate human data: A call to attention", arXiv, 10.48550/arXiv.2509.13397.
  • Dominguez-Olmedo, Ricardo, Moritz Hardt, and Celestine Mendler-Dünner, (2024) "Questioning the survey responses of large language models", Proceedings of the 38th International Conference on Neural Information Processing Systems, 37, 45850--45878, 10.5555/3737916.3739374.
  • Messeri, Lisa and M. J. Crockett, (2024) "Artificial intelligence and illusions of understanding in scientific research", Nature, 627(8002), 49--58. 10.1038/s41586-024-07146-0.
  • Spiegelhalter, David, (2020) "Should We Trust Algorithms?", Harvard Data Science Review, 31 January, 10.1162/99608f92.cb91a35a.
  • Taday Morocho, Erika Elizabeth, Lorenzo Cima, Tiziano Fagni, Marco Avvenuti, and Stefano Cresci, (2026) "Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents", arXiv, 10.48550/arXiv.2602.18462.
  • Wang, Angelina, Jamie Morgenstern and John P. Dickerson, (2025) "Large language models that replace human participants can harmfully misportray and flatten identity groups", Nature Machine Intelligence, 7(3), 400--411. 10.1038/s42256-025-00986-z.

Week 5: Reliability

Technical foundations:

  • Alammar, Jay, and Maarten Grootendorst, (2024) Hands-on Large Language Models, O'Reilly, Chapters 7--8.

Core:

  • Allen, Ryan, and Aticus Peterson, (2026) "Intelligence Without Integrity: Why Capable LLMs May Undermine Reliability", arXiv, 10.48550/arXiv.2602.20440.
  • Bertran, Martin, Riccardo Fogliato, and Zhiwei Steven Wu, (2026) "Many AI Analysts, One Dataset: Navigating the Agentic Data Science Multiverse", arXiv, 10.48550/arXiv.2602.18710.
  • Cui, Jiaxin, and Rohan Alexander, (2026) "Same Prompt, Different Outcomes: Evaluating the Reproducibility of Data Analysis by LLMs", arXiv, 10.48550/arXiv.2602.14349.
  • Gao, Ruijiang, and Steven Chong Xiao, (2026) "Non-standard Errors in AI Agents", SSRN, 10.2139/ssrn.6427518.
  • Rabanser, Stephan, Sayash Kapoor, Peter Kirgis, Kangheng Liu, Saiteja Utpala, and Arvind Narayanan, (2026) "Towards a Science of AI Agent Reliability", arXiv, 10.48550/arXiv.2602.16666.
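The reproducibility question in Cui and Alexander can be made concrete with a simple statistic: run the same prompt several times and measure how often the modal answer recurs. A minimal sketch (the answers below are made up; more refined measures, such as those in the non-standard-errors literature, go further):

```python
from collections import Counter

def modal_agreement(answers: list[str]) -> float:
    """Share of repeated runs that return the most common answer.

    1.0 means the runs are perfectly reproducible; values near
    1 / (number of distinct answers) suggest near-random variation.
    """
    counts = Counter(answers)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(answers)

# Five hypothetical runs of the same prompt:
runs = ["support", "support", "oppose", "support", "support"]
print(modal_agreement(runs))  # 0.8
```

Reporting a statistic like this alongside a single headline answer makes the run-to-run variability visible rather than hidden.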

Week 6: Validation

Core:

  • Barrie, Christopher, Alexis Palmer, and Arthur Spirling, (2025) "Replication for Language Models: Problems, Principles, and Best Practices for Political Science", https://arthurspirling.org/documents/BarriePalmerSpirling/%5FTrustMeBro.pdf.
  • Bisbee, James, and Arthur Spirling, (2026) "What to Do When Humans Are No Longer the Gold Standard: Large Language Models, State of the Art and Robustness for Politics Research", https://github.com/ArthurSpirling/futureProofR.
  • Egami, Naoki, Musashi Hinck, Brandon M. Stewart, and Hanying Wei, (2024) "Using Large Language Model Annotations for the Social Sciences: A General Framework of Using Predicted Variables in Downstream Analyses", https://naokiegami.com/paper/dsl_ss.pdf.
  • Hoffman, Kentaro, Stephen Salerno, Awan Afiaz, Jeffrey T. Leek, and Tyler H. McCormick, (2024) "Do we really even need data?", arXiv, 10.48550/arXiv.2401.08702.
  • Hullman, Jessica, David Broska, Huaman Sun, and Aaron Shaw, (2026) "This human study did not involve human subjects: Validating LLM simulations as behavioral evidence". arXiv. 10.48550/arXiv.2602.15785.
  • Krsteski, Stefan, Giuseppe Russo, Serina Chang, Robert West, and Kristina Gligorić, (2025) "Valid Survey Simulations with Limited Human Data: The Roles of Prompting, Fine-Tuning, and Rectification", arXiv, 10.48550/arXiv.2510.11408.
  • Lyman, Alex, Bryce Hepner, Lisa P. Argyle, Ethan C. Busby, Joshua R. Gubler, and David Wingate, (2025) "Balancing Large Language Model Alignment and Algorithmic Fidelity in Social Science Research", Sociological Methods & Research 54 (3): 1110--55. 10.1177/00491241251342008.
  • Neumann, Terrence, Maria De-Arteaga, and Sina Fazelpour, (2025) "Should You Use LLMs to Simulate Opinions? Quality Checks for Early-Stage Deliberation", arXiv, 10.48550/arXiv.2504.08954.
  • Song, Yilin, Dan M. Kluger, Harsh Parikh, and Tian Gu, (2026) "Demystifying Prediction Powered Inference", arXiv, 10.48550/arXiv.2601.20819.
  • Zrnic, Tijana and Emmanuel J. Candès, (2024) "Active statistical inference", arXiv, 10.48550/arXiv.2403.03208.

Week 7: AI and social sciences

Technical foundations:

  • Huyen, Chip, (2025) AI Engineering, O'Reilly, Chapters 1--2.

Core:

  • Argyle, Lisa P., Ethan C. Busby, Joshua R. Gubler, Bryce Hepner, Alex Lyman, and David Wingate, (2025) "Arti-'Fickle' Intelligence: Using LLMs as a Tool for Inference in the Political and Social Sciences". Nature Computational Science 5 (9): 737--44. 10.1038/s43588-025-00843-4.
  • Grossmann, Igor, Matthew Feinberg, Dawn C. Parker, Nicholas A. Christakis, Philip E. Tetlock, and William A. Cunningham, (2023) "AI and the Transformation of Social Science Research", Science, 380 (6650), pp. 1108--1109. 10.1126/science.adi1778.
  • Mellon, Jonathan, Jack Bailey, Ralph Scott, James Breckwoldt, Marta Miori, and Phillip Schmedeman, (2024) "Do AIs know what the most important issue is? Using language models to code open-text social survey responses at scale", Research & Politics, 11 (1), 10.1177/20531680241231468.
  • Westwood, Sean J., (2025) "The potential existential threat of large language models to online survey research", Proceedings of the National Academy of Sciences, 122(47), e2518075122. 10.1073/pnas.2518075122.
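Mellon et al. validate LLM codes of open-text survey responses against human coders, and a standard check for that kind of comparison is chance-corrected agreement such as Cohen's kappa. A minimal sketch (the labels are made up; this is the textbook two-coder kappa, not the specific procedure in the paper):

```python
def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa between two coders: (p_o - p_e) / (1 - p_e),
    where p_o is observed agreement and p_e is agreement expected
    by chance from each coder's marginal label frequencies.
    (Undefined when p_e == 1, i.e. both coders use one label.)"""
    n = len(coder_a)
    labels = set(coder_a) | set(coder_b)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    p_e = sum((coder_a.count(l) / n) * (coder_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

human = ["economy", "economy", "health", "health"]
llm   = ["economy", "health",  "health", "health"]
print(cohens_kappa(human, llm))  # 0.5
```

High raw agreement can be an artifact of skewed label distributions; kappa is one simple guard against over-claiming that an LLM coder matches humans.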

Week 8: Automation

Technical foundations:

  • Huyen, Chip, (2025) AI Engineering, O'Reilly, Chapters 3--4.

Core:

  • Hogan, Brendan R., Xiwen Chen, James T. Wilson, Kashif Rasul, Adel Boyarsky, Thomas Kamei, Anderson Schneider, and Yuriy Nevmyvaka, (2026) "ALPHALAB: Autonomous Multi-Agent Research Across Optimization Domains with Frontier LLMs", https://brendanhogan.github.io/alphalab-paper/.
  • Hughes, Evelyn, and Rohan Alexander, (2026) "Benchmarking AI Performance on End-to-End Data Science Projects", arXiv, 10.48550/arXiv.2602.14284.
  • Lu, Chris, Cong Lu, Robert Tjarko Lange, Yutaro Yamada, Shengran Hu, Jakob Foerster, David Ha, and Jeff Clune, (2026) "Towards End-to-End Automation of AI Research", Nature, 651 (8107): 914--19. 10.1038/s41586-026-10265-5.
  • Manning, Benjamin S., Kehang Zhu, and John J. Horton, (2024) "Automated Social Science: Language Models as Scientist and Subjects", arXiv, 10.48550/arXiv.2404.11794.
  • Misra, Sanjog, (2025) "Foundation Priors". arXiv. 10.48550/arXiv.2512.01107.
  • Musslick, Sebastian, Laura K. Bartlett, Suyog H. Chandramouli, Marina Dubova, Fernand Gobet, Thomas L. Griffiths, Jessica Hullman, et al., (2025) "Automating the practice of science: Opportunities, challenges, and implications", Proceedings of the National Academy of Sciences, 10.1073/pnas.2401238121.

Weeks 9--12: Focus on drafting the paper

Technical foundations:

  • Zinsser, William, (1976) On Writing Well.

Assessment

Donaldson paper

  • Due date: Friday, Week 1.
  • Weight: 10 per cent.
  • Task: Complete the Donaldson paper. Do not use AI for any aspect.

Weekly discussion

  • Due dates: Fridays, Weeks 1--8. We will meet for 30 minutes to discuss the papers you read that week.
  • Weight: Each is worth 5 per cent, for a total of 40 per cent.
  • Task: We will discuss the core papers, and I expect you to be able to discuss them sensibly.

Research paper

  • Due date: Fridays, Weeks 9--12, and then as agreed during Reading Week and the Exam Block.
  • Weight: 50 per cent.
  • Task: You and I will work closely together to write a research paper. You will submit drafts from Week 9 onward, and I will give you feedback. The final evaluation is based on the version you submit on the last day of the exam block.