Applications of Natural Language Processing
Last updated: 2025-03-06
This course may be taken as a reading course in DoSS or Information. Please get in touch if interested.
Overview
The purpose of this course is to develop students who can use NLP methods, especially large language models (LLMs), to:
- work productively to implement existing methods;
- contribute to our understanding of the world; and
- engage in thoughtful, ethical critique.
Learning objectives
By the end of the course, you should have:
- a good understanding of applied NLP, especially LLMs, and its place in the world;
- exceptional written and verbal communication skills; and
- contributed in some small way to our understanding of some aspect of the world.
Pre-requisites
- Comfort with R and Python, and with using GitHub.
- Work through Silge, Julia, and David Robinson, 2020, Text Mining with R.
Content
Week 1 "Overview"
Week 2 "Tokenisation and n-grams"
- Hvitfeldt, Emil, and Julia Silge, 2020, Supervised Machine Learning for Text Analysis in R, Chapter 2 Tokenization.
- Jurafsky, Dan, and James H. Martin, 2024, Speech and Language Processing, 3rd ed., Chapter 3 N-gram Language Models.
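To make this week's topic concrete, here is a minimal Python sketch (not taken from the readings; the example sentence is made up) that tokenises a string and counts unigrams and bigrams, which is the starting point for a simple n-gram language model.

```python
import re
from collections import Counter

text = "the cat sat on the mat and the cat slept"

# Tokenise: lowercase the text and split it into word tokens.
tokens = re.findall(r"[a-z]+", text.lower())

# Build bigrams (n-grams with n = 2) from adjacent tokens.
bigrams = list(zip(tokens, tokens[1:]))

unigram_counts = Counter(tokens)
bigram_counts = Counter(bigrams)

print(unigram_counts.most_common(3))
print(bigram_counts.most_common(3))

# A maximum-likelihood bigram probability, e.g. P(cat | the):
print(bigram_counts[("the", "cat")] / unigram_counts["the"])
```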
Week 3 "Embeddings"
- Alammar, Jay, and Maarten Grootendorst, 2024, Hands-On Large Language Models, Chapter 2 Tokens and Embeddings
- Boykis, Vicky, 2023, What are embeddings, 10.5281/zenodo.8015029.
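The core idea this week can be illustrated in a few lines. The sketch below uses made-up three-dimensional vectors and cosine similarity; real embeddings come from trained models (for example word2vec, GloVe, or an LLM) and have hundreds or thousands of dimensions.

```python
import numpy as np

# Toy word vectors (numbers made up purely for illustration).
embeddings = {
    "cat": np.array([0.9, 0.1, 0.3]),
    "dog": np.array([0.8, 0.2, 0.4]),
    "economics": np.array([0.1, 0.9, 0.7]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))        # high
print(cosine_similarity(embeddings["cat"], embeddings["economics"]))  # lower
```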
Week 4 "Modeling"
- Hvitfeldt, Emil, and Julia Silge, 2020, Supervised Machine Learning for Text Analysis in R, Chapters 4-6.
- Jurafsky, Dan, and James H. Martin, 2024, Speech and Language Processing, 3rd ed., Chapters 4 and 5.
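As a rough illustration of the bag-of-words classifiers covered in these chapters (naive Bayes and logistic regression), here is a sketch that assumes scikit-learn is installed and uses a tiny made-up sentiment dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = [
    "loved the plot and the acting",
    "a delightful, moving film",
    "boring plot and wooden acting",
    "a dull, tedious film",
]
labels = ["pos", "pos", "neg", "neg"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)      # document-term count matrix

model = MultinomialNB().fit(X, labels)  # fit naive Bayes on the counts

test = vectorizer.transform(["a moving and delightful plot"])
print(model.predict(test))              # likely prediction: ['pos']
```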
Week 5 "Neural Nets I"
- Jurafsky, Dan, and James H. Martin, 2024, Speech and Language Processing, 3rd ed., Chapters 6 and 7.
- Chollet, François, 2021, Deep Learning with Python, Chapters 1-4.
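To connect the readings to code, the sketch below runs a forward pass through a tiny feed-forward network in NumPy; the weights are random rather than trained, and the layer sizes are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

x = rng.normal(size=(3,))                        # a 3-dimensional input
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # hidden layer: 3 -> 4
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # output layer: 4 -> 1

h = relu(W1 @ x + b1)                            # hidden activations
y = 1 / (1 + np.exp(-(W2 @ h + b2)))             # sigmoid output, e.g. a probability
print(y)
```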
Week 6 "Neural Nets II"
- Hvitfeldt, Emil, and Julia Silge, 2020, Supervised Machine Learning for Text Analysis in R, Chapters 7-9.
Week 7 "Deep learning I"
- Chollet, François, 2021, Deep Learning with Python, Chapter 6.
Week 8 "LSTM"
- Jurafsky, Dan, and James H. Martin, 2024, Speech and Language Processing, 3rd ed., Chapters 8 and 9.
Week 9 "Transformers I"
Week 10 "Transformers II"
- Manning, Vaswani and Huang, 2019, "Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 14 – Transformers and Self-Attention".
- Alammar, Jay, and Maarten Grootendorst, 2024, Hands-On Large Language Models, O'Reilly, Chapters 4-5.
- Uszkoreit, Jakob, 2017, "Transformer: A Novel Neural Network Architecture for Language Understanding", Google AI Blog, 31 August.
Week 11 "Transformers III"
- Karpathy, Andrej, 2022, "Neural Networks: Zero to Hero", the "Building makemore" video series.
- Alammar, Jay, and Maarten Grootendorst, 2024, Hands-On Large Language Models, O'Reilly, Chapters 6-8.
- Vaswani, Ashish, et al., 2017, "Attention Is All You Need", arXiv.
- Devlin, Jacob, and Ming-Wei Chang, 2018, "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing", Google AI Blog.
- Devlin, Jacob, et al., 2018, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", arXiv.
- Rush, Alexander, 2018, "The Annotated Transformer".
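As a companion to Vaswani et al. (2017) and The Annotated Transformer, here is a bare-bones NumPy sketch of scaled dot-product self-attention, softmax(QK^T / sqrt(d_k)) V, using random toy inputs and a single head.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                     # 4 tokens, 8-dimensional keys/queries

X = rng.normal(size=(seq_len, d_k))     # token representations
W_q, W_k, W_v = (rng.normal(size=(d_k, d_k)) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v     # queries, keys, values

scores = Q @ K.T / np.sqrt(d_k)         # similarity of each query to each key
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax

output = weights @ V                    # each token is a weighted mix of values
print(weights.round(2))
print(output.shape)                     # (4, 8)
```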
Week 12 "Transformers IV"
Assessment
Notebook
- Due date: Ongoing; aim to update it weekly, on average, over the term.
- Task: Use Quarto to keep a notebook of what you read, in the style of this one by Andrew Heiss.
- Weight: 25 per cent.
Paper #1
- Due date: Thursday, noon, Week 1.
- Task: Donaldson Paper, but vary the dataset so that it relates to NLP.
- Weight: 10 per cent.
Paper #2
- Due date: Thursday, noon, Week 6.
- Task: Write a paper applying what you are learning.
- Weight: 25 per cent.
Final Paper
- Due date: Thursday, noon, two weeks after Week 12.
- Task: Write a paper that involves doing original research.
- Weight: 40 per cent.