Applications of Natural Language Processing

Last updated: 2025-03-06

This course may be taken as a reading course in DoSS or Information. Please get in touch if interested.

Overview

The purpose of this course is to develop students who can use NLP methods, and especially LLMs, to:

  • work productively to implement existing methods;
  • contribute to our understanding of the world; and
  • engage in thoughtful, ethical critique.

Learning objectives

By the end of the course, you should have:

  1. a good understanding of applied NLP, especially LLMs, and its place in the world;
  2. exceptional written and verbal communication skills; and
  3. contributed in some small way to our understanding of some aspect of the world.

Pre-requisites

  • Be comfortable with R and Python, and with using GitHub.
  • Work through Silge, Julia, and David Robinson, 2020, Text Mining with R.

Content

Week 1 "Overview"

Week 2 "Tokenisation and n-grams"

Week 3 "Embeddings"

  • Alammar, Jay, and Maarten Grootendorst, 2024, Hands-On Large Language Models, Chapter 2 "Tokens and Embeddings".
  • Boykis, Vicky, 2023, What are embeddings, DOI: 10.5281/zenodo.8015029.

Week 4 "Modeling"

  • Hvitfeldt, Emil, and Julia Silge, 2020, Supervised Machine Learning for Text Analysis in R, Chapters 4-6.
  • Jurafsky, Dan, and James H. Martin, 2024, Speech and Language Processing, 3rd ed., Chapters 4 and 5.

Week 5 "Neural Nets I"

  • Jurafsky, Dan, and James H. Martin, 2020, Speech and Language Processing, 3rd ed., Chapters 6 and 7.
  • Chollet, François, 2021, Deep Learning with Python, Chapters 1-4.

Week 6 "Neural Nets II"

  • Hvitfeldt, Emil, and Julia Silge, 2020, Supervised Machine Learning for Text Analysis in R, Chapters 7-9.

Week 7 "Deep learning I"

  • Chollet, François, 2021, Deep Learning with Python, Chapter 6.

Week 8 "LSTM"

  • Jurafsky, Dan, and James H. Martin, 2020, Speech and Language Processing, 3rd ed., Chapters 8 and 9.

Week 9 "Transformers I"

Week 10 "Transformers II"

Week 11 "Transformers III"

Week 12 "Transformers IV"

Assessment

Notebook

  • Due date: Try to keep this updated weekly, on average, over the course of the term.
  • Task: Use Quarto to keep a notebook of what you read in the style of this one by Andrew Heiss.
  • Weight: 25 per cent.

Paper #1

  • Due date: Thursday, noon, Week 1.
  • Task: Donaldson Paper (although vary the dataset, so that it is something related to NLP).
  • Weight: 10 per cent.

Paper #2

  • Due date: Thursday, noon, Week 6.
  • Task: Write a paper applying what you are learning.
  • Weight: 25 per cent.

Final Paper

  • Due date: Thursday, noon, two weeks after Week 12.
  • Task: Write a paper that involves doing original research.
  • Weight: 40 per cent.