TIES4200 Natural Language Processing (5 cr)
Description
Modern Natural Language Processing (NLP) techniques, including high-profile models like BERT and GPT, use (large-scale) language modelling to create foundation models that can be adapted to different tasks. This course gives an introduction to NLP focussed on language modelling. Practical exercises include implementing scaled-down versions of the algorithms used by these models, as well as making use of high-level NLP libraries. Students will complete a final project of their choice.
The course includes
- Foundational material on rule-based and traditional statistical approaches to NLP, their drawbacks and limitations, and how they relate to current language modelling-based methods
- An introduction to neural sequence models, building up to attention and the transformer architecture
- Text classification and regression with pretrained language models
- Material and exercises on the evaluation of NLP systems and language models
- The link between linear algebra and text: (subword) tokenisation and encoding/decoding
- Topical material on emerging techniques and issues, which may include one or more of: explainability; Reinforcement Learning from Human Feedback for InstructGPT/ChatGPT-style models; curation of massive training corpora for large language models; and prompt engineering
Learning outcomes
On completion of the course the student will:
- Have an understanding of why current systems have converged upon language modelling as a key objective
- Have some know-how in building NLP systems based on existing library code
- Be able to modify and reimplement algorithms underlying generative language models
- Be able to empirically evaluate the performance of NLP systems
- Have gained some skills for working on and presenting practical projects involving NLP
Description of prerequisites
Basic to intermediate programming skills and basic knowledge of Python. High-school-level mathematics skills and introductory linear algebra, such as vectors, matrices, and their products.
Study materials
The teacher will provide course materials, including lecture slides, supplementary materials, and reading. There will be some overlap between the material and the following book:
Speech and Language Processing (3rd ed. draft), Dan Jurafsky and James H. Martin