TIES4200 Natural Language Processing (5 cr)

Study level:
Advanced studies
Grading scale:
0-5
Language:
English
Responsible organisation:
Faculty of Information Technology
Curriculum periods:
2023-2024

Description

Modern Natural Language Processing (NLP) techniques, including high-profile models like BERT and GPT, use (large-scale) language modelling to create foundational models adaptable to different tasks. This course gives a language modelling-focussed introduction to NLP. Practical exercises in the course include implementing scaled-down versions of the algorithms used by these models as well as making use of high-level NLP libraries. Students will complete a final project of their choice.


The course includes

  • Foundational material on rule-based and traditional statistical approaches to NLP, their drawbacks and limitations, and how they relate to current language modelling-based methods
  • An introduction to neural sequence models, building up to attention and the transformer architecture
  • Text classification and regression with pretrained language models
  • Material and exercises on the evaluation of NLP systems and language models
  • The link between linear algebra and text: (subword) tokenisation and encoding/decoding
  • Topical material on emerging techniques and issues, which may include one or more of: explainability; Reinforcement Learning from Human Feedback for InstructGPT/ChatGPT-style models; curation of massive training corpora for large language models; and prompt engineering

Learning outcomes

On completion of the course the student will:

  • Have an understanding of why current systems have converged upon language modelling as a key objective
  • Have practical know-how for building NLP systems on top of existing library code
  • Be able to modify and reimplement algorithms underlying generative language models
  • Be able to empirically evaluate the performance of NLP systems
  • Have gained some skills for working on and presenting practical projects involving NLP

Description of prerequisites

Basic to intermediate programming skills and basic knowledge of Python. High-school level mathematics skills and introductory linear algebra, such as vectors, matrices, and their products.

Study materials

The teacher will provide course materials, including lecture slides, supplementary materials, and reading. There will be some overlap between the material and the following book:

Speech and Language Processing (3rd ed. draft), Dan Jurafsky and James H. Martin

Completion methods

Method 1

Description:
Taught course completed through assignments and project work.
Evaluation criteria:
Assignments and project work.
Parts of the completion methods

Participation in teaching (5 cr)

Type:
Participation in teaching
Grading scale:
0-5
Language:
English

Teaching