The following sections contain the materials for the eight applied NLP lectures. The applied NLP papers to review each week are listed here as well.
- NLP resources
- Lecture 1: NLP Introduction
- Lecture 2: Syntax
- Lecture 3: Semantics
- Lecture 4: NLP Evaluation
- Lecture 5: ML for NLP
- Lecture 6: Natural Language Generation
- Lecture 7: NLP annotations
- Lecture 8: Word embeddings
Speech and Language Processing (3rd ed.) by Dan Jurafsky and James H. Martin. Available online.
Natural Language Processing by Jacob Eisenstein. Also available online.
A number of open-source NLP toolkits exist. They have different strengths and weaknesses. Which ones are applicable to your project depends to some extent on your taste in programming languages and the kind of NLP task you are working on:
Also in Dutch:
- Movie reviews
- Stanford sentiment treebank
- Sentiment on Twitter
- Sentiment Analysis in Twitter
- Various text datasets, UCI
- Stance detection
- Irony Detection in English Tweets
Natural Language Processing describes computational methods that allow computers to "understand" human communication. This lecture explains what NLP can do and describes common NLP applications: the kinds of tasks that NLP solves, and the components and sub-tasks that make solving them possible.
Slides (PDF) are available here. The slides will be updated (as needed) after the lecture.
While human language is very flexible, it does follow certain rules, principles, and processes that govern the structure of sentences. We can use that structure to improve machine understanding of human language, and solve many NLP tasks. Therefore this lecture focuses on syntax, or the structure of sentences.
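As a sketch of how sentence structure can be made computational, the toy recursive-descent parser below builds a parse tree from a hand-written context-free grammar. The grammar, lexicon, and example sentence are invented for illustration and are far smaller than anything used in practice.

```python
# A toy recursive-descent parser for a tiny fragment of English.
# The grammar, lexicon, and sentence are illustrative assumptions,
# not part of the course materials.

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "ball": "N",
    "alice": "Name",
    "chased": "V", "slept": "V",
}

def parse(symbol, tokens, pos):
    """Try to expand `symbol` starting at tokens[pos].
    Returns (tree, next_pos) or None on failure."""
    if symbol in GRAMMAR:                      # non-terminal
        for expansion in GRAMMAR[symbol]:
            children, cur = [], pos
            for child in expansion:
                result = parse(child, tokens, cur)
                if result is None:
                    break
                tree, cur = result
                children.append(tree)
            else:                              # every child matched
                return (symbol, children), cur
        return None
    # terminal: match a word with the right part of speech
    if pos < len(tokens) and LEXICON.get(tokens[pos]) == symbol:
        return (symbol, tokens[pos]), pos + 1
    return None

tree, end = parse("S", "the dog chased a ball".split(), 0)
print(tree)
```

Real parsers (e.g. the Stanford parser discussed later) use chart algorithms and learned grammars rather than backtracking over hand-written rules, but the notion of recursively matching phrase structure is the same.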
The lecture slides (PDF) are available here.
**NLP project proposal: due February 21st.**
Large Language Models in Machine Translation ("Stupid Backoff")
NLP book - Part-of-speech tagging:
Stanford parser FAQ
Sentiment analysis:
The Dutch translation of the Linguistic Inquiry and Word Count (LIWC) 2007 dictionary:
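The "Stupid Backoff" scheme from the reading above can be sketched in a few lines: score an n-gram by its relative frequency if it was observed, otherwise back off to the shorter context with a fixed multiplier (0.4 in the paper). The toy corpus below is an assumption for illustration; the scores are deliberately not normalized probabilities.

```python
from collections import Counter

# Minimal Stupid Backoff scorer (Brants et al. 2007). Scores are
# relative frequencies, backing off with a fixed factor alpha = 0.4.
# The toy corpus is a made-up example.

corpus = "the cat sat on the mat the cat ate".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = len(corpus)
ALPHA = 0.4

def score(word, prev=None):
    """S(word | prev) under Stupid Backoff with bigram context."""
    if prev is not None and bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    # back off to the unigram relative frequency
    return ALPHA * unigrams[word] / total

print(score("cat", prev="the"))   # bigram "the cat" seen twice out of 3 "the"
print(score("sat", prev="cat"))   # "cat sat" seen once out of 2 "cat"
print(score("mat", prev="cat"))   # unseen bigram -> 0.4 * count("mat") / 9
```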
Offspring from reproduction problems: What replication failure teaches us
Fokkens, Antske, et al. "Offspring from reproduction problems: What replication failure teaches us." Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2013.
Review P1: handed out February 14, due February 21.
What does it mean, what does it all mean? Unlike controlled languages, natural language is full of ambiguities. Words have multiple meanings, and words are related to each other in different ways. This lecture looks at semantics, or meaning in language.
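One classic way to resolve lexical ambiguity is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the sentence the ambiguous word appears in. The two-sense mini dictionary for "bank" below is invented for illustration.

```python
# A simplified Lesk word-sense disambiguation sketch: pick the sense
# whose gloss overlaps most with the sentence context.
# The two-sense mini dictionary for "bank" is a made-up example.

SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river":   "the sloping land alongside a river or stream",
}

def disambiguate(word_senses, context_sentence):
    context = set(context_sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in word_senses.items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate(SENSES, "she sat on the bank of the river fishing"))
# -> bank/river ("the" and "river" overlap with that gloss)
```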
Lecture slides (PDF) are available here.
Similarity for news recommender systems
Determining the sentiment of opinions
Mining and summarizing customer reviews.
How do we know if our NLP methods are working well? What evaluation methods are used in NLP, and what are the metrics? How do we interpret them, and when are they suitable? This lecture looks at evaluation, and assessing the performance of NLP systems.
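As a concrete example of the metrics the lecture covers, the snippet below computes precision, recall, and F1 from scratch for a binary classification task. The gold labels and predictions are invented for illustration.

```python
# Precision, recall, and F1 computed from scratch for a binary task.
# The gold labels and system predictions below are made-up examples.

gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)  # true positives
fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)  # false positives
fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)            # of predicted positives, how many are right
recall = tp / (tp + fn)               # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# -> precision=0.75 recall=0.75 f1=0.75
```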
Lecture slides (PDF) are available here.
Why Most Published Research Findings Are False
More offline evaluation metrics from NIST
ASIYA: Open toolkit for evaluating text using automatic metrics
Best practices for the human evaluation of automatically generated text
Best practices for the human evaluation of automatically generated text. Chris van der Lee, Albert Gatt, Emiel van Miltenburg, Sander Wubben and Emiel Krahmer
Review P2: handed out February 21, due February 28.
How do we apply what we know about classifiers and regression to NLP problems? What are common pitfalls and mistakes? What kinds of biases in the data and analysis should we look out for as ethical data scientists? This lecture focuses on applying classical machine learning techniques to natural language processing.
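A common first step when applying classical classifiers to text is turning each document into a bag-of-words count vector over a shared vocabulary. A minimal sketch, with invented example texts:

```python
from collections import Counter

# A bag-of-words feature extractor: each document becomes a vector of
# word counts over a shared, sorted vocabulary. Texts are made-up examples.

docs = ["great movie great acting", "terrible plot terrible pacing"]
vocab = sorted({w for d in docs for w in d.split()})

def bow_vector(text):
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

for d in docs:
    print(bow_vector(d))
```

Libraries like scikit-learn (see the resources below) provide the same idea as `CountVectorizer`, with tokenization, n-grams, and sparse storage handled for you.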
NLP intermediate project report: due March 4.
Last year's lecture slides (PDF) are available here. Slides will be updated after the lecture as needed.
- Jurafsky, Dan. Speech & Language Processing
  - Chapter 6: Naive Bayes
  - Chapter 7: Logistic regression
- Bing Liu. Sentiment Analysis and Opinion Mining
  - Chapter 10 (fake reviews)
- Scikit-learn, machine learning in Python
- NLTK machine learning (more limited for ML)
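As a pocket-sized illustration of the Naive Bayes reading (chapter 6), the sketch below trains a multinomial Naive Bayes sentiment classifier with add-one (Laplace) smoothing on four toy sentences. The training data and the decision to ignore unknown words at test time are simplifying assumptions.

```python
import math
from collections import Counter, defaultdict

# Multinomial Naive Bayes with add-one smoothing, in the spirit of
# Jurafsky & Martin chapter 6. The training sentences are toy examples.

train = [
    ("pos", "great fun great acting"),
    ("pos", "loved the fun plot"),
    ("neg", "boring plot terrible acting"),
    ("neg", "terrible boring mess"),
]

word_counts = defaultdict(Counter)   # per-class word frequencies
class_counts = Counter()             # documents per class
for label, text in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def predict(text):
    scores = {}
    for label in class_counts:
        # log prior P(class)
        score = math.log(class_counts[label] / sum(class_counts.values()))
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            if w in vocab:           # ignore out-of-vocabulary words
                score += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("fun acting"))       # -> pos
print(predict("boring terrible"))  # -> neg
```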
So far we have only looked at how to improve computational understanding of natural language. However, in conversational systems (like, but not limited to, chatbots), we also might want a computer to communicate with us. There is an area of research that focuses on going from abstract, often rich and complex, representations to natural language that people can understand. In this lecture we introduce this area of research, which is called Natural Language Generation.
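To make "going from a representation to text" concrete, the toy surface realizer below maps a small meaning frame to an English sentence with a crude subject-verb agreement rule. It only gestures at what an engine like SimpleNLG does; the frame format and the plural heuristic are deliberate simplifications.

```python
# A toy surface realizer: turn a small meaning representation into an
# English sentence with a naive agreement rule. The frame format and
# the "ends in -s means plural" heuristic are deliberate simplifications.

def realize(frame):
    subj, verb, obj = frame["subject"], frame["verb"], frame.get("object")
    # naive 3rd-person-singular agreement: add -s unless subject looks plural
    if not subj.endswith("s"):
        verb = verb + "s"
    parts = ["The", subj, verb]
    if obj:
        parts += ["the", obj]
    return " ".join(parts) + "."

print(realize({"subject": "robot", "verb": "clean", "object": "room"}))
# -> "The robot cleans the room."
print(realize({"subject": "robots", "verb": "clean", "object": "room"}))
# -> "The robots clean the room."
```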
Slides (PDF) are available here.
Academic Reference: A Gatt and E Reiter (2009). SimpleNLG: A realisation engine for practical applications. Proceedings of ENLG-2009
Thumbs up?: sentiment classification using machine learning techniques
Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up?: sentiment classification using machine learning techniques." Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, 2002.
Review P3: handed out February 28, due March 6.
The results that we get in many NLP tasks are dependent on the quality and properties of the underlying data. In many (most) cases this is as important as applying the right machine learning techniques. In addition, a lot of the annotation is noisy or simply subjective. In this lecture we discuss the challenges and some of the state-of-the-art solutions.
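A standard way to quantify how noisy or subjective annotations are is Cohen's kappa, which corrects raw inter-annotator agreement for agreement expected by chance. A from-scratch sketch, with invented label sequences:

```python
from collections import Counter

# Cohen's kappa for two annotators, computed from scratch.
# The label sequences below are made-up examples.

a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "pos"]

n = len(a)
observed = sum(1 for x, y in zip(a, b) if x == y) / n  # raw agreement

# chance agreement from each annotator's marginal label distribution
ca, cb = Counter(a), Counter(b)
labels = set(a) | set(b)
expected = sum((ca[l] / n) * (cb[l] / n) for l in labels)

kappa = (observed - expected) / (1 - expected)
print(round(kappa, 3))  # -> 0.583
```

Values near 1 indicate strong agreement, values near 0 agreement no better than chance; low kappa on a task is a warning sign about the "gold" labels themselves.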
Last year's lecture slides (PDF) are available here.
NLP final project report: due March 11 (interviews March 12 and 13).
[Geva, Mor, Yoav Goldberg, and Jonathan Berant. "Are we modeling the task or the annotator? An investigation of annotator bias in natural language understanding datasets." EMNLP-IJCNLP (2019)](https://www.aclweb.org/anthology/D19-1107.pdf)
The theory of distributional semantics builds on the principle that "a word is known by the company it keeps". In this lecture we introduce how to learn abstract word vectors that help us compute the semantic (meaning) distance between different words. We will look at different ways these kinds of vectors can be used, and also discuss some of their limitations.
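The "company it keeps" idea can be made concrete with simple co-occurrence vectors: count which words appear near each word, then compare words by the cosine similarity of their count vectors. The toy corpus below is an assumption; real embeddings such as word2vec are learned from vastly more data and produce dense rather than count vectors.

```python
import math
from collections import Counter, defaultdict

# Distributional word vectors from co-occurrence counts in a small
# window, compared with cosine similarity. The toy corpus is a
# made-up example.

corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat ate fish . the dog ate meat .").split()

WINDOW = 2
vectors = defaultdict(Counter)
for i, word in enumerate(corpus):
    for j in range(max(0, i - WINDOW), min(len(corpus), i + WINDOW + 1)):
        if i != j:
            vectors[word][corpus[j]] += 1   # count each context word

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

print(cosine(vectors["cat"], vectors["dog"]))  # similar contexts -> high
print(cosine(vectors["cat"], vectors["rug"]))  # less similar contexts
```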
Last year's lecture slides (PDF) are available here.
NLP project interviews: March 12 and March 13.
Exploiting 'subjective' annotations
Reidsma, Dennis. "Exploiting 'subjective' annotations." Proceedings of the Workshop on Human Judgements in Computational Linguistics. Association for Computational Linguistics, 2008.
Review P4: handed out March 6, due March 13.