
Applied NLP Lectures

The following sections contain the materials for the eight applied NLP lectures. The applied NLP papers to review each week are listed here as well.

Table of contents

NLP resources

Books

Speech and Language Processing (3rd ed.) by Dan Jurafsky and James H. Martin. Available online.

Natural Language Processing by Jacob Eisenstein. Also available online.

Software

A number of open-source NLP toolkits exist, each with different strengths and weaknesses. Which ones are applicable to your project depends to some extent on your taste in programming languages and on the kind of NLP task you are working on:

Open Source Natural Language Processing Tools

Also in Dutch:

Datasets

Lecture 1: NLP Introduction

Natural language processing (NLP) describes computational methods that allow computers to "understand" human communication. This lecture explains what NLP can do and describes common NLP applications: the kinds of tasks that NLP solves, and the components/sub-tasks that make solving these tasks possible.
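To make "sub-tasks" concrete, here is a minimal sketch of two of the most basic ones, sentence splitting and tokenization, using NLTK (this assumes the punkt tokenizer models have been downloaded; the example text is made up):

```python
# Sentence splitting and word tokenization with NLTK.
# Assumes: nltk.download('punkt') has been run.
from nltk.tokenize import sent_tokenize, word_tokenize

text = "NLP is fun. It is also hard!"
for sentence in sent_tokenize(text):
    print(word_tokenize(sentence))
# ['NLP', 'is', 'fun', '.']
# ['It', 'is', 'also', 'hard', '!']
```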

Slides (PDF) are available here. The slides will be updated (as needed) after the lecture.

Recommended readings

Lecture 2: Syntax

While human language is very flexible, it does follow certain rules, principles, and processes that govern the structure of sentences. We can use that structure to improve machine understanding of human language and to solve many NLP tasks. This lecture therefore focuses on syntax: the structure of sentences.

The lecture slides (PDF) are available here.

**NLP project proposal: due February 21st.**

Recommended readings

Large Language Models in Machine Translation ("Stupid Backoff")

NLP book: Part-of-speech tagging

Stanford parser FAQ

Sentiment analysis

Recommended resources:

The Dutch translation of the Linguistic Inquiry and Word Count (LIWC) 2007 dictionary:

NLTK Dutch PoS:
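As a small illustration of the NLTK Dutch PoS resource above, a minimal sketch that trains a simple unigram tagger on the Dutch Alpino corpus bundled with NLTK (assumes the corpus has been downloaded; the example sentence is made up):

```python
# Dutch PoS tagging with a unigram tagger trained on the Alpino corpus.
# Assumes: nltk.download('alpino') has been run.
from nltk.corpus import alpino
from nltk.tag import UnigramTagger

tagger = UnigramTagger(alpino.tagged_sents())

# A unigram tagger assigns each word its most frequent training tag;
# words unseen in the training data get None.
print(tagger.tag("de kat zit op de mat".split()))
```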

⚠️ Paper P1 to review

Offspring from reproduction problems: What replication failure teaches us

Fokkens, Antske, et al. "Offspring from reproduction problems: What replication failure teaches us." Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2013.

Review P1: handed out February 14, due February 21.

Lecture 3: Semantics

What does it mean, what does it all mean? Unlike controlled languages, natural language is full of ambiguities. Words have multiple meanings, and words are related to each other in different ways. This lecture looks at semantics, or meaning in language.

Lecture slides (PDF) are available here.

Recommended readings

Similarity for news recommender systems

Determining the sentiment of opinions

Mining and summarizing customer reviews.

Recommended resources:

NLTK similarity (see the sketch after this list)

Wordnet::similarity

DutchSemCor
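To give a feel for the NLTK similarity resource above, a minimal sketch of WordNet-based similarity between word senses (assumes the WordNet data has been downloaded):

```python
# WordNet sense similarity with NLTK.
# Assumes: nltk.download('wordnet') has been run.
from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

# Path similarity: shortest-path distance between senses, scaled to (0, 1].
print(dog.path_similarity(cat))
# Wu-Palmer similarity: based on the depth of the least common subsumer.
print(dog.wup_similarity(cat))
```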

Lecture 4: Evaluating NLP

How do we know if our NLP methods are working well? What are the evaluation methods used in NLP, and what are the metrics? How do we interpret these, and when are they suitable? This lecture looks at evaluation, and at assessing the performance of NLP systems.
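As a minimal, concrete example of such metrics, a sketch computing accuracy, precision, recall, and F1 for a toy classification output using scikit-learn (the labels below are made up for illustration):

```python
# Common NLP classification metrics with scikit-learn.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["pos", "neg", "pos", "pos", "neg", "neg"]  # gold labels (made up)
y_pred = ["pos", "pos", "pos", "neg", "neg", "neg"]  # system output (made up)

print(accuracy_score(y_true, y_pred))
# Macro-averaging gives each class equal weight, regardless of frequency.
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
print(p, r, f1)
```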

Lecture slides (PDF) are available here.

Recommended readings

Ehud Reiter on NLG evaluation

Why Most Published Research Findings Are False

More offline evaluation metrics from NIST

The Pyramid method: Nenkova, Ani, Rebecca Passonneau, and Kathleen McKeown. "The pyramid method: Incorporating human content selection variation in summarization evaluation." ACM Transactions on Speech and Language Processing (TSLP) 4.2 (2007): 4.

Recommended resources:

ASIYA: Open toolkit for evaluating text using automatic metrics

⚠️ Paper P2 to review

Best practices for the human evaluation of automatically generated text

Best practices for the human evaluation of automatically generated text. Chris van der Lee, Albert Gatt, Emiel van Miltenburg, Sander Wubben and Emiel Krahmer

Review P2: handed out February 21, due February 28.

Lecture 5: ML for NLP

How do we apply what we know about classifiers and regression to NLP problems? What are common pitfalls and mistakes? What kinds of biases in the data and analysis should we look out for as ethical data scientists? This lecture focuses on applying classical machine learning techniques to natural language processing.

NLP intermediate project report: due March 4.

Last year's lecture slides (PDF) are available here. Slides will be updated after the lecture as needed.

Recommended readings

Recommended resources:

Scikit-learn, machine learning in Python (see the sketch after this list)

NLTK machine learning (more limited ML support)
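A minimal sketch of the classical recipe with scikit-learn: bag-of-words TF-IDF features feeding a linear classifier (the tiny dataset is made up and far too small for real use):

```python
# Text classification with TF-IDF features and logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie", "terrible plot", "loved it", "awful acting"]
labels = ["pos", "neg", "pos", "neg"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["what a great plot"]))
```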

Lecture 6: Natural Language Generation

So far we've only looked at how to improve computational understanding of natural language. However, in conversational systems (such as, but not limited to, chatbots), we might also want a computer to communicate with us. There is an area of research that focuses on going from abstract, often rich and complex, representations to natural language that people can understand. In this lecture we will introduce this area of research, which is called Natural Language Generation.

Slides (PDF) are available here.

Recommended resources

simplenlg realizer

Academic Reference: A Gatt and E Reiter (2009). SimpleNLG: A realisation engine for practical applications. Proceedings of ENLG-2009
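To make the core idea concrete, a toy sketch of surface realization: going from a small abstract representation to a sentence. This is not SimpleNLG's API; real realizers also handle morphology, agreement, and syntax, and the WeatherReport input is a made-up example:

```python
# Toy surface realization: abstract representation -> sentence.
from dataclasses import dataclass

@dataclass
class WeatherReport:  # hypothetical abstract input, for illustration only
    city: str
    temp_c: int
    rain: bool

def realize(report: WeatherReport) -> str:
    ending = "with rain expected" if report.rain else "and it will stay dry"
    return f"In {report.city} it will be {report.temp_c} degrees Celsius, {ending}."

print(realize(WeatherReport("Tilburg", 12, True)))
```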

⚠️ Paper P3 to review

Thumbs up?: sentiment classification using machine learning techniques

Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up?: sentiment classification using machine learning techniques." Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, 2002.

Review P3: handed out February 28, due March 6.

Lecture 7: Bias in NLP

The results that we get in many NLP tasks depend on the quality and properties of the underlying data. In many (if not most) cases this is as important as applying the right machine learning techniques. In addition, a lot of the annotation is noisy or simply subjective. In this lecture we discuss the challenges and some of the state-of-the-art solutions.
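One standard way to quantify how noisy or subjective annotations are is inter-annotator agreement. A minimal sketch using Cohen's kappa from scikit-learn (the two annotators' labels are made up for illustration):

```python
# Inter-annotator agreement with Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["toxic", "ok", "ok", "toxic", "ok"]
annotator_b = ["toxic", "ok", "toxic", "toxic", "ok"]

# Kappa corrects raw agreement for the agreement expected by chance.
print(cohen_kappa_score(annotator_a, annotator_b))
```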

Last year's lecture slides (PDF) are available here.

NLP final project report: due March 11 (interviews March 12 and 13).

Recommended readings

[Geva, Mor, Yoav Goldberg, and Jonathan Berant. "Are we modeling the task or the annotator? An investigation of annotator bias in natural language understanding datasets." EMNLP-IJCNLP (2019)](https://www.aclweb.org/anthology/D19-1107.pdf)

Sun, Tony, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. "Mitigating gender bias in natural language processing: Literature review." ACL (2019).

Lecture 8: Word embeddings

The theory of distributional semantics effectively builds on the principle that "a word is known by the company it keeps". This lecture introduces how to learn abstract word vectors that can help us compute the semantic (meaning) distance between different words. We will look at different ways these kinds of vectors can be used, and also discuss some of their limitations.

Last year's lecture slides (PDF) are available here.

NLP project interviews: March 12 and March 13.

Recommended readings

Recommended resources

GloVe

word2vec
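A minimal word2vec sketch with gensim; the toy corpus is made up and far too small to learn meaningful vectors, but it shows the workflow:

```python
# Training word vectors and measuring semantic distance with gensim.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

# Cosine similarity between the learned vectors for two words.
print(model.wv.similarity("cat", "dog"))
```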

⚠️ Paper P4 to review

Exploiting 'subjective' annotations

Reidsma, Dennis. "Exploiting 'subjective' annotations." Proceedings of the Workshop on Human Judgements in Computational Linguistics. Association for Computational Linguistics, 2008.

Review P4: handed out March 6, due March 13.