The following sections contain the materials for the eight applied NLP lectures. The applied NLP papers to review each week are listed here as well.
- NLP resources
- Lecture 1: NLP Introduction
- Lecture 2: Syntax
- Lecture 3: Semantics
- Lecture 4: NLP Evaluation
- Lecture 5: ML for NLP
- Lecture 6: Natural Language Generation
- Lecture 7: NLP annotations
- Lecture 8: Word embeddings
Speech and Language Processing (3rd ed.) by Dan Jurafsky and James H. Martin. Available online.
Natural Language Processing by Jacob Eisenstein. Also available online.
A number of open-source NLP toolkits exist. They have different strengths and weaknesses. Which ones are applicable to your project depends to some extent on your taste in programming languages and the kind of NLP task you are working on:
Also in Dutch:
- Movie reviews
- Stanford sentiment treebank
- Sentiment on Twitter
- Sentiment Analysis in Twitter
- Various text datasets, UCI
- Stance detection
- Irony Detection in English Tweets
Natural Language Processing describes computational methods that allow computers to "understand" human communication. This lecture explains what NLP can do and describes common NLP applications: the kinds of tasks that NLP solves, and the components and sub-tasks that make solving them possible.
Slides (PDF) are available here. The slides will be updated (as needed) after the lecture.
While human language is very flexible, it does follow certain rules, principles, and processes that govern the structure of sentences. We can use that structure to improve machine understanding of human language, and solve many NLP tasks. Therefore this lecture focuses on syntax, or the structure of sentences.
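As a sketch of how sentence structure can be made computational, the toy recursive-descent parser below builds a parse tree from a hand-written context-free grammar. The grammar, lexicon, and example sentence are invented for illustration and are far smaller than anything used in practice.

```python
# A toy recursive-descent parser for a tiny fragment of English.
# The grammar, lexicon, and sentence are illustrative assumptions,
# not part of the course materials.

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "ball": "N",
    "alice": "Name",
    "chased": "V", "slept": "V",
}

def parse(symbol, tokens, pos):
    """Try to expand `symbol` starting at tokens[pos].
    Returns (tree, next_pos) or None on failure."""
    if symbol in GRAMMAR:                      # non-terminal
        for expansion in GRAMMAR[symbol]:
            children, cur = [], pos
            for child in expansion:
                result = parse(child, tokens, cur)
                if result is None:
                    break
                tree, cur = result
                children.append(tree)
            else:                              # every child matched
                return (symbol, children), cur
        return None
    # terminal: match a word with the right part of speech
    if pos < len(tokens) and LEXICON.get(tokens[pos]) == symbol:
        return (symbol, tokens[pos]), pos + 1
    return None

tree, end = parse("S", "the dog chased a ball".split(), 0)
print(tree)
```

Real parsers (e.g. the Stanford parser discussed later) use chart algorithms and learned grammars rather than backtracking over hand-written rules, but the notion of recursively matching phrase structure is the same.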
The lecture slides (PDF) are available here.
**NLP project proposal: due February 21st.**
Large Language Models in Machine Translation ("Stupid Backoff")
NLP book - Part-of-speech tagging:
Stanford parser FAQ
Sentiment analysis:
The Dutch translation of the Linguistic Inquiry and Word Count (LIWC) 2007 dictionary:
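The "Stupid Backoff" scheme from the reading above can be sketched in a few lines: score an n-gram by its relative frequency if it was observed, otherwise back off to the shorter context with a fixed multiplier (0.4 in the paper). The toy corpus below is an assumption for illustration; the scores are deliberately not normalized probabilities.

```python
from collections import Counter

# Minimal Stupid Backoff scorer (Brants et al. 2007). Scores are
# relative frequencies, backing off with a fixed factor alpha = 0.4.
# The toy corpus is a made-up example.

corpus = "the cat sat on the mat the cat ate".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = len(corpus)
ALPHA = 0.4

def score(word, prev=None):
    """S(word | prev) under Stupid Backoff with bigram context."""
    if prev is not None and bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    # back off to the unigram relative frequency
    return ALPHA * unigrams[word] / total

print(score("cat", prev="the"))   # bigram "the cat" seen twice out of 3 "the"
print(score("sat", prev="cat"))   # "cat sat" seen once out of 2 "cat"
print(score("mat", prev="cat"))   # unseen bigram -> 0.4 * count("mat") / 9
```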
Offspring from reproduction problems: What replication failure teaches us
Fokkens, Antske, et al. "Offspring from reproduction problems: What replication failure teaches us." Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2013.
Review P1: handed out February 14, due February 21.
What does it mean, what does it all mean? Unlike controlled languages, natural language is full of ambiguities. Words have multiple meanings, and words are related to each other in different ways. This lecture looks at semantics, or meaning in language.
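One classic way to resolve lexical ambiguity is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the sentence the ambiguous word appears in. The two-sense mini dictionary for "bank" below is invented for illustration.

```python
# A simplified Lesk word-sense disambiguation sketch: pick the sense
# whose gloss overlaps most with the sentence context.
# The two-sense mini dictionary for "bank" is a made-up example.

SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river":   "the sloping land alongside a river or stream",
}

def disambiguate(word_senses, context_sentence):
    context = set(context_sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in word_senses.items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate(SENSES, "she sat on the bank of the river fishing"))
# -> bank/river ("the" and "river" overlap with that gloss)
```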
Lecture slides (PDF) are available here.
Similarity for news recommender systems
Determining the sentiment of opinions
Mining and summarizing customer reviews.
How do we know if our NLP methods are working well? What evaluation methods are used in NLP, and what are the metrics? How do we interpret them, and when are they suitable? This lecture looks at evaluation, and assessing the performance of NLP systems.
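As a concrete example of the metrics the lecture covers, the snippet below computes precision, recall, and F1 from scratch for a binary classification task. The gold labels and predictions are invented for illustration.

```python
# Precision, recall, and F1 computed from scratch for a binary task.
# The gold labels and system predictions below are made-up examples.

gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)  # true positives
fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)  # false positives
fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)            # of predicted positives, how many are right
recall = tp / (tp + fn)               # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# -> precision=0.75 recall=0.75 f1=0.75
```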
Lecture slides (PDF) are available here.
Why Most Published Research Findings Are False
More offline evaluation metrics from NIST
ASIYA: Open toolkit for evaluating text using automatic metrics
Best practices for the human evaluation of automatically generated text
Best practices for the human evaluation of automatically generated text. Chris van der Lee, Albert Gatt, Emiel van Miltenburg, Sander Wubben and Emiel Krahmer
Review P2: handed out February 21, due February 28.
How do we apply what we know about classifiers and regression to NLP problems? What are common pitfalls and mistakes? What kinds of biases in the data and analysis should we look out for as ethical data scientists? This lecture focuses on applying classical machine learning techniques to natural language processing.
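A common first step when applying classical classifiers to text is turning each document into a bag-of-words count vector over a shared vocabulary. A minimal sketch, with invented example texts:

```python
from collections import Counter

# A bag-of-words feature extractor: each document becomes a vector of
# word counts over a shared, sorted vocabulary. Texts are made-up examples.

docs = ["great movie great acting", "terrible plot terrible pacing"]
vocab = sorted({w for d in docs for w in d.split()})

def bow_vector(text):
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

for d in docs:
    print(bow_vector(d))
```

Libraries like scikit-learn (see the resources below) provide the same idea as `CountVectorizer`, with tokenization, n-grams, and sparse storage handled for you.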
NLP intermediate project report: due March 4.
Last year's lecture slides (PDF) are available here. Slides will be updated after the lecture as needed.
- Jurafsky, Dan. Speech & Language Processing
  - Chapter 6: Naive Bayes
  - Chapter 7: Logistic regression
- Bing Liu. Sentiment Analysis and Opinion Mining
  - Chapter 10 (fake reviews)
- Scikit-learn, machine learning in Python
- NLTK machine learning (more limited for ML)
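As a pocket-sized illustration of the Naive Bayes reading (chapter 6), the sketch below trains a multinomial Naive Bayes sentiment classifier with add-one (Laplace) smoothing on four toy sentences. The training data and the decision to ignore unknown words at test time are simplifying assumptions.

```python
import math
from collections import Counter, defaultdict

# Multinomial Naive Bayes with add-one smoothing, in the spirit of
# Jurafsky & Martin chapter 6. The training sentences are toy examples.

train = [
    ("pos", "great fun great acting"),
    ("pos", "loved the fun plot"),
    ("neg", "boring plot terrible acting"),
    ("neg", "terrible boring mess"),
]

word_counts = defaultdict(Counter)   # per-class word frequencies
class_counts = Counter()             # documents per class
for label, text in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def predict(text):
    scores = {}
    for label in class_counts:
        # log prior P(class)
        score = math.log(class_counts[label] / sum(class_counts.values()))
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            if w in vocab:           # ignore out-of-vocabulary words
                score += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("fun acting"))       # -> pos
print(predict("boring terrible"))  # -> neg
```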
So far we have only looked at how to improve computational understanding of natural language. However, in conversational systems (like, but not limited to, chatbots), we also might want a computer to communicate with us. There is an area of research that focuses on going from abstract, often rich and complex, representations to natural language that people can understand. In this lecture we introduce this area of research, which is called Natural Language Generation.
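To make "going from a representation to text" concrete, the toy surface realizer below maps a small meaning frame to an English sentence with a crude subject-verb agreement rule. It only gestures at what an engine like SimpleNLG does; the frame format and the plural heuristic are deliberate simplifications.

```python
# A toy surface realizer: turn a small meaning representation into an
# English sentence with a naive agreement rule. The frame format and
# the "ends in -s means plural" heuristic are deliberate simplifications.

def realize(frame):
    subj, verb, obj = frame["subject"], frame["verb"], frame.get("object")
    # naive 3rd-person-singular agreement: add -s unless subject looks plural
    if not subj.endswith("s"):
        verb = verb + "s"
    parts = ["The", subj, verb]
    if obj:
        parts += ["the", obj]
    return " ".join(parts) + "."

print(realize({"subject": "robot", "verb": "clean", "object": "room"}))
# -> "The robot cleans the room."
print(realize({"subject": "robots", "verb": "clean", "object": "room"}))
# -> "The robots clean the room."
```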
Slides (PDF) are available here.
Academic Reference: A Gatt and E Reiter (2009). SimpleNLG: A realisation engine for practical applications. Proceedings of ENLG-2009
Thumbs up?: sentiment classification using machine learning techniques
Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up?: sentiment classification using machine learning techniques." Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, 2002.
Review P3: handed out February 28, due March 6.
The results that we get in many NLP tasks are dependent on the quality and properties of the underlying data. In many (most) cases this is as important as applying the right machine learning techniques. In addition, a lot of the annotation is noisy or simply subjective. In this lecture we discuss the challenges and some of the state-of-the-art solutions.
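A standard way to quantify how noisy or subjective annotations are is Cohen's kappa, which corrects raw inter-annotator agreement for agreement expected by chance. A from-scratch sketch, with invented label sequences:

```python
from collections import Counter

# Cohen's kappa for two annotators, computed from scratch.
# The label sequences below are made-up examples.

a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "pos"]

n = len(a)
observed = sum(1 for x, y in zip(a, b) if x == y) / n  # raw agreement

# chance agreement from each annotator's marginal label distribution
ca, cb = Counter(a), Counter(b)
labels = set(a) | set(b)
expected = sum((ca[l] / n) * (cb[l] / n) for l in labels)

kappa = (observed - expected) / (1 - expected)
print(round(kappa, 3))  # -> 0.583
```

Values near 1 indicate strong agreement, values near 0 agreement no better than chance; low kappa on a task is a warning sign about the "gold" labels themselves.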
Last year's lecture slides (PDF) are available here.
NLP final project report: due March 11 (interviews March 12 and 13).
[Geva, Mor, Yoav Goldberg, and Jonathan Berant. "Are we modeling the task or the annotator? An investigation of annotator bias in natural language understanding datasets." EMNLP-IJCNLP (2019)](https://www.aclweb.org/anthology/D19-1107.pdf)
The theory of distributional semantics builds on the principle that "a word is known by the company it keeps". In this lecture we introduce how to learn abstract word vectors that help us compute the semantic (meaning) distance between different words. We will look at different ways these kinds of vectors can be used, and also discuss some of their limitations.
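The "company it keeps" idea can be made concrete with simple co-occurrence vectors: count which words appear near each word, then compare words by the cosine similarity of their count vectors. The toy corpus below is an assumption; real embeddings such as word2vec are learned from vastly more data and produce dense rather than count vectors.

```python
import math
from collections import Counter, defaultdict

# Distributional word vectors from co-occurrence counts in a small
# window, compared with cosine similarity. The toy corpus is a
# made-up example.

corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat ate fish . the dog ate meat .").split()

WINDOW = 2
vectors = defaultdict(Counter)
for i, word in enumerate(corpus):
    for j in range(max(0, i - WINDOW), min(len(corpus), i + WINDOW + 1)):
        if i != j:
            vectors[word][corpus[j]] += 1   # count each context word

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

print(cosine(vectors["cat"], vectors["dog"]))  # similar contexts -> high
print(cosine(vectors["cat"], vectors["rug"]))  # less similar contexts
```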
Last year's lecture slides (PDF) are available here.
NLP project interviews: March 12 and March 13.
Exploiting 'subjective' annotations
Reidsma, Dennis. "Exploiting 'subjective' annotations." Proceedings of the Workshop on Human Judgements in Computational Linguistics. Association for Computational Linguistics, 2008.
Review P4: handed out March 6, due March 13.