Course notes and code: formula derivations, course knowledge summaries, homework, and model implementation code.
Lesson Link: CS224N 2022 Schedule 👈 You can get the slides from here.
Lesson Video: Bilibili | YouTube 💡
- Word Vectors
- Efficient Estimation of Word Representations in Vector Space (original word2vec paper)
- Distributed Representations of Words and Phrases and their Compositionality (negative sampling paper)
- Word Vectors 2 and Word Window Classification
- GloVe: Global Vectors for Word Representation (original GloVe paper)
- Improving Distributional Similarity with Lessons Learned from Word Embeddings
- Evaluation methods for unsupervised word embeddings
- A Latent Variable Model Approach to PMI-based Word Embeddings
- Linear Algebraic Structure of Word Senses, with Applications to Polysemy
- On the Dimensionality of Word Embedding
- Appendix: VSM, LSA, PMI, N-gram, NNLM, RNNLM, SVD
- Backprop and Neural Networks
- Learning Representations by Backpropagating Errors (seminal Rumelhart et al. backpropagation paper)
- Natural Language Processing (Almost) from Scratch
- Dependency Parsing
- Incrementality in Deterministic Dependency Parsing
- A Fast and Accurate Dependency Parser using Neural Networks
- Dependency Parsing (a book that must be purchased)
- Globally Normalized Transition-Based Neural Networks
- Universal Stanford Dependencies: A cross-linguistic typology
- UD Standard: Universal Dependencies
- Appendix: t-SNE, Understanding Stanford Universal Dependencies, Beam Search, Evaluation for All NLP Tasks
- Recurrent Neural Networks and Language Models
- N-gram Language Models (textbook chapter)
- Sequence Modeling: Recurrent and Recursive Neural Nets (Sections 10.1 and 10.2)
- Vanishing Gradients, Fancy RNNs, Seq2Seq
- Learning long-term dependencies with gradient descent is difficult (one of the original vanishing gradient papers)
- On the difficulty of training Recurrent Neural Networks (proof of vanishing gradient problem)
- Vanishing Gradient Solutions: Multi-Level Hierarchy, Long Short-Term Memory (LSTM) / Residual Neural Networks, Rectified Linear Unit (ReLU) Activation Function (see the sketch below)
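
A minimal PyTorch sketch (PyTorch is an assumption here, not part of the course code) of two of the listed mitigations: a residual (skip) connection and ReLU activations. The class and variable names are illustrative only.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.act = nn.ReLU()  # ReLU has gradient 1 for positive inputs, so it does not shrink gradients

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The identity path (x + ...) lets gradients flow around the nonlinearity,
        # the core idea behind residual networks (and, analogously, the additive
        # cell-state update in an LSTM).
        return x + self.act(self.linear(x))

x = torch.randn(8, 32, requires_grad=True)
blocks = nn.Sequential(*[ResidualBlock(32) for _ in range(20)])
blocks(x).sum().backward()
print(x.grad.norm())  # gradient still reaches the input after 20 layers
```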
- Machine Translation, Attention, Subword Models
- BLEU (original paper)
- Sequence to Sequence Learning with Neural Networks (original seq2seq NMT paper)
- Sequence Transduction with Recurrent Neural Networks (early seq2seq speech recognition paper)
- Neural Machine Translation by Jointly Learning to Align and Translate (original seq2seq+attention paper)
- Massive Exploration of Neural Machine Translation Architectures (practical advice for hyperparameter choices)
- Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
- Revisiting Character-Based Neural Machine Translation with Capacity and Compression
- Final Projects: Custom and Default; Practical Tips
- Practical Methodology (Deep Learning book chapter)
- Additional Topics:
- Model Pruning (Movement Pruning: Adaptive Sparsity by Fine-Tuning)
- Model Quantization (Training with Quantization Noise for Extreme Model Compression)
- BabyAI (BabyAI 1.1)
- gSCAN (A Benchmark for Systematic Generalization in Grounded Language Understanding)
- Appendix: NeurIPS 2020 Pruning
- Transformers
- More about Transformers and Pretraining
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Contextual Word Representations: A Contextual Introduction
- Appendix:
- GLUE: Explanation, SQuAD v1.1–v2, Situations With Adversarial Generations (SWAG)
- Pretrain and Finetune: pre-training refers to the pre-trained model itself or the process of pre-training a model; fine-tuning refers to applying the pre-trained model to one's own dataset and adapting its parameters to that dataset (see the sketch below).
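
A minimal, hypothetical sketch of the pretrain-then-finetune workflow using the Hugging Face transformers library (an assumption, not this repo's code); the model name, toy inputs, and labels are placeholders.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Pre-training: done already by others; we just load the pre-trained weights.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Fine-tuning: run the pre-trained model on your own labelled data and update its parameters.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
batch = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])  # placeholder labels for a binary task

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss on the new task
outputs.loss.backward()
optimizer.step()
```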
- Question Answering
- SQuAD: 100,000+ Questions for Machine Comprehension of Text
- Bidirectional Attention Flow for Machine Comprehension
- Reading Wikipedia to Answer Open-Domain Questions
- Latent Retrieval for Weakly Supervised Open Domain Question Answering
- Dense Passage Retrieval for Open-Domain Question Answering
- Learning Dense Representations of Phrases at Scale
- Appendix: TF-IDF, BM25 (see the sketch below)
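
A self-contained sketch of BM25 scoring (the standard refinement of TF-IDF used by sparse retrievers), assuming whitespace-tokenized documents and the common constants k1=1.5, b=0.75; the toy corpus is made up for illustration.

```python
import math
from collections import Counter

docs = [["deep", "learning", "for", "nlp"],
        ["question", "answering", "with", "deep", "retrieval"],
        ["sparse", "retrieval", "with", "bm25"]]

N = len(docs)
avgdl = sum(len(d) for d in docs) / N          # average document length
df = Counter(term for d in docs for term in set(d))  # document frequency per term

def idf(term):
    # BM25-style idf with the usual +0.5 smoothing
    return math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)

def bm25(query, doc, k1=1.5, b=0.75):
    tf = Counter(doc)
    score = 0.0
    for term in query:
        if term in tf:
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf(term) * num / den
    return score

for d in docs:
    print(d, bm25(["deep", "retrieval"], d))
```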
- Natural Language Generation
- Reference in Language and Coreference Resolution
- T5 and large language models: The good, the bad, and the ugly
- Integrating knowledge in language models
- ERNIE: Enhanced Language Representation with Informative Entities
- Barack’s Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling
- Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model
- Language Models as Knowledge Bases?
- Appendix:
- Baselines: DrQA, R3, DSQA, Evidence Aggregation, BERTserini, ORQA
- Social & Ethical Considerations in NLP Systems
- Model Analysis and Explanation
- Future of NLP + Deep Learning