This repository contains the assignments from the Natural Language Processing course at the University of Tehran. The assignments cover a range of NLP concepts, from tokenization to knowledge-based QA systems. Below is an overview of each assignment.
- **Assignment 1**
  - Topics: Tokenization, Custom Tokenizers, BERT, GPT, N-Gram Language Models.
  - Key Tasks:
    - Implement a custom tokenizer using regular expressions.
    - Compare tokenization methods in BERT and GPT.
    - Build N-gram models for text completion (see the sketch after this list).
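A minimal sketch of how this assignment's pieces might fit together: a regular-expression tokenizer feeding a bigram model that proposes the next token. The regex pattern, the toy corpus, and the function names are illustrative assumptions, not the assignment's actual code or data.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Split lowercase text into word tokens and standalone punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def train_bigram(sentences):
    # Count how often each token follows another, with sentence boundary markers.
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + tokenize(sent) + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts

def complete(counts, prev_token):
    # Propose the most frequent continuation of prev_token.
    if not counts[prev_token]:
        return "</s>"
    return counts[prev_token].most_common(1)[0][0]

corpus = ["natural language processing is fun", "language models complete text"]
model = train_bigram(corpus)
print(tokenize("NLP, it's fun!"))   # ['nlp', ',', 'it', "'", 's', 'fun', '!']
print(complete(model, "language"))  # most frequent continuation of "language"
```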
- **Assignment 2**
  - Topics: Sentiment Analysis, Sarcasm Detection, Word Embeddings (GloVe), Logistic Regression.
  - Key Tasks:
    - Perform sentiment analysis using Naive Bayes.
    - Detect sarcasm using Logistic Regression and GloVe embeddings (see the sketch after this list).
    - Explore word similarities with skip-gram.
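As a rough illustration of the sarcasm-detection pipeline, the sketch below averages pre-trained GloVe vectors into sentence features and fits a scikit-learn logistic regression. The GloVe file path, the embedding dimension, and the two toy examples are assumptions for demonstration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def load_glove(path):
    # Parse a GloVe text file into {word: vector}.
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def embed(sentence, vectors, dim=100):
    # Average the vectors of known words; all-zeros if no word is known.
    words = [vectors[w] for w in sentence.lower().split() if w in vectors]
    return np.mean(words, axis=0) if words else np.zeros(dim, dtype=np.float32)

glove = load_glove("glove.6B.100d.txt")    # assumed local copy of GloVe vectors
texts = ["oh great , another monday", "the weather is lovely today"]
labels = [1, 0]                            # toy labels: 1 = sarcastic
X = np.stack([embed(t, glove) for t in texts])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```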
- **Assignment 3**
  - Topics: Semantic Role Labeling (SRL), LSTM and GRU Encoders, Encoder-Decoder Models.
  - Key Tasks:
    - Perform semantic role labeling on input sentences.
    - Implement LSTM- and GRU-based encoders for SRL (see the sketch after this list).
    - Convert SRL tasks into question-answer pairs.
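One common way to realize the recurrent encoders for SRL is a bidirectional tagger that predicts a role label for every token. The PyTorch sketch below uses placeholder vocabulary and label sizes; the actual assignment may use different dimensions, features, or an encoder-decoder formulation.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Token-level role tagger: embeddings -> BiLSTM -> per-token role logits."""

    def __init__(self, vocab_size=5000, emb_dim=100, hidden=128, num_roles=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_roles)   # one role label per token

    def forward(self, token_ids):
        x = self.embed(token_ids)      # (batch, seq_len, emb_dim)
        h, _ = self.lstm(x)            # (batch, seq_len, 2 * hidden)
        return self.out(h)             # (batch, seq_len, num_roles)

model = BiLSTMTagger()
dummy_batch = torch.randint(0, 5000, (2, 12))   # 2 sentences, 12 tokens each
print(model(dummy_batch).shape)                  # torch.Size([2, 12, 20])
```

Swapping `nn.LSTM` for `nn.GRU` gives the GRU variant with an identical interface, which makes the comparison between the two encoders straightforward.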
- **Assignment 4**
  - Topics: Fine-Tuning, LoRA, QLoRA, In-Context Learning (ICL).
  - Key Tasks:
    - Fine-tune large models such as RoBERTa and LLaMA (see the LoRA sketch after this list).
    - Use zero-shot and one-shot learning.
    - Analyze model performance with LoRA and P-Tuning.
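For the parameter-efficient fine-tuning part, a typical pattern is to wrap a pre-trained classifier with LoRA adapters via the Hugging Face `peft` library. The rank, dropout, and `target_modules` below are illustrative choices, not the assignment's exact configuration.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2
)
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                               # low-rank adapter dimension
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"], # attention projections to adapt
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()      # only the small adapter matrices train
```

QLoRA follows the same recipe but loads the base model in 4-bit quantization before attaching the adapters, which is what makes fine-tuning models like LLaMA feasible on a single GPU.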
- **Assignment 5**
  - Topics: Machine Translation, BPE, LSTM, Transformer Models.
  - Key Tasks:
    - Build an English-to-Farsi translation system using Fairseq.
    - Train LSTM and Transformer models with BPE tokenization.
    - Evaluate outputs using BLEU and COMET scores (see the BLEU sketch after this list).
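BLEU evaluation of the translation outputs can be scripted with `sacrebleu`, as in the sketch below; the hypothesis and reference sentences are made-up stand-ins for real system output and test data, and COMET scoring (a separate learned metric) is omitted here.

```python
import sacrebleu

# Hypotheses: one system translation per test sentence (toy examples).
hypotheses = ["the cat sat on the mat", "he reads a book every night"]
# References: one list per reference set, aligned with the hypotheses.
references = [["the cat is sitting on the mat", "he reads one book each night"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```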
- **Assignment 6**
  - Topics: Knowledge-Based QA, LangChain, Chain-of-Thought Reasoning.
  - Key Tasks:
    - Build a multi-step QA system using LangChain (see the sketch after this list).
    - Implement relevancy checks and context-based answers.
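A minimal sketch of the multi-step QA flow: first ask the model whether the retrieved context is relevant, then answer only from that context. It uses LangChain's prompt/chat-model piping; the model name, prompts, and helper function are assumptions rather than the assignment's actual chain.

```python
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)   # assumed model choice

relevancy_prompt = PromptTemplate.from_template(
    "Context:\n{context}\n\nQuestion: {question}\n"
    "Is the context relevant to the question? Answer yes or no."
)
answer_prompt = PromptTemplate.from_template(
    "Answer the question using only the context below.\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

def answer(question, context):
    # Step 1: relevancy check; step 2: context-grounded answer.
    verdict = (relevancy_prompt | llm).invoke(
        {"context": context, "question": question}
    ).content
    if "yes" not in verdict.lower():
        return "The retrieved context does not answer this question."
    return (answer_prompt | llm).invoke(
        {"context": context, "question": question}
    ).content
```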
Each assignment is organized in its own folder containing code, data, and reports. Implementations are in Python, using libraries such as PyTorch, Fairseq, and Hugging Face Transformers.