This repository contains two homework notebooks from the NLP Practice course on LLMs by BigData Team. Each notebook demonstrates an application of Large Language Models (LLMs) to a different NLP task: text classification, and question answering using retrieval-augmented generation (RAG).
Task: Text Classification using Transformers
- Implements an end-to-end NLP workflow using `distilbert-base-uncased` by default for text classification tasks
- Features custom dataset handling with tokenization and batching using PyTorch
- Includes a comprehensive `ModelTrainer` class for loading datasets, training, validation, metrics calculation (`f1_score`, `precision`, `recall`), and model saving (see the sketch after this list)
- Uses `wandb` for logging and experiment tracking
- Offers multi-GPU support with data parallelism
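A minimal sketch of that workflow, assuming a small list of text/label pairs. The `TextDataset` class, the toy data, and all hyperparameters here are illustrative placeholders, not the notebook's actual `ModelTrainer` API; only the model name and the metrics come from the description above.

```python
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import f1_score, precision_score, recall_score

class TextDataset(Dataset):
    """Tokenizes all texts up front and serves (input, label) tensors."""
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_length, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.enc.items()}
        item["labels"] = self.labels[idx]
        return item

train_texts = ["loved this film", "great acting", "dull plot", "waste of time"]
train_labels = [1, 1, 0, 0]  # toy binary sentiment labels

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)
device = "cuda" if torch.cuda.is_available() else "cpu"
if torch.cuda.device_count() > 1:       # data parallelism across GPUs
    model = torch.nn.DataParallel(model)
model.to(device)

loader = DataLoader(TextDataset(train_texts, train_labels, tokenizer),
                    batch_size=8, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for batch in loader:                    # one epoch for brevity
    batch = {k: v.to(device) for k, v in batch.items()}
    loss = model(**batch).loss.mean()   # .mean() also covers DataParallel
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    # wandb.log({"train/loss": loss.item()})  # tracking hook, as in the notebook

# Metrics on the training set, for illustration only
model.eval()
preds, gold = [], []
with torch.no_grad():
    for batch in loader:
        labels = batch.pop("labels")
        logits = model(**{k: v.to(device) for k, v in batch.items()}).logits
        preds.extend(logits.argmax(-1).cpu().tolist())
        gold.extend(labels.tolist())
print("f1:", f1_score(gold, preds),
      "precision:", precision_score(gold, preds),
      "recall:", recall_score(gold, preds))
```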
This notebook serves as a strong baseline for fine-tuning transformer models for classification tasks and can be easily adapted for other datasets or models.
Task: Question Answering using Retrieval-Augmented Generation (RAG)
- Demonstrates question answering with a Large Language Model (LLM), both with and without retrieval-augmented generation (RAG)
- Implements both plain LLM chain responses and RAG-based methods using the `google/flan-t5-large` model
- Uses FAISS vector-based retrieval over supporting documents, with embeddings generated by `sentence-transformers/all-MiniLM-L6-v2`
- Compares several configurations: plain LLM responses, RAG with source tracking (`RetrievalQA`), and RAG with detailed source chains (`RetrievalQAWithSourcesChain`); see the sketch after this list
- Uses external data (`data/cats_content.txt`) to support enhanced question-answering performance
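A minimal sketch of the three configurations, using the classic LangChain API. The chunking parameters, `chain_type="stuff"`, and the example question are assumptions; the model names, the data path, and the chain classes come from the description above.

```python
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA, RetrievalQAWithSourcesChain

# Load and chunk the supporting document, then index it with FAISS
docs = TextLoader("data/cats_content.txt").load()
chunks = CharacterTextSplitter(chunk_size=500,
                               chunk_overlap=50).split_documents(docs)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2")
index = FAISS.from_documents(chunks, embeddings)

# Wrap flan-t5-large as a LangChain LLM
llm = HuggingFacePipeline.from_model_id(
    model_id="google/flan-t5-large", task="text2text-generation")

question = "What do cats eat?"

# 1. Plain LLM response, no retrieval
print(llm(question))

# 2. RAG with source tracking
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff",
    retriever=index.as_retriever(), return_source_documents=True)
print(qa({"query": question}))

# 3. RAG with detailed source chains
qa_sources = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm, chain_type="stuff", retriever=index.as_retriever())
print(qa_sources({"question": question}))
```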
This notebook provides a comprehensive exploration of how RAG can be used to improve the accuracy and reliability of LLM-based question answering.
The templates and resources were taken from the original course repository: big-data-team/nlp-course
Certificate of course completion with honors: Daniil Bogdanov