Part-of-speech (POS) tagging is the process of assigning a part-of-speech tag (noun, verb, adjective, etc.) to each word in an input text. In other words, the objective is to identify which grammatical category each word in a given text belongs to. POS tagging is difficult because some words can represent more than one part of speech depending on context, i.e. they are ambiguous. Consider the following examples:
- The whole team played well. (adverb)
- You are doing well for yourself. (adjective)
- Well, this is a lot of work. (interjection)
- The well is dry. (noun)
- Tears were beginning to well in her eyes. (verb)
In all of these sentences, the same word, "well", takes on a different part of speech. Hence, we use a Hidden Markov Model, which is a probabilistic model, together with the Viterbi algorithm to assign part-of-speech tags.
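As a rough illustration of how Viterbi decoding picks the most probable tag sequence under a Hidden Markov Model, here is a minimal sketch. The `Hmm` struct, the `viterbi` function, and the use of integer indices for tags and words are assumptions made for this example; the code in components/viterbi.cpp may be organised differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical toy HMM for illustration (not the repository's data structures).
// Tags are indexed 0..T-1 and words 0..V-1.
struct Hmm {
    std::vector<double> start;                    // start[t]: P(first tag is t)
    std::vector<std::vector<double>> transition;  // transition[p][c]: P(tag c | previous tag p)
    std::vector<std::vector<double>> emission;    // emission[t][w]: P(word w | tag t)
};

// Returns the most probable tag index for each word index in `words`.
std::vector<int> viterbi(const Hmm& hmm, const std::vector<int>& words) {
    const std::size_t numTags = hmm.start.size();
    const std::size_t n = words.size();
    if (n == 0 || numTags == 0) return {};
    assert(hmm.transition.size() == numTags && hmm.emission.size() == numTags);

    const double negInf = -std::numeric_limits<double>::infinity();
    // best[i][t]: log-probability of the best path that ends in tag t at word i.
    std::vector<std::vector<double>> best(n, std::vector<double>(numTags, negInf));
    // back[i][t]: previous tag on that best path.
    std::vector<std::vector<int>> back(n, std::vector<int>(numTags, 0));

    // Initialisation: start probability times emission of the first word.
    for (std::size_t t = 0; t < numTags; ++t)
        best[0][t] = std::log(hmm.start[t]) + std::log(hmm.emission[t][words[0]]);

    // Forward pass: extend the best path into every tag, one word at a time.
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t cur = 0; cur < numTags; ++cur) {
            for (std::size_t prev = 0; prev < numTags; ++prev) {
                const double score = best[i - 1][prev] + std::log(hmm.transition[prev][cur]);
                if (score > best[i][cur]) {
                    best[i][cur] = score;
                    back[i][cur] = static_cast<int>(prev);
                }
            }
            best[i][cur] += std::log(hmm.emission[cur][words[i]]);
        }
    }

    // Backward pass: start from the best final tag and follow the back-pointers.
    int last = 0;
    for (std::size_t t = 1; t < numTags; ++t)
        if (best[n - 1][t] > best[n - 1][last]) last = static_cast<int>(t);
    std::vector<int> tags(n);
    tags[n - 1] = last;
    for (std::size_t i = n - 1; i > 0; --i)
        tags[i - 1] = back[i][tags[i]];
    return tags;
}
```

Working in log space avoids numerical underflow on long sentences. The sketch assumes every word index is known to the model; handling unknown words and smoothing the probabilities are left out.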
Machine Learning, Natural Language Processing, Dynamic Programming
The accuracy of the POS tagging model using the Viterbi algorithm is 0.9531. The accuracy is determined by comparing the model's predictions with the true labels in /data/test.pos.
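The accuracy figure is the fraction of words whose predicted tag matches the true tag. A minimal sketch of that comparison is shown below, assuming both tag sequences are already loaded into vectors of strings; the function name `tagAccuracy` is hypothetical, and reading /data/test.pos is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: fraction of positions where the predicted tag
// equals the true tag. Both sequences must have the same length.
double tagAccuracy(const std::vector<std::string>& predicted,
                   const std::vector<std::string>& truth) {
    assert(predicted.size() == truth.size());
    if (truth.empty()) return 0.0;  // nothing to score

    std::size_t correct = 0;
    for (std::size_t i = 0; i < truth.size(); ++i) {
        if (predicted[i] == truth[i]) ++correct;
    }
    return static_cast<double>(correct) / static_cast<double>(truth.size());
}
```

For example, if three out of four predicted tags match the true tags, the function returns 0.75.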
Click here to get a detailed description of all part-of-speech tags.
I have one apple and three oranges
Who is the president of USA?
India is my country of residence
For documentation, click here or refer to /documentation/README.md
👨💻POS-Tagging
┣ 📂assets // Contains all the reference gifs, images
┣ 📂components // Header and source files
┃ ┣ 📄data.cpp
┃ ┣ 📄data.hpp
┃ ┣ 📄tokenize.cpp
┃ ┣ 📄tokenize.hpp
┃ ┣ 📄viterbi.cpp
┃ ┣ 📄viterbi.hpp
┃ ┣ 📄results.cpp
┃ ┣ 📄results.hpp
┣ 📂data // Dataset
┃ ┣ 📄dataset.pos
┃ ┣ 📄sample.pos
┃ ┣ 📄test.pos
┣ 📂documentation // Notes & Documentation for project
┃ ┣ 📄notes.pdf
┃ ┣ 📄README.md
┣ 📂Miscellaneous // .ipynb implementation
┃ ┣ 📄POS-Tagging-C2_W2_Assignment
┣ 📄main.cpp
┣ 📄README.md
To download and use this code, the minimum requirements are:
- g++: the GNU C++ compiler, available as part of the GNU Compiler Collection (GCC), or any other C++ compiler
- Windows 7 or later (64-bit), or any modern Linux distribution (e.g., Ubuntu, Debian, Fedora, Arch Linux)
- Microsoft VS Code or any other IDE
Clone the project by typing the following command in your terminal or command prompt:
git clone https://github.com/PritK99/POS-Tagging.git
Navigate to the POS-Tagging folder:
cd POS-Tagging
Once the requirements are satisfied, you can easily build and run the project on your machine. Use the following commands to
- Build the code:
g++ ./main.cpp ./components/data.cpp ./components/tokenize.cpp ./components/viterbi.cpp ./components/results.cpp
- Run the executable
./a.out (For Linux)
or
./a (For Windows)
- Natural Language Processing with Probabilistic Models by DeepLearning.AI
- YouTube video by Serrano.Academy explaining Hidden Markov Model and Viterbi Algorithm