This project aims to develop a system capable of detecting regional languages or dialects from text. We will collect a dataset containing text in various regional languages or dialects and train machine learning models to recognize and classify them.

bimarakajati/Nusacular

Nusacular (Nusantara Vernacular)

This project aims to develop a system capable of detecting regional languages or dialects from text. We will collect a dataset containing text in various regional languages or dialects and train machine learning models to recognize and classify them. The outcome will be useful for text processing applications that require an understanding of regional languages and for the preservation of local culture.

📷 Features

Regional Language Detection:

Built with a Naive Bayes classifier on TF-IDF features. The model is trained on the NusaX dataset, which contains 10,000 sentences in 10 regional languages, and classifies input text into one of those 10 languages.

[Screenshot: Web Interface]
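The combination of TF-IDF features and a Naive Bayes classifier can be sketched as follows. This is a minimal from-scratch illustration using only the Python standard library, not the project's actual code: the character n-gram features, the class name `TfidfNaiveBayes`, and the toy training sentences are all assumptions made for the example, and a real model would be trained on the full NusaX data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Split text into overlapping character n-grams (assumed feature choice)."""
    text = f" {text.lower().strip()} "
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF-weighted character n-grams."""

    def fit(self, texts, labels):
        docs = [Counter(char_ngrams(t)) for t in texts]
        n_docs = len(docs)
        # Smoothed inverse document frequency: rarer n-grams weigh more.
        doc_freq = Counter(g for d in docs for g in d)
        self.idf = {g: math.log((1 + n_docs) / (1 + df)) + 1
                    for g, df in doc_freq.items()}
        # Accumulate TF-IDF mass per (language, n-gram).
        weights = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            for g, tf in doc.items():
                weights[label][g] += tf * self.idf[g]
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(k / n_docs)
                          for c, k in class_counts.items()}
        # Laplace-smoothed log-likelihood of each n-gram given the language.
        vocab_size = len(self.idf)
        self.log_like, self.log_unseen = {}, {}
        for c, w in weights.items():
            total = sum(w.values()) + vocab_size
            self.log_like[c] = {g: math.log((v + 1) / total)
                                for g, v in w.items()}
            self.log_unseen[c] = math.log(1 / total)
        return self

    def predict(self, text):
        doc = Counter(char_ngrams(text))
        scores = {}
        for c, prior in self.log_prior.items():
            score = prior
            for g, tf in doc.items():
                if g in self.idf:  # skip n-grams never seen in training
                    like = self.log_like[c].get(g, self.log_unseen[c])
                    score += tf * self.idf[g] * like
            scores[c] = score
        return max(scores, key=scores.get)


# Toy training set, made up for illustration only (not taken from NusaX).
train_texts = [
    "aku arep mangan sega",
    "kowe arep lunga menyang ngendi",
    "omahku cedhak pasar",
    "abdi bade tuang sangu",
    "anjeun bade angkat ka mana",
    "bumi abdi caket pasar",
]
train_labels = ["javanese"] * 3 + ["sundanese"] * 3

model = TfidfNaiveBayes().fit(train_texts, train_labels)
print(model.predict("aku arep lunga menyang pasar"))
print(model.predict("abdi bade angkat ka pasar"))
```

Character n-grams are used here because short sentences in closely related languages often share whole words but differ in spelling patterns; a word-level vocabulary would work the same way with a different tokenizer.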

Chatbot:

Built with the Gemini (Google Generative AI) API. Users can chat with the bot about regional languages.

[Screenshot: Web Interface]
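A chat call of this kind might look like the sketch below. It assumes the `google-generativeai` Python SDK, an API key in a `GOOGLE_API_KEY` environment variable, and a model name of `gemini-1.5-flash`; the helper names and the prompt wording are hypothetical and are not taken from the project.

```python
import os


def build_prompt(user_message, language="Javanese"):
    """Wrap the user's message in a short instruction that keeps the
    conversation on regional languages (hypothetical wording)."""
    return (
        "You are an assistant for Indonesian regional languages. "
        f"Where relevant, give examples in {language}.\n\n"
        f"User: {user_message}"
    )


def ask_gemini(user_message, language="Javanese",
               model_name="gemini-1.5-flash"):
    """Send one message to Gemini and return the reply text.

    The model name is an assumption; use whichever model the API key
    has access to.
    """
    # Imported lazily so build_prompt works without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(model_name)
    chat = model.start_chat(history=[])
    return chat.send_message(build_prompt(user_message, language)).text
```

Calling `ask_gemini("How do I say thank you?", language="Sundanese")` requires network access and a valid key, so only `build_prompt` can be exercised offline.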

Regional Language Translation:

Built with the googletrans library, which translates Indonesian text into Javanese and Sundanese.

[Screenshot: Web Interface]
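The translation step could be wrapped roughly as shown here. The sketch assumes the synchronous `Translator.translate` API of the older googletrans releases (newer releases may expose it as a coroutine) and the Google Translate codes `id`, `jw`, and `su`; the helper names are hypothetical.

```python
# Target codes as used by Google Translate (assumed: "jw" for Javanese).
TARGET_CODES = {"javanese": "jw", "sundanese": "su"}


def target_code(language):
    """Map a language name to its translation code, rejecting unknown names."""
    key = language.strip().lower()
    if key not in TARGET_CODES:
        raise ValueError(f"unsupported target language: {language!r}")
    return TARGET_CODES[key]


def translate_from_indonesian(text, language):
    """Translate Indonesian text into the given regional language."""
    # Imported lazily so target_code works without googletrans installed.
    from googletrans import Translator

    translator = Translator()
    # Assumes translate() returns a result object directly (sync API).
    result = translator.translate(text, src="id", dest=target_code(language))
    return result.text
```

Validating the target language before the network call keeps unsupported requests from reaching the translation service at all.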

Text to Speech:

Built with the gTTS (Google Text-to-Speech) library, which converts text into Javanese and Sundanese speech.

[Screenshot: Web Interface]
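Generating an audio file with gTTS can be sketched like this. The language codes (`jw`, `su`), the default output file name, and the helper names are assumptions for illustration; the set of languages gTTS accepts depends on the installed version.

```python
# gTTS language codes assumed for the two supported languages.
SPEECH_CODES = {"javanese": "jw", "sundanese": "su"}


def speech_code(language):
    """Return the assumed gTTS code for a supported language name."""
    try:
        return SPEECH_CODES[language.strip().lower()]
    except KeyError:
        raise ValueError(f"no speech support for: {language!r}") from None


def speak(text, language, out_path="speech.mp3"):
    """Synthesize `text` in the given language and save it as an MP3 file."""
    # Imported lazily so speech_code works without gTTS installed.
    from gtts import gTTS

    gTTS(text=text, lang=speech_code(language)).save(out_path)
    return out_path
```

`speak` needs network access because gTTS sends the text to Google's text-to-speech endpoint; the returned path can then be handed to the web interface for playback.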

Sentiment Analysis:

This feature is based on this notebook. Users enter a sentence, and the model predicts its sentiment.

[Screenshot: Web Interface]

🎌 Supported Regional Languages:

  • Acehnese
  • Balinese
  • Banjarese
  • Buginese
  • Javanese
  • Sundanese
  • Madurese
  • Minangkabau
  • Ngaju
  • Toba Batak

🔮 Future Improvements:

  • Increase model accuracy
  • Add more languages
  • Add user account creation
  • Use an online database

✨ Authors:

| Name | NIM |
| --- | --- |
| Bima Rakajati | A11.2020.13088 |
| Enrico Zada | A11.2020.12972 |
| Rosalia Natal Silalahi | A11.2020.13084 |
| Devi Kartika Sari | A11.2020.12518 |
