Asterisco Decoder

Problem

Brazilian twitter users started hiding certain words from their tweets to obscure their messages, usually to vent or rant in a semi-public manner. Ex:

* like ** **** the ***.

For simplicity, we will call these "asterisk phrases".

Our Solution

Train a machine learning based language model (LM) to predict the most likely words that fill the gaps. Ex:

I like to feel the sun.
I like to race the car.
...

OBS.: For the sake of understanding the phrases are in English, but the final goal is to predict phrases that are in Brazilian Portuguese.

How it Works

Using the reddit API, we collect a large sample of phrases written in common "internet language". Then, we use these to train an LM on a recurrent neural network (RNN) using the Keras interface. With the trained model, we are able to predict the most likely sentence for any given "asterisk phrase".

Results

We can't share any results yet because we still haven't finished the LM. We will update when the project is finished.

TODO:

How to Use

Training

TODO:

Predicting

TODO:

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
texts		texts
trainedModels		trainedModels
.gitignore		.gitignore
LMGen.py		LMGen.py
LMTrain.py		LMTrain.py
README.md		README.md
phrasesRetriver.py		phrasesRetriver.py
textCleaner.py		textCleaner.py
textRefactor.py		textRefactor.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Asterisco Decoder

Problem

Our Solution

How it Works

Results

How to Use

Training

Predicting

About

Releases

Packages

Contributors 2

Languages

PauloGoncalvesLima/AsteriscoDecoder

Folders and files

Latest commit

History

Repository files navigation

Asterisco Decoder

Problem

Our Solution

How it Works

Results

How to Use

Training

Predicting

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages