The goal of this project is to record a voice signal and predict a path between 2 points (origin and destination).
The project is composed of 3 major components:
- A Speech Recognizer to record a voice signal and transcribe it into text
- A Natural Language Processing (NLP) component to identify French sentences and spot the origin and destination points
- A Path Finder to predict a path between these two locations
For now, all of these components are called and chained together in the main.py script at the root of the project folder.
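The chaining done in main.py can be pictured with the minimal sketch below. It is not the project's actual code: run_pipeline and the three callables passed to it are hypothetical names, and the stand-in lambdas only mimic each component's input and output.

```python
from typing import Callable, List, Optional, Tuple

Locations = Tuple[str, str]  # (origin, destination)


def run_pipeline(
    audio: bytes,
    recognize: Callable[[bytes], str],
    extract: Callable[[str], Optional[Locations]],
    find_path: Callable[[str, str], List[str]],
) -> Optional[List[str]]:
    """Chain the components: voice signal -> text -> (origin, destination) -> path."""
    sentence = recognize(audio)            # Speech Recognizer
    locations = extract(sentence)          # NLP component
    if locations is None:                  # e.g. not French, or no origin/destination found
        return None
    origin, destination = locations
    return find_path(origin, destination)  # Path Finder


# Hypothetical stand-ins for the real components.
path = run_pipeline(
    b"",
    recognize=lambda audio: "Je voudrais aller de Paris à Lyon",
    extract=lambda sentence: ("Paris", "Lyon"),
    find_path=lambda origin, destination: [origin, destination],
)
print(path)  # ['Paris', 'Lyon']
```

Passing the components in as functions keeps the chaining logic testable without a microphone, a trained spaCy model or a station graph.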
In the NLP component, we mainly use spaCy, a library specialized in NLP.
The data_build folder gathers all the scripts needed to generate data through web scraping.
The data folder gathers the data files. The most important file here is fr-annotated.json, since it contains all of our generated sentences. These sentences have been labeled with custom entities in order to train a custom Named Entity Recognizer (NER) pipe.
land_detector_from_to_model is our saved trained model. It contains 2 pipes:
- detect_lang, to predict the language being used (based on spacy_fastlang)
- from_to_location, to retrieve the origin and destination locations
The sentences used to train the NER pipe were labeled with the NER Annotator for Spacy tool.
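To make "labeled with custom entities" concrete, here is a sketch of one annotated sentence in the (text, {"entities": [(start, end, label)]}) shape commonly used in spaCy NER training scripts, where start and end are character offsets and end is exclusive. The label names FROM and TO and the helpers make_example and read_locations are hypothetical; the exact layout of fr-annotated.json and of the annotator's export is not described here and may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical entity labels; the labels actually used in fr-annotated.json may differ.
ORIGIN, DESTINATION = "FROM", "TO"

Entity = Tuple[int, int, str]                  # (start char, end char exclusive, label)
Example = Tuple[str, Dict[str, List[Entity]]]


def make_example(text: str, origin: str, destination: str) -> Example:
    """Label the first occurrence of the origin and destination phrases in text."""
    entities = []
    for phrase, label in ((origin, ORIGIN), (destination, DESTINATION)):
        start = text.index(phrase)             # raises ValueError if the phrase is missing
        entities.append((start, start + len(phrase), label))
    return text, {"entities": entities}


def read_locations(example: Example) -> Dict[str, str]:
    """Recover the labeled substrings from their character offsets."""
    text, annotations = example
    return {label: text[start:end] for start, end, label in annotations["entities"]}


example = make_example("Je voudrais aller de Paris à Lyon", "Paris", "Lyon")
print(read_locations(example))  # {'FROM': 'Paris', 'TO': 'Lyon'}
```

text.index only labels the first occurrence of each phrase, which is enough for a sketch but not for sentences that mention the same station twice.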
In the Path Finder component, we mainly use NetworkX to build a graph whose nodes are the train stations in our dataset and whose edges link pairs of train stations. After creating the graph and its edges, we also added a weight to each edge: the trip duration between the two train stations it links. These weights let NetworkX find the shortest path between two train stations in terms of travel time rather than number of hops.
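A minimal sketch of such a weighted graph built with NetworkX, showing why the weight matters. The station names and durations are invented toy values, not the project's dataset, and storing the duration under the edge attribute weight is an assumption about how the project names it.

```python
import networkx as nx

# Hypothetical toy data: (station, station, trip duration in minutes).
# These values are invented for illustration only.
TRIPS = [
    ("A", "D", 400),  # direct but slow connection
    ("A", "B", 100),
    ("B", "C", 90),
    ("C", "D", 100),
]


def build_station_graph(trips):
    """Build an undirected graph: nodes are stations, edges carry the trip duration."""
    graph = nx.Graph()
    for origin, destination, minutes in trips:
        # add_edge creates missing nodes and stores the duration as an edge attribute
        graph.add_edge(origin, destination, weight=minutes)
    return graph


graph = build_station_graph(TRIPS)

# Without a weight, every edge counts as one hop, so the direct edge wins.
fewest_hops = nx.shortest_path(graph, "A", "D")
# With weight="weight", the path with the smallest total duration wins.
fastest = nx.shortest_path(graph, "A", "D", weight="weight")
total = nx.shortest_path_length(graph, "A", "D", weight="weight")

print(fewest_hops)  # ['A', 'D']
print(fastest)      # ['A', 'B', 'C', 'D']
print(total)        # 290
```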
To find the shortest path between two train stations, we use the NetworkX function shortest_path, which uses Dijkstra's algorithm by default when a weight is specified (without a weight, it falls back to an unweighted breadth-first search).
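For intuition about what shortest_path does once a weight is given, here is a small from-scratch version of Dijkstra's algorithm using a binary heap. It is an illustration, not NetworkX's implementation; the cheapest_path function, the adjacency-dict input and the station data are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Adjacency = Dict[str, Dict[str, float]]  # node -> {neighbour: non-negative edge weight}


def cheapest_path(adjacency: Adjacency, source: str, target: str) -> Tuple[float, List[str]]:
    """Return (total weight, path) of the cheapest path from source to target (Dijkstra)."""
    queue = [(0, source, [source])]  # (cost so far, node, path taken to reach node)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node in settled:
            continue  # a cheaper route to this node was already finalized
        if node == target:
            return cost, path
        settled.add(node)
        for neighbour, weight in adjacency.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    raise ValueError(f"no path from {source!r} to {target!r}")


# Hypothetical undirected network, durations in minutes.
ADJACENCY = {
    "A": {"B": 100, "D": 400},
    "B": {"A": 100, "C": 90},
    "C": {"B": 90, "D": 100},
    "D": {"A": 400, "C": 100},
}
print(cheapest_path(ADJACENCY, "A", "D"))  # (290, ['A', 'B', 'C', 'D'])
```

Skipping nodes that are already settled when they are popped (lazy deletion) avoids needing a decrease-key operation, which Python's heapq does not provide.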