A chatbot written in Python, powered by an encoder-decoder (seq2seq) model. Includes functionality for parsing corpus data, training the model on question/answer data, and text-to-speech/speech-to-text conversion for speaking with the bot aloud.
Python 3.7+ with the modules tensorflow, keras, speech_recognition, pyttsx3, and yaml.
TTS depends on installing sapi5 (Windows), nsss (macOS), or espeak (Linux).
STT depends on installing pyaudio (all operating systems).
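For reference, a minimal sketch of how these two libraries are typically driven (illustrative only, not the bot's actual code):

```python
import pyttsx3
import speech_recognition as sr

# Text-to-speech: speak a reply aloud (uses sapi5/nsss/espeak under the hood).
engine = pyttsx3.init()
engine.say("Hello, I am the chatbot.")
engine.runAndWait()

# Speech-to-text: capture a question from the microphone (requires pyaudio).
recognizer = sr.Recognizer()
with sr.Microphone() as source:
    audio = recognizer.listen(source)
print(recognizer.recognize_google(audio))
```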
Run python main.py with the parameters listed in the table below.
Parameter | Default | Description |
---|---|---|
-s, --tts | False | Whether to include TTS and STT functionality. |
-qa, --qa_save | False | Whether to save parser-generated question/answer data. |
-g, --google | False | Whether to parse a google corpus instead of a simple corpus. |
-c, --corpus | N/A | Path to the corpus for the parser to parse. |
-m, --model_save | False | Whether to save processor-generated models. |
-l, --load | False | Whether to load saved model files into the bot. If true, the parser arguments above are ignored; if false, the arguments below are ignored. |
-e, --encoder | "encoder.h5" | Path to encoder file to load. |
-d, --decoder | "decoder.h5" | Path to decoder file to load. |
-t, --tokenizer | "tokenizer.pickle" | Path to tokenizer file to load. |
Note: "corpus" argument mandatory if "load" is False.
python main.py -l -s
Command to load the encoder file "encoder.h5", decoder file "decoder.h5", and tokenizer file "tokenizer.pickle" into the bot and converse with it using TTS/STT. Nothing is parsed by this command.
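Loading with -l boils down to something like the following sketch (file names are the defaults from the table; the project's actual loading code may differ):

```python
import pickle
from tensorflow.keras.models import load_model

encoder = load_model("encoder.h5")
decoder = load_model("decoder.h5")
with open("tokenizer.pickle", "rb") as f:
    tokenizer = pickle.load(f)
```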
python main.py -c google_corpus.json -g -qa -m
Command to parse the file "google_corpus.json" using the google parsing method, save the generated question/answer data to files, and save the models generated by the processor.
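Saving with -m is essentially the reverse of loading; a sketch under the assumption that the processor produces Keras models and a Keras tokenizer (the stand-in model and variable names below are illustrative only):

```python
import pickle
from tensorflow import keras

# Toy stand-in; the real encoder/decoder are built by the processor during training.
inputs = keras.Input(shape=(20,))
outputs = keras.layers.Embedding(5000, 128)(inputs)
encoder_model = keras.Model(inputs, outputs)

tokenizer = keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(["how are you", "i am fine"])

# The -m flag would trigger writes like these:
encoder_model.save("encoder.h5")
with open("tokenizer.pickle", "wb") as f:
    pickle.dump(tokenizer, f)
```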