All notable changes to this project will be documented in this file. This project adheres to Semantic Versioning starting with version 0.7.0.
[Unreleased] - master
Note
This version is not yet released and is under active development.
- ngram character featurizer (allows better handling of out-of-vocab words)
- replaced pre-wired backends with more flexible pipeline definitions
- return top 10 intents with sklearn classifier #199
- python type annotations for nearly all public functions
- support for arbitrary spacy model names
- duckling components to provide normalized output for structured entities
- Conditional random field entity extraction (Markov model for entity tagging; better named entity recognition with small and medium amounts of training data, and similar performance with large amounts)
- allow naming of trained models instead of generated model names
- dynamic check of requirements for the different components & error messages on missing dependencies
- unified tokenizers, classifiers and feature extractors to implement common component interface
- src directory renamed to rasa_nlu
- when loading data in a foreign format (api.ai, luis, wit) the data gets properly split into intent & entity examples
- Configuration:
- added max_number_of_ngrams
- added pipeline
- added luis_data_tokenizer
- removed backend
- added
- parser output format changed from {"intent": "greeting", "confidence": 0.9, "entities": []} to {"intent": {"name": "greeting", "confidence": 0.9}, "entities": []}
- camel cased MITIE classes (e.g. MITIETokenizer → MitieTokenizer)
- model metadata changed, see migration guide
- updated to spacy 1.7 (breaks existing spacy models!)
- introduced compatibility with both Python 2 and 3
- properly parse str additionally to unicode #210
- support entity only training #181
- resolved conflicts between metadata and configuration values #219
- removed tokenization when reading Luis.ai data (they changed their format) #241
- fixed regression in mitie entity extraction on special characters
- fixed spacy fine tuning and entity recognition on passed language instance
- python documentation about calling rasa NLU from python
- mitie tokenization value generation #207, thanks @cristinacaputo
- changed log file extension from .json to .log, since the contained text is not proper json
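For client code that still expects the old parser output, the format change listed above (intent and confidence moved into a nested intent object) can be bridged with a small adapter. The following is only a sketch based on the two example payloads in this changelog; the helper name upgrade_parse_result is hypothetical and is not part of the rasa NLU API.

```python
def upgrade_parse_result(old):
    """Convert an old-style parse result to the nested-intent layout.

    Hypothetical helper for illustration; it is not shipped with rasa NLU.
    """
    # Results that already use the nested layout are returned as a shallow copy.
    if isinstance(old.get("intent"), dict):
        return dict(old)

    # Carry over every key except the two that move into the nested object.
    new = {k: v for k, v in old.items() if k not in ("intent", "confidence")}
    new["intent"] = {
        "name": old.get("intent"),
        "confidence": old.get("confidence"),
    }
    return new


old_result = {"intent": "greeting", "confidence": 0.9, "entities": []}
new_result = upgrade_parse_result(old_result)
```

Passing a result that is already in the new layout leaves it unchanged, so the adapter can be applied while both formats are still in circulation.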
This is a major version update. Please also have a look at the Migration Guide.
- Changelog ;)
- option to use multi-threading during classifier training
- entity synonym support
- proper temporary file creation during tests
- mitie_sklearn backend using mitie tokenization and sklearn classification
- option to fine-tune spacy NER models
- multithreading support of built-in REST server (e.g. using gunicorn)
- multitenancy implementation to allow loading multiple models which share the same backend
- error propagation on failed vector model loading (spacy)
- escaping of special characters during mitie tokenization