LUIS.ai tokenization #241

Closed
bkmeneguello opened this issue Mar 28, 2017 · 1 comment · Fixed by #251
Labels
type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@bkmeneguello

rasa NLU version (Don't know? https://goo.gl/g9QQg2):
0.8.0a3

Used backend (mitie, spacy_sklearn, ...) & platform (windows, osx, ...):
mitie_sklearn
Linux

Issue:
Although a warning message is shown when the LUIS format is used, the tokenization still misbehaves in a subtle way. In the LUIS data format, startPos and endPos are character indices, but "load_luis_data" treats them as token indices. I think "load_luis_data" could take its "start" and "end" attributes directly from the raw text, in which case the text would not need to be tokenized at all.

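# s: one example entry from the LUIS JSON export
text = s.get("text")
intent = s.get("intent")
entities = []
for e in s.get("entities") or []:
    # startPos/endPos are character offsets; endPos + 1 turns the end into an exclusive slice bound
    start, end = e["startPos"], e["endPos"] + 1
    val = text[start:end]
    entities.append({"entity": e["entity"], "value": val, "start": start, "end": end})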
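
For illustration, here is a self-contained sketch of the same idea applied to one example. The function name `extract_entities` and the sample utterance are hypothetical and not part of rasa NLU. Following the snippet above, it assumes that `endPos` is an inclusive character offset.

```python
def extract_entities(example):
    """Build entity dicts with character offsets from one LUIS-style example.

    Hypothetical helper for illustration; not the rasa NLU implementation.
    """
    text = example.get("text", "")
    entities = []
    for e in example.get("entities") or []:
        start = e["startPos"]
        # Assumption: endPos is inclusive, so add 1 for an exclusive slice end.
        end = e["endPos"] + 1
        entities.append({
            "entity": e["entity"],
            "value": text[start:end],
            "start": start,
            "end": end,
        })
    return entities


# Made-up sample in the shape described in this issue.
sample = {
    "text": "book a flight to New York",
    "intent": "book_flight",
    "entities": [{"entity": "city", "startPos": 17, "endPos": 24}],
}

result = extract_entities(sample)
print(result)
```

No tokenizer is involved here: the entity value comes straight from slicing the raw text.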
@amn41
Contributor

amn41 commented Mar 29, 2017

Thanks for creating this issue, @bkmeneguello!

This is a really good catch: it looks like LUIS has changed the format of its JSON exports (for the better!).

They used to provide token indices, which were somewhat ambiguous. We will have to update rasa NLU accordingly.
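
To illustrate why token indices are ambiguous, here is a minimal sketch. It assumes two made-up tokenizers (whitespace splitting and a simple regex), neither taken from LUIS or rasa NLU. The same token span selects different words under each tokenizer, while character offsets do not depend on tokenization.

```python
import re

text = "what's the weather in New York"

# Two illustrative tokenizers (assumptions, not the ones LUIS or rasa NLU use).
whitespace_tokens = text.split()
regex_tokens = re.findall(r"\w+|[^\w\s]", text)

# The same token span [4, 6) points at different words under each tokenizer.
span_whitespace = whitespace_tokens[4:6]
span_regex = regex_tokens[4:6]

# Character offsets refer to the raw text, so no tokenizer is needed.
char_value = text[22:30]

print(span_whitespace, span_regex, char_value)
```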

@amn41 amn41 added the type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. label Mar 29, 2017
@tmbo tmbo closed this as completed in #251 Apr 11, 2017
taytzehao pushed a commit to taytzehao/rasa that referenced this issue Jul 14, 2023
vcidst pushed a commit that referenced this issue Feb 22, 2024
* instrument policy._prediction, add unit test

* fix fast cli test by moving imports, add changelog entry

2 participants