spaCy 3.0 support, is Rasa 2.0 ready? #6906

Closed
koaning opened this issue Oct 5, 2020 · 1 comment · Fixed by #7869
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/ml/nlu-components Issues focused around rasa's NLU components area:rasa-oss/ml 👁 All issues related to machine learning type:dependencies Pull requests that update a dependency file type:maintenance 🔧 Improvements to tooling, testing, deployments, infrastructure, code style.

Comments

@koaning
Contributor

koaning commented Oct 5, 2020

spaCy 3.0 will probably be out this month, so it makes sense to check compatibility. The release notes on PyPI suggest there are already dev versions.

@koaning koaning added type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. area:rasa-oss 🎡 Anything related to the open source Rasa framework and removed type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. labels Oct 5, 2020
@tabergma tabergma added the type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR label Oct 5, 2020
@tabergma tabergma removed their assignment Oct 6, 2020
@koaning
Contributor Author

koaning commented Nov 18, 2020

So it seems like spaCy will have transformer support for a lot of non-English languages. The catch is that the vectors are retrieved differently. This is the API that is currently in the spaCy 3.0 nightly:

token_emb, sent_emb = nlp("i duck to hide")._.trf_data.tensors
token_emb.shape, sent_emb.shape

For transformer-based models you can no longer fetch the tensors directly via .vector; you have to go through ._.trf_data instead.

Also note that the shape of token_emb does not match the number of tokens that spaCy finds. More likely its rows correspond to the subword (byte-pair style) tokens that the internal transformer uses.
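
To make the mismatch concrete, here is a minimal sketch of one way to get back to one vector per spaCy token: mean-pool the subword rows that belong to each token. It is not taken from spaCy or Rasa. The function name `pool_subword_vectors`, the toy vectors, and the `alignment` list (which subword rows belong to which token) are all hypothetical, and how you would obtain a real alignment from the pipeline is not shown here.

```python
from typing import List, Sequence


def pool_subword_vectors(
    subword_vectors: Sequence[Sequence[float]],
    alignment: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Mean-pool subword rows into one vector per token.

    subword_vectors: one row per subword piece (hypothetical stand-in for token_emb).
    alignment: for each token, the indices of the subword rows it covers
               (hypothetical; assumed to be supplied by the caller).
    """
    if not subword_vectors:
        return [[] for _ in alignment]
    dim = len(subword_vectors[0])
    pooled = []
    for piece_ids in alignment:
        if not piece_ids:
            # A token with no aligned subword pieces gets a zero vector.
            pooled.append([0.0] * dim)
            continue
        rows = [subword_vectors[i] for i in piece_ids]
        # Average column-wise across the rows belonging to this token.
        pooled.append([sum(col) / len(rows) for col in zip(*rows)])
    return pooled


# Toy data: 4 subword pieces with 2 dimensions each, covering 3 tokens.
subword_vectors = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0]]
alignment = [[0], [1, 2], [3]]  # the second token is split into two pieces

token_vectors = pool_subword_vectors(subword_vectors, alignment)
print(len(token_vectors), token_vectors[1])
```

Mean pooling is only one choice; taking the first or last subword row per token would be equally easy to swap in.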

@alwx alwx added area:rasa-oss/ml 👁 All issues related to machine learning area:rasa-oss/ml/nlu-components Issues focused around rasa's NLU components type:dependencies Pull requests that update a dependency file type:maintenance 🔧 Improvements to tooling, testing, deployments, infrastructure, code style. and removed type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR labels Jan 29, 2021
@koaning koaning mentioned this issue Feb 2, 2021

3 participants