Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed up yaml parsing #7953

Closed
3 tasks
wochinge opened this issue Feb 15, 2021 · 3 comments · Fixed by #7968
Closed
3 tasks

speed up yaml parsing #7953

wochinge opened this issue Feb 15, 2021 · 3 comments · Fixed by #7968
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/training-data Issues focused around Rasa training data (stories, NLU, domain, etc.) feature:ux-cli+training-data Feature: Improve user experience with Rasa CLI and training data for developers type:maintenance 🔧 Improvements to tooling, testing, deployments, infrastructure, code style.

Comments

@wochinge
Copy link
Contributor

Problem
When parsing yaml files we currently look for environment variables to interpolate in every single line. While this is a useful feature for configuration files (endpoints.yml, config.yml, credentials.yml) and e.g. also used for Rasa X, this makes the training data reading very slow as stated here.

Note that removing this would be a breaking change as some users might rely on it. We need some sort of smooth transition for it.

Note that I've not investigated how much this line also contributes to the slowness

Ideas for solutions:

  • disable environment variable interpolation for stories / nlu files by default
  • add a switch to disable it

Definition of Done

  • Environment variable interpolation no longer is a blocking factor when reading big training data files
  • investigate if this line is still needed / what it's impact is on training data loading time
  • document usage / migration path for environment variable interpolation
@wochinge wochinge added type:maintenance 🔧 Improvements to tooling, testing, deployments, infrastructure, code style. area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/training-data Issues focused around Rasa training data (stories, NLU, domain, etc.) labels Feb 15, 2021
@alwx alwx mentioned this issue Feb 16, 2021
4 tasks
@alwx
Copy link
Contributor

alwx commented Feb 17, 2021

@wochinge I have a much simpler idea — what if we just check if the file contains ${...} before parsing it and use a faster parser (= the one that's without environment variable interpolation) if there is nothing? In this case, the migration process for end users won't be needed at all.

@wochinge
Copy link
Contributor Author

That's also an option! Scanning a big file with open(...) as file should be quick, no?

@alwx
Copy link
Contributor

alwx commented Feb 17, 2021

yes! and we need to do this anyway — the YAML needs to be parsed no matter if it contains env variables or not

@TyDunn TyDunn added the feature:ux-cli+training-data Feature: Improve user experience with Rasa CLI and training data for developers label Feb 17, 2021
@alwx alwx closed this as completed Mar 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/training-data Issues focused around Rasa training data (stories, NLU, domain, etc.) feature:ux-cli+training-data Feature: Improve user experience with Rasa CLI and training data for developers type:maintenance 🔧 Improvements to tooling, testing, deployments, infrastructure, code style.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants