Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make rasa data validate check for duplicated intents, forms, responses and slots when using domains split between multiple files #10444

Merged
merged 17 commits into from
Dec 14, 2021

Conversation

alwx
Copy link
Contributor

@alwx alwx commented Dec 2, 2021

Proposed changes:

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@alwx alwx changed the title rasa data validate should fail if conflicting slots/responses/forms/... are created in different domain files Make rasa data validate check for duplicated intents, forms, responses and slots when using domains split between multiple files Dec 2, 2021
@alwx alwx requested review from joejuzl and ancalita December 2, 2021 14:22
@alwx alwx marked this pull request as ready for review December 2, 2021 14:23
@alwx alwx requested a review from a team as a code owner December 2, 2021 14:23
Copy link
Member

@ancalita ancalita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 👍🏼
I left mainly some documentation related suggestions.
I was also wondering if we want to raise InvalidDomain when duplicates are not None in from_dict? What happens now if you train with a Domain with conflicting slots for example?
This could be purely out of scope for the issue which refers only to rasa data validate, but it would be just an addition of a few lines of code 🤔

data/test_domains/test_domain_with_duplicates/domain2.yml Outdated Show resolved Hide resolved
rasa/shared/core/domain.py Outdated Show resolved Hide resolved
rasa/shared/core/domain.py Show resolved Hide resolved
tests/shared/core/test_domain.py Show resolved Hide resolved
rasa/shared/core/domain.py Outdated Show resolved Hide resolved
@alwx alwx requested a review from ancalita December 3, 2021 17:27
Copy link
Member

@ancalita ancalita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 🎉
I'd still explore with Ty and the rest of the squad looking at the other Domain.merge related issues whether a holistic approach is required for this entire umbrella + if to raise InvalidDomain whenever duplicates are found?

# this code merges lists of dicts of intents
dict1 = {list(i.keys())[0]: i for i in combined[KEY_INTENTS]}
dict2 = {list(i.keys())[0]: i for i in domain_dict[KEY_INTENTS]}
duplicates[KEY_INTENTS] = extract_duplicates(dict1, dict2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we consider just raising an error/warning here so that we don't need to store the duplicates on the class?

Copy link
Contributor Author

@alwx alwx Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to show this information only when running rasa data validate, as it's written in the description of this issue. We can update it to show the warning every time the domain is getting merged but in this case I don't see the point in doing any changes to rasa data validate and even link this issue to the validator which means the whole task will be different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @TyDunn can take a look at it and say which approach we prefer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the approach that @joejuzl describe happen when you run rasa data validate too?

Copy link
Contributor Author

@alwx alwx Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TyDunn If we do it the way @joejuzl proposes then the warnings/exceptions will be raised every time the domain is getting merged (e.g. when just running rasa train or literally any other command that uses domain), not only when doing rasa data validate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context we do already have quite a lot of domain validation that happens just from loading it e.g. _check_domain_sanity in the __init__ looks for duplicates etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alwx I believe Joe is out for the rest of the year, and I am not going to be able to gain enough context atm. Can you make a judgement call here yourself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TyDunn I think it makes sense to keep this one as it as and maybe create a new issue because what Joe was talking about makes the scope much bigger

@alwx alwx requested a review from joejuzl December 7, 2021 15:51
@alwx alwx requested a review from TyDunn December 13, 2021 17:53
@alwx
Copy link
Contributor Author

alwx commented Dec 13, 2021

@TyDunn please approve to merge this one.

@alwx alwx enabled auto-merge December 14, 2021 08:37
@alwx alwx merged commit 0c00ed2 into main Dec 14, 2021
@alwx alwx deleted the domain-duplicates branch December 14, 2021 09:52
@carlad carlad mentioned this pull request Jan 4, 2022
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rasa data validate should fail if conflicting slots are created in different domain files
4 participants