
Including RulePolicy leads to infinite max_history being used to deduplicate training trackers #8295

Closed
1 task
indam23 opened this issue Mar 25, 2021 · 21 comments · Fixed by #8851
Assignees
Labels
  • area:rasa-oss 🎡: Anything related to the open source Rasa framework
  • area:rasa-oss/training-data: Issues focused around Rasa training data (stories, NLU, domain, etc.)
  • effort:atom-squad/2: Label which is used by the Rasa Atom squad to do internal estimation of task sizes.
  • type:bug 🐛: Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@indam23
Contributor

indam23 commented Mar 25, 2021

Rasa version: 2.4.1

Issue:

When rules/RulePolicy are included in dialogue management, an effectively infinite max_history is used to check for conflicts with stories. This greatly increases the time it takes to load and process dialogue data.

Command or request that led to error (train, not validate):

rasa train core

Definition of Done

  • Set max_history to the lowest possible value during rasa train core, based on the training data
@indam23 indam23 added type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Mar 25, 2021
@Ghostvv
Contributor

Ghostvv commented Mar 25, 2021

One can read the rules first, calculate the length of the longest rule, and then use that as RulePolicy's effective max_history, which can be compared with the max_history of the other policies to calculate the unique_last_num_states limit for deduplication of generated trackers.
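The suggestion above can be sketched as follows. This is only an illustration; `rule_lengths` and `policy_max_histories` are hypothetical inputs, not actual Rasa APIs.

```python
# Hypothetical sketch: derive the dedup horizon from rule lengths and the
# max_history settings of the other policies. Not Rasa's real implementation.
from typing import List, Optional


def unique_last_num_states(
    rule_lengths: List[int],
    policy_max_histories: List[Optional[int]],
) -> Optional[int]:
    """Finite dedup horizon, or None if some policy needs unlimited history."""
    if any(h is None for h in policy_max_histories):
        return None  # a policy with unlimited max_history forces full-history dedup
    rule_history = max(rule_lengths, default=0)  # effective RulePolicy max_history
    return max([rule_history, *policy_max_histories], default=0)
```

With this, two rules of length 3 and 5 plus policies configured with max_history 5 and 2 would give a dedup horizon of 5 instead of "infinite".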

@indam23 indam23 assigned Ghostvv and unassigned Ghostvv Mar 25, 2021
@Ghostvv Ghostvv changed the title Including RulePolicy leads to infinite max_history being used to determine story conflicts Including RulePolicy leads to infinite max_history being used to deduplicate training trackers Mar 25, 2021
@TyDunn TyDunn added the area:rasa-oss/training-data Issues focused around Rasa training data (stories, NLU, domain, etc.) label Apr 22, 2021
@wochinge wochinge added the effort:atom-squad/2 Label which is used by the Rasa Atom squad to do internal estimation of task sizes. label Apr 23, 2021
@TyDunn TyDunn assigned wochinge and unassigned Imod7 May 25, 2021
@wochinge
Contributor

@melindaloubser1 I tried to reproduce this on Sara, but the validation is super quick. Note that max_history currently has to be set manually (I tried both with it set and with the default/none set), and we don't distinguish between story and rule validation.

I also don't see why max_history should be a big deal, as rules are more or less enforced to be very short anyway. cc @JEM-Mosig

@wochinge
Contributor

And from what I can see in the code, the included policies shouldn't matter for rasa data validate stories at all.

@indam23
Contributor Author

indam23 commented May 31, 2021

No, you're right; I think I miswrote this and it applies only to loading during rasa train. Would that make more sense?

@wochinge
Contributor

That makes more sense, I think 🤔 Are you then suggesting to set max_history here in an automated fashion? What do you think of that @samsucik?

@samsucik
Contributor

samsucik commented Jun 1, 2021

@wochinge I explored the relevant code parts and I agree with automatically setting max_history to a value as low as possible. However, we can't compute this value before training the policy, right? Because a rule can lead to a long sequence of states even if it has just a single user intent (the intent can be followed by many actions), and so we need to see the actual rule data to determine the minimal max_history.

@wochinge
Contributor

wochinge commented Jun 2, 2021

Can't we loop over the rule_trackers which we receive for training and count the numbers of actions for each rule tracker to determine the longest one and use that in turn to set max_history?
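This proposal can be sketched in a few lines: scan the rule trackers once, count the action events per rule, and take the maximum. Trackers are simplified here to plain lists of event-type strings; none of these names are real Rasa identifiers.

```python
# Sketch under assumptions: a "rule tracker" is a list of event-type strings,
# and the rule's length is its number of "action" events.
from typing import List


def max_history_from_rules(rule_trackers: List[List[str]]) -> int:
    """Length of the longest rule, measured in action events."""
    return max(
        (sum(1 for event in tracker if event == "action") for tracker in rule_trackers),
        default=0,
    )
```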

@samsucik
Contributor

samsucik commented Jun 2, 2021

Yeah, fair, that's what would need to happen... Like this, right?

  1. initialise all policies
  2. load the data and determine max_history
  3. update rule policy's max_history

What about batches, though? I think we'd like to use the same number of turns for tracker deduplication across batches. And perhaps we can say that rule data won't come in batches, but story data will... Maybe we'd need an "initial" pass through the data to determine things like this unique_last_num_states value -- kind of similar to an initial pass in the NLU pipeline for the purposes of constructing the tokeniser's vocab? 🤔 I guess I'm less worried about it for now, but more in the context of Rasa 3.0 🙂

@wochinge
Contributor

wochinge commented Jun 2, 2021

load the data and determine max_history

We can even do this during train within the RulePolicy, no? In my opinion we shouldn't leak such a RulePolicy-specific thing to code outside of the RulePolicy.

I don't understand the part about the batches. The RulePolicy has no concept of batches or whatsoever. Do you mean training in chunks?

@samsucik
Contributor

samsucik commented Jun 2, 2021

We can even do this during train within the RulePolicy, no?

I think this doesn't work because we use the determined max_history value in loading the training stories for other policies, or not? 🤔

I don't understand the part about the batches. The RulePolicy has no concept of batches or whatsoever. Do you mean training in chunks?

Fair, we call these "chunks", not "batches". I know that RulePolicy currently doesn't do anything with chunks, but I was worried about this in terms of future versions of Rasa where, if we have some general way of loading data from disk and providing it to different policies, I don't see a good reason for saying "any policy can consume data in chunks, but RulePolicy can't". But maybe I'm misunderstanding the extent to which Rasa 3.0 will be flexible regarding this. If there's still the guarantee that all rule data will be loaded at once (and before any story data), then I'm not worried at all.

@wochinge
Contributor

wochinge commented Jun 2, 2021

I think this doesn't work because we use the determined max_history value in loading the training stories for other policies, or not? 🤔

We do but that they can just use whatever max_history they want to use, right? The policies are not dependent on each other.

I know that RulePolicy currently doesn't do anything with chunks, but I was worried about this in terms of future versions of Rasa where

I think you made an excellent point here.

@samsucik
Contributor

samsucik commented Jun 3, 2021

The policies are not dependent on each other.

Hm, fair point, I didn't think about it this way. I worried about it because right now we load & de-duplicate all the Core data at once, for all policies, and so all policies receive trackers that have been de-duplicated using the same unique_last_num_states value. If each policy receives data de-duplicated based on that concrete policy's max_history, then what you're saying sounds perfect and my worry vanishes 🙂
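The de-duplication being discussed here can be sketched as follows. Trackers are simplified to lists of hashable states, and the function name is illustrative rather than Rasa's actual implementation.

```python
# Sketch: de-duplicate trackers on their last `unique_last_num_states` states.
# None means "compare full history" (the effectively infinite max_history case).
from typing import List, Optional, Sequence


def dedupe(
    trackers: List[Sequence[str]],
    unique_last_num_states: Optional[int],
) -> List[Sequence[str]]:
    seen = set()
    unique = []
    for states in trackers:
        if unique_last_num_states is None:
            key = tuple(states)
        else:
            key = tuple(states[-unique_last_num_states:])
        if key not in seen:
            seen.add(key)
            unique.append(states)
    return unique
```

This makes the trade-off visible: a smaller horizon merges more trackers (faster training, less data), while None keeps every distinct full history.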

@wochinge
Contributor

wochinge commented Jun 3, 2021

If each policy receives data de-duplicated based on that concrete policy's max_history, then what you're saying sounds perfect and my worry vanishes 🙂

I'm not sure that I understand this. Are you saying that all policies must have the same max_history because things would go wrong otherwise? I was under the impression that every policy gets full trackers (deduplicated and augmented but nothing else) and then the featurizer in the policy handles featurization of the trackers based on the max_history param which we pass into it.

@samsucik
Contributor

samsucik commented Jun 4, 2021

I'm getting lost in my own thoughts and am making mistakes along the way, sorry. I'll try to backtrack and clarify:

  • I'm not against the current approach where we take the maximum max_history, use it for de-duplication of all trackers, and pass these trackers to all policies.
  • The same way, I'm not against each policy using its own max_history (it can even do its own de-duplication of trackers, whatever)

I'll go back to:

The policies are not dependent on each other.

I agree with this. I guess what I had in mind is the fact that we load and de-duplicate all trackers (rules, stories) together. If we can process rules vs stories a bit separately, then I can totally imagine a scenario where we:

  • let the non-rule policies do their things as they do today (determine the maximum max_history value > load all stories > de-duplicate using that value)
  • do a separate thing for RulePolicy where we load the rules > determine max_history from them > de-duplicate the rule trackers

I hope I made things a bit clearer now, sorry for the convoluted thoughts in my earlier replies...

@wochinge
Contributor

wochinge commented Jun 4, 2021

Yes, you did! Rule trackers are already deduplicated separately.
So your 2nd approach would be completely feasible.

I'll do some debugging later to find out where we spend so much time exactly (during deduplication or featurization) and then maybe a PR so we have something more tangible to discuss.

@samsucik
Contributor

samsucik commented Jun 4, 2021

Oh, no, I completely overlooked the separation in the code. I learned something new today...

@wochinge
Contributor

wochinge commented Jun 4, 2021

Investigation results on rasa-demo:

  • The contradiction check takes the bulk of the time (1.2 seconds for the actual training vs. 13 seconds for the contradiction check)
  • Setting a max_history for the RulePolicy doesn't change this
  • Configuring a max_history for the rule generation leads to contradictions 🤔
  • I also profiled the RulePolicy training (see results here, viewable with snakeviz). The results stress that the contradiction check takes the longest. In my opinion this clearly shows that it's not related to max_history but rather to the number of predictions. The biggest part seems to be caused by the json.loads calls. If we actually want to do something, we need to revisit the mechanics of the policy itself (e.g. how data is serialized) and how we can speed up predictions. In my opinion this should be done in a separate issue.

A quick fix for this issue would be to use lru_cache (with maxsize ≥ 1000) on _rule_key_to_state. By doing this I lowered the contradiction check time from 13 seconds to ~4 seconds. The caveat is that this addresses symptoms rather than causes.

@samsucik
Contributor

samsucik commented Jun 4, 2021

@wochinge wow, these are very interesting findings! Btw have you uploaded the right profiling results? In these ones, I don't see json.loads or any other expected bits anywhere...

I agree with creating a separate issue for the root cause.

As for the quick fix using caching, do you think it's safe with regards to Rasa's usage of RAM? If the usage stays below the kind of magnitudes we require for actually training the policies, then it sounds good to me and it would definitely make people's lives (mine included) easier -- way before we get to address the actual cause.

Aaand I'm especially curious about the details around this:

Configuring a max_history for the rule generation leads to contradictions 🤔

@wochinge
Contributor

wochinge commented Jun 7, 2021

Sorry, it was the wrong link 🙈 Updated it with the correct one 👍🏻

As for the quick fix using caching, do you think it's safe with regards to Rasa's usage of RAM?

We can set a maxsize for the cache 👍🏻 It's just a couple of JSONs and their dict representations, so I'd expect it to be tiny.

So next steps:

  • create PR with cache (I include details about memory usage)
  • separate issue for overall performance?

I didn't really dive deeper into it, tbh. I used the max_history which was specified in the config of the AugmentedMemoizationPolicy for Sara. I guess 5 might be too short for some rules?

@samsucik
Contributor

samsucik commented Jun 7, 2021

I used the max_history which was specific in the config of the AugmentedMemoizationPolicy for Sara. I guess 5 might be too short for some rules?

Ah, okay, makes sense. I'm also getting such errors when I use only RulePolicy and hard-code its max_history to anything smaller than 4 🙂

@wochinge
Contributor

wochinge commented Jun 8, 2021

Posting the details for the memory usage here as I haven't created the PR yet and need to put the results somewhere 😄

  • With Sara I get 248 cached items with 1,424,684 hits after training.
  • The cache stores a mapping of call params to output. 248 items have a total size of 862,920 (88,992 + 773,928) bytes. With a maxsize of 1000 items for the cache, the maximum memory usage is 3,479,516 bytes (3.32 MiB), which seems okay to me 👍🏻
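The projection above is straightforward to verify: take the per-item size from the measured 248 items and extrapolate to a full cache of 1000.

```python
# Back-of-the-envelope check of the memory figures reported above.
total_bytes = 88_992 + 773_928       # keys + values for the 248 cached items
per_item = total_bytes / 248         # ~3479.5 bytes per cached item
worst_case = round(per_item * 1000)  # cache capped at 1000 items
print(total_bytes, worst_case)       # 862920, 3479516 (~3.32 MiB)
```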
