
3.0 architecture revamp/9340/e2e lookup #9405

Merged · 19 commits · Sep 6, 2021

Conversation

@ka-bu (Contributor) commented Aug 20, 2021

Proposed changes:

  • closes [Re-implement featurization] Move to integration branch, adapt Policies, and add checks #9340
  • build the lookup table:
    • lookup table implementation and tests (see rasa.core.featurizers.precomputation)
    • components for preparing and building the lookup table (start and end of the e2e featurization pipeline) (see rasa.core.featurizers.precomputation)
  • using the lookup table:
    • adaptation of SingleStateFeaturizer and TrackerFeaturizer (use the lookup table)
    • adaptation of the unit tests for SingleStateFeaturizer and TrackerFeaturizer, plus a refactoring of the SingleStateFeaturizer unit tests
      • only the SingleStateFeaturizer tests really make use of the lookup table
      • the TrackerFeaturizer tests use a lookup table set to None, which corresponds to using RegexInterpreter() -- exactly matching the tests as they were before (i.e. the tracker featurizer never had tests with a different interpreter)
    • adaptation of TEDPolicy (use the lookup table) and its unit tests
      • the TEDPolicy tests use a lookup table set to None, which corresponds to using RegexInterpreter() -- exactly matching the tests as they were before
  • not the lookup table 😄
    • generalized functionality related to Features and added it to Features (see rasa.shared.nlu.training_data.features)
      • note: this was useful because it made the adaptation of SingleStateFeaturizer and the implementation of the lookup table much cleaner than the existing functionality allowed
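For readers skimming the PR, the core idea of the lookup table can be sketched as a nested mapping from a key attribute (e.g. TEXT, INTENT, ACTION_NAME) and its value to the featurized message, so that policies can look up precomputed features instead of re-running the NLU pipeline per tracker state. The class and method names below are a simplified illustration, not the actual rasa.core.featurizers.precomputation API:

```python
# Simplified sketch of a precomputation lookup table (hypothetical names,
# not the real rasa.core.featurizers.precomputation API).
TEXT, INTENT, ACTION_NAME = "text", "intent", "action_name"
KEY_ATTRIBUTES = (TEXT, INTENT, ACTION_NAME)


class PrecomputationTable:
    def __init__(self) -> None:
        # one sub-table per key attribute: value -> featurized "message" dict
        self._table = {attribute: {} for attribute in KEY_ATTRIBUTES}

    def add(self, message: dict) -> None:
        # each stored message is keyed by exactly one key attribute
        keys = [a for a in KEY_ATTRIBUTES if a in message]
        if len(keys) != 1:
            raise ValueError(f"expected exactly one key attribute, got {keys}")
        key = keys[0]
        self._table[key][message[key]] = message

    def lookup(self, key_attribute: str, value: str):
        return self._table[key_attribute].get(value)


table = PrecomputationTable()
table.add({TEXT: "hello there", "features": ["dense-text-vector"]})
table.add({INTENT: "greet", "features": ["intent-one-hot"]})
print(table.lookup(TEXT, "hello there")["features"])  # ['dense-text-vector']
```

A policy can then fetch features for a state's user text or action name with a dictionary lookup instead of invoking the interpreter.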

Ignore:

  • as always, ignore the "new" modules with leading underscore

Not included (yet):

  • caching (i.e. we don't want to have to run BERT again and again if only the policies change but not the featurization pipeline)
  • tests that compute and use a lookup table on a full bot example (will be covered by regression tests later on; we could add more specific tests of the "end to end" usage, but that requires featurizers etc.)

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@ka-bu ka-bu requested review from JEM-Mosig and wochinge August 23, 2021 12:53
@ka-bu ka-bu marked this pull request as ready for review August 23, 2021 12:53
@ka-bu ka-bu requested a review from a team as a code owner August 23, 2021 12:53
@ka-bu ka-bu requested review from a team and removed request for a team August 23, 2021 12:53
@wochinge (Contributor) left a comment

Wow, what a PR 💯 Thanks for taking this on 🚀 I have to admit that I skipped test_single_state_featurizers and test_features for now. In my opinion, somebody from Research should have a look at these and the related changes in the production code, as they are the code owners for this.

Resolved (outdated) review threads on rasa/core/policies/ted_policy.py (×2)
Review thread on this signature fragment:

        self,
        tracker: DialogueStateTracker,
        domain: Domain,
        interpreter: NaturalLanguageInterpreter,
        precomputations: Optional[CoreFeaturizationPrecomputations] = None,
Contributor:

optional: How about making it a little less vague?

Suggested change:
-    precomputations: Optional[CoreFeaturizationPrecomputations] = None,
+    precomputed_features: Optional[CoreFeaturizationPrecomputations] = None,

@ka-bu (author):

As mentioned before, the precomputations must be messages and not just features, which is why I went with the more generic name.

Contributor:

how about featurized_messages then? 🤔

@ka-bu (author):

tokenized_and_featurized_message ... :D Long term, there could also be classifications in there from more complex recipes? 🤔 And it's not like all features end up in that dictionary, because the SingleStateFeaturizer still creates all the multi-hot-like features, creates sentence features from sequence features, and so on.
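One detail mentioned above, the SingleStateFeaturizer deriving sentence features from sequence features, typically amounts to pooling over the sequence axis. A minimal numpy sketch (the mean-pooling choice here is illustrative, not necessarily what SingleStateFeaturizer actually does):

```python
import numpy as np


def sentence_from_sequence(sequence_features: np.ndarray) -> np.ndarray:
    """Pool a (sequence_length, feature_dim) matrix into a single
    (1, feature_dim) sentence-level feature vector by averaging over tokens."""
    if sequence_features.ndim != 2:
        raise ValueError("expected a (sequence_length, feature_dim) matrix")
    return sequence_features.mean(axis=0, keepdims=True)


seq = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2 tokens, 2 feature dims
print(sentence_from_sequence(seq))  # [[2. 3.]]
```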

Contributor:

okay, makes sense 👍🏻 Should we still stress that this is specific to end-to-end? We can always rename it once it's no longer end-to-end specific.

@ka-bu (author):

The term "end to end" is used in lots of places. Here it was used to point out that the policies work with text directly, right? But we could have, e.g., data with intent names, and then someone decides to use BERT to convert those intent names to dense features. ... Does that make sense, or am I missing something here?

Contributor:

That's true. But in that case we can simply rename this component, no?

@ka-bu (author):

How about MessageContainerForCoreFeaturization ?

Resolved review threads: rasa/core/policies/ted_policy.py, rasa/core/featurizers/tracker_featurizers.py, and tests/core/featurizers/test_precomputation.py (×5, outdated)
Co-authored-by: Tobias Wochinger <[email protected]>
@JEM-Mosig left a comment

Some minor comments. The code changed while I was reviewing, so something might be outdated.

Resolved review threads: rasa/core/featurizers/precomputation.py (×5, outdated) and rasa/shared/nlu/training_data/features.py
@ka-bu ka-bu requested a review from joejuzl August 27, 2021 10:39
@joejuzl (Contributor) left a comment

Reviewed rasa/core/featurizers/precomputation.py and the policy changes.

Really great work! Was a joy to review 💯

Comments are mainly around docstrings and naming.

Resolved (outdated) review threads on rasa/core/featurizers/precomputation.py (×2)
Review thread on this fragment:

        # extract the message
        existing_message = self._table[key_attribute].get(key_value)
        if existing_message is not None:
            if hash(existing_message) != hash(message_with_one_key_attribute):
Contributor:

Out of interest, when could this occur?

@ka-bu (author):

Shouldn't happen at the moment - but could happen if we e.g. some day in the far future merge training data loaded from disk that has been featurized with different featurizers (though I'm not sure the hash really includes knowledge about the actual features there 🤔 )
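The check being discussed can be sketched as: when adding an entry under an existing key, tolerate exact duplicates but reject a genuinely different message. A simplified, hypothetical version (messages are hashable tuples here purely for illustration):

```python
def add_with_collision_check(table: dict, key: str, message: tuple) -> None:
    """Insert message under key; allow exact duplicates, reject conflicts."""
    existing = table.get(key)
    if existing is not None and hash(existing) != hash(message):
        raise ValueError(f"conflicting entries for key {key!r}")
    table[key] = message


table = {}
add_with_collision_check(table, "hello", ("hello", "feat-v1"))
add_with_collision_check(table, "hello", ("hello", "feat-v1"))  # duplicate: fine
try:
    add_with_collision_check(table, "hello", ("hello", "feat-v2"))
except ValueError as e:
    print("rejected:", e)
```

As noted in the thread, whether the hash actually reflects the features depends on how the real Message objects implement hashing.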

Resolved (outdated) review thread on rasa/core/featurizers/precomputation.py

    Args:
        sub_state: substate for which we want to extract the relevant features
        attributes: if not `None`, this specifies the list of the attributes of the
Contributor:

What are the possible values of these attributes? The same as the key attributes?

@ka-bu (author):

These can be arbitrary attributes. If the NLU pipeline has added attributes and features for them, then it is possible that we want to list attributes here which are not key attributes. (Should not happen at the moment -- we add new attributes (TOKENS), but the attribute field in the features should be "TEXT" 🤔 ... Good that you asked! I find this a bit confusing ... Guess we'll definitely add that to the documentation on the featurizers that we've just started here.)

@ka-bu (author):

oh, no actually it is not that confusing because tokens are always stored as TOKEN_NAMES[] and the features then just remember :) .... well, at least it is not confusing as long as we only have a single tokenizer :D
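The per-attribute feature lookup being discussed in this thread can be sketched roughly like this (hypothetical names and table layout; the real method lives in rasa.core.featurizers.precomputation):

```python
def collect_features(table: dict, sub_state: dict, attributes=None) -> dict:
    """For each requested attribute present in the sub-state, look up the
    stored message and return its features, grouped by attribute.

    table: key attribute -> value -> featurized "message" dict (illustrative).
    attributes: if None, fall back to all attributes present in the sub-state.
    """
    attributes = attributes if attributes is not None else list(sub_state)
    features = {}
    for attribute in attributes:
        value = sub_state.get(attribute)
        if value is None:
            continue
        message = table.get(attribute, {}).get(value)
        if message is not None:
            features[attribute] = message.get("features", [])
    return features


table = {"text": {"hi": {"features": ["dense-vec"]}}}
print(collect_features(table, {"text": "hi", "intent": "greet"}))
# {'text': ['dense-vec']}
```

Attributes with no entry in the table (like "intent" above) are simply skipped rather than raising, matching the idea that not every attribute has precomputed features.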

Resolved (outdated) review threads on rasa/core/featurizers/precomputation.py (×2)
Review thread on this fragment:

        )
        container.derive_messages_from_events_and_add(events=all_events)

        # Reminder: in case of complex recipes that train CountVectorizers, we'll have
Contributor:

Should this not be fixed in the CountVectorizer?

@ka-bu (author):

True - Shall I remove that reminder, or shall we keep it until that is fixed there?

Contributor:

Do you know if there is an issue open for it yet? If not we should create one.

@ka-bu (author):

I think @aeshky is working on migrating the count vectorizer -- we can just fix it there (i.e. have the count vectorizer not break when it hasn't been trained, but instead just not add any features to the message, like other components do), no?

Contributor:

Yes - sounds good.
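The behavior agreed on here (an untrained featurizer should not crash, just add no features) could look roughly like this; CountVectorizerSketch and its attributes are an illustrative stand-in, not the real Rasa component:

```python
class CountVectorizerSketch:
    """Illustrative stand-in for a trainable featurizer component."""

    def __init__(self) -> None:
        self.vocabulary = None  # only set during training

    def train(self, texts) -> None:
        self.vocabulary = sorted({tok for text in texts for tok in text.split()})

    def process(self, message: dict) -> dict:
        # untrained: leave the message untouched instead of raising
        if self.vocabulary is None:
            return message
        counts = [message["text"].split().count(tok) for tok in self.vocabulary]
        message.setdefault("features", []).append(counts)
        return message


cv = CountVectorizerSketch()
msg = cv.process({"text": "hello world"})  # untrained: no features, no crash
cv.train(["hello world", "hello there"])
print(cv.process({"text": "hello hello"})["features"])
```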

Resolved (outdated) review thread on rasa/core/featurizers/precomputation.py
@ka-bu ka-bu requested a review from joejuzl September 1, 2021 16:44
@joejuzl (Contributor) left a comment

Looks good to me, once conflicts are dealt with!

@ka-bu ka-bu enabled auto-merge (squash) September 6, 2021 08:26
@ka-bu ka-bu merged commit 19fb70a into main Sep 6, 2021
@ka-bu ka-bu deleted the 3.0-architecture-revamp/9340/e2e-lookup branch September 6, 2021 17:36
aeshky added a commit that referenced this pull request Sep 7, 2021
…https://github.com/RasaHQ/rasa into 3.0-architecture-revamp/9330/NLUTrainingDataProvider

* '3.0-architecture-revamp/9330/NLUTrainingDataProvider' of https://github.com/RasaHQ/rasa:
  3.0 architecture revamp/9340/e2e lookup (#9405)
  Fixed automatic importing of mitie (#9482)
  narrow scopes a bit more
  try with narrower scope to release memory
  Update branch despite failing check runs (#9541)
  Use concurrency group to cancel workflows (#9540)
  Fix team name (#9544)
ErickGiffoni pushed a commit to FGA-GCES/rasa that referenced this pull request Sep 9, 2021
Co-authored-by: Tobias Wochinger <[email protected]>
Co-authored-by: Joe Juzl <[email protected]>
4 participants · merging this pull request closes: [Re-implement featurization] Move to integration branch, adapt Policies, and add checks (#9340)