Meta entities #3889

amn41 · 2019-06-28T05:15:24Z

Proposed changes:
This is experimental and may well not make it into the codebase.
Looking at treating entity 'roles' as just another NER problem.

Idea for doing this: use 2 CRFs

Data format includes roles:

heading for [Buenos Aires](location@destination) and flying out of [Quito](location@origin)
I need [2](number@rooms) rooms for [5](number@guests) people.

Composite entities get grouped using numerical roles:

give me a [small](size@1) pizza with [mushrooms](topping@1) and a [large](size@2) [pepperoni](topping@2)
show me some [black](color@1) [shoes](item@1) and a [grey](color@2) [suit](item@2)

Pt 1: Entity recognition

remove role annotations (the @xxx part) so it’s just

heading for [Buenos Aires](location) and flying out of [Quito](location)

then train the CRF as normal.
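A minimal sketch of this preprocessing step, assuming the `[text](type@role)` annotation shown above. The pattern `ANNOTATION` and the helper `strip_roles` are hypothetical names for illustration, not code from this branch:

```python
import re

# Hypothetical pattern for `[text](type)` with an optional `@role` suffix.
ANNOTATION = re.compile(
    r"\[(?P<text>[^\]]+)\]\((?P<type>[^@)]+)(?:@(?P<role>[^)]+))?\)"
)


def strip_roles(example: str) -> str:
    """Drop the `@role` part so only `[text](type)` remains."""
    return ANNOTATION.sub(
        lambda m: "[{}]({})".format(m.group("text"), m.group("type")), example
    )


stripped = strip_roles(
    "heading for [Buenos Aires](location@destination) "
    "and flying out of [Quito](location@origin)"
)
print(stripped)
```

The stripped examples can then go through the existing entity training path unchanged.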

Pt 2: role recognition

remove entity values and replace with entity type:

heading for [location](destination) and flying out of [location](origin)

train a second CRF just to pick up the destination and origin ‘entities’
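A rough sketch of how the role training examples could be derived, again assuming the `[text](type@role)` annotation. `to_role_example` is a hypothetical helper, and turning an entity without a role into plain text is an assumption, since the notes above do not cover that case:

```python
import re

# Hypothetical pattern for `[text](type)` with an optional `@role` suffix.
ANNOTATION = re.compile(
    r"\[(?P<text>[^\]]+)\]\((?P<type>[^@)]+)(?:@(?P<role>[^)]+))?\)"
)


def to_role_example(example: str) -> str:
    """Rewrite `[text](type@role)` as `[type](role)`."""

    def rewrite(match):
        entity_type, role = match.group("type"), match.group("role")
        if role is None:
            # Assumption: an entity without a role becomes plain text.
            return entity_type
        return "[{}]({})".format(entity_type, role)

    return ANNOTATION.sub(rewrite, example)


role_example = to_role_example(
    "heading for [Buenos Aires](location@destination) "
    "and flying out of [Quito](location@origin)"
)
print(role_example)
```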

Pt 3: recombine at prediction time
Status (please check what you already did):

made PR ready for code review
added some tests for the functionality
updated the documentation
updated the changelog
reformat files using black (please check Readme for instructions)


tabergma · 2019-07-10T14:52:55Z

@amn41 Can you please explain in a couple of sentences how your approach works? Looking at the code, I'm not 100% sure I can follow it. How do you distinguish between entities and roles? It seems like roles are marked with '@' in the training data. What does the 'Message' look like in the end? Is it correct that 'roles' are added to the 'entities'? How do you know which role belongs to which entity? Some explanation would help to understand the PR better. Thanks.

amn41 · 2019-07-10T15:14:24Z

Sorry, I thought I'd included the notes, but they were only on Slack. Updated the description.

tabergma

I like the idea of adding a second CRF for the role detection 👍 I'm curious to see how well it actually works, especially in the cases where you group stuff (give me a [small](size@1) pizza with [mushrooms](topping@1) and a [large](size@2) [pepperoni](topping@2)). I think using @ is a good way to format it in markdown.
We should also think about

Testing the approach: How well does the CRF pick up the different roles?
How are we going to add it to the other training data formats we have?
I guess, the roles should also be accessible in actions. How are we doing that? Same way as for entities?

Other than that, I added some comments to the code itself.

tabergma · 2019-07-11T07:55:22Z

rasa/nlu/extractors/crf_role_extractor.py

+                "for details".format(message.text)
+            )
+
+    def process(self, message: Message, **kwargs: Any) -> None:


As far as I can tell, only the methods train, process , and extract_roles are different from the CRFEntityExtractor class. To avoid code duplication, we should create an abstract class CRFExtractor and inherit from that in CRFEntityExtractor and CRFRoleExtractor (or reuse the class EntityExtractor).
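One possible shape for that refactoring, as a sketch only. The class names come from the comment above; the method bodies, the plain `dict` standing in for `Message`, and the callable `ent_tagger` are simplifying assumptions, not the real Rasa interfaces:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class CRFExtractor(ABC):
    """Hypothetical base class holding what both extractors share."""

    def __init__(self) -> None:
        # Assumption: any callable mapping features to one label per token.
        self.ent_tagger = None

    def _sentence_to_features(self, tokens: List[str]) -> List[Dict[str, Any]]:
        # Stand-in for the shared feature code; the real features are richer.
        return [{"low": t.lower(), "title": t.istitle()} for t in tokens]

    def _predict(self, text: str) -> List[str]:
        """Shared prediction path: tokenize, featurize, tag."""
        tokens = text.split()
        features = self._sentence_to_features(tokens)
        if self.ent_tagger is None:
            return ["O"] * len(tokens)  # untrained: nothing is tagged
        return self.ent_tagger(features)

    @abstractmethod
    def process(self, message: Dict[str, Any]) -> None:
        """Subclasses decide what to extract and where to store it."""


class CRFEntityExtractor(CRFExtractor):
    def process(self, message: Dict[str, Any]) -> None:
        message["entities"] = self._predict(message["text"])


class CRFRoleExtractor(CRFExtractor):
    def process(self, message: Dict[str, Any]) -> None:
        message["roles"] = self._predict(message["text"])
```

With this layout only `train`, `process`, and the role-specific extraction would need to live in the subclasses.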

tabergma · 2019-07-11T08:21:02Z

rasa/nlu/training_data/formats/markdown.py

+#    r"\[(?P<entity_text>[^\]]+)" r"\]\((?P<entity>[^:)]*?)" r"(?:\:(?P<value>[^)]+))?\)"
+#)
+
+# regex for: `[entity_text](entity_type(^entity_role)?)`


Shouldn't it be an @ instead of the ^?

# regex for: `[entity_text](entity_type(@entity_role)?)`

tabergma · 2019-07-11T08:21:57Z

rasa/nlu/training_data/formats/markdown.py

@@ -19,10 +19,16 @@
 available_sections = [INTENT, SYNONYM, REGEX, LOOKUP]

 # regex for: `[entity_text](entity_type(:entity_synonym)?)`
+#ent_regex = re.compile(


Does this mean we cannot define synonyms anymore?

tabergma · 2019-07-12T12:21:59Z

rasa/nlu/training_data/formats/markdown.py


            start_index = match.start() - offset
            end_index = start_index + len(entity_text)
            offset += len(match.group(0)) - len(entity_text)

-            entity = build_entity(start_index, end_index, entity_value, entity_type)
+            entity = build_entity(start_index, end_index, entity_value, entity_type, role=entity_role)


entity_role is unassigned if no role was found; it should be set to None by default.
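A sketch of a parser where the role falls back to `None` and synonyms keep working. The pattern below is an assumption that combines the existing `:synonym` syntax with the proposed `@role` suffix; `parse_entities` and the returned dict layout are hypothetical and simpler than `build_entity`:

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical pattern accepting `[text](type)`, `[text](type:synonym)`,
# `[text](type@role)` and `[text](type:synonym@role)`.
ent_regex = re.compile(
    r"\[(?P<entity_text>[^\]]+)\]"
    r"\((?P<entity>[^:@)]*?)"
    r"(?::(?P<value>[^@)]+))?"
    r"(?:@(?P<role>[^)]+))?\)"
)


def parse_entities(example: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Return the plain text and the entities with offsets into it."""
    entities = []
    offset = 0
    for match in ent_regex.finditer(example):
        entity_text = match.group("entity_text")
        # An optional group that did not match yields None,
        # so the role is None by default.
        entity_role = match.group("role")
        entity_value = match.group("value") or entity_text
        start_index = match.start() - offset
        end_index = start_index + len(entity_text)
        offset += len(match.group(0)) - len(entity_text)
        entities.append(
            {
                "start": start_index,
                "end": end_index,
                "value": entity_value,
                "entity": match.group("entity"),
                "role": entity_role,
            }
        )
    plain_text = ent_regex.sub(lambda m: m.group("entity_text"), example)
    return plain_text, entities


plain, ents = parse_entities(
    "flying from [NYC](location:New York@origin) to [Quito](location)"
)
```

Reading the role from a named group avoids the unassigned variable, because every annotation yields either a string or `None`.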

tabergma · 2019-07-12T12:33:19Z

rasa/nlu/extractors/crf_role_extractor.py

+            text_data = self._from_text_to_crf(role_message)
+            features = self._sentence_to_features(text_data)
+            ents = self.ent_tagger.predict_marginals_single(features)
+            formatted = self._from_crf_to_json(role_message, ents)


tabergma · 2019-07-12T13:14:09Z

rasa/nlu/extractors/__init__.py

+               ent_with_role = ent.copy()
+               ent_with_role["role"] = starts[start_idx]["entity"]
+               entities.append(ent_with_role)
+           else:


I guess we don't want the else statement. Currently, if no role was detected, we get:

"entities": [
    { "start": x, "end": y, "value": "Buenos Aires", "entity": "location", "confidence": 0.9792075801341894, "extractor": "CRFEntityExtractor" }
],
...
"roles": [
    { "start": x, "end": y, "value": "Buenos Aires", "entity": "location", "confidence": 0.9792075801341894, "extractor": "CRFEntityExtractor" }
],

The roles list is just a duplicate of entities. Shouldn't roles only contain "entities" that are actual roles?
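A sketch of the recombination without the else branch: an entity only gets a "role" key when the role tagger found one at the same start offset. `attach_roles` is a hypothetical helper, and it assumes the role offsets have already been mapped back to the original text:

```python
from typing import Any, Dict, List


def attach_roles(
    entities: List[Dict[str, Any]], roles: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Copy each entity and add "role" only where a role was detected."""
    starts = {role["start"]: role for role in roles}
    combined = []
    for ent in entities:
        ent_with_role = dict(ent)
        role = starts.get(ent["start"])
        if role is not None:
            ent_with_role["role"] = role["entity"]
        combined.append(ent_with_role)
    return combined


entities = [
    {"start": 12, "end": 24, "value": "Buenos Aires", "entity": "location"},
    {"start": 43, "end": 48, "value": "Quito", "entity": "location"},
]
# Only the first entity was assigned a role in this made-up example.
roles = [{"start": 12, "end": 24, "value": "location", "entity": "destination"}]
combined = attach_roles(entities, roles)
```

Entities without a detected role pass through unchanged, so nothing is duplicated into a separate roles list.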

tabergma · 2019-07-12T13:16:00Z

rasa/nlu/extractors/__init__.py

+               role_ent = ent.copy()
+               role_ent["entity"] = ent["role"]
+               role_ent["value"] = ent["entity"]
+               # TODO update start and end values


So would you simply replace Buenos Aires with location and then just obtain the start and end values for location?

tabergma · 2019-07-12T13:16:36Z

rasa/nlu/extractors/crf_role_extractor.py

+        self._check_spacy_doc(message)
+
+        extracted = self.extract_roles(message)
+        #extracted = self.add_extractor_name(self.extract_roles(message))


The commented-out line can be removed.

tabergma · 2019-07-12T13:21:42Z

rasa/nlu/extractors/crf_role_extractor.py

+        """Take a sentence and return roles in json format"""
+
+        if self.ent_tagger is not None:            
+            role_message = self.replace_entities_with_roles(message)


I guess this will not work at prediction time. We want to end up with a sentence like heading for location and flying out of location. However, the function replace_entities_with_roles would produce heading for [location](destination) and flying out of [location](origin). It is not replacing the actual text, is it? Or am I missing something?
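A sketch of what the prediction-time replacement could look like: swap each predicted entity value for its type in the raw text and record where each type ends up, so predicted roles can be mapped back to the original entities. `replace_entities_with_types` is a hypothetical name, and the sketch assumes non-overlapping entities:

```python
from typing import Any, Dict, List, Tuple


def replace_entities_with_types(
    text: str, entities: List[Dict[str, Any]]
) -> Tuple[str, List[Tuple[int, int]]]:
    """Return the rewritten text and the new (start, end) span per entity."""
    pieces = []
    spans = []
    cursor = 0  # position in the original text
    length = 0  # length of the rewritten text so far
    for ent in sorted(entities, key=lambda e: e["start"]):
        before = text[cursor:ent["start"]]
        pieces.append(before)
        length += len(before)
        spans.append((length, length + len(ent["entity"])))
        pieces.append(ent["entity"])
        length += len(ent["entity"])
        cursor = ent["end"]
    pieces.append(text[cursor:])
    return "".join(pieces), spans


text = "heading for Buenos Aires and flying out of Quito"
entities = [
    {"start": text.index(v), "end": text.index(v) + len(v), "value": v, "entity": "location"}
    for v in ("Buenos Aires", "Quito")
]
role_text, spans = replace_entities_with_types(text, entities)
```

The returned spans line up with the sorted entities, which also gives a way to update the start and end values when building the role message.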

amn41 · 2019-07-12T14:14:34Z

Thanks @tabergma! Sorry, I should have clarified that the code is terrible and I was mostly looking for feedback on the approach :)

Testing the approach: How well does the CRF pick up the different roles?

I created some training data to test it myself and it seemed to work, but the dataset was much too small to be conclusive. We'd have to put some more work into that.

How are we going to add it to the other training data formats we have?

This is a good question, especially if you have both entity synonyms AND roles. Maybe we should switch to having a dict, like: I am going to [Buenos Aires]{"entity": "location", "role": "destination"}
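A sketch of how such a dict-based annotation could be parsed, assuming the attributes are a flat JSON object directly after the bracketed text. `parse_dict_annotations` and the optional "value" key for synonyms are hypothetical; only "entity" and "role" appear in the example above:

```python
import json
import re
from typing import Any, Dict, List, Tuple

# Hypothetical pattern: `[text]` followed by a flat JSON object.
DICT_ANNOTATION = re.compile(r"\[(?P<text>[^\]]+)\](?P<attrs>\{[^}]*\})")


def parse_dict_annotations(example: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Return the plain text and the annotated entities."""
    entities = []
    offset = 0
    for match in DICT_ANNOTATION.finditer(example):
        attrs = json.loads(match.group("attrs"))
        text = match.group("text")
        start = match.start() - offset
        offset += len(match.group(0)) - len(text)
        entities.append(
            {
                "start": start,
                "end": start + len(text),
                # Assumption: a "value" key would carry the synonym.
                "value": attrs.get("value", text),
                "entity": attrs["entity"],
                "role": attrs.get("role"),
            }
        )
    plain_text = DICT_ANNOTATION.sub(lambda m: m.group("text"), example)
    return plain_text, entities


plain, ents = parse_dict_annotations(
    'I am going to [Buenos Aires]{"entity": "location", "role": "destination"}'
)
```

A dict leaves room for synonyms, roles, and later attributes without inventing a new separator character each time.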

I guess, the roles should also be accessible in actions. How are we doing that? Same way as for entities?

Yes, I think it would become part of the Message class.

ufukhurriyetoglu · 2019-09-14T20:14:24Z

This interpretation of composite entities and the proposed solution look similar to Snips NLU entity/slot recognition. Having a look may be helpful. I have a concern about the interpretation of composite entities. A composite entity is a composition of entities of different types as child entities. This is how composite entities are described and used by LUIS. So the separate entities compose semantically and behave like a single entity. But in the given example, different entities are labeled by their semantic roles, which is something different. Thanks for the awesome Rasa library :)

tmbo · 2019-09-30T21:27:06Z

@amn41 what is the plan for this - is someone working on this?

tabergma · 2019-10-01T06:24:57Z

I guess the plan is to tackle this once the update of our NER is done. I would close this PR and keep the branch.

tmbo · 2019-10-01T07:25:14Z

Sounds good

cyrilthank · 2019-10-04T09:34:33Z

Hi @tmbo @tabergma @JustinaPetr, my organization would like to leverage our existing partnership with Rasa to work together on this for a custom solution.

Can you please advise how best we could take this forward?

amn41 added 3 commits June 27, 2019 17:18

wip - add methods to entityextractor base class for adding and removi…

5ef0b05

…ng roles to/from entities

wip add component for role labeling

d6d8d62

wip markdown regex which supports extracting roles

88c3f2f

tabergma self-requested a review July 9, 2019 15:32

amn41 mentioned this pull request Jul 9, 2019

Composite Entities as part of Rasa NLU #3765

Closed

tabergma reviewed Jul 12, 2019

tmbo added the status:stale label Sep 30, 2019

stale bot removed the status:stale label Oct 1, 2019

tmbo closed this Oct 1, 2019

tmbo deleted the meta-entities branch November 22, 2023 19:35
