Entities
iKnow's primary function is to identify the phrase boundaries that define Entities. It does this entirely from the syntactic structure of the sentences, rather than relying on an upfront dictionary or a pre-trained model. This makes iKnow well-suited for the initial exploration of a new corpus, since no domain dictionary or training data has to be prepared first.
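To illustrate how boundaries can come from structural words rather than from a list of known entities, here is a deliberately naive sketch. It is **not** iKnow's algorithm: it uses stop-word-delimited chunking (the candidate-phrase step of RAKE), and the `BOUNDARY_WORDS` set and `candidate_phrases` function are hypothetical names chosen for this example. Text is split wherever one of a small, closed set of function words appears, and the remaining runs of words become candidate phrases.

```python
import re

# Hypothetical closed list of function words used as boundary markers.
# This is an illustration only; iKnow's language models are far richer.
BOUNDARY_WORDS = {"a", "an", "the", "is", "are", "across", "for", "of", "its", "in", "to", "and"}


def candidate_phrases(text: str) -> list[str]:
    """Split text into runs of consecutive words that contain no boundary word."""
    # Words are letters, optionally with internal hyphens (e.g. "well-known").
    words = re.findall(r"[A-Za-z][A-Za-z\-]*", text)
    phrases: list[str] = []
    current: list[str] = []
    for word in words:
        if word.lower() in BOUNDARY_WORDS:
            # A boundary word closes the phrase being built, if there is one.
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(word)
    # Flush the last phrase at the end of the text.
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    sentence = "Belgian geuze is well-known across the continent for its delicate balance."
    print(candidate_phrases(sentence))
```

On the sample sentence used later on this page, this toy yields `Belgian geuze`, `well-known`, `continent` and `delicate balance`. A flat word list has no way to decide whether `well-known` belongs with the surrounding verb phrase; that kind of decision is what structure-aware rules are for.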
iKnow Entities are not Named Entities in the NER (Named Entity Recognition) sense. Rather, they are the groups of words that must be read together to capture, in its entirety, a concept or relationship as the author expressed it. The following examples show why this phrase level matters for fully capturing what the author meant:
iKnow Entity | Meaning |
---|---|
Dopamine | small molecule |
Dopamine receptor | drug target |
Dopamine receptor antagonist | chemical drug |
Dopamine receptor gene | gene, molecular sequence |
Dopamine receptor gene mutation | physiological process |
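Treating the whole entity, rather than the single word, as the unit of analysis changes what even a simple frequency count reports. A minimal sketch, assuming the entity strings have already been extracted (here they are simply the five rows of the table above): counting individual words lumps every row under "dopamine", while counting whole entities keeps the five distinct meanings apart.

```python
from collections import Counter

# The five entities from the table above, taken as already-extracted phrases.
entities = [
    "Dopamine",
    "Dopamine receptor",
    "Dopamine receptor antagonist",
    "Dopamine receptor gene",
    "Dopamine receptor gene mutation",
]

# Word-level view: every phrase is broken into lowercase tokens.
word_counts = Counter(word.lower() for entity in entities for word in entity.split())

# Entity-level view: each phrase is counted as one unit.
entity_counts = Counter(entity.lower() for entity in entities)

print(word_counts.most_common(3))  # [('dopamine', 5), ('receptor', 4), ('gene', 2)]
print(len(entity_counts))          # 5
```

At word level, "dopamine" dominates even though only one row actually denotes the small molecule; at entity level, the drug target, the chemical drug, the gene and the physiological process each remain separate items.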
iKnow labels every entity with one of four simple roles. Most entities are either a Concept (usually corresponding to a noun phrase in part-of-speech terms) or a Relation (verbs, prepositions, ...). Typical stop words that carry little meaning of their own are categorized as PathRelevant (e.g. pronouns) or NonRelevant, depending on whether they play a role in the sentence structure or are just linguistic fodder.
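The four roles map naturally onto a small data model. The sketch below is hypothetical: the `Role` enum, the `Entity` class, the helper functions and the hand-labelled sentence are illustrative assumptions, not iKnow's API and not real engine output. It keeps Concepts and Relations as the content-bearing entities and renders a sentence with each entity tagged by its role.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """The four entity roles described above."""
    CONCEPT = "Concept"
    RELATION = "Relation"
    PATH_RELEVANT = "PathRelevant"
    NON_RELEVANT = "NonRelevant"


@dataclass(frozen=True)
class Entity:
    """One entity: its surface text and its role (hypothetical model)."""
    text: str
    role: Role


def content_entities(entities: list[Entity]) -> list[Entity]:
    """Keep only Concepts and Relations, the entities that carry meaning."""
    return [e for e in entities if e.role in (Role.CONCEPT, Role.RELATION)]


def markup(entities: list[Entity]) -> str:
    """Render a sentence with each entity wrapped in a tag naming its role."""
    return " ".join(f"<{e.role.value}>{e.text}</{e.role.value}>" for e in entities)


# Hand-labelled example following the role descriptions above; not iKnow output.
sentence = [
    Entity("It", Role.PATH_RELEVANT),
    Entity("blocks", Role.RELATION),
    Entity("the", Role.NON_RELEVANT),
    Entity("dopamine receptor", Role.CONCEPT),
]

print([e.text for e in content_entities(sentence)])  # ['blocks', 'dopamine receptor']
print(markup(sentence))
```

Dropping PathRelevant and NonRelevant entities leaves the skeleton of the statement (`blocks`, `dopamine receptor`); keeping PathRelevant entities as well would preserve information such as the pronoun that links this sentence to an earlier one.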
In the following sample sentence, we've highlighted Concepts, Relations and PathRelevants separately.
Belgian geuze is well-known across the continent for its delicate balance.