Georgian: numeral entities are incorrectly resolved #439

Open
Bagdu opened this issue Dec 5, 2019 · 2 comments

Bagdu commented Dec 5, 2019

Hi, I am working on Duckling and I have the following problem.

Duckling doesn't find 'ten' in, for example, 'ptenz', because 'ten' is just a substring there; the same holds for Russian. But in Georgian, where 'ათი' means ten, Duckling finds 10 in every word that contains 'ათი' as a substring. For example, this is the output for the text 'პათიზ':

[ { "body": "ათი", "start": 1, "value": { "value": 10, "type": "value" }, "end": 4, "dim": "number", "latent": false } ]

I commented out all the rules and left only the one that maps 'ათი' to ten, but I still get the same result. So I think the rules are not the cause. Could it be a problem with the encoding?

Contributor

chessai commented Nov 6, 2020

Hi! Thanks for the helpful issue. Some questions:

  1. Did you regenerate the classifiers after commenting out all those rules? You will typically want to regenerate the classifiers after making changes to rules/corpora.
  2. If your locale is set to Russian (e.g. via makeLocale RU Nothing), Duckling will not find the numeral 10 in the string ptenz, because ten should only resolve to the numeral 10 when the locale is EN (English).

Here is an example showing that you have indeed uncovered an issue, because EN does not behave this way (nor should it):

> debug (makeLocale EN Nothing) "ptenz" [Seal Numeral]
[]
> debug (makeLocale KA Nothing) "პათიზ" [Seal Numeral]
integer (0..19) (ათი)
-- regex (ათი)
[Entity {dim = "number", body = "\4304\4311\4312", value = RVal Numeral (NumeralValue {vValue = 10.0}), start = 1, end = 4, latent = False, enode = Node {nodeRange = Range 1 4, token = Token Numeral (NumeralData {value = 10.0, grain = Nothing, multipliable = False, okForAnyTime = True}), children = [Node {nodeRange = Range 1 4, token = Token RegexMatch (GroupMatch ["\4304\4311\4312",""]), children = [], rule = Nothing}], rule = Just "integer (0..19)"}}]
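
For comparison, here is a minimal sketch of the kind of boundary check that would reject this match. It is not Duckling's actual implementation: the names occurrences, standalone and wholeWordMatches are hypothetical, and it assumes that a match only counts when neither neighbouring character is a letter according to Data.Char.isAlpha.

```haskell
import Data.Char (isAlpha)
import Data.List (isPrefixOf, tails)

-- Hypothetical helper: every (start, end) character offset at which
-- the needle occurs in the text, with no boundary check at all.
occurrences :: String -> String -> [(Int, Int)]
occurrences needle text =
  [ (i, i + length needle)
  | (i, rest) <- zip [0 ..] (tails text)
  , needle `isPrefixOf` rest
  ]

-- Assumption: a match is standalone when the characters immediately
-- before and after it are not letters (or do not exist).
standalone :: String -> (Int, Int) -> Bool
standalone text (start, end) =
  not (letterAt (start - 1)) && not (letterAt end)
  where
    letterAt i = i >= 0 && i < length text && isAlpha (text !! i)

-- Keep only the occurrences that pass the boundary check.
wholeWordMatches :: String -> String -> [(Int, Int)]
wholeWordMatches needle text =
  filter (standalone text) (occurrences needle text)

main :: IO ()
main = do
  print (occurrences "ათი" "პათიზ")       -- [(1,4)], the range reported above
  print (wholeWordMatches "ათი" "პათიზ")  -- [], rejected: letters on both sides
  print (wholeWordMatches "ათი" "ათი")    -- [(0,3)], accepted as a whole word
```

The unchecked search reproduces the Range 1 4 seen in the debug output, while the checked version only accepts 'ათი' when it stands alone. Whether Duckling's own range validation classifies Georgian letters the same way as isAlpha does here is not something this sketch establishes.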

@chessai changed the title from "Problem in Georgian language" to "Georgian: numeral entities are incorrectly resolved" on Nov 6, 2020
Author

Bagdu commented Nov 6, 2020

Hi! Thank you for your response.
On the first question: it was a long time ago, but as far as I remember, I regenerated the classifiers after commenting out all those rules.
On the second question: when I was testing, I had set the locale to English, so that's not the case either.
Maybe issue #442 can help you.
