Problem parsing rules in Freebase #21

Open
lgalarra opened this issue May 29, 2020 · 3 comments

@lgalarra
Collaborator

Hi,

AMIE cannot parse rules such as:

?a /film/actor/film./film/performance/film /m/0340hj => ?a neg_/award/award_nominee/award_nominations./award/award_nomination/award

I have traced the problem to the method rules in KB.java, and I am proposing a new regex for parsing. Note that typed literals are still not supported. My fix is available in #19.

Cheers,
Luis

@falcaopetri
Contributor

Hi,

PR #19 broke some experiments of mine (nothing too bad), so I started debugging.
@lgalarra, could you confirm that the new regex is able to parse your example:

?a /film/actor/film./film/performance/film /m/0340hj => 
?a neg_/award/award_nominee/award_nominations./award/award_nomination/award

It seems that the numbers are not being parsed correctly (0340hj, in this case). The same thing happened with some of my triples. What I get here is:

?a /film/actor/film./film/performance/film /m/ => 
?a neg_/award/award_nominee/award_nominations./award/award_nomination/award

I've included some tests in PR #22. I hope they help catch these things in the future. I'm not very good with regex, so I'd be glad if you could check this issue with numbers.

Also, AMIEParser recognizes the following "rule" in AMIE's output: Lossless (query refinement => ) heuristics enabled. Should I just remove the "header" from AMIE's output, or is AMIEParser supposed to work with it? Apparently, the previous regex didn't capture it as a rule pattern.

Regards,
Antonio.

lajus pushed a commit that referenced this issue Jun 6, 2020
Tests for rule parsing + small bug fix - #21
@lajus
Contributor

lajus commented Jun 6, 2020

Hi,

The regexp should now be fixed (according to @falcaopetri's test cases, thanks a lot for those).

The problem was in the URI pattern: @lgalarra removed the "\w" that was necessary to match numbers (\p{L} only matches letters).
We may want to consider: using \p{Nd} instead of \w to match numbers (pure Unicode decimal digits)? Allowing Unicode punctuation (’ U+2019)? Making triplePattern consistent with amieTriplePattern?
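
To illustrate the difference, here is a minimal sketch. The two patterns are simplified, hypothetical stand-ins (not the actual regex in KB.java), and the class and method names are made up for this example. A character class built only on \p{L} stops at the first digit, which would explain /m/0340hj being cut down to /m/:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UriTokenDemo {
    // Hypothetical, simplified token patterns; NOT the real patterns from KB.java.
    // Letters, '/', '.' and '_' only: digits are not part of the class.
    static final Pattern LETTERS_ONLY = Pattern.compile("[\\p{L}/._]+");
    // Same class, extended with Unicode decimal digits (\p{Nd}).
    static final Pattern WITH_DIGITS = Pattern.compile("[\\p{L}\\p{Nd}/._]+");

    // Returns the prefix of the token that the pattern matches, or "" if none.
    static String matchedPrefix(Pattern p, String token) {
        Matcher m = p.matcher(token);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        String token = "/m/0340hj";
        System.out.println(matchedPrefix(LETTERS_ONLY, token)); // prints "/m/"
        System.out.println(matchedPrefix(WITH_DIGITS, token));  // prints "/m/0340hj"
    }
}
```

In Java, \w matches [a-zA-Z_0-9] by default, so putting it back also covers ASCII digits; \p{Nd} would additionally accept non-ASCII decimal digits.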

@falcaopetri
Contributor

Hi,

I would just like to point out that KB.triples still parses Lossless (query refinement) heuristics enabled as the triples: { Lossless (query refinement , ) heuristics enabled }.

KB.rule returns null though, since the string contains neither :- nor =>.
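
As a possible workaround on the caller's side, the same separator check could be applied before handing a line to the triple parser. This is only a sketch: RuleLineFilter and looksLikeRule are hypothetical names, not part of AMIE, and the check is deliberately naive.

```java
class RuleLineFilter {
    // Hypothetical helper, not part of AMIE: true if the line contains
    // one of the two rule separators mentioned above ("=>" or ":-").
    static boolean looksLikeRule(String line) {
        return line.contains("=>") || line.contains(":-");
    }

    public static void main(String[] args) {
        String header = "Lossless (query refinement) heuristics enabled";
        String rule = "?a /film/actor/film./film/performance/film /m/0340hj => "
                + "?a neg_/award/award_nominee/award_nominations./award/award_nomination/award";
        System.out.println(looksLikeRule(header)); // prints "false"
        System.out.println(looksLikeRule(rule));   // prints "true"
    }
}
```

Note that this check alone would not reject the header variant quoted earlier in this thread, Lossless (query refinement => ) heuristics enabled, because that string itself contains =>. Stripping the header from AMIE's output before parsing may still be the simpler option.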
