Mining with oneVar bias - Exception and Explanation #29

Open
kliegr opened this issue Jun 26, 2020 · 4 comments

kliegr commented Jun 26, 2020

Exception

When mining rules with the following command:

java -jar amie-dev.jar  -minis 1  -minc 0.1  -maxad 5  -bias oneVar sample4.tsv

I get

 Starting the mining phase... Exception in thread "main" java.lang.IllegalArgumentException: Invalid combination of variables in ?x <https://w3id.org/biolink/vocab/interacts_with> ?z  pos1 = 1 pos2=1
    at amie.data.KB.resultsTwoVariablesByPos(KB.java:1603)
    at amie.data.KB.frequentBindingsOf(KB.java:2753)
    at amie.mining.assistant.MiningAssistant.getInitialAtoms(MiningAssistant.java:525)

This works in amie_plus:

java -jar amie_plus.jar  -minis 1  -minc 0.1  -maxad 5  -bias oneVar sample4.tsv

And returns three rules, such as:

 ?b  <https://w3id.org/biolink/vocab/interacts_with>  ?f  ?j  <https://w3id.org/biolink/vocab/interacts_with>  ?f  ?a  <https://w3id.org/biolink/vocab/interacts_with>  ?j   => ?a  <https://w3id.org/biolink/vocab/interacts_with>  ?b      0.333333333     1       1       3       3       3       ?a      0.0     0.0     0.0

Explanation

Additionally, could you explain what the oneVar option does, and why AMIE+ returns no rules by default (i.e., when the oneVar option is omitted)? I also had a hard time finding paths in the input KG that match some of the rules generated by AMIE+ with the oneVar option. For debugging purposes, I created a small TSV dataset, sample3debug.tsv, consisting of three triples:

  <C>	<rel>	<B>
  <D>	<rel>	<B>
  <E>	<rel>	<B>

Running

 java -jar amie_plus.jar  -minis 1  -minc 0.1  -maxad 5  -bias oneVar sample4debug.tsv

gives me:

Rule    Head Coverage   Std Confidence  PCA Confidence  Positive Examples       Body size       PCA Body size   Functional variable     Std. Lower Bound        PCA Lower Bound PCA Conf estimation
?i  <rel>  ?b  ?a  <rel>  ?f  ?i  <rel>  ?f   => ?a  <rel>  ?b  1       1       1       3       3       3       ?a      0.0     0.0     0.0

I do not understand how this rule can positively cover three instances in the dataset (cf. the following illustration). I am probably missing something or misinterpreting the rule.

oneVarResult
sample4.tsv.zip

Contributor

lajus commented Jun 26, 2020

Hi,

The oneVar mining assistant is legacy code and has known inconsistencies. I have not found the time to look into it yet.

Jonathan

Collaborator

lgalarra commented Jun 29, 2020

Hi,

If the rule ?i <rel> ?b ?a <rel> ?f ?i <rel> ?f => ?a <rel> ?b was mined from the 3 triples you mentioned, it can indeed have 3 positive examples. Recall that we count the distinct bindings of ?a, i.e., we do not care whether another variable, e.g., ?i, is bound to multiple values for the same value of ?a:

  <C>	<rel>	<B>
  <D>	<rel>	<B>
  <E>	<rel>	<B>
[?a = <C>, ?b = <B>, ?i = <D>/<E>, ?f = <B>]
[?a = <D>, ?b = <B>, ?i = <C>/<D>, ?f = <B>] 
[?a = <E>, ?b = <B>, ?i = <D>/<E>, ?f = <B>] 
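
The counting described above can be checked with a small brute-force sketch. This is not AMIE's implementation: the variable names `triples`, `holds`, and `matches` are made up for illustration, and the sketch assumes that different variables are allowed to bind to the same entity. It enumerates every binding of the rule over the three triples and then counts distinct values of ?a.

```python
from itertools import product

# The three triples from the debug dataset (angle brackets dropped for brevity).
triples = {("C", "rel", "B"), ("D", "rel", "B"), ("E", "rel", "B")}
entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})


def holds(subject, obj):
    """Return True if the triple (subject, rel, obj) is in the toy KB."""
    return (subject, "rel", obj) in triples


# Rule: ?i rel ?b, ?a rel ?f, ?i rel ?f => ?a rel ?b
# Brute force over all assignments; assumes variables may share a value.
matches = [
    {"a": a, "b": b, "f": f, "i": i}
    for a, b, f, i in product(entities, repeat=4)
    if holds(i, b) and holds(a, f) and holds(i, f) and holds(a, b)
]

# Counting on one variable: how many distinct values of ?a appear in a match.
distinct_a = {m["a"] for m in matches}

# For comparison: distinct (?a, ?b) pairs, i.e. distinct head instantiations.
distinct_head_pairs = {(m["a"], m["b"]) for m in matches}

print(len(matches), len(distinct_a), len(distinct_head_pairs))
```

Under these assumptions there are several full bindings for each value of ?a, because ?i can vary freely, yet only three distinct values of ?a, which matches the reported 3 positive examples.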

Cheers,
Luis

Author

kliegr commented Jul 1, 2020

Thank you both for these very detailed explanations; they are much appreciated. I will look into this. If you happen to have some time to spend on explanations, what is the motivation behind "oneVar" (or the newer biases in AMIE 3)? I guess it extends the number of rules that can be generated, but at the same time, having a variable bound to multiple values for the same value of another variable is somewhat counterintuitive.

Collaborator

lgalarra commented Jul 1, 2020

Hi,

Regarding oneVar, the reasons were purely scientific: we wanted to experiment with a simpler and more efficient way to define support, and we needed to compare against WARMR, which counts support on one variable. We realized later that counting that way -- for rules without constants -- does not make sense, because the "currency" of rules is predictions. For example, the oneVar strategy would count as a single hit the fact that we may predict all the children of a person, or all her nationalities.
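
The difference between the two ways of counting can be illustrated with a toy example. The data below is invented, and the "pair" count is only one reading of prediction-based support (distinct subject/object pairs of the head), so treat this as a sketch rather than AMIE's actual code.

```python
# Hypothetical head instantiations of a rule "... => ?a hasChild ?b".
# Each pair is one prediction (parent, child); the names are made up.
head_pairs = [
    ("Anna", "Ben"),
    ("Anna", "Cara"),
    ("Anna", "Dan"),
    ("Eve", "Finn"),
]

# Counting on one variable (?a): all of Anna's children collapse into one hit.
support_one_var = len({parent for parent, _ in head_pairs})

# Counting distinct (?a, ?b) pairs: every predicted child counts separately.
support_pairs = len(set(head_pairs))

print(support_one_var, support_pairs)  # prints "2 4"
```

Here the single-variable count reports 2 even though the rule yields 4 predictions, which is the mismatch described above.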

Luis
