This repository contains an implementation of function-free first-order logic mining algorithms enhanced by pruning with learned domain theories, also called saturation-based pruning. It was originally published in Pruning Hypothesis Spaces Using Learned Domain Theories (ILP'17). The current code base contains some extensions beyond the original feature extractor, e.g., the domain-theory learner published in Relaxing Deductive and Inductive Reasoning in Relational Learning (dissertation, CTU'24). A prototypical version of this pruning was also employed under the hood of Stacked Structure Learning for Lifted Relational Neural Networks, published at ILP'17.
If you'd like to cite us, please use the following bib-entry:
@inproceedings{svatovs2018pruning,
title={Pruning hypothesis spaces using learned domain theories},
author={Svato{\v{s}}, Martin and {\v{S}}ourek, Gustav and {\v{Z}}elezn{\'{y}}, Filip and Schockaert, Steven and Ku{\v{z}}elka, Ond{\v{r}}ej},
booktitle={Inductive Logic Programming: 27th International Conference, ILP 2017, Orl{\'e}ans, France, September 4-6, 2017, Revised Selected Papers 27},
pages={152--168},
year={2018},
organization={Springer}
}
The folder datasets
contains the NCI GI collection of molecular datasets which were used in the paper. The folder
./datasets/nci
, however, contains the original NCI datasets, which we transformed into datasets of interpretations of the form
+ ar_bond(29, 26), car(29), car(26), ar_bond(26, 29)....
where the first character, + or -, stands for a positive or a negative example, respectively. Thereafter, a list of facts
follows. This particular snippet shows part of a positive example, a molecule with two carbon atoms connected
by an aromatic bond. In general, predicates may be of arbitrary non-zero arity and start with a lower-case letter. Constants
start with a lower-case letter as well; in this case, integers are used to denote two different atoms. Finally, variables start
with an upper-case letter, e.g. V1
; however, variables are not allowed within examples since we work in the learning from interpretations setting
-- and so each example is a (fully specified) Herbrand interpretation. Each line of the dataset file corresponds to
exactly one learning example.
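For illustration, here is a small Python sketch (not part of this repository) of reading such a file, assuming every fact is written as predicate(args) and each example sits on its own line as described:
import re

# Minimal sketch: read a dataset of interpretations, one example per line.
# Assumes the format shown above: a leading '+'/'-' label, then comma-separated facts.
def parse_example(line):
    label, facts_part = line.split(" ", 1)
    facts = re.findall(r"[a-z]\w*\([^)]*\)", facts_part)  # e.g. 'ar_bond(29, 26)'
    return label == "+", facts

with open("./datasets/nci_transformed/gi50_screen_KM20L2.txt") as fin:
    examples = [parse_example(line) for line in fin if line.strip()]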
Besides the NCI datasets, there are also datasets from Alchemy; we just had to transform
these (IMDB, Proteins, UWCS) into the same format we use. Last but not least, there are a few new datasets we created,
namely PDDL-Grip, loops, and groups. The PDDL-Grip dataset was created using VAL;
the archive ./datasets/pddl_grip/pddl.rar
contains the code that constructed the dataset. Note that it includes an executable
of the above-mentioned VAL program; it is present only for reproducibility and is not well documented. In the same fashion,
the archive ./datasets/groupds/groups.rar
contains the code that constructed the loops datasets. Again, it may contain
code from other authors as it uses GAP, which needs to be installed; it is not well
documented and is here only for reproducibility purposes.
You may either use the fat jar pruningHypothesis.jar
or build your own jar from the sources and dependencies (all
libraries -- Matching
by Ondřej Kuželka and Sat4J -- are either built in or located in the folder dependencies
).
The first application of this code base is learning a domain theory, i.e., a set of universally quantified clauses such that each clause covers all examples. Learning a domain theory is as easy as
java
-Dida.searchPruning.dataset=./datasets/imdb/merged.db.oneLine
-Dida.searchPruning.learn=domainTheory
-Dida.logicStuff.constraints.maxLiterals=4
-Dida.logicStuff.constraints.maxVariables=15
-Dida.logicStuff.constraints.maxPosLit=1
-Dida.logicStuff.constraints.useSaturation=false
-jar pruningHypotheses.jar
The parameter ida.searchPruning.dataset
should point to a dataset in the format described above. The second parameter sets
up learning of the domain theory only. Further, we may set the maximal number of positive literals, negative literals, and literals in total, as well
as the number of variables, i.e., the language bias. At the end, you should end up with something like
# ultraSCL setting
# useSaturations false
...
i.e., a printout of the learning setup, followed by some statistics
# overall stats
# ----------------------------
# | total refinements | non-isomorphic | non-isomorphic saturated | #HC | time
# level 1 34 34 0 2 0.012882649983333333
# level 2 1016 530 0 109 0.06162433666666667
# level 3 13143 6872 0 115 0.22864976833333334
# level 4 135410 77698 0 242 1.5896445733333333
# counters
# literal stats
# uscl:theory simplifier is turned-off
# there are 242 hard constraints
!actor(V1), !genre_aaction(V1)
!actor(V1), !genre_acomedy(V1)
!actor(V1), !genre_adocumentary(V1)
!actor(V1), !genre_athriller(V1)
!director(V1), !actor(V1)
....
which are further followed by the learned constraints. All lines starting with #
are comments. These stats tell us that at
the first level, there were 34 refinements; of these, all 34 were non-isomorphic, and none were removed using saturations.
Finally, there were 2 learned constraints (the #HC column) within 0.01 minutes. Recall that we are learning universally quantified clauses,
i.e., the last clause shown above stands for ∀V1: ¬director(V1) ∨ ¬actor(V1) -- no object can be both a director and an actor.
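To make this semantics concrete, here is a small hypothetical Python sketch (not the repository's code; it assumes atoms are spelled consistently as plain strings) that checks whether a clause in the output syntax above covers a single example, i.e., holds under every grounding of its variables by the example's constants:
from itertools import product
import re

# Hypothetical sketch (not the repository's API): does a universally quantified
# clause, written as in the output above (e.g. "!director(V1), !actor(V1)"),
# cover one example, i.e. hold for every grounding of its variables?
def covers(clause, facts):
    facts = set(facts)                                      # e.g. {"director(d1)", "actor(a1)"}
    literals = re.findall(r"!?\w+\([^)]*\)", clause)
    variables = sorted(set(re.findall(r"\b[A-Z]\w*", clause)))
    constants = sorted({arg.strip() for f in facts
                        for arg in re.findall(r"\((.*)\)", f)[0].split(",")})
    for grounding in product(constants, repeat=len(variables)):
        satisfied = False
        for lit in literals:
            negated = lit.startswith("!")
            atom = lit.lstrip("!")
            for var, const in zip(variables, grounding):
                atom = re.sub(rf"\b{var}\b", const, atom)   # assumes constants without regex metacharacters
            if (atom in facts) != negated:                  # positive literal true, or negated atom absent
                satisfied = True
                break
        if not satisfied:
            return False                                    # some grounding falsifies the clause
    return True

# covers("!director(V1), !actor(V1)", ["director(d1)", "actor(a1)"]) -> True
By the definition above, a learned domain theory clause should pass this check on every example in the dataset.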
Well, that was not hard, but -- where is the saturation-based pruning? The run above did not utilize saturation-based pruning within
the learning (searching) process. However, we have a version that incorporates the pruning. It uses the domain theory
learned up to the i-th level (exclusively) to saturate candidates in the i-th level. Sounds good, right? Just try it by altering
the ida.logicStuff.constraints.useSaturation
parameter.
java
-Dida.searchPruning.dataset=./datasets/imdb/merged.db.oneLine
-Dida.searchPruning.learn=domainTheory
-Dida.logicStuff.constraints.maxLiterals=4
...
-Dida.logicStuff.constraints.useSaturation=true
-jar pruningHypotheses.jar
Well, it worked -- behold!
....
# overall stats
# ----------------------------
# | total refinements | non-isomorphic | non-isomorphic saturated | #HC | time
# level 1 34 34 34 2 0.01601748165
# level 2 1016 530 441 109 0.19080514666666668
# level 3 12682 6708 2123 115 1.2459660583333334
# level 4 118855 71006 11156 184 5.854695296666667
....
The execution time is slightly worse; however, fewer total refinements are produced. The discrepancy between non-isomorphic
and non-isomorphic saturated
expresses how many clauses were pruned by the saturation-based pruning. You may play around with
the language bias, e.g., allow multiple components in a clause by setting the ida.logicStuff.constraints.maxComponents
parameter, or change the saturation mode -- that is, which types of saturations we are going to allow -- using the ida.logicStuff.constraints.saturationMode
parameter. These are the possible values:
- -1 -- we compute only positive saturations
- -2 -- we compute only negative saturations
- -3 -- we compute both negative and positive saturations
- -4 -- no saturations are computed (same as -Dida.logicStuff.constraints.useSaturation=false)
- x -- where $x > 0$, saturations are computed only up to (and including) the x-th layer, i.e., for clauses with at most x literals

Saturations are computationally expensive, so this parameter allows you to trade off between utilizing our approach and scalability.
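For example, a run computing only negative saturations might look as follows; this is an illustrative sketch assuming saturationMode and maxComponents can simply be added to the setup above (the value 2 for maxComponents is arbitrary):
java
-Dida.searchPruning.dataset=./datasets/imdb/merged.db.oneLine
-Dida.searchPruning.learn=domainTheory
-Dida.logicStuff.constraints.maxLiterals=4
-Dida.logicStuff.constraints.maxVariables=15
-Dida.logicStuff.constraints.maxPosLit=1
-Dida.logicStuff.constraints.maxComponents=2
-Dida.logicStuff.constraints.useSaturation=true
-Dida.logicStuff.constraints.saturationMode=-2
-jar pruningHypotheses.jar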
Aggregating results from the table-like statistics above should be easy -- see the sketch below. Finally, you should be able to obtain results similar to those reported in the dissertation.
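For instance, a minimal Python sketch (assuming the console output was redirected to a file passed as its first argument) that collects the per-level rows for further aggregation:
import re
import sys

# Sketch: pull the per-level statistics (lines like "# level 4 135410 77698 0 242 1.58...")
# out of a saved run log; the column names follow the header printed by the tool.
rows = []
for line in open(sys.argv[1]):
    m = re.match(r"#\s*level\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.eE+-]+)", line)
    if m:
        level, refinements, non_iso, non_iso_sat, hc, time = m.groups()
        rows.append({"level": int(level), "total refinements": int(refinements),
                     "non-isomorphic": int(non_iso), "non-isomorphic saturated": int(non_iso_sat),
                     "#HC": int(hc), "time": float(time)})
print(rows)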
Mining features is as easy as running the following command:
java
-Dida.searchPruning.dataset=./datasets/nci_transformed/gi50_screen_KM20L2.txt
-Dida.searchPruning.learn=features
-Dida.searchPruning.maxDepth=4
-Dida.searchPruning.maxVariables=8
-Dida.searchPruning.minSupport=1
-Dida.searchPruning.mining=bfs
-Dida.searchPruning.runner.overallLimit=1200
-Dida.logicStuff.mineWithSaturations=true
-jar pruningHypotheses.jar
In particular, this mines features, i.e., conjunctive patterns, on the KM20L2 dataset with at most 4 literals, 8 variables, and a minimal support of 1, within at most 1200 minutes and with saturation-based pruning. Because saturation-based pruning is enabled, a domain theory is learned at the start -- you may set its parameters, e.g., the maximal number of literals, as in the section above. This particular approach is a level-wise, WARMR-like method. The output starts in a similar way as above, e.g.
# start mining
# storing the output to .\r636023851589300
# overall stats
# ----------------------------
# | total refinements | non-isomorphic | non-isomorphic saturated | #HC | time
# level 1 50 50 50 4 0.010924436666666667
# level 2 2178 1172 881 232 0.3835711766666667
...
# end of mining
*****
depth : 1
constraint time [ms]: 23414
search time [ms]: 371
overall time [ms]: 23785
total searched nodes: 21
total extended: 4
total pruned nodes: 4
total killed: 8
*****
which starts by showing the directory with the output -- this can be changed using the ida.searchPruning.targetFolder
parameter.
The familiar output of the domain theory learner is displayed thereafter. Finally, statistics of the mining are displayed layer by
layer, showing the execution time from the start (in ms), the cumulative number (from the start) of searched (opened) nodes, the cumulative number of
hypotheses extended by saturations (these got new literals thanks to saturations), the cumulative number of pruned nodes
(those pruned by saturations, either by being isomorphic after saturation or by being trivial), and the cumulative
number of killed hypotheses (those found trivial w.r.t. the domain theory).
Running the feature mining without the saturation-based pruning, leaving it only with isomorphism-based pruning, is as simple as
using -Dida.logicStuff.mineWithSaturations=false
. Then, no domain theory is learned, and so you won't find the first part
of the output.
In the output folder, you will find a file with the domain theory and cumulative statistics with features per layer. Aggregating these results should be relatively easy -- see the sketch below -- and leads to results similar to those in the papers.
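Again, a minimal Python sketch (assuming the console output shown above was saved to a file passed as its first argument; the statistics file in the output folder may use a slightly different layout) that gathers the per-depth blocks between the ***** markers:
import sys

# Sketch: gather the per-depth mining statistics (the "key : value" lines between
# ***** markers) into one dictionary per depth, e.g. {"depth": "1", "overall time [ms]": "23785", ...}.
blocks, current, inside = [], {}, False
for line in open(sys.argv[1]):
    line = line.strip()
    if line.startswith("*****"):
        if inside and current:      # a closing marker ends the current block
            blocks.append(current)
            current = {}
        inside = not inside
    elif inside and ":" in line:
        key, value = (part.strip() for part in line.split(":", 1))
        current[key] = value
print(blocks)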
This code base contains many more ideas around saturation-based pruning than were published in either of the publications.
For example, you should be able to utilize the ida.searchPruning.targetPredicate
parameter to learn Horn rules. There are
also many more implemented strategies, e.g., beam search, DFS, etc., that already incorporate either the basic saturation-based
pruning or a much more extended version -- these include speed-up hacks, e.g., language bias, caching refinements (literals)
that were deemed trivial w.r.t. the domain theory, and extended saturations, e.g., incorporating clauses with imperfect coverage
over the dataset and adaptive saturation learning. Note that these changes in the codebase may slightly change the results;
you may find earlier implementations in older commits. Similarly, the examples in this readme were executed on a different
machine, with different parameters, subsets of data, etc., than the experiments in the papers. A proper experimental setup
would also need appropriate memory settings, e.g., -Xms
and -Xmx
; see the sketch below.
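For instance, such an invocation could look like the following sketch (the heap sizes are illustrative only, not the values used in the papers):
java
-Xms8g
-Xmx32g
-Dida.searchPruning.dataset=./datasets/nci_transformed/gi50_screen_KM20L2.txt
-Dida.searchPruning.learn=features
...
-jar pruningHypotheses.jar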