Releases: eXascaleInfolab/StaTIX
Fast Type Inference
- Imported the updated DAOC library, which clusters orders of magnitude faster while forming exactly the same clusters.
- Description and Evaluation results updated
Links Cutting On Preprocessing
- Optional links cutting (similarity matrix reduction) on graph construction implemented to reduce the memory consumption (affects the accuracy)
- Links reduction policies (on clustering) refined
- Network serialization refined considering the filtering
- Cluster labels output added
The executable is built on Linux Ubuntu 16.04 x64, Java OpenJDK 1.8
Optional Features Added
- Links reduction policy parameterized
- Optional weighting of the input instances besides their relations
- Similarity function parameterized
The executable is built on Ubuntu x64 16.04, Java OpenJDK 1.8
Representative Types
- Synced with the updated version of the DAOC clustering library, which provides refined identification of the representative clusters (inferred types)
- Added output of the input network for the clustering without the type inference itself
- Fixed the filtered-out ids in the output .rcg network (clustering input)
The executable of libdaoc is built on Ubuntu x64 16.04, StaTIX jar is built on OpenJDK Java 1.8 x64.
Property Occurrences per Type
- Property occurrences are now evaluated more accurately: the number of occurrences of each property per type is counted across all instances, instead of using only the binary presence of properties in the types
- Brief hints strategy slightly refined (early termination dropped)
- N-Triples format parsing refined: comments are handled, malformed files are parsed more reliably
The build is made using Java OpenJDK 1.8 x64 on Linux Ubuntu 16.04 x64 with the default GCC (important for the linked native libdaoc clustering library).
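The difference between binary presence and occurrence counting noted in this release can be illustrated with a minimal Java sketch. It is not the StaTIX implementation: the class name, the method names, and the representation of an instance as a plain list of property names are all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not the StaTIX code: each instance of a type
// is modeled as the list of property names it uses (repeats allowed).
public class PropertyOccurrences {

    // Binary presence: a property scores 1 if it occurs anywhere in the type.
    static Map<String, Integer> binaryPresence(List<List<String>> instances) {
        Map<String, Integer> presence = new TreeMap<>();
        for (List<String> properties : instances) {
            for (String property : properties) {
                presence.put(property, 1);
            }
        }
        return presence;
    }

    // Occurrence counting: every occurrence in every instance adds 1.
    static Map<String, Integer> occurrenceCounts(List<List<String>> instances) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> properties : instances) {
            for (String property : properties) {
                counts.merge(property, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three made-up instances of one type.
        List<List<String>> instances = Arrays.asList(
                Arrays.asList("name", "email", "email"),
                Arrays.asList("name"),
                Arrays.asList("name", "homepage"));
        System.out.println("presence: " + binaryPresence(instances));
        System.out.println("counts:   " + occurrenceCounts(instances));
    }
}
```

With counting, a property used by every instance (here "name") is distinguished from one used by a single instance (here "homepage"), which binary presence cannot express.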
Brief Hints Supervision
- Brief Hints lightweight semi-supervision implemented
- Property weights estimated more accurately for the [semi-]supervised type inference
Property Weights Normalization Refined
- Property weights normalization refined => accuracy improved for both non-supervised and semi-supervised clustering
- Refined multi-level output with outliers filtering and custom output step
- Build fixed for Java 1.8 (previously it worked only on Java 1.9)
- Various minor optimizations performed (lower memory consumption, higher speed)
Built on Ubuntu 16.04 x64, Java 1.8
Note: --brief-hints is not implemented except for the stub API.
Benchmarked Release
- Execution options extended (input links reduction policies added, multi-level output of the representative clusters, etc.)
- Some bugs fixed (NaN values could occur in the similarity matrix in the semi-supervised mode when some properties of the input dataset were not present in the prelabeled one)
- Updated DAOC library linked
A build made on Linux Ubuntu 16.04 x64 with Java 1.9 is attached
Initial Release of the Statistical Type Inference
Performs both non-supervised (fully automatic) and semi-supervised (using a hinting sample dataset) statistical type inference for an RDF dataset (N3 format). The output lists the ids of the sequentially numbered unique subjects grouped by #type (clusters) in the .cnl format (a space-separated list of member ids).
The linked native clustering library (DAOC, contact Artem for the details) is built on Linux Ubuntu 16.04 x64 and might also work in the Ubuntu console of Windows 10 x64. On other platforms, only the native library needs to be substituted to run the app.
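As a rough illustration of consuming the .cnl output described above, the following Java sketch reads clusters from a list of text lines. It is not part of StaTIX, and it rests on assumptions not stated in these notes: that each line holds one cluster, and that blank lines and lines starting with "#" can be skipped. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader for .cnl-style content, for illustration only.
public class CnlSketch {

    // Assumption: one cluster per line, member ids separated by whitespace.
    // Assumption: blank lines and lines starting with '#' carry no members.
    static List<List<String>> parseClusters(List<String> lines) {
        List<List<String>> clusters = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            clusters.add(Arrays.asList(trimmed.split("\\s+")));
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Made-up sample content: two clusters of subject ids.
        List<String> lines = Arrays.asList("# sample", "0 1 2", "", "3 4");
        List<List<String>> clusters = parseClusters(lines);
        for (int i = 0; i < clusters.size(); i++) {
            System.out.println("cluster " + i + ": " + clusters.get(i));
        }
    }
}
```

In practice the lines would come from the output file, for example via java.nio.file.Files.readAllLines; the ids are kept as strings here so that no assumption is made about their numeric range.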