wdsub

This project is a Wikibase Subsetting tool based on Shape Expressions (ShEx).

The project processes Wikidata dumps and extracts a subset based on a Shape Expression.

Usage as a command line tool

If you have the wdsub binary executable, its usage is similar to that of other Linux command line tools. The tool has the following options:

Usage:
  wdsub extract
  wdsub dump
Wikidata subsetting command line tool
Options and flags:
 --help
     Display this help text.
 --version, -v
     Print the version number and exit.
Subcommands:
  extract
    Show information about an entity.
  dump
    Process dump files

As an example, the following command:

wdsub dump -s examples/humans.shex -o target/outputFile.json examples/100lines.json.gz

processes the compressed dump file examples/100lines.json.gz using the ShEx schema examples/humans.shex and writes the result to target/outputFile.json.

The dump options are:

 Usage:
     wdsub dump --count [--out <file>] [--verbose] [--showCounter] [--compressOutput <string>] [--showSchema] [--dumpMode <string>] [--dumpFormat <string>] [--processor <string>] <dumpFile>
     wdsub dump --show [--maxStatements <integer>] [--out <file>] [--verbose] [--showCounter] [--compressOutput <string>] [--showSchema] [--dumpMode <string>] [--dumpFormat <string>] [--processor <string>] <dumpFile>
     wdsub dump --schema <file> [--schemaFormat <string>] [--verbose <string>] [--out <file>] [--verbose] [--showCounter] [--compressOutput <string>] [--showSchema] [--dumpMode <string>] [--dumpFormat <string>] [--processor <string>] <dumpFile>
 Process example dump file.
 Options and flags:
     --help
         Display this help text.
     --count
         count entities
     --show
         show entities
     --maxStatements <integer>
         max statements to show
     --schema <file>, -s <file>
         ShEx schema
     --schemaFormat <string>
         schemaFormat. Possible values: WShExC,ShExC
     --verbose <string>, -v <string>
         verbose level (0-nothing,1-basic,2-info,3-details,4-debug,5-step,6-all)
     --out <file>, -o <file>
         output path
     --verbose
         verbose mode
     --showCounter
         show counter at the end of process
     --compressOutput <string>
         compress output. Possible values: true,false
     --showSchema
         show schema
     --dumpMode <string>
         dumpMode. Possible values: OnlyMatched,WholeEntity,OnlyId
     --dumpFormat <string>
         dumpFormat. Possible values: Turtle,JSON,Text
     --processor <string>
         processor. Possible values: WDTK,Fs2
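The three usage forms above can be exercised with invocations like the following. This is a sketch, assuming the `wdsub` executable is on the `PATH`. The helper names `run`, `wdsub_count`, `wdsub_show` and `wdsub_subset`, the `DRY_RUN` variable and the default of 5 statements are hypothetical conveniences, not part of the tool. Only options listed in the help text are used, but the help text does not describe the precise effect of the `--dumpMode` and `--dumpFormat` values.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the three `wdsub dump` usage forms.
# Assumes `wdsub` is on PATH; with DRY_RUN=1 the command is printed instead of run.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # print the command line only
  else
    "$@"                 # execute it
  fi
}

# Form 1: count the entities in dump $1 and show the counter at the end.
wdsub_count() {
  run wdsub dump --count --showCounter "$1"
}

# Form 2: show the entities in dump $1, limiting the statements shown
# to $2 (the default of 5 is an arbitrary choice).
wdsub_show() {
  run wdsub dump --show --maxStatements "${2:-5}" "$1"
}

# Form 3: extract the subset of dump $1 described by the ShEx schema $2
# into file $3. The dumpMode/dumpFormat values come from the option list;
# what they produce exactly is an assumption here.
wdsub_subset() {
  run wdsub dump --schema "$2" --dumpMode OnlyMatched --dumpFormat JSON \
    --out "$3" "$1"
}
```

For example, `wdsub_subset examples/100lines.json.gz examples/humans.shex target/outputFile.json` mirrors the example command given earlier, with an explicit dump mode and format. Setting `DRY_RUN=1` prints the command without running it, which is handy for checking option spelling before processing a large dump.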

Usage from docker

The Docker image is published as wesogroup/wdsub.

To process dumps with Docker, mount the folders that hold the dumps, the schemas and the output as volumes, and run:

docker run -d -v [folder-with-dumps]:/data -v [folder-with-schemas]:/shex -v [output-folder]:/dumps wesogroup/wdsub:{version} dump -o /dumps/resultDump.json -s /shex/[shexFile].shex /data/[dumpFile].json.gz
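The placeholders in the command above can be filled in from shell variables. The following is a minimal sketch: the function name `wdsub_docker`, the variables `DUMPS_DIR`, `SCHEMAS_DIR`, `OUTPUT_DIR`, `VERSION` and `DRY_RUN`, and the default folder names are illustrative assumptions to be replaced with your own. `VERSION` must be set to a published tag of the image. As in the original command, `-d` runs the container in the background; drop it to keep the container attached to the terminal.

```shell
#!/usr/bin/env bash
# Sketch: run wdsub from the wesogroup/wdsub image with host folders mounted.
# All names and default paths below are hypothetical placeholders.
# Arguments: dump file, schema file, output file (names inside the folders).
wdsub_docker() {
  : "${VERSION:?set VERSION to a published wesogroup/wdsub tag}"
  local dumps_dir="${DUMPS_DIR:-$PWD/dumps}"      # absolute host folder with the dump
  local schemas_dir="${SCHEMAS_DIR:-$PWD/shex}"   # absolute host folder with the schema
  local output_dir="${OUTPUT_DIR:-$PWD/output}"   # absolute host folder for the result
  local -a cmd
  cmd=(docker run -d
    -v "$dumps_dir:/data"
    -v "$schemas_dir:/shex"
    -v "$output_dir:/dumps"
    "wesogroup/wdsub:$VERSION"
    dump -o "/dumps/$3" -s "/shex/$2" "/data/$1")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

With `VERSION` set, `wdsub_docker dump.json.gz humans.shex resultDump.json` mounts the three folders and writes `resultDump.json` into the output folder, matching the layout of the command above.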

Building and compiling

Prerequisites: Install Scala

The tool has been implemented in Scala and uses sbt for compilation. In order to create a standalone binary, you first need to install sbt.

Installation instructions for sbt:

Linux: https://www.scala-sbt.org/1.x/docs/Installing-sbt-on-Linux.html

Clone this repository

Once sbt is installed, clone this repository from GitHub.

git clone https://github.com/weso/wdsub.git

Go to the cloned directory

cd wdsub

Compilation to local binary

sbt universal:packageBin

Once the command finishes, the binary is available as a compressed file at:

target/universal/wdsubroot-version.zip

After uncompressing that file, the executable script, called wdsubroot, is in the bin folder.
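The unpack step can be scripted so that the version in the file name does not have to be typed by hand. This is a sketch, assuming `unzip` is installed, that exactly one `wdsubroot-*.zip` archive is present in `target/universal`, and that the archive contains a top-level folder with the same name as the zip (the usual sbt-native-packager layout). The function names `find_package` and `unpack_package` and the `install` destination folder are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: locate the zip produced by `sbt universal:packageBin`, unpack it
# and print the path of the wdsubroot launcher. All names are hypothetical.

# Print the single wdsubroot-*.zip in directory $1 (default target/universal).
find_package() {
  local dir="${1:-target/universal}"
  local -a zips=("$dir"/wdsubroot-*.zip)
  # An unmatched glob stays literal, so the -f test also catches "no zip found".
  if [ "${#zips[@]}" -ne 1 ] || [ ! -f "${zips[0]}" ]; then
    echo "expected exactly one wdsubroot-*.zip in $dir" >&2
    return 1
  fi
  printf '%s\n' "${zips[0]}"
}

# Unzip the package into $2 (default install/) and print the launcher path.
unpack_package() {
  local zip dest="${2:-install}"
  zip=$(find_package "$1") || return 1
  unzip -q -o "$zip" -d "$dest"
  # Assumes a top-level folder named like the zip, without the .zip suffix.
  printf '%s\n' "$dest/$(basename "$zip" .zip)/bin/wdsubroot"
}
```

After `sbt universal:packageBin`, running `launcher=$(unpack_package)` and then `"$launcher" dump -s examples/humans.shex -o target/outputFile.json examples/100lines.json.gz` would repeat the earlier example with the freshly built script.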

Publish docker image

To create a local Docker image, run:

sbt docker:publishLocal

To build and publish the Docker image (this requires the appropriate credentials), run:

sbt docker:publish

More information

Another tool that creates subsets from Wikidata dumps is WDumper.

Author & contributors

Author: Jose Emilio Labra Gayo
