User Guide
For general information on the Metafacture IDE, see the Home page.
For an introduction to metadata processing with Flux and the Metafacture IDE see this blog post.
The easiest way to install the Metafacture IDE in Eclipse is via the Eclipse Marketplace: choose Help → Eclipse Marketplace in Eclipse, search for ‘Metafacture’, click ‘Install’ and follow the instructions.
If your Eclipse installation has no Marketplace client, you can either install the Marketplace client from the standard Eclipse update site (under Help → Install New Software) or manually add the Metafacture IDE update site http://lobid.org/download/tools/p2 under Help → Install New Software → Add… and install the ‘Metafacture IDE’ feature.
If Eclipse is not installed on your system, you can download an Eclipse workbench that includes the Metafacture IDE (for Windows, Linux, and Mac) from http://lobid.org/download/tools.
After the installation, *.flux files will open with the Flux editor. The Metafacture IDE provides syntax coloring and syntax checking, semantic validation, content assist (CTRL+Space), an outline view, workflow visualization, and a launcher to execute Flux files (see screenshot and the Features section below).
Features
The editor highlights parts of your Flux files (such as keywords and string literals; see the screenshot above). It also checks for syntax errors (such as forgotten semicolons or misplaced braces). Syntax errors are reported both in the editor ruler and in the Problems view (see also the section on Semantic validation).
The Outline view provides a tree-like representation of your Flux workflow, including the input and output types of each Flux command (in the form Input -> Output). See the section on Semantic validation for more on input and output types.
The elements in the Outline view are linked to the corresponding parts of your Flux file in the editor: double-clicking an element highlights the corresponding text. If you enable the Link with Editor button in the Outline view, the active editor element is highlighted in the Outline view, and vice versa.
Semantic validation
Flux workflows describe typed processing pipelines: each Flux command accepts a particular input type and produces a particular output type. These types are declared as annotations on the classes that implement the commands. The editor verifies that your workflows are valid with respect to these input and output types: if a command requires a different input type than the previous command outputs, the editor displays an error.
If the class that implements a command does not declare its input and output types, the editor displays a warning: such workflows may be correct, but the editor can’t verify them, so you should double-check these commands and their inputs and outputs.
Errors and warnings are displayed both in the ruler and in the Problems view. Double-clicking a problem in that view highlights the corresponding command in the editor. If you enable the Link with Editor button in the Outline view, this also highlights the corresponding command there, where its actual input and output types are shown. This can help you debug the problem. See the screenshot below to get the idea.
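To make the checking rule concrete, here is a minimal Python sketch of how such a check could work, assuming each command is reduced to a name plus an optional declared input and output type. The Command class, the check_pipeline function, and type names such as Reader or StreamEvents are hypothetical. The real editor reads the types from annotations on the Java classes and may also accept compatible types, whereas this sketch only compares type names for equality.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Command:
    """A Flux command reduced to its name and declared types (hypothetical model)."""
    name: str
    in_type: Optional[str]   # None: the implementing class declares no input type
    out_type: Optional[str]  # None: the implementing class declares no output type


def check_pipeline(commands: List[Command]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a linear pipeline of commands."""
    errors: List[str] = []
    warnings: List[str] = []
    # Warn about commands whose types cannot be verified.
    for cmd in commands:
        if cmd.in_type is None or cmd.out_type is None:
            warnings.append(f"{cmd.name}: input/output types not declared")
    # Compare each command's output type with the next command's input type.
    for prev, cur in zip(commands, commands[1:]):
        if prev.out_type is None or cur.in_type is None:
            continue  # unverifiable pair, already reported as a warning above
        if prev.out_type != cur.in_type:
            errors.append(
                f"{cur.name} expects {cur.in_type}, "
                f"but {prev.name} outputs {prev.out_type}"
            )
    return errors, warnings


if __name__ == "__main__":
    # Hypothetical type names, for illustration only.
    pipeline = [
        Command("open-file", "String", "Reader"),
        Command("handle-marcxml", "XmlEvents", "StreamEvents"),
        Command("my-custom-step", None, None),
    ]
    errors, warnings = check_pipeline(pipeline)
    print(errors)
    print(warnings)
```

Here open-file feeds handle-marcxml directly, skipping an XML decoding step, so the sketch reports one error for that pair and one warning for the command without declared types.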
The Flux editor provides content assist (also known as auto-completion or auto-suggest) at the current cursor position. Content assist is triggered by CTRL+Space and displays commands along with their documentation. Right after a ‘|’, it displays all available commands (see the screenshot at the top of this page). If you trigger content assist after typing something like ‘decode-’, you get all matching completions.
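As a rough illustration of the filtering step only, the following Python sketch returns the command names that start with the text typed so far. The command list is limited to the commands used in the sample workflow further down this page, and complete is a hypothetical helper. The real content assist is provided by the Xtext-based editor, knows all registered commands, and also shows their documentation.

```python
from typing import Iterable, List

# Only the commands used in the sample workflow on this page (not the full set).
COMMANDS = (
    "open-file",
    "decode-xml",
    "handle-marcxml",
    "morph",
    "stream-tee",
    "encode-ntriples",
    "encode-dot",
    "write",
)


def complete(prefix: str, commands: Iterable[str]) -> List[str]:
    """Return all command names starting with prefix, in alphabetical order."""
    return sorted(name for name in commands if name.startswith(prefix))


if __name__ == "__main__":
    print(complete("", COMMANDS))         # right after '|': every command
    print(complete("decode-", COMMANDS))  # only the decode-* commands
    print(complete("encode-", COMMANDS))
```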
When you save a Flux file in the editor, a corresponding Graphviz DOT file is generated in the project’s src-gen folder. If you open the Zest Graph view (Window → Show View → Other… → Visualization → Zest Graph) and enable listening to workspace changes (the first, i.e. leftmost, button on that view), it visualizes the workflow currently being edited in the Flux editor (see the screenshot at the top of this page).
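The generated DOT file itself is not shown on this page. As a sketch of the idea only, the following Python function renders a workflow, modeled as a list of connections between commands, as a Graphviz digraph; a stream-tee branch simply becomes two edges leaving the same node. The helper workflow_to_dot and the node names are hypothetical, and the file the IDE writes to src-gen may be structured differently.

```python
from typing import List, Tuple


def workflow_to_dot(edges: List[Tuple[str, str]], name: str = "workflow") -> str:
    """Render (source, target) command pairs as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Connections of the sample workflow further down this page (hypothetical modeling);
    # the two write steps get distinct names so they stay separate nodes.
    edges = [
        ("open-file", "decode-xml"),
        ("decode-xml", "handle-marcxml"),
        ("handle-marcxml", "morph"),
        ("morph", "stream-tee"),
        ("stream-tee", "encode-ntriples"),
        ("stream-tee", "encode-dot"),
        ("encode-ntriples", "write output.nt"),
        ("encode-dot", "write output.dot"),
    ]
    print(workflow_to_dot(edges))
```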
To execute a *.flux file, select the file (in the project explorer or the editor) and choose Run → Run As → Flux Workflow. For detailed usage instructions, see the Sample Usage section below.
Sample Usage
To illustrate the usage of the Metafacture IDE, we provide a small sample transformation of MARC-XML input data both to RDF in N-Triples serialization and to Graphviz DOT.
As the first step, create a project in Eclipse: File → New → Project… → General → Project → Next → enter a name → Finish
Save the sample input file at https://github.com/culturegraph/metafacture-ide/raw/master/samples/input.xml to any location, select it in your file browser (like the Windows Explorer or the Mac Finder) and copy/paste or drag it into your project in Eclipse.
This file contains two bibliographic records in the MARC-XML format.
Create a file (File → New → File → select your project → enter morph.xml as the file name → Finish) with the following content, which describes the actual transformation:
<?xml version="1.0" encoding="UTF-8"?>
<metamorph xmlns="http://www.culturegraph.org/metamorph"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1">
  <rules>
    <!-- General attributes for each record: -->
    <data source="001" name="subject">
      <regexp match="(.*)" format="http://lobid.org/zvdd/hbz/${1}" />
    </data>
    <data source="001" name="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">
      <regexp match=".*" format="http://purl.org/dc/terms/BibliographicResource" />
    </data>
    <data source="001" name="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">
      <regexp match=".*" format="http://purl.org/vocab/frbr/core#Manifestation" />
    </data>
    <data source="001" name="http://www.w3.org/2004/02/skos/core#Concept">
      <regexp match=".*"
          format="http://iflastandards.info/ns/isbd/terms/mediatype/T1010" />
    </data>
    <!-- Map specific fields: -->
    <data source="8564 .u" name="http://lobid.org/vocab/lobid#fulltextOnline" />
    <data source="24500.a" name="http://iflastandards.info/ns/isbd/elements/P1004">
      <!-- Strip newlines with surrounding space: -->
      <replace pattern="\s*\n+\s*" with=" " />
    </data>
    <data source="533 .c" name="http://iflastandards.info/ns/isbd/elements/P1017" />
    <data source="1001 .a" name="http://purl.org/dc/elements/1.1/creator" />
    <data source="260 .c" name="http://purl.org/dc/terms/medium/issued">
      <!-- One processing example: pick out first valid year: -->
      <regexp match="(1\d{3}|200\d)" format="${1}" />
    </data>
    <data source="041 .a" name="http://purl.org/dc/terms/language" />
    <!-- Leave all other fields untransformed: <data source="_else"/> -->
  </rules>
</metamorph>
You can also download this file from https://github.com/culturegraph/metafacture-ide/raw/master/samples/morph.xml.
For details on the Metamorph syntax and functionality have a look at the Metafacture wiki and samples.
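To see what the two processing steps in morph.xml do to a field value, the following Python sketch applies the same regular expressions with Python's re module. It assumes that Metamorph's regexp function searches for the first match (as the comment “pick out first valid year” suggests) and emits nothing when there is no match, and that Java and Python regex semantics agree for these patterns on ASCII input. The function names and the sample input strings are hypothetical.

```python
import re
from typing import Optional

# Patterns copied verbatim from morph.xml.
YEAR = re.compile(r"(1\d{3}|200\d)")
NEWLINES = re.compile(r"\s*\n+\s*")


def first_valid_year(value: str) -> Optional[str]:
    # Mirrors <regexp match="(1\d{3}|200\d)" format="${1}"/>:
    # the first match, or None (no output) if nothing matches.
    match = YEAR.search(value)
    return match.group(1) if match else None


def strip_newlines(value: str) -> str:
    # Mirrors <replace pattern="\s*\n+\s*" with=" "/>:
    # each line break plus its surrounding whitespace becomes a single space.
    return NEWLINES.sub(" ", value)


if __name__ == "__main__":
    print(first_valid_year("c1848, 1873"))  # hypothetical 260 $c value
    print(first_valid_year("[s.a.]"))       # no year: no output
    print(strip_newlines("Kunst und Kunstgewerbe\n    auf der Wiener Weltausstellung 1873"))
```

Note that the year pattern only accepts 1000–1999 and 2000–2009, so later years would produce no output with this rule.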
Finally, create the Flux file: create a file (File → New → File) called sample.flux. In the dialog asking ‘Do you want to add the Xtext nature to the project?’, select ‘Yes’. Add the following content to the file:
default files = FLUX_DIR;
files + "input.xml" |
open-file |
decode-xml |
handle-marcxml |
morph(files + "morph.xml") |
stream-tee | {
  encode-ntriples |
  write(files + "output.nt")
}{
  encode-dot |
  write(files + "output.dot")
};
You can also download this file from https://github.com/culturegraph/metafacture-ide/raw/master/samples/sample.flux. For advanced workflow definitions have a look at the Metafacture wiki and samples.
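The stream-tee step in this workflow sends every event it receives, unchanged, to each of the branches in braces, so both encoders see the same records. The following Python sketch models that fan-out under stated assumptions: events are reduced to record start, literal, and record end (loosely modeled on Metafacture's stream events), and the classes Tee and CollectingReceiver are hypothetical stand-ins, not Metafacture's Java API.

```python
from typing import List, Tuple

Event = Tuple[str, ...]


class CollectingReceiver:
    """Records every event it receives (stand-in for one encoder branch)."""

    def __init__(self) -> None:
        self.events: List[Event] = []

    def start_record(self, identifier: str) -> None:
        self.events.append(("start_record", identifier))

    def literal(self, name: str, value: str) -> None:
        self.events.append(("literal", name, value))

    def end_record(self) -> None:
        self.events.append(("end_record",))


class Tee:
    """Forwards each event unchanged to all attached receivers."""

    def __init__(self, *receivers: CollectingReceiver) -> None:
        self.receivers = receivers

    def start_record(self, identifier: str) -> None:
        for receiver in self.receivers:
            receiver.start_record(identifier)

    def literal(self, name: str, value: str) -> None:
        for receiver in self.receivers:
            receiver.literal(name, value)

    def end_record(self) -> None:
        for receiver in self.receivers:
            receiver.end_record()


if __name__ == "__main__":
    ntriples_branch, dot_branch = CollectingReceiver(), CollectingReceiver()
    tee = Tee(ntriples_branch, dot_branch)
    tee.start_record("184000")
    tee.literal("http://purl.org/dc/terms/language", "ger")
    tee.end_record()
    print(ntriples_branch.events == dot_branch.events)  # both branches saw the same events
```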
This file defines the workflow: open the input file, read it as MARC-XML, transform it using the Metamorph definition from above, encode the result once as N-Triples and once as Graphviz DOT, and finally write both results to files. To execute the workflow, select the file and choose Run → Run As → Flux Workflow. If you open the output.nt file, you should see the following content:
<http://lobid.org/zvdd/hbz/184000> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/terms/BibliographicResource> .
<http://lobid.org/zvdd/hbz/184000> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Manifestation> .
<http://lobid.org/zvdd/hbz/184000> <http://www.w3.org/2004/02/skos/core#Concept> <http://iflastandards.info/ns/isbd/terms/mediatype/T1010> .
<http://lobid.org/zvdd/hbz/184000> <http://purl.org/dc/terms/medium/issued> "1848" .
<http://lobid.org/zvdd/hbz/184000> <http://iflastandards.info/ns/isbd/elements/P1017> "Univ.-Bibl." .
<http://lobid.org/zvdd/hbz/184000> <http://purl.org/dc/terms/language> "ger" .
<http://lobid.org/zvdd/hbz/184000> <http://purl.org/dc/elements/1.1/creator> "Falke, Jakob" .
<http://lobid.org/zvdd/hbz/184000> <http://lobid.org/vocab/lobid#fulltextOnline> <http://digi.ub.uni-heidelberg.de/diglit/falke1873> .
<http://lobid.org/zvdd/hbz/183999> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/terms/BibliographicResource> .
<http://lobid.org/zvdd/hbz/183999> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Manifestation> .
<http://lobid.org/zvdd/hbz/183999> <http://www.w3.org/2004/02/skos/core#Concept> <http://iflastandards.info/ns/isbd/terms/mediatype/T1010> .
<http://lobid.org/zvdd/hbz/183999> <http://iflastandards.info/ns/isbd/elements/P1017> "Univ.-Bibl." .
<http://lobid.org/zvdd/hbz/183999> <http://purl.org/dc/terms/language> "ger" .
<http://lobid.org/zvdd/hbz/183999> <http://iflastandards.info/ns/isbd/elements/P1004> "Kunst und Kunstgewerbe auf der Wiener Weltausstellung 1873" .
<http://lobid.org/zvdd/hbz/183999> <http://lobid.org/vocab/lobid#fulltextOnline> <http://digi.ub.uni-heidelberg.de/diglit/luetzow1875> .
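To check the result programmatically rather than by eye, the following Python sketch groups the statements of an N-Triples file by subject. It is a deliberately simplified parser that only handles IRIs and plain string literals (no blank nodes, language tags, or datatypes), which is enough for this sample; for real data an RDF library is the better choice. The function name triples_by_subject and the three-line SAMPLE excerpt are illustrative.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Simplified N-Triples line: IRI subject, IRI predicate, IRI or plain literal object.
TRIPLE = re.compile(r'^(<[^>]*>) (<[^>]*>) (<[^>]*>|"(?:[^"\\]|\\.)*") \.$')


def triples_by_subject(nt_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Group (predicate, object) pairs by subject; raise on unsupported lines."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in nt_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line}")
        subject, predicate, obj = match.groups()
        grouped[subject].append((predicate, obj))
    return dict(grouped)


# Three lines taken from the expected output above.
SAMPLE = """\
<http://lobid.org/zvdd/hbz/184000> <http://purl.org/dc/terms/language> "ger" .
<http://lobid.org/zvdd/hbz/184000> <http://purl.org/dc/elements/1.1/creator> "Falke, Jakob" .
<http://lobid.org/zvdd/hbz/183999> <http://purl.org/dc/terms/language> "ger" .
"""

if __name__ == "__main__":
    for subject, statements in triples_by_subject(SAMPLE).items():
        print(subject, len(statements))
```

Passing it the full text of output.nt (read with UTF-8 encoding) should, based on the expected output above, report 8 statements for http://lobid.org/zvdd/hbz/184000 and 7 for http://lobid.org/zvdd/hbz/183999.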
The *.dot file should contain the following content:
digraph g {
graph[layout=fdp]
"<http://lobid.org/zvdd/hbz/184000>" -> "<http://purl.org/dc/terms/BibliographicResource>" [label="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"]
"<http://lobid.org/zvdd/hbz/184000>" -> "<http://purl.org/vocab/frbr/core#Manifestation>" [label="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"]
"<http://lobid.org/zvdd/hbz/184000>" -> "<http://iflastandards.info/ns/isbd/terms/mediatype/T1010>" [label="http://www.w3.org/2004/02/skos/core#Concept"]
"<http://lobid.org/zvdd/hbz/184000>" -> "1848" [label="http://purl.org/dc/terms/medium/issued"]
"<http://lobid.org/zvdd/hbz/184000>" -> "Univ.-Bibl." [label="http://iflastandards.info/ns/isbd/elements/P1017"]
"<http://lobid.org/zvdd/hbz/184000>" -> "ger" [label="http://purl.org/dc/terms/language"]
"<http://lobid.org/zvdd/hbz/184000>" -> "Falke, Jakob" [label="http://purl.org/dc/elements/1.1/creator"]
"<http://lobid.org/zvdd/hbz/184000>" -> "<http://digi.ub.uni-heidelberg.de/diglit/falke1873>" [label="http://lobid.org/vocab/lobid#fulltextOnline"]
"<http://lobid.org/zvdd/hbz/183999>" -> "<http://purl.org/dc/terms/BibliographicResource>" [label="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"]
"<http://lobid.org/zvdd/hbz/183999>" -> "<http://purl.org/vocab/frbr/core#Manifestation>" [label="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"]
"<http://lobid.org/zvdd/hbz/183999>" -> "<http://iflastandards.info/ns/isbd/terms/mediatype/T1010>" [label="http://www.w3.org/2004/02/skos/core#Concept"]
"<http://lobid.org/zvdd/hbz/183999>" -> "Univ.-Bibl." [label="http://iflastandards.info/ns/isbd/elements/P1017"]
"<http://lobid.org/zvdd/hbz/183999>" -> "ger" [label="http://purl.org/dc/terms/language"]
"<http://lobid.org/zvdd/hbz/183999>" -> "Kunst und Kunstgewerbe auf der Wiener Weltausstellung 1873" [label="http://iflastandards.info/ns/isbd/elements/P1004"]
"<http://lobid.org/zvdd/hbz/183999>" -> "<http://digi.ub.uni-heidelberg.de/diglit/luetzow1875>" [label="http://lobid.org/vocab/lobid#fulltextOnline"]
}
For instant visualization of the generated DOT, you can open the ‘Zest Graph’ view (Window → Show View → Other… → Visualization → Zest Graph). If you enable the first (leftmost) button on that view, any changes to *.dot files are picked up and visualized automatically, so running the *.flux file updates the visualization. This can be used to tweak and debug the transformation rules on small input sets.
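The DOT output above follows a simple pattern: each triple becomes one edge from subject to object, labeled with the predicate. As a hedged sketch (not the implementation of encode-dot), this Python function reproduces that layout from (subject, predicate, object) terms as they appear in N-Triples. It assumes plain literals without language tags, datatypes, or escape sequences, and the names triples_to_dot and dot_escape are hypothetical.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def dot_escape(text: str) -> str:
    """Escape double quotes so text can sit inside a quoted DOT ID."""
    return text.replace('"', '\\"')


def triples_to_dot(triples: Iterable[Triple]) -> str:
    """Render (subject, predicate, object) terms in the layout of the sample *.dot file."""
    lines = ["digraph g {", "graph[layout=fdp]"]
    for subject, predicate, obj in triples:
        # Plain literals arrive as "..."; drop the N-Triples quotes, keep IRIs in <...>.
        node = obj[1:-1] if obj.startswith('"') else obj
        # The edge label is the predicate IRI without its angle brackets.
        label = predicate[1:-1]
        lines.append(
            f'"{dot_escape(subject)}" -> "{dot_escape(node)}" '
            f'[label="{dot_escape(label)}"]'
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(triples_to_dot([
        ("<http://lobid.org/zvdd/hbz/184000>", "<http://purl.org/dc/terms/language>", '"ger"'),
    ]))
```

Saving the result as a *.dot file in the workspace lets the Zest Graph view pick it up in the same way as the generated output.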