Support more than one output format through argument chaining #54

sdruskat · 2022-03-10T14:30:48Z

It'd be great if you were able to write more than one format at the same time, e.g. gitlab2prov -t {token} -f json -f rdf -f xml -p {url} -r 1 > provout/{outfilename}.{format}. This would help when building a larger dataset with provenance data. It would also be fairly easy to achieve using click or typer.


cdboer · 2022-03-10T15:42:43Z

Sounds good! It shouldn't be necessary to rerun the tool just to get the provenance data serialized in all of the desired formats. I'll add this to the feature list for releases after v1.0.

I'm not that familiar with click yet. Do you mean command chaining or multi-value options / multiple options? And how would you redirect the output to three different files?

cdboer · 2022-03-28T10:51:22Z

@sdruskat I don't know if I understand your example correctly. I'll try to walk myself through it, correct me if I got it wrong. You want to be able to explicitly state multiple serialization formats and get gitlab2prov to serialize the generated provenance graph in all stated formats, with one file per format, right? Allowing multiple formats for one command-line flag is not the issue; that can be done with click, argparse or comparable libs. But how would you redirect the output to three different files? Is that even possible in a bash shell?

If the problem boils down to converting an already existing provenance document from one serialization format into another (e.g. rdf to provn or json to xml), then you could use provconvert to do so. I think it's bundled with each install of the ProvToolbox. You could chain provconvert and gitlab2prov together if you want:

gitlab2prov -t {token} -f rdf > provout/doc.rdf
provconvert -infile provout/doc.rdf -outfile provout/doc.provn
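As a minimal sketch of that chaining, assuming both gitlab2prov and provconvert are on the PATH and that provconvert picks the output format from the -outfile extension (as in the two commands above), a small Python wrapper could retrieve the data once and then convert it into several formats. The helper names conversion_commands and export_all are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def conversion_commands(infile: str, formats: List[str]) -> List[List[str]]:
    """Build one provconvert command per target format (hypothetical helper)."""
    src = Path(infile)
    commands = []
    for fmt in formats:
        # Assumption: provconvert infers the target format from the extension
        outfile = src.with_suffix(f".{fmt}")
        commands.append(["provconvert", "-infile", str(src), "-outfile", str(outfile)])
    return commands


def export_all(token: str, formats: List[str], infile: str = "provout/doc.rdf") -> None:
    """Run gitlab2prov once (RDF), then convert that file into the other formats."""
    Path(infile).parent.mkdir(parents=True, exist_ok=True)
    # Retrieval happens only once, mirroring the first command above
    with open(infile, "w") as out:
        subprocess.run(["gitlab2prov", "-t", token, "-f", "rdf"], stdout=out, check=True)
    # Every further format is produced from the existing document
    for cmd in conversion_commands(infile, [f for f in formats if f != "rdf"]):
        subprocess.run(cmd, check=True)
```

For example, export_all("TOKEN", ["rdf", "provn", "json"]) would write provout/doc.rdf and then provout/doc.provn and provout/doc.json from it.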

sdruskat · 2022-04-01T08:35:42Z

> I'm not that familiar with click yet. Do you mean command chaining or multi value options / multiple options? And how would you redirect the output to three different files?

Multiple options is what I meant, see the example in the first comment.
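For the "multiple options" reading, where -f is repeated as in the first comment, a hypothetical argparse sketch (not gitlab2prov's actual parser) can collect every occurrence into one list with action="append"; click offers the same behaviour through multiple=True.

```python
import argparse

# Hypothetical parser: "-f" may be given several times and each value is appended
parser = argparse.ArgumentParser(prog="gitlab2prov")
parser.add_argument("-t", dest="token")
parser.add_argument("-p", dest="url")
parser.add_argument("-f", dest="formats", action="append")

args = parser.parse_args(["-t", "TOKEN", "-f", "json", "-f", "rdf", "-f", "xml", "-p", "URL"])
print(args.formats)  # ['json', 'rdf', 'xml']
```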

As for the redirection question, I haven't looked at the source code yet to see how you write files. I assumed that you build an internal model and then dump it into files...

> @sdruskat I don't know if I understand your example correctly. I'll try to walk myself through it, correct me if I got it wrong. You want to be able to explicitly state multiple serialization formats and get gitlab2prov to serialize the generated provenance graph in all stated formats with one file per format, right? ✅ Allowing multiple formats for one command line flag is not the issue, that can be done with click, argparse or comparable libs. But how would you redirect the output to three different files? Is that even possible in a bash shell?

I guess this would mean adding serialization functionality to gitlab2prov directly, perhaps with an argument (or a fixed value) that takes a pattern for naming the output files.

E.g. gitlab2prov ... -f rdf -f json -f xml and then (perhaps multithreaded):

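def write(document, filename, outformat):
    # Serialize the already-retrieved document into one format per file
    # (assumes a prov ProvDocument-like object with serialize(format=...))
    with open(f"{filename}.{outformat}", "w") as outfile:
        outfile.write(document.serialize(format=outformat))

# document, filename and formats are produced once by retrieval and CLI parsing
for outformat in formats:
    write(document, filename, outformat)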

I'd need to look at the code to suggest something more concrete. The idea is that the retrieval step is done only once, instead of having to do it n times for n formats.
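A sketch of the "perhaps multithreaded" variant, assuming the document has already been retrieved once: each format is written to its own file from a thread pool. The write_all name and its serialize hook are hypothetical; if the document were a ProvDocument from the prov package, something like lambda d, f: d.serialize(format=f) might fill that role.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def write_all(document: object, basename: str, formats: Iterable[str],
              serialize: Callable[[object, str], str]) -> List[str]:
    """Write one file per format from a single, already-retrieved document.

    `serialize` is a hypothetical hook mapping (document, format) to text.
    Returns the written paths in the same order as `formats`.
    """
    def write(outformat: str) -> str:
        path = f"{basename}.{outformat}"
        with open(path, "w") as outfile:
            outfile.write(serialize(document, outformat))
        return path

    # Each format goes to its own file, so the writes do not depend on each other
    with ThreadPoolExecutor() as pool:
        return list(pool.map(write, formats))
```

Threads mostly overlap file I/O here; the serialization itself is CPU-bound and will not run in parallel under CPython's GIL, so for small documents a plain loop is likely just as fast.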

cdboer · 2022-05-12T09:29:49Z

I implemented multi-format mode as a cli subcommand in the branch linked in this issue. Usage with multiple formats now looks like this:

gitlab2prov -p {url} -t {token} multi-format -f json xml rdf -o provout/example
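To illustrate how the arguments of such a subcommand could be parsed, here is a hypothetical argparse reconstruction of the command line above; the real gitlab2prov implementation may be structured differently. With nargs="+", a single -f flag takes several values. Naming each output file as the base path plus the format extension follows sdruskat's earlier sketch and is an assumption here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction, not gitlab2prov's actual parser
    parser = argparse.ArgumentParser(prog="gitlab2prov")
    parser.add_argument("-p", dest="url", required=True)
    parser.add_argument("-t", dest="token", required=True)
    sub = parser.add_subparsers(dest="command")
    multi = sub.add_parser("multi-format")
    # nargs="+" collects every value after -f up to the next option
    multi.add_argument("-f", dest="formats", nargs="+", required=True)
    # One base path; each format is appended to it as a file extension
    multi.add_argument("-o", dest="outfile", required=True)
    return parser


args = build_parser().parse_args(
    ["-p", "URL", "-t", "TOKEN", "multi-format", "-f", "json", "xml", "rdf", "-o", "provout/example"]
)
outputs = [f"{args.outfile}.{fmt}" for fmt in args.formats]
print(outputs)  # ['provout/example.json', 'provout/example.xml', 'provout/example.rdf']
```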

A config file for multi-format mode now looks like this:

[GITLAB]
project_urls = project_a_url, project_b_url
token = token

[OUTPUT]
format = json, rdf, xml
outfile = provout/example

[MISC]
profile = False
verbose = False
double_agents = path/to/alias/mapping
pseudonymous = False

The config option outfile is required when you want to serialize in multiple formats.
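A hypothetical reader for the [OUTPUT] section, using Python's configparser, shows how the comma-separated format list could be split and how the outfile requirement might be enforced. The key names come from the example config above; the parsing and validation rules are assumptions, not gitlab2prov's code.

```python
import configparser
from typing import List, Optional, Tuple


def read_output_settings(text: str) -> Tuple[List[str], Optional[str]]:
    """Parse [OUTPUT] into (formats, outfile); hypothetical helper."""
    config = configparser.ConfigParser()
    config.read_string(text)
    output = config["OUTPUT"]
    # "json, rdf, xml" -> ["json", "rdf", "xml"]
    formats = [f.strip() for f in output.get("format", "").split(",") if f.strip()]
    outfile = output.get("outfile")
    # Several formats need a base name from which one file per format is derived
    if len(formats) > 1 and not outfile:
        raise ValueError("outfile is required when serializing to multiple formats")
    return formats, outfile
```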

sdruskat · 2022-05-12T10:09:42Z

> I implemented multi-format mode as a cli subcommand in the MR linked in this issue.

Nice one! That should fix this issue. Looking forward to this being merged.

sdruskat · 2022-05-12T10:10:27Z

I guess the docs in the README should still be updated in the branch to reflect the changes?

cdboer · 2022-05-12T11:55:52Z

Yes, I'll update the README before merging.

cdboer · 2022-06-12T15:50:32Z

@sdruskat @LynnvK multi-format mode is now merged

sdruskat changed the title from "Support more than one output format through param chaining" to "Support more than one output format through argument chaining" Mar 10, 2022

cdboer self-assigned this Mar 10, 2022

cdboer added the enhancement New feature or request label Mar 10, 2022

cdboer added this to the Multi-Format Serialization milestone May 8, 2022

cdboer mentioned this issue in #58, "54 Support more than one output format through argument chaining" (merged), Jun 12, 2022

cdboer closed this as completed in #58 Jun 12, 2022
