Support more than one output format through argument chaining #54

Closed
sdruskat opened this issue Mar 10, 2022 · 8 comments · Fixed by #58

@sdruskat

It'd be great if you were able to write more than one format in a single run, e.g. gitlab2prov -t {token} -f json -f rdf -f xml -p {url} -r 1 > provout/{outfilename}.{format}. This would help when building a larger dataset with provenance data. It should also be fairly easy to achieve using click or typer.

@sdruskat sdruskat changed the title Support more than one output format through param chaining Support more than one output format through argument chaining Mar 10, 2022
@cdboer
Collaborator

cdboer commented Mar 10, 2022

Sounds good! It shouldn't be necessary to rerun the tool just to get the provenance data serialized in all of the desired formats. I'll add this to the feature list for releases after v1.0.

I'm not that familiar with click yet. Do you mean command chaining or multi value options / multiple options? And how would you redirect the output to three different files?

@cdboer cdboer self-assigned this Mar 10, 2022
@cdboer cdboer added the enhancement New feature or request label Mar 10, 2022
@cdboer
Collaborator

cdboer commented Mar 28, 2022

@sdruskat I don't know if I understand your example correctly. I'll try to walk myself through it; correct me if I got it wrong. You want to be able to explicitly state multiple serialization formats and get gitlab2prov to serialize the generated provenance graph in all stated formats, with one file per format, right? Allowing multiple formats for one command line flag is not the issue; that can be done with click, argparse or comparable libs. But how would you redirect the output to three different files? Is that even possible in a bash shell?

If the problem boils down to converting an already existing provenance document from one serialization format into another (e.g. rdf to provn or json to xml), then you could use provconvert to do so. I think it's bundled with each install of the ProvToolbox. You could chain gitlab2prov and provconvert together if you want:

gitlab2prov -t {token} -f rdf > provout/doc.rdf
provconvert -infile provout/doc.rdf -outfile provout/doc.provn
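
On the two questions raised in this comment: a single `>` redirect sends stdout to exactly one file, so the tool has to open the output files itself. Accepting a repeated `-f` flag is the easy part. The following is a minimal sketch using argparse from the standard library. The parser, the `-o/--outfile` option, the program name and the list of format choices are assumptions made for illustration; this is not gitlab2prov's actual CLI definition.

```python
import argparse


def parse_args(argv):
    # Hypothetical parser for illustration; not gitlab2prov's real CLI.
    parser = argparse.ArgumentParser(prog="gitlab2prov-sketch")
    parser.add_argument(
        "-f", "--format",
        dest="formats",
        action="append",  # every -f occurrence appends one value to the list
        choices=["json", "rdf", "xml", "provn"],
        default=None,     # avoid the shared-default-list pitfall of "append"
    )
    parser.add_argument("-o", "--outfile", default="provout/doc")
    args = parser.parse_args(argv)
    if args.formats is None:
        args.formats = ["json"]  # assumed fallback when no -f is given
    return args


def output_paths(outfile, formats):
    # One target path per requested format, e.g. provout/doc.rdf
    return [f"{outfile}.{fmt}" for fmt in formats]


args = parse_args(["-f", "json", "-f", "rdf", "-f", "xml", "-o", "provout/doc"])
print(output_paths(args.outfile, args.formats))
```

The file name pattern (`outfile` plus the format as extension) replaces the shell redirect, which is why an explicit output path becomes necessary once more than one format is requested.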

@sdruskat
Author

sdruskat commented Apr 1, 2022

> I'm not that familiar with click yet. Do you mean command chaining or multi value options / multiple options? And how would you redirect the output to three different files?

Multiple options is what I meant, see the example in the first comment.

As for the redirection question, I haven't looked at the source code yet to understand how you write files. It was my assumption that you build an internal model, then dump that into files...

> @sdruskat I don't know if I understand your example correctly. I'll try to walk myself through it; correct me if I got it wrong. You want to be able to explicitly state multiple serialization formats and get gitlab2prov to serialize the generated provenance graph in all stated formats, with one file per format, right? ✅ Allowing multiple formats for one command line flag is not the issue; that can be done with click, argparse or comparable libs. But how would you redirect the output to three different files? Is that even possible in a bash shell?

I guess this would include adding serialization functionality in gitlab2prov directly, including perhaps an argument (or fixed value) to take a pattern for naming files.

E.g. gitlab2prov ... -f rdf -f json -f xml and then (perhaps multithreaded):

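def write(outformat):
    # Write the already retrieved data to one file per format, e.g. out.json
    with open(filename + '.' + outformat, 'w') as outfile:
        outfile.write(data)

# Retrieval happens once; only the writing step repeats per format
for outformat in formats:
    write(outformat)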

I'd need to look at the code to suggest something more concrete. The idea is that the retrieval step is done only once, instead of having to do it n times for n formats.
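
The "retrieve once, write n times" idea can be made concrete with a small, self-contained sketch. Everything in it is a stand-in: `retrieve`, the toy model, the `SERIALIZERS` table and the JSON/XML output are hypothetical and only illustrate the control flow, not gitlab2prov's internal model or its PROV serializers.

```python
import json
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def retrieve():
    # Stand-in for the expensive retrieval step; it is called exactly once.
    return {"entity": "commit-1", "agent": "alice"}


def to_json(model):
    return json.dumps(model, indent=2)


def to_xml(model):
    root = ET.Element("document")
    for key, value in model.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical mapping from format name to serializer function.
SERIALIZERS = {"json": to_json, "xml": to_xml}


def write_all(model, outfile, formats):
    # Serialize the same in-memory model once per requested format.
    written = []
    for fmt in formats:
        path = Path(f"{outfile}.{fmt}")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(SERIALIZERS[fmt](model), encoding="utf-8")
        written.append(path)
    return written


model = retrieve()
with tempfile.TemporaryDirectory() as tmp:
    paths = write_all(model, str(Path(tmp) / "provout" / "example"), ["json", "xml"])
    print([p.name for p in paths])
```

Unlike the snippet above, each format gets its own serializer here, since the same string cannot be valid JSON and XML at once. Writing the files is cheap compared to retrieval, so multithreading is probably not needed for this step.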

@cdboer cdboer added this to the Multi-Format Serialization milestone May 8, 2022
@cdboer
Collaborator

cdboer commented May 12, 2022

I implemented multi-format mode as a cli subcommand in the branch linked in this issue. Usage with multiple formats now looks like this:

gitlab2prov -p {url} -t {token} multi-format -f json xml rdf -o provout/example
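
For readers wondering how a command line of this shape can be parsed, here is a rough sketch with argparse: global options first, then a subcommand whose `-f` takes several values. The long option names, the `choices` list, the program name and the example URL are assumptions; the linked branch may be implemented quite differently.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the command shape; not the real CLI code.
    parser = argparse.ArgumentParser(prog="gitlab2prov-sketch")
    parser.add_argument("-p", "--project-url", required=True)
    parser.add_argument("-t", "--token", required=True)

    subcommands = parser.add_subparsers(dest="command")
    multi = subcommands.add_parser("multi-format")
    multi.add_argument(
        "-f", "--format",
        dest="formats",
        nargs="+",  # one flag, one or more values: -f json xml rdf
        choices=["json", "rdf", "xml", "provn"],
        required=True,
    )
    # The base name is required because stdout cannot hold several documents.
    multi.add_argument("-o", "--outfile", required=True)
    return parser


args = build_parser().parse_args([
    "-p", "https://gitlab.example.com/group/project",
    "-t", "TOKEN",
    "multi-format", "-f", "json", "xml", "rdf", "-o", "provout/example",
])
print(args.command, args.formats, args.outfile)
```

With `nargs="+"`, the value list ends at the next option (`-o` here), which is why the formats can be given without repeating `-f`.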

A config file for multi-format mode now looks like this:

[GITLAB]
project_urls = project_a_url, project_b_url
token = token

[OUTPUT]
format = json, rdf, xml
outfile = provout/example

[MISC]
profile = False
verbose = False
double_agents = path/to/alias/mapping
pseudonymous = False

The config option outfile is required when you want to serialize in multiple formats.
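
A config file like the one above could be read along these lines with configparser. This is only a sketch: the helper names are made up, and the rule that `outfile` must be set as soon as more than one format is listed is my reading of the sentence above, not code taken from the branch.

```python
import configparser

CONFIG_TEXT = """
[GITLAB]
project_urls = project_a_url, project_b_url
token = token

[OUTPUT]
format = json, rdf, xml
outfile = provout/example
"""


def split_list(value):
    # "json, rdf, xml" -> ["json", "rdf", "xml"]
    return [item.strip() for item in value.split(",") if item.strip()]


def read_output_settings(text):
    # Hypothetical helper; section and option names follow the example config.
    config = configparser.ConfigParser()
    config.read_string(text)
    formats = split_list(config.get("OUTPUT", "format"))
    outfile = config.get("OUTPUT", "outfile", fallback=None)
    if len(formats) > 1 and not outfile:
        raise ValueError("outfile is required when more than one format is configured")
    return formats, outfile


formats, outfile = read_output_settings(CONFIG_TEXT)
print(formats, outfile)
```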

@sdruskat
Author

> I implemented multi-format mode as a cli subcommand in the MR linked in this issue.

Nice one! That should fix this issue. Looking forward to this being merged.

@sdruskat
Author

I guess the docs in the README should still be updated in the branch to reflect changes?

@cdboer
Collaborator

cdboer commented May 12, 2022

Yes, I'll update the README before merging.

@cdboer
Collaborator

cdboer commented Jun 12, 2022

@sdruskat @LynnvK multi-format mode is now merged
