Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiple outputs from a single scan #789

Closed
pombredanne opened this issue Oct 4, 2017 · 24 comments
Closed

Support multiple outputs from a single scan #789

pombredanne opened this issue Oct 4, 2017 · 24 comments

Comments

@pombredanne
Copy link
Member

It would be nice to have multiple outputs created at once. Today this requires re-running a scan as a scan has only a single output file.

@sschuberth
Copy link
Collaborator

This probably should be supported via something like --format json,spdx-tv.

@pombredanne
Copy link
Member Author

@sschuberth How would you pass the path to the output file for each format then?
What about something like this?
--format-json foo/bat/scan.json --format-spdx-tv this/that/scan.spdx

@sschuberth
Copy link
Collaborator

Ah, sorry, I was too much thinking about how our own pre-processor takes command line arguments. We have a --format option that takes a comma-separated list of formats, and an --output-dir option that specifies the single output directory for all formats. The actual file base name is derived from the input directory / file.

If that's not flexible enough, I could also imagine --format to take a comma-separated list of file paths, and the output format is derived from the file extension, failing if it's not any of the known extensions.

I believe both suggestions above are more generic than having separate --format-XXX options for format XXX. Esp. if plugins may add new formats, is it even possible for plugins to add new command line options to the main program?

@pombredanne
Copy link
Member Author

@sschuberth you wrote

We have a --format option that takes a comma-separated list of formats, and an --output-dir option that specifies the single output directory for all formats. The actual file base name is derived from the input directory / file.

I have thought of this approach, but that sounds to me as difficult to understand and removes control by the user of which files are created exactly. Passing a list of comma-separated file paths and inferring the format based on this feels also like a possible source of confusion..

I find it though to get the right CLI UI/UX here!
I wonder if there are other CLI tools that allow to specify multiple output and how they deal with it?

Or may be this is the wrong way to look at this entirely? What if instead we had ONLY a json output in ScanCode and a few additional small tools that could convert this output to another format (CSV, HTML, SPDX, etc)?

@sschuberth
Copy link
Collaborator

What if instead we had ONLY a json output in ScanCode and a few additional small tools that could convert this output to another format

That's what we do right now, after all, as we need your own "flavor" of SPDX-TV. While that approach certainly has its advantages, the one thing that I don't like about it is that you lose visibility over what output formats are supported. I simply like it when scancode --help tells me about that.

@pombredanne
Copy link
Member Author

@sschuberth so you want scancode --help to tell you about all the available formats, as it should.
Right now, the help list only brief format "code names" right after the --format option.
If we were to use multiple options such as --json <path to output file>, -- spdx-tv <path to output file> we could still have a compact display in the help and these could still be grouped together.

Current way

  output:
    --strip-root           Strip the root directory segment of all paths. The
                           default is to always include the last directory
                           segment of the scanned path such that all paths have
                           a common root directory. This cannot be combined with
                           `--full-root` option.
    --full-root            Report full, absolute paths. The default is to always
                           include the last directory segment of the scanned
                           path such that all paths have a common root
                           directory. This cannot be combined with the `--strip-
                           root` option.
    -f, --format <format>  Set <output_file> format to one of: csv, html, html-
                           app, json, json-pp, jsonlines, spdx-rdf, spdx-tv or
                           use <format> as the path to a custom template file
                           [default: json]
    --verbose              Print verbose file-by-file progress messages.
    --quiet                Do not print summary or progress messages.

New way could be this:

  output:
  Available output formats   csv, html, html-app, json, json-pp, jsonlines,
                             spdx-rdf, spdx-tv or custom-format
                             [default: json]

    --json <file>            Save results to <file> in JSON compact format.
    --json-pp <file>         Save results to <file> in JSON pretty printed format.
    --jsonlines <file>       Save results to <file> in JSON lines format.
    --csv <file>             Save results to <file> in CSV format.
    --spdx-tv <file>         Save results to <file> in SPDX Tag-Value format.
                             This forces the --info scan option.
    --spdx-rdf <file>        Save results to <file> in SPDX RDF XML format.
                             This forces --info scan option.
    --html <file>            Save results to <file> in HTML format.
    --custom-template <file> Use the <file> as a custom format template [default:xyz]
    --custom  <file>         Save results to <file> in using a custom template format.
    --strip-root             Strip the root directory segment of all paths. The
                             default is to always include the last directory
                             segment of the scanned path such that all paths have
                             a common root directory. This cannot be combined with
                             `--full-root` option.
    --full-root              Report full, absolute paths. The default is to always
                             include the last directory segment of the scanned
                             path such that all paths have a common root
                             directory. This cannot be combined with the `--strip-
                             root` option.
    --verbose                Print verbose file-by-file progress messages.
    --quiet                  Do not print summary or progress messages.

You would still have a brief summary. Could something like this work for you?

@eclipsewebmaster
Copy link

Subscribed, that would be useful for us. Thank you

@pombredanne
Copy link
Member Author

@sschuberth any feedback on my last comment?
@eclipsewebmaster any input on the proposed user experience?

@sschuberth
Copy link
Collaborator

Sorry for my late feedback, @pombredanne. I somewhat like your suggested "New way" command line help, however, I'd prefer to have all output format options prefixed with --output-*, like --output-json, --output-json-pp etc. to better explain what the format is about. In exchange, we probably can omit the explicit grouping of command line options.

Finally, I don't think there's much value in a separate brief summary.

@pombredanne
Copy link
Member Author

@sschuberth Thank you! FWIW the grouping comes for free from the fact that these are output plugins. I like your explicit --output prefix anyway much better and no summary needed.

As a side note (as least on POSIX) we could also add shell completion (which is built in @mitsuhiko Click library)

pombredanne added a commit that referenced this issue Jan 17, 2018
 * each plugin just process a Codebase (though there is still some code
   left to cleanup)
 * renamed options to --output
 * multiple outputs are now possible for #789

Signed-off-by: Philippe Ombredanne <[email protected]>
@pombredanne
Copy link
Member Author

Fixed in #885 !

@pombredanne
Copy link
Member Author

Actually I have second thoughts about the new options names and their output prefix.
Here is what we have in develop after #885 merge:

 output formats:
    --json FILE             Write scan output as compact JSON to FILE.
    --json-pp FILE          Write scan output as pretty-printed JSON to FILE.
    --json-lines FILE       Write scan output as JSON Lines to FILE.
    --output-csv FILE       Write scan output as CSV to FILE.
    --output-html FILE      Write scan output as HTML to FILE.
    --output-custom FILE    Write scan output to FILE formatted with the custom
                            Jinja template file.
    --custom-template FILE  Use this Jinja template FILE as a custom template.
    --output-html-app FILE  Write scan output as a mini HTML application to
                            FILE.
    --output-spdx-rdf FILE  Write scan output as SPDX RDF to FILE.
    --output-spdx-tv FILE   Write scan output as SPDX Tag/Value to FILE.

And I wonder if it would not be better to have instead:

    --json FILE             Write scan output as compact JSON to FILE.
    --json-pp FILE          Write scan output as pretty-printed JSON to FILE.
    --json-lines FILE       Write scan output as JSON Lines to FILE.
    --csv FILE              Write scan output as CSV to FILE.
    --html FILE             Write scan output as HTML to FILE.
    --output-custom FILE    Write scan output to FILE formatted with the custom
                            Jinja template file.
    --custom-template FILE  Use this Jinja template FILE as a custom template.
    --html-app FILE         Write scan output as a mini HTML application to
                            FILE.
    --spdx-rdf FILE         Write scan output as SPDX RDF to FILE.
    --spdx-tv FILE          Write scan output as SPDX Tag/Value to FILE.

@sschuberth @eclipsewebmaster ping: any preferences? If not I might go with the later

@pombredanne pombredanne reopened this Feb 9, 2018
@sschuberth
Copy link
Collaborator

What I don't like about the first proposal is that the "--json*" options inconsistently have no "--output" prefix, but I guess there's no way to change that due to backwards compatibility.

What I don't like about the second proposal is that --output-custom, in contrast to the other options, still has the "--output" prefix. Granted, just --custom might be a bit nondescriptive, so how about --custom-output instead? That would group nicely then with --custom-template as both have the "--custom" prefix to emphasize they should be used together.

So bottom line, if --output-custom was renamed to --custom-output I'd go with the second proposal.

@pombredanne
Copy link
Member Author

re:

What I don't like about the first proposal is that the "--json*" options inconsistently have no "--output" prefix, but I guess there's no way to change that due to backwards compatibility.

This breaking compatibility in terms of CLI api anyway so we can do whatever we want alright!

So let update things for custom... note that we could also have multiple forms assigned to each option (e.g. -j, --json, --output-json could very much all be available too.

(Side note the latest Click will likely have built-in bash completion per https://github.com/pallets/click/blame/5210849f8fd0678998723014c711c93af6edd5ce/click/_bashcomplete.py thanks to @stopthatcow afaik)

@sschuberth
Copy link
Collaborator

note that we could also have multiple forms assigned to each option

I would not over-do it. Let's have one long option for any of these, and one short option only for the most commonly used ones.

@mjherzog
Copy link
Member

mjherzog commented Feb 9, 2018

I agree that we need to be cautious about short options since there are so many options. And the better solution for "remembering" the most commonly used options is a configuration file.

@pombredanne
Copy link
Member Author

@mjherzog indeed. This is tracked in #520

@pombredanne
Copy link
Member Author

Carrying over from #959 @JonoYang commented a day ago
Some output formats are prefixed with "output" while others are not. For the sake of consistency and simplicity, "output" should be dropped as a prefix to these output options.

yash-nisar added a commit to yash-nisar/scancode-toolkit that referenced this issue Mar 3, 2018
Some output formats were prefixed with "output" while others were not.
For the sake of consistency and simplicity, "output" is dropped as a prefix
to these output options.

Signed-off-by: Yash Nisar <[email protected]>
yash-nisar added a commit to yash-nisar/scancode-toolkit that referenced this issue Mar 4, 2018
Some output formats were prefixed with "output" while others were not.
For the sake of consistency and simplicity, "output" is dropped as a prefix
to these output options.

Signed-off-by: Yash Nisar <[email protected]>
@pombredanne
Copy link
Member Author

So here the latest proposal on this based on @sschuberth excellent proposal in #789 (comment) and a little reordering:

    --json FILE             Write scan output as compact JSON to FILE.
    --json-pp FILE          Write scan output as pretty-printed JSON to FILE.
    --json-lines FILE       Write scan output as JSON Lines to FILE.
    --csv FILE              Write scan output as CSV to FILE.
    --spdx-rdf FILE         Write scan output as SPDX RDF to FILE.
    --spdx-tv FILE          Write scan output as SPDX Tag/Value to FILE.
    --html FILE             Write scan output as HTML to FILE.
    --html-app FILE         Write scan output as a mini HTML application to
                            FILE.
    --custom-output FILE    Write scan output to FILE formatted with the custom
                            Jinja template file.
    --custom-template FILE  Use this Jinja template FILE as a custom template.

@mjherzog
Copy link
Member

The consistent use of FILE in the help text seems to make it clear that these are all output options. +1

yash-nisar added a commit to yash-nisar/scancode-toolkit that referenced this issue Apr 10, 2018
Some output formats were prefixed with "output" while others were not.
For the sake of consistency and simplicity, "output" is dropped as a prefix
to these output options.

Signed-off-by: Yash Nisar <[email protected]>
yash-nisar added a commit to yash-nisar/scancode-toolkit that referenced this issue Apr 11, 2018
Some output formats were prefixed with "output" while others were not.
For the sake of consistency and simplicity, "output" is dropped as a prefix
to these output options.

Signed-off-by: Yash Nisar <[email protected]>
yash-nisar added a commit to yash-nisar/scancode-toolkit that referenced this issue Apr 11, 2018
Some output formats were prefixed with "output" while others were not.
For the sake of consistency and simplicity, "output" is dropped as a prefix
to these output options.

Signed-off-by: Yash Nisar <[email protected]>
yash-nisar added a commit to yash-nisar/scancode-toolkit that referenced this issue Apr 11, 2018
Some output formats were prefixed with "output" while others were not.
For the sake of consistency and simplicity, "output" is dropped as a prefix
to these output options.

Signed-off-by: Yash Nisar <[email protected]>
yash-nisar added a commit to yash-nisar/scancode-toolkit that referenced this issue Apr 11, 2018
Some output formats were prefixed with "output" while others were not.
For the sake of consistency and simplicity, "output" is dropped as a prefix
to these output options.

Signed-off-by: Yash Nisar <[email protected]>
yash-nisar added a commit to yash-nisar/scancode-toolkit that referenced this issue Apr 12, 2018
Some output formats were prefixed with "output" while others were not.
For the sake of consistency and simplicity, "output" is dropped as a prefix
to these output options.

Signed-off-by: Yash Nisar <[email protected]>
@pombredanne
Copy link
Member Author

#961 is merged in develop now.

@pombredanne
Copy link
Member Author

I am doing some cleanup and review as a prep for the 3.0 release... and since this is fixed in develop, I am closing this!

@Shaker1978
Copy link

Excuse me, this thread here was returned to me as a google result. Is there ANY documentation about the contents of the custom template file written in Jinja? I have no clue about the variable / object names that can be used in such a Jinja template of a scancode result... Tried many expressions from other google results like {{ results }} or {{ files_count }} (because it says so in the json result) - I can't get more than an empty file with the custom output.

@pombredanne
Copy link
Member Author

@Shaker1978 good point ... there is little or no doc for that. The only "doc" would be the example default HTML output template at https://github.com/nexB/scancode-toolkit/blob/bec80ec900a69285ea4e545dc0755061243a5638/src/formattedcode/templates/html/template.html and the code that renders the Jinja template at https://github.com/nexB/scancode-toolkit/blob/bec80ec900a69285ea4e545dc0755061243a5638/src/formattedcode/output_html.py#L165

I reckon this is pretty crass and light. Let's have a ticket to track this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants