Support multiple outputs from a single scan #789

pombredanne · 2017-10-04T20:53:17Z

It would be nice to have multiple outputs created at once. Today this requires re-running a scan as a scan has only a single output file.

sschuberth · 2017-10-05T07:57:24Z

This probably should be supported via something like --format json,spdx-tv.

pombredanne · 2017-10-05T16:00:54Z

@sschuberth How would you pass the path to the output file for each format then?
What about something like this?
--format-json foo/bat/scan.json --format-spdx-tv this/that/scan.spdx

sschuberth · 2017-10-06T09:03:13Z

Ah, sorry, I was too much thinking about how our own pre-processor takes command line arguments. We have a --format option that takes a comma-separated list of formats, and an --output-dir option that specifies the single output directory for all formats. The actual file base name is derived from the input directory / file.

If that's not flexible enough, I could also imagine --format to take a comma-separated list of file paths, and the output format is derived from the file extension, failing if it's not any of the known extensions.

I believe both suggestions above are more generic than having separate --format-XXX options for format XXX. Esp. if plugins may add new formats, is it even possible for plugins to add new command line options to the main program?

pombredanne · 2017-10-10T03:22:58Z

@sschuberth you wrote

We have a --format option that takes a comma-separated list of formats, and an --output-dir option that specifies the single output directory for all formats. The actual file base name is derived from the input directory / file.

I have thought of this approach, but that sounds to me as difficult to understand and removes control by the user of which files are created exactly. Passing a list of comma-separated file paths and inferring the format based on this feels also like a possible source of confusion..

I find it though to get the right CLI UI/UX here!
I wonder if there are other CLI tools that allow to specify multiple output and how they deal with it?

Or may be this is the wrong way to look at this entirely? What if instead we had ONLY a json output in ScanCode and a few additional small tools that could convert this output to another format (CSV, HTML, SPDX, etc)?

sschuberth · 2017-10-10T07:09:07Z

What if instead we had ONLY a json output in ScanCode and a few additional small tools that could convert this output to another format

That's what we do right now, after all, as we need your own "flavor" of SPDX-TV. While that approach certainly has its advantages, the one thing that I don't like about it is that you lose visibility over what output formats are supported. I simply like it when scancode --help tells me about that.

pombredanne · 2017-10-10T18:43:30Z

@sschuberth so you want scancode --help to tell you about all the available formats, as it should.
Right now, the help list only brief format "code names" right after the --format option.
If we were to use multiple options such as --json <path to output file>, -- spdx-tv <path to output file> we could still have a compact display in the help and these could still be grouped together.

Current way

  output:
    --strip-root           Strip the root directory segment of all paths. The
                           default is to always include the last directory
                           segment of the scanned path such that all paths have
                           a common root directory. This cannot be combined with
                           `--full-root` option.
    --full-root            Report full, absolute paths. The default is to always
                           include the last directory segment of the scanned
                           path such that all paths have a common root
                           directory. This cannot be combined with the `--strip-
                           root` option.
    -f, --format <format>  Set <output_file> format to one of: csv, html, html-
                           app, json, json-pp, jsonlines, spdx-rdf, spdx-tv or
                           use <format> as the path to a custom template file
                           [default: json]
    --verbose              Print verbose file-by-file progress messages.
    --quiet                Do not print summary or progress messages.

New way could be this:

  output:
  Available output formats   csv, html, html-app, json, json-pp, jsonlines,
                             spdx-rdf, spdx-tv or custom-format
                             [default: json]

    --json <file>            Save results to <file> in JSON compact format.
    --json-pp <file>         Save results to <file> in JSON pretty printed format.
    --jsonlines <file>       Save results to <file> in JSON lines format.
    --csv <file>             Save results to <file> in CSV format.
    --spdx-tv <file>         Save results to <file> in SPDX Tag-Value format.
                             This forces the --info scan option.
    --spdx-rdf <file>        Save results to <file> in SPDX RDF XML format.
                             This forces --info scan option.
    --html <file>            Save results to <file> in HTML format.
    --custom-template <file> Use the <file> as a custom format template [default:xyz]
    --custom  <file>         Save results to <file> in using a custom template format.
    --strip-root             Strip the root directory segment of all paths. The
                             default is to always include the last directory
                             segment of the scanned path such that all paths have
                             a common root directory. This cannot be combined with
                             `--full-root` option.
    --full-root              Report full, absolute paths. The default is to always
                             include the last directory segment of the scanned
                             path such that all paths have a common root
                             directory. This cannot be combined with the `--strip-
                             root` option.
    --verbose                Print verbose file-by-file progress messages.
    --quiet                  Do not print summary or progress messages.

You would still have a brief summary. Could something like this work for you?

eclipsewebmaster · 2017-10-24T18:04:10Z

Subscribed, that would be useful for us. Thank you

pombredanne · 2017-10-25T21:39:14Z

@sschuberth any feedback on my last comment?
@eclipsewebmaster any input on the proposed user experience?

sschuberth · 2017-10-26T07:49:23Z

Sorry for my late feedback, @pombredanne. I somewhat like your suggested "New way" command line help, however, I'd prefer to have all output format options prefixed with --output-*, like --output-json, --output-json-pp etc. to better explain what the format is about. In exchange, we probably can omit the explicit grouping of command line options.

Finally, I don't think there's much value in a separate brief summary.

pombredanne · 2017-10-26T08:35:49Z

@sschuberth Thank you! FWIW the grouping comes for free from the fact that these are output plugins. I like your explicit --output prefix anyway much better and no summary needed.

As a side note (as least on POSIX) we could also add shell completion (which is built in @mitsuhiko Click library)

* each plugin just process a Codebase (though there is still some code left to cleanup) * renamed options to --output * multiple outputs are now possible for #789 Signed-off-by: Philippe Ombredanne <[email protected]>

pombredanne · 2018-02-07T16:16:58Z

Fixed in #885 !

pombredanne · 2018-02-09T05:54:27Z

Actually I have second thoughts about the new options names and their output prefix.
Here is what we have in develop after #885 merge:

 output formats:
    --json FILE             Write scan output as compact JSON to FILE.
    --json-pp FILE          Write scan output as pretty-printed JSON to FILE.
    --json-lines FILE       Write scan output as JSON Lines to FILE.
    --output-csv FILE       Write scan output as CSV to FILE.
    --output-html FILE      Write scan output as HTML to FILE.
    --output-custom FILE    Write scan output to FILE formatted with the custom
                            Jinja template file.
    --custom-template FILE  Use this Jinja template FILE as a custom template.
    --output-html-app FILE  Write scan output as a mini HTML application to
                            FILE.
    --output-spdx-rdf FILE  Write scan output as SPDX RDF to FILE.
    --output-spdx-tv FILE   Write scan output as SPDX Tag/Value to FILE.

And I wonder if it would not be better to have instead:

    --json FILE             Write scan output as compact JSON to FILE.
    --json-pp FILE          Write scan output as pretty-printed JSON to FILE.
    --json-lines FILE       Write scan output as JSON Lines to FILE.
    --csv FILE              Write scan output as CSV to FILE.
    --html FILE             Write scan output as HTML to FILE.
    --output-custom FILE    Write scan output to FILE formatted with the custom
                            Jinja template file.
    --custom-template FILE  Use this Jinja template FILE as a custom template.
    --html-app FILE         Write scan output as a mini HTML application to
                            FILE.
    --spdx-rdf FILE         Write scan output as SPDX RDF to FILE.
    --spdx-tv FILE          Write scan output as SPDX Tag/Value to FILE.

@sschuberth @eclipsewebmaster ping: any preferences? If not I might go with the later

sschuberth · 2018-02-09T12:00:40Z

What I don't like about the first proposal is that the "--json*" options inconsistently have no "--output" prefix, but I guess there's no way to change that due to backwards compatibility.

What I don't like about the second proposal is that --output-custom, in contrast to the other options, still has the "--output" prefix. Granted, just --custom might be a bit nondescriptive, so how about --custom-output instead? That would group nicely then with --custom-template as both have the "--custom" prefix to emphasize they should be used together.

So bottom line, if --output-custom was renamed to --custom-output I'd go with the second proposal.

pombredanne · 2018-02-09T13:25:25Z

re:

What I don't like about the first proposal is that the "--json*" options inconsistently have no "--output" prefix, but I guess there's no way to change that due to backwards compatibility.

This breaking compatibility in terms of CLI api anyway so we can do whatever we want alright!

So let update things for custom... note that we could also have multiple forms assigned to each option (e.g. -j, --json, --output-json could very much all be available too.

(Side note the latest Click will likely have built-in bash completion per https://github.com/pallets/click/blame/5210849f8fd0678998723014c711c93af6edd5ce/click/_bashcomplete.py thanks to @stopthatcow afaik)

sschuberth · 2018-02-09T14:17:04Z

note that we could also have multiple forms assigned to each option

I would not over-do it. Let's have one long option for any of these, and one short option only for the most commonly used ones.

mjherzog · 2018-02-09T18:27:33Z

I agree that we need to be cautious about short options since there are so many options. And the better solution for "remembering" the most commonly used options is a configuration file.

pombredanne · 2018-02-09T19:04:36Z

@mjherzog indeed. This is tracked in #520

pombredanne · 2018-03-03T03:40:12Z

Carrying over from #959 @JonoYang commented a day ago
Some output formats are prefixed with "output" while others are not. For the sake of consistency and simplicity, "output" should be dropped as a prefix to these output options.

Some output formats were prefixed with "output" while others were not. For the sake of consistency and simplicity, "output" is dropped as a prefix to these output options. Signed-off-by: Yash Nisar <[email protected]>

pombredanne · 2018-03-23T03:44:16Z

So here the latest proposal on this based on @sschuberth excellent proposal in #789 (comment) and a little reordering:

    --json FILE             Write scan output as compact JSON to FILE.
    --json-pp FILE          Write scan output as pretty-printed JSON to FILE.
    --json-lines FILE       Write scan output as JSON Lines to FILE.
    --csv FILE              Write scan output as CSV to FILE.
    --spdx-rdf FILE         Write scan output as SPDX RDF to FILE.
    --spdx-tv FILE          Write scan output as SPDX Tag/Value to FILE.
    --html FILE             Write scan output as HTML to FILE.
    --html-app FILE         Write scan output as a mini HTML application to
                            FILE.
    --custom-output FILE    Write scan output to FILE formatted with the custom
                            Jinja template file.
    --custom-template FILE  Use this Jinja template FILE as a custom template.

mjherzog · 2018-03-23T15:32:54Z

The consistent use of FILE in the help text seems to make it clear that these are all output options. +1

Some output formats were prefixed with "output" while others were not. For the sake of consistency and simplicity, "output" is dropped as a prefix to these output options. Signed-off-by: Yash Nisar <[email protected]>

pombredanne · 2018-04-12T15:59:43Z

#961 is merged in develop now.

pombredanne · 2018-08-06T09:59:57Z

I am doing some cleanup and review as a prep for the 3.0 release... and since this is fixed in develop, I am closing this!

Shaker1978 · 2019-02-08T15:27:15Z

Excuse me, this thread here was returned to me as a google result. Is there ANY documentation about the contents of the custom template file written in Jinja? I have no clue about the variable / object names that can be used in such a Jinja template of a scancode result... Tried many expressions from other google results like {{ results }} or {{ files_count }} (because it says so in the json result) - I can't get more than an empty file with the custom output.

pombredanne · 2019-02-11T17:00:29Z

@Shaker1978 good point ... there is little or no doc for that. The only "doc" would be the example default HTML output template at https://github.com/nexB/scancode-toolkit/blob/bec80ec900a69285ea4e545dc0755061243a5638/src/formattedcode/templates/html/template.html and the code that renders the Jinja template at https://github.com/nexB/scancode-toolkit/blob/bec80ec900a69285ea4e545dc0755061243a5638/src/formattedcode/output_html.py#L165

I reckon this is pretty crass and light. Let's have a ticket to track this!

pombredanne added enhancement GUI and outputs labels Oct 4, 2017

pombredanne mentioned this issue Oct 4, 2017

Pluggable output formats #381

Closed

pombredanne mentioned this issue Oct 24, 2017

Include Version Number of ScanCode Toolkit in Reports #814

Closed

pombredanne mentioned this issue Nov 16, 2017

Allow a plugin to provide more than one option #787

Closed

pombredanne mentioned this issue Jan 17, 2018

New plugin architecture #885

Merged

pombredanne closed this as completed Feb 7, 2018

pombredanne reopened this Feb 9, 2018

pombredanne mentioned this issue Mar 3, 2018

Simplify output option names #959

Closed

yash-nisar mentioned this issue Mar 6, 2018

Simplify output option names #789 #961

Merged

pombredanne added the fixed pending review label Apr 12, 2018

pombredanne closed this as completed Aug 6, 2018

pombredanne mentioned this issue Feb 11, 2019

Document data available to HTML templates #1359

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support multiple outputs from a single scan #789

Support multiple outputs from a single scan #789

pombredanne commented Oct 4, 2017

sschuberth commented Oct 5, 2017

pombredanne commented Oct 5, 2017

sschuberth commented Oct 6, 2017

pombredanne commented Oct 10, 2017

sschuberth commented Oct 10, 2017

pombredanne commented Oct 10, 2017

eclipsewebmaster commented Oct 24, 2017

pombredanne commented Oct 25, 2017

sschuberth commented Oct 26, 2017

pombredanne commented Oct 26, 2017

pombredanne commented Feb 7, 2018

pombredanne commented Feb 9, 2018

sschuberth commented Feb 9, 2018

pombredanne commented Feb 9, 2018

sschuberth commented Feb 9, 2018

mjherzog commented Feb 9, 2018

pombredanne commented Feb 9, 2018

pombredanne commented Mar 3, 2018

pombredanne commented Mar 23, 2018

mjherzog commented Mar 23, 2018

pombredanne commented Apr 12, 2018

pombredanne commented Aug 6, 2018

Shaker1978 commented Feb 8, 2019

pombredanne commented Feb 11, 2019

Support multiple outputs from a single scan #789

Support multiple outputs from a single scan #789

Comments

pombredanne commented Oct 4, 2017

sschuberth commented Oct 5, 2017

pombredanne commented Oct 5, 2017

sschuberth commented Oct 6, 2017

pombredanne commented Oct 10, 2017

sschuberth commented Oct 10, 2017

pombredanne commented Oct 10, 2017

eclipsewebmaster commented Oct 24, 2017

pombredanne commented Oct 25, 2017

sschuberth commented Oct 26, 2017

pombredanne commented Oct 26, 2017

pombredanne commented Feb 7, 2018

pombredanne commented Feb 9, 2018

sschuberth commented Feb 9, 2018

pombredanne commented Feb 9, 2018

sschuberth commented Feb 9, 2018

mjherzog commented Feb 9, 2018

pombredanne commented Feb 9, 2018

pombredanne commented Mar 3, 2018

pombredanne commented Mar 23, 2018

mjherzog commented Mar 23, 2018

pombredanne commented Apr 12, 2018

pombredanne commented Aug 6, 2018

Shaker1978 commented Feb 8, 2019

pombredanne commented Feb 11, 2019