-
-
Notifications
You must be signed in to change notification settings - Fork 552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create "data dictionary" for all SCTK fields #2008
Comments
From comments in the gitter/discuss channel that would be important:
Tim Hatch @thatch
|
hey @AyanSinhaMahapatra @pombredanne autodoc is an extension of the sphinx which is used to include documentation from docstrings that is available inside the source code, so there are many in-built methods that are available in the autodoc extension like automodule, autoclass and autoexception etc.
For more refer this So like this way, we can only parse that data that we want to show in docs more specifically, since the data we want to show, is only a small part of the classes, i.e. there will be (a lot) of functions and methods also documented which we don't want. which only takes out certain docstrings/class attributes having the documentation for that data field, and creates the doc from there, so like this way we can automate the docs for scancode data |
@Robin025 the information about autodoc/automodule/autoclass is known, but the issue here is, we want to document attribute members of classes, (not all members, only the ones that end up in the result data selectively). Take an example of this. Here in
Now, we aren't documenting functions/entire classes as they exist in scancode now. We have to document these attributes of classes, selectively, and in all of scancode and it's plugins. So, if we have to use the autodoc extension, we have to write one new class each for each of the attributes (there are a lot of them) that we want to document, write the docs in their docstring, and then use autoclass to collect all the docs from there. I'm not opposed to this, but we should consider all the options. If you see the suggestion of @thatch above, The documentation generation part would be easier this way, because nothing has to be done to collect these, but in the original suggestion above yours, it's more cleaner code wise and seems a better way to me, though some exploration has to be done on the collection and docs part. So consider the method suggested in the comment above by @pombredanne , and look into the scancode data by generating some scan results and looking where the data for the attributes are located, and let us know if you have any questions there. |
We need a comprehensive "data dictionary" of all fields that may be present in a ScanCode output file.
The minimum requirement is a list of fields with type (single value, list or ?) and description. This should, of course, be versioned SCTK releases.
We have some of this information for the CSV output at https://scancode-toolkit.readthedocs.io/en/latest/cli-reference/output-format.html#custom-output-format (which may not be current), but we do not have it for the full set of output fields in the JSON output.
There is currently no single file in the codebase (like Django models), but two files seem to contain most of the field definitions:
A first step should be to investigate whether there are existing automated documentation tools for Python that would help us get started.
This Issue supersedes #112 which is pretty stale at this point.
The text was updated successfully, but these errors were encountered: