improve robustness of linking to license on hosting website #73

Open
Bobgy opened this issue Jun 25, 2021 · 8 comments · May be fixed by #110

Comments

@Bobgy
Collaborator

Bobgy commented Jun 25, 2021

In v2, I implemented some utilities that resolve the GitHub repo from the go-import meta tag (requested with ?go-get=1) and use it to generate public, versioned links to each detected license on its hosting website (for now, only GitHub).

I noticed some harder problems:

  1. distinguishing "major branch" and "major subdirectory" conventions

There is one problem: for a major version greater than 1, the templates for “major branch” and “major subdirectory” conventions differ (See https://research.swtch.com/vgo-module for a discussion of these conventions.) To determine the right template, make a HEAD request for the go.mod file using each template, and select the one that succeeds. For example, for module github.com/a/b/v2 at version v2.3.4, probe both github.com/a/b/blob/v2.3.4/go.mod (the location of the go.mod file using the “major branch” convention) and github.com/a/b/blob/v2.3.4/v2/go.mod (its location using “major subdirectory”).

  2. support modules not at the root of a repo, for example https://github.com/Azure/go-autorest/tree/autorest/v0.9.0. Note that the tags are also different: a tag "autorest/v0.9.0" means version v0.9.0 of the module ROOT/autorest. https://github.com/googleapis/google-cloud-go/tree/master/storage is another example; its tags have a "storage/" prefix.
  3. support other source hosting websites
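
To make the probing idea in problem 1 concrete, here is a minimal sketch in Go. The names `goModCandidates`, `detectConvention`, and `headOK` are hypothetical and not part of go-licenses or pkgsite. The sketch assumes a GitHub-style `/blob/<version>/` URL layout, as in the quoted example, and checks the subdirectory location first, on the assumption that a repo using the "major subdirectory" convention usually still has a go.mod at its root for v0/v1.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// goModCandidates returns the two go.mod URLs to probe for a module with a
// major version suffix: the "major branch" location (repo root) and the
// "major subdirectory" location (inside ./vN/).
func goModCandidates(repo, majorSuffix, version string) (majorBranch, majorSubdir string) {
	base := "https://" + strings.TrimSuffix(repo, "/") + "/blob/" + version
	return base + "/go.mod", base + "/" + majorSuffix + "/go.mod"
}

// headOK reports whether a HEAD request to url returns a 2xx status.
// It is one possible probe; callers may inject any other.
func headOK(url string) bool {
	resp, err := http.Head(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 300
}

// detectConvention probes the subdirectory go.mod first, then the root one.
// It returns "major subdirectory", "major branch", or "" if neither exists.
func detectConvention(repo, majorSuffix, version string, probe func(string) bool) string {
	branchURL, subdirURL := goModCandidates(repo, majorSuffix, version)
	switch {
	case probe(subdirURL):
		return "major subdirectory"
	case probe(branchURL):
		return "major branch"
	}
	return ""
}

func main() {
	// A fake probe keeps the example offline; pass headOK for real requests.
	branchURL, _ := goModCandidates("github.com/a/b", "v2", "v2.3.4")
	onlyRoot := func(url string) bool { return url == branchURL }
	fmt.Println(detectConvention("github.com/a/b", "v2", "v2.3.4", onlyRoot))
}
```

Injecting the probe as a function keeps the decision logic testable without network access.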

Potential Solution

@wlynch pointed out the following references. pkgsite has an internal source package that does exactly this: it figures out the repo hosting website for a Go import path and produces a public link to the source code. However, the package is internal, so we cannot import it directly.

I'll ask whether they are ready to make it public; otherwise I will have to vendor it in some way.

EDIT: the reply is that we need to vendor it: golang/go#40477 (comment).

References

@Bobgy Bobgy changed the title [v2] properly handle both "major branch" and "major subdirectory" conventions [v2] improve robustness of linking to license on hosting website Jun 25, 2021
@Bobgy
Collaborator Author

Bobgy commented Jan 5, 2022

I noticed that problems 2 and 3 are mostly solved by the pkgsite/source package.
However, problem 1 (distinguishing the "major branch" and "major subdirectory" conventions) may still cause incorrect remote URLs.

We still need to keep this issue open.

@Bobgy
Collaborator Author

Bobgy commented Jan 23, 2022

Giving a breaking example for case 2 "support modules not at root":

$ go-licenses csv cloud.google.com/go/storage
...
cloud.google.com/go/storage, https://github.com/googleapis/google-cloud-go/blob/storage/v1.10.0/storage/LICENSE, Apache-2.0
...

Note that the URL https://github.com/googleapis/google-cloud-go/blob/storage/v1.10.0/storage/LICENSE is broken; the correct URL is https://github.com/googleapis/google-cloud-go/blob/storage/v1.10.0/LICENSE. The problem is caused by the following:

  • for modules in a subdir of a repo, when Go caches the module files and finds that the submodule does not have a LICENSE file, it "magically" copies the LICENSE file from the repo root into the submodule, e.g. https://github.com/googleapis/google-cloud-go/tree/storage/v1.10.0/storage
  • therefore, go-licenses finds a LICENSE file at the root of the submodule and guesses that its remote URL is also at the root of the submodule, while the actual LICENSE file is at the root of the repo

Note, adopting pkgsite/source allowed us to get the correct tag storage/v1.10.0 for this repo, but we still hit this LICENSE file path problem.
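
As an illustration of the two locations involved, the sketch below builds both candidate URLs for a module in a subdirectory: the module directory first (what go-licenses currently guesses), then the repo root. The helpers `moduleTag` and `licenseCandidates` are hypothetical names, and the GitHub `/blob/<tag>/` layout is assumed from the URLs above.

```go
package main

import (
	"fmt"
	"path"
)

// moduleTag returns the git tag for a module that lives in moduleDir
// (relative to the repo root, "" for the root) at the given version.
// For example, "storage" and "v1.10.0" give "storage/v1.10.0".
func moduleTag(moduleDir, version string) string {
	if moduleDir == "" {
		return version
	}
	return moduleDir + "/" + version
}

// licenseCandidates lists the blob URLs where the license may actually live:
// inside the module directory, and (for nested modules) at the repo root.
func licenseCandidates(repoURL, moduleDir, version, licenseFile string) []string {
	base := repoURL + "/blob/" + moduleTag(moduleDir, version) + "/"
	urls := []string{base + path.Join(moduleDir, licenseFile)}
	if moduleDir != "" {
		urls = append(urls, base+licenseFile)
	}
	return urls
}

func main() {
	for _, u := range licenseCandidates(
		"https://github.com/googleapis/google-cloud-go", "storage", "v1.10.0", "LICENSE") {
		fmt.Println(u)
	}
}
```

For cloud.google.com/go/storage the first candidate is the broken URL and the second is the correct one, so the list on its own does not say which is right; it only narrows down where to look.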

@Bobgy
Collaborator Author

Bobgy commented Jan 23, 2022

Examples for problem 1: distinguishing "major branch" and "major subdirectory" conventions

Major branch (result is correct)

Major branch: a new major version is released in a branch, source code is at root of repo.
gopkg.in/yaml.v2
License: https://github.com/go-yaml/yaml/blob/v2.4.0/LICENSE

Major subdirectory (incorrect)

Major subdir: a new major version is released in a subdir in the same branch as v1, source code for v2 is at a subdir ./v2/
github.com/googleapis/gax-go/v2
License: got https://github.com/googleapis/gax-go/blob/v2.1.1/v2/LICENSE, but should be https://github.com/googleapis/gax-go/blob/v2.1.1/LICENSE

Therefore, the root cause of this failure is in fact the same as #73 (comment): the guessed URL is incorrect for a module that is not at the root of a repo.

@Bobgy
Collaborator Author

Bobgy commented Jan 24, 2022

Added a v2 proposal roadmap item: validate the license URL by fetching it. That way we can detect these failures and either turn the URL into unknown or try other locations, finally verifying that the file content is exactly the same as the local file. With these workarounds, we can mitigate the issue of users unknowingly getting an invalid URL.

@Bobgy
Collaborator Author

Bobgy commented Feb 3, 2022

Furthermore, we can solve all above broken cases by:

  1. Infer remote license URL as usual
  2. Fetch raw license file from remote, validate it's the same as the locally found license file
  3. If step 2 fails, try to validate the LICENSE at the repo root instead
  4. If everything fails, return UNKNOWN
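
The steps above can be sketched as follows. This is only an illustration under stated assumptions: `resolveLicenseURL`, `fakeFetcher`, and the `unknownURL` constant are hypothetical names, the candidate URLs are expected to be ordered (inferred URL first, repo-root URL second), and the fetcher is injected so that a real implementation could download the raw file over HTTP.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// unknownURL is the placeholder returned when no candidate can be validated.
const unknownURL = "UNKNOWN"

// resolveLicenseURL returns the first candidate URL whose remote content is
// byte-for-byte identical to the locally found license file. Candidates that
// fail to fetch or whose content differs are skipped.
func resolveLicenseURL(local []byte, candidates []string, fetch func(url string) ([]byte, error)) string {
	for _, u := range candidates {
		remote, err := fetch(u)
		if err != nil {
			continue // e.g. 404 for a wrongly guessed path
		}
		if bytes.Equal(remote, local) {
			return u
		}
	}
	return unknownURL
}

// fakeFetcher serves content from an in-memory map; it stands in for an
// HTTP download so the example runs offline.
func fakeFetcher(files map[string][]byte) func(string) ([]byte, error) {
	return func(url string) ([]byte, error) {
		if body, ok := files[url]; ok {
			return body, nil
		}
		return nil, errors.New("not found: " + url)
	}
}

func main() {
	local := []byte("example license text")
	fetch := fakeFetcher(map[string][]byte{
		"https://example.com/repo/blob/v1.0.0/LICENSE": local,
	})
	candidates := []string{
		"https://example.com/repo/blob/v1.0.0/sub/LICENSE", // inferred URL
		"https://example.com/repo/blob/v1.0.0/LICENSE",     // repo root
	}
	fmt.Println(resolveLicenseURL(local, candidates, fetch))
}
```

Comparing exact bytes is the strictest check; a real implementation might need to tolerate differences such as line endings, which this sketch does not attempt.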

@Bobgy Bobgy changed the title [v2] improve robustness of linking to license on hosting website improve robustness of linking to license on hosting website Apr 11, 2022
@dschmidt

dschmidt commented Sep 6, 2022

Could you export a (versioned) URL to the root of the repo as well?
Possibly a breaking change to add it to the CSV, but it could be added to the data available to templates.

I'm creating a licenses page in my web app and would like to link the package name to the respective github (or wherever) page.

@Bobgy
Collaborator Author

Bobgy commented Sep 6, 2022

> Possibly a breaking change to add it to the CSV

The CSV format is fixed, so I would not modify it.

> but it could be added to the data available to templates.

A PR is welcome; this isn't too hard.

@dschmidt

dschmidt commented Sep 6, 2022

Okies, already started and have it basically working - unfortunately I won't have time to polish/finish it this/next week, but will do when I get to it.
