proposal: cmd/go: use the go version declared in the go.mod file to determine module boundaries and checksums #30369
Comments
@bcmills I think your alternative proposal is spot on:
I think the key insight is that a hash is only valid if you define both the hash algorithm and what its input is defined to be. In simple situations, such as a hash of a file or chunk, this is easy: you define the hash to be over the sequence of bytes of the file. But in Go, this hash is computed over many files in many folders. The hash version should reflect the input, not just the algorithm. I could easily see people changing an embedded go version in tools or manually for one reason or another, and most of the time nothing would break. But then in a corner case (a hash change), doing so would break go.sum. Don't make this a hidden dependency; tie it directly to the hash version.
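To make the "algorithm plus input" point concrete, here is a minimal sketch modeled on the h1: scheme: a SHA-256 over a sorted list of per-file SHA-256 lines, base64-encoded. The in-memory file map, the hashFiles name, and the example.com/m paths are assumptions for illustration; this is not the go command's own code.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hashFiles computes an h1-style digest over an in-memory file set.
// The map stands in for the extracted module tree (an assumption
// made so the example is self-contained).
func hashFiles(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // the digest must not depend on map order

	summary := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256(files[name])
		// One line per file: hex digest, two spaces, file name.
		fmt.Fprintf(summary, "%x  %s\n", sum[:], name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

func main() {
	full := map[string][]byte{
		"example.com/m@v1.0.0/go.mod":  []byte("module example.com/m\n"),
		"example.com/m@v1.0.0/m.go":    []byte("package m\n"),
		"example.com/m@v1.0.0/link.go": []byte("package m\n"),
	}
	// Same algorithm, different extraction: one file is dropped.
	pruned := map[string][]byte{
		"example.com/m@v1.0.0/go.mod": full["example.com/m@v1.0.0/go.mod"],
		"example.com/m@v1.0.0/m.go":   full["example.com/m@v1.0.0/m.go"],
	}
	fmt.Println(hashFiles(full))
	fmt.Println(hashFiles(pruned))
}
```

The two printed sums differ even though the hash function is identical, which is the situation the symlink change created: the extraction step changed the input.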
Note that the “input” in this case may be a If a user with an older If we change the name of the hash function, then that property will not hold: the older But the problem is even worse than that. Since the extraction algorithm changes the contents of the zipfile itself, a zipfile extracted using a newer Go version wouldn't necessarily match the checksum computed by an older version, and the files needed to recompute the correct checksum wouldn't necessarily even be present in the file (the change to the algorithm may have pruned them out). So we would still need some mechanism for older clients to determine whether to record their own checksum: changing the name of the hash function is still only a partial solution. |
On the other hand, it might be nice for older-version clients to be able to record the checksums for probably-incorrectly-extracted versions as well. (If we know that “the copy of module However, I would argue that that should be part of the path information stored in the So perhaps we could add some sort of path suffix (like we do today for the |
Regardless of how you handle the issue of versioning hash algorithms, it does seem like any update to the hash algo or verification process should also trigger a new point release on older supported versions of Go. Older versions of Go without the point release should be smart enough to reliably detect that something in the hash algo or process has changed and that they can't understand it, and then report that to the user and suggest an update. For security reasons, it seems like it would probably need to "fail" on an unknown hash algo, but at least the user messaging wouldn't be "don't trust go.sum" but "update your Go install". So I suppose another benefit of incrementing the hash algo number, as opposed to the go version, is that older clients would have a clear signal: "oh, I don't understand this, upgrade me", rather than "oh, that's a newer version of Go, maybe something changed?"
@kardianos Part of the point of this proposal is that older clients won't need to upgrade, as long as they're getting their modules from an up-to-date module proxy. If the hash function hasn't changed and the zipfile format hasn't changed, why should the client consuming that zipfile and hashing it with that function need to change? But perhaps we could adjust the behavior a bit. Perhaps we should only fail to record checksums for newer versions, but still verify them: we could emit the “you need to upgrade” warning instead of “checksum mismatch” only if the checksum fails and the required version is newer (as in #28221). Then, the cases would be:
The interesting case is:
Edited the proposal per the above comment, with the “fail open” behavior. (I'm open to arguments that we should choose “fail closed” instead, but given the assumption that we don't have a checksum at all I'm skeptical of the benefit — especially given that that should be a very rare condition.) |
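The "fail open" behavior discussed in this thread can be sketched as a small decision function. This is only an illustration under stated assumptions: the Outcome type, its values, and checkModule are hypothetical names, and the real cmd/go code is structured differently.

```go
package main

import "fmt"

// Outcome is a hypothetical name for the result of checking one module.
type Outcome string

const (
	Verified    Outcome = "verified against go.sum"
	Recorded    Outcome = "checksum recorded in go.sum"
	Provisional Outcome = "provisional: used, but checksum not recorded"
	Unsupported Outcome = "error: unsupported go version"
	Mismatch    Outcome = "error: checksum mismatch"
)

// checkModule decides what to do with a freshly computed checksum.
// supported reports whether this go tool understands the go version
// declared by the module; recorded is the existing go.sum entry, or ""
// if there is none; computed is the checksum of what was just fetched.
func checkModule(supported bool, recorded, computed string) Outcome {
	switch {
	case recorded == "" && supported:
		return Recorded
	case recorded == "":
		// Fail open: no existing checksum to contradict, so proceed
		// without writing a possibly wrong sum.
		return Provisional
	case recorded == computed:
		return Verified
	case !supported:
		// The mismatch may come from extraction differences, so ask
		// the user to upgrade instead of reporting tampering.
		return Unsupported
	default:
		return Mismatch
	}
}

func main() {
	fmt.Println(checkModule(false, "", "h1:aaa"))
	fmt.Println(checkModule(false, "h1:aaa", "h1:bbb"))
	fmt.Println(checkModule(true, "h1:aaa", "h1:bbb"))
}
```

A "fail closed" variant would return an error in the Provisional case instead of proceeding.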
@kardianos, note that that is not the behavior today. We emit a warning if the go/src/cmd/go/internal/modfetch/fetch.go Lines 417 to 419 in 73b803e
Here's how I'm thinking about this (correct me if I have anything wrong):
The goal of this proposal is to make it so that new versions of Z can be introduced without breaking older clients. This works by tying Z to the |
I guess what I'm not clear on is what the current stated purpose of the In particular, it doesn't seem like there's any mechanism for a client at an older version (say go1.12) to exclude modules that declare a newer version (say go1.20). That means that an old client may need to verify zip files produced by a newer proxy. That seems okay under this proposal. However, if an old client is not using a proxy at all, they won't be able to verify Another concern: what if you want to change Z between minor versions? What if it changes multiple times during development of a new Go version? Different sums will be produced before and after each change. |
They will, but only for the subset of modules whose extraction isn't affected by the change. So that doesn't give us free rein in how we change
Under this proposal, that is not permissible.
As long as most modules during development are still on the old version, that's fine. That suggests that we should be careful to bump the default |
I don't think we should add this complexity. Instead we should make the module extraction as simple as possible. Yes, there was a bug involving symlinks, and we were able to fix it because modules were very young. Now the algorithm is the algorithm. Let's leave it there. Any bugs that remain are now features. |
Will this algorithm ever be exposed or formally documented? Or will all proxies forever have to use
@marwan-at-work, proxies should forever use |
At the moment, the problems (and bugs) we have encountered relating to module boundaries have not been severe enough to warrant the extreme churn this proposal would cause. I am withdrawing it until (and unless) such a severe problem is found.
This proposal has been declined as retracted. |
Summary
Use the go version declared in the go.mod file to determine the boundaries of the module's source code.
Continue to record and verify checksums for every go.mod file loaded during a build, regardless of its version.
Background
In the fix for #27093, we changed the module loader to drop symlinks in repositories when converting them to modules. That changed the contents of some modules, and therefore their hashes, and rendered the contents of some existing go.sum files invalid (#29278). In retrospect, that was a mistake: we should never give users a reason to delete or otherwise mistrust their go.sum files, because that undermines the very purpose of go.sum files: a checksum mismatch should be treated as a potential security threat, not just a bug in the go tool.
At some point, we will probably find another bug in module extraction, or decide to make a change in how we compute module boundaries (such as ignoring go.mod files in testdata directories for #27852, or pruning out vendor directories for #30240). If and when we do, we should be careful not to break existing go.sum files.
This proposal attempts to build on #28221 to provide a safe means to make such changes.
Detail
Just as the go version determines the semantics in effect for the compiler, it should also determine the semantics of the module loader. A given release of the go tool may understand how to load arbitrarily many versions, and patch releases for older versions may even support newer versions.
If the go version used to extract the module does not support the go version declared by that module, fetch the module according to the closest supported version instead. If we have an existing checksum for the module and it does not match, fail with an “unsupported go version” warning. If we do not have an existing checksum, mark the module as provisional and do not record the new checksum (per #28835).
h1: to h2: (even though the checksum algorithm itself doesn't need to change) to indicate the reliability of that checksum. (An h1 checksum might indicate a correctly-computed sum for an incorrectly-extracted module.)
However, do continue to record and verify checksums for all go.mod files regardless of the go version in use.
go version declared in that go.mod file: otherwise, an attacker could inject a module using a known-unsupported go version in order to disable source verification.
In the .info files served by module proxies, include both the version of the go tool used to extract the module, and the go language version actually selected by that tool. For example, if the cmd/go binary from go 1.15.2 only supports the semantics of go 1.13 and above, and is used to extract a module that declares go 1.12, the .info file would indicate:
This allows module proxies to serve up-to-date checksums even for older or newer clients: if the proxy indicates that the module was extracted using an appropriate go version, then the client can still verify that the zip file matches the recorded checksum, and can still add the checksum to its go.sum file, even though it cannot reproduce that zip file by re-extracting that module from the origin.
Edits:
go version declared in the go.mod file to determine module boundaries and checksums #30369 (comment).
(CC @rsc @jayconrod @FiloSottile @hyangah @heschik @katiehockman)