-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gomod: Do not attempt to parse transitive dependencies #2880
gomod: Do not attempt to parse transitive dependencies #2880
Conversation
@@ -37,7 +37,7 @@ | |||
describe "parse" do | |||
subject(:dependencies) { parser.parse } | |||
|
|||
its(:length) { is_expected.to eq(6) } | |||
its(:length) { is_expected.to eq(5) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer pull in rsc.io/sampler
since it's not required
|
||
command = "go mod edit -print > /dev/null" | ||
command += " && go list -m -json all" | ||
command = "go get > /dev/null" # Verify the packages are installable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does add a bit of overhead, and as discussed in the PR body, I doubt that it's necessary. The upside is that we fail early in the sad path, the downside is we have to duplicate error handling between this class and the GoModUpdater
, and it adds some overhead in the happy path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this slow down the pull request opening latency? I assume beause we're running this twice, this means there will be significant overhead as you meant. I'm in favor of removing it. In fact maye this method's only concern should be return the list of required packages
. If those packages are installable or valid should be handled probably somewhere else. Thoughts on that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this slow down the pull request opening latency?
A bit, it's hard to say how significant it would be, we definitely perform some heavier operations in the actual updating step, but I noticed our tests for this this class increased by ~40 seconds after adding this go get
vs main
, and our test go.mod
files are typically very small. When we remove the go get
, we speed up this step significantly as we do not need to make any network calls.
I'm in favor of removing it. In fact maye this method's only concern should be return the list of required packages. If those packages are installable or valid should be handled probably somewhere else. Thoughts on that ?
I tend to agree! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah 👍 on removing go get
from this step, seems like we're overloading the parser by doing resolvability that should be handled by the update checker/file updater.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍🏼 Just left a comment about go get
behavior, otherwise I'm glad we changed this bit to more efficient. I think if we don't add go get
and because we removed go list
, this will also significantly improve the time we open pull requests for Go modules.
dependency = dependency_from_details(dep) | ||
dependency_set << dependency if dependency | ||
required_packages.each do |dep| | ||
dependency_set << dependency_from_details(dep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the simplification here 👍🏼
@@ -78,17 +69,18 @@ def module_info | |||
end | |||
|
|||
File.write("go.mod", go_mod_content) | |||
File.write("main.go", "package dummypkg\n func main() {}\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we add this here? I think it would be nice to add some comment maybe explaining the reasoning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good call, we add it so that we can run go get
, we perform this step in a temp directory where we copy in just the go.mod
and go.sum
files, and without the main.go
we get go get .: path is not a package in module rooted at
.
I agree that we should clarify this if we indeed decide to keep it, but if we get rid of go get
, we can also nuke this step
|
||
command = "go mod edit -print > /dev/null" | ||
command += " && go list -m -json all" | ||
command = "go get > /dev/null" # Verify the packages are installable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this slow down the pull request opening latency? I assume beause we're running this twice, this means there will be significant overhead as you meant. I'm in favor of removing it. In fact maye this method's only concern should be return the list of required packages
. If those packages are installable or valid should be handled probably somewhere else. Thoughts on that ?
I just want to make sure I understand the problem this solves. Does the following situation capture it correctly? Project |
Yes, I think that's exactly the scenario 👍 with the caveat that package |
Sorry for all the questions - I just want to clarify this bit:
By "required", do you mean like "needed by the project", as opposed to I ask because we probably will want to support updating of transitive dependencies at some point. Though if it's causing problems right now, I'm happy to pull it out and we can re-add it down the line. |
No worries!
No, the scenario where this happens is actually a little different, apologies for not spotting that:
In this case, module We still grab all the (have shared the actual dependency chain privately as they contain private repo's) |
So, the downside of this approach is that we do not parse any regular transitive dependencies, only those listed in the Example provided by
Since we do not update transitive dependencies for go modules, I think we should update the FileParser to omit them entirely, and also remove the Alternatively we could parse the package parser
import (
"bytes"
"encoding/json"
"fmt"
"io/ioutil"
"path/filepath"
"strings"
"golang.org/x/mod/module"
)
type result struct {
Packages []module.Version
}
func ParseSumfile() (interface{}, error) {
data, err := ioutil.ReadFile("go.sum")
if err != nil {
return nil, err
}
file, err := filepath.Abs("go.sum")
if err != nil {
return nil, err
}
fmt.Println(file)
m, err := readGoSum(file, data)
res := result{Packages: m}
p, err := json.Marshal(res)
if err != nil {
return nil, err
}
return string(p), nil
}
// emptyGoModHash is the hash of a 1-file tree containing a 0-length go.mod.
// A bug caused us to write these into go.sum files for non-modules.
// We detect and remove them.
const emptyGoModHash = "h1:G7mAYYxgmS0lVkHyy2hEOLQCFB0DlQFTMLWggykrydY="
// readGoSum parses data, which is the content of go.sum and returns the parsed module versions
func readGoSum(file string, data []byte) ([]module.Version, error) {
dst := []module.Version{}
lineno := 0
for len(data) > 0 {
var line []byte
lineno++
i := bytes.IndexByte(data, '\n')
if i < 0 {
line, data = data, nil
} else {
line, data = data[:i], data[i+1:]
}
f := strings.Fields(string(line))
if len(f) == 0 {
// blank line; skip it
continue
}
if len(f) != 3 {
return dst, fmt.Errorf("malformed go.sum:\n%s:%d: wrong number of fields %v", file, lineno, len(f))
}
if f[2] == emptyGoModHash {
// Old bug; drop it.
continue
}
mod := module.Version{Path: f[0], Version: f[1]}
dst = append(dst, mod)
}
return dst, nil
} But I would suggest we cross that bridge when we do decide to start updating transitive dependencies. cc @fatih wdyt? |
@jurre the current approach looks good for me if we don't want to return the transitive dependencies. It makes things easier in terms of readability as we only take care of listing the direct dependencies. As a bonus, we can later add a |
I'd rather get rid of all transitive dependencies than parse the |
👍 I agree, that's what I've done in the latest commit FWIW, the Go code I posted was an alternative, but I agree it's probably problematic for the reasons you mentioned |
@jurre if you rebase/force push this branch ci should kick in, accidentally turned off actions while testing their repo select UI last week 🤦 |
e39a7a6
to
7265bd9
Compare
@thepwagner I would love to get your thoughts on the trade-offs made in this PR. We're currently still running |
Previously we would run `go list -m -json all` to generate a list of dependencies to attempt to update. The way that this is implemented, it causes _every_ transitive dependency to be listed, even if they are not required by this project. This can happen when a module defines several packages, but only a subset of those packages is used by the current repository. It will also verify that each of those transitive dependencies is available, and this causes issues when we rely on a dependency with private transitive dependencies. We run into the scenario where we have access to any of the transitive dependencies that are used, but since we do not have access to an _unused_ transitive dependency, we run into a `dependency_file_not_resolvable` error. This is especially unfortunate, since we won't attempt to update _any_ indirect dependencies for go_modules anyway. By using `go mod edit -json` instead, we get an easier to parse list of packages that are _actually_ used in this repo. In order to replicate the previous behavior of raising a `DependencyFileNotResolvable` error when some of the packages are not available, we now also run a `go get`, but it's debatable if we even need to do this, since this error will be raised in subsequent steps (in the `FileUpdater`), and not raising it here means that we can consolidate our error handling for this scenario in one place. Co-authored-by: Fatih Arslan <[email protected]>
Also cleans up any error handling that can only occur from running `go get`, or any other command that actually downloads the modules.
7265bd9
to
18d48b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dig the FileParser
's reduction in scope to literally parsing the file.
We could probably do that without affecting transitive dependency support, but per discussion transitive dependencies will take out-of-scope follow-ups no matter what. This LGTM for what's supported today.
dependency = dependency_from_details(dep) | ||
dependency_set << dependency if dependency | ||
required_packages.each do |dep| | ||
dependency_set << dependency_from_details(dep) unless dep["Indirect"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: is skipping Indirect
dependencies a requirement of this change?
I get that they're filtered out later, but is that behaviour correct?
I don't think we should scope creep into supporting indirect dependencies in this PR - more surprised that we're not currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: is skipping Indirect dependencies a requirement of this change?
It's more of a consequence, really. go list -m
only includes transitive dependencies that are included in the go.mod
file, which apparently are only transitive dependencies that aren't go modules.
We filter indirect dependencies out much more explicitly in the UpdateChecker though, and seems to be a deliberate limitation of dependabot currently.
Since #2880 , we no longer detect unavailable git dependencies in the `FileParser`. We didn't add the functionality to catch them downstream: in the `UpdateChecker` and `FileUpdater`. This is the former.
Previously we would run
go list -m -json all
to generate a list ofdependencies to attempt to update.
The way that this is implemented, it causes every transitive
dependency to be listed, even if they are not required by this project.
This can happen when a module defines several packages, but only a
subset of those packages is used by the current repository.
It will also verify that each of those transitive dependencies is
available, and this causes issues when we rely on a dependency with
private transitive dependencies. We run into the scenario where we have
access to any of the transitive dependencies that are used, but since we
do not have access to an unused transitive dependency, we run into a
dependency_file_not_resolvable
error.This is especially unfortunate, since we won't attempt to update any
indirect dependencies for go_modules anyway.
By using
go mod edit -json
instead, we get an easier to parse list ofdependencies, and we grab just the direct ones.
In order to replicate the previous behavior of raising a
DependencyFileNotResolvable
error when some of the packages are notavailable, we now also run a
go get
, but it's debatable if we evenneed to do this, since this error will be raised in subsequent steps (in
the
FileUpdater
), and not raising it here means that we canconsolidate our error handling for this scenario in one place.
Co-authored-by: Fatih Arslan [email protected]