-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
generation performance #918
Comments
The CPU profile: |
Another CPU profile: My first idea for improving things is: can we deduplicate the work of loading package information by allowing a single gqlgen command to generate the schemas for all 5 of the servers so that the dependency graph doesn't have to be re-generated 5 times? |
Yeah packages is pretty slow, it shells out to go list. Do you have any cgo? If so try to put an interface in front of the package so gqlgen doesnt need to walk down that far. |
Hmm... yeah, we do have cgo somewhere down the rabbit hole of dependencies. It's likely imported somewhere. Let me know if there's an easy way to confirm. |
I think you can run with |
Ok, so I have some good news and some bad news. The bad news is that I think that the plugins API needs to be changed. The good news is that I think I improved performance by at least 10x for non-trivial projects. I was able to speed up our go generate in CI by 1.5 mins. FindingsThe main insight that I got after trial and error and messing around with profiling and modifying the gqlgen code is that the vast majority of the time is spent in packages.Load and in build.Import. I'm opening a pull request to show the changes that I made to reduce the time it takes to run gqlgen on my (weak) laptop from 60 seconds to 6. See #926 The main change I made was to remove redundant calls to package.Load and load all relevant packages up front. This required some changes and may have broken some features, but at least for us it didn't change the output. NameForPackage has to be changed from a global function with a global cache to a struct so that it can use the package list that's pre-loaded. I also found the post-validation kind of redundant and it was causing yet another packages.Load, so I disabled it. We don't use models generation and so I disabled it rather than trying to adapt it to the new upfront loading change. Immediate action items
Other opportunities for performance gainspacakges.Load in my PR ended up having Imports pruning is taking up a solid 30+% of the time after my optimizations. This can be further improved by tracking which imports are referenced some other way during generation time rather than doing a post-processing step. Another idea: Loading the configs for all 5 of our servers at the same time, doing a single Beyond this, it's just a matter of shuffling bytes around effectively, maybe improving templating efficiency, avoiding ReadAll, etc. There's no need to go there yet. |
I'll copy my comments over from that PR so this issue stays up to date: Nice exploration, good to see someone dig in here. Load packages only once at the startSadly, I dont think its going to be this straightforward. The reason for multiple packages.Load calls is plugins change the code on disk for a package (eg modelgen). If we want to go to a single pass we need a way for plugins to invalidate and refetch some packages, this didnt exist in packages when the code was written but maybe it does now. One of the ideas we threw around at the time was using one of the long running language servers, which should handle reloading packages better - eg gopls or guru. But we would need to be careful that go generate can still run cleanly inside a docker container where inotify events can be quite when using docker for mac and volume mounts. Allow for models generation to be disabledI'm happy to add this to the default binary behind a flag but we need to make sure it keeps working. You can already do it now if you roll your own binary, just dont add the modelgen plugin. QualifyPackagePath shouldn't be used in go mod modeSadly, I think you can still use vendor with modules. Maybe we can check if thats not set and modules is turned on and make this function a no-op? I wouldnt want to drift away from using build for when it is needed though, path resolution in vendor mode is really complicated and trying to keep an implentation in sync with the users go version is going to cause heaps of bugs. Allow for optionally disabling post-generation validationIts probably still worth leaving on by default, because it does surface issues binding to user code that isnt correct (eg a hand written interface model that doesnt satisfy the interface). But maybe these cases should be treated as bugs and we should be validating them more during generation. Imports pruning is taking up a solid 30+% of the time after my optimizations.In templates we can have hardcoded references to packages (eg graphql.Response), this makes the templates easier to write and read. We could add something to track it like
instead of
But I think it would be quite error prone, and makes it quite hard to read. Loading the configs for all 5 of our servers at the same time, doing a singleI think this is probably better addressed by using gopls or guru to maintain some persistent cache rather than us trying to do it ourselves. |
Thanks for your thorough consideration of each part! Load packages only once at the start / Allow for models generation to be disabled /Allow for optionally disabling post-generation validationThis was really most of the gains. If it ultimately comes down to 1 load for certain optional plugins and 1 load for the rest of the system, we would still be in great shape compared to now. Changing the plugins to work with one pass is clearly a non-starter right now, but if we consolidated all the non-plugin package.Load calls, we would get most of the wins. With the two new flags and this, we'd get huge gains. My only fear here is that to do the single packages.Load here I had to move QualifyPackagePath shouldn't be used in go mod modeUnfortunately, this was was pretty important... It might even be more important than the packages.Load. I just tried changing only this and nothing else and it by itself gives me a 2x improvement. If you think this is too dangerous to auto-detect, maybe we can make it a flag? Imports pruning is taking up a solid 30+% of the time after my optimizations.Sure, let's not do this yet, but getImport is not a bad idea because hardcoded imports in templates might cause more problem than just this. What if there's an import name conflict? What if there's a local variable with the same name? (I guess the latter is still a problem actually) using gopls / guru / generating multiple servers at onceI would love to know what this would mean. Were you thinking of starting an instance of the gopls server binary and making requests to it over its API? Or something else? Seems like this would be a pretty massive change one way or another, so I'm ok with not digging too deep into this yet given how many low hanging fruits we already have. |
If we can find a good way to detect both go modules and
With more thought, I think this might actually be impossible to remove. its not just removing imports its also doing the go formatting pass too.
the |
Actually, I just realized that it's very hard to know if the BTW, I think for detecting if we are in a go module, execing In the worst case scenario, I'd be okay with the lightweight "go mod" mode being opt-in through a flag, but I do understand how insanely confusing it would be to know whether or not to use the flag. Is it possible for gqlgen to somehow not have to care about whether the package it's inspecting is vendored or not?
You can see in the trace below that we are double parsing the file by writing it to disk and then reading it. Maybe this is just because we are using the limited interfaces provided by golang.org/x/tools. I'd like to point out that if we could manipulate the AST directly rather than using text templating, we'd be able to just write it out once with By the way, I tried out Anyway, these are small gains compared to the main 2 changes I'm hoping for: packages.Load and QualifyPackagePath. Let's not get distracted from those fixes.
Ah, makes sense. Thanks. |
We already have code for finding the go.mod file in the
Yeah maybe its better to opt into -mod=vendor, I wonder how many poeple actually uses it? Its probably also safe to assume they are used to dealing with these issues at some point, so may be more likely to search for the flag.
I opened an issue complaining that packages forces this on its users, but it didnt get fixed. I wish I could link to it, but it seems x/tools github repo now has turned off issues. 🎉
Thats surprising, that code is actually just a copy of |
Actually, sorry, I'm getting some diffs in the generated output on my repo. I'll have to really make sure each change doesn't affect the output because it's very easy to break things with PRs like this. |
Ok, it works now. There was a bug in #945 that's fixed now. |
I've merged the low risk stuff to get a release out. Will take a closer look at the remaining ones for next release. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
What happened?
We've been using gqlgen for a long time and it has been taking almost 1.5 mins to generate from scratch now and 30 seconds to re-generate with no changes. We have 5 graphql servers that share a lot of resolvers. In total this is still multiple minutes in the best case. This is the slowest part of our
go generate
process.The "from scratch" number if after running
go clean -r ./...
.The machine I'm running on is linux with an SSD and 16 GB ram and 4 cores.
In the from scratch case, it used 600MB memory and in the no changes case, 200MB.
On a less powerful machine, I've given up entirely on running gqlgen.
What did you expect?
Generation time should be measured in seconds, not minutes.
Minimal graphql.schema and models to reproduce
Our project is very large and closed source. I can't share too much. I'll share some stats for one of the servers:
versions
I'm looking for a few things:
Let me know if you guys have a chat I can join so we can discuss ideas.
Edit: here's the output from
time -v
for the best case scenario with no changes:The text was updated successfully, but these errors were encountered: