Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache all packages.Load calls in a central object #988

Merged
merged 1 commit into from
Feb 4, 2020
Merged

Conversation

vektah
Copy link
Collaborator

@vektah vektah commented Jan 15, 2020

The idea here is to have a central proxy to packages.Load that can be invalidated as files are written out, reducing the number of package.Load calls way down.

Would love to get your thoughts @vikstrous.

Alternative to #945 and #943 that includes some fairly heavy refactor.

Updated timing for one of our apps:

v0.10

real 0m7.437s
user 0m17.978s
sys 0m1.751s

master

real 0m5.677s
user 0m10.742s
sys 0m1.205s

package-cache

real 0m2.987s
user 0m5.286s
sys 0m0.608s

@coveralls
Copy link

Coverage Status

Coverage increased (+1.1%) to 64.465% when pulling 9c87c28 on package-cache into c6b3e2a on master.

@vikstrous
Copy link
Contributor

vikstrous commented Jan 19, 2020

I'm seeing pretty different results from you:

package-cache
37.77user 18.94system 0:23.36elapsed 242%CPU (0avgtext+0avgdata 197848maxresident)k
0inputs+0outputs (1major+387685minor)pagefaults 0swaps
43.19user 19.77system 0:26.45elapsed 238%CPU (0avgtext+0avgdata 201328maxresident)k
0inputs+0outputs (1major+372326minor)pagefaults 0swaps

master (ae79e75)
39.62user 22.05system 0:26.89elapsed 229%CPU (0avgtext+0avgdata 195872maxresident)k
0inputs+0outputs (1major+428290minor)pagefaults 0swaps
45.03user 23.43system 0:29.11elapsed 235%CPU (0avgtext+0avgdata 201928maxresident)k
0inputs+0outputs (1major+408359minor)pagefaults 0swaps

v0.10.2
37.58user 21.98system 0:25.15elapsed 236%CPU (0avgtext+0avgdata 196608maxresident)k
0inputs+0outputs (1major+434955minor)pagefaults 0swaps
37.08user 21.68system 0:24.70elapsed 237%CPU (0avgtext+0avgdata 194676maxresident)k
0inputs+0outputs (1major+435000minor)pagefaults 0swaps

v0.10.1 (without skip_validation and without disabling model generation)
132.90user 77.27system 1:24.02elapsed 250%CPU (0avgtext+0avgdata 605844maxresident)k
0inputs+0outputs (1major+1054116minor)pagefaults 0swaps
111.52user 62.92system 1:13.44elapsed 237%CPU (0avgtext+0avgdata 187124maxresident)k
0inputs+0outputs (1major+760962minor)pagefaults 0swaps

no-double-load (rebased on master)
26.90user 10.25system 0:16.76elapsed 221%CPU (0avgtext+0avgdata 197608maxresident)k
0inputs+0outputs (1major+200472minor)pagefaults 0swaps
25.54user 10.30system 0:16.09elapsed 222%CPU (0avgtext+0avgdata 188176maxresident)k
0inputs+0outputs (1major+195789minor)pagefaults 0swaps

@vektah
Copy link
Collaborator Author

vektah commented Jan 19, 2020

I'm seeing pretty different results from you:

The app I'm testing against has:

  • modelgen on: turning it off would be a fairly large chore. Lots of manual wiring would need to be done even to just bring in existing models.
  • autobind off: are you using wildcards in your autoload path? Could that be causing a large difference?
  • validation off

master (ae79e75)

master should include c6b3e2a, I was testing with ec4f6b1

I cant see any reason this should be any slower than no-double-load, it should be loading the same packages but fewer times. Could you try running this branch with this patch applied to log each call to packages.Load?

diff --git a/internal/code/packages.go b/internal/code/packages.go
index b14c45a..2bf10ba 100644
--- a/internal/code/packages.go
+++ b/internal/code/packages.go
@@ -2,6 +2,7 @@ package code
 
 import (
        "bytes"
+       "fmt"
        "path/filepath"
 
        "github.com/pkg/errors"
@@ -43,6 +44,7 @@ func (p *Packages) LoadAll(importPaths ...string) []*packages.Package {
 
        if len(missing) > 0 {
                p.numLoadCalls++
+               fmt.Println("LOAD", missing)
                pkgs, err := packages.Load(&packages.Config{Mode: mode}, missing...)
                if err != nil {
                        p.loadErrors = append(p.loadErrors, err)
@@ -85,6 +87,7 @@ func (p *Packages) LoadWithTypes(importPath string) *packages.Package {
        pkg := p.Load(importPath)
        if pkg == nil || pkg.TypesInfo == nil {
                p.numLoadCalls++
+               fmt.Println("LOAD", importPath)
                pkgs, err := packages.Load(&packages.Config{Mode: mode}, importPath)
                if err != nil {
                        p.loadErrors = append(p.loadErrors, err)
@@ -118,6 +121,7 @@ func (p *Packages) NameForPackage(importPath string) string {
        if pkg == nil {
                // otherwise do a name only lookup for it but dont put it in the package cache.
                p.numNameCalls++
+               fmt.Println("LOAD NAMES", importPath)
                pkgs, err := packages.Load(&packages.Config{Mode: packages.NeedName}, importPath)
                if err != nil {
                        p.loadErrors = append(p.loadErrors, err)
-- 
2.20.1

to see whats actually getting loaded when? compare with something similar on no-double-load?

@vikstrous
Copy link
Contributor

Here's a redacted version of what I'm seeing with your print statements. I added "MISSING" on line 47 to differentiate the two places where loads happen.

LOAD MISSING [x/models]
LOAD MISSING [github.com/99designs/gqlgen/graphql/introspection x/service1resolver x/service2resolver x/service3resolver x/service4resolver github.com/99designs/gqlgen/graphql/introspection]
LOAD github.com/99designs/gqlgen/graphql
LOAD x/redacted1
LOAD x/redacted2
LOAD x/redacted3
LOAD x/redacted4
LOAD x/redacted5
LOAD x/redacted6
LOAD x/redacted7
LOAD x/redacted8
LOAD NAMES github.com/vektah/gqlparser
LOAD MISSING [x/resolverroot x/resolverroot]

I think the big difference between multiple load calls and one big load call is that shared dependencies of the various packages. Our codebase is more tightly coupled than I would like and there's a lot of overlap between dependencies from each call.

I did an experiment:

multiLoadTest:

package main

import "golang.org/x/tools/go/packages"

var mode = packages.NeedName |
	packages.NeedFiles |
	packages.NeedImports |
	packages.NeedTypes |
	packages.NeedSyntax |
	packages.NeedTypesInfo

func main() {
	packages.Load(&packages.Config{Mode: mode}, "redacted")
	...
	packages.Load(&packages.Config{Mode: packages.NeedName}, "github.com/vektah/gqlparser")
	packages.Load(&packages.Config{Mode: mode}, "redacted")
}

singleLoadTest:

package main

import "golang.org/x/tools/go/packages"

var mode = packages.NeedName |
	packages.NeedFiles |
	packages.NeedImports |
	packages.NeedTypes |
	packages.NeedSyntax |
	packages.NeedTypesInfo |
	packages.NeedName

func main() {
	_, err := packages.Load(&packages.Config{Mode: mode}, redacted)
	if err != nil {
		panic(err)
	}
}

multiLoadTest:

29.04user 15.36system 0:17.14elapsed 259%CPU (0avgtext+0avgdata 91152maxresident)k
0inputs+0outputs (1major+377223minor)pagefaults 0swaps
39.74user 20.35system 0:22.61elapsed 265%CPU (0avgtext+0avgdata 89468maxresident)k
0inputs+0outputs (1major+348847minor)pagefaults 0swaps
32.08user 17.35system 0:19.00elapsed 260%CPU (0avgtext+0avgdata 90116maxresident)k
0inputs+0outputs (1major+359011minor)pagefaults 0swaps

singleLoadTest:

6.54user 3.19system 0:03.79elapsed 256%CPU (0avgtext+0avgdata 154904maxresident)k
0inputs+0outputs (1major+109919minor)pagefaults 0swaps
6.70user 3.35system 0:03.98elapsed 252%CPU (0avgtext+0avgdata 153816maxresident)k
0inputs+0outputs (1major+102084minor)pagefaults 0swaps

The conclusion is that this PR actually makes the situation worse, not better. Could you share the results of adding the print statements for your repo? Can you try running my experiment with your set of loads?

@vikstrous
Copy link
Contributor

btw, we have autobind on, validation off and model generation off

@vektah
Copy link
Collaborator Author

vektah commented Jan 19, 2020

Interesting. This PR should be doing a single load. Why aren't redacted being covered by ReferencedPackages in this branch but they are in package-cache?

@vektah
Copy link
Collaborator Author

vektah commented Feb 4, 2020

I'm going to roll forward with this - This gives us speed gains for the apps we can test with and centralizes a bunch of loading logic.

Would love to get to the bottom of why ReferencedPackages isnt covering your deps but that can happen on master.

@vektah vektah merged commit 8ed2ec5 into master Feb 4, 2020
@vektah vektah deleted the package-cache branch February 4, 2020 11:29
cgxxv pushed a commit to cgxxv/gqlgen that referenced this pull request Mar 25, 2022
Cache all packages.Load calls in a central object
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants