
Performance Umbrella Issue #1409

Closed
Poincare opened this issue Mar 13, 2017 · 59 comments

@Poincare
Contributor

Poincare commented Mar 13, 2017

This issue is meant to describe current tasks related to improving Apollo Client store performance and to summarize what we currently know about Apollo Client's performance characteristics.

The primary issue we are concerned with at the moment is store read performance. This is probably the operation Apollo Client performs most often, since apps don't write new results or send new queries over the network very frequently. We know a few things about store read performance as measured through the benchmarks.

No-fetch query vs diffQueryAgainstStore

There are two ways we can look at cache read performance. We can either fire a client.query({ noFetch: true }) call or we can directly call diffQueryAgainstStore. The former also counts the time it takes to run all of the surrounding logic (e.g. setting up and tearing down the ObservableQuery, delivering the result to the query observers, etc.) whereas the latter does not.
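For concreteness, here's a rough sketch of how the two measurements differ. It assumes the 1.x API (`noFetch`, a Redux-backed store reachable via `client.store`, and `diffQueryAgainstStore` taking an options object); exact import paths and signatures may vary by version:

```ts
import ApolloClient from 'apollo-client';
import gql from 'graphql-tag';
// Assumed import path for the 1.x internals; may differ across releases.
import { diffQueryAgainstStore } from 'apollo-client/data/readFromStore';

const query = gql`{ items { id name } }`;

async function compareReadPaths(client: ApolloClient) {
  // Path 1: the full query pipeline, answered from the cache only.
  let start = performance.now();
  await client.query({ query, noFetch: true });
  console.log('client.query (noFetch):', performance.now() - start, 'ms');

  // Path 2: raw denormalization, no ObservableQuery setup or teardown.
  const store = client.store.getState().apollo.data; // 1.x Redux store shape
  start = performance.now();
  diffQueryAgainstStore({ store, query });
  console.log('diffQueryAgainstStore:', performance.now() - start, 'ms');
}
```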

So, if we compare these two on reading a GraphQL result that contains a list of objects out of the store, we can tell whether most of the time is spent on denormalization or on the machinery surrounding the query. Results:

[graph: query vs. diffQueryAgainstStore read times by number of associated objects]

As is evident from the above, query takes ~10-15x as long as diffQueryAgainstStore to read a result with the same number of associated objects. This probably implies that our logic within QueryManager and ObservableQuery can be improved, and it also suggests that the denormalization inside diffQueryAgainstStore isn't the primary culprit, at least for the item counts considered in the plot.

REC

As mentioned in #1242, referential equality checking (REC) imposes some cost on the store read times as measured through a no-fetch query. Here are the reproduced results:

[graph: no-fetch query read times with REC (blue) vs. without REC (pink)]

The blue line is with REC and the pink line is without. We can also see that resultMapper and addPreviousResultToIdValues (both tied to REC, afaik) play a significant role in the CPU profile for the noFetch query benchmark:

[CPU profile: resultMapper and addPreviousResultToIdValues during the noFetch query benchmark]

I'm not sure if this perf. hit is expected or if it's a consequence of an implementation bug. If it is the former, we can just offer an option to allow the application developer to turn off REC. That way, the tradeoff between preventing a React re-render and going through REC can be made at the application level.
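For illustration, such an opt-out might look like the following sketch (the option name is invented; no such flag exists today):

```ts
// Hypothetical sketch only: `disableReferentialEqualityChecks` is an
// invented option name, not an existing Apollo Client API.
const client = new ApolloClient({
  networkInterface,
  // Skip the resultMapper/addPreviousResultToIdValues work on store reads;
  // the trade-off is that React components may re-render even when the new
  // result is deep-equal to the previous one.
  disableReferentialEqualityChecks: true,
});
```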

Tasks

  1. Investigate why a noFetch query takes so long in comparison to diffQueryAgainstStore and make it better (currently working on this)
  2. Figure out why REC takes time
  3. (Maybe?) Attempt to improve graphql-anywhere performance for diffQueryAgainstStore

Comments would be great.

@wmertens
Contributor

wmertens commented Mar 13, 2017

I must say this is surprising to me, in my testing the flamegraph showed that a lot of time was being spent rebuilding the query from cache.

I had a quick look at the benchmarks but could not immediately figure out what they were doing.

It would be nice if the benchmarks included configurations as follows (a sketch of generators for these shapes follows the list):

  • a query returning an array of 500 small objects without a unique id, like {index: $i, text: $loremIpsum + $i}
  • a query returning an array of 500 large objects with unique ids, including unique sub-objects and arrays without ids: {id, subObj: {a: 5, b: "meep", c: $i}, links: [$i + 3, $i - 3, $i * 2], combo: [{a: $i}, {a: $i*2}, {a: $i*3}], lotsOfText: $loremIpsum + $i}
  • 50 queries with variable $itemId returning smallish objects: $itemId => ({id: $itemId, ts: Date.now(), title: $itemId + " title", content: $loremIpsum})
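A sketch of generators for those three shapes (names and the lorem-ipsum filler are illustrative):

```ts
const loremIpsum = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';

// 1. 500 small objects without a unique id.
const smallItems = Array.from({ length: 500 }, (_, i) => ({
  index: i,
  text: `${loremIpsum} ${i}`,
}));

// 2. 500 large objects with unique ids, plus id-less sub-objects and arrays.
const largeItems = Array.from({ length: 500 }, (_, i) => ({
  id: String(i),
  subObj: { a: 5, b: 'meep', c: i },
  links: [i + 3, i - 3, i * 2],
  combo: [{ a: i }, { a: i * 2 }, { a: i * 3 }],
  lotsOfText: `${loremIpsum} ${i}`,
}));

// 3. A smallish object per $itemId, for 50 queries with that variable.
const itemById = (itemId: string) => ({
  id: itemId,
  ts: Date.now(),
  title: `${itemId} title`,
  content: loremIpsum,
});
```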

@Poincare
Contributor Author

@wmertens The benchmarks currently implement the first of the three you mention. What difference would we expect to see between the first and the second, or the first and the third? It seems that the third would have more to do with updating query observers than with actual store read time.

@wmertens
Contributor

wmertens commented Mar 13, 2017

Those are the 3 use cases in my app :)

However, the first 2 don't really get re-run, since their base components are always mounted. So it's the 3rd one that's the killer.

Come to think of it, the third case is not actually correct. In my app, the 3rd case actually augments the data from the 2nd case, requesting extra data for the same object ids. So even though those queries don't request much themselves, they do act on big objects; not sure if that's related?

@Poincare
Contributor Author

So, with the third case, do you have 50 active query observers that get delivered an update? Is that the part that's taking a long time? If so, there's a benchmark for that as well. Here's what the results look like:

[graph: time to deliver updates to n query observers]

As you can see, this is a totally separate problem from store reads. It turns out that the amount of time it takes to deliver updates to "n" subscribers scales superlinearly in "n". Do you think this is related to the issue within your app?
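That benchmark is roughly equivalent to this sketch (1.x watchQuery API; `query` and `client` are assumed to exist, and option names may differ across versions):

```ts
// Time how long it takes n subscribers of the same cached query to each
// receive a result.
function measureDelivery(client: ApolloClient, n: number): Promise<number> {
  return new Promise(resolve => {
    const start = performance.now();
    let delivered = 0;
    for (let i = 0; i < n; i++) {
      client.watchQuery({ query, noFetch: true }).subscribe({
        next: () => {
          if (++delivered === n) resolve(performance.now() - start);
        },
      });
    }
  });
}
```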

@wmertens
Contributor

wmertens commented Mar 13, 2017 via email

@Poincare
Contributor Author

Hm. I honestly have no idea why that's happening in your specific case - if you can write a short benchmark that captures the behavior you're running into and produces similar times, I'd be happy to look into it more deeply.

@nss213

nss213 commented Mar 17, 2017

So I think my data structures sound similar to @wmertens's, and I'm also experiencing perf problems with diffQueryAgainstStore. I have fiddled with the benchmarks but haven't gotten them to demonstrate the issue. In the browser, however, I'm delivering 3 queries of about 20-30 objects each. Those objects each have 10-20 subobjects in an array field, and those subobjects have 4 string fields.
One thing that improved performance for me was to change my GraphQL schema to expose the subobjects as 'scalar' JSON objects rather than typed GraphQLObjects.
[screenshot: Chrome flame graph, 2017-03-17]
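A sketch of that schema change (using the graphql-type-json package for the scalar; type and field names are illustrative):

```ts
import GraphQLJSON from 'graphql-type-json';
import { GraphQLID, GraphQLObjectType } from 'graphql';

const ItemType = new GraphQLObjectType({
  name: 'Item',
  fields: {
    id: { type: GraphQLID },
    // Before: subobjects: { type: new GraphQLList(SubObjectType) },
    // which the client normalizes and re-materializes on every read.
    // After: an opaque JSON scalar the cache stores as a single value.
    subobjects: { type: GraphQLJSON },
  },
});
```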

@helfer
Contributor

helfer commented Mar 17, 2017

@nss213 that's interesting. How much of a difference did you observe with scalars vs. nested objects?

@nss213

nss213 commented Mar 18, 2017

@helfer It's a rough measurement, but I'm seeing 200-400ms for my test with the typed subobject and 150-200ms with the JSONType for the subobject. So perhaps we are seeing a symptom rather than the root cause. The top-level object, which resides in an array, is still typed. I think perhaps we're seeing something similar to @michalkvasnicak in issue #1342

@Poincare
Contributor Author

Poincare commented Mar 18, 2017

Hmm. The fact that the scalar JSON objects helped probably means that the denormalization is slow but in a different way than what the benchmarks are currently measuring. Could you give us a copy of the queries and an idea of what the results look like?

Secondly, it seems like diffQueryAgainstStore is being run twice on a single call of fetchQuery. I don't recall if this is intended behavior or not; @helfer?

It would also be great to see what your attempts at producing a benchmark have led to. What do the results look like if you change the diffQueryAgainstStore benchmark to something that reads the kind of object you're describing?

@nss213

nss213 commented Mar 19, 2017

So far I've taken my entire response body and slammed it into the benchmark in place of the "hotel reservations" -- it doesn't seem to make the benchmark any slower for diffQueryAgainstStore but I can get the actual query benchmark to take 18ms or so (in node 6). I think we need to be able to run the benchmark in the browser -- not sure what config is needed to do that. Is that something you have done?

My guess is diffQueryAgainstStore is getting run twice because there are probably 2 listeners for the query.

I'll get you some numbers and more detail once I put a test together.

@Poincare
Contributor Author

There isn't a slick way to run the benchmark in the browser yet, but I expect that just linking the compiled equivalent of benchmark/index.ts in an HTML document might do the trick.

@nss213

nss213 commented Mar 19, 2017

Ok, well I wasn't able to get those running in a short amount of time so I just cobbled together a browser-based test based on some webpack boilerplate. I assume there are much better ways of putting these things together, but I'm not up to speed on all the community libs.
https://github.com/nss213/apollo-perf-test
You can see the test in app/main.js, app/test.js, and the data in app/data.js.
Just do the usual npm install and npm start

Use the chrome profiling tools to make the flame graph -- it looks very similar to the one I posted earlier from my own app.

There are a few interesting things to note here:

  • I've not set up any listeners or anything, just executed the query like the benchmark does, and we sometimes see multiple calls to diffQueryAgainstStore for a single call to broadcastQueries, and sometimes multiple calls to broadcastQueries as well.
  • for the lifetime of a single ApolloClient in the test, reading the query from the store is slower each time -- my real app also shows this behavior
  • diffQueryAgainstStore seems to get slower and slower as well.

Now, it's totally possible that I wrote the test incorrectly, but this seems to match my actual application. Let me know if this helps, or what else I can do.
[screenshot: Chrome flame graph from the browser-based test, 2017-03-19]

@helfer
Contributor

helfer commented Mar 20, 2017

Thanks for the flame graph @nss213. It seems to indicate that diffQueryAgainstStore is a problem after all, so if we can't make that more performant, we'll have to look into other ways of propagating cache updates, for example by tracking dependency paths of a query in Apollo Client.

@Poincare
Contributor Author

Poincare commented Mar 20, 2017

Thanks so much for the flame graph! This helps a lot.

I have a hunch that this must be an implementation issue within diffQueryAgainstStore: in an idealized implementation, denormalization time should be completely independent of the store size and the number of queries executed, but that clearly isn't the case here. I'll look into this ASAP.

@nss213

nss213 commented Mar 20, 2017

Glad I could help - if you will be using the test I gave above, I do suggest reviewing it to make sure this graph isn't an artifact of a faulty test :)

@Poincare
Contributor Author

@nss213 The test does seem valid to me. I'm running into some very weird behavior, however. If I run a test for reading a single object with 500 subobjects on Safari, I get the following results:

[screenshot: Safari read times]

As you can see, the read times are quite reasonable in magnitude and don't increase over the tests. On the other hand, this is the same test on Chrome:

[screenshot: Chrome read times]

And there we see the horrible read times, which also increase over the run. I'm not sure what exactly this means in terms of a solution (maybe something GC-related?) but it is certainly thought-provoking. Any ideas would be appreciated.

@helfer
Contributor

helfer commented Mar 24, 2017

That is indeed very intriguing @Poincare!

@nss213

nss213 commented Mar 24, 2017

> the read times are quite reasonable in magnitude

IMO 10 to 80 ms is still too slow since multiple queries will add up and tie up the CPU for that entire time. 200ms will be clear, perceptible stutter in the UI.

I'm indeed able to replicate the difference in perf with Safari and Chrome that you are seeing.

@Poincare
Contributor Author

Poincare commented Mar 24, 2017

After spending some time adding console.log's in the various function calls, I believe I have an explanation for what is happening.

The list this.queryListeners within QueryManager continues to grow with each query that is fired, which gives us the increasing query read times. This happens because of this particular block within ObservableQuery. We use setTimeout in order to execute the call just after the end of the function, but Chrome seems to execute that callback much later, so the unsubscribe call isn't processed for a long time. Safari, on the other hand, handles these calls on a much shorter timescale. So, on Chrome the queryListeners array grows quickly whereas on Safari it does not, which gives us the different performance behaviors.

I'm not familiar enough with the setTimeout(() => { ... }, 0) pattern to be able to tell if this should happen, but is there some way we could replace it with something like Go's defer statement, which is guaranteed to execute after the end of the function? Alternatively, we could have a "listener reaper" that comes along every few cycles and removes query listeners that are no longer in use.
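A simplified sketch of the two shapes (not the actual Apollo code; `removeQueryListener` and `tearDownQuery` are hypothetical helpers standing in for the split):

```ts
interface QueryManagerLike {
  stopQuery(queryId: string): void;
  removeQueryListener(queryId: string): void; // hypothetical helper
  tearDownQuery(queryId: string): void;       // hypothetical helper
}

// Current shape: all cleanup waits on the timer, so listeners pile up
// when Chrome delays setTimeout callbacks.
function unsubscribeDeferred(qm: QueryManagerLike, queryId: string) {
  setTimeout(() => qm.stopQuery(queryId), 0);
}

// Split shape: drop the listener synchronously, and defer only the parts
// of teardown that must wait for the current call stack to unwind.
function unsubscribeSplit(qm: QueryManagerLike, queryId: string) {
  qm.removeQueryListener(queryId); // runs before the function returns
  setTimeout(() => qm.tearDownQuery(queryId), 0);
}
```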

@nss213 I agree that 10-80ms is still quite slow, but I believe we can improve on these running times with smaller optimizations, e.g. turning off REC (which seems to take up ~25% of total running time in those particular tests). Those are probably a separate set of issues from the one I describe above in this comment.

@helfer
Contributor

helfer commented Mar 24, 2017

@Poincare can you link to the block within ObservableQuery that you're referring to?

@Poincare
Contributor Author

I did - see "this particular block" in the text above. A Markdown error was preventing it from being rendered as a link earlier.

@nss213

nss213 commented Mar 24, 2017

@Poincare that's some excellent analysis; the story makes sense. Might you be able to get away with unsubscribing without a timeout altogether? Does the unsubscription actually have to wait until this particular call stack is complete?

@helfer
Contributor

helfer commented Mar 24, 2017

@nss213 @Poincare I'm fairly sure we could use setImmediate there instead of setTimeout(fn, 0). Could you try that, @Poincare?

@nss213

nss213 commented Mar 24, 2017

Pretty sure setImmediate is nonstandard: https://developer.mozilla.org/en-US/docs/Web/API/Window/setImmediate
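For what it's worth, there are standards-based ways to get "run as soon as possible" semantics without setTimeout's clamping; a quick sketch (`teardown` stands in for whatever the deferred callback does):

```ts
declare function teardown(): void; // placeholder for the deferred work

// Microtask: runs after the current call stack unwinds, but before the
// browser does any rendering or timer work.
Promise.resolve().then(() => teardown());

// Macrotask via MessageChannel: fires ahead of a throttled setTimeout(fn, 0).
const channel = new MessageChannel();
channel.port1.onmessage = () => teardown();
channel.port2.postMessage(null);
```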

@helfer
Contributor

helfer commented Mar 24, 2017

Oh, I didn't realize. Then we'll have to look into whether we can run the result function without going through subscribe().

@Poincare
Contributor Author

Poincare commented Mar 24, 2017

I don't think the core issue is the result function, since exactly the same issue occurs when we use watchQuery directly. I think a reasonable approach is to separate the logic that manages QueryManager's list of queries from the logic that tears down the subscription. That way, we can remove the query from the QueryManager's list before that anonymous function returns, i.e. without the need for setTimeout.

To check whether this would work, I put a _this.queryManager.stopQuery(_this.queryId); call into the code just before the setTimeout, and the Chrome benchmark times now look like this:

[screenshot: Chrome benchmark times after the change]

These are a bit higher than normal on average due to the extra logging I'm doing, but the average magnitude is still significantly smaller and the values don't increase over time. It also makes sense that the benchmarks in the Apollo Client repo didn't catch this: V8 in the browser handles setTimeout very differently from Node, because the browser has to concern itself with rendering, network requests, etc. Given the evidence, I'm pretty confident this is the underlying issue.

@abergenw
Contributor

I'm currently experiencing performance problems as a result of running mutations. Typically, when I run a mutation the store gets updated 3 times: before the mutation is sent to the server (regardless of whether there is an optimisticResponse), after the mutation response has been received, and finally after the refetchQueries response has been received.

This is problematic for two reasons. First, and most importantly, updating the store seems to be a slow process, at least in my case. Even when no data actually changes (usually the case for the first 2 updates described above), the cost is the same.

Second, Apollo doesn't seem to expose an API to minimize these store updates resulting from a mutation. In many cases what I actually want is to fire a mutation and, if it succeeds, refetch all my queries (basically client.resetStore()) and only then perform one single update of the store.
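To make the desired behavior concrete, a hypothetical API for it might look like this (every option below except `mutation` and `refetchQueries` is invented purely for illustration, and SAVE_ITEM is an assumed mutation document):

```ts
client.mutate({
  mutation: SAVE_ITEM,
  suppressStoreUpdates: true,        // invented: skip the two no-op writes
  refetchQueries: ['AllItems'],
  singleBroadcastAfterRefetch: true, // invented: one store update at the end
});
```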

That said, here's a CPU profile clarifying my point. I fire a mutation, and as a result you can see 3 updates to the store (the 3 CPU spikes), the third one highlighted. They are all rather slow: the first two take ~100ms each even though they don't cause any store data to actually change, and the third (shown in the image; refetchQueries, which does change data) takes ~180ms. This is all caused by Apollo, as I'm preventing my React components from actually updating in this example.

My normalized cache contains ~700 objects, most of which, when denormalized, are nested at least a few levels deep.

Any suggestions anyone?

[CPU profile: three store updates after a mutation, third one highlighted, 2017-04-18]

@helfer
Contributor

helfer commented May 4, 2017

Hey @abergenw, that sounds awesome! 👏

Some of the optimizations sound like they would break functionality that other people rely on, but I think the denormalized query cache and the extra mutation hooks are things we could integrate quite smoothly! I'd love to have each of those as a separate PR. The smaller the PR, the faster we can review it, and the greater the chance that we can merge it!

For the other things, I think there's also a chance to make them work, but we'd have to look at them carefully to see if we can minimize the impact they would have on current use cases.

Really excited to see those PRs! 👍

I'm sure @Poincare will have some thoughts as well.

@Poincare
Contributor Author

Hey. Sorry for my late response on this - I was caught up with finals. These changes sound great, @abergenw. I'm particularly excited by the cache associated with the query, since it probably eliminates a lot of calls to diffQueryAgainstStore given how most applications currently use the query cache.

I think there's also a lot of work to be done around how state is managed within QueryManager. There still appear to be several cases in which query listeners pile up in the QueryManager and cause unnecessary reads from the store. I'll post more details here once I have something more concrete.

@stevewillard
Contributor

@abergenw I would be curious to see your changes. I am experiencing similar issues with large result sets (5k-10k items in an array). Most of the time is spent writing/diffing/reading the store.

[screenshot: profile showing time spent in store write/diff/read, 2017-05-26]

@abergenw
Contributor

@stevewillard you can find my performance related improvements here:

https://github.com/abergenw/apollo-client/tree/performance-improvements

Most of the improved performance is a result of the query cache. More on this in #1673.

@stevewillard
Contributor

@abergenw BTW I tried your branch and it's noticeably faster in my app (3x-5x faster for large arrays). Nice work! I'll be following your PR.

@nevir
Contributor

nevir commented Jun 5, 2017

Slight side track, but fundamentally it seems like Apollo is going to hit a wall with the way it juggles normalized vs materialized nodes.

Even with this (second) query cache, there is a significant initialization cost per query (e.g. diffQueryAgainstStore and friends being called the first time). If you have, say, a mobile app that retrieves several large subgraphs when mounting a scene/route, it's very difficult to avoid jank during the initial read.

Anecdotally, our mobile app needs to suck in a heavily nested initial set of data (approx 20kb when serialized as JSON). Apollo spends 500ms+ performing initial reads on a Pixel across ~10 (watch) queries that have some overlap. Sorry, I don't have better numbers or examples yet; still working on instrumenting it.


From what I can tell, the core of the problem is that Apollo (and Relay) both err on the side of correctness: each query must return exactly what was requested, which prevents them from sharing a lot of work between common subgraphs. E.g. if Apollo has all the data available to satisfy a query, it still has to materialize a new result from the normalized store, every single time.

An alternative approach that may be worth considering is to loosen the correctness constraint a bit: return results that satisfy a query, but may also contain other un-requested properties. E.g. if there are two queries: { foo }, and { bar }, both could potentially be given the same object, with both foo and bar populated.

This would allow Apollo to keep only two copies of each node: one for its normalized form and another for the materialized form. At that point, you effectively have an identity map of every node in the graph, and it's very quick to return results from the store. It gets slightly trickier for nodes with parameterized queries in them, but prototype chains can help there.
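A tiny sketch of that idea (independent of Apollo's actual internals):

```ts
type Fields = Record<string, unknown>;

const normalized = new Map<string, Fields>();   // canonical node data
const materialized = new Map<string, Fields>(); // shared result objects

// Every query that touches a node receives the same object, possibly
// carrying fields the query didn't ask for.
function readNode(id: string): Fields {
  let node = materialized.get(id);
  if (!node) {
    node = { ...(normalized.get(id) ?? {}) };
    materialized.set(id, node);
  }
  return node;
}

// Writes merge into the shared object, so previously returned results
// stay referentially stable while gaining the new fields.
function writeNode(id: string, fields: Fields) {
  normalized.set(id, { ...(normalized.get(id) ?? {}), ...fields });
  const mat = materialized.get(id);
  if (mat) Object.assign(mat, fields);
}

// Queries { foo } and { bar } against node "1" share one instance:
writeNode('1', { foo: 1 });
writeNode('1', { bar: 2 });
console.log(readNode('1')); // { foo: 1, bar: 2 }
```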

There is a risk of developers accidentally relying on properties they didn't request, to be sure. But the trade-off seems like it may be worth it, and it could potentially be mitigated by a dev-mode strict mode much like how Apollo operates today.

@wmertens
Contributor

wmertens commented Jun 5, 2017

@nevir could not agree more, see also some musings on the subject at #1300 (comment)

@helfer
Contributor

helfer commented Jun 5, 2017

@nevir with the new store we're working on, you'll be able to make these kinds of tradeoffs yourself. If consistency isn't too important, you can plug in a store that simply caches the whole query and returns it when exactly that query is requested. From there, one can add more optimizations, like working out consistency in the background, if desired. Happy to involve you in the design, if you're interested.

@nevir
Contributor

nevir commented Jun 5, 2017

@helfer yeah, would definitely be curious about it - do you have any docs/thoughts written up on it?

@nss213

nss213 commented Jun 29, 2017

In the meantime, has anyone come up with some good performance workarounds for "I want to display a filterable list of medium sized objects"? Ideally something that doesn't involve changing the schema (e.g. making the data structure an opaque json blob).

@koenpunt
Contributor

koenpunt commented Jun 29, 2017

Would making use of requestIdleCallback help? I'm currently using it to defer mutations while navigating in a React Native app, which works quite well.
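For reference, a minimal sketch of that approach (`client` and `MARK_READ` are assumed names; the setTimeout fallback covers engines without requestIdleCallback):

```ts
function deferMutation(run: () => void, timeout = 1000) {
  if (typeof requestIdleCallback === 'function') {
    // Run when the main thread is idle, but no later than `timeout` ms.
    requestIdleCallback(() => run(), { timeout });
  } else {
    setTimeout(run, 0);
  }
}

// e.g. while a navigation transition is in flight:
deferMutation(() => client.mutate({ mutation: MARK_READ }));
```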

@stale

stale bot commented Jul 20, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to Apollo Client!

@chompomonim

Dear bot, please don't touch this issue. It's quite an important one.

@jbaxleyiii
Contributor

@chompomonim you can now tell the bot not to mark an issue as stale by using labels. Take a look at #1924 for more information!

@elie222

elie222 commented Jul 22, 2017

Yes, we've run into this issue too. Apollo is being super slow for us, and we need to work around it somehow. Not sure how right now.

@nerdmed

nerdmed commented Aug 11, 2017

@helfer any news on this one, or any plans you could share regarding Apollo performance? We've also found performance issues on slower devices that totally break the UX.

@ephemer

ephemer commented Aug 11, 2017

Adding some more info to avoid this issue being closed: we're having issues with Apollo Client performance in React Native on older devices (anything older than an iPhone 6). Part of this is probably due to JSCore not allowing JIT outside of WKWebView, leaving us with a pretty poor performance baseline for JS to begin with. We're investigating too.

@elie222

elie222 commented Aug 11, 2017 via email

@stale

stale bot commented Sep 1, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to Apollo Client!

@nerdmed

nerdmed commented Sep 2, 2017

Dear bot, please don't touch this issue. It's quite an important one. 👍

@stale stale bot removed the no-recent-activity label Sep 2, 2017
@jbaxleyiii
Contributor

The 2.0 has around a 5x performance jump, and new cache implementations (like apollo-cache-hermes) can increase this even more! I think it's time to close this issue for the 1.* line and open a new one for the 2.0 if/when needs come up!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 2, 2023