Performance Umbrella Issue #1409
I must say this is surprising to me; in my testing, the flamegraph showed that a lot of time was being spent rebuilding the query from cache. I had a quick look at the benchmarks but could not immediately figure out what they were doing. It would be nice if the benchmarks included a configuration as follows:
@wmertens The benchmarks currently implement the first of the three you mention. What difference would we expect to see between the first and the second, or the first and the third? It seems that the third would have more to do with updating query observers than with actual store read time.
Those are the 3 use cases in my app :) However, the first 2 don't really get re-run; their base components are always mounted. So it's the 3rd one that is the killer. Come to think of it, the third case is not actually correct: in my app, the 3rd case actually augments the data from the 2nd case, requesting extra data for the same object ids. So even though those queries don't request much themselves, they do act on big objects; not sure if that is related?
So, with the third case, do you have 50 active query observers that get delivered an update? Is that the part that's taking a long time? If so, there's a benchmark for that as well. Here's what the results look like: As you can see, this is a totally separate problem from store reads. It turns out that the amount of time it takes to deliver updates to n subscribers scales super-linearly in n. Do you think this is related to the issue within your app?
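The shape of that benchmark is roughly the following sketch; the observer plumbing here is a simplified stand-in, not Apollo Client's actual code:

```typescript
// Deliver one parsed result to n query observers and time it. If delivery
// scales super-linearly in n, this loop is where it shows up.
type Observer<T> = (result: T) => void;

function deliverToSubscribers<T>(subscribers: Observer<T>[], result: T): number {
  const start = Date.now();
  for (const notify of subscribers) {
    notify(result); // each active watchQuery observer gets the new result
  }
  return Date.now() - start;
}

// 50 observers, as in the case discussed above.
let received = 0;
const subscribers: Observer<object>[] = Array.from({ length: 50 }, () => () => {
  received += 1;
});
deliverToSubscribers(subscribers, { data: { reservations: [] } });
```

Running this with n = 50, 100, 200, ... and plotting the returned times is how you would see whether the per-subscriber cost is constant or growing.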
Actually, I only have 20, and they take 2s to update :-/
Hm. I honestly have no idea why that's happening in your specific case - if you can write a short benchmark that captures the behavior you're running into and produces similar times, I'd be happy to look into it more deeply.
So I think my data structures sound similar to @wmertens's, and I'm also experiencing data perf problems.
@nss213 that's interesting. How much of a difference did you observe with scalars vs. nested objects?
@helfer It's a rough measurement, but I'm seeing 200-400ms for my test with the typed subobject and 150-200ms with the JSONType for the subobject. So perhaps we are seeing a symptom but not the root cause. The top-level object, which resides in an array, is still typed. I think perhaps we're seeing something similar to @michalkvasnicak in issue #1342.
Hmm. The fact that the scalar JSON objects helped probably means that the denormalization is slow, but in a different way than what the benchmarks are currently measuring. Could you give us a copy of the queries and an idea of what the results look like? It would also be great to see what your attempts at producing a benchmark have led to.
So far I've taken my entire response body and slammed it into the benchmark in place of the "hotel reservations" -- it doesn't seem to make the benchmark any slower. I'll get you some numbers and more detail once I put a test together.
There isn't a slick way to run the benchmark in the browser yet, but I expect that just linking in the compiled equivalent should work.
Ok, well I wasn't able to get those running in a short amount of time, so I just cobbled together a browser-based test based on some webpack boilerplate. I assume there are much better ways of putting these things together, but I'm not up to speed on all the community libs. Use the Chrome profiling tools to make the flame graph -- it looks very similar to the one I posted earlier from my own app. There are a few interesting things to note here:
Now, it's totally possible that I wrote the test incorrectly, but this seems to match my actual application. Let me know if this helps, or what else I can do.
Thanks for the flame graph @nss213.
Thanks so much for the flame graph! This helps a lot. I have a hunch that this must be an implementation issue.
Glad I could help! If you will be using the test I gave above, I do suggest reviewing it to make sure this graph isn't an artifact of a faulty test :)
@nss213 The test does seem valid to me. I'm running into some very weird behavior, however. If I run a test for reading a single object with 500 subobjects on Safari, I get the following results: As you can see, the read times are quite reasonable in magnitude and don't increase over the tests. On the other hand, this is the same test on Chrome: And there we see the horrible read times, and an increase in the read times over the course of the test. I'm not sure what exactly this means in terms of the solution (maybe something GC related?) but it is certainly thought-provoking. Any ideas would be appreciated.
That is indeed very intriguing @Poincare!
IMO 10 to 80 ms is still too slow, since multiple queries will add up and tie up the CPU for that entire time; 200ms will be clear, perceptible stutter in the UI. I'm indeed able to replicate the difference in perf between Safari and Chrome that you are seeing.
@nss213 I agree that 10-80ms is still quite slow, but I believe we can improve on these running times with smaller optimizations, e.g. turning off REC (which seems to take up ~25% of total running time on those particular tests). Those are probably a separate set of issues from the one I describe above in this comment.
@Poincare can you link to the block you mean?
I did - see "this particular block" in the text above. A Markdown error was preventing it from being rendered as a link earlier.
@Poincare that's some excellent analysis; the story makes sense. Might you be able to get away with unsubscribing without a timeout altogether? Does the unsubscription actually have to wait until this particular call stack is complete?
Pretty sure `setImmediate` is nonstandard: https://developer.mozilla.org/en-US/docs/Web/API/Window/setImmediate
Oh, I didn't realize. Then we'll have to look into whether there's an alternative way to schedule the unsubscription.
I don't think the core issue is the result function, since exactly the same issue will occur either way. To check whether this would work, I put in a quick change and logged the read times. These are a bit higher than normal on average due to the extra logging I'm doing, but the average magnitude of the times is still significantly smaller and the values don't increase over time. It also makes sense that the benchmarks in the Apollo Client repo didn't catch this, because V8 in the browser handles this differently.
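For reference, a portable way to get `setImmediate`-like behavior where it isn't available. This is only a sketch of the scheduling options discussed above, not what Apollo Client ships: a microtask via `Promise.resolve()` runs before any timer, while `setTimeout(fn, 0)` is clamped to a minimum delay (typically 4ms) in browsers.

```typescript
// Run fn after the current call stack unwinds. Uses setImmediate where it
// exists (Node, old IE/Edge) and falls back to a microtask elsewhere.
function defer(fn: () => void): void {
  const g = globalThis as { setImmediate?: (cb: () => void) => void };
  if (typeof g.setImmediate === 'function') {
    g.setImmediate(fn);
  } else {
    // Microtasks fire before timers, so this avoids setTimeout's ~4ms clamp.
    Promise.resolve().then(fn);
  }
}
```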
I'm currently experiencing performance problems as a result of running mutations. Typically, when I run a mutation the store gets updated 3 times, the first of them before the mutation is even sent to the server. This is problematic for two reasons. First, and most importantly, updating the store seems to be a slow process, at least in my case; it doesn't matter if no data actually changes (usually the case for the first 2 updates), the performance is still the same. Second, Apollo doesn't seem to expose an API to minimize these store updates as a result of a mutation. In many cases what I actually want to do is fire a mutation and, if successful, refetch all my queries. This being said, here's a CPU profile clarifying my point. I fire a mutation and as a result you can see 3 updates to the store (the 3 CPU spikes), the third one being highlighted. They are all rather slow: the first two take ~100ms even though they don't cause any store data to actually change, and the third one (shown in the image) takes ~180ms. My normalized cache contains ~700 objects, most of which, denormalized, are nested at least a few levels deep. Any suggestions, anyone?
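One mitigation for the no-op updates described above would be to skip broadcasting when a write didn't change the data an observer sees. A minimal sketch; the equality check and observer shape are simplified stand-ins, not Apollo's API:

```typescript
// Skip notifying an observer (and the React re-render it triggers) when the
// freshly read result is deep-equal to the one it already has.
function isEqual(a: unknown, b: unknown): boolean {
  // Crude structural comparison; assumes JSON-serializable results with
  // stable key order. A real implementation would use a proper deep-equal.
  return JSON.stringify(a) === JSON.stringify(b);
}

interface QueryObserver<T> {
  lastResult?: T;
  next(result: T): void;
}

function broadcastIfChanged<T>(observer: QueryObserver<T>, result: T): boolean {
  if (observer.lastResult !== undefined && isEqual(observer.lastResult, result)) {
    return false; // no-op store update: nothing to re-render
  }
  observer.lastResult = result;
  observer.next(result);
  return true;
}
```

This doesn't make the store write itself cheaper, but it would at least stop the first two (data-unchanged) updates from reaching the UI layer.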
Hey @abergenw, that sounds awesome! 👏 Some of the optimizations sound like they would break functionality that other people rely on, but I think the denormalized query cache and extra mutation hooks sound like we could integrate them quite smoothly! I'd love to have each of those as a separate PR. The smaller the PR, the faster we can review it, and the greater the chance that we can merge it! For the other things, I think there's also a chance to make them work, but we'd have to look at them carefully to see if we can minimize the impact they will have on current use-cases. Really excited to see those PRs! 👍 I'm sure @Poincare will have some thoughts as well.
Hey. Sorry for my late response on this - was caught up with finals. These changes sound great @abergenw. I'm particularly excited by the cache associated with the query, since it probably eliminates a lot of redundant store reads. I think there's also a lot of work to be done around managing how state is held.
@abergenw I would be curious to see your changes. I am experiencing similar issues with large result sets (5k-10k items in an array). Most of the time is spent writing/diffing/reading the store.
@stevewillard you can find my performance related improvements here: https://github.com/abergenw/apollo-client/tree/performance-improvements Most of the improved performance is a result of the query cache. More on this in #1673.
@abergenw BTW I tried your branch and it is noticeably faster in my app (3x-5x faster for large arrays). Nice work! I'll be following your PR.
Slight side track, but fundamentally it seems like Apollo is going to hit a wall with the way it juggles normalized vs. materialized nodes. Even with this (second) query cache, there is a significant initialization cost per query. Anecdotally, our mobile app needs to suck in a heavily nested initial set of data (approx 20kb when serialized as JSON). Apollo spends 500ms+ performing initial reads on a Pixel across ~10 (watch) queries that have some overlap. Sorry, I don't have better numbers or examples yet; still working on instrumenting it. From what I can tell, the core of the problem is that Apollo (and Relay) both err on the side of correctness: each query must return exactly what was requested, which prevents them from sharing a lot of work between common subgraphs. E.g. if Apollo has all the data available to satisfy a query, it still has to materialize a new result from the normalized store, every single time. An alternative approach that may be worth considering is to loosen the correctness constraint a bit: return results that satisfy a query, but may also contain other un-requested properties. This would allow Apollo to keep only two copies of each node: one for its normalized form, and another for the materialized form. At that point, you effectively have an identity map of every node in the graph, and it's very quick to return results from the store. It gets slightly more tricky when you're talking about nodes with parameterized queries in them, but prototype chains can help there. There is a risk of developers accidentally relying on properties they didn't request, to be sure. But it seems like the trade-off may be worth it, and could potentially be mitigated by a dev-mode strict mode, much like how Apollo operates today.
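A toy version of that identity-map idea, just to illustrate the trade-off (all names here are hypothetical, not a proposed API):

```typescript
// One materialized copy per node id. Any query whose requested fields are
// present gets the same shared object back, extra fields and all, so reads
// are a Map lookup instead of a fresh materialization.
interface NodeData { id: string; [field: string]: unknown; }

class IdentityMapCache {
  private nodes = new Map<string, NodeData>();

  write(node: NodeData): void {
    const existing = this.nodes.get(node.id);
    // Merge so later partial writes extend the single shared copy.
    this.nodes.set(node.id, existing ? { ...existing, ...node } : { ...node });
  }

  read(id: string, requestedFields: string[]): NodeData | undefined {
    const node = this.nodes.get(id);
    // Loose correctness: satisfied as long as every requested field exists;
    // the caller may observe fields it never asked for.
    if (node && requestedFields.every(f => f in node)) return node;
    return undefined;
  }
}
```

Two overlapping queries then share one object (and one referential identity), at the cost of possibly exposing un-requested fields.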
@nevir could not agree more, see also some musings on the subject at #1300 (comment)
@nevir with the new store we're working on, you'll be able to make these kinds of tradeoffs yourself. If consistency isn't too important, you can plug in a store that simply caches the whole query and returns it when exactly that query is requested. From there one can add more optimizations, like working out consistency in the background, if desired. Happy to involve you in the design, if you're interested.
@helfer yeah, would definitely be curious about it - do you have any docs/thoughts written up on it?
In the meantime, has anyone come up with some good performance workarounds for "I want to display a filterable list of medium-sized objects"? Ideally something that doesn't involve changing the schema (e.g. making the data structure an opaque JSON blob).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to Apollo Client!
Dear bot, please don't touch this issue. It's quite an important one.
@chompomonim you can now tell the bot not to mark an issue as stale by using labels. Take a look at #1924 for more information!
Yes, we've run into this issue too. Apollo is being super slow for us. We need to work around this somehow; not sure how right now.
@helfer any news on this one, or any plans you guys could share regarding Apollo performance? We also found some performance issues on slower devices that totally break the UX.
Adding some more info to avoid this issue being closed: we're having issues with Apollo Client performance in React Native on older devices (anything older than an iPhone 6). Part of this is probably due to JSCore not allowing JIT outside of WKWebView, leaving us with a pretty poor performance baseline for JS to begin with. We're investigating too.
Our Android React Native performance is awful. Not sure if this is Apollo or not. We know for certain that having too many items in the Apollo client store is a recipe for disaster, though. Hopefully 2.0 fixes it.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to Apollo Client!
Dear bot, please don't touch this issue. It's quite an important one. 👍
2.0 has around a 5x performance jump, and new cache implementations (like apollo-cache-hermes) can increase this jump even more! I think it's time to close this issue for the 1.* line and open a new one for 2.0 if / when needs come up!
This issue is meant to describe current tasks related to improving Apollo Client store performance and to serve as a summary of what we currently know about Apollo Client's perf characteristics.
The primary issue that we are concerned with at the moment is store read performance. This is probably the operation Apollo Client performs most often, since apps don't write new results or send new queries over the network all that often. We know a few pieces of information about store read performance as measured through the benchmarks.
No-fetch query vs. diffQueryAgainstStore

There are two ways we can look at cache read performance: we can either fire a `client.query({ noFetch: true })` call, or we can directly call `diffQueryAgainstStore`. The former also counts the time it takes to set up all of the surrounding logic (e.g. set up and tear down of the `ObservableQuery`, delivering the result to the query observers, etc.), whereas the latter does not. So, if we compare these on reading a GraphQL result that contains a list of objects out of the store, we can tell whether most of the time is spent on denormalization or just on the machinery surrounding `query`. Results:

As evident from above, `query` takes ~10-15x the time of `diffQueryAgainstStore` to read a result with the same number of associated objects. This probably implies that our logic within `QueryManager` and `ObservableQuery` can be improved, and it also probably means that the denormalization that lives inside `diffQueryAgainstStore` isn't the primary culprit, especially for the counts of items considered in the above plot.
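The comparison can be reproduced with a harness shaped roughly like this; the two read functions below are crude stand-ins for the full `query` path and the direct `diffQueryAgainstStore` path, not the repo's benchmark code:

```typescript
// Time an operation and report a per-iteration average, so the
// "query is ~10-15x diffQueryAgainstStore" ratio can be checked
// for a given workload.
function msPerOp(fn: () => void, iterations = 1000): number {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn();
  return (Date.now() - start) / iterations;
}

// Stand-ins: a bare store read vs. the same read plus extra surrounding work.
const store = new Map<string, object>([['Reservation:1', { id: '1', nights: 2 }]]);
const directRead = () => store.get('Reservation:1');
const fullQueryRead = () => JSON.parse(JSON.stringify(directRead()));

const directMs = msPerOp(directRead);
const fullMs = msPerOp(fullQueryRead);
```

Plugging the real read paths into `directRead` and `fullQueryRead` would give the numbers behind the plot above.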
REC

As mentioned in #1242, referential equality checking (REC) imposes some cost on the store read times as measured through a no-fetch `query`. Here are the reproduced results:

The blue line is with REC and the pink line is without. We can also see that `resultMapper` and `addPreviousResultToIdValues` (both tied to REC, afaik) play a significant role in the CPU profile for the no-fetch `query` benchmark:

I'm not sure if this perf hit is expected or if it's a consequence of an implementation bug. If it is the former, we can just offer an option that allows the application developer to turn off REC. That way, the trade-off between preventing a React re-render and going through REC can be made at the application level.
Tasks
- Figure out why a no-fetch `query` takes so long in comparison to `diffQueryAgainstStore` and make it better (currently working on this)
- Improve `graphql-anywhere` performance for `diffQueryAgainstStore`

Comments would be great.