
Looking for recommendation for Pagination + Watching flow #1393

Closed
bharath2020 opened this issue Sep 16, 2020 · 7 comments
Labels
caching, question

Comments

@bharath2020
Contributor

(I have posted this question in the Spectrum chat as well; adding it here for better reach. Apologies for the noise.)

Problem

We have experimented with adopting pagination and encountered a few issues. I am looking for a recommendation on how to handle paginated results that achieves two main goals. For context, we have a mobile application whose screens show paginated content with infinite vertical scrolling.

  1. Perform pagination in a responsible manner without any lag on the UI
  2. Ability to watch any changes to the items retrieved across all pages

For example, assume the following schema

type PageResult {
   cursor: String
   hasMore: Boolean
   items: [String]
}

type Query {
   getItems(cursor: String): PageResult
}

Here are the approaches we tried:

Approach 1: Follow the suggestion from the Apollo iOS tutorial blog post

  1. Load the first page with no cursor, using a cache policy that writes data back to the cache
  2. Load the second page with the cursor returned by the first, and repeat for subsequent pages, again using a cache policy that writes data back to the cache (a rough sketch follows this list)
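
A rough sketch of these two steps, assuming the legacy Apollo iOS 0.x API, a codegen'd GetItemsQuery type for the getItems(cursor:) query above, and .fetchIgnoringCacheData as one example of a cache policy that writes results back into the normalized cache:

import Apollo
import Foundation

let apollo = ApolloClient(url: URL(string: "https://example.com/graphql")!)

func loadPage(cursor: String?) {
  apollo.fetch(query: GetItemsQuery(cursor: cursor),
               cachePolicy: .fetchIgnoringCacheData) { result in
    guard let page = try? result.get().data?.getItems else { return }
    // Each page is stored under its own query path in the cache, which is
    // what leads to the "zombie records" described below.
    if page.hasMore == true, let next = page.cursor {
      loadPage(cursor: next)   // fetch the next page on demand (e.g. on scroll)
    }
  }
}

loadPage(cursor: nil)   // first page: no cursor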

Issues with Approach 1

1. Zombie records:

Following the query-path approach to generating cache keys, the first page will be written under the cache key "QUERY_ROOT.getItems-cursor", while subsequent pages will hold their items under the cache keys "QUERY_ROOT.getItems-cursor-page2cursor", "QUERY_ROOT.getItems-cursor-page3cursor", and so on.

In our case, the page cursors are generated at run time and are only valid until the first page is refreshed to obtain a new cursor. Following this approach, we noticed that previously fetched pages are never deleted after the first page is re-fetched; they accumulate as the user fetches more pages and refreshes, leading to a larger cache over time. Not to mention that the more records there are, the higher the read time.

2. Cannot watch updates to items from the second page onwards:

We want to set up a GraphQLQueryWatcher on the paginated query, such that we are notified of any changes to the items across all the pages retrieved so far. However, looking at the implementation of GraphQLQueryWatcher, it appears that with this approach the watcher will notify us only about items returned from the first page, since its dependent keys include only keys from the first page.
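
A sketch of the watcher set-up described above (legacy Apollo iOS 0.x API; GetItemsQuery and apollo are assumed as before). The watcher's dependent keys come from the records this one query returned, so changes to items cached only under the page-2+ query keys will not fire this callback:

import Apollo

let watcher: GraphQLQueryWatcher<GetItemsQuery> =
  apollo.watch(query: GetItemsQuery(cursor: nil)) { result in
    guard let items = try? result.get().data?.getItems?.items else { return }
    // Refresh the UI; per the issue above, only first-page items reach this point.
    print("watched items:", items.compactMap { $0 })
  }

// watcher.cancel()   // when the screen goes away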

Approach 2: Manually merge data from subsequent pages into the first page

To solve issue 2 of Approach 1, we took inspiration from Apollo React's pagination, which does the following:

  1. Load the first page with no cursor, using a cache policy that writes data back to the cache
  2. Load the second page, and subsequent pages, with the fetchIgnoringCacheCompletely cache policy, and manually write the merged data back into the first-page query (a rough sketch follows this list)
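
A rough sketch of step 2, assuming the legacy Apollo iOS 0.x store API and the same codegen'd GetItemsQuery type as before:

import Apollo

func loadNextPage(after nextCursor: String, using apollo: ApolloClient) {
  // Fetch the next page without touching the cache at all...
  apollo.fetch(query: GetItemsQuery(cursor: nextCursor),
               cachePolicy: .fetchIgnoringCacheCompletely) { result in
    guard let newPage = try? result.get().data?.getItems else { return }

    // ...then merge it into the cached first-page query by hand. Re-writing the
    // whole merged list is what makes writes (and later reads) slower as pages
    // accumulate.
    apollo.store.withinReadWriteTransaction({ transaction in
      try transaction.update(query: GetItemsQuery(cursor: nil)) { (data: inout GetItemsQuery.Data) in
        data.getItems?.items?.append(contentsOf: newPage.items ?? [])
        data.getItems?.cursor = newPage.cursor
        data.getItems?.hasMore = newPage.hasMore
      }
    })
  }
}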

Issues with Approach 2

1. Writes are slow

As step 2 shows, we merge the previously accumulated data with the data pulled from the next page and re-write the entire result back to the cache. As we fetch more pages, the write latency increases.

2. Latency of reading the first page increases exponentially

Since we merge data from all pages into the first page, the time to read the first page increases exponentially with the number of items present in the cache. This is more evident with queries that have many non-scalar attributes, as the batch loader makes a round trip to the database for each attribute. With our sample data, a query against a moderately complex schema clocked 10 seconds to read 150 items on an iPhone XS.

@designatednerd added the "caching" and "question" labels Sep 17, 2020
@designatednerd
Contributor

Hey @bharath2020 - just a heads up that I'm bogged down at the moment but this is a very good question that I need to work on a better answer to (and that we should have a way better answer to overall for mobile), and I think messing around with the tutorial is probably a good place to start with it. And you are very correct that 10 seconds to read 150 items is 🤮.

In terms of what you can try while I'm digging out from under what I'm dealing with now, there's the cacheKeyForObject closure that you can use to generate a custom cache key. For most items you should be able to cache by ID, and that should avoid having to rewrite all the data if it's not necessary.
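
For reference, a minimal sketch of that closure, set on the client before any fetches; it assumes your objects expose an "id" field, which may not hold for every type in the schema (e.g. PageResult, as discussed further down):

apollo.cacheKeyForObject = { object in
  guard let typename = object["__typename"] as? String,
        let id = object["id"] as? String else { return nil }
  return "\(typename):\(id)"
}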

I'm going to leave this open for the community to respond more - I know this has been done well, but my knowledge of details isn't there because I'm focused on other parts of the SDK at the moment.

@bharath2020
Contributor Author

Thanks, @designatednerd for the response.

For most items you should be able to cache by ID, and that should avoid having to rewrite all the data if it's not necessary.

Interesting. In Approach 2, given that I am appending items from morePage to firstPage and re-writing the data for the first-page query, I am curious how the store would know not to overwrite the records referencing items already in the first page. In other words, I understand cacheKeyForObject would reduce the number of records written to the database, but even with a normalized cache there would be unnecessary overwrites (in this case, of the items in the first page) that add to the write latency. Please correct me if I am wrong.

P.S. I have cacheKeyForObject enabled in my sample, and the timings were taken with it on. The only place I cannot generate a cache key is the PageResult type, as cacheKeyForObject does not expose the variables (or field arguments) used to produce the object; in this case, that is the cursor provided to the getItems(cursor:) query.

@designatednerd
Contributor

Again, my knowledge of the cache is not as deep as it should be, but my understanding is you'd be rewriting the references with IDs rather than the entire object and its entire tree of changes. This almost certainly has some unnecessary work, but it would likely constitute a lot less unnecessary work than without the cacheKeyForObject.

And again, this is not working as well as we want it to. We have some changes to the cache slated for phases 2 and 3 of the ongoing Swift codegen rewrite, and making things less annoying for pagination is one of the major things I want to do within that work.

Wish I had a better answer for you.

@bharath2020
Contributor Author

bharath2020 commented Sep 18, 2020

Got it. I played around a bit with batch-reading objects from the database. I will try doing the batch updates within a single transaction and see how it goes. Right now, each round trip to the SQLite database runs in its own transaction, and I am guessing that is what slows the reads down. I have read a huge number of items from SQLite in the past and the latency was fine.

I will see if it makes any difference and post the results back here.
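
For reference, a minimal sketch of that batching idea. This is not an Apollo API; it assumes an already-opened SQLite handle and a records(key, record) table layout like the one the SQLite cache uses, and simply wraps many record reads in one transaction so each SELECT does not pay its own transaction overhead:

import SQLite3

let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)

func batchReadRecords(forKeys keys: [String], db: OpaquePointer) -> [String: String] {
  var records: [String: String] = [:]
  sqlite3_exec(db, "BEGIN DEFERRED TRANSACTION", nil, nil, nil)

  var statement: OpaquePointer?
  if sqlite3_prepare_v2(db, "SELECT record FROM records WHERE key = ?", -1, &statement, nil) == SQLITE_OK {
    for key in keys {
      sqlite3_bind_text(statement, 1, key, -1, SQLITE_TRANSIENT)
      if sqlite3_step(statement) == SQLITE_ROW, let text = sqlite3_column_text(statement, 0) {
        records[key] = String(cString: text)
      }
      sqlite3_reset(statement)
      sqlite3_clear_bindings(statement)
    }
  }
  sqlite3_finalize(statement)

  sqlite3_exec(db, "COMMIT", nil, nil, nil)
  return records
}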

@designatednerd
Contributor

OK - do you mind if we close out this issue?

@bharath2020
Contributor Author

Ok

@RoryKelly

We have solved this problem with a third approach: page normally using queries, then use a watcher to watch only the currently visible page.
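
A sketch of what that could look like (legacy Apollo iOS 0.x API; GetItemsQuery and apollo are assumed from earlier in this thread): pages are fetched normally, but only the page currently on screen keeps a live watcher, which is torn down and re-created as the user scrolls.

import Apollo

final class VisiblePageWatcher {
  private let apollo: ApolloClient
  private var watcher: GraphQLQueryWatcher<GetItemsQuery>?

  init(apollo: ApolloClient) {
    self.apollo = apollo
  }

  // Call when the visible page changes (e.g. from scroll-position tracking).
  func watchPage(cursor: String?, onChange: @escaping ([String]) -> Void) {
    watcher?.cancel()  // stop watching the page that scrolled off screen
    watcher = apollo.watch(query: GetItemsQuery(cursor: cursor)) { result in
      guard let items = try? result.get().data?.getItems?.items else { return }
      onChange(items.compactMap { $0 })
    }
  }

  deinit {
    watcher?.cancel()
  }
}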
