Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paginated Watchers for GraphQL Queries #3007

Closed
wants to merge 72 commits into from

Conversation

Iron-Ham
Copy link
Contributor

@Iron-Ham Iron-Ham commented May 11, 2023

What are you trying to accomplish?

I'd like to have an easy way to watch a paginated query.

My frustrations with #3004 stemmed from one place: I wanted to have an easy manner of consumers to automate pagination; I didn't want client engineers writing a new LocalCacheMutation for each paginated query.

So this PR introduces just that: A strategy for watching paginated queries.

closes #3004
Related: "[issue TBA] Better handling of paginated data" via #1701
closes #565

What approach did you choose and why?

I introduced a GraphQLPaginatedQueryWatcher, which can paginate over a query and watch the results for each page.

Generally, it works by composing the results of several GraphQLQueryWatchers. On instantiation of a new GraphQLPaginatedQueryWatcher, we ask for one additional field over a standard GraphQLQueryWatcher:

strategy: A PaginationStrategy which defines how we go about paginating a query.

Let's break down the PaginationStrategy:

PaginationStrategy

At its core, this protocol is composed of several other protocols:

  1. PageExtractionStrategy: A protocol which defines how we extract a Page from a given Input. It is agnostic as to what a Page is, and it is equally agnostic as to what an Input is. An Input can be the query's response data, a set of variables, or any other value. At present, we have one PageExtractor, purpose-built for Relay-style pagination.
  2. DataTransformer: A protocol which transforms the query's response data into a given Output. At present, there are two transformers implemented:
    • PassthroughDataTransformer: Simply allows the query's data to passthrough, unaltered.
    • CustomDataTransformer: Allows the user fine-grained control over their data. Using this transformer, the user can output their own custom objects.
  3. NextPageStrategy: Constructs a new Query, given a Page.
  4. PaginationMergeStrategy: A protocol whose responsibility it is to take a PaginationDataResponse, which contains all of the Output values of each page, and merges them into one valid Output.

These protocols are stored as local variables on the strategy, along with an array of Page objects and the currentPage. Additionally, this protocol has three function requirements:

  1. onWatchResult: The callback triggered by a GraphQLQueryWatcher.
  2. canFetchNextPage: Whether the strategy can fetch another page, given its own internal state.
  3. reset: Resets the internal state of the PaginationStrategy.

Let's look at the three defined PaginationMergeStrategys:

CustomPaginationMergeStrategy

Allows the user fine-grain control over how information is merged. Intended to be used with custom objects.

SimplePaginationMergeStrategy

Naively merges all lists together.

TargetedPaginationMergeStrategy

Takes in a KeyPath to a list and specifically merges only that list together.

Why take a multi-watcher approach?

I wrote about it here, but that's slightly out of date. Most of it remains valid, but fundamentally the reason boils down to the path of least resistance. I fully concede it's not the perfect solution. However, since both single-watcher and multi-watcher have pros and cons, it's a matter of weighing tradeoffs. I chose to go for the path that can be rapidly implemented.

Anything you want to highlight for special attention from reviewers?

I'd love feedback on this approach. I think it should work for most users and provides an escape hatch for users for whom the Simple strategy won't work.

@netlify
Copy link

netlify bot commented May 11, 2023

👷 Deploy request for apollo-ios-docs pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 8c13b61

private let createPageQuery: CreatePageQuery
private let nextPageTransform: (T?, T) -> T

private var model: T? // 🚗
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm so sorry i couldn't help myself.

@Iron-Ham Iron-Ham changed the title GraphQL: Paginated Watcher Paginated Watchers for GraphQL Queries May 11, 2023
Copy link
Contributor

@AnthonyMDev AnthonyMDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start on this! It looks really solid.

The biggest point of discussion here in my mind is if this is the right general approach. I think there are other viable approaches to handling this that we should consider also.

For instance, would some form of exposing merging strategies for paginated data in the cache be preferable. Allowing you to fetch a new page, but configure some function that, when new pages are fetched, actually writes them to the cache in a way that just updates an original query watcher's dependent keys.

There are a lot of additional complications there, because we would be departing from the concept of keying cached fields based on just the values of the field arguments passed. I'm not sure if this would make things better or open a can of worms that would cascade into a bunch of other bugs. But it is something worth further consideration.


A big part of the reason we haven't implemented pagination support is because there are so many things that can go wrong based on your specific use case. Having people build custom mechanisms for their specific situation has been easier.

This PR is a really great start to one approach to handling pagination. But any solution that we provide needs to be as stable and consistent as possible. This makes me think there is more work to be done here.


Ideally, any solution here would provide for a really lightweight way to do this for the most common use cases.

There needs to be some consideration for how to merge the list pages together. Your nextPageTransform allows users to do this in-memory, though it doesn't support merging them in the cache and probably requires a lot of boiler plate code in most situations. I'd love to provide default handlers for common use cases.

For example a list of entities that can be uniquely identified by a cache key or identifier field and sorted by some field could probably have a pre-defined function for handling the next page transformation.

Sources/Apollo/GraphQLPaginatedQueryWatcher.swift Outdated Show resolved Hide resolved
Sources/Apollo/GraphQLPaginatedQueryWatcher.swift Outdated Show resolved Hide resolved
@Iron-Ham Iron-Ham marked this pull request as draft May 12, 2023 22:04
@Iron-Ham
Copy link
Contributor Author

Iron-Ham commented May 15, 2023

Chatted with Anthony on Friday, the 12th. We decided to try to investigate a way of managing this with a single watcher.

Since then, investigated using a single watcher and updating its dependent keys.
I found few things from that: While it's totally valid to update the dependent keys of a watcher, it doesn't change the mapping of those keys in the database.

Take the following query:

query($id: ID!, $after: String) {
  hero(id: $id) {
    id
    name
    friendsConnection(first: 2, after: $after) {
      friends {
        name
        id
      }
    }
  }
}

Under that strategy, if we were to fetch a second page, we could add the dependent keys of the second page to the first watcher. However, the initial watcher would never pull the data for those keys because when we load a value from the store, we do so by passing in the operation type and its variables. While the nodes underneath friends and the hero are entity types (i.e., uniquely identifiable), the friendsConnection by which friends are accessed is crucially not an entity type.

Therefore, the cache would store them like so:

Key Value
Droid:$id.friendsConnection(after:null, first:2)
Droid:$id.friendsConnection(after:$after,first:2)
Human:$id ...

And while the values stored under the connections are simply reference pointers to the entities, there is currently no way to group these values together under the same key.

With that in mind, there are a few ways to handle this (h/t to Anthony here, as this section is directly adapted from his messages):

  1. Using multiple watchers as per this PR
  2. Giving the cache an idea that a given field is a paginated field and should be handled with a different merge strategy, which is heading towards first-class pagination support. Arguments could be marked as paginated fields (perhaps via directive?). Then, when those fields store their data into the cache, they would use those arguments. However, when being used as dependent keys, those arguments are stripped.

My immediate thoughts are that each of these two paths have clear pros and cons:

Multi-Watcher Strategy

The multi-watcher strategy -- implemented in this pull request -- would work. However, it's a bit of an opinionated outcome (and could be made a bit less opinionated with some tweaking, more on that later), since it currently operates above the level of Query.Data.

Pros:

  • Individual pages remain cached and individually retrievable
  • Since the cache only handles one page at a time, fetching any one individual page from the cache is quick.
  • Since we're not appending data, there's no need for excess writes to the cache.
  • Some local cache mutations are easier to reason about -- deletions, additions.
  • Flexible merge strategy; gives the user control over how changes to the data are handled: moves, additions, deletions, etc.

Cons:

  • Doesn't operate over Query.Data, and instead encourages the creation of a secondary model. Some users of Apollo may already be doing this, others are not.
  • Some local cache mutations are difficult to reason about: How would you handle moving an item from say, page 2 to page 0? (Though, I would make the argument that if you truly need to handle this locally, that you're outside of the scope of this class's intent -- it could be handled by sending a server update, then refresh the from and the to pages)
  • Assumes the user will handle the messy issue of duplicate items; operating over highly dynamic data may mean that a given item is in page 1 and 2, depending on when the user fetches.

Some of the cons of this strategy can be mitigated, specifically, operating over Query.Data. One direction I can take this current PR is to introduce the concept of a MergeStrategy. The current implementation may as well be called a custom strategy, whereby the user defines exactly how they'd like it to be handled. Another possible implementation could be a simple strategy, whereby the pagination returns an array of [Query.Data], or a mapping of [Page: Query.Data] for the user to handle.

Single Watcher Strategy

The single watcher strategy, mentioned in the early part of this comment, could also work. It may require some changes to codegen (i.e., directive or similar), or a specific additive API to the cache to conditionally strip fields (a user could mark a parent field as a connective type, for example).

This strategy also has its pros and cons:

Pros:

  • Single watcher strategy is perhaps simpler to reason about
  • Would certainly operate over the level of Query.Data.

Cons:

  • Reads might get a bit slow as the data being pulled at once gets to be quite large.
  • More cache writes
  • Local cache moves are just as complex.
  • Also assumes the user will handle duplicate items

On the face of it, Single Watcher seems more complex to implement and comes some additional operations over the cache -- but has the massive benefit of operating over the level of Query.Data. I cannot make an assumption over how other consumers of Apollo use the Apollo models. While I exclusively use the Apollo models to instantiate my own, others may be using the models directly. The first solution seems to be fine if you assume that folks aren't using Apollo outputs directly in the UI layer of their applications -- and while I have no data to support it, I can assume that a significant portion of Apollo users do just that.

@Iron-Ham Iron-Ham marked this pull request as ready for review May 16, 2023 19:55
@Iron-Ham Iron-Ham requested a review from AnthonyMDev May 16, 2023 19:55
@Iron-Ham Iron-Ham marked this pull request as ready for review May 25, 2023 15:40
Copy link
Contributor

@AnthonyMDev AnthonyMDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the amazing work on this @Iron-Ham!

I think the overall approach with multiple query watchers is the right way to go, and the implementation of the logic to execute the queries, create new watchers, and handle the result data looks really good.

I do think there are still some overall design questions I'd like us to hash out more. I've summarized a lot of my thoughts here. Looking forward to talking with you about this more on Monday!


Result Handler Placement

Right now, the resultHandler that actually gets called with a PaginatedOutput when new data is fetched is only referred to and defined directly on the implementations of the RelayPaginationStrategy and the OffsetPaginationStrategy. There might be a reason for this that I'm not seeing, but I'm feeling like the placement of the result handler is confusing.

My initial assumption is that this would be on the GraphQLPaginatedQueryWatcher itself, but when I didn't see it there, I looked at the PaginationStrategy protocol and there was nothing about the resultHandler there either. It's actually just an implementation detail of the actual implementations of PaginationStrategy. This took me a long time to find, and it was really confusing to me how to even actually use the GraphQLPaginatedQueryWatcher because of it.

It seems you could create a new PaginationStrategy and not include a resultHandler at all, rendering the entire thing mostly dysfunctional? Unless your PaginationStrategy itself is hard-coded to do whatever you wanted with the result data, (which I would see as violation of separation of responsibilities) you need to write the resultHandler code yourself. Was this intentional?

Would it be viable to move the resultHandler so that it's part of the GraphQLPaginatedQueryWatcher? I'd like to be able to do this:

let watcher = GraphQLPaginatedQueryWatcher(
  client: client,
  strategy: ...,
  resultHandler: { result in 
    ...
  }
)

watcher.fetchMore()

It actually looks like maybe this was intended to work this way at some point and it changed? I'm guessing that because GraphQLPaginatedQueryWatcher has a typealias ResultHandler that is currently unused.


Initial Query Construction

It feels ergonomically awkward that we are creating queries in two different places. You have the initialQuery parameter on the GraphQLPaginatedQueryWatcher initializer, and also the NextPageStrategy creating the rest of the queries. I'm wondering if we can encapsulate all of that in the NextPageStrategy.

If we do that, I think we should rename that protocol from NextPageStrategy to PaginatedQueryProvider.


PaginationStrategy

There is a lot I'm not understanding about the design of PaginationStrategy right now. It seems like a lot is going on in the implementation of onWatchResult() in the actual implemented strategies that is currently repetitive. I'm feeling like much of that should either be in a protocol extension on PaginationStrategy or in the GraphQLPaginatedQueryWatcher itself.

As it stands now I don't think there is a reason for the PaginationStrategy protocol to require the pageExtractionStrategy, outputTransformer or mergeStrategy. These are only used in the implementation details of the two strategies we have implemented. I don't think we should actually be removing them, but rather that the usage of them is a commonality that should be refactored into a place where it is reused.


Page Data Merging & Data Transformation

I'm getting the feeling that the process of transforming individual page data, and then merging it together into the final PaginationResult with it's output is too might be a bit redundant in most cases.

It seems like for the majority of cases, transforming the output data and merging the pages together could be done at the same time. If you just use the PassthroughDataTransformer you can handle the logic for merging the Query.Data from both queries in the mergePageResults(paginationResponse:) function on the merge strategy and output whatever type of view model or other Output you want from there.

I'm struggling to come up with a situation in which using a custom DataTransformer to provide the intermediate models is really valuable or necessary. But I suspect I may be missing something there. I'm currently leaning towards suggesting that we remove the DataTransformer completely, and just do what the PassthroughDataTransformer is currently doing -- leaving all custom data transformation to the PaginationMergeStrategy. If there is some use case I'm missing that makes the current approach preferable, we should discuss that.

But even if we do keep this, I'd like to tweak the APIs here so we make this feel more like an opt-in thing rather than an opt-out. PassthroughDataTransformer should be the default and most straightforward way to do this, and you only need to provide additional things if you want to do something more advanced.


Thread Safety

I debated not even bringing this up. Part of me says that we shouldn't even really be handling this too much and let users deal with making sure their code is thread safe, but I'd like to consider it nonetheless.

Do we need to make the watchers array (and probably some other things) atomic? We have an @Atomic property wrapper we could probably use for some of this already.

What if the consumer ends up calling fetchMore() from multiple threads in parallel? Should we be preventing new fetches from occurring while one is already happening, or queue them up so it does them serially? It's also possible that we fetch multiple different pages in parallel, but then we would need to make the tracking of currentPage atomic and update it at the appropriate time to ensure that the next page is always calculated properly.

I think we should talk about what the desired behavior is here and ensure that we are accounting for it properly. I'm not opposed to us saying "This isn't thread safe, so don't do crazy stuff." but that should be well documented and the expected behaviors and limitations should be known.

public typealias Page = Strategy.NextPageConstructor.Page
private typealias ResultHandler = (Result<GraphQLResult<Strategy.Query.Data>, Error>) -> Void

private let client: any ApolloClientProtocol
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GraphQLQueryWatcher has always held a weak reference to the client. I'm wondering if this should do the same. I don't imagine that this would create a retain cycle, as the retain cycle would be broken as soon as you call cancel().

Any thoughts on this? @Iron-Ham @calvincestari @BobaFetters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sounds reasonable

where OutputTransformer.Query == Query, OutputTransformer.Output == Output

associatedtype NextPageConstructor: NextPageStrategy
where NextPageConstructor.Page == PageExtractor.Page,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might be able to use the new Swift protocol primary associated types to make this a bit cleaner?

import Foundation

/// A pagination strategy to be used with Relay-style cursor based pagination.
public class RelayPaginationStrategy<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should specifically call reference "Relay" here or just rename it to something like CursorBasedPaginationStrategy and have a note in the documentation indicating that this is set up to work specifically with the relay connection spec (https://relay.dev/graphql/connections.htm)?

Relay itself is a client, but the cursor based pagination spec is just a spec for GraphQL schemas and backends to use in order to be consumed by the Relay client.

Just concerned that people who don't use Relay (or aren't very familiar with it) will not know that they can use this strategy with a standard cursor based pagination. If you are using Apollo iOS, you are decidedly not using the Relay client. I don't want people to assume that this is tightly coupled to using relay.

@bignimbus What do you think about the naming and messaging here? Anyone else who's input we should get on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call-out here, thanks for the mention. I don't have a strong personal opinion here. In Apollo Client a similar utility is called relayStylePagination. So there's precedent in case that helps inform the decision.

#endif

/// The strategy by which we create the next query in a series of paginated queries. Gives full custom control over mapping a `Page` into a `Query`.
public struct CustomNextPageStrategy<Page: Hashable, Query: GraphQLQuery>: NextPageStrategy {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see the value in including this at all. I think this goes for the CustomDataTransformer and CustomPaginationMergeStrategy as well. If people need to implement a custom implementation of any of these, you should just define your own types that conforms to the protocols.

I guess I see why it is ergonomically valuable to be able to use these to essentially turn the initialization of PaginationStrategy into closure based parameters (as is shown in the unit tests for this), so I can be persuaded here. Let's discuss this more with the team.


If it makes sense, what I would like to include is an implementation of a NextPageStrategy that does this for you given just a query type specifically for the cursor-based and offset + count pagination methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed a bit on our call.

I think we could probably collapse both CustomNextPageStrategy and the need for an initialQuery into a new QueryProvider protocol that takes in a Page? as an argument (as opposed to the Page that we take in now).

We could default to generating the initial query if the Page is nil.


/// Convenience initializer
/// - Parameter arrayKeyPath: A `KeyPath` over a `Query.Data` which identifies the array key within the `Query.Data`.
public init(arrayKeyPath: KeyPath<Query.Data, [(some SelectionSet)?]?>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't account for 2 things:

  1. Lists of scalar/custom scalars
  2. Nested lists (2D, 3D, etc.)

I'm wondering if it would work if we change this to:

public init(arrayKeyPath: KeyPath<Query.Data, some Collection>) {

@AnthonyMDev AnthonyMDev self-requested a review July 28, 2023 21:25
@swizzlr
Copy link

swizzlr commented Aug 24, 2023

Hey all, been following this PR with great interest as I definitely will be needing this soon. It looks like it’s in good shape with some outstanding code review comments and I wanted to ask if there was anything I could do to assist - even if it’s just opening a PR against it to resolve merge conflicts. Please let me know (I’ll be able to do this on company time :D). @Iron-Ham @AnthonyMDev

@Iron-Ham
Copy link
Contributor Author

Iron-Ham commented Aug 24, 2023

@swizzlr for the time being, work on this PR will continue here: https://github.com/apollographql/apollo-ios-pagination

Once the API hits a stable state, it may (or may not?) be merged directly into Apollo-iOS

FWIW: we've been using this in production at GitHub since this PR was opened and it's working well for us

@matthewweldon
Copy link

matthewweldon commented Aug 25, 2023

Thanks @Iron-Ham for all your work on this, I have also been watching this PR's career with great interest.

Question about how paginatedWatcher interacts with Relay style pagination and item insertion.

When a query with relay style pagination results has a new item inserted in the list, this affects all pages, does the paginatedWatcher understand this and update the pages and queries?

For Example

Let's say you have [A,B,C,D,E] data which is fetchable via the relay-style paginated "SomeQuery"

Given these two simplified paginated queries/results:
Watcher 1: SomeQuery(first:3, after:nil, ... ) returns [A,B,C] [hasNextPage:true, endCursor: c]
Watcher 2: SomeQuery(first:3, after: c, ...) returns [D,E] [hasNextPage:false, endCursor: e]

If an item is inserted at the beginning of this list, while these queries are watched, the first watcher will fire its handler, lets say this new item is N

the data is now [N,A,B,C,D,E]

the state of our watchers/queries and returns becomes:

Watcher 1: SomeQuery(first:3, after:nil, ... ) returns [N,A,B], [hasNextPage:true, endCursor: b]

Watcher 2: SomeQuery(first:3, after: c, ... ) returns [D,E], [hasNextPage:false, endCursor: e]

Unless watcher 2 query is updated, we will lose the last item in the first query/watcher. Does the paginatedWatcher handle this case and update the endCursor in all queries? or is there a better way to handle this case?

@Iron-Ham
Copy link
Contributor Author

Iron-Ham commented Aug 25, 2023

@matthewweldon That's a great question!
The answer is... It depends. Let's dive into your scenario and examine watcher state under different scenarios.

Let's say that we've inserted N using a local cache mutation.
In this case, we would be inserting N into the first page of results via a local cache mutation. The end result of this insertion is that the first page watcher, despite having a query parameter stating it returns three items, will return four items as there are now four references linked to this result in the cache.

Initial Page0 Watcher Return: [A, B, C]
After the local cache mutation: [N, A, B, C].

When we paginate at this point, we will continue to paginate using our end-cursor, which is tied to object C. The next page will return D, E.

Let's say that the user pulls-to-refresh.

If we call the refetch function, we will discard all watchers and start anew with one watcher that returns [N, A, B].
If we instead call refresh of only the first page, we may now be in an inconsistent state: Our first watcher will return [N, A, B]. Our second watcher is not removed from the watch pool, and remains watching [D, E]. A pagination event will trigger the creation of a second "Page 1" watcher, that returns [C, D, E].

At this point, if we were using a passthrough data transformer and a targeted pagination merge over a specific list, our resulting list in GraphQL would look something like this:

[N, A, B, D, E, C, D, E ]

This is incorrect, but we can:

  1. Fix it by updating the targeted pagination strategy to use an OrderedSet here to merge the keys, such that we prefer the most recently updated result positioning. That would result in [N, A, B, C, D, E]. This is easier now that it's a separate repo -- we'd be adding a dependency to OrderedCollections to an optional add-in, vs. to Apollo-iOS.
  2. Work around it by either creating our own merge pagination merge strategy, or if we are using a CustomPaginationMergeStrategy which de-dupes results at some point.

Truthfully, at GitHub we don't tend to use Apollo's generated models other than to bind to our own view models. Given that, we've abstracted away setting a pagination merge strategy in our codebase; we always use an indexed collection to merge all pages, which accomplishes both the correct position for these values as well as de-duping. Additionally, since we're not working with generated models, that means that we don't tend to use passthrough transforms.

@matthewweldon I'm really glad you brought this up though -- because this is clearly a bug that we didn't run into in our usage.

@AnthonyMDev
Copy link
Contributor

We've decided to move this over to it's own separate package which lives in this repo. Work is still ongoing on this feature, but any additional discussions and PRs should be made there instead. I'm hoping we can get this feature to a stable point and release it for consumer use in the coming weeks/months.

We are continuing to work with @Iron-Ham who is leading the implementation efforts on this. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants