Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AddFeedRangeSupportInChangeFeed #36930

Conversation

xinlian12
Copy link
Member

@xinlian12 xinlian12 commented Aug 16, 2024

Description

In this PR, we added support for getting the feedRanges of the container and added support for getting change feed by using feed ranges.

Getting feed ranges from container

feedRange - represents a set of co-located logical partitions. For each container, customer will be able to get a list of feedRanges(each representing a hash-range of single physical partition). For query(future) and changeFeed(in this PR), SDK will allow customers to config a feedRange(can map to a single physical partition, span of multiple physical partitions or a subset of single physical partition) to filter the results.

Examples:
created_collection.read_feed_ranges()
await created_collection.read_feed_ranges()

Adding feed range support in change feed query

There are few issues with current changeFeed query API:
image

  • It allows customer to pass in a physical partition id for filtering the results, however there is no official public contract exists today for customer to acquire the physical partition id.
  • Physical partition id is an internal concept which from SDK perspective, we would like to keep as internal implementation details.
  • Currently, the continuation token being returned is simple _lsn, which is not split/merge proof. A new format continuation token will need to be returned
  • Does not support all change feed mode(Not included in this PR, will add in following PR)

Based on the above considerations, except adding feedRange support, we are also going to deprecate partition_key_range_id and is_start_from_beginning parameter. The API will be changed into the following:

 @overload
    def query_items_change_feed(
            self,
            *,
            max_item_count: Optional[int] = None,
            start_time: Optional[Union[datetime, Literal["Now", "Beginning"]]] = None,
            partition_key: PartitionKeyType,
            priority: Optional[Literal["High", "Low"]] = None,
            **kwargs: Any
    ) -> ItemPaged[Dict[str, Any]]:
  @overload
    def query_items_change_feed(
            self,
            *,
            feed_range: str,
            max_item_count: Optional[int] = None,
            start_time: Optional[Union[datetime, Literal["Now", "Beginning"]]] = None,
            priority: Optional[Literal["High", "Low"]] = None,
            **kwargs: Any
    ) -> ItemPaged[Dict[str, Any]]:
    @overload
    def query_items_change_feed(
            self,
            *,
            continuation: str,
            max_item_count: Optional[int] = None,
            priority: Optional[Literal["High", "Low"]] = None,
            **kwargs: Any
    ) -> ItemPaged[Dict[str, Any]]:
    @overload
    def query_items_change_feed(
            self,
            *,
            max_item_count: Optional[int] = None,
            start_time: Optional[Union[datetime, Literal["Now", "Beginning"]]] = None,
            priority: Optional[Literal["High", "Low"]] = None,
            **kwargs: Any
    ) -> ItemPaged[Dict[str, Any]]:
    @distributed_trace
    def query_items_change_feed(
            self,
            *args: Any,
            **kwargs: Any
    ) -> ItemPaged[Dict[str, Any]]:

Notes:
continuation token v1 -> old/existing formatted continuation token -> simple _lsn format, for example "3"
continuation token v2 -> new formatted continuation token ,it will contain information including containerRid, changeFeed mode, changeFeed start from, and a list of continuationTokens for each sub-range
continuation token when filter by pk value

eyJ2IjogInYyIiwiY29udGFpbmVyUmlkIjogIk1TVXRBTEVqc2FzPSIsICJtb2RlIjogIkluY3JlbWVudGFsIiwgInN0YXJ0RnJvbSI6IHsiVHlwZSI6ICJCZWdpbm5pbmcifSwiY29udGludWF0aW9uIjogeyJ2IjogInYyIiwicmlkIjogIk1TVXRBTEVqc2FzPSIsImNvbnRpbnVhdGlvbiI6IFsgeyAidG9rZW4iOiAiXCIyXCIiLCJyYW5nZSI6IHsibWluIjogIiIsICJtYXgiOiAiRkYiLCJpc01pbkluY2x1c2l2ZSI6IHRydWUsImlzTWF4SW5jbHVzaXZlIjogZmFsc2UgfSB9IF0sIlBLIjogJ3BrJyB9fQ==

continuation token when filter by feed range

eyJ2IjogInYyIiwiY29udGFpbmVyUmlkIjogIk1TVXRBTEVqc2FzPSIsIm1vZGUiOiAiSW5jcmVtZW50YWwiLCJzdGFydEZyb20iOiB7IlR5cGUiOiAiQmVnaW5uaW5nIiB9LCJjb250aW51YXRpb24iOiB7InYiOiAidjIiLCJyaWQiOiAiTVNVdEFMRWpzYXM9IiwiY29udGludWF0aW9uIjogWyB7InRva2VuIjogIlwiMlwiIiwicmFuZ2UiOiB7Im1pbiI6ICIiLCJtYXgiOiAiRkYiLCJpc01pbkluY2x1c2l2ZSI6IHRydWUsImlzTWF4SW5jbHVzaXZlIjogZmFsc2UgfSB9XSwiUmFuZ2UiOiB7Im1pbiI6ICIiLCJtYXgiOiAiRkYiLCAiaXNNaW5JbmNsdXNpdmUiOiB0cnVlLCAiaXNNYXhJbmNsdXNpdmUiOiBmYWxzZSB9fX0=
  • Filter by physical partition id will still be supported, but with the continuation token v1 version being returned, it will not split/merge safe
  • When filter by partition key, if it starts with continuation token none, then new formatted continuation token will be returned. If continuation token v1 is being used, then continuation token v1 version will continue being returned. Otherwise, it will return continuation token v2 version
  • When filter by feed range, if continuation token v1 is being used, an exception will be thrown

Split flow

Continuation token before split:

eyAidiI6ICJ2MiIsImNvbnRhaW5lclJpZCI6ICJNU1V0QUxFanNhcz0iLCJtb2RlIjogIkluY3JlbWVudGFsIiwgInN0YXJ0RnJvbSI6IHsiVHlwZSI6ICJCZWdpbm5pbmcifSwiY29udGludWF0aW9uIjogeyJ2IjogInYyIiwicmlkIjogIk1TVXRBTEVqc2FzPSIsImNvbnRpbnVhdGlvbiI6IFt7InRva2VuIjogIlwiMlwiIiwgInJhbmdlIjogeyJtaW4iOiAiIiwibWF4IjogIkZGIiwiaXNNaW5JbmNsdXNpdmUiOiB0cnVlLCJpc01heEluY2x1c2l2ZSI6IGZhbHNlfSB9XSwiUmFuZ2UiOiB7ICJtaW4iOiAiIiwibWF4IjogIkZGIiwgImlzTWluSW5jbHVzaXZlIjogdHJ1ZSwiaXNNYXhJbmNsdXNpdmUiOiBmYWxzZX19fQ==

Continuation token after split:

eyAidiI6ICJ2MiIsImNvbnRhaW5lclJpZCI6ICJNU1V0QUxFanNhcz0iLCJtb2RlIjogIkluY3JlbWVudGFsIiwic3RhcnRGcm9tIjogeyAiVHlwZSI6ICJCZWdpbm5pbmciIH0sImNvbnRpbnVhdGlvbiI6IHsgInYiOiAidjIiLCJyaWQiOiAiTVNVdEFMRWpzYXM9IiwiY29udGludWF0aW9uIjogWyB7InRva2VuIjogIlwiMTFcIiIsICJyYW5nZSI6IHsibWluIjogIiIsICJtYXgiOiAiMUZGRkZGRkZGRkZGRkZGRkZGRkZGRkZGRkZGRkZGRkYiLCJpc01pbkluY2x1c2l2ZSI6IHRydWUsICJpc01heEluY2x1c2l2ZSI6IGZhbHNlfSB9LCB7InRva2VuIjogIlwiMTRcIiIsInJhbmdlIjogeyJtaW4iOiAiMUZGRkZGRkZGRkZGRkZGRkZGRkZGRkZGRkZGRkZGRkYiLCAibWF4IjogIkZGIiwiaXNNaW5JbmNsdXNpdmUiOiB0cnVlLCJpc01heEluY2x1c2l2ZSI6IGZhbHNlIH0gfSBdLCJSYW5nZSI6IHsibWluIjogIiIsICJtYXgiOiAiRkYiLCJpc01pbkluY2x1c2l2ZSI6IHRydWUsImlzTWF4SW5jbHVzaXZlIjogZmFsc2UgfSB9fQ==

What happens when query change feed from multiple partitions

The continuation token will contain a list of tokens for each partition, changes will be read from each partition in round-robin fashion.

using the continuation token from the above split example, it includes to tokens. one for range ["", 1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF), one for [1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF, FF). It will read changes from
["", 1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF), then read changes from [1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF, FF) and continue the pattern.

Future changes

  • Add full fidelity mode support
  • Add API so customer can re-load balancing based on the existing continuation tokens

@xinlian12 xinlian12 requested review from annatisch and a team as code owners August 16, 2024 18:16
@azure-sdk
Copy link
Collaborator

API change check

APIView has identified API level changes in this PR and created following API reviews.

azure-cosmos

@xinlian12 xinlian12 force-pushed the addFeedRangeSupportInChangeFeed branch from a063d41 to 7a1a1eb Compare August 18, 2024 01:21
@xinlian12 xinlian12 force-pushed the addFeedRangeSupportInChangeFeed branch 2 times, most recently from 63ea375 to b9e66ae Compare August 18, 2024 18:59
@xinlian12 xinlian12 force-pushed the addFeedRangeSupportInChangeFeed branch from b9e66ae to 5f16b14 Compare August 18, 2024 19:04
@xinlian12
Copy link
Member Author

/azp run python - cosmos - tests

@xinlian12
Copy link
Member Author

/azp run python - cosmos - ci

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@xinlian12
Copy link
Member Author

Failed tests are not caused by the change in the PR:

test_user_agent_suffix_special_character
test_wrong_queries

@xinlian12
Copy link
Member Author

/check-enforcer override

@xinlian12 xinlian12 merged commit f89b02b into Azure:users/xinlian/feature/feedRangeAndChangeFeed Sep 16, 2024
8 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants