
source-braintree-native: update transactions to fetch updates #2208

Merged · 2 commits merged into main from bair/braintree-transactions-updates on Dec 16, 2024

Conversation

@Alex-Bair (Contributor) commented on Dec 16, 2024

Description:

Previously, the transactions stream only incrementally captured creates since Braintree does not expose the updated_at field for API searches.

The transactions stream has been updated to capture updates, on the assumption that Braintree's updated_at field is effectively the latest of the various searchable *_at fields (e.g. created_at, voided_at, settled_at). The strategy is:

  1. Get the unique ids of all transactions that have been created, voided, settled, etc. (i.e. figure out which transactions have been updated) in the date window.
  2. Request the full transaction objects associated with the unique ids fetched in step 1. A scatter/gather strategy is used to fetch batches of transactions, yielding them via an asyncio.Queue as they arrive.

The transactions stream also has a distinct backfill task now. It ignores any documents with an updated_at field that's after the cutoff date, meaning that the incremental task will pick up the updated document.
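
For illustration, here is a simplified sketch of the two pieces described above: the step-1 id collection across the searchable date fields, and the backfill cutoff check. This is not the connector's actual code; the field list and the client.search_transaction_ids helper are hypothetical stand-ins.

```python
from datetime import datetime

# Hypothetical field list; the real connector searches the date fields
# Braintree's transaction search actually exposes.
SEARCHABLE_DATE_FIELDS = [
    "created_at", "authorized_at", "submitted_for_settlement_at",
    "settled_at", "voided_at", "failed_at",
]

async def fetch_unique_updated_ids(client, start: datetime, end: datetime) -> set[str]:
    # Step 1: search each date field over the window and de-duplicate the ids,
    # which identifies every transaction created or updated in the window.
    ids: set[str] = set()
    for field in SEARCHABLE_DATE_FIELDS:
        ids.update(await client.search_transaction_ids(field, start, end))
    return ids

def backfill_should_emit(doc_updated_at: datetime, cutoff: datetime) -> bool:
    # The backfill task skips documents updated after the cutoff;
    # the incremental task picks those up instead.
    return doc_updated_at <= cutoff
```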

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

Docs should be updated to reflect that the transactions stream captures updates and does not require regular backfills.

Notes for reviewers:

Tested on a local stack. Confirmed:

  • Snapshot tests passed.
  • The number of unique transaction document IDs in a date window fetched by _fetch_unique_updated_transaction_ids equals the number of documents yielded with the scatter/gather strategy within fetch_transactions.
  • Transactions backfills complete.


@Alex-Bair marked this pull request as ready for review on December 16, 2024 17:07
@Alex-Bair added the change:unplanned label (Unplanned change, useful for things like doc updates) on Dec 16, 2024

queue = asyncio.Queue(maxsize=10)

async with asyncio.TaskGroup() as tg:
@williamhbaker (Member) commented:

You could probably use asyncio.gather here as well.

A possible consideration is memory usage if there are a lot of batches which all get results concurrently before their documents can be emitted. I doubt this will be a problem in practice since the speed at emitting documents should be quite quick in comparison to fetching batches.
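
For reference, a minimal sketch of the gather-based pattern being suggested, assuming a hypothetical fetch_batch coroutine. Note that it holds every batch's results in memory until the slowest batch finishes, which is the trade-off discussed above.

```python
import asyncio

async def fetch_all_batches(fetch_batch, batches):
    # Run every batch fetch concurrently; gather returns once all have finished,
    # so all results are held in memory before any document is emitted.
    results = await asyncio.gather(*(fetch_batch(batch) for batch in batches))
    for batch_result in results:
        for transaction in batch_result:
            yield transaction
```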

@Alex-Bair (Contributor, Author) replied:

I'll refactor to the asyncio.gather pattern you linked in your other comment.

I agree about memory usage. Theoretically, with the current batch size of TRANSACTION_SEARCH_LIMIT, there could be at most 9 batches (the same as the number of transaction search fields) if all of them return unique transaction IDs within the time window. In practice there's a lot of overlap between some of these (like created_at and submitted_for_settlement_at), and searches on the non-created_at fields usually return fewer IDs than created_at searches. I could add some handling to only allow X batches to fetch results at a time, but I suggest we wait & see if that's actually needed.

@Alex-Bair (Contributor, Author) commented:

I was able to refactor this one, but I had to omit the await. Otherwise, the connector would get stuck putting docs onto the queue but never taking them off.
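
To illustrate the hang being described (a hypothetical sketch, not the connector's code): with a bounded queue, awaiting the gather before draining the queue blocks forever once the queue fills, because nothing is consuming. One way to keep the gather awaited while still draining eagerly is to wrap it in a task and consume until a sentinel arrives:

```python
import asyncio

async def stream_documents(fetch_batch, batches):
    # Bounded queue: producers block on put() once 10 documents are waiting.
    queue: asyncio.Queue = asyncio.Queue(maxsize=10)
    sentinel = object()

    async def producer(batch):
        for doc in await fetch_batch(batch):
            await queue.put(doc)

    async def run_producers():
        try:
            # Fetch all batches concurrently.
            await asyncio.gather(*(producer(batch) for batch in batches))
        finally:
            await queue.put(sentinel)  # signal the consumer that we're done

    producers = asyncio.ensure_future(run_producers())
    # Drain the queue while the producers run, so put() never blocks forever.
    while (doc := await queue.get()) is not sentinel:
        yield doc
    await producers  # re-raise any exception from the producers
```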

@williamhbaker (Member) replied:

The change in _fetch_unique_updated_transaction_ids looks good 👍

For fetch_transactions I see how the same approach won't work, since we specifically don't want to keep every single full record in memory.

I'm not 100% sure what you've got in fetch_transactions right now will work. I think the gather has to be await'd at some point for it to be assured to complete.

As an alternative to using a queue, take a look at asyncio.as_completed - I think that would do what we want, which is to eagerly process the results of each task as it becomes available.
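
A minimal sketch of the as_completed pattern being suggested, again assuming a hypothetical fetch_batch coroutine; each batch is emitted (and released) as soon as it finishes, rather than waiting for every batch.

```python
import asyncio

async def fetch_batches_as_completed(fetch_batch, batches):
    tasks = [asyncio.ensure_future(fetch_batch(batch)) for batch in batches]
    # as_completed yields awaitables in the order they finish, so each
    # batch's documents can be emitted and dropped as soon as they arrive.
    for next_finished in asyncio.as_completed(tasks):
        for transaction in await next_finished:
            yield transaction
```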

@Alex-Bair (Contributor, Author) replied:

Based on our discussion, I made fetch_transactions fetch batches in sequence instead of in parallel. It'll probably be uncommon to have more than 1-2 batches while we use a batch size of 50,000. We can look into reducing the batch size & processing batches in parallel at a later point.
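
The sequential shape described here, as a rough sketch (with the same hypothetical fetch_batch helper; 50,000 corresponds to TRANSACTION_SEARCH_LIMIT mentioned above):

```python
TRANSACTION_SEARCH_LIMIT = 50_000  # batch size referenced above

async def fetch_transactions_sequentially(fetch_batch, unique_ids: list[str]):
    # Fetch and emit one batch at a time; with 50,000 ids per batch there
    # will rarely be more than one or two batches per date window.
    for start in range(0, len(unique_ids), TRANSACTION_SEARCH_LIMIT):
        batch = unique_ids[start:start + TRANSACTION_SEARCH_LIMIT]
        for transaction in await fetch_batch(batch):
            yield transaction
```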

@Alex-Bair force-pushed the bair/braintree-transactions-updates branch from 0a47336 to c51c0af on December 16, 2024 18:51
Previously, the transactions stream only incrementally captured creates
since Braintree does not expose the `updated_at` field for API searches.

The transactions stream has been updated to capture updates. The
strategy is:
1. Get the unique ids of all transactions that have been created,
voided, settled, etc. (i.e. figuring out what transactions have been
updated) in the date window.
2. Request the full transaction objects associated with the unique ids
fetched in step 1.
@Alex-Bair force-pushed the bair/braintree-transactions-updates branch from c51c0af to 3d50de7 on December 16, 2024 21:11
@williamhbaker (Member) left a review:

LGTM

@Alex-Bair merged commit d649c97 into main on Dec 16, 2024
74 of 80 checks passed
@Alex-Bair deleted the bair/braintree-transactions-updates branch on December 16, 2024 21:22