TDL-15566: Data loss of child streams #57

Closed

hpatel41
Contributor

Description of change

TDL-15566: Data loss of child streams

  • Replaced new_bookmark with bookmark_dttm in the sub_stream.sync call so that child stream records are synced from the start date (or the state file date) rather than from the newly updated bookmark.
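
The data loss described above can be reproduced with a small sketch. It is not the tap's code: the record layout, `sync_children`, and the `use_fix` flag are hypothetical, and it assumes `new_bookmark` is advanced as each parent record is processed, while `bookmark_dttm` keeps the value the sync started with.

```python
from datetime import datetime, timezone


def dt(day):
    """Build a UTC timestamp in October 2021 (illustrative data only)."""
    return datetime(2021, 10, day, tzinfo=timezone.utc)


# Hypothetical child records, keyed by parent id.
CHILD_RECORDS = {
    "p1": [{"id": "c1", "created": dt(2)}, {"id": "c2", "created": dt(6)}],
    "p2": [{"id": "c3", "created": dt(3)}, {"id": "c4", "created": dt(8)}],
}
PARENTS = [{"id": "p1", "updated": dt(7)}, {"id": "p2", "updated": dt(9)}]


def sync_children(parent_id, since):
    """Return ids of child records created at or after `since`."""
    return [r["id"] for r in CHILD_RECORDS[parent_id] if r["created"] >= since]


def sync(bookmark_dttm, use_fix):
    """Sync parents, then children, starting from one of two bookmarks."""
    new_bookmark = bookmark_dttm
    emitted = []
    for parent in PARENTS:
        # Assumption: the parent bookmark moves forward per parent record.
        new_bookmark = max(new_bookmark, parent["updated"])
        # Before the fix the child sync received the advanced bookmark.
        since = bookmark_dttm if use_fix else new_bookmark
        emitted.extend(sync_children(parent["id"], since))
    return emitted


buggy = sync(dt(1), use_fix=False)
fixed = sync(dt(1), use_fix=True)
print(buggy)
print(fixed)
```

In this sketch every child record is older than its parent's update time, so the pre-fix path emits nothing while the fixed path emits all four children.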

Manual QA steps

Risks

Rollback steps

  • revert this branch

@karanpanchal-crest karanpanchal-crest self-requested a review October 20, 2021 14:51
@@ -406,7 +406,9 @@ def sync_substream(self, state, parent, sub_stream, parent_response):
 integer_datetime_fmt=
 "unix-milliseconds-integer-datetime-parsing"
 ) as transformer:
-stream_events = sub_stream.sync(state, new_bookmark,
+# bug fix for syncing child streams from start date or
Contributor

@savan-chovatiya Instead of writing 'bug fix', you can update the comment to say:
"syncing child streams from start date or state file date"

Contributor Author

Updated.

-stream_events = sub_stream.sync(state, new_bookmark,
+# bug fix for syncing child streams from start date or
+# state file date and not newly updated bookmark
+stream_events = sub_stream.sync(state, bookmark_dttm,
Contributor

@savan-chovatiya Can you please add unit test cases for this code change?

Contributor Author

Added a unit test.

# using "start_date" that is passed and not using the bookmark
# value stored in the state file, as it will be updated after
# every sync of child stream for parent stream
abs_start, abs_end = get_absolute_start_end_time(start_date)
Contributor

@savan-chovatiya What if I run the sync again once all the child streams and parent streams are synced?
Will it still try collecting the data from the "start-date" in the second sync?

Contributor Author

No, it will start collecting from the bookmark present in the state file; here, start_date is just a variable.
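
A minimal sketch of the behavior described in this reply, assuming the value passed as start_date is resolved from the state file when a bookmark exists and from the configured start date otherwise. The function `resolve_start`, the state layout, and the stream name are hypothetical and are not taken from the tap.

```python
def resolve_start(state, stream_name, config_start_date):
    """Pick the date a sync should start from.

    Hypothetical helper: prefers the bookmark saved in the state file and
    falls back to the configured start date on the first sync.
    """
    bookmark = state.get("bookmarks", {}).get(stream_name)
    return bookmark if bookmark is not None else config_start_date


CONFIG_START = "2021-01-01T00:00:00Z"

# First sync: the state file is empty, so the configured start date is used.
first = resolve_start({}, "child_stream", CONFIG_START)

# Second sync: the state file now holds a bookmark, so that value is used.
state_after_first_sync = {"bookmarks": {"child_stream": "2021-10-20T00:00:00Z"}}
second = resolve_start(state_after_first_sync, "child_stream", CONFIG_START)

print(first)
print(second)
```

Under that assumption, a second run does not go back to the configured start date, which matches the answer given above.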

@hpatel41 hpatel41 requested a review from KrisPersonal October 25, 2021 11:15
5 participants