use option to replace expect with context and avoid creating two gap detectors #428

yuunlimm · 2024-06-26T16:23:50Z

Summary

Made the bucket root configurable for each stage.
Fixed a gap detector issue where the parquet gap detector wasn't updating its processor status in the DB.
Moved unrelated functions in MoveResource out of the struct impl.
Removed unnecessary fields from parquetProcessingResult.

Test Plan

Tested locally and verified that the processor status is being updated.

rust/processor/src/bq_analytics/gcs_handler.rs

rust/processor/src/gap_detectors/mod.rs

rtso · 2024-06-27T19:49:24Z

rust/processor/src/gap_detectors/mod.rs

@@ -132,24 +144,20 @@ pub async fn create_gap_detector_status_tracker_loop(
                                    // We don't panic as everything downstream will panic if it doesn't work/receive
                                }

-                                if let Some(res_last_success_batch) = res.last_success_batch {


Why do we need to remove this?

I've decided to remove the last_success_batch field from the DefaultGapDetectorResult due to the complexity and limited utility it offered in tracking progress within the parquet processor. Given the processor's architecture, where 10 tasks each manage a buffer containing distinct transaction ranges, the end_version of a batch isn't necessarily sequential. This non-sequential nature made it challenging to effectively use end_version and last_transaction_timestamp for status updates.

Instead, I'm proposing that we update the processor status using next_version_process - 1. This aligns better with the purpose of the status, which is to ensure data integrity.

As I write this, though: we run 10 tasks only for backfilling, and I tuned it down to 1 task for regular traffic, where the case above doesn't apply. Let me come back with a better approach that works for both cases.

I thought about it. Maybe we keep this behavior for now, since backfilling is our priority, and then improve it to support both cases as a follow-up?
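
To illustrate the reasoning in this thread, here is a minimal sketch of how a gap detector can report progress as "next version to process, minus one" when batches finish out of order. It is not the code from this PR: `BatchResult`, `GapDetector`, and all field and method names are hypothetical, and the real processor may track state differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical result type: an inclusive range of processed versions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BatchResult {
    start_version: u64,
    end_version: u64,
}

/// Hypothetical gap detector that only advances over contiguous ranges.
struct GapDetector {
    /// First version that has NOT yet been confirmed as processed.
    next_version_to_process: u64,
    /// Batches that finished ahead of the frontier, keyed by start_version.
    seen: BTreeMap<u64, BatchResult>,
}

impl GapDetector {
    fn new(starting_version: u64) -> Self {
        Self {
            next_version_to_process: starting_version,
            seen: BTreeMap::new(),
        }
    }

    /// Record a finished batch, then advance the frontier as far as
    /// the buffered batches allow without leaving a gap.
    fn process(&mut self, batch: BatchResult) {
        self.seen.insert(batch.start_version, batch);
        while let Some(b) = self.seen.remove(&self.next_version_to_process) {
            self.next_version_to_process = b.end_version + 1;
        }
    }

    /// Last version with no gap before it, i.e. next_version_to_process - 1.
    /// Returns None when the frontier is still at version 0.
    fn last_success_version(&self) -> Option<u64> {
        self.next_version_to_process.checked_sub(1)
    }
}

fn main() {
    let mut detector = GapDetector::new(0);

    // A later batch finishes first: the status must not move yet.
    detector.process(BatchResult { start_version: 100, end_version: 199 });
    assert_eq!(detector.last_success_version(), None);

    // The missing batch arrives: the frontier jumps past both ranges.
    detector.process(BatchResult { start_version: 0, end_version: 99 });
    assert_eq!(detector.last_success_version(), Some(199));
}
```

With several tasks writing distinct ranges, the `end_version` of the most recent batch can be ahead of unfinished work, whereas the frontier in this sketch only moves once every earlier version is accounted for.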

banool · 2024-07-03T21:28:25Z

rust/processor/src/gap_detectors/mod.rs

@@ -132,24 +144,20 @@ pub async fn create_gap_detector_status_tracker_loop(
                                    // We don't panic as everything downstream will panic if it doesn't work/receive
                                }

-                                if let Some(res_last_success_batch) = res.last_success_batch {


I'm bouncing back and forth between the new and old code and can't quite figure out what changed. At least here it seems like we're still logging the same stuff as before, but more reliably now, since the fields are always set. With the new code the structs seem less deep / cleaner. tldr I need to read the code further to figure out what's going on here, but don't let it block you, I'll catch up later.

rust/processor/src/worker.rs

…detectors

…d encoding beforehand to call this function.
- reduced the duplicated lines of code
- use context instead of unwrapping with unsafe assumptions
- make the gap detector trait fn return a Result type without requiring an unnecessary error type
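
As an illustration of the "context instead of unwrapping" point, here is a minimal std-only sketch. It assumes a hypothetical `StageConfig` with an optional `bucket_root` and a hypothetical `ConfigError` type; none of these names come from the PR. The `anyhow` crate offers a `.context(...)` method on `Option` that serves the same purpose, and `ok_or_else` is used here only to keep the example free of dependencies.

```rust
use std::fmt;

/// Hypothetical error type carrying a human-readable context message.
#[derive(Debug, PartialEq)]
struct ConfigError(String);

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for ConfigError {}

/// Hypothetical per-stage config; the field name is an assumption.
struct StageConfig {
    bucket_root: Option<String>,
}

/// Instead of `cfg.bucket_root.as_deref().expect("...")`, which panics,
/// turn the missing value into an error the caller can handle.
fn bucket_root(cfg: &StageConfig) -> Result<&str, ConfigError> {
    cfg.bucket_root
        .as_deref()
        .ok_or_else(|| ConfigError("bucket_root is not set for this stage".to_string()))
}

fn main() {
    let missing = StageConfig { bucket_root: None };
    match bucket_root(&missing) {
        Ok(root) => println!("bucket root: {root}"),
        Err(e) => println!("config error: {e}"),
    }
}
```

Returning a `Result` lets the caller decide whether a missing value is fatal, rather than baking that assumption into the helper.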

banool

Nice!

yuunlimm mentioned this pull request Jun 26, 2024

add parquet-txn-metadata-processor #432

Closed

ying-w reviewed Jun 27, 2024

rust/processor/src/bq_analytics/gcs_handler.rs (resolved)

yuunlimm requested a review from banool June 27, 2024 18:14

rtso requested a review from a team June 27, 2024 19:40

rtso reviewed Jun 27, 2024

rust/processor/src/gap_detectors/mod.rs (outdated, resolved)

rtso reviewed Jun 27, 2024

rust/processor/src/gap_detectors/mod.rs (resolved)

yuunlimm force-pushed the yuunlimm/parquet-gap-detector-fix branch 4 times, most recently from 39b295a to 0488e8f Compare July 1, 2024 20:54

yuunlimm requested a review from rtso July 1, 2024 22:12

banool requested changes Jul 3, 2024

yuunlimm force-pushed the yuunlimm/parquet-gap-detector-fix branch from 86cfcb9 to 171609e Compare July 8, 2024 17:28

yuunlimm added 4 commits July 8, 2024 10:36

use option to replace expect with context and avoid creating two gap …

cecf1b7

…detectors

add trait param for gap detector loop

4cb5c9b

lint

5d70d9a

yuunlimm force-pushed the yuunlimm/parquet-gap-detector-fix branch from c1f9e04 to 5f3d48a Compare July 8, 2024 17:36

lint

e4593aa

yuunlimm force-pushed the yuunlimm/parquet-gap-detector-fix branch from 5f3d48a to e4593aa Compare July 8, 2024 18:01

yuunlimm requested review from banool and a team July 8, 2024 18:02

banool approved these changes Jul 9, 2024

yuunlimm merged commit c6c7305 into main Jul 9, 2024
7 checks passed

yuunlimm deleted the yuunlimm/parquet-gap-detector-fix branch July 9, 2024 17:40
