[SDK-parquet] add parquet version tracker #609
base: 11-12-_sdk-parquet_parquet_sized_buffer_and_gcs_handler
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
what's the difference between this and the non-parquet processor_status_saver?
only difference is it takes in a table_name: &str arg b/c it should be able to update a row per table. wasn't sure how we could reduce the duplicated code here :(
i'm wondering if the parquet version tracker could also just use the postgres processor_status_saver?
hmm, if we could add another function (save_parquet_processor_status) to processor_status_saver that takes the extra arg and calls the existing function in processor_status_saver, that would also work
mmmm right. would it work if we added parquet_table_name as an optional param?
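One way to read the two suggestions in this thread, as a minimal sketch only: keep a single save path with an optional table name, and expose save_parquet_processor_status as a thin wrapper over it. Everything below is hypothetical and not the SDK's actual API: the StatusStore struct, the key format, and the in-memory HashMap are stand-ins for the real database upsert.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the processor_status table.
#[derive(Default)]
struct StatusStore {
    // Maps a status key to the last successfully processed version.
    rows: HashMap<String, u64>,
}

impl StatusStore {
    /// Shared save path. `parquet_table_name` is None for the Postgres
    /// processor and Some(table) when tracking one row per parquet table.
    fn save_processor_status(
        &mut self,
        processor_name: &str,
        last_success_version: u64,
        parquet_table_name: Option<&str>,
    ) {
        // Assumed key format; the real schema may key rows differently.
        let key = match parquet_table_name {
            Some(table) => format!("{processor_name}_{table}"),
            None => processor_name.to_string(),
        };
        self.rows.insert(key, last_success_version);
    }

    /// Thin wrapper so parquet callers always pass a table name.
    fn save_parquet_processor_status(
        &mut self,
        processor_name: &str,
        table_name: &str,
        last_success_version: u64,
    ) {
        self.save_processor_status(processor_name, last_success_version, Some(table_name));
    }
}
```

With this shape the non-parquet caller passes None and nothing is duplicated; whether that fits the real saver depends on how the status rows are actually keyed.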
@@ -19,6 +20,7 @@ use processor::schema::{backfill_processor_status, processor_status};
 pub fn get_processor_status_saver(
     conn_pool: ArcDbPool,
     config: IndexerProcessorConfig,
+    is_parquet: bool,
discussed offline: consider using a more descriptive enum instead of this bool flag. Ideally we are able to determine what type of processor_status_saver to return just from the config. It's weird/inconsistent that we have this function determine which enum to return while relying on a user input to help with that decision.
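A rough sketch of what this could look like, assuming the config can carry the processor kind itself. The names ProcessorKind, ConfigSketch, StatusSaverKind, and status_saver_kind are made up for illustration and are not the real IndexerProcessorConfig or saver types.

```rust
/// Hypothetical: which kind of processor this config describes.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ProcessorKind {
    Postgres,
    Parquet,
}

/// Hypothetical stand-in for the relevant part of IndexerProcessorConfig.
struct ConfigSketch {
    processor_kind: ProcessorKind,
}

/// Hypothetical: which status saver variant to build.
#[derive(Debug, PartialEq, Eq)]
enum StatusSaverKind {
    Postgres,
    Parquet,
}

/// The choice is derived from the config alone, so callers no longer
/// pass a separate `is_parquet: bool` that could disagree with it.
fn status_saver_kind(config: &ConfigSketch) -> StatusSaverKind {
    match config.processor_kind {
        ProcessorKind::Postgres => StatusSaverKind::Postgres,
        ProcessorKind::Parquet => StatusSaverKind::Parquet,
    }
}
```

The match is exhaustive, so adding a new processor kind later forces this function to be updated rather than silently falling into a default branch.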
Description
1. Added Parquet Version Tracker Functionality
2. Schema and Table Handling Updates
ParquetTypeEnum improvements:
3. Tests updated
4. General Code Improvements
Enhanced comments for better readability and maintainability.
Test Plan