-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parquet processor #390
Conversation
dd2c400
to
52653dd
Compare
c3c0829
to
f9aad63
Compare
rust/processor/src/parquet_processors/move_resource_processor.rs
Outdated
Show resolved
Hide resolved
583aabb
to
bf3e60d
Compare
rust/processor/src/parquet_processors/move_resource_processor.rs
Outdated
Show resolved
Hide resolved
rust/processor/src/models/default_models/parquet_move_resources.rs
Outdated
Show resolved
Hide resolved
568e60e
to
8c529e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be really nice to have but prob out of scope:
- having tx_version of previous state in move_resources and write_set_changes
rust/processor/src/models/default_models/parquet_move_resources.rs
Outdated
Show resolved
Hide resolved
rust/processor/src/models/default_models/parquet_transactions.rs
Outdated
Show resolved
Hide resolved
rust/processor/src/models/default_models/parquet_write_set_changes.rs
Outdated
Show resolved
Hide resolved
Gap Detector isn't working, but since I checked that all txns exist in BigQuery. this is a less concern for now. so I am prioritizing shipping this processor so that we can start backfilling mainnet |
edc7f16
to
2d0b9c1
Compare
2d0b9c1
to
77143d3
Compare
rust/processor/src/models/default_models/parquet_move_resources.rs
Outdated
Show resolved
Hide resolved
rust/processor/src/models/default_models/parquet_move_resources.rs
Outdated
Show resolved
Hide resolved
@yuunlimm please rebase this on main, will make the review easier for us :-) |
20c0959
to
3da4377
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a refactor PR but it's my first time seeing lots of this code, so I left comments all over the place anyway. For any comment that's not relevant to the refactoring directly, please open an issue / task somewhere to track it for later!
I only reviewed the first bit for now. Since this is the refactoring PR I see there is now a mountain of new code hahah. I'm happy to do a full audit of the code later, but it's probably helpful for me to leave reviews in chunks rather than like 50 comments all at once.
rust/processor/src/db/common/models/default_models/parquet_move_resources.rs
Show resolved
Hide resolved
rust/processor/src/db/common/models/default_models/parquet_move_resources.rs
Show resolved
Hide resolved
} | ||
} | ||
|
||
pub fn get_outer_type_from_resource(write_resource: &WriteResource) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method and convert_move_struct_tag
don't have anything to do with MoveResource
. Can you move them to the relevant struct or just make them freestanding?
As a general rule, if a method doesn't take self
as a param or return Self
there is no reason for it to a method on a struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
synced offline, will follow up in the next pr.
} | ||
} | ||
|
||
impl Default for Transaction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come we have to implement Default? There is no "default transaction" so I'm not sure it makes sense to implement Default for Transaction. Same comment for many of these. If some trait / library is forcing us to implement Default, we should at least use unreachable
so we never use it, or if something does use it, for the fields we should use default, like txn_version: Default::default()
, event_root_hash: Default::default()
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a usecase, where we have derive the parquet schema from the instance, and the we need to know this schema in advance. please take a look at hasParquetSchema trait in the generic_parquet_processor.rs. and we use this schema() to construct parquetHandler for each struct type. that's the only usecase of default but I am not too sure if there is any other workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@banool for the non-Parquet processor, we use Default for transaction types where we can skip indexing most of its contents
|
||
impl Transaction { | ||
fn from_transaction_info( | ||
info: &TransactionInfo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure the use case but if possible, it'd be good to just take in the owned object, it'd be more efficient. No worries if it doesn't make sense for this use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
synced offline, will follow up in the next pr.
25a5706
to
e99e37d
Compare
4d57a60
to
3cf9386
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major items have been addressed. I'll leave a full audit here: #420.
This is to migrate some of tables in AlloyDB to BigQuery to fully deprecate those tables that are only used for analytics.
Components added:
Test Plan
tested in testing environment and running locally.