-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Support interface value converted to parquet #4
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock. Honestly only understand this PR at a high level. Hit me up if you think it would be valuable for me to have context on this so we can do a walk though irl.
@@ -10,7 +10,7 @@ import ( | |||
|
|||
// Pass the obj as the go struct object | |||
func NewParquetWriterFromProto(pFile source.ParquetFile, obj interface{}, np int64) (*ParquetWriter, error) { | |||
schemaHandler, err := schema.NewSchemaHandlerFromProtoStruct(obj) | |||
schemaHandler, err := schema.NewSchemaHandlerFromStruct(obj, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean to swap this out from NewSchemaHandlerFromProtoStruct
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I want to just keep one code path since original implementation is actually incomplete.
case reflect.Array, reflect.Slice: | ||
tags, err = getListTag(typ) | ||
case reflect.Map: | ||
tags, err = getMapTag(typ) | ||
case reflect.Interface: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The crux of the PR is there correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
@ongkong Thanks for review. I think it's OK to just review it high level. There are lots of details about how parquet file organize data and encoding data. I will make sure the testing cover the use cases we need. |
Description
Support the interface type in schema generation. It only replies on the reflect.Type which doesn't have enough information for the interface type. So I added the reflect.Value into the schema generation
Fix the definition level issue which doesn't have correct repetitionType tag set in the schema
Centralize the pointer processing which is scattered in original solution. Instead of processing it everywhere, just append the pointer or interface into stack let the dedicated function to handle it
Int and uint is not distinguished in the original implementation where it treat all value as int. While when there is uint value it will panic.
Log Analyzer Framework
How this code was tested
unit test.
make test
I also updated the proto_write.go to include the interface and nested interface array.