Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Support interface value converted to parquet #4

Merged
merged 5 commits into from
Feb 14, 2024

Conversation

wenrui-meng-ai
Copy link

@wenrui-meng-ai wenrui-meng-ai commented Feb 12, 2024

Description

  • Support the interface type in schema generation. It only replies on the reflect.Type which doesn't have enough information for the interface type. So I added the reflect.Value into the schema generation

  • Fix the definition level issue which doesn't have correct repetitionType tag set in the schema

  • Centralize the pointer processing which is scattered in original solution. Instead of processing it everywhere, just append the pointer or interface into stack let the dedicated function to handle it

  • Int and uint is not distinguished in the original implementation where it treat all value as int. While when there is uint value it will panic.

Log Analyzer Framework

How this code was tested

unit test.
make test

I also updated the proto_write.go to include the interface and nested interface array.

cd example
go build proto_write.go
./proto_write
parquet-tools show output/proto_message.parquet

image

@ongkong ongkong self-requested a review February 13, 2024 05:07
Copy link

@ongkong ongkong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock. Honestly only understand this PR at a high level. Hit me up if you think it would be valuable for me to have context on this so we can do a walk though irl.

@@ -10,7 +10,7 @@ import (

// Pass the obj as the go struct object
func NewParquetWriterFromProto(pFile source.ParquetFile, obj interface{}, np int64) (*ParquetWriter, error) {
schemaHandler, err := schema.NewSchemaHandlerFromProtoStruct(obj)
schemaHandler, err := schema.NewSchemaHandlerFromStruct(obj, false)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to swap this out from NewSchemaHandlerFromProtoStruct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I want to just keep one code path since original implementation is actually incomplete.

case reflect.Array, reflect.Slice:
tags, err = getListTag(typ)
case reflect.Map:
tags, err = getMapTag(typ)
case reflect.Interface:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The crux of the PR is there correct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

schema/schemahandler.go Show resolved Hide resolved
@wenrui-meng-ai
Copy link
Author

Approving to unblock. Honestly only understand this PR at a high level. Hit me up if you think it would be valuable for me to have context on this so we can do a walk though irl.

@ongkong Thanks for review. I think it's OK to just review it high level. There are lots of details about how parquet file organize data and encoding data. I will make sure the testing cover the use cases we need.

@wenrui-meng-ai wenrui-meng-ai merged commit 0e10aa9 into master Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants