rfarrow contains convenience functions for using the Arrow library in Go directly with structs. It uses reflection to convert Go structs to and from Arrow format.
This is a pre-release version and the API may change.
go get github.com/chelseajonesr/rfarrow
WriteGoStructsToParquet
will write a slice of Go structs to an io.Writer in Parquet format:
err := WriteGoStructsToParquet(data, writer, nil)
You can set the compression type:
props := parquet.NewWriterProperties(
parquet.WithCompression(compress.Codecs.Snappy),
)
err := WriteGoStructsToParquet(values, buf, props)
The Parquet schema is automatically generated from the struct type definition using functions from the Arrow library.
You can change the names and types of the Parquet columns by using the parquet
tag in the struct type - for more details see the Arrow package:
type Add struct {
Path string `parquet:"name=path, repetition=OPTIONAL, converted=UTF8"`
PartitionValues map[string]string `parquet:"name=partitionValues, repetition=OPTIONAL, keyconverted=UTF8, valueconverted=UTF8"`
TimeMilli int32 `parquet:"name=timemilli, converted=TIME_MILLIS"`
}
ReadGoStructsFromParquet
will read a Parquet file into a slice of Go struct pointers:
data, err := ReadGoStructsFromParquet[MyStructType](reader)
The reader
needs to support io.ReaderAt
and io.Seeker
.
You can use the parquet
tag in the struct definition to specify Parquet column names, logical types, etc. For more details see the Arrow package.
NewArrowTableFromGoStructs
will convert a slice of Go structs to an Arrow table:
tbl, err := NewArrowTableFromGoStructs(data)
if err != nil {
...
}
defer tbl.Release()
// Use tbl
More examples