issue with dot "." in field name #349
Comments
hi, @pwmcintyre

package main

import (
	"log"

	"github.com/xitongsys/parquet-go-source/local"
	"github.com/xitongsys/parquet-go/parquet"
	"github.com/xitongsys/parquet-go/reader"
	"github.com/xitongsys/parquet-go/writer"
)

type Student struct {
	// name is the parquet field name; inname is the variable name
	Name string `parquet:"name=student.name, inname=name, type=BYTE_ARRAY, convertedtype=UTF8, encoding=PLAIN_DICTIONARY"`
	Age  int32  `parquet:"name=age, type=INT32, encoding=PLAIN"`
}

func main() {
	var err error
	fw, err := local.NewLocalFileWriter("output/flat.parquet")
	if err != nil {
		log.Println("Can't create local file", err)
		return
	}

	// write
	pw, err := writer.NewParquetWriter(fw, new(Student), 4)
	if err != nil {
		log.Println("Can't create parquet writer", err)
		return
	}
	pw.RowGroupSize = 128 * 1024 * 1024 // 128M
	pw.PageSize = 8 * 1024              // 8K
	pw.CompressionType = parquet.CompressionCodec_SNAPPY
	num := 10
	for i := 0; i < num; i++ {
		stu := Student{
			Name: "StudentName",
			Age:  int32(20 + i%5),
		}
		if err = pw.Write(stu); err != nil {
			log.Println("Write error", err)
		}
	}
	if err = pw.WriteStop(); err != nil {
		log.Println("WriteStop error", err)
		return
	}
	log.Println("Write Finished")
	fw.Close()

	// read
	fr, err := local.NewLocalFileReader("output/flat.parquet")
	if err != nil {
		log.Println("Can't open file", err)
		return
	}
	pr, err := reader.NewParquetReader(fr, new(Student), 4)
	if err != nil {
		log.Println("Can't create parquet reader", err)
		return
	}
	num = int(pr.GetNumRows())
	stus := make([]Student, num) // read all rows (10 here)
	if err = pr.Read(&stus); err != nil {
		log.Println("Read error", err)
	}
	log.Println(stus)
	pr.ReadStop()
	fr.Close()
}

running result:
@xitongsys, appreciate your time, thank you

i have reproduced your result above, but similar to my example earlier, when attempting to read this new parquet file with my existing systems (i'm using AWS Athena), i get an error similar to the below error from parquet-tools:

$ docker run -it --rm -v ${PWD}:/data nathanhowell/parquet-tools schema /data/output.parquet
org.apache.parquet.io.InvalidRecordException: student not found in message parquet_go_root {
  required binary student.name (STRING) = 0;
  required int32 age = 0;
}

similarly, using another Go implementation, i still cannot read this file:

$ parquet-tool schema output.parquet
panic: line 2: expected ;, got unknown start of token '46' instead

and so i suspect there may be an issue in the handling of the "." in the output file?
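One plausible explanation for errors like "student not found" is sketched below. It is an assumption, not something confirmed in this thread: if a tool flattens a column path into a single string with "." as the separator, a flat field named `student.name` becomes indistinguishable from a group `student` containing a field `name`. The helpers `joinDot` and `splitDot` are hypothetical names used only for illustration and are not part of parquet-go or parquet-tools.

```go
package main

import (
	"fmt"
	"strings"
)

// joinDot flattens a column path into one string using "." as the separator.
// Hypothetical helper: it only illustrates the convention, not any library's code.
func joinDot(path []string) string {
	return strings.Join(path, ".")
}

// splitDot recovers path segments from a dot-joined string.
// Hypothetical helper, the inverse of joinDot when no segment contains ".".
func splitDot(s string) []string {
	return strings.Split(s, ".")
}

func main() {
	// Flat schema: the root has one leaf whose name contains a dot.
	flat := []string{"parquet_go_root", "student.name"}
	// Nested schema: root -> group "student" -> leaf "name".
	nested := []string{"parquet_go_root", "student", "name"}

	// Both paths flatten to "parquet_go_root.student.name".
	fmt.Println(joinDot(flat) == joinDot(nested)) // prints true

	// Round-tripping the flat path yields three segments instead of two,
	// so a reader would look for a group called "student".
	fmt.Println(len(splitDot(joinDot(flat)))) // prints 3
}
```

Keeping the path as a slice of segments, or joining with a separator that cannot occur in a field name, avoids the ambiguity in this sketch.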
hi, @pwmcintyre |
@xitongsys, emailed. while not sensitive, we would prefer it not be shared publicly :)
hi @xitongsys ... did your post about the java implementation get deleted? did you find the answer?
hi, @pwmcintyre |
@xitongsys, thanks for the update, please let me know if there's anything I can help with
hi, @pwmcintyre |
@xitongsys, well done! thanks again. I can confirm AWS Athena is happy with this change 👌 (ignore the nulls, it's just a test)
ok, I will close this issue.
hi
I know the drama of using "." in field names has been briefly mentioned in other issues, but i'm hoping you can help
Using the Java parquet-tools to inspect the schema of an existing Parquet file i have, i can see it contains "." in the field names, but works fine:
and while using your tool i get the following:
I'm similarly having trouble writing files with "." in the key — eg with this struct:
I get the following error when attempting to read it:
any ideas?