The parquet format specification doesn't say whether a Parquet file having columns with the same name (in the same group node, so really exactly the same name) is valid. I.e., say I have a Parquet file with two columns. Both are called x. Is this file a valid Parquet file?
Gang Wu / @wgtmac:
I didn't find any statement to disallow identical field names in the parquet specs. For engines projecting columns on field ordinals or field ids, identical field names may not be a big issue. It is a good convention to avoid them.
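To illustrate the point about ordinals versus names, here is a minimal sketch in plain Python. It does not use any Parquet library; the `columns` list and the helper functions are hypothetical stand-ins for a flat schema in which two sibling columns are both called `x`. It only shows why ordinal-based projection stays unambiguous while name-based projection does not.

```python
from collections import Counter

# Hypothetical flat schema: (name, values) pairs, two siblings share the name "x".
columns = [("x", [1, 2, 3]), ("x", [10, 20, 30]), ("y", [7, 8, 9])]


def duplicate_names(cols):
    """Return the sorted column names that occur more than once."""
    counts = Counter(name for name, _ in cols)
    return sorted(name for name, count in counts.items() if count > 1)


def project_by_ordinal(cols, ordinal):
    """Position-based projection: always resolves to exactly one column."""
    return cols[ordinal][1]


def project_by_name(cols, name):
    """Name-based projection: refuses to guess when the name is not unique."""
    matches = [values for col_name, values in cols if col_name == name]
    if len(matches) != 1:
        raise KeyError(f"column name {name!r} matches {len(matches)} columns")
    return matches[0]
```

Here `project_by_ordinal(columns, 1)` picks the second `x` without trouble, whereas `project_by_name(columns, "x")` raises. Raising is just one possible policy for this sketch; a real reader might instead pick the first match or rename the duplicates.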
Micah Kornfield / @emkornfield:
I've at least seen files in the wild with two columns whose names differ only by case. I agree we should recommend against them, since it's not clear they will be handled well. If we do update the docs, we should also recommend against naming columns using "." as a delimiter, as this can also lead to ambiguity.
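Both kinds of ambiguity can be sketched in plain Python. The nested `schema` below and the helpers `leaf_paths` and `collisions` are hypothetical and are not part of any Parquet API. The sketch assumes a tool that renders column paths by joining the path segments with "." and, separately, a tool that compares names case-insensitively.

```python
# Hypothetical nested schema as (name, children) pairs; children is None for a leaf.
schema = [
    ("a", [("b", None)]),  # group "a" containing leaf "b"
    ("a.b", None),         # top-level leaf whose name literally contains a dot
    ("X", None),
    ("x", None),           # differs from "X" only by case
]


def leaf_paths(fields, prefix=()):
    """Yield each leaf column as a tuple of path segments."""
    for name, children in fields:
        path = prefix + (name,)
        if children is None:
            yield path
        else:
            yield from leaf_paths(children, path)


def collisions(paths, key):
    """Group paths by key(path) and keep only keys shared by several paths."""
    groups = {}
    for path in paths:
        groups.setdefault(key(path), []).append(path)
    return {k: v for k, v in groups.items() if len(v) > 1}


paths = list(leaf_paths(schema))
# Joining segments with "." makes ("a", "b") and ("a.b",) indistinguishable.
dotted = collisions(paths, key=lambda p: ".".join(p))
# Case folding makes ("X",) and ("x",) indistinguishable.
folded = collisions(paths, key=lambda p: tuple(part.lower() for part in p))
```

Keeping paths as tuples of segments avoids the first collision entirely; the second only goes away if names are compared case-sensitively or writers avoid such names in the first place.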
Reporter: Jan Finis / @JFinis
Note: This issue was originally created as PARQUET-2345. Please see the migration documentation for further details.