[FEA] Split batches from parquet that are too large, and try to guess better before decompressing #4968
Labels
P0: Must have for release
reliability: Features to improve reliability or bugs that severely impact the reliability of the plugin
task: Work required that improves the product but is not user facing
Is your feature request related to a problem? Please describe.
You might consider this a bug or an enterprise feature; I am fine either way. Occasionally we see really crazy compression ratios on ORC and Parquet files. Right now we have a config that limits the batch size based on the size of the compressed input data, but if we hit a situation where the compression ratio is really good we can violate the batch size limit, by a lot, especially if we are reading all of the columns in the file. It would be great if we could split batches that end up too large, and make a better guess at the decompressed size before we decompress.
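To make the request concrete, here is a minimal sketch of the splitting half of the idea, assuming we can get a decompressed-size estimate up front (e.g. from Parquet column chunk metadata). All of the names here (`splitRowRanges`, `batchSizeLimitBytes`, etc.) are made up for illustration and are not from the plugin:

```scala
// A sketch only: derive a per-row byte estimate from a guessed total
// decompressed size, then split the row range into chunks whose
// estimated size stays under the configured batch size limit.
object BatchSplitSketch {
  /** Returns [start, end) row ranges, each estimated to fit the limit. */
  def splitRowRanges(totalRows: Long,
                     estimatedDecompressedBytes: Long,
                     batchSizeLimitBytes: Long): Seq[(Long, Long)] = {
    val bytesPerRow =
      math.max(1L, estimatedDecompressedBytes / math.max(1L, totalRows))
    val rowsPerBatch = math.max(1L, batchSizeLimitBytes / bytesPerRow)
    (0L until totalRows by rowsPerBatch).map { start =>
      (start, math.min(start + rowsPerBatch, totalRows))
    }
  }

  def main(args: Array[String]): Unit = {
    // Example: a 10M-row file estimated to decompress to ~8 GiB with a
    // 1 GiB limit is split into roughly eight batches.
    splitRowRanges(10000000L, 8L << 30, 1L << 30).foreach(println)
  }
}
```

The same per-row estimate could also be used before decompressing, to decide how many row groups to read at once, which would cover the "guess better" half of the title.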