Start a 'best practices' document #223

Open
cholmes opened this issue May 29, 2024 · 0 comments

cholmes commented May 29, 2024

Seems like it might be time to start a 'best practices' document for topics that are outside the spec but would be good for people to know about.

Remembered this when reading #79.

Potential ideas to include:

  • What compression to use (zstd, snappy, brotli, etc.). Talk through how not all parquet implementations support all compression codecs, and also how to think about the compression time vs. file size tradeoff. Perhaps some discussion of what works best for geospatial / common geo use cases.
  • Discussion of spatial ordering - e.g. explaining how the bbox column works best when you've used an R-tree or something else to sort your data, pointing at what different implementations do, etc. It makes sense to keep the spec barebones and flexible, but it would be nice to provide more explanation and guidance for those who are making datasets.
  • Partitioning - we need to figure out the _metadata files in #79 ("How should metadata be written in a partitioned dataset?"), and a best practices doc likely makes sense for that. But also a more general discussion of when to split up parquet files, and things to consider when splitting them up: admin boundaries vs bbox vs ...
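To make the spatial ordering point concrete, here is a minimal, stdlib-only sketch of why sorting helps bbox-based filtering. It uses a Hilbert curve sort, which is a swapped-in technique (one common alternative to R-tree-based ordering), not necessarily what any particular implementation does. Assumptions, all hypothetical and for illustration only: each feature is reduced to its bbox centroid in lon/lat, a "row group" is just a fixed-size run of consecutive rows, no parquet library is involved, and the function names are made up. The idea is that a reader can only skip a row group if that group's combined bbox misses the query area, so smaller per-group bboxes mean more skipping.

```python
import random


def hilbert_index(order, x, y):
    """Distance along a Hilbert curve for integer cell (x, y) on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def to_cell(value, lo, hi, order):
    """Quantize a coordinate in [lo, hi] to an integer cell index in [0, 2**order)."""
    n = 1 << order
    cell = int((value - lo) / (hi - lo) * n)
    return min(max(cell, 0), n - 1)


def hilbert_sort(points, bounds, order=16):
    """Sort (x, y) points by the Hilbert index of their grid cell within bounds."""
    xmin, ymin, xmax, ymax = bounds
    return sorted(
        points,
        key=lambda p: hilbert_index(
            order,
            to_cell(p[0], xmin, xmax, order),
            to_cell(p[1], ymin, ymax, order),
        ),
    )


def row_group_bboxes(points, group_size):
    """(xmin, ymin, xmax, ymax) for each run of group_size consecutive rows."""
    boxes = []
    for i in range(0, len(points), group_size):
        chunk = points[i:i + group_size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def mean_area(boxes):
    """Average bbox area; smaller means row groups cover more compact regions."""
    return sum((b[2] - b[0]) * (b[3] - b[1]) for b in boxes) / len(boxes)


if __name__ == "__main__":
    rng = random.Random(42)
    world = (-180.0, -90.0, 180.0, 90.0)
    points = [(rng.uniform(-180, 180), rng.uniform(-90, 90)) for _ in range(10_000)]
    unsorted_area = mean_area(row_group_bboxes(points, 500))
    sorted_area = mean_area(row_group_bboxes(hilbert_sort(points, world), 500))
    print(f"mean row-group bbox area, unsorted:       {unsorted_area:.1f} deg^2")
    print(f"mean row-group bbox area, Hilbert-sorted: {sorted_area:.1f} deg^2")
```

With randomly ordered rows, nearly every row group spans most of the world, so a bbox filter can skip almost nothing; after sorting, each group covers a small region. A best practices doc could walk through the same comparison with real writers and real row group statistics.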

The filename extension recommendation (#212) arguably would fit in a best practices doc (though I think having it in the spec is fine).

Other suggestions here are welcome. I'm not the expert on these, but happy to take a crack at drafting something that others could improve.
