Fix memory usage in plan_to_object_store #71

Merged
merged 11 commits into from
Aug 30, 2022

mildbyte
Contributor

Get `CREATE EXTERNAL TABLE` + `CREATE TABLE AS` to fit under the 256MB memory limit of Fly.io's free tier (used in the tutorial):

  • Limit the max row group size when writing Parquet files. This deals with RAM usage climbing up to 1G while buffering 1M rows, though it does change the written Parquet files, since we use smaller row groups (I'm not sure of the implications): c865d20
  • Avoid loading the partition to memory at all when "uploading" it to the local FS object store:
    • move the file from the temporary directory to the object store bind mount
    • stream the Parquet file from disk when hashing it (+ implement a hasher that uses tokio async IO)
    • stream the Parquet file from disk when getting file statistics (point the fake object store to the tmp dir)
  • Disable mimalloc, since it OOMs during the `CREATE TABLE AS` (I didn't investigate memory usage with mimalloc in depth, since it doesn't seem to be profileable by standard tools)

Before (measured with bytehound): the 1G peaks are us buffering each partition before writing it out as Parquet; the final 200M plateau is each partition getting loaded for hashing/indexing at the end:

[image: heap usage graph, before]

After: heap usage consistently below 80M (as we buffer each row group), no plateau at the end:

[image: heap usage graph, after]

The Parquet writer keeps a whole row group (~1M rows by default) buffered in
memory before writing it out to the output stream. Limit the group size to
65536 rows to mitigate this.
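
To illustrate why a smaller row group bounds peak memory, here is a minimal std-only sketch. `RowGroupBuffer` is a hypothetical stand-in for a writer that buffers rows until a group is full; it is not the Parquet crate's API and not the code from this PR, and it only counts rows rather than encoding them.

```rust
/// Hypothetical stand-in for a writer that buffers one row group at a time.
/// Assumption: rows are plain u64 values; a real writer would encode columns.
struct RowGroupBuffer {
    max_rows: usize,
    buffered: Vec<u64>,
    /// Sizes of the groups flushed so far (what would be written to disk).
    flushed_groups: Vec<usize>,
    /// Largest number of rows ever held in memory at once.
    peak_buffered: usize,
}

impl RowGroupBuffer {
    fn new(max_rows: usize) -> Self {
        assert!(max_rows > 0, "row group size must be positive");
        Self {
            max_rows,
            buffered: Vec::new(),
            flushed_groups: Vec::new(),
            peak_buffered: 0,
        }
    }

    /// Buffer one row and flush as soon as the group is full.
    fn write_row(&mut self, row: u64) {
        self.buffered.push(row);
        self.peak_buffered = self.peak_buffered.max(self.buffered.len());
        if self.buffered.len() >= self.max_rows {
            self.flush();
        }
    }

    /// "Write out" the current group and release its memory.
    fn flush(&mut self) {
        if !self.buffered.is_empty() {
            self.flushed_groups.push(self.buffered.len());
            self.buffered.clear();
        }
    }
}
```

With `RowGroupBuffer::new(65_536)` and 200,000 rows written, the buffer never holds more than 65,536 rows at once, at the cost of producing more (smaller) groups in the output.
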
This is less secure (paths can go away or get recycled), but we need this in
order to be able to move the temporary partition file into the local object
store (if we're using a local FS store).
If the backing object store is local, we support a "fast upload" which is just
moving the file to the new filesystem (we could be writing to the actual object
store FS directly, but then we wouldn't have the temp file deletion niceties).
If we're dealing with the local FS, move the temporary file there directly
instead of reading it (save memory). We're still going to be reading the
partition file in order to get its stats/hash, but this is step 1.
Make a dummy local FS store pointing to the directory with the temporary
Parquet file, so that we don't have to load the whole partition in memory to get
its stats.
Use a streaming hasher + make it work with Tokio so that it doesn't block the
rest of the app.

If we're not using a local FS object store, this will still result in a read,
but in other cases we get to stream the partition around and consume the minimum
amount of RAM.
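
A rough sketch of the streaming-hash idea, under loud assumptions: it uses blocking `std` IO rather than the Tokio async IO this PR uses, and a simple FNV-1a hash as a stand-in for whatever hash the project actually computes. `StreamingHasher` and `hash_file_streaming` are hypothetical names. The point is only that the file is consumed in fixed-size chunks, so memory use does not grow with the partition size.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// FNV-1a 64-bit parameters. This hash is a stand-in for illustration only.
const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// Hypothetical incremental hasher: feed it bytes in any chunking you like.
struct StreamingHasher {
    state: u64,
}

impl StreamingHasher {
    fn new() -> Self {
        Self { state: FNV_OFFSET }
    }

    /// Mix a chunk of bytes into the running state.
    fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.state ^= u64::from(b);
            self.state = self.state.wrapping_mul(FNV_PRIME);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Hash a file while holding at most `chunk_size` bytes of it in memory.
fn hash_file_streaming(path: &Path, chunk_size: usize) -> io::Result<u64> {
    let mut file = File::open(path)?;
    // Guard against a zero-sized buffer, which would read nothing.
    let mut buf = vec![0u8; chunk_size.max(1)];
    let mut hasher = StreamingHasher::new();
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // end of file
        }
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finish())
}
```

Because the hasher state is updated byte by byte, hashing the file in 4 KiB chunks gives the same result as hashing the whole contents in one call.
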
(try to rename; if that fails, copy and delete the original)
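
The rename-then-fallback approach could look roughly like the following sketch. `move_file` is a hypothetical helper name, not necessarily what the PR's code is called, and this version uses blocking `std::fs` calls. A rename can fail when the temporary directory and the object store directory are on different filesystems, which is the case the copy-and-delete path covers.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: move `from` to `to` without reading the file into
/// memory ourselves.
fn move_file(from: &Path, to: &Path) -> io::Result<()> {
    match fs::rename(from, to) {
        // Same filesystem: the rename is all we need.
        Ok(()) => Ok(()),
        // Rename failed (for example, across filesystems): fall back to
        // copying the contents and then deleting the original.
        Err(_) => {
            fs::copy(from, to)?;
            fs::remove_file(from)
        }
    }
}
```

If both the rename and the copy fail (for example, because the source is missing), the error from the copy is what the caller sees.
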
On Fly.io's free tier (with the 256MB limit), mimalloc causes OOM errors when
doing `CREATE EXTERNAL TABLE` + `CREATE TABLE AS`, so disabling it for now.

This reverts commit 4b89565.