PARQUET-787: Limit read allocation size #390
Conversation
@rdblue Could you detail how big, specifically, a humongous allocation is? My concern with this change is that it makes the APIs for decoding more complex. For example, to create a vectorized reader I want to have a full page in a single buffer.
@julienledem, could you have another look at this? I'd like to get it in for the next release. To address the concerns about extra complexity in the read path leading to poor performance, I've run this code through the benchmarks. First, the results without this change:
And with this PR applied:
The average time and throughput measured are very close, and differ in most cases in favor of these changes. The reason is that although this allows pages to be split across byte buffers, they are not actually split in the majority of cases. This ensures that, at most, one page will read from multiple buffers and most reads proceed without any additional if statements.

For your question on the humongous allocations: a humongous allocation is half of the G1GC region size or more. Here's a table of region sizes from total heap size. These allocations are automatically considered tenured, so they will sit around until a full GC. I tested with 8MB allocations, which are not humongous for applications with > 32G heaps, like Presto nodes. I think this should help in long-running database processes.
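The region-size table itself isn't reproduced above, but the heuristic is documented for HotSpot: by default G1 picks a region size so the heap divides into roughly 2048 regions, rounded to a power of two and clamped between 1 MB and 32 MB, and any single allocation of at least half a region is humongous. A rough sketch of that rule (an illustration of the heuristic, not JVM code):

```java
class G1Regions {
    static final long MB = 1024 * 1024;

    // Approximation of G1's default region sizing: aim for ~2048 regions,
    // round down to a power of two, clamp to [1 MB, 32 MB].
    static long regionSize(long heapBytes) {
        long target = heapBytes / 2048;
        long size = MB;
        while (size * 2 <= target && size < 32 * MB) {
            size *= 2;
        }
        return size;
    }

    // An allocation is "humongous" when it is at least half a region,
    // which is the case discussed in this thread.
    static boolean isHumongous(long allocationBytes, long heapBytes) {
        return allocationBytes >= regionSize(heapBytes) / 2;
    }
}
```

Under this rule an 8MB buffer is humongous at a 32G heap (16MB regions) but not at a 64G heap (32MB regions), matching the comment above.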
@danielcweeks, you may want to review this as well.
offset += length;
try {
  in.skipFully(length);
} catch (IOException e) {
Since we already moved to Java 7, you could use a single catch block with IOException | RuntimeException.
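As a sketch of the suggestion (the names here are illustrative, not from the patch), the two separate handlers collapse into one Java 7 multi-catch clause:

```java
import java.io.IOException;

class SkipExample {
    // Hypothetical stand-in for the stream being skipped.
    interface Skipper {
        void skip(long n) throws IOException;
    }

    // One multi-catch block handles both the checked IOException and any
    // RuntimeException, instead of two separate catch blocks.
    static boolean trySkip(Skipper in, long n) {
        try {
            in.skip(n);
            return true;
        } catch (IOException | RuntimeException e) {
            return false;
        }
    }
}
```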
Fixed.
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
* Licensed to the Apache Software Foundation (ASF) under one
Was this white-space change intentional?
No, I'll revert it.
Fixed.
ByteBuffer slice;
if (length > current.remaining()) {
  // a copy is needed to return a single buffer
This behaviour seems unintuitive to me. Sometimes the buffers will have shared content (which is probably what programmers expect based on Java's ByteBuffer.slice()), sometimes not. Sometimes the slice operation will be cheap (which is probably what programmers expect based on Java's ByteBuffer.slice()), sometimes not.

Even if the ByteBuffers are never meant to be modified, are we okay with this inconsistency? This could lead to some hard-to-reproduce bugs/slowdowns that only happen when requested slices cross buffer boundaries. I can't suggest a better solution either, but I thought this issue is worth mentioning nevertheless.
Copies are sometimes needed in order to get buffers back as ByteBuffer and not have one massive allocation. There's no way to get around needing to copy in some cases where a read crosses the boundary between buffers.

However, cross-buffer reads are hardly ever the case because the buffers inside these input streams are on the order of megabytes, while slices from this method are on the order of bytes. Callers that need large slices can also use sliceBuffers to avoid the problem. The benchmarks that I ran show that this is the case in practice: this buffer management is no slower than the existing code that depends on a huge allocation.
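The trade-off discussed here can be sketched with a simplified stand-in for the stream's slice path (names and structure are illustrative, not the actual MultiBufferInputStream code): a request that fits in the current buffer returns a cheap shared view, while one that crosses a buffer boundary is assembled by copying.

```java
import java.nio.ByteBuffer;
import java.util.List;

class SliceSketch {
    // Return `length` bytes starting at the position of buffers.get(index).
    // Cheap shared view when the bytes fit in one buffer; a copy otherwise.
    static ByteBuffer slice(List<ByteBuffer> buffers, int index, int length) {
        ByteBuffer current = buffers.get(index).duplicate();
        if (length <= current.remaining()) {
            current.limit(current.position() + length);
            return current.slice();            // shares the backing storage
        }
        // Crossing a buffer boundary: accumulate into a fresh allocation.
        ByteBuffer copy = ByteBuffer.allocate(length);
        while (copy.hasRemaining()) {
            if (!current.hasRemaining()) {
                current = buffers.get(++index).duplicate();
            }
            if (current.remaining() > copy.remaining()) {
                current.limit(current.position() + copy.remaining());
            }
            copy.put(current);
        }
        copy.flip();
        return copy;                            // independent storage
    }
}
```

This is the inconsistency raised above: the returned ByteBuffer sometimes aliases the source and sometimes does not, but the copy branch only runs when a request straddles two of the (megabyte-sized) backing buffers.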
Another point: it's far more expensive to copy, which happens fairly often in the current read path because everything is based on byte arrays and not byte buffers. I think the copy here isn't significant enough to worry about in comparison.
int bytesAccumulated = 0;
while (bytesAccumulated < len) {
  if (current.remaining() > 0) {
    // get a slice of the current buffer to return
It seems to me that we could avoid slicing ByteBuffers that fit entirely in the requested range, but since it's a cheap operation, I'm not sure it is worth the effort of explicitly handling them.
We need to duplicate the buffer either way, so we'd only be able to avoid setting the position and limit, which doesn't seem worth it to me.
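For illustration, slicing via a duplicate looks like this (a hypothetical helper, not code from the patch): the duplicate is what protects the stream's own position and limit from the caller, and the slice itself only adds the position/limit bookkeeping being discussed.

```java
import java.nio.ByteBuffer;

class DuplicateVsSlice {
    // Even when a buffer covers the requested range entirely, handing it out
    // directly would let the caller move our position/limit, so a duplicate
    // is created either way; slicing only adds position/limit bookkeeping.
    static ByteBuffer asView(ByteBuffer buf, int offset, int length) {
        ByteBuffer view = buf.duplicate();   // new object, shared content
        view.position(offset);
        view.limit(offset + length);
        return view.slice();                 // zero-based view of the range
    }
}
```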
Most of my findings are minor but there are some that matter, in my opinion.
}

@Override
public synchronized void mark(int readlimit) {
This class is not thread safe. Using synchronized here is unnecessary and I think it is also misleading.
This is from InputStream and can't be changed.
I don't see why it cannot be changed. Whether or not the super method is synchronized, the modifier is not part of the signature. InputStream is implemented to be thread-safe, but this implementation clearly is not. I think using synchronized here suggests that this implementation is also thread-safe, which is misleading. In addition, it might have some performance penalty as well, but I don't think it is measurable.
I think you're right and they're not in the method signature. I've removed them and we'll see if the tests pass.
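A minimal sketch of the point about signatures: `synchronized` is not part of a method's signature, so a subclass may override InputStream's synchronized mark/reset without the modifier. The stream class below is hypothetical, not the patch's implementation.

```java
import java.io.InputStream;

// A deliberately non-thread-safe stream: mark/reset are overridden
// without `synchronized`, which the compiler accepts because the
// modifier is not part of the overridden methods' signatures.
class UnsyncStream extends InputStream {
    private final byte[] data;
    private int pos = 0;
    private int markPos = -1;

    UnsyncStream(byte[] data) { this.data = data; }

    @Override
    public int read() {
        return pos < data.length ? data[pos++] & 0xFF : -1;
    }

    @Override
    public void mark(int readlimit) { markPos = pos; }

    @Override
    public void reset() { pos = markPos < 0 ? 0 : markPos; }
}
```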
}

@Override
public synchronized void reset() throws IOException {
This class is not thread safe. Using synchronized here is unnecessary and I think it is also misleading.
}

@Override
public synchronized void reset() throws IOException {
This class is not thread safe. Using synchronized here is unnecessary and I think it is also misleading.
}
}

private synchronized void discardMark() {
This class is not thread safe. Using synchronized here is unnecessary and I think it is also misleading.
}

@Override
public synchronized void mark(int readlimit) {
This class is not thread safe. Using synchronized here is unnecessary and I think it is also misleading.
@Override
public int read(byte[] bytes, int off, int len) {
  if (len <= 0) {
We might swallow an error here in case len is negative. It might be worth checking and throwing an IndexOutOfBoundsException if required.
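A sketch of the suggested check, following the java.io.InputStream.read(byte[], int, int) contract (the helper name is made up for illustration): a negative length should raise IndexOutOfBoundsException rather than be silently treated as "read zero bytes".

```java
class ReadArgs {
    // Validate (off, len) against the array per the InputStream.read
    // contract; returns len when the arguments are acceptable.
    static int checkedLength(byte[] bytes, int off, int len) {
        if (off < 0 || len < 0 || len > bytes.length - off) {
            throw new IndexOutOfBoundsException(
                "off=" + off + ", len=" + len + ", array=" + bytes.length);
        }
        return len;
    }
}
```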
Fixed.
discardMark();
nextBuffer(); // go back to the marked buffers
} else {
  throw new IOException("No mark defined");
The error message might mislead the user. Reaching markLimit should also be mentioned.
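One way to cover both failure modes in the message could look like this (a simplified, hypothetical mark/reset tracker, not the patch's code): reset() fails either because mark() was never called or because the mark was discarded after the read limit was exceeded, and the message names both.

```java
import java.io.IOException;

class MarkSketch {
    private long mark = -1;
    private long position = 0;
    private long markLimit = 0;

    void mark(int readlimit) {
        this.mark = position;
        this.markLimit = readlimit;
    }

    void advance(long n) {
        position += n;
        if (mark >= 0 && position - mark > markLimit) {
            mark = -1;  // mark invalidated once the read limit is exceeded
        }
    }

    // The message covers both failure modes: no mark set, or the mark
    // discarded because more than readlimit bytes were read since mark().
    long reset() throws IOException {
        if (mark < 0) {
            throw new IOException("No mark defined or mark limit exceeded");
        }
        position = mark;
        return position;
    }
}
```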
Added.
throw new NoSuchElementException();
}

if (useFirst) {
An Iterator shall not rely on hasNext() being invoked every time before next(). As a private implementation it is good enough, however it might be worth a comment.
Will do, good catch.
I updated the logic to conform to the contract.
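The Iterator contract point can be illustrated with a small iterator whose next() does not depend on hasNext() having been called first (a standalone example, not the patch's iterator): next() re-checks the state itself and throws NoSuchElementException when exhausted.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Counts down from n to 1. Valid even for callers that never invoke
// hasNext(), because next() enforces the exhaustion check itself.
class CountdownIterator implements Iterator<Integer> {
    private int remaining;

    CountdownIterator(int n) { remaining = n; }

    @Override
    public boolean hasNext() {
        return remaining > 0;
    }

    @Override
    public Integer next() {
        if (remaining <= 0) {
            throw new NoSuchElementException();
        }
        return remaining--;
    }
}
```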
*/
public static BytesInput from(ByteBuffer buffer, int offset, int length) {
Is it correct to simply remove this public implementation instead of deprecating first?
BytesInput is internal, so I think it is fine.
I agree if it is internal, but I am always confused about which classes are public and which are for internal use only. It would be great if someone who has the required knowledge (not pointing at you...) would mark these classes.
*
* @throws IOException
*/
public abstract void initFromPage(int valueCount, ByteBuffer page, int offset) throws IOException;
Don't we want to deprecate these methods instead of removing?
It's almost always better to remove methods, as long as they aren't part of the public API.
Agreed.
This renames the existing implementation to SingleBufferInputStream, moves the new implementation to MultiBufferInputStream, and adds an interface that both implement to access slices of the backing arrays.
Thanks a lot for the changes.
LGTM.
LGTM, thanks!
@zivanfi, @gszadovszky thank you for taking the time to review! I'm going to merge this.
WIP: This updates the `ParquetFileReader` to use multiple buffers when reading a row group, instead of a single humongous allocation. As a consequence, many classes needed to be updated to accept a stream backed by multiple buffers, instead of using a single buffer directly. Assuming a single contiguous buffer would require too many copies.

Author: Ryan Blue <[email protected]>

Closes apache#390 from rdblue/PARQUET-787-limit-read-allocation-size and squashes the following commits:

4abba3e [Ryan Blue] PARQUET-787: Update byte buffer input streams for review comments.
e7c6c5d [Ryan Blue] PARQUET-787: Fix problems from Zoltan's review.
be52b59 [Ryan Blue] PARQUET-787: Update tests for both ByteBufferInputStreams.
b0b6147 [Ryan Blue] PARQUET-787: Update encodings to use ByteBufferInputStream.
a4fa05a [Ryan Blue] Refactor ByteBufferInputStream implementations.
56b22a6 [Ryan Blue] Make allocation size configurable.
103ed3d [Ryan Blue] Add tests for ByteBufferInputStream and fix bugs.
614a2bb [Ryan Blue] Limit allocation size to 8MB chunks for better garbage collection.