feat: add a size limit on a record batch #2220
Comments
Since the unit of transmission is a flight data message, which may contain one or more record batches, the size limit should be much more conservative than the default maximum size set by Tonic. |
I'm interested in this feature. First, a question: is this feature meant to improve server security by limiting the gRPC request/response size? If so, I can see 2 ways to achieve this:
|
@MichaelScofield PTAL |
@TheWaWaR Thx for your interest. It's not about security. It's a configuration on request/response's size limit in Tonic that should be considered. I think the service level limitation is enough. |
By searching the codebase I find
|
@TheWaWaR You are right. |
@TheWaWaR thx, I've merged your PR. However, to close this issue, I think it's important to set a limit on RecordBatches, too. The RecordBatches are popped from DataFusion, so it requires additional efforts to investigate how to restrict their size. Are you willing to continue to do that? |
Yes, why not. So, I have some new questions: First, how to define the size of a Second, what's the intention to limit the size of a |
@TheWaWaR sorry for the late reply. The size of a RecordBatch is indeed hard to estimate. The best guess might be to sum up its vectors' As to the second question, our gRPC interface now does have a size limit. However, what if we feed a huge RecordBatch to the gRPC interface: will it segment the input internally to fit its size limit? A quick search suggests that Tonic does not have this type of feature. So I guess it would be a problem if the underlying query engine pops a RecordBatch that is larger than Tonic's size limit: the call would simply fail. That said, limiting the size of a RecordBatch (or segmenting it) is hard. I think a more proper place to do it is in |
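To make the "estimate, then segment" idea above concrete, here is a minimal sketch in plain Rust. It does not use the project's RecordBatch or any Arrow/DataFusion types: `Batch`, `estimated_size`, `slice` and `split_by_size` are hypothetical stand-ins, and the columns are assumed to hold fixed-width `i64` values so that the per-row size is easy to compute. Variable-width columns such as strings would need a different estimate.

```rust
/// Hypothetical stand-in for a record batch: equally long columns of
/// fixed-width values. Not the project's actual RecordBatch type.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub columns: Vec<Vec<i64>>,
}

impl Batch {
    /// Number of rows, taken from the first column (0 if there are no columns).
    pub fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }

    /// Estimated payload size: the sum of every column's value bytes.
    pub fn estimated_size(&self) -> usize {
        self.columns
            .iter()
            .map(|c| c.len() * std::mem::size_of::<i64>())
            .sum()
    }

    /// Copies `len` rows starting at `offset` into a new batch.
    pub fn slice(&self, offset: usize, len: usize) -> Batch {
        Batch {
            columns: self
                .columns
                .iter()
                .map(|c| c[offset..offset + len].to_vec())
                .collect(),
        }
    }
}

/// Splits `batch` row-wise so that each chunk's estimated size stays within
/// `max_bytes`. A chunk always holds at least one row, so a single row that
/// is larger than `max_bytes` still ends up alone in an oversized chunk.
pub fn split_by_size(batch: &Batch, max_bytes: usize) -> Vec<Batch> {
    let rows = batch.num_rows();
    if rows == 0 {
        return vec![batch.clone()];
    }
    let bytes_per_row = (batch.estimated_size() / rows).max(1);
    let rows_per_chunk = (max_bytes / bytes_per_row).max(1);

    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < rows {
        let len = rows_per_chunk.min(rows - offset);
        chunks.push(batch.slice(offset, len));
        offset += len;
    }
    chunks
}
```

For example, a batch with two columns of ten values each is estimated at 160 bytes (16 bytes per row), so `split_by_size(&batch, 64)` would produce chunks of 4, 4 and 2 rows. Splitting by rows keeps the chunks valid batches on their own, which is why it is used here instead of cutting the encoded bytes.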
What problem does the new feature solve?
Currently, there is no size limit set on a record batch, which has become problematic due to the constraints imposed by the Tonic crate on message sizes for both receiving and sending. Tonic employs a default size limit of 4MB for received messages, defined as the DEFAULT_MAX_RECV_MESSAGE_SIZE constant, and a default maximum send message size of 2^64 - 1 bytes, defined as the DEFAULT_MAX_SEND_MESSAGE_SIZE constant.

To address this issue, it's crucial to set a size limit on a record batch to prevent it from exceeding 4MB.
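As a rough illustration of the numbers involved, the sketch below mirrors the 4MB receive default as a local constant (it is not imported from Tonic) and checks a batch size against a more conservative budget. `HEADROOM_BYTES`, `batch_budget` and `check_batch_size` are hypothetical names, and the 64KB headroom is an arbitrary assumption standing in for whatever else travels in the same message.

```rust
/// Local mirror of the 4MB receive default quoted above; not imported from Tonic.
pub const RECV_LIMIT_BYTES: usize = 4 * 1024 * 1024;

/// Hypothetical headroom reserved for everything in a message besides the
/// record batch itself. The value is an assumption chosen for illustration.
pub const HEADROOM_BYTES: usize = 64 * 1024;

/// The largest record batch size this sketch would allow.
pub fn batch_budget() -> usize {
    RECV_LIMIT_BYTES - HEADROOM_BYTES
}

/// Returns an error message if `size` bytes would not fit in the budget.
pub fn check_batch_size(size: usize) -> Result<(), String> {
    let budget = batch_budget();
    if size <= budget {
        Ok(())
    } else {
        Err(format!(
            "record batch of {size} bytes exceeds the {budget}-byte budget"
        ))
    }
}
```

With these assumed values the budget is 4,128,768 bytes, so a batch of exactly 4MB would be rejected before it is handed to the gRPC layer.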
What does the feature do?
Introduce a size limit on a record batch.
Implementation challenges