
Bigtable Python Client Max Row size is 4mb #2880

Closed
brandon-white opened this issue Dec 17, 2016 · 39 comments
Labels
api: bigtable Issues related to the Bigtable API.
Comments

@brandon-white

brandon-white commented Dec 17, 2016

The Bigtable documentation says the max cell size is 100 MB. However, when I try to read a row containing a 10 MB cell using the Bigtable Python client, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/happybase/table.py", line 190, in row
    row, filter_=filter_)
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigtable/table.py", line 234, in read_row
    rows_data.consume_all()
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigtable/row_data.py", line 323, in consume_all
    self.consume_next()
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigtable/row_data.py", line 261, in consume_next
    response = six.next(self._response_iterator)
  File "/usr/local/lib/python2.7/dist-packages/grpc/_channel.py", line 344, in next
    return self._next()
  File "/usr/local/lib/python2.7/dist-packages/grpc/_channel.py", line 335, in _next
    raise self
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.INTERNAL, Max message size exceeded)>

This max size seems to be hard-coded in the grpc library. Has anybody been able to read large rows using the Bigtable Python client? Any ideas for workarounds, or for how I can set the max size?

@daspecster daspecster added the api: bigtable Issues related to the Bigtable API. label Dec 17, 2016
@dhermes
Contributor

dhermes commented Dec 19, 2016

@nathanielmanistaatgoogle WDYT?

@nathanielmanistaatgoogle

A few gRPC users have had this kind of problem and we've started a discussion in grpc/grpc.github.io issue 371.

For now you can cross your fingers and see what happens when you pass a channel options value like options=(('grpc.max_message_length', 100 * 1024 * 1024),) to grpc.insecure_channel or grpc.secure_channel (know that grpc.max_message_length is broken up into grpc.max_send_message_length and grpc.max_receive_message_length in gRPC Python 1.1-and-later). It might work.

If it doesn't work, then you've hit some limit inside gRPC that isn't overridable and you'll have to break up the large message into a stream of smaller ones.
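
For reference, the channel options being suggested can be sketched like this (a hedged sketch: the option keys are real gRPC channel arguments, but the endpoint and the idea of passing all three at once are illustrative, not taken from this thread):

```python
# Channel options for raising gRPC's message-size limit.
# 'grpc.max_message_length' is the pre-1.1 name; later gRPC Python
# releases split it into separate send/receive variants.
MAX_MESSAGE_BYTES = 100 * 1024 * 1024

options = (
    ('grpc.max_message_length', MAX_MESSAGE_BYTES),          # gRPC Python < 1.1
    ('grpc.max_send_message_length', MAX_MESSAGE_BYTES),     # gRPC Python >= 1.1
    ('grpc.max_receive_message_length', MAX_MESSAGE_BYTES),  # gRPC Python >= 1.1
)

# With grpcio installed, these would be passed at channel creation, e.g.:
# import grpc
# channel = grpc.insecure_channel('bigtable.googleapis.com:443',
#                                 options=options)
```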

@dhermes
Contributor

dhermes commented Dec 19, 2016

@nathanielmanistaatgoogle The "you" in "when you pass a channel options value" here is the library maintainers, yes? As a user, @brandon-white isn't directly creating any gRPC object, he is dealing with our Bigtable classes (i.e. the nouns) and calling the API via their instance methods (i.e. the verbs).

@brandon-white
Author

brandon-white commented Dec 19, 2016

Thanks @nathanielmanistaatgoogle! I'll try this, but ideally I would prefer a solution where I do not edit the Bigtable Python client.

For the Google Bigtable Client folks, have you encountered any use cases where the rows and cells are large? If so, how do you break up these large rows and cells into streams? Are large rows and cells supported through the Python Client?

@dhermes
Contributor

dhermes commented Dec 19, 2016

@mbrukman Have you run into this in other clients?

@mbrukman
Contributor

While there's a per-response limit in gRPC, it is possible to retrieve larger cell values by streaming responses and reconstructing them client-side. ReadRowsResponse includes a list of CellChunks, the pieces of cells that can be reassembled into the larger values.

/cc: @sduskis, @garye
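
To illustrate the reassembly idea, here is a hypothetical sketch (real CellChunk messages carry row keys, qualifiers, timestamps, and value-size/commit metadata, not bare bytes as assumed below):

```python
# Hypothetical sketch of reassembling one large cell value from the
# payloads of streamed CellChunks. 'chunk_payloads' stands in for the
# value bytes extracted from each chunk of a ReadRowsResponse stream.
def reassemble_cell(chunk_payloads):
    """Concatenate chunk payloads into the complete cell value."""
    return b''.join(chunk_payloads)

chunk_payloads = [b'first-piece/', b'second-piece/', b'third-piece']
cell_value = reassemble_cell(chunk_payloads)
# cell_value == b'first-piece/second-piece/third-piece'
```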

@brandon-white
Author

Thank you @mbrukman! I don't see these calls in the Bigtable Python client. Does this mean I need to write my own client if I want to consume these large cell values? Are there any plans to incorporate this into existing clients?

@sduskis
Contributor

sduskis commented Dec 19, 2016

The Python client already deals with ReadRowsResponse under the covers. This seems like something we need to change in the service.

@sduskis
Contributor

sduskis commented Dec 19, 2016

I'll take that back. This seems like a client setting. In the Java client, we set the value on the Netty channel (maxMessageSize) to 256 MB. Is that a setting the Python client controls?

@brandon-white
Author

@sduskis I looked for that but didn't see any way to set it in the Python client. So it seems we need to add this max_message_size config to the Bigtable Python client?

@sduskis
Contributor

sduskis commented Dec 19, 2016

@dhermes, do you have any idea where grpc settings are set, and whether max_message_size is a settable property?

@dhermes
Contributor

dhermes commented Dec 19, 2016

I do not, though @nathanielmanistaatgoogle would be a good person to ask. I'm not 100% clear on whether "grpc settings" are a global thing or a per-call setting. The options= described above is how we'd set gRPC metadata on a per-call basis.

@gamorris

It looks like the channel options get set in core/google/cloud/_helpers.py:489. I don't see a way at the moment to pass options specifically for Bigtable.

I think grpc.max_receive_message_length would be the option to set.

We might ultimately want to set the max send message length for Bigtable also. It looks like it is currently unlimited, but there is a TODO there to change that.
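
A rough sketch of the wiring being discussed (hypothetical: the helper name and signature below are illustrative, loosely modeled on the stub-construction helper in core/google/cloud/_helpers.py, not the actual code):

```python
# Illustrative sketch: a stub-construction helper that forwards channel
# options so callers can raise the receive limit. The default mirrors
# the 100 MB value discussed in this thread.
DEFAULT_OPTIONS = (('grpc.max_receive_message_length', 100 * 1024 * 1024),)

def make_stub_with_options(stub_factory, target, options=DEFAULT_OPTIONS):
    # In the real helper a gRPC channel would be created with these
    # options and handed to the generated stub class; this stand-in
    # just records the wiring for illustration.
    return {'target': target, 'options': options, 'stub_factory': stub_factory}

stub = make_stub_with_options(object, 'bigtable.googleapis.com:443')
```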

@brandon-white
Author

So do I need to make the PR to this or do any of the committers have bandwidth to take a look at this setting? What is the process for getting fixes like this out?

@daspecster
Contributor

Hello @brandon-white!
I think I have a patch for this, do you have any sample code that I can try with my branch?

@brandon-white
Author

@daspecster Thank you! I do not really have any custom code; I simply use the Bigtable Python API to query large cells.

from google.cloud import bigtable
client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(instance_id)
table = instance.table(table_id)
table.read_row(row_key)

@daspecster
Contributor

daspecster commented Dec 22, 2016

It appears that passing the ('grpc.max_receive_message_length', 100 * 1024 * 1024) in options doesn't work. :(

@mbrukman it looks to me like there is some handling of chunks on the client side, but in this case the chunks don't come back yet.

row_data = table.read_rows('large-row')
print(row_data.consume_next())
Traceback (most recent call last):
  File "/Documents/test_bigtable.py", line 14, in <module>
    print(row_data.consume_next())
  File "/Documents/test/.tox/py27/lib/python2.7/site-packages/google_cloud_bigtable-0.22.0-py2.7.egg/google/cloud/bigtable/row_data.py", line 261, in consume_next
    response = six.next(self._response_iterator)
  File "python_build/bdist.macosx-10.12-intel/egg/grpc/_channel.py", line 344, in next
  File "python_build/bdist.macosx-10.12-intel/egg/grpc/_channel.py", line 335, in _next
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.INTERNAL, Max message size exceeded)>

What I'm missing is how to get a ReadRowsRequest to return ReadRowsResponses whose chunks are 4 MB or less.

@sduskis
Contributor

sduskis commented Dec 22, 2016

ReadRowsResponses are currently not chunked on the server side, so there's no way to force that to happen. That's an enhancement we wanted to allow via the API, but haven't implemented yet.

I think we need some help from someone on the grpc team to solve this.

@daspecster
Contributor

I wasn't sure based on the previous conversation. Thanks for clearing that up for me @sduskis!

Let me know if there's anything I can do to help. I'm not entirely sure who to ask about this?

@sduskis
Contributor

sduskis commented Dec 22, 2016

I'll ping some folks who might know how to help.

@brandon-white
Author

Thanks all! So for now, it looks like this is a feature request which needs an owner?

@sduskis
Contributor

sduskis commented Dec 23, 2016

There's an answer in this thread to the question by the gRPC Python lead @nathanielmanistaatgoogle. @gamorris's answer points to the code that needs to be changed.

@dhermes, are you the right person to change this?

@nathanielmanistaatgoogle

@dhermes: the "you" in "when you pass a channel options value" is "the caller of the channel construction functions grpc.insecure_channel and grpc.secure_channel".

@daspecster
Contributor

daspecster commented Dec 23, 2016

Just for clarification: I tried adding the ('grpc.max_receive_message_length', 100 * 1024 * 1024) header in make_secure_stub and that didn't seem to have any effect.

My rough addition for testing is here and here.

@atdt

atdt commented Dec 28, 2016

If no one is actively working on a patch, would anyone mind if I took a stab at it?

@nathanielmanistaatgoogle

@dhermes
Contributor

dhermes commented Dec 28, 2016

@atdt By all means, take a stab. @daspecster is digging around but I'm not sure how actively.

@daspecster
Contributor

Thanks @nathanielmanistaatgoogle! I'll give that a try right now.
I think I just grabbed the wrong one when switching tabs and ran with it.

@daspecster
Contributor

@nathanielmanistaatgoogle @dhermes @atdt, using the correct option header seems to have worked.

I'll update the tests for this and make a PR.

Note: I'll add a system test so that when grpc gets updated we won't forget (at least not for long) to switch it to grpc.max_receive_message_length.

@nathanielmanistaatgoogle

Do unrecognized options get ignored? I wonder if you might be able to simply set both today and leave a note to remove the old one in the future.

@daspecster
Contributor

It appears they are ignored.
I just tried passing both at the same time and it still seemed to work.

@atdt

atdt commented Dec 28, 2016

@dhermes, seems @daspecster is on it -- I'll find another issue :)

@brandon-white
Author

Thank you very much @dhermes and @daspecster !! Once this is merged, can you please let me know how I can use it through pip?

@daspecster
Contributor

@brandon-white it will be in the next release, but I'm not entirely sure when that will be.

Until then, you could point pip to this commit hash, unless @dhermes or @tseaver have other ideas?

@brandon-white
Author

@daspecster Thanks! The changes work but I cannot pull them with pip install git+git://github.com/GoogleCloudPlatform/google-cloud-python.git@60f1ada4a2a04c09f67790c3d1d929f8d18f30f8. Any ideas on how I can get these changes or when the next release is? @dhermes @tseaver

@dhermes
Contributor

dhermes commented Jan 8, 2017

@brandon-white that's because the root setup.py file is just a shell that points at each subpackage; pip then installs those from PyPI. What you really want to do is clone the repo and then pip install ${GIT_REPO}/bigtable/ (which will give you the google-cloud-bigtable package). pip may allow you to point to subdirectories within a GitHub repo, but I think that is only allowed for the local git protocol supported by pip.

@brandon-white
Author

brandon-white commented Jan 26, 2017

@dhermes Thanks for your help here! Do you or anybody else have any idea when the next Bigtable client release might be?

@dhermes
Contributor

dhermes commented Jan 26, 2017

I may cut a release this week. However, I'm happy to help you get a local install working from source; feel free to ping me on Hangouts (email on my GitHub profile).

@brandon-white
Author

@dhermes Appreciate it Danny! I am willing to wait 1-2 weeks for the official release on pip. Thanks for your help!


8 participants