-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python 3 support #40
Comments
Thift seems to be Python 3 compatible nowadyas. See also #78. |
(removed comment, now part of issue description) |
Thrift is getting there: https://issues.apache.org/jira/browse/THRIFT-1857. I'm highly interested in this patch. Please let me know if there's anything I can do to help. For now, I'll just fork this, apply #78 and regenerate the bindings with a nightly from Thrift. |
Ugh, I didn't actually want to link these two issues, but I just wanted to indicate where I moved my comments... Anyway, so I used Thrift 1.0.0-dev to gain Python 3 support and one of the tests fails:
Which simplifies to:
This is due to a new line in TBinaryProtocol that wasn't there before: It seems that the 'utf-8' encoding parameter in TBinaryProtocol causes the problems, which is new in 1.0.0-dev: ## Thrift 0.9.2: lib/py/src/protocol/TBinaryProtocol.py ##
def writeString(self, str):
self.writeI32(len(str))
self.trans.write(str)
## Thrift 1.0.0-dev: lib/py/src/protocol/TBinaryProtocol.py ##
def writeString(self, str):
encoded = bytearray(str, 'utf-8')
self.writeI32(len(encoded))
self.trans.write(encoded) The integer probably should be send as a TType.I64. Does that mean a new message type, because it can't be send as part of a mutateRows_args anymore? Or is there a way to enforce a different datatype with mutations? edit: No, in ttypes.Mutation.thrift_spec a value is defined as TType.STRING Furthermore, the tearDown fails:
The tear down failure also caused an NPE at the server's side:
I will look into the tearDown() exception now. |
This code you mentioned handles byte strings on Python 2: ## Thrift 0.9.2: lib/py/src/protocol/TBinaryProtocol.py ##
def writeString(self, str):
self.writeI32(len(str))
self.trans.write(str) The updated code you mentioned is plain wrong if it is still intended to handle byte strings: ## Thrift 1.0.0-dev: lib/py/src/protocol/TBinaryProtocol.py ##
def writeString(self, str):
encoded = bytearray(str, 'utf-8')
self.writeI32(len(encoded))
self.trans.write(encoded) This code will handle unicode text only, but I think Anyway, the problem is the buggy Thrift code: it needs to make up its mind about unicode text versus bytes. It should handle byte strings. |
I see now that it is not in Thrift's master, but it was introduced by one of the patches for Python 3 support. I will notify that thread. |
In Python 3, comparison with None raises a TypeError. Which in the context of table.scan() makes sense, because batch_size should be an integer >= 1. The TypeError conveys more precise information, without having to change any code. So I suggest updating test_scan() as follows: ## tests_api.test_scan() ##
with assert_raises(TypeError):
list(table.scan(batch_size=None))
with assert_raises(ValueError):
list(table.scan(batch_size=0)) |
This one seems like an architectural decision. Changing the decode() call for codec.decode(s_hex, 'hex'), led me to a "TypeError: ord() expected string of length 1, but int found". util.str_increment() says its intent is to increment and truncate a byte string, so it would be enough to remove the call to ord(), since s[i] would already result in an integer. However, then other tests start to fail, because, for example, the row_prefix for table.scan is not a byte string. Now, I guess you don't want to convert all strings to byte strings, yet the str_increment is written with byte strings in mind, and assumes a hexadecimal space. However, HBase's row keys are UTF-8 encoded, so I reckon there's no need to wrap around after '\xFF'. Please find my proposed fix below. All the tests pass, but I don't know if there are any side-effects I'm overlooking, or whether you have a completely different direction in mind. ## happybase/util.py ##
MAX_UNICODE = chr(1114111)
def str_increment(s):
"""Increment and truncate a string (for sorting purposes)
This functions returns the shortest string that sorts after the given
string when compared using regular string comparison semantics.
This function increments the last character that is smaller than MAX_UNICODE, and
drops everything after it. If the string only contains MAX_UNICODE characters,
`None` is returned.
"""
for i in range(len(s) - 1, -1, -1):
if s[i] != MAX_UNICODE:
return s[:i] + chr(ord(s[i]) + 1)
return None
## tests/test_util.py ##
def test_str_increment():
def check(s, expected):
v = util.str_increment(s)
assert_equal(expected, v)
assert_less(s, v)
test_values = [
('00', '01'),
('01', '02'),
('fe', 'ff'),
('1234', '1235'),
('12fe', '12ff'),
('12ff', '12fg'),
('424242' + util.MAX_UNICODE, '424243'),
('4242' + 2 * util.MAX_UNICODE, '4243'),
]
assert util.str_increment(util.MAX_UNICODE * 4) is None
for s, expected in test_values:
yield check, s, expected |
That's not true. Row keys are byte strings, and a lot of data models depend on that. |
That will break on Python 2, since |
Ah yes, I should have read the follow up comment on this one: http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.user/18006 Although I guess for generating end keys, it still suffices, albeit potentially slightly less efficient (i.e. sending longer end keys than necessary to the server).
Another reason to leave it in (I left the call to ord() in the code). However, the call to chr() is not backwards compatible, since the equivalent under Python 2 would be unichr(). How did you solve it in your Plyvel project? A quick search on "str_increment" didn't yield anything. |
Few quick follow-up notes:
|
This encoding compatibility stuff is a tricky business, and I still feel that it involves some architectural decisions, but I'm effectively only using happybase since this week, so I'm either wrong about it involving architectural decisions, or I'm too inexperienced with the library to make them ;) Below my solution, which suits my needs for now, and which I'll happily replace with happybase > 0.9. ## happybase/util.py ##
def str_increment(s):
"""Increment and truncate a byte string (for sorting purposes)
This functions returns the shortest string that sorts after the given
string when compared using regular string comparison semantics.
This function increments the last byte that is smaller than ``0xFF``, and
drops everything after it. If the string only contains ``0xFF`` bytes,
`None` is returned.
"""
try:
ba = bytearray(s, "utf-8")
except TypeError:
ba = bytearray(s)
for i in range(len(ba) - 1, -1, -1):
if ba[i] != 255:
ba[i] += 1
return ba[:(i+1)]
return None
## tests/test_util.py ##
def test_str_increment():
def check(s_hex, expected):
s = codecs.decode(s_hex, 'hex')
v = util.str_increment(s)
v_hex = codecs.encode(v, 'hex')
assert_equal(expected, v_hex)
assert_less(s, v)
test_values = [
(b'00', b'01'),
(b'01', b'02'),
(b'fe', b'ff'),
(b'1234', b'1235'),
(b'12fe', b'12ff'),
(b'12ff', b'13'),
(b'424242ff', b'424243'),
(b'4242ffff', b'4243'),
]
assert util.str_increment(b'\xff\xff\xff') is None
for s, expected in test_values:
yield check, s, expected |
Hi, ...pip install happybase doesn't seem to work...
pip.log:
My environment:
Am I doing something wrong or is this a bug? |
python 3 is unfortunately not yet supported, see related tickets and known issue list for details
sent from my phone. please forgive my tpyos. On February 4, 2016 7:06:52 PM GMT+01:00, Csaba Szilveszter [email protected] wrote:
|
I'm a little confused as to the status of related tickets and known issues. Can you summarize what needs to be done in order to make happybase work with Python 3? If the amount of work is reasonable and I can get permission from my employer, I'm likely willing to do what needs to be done. |
I created a pull request for Python 3 support. Note that while I don't think accepting the pull request is a good idea yet (see details in the pull request), it might still be useful for other developers. |
I closed the pull request, as most of the Hbase documentation conflicts with the comment in the Hbase.thrift file that suggests that keys, values, etc. should be UTF-8. I'll make another pull request which accepts bytes() instances in Python 3. |
I've filed a new pull request with code that does the right thing (mainly, expecting bytes() instances for table names, columns, cell values, etc.). |
For Python3,There is still an error. I used pip to install the happybase.
|
@kayzhou: Python 3 compatibility has not yet made its way into the master branch. |
Is there a recommendation for users who want to use happybase in Python 3? |
@ssaamm |
Niiiiiiice.... thx On Fri, Apr 22, 2016 at 8:46 PM, Scott Cooper [email protected]
|
Is there an ETA on when python 3 support will hit the master? |
warning. rant ahead. this is not personal. there's a pr that needs some more work. you could have read my comments over there. i don't give an eta since i maintain this project in my SPARE time even though i don't use it myself anymore, while pretty much ALL users are corporate users, and only very few of them have ever helped out in any significant way, e.g. by contributing code or providing financial support. even though i keep my project alive since i care about it, i don't work for free on demand by others, so i only spend time on it when i feel like it. i also refuse to feel guilty about this. this is a cultural problem with open source projects used in corporate environments. only taking while never giving back is not sustainable. |
i merged #116 into master |
This would be trivial for HappyBase, but the underlying Thrift library needs to be Python 3 compatible first.
A simple '2to3' on the source code seems to work, but this needs to be properly supported in a stable Thrift release first.
Status/todo for this issue:
TODO.rst
)The text was updated successfully, but these errors were encountered: