Fix py3 bytes #85

asmodehn · 2017-08-07T07:54:31Z

I fixed this while working on another package and testing message serialization for python3.
It works fine for me, but I am not sure what kind of test can be added for this here...


asmodehn · 2017-08-08T05:25:33Z

This allows sending bytes into a string field (by skipping the encoding part for python3 in that case), but it might be the wrong fix...
See https://discourse.ros.org/t/python3-and-strings/2392/3
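
A minimal sketch of the kind of check described above, not the actual genpy code. The helper name `serialize_string_field`, the UTF-8 encoding of text and the 4-byte little-endian length prefix are assumptions made for illustration only.

```python
import struct


def serialize_string_field(value):
    """Hypothetical helper: write a length prefix followed by the payload."""
    if isinstance(value, bytes):
        # Already binary data: skip the encoding step.
        payload = value
    else:
        # Text (str on Python 3, unicode on Python 2): encode before writing.
        payload = value.encode("utf-8")
    # Assumed wire layout: uint32 length (little-endian), then the raw bytes.
    return struct.pack("<I", len(payload)) + payload


# Bytes and text holding the same ASCII content produce the same buffer.
print(serialize_string_field(b"foo") == serialize_string_field("foo"))
```

With this shape, passing bytes no longer fails on an `encode` call, while text input keeps going through the encoding path.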


asmodehn · 2017-08-11T07:41:47Z

@dirk-thomas any input/advice there ?

It seems to me that we're stuck in a corner without any perfect solution: going the full unicode string way would change too much, and enforcing ASCII-only strings might break existing code.

From what I understand the existing code in generator.py tries to match the string field type to whatever str means in python2 or python3. However these are not compatible with each other from what I know...

This fix at least fixes the breaking case when a user attempts to send bytes in a string message field.
As far as I can tell, the only symptom in the other questionable cases is garbled output between nodes using different Python versions, so, although not desirable, they are not as critical.

It would be good if there were tests actually ensuring that the serialization and deserialization functions are inverses of one another (for the proper types, in any language version); I couldn't find any here.
That's how I caught this one...
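
A round-trip test of the kind asked for here could look roughly like the sketch below. The `serialize` and `deserialize` functions are hypothetical stand-ins (a 4-byte little-endian length prefix plus payload is assumed), not the functions generated by genpy.

```python
import struct


def serialize(value):
    # Hypothetical stand-in: bytes pass through, text is encoded as UTF-8.
    payload = value if isinstance(value, bytes) else value.encode("utf-8")
    return struct.pack("<I", len(payload)) + payload


def deserialize(buf):
    # Read the assumed uint32 length prefix, then slice out the payload.
    (length,) = struct.unpack_from("<I", buf, 0)
    return buf[4:4 + length]


def check_round_trip(samples):
    """Assert that deserialize undoes serialize for every sample."""
    for sample in samples:
        expected = sample if isinstance(sample, bytes) else sample.encode("utf-8")
        assert deserialize(serialize(sample)) == expected


check_round_trip([b"", b"foo", "foo", b"\x00\xff"])
```

Running the same samples under both Python 2 and Python 3 is what would expose a case like the one in this PR, where one input type fails before it ever reaches the wire.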

dirk-thomas · 2017-10-24T17:16:19Z

> This fix at least fixes the breaking case when a user attempts to send bytes in a string message field.

The user is supposed to pass a str in Python 3. The API doesn't necessarily have to support accepting bytes, especially since on the receiving side it will expose a str again. So I am not convinced this patch is necessary / a good idea.

> From what I understand the existing code in generator.py tries to match the string field type to whatever str means in python2 or python3. However these are not compatible with each other from what I know...

This is exactly why the wiki page says that unicode is not supported. The API / mapping only works correctly for ASCII across languages / versions (Python 2 vs. Python 3 vs. C++).

This is a really difficult topic, especially with the different languages needing to interoperate well. Even in ROS 2, support for unicode hasn't been completed, even though we have the "freedom" to break behavior there (if necessary) and don't have to support Python 2. So from my point of view it is still unclear how to change the spec (and after that the implementation) to satisfy all the goals and constraints.

asmodehn · 2017-11-01T18:38:35Z

Yes, this is a difficult topic, with all the languages needing to interoperate. Python 2 and Python 3 have different types for this, so they could even be considered different languages. We can keep the spec the same, but a "specification" should be abstracted from any possible implementation, and therefore not tied to any language construct (including C).

My perspective on this is: "a library should not impose different semantics on the user/dev than the semantics of the language it is interfacing with". Otherwise you get into horror stories of people writing crazy code that might eventually have to be maintained forever...

So, since unicode is not supported, let's forget about it for now.

1) As far as I understand, the current ROS string field spec is: "ascii string". This should be implemented in Python 3 with bytes. So the rospy library should not expose str for an ascii string in the Python 3 case, because we need to match the language semantics. In Python 2, the library should expose str and not unicode.

"The user is supposed to pass a str in python3" -> this is an implementation error in my view. This is what I am focusing on here: making what should work (sending bytes) work. The second step would be to expose bytes on the other end, but I am not sure if/how we'll break things, so that can be another PR...

At this step I would consider the utf8 part of the implementation incomplete, since only one encoding is supported, and confusing, since we have one implementation for two different languages (py2 and py3) for the same ROS type, supporting two use cases (ascii or unicode). Ideally I would remove it, but we can't break things that already "work" for users we cannot control.

2) How is an ascii string different from a uint8[]? Do we need an extra ros unistring field type? How do we serialize the encoding? These are the questions that need to be answered in order to cover unicode strings in the message field specification.

dirk-thomas · 2017-11-01T19:00:58Z

> "The user is supposed to pass a str in python3" -> this is an implementation error in my view.

IMO Python 3 code should be allowed to assign "foo" to a string field, therefore I don't agree with your conclusion. On the "writer" side it should be fairly easy to support setting b"foo" as well as "foo". The choice comes on the "reader" side: what type is used to expose the value to the subscriber? I could see users being confused if that is b"foo". In ROS 2 that could be something which makes sense. In ROS 1, though, this would break a lot of existing code...

> How is an ascii string different from a uint8[]?

I would argue that an ascii string has the semantic of a \0 terminated "text" whereas uint8[] is a sequence of bytes / octets. Depending on the language I would expect them to map to very different types, e.g. in C++ to std::string and std::vector<uint8_t>.

> Do we need an extra ros unistring field type?

Since the current "string" type is only defined for ASCII I would argue yes, we need another type like "wstring".

> How do we serialize the encoding?

The encoding can either be agreed on in the spec (e.g. UTF-8) or could be stored beside the payload in a separate "field".
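
The two options can be contrasted with a small sketch. Everything here is hypothetical: the function names, the length-prefixed layout and the idea of storing the encoding name as an ASCII blob in front of the payload are assumptions for illustration, not part of any ROS specification.

```python
# -*- coding: utf-8 -*-
import struct


def _pack_blob(data):
    # Assumed layout: uint32 length (little-endian) followed by raw bytes.
    return struct.pack("<I", len(data)) + data


def _unpack_blob(buf, offset):
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    return buf[start:start + length], start + length


def pack_fixed_utf8(text):
    """Option 1: the spec fixes the encoding, so only the payload is sent."""
    return _pack_blob(text.encode("utf-8"))


def pack_tagged(text, encoding):
    """Option 2: the encoding name travels beside the payload."""
    return _pack_blob(encoding.encode("ascii")) + _pack_blob(text.encode(encoding))


def unpack_tagged(buf):
    name, offset = _unpack_blob(buf, 0)
    payload, _ = _unpack_blob(buf, offset)
    return payload.decode(name.decode("ascii"))


print(unpack_tagged(pack_tagged(u"안녕하세요", "utf-16-le")) == u"안녕하세요")
```

The first option keeps the wire format identical to a plain length-prefixed string; the second costs extra bytes per field but lets the reader decode without prior agreement.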

> These are the questions that need to be answered in order to cover unicode strings in the message field specification.

Additionally, the exact mapping to language-specific types needs to be defined, as well as the behavior of that API (e.g. what happens if the user passes a different type? Is there some conversion happening or not?).

This is exactly what we are trying to answer / decide for ROS 2 (which still doesn't support anything beyond ASCII at the moment). Please see ros2/design#117 and ros2/design#130.

asmodehn · 2017-11-05T05:50:34Z

> IMO Python 3 code should be allowed to assign "foo" to a string field

This is however unspecified behavior, since "foo" is the same as "안녕하세요": a unicode string. This means that, for a user now, string accepts unicode. Allowing it means exposing ourselves to potentially incompatible breaking changes later, when specification time arrives...

> On the "writer" side it should be fairly easy to support setting b"foo" as well as "foo".

The only point of this PR.

> The choice comes on the "reader" side: what type is used to expose the value to the subscriber? I could see users being confused if that is b"foo". In ROS 2 that could be something which makes sense. In ROS 1, though, this would break a lot of existing code...

My point of view here is:

- It makes sense for ROS2 and also makes sense for Python3.
- For ROS1, we should keep in mind that the end of Python2 is coming. So we will have to make some breaking changes anyway, and we should start early to have time for the transition.
- We probably want ROS1 to follow ROS2 on that front (interoperability, ease of use, etc.), so it's probably good to do the ROS2 specification and first implementation before breaking anything in ROS1.

> I would argue that an ascii string has the semantic of a \0 terminated "text" whereas uint8[] is a sequence of bytes / octets. Depending on the language I would expect them to map to very different types, e.g. in C++ to std::string and std::vector<uint8_t>.

If the array serialization algorithm encodes the size of the array, then the \0 is not needed, and the serialization format for both is the same. What might indeed differ is the mapping to a native type in each language, and the Python 3 mapping is different from the C++ mapping, which easily confuses devs. One option would be for ROS to define its own cross-language types, with semantics based on the serialization algorithm only, i.e. ros_string, and let any potential confusion when mapping to a language be managed separately in each ROS API library for each language. But I assume this was already thought about and deemed unsuitable?
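
To make the point about identical serialization concrete, here is a small sketch. It assumes both field kinds are written as a 4-byte little-endian length followed by the raw bytes; the function names `pack_string` and `pack_uint8_array` are hypothetical.

```python
import struct


def pack_string(text):
    # ASCII text, length-prefixed; no trailing \0 is written.
    payload = text.encode("ascii")
    return struct.pack("<I", len(payload)) + payload


def pack_uint8_array(values):
    # Sequence of octets, length-prefixed in the same way.
    payload = bytes(bytearray(values))
    return struct.pack("<I", len(payload)) + payload


# "foo" and [102, 111, 111] end up as the same bytes on the wire.
print(pack_string("foo") == pack_uint8_array([102, 111, 111]))
```

Under that assumption the difference between the two types lives entirely in the language mapping, not in the serialized form.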

Anyway, I'll join the design conversation, but it shouldn't get in the way of this PR as far as I can see...

dirk-thomas · 2017-11-06T18:32:19Z

> This is however unspecified behavior, since "foo" is the same as "안녕하세요": a unicode string. This means that, for a user now, string accepts unicode. Allowing it means exposing ourselves to potentially incompatible breaking changes later, when specification time arrives...

I was mostly referring to the use case that the caller should be able to pass a Python 3 str. The constraint would still be that it needs to be encodable as "ASCII".
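
Such a constraint could be enforced with a check along these lines. This is only a sketch: the helper name `to_ascii_bytes` is hypothetical, and nothing in this thread says the library performs such a check.

```python
# -*- coding: utf-8 -*-


def to_ascii_bytes(value):
    """Hypothetical check: accept bytes or text, reject non-ASCII content."""
    if isinstance(value, bytes):
        try:
            value.decode("ascii")
        except UnicodeDecodeError:
            raise ValueError("string field only accepts ASCII bytes")
        return value
    try:
        return value.encode("ascii")
    except UnicodeEncodeError:
        raise ValueError("string field only accepts ASCII text")


print(to_ascii_bytes(u"foo") == b"foo")

try:
    to_ascii_bytes(u"안녕하세요")
except ValueError as exc:
    print("rejected: %s" % exc)
```

A Python 3 str is accepted as long as it is encodable as ASCII, and anything else fails loudly at assignment or serialization time instead of producing garbled output on the receiving node.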


asmodehn · 2017-11-07T00:52:17Z

That constraint, however, does not match the language semantics and is therefore not checkable by the language tools (even later, when we add python3 types). This means it should be checked by the ROS library, but that is not implemented. It should also be clearly documented that this is neither the primary specified behavior nor a stable, reliable implementation, to avoid confusion like the one we are having now: I expect an ASCII string to translate to bytes in py3 (both ways) and the py3 bytes type to be tested and always working, but the unspecified unicode solution is the half-implemented py3 behavior.

I would agree, however, with the naive expectation that a ROS string should accept *any* unicode utf8 string, but this still needs to be specified, and it will change/break quite a few things in a few places, like Python did between 2 and 3... so it's maybe not for ROS1?

For ROS1, it follows that:

- the unspecified behavior (accepting unicode) should be completely dropped (and translating to bytes for py3 instead of str should be implemented), OR
- the constraint of limiting to ASCII characters while accepting unicode strings (but translating to bytes for py3) should be implemented in the library, OR
- we will eventually follow ROS2 and break existing user code starting from a specific distro, doing whatever ROS2 decides to do.

I commented on the linked design discussions about that. Nothing ideal here, and code migration pain is lurking around the corner in any case...

dirk-thomas · 2020-05-12T19:18:34Z

I will close this for now due to the age / inactivity and because the upcoming ROS Noetic is the first ROS distro officially targeting Python 3. If the problem still exists in Noetic please open a new ticket with steps to reproduce.

asmodehn added 2 commits July 3, 2017 16:16

Merge pull request #1 from ros/kinetic-devel

fd0d7ab

fixing inefficient canon method in rostime (ros#77)

fixing check for python3 and byte/text data

6869948

fixing tests for new check to decide if encoding is necessary during serialization.

2ac4b03

asmodehn force-pushed the fix_py3_bytes branch from 047039c to 2ac4b03 Compare August 11, 2017 07:36

asmodehn mentioned this pull request Aug 30, 2017

bytes/string pyros-dev/pyros-msgs#24

Open

dirk-thomas mentioned this pull request Nov 6, 2017

msg check_type: uint8/char array in a msg is a byte array not str #90

Closed

kartikmohta added a commit to kartikmohta/genpy that referenced this pull request Nov 6, 2017

Allow str input into a uint8[] message field

23b30da

See ros#85 for more discussion

asmodehn mentioned this pull request Feb 17, 2018

Strings issues with python3 pyros-dev/pyros-schemas#11

Open

dirk-thomas closed this May 12, 2020
