Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix py3 bytes #85

Closed
wants to merge 3 commits into from
Closed

Fix py3 bytes #85

wants to merge 3 commits into from

Conversation

asmodehn
Copy link
Contributor

@asmodehn asmodehn commented Aug 7, 2017

I fixed this while working on another package and testing message serialization for python3.
It works fine for me, but I am not sure what kind of test can be added for this here...

@asmodehn
Copy link
Contributor Author

asmodehn commented Aug 8, 2017

This allows sending bytes into a string field (by skipping the encoding part for python3 in that case), but it might be the wrong fix...
See https://discourse.ros.org/t/python3-and-strings/2392/3

@asmodehn
Copy link
Contributor Author

@dirk-thomas any input/advice there ?

It seems to me that we're stuck in a corner without any perfect solution (since going either the full unicode string way would change too much, and enforcing only ascii string might break existing code).

From what I understand the existing code in generator.py tries to match the string field type to whatever str means in python2 or python3. However these are not compatible between each other from what I know...

This fix at least fix the breaking case when a user attempts to send bytes in a string message field.
Other questionable cases' symptom is only garbled output between nodes using different python version as far as I can tell, so, although not suitable, is not as critical.

It would be good if there were tests actually ensuring the serialization and deserialization function are inverse of one another (for the proper types, in any language version), I couldn't find any here.
Thats how I caught this one...

@dirk-thomas
Copy link
Member

This fix at least fix the breaking case when a user attempts to send bytes in a string message field.

The user is supposed to pass a str in Python 3. The API doesn't necessarily have to support accepting bytes. Especially since on the receiving side it will expose a str again. So I am not convinced this patch is necessary / a good idea.

From what I understand the existing code in generator.py tries to match the string field type to whatever str means in python2 or python3. However these are not compatible between each other from what I know...

This is exactly why the wiki page says that unicode is not supported. The API / mapping only works correctly for ASCII across languages / versions (Python 2 vs. Python 3 vs. C++).

This is a really difficult topic - especially with the different languages needing to inter-operate well. Even in ROS 2 support for unicode hasn't been completed (even though we have the "freedom" to break behavior there (if necessary) and don't have to support Python 2). So from my point of view it is yet unclear how to change the spec (and after that the implementation) to satisfy all the goals and constraints.

@asmodehn
Copy link
Contributor Author

asmodehn commented Nov 1, 2017 via email

@dirk-thomas
Copy link
Member

"The user is supposed to pass a str in python3" -> this is an implementation error in my view.

Imo Python 3 code should be allows to assign "foo" to a string field therefore I don't agree with your conclusion. On the "writer" side it should be fairly easy to support setting b"foo" as well as "foo". The choice comes on the "reader" side. What type is being used to expose the value to the subscriber? I could see users being confused if that is b"foo". In ROS 2 that could be something which makes sense. In ROS 1 though this will break a lot of existing code though...

How an ascii string is different from a uint8[] ?

I would argue that an ascii string has the semantic of a \0 terminated "text" whereas uint8[] is a sequence of bytes / octets. Depending on the language I would expect them to map to very different types, e.g. in C++ to std::string and std::vector<uint8_t>.

Do we need an extra ros unistring field type ?

Since the current "string" type is only defined for ASCII I would argue yes, we need another type like "wstring".

How do we serialize the encoding ?

The encoding can either be agreed on in the spec (e.g. UTF-8) or could be stored beside the payload in a separate "field".

These would be the questions that need to be answered in order to cover unicode strings in the message field specification.

Additionally the exact mapping to language specific types need to be defined as well as the behavior of that API (e.g. what happens if the user passes a different type? Is there some conversion happening or not?).

This is exactly what we are trying to answer / decide for ROS 2 (which still doesn't support beyond ASCII atm). Please see ros2/design#117 and ros2/design#130.

@asmodehn
Copy link
Contributor Author

asmodehn commented Nov 5, 2017

Imo Python 3 code should be allows to assign "foo" to a string field

This is however unspecified behavior, since "foo" is the same as "안녕하세요" : a unicode string. Meaning that for a user now, string accepts unicode. Allowing it means exposing ourselves to potentially uncompatible breaking changes later, when specification time arrives...

On the "writer" side it should be fairly easy to support setting b"foo" as well as "foo".

The only point of this PR.

The choice comes on the "reader" side. What type is being used to expose the value to the subscriber? I could see users being confused if that is b"foo". In ROS 2 that could be something which makes sense. In ROS 1 though this will break a lot of existing code though...

My point of view here is :

  • It makes sense for ROS2 and also makes sense for Python3.
  • For ROS1, we should keep in mind that the end of Python2 is coming.So we will have to make some breaking changes anyway, and we should start early to have the time to do the transition.
  • We probably want ROS1 to follow ROS2 on that front (interoperability, ease of use, etc.), so it's probably good to do ROS2 specification and first implementation first before breaking anything in ROS1.

I would argue that an ascii string has the semantic of a \0 terminated "text" whereas uint8[] is a sequence of bytes / octets. Depending on the language I would expect them to map to very different types, e.g. in C++ to std::string and std::vector<uint8_t>.

If the array serialization algorithm encode the size of the array, then the \0 is not needed, and the serialization format for both is the same. What might differ indeed is the matching to a native type in a language, and python3 matching is different than c++ matching which easily confuse devs. One option would be for ROS to define his own cross-language types, with semantics based on serialization algorithm only, ie. ros_string and let any potential confusion when mapping to a language, be managed separately in each ros-api library in each language. But I assume this was already thought about and deemed unsuitable?

Anyway I ll join the design conversation, but it shouldn't be in the way of this PR as far as I can see...

@dirk-thomas
Copy link
Member

This is however unspecified behavior, since "foo" is the same as "안녕하세요" : a unicode string. Meaning that for a user now, string accepts unicode. Allowing it means exposing ourselves to potentially uncompatible breaking changes later, when specification time arrives...

I was mostly referring to the use case that the caller should be able to pass a Python 3 str. The constraint would still be that it needs to be encodable as "ASCII".

kartikmohta added a commit to kartikmohta/genpy that referenced this pull request Nov 6, 2017
@asmodehn
Copy link
Contributor Author

asmodehn commented Nov 7, 2017 via email

@dirk-thomas
Copy link
Member

I will close this for now due to the age / inactivity and because the upcoming ROS Noetic being the first ROS distro officially targeting Python 3. If the problem still exists in Noetic please open a new ticket with steps to reproduce.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants