[json.exception.type_error.316] invalid UTF-8 byte at index 1: 0xC3 #1383

Closed
FabioNevesRezende opened this issue Dec 4, 2018 · 5 comments
Labels
solution: invalid the issue is not related to the library

Comments

@FabioNevesRezende

  • What is the issue you have?
    Can't dump json object into string

  • Please describe the steps to reproduce the issue. Can you provide a small but working code example?

    nlohmann::json fJson;
    std::string codigo_ativo("ÇÃO");
    fJson["CODIGO_ATIVO"] = codigo_ativo;
    fJson.dump();
  • What is the expected behavior?
    The .dump() method should generate the serialized string of the JSON object.

  • And what is the actual behavior instead?
    Exception thrown: [json.exception.type_error.316] invalid UTF-8 byte at index 1: 0xC3

  • Which compiler and operating system are you using? Is it a supported compiler?
    cmake version 3.11.4 with -utf-8 compile option

  • Did you use a released version of the library or the version from the develop branch?
    Release version nº 3.1.2 (https://github.com/nlohmann/json/releases/tag/v3.1.2)

  • If you experience a compilation error: can you compile and run the unit tests?
    no compilation error

I've noticed similar errors in issues #1022 and #1131.
To try to fix it, I added the -utf-8 flag to the compiler. Before assigning the value to the fJson object, I printed the contents of the codigo_ativo variable to check its bytes in hex:

    for (size_t i = 0; i < codigo_ativo.size(); ++i)
    {
        std::cout << i << " " << std::hex << static_cast<int>(static_cast<uint8_t>(codigo_ativo[i])) << std::endl;
    }

outputs:

0 c7
1 c3
2 4f
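
The reported position ("index 1: 0xC3") can be reproduced without the library. The following is a minimal sketch, not the library's actual validator: the function name `first_invalid_utf8_byte` is hypothetical, and the check is deliberately simplified to the lead-byte/continuation-byte structure of UTF-8 (it does not reject overlong encodings, surrogates, or code points above U+10FFFF).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of nlohmann::json): returns the index of the
// first byte that breaks the UTF-8 lead/continuation structure, or
// std::string::npos if the structure is intact.
// Simplified on purpose: overlong forms, surrogates and values above
// U+10FFFF are NOT rejected.
std::size_t first_invalid_utf8_byte(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead <= 0x7F)               extra = 0;  // 0xxxxxxx: ASCII
        else if ((lead & 0xE0) == 0xC0) extra = 1;  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) extra = 2;  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) extra = 3;  // 11110xxx
        else return i;  // stray continuation byte or invalid lead byte

        for (std::size_t k = 1; k <= extra; ++k)
        {
            if (i + k >= s.size())
                return i;  // sequence is truncated: report its lead byte
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                return i + k;  // expected 10xxxxxx, got something else
        }
        i += extra + 1;
    }
    return std::string::npos;
}
```

For the bytes printed above (C7 C3 4F), 0xC7 announces a two-byte sequence, but 0xC3 is not a continuation byte, so the sketch reports index 1.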

@nlohmann
Owner

nlohmann commented Dec 4, 2018

The string is not UTF-8 encoded. The string ÇÃO should yield the code points C7 C3 4F and thus the UTF-8 byte sequence C3 87 C3 83 4F. The former is what your example program prints, so the string holds one byte per code point (as in Latin-1) rather than UTF-8. This is not a bug in the library (it in fact detects that C3 is not a valid UTF-8 byte at that position), but a problem with your compiler settings or the encoding of the source code file.
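
If the bytes really are one byte per code point, as the hex dump suggests, they can be re-encoded before being stored in the JSON object. The following is a minimal sketch under the assumption that the input is ISO-8859-1 (Latin-1); the function name `latin1_to_utf8` is hypothetical and not part of the library. Other single-byte code pages (for example Windows-1252 or CP850) map the range 0x80 to 0xFF differently and would need a lookup table instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of nlohmann::json): re-encodes a string that
// is ASSUMED to be ISO-8859-1 (one byte per code point, U+0000..U+00FF)
// as UTF-8.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte becomes two
    for (const unsigned char c : in)
    {
        if (c < 0x80)
        {
            // ASCII range: identical in UTF-8
            out.push_back(static_cast<char>(c));
        }
        else
        {
            // U+0080..U+00FF: two bytes, 110000xx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

Applied to the bytes C7 C3 4F, this yields C3 87 C3 83 4F, the UTF-8 sequence given above, which can then be assigned to the JSON value and dumped.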

@nlohmann nlohmann added the solution: invalid the issue is not related to the library label Dec 4, 2018
@FabioNevesRezende
Author

FabioNevesRezende commented Dec 4, 2018

The string is encoded in ASCII, but aren't ASCII codes equivalent to their respective UTF-8 codes?

see:
https://stackoverflow.com/questions/2347783/how-to-convert-an-ascii-string-to-an-utf8-string-in-c

@nlohmann
Owner

nlohmann commented Dec 4, 2018

ASCII is a subset of UTF-8. From your string, only the last character can be expressed by ASCII.

You may want to have a look at https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ and https://utf8everywhere.org

@FabioNevesRezende
Author

"ASCII is a subset of UTF-8", so if the API accepts UTF-8, it should accept ASCII. And all three characters can be expressed in ASCII; see this table:

https://theasciicode.com.ar/

Decimal 199 = Ã
Decimal 128 = Ç

@nlohmann
Owner

nlohmann commented Dec 4, 2018

That is extended ASCII. ASCII can only express 128 characters - from 0x00 to 0x7F.
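
The 0x00 to 0x7F limit is easy to test for. A minimal sketch, with the hypothetical helper name `is_ascii`, that reports whether every byte of a string lies in the ASCII range and is therefore already valid UTF-8 as-is:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: true if every byte is in the 7-bit ASCII range
// 0x00..0x7F. Such a string is byte-for-byte identical in UTF-8.
bool is_ascii(const std::string& s)
{
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return c <= 0x7F; });
}
```

For the bytes from the original report (C7 C3 4F), the check fails on the first two bytes; only the final 0x4F ("O") is ASCII.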
