[Bug] Characters like äüö are output incorrectly #19
Comments
I don't think it's a Readability4J issue but that you have to wrap the output in a structure like this to set encoding to UTF-8 (see #2):
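A minimal sketch of such a wrapper, assuming the extracted article HTML is already available as a correctly decoded `String`. The class and method names (`Utf8Wrapper`, `wrapInUtf8Html`) are hypothetical and not part of Readability4J:

```java
class Utf8Wrapper {

    // Hypothetical helper, not a Readability4J API: wraps an extracted HTML
    // fragment in a full document that declares UTF-8, so the component that
    // displays it does not have to guess the encoding.
    static String wrapInUtf8Html(String articleHtml) {
        return "<!DOCTYPE html>\n"
             + "<html>\n"
             + "<head>\n"
             + "  <meta charset=\"utf-8\"/>\n"
             + "</head>\n"
             + "<body>\n"
             + articleHtml + "\n"
             + "</body>\n"
             + "</html>";
    }
}
```

The declaration only tells the viewer how to interpret the output. It cannot repair characters that were already decoded with the wrong charset before extraction.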
This is exactly what |
@jamal2362 Is it possible the website uses a charset other than UTF-8 and you don't take that into account when creating your |
You're right. I have now created the method. But I don't think that will resolve @jamal2362's issue, as the document above, google.de, already has its charset set to UTF-8. Try version 1.0.8 to see if it solves your issue, but I think the issue lies somewhere else.
@dankito My apologies, my question was aimed at @jamal2362, sorry if that wasn't clear. I don't think your library does anything wrong. I think the String that's being passed to your library is already wrong, because the code creating the String doesn't check the website encoding. The same thing actually happened to me and I thought for a while that Readability4J was malfunctioning before realizing it was my own fault :-) |
@dankito @michaldvorak79 |
@jamal2362 What I mean is this: when you download a web page, you have a byte array, right? But Readability4J requires a String, so the bytes have to be decoded with the correct charset first. The charset can normally be obtained from the HTTP response headers, or it's included in a <meta> tag in the page itself. I don't know exactly what your code looks like or how you obtain the data you pass in. You can check your |
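The decoding step described above could be sketched roughly like this. It assumes the raw response body and the `Content-Type` header value are already in hand, and that falling back to UTF-8 is acceptable when no charset is declared. `CharsetDecoding`, `charsetFromContentType` and `decode` are hypothetical names, and the sketch does not inspect a `<meta>` tag:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetDecoding {

    // Reads the charset parameter from a Content-Type header value such as
    // "text/html; charset=ISO-8859-1". Returns UTF-8 (an assumed default)
    // when the header is missing, has no charset, or names an unknown one.
    static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .trim()
                                   .replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Malformed or unsupported charset name: use the fallback.
                        break;
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the downloaded bytes with the charset the server declared,
    // producing the String that is later handed to the extraction library.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetFromContentType(contentType));
    }
}
```

With this in place, a page served as ISO-8859-1 is decoded as ISO-8859-1 rather than as whatever default the download code uses, which is where characters like äüö typically get corrupted.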
Can you post the code you use to download the web page's HTML, Jamal? Maybe this code helps you:
Characters like äüö are output incorrectly on some websites.
These characters are used frequently in German.
The problem does not occur with English text.
Here is a picture of how this looks on Google.
Here is a screenshot where äüö is displayed without problems.