Encoding on title attribute for a link tag fails #30

Closed
jeroenp opened this issue Jul 11, 2016 · 7 comments
@jeroenp

jeroenp commented Jul 11, 2016

The generate_tags() function in textile.utils raises a UnicodeDecodeError when the title attribute of a link contains non-ASCII characters, for example:

"Tëxtíle (Tëxtíle)":http://lala.com

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)
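That message is what Python produces when UTF-8 bytes are decoded with the ASCII codec, which Python 2 does implicitly whenever a byte string is combined with a unicode string. A minimal sketch that triggers the same decode explicitly; the helper name `ascii_decode_error` is made up for illustration and is not part of textile:

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes for the title text "Textile" with accented e and i.
# The first non-ASCII byte in the encoded form is 0xc3.
title_bytes = u"T\u00ebxt\u00edle".encode("utf-8")


def ascii_decode_error(data):
    """Return the UnicodeDecodeError raised by decoding data as ASCII, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


err = ascii_decode_error(title_bytes)
print(err)
```

The reported position differs from the traceback above because the byte string here holds only the title, not the serialized tag.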

I tried to fix this by encoding the inserted content, since the decode error comes from mixing bytes with a unicode string, but that makes textile crash somewhere in core.py:

element_tag.insert(len(element_tag) - 1, content.encode(enc))

What works for me is decoding the result from elementtree:

        element_tag = [v.decode(enc) for v in ElementTree.tostringlist(
                       element, encoding=enc, method='html')]
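For anyone who wants to try the decoding approach outside textile, here is a rough standalone sketch. The function `render_link` is hypothetical and is not textile's code; unlike the snippet above, it sets the link text on the element rather than inserting it into the list, and it assumes UTF-8 throughout:

```python
from xml.etree import ElementTree


def render_link(text, href, title, enc="utf-8"):
    """Serialize an <a> element and return it as a text (unicode) string."""
    element = ElementTree.Element("a", attrib={"href": href, "title": title})
    element.text = text
    # With a byte encoding, tostringlist() returns encoded byte chunks.
    chunks = ElementTree.tostringlist(element, encoding=enc, method="html")
    # Decode every chunk so only text strings are joined together.
    return u"".join(chunk.decode(enc) for chunk in chunks)


html = render_link(u"T\u00ebxt\u00edle", "http://lala.com", u"T\u00ebxt\u00edle")
print(html)
```

Because every chunk is decoded before joining, no implicit ASCII decode can happen when the result is later combined with other unicode strings.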

Platform: Python 2.7.11

sebix added the bug label Jul 11, 2016
ikirudennis added a commit that referenced this issue Jul 11, 2016
@ikirudennis
Member

Blargh. Out of curiosity, would anyone care if I stopped supporting Python 2.6?

@ikirudennis
Member

Note to self: Nothing is ever easy.

@ikirudennis
Member

@jeroenp I think I've got a reasonable solution going on here. Are you able to test your code against the hotfix/unicode_title branch to confirm there aren't more edge cases?

@jbouclier

jbouclier commented Jul 18, 2016

I was having trouble with the following link:

"ANMAT(Administración Nacional de Medicamentos, Alimentos y Tecnología Médica)":http://www.anmat.gov.ar

As you can see, it contains some accented letters.

The hotfix solved it for me. This is the first time I have installed a hotfix using pip.
I'm not sure it's relevant, but just in case, this is how I installed the hotfix:

pip uninstall textile
pip install git+https://github.com/textile/python-textile.git@hotfix/unicode_title

@ikirudennis
Member

That's great. And yes, those pip commands are correct, though pip might complain that the git URL is missing #egg=textile at the end. I'm going to push this out shortly.

@jbouclier

It didn't complain about the lack of #egg=textile. This is the output from the pip install:

E:\temp\app_map>pip install git+https://github.com/textile/python-textile.git@hotfix/unicode_title
Collecting git+https://github.com/textile/python-textile.git@hotfix/unicode_title
  Cloning https://github.com/textile/python-textile.git (to hotfix/unicode_title) to c:\users\jjavier\appdata\local\temp\pip-clh9vq-build
Requirement already satisfied (use --upgrade to upgrade): six in c:\adp\python27\lib\site-packages (from textile==2.3.3)
Installing collected packages: textile
  Running setup.py install for textile ... done
Successfully installed textile-2.3.3

Maybe it is something in my local environment? It doesn't seem like something to worry about.

Just in case, this is the version of pip I'm using:

E:\temp\app_map>pip --version
pip 8.1.2 from C:\adp\Python27\lib\site-packages (python 2.7)

@ikirudennis
Member

Those all seem to be non-issues. Anyway, I've merged this in, and will be releasing the new version as soon as it passes through Travis.
