Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

explicitly passed parametrize ids do not get escaped to ascii #1351

Closed
davehunt opened this issue Feb 1, 2016 · 16 comments · Fixed by mozilla/bedrock#4304
Closed

explicitly passed parametrize ids do not get escaped to ascii #1351

davehunt opened this issue Feb 1, 2016 · 16 comments · Fixed by mozilla/bedrock#4304
Labels
good first issue easy issue that is friendly to new contributor status: help wanted developers would like help from experts on this topic topic: collection related to the collection phase topic: parametrize related to @pytest.mark.parametrize type: bug problem that needs to be addressed

Comments

@davehunt
Copy link
Contributor

davehunt commented Feb 1, 2016

If values in parametrize contain unicode bytes, they are escaped when generating the IDs, however if these same objects are provided via the ids argument, they are not escaped. As a result, the resulting IDs can't be used with pylib's XML/HTML generation.

This was discovered and previously discussed in https://github.com/davehunt/pytest-html/pull/30

To reproduce, create a conftest.py with the following contents:

# -*- coding: utf-8 -*-
def pytest_runtest_logreport(report):
    # print report.nodeid  # test_unicode.py::test_unicode[〈]
    print unicode(report.nodeid)  # UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 30: ordinal not in range(128)
    # print report.nodeid.encode('string-escape')  # test_unicode.py::test_unicode[〈]
    # print unicode(report.nodeid.encode('string-escape'))  # test_unicode[\xe3\x80\x88]

Then, create a test file with the following contents:

# -*- coding: utf-8 -*-
import pytest
@pytest.mark.parametrize('val', ['foo'], ids=['〈'])
def test_unicode(val):
    pass

Run the test, and you will get the following traceback:

test_unicode.py 
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/main.py", line 90, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/main.py", line 121, in _main
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 724, in __call__
INTERNALERROR>     return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 338, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 333, in <lambda>
INTERNALERROR>     _MultiCall(methods, kwargs, hook.spec_opts).execute()
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 596, in execute
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/main.py", line 146, in pytest_runtestloop
INTERNALERROR>     item.config.hook.pytest_runtest_protocol(item=item, nextitem=nextitem)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 724, in __call__
INTERNALERROR>     return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 338, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 333, in <lambda>
INTERNALERROR>     _MultiCall(methods, kwargs, hook.spec_opts).execute()
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 595, in execute
INTERNALERROR>     return _wrapped_call(hook_impl.function(*args), self.execute)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 253, in _wrapped_call
INTERNALERROR>     return call_outcome.get_result()
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 279, in get_result
INTERNALERROR>     _reraise(*ex)  # noqa
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 264, in __init__
INTERNALERROR>     self.result = func()
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 596, in execute
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/runner.py", line 65, in pytest_runtest_protocol
INTERNALERROR>     runtestprotocol(item, nextitem=nextitem)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/runner.py", line 72, in runtestprotocol
INTERNALERROR>     rep = call_and_report(item, "setup", log)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/runner.py", line 123, in call_and_report
INTERNALERROR>     hook.pytest_runtest_logreport(report=report)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 724, in __call__
INTERNALERROR>     return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 338, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 333, in <lambda>
INTERNALERROR>     _MultiCall(methods, kwargs, hook.spec_opts).execute()
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 596, in execute
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/Users/dhunt/workspace/pytest-scratchpad/conftest.py", line 6, in pytest_runtest_logreport
INTERNALERROR>     print unicode(report.nodeid)  # UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 30: ordinal not in range(128)
INTERNALERROR> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 30: ordinal not in range(128)
@RonnyPfannschmidt RonnyPfannschmidt added type: bug problem that needs to be addressed status: help wanted developers would like help from experts on this topic good first issue easy issue that is friendly to new contributor topic: parametrize related to @pytest.mark.parametrize topic: collection related to the collection phase unicode labels Feb 1, 2016
@RonnyPfannschmidt
Copy link
Member

at second glace, this isbreaking as expected, print on python2 will break on unicode

please verify

@The-Compiler
Copy link
Member

No, unicode is - in the case of pytest-html, it's when trying to add the element to HTML

The point is if pytest should always make sure that ids are valid unicode objects, by doing _escape_bytes if they are bytes as well.

@RonnyPfannschmidt
Copy link
Member

my understanding is, that currently node id's are native strings so bytes on python2
if we want to make them unicode in all cases, then we need a major release

@The-Compiler
Copy link
Member

Sorry, my comment above is wrong - it should ensure they're always ascii-only native strings (i.e. bytes on Python 2), like it does with the autogenerated IDs.

For example, with this conftest.py:

from __future__ import print_function

def pytest_runtest_logreport(report):
    if report.when == 'call':
        print('\n  nodeid: {!r}'.format(report.nodeid))

and this test_foo.py:

# encoding: utf-8

import pytest

@pytest.mark.parametrize('foo', ['büpf'])
def test_args(foo):
    pass

@pytest.mark.parametrize('foo', ['a'], ids=['büpf'])
def test_ids(foo):
    pass

The autogenerated ID gets escaped to be ascii only, but not the ids one:

$ python2 -m py.test  -v
[...]

test_foo.py::test_args[b\xc3\xbcpf] PASSED
  nodeid: 'test_foo.py::test_args[b\\xc3\\xbcpf]'

test_foo.py::test_ids[büpf] PASSED
  nodeid: 'test_foo.py::test_ids[b\xc3\xbcpf]'

@RonnyPfannschmidt RonnyPfannschmidt changed the title bytes objects in parametrize ID lists are not encoded explicitly passed parametrize ids do not get escaped to ascii Feb 11, 2016
@RonnyPfannschmidt
Copy link
Member

@hpk42 ping?

im under the impression we should either ensure the id's are always ascii,
and warn if the given id's do not match canonical ascii or support unicode

since python is growing more international and identifiers are unicode in 3.x, i think we should support unicode node id's simply because that is what we will get in future
for non-ascii byte string node id's we should issue a pytest-warning and escape them as if they where byte strings

@ceridwen
Copy link
Contributor

I'd like to try this bug for my Outreachy application for pytest-html. What's the final conclusion on what parametrize should do with the ids arguments? I'd assume it's probably better to support Unicode ids. Should I call the same _escape_bytes() that's being used in _idval?

@The-Compiler
Copy link
Member

FWIW, @RonnyPfannschmidt's proposal makes sense from my POV.

@RonnyPfannschmidt
Copy link
Member

a simplistic implementation needs to ensure a few details

  1. ensue non-ascii byte strings are escaped to something ascii-ish
  2. ensure unicode/utf8 is mapped correctly to text

we should always byte-escape non-aphanumeric id values that are byte strings
we should always unicode-escape unicode id values
the target is, that node id's are always ascii text in the end

@nicoddemus
Copy link
Member

@RonnyPfannschmidt in your proposal do you mean Python 2 and 3, or only for Python 2? Python 3 should we strive to support full unicode?

@RonnyPfannschmidt
Copy link
Member

for both - even if we support full unicode id's we still will run into problems on windows, osx and other things

@nicoddemus
Copy link
Member

Displaying them in the terminal for example?

@RonnyPfannschmidt
Copy link
Member

yes, also they are sometimes used for filenames/lookup keys - making them consistent ascii text in all cases seems sensible simply to avoid platform issues on unicode boundaries

@nicoddemus
Copy link
Member

Fair enough, sounds reasonable. Thanks! 😁

@ceridwen
Copy link
Contributor

A draft pull request is up at #1463.

@ceridwen
Copy link
Contributor

New pull request targeted at features is up at #1470.

@The-Compiler
Copy link
Member

Closing as the PR was merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue easy issue that is friendly to new contributor status: help wanted developers would like help from experts on this topic topic: collection related to the collection phase topic: parametrize related to @pytest.mark.parametrize type: bug problem that needs to be addressed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants