Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RECORD file broken when filename in the wheel contains a comma #280

Closed
ronaldoussoren opened this issue Dec 31, 2018 · 19 comments
Closed

RECORD file broken when filename in the wheel contains a comma #280

ronaldoussoren opened this issue Dec 31, 2018 · 19 comments
Labels

Comments

@ronaldoussoren
Copy link
Contributor

Environment

pip version: 18.1
Python version: 3.7.1
OS: macOS 14.1
wheel: 0.32.3
Description

I have a package containing a datafile that has a comma in its file name. When creating a wheel and then installing that wheel the RECORD file contains two lines:

package/other,data.dat,sha256=9jljVen42HXtXHKL7kl6flBZmawA_EkJI0VxHGxlzOs,10
"package/other,data.dat",,

The first line is also present in the generated wheel file, the second is added by pip.

Expected behavior

I'd expect 1 line for the file. If PEP 372 is the specification for the format of wheels the RECORD file should be a CSV file and the correct line would be:

"package/other,data.dat",sha256=9jljVen42HXtXHKL7kl6flBZmawA_EkJI0VxHGxlzOs,10

This is a bug in wheel because it generates an invalid RECORD file in this case (one with an additional field), pip then adds a second line with the correct format.

How to Reproduce

I've attached an sdist that contains a datafile as described above.

simple-package-1.0.tar.gz

@agronholm
Copy link
Contributor

Ok, so the problem is that wheel does not add the quotes for the file name?

@agronholm
Copy link
Contributor

PEP 372 is about ordered dictionaries. Perhaps you meant some other PEP?

@agronholm
Copy link
Contributor

You probably meant PEP 427, but I cannot find any passage in it about quoting file names in RECORD. A little help here?

@ronaldoussoren
Copy link
Contributor Author

Yes, but not quite. The problem is that wheel does not escape/quote values that are special in CSV files (and in particular the dialect used in the RECORD file). This means that file names containing comma's or quotes. Newlines in filenames could also be problematic.

BTW. The PEP I meant to write is 376

@agronholm agronholm added the bug label Jan 2, 2019
@ronaldoussoren
Copy link
Contributor Author

And I forgot to mention that I don't know if that PEP is the current specification for the RECORD file.

@pfmoore
Copy link
Member

pfmoore commented Jan 2, 2019

PEP 376 defines the RECORD file here, and specifically states "The RECORD file is a CSV file, composed of records, one line per installed file", with details of the exact options that should be used for the csv module.

While PEP 376 is a bit of a mixture (chunks of it have never been implemented, and are never likely to be) it remains definitive unless superseded. And AFAIK, this part has never been superseded. Also, given that it defines an unambiguous means of handling filenames containing odd characters like commas, why would you not follow it here?

@ronaldoussoren
Copy link
Contributor Author

I ran into this while implementing some functions that read the RECORD file without first reading the specification. Tests for my code failed for a test with a file name that contains a comma, which in the end resulted in this issue.

An alternative way to split the line into fields is to use line.rsplit("," , 2), which removes the need to add quoting but would still fail with pathological filenames containing line separators. This is could work because the second and third fields cannot contain CSV special characters, but is not forward compatible with possible extensions of the file format by adding more fields.

@pfmoore
Copy link
Member

pfmoore commented Jan 2, 2019

It's also worth noting that the tests assert here that the RECORD file always uses \n as the line separator, not os.linesep as PEP 376 requires. I'd be inclined to continue deviating from the PEP in this case, as I'd be reluctant to change existing and tested behaviour without good cause.

I've created #281 for this, if it's of any use.

@agronholm
Copy link
Contributor

agronholm commented Jan 2, 2019

Changing the line separator for RECORD depending on the OS which is used for packaging seems like a very bad idea to me, which is why I did it this way. For installers this should be fine.

@agronholm
Copy link
Contributor

It should also be noted that the wheel spec doesn't say that RECORD should be a CSV file.

@pfmoore
Copy link
Member

pfmoore commented Jan 2, 2019

Good point regarding linesep. And agreed, the wheel spec makes no comment here. Maybe it should - there's certainly a need (even if only a rare one) to support files with commas in their names and using "proper" CSV format seems a reasonable way to do so.

@rcramblit
Copy link

I'd like to bump this issue - it appears this now breaks some pip installs in pip==19.0, which now explicitly depend on a properly escaped RECORD. See pypa/pip#6165

@agronholm
Copy link
Contributor

I'll try to get this done on the coming weekend.

@mikegrant321
Copy link

Hey @agronholm, any updated idea of when you'll be able to get to this one?

@agronholm
Copy link
Contributor

This is the first weekend I have when I'm not completely swamped with RL things to do. This has been among my top priorities in F/OSS work and looks like this weekend I can finally work on it.

@cjerdonek
Copy link
Member

Great! 🎉

@mikegrant321
Copy link

Thanks @agronholm - when can we expect a new release of the wheel package to pypi that includes this fix?

@agronholm
Copy link
Contributor

I was going to investigate test failures on macOS and Windows first but if it looks like that will take too long I will make the release anyway – likely before Wednesday.

@agronholm
Copy link
Contributor

agronholm commented Feb 12, 2019

All of the remaining test failures on macOS and Windows are related to non-ASCII file names in distributions. I will deal with these later. I've released v0.33.0 on PyPI now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants