Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds a type check to ensure the description type provided is valid #473

Closed
wants to merge 14 commits into from

Conversation

Bachmann1234
Copy link

Howdy!

Thanks for this check tool! While using it I made an error that I would have liked to see this tool catch so I thought I would take a stab at adding it myself. Let me know if you have any thoughts!

Thank you!

@@ -88,6 +89,12 @@ def check(dists, output_stream=sys.stdout):
)
description_content_type = 'text/x-rst'

if description_content_type not in _supported_readme_types:
output_stream.write(
'warning; `long_description_content_type` invalid.\n'
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the team's thoughts I could make a change to make this an outright failure.

The problem is if this type is wrong pypi will outright reject the package with a 400 error.

I know that technically this tool is checking if the readme will render so perhaps the type is a separate concern?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IN the back and forth on this PR I ended up just making this an error

@Bachmann1234
Copy link
Author

Looks like I have some lint errors to fix

twine/commands/check.py:95:80: E501 line too long (107 > 79 characters)
tests/test_check.py:128:80: E501 line too long (100 > 79 characters)

@codecov
Copy link

codecov bot commented Jul 7, 2019

Codecov Report

Merging #473 into master will increase coverage by 0.06%.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #473      +/-   ##
========================================
+ Coverage   82.93%    83%   +0.06%     
========================================
  Files          14     14              
  Lines         750    753       +3     
  Branches      108    110       +2     
========================================
+ Hits          622    625       +3     
  Misses         92     92              
  Partials       36     36
Impacted Files Coverage Δ
twine/commands/check.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update add0aea...76efb49. Read the comment docs.

@codecov
Copy link

codecov bot commented Jul 7, 2019

Codecov Report

Merging #473 into master will increase coverage by 1.57%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #473      +/-   ##
=========================================
+ Coverage   82.93%   84.5%   +1.57%     
=========================================
  Files          14      14              
  Lines         750     781      +31     
  Branches      108     116       +8     
=========================================
+ Hits          622     660      +38     
+ Misses         92      84       -8     
- Partials       36      37       +1
Impacted Files Coverage Δ
twine/commands/check.py 100% <100%> (ø) ⬆️
twine/commands/upload.py 92.72% <0%> (+1.06%) ⬆️
twine/package.py 88.49% <0%> (+1.76%) ⬆️
twine/repository.py 75.43% <0%> (+2.1%) ⬆️
twine/utils.py 87.23% <0%> (+3.54%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update add0aea...3440847. Read the comment docs.

@Bachmann1234
Copy link
Author

Ok, all fixed now :-)

Copy link
Contributor

@bhrutledge bhrutledge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a useful addition to check. I'm not a maintainer, just a new contributor, so I can't speak to the UX questions. However, I do have some style suggestions.

})

monkeypatch.setattr(check, "_find_dists", lambda a: ["dist/dist.tar.gz"])
monkeypatch.setattr(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how this patching makes sense with the structure of check.check(), but it feels a bit awkward. What do you think about extracting a function from check()? Something like:

def check(dists, output_stream=sys.stdout):
    # ...
    for filename in uploads:
        output_stream.write("Checking distribution %s: " % filename)
        package = PackageFile.from_filename(filename, comment=None)
        failure |= check_package(package, output_stream)

That way, you could unit-test check_package with a PackageFile stub directly, and maybe not have to monkeypatch anything.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take a look. I was mostly just following the existing structure trying to keep the changes pretty minimal.

But I guess I can give it a wider look and just revert back if the maintainer prefers the more conservative approach

tests/test_check.py Outdated Show resolved Hide resolved
twine/commands/check.py Outdated Show resolved Hide resolved

output_stream = check.StringIO()
check.check("dist/*", output_stream=output_stream)
assert output_stream.getvalue() == (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this assert be a little narrower? I.E., just check for the expected error content, rather than the whole message? I received similar feedback on a recent PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean maybe? I could basically check for the existence of warning; long_description_content_type invalid to ensure I hit the line. But it would be behaving pretty differently then the other tests of the file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gotten the impression that there's leeway to deviate from the existing test structures, with this sort of thing being an example. I think your proposed check would be sufficient.

@Bachmann1234
Copy link
Author

Bachmann1234 commented Jul 7, 2019

Restructured the check logic a bunch. I think it's cleaner now and potentially prepared to take on other functionality if there are thoughts. Though I do worry I have greatly expanded the initial scope of the PR.

That being said im game to keep working with it. If people like the bigger change and have further comments ill keep going. Or if they like the idea but would prefer a smaller change I could revert back. Either way. (the original PR was just this initial commit 76efb49)

Thanks @bhrutledge for the feedback so far

output_stream
)
description = _get_description(metadata, output_stream)
failure = _description_will_not_render(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now a render failure is the only thing that triggers a a failure. I personally think the description type being provided but invalid should be a failure as well but I wanted to wait for feedback before I did that. See #473 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree, but maybe there are some other considerations that we don't know about.

With that in mind, I think it will be easier to adjust the failure modes if the utility functions above were inlined here, as they were in check(). It would also be less of a structural change.

Related to that, the _get_* functions feel like they're doing two things: retrieving a value, and checking its validity. If they don't get inlined, I think it'd be more flexible if they were just validators (possibly with the warning output), similar to _description_will_not_render.

What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure. They warn you if something is potentially strange but its basically just a log line. The check_package function does not really depend on any of that that other than just getting the value. That's why I see them as getters.

If the validation was actually relevant to the check_package function I could see either separating it out or inlining that validation to check_package

actually saying that makes me wanna go ahead and make the type check a proper validation. Since unlike the other two warnings. It actually does matter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and inlined the getters. What pushed me over the edge was realizing the order which output stream gets written to does matter. Given this I did not want to be passing around output_stream and having functions return both their values and any warnings they wanted to emit felt like a bridge too far.

twine/commands/check.py Outdated Show resolved Hide resolved

output_stream = check.StringIO()
check.check("dist/*", output_stream=output_stream)
assert output_stream.getvalue() == (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gotten the impression that there's leeway to deviate from the existing test structures, with this sort of thing being an example. I think your proposed check would be sufficient.

failed = check_package(package, output_stream)
assert not failed # Invalid Description type is currently just a warning
assert output_stream.getvalue() == (
'warning; `long_description_content_type` invalid.\n'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'warning; `long_description_content_type` invalid.\n'
'warning: `long_description_content_type` invalid.\n'

output_stream
)
description = _get_description(metadata, output_stream)
failure = _description_will_not_render(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree, but maybe there are some other considerations that we don't know about.

With that in mind, I think it will be easier to adjust the failure modes if the utility functions above were inlined here, as they were in check(). It would also be less of a structural change.

Related to that, the _get_* functions feel like they're doing two things: retrieving a value, and checking its validity. If they don't get inlined, I think it'd be more flexible if they were just validators (possibly with the warning output), similar to _description_will_not_render.

What do you think?

@sigmavirus24
Copy link
Member

So years ago, PyPI didn't accept markdown and only in the last 2-3 years did they start. There's nothing to prevent PyPI potentially adding support for other doc formats (asciidoc, orgmode, etc.) except for someone implementing it and there being a sufficient use-case to support it. Provided that, I don't think a strict error here (without some sort of escape hatch) is necessary. I fully trust users to know what they're doing and if a user is stuck to some older version of twine because that's what their operating system provides and they can't install a newer version (or don't want to), should we really prevent them from uploading their package to that future PyPI?

@Bachmann1234
Copy link
Author

That's a good point. Would you be interested in downgrading this to a warning? Or should I close this?

@sigmavirus24
Copy link
Member

Oh wait, am I remembering correctly that this is automagically ran on upload? If not, ignore my concerns about things breaking for folks in the future.

@Bachmann1234
Copy link
Author

I dont think upload runs check. I only found this command after googling around trying to find out why my package was not uploading.

@Bachmann1234
Copy link
Author

Bachmann1234 commented Jul 8, 2019

Yeah, here is a test I ran (using my branch. I can re run on master if you would like)

I introduced a syntax error into a readme on a project of mine. Verified the error with twine check

Then I tried to upload a version (the version already existed so I expected it to be rejected but it was rejected due to the description instead of the expected reason)

If check was running as part of upload I would not have expected a request to have been made at all

❯ ~/code/twine/venv/bin/twine check dist/marshmallow-polyfield-5.7.tar.gz
Checking distribution dist/marshmallow-polyfield-5.7.tar.gz: Failed
The project's long_description has invalid markup which will not be rendered on PyPI. The following syntax errors were detected:
line 1: Severe: Title overline & underline mismatch.

=====================
Marshmallow-Polyfield
======================

❯ ~/code/twine/venv/bin/twine upload dist/marshmallow-polyfield-5.7.tar.gz --verbose
Uploading distributions to https://upload.pypi.org/legacy/
Uploading marshmallow-polyfield-5.7.tar.gz
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19.1k/19.1k [00:00<00:00, 25.4kB/s]
Content received from server:
<html>
 <head>
  <title>400 The description failed to render for 'text/x-rst'. See https://pypi.org/help/#description-content-type for more information.</title>
 </head>
 <body>
  <h1>400 The description failed to render for 'text/x-rst'. See https://pypi.org/help/#description-content-type for more information.</h1>
  The server could not comply with the request since it is either malformed or otherwise incorrect.<br/><br/>
The description failed to render for &#x27;text/x-rst&#x27;. See https://pypi.org/help/#description-content-type for more information.


 </body>
</html>
HTTPError: 400 Client Error: The description failed to render for 'text/x-rst'. See https://pypi.org/help/#description-content-type for more information. for url: https://upload.pypi.org/legacy/

@di
Copy link
Member

di commented Jul 9, 2019

FWIW, I think the "right" way to go about this would be to do what I suggested in #430: add the ability to validate metadata in pypa/packaging, and then validate all the metadata for a given distribution with twine check (not just Long-Description-Content-Type).

I don't think twine should be embedding any logic about what is and isn't valid metadata.

@Bachmann1234
Copy link
Author

I think that makes a lot of sense. I am not super informed in the warehouse/packaging/pip development structures but from what I can tell adding full metadata validation seems to be a larger problem that may take some time. Though it looks like its being worked on based on pypa/packaging#147 and pypa/packaging-problems#264 . Reading though all that its probably something best left to the core team given their experience.

Given that, I could close this, I think the only potential reason to keep it is that right now check depends on this field being correctly specified to do its checking properly.

If you put in an invalid type the tool will just go use rst which seems like a strange choice to me. I get defaulting to rst if type is missing since that used to be the only rendering method. However, if I put in something unsupported having the check give back rst rendering errors seems counter intuitive.

if description in {None, 'UNKNOWN\n\n\n'}:
output_stream.write('warning: `long_description` missing.\n')
description = None
failures = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize the future of this PR is in question, but I'm curious about the decision to inline some validation, and put others in helper functions. FWIW, I'd find this easier to follow if they were all in check_package, and given the uncertainty around what should be a warning or a failure, I think it'd be easier to change in the future.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking here is that warning being emitted is not relevant to the final result of check_package. It does not feel like a validation so much as a log line since a package cannot fail due to this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants