--find-links should not warn about missing HTML5 doctype #10903

Closed
1 task done
virtuald opened this issue Feb 13, 2022 · 8 comments · Fixed by #10906
Labels
type: enhancement Improvements to functionality
Milestone

Comments

@virtuald

Description

This is not a duplicate of #10825 -- I've moved a number of my comments from #10825 as this is a separate issue. That issue is about enforcing PEP 503, which states that servers implementing the python simple index protocol should have an HTML5 doctype. This is not about that.


When using --find-links, this warning can appear:

× The package index page being used does not have a proper HTML doctype declaration.
╰─> Problematic URL: https://www.tortall.net/~robotpy/wheels/2022/roborio/

note: This is an issue with the page at the URL mentioned above.
hint: You might need to reach out to the owner of that package index, to get this fixed. See https://github.com/pypa/pip/issues/10825 for context.

Currently the --find-links documentation says:

  -f, --find-links <url>      If a URL or path to an html file, then parse for links
                              to archives such as sdist (.tar.gz) or wheel (.whl)
                              files. If a local path or file:// URL that's a
                              directory, then look for archives in the directory
                              listing. Links to VCS project URLs are not supported.

There is no HTML5 doctype requirement mentioned.


To me, --find-links serves a very different purpose than a full-up pypi-style index implementation. For environments where a full python index is too much work (or in corporate environments where working with IT is really difficult), it's very convenient to stick a bunch of files on a webserver and be able to point pip at an arbitrary directory listing and install packages from that directory. Unfortunately, the most popular webservers in the world (and even python's default http.server!) do not put an HTML5 doctype by default, because it simply does not matter if all you're doing is trying to show a directory listing so users can download a file.

You might say that it's currently only a warning, and that it'll be a long time before it becomes an error. But it's a useless warning, and the only way to fully resolve it is to go to every webserver vendor in the world and tell them that they must use an HTML5 doctype in their directory listings because pip says so. And then those changes need to be backported to 'stable' Linux distributions like RHEL.

In many corporate environments, developers don't get a choice of which webserver IT is using, and so this warning is just unnecessary noise and will waste hundreds of hours for developers and ops teams.

Production-quality web servers that don't emit HTML5 doctype by default

Others that don't

Those that do


I appreciate that html5lib adds a lot of work for pip maintainers. If there's a way to use html.parser and ignore the doctype (which the move from an error to a warning suggests there is), it seems like that would save hundreds (thousands?) of person-hours for ops teams all around the world who would otherwise need to figure out how to reconfigure their webservers because pip is being unnecessarily picky.
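To illustrate the idea of parsing a listing with html.parser while ignoring the doctype, here is a minimal sketch. It is not pip's implementation: the names `LinkCollector`, `find_archive_links`, and `ARCHIVE_SUFFIXES`, the suffix list, and the `example.invalid` URL are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical suffix list for this sketch; pip's real rules are more involved.
ARCHIVE_SUFFIXES = (".whl", ".tar.gz", ".zip")


class LinkCollector(HTMLParser):
    """Collect archive links from a page, whatever doctype it declares."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    # HTMLParser.handle_decl is a no-op by default, so any doctype
    # (HTML 3.2, HTML 4.01, HTML5, or none at all) is simply skipped.

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(ARCHIVE_SUFFIXES):
            # Resolve relative hrefs against the listing URL.
            self.links.append(urljoin(self.base_url, href))


def find_archive_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# A listing in the style of an autoindex page, with a pre-HTML5 doctype.
PAGE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html><body>
<a href="../">Parent Directory</a>
<a href="demo-1.0-py3-none-any.whl">demo-1.0-py3-none-any.whl</a>
<a href="demo-1.0.tar.gz">demo-1.0.tar.gz</a>
</body></html>"""

links = find_archive_links(PAGE, "https://example.invalid/wheels/")
```

The sketch deliberately skips details such as `#sha256=...` fragments on hrefs; the point is only that link extraction does not depend on the doctype.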

Thanks for your consideration.

Expected behavior

No warning

pip version

22.0.3

Python version

3.10

OS

any

How to Reproduce

N/A

Output

No response

Code of Conduct

@virtuald virtuald added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Feb 13, 2022
@pfmoore
Member

pfmoore commented Feb 13, 2022

This is a good point. I agree that --find-links should not enforce the doctype restriction.

A PR to fix this would be welcome.

@pradyunsg pradyunsg added type: enhancement Improvements to functionality and removed S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Feb 13, 2022
@pradyunsg
Member

Honestly, I'd be fine with dropping the doctype check entirely as well.

@ppena-LiveData

ppena-LiveData commented Feb 14, 2022

Dropping the doctype check would also fix Issue #10880, so I whole-heartedly agree with that idea.

@pombredanne
Contributor

My 2 cents, in full agreement and carrying over my comment from #10825 (comment)

FWIW, a plain HTTP directory listing now triggers a warning when used with --find-links https://thirdparty.aboutcode.org/pypi/ as a way to supplement PyPI with extra pre-built wheels that are missing on PyPI.

Looking in links: https://thirdparty.aboutcode.org/pypi/
warning: bad-index-doctype

This is the same issue as reported several times above.
Note that there is a perfectly valid doctype there:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

Maybe not an HTML5 doctype ... but a valid HTML doctype. I would really appreciate it if this deprecation did not force everyone to do busy work for something which is a non-issue IMHO ;)

In practice a warning on doctype is IMHO counterproductive and should be dropped. In the vast majority of cases, a user will not be able to do anything about it, so that's like crying wolf 🐺
It is similar to having Firefox always warn you about the quirks of the web. That would be neither fun nor useful. Stated another way, it is NOT a client's responsibility to fix the upstream server, and definitely not pip's role to police PyPI repositories.
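For readers who want to see the distinction being argued over, here is a small sketch that classifies a page's doctype using only the standard library. It is an illustration, not the check pip performed: `DoctypeSniffer` and `classify_doctype` are hypothetical names, and treating only a bare `<!DOCTYPE html>` as HTML5 is a simplification.

```python
from html.parser import HTMLParser


class DoctypeSniffer(HTMLParser):
    """Remember the first doctype declaration seen, if any."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "DOCTYPE html".
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def classify_doctype(html):
    """Return "html5", "legacy", or "missing" for the given page text."""
    sniffer = DoctypeSniffer()
    sniffer.feed(html)
    sniffer.close()
    if sniffer.doctype is None:
        return "missing"
    # Simplification: only the bare form counts as HTML5 here.
    if sniffer.doctype.strip().lower() == "doctype html":
        return "html5"
    return "legacy"


apache_style = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<html></html>'
html5_style = "<!doctype html>\n<html></html>"
bare_style = "<html></html>"

results = [classify_doctype(p) for p in (apache_style, html5_style, bare_style)]
```

Under these assumptions the HTML 3.2 listing quoted above lands in the "legacy" bucket: a valid doctype, just not the HTML5 one.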

@pombredanne
Contributor

@pradyunsg re:

Honestly, I'd be fine with dropping the doctype check entirely as well.

I think this makes the most sense to me.

@pradyunsg pradyunsg added this to the 22.0.4 milestone Feb 25, 2022
@fungi
Contributor

fungi commented Feb 25, 2022

Repeating my day-of comment from the other issue here since it was really about working around --find-links issues, and because the HTML 3.2 doctype mentioned above looks almost certain to be from Apache mod_autoindex: If it helps anyone in a similar situation, here's what we did in the OpenDev Collaboratory in order to coerce Apache's mod_autoindex into returning file lists for our wheel cache which pip 22.0 would tolerate: https://opendev.org/opendev/system-config/commit/e61f584

In short, configure it not to include its normal preamble, and add a custom header file which has the HTML 5 doctype in it instead.

@pombredanne
Contributor

@fungi Thanks! this is very useful, but IMHO we should not require the user of a web site to modify its core config just to serve plain links.

@fungi
Contributor

fungi commented Feb 25, 2022

@fungi Thanks! this is very useful, but IMHO we should not require the user of a web site to modify its core config just to serve plain links.

Yes, thanks, I should have been clear that we did it as a (hopefully temporary) workaround so that things wouldn't remain broken for users of latest pip while its maintainers work through this.

inmantaci pushed a commit to inmanta/inmanta-core that referenced this issue Mar 7, 2022
Bumps [pip](https://github.com/pypa/pip) from 22.0.3 to 22.0.4.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/pypa/pip/blob/main/NEWS.rst">pip's changelog</a>.</em></p>
<blockquote>
<h1>22.0.4 (2022-03-06)</h1>
<h2>Deprecations and Removals</h2>
<ul>
<li>Drop the doctype check, that presented a warning for index pages that use non-compliant HTML 5. (<code>[#10903](pypa/pip#10903) &lt;https://github.com/pypa/pip/issues/10903&gt;</code>_)</li>
</ul>
<h2>Vendored Libraries</h2>
<ul>
<li>Downgrade distlib to 0.3.3.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/pypa/pip/commit/29ddc93ee8f33f4e1c7402a16b32bdf5b61c7369"><code>29ddc93</code></a> Bump for release</li>
<li><a href="https://github.com/pypa/pip/commit/c14946891cf4b6adb6c33a18c65dc4627fc80aa4"><code>c149468</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/pypa/pip/issues/10943">#10943</a> from pradyunsg/downgrade-distlib</li>
<li><a href="https://github.com/pypa/pip/commit/6c17e27772f8e3d5d4e8732dc8c9e6e516a6575e"><code>6c17e27</code></a> Drop the doctype check (<a href="https://github-redirect.dependabot.com/pypa/pip/issues/10906">#10906</a>)</li>
<li>See full diff in <a href="https://github.com/pypa/pip/compare/22.0.3...22.0.4">compare view</a></li>
</ul>
</details>
mergify bot pushed a commit to andrewbolster/bolster that referenced this issue Mar 8, 2022

Bumps [pip](https://github.com/pypa/pip) from 22.0.2 to 22.0.4.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/pypa/pip/blob/main/NEWS.rst">pip's changelog</a>.</em></p>
<blockquote>
<h1>22.0.4 (2022-03-06)</h1>
<h2>Deprecations and Removals</h2>
<ul>
<li>Drop the doctype check, that presented a warning for index pages that use non-compliant HTML 5. (<code>[#10903](pypa/pip#10903) &lt;https://github.com/pypa/pip/issues/10903&gt;</code>_)</li>
</ul>
<h2>Vendored Libraries</h2>
<ul>
<li>Downgrade distlib to 0.3.3.</li>
</ul>
<h1>22.0.3 (2022-02-03)</h1>
<h2>Features</h2>
<ul>
<li>Print the exception via <code>rich.traceback</code>, when running with <code>--debug</code>. (<code>[#10791](pypa/pip#10791) &lt;https://github.com/pypa/pip/issues/10791&gt;</code>_)</li>
</ul>
<h2>Bug Fixes</h2>
<ul>
<li>
<p>Only calculate topological installation order, for packages that are going to be installed/upgraded.</p>
<p>This fixes an <code>AssertionError</code> that occurred when determining installation order, for a very specific combination of upgrading-already-installed-package + change of dependencies + fetching some packages from a package index. This combination was especially common in Read the Docs' builds. (<code>[#10851](pypa/pip#10851) &lt;https://github.com/pypa/pip/issues/10851&gt;</code>_)</p>
</li>
<li>
<p>Use <code>html.parser</code> by default, instead of falling back to <code>html5lib</code> when <code>--use-deprecated=html5lib</code> is not passed. (<code>[#10869](pypa/pip#10869) &lt;https://github.com/pypa/pip/issues/10869&gt;</code>_)</p>
</li>
</ul>
<h2>Improved Documentation</h2>
<ul>
<li>Clarify that using per-requirement overrides disables the usage of wheels. (<code>[#9674](pypa/pip#9674) &lt;https://github.com/pypa/pip/issues/9674&gt;</code>_)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/pypa/pip/commit/29ddc93ee8f33f4e1c7402a16b32bdf5b61c7369"><code>29ddc93</code></a> Bump for release</li>
<li><a href="https://github.com/pypa/pip/commit/c14946891cf4b6adb6c33a18c65dc4627fc80aa4"><code>c149468</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/pypa/pip/issues/10943">#10943</a> from pradyunsg/downgrade-distlib</li>
<li><a href="https://github.com/pypa/pip/commit/6c17e27772f8e3d5d4e8732dc8c9e6e516a6575e"><code>6c17e27</code></a> Drop the doctype check (<a href="https://github-redirect.dependabot.com/pypa/pip/issues/10906">#10906</a>)</li>
<li><a href="https://github.com/pypa/pip/commit/44018de50cafba25445a225c1a1986d6312e1ef3"><code>44018de</code></a> Bump for release</li>
<li><a href="https://github.com/pypa/pip/commit/65f096c432d60d5f0214793becd592e1c1c3b624"><code>65f096c</code></a> Update AUTHORS.txt</li>
<li><a href="https://github.com/pypa/pip/commit/7d50964bcb1b25f9fe2c49fe447ab58aad2b4247"><code>7d50964</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/pypa/pip/issues/10876">#10876</a> from mbacchi/vcs_support_typo</li>
<li><a href="https://github.com/pypa/pip/commit/ff8dbb458a59905c5462d339a63536257aad497a"><code>ff8dbb4</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/pypa/pip/issues/10867">#10867</a> from mauritsvanrees/maurits-topoligical-weights-req...</li>
<li><a href="https://github.com/pypa/pip/commit/b3f5cad73241e25a25ce7d50eb9175dbafcfd8db"><code>b3f5cad</code></a> Update news/10851.bugfix.rst</li>
<li><a href="https://github.com/pypa/pip/commit/cf4655f474cb8a04fa6b274ee0edaf774546a79b"><code>cf4655f</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/pypa/pip/issues/10869">#10869</a> from pradyunsg/put-html5lib-behind-flag</li>
<li><a href="https://github.com/pypa/pip/commit/3608b42ef0ab39a2d50335356644f8f3464f651a"><code>3608b42</code></a> Fix minor typo in vcs support doc</li>
<li>Additional commits viewable in <a href="https://github.com/pypa/pip/compare/22.0.2...22.0.4">compare view</a></li>
</ul>
</details>
inmantaci pushed a commit to inmanta/inmanta-core that referenced this issue Mar 28, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 30, 2022

6 participants