Allow option to ignore redirects warnings #54

dereknola · 2024-10-02T17:55:55Z

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Problem

There are many redirects that are considered valid for our website https://github.com/k3s-io/docs.
With the bump to 2.0.0 we are now seeing warning such as:

30:3-30:80      warning Unexpected redirecting URL `https://etcd.io/docs/latest/op-guide/recovery/`, expected final URL `https://etcd.io/docs/v3.5/op-guide/recovery/`

It is very common for /stable/ or /latest/ urls to auto redirect to the version in question.

See https://github.com/k3s-io/docs/actions/runs/11113288516 for all the specific errors we ran into suddenly.

Solution

Add an option to ignore/skip the check on whether a url has been redirected. I don't care that the URL redirects, I only care that the URL hits a 404 or does not exist.

Alternatives

Currently we have simply started running this plugin without --frail and will manually check whether or not some URLs are actually 404/dead.

The text was updated successfully, but these errors were encountered:

wooorm · 2024-10-02T18:20:26Z

Hmm, I can see that some redirects are useful indeed. Though, I am sure some are to be avoided too? Could there be a regex pattern system for redirects?

wooorm · 2024-10-03T08:26:37Z

Also, could different status codes be used to differentiate?

jrfnl · 2024-10-07T03:54:43Z

This is really a problematic change as it basically prohibits canonical links which always redirect to the latest version of something.

wooorm · 2024-10-07T09:20:47Z

Can you read the above comments and add something useful to the conversion... You likely don’t want some redirects. You likely want to allow some redirects. How could we make that distinction?

jrfnl · 2024-10-07T13:32:24Z

Can you read the above comments and add something useful to the conversion...

I can and did. Can you read the README and CHANGELOG ?

Nowhere in the documentation of this project does it mention anything about forbidding redirection of URLs.

Now, I get that redirects should only be followed up to a point (2, maybe 3 redirects), but forbidding/failing on any redirect is an undocumented change in behaviour (not mentioned in the README or changelog) and should therefor be considered a bug.

It is also outside the scope of this plugin. People use this package for what it is says it does: check for dead URLs. URLs which redirect to a valid URL are not dead.

You likely don’t want some redirects. You likely want to allow some redirects. How could we make that distinction?

You can't, but you shouldn't have to. The only distinction you need to make if it a URL redirects to a 4## or 5## code or gets into a redirect loop. Anything else should not be considered problematic.

wooorm · 2024-10-07T14:27:50Z

ok well it’s an intended major change to improve docs — here’s your money back; feel free to use other things.

In the previous comments I was discussing which how more useful features can be added

You can't

I think it might be possible! there’s differences between 301 and 302 and 303 and 307 and 308.

jrfnl · 2024-10-07T15:23:20Z

here’s your money back; feel free to use other things.

Seriously ? Look I'm a maintainer too and I feel your pain, but these type of remarks are not helpful and only go to alienate people who do want to help (and understand how OS works).

I use this package in an action, often run on a weekly cronjob, to ensure that URLs in the README, CHANGELOG and other docs are not broken when end-users browse them.

I've had to make emergency fixes to a dozen or so projects over the last few days because of this undocumented change as otherwise contributors to my projects would be blocked by failing builds.

However, as canonical links are now "forbidden", this means these "fixed" links will soon go out of date and the docs of my projects will point to outdated information, which I then have to fix again.

Now, I realize that that is not your concern, but I just want to give you some perspective of the consequences of this bug.

In the previous comments I was discussing which how more useful features can be added

You can't

I think it might be possible! there’s differences between 301 and 302 and 303 and 307 and 308.

Well, that presumes sites actually send the correct HTTP code ;-) Then again, I've seen 404 pages being served with a 200 status, so I suppose that's something we shouldn't even consider.

I think limiting to the 301 and 308 statuses ("Moved Permanently"/"Permanent Redirect") may already be an improvement, but a 308 might still flag canonical links.

Having said that, I would still prefer an option to turn the warning on redirection off completely, like the OP suggested above.

Add an option to ignore/skip the check on whether a url has been redirected. I don't care that the URL redirects, I only care that the URL hits a 404 or does not exist.

wooorm · 2024-10-07T15:45:13Z

Yes, seriously. You should have used versioning: this was in a major release. I understand that the Python ecosystem has different traditions. But you should know that about JS too. This is between an abandoned v1 up to a maintained rewritten v2 with breaking changes.

Users are good at raising problems. I don‘t think they are always good at coming up with solutions. I think I am in a better position, as a maintainer, for that.
To come up with a good solution, I would like to hear more about your situation.
I also believe in first investigating several possible solutions, and weighing trade offs. See also https://whatwg.org/faq#adding-new-features.

If you see OPs log, you will see that they indeed use several intentionally redirecting URLs. But there are also redirecting URLs that can be improved.

I would appreciate it if you can share what your inputs are. What the actual current output is. And what your expected output is. Perhaps you can even look at which status codes you are getting. Thanks

jrfnl · 2024-10-07T17:15:35Z

Yes, seriously. You should have used versioning: this was in a major release.

That's a fair point, if it wasn't for predefined action runners and Docker containers including this package and me not having any control over those.

And even if I changed that setup to directly request the packages from within the action workflows (which I have done for some), Dependabot does not keep those up to date, which makes that a maintenance nightmare in the making.

I understand that the Python ecosystem has different traditions. But you should know that about JS too.

Interesting about the Python ecosystem. I have no idea about that at all. In the PHP world, using versioning is common and straight-forward.

The problem with versioning in the JS world - IMO - is that it appears to be a rule that a package can't be released without at least a 1000 dependencies, leading to regular dependency version conflicts, also known as "dependency hell".

Please note: this is not critism of this package, this is just a generic observation on JS/Node and dependencies.

This is between an abandoned v1 up to a maintained rewritten v2 with breaking changes.

Well, thank you for that. I had no idea the package was abandoned and can only compliment you on taking it on.

Regarding breaking changes: it would be great if the consequences of those were annotated in a human understandable way in the changelog/release.

Users are good at raising problems. I don‘t think they are always good at coming up with solutions. I think I am in a better position, as a maintainer, for that. To come up with a good solution, I would like to hear more about your situation. I also believe in first investigating several possible solutions, and weighing trade offs. See also https://whatwg.org/faq#adding-new-features.

If you see OPs log, you will see that they indeed use several intentionally redirecting URLs. But there are also redirecting URLs that can be improved.

I would appreciate it if you can share what your inputs are. What the actual current output is. And what your expected output is. Perhaps you can even look at which status codes you are getting. Thanks

Happy to. Would you like me to share some links to failing builds ? Or would you prefer examples ?

To start, here are some examples of what I would say are redirections which should be left alone:

Canonical	Redirects to	HTTP Status [*]
https://keepachangelog.com/	https://keepachangelog.com/en/1.1.0/	200 (with `<meta http-equiv=refresh...>` in the HTML header causing the redirect)
https://coveralls.io/repos/github/PHPCSStandards/PHPCSExtra/badge.svg	https://s3.amazonaws.com/assets.coveralls.io/badges/coveralls_100.svg	302
http://getcomposer.org/	https://getcomposer.org/	302
https://github.com/Yoast/PHPUnit-Polyfills/issues/new/choose	https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2FYoast%2FPHPUnit-Polyfills%2Fissues%2Fnew%2Fchoose	302

Here are some which can be argued should be reported (and are):

Requested URL	Redirects to	HTTP Status [*]
https://eslint.org/docs/rules/no-lonely-if	https://eslint.org/docs/latest/rules/no-lonely-if	301
http://semver.org/	https://semver.org/	301
https://img.shields.io/packagist/php-v/phpcsstandards/phpcsextra.svg	https://img.shields.io/packagist/dependency-v/phpcsstandards/phpcsextra/php.svg	301 (though to be honest, I'm not sure I want to care about the shields.io endpoint being changed when it still works via the redirect)

*: Status codes determined via a Curl request to the original URLs.

I'm also seeing anchors being reported as missing when the target website uses href="#anchor" type links with anchors defined in data- attributes (and the site handles those correctly).

Example:

9:1-9:177    warning Unexpected dead URL `https://packagist.org/packages/phpcsstandards/phpcsextra#dev-develop`, expected live URL                                                                                                                   no-dead-urls   remark-lint
  [cause]:
             error   Unexpected missing anchor element on `https://packagist.org/packages/phpcsstandards/phpcsextra` for fragment `dev-develop`, remove if unneeded or refer to an existing element

The https://packagist.org/packages/phpcsstandards/phpcsextra#dev-develop URL works perfectly fine and is based on the following HTML code in the target page:

<li class="details-toggler version" data-version-id="dev-develop" data-load-more="/versions/3558491.json">
                <a href="#dev-develop" class="version-number">dev-develop                        / 1.x-dev
                    </a>

Other errors I'm seeing intermittently which may be related:

Any links to https://poser.pugx.org/ (which are working perfectly fine) are marked as dead with an error along the lines of:

5:2-5:85      warning Unexpected dead URL `https://poser.pugx.org/phpcsstandards/phpcsutils/v/stable`, expected live URL                                                                                                                                                                                 no-dead-urls   remark-lint
  [cause]:
              error   Unexpected error fetching `https://poser.pugx.org/phpcsstandards/phpcsutils/v/stable`                                                                                                                                                                                              fetch          dead-or-alive
  [cause]:
    AbortError: This operation was aborted
    at fetch (/home/runner/work/PHPCSUtils/PHPCSUtils/node_modules/undici/index.js:112:13)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:545:9)
    at process.processTimers (node:internal/timers:519:7)
    at async deadOrAliveInternal (file:///home/runner/work/PHPCSUtils/PHPCSUtils/node_modules/dead-or-alive/lib/index.js:206:16)
    at async deadOrAlive (file:///home/runner/work/PHPCSUtils/PHPCSUtils/node_modules/dead-or-alive/lib/index.js:124:17)
    at async file:///home/runner/work/PHPCSUtils/PHPCSUtils/node_modules/remark-lint-no-dead-urls/lib/index.js:152:22
    at async Promise.all (index 0)
    at async rule (file:///home/runner/work/PHPCSUtils/PHPCSUtils/node_modules/remark-lint-no-dead-urls/lib/index.js:148:3)

Hope this helps.

wooorm · 2024-10-08T09:00:24Z

I thought you were doing python but it’s php, ok!

I think you are wrong about JavaScript; take a look at the 100s of projects that are being maintained by us. Or see https://github.com/wooorm/npm-high-impact/blob/main/lib/top.js. They don’t have 1000s of dependencies. I also think that what you call “dependency hell” is the reason JavaScript is more popular than PHP/Ruby/etc. And, the main problems that the JS ecosystem has — and it has many — I ascribe to being so popular.

If docker makes it hard to version things: don’t use docker. You can npm install remark-cli@9 remark-lint-no-dead-urls@1.

I'm also seeing anchors being reported as missing when the target website uses

Pass checkAnchor: false. Or pass anchorAllowlist. Or get the website to support user-content- prefixes on ids instead of data attributes (see resolveClobberPrefix) https://github.com/wooorm/dead-or-alive#options

Other errors I'm seeing intermittently which may be related:

It may be. Or see the options https://github.com/wooorm/dead-or-alive#options. If the website is slow, perhaps tweak the timeout option. Or ignore it if they are flakey

wooorm · 2024-10-08T09:11:33Z

*: Status codes determined via a Curl request to the original URLs.

Thanks for these! So this looks like we can differentiate between 301 and 302!
302 is acceptable to keep in docs. 301 should be changed.

As for your note on shield.io: I kinda get it. But every http request also takes time for your users. HTTP on 301:

The server is suggesting that a user agent with link-editing capability can permanently replace references to the target URI with one of the new references sent by the server. However, this suggestion is usually ignored unless the user agent is actively editing references (e.g., engaged in authoring content) […]

https://datatracker.ietf.org/doc/html/rfc9110#section-15.4.2

wooorm · 2024-10-08T09:25:55Z

http -> https for getcomposer.org is the same as for semver.org—both a good idea I think—but the former uses 302 and the latter 301. Sad!

wooorm · 2024-10-08T09:48:23Z

OK, I’ll publish a fix in a second. Where now this tool warns for each link in the following document:

# URLs

## Fine

[a](https://keepachangelog.com/)

[b](https://coveralls.io/repos/github/PHPCSStandards/PHPCSExtra/badge.svg)

[c](http://getcomposer.org/)

[e](https://github.com/Yoast/PHPUnit-Polyfills/issues/new/choose)

## Changable

[f](https://eslint.org/docs/rules/no-lonely-if)

[g](http://semver.org/)

[h](https://img.shields.io/packagist/php-v/phpcsstandards/phpcsextra.svg)

It will then warn for the last 3 URLs, as they are permanently redirected (solely 301 or 308s). This tool then proposes:

# URLs

## Fine

[a](https://keepachangelog.com/)

[b](https://coveralls.io/repos/github/PHPCSStandards/PHPCSExtra/badge.svg)

[c](http://getcomposer.org/)

[e](https://github.com/Yoast/PHPUnit-Polyfills/issues/new/choose)

## Changable

[f](https://eslint.org/docs/latest/rules/no-lonely-if)

[g](https://semver.org/)

[h](https://img.shields.io/packagist/dependency-v/phpcsstandards/phpcsextra/php.svg?)

Which then works without warnings

jrfnl · 2024-10-08T14:22:34Z

@wooorm

You can `npm install remark-cli@9 remark-lint-no-dead-urls@1

Will start using that more diligently in the repos where I converted the workflow away from Docker already - in combination with watching releases of all those repos as I won't be able to rely on Dependabot. Thanks.

Pass checkAnchor: false. Or pass anchorAllowlist.

Thanks for that suggestion. I think I'll go with the anchorAllowlist for now. I'll have a play to see if I can figure out how to configure that. (sorry, yaml for js configs is often still a mystery to me)

Or get the website to support user-content- prefixes on ids instead of data attributes (see resolveClobberPrefix) https://github.com/wooorm/dead-or-alive#options

http -> https for getcomposer.org is the same as for semver.org—both a good idea I think—but the former uses 302 and the latter 301. Sad!

In both these cases, it's the same team behind them, so I'll open some issues for them and see if I can convince them to update both. Might take a while, but still worth a try ;-)

It may be. Or see the options https://github.com/wooorm/dead-or-alive#options. If the website is slow, perhaps tweak the timeout option. Or ignore it if they are flakey

Yes, currently ignoring. Retriggering the workflow normally fixes it. Was just weird as I can't remember seeing that one before 2.0.

OK, I’ll publish a fix in a second. Where now this tool warns for each link in the following document:
...
It will then warn for the last 3 URLs, as they are permanently redirected (solely 301 or 308s). This tool then proposes:
...
Which then works without warnings

That's awesome! Thank you so much for listening and addressing this!

Related-to: remarkjs/remark-lint-no-dead-urls#54.

Related-to: GH-54.

wooorm · 2024-10-08T15:37:30Z

OK, can you both please check out https://github.com/remarkjs/remark-lint-no-dead-urls/releases/tag/2.0.1? Non-permanent redirects are now fine. In both your logs there’s a ton of things that I believe should be improved in the docs (or, sometimes, on the servers). Is this new behavior sufficient? Is something else needed? Is an option to allow permanent redirects too still needed (if so: please provide strong arguments). Thanks!

jrfnl · 2024-10-08T16:00:16Z

@wooorm Currently on my way to a conference, so probably won't get to this until I'm back. I've made a note to revert the "fix" commits I made earlier after the 2.0 release and to re-evaluate the run results.
Will let you know how I get on.

wooorm · 2024-10-10T12:31:21Z

OK I’ll close this as I think it‘s done. But open to further discussion!

jrfnl · 2024-10-13T23:48:55Z

@wooorm Reporting back, looking good so far, but I didn't manage to get the anchorAllowlist working. Might be good to add some example code to the Dead-or-Alive README to show what the format should look like (for the non-json experts amongst us).

The below did not work for me:

    [
        "remark-lint-no-dead-urls",
        {
            "deadOrAliveOptions": {
                "anchorAllowlist": [
                    ["https://packagist.org/packages/phpcsstandards/phpcsextra", "dev-develop"]
                ],
                "maxRetries": 3
            }
        }
    ],

wooorm · 2024-10-14T12:33:36Z

Right!

Regexes are expected. JSON and YAML don’t support regular expressions. In JS it would be something like:

anchorAllowlist: [
  [/^https:\/\/packagist\.org\/packages\/phpcsstandards\/phpcsextra\/$/i, /^dev-develop$/i]
]

…but the point of using regexes is that you can use expressions to match different things instead of single, literal, values

dereknola · 2024-10-14T17:12:25Z

Thanks for fixing this so quickly. You are correct our website did have a bunch of links that actually needed to be fixed. Now on to figure out why www.mysql.com returns a 403 🤔

ChristianMurphy · 2024-10-14T17:21:57Z

Curl might offer some pointers curl -v -L follows redirects and prints all the statuses along the way

for your example

curl -v -L www.mysql.com

I don't see a 403, but I do see an 301 MOVED PERMANENTLY upgrading from HTTP to HTTPS.

wooorm · 2024-10-15T08:48:37Z

I get a 403 as well. Some garbage old HTML with:

            <td width="68%" valign="top">
            <div align="center">
            <p align="justify"><font face="Arial, Helvetica, sans-serif">This site http://www.mysql.com/ </font><font face="Arial, Helvetica, sans-serif"> is experiencing technical difficulty. We are aware of the issue and are working as quick as possible to correct the issue. <br />
            <br />
            We apologize for any        inconvenience this may have caused. <br />
            <br />
            To speak with an Oracle sales representative: 1.800.ORACLE1.<br />
            <br />
            To contact Oracle Corporate Headquarters from anywhere in the world: 1.650.506.7000.<br />
            <br />
            To get technical support in the United States: 1.800.633.0738. </font><br />
            &nbsp;</p>
            </div>

Well. that’s the web 🤷‍♂️ Maybe contact an Oracle rep 😉

github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Oct 2, 2024

wooorm added a commit to wooorm/dead-or-alive that referenced this issue Oct 8, 2024

Add permanent field for permanents redirects

5e29db9

Related-to: remarkjs/remark-lint-no-dead-urls#54.

wooorm added a commit that referenced this issue Oct 8, 2024

Fix to ignore non-permanent redirects

044382d

Related-to: GH-54.

wooorm closed this as completed Oct 10, 2024

wooorm added the 💪 phase/solved Post is done label Oct 10, 2024

github-actions bot removed the 🤞 phase/open Post is being triaged manually label Oct 10, 2024

This was referenced Oct 13, 2024

Partially revert "README/Changelog: fix a few URLs which have changed" Yoast/PHPUnit-Polyfills#207

Merged

Partially revert PR #621 PHPCSStandards/PHP_CodeSniffer#629

Merged

This was referenced Oct 13, 2024

Partially revert "README/Changelog: fix a few URLs which have changed" PHPCSStandards/PHPCSUtils#631

Merged

Partially revert "README/Changelog: fix a few URLs which have changed" PHPCSStandards/PHPCSExtra#319

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow option to ignore redirects warnings #54

Allow option to ignore redirects warnings #54

dereknola commented Oct 2, 2024

wooorm commented Oct 2, 2024

wooorm commented Oct 3, 2024

jrfnl commented Oct 7, 2024

wooorm commented Oct 7, 2024 •

edited

Loading

jrfnl commented Oct 7, 2024

wooorm commented Oct 7, 2024

jrfnl commented Oct 7, 2024

wooorm commented Oct 7, 2024

jrfnl commented Oct 7, 2024

wooorm commented Oct 8, 2024

wooorm commented Oct 8, 2024

wooorm commented Oct 8, 2024

wooorm commented Oct 8, 2024

jrfnl commented Oct 8, 2024

wooorm commented Oct 8, 2024

jrfnl commented Oct 8, 2024

wooorm commented Oct 10, 2024

jrfnl commented Oct 13, 2024

wooorm commented Oct 14, 2024

dereknola commented Oct 14, 2024

ChristianMurphy commented Oct 14, 2024

wooorm commented Oct 15, 2024 •

edited

Loading

Allow option to ignore redirects warnings #54

Allow option to ignore redirects warnings #54

Comments

dereknola commented Oct 2, 2024

Initial checklist

Problem

Solution

Alternatives

wooorm commented Oct 2, 2024

wooorm commented Oct 3, 2024

jrfnl commented Oct 7, 2024

wooorm commented Oct 7, 2024 • edited Loading

jrfnl commented Oct 7, 2024

wooorm commented Oct 7, 2024

jrfnl commented Oct 7, 2024

wooorm commented Oct 7, 2024

jrfnl commented Oct 7, 2024

wooorm commented Oct 8, 2024

wooorm commented Oct 8, 2024

wooorm commented Oct 8, 2024

wooorm commented Oct 8, 2024

jrfnl commented Oct 8, 2024

wooorm commented Oct 8, 2024

jrfnl commented Oct 8, 2024

wooorm commented Oct 10, 2024

jrfnl commented Oct 13, 2024

wooorm commented Oct 14, 2024

dereknola commented Oct 14, 2024

ChristianMurphy commented Oct 14, 2024

wooorm commented Oct 15, 2024 • edited Loading

wooorm commented Oct 7, 2024 •

edited

Loading

wooorm commented Oct 15, 2024 •

edited

Loading