Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

link check: CI verision does not work properly #1148

Closed
shcheklein opened this issue Apr 15, 2020 · 16 comments · Fixed by #1169
Closed

link check: CI verision does not work properly #1148

shcheklein opened this issue Apr 15, 2020 · 16 comments · Fixed by #1169
Assignees
Labels
A: website Area: website good first issue Good for newcomers help wanted Contributors especially welcome p1-important Active priorities to deal within next sprints 🐛 type: bug Something isn't working.

Comments

@shcheklein
Copy link
Member

@shcheklein shcheklein added 🐛 type: bug Something isn't working. A: website Area: website good first issue Good for newcomers help wanted Contributors especially welcome p1-important Active priorities to deal within next sprints labels Apr 15, 2020
@shcheklein
Copy link
Member Author

@casperdcl

I think a few things are wrong with these scripts:

changed="$differ --name-only :^redirects-list.json" 

should it be

changed=$("$differ --name-only :^redirects-list.json" )
  1. Here:
$(dirname "$0")/link-check.sh <($differ -U0 -- "$file" | grep '^\+')

This a nice stuff, but I think it creates a temp fd that you can read only once - means that only the first link from the diff is detected.

Not sure if there are any other problems ...

@shcheklein
Copy link
Member Author

Also,

sed -nE 's/.*href=["'"'"'](\/[^"'"'"']+?)["'"'"'].*/\1/p' "$@" | xargs -n1 -II echo ${base_url}I

sed on Mac does not like +? - what was your intention here?

@casperdcl
Copy link
Contributor

casperdcl commented Apr 16, 2020

changed="$differ --name-only :^redirects-list.json" 

should it be

changed=$("$differ --name-only :^redirects-list.json" )

Absolutely not. No idea what this is. Definitely not a valid command. Accidental garbage committed in 847d4ad ? @pavelgrinchenko did you mean to do this? The original line was simply: changed="$($differ --name-only)"

Were you trying to exclude the redirects-list.json file? (In that case you should use the same method used to exclude the exclude-links.txt file.)

$(dirname "$0")/link-check.sh <($differ -U0 -- "$file" | grep '^\+')

This a nice stuff, but I think it creates a temp fd that you can read only once - means that only the first link from the diff is detected.

Yes, I was very aware of this and opted for lazy programming since it's only ever an issue with running on git-diffs (a bit of extra pain for devs to fix one link at a time if there are multiple broken links in a single diff. I'd query "Y IZ MOAARR BODEN LINKZ IN UR PR PLZ? HAZ PAIN FIX PLZ"). I can definitely fix this though.

@casperdcl casperdcl self-assigned this Apr 16, 2020
@shcheklein
Copy link
Member Author

Absolutely not. No idea what this is. Definitely not a valid command.

gotcha

Were you trying to exclude the redirects-list.json file?

yes, it looks like so

In that case you should use the same method used to exclude the exclude-links.txt

for redirects probably it's better to make an exception and exclude it in the script?

Yes, I was very aware of this and opted for lazy programming since it's only ever an issue with running on git-diffs

got it, as well ... if it is too much effort - let's keep it as is

@casperdcl
Copy link
Contributor

casperdcl commented Apr 16, 2020

sed on Mac does not like +? - what was your intention here?

Similar to #1123 (comment), this was never intended to be run locally, and certainly not on a mac.

Could probably replace +? with *?. The intention is to convert these two cases:

  • href="/relative-url" => https://dvc.org/relative-url
  • href='/relative-url' => https://dvc.org/relative-url

Using *? would mean:

  • href="/" => https://dvc.org/

rather than (currently, +?):

  • href="/" => (i.e. not tested)

I don't have particularly strong opinions on this.

@casperdcl
Copy link
Contributor

for redirects probably it's better to make an exception and exclude it in the script?

Yes I think we're both talking about the same thing. To be clear:

  • exclude-links.txt defines links to skip checking
  • the whole exclude-links.txt file itself is never checked
  • I'm suggesting also having the whole redirects-list.json file itself never checked

@shcheklein
Copy link
Member Author

I'm suggesting also having the whole redirects-list.json file itself never checked

yep, that would be great

@shcheklein
Copy link
Member Author

I don't have particularly strong opinions on this.

yep, same. But it looks it is the only thing (if pcre is installed and it's quite easy) that prevents this script from running on Mac (granted you didn't have intention to support it locally, but even for devs to debug the script itself you have to run it at least somehow ).

Also, forgive me my lack of regex knowledge, but why can't we just use + there (without making it lazy)?

@shcheklein
Copy link
Member Author

Btw, I think it would be great for exclude-list to support path to a file to exclude. Again, if it's not that much effort. It's not nice to hard code some file within the script itself.

@pavelgrinchenko
Copy link
Contributor

Were you trying to exclude the redirects-list.json file?

Yes. Why it works: https://stackoverflow.com/questions/5685007/making-git-log-ignore-changes-for-certain-paths/21079437#21079437
But you definitely can handle it in another way if you think it would be better

@casperdcl
Copy link
Contributor

casperdcl commented Apr 18, 2020

yes, it's

  1. not actually being executed (missing $())
  2. missing a crucial . --
  3. not supported before git 2.13 (so won't run on ubuntu 16.04 even after apt-get install git)
  4. fails if the exclusion pattern makes no difference.
$ git diff master --name-only  # correct, but no exclusion
scripts/link-check-git-diff.sh
$ git diff master --name-only ':^redirects-list.json'  # incorrect (current)
fatal: ambiguous argument ':^redirects-list.json': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
$ git diff master --name-only -- ':^redirects-list.json'  # requires git>=2.13
$ git diff master --name-only -- ':!redirects-list.json'  # still fails
fatal: There is nothing to exclude from by :(exclude) patterns.
Perhaps you forgot to add either ':/' or '.' ?
$ git diff master --name-only -- . ':!redirects-list.json'  # correct
scripts/link-check-git-diff.sh

Also probably we should have a file-exclusion list as well as a link-exclusion list (see #1000). Will fix.

@casperdcl
Copy link
Contributor

casperdcl commented Apr 18, 2020

Also, forgive me my lack of regex knowledge, but why can't we just use + there (without making it lazy)?

from the original #958, it's called recursive/balancing groups regex. Technically only .NET has full balancing groups support though some things (including PCRE) support recursion. There's been recent work on https://pypi.org/project/regex/ which means pip install regex may potentially work, but this would still 1. require python and 2. potentially not work (the last time I tried the regex lib it didn't have the features required).

@shcheklein
Copy link
Member Author

@casperdcl

echo 'href="/"'| sed -nE 's/.*href=["'"'"'](\/[^"'"'"']+)["'"'"'].*/\1/p'

outputs nothing (not tested)

echo 'href="/relative-url"'| sed -nE 's/.*href=["'"'"'](\/[^"'"'"']+)["'"'"'].*/\1/p'

outputs:

/relative-url

If that was the intention, what is the case we need the +? for? That's what I'm still missing probably.

@casperdcl
Copy link
Contributor

casperdcl commented Apr 18, 2020

oh I thought you were talking about +? => *?.

If you mean dropping the ?, that's fine, it makes no difference. Would you still need to replace + => * for macs, though?

@shcheklein
Copy link
Member Author

+? -> + works for me on Mac

@casperdcl
Copy link
Contributor

curious, ok. I've seen reports in the past where mac users needed * => +. Anyway will fix.

casperdcl added a commit that referenced this issue Apr 19, 2020
casperdcl added a commit that referenced this issue Apr 25, 2020
shcheklein pushed a commit that referenced this issue Apr 26, 2020
* test: fix link-check diffs

Fixes #1148

* test: link-check: fix file paths, add comments

* test: link-check: use git pathspec exclusions

* test: link-check: more path safety and comments

* test: link-check: misc tidy

* test: link-check: fix git diff multi-errors

* test: link-check: fix mac sed

* test: link-check: diff: whitelist rather than blacklist

As with `link-check-git-all.sh`, only include `md` & `js`
rather than include all except specified files

* link-check: add more file extensions

* link-check: add and sort exclusions

* link-check: add *.css

* link-check: re-exclude redirects-list.json
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: website Area: website good first issue Good for newcomers help wanted Contributors especially welcome p1-important Active priorities to deal within next sprints 🐛 type: bug Something isn't working.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants