Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs/blog: use relative /doc links to avoid link checker errors #2222

Closed
iesahin opened this issue Feb 20, 2021 · 12 comments
Closed

docs/blog: use relative /doc links to avoid link checker errors #2222

iesahin opened this issue Feb 20, 2021 · 12 comments
Labels
A: docs Area: user documentation (gatsby-theme-iterative) C: blog TEMPORARY Content of /blog good first issue Good for newcomers type: enhancement Something is not clear, small updates, improvement suggestions

Comments

@iesahin
Copy link
Contributor

iesahin commented Feb 20, 2021

UPDATE: Jump to #2222 (comment)

There are links in the repository that could be written as local links.

These may fail in link-check when a URL updated in the site. e.g. When we update https://dvc.org/doc/a to https://dvc.org/doc/b, https://dvc.og/doc/b is not found by the link checker, because it's not deployed yet. If these links were in the form of /a and /b, the link checker would find them in the test deployment.

@jorgeorpinel
Copy link
Contributor

Can you clarify which links @iesahin ? I don't think any existing links to dvc.org/... fail in the CI checks. But if so this should probably be part of #1838 instead of a new issue, please specify which ones over there for now. We can reopen this if I'm not getting it correctly. Thanks

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Feb 22, 2021

UPDATE: I see there's 3 such links in the whole docs. Again, I doubt it's related to the link checker issues (so no need to follow up there). I'm reopening this but a quick PR in these cases would probably be faster.

@jorgeorpinel jorgeorpinel added A: docs Area: user documentation (gatsby-theme-iterative) type: enhancement Something is not clear, small updates, improvement suggestions labels Feb 22, 2021
@jorgeorpinel jorgeorpinel reopened this Feb 22, 2021
@iesahin
Copy link
Contributor Author

iesahin commented Feb 22, 2021

Most (maybe all) of them are in the blog. e.g. this

Searching https?://dvc gives me about 280 results and some of these are legit, in README.md for example. Normally these pose no problem but when we change some doc URLs like in #2096 we get broken link errors for these. They are searched in dvc.org instead of test deployment.

I can fix all of them at once, keeping the legit ones in Github Readme and Contributing docs intact and changing others but I would like to know how you and @shcheklein view to this. @jorgeorpinel

@jorgeorpinel
Copy link
Contributor

I meant 3 relevant instances i.e. outside of the blog. No need to update old blogs for this.

Searching https?://dvc gives me about 280 results

Idk where you're looking. I see 8. Maybe you're searching .js files or the public/ dir? Please limit your seach to content/docs:

image

@shcheklein
Copy link
Member

I meant 3 relevant instances i.e. outside of the blog. No need to update old blogs for this.

we do need to update links everywhere, we run a link check using cron and it detects broken links in docs and in blog. Unless I didn't get your point @jorgeorpinel :)

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Feb 22, 2021

Can someone share the failing check? I don't understand this issue. Old blog links never change so how is that a problem in the check? Even if the links are relative if that page goes away the link check will fail anyway right? (Should be changed when the source pages change unless a redirect is introduced) Maybe I'm not getting this...

@iesahin
Copy link
Contributor Author

iesahin commented Feb 22, 2021

I'm searching the whole repository, including the blog. An example of failure is #2214 that changes .../data-access to .../data-and-model-access in the docs but 3 links in the blog fail because of this. They are false positives for link failure. When we deploy new links and redirects to https://dvc.org, they will be gone but they cannot be found in test deployments. (e.g. htps.//test-deployment-for-pr.herokuapp.com/blog/page-a/ has links to dvc.org/doc/start/data-and-model-access but those are not there. Actually those links should be https://test-deployment-for-pr.herokuapp.com/doc/start/data-and-model-access)

Replacing makes link checks in test deployments to be more robust I think.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Feb 23, 2021

I see! Thanks for the example. Maybe an easier approach be to update those 3 blog links in #2214, and so on (do them as they come). But I guess changing all of them in a PR is also OK, may just be tedious to do and review. But yes, I have no problem with that 👍 So I looked again in all of content/ and my results now are:

  • 17 results of (http://dvc.org — should always be https. Questions:
    • We can leave the URLs without a path, right? I.e. https://dvc.org/
    • Several are missing the trailing /.
  • 6 results of (https://dvc.org/) — same top question
  • 185 results of \(https://dvc.org/..*\) (regex) — change all to relative (main issue here). Question:
    • Except dvc.org/chat, dvc.org/support, etc.? or should they also be relative (e.g. just /chat) cc @shcheklein ?

@iesahin
Copy link
Contributor Author

iesahin commented Feb 23, 2021

I'll fix in #2214 and will have separate PR's for the blog and docs about this. I think it's better to keep /chat or /support for now.

Some of the /doc/page#section links are also broken. Section names have changed but their old names are there in the links and AFAICG these are not provided in the redirects. Link checker doesn't report them because it's not 404 but those links just return the whole page. (I remember I fixed some of these but can't find an example right now.)

I think writing a small utility to check links with regexes to /doc/... and /blog/... with yarn is feasible. I need to learn a bit node.js for this and may not be the best person around, but seems doable. Or I may need to check the link-checker code to see if we can have some sort of checks for the sections as well.

Thanks.

@shcheklein
Copy link
Member

Some of the /doc/page#section links are also broken.

yep, it's a separate ticket to improve our link check to detect those. But let's not focus on this for now please :)

I think writing a small utility to check links with regexes to /doc/... and /blog/... with yarn is feasible.

to make them all relative? I think it can be another feature of the link checker indeed. I would also not focus on this too much. Link checker itself has way more important things to be fixed in the #1838

@iesahin
Copy link
Contributor Author

iesahin commented Feb 24, 2021

I have fixed the broken links in #2214.

Shall I continue updating the links?

BTW I read the Github action logs and it looks the whole repository link check takes more than 6 hours. It's ~21.000 seconds and it shouldn't take that long to check all the links I think. Can there be another problem with the link-checker? @shcheklein

@jorgeorpinel jorgeorpinel added the good first issue Good for newcomers label Mar 29, 2021
@jorgeorpinel
Copy link
Contributor

Shall I continue updating the links?

Updating some of the instanced mentioned in #2222 (comment) would be nice @iesahin, but it doesn't seem like there's a big issue to worry about, so up to you.

@jorgeorpinel jorgeorpinel changed the title Fix hardcoded links in the form of dvc.org/doc to links /doc to avoid some link check errors docs: change links from dvc.org/doc to /doc to avoid link check errors Sep 27, 2021
@jorgeorpinel jorgeorpinel changed the title docs: change links from dvc.org/doc to /doc to avoid link check errors docs: use relative /doc links to avoid link check errors Sep 27, 2021
@jorgeorpinel jorgeorpinel changed the title docs: use relative /doc links to avoid link check errors docs/blog: use relative /doc links to avoid link check errors Sep 27, 2021
@jorgeorpinel jorgeorpinel changed the title docs/blog: use relative /doc links to avoid link check errors docs/blog: use relative /doc links to avoid link checker errors Sep 27, 2021
@iesahin iesahin added the C: blog TEMPORARY Content of /blog label Oct 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: docs Area: user documentation (gatsby-theme-iterative) C: blog TEMPORARY Content of /blog good first issue Good for newcomers type: enhancement Something is not clear, small updates, improvement suggestions
Projects
None yet
Development

No branches or pull requests

3 participants