Getting 404 #1578

Closed
skshetry opened this issue Jul 15, 2020 · 30 comments
Assignees
Labels
p0-critical Affects users in a bad way at the moment 🐛 type: bug Something isn't working. website: eng-doc DEPRECATED JS engine for /doc

Comments

@skshetry
Member

skshetry commented Jul 15, 2020

I'm getting a 404 when opening direct links, and then the content is served from cache (I noticed because the sidebar breaks when the page is loaded from cache).

I thought it was because of Adblocker or messed-up local cache, but it happened in incognito too.

Open https://man.dvc.org and see.

Screenshot

Screenshot from 2020-07-15 21-42-35

GIF

Peek 2020-07-15 22-46

@jorgeorpinel
Contributor

jorgeorpinel commented Jul 15, 2020

Weird. Works for me:

image

It should redirect directly to https://dvc.org/doc/command-reference without / though.

@jorgeorpinel jorgeorpinel added the website: eng-doc DEPRECATED JS engine for /doc label Jul 15, 2020
@skshetry
Member Author

@jorgeorpinel, did you open it directly?

@jorgeorpinel
Contributor

What do you mean by directly?

Anyway, seems to be OK. See https://www.webpagetest.org/result/200715_E2_10035ae1c056835ef59f741bfef92943/

@jorgeorpinel
Contributor

It should redirect directly to https://dvc.org/doc/command-reference without / though.

P.S. This is probably not a problem either: we don't expect people to open just man.dvc.org without a command name. With a command, e.g. man.dvc.org/status, there's a single redirect.

@jorgeorpinel
Contributor

Try clearing your browser cache or force-reloading, @skshetry.

Should we investigate this further, @rogermparent? I think I've seen this happen before. Very rare, though.

@skshetry
Member Author

I'm still having this issue, also verified via curl

$ curl https://dvc.org/doc/command-reference -v > /dev/null
... truncated
< HTTP/2 404
< date: Wed, 15 Jul 2020 16:26:49 GMT
< content-type: text/html; charset=utf-8
< set-cookie: __cfduid=d27adaf397a935691abe064dc64a747141594830409; expires=Fri, 14-Aug-20 16:26:49 GMT; path=/; domain=.dvc.org; HttpOnly; SameSite=Lax; Secure
< x-powered-by: Express
< cache-control: public, max-age=0, s-maxage=999999
< vary: Accept-Encoding
< via: 1.1 vegur
< cf-cache-status: HIT
< age: 2196
< cf-request-id: 03f4e5e8420000ddf51414e200000001
< expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< server: cloudflare
< cf-ray: 5b34d8ed39feddf5-SIN
<
{ [967 bytes data]
100  158k    0  158k    0     0   185k      0 --:--:-- --:--:-- --:--:--  185k
* Connection #0 to host dvc.org left intact
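
The telling part of the output above is the combination of `HTTP/2 404` with `cf-cache-status: HIT` and a non-zero `age`: the 404 came from the CDN cache, not from a fresh request to the origin. A minimal Python sketch of that check follows, assuming `curl -v` output has been captured as text. The function names are illustrative and not part of any dvc.org tooling.

```python
from typing import Dict, Optional, Tuple

# A few lines copied from the curl -v output in this thread.
SAMPLE_OUTPUT = """\
< HTTP/2 404
< date: Wed, 15 Jul 2020 16:26:49 GMT
< cache-control: public, max-age=0, s-maxage=999999
< cf-cache-status: HIT
< age: 2196
<
"""


def parse_verbose_headers(curl_output: str) -> Tuple[Optional[int], Dict[str, str]]:
    """Extract the status code and response headers from `curl -v` text."""
    status: Optional[int] = None
    headers: Dict[str, str] = {}
    for line in curl_output.splitlines():
        # curl prefixes response header lines with "<".
        if not line.startswith("<"):
            continue
        body = line[1:].strip()
        if not body:
            continue
        if body.upper().startswith("HTTP/"):
            # Status line, e.g. "HTTP/2 404".
            status = int(body.split()[1])
        elif ":" in body:
            name, _, value = body.partition(":")
            headers[name.strip().lower()] = value.strip()
    return status, headers


def is_cached_error(status: Optional[int], headers: Dict[str, str]) -> bool:
    """True when an error status was answered from the CDN cache."""
    cache_hit = headers.get("cf-cache-status", "").upper() == "HIT"
    return status is not None and status >= 400 and cache_hit
```

Calling `is_cached_error(*parse_verbose_headers(SAMPLE_OUTPUT))` on the excerpt flags it as a cached error, which points at the cache rather than the origin server.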

@jorgeorpinel
Contributor

jorgeorpinel commented Jul 15, 2020

You're curling https://dvc.org/doc/command-reference though. Also works from a different remote location, see https://www.webpagetest.org/result/200715_SX_3acd3a61b3a3f6d7a6b37ad4ee585ff7/ 🤷

@rogermparent
Contributor

rogermparent commented Jul 15, 2020

I'm also trying to replicate with no luck. I can load the page fine through both browser and curl.
The description/screenshot of the symptom is just like the ghost page issue where cache would stick around for removed pages, but that issue's been fixed and this page isn't removed, so I can't see this stemming from that.

@shcheklein
Member

Works for me as well.

@skshetry
Member Author

This looks like a regional issue: https://www.webpagetest.org/result/200715_XY_dac6db890b7e93e642f98ca6a06316fb/

@rogermparent
Contributor

@skshetry Thanks, that was incredibly helpful! This sounds like some kind of CloudFlare caching issue on our end, which would explain why this looks similar to previous caching issues.

@shcheklein shcheklein added 🐛 type: bug Something isn't working. p0-critical Affects users in a bad way at the moment and removed question labels Jul 15, 2020
@shcheklein
Member

k, escalating it then. Need to figure out what's going on here.

@rogermparent
Contributor

I went into the CloudFlare UI and clicked the "Purge Cache" button targeting "man.dvc.org"

Re-running the test @skshetry provided from the same EC2 instance in India seems to indicate that this issue is fixed, but further confirmation may be needed because regional caching is tricky.
https://www.webpagetest.org/result/200717_BD_4aaa8142aa0d9a897ae701a02cc705c9/

@skshetry
Member Author

@rogermparent, can confirm it's fixed from my side. 👍

@rogermparent
Contributor

rogermparent commented Jul 17, 2020

Awesome! Thanks for the help. While the issue described in the OP is fixed, it does indicate there's a deeper issue with the automatic CloudFlare clearing script. I'm going to debug that regardless of the status of this ticket, so it's up to the group whether we want to close and make a new issue or repurpose this one for the auto-clear script.

@shcheklein
Member

@rogermparent let's repurpose this one! we are fixing the core of the problem after all. ...

@rogermparent
Contributor

rogermparent commented Jul 18, 2020

Update on this at #1587, which makes the logs from the CloudFlare cache-clearing script easier to spot.

To put it simply, I think we just need to update the CloudFlare auth token on the dvc.org production Heroku app to a newly generated one. The PR's changes helped expose the issue, but the actual fix is using a new CloudFlare token on Heroku.
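
For context, here is a rough sketch of what a token-authenticated purge request against the Cloudflare v4 API looks like. It is written in Python for illustration only: the actual dvc.org script runs on Node and is not shown in this thread, and the zone ID and token values are placeholders. The requests are built but not sent, so the sketch can be inspected without credentials.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_verify_request(api_token: str) -> urllib.request.Request:
    """Request that asks Cloudflare whether the API token is still valid."""
    return urllib.request.Request(
        url=f"{API_BASE}/user/tokens/verify",
        method="GET",
        headers={"Authorization": f"Bearer {api_token}"},
    )


def build_purge_request(zone_id: str, api_token: str) -> urllib.request.Request:
    """Request that purges every cached file in the given zone."""
    body = json.dumps({"purge_everything": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/zones/{zone_id}/purge_cache",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing either request to `urllib.request.urlopen` would send it. An expired or revoked token is rejected at that point, which matches the failure suspected here.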

@shcheklein
Member

@rogermparent good research, Roger. Could you generate a new key?

@shcheklein
Member

@rogermparent also, when you generate it - check if it is somehow connected with your CF account.

@jorgeorpinel
Contributor

Did you guys notice the CF outage? Maybe related

https://www.cnet.com/news/cloudflare-dns-outage-makes-websites-unreachable/

@rogermparent
Contributor

@jorgeorpinel funnily enough, I noticed the CloudFlare outage because it happened toward the end of my research and stopped my progress. All my testing was done beforehand, so I believe it has nothing to do with that issue.

@shcheklein Sure, I can make a new key! I'll try to apply it on Heroku, assuming I have the permissions to modify the production instance. If I can't, I'll send it to you in some sort of PM.

@rogermparent
Contributor

rogermparent commented Jul 18, 2020

I just changed the token on Heroku to the one I successfully used on the PR. The token is listed as "DVC Cache Clear" on the CloudFlare dashboard: open the "dvc.org" site and click "Get your API tokens".

I can't seem to find any piece of info associating that CloudFlare token with my account in particular.

@shcheklein
Member

Closing this for now and redeploying with the new token. It would be great for the deployment system to notify us or fail if the cache-clearing script fails.
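
One way to get that "fail loudly" behaviour is to turn the API response into a process exit code, so the deploy step that runs the script fails with it. The Python sketch below assumes the standard Cloudflare response envelope with a `success` flag and an `errors` list of `code`/`message` objects. `CachePurgeError` and `exit_code_for` are hypothetical names, not part of the existing script.

```python
import json
import sys


class CachePurgeError(RuntimeError):
    """Raised when the purge response does not report success."""


def check_purge_response(raw_body: str) -> None:
    """Raise CachePurgeError unless the response body reports success."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise CachePurgeError(f"unparseable response: {exc}") from exc
    if not isinstance(payload, dict) or not payload.get("success"):
        errors = payload.get("errors", []) if isinstance(payload, dict) else []
        details = "; ".join(
            f"{err.get('code')}: {err.get('message')}" for err in errors
        )
        raise CachePurgeError(f"cache purge failed: {details or 'no error details'}")


def exit_code_for(raw_body: str) -> int:
    """Map a purge response to an exit code: 0 on success, 1 on failure."""
    try:
        check_purge_response(raw_body)
    except CachePurgeError as exc:
        # Log to stderr so the message shows up in deploy logs.
        print(exc, file=sys.stderr)
        return 1
    return 0
```

A wrapper script would end with `sys.exit(exit_code_for(body))`. Whether a failed purge should block the deploy or only send a notification is a policy choice this sketch leaves open.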

@shcheklein
Member

Merging the PR broke deployment. Reopening this issue so we don't forget to review the PR again ...

@rogermparent
Contributor

How did the deploy break with the PR? I looked over the build logs and the process succeeded. I didn't see the site itself though, so maybe something broke there.

@shcheklein
Member

It was a regular Heroku "Application error" page. It should be possible to reproduce it in the preview env, I hope?

@rogermparent
Contributor

rogermparent commented Jul 20, 2020

I didn't see it on the review app, but I think I found the problem. When removing the cache cleaner invocation from the start script, I also removed the node call that invokes server.js. Since the server doesn't have a #! like the CloudFlare script, it must be directly invoked with node.
That explains why the build worked and the deploy didn't: the build went fine because only the proxy server actually broke.

I was also primarily developing with gatsby build out of habit, which would explain how an issue like this slipped past me.
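
The shebang distinction can be made concrete with a small Python sketch: a script whose first two bytes are `#!` names its own interpreter and can be launched by path, while one without it needs the interpreter spelled out. `launch_command` is a hypothetical helper for illustration, and it checks only the shebang, not the executable permission bit that running a script directly also requires.

```python
from pathlib import Path
from typing import List


def launch_command(script: Path, interpreter: str = "node") -> List[str]:
    """Return the argv needed to start `script`.

    Scripts that begin with a shebang (#!) can be launched by path;
    anything else must be passed to the interpreter explicitly.
    """
    with script.open("rb") as handle:
        has_shebang = handle.read(2) == b"#!"
    if has_shebang:
        return [str(script)]
    return [interpreter, str(script)]
```

For a `server.js` with no shebang this returns `["node", "<path>/server.js"]`, which is the `node` invocation that went missing from the start script.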

@rogermparent
Contributor

I was right! New PR at #1600, and I actually checked the deploy preview this time.

@rogermparent
Contributor

With #1600 successfully merged, we could either close this now or wait until we actually observe a change that would otherwise break without a cache clear.

@shcheklein
Member

Yep, closing! Thanks @rogermparent !


4 participants