
Past release sites are taking a long time to load #3525

Open
yashcho opened this issue Apr 19, 2023 · 19 comments
Labels
bug Something isn't working get.jenkins.io

Comments

@yashcho

yashcho commented Apr 19, 2023

Service(s)

get.jenkins.io

Summary

The two links below, from the page https://www.jenkins.io/download/, take more than 30 seconds to load and show the table with the list of old versions. Can you please look into it?

https://get.jenkins.io/war/

https://get.jenkins.io/war-stable/

Reproduction steps

  1. Open a browser with the cache and cookies cleared, then open either of the links.
  2. Either of them will take more than 30 seconds to load or fail with a "502 Bad Gateway" status.
@yashcho yashcho added the triage Incoming issues that need review label Apr 19, 2023
@dduportal dduportal self-assigned this Apr 19, 2023
@dduportal dduportal removed the triage Incoming issues that need review label Apr 19, 2023
@dduportal dduportal added this to the infra-team-sync-2023-04-25 milestone Apr 19, 2023
@dduportal dduportal added the bug Something isn't working label Apr 19, 2023
@dduportal
Contributor

dduportal commented Apr 19, 2023

Hi @yashcho, thanks for raising this issue. I can reproduce it and am looking into it.

@dduportal
Contributor

dduportal commented Apr 19, 2023

First investigation:

  • The Apache pod serving the files, such as the HTML content on these pages (i.e. when not redirected to mirrors), has restarted quite often since yesterday, while the redirector service (and the reverse proxy to the former) seems fine:
mirrorbits               mirrorbits-bb668f8f6-5bzqd                                        2/2     Running   0                 25h
mirrorbits               mirrorbits-bb668f8f6-hdxbs                                        2/2     Running   0                 25h
mirrorbits               mirrorbits-bb668f8f6-wlrxd                                        2/2     Running   0                 25h
mirrorbits               mirrorbits-files-76848c4d59-6cn5d                                 1/1     Running   371 (2m46s ago)   4d15h
mirrorbits               mirrorbits-files-76848c4d59-ds6bb                                 1/1     Running   192 (2m41s ago)   25h
mirrorbits               mirrorbits-files-76848c4d59-k959c                                 1/1     Running   194 (3m10s ago)   44h
  • A quick check on the recent log lines with kubectl logs -n mirrorbits -l app.kubernetes.io/name=mirrorbits-files shows 2 kinds of errors:

    • Permission error on the (Azurefile) filesystem:
AH00132: file permissions deny server access: /usr/local/apache2/htdocs/war/HEADER.html
    • Apache overload:
AH00484: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting

@dduportal
Contributor

  • Triggering a node pool image upgrade to force recycling the underlying VMs (and the associated azurefile mount) as a short-term measure

@dduportal
Contributor

@dduportal
Contributor

Operations notes for the infra-team:

Preliminaries:

  • A snapshot of the persistent volume was taken before operations (named 20230419-09h46-utc)
  • Checked that all involved PVs are in "retain" mode in the mirrorbits namespace
  • Removed all the PVCs and PVs related to Redis, as mirrorbits has been using an Azure-managed Redis instance for 2.5 years
    • The associated Azure disks are not deleted; they should be removed once prodpublick8s is deleted, after the migration to publick8s

Operations:

  • Removed the "mirrorbits" namespace to ensure that the PVC mirrorbits-binary is removed properly
  • Removed the PV mirrorbits-binary with kubectl delete pv mirrorbits-binary, as it's immutable
  • Applied the Helm chart manually to re-create the PV, PVC and pod

Checks:

  • Service was back after 2 minutes of outage
  • Files are mounted, and mirror redirections are present
  • The reported file volume size is now 1000 GiB

@dduportal
Contributor

  • The performance issues are back when serving the content of /war and /war-stable
  • The access logs show that we are being hammered by GET requests to these two paths

@dduportal
Contributor

Let's see the result

@dduportal
Contributor

Performance seems OK for the /war-stable/ page but not for /war/.

@yashcho
Author

yashcho commented May 3, 2023

@dduportal, it is the same for me. What do you suggest?

@dduportal
Contributor

@dduportal, it is the same for me. What do you suggest?

I'm not sure I understand: what is the exact problem you are having?

It is true that the page https://get.jenkins.io/war/ is slow, but we do not see HTTP/5XX errors on it.

Do you still see errors, or are you blocked by the "slow" response time of this page? If you are blocked, can you describe what you are trying to achieve (we could provide alternative and far more efficient solutions)?

We are working on this matter, but it is not the top priority as nothing seems blocked or broken: happy to reevaluate with more details

@yashcho
Author

yashcho commented May 3, 2023

are you blocked by the "slow" response time of this page?

Yes, basically I am trying to access the URL and fetch the list of versions from the table programmatically, so the test cases are failing due to the slow response when fetching the table details.

@dduportal
Contributor

dduportal commented May 3, 2023

are you blocked by the "slow" response time of this page?

Yes, basically I am trying to access the URL and fetch the list of versions from the table programmatically, so the test cases are failing due to the slow response when fetching the table details.

In that case, you might be interested in checking the Artifactory metadata. We have an example here, in a shell script, which checks the 2 most recent LTS releases: https://github.com/jenkinsci/docker/blob/master/.ci/publish.sh#L85-L89 and the 2 most recent weekly releases: https://github.com/jenkinsci/docker/blob/ca53c743e9e83db0cd235723e6689cd1490e1d13/.ci/publish.sh#L79-L83

The URL https://repo.jenkins-ci.org/releases/org/jenkins-ci/main/jenkins-war/maven-metadata.xml should be considered the source of truth, AND it's XML, so it's made for programmatic access (while a dynamically generated HTML page is made for humans).

@yashcho
Author

yashcho commented May 3, 2023

Do you have a similar pattern using a Golang codebase instead of a shell script?

@dduportal
Contributor

Do you have a similar pattern using a Golang codebase instead of a shell script?

No, that is out of our scope. Do not hesitate to propose a contribution if you find or build something.
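
For illustration only, here is a minimal Go sketch of that pattern (a starting point, not something provided or maintained by the Jenkins project): it fetches the maven-metadata.xml mentioned above and prints the latest and release versions plus the two most recent entries. The struct mapping follows the standard Maven metadata layout and is an assumption, not a Jenkins-specific API.

package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"net/http"
)

// metadata mirrors the standard Maven maven-metadata.xml layout
// (<versioning> with <latest>, <release> and <versions><version>...);
// this mapping is an assumption based on that format, not a Jenkins API.
type metadata struct {
	Versioning struct {
		Latest   string   `xml:"latest"`
		Release  string   `xml:"release"`
		Versions []string `xml:"versions>version"`
	} `xml:"versioning"`
}

func main() {
	const url = "https://repo.jenkins-ci.org/releases/org/jenkins-ci/main/jenkins-war/maven-metadata.xml"

	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Fatalf("unexpected status fetching metadata: %s", resp.Status)
	}

	var m metadata
	if err := xml.NewDecoder(resp.Body).Decode(&m); err != nil {
		log.Fatal(err)
	}

	fmt.Println("latest: ", m.Versioning.Latest)
	fmt.Println("release:", m.Versioning.Release)

	// Print the two most recent versions, similar to what the linked
	// shell example does for the weekly and LTS releases.
	start := len(m.Versioning.Versions) - 2
	if start < 0 {
		start = 0
	}
	for _, v := range m.Versioning.Versions[start:] {
		fmt.Println("recent: ", v)
	}
}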

@yashcho
Author

yashcho commented May 3, 2023

Sure..

@dduportal
Contributor

Hello @yashcho, do you still have the problem, or can we close this issue?

@timja
Member

timja commented Jan 26, 2024

This seems far quicker now since #3917

Plugins list is still slow-ish but does load eventually:
https://get.jenkins.io/plugins/

@lemeurherve
Member

lemeurherve commented Jan 26, 2024

Plugins list is still slow-ish but does load eventually: https://get.jenkins.io/plugins/

Off-topic & low priority: should an index be generated like jenkinsci/packaging#159 to avoid Apache having to render this particular folder list?

@dduportal
Contributor

This seems far quicker now since #3917

Plugins list is still slow-ish but does load eventually: https://get.jenkins.io/plugins/

Yes, far quicker, but it still takes a bit of time. At least we are not reaching the timeout anymore.

Wasn't able to switch to an NFS PVC though (it failed due to a timeout in the CSI driver when trying to reach the server; this needs to be dug into).

Off-topic & low priority: should an index be generated like jenkinsci/packaging#159 to avoid Apache having to render this particular folder list?

I believe it is totally on topic here, as it could properly solve the problem: generated files would be mirrored and spread across the world. And even if served by the HTTPD fallback (or archives), it would still be better than today!

cc @MarkEWaite, as he mentioned this possibility a few weeks earlier.
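
To make that idea concrete, here is a rough sketch (in Go, with hypothetical paths; not the jenkinsci/packaging#159 implementation) of generating a static index.html for the plugins folder, so that mirrors or the HTTPD fallback serve a plain file instead of Apache rendering the directory listing on every request.

package main

import (
	"html/template"
	"log"
	"os"
)

// Hypothetical locations: the real packaging job would decide where the
// plugin tree lives and where the generated index should be written.
const (
	pluginsDir = "/srv/releases/plugins"
	indexPath  = "/srv/releases/plugins/index.html"
)

// A deliberately small HTML listing; once generated, it can be mirrored
// like any other file instead of being rendered by Apache per request.
var page = template.Must(template.New("index").Parse(`<!DOCTYPE html>
<html><head><title>Index of /plugins/</title></head><body>
<h1>Index of /plugins/</h1>
<ul>
{{range .}}<li><a href="{{.}}/">{{.}}/</a></li>
{{end}}</ul>
</body></html>
`))

func main() {
	entries, err := os.ReadDir(pluginsDir)
	if err != nil {
		log.Fatal(err)
	}
	var names []string
	for _, e := range entries {
		if e.IsDir() {
			names = append(names, e.Name())
		}
	}

	f, err := os.Create(indexPath)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := page.Execute(f, names); err != nil {
		log.Fatal(err)
	}
	log.Printf("wrote %s with %d entries", indexPath, len(names))
}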
