backup script breaks consistently over large jobs folder #415

Closed
trieszklr opened this issue Jun 17, 2020 · 6 comments

@trieszklr

Expected Behavior

backup.sh runs successfully even when jobs folder is large

Actual Behavior

backup.sh fails almost every time because some files are always changing while it runs

2020-06-17T03:30:46.521Z    WARN    controller-jenkins    jenkins/jenkins_controller.go:171    Reconcile loop failed: pod exec error operation on stream: stdout 'Running backup
backup file '/backup/13019.tar.gz' is empty
' stderr 'tar: jobs/xxxx/jobs/yyyyyy: file changed as we read it
tar: jobs/xxxx/jobs/zzzzzz: file changed as we read it
tar: jobs/xxxx/jobs/vvvvv: file changed as we read it
': command terminated with exit code 1    {"cr": "xxxxxx"}

The backup status does not change over 24 hours:

kubectl --kubeconfig ~/.kube/config.neutral -n jenkins get jenkins -o yaml | grep  Backup
      makeBackupBeforePodDeletion: true
    lastBackup: 13018
    pendingBackup: 13019
    restoredBackup: 13016

Steps to Reproduce the Problem

  1. Set up a GitHub organization project with 30 projects and 20+ contributors with commit hooks
  2. Create branches and run jobs so the jobs folder grows to 5GB+
  3. Monitor the backup part of the jenkins-operator logs

Additional Info

tar returns 1 if files change while archiving.
https://stackoverflow.com/questions/20318852/tar-file-changed-as-we-read-it

This breaks the current backup.sh script because it sets:

set -eo pipefail
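
A minimal sketch of one possible workaround, assuming GNU tar, where exit status 1 on archive creation means some files changed while being read and status 2 means a fatal error. The helper name `tolerate_exit_1` and the variables in the usage comment are hypothetical and are not taken from backup.sh:

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical helper: run a command and downgrade exit status 1 to a warning.
# Any other non-zero status is passed through, so real failures still abort
# the script under `set -e`.
tolerate_exit_1() {
  local status=0
  # A command on the left side of `||` does not trigger errexit,
  # so the status can be captured safely.
  "$@" || status=$?
  if [ "$status" -eq 1 ]; then
    echo "warning: '$1' exited with status 1 (files may have changed while being read)" >&2
    return 0
  fi
  return "$status"
}

# Usage sketch (variable names and paths are assumptions):
# tolerate_exit_1 tar -C "${JENKINS_HOME}" -czf "/backup/${BACKUP_NUMBER}.tar.gz" jobs
```

Tolerating status 1 is a trade-off: the backup completes, but a file that changed mid-read may be stored in an inconsistent state.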

  • Jenkins Operator version: v0.4.0

@tomaszsek

Hi @trieszklr,

In my opinion, your Jenkins is too big. Why not split it into three smaller Jenkinses? The backups will be much smaller. If you have that much data, you have to increase the resources for the backup container.

Cheers

@trieszklr

Hi @tomaszsek,

Thanks for your reply. We are using a GitHub organization, which automagically creates folders/jobs for all projects and branches where a pipeline file is found... We have already reduced build retention to a fairly minimal level, but we would not want to lose this very convenient Jenkins feature... So "splitting up into smaller Jenkinses" is not really an option for us, and we are likely not alone with this requirement...

And indeed, increasing the resources for the backup container also helped reduce the occurrences, but that does not fix the bug itself...

@trieszklr

trieszklr commented Jul 16, 2020

Hi @tomaszsek and @waveywaves,

Just wanted to mention that I have built a custom backup container with this patch, rolled it out, and it did fix the issue:
#416

Previously, 9 out of 10 times the Jenkins master pod did not come back online after a restart because of this bug.
If you suggest an alternative approach, that is perfectly fine too, but this bug should be addressed IMHO. The implications are very severe, and "splitting up Jenkins" is not an option for larger teams using GitHub or Teambucket Organization Folders...

Cheers

@stale

stale bot commented Jul 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this issue is still affecting you, just comment with any updates and we'll keep it open. Thank you for your contributions.

@stale stale bot added the stale label Jul 21, 2021
@stale

stale bot commented Aug 22, 2021

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!

@stale stale bot closed this as completed Aug 22, 2021
@Bakies

Bakies commented Sep 1, 2021

your Jenkins is too big

I don't find this answer acceptable; the backup job should be more resilient. Like trieszklr, I added one organization job and backups started failing constantly. I can't split this up into smaller portions and don't want to manage more Jenkins instances.
