500 errors in Packit #3372

Closed
FrostyX opened this issue Aug 14, 2024 · 19 comments · Fixed by #3415

@FrostyX
Member

FrostyX commented Aug 14, 2024

Reported by @majamassarini in #3329 (comment)

I was thinking that probably we should open an issue for this. I hoped it was somehow related to the above fix but sadly it isn't.
I looked in our logs, and what strikes me is that in the last 7 days we got 500 errors from COPR in just a handful of projects, and the exceptions are scattered across the whole period; so I would say it does not depend on a high volume of requests or high load on the COPR service.

2024-08-13T13:42:03 containers/podman#23601
2024-08-10T09:39:55 containers/podman#23569
2024-08-12T13:29:25 containers/podman#23581
2024-08-12T15:38:41 containers/podman#23587
2024-08-07T14:38:59 containers/podman#23537

2024-08-13T16:26:09 containers/common#2124
2024-08-09T23:14:02 containers/common#2119

2024-08-08T00:58:49 containers/crun#1513
2024-08-12T17:50:45 containers/crun#1519
2024-08-12T21:16:54 containers/crun#1520

2024-08-10T00:08:19 containers/buildah#5680
2024-08-12T19:43:26 containers/buildah#5681
2024-08-12T20:15:48 containers/buildah#5682

2024-08-11T14:02:53 containers/netavark#1052

2024-08-13T10:50:34 rpm-software-management/dnf5#1625

2024-08-08T11:29:19 cockpit-project/cockpit-machines#1760
2024-08-11T19:30:25 cockpit-project/cockpit-machines#1761
2024-08-12T03:47:41 cockpit-project/cockpit-machines#1762

The containers projects and the cockpit-machines project both use the packages key. With the packages key I would expect more requests from Packit to COPR in a short period of time than for other Packit projects. I could be wrong, but to me it looks like a race condition on the COPR side, also because this does not always happen on the same PR, so it is probably not the data we submit to COPR.

The dnf5 project, by contrast, has the simplest Packit config we could find and has nevertheless been hit by this problem. Again, I can explain it only with some kind of race condition...

I can't spot anything else interesting in our logs but let us know if we can help you in some way debugging it.

@praiskup
Member

Last 6 hours nothing suspicious. One of the events mentioned above created this traceback:
log.txt

[Tue Aug 13 10:50:32.041388 2024] [wsgi:error] [pid 3866555:tid 3866782] [remote 107.20.230.14:21570] psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "copr_name_for_user_uniq"
[Tue Aug 13 10:50:32.041400 2024] [wsgi:error] [pid 3866555:tid 3866782] [remote 107.20.230.14:21570] DETAIL:  Key (user_id, name)=(5576, rpm-software-management-dnf5-1625) already exists.

@mcrha

mcrha commented Aug 20, 2024

Trying to open https://download.copr.fedorainfracloud.org/results/mcrha or https://download.copr.fedorainfracloud.org/results/rpmsoftwaremanagement/ leads to:

504 ERROR
The request could not be satisfied.
CloudFront attempted to establish a connection with the origin, but either the attempt failed or the origin closed the connection. We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.

Generated by cloudfront (CloudFront)
Request ID: qb8C5Vsx7ySyHQaTIpUS5n-x2e_Q4qHumkAWoTo2f1ApQVW_IWDKig==

Is this related to this issue in any way, or should I file a new one, please?

@praiskup
Member

@mcrha thank you for reporting that! Yes, we had copr-backend issues yesterday; sorry for the inconvenience (it should be working OK now). The problem discussed here is in copr-frontend.rpm (different VM).

@mcrha

mcrha commented Aug 21, 2024

Aha, I see, a different thing then. I'm sorry for the noise. You are right, it cured itself an hour or so after I wrote a note here.

@praiskup praiskup moved this from In 3 months to In Progress in CPT Kanban Aug 26, 2024
@praiskup praiskup self-assigned this Aug 26, 2024
@majamassarini
Contributor

@FrostyX, @praiskup I was quickly checking the last occurrences of this exception on the Packit side and I saw that it last happened on August 22nd around 10 AM.
I don't know if you have done something that could have solved the problem? Or maybe the projects that trigger this exception are just on vacation ^_^.
I don't think anything changed on the Packit side on Thursday the 22nd (we release Packit service on Tuesday).

@majamassarini
Contributor

majamassarini commented Sep 19, 2024

If it can be of any help, I checked the Packit logs again; here are the latest exceptions we collected:

2024-09-16T06:00:25.410799561+00:00 rpm-software-management/mock#1452
2024-09-13T11:58:14.606678447+00:00 rpm-software-management/dnf5#1696
2024-09-15T16:55:24.320991349+00:00 containers/podman#23958
2024-09-16T19:05:21.306196435+00:00 containers/podman#23970
2024-09-13T17:29:00.908865675+00:00 rpm-software-management/dnf5#1699
2024-09-14T03:55:13.329540281+00:00 rpm-software-management/mock#1451
2024-09-18T14:24:55.040294306+00:00 containers/podman#23999
2024-09-12T15:37:14.705051873+00:00 containers/buildah#5734
2024-09-12T16:52:49.475155359+00:00 containers/conmon#528
2024-09-17T11:40:13.308385515+00:00 containers/podman#23979
2024-09-17T04:03:46.018696225+00:00 https://gitlab.com/packit-service/hello-world/-/merge_requests/1127
2024-09-16T14:41:02.828280764+00:00 containers/container-selinux#329
2024-09-17T12:15:29.408973412+00:00 containers/container-selinux#330

praiskup added a commit to praiskup/copr that referenced this issue Sep 20, 2024
This is TOCTOU issue.  The other checks for duplications (on so many
places) seem kinda redundant because nothing but try/except for commit()
may catch these concurrency problems.

Fixes: fedora-copr#3372
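
The TOCTOU (time-of-check to time-of-use) pattern described in this commit message can be illustrated with a minimal sketch. This is not the actual Copr patch: it uses the standard-library `sqlite3` module instead of Copr's PostgreSQL setup, and the `make_db`, `exists`, and `create_project` helpers and the 200/400 return values are assumptions made for illustration. Only the constraint name and the key values are taken from the traceback earlier in this thread. Two requests both pass the "does it exist?" check before either inserts, so only the database constraint, caught around `commit()`, can reject the second one cleanly.

```python
import sqlite3


def make_db():
    """In-memory table with a uniqueness constraint like the one in the traceback."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE copr (user_id INTEGER, name TEXT, "
        "CONSTRAINT copr_name_for_user_uniq UNIQUE (user_id, name))"
    )
    return conn


def exists(conn, user_id, name):
    """The up-front check: racy, because another request may insert right after it."""
    row = conn.execute(
        "SELECT 1 FROM copr WHERE user_id = ? AND name = ?", (user_id, name)
    ).fetchone()
    return row is not None


def create_project(conn, user_id, name):
    """Insert and commit; map a constraint violation to a 400-style status
    (hypothetical return values) instead of letting it escape as a 500."""
    try:
        conn.execute(
            "INSERT INTO copr (user_id, name) VALUES (?, ?)", (user_id, name)
        )
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()
        return 400
    return 200


conn = make_db()
key = (5576, "rpm-software-management-dnf5-1625")

# Simulated interleaving: both requests run their check before either inserts.
req_a_sees_free = not exists(conn, *key)
req_b_sees_free = not exists(conn, *key)

# Both then proceed to create; the constraint decides who wins.
status_a = create_project(conn, *key)
status_b = create_project(conn, *key)
print(req_a_sees_free, req_b_sees_free, status_a, status_b)
```

Both checks report the name as free, yet the second creation attempt hits the constraint. Handling the error at commit time is what makes the outcome well-defined regardless of how the requests interleave.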
@nikromen nikromen moved this from In Progress to Done in CPT Kanban Sep 23, 2024
@lsm5

lsm5 commented Sep 25, 2024

Hello, has the fix been deployed to copr? This issue did occur 8 hours ago: containers/ramalama#185 (comment)

@praiskup
Member

Not yet, the ETA plan for the release is next Thursday (if everything goes OK). Is that OK, or do you want us to hot-fix this in production?

@majamassarini
Contributor

@praiskup this morning Packit logged a new exception (for the ramalama project):

Cannot create a new Copr project (owner=packit project=containers-ramalama-182 chroots=['fedora-41-x86_64', 'fedora-rawhide-x86_64', 'fedora-40-x86_64', 'fedora-39-x86_64']): Copr: 'packit/containers-ramalama-182' already exists. Copr HTTP response is 400 BAD REQUEST.

I thought we should handle it silently.
Or is there something more to be deployed, so that we shouldn't expect this to happen anymore?

@lsm5

lsm5 commented Sep 25, 2024

> Not yet, the ETA plan for the release is next Thursday (if everything goes OK). Is that OK, or do you want us to hot-fix this in production?

@praiskup just to double check, next Thursday is tomorrow or Thursday of next week?

Would be great if you could hotfix this. Else I'll just ask people to wait some more.

@praiskup
Member

Oh, yes - I meant "next week Thursday" rather than "this week Thursday". But I can try to hotfix tomorrow, seems pretty easy to rollback if problems appear at least.

jaitjacob added a commit to jaitjacob/copr that referenced this issue Sep 25, 2024
commit 609d369
Author: Pavel Raiskup <[email protected]>
Date:   Fri Sep 20 14:14:31 2024 +0200

    frontend: fix the 500 for racy creation attempts

    This is TOCTOU issue.  The other checks for duplications (on so many
    places) seem kinda redundant because nothing but try/except for commit()
    may catch these concurrency problems.

    Fixes: fedora-copr#3372

@lsm5

lsm5 commented Sep 26, 2024

> Oh, yes - I meant "next week Thursday" rather than "this week Thursday". But I can try to hotfix tomorrow, seems pretty easy to rollback if problems appear at least.

great. Thanks @praiskup

@praiskup
Member

I applied the patch now, I am sorry it took several days... got quite busy elsewhere.

@lsm5

lsm5 commented Sep 30, 2024

> I applied the patch now, I am sorry it took several days... got quite busy elsewhere.

Thanks @praiskup . I'll watch out for further occurrences if any.

@lsm5

lsm5 commented Oct 2, 2024

@praiskup still seeing it unfortunately containers/buildah#5765 (comment)

@praiskup
Member

praiskup commented Oct 2, 2024

Thank you for the report. I'm locked in a meeting room, but this seems like a different issue, not sure if related: #3443.

@lsm5

lsm5 commented Oct 2, 2024

> @praiskup still seeing it unfortunately containers/buildah#5765 (comment)

It looks like they started running without anyone from the team restarting them. So maybe it works, but some temporary failure messages need to be silenced?

@praiskup
Member

praiskup commented Oct 7, 2024

@lsm5 I'm unsure how/when Packit re-creates the projects; perhaps some people from Packit reacted. Anyway, the remaining typo triggering 500 was fixed in #3443. I haven't seen this problem since Thursday's service upgrade (scheduled outage).

@lsm5

lsm5 commented Oct 7, 2024

> @lsm5 I'm unsure how/when Packit re-creates the projects; perhaps some people from Packit reacted. Anyway, the remaining typo triggering 500 was fixed in #3443. I haven't seen this problem since Thursday's service upgrade (scheduled outage).

ack thanks @praiskup . I'll watch out for further occurrences.
