Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Project.last_serial failed to update. #12933

Closed
ewdurbin opened this issue Feb 4, 2023 · 26 comments · Fixed by #16797
Closed

Project.last_serial failed to update. #12933

ewdurbin opened this issue Feb 4, 2023 · 26 comments · Fixed by #16797
Labels

Comments

@ewdurbin
Copy link
Member

ewdurbin commented Feb 4, 2023

Reported by a bandersnatch user in #12214, the last_serial value for a Project was out of sync with the journals table after what appears to have been an automated mass removal of releases.

warehouse=> select id, name, version, action, submitted_date from journals where name='jarvis-ui' order by id desc limit 10;
    id    |   name    | version |     action     |       submitted_date       
----------+-----------+---------+----------------+----------------------------
 16626043 | jarvis-ui | 0.3.3   | remove release | 2023-01-30 23:30:11.147139
 16626042 | jarvis-ui | 0.3.8   | remove release | 2023-01-30 23:30:10.755916
 16626041 | jarvis-ui | 0.4.0   | remove release | 2023-01-30 23:30:10.732833
 16626040 | jarvis-ui | 0.3.5   | remove release | 2023-01-30 23:30:10.69155
 16626039 | jarvis-ui | 0.3.1   | remove release | 2023-01-30 23:30:10.640855
 16626038 | jarvis-ui | 0.3.2   | remove release | 2023-01-30 23:30:10.541308
 16626037 | jarvis-ui | 0.2.5   | remove release | 2023-01-30 23:30:10.50915
 16626036 | jarvis-ui | 0.3.0   | remove release | 2023-01-30 23:30:10.527536
 16626035 | jarvis-ui | 0.4.8   | remove release | 2023-01-30 23:30:10.537667
 16626034 | jarvis-ui | 0.4.3   | remove release | 2023-01-30 23:30:10.518038
(10 rows)

warehouse=> select last_serial from projects where name='jarvis-ui';
 last_serial 
-------------
    16626041
(1 row)

Note that the last_serial column on the project record is not actually the latest serial. Seems the trigger that calls maintain_project_last_serial() for the journals table failed to update the project. I'll file a separate issue for that.

It has been corrected by calling a no-op update on the row for now.

warehouse=> begin;
BEGIN
warehouse=*> update journals set name=name where id=16626043;
UPDATE 1
warehouse=*> select * from journals where name='jarvis-ui' order by id desc limit 10;
    id    |   name    | version |     action     |       submitted_date       | submitted_by  |             submitted_from              
----------+-----------+---------+----------------+----------------------------+---------------+-----------------------------------------
 16626043 | jarvis-ui | 0.3.3   | remove release | 2023-01-30 23:30:11.147139 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626042 | jarvis-ui | 0.3.8   | remove release | 2023-01-30 23:30:10.755916 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626041 | jarvis-ui | 0.4.0   | remove release | 2023-01-30 23:30:10.732833 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626040 | jarvis-ui | 0.3.5   | remove release | 2023-01-30 23:30:10.69155  | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626039 | jarvis-ui | 0.3.1   | remove release | 2023-01-30 23:30:10.640855 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626038 | jarvis-ui | 0.3.2   | remove release | 2023-01-30 23:30:10.541308 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626037 | jarvis-ui | 0.2.5   | remove release | 2023-01-30 23:30:10.50915  | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626036 | jarvis-ui | 0.3.0   | remove release | 2023-01-30 23:30:10.527536 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626035 | jarvis-ui | 0.4.8   | remove release | 2023-01-30 23:30:10.537667 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626034 | jarvis-ui | 0.4.3   | remove release | 2023-01-30 23:30:10.518038 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
(10 rows)

warehouse=*> select last_serial from projects where name='jarvis-ui';
 last_serial 
-------------
    16626043
(1 row)

warehouse=*> commit;
COMMIT

Originally posted by @ewdurbin in #12214 (comment)

@dstufft
Copy link
Member

dstufft commented Feb 25, 2023

I'm trying to think how this could have happened, and I think I have a theory for it.

Multiple transactions occurring in isolation, READ COMMITTED

  1. Transaction A inserts record X into journals
  2. Transaction B inserts record Y into journals
  3. Transaction B updates Project.last_serial, locking it.
  4. Transaction A gets blocked on Transaction B.
  5. Transaction B commits, updating Project.last_serial to Y.
  6. Transaction A updates Project.last_serial, locking it.
  7. Transaction A commits, updating Project.last_serial to X.

If that's right, I suspect we can fix it by always issuing a SELECT FOR UPDATE on the projects table immediately in any view that is going to create journals. That should effectively serialize any transactions that operate on the same project.

@di di added the bug 🐛 label Mar 6, 2023
@ewdurbin
Copy link
Member Author

ewdurbin commented Apr 6, 2023

Another instance:

warehouse=> select id, name, version, action, submitted_date from journals where name='expose-localhost' order by id desc limit 10;
    id    |       name       | version |     action     |       submitted_date       
----------+------------------+---------+----------------+----------------------------
 17357317 | expose-localhost | 0.4.2   | remove release | 2023-03-20 00:20:18.457385
 17357316 | expose-localhost | 0.3.0   | remove release | 2023-03-20 00:20:18.45873
 17357315 | expose-localhost | 0.3.1   | remove release | 2023-03-20 00:20:18.461733
 17357314 | expose-localhost | 0.2.9   | remove release | 2023-03-20 00:20:18.431603
 17357313 | expose-localhost | 0.1.8   | remove release | 2023-03-20 00:20:18.423596
 17357312 | expose-localhost | 0.2.4   | remove release | 2023-03-20 00:20:18.367459
 17357311 | expose-localhost | 0.3.4   | remove release | 2023-03-20 00:20:18.360073
 17357310 | expose-localhost | 0.3.6   | remove release | 2023-03-20 00:20:18.373008
 17357309 | expose-localhost | 0.3.5   | remove release | 2023-03-20 00:20:18.271077
 17357308 | expose-localhost | 0.2.0   | remove release | 2023-03-20 00:20:17.959461
(10 rows)

warehouse=> select last_serial from projects where name='expose-localhost';
 last_serial 
-------------
    17357314
(1 row)

Manually resolved.

@TechCiel
Copy link

Two packages, cyaxares and later openmldb-tool has been failing for days.

I wonder if it's another instance of this issue, and if there's someone going to fix it manually.

@TechCiel
Copy link

Two packages, cyaxares and later openmldb-tool has been failing for days.

I wonder if it's another instance of this issue, and if there's someone going to fix it manually.

Could anybody look into this? Maybe @ewdurbin ?

@ewdurbin
Copy link
Member Author

Another instance:

warehouse=> select id, name, version, action, submitted_date from journals where name='cyaxares' order by id desc limit 20;
    id    |   name   | version |               action                |       submitted_date       
----------+----------+---------+-------------------------------------+----------------------------
 18070374 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:36.845118
 18070372 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:36.567462
 18070371 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:36.386991
 18070370 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:36.304009
 18070369 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:36.118203
 18070368 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:35.790249
 18070365 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:27.775461
 18070364 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:27.586464
 18070363 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:27.414776
 18070362 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:27.24229
 18070361 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:27.073759
 18070360 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:26.903427
 18070358 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:26.06091
 18070356 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:24.592312
 18070353 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:15.375401
 18070352 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:15.071967
 18070351 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:13.089239
 18070348 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:09.134356
 18070270 | cyaxares | 0.2     | remove file cyaxares-0.2.tar.gz     | 2023-05-11 22:08:59.822525
 18070249 | cyaxares | 0.2     | add source file cyaxares-0.2.tar.gz | 2023-05-11 22:07:59.015508
(20 rows)

warehouse=> select last_serial from projects where name = 'cyaxares';
 last_serial 
-------------
    18070352
(1 row)

Resolved:

warehouse=> begin;
BEGIN
warehouse=*> update journals set name=name where id=18070374;
UPDATE 1
warehouse=*> select id, name, version, action, submitted_date from journals where name='cyaxares' order by id desc limit 10;
    id    |   name   | version |     action     |       submitted_date       
----------+----------+---------+----------------+----------------------------
 18070374 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:36.845118
 18070372 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:36.567462
 18070371 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:36.386991
 18070370 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:36.304009
 18070369 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:36.118203
 18070368 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:35.790249
 18070365 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:27.775461
 18070364 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:27.586464
 18070363 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:27.414776
 18070362 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:27.24229
(10 rows)

warehouse=*> select last_serial from projects where name = 'cyaxares';
 last_serial 
-------------
    18070374
(1 row)

warehouse=*> commit;
COMMIT

@ewdurbin
Copy link
Member Author

Another instance:

warehouse=> select id, name, version, action, submitted_date from journals where name='openmldb-tool' order by id desc limit 10;
    id    |     name      | version |                      action                       |       submitted_date       
----------+---------------+---------+---------------------------------------------------+----------------------------
 18075139 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:13.31279
 18075138 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:13.088052
 18075137 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:12.849749
 18075135 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:09.238179
 18075134 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:09.200528
 18075133 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:08.340646
 18075132 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:07.466855
 18075131 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:05.803824
 18075126 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:09:59.441708
 18072976 | openmldb-tool | 0.8.0   | add py3 file openmldb_tool-0.8.0-py3-none-any.whl | 2023-05-12 03:44:07.769301
(10 rows)

warehouse=> select last_serial from projects where name = 'openmldb-tool';
 last_serial 
-------------
    18075137
(1 row)

Resolved:

warehouse=> begin;
BEGIN
warehouse=*> update journals set name=name where id=18075139;
UPDATE 1
warehouse=*> select last_serial from projects where name = 'openmldb-tool';
 last_serial 
-------------
    18075139
(1 row)

warehouse=*> select id, name, version, action, submitted_date from journals where name='openmldb-tool' order by id desc limit 10;
    id    |     name      | version |                      action                       |       submitted_date       
----------+---------------+---------+---------------------------------------------------+----------------------------
 18075139 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:13.31279
 18075138 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:13.088052
 18075137 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:12.849749
 18075135 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:09.238179
 18075134 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:09.200528
 18075133 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:08.340646
 18075132 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:07.466855
 18075131 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:05.803824
 18075126 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:09:59.441708
 18072976 | openmldb-tool | 0.8.0   | add py3 file openmldb_tool-0.8.0-py3-none-any.whl | 2023-05-12 03:44:07.769301
(10 rows)

warehouse=*> commit;
COMMIT

@dstufft
Copy link
Member

dstufft commented Jun 14, 2023

I think that #13936 might fix this.

@TechCiel
Copy link

TechCiel commented Jul 6, 2023

Maybe another instance on fiftyone-desktop. Waiting for manual intervention.

2023-07-05 23:53:10,292 ERROR: Stale serial for fiftyone-desktop (18769749) not updating. Giving up. (package.py:82)
2023-07-05 23:53:10,292 ERROR: Error syncing package: fiftyone-desktop@18769749 (mirror.py:377)
bandersnatch.master.StalePage: Expected PyPI serial 18769749 for request https://pypi.org/pypi/fiftyone-desktop/json but got 18769748.

@di
Copy link
Member

di commented Jul 6, 2023

@TechCiel Manually resolved that for you. In this instance we had two journals for the same project get issued within a ~2 seconds of each other:

    id    |       name       | version |                                 action                                  |       submitted_date
----------+------------------+---------+-------------------------------------------------------------------------+----------------------------
 18769749 | fiftyone-desktop | 0.28.1  | add py3 file fiftyone_desktop-0.28.1-py3-none-macosx_11_0_arm64.whl     | 2023-07-05 15:44:55.417118
 18769748 | fiftyone-desktop | 0.28.1  | remove file fiftyone_desktop-0.28.1-py3-none-macosx_11_10_arm64.whl     | 2023-07-05 15:44:57.439115 

@TechCiel
Copy link

TechCiel commented Jul 6, 2023

Manually resolved that for you.

Thank you @di , really appreciate that quick respond, but the problem is not resolved. Maybe manual update didn't purge the CDN cache?

> curl 'https://pypi.org/pypi/fiftyone-desktop/json' -I | grep x-pypi-last-serial
x-pypi-last-serial: 18769748

should be 18769749.

@di
Copy link
Member

di commented Jul 6, 2023

@TechCiel Good catch. I've issued a purge and this should now be resolved.

@TechCiel
Copy link

TechCiel commented Nov 6, 2023

Another package mpf is failing for 40+ hours, and I see no sign of self-recovery

2023-11-05 04:29:24,004 ERROR: Stale serial for package mpf - Attempt 3 (package.py:76)
2023-11-05 04:29:24,005 ERROR: Stale serial for mpf (20485085) not updating. Giving up. (package.py:82)
> curl 'https://pypi.org/pypi/mpf/json' -I | grep x-pypi-last-serial
x-pypi-last-serial: 20485084

@TechCiel
Copy link

Another package mpf is failing for 40+ hours, and I see no sign of self-recovery

Still.

@TechCiel
Copy link

TechCiel commented Feb 6, 2024

The mpf issue has been unintentionally resolved with an extra update of package.

Sadly, we now have another issue:

2024-02-05 20:19:26,441 ERROR: Stale serial for proteus-cli (21741396) not updating. Giving up. (package.py:82)

@TechCiel
Copy link

2024-04-16 10:36:39,262 ERROR: Stale serial for tccli (22781647) not updating. Giving up. (package.py:82)

New failure with the proteus-cli one still not resolved.

@epifeny
Copy link

epifeny commented May 19, 2024

Is the way to go about this to always just manually come here and request a purge? There is nothing on our client end that can assist with these issues?

bandersnatch.master.StalePage: Expected PyPI serial 14647759 for request https://pypi.org/pypi/pytranslation/json but got 13365137. We can no longer issue a PURGE. Report issue to PyPA Warehouse GitHub if it persists ...

@dstufft
Copy link
Member

dstufft commented May 31, 2024

With #13936 merged now (along with a few follow up PRs to fix some deadlocks), I think that the primary cause of this has been fixed now.

The tl;dr is that our mirroring relied on the serial to be a monotonically increasing integer, but due to the way PostgreSQL works, concurrent transactions could end up with serials being "out of order", and #13936 changes that so that transactions that generate new serial numbers are serialized behind what is effectively a mutex.

I'm going to close this now, but if anyone sees any new reports of this happening after today, we can re-open this issue.

@dstufft dstufft closed this as completed May 31, 2024
@TechCiel
Copy link

2024-09-24 10:36:00,003 ERROR: Error syncing package: github@25164033
 (mirror.py:377)
bandersnatch.master.StalePage: Expected PyPI serial 25164033 for requ
est https://pypi.org/pypi/github/json but got 25138863.

Likely but not sure, another instance of this problem, still ongoing. Could someone look into the database?

cc @dstufft

@ewdurbin
Copy link
Member Author

That's a bug for sure caused by a recent merge that causes journal to proceed but not purge for a project.

@TechCiel
Copy link

Could you help with purging the package while fixing the cause?

@ewdurbin
Copy link
Member Author

Purges are being issued shortly.

@ewdurbin
Copy link
Member Author

Should be cleared now. Thanks for the vigilance on this issue @TechCiel, let us know if any more crop up.

@TechCiel
Copy link

Should be cleared now.

It now shows x-pypi-last-serial: 14353030 ? 🤔 Should it be monotically increasing?

Thanks for the vigilance

That's because I maintain a PyPI mirror, and bandersnatch will fail the whole process in such cases.

@ewdurbin
Copy link
Member Author

Should be cleared now.

It now shows x-pypi-last-serial: 14353030 ? 🤔 Should it be monotically increasing?

14353030 is the last valid journal entry for the github project. The bug that was introduced associated an unrelated change with that project.

@TechCiel
Copy link

14353030 is the last valid journal entry for the github project. The bug that was introduced associated an unrelated change with that project.

I think increasing serial is kind of API promise of pypi.org, and bandersnatch is still expecting for 25164033. The XMLRPC changelog API could still be returning this serial.

@TechCiel
Copy link

bandersnatch.master.StalePage: Expected PyPI serial 25483353 for requ
est https://pypi.org/pypi/github/json but got 14353030.

The github package got bumped to a new serial and is failing again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants