GC can delete images that are not supposed to be deleted. #20547

Closed
Vad1mo opened this issue Jun 4, 2024 · 5 comments
Vad1mo commented Jun 4, 2024

Expected behavior and actual behavior:
We have observed several times that the GC can delete images that were not scheduled for deletion.
As a result, the image information is still present in the Harbor UI and DB, but the data is missing from S3.

The GC logs also show that the manifest was deleted.

Steps to reproduce the problem:

In the UI the image is still visible
image

The image manifest SHA starts with d4f9a6cf.

#GC LOG
2024-06-04T04:13:09Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:238]: blob eligible for deletion: sha256:d4f9a6cf78a2482148fd3a429c1d2019bf27a3cee1dc74856344a5e03c521585

2024-06-04T04:14:30Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:366]: [108/1438] delete blob from storage: sha256:d4f9a6cf78a2482148fd3a429c1d2019bf27a3cee1dc74856344a5e03c521585
2024-06-04T04:14:30Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:395]: [108/1438] delete blob record from database: 5040, sha256:d4f9a6cf78a2482148fd3a429c1d2019bf27a3cee1dc74856344a5e03c521585

Trying to pull this image results in `manifest unknown` instead of the `not found` error that is returned when an image no longer exists.

We observed that other operations were running at the DB level during the GC run, and the error below indicates that we had run out of DB connections.

```sh
2024-06-04T04:22:39Z [ERROR] [/pkg/notifier/notifier.go:203]: Error occurred when triggering handler *artifact.Handler of topic PUSH_ARTIFACT: failed to connect to `host=harbor-pg-database user=harbor database=harbor`: server error (FATAL: remaining connection slots are reserved for non-replication superuser connections (SQLSTATE 53300))
```

Versions:

  • harbor version: 2.7.x, 2.9.x, 2.10.

Additional context:

  • Log files: No other errors in the logs besides DB SQLSTATE 53300
  • GC Job completed successfully

Vad1mo commented Jun 4, 2024

maybe related to beego/beego#5255 resolved by #20452

wy65701436 commented

Similar issue: #19401

wy65701436 self-assigned this Jun 5, 2024

wy65701436 commented Jun 5, 2024

The issue may be caused by the Beego ORM, which does not propagate errors that occur while scanning query results. In some extreme cases, such as when a connection becomes unusable, the ORM returns incorrect data, which leads to wrong blob deletion candidates. We're working on upgrading Beego in pull request #20555.
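As a generic illustration of this failure mode (not Harbor's or Beego's actual code), the sketch below shows why an unreported scan error is dangerous for GC. The `rowIter` interface is a hypothetical stand-in that mirrors the `Next`/`Scan`/`Err` shape of `database/sql`'s `*sql.Rows`, and `flakyRows` simulates a connection that breaks partway through a result set; the digests are made up. If the caller skips the `Err` check after the loop, a truncated list of referenced blobs looks complete, and everything missing from it would appear eligible for deletion.

```go
package main

import (
	"errors"
	"fmt"
)

// rowIter mimics the Next/Scan/Err shape of *sql.Rows (hypothetical stand-in
// so the example runs without a database).
type rowIter interface {
	Next() bool
	Scan(dest *string) error
	Err() error
}

// flakyRows yields digests until index failAt, then stops with an error,
// simulating a connection that is lost in the middle of a result set.
type flakyRows struct {
	digests []string
	pos     int
	failAt  int // index at which iteration fails; -1 means never
	err     error
}

func (r *flakyRows) Next() bool {
	if r.failAt >= 0 && r.pos == r.failAt {
		r.err = errors.New("connection lost while reading rows")
		return false
	}
	if r.pos >= len(r.digests) {
		return false
	}
	r.pos++
	return true
}

func (r *flakyRows) Scan(dest *string) error {
	*dest = r.digests[r.pos-1]
	return nil
}

func (r *flakyRows) Err() error { return r.err }

// referencedBlobs collects the digests that are still in use. The Err check
// after the loop is the important part: without it, a truncated result set
// is indistinguishable from a complete one.
func referencedBlobs(rows rowIter) (map[string]bool, error) {
	inUse := make(map[string]bool)
	for rows.Next() {
		var d string
		if err := rows.Scan(&d); err != nil {
			return nil, err
		}
		inUse[d] = true
	}
	if err := rows.Err(); err != nil {
		return nil, fmt.Errorf("reading referenced blobs: %w", err)
	}
	return inUse, nil
}

func main() {
	rows := &flakyRows{digests: []string{"sha256:aaa", "sha256:bbb", "sha256:ccc"}, failAt: 1}
	inUse, err := referencedBlobs(rows)
	if err != nil {
		// Abort instead of treating unlisted blobs as garbage.
		fmt.Println("aborting GC:", err)
		return
	}
	fmt.Println("blobs in use:", len(inUse))
}
```

Here the run aborts after reading one of three rows; a version without the `Err` check would instead return a one-entry map and no error.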

To mitigate the issue for now, you can schedule garbage collection during low-usage time slots.


zyyw commented Jun 16, 2024

> maybe related to beego/beego#5255 resolved by #20452

We have this PR for it:


This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions bot added the Stale label Aug 15, 2024
Vad1mo mentioned this issue Aug 15, 2024
Vad1mo closed this as completed Aug 15, 2024
github-project-automation bot moved this from Issues to Completed in GC Improvement Activities Aug 15, 2024