Improve performance of garbage collection job #4780

Closed
bjester opened this issue Oct 8, 2024 · 1 comment

bjester commented Oct 8, 2024

Background

We run the garbage_collect management command every day via a Kubernetes cron job.

Observed behavior

The management command gets stuck on the following query. In production it ran for at least 12 hours with no indication that it would ever complete.

UPDATE "contentcuration_contentnode" 
SET "parent_id" = '00000000000000000000000000000000' 
WHERE (
  SELECT U0."id" 
  FROM "contentcuration_channel" U0 
  INNER JOIN "contentcuration_contentnode" U1 ON (U0."main_tree_id" = U1."id") 
  WHERE U1."tree_id" = "contentcuration_contentnode"."tree_id" LIMIT 1
) IN (
  '<REDACTED>', 
  '<REDACTED>', 
  '<REDACTED>', 
  '<REDACTED>'
)

Expected behavior

The management command and the queries it produces should be optimized for our large Studio database. In particular, it should avoid correlated subqueries like the one above, which is re-evaluated for every content node in the result set.
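To illustrate the optimization direction (this is a sketch, not the actual fix that landed; the schema is heavily simplified and the `GARBAGE_ROOT` / `deleted_channel_ids` names are hypothetical), the channel-to-tree lookup can be resolved once into a flat list of `tree_id` values, and the `UPDATE` then becomes a simple indexed `IN` filter with no per-row subquery. A minimal runnable demonstration using sqlite3 (Studio itself uses Django on PostgreSQL):

```python
import sqlite3

# Minimal stand-in schema for the two tables in the slow query
# (columns simplified; the real Studio schema has many more fields).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contentcuration_channel (
    id TEXT PRIMARY KEY,
    main_tree_id TEXT
);
CREATE TABLE contentcuration_contentnode (
    id TEXT PRIMARY KEY,
    tree_id INTEGER,
    parent_id TEXT
);
""")

# One channel whose main tree root is node "n1" (tree 7),
# plus nodes in that tree and one node in an unrelated tree.
conn.execute("INSERT INTO contentcuration_channel VALUES ('chan1', 'n1')")
conn.executemany(
    "INSERT INTO contentcuration_contentnode VALUES (?, ?, ?)",
    [("n1", 7, None), ("n2", 7, "n1"), ("n3", 8, None)],
)

GARBAGE_ROOT = "00000000000000000000000000000000"  # hypothetical constant
deleted_channel_ids = ["chan1"]  # channels being garbage-collected

# Step 1: resolve the channel -> main tree lookup ONCE, up front,
# yielding a flat list of tree_ids.
placeholders = ",".join("?" * len(deleted_channel_ids))
tree_ids = [
    row[0]
    for row in conn.execute(
        f"""SELECT n.tree_id
            FROM contentcuration_channel c
            JOIN contentcuration_contentnode n ON c.main_tree_id = n.id
            WHERE c.id IN ({placeholders})""",
        deleted_channel_ids,
    )
]

# Step 2: a plain UPDATE against the tree_id column -- no correlated
# subquery evaluated for each candidate row.
placeholders = ",".join("?" * len(tree_ids))
conn.execute(
    f"""UPDATE contentcuration_contentnode
        SET parent_id = ?
        WHERE tree_id IN ({placeholders})""",
    [GARBAGE_ROOT, *tree_ids],
)
conn.commit()

moved = conn.execute(
    "SELECT id FROM contentcuration_contentnode WHERE parent_id = ? ORDER BY id",
    (GARBAGE_ROOT,),
).fetchall()
print(moved)  # nodes n1 and n2 (tree 7) now hang off the garbage root
```

The two-step shape also keeps each statement short-lived, which matters for the transaction-length concern described below.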

User-facing consequences

Since the queries occur within a transaction, a very long-running transaction modifying the content node table can cause issues and timeouts across Studio, leading to indirect Sentry errors.

Additionally, I had to manually kill the query in Cloud SQL.

Steps to reproduce the issue

  1. Restored production database to hotfixes environment
  2. Manually started the job
  3. Observed running queries

bjester commented Nov 14, 2024

Running the job with fix #4808 on the hotfixes server has proceeded well past the previous point of failure.

2 participants