Reduce queries for published graphs when indexing #11513

jacobtylerwalls · 2024-10-01T00:32:30Z

Types of changes

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Description of Change

Before, indexing resources took 2 queries per graph to get the published graph in the active language: one for the GraphXPublishedGraph row and one for the PublishedGraph row. The result was then cached in memory, persisting either for a short period (on the CLI) or potentially forever for server-side operations, which could lead to retrieving stale data.

Now, these are prefetched during optimize_resource_iteration, so it takes 2 queries total per chunk, no matter how many graphs are involved. A typical chunk size is 250 (2000 // 8). If two adjacent chunks each contain 250 resources of the same graph, this might slightly increase queries, because we are no longer perpetually caching. The benefit in that case is less memory overhead and better correctness for server-side operations. In all other cases this should reduce queries by avoiding repetitive published graph queries.
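To make the query arithmetic concrete, here is a minimal sketch that only counts queries. It does not touch the real models or optimize_resource_iteration: `FakeDB`, `fetch_one`, `fetch_many`, `index_with_cache`, and `index_with_prefetch` are hypothetical names invented for illustration, and the "before" case assumes a cache that never expires.

```python
from itertools import islice


class FakeDB:
    """Hypothetical stand-in for the database that only counts queries."""

    def __init__(self, published_by_graph):
        self.published_by_graph = published_by_graph
        self.query_count = 0

    def fetch_one(self, graph_id):
        # Old shape: one query for the GraphXPublishedGraph row,
        # one for the PublishedGraph row.
        self.query_count += 2
        return self.published_by_graph[graph_id]

    def fetch_many(self, graph_ids):
        # New shape: the same two lookups, each issued once for
        # every graph in the chunk.
        self.query_count += 2
        return {gid: self.published_by_graph[gid] for gid in graph_ids}


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def index_with_cache(db, resource_graph_ids):
    """Before: per-graph lookup, cached for the whole run (assumption)."""
    cache = {}
    for gid in resource_graph_ids:
        if gid not in cache:
            cache[gid] = db.fetch_one(gid)
    return db.query_count


def index_with_prefetch(db, resource_graph_ids, chunk_size):
    """After: one prefetch per chunk, nothing kept between chunks."""
    for chunk in chunked(resource_graph_ids, chunk_size):
        published = db.fetch_many(set(chunk))
        for gid in chunk:
            published[gid]  # already in memory, no query
    return db.query_count


PUBLISHED = {f"graph-{i}": f"published-{i}" for i in range(5)}

# One chunk of 10 resources spread over 5 graphs.
many_graphs = [f"graph-{i % 5}" for i in range(10)]
before_many = index_with_cache(FakeDB(PUBLISHED), many_graphs)
after_many = index_with_prefetch(FakeDB(PUBLISHED), many_graphs, chunk_size=10)

# Two adjacent chunks that contain only one graph.
one_graph = ["graph-0"] * 20
before_one = index_with_cache(FakeDB(PUBLISHED), one_graph)
after_one = index_with_prefetch(FakeDB(PUBLISHED), one_graph, chunk_size=10)
```

In this toy model the mixed chunk drops from 10 queries to 2, while the single-graph case rises from 2 to 4, which is the small regression described above.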

Checklist

I targeted one of these branches:
- dev/8
I added a changelog in arches/releases
I submitted a PR to arches-docs (if appropriate)
Unit tests pass locally with my changes
I added tests that prove my fix is effective or that my feature works
My test fails on the target branch

Ticket Background

Sponsored by: Farallon
Found by: @chiatt
Tested by: @jacobtylerwalls

arches/app/utils/index_database.py

jacobtylerwalls · 2024-10-17T14:59:27Z

After pairing with @apeters on this, I think we're going to go in the direction of:

pulling this query for published graphs out of the resource batch iteration and doing it earlier, passing the result as an arg to index_resources_using_singleprocessing
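A rough sketch of that direction, assuming the lookup can be resolved once by the caller and handed down. `fetch_published_graphs` and `index_batches` are hypothetical stand-ins, not the actual signature of index_resources_using_singleprocessing.

```python
def fetch_published_graphs(graph_ids, query_log):
    """Hypothetical up-front lookup; records each call in query_log."""
    query_log.append(sorted(graph_ids))
    return {gid: f"published:{gid}" for gid in graph_ids}


def index_batches(batches, published_graphs):
    """Hypothetical indexer that receives the lookup as an argument."""
    documents = []
    for batch in batches:
        for resource_id, graph_id in batch:
            # No query here: the caller already resolved the published graph.
            documents.append((resource_id, published_graphs[graph_id]))
    return documents


# Each resource is modeled as a (resource_id, graph_id) pair.
batches = [[("r1", "g1"), ("r2", "g2")], [("r3", "g1")]]
query_log = []

# Resolve every graph once, before iterating over any batch.
graph_ids = {graph_id for batch in batches for _, graph_id in batch}
published = fetch_published_graphs(graph_ids, query_log)
documents = index_batches(batches, published)
```

With this shape the number of lookups no longer depends on how many batches there are, at the cost of holding the published graphs in memory for the duration of the call.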

Reduce queries for published graphs when indexing

037bf7a

jacobtylerwalls requested a review from chiatt October 1, 2024 00:32

jacobtylerwalls added the Subject: Performance label Oct 1, 2024

jacobtylerwalls commented Oct 1, 2024

jacobtylerwalls requested review from apeters and removed request for chiatt October 15, 2024 22:41

jacobtylerwalls marked this pull request as draft October 17, 2024 14:59
