
[Task]: run load in chunks to prevent Oracle error ORA-01555 #1978

Closed
jamesbursa opened this issue May 9, 2024 · 0 comments · Fixed by #1981 and #1984

@jamesbursa (Collaborator) commented:

Summary

Running the load in prod failed for some tables with an Oracle error. See #1956.

```
psycopg.errors.FdwUnableToCreateExecution: error fetching result: OCIStmtFetch2 failed to fetch next result row
DETAIL: ORA-01555: snapshot too old: rollback segment number 21 with name "_SYSSMU21_2881811878$" too small
```

The solution is to limit the size of each insert and update by splitting the work into chunks. ORA-01555 occurs when a statement runs long enough that Oracle overwrites the undo (rollback) data needed to keep its read-consistent snapshot, so keeping each statement small keeps every fetch short.

Acceptance criteria

  • load_data_for_table() and related code modified to load data in chunks
  • Verified successful load in prod
@jamesbursa jamesbursa added the project: grants.gov Grants.gov Modernization tickets label May 9, 2024
@jamesbursa jamesbursa added this to the Search API - ELT Implementation milestone May 9, 2024
@jamesbursa jamesbursa moved this from Icebox to In Progress in Simpler.Grants.gov Product Backlog May 9, 2024
@jamesbursa jamesbursa self-assigned this May 9, 2024
jamesbursa added a commit that referenced this issue May 10, 2024
## Summary
Fixes #1978

## Changes proposed
- Limit the query that inserts new rows to 4000 rows at a time by default.
- Repeat it until there are no more new rows in the source table.

## Context for reviewers
If there are many new rows in a large table (for example, the initial
load for `tsynopsis`), an Oracle error occurs while preparing the data.
To prevent this, process at most a fixed number of rows at a time until
all new rows are processed.
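
A minimal sketch of this pattern, assuming a psycopg (v3) connection and a hypothetical oracle_fdw foreign table `foreign_tsynopsis`; the table name, column names, and 4000-row default are illustrative, not the repository's actual code:

```python
import psycopg

CHUNK_SIZE = 4000  # illustrative default batch size


def load_new_rows(conn: psycopg.Connection) -> None:
    """Insert new source rows in bounded batches until none remain."""
    while True:
        with conn.transaction():
            cur = conn.execute(
                """
                INSERT INTO tsynopsis (opportunity_id, data)
                SELECT f.opportunity_id, f.data
                FROM foreign_tsynopsis f
                LEFT JOIN tsynopsis t USING (opportunity_id)
                WHERE t.opportunity_id IS NULL
                LIMIT %s
                """,
                (CHUNK_SIZE,),
            )
        if cur.rowcount == 0:  # a pass that inserts nothing means we're done
            break
```

Each pass commits on its own, so no single statement holds an Oracle cursor open for the duration of the whole load.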

## Additional information
Tested locally but will need to be tested in prod with the real source
data and database.
jamesbursa added a commit that referenced this issue May 10, 2024
## Summary
Fixes #1978

## Changes proposed
- Optimize the chunked load further by moving the chunking logic from
the PostgreSQL query into the application code.

## Context for reviewers
Instead of using `LIMIT` to carry out chunking in PostgreSQL, read the
full set of ids as a first step, then issue a series of INSERT / UPDATE
queries.

This is expected to be faster. With the previous method, the PostgreSQL
optimizer did not produce an ideal plan: it performed a full read of all
rows and columns from the Oracle database. By splitting the query, we
can read only the id columns of the new or updated rows, then issue
additional queries that select only the rows that have changed.

## Additional information
N/A