
[Task]: run load in chunks to prevent Oracle error ORA-01555 #1978

Closed
jamesbursa opened this issue May 9, 2024 · 0 comments · Fixed by #1981 and #1984

@jamesbursa (Collaborator) commented:

Summary

Running the load in prod failed for some tables with an Oracle error. See #1956.

```
psycopg.errors.FdwUnableToCreateExecution: error fetching result: OCIStmtFetch2 failed to fetch next result row
DETAIL: ORA-01555: snapshot too old: rollback segment number 21 with name "_SYSSMU21_2881811878$" too small
```

The solution is to limit the size of each insert and update by splitting the work into chunks. ORA-01555 occurs when a statement runs long enough that Oracle overwrites the undo (rollback) data needed to keep its read-consistent snapshot, so keeping each statement small keeps every fetch short.

Acceptance criteria

  • load_data_for_table() and related code modified to load data in chunks
  • Verified successful load in prod
@jamesbursa jamesbursa added the project: grants.gov Grants.gov Modernization tickets label May 9, 2024
@jamesbursa jamesbursa added this to the Search API - ELT Implementation milestone May 9, 2024
@jamesbursa jamesbursa moved this from Icebox to In Progress in Simpler.Grants.gov Product Backlog May 9, 2024
@jamesbursa jamesbursa self-assigned this May 9, 2024
jamesbursa added a commit that referenced this issue May 10, 2024
## Summary
Fixes #1978

## Changes proposed
- Limit the query that inserts new rows to 4000 rows at a time by default.
- Repeat it until there are no more new rows in the source table.

## Context for reviewers
If there are many new rows in a large table (for example, the initial
load for `tsynopsis`), an Oracle error occurs while preparing the data.
To prevent this, process at most a fixed number of rows at a time until
all new rows are processed.
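
A minimal sketch of this pattern, assuming a psycopg (v3) connection and a hypothetical oracle_fdw foreign table `foreign_tsynopsis`; the table name, column names, and 4000-row default are illustrative, not the repository's actual code:

```python
import psycopg

CHUNK_SIZE = 4000  # illustrative default batch size


def load_new_rows(conn: psycopg.Connection) -> None:
    """Insert new source rows in bounded batches until none remain."""
    while True:
        with conn.transaction():
            cur = conn.execute(
                """
                INSERT INTO tsynopsis (opportunity_id, data)
                SELECT f.opportunity_id, f.data
                FROM foreign_tsynopsis f
                LEFT JOIN tsynopsis t USING (opportunity_id)
                WHERE t.opportunity_id IS NULL
                LIMIT %s
                """,
                (CHUNK_SIZE,),
            )
        if cur.rowcount == 0:  # a pass that inserts nothing means we're done
            break
```

Each pass commits on its own, so no single statement holds an Oracle cursor open for the duration of the whole load.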

## Additional information
Tested locally but will need to be tested in prod with the real source
data and database.
jamesbursa added a commit that referenced this issue May 10, 2024
## Summary
Fixes #1978

## Changes proposed
- Optimize the chunked load further by moving the chunking logic from
the PostgreSQL query into the application code.

## Context for reviewers
Instead of using `LIMIT` to carry out chunking in PostgreSQL, read the
full set of ids as a first step, then issue a series of INSERT / UPDATE
queries.

This is expected to be faster. With the previous method, the PostgreSQL
optimizer did not produce an ideal plan: it performed a full read of all
rows and columns from the Oracle database. By splitting the query, we
can read only the id columns of the new or updated rows, then issue
additional queries that select only the rows that have changed.

## Additional information
N/A