-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Task]: run load in chunks to prevent Oracle error ORA-01555 #1978
Labels
project: grants.gov
Grants.gov Modernization tickets
Comments
github-project-automation
bot
moved this to Icebox
in Simpler.Grants.gov Product Backlog
May 9, 2024
jamesbursa
added a commit
that referenced
this issue
May 10, 2024
jamesbursa
added a commit
that referenced
this issue
May 10, 2024
## Summary Fixes #1978 ## Changes proposed - Limit the query which inserts new rows to 4000 by default. - Repeat it until there are no more new rows in the source table. ## Context for reviewers If there are many new rows in a large table (for example the initial load for `tsynopsis`), an Oracle error occurs when preparing the data. To prevent this, process a maximum number of rows at a time until all new rows are processed. ## Additional information Tested locally but will need to be tested in prod with the real source data and database.
github-project-automation
bot
moved this from In Progress
to Done
in Simpler.Grants.gov Product Backlog
May 10, 2024
jamesbursa
added a commit
that referenced
this issue
May 10, 2024
jamesbursa
added a commit
that referenced
this issue
May 10, 2024
## Summary Fixes #1978 ## Changes proposed - Optimize chunked load further by moving chunking logic from database to PostgreSQL. ## Context for reviewers Instead of using `LIMIT` to carry out chunking in PostgreSQL, read the full set of ids as a first step, then issue a series of INSERT / UPDATE queries. This is expected to be faster. With the previous method, the PostgreSQL optimizer did not do an ideal plan, and did a full read of all rows and columns from the Oracle database. By splitting the query, we can do a read of only the id columns for the new or updated rows. Then additional queries select only the rows that have changed. ## Additional information N/A
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Summary
Running the load in
prod
failed for some tables with an Oracle error. See #1956.The solution is to limit the size of each insert and update by dividing it into chunks.
Acceptance criteria
load_data_for_table()
and related code modified to load data in chunksThe text was updated successfully, but these errors were encountered: