Skip to content
This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Add individual command logging for index/constraint application step #752

Closed
1 task
AetherUnbound opened this issue Jun 10, 2022 · 4 comments · Fixed by #753
Closed
1 task

Add individual command logging for index/constraint application step #752

AetherUnbound opened this issue Jun 10, 2022 · 4 comments · Fixed by #753
Assignees
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟧 priority: high Stalls work on the project or its dependents 💾 tech: postgres Requires familiarity with PostgreSQL 🐍 tech: python Requires familiarity with Python

Comments

@AetherUnbound
Copy link
Contributor

Description

As part of the investigation around the API response time jumping dramatically during a data refresh, we identified that the index & constraint application step specifically was causing the API degradation. I suspect that there may be a single index/constraint application that is causing the slowness, rather than the entire set of ALTERs. In order to test this hypothesis, we need accurate logging for when each step is run.

Specifically, these two query executions should be split up into the individual alters, and each one should be logged when it is run:

downstream_cur.execute(";\n".join(create_indices))

downstream_cur.execute(remap_constraints)

Additional context

Similar logging was added for the go-live step in #415

Implementation

  • 🙋 I would be interested in implementing this feature.
@AetherUnbound AetherUnbound added 🐍 tech: python Requires familiarity with Python 💻 aspect: code Concerns the software code in the repository 💾 tech: postgres Requires familiarity with PostgreSQL 🟧 priority: high Stalls work on the project or its dependents 🧰 goal: internal improvement Improvement that benefits maintainers, not users labels Jun 10, 2022
@sarayourfriend
Copy link
Contributor

Specifically, these two query executions should be split up into the individual alters, and each one should be logged when it is run:

Please log when it is run and when it finishes, maybe even log the delta if possible just so it's nice and easy to diagnose if there's an issue there 🙂

@AetherUnbound
Copy link
Contributor Author

Ah, they're run immediately after each other, so I was thinking the logs in CloudWatch would end up looking something like this:

2022-05-27 05:23:00,000 INFO ingest.py:356 - ALTER <column> ADD INDEX <foo>
2022-05-27 05:23:10,000 INFO ingest.py:356 - ALTER <column> ADD INDEX <bar>
2022-05-27 05:23:20,000 INFO ingest.py:356 - ALTER <column> ADD INDEX <baz>
2022-05-27 05:23:30,000 INFO ingest.py:356 - ALTER <column> ADD CONSTRAINT <zip>

Do you think that'd be sufficient? That would give us the exact indications of when each one starts, and since these are run serially the end can be assumed to just before the next line starts.

@sarayourfriend
Copy link
Contributor

I guess for all except the last one?

@AetherUnbound
Copy link
Contributor Author

These are the only steps run after before another log line appears:

_update_progress(progress, 70.0)
# Step 7: Recreate constraints from the original table
log.info("Done creating indices! Remapping constraints...")

_update_progress just sets a value:

def _update_progress(progress, new_value):
if progress:
progress.value = new_value

I feel like that gives us a good enough time bounding. If anything though, I can swap the log line and the _update_progress one.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟧 priority: high Stalls work on the project or its dependents 💾 tech: postgres Requires familiarity with PostgreSQL 🐍 tech: python Requires familiarity with Python
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants