Add index metadata for PostgreSQL #144
@@ -52,6 +55,25 @@ def add_ugc_runner(extractors: list, conf: ConfigTree, connection):
    return extractors, conf


def add_indexes(extractors: list, conf: ConfigTree, connection):
Ha. Thanks for following my pattern here - it is a bit weird though, isn't it? Probably would make sense to refactor at some point...
Well, I am sure it could be improved, but I don't think it should block us from making progress. Let me see if I can come up with a refactor.
Oh no didn't mean to block the review. Just got bogged down checking things before merging :)
Okay @peterthesling one thing we should discuss - running this multiple times against a single index just appends those indices, which means that the index list just gets longer and longer with each pull. My gut says the best way to get around this is to modify the extractor to return a Let me know what you think @peterthesling. :) I'm happy to take this on, btw, just let me know.
Codecov Report
@@             Coverage Diff             @@
##           master     #144      +/-   ##
===========================================
+ Coverage   53.67%   75.68%   +22.00%
===========================================
  Files          49       38      -11
  Lines        2541     2044     -497
===========================================
+ Hits         1364     1547     +183
+ Misses       1177      497     -680
if not sections[INDEX_SECTION]:
    sections[INDEX_SECTION] = INDEX_DELIMITER + "\n"

sections[INDEX_SECTION] = INDEX_DELIMITER + "\n"
Ah this almost works, except if there are multiple indices, only the last one shows up :/
🤦 Will take another look today.
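For anyone following along, here is a minimal sketch of one way to handle both issues raised in this thread (the list growing on every pull, and only the last index showing up). It is not the code that was merged: `rebuild_index_section` is a hypothetical helper, the values of `INDEX_DELIMITER` and `INDEX_SECTION` are placeholders since the real ones are not shown here, and it assumes each index has already been rendered to one line of text. The idea is to rebuild the section from the full list on every run instead of appending to, or overwriting, what is already there.

```python
# Placeholder values; the real constants are not shown in this thread.
INDEX_DELIMITER = "-" * 20
INDEX_SECTION = "index"


def rebuild_index_section(sections: dict, index_lines: list) -> dict:
    """Replace the index section with exactly the given lines.

    Rebuilding from scratch keeps repeated runs idempotent, and joining
    all lines keeps every index rather than only the last one.
    """
    # dict.fromkeys drops duplicates while preserving the original order.
    unique_lines = list(dict.fromkeys(index_lines))
    body = "\n".join(unique_lines)
    sections[INDEX_SECTION] = INDEX_DELIMITER + "\n" + body + ("\n" if body else "")
    return sections


if __name__ == "__main__":
    sections = {}
    lines = ["`idx_a` [`col_1`]", "`idx_b` [`col_1`, `col_2`]"]
    rebuild_index_section(sections, lines)
    rebuild_index_section(sections, lines)  # second pull leaves the section unchanged
    print(sections[INDEX_SECTION])
```

Calling it twice with the same list leaves the section identical, so the output no longer depends on how many times the scrape has run.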
aec05cb to d7c84e4
d7c84e4 to ed41557
Love it! Thanks, Peter. :) Merging now!
This PR adds the ability to scrape index metadata from a PostgreSQL instance.
Structure of the code:
- Added `add_indexes` to `configure_bigquery_extractor`, which adds index information to the table metadata.
- `add_indexes` adds `PostgresIndexExtractor` as an additional extractor. This is done via a dict, meaning it will be easy to add e.g. a `BigQueryIndexExtractor`.
- `PostgresIndexExtractor` inherits from `IndexExtractor` and extends it with a SQL query that gets all the index information we need, e.g. a list of indexes incl. name & which columns are included in an index.
- `IndexExtractor` itself contains mostly boilerplate code to initiate a connection and interact with the iterator and `SQLAlchemyEngine`.
- In `WhaleLoader`, `PostgresIndexExtractor` is run, since it was added to the list of extractors through `add_indexes`.
- `IndexMetadata` is the index class, which stores all information contained inside an index.
- Added a `format_for_markdown` function to `IndexMetadata`, where the index is transformed into a string like [unique] `index_name` [`column_1`, `column_2`]
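
To make the last two points concrete, here is a rough sketch of what an index class with a `format_for_markdown` method could look like. It is an illustration, not the class from this PR: the field names `name`, `columns`, and `unique` are assumptions, and it treats `[unique]` in the format above as an optional marker that only appears for unique indexes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexMetadata:
    """Illustrative stand-in; field names are assumptions, not the PR's."""

    name: str
    columns: List[str] = field(default_factory=list)
    unique: bool = False

    def format_for_markdown(self) -> str:
        # Assumption: the "unique" marker is emitted only for unique indexes.
        prefix = "unique " if self.unique else ""
        columns = ", ".join(f"`{column}`" for column in self.columns)
        return f"{prefix}`{self.name}` [{columns}]"


if __name__ == "__main__":
    index = IndexMetadata(
        name="idx_users_email",
        columns=["email", "tenant_id"],
        unique=True,
    )
    print(index.format_for_markdown())
```

Keeping the formatting on the metadata class means a loader only has to call one method per index when it writes the markdown section.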
What will be left for future PRs: