Add index metadata for PostgreSQL #144

peterthesling · 2021-02-15T17:24:58Z

This PR adds the ability to scrape index metadata of a PostgreSQL instance, like this:

Structure of the code:

I add add_indexes to configure_bigquery_extractor, which adds index information to the table metadata.
add_indexes adds PostgresIndexExtractor as an additional extractor. This is done via a dict, meaning it will be easy to add a e.g. BigQueryIndexExtractor.
PostgresIndexExtractor inherits from IndexExtractor and expands on IndexExtractor by a SQL query that gets all the index information we need, e.g. a list of indexes incl. name & which columns are included in an index.
IndexExtractor itself contains mostly boilerplate code to initiate a connection and interact with the iterator and SQLAlchemyEngine.
Inside WhaleLoader PostgresIndexExtractor is run, since it was added to the list of extractors through add_indexes.
IndexMetadata is the index class, which stores all information contained inside an index.
Similar to Moving formatting of columns and table to corresponding classes and outside of WhaleLoader #143 I added a format_for_markdown function to IndexMetadata, where the index is transformed into a string like [unique] `index_name` [`column_1`, `column_2`]

What will be left for future PRs:

I think it would be helpful to also add information if an index is a primary key.
This PR only adds the ability for PostgreSQL to see indexes. All other SQL dialects should be worked on next.

rsyi · 2021-02-25T06:47:28Z

pipelines/whale/utils/extractor_wrappers.py

@@ -52,6 +55,25 @@ def add_ugc_runner(extractors: list, conf: ConfigTree, connection):
    return extractors, conf


+def add_indexes(extractors: list, conf: ConfigTree, connection):


Ha. Thanks for following my pattern here - it is a bit weird though, isn't it? Probably would make sense to refactor at some point...

Well, I am sure it could be improved, but I don't think it should block us from making progress. Let me see if I can come up with a refactor.

Oh no didn't mean to block the review. Just got bogged down checking things before merging :)

rsyi · 2021-02-25T16:34:07Z

Okay @peterthesling one thing we should discuss - running this multiple times against a single index just appends those indices, which means that the index list just gets longer and longer with each pull.

My gut says the best way to get around this is to modify the extractor to return a IndexSet object instead, then process this all at once in the loader, overwriting the previous indices. Hackier options probably also exist.

Let me know what you think @peterthesling. :) I'm happy to take this on, btw, just let me know.

codecov-io · 2021-02-27T23:36:39Z

Codecov Report

Merging #144 (1c7b95a) into master (6760c72) will increase coverage by 22.00%.
The diff coverage is 98.46%.

@@             Coverage Diff             @@
##           master     #144       +/-   ##
===========================================
+ Coverage   53.67%   75.68%   +22.00%     
===========================================
  Files          49       38       -11     
  Lines        2541     2044      -497     
===========================================
+ Hits         1364     1547      +183     
+ Misses       1177      497      -680

Impacted Files	Coverage Δ
pipelines/whale/utils/parsers.py	`94.80% <66.66%> (-1.15%)`	⬇️
...elines/whale/extractor/postgres_index_extractor.py	`95.65% <95.65%> (ø)`
pipelines/whale/extractor/base_index_extractor.py	`100.00% <100.00%> (ø)`
pipelines/whale/loader/whale_loader.py	`64.00% <100.00%> (+4.38%)`	⬆️
pipelines/whale/models/index_metadata.py	`100.00% <100.00%> (ø)`
pipelines/whale/utils/extractor_wrappers.py	`75.53% <100.00%> (+2.10%)`	⬆️
pipelines/whale/utils/markdown_delimiters.py	`100.00% <100.00%> (ø)`
cli/src/filesystem.rs
cli/src/skimmer.rs
cli/src/warehouse/buildscript.rs
... and 31 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c2c3da7...1c7b95a. Read the comment docs.

rsyi · 2021-02-28T06:38:25Z

pipelines/whale/loader/whale_loader.py

-    if not sections[INDEX_SECTION]:
-        sections[INDEX_SECTION] = INDEX_DELIMITER + "\n"
-
+    sections[INDEX_SECTION] = INDEX_DELIMITER + "\n"


Ah this almost works, except if there are multiple indices, only the last one shows up :/

🤦 Will take another look today.

rsyi · 2021-03-01T23:57:12Z

Love it! Thanks, Peter. :) Merging now!

peterthesling changed the title ~~Add index meta data for Postgres~~ Add index metadata for Postgres Feb 15, 2021

peterthesling changed the title ~~Add index metadata for Postgres~~ Add index metadata for PostgreSQL Feb 15, 2021

rsyi reviewed Feb 25, 2021

View reviewed changes

rsyi reviewed Feb 28, 2021

View reviewed changes

Adding index meta data for Postgres

2e27bb3

peterthesling force-pushed the pthesling/index-info-132 branch 2 times, most recently from aec05cb to d7c84e4 Compare February 28, 2021 23:34

Avoid writing indexes multiple times

ed41557

peterthesling force-pushed the pthesling/index-info-132 branch from d7c84e4 to ed41557 Compare February 28, 2021 23:49

rsyi merged commit ce95bba into rsyi:master Mar 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add index metadata for PostgreSQL #144

Add index metadata for PostgreSQL #144

peterthesling commented Feb 15, 2021 •

edited

Loading

rsyi Feb 25, 2021

peterthesling Feb 25, 2021

rsyi Feb 25, 2021

rsyi commented Feb 25, 2021

codecov-io commented Feb 27, 2021

rsyi Feb 28, 2021

peterthesling Feb 28, 2021

rsyi commented Mar 1, 2021

		@@ -52,6 +55,25 @@ def add_ugc_runner(extractors: list, conf: ConfigTree, connection):
		return extractors, conf


		def add_indexes(extractors: list, conf: ConfigTree, connection):

Add index metadata for PostgreSQL #144

Add index metadata for PostgreSQL #144

Conversation

peterthesling commented Feb 15, 2021 • edited Loading

Structure of the code:

What will be left for future PRs:

rsyi Feb 25, 2021

Choose a reason for hiding this comment

peterthesling Feb 25, 2021

Choose a reason for hiding this comment

rsyi Feb 25, 2021

Choose a reason for hiding this comment

rsyi commented Feb 25, 2021

codecov-io commented Feb 27, 2021

Codecov Report

rsyi Feb 28, 2021

Choose a reason for hiding this comment

peterthesling Feb 28, 2021

Choose a reason for hiding this comment

rsyi commented Mar 1, 2021

peterthesling commented Feb 15, 2021 •

edited

Loading