Optimize column altering checks #29

pnadolny13 · 2023-06-02T14:35:00Z

Currently there's a lot of wasted time during startup, where the target runs many queries to check the existing schema against the schema message it received, in case it needs to make alterations. One reason this is slow is that it iterates over every column and either:

Creates a new column https://github.com/meltano/sdk/blob/4eaca03af29fccfe04a28d5a0eb125199eb9fde9/singer_sdk/connectors/sql.py#L723
Alters an existing column https://github.com/meltano/sdk/blob/4eaca03af29fccfe04a28d5a0eb125199eb9fde9/singer_sdk/connectors/sql.py#L1070

In doing this, it requests DDL to be generated and then executes it, one column at a time, in a serial loop.

An optimization to speed this up would be to move the execution up into the prepare_table method, after the for loop. The prepare_column method would return the DDL instead of executing it, so that prepare_table can send it all as one large script.
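A minimal sketch of this proposal, assuming an in-memory SQLite connection from Python's standard library as a stand-in for the target's database. `add_column_ddl` and this `prepare_table` are hypothetical names for illustration, not the SDK's actual methods or signatures. Only new columns are handled here, because altering an existing column's type uses dialect-specific syntax that SQLite does not support; altered columns would be appended to the same list. Each per-column step returns its DDL string, and the table-level step executes the collected statements together.

```python
import sqlite3


def add_column_ddl(table: str, column: str, sql_type: str) -> str:
    """Hypothetical helper: build the DDL for one new column without running it."""
    return f'ALTER TABLE "{table}" ADD COLUMN "{column}" {sql_type};'


def prepare_table(conn: sqlite3.Connection, table: str,
                  existing: dict[str, str], desired: dict[str, str]) -> list[str]:
    """Collect DDL for every missing column, then execute it all at once."""
    statements = [
        add_column_ddl(table, name, sql_type)
        for name, sql_type in desired.items()
        if name not in existing
    ]
    if statements:
        # executescript runs every statement from a single script string,
        # instead of issuing one execute() call per column.
        conn.executescript("\n".join(statements))
    return statements


# Usage: the table starts with "id"; two columns are added in one batch.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "users" ("id" INTEGER)')
ddl = prepare_table(conn, "users", {"id": "INTEGER"},
                    {"id": "INTEGER", "email": "TEXT", "age": "INTEGER"})
```

Whether a batch can be sent as one multi-statement script depends on the database and its driver; some drivers may require the statements to be split or an explicit option to allow multiple statements per call.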

@edgarrmondragon what do you think about this?


pnadolny13 · 2023-06-02T14:38:51Z

There's another optimization approach that attempts to skip the alter queries completely. I think right now we're running alter queries without checking whether they're needed. When a table exists, we could instead request the table metadata once, then for each column diff the existing column metadata against what we'd create if the column didn't exist, skipping all queries for columns that don't need changes.
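A minimal sketch of the diffing idea, again assuming SQLite as a stand-in database. `existing_columns` and `diff_columns` are hypothetical names, not SDK methods. The table's metadata is fetched with a single `PRAGMA table_info` query, and only columns that are missing or whose declared type differs are returned; matching columns produce no DDL and no queries.

```python
import sqlite3


def existing_columns(conn: sqlite3.Connection, table: str) -> dict[str, str]:
    """Fetch all column metadata for a table with a single query."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2].upper() for row in rows}


def diff_columns(existing: dict[str, str],
                 desired: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split desired columns into ones to add and ones whose type differs."""
    to_add, to_alter = [], []
    for name, sql_type in desired.items():
        current = existing.get(name)
        if current is None:
            to_add.append(name)
        elif current != sql_type.upper():
            to_alter.append(name)
        # Otherwise the column already matches, so nothing is issued for it.
    return to_add, to_alter


# Usage: "id" matches, "name" changes type, "email" is new.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "users" ("id" INTEGER, "name" TEXT)')
to_add, to_alter = diff_columns(
    existing_columns(conn, "users"),
    {"id": "INTEGER", "name": "VARCHAR", "email": "TEXT"},
)
```

Comparing raw type strings is a simplification; a real target would likely need to compare parsed types (lengths, precision, nullability) and apply its own widening rules before deciding a column needs an alter.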

pnadolny13 · 2023-06-12T22:31:55Z

I noticed that target-postgres avoids this issue by not allowing column altering. This could be a short-term fix, but ideally we'd optimize the process and still allow altering.

pnadolny13 · 2023-06-13T23:39:10Z

See the comment in #57 about how the performance issues were improved but not fixed.

MeltyBot added this to MeltanoLabs Overview Jun 2, 2023

pnadolny13 added this to Data Team Jun 2, 2023

pnadolny13 removed this from Data Team Jun 6, 2023

This was referenced Jun 8, 2023

chore: uncomment SnowflakeTargetSchemaUpdates test #45

Merged

migrate to meltanolabs target snowflake meltano/squared#647

Merged

pnadolny13 added this to Data Team Jun 12, 2023

pnadolny13 mentioned this issue Jun 13, 2023

fix: optimize tables/schema operations #57

Merged

pnadolny13 closed this as completed in 5693b6f Jun 14, 2023

github-project-automation bot moved this to Done in MeltanoLabs Overview Jun 14, 2023

github-project-automation bot moved this to Planned in Data Team Jun 14, 2023
