feat(go/adbc/driver/snowflake): use vectorized scanner for bulk ingest #2025

Merged
merged 1 commit into from
Jul 23, 2024

zeroshade
Member

Closes #2005

zeroshade requested review from lidavidm and joellubi on July 18, 2024 19:45
github-actions bot added this to the ADBC Libraries 14 milestone on Jul 18, 2024
Member

@joellubi joellubi left a comment

Does this have any effect on performance today, or will we need to wait until Snowflake releases the follow-up changes?

@@ -45,7 +45,7 @@ import (

 const (
 	bindStageName = "ADBC$BIND"
-	createTemporaryStageStmt = "CREATE OR REPLACE TEMPORARY STAGE " + bindStageName + " FILE_FORMAT = (TYPE = PARQUET USE_LOGICAL_TYPE = TRUE BINARY_AS_TEXT = FALSE)"
+	createTemporaryStageStmt = "CREATE OR REPLACE TEMPORARY STAGE " + bindStageName + " FILE_FORMAT = (TYPE = PARQUET USE_LOGICAL_TYPE = TRUE BINARY_AS_TEXT = FALSE USE_VECTORIZED_SCANNER=TRUE REPLACE_INVALID_CHARACTERS = TRUE)"
Member

Is it necessary to set REPLACE_INVALID_CHARACTERS? I guess Snowflake recommends it "as a general rule of thumb" but I wonder why...

Member Author

I don't know, maybe just to reduce failures caused by invalid characters? Not sure myself.

It's a best practice, as invalid UTF-8 can cause query failures or unexpected results:
https://docs.snowflake.com/en/release-notes/bcr-bundles/un-bundled/bcr-1013-1014

@zeroshade
Member Author

> Does this have any effect on performance today, or will we need to wait until Snowflake releases the follow-up changes?

It depends on the configuration of the warehouse. Some configurations (encryption, security settings, etc.) are not yet supported, while others are. If your configuration is supported, you'll see the benefit immediately; other configurations will need to wait until Snowflake makes further changes.

zeroshade merged commit 7f53b8a into apache:main on Jul 23, 2024
38 of 42 checks passed
zeroshade deleted the snowflake-vectorized-scanner branch on July 23, 2024 14:22
Development

Successfully merging this pull request may close these issues.

go/adbc/driver/snowflake: add use_vectorized_scanner flag to bulk ingest
4 participants