Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add bigframes.bigquery.create_vector_index to assist in creating vector index on ARRAY<FLOAT64> columns #1024

Merged
merged 18 commits into from
Oct 18, 2024

Conversation

tswast
Copy link
Collaborator

@tswast tswast commented Sep 26, 2024

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes internal issue 365541165 🦕

…ing vector index on `ARRAY<FLOAT64>` columns
@tswast tswast requested review from a team as code owners September 26, 2024 20:54
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Sep 26, 2024
@GarrettWu GarrettWu removed their assignment Sep 26, 2024
create = "CREATE VECTOR INDEX IF NOT EXISTS "

if len(stored_column_names) > 0:
escaped_stored = [f"`{name}`" for name in stored_column_names]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use our existing sql utils to escape the ids? This will also escape special characters

)

return f"""
{create} {index_name}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should escape index_name too?


return f"""
{create} {index_name}
ON `{table_name}`(`{column_name}`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets reuse existing escaping logic

Comment on lines 59 to 60
if index_name is None:
index_name = table_id.split(".")[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would reading into a TableReference be more robust?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure it's exactly the same logic there, but makes sense from the perspective of having one place for that logic (in case "." is allowed as a character in the table name someday).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in latest commit (along with identifier logic).

@tswast tswast requested a review from TrevorBergeron October 16, 2024 13:37
@tswast tswast merged commit 863d694 into main Oct 18, 2024
16 of 23 checks passed
@tswast tswast deleted the b365541165-create_vector_index branch October 18, 2024 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: xl Pull request size is extra large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants