feat: limited support of lambdas in Series.apply #345

Merged
merged 18 commits into from
Feb 12, 2024

Contributor

@shobsi shobsi commented Jan 24, 2024

BEGIN_COMMIT_OVERRIDE
feat: limited support of lambdas in Series.apply (#345)
END_COMMIT_OVERRIDE

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated https://screenshot.googleplex.com/6ZEiKXPz8LWMTRf

Partially fixes internal issue 295964341 🦕

@shobsi shobsi requested review from a team as code owners January 24, 2024 02:40
@shobsi shobsi requested a review from stevewalker-de January 24, 2024 02:40
@product-auto-label product-auto-label bot added size: s Pull request size is small. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jan 24, 2024
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: s Pull request size is small. labels Jan 25, 2024
Comment on lines 1155 to 1159
There is limited support for simple functions and lambdas that can be
applied directly (without converting into a `remote_function`) to
BigQuery DataFrames objects. This approach takes advantage of a nuance
in the way BigQuery DataFrames objects are modeled internally and works
only if the function body contains only arithmetic or logical operators.
Collaborator

I'd rather we rephrase this as "ufunc" emulation support, as defined in the pandas docs. https://pandas.pydata.org/docs/reference/api/pandas.Series.apply.html

Contributor Author

Updated, PTAL.
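
For readers following this thread, here is a minimal sketch of the idea being documented. It assumes a hypothetical stand-in class, `TinySeries`, and a hypothetical helper, `apply_whole`; neither is part of BigQuery DataFrames. It only illustrates why a lambda built from arithmetic or comparison operators can be called on the whole object, while a function that needs a real Python scalar cannot.

```python
import math


class TinySeries:
    """Hypothetical stand-in for a Series; it only overloads operators."""

    def __init__(self, values):
        self.values = list(values)

    def _binary(self, other, op):
        # Broadcast a scalar against every element.
        return TinySeries(op(v, other) for v in self.values)

    def __add__(self, other):
        return self._binary(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._binary(other, lambda a, b: a * b)

    def __gt__(self, other):
        return self._binary(other, lambda a, b: a > b)


def apply_whole(series, func):
    # The function receives the whole series, not one element at a time.
    return func(series)


s = TinySeries([1, 2, 3])

# Only operators appear in the body, so the overloads do the work.
doubled_plus_one = apply_whole(s, lambda x: x * 2 + 1)

# math.sqrt needs a real number, so calling it on the whole object fails.
try:
    apply_whole(s, lambda x: math.sqrt(x))
    sqrt_failed = False
except TypeError:
    sqrt_failed = True
```

In this sketch the operator-only lambda succeeds and the `math.sqrt` call raises, which is the kind of case where a `remote_function` would be the suggested alternative.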

bigframes/core/compile/scalar_op_compiler.py (outdated, resolved)
)

if not hasattr(func, "bigframes_remote_function"):
    return func(self)
Collaborator

Let's catch exceptions here and if there's a "message" attribute, append a suggestion to try remote_function, instead.

Contributor Author

Done, thanks!

# supported on a Series. Let's guide the customer to use a
# remote function instead
if hasattr(ex, "message"):
    ex.message += "\n{_remote_function_recommendation_message}"
Collaborator

This needs to be an f string.

Contributor Author

Thanks for catching, corrected.
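
A short sketch of the difference the review comment points at. The message text below is a hypothetical placeholder, since the real wording of `_remote_function_recommendation_message` is not shown in this thread.

```python
# Hypothetical message text; the real wording is not shown in this thread.
_remote_function_recommendation_message = "Consider using a remote_function instead."

# Without the f prefix, the braces and the name are kept literally.
plain = "\n{_remote_function_recommendation_message}"

# With the f prefix, the variable's value is interpolated.
formatted = f"\n{_remote_function_recommendation_message}"
```

Only the second form appends the actual recommendation to the exception message.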


if not hasattr(func, "bigframes_remote_function"):
    try:
        return func(self)
Collaborator

Since it may result in incorrect values if this isn't a true vectorized function, let's check for by_row=False. If by_row="compat" (default) then raise and suggest either remote_function or by_row=False.

Contributor Author

Done, PTAL.
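
The requested check could look roughly like the sketch below. This is an illustration under assumptions, not the merged code: `apply_sketch` and the error wording are hypothetical, and only the `bigframes_remote_function` attribute check and the `by_row` values come from this thread.

```python
def apply_sketch(series, func, by_row="compat"):
    """Hypothetical dispatch; not the merged implementation."""
    if not hasattr(func, "bigframes_remote_function"):
        if by_row is not False:
            # The default ("compat") implies row-wise semantics, which a
            # plain Python function cannot be trusted to provide here.
            raise ValueError(
                "Non-remote functions may give incorrect results unless they "
                "are truly vectorized; pass by_row=False or use a "
                "remote_function."
            )
        # Vectorized path: call the function once on the whole series.
        return func(series)
    # The remote-function path is omitted from this sketch.
    raise NotImplementedError("remote function path not sketched")


# Opting in with by_row=False calls the function on the whole object.
result = apply_sketch([1, 2, 3], len, by_row=False)

# The default by_row="compat" is rejected for a non-remote function.
try:
    apply_sketch([1, 2, 3], len)
    raised = False
except ValueError:
    raised = True
```

Requiring an explicit `by_row=False` makes the caller acknowledge that the function is treated as vectorized rather than applied element by element.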

@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Feb 9, 2024
# It is not a remote function
# Then it must be a vectorized function that applies to the Series
# as a whole
assert (
Collaborator

AssertionError is a bit of an odd one to raise. Usually that means some invariant that we never expect to be broken has been violated. Please raise ValueError instead.

Contributor Author

Done, PTAL, thanks.

Collaborator

@tswast tswast left a comment

Love it, thanks!

@shobsi shobsi added the automerge Merge the pull request once unit tests and other checks pass. label Feb 12, 2024
@gcf-merge-on-green gcf-merge-on-green bot merged commit 208e081 into main Feb 12, 2024
14 of 15 checks passed
@gcf-merge-on-green gcf-merge-on-green bot deleted the shobs-allow-lambdas branch February 12, 2024 23:16
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Feb 12, 2024
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
2 participants