Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spark): add some numeric function mappings #317

Merged
merged 1 commit into from
Nov 26, 2024

Conversation

andrew-coleman
Copy link
Contributor

No description provided.

Copy link
Contributor

@Blizzara Blizzara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Approved but with a note on the StddevSamp

@@ -77,7 +97,8 @@ class FunctionMappings {
s[Min]("min"),
s[Max]("max"),
s[First]("any_value"),
s[HyperLogLogPlusPlus]("approx_count_distinct")
s[HyperLogLogPlusPlus]("approx_count_distinct"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tehcnically the "approx_count_distinct" says it's HyperLogLog while this is HLL++, but given the ++ should just be better that's probably fine!

@@ -77,7 +97,8 @@ class FunctionMappings {
s[Min]("min"),
s[Max]("max"),
s[First]("any_value"),
s[HyperLogLogPlusPlus]("approx_count_distinct")
s[HyperLogLogPlusPlus]("approx_count_distinct"),
s[StddevSamp]("std_dev")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Substrait's std_dev needs an option to specify if it's sample or population based: https://github.com/substrait-io/substrait/blob/9cccb04fba336489b70ed42b71f73a0a1e34f9f5/extensions/functions_arithmetic.yaml#L1335. Spark has both, as StddevSamp and StddevPop.

Still I think this is fine to merge liek this for now, just noting that plans produced without the option may not work as expected elsewhere or in future with Spark.

I have some code for handling the options which should work here, I'll try to push it up soon.

"q70", "q71", "q73", "q76", "q77", "q79",
"q80", "q81", "q82", "q85", "q86", "q87", "q88",
"q90", "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99")
val failingSQL: Set[String] = Set(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@andrew-coleman
Copy link
Contributor Author

@Blizzara @vbarua can this one be merged now? Thanks :)

@Blizzara Blizzara merged commit 6bb46ac into substrait-io:main Nov 26, 2024
13 checks passed
@Blizzara
Copy link
Contributor

@Blizzara @vbarua can this one be merged now? Thanks :)

Ah yep, sorry for the delay!

@andrew-coleman andrew-coleman deleted the numeric_functions branch November 26, 2024 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants