[FEA] Support StddevSamp with cast(col as double) for input #7891

Closed
viadea opened this issue Mar 15, 2023 · 2 comments · Fixed by #8207
Labels
feature request New feature or request

Comments

@viadea
Collaborator

viadea commented Mar 15, 2023

I wish we could support StddevSamp with cast(col as double) as input.

For example:

scala> spark.sql("select stddev_samp(cast(1 as double))").collect

It will show:

      !Expression <StddevSamp> stddev_samp(1.0) cannot run on GPU because input expression Literal 1.0 (DoubleType is not supported); expression StddevSamp stddev_samp(1.0) produces an unsupported type DoubleType
viadea added the "feature request" and "? - Needs Triage" labels Mar 15, 2023
@ttnghia
Collaborator

ttnghia commented Mar 15, 2023

I just checked the expression carefully. Indeed, we (both the plugin and libcudf) don't support stddev in a reduction context. It only works in groupby and windowing contexts.

In the long term, we should support it. In the meantime, there is a simple workaround: append a trivial key column to the input (such as an integer column of all 0 values), then group by that key column. For example:

scala> val df = Seq((1.0, 0),(2.0, 0)).toDF
df: org.apache.spark.sql.DataFrame = [_1: double, _2: int]

scala> df.groupBy("_2").agg(stddev("_1")).show
23/03/15 19:04:51 WARN GpuOverrides: 
*Exec <CollectLimitExec> will run on GPU
  *Partitioning <SinglePartition$> will run on GPU
  *Exec <HashAggregateExec> will run on GPU
    *Expression <AggregateExpression> stddev_samp(_1#164) will run on GPU
    ...

+---+---------------+
| _2|stddev_samp(_1)|
+---+---------------+
|  0|    0.707106781|
+---+---------------+
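The same workaround can be sketched as a small helper for an arbitrary DataFrame. This is only an illustration under assumptions: `stddevSampViaGroupBy` and the `__const_key` column name are hypothetical, and the key is added with `lit(0)` instead of being stored in the data as in the example above. Whether the plugin actually places the resulting aggregate on the GPU should be confirmed from the GpuOverrides output or `explain`.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, stddev_samp}

// Hypothetical helper, not part of Spark or the RAPIDS plugin.
// Computes the sample standard deviation of `valueCol` over the whole
// DataFrame by grouping on a constant key, so the aggregate is expressed
// as a group-by instead of a keyless reduction.
def stddevSampViaGroupBy(df: DataFrame, valueCol: String): DataFrame = {
  // Assumed not to clash with an existing column name.
  val key = "__const_key"
  df.withColumn(key, lit(0))
    .groupBy(key)
    .agg(stddev_samp(col(valueCol).cast("double")).as("stddev_samp"))
    .drop(key) // keep only the aggregate in the result
}
```

One difference from a plain reduction: on an empty input, the group-by form returns no rows, whereas `select stddev_samp(...)` without a group-by returns a single row containing null.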

@ttnghia
Collaborator

ttnghia commented Mar 28, 2023

Depends on #62.
