[FEA] explode() can take expressions that generate arrays #1125

Closed
wjxiz1992 opened this issue Nov 16, 2020 · 3 comments
Labels
cudf_dependency (An issue or PR with this label depends on a new feature in cudf), feature request (New feature or request), P1 (Nice to have for release)

Comments

@wjxiz1992
Collaborator

wjxiz1992 commented Nov 16, 2020

Is your feature request related to a problem? Please describe.
This feature comes from the example in #1079.

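    import org.apache.spark.sql.functions.{col, explode}
    import spark.implicits._  // needed for toDF; assumes a SparkSession named `spark`

    // UDF that turns a comma-separated string into an array of strings
    val myudf: (String) => Array[String] = a => {
      a.split(",")
    }
    val u = makeUdf(myudf)  // makeUdf is not defined in this snippet (see #1079)
    val dataset = List("first,second").toDF("x").repartition(1)
    val result = dataset.withColumn("new", explode(u(col("x"))))
    result.explain(true)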

explode() should be able to handle arrays produced by an expression ("created arrays"), but the current explain output shows:

    *Exec <ProjectExec> could run on GPU
      !Exec <GenerateExec> cannot run on GPU because Only posexplode of a created array is currently supported

Additional context
This feature needs the fix for #1079 first.
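
For reference, the row-level behavior being requested can be modeled in plain Scala without Spark or the plugin. This is only a sketch of the semantics: `Row`, `Exploded`, `splitOnComma`, and `ExplodeSketch.explode` are hypothetical names made up for illustration, not Spark or spark-rapids APIs. The array is produced by evaluating an expression per row, and one output row is emitted per element.

```scala
object ExplodeSketch {
  // Hypothetical row types standing in for the input and output DataFrames.
  final case class Row(x: String)
  final case class Exploded(x: String, elem: String)

  // Stand-in for the UDF in the example above: an expression that creates an array.
  val splitOnComma: String => Array[String] = _.split(",")

  // For each input row, evaluate the array-producing expression, then emit
  // one output row per array element (rows with empty arrays emit nothing).
  def explode(rows: Seq[Row], gen: String => Array[String]): Seq[Exploded] =
    rows.flatMap(r => gen(r.x).toSeq.map(e => Exploded(r.x, e)))

  def main(args: Array[String]): Unit =
    explode(Seq(Row("first,second")), splitOnComma).foreach(println)
}
```

With the single input row "first,second", the sketch yields two rows, one for "first" and one for "second", each still carrying the original value of x.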

@wjxiz1992 wjxiz1992 added feature request New feature or request ? - Needs Triage Need team to review and classify labels Nov 16, 2020
@revans2
Collaborator

revans2 commented Nov 16, 2020

Our explode and pos_explode implementations are lacking right now. This is due to limitations in cudf.

rapidsai/cudf#2975 and rapidsai/cudf#6151 are what we need to be able to start tackling this. We might be able to hack something together sooner if the time frame is tight, but I am a little skeptical even of that, because we don't yet have a length-of-lists implementation.
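
One way to see why a list-length primitive matters is sketched below in plain Scala. It assumes an Arrow-style list layout, where row i owns the child elements from offsets(i) up to offsets(i + 1); `listLengths` and `gatherMap` are hypothetical helper names, not cudf or plugin functions. The per-row lengths determine how many times each parent row has to be repeated so that it lines up with the flattened child elements.

```scala
object ExplodeGatherSketch {
  // Length of each list, derived from consecutive offsets.
  // offsets has one more entry than there are rows.
  def listLengths(offsets: Array[Int]): Array[Int] =
    offsets.zip(offsets.drop(1)).map { case (start, end) => end - start }

  // Index of the parent row for every flattened child element.
  // Rows with empty lists contribute no entries, so they drop out.
  def gatherMap(lengths: Array[Int]): Array[Int] =
    lengths.zipWithIndex.flatMap { case (len, row) => Seq.fill(len)(row) }

  def main(args: Array[String]): Unit = {
    val offsets = Array(0, 2, 2, 5) // three rows holding 2, 0, and 3 elements
    val lengths = listLengths(offsets)
    println(lengths.mkString(","))
    println(gatherMap(lengths).mkString(","))
  }
}
```

Gathering the non-exploded columns with this map and placing the flattened child column beside them gives the exploded table. This is only an illustration of the idea, not a description of how cudf or the plugin implements it.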

@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Nov 17, 2020
@sameerz sameerz added the P1 Nice to have for release label Nov 24, 2020
@sameerz sameerz added the cudf_dependency An issue or PR with this label depends on a new feature in cudf label Feb 18, 2021
@GaryShen2008
Collaborator

@wjxiz1992 Could you help verify this on 0.6 now that #2215 has been merged?

@wjxiz1992
Collaborator Author

Fixed:

    +- GpuGenerate gpuexplode(split(x#5, ,, -1)), [x#5], false, [new#8]
       +- GpuShuffleCoalesce 2147483647

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023