Library Request: cuDF + RAPIDS #8318

Closed
4 tasks done
AlexCatarino opened this issue Sep 11, 2024 · 2 comments · Fixed by #8455

@AlexCatarino
Member

AlexCatarino commented Sep 11, 2024

cuDF (pronounced "KOO-dee-eff") is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.

Test:

import cudf

tips_df = cudf.read_csv("https://github.com/plotly/datasets/raw/master/tips.csv")
tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"] * 100

# display average tip by dining party size
print(tips_df.groupby("size").tip_percentage.mean())

Gives us:

No module named 'cudf'

EDIT: We need to install RAPIDS too.
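
A quick way to confirm whether the library is present before running the test above is to probe for it without importing it. This is only a minimal sketch using the standard library; the helper name `first_available` is hypothetical, and falling back to pandas is an assumption for illustration, not part of the request.

```python
import importlib.util


def first_available(candidates):
    """Return the first module name in `candidates` that is installed, else None."""
    for name in candidates:
        try:
            # find_spec locates the module without executing it
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # Treat any lookup problem as "not available" and keep going
            continue
    return None


# Prefer the GPU library, fall back to pandas if cudf is missing
backend = first_available(["cudf", "pandas"])
print(f"DataFrame backend available: {backend}")
```

If this prints `None` or `pandas`, the environment does not have cuDF installed, which matches the error shown above.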

Checklist

  • I have completely filled out this template
  • I have confirmed that this issue exists on the current master branch
  • I have confirmed that this is not a duplicate issue by searching issues
  • I have provided detailed steps to reproduce the issue
@beckernick

beckernick commented Sep 18, 2024

Hi! I came across this issue due to the cuDF reference. I work on cuDF and other RAPIDS projects at NVIDIA.

In addition to being a GPU library, cuDF can provide zero code change GPU-acceleration for pandas and (as of yesterday) Polars.

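# Load the accelerator in a notebook before importing pandas;
# for Python scripts, run: python -m cudf.pandas script.py
%load_ext cudf.pandas

import pandas as pd

df = pd.read_parquet(filepath)  # filepath: path to your Parquet file

(df[["Registration State", "Violation Description"]]
 .value_counts()
 .groupby("Registration State")
 .head()
 .sort_index()
)

import polars as pl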

ldf = pl.LazyFrame({"a": [1.242, 1.535]})

print(
    ldf.select(
        pl.col("a").round(1)
    ).collect(engine="gpu")
)

Would love to see these capabilities available for LEAN users. Happy to try to help answer any questions that might come up if you or anyone else explores this.

@AlexCatarino AlexCatarino changed the title Library Request: cuDF Library Request: cuDF + RAPIDS Sep 25, 2024
@Martin-Molinero Martin-Molinero mentioned this issue Dec 10, 2024
11 tasks
@Martin-Molinero
Member

We will be adding these libraries to the cloud-only default environment because of their large footprint (~8 GB).
