RFC: Ideas for what to include in a user tutorial #842

Open
timsaucer opened this issue Aug 27, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@timsaucer
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

In addition to the information we already have in the online documentation, it would be helpful to write a tutorial that guides a user through the various parts of DataFusion and shows how to get started. This issue is for collecting ideas about what people would like to see in such a tutorial.

Describe the solution you'd like

Please comment with topics that should be covered.

Additional context

Things I would like to see (unsorted list)

  • Creating a small dataframe from a pyarrow array
  • Reading/Writing data from/to csv and parquet
  • Zero copy import of data from pyarrow
  • Transferring DataFrame to/from pandas/polars
  • Displaying data via show(), _repr_html_(), and Great Tables
  • Basic column selection, including indexing into fields and element for structs and arrays
  • Performing joins
  • Performing window and aggregate functions, including how default and custom window frames work
  • Integrating with deltalake
  • Using object store from S3, Google Cloud, Azure
  • Unnesting columns
  • Making structs and arrays
  • Chaining DataFrame operations with transform (PR in review)
  • Doing a variety of conditional operations (both case with a base expression and when without one)
  • Examples of string manipulation
  • Doing date time conversion
  • Writing a UDF (advanced topic: writing a Rust UDF and using it with datafusion-python)

Please add to the list what you would like to see!

@timsaucer timsaucer added the enhancement New feature or request label Aug 27, 2024
@timsaucer timsaucer changed the title RFC: Operations to include in a user tutorial RFC: Ideas for what to include in a user tutorial Aug 27, 2024
@mesejo
Contributor

mesejo commented Aug 30, 2024

Great ideas!

I'm curious about:

> Writing a UDF (advanced topic: writing a rust UDF and using with datafusion-python)

Would this involve wrapping the Rust UDF with PyO3?

> Reading/Writing data from/to csv and parquet

I would also like to see how to read a CSV file directly over HTTP.

Some other ideas:

  • Integration with numpy
  • UDF with numba acceleration (if possible?)

@timsaucer
Contributor Author

Thanks! To answer your immediate question, I have a draft blog post about how to use Rust UDFs here: https://github.com/timsaucer/datafusion-site/blob/tsaucer/python-udf-approaches/_posts/2024-08-06-datafusion-python-udf-comparisons.md
I plan on publishing that soon-ish. I want to get a few things done with DF41 before we release it.

@mesejo
Contributor

mesejo commented Aug 30, 2024

Thanks @timsaucer!
