Integrate Cape Python to work with Dask #94

Open
kjam opened this issue Aug 5, 2020 · 0 comments

Contributor

kjam commented Aug 5, 2020

Is your feature request related to a problem? Please describe.
We've had several users request working with Dask directly instead of Spark and Pandas. Because of its use in the Python data science community and its ease of use for out-of-core computations and parallelization of workflows, it fits well with the data science needs we are trying to address.

Describe the solution you'd like
We should see how many changes we would need to get the cape_pandas integrations working for Dask DataFrames. Matt Rocklin had a look during the webinar and pointed out only a few lines (for example, where we explicitly call pd.Series when returning an array as a series) that would need to be updated for it to just work.
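
To make the kind of change concrete, here is a rough sketch. The two functions below are hypothetical and are not taken from the cape_pandas code; they only illustrate the difference between explicitly rebuilding a `pd.Series` from an array and returning whatever the input object's own method produces.

```python
import numpy as np
import pandas as pd


def round_explicit(series: pd.Series, precision: int = 0) -> pd.Series:
    # Hypothetical transformation in the style described above:
    # compute on a NumPy array, then explicitly rebuild a pd.Series.
    values = np.round(series.to_numpy(), precision)
    return pd.Series(values, index=series.index, name=series.name)


def round_generic(series, precision: int = 0):
    # Same computation, but it only calls a method on the object it was given,
    # so the return type follows the input type instead of being pinned to pandas.
    return series.round(precision)


s = pd.Series([1.234, 5.678], name="x")
explicit = round_explicit(s, 1)
generic = round_generic(s, 1)

# On a pandas Series both versions agree.
print(explicit.equals(generic))
```

If a transformation really does need a pandas object, Dask's `map_partitions` is another possible route, since each partition of a Dask DataFrame or Series is a plain pandas object. Whether that or the type-preserving style fits better here is something we would still need to check against the actual transformations.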

Describe alternatives you've considered
We could wait on Dask integration to prioritize other integrations; however, if it truly is as simple as changing a few returns, I would prefer we do it sooner! :)

Additional context
To hear Matt's comments, check out around the 48-minute mark of this YouTube video: https://www.youtube.com/watch?v=cIvv8EGMDY0&feature=youtu.be - I'm sure he is happy to help if we need extra guidance! 🙌
