[Discourse] Using pivot_table with non-numerical data #295

Closed
github-actions bot opened this issue Jan 21, 2022 · 5 comments
Labels
answered Questions that have been satisfactorily answered discourse Discussion topics coming from Discourse

Comments

@github-actions

github-actions bot commented Jan 21, 2022

What would be the best way to do a pivot_table on a dataframe containing non-numerical data? I can do it with pandas for smaller dataframes, but my current data is too large to fit in memory.

Example:

raw_data_pd = pd.DataFrame({"ID": [1, …

Would you like to know more?

Read the full article on the following website:

https://dask.discourse.group/t/using-pivot-table-with-non-numerical-data/267

@github-actions github-actions bot added awaiting-triage New issues that need to be assessed or assigned discourse Discussion topics coming from Discourse labels Jan 21, 2022
@scharlottej13
Contributor

scharlottej13 commented Jan 22, 2022

sticking some notes here for myself:

  • you can get close with ddf.groupby(['Col_ID', 'ID']).aggregate('first').compute()
  • I don't think Dask supports a multi-index yet, but it seems like there is work being done on this?
  • ddf.map_partitions(lambda x: x.pivot_table(index="ID", columns="Col_ID", values="value", aggfunc="first")).compute() throws "ValueError: The columns in the computed data do not match the columns in the provided metadata. Extra: ['B'] Missing: []". I think this is because of the lack of support for a multi-index.
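
A minimal sketch of how the groupby result from the first note could be turned into the pivoted layout. It uses plain pandas so it runs on its own; the toy values are made up (only the column names ID, Col_ID and value come from the notes above), and the assumption is that the grouped result, with one row per (Col_ID, ID) pair, fits in memory after .compute().

```python
import pandas as pd

# Hypothetical toy data: the column names come from the notes above,
# the values are invented for illustration.
raw = pd.DataFrame(
    {
        "ID": [1, 1, 2, 2, 3],
        "Col_ID": ["A", "B", "A", "B", "A"],
        "value": ["x", "y", "z", "w", "v"],
    }
)

# Step 1: the groupby from the first note. With Dask this would be
# ddf.groupby(["Col_ID", "ID"]).aggregate("first").compute(), which hands
# back a pandas DataFrame indexed by (Col_ID, ID).
long_result = raw.groupby(["Col_ID", "ID"]).aggregate("first")

# Step 2 (pandas only): move the Col_ID index level into the columns.
# Missing (ID, Col_ID) combinations become NaN.
wide = long_result["value"].unstack("Col_ID")

# For comparison: pandas' own pivot_table on the same data, which the
# unstacked result should match.
expected = raw.pivot_table(
    index="ID", columns="Col_ID", values="value", aggfunc="first"
)

print(wide)
print(expected)
```

The design choice here is to let the groupby do the heavy reduction and keep the reshaping (the unstack) as a final in-memory step, which sidesteps the multi-index and metadata questions raised above at the cost of needing the reduced result to fit in memory.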

@scharlottej13 scharlottej13 self-assigned this Jan 22, 2022
@scharlottej13 scharlottej13 removed the awaiting-triage New issues that need to be assessed or assigned label Jan 22, 2022
@pavithraes
Contributor

@scharlottej13 Thanks for answering this! I asked Ian about this last week (because we seem to have worked on the same issue, oops! -- I'll close that one!), and in addition to your answer, we can encourage them to open an issue on dask/dask requesting first and last for pivot_table!

@scharlottej13
Contributor

> (because we seem to have worked on the same issue, oops! -- I'll close that one!)

Sorry about that! I will be better about checking first to make sure it's not a duplicate! Thank you for closing the other one.

> and in addition to your answer, we can encourage them to open an issue on dask/dask requesting first and last for pivot_table!

Ah great idea, I'll reply with this as well.

@scharlottej13 scharlottej13 added the answered Questions that have been satisfactorily answered label Jan 24, 2022
@scharlottej13
Contributor

scharlottej13 commented Jan 25, 2022

  • before closing, open an issue in dask/dask requesting first and last for pivot_table if the question poster does not

@pavithraes
Contributor

The user opened this issue: https://github.com/dask/dask/issues/8618 🎉
