Added an introduction to the reshaping documentation #7623

Merged
merged 34 commits into from
Mar 31, 2023

nishtha981
Contributor

@headtr1ck left a comment
Collaborator

Thanks for this PR and welcome to xarray.

I think this intro can still be improved; I have left a few hints. Maybe collectively we can come up with a more general motivation.

@@ -6,6 +6,16 @@ Reshaping and reorganizing data

These methods allow you to reorganize your data by changing dimensions, array shape, order of values, or indexes.

Reshaping and reorganizing data refers to the process of changing the structure or organization of data by modifying dimensions, array shapes, order of values, or indexes. Xarray provides several methods to accomplish these tasks.
Collaborator

I think this sentence should be the first one of this intro.

Contributor Author

Okay!
Will change that and put it at the beginning.

@@ -6,6 +6,16 @@ Reshaping and reorganizing data

These methods allow you to reorganize your data by changing dimensions, array shape, order of values, or indexes.

Reshaping and reorganizing data refers to the process of changing the structure or organization of data by modifying dimensions, array shapes, order of values, or indexes. Xarray provides several methods to accomplish these tasks.

To reorder dimensions on a DataArray or across all variables on a Dataset, use the transpose() method. An ellipsis (...) can be used to represent all other dimensions. To expand a DataArray or all variables on a Dataset along a new dimension, use the expand_dims() method. This method attaches a new dimension with size 1 to all data variables. To remove such a size-1 dimension from the DataArray or Dataset, use the squeeze() method.
Collaborator

I don't think having this short summary which repeats parts from below is very useful to have here.

Contributor Author

Got it
Will remove it!
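
For readers following this thread, the three methods named in the removed summary can be sketched roughly as below. This is a minimal illustration and not text from the PR: the array, its dimension names (`x`, `y`), and the added `time` dimension are all assumed for the example.

```python
import numpy as np
import xarray as xr

# Hypothetical 2-D array with dimensions ("x", "y").
da = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"))

# Reorder dimensions; the ellipsis stands for every dimension not named.
transposed = da.transpose("y", ...)

# Attach a new size-1 dimension (the name "time" is illustrative).
expanded = da.expand_dims("time")

# Remove that size-1 dimension again.
squeezed = expanded.squeeze("time")

print(transposed.dims)
print(expanded.dims)
print(squeezed.dims)
```

The same calls are available on a Dataset, where they apply across the data variables.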

doc/user-guide/reshaping.rst (outdated, resolved)
@headtr1ck left a comment
Collaborator

Almost :)

@@ -4,7 +4,9 @@
Reshaping and reorganizing data
###############################

These methods allow you to reorganize your data by changing dimensions, array shape, order of values, or indexes.
Reshaping and reorganizing data refers to the process of changing the structure or organization of data by modifying dimensions, array shapes, order of values, or indexes. Xarray provides several methods to accomplish these tasks. To convert from a Dataset to a DataArray, use the to_array() method. Unlike pandas, xarray's stack() method does not automatically drop missing values.
Collaborator

Suggested change
Reshaping and reorganizing data refers to the process of changing the structure or organization of data by modifying dimensions, array shapes, order of values, or indexes. Xarray provides several methods to accomplish these tasks. To convert from a Dataset to a DataArray, use the to_array() method. Unlike pandas, xarray's stack() method does not automatically drop missing values.
Reshaping and reorganizing data refers to the process of changing the structure or organization of data by modifying dimensions, array shapes, order of values, or indexes. Xarray provides several methods to accomplish these tasks.

No need to mention specific methods and their peculiarities in the introduction already.
I didn't check below, so make sure that the sections about these methods contain the sentences you have added here.
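
As a reference point for the two behaviours the draft intro mentioned, here is a small sketch. It is an illustration only: the values, variable names (`a`, `b`), and dimension names are hypothetical, and `to_array()` is called with its default arguments, which name the new dimension `variable`.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 2 array containing one missing value.
da = xr.DataArray(
    [[1.0, np.nan], [3.0, 4.0]],
    dims=("x", "y"),
    coords={"x": [0, 1], "y": ["a", "b"]},
)

# stack() combines "x" and "y" into one dimension and keeps the NaN entry.
stacked = da.stack(z=("x", "y"))

# Dropping missing values is an explicit, separate step.
without_nan = stacked.dropna("z")

# to_array() turns the data variables of a Dataset into one DataArray.
ds = xr.Dataset({"a": ("x", [1, 2]), "b": ("x", [3, 4])})
arr = ds.to_array()

print(stacked.sizes["z"], without_nan.sizes["z"])
print(arr.dims)
```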

These methods allow you to reorganize your data by changing dimensions, array shape, order of values, or indexes.
Reshaping and reorganizing data refers to the process of changing the structure or organization of data by modifying dimensions, array shapes, order of values, or indexes. Xarray provides several methods to accomplish these tasks. To convert from a Dataset to a DataArray, use the to_array() method. Unlike pandas, xarray's stack() method does not automatically drop missing values.

These methods are particularly useful for reshaping xarray objects for use in machine learning packages, such as scikit-learn, that usually require two-dimensional numpy arrays as inputs. It can also be used in working with geospatial data where we need to analyze and visualize geospatial data, such as satellite imagery or geospatial datasets. Xarray can also be used for time series analysis, including forecasting and anomaly detection.
Collaborator

Suggested change
These methods are particularly useful for reshaping xarray objects for use in machine learning packages, such as scikit-learn, that usually require two-dimensional numpy arrays as inputs. It can also be used in working with geospatial data where we need to analyze and visualize geospatial data, such as satellite imagery or geospatial datasets. Xarray can also be used for time series analysis, including forecasting and anomaly detection.
These methods are particularly useful for reshaping xarray objects for use in machine learning packages, such as scikit-learn, that usually require two-dimensional numpy arrays as inputs. It can also be used in working with geospatial data where we need to analyze and visualize geospatial data, such as satellite imagery or geospatial datasets.

I think the last sentence does not really apply to reshaping. You can just drop it completely or check if you can add it somewhere else in a more general introduction.

@github-actions bot added the Automation label (Github bots, testing workflows, release automation) on Mar 29, 2023
@nishtha981
Contributor Author

@headtr1ck @TomNicholas
Please do review the pr!
Thanks!

.github/config.yml (outdated, resolved)
These methods allow you to reorganize your data by changing dimensions, array shape, order of values, or indexes.
Reshaping and reorganizing data refers to the process of changing the structure or organization of data by modifying dimensions, array shapes, order of values, or indexes. Xarray provides several methods to accomplish these tasks. Unlike pandas, xarray's stack() method does not automatically drop missing values.

These methods are particularly useful for reshaping xarray objects for use in machine learning packages, such as scikit-learn, that usually require two-dimensional numpy arrays as inputs. It can also be used in working with geospatial data where we need to analyze and visualize geospatial data, such as satellite imagery or geospatial datasets.
Member

No need to say "geospatial data" three times in the same sentence.

Instead I might say

"Reshaping can also be required before passing data to external visualization tools, for example geospatial data might expect input organized into a particular format corresponding to stacks of satellite images."
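
A minimal sketch of the machine-learning use case discussed in this thread, assuming a hypothetical data cube with dimensions `time`, `lat`, and `lon`. Treating each time step as a sample and each grid cell as a feature is only one possible layout, chosen here for illustration; it is not prescribed by the PR.

```python
import numpy as np
import xarray as xr

# Hypothetical cube: 4 time steps on a 2 x 3 grid.
cube = xr.DataArray(
    np.arange(24.0).reshape(4, 2, 3),
    dims=("time", "lat", "lon"),
)

# Collapse the two spatial dimensions into a single "feature" dimension.
features = cube.stack(feature=("lat", "lon"))

# Extract a plain two-dimensional NumPy array of shape (samples, features).
X = features.transpose("time", "feature").values

# unstack() reverses the operation and restores the spatial dimensions.
restored = features.unstack("feature")

print(X.shape)
print(restored.dims)
```

The array `X` is the kind of two-dimensional input that packages such as scikit-learn usually expect.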

@TomNicholas mentioned this pull request on Mar 30, 2023
@nishtha981 requested a review from TomNicholas on March 30, 2023 17:32
@TomNicholas
Member

I've made a final edit to remove the mention of a specific method in the intro, and remove some blank lines. This looks good to me now so I'll merge it! Thanks @nishtha981 !

@TomNicholas enabled auto-merge (squash) on March 30, 2023 17:44
@nishtha981
Contributor Author

Hey!
@TomNicholas the documentation tests are failing
Would you be able to merge the pr despite the issue or can you guide me on how to solve it?

@nishtha981
Contributor Author

nishtha981 commented Mar 31, 2023

Hey @TomNicholas
I've found that the issue executablebooks/sphinx-book-theme#711 was causing an issue with the test cases of my pr.
It has been merged now so I'll be making an empty push so that the test cases can run again.

@TomNicholas
Member

Hi @nishtha981 - that's great that you identified what was causing the docs ci builds to fail! I was wondering why that was!

Fixing this for xarray may require more than just re-running the ci. For example it might require us to pin a particular version of a library (here sphinx_book_theme) in order to guarantee the CI works again. If it doesn't work immediately now, what we normally do is to open a new github issue on xarray's issue tracker to track the problem. That way if the problem comes up in multiple PRs we can just refer all of them back to the one issue, until it gets resolved completely.

@TomNicholas
Member

I've also just told readthedocs to rebuild.

@nishtha981
Contributor Author

@TomNicholas
Should I pin sphinx-book-theme to version 0.3.0?

auto-merge was automatically disabled March 31, 2023 05:15

Head branch was pushed to by a user without write access

@github-actions bot added the CI (Continuous Integration tools) and dependencies (Pull requests that update a dependency file) labels on Mar 31, 2023
@nishtha981
Contributor Author

I am not sure if pinning works as pydata-sphinx-theme had some private function which sphinx-book-theme was using but now it cannot.
Will probably have to wait for a new release of sphinx-book-theme.

@headtr1ck
Collaborator

> I am not sure if pinning works as pydata-sphinx-theme had some private function which sphinx-book-theme was using but now it cannot.
> Will probably have to wait for a new release of sphinx-book-theme.

See #7703

@nishtha981
Contributor Author

Hey! @TomNicholas
All the checks have passed.
Can you please merge the pr?

@TomNicholas merged commit 1c81162 into pydata:main on Mar 31, 2023
@TomNicholas
Member

Thanks @nishtha981 !

I just realised after merging that this PR should in theory have had a corresponding entry in the what's new page, as all PRs are supposed to have.

We won't worry about that this time, but try and remember to add it next time! That way you will also be listed as a contributor on the what's new page.

@nishtha981
Contributor Author

Sure!
I'll keep that in mind next time!
Thank you

Labels
Automation (Github bots, testing workflows, release automation), CI (Continuous Integration tools), dependencies (Pull requests that update a dependency file), topic-documentation
Development

Successfully merging this pull request may close these issues.

Reshaping doc intro looks incomplete
3 participants