Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DataTree.from_dict to be insensitive to insertion order #9292

Merged

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Jul 30, 2024

@TomNicholas TomNicholas added bug topic-DataTree Related to the implementation of a DataTree class labels Jul 30, 2024
@TomNicholas TomNicholas marked this pull request as ready for review July 30, 2024 13:24
@@ -1104,7 +1104,8 @@ def from_dict(

if d:
# Populate tree with children determined from data_objects mapping
for path, data in d.items():
# Sort keys so as to insert nodes from root first (see GH issue #9276)
for path, data in sorted(d.items()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomNicholas What if the user does want something like the following?

reversed = DataTree.from_dict(
            {
                "/": xr.Dataset({"age": 83}),
                "/C/Chloe": xr.Dataset({"age": 10}),
                "/A/Abe": xr.Dataset({"age": 39}),
            }
        )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah hmm good question. Well this PR will succeed, but create a tree with the children in a different order:

<xarray.DataTree>
Group: /Dimensions:  ()
│   Data variables:
│       age      int64 8B 83
├── Group: /A
│   └── Group: /A/AbeDimensions:  ()
│           Data variables:
│               age      int64 8B 39
└── Group: /C
    └── Group: /C/Chloe
            Dimensions:  ()
            Data variables:
                age      int64 8B 10

Whether or not this is okay depends on whether or not we consider the children to be ordered. Internally they are ordered, which this fix would break I guess. But do we want to guarantee preservation of that order to users? I think it should behave the same way as whether or not we guarantee preservation of the order of .variables in Dataset.

cc @shoyer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keewis suggested that if we do a sortby depth then we would eliminate the bug without re-ordering the nodes within a group.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically, we sort by the number of / characters in the (normalized) paths

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed now, and tested. What a neat idea @keewis !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful! 💯


# Check that Bart and Lisa's order is still preserved within the group,
# despite 'Bart' coming before 'Lisa' when sorted alphabetically
assert list(reversed["Homer"].children.keys()) == ["Lisa", "Bart"]
Copy link
Member Author

@TomNicholas TomNicholas Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion fails unless sorted sorts by depth.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to think on this, but I like the way it works.

Copy link
Member

@flamingbear flamingbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tracks, I like the depth order nice touch.

@TomNicholas
Copy link
Member Author

The test failure is an unrelated flaky dask test so now I have an approval I will merge.

@TomNicholas TomNicholas enabled auto-merge (squash) July 31, 2024 17:34
@TomNicholas TomNicholas merged commit 2df31b7 into pydata:main Jul 31, 2024
26 of 28 checks passed
@TomNicholas TomNicholas deleted the datatree_from_dict_insertion_order branch July 31, 2024 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug topic-DataTree Related to the implementation of a DataTree class
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataTree.from_dict fails to insert parent if parent/child inserted first.
4 participants