-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iterating over Records and ak.from_iter
#1582
Comments
I am inclined to agree that we should not push the record-as-mapping protocol here. We'd start getting into the weeds with how NumPy defines e.g. |
The way I read this, this issue contained 2 things: disabling the iteration over records (done now) and fixing the way arrays are built from collections of records while iterating. I would suggest that we reopen as the second one is not fixed. |
@chernals you're correct; this issue did raise two closely related bugs / behavioural problems, and #1725 itself only addressed one of them. However, the building of records during iteration has also been fixed by #1721, which replaced v1 with our v2 submodule. The PR that closed this issue landed after #1721, so it's fixed in main. There is an argument to be made for backporting this fix (and the separate array-of-records fix in #1725) to the v2 submodule in 1.10.x, but I don't know yet what our criteria are for doing this. In fact, your reply is a good opportunity for @jpivarski to clarify that for me :) @chernals could you clarify the version of Awkward that you are using? Are you using the v2 submodule from our latest minor 1.10 release? |
This change was sufficient to fix both issues in v2 (this is from >>> import awkward as ak # which is to say, v2
>>> ak.from_iter(ak.Record({"a": 1, "b": 2}))
<Record {a: 1, b: 2} type='{a: int64, b: int64}'>
>>> ak.from_iter([ak.Record({"a": 1, "b": 2})])
<Array [{a: 1, b: 2}] type='1 * {a: int64, b: int64}'>
>>> ak.from_iter([[ak.Record({"a": 1, "b": 2})]])
<Array [[{a: 1, b: 2}]] type='1 * var * {a: int64, b: int64}'> The question, then, is whether it should be or can be backported into v1. Our stance is that we're only making essential bug-fixes in v1 now. A change in behavior like this might be considered a bug-fix (though the iteration over field names had once been intentional—it just had unforeseen consequences), but it's "not essential" in the sense that there's a work-around. In fact, we've been making very few changes to v1 in practice for about a year now. (It would be interesting to make a plot of how often the v1 directories were changed in git relative to the v2 directories.) |
Thanks for confirming that this is all available in v2 RC, I'll have a look. |
We don't yet have an RC out for v2. If you've used # Change XX for e.g. 39 (Python 3.9)
pipx run cibuildwheel . --only cpXX-manylinux_x86_64 Here, |
Currently,
ak.Record.__iter__
returns an iterator over its field names, which may be what a user expects if a Record were a Mapping/MutableMapping like a dict, but it's not. It's probably for the best that it's not, because we don't wantkeys
,values
,items
methods, particularly since we're trying to consolidate the nomenclature of field names as "fields", rather than "keys". And we especially don't want the non-vectorized__eq__
and__ne__
that Mapping has.To throw a little more confusion in,
ak.Record.__len__
always returns 1 and not the number of fields (the length of what it iterates over). That's a weird enough choice to be considered a bug.Since v2 is coming up and we can make little backward-incompatible changes like this, perhaps we should just ensure that v2
ak.Record
has no__iter__
and__len__
methods. If they're really needed for something, we can be driven by the use-case as to whetherak.Record
iteration should be over field names, field-value pairs, or something else. At the very least,__len__
should return the length of the__iter__
iterable.Related to this is how
ak.Record
is treated byak.from_iter
: it naively iterates, which means it gets the field names:Although iteration is a slow way to build an array and coming from a collection of
ak.Record
is suggestive that it's being used when a faster way is possible, the above is almost certainly not what the user had in mind. Changing v2ak.from_iter
to recognizeak.Record
and build an equivalent thing is an improvement that would be independent of fixing theak.Record.__iter__
and__len__
behavior, but both should be done.Edit: also check
ArrayBuilder.append
; make sure it's consistent withak.from_iter
(it might be howak.from_iter
is implemented...)The text was updated successfully, but these errors were encountered: