transform, but not caring about broadcasting regular dimensions #1668
Comments
I'm following what it is you're trying to do from the other thread. But you don't want to use this tool ( Maybe the easier way to do this is to do conventional recursion (as discussed in the other thread, with handlers for every |
@ivirshup could you elaborate on what your constraints are here? Do you need the dimensions up to the record content to match? (i.e. can you have |
I'm not so sure it's different. If I can broadcast when the second dimension has a variable length, why should the first dimension have to have the same length? This raised another question for me. I could wrap each array in an outer array of length one containing a variable length array. But this throws an error about not being able to handle nested lists. In some cases this could probably be flattened, then unflattened, but I'm not sure I understand why only one variable length dimension would be allowed. More practically, I do want to be recursing across multiple arrays, but I don't think we had an example of this in a previous discussion. I do want the logic that says recursing into a Record and entries of a list are different and should be an error, but don't necessarily want to broadcast the values. @agoose77, I would like the dimensions to be matched up, which is why I've been thinking of it in terms of the transformation. This is mainly about the discussion: |
In this case, it's complaining because the outer length is different, but it would complain at every level about every list in We haven't had a need to do this type of recursion yet, so it's too early to try to make it into a generalized function, similar to (or accessed through an option within) |
@ivirshup so, same dimensions, but we don't care about the sizes of those dimensions? This sounds like a custom recursion, as Jim suggests :) |
For this case, I think I just don't care about the size of the first dimension. I would care about the sizes of all the following dimensions. Similarly, if I wanted to concatenate some awkward arrays which had a regular second dimension along that dimension, I wouldn't care what the shape of the second dimension was for those arrays. I would care that all the other dimensions matched though. I'm not very into the idea of implementing this myself and outside awkward array since I suspect there are edge cases I'm not expecting (like #1672). |
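The compatibility rule described in this comment (ignore the size of one dimension, require all the others to match) can be sketched in plain Python over shape tuples. This is only an illustration, not an Awkward Array function: `shapes_match_except` is a hypothetical helper name, and the string `"var"` is an assumed stand-in for a variable-length dimension.

```python
def shapes_match_except(shapes, axis=0):
    """Return True if all shapes agree on every dimension except `axis`.

    Hypothetical helper: each shape is a tuple whose entries are either an
    int (regular dimension) or the string "var" (variable-length dimension).
    """
    if axis < 0:
        raise ValueError("axis must be non-negative in this sketch")
    shapes = [tuple(s) for s in shapes]
    # Arrays with a different number of dimensions can never match.
    if len({len(s) for s in shapes}) > 1:
        return False
    # Drop the ignored dimension and compare what is left.
    trimmed = {s[:axis] + s[axis + 1:] for s in shapes}
    return len(trimmed) <= 1


print(shapes_match_except([(2, "var", 3), (5, "var", 3)]))  # first dimension ignored
print(shapes_match_except([(2, 3), (5, 4)]))                # second dimension differs
```

With `axis=0` this corresponds to the "don't care about the first dimension" case; with `axis=1` it corresponds to the concatenation example along the second dimension.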
OK. It sounds like you're trying to predict the outcome of |
I wasn't suggesting that; I think we'll be able to help you with this function (if it's needed).
Broadcasting will complain if variable-length lists are not all equal to each other. If I'm understanding the purpose of this function (to avoid unions when concatenating), you might want the regular dimensions to match, but not every list of the irregular dimensions. In fact, if the first dimension (length) doesn't match, it's not even possible to compare all the lengths of lists in two arrays ( Thinking back to the larger problem of wanting to give all records the same set of fields so that they concatenate without unions, what about a procedure like the following?
Step 2 can be done with Step 1 is only searching down to the first level of records—if there are records inside of records, then this whole procedure has to recurse (calling |
This is starting to make sense to me. I think what I was going for was really broadcasting the types of the dimensions, not the values. However, I'm still a little unsure which "kind of types" I would want to be looking at.
I am aiming to be able to do both the intersection and union of fields. But intersection is generally easier.
I'm not sure this would work since dimension matching isn't being accounted for. For instance, this procedure could take Also, I think these functions should satisfy the record handling, just not the dimension handling. Would be called like
from functools import reduce
from operator import and_, or_

def union_records(arrays):
    # Union of the field names across all arrays.
    fields = reduce(or_, (set(a.fields) for a in arrays))
    out_arrays = []
    for a in arrays:
        # Add each field this array is missing, filled with None.
        for field in fields.difference(a.fields):
            a = ak.with_field(a, None, field)
        out_arrays.append(a)
    return out_arrays

def intersect_records(arrays):
    # Keep only the field names common to every array.
    fields = list(reduce(and_, (set(a.fields) for a in arrays)))
    return [a[fields] for a in arrays]
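The field-set arithmetic behind these two helpers can be checked on its own, without Awkward Array. A minimal sketch, assuming plain Python sets stand in for each array's `fields`; the names `field_sets`, `all_fields`, and `common_fields` are illustrative only:

```python
from functools import reduce
from operator import and_, or_

# Hypothetical field names for three arrays (the array lengths play no role here).
field_sets = [{"a", "b"}, {"b", "c"}, {"b", "d"}]

# Union: every field that appears in at least one array.
all_fields = reduce(or_, field_sets)

# Intersection: only the fields shared by every array.
common_fields = reduce(and_, field_sets)

print(sorted(all_fields))
print(sorted(common_fields))
```

Because only the field names are reduced, this step never compares array lengths, which is why it avoids the broadcasting error for the record-handling part of the problem.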
This is what I meant by the procedure. The key thing is that we're applying the transformation to each array at a time. They're not getting broadcasted, but they're getting information about all the arrays because we pass it into the transformation function. Here's an
>>> import awkward._v2 as ak
>>> import numpy as np
>>> a = ak.Array([{"a": 1}, {"a": 2}])
>>> b = ak.Array([{"b": 1.1}, {"b": 2.2}])
For the sake of argument, I'm going to take a union of their fields (the final concatenation will be an outer join).
>>> fields = ak.fields(a) + ak.fields(b)
>>> fields
['a', 'b']
Here's a transformation function that adds an empty content (all
def add_fields(layout, **kwargs):
    if layout.is_RecordType:
        asdict = dict(zip(layout.fields, layout.contents))
        for field in fields:
            if field not in asdict:
                asdict[field] = ak.contents.IndexedOptionArray(
                    ak.index.Index64(np.full(len(layout), -1, np.int64)), ak.contents.EmptyArray()
                )
        return ak.contents.RecordArray(asdict.values(), asdict.keys(), length=len(layout))
Now we apply it individually to both arrays and after concatenation, we get option-type fields for the ones that aren't in all inputs (because
>>> a2 = ak.transform(add_fields, a)
>>> b2 = ak.transform(add_fields, b)
>>> a2
<Array [{a: 1, b: None}, {a: 2, ...}] type='2 * {a: int64, b: ?unknown}'>
>>> b2
<Array [{b: 1.1, a: None}, {b: 2.2, ...}] type='2 * {b: float64, a: ?unknown}'>
>>> ak.concatenate([a2, b2]).show(type=True)
type: 4 * {
a: ?int64,
b: ?float64
}
[{a: 1, b: None},
{a: 2, b: None},
{a: None, b: 1.1},
{a: None, b: 2.2}]
This transformation goes down through all layers, even if there are multiple nested lists and option-types, until it finds a RecordArray. It does not continue through that RecordArray to any RecordArrays hidden within it. That's because the transformation function returns an array-type at the level of the RecordArray. (Returning an array-type from a transformation function is how you say, "stop descending through the tree and use this as output.")
>>> c = ak.Array([[[None, {"a": 1}]]])
>>> c2 = ak.transform(add_fields, c)
>>> c2.show(type=True)
type: 1 * var * var * ?{
a: int64,
b: ?unknown
}
[[[None, {a: 1, b: None}]]]
So that's why this gets more complicated if you want to continue the procedure into nested records, because you'd have to prepare |
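To illustrate what continuing into nested records might involve, here is a plain-Python analogy of the recursion, with one handler per kind of node (missing value, list, record, leaf). It is a sketch under stated assumptions, not the Awkward Array implementation: `add_missing_fields` is a hypothetical function, and `spec` is an assumed nested dict mapping each field name either to `None` (stop there) or to another spec for a nested record.

```python
def add_missing_fields(node, spec):
    """Give every record (dict) the fields named in `spec`, filling gaps with None.

    Fields present in a record but absent from `spec` are dropped, so `spec`
    is assumed to already hold the union of field names at each level.
    """
    if node is None:                 # option-type: a missing value stays missing
        return None
    if isinstance(node, list):       # list-type: descend into every element
        return [add_missing_fields(item, spec) for item in node]
    if isinstance(node, dict):       # record-type: fill in fields, recurse where asked
        out = {}
        for name, sub_spec in spec.items():
            if name not in node:
                out[name] = None
            elif sub_spec is None:
                out[name] = node[name]
            else:
                out[name] = add_missing_fields(node[name], sub_spec)
        return out
    return node                      # leaf value (number, string, ...)


# Same shape as the `c` example above, as nested Python lists.
c = [[[None, {"a": 1}]]]
print(add_missing_fields(c, {"a": None, "b": None}))

# Records inside records need a nested spec.
nested = [{"a": {"x": 1}}, {"a": {"y": 2}, "b": 3}]
print(add_missing_fields(nested, {"a": {"x": None, "y": None}, "b": None}))
```

The nested `spec` is the extra preparation: the field names have to be collected separately for each level of record nesting before the recursion runs.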
I believe this is closed by #2365! |
Description of new feature
The request
I would like to do recursive iteration over multiple arrays without having to have aligned regular dimensions (especially the first).
An example
I would like to take multiple arrays, and subset their records to a common set of keys. Here's a quick example of how I would expect to do this:
This works since the arrays are of similar length:
This does not work if the first dimension for the arrays is not the same size.