Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++ refactoring: _getitem_array implementation #959

Merged
merged 13 commits into from
Jul 20, 2021
Merged

Conversation

ioanaif
Copy link
Collaborator

@ioanaif ioanaif commented Jun 24, 2021

  • Content.__getitem__ in the right order
  • EmptyArray._getitem_array
  • NumpyArray._getitem_array
  • RegularArray._getitem_array
  • ListArray._getitem_array
  • ListOffsetArray._getitem_array
  • RecordArray._getitem_array handing the allow_lazy case
  • IndexedArray._getitem_array
  • IndexedOptionArray._getitem_array
  • ByteMaskedArray._getitem_array
  • BitMaskedArray._getitem_array
  • UnmaskedArray._getitem_array
  • UnionArray._getitem_array

@jpivarski
Copy link
Member

Copying from Slack, Content.__getitem__ should be roughly:

  1. check int
  2. check slice
  3. check string
  4. check non-tuple iterable of strings
  5. ...
  6. check ak.Array, pull out its layout and call __getitem__ on that so that it can be handled by rule 7 or 8
  7. check NumpyArray (to enter into _getitem_array)
  8. check generic Content (other cases, like jagged arrays and missing values)
  9. check tuple (for the generic _getitem_next)
  10. check remaining Iterable by calling nplike.asarray on it, wrapping it in NumpyArray and passing it into __getitem__ again so that it gets picked up by rule 7 and goes into _getitem_array
  11. TypeError

(roughly)

@jpivarski
Copy link
Member

For testing, you can alternate between old-style and new-style arrays. The ak._v2.tmp_for_testing module defines v1v2_equal, which you can use to see if an old-style and a new-style array are exactly equal, and v1_to_v2/v2_to_v1 for conversions.

When your project is done, we'll be removing the ak._v2.tmp_for_testing module as well as the old-style arrays, so while they're useful for ensuring that your new functions are compatible with the old ones, the final tests will have to be written in such a way that we can remove the old stuff and not lose the tests.

You can do this by developing with direct comparisons (v1v2_equal), then make the final tests look like

result = your_v2_array.something()
assert ak.to_list(result) = [...]
assert result.form.tolist() == {...}
assert v1v2_equal(result, v2_to_v1(your_v2_array).something())

where you get [...] and {...} from the old-style array, and they're plain Python data. When we remove the old-style arrays, we can remove the last assertion, but the other two completely check the correctness of the array (values and detailed type).

@jpivarski
Copy link
Member

All Content subclass constructors (e.g. calling ListArray, IndexedArray, etc.) need to pass in the appropriate identifiers and parameters. We can't define identifiers yet, but I'll make utility functions to do the right thing when they're not all None (and pass through None for now). Passing through or creating empty parameters is very important, so the next PR should get started doing that.

Additionally, I'll make the very early (low numbered) tests start testing the new code soon.

This PR was in very good shape before I started messing with it. A few things were simplified by using kernels, rather than NumPy functions, which is something that we can do now, due to #966.

I'll try to get some underlying infrastructure done before you come back to this next week.

@jpivarski jpivarski merged commit 63552d6 into main Jul 20, 2021
@jpivarski jpivarski deleted the ioanaif/getitem_array branch July 20, 2021 00:40
jpivarski added a commit that referenced this pull request Oct 7, 2021
jpivarski added a commit that referenced this pull request Oct 12, 2021
* Starting to implement the type tracer for Awkward-Dask.

* Actually testing it.

* It predicts Forms and Types.

* Convenience methods for making typetracers.

* TypeTracer.instance().

* TypeTracerArray.__getitem__(slice).

* Black.

* Through RecordArray.

* Through BitMaskedArray.

* Added checks throughout 0896.

* Rename test-data.yml to kernel-test-data.yml.

* Instrumented tests from #959.

* Instrumented tests from #1031.

* numpy.broadcast_shapes is too new.

* Through 0011_v2-listarray in the v2 tests.

* Python 2.7 needs Ellipsis to be spelled out.

* Through 0020_v2-support-unsigned-indexes in the v2 tests.

* Through 0023_v2-regular-array in the v2 tests.

* Through all slicing in the v2 tests; next is 0070_v2-argmin-and-argmax.

* Through test_0070_v2-argmin-and-argmax.py in the v2 tests; ak.to_list was wrong, and subsequently the reducer implementation.

* First use of 'fill' (had to split into 'fill_zero', 'fill_other').

* Through 0111_v2-jagged-and-masked-getitem in the v2 tests; added nplike.known_shape.

* Through 0115_v2-generic-reducer-operation in the v2 tests.

* Through 1059_v2-localindex in the v2 tests.

* Added typetracer to all v2 tests (where possible).

* Fixed pre-commit.

* Fixed merge.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants