-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add VLenNDArray #200
base: main
Are you sure you want to change the base?
add VLenNDArray #200
Conversation
I have now added tests in |
hmm - tests are failing on |
Most tests pass now - I had to add the There's still one doc string test failing for py3.7 because of where the new-line gets split. |
I also want to note that when I try and run ____________________________________________________________________________ ERROR collecting numcodecs/tests/test_blosc.py ____________________________________________________________________________
ImportError while importing test module '/Users/nicholassofroniew/Github/numcodecs/numcodecs/tests/test_blosc.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
numcodecs/tests/test_blosc.py:12: in <module>
from numcodecs import blosc
E ImportError: dlopen(/Users/nicholassofroniew/Github/numcodecs/numcodecs/blosc.cpython-37m-darwin.so, 2): Symbol not found: _blosc_cbuffer_complib
E Referenced from: /Users/nicholassofroniew/Github/numcodecs/numcodecs/blosc.cpython-37m-darwin.so
E Expected in: flat namespace
E in /Users/nicholassofroniew/Github/numcodecs/numcodecs/blosc.cpython-37m-darwin.so
_____________________________________________________________________________ ERROR collecting numcodecs/tests/test_lz4.py _____________________________________________________________________________
ImportError while importing test module '/Users/nicholassofroniew/Github/numcodecs/numcodecs/tests/test_lz4.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
numcodecs/tests/test_lz4.py:9: in <module>
from numcodecs.lz4 import LZ4
E ImportError: dlopen(/Users/nicholassofroniew/Github/numcodecs/numcodecs/lz4.cpython-37m-darwin.so, 2): Symbol not found: _LZ4_compressBound
E Referenced from: /Users/nicholassofroniew/Github/numcodecs/numcodecs/lz4.cpython-37m-darwin.so
E Expected in: flat namespace
E in /Users/nicholassofroniew/Github/numcodecs/numcodecs/lz4.cpython-37m-darwin.so
____________________________________________________________________________ ERROR collecting numcodecs/tests/test_zstd.py _____________________________________________________________________________
ImportError while importing test module '/Users/nicholassofroniew/Github/numcodecs/numcodecs/tests/test_zstd.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
numcodecs/tests/test_zstd.py:9: in <module>
from numcodecs.zstd import Zstd
E ImportError: dlopen(/Users/nicholassofroniew/Github/numcodecs/numcodecs/zstd.cpython-37m-darwin.so, 2): Symbol not found: _ZSTD_compress
E Referenced from: /Users/nicholassofroniew/Github/numcodecs/numcodecs/zstd.cpython-37m-darwin.so
E Expected in: flat namespace
E in /Users/nicholassofroniew/Github/numcodecs/numcodecs/zstd.cpython-37m-darwin.so |
@sofroniewn Thank you for your effort in trying to support ragged nD arrays. Just want to ask if there is any update on this feature? |
No update - maybe I can ping @jakirkham and @ryan-williams to take another look / help me get the tests passing / resolve conflicts - i'll note that I think the conflicts / test failures come from the failures of my dev environment not the code I'm trying to add |
Hi @sofroniewn, just to apologise for not looking at this sooner. Re the conflicting .c files, those will just be due to changes in non-essential information that gets output by cython and which depends on which computer the C files where generated on. I'd suggest to just remove any changes to those C files from this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sofroniewn, apologies again for slow review.
Implementation looks fine to me. Only question is whether to use a single byte for the number of dimensions rather than 4 byte int, could save a bit of space.
A couple of small comments on docstrings.
Also would need some API docs.
|
||
def check_out_param(out, n_items): | ||
if not isinstance(out, np.ndarray): | ||
raise TypeError('out must be 1-dimensional array') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise TypeError('out must be 1-dimensional array') | |
raise TypeError('out must be a numpy array') |
|
||
|
||
class VLenNDArray(Codec): | ||
"""Encode variable-length n-dimensional arrays via UTF-8. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Encode variable-length n-dimensional arrays via UTF-8. | |
"""Encode an array of variable-length n-dimensional arrays. |
data_length += l + 4 * (n + 2) # 4 bytes to store number of | ||
# dimensions, 4 bytes per | ||
# dimension to store dimension | ||
# and 4 bytes to store the length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use a single byte to store the number of dimensions?
These error messages are a bit odd, they suggest some problem with how the other extension modules were compiled. Afraid I haven't got anything very intelligent to suggest, other than cleaning out all the .so and .c files and trying a full build from scratch. |
@sofroniewn, by any chance, would you still have interest in finishing this PR? :) |
This idea could probably be superseded by the proposal for an awkward-zarr project for GSoC. |
So sorry both for dropping the ball on this - if there are already plans for this to be superseded please press on with those, or if you have a contributor who wants to take this over PR please take it over. Thanks!! |
@martindurant Do you have a link to that GSoC proposal? I could only find this open issue, but not sure if that's related?: |
@NumesSanguis here's the ideas-list.md and here's the Awkward Array project details. |
This PR closes #199 by adding support for ragged nD arrays, where each array can be a different shape and dimensionality. It does this using the scheme described in #199.
It's usage is as follows:
I have not added tests yet, but will do so. I will adapt the tests from
test_vlen_array.py
.Any comments on the implementation are appreciated. I'm pretty new to this code base, so may have made some wrong choices.
Oh and I also seem to have a bunch of
.c
files the came when I rancythonize -a -i ./numcodecs/vlen_nd.pyx
that I may or may not have wanted to change, any advice around those would be appreciated too.TODO:
tox -e py37
passes locallytox -e py27
passes locallytox -e docs
passes locally