Replies: 59 comments 1 reply
-
W.r.t. the question on why
for lack of involvement in the other design decisions. Here, we discussed that a simple
I think having a GIL-protected lazily initialized static like in rust-numpy/src/slice_container.rs Lines 109 to 112 in eb1068b might be preferable to a single global data structure?
While the overhead of producing a
-
That seems to be the case indeed: So we can probably replace our calls to
I think it would actually be helpful to ignore this part completely for now and design the necessary types without any macro support first. The proc-macros should essentially only relieve programmers from writing repetitive and error-prone code, shouldn't they?
-
Then my initial guess was correct. I wonder though whether it's actually noticeable, i.e. has anyone tried to measure it? With all the other Python cruft on top, I would strongly suspect there'd be no visible difference at all. I understand where it comes from, it's just there's another implicit logical requirement in the
With record types, it's pretty much a requirement; anything else would be extremely boilerplate-ish and unsafe (with the user having to provide all offsets and descriptors for all fields manually). It might be nice to leave an option to construct descriptors manually at runtime, but I can't see a single real use case for it off the top of my head. So, the necessary types and the infrastructure have to be designed in a way that would make it trivial to integrate them into proc macros, from the get go.
Yes, I think it does (and so does pybind11).
Yep - thanks for pointing that out, something like that should be sufficient.
-
Just checked it out: if one replaces Line 850 in eb1068b with something runtime like if !T::get_dtype(py).is_object() { the difference in For comparison, creating arange(1, 5, 1) is 31ns. So yea... you could say it's noticeable (but only if you create hundreds of millions of arrays).
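For anyone wanting to reproduce this kind of measurement, the per-call baseline can be approximated from Python with `timeit` (absolute numbers will of course vary by machine; the 31 ns figure above is from the poster's setup):

```python
import timeit

import numpy as np

N = 100_000
# Average cost of creating a tiny array; for arrays this small the time is
# dominated by allocation and call overhead, not element initialization.
elapsed = timeit.timeit(lambda: np.arange(1, 5, 1), number=N)
print(f"arange(1, 5, 1): {elapsed / N * 1e9:.0f} ns per call")
```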
-
ATM, the
I think if anything has measurable overhead, then it is acquiring the reference to a
proc-macros generate code that is injected into the dependent crate, i.e. they have to use this crate's public API, and all the code they produce must be possible to write by hand. The point is not that I expect user code to manually produce those trait implementations (but avoiding the build time overhead of proc-macros would be one reason to do so), but that we can focus the discussion on the mechanisms and leave the proc-macro UI for later.
-
One possible caveat with a targeted benchmark here is that deciding this at compile time is about removing code and thereby reducing instruction cache pressure, which could only be noticeable when other code shares the address space.
-
As a first actionable result of your findings, why not start with a PR to simplify the
-
I fear that it might actually be unsound: During extraction, we only check the data type to be object, but not whether all the elements are of the claimed type. I fear we need to remove this suggestion and limit ourselves to
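The soundness concern is easy to see from the Python side: an `object` dtype says nothing about what the elements actually are, so a dtype check alone cannot guarantee that every element has the expected type (plain NumPy illustration, not rust-numpy code):

```python
import numpy as np

# An object array may hold values of entirely different types...
a = np.array([1, "two", 3.0], dtype=object)
assert a.dtype == np.dtype(object)

# ...so checking the dtype alone proves nothing about the concrete
# type of each element; every element would need its own check.
assert {type(x) for x in a} == {int, str, float}
```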
-
While I'm often guilty of micro-optimizing Rust code down to every single branch hint myself, I just don't think it matters here; the last thing you're going to want to be doing in numpy bindings is instantiate millions of array objects in a loop - that was kind of my point. Here are a few more thoughts on what would need to happen if
Oh yea, and also, record types can be nested - that would be fun to implement as const as well (at runtime we can just box it all on the heap).
-
The note above only applies if it remains as If it's changed to
-
Yea, I was asking myself whether I was reading that code correctly - it only checks for There are three options:
-
Yes, I've already started on this, as a first step, will try to push something shortly. There are actually a few more subtle dtype-related bugs I've uncovered in the process that I'll fix as well (the tests are very sparse, so I guess no one noticed anything so far...).
-
(This is a bit of a wall of text, so thanks in advance to whoever reads through it in its entirety. I tried to keep it as organized as possible.)

tagging @adamreichold @kngwyu @davidhewitt

Ok, now that step 1 (#256) is ready to be merged, let's think about next steps. I'll use pybind11 as reference here since that's what I'm most familiar with, having implemented a big chunk of that myself in the past.

Descriptor (format) strings and PEP 3118

In pybind11, dtypes can be constructed from buffer descriptor strings (see

The benefit of using descriptor strings - they're easy to hash if a registry is used, there's no nesting, etc - it's just a simple string that can cover any dtype possible, including scalar types, object types, multi-dimensional subarrays, recarrays, and any combination of the above. We can also generate them with proc-macros at compile time and make them

Now, if we go this route, we'll have to delve into "format chars", "type kinds", "buffer descriptor strings" etc. There's obviously a big overlap with Buffers,
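NumPy already speaks PEP 3118 through the buffer protocol, so the flavour of descriptor string discussed above can be inspected directly from Python (the exact format string, including any padding bytes, can vary between NumPy versions and layouts):

```python
import numpy as np

# A packed record dtype: u2 at offset 0, u8 at offset 2.
dt = np.dtype([("x", "<u2"), ("y", "<u8")])
fmt = memoryview(np.zeros(1, dtype=dt)).format

# Structured dtypes are exported as PEP 3118 "T{...}" format strings,
# with each field's format char followed by its name in colons.
assert fmt.startswith("T{")
assert ":x:" in fmt and ":y:" in fmt
print(fmt)
```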
-
Just my first thoughts after reading:
-
Some more progress - due to In a nutshell, you can just use
-
Some more progress updates (this is kinda where all of this has been heading from the start) - here's one of the most recent tests. This compiles and runs:

```rust
#[derive(Clone, numpy::Record)]
#[repr(C)]
struct Foo<T, const N: usize> {
    x: [[T; N]; 3],
    y: (PyObject, Timedelta64<units::Nanosecond>),
}

assert_eq!(
    Foo::<i64, 2>::type_descriptor(),
    td!({"x":0 => [(3, 2); <i64], "y":48 => {0 => O, 8 => m8[ns] [16, 8]} [64, 8]})
);
```

There are still lots of edge cases and bugs to be squashed and tests to be written etc, but on the surface, the core of it seems to be working quite nicely.
-
If the semantics of the same type definition with and without the flag are different on the Python side, maybe we should give the user access to it, e.g. make it a parameter to
I think we should definitely add this and do it in the same way as std or PyO3: add impls up to length 32 using macros on older compilers and use const generics on newer ones, dropping the small-array macros when our MSRV includes support for const generics.
Personally, I would very much like this crate to provide only a minimal and efficient conversion API without providing any operations on the wrapper types itself. (I think this also applies to arrays themselves and we should aim to focus the API on getting from/to ndarray and nalgebra types.)
-
Yea, I had the same feeling. For now I've just added transparent wrapper types themselves that can only do two things (aside from being registered in the dtype system) - be constructed from
I've added the min-const-generics part of it, but I can also add the macro-based impl for up-to-length-32 arrays for older compiler versions. Personally, I'm not a huge fan of

```rust
#[rustversion::since(1.51)]
// impl via min const generics
#[rustversion::before(1.51)]
// impl via macros
```

The main benefit is that we won't break some dependent crates the day we kill the

I think, first of all, we need to categorise how a type descriptor can be attached to a type. There will be two ways:

There's an
-
Yeah, I think using a crate like
I guess this is where I am not sure ATM. Reading the NumPy C API, I get the impression that the idea is that I think we should always have an alignment (even if trivial, i.e. one) and we should set the My understanding of the
-
The way it works in numpy itself (

```c
PyArray_Descr *new = PyArray_DescrNewFromType(NPY_VOID); // <-- new->alignment := 1
...
if (align) { // <-- align comes from either align=True or from being called recursively
    new->alignment = maxalign; // <-- maxalign is computed C-alignment if offsets are not provided
}
...
/* Structured arrays get a sticky aligned bit */
if (align) {
    new->flags |= NPY_ALIGNED_STRUCT;
}
```

IIUC, there are 4 cases in the numpy constructor itself:

So, the following must hold true: "if

Back to our business - I guess we can do something like this?

```rust
struct RecordDescriptor<T> {
    ...
    alignment: usize,         // always set
    is_aligned: Option<bool>,
}
```

Where
-
Here's an illustration:

```python
def check_alignment(*, offsets=None, align=False):
    print(f'offsets={offsets}, align={align} => ', end='')
    try:
        args = {
            'names': ['x', 'y'],
            'formats': ['u2', 'u8'],
        }
        if offsets is not None:
            args['offsets'] = offsets
        dtype = np.dtype(args, align=align)
        dtype_offsets = [dtype.fields[n][1] for n in args['names']]
        print(
            f'alignment={dtype.alignment}, '
            f'isalignedstruct={dtype.isalignedstruct}, '
            f'offsets={dtype_offsets}'
        )
    except BaseException as e:
        print(f'error: "{e}"')

check_alignment(offsets=None, align=False)
check_alignment(offsets=None, align=True)
check_alignment(offsets=[0, 8], align=False)
check_alignment(offsets=[0, 8], align=True)
check_alignment(offsets=[0, 5], align=False)
check_alignment(offsets=[0, 5], align=True)
```

which outputs

```
offsets=None, align=False => alignment=1, isalignedstruct=False, offsets=[0, 2]
offsets=None, align=True => alignment=8, isalignedstruct=True, offsets=[0, 8]
offsets=[0, 8], align=False => alignment=1, isalignedstruct=False, offsets=[0, 8]
offsets=[0, 8], align=True => alignment=8, isalignedstruct=True, offsets=[0, 8]
offsets=[0, 5], align=False => alignment=1, isalignedstruct=False, offsets=[0, 5]
offsets=[0, 5], align=True => error: "offset 5 for NumPy dtype with fields is not divisible by the field alignment 8 with align=True"
```
-
The semantics look reasonable to me. I do still wonder whether
So this means that NumPy treats this like a
-
Yes, it sort of does. But I don't think it makes a distinction between aligned and unaligned reads; all that is left to the compiler. You can see that the usage of
-
Ok, re: the last comment, maybe I wasn't entirely clear. In some cases, NumPy does distinguish between aligned and unaligned arrays.

```c
    /* TODO: We may need to distinguish aligned and itemsize-aligned */
    aligned &= PyArray_ISALIGNED(arrays[i]);
}
if (!aligned && !(self->method->flags & NPY_METH_SUPPORTS_UNALIGNED)) {
    PyErr_SetString(PyExc_ValueError,
                    "method does not support unaligned input.");
    return NULL;
}
```

Whether the array is aligned or not, e.g. if you select a field like So, in the example above, examples 2, 3 and 4 will have
-
My impression is that indeed this means that NumPy will store these records unaligned? So the resulting type descriptor would match a Rust type with
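That correspondence can be checked from Python: NumPy's default layout packs fields back-to-back (what `#[repr(C, packed)]` would give in Rust), while `align=True` reproduces the natural C layout with padding (a small sketch for the `u2`/`u8` example used earlier in the thread):

```python
import numpy as np

fields = [("x", "u2"), ("y", "u8")]

packed = np.dtype(fields)               # default: fields at offsets 0, 2
aligned = np.dtype(fields, align=True)  # C layout: offsets 0, 8 plus tail padding

assert packed.itemsize == 10 and not packed.isalignedstruct
assert aligned.itemsize == 16 and aligned.isalignedstruct
assert aligned.alignment == 8
```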
-
(Sorry, didn't mention that: Personally, I don't think we need roundtripping. I am basically only interested in the
-
Well, to start with... NumPy allows overlapping fields. As long as neither of them contains
I think we have to do the reverse mapping, which will be used in
What exactly do you mean by that? You can pass offsets It will only be 'packed' if you don't specify offsets. In this case
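The overlapping-fields behaviour mentioned above can be demonstrated directly; NumPy accepts the overlap for plain data fields and (as far as I can tell) only rejects it when an object field is involved:

```python
import numpy as np

# Two u4 fields sharing bytes 2..4 - NumPy accepts this "union"-style layout.
dt = np.dtype({
    "names": ["a", "b"],
    "formats": ["<u4", "<u4"],
    "offsets": [0, 2],
    "itemsize": 8,
})
assert [dt.fields[n][1] for n in dt.names] == [0, 2]
assert dt.itemsize == 8

# An overlap involving an object field is rejected.
try:
    np.dtype({"names": ["a", "b"], "formats": ["O", "<u4"], "offsets": [0, 2]})
    print("accepted")
except Exception as e:
    print("rejected:", type(e).__name__)
```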
-
I have to think about this. My first instinct is that this would rather be an argument for improving the NumPy code instead of rolling our own.
From your example in #254 (comment), I got the impression that
-
Hey. Just wondering if there were any implementation results from this discussion. I'd like to use structured data types and I got stuck when trying to implement
-
I've been looking at how record types can be integrated in rust-numpy and here's an unsorted collection of thoughts for discussion.

Let's look at `Element`:

- `npy_type()` is used in `PyArray::new()` and the like. Instead, one should use `PyArray_NewFromDescr()` to make use of the custom descriptor. Should all places where `npy_type()` is used split between "simple type, use `New`" and "user type, use `NewFromDescr`"? Or, alternatively, should arrays always be constructed from a descriptor? (in which case, `npy_type()` becomes redundant and should be removed)
- Is `same_type()` needed at all? It is only used in `FromPyObject::extract` where one could simply use `PyArray_EquivTypes` (like it's done in pybind11). Isn't it largely redundant? (or does it exist for optimization purposes? In which case, is it even noticeable performance-wise?)
- The `DATA_TYPE` constant is really only used to check if it's an object or not in 2 places, like this:
- `Element` essentially is just `DataType`? E.g.: `DataType::Void`? In which case, how does one recover the record type descriptor? (it can always be done through the numpy C API of course, via `PyArrayDescr`).
- `&PyArrayDescr` would probably require:
- `Element` should probably be implemented for tuples and fixed-size arrays.
- `#[repr(C)]`, `#[repr(packed)]` or `#[repr(transparent)]` `#[numpy(record, repr = "C")]`. (or not)
- `Copy`? (or not? technically, you could have object-type fields inside)
- `Element` for it manually. This seems largely redundant, given that the `DATA_TYPE` will always be `Object`. It would be nice if any `#[pyclass]`-wrapped types could automatically implement `Element`, but that would be impossible due to the orphan rule. An alternative would be something like this:
- `OrderedFloat<f32>` or `Wrapping<u64>` or some `PyClassFromOtherCrate`? We can try doing something like what serde does for remote types.
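On the `same_type()` question: `PyArray_EquivTypes` is the check that backs dtype equality on the Python side, so the pybind11-style comparison can be previewed from Python (equality here means equivalence, not identity of spelling):

```python
import numpy as np

# Two differently-spelled but equivalent record dtypes compare equal.
a = np.dtype([("x", "<u2"), ("y", "<u8")])
b = np.dtype({"names": ["x", "y"], "formats": ["<u2", "<u8"]})
assert a == b

# Scalar aliases are equivalent too: 'i8' is an 8-byte signed integer.
assert np.dtype("i8") == np.dtype(np.int64)
```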