Evision (OpenCV's `cv::Mat`) as an Nx.Backend? #48

`cv::Mat` (and its variants) is OpenCV's implementation of a multi-dimensional array, so technically it can be used as a backend for Nx (Numerical Elixir).
Could be a separate library? Perhaps yes. But since nx is one of the dependencies of this project, it should be fine if I do it here.
To get an ROI of an image, we may use:

```elixir
img = Evision.imread!("image.jpg")
img[{10..30, 10..30, :all}]
```
Checklist:

- Required callbacks
- Optional callbacks
I don't mind having a sane API. I feel Python goes off the rails with their syntax, and there's an article explaining just that: https://www.cigrainger.com/introducing-explorer/. By sane I mean:

```elixir
Evision.crop(img, %{x_begin: 10, x_end: 30, y_begin: 10, y_end: 30})
# (maybe instead of crop it could be called a generic mutate, since it will
# probably need to incorporate stride and other fun stuff)
```
@vans163 thanks for the suggestion :) I agree that evision should have such helper functions. I plan to put them in a dedicated module.
OpenCV does not support the following types: `:s64`, `:u64`, and `:u32`. (Although it's possible to store values with those types using custom types, the resulting ...) The type inference function, `Nx.Type.infer/1`, in ...

cc @josevalim What do you think? If this sounds good, I can open a PR for this :)
Unfortunately I think this won't be enough. :( For example, inside ... I think the best option for now is for you to simply treat s64 as s32 and document that the maximum precision is s32, so everything gets downcast. I would perhaps raise for u32/u64 though.
For the first one: OpenCV stores the depth information (`DDD`) in the lowest 3 bits of `cv::Mat`'s type value.
| Bit | 31-3 | 2-0 |
|-----|------|-----|
|     |      | DDD |

```
MSB                             LSB
31............................| 2...0 |
|.............................| depth |
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxx|  DDD  |
```
The other hard-coded thing is `#define CV_CN_MAX 512`, and `512 = 1 << 9`; therefore, `cv::Mat`'s channel information is stored in 9 bits, from bit 3 to bit 11.
| Bit | 31-12 | 11-3      | 2-0 |
|-----|-------|-----------|-----|
|     |       | CCCCCCCCC | DDD |

```
MSB                    LSB
31...................| 11......3 | 2...0 |
|....................| channels  | depth |
|xxxxxxxxxxxxxxxxxxxx| CCCCCCCCC |  DDD  |
```
The channels information in the `cv::Mat` is used by some OpenCV functions (via `CV_MAT_CN(mat.type())`) for sanity checks, for example, in functions that only work with 3-channel 2D images.
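For reference, here is a tiny standalone snippet against a stock (unmodified) OpenCV build showing how depth and channel count are packed into and extracted from the type value with the standard macros:

```cpp
#include <cstdio>
#include <opencv2/core.hpp>

int main() {
    int t = CV_MAKETYPE(CV_8U, 3);   // same value as CV_8UC3
    std::printf("type=%d depth=%d channels=%d\n",
                t, CV_MAT_DEPTH(t), CV_MAT_CN(t));
    // with CV_CN_SHIFT == 3 this prints: type=16 depth=0 channels=3
    return 0;
}
```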
Now, let's suppose that we agreed we can reduce the number of bits for `channels` from 9 to 8, and use that saved bit for `depth`:
| Bit | 31-12 | 11-4     | 3-0  |
|-----|-------|----------|------|
|     |       | CCCCCCCC | DDDD |

```
MSB                    LSB
31...................| 11.....4 | 3...0 |
|....................| channels | depth |
|xxxxxxxxxxxxxxxxxxxx| CCCCCCCC | DDDD  |
```
Then we can make the following modifications to that header file (`modules/core/include/opencv2/core/hal/interface.h`):

```c
#define CV_CN_MAX     256
#define CV_CN_SHIFT   4
#define CV_DEPTH_MAX  (1 << CV_CN_SHIFT)

#define CV_8U   0
#define CV_8S   1
#define CV_16U  2
#define CV_16S  3
#define CV_32S  4
#define CV_32F  5
#define CV_64F  6
#define CV_16F  7
// add the custom `CV_64S` macro
#define CV_64S  8
// and since we can now have up to 2^4 = 16 types,
// it's possible to add `CV_64U` (`:u64`) and `CV_32U` (`:u32`) as well
#define CV_64U  9
#define CV_32U  10
```
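One side effect worth noting before going further: changing `CV_CN_SHIFT` changes every packed type value, not only the new ones. Here is a minimal illustration; the `make_type` helper below is a hypothetical stand-in for the arithmetic `CV_MAKETYPE` performs, not an OpenCV API:

```cpp
#include <cstdio>

// Hypothetical re-implementation of the CV_MAKETYPE arithmetic:
// depth lives in the low bits, (channels - 1) is shifted up by cn_shift.
static int make_type(int depth, int channels, int cn_shift) {
    return depth + ((channels - 1) << cn_shift);
}

int main() {
    // An 8-bit, 3-channel type (CV_8UC3) under the stock layout (shift 3)
    // vs. the modified layout sketched above (shift 4).
    std::printf("stock    CV_8UC3 -> %d\n", make_type(0, 3, 3));  // 16
    std::printf("modified CV_8UC3 -> %d\n", make_type(0, 3, 4));  // 32
    return 0;
}
```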
On the surface, these modifications look pretty legit, and in fact, if you make all of them to OpenCV, you can compile code that initializes a `cv::Mat` with `CV_64S` as its type.
```cpp
#include <cstdint>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <vector>

using namespace cv;

template <typename T, typename AS = T>
void print_data(cv::Mat& mat, const char* name) {
    for (int i = 0; i < 3; i++) {
        std::cout << name << '[' << i << "]: " << (AS)mat.template at<T>(i) << '\n';
    }
    std::cout << '\n';
}

int main() {
    std::vector<int64_t> data1 = {INT64_MAX, INT64_MAX - 1, INT64_MAX - 2};
    std::vector<int64_t> data2 = {0, 1, 2};
    std::vector<int> as_shape = {1, 1, 3};
    cv::Mat mat1((int)as_shape.size(), as_shape.data(), CV_64S, data1.data());
    cv::Mat mat2((int)as_shape.size(), as_shape.data(), CV_64S, data2.data());
    print_data<uint64_t>(mat1, "mat1");
    print_data<uint64_t>(mat2, "mat2");
}
```
The output is:

```
mat1[0]: 9223372036854775807
mat1[1]: 9223372036854775806
mat1[2]: 9223372036854775805

mat2[0]: 0
mat2[1]: 1
mat2[2]: 2
```
4. The magic number -- 0x28442211
However, once we try to do some operation on them, even the simplest one, like adding two matrices, we would get an incorrect result:
```cpp
int main() {
    // ... skipped
    print_data<uint64_t>(mat2, "mat2");
    // add `mat1` and `mat2`
    auto mat3 = cv::Mat(mat1 + mat2);
    print_data<uint64_t>(mat3, "mat3");
}
```
The output is:

```
mat1[0]: 9223372036854775807
mat1[1]: 9223372036854775806
mat1[2]: 9223372036854775805

mat2[0]: 0
mat2[1]: 1
mat2[2]: 2

mat3[0]: 16777215
mat3[1]: 0
mat3[2]: 0
```
Obviously, we got some wrong numbers. But we do have a clue in the value `16777215`, which is `0xFF_FF_FF`. This means that somewhere deep inside OpenCV, it still thinks these matrices are some other type instead of `CV_64S`.
After a quick grep through OpenCV's code base, the following lines in particular drew my attention (in `modules/core/include/opencv2/core/cvdef.h`):

```c
/** Size of each channel item,
   0x28442211 = 0010 1000 0100 0100 0010 0010 0001 0001 ~ array of sizeof(arr_type_elem) */
#define CV_ELEM_SIZE1(type) ((0x28442211 >> CV_MAT_DEPTH(type)*4) & 15)
```
It's a pretty compact way to pack the element size of all 8 data types into a single 32-bit integer:
```c
// LSB
// 0001
#define CV_8U   0
// 0001
#define CV_8S   1
// 0010
#define CV_16U  2
// 0010
#define CV_16S  3
// 0100
#define CV_32S  4
// 0100
#define CV_32F  5
// 1000
#define CV_64F  6
// MSB
// 0010
#define CV_16F  7
```
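A quick sanity check against an unmodified OpenCV build (plain C++, nothing here is a proposed change) confirms that each nibble of `0x28442211` is exactly the element size in bytes:

```cpp
#include <cassert>
#include <opencv2/core.hpp>

int main() {
    assert(CV_ELEM_SIZE1(CV_8U)  == 1);  // nibble 0001
    assert(CV_ELEM_SIZE1(CV_16S) == 2);  // nibble 0010
    assert(CV_ELEM_SIZE1(CV_32F) == 4);  // nibble 0100
    assert(CV_ELEM_SIZE1(CV_64F) == 8);  // nibble 1000
    return 0;
}
```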
I would probably do the same thing if I knew that my library would only ever deal with 8 data types. Nevertheless, for this line, it's still relatively simple to change it so that it fits our needs. As a reminder, we've added three types after the existing ones:
```c
// add the custom `CV_64S` macro
#define CV_64S  8
// and since we can now have up to 2^4 = 16 types,
// it's possible to add `CV_64U` (`:u64`) and `CV_32U` (`:u32`) as well
#define CV_64U  9
#define CV_32U  10
```
Hence we should prepend three 4-bit size entries to this magic number:
```c
/** Original:
   0x28442211 = 0010 1000 0100 0100 0010 0010 0001 0001 ~ array of sizeof(arr_type_elem)

   Size of each channel item (new):
   0x48828442211 = 0100 1000 1000 0010 1000 0100 0100 0010 0010 0001 0001 ~ array of sizeof(arr_type_elem)

   MSB
   0100 - CV_32U
   1000 - CV_64U
   1000 - CV_64S
   ...
   LSB
*/
#define CV_ELEM_SIZE1(type) (int)((0x48828442211 >> CV_MAT_DEPTH(type)*4) & 15)
```
5. More changes needed, but is it worth the effort?
Well, it would be a happy ending if everything worked after all the patches above, but I found more hard-coded things in OpenCV's code base, for example, this data conversion function in `modules/core/src/matrix_sparse.cpp`:

```cpp
static ConvertData getConvertElem(int fromType, int toType)
{
    static ConvertData tab[][8] =
    {{ convertData_<uchar, uchar>, convertData_<uchar, schar>,
      convertData_<uchar, ushort>, convertData_<uchar, short>,
      convertData_<uchar, int>, convertData_<uchar, float>,
      convertData_<uchar, double>, 0 },

    { convertData_<schar, uchar>, convertData_<schar, schar>,
      convertData_<schar, ushort>, convertData_<schar, short>,
      convertData_<schar, int>, convertData_<schar, float>,
      convertData_<schar, double>, 0 },

    { convertData_<ushort, uchar>, convertData_<ushort, schar>,
      convertData_<ushort, ushort>, convertData_<ushort, short>,
      convertData_<ushort, int>, convertData_<ushort, float>,
      convertData_<ushort, double>, 0 },

    { convertData_<short, uchar>, convertData_<short, schar>,
      convertData_<short, ushort>, convertData_<short, short>,
      convertData_<short, int>, convertData_<short, float>,
      convertData_<short, double>, 0 },

    { convertData_<int, uchar>, convertData_<int, schar>,
      convertData_<int, ushort>, convertData_<int, short>,
      convertData_<int, int>, convertData_<int, float>,
      convertData_<int, double>, 0 },

    { convertData_<float, uchar>, convertData_<float, schar>,
      convertData_<float, ushort>, convertData_<float, short>,
      convertData_<float, int>, convertData_<float, float>,
      convertData_<float, double>, 0 },

    { convertData_<double, uchar>, convertData_<double, schar>,
      convertData_<double, ushort>, convertData_<double, short>,
      convertData_<double, int>, convertData_<double, float>,
      convertData_<double, double>, 0 },

    { 0, 0, 0, 0, 0, 0, 0, 0 }};

    ConvertData func = tab[CV_MAT_DEPTH(fromType)][CV_MAT_DEPTH(toType)];
    CV_Assert( func != 0 );
    return func;
}
```
Again, it's not hard to add a few more specializations of `convertData_` to a widened table (a rough sketch of the pattern follows below). The core issue here, from my perspective, is: is it worth all the effort?
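Purely to illustrate the kind of change involved, and not OpenCV source, here is a self-contained toy version of the same dispatch-table pattern (hypothetical names throughout); it shows why every added depth means both a new row and a new column:

```cpp
#include <cstdint>
#include <cstdio>

// Toy element converter: cast one value from type From to type To.
template <typename From, typename To>
static void convert_one(const void* from, void* to) {
    *static_cast<To*>(to) = static_cast<To>(*static_cast<const From*>(from));
}

using ConvertFn = void (*)(const void*, void*);

// Toy depth numbering: 0 = uint8_t, 1 = int32_t, 2 = int64_t (the newly added depth).
static ConvertFn tab[3][3] = {
    { convert_one<uint8_t, uint8_t>, convert_one<uint8_t, int32_t>, convert_one<uint8_t, int64_t> },
    { convert_one<int32_t, uint8_t>, convert_one<int32_t, int32_t>, convert_one<int32_t, int64_t> },
    { convert_one<int64_t, uint8_t>, convert_one<int64_t, int32_t>, convert_one<int64_t, int64_t> },
};

int main() {
    int64_t src = 123456789012345LL;
    int32_t dst = 0;
    tab[2][1](&src, &dst);        // dispatch: int64_t -> int32_t (value gets truncated)
    std::printf("%d\n", dst);
    return 0;
}
```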
The reasons why I hesitate to go further are:
- Even if I managed to find all the relevant hard-coded lines and patch them correctly, only a limited set of OpenCV operations would be available for these added types.
- The `raw_type` in `Evision.Mat` (i.e., the value of `cv::Mat::type()`) will be totally different from the one returned by an official build. That doesn't seem like a huge problem at first glance; however, OpenCV does have the functionality to persist/serialise `cv::Mat` to disk, so if one tried to load serialised data generated by the original code, it would fail or return wrong data. Simply put, the header part of the serialised data will be different because we changed what the underlying bits represent in `cv::Mat`'s `flags` member.
- Even if we somehow managed to recognise whether serialised data was produced by the modified code or the original one, the number of patches, together with all the Python code in this project, would make it even harder for anyone who is willing to contribute to this project.
- It's possible to submit all the patches upstream (to OpenCV), yet I personally highly doubt they would accept the PR because of:
  - all the compatibility issues (as in the second point above);
  - the fact that these types are not often used in computer vision (otherwise OpenCV would have supported them in the first place).
- Even if they were willing to add these types, the new types would not be available until the next major release (OpenCV 5.0) because of those compatibility issues. For example, `CV_USRTYPE1` was available in OpenCV 3.x, and OpenCV decided to replace `CV_USRTYPE1` with `CV_16F` (half-precision float), but they had to do that in a major update, i.e., OpenCV 4.0.

  ```c
  // modules/core/include/opencv2/core/hal/interface.h

  // in OpenCV 3.x
  #define CV_8U   0
  #define CV_8S   1
  #define CV_16U  2
  #define CV_16S  3
  #define CV_32S  4
  #define CV_32F  5
  #define CV_64F  6
  #define CV_USRTYPE1 7

  // in OpenCV 4.x
  #define CV_USRTYPE1 (void)"CV_USRTYPE1 support has been dropped in OpenCV 4.0"
  #define CV_8U   0
  #define CV_8S   1
  #define CV_16U  2
  #define CV_16S  3
  #define CV_32S  4
  #define CV_32F  5
  #define CV_64F  6
  #define CV_16F  7
  ```
6. Any workarounds?
There are two workarounds that I can think of at the moment, and each has different trade-offs.
a. Map these types to some other types

It's possible to set a map for those unsupported types in the `config.exs` file:

```elixir
config :evision, unsupported_type_map: %{
  {:s, 64} => {:f, 64},
  {:u, 64} => {:f, 64},
  {:u, 32} => {:f, 32}
}
```
The above config would map `:s64` and `:u64` to `:f64`, and `:u32` to `:f32`. The very first drawback is that the result would be a totally different type. Secondly, value-wise, `:f64` does not cover every single possible value of `:u64` or `:s64`: a 64-bit double (assuming the IEEE 754 standard) has 52 explicit mantissa bits, so the largest integer you can store in a double without losing precision is 2^53 (9,007,199,254,740,992).
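A tiny standalone check of that limit (plain C++, using only the standard library; nothing Evision-specific):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const int64_t limit = INT64_C(1) << 53;          // 2^53 = 9007199254740992
    double exact = static_cast<double>(limit);       // exactly representable
    double lossy = static_cast<double>(limit + 1);   // rounds back down to 2^53
    std::printf("%.0f\n%.0f\n", exact, lossy);       // prints the same number twice
    return 0;
}
```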
b. Use other Nx backends

i) `:nx`

`Nx.BinaryBackend` is implemented in pure Elixir, and `:nx` is a dependency of this library, so you can use it out of the box. However, `Nx.BinaryBackend` can be really slow if you have a relatively large matrix.
ii) `:torchx`

`Torchx.Backend` is another Nx backend, and it uses `libtorch`. It is a very fast and superb library, but the official prebuilt binaries of `libtorch` only support `x86_64` CPUs (and Apple Silicon, `aarch64-apple-darwin`, via `brew`).