New Features to run MobileVLM on orin #5132
Conversation
1. Sum_Rows: fix CUDA kernel overflow; fix block shape error when nrows is too big
2. Im2Col: support batch in CUDA; support f32 to f32 in both CPU and CUDA
3. DepthWiseConv: supported via Im2Col and MulMat
4. Pool_2d: support avg pooling in CUDA
5. HardSigmoid: implemented in CUDA
6. HardSwish: implemented in CUDA
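As an illustration of items 5 and 6, a minimal CUDA kernel for HardSwish might look like the sketch below, written in the style of ggml's CUDA unary kernels. This is not the PR's exact code; the block size is an assumption chosen for illustration.

```cuda
#define CUDA_HARDSWISH_BLOCK_SIZE 256  // illustrative block size, not from the PR

// hardswish(x) = x * min(max(x + 3, 0), 6) / 6
static __global__ void hardswish_f32(const float * x, float * dst, const int k) {
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i >= k) {
        return;
    }
    dst[i] = x[i] * fminf(fmaxf(x[i] + 3.0f, 0.0f), 6.0f) / 6.0f;
}

// launch: one thread per element, grid rounded up to cover all k elements
static void hardswish_f32_cuda(const float * x, float * dst, const int k, cudaStream_t stream) {
    const int num_blocks = (k + CUDA_HARDSWISH_BLOCK_SIZE - 1) / CUDA_HARDSWISH_BLOCK_SIZE;
    hardswish_f32<<<num_blocks, CUDA_HARDSWISH_BLOCK_SIZE, 0, stream>>>(x, dst, k);
}
```

HardSigmoid follows the same structure with `dst[i] = fminf(fmaxf(x[i] + 3.0f, 0.0f), 6.0f) / 6.0f`, i.e. without the final multiply by `x[i]`.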
```diff
@@ -1493,7 +1493,8 @@ extern "C" {
             int p1,
             int d0,
             int d1,
-            bool is_2D);
+            bool is_2D,
+            enum ggml_type dst_type);
```
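For context, a call through the extended signature might look like the following sketch. The tensor shapes and hyperparameters are illustrative assumptions, not values from the PR:

```c
// im2col over a 224x224x32 input with a 3x3x32x64 kernel (shapes are made up);
// the new trailing argument selects the output type explicitly
struct ggml_tensor * kernel = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, 3, 3, 32, 64);
struct ggml_tensor * input  = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 224, 224, 32, 1);

struct ggml_tensor * cols = ggml_im2col(ctx, kernel, input,
                                        1, 1,            // stride   s0, s1
                                        1, 1,            // padding  p0, p1
                                        1, 1,            // dilation d0, d1
                                        true,            // is_2D
                                        GGML_TYPE_F32);  // dst_type (the new parameter)
```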
I'm not sure about this API. Long term we would like all ops to be able to output a certain type, which would mean we would have to extend all ops with `dst_type`. Is this OK?

We already started using another pattern with `ggml_mul_mat_set_prec()`. Not that it is great, but we might want to look for consistency here.

Somewhat related: ggerganov/ggml#455
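For comparison, the `ggml_mul_mat_set_prec()` pattern sets a property on the result tensor after the op is created, rather than threading it through the op's constructor:

```c
// existing pattern: create the op, then adjust a property of the result tensor
struct ggml_tensor * cur = ggml_mul_mat(ctx, w, x);
ggml_mul_mat_set_prec(cur, GGML_PREC_F32);  // request f32 precision for this matmul
```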
I think this is something that we will need to deal with when we add support for a full F16 pipeline. I don't think adding a `dst_type` parameter to every op would be a good way to do this; a setting in `ggml_context` that sets the output type for all of its ops would make more sense. But as it is now, I don't think we have a better way to handle this.
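A purely hypothetical sketch of that alternative; none of these names exist in ggml, they only illustrate the idea of a context-level setting:

```c
// HYPOTHETICAL API, for illustration only: a context-wide default output type
ggml_ctx_set_dst_type(ctx, GGML_TYPE_F16);   // hypothetical setter on ggml_context

// ops would then read the default from the context instead of taking dst_type:
struct ggml_tensor * cols = ggml_im2col(ctx, kernel, input,
                                        1, 1, 1, 1, 1, 1, true);  // no per-op dst_type
```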
For now, should we merge it like this and figure it out later?
I think so.
The pool 2D test doesn't pass. This failure indicates both a correctness issue and a buffer overflow.
This is my result:

I fixed a kernel bug in 1556d4c.
Force-pushed from 00925f8 to 1556d4c (nits)
The buffer overflow is fixed, but the error is still high:
The result is the same when building with
Would it work if you change the kernel like this?
I did fix a bug in 379f89f, but it's weird that it passed when I tested.
Max pool works now, but avg pool still fails:
count_include_pad, fixed in 49f09aa.
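For readers following along: the distinction at issue is whether padded (out-of-range) taps count toward the averaging divisor. Below is a minimal sketch of an average-pooling kernel with count_include_pad semantics, assuming one channel plane per launch and flat row-major indexing. This is illustrative, not the PR's exact kernel:

```cuda
__global__ void avg_pool2d_f32(const float * src, float * dst,
                               const int IW, const int IH,
                               const int OW, const int OH,
                               const int kw, const int kh,
                               const int sw, const int sh,
                               const int pw, const int ph) {
    const int ox = blockIdx.x * blockDim.x + threadIdx.x;
    const int oy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ox >= OW || oy >= OH) {
        return;
    }
    float sum = 0.0f;
    for (int ky = 0; ky < kh; ++ky) {
        for (int kx = 0; kx < kw; ++kx) {
            const int ix = ox*sw + kx - pw;
            const int iy = oy*sh + ky - ph;
            // out-of-range taps contribute 0 but are NOT excluded from the divisor
            if (ix >= 0 && ix < IW && iy >= 0 && iy < IH) {
                sum += src[iy*IW + ix];
            }
        }
    }
    // count_include_pad: divide by the full kernel area, not the valid-tap count
    dst[oy*OW + ox] = sum / (float)(kh*kw);
}
```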
Force-pushed from 212da07 to 49f09aa
The original tests pass. I have added more tests and some don't pass with F16. Here is the full result:
Only fp32 is supported for now.
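One of the follow-up commits is titled "ggml : check types in release builds too in pool_2d", which suggests checks along these lines (a sketch, not the exact code):

```c
// GGML_ASSERT aborts in release builds too, unlike plain assert(),
// so an unsupported type fails loudly instead of corrupting memory
GGML_ASSERT(src0->type == GGML_TYPE_F32);
GGML_ASSERT(dst->type  == GGML_TYPE_F32);
```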
I didn't hit any asserts while testing F16, so I assume that it works. I see now that there are asserts in
ggml.h (outdated)
```c
    int p0,
    int p1);
```
I think there was a reason for these to be float. @ggerganov
Yes, the yolo example relies on this:
Context: ggerganov/ggml#576 (comment)
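For reference, `ggml_pool_2d` takes the padding parameters as floats precisely so callers can pass fractional values. A hedged sketch of such a call follows; the 0.5f values are illustrative, not taken from the yolo example:

```c
// float p0/p1 allow fractional padding, which the yolo example depends on
struct ggml_tensor * pooled = ggml_pool_2d(ctx, x, GGML_OP_POOL_MAX,
                                           2, 2,         // kernel  k0, k1
                                           1, 1,         // stride  s0, s1
                                           0.5f, 0.5f);  // padding p0, p1 (floats)
```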
Fallback added in 66dd123.
* New Feature:
  1. Sum_Rows: fix CUDA kernel overflow; fix block shape error when nrows is too big
  2. Im2Col: support batch in CUDA; support f32 to f32 in both CPU and CUDA
  3. DepthWiseConv: supported via Im2Col and MulMat
  4. Pool_2d: support avg pooling in CUDA
  5. HardSigmoid: implemented in CUDA
  6. HardSwish: implemented in CUDA
* fix tabs instead of spaces
* code clean
* CUDA POOL2D
* ADD POOL2D test case in test-backend-ops.cpp
* code clean
* fix pool2d_kernel nits
* fix bug in pool2d kernel
* fix avg pooling, count_include_pad nits
* test-backend-ops : add more pool_2d tests
* cuda : fix warnings and formatting
* ggml : check types in release builds too in pool_2d
* test-backend-ops : remove f16 pool_2d tests
* cuda : more style fixes
* Add assert in ggml_cuda_op_pool2d
* pool2d float padding fallback
* test-backend-ops : add dst_type to im2col

Co-authored-by: slaren <[email protected]>