MaskRCNN Inference #884
Conversation
I started the same project today but you are ahead of me. Maybe you need to drop the last fc layer of the backbone, right?
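(For context, a minimal sketch of that idea using a plain torchvision ResNet-50 as a stand-in; the torchvision model is an assumption for illustration, not what this PR uses. A backbone only needs the convolutional feature maps, so avgpool and fc are dropped:)

import torch
import torchvision

# hypothetical stand-in: keep a ResNet-50 up to the last conv stage,
# dropping avgpool and the final fc layer so it can act as a backbone
resnet = torchvision.models.resnet50()
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])
feats = backbone(torch.randn(1, 3, 224, 224))
print(feats.shape)  # torch.Size([1, 2048, 7, 7])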
models/mask_rcnn.py (Outdated)
module = make_conv3x3(next_feature, layer_features,
                      dilation=dilation, stride=1, use_gn=use_gn)
exec(f"self.{layer_name} = module")
this is kinda cursed, you should be able to change the name during the weight loading process.
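(For reference, a minimal sketch of the setattr pattern that avoids exec for dynamically named layers; the class and layer names here are illustrative, not this PR's code:)

# illustrative only: attach dynamically named submodules without exec
class Head:
  def __init__(self, layer_names, modules):
    for layer_name, module in zip(layer_names, modules):
      setattr(self, layer_name, module)  # same effect as exec(f"self.{layer_name} = module")

head = Head(["mask_fcn1", "mask_fcn2"], [object(), object()])
assert hasattr(head, "mask_fcn1") and hasattr(head, "mask_fcn2")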
@wozeparrot yes, a lot of things are cursed right now; I'll be taking these up one by one while I'm adding the calls.
Yes, they can be removed, but I won't be using them in the forward call anyway.
Changes made in
@geohot So there are still some torch functions which need to be removed, but here's an example output.
I'm aware that the results aren't exactly the same; this is because the resnet block output doesn't exactly match the reference implementation (it matches with atol=1e-3).
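(For illustration, a sketch of the kind of tolerance check being described; the shapes and the simulated drift are made up:)

import numpy as np

# simulate a reference output and an output that drifts by less than 1e-3
out_ref = np.random.randn(1, 256, 50, 50).astype(np.float32)
out_tiny = out_ref + np.float32(5e-4)
assert not np.array_equal(out_tiny, out_ref)      # not bit-identical...
assert np.allclose(out_tiny, out_ref, atol=1e-3)  # ...but within atol=1e-3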
tinygrad/tensor.py (Outdated)
@@ -435,6 +461,7 @@ def dot(self, w:Tensor) -> Tensor:
   def contiguous(self): return mlops.Contiguous.apply(self)
   def log(self): return mlops.Log.apply(self)
+  def log2(self): return mlops.Log.apply(self)/0.69314718056
(math.log(math.e)/math.log(2)) for readability.
@geohot 0.69314718056 is math.log(2); to change base from math.e, I divided by log(2).
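(A quick check of that change-of-base step: the constant is ln(2), and dividing a natural log by it yields log2:)

import math

print(math.log(2))  # 0.6931471805599453, the magic constant above
x = 1234.5
# ln(x) / ln(2) == log2(x)
assert abs(math.log(x) / 0.69314718056 - math.log2(x)) < 1e-9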
tinygrad/tensor.py (Outdated)
bs, c, py, px = x.shape
return x.reshape(bs, c, py, 1, px, 1).expand(bs, c, py, scale_factor, px, scale_factor).reshape(bs, c, py * scale_factor, px * scale_factor)

@staticmethod
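(For context, the reshape/expand chain above is nearest-neighbor upsampling; a NumPy re-derivation of the same trick, with made-up shapes:)

import numpy as np

# insert singleton axes after each spatial dim, broadcast them to scale_factor,
# then fold them back in: each pixel becomes a scale_factor x scale_factor block
bs, c, py, px, scale_factor = 1, 1, 2, 2, 2
x = np.arange(bs * c * py * px).reshape(bs, c, py, px)
up = np.broadcast_to(x.reshape(bs, c, py, 1, px, 1),
                     (bs, c, py, scale_factor, px, scale_factor))
up = up.reshape(bs, c, py * scale_factor, px * scale_factor)
assert np.array_equal(up, x.repeat(scale_factor, axis=2).repeat(scale_factor, axis=3))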
If things use numpy like this, they don't belong in tensor.py
removed from tensor.py
These are still in tensor.py
Missed it. I've kept interpolate in tensor.py after removing the staticmethod, and moved sort and topk.
Is this ready for me to test? Will run on a 7900XTX and confirm it meets the target.
Yes @geohot, a run on the 7900XTX should take around 3-4 hours.
Testing now, required
Made it to: and got pyopencl._cl.MemoryError: create_buffer failed: MEM_OBJECT_ALLOCATION_FAILURE. 7900XTX with 24GB of VRAM.
So I've been using OPT=1 because of the kernel fusion issue; I usually get 8 sec per image on an RTX 3060 mobile, so I suspect this behaviour is because of OPT=2. Could you please try OPT=1, @geohot?
Pulled, and rerunning with OPT=1.
3%|████▋ | 142/5000 [27:45<15:49:38, 11.73s/it] pyopencl._cl.MemoryError: create_buffer failed: MEM_OBJECT_ALLOCATION_FAILURE
I think this is actually because you are running out of kernel program space; maybe try with the method cache disabled?
At 150 now with the method cache disabled, but this is brutally slow. ETA is over 24 hours.
26 hours later, congrats! Either post your e-mail here or reach out to [email protected] to claim the bounty. It should be faster, but the inference bounty didn't have a speed requirement :)
@geohot thanks! I know this is very slow ATM, and whoever picks this up for training is going to have a very hard time. Sharing some thoughts for that person:
Both of these need some kind of X[y] indexing, so I had a hard time trying to make them work in tinygrad. (1) could be done fully in tinygrad, but it was even slower. I have sent a PayPal payment request to [email protected]; my email is [email protected].
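(For whoever picks this up: one common workaround for X[y]-style gathers in frameworks without fancy indexing is a one-hot matmul, which is correct but slow, matching the experience described above. A NumPy sketch of the idea:)

import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # (3, 2) lookup table
y = np.array([2, 0])                                # row indices to gather
# build a one-hot (len(y), len(X)) matrix from y; matmul then selects the rows
onehot = (y[:, None] == np.arange(X.shape[0])[None, :]).astype(X.dtype)
assert np.array_equal(onehot @ X, X[y])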
So far I've created the base classes based on the reference implementation (thanks @wozeparrot), and I'm able to load the weights @geohot.
https://github.com/mlcommons/training/tree/master/object_detection/pytorch/maskrcnn_benchmark
TODO:
Load weights of the saved model
Load the reference model in torch and verify the same parameters are present in the tinygrad model (see the sketch after this list)
Add a call for each module and verify it works by comparing it with the reference torch call
Test that the model call works end to end
Add inference code
Remove torch functions, lower usage of .numpy()
Run the model on the test dataset; Box AP should be similar
Calculate inference time (s/im)
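(A hypothetical sketch of the parameter-verification step; the checkpoint filename, the "model" key, and the tiny_state dict are assumptions for illustration, not code from this PR:)

import torch

# load the reference checkpoint, then compare parameter names and shapes
# against a {name: tensor} dict built from the tinygrad model
ref_state = torch.load("e2e_mask_rcnn_R_50_FPN_1x.pth", map_location="cpu")["model"]  # assumed layout

def check_params(tiny_state):
  missing = set(ref_state) - set(tiny_state)   # in reference, absent from tinygrad
  extra = set(tiny_state) - set(ref_state)     # in tinygrad, absent from reference
  mismatched = [k for k in set(ref_state) & set(tiny_state)
                if tuple(ref_state[k].shape) != tuple(tiny_state[k].shape)]
  print(f"missing={sorted(missing)} extra={sorted(extra)} shape_mismatch={mismatched}")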