MaskRCNN Inference #884

Merged
merged 72 commits into from
Jun 25, 2023


kunwar31
Contributor

@kunwar31 kunwar31 commented May 31, 2023

So far I've created base classes based on the reference implementation (thanks @wozeparrot), and I'm able to load the weights. @geohot

https://github.com/mlcommons/training/tree/master/object_detection/pytorch/maskrcnn_benchmark

TODO:

  • Load weights of the saved model

  • Load reference model in torch and verify same parameters present in tinygrad model

  • Add call to each module and verify it works by comparing it with reference torch call

    • Add call to ResNetFPN
    • Add call to RPN
    • Add call to RoIHeads
  • Test model call works end to end

    • test call to ResNetFPN
    • test call to RPN
    • test call to RoIHeads
  • Add inference code

  • Remove torch functions, lower usage of .numpy()

  • Run model on test dataset, Box AP should be similar

  • Calculate inference time(s/im)
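The "verify same parameters" item in the list above can be sketched as a comparison of two name-to-shape mappings. This is a minimal illustration, not the check used in this PR; the function `diff_param_names` and the example parameter names and shapes are hypothetical.

```python
def diff_param_names(reference_shapes, ported_shapes):
    """Compare two {parameter_name: shape} mappings.

    Returns (missing, extra, mismatched): names only in the reference,
    names only in the port, and shared names whose shapes differ.
    """
    ref_names, port_names = set(reference_shapes), set(ported_shapes)
    missing = sorted(ref_names - port_names)
    extra = sorted(port_names - ref_names)
    mismatched = sorted(
        name for name in ref_names & port_names
        if tuple(reference_shapes[name]) != tuple(ported_shapes[name])
    )
    return missing, extra, mismatched


# Hypothetical names and shapes, for illustration only.
reference = {
    "backbone.conv1.weight": (64, 3, 7, 7),
    "backbone.fc.weight": (1000, 2048),
}
ported = {"backbone.conv1.weight": (64, 3, 7, 7)}

missing, extra, mismatched = diff_param_names(reference, ported)
print("missing:", missing, "extra:", extra, "mismatched:", mismatched)
```

In practice the two mappings would be built from the reference checkpoint and from the ported model's parameters.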

@kunwar31 kunwar31 marked this pull request as draft May 31, 2023 20:12
@Marcelo5444

I started the same project today, but you are ahead of me. Maybe you need to drop the last fc layer of the backbone, right?

module = make_conv3x3(next_feature, layer_features,
dilation=dilation, stride=1, use_gn=use_gn
)
exec(f"self.{layer_name} = module")
Collaborator

this is kinda cursed, you should be able to change the name during the weight loading process.

Contributor Author

@wozeparrot yes, a lot of things are cursed right now, will be taking this up one by one, while I'm adding the calls
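As a rough sketch of the reviewer's point: an attribute with a computed name can be set with `setattr` rather than `exec`, and checkpoint keys can be remapped while loading. The class `Head`, the helper `rename_keys`, and all layer and key names below are hypothetical; this is not the code that was eventually merged.

```python
class Head:
    def __init__(self, layer_names):
        self.blocks = []
        for layer_name in layer_names:
            module = object()  # stand-in for make_conv3x3(...)
            # Same effect as exec(f"self.{layer_name} = module"),
            # without building and executing a source string.
            setattr(self, layer_name, module)
            self.blocks.append(layer_name)


def rename_keys(state_dict, renames):
    """Return a copy of state_dict with key prefixes rewritten per `renames`."""
    out = {}
    for key, value in state_dict.items():
        for old, new in renames.items():
            if key.startswith(old):
                key = new + key[len(old):]
                break
        out[key] = value
    return out


head = Head(["mask_fcn1", "mask_fcn2"])
print(hasattr(head, "mask_fcn1"))

# Hypothetical checkpoint keys, remapped at load time.
weights = {"mask_fcn1.weight": 1, "mask_fcn2.weight": 2}
print(rename_keys(weights, {"mask_fcn1": "layers.0", "mask_fcn2": "layers.1"}))
```

Renaming at load time lets the model keep ordinary attributes or a list of layers while still accepting the reference checkpoint's key names.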

@kunwar31
Contributor Author

I started the same project today, but you are ahead of me. Maybe you need to drop the last fc layer of the backbone, right?

yes, they can be removed, but I won't be using them anyway in the forward call

@geohot geohot added the bounty locked Bounty is locked to someone label May 31, 2023
@tinyb0t

tinyb0t commented Jun 1, 2023

Changes made in tinygrad/:

------------------------------------------------------------
files                             insertions       deletions
------------------------------------------------------------
tinygrad/tensor.py                         2               1
------------------------------------------------------------
lines added in the tinygrad folder: 1

@kunwar31
Contributor Author

kunwar31 commented Jun 2, 2023

@geohot So there are still some torch functions which need to be removed, but here's an example output
image

@kunwar31
Contributor Author

kunwar31 commented Jun 2, 2023

Reference output for the same image
image

@kunwar31
Contributor Author

kunwar31 commented Jun 2, 2023

I'm aware that the results aren't exactly the same. This is because the resnet block output doesn't exactly match the reference implementation; it matches only to atol=1e-3.
If I use the resnet output from the reference and everything else from my implementation, the results match exactly end to end.
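A tolerance check of the kind described above can be sketched with NumPy. The arrays here are synthetic stand-ins, not outputs of either model, and the helper `max_abs_diff` is a hypothetical name.

```python
import numpy as np


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two arrays."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))


rng = np.random.default_rng(0)
reference = rng.standard_normal((2, 8, 4, 4)).astype(np.float32)
ported = reference + np.float32(5e-4)  # synthetic drift, for illustration only

print("max abs diff:", max_abs_diff(reference, ported))
# rtol=0 makes this a pure absolute-tolerance comparison.
print("atol=1e-3:", np.allclose(ported, reference, rtol=0, atol=1e-3))
print("atol=1e-4:", np.allclose(ported, reference, rtol=0, atol=1e-4))
```

Reporting the maximum absolute difference alongside the pass/fail result makes it easier to see how far a block is from the reference, rather than only whether it clears one threshold.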

@kunwar31
Contributor Author

kunwar31 commented Jun 2, 2023

confidence_threshold=0.6

Bbox outputs from tinygrad
image

Bbox outputs from maskrcnn_benchmark
image

@@ -435,6 +461,7 @@ def dot(self, w:Tensor) -> Tensor:

def contiguous(self): return mlops.Contiguous.apply(self)
def log(self): return mlops.Log.apply(self)
def log2(self): return mlops.Log.apply(self)/0.69314718056
Collaborator

* (math.log(math.e)/math.log(2)) for readability.

Contributor Author

@geohot 0.69314718056 is math.log(2); to change the base from math.e, I divided by the log of 2.
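The change of base discussed here is log2(x) = ln(x) / ln(2). A small standard-library check of both spellings from this thread (dividing by ln 2, and multiplying by log(e)/log(2)); the function names are hypothetical:

```python
import math

LN2 = math.log(2)  # natural log of 2, approximately 0.69314718056


def log2_by_division(x):
    """log2(x) computed as ln(x) / ln(2)."""
    return math.log(x) / LN2


def log2_by_multiplication(x):
    """log2(x) computed as ln(x) * (log(e) / log(2))."""
    return math.log(x) * (math.log(math.e) / math.log(2))


for x in (0.5, 1.0, 8.0, 10.0):
    expected = math.log2(x)
    assert math.isclose(log2_by_division(x), expected, rel_tol=1e-12, abs_tol=1e-12)
    assert math.isclose(log2_by_multiplication(x), expected, rel_tol=1e-12, abs_tol=1e-12)

print(LN2)
```

Both forms compute the same value up to floating-point rounding; the named expression only makes the constant's origin visible.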

tinygrad/tensor.py Outdated
bs, c, py, px = x.shape
return x.reshape(bs, c, py, 1, px, 1).expand(bs, c, py, scale_factor, px, scale_factor).reshape(bs, c, py * scale_factor, px * scale_factor)

@staticmethod
Collaborator

If things use numpy like this, they don't belong in tensor.py

Contributor Author

removed from tensor.py

Collaborator

These are still in tensor.py

Contributor Author

Missed it. I've kept interpolate in tensor.py after removing the staticmethod, and moved sort and topk.
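The reshape/expand/reshape pattern in the snippet under review amounts to nearest-neighbour upsampling. A NumPy sketch of the same shape manipulation follows, using `np.broadcast_to` as a stand-in for the tensor `expand`; the function name `upsample_nearest` is hypothetical.

```python
import numpy as np


def upsample_nearest(x, scale_factor):
    """Nearest-neighbour upsampling of an NCHW array by an integer factor."""
    bs, c, py, px = x.shape
    # Insert singleton axes after H and W, then broadcast them to scale_factor.
    expanded = np.broadcast_to(
        x.reshape(bs, c, py, 1, px, 1),
        (bs, c, py, scale_factor, px, scale_factor),
    )
    # Merging (py, scale_factor) and (px, scale_factor) repeats each pixel.
    return expanded.reshape(bs, c, py * scale_factor, px * scale_factor)


x = np.arange(4, dtype=np.float32).reshape(1, 1, 2, 2)
y = upsample_nearest(x, 2)
print(y[0, 0])

# Cross-check against explicit repetition along H and W.
assert np.array_equal(y, x.repeat(2, axis=2).repeat(2, axis=3))
```

Because it is built only from reshape and broadcast, this formulation needs no indexing or gather.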

@kunwar31 kunwar31 marked this pull request as draft June 20, 2023 17:02
@kunwar31 kunwar31 marked this pull request as ready for review June 21, 2023 11:15
@geohot
Collaborator

geohot commented Jun 21, 2023

Is this ready for me to test? Will run on 7900XTX and confirm it meets the target

@kunwar31
Contributor Author

Is this ready for me to test? Will run on 7900XTX and confirm it meets the target

yes @geohot, a run on the 7900XTX should take around 3-4 hours
GPU=1 MODEL=mrcnn python examples/mlperf/model_eval.py

@geohot
Collaborator

geohot commented Jun 23, 2023

Testing now, required mkdir datasets/COCO but looks like it's running

models/mask_rcnn.py Outdated
@geohot
Collaborator

geohot commented Jun 23, 2023

Made it to:
3%|████ | 136/5000 [1:35:07<56:42:24, 41.97s/it]

and got

pyopencl._cl.MemoryError: create_buffer failed: MEM_OBJECT_ALLOCATION_FAILURE

7900XTX with 24GB of VRAM

@kunwar31
Contributor Author

kunwar31 commented Jun 23, 2023

Made it to: 3%|████ | 136/5000 [1:35:07<56:42:24, 41.97s/it]

and got

pyopencl._cl.MemoryError: create_buffer failed: MEM_OBJECT_ALLOCATION_FAILURE

7900XTX with 24GB of VRAM

I’ve been using OPT=1 because of the kernel fusion issue, and I usually get 8 sec per image on an RTX 3060 mobile. I suspect this behaviour is caused by OPT=2. Could you please try OPT=1, @geohot?

@geohot
Collaborator

geohot commented Jun 23, 2023

Pulled, and rerunning with OPT=1

@geohot
Collaborator

geohot commented Jun 23, 2023

OPT=1 PYTHONPATH="." GPU=1 MODEL=mrcnn python examples/mlperf/model_eval.py

3%|████▋ | 142/5000 [27:45<15:49:38, 11.73s/it]

pyopencl._cl.MemoryError: create_buffer failed: MEM_OBJECT_ALLOCATION_FAILURE

@wozeparrot
Collaborator

I think this is actually because you are running out of kernel program space; maybe try with the method cache disabled?

@geohot
Collaborator

geohot commented Jun 24, 2023

At 150 now with method cache disabled, but this is brutally slow. ETA is over 24 hours.

@geohot
Collaborator

geohot commented Jun 25, 2023

26 hours later, congrats! Either post e-mail here or reach out to [email protected] to claim bounty

It should be faster, but the inference bounty didn't have a speed requirement :)

██████████████████████████████| 5000/5000 [26:39:09<00:00, 19.19s/it]
loading annotations into memory...
Done (t=0.35s)
creating index...      
index created!
Loading and preparing results...
DONE (t=0.54s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=20.64s).
Accumulating evaluation results...
DONE (t=3.75s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.378
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.592
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.411
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.215
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.411
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.499
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.313
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.490
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.514
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.327
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.551
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.651
loading annotations into memory...
Done (t=0.35s)
creating index...
index created!
Loading and preparing results...
DONE (t=1.88s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=23.95s).
Accumulating evaluation results...
DONE (t=3.62s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.342
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.559
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.363
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.155
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.368
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.293
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.448
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.468
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.271
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.505
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.623
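The per-image time in the progress bar above follows from the wall-clock total: 26:39:09 over 5000 images. A quick arithmetic check (the helper name `to_seconds` is hypothetical):

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, secs = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + secs


total_seconds = to_seconds("26:39:09")
per_image = total_seconds / 5000

print(total_seconds)        # 95949
print(round(per_image, 2))  # 19.19
```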

@geohot geohot merged commit 5d3310c into tinygrad:master Jun 25, 2023
@kunwar31
Contributor Author

@geohot thanks! I know this is very slow at the moment, and whoever picks this up for training is going to have a very hard time. Sharing some thoughts for that person:
The two major reasons it's slow are:

  1. Huge gathers (even doing them in numpy is slow because of the data transfer; currently this also blocks the gradient)
  2. topk (this is done in numpy, but the data transfers slow it down)

Both of these need some kind of X[y] instruction, so I had a hard time trying to make them work in tinygrad. The first could be done fully in tinygrad, but it was even slower.
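To make the two operations above concrete, here is a NumPy sketch. It shows one generic way to express a row gather without X[y] indexing (a one-hot mask followed by a matmul) and a simple top-k via sorting. Neither is claimed to be what this PR does; the function names are hypothetical.

```python
import numpy as np


def gather_rows_onehot(x, idx):
    """Select rows x[idx] using a comparison and a matmul instead of indexing."""
    n = x.shape[0]
    # one_hot[i, j] is 1 where j == idx[i]; shape (len(idx), n).
    one_hot = (np.arange(n)[None, :] == idx[:, None]).astype(x.dtype)
    return one_hot @ x


def topk(values, k):
    """Return the k largest values and their indices, in descending order."""
    order = np.argsort(-values, kind="stable")[:k]
    return values[order], order


x = np.arange(12, dtype=np.float32).reshape(4, 3)
idx = np.array([2, 0, 2])
assert np.array_equal(gather_rows_onehot(x, idx), x[idx])

scores = np.array([0.1, 0.9, 0.4, 0.7], dtype=np.float32)
top_values, top_indices = topk(scores, 2)
print(top_values, top_indices)
```

The one-hot form does work proportional to len(idx) * n * row_width instead of len(idx) * row_width, which suggests why an indexing-free gather can get expensive when the gathers are huge.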

I have sent a paypal payment request to [email protected], my email is [email protected]

Labels
bounty locked Bounty is locked to someone

5 participants