Bad results - Investigate reason #11
Sadly, so far my MS-COCO results in Keras-FCN have been no good. The loss went negative with full-image segmentation and one-hot encoding, and then I ran out of time to investigate the details.
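For what it's worth, per-pixel categorical cross-entropy, -Σ y·log(p), cannot go negative as long as the targets are non-negative (e.g. valid one-hot vectors) and the predictions lie in (0, 1]. A negative loss therefore usually points at the inputs to the loss rather than at the network itself, for example labels that aren't really one-hot or outputs that aren't normalized probabilities. The plain-Python sketch below checks both conditions per pixel; the helper names are hypothetical, and this is a debugging aid under those assumptions, not the cause identified in this thread.

```python
import math


def is_one_hot(target):
    """True if every entry is 0 or 1 and exactly one entry is 1."""
    return all(v in (0, 1) for v in target) and sum(target) == 1


def is_distribution(pred, tol=1e-4):
    """True if pred looks like softmax output: entries in [0, 1] summing to ~1."""
    return all(0.0 <= p <= 1.0 for p in pred) and abs(sum(pred) - 1.0) <= tol


def mean_cross_entropy(targets, preds, eps=1e-7):
    """Mean of -sum_c y_c * log(p_c) over pixels (lists of per-pixel vectors).

    Predictions are only clipped from below (to avoid log(0)), so values > 1
    are NOT hidden and can make the loss negative, which is what we want to see.
    """
    total = 0.0
    for t, p in zip(targets, preds):
        total += -sum(y * math.log(max(q, eps)) for y, q in zip(t, p))
    return total / len(targets)


def find_bad_pixels(targets, preds):
    """Indices of pixels whose target or prediction violates the assumptions."""
    return [i for i, (t, p) in enumerate(zip(targets, preds))
            if not is_one_hot(t) or not is_distribution(p)]


targets = [[1, 0, 0], [0, 1, 0]]
good = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
bad = [[3.0, 0.5, 0.1], [0.1, 0.8, 0.1]]    # e.g. a missing softmax
print(mean_cross_entropy(targets, good))    # positive, about 0.29
print(mean_cross_entropy(targets, bad))     # negative, about -0.44
print(find_bad_pixels(targets, bad))        # → [0]
```

Running such a check on a few batches straight out of the data generator is a cheap way to rule the labels in or out before looking at the model.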
That's too bad... FCNSS doesn't report performance on MS-COCO, IIRC, but the PSPNet paper mentions some IoU results on page 8 (Table 6), and they seem pretty decent.
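Since the comparison is in terms of IoU: for segmentation this usually means, per class, the intersection over union of the predicted and ground-truth pixel sets, i.e. TP / (TP + FP + FN), averaged over classes to get mean IoU (benchmarks typically accumulate the counts over the whole dataset before dividing). A pure-Python sketch over flat lists of class indices; the function names are illustrative, and treating classes absent from both maps as undefined (and skipping them in the mean) is an assumption, since conventions vary.

```python
def per_class_iou(pred, truth, num_classes):
    """IoU = TP / (TP + FP + FN) per class, from flat lists of class indices.

    Classes that appear in neither pred nor truth get None (IoU undefined).
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(pred, truth):
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # pixel predicted as p but it isn't p
            fn[t] += 1  # pixel of class t that was missed
    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        ious.append(tp[c] / denom if denom else None)
    return ious


def mean_iou(pred, truth, num_classes):
    """Average of the defined per-class IoUs."""
    defined = [iou for iou in per_class_iou(pred, truth, num_classes)
               if iou is not None]
    return sum(defined) / len(defined) if defined else 0.0


truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(round(mean_iou(pred, truth, 3), 4))  # → 0.5
```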
Hi guys,
Not really, sorry. It does converge, but not well enough. In the paper, the encoder is pretrained on ImageNet and the full pipeline is then fine-tuned on Cityscapes, CamVid and SUN RGB-D. I haven't set those datasets up yet, though, so I've only trained the network on MS-COCO (which often gives awful results). I'd like to finish the project at some point, but I've had to move on to other things, so at the moment I unfortunately don't have the resources to do it properly. :(
No worries, I'll pick it up from here and see what the problem is.
There has been a bugfix in DenseNet that solved some problems, so it might work better now!
@ahundt, can you elaborate on how the DenseNet fix applies to enet-keras? It seems as if the main gradient flow and the pooling indices are connected properly, or am I missing something?
@jmtatsch Sorry, my post is totally irrelevant; I must have mixed up tabs in my browser or something.
Hi guys,
@dkorkino The PReLU also seems to be missing compared to https://github.com/e-lab/ENet-training/blob/master/train/models/encoder.lua#L86. Could you maybe publish the converted weights?
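For reference, PReLU is ReLU with a learned slope for negative inputs: f(x) = x for x > 0 and f(x) = a·x otherwise. The sketch below assumes the per-channel variant (one slope per channel, shared across all spatial positions) on a channels-last feature map given as nested lists; whether the linked Torch model uses per-channel or a single shared slope is not stated here, so treat that as an assumption. The function name is illustrative.

```python
def prelu(x, alpha):
    """PReLU on a channels-last feature map given as nested lists [H][W][C].

    alpha holds one (learned) slope per channel, shared across all spatial
    positions; positive values pass through unchanged.
    """
    return [[[v if v > 0 else a * v for v, a in zip(pixel, alpha)]
             for pixel in row]
            for row in x]


feature_map = [[[1.0, -2.0], [-1.0, 3.0]]]   # H=1, W=2, C=2
print(prelu(feature_map, [0.25, 0.5]))        # → [[[1.0, -1.0], [-0.25, 3.0]]]
```

In Keras 2 the corresponding layer is `keras.layers.PReLU`; passing `shared_axes=[1, 2]` shares the slope across height and width, which gives one slope per channel for channels-last tensors.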
You're both right, @ghost and @jmtatsch. I also noticed a division bug in MaxPoolingWithArgmax2D that resulted in unwanted behavior on Python 3, and another one in the data generator. Thanks a lot for the feedback 👍. Sorry for taking so long to tackle the issue; I'd been on vacation until yesterday.
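The exact division bug isn't spelled out here, but a classic Python 2 vs 3 pitfall in this kind of layer is recovering (row, col) from a flattened argmax index: in Python 3, `/` is true division and returns a float, so the integer division `//` (together with `%`) is needed. The pure-Python sketch below, with hypothetical function names, shows non-overlapping max pooling that records flat argmax indices and the matching unpooling step that an ENet-style decoder relies on. It illustrates the technique and is not the repository's code.

```python
def max_pool_with_argmax(x, pool=2):
    """Max pooling (stride == pool) on a 2D list-of-lists image.

    Returns the pooled map and, for each output cell, the flat index
    (row * width + col) of the selected input element. Trailing rows or
    columns that don't fill a whole window are dropped.
    """
    h, w = len(x), len(x[0])
    pooled, argmax = [], []
    for i in range(0, h - h % pool, pool):
        prow, arow = [], []
        for j in range(0, w - w % pool, pool):
            value, flat = max(((x[r][c], r * w + c)
                               for r in range(i, i + pool)
                               for c in range(j, j + pool)),
                              key=lambda t: t[0])
            prow.append(value)
            arow.append(flat)
        pooled.append(prow)
        argmax.append(arow)
    return pooled, argmax


def unpool_with_argmax(pooled, argmax, out_shape):
    """Scatter pooled values back to their argmax positions; zeros elsewhere."""
    h, w = out_shape
    out = [[0] * w for _ in range(h)]
    for prow, arow in zip(pooled, argmax):
        for value, flat in zip(prow, arow):
            # Integer division: in Python 3, flat / w is a float and
            # cannot be used as a list index.
            r, c = flat // w, flat % w
            out[r][c] = value
    return out


image = [[1, 5, 2, 0],
         [3, 4, 8, 1],
         [0, 2, 1, 1],
         [7, 1, 0, 6]]
pooled, argmax = max_pool_with_argmax(image)
print(pooled, argmax)   # → [[5, 8], [7, 6]] [[1, 6], [12, 15]]
print(unpool_with_argmax(pooled, argmax, (4, 4)))
```

With real 4D tensors the index also encodes the channel: TensorFlow's `tf.nn.max_pool_with_argmax`, for instance, documents the flattened index as `(y * width + x) * channels + c` (with the batch folded in as well when `include_batch_in_index=True`), so the decoding has to divide by the channel count too.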
@dkorkino @jmtatsch I am also looking forward to the release of the converted weights.
Does anyone have any idea why it takes so long to train? I'm getting something like 25K seconds per epoch on MS-COCO (~80K samples) on a K40 with 256x256 inputs. That amounts to ~0.3 s per sample, i.e. about 3 samples/s for a full training step; since the forward pass is only part of a training step (roughly a third, if the backward pass costs about twice as much), let's say about 10 fps for the forward pass alone. That's much slower than the reported performance (135.4 fps for 640x360 on a Titan X). I used to think it might be due to preprocessing, but that actually takes only a fraction of the time. Any thoughts?
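The arithmetic behind those figures, as a quick Python check. The factor of 3 between a training step and a forward pass is an assumption (backward ≈ 2x forward), the pixel-rate comparison assumes speed scales roughly linearly with input size, and the GPUs differ (K40 vs Titan X), so the ratios are only ballpark numbers.

```python
SECONDS_PER_EPOCH = 25_000
SAMPLES_PER_EPOCH = 80_000

train_s_per_sample = SECONDS_PER_EPOCH / SAMPLES_PER_EPOCH  # 0.3125 s
train_fps = 1 / train_s_per_sample                          # 3.2 samples/s

# Assumption: backward ~ 2x forward, so a forward pass ~ 1/3 of a training step.
FORWARD_FRACTION = 1 / 3
forward_fps = train_fps / FORWARD_FRACTION                  # ~9.6 fps

# Reported ENet speed: 135.4 fps at 640x360 on a Titan X.
reported_fps = 135.4
reported_px_per_s = reported_fps * 640 * 360
our_px_per_s = forward_fps * 256 * 256

print(f"training step:      {train_s_per_sample:.4f} s/sample ({train_fps:.1f} samples/s)")
print(f"forward only (est): {forward_fps:.1f} fps")
print(f"gap in fps:         {reported_fps / forward_fps:.0f}x")
print(f"gap in pixels/s:    {reported_px_per_s / our_px_per_s:.0f}x")
```

Measured in pixels per second rather than frames, the gap comes out around 50x instead of 14x, because the reported figure is for inputs about 3.5x larger.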
Keras spends a lot of time with an idle GPU. There are quite a few reasons collectively, some of which are discussed in keras-team/keras#6928. Putting the data into TFRecords, using #6928 and using the TF staging areas could help. Alternatively, there are ways to do it in TensorFlow proper, but there aren't great public examples aside from https://www.tensorflow.org/performance/performance_models, which is a bit convoluted.
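The common idea behind staging areas and queue-based input pipelines is to prepare the next batch on the CPU while the GPU is busy with the current one. A stdlib-only sketch of that overlap using a background thread and a bounded `queue.Queue`; `load_batches` and `train_step` are hypothetical stand-ins for real data loading and the GPU step, and this is not the TensorFlow staging-area API.

```python
import queue
import threading
import time


class _Failure:
    """Wraps an exception raised in the loader thread."""
    def __init__(self, exc):
        self.exc = exc


_DONE = object()  # marks the end of the stream


def prefetch(batches, depth=4):
    """Yield items from the iterable `batches`, loading up to `depth` ahead.

    A daemon thread pulls from `batches` into a bounded queue, so data
    loading overlaps with whatever the caller does between items.
    """
    q = queue.Queue(maxsize=depth)

    def producer():
        try:
            for batch in batches:
                q.put(batch)
            q.put(_DONE)
        except Exception as exc:  # hand the error to the consuming thread
            q.put(_Failure(exc))

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item


# Hypothetical stand-ins: ~10 ms of "loading" and ~10 ms of "GPU work" per batch.
def load_batches(n):
    for i in range(n):
        time.sleep(0.01)
        yield i


def train_step(batch):
    time.sleep(0.01)


start = time.perf_counter()
seen = []
for batch in prefetch(load_batches(20)):
    train_step(batch)
    seen.append(batch)
print(seen == list(range(20)), f"{time.perf_counter() - start:.2f} s")
```

Because loading and the training step overlap, the loop should take closer to 0.2 s than the ~0.4 s a strictly serial loop would need, although exact timings depend on the machine.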
That's a bummer to the extent that it's true; I'd rather it were 100% my own mistake. There's definitely room for improvement in my implementation (training is still running, but judging by how the loss is progressing, I don't expect the results to be much better than the current ones). However, speed is an issue that hinders prototyping and evaluation, especially when this network takes more than 10x as long to train as it should, and I'm not sure what I could do to fix it. I've monitored GPU utilization, though, and it's not that low; maybe that's not always such a big deal? I'll check out the available solutions when I find some time, thanks @ahundt.
It will definitely vary a lot by use case and by your physical hardware. For example, if you've got a Titan X but no very fast SSD, I don't think it will be feasible to train at 135 fps. Wouldn't that figure most likely come from 8x Titan X devices?
Hmm, I don't think they used multiple GPUs to get that number, because the authors report 10x-20x better performance than SegNet, and my results with SegNet are comparable to the ones they mention.
(In reply to Andrew Hundt's comment of Aug 26, 2017, 01:38.)
@jmtatsch @ColdCodeCool @ghost @ahundt
Any questions/comments/criticism are welcome as always :)
I haven't tried to train the network yet, but I'll let you know how it goes when I do.
@PavlosMelissinos Hey, I was looking through your latest version, and perhaps I misunderstood what I read, but have you considered changing your loss function when training from scratch? Something like these may be necessary for segmentation:
The main problem is that it doesn't work well enough even with the pretrained weights. However, crossentropy without bg seems interesting and it might be what I need, thanks. I'll check it out!
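One way to read "crossentropy without bg" is a per-pixel categorical cross-entropy in which pixels labeled as background (index 0 in this setup) are masked out and the loss is averaged over the remaining pixels only. Presumably the motivation is that background dominates the pixel count in MS-COCO, so a plain per-pixel loss can be minimized largely by predicting background. A pure-Python sketch of that masking; the function name is hypothetical and this is an interpretation, not the specific loss linked above.

```python
import math


def crossentropy_without_background(targets, preds, background=0, eps=1e-7):
    """Mean categorical cross-entropy over non-background pixels.

    targets: per-pixel one-hot vectors; preds: per-pixel probability vectors.
    Pixels whose one-hot target selects `background` are ignored.
    Returns 0.0 if every pixel is background.
    """
    total, count = 0.0, 0
    for t, p in zip(targets, preds):
        if t[background] == 1:
            continue  # background pixel: excluded from the loss
        total += -sum(y * math.log(min(max(q, eps), 1.0)) for y, q in zip(t, p))
        count += 1
    return total / count if count else 0.0


targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
preds = [[0.9, 0.05, 0.05], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
print(crossentropy_without_background(targets, preds))  # averages over the 2 non-bg pixels
```

A Keras version would compute the ordinary per-pixel loss tensor and multiply it by the same non-background mask before averaging over the masked pixel count.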
I added some segmentation metrics and losses: |
This uses the official MS-COCO evaluation script.
Setup: the full image is the input, and each pixel is classified with a one-hot vector of size 81 (indices 0 to 80 inclusive) whose indices correspond to the actual category IDs in MS-COCO. More specifically, index 0 is background, ..., index 12 corresponds to class ID 13 (stop sign), ..., and index 80 is in fact class 90 (toothbrush). The output is the full image, not a crop. A script then separates the pixels of each detected object. No classes were used in the evalCOCO.py script (useCats = False).
These are really bad scores, and at the moment I have no idea why. I'll push the changes soon.
Which script do you use for evaluation, @ahundt? If you have a working version, maybe I should just replace mine with it. Does it work for MS-COCO?
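The mapping between the 81 network indices and MS-COCO category IDs in the setup described in the previous comment can be sketched as follows. The 80 detection categories use IDs 1-90 with ten unused IDs; the hardcoded set below matches the standard annotation files, but in practice it's safer to read the IDs from the annotations with pycocotools' `COCO.getCatIds()`. All names here are hypothetical.

```python
# Category IDs in 1..90 with no category in the MS-COCO detection annotations.
UNUSED_COCO_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}
COCO_CAT_IDS = [cid for cid in range(1, 91) if cid not in UNUSED_COCO_IDS]

NUM_CLASSES = len(COCO_CAT_IDS) + 1  # 81: index 0 is background

# index -> category ID (background keeps 0), and the inverse mapping
INDEX_TO_CAT_ID = [0] + COCO_CAT_IDS
CAT_ID_TO_INDEX = {cid: idx for idx, cid in enumerate(INDEX_TO_CAT_ID)}


def encode_pixel(cat_id):
    """One-hot vector of length NUM_CLASSES for a pixel's category ID (0 = bg)."""
    vec = [0] * NUM_CLASSES
    vec[CAT_ID_TO_INDEX[cat_id]] = 1
    return vec


def decode_pixel(scores):
    """Category ID of the highest-scoring index in a per-pixel score vector."""
    best = max(range(NUM_CLASSES), key=lambda i: scores[i])
    return INDEX_TO_CAT_ID[best]


print(NUM_CLASSES, INDEX_TO_CAT_ID[12], INDEX_TO_CAT_ID[80])  # → 81 13 90
```

Decoding back to the original category IDs before writing results matters when the evaluation is run with `useCats = True`, since the official script matches on those IDs rather than on the network's indices.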