Tiles without annotations are not supported #216
Thanks for the issue; we were just discussing this yesterday. I was under the impression that this is not supported in torchvision's RetinaNet. The links you posted are from other libraries. I was trying to find the issue that stated this (it was true in the past), but it looks like there has been some movement? I do not know if it works yet for RetinaNet; I see changes to RCNN. Try adding a dummy CSV with boxes of 0's to see. Please report back and we can automate this if it works. Something like:
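A minimal sketch of such a dummy CSV (assuming DeepForest's standard image_path/xmin/ymin/xmax/ymax/label annotation columns; the file names are hypothetical):

import pandas as pd

# An all-zero box marks a tile that contains no objects.
dummy = pd.DataFrame([{
    "image_path": "empty_tile.png",
    "xmin": 0, "ymin": 0, "xmax": 0, "ymax": 0,
    "label": "Tree",
}])
dummy.to_csv("train_with_empty.csv", index=False)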
Thanks for the fast response :) Sadly, it does not work either. Leads to:
Ya, I've been playing around here too. That one is easy to get around; I'm currently writing a test to see what can be done. I'll update.
On Wed, Jul 14, 2021 at 9:25 AM Nikolai Poliarnyi ***@***.***> wrote:
Thanks for the fast response :)
Sadly it does not work too:
[image: image] <https://user-images.githubusercontent.com/1218605/125657700-2c4b9a9d-ae18-4ea0-8919-7f945de26de2.png>
Leads to:
File "/.../src/detect_trees.py", line 410
trainer.fit(self.m, train_ds)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 460, in fit
self._run(model)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
self.dispatch()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 799, in dispatch
self.accelerator.start_training(self)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
self.training_type_plugin.start_training(trainer)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
self._results = trainer.run_stage()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in run_stage
return self.run_train()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 871, in run_train
self.train_loop.run_training_epoch()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 738, in run_training_batch
self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 434, in optimizer_step
model_ref.optimizer_step(
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1403, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
optimizer.step(closure=lambda_closure, **kwargs)
File "/.../python/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
return func(*args, **kwargs)
File "/.../python/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/.../python/lib/python3.8/site-packages/torch/optim/sgd.py", line 87, in step
loss = closure()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 732, in train_step_and_backward_closure
result = self.training_step_and_backward(
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 823, in training_step_and_backward
result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 290, in training_step
training_step_output = self.trainer.accelerator.training_step(args)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
return self.training_type_plugin.training_step(*args)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 155, in training_step
return self.lightning_module.training_step(*args, **kwargs)
File "/.../python/lib/python3.8/site-packages/deepforest/main.py", line 349, in training_step
loss_dict = self.model.forward(images, targets)
File "/.../python/lib/python3.8/site-packages/torchvision/models/detection/retinanet.py", line 508, in forward
raise ValueError("All bounding boxes should have positive height and width."
ValueError: All bounding boxes should have positive height and width. Found invalid box [0.0, 0.0, 0.0, 0.0] for target at index 0.
It looks like we need to be pinned to a more recent version. Recreating the env; this is the test:
torchvision>=0.9.0 required, weecology#216
I'm going to return to this; my initial reading was that torchvision 0.8.1, which is standard elsewhere, is too old. It looks like this functionality is newer than that.
@PolarNick239 see the empty_frames branch, which is in dev. Tests pass for the model, but not yet for the dataset. Following https://medium.com/jumio/object-detection-tutorial-with-torchvision-82b8f269f6ff: if we want a negative training sample, add a row with 0,0,0,0 for the bounding box, which will convert to a blank torch tensor.
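For illustration, a sketch of the conversion I assume the dataset performs internally (not the actual DeepForest code): an all-zero row becomes an empty (0, 4) box tensor, which recent torchvision detection models accept as a negative sample.

import torch

# Assumed behavior: a single 0,0,0,0 row means "no annotations".
boxes = torch.tensor([[0.0, 0.0, 0.0, 0.0]])
if bool((boxes == 0).all()):
    target = {"boxes": torch.zeros((0, 4), dtype=torch.float32),
              "labels": torch.zeros(0, dtype=torch.int64)}
else:
    target = {"boxes": boxes,
              "labels": torch.ones(len(boxes), dtype=torch.int64)}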
Now it doesn't throw any exceptions, but the loss becomes NaN (in addition to the zero box values, I set the label to 'Tree'):
Please show the annotations file. I believe this is unrelated; did you literally use only blank images? I see the loss falling nicely until the model memorizes all the images. Reduce the learning rate, add more annotations, etc. This looks expected.
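For example, lowering the learning rate might look like this (a sketch; I'm assuming the usual deepforest config keys, which may differ by version):

from deepforest import main

m = main.deepforest()
m.use_release()
# Assumed config keys; check the config docs for your version.
m.config["train"]["lr"] = 0.0001
m.config["train"]["epochs"] = 10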
Maybe a minimal reproducer will help? empty_frames_nan_repro.zip (including the annotations CSV, all images, the training script, and a visualization of annotations on all images). There are only 17 empty tiles out of 129 (note that each tile is augmented 8x: the original and its mirror, each at 4 rotations of 0, 90, 180, and 270 degrees):
Running this example now. I can start by saying that I agree: after one epoch you can see the loss dropping, but then it goes to NaN.
The first question is whether this happens in the absence of the negative samples. If I edit and try again, everything looks fine. If I remove that blank image, I still see the NaN, but more slowly (I almost stopped too early to see it).
Looking here, it suggests this is a formatting error, which is odd because it definitely looks like we are following the prescribed format: DeepForest/deepforest/dataset.py, line 87 in d339212.
@sethhenrymorgan are you getting NaN as well? Can everyone report their deepforest and torchvision versions so we are on the same page.
I'm stopping for the day, but I'm interested in hearing thoughts on this. With the toy example it's not obvious what we should expect. I can confirm that if you reduce the number of empty frames to 2, you don't see NaN, at least on the CPU I'm running; I can try GPU tomorrow. The loss looks normal as it jumps around.
My conclusion so far is that there is nothing 'wrong' with the code, but rather that under this toy example, having too many negative frames is bad for model performance. Especially because those negative frames contain genuine trees: the model is focusing really hard on predicting trees, but then we show it an image with trees and act like there is nothing there. I think this causes the loss to become unstable.
import deepforest
print(deepforest.__version__)
# 1.0.9 - installed from github - https://github.com/weecology/DeepForest/commit/b8998326c53755d017c2dc16cf1b3dfd75380876
import torchvision
print(torchvision.__version__)
# 0.10.0+cu102

It's a pity; currently I am getting good results with additional transfer-learning training even on such small datasets (without empty tiles). So it is possible to easily annotate some small area to specify what kind of objects (e.g. what kind of trees) we need, and then prediction works pretty well (even for cars). But empty tiles are pretty important, because in areas like parks (where no cars were annotated) there are a lot of false-positive detections. Anyway, big thanks for your efforts! I am still very happy that the loss doesn't go crazy when training on cars :)
Here's what I'm working with:
If I reduced my number of negative samples to 1, then I eliminated the NaNs, but the model still made no predictions.
Here's the output from m.evaluate:
But plot twist! When I remove the negative samples, the same thing happens. Is there a chance that something else is wrong with DeepForest version 1.1? Version 0.2.3 had no trouble without negative samples, but it rejected the 0-size boxes with an error.
@sethhenrymorgan since it feels like your problem has nothing to do with negative samples, can you make a separate issue and give the pertinent details? Attach the annotations file and a couple of sample images; let's deal with negative samples here. @PolarNick239 I'm going to continue to chase this today. Let me see if I understand: you are starting from the tree release model, but predicting cars, not trees, even though your label says 'Tree' (that's fine, it's just a name). You get decent results for car prediction, but too many false positives for things which are not cars. You then specified empty images like this one (setting aside that I think there is a car in the top right?), but adding empty images does not reduce your false-positive rate? My sense is this is a machine learning question and not a problem with this code base. I'm interested in the right answer either way, but my gut reaction is that because RetinaNets use focal loss, meaning they focus on the hardest examples, negative samples are either too easy and don't contribute much, or so hard that they cause the loss to become unstable. I'm looking for examples online to anchor my thinking.
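To make the focal-loss intuition concrete, here is the standard modulating factor (the generic formula, not DeepForest code): easy examples are down-weighted by (1 - p_t)^gamma, so confident negatives contribute almost nothing to the loss.

import torch

def focal_weight(p_t, gamma=2.0):
    # Modulating factor from the focal loss: (1 - p_t)^gamma.
    return (1 - p_t) ** gamma

print(focal_weight(torch.tensor(0.99)))  # easy negative: ~1e-4
print(focal_weight(torch.tensor(0.10)))  # hard example:  ~0.81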
Yes, DeepForest is very easy to use and the pre-trained RetinaNet backbone seems to work well for other objects (for cars and even for sea lions :) ). Yes, cars and sea lions are just very strange tree species.
My bad - missed it. Yes, I want to add empty tiles to reduce the false-positive rate (false positives often occur in zones that are not represented in the non-empty tiles, i.e. zones without target objects). Previously this led to exceptions; currently it leads to NaN losses. In fact, it seems possible to work around the problem by blending a random object sample from a random non-empty tile into each empty tile, but this hack has a potential problem: it is not obvious what to do with the hard bounding-box edges (or we would need some kind of object-background segmentation, so that the object is blended into the empty tile without the background content from its original tile). If I find time to try such a hack, I will write about my findings. Yes, I agree that it seems to be a RetinaNet-related problem.
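For concreteness, a rough sketch of that blending hack (hypothetical file names and coordinates; a hard paste, which is exactly where the edge problem appears):

import numpy as np
from PIL import Image

empty = np.array(Image.open("empty_tile.png"))
donor = np.array(Image.open("tile_with_car.png"))

# An annotated box in the donor tile (made-up coordinates).
xmin, ymin, xmax, ymax = 100, 120, 180, 200
crop = donor[ymin:ymax, xmin:xmax]

# Hard paste into the empty tile; the sharp seam around the crop
# is the "hard bounding box edges" problem mentioned above.
empty[50:50 + crop.shape[0], 50:50 + crop.shape[1]] = crop
Image.fromarray(empty).save("blended_tile.png")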
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Reopening: this shouldn't have been closed, since it's definitely an important issue.
We're running into this problem with the Everglades work as well (weecology/EvergladesTools#42). @bw4sz - it sounds like I should try out the empty_frames branch?
Sorry - this has been merged already, I just had a hard time seeing it in the commit logs. |
What was the resolution to this issue? Which commit fixed it, and how?
Unfortunately it's spread out over a period of time, so it's a bit hard to point to a single commit. In concept we had support for this from the beginning, but if I'm remembering correctly it wasn't initially supported by PyTorch (when we made the switch from TensorFlow), and when it became available we still had some bugs to clean up. Here are a few relevant commits: 1d8910b, 4d9593e, 7d6bb14
I believe the codebase works for this use case. The bigger question in my mind is when, if ever, empty frames make a tangible difference. We will continue to work on this in our bird detection work, but I remain a little skeptical.
Describe the bug
Sometimes it is important to have empty tiles without any annotations in the training data (to show the RetinaNet examples of what should not be detected). But this does not seem to be supported in DeepForest.
To Reproduce
Steps to reproduce the behavior:
This leads to
Additional context
I also see this check: DeepForest/deepforest/dataset.py, line 96 in c2b57e2
In my opinion this is an important option, and it seems to be supported in RetinaNets; see: