Tiles without annotations are not supported #216

Closed
PolarNick239 opened this issue Jul 14, 2021 · 24 comments

@PolarNick239
Contributor

Describe the bug
Sometimes it is important to include empty tiles without any annotations in the training data (to show the RetinaNet what should not be detected). But this does not seem to be supported in DeepForest.

To Reproduce
Steps to reproduce the behavior:

import os

import cv2
import numpy as np
import pandas as pd
from pytorch_lightning import Trainer

all_annotations = pd.DataFrame(columns=['image_path', 'xmin', 'ymin', 'xmax', 'ymax', 'label'])

# Write an all-white 400x400 tile to disk
empty_tile_name = "empty_tile.jpg"
empty_tile = np.zeros((400, 400, 3), np.uint8)
empty_tile[:, :, :] = 255
cv2.imwrite(empty_tile_name, empty_tile)

# Add a row for the empty tile with no box coordinates and no label
all_annotations = all_annotations.append({'image_path': empty_tile_name, 'xmin': '', 'ymin': '', 'xmax': '', 'ymax': '', 'label': ''}, ignore_index=True)

...

all_annotations.to_csv(annotations_file, header=True, index=False)

# m is a deepforest model created earlier (definition elided above)
trainer = Trainer(max_epochs=20, gpus=1, auto_select_gpus=True)
train_ds = m.load_dataset(annotations_file, root_dir=os.path.dirname(annotations_file))
trainer.fit(m, train_ds)

This leads to

File "/.../python/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
  data = fetcher.fetch(index)
File "/.../python/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
  data = [self.dataset[idx] for idx in possibly_batched_index]
File "/.../python/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
  data = [self.dataset[idx] for idx in possibly_batched_index]
File "/.../python/lib/python3.8/site-packages/deepforest/dataset.py", line 81, in __getitem__
  targets["labels"] = image_annotations.label.apply(
File "/.../python/lib/python3.8/site-packages/pandas/core/series.py", line 4356, in apply
  return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File "/.../python/lib/python3.8/site-packages/pandas/core/apply.py", line 1036, in apply
  return self.apply_standard()
File "/.../python/lib/python3.8/site-packages/pandas/core/apply.py", line 1092, in apply_standard
  mapped = lib.map_infer(
File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
File "/.../python/lib/python3.8/site-packages/deepforest/dataset.py", line 82, in <lambda>
   lambda x: self.label_dict[x]).values.astype(int)
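
For illustration (my reading of the traceback, not verified): the empty label '' on the blank row presumably has no entry in label_dict, so the lookup inside the apply() fails, roughly:

label_dict = {"Tree": 0}  # default single-class mapping
label_dict[""]            # KeyError: the empty label is not a known class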

Additional context

While I see there is also this check in the dataset code:

raise ValueError("Blank annotations are not allowed in retinanets. Check data augmentation for image {} with shape {}, no overlapping boxes found".format(self.image_names[idx], image.shape))

In my opinion this is an important option, and it seems to be supported in RetinaNets - see:

@bw4sz
Collaborator

bw4sz commented Jul 14, 2021

Thanks for the issue, we were just discussing this yesterday. I was under the impression that this is not supported in the torchvision RetinaNet. The links you posted are from other libraries. I was trying to find the issue that stated this (it was true in the past), but it looks like there has been some movement?
pytorch/vision#1598
pytorch/vision#1911
pytorch/vision#3032

I do not know if it works yet for RetinaNet; I see changes to RCNN. Try adding a dummy CSV row with boxes of 0's to see. Please report back and we can automate this if it works. Something like:

image_path, xmin, ymin, xmax, ymax, label
img.png, 0,0,0,0,"Tree"

@PolarNick239
Contributor Author

Thanks for the fast response :)

Sadly, that does not work either. It leads to:

File "/.../src/detect_trees.py", line 410
  trainer.fit(self.m, train_ds)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 460, in fit
  self._run(model)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
  self.dispatch()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 799, in dispatch
  self.accelerator.start_training(self)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
  self.training_type_plugin.start_training(trainer)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
  self._results = trainer.run_stage()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in run_stage
  return self.run_train()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 871, in run_train
  self.train_loop.run_training_epoch()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
  batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 738, in run_training_batch
  self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 434, in optimizer_step
  model_ref.optimizer_step(
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1403, in optimizer_step
  optimizer.step(closure=optimizer_closure)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
  self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
  trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
  self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
  self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
  optimizer.step(closure=lambda_closure, **kwargs)
File "/.../python/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
  return func(*args, **kwargs)
File "/.../python/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
  return func(*args, **kwargs)
File "/.../python/lib/python3.8/site-packages/torch/optim/sgd.py", line 87, in step
  loss = closure()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 732, in train_step_and_backward_closure
  result = self.training_step_and_backward(
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 823, in training_step_and_backward
  result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 290, in training_step
  training_step_output = self.trainer.accelerator.training_step(args)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
  return self.training_type_plugin.training_step(*args)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 155, in training_step
  return self.lightning_module.training_step(*args, **kwargs)
File "/.../python/lib/python3.8/site-packages/deepforest/main.py", line 349, in training_step
  loss_dict = self.model.forward(images, targets)
File "/.../python/lib/python3.8/site-packages/torchvision/models/detection/retinanet.py", line 508, in forward
  raise ValueError("All bounding boxes should have positive height and width."
ValueError: All bounding boxes should have positive height and width. Found invalid box [0.0, 0.0, 0.0, 0.0] for target at index 0.

@bw4sz
Collaborator

bw4sz commented Jul 14, 2021 via email

@bw4sz
Collaborator

bw4sz commented Jul 14, 2021

It looks like we need to pin a more recent torchvision version. Recreating the env now; this is the relevant test:

# Empty tester from https://github.com/datumbox/vision/blob/06ebee1a9f10c76d8ac5768fd578362dd5ace6e9/test/test_models_detection_negative_samples.py#L14
import torch
import torchvision

def _make_empty_sample():
    # One random image and a target with zero boxes (a negative sample)
    images = [torch.rand((3, 100, 100), dtype=torch.float32)]
    boxes = torch.zeros((0, 4), dtype=torch.float32)
    negative_target = {"boxes": boxes,
                       "labels": torch.zeros(0, dtype=torch.int64),
                       "image_id": 4,
                       "area": (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]),
                       "iscrowd": torch.zeros((0,), dtype=torch.int64)}

    targets = [negative_target]
    return images, targets

def test_forward_negative_sample_retinanet():
    model = torchvision.models.detection.retinanet_resnet50_fpn(
        num_classes=2, min_size=100, max_size=100)

    images, targets = _make_empty_sample()
    loss_dict = model(images, targets)

    assert loss_dict["bbox_regression"] == torch.tensor(0.)

@bw4sz
Collaborator

bw4sz commented Jul 19, 2021

I'm going to return to this. My initial reading was that torchvision 0.8.1, which is standard elsewhere, is too old; it looks like this functionality is newer than that.

@bw4sz
Collaborator

bw4sz commented Jul 23, 2021

@PolarNick239 see the empty_frames branch, which is in dev. Tests pass for the model, but not yet for the dataset.

following https://medium.com/jumio/object-detection-tutorial-with-torchvision-82b8f269f6ff

If we want a negative training sample, add a row with 0,0,0,0 for the bounding box, which will be converted to an empty torch tensor:

boxes = torch.zeros((0, 4), dtype=torch.float32)
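
Roughly, a hypothetical sketch of that conversion (not the actual deepforest/dataset.py code, just the idea):

import torch

def annotations_to_target(image_annotations, label_dict):
    # image_annotations: the DataFrame rows for one image, in DeepForest CSV format
    boxes = image_annotations[["xmin", "ymin", "xmax", "ymax"]].values.astype("float32")
    if (boxes == 0).all():
        # Negative sample: zero-length boxes and labels, the shapes RetinaNet expects
        return {"boxes": torch.zeros((0, 4), dtype=torch.float32),
                "labels": torch.zeros(0, dtype=torch.int64)}
    labels = image_annotations.label.apply(lambda x: label_dict[x]).values.astype("int64")
    return {"boxes": torch.from_numpy(boxes), "labels": torch.from_numpy(labels)}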

@PolarNick239
Contributor Author

PolarNick239 commented Jul 26, 2021

Now it doesn't throw any exceptions, but the loss becomes NaN (in addition to the zero coordinates I set the label to 'Tree'):

Training:   0%|          | 0/129 [00:00<?, ?it/s]
Epoch 0:   0%|          | 0/129 [00:00<?, ?it/s] 
Epoch 0:   1%|          | 1/129 [00:00<00:33,  3.79it/s]
Epoch 0:   1%|          | 1/129 [00:00<00:33,  3.78it/s, loss=0.243]
Epoch 0:   2%|1         | 2/129 [00:00<00:22,  5.56it/s, loss=1.16] 
Epoch 0:   2%|2         | 3/129 [00:00<00:19,  6.63it/s, loss=1.16]
Epoch 0:   2%|2         | 3/129 [00:00<00:19,  6.62it/s, loss=1.49]
Epoch 0:   3%|3         | 4/129 [00:00<00:16,  7.42it/s, loss=1.62]
Epoch 0:   4%|3         | 5/129 [00:00<00:15,  8.06it/s, loss=1.62]
Epoch 0:   4%|3         | 5/129 [00:00<00:15,  8.06it/s, loss=1.64]
Epoch 0:   5%|4         | 6/129 [00:00<00:14,  8.57it/s, loss=1.61]
Epoch 0:   5%|5         | 7/129 [00:00<00:13,  8.93it/s, loss=1.61]
Epoch 0:   5%|5         | 7/129 [00:00<00:13,  8.93it/s, loss=1.59]
Epoch 0:   6%|6         | 8/129 [00:00<00:13,  9.25it/s, loss=1.52]
Epoch 0:   7%|6         | 9/129 [00:00<00:12,  9.46it/s, loss=1.52]
Epoch 0:   7%|6         | 9/129 [00:00<00:12,  9.46it/s, loss=1.47]
Epoch 0:   8%|7         | 10/129 [00:01<00:12,  9.68it/s, loss=1.42]
Epoch 0:   9%|8         | 11/129 [00:01<00:11,  9.86it/s, loss=1.42]
Epoch 0:   9%|8         | 11/129 [00:01<00:11,  9.86it/s, loss=1.37]
Epoch 0:   9%|9         | 12/129 [00:01<00:11, 10.03it/s, loss=1.33]
Epoch 0:  10%|#         | 13/129 [00:01<00:11, 10.16it/s, loss=1.33]
Epoch 0:  10%|#         | 13/129 [00:01<00:11, 10.15it/s, loss=1.33]
Epoch 0:  11%|#         | 14/129 [00:01<00:11, 10.29it/s, loss=1.29]
Epoch 0:  12%|#1        | 15/129 [00:01<00:10, 10.41it/s, loss=1.29]
Epoch 0:  12%|#1        | 15/129 [00:01<00:10, 10.41it/s, loss=1.29]
Epoch 0:  12%|#2        | 16/129 [00:01<00:10, 10.52it/s, loss=1.29]
Epoch 0:  13%|#3        | 17/129 [00:01<00:10, 10.61it/s, loss=1.29]
Epoch 0:  13%|#3        | 17/129 [00:01<00:10, 10.61it/s, loss=3.07]
Epoch 0:  14%|#3        | 18/129 [00:01<00:10, 10.71it/s, loss=8.96]
Epoch 0:  15%|#4        | 19/129 [00:01<00:10, 10.78it/s, loss=8.96]
Epoch 0:  15%|#4        | 19/129 [00:01<00:10, 10.78it/s, loss=147] 
Epoch 0:  16%|#5        | 20/129 [00:01<00:10, 10.85it/s, loss=9.94e+29]
Epoch 0:  16%|#6        | 21/129 [00:01<00:09, 10.92it/s, loss=9.94e+29]
Epoch 0:  16%|#6        | 21/129 [00:01<00:09, 10.91it/s, loss=nan]     
...
NaNs everywhere

@bw4sz
Collaborator

bw4sz commented Jul 26, 2021

Please show the annotations file. I believe this is unrelated. Did you literally use only blank images? I see the loss falling nicely until the model memorizes all the images. Reduce the learning rate, add more annotations, etc. This looks expected.

@PolarNick239
Contributor Author

Maybe a minimal reproducer will help? empty_frames_nan_repro.zip (it includes the annotations CSV, all images, the training script, and a visualization of the annotations on all images).

There are only 17 empty tiles out of 129 (note that each tile is augmented x8: the original and its mirrored copy, each at 4 rotations: 0, 90, 180, 270).

@bw4sz
Collaborator

bw4sz commented Aug 16, 2021

Running this example now. I can start by saying that I agree: after 1 epoch you can see the loss dropping, but then it goes NaN.

root_dir = "/Users/benweinstein/Downloads/empty_frames_nan_repro/empty_frames_nan_repro/"
annotations_file = root_dir + "annotations_no_empty.csv"

from deepforest import main
from pytorch_lightning import Trainer

m = main.deepforest()
m.use_release()

trainer = Trainer(max_epochs=2, gpus=0)
train_ds = m.load_dataset(annotations_file, root_dir=root_dir)
trainer.fit(m, train_ds)

The first question is whether this happens in the absence of the negative samples. If I edit the annotations to drop them:

annotations_no_empty.csv

and try again, everything looks fine.

If I only remove that blank image:

annotations_no_blank.csv

I still see the NaN, but more slowly (I almost stopped too early to see it).
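
For reference, a hedged sketch of how such a filtered annotations file could be produced from the repro CSV (annotations_file and root_dir as in the snippet above; this assumes negative rows are exactly the all-zero ones, and the output name annotations_filtered.csv is just an example):

import pandas as pd

df = pd.read_csv(annotations_file)
is_negative = (df[["xmin", "ymin", "xmax", "ymax"]].astype(float) == 0).all(axis=1)
df[~is_negative].to_csv(root_dir + "annotations_filtered.csv", index=False)
print(int(is_negative.sum()), "negative rows,", int((~is_negative).sum()), "positive rows")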

@bw4sz
Collaborator

bw4sz commented Aug 16, 2021

Looking here, it suggests this is a formatting error, which is odd because it definitely looks like we are following the prescribed format.

pytorch/vision#2144

boxes = torch.zeros((0, 4), dtype=torch.float32)

@sethhenrymorgan are you getting NaN as well?

Can everyone report their deepforest and torchvision versions so we are on the same page?

from deepforest import __version__
print(__version__)
1.1.0
from torchvision import __version__
print(__version__)
0.10.0

@bw4sz
Collaborator

bw4sz commented Aug 16, 2021

I'm stopping for the day, but I'm interested in hearing thoughts on this. With the toy example it's not obvious what we should expect. I can confirm that if you reduce the number of empty frames to 2, you don't see NaN, at least on the CPU I'm running on; I can try a GPU tomorrow. The loss looks normal as it jumps around.

Reading config file: deepforest_config.yml
Model from DeepForest release https://github.com/weecology/DeepForest/releases/tag/1.0.0 was already downloaded. Loading model from file.
Loading pre-built model: https://github.com/weecology/DeepForest/releases/tag/1.0.0
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name  | Type      | Params
------------------------------------
0 | model | RetinaNet | 32.1 M
------------------------------------
31.9 M    Trainable params
222 K     Non-trainable params
32.1 M    Total params
128.592   Total estimated model params size (MB)

Validation sanity check: 0it [00:00, ?it/s]/Users/benweinstein/.conda/envs/DeepForest/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:378: UserWarning: One of given dataloaders is None and it will be skipped.
  rank_zero_warn("One of given dataloaders is None and it will be skipped.")

                                           
/Users/benweinstein/.conda/envs/DeepForest/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:105: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(

Training: -1it [00:00, ?it/s]
Training:   0%|          | 0/128 [00:00<00:00, 11915.64it/s]
Epoch 0:   0%|          | 0/128 [00:00<00:00, 2416.07it/s]  /Users/benweinstein/.conda/envs/DeepForest/lib/python3.9/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  ../c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)

Epoch 0:   1%|          | 1/128 [00:09<10:18,  4.87s/it]  
Epoch 0:   1%|          | 1/128 [00:09<10:18,  4.87s/it, loss=1.62, v_num=15]
Epoch 0:   2%|▏         | 2/128 [00:19<13:35,  6.47s/it, loss=1.62, v_num=15]
Epoch 0:   2%|▏         | 2/128 [00:19<13:35,  6.47s/it, loss=1.57, v_num=15]
Epoch 0:   2%|▏         | 3/128 [00:28<14:57,  7.18s/it, loss=1.57, v_num=15]
Epoch 0:   2%|▏         | 3/128 [00:28<14:57,  7.18s/it, loss=1.59, v_num=15]
Epoch 0:   3%|▎         | 4/128 [00:38<15:56,  7.72s/it, loss=1.59, v_num=15]
Epoch 0:   3%|▎         | 4/128 [00:38<15:56,  7.72s/it, loss=1.47, v_num=15]
Epoch 0:   4%|▍         | 5/128 [00:48<16:24,  8.00s/it, loss=1.47, v_num=15]
Epoch 0:   4%|▍         | 5/128 [00:48<16:24,  8.00s/it, loss=1.36, v_num=15]
Epoch 0:   5%|▍         | 6/128 [00:57<16:44,  8.23s/it, loss=1.36, v_num=15]
Epoch 0:   5%|▍         | 6/128 [00:57<16:44,  8.23s/it, loss=1.3, v_num=15] 
Epoch 0:   5%|▌         | 7/128 [01:07<16:55,  8.39s/it, loss=1.3, v_num=15]
Epoch 0:   5%|▌         | 7/128 [01:07<16:55,  8.39s/it, loss=1.23, v_num=15]
Epoch 0:   6%|▋         | 8/128 [01:16<17:05,  8.55s/it, loss=1.23, v_num=15]
Epoch 0:   6%|▋         | 8/128 [01:16<17:05,  8.55s/it, loss=1.19, v_num=15]
Epoch 0:   7%|▋         | 9/128 [01:26<17:13,  8.68s/it, loss=1.19, v_num=15]
Epoch 0:   7%|▋         | 9/128 [01:26<17:13,  8.68s/it, loss=1.19, v_num=15]
Epoch 0:   8%|▊         | 10/128 [01:37<17:27,  8.88s/it, loss=1.19, v_num=15]
Epoch 0:   8%|▊         | 10/128 [01:37<17:27,  8.88s/it, loss=1.18, v_num=15]
Epoch 0:   9%|▊         | 11/128 [01:47<17:26,  8.95s/it, loss=1.18, v_num=15]
Epoch 0:   9%|▊         | 11/128 [01:47<17:26,  8.95s/it, loss=1.14, v_num=15]
Epoch 0:   9%|▉         | 12/128 [01:57<17:24,  9.00s/it, loss=1.14, v_num=15]
Epoch 0:   9%|▉         | 12/128 [01:57<17:24,  9.00s/it, loss=1.1, v_num=15] 
Epoch 0:  10%|█         | 13/128 [02:06<17:21,  9.05s/it, loss=1.1, v_num=15]
Epoch 0:  10%|█         | 13/128 [02:06<17:21,  9.05s/it, loss=1.07, v_num=15]
Epoch 0:  11%|█         | 14/128 [02:16<17:17,  9.10s/it, loss=1.07, v_num=15]
Epoch 0:  11%|█         | 14/128 [02:16<17:17,  9.10s/it, loss=1.05, v_num=15]
Epoch 0:  12%|█▏        | 15/128 [02:25<17:10,  9.12s/it, loss=1.05, v_num=15]
Epoch 0:  12%|█▏        | 15/128 [02:25<17:10,  9.12s/it, loss=1.03, v_num=15]
Epoch 0:  12%|█▎        | 16/128 [02:35<17:05,  9.15s/it, loss=1.03, v_num=15]
Epoch 0:  12%|█▎        | 16/128 [02:35<17:05,  9.15s/it, loss=1.01, v_num=15]
Epoch 0:  13%|█▎        | 17/128 [02:45<16:59,  9.18s/it, loss=1.01, v_num=15]
Epoch 0:  13%|█▎        | 17/128 [02:45<16:59,  9.18s/it, loss=0.995, v_num=15]
Epoch 0:  14%|█▍        | 18/128 [02:56<17:00,  9.27s/it, loss=0.995, v_num=15]
Epoch 0:  14%|█▍        | 18/128 [02:56<17:00,  9.27s/it, loss=0.983, v_num=15]
Epoch 0:  15%|█▍        | 19/128 [03:06<16:58,  9.35s/it, loss=0.983, v_num=15]
Epoch 0:  15%|█▍        | 19/128 [03:06<16:58,  9.35s/it, loss=0.972, v_num=15]
Epoch 0:  16%|█▌        | 20/128 [03:17<16:56,  9.41s/it, loss=0.972, v_num=15]
Epoch 0:  16%|█▌        | 20/128 [03:17<16:56,  9.41s/it, loss=1.43, v_num=15] 
Epoch 0:  16%|█▋        | 21/128 [03:27<16:51,  9.45s/it, loss=1.43, v_num=15]
Epoch 0:  16%|█▋        | 21/128 [03:27<16:51,  9.45s/it, loss=1.63, v_num=15]
Epoch 0:  17%|█▋        | 22/128 [03:37<16:41,  9.45s/it, loss=1.63, v_num=15]
Epoch 0:  17%|█▋        | 22/128 [03:37<16:41,  9.45s/it, loss=2.23, v_num=15]
Epoch 0:  18%|█▊        | 23/128 [03:47<16:34,  9.47s/it, loss=2.23, v_num=15]
Epoch 0:  18%|█▊        | 23/128 [03:47<16:34,  9.47s/it, loss=2.28, v_num=15]
Epoch 0:  19%|█▉        | 24/128 [03:56<16:24,  9.47s/it, loss=2.28, v_num=15]
Epoch 0:  19%|█▉        | 24/128 [03:56<16:24,  9.47s/it, loss=2.32, v_num=15]
Epoch 0:  20%|█▉        | 25/128 [04:06<16:15,  9.47s/it, loss=2.32, v_num=15]
Epoch 0:  20%|█▉        | 25/128 [04:06<16:15,  9.47s/it, loss=2.37, v_num=15]
Epoch 0:  20%|██        | 26/128 [04:15<16:06,  9.48s/it, loss=2.37, v_num=15]
Epoch 0:  20%|██        | 26/128 [04:15<16:06,  9.48s/it, loss=2.4, v_num=15] 
Epoch 0:  21%|██        | 27/128 [04:25<15:57,  9.48s/it, loss=2.4, v_num=15]
Epoch 0:  21%|██        | 27/128 [04:25<15:57,  9.48s/it, loss=2.4, v_num=15]
Epoch 0:  22%|██▏       | 28/128 [04:34<15:47,  9.48s/it, loss=2.4, v_num=15]
Epoch 0:  22%|██▏       | 28/128 [04:34<15:47,  9.48s/it, loss=2.43, v_num=15]
Epoch 0:  23%|██▎       | 29/128 [04:44<15:38,  9.48s/it, loss=2.43, v_num=15]
Epoch 0:  23%|██▎       | 29/128 [04:44<15:38,  9.48s/it, loss=2.45, v_num=15]
Epoch 0:  23%|██▎       | 30/128 [04:54<15:29,  9.48s/it, loss=2.45, v_num=15]
Epoch 0:  23%|██▎       | 30/128 [04:54<15:29,  9.48s/it, loss=2.42, v_num=15]
Epoch 0:  24%|██▍       | 31/128 [05:03<15:20,  9.49s/it, loss=2.42, v_num=15]
Epoch 0:  24%|██▍       | 31/128 [05:03<15:20,  9.49s/it, loss=2.45, v_num=15]
Epoch 0:  25%|██▌       | 32/128 [05:13<15:12,  9.51s/it, loss=2.45, v_num=15]
Epoch 0:  25%|██▌       | 32/128 [05:13<15:12,  9.51s/it, loss=2.51, v_num=15]
Epoch 0:  26%|██▌       | 33/128 [05:23<15:03,  9.51s/it, loss=2.51, v_num=15]
Epoch 0:  26%|██▌       | 33/128 [05:23<15:03,  9.51s/it, loss=2.56, v_num=15]
Epoch 0:  27%|██▋       | 34/128 [05:32<14:53,  9.51s/it, loss=2.56, v_num=15]
Epoch 0:  27%|██▋       | 34/128 [05:32<14:53,  9.51s/it, loss=2.6, v_num=15] 
Epoch 0:  27%|██▋       | 35/128 [05:42<14:45,  9.53s/it, loss=2.6, v_num=15]
Epoch 0:  27%|██▋       | 35/128 [05:42<14:45,  9.53s/it, loss=2.64, v_num=15]
Epoch 0:  28%|██▊       | 36/128 [05:52<14:37,  9.54s/it, loss=2.64, v_num=15]
Epoch 0:  28%|██▊       | 36/128 [05:52<14:37,  9.54s/it, loss=2.69, v_num=15]
Epoch 0:  29%|██▉       | 37/128 [06:02<14:28,  9.55s/it, loss=2.69, v_num=15]
Epoch 0:  29%|██▉       | 37/128 [06:02<14:28,  9.55s/it, loss=2.73, v_num=15]
Epoch 0:  30%|██▉       | 38/128 [06:12<14:20,  9.56s/it, loss=2.73, v_num=15]
Epoch 0:  30%|██▉       | 38/128 [06:12<14:20,  9.56s/it, loss=2.77, v_num=15]
Epoch 0:  30%|███       | 39/128 [06:22<14:11,  9.57s/it, loss=2.77, v_num=15]
Epoch 0:  30%|███       | 39/128 [06:22<14:11,  9.57s/it, loss=2.8, v_num=15] 
Epoch 0:  31%|███▏      | 40/128 [06:32<14:02,  9.58s/it, loss=2.8, v_num=15]
Epoch 0:  31%|███▏      | 40/128 [06:32<14:02,  9.58s/it, loss=2.37, v_num=15]
Epoch 0:  32%|███▏      | 41/128 [06:42<13:54,  9.59s/it, loss=2.37, v_num=15]
Epoch 0:  32%|███▏      | 41/128 [06:42<13:54,  9.59s/it, loss=2.4, v_num=15] 
Epoch 0:  33%|███▎      | 42/128 [06:53<13:46,  9.61s/it, loss=2.4, v_num=15]
Epoch 0:  33%|███▎      | 42/128 [06:53<13:46,  9.61s/it, loss=1.81, v_num=15]
Epoch 0:  34%|███▎      | 43/128 [07:02<13:36,  9.61s/it, loss=1.81, v_num=15]
Epoch 0:  34%|███▎      | 43/128 [07:02<13:36,  9.61s/it, loss=1.8, v_num=15] 
Epoch 0:  34%|███▍      | 44/128 [07:12<13:26,  9.60s/it, loss=1.8, v_num=15]
Epoch 0:  34%|███▍      | 44/128 [07:12<13:26,  9.60s/it, loss=1.79, v_num=15]
Epoch 0:  35%|███▌      | 45/128 [07:21<13:16,  9.59s/it, loss=1.79, v_num=15]
Epoch 0:  35%|███▌      | 45/128 [07:21<13:16,  9.59s/it, loss=1.77, v_num=15]
Epoch 0:  36%|███▌      | 46/128 [07:30<13:06,  9.59s/it, loss=1.77, v_num=15]
Epoch 0:  36%|███▌      | 46/128 [07:30<13:06,  9.59s/it, loss=1.78, v_num=15]
Epoch 0:  37%|███▋      | 47/128 [07:39<12:56,  9.58s/it, loss=1.78, v_num=15]
Epoch 0:  37%|███▋      | 47/128 [07:39<12:56,  9.58s/it, loss=1.82, v_num=15]
Epoch 0:  38%|███▊      | 48/128 [07:49<12:46,  9.58s/it, loss=1.82, v_num=15]
Epoch 0:  38%|███▊      | 48/128 [07:49<12:46,  9.58s/it, loss=1.82, v_num=15]
Epoch 0:  38%|███▊      | 49/128 [07:58<12:36,  9.57s/it, loss=1.82, v_num=15]
Epoch 0:  38%|███▊      | 49/128 [07:58<12:36,  9.57s/it, loss=1.82, v_num=15]
Epoch 0:  39%|███▉      | 50/128 [08:07<12:25,  9.56s/it, loss=1.82, v_num=15]
Epoch 0:  39%|███▉      | 50/128 [08:07<12:25,  9.56s/it, loss=1.85, v_num=15]
Epoch 0:  40%|███▉      | 51/128 [08:16<12:15,  9.56s/it, loss=1.85, v_num=15]
Epoch 0:  40%|███▉      | 51/128 [08:16<12:15,  9.56s/it, loss=1.79, v_num=15]
Epoch 0:  41%|████      | 52/128 [08:26<12:05,  9.55s/it, loss=1.79, v_num=15]
Epoch 0:  41%|████      | 52/128 [08:26<12:05,  9.55s/it, loss=1.8, v_num=15] 
Epoch 0:  41%|████▏     | 53/128 [08:35<11:56,  9.55s/it, loss=1.8, v_num=15]
Epoch 0:  41%|████▏     | 53/128 [08:35<11:56,  9.55s/it, loss=1.8, v_num=15]

My conclusion so far is that there is nothing 'wrong' with the code, but rather that under this toy example having too many negative frames is bad for model performance. Especially because those negative frames contain genuine trees: the model is focusing really hard on predicting trees, but then we show it an image with trees and act like there is nothing there. I think this causes the loss to become unstable.

@PolarNick239
Contributor Author

import deepforest
print(deepforest.__version__)
# 1.0.9 - installed from github - https://github.com/weecology/DeepForest/commit/b8998326c53755d017c2dc16cf1b3dfd75380876

import torchvision
print(torchvision.__version__)
# 0.10.0+cu102

It's a pity. Currently I am getting good results with additional training for transfer learning, even on such small datasets (without empty tiles). So it is possible to easily annotate some small area to specify what kind of objects we need (e.g. what kind of trees), and then prediction works pretty well (even for cars). But empty tiles are quite important, because in areas like parks (where no cars were annotated) there are a lot of false-positive detections.

Anyway, big thanks for your efforts! I am still very happy that the loss doesn't go crazy when training on cars :)

@sethhenrymorgan

Here's what I'm working with:

!pip install git+https://github.com/weecology/DeepForest.git
from deepforest import (__version__) 
print(__version__) 
1.1.0 
from torchvision import __version__
print(__version__)
0.10.0+cu102

If I reduced my number of negative samples to 1, that eliminated the NaNs, but the model still made no predictions.

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type      | Params
------------------------------------
0 | model | RetinaNet | 32.1 M
------------------------------------
31.9 M    Trainable params
222 K     Non-trainable params
32.1 M    Total params
128.592   Total estimated model params size (MB)
Validation sanity check: 0%
0/2 [00:00<?, ?it/s]
Epoch 1: 100%
69/69 [00:12<00:00, 5.68it/s, loss=0.679

Here's the output from m.evaluate:

No predictions made
/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py:3373: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/usr/local/lib/python3.7/dist-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

But plot twist! When I remove the negative samples the same thing happens. Is there a chance that something else is wrong with DeepForest version 1.1? Version 0.2.3 had no trouble without negative samples, but it rejected the 0-size boxes with an error.

@bw4sz
Collaborator

bw4sz commented Aug 17, 2021

@sethhenrymorgan since it feels like your problem has nothing to do with negative samples, can you make a separate issue and give the pertinent details, attach the annotations file and a couple sample images? Let's deal with negative samples here.

@PolarNick239 I'm going to continue to chase this today. Let me see if I understand: you are starting from the tree release model but predicting cars, not trees; even though your label says 'Tree', that's fine, it's just a name. You get decent results for car prediction, but too many false positives on things that are not cars. You then specified empty images like this one (ignoring that I think there is a car in the top right?).

[attached image: 1-1-3-0]

but adding empty images does not reduce your false positive rate?

My sense is that this is a machine learning question and not a problem with this code base. I'm interested in the right answer either way, but my gut reaction is that because RetinaNets use focal loss, meaning they focus on the hardest examples, negative samples are either too easy and don't contribute much, or so hard that they cause the loss to become unstable. I'm looking for examples online to anchor my thinking.
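
To illustrate that focal loss intuition with a toy calculation (this is just the standard focal loss formula, not the torchvision implementation):

import torch

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    # p: predicted foreground probability per anchor; target: 1 = object, 0 = background
    pt = torch.where(target == 1, p, 1 - p)
    alpha_t = torch.where(target == 1, torch.full_like(p, alpha), torch.full_like(p, 1 - alpha))
    return -alpha_t * (1 - pt) ** gamma * torch.log(pt)

# On an all-background tile, anchors the model already scores as background are
# "easy" and contribute almost nothing, while anchors it wrongly scores as
# foreground are "hard" and dominate the loss.
easy = focal_loss(torch.tensor([0.01]), torch.tensor([0]))  # ~1e-6
hard = focal_loss(torch.tensor([0.90]), torch.tensor([0]))  # ~1.4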

@PolarNick239
Contributor Author

Yes, DeepForest is very easy to use and the pre-trained RetinaNet backbone seems to work well for other objects (for cars and even for sea lions :) ). Yes, cars and sea lions are just very strange tree species.

there is a car in the top right?

My bad - missed it.

Yes, I want to add empty tiles to reduce the false-positive rate (false positives often occur in zones that are not represented in the non-empty tiles, i.e. zones without target objects). Before, this led to exceptions; currently it leads to NaN losses.

In fact, it seems possible to work around the problem by blending empty tiles with a random object sample taken from a random non-empty tile, but it is a hack with a potential problem: it is not obvious what to do with the hard bounding-box edges (or we need some kind of object-background segmentation, so that the object is blended into the empty tile without the background content from its original tile). If I find time to try such a hack, I will write about my findings; a rough sketch of the idea is below.
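
A purely hypothetical sketch of that blending hack (naive hard-edged paste with OpenCV; the function and argument names are made up for illustration):

import cv2

def paste_object(empty_tile_path, source_tile_path, box, out_path, label="Tree"):
    # Copy one annotated object crop from a non-empty tile into an empty tile
    empty = cv2.imread(empty_tile_path)
    source = cv2.imread(source_tile_path)
    xmin, ymin, xmax, ymax = [int(v) for v in box]
    empty[ymin:ymax, xmin:xmax] = source[ymin:ymax, xmin:xmax]  # hard box edges, the known problem
    cv2.imwrite(out_path, empty)
    # The pasted box becomes a positive annotation for the (formerly empty) tile
    return {"image_path": out_path, "xmin": xmin, "ymin": ymin,
            "xmax": xmax, "ymax": ymax, "label": label}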

Yes, I agree that it seems to be a RetinaNet related problem.

@github-actions

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Sep 17, 2021
@github-actions

github-actions bot commented Oct 1, 2021

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Oct 1, 2021
@ethanwhite
Member

Reopening, since this shouldn't have been closed; it's definitely an important issue.

@ethanwhite ethanwhite reopened this Oct 11, 2021
@ethanwhite
Member

Yes, I want to add empty tiles to reduce the false-positive rate (they are often encountered in zones that are not presented in non-empty tiles, i.e. zones without target objects).

We're running into this problem with the Everglades work as well weecology/EvergladesTools#42

@bw4sz - it sounds like I should try out the empty_frames branch and see if it works for this use case?

@ethanwhite
Member

Sorry - this has been merged already, I just had a hard time seeing it in the commit logs.

@carsumptive

What was the resolution to this issue?? Which commit fixed it and how??

@ethanwhite
Member

Unfortunately it's spread out over a period of time, so it's a bit hard to point to a single commit. In concept we had support for this from the beginning, but if I'm remembering correctly it wasn't initially supported by pytorch (when we made the switch from TensorFlow), and when it became available we still had some bugs to clean up.

Here are a few relevant commits: 1d8910b, 4d9593e, 7d6bb14

git log --all --grep="empty" should find most of the relevant commits (plus a number of extra ones)

@bw4sz
Collaborator

bw4sz commented Mar 30, 2022

I believe the codebase works for this use case. The bigger question in my mind is when, if ever, empty frames make a tangible difference. We will continue to work on this in our bird detection work, but I remain a little skeptical.
