Tiles without annotations are not supported #216

Closed
PolarNick239 opened this issue Jul 14, 2021 · 24 comments

@PolarNick239
Contributor

Describe the bug
Sometimes it is important to include empty tiles without any annotations in the training data (to show the RetinaNet what should not be detected). But this does not seem to be supported in DeepForest.

To Reproduce
Steps to reproduce the behavior:

import os

import cv2
import numpy as np
import pandas as pd
from pytorch_lightning import Trainer

all_annotations = pd.DataFrame(columns=['image_path', 'xmin', 'ymin', 'xmax', 'ymax', 'label'])

# Write an all-white 400x400 tile to disk
empty_tile_name = "empty_tile.jpg"
empty_tile = np.zeros((400, 400, 3), np.uint8)
empty_tile[:, :, :] = 255
cv2.imwrite(empty_tile_name, empty_tile)

# Add a row for the empty tile with no box coordinates and no label
all_annotations = all_annotations.append({'image_path': empty_tile_name, 'xmin': '', 'ymin': '', 'xmax': '', 'ymax': '', 'label': ''}, ignore_index=True)

...

all_annotations.to_csv(annotations_file, header=True, index=False)

# m is a deepforest model created earlier (definition elided above)
trainer = Trainer(max_epochs=20, gpus=1, auto_select_gpus=True)
train_ds = m.load_dataset(annotations_file, root_dir=os.path.dirname(annotations_file))
trainer.fit(m, train_ds)

This leads to

File "/.../python/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
  data = fetcher.fetch(index)
File "/.../python/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
  data = [self.dataset[idx] for idx in possibly_batched_index]
File "/.../python/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
  data = [self.dataset[idx] for idx in possibly_batched_index]
File "/.../python/lib/python3.8/site-packages/deepforest/dataset.py", line 81, in __getitem__
  targets["labels"] = image_annotations.label.apply(
File "/.../python/lib/python3.8/site-packages/pandas/core/series.py", line 4356, in apply
  return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File "/.../python/lib/python3.8/site-packages/pandas/core/apply.py", line 1036, in apply
  return self.apply_standard()
File "/.../python/lib/python3.8/site-packages/pandas/core/apply.py", line 1092, in apply_standard
  mapped = lib.map_infer(
File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
File "/.../python/lib/python3.8/site-packages/deepforest/dataset.py", line 82, in <lambda>
   lambda x: self.label_dict[x]).values.astype(int)
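
For illustration (my reading of the traceback, not verified): the empty label '' on the blank row presumably has no entry in label_dict, so the lookup inside the apply() fails, roughly:

label_dict = {"Tree": 0}  # default single-class mapping
label_dict[""]            # KeyError: the empty label is not a known class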

Additional context

While I see there is also this check in the dataset code:

raise ValueError("Blank annotations are not allowed in retinanets. Check data augmentation for image {} with shape {}, no overlapping boxes found".format(self.image_names[idx], image.shape))

In my opinion this is an important option, and it seems to be supported in RetinaNets - see:

@bw4sz
Collaborator

bw4sz commented Jul 14, 2021

Thanks for the issue, we were just discussing this yesterday. I was under the impression that this is not supported in the torchvision RetinaNet. The links you posted are from other libraries. I was trying to find the issue that stated this (it was true in the past), but it looks like there has been some movement?
pytorch/vision#1598
pytorch/vision#1911
pytorch/vision#3032

I do not know if it works yet for RetinaNet; I see changes to RCNN. Try adding a dummy CSV row with boxes of 0's to see. Please report back and we can automate this if it works. Something like:

image_path, xmin, ymin, xmax, ymax, label
img.png, 0,0,0,0,"Tree"

@PolarNick239
Contributor Author

Thanks for the fast response :)

Sadly, that does not work either. It leads to:

File "/.../src/detect_trees.py", line 410
  trainer.fit(self.m, train_ds)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 460, in fit
  self._run(model)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
  self.dispatch()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 799, in dispatch
  self.accelerator.start_training(self)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
  self.training_type_plugin.start_training(trainer)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
  self._results = trainer.run_stage()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in run_stage
  return self.run_train()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 871, in run_train
  self.train_loop.run_training_epoch()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
  batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 738, in run_training_batch
  self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 434, in optimizer_step
  model_ref.optimizer_step(
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1403, in optimizer_step
  optimizer.step(closure=optimizer_closure)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
  self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
  trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
  self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
  self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
  optimizer.step(closure=lambda_closure, **kwargs)
File "/.../python/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
  return func(*args, **kwargs)
File "/.../python/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
  return func(*args, **kwargs)
File "/.../python/lib/python3.8/site-packages/torch/optim/sgd.py", line 87, in step
  loss = closure()
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 732, in train_step_and_backward_closure
  result = self.training_step_and_backward(
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 823, in training_step_and_backward
  result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 290, in training_step
  training_step_output = self.trainer.accelerator.training_step(args)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
  return self.training_type_plugin.training_step(*args)
File "/.../python/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 155, in training_step
  return self.lightning_module.training_step(*args, **kwargs)
File "/.../python/lib/python3.8/site-packages/deepforest/main.py", line 349, in training_step
  loss_dict = self.model.forward(images, targets)
File "/.../python/lib/python3.8/site-packages/torchvision/models/detection/retinanet.py", line 508, in forward
  raise ValueError("All bounding boxes should have positive height and width."
ValueError: All bounding boxes should have positive height and width. Found invalid box [0.0, 0.0, 0.0, 0.0] for target at index 0.

@bw4sz
Collaborator

bw4sz commented Jul 14, 2021 via email

@bw4sz
Collaborator

bw4sz commented Jul 14, 2021

It looks like we need to pin a more recent torchvision version. Recreating the env now; this is the relevant test:

# Empty tester from https://github.com/datumbox/vision/blob/06ebee1a9f10c76d8ac5768fd578362dd5ace6e9/test/test_models_detection_negative_samples.py#L14
import torch
import torchvision

def _make_empty_sample():
    # One random image and a target with zero boxes (a negative sample)
    images = [torch.rand((3, 100, 100), dtype=torch.float32)]
    boxes = torch.zeros((0, 4), dtype=torch.float32)
    negative_target = {"boxes": boxes,
                       "labels": torch.zeros(0, dtype=torch.int64),
                       "image_id": 4,
                       "area": (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]),
                       "iscrowd": torch.zeros((0,), dtype=torch.int64)}

    targets = [negative_target]
    return images, targets

def test_forward_negative_sample_retinanet():
    model = torchvision.models.detection.retinanet_resnet50_fpn(
        num_classes=2, min_size=100, max_size=100)

    images, targets = _make_empty_sample()
    loss_dict = model(images, targets)

    assert loss_dict["bbox_regression"] == torch.tensor(0.)

@bw4sz
Collaborator

bw4sz commented Jul 19, 2021

I'm going to return to this. My initial reading was that torchvision 0.8.1, which is standard elsewhere, is too old; it looks like this functionality is newer than that.

@bw4sz
Collaborator

bw4sz commented Jul 23, 2021

@PolarNick239 see the empty_frames branch, which is in dev. Tests pass for the model, but not yet for the dataset.

following https://medium.com/jumio/object-detection-tutorial-with-torchvision-82b8f269f6ff

If we want a negative training sample, add a row with 0,0,0,0 for the bounding box, which will be converted to an empty torch tensor:

boxes = torch.zeros((0, 4), dtype=torch.float32)
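
Roughly, a hypothetical sketch of that conversion (not the actual deepforest/dataset.py code, just the idea):

import torch

def annotations_to_target(image_annotations, label_dict):
    # image_annotations: the DataFrame rows for one image, in DeepForest CSV format
    boxes = image_annotations[["xmin", "ymin", "xmax", "ymax"]].values.astype("float32")
    if (boxes == 0).all():
        # Negative sample: zero-length boxes and labels, the shapes RetinaNet expects
        return {"boxes": torch.zeros((0, 4), dtype=torch.float32),
                "labels": torch.zeros(0, dtype=torch.int64)}
    labels = image_annotations.label.apply(lambda x: label_dict[x]).values.astype("int64")
    return {"boxes": torch.from_numpy(boxes), "labels": torch.from_numpy(labels)}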

@PolarNick239
Contributor Author

PolarNick239 commented Jul 26, 2021

Now it doesn't throw any exceptions, but the loss becomes NaN (in addition to the zero coordinates I set the label to 'Tree'):

Training:   0%|          | 0/129 [00:00<?, ?it/s]
Epoch 0:   0%|          | 0/129 [00:00<?, ?it/s] 
Epoch 0:   1%|          | 1/129 [00:00<00:33,  3.79it/s]
Epoch 0:   1%|          | 1/129 [00:00<00:33,  3.78it/s, loss=0.243]
Epoch 0:   2%|1         | 2/129 [00:00<00:22,  5.56it/s, loss=1.16] 
Epoch 0:   2%|2         | 3/129 [00:00<00:19,  6.63it/s, loss=1.16]
Epoch 0:   2%|2         | 3/129 [00:00<00:19,  6.62it/s, loss=1.49]
Epoch 0:   3%|3         | 4/129 [00:00<00:16,  7.42it/s, loss=1.62]
Epoch 0:   4%|3         | 5/129 [00:00<00:15,  8.06it/s, loss=1.62]
Epoch 0:   4%|3         | 5/129 [00:00<00:15,  8.06it/s, loss=1.64]
Epoch 0:   5%|4         | 6/129 [00:00<00:14,  8.57it/s, loss=1.61]
Epoch 0:   5%|5         | 7/129 [00:00<00:13,  8.93it/s, loss=1.61]
Epoch 0:   5%|5         | 7/129 [00:00<00:13,  8.93it/s, loss=1.59]
Epoch 0:   6%|6         | 8/129 [00:00<00:13,  9.25it/s, loss=1.52]
Epoch 0:   7%|6         | 9/129 [00:00<00:12,  9.46it/s, loss=1.52]
Epoch 0:   7%|6         | 9/129 [00:00<00:12,  9.46it/s, loss=1.47]
Epoch 0:   8%|7         | 10/129 [00:01<00:12,  9.68it/s, loss=1.42]
Epoch 0:   9%|8         | 11/129 [00:01<00:11,  9.86it/s, loss=1.42]
Epoch 0:   9%|8         | 11/129 [00:01<00:11,  9.86it/s, loss=1.37]
Epoch 0:   9%|9         | 12/129 [00:01<00:11, 10.03it/s, loss=1.33]
Epoch 0:  10%|#         | 13/129 [00:01<00:11, 10.16it/s, loss=1.33]
Epoch 0:  10%|#         | 13/129 [00:01<00:11, 10.15it/s, loss=1.33]
Epoch 0:  11%|#         | 14/129 [00:01<00:11, 10.29it/s, loss=1.29]
Epoch 0:  12%|#1        | 15/129 [00:01<00:10, 10.41it/s, loss=1.29]
Epoch 0:  12%|#1        | 15/129 [00:01<00:10, 10.41it/s, loss=1.29]
Epoch 0:  12%|#2        | 16/129 [00:01<00:10, 10.52it/s, loss=1.29]
Epoch 0:  13%|#3        | 17/129 [00:01<00:10, 10.61it/s, loss=1.29]
Epoch 0:  13%|#3        | 17/129 [00:01<00:10, 10.61it/s, loss=3.07]
Epoch 0:  14%|#3        | 18/129 [00:01<00:10, 10.71it/s, loss=8.96]
Epoch 0:  15%|#4        | 19/129 [00:01<00:10, 10.78it/s, loss=8.96]
Epoch 0:  15%|#4        | 19/129 [00:01<00:10, 10.78it/s, loss=147] 
Epoch 0:  16%|#5        | 20/129 [00:01<00:10, 10.85it/s, loss=9.94e+29]
Epoch 0:  16%|#6        | 21/129 [00:01<00:09, 10.92it/s, loss=9.94e+29]
Epoch 0:  16%|#6        | 21/129 [00:01<00:09, 10.91it/s, loss=nan]     
...
NaNs everywhere

@bw4sz
Collaborator

bw4sz commented Jul 26, 2021

Please show the annotations file. I believe this is unrelated. Did you literally use only blank images? I see the loss falling nicely until the model memorizes all the images. Reduce the learning rate, add more annotations, etc. This looks expected.

@PolarNick239
Contributor Author

Maybe a minimal reproducer will help? empty_frames_nan_repro.zip (it includes the annotations CSV, all images, the training script, and a visualization of the annotations on all images).

There are only 17 empty tiles out of 129 (note that each tile is augmented x8: the original and its mirrored copy, each at 4 rotations: 0, 90, 180, 270).

@bw4sz
Collaborator

bw4sz commented Aug 16, 2021

Running this example now. I can start by saying that I agree: after 1 epoch you can see the loss dropping, but then it goes NaN.

root_dir = "/Users/benweinstein/Downloads/empty_frames_nan_repro/empty_frames_nan_repro/"
annotations_file = root_dir + "annotations_no_empty.csv"

from deepforest import main
from pytorch_lightning import Trainer

m = main.deepforest()
m.use_release()

trainer = Trainer(max_epochs=2, gpus=0)
train_ds = m.load_dataset(annotations_file, root_dir=root_dir)
trainer.fit(m, train_ds)

The first question is whether this happens in the absence of the negative samples. If I edit the annotations to drop them:

annotations_no_empty.csv

and try again, everything looks fine.

If I only remove that blank image:

annotations_no_blank.csv

I still see the NaN, but more slowly (I almost stopped too early to see it).
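
For reference, a hedged sketch of how such a filtered annotations file could be produced from the repro CSV (annotations_file and root_dir as in the snippet above; this assumes negative rows are exactly the all-zero ones, and the output name annotations_filtered.csv is just an example):

import pandas as pd

df = pd.read_csv(annotations_file)
is_negative = (df[["xmin", "ymin", "xmax", "ymax"]].astype(float) == 0).all(axis=1)
df[~is_negative].to_csv(root_dir + "annotations_filtered.csv", index=False)
print(int(is_negative.sum()), "negative rows,", int((~is_negative).sum()), "positive rows")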

@bw4sz
Collaborator

bw4sz commented Aug 16, 2021

Looking here, it suggests this is a formatting error, which is odd because it definitely looks like we are following the prescribed format.

pytorch/vision#2144

boxes = torch.zeros((0, 4), dtype=torch.float32)

@sethhenrymorgan are you getting NaN as well?

Can everyone report their deepforest and torchvision versions so we are on the same page?

from deepforest import __version__
print(__version__)
1.1.0
from torchvision import __version__
print(__version__)
0.10.0

@bw4sz
Collaborator

bw4sz commented Aug 16, 2021

I'm stopping for the day, but I'm interested in hearing thoughts on this. With the toy example it's not obvious what we should expect. I can confirm that if you reduce the number of empty frames to 2, you don't see NaN, at least on the CPU I'm running on; I can try a GPU tomorrow. The loss looks normal as it jumps around.

Reading config file: deepforest_config.yml
Model from DeepForest release https://github.com/weecology/DeepForest/releases/tag/1.0.0 was already downloaded. Loading model from file.
Loading pre-built model: https://github.com/weecology/DeepForest/releases/tag/1.0.0
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name  | Type      | Params
------------------------------------
0 | model | RetinaNet | 32.1 M
------------------------------------
31.9 M    Trainable params
222 K     Non-trainable params
32.1 M    Total params
128.592   Total estimated model params size (MB)

Validation sanity check: 0it [00:00, ?it/s]/Users/benweinstein/.conda/envs/DeepForest/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:378: UserWarning: One of given dataloaders is None and it will be skipped.
  rank_zero_warn("One of given dataloaders is None and it will be skipped.")

                                           
/Users/benweinstein/.conda/envs/DeepForest/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:105: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(

Training: -1it [00:00, ?it/s]
Training:   0%|          | 0/128 [00:00<00:00, 11915.64it/s]
Epoch 0:   0%|          | 0/128 [00:00<00:00, 2416.07it/s]  /Users/benweinstein/.conda/envs/DeepForest/lib/python3.9/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  ../c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)

Epoch 0:   1%|          | 1/128 [00:09<10:18,  4.87s/it]  
Epoch 0:   1%|          | 1/128 [00:09<10:18,  4.87s/it, loss=1.62, v_num=15]
Epoch 0:   2%|▏         | 2/128 [00:19<13:35,  6.47s/it, loss=1.62, v_num=15]
Epoch 0:   2%|▏         | 2/128 [00:19<13:35,  6.47s/it, loss=1.57, v_num=15]
Epoch 0:   2%|▏         | 3/128 [00:28<14:57,  7.18s/it, loss=1.57, v_num=15]
Epoch 0:   2%|▏         | 3/128 [00:28<14:57,  7.18s/it, loss=1.59, v_num=15]
Epoch 0:   3%|▎         | 4/128 [00:38<15:56,  7.72s/it, loss=1.59, v_num=15]
Epoch 0:   3%|▎         | 4/128 [00:38<15:56,  7.72s/it, loss=1.47, v_num=15]
Epoch 0:   4%|▍         | 5/128 [00:48<16:24,  8.00s/it, loss=1.47, v_num=15]
Epoch 0:   4%|▍         | 5/128 [00:48<16:24,  8.00s/it, loss=1.36, v_num=15]
Epoch 0:   5%|▍         | 6/128 [00:57<16:44,  8.23s/it, loss=1.36, v_num=15]
Epoch 0:   5%|▍         | 6/128 [00:57<16:44,  8.23s/it, loss=1.3, v_num=15] 
Epoch 0:   5%|▌         | 7/128 [01:07<16:55,  8.39s/it, loss=1.3, v_num=15]
Epoch 0:   5%|▌         | 7/128 [01:07<16:55,  8.39s/it, loss=1.23, v_num=15]
Epoch 0:   6%|▋         | 8/128 [01:16<17:05,  8.55s/it, loss=1.23, v_num=15]
Epoch 0:   6%|▋         | 8/128 [01:16<17:05,  8.55s/it, loss=1.19, v_num=15]
Epoch 0:   7%|▋         | 9/128 [01:26<17:13,  8.68s/it, loss=1.19, v_num=15]
Epoch 0:   7%|▋         | 9/128 [01:26<17:13,  8.68s/it, loss=1.19, v_num=15]
Epoch 0:   8%|▊         | 10/128 [01:37<17:27,  8.88s/it, loss=1.19, v_num=15]
Epoch 0:   8%|▊         | 10/128 [01:37<17:27,  8.88s/it, loss=1.18, v_num=15]
Epoch 0:   9%|▊         | 11/128 [01:47<17:26,  8.95s/it, loss=1.18, v_num=15]
Epoch 0:   9%|▊         | 11/128 [01:47<17:26,  8.95s/it, loss=1.14, v_num=15]
Epoch 0:   9%|▉         | 12/128 [01:57<17:24,  9.00s/it, loss=1.14, v_num=15]
Epoch 0:   9%|▉         | 12/128 [01:57<17:24,  9.00s/it, loss=1.1, v_num=15] 
Epoch 0:  10%|█         | 13/128 [02:06<17:21,  9.05s/it, loss=1.1, v_num=15]
Epoch 0:  10%|█         | 13/128 [02:06<17:21,  9.05s/it, loss=1.07, v_num=15]
Epoch 0:  11%|█         | 14/128 [02:16<17:17,  9.10s/it, loss=1.07, v_num=15]
Epoch 0:  11%|█         | 14/128 [02:16<17:17,  9.10s/it, loss=1.05, v_num=15]
Epoch 0:  12%|█▏        | 15/128 [02:25<17:10,  9.12s/it, loss=1.05, v_num=15]
Epoch 0:  12%|█▏        | 15/128 [02:25<17:10,  9.12s/it, loss=1.03, v_num=15]
Epoch 0:  12%|█▎        | 16/128 [02:35<17:05,  9.15s/it, loss=1.03, v_num=15]
Epoch 0:  12%|█▎        | 16/128 [02:35<17:05,  9.15s/it, loss=1.01, v_num=15]
Epoch 0:  13%|█▎        | 17/128 [02:45<16:59,  9.18s/it, loss=1.01, v_num=15]
Epoch 0:  13%|█▎        | 17/128 [02:45<16:59,  9.18s/it, loss=0.995, v_num=15]
Epoch 0:  14%|█▍        | 18/128 [02:56<17:00,  9.27s/it, loss=0.995, v_num=15]
Epoch 0:  14%|█▍        | 18/128 [02:56<17:00,  9.27s/it, loss=0.983, v_num=15]
Epoch 0:  15%|█▍        | 19/128 [03:06<16:58,  9.35s/it, loss=0.983, v_num=15]
Epoch 0:  15%|█▍        | 19/128 [03:06<16:58,  9.35s/it, loss=0.972, v_num=15]
Epoch 0:  16%|█▌        | 20/128 [03:17<16:56,  9.41s/it, loss=0.972, v_num=15]
Epoch 0:  16%|█▌        | 20/128 [03:17<16:56,  9.41s/it, loss=1.43, v_num=15] 
Epoch 0:  16%|█▋        | 21/128 [03:27<16:51,  9.45s/it, loss=1.43, v_num=15]
Epoch 0:  16%|█▋        | 21/128 [03:27<16:51,  9.45s/it, loss=1.63, v_num=15]
Epoch 0:  17%|█▋        | 22/128 [03:37<16:41,  9.45s/it, loss=1.63, v_num=15]
Epoch 0:  17%|█▋        | 22/128 [03:37<16:41,  9.45s/it, loss=2.23, v_num=15]
Epoch 0:  18%|█▊        | 23/128 [03:47<16:34,  9.47s/it, loss=2.23, v_num=15]
Epoch 0:  18%|█▊        | 23/128 [03:47<16:34,  9.47s/it, loss=2.28, v_num=15]
Epoch 0:  19%|█▉        | 24/128 [03:56<16:24,  9.47s/it, loss=2.28, v_num=15]
Epoch 0:  19%|█▉        | 24/128 [03:56<16:24,  9.47s/it, loss=2.32, v_num=15]
Epoch 0:  20%|█▉        | 25/128 [04:06<16:15,  9.47s/it, loss=2.32, v_num=15]
Epoch 0:  20%|█▉        | 25/128 [04:06<16:15,  9.47s/it, loss=2.37, v_num=15]
Epoch 0:  20%|██        | 26/128 [04:15<16:06,  9.48s/it, loss=2.37, v_num=15]
Epoch 0:  20%|██        | 26/128 [04:15<16:06,  9.48s/it, loss=2.4, v_num=15] 
Epoch 0:  21%|██        | 27/128 [04:25<15:57,  9.48s/it, loss=2.4, v_num=15]
Epoch 0:  21%|██        | 27/128 [04:25<15:57,  9.48s/it, loss=2.4, v_num=15]
Epoch 0:  22%|██▏       | 28/128 [04:34<15:47,  9.48s/it, loss=2.4, v_num=15]
Epoch 0:  22%|██▏       | 28/128 [04:34<15:47,  9.48s/it, loss=2.43, v_num=15]
Epoch 0:  23%|██▎       | 29/128 [04:44<15:38,  9.48s/it, loss=2.43, v_num=15]
Epoch 0:  23%|██▎       | 29/128 [04:44<15:38,  9.48s/it, loss=2.45, v_num=15]
Epoch 0:  23%|██▎       | 30/128 [04:54<15:29,  9.48s/it, loss=2.45, v_num=15]
Epoch 0:  23%|██▎       | 30/128 [04:54<15:29,  9.48s/it, loss=2.42, v_num=15]
Epoch 0:  24%|██▍       | 31/128 [05:03<15:20,  9.49s/it, loss=2.42, v_num=15]
Epoch 0:  24%|██▍       | 31/128 [05:03<15:20,  9.49s/it, loss=2.45, v_num=15]
Epoch 0:  25%|██▌       | 32/128 [05:13<15:12,  9.51s/it, loss=2.45, v_num=15]
Epoch 0:  25%|██▌       | 32/128 [05:13<15:12,  9.51s/it, loss=2.51, v_num=15]
Epoch 0:  26%|██▌       | 33/128 [05:23<15:03,  9.51s/it, loss=2.51, v_num=15]
Epoch 0:  26%|██▌       | 33/128 [05:23<15:03,  9.51s/it, loss=2.56, v_num=15]
Epoch 0:  27%|██▋       | 34/128 [05:32<14:53,  9.51s/it, loss=2.56, v_num=15]
Epoch 0:  27%|██▋       | 34/128 [05:32<14:53,  9.51s/it, loss=2.6, v_num=15] 
Epoch 0:  27%|██▋       | 35/128 [05:42<14:45,  9.53s/it, loss=2.6, v_num=15]
Epoch 0:  27%|██▋       | 35/128 [05:42<14:45,  9.53s/it, loss=2.64, v_num=15]
Epoch 0:  28%|██▊       | 36/128 [05:52<14:37,  9.54s/it, loss=2.64, v_num=15]
Epoch 0:  28%|██▊       | 36/128 [05:52<14:37,  9.54s/it, loss=2.69, v_num=15]
Epoch 0:  29%|██▉       | 37/128 [06:02<14:28,  9.55s/it, loss=2.69, v_num=15]
Epoch 0:  29%|██▉       | 37/128 [06:02<14:28,  9.55s/it, loss=2.73, v_num=15]
Epoch 0:  30%|██▉       | 38/128 [06:12<14:20,  9.56s/it, loss=2.73, v_num=15]
Epoch 0:  30%|██▉       | 38/128 [06:12<14:20,  9.56s/it, loss=2.77, v_num=15]
Epoch 0:  30%|███       | 39/128 [06:22<14:11,  9.57s/it, loss=2.77, v_num=15]
Epoch 0:  30%|███       | 39/128 [06:22<14:11,  9.57s/it, loss=2.8, v_num=15] 
Epoch 0:  31%|███▏      | 40/128 [06:32<14:02,  9.58s/it, loss=2.8, v_num=15]
Epoch 0:  31%|███▏      | 40/128 [06:32<14:02,  9.58s/it, loss=2.37, v_num=15]
Epoch 0:  32%|███▏      | 41/128 [06:42<13:54,  9.59s/it, loss=2.37, v_num=15]
Epoch 0:  32%|███▏      | 41/128 [06:42<13:54,  9.59s/it, loss=2.4, v_num=15] 
Epoch 0:  33%|███▎      | 42/128 [06:53<13:46,  9.61s/it, loss=2.4, v_num=15]
Epoch 0:  33%|███▎      | 42/128 [06:53<13:46,  9.61s/it, loss=1.81, v_num=15]
Epoch 0:  34%|███▎      | 43/128 [07:02<13:36,  9.61s/it, loss=1.81, v_num=15]
Epoch 0:  34%|███▎      | 43/128 [07:02<13:36,  9.61s/it, loss=1.8, v_num=15] 
Epoch 0:  34%|███▍      | 44/128 [07:12<13:26,  9.60s/it, loss=1.8, v_num=15]
Epoch 0:  34%|███▍      | 44/128 [07:12<13:26,  9.60s/it, loss=1.79, v_num=15]
Epoch 0:  35%|███▌      | 45/128 [07:21<13:16,  9.59s/it, loss=1.79, v_num=15]
Epoch 0:  35%|███▌      | 45/128 [07:21<13:16,  9.59s/it, loss=1.77, v_num=15]
Epoch 0:  36%|███▌      | 46/128 [07:30<13:06,  9.59s/it, loss=1.77, v_num=15]
Epoch 0:  36%|███▌      | 46/128 [07:30<13:06,  9.59s/it, loss=1.78, v_num=15]
Epoch 0:  37%|███▋      | 47/128 [07:39<12:56,  9.58s/it, loss=1.78, v_num=15]
Epoch 0:  37%|███▋      | 47/128 [07:39<12:56,  9.58s/it, loss=1.82, v_num=15]
Epoch 0:  38%|███▊      | 48/128 [07:49<12:46,  9.58s/it, loss=1.82, v_num=15]
Epoch 0:  38%|███▊      | 48/128 [07:49<12:46,  9.58s/it, loss=1.82, v_num=15]
Epoch 0:  38%|███▊      | 49/128 [07:58<12:36,  9.57s/it, loss=1.82, v_num=15]
Epoch 0:  38%|███▊      | 49/128 [07:58<12:36,  9.57s/it, loss=1.82, v_num=15]
Epoch 0:  39%|███▉      | 50/128 [08:07<12:25,  9.56s/it, loss=1.82, v_num=15]
Epoch 0:  39%|███▉      | 50/128 [08:07<12:25,  9.56s/it, loss=1.85, v_num=15]
Epoch 0:  40%|███▉      | 51/128 [08:16<12:15,  9.56s/it, loss=1.85, v_num=15]
Epoch 0:  40%|███▉      | 51/128 [08:16<12:15,  9.56s/it, loss=1.79, v_num=15]
Epoch 0:  41%|████      | 52/128 [08:26<12:05,  9.55s/it, loss=1.79, v_num=15]
Epoch 0:  41%|████      | 52/128 [08:26<12:05,  9.55s/it, loss=1.8, v_num=15] 
Epoch 0:  41%|████▏     | 53/128 [08:35<11:56,  9.55s/it, loss=1.8, v_num=15]
Epoch 0:  41%|████▏     | 53/128 [08:35<11:56,  9.55s/it, loss=1.8, v_num=15]

My conclusion so far is that there is nothing 'wrong' with the code, but rather that under this toy example having too many negative frames is bad for model performance. Especially because those negative frames contain genuine trees: the model is focusing really hard on predicting trees, but then we show it an image with trees and act like there is nothing there. I think this causes the loss to become unstable.

@PolarNick239
Contributor Author

import deepforest
print(deepforest.__version__)
# 1.0.9 - installed from github - https://github.com/weecology/DeepForest/commit/b8998326c53755d017c2dc16cf1b3dfd75380876

import torchvision
print(torchvision.__version__)
# 0.10.0+cu102

It's a pity. Currently I am getting good results with additional training for transfer learning, even on such small datasets (without empty tiles). So it is possible to easily annotate some small area to specify what kind of objects we need (e.g. what kind of trees), and then prediction works pretty well (even for cars). But empty tiles are quite important, because in areas like parks (where no cars were annotated) there are a lot of false-positive detections.

Anyway, big thanks for your efforts! I am still very happy that the loss doesn't go crazy when training on cars :)

@sethhenrymorgan

Here's what I'm working with:

!pip install git+https://github.com/weecology/DeepForest.git
from deepforest import (__version__) 
print(__version__) 
1.1.0 
from torchvision import __version__
print(__version__)
0.10.0+cu102

If I reduced my number of negative samples to 1, that eliminated the NaNs, but the model still made no predictions.

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type      | Params
------------------------------------
0 | model | RetinaNet | 32.1 M
------------------------------------
31.9 M    Trainable params
222 K     Non-trainable params
32.1 M    Total params
128.592   Total estimated model params size (MB)
Validation sanity check: 0%
0/2 [00:00<?, ?it/s]
Epoch 1: 100%
69/69 [00:12<00:00, 5.68it/s, loss=0.679

Here's the output from m.evaluate:

No predictions made
/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py:3373: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/usr/local/lib/python3.7/dist-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

But plot twist! When I remove the negative samples the same thing happens. Is there a chance that something else is wrong with DeepForest version 1.1? Version 0.2.3 had no trouble without negative samples, but it rejected the 0-size boxes with an error.

@bw4sz
Collaborator

bw4sz commented Aug 17, 2021

@sethhenrymorgan since it feels like your problem has nothing to do with negative samples, can you make a separate issue and give the pertinent details, attach the annotations file and a couple sample images? Let's deal with negative samples here.

@PolarNick239 I'm going to continue to chase this today. Let me see if I understand: you are starting from the tree release model but predicting cars, not trees; even though your label says 'Tree', that's fine, it's just a name. You get decent results for car prediction, but too many false positives on things that are not cars. You then specified empty images like this one (ignoring that I think there is a car in the top right?).

[attached image: 1-1-3-0]

but adding empty images does not reduce your false positive rate?

My sense is that this is a machine learning question and not a problem with this code base. I'm interested in the right answer either way, but my gut reaction is that because RetinaNets use focal loss, meaning they focus on the hardest examples, negative samples are either too easy and don't contribute much, or so hard that they cause the loss to become unstable. I'm looking for examples online to anchor my thinking.
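
To illustrate that focal loss intuition with a toy calculation (this is just the standard focal loss formula, not the torchvision implementation):

import torch

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    # p: predicted foreground probability per anchor; target: 1 = object, 0 = background
    pt = torch.where(target == 1, p, 1 - p)
    alpha_t = torch.where(target == 1, torch.full_like(p, alpha), torch.full_like(p, 1 - alpha))
    return -alpha_t * (1 - pt) ** gamma * torch.log(pt)

# On an all-background tile, anchors the model already scores as background are
# "easy" and contribute almost nothing, while anchors it wrongly scores as
# foreground are "hard" and dominate the loss.
easy = focal_loss(torch.tensor([0.01]), torch.tensor([0]))  # ~1e-6
hard = focal_loss(torch.tensor([0.90]), torch.tensor([0]))  # ~1.4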

@PolarNick239
Contributor Author

Yes, DeepForest is very easy to use and the pre-trained RetinaNet backbone seems to work well for other objects (for cars and even for sea lions :) ). Yes, cars and sea lions are just very strange tree species.

there is a car in the top right?

My bad - missed it.

Yes, I want to add empty tiles to reduce the false-positive rate (false positives often occur in zones that are not represented in the non-empty tiles, i.e. zones without target objects). Before, this led to exceptions; currently it leads to NaN losses.

In fact, it seems possible to work around the problem by blending empty tiles with a random object sample taken from a random non-empty tile, but it is a hack with a potential problem: it is not obvious what to do with the hard bounding-box edges (or we need some kind of object-background segmentation, so that the object is blended into the empty tile without the background content from its original tile). If I find time to try such a hack, I will write about my findings; a rough sketch of the idea is below.
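
A purely hypothetical sketch of that blending hack (naive hard-edged paste with OpenCV; the function and argument names are made up for illustration):

import cv2

def paste_object(empty_tile_path, source_tile_path, box, out_path, label="Tree"):
    # Copy one annotated object crop from a non-empty tile into an empty tile
    empty = cv2.imread(empty_tile_path)
    source = cv2.imread(source_tile_path)
    xmin, ymin, xmax, ymax = [int(v) for v in box]
    empty[ymin:ymax, xmin:xmax] = source[ymin:ymax, xmin:xmax]  # hard box edges, the known problem
    cv2.imwrite(out_path, empty)
    # The pasted box becomes a positive annotation for the (formerly empty) tile
    return {"image_path": out_path, "xmin": xmin, "ymin": ymin,
            "xmax": xmax, "ymax": ymax, "label": label}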

Yes, I agree that it seems to be a RetinaNet related problem.

@github-actions

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Sep 17, 2021
@github-actions

github-actions bot commented Oct 1, 2021

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Oct 1, 2021
@ethanwhite
Member

Reopening, since this shouldn't have been closed; it's definitely an important issue.

@ethanwhite ethanwhite reopened this Oct 11, 2021
@ethanwhite
Member

Yes, I want to add empty tiles to reduce the false-positive rate (they are often encountered in zones that are not presented in non-empty tiles, i.e. zones without target objects).

We're running into this problem with the Everglades work as well weecology/EvergladesTools#42

@bw4sz - it sounds like I should try out the empty_frames branch and see if it works for this use case?

@ethanwhite
Member

Sorry - this has been merged already, I just had a hard time seeing it in the commit logs.

@carsumptive

What was the resolution to this issue?? Which commit fixed it and how??

@ethanwhite
Member

Unfortunately it's spread out over a period of time, so it's a bit hard to point to a single commit. In concept we had support for this from the beginning, but if I'm remembering correctly it wasn't initially supported by pytorch (when we made the switch from TensorFlow), and when it became available we still had some bugs to clean up.

Here are a few relevant commits: 1d8910b, 4d9593e, 7d6bb14

git log --all --grep="empty" should find most of the relevant commits (plus a number of extra ones)

@bw4sz
Collaborator

bw4sz commented Mar 30, 2022

I believe the codebase works for this use case. The bigger question in my mind is when, if ever, empty frames make a tangible difference. We will continue to work on this in our bird detection work, but I remain a little skeptical.
