Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IDTReeS - Clip boxes outside of image bounds #760

Merged
merged 16 commits into from
Sep 7, 2022

Conversation

isaaccorley
Copy link
Collaborator

@isaaccorley isaaccorley commented Sep 5, 2022

This PR

  • clips the bounding box coordinates to the image size (essentially clips boxes that go outside of the image bounds) and filters any boxes that have zero area after clipping.
  • fixed bug where id of some geometries were being overriden resulting in some images have no objects.

Fixes part of #684

For MLBS_5 the boxes before

tensor([[107, -57, 165, -10],
        [157, -48, 190, -10],
        [127,  24, 198,  92],
        [149, 122, 199, 174],
        [ 30,  70,  70, 109],
        [ -6, -80,  48, -10],
        [ 47, 121, 114, 197],
        [119,  84, 144, 129],
        [108,  74, 129, 101],
        [ 36,  21,  83,  73],
        [137,  73, 187, 128],
        [ 38, -63,  81, -10],
        [ 19,  93,  62, 152],
        [ 71,  81, 117, 133]])

and after filtering

tensor([[127,  24, 198,  92],
        [149, 122, 199, 174],
        [ 30,  70,  70, 109],
        [ 47, 121, 114, 197],
        [119,  84, 144, 129],
        [108,  74, 129, 101],
        [ 36,  21,  83,  73],
        [137,  73, 187, 128],
        [ 19,  93,  62, 152],
        [ 71,  81, 117, 133]])

Visual result:
image

@isaaccorley isaaccorley self-assigned this Sep 5, 2022
@isaaccorley isaaccorley marked this pull request as draft September 5, 2022 19:59
@github-actions github-actions bot added the datasets Geospatial or benchmark datasets label Sep 5, 2022
@ashnair1
Copy link
Collaborator

ashnair1 commented Sep 5, 2022

The coordinate swap was already done in #683

@isaaccorley
Copy link
Collaborator Author

@ashnair1 Thanks for pointing that out. Totally missed it.

@adamjstewart adamjstewart added this to the 0.3.1 milestone Sep 5, 2022
@isaaccorley isaaccorley changed the title Fix IDTReeS bugs IDTReeS - Clip boxes outside of image bounds Sep 5, 2022
weiji14
weiji14 previously approved these changes Sep 6, 2022
@isaaccorley
Copy link
Collaborator Author

Created generic torchgeo.datasets.utils.filter_boxes function in case this occurs in other object detection datasets. Also refactored to not hardcode the image size but to use the current sample image (h, w).

torchgeo/datasets/idtrees.py Outdated Show resolved Hide resolved
torchgeo/datasets/idtrees.py Outdated Show resolved Hide resolved
@isaaccorley isaaccorley requested review from weiji14 and adamjstewart and removed request for weiji14 September 6, 2022 15:46
@isaaccorley isaaccorley marked this pull request as ready for review September 6, 2022 17:47
weiji14
weiji14 previously approved these changes Sep 6, 2022
Copy link
Contributor

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code logic and docstrings seems to be sound. Did you check the image and box alignment? I'm not quite sure how to interpret the sample image you posted at #760 (comment), the colors look a bit weird 😅

@isaaccorley
Copy link
Collaborator Author

isaaccorley commented Sep 6, 2022

Code logic and docstrings seems to be sound. Did you check the image and box alignment? I'm not quite sure how to interpret the sample image you posted at #760 (comment), the colors look a bit weird 😅

Yes. I don't select the colors. torchvision's draw_bounding_boxes will randomly select box colors based on the number of classes in the image since I don't provide any. See #684 for more info.

@isaaccorley
Copy link
Collaborator Author

isaaccorley commented Sep 6, 2022

Just wanted to add some notes for everyone's clarity on the dataset.

  • The test set - task 1 has no boxes or labels only RGB/HSI/CHM images and LAS point clouds
  • The test set - task 2 shapefiles (e.g. task2/ITC/test_MLBS.shp) contains the image filename the boxes belong to in the properties, e.g. below in properties["plotID"] the image file is MLBS_1.tif.
{
  'geometry': {'coordinates': [[(541886.6194198467, 4136580.9452913683),
                                 (541886.6194198467, 4136588.0890556537),
                                 (541891.3025542134, 4136588.0890556537),
                                 (541891.3025542134, 4136580.9452913683),
                                 (541886.6194198467, 4136580.9452913683)]],
                'type': 'Polygon'},
   'id': '0',
   'properties': OrderedDict([('indvdID', 'MLBSE00048'),
                              ('plotID', 'MLBS_1.tif')]),
   'type': 'Feature'
}
  • The train set shapefiles (e.g. train/ITC/train_MLBS.shp) contain the boxes but does not contain the image filename. However, it does contain a unique geometry id properties["id"].
{
  'geometry': {'coordinates': [[(542071.1868343, 4134996.0734410095),
                                 (542071.1868343, 4134999.0),
                                 (542074.96994003, 4134999.0),
                                 (542074.96994003, 4134996.0734410095),
                                 (542071.1868343, 4134996.0734410095)]],
                'type': 'Polygon'},
   'id': '0',
   'properties': OrderedDict([('id', 57), ('indvdID', 'MLBS01681')]),
   'type': 'Feature'
}

The image <-> geometry id mapping is inside train/Field/itc_rsFile.csv.

"id","indvdID","rsFile"
57,"MLBS01681","MLBS_1.tif"
45,"MLBS01679","MLBS_1.tif"
49,"MLBS01674","MLBS_1.tif"
63,"MLBS01678","MLBS_1.tif"
...

Note that the test set does not have any csv files only the train set. Because of this, the implementation for parsing the dataset can become quite verbose and requires different logic for the the train set, test set - task1, and test set - task2.

Hopefully this clarifies some of the implementation decisions.

@adamjstewart adamjstewart mentioned this pull request Sep 6, 2022
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I finally understand the code well enough to review this now!

So if I understand correctly, this PR only solves issue (3) in #684. Once this PR is merged, A) all bounding boxes will be clipped to the image bounds which will fix plotting, and B) bounding boxes won't be accidentally deleted from the testing set due to integer overlap.

My only change request is to replace the current id = len(features) + 1 logic with a counter i = 0, i += 1 so we aren't computing the length of a list in each iteration.

@isaaccorley
Copy link
Collaborator Author

Okay, I think I finally understand the code well enough to review this now!

So if I understand correctly, this PR only solves issue (3) in #684. Once this PR is merged, A) all bounding boxes will be clipped to the image bounds which will fix plotting, and B) bounding boxes won't be accidentally deleted from the testing set due to integer overlap.

My only change request is to replace the current id = len(features) + 1 logic with a counter i = 0, i += 1 so we aren't computing the length of a list in each iteration.

I've made the requested change. Also note that this PR doesn't just fix plotting but also allows a user to actually train on the dataset whereas before you would have issues dealing with boxes with negative values.

@adamjstewart adamjstewart merged commit 212d9b9 into microsoft:main Sep 7, 2022
@isaaccorley isaaccorley deleted the idtrees-bugs branch September 7, 2022 17:23
adamjstewart pushed a commit that referenced this pull request Sep 8, 2022
* reverse idtrees coords

* clip boxes to image bounds

* undo coordinate reversal

* add filter_boxes function

* filter boxes outside of bounds in idtrees

* format

* Revert utils.py

* flake8 fixes

* fix mypy errors

* fix bug overriding some labels

* fix image size

* Remove version added line

* add function overloads

* add comments for clarity

* use id counter for test set
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
* reverse idtrees coords

* clip boxes to image bounds

* undo coordinate reversal

* add filter_boxes function

* filter boxes outside of bounds in idtrees

* format

* Revert utils.py

* flake8 fixes

* fix mypy errors

* fix bug overriding some labels

* fix image size

* Remove version added line

* add function overloads

* add comments for clarity

* use id counter for test set
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants