I am trying to use the trained models provided by the authors. However, I found that the detection output locations are wrong and, in many cases, outside the image range!
To investigate further, I used the detection code in examples and compared the results of Wei's original SSD implementation with the new SSD model with ResNet101 introduced here. I tested the same image used in the examples (examples/images/fish-bike.jpg).
With the old SSD code and the VGG model I get correct results:
[0.028087676, 0.23656183, 0.88743579, 0.95228869, 2, 0.82035357, u'bicycle']
[0.42205709, 0.026113272, 0.70970505, 0.51584023, 15, 0.99626094, u'person']
But with the new SSD model in this repository I get:
[3.6806207, 3.2651825, 3.9601259, 3.7642479, 1, 0.99327165, u'person']
[1.065011, 0.64031565, 1.3344085, 1.142953, 1, 0.9836536, u'person']
[255.99077, 256.33829, 256.99084, 256.97342, 2, 0.7926603, u'bicycle']
[14.477366, 14.709455, 15.462966, 15.438332, 2, 0.69000566, u'bicycle']
The first 4 numbers are the normalized detection locations [x_min, y_min, x_max, y_max], so after multiplying x by the image width and y by the image height (481 and 323 in this case), I should get bbox locations inside the tested image. This is the case with the original SSD models, where I get the correct locations:
[14, 76, 427, 308]
[203, 8, 341, 167]
but with the new SSD-ResNet model introduced here, I get the locations:
[1770, 1055, 1905, 1216]
[512, 207, 642, 369]
[123132, 82797, 123613, 83002]
[6964, 4751, 7438, 4987]
which puts the bboxes outside the image! This is clearly a problem, since normalized positions should not exceed 1.0. Note that the same problem occurs with the DSSD model.
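The conversion described above can be sketched as follows. This is a minimal illustration, assuming rounding to the nearest pixel (which reproduces the numbers quoted above); the helper names `to_pixels` and `is_normalized` are hypothetical and not part of the SSD code.

```python
def to_pixels(box, width, height):
    """Scale a normalized [x_min, y_min, x_max, y_max] box to pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return [int(round(x_min * width)), int(round(y_min * height)),
            int(round(x_max * width)), int(round(y_max * height))]


def is_normalized(box):
    """Return True if every coordinate lies in [0, 1]."""
    return all(0.0 <= v <= 1.0 for v in box)


width, height = 481, 323  # fish-bike.jpg, as stated in the post

vgg_box = [0.028087676, 0.23656183, 0.88743579, 0.95228869]  # bicycle, original SSD
resnet_box = [255.99077, 256.33829, 256.99084, 256.97342]    # bicycle, SSD-ResNet

print(to_pixels(vgg_box, width, height), is_normalized(vgg_box))
print(to_pixels(resnet_box, width, height), is_normalized(resnet_box))
```

A check like `is_normalized` is a quick way to flag the faulty outputs before drawing anything.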
Thank you.
I discovered the problem.
In deploy.prototxt, replace each current offset with the offset expressed as a fraction of the step (offset/step).
In the old SSD code this was 0.5 for all layers; here, however, it differs between layers. The new offsets will therefore be [0.31, 0.156, 0.328, 0.41, 0.457, 0.478, 0.5] instead of [2.5, 2.5, 10.5, 26.5, 58.5, 122.5, 256.5].
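A sketch of this conversion, under assumptions that are not stated in the post: the per-layer steps (back-computed from the two offset lists above, giving 8, 16, 32, 64, 128, 256, 513) and a 513-pixel network input. It also assumes the prior box center is computed as (index + offset) * step / input size; under that assumption, a pixel-valued offset pushes the centers far past 1.0.

```python
# Offsets currently in deploy.prototxt (from the post), in pixels.
pixel_offsets = [2.5, 2.5, 10.5, 26.5, 58.5, 122.5, 256.5]

# ASSUMPTION: steps are inferred from the post's two offset lists,
# not read from deploy.prototxt. Check them against your own file.
steps = [8, 16, 32, 64, 128, 256, 513]

# ASSUMPTION: network input size in pixels.
input_size = 513

# Offset expressed as a fraction of the step, as the fix describes.
fractional_offsets = [off / step for off, step in zip(pixel_offsets, steps)]


def prior_center(index, offset, step, size=input_size):
    """Normalized center of the prior at `index`, assuming
    center = (index + offset) * step / size."""
    return (index + offset) * step / size


for off, frac, step in zip(pixel_offsets, fractional_offsets, steps):
    before = prior_center(0, off, step)
    after = prior_center(0, frac, step)
    print(f"step={step:3d}  offset {off:5.1f} -> {frac:.3f}  "
          f"first center {before:8.3f} -> {after:.3f}")
```

With these assumed values the last layer's first center moves from 256.5 to 0.5, which is consistent with the bicycle box shifting from about 256 to about 0 after the fix.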
After doing this, the detection bboxes are in the right place but not as accurate as with the original SSD with VGG. For example, I found negative numbers in the bbox coordinates!
[0.42572773, 0.010289893, 0.70523298, 0.50935513, 1, 0.99327165, u'person'] --> [205, 3, 339, 165]
[-0.0092230439, 0.3382884, 0.99083799, 0.97341633, 2, 0.7926603, u'bicycle'] --> [-4, 109, 477, 314]
and the image is like this: (image not preserved)
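If slightly negative coordinates are a problem downstream, one common workaround (not something the post or the authors prescribe) is to clip normalized boxes to [0, 1] before scaling. A minimal sketch, with the hypothetical helper `clip_box`:

```python
def clip_box(box):
    """Clamp each normalized coordinate to the [0, 1] range."""
    return [min(max(v, 0.0), 1.0) for v in box]


width, height = 481, 323  # fish-bike.jpg, as stated in the post
bicycle = [-0.0092230439, 0.3382884, 0.99083799, 0.97341633]

x_min, y_min, x_max, y_max = clip_box(bicycle)
pixels = [int(round(x_min * width)), int(round(y_min * height)),
          int(round(x_max * width)), int(round(y_max * height))]
print(pixels)
```

Clipping only hides the symptom; it does not explain why the boxes are less accurate than with the VGG model.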