TestPowerGradientShiftZero, TestPowerGradient fail with certain boost #1252

Closed

shelhamer opened this issue Oct 10, 2014 · 14 comments

@shelhamer (Member)

The PowerLayer::Backward checks seem to fail with certain versions of boost on OS X / Ubuntu.

boost 1.55 passes, but boost 1.56 and 1.57 fail.

[ RUN      ] PowerLayerTest/0.TestPowerGradientShiftZero
./include/caffe/test/test_gradient_check_util.hpp:166: Failure
The difference between computed_gradient and estimated_gradient is 0.16171693801879883, which exceeds threshold_ * scale, where
computed_gradient evaluates to 6.6543664932250977,
estimated_gradient evaluates to 6.8160834312438965, and
threshold_ * scale evaluates to 0.068160831928253174.
debug: (top_id, top_data_id, blob_id, feat_id)=0,65,0,65; feat = 0.027440188452601433; objective+ = 0.55363553762435913; objective- = 0.41731387376785278
[  FAILED  ] PowerLayerTest/0.TestPowerGradientShiftZero, where TypeParam = caffe::FloatCPU (3 ms)

#######################################################################################################################
[ RUN      ] PowerLayerTest/1.TestPowerGradientShiftZero
./include/caffe/test/test_gradient_check_util.hpp:166: Failure
The difference between computed_gradient and estimated_gradient is 0.66645549482483268, which exceeds threshold_ * scale, where
computed_gradient evaluates to 9.0545713301684909,
estimated_gradient evaluates to 9.7210268249933236, and
threshold_ * scale evaluates to 0.097210268249933243.
debug: (top_id, top_data_id, blob_id, feat_id)=0,66,0,66; feat = 0.016829367669263143; objective+ = 0.48941214282974049; objective- = 0.29499160632987403
./include/caffe/test/test_gradient_check_util.hpp:166: Failure
The difference between computed_gradient and estimated_gradient is 0.48462038369962279, which exceeds threshold_ * scale, where
computed_gradient evaluates to 8.4754232528829139,
estimated_gradient evaluates to 8.9600436365825367, and
threshold_ * scale evaluates to 0.089600436365825362.
debug: (top_id, top_data_id, blob_id, feat_id)=0,71,0,71; feat = 0.01869104873835549; objective+ = 0.50171265479916738; objective- = 0.32251178206751663
./include/caffe/test/test_gradient_check_util.hpp:166: Failure
The difference between computed_gradient and estimated_gradient is 0.24489777273781588, which exceeds threshold_ * scale, where
computed_gradient evaluates to 7.3061654292184715,
estimated_gradient evaluates to 7.5510632019562873, and
threshold_ * scale evaluates to 0.075510632019562873.
debug: (top_id, top_data_id, blob_id, feat_id)=0,99,0,99; feat = 0.023657563288965969; objective+ = 0.53224239845290788; objective- = 0.38122113441378214
[  FAILED  ] PowerLayerTest/1.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleCPU (4 ms)

#######################################################################################################################
[ RUN      ] PowerLayerTest/1.TestPowerGradient
./include/caffe/test/test_gradient_check_util.hpp:166: Failure
The difference between computed_gradient and estimated_gradient is 1.206900511134485, which exceeds threshold_ * scale, where
computed_gradient evaluates to 10.15816285551514,
estimated_gradient evaluates to 11.365063366649625, and
threshold_ * scale evaluates to 0.11365063366649626.
debug: (top_id, top_data_id, blob_id, feat_id)=0,57,0,57; feat = 2.9055876775560447; objective+ = 0.46979585546340097; objective- = 0.24249458813040844
[  FAILED  ] PowerLayerTest/1.TestPowerGradient, where TypeParam = caffe::DoubleCPU (3 ms)
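(For reference, the checker's estimated_gradient is a central finite difference of the objective, so the numbers in each failure are related by

estimated_gradient = (objective+ - objective-) / (2 * stepsize)

Plugging in the first failure's values, with the 1e-2 stepsize these runs appear to use, gives (0.55363553762435913 - 0.41731387376785278) / 0.02 ≈ 6.81608, which matches the logged estimated_gradient up to float rounding; threshold_ * scale is then just 1% of that magnitude.)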
@mprat (Contributor) commented Nov 13, 2014

This failed for me with the native BLAS provided by OS X 10.9, so I tried OpenBLAS, and it gave me the same 3 errors. Does anyone have any suggestions for getting OpenBLAS to work?

@II-Matto

I built Caffe with MKL and also encountered such failures; to be specific, there were actually six failed tests. The boost library used is the newest version, 1.57.0, with Anaconda Python 2.7.

[----------] Global test environment tear-down
[==========] 838 tests from 169 test cases ran. (1664414 ms total)
[ PASSED ] 832 tests.
[ FAILED ] 6 tests, listed below:
[ FAILED ] PowerLayerTest/0.TestPowerGradientShiftZero, where TypeParam = caffe::FloatCPU
[ FAILED ] PowerLayerTest/1.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleCPU
[ FAILED ] PowerLayerTest/1.TestPowerGradient, where TypeParam = caffe::DoubleCPU
[ FAILED ] PowerLayerTest/2.TestPowerGradientShiftZero, where TypeParam = caffe::FloatGPU
[ FAILED ] PowerLayerTest/3.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleGPU
[ FAILED ] PowerLayerTest/3.TestPowerGradient, where TypeParam = caffe::DoubleGPU

Do the failures indicate that Caffe will not work correctly? How should I deal with them?

@mprat (Contributor) commented Nov 14, 2014

I also just tried compiling with MKL for all my libraries and I am still getting the errors in TestPowerGradient. @II-Matto, you have 6 errors and I have 3 because you are using GPU compilation and I am not.

@mprat (Contributor) commented Nov 14, 2014

I got it to work with MKL and Boost 1.55.

@geekan commented Nov 16, 2014

I've faced the same problem and solved it: I uninstalled Boost 1.56, installed Boost 1.55, and reinstalled Caffe, and all tests passed (with OpenBLAS).

@relh commented Jan 12, 2015

Still having the same errors with Boost 1.57; downgrading to 1.55 solved the problem.

@lou-k commented Jan 14, 2015

I think the BLAS issue here is a red herring; the tests passed for me with Atlas and Boost 1.55.

Boost 1.56 failed with both OpenBLAS and Atlas.

@relh commented Jan 14, 2015

Agreed, a boost problem then.

@svanschalkwyk

I'm getting it with Boost 1.54.0 on Ubuntu 14.04, with MKL from Intel C++ version 15.
Any other ideas?

@shelhamer shelhamer changed the title TestPowerGradientShiftZero, TestPowerGradient fail with vecLib on OS X TestPowerGradientShiftZero, TestPowerGradient fail with vecLib with certain boost Jan 20, 2015
@shelhamer shelhamer changed the title TestPowerGradientShiftZero, TestPowerGradient fail with vecLib with certain boost TestPowerGradientShiftZero, TestPowerGradient fail with certain boost Jan 20, 2015
@lazywei commented Jan 31, 2015

Confirming the same problem here: CentOS 6, Boost 1.57, mkl-203.

[----------] Global test environment tear-down
[==========] 838 tests from 169 test cases ran. (98290 ms total)
[  PASSED  ] 832 tests.
[  FAILED  ] 6 tests, listed below:
[  FAILED  ] PowerLayerTest/0.TestPowerGradientShiftZero, where TypeParam = caffe::FloatCPU
[  FAILED  ] PowerLayerTest/1.TestPowerGradient, where TypeParam = caffe::DoubleCPU
[  FAILED  ] PowerLayerTest/1.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleCPU
[  FAILED  ] PowerLayerTest/2.TestPowerGradientShiftZero, where TypeParam = caffe::FloatGPU
[  FAILED  ] PowerLayerTest/3.TestPowerGradient, where TypeParam = caffe::DoubleGPU
[  FAILED  ] PowerLayerTest/3.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleGPU

@dgolden1 (Contributor) commented Feb 3, 2015

@shelhamer, as users, should we be concerned about the test failures with Boost 1.57? Will Caffe give erroneous results? Or can we ignore the failures for now?

@blackyang

Same problem (6 failed tests) at first, on OS X 10.9.5 with Atlas, Boost 1.57, and Anaconda Python 2.7. After downgrading boost to 1.55 (everything else unchanged) and reinstalling Caffe, it works now.

@shelhamer (Member Author)

While I can't dismiss these numerical errors, the consolation is that they are isolated to PowerLayer, and quite rare at that: only 1-3 out of 120 elements are out of tolerance. So only models that define a POWER layer, or make the rare choice of the WITHIN_CHANNEL mode of the LRN layer, are at risk -- and even these might be fine. That said, these errors are worth resolving.

shelhamer added a commit to shelhamer/caffe that referenced this issue Feb 6, 2015
The gradient checker fails on certain elements of the PowerLayer checks,
but only 1-3 sometimes fail out of the 120 elements tested. This is not
due to any numerical issue in the PowerLayer, but the distribution of
the random inputs for the checks.

boost 1.56 switched the normal distribution RNG engine from Box-Muller
to Ziggurat.
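To make the RNG dependence concrete, here is a minimal sketch (an illustration, not Caffe's actual filler code; the seed 1701 mirrors the seed Caffe's tests conventionally use, taken here as an assumption). The same seeded engine yields a different Gaussian sample stream under boost <= 1.55 (Box-Muller) than under boost >= 1.56 (Ziggurat), so the gradient checks run on different random inputs.

```cpp
// Sketch only (assumption: Caffe's GaussianFiller draws through
// boost::normal_distribution). Compiled against boost <= 1.55 vs >= 1.56,
// the same seed prints different values, because normal_distribution
// switched from Box-Muller to the Ziggurat algorithm in 1.56.
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <iostream>

int main() {
  boost::mt19937 rng(1701);  // fixed seed, as in Caffe's tests (assumption)
  boost::random::normal_distribution<double> gauss(0.0, 1.0);  // mean 0, sigma 1
  for (int i = 0; i < 5; ++i) {
    std::cout << gauss(rng) << std::endl;  // stream depends on the boost version
  }
  return 0;
}
```

Same program, different boost, different inputs: tests that happened to avoid the PowerLayer's ill-conditioned region under 1.55 can land in it under 1.56/1.57.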
@shelhamer (Member Author)

I looked into this a little and @jeffdonahue was quick to note that the boost RNG is used by all the fillers regardless of mode -- and I found this boost thread on RNG which notes that the normal distribution RNG was rewritten for the 1.56 release. A little good old-fashioned hand calculation confirmed this is nothing more than a precision error, so #1840 fixes it by reducing the step size for the finite differencing.

There's no need to keep to boost 1.55.

(For those who like RNG, the switch was from Box-Muller to Ziggurat.)
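To see the precision error concretely, here is a standalone sketch (not Caffe code; the power 0.37 and the point x = 0.0274 are illustrative assumptions chosen to match the magnitude of the feat values logged above, not the test's actual parameters). A central difference of f(x) = x^p with fractional p is badly biased when the step h is comparable to x, and shrinking h restores agreement with the analytic derivative.

```cpp
// Standalone sketch (not Caffe code): why a finite-difference check on a
// PowerLayer-like f(x) = x^p fails near small x with step h = 1e-2 but
// passes with h = 1e-3. Parameters are illustrative assumptions.
#include <cmath>
#include <cstdio>

int main() {
  const double p = 0.37;    // a fractional power
  const double x = 0.0274;  // same order as the "feat" values in the logs
  const double analytic = p * std::pow(x, p - 1.0);  // f'(x) = p * x^(p-1)

  const double steps[] = {1e-2, 1e-3};
  for (double h : steps) {
    // Central difference, as in test_gradient_check_util.hpp
    const double estimated = (std::pow(x + h, p) - std::pow(x - h, p)) / (2.0 * h);
    std::printf("h = %g: estimated = %.6f, analytic = %.6f, relative error = %.3f%%\n",
                h, estimated, analytic,
                100.0 * std::fabs(estimated - analytic) / analytic);
  }
  return 0;
}
```

With h = 1e-2 the relative error is a couple of percent, past the checker's 1e-2 tolerance; with h = 1e-3 it drops well below it. The layer's computed gradient is exact; only the finite-difference reference is off.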

pannous pushed a commit to pannous/caffe that referenced this issue Feb 6, 2015
slayton58 pushed a commit to slayton58/caffe that referenced this issue Mar 4, 2015