Hotfix for accuracy interfering with training #5987

Noiredd · 2017-10-16T14:39:26Z

Instead of reusing the "unused" memory of the bottom blob (its diff), it's safer to allocate an internal blob (gpu_buffer_) so that the layer does not touch external memory that might be used elsewhere (as was demonstrated in #5981). The bug was introduced in 62e0c85.

In the long run, we might want to optimize this, e.g. by allocating only GPU memory (instead of an entire blob), or even take a closer look at what exactly is going on in that buffer (i.e. how much memory we actually need). On the other hand, using the Accuracy layer during training is not very common, so we might leave this PR pending for now (for the affected users to pull) and only accept it after said optimizations.
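To illustrate the failure mode, here is a minimal toy sketch in plain Python rather than Caffe code. `Blob`, `split_backward`, `count_outranking`, and both accuracy classes are hypothetical stand-ins. The sketch assumes the split sums the diffs of all its outputs during backward, so any scratch values left in the accuracy branch's diff leak into the gradient.

```python
# Toy model only: every name here is a hypothetical stand-in, not Caffe's API.

class Blob:
    """Stand-in for a blob: equally sized data and diff arrays."""
    def __init__(self, data):
        self.data = list(data)
        self.diff = [0.0] * len(self.data)


def count_outranking(scores, label):
    """Per-class flags: 1.0 where a class scores higher than the true label."""
    return [1.0 if s > scores[label] else 0.0 for s in scores]


class AccuracyReusingDiff:
    """Pre-fix behaviour: borrows bottom.diff as scratch space."""
    def forward(self, bottom, label):
        bottom.diff[:] = count_outranking(bottom.data, label)  # clobbers diff
        return 1.0 if sum(bottom.diff) == 0 else 0.0


class AccuracyWithBuffer:
    """Behaviour proposed in this PR: scratch lives in a layer-owned buffer."""
    def __init__(self):
        self.buffer = []  # plays the role of gpu_buffer_

    def forward(self, bottom, label):
        self.buffer = count_outranking(bottom.data, label)
        return 1.0 if sum(self.buffer) == 0 else 0.0


def split_backward(tops):
    """Assumed split behaviour: the gradient is the sum of all top diffs."""
    return [sum(vals) for vals in zip(*(t.diff for t in tops))]


def run(accuracy_layer):
    scores = [0.1, 0.7, 0.2]
    to_loss, to_accuracy = Blob(scores), Blob(scores)  # two split outputs
    accuracy_layer.forward(to_accuracy, label=2)
    to_loss.diff = [0.5, -1.0, 0.5]  # gradient written by the loss branch
    return split_backward([to_loss, to_accuracy])
```

With the layer-owned buffer, `run(AccuracyWithBuffer())` returns the loss gradient unchanged. With `AccuracyReusingDiff()`, the scratch flags are added on top of it.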


Noiredd · 2017-10-16T18:28:48Z

@shaibagon Can we optimize this memory-wise in any way? I.e. do we need the intermediate blob to be as large as the whole bottom[0]? If not, I say we merge. @shelhamer?

ghost · 2017-11-12T21:31:04Z

Hi,
what is the progress on merging this fix? It has been almost a month that Caffe has not been able to calculate/display accuracy statistics in the training phase.

shelhamer · 2018-01-29T01:31:04Z

@Noiredd see #6202 for a combined fix to the scratch usage of bottom diffs. I decided to clear the diffs in that fix instead of needing separate memory. Please let me know what you think of that choice.
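The alternative described above can be sketched the same way. This is a toy Python model, not the code in #6202: `Blob` and `AccuracyClearingDiff` are hypothetical names, and the sketch assumes the layer zeroes the borrowed diff at the end of its forward pass.

```python
# Toy model only: names are hypothetical stand-ins, not Caffe's API.

class Blob:
    """Stand-in for a blob: equally sized data and diff arrays."""
    def __init__(self, data):
        self.data = list(data)
        self.diff = [0.0] * len(self.data)


class AccuracyClearingDiff:
    """Borrows bottom.diff as scratch, then zeroes it before returning."""
    def forward(self, bottom, label):
        true_score = bottom.data[label]
        for i, s in enumerate(bottom.data):
            # Scratch use: flag classes that outrank the true label.
            bottom.diff[i] = 1.0 if s > true_score else 0.0
        correct = 1.0 if sum(bottom.diff) == 0 else 0.0
        for i in range(len(bottom.diff)):
            bottom.diff[i] = 0.0  # clear the scratch; no extra memory needed
        return correct


blob = Blob([0.1, 0.7, 0.2])
accuracy = AccuracyClearingDiff().forward(blob, label=2)
```

The trade-off in this sketch is an extra pass over the diff on every forward call in exchange for not allocating a second buffer the size of the bottom blob.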

Hotfix for accuracy interfering with training: internal buffer added instead of reusing bottom blob

ddc0587

Noiredd mentioned this pull request Oct 16, 2017

New Accuracy Layer on GPU interferes with training #5981

Closed

shelhamer added the bug label Oct 16, 2017

shaibagon mentioned this pull request Nov 22, 2017

Fixed a bug in AccuracyLayer. #6066

Closed

Noiredd mentioned this pull request Jan 3, 2018

Put the acc_data in a new syncedmemory block #6141

Closed

duygusar mentioned this pull request Jan 11, 2018

Loss output increases and always stops at 87.3365 while learning in GPU-mode, however it decreases and I can get quite good accuracy in CPU-mode. #6130

Closed

BVLC deleted a comment from duygusar Jan 29, 2018

shelhamer mentioned this pull request Jan 29, 2018

Clear Scratch Diffs to Prevent Contaminating Backward through Splits #6202

Merged

shelhamer closed this in #6202 Jan 29, 2018

Noiredd deleted the accuracy-hotfix branch February 12, 2018 16:48
