Getting malloc(): memory corruption #185

Closed
chuckcho opened this issue Aug 3, 2017 · 14 comments

@chuckcho

chuckcho commented Aug 3, 2017

Issue summary

It built without a problem, but when I run it on sample images it complains about memory corruption. Please advise.

Executed command (if any)

distribute/bin/openpose.bin --image_dir /home/ubuntu/tmp/frames --write_images /home/ubuntu/tmp/output/frames --no_display --write_keypoint_json /home/ubuntu/tmp/output/  --logging_level 0

OpenPose output (if any)

Starting pose estimation demo.
src/openpose/utilities/flagsToOpenPose.cpp:flagsToProducer():95
src/openpose/utilities/flagsToOpenPose.cpp:flagsToProducerType():66
src/openpose/utilities/flagsToOpenPose.cpp:flagsToPoseModel():14
src/openpose/utilities/flagsToOpenPose.cpp:flagsToScaleMode():38
examples/openpose/openpose.cpp:openPoseDemo():195
Configuring OpenPose wrapper. In examples/openpose/openpose.cpp:openPoseDemo():198
./include/openpose/wrapper/wrapper.hpp:configure():432
Auto-detecting GPUs... Detected 1 GPU(s), using them all.
./include/openpose/wrapper/wrapper.hpp:configure():612
./include/openpose/wrapper/wrapper.hpp:configure():838
Starting thread(s)
./include/openpose/wrapper/wrapper.hpp:configureThreadManager():1146
./include/openpose/thread/threadManager.hpp:exec():163
./include/openpose/thread/queueBase.hpp:addPusher():360
./include/openpose/thread/queueBase.hpp:addPusher():360
./include/openpose/thread/threadManager.hpp:exec():168
./include/openpose/thread/thread.hpp:startInThread():138
./include/openpose/thread/thread.hpp:startInThread():138
./include/openpose/thread/thread.hpp:threadFunction():182
./include/openpose/thread/thread.hpp:threadFunction():185
./include/openpose/thread/thread.hpp:threadFunction():182
./include/openpose/thread/thread.hpp:threadFunction():185
./include/openpose/thread/thread.hpp:threadFunction():182
Starting initialization on thread. In src/openpose/pose/poseExtractorCaffe.cpp:netInitializationOnThread():44
Finished initialization on thread. In src/openpose/pose/poseExtractorCaffe.cpp:netInitializationOnThread():67
Starting initialization on thread. In src/openpose/pose/poseRenderer.cpp:initializationOnThread():81
Finished initialization on thread. In src/openpose/pose/poseRenderer.cpp:initializationOnThread():88
./include/openpose/thread/thread.hpp:threadFunction():185
*** Error in `distribute/bin/openpose.bin': malloc(): memory corruption (fast): 0x0000000000a2ee90 ***
Aborted (core dumped)

Type of issue

  • Execution error

Your system configuration

Operating system
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty

CUDA version
CUDA Version 8.0.61

cuDNN version:
#define CUDNN_MAJOR 6
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 21

GPU model
Tesla K80

Caffe version
Default from OpenPose

OpenCV version
OpenCV 3.3.0 built w/ CMake Compiler + gcc v4.8.4

@gineshidalgo99
Member

gineshidalgo99 commented Aug 4, 2017

Please try OpenCV 2.4 to see whether it's an OpenPose-OpenCV error.

gineshidalgo99 added the "help wanted/question" label on Aug 4, 2017
@chuckcho
Author

chuckcho commented Aug 8, 2017 via email

@lvboodvl

Try running it with "sudo".

@swframe

swframe commented Aug 22, 2017

Also try with valgrind but note it can take around 24 hours to crash because valgrind slows things down a lot.

@baardkrk

baardkrk commented Aug 24, 2017

Hello! I think I have the same problem as you. I ran valgrind, but I'm not really proficient at reading the output. Can you see anything wrong, apart from valgrind crashing for some reason? (I haven't tried compiling with OpenCV 2 yet, but that is the next step.)
valgrind-output.txt
It seems to me that it has something to do with op::floatPtrToUCharCvMat. See "Invalid write of size 1" at line 106 in the output.

@swframe

swframe commented Aug 24, 2017

I agree. You're seeing the same error. I'm not sure yet whether it's related to the malloc crash; there could be additional problems. Unfortunately, I don't know OpenCV and C++ well enough to spot the problem. The code looks fine, but it is worrying that it writes through a raw pointer instead of using an accessor method.

I have a few ideas that I will look into in the next few days:

  1. Create a smaller version of the problem so I can figure out how to fix it. (I need to review the opencv API and examples).
  2. Review the code with someone who knows opencv better to see if they can identify the problem.
    (see http://answers.opencv.org/question/172949/help-finding-memory-corruption-in-code-that-writes-to-a-cvmat/)
  3. Stop the loop early, to prevent the valgrind error in order to see if there are other problems that need to be fixed.

@swframe

swframe commented Aug 27, 2017

Some progress but not enough.

I have not tried '1' yet.
I have not gotten a response from anyone for '2'.
I tried '3'. The code copies a float image to a cv::Mat, and I changed it so it doesn't copy the last row and column. OpenPose now gets further, but it stops due to another memory corruption. Unfortunately, the valgrind error was very cryptic. I ran it in gdb and got a better stack trace. There is another memory corruption in include/openpose/core/wOpOutputToCvMat.hpp.
I tried a few things and decided to just comment out the body of void WOpOutputToCvMat<TDatums>::work(TDatums& tDatums).

That got the process to run further, but it crashes again. That last crash looks like it happened when the process was exiting. I did get an OpenPose window and it drew a bunch of pictures quickly. Unfortunately, I didn't see any poses. I think the code called from WOpOutputToCvMat needs to be fixed instead of commented out.

That sounds bad, but I hope that these two code areas are the main problem:
op::floatPtrToUCharCvMat
op::WOpOutputToCvMat

Both functions access a cv::Mat. Given the checks in the code, I suspect that either the memory wasn't allocated properly or accessing the raw internal data pointer no longer works as expected.
I will try option '1' to see if I can rewrite the code.

@swframe

swframe commented Aug 28, 2017

Yeah! I found the major problem:
The calls to cv::Mat{rows, cols, type} don't actually initialize the instance as expected.
I'm not sure why. Maybe we've not set the compiler options correctly.

The fix is to use cv::Mat(rows, cols, type) instead.
In src/openpose/utilities/openCv.cpp, function floatPtrToUCharCvMat,
I changed it to cv::Mat(resolutionSize.y, resolutionSize.x, CV_8UC3).

I also changed src/openpose/core/opOutputToCvMat.cpp OpOutputToCvMat::formatToCvMat
so the local variable cvMat is initialized
cv::Mat cvMat(mOutputResolution.y, mOutputResolution.x, CV_8UC3);

Finally, in src/openpose/utilities/openCv.cpp function floatPtrToUCharCvMat
I also changed *(cvMat.ptr<uchar>(y) + x*resolutionChannels + c) = value; to
*(cvMat.ptr<uchar>(y, x) + c) = value; which should be much safer.
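
To illustrate the indexing idea behind that last change, here is a minimal sketch that does not use OpenCV. `ToyImage`, `pixelPtr`, and `planarFloatToImage` are hypothetical names invented for this example, and the planar float layout (channel, then row, then column) is an assumption, not something confirmed from the OpenPose source. The point is only that asking the image for the address of pixel (y, x) keeps the per-pixel stride in one place, instead of repeating `x * channels` arithmetic at every write.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interleaved 8-bit image; a stand-in for illustration, not cv::Mat.
struct ToyImage {
    int rows, cols, channels;
    std::vector<unsigned char> data;

    ToyImage(int r, int c, int ch)
        : rows(r), cols(c), channels(ch),
          data(static_cast<std::size_t>(r) * c * ch, 0) {}

    // Pointer to the first channel of pixel (y, x); bounds are checked in debug builds.
    unsigned char* pixelPtr(int y, int x) {
        assert(y >= 0 && y < rows && x >= 0 && x < cols);
        return data.data() + (static_cast<std::size_t>(y) * cols + x) * channels;
    }
};

// Copy a planar float buffer (assumed layout: channel, then row, then column)
// into the interleaved image, clamping each value to [0, 255].
void planarFloatToImage(const std::vector<float>& src, ToyImage& dst) {
    const std::size_t planeSize = static_cast<std::size_t>(dst.rows) * dst.cols;
    assert(src.size() == planeSize * dst.channels);
    for (int c = 0; c < dst.channels; ++c) {
        for (int y = 0; y < dst.rows; ++y) {
            for (int x = 0; x < dst.cols; ++x) {
                const float v = src[c * planeSize + static_cast<std::size_t>(y) * dst.cols + x];
                const float clamped = v < 0.f ? 0.f : (v > 255.f ? 255.f : v);
                // The stride arithmetic lives inside pixelPtr, not at the call site.
                dst.pixelPtr(y, x)[c] = static_cast<unsigned char>(clamped);
            }
        }
    }
}
```

Because the destination size comes from the image object itself, a wrongly constructed image fails the size assertion up front rather than being written past its end.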

I still get a crash when the executable exits in google::protobuf::internal::DestroyDefaultRepeatedFields.
I will have a look at it later. It is called by an on-exit clean up function.

I suspect there might be more compiler related problems but at least we know where to look.

@gineshidalgo99
Member

gineshidalgo99 commented Aug 28, 2017

(I'm sorry it took me some hours to do the fix based on @swframe's answer.)

Thank you @swframe and everyone else for your feedback! I've just pushed a fix based on @swframe's message:

  1. cv::Mat{} replaced by cv::Mat(). I have actually replaced it throughout OpenPose, since it was used in most modules. Let me know if I have missed any cv::Mat. Reason: OpenCV added a new array-style initialization to its cv::Mat class, which overrides the previous behavior of cv::Mat{} and provokes this error.
  2. src/openpose/core/opOutputToCvMat.cpp: rather than changing it there, I fixed it inside floatPtrToUCharCvMat, so that people who call that function directly through the API do not hit this mistake either. It should be equivalent to your solution, but let me know if it fails (it works for me).
  3. *(cvMat.ptr<uchar>(y) + x*resolutionChannels + c) = value;: I've written a more efficient version, but it will not be as safe as yours. Speed is crucial in a multi-GPU setting, so I prefer speed over clarity/safety here. It should still work fine, and it's faster and clearer than the original one.
  4. About your pending crash in protobuf: I'll be waiting for any further feedback. Thank you again!

I cannot replicate it at the moment since I have a different OpenCV version and I cannot uninstall it right now... So any further feed-back is welcome!

Please, let me know if it works now. Thanks!
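
For readers wondering why braces and parentheses can behave differently, here is a minimal sketch of the general C++ rule, without OpenCV. `ToyMat` and the two helper functions are hypothetical names for illustration only, and the sketch assumes the class has gained a `std::initializer_list` constructor, which is how brace initialization can end up taking a different path. When such a constructor exists and the braced arguments fit it, C++ prefers it over the (rows, cols, type) constructor.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for a matrix class; this is NOT cv::Mat.
struct ToyMat {
    int rows = 0;
    int cols = 0;
    int type = 0;
    bool builtFromList = false;

    // "Classic" constructor: describe a rows x cols matrix of the given type.
    ToyMat(int r, int c, int t) : rows(r), cols(c), type(t) {}

    // Constructor added later: treat the arguments as element values,
    // producing an N x 1 matrix with one row per listed value.
    ToyMat(std::initializer_list<int> values)
        : rows(static_cast<int>(values.size())), cols(1), builtFromList(true) {}
};

// Parentheses select the (rows, cols, type) constructor.
ToyMat makeWithParens(int rows, int cols, int type) {
    return ToyMat(rows, cols, type);
}

// Braces prefer the initializer_list constructor when one is viable,
// so the three numbers become three elements instead of a shape.
ToyMat makeWithBraces(int rows, int cols, int type) {
    return ToyMat{rows, cols, type};
}
```

In this toy class, `makeWithParens(480, 640, 16)` describes a 480 x 640 matrix, while `makeWithBraces(480, 640, 16)` describes a 3 x 1 matrix. Code that then writes 480 x 640 pixels into the latter would run far past the end of its buffer, which is consistent with the kind of heap corruption reported in this thread.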

@baardkrk

Thank you very much for your help! I've pulled the new version and compiled it. The example works (tried with face/hands in video, as well as with the example media).
However, I believe I still get the same error as @swframe, as it crashes on exit. I'm continuing to investigate.

I don't know if this is relevant, but the issue that got fixed (using parentheses instead of curly braces) is also present in any other program when compiling with g++ without adding the flag -std=c++11 or -std=c++0x. However, I tried recompiling OpenCV with ENABLE_CXX11 and it had no effect. -std=c++11 is already in the 'COMMON_FLAGS' parameter in OpenPose, so I don't think the issue is there either.

@gineshidalgo99
Member

Good to hear that. If you find the reason for the crash on exit, just let me know and I'll fix that too. Since I cannot debug it myself, I have no idea where it comes from. Thanks!

@swframe

swframe commented Aug 29, 2017

FYI: I googled the protobuf on-exit crash. It happens when a proto message has a field that references another message and that referenced message is freed twice. There is a better way to allocate the referenced message. Unfortunately, I was not able to find the code that does the improper allocation; it seems to be in Caffe. I will see if I can enable additional logging (or if valgrind can help).

@gineshidalgo99
Member

OK, thanks. The interesting part is that it only happens with some compiled OpenCV versions; it did not happen with, e.g., the default apt-get libopencv-dev. So it might be that OpenCV also uses it, and that's why both OpenCV and Caffe try to free it?

gineshidalgo99 added the "3rd party (unsupported - might not reply)", "duplicate", and "issue template not followed: read posting rules..." labels and removed the "help wanted/question" label on Dec 18, 2017
@gineshidalgo99
Member

Duplicate of #68. Fixed there.
