Bug and Fix: Memory Leakage #41

Merged
merged 2 commits into from
Dec 9, 2020

Conversation

@jhliu17 jhliu17 commented Dec 8, 2020

Bug details:
When I extract RoI features from my own dataset (a large one with more than 50k images), extract_features.py keeps allocating memory without releasing it and eventually exceeds the memory limit. The main cause appears to be Ray's memory management, which holds on to memory until the task nodes are deleted. As a result, the current way of generating the npz files is infeasible for large datasets and leaks memory.

Solution:
I reorganized the save mechanism so that memory usage stays under control. Speed is not hurt: it reaches 6.48 it/s on average, which is also faster than before.
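
The diff itself is not reproduced in this conversation, so the following is only a minimal sketch of the general idea of saving each result as soon as it is ready and capping the number of pending tasks, rather than holding every result until the end. It uses Python's standard `concurrent.futures` as a stand-in for Ray (Ray's `ray.wait` plays a comparable role), writes JSON instead of npz to stay dependency-free, and all names (`extract_features`, `extract_and_save`, `max_in_flight`) are hypothetical.

```python
import json
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def extract_features(image_id):
    """Hypothetical stand-in for per-image RoI feature extraction."""
    return image_id, [float(image_id)] * 8


def _save(done, out_dir):
    """Write each finished result to disk and return the written paths."""
    paths = []
    for future in done:
        image_id, features = future.result()
        path = os.path.join(out_dir, f"{image_id}.json")
        with open(path, "w") as f:
            json.dump({"image_id": image_id, "x": features}, f)
        paths.append(path)
    return paths


def extract_and_save(image_ids, out_dir, max_in_flight=4, workers=2):
    """Save results as they finish; never keep more than max_in_flight tasks pending."""
    pending = set()
    peak_pending = 0
    saved = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for image_id in image_ids:
            pending.add(pool.submit(extract_features, image_id))
            peak_pending = max(peak_pending, len(pending))
            if len(pending) >= max_in_flight:
                # Block until at least one task finishes, then save and drop it.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                saved.extend(_save(done, out_dir))
        # Drain whatever is still running.
        done, _ = wait(pending)
        saved.extend(_save(done, out_dir))
    return saved, peak_pending
```

Because finished futures are saved and dropped immediately, the number of results held in memory is bounded by `max_in_flight` regardless of dataset size.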


@Zoroaster97 Zoroaster97 left a comment

Thank you, but there are syntax errors on lines 119, 128, and 152 (unmatched parentheses).

@Zoroaster97

Thanks for raising this. In our current method, too few CPUs will indeed cause a memory leak, because the processing bottleneck is on the CPU (in NMS). On our machine, which has 32 CPU cores and 4 TITAN V GPUs, memory usage is stable because the computing capability of the CPUs matches that of the GPUs. When we limit the number of CPUs to 8, the mismatch in computing capability leads to the memory leak, possibly because waiting queues pile up in memory.
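
To illustrate the queue-stacking explanation, here is a minimal sketch (not the repository's code) of a producer that outpaces a slower consumer. With a bounded `queue.Queue`, the producer blocks once the queue is full, so the backlog cannot grow without limit; with an unbounded queue it would keep growing for as long as the producer stays ahead. The function name `run_pipeline` and the doubling "work" are hypothetical placeholders for GPU inference and CPU-side NMS.

```python
import queue
import threading
import time


def run_pipeline(num_items, max_queue_size):
    """Run a producer/consumer pair over a bounded queue.

    Returns (processed_items, peak_backlog_seen_by_producer).
    """
    q = queue.Queue(maxsize=max_queue_size)
    processed = []

    def consumer():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work
                break
            time.sleep(0.001)  # stand-in for slow CPU-side post-processing
            processed.append(item * 2)

    worker = threading.Thread(target=consumer)
    worker.start()

    peak = 0
    for i in range(num_items):
        q.put(i)  # blocks while the queue is full (backpressure)
        peak = max(peak, q.qsize())
    q.put(None)
    worker.join()
    return processed, peak
```

For example, `run_pipeline(100, 8)` processes all 100 items while the backlog never exceeds 8.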

@Zoroaster97

With 32 CPUs & 4 GPUs, 16 CPUs & 2 GPUs, and 8 CPUs & 1 GPU, the speed is about 15 it/s, 7.4 it/s, and 3.7 it/s respectively, with no memory leak. So if you have enough CPU cores, or match the number of CPUs to the number of GPUs appropriately, the current method can be fast and safe. Unfortunately, an appropriate match cannot be detected automatically, so we have decided to keep your solution as a safe fallback. Thanks again!
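
The three configurations above can be normalized per GPU with a few lines of arithmetic. The numbers are the ones quoted in this comment; the helper name `per_gpu_stats` is only illustrative.

```python
# Configurations and approximate speeds as reported in the comment above.
configs = [
    {"cpus": 32, "gpus": 4, "it_per_s": 15.0},
    {"cpus": 16, "gpus": 2, "it_per_s": 7.4},
    {"cpus": 8, "gpus": 1, "it_per_s": 3.7},
]


def per_gpu_stats(config):
    """Return CPU cores per GPU and throughput per GPU for one configuration."""
    return {
        "cpus_per_gpu": config["cpus"] / config["gpus"],
        "it_per_s_per_gpu": config["it_per_s"] / config["gpus"],
    }


stats = [per_gpu_stats(c) for c in configs]
for config, s in zip(configs, stats):
    print(
        f'{config["cpus"]} CPUs & {config["gpus"]} GPUs: '
        f'{s["cpus_per_gpu"]:.1f} CPUs/GPU, '
        f'{s["it_per_s_per_gpu"]:.2f} it/s per GPU'
    )
```

All three stable configurations use 8 CPU cores per GPU and deliver roughly 3.7 it/s per GPU, which fits the observation that throughput scales with matched CPU and GPU capacity.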

jhliu17 commented Dec 9, 2020

😊 Thanks for your experiments, which verify that the real cause of the memory leak is the mismatch in computing capability. In my extraction process I used 4 CPUs & 3 GPUs, which is why memory usage grew so quickly.

unmatched parentheses

@Zoroaster97 Zoroaster97 left a comment

fix possible memory leakage problem
