memory leak? #33
Comments
I haven't seen this, but I'll definitely look into it. Is this a recent issue, i.e. do you think there has been a regression in memory management? To help reproduce the issue, can you let me know which version of Go and which operating system this was compiled with?
It's the first time I've used it on a huge object.
I've made some changes on master that will reduce the memory usage. Please test and let me know if it helped.

The main cause of the memory usage is the part size increases. These are necessary in order to reach the maximum S3 object size of 5 TB while remaining within the 10,000-part limit imposed by S3. The part size is doubled every 1000 parts if necessary. For your case of a 671 GB stream starting with the default part size of 20 MB, this means the part size will be 640 MB by the end of the upload. The best-case memory usage, ignoring memory used for anything other than part buffers, would be 640 MB * 11 buffers = 7 GB. In practice it may be less, since not all buffers may be used at the largest part size and they are only grown if necessary. To reduce memory usage for a large stream you can set the initial part size lower or reduce concurrency. There are some additional things that can be done to further reduce memory usage.
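To make the part-size arithmetic concrete, here is a small, self-contained Go sketch (illustrative only, not the s3gof3r code) that models the doubling scheme described above, using the 20 MB default, the 1000-part doubling interval, the 11-buffer count, and the 671 GB stream size from this thread:

```go
package main

import "fmt"

func main() {
	const mb = 1 << 20
	var (
		partSize  int64 = 20 * mb         // default initial part size from the comment above
		streamLen int64 = 671 * 1024 * mb // ~671 GB, the size reported in this issue
		uploaded  int64
		parts     int64
	)
	// Model of the scheme described above: double the part size every
	// 1000 parts so a 5 TB object fits within S3's 10,000-part limit.
	for uploaded < streamLen {
		uploaded += partSize
		parts++
		if parts%1000 == 0 {
			partSize *= 2
		}
	}
	fmt.Printf("parts: %d, final part size: %d MB\n", parts, partSize/mb)
	// Lower bound on buffer memory with ~11 part buffers in flight,
	// matching the 640 MB * 11 = ~7 GB figure above.
	fmt.Printf("minimum buffer memory: ~%d MB\n", 11*partSize/mb)
}
```

Under this model the part size reaches 640 MB shortly after the 600 GB mark, which reproduces the figure quoted above.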
The buffer pooling code has been completely rewritten in #34 to use byte slices directly instead of the bytes.Buffer type used previously. This reduces overall memory usage and GC pressure, and byte slices are only allocated as needed.
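For illustration, here is a minimal sketch of the byte-slice pooling idea, assuming a simple channel-backed free list; this is not the actual code from #34, just the general pattern of handing out plain []byte buffers and allocating only when the pool is empty:

```go
package pool

// slicePool is a hypothetical free list of equally sized byte slices.
type slicePool struct {
	free chan []byte
	size int
}

func newSlicePool(size, capacity int) *slicePool {
	return &slicePool{free: make(chan []byte, capacity), size: size}
}

// Get returns a pooled slice, allocating a new one only when the pool is empty.
func (p *slicePool) Get() []byte {
	select {
	case b := <-p.free:
		return b
	default:
		return make([]byte, p.size)
	}
}

// Put returns a slice to the pool, or drops it if the pool is full so
// the garbage collector can reclaim it.
func (p *slicePool) Put(b []byte) {
	select {
	case p.free <- b[:cap(b)]:
	default:
	}
}
```

Handing out plain byte slices this way avoids the extra bookkeeping of bytes.Buffer, and slices dropped from a full pool are simply reclaimed by the garbage collector.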
With #34 and the other changes, I believe this is fixed. Memory efficiency is about as good as possible with Go, including when slice growth is necessary. Closing for now. @hrez, can you test and let me know if you still see issues? Compiling with Go 1.3 may help as well, since there have been a number of GC-related improvements since Go 1.2.
Go 1.3 doesn't work on RHEL5; it complains about "kernel too old".
You're welcome, hope it helps. As for RHEL5 and Go 1.3, you can cross-compile a Linux binary with Go 1.3 (or grab it from the releases if you prefer) and it should run fine on RHEL5, since Go binaries have essentially no dependencies. I have run the gof3r CLI thousands of times per day on a legacy server running RHEL 5.4 with no observed issues.
I did the same upload with the 0.4.8 binary.
Thanks for testing. I'm surprised that the memory usage was that large, as I didn't see anything comparable in my tests of large files while making the buffering changes. Was this new upload the same size, 671 GB, and on default settings? I'll test again, as I wouldn't expect memory usage to exceed even a couple of GB for an upload of that size with 0.4.8.
Correct. Same size, default options, and the binary from https://github.com/rlmcpherson/s3gof3r/releases/download/v0.4.8/gof3r_0.4.8_linux_amd64.tar.gz
Thanks for the clarification. I'll retest to try to reproduce this and will profile to identify the memory issues.
If it's of any use, here are the process stats:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
What I said earlier was incorrect. At over 600 GB, starting at the default part size of 20 MB, the part size would have increased to 320 MB. At a minimum of 12 buffers (1 for the current write, 10 uploading, 1 preallocated in the buffer pool), the buffers alone will use 3840 MB.

I tested a 1 TB upload with default settings, and the maximum memory usage was stable at 6.8 GB after the last part size increase to 320 MB at the 600 GB threshold. While 6.8 GB is obviously larger than the theoretical minimum of 3.8 GB, most of the extra usage is due to the Go garbage collector being slow to return memory to the operating system after the previous-size buffers are garbage collected. I haven't found a good solution to this yet; calling debug.FreeOSMemory() actually seems to result in worse memory usage overall.

I don't have a good explanation for why you would see higher memory usage, but it could possibly be related to differences between your CentOS version and the Ubuntu 12.10 I was testing on. I may do some more profiling, but in general, for larger uploads, lower concurrency is recommended if memory is an issue.
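As a rough back-of-the-envelope aid (a hypothetical helper, not part of gof3r), the sketch below applies the buffer accounting described in this comment (one buffer for the current write, one per concurrent upload, and one preallocated in the pool) to show how lowering concurrency reduces the floor on buffer memory:

```go
package main

import "fmt"

// minBufferMB applies the accounting above: one buffer for the current
// write, one per concurrent upload, and one preallocated in the pool.
func minBufferMB(partSizeMB, concurrency int) int {
	return partSizeMB * (concurrency + 2)
}

func main() {
	// 320 MB parts with 10 concurrent uploads gives the 3840 MB
	// lower bound quoted above.
	fmt.Println(minBufferMB(320, 10))
	// Reducing concurrency shrinks the bound proportionally.
	fmt.Println(minBufferMB(320, 4))
}
```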
Closing, as memory usage was reduced by the change to slices and other mitigations, e.g. #62.
Apologies for resurrecting this issue, but we've been experiencing memory issues with gof3r. We've had a daily backup of a mysql db running for some time, and it has been slowly growing in size. The last successful backups are around 1.1 TB. Recently it's been failing on S3's 10k part count limit, so I tried bumping up the initial part size. The actual command being used is:
The machine has 16 GB of RAM, and baseline usage by other processes (mostly mysql) is only about 42% of that. This machine is running CentOS 6.8.
Hi,
Can something be done about memory consumption?
I've "gof3r put" with default options of what turned out to be 671Gb object.
It took 416m56.254s and gof3r memory utilization grew in the end to 21G virtual and 14G residential.
gof3r version 0.4.5