Too much CPU used when using multipart - should have a way to throttle upload speed ? #32
If you want to throttle upload speed (and you're in control of the machine, and it's running some flavor of Linux, etc.), take a look at `/sbin/tc`. It's not the most user-friendly tool out there, but it's very powerful. With a little bit of scripting you can run it before you start the glacier upload, and it's probably the most effective way to throttle your bandwidth. For some inspiration, here's the relevant portion from the script I use:

```sh
TC=/sbin/tc
IF=eth0
REGION="us-east-1"
IP=`dig +short +answer "glacier.${REGION}.amazonaws.com" A | grep -v '\.$' | tr '\n' ' '`
U32="$TC filter add dev $IF protocol ip parent 1:0 prio 1 u32"

$TC qdisc add dev $IF root handle 1: htb default 30
$TC class add dev $IF parent 1: classid 1:2 htb rate 200kbps
for ip in $IP; do
    $U32 match ip dst $ip/32 flowid 1:2
done
```

And to remove the filtering:

```sh
$TC qdisc del dev $IF root
```

The nice part is that this technique works for any application, not just the glacier command line tool.
Well, the solution by @gburca is cool and I think it should solve most of the problems, but we might still implement speed throttling once there are no more important bugs to solve, so let's leave this ticket open.
Looking back at this issue, I suspect it had to do with memory use rather than upload speed (the original upload code would use 4-5 times the block size, so files >100 MB would eat up 400-500 MB of RAM; not surprising a cloud host would baulk at such a resource demand). As for throttling upload speed: at the moment glacier-cmd supports only a single upload thread at a time (now that could be an enhancement: allowing multiple uploads in parallel), and it will use only as much bandwidth as the system allows. Beyond that I have no idea how to throttle speeds; I think this would have to be done in boto, which is where the data is actually sent out.
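To illustrate the memory point: the fix is to hold at most one part in RAM at a time instead of buffering multiples of the block size. The sketch below is hypothetical (the function name and part size are assumptions, not glacier-cmd's actual code), but it shows the shape of a memory-bounded multipart read loop:

```python
import os
import tempfile

# Illustrative default part size; glacier-cmd's real value may differ.
PART_SIZE = 16 * 1024 * 1024

def iter_parts(path, part_size=PART_SIZE):
    """Yield (offset, chunk) pairs, holding at most one part in RAM.

    Peak memory stays close to part_size regardless of file size,
    instead of growing with the file being uploaded.
    """
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            yield offset, chunk
            offset += len(chunk)
```

An uploader would consume this generator one part at a time, sending each chunk before requesting the next, so a multi-gigabyte archive never needs more than one part's worth of memory.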
Yup, since we are migrating to boto this will most probably have to be done in boto itself. Whether they will accept this, or even want it, I have no idea. I did a bit of research on the subject and it appears it can be done, but it seems a bit complicated. See http://stackoverflow.com/questions/456649/throttling-with-urllib2 and http://pastie.org/3120175. It also appears Twisted can do it, but I would rather not mix Twisted into the equation if we can do it on our own: http://twistedmatrix.com/documents/10.1.0/api/twisted.protocols.policies.ThrottlingFactory.html
I just had a quick look at the sources, and I think it'd be rather easy to implement, because basically what they do is "send some data, wait a bit, send some more data, wait again" so that the overall rate stays within a limit. We could do the same: send a part of the data, wait a bit, send another part. But then you're not really limiting the rate; you're sending in bursts, saturating your pipe part of the time and sending nothing the rest of the time.
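The "send, wait, send" approach above can be sketched as a small pacing helper. The class name and interface here are hypothetical, not from glacier-cmd or boto; rather than a fixed sleep, it sleeps only enough that the cumulative average stays at the target rate, which softens (though does not eliminate) the burstiness concern:

```python
import time

class RatePacer:
    """Pace chunked sends so average throughput stays near max_bps.

    Call wait(n) after sending n bytes; it sleeps just long enough
    that the bytes sent so far never run ahead of the target rate.
    """

    def __init__(self, max_bps):
        self.max_bps = float(max_bps)
        self.start = time.monotonic()
        self.sent = 0

    def wait(self, nbytes):
        self.sent += nbytes
        # Earliest time (since start) at which this many bytes is allowed:
        due = self.sent / self.max_bps
        elapsed = time.monotonic() - self.start
        if due > elapsed:
            time.sleep(due - elapsed)
```

An upload loop would call `pacer.wait(len(chunk))` after writing each chunk to the socket. Between calls the pipe is still saturated, which is exactly the burst trade-off described above; smaller chunks give smoother pacing at the cost of more overhead.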
An example in the docs will do. That's completely in line with the whole Linux philosophy of having one tool for the job, and in that fashion tc gives our users a lot more flexibility than we could ever provide.
This may not be easy to fix, but it's feedback, and feedback is never bad to give...
I'm using glacier-cmd-interface to upload from DreamHost shared hosting to Amazon. However, for files that are bigger than 100MB I get:
If the file is less than 100MB things are OK.
The process is killed while in:
So it may be that we are sending too much / too fast. I've tried to throttle CPU usage, but to no avail.
I would suggest adding a way to throttle the upload speed (as an option): I suppose it would fix this, and it would be useful for many people (you don't want a backup upload to take all the bandwidth...).
Probably not easy to implement, but who knows...
Since this library seems very useful, I thought it was worth reporting any issues I have! Thank you for this lib.