
Parallel uploads #91

Open - wants to merge 25 commits into master from parallel_uploads
Conversation

wvmarle
Contributor

@wvmarle wvmarle commented Oct 23, 2012

Parallel uploads; upload resumption; download resumption. All in one go - should be able to apply this to master without conflicts.

Wouter van Marle added 18 commits October 14, 2012 21:06

- Switched to using archive_id as item key in the bookkeeping db; added an updatedb command for people that are upgrading, to convert their db; updated the readme to reflect the changes; some minor clean-up of the files.
- Download resumption stage 1: works mostly (hash check currently fails and needs work).
- Fix: it didn't work as it should.
- Implemented download resumption: if the flag --resume is given, glacier-cmd will check existing data in <out_file> and, if it matches, continue the download from where it was.
- Fixed upload of large files: mmap has a 2 GB limit; now mapping the relevant part of the file (either the part to be checked or the part to be uploaded) instead of attempting to map the complete file in one go.
- Merge commits (conflicts in glacier/GlacierWrapper.py, glacier/glaciercorecalls.py and glacier/glacier.py).
- Implemented parallel upload sessions: added the command line option --sessions to give the number of upload sessions to use. It is far from perfect: at the moment upload processes sometimes die (Amazon rejects our signature, as if the key is invalid?!) and then the whole thing hangs and needs to be killed manually. This needs a workaround.
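The --resume check described in the commit messages above can be sketched as follows. This is a hypothetical helper, not the actual glacier-cmd code: it assumes the bookkeeping db records how many bytes were already downloaded and a SHA-256 of them, whereas the real implementation works with Amazon's tree hashes.

```python
import hashlib
import os

def resume_offset(out_file, expected_hash, downloaded_bytes):
    """Offset to resume a download from: `downloaded_bytes` if the
    data already in out_file matches the recorded hash, else 0."""
    if not os.path.exists(out_file):
        return 0
    with open(out_file, "rb") as f:
        data = f.read(downloaded_bytes)
    if len(data) < downloaded_bytes:
        return 0  # file shorter than recorded: start over to be safe
    if hashlib.sha256(data).hexdigest() == expected_hash:
        return downloaded_bytes
    return 0  # hash mismatch: existing data is not ours, start over
```

On a mismatch the sketch restarts from zero; a real implementation could also re-verify per part and resume from the last good part boundary.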
@offlinehacker
Contributor

Will we ever migrate the upload/download process to boto? What are the plans? It has parallel upload support too.

@wvmarle
Contributor Author

wvmarle commented Oct 24, 2012

Interesting, I missed that part of boto. I will look into it; maybe it works better than my solution (I always get response errors).
It seems to have no progress updates, though; we may consider extending the class, or even lifting the code and amending it.

Wouter van Marle and others added 4 commits October 25, 2012 12:20

- Fix for updatedb: it didn't work as it should.
- Fix for Python <2.7.
- mmapping the file in parts now, instead of trying to mmap it completely. This fixed the upload of large files: mmap has a 2 GB limit, so only the relevant part of the file (either the part to be checked or the part to be uploaded) is mapped instead of attempting to map the complete file in one go.
- Hope to have improved reaction times on the connection (see issue uskudnik#71).
- Added resumption of downloads (untested). Download resumption stage 1 works mostly (hash check currently fails and needs work); then implemented download resumption fully: if the flag --resume is given, glacier-cmd will check existing data in <out_file> and, if it matches, continue the download from where it was.
- Implemented parallel upload sessions: added the command line option --sessions to give the number of upload sessions to use. It is far from perfect: at the moment upload processes sometimes die (Amazon rejects our signature, as if the key is invalid?!) and then the whole thing hangs and needs to be killed manually. This needs a workaround.
- Updated documentation to include the new options and commands.
- Merge of …acier-cmd-interface into parallel_uploads (conflicts: glacier/GlacierWrapper.py, glacier/glacier.py).
@SitronNO

I have pulled this branch using the following code:

git clone -b parallel_uploads git://github.com/wvmarle/amazon-glacier-cmd-interface.git amazon-glacier-cmd-interface_parallel_uploads

and then built it with:

cd amazon-glacier-cmd-interface_parallel_uploads/
sudo python setup.py install

However, it does not work:

$ glacier-cmd upload Test Privat/amazon/amazon_glacier_testfile.data --description "Random data"
Traceback (most recent call last):
  File "/usr/local/bin/glacier-cmd", line 9, in <module>
    load_entry_point('glacier==0.2dev', 'console_scripts', 'glacier-cmd')()
  File "/usr/local/lib/python2.7/dist-packages/glacier-0.2dev-py2.7.egg/glacier/glacier.py", line 811, in main
    args.func(args)
  File "/usr/local/lib/python2.7/dist-packages/glacier-0.2dev-py2.7.egg/glacier/glacier.py", line 147, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/glacier-0.2dev-py2.7.egg/glacier/glacier.py", line 302, in upload
    args.resume, args.sessions)
  File "/usr/local/lib/python2.7/dist-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 68, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 211, in glacier_connect_wrap
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 68, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 264, in sdb_connect_wrap
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 68, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 961, in upload
    part_size = self._check_part_size(part_size, total_size)
  File "/usr/local/lib/python2.7/dist-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 415, in _check_part_size
    part_size = self._next_power_of_2(total_size / (1024*1024*self.MAX_PARTS))
AttributeError: 'GlacierWrapper' object has no attribute 'MAX_PARTS'

Am I doing something wrong, or is there a bug somewhere?

@offlinehacker
Contributor

There's a bug. Please uncomment line 100 in GlacierWrapper.py

MAX_PARTS = 10000


@offlinehacker
Contributor

... and set it to 1000, or change the variable name on the next line.
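For context, the part-size calculation that the traceback points at works roughly like this. This is a sketch with names modeled on the trace (next_power_of_2, MAX_PARTS), not the actual GlacierWrapper code; Glacier multipart uploads allow at most 10,000 parts, so the minimum part size has to grow with the archive size:

```python
MAX_PARTS = 10000   # Glacier's limit on parts per multipart upload

def next_power_of_2(n):
    """Smallest power of two >= n (and at least 1)."""
    p = 1
    while p < n:
        p *= 2
    return p

def check_part_size(total_size_bytes):
    """Pick a part size in MB such that the archive fits in at
    most MAX_PARTS parts (round up to a power of two, as Glacier
    part sizes must be a power-of-two number of MB)."""
    return next_power_of_2(total_size_bytes // (1024 * 1024 * MAX_PARTS))
```

With MAX_PARTS undefined (commented out), the division raises the AttributeError seen above before any part size can be chosen.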


@wvmarle
Contributor Author

wvmarle commented Oct 26, 2012

Whoops - only a 0 was supposed to go, not that S. My bad!

Anyway it seems that the 10,000 parts should also work now?

@SitronNO

@wvmarle: Yes, both the current code and the code with MAX_PARTS = 10000 work; I have tested both. This code should be merged into the main branch, as it fixes the bug I reported.

Wouter van Marle added 2 commits October 29, 2012 20:25
Now it checks whether processes are still alive, and waits until the last upload process exits. Then in case the queue is not empty, a single process is created to clear up the work queue.
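The liveness check described in that commit message can be sketched roughly as follows (hypothetical names, not the actual GlacierWrapper code):

```python
import multiprocessing

def drain(queue):
    # Clean-up worker: consume whatever parts are still queued.
    # In the real tool this is where the part upload would happen.
    while not queue.empty():
        queue.get()

def run_sessions(parts, n_sessions, worker):
    """Start n_sessions worker processes over a shared work queue,
    wait until the last one exits, then clear any leftover work
    with a single clean-up process."""
    work = multiprocessing.Queue()
    for p in parts:
        work.put(p)
    procs = [multiprocessing.Process(target=worker, args=(work,))
             for _ in range(n_sessions)]
    for p in procs:
        p.start()
    # Check whether processes are still alive; wait for the last one.
    while any(p.is_alive() for p in procs):
        for p in procs:
            p.join(timeout=1)
    # If workers died with parts still queued, a single process
    # is created to clear up the work queue.
    if not work.empty():
        cleaner = multiprocessing.Process(target=drain, args=(work,))
        cleaner.start()
        cleaner.join()
```

The point of the final single process is that a crashed worker no longer leaves queued parts behind forever, which is what caused the manual-kill hangs.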
@uskudnik
Owner

uskudnik commented Nov 4, 2012

A week without any fixes - I will presume this is stable and merge tomorrow, unless @wvmarle says otherwise or new bugs are discovered.

@wvmarle
Contributor Author

wvmarle commented Nov 5, 2012

As stable as it gets I think.
Haven't had much time recently to do anything with the code.

The only issue I have is the continuous and mysterious "response error" replies from Amazon...

@SitronNO

SitronNO commented Nov 5, 2012

I get this issue with larger files:

Process Process-1:6.0 GB (76%). Average rate 374.80 KB/s, eta 20:28:50.
Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/glaciercorecalls.py", line 110, in upload_part_process
    writer.write(part, start=start)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/glaciercorecalls.py", line 188, in write
    code=e.code)
ResponseException

At this point it just hangs, so I have to break (press CTRL+C) and that gives the following error:

^CTraceback (most recent call last):
  File "/usr/local/bin/glacier-cmd", line 9, in <module>
    load_entry_point('glacier==0.2dev', 'console_scripts', 'glacier-cmd')()
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 811, in main
    args.func(args)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 147, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 302, in upload
    args.resume, args.sessions)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 68, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 211, in glacier_connect_wrap
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 68, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 264, in sdb_connect_wrap
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 68, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 1251, in upload
    time.sleep(1)   # manage timeouts and status updates etc.
KeyboardInterrupt

When this happens, I just repeat the same command with --resume added, and it goes on from where it broke.

Is this the same error you are referring to, or something you have not seen before?

@wvmarle
Contributor Author

wvmarle commented Nov 5, 2012

Yes, that's the issue I'm referring to. Very irritating.

Dozens if not hundreds of parts are uploaded and accepted fine, and then suddenly the signature is not accepted (by my understanding the signature is related to your login credentials - it's done by boto, and I haven't dug deep enough to know exactly how it is done).

@uskudnik
Owner

@wvmarle Any luck tracking the bug down? Do you know if it's a boto issue or Amazon's?

@offlinehacker
Contributor

The problem here is different. First of all, the boto upload implementation is poor. When doing uploads, calls to HTTP functions must be able to time out, and you must always expect that something will fail and detect what failed. If it's disk corruption, you must have an option to verify uploads; if it's a network error, you must also be able to re-upload. Amazon has a great API that allows re-uploading any part, so if uploading some part fails you can simply upload it once again.
Upload is the critical part. If that is not working, you can throw this, or any app using it, in the trash.

...so that's why I'm reimplementing the whole upload part, fixing some if not all of the problems.
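The retry-any-part idea described above could look like this in practice. A minimal, hypothetical helper; upload_part stands for whatever callable performs the actual HTTP upload, e.g. a boto call with a socket timeout set:

```python
import time

def upload_with_retries(upload_part, part, max_attempts=3, backoff=1.0):
    """Try to upload one part, retrying on failure.

    Any exception from upload_part is treated as transient (network
    error, rejected signature, timeout) until the attempts run out;
    Glacier's multipart API allows re-uploading an individual part,
    so a retry is always safe.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return upload_part(part)
        except Exception:
            if attempt == max_attempts:
                raise                     # give up; let the caller decide
            time.sleep(backoff * attempt)  # simple linear backoff
```

Combined with per-part hash verification after upload, this covers both the network-error and the disk-corruption cases mentioned above.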

@wvmarle
Contributor Author

wvmarle commented Nov 11, 2012

@uskudnik : nothing done yet on this one. Just got myself a set of new toys (including an Epson wifi printer: took me 6 hours to get that installed!! Had to hunt down an unofficial ISO of the installation CD as I don't have CDROM players anymore and the official downloads are broken...) and a new netbook :-) So my priorities are distracted :-)
It is an issue that must be tracked down. The hardest part is that it's an error that I don't know how to trigger intentionally.

@offlinehacker : you mean you're re-implementing the boto upload routines? Wasn't that present in the original glaciercorecalls.py file already?
Time-outs I have already taken care of; the other parts only indirectly, through the resume function (which has no problem with non-sequential parts to upload).

@offlinehacker
Contributor

@wvmarle : I am reimplementing the whole upload, and am still deciding whether I will use the upload_part routine from boto or not. The problem is that the current implementation (ours as well as boto's) is not good, especially for parallel uploads. The functionality is implemented in one function that does everything, and we hope it does not crash. I am taking a lot of code from you and making it a little bit better. And please don't take this the wrong way: your work here is awesome; the problem is that upload must not have mistakes and must be implemented without bugs! That said, everything will hopefully work when I complete this ;)

Currently the most helpful thing for me would be better formatting of exceptions, handling of that cause variable (which, by the way, is awesome), and printing the whole exception tree (this should work; at least CausedException had that, and it was removed somewhere in between?). This way we will be able to debug much more easily. So instead of trying to debug the upload routines again and again on not-very-good implementations, please help me with implementing the things above. I will make some commits during the day, and if somebody tests the new functionality or writes tests I will be very happy; otherwise I will have to write the tests myself before we merge anything into master!

And please start writing tests before you implement anything else, or we will end up with a blob of non-working code!

@wvmarle
Contributor Author

wvmarle commented Nov 11, 2012

CausedException is integrated into GlacierException, and the stack trace is dumped in the log file at DEBUG level. This is because users normally don't need to see it, while developers can still get at it this way.

Any non-caught exceptions of course dump the stack trace to screen.

Agreed, upload must not have bugs. Writing tests, on the other hand, is not easy until we fully and thoroughly understand Amazon's responses (like this response error issue), so that we can simulate the errors.

@offlinehacker
Contributor

Thanks, but I was wondering because we have a lot of copy-pastes without full traces. If an exception occurs, why not print the whole stack trace to the user? They mostly won't understand it anyway, but we will. Those exceptions that must be pretty-printed of course need to be handled differently.


@wvmarle
Contributor Author

wvmarle commented Nov 11, 2012

Exceptions are done that way because I want them to look a lot better for the end users, while still being able to get at the stack trace if really needed.
For most of these exceptions (vault not found, invalid file name, etc.) the stack trace doesn't have any meaning anyway. Actually that should be the case for all the exceptions that we catch, as the software behaves as expected in those situations. And all non-handled exceptions will show a stack trace no matter what.

We may consider defining a constant, say DEVELOPMENT = True, at the start of the script, and dumping a stack trace based on this flag, setting it to False when an actual release is made.
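A minimal sketch of that idea (hypothetical names, not the actual GlacierWrapper code): the full trace always goes to the log at DEBUG level, and is printed to the user only while DEVELOPMENT is True.

```python
import logging
import traceback

DEVELOPMENT = True   # set to False for actual releases

logger = logging.getLogger("glacier")

def report_error(exc):
    """Show a friendly one-line message to the user; keep the
    stack trace for developers.

    Must be called from inside an except block so that
    traceback.format_exc() can pick up the active exception.
    """
    print("glacier-cmd: %s" % exc)
    trace = traceback.format_exc()
    logger.debug(trace)       # always available in the debug log
    if DEVELOPMENT:
        print(trace)          # developers see it on screen too
```
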

@offlinehacker
Contributor

Yes, that flag would be cool; I support it ;)

I've also committed an almost finished, but completely untested, new upload implementation, available on my GitHub (don't even try to run it, it won't start), but you can see the core ideas (the function _upload in GlacierWrapper and the class Part in corecalls). The very same code can upload with multiprocessing and without it, using multiprocessing.imap or itertools.imap and some "hacks" behind it. It has proper mmap support, and it allows resuming whether the data comes from non-sequential or sequential input (by switching to reading instead of mmapping and disabling the multiprocessing upload). If you have notes on which additional exceptions I should take care of, please do tell me.

I will hopefully finish it tomorrow (without tests, which will come in later days), also taking quite some code from this commit.
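The interchangeable-imap idea can be sketched as follows (hypothetical names, not the actual code from that branch; Python 2's itertools.imap is written here as the built-in map):

```python
import multiprocessing

def upload_one(part):
    # Stand-in for the real per-part upload; returns what was "uploaded".
    return ("done", part)

def upload_all(parts, parallel=True, processes=4):
    """Run the same upload code path with or without multiprocessing:
    only the imap implementation is swapped, everything else is shared."""
    if parallel:
        pool = multiprocessing.Pool(processes)
        imap = pool.imap          # lazy, order-preserving, parallel
    else:
        pool = None
        imap = map                # itertools.imap in Python 2
    try:
        results = list(imap(upload_one, parts))
    finally:
        if pool is not None:
            pool.close()
            pool.join()
    return results
```

Because pool.imap preserves input order, the sequential and parallel paths produce identical results, which is what makes the single code path possible.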

@wvmarle wvmarle mentioned this pull request Nov 12, 2012
@skin

skin commented Nov 19, 2012

@wvmarle Hi, I tried to use your parallel_uploads branch, but it seems to have some problem with files greater than 2 GB.
It's something related to the mmap.mmap call in glaciercorecalls.py:

part = mmap.mmap(fileno=f.fileno(),
                 length=stop-start,
                 offset=start,
                 access=mmap.ACCESS_READ)

I guess this error is similar to that one: #99
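For reference, mapping only a window of the file is the right idea, but two pitfalls are easy to hit: the offset handed to mmap must be a multiple of mmap.ALLOCATIONGRANULARITY, and on 32-bit builds the mapped length itself must stay well under 2 GB. A sketch of an aligned per-part mapping (hypothetical helper, not the actual glaciercorecalls code):

```python
import mmap

def map_part(f, start, stop):
    """Return the bytes in [start, stop) of an open file, mapping
    only that window instead of the whole file.

    The offset must be a multiple of mmap.ALLOCATIONGRANULARITY,
    so align it down and slice the extra bytes off the front.
    Returns a bytes copy of the window; a real implementation
    might keep the mmap object alive and track the inner offset.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (start // gran) * gran
    m = mmap.mmap(f.fileno(), stop - aligned,
                  access=mmap.ACCESS_READ, offset=aligned)
    try:
        return m[start - aligned:stop - aligned]
    finally:
        m.close()
```
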

@wvmarle
Contributor Author

wvmarle commented Nov 20, 2012

Yes, same issue.
