s3 sync from local to s3 bucket re-uploads files with version 1.3.6 #749
Can you share the filename? If it's related to #718, it would have whitespace in the filename.
There certainly is whitespace in the full path (not necessarily in the filename), as below. Should it not have been fixed by #718?
Yep. Seems to be working; here's what I tried:
Perhaps, for some reason, the actual file transfer fails after the comparator determines that the file needs to be synced. Can you confirm that the sync command exits cleanly (RC of 0) and that the file in question is actually uploaded to S3 successfully?
Your test works for me too, but there is something else going on with the files I have. Files are indeed re-uploaded every time successfully. In fact, my local files and S3 files (50,000+) are in sync except for 13 files, yet 1.3.6 wants to upload 1200+ files every time. I just did some tests with the same set of local files and the same S3 bucket, and:
Please let me know if you need anything from me to identify this issue. I've downgraded to 1.3.1 (the last working version) for now.
This is similar to issue #648. In that issue files move in the opposite direction (S3 to local disk), but there are no spaces in the file names (we replace them with underscores).
I was just looking into this a little bit. It seems that the two lists being analyzed in the comparator get out of sync at some point. From what I can tell, it's going along just fine and then suddenly a retry is needed. Then there's a big chunk of XML showing another set of keys. I think it's overwriting dest_files with this new list instead of appending.

I added a simple print of src_file.compare_key and dest_file.compare_key. Before the retry: ... Then the retry. Then:

2014-04-14 17:18:05,902 - botocore.hooks - DEBUG - Event service-created: calling handler <function register_retries_for_service at 0x1e7bc08>

Again, the dest file that is not working is the first file returned in the new chunk of XML. Hope that helps.
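To make that failure mode concrete, here is a simplified sorted-list comparison (an illustrative sketch with hypothetical key names, not the actual aws-cli comparator): if the destination listing loses keys mid-stream, the corresponding source files compare as "missing" and get re-uploaded.

```python
# Simplified sketch of a sorted-list comparator (illustration only, not the
# aws-cli implementation): keys missing from the destination listing make
# the corresponding local files look like they "do not exist at destination".
def files_to_upload(src_keys, dest_keys):
    uploads, di = [], 0
    for s in src_keys:                        # both lists assumed sorted
        while di < len(dest_keys) and dest_keys[di] < s:
            di += 1
        if di >= len(dest_keys) or dest_keys[di] != s:
            uploads.append(s)                 # not found at destination
    return uploads

src = ["a.txt", "b+c.txt", "d.txt"]           # hypothetical local keys
dest = ["a.txt", "d.txt"]                     # "b+c.txt" lost to bad pagination
print(files_to_upload(src, dest))             # ['b+c.txt'] gets re-uploaded
```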
Hi, I don't really know how to use GitHub, but I have a patch that fixes the issue. The problem is that botocore/paginate.py is not unquoting the "next_marker" key. When it creates the query for the next request, the "+" in the key gets encoded as %2B.
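A minimal sketch of the problem being described (hypothetical key name; this is not the actual patch): with encoding_type "url", a literal "+" in a key comes back percent-encoded, and reusing the encoded value as the next marker no longer matches any real key.

```python
# Sketch of the marker problem (hypothetical key, not the botocore patch):
# with encoding_type="url", a key containing "+" is returned as "%2B".
from urllib.parse import unquote

encoded_key = "backups/2014%2B04%2B14.tar.gz"   # as returned in the listing
real_key = "backups/2014+04+14.tar.gz"          # the actual object key

# Reusing the encoded form as the next marker points at a key that does
# not exist, so pagination resumes in the wrong place:
assert encoded_key != real_key

# Unquoting first restores the real key, which is the correct next marker:
assert unquote(encoded_key) == real_key
```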
Awesome, thanks for your help. Looking at this now...
Just wanted to give an update here. I'm able to repro the issue and can confirm what @jonbrock has said. I want to work on a set of test cases here to ensure we don't regress on this again. I'll send a PR once I have this worked out.
Fixes aws#749. This was a regression from the fix for aws#675, where we use the encoding_type of "url" to work around the stdlib XML parser not handling new lines. The problem is that pagination in S3 uses the last key name as the marker, and because the keys are returned urlencoded, we need to urldecode the keys so botocore sends the correct next marker. In the case where urldecoded(key) != key, we will incorrectly sync new files. Also added an integ test for syncing with '+' chars.
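A hedged sketch of the idea behind that fix, written against the boto3 list_objects API rather than the aws-cli/botocore internals: when listing with EncodingType="url", decode each key before comparing it or reusing it as the marker.

```python
# Illustrative sketch (boto3-style, not the aws-cli/botocore code): decode
# url-encoded keys before using them, so the next marker matches real keys.
from urllib.parse import unquote
import boto3

def iter_keys(bucket):
    s3 = boto3.client("s3")
    kwargs = {"Bucket": bucket, "EncodingType": "url"}
    while True:
        resp = s3.list_objects(**kwargs)
        for obj in resp.get("Contents", []):
            key = unquote(obj["Key"])     # urldecode before any comparison
            yield key
            kwargs["Marker"] = key        # decoded key is the correct marker
        if not resp.get("IsTruncated"):
            break
```

Without the unquote call, a key containing "+" is fed back to S3 as "%2B", pagination resumes in the wrong place, and the comparator sees missing destination keys, which is exactly the re-upload behaviour described above.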
OK, I believe this issue has been fixed (#755). Thanks to everyone for their help in debugging this issue.
* release-1.3.7: (28 commits)
  * Bumping version to 1.3.7
  * Add #742 to changelog
  * Add a comment about why get_stdout_text_writer is needed.
  * Code review feedback
  * Py3 integ test fixes
  * Update changelog with #749
  * Add compat layer for text based stream writers
  * Fix S3 sync issue with keys containing urlencode values
  * Add issue to changelog
  * Remove print statement in test
  * Fix issue with scalar/non-scalar lists
  * Fix doc example for s3api put-object
  * Refactor load-cli-arg common event code
  * Add 750 to the changelog
  * Update paramfile custom argument to use events
  * Aggregate dupe keys into a list in datapipeline translation
  * Add issue to CHANGELOG
  * Do not auto parse JSON based on filename
  * Update tests to not mock __builtin__.open
  * Allow custom param values to be read from files/urls
  * ...
It is probably related to #718, but it is still happening on my OS X Mavericks machine with Python 2.7. I sync local files to S3 using "aws s3 sync", but some files are always re-uploaded. I ran a debug and the comparator thinks that "file does not exist at destination".
I even removed all scripts and Python packages for awscli, botocore, and jmespath, and then reinstalled the latest 1.3.6, but no luck. I have used awscli for s3 sync since version 1.0 without any issues and only started to experience such issues with 1.3.2.
I'd appreciate it if you could look into it. Let me know if additional information is needed.