optionally delete from S3 what was NOT uploaded #3117

@peterbe peterbe commented Mar 3, 2021

Part of #2224

This PR does not claim to resolve #2224, partly because 1) the new feature is not yet enabled in prod-build.yml, and 2) we should close the issue only once we know it has fully worked in production.

I'm not fond of the name --delete-unrecognized, but writer's block is real, so I plowed ahead with it. --delete-old is equally bad (or worse) because time isn't what matters (except for static assets).
What about --delete-leftovers? Or --delete-remaining? Or --delete-not-uploaded?

I've been testing this locally against my personal S3 bucket, which only holds the en-US content. I have been uploading to it to test the Deployer since mid-2020, so the bucket was full of all sorts of junk:

▶ DEPLOYER_LOG_EACH_SUCCESSFUL_UPLOAD=false poetry run deployer  upload --content-root=/Users/peterbe/dev/MOZILLA/MDN/content/files --bucket=peterbe-yari --delete-unrecognized /tmp/build
Deployer (0.3.0)
Upload files from: /tmp/build
Upload redirects from: /Users/peterbe/dev/MOZILLA/MDN/content/files
Upload into: main/ of peterbe-yari
Total pending redirect uploads: 12,197 (10.2ms)
Total pending file uploads: 48,280 (1.4s)
Total existing S3 objects: 77,407 (21.7s)
  [▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋]  60477/60477  100%
Total uploaded files: 0 (0B)
Total uploaded redirects: 12,197
Total skipped files: 48,280 matched existing S3 objects
Total upload/skip time: 34.6s
Total pending task deletions: 16,886
  [▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋]  16886/16886  100%
Total deleted keys: 16,886
Done in 1m28s.
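
For readers skimming the log, the deletion step can be thought of as a set difference: every key already in the bucket that this run neither uploaded nor skipped becomes a candidate for deletion. The following is a minimal sketch of that idea only. The function name `keys_to_delete` and the sample keys are hypothetical, and this is not the actual code in `upload.py`.

```python
def keys_to_delete(existing_keys, handled_keys):
    """Return S3 keys that exist in the bucket but were not part of this run.

    existing_keys: keys listed from the bucket before uploading.
    handled_keys: keys this run uploaded, or skipped because they already matched.
    """
    return sorted(set(existing_keys) - set(handled_keys))


# Hypothetical sample data, only to illustrate the behaviour.
existing = [
    "main/en-us/docs/web/index.html",
    "main/en-us/docs/old-page/index.html",
    "main/random-folder/desktop-file.png",
]
handled = ["main/en-us/docs/web/index.html"]

leftovers = keys_to_delete(existing, handled)
```

When the bucket already matches the build, the difference is empty, which is the situation a repeated run would be expected to hit.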

And if I run it a second time, immediately afterwards, I get:

▶ DEPLOYER_LOG_EACH_SUCCESSFUL_UPLOAD=false poetry run deployer upload --content-root=/Users/peterbe/dev/MOZILLA/MDN/content/files --bucket=peterbe-yari --delete-unrecognized /tmp/build
Deployer (0.3.0)
Upload files from: /tmp/build
Upload redirects from: /Users/peterbe/dev/MOZILLA/MDN/content/files
Upload into: main/ of peterbe-yari
Total pending redirect uploads: 12,197 (9.9ms)
Total pending file uploads: 48,280 (1.4s)
Total existing S3 objects: 60,521 (19.0s)
  [▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋]  60477/60477  100%
Total uploaded files: 0 (0B)
Total uploaded redirects: 12,197
Total skipped files: 48,280 matched existing S3 objects
Total upload/skip time: 35.8s
Total pending task deletions: 0
Total deleted keys: 0
Done in 56.2s.

Here, note the lines:

Total pending task deletions: 0
Total deleted keys: 0
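
The figures in the two runs are consistent with each other, which is a quick sanity check that the first run deleted exactly what it reported. A small worked check using only numbers copied from the logs above (the variable names are illustrative):

```python
# Figures copied from the two runs above.
existing_before = 77_407     # "Total existing S3 objects" in the first run
deleted_first_run = 16_886   # "Total deleted keys" in the first run
existing_after = 60_521      # "Total existing S3 objects" in the second run

pending_redirects = 12_197   # "Total pending redirect uploads"
pending_files = 48_280       # "Total pending file uploads"
progress_total = 60_477      # denominator shown on the upload progress bar

# Objects before, minus deleted keys, equals what the second run found.
assert existing_before - deleted_first_run == existing_after

# The upload progress bar counts redirects plus files.
assert pending_redirects + pending_files == progress_total
```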

peterbe commented Mar 3, 2021

To double-check, I also manually uploaded a random file from my desktop into a random folder in the bucket:
[Screenshot: Screen Shot 2021-03-03 at 2 00 07 PM]
Then I ran the deployer again:

Upload files from: /tmp/build
Upload redirects from: /Users/peterbe/dev/MOZILLA/MDN/content/files
Upload into: main/ of peterbe-yari
Total pending redirect uploads: 12,197 (13.1ms)
Total pending file uploads: 48,280 (2.1s)
Total existing S3 objects: 60,522 (19.6s)
  [▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋]  60477/60477  100%
Total uploaded files: 0 (0B)
Total uploaded redirects: 12,197
Total skipped files: 48,280 matched existing S3 objects
Total upload/skip time: 37.4s
Total pending task deletions: 1
Total deleted keys: 1
Done in 59.2s.

And after a refresh of the AWS S3 console UI:
[Screenshot: Screen Shot 2021-03-03 at 2 01 12 PM]
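
As an aside on how leftover keys could be removed in bulk: the S3 `delete_objects` call (as exposed by boto3) accepts at most 1000 keys per request, so a list of thousands of keys has to be chunked. The sketch below assumes a client object that behaves like a boto3 S3 client. `delete_keys` and `FakeS3Client` are hypothetical names, and nothing here is confirmed to be how the Deployer actually performs its deletions.

```python
def delete_keys(s3_client, bucket, keys, batch_size=1000):
    """Delete keys in batches and return how many were reported as deleted.

    s3_client is expected to behave like a boto3 S3 client, whose
    delete_objects call accepts at most 1000 keys per request.
    """
    deleted = 0
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in batch]},
        )
        deleted += len(response.get("Deleted", []))
    return deleted


class FakeS3Client:
    """Stand-in for a real client so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []  # number of keys received in each request

    def delete_objects(self, Bucket, Delete):
        self.calls.append(len(Delete["Objects"]))
        return {"Deleted": [{"Key": obj["Key"]} for obj in Delete["Objects"]]}


client = FakeS3Client()
junk_keys = [f"main/junk/{i}" for i in range(2500)]  # made-up keys
count = delete_keys(client, "example-bucket", junk_keys)
```

With 2,500 made-up keys the fake client sees three requests, the last one partial. Against a real bucket the same function would take a client created with `boto3.client("s3")`.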

@escattone escattone left a comment

@peterbe Overall, nicely done, but I do have some comment nits, some suggested changes, and one bug fix.

deployer/src/deployer/main.py (outdated, resolved)
deployer/src/deployer/upload.py (outdated, resolved)
deployer/src/deployer/upload.py (resolved)
deployer/src/deployer/upload.py (outdated, resolved)
deployer/src/deployer/upload.py (outdated, resolved)
deployer/src/deployer/upload.py (outdated, resolved)
deployer/src/deployer/upload.py (outdated, resolved)
deployer/src/deployer/upload.py (resolved)
deployer/src/deployer/upload.py (outdated, resolved)
@peterbe peterbe requested a review from escattone March 11, 2021 20:11
This reverts commit 4bfd17c.

@escattone escattone left a comment

Awesome, thanks @peterbe!

@escattone escattone merged commit 671cbc0 into mdn:main Mar 11, 2021
@peterbe peterbe deleted the 2224-optionally-delete-from-s3-what-was-not-uploaded branch March 11, 2021 20:51
peterbe added a commit to peterbe/yari that referenced this pull request Jun 1, 2021
* optionally delete from S3 what was NOT uploaded

Part of mdn#2224

* Apply suggestions from code review

Co-authored-by: Ryan Johnson <[email protected]>

* more feedbacked

* rename the option properly

* python lint

* Revert "python lint"

This reverts commit 4bfd17c.

Co-authored-by: Ryan Johnson <[email protected]>
Development

Successfully merging this pull request may close these issues.

deployer doesn't handle deleted documents or redirects
2 participants