[AIRFLOW-1686] Write list_objects, delete_object, and tests for boto3 s3_hook #2716

Closed
wants to merge 1 commit into from

andyxhadji
Contributor

@andyxhadji andyxhadji commented Oct 22, 2017

Dear Airflow maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots of any UI changes:
  • Added boto3-based methods to the S3 hook, along with a new test file and tests.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
  • s3 hook unit test file

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@codecov-io

codecov-io commented Oct 22, 2017

Codecov Report

Merging #2716 into master will increase coverage by 0.11%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master    #2716      +/-   ##
==========================================
+ Coverage      73%   73.12%   +0.11%     
==========================================
  Files         156      156              
  Lines       11889    11899      +10     
==========================================
+ Hits         8680     8701      +21     
+ Misses       3209     3198      -11
| Impacted Files | Coverage Δ |
|---|---|
| airflow/hooks/S3_hook.py | 38.29% <66.66%> (+7.34%) ⬆️ |
| airflow/jobs.py | 79.62% <0%> (+0.44%) ⬆️ |
| airflow/utils/helpers.py | 56.32% <0%> (+2.87%) ⬆️ |
| airflow/task_runner/bash_task_runner.py | 100% <0%> (+6.66%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b06b66...6166bb4.

@andyxhadji andyxhadji closed this Oct 22, 2017
@andyxhadji andyxhadji reopened this Oct 22, 2017
@bolkedebruin
Contributor

Please review your commit message.

@andyxhadji andyxhadji force-pushed the 1686 branch 2 times, most recently from ff73ed2 to 6b3db74 on October 23, 2017 12:57
@andyxhadji
Contributor Author

@bolkedebruin Updated commit message, please review when you get a chance!

@andyxhadji
Contributor Author

Note: this is one of the follow-ups to #2532.

Delimiter=delimiter)
return [p.Prefix for p in response['CommonPrefixes']] if response.get('CommonPrefixes') else None

def list_objects_v2(self, bucket_name, prefix='', delimiter=''):
Member

This isn't a great method name, and the docs don't make it clear when to use this over the other list_ methods.

Do we need a separate method that just calls the underlying method? There's nothing to stop callers from calling s3_hook.get_conn().list_objects_v2 directly.
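To illustrate the direct call described above, here is a minimal sketch. It assumes `client` is a boto3 S3 client such as the one `s3_hook.get_conn()` returns; the helper name `list_keys_direct` is hypothetical and not part of this PR.

```python
def list_keys_direct(client, bucket_name, prefix='', delimiter=''):
    """Return object keys under a prefix by calling boto3 directly.

    Assumption: `client` is a boto3 S3 client, e.g. s3_hook.get_conn().
    """
    response = client.list_objects_v2(Bucket=bucket_name,
                                      Prefix=prefix,
                                      Delimiter=delimiter)
    # 'Contents' is absent from the response when nothing matches.
    # A single call returns at most 1000 keys; this sketch does not paginate.
    return [obj['Key'] for obj in response.get('Contents', [])]
```

A caller would use it as `list_keys_direct(s3_hook.get_conn(), 'my-bucket', prefix='logs/')`, where the bucket name and prefix are placeholders.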

Contributor Author

The original reasoning for this was its use with an operator that I have, which utilizes this method directly; having the method here and well tested would help keep that operator sustainable. Since the operator is out of scope for this PR, perhaps I should remove the method and test here and reserve them for a later PR, where its usage would be clearer?

Member

My comment was mainly about the function name, specifically the _v2 suffix. That, and the docs don't make clear how this differs from the prefixes and keys methods.

It is probably a useful addition still.

Adds functionality to s3_hook using the boto3 library. The list_objects and delete_object methods are useful for the common cleanup task of removing older files in an S3 bucket. Also adds the s3_hook test file, with corresponding tests for the additions.
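The cleanup task mentioned in the commit message could look roughly like the sketch below. It is not the code from this PR: it calls the boto3 client directly rather than the hook's new methods, whose exact signatures are not shown here, and the function name `delete_older_than` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def delete_older_than(client, bucket_name, prefix, max_age_days, now=None):
    """Delete objects under `prefix` last modified over `max_age_days` ago.

    Assumption: `client` is a boto3 S3 client. Returns the deleted keys.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    deleted = []
    kwargs = {'Bucket': bucket_name, 'Prefix': prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        for obj in response.get('Contents', []):
            # boto3 returns LastModified as a timezone-aware datetime.
            if obj['LastModified'] < cutoff:
                client.delete_object(Bucket=bucket_name, Key=obj['Key'])
                deleted.append(obj['Key'])
        # Follow continuation tokens until the listing is exhausted.
        if not response.get('IsTruncated'):
            break
        kwargs['ContinuationToken'] = response['NextContinuationToken']
    return deleted
```

Passing `now` explicitly keeps the cutoff deterministic, which makes the function easy to exercise in a unit test with a stubbed client.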
4 participants