Copy manifest and bundle from S3 to jenkins #316

Closed
wants to merge 36 commits

Conversation

meghasaik

Description

This change achieves two things:

  1. Copying the files from S3
  2. Putting the files into S3

Issues Resolved

#260

Check List

  • Signed-off-by: Megha Sai Kavikondala [email protected]
    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@meghasaik meghasaik changed the title Created methods for 1. copying the files from S3 to jenkins and 2. putting the files to S3 Copy manifest and bundle from S3 to jenkins Aug 25, 2021
Member

@dblock dblock left a comment

  1. This needs to be called by something before we merge.
  2. Dependencies need to be added to the Pipfile.
  3. It needs some OO: create a class that represents a bucket/destination and implement put/get methods on it (see the sketch after this list). Maybe a Builder-like pattern, maybe a Publisher that we use to collect results?
  4. The code swallows exceptions and fails silently; that can't be good.
  5. Add tests and documentation of where/how this is done in READMEs.
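
A minimal sketch of the kind of class being asked for here, assuming boto3 and a bucket name passed to the constructor (the S3Bucket/get/put names are illustrative, not the project's final API):

import boto3


class S3Bucket:
    """Illustrative bucket abstraction with put/get methods."""

    def __init__(self, bucket_name):
        self.bucket_name = bucket_name
        # One client shared by get/put instead of duplicating setup in each method.
        self.client = boto3.client("s3")

    def get(self, key, dest):
        # Let boto3 exceptions propagate rather than swallowing them.
        self.client.download_file(self.bucket_name, key, dest)

    def put(self, source, key):
        self.client.upload_file(source, self.bucket_name, key)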

@peternied
Member

Could the use cases for this code be described in Jenkins pipeline DSL? There are built-in functions that the build system is already using for many common tasks, so they don't need to be reimplemented: https://www.jenkins.io/doc/pipeline/steps/pipeline-aws/

E.g. https://github.com/mch2/opensearch-build/blob/609d7163052d877f843f76582416f72229f1da71/bundle-workflow/Jenkinsfile#L64

meghasaik and others added 2 commits August 27, 2021 09:31
…verting an op file to HTML file and then calling another function to put it inside an S3 bucket.

return args

def get_S3files(self, args):
Member

@owaiskazi19 owaiskazi19 Aug 27, 2021

Let's name it as download_files_from_S3

import boto3


class Read_Write_Files:
Member

class name should be like ReadWriteFiles.
Ref: https://www.python.org/dev/peps/pep-0008/#class-names

@@ -9,6 +9,7 @@ requests = "~=2.26"
isort = "~=5.9"
flake8 = "~=3.9"
pytest = "*"
boto3 = "*"
Member

Can we pin it to the latest stable version instead of *?
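
For example, something like boto3 = "~=1.18" in the Pipfile would pin to a compatible release line (the exact version here is only illustrative, not a recommendation for a specific release).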

@@ -0,0 +1,36 @@
# !/usr/bin/env python3
Member

@owaiskazi19 owaiskazi19 Aug 27, 2021

Name this file download_files_from_S3 or something similar, following the same naming convention.

Member

@owaiskazi19 owaiskazi19 left a comment

Member

@peternied peternied left a comment

The surface area of this pull request has grown considerably from its initial revision. Separating the changes into smaller pull requests will allow for faster and better feedback and make each one easier to approve.

I would recommend the following kinds of pull requests for the work I am seeing done here:

  • Overall workflow updates - when and where this code is called and how it ties back to the test systems; codifies the high-level requirements, e.g. AWS permissions, data in/out contracts
  • Remote File interactions
  • File parsing mechanics
  • Reporting UX
  • Transformation of local data into reporting format

Comment on lines 12 to 19
# get_S3Objects:Used for getting the files from S3
# bucket_name: Name of the bucket you want to get the file from
# role_arn: Amazon Resource Name (ARN) of the role that you want to assume
# role_session_name: Name for the session
# bucket_bundle: Object path from where you want to get the bundle
# bucket_manifest: Object path from where you want to get the manifest
# path_to_save_bundle: Path where you want to save the bundle at
# path_to_save_manifest: Path where you want to save the manifest at
Contributor

Could you add these comments as pydoc

"""
"""

Author

I have added them in the """ """ pydoc block.
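
For reference, a sketch of what the comments quoted above could look like as a docstring (the signature is copied from the snippet under review; the wording is not final):

def get_S3_objects(self, bucket_name, bucket_bundle, bucket_manifest,
                   path_to_save_bundle, path_to_save_manifest):
    """Download the bundle and manifest from S3.

    bucket_name: name of the bucket to get the files from
    bucket_bundle: object path to get the bundle from
    bucket_manifest: object path to get the manifest from
    path_to_save_bundle: local path to save the bundle at
    path_to_save_manifest: local path to save the manifest at
    """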

Comment on lines 40 to 42
# put_S3Objects:Used for putting the files into S3
# file_name: The file that you want to put in the S3 bucket
# file_path: The Object path where you want to save the file
Contributor

add these comments as pydocs



import argparse

Contributor

remove extra space

Contributor

@VachaShah VachaShah left a comment

This needs tests and documentation to show how the functionality can be used.
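
As a starting point, a minimal test could mock boto3 with unittest.mock so no real AWS access is needed. The sketch below only checks that download_file is invoked with the expected arguments; the bucket/key/paths are placeholders, and the real test would call the class added by this PR instead of calling boto3 directly:

import unittest
from unittest import mock

import boto3


class DownloadTest(unittest.TestCase):
    @mock.patch("boto3.client")
    def test_download_calls_s3(self, mock_client):
        # Stand-in for the code under review: create a client and download one object.
        client = boto3.client("s3")
        client.download_file("my-bucket", "bundles/bundle.tar.gz", "/tmp/bundle.tar.gz")

        mock_client.assert_called_once_with("s3")
        mock_client.return_value.download_file.assert_called_once_with(
            "my-bucket", "bundles/bundle.tar.gz", "/tmp/bundle.tar.gz"
        )


if __name__ == "__main__":
    unittest.main()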

self.role_arn = role_arn
self.role_session_name = role_session_name

def get_S3_objects(self, bucket_name, bucket_bundle, bucket_manifest, path_to_save_bundle, path_to_save_manifest):
Contributor

Is getting the manifest mandatory for this method? What if I just have an S3 path and want to get the files that are in that bucket, for example for DependencyInstaller? Currently the code here assumes that the user passes in all the parameters.

Author

I have modified the method to take only the bundle parameter, so that through the S3 path we can get the files that are in that bucket.

client.download_file(bucket_name, bucket_bundle, path_to_save_bundle)
client.download_file(bucket_name, bucket_manifest, path_to_save_manifest)
except:
print("An exception occurred. No file is present in the bucket.")
Contributor

There might be other valid exceptions which need to be raised; currently, for all exceptions, this prints a "no file present" error message.

Author

I have modified it to print out the exception that occurred, since any of three exceptions might be raised: 1. the bucket name does not exist, 2. the object path is not a directory, 3. the local file path is not given properly.
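
A sketch of what raising instead of swallowing could look like, using botocore's ClientError to give a clearer message for a missing object and re-raise everything else (the 404 check is an assumption about which failures the caller wants to tell apart):

from botocore.exceptions import ClientError


def download(client, bucket_name, key, dest):
    try:
        client.download_file(bucket_name, key, dest)
    except ClientError as e:
        # Missing object: raise a clearer error; anything else propagates as-is.
        if e.response.get("Error", {}).get("Code") == "404":
            raise FileNotFoundError(f"s3://{bucket_name}/{key} does not exist") from e
        raise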

try:
client.upload_file(file_name, bucket_name, file_path)
except:
print("An exception occurred. No file found for uploading.")
Contributor

Please see my comment for exceptions above.

self.role_arn = role_arn
self.role_session_name = role_session_name

def get_S3_objects(self, bucket_name, bucket_bundle, bucket_manifest, path_to_save_bundle, path_to_save_manifest):
Contributor

We need to make this more generic to achieve two use cases -

  1. Download a particular S3 object
  2. Download all maven dependencies from a given s3 location

Author

I have modified it now; the bundle_path arg will be used for downloading the entire file/folder.
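
A sketch of a more generic download that covers both use cases, assuming a boto3 resource and that the same prefix listing works for a single object and for a folder (names and layout are illustrative only):

import os

import boto3


def download(bucket_name, prefix, dest_dir):
    """Download a single object, or every object under a prefix, into dest_dir."""
    bucket = boto3.resource("s3").Bucket(bucket_name)
    objects = list(bucket.objects.filter(Prefix=prefix))
    if not objects:
        raise FileNotFoundError(f"Nothing found under s3://{bucket_name}/{prefix}")
    for obj in objects:
        if obj.key.endswith("/"):
            continue  # skip folder placeholder keys
        target = os.path.join(dest_dir, os.path.relpath(obj.key, os.path.dirname(prefix)))
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        bucket.download_file(obj.key, target)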

Comment on lines 20 to 22
def __init__(self, role_arn, role_session_name):
self.role_arn = role_arn
self.role_session_name = role_session_name
Contributor

Make the role parameters optional. Abstract the role-passing part from the caller by using defaults from environment variables (for now) to read/write from S3. By keeping these optional, the caller can choose to pass them for specific use cases, while the defaults work for most.

Author

The role parameters will be optional, with default values for these parameters to fall back on.
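
A sketch of the optional-role constructor being described, with environment variables as the fallback (the variable names here are placeholders, not an agreed convention):

import os


class S3Bucket:
    def __init__(self, bucket_name, role_arn=None, role_session_name=None):
        self.bucket_name = bucket_name
        # Callers can pass a role explicitly; otherwise fall back to the environment.
        self.role_arn = role_arn or os.environ.get("AWS_ROLE_ARN")
        self.role_session_name = role_session_name or os.environ.get("AWS_ROLE_SESSION_NAME", "opensearch-build")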

… name, added comments and also made changes to download the folder
@meghasaik meghasaik requested review from VachaShah and setiah August 31, 2021 00:14
Member

@dblock dblock left a comment

  • needs tests

import boto3


class ReadWriteFiles:
Member

This should represent a thing, e.g. S3 or S3Bucket and take bucket as a parameter since it's required in both put/get.

self.role_arn = role_arn
self.role_session_name = role_session_name

def get_S3_objects(self, bucket_name, bucket_bundle, path_to_save_bundle=None):
Member

Rename to get or download.

Member

The parameters bucket_bundle and path_to_save_bundle don't make sense in S3 context. Should be prefix and dest.

What's a None dest? I would just require it.

continue
my_bucket.download_file(obj.key, target)
except Exception as e:
print(e)
Member

Don't swallow exception, let it get raised.

file_name: The file that you want to put in the S3 bucket
file_path: The Object path where you want to save the file
"""
def put_S3_objects(self, bucket_name, file_name, file_path):
Member

Rename to put or upload.


try:
client.upload_file(file_name, bucket_name, file_path)
except Exception as e:
Member

Don't swallow exception.

RoleSessionName=self.role_session_name,
DurationSeconds=3600)['Credentials']

boto3.client('s3',
Member

This code is the same between get and put. Extract into the constructor that creates a new client for a bucket.
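
A sketch of that extraction: the assume-role call and client creation move into one place that both get and put reuse (the assume_role arguments are taken from the snippet above; the class shape is assumed):

import boto3


class S3Bucket:
    def __init__(self, bucket_name, role_arn, role_session_name):
        self.bucket_name = bucket_name
        # Assume the role once and build a single S3 client shared by get/put.
        credentials = boto3.client("sts").assume_role(
            RoleArn=role_arn,
            RoleSessionName=role_session_name,
            DurationSeconds=3600,
        )["Credentials"]
        self.client = boto3.client(
            "s3",
            aws_access_key_id=credentials["AccessKeyId"],
            aws_secret_access_key=credentials["SecretAccessKey"],
            aws_session_token=credentials["SessionToken"],
        )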

@setiah
Contributor

setiah commented Sep 3, 2021

This is replaced with #367

@setiah setiah closed this Sep 3, 2021