-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added DependencyInstaller to copy dependencies from s3 to maven local #318
Added DependencyInstaller to copy dependencies from s3 to maven local #318
Conversation
a3d73d5
to
5a9e878
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of comments.
Still don't understand how this is used and where, not sure why we would commit files that aren't used.
self.version = version | ||
self.arch = arch | ||
|
||
def get_dependency_path(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move it into the constructor as self.dependency_path =
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it.
def get_dependency_path(self): | ||
return f"org/opensearch/{self.dependency_name}/{self.version}/" | ||
|
||
def get_maven_local_path(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, move into the constructor to expose as a property. If you want to abstract the method make it private like __get_aven_local_path(self)
.
|
||
# TODO: This is currently a stubbed function which returns files from the current directory, | ||
# to be replaced after it is implemented | ||
def download_from_s3(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should just be called download
, more future proof
if os.path.isfile(os.path.join(os.path.dirname(os.path.abspath(__file__)), file_name)) | ||
] | ||
|
||
def copy_to_maven_local(self, dependency_from_s3, maven_local_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably me a private method. Does it do anything other than copy a file to ~/.m2?
local_file_path = os.path.join( | ||
os.path.dirname(os.path.abspath(__file__)), file_name | ||
) | ||
if os.path.isfile(local_file_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ignores files that don't exist. Something seems off, when is this expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the else condition.
def install(self): | ||
# s3_path = f"/builds/{self.version}/{self.build_id}/{self.arch}/maven/{self.get_dependency_path()}" | ||
maven_local_path = self.get_maven_local_path() | ||
if not os.path.exists(maven_local_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This creates a maven path, but then copies in another function. Extract this whole copying business into a MavenLocal class that ensures that local paths exist and such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! Extracted the copying functionality to a separate class.
|
||
|
||
class DependencyInstallerTests(unittest.TestCase): | ||
dependency_installer = DependencyInstaller("7", "job-scheduler", "1.1.0.0", "arm64") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this into setUp(self)
, self.dependency_installer =
from src.test_workflow.dependency_installer import DependencyInstaller | ||
|
||
|
||
class DependencyInstallerTests(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you extract Maven implementation then you don't need to be checking what gets copied and what is not, just that Maven().copy
is called with the right parameters. Then you can test the Maven
class separately in a custom temporary folder.
This is to be used in PR #287 Line 47, sorry I should have mentioned it in the description. |
Sounds like we can wait for that to merge before this one? |
Sure, then I can edit the usage in that PR to use the DependencyInstaller along with introducing DependencyInstaller in this PR. I will address the other comments till then. |
@VachaShah I'm waiting on reviewing until after you've addressed db's comments, I'll be keeping an eye out for updates. |
72f7f3f
to
40ad330
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added more comments. A copy from S3 to maven local should just do that.
file_handler.copy(self.download(), self.maven_local_path) | ||
|
||
|
||
class MavenLocalFileHandler: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's an implementation detail of DependencyInstaller, make it a subclass of that? (tab it in to the right :))
|
||
def copy(self, dependency_from_s3, maven_local_path): | ||
if not os.path.exists(maven_local_path): | ||
os.makedirs(maven_local_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Takes exist_ok=True
to avoid the if
above.
if os.path.isfile(local_file_path): | ||
shutil.copy(local_file_path, maven_local_path) | ||
else: | ||
raise ValueError(f"No file found at path: {local_file_path}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you use copyfile
it will properly raise errors and you don't need any of the special if/handling of whether the file exists or not. Which begs the question, aren't we copying everything from dependency_from_s3
? Then this is just a copy of a folder into another folder, which is a single copy call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah its just a single copy call, removed the check.
if os.path.isfile(os.path.join(test_dir, file_name)) | ||
] | ||
|
||
def clean_maven_local_path(maven_local_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am scared of code that does rmtree
. Why would I need this?
In production I'd expect to have a clean maven local when executing on a new node. In test I definitely do not want the test job to blow away my local maven!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, removed the clean up path, I had added it earlier to clean up between tests but its not needed anymore.
|
||
def test_install(self): | ||
maven_local_path = self.dependency_installer.maven_local_path | ||
TestUtils.clean_maven_local_path(maven_local_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment above. I certainly don't expect running tests to delete all my files in ~/.m2!
40ad330
to
85739e2
Compare
self.version = version | ||
self.arch = arch | ||
self.dependency_path = f"org/opensearch/{self.dependency_name}/{self.version}/" | ||
self.maven_local_path = os.path.join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multiple concurrent builds/tests on jenkins can corrupt this path ~/.m2/repository
and cause test failures. It should be a custom path based on execution id and should also have cleanup method exposed to remove the contents from the location after build execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@setiah Are we allowing jenkins test workers to be reused? AFAIK we were not allowing worker reuse to avoid handling problems like you are describing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure what you mean by workers reuse. What I'm saying here is the workers would sync the dependencies in ~/.m2/repository of jenkins host. If there are two concurrent test jobs running, one for 1.1 another for 1.0, since they filesystem is same, they might end up overwriting each others dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added build-id in the path to keep the executions independent.
Provides functionality to copy the maven dependencies from S3 to maven local. | ||
""" | ||
|
||
def __init__(self, build_id, dependency_name, version, arch): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of "build_id, version, arch" -> use build_manfiest as input, which has all these and more. Also, you can skip dependency_name in __init__
. DependencyInstaller should provide a method to sync - all maven dependencies, or - a list of dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that if the user passes in the list of dependencies, then get the files for those dependencies and if nothing is passed in, get the files for all dependencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah 2 functions essentially -
- for syncing a list of maven dependencies say for a specific plugin.
- for syncing all maven dependencies from build/ folder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- can be treated optional for now. 1 is must have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added functionality for 1.
self.version = version | ||
self.arch = arch | ||
self.dependency_path = f"org/opensearch/{self.dependency_name}/{self.version}/" | ||
self.maven_local_path = os.path.join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@setiah Are we allowing jenkins test workers to be reused? AFAIK we were not allowing worker reuse to avoid handling problems like you are describing
] | ||
|
||
def install(self): | ||
# s3_path = f"/builds/{self.version}/{self.build_id}/{self.arch}/maven/{self.dependency_path()}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented out code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for the TODO where the files will be downloaded from S3 once the functionality is ready in PR #316.
for file_name in os.listdir(os.path.dirname(os.path.abspath(__file__))) | ||
if os.path.isfile( | ||
os.path.join(os.path.dirname(os.path.abspath(__file__)), file_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation is off making this functionality unpredictable. Based on the comment this isn't final code, can we remove/reduce to a minimal case to keep the checked in code of a high bar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pipenv run black .
reformatted it this way since the flake8
was failing with the if section being right indented. Definitely agree on the high bar! For the final code, the currently stubbed method for downloading the dependencies from s3 needs to be replaced once that functionality is merged and the tests corresponding to that stubbed method would change as well. I raised this PR early to get it out there and get feedback on the other parts and then can get a final review once the stubbed method is replaced.
|
||
|
||
class TestUtils: | ||
def get_test_dependencies(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you declare this function inside of the test class, if you foresee its reuse elsewhere please add comments or additional configuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the common test utils and added it in the test class itself.
85739e2
to
68d813b
Compare
Codecov Report
@@ Coverage Diff @@
## main #318 +/- ##
==========================================
+ Coverage 59.04% 60.00% +0.95%
==========================================
Files 38 39 +1
Lines 1111 1145 +34
==========================================
+ Hits 656 687 +31
- Misses 455 458 +3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good! I think there are some should/must have's below but it's close.
) | ||
|
||
|
||
class MavenLocalFileHandlerTests(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you nest tests the same way as classes to match? NBD
self.maven_local_file_handler = DependencyInstaller.MavenLocalFileHandler() | ||
|
||
def test_copy(self): | ||
maven_local_path = tempfile.mkdtemp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use with TemporaryFile as
or you need to get rid of maven_local_path
in teardownd.
test_files = get_test_dependencies() | ||
self.maven_local_file_handler.copy(test_files, maven_local_path) | ||
self.assertCountEqual(test_files, os.listdir(maven_local_path)) | ||
self.assertListEqual(test_files, os.listdir(maven_local_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably needs a sort to work reliably.
maven_local_path = tempfile.mkdtemp() | ||
test_files = get_test_dependencies() | ||
self.maven_local_file_handler.copy(test_files, maven_local_path) | ||
self.assertCountEqual(test_files, os.listdir(maven_local_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure test_files
isn't blank or this test passes through.
self.assertListEqual(test_files, os.listdir(maven_local_path)) | ||
|
||
|
||
def get_test_dependencies(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to be a static and private __get_test_dependencies()
method in the test class instead of global.
"../tests_assemble_workflow/data/opensearch-build-1.1.0.yml", | ||
) | ||
) as f: | ||
self.manifest = BuildManifest.from_file(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BuildManifest now has from_path
, use that so you don't need to open
.
def test_copy(self): | ||
with tempfile.TemporaryFile() as maven_local_path: | ||
test_files = DependencyInstallerTests.__get_test_dependencies() | ||
self.assertTrue(test_files) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only makes sure "something" is returned. Check that len(), the number of files returned is > 0.
01cb80e
to
ce55c4b
Compare
self.s3_maven_location = ( | ||
f"/builds/{self.version}/{self.build_id}/{self.arch}/maven" | ||
) | ||
self.s3_build_location = ( | ||
f"/builds/{self.version}/{self.build_id}/{self.arch}/plugins" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this location provider logic is common across build and test workflows and should be read from a single place. Check if there's already a utility/or we can make it. It's okay if you create an issue and take it separately outside this PR.
a8a58c2
to
ce55c4b
Compare
ce55c4b
to
7553b81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some should have's, but won't hold the PR for them.
elif os.path.isdir(local_path): | ||
shutil.rmtree(local_path) | ||
except OSError as e: | ||
print(f"Failed to clean {local_path}. Reason: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use logging.error
.
s3_path = f"{dependency}-{version}.zip" | ||
s3_bucket.download_file(s3_path, custom_local_path) | ||
|
||
def cleanup(self, local_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This deletes a folder, which is ... dangerous if we get it wrong. For maven dependencies I don't think we will ever want to be deleting them after downloading explicitly because they will be intermixed with other dependencies that may have been potentially dropped into ~m2
. We already are doing cleanup in #403. I think you should delete it from the PR and close any issues that say otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dblock - valid point. But I think we still need to cleanup dependencies which are synced as part of one test run, else we will unnecessarily bloat the local maven over a period of time. IMO we should switch back to the custom maven local path, based on a random tmp location, where we download dependencies and cleanup after the test run. This keeps the concurrent test runs isolated. We planned on doing it earlier but then switched to having docker provide that isolation by mounting on a randomized tmp path. I think we need it in the code as developers will run this code on local setup as well where docker mounting won't provide that sort of isolation. @VachaShah and I discussed about it yesterday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bloat the local maven over a period of time.
Jenkins Agent nodes will only exist for a single workflow then are torn down, how fast is this bloat? Enough that we need to clean it out to complete a workflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record CI will be providing throw away containers with their own file system for these jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the cleanup logic.
|
||
@patch("test_workflow.dependency_installer.S3Bucket") | ||
def test_install_build_dependencies(self, mock_s3_bucket): | ||
s3_bucket = mock_s3_bucket.return_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= S3Bucket() - because it's mocked you will get the mock object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried it but it tries to call the actual constructor instead of mocked.
|
||
@patch("test_workflow.dependency_installer.S3Bucket") | ||
def test_install_maven_dependencies(self, mock_s3_bucket): | ||
s3_bucket = mock_s3_bucket.return_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= S3Bucket()
@patch("test_workflow.dependency_installer.S3Bucket") | ||
def test_install_maven_dependencies(self, mock_s3_bucket): | ||
s3_bucket = mock_s3_bucket.return_value | ||
self.dependency_installer = DependencyInstaller(self.manifest.build) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a test, don't assign anything to self.
other than in setUp
. Looks like dependency_installer
is needed in every test, so move this into setUp
.
) | ||
self.dependency_installer.install_maven_dependencies(dependencies) | ||
self.assertEqual(s3_bucket.download_folder.call_count, 2) | ||
maven_local_paths = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract into a method in DependencyInstaller
that gives you a maven local path for a dependency. You have similar code in DependencyInstaller itself.
[ | ||
call( | ||
"opensearch-job-scheduler/1.1.0.0", | ||
maven_local_paths[0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will become something like dependency_installer.maven_local_path('opensearch-job-scheduler')
os.path.expanduser("~"), | ||
f".m2/repository/org/opensearch/{dependency}/{version}/", | ||
) | ||
s3_bucket.download_folder(s3_path, maven_local_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is s3_relative_path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be s3_maven_location for the dependency which is a relative path inside the bucket. An example - builds/1.1.0/24/x64/maven/org/opensearch/
:param dependency_dict: list of dependency names with version for which the maven artifacts need to be downloaded. | ||
Example: {'opensearch-job-scheduler':'1.1.0.0', 'opensearch-core':'1.1.0'} | ||
""" | ||
s3_bucket = S3Bucket(self.s3_maven_location) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks incorrect. S3Bucket needs to be initialized with bucket_name
Example: {'opensearch-job-scheduler':'1.1.0.0'} | ||
:param custom_local_path: the path where the downloaded dependencies need to copied. | ||
""" | ||
s3_bucket = S3Bucket(self.s3_build_location) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
) | ||
self.manifest = BuildManifest.from_path(self.manifest_filename) | ||
|
||
@patch("test_workflow.dependency_installer.S3Bucket") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can replace this with S3Bucket that's checked into main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the mocked version of the S3Bucket
checked in main.
Signed-off-by: Vacha <[email protected]>
Signed-off-by: Vacha <[email protected]>
Signed-off-by: Vacha <[email protected]>
Signed-off-by: Vacha <[email protected]>
Signed-off-by: Vacha <[email protected]>
Signed-off-by: Vacha <[email protected]>
7553b81
to
5649d4f
Compare
Signed-off-by: Vacha <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @VachaShah. LGTM!
Signed-off-by: Vacha [email protected]
Description
This PR introduces a DependencyInstaller, requirements in #276. This will be used by integration tests, performance tests and bwc tests when run in the pipeline. It covers the following scenarios:
Note: I have merged the changes from #367 to use the S3 utility functions.
Issues Resolved
#276
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.