[Cloud Deployment IVa] EFS creation and mounting #1018
Conversation
@@ -15,7 +15,7 @@ def read_requirements(file):
 extras_require = defaultdict(list)
-extras_require["full"] = ["dandi>=0.58.1", "hdf5plugin"]
+extras_require["full"] = ["dandi>=0.58.1", "hdf5plugin", "boto3"]
Looks like this got missed in the pyproject.toml refactor
import os
import time

import boto3

from neuroconv.tools.aws import submit_aws_batch_job

_RETRY_STATES = ["RUNNABLE", "PENDING", "STARTING", "RUNNING"]
This is one of those bug fixes I mentioned sneaking into this PR - after running the tests so many times, I did see some edge cases show up in the waiting times.
I think a final follow-up in the series (once everything works together) could handle this more elegantly with some common utils for such things.
job = None
max_retries = 10
retry = 0
while retry < max_retries:
    job_description_response = batch_client.describe_jobs(jobs=[job_id])
    assert job_description_response["ResponseMetadata"]["HTTPStatusCode"] == 200

    jobs = job_description_response["jobs"]
    assert len(jobs) == 1
This is a repeated example of the pattern of waiting until a desired outcome is reached; a sketch of a shared helper for this is below.
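As a sketch of what such a shared wait utility might look like (the helper name, parameters, and defaults here are hypothetical, not part of this PR):

import time

def wait_for_condition(fetch, is_done, max_retries=10, interval_in_seconds=60):
    """Poll `fetch()` until `is_done(result)` is True or the retries run out (hypothetical helper)."""
    for _ in range(max_retries):
        result = fetch()  # e.g. a wrapper around batch_client.describe_jobs(jobs=[job_id])
        if is_done(result):  # e.g. check the returned job status against the desired state
            return result
        time.sleep(interval_in_seconds)
    raise TimeoutError("The desired outcome was not reached within the allotted retries.")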
efs_client.delete_mount_target(MountTargetId=mount_target["MountTargetId"])

time.sleep(60)
efs_client.delete_file_system(FileSystemId=efs_id)
In the future I plan to be more elegant with pytest cleanup, possibly via pytest_sessionfinish in a conftest.py.
It might also clean up any job definitions that start with 'test_neuroconv'.
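A rough sketch of that cleanup idea (hypothetical, not part of this PR; assumes AWS credentials are available in the test environment and ignores pagination):

# conftest.py
import boto3

def pytest_sessionfinish(session, exitstatus):
    """Deregister any leftover AWS Batch job definitions created by the test suite."""
    batch_client = boto3.client("batch")
    response = batch_client.describe_job_definitions(status="ACTIVE")
    for job_definition in response["jobDefinitions"]:
        if job_definition["jobDefinitionName"].startswith("test_neuroconv"):
            batch_client.deregister_job_definition(jobDefinition=job_definition["jobDefinitionArn"])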
if efs_id is not None:
    volumes = [
        {
            "name": "neuroconv_batch_efs_mounted",
The name identifier here is entirely local (it does not need to match the 'name' tagged on the actual EFS volume) - see the sketch below.
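For context, a sketch of how that local name is wired together in the job definition (the container path and surrounding call are illustrative assumptions, not necessarily the PR's exact values):

volumes = [
    {
        "name": "neuroconv_batch_efs_mounted",  # purely local identifier within the job definition
        "efsVolumeConfiguration": {"fileSystemId": efs_id},  # the actual EFS file system being mounted
    }
]
mount_points = [
    {
        "sourceVolume": "neuroconv_batch_efs_mounted",  # must match the local 'name' above
        "containerPath": "/mnt/efs",  # assumed mount path inside the container
        "readOnly": False,
    }
]
# Both would then be passed via containerProperties when calling batch_client.register_job_definition(...)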
if efs_id is not None:
    job_definition_name += f"_{efs_id}"
Turned out to be quite important to include the file system ID in the definition name, to avoid reusing a previous definition created without the EFS mount configured.
I have toyed with the idea of making the job definition name a hash of all the 'unique' configuration aspects, but it is also important for it to be readable - a follow-up towards the end might apply a bunch of human-readable tags to this effect.
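For illustration, a minimal sketch of that hashed-but-still-readable naming idea (the helper and hashing choice are hypothetical, not part of this PR):

import hashlib
import json

def build_job_definition_name(base_name: str, configuration: dict) -> str:
    """Append a short hash of the 'unique' configuration aspects to a readable base name."""
    serialized_configuration = json.dumps(configuration, sort_keys=True)
    configuration_hash = hashlib.sha1(serialized_configuration.encode("utf-8")).hexdigest()[:8]
    return f"{base_name}_{configuration_hash}"

# e.g. build_job_definition_name("neuroconv_batch", {"efs_id": efs_id, "docker_image": docker_image})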
"minvCpus": 0, # Note: if not zero, will always keep an instance running in active state on standby | ||
"maxvCpus": 8, # Note: not currently exposing control over this since these are mostly I/O intensive |
@h-mayorquin This is a very important detail to be aware of
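For reference, a rough sketch of where these settings sit in the boto3 call (the environment name, networking values, and instance settings are placeholders, not the PR's actual configuration):

batch_client.create_compute_environment(
    computeEnvironmentName="neuroconv_batch_environment",  # placeholder name
    type="MANAGED",
    state="ENABLED",
    computeResources={
        "type": "EC2",  # assumed provisioning model
        "minvCpus": 0,  # zero means no instance idles on standby between jobs
        "maxvCpus": 8,  # kept modest since the jobs are mostly I/O bound
        "instanceTypes": ["optimal"],
        "subnets": ["subnet-..."],  # placeholder
        "securityGroupIds": ["sg-..."],  # placeholder
        "instanceRole": "ecsInstanceRole",  # placeholder
    },
)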
reason="MISCONFIGURATION:COMPUTE_ENVIRONMENT_MAX_RESOURCE", | ||
state="RUNNABLE", | ||
maxTimeSeconds=minimum_time_to_kill_in_seconds, | ||
action="CANCEL", | ||
), | ||
dict( | ||
reason="MISCONFIGURATION:JOB_RESOURCE_REQUIREMENT", |
Another bug fix I'm sneaking in here - previous testing suites ran under an existing compute environment that did not have this, but now it is actually set up properly (and tested in practice, though I have no clue how to make a proper automated test for it) and will end any zombie jobs.
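For reference, a sketch of how such entries are passed when creating the job queue (the queue name, priority, compute environment, and timeout value are placeholders):

minimum_time_to_kill_in_seconds = 3600  # placeholder value
batch_client.create_job_queue(
    jobQueueName="neuroconv_batch_queue",  # placeholder
    priority=1,
    computeEnvironmentOrder=[{"order": 1, "computeEnvironment": "neuroconv_batch_environment"}],
    jobStateTimeLimitActions=[
        dict(
            reason="MISCONFIGURATION:COMPUTE_ENVIRONMENT_MAX_RESOURCE",
            state="RUNNABLE",
            maxTimeSeconds=minimum_time_to_kill_in_seconds,
            action="CANCEL",
        ),
        dict(
            reason="MISCONFIGURATION:JOB_RESOURCE_REQUIREMENT",
            state="RUNNABLE",
            maxTimeSeconds=minimum_time_to_kill_in_seconds,
            action="CANCEL",
        ),
    ],
)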
@h-mayorquin This is ready for review now.
Evidence of passing tests in CI: https://github.com/catalystneuro/neuroconv/actions/runs/10726561768/job/29746896174
LGTM after discussion.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1018      +/-   ##
==========================================
+ Coverage   90.32%   90.36%   +0.04%
==========================================
  Files         129      129
  Lines        7996     7999       +3
==========================================
+ Hits         7222     7228       +6
+ Misses        774      771       -3

Flags with carried forward coverage won't be shown.
Breaks up various parts of #393, starting with EFS handling.