WIP: Divide github actions into multiple steps #26
base: main
Conversation
branches:
  - main
  - dev
I believe you still want these, or you will have a lot of skipped/failed runs when the `if` conditions are unmet.
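For context, a minimal sketch of the trigger block being discussed, assuming it sits at the top of the workflow file; keeping the branch filter means the workflow only starts for pushes to main and dev instead of starting on every push and then skipping or failing when the job-level `if` conditions are unmet.

```yaml
# Hypothetical trigger block for the workflow under review; keeping the
# branch filter limits runs to the branches the jobs actually handle.
on:
  push:
    branches:
      - main
      - dev
```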
The GCP deployment itself is found within the `deploy` composite action (`.github/actions/deploy/action.yml`). You will set the various GCP parameters (schedule, name, etc.) in either the global variables in the `Set globals` step or directly in the main `Deploy Cloud Function` step.
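A hedged skeleton of what a composite action shaped like that description could look like; only the action path and the two step names come from the text above, and everything else (schedule value, placeholder commands) is illustrative.

```yaml
# Hypothetical skeleton of .github/actions/deploy/action.yml; the step names
# match the description above, the values are placeholders.
name: deploy
description: Deploy the skid to a GCP environment
runs:
  using: composite
  steps:
    - name: Set globals
      shell: bash
      run: |
        # GCP parameters (schedule, function name, etc.) exported for later steps
        echo "SCHEDULE_CRON=0 3 * * *" >> "$GITHUB_ENV"
        echo "FUNCTION_NAME=skidname" >> "$GITHUB_ENV"

    - name: Deploy Cloud Function
      shell: bash
      run: echo "deploy $FUNCTION_NAME on schedule $SCHEDULE_CRON"
```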
The cloud functions may need 512 MB or 1 GB of RAM to run successfully. The source dir should point to `src/skidname`. A cloud function just runs the specified function in the `main.py` file in the source dir; it doesn't pip install the function itself. It will pip install any dependencies listed in `src/skidname/requirements.txt`, however.
question: the skid dependencies (palletjack, supervisor, etc.) don't require that memory, right? I would guess that the amount of memory is directly related to the number of rows and attributes in the dataframes, times 2 or 3 depending on how many are used in the pipeline. Is this close to accurate?
Suggested change:
Original: The cloud functions may need 512 MB or 1 GB of RAM to run successfully. The source dir should point to `src/skidname`. A cloud function just runs the specified function in the `main.py` file in the source dir; it doesn't pip install the function itself. It will pip install any dependencies listed in `src/skidname/requirements.txt`, however.
Suggested: The cloud function will need the RAM size monitored to find the right size for the skid to run successfully. The `source_dir` should point to `src/skidname`. A cloud function runs the specified function in the `main.py` file based on the path in the `source_dir`; it doesn't pip install the function itself. It will pip install any dependencies listed in `src/skidname/requirements.txt`, however.
You're correct, it depends on the memory used by the dataframes, which will vary based on the geometry type and dataframe size. I like your changes, but I also want to keep the 512 MB/1 GB in there as a starting point for future me when I forget what I've done before. I'll blend your suggestion in.
Ok. If you want to keep those numbers in there, can you provide some context about the dataframe size that may help owners choose a starting value?
My suggestion for finding the right size would be to run with a gig, look at the resource usage, and adjust from there.
Since these values determine the cost of the skid, it'd be nice to steer users toward finding the right size.
Agreed, but there's really no good rule of thumb. A process that loads all state address points is going to be sized differently than one that loads 100 points, or loads a couple of simple polygons, or loads all county boundaries, or runs some spatial analysis on parcels, etc. It's just never that simple.
That pretty much sums up my reasoning for suggesting the replacement.
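Following the "start with a gig, monitor, and adjust" suggestion above, a hedged sketch of where that starting value could be set, assuming the deploy step shells out to gcloud; the function name, runtime, and entry point are placeholders, and the repo's actual deploy step may use a different mechanism.

```yaml
# Hypothetical deploy step: start at 1 GB, check the function's memory usage
# after a few runs, then raise or lower the value to fit the skid's dataframes.
- name: Deploy Cloud Function
  shell: bash
  run: |
    gcloud functions deploy skidname \
      --runtime=python311 \
      --entry-point=main \
      --source=src/skidname \
      --memory=1024MB
```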
- Pushes to main (usually from merging a PR into main, but also direct pushes) will open a release PR to create a GitHub release with associated git tag and version bump.
- Once the GitHub release is created by merging the release PR, the actions will deploy the skid to the prod GCP environment.

The actions rely on several GitHub secrets to do all this:
thought: it might be worth mentioning that the github terraform module can set 3 of the 4 of these things.
FYI:
- the identity provider handles the identity federation between github and gcp.
- the service account defines the gcp permissions allowed by the github action.
- the project id tells gcp where the work should be done.
- I'm unclear about the storage. Logs or something?
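A hedged sketch of how those identity pieces are typically wired into a deploy job, assuming the google-github-actions/auth action; the secret names here are illustrative, not necessarily the ones this repo defines.

```yaml
# Illustrative only; secret names are assumptions.
jobs:
  deploy-dev:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for workload identity federation
      contents: read
    steps:
      - name: Authenticate to GCP
        uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ secrets.IDENTITY_PROVIDER }}
          service_account: ${{ secrets.SERVICE_ACCOUNT_EMAIL }}
```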
Looks great. I added a few suggestions that hopefully improve the documentation.
Co-authored-by: steveoh <[email protected]>
notify:
  name: Notifications
  runs-on: ubuntu-latest
Suggested change (adds a line after `runs-on: ubuntu-latest`):
  runs-on: ubuntu-latest
  needs: deploy-prod
You only want this to run if the deploy was successful.
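For reference, a hedged sketch of the job with that suggestion applied; the notification step itself is a placeholder.

```yaml
notify:
  name: Notifications
  runs-on: ubuntu-latest
  needs: deploy-prod   # the job is skipped unless deploy-prod succeeds
  steps:
    - name: Send notification
      run: echo "prod deploy finished"   # placeholder for the real notification step
```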
These don't match what's in setup.py. You also may want to add a comment here and in setup.py to remind forgetful folks like me that they need to be kept in sync.
The original template had a single action for testing and deploying to both dev and prod. Following the pattern just pushed to palletjack, this now has a single deploy composite action that is triggered on both pushes to dev (dev env) and releases (prod env). Pull requests trigger tests, and pushes to main trigger release PRs.
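A hedged sketch of the trigger layout that pattern implies; the real setup may split these across multiple workflow files, and the comments only summarize the jobs each event drives.

```yaml
# Illustrative triggers only; the actual workflows may be organized differently.
on:
  pull_request:         # run tests
  push:
    branches:
      - dev             # deploy to the dev GCP environment
      - main            # open or update the release PR
  release:
    types: [created]    # deploy to the prod GCP environment
```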