Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Divide github actions into multiple steps #26

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Conversation

jacobdadams
Copy link
Member

The original template had a single action for testing and deploying to both dev and prod. Following the pattern just pushed to palletjack, this now has a single deploy composite action that is triggered on both pushes to dev (dev env) and releases (prod env). Pull requests trigger tests, and pushes to main trigger release PRs.

@jacobdadams jacobdadams requested review from steveoh and stdavis August 16, 2024 20:38
Comment on lines -5 to -7
branches:
- main
- dev
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe you still want these or you will have a lot of skipped/failed runs when the if conditions are unmet.


The GCP deployment itself is found within the `deploy` composite action (`.github/actions/deploy/action.yml`). You will set the various GCP parameters (schedule, name, etc) in either the global variables in the `Set globals` step or directly in the main `Deploy Cloud Function` step.

The cloud functions may need 512 MB or 1 GB of RAM to run successfully. The source dir should point to `src/skidname`. A cloud function just runs the specified function in the `main.py` file in the source dir; it doesn't pip install the function itself. It will pip install any dependencies listed in `src/skidname/requirements.txt`, however.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: the skid dependencies, palletjack, supervisor, etc, don't require that memory right? I would guess that the amount of memory is directly related to the amount of rows and attributes in the data frames x 2 or 3 depending on how many are used in the pipeline. Is this close to accurate?

Suggested change
The cloud functions may need 512 MB or 1 GB of RAM to run successfully. The source dir should point to `src/skidname`. A cloud function just runs the specified function in the `main.py` file in the source dir; it doesn't pip install the function itself. It will pip install any dependencies listed in `src/skidname/requirements.txt`, however.
The cloud function will need the RAM size monitored to find the right size for the skid to run successfully. The `source_dir` should point to `src/skidname`. A cloud function runs the specified function in the `main.py` file based on the path in the `source_dir`; it doesn't pip install the function itself. It will pip install any dependencies listed in `src/skidname/requirements.txt`, however.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, it depends on the memory used by the dataframes, which will vary based on the geometry type and dataframe size. I like your changes, but I also want to keep the 512/1GB in there as a starting point for future me when I forget what I've done before. I'll blend your suggestion in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. If you want to keep those numbers in there can you provide some context to the data frame size that may help owners choose a starting value?

My suggestion to find the right size would be to run with a gig, look at the resource usage, and adjust until it's the right size.

Since these values determine the cost of the skid, it'd be nice to steer users to finding the right size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but there's really no good rule of thumb. A process that loads all state address points is going to be different sized than one that loads 100 points, or loads a couple simple polygons, or loads all county boundaries, or runs some spatial analysis on parcels, etc. It's just never that simple.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's pretty much sums up my reasoning to replace the suggestions.

README.md Outdated Show resolved Hide resolved
- Pushes to main (usually from merging a PR into main, but also direct pushes) will open a release PR to create a GitHub release with associated git tag and version bump.
- Once the GitHub release is created by merging the release PR, the actions will deploy the skid to the prod GCP environment.

The actions rely on several GitHub secrets to do all this:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: it might be worth mentioning that the github terraform module can set 3 of the 4 of these things.

FYI

  • the identity provider handles the identity federation between github and gcp.
  • the service account defines the gcp permissions allowed by the github action.
  • and the project id tells gcp where the work should be done
  • I'm unclear about the storage. Logs or something?

README.md Outdated Show resolved Hide resolved
Copy link
Member

@steveoh steveoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great. I added a few suggestions that hopefully improve the documentation.


notify:
name: Notifications
runs-on: ubuntu-latest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
runs-on: ubuntu-latest
runs-on: ubuntu-latest
needs: deploy-prod

You only want this to run if the deploy was successful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't match what's in setup.py. You also may want to add a comment here and in setup.py to remind forgetful folks like me that they need to be kept in sync.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#27

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants