Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bootstrap reusable workflows for initial release #1

Merged
merged 59 commits into from
Aug 2, 2024

Conversation

nwiltsie
Copy link
Member

@nwiltsie nwiltsie commented Aug 1, 2024

Description

This PR adds reusable workflows to manage the release process for our pipelines and software packages (see https://github.com/uclahs-cds/group-software-best-practices/discussions/43).

There are two workflows:

wf-prepare-release.yaml is triggered manually (via a workflow_dispatch) and takes the following actions:

  1. Compute the target version number based on existing tags and user input for major/minor/patch/prerelease.
  2. Re-write the CHANGELOG.md file to move unreleased changes into a new dated release section.
  3. Open a PR listing the target version number and release tag.

wf-finalize-release.yaml, triggered when a release PR is merged, takes the following actions:

  1. Create a new release with auto-generated notes and the target tag.
    • By default the new release is a draft, so no public release or tag are created without user intervention.
  2. Comment on the release PR with a link to the new release.

Building blocks

Version definition

In all cases, semver and not, I'm assuming that git release tags begin with v[0-9]. The corresponding version number does not have the leading v (e.g. tag v1.2.3 and version 1.2.3).

Version computation

I'm using the semver python package to compute the next semantic version. That's based on the most recent ancestor tag of the main branch and user input (major/minor/patch/prerelease) for which bump type to perform.

Non-semver repos can use the bump type exact and supply the version with exact_version, in which case no version validation is done (other than asserting that the version does begin with a digit and does not begin with a v).

Changelog manipulation

I'm using markdown-it-py and mdformat to parse/standardize the CHANGELOG files. I've got some additional parsing logic built on top of that to massage everything into the keep-a-changelog format. Aside from the obvious renaming of the unreleased section and standardization, I perform the following adjustments:

  • Version numbers in H2 headings are linked (as footnotes) to the GitHub changes since the prior release.
  • Any H2 section that begins with "Add", "Fix", "Change", or "Remove" (case-insensitive) is turned into an H3 section.
  • Any H1 section that begins with [<digit> is turned into an H2 section.
  • Any non-list-items under an H2 are grouped and turned into common changelog notices. Common changelog is a different standard than keep-a-changelog, but I like the notices.
  • H3 sections within a version are reordered as "Added", "Changed", "Deprecated", "Removed", "Fixed", and "Security".
  • Repeated H3 sections (e.g. multiple "Added" sections) within a version are merged.
  • Non-list-items not under an H3 are moved to the "Changes" section.

The above manipulations should all be idempotent. That means a badly formatted CHANGELOG will have many changes the first time it runs through this workflow but very few on each subsequent release.

Workflow data handoff

There's not a great way to pass data between two arbitrary workflows in a repository, so I settled on encoding the target version number into the pull request's branch name (e.g. automation-create-release-<version>. Unlike the PR title or comment, I do not believe the branch can be renamed (intentionally or inadvertently), so that should be safe enough.

I built off of GitHub's recommendations for triggering the second workflow when the release PR is merged. GitHub's example would run whenever any pull request was merged, so I added more conditional checks in both the example calling and the reusable workflows. This has the effect that a "skipped" workflow run will appear under the Actions tab whenever other pull requests are merged.

Event Workflow run appears? Conclusion
Pull request opened No --
Pull request closed (no merge) No --
Non-release pull request merged Yes Skipped
Release pull request merged Yes Success / Failure

Reusable workflow hackery

Unlike Composite Actions, when a reusable workflow is called from another repository it only brings along the specific workflow file and no sidecar scripts. In order to get those you have to fetch information about the current run and parse out the reusable workflow SHA from the referenced_workflows section. You can then use that with actions/checkout to get the appropriate scripts. Yes, that's frustratingly difficult, there are multiple open issues about it.

Testing

I've been kicking the tires with my testing repository:

I've made a number of releases, but please feel free to make more! It's currently setup to create draft releases, so if you want them to "take" you'll need to find the release (follow the link posted in each release PR) and publish them.

Next steps

The majority of our things-to-be-released fall into one of three categories:

  • Nextflow pipelines
  • Docker images
  • Python packages

Taking Nextflow as an example, each pipeline stores its current version inside the nextflow.config file. These workflows should update that file automatically, but I haven't gotten there yet.

Checklist

  • This PR does NOT contain Protected Health Information (PHI). A repo may need to be deleted if such data is uploaded.
    Disclosing PHI is a major problem1 - Even a small leak can be costly2.

  • This PR does NOT contain germline genetic data3, RNA-Seq, DNA methylation, microbiome or other molecular data4.

  • This PR does NOT contain other non-plain text files, such as: compressed files, images (e.g. .png, .jpeg), .pdf, .RData, .xlsx, .doc, .ppt, or other output files.

  To automatically exclude such files using a .gitignore file, see here for example.

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have set up or verified the main branch protection rule following the github standards before opening this pull request.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have added the major changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

Footnotes

  1. UCLA Health reaches $7.5m settlement over 2015 breach of 4.5m patient records

  2. The average healthcare data breach costs $2.2 million, despite the majority of breaches releasing fewer than 500 records.

  3. Genetic information is considered PHI.
    Forensic assays can identify patients with as few as 21 SNPs

  4. RNA-Seq, DNA methylation, microbiome, or other molecular data can be used to predict genotypes (PHI) and reveal a patient's identity.

@nwiltsie nwiltsie requested a review from a team August 1, 2024 22:25
@nwiltsie
Copy link
Member Author

nwiltsie commented Aug 1, 2024

Oh, the "tool-create-release" name and everything else about the branding is up for review too.

Copy link

@j2salmingo j2salmingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall well done, the only major question which can be directed to the rest of the group is just the usage of asserts. The other questions can spark discussion but should not otherwise hold up the PR.

if bump_type == "exact":
body_values["Exact version"] = exact_version

# Write the PR body into a temporary file

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: non-blocking Is there a better way of doing this, with either a template or a different file type like JSON/YAML/etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This" being the use of a temporary file, or the templating?

The temporary file is useful because multiline string outputs get complicated in Actions.

I'm open to suggestions about better ways of templating - this felt too trivial for something like Jinja2 (plus I'm not super familiar with it).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is the string.Template class, but I am not sure if it just complicates it for no real readability benefit in this case, and I would agree that bringing in an entire templating engine like Jinja2 for something like this is too heavy for my tastes.

https://docs.python.org/3/library/string.html

bumpchanges/bump.py Show resolved Hide resolved
class Version:
"Class to help manage individual releases within CHANGELOG.md files."

link_heading_re: ClassVar = re.compile(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: non-blocking: Is there an alternative to regex for this problem or are we forced into it because we're parsing unknown markdown? I'm always a little apprehensive about using regex because it requires a bit of effort to understand when it's not my code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's the unknown-ness of it that's the problem. I'm basically trying to extract a version name and a date from a string that should look like [1.2.3] - 2024-01-01. I don't know of an easier way of doing that using regexes.



def update_changelog(changelog_file: Path, repo_url: str, version: str, date: datetime.date):
"Rewrite a CHANGELOG file for a new release."

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chore, non-blocking: Not to be that guy again but docstrings should be triple-quoted. I would also suggest that for non-obvious variables that they are included in the docstring, but I'll leave that up to you if you want to do that. In this instance I think it's fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I really hoped that either black or ruff would handle this for me. You're right that PEP-257 says to always use triple-quotes, but I'm leery to entertain stylistic changes that can't be handled automatically.

I am displeased with this lack in our tooling - I'll fix it manually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed with f66c0b3.

changelog_file.read_text(encoding="utf-8")
)

for token, nexttoken in itertools.pairwise(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: I am always impressed when people use itertools, I feel like there's so many good things in there that easily solve problems whenever I need it, but I can never remember to see what I can do with itertools.

bumpchanges/changelog.py Outdated Show resolved Hide resolved
Copy link

@sorelfitzgibbon sorelfitzgibbon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

I created draft release PR. It was easy to do and easy for me to understand. I can't merge it myself to see the next step but seems it will be just as clear.

@nwiltsie nwiltsie merged commit 7c83849 into main Aug 2, 2024
9 checks passed
@nwiltsie nwiltsie deleted the nwiltsie-reusable-workflows branch August 2, 2024 23:19
@nwiltsie nwiltsie mentioned this pull request Aug 2, 2024
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants