
Setup the release process #130

jrudolph opened this issue Jan 26, 2023 · 81 comments

@jrudolph
Contributor

jrudolph commented Jan 26, 2023

Apache has certain requirements for releasing a public version. We should strive for as much automation as possible. This ticket should be an overview of all the steps necessary and the progress on those items. Please add items as needed.

Let's only consider public releases like RCs and GAs here (but not snapshots).

(Please add your name and/or PRs and issues to the items as needed)

References:

Organizational steps

Maven-related steps

  • Publish to Apache Nexus into staging
  • Initiate voting
  • Propagate to Maven Central when vote is positive
  • ...
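For the Maven-related steps, a hedged sketch of what publishing to the ASF Nexus staging area could look like in build.sbt (the URLs are the standard repository.apache.org endpoints; the credential wiring via environment variables is an assumption, not the settled configuration):

```scala
// build.sbt — sketch only, not the settled configuration
ThisBuild / publishTo := {
  if (isSnapshot.value)
    Some("apache-snapshots" at "https://repository.apache.org/content/repositories/snapshots")
  else
    Some("apache-staging" at "https://repository.apache.org/service/local/staging/deploy/maven2")
}

// Credentials provided via CI secrets or the release manager's environment
// (the env var names here are placeholders).
ThisBuild / credentials += Credentials(
  "Sonatype Nexus Repository Manager", // realm commonly used by Nexus instances
  "repository.apache.org",
  sys.env.getOrElse("NEXUS_USERNAME", ""),
  sys.env.getOrElse("NEXUS_PASSWORD", "")
)
```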

Apache-related steps

  • (one-time) setup signing infrastructure (how are keys created, where can a release be signed)
  • Create signed release and upload artifacts (where and when does that happen? Can signing keys be safely used from GHA?)
  • ...
@mdedetrich
Contributor

mdedetrich commented Jan 26, 2023

I believe one thing that we have to do under Apache-related steps is that we have to copy the sources (not the generated binaries, i.e. jars in our case) to an SVN repo. In Apache lingo I think this is called a "source package", see https://www.apache.org/legal/release-policy.html#source-packages. The sources also need to be signed.

In our case it would just be the contents of the git repo.

@jrudolph
Contributor Author

I believe one thing that we have to do under Apache-related steps is that we have to copy the sources (not the generated binaries, i.e. jars in our case) to an SVN repo.

Just creating the source archive shouldn't be a big problem.

One of the main questions is whether we can trust GHA enough to make releases there and also sign those releases (it probably makes sense to do both on one machine). Also, even if Apache considers binaries only a convenience, the reality is that > 99% of all users will use the binaries from Maven Central, which after all carry a much bigger risk than the distributed source files. If we are not allowed to release from GHA, or decide against it, we would have to set up something that would allow running safe and reproducible releases from the release manager's machine (e.g. using docker).

@mdedetrich
Contributor

mdedetrich commented Jan 26, 2023

One of the main questions is whether we can trust GHA enough to make releases there and also sign those releases (it probably makes sense to do both on one machine). Also, even if Apache considers binaries only a convenience, the reality is that > 99% of all users will use the binaries from Maven Central, which after all carry a much bigger risk than the distributed source files. If we are not allowed to release from GHA, or decide against it, we would have to set up something that would allow running safe and reproducible releases from the release manager's machine (e.g. using docker).

If by release you are talking about a Maven release, I don't think there is any issue in GHA doing this. Many other Apache projects do this, and I think that everything aside from signing has already been set up in #129. For context, in order to get the snapshot deploys working I had to make an INFRA ticket to get credentials added as GitHub secrets, and in general I have seen that Apache facilitates doing as much as possible via GHA. Judging from the fact that Nexus usernames/passwords are being stored as secrets, it would be surprising if keys are treated any differently.

What I predict could be annoying is the official Apache release (even though 99% of users won't use it). I have spoken to some people involved in Apache, and apparently the proper way to do this is to manually sign it on a machine with a key that is supposed to be stored externally (e.g. on a USB device) and then upload it. Such info can be outdated though (and it has been in the past).

@pjfanning
Contributor

I have #78 and #85 open already. Do we need to consolidate these issues?

@mdedetrich
Contributor

mdedetrich commented Jan 26, 2023

I have #78 and #85 open already. Do we need to consolidate these issues?

I would say that this issue can work as a general epic meta issue where we can track the other related issues. What would be handy is if we can update the original checklist and reference these specific issues. @jrudolph Do you want to do this? I can also just edit your post.

@jrudolph
Contributor Author

Do you want to do this? I can also just edit your post.

Please edit yourself, everyone. :)

@jrudolph
Contributor Author

What I predict could be annoying is the official Apache release (even though 99% of users won't use it). I have spoken to some people involved in Apache, and apparently the proper way to do this is to manually sign it on a machine with a key that is supposed to be stored externally (e.g. on a USB device) and then upload it. Such info can be outdated though (and it has been in the past).

Of course, we can just follow the rules as given, but let's at least note how paradoxical the situation is: we would go to great lengths to sign source code securely which no one will use or look at (and which might be signed much more easily, e.g. by signing the release tag in git). On the other hand, the binaries which everyone will be running directly, and which will be much harder to verify, will be released on third-party machines in a process which can much more easily be subverted to tamper with the binaries or leak the secrets...

@mdedetrich
Contributor

Of course, we can just follow the rules as given, but let's at least note how paradoxical the situation is: we would go to great lengths to sign source code securely which no one will use or look at (and which might be signed much more easily, e.g. by signing the release tag in git). On the other hand, the binaries which everyone will be running directly, and which will be much harder to verify, will be released on third-party machines in a process which can much more easily be subverted to tamper with the binaries or leak the secrets...

Oh definitely the irony is not lost on me whatsoever especially considering that Pekko is a library and not an application.

@justinmclean
Member

justinmclean commented Jan 27, 2023 via email

@pjfanning pjfanning added this to the 1.0.0 milestone Jan 27, 2023
@justinmclean
Member

justinmclean commented Jan 27, 2023 via email

@mdedetrich
Contributor

mdedetrich commented Jan 28, 2023

I should point out that the reason the ASF does this is that it provides you with legal protection and means you are covered by the insurance the ASF has. Go outside these boundaries and you may not have that legal protection.

@justinmclean As you stated (and I suspected), such policies are likely in place due to legal reasons, but as @jrudolph said, especially in the case of Pekko and its modules there is an extremely strong disconnect between the policy and what happens in reality/practice 99% of the time (I can confirm that for the users of Pekko, almost no one is going to download/test the raw source package; they will add it as a dependency to their build tool that will be resolved via Apache's Nexus repo, and if they are going to get the source it's going to be via git on GitHub).

Of course we are going to follow this rule; that isn't up for debate. However, is there a general avenue where this can be discussed/raised?

@justinmclean
Member

justinmclean commented Jan 28, 2023 via email

@mdedetrich
Contributor

mdedetrich commented Jan 28, 2023

Even if they obtain it from elsewhere it must be based on an official ASF release

If "based on" includes users downloading generated JVM artifacts built from the same source package as the official ASF release (which will also be the same as the git repo at the checksum of the tagged release), then, like almost every other Apache JVM project that publishes JVM artifacts, yes, that will be the case.

I think the point being raised is that the Apache source package, particularly for libraries that are using git, is practically ceremonial/box-ticking. As pointed out earlier, pushing a signed git tag to signify a release (which then triggers a pipeline to upload artifacts to repositories, generated from that exact source code for that release) technically achieves exactly the same goal, especially with GitHub repos being synced with GitBox.

@justinmclean
Member

justinmclean commented Jan 28, 2023 via email

@mdedetrich
Contributor

mdedetrich commented Jan 28, 2023

Releases are not based off "pushing a signed git tag"; releases need to be manually voted on by the (P)PMC and placed in the official ASF distribution area. Please read the links I posted earlier.

I am aware that releases need to be voted on by the (P)PMC; I am talking about the steps after a release is voted on (which does currently require placing software in the Apache distribution area).

To clarify, I am talking about a hypothetical alternative for distribution after a (P)PMC vote, but I don't think this thread is a productive place for this conversation, so I will leave it there.

@pjfanning
Contributor

We will need something that essentially splits the release into 2 parts.

  • prep candidate artifacts and put them in a staging area where they can be reviewed
  • if the vote on the release fails, the candidate artifacts are removed - if the vote succeeds, the candidate artifacts are released from staging

With the Nexus part of release, we can release to Nexus staging and then after the vote, we can abandon the staged release or complete its release to Maven Central using the Nexus Repository Manager.

sbt plugins like sbt-release, sbt-ci-release, sbt-sonatype, etc. can be configured not to complete the releases - just to put them in staging.

With the source and binary distributions, there are repositories where the files can be shared. If and when the release is approved, they can be uploaded to https://dlcdn.apache.org
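A minimal sketch of the stage-only approach described above, assuming the sbt-sonatype plugin is used and that the target Nexus instance supports the Sonatype staging workflow (both are assumptions):

```scala
// build.sbt — sketch only; assumes sbt-sonatype is on the plugin classpath
// and the Nexus instance supports the Sonatype staging workflow.

// Publish into a staging repository instead of releasing directly:
publishTo := sonatypePublishToBundle.value

// Workflow, run manually by the release manager:
//   sbt publishSigned          -> artifacts land in staging for review and the vote
//   sbt sonatypeBundleRelease  -> after a positive vote: close and promote the staging repo
//   sbt sonatypeDrop           -> if the vote fails: drop the staged repo (or use the Nexus UI)
```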

@mdedetrich
Contributor

mdedetrich commented Jan 28, 2023

We will need something that essentially splits the release into 2 parts.

  • prep candidate artifacts and put them in a staging area where they can be reviewed
  • if the vote on the release fails, the candidate artifacts are removed - if the vote succeeds, the candidate artifacts are released from staging

With the Nexus part of release, we can release to Nexus staging and then after the vote, we can abandon the staged release or complete its release to Maven Central using the Nexus Repository Manager.

sbt plugins like sbt-release, sbt-ci-release, sbt-sonatype, etc. can be configured not to complete the releases - just to put them in staging.

If this is the process that we use (and it's a perfectly reasonable one) then, assuming we only want to push a git tag for an actual release that has been successfully voted on by the (P)PMC, it becomes more complicated to decide which of these plugins to use and in what order.

For example, one way of doing things would be to just use sbt +publish to put the artifacts into staging when a (P)PMC voting round is initiated for a release, and then, if it is successful, in addition to creating the Apache source package we would push a git tag which would trigger sbt-ci-release to close (i.e. promote to release) the staging repository.

If using sbt +publish is not enough, then in this instance we can use sbt-release to add some more steps; doing this is however somewhat misleading because it's not really a release but a pre-release. Alternatively, and if allowed, instead of using sbt-ci-release to promote the staging repository one can configure sbt-release to do the proper Apache software distribution release after a successful (P)PMC vote, i.e. the (P)PMC member would initiate sbt release on their machine, and if done this way sbt-release would also be responsible for pushing the git tag as well as promoting the staging repository (from a previous sbt +publish).

This alternative method I think is cleanest and probably closest to the "Apache way", i.e. sbt +publish once a release process is initiated and then sbt release after a successful vote, which will then handle everything behind the scenes. Even the naming of the various sbt commands is very clear in conveying the intent of what's going on.

The main issue I foresee with these methods is that we would likely have to resort to having a static value for the version of the project in build.sbt, rather than the current setup of getting the version from a git tag, because the git tag won't exist yet when promoting to staging. The problems with these methods can be avoided by pushing the git tag when a release process is initiated, but then we run into the problem that if a release fails we have to remove the git tag, otherwise it's not in sync with the official Apache releases.
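To make the trade-off concrete, here is a sketch of the two versioning options (names and values are illustrative, not the actual Pekko build):

```scala
// build.sbt — illustrative only

// Option A: static version, set manually before the vote is initiated.
// The git tag is pushed only after a successful vote, so staging never needs the tag.
ThisBuild / version := "1.0.0"

// Option B: derive the version from the latest git tag (e.g. via sbt-dynver's defaults).
// This requires the tag to exist *before* staging, which is exactly the conflict
// described above if the vote later fails and the tag has to be removed.
```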

@pjfanning
Contributor

pjfanning commented Jan 28, 2023

ASF projects tend to use a concept of a release manager - an actual person, who can do some documented manual steps.

We can start with having a few manual steps and automate more later.

It's more important to define a process than to tailor a process to the way that a particular sbt plugin works.

The artifacts that are voted on should be signed and we need to provide a KEYS file with the public key parts of any keys that have ever been used to sign our artifacts. From an sbt perspective, that means we need sbt publishSigned.

These keys are typically keys associated with actual people so signing the artifacts is more likely to be done on the release manager's computer than to be automated. I don't see any mechanism by which the release manager's signing key can be made available to a Github Action workflow.
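As a rough illustration of the publishSigned side (assuming sbt-pgp 2.x, which shells out to the release manager's local gpg by default; the key id below is a placeholder):

```scala
// build.sbt — sketch; sbt-pgp 2.x assumed on the plugin classpath.
// Select the release manager's personal key (placeholder id); gpg-agent
// prompts for the passphrase locally, so the private key never reaches CI.
usePgpKeyHex("0xDEADBEEFCAFEBABE")

// Then: sbt publishSigned  -> publishes artifacts together with .asc signature files.
// The matching public key must be appended to the project's KEYS file.
```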

@mdedetrich
Contributor

mdedetrich commented Jan 28, 2023

ASF projects tend to use a concept of a release manager - an actual person, who can do some documented manual steps.

We can start with having a few manual steps and automate more later.

It's more important to define a process than to tailor a process to the way that a particular sbt plugin works.

This is why I prefer sbt-release: it is manual, i.e. it has to be manually triggered on a machine by the release manager (via sbt release, or just release if you are already in the shell) to go through the documented manual steps, and as part of the process sbt-release can also interactively ask for signing keys on the release manager's machine (amongst other things).

sbt-release also has some nice quality-of-life features; for example, if your current git status is unclean (i.e. you have unstaged/uncommitted changes) it will immediately halt the release process. I think that some of these preconditions can also be configured.

The issue with sbt-ci-release is that it's triggered by git tag pushes and not manually, which means that, in addition to not being interactive (meaning it's quite limited in what we can do), if we want the git tags to be in sync with actual approved Apache releases then sbt-ci-release could only be used to promote a staging repo to release, which is kind of overkill; it would be better to just use sonatypeBundleRelease from sbt-sonatype directly.

The artifacts that are voted on should be signed and we need to provide a KEYS file with the public key parts of any keys that have ever been used to sign our artifacts. From an sbt perspective, that means we need sbt publishSigned.

These keys are typically keys associated with actual people so signing the artifacts is more likely to be done on the release manager's computer than to be automated. I don't see any mechanism by which the release manager's signing key can be made available to a Github Action workflow.

I already started asking these questions in #asfinfra.

@mdedetrich
Contributor

mdedetrich commented Jan 28, 2023

The artifacts that are voted on should be signed and we need to provide a KEYS file with the public key parts of any keys that have ever been used to sign our artifacts. From an sbt perspective, that means we need sbt publishSigned.

These keys are typically keys associated with actual people so signing the artifacts is more likely to be done on the release manager's computer than to be automated. I don't see any mechanism by which the release manager's signing key can be made available to a Github Action workflow.

Adding this as another comment as you edited yours, but in case it's not clear: sbt-release works by defining a set of already-existing steps, so nothing stops it from calling +publishSigned and providing keys as you describe. How to do this is documented at https://github.com/sbt/sbt-release#no-git-and-no-toy-projects; the linked example shows how to configure it when git isn't available (which will be our case, since the release process expects to download the source directly, not from git).

The more important point that we may be missing is that sbt-pgp (which is where +publishSigned comes from) may not expect signing keys in the same way as is documented by the ASF, so I wouldn't be surprised if we have to tailor some steps for this reason (and perhaps others).
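A sketch of what a git-less sbt-release process might look like, following the linked "no git" documentation (the exact sequence of steps is an assumption, not a settled process):

```scala
// build.sbt — sketch; sbt-release assumed on the plugin classpath
import ReleaseTransformations._

releaseProcess := Seq[ReleaseStep](
  checkSnapshotDependencies,                       // fail fast on -SNAPSHOT dependencies
  inquireVersions,                                 // ask the release manager for the version
  runClean,
  runTest,
  setReleaseVersion,
  releaseStepCommandAndRemaining("+publishSigned") // cross-built, signed artifacts to staging
  // note: no tagRelease/pushChanges steps here, per the "no git" setup linked above
)
```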

@jrudolph
Contributor Author

It's more important to define a process than to tailor a process to the way that a particular sbt plugin works.

👍 Yep, figuring out how we want and need to do the process is the most important part; we can always get the tooling to do what we want afterwards.

Note how in the happy case (release worked, positive vote) the procedure is not so different from what we had for akka-http: https://github.com/apache/incubator-pekko-http/blob/main/scripts/release-train-issue-template.md#cutting-the-release. Here we also only automated up to staging and then had some manual testing steps and a manual triggering of promotion to Maven Central.

The question is how to deal with the unhappy cases where something goes wrong with a release or the vote fails. A few alternatives come to mind:

  • don't start from a git tag but hardcode the version number before release, so it can more easily be redone (needs a good process to make sure there finally is a tag for the source code version that was used for the release)
  • skip version numbers for unpublished releases and just keep going with the next fix number
  • delete and redo git tag after fixing up problems

In the past, I have usually been quite pragmatic about it. Before a release had been announced, I was ready to just redo the tag in the git repo and restart the whole process after a fix (usually, mutating tags or the main branch will only lead to short-term hassles if done in a timely fashion). On the other hand, sometimes enough of a release had already slipped out (e.g. to Maven Central) that a new release version was necessary, in which case the process was just redone.

In general, the most principled approach would be just to skip version numbers in case of a problem. That would also have the benefit of reusing most of the past processes.

@jrudolph
Contributor Author

jrudolph commented Jan 31, 2023

We should make sure that we also help people give their approving vote according to https://www.apache.org/legal/release-policy.html#release-approval, which requires:

Before casting +1 binding votes, individuals are REQUIRED to download all signed source code packages onto their own hardware, verify that they meet all requirements of ASF policy on releases as described below, validate all cryptographic signatures, compile as provided, and test the result on their own platform.

Especially, we should clarify/understand what it means to "test the result on their own platform".

@mdedetrich
Contributor

mdedetrich commented Jan 31, 2023

We should make sure that we also help people give their approving vote according to https://www.apache.org/legal/release-policy.html#release-approval, which requires:

Before casting +1 binding votes, individuals are REQUIRED to download all signed source code packages onto their own hardware, verify that they meet all requirements of ASF policy on releases as described below, validate all cryptographic signatures, compile as provided, and test the result on their own platform.

Especially, we should clarify/understand what it means to "test the result on their own platform".

On the same note, one thing I want to explore, if we end up using sbt-release, is adding steps to the release process (initiated by sbt release) which would print out messages reminding the release manager of the external steps required.

There are some things we can automate, i.e. compiling and running the test suite (which can be done with sbt compile and sbt test, and more complex tests can also be integrated in this way), but for other things such as announcing a release, sbt-release can just print a message saying "Did you announce the release according to https://www.apache.org/legal/release-policy.html#release-announcements y/n" as a helpful reminder for the release manager of what steps need to be done.
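Such a reminder could be sketched as a custom sbt-release ReleaseStep (the wording and abort behaviour here are assumptions):

```scala
// build.sbt — sketch; sbt-release assumed on the plugin classpath
import sbtrelease.ReleasePlugin.autoImport._

// A custom step that halts the release until the manager confirms the external task.
val confirmAnnouncement: ReleaseStep = ReleaseStep(action = st => {
  print(
    "Did you announce the release according to " +
      "https://www.apache.org/legal/release-policy.html#release-announcements? [y/N] ")
  val answer = scala.io.StdIn.readLine()
  if (answer == null || !answer.trim.equalsIgnoreCase("y"))
    sys.error("Aborting: complete the announcement step first.")
  st // pass the build state through unchanged
})
```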

Note that the context behind the suggestions I am making comes down to two things:

  • It's becoming pretty clear to me that the Apache release process is quite foreign to a lot of Scala OS developers. Most Scala OS projects have quite a simple release process that has been automated as much as possible within sbt (that's why sbt-ci-release even exists in the first place). On the other hand, the Apache release process is quite manual, and to me this handholding, which hopefully can be implemented in sbt-release, can help alleviate these issues from both sides. While there is truth to what @jrudolph says, that the Apache release process is quite similar to what Lightbend was doing, we currently only have a couple of ex-Lightbend people on the PPMC, so for everyone else it's quite new. In summary, I see sbt-release as a way to handhold people through the release.
  • There is some contention between the "sbt way of doing things" and the typical JVM/Maven (talking about the build tool here) way of doing things. Fortunately or unfortunately (depending on how you look at it), sbt is designed in a way that encourages almost everything to be done within sbt. This is why plugins like sbt-publish-rsync or sbt-release even exist; normally, for your "typical JVM project", such things are outside the realm of the build tool. One can make a general claim that the way the current ASF release process is conveyed is based on the assumption that the build tool is really only for building and nothing else. This is more of a meta issue, but for me personally, if we want to justify the continued use of sbt (which for Pekko is going to be non-debatable for some time), we should use its strengths, and one of them is sbt providing a framework to automate these things; otherwise we kind of have the worst of both worlds.

Note that regarding the release process, I have helped out a couple of times with Kafka releases, and there are a lot of colleagues on my team who are Apache committers/PPMC members for various Apache TLPs, so I am continuously speaking with them to get a general idea of how the release process works in other projects, because while there is the strict policy at https://www.apache.org/legal/release-policy.html#release-approval, there is some level of bespoke tailoring depending on the TLP as long as it conforms to the ASF release process.

In the past, I have usually been quite pragmatic about it. Before a release had been announced, I was ready to just redo the tag in the git repo and restart the whole process after a fix (usually, mutating tags or the main branch will only lead to short-term hassles if done in a timely fashion). On the other hand, sometimes enough of a release had already slipped out (e.g. to Maven Central) that a new release version was necessary, in which case the process was just redone.

I think that for now what we need to decide on is how we approach git tags, because this will have an effect on how we design the release process, i.e. do we tag immediately when a release vote is started and remove the tag later if the vote fails, or do we suspend the creation of the git tag until a formal Apache release vote is approved? I have a personal preference for suspending the creation of the git tag because it's simpler in the exception case, but it's not a hill that I will die on if we go with other options.

@pjfanning
Contributor

  • we should not git tag candidate builds as if they are full releases
  • so we can tag the candidate as v1.0.0-rc1
  • when the vote passes for v1.0.0-rc1, we can add tag v1.0.0 to that same commit.
  • or we can avoid tagging the commit until we agree the release - if we think the approach above is too noisy

@mdedetrich
Contributor

mdedetrich commented Jan 31, 2023

  • we should not git tag candidate builds as if they are full releases

Is this because it breaks some part of a fundamental ASF process, or because it complicates what you mention afterwards (in the sense that if we do git tag release candidates then the last release candidate would need to support multiple git tags if we promote it to release)?

  • so we can tag the candidate as v1.0.0-rc1
  • when the vote passes for v1.0.0-rc1, we can add tag v1.0.0 to that same commit.
  • or we can avoid tagging the commit until we agree the release - if we think the approach above is too noisy

My concern here with using RCs as a backdrop for making a release is that a project can make many release candidates before doing an actual release that's voted on, in which case, because of the process, it can be a bit unclear which release candidate will turn into an actual release while it's happening.

We would also have to mandate that we always have a release candidate just before the release, which in some cases can be excessive; think of the case where we have some rc-x that has been fully tested and is likely going to be the last rc before a proper voted release, but someone makes some minor non-breaking changes afterwards, e.g. some basic documentation is added after rc-x. With this process we would have to make another rc-(x+1) and go through the entire hoopla of getting everyone to test rc-(x+1), even though it's kind of pointless because nothing of worth has changed.

Even if we communicate that that specific rc-(x+1) does not need to be tested because it only has minor documentation changes, it then becomes unclear (at least to me, the primary goal of release candidates is to encourage the community to do manual testing on that rc throughout the lifecycle of that release to try and weed out any bugs/concerns).

On a similar note, putting git tags onto release candidates can also help communicate the stage of the lifecycle of the release, i.e. generally speaking during RCs you don't want to merge major changes into the project (that happens after a release), and with git tags it's quite clear from a git log perspective whether the project is in the release stage (you just need to see whether the last git tag is a release candidate).

@Claudenw
Contributor

Take a look at https://cwiki.apache.org/confluence/display/JENA/Release+Process. This is the release process for Apache Jena. It is a much simpler project and is strictly Java-based. However, if you look at what it does with respect to the git repository, you can see the creation of tags, rollback on failure and other automated steps. I recommend that Pekko adopt something similar and that it also be written out in the wiki.

The verification is to verify that the build of the system matches the result built by the release. IMHO if you want to use GHA to build the release you could, but it will be a complex script. The verification of the build will still have to be done on other accounts, etc.

@jrudolph
Contributor Author

jrudolph commented Jan 31, 2023

when the vote passes for v1.0.0-rc1, we can add tag v1.0.0 to that same commit.

That would mean that we would still have to rebuild, because the version number is also included in the binaries. Is that what you mean? The source distribution could probably stay the same (can it? The sources might also contain the version...), but the binaries would have to be rebuilt.

While the ASF requires the source distribution to be validated before voting, we also need to make sure that the staged binaries are valid, so that process could still fail after a positive vote (if it had to be rebuilt for a new version number)...

@pjfanning
Contributor

Ultimately, we can define the version in the sbt file, we don't necessarily need to derive it from git tags.

This might better suit an ASF compliant release process.

@raboof
Member

raboof commented Feb 7, 2023

a release manager has to have control over the machine they do the release under

AFAICT the policy says the verification needs to happen on (a) machine(s) owned and controlled by the committer, but the artifact-being-verified might come from another machine.

There's some precedent for this approach in https://github.com/apache/logging-log4j-tools/blob/master/RELEASING.adoc though indeed it's early, and it might still be subject to change after this approach has been more formally described on Confluence and discussed further on the members@ list.

Doing so would mean putting your private/signing key as a secret into the CI machine which is not allowed, see https://infra.apache.org/openpgp.html#private-keyring-management.

It seems for logging infra has been able to create a separate keypair for this purpose (https://issues.apache.org/jira/browse/INFRA-23996) and gave the individual PMC members revocation rights.

For this reason it would make more sense to have a private key specific just for CI, but then this would break the paper trail to the release manager (which I presume is the main point of signing in the first place, even from an ASF legal/insurance standpoint).

When the build is reproducible, the RM (and other voters) can independently build&verify the artifact before voting, which I'd say (but IANAL) should close the loop.

Anyway, it's up to you which way to go, and I can imagine perhaps waiting until this process is more broadly vetted - just wanted to make you aware of the option ;)

@mdedetrich
Contributor

mdedetrich commented Feb 7, 2023

Thanks for the response. I'm not trying to be abrasive/obtrusive, but to me there is some mixed signalling/disconnect going on here, specifically between what is stated in the docs and what some projects do.

AFAICT the policy says the verification needs to happen on (a) machine(s) owned and controlled by the committer, but the artifact-being-verified might come from another machine.

This specifically sounds more in line with Apache's policies, but is probably not too useful for us because, at least for Pekko, we would be publishing using publishSigned, which does all of these steps in one go (artifact creation + signing + publishing), and the workflow of downloading artifacts from another machine would be cumbersome and not typical (at least for Scala/sbt/Java projects). What could be done (and came up in my head earlier) is setting up a Docker image which has JDK 11 + JDK 8 set up with JAVA_8_HOME, and which would mount the pekko-core source to make it easier for release managers to make a release (the Docker machine would be used locally on a release manager's machine). A Docker image should also help in creating reproducible builds, assuming release managers don't use the latest tag.

It seems that for Logging, Infra has been able to create a separate keypair for this purpose (https://issues.apache.org/jira/browse/INFRA-23996) and gave the individual PMC members revocation rights.

Good to know, I was thinking of doing this for -SNAPSHOT artifacts.

Anyway, it's up to you which way to go, and I can imagine perhaps waiting until this process is more broadly vetted - just wanted to make you aware of the option ;)

Many thanks for the help, I think that for now we will probably go with the solution described earlier but as you said there is always room to change especially if alternative solutions become more vetted/accepted.

@pjfanning
Contributor

pjfanning commented Feb 7, 2023

Just my view, but I would prefer if Pekko team don't try to lead on the release process side. It's much easier for a TLP to innovate on the release process. A podling, like Pekko, does not just need PPMC approval but also IPMC approval for the releases.

And sbt also restricts our options. Maven has good tools to support SBOMs and other secure release innovations; the sbt ecosystem is a bit behind the curve on this.

Once we get a v1.0.0 release out, maybe we can review the release process. But for now, we can copy the processes used by other podlings.

@mdedetrich
Contributor

Once we get a v1.0.0 release out, maybe we can review the release process. But for now, we can copy the processes used by other podlings.

This is my view as well, the only exceptions being things that are practically infeasible for technical reasons (e.g. sbt plugins not working with source packages), but even then, until we are a TLP we should be as accommodating as possible.

@spangaer

spangaer commented Feb 7, 2023

Well, this is a word storm if you have to pick it up. (I tried to consider all of the above statements, so apologies if I stupidly missed something in the following reply.)

Just want to throw something on to the table, what if:

  1. A tag in Git(Hub), instead of triggering a binary release workflow, builds the zip which is considered "the release" by the ASF.
  2. In the process of constructing that zip, one file, not in Git, is added to the archive, which holds all Git version info:
  • Git commit sha
  • Git version as defined from tag
  3. The sbt build is aware that, if the above file is present, all version info comes from that file and not from Git (the zip has no Git context).
  4. Now you have the artifact on which the "validation" and "voting" should take place.
  5. Once the zip is voted to release and published to the "official location", some automation pipeline picks up that zip from that very place and builds and publishes the Maven Central artifacts from it.

I know it doesn't naturally fall into the sbt plugin ecosystem, but I can't help but feel that this is how the authors of the Apache process envisioned it?

Clearly you should be damn sure about the release before setting the tag, because mutating tags is really not done. So step 4 above should be somewhat of a formality. It's step 5, publishing from the zip, where "the difference" from the proposed strategies sits, I think.

@pjfanning
Contributor

pjfanning commented Feb 7, 2023

@spangaer all Pekko releases for the next while will need to be voted on the Pekko PPMC but also the Incubator PMC. The Incubator PMC have a lot of podlings. It is much simpler, for now, for us to follow a similar release process to everyone else.

To summarise what a typical ASF release looks like:

  • A release manager (a person) volunteers.
  • The release manager gets agreement that it is a good time for a release.
  • The release manager builds the release artifacts that act as candidates
    • source zip and tgz
    • binary zip and tgz with the jars and dependency jars needed by them (probably also some tools to test the jars, eg shell scripts that run simple examples)
    • these zips and tgzs are published to an Apache staging website and accompanied by .asc (gpg signing) and .sha256/.sha512 (digests)
    • the release manager's public key must be added to a KEYS file that is accessible from our website (usually in git repo)
    • most ASF teams also publish the jars to repository.apache.org (a Nexus installation) - staged, as opposed to a full release - there are ways to configure sbt resolvers to use these staged jars
    • a git tag for the RC might be useful - but it might be more useful to include the git commit sha in the vote email (to avoid having a lot of tags -- you can be pretty much guaranteed that we will need to do a large number of RCs and votes for every release until we get the process right)
  • a vote is called and voters are expected to check the source and binary releases and ideally, the repository.apache.org jars
  • if vote fails, all the artifacts are removed (by the release manager)
  • if the vote passes, the same artifacts that were voted on are released - they should not be rebuilt
    • the source and binary zips and tgzs are published to the ASF release CDN (by the release manager)
    • these artifacts are removed from the staging area
    • our website is updated to announce the release and our download page is updated to access the new release artifacts (as well as asc and sha files)
    • web site is updated with latest docs and java/scala api docs
    • emails are sent to ASF announce mailing list as well as the Pekko dev and user mailing lists
    • the release manager logs into repository.apache.org and completes the release of the jars/poms to Maven Central
    • git tag should be added for this release
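The digest/signature step above can be sketched as a small helper. This is a hedged sketch: file names are illustrative, and the gpg call is commented out because it requires the release manager's personal key to be loaded.

```shell
# Produce the digest files that accompany a release artifact; the detached
# ASCII-armoured signature additionally requires the release manager's GPG key.
make_digests() {
  artifact="$1"
  sha512sum "$artifact" > "$artifact.sha512"   # "<hash>  <filename>" format
  sha256sum "$artifact" > "$artifact.sha256"
  # gpg --armor --detach-sign "$artifact"      # -> "$artifact.asc"
}

# Voters then verify with:
#   sha512sum -c "$artifact.sha512"
#   gpg --verify "$artifact.asc" "$artifact"
```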

Other than the binary zip/tgz files and setting up access to the staging and release websites, we have basically everything ready for a release manager - particularly one like me who has done ASF releases before - to do a release.

I don't see how adding lots of automation helps. The requirements currently call for a release manager who is going to have to do a lot of manual tasks. Adding automation to replace simple one-liners like gpg signing a file, and having those automations potentially be brittle - when we have loads of documentation to fix, loads of code to repackage, etc. - seems like a low priority to me.

After we get a release or 2 under our belts, we can look at the release process again - but until we become a TLP, I think we are pretty limited in straying from the current ASF norms.

@mdedetrich
Contributor

I don't see how adding lots of automation helps.

When I am talking about automation, I am talking about checks that make sure release managers don't do something that is actually provably incorrect. An example of such a check would be making sure that the private/signing key you are using is registered to an Apache ID and is inside Apache's KEYS file, e.g. https://github.com/apache/kafka-site/blob/asf-site/KEYS
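Such a check could be sketched like this (the helper name and normalisation are illustrative, not an existing Apache tool; KEYS files typically print fingerprints with spaces, hence the normalisation on both sides):

```shell
# Hypothetical pre-flight check: does the KEYS file contain this fingerprint?
# gpg prints fingerprints without spaces in --with-colons mode, while KEYS
# files usually show them spaced and upper-case, so normalise before comparing.
key_in_keys_file() {
  fpr="$(printf '%s' "$1" | tr -d ' ' | tr '[:lower:]' '[:upper:]')"
  tr -d ' ' < "$2" | tr '[:lower:]' '[:upper:]' | grep -q "$fpr"
}

# A release script could obtain the fingerprint with something like:
#   gpg --with-colons --fingerprint "$APACHE_ID@apache.org" \
#     | awk -F: '/^fpr:/ { print $10; exit }'
```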

And such a thing would definitely be helpful because accidental releases without the correct signing key (at least for binaries) have happened, even with manual checking. Humans tend to be more fallible than well-written programs (or checks, in this regard).

Not saying that such checks are critical, but they are definitely helpful especially considering how manual and foreign the process is for the current Pekko community.

@pjfanning
Contributor

pjfanning commented Feb 7, 2023

  • The voters are actually supposed to check the asc / sha files - they shouldn't be voting +1 if they haven't done checks like this
  • ASF has some automation for checking released artifacts (ones on the release downloads CDN) - I know this is a bit late but in the real world the asc and/or sha files can be regenerated and uploaded to the CDN
  • having our own automated checks sounds useful too but there is also a lot of other stuff to do.

Approx 10 git repos with dozens of modules overall to get redocumented, repackaged, retested. And a clock ticking with regards to the end of Akka v2.6 support - we really need to have the v1.0.0 releases started in a couple of months because the ecosystem of Akka-based libs is not likely to even look at Pekko until we get that out.

@mdedetrich
Contributor

mdedetrich commented Feb 7, 2023

  • The voters are actually supposed to check the asc / sha files - they shouldn't be voting +1 if they haven't done checks like this
  • ASF has some automation for checking released artifacts (ones on the release downloads CDN) - I know this is a bit late but in the real world the asc and/or sha files can be regenerated and uploaded to the CDN
  • having our own automated checks sounds useful too but there is also a lot of other stuff to do.

Approx 10 git repos with dozens of modules overall to get redocumented, repackaged, retested. And a clock ticking with regards to the end of Akka v2.6 support - we really need to have the v1.0.0 releases started in a couple of months because the ecosystem of Akka-based libs are not likely to even look at Pekko until we get that out.

I know that voters are meant to check the signatures; my point is that there is no harm in adding an additional automatic check, which is much less likely to fail. Again, reiterating my point about humans making mistakes.

Approx 10 git repos with dozens of modules overall to get redocumented, repackaged, retested. And a clock ticking with regards to the end of Akka v2.6 support - we really need to have the v1.0.0 releases started in a couple of months because the ecosystem of Akka-based libs are not likely to even look at Pekko until we get that out.

Yes, this is a fair point; I am spending most of my time on getting the package renaming/code changes done.

@jrudolph
Contributor Author

jrudolph commented Feb 8, 2023

Thanks for all that input. I fully agree we should make sure we stay on the critical path to get a first release out and optimize later on. It's quite useful to have multiple alternatives already evaluated here but let's stay mostly on the well-known path even if that requires a fair bit of manual work. In that regard, I would almost fully support @pjfanning's suggestion.

What I would still like to avoid is for the release manager to have to paste any commands. We can have a simple setup using shell scripts that does all the required steps, helps ensure that no steps are missed, and makes them easy to repeat.

I would not worry about the technicalities of source releases; these require a few steps, but they should be easy to script. I don't think we should even build them into sbt, because we just don't have to (it requires absolutely no information that only the sbt build has). The script should use the same environment variables that sbt uses for providing the GPG keys, but other than that it should be easy to do in a script (easier than in sbt).
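A minimal sketch of such a script, assuming a `git archive`-based source package and illustrative project/file names (`git archive` produces exactly the committed tree with no `.git` directory; the gpg call is commented out because it needs the release manager's key):

```shell
# Build the source release tarball from an existing git tag, then digest it.
# Signing (commented out) would use the same GPG key sbt-pgp is configured with.
make_source_release() {
  version="$1"
  out="apache-pekko-${version}-incubating-src.tgz"
  git archive --format=tar.gz \
      --prefix="apache-pekko-${version}-incubating-src/" \
      "v${version}" -o "$out"
  sha512sum "$out" > "$out.sha512"
  # gpg --armor --detach-sign "$out"
}
```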

@mdedetrich
Contributor

mdedetrich commented Feb 8, 2023

What I would still like to avoid is for the release-manager to have to paste any commands.

I think the issue here is that the Apache release process expects release managers to manually paste commands. My own view is that I do want to automate this.

I would not worry about the technicalities of source releases, these require a few steps but they should be easily scripted. I don't think we should even build them into sbt because we just don't have to (because it requires absolutely no information that only the sbt build has). The script should use the same environment variables that sbt uses for providing the GPG keys but other than that it should be easy to do in a script (easier than in sbt).

One reason I wanted to use sbt is that I want to reuse sbt-pgp to sign both the source package and the JVM artifacts. This would enforce that the same (and correct) key is used for both and would also mean that the key information (i.e. how to lookup the private/signing key) only needs to be in one place.

This would require some upstream changes to sbt-pgp but its actually not that difficult because sbt-pgp is just a wrapper around gpg anyways. I am willing to take this up but aside from scoping I have decided not to spend time on this before the 1.0.0 release because @pjfanning is correct here, the highest priority is to get that 1.0.0 release out even if the first release is manual.

@Claudenw
Contributor

Claudenw commented Feb 8, 2023

I think the way to proceed here is to put all the bits into open issues. Flag the ones critical for release 1 and let's work on them (not that I actually do much work). My original thought was to put them into a project so that we can see what needs to be done and be very clear about what does not need to be in 1.0.

I agree that the build should be manual to start. Only after it has been done a few times will it make sense to try to automate, as only after the first few times will we know where the pain points are.

@justinmclean
Member

justinmclean commented Feb 9, 2023 via email

@mdedetrich
Contributor

Hi,
If that's the case then, as you said, it's kind of arbitrary and doesn't matter, as you would just create your own private key and publish it to some key repo. I just wanted to confirm with ASF whether that's the case or whether we should use the same key as the source package for Maven releases (which does provide an actual benefit, as you can securely confirm that a Maven release artifact is signed with the same key as Apache's official source package).
The release manager uses their own KEY, see https://infra.apache.org/release-signing.html - Justin

Yes, this is clear; we are talking about enforcing the use of that same release manager's key for signing the JVM jar artifacts that will be published to Apache's Maven repository (which is considered a convenience package).

@pjfanning
Contributor

pjfanning commented Feb 9, 2023

@mdedetrich I'm not dead set against having a docker image, but I'd prefer to start by documenting what the release manager needs installed. It could be easier to just let the release manager check their own computer. My main concern is gpg and the ~/.gnupg folder.

In practice, the release manager needs:

  • java 8 and java 11 installed - I like sdkman
  • sbt installed - the project/build.properties controls which sbt version is used in the build, but you need to have an sbt runtime installed - sdkman is also good for this
  • gpg - maybe define a version minimum
  • the release manager has created a secret key, pushed its public part to a keyserver - and updated the KEYS file (usually kept in git)

I'm not sure that a docker image makes this easier. I know you can mount local dirs when you start a docker container.

@mdedetrich
Contributor

mdedetrich commented Feb 9, 2023

java 8 and java 11 installed - I like sdkman

Yeah, this is the issue: not everyone uses sdkman (for example I don't; I use jenv to switch between JDKs). Release managers have different OSes, some of which can handle having multiple JDKs installed at once and others not. At least having a basic Dockerfile that can set this up would, I think, solve a lot of annoyance/pain even for a basic initial release. To complicate things further, technically speaking you don't even need multiple JDKs installed; you actually need JDK 11 installed and the extracted contents of JDK 1.8 somewhere on the system.

A Dockerfile can cleanly and reliably abstract over this mess.

sbt installed

This we can defer to the sbt installation documentation. Thankfully, sbt is pretty ergonomic nowadays, i.e. it will automatically fetch and use the correct sbt version per project.
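For reference, the per-project pinning works through a one-line properties file that the sbt launcher reads before fetching the matching sbt version (the version number shown is illustrative):

```properties
# project/build.properties
sbt.version=1.8.2
```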

gpg - maybe define a version minimum

Can also defer to official Apache GPG documentation (i.e. https://infra.apache.org/openpgp.html)

I'm not sure that a docker image makes this easier. I know you can mount local dirs when you start a docker container.

I have done this before and it's a lot easier than it sounds, and Apache Daffodil already does this. The Dockerfile can build the environment, it's a single command to copy the contents of the Pekko source into the docker container, and the entire release process can be done within the container.

@sam-byng
Contributor

Hi there,
Just read through this and I'm wondering what the next steps are here and where we need further investigation.

It looks like we've got a clearer picture of the organizational / maven / apache related steps required here, so the original description checklist should be updated. It also seems that discussions here may resolve some of the questions in #78, so that card could do with a status update.

Echoing @Claudenw, could outstanding subtasks be drawn up as issues in the 1.0.0 milestone?

@pjfanning
Contributor

This task is done by a PMC member. We have mentors and PMC members who have release-managed Apache projects before. I'm not sure this needs to be finalised before the first release. The release manager can readily write up what they do. Releasing is not that complicated if you've done it before.

@mdedetrich
Contributor

One technical thing we can do is integrate the docker image that @jrudolph helped set up in #188. We can work more on his branch to clean it up and make it more professional.

I would say this docker image is a requirement because of how complex the release process of Pekko core will be (i.e. requiring multiple JDKs, etc.). We also need to test that the signing works properly with gpg (I am in the process of setting up an Apache master key for releases, but it will be stored on a YubiKey).

@pjfanning
Contributor

pjfanning commented Apr 17, 2023

I don't understand 'master' key. The releases are signed by the release manager's personal key. The public parts of the keys that are used for signing have to be added to a KEYS file that we make accessible from our download page. The KEYS file is usually also checked into the main git repo for the project.

Examples:

@mdedetrich
Contributor

I don't understand 'master' key. The releases are signed by the release manager's personal key.

That's what I meant.

@justinmclean
Member

justinmclean commented Apr 18, 2023 via email

@mdedetrich
Contributor

mdedetrich commented Apr 18, 2023

Hi, Just a reminder that the Incubator PMC will need to vote on your release. They will also likely use their own tools and methods of checking rather than any automation/scripts that you provide. In my experience automation can be helpful, but people can put too much faith in it and it can easily miss issues in the release. Kind Regards, Justin

So the specific automation we are talking about right now is just about creating a reproducible environment so that we can make deterministic builds for a release, which I would argue is necessary for us considering how complex the setup for creating a Pekko build is (if we don't do this, at best we waste a lot of release managers' time on actually making a release, and at worst we will create builds that differ in subtle ways depending on whose machine is making the release).

As you pointed out however any additional automation is likely not necessary at least when it comes to the Incubator PMC voting on our release.

@pjfanning
Contributor

@jrudolph @mdedetrich I've set these up for the RCs and releases, respectively.

You can use svn co <url> to check out these dirs.
If you commit changes, you need to use your Apache username and password.

Have a look at the https://dist.apache.org/repos/dist/dev/incubator and https://dist.apache.org/repos/dist/release/incubator pages to look at other incubator projects and see what they have published.

I still need to look into what, if anything, else needs to be done to link our release dir above so that everything published to it gets properly loaded up to the Apache download and archive CDNs. It may be enough to have the dirs set up like this, or I might need to find and update some config setting somewhere with the URL I've set up.

@pjfanning
Contributor

pjfanning commented May 27, 2023

I've created https://github.com/apache/incubator-pekko-site/wiki/Pekko-Release-Process (initially started in my fork but moved on request, to facilitate collaboration).

There is a lot more work and detail needed. Initially, I'm focusing on the non-technical pieces like the sequence of events.

Building the release artifacts is by far the easiest bit.

@mdedetrich
Contributor

@pjfanning Can you put it on https://github.com/apache/incubator-pekko-site so that others can edit it?

@pjfanning
Contributor

@pjfanning Can you put it on https://github.com/apache/incubator-pekko-site so that others can edit it?

sure
