Add support for ppc64le #12449

Open
valen-mascarenhas14 opened this issue Jan 3, 2024 · 23 comments
Labels
area/build Build or GithubAction/CI issues type/feature Feature request

Comments

@valen-mascarenhas14

Summary

Requesting that the Argo team create official images of argo-workflows for ppc64le.

Use Cases

Our use case involves building argo-workflows on a Kubernetes cluster with the ppc64le architecture. The Argo image serves as a crucial dependency for facilitating seamless integration and testing of Kubeflow pipelines on ppc64le.

Proposal

Change build infrastructure to build ppc64le variants of argo-workflows.


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

@valen-mascarenhas14 valen-mascarenhas14 added the type/feature Feature request label Jan 3, 2024
@valen-mascarenhas14
Author

Our goal is to make workflow-controller, argocli, and argoexec images multi-arched.

We propose adding
platform: [ linux/amd64, linux/arm64, linux/ppc64le ]
to the following line in the YAML file at link, to enhance the image release pipeline with ppc64le architecture support.
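
As a rough illustration of what adding a platform to the matrix implies, the sketch below expands a platform list into one build invocation per image and platform. The image names come from the comment above; the registry prefix `registry.example/argo` and the `dev` tag are hypothetical placeholders, and the actual workflow may invoke the build differently.

```python
from itertools import product

# Platforms from the proposal above; linux/ppc64le is the new entry.
PLATFORMS = ["linux/amd64", "linux/arm64", "linux/ppc64le"]
# Images named in the comment above.
IMAGES = ["workflow-controller", "argocli", "argoexec"]


def build_commands(images, platforms, registry="registry.example/argo", tag="dev"):
    """Return one `docker buildx build` argv list per (image, platform) pair.

    `registry` and `tag` are hypothetical placeholders, not the project's
    real values.
    """
    commands = []
    for image, platform in product(images, platforms):
        arch = platform.split("/", 1)[1]  # "linux/ppc64le" -> "ppc64le"
        commands.append([
            "docker", "buildx", "build",
            "--platform", platform,
            "--tag", f"{registry}/{image}:{tag}-{arch}",
            ".",
        ])
    return commands


if __name__ == "__main__":
    for argv in build_commands(IMAGES, PLATFORMS):
        print(" ".join(argv))
```

Each added platform multiplies the number of builds by the number of images, so a third platform means three more image builds per run.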

@terrytangyuan
Member

The Argo image serves as a crucial dependency for facilitating seamless integration and testing of Kubeflow pipelines on ppc64le

Can you point me to where KFP relies on it?

@valen-mascarenhas14
Author

Sure @terrytangyuan. This link shows how we set up our cluster for testing.

This link shows the steps to install Argo.

This is the yaml file that has both the dependencies (https://github.com/argoproj/argo-workflows/releases/download/v3.5.2/install.yaml)

@terrytangyuan
Member

I meant documentation around the requirement of ppc64le.

@ghatwala

ghatwala commented Jan 3, 2024

hey @terrytangyuan, we are trying to enable a ppc64le pipeline in our local prow cluster; details are in this issue: GoogleCloudPlatform/oss-test-infra#1972 (comment)

@terrytangyuan
Member

I see. If it's just as simple as adding one more platform in the workflow, then I don't see why we shouldn't support it.

@terrytangyuan
Member

Would you like to make the change and test if it works in your fork before submitting the PR?

@ghatwala

ghatwala commented Jan 3, 2024

Yes, we could. Requesting @valen-mascarenhas14 to try it once via a fork and then submit a PR here.

@agilgur5 agilgur5 added the area/build Build or GithubAction/CI issues label Jan 4, 2024
@agilgur5
Contributor

agilgur5 commented Jan 4, 2024

I see. If it's just as simple as adding one more platform in the workflow, then I don't see why we shouldn't support it.

IMO, I would probably reject this for the same reasons as RISC-V in #12067 (i.e. lack of usage and required maintenance and build time, same reason why Argo CD is reticent to add more) and recommend an unofficial build as I did in #12067 (comment)

hey @terrytangyuan, we are trying to enable a ppc64le pipeline in our local prow cluster; details are in this issue: GoogleCloudPlatform/oss-test-infra#1972 (comment)

I'm a little confused here, that's not in k8s test-infra, that's for GCP. Are Google and Kubeflow planning to support ppc64le officially? (in particular, GCP supporting IBM PowerPC seems odd) If so, that could shift my opinion, but that wasn't really clear to me from the issue

@ghatwala

ghatwala commented Jan 4, 2024

hi @agilgur5 - regarding the below, there are multiple kubeflow components already supported on ppc64le; more details in this umbrella issue - kubeflow/community#781

I'm a little confused here, that's not in k8s test-infra, that's for GCP. Are Google and Kubeflow planning to support ppc64le officially? (in particular, GCP supporting IBM PowerPC seems odd) If so, that could shift my opinion, but that wasn't really clear to me from the issue

@valen-mascarenhas14
Author

@terrytangyuan I've created a fork and tried building the ppc64le-specific argo-workflow images. It successfully builds the images on ppc64le.
Here's the workflow link.

I'll go ahead and raise a PR if all looks good

@terrytangyuan
Member

Let's get @agilgur5's agreement before submitting the PR.

@sarabala1979
Member

Agree with @agilgur5

IMO, I would probably reject this for the same reasons as RISC-V in #12067 (i.e. lack of usage and required maintenance and build time, same reason why Argo CD is reticent to add more) and recommend an unofficial build as I did in #12067 (comment)

IMO align with @agilgur5
If there is a significant number of Argo Workflow users requesting a certain architecture, we can add it to the Argo Workflow build process. If not, users can fork and build the image, then update the image path in an issue so that other users can use it.

@agilgur5
Contributor

agilgur5 commented Jan 4, 2024

If not, users can fork and build the image, then update the image path in an issue so that other users can use it.

Yea in #12067 (comment) I even suggested that we could host such unofficial builds for less common architectures in argoproj-labs as well.

Those can be automated to run on every release of Workflows. That doesn't necessarily even require a fork (per se), just a CI/GitHub Actions process
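
A minimal sketch of the bookkeeping such an automated process would need: given the upstream release tags and the builds already published, work out which (tag, architecture) pairs are still missing. The tag values and the architecture list are illustrative assumptions; a real job would obtain them from the release feed and the image registry.

```python
def missing_builds(release_tags, published, extra_archs):
    """Return (tag, arch) pairs that have no unofficial build yet.

    release_tags: upstream release tags, in release order
    published:    set of (tag, arch) pairs already built
    extra_archs:  architectures not covered by the official images
    """
    return [
        (tag, arch)
        for tag in release_tags
        for arch in extra_archs
        if (tag, arch) not in published
    ]


if __name__ == "__main__":
    # Illustrative values only.
    tags = ["v3.5.1", "v3.5.2"]
    done = {("v3.5.1", "ppc64le")}
    for tag, arch in missing_builds(tags, done, ["ppc64le", "riscv64"]):
        print(f"needs build: {tag} {arch}")
```

Running this on a schedule, or on each upstream release, keeps the extra builds out of the main repository's CI.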

there are multiple kubeflow components already supported on ppc64le; more details in this umbrella issue - kubeflow/community#781

If I'm reading that issue correctly and some of the affiliations here and there, this sounds to be driven exclusively & entirely by IBM? RISC-V is more open than PowerPC (and both are RISC ISAs), so if we're going to support one, it would make sense to support both as experimental.
There are still quite a few more phases therein too and no real user surveys provided as supporting evidence. Some of the other upstream deps seem to have expressed the same concerns.

@lehrig

lehrig commented Mar 14, 2024

@agilgur5 thanks for looking into this.

  1. you are right that we at IBM are driving this, probably to the greatest extent, as we have seen significant demand in the market for this. IBM Power customers love using open source these days, which has created this demand. Red Hat as part of IBM is further increasing such demands. IMO this is also good for widening the argo community to a wider market, in particular by not pushing those demands to tekton as alternative. We'd like to have all options on the table, so customers can select the best option in their context.
  2. given 1., we can also commit to help maintain this line of work.
  3. as for user surveys, the umbrella issue kubeflow/community#781 (Umbrella Issue: Porting Kubeflow to IBM Power (ppc64le)) is at least one of the highest-rated issues; it was also upvoted by a significant number of non-IBM Kubeflow community members. Is this what you are looking for?

@agilgur5
Contributor

agilgur5 commented Mar 14, 2024

IMO this is also good for widening the argo community to a wider market

I don't speak for or represent Argo (my words are my own), but to be fair, it is one of the largest CNCF projects already.

by not pushing those demands to tekton as alternative

Other projects in the ecosystem like Tekton supporting PowerPC and having user adoption of that are good arguments. For the latter, no data has been presented. The former took me a bit to find (there are no binaries in the GH releases), but Tekton does seem to make builds for PowerPC (but not RISC-V?). Although idk if all its components support PowerPC.

Related is that Kubeflow Pipelines is still on an old, unsupported version of Argo (kubeflow/pipelines#9301, kubeflow/pipelines#8935, kubeflow/pipelines#8942, etc). Even if Argo were to start building for PowerPC, KFP still wouldn't be able to use those images as they'd only be for supported Argo versions.
I'm unsure if the KFP Tekton fork is up-to-date on a version of Tekton that supports PowerPC.

Kubeflow also doesn't yet fully support arm64 (kubeflow/kubeflow#2337) (which has been increasingly popular due to Apple silicon).

Is this what you are looking for?

No. I mentioned that issue myself in my previous comment. It does not have any user surveys. Upvotes are not particularly nuanced as a signal (also, there are issues in Argo with many more upvotes if that measure were to be used exclusively; this one only has 4 upvotes. The Kubeflow issue is also for all of Kubeflow, not KFP specifically, as KFP users are a subset of Kubeflow users -- I have seen many that don't use KFP).

As an example, Argo does have a roughly yearly survey that has size & scale of users mentioned

of non-IBM Kubeflow community members

As there is no survey data or similar, that statement is difficult to quantify. How many organizations are using or interested in PowerPC and KFP (or Argo in a different fashion) on PowerPC? There is no data on that and nearly all comments are from IBM.

Also there still hasn't been a counter-argument presented for why PowerPC and other less common architectures could not be hosted in an unofficial builds repo. As I wrote above, that could be hosted within argoproj-labs. IBM could also host one for IBM-led archs (including s390x)

@gerrith3

@agilgur5 one comment you made was about argo already having good acceptance, and that being kind of a counter to the need for adding Power support. But the whole point of a graduated CNCF project is to make it accessible to all CNCF community members for use and for contribution. And, as with any open source project, different end users or middlemen in the overall process contribute based on their own needs. In IBM's case, the feature that we contribute and support is typically support for additional architectures like ppc64le and s390x. Is the argument here that supported architectures are not a feature of the project? Communities typically evolve to serve all members and there are a lot of features that IBM might not need or endorse, but generally we wouldn't block them simply because we didn't think they were necessary or have proof that they were "widely enough" used, whatever that litmus test might be.

And, you do point to one of the challenges that IBM does have with open source communities - our end users (typically customers) are not very interactive in open source communities and thus we wind up as their proxies, which is more painful than you might expect. ;) But part of the challenge here is that we have built an ecosystem of $xxxM of product into broader ecosystems that involve billions of dollars in things like banking and finance or health care, etc. etc. Odds are the users of Power systems include every open source developer with a bank account or credit card, because that's the type of workloads that IBM Power and IBM Z can often be found in, running the world's largest and often most secure installations.

So, IBM voted, on behalf of its customers, via CNCF support for ArgoCD either directly or with Red Hat as a partner, we voted with Red Hat when we jointly created and released GitOps for Power, we vote when we have multiple developers engaging with open source communities and spend our dollars because our customers tell us that they want these capabilities.

I'm honestly not sure how we can get you the information you requested on votes - ideally someone like IDC or Gartner would have those connections into customers, but we wouldn't be out here advocating for changes like this if they weren't requested by what will be your own end users. ;)

Finally, with your argument of hosting things for Power and Z in cloned repositories, consider that we've worked with some >40,000 open source communities on Power in the past year alone. Cloning all of those projects and maintaining them with IBMers would be totally ridiculous in terms of cost and effort and the complete and total opposite of the point of open source. As you are probably aware, IBM has contributed to open source since the 90's at least, and probably even since the 60's, and in a volume proportionally larger than most other large companies throughout the years. We believe in open source, we believe in collaborations like the Linux Foundation, Apache Foundation, CNCF, etc. etc. Trying to clone all of GitHub for ppc64le and for s390x would be a massive waste of world brainpower, disk space and compute power. ;) What seems hard at the beginning but is ultimately easier is finding the right way to collaborate and embrace the promise of open source.

Sorry for the soap-boxy response, but I wanted to provide a little bit of a view from "the other side." :)

thanks for listening!

@agilgur5
Contributor

agilgur5 commented Mar 21, 2024

Is the argument here that supported architectures are not a feature of the project?

No, the question is and has been (this issue was opened as a feature request and still is one) why should those architectures be officially supported as part of the core? Especially so when we don't have any tests on those architectures nor contributors actively running on those architectures (as another form of testing). We have either or both for the currently supported architectures.
Again as was mentioned above, other projects said very similar things (e.g. pyca/cryptography#7723)

As was already written, every feature incurs a maintenance burden, and Workflows already has some efforts ongoing to reduce that burden and move things into user-land (e.g. kubeflow/notebooks#92, #12694) as well as get more active contributors (c.f. the Sustainability Effort). Every project has to make a decision about the trade-offs of and priority of a feature, and if it can be implemented easily in user-land, that substantially decreases any priority or rationale for it to be supported in the core. As I wrote there, CD is also reticent to add more architectures for a similar trade-off of lack of usage vs. required maintenance and build time.

Finally, with your argument of hosting things for Power and Z in cloned repositories

That is similarly not what was said here. A separate build repo was suggested, similar to Node.js's https://github.com/nodejs/unofficial-builds/, which is a significantly larger project than Argo.

would be a massive waste of world brainpower, disk space and compute power.

That suggestion would in fact waste fewer resources, compared to builds for every commit on main for infrequently used architectures. Making all builds more complex and longer for infrequently used architectures is an argument against this feature.
The same can be said about core maintenance and support for an untested build, as opposed to an explicit unofficial builds repo, which can also have separate maintainers etc.

I'm honestly not sure how we can get you the information you requested

As was already written, given that OSS projects, including CNCF projects like Argo and many others, are able to survey their users, I would think that IBM would definitely have the resources to do the same thing.

Sorry for the soap-boxy response

For context, I am a volunteer who has put thousands of hours of unpaid time into OSS communities, including Argo & CNCF (and all of that is readily available, publicly accessible information). A significant portion of OSS is run by passionate volunteers & hobbyists.
And that's about all I'll say on that point, it is preferred to keep things not personal, on topic, and focused on problem-solving and concrete data. Marketing statements by corporations are also generally discouraged and CNCF requires vendor neutrality (parts of your comment are definitely pretty close to the line).

As I have done a few times now, I would ask that all questions asked and concerns raised be addressed. Multiple IBM employees have yet to do so.

@lehrig

lehrig commented Apr 17, 2024

@agilgur5 - thanks again; let's summarize where we are:

  1. Is the gist & potential next actions to run a survey looking for potential users of argo on ppc64le? ("If argo was available on ppc64le, would you use it?")
  2. If so, what would be a ball park number of users we need to drive this forward?
  3. Should we (IBM) run such a survey with our customers? Or would it make sense to incorporate a question for supported architectures in the yearly Argo survey?

@agilgur5
Contributor

agilgur5 commented Apr 17, 2024

  1. That's one next action.
    1. I imagine IBM might want to also ask about s390x, and Argo would also want to ask about RISC-V and others.
    2. No IBM folks have answered the questions around a separate build repo similar to https://github.com/nodejs/unofficial-builds/. That can be done today, no survey or extra info needed. Some of y'all could potentially even be maintainers of such a repo. I can also personally sponsor the inclusion of that repo into argoproj-labs (as well as approve docs PRs to link to it).
      1. We'd also probably add other experimental builds there, such as RISC-V arch builds and FIPS 140-2 validated crypto builds. Any experimental builds can be hosted there without too significant of a need (including s390x, for example).
      2. At this point I'm considering making that repo myself, but tbh given that contributors don't seem to be willing to do that themselves, it honestly begs even more the question of its real utility, usage, and necessity.
    3. Another prior question, how would this arch be tested?
  2. I can't make that determination alone, but a very rough back of the napkin estimate off the top of my head would be several large orgs (5-7+) or many smaller orgs (15-20+)
  3. We could do both. I'm not sure how much overlap there is with regard to survey answerers, especially potential users. Like I imagine an IBM survey would get more responses regarding ppc64le support than an Argo survey. I also have not been involved in prior year surveys (I've been a contributor for about a ~year rn).
    1. cc @caelan-io I think Pipekit was helping run / analyze / summarize the survey? (as I see you were author of last year's summary)

@agilgur5
Contributor

agilgur5 commented Jul 30, 2024

would be a massive waste of world brainpower, disk space and compute power.

That suggestion would in fact waste fewer resources, compared to builds for every commit on main for infrequently used architectures. Making all builds more complex and longer for infrequently used architectures is an argument against this feature.

To put some concrete numbers to this, from a recent build on main:

  • the 3 amd64 builds take between 2m - 4m
  • the 3 arm64 builds take between 18m - 35m

So cross-compilation for arm64 already takes around an order of magnitude longer. I imagine that less used architectures may take even longer (fewer cross-compilation optimizations).

Those are very real numbers that do non-trivially affect the length of our existing release process (and I've waited on the arm64 builds more than once; at least every time I release a patch version I notice this)
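
To make the "order of magnitude" estimate concrete, here is a small worked calculation using only the durations quoted above. Comparing range endpoints and midpoints is an assumption made for illustration, since per-build timings are not given.

```python
# Build durations in minutes, taken from the figures quoted above.
AMD64 = (2, 4)    # fastest, slowest of the 3 amd64 builds
ARM64 = (18, 35)  # fastest, slowest of the 3 arm64 builds


def slowdown_range(native, cross):
    """Return (smallest, largest, midpoint-based) slowdown of cross vs native."""
    smallest = cross[0] / native[1]  # fastest cross build vs slowest native
    largest = cross[1] / native[0]   # slowest cross build vs fastest native
    midpoint = (sum(cross) / 2) / (sum(native) / 2)
    return smallest, largest, midpoint


if __name__ == "__main__":
    lo, hi, mid = slowdown_range(AMD64, ARM64)
    print(f"arm64 is roughly {lo:.1f}x to {hi:.1f}x slower (midpoints: {mid:.1f}x)")
```

With these figures the slowdown falls between 4.5x and 17.5x, and the range midpoints differ by about 8.8x, which is consistent with "around an order of magnitude".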

@terrytangyuan
Member

terrytangyuan commented Aug 5, 2024

@lehrig and others from RH/IBM - Could you reach out to me via RH/IBM Slack?

@terrytangyuan
Member

Related is that Kubeflow Pipelines is still on an old, unsupported version of Argo (kubeflow/pipelines#9301, kubeflow/pipelines#8935, kubeflow/pipelines#8942, etc). Even if Argo were to start building for PowerPC, KFP still wouldn't be able to use those images as they'd only be for supported Argo versions.

Quick note - this has been fixed and KFP upgraded to 3.4+ kubeflow/pipelines#10568.

Projects
None yet
Development

No branches or pull requests

7 participants