Mojaloop Helm deployments are not compatible when deployed to ARM-arch based hosts #2317

Open
Tracked by #2459
mdebarros opened this issue Jun 29, 2021 · 9 comments
Assignees
Labels
bug Something isn't working or it has wrong behavior on a Mojaloop Core service infra-bug An infrastructure issue or bug such as in helm charts or documentation oss-core This is an issue - story or epic related to a feature on a Mojaloop core service or related to it

Comments

@mdebarros
Member

mdebarros commented Jun 29, 2021

Summary:

Testing Mojaloop Helm deployment on a 2021 Mac with the new M1 (ARM arch) CPU results in the deployment failing due to compatibility issues.

Specific issues identified are with:

  • Zookeeper <-- this can be resolved by upgrading the Docker image to a tag that supports ARM.
  • Kafka <-- Kafka fails to start up with a "Segmentation Fault". There is no clear solution at this time.
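
One way to check whether an image tag supports ARM is to look at the platforms listed in its manifest list. The sketch below assumes JSON in the shape printed by `docker manifest inspect <image>` for a multi-arch image (a top-level `manifests` array whose entries carry a `platform` object). The function name and the sample data are hypothetical and are not taken from the Zookeeper or Kafka images.

```python
import json


def supported_architectures(manifest_json: str) -> set:
    """Return the CPU architectures listed in a multi-arch manifest list."""
    doc = json.loads(manifest_json)
    archs = set()
    # A single-arch manifest has no "manifests" array, so the result is empty.
    for entry in doc.get("manifests", []):
        arch = entry.get("platform", {}).get("architecture")
        if arch:
            archs.add(arch)
    return archs


# Hypothetical sample data, written by hand for illustration only.
SAMPLE = json.dumps({
    "schemaVersion": 2,
    "manifests": [
        {"platform": {"architecture": "amd64", "os": "linux"}},
        {"platform": {"architecture": "arm64", "os": "linux"}},
    ],
})

if __name__ == "__main__":
    archs = supported_architectures(SAMPLE)
    print("arm64 supported:", "arm64" in archs)
```

An empty result would suggest the tag is single-arch, in which case the image config has to be inspected instead.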

Severity:
Low

Priority:
Medium

Expected Behavior
Mojaloop Helm deployments should start up on ARM-based hosts just as they do on AMD64.

Steps to Reproduce

  1. Buy M1 Mac
  2. Install prerequisites (Docker, etc.)
  3. Deploy Mojaloop Helm (v12.x or v13.x)

Specifications

  • Component (if known): Helm
  • Version: v12.x+
  • Platform: ARM-arch
  • Subsystem: n/a
  • Type of testing: Manual
  • Bug found/raised by: Tan Soo Leng

Notes:

  • Severity when opened: Low
  • Priority when opened: Low
@mdebarros mdebarros added bug Something isn't working or it has wrong behavior on a Mojaloop Core service oss-core This is an issue - story or epic related to a feature on a Mojaloop core service or related to it labels Jun 29, 2021
@elnyry-sam-k elnyry-sam-k added the infra-bug An infrastructure issue or bug such as in helm charts or documentation label Jun 29, 2021
@mdebarros mdebarros mentioned this issue Sep 24, 2021
21 tasks
@tdaly61 tdaly61 self-assigned this Mar 31, 2022
@tdaly61

tdaly61 commented Mar 31, 2022

Hi @elnyry-sam-k, @mdebarros: I am taking a slight side road for the next week or two to work on the mini-loop deployment, hopefully making it work with a) ML 13.1.0 and b) arm64. I have assigned this issue to myself, as I will likely need to get the Kafka chart working on ARM to do this. FWIW: I have the Kafka Docker container working on ARM. The biggest questions are where to host a purpose-built arm64 chart and how much work to put into this, given that the Bitnami folks will eventually get there (though I am not at all sure that is not still years away).

@tdaly61

tdaly61 commented Mar 31, 2022

Also FWIW: I am making progress on this. I have the Kafka and Zookeeper charts running and connecting OK.

@elnyry-sam-k
Member

elnyry-sam-k commented Mar 31, 2022

Thanks for the update, Tom. I've moved it to In Progress based on your comment.

Regarding the question, maybe we can discuss on one of the calls (I suppose we can get some EC2 instances with ARM if required on AWS)

@tdaly61

tdaly61 commented Apr 2, 2022

OK, it looks like Node.js (and so npm) is built on Chrome's JavaScript engine, V8, which is written in C++ and is therefore architecture-specific AND baked into the Mojaloop images. Unlike the external charts for Zookeeper, MySQL and Kafka, this is not a run-time issue but a "build-time" issue, and therefore harder. I am looking at options for solving this and for support, but needless to say this just got, as I say, harder.
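
Because the native binaries are fixed when an image is built, a build script needs to know which architecture it is targeting. As a minimal sketch (not anything from the Mojaloop build tooling), the helper below maps the machine name reported by the host to a Docker-style platform string such as `linux/arm64`. The function name and alias table are assumptions for illustration; the result could, for example, be passed to the `--platform` option of `docker build`.

```python
import platform
from typing import Optional

# Common spellings reported by platform.machine() / `uname -m`, mapped to the
# architecture names used in Docker platform strings.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_platform(machine: Optional[str] = None) -> str:
    """Return a 'linux/<arch>' string for the given (or current) machine."""
    name = (machine or platform.machine()).lower()
    if name not in _ARCH_ALIASES:
        raise ValueError(f"unrecognised architecture: {name!r}")
    return "linux/" + _ARCH_ALIASES[name]


if __name__ == "__main__":
    # On an M1 Mac or an arm64 cloud VM this is expected to report linux/arm64.
    print(docker_platform())
```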

@millerabel
Member

millerabel commented Apr 2, 2022 via email

@tdaly61

tdaly61 commented Apr 4, 2022

Hi @millerabel, well the good news is that MySQL, Kafka and Nginx (or your ingress controller of choice) are all configurable at deploy time, and I have all of these running on an arm64 VM in the cloud. It is more Alpine Linux and Node.js that I am working on right now, i.e. the bits that are baked into "our container images" at build time.

Your comments are very encouraging, as I am realising that this is an increasingly important AI, not just because of the Apple Mac dev issue but because of AWS Graviton and other cloud vendors (likely including Google) also heading down the ARM path. I am currently doing dev/test on the Oracle "always free tier", where they give you 4 arm64 CPUs, 24GB RAM and everything else you need to do decent dev/test, for free and with no expiration date.

That said, is there anything written down about how the small/medium/large images were going to be hosted, differentiated and tested that we could apply here for the different architectures? TIA

@millerabel
Member

millerabel commented Apr 4, 2022 via email

@tdaly61

tdaly61 commented Apr 4, 2022

Very glad the platforms are factored out and can be assembled at deploy time. We have not previously considered the hardware architecture dimension, so this is new ground. I might look to other Docker standard images to see how naming and versioning is done and adopt something similar.

Server Sizing

As for the Small/Med/Large for runtime, we would likely defer this to actual implementers. We did this as an early demonstration when we first brought up Rancher/etc to demonstrate how a runtime admin would manage changes in load in production. Miguel / Sam can likely recall that work.

I’d hear from others on this, and think we might need to anticipate different deployment scales, but leave it to a second phase of configuration work, as templates for use by implementers. The demonstrated idea was to use Rancher to “heal” the system from one structured configuration into another without downtime.

I don’t think we know enough yet about production size variability to specify this in the release code. And might be distinct choices in Azure or AWS or Google clouds that suggest cost/perf dimensions we would capture. Consider server configs for dev/sandbox/prod and single-dev laptop and of course our own AWS ci/cd cloud config that we will use directly to test and build.

What say you all?

-- Miller
Miller Abel

On Apr 4, 2022, at 3:31 AM, Tom Daly wrote:

> Hi @millerabel well the good news is that MySQL, Kafka and Nginx (or your ingress controller of choice) are all configurable at deploy-time and I have all these running on arm64 vm in the cloud. It is more Alpine Linux, and NodeJS right now that I am working on i.e. the bits that are baked into "our container images" at build-time.
>
> Your comments are very encouraging as I think I am realising that this is an increasingly important AI, not just because of the Apple MAC dev issue but because of AWS Graviton and other cloud vendors (likely including google) also heading down the ARM path. I am currently doing dev/test on the Oracle "always free tier" where they give you 4 Arm64 cpus and 24GB ram and everything else you need to do decent dev/test for free and no expiration date.
>
> Now that said, is there anything written that says how the small/medium/large images were going to be hosted and differentiated so that they could be configured in the charts at run-time? Alternatively can I assume that we would do something create image names (on dockerhub) to say arm64/mojaloop/central-ledger? Is that the sort of approach that we being discussed? If so what about testing different arch's, anything discussed or any decisions made there already?

@millerabel : one clarification from me. I was not trying to pursue the idea of small/medium/large beyond what it could help us with in supporting both x86 and arm64 architectures. cheers.
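
To make the naming question concrete, here is a small sketch of two ways an image reference could encode the architecture: a per-architecture repository prefix, as in the `arm64/mojaloop/central-ledger` example floated in this thread, or an architecture suffix on the tag. Neither scheme has been decided on here. The function, the `scheme` parameter and the `v13.1.0` tag are hypothetical and only illustrate the trade-off.

```python
def image_ref(name: str, tag: str, arch: str, scheme: str = "prefix") -> str:
    """Build an image reference that encodes the target architecture.

    scheme="prefix": architecture as a repository prefix,
                     e.g. arm64/mojaloop/central-ledger:<tag>
    scheme="suffix": architecture appended to the tag,
                     e.g. mojaloop/central-ledger:<tag>-arm64
    """
    if scheme == "prefix":
        return f"{arch}/mojaloop/{name}:{tag}"
    if scheme == "suffix":
        return f"mojaloop/{name}:{tag}-{arch}"
    raise ValueError(f"unknown scheme: {scheme!r}")


if __name__ == "__main__":
    # The tag below is a made-up example, not a published image tag.
    for scheme in ("prefix", "suffix"):
        print(image_ref("central-ledger", "v13.1.0", "arm64", scheme))
```

Either scheme needs a per-architecture value in the charts. A multi-arch manifest list under a single name would avoid that, at the cost of building and publishing every architecture for each release.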

@tdaly61

tdaly61 commented May 3, 2022

Post-conference update:
I am trying to build the images listed below and, as you can see, all are building except:

  • auth-service
  • als-consent-oracle
  • central-kms
  • ml-testing-toolkit-ui

auth-service fails in the sqlite3 build; I have not yet diagnosed the others.

found image [als_oracle_pathfinder_local] so skipping build for now
found image [finance_portal_backend_service_local] so skipping build for now
found image [transaction_requests_service_local] so skipping build for now
no existing image for [auth-service] ; building ...
Error building docker image for [auth_service_local]
found image [email_notifier_local] so skipping build for now
found image [thirdparty_api_svc_local] so skipping build for now
found image [ml_test_toolkit_local] so skipping build for now
no existing image for [als-consent-oracle] ; building ...
Error building docker image for [als_consent_oracle_local]
found image [quoting_service_local] so skipping build for now
found image [central_settlement_local] so skipping build for now
found image [account_lookup_service_local] so skipping build for now
found image [simulator_local] so skipping build for now
found image [settlement_management_local] so skipping build for now
found image [ml_api_adapter_local] so skipping build for now
no existing image for [central-kms] ; building ...
Error building docker image for [central_kms_local]
found image [buld_api_adapter_local] so skipping build for now
found image [operator_settlement_local] so skipping build for now
found image [central_ledger_local] so skipping build for now
found image [finance_portal_ui_local] so skipping build for now
no existing image for [ml-testing-toolkit-ui] ; building ...
Error building docker image for [ml_testing_tookit_ui_local]
found image [central_event_processor_local] so skipping build for now
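
A log in this format can be summarised mechanically. The sketch below assumes the two line shapes seen above ("found image [...]" and "Error building docker image for [...]"); the function name is hypothetical and the excerpt is a few lines copied from the log, not the whole output.

```python
import re

# Line shapes taken from the build output above.
FOUND = re.compile(r"^found image \[([^\]]+)\]")
FAILED = re.compile(r"^Error building docker image for \[([^\]]+)\]")


def summarise(log: str):
    """Return (found, failed) lists of image names from a build log."""
    found, failed = [], []
    for line in log.splitlines():
        line = line.strip()
        match = FOUND.match(line)
        if match:
            found.append(match.group(1))
            continue
        match = FAILED.match(line)
        if match:
            failed.append(match.group(1))
    return found, failed


# Short excerpt of the log above, used only to exercise the parser.
EXCERPT = """\
found image [transaction_requests_service_local] so skipping build for now
no existing image for [auth-service] ; building ...
Error building docker image for [auth_service_local]
found image [email_notifier_local] so skipping build for now
"""

if __name__ == "__main__":
    found, failed = summarise(EXCERPT)
    print("already built:", found)
    print("failed:", failed)
```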

4 participants