Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify the support story surrounding ppc32 in the compiler #5490

Open
aviansie-ben opened this issue Aug 25, 2020 · 8 comments
Open

Clarify the support story surrounding ppc32 in the compiler #5490

aviansie-ben opened this issue Aug 25, 2020 · 8 comments

Comments

@aviansie-ben
Copy link
Contributor

The Power codegen currently contains a lot of code specifically for dealing with 32-bit, however neither OMR nor OpenJ9 make any claims as to their support of the ppc32 architecture and no tests are being run for this code in either project. As I understand it, this code is currently only used and tested internally by IBM. In fact, according to @zl-wang, 32-bit processors aren't even supported by IBM's internal builds anymore in the first place and only 32-bit mode on 64-bit processors is supported [1]. This entire support story is quite vague and confusing and (to my knowledge) is not documented anywhere.

With the confusion surrounding what is and is not supported and without external CI tests for ppc32 being run, this makes it nearly impossible for external contributors to modify any part of the Power codegen that has special cases for 32-bit without risking breakage of IBM's internal builds. I'd suggest that we either begin running tests on ppc32 and officially support this configuration in OMR (to some extent at least) or consider deprecating support for the platform altogether and removing the code that implements it.

[1] #5346 (comment)

@zl-wang
Copy link
Contributor

zl-wang commented Aug 25, 2020

although it becomes rarer and rarer, PMRs still are coming in for 32bit JVM now and then. there is really no urgency to deprecate 32bit codegen in immediate future.

switching to support only 64bit processors was hardened before OMR even existed (java7 time). Reasons: there were huge pains supporting long atomicity on 32bit hardware which were rarely available at that time, except in embedded systems.

@aviansie-ben
Copy link
Contributor Author

While I appreciate that IBM may still have customers using ppc32, that's not the issue I'm trying to bring up here. To my knowledge, the OMR compiler does not and has never had official support for ppc32 despite having de facto support. Our only officially supported architectures and platforms in OMR are those that are given a build status in our README.

What I'm taking issue with here is the way that IBM has been using this unofficial support in its products. In fact, I know of at least one instance where a PR was rushed through the review process against my objections to fix an issue with ppc32 that only appeared in internal IBM builds [1]. Considering that we don't officially support ppc32 in this project right now, I don't think it would be acceptable to either push for such a PR to be rushed in to fix ppc32 build breakages or to block any future PR on the basis that it would break ppc32 support.

I don't necessarily think we should remove our code for handling ppc32 (despite how much old, forgotten code this would allow us to get rid of), but we really need a better story around this in OMR if IBM plans to continue using the master branch of OMR to provide ppc32 support in its JVM. That would mean running OMR CI tests on ppc32 to avoid accidental breakage and documenting the "64-bit processor in 32-bit mode" requirement, as well as what exactly such a requirement means for the compiler (can we use 64-bit instructions? what about using the upper 32 bits of registers to store 64-bit values in a single GPR?).

[1] #5351

@mstoodle
Copy link
Contributor

From @zl-wang's comment, I understand that IBM would prefer that the OMR project continue to maintain the ppc32 support. @aviansie-ben's concern is that it's currently hard to impossible to do that because OMR has no test resources for this platform and so there's no way for contributors or committers to ensure ppc32 support continues to work as the code base evolves. And the 32-bit code is "gumming" up the rest of the code.

Given these two statements, I see two basic ways forward, but happy to hear others:

  1. mostly status quo: OMR keeps the code paths on the assumption that IBM will test and fix any problems that happen. However, the "high priority" fix problem Ben points out is definitely annoying. IBM should not assume that such fixes can be "rushed" in for their release schedules.

  2. IBM could add resources and tests to run on ppc32 so that the project is better able to measure and manage quality for this platform.

In both cases, it is my strong opinion that downstream consumers of OMR, including IBM, should deal with "urgency" in their downstream release. That should not and does not automatically translate into urgency for the upstream OMR project.

Of the two, I would prefer the second approach because then OMR can support these code paths for everyone to use making both contributors and committers happier. With the first approach, IBM is basically imposing on the OMR project to keep what would be otherwise dead code around that nobody can really rely or usefully build upon. Contributors and committers alike will be frustrated. I would also like to understand how long IBM would expect that code to be kept.

I suppose a third option that would probably be more attractive to the OMR project would be for IBM to maintain 32-bit support in their downstream internal project (if that's the only place it's needed), and then the OMR project could jettison this code.

Can anyone estimate how much code we're talking about and how frequently the code paths with ppc32 support are changing?

@aviansie-ben
Copy link
Contributor Author

aviansie-ben commented Aug 27, 2020

A quick search through the code found me ~290 locations where the Power codegen checks whether it's running on 32-bit or 64-bit (either through macros or runtime checks) and at least 2300 lines of trivially ppc32-only code that's otherwise dead. For context, the entire compiler/p folder (excluding the opcode properties table) is ~53000 lines of code. Most of this ppc32-only code is evaluators for long opcodes where 32-bit has to use register pairs, linkage code, and arraycopy code, however there are many other places in the compiler with slight differences between 32-bit and 64-bit that one might not expect (e.g. the icmp* evaluators have a 64-bit-only optimization and end up using a fallback path on 32-bit).

As for how often this code is being touched, I've had to touch it numerous times in my recent refactors. No less than 5 of my PRs in the last 3 months have had to touch 32-bit only code. This may be an overestimate of how often it's normally touched, though, as I've been doing more refactors than normal as of late. However, I worry that even seemingly innocuous changes to evaluators that have any ppc32-specific special cases or any helper functions that they call could cause breakage without explicitly touching any ppc32-specific code directly, and I've seen this happen on numerous occasions. For instance, newly added hardware exploitation and codegen optimizations sometimes don't work on or require extra code to handle 32-bit.

I don't think it'd be too much maintenance work to keep the support around, as ppc32 shares a lot of code with ppc64, but without adequate testing and official support for this platform, I worry that this code could very easily be broken. I'd also like to see such support usable for non-IBM consumers of OMR, even if we don't support all features on ppc32. If we're going to keep around support only for IBM, I'd really want a concrete timeline for how long such an arrangement would need to remain.

@mstoodle
Copy link
Contributor

mstoodle commented Sep 1, 2020

Hi @aviansie-ben, how much additional testing do you think would be warranted above creating a 32-bit POWER build and running our existing tests with it (on a 64-bit machine, I suppose)? Would that be a reasonable first step to getting the project in a position to claim to support the ppc32 platform?

@0xdaryl
Copy link
Contributor

0xdaryl commented Sep 1, 2020

I wonder if this issue should be expanded to clarify the support story of PPC32 in OMR in general (not just the compiler), or at least prompt a second issue to be opened to track that?

I recall that @ymanton undertook some work a couple of years back to bring in broader support for PPC32 (see #2930). So more work may be needed to support that beyond the compiler. This was largely driven by an OpenJ9 developer trying to build and run on a Power e500_v2.

@aviansie-ben
Copy link
Contributor Author

@mstoodle I think it would be reasonable to simply add ppc32 AIX builds to our CI pipelines and start running the compiler unit tests and compiler Tril tests there. I think beyond this, we just need to start requesting the addition of more compiler Tril and unit tests along with PRs touching the Power codegen and looking at addressing the current limitations of our testing strategy in general (e.g. better symref support in Tril) to address some areas that we've historically had difficulty testing in the compiler.

@0xdaryl I'd definitely be open to expanding this issue to encompass more than just the compiler, but I don't have enough experience with other OMR components to know what the current situation is for ppc32 support there myself. Anything that's getting consumed downstream by IBM would have to have some level of ppc32 support already so it's likely that they're in a similar position to the compiler, but I don't presume to know this for a fact.

@aviansie-ben
Copy link
Contributor Author

On this topic, it seems that we've ended up with yet another ppc32 build break that I feel really illustrates some of the risks that I'm concerned about here. @zl-wang recently opened #5534 to fix it and I posted some more in-depth technical info about the problem there, but the TL;DR is that the difference in the definition of size_t between ppc64 and ppc32 caused a call to a TR::MemoryReference constructor added in #5487 to be ambiguous between two different overloads when compiled on ppc32.

I don't think there was any reasonable way that an issue like this could have been found through code review, since it involves a lot of complex C++ function overloading rules and a subtle difference between 32-bit and 64-bit builds that wasn't explicit in the code. Really, the only way this could have been detected before merging IMO would be to have CI builds for ppc32.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants