-
Notifications
You must be signed in to change notification settings - Fork 729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build JNI IFA switching helpers as C calls #2044
Conversation
f979881
to
39c57a0
Compare
@@ -66,6 +66,7 @@ JIT_PRODUCT_SOURCE_FILES+=\ | |||
compiler/z/codegen/J9MemoryReference.cpp \ | |||
compiler/z/codegen/J9S390CHelperLinkage.cpp \ | |||
compiler/z/codegen/J9S390PrivateLinkage.cpp \ | |||
compiler/z/codegen/J9S390JNILinkage.cpp \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll need to modify the CMake builds as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see CMake lists contain these cpp
file names. It seems to operate on directories for z codegen.
runtime/compiler/runtime/Runtime.cpp
Outdated
JIT_HELPER(jitPreJNICallOffloadCheck); | ||
JIT_HELPER(jitPostJNICallOffloadCheck); | ||
JIT_HELPER(fast_jitPreJNICallOffloadCheck); | ||
JIT_HELPER(fast_jitPostJNICallOffloadCheck); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for deviation from the naming convention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to use the same name as the VM helper name for consistency. Didn't realize that the JIT helpers have a naming convention. I'll change it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm keeping the fast_
in the names because these two lines are effectively extern c fast_*
and referring to the VM helpers.
@@ -0,0 +1,70 @@ | |||
/******************************************************************************* | |||
* Copyright (c) 2000, 2018 IBM Corp. and others |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copyright needs to be updated to be 2018, 2018. Similarly for the other file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
namespace TR { | ||
|
||
class J9S390JNILinkage : public TR::S390PrivateLinkage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to Doxygen document this entire class.
39c57a0
to
4011d72
Compare
The changes are tested OK. Ready for reviews. |
To illustrate the differences before and after this change, here's the trace files from a JNI benchmark where the same native function is invoked in a loop for millions of times. Full trace files: Annotated excerpt for them: |
4011d72
to
7f72e17
Compare
@@ -49,6 +49,7 @@ | |||
#include "z/codegen/TRSystemLinkage.hpp" | |||
#include "runtime/J9Profiler.hpp" | |||
#include "runtime/J9ValueProfiler.hpp" | |||
#include "z/codegen/J9S390CHelperLinkage.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please place this with the other z/codegen includes in alphabetical order?
#endif | ||
|
||
TR::Register * | ||
TR::S390CHelperLinkage::buildIndirectDispatch(TR::Node * callNode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function seems built very specifically for JNI IFA. Does it even belong here? There are also a handful of booleans below which can be folded away. I'm questioning the nature of the existence of this API. CHelper functions should all be statically addressable. Why exactly do we need indirect dispatch for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1 Admittedly this API is very customized towards the JNI IFA calls because IFA is the only consumer of this. Some of the booleans (isFastPathOnly
and even the #if defined(J9ZOS390)
for instance) can be folded away. I kept them there in case future users may need them. The IFA calls are zOS only but I figured that we can build indirect helper calls for other helpers on Linux.
#2 Also, the c function fast_jitPost(pre)JNICallOffloadCheck
can't be addressed statically as per current design. Direct invocation will result in VM assert failures. I suspect all function pointers in the JITConfig structure are not directly invoke-able. By directly invoke-able
, I mean BRASL jump to the fast_jitPre(post)JNICallOffloadCheck
function pointers directly.
I didn't use callNode->getSymbolReference()->getSymbol()->castToMethodSymbol()->getMethodAddress();
Will try and see how this is different from direct function pointer approach.
#3 One JNI IFA specific aspect is the register saving/restoring such as saveRegistersForIFAHelper()
. This logic used to be baked inside the glue code is needed if CEEHDLR is enabled. CEE handlers can crash if sys SP and the vmthread registers are not saved on the native side before invoking the native. This has to be baked into the IFA call builder.
In conclusion:
#1 is simple to do. I can fold away the booleans.
For #2 we have to see why the C functions are not directly invoke-able using function pointers.
#3 has to be in the builder function, whether it be a CHelper function or a function in a different class.
offsetof(J9JavaVM, jitConfig), cg())); | ||
|
||
|
||
// get J9TR_JitConfig_fast_jitPreJNICallOffloadCheck from jit config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a compile time constant? Why do we need these 3 indirections? i.e. how is this any different from fast_jitInstanceOf
? Are we able to extract the address statically at compile time via:
callNode->getSymbolReference()->getSymbol()->castToMethodSymbol()->getMethodAddress();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are compile time constants. The indirections are copied from the old glue code.
I had difficulty building direct BRASL
calls to fast_jitPre(post)JNICallOffloadCheck
. VM changes might be needed for us to jump directly to fast_jitPostJNICallOffloadCheck
.
* \endverbatim | ||
*/ | ||
void | ||
TR::J9S390JNILinkage::setupJNICallOutFrame(TR::Node * callNode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to consolidate all TR::S390JNICallDataSnippet
construction into this function? Why split between two functions. Can we change the return type to TR::S390JNICallDataSnippet*
and construct the entire object here and return it?
Grepping this file for jniCallDataSnippet->set
it seems construction of this object is split between 3 functions but I don't see a reason why it can't all be consolidated into a single place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will cleanup the S390JNICallDataSnippet
and consolidate the code.
|
||
// set up Java Thread | ||
intptrj_t constJNICallOutFrameType = fej9->constJNICallOutFrameType(); | ||
TR_ASSERT( constJNICallOutFrameType < MAX_IMMEDIATE_VAL, "OMR::Z::Linkage::setupJNICallOutFrame constJNICallOutFrameType is too big for MVHI"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assert needs fixing. What exactly is it guarding? Where is the MVHI
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is from the old code and should be deleted.
generateS390MemoryReference(jniCallDataSnippet->getBaseRegister(), jniCallDataSnippet->getPCOffset(), codeGen)); | ||
|
||
// store out jsp | ||
generateRXInstruction(codeGen, TR::InstOpCode::getStoreOpCode(), callNode, tempReg, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename codeGen
to cg
to stay consistent with the rest of the codebase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to cg()
fe9f781
to
44bad56
Compare
@fjeremic This PR is ready. |
@@ -1,5 +1,5 @@ | |||
/******************************************************************************* | |||
* Copyright (c) 2000, 2018 IBM Corp. and others | |||
* Copyright (c) 2017, 2018 IBM Corp. and others |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just leaving a note that this is a valid change as the file was introduced in 2017 originally, not 2000.
|
||
TR::SymbolReference * offloadSymRef = cg()->symRefTab()->findOrCreateRuntimeHelper(helperIndex, false, false, false); | ||
offloadSymRef->getSymbol()->setName(symName); | ||
offloadSymRef->getSymbol()->castToMethodSymbol()->setMethodAddress(targetAddr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this code as this is already handled when the helper is created here:
Are we able to remove this and simply pass in the TR_RuntimeHelper
index instead of bool isPreJNIHelper
? We can also remove the above two extern "C"
definitions and simplify this function greatly. In fact it may become so small as to not even warrant an existence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the extern
statements and used TR_RuntimeHelper
indices instead.
ce1f370
to
818c68a
Compare
After refactoring the
|
cursor = generateRILInstruction(cg(), TR::InstOpCode::IIHF, callNode, javaStackPointerRegister, envHi); | ||
cursor = generateRILInstruction(cg(), TR::InstOpCode::IILF, callNode, javaStackPointerRegister, envLo); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole sequence should simply use genLoadAddressConstant
and pass the environment
constant through. The function will figure out the best sequence to generate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NigelYiboYu Loading environment register is necessary for calling a C function from JIT code. I omitted that part from Linkage as Assembly listing for fast_jitInstanceOf
was using it as volatile and we overwrite the register so anyway loaded value was not used. I checked other dual mode helper, Some of them returns the function pointer of the slow path helper function which belongs to same library and in that scenario environment register is needed to get the function pointer which is currently taken care by assembly glue in znathelp.m4
. That is the only use I could find currently.
Correct change here would be respecting XPLINK linkage and also pass environment register for instanceOf as well. Means before calling fastPath helper, environment register should be loaded.
extern "C" void fast_jitPostJNICallOffloadCheck(); | ||
|
||
void | ||
TR::J9S390JNILinkage::callIFAOffloadCheck(bool isPreJNIHelper, bool isJNICallOutFrame) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should fold this function away. I don't think it's needed. The environment is already present via TR_S390CEnvironmentAddress
so there is no need to deal with the function pointers etc. which renders this function an effective three liner:
TR::SymbolReference * offloadSymRef = cg()->symRefTab()->findOrCreateRuntimeHelper(helperIndex, false, false, false);
TR::Node* helperCallNode = TR::Node::createWithSymRef(TR::call, 0, offloadSymRef);
cHelperLinkage->buildDirectDispatch(helperCallNode, NULL, NULL, true,
isJNICallOutFrame,
reinterpret_cast<intptrj_t>(TR_S390CEnvironmentAddress));
I'm not sure it has a need for existence.
cursor = generateRILInstruction(cg(), TR::InstOpCode::IIHF, callNode, javaStackPointerRegister, envHi); | ||
cursor = generateRILInstruction(cg(), TR::InstOpCode::IILF, callNode, javaStackPointerRegister, envLo); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NigelYiboYu Loading environment register is necessary for calling a C function from JIT code. I omitted that part from Linkage as Assembly listing for fast_jitInstanceOf
was using it as volatile and we overwrite the register so anyway loaded value was not used. I checked other dual mode helper, Some of them returns the function pointer of the slow path helper function which belongs to same library and in that scenario environment register is needed to get the function pointer which is currently taken care by assembly glue in znathelp.m4
. That is the only use I could find currently.
Correct change here would be respecting XPLINK linkage and also pass environment register for instanceOf as well. Means before calling fastPath helper, environment register should be loaded.
818c68a
to
867ddec
Compare
2ef9b02
to
baf3ac6
Compare
runtime/oti/VMHelpers.hpp
Outdated
@@ -1133,7 +1133,7 @@ class VM_VMHelpers | |||
* @param[in] currentThread the current J9VMThread | |||
*/ | |||
static VMINLINE void | |||
beforeJNICall(J9VMThread *currentThread) | |||
beforeJNICall(J9VMThread *currentThread, J9Method* method) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the documentation for this newly added parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the new param to documentation.
{ | ||
RealRegisterManager RealRegisters(cg()); | ||
bool isHelperCallWithinICF = deps != NULL; | ||
// TODO: Currently only jitInstanceOf is fast path helper. Need to modify following condition if we add support for other fast path only helpers | ||
bool isFastPathOnly = callNode->getOpCodeValue() == TR::instanceof; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we leave this TODO around? This is definitely something we still want to do once we refactor the runtime helpers into a sane table with properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO is added back
TR::RegisterDependencyConditions** deps, | ||
TR::Register *returnReg=NULL, | ||
bool forceFastPath = false, | ||
TR_RuntimeHelper helperIndex = TR_S390numRuntimeHelpers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move the documentation here such that it is with the definition of the API and document foreFastPath
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation is moved to the hpp
file with forceFastPath
added.
@r30shah anything else you'd like to review on this one? I still see an "x" beside your review but GitHub can't seem to find what it was referencing. |
baf3ac6
to
5ee8d18
Compare
Jenkins test sanity |
@@ -364,6 +361,17 @@ TR::Register * TR::S390CHelperLinkage::buildDirectDispatch(TR::Node * callNode, | |||
} | |||
TR::SymbolReference * callSymRef = callNode->getSymbolReference(); | |||
void * destAddr = callNode->getSymbolReference()->getSymbol()->castToMethodSymbol()->getMethodAddress(); | |||
|
|||
#if defined(J9ZOS390) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NigelYiboYu I think my previous comment was for this section. For now we need this part for calling out helpers written in C that means we need to load environment for instanceOf as well. Is that covered? Looking through your other changes.
I am going through your changes now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fjeremic I am thinking of moving some of the context switching code from helper Glue code to in body and see the effect. I expect a small effect on the method footprint. If the effect is not much then we can move this inlined. I believe x86 already do this. We can make our helper linkage more generic this way.
Though it is something we can do later, should not affect this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@r30shah let's give it a try and see if it has any negative effects on performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InstanceOf
is not covered in this PR. This whole pull request is for JNI and IFA switching. We can come back and change instanceOf
helper to load environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually good point @r30shah. Thinking about this again we have duplicate information here. The helper index is already stored in callNode->getSymbolReference()->getReferenceNumber()
so we shouldn't need to pass it as an argument to the CHelper linkage buildDirectDispatch
.
This information is already available through the callNode
. @NigelYiboYu are we able to get rid of this extra parameter? Once this is done we can remove the below check for if (helperIndex != TR_S390numRuntimeHelpers)
since a helper index must always be available. This will fix the instanceOf
helper as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extra helper index is removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update:
Environment pointer is now only loaded if the C helper is fast-path. Non-fastPath helpers jump to glue code, which does its own environment handling.
|
||
#if defined(J9ZOS390) | ||
// TR_S390numRuntimeHelpers represents invalid helper index | ||
if (helperIndex != TR_S390numRuntimeHelpers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NigelYiboYu I see that a declaration of this function defines helperIndex
with TR_S390numRuntimeHelpers
and for JNI linkage calls (for pre/post offload check) , this function (buildDirectDispatch)
will be called with TR_RuntimeHelper
equal to TR_S390jitPreJNICallOffloadCheck
or TR_S390jitPostJNICallOffloadCheck
. Is it correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, JNI IFA uses TR_S390jitPreJNICallOffloadCheck and TR_S390jitPostJNICallOffloadCheck.
For the lack of a better choice, I use TR_S390numRuntimeHelpers
to validate the helper index parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okk. I think better choice here would be use of this isFastPathOnly. I think we only set this when we are calling a C helper function directly from JIT'd code. This should also cover the case for instanceOf
and load environment register for that.
5ee8d18
to
b346a0b
Compare
The JNI IFA switching on and off helpers use transition assembly sequences that saves and restores all GPRs, all FPRs and volatile VRFs. This is slow and unnecessary. This commit changes builds IFA calls as C function calls, which saves only the assigned registers. Signed-off-by: Nigel Yu <[email protected]>
b346a0b
to
27118b0
Compare
@r30shah could you please review again and see if all your concerns were addressed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NigelYiboYu @fjeremic Looks good to me.
Jenkins test sanity |
@fjeremic - you really need to stop this shotgun approach to launching tests, especially considering that the changes in this PR don't affect any of the platforms in PR testing. |
@gacholio is there a reason you say this? This change modified cnathelp.cpp which gets compiled and runs on all platforms. I've been bitten before by not fully testing changes before merging. Irregardless though IMO the commiters should be running tests on any change before it is merged. Edit: If we are stressing the farm by launching so many builds I think that should be a resource issue. We need proper testing before changes are merged and especially in a world where intermittent problems with JIT compilations are a thing I would rather be safe than sorry having to debug user issues in the field once bugs escape. |
I say this because I've been waiting for a single machine compile for hours, no doubt due to too much stuff being launched. It most certainly is a resource issue as well, but that's unlikely to be fixed soon (if ever). Even the simplest change takes days to get in. Also worth noting that running 8/9/10/11 testing is completely unnecessary on common code. |
I see, so do you propose for this change running only JDK8 for example as the right way to go? I would still not feel comfortable merging a change which affects all platforms without running sanity on the said platforms. I'm not sure what to say about the resource issue. We either need to have more granular tests or get more machines. I don't see a solution to this, and not running tests is not a solution that we should accept. |
We are working on getting more machines for some platforms, by sending hardware to UNB and helping to set it up. Hopefully by Sept. |
@pshipton do we have guidelines for committers on how we should be launching testing? If we are unnecessarily stressing the farm too much we should come up with guidelines to minimize this. Edit: For example in OMR we have:
|
Personally, I only ever run jdk8 with a single platform for things that are obviously common (which does occasionally result in problems on windows since the compiler is strict in ways no other one is). My first proposal is that we eliminate the shotgun from jenkins entirely - one comment per build launch will make people consider what they're doing. It's just too easy to spam the world at the moment. |
Not that I'm aware of. I think "test sanity" is the correct approach for JIT unless the change is specific to a platform, or generic across jdk versions. I expect JIT committers know best how to test JIT changes. Perhaps not all of xlinux, xlinux large heap, win and win-32 (jdk8 only) are always required since they are all x. Plus multiple jdk version may just duplicate testing depending on the change. The automatic launching of the jdk11 builds is a mistake, since they don't work yet. I've opened #2539. |
Works for me, at least for jdk versions. I've accidentally run multiple jdk versions when I didn't mean to. |
I agree this would be a step in the right direction. |
The JNI IFA switching on and off helpers use transition assembly sequences
that saves and restores all GPRs, all FPRs and volatile VRFs. This is slow
and unnecessary.
This commit changes builds IFA calls as C function calls, which saves only
the assigned registers.
Signed-off-by: Nigel Yu [email protected]