Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-enable toLowerCase/toUpperCase acceleration #2098

Merged
merged 2 commits into from
Jul 4, 2018

Conversation

dhong44
Copy link
Contributor

@dhong44 dhong44 commented Jun 6, 2018

Modify API call so that caseConversionHelper receives
underlying arrays and length of string.

Closes: #1934

Signed-off-by: Daniel Hong [email protected]

@dhong44
Copy link
Contributor Author

dhong44 commented Jun 6, 2018

@fjeremic Could you please review this

return output;
int sLength = lengthInternal();

if (enableCompression && (null == compressionFlag || coder == LATIN1)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to refactor this code and avoid the duplicate if. Can we merge with the path below?

@@ -2720,9 +2720,17 @@ public String toLowerCase(Locale locale) {
}

if (StrHWAvailable() && language == "en") { //$NON-NLS-1$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are in this area of code can we rename this method here to something more sane and place it in the JitHelpers class with proper documentation? Perhaps supportsIntrinsicCaseConversion or if you can think of a better name I'm fine with that?


if (enableCompression && (null == compressionFlag || coder == LATIN1)) {
byte[] output = new byte[sLength];
if (toLowerHWOptimizedLatin1(value, output, sLength))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can match the new name of the check and rename these accordingly to toLowerIntrinsicLatin1 / toLowerIntrinsicUTF16 (similarly for toUpper).

if (toLowerHWOptimizedLatin1(value, output, sLength))
return new String(output, LATIN1);
} else {
byte[] output = new byte[sLength * 2];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be factored out as sLength << coder

} else {
byte[] output = new byte[sLength * 2];
if (toLowerHWOptimizedUTF16(value, output, sLength * 2))
return new String(output, UTF16);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stay consistent with the rest of the codebase and add curly brackets around if blocks.

// DO NOT CHANGE CONTENTS OF THESE METHODS
private final boolean toUpperHWOptimized(String input) {
// DO NOT CHANGE THE CONTENTS OF THESE METHODS
private final static boolean toUpperHWOptimizedLatin1(byte[] value, byte[] output, int length) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this guy to JitHelpers as well and fully document it. Similarly with others.

@@ -4569,6 +4593,21 @@ public String(char[] data, int start, int length) {
}
}

String(char[] data, int length, boolean compressed) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a need for this constructor to exist. Can we not just reuse the one below with length == 0? These things would typically get inlined and the constant will be propagated thus ending up with the same implementation as you have here.

@dhong44
Copy link
Contributor Author

dhong44 commented Jun 7, 2018

Depends on eclipse/omr#2650

@dhong44
Copy link
Contributor Author

dhong44 commented Jun 7, 2018

Addressed your changes requested from 3fe5f89 in 5d7021f.

case TR::java_lang_String_toLowerHWOptimizedLatin1:
return TR::TreeEvaluator::compressedStringToLowerCaseEvaluator(node, cg);
case TR::com_ibm_jit_JITHelpers_toUpperIntrinsicUTF16:
return TR::TreeEvaluator::UTF16StringToUpperCaseEvaluator(node, cg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably rename these to toUpperIntrinsicUTF16Evaluator for easier search ability. And of course use the same names on Z if needed.

Copy link
Contributor Author

@dhong44 dhong44 Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the name inlineToUpper in Z be appropriate or should I change it to something like toUpperIntrinsic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with toUpperIntrinsic. By grepping for the Java method we should be able to find all related methods/functions in the VM/JIT as well. This is the intention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed functions in x and z codegen in 5fd75e8.

* Boolean representing the string's compression
*
* \return
* 0 if the conversion is unsuccessful, a non zero value otherwise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true. The function returns a register which represents the return value of the Java call. We need to JavaDoc the JIT helpers you added to define what the input parameters represent and what the return value represents. This Doxygen comment needs to be modified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added JavaDoc comments to JIT helper methods and fixed doxygen comment in 91004aa

TR::Instruction* instruction;

const int elementSizeMask = (isCompressedString) ? 0x0 : 0x1; // byte or halfword mask
const int sizeOfVector = 16;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cast this variable to an int8_t a little below. Are we able to avoid the cast by redefining the type at source?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9b53d1e

const bool is64 = TR::Compiler->target.is64Bit();
uintptrj_t headerSize = TR::Compiler->om.contiguousArrayHeaderSizeInBytes();

TR::RegisterDependencyConditions * regDeps = new (cg->trHeapMemory()) TR::RegisterDependencyConditions(0, 14, cg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only count 13 dependencies below. Can we save some space?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5143440

regDeps->addPostCondition(invalidRangeVector, TR::RealRegister::AssignAny);
regDeps->addPostCondition(invalidCondVector, TR::RealRegister::AssignAny);

if (is64){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting of braces should be consistent with the rest of the codebase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

56171e1

// Check for strings lengths that are too big in 32bit
if (!is64)
{
generateRIInstruction(cg, TR::InstOpCode::TMLH, node, lengthRegister, (uint16_t) 0x8000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a cast here? Also doesn't this check if lengthRegister < 0. I don't think this is possible. Can you explain what this code is supposed to handle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/eclipse/openj9/blob/1ca0ab9841f1a6b17fde82402582249853a1e6e7/runtime/compiler/z/codegen/J9TreeEvaluator.cpp#L409-L417

Here there was some justification for this check. It does seem that this is checking for negative lengths, which would never happen so I don't see the need for this. I think this is dead code and can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d0a4635


generateRRInstruction(cg, TR::InstOpCode::LR, node, loadLength, lengthRegister);

// Check for strings lengths that are too big in 32bit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the bitness of the JVM matters here. Couldn't the same problem happen on 64-bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the same reasons as stated in the comment at Line 407, it seems this whole code block is unnecessary and can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d0a4635

generateVRSbInstruction(cg, TR::InstOpCode::VLL, node, charBufferVector, loadLength, generateS390MemoryReference(sourceRegister, addressOffset, headerSize, cg));

// Check for invalid characters, go to fallback individual character conversion implementation
if(!isCompressedString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency in space between if and parenthesis. Similarly elsewhere in the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

56171e1

generateVRRcInstruction(cg, isToUpper ? TR::InstOpCode::VS : TR::InstOpCode::VA, node, modifiedCaseVector, charBufferVector, charOffsetVector, 0x0, 0x0, elementSizeMask);
generateVRReInstruction(cg, TR::InstOpCode::VSEL, node, modifiedCaseVector, modifiedCaseVector, charBufferVector, selectionVector);

instruction = generateVRSbInstruction(cg, TR::InstOpCode::VSTL, node, modifiedCaseVector, loadLength, generateS390MemoryReference(destRegister, addressOffset, headerSize, cg), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to set instruction = here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4289d81


generateVRRdInstruction(cg, TR::InstOpCode::VSTRC, node, selectionVector, charBufferVector, alphaRangeVector, alphaCondVector, 0x4, elementSizeMask);
generateVRRcInstruction(cg, isToUpper ? TR::InstOpCode::VS : TR::InstOpCode::VA, node, modifiedCaseVector, charBufferVector, charOffsetVector, 0x0, 0x0, elementSizeMask);
instruction = generateVRReInstruction(cg, TR::InstOpCode::VSEL, node, modifiedCaseVector, modifiedCaseVector, charBufferVector, selectionVector);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to set instruction = here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4289d81


instruction=generateS390LabelInstruction(cg, TR::InstOpCode::LABEL, node, doneLabel);
instruction->setEndInternalControlFlow();
instruction->setDependencyConditions(regDeps);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this line can be folded into the generateS390LabelInstruction. There should be an overload accepting register dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b796109

cg->generateDebugCounter(isToUpper? "z13/simd/toUpper/null" : "z13/simd/toLower/null", 1, TR::DebugCounter::Cheap);
generateRRInstruction(cg, TR::InstOpCode::getXORRegOpCode(), node, destRegister, destRegister);

instruction=generateS390LabelInstruction(cg, TR::InstOpCode::LABEL, node, doneLabel);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting around = needs to be fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

56171e1

@dhong44
Copy link
Contributor Author

dhong44 commented Jun 8, 2018

============================================================
; Live regs: GPR=5 FPR=0 VRF=0 GPR64=0 AR=0 {GPR_0210, &GPR_0152, GPR_0147, &GPR_0033, &GPR_0032}
------------------------------
 n107n    (  0)  treetop                                                                              [     0x3ff6b081130] bci=[-1,138,2723] rc=0 vc=394 vn=- li=- udi=- nc=1
 n106n    (  2)    icall  com/ibm/jit/JITHelpers.toUpperIntrinsicUTF16([B[BI)Z[#631  final virtual Method -128] [flags 0x20500 0x0 ] ()  [     0x3ff6b0810e0] bci=[-1,138,2723] rc=2 vc=394 vn=- li=- udi=- nc=4 flg=0x20
 n97n     (  1)      aload  java/lang/String.helpers Lcom/ibm/jit/JITHelpers;[#664  final Static] (obj1) [flags 0xa0307 0x0 ] (X!=0 sharedMemory )  [     0x3ff6b080e10] bci=[-1,126,2723] rc=1 vc=394 vn=- li=- udi=- nc=0 flg=0x4
 n99n     (  1)      l2a (X!=0 unneededConv sharedMemory )                                            [     0x3ff6b080eb0] bci=[-1,130,2723] rc=1 vc=394 vn=- li=- udi=- nc=1 flg=0x404
 n516n    (  1)        lshl (compressionSequence )                                                    [     0x3ff6b089100] bci=[-1,130,2723] rc=1 vc=394 vn=- li=- udi=- nc=2 flg=0x800
 n515n    (  1)          ==>iu2l (in GPR_0210) (X!=0 unneededConv )
 n513n    (  1)          iconst 2                                                                     [     0x3ff6b089010] bci=[-1,130,2723] rc=1 vc=394 vn=- li=- udi=- nc=0
 n94n     (  2)      ==>newarray (in &GPR_0152) (X!=0 sharedMemory )
 n105n    (  1)      ishl (X>=0 cannotOverflow )                                                      [     0x3ff6b081090] bci=[-1,137,2723] rc=1 vc=394 vn=- li=- udi=- nc=2 flg=0x1100
 n453n    (  1)        ==>iushr (in GPR_0147) (X>=0 cannotOverflow )
 n104n    (  1)        iconst 1 (X!=0 X>=0 )                                                          [     0x3ff6b081040] bci=[-1,136,2723] rc=1 vc=394 vn=- li=- udi=- nc=0 flg=0x104
------------------------------
------------------------------
 n107n    (  0)  treetop                                                                              [     0x3ff6b081130] bci=[-1,138,2723] rc=0 vc=394 vn=- li=- udi=- nc=1
 n106n    (  1)    icall  com/ibm/jit/JITHelpers.toUpperIntrinsicUTF16([B[BI)Z[#631  final virtual Method -128] [flags 0x20500 0x0 ] (in GPR_0212) ()  [     0x3ff6b0810e0] bci=[-1,138,2723] rc=1 vc=394 vn=- li=- udi=- nc=4 flg=0x20
 n97n     (  0)      aload  java/lang/String.helpers Lcom/ibm/jit/JITHelpers;[#664  final Static] (obj1) [flags 0xa0307 0x0 ] (X!=0 sharedMemory )  [     0x3ff6b080e10] bci=[-1,126,2723] rc=0 vc=394 vn=- li=- udi=- nc=0 flg=0x4
 n99n     (  0)      l2a (in &GPR_0211) (X!=0 unneededConv sharedMemory )                             [     0x3ff6b080eb0] bci=[-1,130,2723] rc=0 vc=394 vn=- li=- udi=- nc=1 flg=0x404
 n516n    (  0)        lshl (in &GPR_0211) (compressionSequence )                                     [     0x3ff6b089100] bci=[-1,130,2723] rc=0 vc=394 vn=- li=- udi=- nc=2 flg=0x800
 n515n    (  0)          ==>iu2l (in GPR_0210) (X!=0 unneededConv )
 n513n    (  0)          iconst 2                                                                     [     0x3ff6b089010] bci=[-1,130,2723] rc=0 vc=394 vn=- li=- udi=- nc=0
 n94n     (  1)      ==>newarray (in &GPR_0152) (X!=0 sharedMemory )
 n105n    (  0)      ishl (in GPR_0212) (X>=0 cannotOverflow )                                        [     0x3ff6b081090] bci=[-1,137,2723] rc=0 vc=394 vn=- li=- udi=- nc=2 flg=0x1100
 n453n    (  0)        ==>iushr (in GPR_0147) (X>=0 cannotOverflow )
 n104n    (  0)        iconst 1 (X!=0 X>=0 )                                                          [     0x3ff6b081040] bci=[-1,136,2723] rc=0 vc=394 vn=- li=- udi=- nc=0 flg=0x104
------------------------------

 [     0x3ff6b2502c0]	SLLG    &GPR_0211,GPR_0210,2
 [     0x3ff6b250440]	SLLK    GPR_0212,GPR_0147,1
 [     0x3ff6b250c90]	XGR     GPR_0213,GPR_0213	
 [     0x3ff6b250d80]	VGBM    VRF_0219,0x0
 [     0x3ff6b250e80]	VGBM    VRF_0220,0x0
 [     0x3ff6b250f80]	VGBM    VRF_0221,0x0
 [     0x3ff6b251080]	VGBM    VRF_0222,0x0
 [     0x3ff6b251180]	VLEIH   VRF_0219,0x617a,0
 [     0x3ff6b251280]	VLEIH   VRF_0219,0xe0f6,1
 [     0x3ff6b251380]	VLEIH   VRF_0219,0xf8fe,2
 [     0x3ff6b251480]	VUPLHB  VRF_0219,VRF_0219
 [     0x3ff6b251580]	VLEIH   VRF_0220,0xa000,0
 [     0x3ff6b251680]	VLEIH   VRF_0220,0xc000,1
 [     0x3ff6b251780]	VLEIH   VRF_0220,0xa000,2
 [     0x3ff6b251880]	VLEIH   VRF_0220,0xc000,3
 [     0x3ff6b251980]	VLEIH   VRF_0220,0xa000,4
 [     0x3ff6b251a80]	VLEIH   VRF_0220,0xc000,5
 [     0x3ff6b251b80]	VLEIH   VRF_0221,0xdfdf,0
 [     0x3ff6b251c80]	VLEIH   VRF_0221,0xb5b5,1
 [     0x3ff6b251d80]	VLEIH   VRF_0221,0xffff,2
 [     0x3ff6b251e80]	VUPLHB  VRF_0221,VRF_0221
 [     0x3ff6b251f80]	VLEIH   VRF_0222,0x8000,0
 [     0x3ff6b252080]	VLEIH   VRF_0222,0x8000,1
 [     0x3ff6b252180]	VLEIH   VRF_0222,0x8000,2
 [     0x3ff6b252280]	VLEIH   VRF_0222,0x8000,3
 [     0x3ff6b252380]	VLEIH   VRF_0222,0x2000,4
 [     0x3ff6b252480]	VLEIH   VRF_0222,0x2000,5
 [     0x3ff6b252580]	VREPIH  VRF_0218,0x20
 [     0x3ff6b252680]	LR      GPR_0214,GPR_0212	
 [     0x3ff6b252770]	NILF    GPR_0214,15	
 [     0x3ff6b252890]	BRC     VGNOP(0x8), Label L0080		# (Start of internal control flow)
 [     0x3ff6b252990]	SLFI    GPR_0214,1	
 [     0x3ff6b252cb0]	LA      GPR_0223,#672 0(GPR_0213,&GPR_0211)
 [     0x3ff6b252da0]	VLL     VRF_0215,GPR_0214,#673 8(GPR_0223)
 [     0x3ff6b252ea0]	VSTRCHS VRF_0216,VRF_0215,VRF_0221,VRF_0222,1
 [     0x3ff6b252fa0]	BRC     MASK5(0x4), Label L0083	
 [     0x3ff6b2530a0]	VSTRCH  VRF_0216,VRF_0215,VRF_0219,VRF_0220,4
 [     0x3ff6b2531a0]	VSH     VRF_0217,VRF_0215,VRF_0218
 [     0x3ff6b2532a0]	VSEL    VRF_0217,VRF_0217,VRF_0215,VRF_0216
 [     0x3ff6b2535a0]	LA      GPR_0224,#674 0(GPR_0213,&GPR_0152)
 [     0x3ff6b253690]	VSTL    VRF_0217,GPR_0214,#675 8(GPR_0224)
 [     0x3ff6b253790]	AHIK    GPR_0213,GPR_0214,1,	
 [     0x3ff6b2538a0]	Label L0080:	
 [     0x3ff6b2539a0]	CIJ     GPR_0212,Label L0082,16,BM(mask=0x4), 	
 [     0x3ff6b253ab0]	SRL     GPR_0212,GPR_0212,4
 [     0x3ff6b253bb0]	Label L0084:	
 [     0x3ff6b253d70]	VL      VRF_0215,#676 8(GPR_0213,&GPR_0211)
 [     0x3ff6b253e70]	VSTRCHS VRF_0216,VRF_0215,VRF_0221,VRF_0222,1
 [     0x3ff6b253f70]	BRC     MASK5(0x4), Label L0083	
 [     0x3ff6b254070]	VSTRCH  VRF_0216,VRF_0215,VRF_0219,VRF_0220,4
 [     0x3ff6b254170]	VSH     VRF_0217,VRF_0215,VRF_0218
 [     0x3ff6b254270]	VSEL    VRF_0217,VRF_0217,VRF_0215,VRF_0216
 [     0x3ff6b254430]	VST     VRF_0217,#677 8(GPR_0213,&GPR_0152)
 [     0x3ff6b2545f0]	LA      GPR_0213,#678 16(GPR_0213)
 [     0x3ff6b2546e0]	BRCT    GPR_0212,Label L0084	
 [     0x3ff6b2547d0]	Label L0082:	
 [     0x3ff6b2548d0]	LHI     GPR_0212,0x1	
 [     0x3ff6b2549d0]	BRC     NOP(0xf), Label L0081	
 [     0x3ff6b254ad0]	Label L0083:	
 [     0x3ff6b254bd0]	XR      GPR_0212,GPR_0212	
 [     0x3ff6b254f00]	ASSOCREGS
 [     0x3ff6b254cc0]	Label L0081:	
 POST:
 {AssignAny:&GPR_0211:R} {AssignAny:&GPR_0152:R} {AssignAny:GPR_0212:R} {AssignAny:GPR_0213:R} {AssignAny:GPR_0214:R} {AssignAny:VRF_0215:R} {AssignAny:VRF_0216:R} {AssignAny:VRF_0217:R}
 {AssignAny:VRF_0218:R} {AssignAny:VRF_0219:R} {AssignAny:VRF_0220:R} {AssignAny:VRF_0221:R} {AssignAny:VRF_0222:R}	# (End of internal control flow)

============================================================

// The index will begin at 0 and increment in multiples of 16
cursor = generateRXInstruction (cg, TR::InstOpCode::getLoadAddressOpCode(), node, rSrcVal, generateS390MemoryReference(rSrcVal, rResidue, 0, cg)); iComment("source.val += residue");
cursor = generateRXInstruction (cg, TR::InstOpCode::getLoadAddressOpCode(), node, rTgtVal, generateS390MemoryReference(rTgtVal, rResidue, 0 ,cg)); iComment("tgt.val += residue");
generateRIEInstruction(cg, is64 ? TR::InstOpCode::CGIJ : TR::InstOpCode::CIJ, node, lengthRegister, sizeOfVector, doneLabel, TR::InstOpCode::COND_BL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use TR::InstOpCode::getCmpImmBranchRelOpCode() here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3bf85b1

/* ===================== Step 4.0: Setup for core-loop =====================*/
// Increment index by the remainder then add 1, since the loadLength contains the highest index, we must go one past that
generateRXInstruction(cg, TR::InstOpCode::getLoadAddressOpCode(), node, addressOffset, generateS390MemoryReference(addressOffset, loadLength, 0, cg));
generateRILInstruction(cg, is64 ? TR::InstOpCode::ALGFI : TR::InstOpCode::ALFI, node, addressOffset, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we introduce a missing TR::InstOpCode helper for this instruction pair?

Copy link
Contributor

@fjeremic fjeremic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 [     0x3ff87253590]	LA      GPR_0213,#676 0(GPR_0214,GPR_0213)
 [     0x3ff87253680]	ALGFI   GPR_0213,1	

These two instructions can be fused by folding the ALGFI into the memory reference displacement.

 [     0x3ff872548d0]	XGR     &GPR_0152,&GPR_0152	

This looks incorrect. Are we using the second child as the return value register? If so that is a bug. Any register you get from an evaluation of a node cannot be clobbered. It must represent the value of the node it was evaluated from at the exit point of any evaluator. You need to allocate a new register for the return value or reuse an existing one. Perhaps can you check if we can use lengthRegister for the return value since we already call gprClobberEvaluate on it?

@dhong44
Copy link
Contributor Author

dhong44 commented Jun 8, 2018

Depends on eclipse/omr#2656

@fjeremic
Copy link
Contributor

fjeremic commented Jun 8, 2018

Depends on eclipse-omr/omr#2656

I don't think we need to depend on this. The ALGFI should go away after previous review is addressed. We should add the helper regardless though.

@fjeremic fjeremic added the depends:omr Pull request is dependent on a corresponding change in OMR label Jun 11, 2018
@@ -54,9 +54,6 @@ private JITHelpers() {

@CallerSensitive
public static JITHelpers getHelpers() {
if (Reflection.getCallerClass().getClassLoader() != null) {
throw new SecurityException("JITHelpers"); //$NON-NLS-1$
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewcraik wanted to point this one out to you. Is this safe? We need a handle to the JITHelpers object in Test_JITHelpers.java so we can do unit testing on the newly added APIs and have full control over expected inputs and outputs to test all corner cases. Unfortunately we cannot get a handle to the object without removing this or going the reflection route which seems a bit dodgy. Are you aware of any reason why this protection was put in place?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is here for the same reason sun.misc.Unsafe has the same kind of check - to dissuade others from getting the instance easily. Using a trick similar to unsafe to get hold of the instance in the tests would seem better to me than exposing the API in general.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, that makes sense. @dhong44 can you use a similar trick we use in String.java to get a hold of the JITHelpers object and revert the above deleted lines?


if (enableCompression && (null == compressionFlag || coder == LATIN1)) {

if (helpers.supportsIntrinsicCaseConversion() && language == "en") { //$NON-NLS-1$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interpreter will never execute this code which means it will appear as a cold path due to no interpreter profiling data. Does the intrinsic recognition overcome this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so. The interpreter will enter the if block immediately above on line 2728 so we will have profiling data for that path. The JIT will fold away the call to helpers.supportsIntrinsicCaseConversion() as a constant at compile time and the flow algorithm in the JIT should propagate the block frequencies to the if at line 2725 to be equal to that of the preceding if.

Just to be cautious though @dhong44 can we collect a log and verify the block frequencies we compute for the block in which we inline the intrinsic? Given your expertise @andrewcraik is my reasoning here valid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key question is when the supportsIntrinsicCaseConversion is folded. It looks to be in ilgen so the flow for IProfiling will have a chance to happen - other profilers do not do a flow. The key thing is to check the frequency on the blocks with the custom implementation after cold block marking. I think a log is the best way to verify this is working right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n1398n    BBStart <block_125> (freq 5002)                                                     [     0x3ff4315a4b0] bci=[10,116,2664] rc=0 vc=257 vn=- li=- udi=- nc=0
n1432n    treetop                                                                             [     0x3ff4315af50] bci=[10,122,2664] rc=0 vc=257 vn=- li=- udi=- nc=1
n1431n      newarray  jitNewArray[#176  helper Method] [flags 0x400 0x0 ] (skipZeroInit sharedMemory )  [     0x3ff4315af00] bci=[10,122,2664] rc=2 vc=257 vn=- li=- udi=- nc=2 flg=0x8000
n1429n        ishl                                                                            [     0x3ff4315ae60] bci=[10,121,2664] rc=1 vc=257 vn=- li=- udi=- nc=2
n1426n          iload  <auto slot 3>[#796  Auto] [flags 0x3 0x0 ]                             [     0x3ff4315ad70] bci=[10,116,2664] rc=1 vc=257 vn=- li=- udi=- nc=0
n1428n          iloadi  java/lang/String.coder B[#802  final Shadow +8] [flags 0xa0601 0x0 ]  [     0x3ff4315ae10] bci=[10,118,2664] rc=1 vc=257 vn=- li=- udi=- nc=1
n1427n            aload  <temp slot 3>[#786  Auto] [flags 0x7 0x0 ] (X!=0 sharedMemory )      [     0x3ff4315adc0] bci=[10,117,2664] rc=1 vc=257 vn=- li=- udi=- nc=0 flg=0x4
n1430n        iconst 8   ; array type is byte                                                 [     0x3ff4315aeb0] bci=[10,122,2664] rc=1 vc=257 vn=- li=- udi=- nc=0
n1433n    astore  <auto slot 4>[#810  Auto] [flags 0x7 0x0 ]                                  [     0x3ff4315afa0] bci=[10,124,2664] rc=0 vc=257 vn=- li=- udi=- nc=1
n1431n      ==>newarray
n1438n    compressedRefs                                                                      [     0x3ff4315b130] bci=[10,130,2666] rc=0 vc=257 vn=- li=- udi=- nc=2
n1436n      aloadi  java/lang/String.value [B[#804  final Shadow +4] [flags 0x400a0607 0x0 ]  [     0x3ff4315b090] bci=[10,130,2666] rc=2 vc=257 vn=- li=- udi=- nc=1
n1435n        aload  <temp slot 3>[#786  Auto] [flags 0x7 0x0 ] (X!=0 sharedMemory )          [     0x3ff4315b040] bci=[10,129,2666] rc=1 vc=257 vn=- li=- udi=- nc=0 flg=0x4
n1437n      lconst 0                                                                          [     0x3ff4315b0e0] bci=[10,130,2666] rc=1 vc=257 vn=- li=- udi=- nc=0
n1444n    NULLCHK on n1434n [#30]                                                             [     0x3ff4315b310] bci=[10,138,2666] rc=0 vc=257 vn=- li=- udi=- nc=1
n1443n      icall  com/ibm/jit/JITHelpers.toLowerIntrinsicUTF16([B[BI)Z[#815  final virtual Method -136] [flags 0x20500 0x0 ] ()  [     0x3ff4315b2c0] bci=[10,138,2666] rc=3 vc=257 vn=- li=- udi=- nc=4 flg=0x20
n1434n        aload  java/lang/String.helpers Lcom/ibm/jit/JITHelpers;[#799  final Static] [flags 0xa0307 0x0 ]  [     0x3ff4315aff0] bci=[10,126,2666] rc=1 vc=257 vn=- li=- udi=- nc=0
n1436n        ==>aloadi
n1439n        aload  <auto slot 4>[#810  Auto] [flags 0x7 0x0 ]                               [     0x3ff4315b180] bci=[10,133,2666] rc=1 vc=257 vn=- li=- udi=- nc=0
n1442n        imul                                                                            [     0x3ff4315b270] bci=[10,137,2666] rc=1 vc=257 vn=- li=- udi=- nc=2
n1440n          iload  <auto slot 3>[#796  Auto] [flags 0x3 0x0 ]                             [     0x3ff4315b1d0] bci=[10,135,2666] rc=1 vc=257 vn=- li=- udi=- nc=0
n1441n          iconst 2                                                                      [     0x3ff4315b220] bci=[10,136,2666] rc=1 vc=257 vn=- li=- udi=- nc=0
n1445n    istore  <pending push temp 0>[#793  Auto] [flags 0x3 0x800 ]                        [     0x3ff4315b360] bci=[10,138,2666] rc=0 vc=257 vn=- li=- udi=- nc=1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be working as intended. I validated the log with @dhong44. We seem to have good frequencies on the blocks containing the intrinsic calls.

@DanHeidinga
Copy link
Member

@smlambert @llxia Can one of you review the tests?

@@ -1042,7 +1042,7 @@
<test>
<testCaseName>JCL_Test_JIT_Helper</testCaseName>
<command>$(JAVA_COMMAND) $(JVM_OPTIONS) --add-exports=java.base/com.ibm.jit=ALL-UNNAMED \
-Xbootclasspath/a:$(Q)$(TEST_RESROOT)$(D)TestResources.jar$(Q) \
--add-opens=java.base/com.ibm.jit=ALL-UNNAMED -Xbootclasspath/a:$(Q)$(TEST_RESROOT)$(D)TestResources.jar$(Q) \
Copy link
Contributor

@fjeremic fjeremic Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're in the area can you rename this test case name to "JCL_Test_JITHelpers" to make it searchable since the test file is Test_JITHelpers.java and the class name is JITHelpers. I would like to be able to find this line in the future by grepping for the class name.

Copy link
Contributor Author

@dhong44 dhong44 Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is only for SE90 and up, SE80 has it's own test above. I don't think the --add-opens is necessary or is even recognized in Java 8.

@dhong44 dhong44 force-pushed the caseConversion branch 2 times, most recently from 367a19f to 4553d5a Compare June 13, 2018 14:03
for (int j : edgecaseLengths){
buffer = new char[j];
if (helpers.toUpperIntrinsicUTF16(Arrays.copyOfRange(lowercaseUTF16Char, 0, j), buffer, j * 2)) {
AssertJUnit.assertTrue("UTF16 JITHelper upper case conversion with char arrays of " + j + " letters failed",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using org.testng.AssertJUnit, we should directly use org.testng.Assert http://static.javadoc.io/org.testng/testng/6.8.17/org/testng/Assert.html
org.testng.AssertJUnit is used when we port/migrate code that uses JUnit's asserts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4a4b2df

@@ -1377,6 +1405,34 @@ public void test_toUpperCase() {

AssertJUnit.assertTrue("Wrong conversion", "\u00df".toUpperCase().equals("SS"));

// Long strings and strings of lengths near multiples of 16 bytes to test vector HW acceleration edge cases.
AssertJUnit.assertTrue("toUpperCase case conversion failed on a long string",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. org.testng.Assert should be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

09078d4

@dhong44
Copy link
Contributor Author

dhong44 commented Jun 13, 2018

I ran a bumble bench that called toLower on strings of random length between 0 and 1000. With vectorized case conversion the score was about 30 times higher than without.

cursor = generateRILInstruction (cg, TR::InstOpCode::getSubtractLogicalImmOpCode(), node, rResidue, 1); iComment("adjust residue for VLL/VSTL (length -> index)");

/* ===================== Step 3: Residue processing =====================*/
generateRRInstruction(cg, TR::InstOpCode::getLoadRegOpCode(), node, loadLength, lengthRegister);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary given the load above. Can we remove this? Also shouldn't lengthRegister always a 32-bit value? Shouldn't we always use an LR here? Similarly to above, why can't loadLength also be a 32-bit register?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cf269b9

cursor = generateRIInstruction (cg, TR::InstOpCode::TMLH, node, rLen, (uint16_t) 0x8000);
cursor = generateS390BranchInstruction (cg, TR::InstOpCode::BRC , TR::InstOpCode::COND_MASK2, node, handleInvalidChars);
cursor->setStartInternalControlFlow();
generateRRInstruction(cg, TR::InstOpCode::LLGFR, node, lengthRegister, lengthRegister);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exactly is this needed? Looking at the instruction selection from #2098 (comment) I see that all uses of this register can be made in a 32-bit context. We seem to be forcing it to be 64-bit here however.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cf269b9

}
buffer.getClass();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to test supportsIntrinsicCaseConversion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, since the result of that method when compiled is dependent on the machine. There's no way we could really test that it returns the correct result.

@dhong44 dhong44 changed the title Re-enable toLowerCase/toUpperCase acceleration WIP: Re-enable toLowerCase/toUpperCase acceleration Jun 14, 2018
@fjeremic
Copy link
Contributor

Jenkins test sanity

@fjeremic fjeremic assigned joransiu and unassigned fjeremic Jun 18, 2018
@fjeremic
Copy link
Contributor

Transferring this to @joransiu as I'll be away for the next two weeks.

dhong44 added 2 commits June 27, 2018 16:36
Modify API call so that caseConversionHelper recieves
underlying arrays and length of string.

Closes: eclipse-openj9#1934

Signed-off-by: Daniel Hong <[email protected]>
Remove unused String constructors accepting one integer
and returning an empty String with an appropriate capacity.

Closes: eclipse-openj9#777

Signed-off-by: Daniel Hong <[email protected]>
@fjeremic fjeremic assigned fjeremic and unassigned joransiu Jul 4, 2018
@fjeremic fjeremic merged commit 15e283c into eclipse-openj9:master Jul 4, 2018
@fjeremic fjeremic changed the title WIP: Re-enable toLowerCase/toUpperCase acceleration Re-enable toLowerCase/toUpperCase acceleration Feb 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jit depends:omr Pull request is dependent on a corresponding change in OMR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants