Add microbenchmarks for vector functions. #3
@jtibshirani Thanks for this, great work! Great thinking to try `ByteBuffer` decoding. I have added another function that combines both of your functions:

```java
static float decodeWithBufferAndDotProductWithUnrolling(float[] queryVector, BytesRef vectorBR) {
    if (vectorBR == null) {
        throw new IllegalArgumentException("A document doesn't have a value for a vector field!");
    }
    ByteBuffer byteBuffer = ByteBuffer.wrap(vectorBR.bytes, vectorBR.offset, vectorBR.length);
    float dot0 = 0;
    float dot1 = 0;
    float dot2 = 0;
    float dot3 = 0;
    int offset = vectorBR.offset;
    // Main loop: process four dimensions per iteration. The four independent
    // accumulators make it clear there are no loop-carried dependencies.
    int length = (queryVector.length / 4) * 4;
    for (int dim = 0; dim < length; dim += 4, offset += 16) {
        dot0 += byteBuffer.getFloat(offset) * queryVector[dim];
        dot1 += byteBuffer.getFloat(offset + 4) * queryVector[dim + 1];
        dot2 += byteBuffer.getFloat(offset + 8) * queryVector[dim + 2];
        dot3 += byteBuffer.getFloat(offset + 12) * queryVector[dim + 3];
    }
    // Tail loop: handle the remaining dimensions when the vector length
    // is not a multiple of four.
    for (int dim = length; dim < queryVector.length; dim++, offset += 4) {
        dot0 += byteBuffer.getFloat(offset) * queryVector[dim];
    }
    return dot0 + dot1 + dot2 + dot3;
}
```

Here are results on my machine:
Indeed, almost 2x speedups can be achieved by using `ByteBuffer` for decoding and unrolling the dot product. My machine params: OpenJDK JDK 11.0.2, VM 11.0.2+9-LTS
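For comparison with the unrolled variant above, a baseline dot product decodes the same `ByteBuffer` layout with a single accumulator. The sketch below is self-contained for illustration (the method name and the use of a plain `byte[]` plus start offset instead of Lucene's `BytesRef` are assumptions, not the actual benchmark code):

```java
import java.nio.ByteBuffer;

public class DotProductSketch {
    // Hypothetical baseline: straightforward ByteBuffer decoding, no unrolling.
    // Reads one big-endian float per dimension and accumulates into a single sum.
    static float decodeWithBufferAndDotProduct(float[] queryVector, byte[] bytes, int start) {
        ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
        float dot = 0;
        int offset = start;
        for (int dim = 0; dim < queryVector.length; dim++, offset += 4) {
            dot += byteBuffer.getFloat(offset) * queryVector[dim];
        }
        return dot;
    }

    public static void main(String[] args) {
        float[] query = {1f, 2f, 3f, 4f, 5f};
        // Encode the vector as consecutive big-endian floats, matching how
        // the decoding loop reads them back.
        ByteBuffer encoded = ByteBuffer.allocate(query.length * 4);
        for (float v : query) {
            encoded.putFloat(v);
        }
        // Dot product of the vector with itself: 1 + 4 + 9 + 16 + 25 = 55.
        float dot = decodeWithBufferAndDotProduct(query, encoded.array(), 0);
        System.out.println(dot);
    }
}
```

The single-accumulator loop carries a dependency between iterations (each addition waits on the previous one), which is exactly what the four-accumulator unrolled version avoids.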
This commit updates the vector encoding and decoding logic to use `java.nio.ByteBuffer`. Using `ByteBuffer` shows an improvement in [microbenchmarks](jtibshirani#3) and I think it helps code readability. The performance gain might be due to the fact that `ByteBuffer` uses HotSpot intrinsic candidates like `Unsafe#getIntUnaligned` under the hood.
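The manual-shift encoding being replaced can be compared against `ByteBuffer` directly. A minimal self-contained sketch (not the actual elasticsearch code; the helper name is hypothetical) verifying that the two approaches produce identical bytes:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class EncodingSketch {
    // Manual-shift encoding of a single float, in the style the commit replaces.
    // Writes the most significant byte first, i.e. big-endian order.
    static void encodeManually(float value, byte[] out, int offset) {
        int bits = Float.floatToIntBits(value);
        out[offset] = (byte) (bits >> 24);
        out[offset + 1] = (byte) (bits >> 16);
        out[offset + 2] = (byte) (bits >> 8);
        out[offset + 3] = (byte) bits;
    }

    public static void main(String[] args) {
        float[] vector = {0.5f, -1.25f, 3.75f};

        byte[] manual = new byte[vector.length * 4];
        for (int i = 0; i < vector.length; i++) {
            encodeManually(vector[i], manual, i * 4);
        }

        // ByteBuffer defaults to big-endian, matching the shift order above,
        // so no explicit ByteOrder configuration is needed.
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * 4);
        for (float v : vector) {
            buffer.putFloat(v);
        }

        System.out.println(Arrays.equals(manual, buffer.array()));
    }
}
```

Because the byte layouts match, switching the encoder to `ByteBuffer` keeps existing stored vectors readable while letting the JVM use its intrinsified bulk accessors.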
A note for future context: although it helped in some microbenchmarks, on other platforms the unrolled dot product was substantially slower. For example, on @mayya-sharipova's Linux server (40-core Intel Xeon, OpenJDK 11.0.1), unrolling took 151.91 ns versus a baseline of 122.68 ns.
I'm going to close this PR, since we finished implementing a round of changes based on the results.
This PR shows some microbenchmarks for decoding vectors and taking the dot product of two vectors. These benchmarks are meant for local testing purposes and will not be merged into the elasticsearch repo.
The results suggest a few directions to pursue, which I'll explore next in search macrobenchmarks:
- Using `ByteBuffer` instead of manual shifts might help for decoding.
- In `dotProductWithUnrolling4`, we manually unroll the dot product loop to clarify there are no dependencies between operations. This likely encourages SIMD to kick in, resulting in an improvement.

Platform information: