GH-528 cache the locations in the super blocks #529
Conversation
@Override
public String toString() {
	return "Bitmap375Big{}";
}
IntelliJ likes to show objects in the debugger, which is usually a good idea, except that Bitmap375Big is usually a very, very big object. Overriding toString() forces IntelliJ to display that result instead of analyzing and rendering the entire object.
(Two resolved review comments on qendpoint-core/src/main/java/com/the_qa_company/qendpoint/core/compact/bitmap/Bitmap375Big.java, now outdated.)
mid = (min + max) / 2;
}

prevFound[index] = min;
Here we are filling prevFound (the cache) dynamically, but it should be possible to do this at startup instead.
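For readers skimming the thread, a minimal sketch of the idea being discussed here: a binary search over a sorted array that seeds its search window from the index cached by the previous lookup, and refreshes that cache on a hit. The class and field names below are illustrative only, not the actual SequenceInt32 code.

// Illustrative sketch only; not the PR's SequenceInt32 implementation.
final class CachedBinarySearch {
	private final long[] values; // assumed sorted ascending
	private int prevFound;       // index of the previous hit, used as a hint

	CachedBinarySearch(long[] values) {
		this.values = values;
	}

	/** Returns the index of value, or -1 if it is not present. */
	int find(long value) {
		int min = 0;
		int max = values.length - 1;

		// Successive lookups tend to be close together, so start near the
		// previous hit instead of searching the whole array.
		int hint = prevFound;
		if (hint >= 0 && hint < values.length) {
			if (values[hint] <= value) {
				min = hint;
			} else {
				max = hint;
			}
		}

		while (min <= max) {
			int mid = (min + max) >>> 1;
			long v = values[mid];
			if (v < value) {
				min = mid + 1;
			} else if (v > value) {
				max = mid - 1;
			} else {
				prevFound = mid; // remember where we found it for the next call
				return mid;
			}
		}
		return -1;
	}
}

The comment above suggests that this hint could also be computed once at startup rather than only being updated while queries run.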
(Resolved review comment on ...int-core/src/main/java/com/the_qa_company/qendpoint/core/compact/sequence/SequenceInt32.java, now outdated.)
(Force-pushed from f7e8e5e to ee858be.)
@ate47 This is still work in progress, but I've implemented an approach that scales with the size of the data and calculates the values at startup. I found that a lot of the performance improvement comes from dynamically updating the cache while the query is running, since the next value is usually located very close to where the previous value was found. I also found a useful little hack: on the second and third iterations of the binary search I read the value at min + 1 to see if that lets us cut the search short (see the sketch after this comment). With the EU dataset I'm seeing a decent improvement in performance:

The query below goes from 4.592 seconds to 2.959 seconds.

The query below goes from 1.042 seconds to 0.708 seconds.

The query below goes from 1.437 seconds to 1.082 seconds.

The query below goes from 17.308 seconds to 9.892 seconds. This query uses a combination of nested loop join (JoinIterator) and merge join, and it was actually faster when merge join was disabled (13.929 seconds). Without merge join but with the new binary search it runs in 10.723 seconds, and with both merge join and the new binary search it runs in 9.892 seconds. So now it's faster with merge join :)

The query below goes from 2.068 seconds to 1.801 seconds. Query uses merge join!

The query below goes from 1.13 seconds to 1.129 seconds. Query uses merge join, which in this case is probably the reason why it's not any faster with the new binary search. When I run it with the old binary search and with merge join disabled it runs in 6.887 seconds, so merge join makes a big difference here!

PS: If you want to use the old binary search:

And I've rebased this branch on dev, so it contains the previous improvements that were merged. All the above results are from after the rebase.
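To make the min + 1 trick concrete, here is a self-contained sketch of a lower-bound binary search (largest index whose value is <= the target) that peeks one slot ahead on the early iterations. The method name, the precondition that values[min] <= target on entry, and the choice of iterations on which to peek are assumptions for illustration, not the PR's exact code.

// Illustrative sketch of the early "peek at min + 1" shortcut.
// Precondition: values is sorted ascending and values[min] <= target.
static int findLowerBound(long[] values, long target, int min, int max) {
	int iteration = 0;
	while (min < max) {
		iteration++;
		// After a good starting hint, min is often already right next to the
		// answer, so peeking at min + 1 can finish the search early.
		if ((iteration == 2 || iteration == 3) && values[min + 1] > target) {
			return min; // values[min] <= target < values[min + 1]
		}
		int mid = (min + max + 1) >>> 1; // round up so the loop always progresses
		if (values[mid] <= target) {
			min = mid;
		} else {
			max = mid - 1;
		}
	}
	return min;
}

For example, with values = {1, 3, 3, 7, 9}, findLowerBound(values, 6, 0, 4) returns 2: the peek at min + 1 on the second iteration ends the search as soon as min sits just below the first larger value.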
(Force-pushed from ebe036a to dbb90b7.)
@ate47 I saw an interface called DynamicSequence that extends LongArray. Since it's called "dynamic", I assume it wouldn't be safe to implement the caching of the estimated location there, since the cache might not be recalculated when the underlying values change. Is this a correct assumption?
Updated results:

The query below goes from 4.592 seconds to 2.265 seconds.

The query below goes from 1.042 seconds to 0.481 seconds.

The query below goes from 1.437 seconds to 0.933 seconds.

The query below goes from 17.308 seconds to 8.204 seconds.
if (subjectID == 0 || subjectID == -1) {
	newSubj = subj;
} else {
	newSubj = this.endpoint.getHdtConverter().subjectIdToIRI(subjectID);
}
if (predicateID == 0 || predicateID == -1) {
	newPred = pred;
} else {
	newPred = this.endpoint.getHdtConverter().predicateIdToIRI(predicateID);
}
if (objectID == 0 || objectID == -1) {
	newObj = obj;
} else {
	newObj = this.endpoint.getHdtConverter().objectIdToIRI(objectID);
}

if (graph) {
	graphID = new long[contexts.length];
	newContextes = this.endpoint.getHdtConverter().graphIdToIRI(contexts, graphID);
If the NativeStore isn't in use then we don't need these variables. So it's simplest if we can delay creating them until we are sure we need them.
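As a small, hypothetical illustration of that "create it only when needed" idea (this Lazy helper is not part of the PR; it just shows the pattern): wrap the expensive value in a supplier so nothing is allocated or converted unless the code path that needs it actually runs.

import java.util.function.Supplier;

// Hypothetical helper, not from the PR: computes its value on first use only.
final class Lazy<T> {
	private Supplier<T> supplier;
	private T value;

	Lazy(Supplier<T> supplier) {
		this.supplier = supplier;
	}

	T get() {
		if (supplier != null) { // first access: compute once, then cache
			value = supplier.get();
			supplier = null;
		}
		return value;
	}
}

A caller could then declare, say, Lazy<long[]> graphID = new Lazy<>(() -> new long[contexts.length]); and only pay for the allocation on the branch that actually uses it.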
rank -= value & 1;
bitpos++;
value >>>= 1;
int trailingZeros = Long.numberOfTrailingZeros(value);
This is faster. I'm not certain why, but I assume Long.numberOfTrailingZeros(value) compiles down to a single hardware instruction that counts the trailing zero bits in one operation, instead of checking them one at a time.
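For readers unfamiliar with the trick: Long.numberOfTrailingZeros lets the scan jump over an entire run of zero bits in one step instead of shifting and testing one bit at a time. Below is a self-contained sketch of a select-within-word routine built on that idea; it illustrates the technique, not the PR's exact loop.

// Returns the bit position (0-63) of the k-th set bit (k >= 1) in word,
// or -1 if word contains fewer than k set bits. Illustration only.
static int selectInWord(long word, int k) {
	int bitpos = 0;
	while (word != 0) {
		int trailingZeros = Long.numberOfTrailingZeros(word);
		// Skip the whole run of zeros in one step instead of one shift per bit.
		bitpos += trailingZeros;
		word >>>= trailingZeros;
		// The lowest bit of word is now a 1: the next set bit.
		if (--k == 0) {
			return bitpos;
		}
		word >>>= 1;
		bitpos++;
	}
	return -1;
}

For example, selectInWord(0b101000L, 2) returns 5. On current JVMs, Long.numberOfTrailingZeros is an intrinsic that typically compiles to a single instruction (e.g. TZCNT on x86), which is where the speedup comes from.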
… track of the estimated location of values in the underlying long array. This assumes that the values are ordered.
(Force-pushed from dafcffc to 4d56b27.)
(Force-pushed from 5fc81db to 342e3e8.)
@ate47 This PR should be ready for review now.
Except for my two comments, these changes look fine to me.
(Resolved review comment on ...e/src/main/java/com/the_qa_company/qendpoint/core/util/io/compress/WriteLongArrayBuffer.java, now outdated.)
	<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
</properties>

<dependencies>
	<dependency>
Any reason for this add? Was this part of a compilation issue?
Runtime issue, actually. It complained about a method not being found when generating the query explanation JSON. The project inherits two different versions of Jackson, so we apparently need to decide which one to use.

It would be cleaner to use Maven's dependencyManagement instead, but I couldn't find that in any of the poms. Maybe I overlooked something?
Issue resolved (if any): #528

Description of this pull request:

Please check all the lines before posting the pull request:
- I have run the formatter (mvn formatter:format on the backend, npm run format on the frontend) before posting my pull request; mvn formatter:validate can be used to validate the formatting on the backend, npm run validate on the frontend.