Fix block size bug (#915)
The block sizes for the OPS, OSP, SOP, and SPO permutations were too large because of a bug in how it was determined when to end a block. For example, for the current Wikidata index as of this writing, the PSO and POS permutations have 48,769 blocks each, but the SPO and SOP permutations have only 1,967 blocks each. This bug is fixed now.

Co-authored-by: Johannes Kalmbach <[email protected]>
hannahbast and joka921 authored Mar 18, 2023
1 parent c9b1958 commit b3aa675
Showing 1 changed file with 14 additions and 5 deletions.
19 changes: 14 additions & 5 deletions src/index/CompressedRelation.cpp
@@ -332,18 +332,27 @@ void CompressedRelationWriter::addRelation(Id col0Id,
   // explicitly below.
   CompressedRelationMetadata metaData{col0Id, col1And2Ids.numRows(), multC1,
                                       multC2};
-  auto sizeOfRelation =
-      col1And2Ids.numRows() * col1And2Ids.numColumns() * sizeof(Id);
+
+  // Determine the number of bytes the IDs stored in an IdTable consume.
+  // The return type is double because we use the result to compare it with
+  // other doubles below.
+  auto sizeInBytes = [](const auto& table) {
+    return static_cast<double>(table.numRows() * table.numColumns() *
+                               sizeof(Id));
+  };

   // If this is a large relation, or the currently buffered relations +
   // this relation are too large, we will write the buffered relations to file
   // and start a new block.
-  if (sizeOfRelation > _numBytesPerBlock * 8 / 10 ||
-      sizeOfRelation + _buffer.numRows() > 1.5 * _numBytesPerBlock) {
+  bool relationHasExclusiveBlocks =
+      sizeInBytes(col1And2Ids) > 0.8 * static_cast<double>(_numBytesPerBlock);
+  if (relationHasExclusiveBlocks ||
+      sizeInBytes(col1And2Ids) + sizeInBytes(_buffer) >
+          static_cast<double>(_numBytesPerBlock) * 1.5) {
     writeBufferedRelationsToSingleBlock();
   }

-  if (sizeOfRelation > _numBytesPerBlock * 8 / 10) {
+  if (relationHasExclusiveBlocks) {
     // The relation is large, immediately write the relation to a set of
     // exclusive blocks.
     writeRelationToExclusiveBlocks(col0Id, col1And2Ids);
