
mv timed grooming2

Matthew Von-Maszewski edited this page Apr 15, 2016 · 3 revisions

Status

  • merged to develop - April 15, 2016
  • code complete - April 15, 2016
  • development started - April 14, 2016

History / Context

The original discussion for timed grooming is here:

https://github.com/basho/leveldb/wiki/mv-timed-grooming

This branch corrects an oversight in the original implementation. A completed compaction needs to mark its completion time in both the level initiating the compaction and the level receiving the compaction. The original implementation only marked the level initiating the compaction. As a result, a timed compaction from level 0 could immediately trigger another timed compaction from level 1. This cascade of compactions would happen when neither level had experienced a compaction within the timed grooming period.

The above oversight and its correction are unrelated to an independent issue that led to disabling timed grooming. Other research showed that Erlang's scheduler busy wait impacted leveldb throughput more than compactions did. The Erlang option "+sbwt none" improved throughput by roughly 35%. This improvement overshadowed the slight performance gain of timed grooming, and timed grooming's impact on leveldb's block cache and Linux's page cache became more significant. The code for timed grooming remains to allow future work, but this branch currently disables it.

(The proposed feature of populating leveldb's block cache with blocks from the receiving level would mitigate timed grooming's cache impact. Implementation of such a feature would be a good time to reevaluate timed grooming.)

Branch Description

db/version_set.cc

This added line effectively disables timed grooming. Setting elapsed_micros to zero guarantees that the grooming threshold is always kL0_GroomingTrigger (4 as of this writing).
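
The effect of forcing the elapsed time to zero can be illustrated with a minimal sketch. Only kL0_GroomingTrigger and its value of 4 come from this page. The tier boundaries, the lower thresholds, and both function names below are hypothetical and are not taken from the basho/leveldb source; the real selection logic may differ.

```cpp
#include <cstdint>

// Value stated on this page ("4 as of this writing").
constexpr int kL0_GroomingTrigger = 4;

// Hypothetical idle-time tiers, for illustration only.
constexpr uint64_t kHypotheticalShortIdleMicros = 5ULL * 60 * 1000000;
constexpr uint64_t kHypotheticalLongIdleMicros = 10ULL * 60 * 1000000;

// Hypothetical helper: the longer a level has gone without a compaction,
// the fewer files are needed to trigger grooming.
inline int GroomingThreshold(uint64_t elapsed_micros) {
    if (elapsed_micros > kHypotheticalLongIdleMicros) return 1;   // assumed value
    if (elapsed_micros > kHypotheticalShortIdleMicros) return 2;  // assumed value
    return kL0_GroomingTrigger;
}

// Hypothetical helper mirroring the disabling change: the measured idle
// time is ignored and treated as zero, so the default trigger always applies.
inline int GroomingThresholdDisabled(uint64_t /*measured_elapsed_micros*/) {
    const uint64_t elapsed_micros = 0;
    return GroomingThreshold(elapsed_micros);
}
```

Under these assumptions, no amount of idle time can lower the threshold once the elapsed value is pinned to zero, which is why a single assignment is enough to switch the feature off while leaving the rest of the code in place.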

db/version_set.h

m_LastCompaction is now populated with the current time for both the source level and the destination level of a compaction. The database's mutex is held when this routine is called, so there is no race condition.
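
The two-level timestamp update can be sketched roughly as follows. This is not the actual code from db/version_set.h: the struct, its method names, the level count of 7, and the idle check are assumptions made for illustration. Only the idea of stamping both the source and destination levels comes from the description above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumLevels = 7;  // assumed level count for this sketch

// Hypothetical stand-in for the per-level m_LastCompaction bookkeeping.
struct CompactionClock {
    std::array<uint64_t, kNumLevels> last_compaction{};  // zero-initialized

    // Records the completion time on the source level and on the level
    // that received the compaction output. The caller is assumed to hold
    // the database mutex, so no extra locking is done here.
    void MarkCompactionDone(std::size_t source_level, uint64_t now_micros) {
        last_compaction[source_level] = now_micros;
        if (source_level + 1 < kNumLevels) {
            last_compaction[source_level + 1] = now_micros;
        }
    }

    // Hypothetical check: has the level gone a full grooming period
    // without a compaction? Assumes now_micros >= last_compaction[level].
    bool IdleLongEnough(std::size_t level, uint64_t now_micros,
                        uint64_t period_micros) const {
        return now_micros - last_compaction[level] >= period_micros;
    }
};
```

With both levels stamped, a level 0 compaction that finishes at time T leaves level 1 looking freshly compacted at T as well, so level 1 does not immediately qualify for its own timed compaction.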
