Skip to content

dhly-etc/q-digest

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

q-digest

An implementation of the Q-Digest Data Structure in C++; documented here.

The Q-Digest lets you efficiently compute approximate percentile of large data sets with integral values. The data structure is space constrained and is parameterized by (K), the compression factor. Smaller values of K indicate more compression and less accuracy, whereas larger values of K indicate lesser compression and more accuracy.

The accuracy of the Q-Digest is reasonable for data that has values which are drawn uniformly from a random distribution of values. It does well on Poisson distributions, and extremely well on geometric distributions. The file test.cpp shows the differences.

Public API

namespace qdigest {
  QDigest::QDigest(size_t K, size_t upper_bound = 1);
  
  QDigest::QDigest(QDigest &&rhs);
  
  void QDigest::swap(QDigest &rhs);
  
  void QDigest::insert(size_t key, unsigned int count);
  
  size_t QDigest::size() const;
  
  bool QDigest::empty() const;
  
  // Returns (# internal nodes) / (sum of counts for all keys)
  double QDigest::compression_ratio() const;
  
  // Returns the 100p'th percentile. p takes values in the range [0..100]
  size_t QDigest::percentile(double p) const;
  
  // Serialize the structure into a string for storage/transmission
  std::string QDigest::toString() const;

  // De-serialize the structure into *this given a serialized representation
  void QDigest::fromString(std::string const &ser);

  // Merge 'rhs' into *this
  void QDigest::merge(QDigest const &rhs);
  QDigest& QDigest::operator+=(QDigest const &rhs);

  // Pretty print the Q-Digest (for debugging)
  std::ostream& operator<<(std::ostream &out, QDigest const &digest);
}

You can not directly copy a Q-Digest into another, since copying a Q-Digest is an expensive operation and involves copying/cloning all the internal nodes. However, you can do the following to make a copy if you wish:

qdigest::QDigest old(100, 10);
old.insert(1, 3);
old.insert(4, 2);

qdigest::QDigest curr(100);
// Since 'curr' started out empty, 'curr' is now a copy of 'old'
curr += old;

About

C++ Implementation of the Q-Digest

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 97.4%
  • CMake 2.6%