Python wrapper for KMC API

marekkokot edited this page Sep 27, 2019 · 9 revisions

The Python wrapper for the C++ KMC API is designed to mimic the C++ interface. C++ and Python are very different languages, so some compromises had to be made. For example, Python cannot pass an integer by reference, but that is how the C++ KMC API returns the counter of a k-mer. One (rather ugly) workaround is to wrap the integer in a class: a class instance is passed by reference in Python, so the callee can modify its field.
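The wrapper-class idea can be illustrated without the module itself. The following is a minimal pure-Python sketch; CountBox, lookup and the table dictionary are hypothetical stand-ins invented for this illustration and are not part of py_kmc_api.

```python
class CountBox:
    """Hypothetical stand-in for an output-parameter wrapper such as Count."""

    def __init__(self):
        self.value = 0


def lookup(table, kmer, count):
    """Mimic a C++-style call: report success through the return value
    and write the counter into the mutable `count` object."""
    if kmer in table:
        count.value = table[kmer]
        return True
    return False


table = {"ACGT": 7}  # toy data, not a real KMC database
count = CountBox()
found = lookup(table, "ACGT", count)
print(found, count.value)
```

Rebinding a plain integer inside lookup would not be visible to the caller, which is why the value lives in a field of a mutable object.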

The py_kmc_api module contains the following classes:

  • CountVec - contains one field (value), a list of integers that serves as the output parameter of the GetCountersForRead method of the CKMCFile class.
  • Count - contains one field (value), a single integer that serves as the output parameter of the ReadNextKmer and CheckKmer methods of the CKMCFile class.
  • LongKmerRepresentation - contains one field (value), a list of integers forming the binary representation of a k-mer (on the C++ side each integer is an 8-byte unsigned integer); it is used by the to_long method of the KmerAPI class.
  • CKMCFileInfo - contains basic information about a KMC database.
  • CKmerAPI - represents a single k-mer.
  • CKMCFile - represents a KMC database, which may be opened in one of two modes: listing (only part of the database is loaded into memory; provides sequential access to the database) or random access (the whole database is loaded into memory; provides existence queries for specific k-mers).

The public interface of these classes is described below.


CKMCFileInfo class

Fields:

  • kmer_length - the length of a k-mer
  • mode - always 0 (1 for older versions of KMC where quake aware counters were supported)
  • counter_size - the number of bytes used to store each counter in the KMC database
  • lut_prefix_length - internal parameter of the KMC database; see API.pdf for more details
  • signature_len - the length of the signature (also an internal parameter, used when the database was constructed)
  • min_count - minimum value of a counter (a k-mer with fewer occurrences than this value is not stored in the database)
  • max_count - maximum value of a counter (a k-mer with more occurrences than this value is not stored in the database)
  • both_strands - True if kmc was run without the -b switch, False otherwise
  • total_kmers - the total number of k-mers stored in the database

Methods: None

CKmerAPI class

Fields: None

Methods:

  • __init__(length) - constructor; takes one parameter, the length of a k-mer (if omitted, the default value 1 is used)
  • __init__(kmer: KmerAPI) - constructor that creates a new object based on an existing one (copy constructor in C++ nomenclature)
  • assign(kmer: KmerAPI) - replaces the k-mer with the one passed as the parameter (equivalent of the C++ copy assignment operator)
  • __eq__(kmer: KmerAPI) - equality comparison
  • __lt__(kmer: KmerAPI) - less-than comparison
  • get_asci_symbol(pos) - returns the symbol at the given 0-based position
  • get_num_symbol(pos) - same as get_asci_symbol, but the symbol is encoded numerically (A->0, C->1, G->2, T->3)
  • __str__ - converts the k-mer to its string representation
  • to_long(result: LongKmerRepresentation) - converts the k-mer to a list of integers forming its binary representation (on the C++ side each integer is an unsigned 8-byte integer)
  • reverse - converts the k-mer to its reverse complement
  • get_signature(sig_len) - returns the numeric representation of the signature of the k-mer. "Signature" is an internal term: a generalization of the minimizer used to achieve better performance in KMC. The concept is described in detail in the paper describing KMC 2
  • from_string(kmer_str) - converts a string k-mer representation to the internal KmerAPI representation
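The symbol encoding used by get_num_symbol and the reverse-complement operation performed by reverse can be sketched in pure Python. This only illustrates the concepts; encode_symbols and reverse_complement are hypothetical helper names, and the code neither calls the wrapper nor reproduces its internals.

```python
# Numeric codes as listed for get_num_symbol: A->0, C->1, G->2, T->3.
SYMBOL_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_SYMBOL = "ACGT"


def encode_symbols(kmer):
    """Return the numeric code of every symbol in the k-mer string."""
    return [SYMBOL_CODE[symbol] for symbol in kmer]


def reverse_complement(kmer):
    """Reverse the k-mer and complement each symbol.

    With this encoding the complement of code c is 3 - c
    (A<->T, C<->G).
    """
    return "".join(CODE_SYMBOL[3 - SYMBOL_CODE[s]] for s in reversed(kmer))


print(encode_symbols("ACGT"))
print(reverse_complement("AACG"))
```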

CKMCFile class

Fields: None

Methods:

  • __init__ - parameterless constructor
  • OpenForRA(file_name) - open database for random access mode, returns True in case of success, False otherwise
  • OpenForListing(file_name) - open database for listing mode, returns True in case of success, False otherwise
  • ReadNextKmer(kmer: KmerAPI, count: Count) - reads the next k-mer from the database; both parameters are output parameters. Available only in listing mode. Returns True if the next k-mer was read successfully, False otherwise
  • Close - close opened database
  • SetMinCount(x) - sets the minimum value of a counter; a k-mer with a lower counter is treated as non-existent
  • GetMinCount() - returns the value set with SetMinCount
  • SetMaxCount(x) - sets the maximum value of a counter; a k-mer with a greater counter is treated as non-existent
  • GetMaxCount - returns the value set with SetMaxCount
  • GetBothStrands - returns True if KMC was run without -b switch, False otherwise
  • KmerCount - returns the number of k-mers in the database (if SetMinCount or SetMaxCount overrides the values stored in the database, the database is scanned linearly to count this number; otherwise it is simply read from the KMC database header)
  • KmerLength - returns the length of k-mers stored in the database
  • RestartListing - available in listing mode, sets the internal pointer to the first k-mer in the database
  • Eof - available in listing mode, returns True if all k-mers have been listed, False otherwise
  • CheckKmer(kmer: KmerAPI, count: Count) - available in random access mode, checks whether the k-mer passed as the first argument exists in the database. Returns False if it does not; returns True if it does, in which case the number of occurrences of the k-mer in the database is returned through the second parameter
  • IsKmer(kmer: KmerAPI) - available in random access mode, returns True if the k-mer exists in the database, False otherwise
  • ResetMinMaxCounts - restores default values of min and max values of a counter
  • Info - returns an object of the CKMCFileInfo class containing basic information about the KMC database
  • GetCountersForRead(read, counters: CountVec) - looks up in the database all k-mers of the read passed as the first argument and stores their counters in the second parameter. If a k-mer does not exist in the database, its counter is set to 0; the same applies when a k-mer in the read contains 'N'. As a result, len(counters.value) always equals len(read) - k + 1
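The length guarantee of GetCountersForRead can be made concrete with a small pure-Python model. Here a plain dictionary stands in for the database, and counters_for_read is a hypothetical function written for this illustration. It ignores reverse complements and the min/max count settings, and it is not the wrapper's implementation.

```python
def counters_for_read(read, k, table):
    """Return one counter per k-mer window of `read`.

    `table` is a toy dict mapping k-mer strings to counters
    (an assumption of this sketch, not a real KMC database).
    """
    counters = []
    for start in range(len(read) - k + 1):
        kmer = read[start:start + k]
        if "N" in kmer:
            # A k-mer containing 'N' gets counter 0.
            counters.append(0)
        else:
            # A k-mer missing from the table also gets counter 0.
            counters.append(table.get(kmer, 0))
    return counters


table = {"ACG": 5, "CGT": 2}
read = "ACGTNACG"
counters = counters_for_read(read, 3, table)
print(counters)
```

Every window contributes exactly one entry, including the windows that overlap the 'N', so the result has len(read) - k + 1 elements.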