-
Notifications
You must be signed in to change notification settings - Fork 13
Query Interface
All queries use a single thread. Unless otherwise noted, they are member functions of GBWT
and DynamicGBWT
. The structures are defined in gbwt.h and dynamic_gbwt.h, respectively.
The low-level query interface assumes that the parameter values are valid. The validity can be checked with contains()
(see below).
A search state combines a node identifier and a range of offsets. The state is defined in algorithms.h.
struct SearchState
{
node_type node;
range_type range;
size_type size() const;
bool empty() const;
};
-
node
: Node identifier. -
range
: Range of record offsets. -
size_type size() const
: The same asRange::length(range)
. -
bool empty() const
: The same asRange::empty(range)
.
A bidirectional state combines the search states for a pattern and its reverse. (The reverse path corresponds to the reverse complement of the DNA sequence.) It is defined in algorithms.h.
struct BidirectionalState
{
SearchState forward, backward;
size_type size() const;
bool empty() const;
void flip();
};
-
forward
: The search state for the pattern. -
backward
: The search state for the reverse of the pattern. -
size_type size() const
: The same asforward.size()
, which is equal toreverse.size()
. -
bool empty() const
: The same asforward.empty()
, which is equal toreverse.empty()
. -
void flip()
: Reverses the orientation of the pattern; swaps the forward and reverse states.
These are fast inline functions, unless otherwise noted.
-
size_type size() const
: Total length of the paths. -
bool empty() const
: Is the index empty? -
size_type sequences() const
: Number of paths (sequences) in the index. -
size_type sigma() const
: Alphabet size; largest node identifier + 1. -
size_type effective() const
: Effective alphabet size; largest record identifier + 1. -
std::pair<size_type, size_type> runs() const
: Number of runs in the BWT. The first value is the number of concrete runs in the encoding, while the second value is the number of logical runs. (Each endmarker is a separate logical run.) Expensive. -
size_type samples() const
: Number of sampled path identifiers in the index. Expensive inDynamicGBWT
. -
bool bidirectional() const
: Does the index support bidirectional search?
-
bool contains(node_type node) const
: Is there a record for nodenode
; isnode
in the effective alphabet? -
bool contains(edge_type position) const
: As above, but also checks that the record offset is valid. -
bool contains(SearchState state) const
: As above, but also checks that the range of record offsets is non-empty and valid. -
bool hasEdge(node_type from, node_type to) const
: Is there an edge from nodefrom
to nodeto
. -
std::vector<edge_type> edges(node_type from) const
: Returns the edges from nodefrom
. -
node_type firstNode() const
: Node identifier for the first non-endmarker node. -
comp_type toComp(node_type node) const
: Record identifier for nodenode
. -
node_type toNode(comp_type comp) const
: Node identifier for recordcomp
. -
size_type nodeSize(node_type node) const
: Record size for nodenode
; the number of occurrences ofnode
in the BWT. Relatively expensive inGBWT
. -
bool empty(node_type node) const
: Is the record for nodenode
empty? Relatively expensive inGBWT
.
These operations are based on positions expressed as a combination of a node identifier and a record offset. They return either positions or offsets. Errors are expressed with invalid_edge()
, invalid_offset()
, or Range::empty_range()
.
-
edge_type LF(node_type from, size_type i) const
: Move from nodefrom
offseti
to the next node on the path. -
edge_type LF(edge_type position) const
: As above. -
size_type LF(node_type from, size_type i, node_type to) const
: Start from nodefrom
offseti
and get the next offset, assuming that the next node isto
. -
size_type LF(edge_type position, node_type to) const
: As above. -
range_type LF(node_type from, range_type range, node_type to) const
: Start from rangerange
of offsets in nodefrom
and get the range of successors in nodeto
. -
range_type LF(SearchState state, node_type to) const
: As above.
DynamicGBWT
also supports full LF-mapping that works regardless of whether the implied edge exists.
-
size_type fullLF(node_type from, size_type i, node_type to) const
: AsLF()
with the same parameters. If there is no edge fromfrom
toto
, the return value is the number of paths from any nodev
toto
, wherev
is smaller thanfrom
. -
size_type fullLF(edge_type position, node_type to) const
: As above.
These functions require a bidirectional index.
edge_type inverseLF(node_type from, size_type i) const`
edge_type inverseLF(edge_type position) const
Inverse LF-mapping from node from
offset i
moves to the previous node on the path. It is several times slower than going forward.
range_type bdLF(SearchState state, node_type to, size_type& reverse_offset) const
This function returns LF(state, to)
. It also stores the number of occurrences of nodes smaller than to
in the query range in reverse_offset
. This is the starting offset of the new reverse range relative to the start of the old reverse range.
These queries convert between positions and path identifiers. Errors are expressed with invalid_edge()
and invalid_sequence()
.
-
edge_type start(size_type sequence) const
: The position for the start of pathsequence
. -
size_type tryLocate(node_type node, size_type i) const
: The sampled path identifier at nodenode
, record offseti
orinvalid_sequence()
if there is no sample. -
size_type tryLocate(edge_type position) const
: As above.