measure

Andrew Wilkey edited this page Apr 9, 2019 · 1 revision

Measures are any form of glyph where a value is important to how the glyph is drawn. The value is taken from the score (6th) column of the GFF file or from the value= attribute in the attribute (9th) column.
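
As a rough illustration of the two value sources, the sketch below pulls a value out of a single tab-separated GFF line. The function names `parse_attributes` and `measure_value` are hypothetical and not part of CViT; the `value_type` argument mirrors the option of the same name described under Configuration Options. Returning `None` for a missing value is an assumption.

```python
def parse_attributes(attr_column):
    """Split a GFF attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in attr_column.strip().split(";"):
        if "=" in pair:
            key, val = pair.split("=", 1)
            attrs[key.strip()] = val.strip()
    return attrs


def measure_value(gff_line, value_type="value_attr"):
    """Return the numeric value carried by one GFF line (hypothetical helper).

    'score_col' reads column 6; 'value_attr' reads value= from column 9.
    """
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) < 9:
        raise ValueError("a GFF line needs 9 tab-separated columns")
    if value_type == "score_col":
        raw = cols[5]
    elif value_type == "value_attr":
        raw = parse_attributes(cols[8]).get("value")
    else:
        raise ValueError("unknown value_type: " + value_type)
    # GFF writes '.' for an empty field; mapping that to None is an assumption.
    if raw is None or raw == ".":
        return None
    return float(raw)


line = "Chr1\tdemo\tgene\t100\t200\t0.75\t+\t.\tID=g1;value=12.5"
print(measure_value(line, "score_col"))   # 0.75
print(measure_value(line, "value_attr"))  # 12.5
```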

Configuration Options

Currently, measure supports the following display options:

| Option | Available Glyphs | Description |
| --- | --- | --- |
| heat | All | Changes the glyph's color based on the value's % of the maximum-minimum range. |
| distance | All | Moves the glyph away from the backbone based on the value's % of the maximum-minimum range. |
| histogram | range | Draws a range box that takes up the full space based on the value. |
| stackedbar | range | Like histogram, but can draw based on [classes] and count_classes = <1 or 2>. |
| ratio | range | Like stackedbar, but with a constant height, as defined by max_distance. Used to show % composition by class. |
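
The phrase "% of maximum-minimum range" can be read as a simple normalization. The sketch below is one plausible reading, not CViT's actual code: `fraction_of_range` and `distance_offset` are hypothetical names, and clamping out-of-range values is an assumption.

```python
def fraction_of_range(value, lo, hi):
    """Position of value inside [lo, hi], as a fraction between 0.0 and 1.0."""
    if hi == lo:
        return 0.0  # assumption: a flat range maps every value to the minimum
    frac = (value - lo) / (hi - lo)
    return max(0.0, min(1.0, frac))  # assumption: clamp values outside the range


def distance_offset(value, lo, hi, max_distance=25):
    """Pixels to move a glyph away from the backbone for the distance display."""
    return fraction_of_range(value, lo, hi) * max_distance


print(fraction_of_range(4.5, 0, 9))  # 0.5
print(distance_offset(4.5, 0, 9))    # 12.5
```

For the heat display, the same fraction would select a point along the configured color gradient instead of a pixel offset.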

Not all of the options below are available for every display type; the Use With column notes where each option applies. When configuring a distance, all the options for the glyph chosen by draw_as are also available, and the options defined for that glyph style will be used unless overridden here.

Measures are configured as follows:

| Option | Use With | Default | Description |
| --- | --- | --- | --- |
| value_type | all | value_attr | 'score_col': use column 6. 'value_attr': use the column 9 "value" attribute. |
| display | all | heat | Which of the display options above to use. |
| draw_as | heat,distance | range | How to draw the glyph. May be centromere, position, range, border or marker. |
| enable_pileup | all | 0 | (boolean) Move the glyph if it overlaps with others. Leaving this off is suggested for histograms. |
| heat_colors | heat | redgreen | (array)(colors) Array of two or more colors used to generate the heat intensity. In addition to an array, redgreen is an alias for [red,green] and grayscale is an alias for [black,white]. |
| max_distance | distance,histogram | 25 | Maximum additional offset in pixels. |
| min | all | 0 | Minimum value. Overridden if the actual minimum is smaller. |
| max | all | 9 | Maximum value. Overridden if the actual maximum is larger. |
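
The min/max override rule and the heat_colors aliases can be sketched as follows. This is an illustrative reading under stated assumptions: the helper names are hypothetical, the RGB values in `NAMED_RGB` are placeholders rather than CViT's palette, and mapping the lowest value to the first color in the array is an assumption.

```python
HEAT_ALIASES = {"redgreen": ["red", "green"], "grayscale": ["black", "white"]}
# Illustrative RGB values only; CViT's real color table may differ.
NAMED_RGB = {"red": (255, 0, 0), "green": (0, 255, 0),
             "black": (0, 0, 0), "white": (255, 255, 255)}


def effective_range(values, cfg_min=0, cfg_max=9):
    """Widen the configured min/max when the data falls outside them."""
    return min(cfg_min, min(values)), max(cfg_max, max(values))


def resolve_heat_colors(heat_colors="redgreen"):
    """Expand an alias or a list of color names into RGB tuples."""
    names = HEAT_ALIASES[heat_colors] if isinstance(heat_colors, str) else heat_colors
    return [NAMED_RGB[name] for name in names]


def heat_color(frac, colors):
    """Linearly interpolate along two or more RGB colors; frac is in [0, 1]."""
    if len(colors) < 2:
        raise ValueError("heat_colors needs two or more colors")
    frac = max(0.0, min(1.0, frac))
    scaled = frac * (len(colors) - 1)
    idx = min(int(scaled), len(colors) - 2)  # gradient segment holding frac
    t = scaled - idx                         # position inside that segment
    a, b = colors[idx], colors[idx + 1]
    return tuple(round(a[i] + (b[i] - a[i]) * t) for i in range(3))


print(effective_range([2, 15]))                     # (0, 15)
print(heat_color(1.0, resolve_heat_colors()))       # (0, 255, 0)
```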

The following options have been added from the legacy format:

| Option | Use With | Default | Description |
| --- | --- | --- | --- |
| generate_bins | all | 0 | (boolean) Generate bins and use the count as the value. |
| bin_size | all | 0 | If generate_bins is set, the size of each bin in backbone units. |
| bin_count | all | 0 | If generate_bins is set, the number of bins per backbone. |
| bin_min | all | 0 | Sets a hard minimum that will not be overridden. |
| bin_max | all | 0 | If not zero, sets a hard maximum. |
| count_classes | all | 0 | 0, 1 or 2. Use class=<class-name> as a secondary count in a bin. Classes are only counted if they are assigned a color in [classes]. 0 = don't count. 1 = count only items with a class attribute. 2 = count all features, with items whose class is not defined treated as "uncategorized". |
| invert_value | all | 0 | (boolean) Calculate values with min and max swapped (lower is higher). |
| value_distribution | heat,distance,histogram | linear | [linear,log,exponential] Used to convert non-linear distributions to linear. |
| value_base | heat,distance,histogram | e | Value to use as the base for non-linear distributions. |

value_distribution currently applies the transform to the measure's min, its max, and the passed value, so when using a non-linear distribution with bin_min or bin_max, remember to set them to values that suit the transform.
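
To see why the bounds matter, here is a minimal sketch of applying one transform to the min, the max, and the value before scaling. The exact formulas CViT uses are not given on this page, so treating `log` as a logarithm in value_base and `exponential` as value_base raised to the value is an assumption, as are the function names. The `invert` flag follows the invert_value description (min and max swapped).

```python
import math


def transform(x, distribution="linear", base=math.e):
    """Apply the assumed value_distribution transform to a single number."""
    if distribution == "linear":
        return x
    if distribution == "log":
        return math.log(x, base)  # raises ValueError for x <= 0
    if distribution == "exponential":
        return base ** x
    raise ValueError("unknown value_distribution: " + distribution)


def scaled_fraction(value, lo, hi, distribution="linear", base=math.e, invert=False):
    """Scale value into [lo, hi] after transforming all three numbers."""
    t_lo, t_hi, t_val = (transform(v, distribution, base) for v in (lo, hi, value))
    if invert:
        t_lo, t_hi = t_hi, t_lo  # invert_value: lower is higher
    if t_hi == t_lo:
        return 0.0
    return (t_val - t_lo) / (t_hi - t_lo)


# With a base-10 log, 10 sits a third of the way between 1 and 1000.
print(scaled_fraction(10, 1, 1000, "log", base=10))               # roughly 0.333
print(scaled_fraction(10, 1, 1000, "log", base=10, invert=True))  # roughly 0.667
```

Under this reading, a bin_min of 0 combined with `log` would fail, because 0 has no logarithm; a bound of 1 or higher avoids that.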

A Note About count_classes

For precomputed bins, class counts may be supplied directly by appending the attribute <class-name>=<value>. If the sum of all values with a valid class name is greater than the provided value, the sum will be used instead.

Unless a glyph explicitly supports the data generated by count_classes, this option is treated as a way to filter the bin count. That is, count_classes = 0 and count_classes = 2 are functionally the same.
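
The counting rules above can be sketched as follows. This is an interpretation, not CViT's implementation: `count_bin` and `precomputed_value` are hypothetical names, features are represented as plain attribute dicts, and skipping features whose class has no color in mode 1 is an assumption.

```python
def count_bin(features, class_colors, count_classes=0):
    """Return (bin value, per-class counts) for the features in one bin.

    features:     list of attribute dicts, e.g. {"class": "exon"}
    class_colors: classes that have a color assigned in [classes]
    """
    per_class = {}
    if count_classes == 0:
        return len(features), per_class  # plain count, no class breakdown
    for attrs in features:
        cls = attrs.get("class")
        if cls in class_colors:
            per_class[cls] = per_class.get(cls, 0) + 1
        elif count_classes == 2:
            # mode 2: everything else is pooled as "uncategorized"
            per_class["uncategorized"] = per_class.get("uncategorized", 0) + 1
    return sum(per_class.values()), per_class


def precomputed_value(value, attrs, class_colors):
    """For precomputed bins: use the class sum when it exceeds the given value."""
    class_sum = sum(float(attrs[c]) for c in class_colors if c in attrs)
    return max(value, class_sum)


colors = {"exon": "red", "intron": "blue"}
feats = [{"class": "exon"}, {"class": "intron"}, {"class": "other"}, {}]
for mode in (0, 1, 2):
    print(mode, count_bin(feats, colors, mode))
```

In this sketch modes 0 and 2 return the same bin value (all four features), while mode 1 filters the count down to the two features with a colored class.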

Generating bins

If generate_bins = 1 is defined in a measure configuration, CViT does not use the provided values. Instead, it generates the values used to draw the measure from the provided data, based on the state of bin_size and bin_count. In all cases, the value is the count of features within a bin. The bin_size actually used is slightly dynamic: the value is tweaked to prevent counting bins outside either end of the backbone.
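
A minimal sketch of count-based binning is shown below. The page does not say how the bin size is tweaked, so shrinking it until a whole number of bins covers the backbone is an assumption, as is assigning each feature to a bin by a single position. The names `fit_bin_size` and `count_features` are hypothetical.

```python
import math


def fit_bin_size(backbone_length, bin_size):
    """Shrink bin_size so a whole number of bins exactly covers the backbone."""
    n_bins = max(1, math.ceil(backbone_length / bin_size))
    return backbone_length / n_bins, n_bins


def count_features(positions, backbone_start, backbone_end, bin_size):
    """Count feature positions per bin along one backbone."""
    size, n_bins = fit_bin_size(backbone_end - backbone_start, bin_size)
    counts = [0] * n_bins
    for pos in positions:
        if not backbone_start <= pos <= backbone_end:
            continue  # ignore features that fall off the backbone
        # A feature exactly at the backbone end belongs to the last bin.
        idx = min(int((pos - backbone_start) / size), n_bins - 1)
        counts[idx] += 1
    return size, counts


size, counts = count_features([10, 240, 260, 999, 1000], 0, 1000, bin_size=300)
print(size)    # 250.0
print(counts)  # [2, 1, 0, 2]
```

Here a requested bin_size of 300 on a backbone of length 1000 becomes four bins of 250, so no bin extends past the end.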

generate_bins = 1 and both bin_count and bin_size are 0

CViT goes over each backbone and generates bins based on Rice's rule:

Ceil( 2*n^(1/3) )

where n is the number of features on that backbone.

This is used to determine a bin_size for each backbone, with the smallest bin_size being kept for drawing the resulting measure.
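
The default case can be sketched like this. Rice's rule itself is taken from the formula above; deriving bin_size as backbone length divided by the number of bins, and skipping backbones with no features, are assumptions. The function names and the sample backbones are hypothetical.

```python
import math


def rice_bin_count(n_features):
    """Rice's rule: ceil(2 * n^(1/3)) bins for n features."""
    return math.ceil(2 * n_features ** (1 / 3))


def rice_bin_size(backbones):
    """Pick the smallest per-backbone bin_size.

    backbones maps a name to (length, number of features).
    """
    sizes = {}
    for name, (length, n_features) in backbones.items():
        if n_features == 0:
            continue  # assumption: empty backbones do not influence bin_size
        sizes[name] = length / rice_bin_count(n_features)
    return min(sizes.values()), sizes


smallest, per_backbone = rice_bin_size({"Chr1": (1000, 100), "Chr2": (800, 500)})
print(per_backbone)  # {'Chr1': 100.0, 'Chr2': 50.0}
print(smallest)      # 50.0
```

With 100 features Rice's rule gives 10 bins, and with 500 features it gives 16, so the denser backbone sets the bin_size used for drawing.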

generate_bins = 1 and bin_size > 0

The provided bin_size is used.

generate_bins = 1 and bin_count > 0

bin_size is calculated per backbone based on fitting the provided number of bins.

generate_bins = 1 and both bin_count and bin_size are >0

This case is treated the same as bin_size > 0: the provided bin_size is used.
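
The precedence between bin_size and bin_count across these cases can be condensed into one small function. This is a sketch of the rules as described, with a hypothetical name; returning `None` to signal the fallback to Rice's rule, and dividing the backbone length evenly for bin_count, are assumptions about the mechanics.

```python
def resolve_bin_size(backbone_length, bin_size=0, bin_count=0):
    """Choose the bin size for one backbone when generate_bins = 1."""
    if bin_size > 0:
        return bin_size  # wins even when bin_count is also set
    if bin_count > 0:
        return backbone_length / bin_count  # fit the requested number of bins
    return None  # neither set: fall back to Rice's rule


print(resolve_bin_size(1000, bin_size=50))                # 50
print(resolve_bin_size(1000, bin_count=20))               # 50.0
print(resolve_bin_size(1000, bin_size=50, bin_count=4))   # 50
print(resolve_bin_size(1000))                             # None
```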