-
Notifications
You must be signed in to change notification settings - Fork 1
Demo: Comparison of 2 profiles
Highly active enhancer regions are thought to be important for cell fate (Andersson et al. 2014, FANTOM5 consortium, Hnisz et al. 2013). Highly active enhancers regions have been selected in GM12878 cells. Similarity of ChIP-seq profiles has been tested using two histone post-transcriptional modifications linked to highly active enhancers H3K27ac (DCC accession: ENCFF000ASG) and H3K4me1 (DCC accession: ENCFF000ARY) from the Encyclopedia of DNA Elements (ENCODE) data (Dunham I et al. 2012). Accordingly with the literature, similarity between the profiles of these two histone marks has been identified.
First, the similaRpeak package must be loaded.
library(similaRpeak)
A region, chr7:61968807-61969730, shows interesting profiles for both histones. Let's load the data for this region:
data(chr7Profiles)
str(chr7Profiles)
## List of 1
## $ chr7.61968807.61969730:List of 2
## ..$ H3K27ac: num [1:924] 31 31 31 31 30 29 27 24 24 26 ...
## ..$ H3K4me1: num [1:924] 319 317 310 300 284 274 259 257 258 253 ...
H3K27ac and H3K4me1 have the following profiles:
The metrics are calculated using the similarity
function which takes as
arguments the two ChIP-Seq profiles vectors and the threshold values.
metrics <- similarity(chr7Profiles$chr7.61968807.61969730$H3K27ac,
chr7Profiles$chr7.61968807.61969730$H3K4me1,
ratioAreaThreshold=5,
ratioMaxMaxThreshold=2,
ratioIntersectThreshold=5,
ratioNormalizedIntersectThreshold=2,
diffPosMaxThresholdMinValue=10,
diffPosMaxThresholdMaxDiff=100,
diffPosMaxTolerance=0.01)
The similarity
function returns a list which contains the general
information about both ChIP-Seq profiles and a list of all calculated metrics.
metrics
## $nbrPosition
## [1] 924
## $areaProfile1
## [1] 22140
## $areaProfile2
## [1] 182991
## $maxProfile1
## [1] 77
## $maxProfile2
## [1] 646
## $maxPositionProfile1
## [1] 557
## $maxPositionProfile2
## [1] 657
## $metrics
## $metrics$RATIO_AREA
## [1] 0.1209896
## $metrics$DIFF_POS_MAX
## [1] -49.5
## $metrics$RATIO_MAX_MAX
## [1] 0.119195
## $metrics$RATIO_INTERSECT
## [1] 0.1208916
## $metrics$RATIO_NORMALIZED_INTERSECT
## [1] 0.8478406
## $metrics$SPEARMAN_CORRELATION
## [1] 0.9689683
Each specific value can be directly accessed. Some examples:
metrics$areaProfile1
## [1] 22140
metrics$areaProfile2
## [1] 182991
The RATIO_INTERSECT value of 0.12 and the RATIO_MAX_MAX value of 0.12 are quite low. Both values can be explained by the large difference in coverage between profiles. Those values could be interpreted as two profiles with low level of similarity. However, the RATIO_NORMALIZED_INTERSECT of 0.85 is much closer to 1. It could be a sign that the profiles, once normalized, are quite similar. This hypothesis can be validated by looking at a graph of the normalized profiles :
Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, et al. (2014) An atlas of active enhancers across human cell types and tissues. Nature, 507(7493), 455-461.
Dunham I, Kundaje A, Aldred SF, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74.
Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, et al. (2014) A promoter-level mammalian expression atlas. Nature, 507(7493):462-470.
Hnisz D, Abraham BJ, Lee TI, Lau A, Saint-André V, et al. (2013) Super-enhancers in the control of cell identity and disease. Cell, 155(4), 934-947.