Updated Readme and documention.tex to consider the chromatin conformation capture data
Florian411 committed Jul 8, 2019
1 parent a6501c7 commit 65d17ab
Showing 3 changed files with 20 additions and 3 deletions.
9 changes: 8 additions & 1 deletion README.md
@@ -1,4 +1,4 @@
-# TEPIC (version 2.1)
+# TEPIC (version 2.2)
-------
TEPIC offers workflows for the prediction and analysis of Transcription Factor (TF) binding sites including:
* TF affinity computation in user provided regions
@@ -10,6 +10,8 @@ A graphical overview on the workflows of TEPIC is shown below. Blue font indicat
![](docs/TEPIC_Workflow.png)

## News
08.10.2019: We present a new feature that includes TF binding sites (TFBS) located in regulatory sites identified by chromatin conformation capture data. Using an extended feature-space representation, the INVOKE model can assess the regulatory influence of TFs bound to promoters and of TFs bound to enhancers separately.

10.10.2018: TEPIC 2.0 is now published in [Bioinformatics](https://doi.org/10.1093/bioinformatics/bty856).

13.08.2018: In addition to the gene-centric annotation, functionality for transcript-based annotation has been added.
@@ -156,6 +158,11 @@ Here, thresholded TF affinities are used for the computation.
GENEID TF1 TF2 ... TFn peak length peak count peak signal
ENSG00000044612 0 0 ... 4.2 23 3 19.2

The *Prefix_Conformation_Data_Affinity_Three_Peak_Based_Features_Gene_View.txt* files follow the previous structure but extend it with the same set of features, i.e., TF gene scores and peak features, computed for the DHSs residing in chromatin loops. These additional columns are prefixed with LR_:

GENEID TF1 TF2 ... TFn peak length peak count peak signal LR_TF1 ... LR_TFn LR_peak length LR_peak count LR_peak signal
ENSG00000044612 0 0 ... 4.2 23 3 19.2 3.4 ... 0.9 14 4 63.3
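As an illustration of how the extended layout might be consumed, for example to model promoter-bound and loop-bound TFs separately, the Python sketch below splits each row into promoter-window features and loop features. It assumes a tab-separated file with the gene ID in the first column and loop-derived columns carrying the LR_ prefix; the function `split_gene_view`, the underscore column names and the toy values are hypothetical and not part of TEPIC.

```python
import csv
import io


def split_gene_view(handle):
    """Split a conformation-data Gene View table into promoter-window and loop features.

    Assumption: tab-separated, gene ID in column 1, loop features prefixed with "LR_".
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    names = header[1:]  # feature names, without the gene ID column
    promoter, loop = {}, {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = dict(zip(names, (float(v) for v in row[1:])))
        promoter[row[0]] = {n: v for n, v in values.items() if not n.startswith("LR_")}
        loop[row[0]] = {n: v for n, v in values.items() if n.startswith("LR_")}
    return promoter, loop


# Toy input with hypothetical values, not real TEPIC output.
toy = (
    "GENEID\tTF1\tTF2\tpeak_length\tLR_TF1\tLR_TF2\tLR_peak_length\n"
    "GENE_A\t0.5\t0\t23\t3.4\t0.9\t14\n"
)
prom, loops = split_gene_view(io.StringIO(toy))
print(prom["GENE_A"])
print(loops["GENE_A"])
```

Keeping the two feature groups in separate dictionaries makes it easy to feed them to a model as distinct blocks.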

The *Prefix_Thresholded_Sparse_Affinity_Gene_View.txt* files are tab-separated files listing the Ensembl gene ID in the first column and the name of a TF associated with this gene in the second column.
Here, thresholded TF affinities are used for the computation. The third column of this file is required by DREM and does not carry any specific meaning.
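A minimal sketch of producing this three-column layout from a gene-to-TF assignment is shown below. The filler value written to the third column and the absence of a header line are assumptions, since the README only states that DREM requires the column; `write_sparse_gene_view` is a hypothetical helper, not part of TEPIC.

```python
import csv
import io
from typing import Dict, Iterable, TextIO


def write_sparse_gene_view(assignments: Dict[str, Iterable[str]], handle: TextIO, filler: str = "1") -> None:
    """Write one tab-separated line per gene-TF pair: gene ID, TF name, filler column.

    The filler value is a hypothetical placeholder for the column DREM expects.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for gene in sorted(assignments):
        for tf in sorted(assignments[gene]):
            writer.writerow([gene, tf, filler])


# Toy assignment with hypothetical gene and TF names.
buffer = io.StringIO()
write_sparse_gene_view({"GENE_B": {"TF2", "TF1"}, "GENE_A": {"TF3"}}, buffer)
print(buffer.getvalue(), end="")
```

Sorting genes and TFs keeps the output identical across runs, which simplifies comparing files.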

Binary file modified docs/Description.pdf
14 changes: 12 additions & 2 deletions docs/Description.tex
@@ -21,6 +21,7 @@
\item Annotation of user-defined regions with TF affinities using TRAP and a variety of provided TF motifs,
\item Aggregation of TF affinities to TF-gene scores,
\item Computation of statistical scores such as peak-length, peak-count or peak-signal per gene,
\item Inclusion of long-range chromatin contacts,
\item Discretization of continuous TF affinities using a background distribution into a binary measure for TF-binding,
\item Linear regression analysis to infer key transcriptional regulators within one sample,
\item Logistic regression classifier to suggest key transcriptional regulators between samples,
@@ -198,7 +199,6 @@ \subsection{Computing TF gene scores}

Furthermore, TEPIC can compute a TF-specific affinity cut-off, derived from either user-defined or randomly generated sequences, to distinguish likely bound sites from unbound sites. The resulting thresholded affinities
can be used to derive a binary TF-gene assignment. Further details on this mode are provided in Section \ref{EPIC-DREM}.
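The exact thresholding rule is not spelled out in this section. One plausible reading, sketched below in Python, treats the cut-off as an empirical quantile of the affinities a TF attains in the background sequences (nearest-rank method) and calls a site bound when its affinity exceeds that value. The quantile of 0.95, the function names `affinity_cutoff` and `bound_tfs`, and the dictionary layout are assumptions for illustration, not TEPIC's implementation.

```python
import math
from typing import Dict, Iterable, Set


def affinity_cutoff(background: Iterable[float], quantile: float = 0.95) -> float:
    """Nearest-rank quantile of one TF's background affinities (assumed rule)."""
    ordered = sorted(background)
    if not ordered:
        raise ValueError("background affinities must not be empty")
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must lie in (0, 1]")
    rank = math.ceil(quantile * len(ordered))  # 1-based rank into the sorted values
    return ordered[rank - 1]


def bound_tfs(
    site_affinities: Dict[str, Dict[str, float]], cutoffs: Dict[str, float]
) -> Dict[str, Set[str]]:
    """For each site, keep the TFs whose affinity exceeds that TF's cut-off."""
    return {
        site: {tf for tf, a in affs.items() if a > cutoffs[tf]}
        for site, affs in site_affinities.items()
    }
```

Per-site calls of this kind could then be aggregated per gene to obtain a binary TF-gene assignment.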

\begin{figure}[h!]
\begin{center}
\includegraphics[width=\textwidth]{Workflow.png}
@@ -211,8 +211,18 @@ \subsection{Computing TF gene scores}
\label{workflowFig}
\end{figure}

With version $2.2$ of TEPIC, we introduced support for long-range chromatin conformation capture data. In addition to the promoter-centric windows used before, we
calculate TF affinities $a_{g,i}^{*}$ and peak scores $pl_g^{*}$, $pc_g^{*}$, $ps_g^{*}$ for all DHSs residing in genomic loci that loop into the promoter region of a gene. These DHSs are summarized in $P_{g,V_g}$, where $V_g$ is the set of all regions looped into the promoter region of gene $g$:
\begin{align}
a_{g,i}^{*}&=\sum_{p \in P_{g,V_g}} a_{p,i},\\
pl_g^{*}&=\sum_{p \in P_{g,V_g}}|p|, \\
pc_g^{*}&=\sum_{p \in P_{g,V_g}} 1, \\
ps_g^{*}&=\sum_{p \in P_{g,V_g}}s_{p}.
\end{align}
Note that scores computed for $p \in P_{g,V_g}$ never include the exponential decay, because a direct interaction between these sites and the promoter region of gene $g$ has been established by chromatin conformation capture experiments.
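These sums translate directly into code. The Python sketch below assumes that each DHS is a half-open interval with a signal value $s_p$ and per-TF affinities $a_{p,i}$, and that a DHS belongs to $P_{g,V_g}$ when it overlaps any region of $V_g$ on the same chromosome; the overlap criterion, the `Peak` class and the function names are illustrative assumptions rather than TEPIC's implementation. As stated above, no exponential decay weighting is applied.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Peak:
    chrom: str
    start: int  # 0-based, inclusive
    end: int  # exclusive
    signal: float  # s_p
    affinities: Dict[str, float] = field(default_factory=dict)  # a_{p,i} per TF


def looped_peaks(peaks: Iterable[Peak], regions: Iterable[Tuple[str, int, int]]) -> List[Peak]:
    """Collect P_{g,V_g}: peaks overlapping any region looped into the promoter of g."""
    regions = list(regions)
    return [
        p for p in peaks
        if any(p.chrom == c and p.start < e and s < p.end for c, s, e in regions)
    ]


def loop_scores(selected: List[Peak], tfs: Iterable[str]):
    """Return (a*_{g,i} per TF, pl*_g, pc*_g, ps*_g), without any distance decay."""
    a_star = {tf: sum(p.affinities.get(tf, 0.0) for p in selected) for tf in tfs}
    pl_star = sum(p.end - p.start for p in selected)  # total peak length |p|
    pc_star = len(selected)  # number of peaks
    ps_star = sum(p.signal for p in selected)  # summed peak signal s_p
    return a_star, pl_star, pc_star, ps_star
```

A TF without affinity in a peak contributes zero, so every requested TF receives a score.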

\subsection{Required input}
-To compute TF gene scores a user needs to specify:
+To compute TF gene scores, a user needs to specify:
\begin{itemize}
\item a reference genome (-g option),
\item a set of \textit{PSEMs} (-p option),