ActiveDriverDB is a database integrating post-translational modification (PTM) sites and mutations (both germline and somatic) from multiple sources. The data are displayed interactively in the context of PTM signalling networks, proteins or diseases. The database is available at activedriverdb.org.
Please see our publication in Nucleic Acids Research and the publications subpage for a detailed academic discussion.
In a broader context, this repository provides a visualisation framework for genome mutations in gene and protein networks. The extensible, interactive views enable users to understand the analysed mutations, diseases or genes in the context of signalling networks, protein structure or sequence conservation, which makes them a useful tool for case studies and hypothesis exploration.
The co-developed needleplot visualisation library is available in a separate repository: reimandlab/needleplot.
The application is Open Source and is licensed under the terms of the GNU Lesser General Public License.
Please see the acknowledgments at the bottom of this document for third-party code licences.
The project is developed with Python 3. It uses Flask as a web framework, with database access provided by SQLAlchemy. Templating is performed with Jinja2 on the server side and with (Jinja2-compatible) Nunjucks.js on the client side. On the frontend, the styles are written in SASS and the visualisations are based on jQuery and D3.js. Interactive filtering and the REST API are based on a custom filtering system (built on top of SQLAlchemy and activated via AJAX requests). All HTML, CSS and JS features used are required to reach at least 95% browser support, as calculated by caniuse.com.
To recreate the environment, please use conda:
conda env create --file environment.yml --name addb
conda activate addb
You can set up the databases, Celery and other required services using the setup.sh script (which requires Ubuntu 18.04+):
# set the environment variables for setup.sh to create the databases
export MYSQL_USER=root # super-user able to create databases and assign privileges
export MYSQL_PORT=3306
bash setup.sh
cd website
bash deploy.sh
To create a basic local copy of ActiveDriverDB you need a machine with at least 4 GB of RAM. If you wish to import genomic mappings for genome variant annotation, you will need more than 10 GB of RAM (16 GB recommended).
For a full deployment, two MySQL databases are needed: one for biological data and one for the CMS.
You need to create them, along with the relevant database users and privileges. See the example SQL query for a quick way to set up the databases and users.
Remember to set a secure password; user, database and host names are adjustable too. You may also create a separate user for each database; this case is supported as well.
Privileges on the mysql database are required to allow the creation of custom functions (used for edit-distance-based sorting).
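The example SQL query itself is not reproduced here. As a rough sketch of the same idea, the Python snippet below only builds CREATE DATABASE, CREATE USER and GRANT statements for review; the database names, user name and password are hypothetical placeholders, and the extra privileges on the mysql database described above are deliberately left out.

```python
from __future__ import annotations


def quote_sql_string(value: str) -> str:
    """Wrap a value in single quotes, doubling embedded quotes.

    Backslashes are NOT escaped, so keep them out of passwords
    (or use a proper MySQL client library instead).
    """
    return "'" + value.replace("'", "''") + "'"


def setup_statements(databases: dict[str, tuple[str, str]], host: str = "localhost") -> list[str]:
    """Build SQL for each database name -> (user, password) pair."""
    statements: list[str] = []
    created: set[str] = set()
    for database, (user, password) in databases.items():
        account = f"{quote_sql_string(user)}@{quote_sql_string(host)}"
        statements.append(f"CREATE DATABASE `{database}`;")
        if account not in created:
            # One user may own both databases, so create each account only once.
            statements.append(f"CREATE USER {account} IDENTIFIED BY {quote_sql_string(password)};")
            created.add(account)
        statements.append(f"GRANT ALL PRIVILEGES ON `{database}`.* TO {account};")
    return statements


if __name__ == "__main__":
    # Hypothetical names: one database for biological data, one for the CMS.
    for statement in setup_statements({
        "hypothetical_bio_db": ("addb_user", "change-me"),
        "hypothetical_cms_db": ("addb_user", "change-me"),
    }):
        print(statement)
```

The printed statements are meant to be reviewed and then executed by the MySQL super-user; passing two different users in the mapping covers the one-user-per-database setup.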
Afterwards, you can start writing your configuration by copying the example configuration file:
cp example_config.py config.py
Carefully replace the variables mentioned in the comments in the file, as some of them are critical for the application's security. To check if the database is configured properly, run the following command:
./manage.py
If you see the following message at the very end:
Scripts loaded successfuly, no tasks specified.
it indicates that everything is working properly.
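When automating deployments, the same check can be scripted. A minimal sketch, assuming the confirmation message is printed to standard output as the last non-empty line and that ./manage.py is executable in the given directory; the message is matched loosely rather than verbatim:

```python
import subprocess


def configuration_ok(output: str) -> bool:
    """True if the last non-empty line looks like the confirmation message."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return False
    last = lines[-1]
    return last.startswith("Scripts loaded") and "no tasks specified" in last


def check_configuration(website_dir: str = ".") -> bool:
    """Run ./manage.py without arguments and inspect its standard output."""
    result = subprocess.run(["./manage.py"], cwd=website_dir, capture_output=True, text=True)
    return configuration_ok(result.stdout)
```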
All data files can be downloaded with the ./download.sh script from the website/data directory.
Before starting the server, the data have to be imported. The safest way to do this is to run:
./manage.py load all
although you may prefer Python's optimised mode, which makes the import much faster. It should not be used with new, untested data, however, since assertions are not checked in this mode:
python3 -OO manage.py load all
The given arguments instruct the program to create and import data for: DNA -> protein mappings, the biological relational database and the Content Management System (CMS). During CMS creation you will be asked to set up login credentials for the root user.
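Python's optimised mode is faster partly because assert statements are compiled away, which is why it should not be trusted with untested data. The small self-contained demo below (not part of the repository) runs the same failing assertion in a fresh interpreter with and without -OO and returns the exit codes:

```python
import subprocess
import sys

SNIPPET = "assert False, 'assertion checked'"


def exit_code(*flags: str) -> int:
    """Run SNIPPET in a fresh interpreter with the given flags and return its exit code."""
    completed = subprocess.run([sys.executable, *flags, "-c", SNIPPET], capture_output=True)
    return completed.returncode


if __name__ == "__main__":
    print("default:", exit_code())   # non-zero: the AssertionError is raised
    print("-OO:", exit_code("-OO"))  # zero: the assert statement never runs
```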
Warning: after each migration affecting protein identifiers, it is crucial to reimport the mappings; otherwise the mappings will point to the wrong proteins!
With the manage.py script you can load or remove specific parts of the database and perform a very simple automigration (for newly created models). For further details, use the built-in help option:
./manage.py -h
Note that the help adapts to the specified subcommands, i.e. it will show more details for:
./manage.py load -h
and even more for:
./manage.py load mutations -h
MySQL-specific: if you see the message "MySQL server has gone away", try running:
set global max_allowed_packet=1073741824;
If you don't want to perform the steps specified below for every single deployment, you can use the deploy.sh script (after installing all dependencies listed in the steps below).
Stylesheet files are an important part of this visualisation framework. To compile them, you will need to have the sass gem installed.
To create all *.css files, run the following command from the website directory:
sass --update .:.
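With --update, sass only rebuilds stylesheets whose sources have changed. The sketch below approximates that check to list which files would be recompiled; it assumes .scss/.sass sources sit next to their .css outputs and, unlike sass itself, it ignores changes in imported partials:

```python
from __future__ import annotations

from pathlib import Path

SOURCE_SUFFIXES = (".scss", ".sass")


def stale_stylesheets(root: str) -> list[Path]:
    """List SASS sources whose .css output is missing or older than the source."""
    stale = []
    for source in sorted(Path(root).rglob("*")):
        if source.suffix not in SOURCE_SUFFIXES or not source.is_file():
            continue
        if source.name.startswith("_"):
            continue  # partials are only imported by other files, never compiled alone
        target = source.with_suffix(".css")
        if not target.exists() or target.stat().st_mtime < source.stat().st_mtime:
            stale.append(source)
    return stale
```

For example, stale_stylesheets("website") would list the sources that the sass command above is expected to rebuild.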
The Nunjucks templating system is used for client-side templating. It allows moving some repeatedly performed templating tasks to the user's browser, which reduces data transfer and speeds up site loading. Its syntax is nearly compatible with Jinja2.
To keep this process efficient, templates should be precompiled. To do so, you will need a full Nunjucks installation, for example via npm (which you should be able to install with your system's package manager):
sudo npm install -g nunjucks
Afterwards, compile the templates with:
cd website/static/js_templates
./precompile.sh
And you are done. When DEBUG = False, precompiled templates will be loaded automatically.
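Nunjucks ships a full build that compiles templates in the browser and a slim runtime (nunjucks-slim.js) that can only render precompiled templates. Below is a hypothetical sketch of how a server could choose between them based on DEBUG; the static paths are illustrative assumptions, not the repository's actual file layout:

```python
from __future__ import annotations

# Hypothetical static paths; the repository's real layout may differ.
FULL_RUNTIME = "/static/nunjucks.js"
SLIM_RUNTIME = "/static/nunjucks-slim.js"
PRECOMPILED = "/static/js_templates/precompiled.js"


def template_scripts(debug: bool) -> list[str]:
    """Return the script URLs a page would include for client-side templating."""
    if debug:
        # Full build: raw templates are compiled in the browser,
        # so template edits show up without re-running precompile.sh.
        return [FULL_RUNTIME]
    # Production: the slim runtime only renders templates that were precompiled.
    return [SLIM_RUNTIME, PRECOMPILED]
```

The design choice is the usual trade-off: the full build is convenient while editing templates, while the slim runtime plus precompiled output avoids compiling templates on every page load.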
For periodic tasks, the cron-like Advanced Python Scheduler package is used; it is fully integrated with the application code and no additional setup is required.
The job functions are defined in the jobs.py file and the scheduling information is stored in config.py, in the JOBS variable.
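The exact structure of JOBS is defined by the project's config.py. As an illustration only, the sketch below assumes the Flask-APScheduler convention (a list of dicts with id, func and trigger keys, where func is a "module:function" reference, the textual format APScheduler accepts); the job name and schedule are hypothetical:

```python
import importlib

# Hypothetical JOBS entry, assuming the Flask-APScheduler configuration style.
JOBS = [
    {
        "id": "hypothetical_cleanup",         # unique job identifier
        "func": "jobs:hypothetical_cleanup",  # "module:function" reference into jobs.py
        "trigger": "cron",                    # cron-like trigger
        "hour": 3,                            # run daily at 03:00
        "minute": 0,
    },
]


def resolve(reference: str):
    """Import the callable behind a "module:function" reference."""
    module_name, _, attribute = reference.partition(":")
    return getattr(importlib.import_module(module_name), attribute)
```

resolve() mirrors what the scheduler does when it loads a job, so it can be used to check that each func reference points at an existing function in jobs.py.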
To manage and execute user-provided mutation searches, the Celery Distributed Task Queue is used, with RabbitMQ as both the broker and the result backend. Both RabbitMQ and Celery need to be run as services and set up properly, as described in the Celery documentation. On Debian-based machines, RabbitMQ may be installed as a service directly from the repositories.
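For orientation, here is a sketch of Celery settings pointing both the broker and the result backend at RabbitMQ. It assumes Celery 4+ lowercase setting names, RabbitMQ's default guest account on localhost and Celery's rpc:// result backend (results sent back as AMQP messages); the project's real settings may differ:

```python
from urllib.parse import quote


def rabbitmq_url(user="guest", password="guest", host="localhost", port=5672, vhost="/"):
    """Build an AMQP broker URL; credentials are percent-encoded so '@' or ':' survive."""
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    # The vhost follows the first slash, so the default vhost "/" gives the familiar "//".
    return f"amqp://{credentials}@{host}:{port}/{vhost}"


# Hypothetical settings dict using RabbitMQ for both the broker and the result backend.
CELERY_SETTINGS = {
    "broker_url": rabbitmq_url(),
    "result_backend": "rpc://",
}
```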
To run the Celery worker as a script, use the following command:
celery -A celery_worker.celery worker
For deployment, it should be started as a service.
A major part of the configuration will be performed automatically by setup.sh, but you need to amend the configuration file (celeryd) so that all paths are absolute and correct.
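To catch relative paths before starting the service, a small checker can scan the celeryd configuration file. This is a sketch: it assumes the KEY="value" format of Celery's generic init-script configuration and checks only an assumed subset of its path variables:

```python
from __future__ import annotations

import shlex
from pathlib import PurePosixPath

# A subset of the path-valued variables used by Celery's generic init scripts (assumed).
PATH_VARIABLES = {"CELERY_BIN", "CELERYD_CHDIR", "CELERYD_LOG_FILE", "CELERYD_PID_FILE"}


def relative_paths(config_text: str) -> dict[str, str]:
    """Map each path variable whose value is not an absolute path to that value."""
    problems: dict[str, str] = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw_value = line.partition("=")
        name = name.strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        if name not in PATH_VARIABLES:
            continue
        value = " ".join(shlex.split(raw_value))  # remove shell quoting
        if not PurePosixPath(value).is_absolute():
            problems[name] = value
    return problems
```

Passing the contents of the celeryd configuration file to relative_paths() returns an empty dict when every checked path is absolute.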
To start the service, use the init.d script:
/etc/init.d/celeryd {start|stop|restart|status}
To start the webserver simply type:
./run.py
To adjust the port or IP address, check the -h switch of the run.py script (note: running on port 80 may require sudo privileges).
Deployment on an Apache2 server is a more powerful alternative to the Werkzeug webserver.
As you may want to have a virtual environment for this application, website/app.wsgi provides a ready-to-go activation script to use with Apache2 (assuming that your virtual environment is named virtual_environment). The mod_wsgi extension is required (apt-get install libapache2-mod-wsgi-py3 for Debian/Ubuntu).
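The repository's app.wsgi is not reproduced here. As a rough sketch of what such an activation step does, the function below puts an environment's site-packages directory (Linux layout assumed) at the front of sys.path; the environment path is a placeholder and the final import is hypothetical:

```python
import site
import sys
from pathlib import Path


def activate_environment(env_root: str) -> Path:
    """Put an environment's site-packages first on sys.path (Linux layout assumed)."""
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    site_packages = Path(env_root) / "lib" / version / "site-packages"
    previous = list(sys.path)
    site.addsitedir(str(site_packages))  # appends the directory and processes its .pth files
    added = [entry for entry in sys.path if entry not in previous]
    sys.path[:] = added + previous  # environment packages now take precedence
    return site_packages


# Hypothetical use at the top of a .wsgi file; mod_wsgi serves the module-level
# callable named `application`. The import below is an assumption, not the repository's code:
#   activate_environment("/some_path/virtual_environment")
#   from hypothetical_module import app as application
```

This only works if the environment was created with the same Python version that mod_wsgi was built against.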
The following extract from a configuration file may help you write your own configuration:
DocumentRoot /some_path/website
# Prevent 'Timeout when reading response headers from daemon process'
WSGIApplicationGroup %{GLOBAL}
WSGIDaemonProcess app user=some_username group=some_group threads=2
WSGIScriptAlias / /some_path/website/app.wsgi
<Directory /some_path/website>
    WSGIProcessGroup app
    WSGIApplicationGroup %{GLOBAL}
    # Order deny,allow    # do not use with Apache 2.4 or newer
    # Deny from all       # do not use with Apache 2.4 or newer
    # Apache 2.4 or newer (Apache does not allow comments on the same line as a directive):
    Require all denied
</Directory>
# Serve static files directly:
Alias /static/ /some_path/website/static/
<Directory /some_path/website/static>
    # Order allow,deny    # do not use with Apache 2.4 or newer
    # Allow from all      # do not use with Apache 2.4 or newer
    # Apache 2.4 or newer:
    Require all granted
</Directory>
<Location /static>
    SetHandler None
</Location>
Appropriate configuration files can usually be found in directories such as /etc/apache2/sites-enabled/.
Apart from the soft (software, CMS-controlled) maintenance mode, an additional maintenance mode for more advanced maintenance work is available.
To set it up, add the following code to the Apache configuration:
# Handle maintenance mode:
Alias /maintenance/ /some_path/website/static/maintenance.html
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}/maintenance-mode-on -f
RewriteCond %{REQUEST_URI} !^/static.*
RewriteCond %{REQUEST_URI} !^/maintenance
RewriteRule ^(.*) /maintenance/ [R=503,L]
ErrorDocument 503 /maintenance/
RewriteCond %{DOCUMENT_ROOT}/maintenance-mode-off -f
RewriteCond %{REQUEST_URI} ^/maintenance
RewriteRule ^(.*) / [R,L]
and enable the rewrite module:
sudo a2enmod rewrite
Then, to enable the maintenance mode, run the following from within the website directory:
mv maintenance-mode-off maintenance-mode-on
and to disable:
mv maintenance-mode-on maintenance-mode-off
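The same toggle can be scripted, for example from a deployment tool. A minimal sketch, assuming it is run against the website directory that Apache uses as DOCUMENT_ROOT and that one of the two flag files already exists:

```python
from pathlib import Path

ON, OFF = "maintenance-mode-on", "maintenance-mode-off"


def maintenance_enabled(website_dir: str) -> bool:
    """Apache's RewriteCond checks for this file in DOCUMENT_ROOT."""
    return (Path(website_dir) / ON).exists()


def set_maintenance(website_dir: str, enabled: bool) -> None:
    """Rename the flag file, with the same effect as the mv commands above."""
    root = Path(website_dir)
    source, target = (root / OFF, root / ON) if enabled else (root / ON, root / OFF)
    if target.exists():
        return  # already in the requested mode
    source.rename(target)  # raises FileNotFoundError if neither flag file exists
```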
For Apache2, increasing the maximum URI length is recommended (in order to handle GET requests, e.g. for filters which include a large number of disease names). To do so, edit the Apache configuration (typically /etc/apache2/apache2.conf), appending:
LimitRequestLine 10000
LimitRequestFieldSize 10000
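Apache's default LimitRequestLine is 8190 bytes, and a filter listing many diseases can exceed it. The sketch below estimates the length of a GET request line; the /search/ path and the disease parameter name are hypothetical:

```python
from urllib.parse import urlencode

APACHE_DEFAULT_LIMIT_REQUEST_LINE = 8190  # bytes, Apache's default


def request_line_length(path: str, params: dict) -> int:
    """Length in bytes of the HTTP request line for a GET with the given query parameters."""
    query = urlencode(params, doseq=True)  # doseq: one key=value pair per list item
    return len(f"GET {path}?{query} HTTP/1.1".encode("ascii"))


if __name__ == "__main__":
    # Hypothetical filter with 250 disease names passed as repeated "disease" parameters.
    names = [f"Hypothetical disease number {i}" for i in range(250)]
    length = request_line_length("/search/", {"disease": names})
    print(length, "bytes; exceeds default:", length > APACHE_DEFAULT_LIMIT_REQUEST_LINE)
```

With these 250 placeholder names the request line is roughly 9,900 bytes: above Apache's default, but within the 10000 limit set above.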
You can modify the default path to the Python executable used by WSGI by adding a python_path argument to the WSGIDaemonProcess directive. This allows you to use a small middleware script that turns the optimisation mode on. Here is an example script:
#!/bin/sh
exec python3 -OO "$@"
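To confirm that an interpreter really runs in optimised mode, one can inspect sys.flags.optimize, which is 0 by default, 1 under -O and 2 under -OO. A self-contained sketch that checks this for a fresh interpreter started with given flags:

```python
import subprocess
import sys


def optimize_level(*flags: str) -> int:
    """Report sys.flags.optimize for a fresh interpreter started with the given flags."""
    output = subprocess.run(
        [sys.executable, *flags, "-c", "import sys; print(sys.flags.optimize)"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(output.strip())
```

Inside the running application, logging sys.flags.optimize at start-up gives the same information for the interpreter actually serving requests.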
To log in to the root account (created with the manage.py script), visit the /login/ page on your server. It will allow you to create, edit and remove standalone pages.
All tests are placed in the website/tests directory. All steps are explained in the readme.md file inside this subdirectory.
Browser compatibility testing is provided by BrowserStack, which allows cloud testing on desktop browsers and on real iOS and Android devices. It also enables integration with automated testing.
The project is developed with support from the Ontario Institute for Cancer Research and received support from Google Summer of Code 2016.
BrowserStack supports this open source project allowing us to use their testing systems for free.
The licences of third-party Python dependencies can be retrieved from the Python Package Index using the dependency names from the requirements.txt file. Other third-party snippets used as a base for code in this repository include:
- SPARQL query to download PTM data from UniProt, based on a snippet by the "me" user of Biostars, published under the terms of the Creative Commons Attribution 4.0 International License
- venn.js integration, based on a (fairly extensive) example from the venn.js documentation, MIT licensed
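The Python dependency licences mentioned above can also be looked up automatically. The sketch below extracts project names from requirements.txt-style text and queries PyPI's JSON API, whose info.license field is free text that may be empty; parsing is deliberately simplistic and merely skips option lines such as -r or -e:

```python
import json
import re
from urllib.request import urlopen

# Characters that end a project name (version specifiers, extras, markers, URLs).
_NAME_END = re.compile(r"[\s<>=!~;\[@]")


def requirement_names(text: str) -> list:
    """Return project names from requirements.txt-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and options such as -r or -e
        names.append(_NAME_END.split(line, maxsplit=1)[0])
    return names


def pypi_license(name: str) -> str:
    """Query https://pypi.org/pypi/<name>/json and return the declared licence text."""
    with urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10) as response:
        return json.load(response)["info"].get("license") or "unknown"
```

For example, running pypi_license() over requirement_names() applied to the contents of requirements.txt gives a first overview, which should still be checked against each project's own licence file.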