Some features that were previously in MetagenomeScope have not been re-implemented yet -- this should change soon. Thanks for bearing with me as I work on improving this, and please let me know if you have any questions.
MetagenomeScope is an interactive visualization tool designed for metagenomic sequence assembly graphs. The tool aims to display a hierarchical layout of the input graph while emphasizing the presence of small-scale details that can correspond to interesting biological features in the data.
To this end, MetagenomeScope

- highlights certain "structural patterns" of contigs in the graph (repeating the pattern identification hierarchically),
- splits the graph into its connected components (by default only displaying one connected component at a time),
- and uses Graphviz' `dot` tool to hierarchically lay out each connected component of the graph.
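The component-splitting step above can be illustrated with a short, self-contained sketch. This is not MetagenomeScope's actual code: the function name `connected_components` and the edge-list input format are assumptions made for illustration, and edge direction is ignored when grouping nodes.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Group nodes into connected components, ignoring edge direction.

    Hypothetical helper for illustration only (not part of MetagenomeScope).
    Returns a list of sorted node lists, largest component first.
    """
    # Build an undirected adjacency map from the directed edge list
    neighbors = defaultdict(set)
    for src, tgt in edges:
        neighbors[src].add(tgt)
        neighbors[tgt].add(src)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(sorted(component))

    # Largest component first; ties keep their discovery order
    components.sort(key=len, reverse=True)
    return components


# Toy graph: contigs 1-2-3 form one component, 4-5 another
example = connected_components(
    nodes=["1", "2", "3", "4", "5"],
    edges=[("1", "2"), ("2", "3"), ("4", "5")],
)
print(example)
```

Each resulting component could then be laid out and displayed independently, which is the idea behind showing one component at a time.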
MetagenomeScope also contains many other features intended to simplify exploratory analysis of assembly graphs, including tools for scaffold visualization, path finishing, and coloring nodes by biological metadata (e.g. GC content). (As mentioned above, many of these features are not available in the current version yet.)
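As a rough illustration of the kind of metadata mentioned above, GC content is the fraction of G and C bases in a sequence. The sketch below is an assumption about how such a value might be computed, not MetagenomeScope's implementation; the function name `gc_content` is hypothetical, and ambiguous bases (e.g. N) simply count toward the sequence length.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0).

    Hypothetical helper for illustration only. Any character that is not
    G or C (including ambiguous bases such as N) counts toward the length
    but not toward the GC total.
    """
    if not sequence:
        raise ValueError("Cannot compute GC content of an empty sequence")
    sequence = sequence.upper()
    gc_total = sequence.count("G") + sequence.count("C")
    return gc_total / len(sequence)


# 4 of the 6 bases in this toy sequence are G or C
print(gc_content("ATGCGC"))
```

A viewer could map this fraction onto a color gradient to shade each node.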
Probably the easiest way to install MetagenomeScope is using a conda environment:
# Download the YAML file describing the conda packages we'll install
wget https://raw.githubusercontent.com/marbl/MetagenomeScope/main/environment.yml
# Create a new conda environment based on this YAML file
# (by default, it'll be named "mgsc")
conda env create -f environment.yml
# Activate this conda environment
conda activate mgsc
# Install the actual MetagenomeScope software
pip install git+https://github.com/marbl/MetagenomeScope.git
Assuming you are currently in the conda environment we just created, visualizing an assembly graph can be done in one command:
mgsc -i [path to your assembly graph] -o [output directory name]
The output directory will contain an `index.html` file that can be opened in most modern web browsers. (The file points to other resources within the directory, so please don't move it out of the directory.)
Currently, MetagenomeScope supports the following filetypes:
Filetype | Assemblers that output this filetype | Notes |
---|---|---|
GFA | (meta)Flye, LJA, more | Both v1 and v2 work, but currently only the raw structure (segments and links) is included |
FASTG | SPAdes | Expects SPAdes-"dialect" FASTG files: see pyfastg's documentation for details |
GML | MetaCarvel | Expects MetaCarvel-"dialect" GML files |
LastGraph | Velvet | Only the raw structure (nodes and arcs) is included |
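To make "raw structure" concrete for GFA, here is a minimal sketch that pulls only segments (`S` lines) and links (`L` lines) out of GFA v1 text and skips every other record type. It is not how MetagenomeScope reads GFA files; the function name `parse_gfa1_structure` is hypothetical, and the sketch ignores optional tags and GFA v2 entirely.

```python
def parse_gfa1_structure(lines):
    """Extract segment names and links from tab-separated GFA v1 lines.

    Hypothetical helper for illustration only. Returns (segments, links),
    where each link is (from_name, from_orient, to_name, to_orient).
    """
    segments = []
    links = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            # S <name> <sequence> [optional tags...]
            segments.append(fields[1])
        elif fields[0] == "L":
            # L <from> <from orient> <to> <to orient> <overlap CIGAR>
            links.append((fields[1], fields[2], fields[3], fields[4]))
        # All other record types (H, P, ...) and blank lines are skipped
    return segments, links


# Tiny GFA v1 example: a header, two segments, and one link between them
gfa_text = "H\tVN:Z:1.0\nS\t1\tACGT\nS\t2\tGGCA\nL\t1\t+\t2\t-\t0M\n"
segments, links = parse_gfa1_structure(gfa_text.splitlines())
print(segments)
print(links)
```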
MetagenomeScope is composed of two main components:
MetagenomeScope's preprocessing script (contained in the `metagenomescope/` directory of this repository) is a mostly-Python script that takes as input an assembly graph file and produces a directory containing an HTML visualization of the graph. Once installed, it can be run from the command line using the `mgsc` command.
Note. By default, connected components containing 8,000 or more nodes or edges will not be laid out. These thresholds are configurable using the `--max-node-count` / `--max-edge-count` parameters. This default is intended to save time and effort: hierarchical layout can take a really long time for complex and/or large connected components, so oftentimes trying to visualize the largest few components of a graph will take an intractable amount of computational resources / time. Furthermore, really complex components of assembly graphs can be hard to visualize meaningfully.
This isn't always the case (for example, a connected component containing 10,000 nodes all in a straight line will be much easier to lay out and visualize than a connected component with 5,000 nodes and 20,000 edges), but we wanted to be conservative with the defaults.
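The size check described in this note can be sketched as a simple predicate. This is an illustration under stated assumptions, not the tool's code: the function and parameter names are hypothetical, and the limits are modeled as exclusive upper bounds to match the "8,000 or more" wording (how the values passed to `--max-node-count` / `--max-edge-count` map onto these bounds is not covered here).

```python
def will_be_laid_out(num_nodes, num_edges, node_limit=8000, edge_limit=8000):
    """Return True if a component is small enough to be laid out.

    Hypothetical helper for illustration only. A component is skipped
    when it has `node_limit` or more nodes, or `edge_limit` or more edges.
    """
    return num_nodes < node_limit and num_edges < edge_limit


# A straight line of 10,000 nodes has 9,999 edges: skipped by default
line_default = will_be_laid_out(10_000, 9_999)

# 5,000 nodes with 20,000 edges: skipped by default because of the edge count
dense_default = will_be_laid_out(5_000, 20_000)

# Raising both limits lets the easy straight-line component through
line_raised = will_be_laid_out(10_000, 9_999, node_limit=10_001, edge_limit=10_001)

print(line_default, dense_default, line_raised)
```

Both example components from the note are skipped under the defaults, even though the straight-line one would be easy to lay out. That is the sense in which the defaults are conservative.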
MetagenomeScope's viewer interface (contained in the `metagenomescope/support_files/` directory of this repository) is a client-side web application that visualizes laid-out assembly graphs using Cytoscape.js.
This interface includes various features for interacting with the graph and the identified structural patterns within it.
You should be able to load visualizations created by MetagenomeScope in most modern web browsers (mobile browsers probably will also work, although using a desktop browser is recommended).
Getting Graphviz and PyGraphviz installed -- and getting them to communicate with each other -- can be tricky. I'm looking into ways of making this less painful; for now, if you run into problems, please feel free to contact me and I'll try to help out.
Some early demos are available online. We'll probably add more of these in the future.
-
- See Nijkamp et al. 2013 for details. This graph was based on the topology shown in Fig. 2(a) of this paper.
-
- This graph is example data from the website of Bandage (which is another great tool for visualizing assembly graphs :)
Coming soon.
MetagenomeScope is licensed under the GNU GPL, version 3.
License information for MetagenomeScope's dependencies is included in the root directory of this repository, in `DEPENDENCY_LICENSES.txt`. License copies for dependencies distributed/linked with MetagenomeScope -- when not included with their corresponding source code -- are available in the `dependency_licenses/` directory.
See the acknowledgements page on the wiki for a list of acknowledgements for MetagenomeScope's codebase.
MetagenomeScope was created by members of the Pop Lab in the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park.
Feel free to email mfedarko (at) ucsd (dot) edu with any questions, suggestions, comments, concerns, etc. regarding the tool. You can also open an issue in this repository, if you'd like.