VCF representation & Dash integration #6

Merged
merged 19 commits into from
Jan 2, 2025
4 changes: 2 additions & 2 deletions .github/workflows/continuous_integration.yml
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@ on: [push, pull_request]

jobs:
  build-linux:
    runs-on: ubuntu-latest
    runs-on: ubuntu-24.04
    strategy:
      max-parallel: 5

@@ -21,7 +21,7 @@ jobs:
    - name: Install dependencies
      run: |
        # I am using the pip installed by conda, bad practice but OK in C/I I think
        pip install pyranges-plot[all]==0.1.0
        pip install -e .[all]
        pip install ruff pytest
    - name: Check formatting with ruff
      run: |
6 changes: 5 additions & 1 deletion docs/api_reference.rst
@@ -5,6 +5,8 @@ API reference
#. :doc:`Setting variables <./prp_settings>` the variables to be set: engine(compulsory), id_col, theme and warnings
#. :doc:`Register plot <./prp_registerplot>`
#. :doc:`Customization options <./prp_options>`
#. :doc:`VCF tools <./prp_vcf>`
#. :doc:`Scatterplot creation <./prp_scatter>`

.. toctree::
   :maxdepth: 2
@@ -13,4 +15,6 @@ API reference
   prp_plot
   prp_settings
   prp_registerplot
   prp_options
   prp_options
   prp_vcf
   prp_scatter
3 changes: 3 additions & 0 deletions docs/conf.py
@@ -12,7 +12,10 @@
#
from docutils import nodes
import sphinx_rtd_theme
import os
import sys

sys.path.insert(0, os.path.abspath("../src"))

# -- Project information -----------------------------------------------------

Binary file modified docs/images/prp_rtd_09.png
Binary file modified docs/images/prp_rtd_11.png
Binary file added docs/images/prp_rtd_16.png
Binary file added docs/images/prp_rtd_17.png
Binary file added docs/images/prp_rtd_18.png
Binary file added docs/images/prp_rtd_19.png
Binary file added docs/images/prp_rtd_20.png
Binary file added docs/images/prp_rtd_21.png
Binary file added docs/images/prp_rtd_22.png
Binary file added docs/images/prp_rtd_23.png
Binary file added docs/images/prp_rtd_24.png
Binary file added docs/images/prp_rtd_25.png
Binary file added docs/images/prp_rtd_26.png
Binary file added docs/images/prp_rtd_27.png
Binary file added docs/images/prp_rtd_28.png
9 changes: 9 additions & 0 deletions docs/prp_scatter.rst
@@ -0,0 +1,9 @@
Scatterplot creation
--------------------

Creates a scatter plot from a PyRanges object.

.. automodule:: pyranges_plot
   :members:
   :imported-members: # Ensure this is set to include imported members
   :exclude-members: set_engine, get_engine, set_id_col, get_id_col, set_theme, get_theme, set_warnings, get_warnings, plot, print_options, set_options, reset_options, register_plot, ncbi_gff, ncbi_vcf
10 changes: 10 additions & 0 deletions docs/prp_vcf.rst
@@ -0,0 +1,10 @@
VCF tools
---------

Functions for loading, processing, and transforming VCF (Variant Call Format) files. These tools allow efficient
reading of VCF files into PyRanges objects and flexible manipulation of their fields. For further explanation, see the
**Dealing with VCF files** section of the :ref:`tutorial <tutorial>`.

.. automodule:: pyranges_plot.vcf
   :members:
   :imported-members:
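
For orientation, a VCF data line is tab-separated and starts with eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The sketch below uses only the standard library and is not the `pyranges_plot.vcf` implementation. The helper name `parse_vcf_line`, the returned dictionary layout, and the example record are assumptions made for illustration. It shows one way a 1-based VCF position could be mapped to the 0-based, half-open `Start`/`End` convention that PyRanges uses.

```python
# Fixed columns that open every VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Parse one VCF data line into a plain dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VCF_COLUMNS):
        raise ValueError("A VCF data line needs at least 8 tab-separated columns.")
    record = dict(zip(VCF_COLUMNS, fields))

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info = {}
    for entry in record["INFO"].split(";"):
        if entry in ("", "."):
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True

    # VCF positions are 1-based; convert to a 0-based, half-open interval.
    start = int(record["POS"]) - 1
    return {
        "Chromosome": record["CHROM"],
        "Start": start,
        "End": start + len(record["REF"]),
        "ID": record["ID"],
        "REF": record["REF"],
        "ALT": record["ALT"].split(","),
        "INFO": info,
    }


# Made-up record, used only to exercise the helper.
example = parse_vcf_line("1\t1014143\trs_demo\tC\tT\t.\t.\tCLNSIG=pathogenic;TSA=SNV;DB")
```

Whether the package's own reader applies the same coordinate shift or the same INFO handling is not shown in this diff, so treat the mapping here as one possible convention.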
431 changes: 414 additions & 17 deletions docs/tutorial.rst

Large diffs are not rendered by default.

6 changes: 5 additions & 1 deletion setup.py
@@ -2,7 +2,11 @@

setup(
    name="pyranges_plot",
    version="0.1.0",
    version="0.1.2",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    include_package_data=True,  # Ensure package data is included
    package_data={
        "pyranges_plot": ["data/*"],  # Specify the path to include data folder contents
    },
)
4 changes: 3 additions & 1 deletion src/pyranges_plot/__init__.py
@@ -13,4 +13,6 @@
)
from .plot_main import plot # noqa: F401
from .pr_register_plot import register_plot # noqa: F401
from .example_data import p1, p2, p3, p_ala, p_cys # noqa: F401
from .example_data import p1, p2, p3, p_ala, p_cys, ncbi_gff, ncbi_vcf # noqa: F401
from . import vcf # noqa: F401
from .make_subsets import make_scatter # noqa: F401
1 change: 1 addition & 0 deletions src/pyranges_plot/core.py
@@ -377,6 +377,7 @@ def format_row(key, value):
"plot_border",
"title_size",
"title_color",
"title_font",
"grid_color",
"exon_border",
"shrunk_bkg",
242,223 changes: 242,223 additions & 0 deletions src/pyranges_plot/data/homo_sapiens_clinically_associated.vcf

Large diffs are not rendered by default.

2,013 changes: 2,013 additions & 0 deletions src/pyranges_plot/data/ncbi.gff3

Large diffs are not rendered by default.

242,133 changes: 242,133 additions & 0 deletions src/pyranges_plot/data/ncbi.vcf

Large diffs are not rendered by default.

35 changes: 35 additions & 0 deletions src/pyranges_plot/example_data.py
@@ -1,4 +1,6 @@
import pyranges as pr
import os
from .vcf.vcf_reader import read_vcf


p1 = pr.PyRanges(
@@ -169,3 +171,36 @@
"depth": [0] * 3 + [1] * 5,
}
)

# Define the path to the data folder
DATA_DIR = os.path.join(os.path.dirname(__file__), "data")


def ncbi_gff():
    """
    Load the example NCBI GFF3 file as a PyRanges object.

    Returns:
        PyRanges: A PyRanges object containing the GFF3 data.
    """
    file_path = os.path.join(DATA_DIR, "ncbi.gff3")
    if not os.path.exists(file_path):
        raise FileNotFoundError(
            "The file 'ncbi.gff3' was not found in the data folder."
        )
    return pr.read_gff3(file_path)


def ncbi_vcf():
    """
    Load the example VCF file as a PyRanges object.

    Returns:
        PyRanges: A PyRanges object containing the VCF data.
    """
    file_path = os.path.join(DATA_DIR, "homo_sapiens_clinically_associated.vcf")
    if not os.path.exists(file_path):
        raise FileNotFoundError(
            "The file 'homo_sapiens_clinically_associated.vcf' was not found in the data folder."
        )
    return read_vcf(file_path)
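
Both loaders follow the same pattern: build a path inside the data folder, fail early with a clear message if the file is missing, then hand the path to a reader. A minimal standard-library sketch of that pattern follows. The names `load_example` and `read_text`, the `reader` argument, and the temporary directory are hypothetical stand-ins and are not part of `pyranges_plot`.

```python
import os
import tempfile


def load_example(data_dir, filename, reader):
    """Resolve a bundled file and pass it to a reader (hypothetical helper)."""
    file_path = os.path.join(data_dir, filename)
    if not os.path.exists(file_path):
        raise FileNotFoundError(
            f"The file '{filename}' was not found in the data folder."
        )
    return reader(file_path)


def read_text(path):
    """Stand-in reader; the real loaders call pr.read_gff3 or read_vcf."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Exercise the helper against a throwaway data folder.
with tempfile.TemporaryDirectory() as data_dir:
    with open(os.path.join(data_dir, "demo.txt"), "w", encoding="utf-8") as handle:
        handle.write("chr1\t10\t20\n")

    loaded = load_example(data_dir, "demo.txt", read_text)

    try:
        load_example(data_dir, "missing.txt", read_text)
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

Factoring the check into one helper would keep the error message and the path handling in a single place if more example files are added later.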
157 changes: 157 additions & 0 deletions src/pyranges_plot/make_subsets.py
@@ -0,0 +1,157 @@
import plotly.graph_objects as go


def make_scatter(
    p,  # vcf_df,
    x: str = "Start",
    y: str | None = None,  # --> count
    color_by: str | None = None,
    size_by: str | None = None,
    title: str | None = None,
    title_size: int | None = None,
    title_color: str | None = None,
    height: int | None = None,
    y_space: int | None = None,
):
    """
    Create a scatter plot from a PyRanges object using Plotly.

    This function generates a scatter plot for visualizing genomic variants or other
    data points based on the provided data. It allows customization of axes,
    marker sizes, colors, and plot titles.

    Parameters
    ----------
    p: pr.PyRanges or pd.DataFrame
        Input data containing the genomic intervals, with columns for
        start and end positions (e.g., 'Start' and 'End').
    x: str, optional
        The column name to use for the x-axis. Defaults to 'Start'.
    y: str
        The column name to use for the y-axis. Required: a ValueError is
        raised if it is not provided.
    color_by: str, default None
        The column name to use for coloring the markers. If specified, the first
        value of this column for each unique (Start, End) position is taken and
        converted to a numeric category code. Defaults to None.
    size_by: str, default None
        The column name to use for setting the marker sizes. If specified, the first
        value of this column for each unique (Start, End) position is taken and
        converted to float. Defaults to None.
    title: str, default None
        The title of the plot. If not given, the name of the `y` column is used.
    title_size: int, default None
        The font size of the plot title. Defaults to None.
    title_color: str, default None
        The color of the plot title. Defaults to None.
    height: int, default None
        Determines the length of the y axis.
        Defaults to None.
    y_space: int, default None
        The space between the main plot and the added plot.
        Defaults to None.

    Returns
    -------
    tuple
        A tuple with the `go.Scatter` object and a dictionary containing
        the title and any customization options that were set.

    Raises
    ------
    ValueError:
        If `y` is not provided, or if the `x`, `y`, `color_by`, or `size_by`
        columns are not found in the input.

    Examples
    --------
    >>> import pyranges as pr
    >>> import pyranges_plot as prp
    >>> p = pr.PyRanges({
    ...     "Chromosome": [1] * 5,
    ...     "Strand": ["+"] * 3 + ["-"] * 2,
    ...     "Start": [10, 20, 30, 25, 40],
    ...     "End": [15, 25, 35, 30, 50],
    ...     "transcript_id": ["t1"] * 3 + ["t2"] * 2,
    ...     "feature1": ["A", "B", "C", "A", "B"],
    ...     "Count": [1, 2, 3, 4, 5]  # Example count values
    ... })
    >>> prp.make_scatter(p, y='Count')
    (Scatter({
        'hovertemplate': '<b>Position:</b> %{x}<br><b>Count:</b> %{y}<extra></extra>',
        'marker': {'color': 'blue', 'size': 8},
        'mode': 'markers',
        'x': array([10, 20, 30, 25, 40]),
        'y': array([1, 2, 3, 4, 5])
    }), {'title': 'Count'})

    """
    # Validate y in input
    if not y:
        raise ValueError(
            "The parameter 'y' is required and must be a str to run this function."
        )

    # Validate the x column
    if x not in p.columns:
        raise ValueError(f"The column '{x}' does not exist in the DataFrame.")

    # Handle y-axis logic
    if y not in p.columns:
        raise ValueError(f"The column '{y}' does not exist in the DataFrame.")

    # Add coloring logic if `color_by` is provided
    if color_by:
        if color_by not in p.columns:
            raise ValueError(
                f"The column '{color_by}' does not exist in the DataFrame."
            )

        # Aggregate color information for unique positions
        color_values = (
            p.groupby(["Start", "End"])[color_by].first().reset_index()[color_by]
        )
        color_values = color_values.astype(
            "category"
        ).cat.codes  # Convert to numeric if categorical
    else:
        color_values = "blue"  # Default color for all points

    # Handle sizing logic
    if size_by:
        if size_by not in p.columns:
            raise ValueError(f"The column '{size_by}' does not exist in the DataFrame.")

        # Aggregate size information for unique positions
        size_values = (
            p.groupby(["Start", "End"])[size_by].first().reset_index()[size_by]
        )
        size_values = size_values.astype(float)  # Ensure the size column is numeric
    else:
        size_values = 8  # Default marker size

    # Create a scatter plot
    scatter = go.Scatter(
        x=p[x],  # X-axis: Start positions #### x
        y=p[y],  # Y-axis: Counts of transcripts #### y or __count__
        mode="markers",  # Display points as markers
        marker=dict(
            size=size_values,
            color=color_values,  # Assign color values
            colorscale="Viridis" if color_by else None,  # Use a colormap if coloring
        ),
        hovertemplate="<b>Position:</b> %{x}<br><b>Count:</b> %{y}<extra></extra>",
    )

    custom = {"title": title if title else y}
    # Defining optional parameters for customisation
    optional_params = {
        "title_size": title_size,
        "title_color": title_color,
        "height": height,
        "y_space": y_space,
    }

    custom.update(
        {key: value for key, value in optional_params.items() if value is not None}
    )

    return (scatter, custom)
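
Two pieces of `make_scatter` can be illustrated without Plotly or pandas: turning a categorical column into numeric color codes, and building the customization dictionary that drops unset options. The sketch below is a plain-Python approximation under stated assumptions. `category_codes` mimics what pandas' `astype("category").cat.codes` returns for sortable values without missing entries, and `build_custom` mirrors the filtering at the end of the function. Both helper names are hypothetical and do not exist in `pyranges_plot`. Unlike the `groupby` step in the diff, the sketch encodes one value per input row in input order.

```python
def category_codes(values):
    """Map each value to the index of its category in sorted order.

    Approximates pandas' ``astype("category").cat.codes`` for sortable
    values with no missing entries (an assumption of this sketch).
    """
    categories = sorted(set(values))
    lookup = {category: code for code, category in enumerate(categories)}
    return [lookup[value] for value in values]


def build_custom(y, title=None, **optional_params):
    """Build the customization dict, keeping only options that were set."""
    custom = {"title": title if title else y}
    custom.update(
        {key: value for key, value in optional_params.items() if value is not None}
    )
    return custom


# Values taken from the "feature1" column of the docstring example.
codes = category_codes(["A", "B", "C", "A", "B"])

# The title falls back to the y column name; unset options are dropped.
custom = build_custom("Count", title_size=14, title_color=None, height=None)
```

Dropping `None` values means the caller receives only the overrides, so any defaults defined elsewhere in the package stay in effect for options that were not passed.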
2 changes: 2 additions & 0 deletions src/pyranges_plot/plot_features.py
@@ -42,6 +42,7 @@
"plot_bkg": ("white", "Background color of the plots.", " "),
"plot_border": ("black", "Color of the line delimiting the plots.", " "),
"plotly_port": (8050, "Port to run plotly app.", " "),
"return_plot": (None, "Whether the plot is returned or not.", " "),
"shrink_threshold": (
0.01,
"Minimum length of an intron or intergenic region in order for it to be shrunk while using the “shrink” feature. When threshold is float, it represents the fraction of the plot space, while an int threshold represents number of positions or base pairs.",
@@ -65,6 +66,7 @@
"text_size": (10, "Fontsize of the text annotation beside the intervals.", " "),
"title_color": ("black", "Color of the plots' titles.", " "),
"title_size": (18, "Size of the plots' titles.", " "),
"title_font": ("Arial", "Font of the plots' titles.", " "),
"v_spacer": (0.5, "Vertical distance between the intervals and plot border.", " "),
"x_ticks": (
None,
Expand Down
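
Each entry in the options table above pairs an option name with a tuple whose first item is the default value and whose second is a description. The following is a minimal sketch of how defaults could be read from a structure of that shape. The dictionary name `features` and the helper `get_default` are illustrative assumptions rather than the `pyranges_plot` API, and only a few entries from the diff are reproduced.

```python
# A few entries copied from the diff: name -> (default, description, extra).
features = {
    "plotly_port": (8050, "Port to run plotly app.", " "),
    "return_plot": (None, "Whether the plot is returned or not.", " "),
    "title_size": (18, "Size of the plots' titles.", " "),
    "title_font": ("Arial", "Font of the plots' titles.", " "),
}


def get_default(options, name):
    """Return the default value stored for an option (hypothetical helper)."""
    if name not in options:
        raise KeyError(f"Unknown option: {name!r}")
    default, _description, _extra = options[name]
    return default


# Collect every default into a flat name -> value mapping.
defaults = {name: get_default(features, name) for name in features}
```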