Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frequency Chord diagram and Heatmap in Python #85

Open
maximelepetit opened this issue Jan 31, 2024 · 10 comments
Open

Frequency Chord diagram and Heatmap in Python #85

maximelepetit opened this issue Jan 31, 2024 · 10 comments
Assignees
Labels
enhancement New feature or request

Comments

@maximelepetit
Copy link

Hi ,

Thanks a lot for this useful tool !

I'm running Liana and i'm curious that you cannot provide methods to generate Chord diagram or heatmap as in the liana R wrapper.

Have you planned to add these methods ?

Best regards,

Maxime

@maximelepetit maximelepetit added the enhancement New feature or request label Jan 31, 2024
@dbdimitrov
Copy link
Collaborator

Hi @maximelepetit,

Thanks for opening an issue!
I've been considering implementing one of the two to liana-py.

I will do my best to have at least one implemented soon.

In the meantime, ChatGPT does an excellent job when generating network plots with networkx. And a plot with https://github.com/DingWB/PyComplexHeatmap should be relatively quick to make :)

Best wishes,
Daniel

@maximelepetit
Copy link
Author

maximelepetit commented Feb 2, 2024

Hi Daniel,

For Chords diagrams i found an easy and quickly way with pyCirclize :
One example :

import pycirclize
from pycirclize import Circos
import pandas as pd
from pycirclize.parser import Matrix


## Extract liana_res
df1 = adata.uns['liana_res'][['source','target']].groupby(['source','target']).size().reset_index(name='counts')
df = df1.pivot(index='source',columns='target',values='counts')

matrix_data=df.values
row_names = list(df.index)
col_names = row_names

matrix_df = pd.DataFrame(matrix_data, index=row_names, columns=col_names)

# Initialize from matrix (Can also directly load tsv matrix file)
circos = Circos.initialize_from_matrix(
    matrix_df,
    space=3,
    cmap="tab10",
    r_lim=(93, 100),
    ticks_interval=500,
    label_kws=dict(r=94, size=12, color="white"),
    link_kws=dict(direction=1, ec="black", lw=0.5),
)

#print(matrix_df)
fig = circos.plotfig()

#print(matrix_df)
fig = circos.plotfig()

Give :
chords1
Best !
Maxime

@maximelepetit
Copy link
Author

maximelepetit commented Feb 2, 2024

You can highlight specific groups :

import pycirclize
from pycirclize import Circos
import pandas as pd
from pycirclize.parser import Matrix

# Extract liana_res
df1 = adata.uns['liana_res'][['source','target']].groupby(['source','target']).size().reset_index(name='counts')
df = df1.pivot(index='source',columns='target',values='counts')

matrix_data=df.values
row_names = list(df.index)
col_names = row_names
matrix_df = pd.DataFrame(matrix_data, index=row_names, columns=col_names)

# Define link_kws handler function to customize each link property
def link_kws_handler(from_label: str, to_label: str):
    ## Highlith Astro, CR, Microglia source
    if from_label in ('Astrocytes', 'CR', 'Microglia'):
        # Set alpha, zorder values higher than other links for highlighting
        return dict(alpha=0.5, zorder=1.0)
    else:
     return dict(alpha=0.1, zorder=1.0)

# Initialize from matrix (Can also directly load tsv matrix file)
circos = Circos.initialize_from_matrix(
    matrix_df,
    space=3,
    cmap="tab10",
    r_lim=(93, 100),
    ticks_interval=500,
    label_kws=dict(r=94, size=12, color="white"),
    link_kws=dict(direction=1, ec="black", lw=0.5),
    link_kws_handler=link_kws_handler,
)

#print(matrix_df)
fig = circos.plotfig()

Give :
chord2

;)

@maximelepetit
Copy link
Author

Alternatively you can generate Sankey plot through pySankey :

from pySankey.sankey import sankey
import pandas as pd
df1 = adata.uns['liana_res'][['source','target']].groupby(['source','target']).size().reset_index(name='counts')
sankey(
    left=df1["source"], right=df1["target"], 
    leftWeight= df1["counts"], rightWeight=df1["counts"], fontsize=10
)

GIves :
sankey

@dbdimitrov
Copy link
Collaborator

Hi @maximelepetit,

Thanks a lot for all your suggestions!
I will pin the issue and will keep it open until I have some of those in place.

I plan to do a rehaul of liana's plotting functions, and I will make sure these are included also.

Daniel

@dbdimitrov dbdimitrov pinned this issue Feb 3, 2024
@brainfo
Copy link

brainfo commented Apr 2, 2024

Hi, I feel like it's also good to have a hypothesis driven layout, showing the specific sender-ligand -- receptor-recepient pairs, with for example senders on one sphere and recepients on another and the color by any score, like the magnitude, or the significance, rank, lrmean, or change between conditions.
I used to use ChordDiagram package in r.
Maybe it's good to have this in the utils in liana+ py?

while so far for convenience with same information we can simply show bar/dot plots

and to compare the same pair in different conditions, which metric is recommended to be tested? combined rank or individual continous magnitude metric?

@dbdimitrov
Copy link
Collaborator

Hi @brainfo, thanks for the suggestion. I'm still working on refactoring the algorithmic side of liana, but once I'm done with that. I will refactor the functions and extend them with the suggestions here.

re comparing across conditions: there are a few tutorials, e.g. using differential expression analysis with liana and PyDeSeq2, or the tutorials with MOFA or Tensor-cell2cell.

I would suggest checking those 🙂 though you could compare most scores as long as you normalize such that your conditions are comparable prior to running liana.

Hope this helps.

@brainfo
Copy link

brainfo commented May 21, 2024

Hi @brainfo, thanks for the suggestion. I'm still working on refactoring the algorithmic side of liana, but once I'm done with that. I will refactor the functions and extend them with the suggestions here.

re comparing across conditions: there are a few tutorials, e.g. using differential expression analysis with liana and PyDeSeq2, or the tutorials with MOFA or Tensor-cell2cell.

I would suggest checking those 🙂 though you could compare most scores as long as you normalize such that your conditions are comparable prior to running liana.

Hope this helps.

Hi, I am aware of the scope of LIANA+ project in ligand receptor interaction imputation but not downstream usage of the lr results.
I am basically doing network analysis on the Liana resulted dataframe (with customized secondary manipulation of differential expression analysis stats). Mostly using networkx and also customized visualizations:

  1. source and target as nodes, and one edge with aggregated attributes between nodes
  2. source and target as nodes (selected), multiple edges allowed with all or selected interactions with their attributes
  3. source-ligand complex and target-receptor complex as nodes, and with their nodes and edge attributes
  4. api/call back to networkx utilities for at least general node and edge stats for all these network outputs. And network visualization (node and attributes mapping).

I have most of the scripts for these and not sure if it's something relevant to this open issue.

@dbdimitrov
Copy link
Collaborator

Hi @brainfo,

TBH, I realize that I should provide a bit more visualisations, so suggestions are welcome. Please feel free to open an issue with your suggestions, and I will do my best to find time to implement them :)

@brainfo
Copy link

brainfo commented Jun 30, 2024

Hi @brainfo,

TBH, I realize that I should provide a bit more visualisations, so suggestions are welcome. Please feel free to open an issue with your suggestions, and I will do my best to find time to implement them :)

Hi, what I did is like,

  1. at the source-ligand level, calculate the out-degree:
    say tb is the table with the fields from liana:
def save_cell_molecule_tb(tb, outfile):
    lr_ = pd.DataFrame(columns=['#node1', 'node2', 'combined_score'])
    ## join two columns of tb
    lr_['#node1'] = tb['ligand_complex'] + '-' + tb['source']
    lr_['node2'] = tb['receptor_complex'] + '-' + tb['target']
    lr_['combined_score'] = tb['interaction_score']
    lr_.to_csv(outfile, sep='\t', index=False)
    return lr_
  1. out-degree analysis using networkx and visualization with highlighted cell types
def out_degree(cell_molecule_tb_file, high_celltype, low_celltype, plot_file, outdegree_file):
    lr_net = nx.read_edgelist(cell_molecule_tb_file,data=(('combined_score',float),), delimiter='\t', create_using=nx.DiGraph)
    degree_sequence = sorted((d for n, d in lr_net.out_degree(weight='combined_score')), reverse=True)
    ordered_dict = dict(sorted(lr_net.out_degree(weight='combined_score'), key=lambda item: item[1], reverse=True))
    node_keys = np.array(list(ordered_dict.keys()))

    color_sequence = np.where(np.core.defchararray.find(node_keys,high_celltype)!=-1, '#FBDB7D', np.where(np.core.defchararray.find(node_keys,low_celltype)!=-1, '#df65b0ff', 'darkgrey'))
    dmax = max(degree_sequence)

    fig = plt.figure("Degree of a random graph", figsize=(4, 2))
# Create a gridspec for adding subplots of different sizes
    axgrid = fig.add_gridspec(2, 4)

    ax1 = fig.add_subplot(axgrid[:, :2])
    ax1.scatter(range(len(degree_sequence)), degree_sequence, c=color_sequence)
    ax1.set_title("Out-Degree Rank Plot")
    ax1.set_ylabel("Out-Degree")
    ax1.set_xlabel("Rank")
    ax1.axhline(y=0, linestyle='--', c='black')
    ax2 = fig.add_subplot(axgrid[:, 2:])
    ax2.bar(*np.unique(degree_sequence, return_counts=True))
    ax2.set_title("Out-Degree histogram")
    ax2.set_xlabel("Out-Degree")
    ax2.set_ylabel("# of Nodes")

    fig.tight_layout()
    plt.savefig(plot_file)

    out_degrees = {node:val for (node, val) in lr_net.out_degree(weight='combined_score')}

    out_degrees_df = pd.DataFrame.from_dict(out_degrees, orient='index', columns=['degree'])
    out_degrees_df.to_csv(outdegree_file, sep='\t')
    return out_degrees_df
  1. Visualize with source and target as nodes, using netgraph
def preprare_data(ccc):
    ccc_cell = pd.DataFrame(columns=ccc.columns)
    ccc_cell['#node1'] = ccc['#node1'].str.split('-').str[-1]
    ccc_cell['node2'] = ccc['node2'].str.split('-').str[-1]
    ccc_cell['combined_score'] = ccc['combined_score']
    ccc_cell['weight'] = ccc_cell.groupby(['#node1', 'node2'])['combined_score'].transform('sum')

    unique_ccc_cell = ccc_cell.drop_duplicates(subset=['#node1', 'node2'])
    unique_ccc_cell.reset_index(inplace=True)
    unique_ccc_cell['weight'] = unique_ccc_cell['weight'].abs()
    return unique_ccc_cell

unique_cccc = preprare_data(cccc)
G = nx.from_pandas_edgelist(unique_cccc,source='#node1' , target='node2', edge_attr="weight", create_using=nx.DiGraph())

edges,weights = zip(*nx.get_edge_attributes(G,'weight').items())
pos = nx.spring_layout(G, seed=404)
 # pip install netgraph
## hex_dict = %your own hex dict for nodes
Graph(G, node_color=hex_dict, node_size= node_size, edge_cmap='Blues', edge_width={k: 2*v for k, v in nx.get_edge_attributes(G,'weight').items()}, arrows=True, node_labels=True, node_label_fontdict={'size':18})

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants