cell2cell
  • Home
  • API Documentation

cell2cell Tutorials

  • Cell-cell communication from bulk dataset
  • Cell-cell communication from single-cell dataset

Tensor-cell2cell Tutorials

  • Obtaining patterns of cell-cell communication with Tensor-cell2cell
  • Downstream analysis 1: Factor-specific analyses
  • Downstream analysis 2: Gene Set Enrichment Analysis
  • Inspecting CCC patterns from spatial transcriptomics
  • Running Tensor-cell2cell on your own GPU or on Google Colab's GPU

Tensor-cell2cell v2 Tutorials

  • Building multimodal 4D communication tensors (MEBOCOST)
  • Patterns of protein- and metabolite-based cell-cell communication (MEBOCOST)
  • Patterns of cell-cell communication and transcription factors (DecoupleR)
  • Patterns of cell-cell communication and metabolic activities (scCellFie)
  • Patterns of protein- and metabolite-based cell-cell communication (LIANA+)
    • 1 - Load Data
      • RNA-seq data
    • 2 - Data Preprocessing
      • RNA-seq for CCC
    • 3 - Infer cell-cell communication with LIANA+
      • Protein-mediated modality
      • Metabolite-mediated modality
    • 4 - Generate Tensors
      • 4D communication tensors
      • Generate coupled tensors
    • 5 - Run Analysis
      • Perform tensor factorization
    • 6 - Results
      • Metadata
      • Plot factor loadings
      • Top LR pairs
      • Export Loadings
    • 7 - Downstream Analyses
      • Generate factor-specific networks
      • Clustermaps
cell2cell
  • Tensor-cell2cell v2 Tutorials
  • Patterns of protein- and metabolite-based cell-cell communication (LIANA+)
  • Edit on GitHub

Patterns of protein- and metabolite-based cell-cell communication (LIANA+)¶

This tutorial is focused on running Tensor-cell2cell v2 purely by using the tool LIANA+ to predict protein- and metabolite-mediated cell-cell communication (CCC), separately. LIANA+ was previously integrated with Tensor-cell2cell, so we can run everything smoothly and directly generate the tensors.

In this case, we use samples from cortical brain organoids across different time points, previously published by Trujillo et al (2019).

In [1]:
Copied!
use_gpu = True

import tensorly as tl
if use_gpu:
    tl.set_backend('pytorch')
use_gpu = True import tensorly as tl if use_gpu: tl.set_backend('pytorch')

Importing packages to use

In [2]:
Copied!
import cell2cell as c2c
import scanpy as sc

import liana as li
import decoupler as dc

import numpy as np
import pandas as pd

from tqdm.auto import tqdm

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import cell2cell as c2c import scanpy as sc import liana as li import decoupler as dc import numpy as np import pandas as pd from tqdm.auto import tqdm import matplotlib.pyplot as plt import seaborn as sns %matplotlib inline
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
In [3]:
Copied!
import warnings
warnings.filterwarnings('ignore')
import warnings warnings.filterwarnings('ignore')

IMPORTANT: In this notebook, the version 0.8.0 of cell2cell is used. Older versions do not implement the Coupled Tensor Component Analysis (CTCA).

1 - Load Data¶

Specify folders where the data is located, and where the outputs will be written:

In [4]:
Copied!
import os

data_folder = './data/'
directory = os.fsencode(data_folder)

output_folder = './results/LIANA/'
if not os.path.isdir(output_folder):
    os.mkdir(output_folder)
import os data_folder = './data/' directory = os.fsencode(data_folder) output_folder = './results/LIANA/' if not os.path.isdir(output_folder): os.mkdir(output_folder)

RNA-seq data¶

A preprocessed version of the original paper (Trujillo et al, 2019) can be found here

In [5]:
Copied!
rnaseq = sc.read_h5ad(data_folder + '/annotated_seurat_norm_harmony_2022.h5ad')
rnaseq = sc.read_h5ad(data_folder + '/annotated_seurat_norm_harmony_2022.h5ad')
In [6]:
Copied!
rnaseq = rnaseq.raw.to_adata()
rnaseq = rnaseq.raw.to_adata()
In [7]:
Copied!
rnaseq.shape
rnaseq.shape
Out[7]:
(15973, 21042)
In [8]:
Copied!
rnaseq.obs.head(2)
rnaseq.obs.head(2)
Out[8]:
orig.ident nCount_RNA nFeature_RNA percent.mt RNA_snn_res.0.5 seurat_clusters celltype old.ident
Month-01_AAACCTGAGAATTGTG-1 Month-01 6417.0 2431 2.898551 11 11 Other 11
Month-01_AAACCTGAGCAACGGT-1 Month-01 8522.0 2464 2.663694 6 6 Progenitor 6

2 - Data Preprocessing¶

RNA-seq for CCC¶

We first exclude cell types that were not used by Trujillo et al (2019).

In [9]:
Copied!
rnaseq = rnaseq[~rnaseq.obs['celltype'].isin(['MC', 'Other'])]
rnaseq = rnaseq[~rnaseq.obs['celltype'].isin(['MC', 'Other'])]

To perform a basic filtering of cell types within time points based on their number of cells, we first generate a combined column:

In [10]:
Copied!
rnaseq.obs['celltype_timepoint'] = rnaseq.obs['celltype'].astype(str) + '_' + rnaseq.obs['orig.ident'].astype(str)
rnaseq.obs['celltype_timepoint'] = rnaseq.obs['celltype'].astype(str) + '_' + rnaseq.obs['orig.ident'].astype(str)
In [11]:
Copied!
cell_type_counts = rnaseq.obs['celltype_timepoint'].value_counts()
min_cells = 20
valid_cell_types = cell_type_counts[cell_type_counts >= min_cells].index
adata = rnaseq[rnaseq.obs['celltype_timepoint'].isin(valid_cell_types)]
adata.shape
cell_type_counts = rnaseq.obs['celltype_timepoint'].value_counts() min_cells = 20 valid_cell_types = cell_type_counts[cell_type_counts >= min_cells].index adata = rnaseq[rnaseq.obs['celltype_timepoint'].isin(valid_cell_types)] adata.shape
Out[11]:
(12792, 21042)

Then, we normalize the data:

In [12]:
Copied!
sc.pp.normalize_total(rnaseq)
sc.pp.log1p(rnaseq)
sc.pp.normalize_total(rnaseq) sc.pp.log1p(rnaseq)

3 - Infer cell-cell communication with LIANA+¶

In [13]:
Copied!
sample_key = 'orig.ident'
groupby = 'celltype'
sample_key = 'orig.ident' groupby = 'celltype'

Protein-mediated modality¶

We first use LIANA to predict CCC mediated by protein ligands and receptors, as previously done in this tutorial

In [14]:
Copied!
li.mt.rank_aggregate.by_sample(adata,
                               sample_key=sample_key,
                               groupby=groupby,
                               resource_name = 'consensus',
                               expr_prop=0.1, # must be expressed in expr_prop fraction of cells
                               min_cells = 5,
                               n_perms = 100,
                               use_raw = False, # run on log- and library-normalized counts
                               verbose = True,
                               inplace = True,
                               return_all_lrs=True, # return all LR values
                              )
li.mt.rank_aggregate.by_sample(adata, sample_key=sample_key, groupby=groupby, resource_name = 'consensus', expr_prop=0.1, # must be expressed in expr_prop fraction of cells min_cells = 5, n_perms = 100, use_raw = False, # run on log- and library-normalized counts verbose = True, inplace = True, return_all_lrs=True, # return all LR values )
Converting `orig.ident` to categorical!
Now running: Month-10: 100%|██████████████████████████████████████████████████████████| 4/4 [00:32<00:00,  8.01s/it]

Results can be found here:

In [15]:
Copied!
adata.uns['liana_res'].head()
adata.uns['liana_res'].head()
Out[15]:
orig.ident source target ligand_complex receptor_complex lr_means cellphone_pvals expr_prod scaled_weight lr_logfc spec_weight lrscore specificity_rank magnitude_rank
0 Month-01 Glia Glia MDK NCL 1.062089 0.00 0.966921 0.290163 0.089959 0.074526 0.925330 0.000045 1.708695e-08
1 Month-01 Progenitor Glia MDK NCL 1.036312 0.00 0.932859 0.108443 0.081726 0.071901 0.924081 0.000056 6.834581e-08
2 Month-01 Glia Progenitor MDK NCL 1.034963 0.00 0.887523 0.217578 0.039795 0.068406 0.922315 0.000071 2.733674e-07
3 Month-01 Glia Glutamatergic MDK NCL 1.031363 0.06 0.876988 0.207946 0.034401 0.067594 0.921886 0.000076 4.271241e-07
4 Month-01 IP Glia MDK NCL 1.027941 0.02 0.921797 0.049431 0.042089 0.071048 0.923662 0.000059 4.271241e-07
In [16]:
Copied!
protein_ccc = adata.uns['liana_res'].copy()
protein_ccc = adata.uns['liana_res'].copy()

Metabolite-mediated modality¶

We then use LIANA to predict CCC mediated by metabolite ligands, as previously done in this tutorial combining an univariate linear model

First we obtain the database that contains information to predict metabolite presence:

In [17]:
Copied!
metalinks = li.resource.get_metalinks(biospecimen_location='Blood',
                                      source=['CellPhoneDB', 'Cellinker', 'scConnect', # Ligand-Receptor resources
                                              'recon', 'hmr', 'rhea', 'hmdb' # Production-Degradation resources
                                              ],
                                      types=['pd', 'lr'], # NOTE: we obtain both ligand-receptor and production-degradation sets
                                     )
metalinks = li.resource.get_metalinks(biospecimen_location='Blood', source=['CellPhoneDB', 'Cellinker', 'scConnect', # Ligand-Receptor resources 'recon', 'hmr', 'rhea', 'hmdb' # Production-Degradation resources ], types=['pd', 'lr'], # NOTE: we obtain both ligand-receptor and production-degradation sets )

Next, prepare this database:

In [18]:
Copied!
resource = metalinks[metalinks['type']=='lr'].copy()
resource = resource[['metabolite', 'gene_symbol']]\
    .rename(columns={'gene_symbol':'receptor'}).drop_duplicates()
resource.head()
resource = metalinks[metalinks['type']=='lr'].copy() resource = resource[['metabolite', 'gene_symbol']]\ .rename(columns={'gene_symbol':'receptor'}).drop_duplicates() resource.head()
Out[18]:
metabolite receptor
173 Oxoglutaric acid OXGR1
351 Acetaldehyde TRPA1
410 Calcitriol VDR
843 ADP P2RY1
1071 ADP P2RY6

We split it into two parts: 1) the information that links enzymes producing or consuming each metabolite:

In [19]:
Copied!
pd_net = metalinks[metalinks['type'] == 'pd']
# we need to aggregate the production-degradation values
pd_net = pd_net[['metabolite', 'gene_symbol', 'mor']].groupby(['metabolite', 'gene_symbol']).agg('mean').reset_index()
pd_net.head()
pd_net = metalinks[metalinks['type'] == 'pd'] # we need to aggregate the production-degradation values pd_net = pd_net[['metabolite', 'gene_symbol', 'mor']].groupby(['metabolite', 'gene_symbol']).agg('mean').reset_index() pd_net.head()
Out[19]:
metabolite gene_symbol mor
0 (+)-Limonene CYP2C19 -1.0
1 (+)-Limonene CYP2C9 -1.0
2 (R)-Lipoic acid CEL 1.0
3 (R)-Lipoic acid DLD -1.0
4 (R)-Lipoic acid LIPA 1.0
In [20]:
Copied!
pd_net.rename(columns={'metabolite' : 'source', 'gene_symbol': 'target' , 'mor': 'weight'}, inplace=True)
pd_net.rename(columns={'metabolite' : 'source', 'gene_symbol': 'target' , 'mor': 'weight'}, inplace=True)

And 2) the information that contains the transporter of each metabolite

In [21]:
Copied!
t_net = metalinks[metalinks['type'] == 'pd']
t_net = t_net[['metabolite', 'gene_symbol', 'transport_direction']].dropna()
# Note that we treat export as positive and import as negative
t_net['mor'] = t_net['transport_direction'].apply(lambda x: 1 if x == 'out' else -1 if x == 'in' else None)
t_net = t_net[['metabolite', 'gene_symbol', 'mor']].dropna().groupby(['metabolite', 'gene_symbol']).agg('mean').reset_index()
t_net = t_net[t_net['mor']!=0]
t_net = metalinks[metalinks['type'] == 'pd'] t_net = t_net[['metabolite', 'gene_symbol', 'transport_direction']].dropna() # Note that we treat export as positive and import as negative t_net['mor'] = t_net['transport_direction'].apply(lambda x: 1 if x == 'out' else -1 if x == 'in' else None) t_net = t_net[['metabolite', 'gene_symbol', 'mor']].dropna().groupby(['metabolite', 'gene_symbol']).agg('mean').reset_index() t_net = t_net[t_net['mor']!=0]
In [22]:
Copied!
t_net.rename(columns={'metabolite' : 'source', 'gene_symbol': 'target' , 'mor': 'weight'}, inplace=True)
t_net.rename(columns={'metabolite' : 'source', 'gene_symbol': 'target' , 'mor': 'weight'}, inplace=True)

With this we can predict metabolite abundances

This is ugly code due to issues with LIANA+ versions, but if you manage to have no issues you can directly call the function li.mt.fun.estimate_metalinks() instead, as in here:

meta = li.mt.fun.estimate_metalinks(adata,
                                    resource,
                                    pd_net=pd_net,
                                    t_net=t_net, # (Optional)
                                    use_raw=False, 
                                    # keyword arguments passed to decoupler-py
                                    source='metabolite', target='gene_symbol',
                                    weight='mor', min_n=3)
# pass cell type information
meta.obs['celltype'] = adata.obs['celltype']
In [23]:
Copied!
from mudata import MuData
from mudata import MuData
In [24]:
Copied!
dc.mt.ulm(adata, net=pd_net, raw=False)
met_est = adata.obsm['score_ulm']
dc.mt.ulm(adata, net=pd_net, raw=False) met_est = adata.obsm['score_ulm']
In [25]:
Copied!
dc.mt.waggr(adata, t_net, times=0, raw=False, fun='wmean')

out_est = adata.obsm['score_waggr']
intersect = np.intersect1d(met_est.columns, out_est.columns)
out_est = out_est[intersect]
out_mask = out_est > 0
mask = np.ones(out_est.shape)
mask[out_mask == 0] = 0
# mask those with transporters
mmat = met_est[intersect] * mask
# concat the rest
coldiff = np.setdiff1d(met_est.columns, mmat.columns)
mmat = pd.concat([mmat, met_est[coldiff]], axis = 1)
dc.mt.waggr(adata, t_net, times=0, raw=False, fun='wmean') out_est = adata.obsm['score_waggr'] intersect = np.intersect1d(met_est.columns, out_est.columns) out_est = out_est[intersect] out_mask = out_est > 0 mask = np.ones(out_est.shape) mask[out_mask == 0] = 0 # mask those with transporters mmat = met_est[intersect] * mask # concat the rest coldiff = np.setdiff1d(met_est.columns, mmat.columns) mmat = pd.concat([mmat, met_est[coldiff]], axis = 1)
In [26]:
Copied!
_resource = resource[resource['metabolite'].isin(mmat.columns)].copy()
receptor = adata[:, adata.var.index.isin(np.unique(_resource['receptor']))]
_resource = resource[resource['metabolite'].isin(mmat.columns)].copy() receptor = adata[:, adata.var.index.isin(np.unique(_resource['receptor']))]
In [27]:
Copied!
adata.obsm['mmat'] = mmat
mmat = li.utils.obsm_to_adata(adata, 'mmat')
meta = MuData({'metabolite':mmat, 'receptor':receptor})
adata.obsm['mmat'] = mmat mmat = li.utils.obsm_to_adata(adata, 'mmat') meta = MuData({'metabolite':mmat, 'receptor':receptor})

This is where the ugly code ends

In [28]:
Copied!
# pass cell type information
meta.obs[groupby] = adata.obs[groupby]
meta.obs[sample_key] = adata.obs[sample_key]
# pass cell type information meta.obs[groupby] = adata.obs[groupby] meta.obs[sample_key] = adata.obs[sample_key]
In [29]:
Copied!
sc.pl.umap(meta.mod['metabolite'], color=['gamma-Aminobutyric acid', 'Arachidic acid', 'celltype'])
sc.pl.umap(meta.mod['metabolite'], color=['gamma-Aminobutyric acid', 'Arachidic acid', 'celltype'])
... storing 'celltype' as categorical
... storing 'celltype_timepoint' as categorical
No description has been provided for this image

Finally compute cell-cell communication

In [30]:
Copied!
li.mt.rank_aggregate.by_sample(meta,
                               sample_key=sample_key,
                               groupby=groupby,
                               use_raw=False,
                               # pass our modified resource
                               resource=resource.rename(columns={'metabolite':'ligand'}),
                               # NOTE: Essential arguments when handling multimodal data
                               mdata_kwargs={
                               'x_mod': 'metabolite',
                               'y_mod': 'receptor',
                               'x_use_raw':False,
                               'y_use_raw':False,
                               'x_transform':li.ut.zi_minmax,
                               'y_transform':li.ut.zi_minmax,
                              },
                               expr_prop=0.1, # must be expressed in expr_prop fraction of cells
                               min_cells = 5,
                               n_perms = 100,
                            verbose=True
                            )
li.mt.rank_aggregate.by_sample(meta, sample_key=sample_key, groupby=groupby, use_raw=False, # pass our modified resource resource=resource.rename(columns={'metabolite':'ligand'}), # NOTE: Essential arguments when handling multimodal data mdata_kwargs={ 'x_mod': 'metabolite', 'y_mod': 'receptor', 'x_use_raw':False, 'y_use_raw':False, 'x_transform':li.ut.zi_minmax, 'y_transform':li.ut.zi_minmax, }, expr_prop=0.1, # must be expressed in expr_prop fraction of cells min_cells = 5, n_perms = 100, verbose=True )
Now running: Month-10: 100%|██████████████████████████████████████████████████████████| 4/4 [00:05<00:00,  1.30s/it]

And the results:

In [31]:
Copied!
meta.uns['liana_res'].head()
meta.uns['liana_res'].head()
Out[31]:
orig.ident source target ligand_complex receptor_complex lr_means cellphone_pvals expr_prod scaled_weight lr_logfc spec_weight lrscore specificity_rank magnitude_rank
0 Month-01 Glia Progenitor Sphinganine TRPM3 0.359097 0.00 0.124714 0.092563 0.210764 0.165734 0.715372 0.027473 0.000129
1 Month-01 Progenitor Progenitor Sphinganine TRPM3 0.351860 0.00 0.118574 0.069406 0.227814 0.157574 0.710204 0.039417 0.000514
2 Month-01 Progenitor Progenitor Sodium TRPM3 0.319108 0.00 0.090789 0.068954 0.226059 0.160477 0.681978 0.035748 0.018032
3 Month-01 Glia Glia Sphinganine TRPM3 0.317264 0.79 0.100116 -0.030954 -0.017828 0.133044 0.692487 1.000000 0.027954
4 Month-01 Glia Glutamatergic all-trans-Retinoic acid RARB 0.017623 0.93 0.000182 -0.081092 -0.020941 0.021961 0.087561 1.000000 0.033962
In [32]:
Copied!
met_ccc = meta.uns['liana_res'].copy()
met_ccc = meta.uns['liana_res'].copy()

4 - Generate Tensors¶

We first sort our contexts:

In [33]:
Copied!
contexts = sorted(adata.obs[sample_key].unique())
contexts
contexts = sorted(adata.obs[sample_key].unique()) contexts
Out[33]:
['Month-01', 'Month-03', 'Month-06', 'Month-10']

4D communication tensors¶

LIANA and Tensor-cell2cell are conveniently interconnected, so generating tensors from LIANA is quite smooth. We start with the protein-based tensor:

In [34]:
Copied!
tensor_prot = li.multi.to_tensor_c2c(liana_res=protein_ccc, # LIANA's dataframe containing results
                                     sample_key=sample_key, # Column name of the samples
                                     source_key='source', # Column name of the sender cells
                                     target_key='target', # Column name of the receiver cells
                                     ligand_key='ligand_complex', # Column name of the ligands
                                     receptor_key='receptor_complex', # Column name of the receptors
                                     score_key='magnitude_rank', # Column name of the communication scores to use
                                     inverse_fun=lambda x: 1 - x, # Transformation function
                                     how='outer_cells', # What to include across all samples
                                     outer_fraction=1/3, # Fraction of samples as threshold to include cells and LR pairs.
                                     context_order=contexts, # Order to store the contexts in the tensor
                                     order_labels=['Contexts', 'Protein LRIs', 'Sender Cells', 'Receiver Cells']
                                    )
tensor_prot = li.multi.to_tensor_c2c(liana_res=protein_ccc, # LIANA's dataframe containing results sample_key=sample_key, # Column name of the samples source_key='source', # Column name of the sender cells target_key='target', # Column name of the receiver cells ligand_key='ligand_complex', # Column name of the ligands receptor_key='receptor_complex', # Column name of the receptors score_key='magnitude_rank', # Column name of the communication scores to use inverse_fun=lambda x: 1 - x, # Transformation function how='outer_cells', # What to include across all samples outer_fraction=1/3, # Fraction of samples as threshold to include cells and LR pairs. context_order=contexts, # Order to store the contexts in the tensor order_labels=['Contexts', 'Protein LRIs', 'Sender Cells', 'Receiver Cells'] )
100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [00:29<00:00,  7.44s/it]
In [35]:
Copied!
tensor_prot.shape
tensor_prot.shape
Out[35]:
torch.Size([4, 1303, 5, 5])

Then, we continue with the metabolite-based tensor

In [36]:
Copied!
tensor_met = li.multi.to_tensor_c2c(liana_res=met_ccc, # LIANA's dataframe containing results
                                     sample_key=sample_key, # Column name of the samples
                                     source_key='source', # Column name of the sender cells
                                     target_key='target', # Column name of the receiver cells
                                     ligand_key='ligand_complex', # Column name of the ligands
                                     receptor_key='receptor_complex', # Column name of the receptors
                                     score_key='magnitude_rank', # Column name of the communication scores to use
                                     inverse_fun=lambda x: 1 - x, # Transformation function
                                     how='outer_cells', # What to include across all samples
                                     outer_fraction=1/3., # Fraction of samples as threshold to include cells and LR pairs.
                                     context_order=contexts, # Order to store the contexts in the tensor
                                     order_labels=['Contexts', 'Metabolite LRIs', 'Sender Cells', 'Receiver Cells']
                                    )
tensor_met = li.multi.to_tensor_c2c(liana_res=met_ccc, # LIANA's dataframe containing results sample_key=sample_key, # Column name of the samples source_key='source', # Column name of the sender cells target_key='target', # Column name of the receiver cells ligand_key='ligand_complex', # Column name of the ligands receptor_key='receptor_complex', # Column name of the receptors score_key='magnitude_rank', # Column name of the communication scores to use inverse_fun=lambda x: 1 - x, # Transformation function how='outer_cells', # What to include across all samples outer_fraction=1/3., # Fraction of samples as threshold to include cells and LR pairs. context_order=contexts, # Order to store the contexts in the tensor order_labels=['Contexts', 'Metabolite LRIs', 'Sender Cells', 'Receiver Cells'] )
100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 25.58it/s]
In [37]:
Copied!
tensor_met.shape
tensor_met.shape
Out[37]:
torch.Size([4, 8, 5, 5])

Generate coupled tensors¶

Once we have our 4D tensors, we start creating a CoupledInteractionTensorobject in Tensor-cell2cell v2. This object will later perform the CTCA.

In [38]:
Copied!
coupled_tensor = c2c.tensor.CoupledInteractionTensor(tensor1=tensor_prot,
                                                     tensor2=tensor_met,
                                                     tensor1_name='cell2cell',
                                                     tensor2_name='LIANA',
                                                     mode_mapping={'shared' : [(0,0), (2,2), (3,3)]}
                                                    )
coupled_tensor = c2c.tensor.CoupledInteractionTensor(tensor1=tensor_prot, tensor2=tensor_met, tensor1_name='cell2cell', tensor2_name='LIANA', mode_mapping={'shared' : [(0,0), (2,2), (3,3)]} )

We can observe that this object includes both tensors simultaneously.

In [39]:
Copied!
coupled_tensor.shape
coupled_tensor.shape
Out[39]:
(torch.Size([4, 1303, 5, 5]), torch.Size([4, 8, 5, 5]))

Assume that missing values are real zeros

Instead of imputing missing values with our method, we will assume that they are real zeros.

In [40]:
Copied!
coupled_tensor.mask1 = None
coupled_tensor.mask2 = None
coupled_tensor.mask1 = None coupled_tensor.mask2 = None

5 - Run Analysis¶

Perform tensor factorization¶

Next, we apply a non-negative canonical coupled component analysis (CTCA). Briefly, tensor decomposition identifies a low-rank tensor (here, a rank of 6) for each modality's tensor that better approximates each original tensor. These low-rank tensors can be represented as the sum of a set of rank-1 tensors (6 of them in this case). Each rank-1 tensor represents a factor in the decomposition and can be further represented as the outer product of n vectors, where n represents the number of tensor dimensions. Each vector represents one of the n tensor dimensions for that factor and its values, corresponding to individual elements in each dimension, represent the factor loadings.

In the CTCA, we have vectors that are shared between the coupled tensors (here contexts, sender cells, and receiver cells), while we have other vectors that are private for each modality (protein-based LR pair and metabolite-based LR pair dimensions). The loadings shared between both tensors behave exactly the same in both modalities, representing that we are capturing LR pairs of each modality that present similar trends in each factor. If each tensor was decomposed separately with the standard TCA in Tensor-cell2cell v1, we would not ensure that the key LR pairs for each factor would behave the same, as this could lead to loadings for the context, sender cells and receiver cells with different behaviors between modalities.

tensor-fact

To illustrate how performing the CTCA for identifying protein LR pairs acting coordinated with metabolite LR pairs, we skipped the Elbow Analysis, which can be found in our previous tutorial using MEBOCOST. Instead, we run the factorization using a manual number of 6 factors:

In [41]:
Copied!
coupled_tensor.compute_tensor_factorization(rank=6, # This is just for illustrative purposes - to keep a low number of factors
                                            init='svd',
                                            random_state=888,
                                            runs=1, tol=1e-8, n_iter_max=500 # Robust version of tensor decomposition
                                           )
coupled_tensor.compute_tensor_factorization(rank=6, # This is just for illustrative purposes - to keep a low number of factors init='svd', random_state=888, runs=1, tol=1e-8, n_iter_max=500 # Robust version of tensor decomposition )

6 - Results¶

Metadata¶

Generate a list containing metadata for each tensor order/dimension - Later used for coloring factor plots

This is a list containing metadata for each dimension of the tensor (contexts, LR pairs, sender cells, receiver cells). Each metadata corresponds to a dataframe with info of each element in the respective dimension.

We start with the protein-based tensor.

In [42]:
Copied!
prot_functions = {k: 'Proteins' for k in tensor_prot.order_names[1]}
prot_functions = {k: 'Proteins' for k in tensor_prot.order_names[1]}
In [43]:
Copied!
meta_tf = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_prot,
                                              metadata_dicts=[None, prot_functions, None, None],
                                              fill_with_order_elements=True
                                             )
meta_tf = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_prot, metadata_dicts=[None, prot_functions, None, None], fill_with_order_elements=True )

We continue with the metabolite-based tensor.

In [44]:
Copied!
met_functions = {k: 'Metabolites' for k in tensor_met.order_names[1]}
met_functions = {k: 'Metabolites' for k in tensor_met.order_names[1]}
In [45]:
Copied!
meta_tf2 = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_met,
                                              metadata_dicts=[None, met_functions, None, None],
                                              fill_with_order_elements=True
                                             )
meta_tf2 = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_met, metadata_dicts=[None, met_functions, None, None], fill_with_order_elements=True )

Generate unified metadata for CTCA results

By using the metadata generated separately for each tensor, we generate a new metadata that incorporates metadata for the shared and private dimensions between both modalities.

In [46]:
Copied!
new_meta_tf = coupled_tensor.reorder_metadata(meta_tf, meta_tf2)
new_meta_tf = coupled_tensor.reorder_metadata(meta_tf, meta_tf2)

Plot factor loadings¶

After performing the CTCA, a set of loadings is obtained in a factor-specific way, for each tensor. These loadings provide details of what elements of each dimension are important within each factor. Here we plot first the loadings for the shared dimensions, followed by the private dimensions of the first and second tesnor (protein- and metabolite-based, respectively).

In [47]:
Copied!
# Color palettes for each of the coupled tensor dimensions, ordered as in order_sorting below
cmaps = ['viridis', 'tab20', 'tab20', 'Dark2', 'Dark2', ]
# Color palettes for each of the coupled tensor dimensions, ordered as in order_sorting below cmaps = ['viridis', 'tab20', 'tab20', 'Dark2', 'Dark2', ]
In [48]:
Copied!
factors, axes = c2c.plotting.tensor_factors_plot(interaction_tensor=coupled_tensor,
                                                 metadata=new_meta_tf,
                                                 sample_col='Element',
                                                 group_col='Category',
                                                 meta_cmaps=cmaps,
                                                 order_sorting=['Contexts', 'Sender Cells', 'Receiver Cells', # Shared dimensions
                                                                'Protein LRIs', 'Metabolite LRIs' # Private dimensions
                                                               ],
                                                 fontsize=14,
                                                 filename=output_folder + 'Tensor-Factorization.pdf'
                                                )
factors, axes = c2c.plotting.tensor_factors_plot(interaction_tensor=coupled_tensor, metadata=new_meta_tf, sample_col='Element', group_col='Category', meta_cmaps=cmaps, order_sorting=['Contexts', 'Sender Cells', 'Receiver Cells', # Shared dimensions 'Protein LRIs', 'Metabolite LRIs' # Private dimensions ], fontsize=14, filename=output_folder + 'Tensor-Factorization.pdf' )
No description has been provided for this image

Each factor represents a context-dependent communication pattern. In this visualization, each row is a factor and each column is a tensor dimension in that factor (Shared dimensions: contexts, sender-cells, and receiver-cells; Private dimensions: Protein LR pairs and Metabolite LR pairs).

The loadings represent the contribution of each element to that pattern, and their combinations between all dimensions represent the overall pattern. For example, in Factor 1, the identified communication pattern occurs in early stages (month 1), where all cell types but GABAergic neurons participate as sender and receiver cells. Then we can identify the top LR pairs contributing from each modality. Thus, those that are top ranked for the protein modality act in coordination with those top ranked for the metabolite modality.

Top LR pairs¶

Top-5 LR pairs

In [49]:
Copied!
for i in range(coupled_tensor.rank):
    print(coupled_tensor.get_top_factor_elements('Protein LRIs', 'Factor {}'.format(i+1), 5))
    print('')
for i in range(coupled_tensor.rank): print(coupled_tensor.get_top_factor_elements('Protein LRIs', 'Factor {}'.format(i+1), 5)) print('')
MDK^ITGA6_ITGB1    0.104460
HMGB1^CXCR4        0.101745
APP^NCSTN          0.101160
MDK^NOTCH2         0.101085
APP^NOTCH2         0.100711
Name: Factor 1, dtype: float32

MDK^LRP1      0.101270
GNAS^ADCY1    0.100843
PSAP^LRP1     0.100770
TIMP1^CD63    0.100751
CALR^LRP1     0.100648
Name: Factor 2, dtype: float32

CALM1^KCNQ3     0.141701
CALM3^KCNQ3     0.141581
MDK^PTPRZ1      0.141258
PTN^PTPRZ1      0.141081
NCAM1^PTPRZ1    0.140969
Name: Factor 3, dtype: float32

CALM1^INSR     0.225223
CALM3^INSR     0.224854
ARF1^INSR      0.223842
HRAS^INSR      0.223012
CALM1^PTPRA    0.218243
Name: Factor 4, dtype: float32

MDK^NOTCH2    0.166489
CD99^CD81     0.166447
APP^NOTCH2    0.165535
APP^GPC1      0.155688
MDK^SDC2      0.148343
Name: Factor 5, dtype: float32

APP^LRP10     0.163997
APP^NOTCH2    0.160833
MDK^NOTCH2    0.160823
PSAP^LRP1     0.147551
MDK^LRP1      0.147358
Name: Factor 6, dtype: float32

In [50]:
Copied!
for i in range(coupled_tensor.rank):
    print(coupled_tensor.get_top_factor_elements('Metabolite LRIs', 'Factor {}'.format(i+1), 5))
    print('')
for i in range(coupled_tensor.rank): print(coupled_tensor.get_top_factor_elements('Metabolite LRIs', 'Factor {}'.format(i+1), 5)) print('')
Sphinganine^TRPM3                       0.523455
Sodium^TRPM3                            0.501364
Sphingosine^TRPM3                       0.472981
24-Hydroxycholesterol^NR1H2             0.423842
MG(0:0/20:4(5Z,8Z,11Z,14Z)/0:0)^CNR1    0.188327
Name: Factor 1, dtype: float32

24-Hydroxycholesterol^NR1H2             0.577879
MG(0:0/20:4(5Z,8Z,11Z,14Z)/0:0)^CNR1    0.507618
Anandamide^CNR1                         0.461660
Sphinganine^TRPM3                       0.273666
Sodium^TRPM3                            0.267419
Name: Factor 2, dtype: float32

24-Hydroxycholesterol^NR1H2             0.750918
Sodium^TRPM3                            0.378224
Anandamide^CNR1                         0.311966
Sphingosine^TRPM3                       0.285774
MG(0:0/20:4(5Z,8Z,11Z,14Z)/0:0)^CNR1    0.227730
Name: Factor 3, dtype: float32

MG(0:0/20:4(5Z,8Z,11Z,14Z)/0:0)^CNR1    6.022073e-01
L-Glutamic acid^GRIA2                   5.719475e-01
Anandamide^CNR1                         5.569762e-01
Dopamine^HTR2C                          8.162540e-05
Sphinganine^TRPM3                       5.793399e-27
Name: Factor 4, dtype: float32

24-Hydroxycholesterol^NR1H2    0.665430
Sphinganine^TRPM3              0.548730
Sphingosine^TRPM3              0.382991
Sodium^TRPM3                   0.324742
Dopamine^HTR2C                 0.062921
Name: Factor 5, dtype: float32

24-Hydroxycholesterol^NR1H2    0.849855
Sphingosine^TRPM3              0.362065
Sphinganine^TRPM3              0.306646
Sodium^TRPM3                   0.222987
L-Glutamic acid^GRIA2          0.053857
Name: Factor 6, dtype: float32

Export Loadings¶

THESE VALUES CAN BE USED FOR DOWNSTREAM ANALYSES

In [51]:
Copied!
coupled_tensor.export_factor_loadings(output_folder + 'Coupled-Loadings.xlsx')
coupled_tensor.export_factor_loadings(output_folder + 'Coupled-Loadings.xlsx')
Coupled tensor factor loadings saved to ./results/LIANA/Coupled-Loadings.xlsx

7 - Downstream Analyses¶

Generate factor-specific networks¶

We can generate factor-specific communication networks connecting sender-to-receiver cells.

In [52]:
Copied!
networks = c2c.analysis.tensor_downstream.get_factor_specific_ccc_networks(coupled_tensor.factors, 
                                                                           sender_label='Sender Cells',
                                                                           receiver_label='Receiver Cells')
networks = c2c.analysis.tensor_downstream.get_factor_specific_ccc_networks(coupled_tensor.factors, sender_label='Sender Cells', receiver_label='Receiver Cells')

Generate matrix of cell-cell pairs by factors

In [53]:
Copied!
network_by_factors = c2c.analysis.tensor_downstream.flatten_factor_ccc_networks(networks, orderby='receivers')
network_by_factors = c2c.analysis.tensor_downstream.flatten_factor_ccc_networks(networks, orderby='receivers')

Select cell-cell pairs with high potential of interaction

In [54]:
Copied!
_ = plt.hist(network_by_factors.values.flatten(), bins = 20)
_ = plt.hist(network_by_factors.values.flatten(), bins = 20)
No description has been provided for this image
In [55]:
Copied!
# To consider only cell pairs with high potential to interact in each factor
cci_threshold = 0.2
# To consider only cell pairs with high potential to interact in each factor cci_threshold = 0.2

Visualize networks of factors with significant differences between groups

In [56]:
Copied!
net_fig, net_ax = c2c.plotting.ccc_networks_plot(coupled_tensor.factors,
                                                 included_factors=None,
                                                 ccc_threshold=cci_threshold, # Only important communication
                                                 nrows=2,
                                                 panel_size=(8, 8),
                                                 network_layout='circular',
                                                 filename=output_folder + 'Networks.svg'
                                                )
net_fig, net_ax = c2c.plotting.ccc_networks_plot(coupled_tensor.factors, included_factors=None, ccc_threshold=cci_threshold, # Only important communication nrows=2, panel_size=(8, 8), network_layout='circular', filename=output_folder + 'Networks.svg' )
No description has been provided for this image

Clustermaps¶

Similarly, we can visualize the contribution of each dimension element across factors by plotting their loadings

In [57]:
Copied!
factor_sorted = ['Factor {}'.format(f) for f in range(1, coupled_tensor.rank+1)]
factor_sorted = ['Factor {}'.format(f) for f in range(1, coupled_tensor.rank+1)]

Cluster cell-cell pairs by their potential across factors

Here we display the outer product between the sender and receiver cell dimensions, which are the value governing the factor-specific networks above.

In [58]:
Copied!
cci_cm = c2c.plotting.loading_clustermap(network_by_factors[factor_sorted],
                                         use_zscore=False, 
                                         loading_threshold=cci_threshold, # To consider relevant CCIs
                                         figsize=(20, 9),
                                         tick_fontsize=16,
                                         cmap='YlOrBr',
                                         filename=output_folder + 'Clustermap-Factor-specific-CCI.pdf',
                                         row_cluster=False
                                        )
cci_cm = c2c.plotting.loading_clustermap(network_by_factors[factor_sorted], use_zscore=False, loading_threshold=cci_threshold, # To consider relevant CCIs figsize=(20, 9), tick_fontsize=16, cmap='YlOrBr', filename=output_folder + 'Clustermap-Factor-specific-CCI.pdf', row_cluster=False )
No description has been provided for this image

Cluster LR pairs by their importance across factors

We can plot the key LR pairs from each modality. First we identify what could represent a high loading given their separate distributions.

In [59]:
Copied!
_ = plt.hist(coupled_tensor.factors['Protein LRIs'].values.flatten(), bins = 20)
_ = plt.hist(coupled_tensor.factors['Protein LRIs'].values.flatten(), bins = 20)
No description has been provided for this image
In [60]:
Copied!
_ = plt.hist(coupled_tensor.factors['Metabolite LRIs'].values.flatten(), bins = 20)
_ = plt.hist(coupled_tensor.factors['Metabolite LRIs'].values.flatten(), bins = 20)
No description has been provided for this image

And set and arbitrary threshold to filter out those that have low loadings

In [61]:
Copied!
# To consider only most relevant LR pairs
lr_threshold1 = 0.1
lr_threshold2 = 0.5
# To consider only most relevant LR pairs lr_threshold1 = 0.1 lr_threshold2 = 0.5

Since we have loadings from different modalities, we want to visualize them in a comparable way, so we normalize each of them by their maximum value in each case.

In [62]:
Copied!
df1 = coupled_tensor.factors['Protein LRIs'].copy()
df1 = df1[(df1.T > lr_threshold1).any()]
df1 = (df1 / df1.max().max())

df2 = coupled_tensor.factors['Metabolite LRIs'].copy()
df2 = df2[(df2.T > lr_threshold2).any()]
df2 = (df2 / df2.max().max())
df1 = coupled_tensor.factors['Protein LRIs'].copy() df1 = df1[(df1.T > lr_threshold1).any()] df1 = (df1 / df1.max().max()) df2 = coupled_tensor.factors['Metabolite LRIs'].copy() df2 = df2[(df2.T > lr_threshold2).any()] df2 = (df2 / df2.max().max())
In [63]:
Copied!
lr_cm = c2c.plotting.loading_clustermap(df1,
                                        loading_threshold=lr_threshold1, # To consider only top LRs
                                        use_zscore=False,
                                        figsize=(16, 5),
                                        cbar_fontsize=20,
                                        tick_fontsize=10,
                                        cmap='YlOrBr',
                                        filename=output_folder + 'Clustermap-Factor-specific-ProtLRs.pdf',
                                        row_cluster=False
                                       )
lr_cm = c2c.plotting.loading_clustermap(df1, loading_threshold=lr_threshold1, # To consider only top LRs use_zscore=False, figsize=(16, 5), cbar_fontsize=20, tick_fontsize=10, cmap='YlOrBr', filename=output_folder + 'Clustermap-Factor-specific-ProtLRs.pdf', row_cluster=False )
No description has been provided for this image
In [64]:
Copied!
lr_cm = c2c.plotting.loading_clustermap(df2,
                                        loading_threshold=lr_threshold2, # To consider only top LRs
                                        use_zscore=False,
                                        figsize=(8, 6),
                                        cbar_fontsize=20,
                                        tick_fontsize=10,
                                        cmap='YlOrBr',
                                        filename=output_folder + 'Clustermap-Factor-specific-MetLRs.pdf',
                                        row_cluster=False
                                       )
lr_cm = c2c.plotting.loading_clustermap(df2, loading_threshold=lr_threshold2, # To consider only top LRs use_zscore=False, figsize=(8, 6), cbar_fontsize=20, tick_fontsize=10, cmap='YlOrBr', filename=output_folder + 'Clustermap-Factor-specific-MetLRs.pdf', row_cluster=False )
No description has been provided for this image

Given that we max-scaled them, we can also put them together in an unified plot

In [65]:
Copied!
df = pd.concat([df1, df2])
df = pd.concat([df1, df2])
In [66]:
Copied!
lr_cm = c2c.plotting.loading_clustermap(df,
                                        loading_threshold=0.15, # To consider only top LRs
                                        use_zscore=False,
                                        figsize=(24, 5),
                                        cbar_fontsize=20,
                                        tick_fontsize=10,
                                        cmap='YlOrBr',
                                        filename=output_folder + 'Clustermap-Factor-specific-LRs.pdf',
                                        row_cluster=False
                                       )
lr_cm = c2c.plotting.loading_clustermap(df, loading_threshold=0.15, # To consider only top LRs use_zscore=False, figsize=(24, 5), cbar_fontsize=20, tick_fontsize=10, cmap='YlOrBr', filename=output_folder + 'Clustermap-Factor-specific-LRs.pdf', row_cluster=False )
No description has been provided for this image
Previous

Built with MkDocs using a theme provided by Read the Docs.
GitHub « Previous