cell2cell
  • Home
  • API Documentation

cell2cell Tutorials

  • Cell-cell communication from bulk dataset
  • Cell-cell communication from single-cell dataset

Tensor-cell2cell Tutorials

  • Obtaining patterns of cell-cell communication with Tensor-cell2cell
  • Downstream analysis 1: Factor-specific analyses
  • Downstream analysis 2: Gene Set Enrichment Analysis
  • Inspecting CCC patterns from spatial transcriptomics
  • Running Tensor-cell2cell on your own GPU or on Google Colab's GPU

Tensor-cell2cell v2 Tutorials

  • Building multimodal 4D communication tensors (MEBOCOST)
  • Patterns of protein- and metabolite-based cell-cell communication (MEBOCOST)
  • Patterns of cell-cell communication and transcription factors (DecoupleR)
  • Patterns of cell-cell communication and metabolic activities (scCellFie)
    • 1 - Load Data
      • RNA-seq data
      • Ligand-Receptor pairs
    • 2 - Data Preprocessing
      • RNA-seq for CCC
      • LR pairs
    • 3 - Predict metabolic activities with scCellFie
    • 4 - Build Tensors
      • Build 4D-Communication Tensor
      • Build 3D Metabolic Task Activity Tensor
    • 5 - Generate coupled tensors
    • 6 - Run Analysis
      • Perform tensor factorization
    • 7 - Results
      • Metadata
      • Plot factor loadings
      • Top LR pairs and metabolic tasks
      • Export Loadings
    • 8 - Downstream Analyses
      • Generate factor-specific networks
      • Clustermaps
  • Patterns of protein- and metabolite-based cell-cell communication (LIANA+)
cell2cell
  • Tensor-cell2cell v2 Tutorials
  • Patterns of cell-cell communication and metabolic activities (scCellFie)
  • Edit on GitHub

Patterns of cell-cell communication and metabolic activities (scCellFie)¶

Previously we showed how Tensor-cell2cell can be used to extract patterns of multiple modalities of cell-cell communication (CCC). Here we will show how our CTCA can be extended to couple CCC and intracellular metabolic activity. Here we use the tools cell2cell and scCellFie to predict CCC and the activity of metabolic tasks, respectively.

In this case, we use samples from cortical brain organoids across different time points, previously published by Trujillo et al (2019).

In [1]:
Copied!
use_gpu = False

import tensorly as tl
if use_gpu:
    tl.set_backend('pytorch')
use_gpu = False import tensorly as tl if use_gpu: tl.set_backend('pytorch')

Importing packages to use

In [2]:
Copied!
import cell2cell as c2c
import scanpy as sc
import sccellfie

import numpy as np
import pandas as pd

from tqdm.auto import tqdm

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import cell2cell as c2c import scanpy as sc import sccellfie import numpy as np import pandas as pd from tqdm.auto import tqdm import matplotlib.pyplot as plt import seaborn as sns %matplotlib inline
/Users/earmingol/miniforge3/envs/cell2cell/lib/python3.10/site-packages/dask/dataframe/__init__.py:31: FutureWarning: The legacy Dask DataFrame implementation is deprecated and will be removed in a future version. Set the configuration option `dataframe.query-planning` to `True` or None to enable the new Dask Dataframe implementation and silence this warning.
  warnings.warn(
/Users/earmingol/miniforge3/envs/cell2cell/lib/python3.10/site-packages/xarray_schema/__init__.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import DistributionNotFound, get_distribution
/Users/earmingol/miniforge3/envs/cell2cell/lib/python3.10/site-packages/anndata/utils.py:434: FutureWarning: Importing read_text from `anndata` is deprecated. Import anndata.io.read_text instead.
  warnings.warn(msg, FutureWarning)
In [3]:
Copied!
import warnings
warnings.filterwarnings('ignore')
import warnings warnings.filterwarnings('ignore')

IMPORTANT: In this notebook, the version 0.8.0 of cell2cell is used. Older versions do not implement the Coupled Tensor Component Analysis (CTCA).

1 - Load Data¶

Specify folders where the data is located, and where the outputs will be written:

In [4]:
Copied!
import os

data_folder = './data/'
directory = os.fsencode(data_folder)

output_folder = './results/scCellFie/'
if not os.path.isdir(output_folder):
    os.makedirs(output_folder)
import os data_folder = './data/' directory = os.fsencode(data_folder) output_folder = './results/scCellFie/' if not os.path.isdir(output_folder): os.makedirs(output_folder)

RNA-seq data¶

A preprocessed version of the original paper (Trujillo et al, 2019) can be found here

In [5]:
Copied!
rnaseq = sc.read_h5ad(data_folder + '/annotated_seurat_norm_harmony_2022.h5ad')
rnaseq = sc.read_h5ad(data_folder + '/annotated_seurat_norm_harmony_2022.h5ad')
In [6]:
Copied!
rnaseq = rnaseq.raw.to_adata()
rnaseq = rnaseq.raw.to_adata()
In [7]:
Copied!
rnaseq.obs.head(2)
rnaseq.obs.head(2)
Out[7]:
orig.ident nCount_RNA nFeature_RNA percent.mt RNA_snn_res.0.5 seurat_clusters celltype old.ident
Month-01_AAACCTGAGAATTGTG-1 Month-01 6417.0 2431 2.898551 11 11 Other 11
Month-01_AAACCTGAGCAACGGT-1 Month-01 8522.0 2464 2.663694 6 6 Progenitor 6

Ligand-Receptor pairs¶

Different databases of ligand-receptor interactions could be used. We previously created a repository that includes many available DBs (https://github.com/LewisLabUCSD/Ligand-Receptor-Pairs). In this tutorial, we employ the ligand-receptor pairs from CellChat (https://doi.org/10.1038/s41467-021-21246-9), which includes multimeric protein complexes.

In [8]:
Copied!
lr_pairs = pd.read_csv('https://raw.githubusercontent.com/LewisLabUCSD/Ligand-Receptor-Pairs/master/Human/Human-2020-Jin-LR-pairs.csv')
lr_pairs = lr_pairs.astype(str)
lr_pairs = pd.read_csv('https://raw.githubusercontent.com/LewisLabUCSD/Ligand-Receptor-Pairs/master/Human/Human-2020-Jin-LR-pairs.csv') lr_pairs = lr_pairs.astype(str)
In [9]:
Copied!
lr_pairs.head(2)
lr_pairs.head(2)
Out[9]:
interaction_name pathway_name ligand receptor agonist antagonist co_A_receptor co_I_receptor evidence annotation interaction_name_2 ligand_symbol receptor_symbol ligand_ensembl receptor_ensembl interaction_symbol interaction_ensembl
0 TGFB1_TGFBR1_TGFBR2 TGFb TGFB1 TGFbR1_R2 TGFb agonist TGFb antagonist nan TGFb inhibition receptor KEGG: hsa04350 Secreted Signaling TGFB1 - (TGFBR1+TGFBR2) TGFB1 TGFBR1&TGFBR2 ENSG00000105329 ENSG00000106799&ENSG00000163513 TGFB1^TGFBR1&TGFBR2 ENSG00000105329^ENSG00000106799&ENSG00000163513
1 TGFB2_TGFBR1_TGFBR2 TGFb TGFB2 TGFbR1_R2 TGFb agonist TGFb antagonist nan TGFb inhibition receptor KEGG: hsa04350 Secreted Signaling TGFB2 - (TGFBR1+TGFBR2) TGFB2 TGFBR1&TGFBR2 ENSG00000092969 ENSG00000106799&ENSG00000163513 TGFB2^TGFBR1&TGFBR2 ENSG00000092969^ENSG00000106799&ENSG00000163513
In [10]:
Copied!
# interaction columns:
int_columns = ('ligand_symbol', 'receptor_symbol')
# interaction columns: int_columns = ('ligand_symbol', 'receptor_symbol')

2 - Data Preprocessing¶

RNA-seq for CCC¶

Organize data to create tensor

First, define contexts

In [11]:
Copied!
contexts = ['Month-01', 'Month-03', 'Month-06', 'Month-10']
contexts = ['Month-01', 'Month-03', 'Month-06', 'Month-10']

Generate list of RNA-seq data containing all contexts (time points)

Here, we convert the single-cell expression levels into cell-type expression levels. A basic context-wise preprocessing is performed here, keeping only genes that are expressed at least in 4 single cells, and cell types per context with at least 20 cells.

Important: A list of aggregated gene expression matrices must be used for building the 4D-communication tensor. Each of the gene expression matrices is for one of the contexts, following the same order as the previouslist (contexts).

The aggregation here corresponds to mean expression value across cells of each cell type. Other aggregation methods could be used by changing the parameter method='average' in the aggregate_single_cell() function. Additionally, other gene expression values could be passed, using other preprocessing approaches such as log(x+1) or batch correction methods.

In [12]:
Copied!
rnaseq_matrices = []

for context in tqdm(contexts):
    meta_context = rnaseq.obs.loc[(rnaseq.obs['orig.ident'] == context) \
                                  & (rnaseq.obs['celltype'] != 'Other') & (rnaseq.obs['celltype'] != 'MC')]
    
    # Remove celltype per context that has few single cells
    min_sc_number = 20
    excluded_sc = []
    for idx, row in (meta_context.groupby(['celltype'])[['celltype']].count() >= min_sc_number).iterrows():
        if ~row['celltype']:
            excluded_sc.append(idx)
        
    meta_context = meta_context.loc[~meta_context['celltype'].isin(excluded_sc)]
    cells = list(meta_context.index)

    meta_context.index.name = 'barcode'
    tmp_data = rnaseq[cells]
    # Keep genes in each sample with at least 4 single cells expressing it
    sc.pp.filter_genes(tmp_data, min_cells=4)
    
    # Normalize and Log1p
    sc.pp.normalize_total(tmp_data, target_sum=1e6)
    sc.pp.log1p(tmp_data)

    # Aggregate gene expression of single cells into cell types
    exp_df = c2c.preprocessing.aggregate_single_cells(rnaseq_data=tmp_data.to_df(),
                                                      metadata=meta_context,
                                                      barcode_col='barcode',
                                                      celltype_col='celltype',
                                                      method='average',
                                                     )

    rnaseq_matrices.append(exp_df)
rnaseq_matrices = [] for context in tqdm(contexts): meta_context = rnaseq.obs.loc[(rnaseq.obs['orig.ident'] == context) \ & (rnaseq.obs['celltype'] != 'Other') & (rnaseq.obs['celltype'] != 'MC')] # Remove celltype per context that has few single cells min_sc_number = 20 excluded_sc = [] for idx, row in (meta_context.groupby(['celltype'])[['celltype']].count() >= min_sc_number).iterrows(): if ~row['celltype']: excluded_sc.append(idx) meta_context = meta_context.loc[~meta_context['celltype'].isin(excluded_sc)] cells = list(meta_context.index) meta_context.index.name = 'barcode' tmp_data = rnaseq[cells] # Keep genes in each sample with at least 4 single cells expressing it sc.pp.filter_genes(tmp_data, min_cells=4) # Normalize and Log1p sc.pp.normalize_total(tmp_data, target_sum=1e6) sc.pp.log1p(tmp_data) # Aggregate gene expression of single cells into cell types exp_df = c2c.preprocessing.aggregate_single_cells(rnaseq_data=tmp_data.to_df(), metadata=meta_context, barcode_col='barcode', celltype_col='celltype', method='average', ) rnaseq_matrices.append(exp_df)
  0%|          | 0/4 [00:00<?, ?it/s]

rnaseq_matrices is the list we will use for building the tensor based on protein-based ligand-receptor pairs.

LR pairs¶

Remove bidirectionality in the list of protein-based ligand-receptor pairs. That is, remove repeated interactions where both interactions are the same but in different order:

From this list:

Ligand Receptor
Protein A Protein B
Protein B Protein A

We will have:

Ligand Receptor
Protein A Protein B
In [13]:
Copied!
lr_pairs = c2c.preprocessing.ppi.remove_ppi_bidirectionality(ppi_data=lr_pairs, 
                                                             interaction_columns=int_columns
                                                             )
lr_pairs = c2c.preprocessing.ppi.remove_ppi_bidirectionality(ppi_data=lr_pairs, interaction_columns=int_columns )
Removing bidirectionality of PPI network
In [14]:
Copied!
lr_pairs.shape
lr_pairs.shape
Out[14]:
(1988, 17)

Generate a dictionary with function info for each LR pairs.

Keys are LIGAND_NAME^RECEPTOR_NAME (this is the same nomenclature that will be used when building the protein-based 4D-communication tensor later), and values are the function in the annotation column in the dataframe containing ligand-receptor pairs. Other functional annotations of the LR pairs could be used if available.

In [15]:
Copied!
ppi_functions = dict()

for idx, row in lr_pairs.iterrows():
    ppi_functions[row['interaction_symbol']] = row['annotation']
ppi_functions = dict() for idx, row in lr_pairs.iterrows(): ppi_functions[row['interaction_symbol']] = row['annotation']

3 - Predict metabolic activities with scCellFie¶

Here, the python tool scCellFie to predict metabolic activities based on the gene expression of enzymes that are part of a specific metabolic task, as indicated in this tutorial.

We only need the raw counts in the single-cell data to make the predictions, as scCellFie performs all necessary processing. However, to perform a smoothing based on neighbors, we will compute the neighbors based on the harmony components:

In [16]:
Copied!
sc.pp.neighbors(rnaseq, use_rep='X_harmony')
sc.pp.neighbors(rnaseq, use_rep='X_harmony')
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
In [17]:
Copied!
rnaseq
rnaseq
Out[17]:
AnnData object with n_obs × n_vars = 15973 × 21042
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'RNA_snn_res.0.5', 'seurat_clusters', 'celltype', 'old.ident'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
    uns: 'neighbors'
    obsm: 'X_harmony', 'X_pca', 'X_umap'
    obsp: 'distances', 'connectivities'
In [18]:
Copied!
results = sccellfie.run_sccellfie_pipeline(rnaseq,
                                           organism='human',
                                           sccellfie_data_folder=None,
                                           n_counts_col='nCount_RNA', # Total counts per cell will be computed if left as None,
                                           process_by_group=False, # Whether to do the processing by cell groups
                                           groupby=None, # Column indicating cell groups if `process_by_group=True`
                                           neighbors_key='neighbors', # Neighbors information if precomputed. Otherwise, it will be computed here
                                           n_neighbors=10, # Number of neighbors to use
                                           batch_key=None, # there is no batch_key in this dataset
                                           threshold_key='sccellfie_threshold',  # This is for using the default database. If personalized thresholds are used, specificy column name
                                           smooth_cells=True, # Whether to perform gene expression smoothing before running the tool
                                           alpha=0.33, # Importance of neighbors' expression for the smoothing (0 to 1)
                                           chunk_size=5000, # Number of chunks to run the processing steps (helps with the memory)
                                           disable_pbar=False,
                                           save_folder=None, # In case results will be saved. If so, results will not be returned and should be loaded from the folder (see sccellfie.io.load_data function
                                           save_filename=None # Name for saving the files, otherwise a default name will be used
                                          )
results = sccellfie.run_sccellfie_pipeline(rnaseq, organism='human', sccellfie_data_folder=None, n_counts_col='nCount_RNA', # Total counts per cell will be computed if left as None, process_by_group=False, # Whether to do the processing by cell groups groupby=None, # Column indicating cell groups if `process_by_group=True` neighbors_key='neighbors', # Neighbors information if precomputed. Otherwise, it will be computed here n_neighbors=10, # Number of neighbors to use batch_key=None, # there is no batch_key in this dataset threshold_key='sccellfie_threshold', # This is for using the default database. If personalized thresholds are used, specificy column name smooth_cells=True, # Whether to perform gene expression smoothing before running the tool alpha=0.33, # Importance of neighbors' expression for the smoothing (0 to 1) chunk_size=5000, # Number of chunks to run the processing steps (helps with the memory) disable_pbar=False, save_folder=None, # In case results will be saved. If so, results will not be returned and should be loaded from the folder (see sccellfie.io.load_data function save_filename=None # Name for saving the files, otherwise a default name will be used )
==== scCellFie Pipeline: Initializing ====
Loading scCellFie database for organism: human

==== scCellFie Pipeline: Processing entire dataset ====

---- scCellFie Step: Preprocessing data ----

---- scCellFie Step: Preparing inputs ----
Gene names corrected to match database: 22
Shape of new adata object: (15973, 838)
Number of GPRs: 764
Shape of tasks by genes: (218, 838)
Shape of reactions by genes: (764, 838)
Shape of tasks by reactions: (218, 764)

---- scCellFie Step: Smoothing gene expression ----
Smoothing Expression: 100%|███████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.33it/s]
---- scCellFie Step: Computing gene scores ----

---- scCellFie Step: Computing reaction activity ----
Cell Rxn Activities: 100%|███████████████████████████████████████████████████| 15973/15973 [01:39<00:00, 159.88it/s]
---- scCellFie Step: Computing metabolic task activity ----
Removed 0 metabolic tasks with zeros across all cells.

==== scCellFie Pipeline: Processing completed successfully ====

The results can be found here:

In [19]:
Copied!
results.keys()
results.keys()
Out[19]:
dict_keys(['adata', 'gpr_rules', 'task_by_gene', 'rxn_by_gene', 'task_by_rxn', 'rxn_info', 'task_info', 'thresholds', 'organism'])

To access metabolic activities, we need to inspect results['adata']:

  • The processed single-cell data is located in the AnnData object results['adata'].

  • The reaction activities for each cell are located in the AnnData object results['adata'].reactions.

  • The metabolic task activities for each cell are located in the AnnData object results['adata'].metabolic_tasks.

In [20]:
Copied!
met_activities = results['adata'].metabolic_tasks.copy()
met_activities = results['adata'].metabolic_tasks.copy()

We will scale these activities with respect to the ranges previously computed across the CZI CELLxGENE Human Cell Atlas

In [21]:
Copied!
minmax = pd.read_csv('https://raw.githubusercontent.com/ventolab/sccellfie-website/refs/heads/main/data/CELLxGENEMetabolicTasksMinMax.csv', index_col=0)
minmax
minmax = pd.read_csv('https://raw.githubusercontent.com/ventolab/sccellfie-website/refs/heads/main/data/CELLxGENEMetabolicTasksMinMax.csv', index_col=0) minmax
Out[21]:
(R)-3-Hydroxybutanoate synthesis 3'-Phospho-5'-adenylyl sulfate synthesis AMP salvage from adenine ATP generation from glucose (hypoxic conditions) - glycolysis ATP regeneration from glucose (normoxic conditions) - glycolysis + krebs cycle Acetoacetate synthesis Alanine degradation Alanine synthesis Arachidonate degradation Arachidonate synthesis ... Valine to succinyl-coA Vesicle secretion beta-Alanine degradation beta-Alanine synthesis cis-vaccenic acid degradation cis-vaccenic acid synthesis gamma-Linolenate degradation gamma-Linolenate synthesis glyco-cholate synthesis tauro-cholate synthesis
single_cell_min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
single_cell_max 7.419829 11.795217 20.651447 13.613079 4.979552 8.038148 3.030361 7.951063 1.752682 4.296328 ... 6.424888 2.780753 2.980583 6.203485 2.161476 4.050148 2.341566 7.578767 2.982760 2.982760
cell_type_min 0.041299 0.000000 0.000000 0.033310 0.024252 0.023852 0.028060 0.025760 0.000000 0.021624 ... 0.000000 0.000000 0.025981 0.029364 0.000000 0.013848 0.000000 0.000000 0.000000 0.000000
cell_type_max 4.403184 4.105401 5.187688 7.840590 3.159272 4.670686 1.874831 4.480565 0.854841 2.594763 ... 1.313472 0.798270 1.899791 3.463882 1.151658 2.225383 1.263210 1.716187 0.941299 0.941299

4 rows × 218 columns

We divide each task value by the max value found across the human cell atlas, so we scale them between 0 and 1:

In [22]:
Copied!
met_activities.X = np.clip(met_activities.X / minmax.loc['single_cell_max', :].values.reshape(-1), a_min=0., a_max=1.0)
met_activities.X = np.clip(met_activities.X / minmax.loc['single_cell_max', :].values.reshape(-1), a_min=0., a_max=1.0)

Similar to the CCC part where we prepared the gene expression data, here we convert the single-cell metabolic activities into cell-type activity levels.

A basic context-wise preprocessing is performed here, keeping only cell types per context with at least 20 cells.

In [23]:
Copied!
met_matrices = []

for context in tqdm(contexts):
    meta_context = rnaseq.obs.loc[(rnaseq.obs['orig.ident'] == context) \
                                  & (rnaseq.obs['celltype'] != 'Other') & (rnaseq.obs['celltype'] != 'MC')]
    
    # Remove celltype per context that has few single cells
    min_sc_number = 20
    excluded_sc = []
    for idx, row in (meta_context.groupby(['celltype'])[['celltype']].count() >= min_sc_number).iterrows():
        if ~row['celltype']:
            excluded_sc.append(idx)
        
    meta_context = meta_context.loc[~meta_context['celltype'].isin(excluded_sc)]
    cells = list(meta_context.index)

    meta_context.index.name = 'barcode'
    tmp_data = met_activities[cells]

    # Aggregate gene expression of single cells into cell types, using the trimean
    exp_df = sccellfie.expression.aggregation.agg_expression_cells(met_activities,
                                                                   groupby='celltype', 
                                                                   agg_func='trimean')

    met_matrices.append(exp_df)
met_matrices = [] for context in tqdm(contexts): meta_context = rnaseq.obs.loc[(rnaseq.obs['orig.ident'] == context) \ & (rnaseq.obs['celltype'] != 'Other') & (rnaseq.obs['celltype'] != 'MC')] # Remove celltype per context that has few single cells min_sc_number = 20 excluded_sc = [] for idx, row in (meta_context.groupby(['celltype'])[['celltype']].count() >= min_sc_number).iterrows(): if ~row['celltype']: excluded_sc.append(idx) meta_context = meta_context.loc[~meta_context['celltype'].isin(excluded_sc)] cells = list(meta_context.index) meta_context.index.name = 'barcode' tmp_data = met_activities[cells] # Aggregate gene expression of single cells into cell types, using the trimean exp_df = sccellfie.expression.aggregation.agg_expression_cells(met_activities, groupby='celltype', agg_func='trimean') met_matrices.append(exp_df)
  0%|          | 0/4 [00:00<?, ?it/s]
In [24]:
Copied!
met_matrices[0]
met_matrices[0]
Out[24]:
Task (R)-3-Hydroxybutanoate synthesis 3'-Phospho-5'-adenylyl sulfate synthesis AMP salvage from adenine ATP generation from glucose (hypoxic conditions) - glycolysis ATP regeneration from glucose (normoxic conditions) - glycolysis + krebs cycle Acetoacetate synthesis Alanine degradation Alanine synthesis Arachidonate degradation Arachidonate synthesis ... Valine to succinyl-coA Vesicle secretion beta-Alanine degradation beta-Alanine synthesis cis-vaccenic acid degradation cis-vaccenic acid synthesis gamma-Linolenate degradation gamma-Linolenate synthesis glyco-cholate synthesis tauro-cholate synthesis
GABAergic 0.257290 0.070321 0.016719 0.248574 0.289472 0.257290 0.316788 0.251865 0.122373 0.317457 ... 0.024489 0.088211 0.349128 0.258951 0.183256 0.262169 0.175722 0.041018 0.017065 0.017065
Glia 0.228816 0.017635 0.033727 0.215604 0.267410 0.227830 0.336807 0.231558 0.161978 0.301874 ... 0.033127 0.068634 0.364833 0.233392 0.210315 0.257721 0.202597 0.049065 0.034382 0.034382
Glutamatergic 0.185345 0.064322 0.025364 0.173935 0.232887 0.184854 0.271598 0.184064 0.127847 0.251595 ... 0.024005 0.070838 0.304126 0.193836 0.175435 0.211203 0.169420 0.043661 0.019224 0.019224
IP 0.222547 0.052597 0.027519 0.213665 0.270883 0.222173 0.310499 0.222034 0.138435 0.299648 ... 0.024196 0.081357 0.348271 0.224283 0.204476 0.246714 0.195585 0.040729 0.022202 0.022202
MC 0.176754 0.014631 0.020480 0.168017 0.204469 0.175424 0.274017 0.175904 0.115840 0.236990 ... 0.029954 0.058975 0.300923 0.181178 0.175457 0.199442 0.165586 0.043496 0.018289 0.018289
Other 0.136290 0.022540 0.034058 0.128478 0.162129 0.136080 0.209196 0.134473 0.086839 0.178832 ... 0.031152 0.055257 0.232028 0.137534 0.123759 0.149276 0.118593 0.025574 0.017805 0.017805
Progenitor 0.142688 0.011824 0.060387 0.132497 0.168674 0.142371 0.215409 0.141622 0.105195 0.191222 ... 0.040749 0.068754 0.249974 0.149202 0.142666 0.151951 0.138825 0.012511 0.032817 0.032817

7 rows × 218 columns

4 - Build Tensors¶

Build 4D-Communication Tensor¶

Here we use as input the list of gene expression matrices (rnaseq_matrices) that were aggregated into a cell-type granularity. This list contains the expression matrices of all samples/contexts.

The following functions perform all the steps to build a 4D-communication tensor:

4dtensor

The parameter communication_score indicates the scoring function to use.

how='outer_cells' considers only LR pairs that are present in all contexts, while all cell types in the whole dataset are kept, regardless they appear only in one or a few contexts.

complex_sep='&' is used to specify that the list of ligand-receptor pairs contains protein complexes and that subunits are separated by '&'. If the list does not have complexes, use complex_sep=None instead.

In [25]:
Copied!
tensor_lr = c2c.tensor.InteractionTensor(rnaseq_matrices=rnaseq_matrices,
                                         ppi_data=lr_pairs,
                                         context_names=contexts,
                                         how='outer_cells',
                                         complex_sep='&',
                                         interaction_columns=int_columns,
                                         communication_score='expression_gmean',
                                         order_labels=['Contexts', 'Protein LRIs', 'Sender Cells', 'Receiver Cells'],
                                         device='cuda' if use_gpu else 'cpu'
                                        )
tensor_lr = c2c.tensor.InteractionTensor(rnaseq_matrices=rnaseq_matrices, ppi_data=lr_pairs, context_names=contexts, how='outer_cells', complex_sep='&', interaction_columns=int_columns, communication_score='expression_gmean', order_labels=['Contexts', 'Protein LRIs', 'Sender Cells', 'Receiver Cells'], device='cuda' if use_gpu else 'cpu' )
Getting expression values for protein complexes
Building tensor for the provided context
In [26]:
Copied!
tensor_lr.tensor.shape
tensor_lr.tensor.shape
Out[26]:
(4, 422, 5, 5)

Build 3D Metabolic Task Activity Tensor¶

To build our 3D tensor with dimensions as contexts, receiver cells, and metabolic tasks, we need to make sure that all cell types in the 4D tensor are present in each matrix (for those missing we will assign a NaN value).

We get the list of cell types:

In [27]:
Copied!
celltypes = tensor_lr.order_names[-1]
celltypes = tensor_lr.order_names[-1]

Then we reindex all matrices with metabolic activities (this will generate missing rows (index), and fill them with NaNs):

In [28]:
Copied!
met_activities_nan = [met.reindex(index=celltypes) for met in met_matrices]
met_activities_nan = [met.reindex(index=celltypes) for met in met_matrices]

Now we can use these scCellFie outputs directly to generate our 3D metabolic activity tensor. First we generate just the tensor containing the values using tensorly:

In [29]:
Copied!
tensor_3d = tl.tensor(met_activities_nan)
tensor_3d.shape
tensor_3d = tl.tensor(met_activities_nan) tensor_3d.shape
Out[29]:
(4, 5, 218)

We then obtain the name of metabolic task present in the data:

In [30]:
Copied!
met_names = met_activities_nan[0].columns.tolist()
met_names = met_activities_nan[0].columns.tolist()

Now we take this tensor to generate an InteractionTensor object using cell2cell:

In [31]:
Copied!
tensor_met = c2c.tensor.PreBuiltTensor(tensor_3d, 
                                       order_names=[contexts, celltypes, met_names],
                                       order_labels=['Contexts', 'Receiver Cells', 'Metabolic Tasks']
                                      )
tensor_met = c2c.tensor.PreBuiltTensor(tensor_3d, order_names=[contexts, celltypes, met_names], order_labels=['Contexts', 'Receiver Cells', 'Metabolic Tasks'] )
In [32]:
Copied!
tensor_met.tensor.shape
tensor_met.tensor.shape
Out[32]:
(4, 5, 218)

5 - Generate coupled tensors¶

After constructing our 4D communication tensor and 3D metabolic task activity tensor, we can create a CoupledInteractionTensor object in Tensor-cell2cell v2. This object will later be used to perform CTCA.

The key step here is to define which dimensions should be coupled between the tensors. In this example:

  • Context dimensions are coupled directly, since they are identical across both tensors.
  • Cell type dimension (in the metabolic task tensor) can be coupled either with the sender or receiver dimension (in the communication tensor):
    • Couple to sender: Metabolic task activity is aligned with the cells producing the ligands of each LR pair. This suggests that the metabolic state of sender cells could influence or coordinate with their ability to produce signaling molecules.
    • Couple to receiver: Metabolic task activity is aligned with the cells receiving the signal through the receptor. This suggests that metabolic programs may be regulated downstream of the receptors, capturing how cell-cell communication drives metabolic changes in the receiving cells.

In this tutorial, we choose the second option (coupling to the receiver). This means we focus on metabolic task activity that is potentially triggered by LR interactions within the CCC program.

Concretely, the context dimension corresponds to dim 0 in both tensors, while the receiver cell dimension is dim 3 in the 4D communication tensor and dim 2 in the 3D metabolic task tensor.

In [33]:
Copied!
coupled_tensor = c2c.tensor.CoupledInteractionTensor(tensor1=tensor_lr,
                                                     tensor2=tensor_met,
                                                     tensor1_name='cell2cell',
                                                     tensor2_name='scCellFie',
                                                      # Below we specify how dims of tensor1 are coupled with dims of tensor2
                                                     mode_mapping={'shared' : [(0,0), (3,1)]}
                                                    )
coupled_tensor = c2c.tensor.CoupledInteractionTensor(tensor1=tensor_lr, tensor2=tensor_met, tensor1_name='cell2cell', tensor2_name='scCellFie', # Below we specify how dims of tensor1 are coupled with dims of tensor2 mode_mapping={'shared' : [(0,0), (3,1)]} )

We can observe that this object includes both tensors simultaneously.

In [34]:
Copied!
coupled_tensor.shape
coupled_tensor.shape
Out[34]:
((4, 422, 5, 5), (4, 5, 218))

Assume that missing values are real zeros

Instead of imputing missing values with our method, we will assume that they are real zeros.

In [35]:
Copied!
coupled_tensor.mask1 = None
coupled_tensor.mask2 = None
coupled_tensor.mask1 = None coupled_tensor.mask2 = None

6 - Run Analysis¶

Perform tensor factorization¶

Next, we apply a non-negative canonical coupled component analysis (CTCA). Briefly, this tensor decomposition identifies a low-rank tensor (here, a rank of 10) for each modality's tensor that better approximates each original tensor. These low-rank tensors can be represented as the sum of a set of rank-1 tensors (10 of them in this case). Each rank-1 tensor represents a factor in the decomposition and can be further represented as the outer product of n vectors, where n represents the number of tensor dimensions. Each vector represents one of the n tensor dimensions for that factor and its values, corresponding to individual elements in each dimension, represent the factor loadings.

In the CTCA, we have vectors that are shared between the coupled tensors (here contexts and receiver cells), while we have other vectors that are private for each modality (LR pair and sender cells for the 4D communication tensor; metabolic tasks for the 3D metabolic activity tensor). The loadings shared between both tensors behave exactly the same in both modalities, representing that we are capturing LR pairs and metabolic tasks from each modality that present similar trends in each factor. If each tensor was decomposed separately with the standard TCA in Tensor-cell2cell v1, we would not ensure that the key LR pair and metabolic tasks for each factor would behave the same, as this could lead to loadings for the context and receiver cells with different behaviors between modalities.

To illustrate how performing the CTCA for identifying LR pairs acting coordinated with metabolic activities, we skipped the Elbow Analysis, which can be found in our previous tutorial for protein- and metabolite-based LR pairs. Instead, we run the factorization using a manual number of 10 factors:

In [36]:
Copied!
coupled_tensor.compute_tensor_factorization(rank=10, # This is just for illustrative purposes - to keep a low number of factors
                                            init='svd',
                                            random_state=888,
                                            runs=1, tol=1e-8, n_iter_max=500 # Robust version of tensor decomposition
                                           )
coupled_tensor.compute_tensor_factorization(rank=10, # This is just for illustrative purposes - to keep a low number of factors init='svd', random_state=888, runs=1, tol=1e-8, n_iter_max=500 # Robust version of tensor decomposition )

7 - Results¶

Metadata¶

Generate a list containing metadata for each tensor order/dimension - Later used for coloring factor plots

This is a list containing metadata for each dimension of the tensor (contexts, LR pairs, sender cells, receiver cells). Each metadata corresponds to a dataframe with info of each element in the respective dimension.

We start with the protein-based tensor.

In [37]:
Copied!
meta_tf = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_lr,
                                              metadata_dicts=[None, ppi_functions, None, None],
                                              fill_with_order_elements=True
                                             )
meta_tf = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_lr, metadata_dicts=[None, ppi_functions, None, None], fill_with_order_elements=True )

We continue with the metabolic activity tensor. We can find metabolic tasks metadata in results['task_info']:

In [38]:
Copied!
results['task_info'].head()
results['task_info'].head()
Out[38]:
Task System Subsystem
0 (R)-3-Hydroxybutanoate synthesis CARBOHYDRATES METABOLISM KETOGENESIS
1 3'-Phospho-5'-adenylyl sulfate synthesis NUCLEOTIDE METABOLISM COFACTOR
2 AMP salvage from adenine NUCLEOTIDE METABOLISM SALVAGE
3 ATP generation from glucose (hypoxic condition... ENERGY METABOLISM ATP GENERATION
4 ATP regeneration from glucose (normoxic condit... ENERGY METABOLISM ATP GENERATION

Here we will use categories at the System level:

In [39]:
Copied!
task_mapper = results['task_info'].set_index('Task')['System'].to_dict()
task_mapper = results['task_info'].set_index('Task')['System'].to_dict()
In [40]:
Copied!
met_functions = {task: task_mapper[task] for task in tensor_met.order_names[2]}
met_functions = {task: task_mapper[task] for task in tensor_met.order_names[2]}
In [41]:
Copied!
meta_tf2 = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_met,
                                              metadata_dicts=[None, None, met_functions],
                                              fill_with_order_elements=True
                                             )
meta_tf2 = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_met, metadata_dicts=[None, None, met_functions], fill_with_order_elements=True )

Generate unified metadata for CTCA results

By using the metadata generated separately for each tensor, we generate a new metadata that incorporates metadata for the shared and private dimensions between both modalities.

In [42]:
Copied!
new_meta_tf = coupled_tensor.reorder_metadata(meta_tf, meta_tf2)
new_meta_tf = coupled_tensor.reorder_metadata(meta_tf, meta_tf2)

Plot factor loadings¶

After performing the CTCA, a set of loadings is obtained in a factor-specific way, for each tensor. These loadings provide details of what elements of each dimension are important within each factor. Here we plot first the loadings for the shared dimensions, followed by the private dimensions of the first and second tensor (LR pairs and Metabolic Tasks, respectively).

We generate the order of LR pairs and metabolic tasks given their categories:

In [43]:
Copied!
lr_order = meta_tf[1].sort_values(['Category', 'Element'])['Element'].values.tolist()
met_order = meta_tf2[2].sort_values(['Category', 'Element'])['Element'].values.tolist()
lr_order = meta_tf[1].sort_values(['Category', 'Element'])['Element'].values.tolist() met_order = meta_tf2[2].sort_values(['Category', 'Element'])['Element'].values.tolist()

And color palettes for each of the dimensions:

In [44]:
Copied!
# Color palettes for each of the coupled tensor dimensions, ordered as order_sorting list below
cmaps = ['viridis', 'tab20', 'tab20', 'Dark2', 'tab20']
# Color palettes for each of the coupled tensor dimensions, ordered as order_sorting list below cmaps = ['viridis', 'tab20', 'tab20', 'Dark2', 'tab20']
In [45]:
Copied!
factors, axes = c2c.plotting.tensor_factors_plot(interaction_tensor=coupled_tensor,
                                                 metadata=new_meta_tf,
                                                 sample_col='Element',
                                                 group_col='Category',
                                                 meta_cmaps=cmaps,
                                                 # Order of dimensions
                                                 order_sorting=['Contexts', 'Sender Cells', 'Receiver Cells',
                                                                'Protein LRIs', 'Metabolic Tasks'
                                                               ],
                                                 # Order of elements
                                                 reorder_elements={'Protein LRIs' : lr_order,
                                                                   'Metabolic Tasks': met_order},
                                                 fontsize=14,
                                                 filename=output_folder + 'Tensor-Factorization.pdf'
                                                )
factors, axes = c2c.plotting.tensor_factors_plot(interaction_tensor=coupled_tensor, metadata=new_meta_tf, sample_col='Element', group_col='Category', meta_cmaps=cmaps, # Order of dimensions order_sorting=['Contexts', 'Sender Cells', 'Receiver Cells', 'Protein LRIs', 'Metabolic Tasks' ], # Order of elements reorder_elements={'Protein LRIs' : lr_order, 'Metabolic Tasks': met_order}, fontsize=14, filename=output_folder + 'Tensor-Factorization.pdf' )
No description has been provided for this image

Each factor represents a context-dependent communication pattern. In this visualization, each row is a factor and each column is a tensor dimension in that factor (Shared dimensions: contexts and receiver-cells; Private dimensions: LR pairs, sender-cells, and metabolic tasks).

The loadings represent the contribution of each element to that pattern, and their combinations between all dimensions represent the overall pattern. For example, in Factor 9, the identified communication pattern occurs in early stages, decreasing across time, where glial cells and progenitors play a major role as sender and as receivers. Then we can identify the top LR pairs and metabolic tasks contributing from cognate modalities. Thus, we can say that the receptors of these LR pairs act in coordination with the metabolic tasks in the receiver cells.

Top LR pairs and metabolic tasks¶

Top-10 LR pairs

In [46]:
Copied!
for i in range(coupled_tensor.rank):
    print(coupled_tensor.get_top_factor_elements('Protein LRIs', 'Factor {}'.format(i+1), 10))
    print('')
for i in range(coupled_tensor.rank): print(coupled_tensor.get_top_factor_elements('Protein LRIs', 'Factor {}'.format(i+1), 10)) print('')
MDK^NCL        0.447461
CDH2^CDH2      0.354247
CD99^CD99      0.293988
MDK^SDC2       0.288428
PTN^NCL        0.238056
MDK^LRP1       0.173126
PTN^SDC2       0.152314
MPZL1^MPZL1    0.135136
MDK^PTPRZ1     0.132628
GAS6^AXL       0.112833
Name: Factor 1, dtype: float64

MDK^NCL       0.468916
PTN^NCL       0.411515
MDK^SDC2      0.396334
CD99^CD99     0.338853
PTN^SDC2      0.335042
MDK^PTPRZ1    0.198042
CDH2^CDH2     0.173948
MDK^LRP1      0.169168
PTN^PTPRZ1    0.168926
MDK^SDC4      0.105507
Name: Factor 2, dtype: float64

PTN^PTPRZ1    0.531429
MDK^PTPRZ1    0.461338
PTN^NCL       0.439472
MDK^NCL       0.339584
CD99^CD99     0.184761
CDH2^CDH2     0.158804
PTN^SDC3      0.109892
PTN^SDC1      0.105353
MDK^SDC1      0.098751
MDK^LRP1      0.097240
Name: Factor 3, dtype: float64

NCAM1^NCAM1    0.420415
MDK^NCL        0.390320
PTN^NCL        0.345838
CADM1^CADM1    0.325055
CDH2^CDH2      0.295218
MDK^LRP1       0.169197
EFNB2^EPHA4    0.135251
PTN^SDC3       0.128014
NRXN1^NLGN2    0.121424
MPZL1^MPZL1    0.119317
Name: Factor 4, dtype: float64

PTN^PTPRZ1         0.449854
MDK^PTPRZ1         0.434368
PTN^SDC3           0.295742
CDH2^CDH2          0.226626
PTN^NCL            0.220143
MDK^NCL            0.192212
MDK^LRP1           0.164232
CD99^CD99          0.147850
MDK^ITGA6&ITGB1    0.140440
DLL1^NOTCH2        0.122669
Name: Factor 5, dtype: float64

MDK^PTPRZ1      0.478962
MDK^NCL         0.427506
CD99^CD99       0.425680
MDK^LRP1        0.192323
EFNA3^EPHA5     0.185464
EFNA3^EPHA4     0.174341
CADM1^CADM1     0.163048
EFNA3^EPHA7     0.138361
DLK1^NOTCH1     0.121665
LRRC4B^PTPRF    0.116205
Name: Factor 6, dtype: float64

MDK^NCL            0.348511
MDK^SDC2           0.325627
DLL3^NOTCH2        0.251937
DLK1^NOTCH2        0.247194
NCAM1^FGFR1        0.228189
MDK^ITGA6&ITGB1    0.208784
CDH2^CDH2          0.203821
CD99^CD99          0.179026
PTN^SDC2           0.176697
PTN^NCL            0.169592
Name: Factor 7, dtype: float64

PTN^PTPRZ1         0.335747
NCAM1^NCAM1        0.316447
NCAM1^FGFR1        0.303607
MDK^PTPRZ1         0.234244
NRXN1^NLGN1        0.213643
CADM1^CADM1        0.196564
MDK^ITGA6&ITGB1    0.195380
DLL3^NOTCH2        0.195363
NRXN1^NLGN3        0.192741
NRXN1^NLGN2        0.185744
Name: Factor 8, dtype: float64

MDK^ITGA6&ITGB1       0.359098
DLK1^NOTCH2           0.218617
MDK^SDC2              0.180041
COL4A5^ITGAV&ITGB8    0.178171
CD99^CD99             0.174371
DLK1^NOTCH3           0.163681
COL4A6^ITGAV&ITGB8    0.154054
COL6A1^ITGAV&ITGB8    0.145910
GAS6^AXL              0.142663
LAMB2^ITGA6&ITGB1     0.126305
Name: Factor 9, dtype: float64

NCAM1^NCAM1      0.411756
MDK^NCL          0.245947
CADM1^CADM1      0.227029
CADM3^CADM3      0.199214
L1CAM^L1CAM      0.195029
PTN^SDC3         0.184067
PTN^NCL          0.173581
SEMA6A^PLXNA4    0.163215
MDK^LRP1         0.158855
EFNB3^EPHB6      0.157455
Name: Factor 10, dtype: float64

Top 10 Metabolic Tasks

In [47]:
Copied!
for i in range(coupled_tensor.rank):
    print(coupled_tensor.get_top_factor_elements('Metabolic Tasks', 'Factor {}'.format(i+1), 10))
    print('')
for i in range(coupled_tensor.rank): print(coupled_tensor.get_top_factor_elements('Metabolic Tasks', 'Factor {}'.format(i+1), 10)) print('')
Ceramide synthesis                     0.174836
Sphingomyelin synthesis                0.159966
beta-Alanine degradation               0.155451
Proline degradation                    0.154990
Phosphatidyl-choline synthesis         0.152423
Phosphatidyl-ethanolamine synthesis    0.151645
Phosphatidyl-serine synthesis          0.150507
Ornithine degradation                  0.149437
Valine degradation                     0.144739
Arginine degradation                   0.142601
Name: Factor 1, dtype: float64

Ceramide synthesis                                                              0.200701
Isoleucine degradation                                                          0.165953
Synthesis of galactosyl glucosyl ceramide (link with ganglioside metabolism)    0.163736
Ornithine degradation                                                           0.153292
beta-Alanine degradation                                                        0.151075
Phosphatidyl-ethanolamine synthesis                                             0.150496
Proline degradation                                                             0.149675
Phosphatidyl-choline synthesis                                                  0.149418
Lysine degradation                                                              0.148358
Threonine degradation                                                           0.147263
Name: Factor 2, dtype: float64

Proline degradation                    0.195586
beta-Alanine degradation               0.192549
Ornithine degradation                  0.188544
Phosphatidyl-ethanolamine synthesis    0.183235
Phosphatidyl-choline synthesis         0.182657
Phosphatidyl-serine synthesis          0.178943
Threonine degradation                  0.178147
Lysine degradation                     0.171664
Glutamate degradation                  0.168758
Leucine degradation                    0.165325
Name: Factor 3, dtype: float64

Ceramide synthesis                                                              0.263489
Sphingomyelin synthesis                                                         0.226176
Synthesis of galactosyl glucosyl ceramide (link with ganglioside metabolism)    0.210807
Isoleucine degradation                                                          0.206439
Valine degradation                                                              0.194876
Alanine degradation                                                             0.178883
(R)-3-Hydroxybutanoate synthesis                                                0.157006
Glycine synthesis                                                               0.150121
Triacylglycerol synthesis                                                       0.149223
Aspartate synthesis                                                             0.142947
Name: Factor 4, dtype: float64

Sphingomyelin synthesis                                                         0.196100
Valine degradation                                                              0.178646
Ceramide synthesis                                                              0.167458
Alanine degradation                                                             0.165018
Synthesis of galactosyl glucosyl ceramide (link with ganglioside metabolism)    0.139448
Isoleucine degradation                                                          0.138287
Ornithine degradation                                                           0.137874
Phosphatidyl-serine synthesis                                                   0.137472
Arginine degradation                                                            0.135105
Proline degradation                                                             0.134954
Name: Factor 5, dtype: float64

Sphingomyelin synthesis                                          0.357071
Valine degradation                                               0.342053
Alanine degradation                                              0.330187
Aspartate synthesis                                              0.246301
ATP generation from glucose (hypoxic conditions) - glycolysis    0.235262
Cardiolipin synthesis                                            0.215084
Synthesis of globoside (link with globoside metabolism)          0.209443
Glycine degradation                                              0.184424
Linolenate degradation                                           0.178612
Guanosine triphosphate synthesis (GTP)                           0.176691
Name: Factor 6, dtype: float64

Conversion of glutamate to glutamine                                                                                       0.369020
Synthesis of taurine from cysteine                                                                                         0.316840
Keratan sulfate degradation                                                                                                0.271127
Degradation of adenine to urate                                                                                            0.263173
Fructose to glucose conversion (via fructose-6-phosphate)                                                                  0.262595
AMP salvage from adenine                                                                                                   0.259930
Homocysteine synthesis (need methionine)                                                                                   0.219605
Cysteine synthesis (need serine and methionine)                                                                            0.211448
Synthesis of creatine from arginine                                                                                        0.207733
Biosynthesis of core4 (N-Acetyl-beta-D-glucosaminyl-1,6-(N-acetyl-beta-D-glucosaminyl-1,3)-N-acetyl-D-galactosaminyl-R)    0.206311
Name: Factor 7, dtype: float64

Sphingomyelin synthesis                         0.367898
Cysteine degradation                            0.340516
Palmitolate degradation                         0.306788
Palmitate degradation                           0.294419
Elaidate degradation                            0.280084
Deoxyguanosine triphosphate synthesis (dGTP)    0.254476
Guanosine triphosphate synthesis (GTP)          0.253583
Valine degradation                              0.229901
Cardiolipin synthesis                           0.223260
Alanine degradation                             0.198617
Name: Factor 8, dtype: float64

Synthesis of taurine from cysteine                            0.624341
Keratan sulfate biosynthesis from N-glycan                    0.373960
Synthesis of coenzyme A                                       0.263272
Keratan sulfate biosynthesis from O-glycan (core 4-linked)    0.220560
Keratan sulfate biosynthesis from O-glycan (core 2-linked)    0.220560
UDP-glucose synthesis                                         0.199930
Heme synthesis                                                0.197824
Synthesis of aspartate from glutamine                         0.188478
O-linked glycosylation                                        0.177559
Branching (N-acetylglucosaminyltransferases)                  0.172314
Name: Factor 9, dtype: float64

Conversion of asparate to asparagine                                            0.260433
Sphingomyelin synthesis                                                         0.248437
Synthesis of galactosyl glucosyl ceramide (link with ganglioside metabolism)    0.247289
Degradation of cytosine                                                         0.199181
Degradation of uracil                                                           0.199181
3'-Phospho-5'-adenylyl sulfate synthesis                                        0.190187
Methionine degradation                                                          0.188159
Arachidonate synthesis                                                          0.182839
Triacylglycerol synthesis                                                       0.181313
Ornithine degradation                                                           0.172243
Name: Factor 10, dtype: float64

Export Loadings¶

THESE VALUES CAN BE USED FOR DOWNSTREAM ANALYSES

In [48]:
Copied!
coupled_tensor.export_factor_loadings(output_folder + 'Coupled-Loadings.xlsx')
coupled_tensor.export_factor_loadings(output_folder + 'Coupled-Loadings.xlsx')
Coupled tensor factor loadings saved to ./results/scCellFie/Coupled-Loadings.xlsx

8 - Downstream Analyses¶

Generate factor-specific networks¶

We can generate factor-specific communication networks connecting sender-to-receiver cells.

In [49]:
Copied!
networks = c2c.analysis.tensor_downstream.get_factor_specific_ccc_networks(coupled_tensor.factors, 
                                                                           sender_label='Sender Cells',
                                                                           receiver_label='Receiver Cells')
networks = c2c.analysis.tensor_downstream.get_factor_specific_ccc_networks(coupled_tensor.factors, sender_label='Sender Cells', receiver_label='Receiver Cells')

Generate matrix of cell-cell pairs by factors

In [50]:
Copied!
network_by_factors = c2c.analysis.tensor_downstream.flatten_factor_ccc_networks(networks, orderby='receivers')
network_by_factors = c2c.analysis.tensor_downstream.flatten_factor_ccc_networks(networks, orderby='receivers')

Select cell-cell pairs with high potential of interaction

In [51]:
Copied!
_ = plt.hist(network_by_factors.values.flatten(), bins = 20)
_ = plt.hist(network_by_factors.values.flatten(), bins = 20)
No description has been provided for this image
In [52]:
Copied!
# To consider only cell pairs with high potential to interact in each factor
cci_threshold = 0.2
# To consider only cell pairs with high potential to interact in each factor cci_threshold = 0.2

Visualize networks of factors with significant differences between groups

In [53]:
Copied!
net_fig, net_ax = c2c.plotting.ccc_networks_plot(coupled_tensor.factors,
                                                 included_factors=None,
                                                 ccc_threshold=cci_threshold, # Only important communication
                                                 nrows=2,
                                                 panel_size=(8, 8),
                                                 network_layout='circular',
                                                 filename=output_folder + 'Networks.pdf'
                                                )
net_fig, net_ax = c2c.plotting.ccc_networks_plot(coupled_tensor.factors, included_factors=None, ccc_threshold=cci_threshold, # Only important communication nrows=2, panel_size=(8, 8), network_layout='circular', filename=output_folder + 'Networks.pdf' )
No description has been provided for this image

Clustermaps¶

Similarly, we can visualize the contribution of each dimension element across factors by plotting their loadings

In [54]:
Copied!
factor_sorted = ['Factor {}'.format(f) for f in range(1, coupled_tensor.rank+1)]
factor_sorted = ['Factor {}'.format(f) for f in range(1, coupled_tensor.rank+1)]

Cluster cell-cell pairs by their potential across factors

Here we display the outer product between the sender and receiver cell dimensions, which are the value governing the factor-specific networks above.

In [55]:
Copied!
cci_cm = c2c.plotting.loading_clustermap(network_by_factors[factor_sorted],
                                         use_zscore=False, 
                                         loading_threshold=cci_threshold, # To consider relevant CCIs
                                         figsize=(20, 9),
                                         tick_fontsize=16,
                                         cmap='YlOrBr',
                                         filename=output_folder + 'Clustermap-Factor-specific-CCI.pdf',
                                         row_cluster=False
                                        )
cci_cm = c2c.plotting.loading_clustermap(network_by_factors[factor_sorted], use_zscore=False, loading_threshold=cci_threshold, # To consider relevant CCIs figsize=(20, 9), tick_fontsize=16, cmap='YlOrBr', filename=output_folder + 'Clustermap-Factor-specific-CCI.pdf', row_cluster=False )
No description has been provided for this image

Cluster LR pairs by their importance across factors

We can plot the key LR pairs from the CCC modality and TFs from the TF activity modality. First we identify what could represent a high loading given their separate distributions.

In [56]:
Copied!
_ = plt.hist(coupled_tensor.factors['Protein LRIs'].values.flatten(), bins = 20)
_ = plt.hist(coupled_tensor.factors['Protein LRIs'].values.flatten(), bins = 20)
No description has been provided for this image
In [57]:
Copied!
_ = plt.hist(coupled_tensor.factors['Metabolic Tasks'].values.flatten(), bins = 20)
_ = plt.hist(coupled_tensor.factors['Metabolic Tasks'].values.flatten(), bins = 20)
No description has been provided for this image

And set and arbitrary threshold to filter out those that have low loadings

In [58]:
Copied!
# To consider only most relevant LR pairs
lr_threshold = 0.15
met_threshold = 0.15
# To consider only most relevant LR pairs lr_threshold = 0.15 met_threshold = 0.15

Since we have loadings from different modalities, we want to visualize them in a comparable way, so we normalize each of them by their maximum value in each case.

In [59]:
Copied!
df1 = coupled_tensor.factors['Protein LRIs'].copy()
df1 = df1[(df1.T > lr_threshold).any()]
df1 = (df1 / df1.max().max())

df2 = coupled_tensor.factors['Metabolic Tasks'].copy()
df2 = df2[(df2.T > met_threshold).any()]
df2 = (df2 / df2.max().max())
df1 = coupled_tensor.factors['Protein LRIs'].copy() df1 = df1[(df1.T > lr_threshold).any()] df1 = (df1 / df1.max().max()) df2 = coupled_tensor.factors['Metabolic Tasks'].copy() df2 = df2[(df2.T > met_threshold).any()] df2 = (df2 / df2.max().max())
In [60]:
Copied!
lr_cm = c2c.plotting.loading_clustermap(df1,
                                        loading_threshold=lr_threshold, # To consider only top LRs
                                        use_zscore=False,
                                        figsize=(10, 5),
                                        cbar_fontsize=20,
                                        tick_fontsize=10,
                                        cmap='YlOrBr',
                                        filename=output_folder + 'Clustermap-Factor-specific-ProtLRs.pdf',
                                        row_cluster=False
                                       )
lr_cm = c2c.plotting.loading_clustermap(df1, loading_threshold=lr_threshold, # To consider only top LRs use_zscore=False, figsize=(10, 5), cbar_fontsize=20, tick_fontsize=10, cmap='YlOrBr', filename=output_folder + 'Clustermap-Factor-specific-ProtLRs.pdf', row_cluster=False )
No description has been provided for this image
In [61]:
Copied!
lr_cm = c2c.plotting.loading_clustermap(df2,
                                        loading_threshold=met_threshold, # To consider only top LRs
                                        use_zscore=False,
                                        figsize=(16, 5),
                                        cbar_fontsize=20,
                                        tick_fontsize=10,
                                        cmap='YlOrBr',
                                        filename=output_folder + 'Clustermap-Factor-specific-MetabolicTasks.pdf',
                                        row_cluster=False
                                       )
lr_cm = c2c.plotting.loading_clustermap(df2, loading_threshold=met_threshold, # To consider only top LRs use_zscore=False, figsize=(16, 5), cbar_fontsize=20, tick_fontsize=10, cmap='YlOrBr', filename=output_folder + 'Clustermap-Factor-specific-MetabolicTasks.pdf', row_cluster=False )
No description has been provided for this image

Given that we max-scaled them, we can also put them together in an unified plot

In [62]:
Copied!
df = pd.concat([df1, df2])
df = pd.concat([df1, df2])
In [63]:
Copied!
lr_cm = c2c.plotting.loading_clustermap(df,
                                        loading_threshold=0.1, # To consider only top LRs and TFs
                                        use_zscore=False,
                                        figsize=(24, 5),
                                        cbar_fontsize=20,
                                        tick_fontsize=10,
                                        cmap='YlOrBr',
                                        filename=output_folder + 'Clustermap-Factor-specific-LR-MetTasks.pdf',
                                        row_cluster=False
                                       )
lr_cm = c2c.plotting.loading_clustermap(df, loading_threshold=0.1, # To consider only top LRs and TFs use_zscore=False, figsize=(24, 5), cbar_fontsize=20, tick_fontsize=10, cmap='YlOrBr', filename=output_folder + 'Clustermap-Factor-specific-LR-MetTasks.pdf', row_cluster=False )
No description has been provided for this image
Previous Next

Built with MkDocs using a theme provided by Read the Docs.
GitHub « Previous Next »