cell2cell
  • Home
  • API Documentation

cell2cell Tutorials

  • Cell-cell communication from bulk dataset
  • Cell-cell communication from single-cell dataset

Tensor-cell2cell Tutorials

  • Obtaining patterns of cell-cell communication with Tensor-cell2cell
  • Downstream analysis 1: Factor-specific analyses
  • Downstream analysis 2: Gene Set Enrichment Analysis
  • Inspecting CCC patterns from spatial transcriptomics
  • Running Tensor-cell2cell on your own GPU or on Google Colab's GPU

Tensor-cell2cell v2 Tutorials

  • Building multimodal 4D communication tensors (MEBOCOST)
    • 1 - Load Data
      • RNA-seq data
      • Ligand-Receptor pairs
      • MEBOCOST ouputs
    • 2 - Data Preprocessing
      • RNA-seq
      • LR pairs
    • 3 - Build Tensors
      • Build 4D-Communication Tensor
      • Re-order ligand-receptor interactions based on their categories
    • 4 - Export tensors and metadata
  • Patterns of protein- and metabolite-based cell-cell communication (MEBOCOST)
  • Patterns of cell-cell communication and transcription factors (DecoupleR)
  • Patterns of cell-cell communication and metabolic activities (scCellFie)
  • Patterns of protein- and metabolite-based cell-cell communication (LIANA+)
cell2cell
  • Tensor-cell2cell v2 Tutorials
  • Building multimodal 4D communication tensors (MEBOCOST)
  • Edit on GitHub

Building multimodal 4D communication tensors (MEBOCOST)¶

This tutorial is focused on running Tensor-cell2cell v2 using the tools cell2cell and MEBOCOST to predict protein- and metabolite-mediated cell-cell communication (CCC), respectively.

In this case, we use samples from cortical brain organoids across different time points, previously published on https://doi.org/10.1016/j.stem.2019.08.002.

Read this if you plan using a GPU to speed-up the analysis. Before running this notebook, make sure to have a proper NVIDIA GPU driver (https://www.nvidia.com/Download/index.aspx) as well as the CUDA toolkit (https://developer.nvidia.com/cuda-toolkit) installed.

Then, make sure to create an environment with Pytorch >= v1.8.0 following these instructions to enable CUDA.

https://pytorch.org/get-started/locally/

Once you have everything installed, run the next blocks two blocks of code before everything.

If you are using a NVIDIA GPU, with PyTorch and CUDA, set the following variable to be True

use_gpu = True, otherwise set it use_gpu = False

In [1]:
Copied!
use_gpu = True

import tensorly as tl
if use_gpu:
    tl.set_backend('pytorch')
use_gpu = True import tensorly as tl if use_gpu: tl.set_backend('pytorch')
In [2]:
Copied!
import cell2cell as c2c
import scanpy as sc

import numpy as np
import pandas as pd

from tqdm.auto import tqdm

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import cell2cell as c2c import scanpy as sc import numpy as np import pandas as pd from tqdm.auto import tqdm import matplotlib.pyplot as plt import seaborn as sns %matplotlib inline
In [3]:
Copied!
import warnings
warnings.filterwarnings('ignore')
import warnings warnings.filterwarnings('ignore')

1 - Load Data¶

In [4]:
Copied!
import os

data_folder = './data/'
directory = os.fsencode(data_folder)

output_folder = './data/Tensor-cell2cell/'
if not os.path.isdir(output_folder):
    os.mkdir(output_folder)
import os data_folder = './data/' directory = os.fsencode(data_folder) output_folder = './data/Tensor-cell2cell/' if not os.path.isdir(output_folder): os.mkdir(output_folder)

RNA-seq data¶

A preprocessed version of the original paper (Trujillo et al, 2019) can be found here

In [5]:
Copied!
rnaseq = sc.read_h5ad(data_folder + '/annotated_seurat_norm_harmony_2022.h5ad')
rnaseq = sc.read_h5ad(data_folder + '/annotated_seurat_norm_harmony_2022.h5ad')
In [6]:
Copied!
rnaseq = rnaseq.raw.to_adata()
rnaseq = rnaseq.raw.to_adata()
In [7]:
Copied!
rnaseq.obs.head(2)
rnaseq.obs.head(2)
Out[7]:
orig.ident nCount_RNA nFeature_RNA percent.mt RNA_snn_res.0.5 seurat_clusters celltype old.ident
Month-01_AAACCTGAGAATTGTG-1 Month-01 6417.0 2431 2.898551 11 11 Other 11
Month-01_AAACCTGAGCAACGGT-1 Month-01 8522.0 2464 2.663694 6 6 Progenitor 6

Ligand-Receptor pairs¶

Different databases of ligand-receptor interactions could be used. We previously created a repository that includes many available DBs (https://github.com/LewisLabUCSD/Ligand-Receptor-Pairs). In this tutorial, we employ the ligand-receptor pairs from CellChat (https://doi.org/10.1038/s41467-021-21246-9), which includes multimeric protein complexes.

In [8]:
Copied!
lr_pairs = pd.read_csv('https://raw.githubusercontent.com/LewisLabUCSD/Ligand-Receptor-Pairs/master/Human/Human-2020-Jin-LR-pairs.csv')
lr_pairs = lr_pairs.astype(str)
lr_pairs = pd.read_csv('https://raw.githubusercontent.com/LewisLabUCSD/Ligand-Receptor-Pairs/master/Human/Human-2020-Jin-LR-pairs.csv') lr_pairs = lr_pairs.astype(str)
In [9]:
Copied!
lr_pairs.head(2)
lr_pairs.head(2)
Out[9]:
interaction_name pathway_name ligand receptor agonist antagonist co_A_receptor co_I_receptor evidence annotation interaction_name_2 ligand_symbol receptor_symbol ligand_ensembl receptor_ensembl interaction_symbol interaction_ensembl
0 TGFB1_TGFBR1_TGFBR2 TGFb TGFB1 TGFbR1_R2 TGFb agonist TGFb antagonist nan TGFb inhibition receptor KEGG: hsa04350 Secreted Signaling TGFB1 - (TGFBR1+TGFBR2) TGFB1 TGFBR1&TGFBR2 ENSG00000105329 ENSG00000106799&ENSG00000163513 TGFB1^TGFBR1&TGFBR2 ENSG00000105329^ENSG00000106799&ENSG00000163513
1 TGFB2_TGFBR1_TGFBR2 TGFb TGFB2 TGFbR1_R2 TGFb agonist TGFb antagonist nan TGFb inhibition receptor KEGG: hsa04350 Secreted Signaling TGFB2 - (TGFBR1+TGFBR2) TGFB2 TGFBR1&TGFBR2 ENSG00000092969 ENSG00000106799&ENSG00000163513 TGFB2^TGFBR1&TGFBR2 ENSG00000092969^ENSG00000106799&ENSG00000163513
In [10]:
Copied!
# interaction columns:
int_columns = ('ligand_symbol', 'receptor_symbol')
# interaction columns: int_columns = ('ligand_symbol', 'receptor_symbol')

MEBOCOST ouputs¶

In this tutorial we assume that you already ran MEBOCOST and saved its ouputs. You can find an example on how running it and format outputs to be used here:

  • Run MEBOCOST for Tensor-cell2cell v2
In [11]:
Copied!
import os

mebocost_folder = data_folder + '/mebocost-results/'
directory = os.fsencode(mebocost_folder )


data = dict()
for file in sorted(os.listdir(directory)):
    filename = os.fsdecode(file)
    if filename.endswith(".csv"): 
        print(filename)
        basename = os.path.basename(filename)
        sample = basename.split('.')[0]
        sample = sample.lstrip('MEBOCOST')[1:]
        data[sample] = pd.read_csv(mebocost_folder  + filename)
        data[sample] = data[sample].loc[data[sample]['Annotation'].str.contains('Receptor')]
    else:
        continue
import os mebocost_folder = data_folder + '/mebocost-results/' directory = os.fsencode(mebocost_folder ) data = dict() for file in sorted(os.listdir(directory)): filename = os.fsdecode(file) if filename.endswith(".csv"): print(filename) basename = os.path.basename(filename) sample = basename.split('.')[0] sample = sample.lstrip('MEBOCOST')[1:] data[sample] = pd.read_csv(mebocost_folder + filename) data[sample] = data[sample].loc[data[sample]['Annotation'].str.contains('Receptor')] else: continue
MEBOCOST-Month-01.csv
MEBOCOST-Month-03.csv
MEBOCOST-Month-06.csv
MEBOCOST-Month-10.csv

2 - Data Preprocessing¶

RNA-seq¶

Organize data to create tensor

First, define contexts

In [16]:
Copied!
contexts = ['Month-01', 'Month-03', 'Month-06', 'Month-10']
contexts = ['Month-01', 'Month-03', 'Month-06', 'Month-10']

Generate list of RNA-seq data containing all contexts (time points)

Here, we convert the single-cell expression levels into cell-type expression levels. A basic context-wise preprocessing is performed here, keeping only genes that are expressed at least in 4 single cells, and cell types per context with at least 20 cells.

Important: A list of aggregated gene expression matrices must be used for building the 4D-communication tensor. Each of the gene expression matrices is for one of the contexts, following the same order as the previouslist (contexts).

The aggregation here corresponds to mean expression value across cells of each cell type. Other aggregation methods could be used by changing the parameter method='average' in the aggregate_single_cell() function. Additionally, other gene expression values could be passed, using other preprocessing approaches such as log(x+1) or batch correction methods.

In [17]:
Copied!
rnaseq_matrices = []

for context in tqdm(contexts):
    meta_context = rnaseq.obs.loc[(rnaseq.obs['orig.ident'] == context) \
                                  & (rnaseq.obs['celltype'] != 'Other') & (rnaseq.obs['celltype'] != 'MC')]
    
    # Remove celltype per context that has few single cells
    min_sc_number = 20
    excluded_sc = []
    for idx, row in (meta_context.groupby(['celltype'])[['celltype']].count() >= min_sc_number).iterrows():
        if ~row['celltype']:
            excluded_sc.append(idx)
        
    meta_context = meta_context.loc[~meta_context['celltype'].isin(excluded_sc)]
    cells = list(meta_context.index)

    meta_context.index.name = 'barcode'
    tmp_data = rnaseq[cells]
    # Keep genes in each sample with at least 4 single cells expressing it
    sc.pp.filter_genes(tmp_data, min_cells=4)
    
    # Normalize and Log1p
    sc.pp.normalize_total(tmp_data, target_sum=1e6)
    sc.pp.log1p(tmp_data)

    # Aggregate gene expression of single cells into cell types
    exp_df = c2c.preprocessing.aggregate_single_cells(rnaseq_data=tmp_data.to_df(),
                                                      metadata=meta_context,
                                                      barcode_col='barcode',
                                                      celltype_col='celltype',
                                                      method='average',
                                                     )

    rnaseq_matrices.append(exp_df)
rnaseq_matrices = [] for context in tqdm(contexts): meta_context = rnaseq.obs.loc[(rnaseq.obs['orig.ident'] == context) \ & (rnaseq.obs['celltype'] != 'Other') & (rnaseq.obs['celltype'] != 'MC')] # Remove celltype per context that has few single cells min_sc_number = 20 excluded_sc = [] for idx, row in (meta_context.groupby(['celltype'])[['celltype']].count() >= min_sc_number).iterrows(): if ~row['celltype']: excluded_sc.append(idx) meta_context = meta_context.loc[~meta_context['celltype'].isin(excluded_sc)] cells = list(meta_context.index) meta_context.index.name = 'barcode' tmp_data = rnaseq[cells] # Keep genes in each sample with at least 4 single cells expressing it sc.pp.filter_genes(tmp_data, min_cells=4) # Normalize and Log1p sc.pp.normalize_total(tmp_data, target_sum=1e6) sc.pp.log1p(tmp_data) # Aggregate gene expression of single cells into cell types exp_df = c2c.preprocessing.aggregate_single_cells(rnaseq_data=tmp_data.to_df(), metadata=meta_context, barcode_col='barcode', celltype_col='celltype', method='average', ) rnaseq_matrices.append(exp_df)
HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=4.0), HTML(value='')))

rnaseq_matrices is the list we will use for building the tensor based on protein-based ligand-receptor pairs.

LR pairs¶

Remove bidirectionality in the list of protein-based ligand-receptor pairs. That is, remove repeated interactions where both interactions are the same but in different order:

From this list:

Ligand Receptor
Protein A Protein B
Protein B Protein A

We will have:

Ligand Receptor
Protein A Protein B
In [13]:
Copied!
lr_pairs = c2c.preprocessing.ppi.remove_ppi_bidirectionality(ppi_data=lr_pairs, 
                                                             interaction_columns=int_columns
                                                             )
lr_pairs = c2c.preprocessing.ppi.remove_ppi_bidirectionality(ppi_data=lr_pairs, interaction_columns=int_columns )
Removing bidirectionality of PPI network
In [14]:
Copied!
lr_pairs.shape
lr_pairs.shape
Out[14]:
(1988, 17)

Generate a dictionary with function info for each LR pairs.

Keys are LIGAND_NAME^RECEPTOR_NAME (this is the same nomenclature that will be used when building the protein-based 4D-communication tensor later), and values are the function in the annotation column in the dataframe containing ligand-receptor pairs. Other functional annotations of the LR pairs could be used if available.

In [15]:
Copied!
ppi_functions = dict()

for idx, row in lr_pairs.iterrows():
    ppi_functions[row['interaction_symbol']] = row['annotation']
ppi_functions = dict() for idx, row in lr_pairs.iterrows(): ppi_functions[row['interaction_symbol']] = row['annotation']

Similarly, we can generate the functions for the metabolite-based ligand-receptor pairs. For that, we open the MEBOCOST database:

In [23]:
Copied!
met_sensor = pd.read_csv('https://raw.githubusercontent.com/kaifuchenlab/MEBOCOST/refs/heads/main/data/mebocost_db/human/human_met_sensor_update_May21_2025.tsv', 
                         sep='\t',
                         index_col=0
                        )
met_sensor = pd.read_csv('https://raw.githubusercontent.com/kaifuchenlab/MEBOCOST/refs/heads/main/data/mebocost_db/human/human_met_sensor_update_May21_2025.tsv', sep='\t', index_col=0 )
In [24]:
Copied!
met_sensor.head(2)
met_sensor.head(2)
Out[24]:
HMDB_ID standard_metName metName Gene_name Protein_name Evidence Annotation
ID
1 HMDB0006247 25-Hydroxycholesterol 25-Hydroxycholesterol ABCA1 Phospholipid-transporting ATPase ABCA1 16611739 Transporter
2 HMDB0000517 L-Arginine Arginine; L-Arginine SLC7A1 High affinity cationic amino acid transporter 1 33209975; 17325243; 33639836; 23831616; 221449... Transporter

Then, generate the LIGAND_NAME^RECEPTOR_NAME nomenclature that will be used to build the tensor later.

In [25]:
Copied!
met_sensor['interaction'] = met_sensor.apply(lambda row: row['standard_metName'] + '^' + row['Gene_name'], axis=1)
met_sensor['interaction'] = met_sensor.apply(lambda row: row['standard_metName'] + '^' + row['Gene_name'], axis=1)

We get the annotation of each ligand-receptor interaction. Again, this could be a different information/annotation if available.

In [26]:
Copied!
met_mapper = met_sensor.set_index('interaction').to_dict()['Annotation']
met_mapper = met_sensor.set_index('interaction').to_dict()['Annotation']

And add it to our ppi_function dictionary

In [27]:
Copied!
for m in tensor_met.order_names[1]:
    ppi_functions[m] = 'Metabolite ' + met_mapper[m]
for m in tensor_met.order_names[1]: ppi_functions[m] = 'Metabolite ' + met_mapper[m]

3 - Build Tensors¶

Build 4D-Communication Tensor¶

Here we use as input the list of gene expression matrices that were aggregated into a cell-type granularity. This list contains the expression matrices of all samples/contexts.

The following functions perform all the steps to build a 4D-communication tensor:

4dtensor

The parameter communication_score indicates the scoring function to use.

how='outer_cells' considers only LR pairs that are present in all contexts, while all cell types in the whole dataset are kept, regardless they appear only in one or a few contexts.

complex_sep='&' is used to specify that the list of ligand-receptor pairs contains protein complexes and that subunits are separated by '&'. If the list does not have complexes, use complex_sep=None instead.

In [18]:
Copied!
tensor_lr = c2c.tensor.InteractionTensor(rnaseq_matrices=rnaseq_matrices,
                                         ppi_data=lr_pairs,
                                         context_names=contexts,
                                         how='outer_cells',
                                         complex_sep='&',
                                         interaction_columns=int_columns,
                                         communication_score='expression_gmean',
                                         order_labels=['Contexts', 'Protein LRIs', 'Sender Cells', 'Receiver Cells'],
                                         device='cuda' if use_gpu else 'cpu'
                                        )
tensor_lr = c2c.tensor.InteractionTensor(rnaseq_matrices=rnaseq_matrices, ppi_data=lr_pairs, context_names=contexts, how='outer_cells', complex_sep='&', interaction_columns=int_columns, communication_score='expression_gmean', order_labels=['Contexts', 'Protein LRIs', 'Sender Cells', 'Receiver Cells'], device='cuda' if use_gpu else 'cpu' )
Getting expression values for protein complexes
Building tensor for the provided context
In [19]:
Copied!
tensor_lr.tensor.shape
tensor_lr.tensor.shape
Out[19]:
torch.Size([4, 422, 5, 5])

By using our MEBOCOST outputs directly, we can generate our metabolite-based 4D communication tensor:

In [20]:
Copied!
tensor_met = c2c.tensor.dataframes_to_tensor(context_df_dict=data,
                                             sender_col='Sender',
                                             receiver_col='Receiver',
                                             ligand_col='Metabolite_Name',
                                             receptor_col='Sensor',
                                             score_col='Commu_Score',
                                             order_labels=['Contexts', 'Metabolite LRIs', 'Sender Cells', 'Receiver Cells'],
                                             how='outer_cells', # Multiple options implemented now
                                             lr_sep='^',
                                             context_order=sorted(data.keys()),
                                             sort_elements=True,
                                             device='cuda' if use_gpu else 'cpu'
                                            )
tensor_met = c2c.tensor.dataframes_to_tensor(context_df_dict=data, sender_col='Sender', receiver_col='Receiver', ligand_col='Metabolite_Name', receptor_col='Sensor', score_col='Commu_Score', order_labels=['Contexts', 'Metabolite LRIs', 'Sender Cells', 'Receiver Cells'], how='outer_cells', # Multiple options implemented now lr_sep='^', context_order=sorted(data.keys()), sort_elements=True, device='cuda' if use_gpu else 'cpu' )
100%|██████████| 4/4 [00:00<00:00,  6.45it/s]
In [21]:
Copied!
tensor_met.tensor.shape
tensor_met.tensor.shape
Out[21]:
torch.Size([4, 52, 5, 5])

Modify MEBOCOST scores

Here, we modify the original MEBOCOST scores (which are the product between metabolite production and sensor expression). Given the way that MEBOCOST computes the scores, it allows negative values (when the enzymes consuming the metabolite present higher expression than those producing it), we make these values to be zero, as in our case we are only interested when the metabolite is produced.

Additionally, we use the square root of the final score, to represent the geometric mean between the ligand and the receptor, consistent with our calculations for the protein-based LR pairs above.

In [22]:
Copied!
tensor_met.tensor = tl.sqrt((tensor_met.tensor >= 0) * tensor_met.tensor) # Remove negative cases and convert to gmean
tensor_met.tensor = tl.sqrt((tensor_met.tensor >= 0) * tensor_met.tensor) # Remove negative cases and convert to gmean

Generate a list containing metadata for each tensor order/dimension - Later used for coloring factor plots

This is a list containing metadata for each dimension of the tensor (contexts, LR pairs, sender cells, receiver cells). Each metadata corresponds to a dataframe with info of each element in the respective dimension.

We start with the protein-based tensor.

In [28]:
Copied!
meta_tf = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_lr,
                                              metadata_dicts=[None, ppi_functions, None, None],
                                              fill_with_order_elements=True
                                             )
meta_tf = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_lr, metadata_dicts=[None, ppi_functions, None, None], fill_with_order_elements=True )

We continue with the metabolite-based tensor.

In [29]:
Copied!
meta_tf2 = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_met,
                                              metadata_dicts=[None, ppi_functions, None, None],
                                              fill_with_order_elements=True
                                             )
meta_tf2 = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_met, metadata_dicts=[None, ppi_functions, None, None], fill_with_order_elements=True )

Re-order ligand-receptor interactions based on their categories¶

Later, we will be interested in plotting our results and group LR pairs by their annotations, so we sort them in a way that those with similar annotations will be plot together.

First, we sort them in the metadata.

In [30]:
Copied!
meta_tf[1].sort_values(['Category', 'Element'], inplace=True)
meta_tf2[1].sort_values(['Category', 'Element'], inplace=True)
meta_tf[1].sort_values(['Category', 'Element'], inplace=True) meta_tf2[1].sort_values(['Category', 'Element'], inplace=True)

Then, based on this new order, we sort their locations within the tensors.

In [31]:
Copied!
tensor_lr = c2c.tensor.subset_tensor(tensor_lr, subset_dict={1: meta_tf[1]['Element'].values.tolist()})
tensor_met = c2c.tensor.subset_tensor(tensor_met, subset_dict={1: meta_tf2[1]['Element'].values.tolist()})
tensor_lr = c2c.tensor.subset_tensor(tensor_lr, subset_dict={1: meta_tf[1]['Element'].values.tolist()}) tensor_met = c2c.tensor.subset_tensor(tensor_met, subset_dict={1: meta_tf2[1]['Element'].values.tolist()})

4 - Export tensors and metadata¶

Finally, we export both, metadata and tensor for both modalities (protein- and metabolite-based communication).

In [32]:
Copied!
c2c.io.save_data.export_variable_with_pickle(tensor_lr, output_folder + '/Initial-c2c-Tensor.pkl')
c2c.io.save_data.export_variable_with_pickle(tensor_met, output_folder + '/Initial-MEBOCOST-Tensor.pkl')
c2c.io.save_data.export_variable_with_pickle(tensor_lr, output_folder + '/Initial-c2c-Tensor.pkl') c2c.io.save_data.export_variable_with_pickle(tensor_met, output_folder + '/Initial-MEBOCOST-Tensor.pkl')
/data/Tensor-cell2cell//Initial-c2c-Tensor.pkl  was correctly saved.
/data/Tensor-cell2cell//Initial-MEBOCOST-Tensor.pkl  was correctly saved.
In [33]:
Copied!
c2c.io.save_data.export_variable_with_pickle(meta_tf, output_folder + '/Meta-Initial-c2c-Tensor.pkl')
c2c.io.save_data.export_variable_with_pickle(meta_tf2, output_folder + '/Meta-Initial-MEBOCOST-Tensor.pkl')
c2c.io.save_data.export_variable_with_pickle(meta_tf, output_folder + '/Meta-Initial-c2c-Tensor.pkl') c2c.io.save_data.export_variable_with_pickle(meta_tf2, output_folder + '/Meta-Initial-MEBOCOST-Tensor.pkl')
/data/Tensor-cell2cell//Meta-Initial-c2c-Tensor.pkl  was correctly saved.
/data/Tensor-cell2cell//Meta-Initial-MEBOCOST-Tensor.pkl  was correctly saved.
Previous Next

Built with MkDocs using a theme provided by Read the Docs.
GitHub « Previous Next »