Patterns of protein- and metabolite-based cell-cell communication (MEBOCOST)¶
This tutorial is focused on running Tensor-cell2cell v2 using the tools cell2cell and MEBOCOST to predict protein- and metabolite-mediated cell-cell communication (CCC), respectively.
In this case, we use samples from cortical brain organoids across different time points, previously published by Trujillo et al (2019).
Read this if you plan using a GPU to speed-up the analysis. Before running this notebook, make sure to have a proper NVIDIA GPU driver (https://www.nvidia.com/Download/index.aspx) as well as the CUDA toolkit (https://developer.nvidia.com/cuda-toolkit) installed.
Then, make sure to create an environment with Pytorch >= v1.8.0 following these instructions to enable CUDA.
https://pytorch.org/get-started/locally/
Once you have everything installed, run the next blocks two blocks of code before everything.
If you are using a NVIDIA GPU, with PyTorch and CUDA, set the following variable to be True
use_gpu = True, otherwise set it use_gpu = False
use_gpu = True
import tensorly as tl
if use_gpu:
backend = 'pytorch'
device = 'cuda'
tl.set_backend(backend)
else:
backend = 'numpy'
device = 'cpu'
Importing packages to use
import cell2cell as c2c
import scanpy as sc
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.axes_grid1 import make_axes_locatable
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
IMPORTANT: In this notebook, the version 0.8.3 of cell2cell is used. Versions older than 0.8.0 do not implement the Coupled Tensor Component Analysis (CTCA).
1 - Load Data¶
Specify folders where the data is located, and where the outputs will be written:
import os
data_folder = '/data/'
directory = os.fsencode(data_folder)
output_folder = '/results/Tensor-cell2cell/'
if not os.path.isdir(output_folder):
os.mkdir(output_folder)
Import 4D communication tensors¶
We will import the communication tensors created for both modalities (protein- and metabolite-mediated CCC), as previously done.
tensor_lr = c2c.io.load_tensor('/data/Tensor-cell2cell/Initial-c2c-Tensor.pkl', backend=backend, device=device)
tensor_met = c2c.io.load_tensor('/data/Tensor-cell2cell/Initial-MEBOCOST-Tensor.pkl', backend=backend, device=device)
meta_tf = c2c.io.load_variable_with_pickle('/data/Tensor-cell2cell/Meta-Initial-c2c-Tensor.pkl')
meta_tf2 = c2c.io.load_variable_with_pickle('/data/Tensor-cell2cell/Meta-Initial-MEBOCOST-Tensor.pkl')
2 - Generate coupled tensors¶
Once we have loaded our 4D tensors, we start creating a CoupledInteractionTensorobject in Tensor-cell2cell v2. This object will later perform the CTCA.
coupled_tensor = c2c.tensor.CoupledInteractionTensor(tensor1=tensor_lr,
tensor2=tensor_met,
tensor1_name='Protein Tensor',
tensor2_name='Metabolite Tensor',
mode_mapping={'shared' : [(0,0), (2,2), (3,3)]}
)
We can observe that this object includes both tensors simultaneously.
coupled_tensor.shape
(torch.Size([4, 422, 5, 5]), torch.Size([4, 52, 5, 5]))
3 - Run Analysis¶
Next, we apply a non-negative canonical coupled component analysis (CTCA). Briefly, this tensor decomposition identifies a low-rank tensor. Briefly, tensor decomposition identifies a low-rank tensor for each modality's tensor that better approximates each original tensor. These low-rank tensors can be represented as the sum of a set of rank-1 tensors (in this case, the rank, or number of rank-1 tensors corresponds to number of factors). Each rank-1 tensor represents a factor in the decomposition and can be further represented as the outer product of n vectors, where n represents the number of tensor dimensions. Each vector represents one of the n tensor dimensions for that factor and its values, corresponding to individual elements in each dimension, represent the factor loadings.
In the CTCA, we have vectors that are shared between the coupled tensors (here contexts, sender cells, and receiver cells), while we have other vectors that are private for each modality (protein-based LR pair and metabolite-based LR pair dimensions). The loadings shared between both tensors behave exactly the same in both modalities, representing that we are capturing LR pairs of each modality that present similar trends in each factor. If each tensor was decomposed separately with the standard TCA in Tensor-cell2cell v1, we would not ensure that the key LR pairs for each factor would behave the same, as this could lead to loadings for the context, sender cells and receiver cells with different behaviors between modalities.

Selecting number of factors¶
For selecting the number of factors to decompose CCC, we could do it either manually or automatically from an Elbow analysis. Typically this comes from running the CTCA with different number of factors; then, selecting the elbow with the lowest error.
However, here we combined the similarity across runs in the same rank and the reconstruction error. Then, we manually selected the number under the criteria of achieving a similarity of at least 0.9 while keeping the error as low as possible.
If you want to stick to using the elbow analysis based on the error, just run the code after the following cell and change the parameter automatic_elbowto be True if you want to automatically select the number of factors.
elbow, error = coupled_tensor.elbow_rank_selection(upper_rank=20,
runs=20,
init='svd',
metric='similarity',
automatic_elbow=False,
random_state=888,
show_individual=True,
)
plt.savefig(output_folder + 'Elbow-similarity.pdf', dpi=300, bbox_inches='tight')
100%|██████████| 20/20 [04:43<00:00, 14.19s/it]
elbow, error = coupled_tensor.elbow_rank_selection(upper_rank=20,
runs=20,
init='svd',
metric='error',
automatic_elbow=False,
random_state=888,
show_individual=True,
)
plt.savefig(output_folder + 'Elbow-error.pdf', dpi=300, bbox_inches='tight')
100%|██████████| 20/20 [04:46<00:00, 14.33s/it]
The number of factors we picked here is 6, which keeps a good balance between similarity (>0.9) and error (as low as possible).
n_factors = 6
Fine-tuning balancing weights for CTCA¶
Our CTCA introduces two balancing weights, α1 and α2, that sum up to one, to control how each tensor contributes to the factorization results. By default, Tensor-cell2cell prioritizes both tensors equally as previously implemented in other coupled factorization approaches (Li et al. 2024). However, balancing weights can be changed by the user upon reasonable criteria to influence the discovered patterns.
In this case, after selecting the number of factors, we fine tune α1 and α2. To do so, we pick different combinations starting from α1 = 0.1 ; α2 = 0.9 until α1 = 0.9 ; α2 = 0.1, with changes of 0.1 in each step:
weights = np.linspace(0.1, 0.9, 9)
factors = []
names = []
weight_pairs = dict()
for i, a1 in tqdm(enumerate(weights), total=weights.shape[0]):
a1 = np.round(a1, 1)
a2 = np.round(1. - a1, 1)
name = 'α1={}; α2={}'.format(a1, a2)
coupled_tensor.compute_tensor_factorization(rank=n_factors,
init='svd',
random_state=888, # Same initialization, different weights
manual_weights=(a1, a2),
runs=1, tol=1e-8, n_iter_max=500 # Robust version of tensor decomposition
)
factors.append(coupled_tensor.factors.copy())
names.append(name)
weight_pairs[name] = (a1, a2)
HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=9.0), HTML(value='')))
Then, we use CorrIndex for similarity (1-CorrIndex) and evaluate which combination is the most balanced (that means less distinct from the other combinations).
df = c2c.tensor.metrics.pairwise_correlation_index(factors)
df.columns = names
df.index = names
Visualizations of Similarities
# Generate a mask for the upper triangle
mask = np.tril(np.ones_like(1. - df, dtype=bool), k=-1)
# Create figure and axes
fig, ax = plt.subplots(figsize=(8, 6))
# Create heatmap without colorbar
hm = sns.heatmap(1. - df, mask=mask, annot=True, cmap='Blues',
square=True, linewidths=0.5, cbar=False, ax=ax)
# Create colorbar on the left
divider = make_axes_locatable(ax)
cax = divider.append_axes("left", size="5%", pad=0.5)
cbar = plt.colorbar(hm.collections[0], cax=cax)
cbar.set_label('Similarity\n(1 - CorrIndex)', rotation=90, labelpad=15)
cax.yaxis.set_ticks_position('left')
cax.yaxis.set_label_position('left')
plt.sca(ax) # Set current axes back to main plot
plt.title('Sensitivity Analysis of Tensor Weights')
# Move x-axis labels to top and rotate 90 degrees
ax.xaxis.tick_top()
ax.xaxis.set_label_position('top')
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90, ha='center')
# Move y-axis labels to right and set horizontal (0 degrees)
ax.yaxis.tick_right()
ax.yaxis.set_label_position('right')
plt.setp(ax.yaxis.get_majorticklabels(), rotation=0)
plt.tight_layout()
plt.savefig(output_folder + '/CorrIndex_similarity_weights.pdf',
dpi=150,
bbox_inches='tight',
edgecolor='none')
We can also visualize the distribution of these values:
# Create violin plot
plt.figure(figsize=(10, 6))
sns.boxplot(data=1.-df)
plt.xlabel('Weights')
plt.ylabel('Similarity (1 - CorrIndex)')
plt.title('Distribution of Similarity Values across Weight Pairs')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig(output_folder + '/CorrIndex_similarity_weights_dist.pdf',
dpi=150,
bbox_inches='tight',
edgecolor='none')
We pick the pair of weights with highest median of similarity across different combinations:
weight_pair = weight_pairs[(1.-df).median().idxmax()]
print(f'Chosen weights: {weight_pair}')
Chosen weights: (0.7, 0.3)
This resulted in weights of α1=0.7 and α2=0.3, which means that the Protein Tensor will contribute more than the Metabolite Tensor to the factorization results.
Perform tensor factorization¶
After selecting the number of factors (6 in this case), and the balancing weights (α1=0.7 and α2=0.3), we can then perform our definite CTCA. To make our results more robust, we perform the initialization by running SVD, and by running the CTCA 100 times, each with a different initialization seed. Additionally, we add parameters to ensure a robust stop criteria (tol=1e-8and n_iter_max=500). From the 100 runs, we select the CTCA with lowest error.
coupled_tensor.compute_tensor_factorization(rank=n_factors,
init='svd',
random_state=888,
manual_weights=weight_pair,
runs=100, tol=1e-8, n_iter_max=500 # Robust version of tensor decomposition
)
100%|██████████| 100/100 [05:58<00:00, 3.59s/it]
Best coupled model has a combined normalized error of: 0.231
Then, we can visualize the reconstruction error, separately for each tensor or combined, across iterations:
errors, err_fig = coupled_tensor.get_factorization_errors(plot=True, filename=output_folder + '/Factorization_loss.pdf')
3 - Results¶
Generate unified metadata for CTCA results
By using the metadata generated separately for each tensor, we generate a new metadata that incorporates metadata for the shared and private dimensions between both modalities.
new_meta_tf = coupled_tensor.reorder_metadata(meta_tf, meta_tf2)
Plot factor loadings¶
After performing the CTCA, a set of loadings is obtained in a factor-specific way, for each tensor. These loadings provide details of what elements of each dimension are important within each factor. Here we plot first the loadings for the shared dimensions, followed by the private dimensions of the first and second tensor (protein- and metabolite-based, respectively).
# Color palettes for each of the coupled tensor dimensions, ordered as in coupled_tensor.factors.keys()
cmaps = ['viridis', 'tab20', 'tab20', 'Dark2', 'Dark2', ]
factors, axes = c2c.plotting.tensor_factors_plot(interaction_tensor=coupled_tensor,
metadata=new_meta_tf,
sample_col='Element',
group_col='Category',
meta_cmaps=cmaps,
order_sorting=['Contexts', 'Sender Cells', 'Receiver Cells', 'Protein LRIs', 'Metabolite LRIs'],
fontsize=14,
filename=output_folder + 'Tensor-Factorization.pdf'
)
Each factor represents a context-dependent communication pattern. In this visualization, each row is a factor and each column is a tensor dimension in that factor (Shared dimensions: contexts, sender-cells, and receiver-cells; Private dimensions: Protein LR pairs and Metabolite LR pairs).
The loadings represent the contribution of each element to that pattern, and their combinations between all dimensions represent the overall pattern. For example, in Factor 1, the identified communication pattern occurs in early stages (month 1), where all cell types but GABAergic neurons participate as sender cells. Then we can identify the top LR pairs contributing from each modality. Thus, those that are top ranked for the protein modality act in coordination with those top ranked for the metabolite modality.
Top LR pairs¶
Top-5 LR pairs
for i in range(coupled_tensor.rank):
print(coupled_tensor.get_top_factor_elements('Protein LRIs', 'Factor {}'.format(i+1), 5))
print('')
MDK^NCL 0.372804 MDK^SDC2 0.317517 CDH2^CDH2 0.302039 CD99^CD99 0.267838 MDK^LRP1 0.177680 Name: Factor 1, dtype: float64 PTN^NCL 0.507021 MDK^NCL 0.434334 PTN^PTPRZ1 0.354619 MDK^PTPRZ1 0.301730 CDH2^CDH2 0.256842 Name: Factor 2, dtype: float64 PTN^PTPRZ1 0.514414 MDK^PTPRZ1 0.501849 PTN^NCL 0.245779 PTN^SDC3 0.233022 MDK^NCL 0.228345 Name: Factor 3, dtype: float64 CD99^CD99 0.464812 MDK^PTPRZ1 0.460041 MDK^NCL 0.417492 MDK^SDC2 0.199778 MDK^LRP1 0.170363 Name: Factor 4, dtype: float64 MDK^SDC2 0.436574 PTN^NCL 0.425681 MDK^NCL 0.387236 PTN^SDC2 0.380880 CD99^CD99 0.282566 Name: Factor 5, dtype: float64 NCAM1^NCAM1 0.439778 CADM1^CADM1 0.322127 NRXN1^NLGN2 0.213717 MDK^NCL 0.210292 NRXN1^NLGN1 0.210099 Name: Factor 6, dtype: float64
for i in range(coupled_tensor.rank):
print(coupled_tensor.get_top_factor_elements('Metabolite LRIs', 'Factor {}'.format(i+1), 5))
print('')
Farnesyl pyrophosphate^LPAR2 0.323938 Arachidonic acid^ANXA2 0.320029 gamma-Aminobutyric acid^GABRB3 0.293607 Sphingosine^TRPM3 0.288527 Retinal^RARA 0.280857 Name: Factor 1, dtype: float64 Iron^TFRC 0.401326 Retinal^RXRB 0.358571 Taurine^GLRB 0.330157 Retinal^RARA 0.296951 gamma-Aminobutyric acid^GABBR1 0.230053 Name: Factor 2, dtype: float64 gamma-Aminobutyric acid^GABRB3 0.646834 Iron^TFRC 0.363304 gamma-Aminobutyric acid^GABBR2 0.355796 Retinal^RARB 0.229937 Testosterone^AR 0.229856 Name: Factor 3, dtype: float64 Farnesyl pyrophosphate^LPAR2 0.528710 Desmosterol^NR1H2 0.396541 Thyroxine^THRA 0.309135 Desmosterol^NR1H3 0.234776 Retinal^RXRB 0.225645 Name: Factor 4, dtype: float64 Sphingosine^TRPM3 0.515894 Arachidonic acid^S100A10 0.369580 Arachidonic acid^ANXA2 0.341487 Leukotriene B4^PPARA 0.220591 gamma-Aminobutyric acid^GABRB3 0.218846 Name: Factor 5, dtype: float64 Iron^TFRC 0.358893 Arachidonic acid^ANXA2 0.336427 gamma-Aminobutyric acid^GABBR1 0.277284 Thyroxine^THRA 0.257077 Desmosterol^NR1H2 0.248295 Name: Factor 6, dtype: float64
Export Loadings¶
THESE VALUES CAN BE USED FOR DOWNSTREAM ANALYSES
coupled_tensor.export_factor_loadings(output_folder + 'Coupled-Loadings.xlsx')
Coupled tensor factor loadings saved to /results/Tensor-cell2cell/Coupled-Loadings.xlsx
4 - Downstream Analyses¶
Generate factor-specific networks¶
We can generate factor-specific communication networks connecting sender-to-receiver cells.
networks = c2c.analysis.tensor_downstream.get_factor_specific_ccc_networks(coupled_tensor.factors,
sender_label='Sender Cells',
receiver_label='Receiver Cells')
Generate matrix of cell-cell pairs by factors
network_by_factors = c2c.analysis.tensor_downstream.flatten_factor_ccc_networks(networks, orderby='receivers')
Select cell-cell pairs with high potential of interaction
_ = plt.hist(network_by_factors.values.flatten(), bins = 20)
# To consider only cell pairs with high potential to interact in each factor
cci_threshold = 0.2
Visualize networks of factors with significant differences between groups
net_fig, net_ax = c2c.plotting.ccc_networks_plot(coupled_tensor.factors,
included_factors=None,
ccc_threshold=cci_threshold, # Only important communication
nrows=2,
panel_size=(8, 8),
network_layout='circular',
filename=output_folder + 'Networks.svg'
)
Clustermaps¶
Similarly, we can visualize the contribution of each dimension element across factors by plotting their loadings
factor_sorted = ['Factor {}'.format(f) for f in range(1, coupled_tensor.rank+1)]
Cluster cell-cell pairs by their potential across factors
Here we display the outer product between the sender and receiver cell dimensions, which are the value governing the factor-specific networks above.
cci_df = network_by_factors[factor_sorted]
cci_df = cci_df[(cci_df.T > cci_threshold).any()]
cci_df = (cci_df / cci_df.max())
cci_cm = c2c.plotting.loading_clustermap(cci_df,
use_zscore=False,
loading_threshold=0.,
figsize=(20, 9),
tick_fontsize=16,
cmap='YlOrBr',
filename=output_folder + 'Clustermap-Factor-specific-CCI.pdf',
row_cluster=False
)
Cluster LR pairs by their importance across factors
We can plot the key LR pairs from each modality. First we identify what could represent a high loading given their separate distributions.
_ = plt.hist(coupled_tensor.factors['Protein LRIs'].values.flatten(), bins = 20)
_ = plt.hist(coupled_tensor.factors['Metabolite LRIs'].values.flatten(), bins = 20)
And set and arbitrary threshold to filter out those that have low loadings
# To consider only most relevant LR pairs
lr_threshold1 = 0.1
lr_threshold2 = 0.2
Since we have loadings from different modalities, we want to visualize them in a comparable way, so we normalize each of them by their maximum value in each case, in a factor-wise manner.
df1 = coupled_tensor.factors['Protein LRIs'].copy()
df1 = df1[(df1.T > lr_threshold1).any()]
df1 = (df1 / df1.max())
df2 = coupled_tensor.factors['Metabolite LRIs'].copy()
df2 = df2[(df2.T > lr_threshold2).any()]
df2 = (df2 / df2.max())
lr_cm = c2c.plotting.loading_clustermap(df1,
loading_threshold=0., # To consider only top LRs
use_zscore=False,
figsize=(10, 5),
cbar_fontsize=20,
tick_fontsize=10,
cmap='YlOrBr',
filename=output_folder + 'Clustermap-Factor-specific-ProtLRs.pdf',
row_cluster=False
)
lr_cm = c2c.plotting.loading_clustermap(df2,
loading_threshold=0., # To consider only top LRs
use_zscore=False,
figsize=(8, 6),
cbar_fontsize=20,
tick_fontsize=10,
cmap='YlOrBr',
filename=output_folder + 'Clustermap-Factor-specific-MetLRs.pdf',
row_cluster=False
)
Given that we max-scaled them, we can also put them together in an unified plot
df_both = pd.concat([df1, df2])
lr_cm = c2c.plotting.loading_clustermap(df_both,
loading_threshold=0.15, # To consider only top LRs
use_zscore=False,
figsize=(24, 5),
cbar_fontsize=20,
tick_fontsize=10,
cmap='YlOrBr',
filename=output_folder + 'Clustermap-Factor-specific-LRs.pdf',
row_cluster=False
)