Patterns of protein- and metabolite-based cell-cell communication (MEBOCOST)¶
This tutorial is focused on running Tensor-cell2cell v2 using the tools cell2cell and MEBOCOST to predict protein- and metabolite-mediated cell-cell communication (CCC), respectively.
In this case, we use samples from cortical brain organoids across different time points, previously published by Trujillo et al (2019).
Read this if you plan using a GPU to speed-up the analysis. Before running this notebook, make sure to have a proper NVIDIA GPU driver (https://www.nvidia.com/Download/index.aspx) as well as the CUDA toolkit (https://developer.nvidia.com/cuda-toolkit) installed.
Then, make sure to create an environment with Pytorch >= v1.8.0 following these instructions to enable CUDA.
https://pytorch.org/get-started/locally/
Once you have everything installed, run the next blocks two blocks of code before everything.
If you are using a NVIDIA GPU, with PyTorch and CUDA, set the following variable to be True
use_gpu = True
, otherwise set it use_gpu = False
use_gpu = True
import tensorly as tl
if use_gpu:
tl.set_backend('pytorch')
Importing packages to use
import cell2cell as c2c
import scanpy as sc
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
IMPORTANT: In this notebook, the version 0.8.0 of cell2cell is used. Older versions do not implement the Coupled Tensor Component Analysis (CTCA).
1 - Load Data¶
Specify folders where the data is located, and where the outputs will be written:
import os
data_folder = './data/'
directory = os.fsencode(data_folder)
output_folder = './results/Tensor-cell2cell/'
if not os.path.isdir(output_folder):
os.mkdir(output_folder)
Import 4D communication tensors¶
We will import the communication tensors created for both modalities (protein- and metabolite-mediated CCC), as previously done.
tensor_lr = c2c.io.load_tensor('./data/Tensor-cell2cell/Initial-c2c-Tensor.pkl')
tensor_met = c2c.io.load_tensor('./data/Tensor-cell2cell/Initial-MEBOCOST-Tensor.pkl')
meta_tf = c2c.io.load_variable_with_pickle('./data/Tensor-cell2cell/Meta-Initial-c2c-Tensor.pkl')
meta_tf2 = c2c.io.load_variable_with_pickle('./data/Tensor-cell2cell/Meta-Initial-MEBOCOST-Tensor.pkl')
2 - Generate coupled tensors¶
Once we have loaded our 4D tensors, we start creating a CoupledInteractionTensor
object in Tensor-cell2cell v2. This object will later perform the CTCA.
coupled_tensor = c2c.tensor.CoupledInteractionTensor(tensor1=tensor_lr,
tensor2=tensor_met,
tensor1_name='cell2cell',
tensor2_name='mebocost',
mode_mapping={'shared' : [(0,0), (2,2), (3,3)]}
)
We can observe that this object includes both tensors simultaneously.
coupled_tensor.shape
(torch.Size([4, 422, 5, 5]), torch.Size([4, 52, 5, 5]))
Assume that missing values are real zeros
Instead of imputing missing values with our method, we will assume that they are real zeros.
coupled_tensor.mask1 = None
coupled_tensor.mask2 = None
3 - Run Analysis¶
Elbow analysis¶
For selecting the number of factors to decompose CCC, we could do it either manually or automatically from an Elbow analysis.
In this case, after running the Elbow analysis, we manually selected the number of factors. Change the parameter automatic_elbow
to be True
if you want to automatically select the number of factors.
elbow, error = coupled_tensor.elbow_rank_selection(upper_rank=25,
runs=20,
init='svd',
automatic_elbow=False,
random_state=888,
)
plt.savefig(output_folder + 'Elbow.pdf', dpi=300, bbox_inches='tight')
100%|██████████| 25/25 [05:03<00:00, 12.14s/it]
The rank at the elbow is: 18
Here, the error measures how different is the original tensor from the sum of R rank-1 tensors used to approximate it. The idea is to find a good trade-off between a small number of factors and a small error.
Perform tensor factorization¶
Next, we apply a non-negative canonical coupled component analysis (CTCA) – the same performed in the elbow analysis above. Briefly, this tensor decomposition identifies a low-rank tensor. Briefly, tensor decomposition identifies a low-rank tensor (here, a rank of 6) for each modality's tensor that better approximates each original tensor. These low-rank tensors can be represented as the sum of a set of rank-1 tensors (6 of them in this case). Each rank-1 tensor represents a factor in the decomposition and can be further represented as the outer product of n vectors, where n represents the number of tensor dimensions. Each vector represents one of the n tensor dimensions for that factor and its values, corresponding to individual elements in each dimension, represent the factor loadings.
In the CTCA, we have vectors that are shared between the coupled tensors (here contexts, sender cells, and receiver cells), while we have other vectors that are private for each modality (protein-based LR pair and metabolite-based LR pair dimensions). The loadings shared between both tensors behave exactly the same in both modalities, representing that we are capturing LR pairs of each modality that present similar trends in each factor. If each tensor was decomposed separately with the standard TCA in Tensor-cell2cell v1, we would not ensure that the key LR pairs for each factor would behave the same, as this could lead to loadings for the context, sender cells and receiver cells with different behaviors between modalities.
coupled_tensor.compute_tensor_factorization(rank=6, # This is just for illustrative purposes - to keep a low number of factors
init='svd',
random_state=888,
runs=100, tol=1e-8, n_iter_max=500 # Robust version of tensor decomposition
)
100%|██████████| 100/100 [03:22<00:00, 2.02s/it]
Best coupled model has a combined normalized error of: 0.604
3 - Results¶
Generate unified metadata for CTCA results
By using the metadata generated separately for each tensor, we generate a new metadata that incorporates metadata for the shared and private dimensions between both modalities.
new_meta_tf = coupled_tensor.reorder_metadata(meta_tf, meta_tf2)
Plot factor loadings¶
After performing the CTCA, a set of loadings is obtained in a factor-specific way, for each tensor. These loadings provide details of what elements of each dimension are important within each factor. Here we plot first the loadings for the shared dimensions, followed by the private dimensions of the first and second tensor (protein- and metabolite-based, respectively).
# Color palettes for each of the coupled tensor dimensions, ordered as in coupled_tensor.factors.keys()
cmaps = ['viridis', 'tab20', 'tab20', 'Dark2', 'Dark2', ]
factors, axes = c2c.plotting.tensor_factors_plot(interaction_tensor=coupled_tensor,
metadata=new_meta_tf,
sample_col='Element',
group_col='Category',
meta_cmaps=cmaps,
order_sorting=['Contexts', 'Sender Cells', 'Receiver Cells', # Shared dimensions
'Protein LRIs', 'Metabolite LRIs' # Private dimensions
],
fontsize=14,
filename=output_folder + 'Tensor-Factorization.pdf'
)
Each factor represents a context-dependent communication pattern. In this visualization, each row is a factor and each column is a tensor dimension in that factor (Shared dimensions: contexts, sender-cells, and receiver-cells; Private dimensions: Protein LR pairs and Metabolite LR pairs).
The loadings represent the contribution of each element to that pattern, and their combinations between all dimensions represent the overall pattern. For example, in Factor 1, the identified communication pattern occurs in early stages (month 1), where all cell types but GABAergic neurons participate as sender and receiver cells. Then we can identify the top LR pairs contributing from each modality. Thus, those that are top ranked for the protein modality act in coordination with those top ranked for the metabolite modality.
Top LR pairs¶
Top-5 LR pairs
for i in range(coupled_tensor.rank):
print(coupled_tensor.get_top_factor_elements('Protein LRIs', 'Factor {}'.format(i+1), 5))
print('')
MDK^NCL 0.407387 CDH2^CDH2 0.307432 MDK^SDC2 0.285180 CD99^CD99 0.269068 PTN^NCL 0.224309 Name: Factor 1, dtype: float64 MDK^PTPRZ1 0.471079 MDK^NCL 0.462388 CD99^CD99 0.382835 PTN^NCL 0.349663 PTN^PTPRZ1 0.329195 Name: Factor 2, dtype: float64 PTN^PTPRZ1 0.441224 PTN^NCL 0.433979 MDK^PTPRZ1 0.403027 MDK^NCL 0.361902 CDH2^CDH2 0.276318 Name: Factor 3, dtype: float64 MDK^SDC2 0.509185 PTN^SDC2 0.429248 CD99^CD99 0.287311 MDK^ITGA6&ITGB1 0.225686 MDK^NCL 0.208107 Name: Factor 4, dtype: float64 NCAM1^NCAM1 0.385062 PTN^NCL 0.302485 PTN^PTPRZ1 0.290058 MDK^PTPRZ1 0.282025 CADM1^CADM1 0.275014 Name: Factor 5, dtype: float64 NCAM1^NCAM1 0.329649 MDK^LRP1 0.295148 CADM1^CADM1 0.284719 MDK^NCL 0.284322 PTN^SDC3 0.223877 Name: Factor 6, dtype: float64
for i in range(coupled_tensor.rank):
print(coupled_tensor.get_top_factor_elements('Metabolite LRIs', 'Factor {}'.format(i+1), 5))
print('')
Farnesyl pyrophosphate^LPAR2 0.388216 Arachidonic acid^ANXA2 0.373286 Retinal^RARA 0.303664 Thyroxine^THRA 0.259081 Arachidonic acid^S100A10 0.254810 Name: Factor 1, dtype: float64 Thyroxine^THRA 0.479568 Iron^TFRC 0.432253 Retinal^RARG 0.329154 Farnesyl pyrophosphate^LPAR2 0.268704 Desmosterol^NR1H2 0.245748 Name: Factor 2, dtype: float64 Iron^TFRC 0.577320 Retinal^RARB 0.393141 Retinal^RXRA 0.380903 gamma-Aminobutyric acid^GABBR1 0.316231 5-HETE^TRPV1 0.226016 Name: Factor 3, dtype: float64 Sphingosine^TRPM3 0.430887 Arachidonic acid^ANXA2 0.394077 Iron^TFRC 0.373948 Arachidonic acid^S100A10 0.360215 Retinal^RXRB 0.356561 Name: Factor 4, dtype: float64 Farnesyl pyrophosphate^LPAR2 0.768104 Desmosterol^NR1H2 0.464831 Taurine^GLRB 0.272342 Testosterone^AR 0.263091 Retinal^RARG 0.178612 Name: Factor 5, dtype: float64 Retinal^RXRB 0.465276 gamma-Aminobutyric acid^GABRB3 0.367336 Retinal^RARA 0.342311 Iron^TFR2 0.325170 gamma-Aminobutyric acid^GABBR1 0.302204 Name: Factor 6, dtype: float64
Export Loadings¶
THESE VALUES CAN BE USED FOR DOWNSTREAM ANALYSES
coupled_tensor.export_factor_loadings(output_folder + 'Coupled-Loadings.xlsx')
Coupled tensor factor loadings saved to /results/Tensor-cell2cell/Coupled-Loadings.xlsx
4 - Downstream Analyses¶
Generate factor-specific networks¶
We can generate factor-specific communication networks connecting sender-to-receiver cells.
networks = c2c.analysis.tensor_downstream.get_factor_specific_ccc_networks(coupled_tensor.factors,
sender_label='Sender Cells',
receiver_label='Receiver Cells')
Generate matrix of cell-cell pairs by factors
network_by_factors = c2c.analysis.tensor_downstream.flatten_factor_ccc_networks(networks, orderby='receivers')
Select cell-cell pairs with high potential of interaction
_ = plt.hist(network_by_factors.values.flatten(), bins = 20)
# To consider only cell pairs with high potential to interact in each factor
cci_threshold = 0.2
Visualize networks of factors with significant differences between groups
net_fig, net_ax = c2c.plotting.ccc_networks_plot(coupled_tensor.factors,
included_factors=None,
ccc_threshold=cci_threshold, # Only important communication
nrows=2,
panel_size=(8, 8),
network_layout='circular',
filename=output_folder + 'Networks.svg'
)
Clustermaps¶
Similarly, we can visualize the contribution of each dimension element across factors by plotting their loadings
factor_sorted = ['Factor {}'.format(f) for f in range(1, coupled_tensor.rank+1)]
Cluster cell-cell pairs by their potential across factors
Here we display the outer product between the sender and receiver cell dimensions, which are the value governing the factor-specific networks above.
cci_cm = c2c.plotting.loading_clustermap(network_by_factors[factor_sorted],
use_zscore=False,
loading_threshold=cci_threshold, # To consider relevant CCIs
figsize=(20, 9),
tick_fontsize=16,
cmap='YlOrBr',
filename=output_folder + 'Clustermap-Factor-specific-CCI.pdf',
row_cluster=False
)
Cluster LR pairs by their importance across factors
We can plot the key LR pairs from each modality. First we identify what could represent a high loading given their separate distributions.
_ = plt.hist(coupled_tensor.factors['Protein LRIs'].values.flatten(), bins = 20)
_ = plt.hist(coupled_tensor.factors['Metabolite LRIs'].values.flatten(), bins = 20)
And set and arbitrary threshold to filter out those that have low loadings
# To consider only most relevant LR pairs
lr_threshold1 = 0.1
lr_threshold2 = 0.2
Since we have loadings from different modalities, we want to visualize them in a comparable way, so we normalize each of them by their maximum value in each case.
df1 = coupled_tensor.factors['Protein LRIs'].copy()
df1 = df1[(df1.T > lr_threshold1).any()]
df1 = (df1 / df1.max().max())
df2 = coupled_tensor.factors['Metabolite LRIs'].copy()
df2 = df2[(df2.T > lr_threshold2).any()]
df2 = (df2 / df2.max().max())
lr_cm = c2c.plotting.loading_clustermap(df1,
loading_threshold=lr_threshold1, # To consider only top LRs
use_zscore=False,
figsize=(10, 5),
cbar_fontsize=20,
tick_fontsize=10,
cmap='YlOrBr',
filename=output_folder + 'Clustermap-Factor-specific-ProtLRs.pdf',
row_cluster=False
)
lr_cm = c2c.plotting.loading_clustermap(df2,
loading_threshold=lr_threshold2, # To consider only top LRs
use_zscore=False,
figsize=(8, 6),
cbar_fontsize=20,
tick_fontsize=10,
cmap='YlOrBr',
filename=output_folder + 'Clustermap-Factor-specific-MetLRs.pdf',
row_cluster=False
)
Given that we max-scaled them, we can also put them together in an unified plot
df = pd.concat([df1, df2])
lr_cm = c2c.plotting.loading_clustermap(df,
loading_threshold=0.15, # To consider only top LRs
use_zscore=False,
figsize=(24, 5),
cbar_fontsize=20,
tick_fontsize=10,
cmap='YlOrBr',
filename=output_folder + 'Clustermap-Factor-specific-LRs.pdf',
row_cluster=False
)