Patterns of cell-cell communication and transcription factors (DecoupleR)¶
Previously we showed how Tensor-cell2cell can be used to extract patterns of multiple modalities of cell-cell communication (CCC). Here we will show how our CTCA can be extended to couple CCC and downstream transcription factor activity. Here we use the tools cell2cell and decoupleR-py to predict CCC and transcription factor activity, respectively.
In this case, we use samples from cortical brain organoids across different time points, previously published by Trujillo et al (2019).
use_gpu = False
import tensorly as tl
if use_gpu:
tl.set_backend('pytorch')
Importing packages to use
import cell2cell as c2c
import scanpy as sc
import decoupler as dc
import numpy as np
import pandas as pd
from tqdm import tqdm
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
IMPORTANT: In this notebook, the version 0.8.0 of cell2cell is used. Older versions do not implement the Coupled Tensor Component Analysis (CTCA).
1 - Load Data¶
Specify folders where the data is located, and where the outputs will be written:
import os
data_folder = './data/'
directory = os.fsencode(data_folder)
output_folder = './results/Tensor-cell2cell/'
if not os.path.isdir(output_folder):
os.makedirs(output_folder)
RNA-seq data¶
A preprocessed version of the original paper (Trujillo et al, 2019) can be found here
rnaseq = sc.read_h5ad(data_folder + '/annotated_seurat_norm_harmony_2022.h5ad')
rnaseq = rnaseq.raw.to_adata()
rnaseq.obs.head(2)
| orig.ident | nCount_RNA | nFeature_RNA | percent.mt | RNA_snn_res.0.5 | seurat_clusters | celltype | old.ident | |
|---|---|---|---|---|---|---|---|---|
| Month-01_AAACCTGAGAATTGTG-1 | Month-01 | 6417.0 | 2431 | 2.898551 | 11 | 11 | Other | 11 |
| Month-01_AAACCTGAGCAACGGT-1 | Month-01 | 8522.0 | 2464 | 2.663694 | 6 | 6 | Progenitor | 6 |
Ligand-Receptor pairs¶
Different databases of ligand-receptor interactions could be used. We previously created a repository that includes many available DBs (https://github.com/LewisLabUCSD/Ligand-Receptor-Pairs). In this tutorial, we employ the ligand-receptor pairs from CellChat (https://doi.org/10.1038/s41467-021-21246-9), which includes multimeric protein complexes.
lr_pairs = pd.read_csv('https://raw.githubusercontent.com/LewisLabUCSD/Ligand-Receptor-Pairs/master/Human/Human-2020-Jin-LR-pairs.csv')
lr_pairs = lr_pairs.astype(str)
lr_pairs.head(2)
| interaction_name | pathway_name | ligand | receptor | agonist | antagonist | co_A_receptor | co_I_receptor | evidence | annotation | interaction_name_2 | ligand_symbol | receptor_symbol | ligand_ensembl | receptor_ensembl | interaction_symbol | interaction_ensembl | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TGFB1_TGFBR1_TGFBR2 | TGFb | TGFB1 | TGFbR1_R2 | TGFb agonist | TGFb antagonist | nan | TGFb inhibition receptor | KEGG: hsa04350 | Secreted Signaling | TGFB1 - (TGFBR1+TGFBR2) | TGFB1 | TGFBR1&TGFBR2 | ENSG00000105329 | ENSG00000106799&ENSG00000163513 | TGFB1^TGFBR1&TGFBR2 | ENSG00000105329^ENSG00000106799&ENSG00000163513 |
| 1 | TGFB2_TGFBR1_TGFBR2 | TGFb | TGFB2 | TGFbR1_R2 | TGFb agonist | TGFb antagonist | nan | TGFb inhibition receptor | KEGG: hsa04350 | Secreted Signaling | TGFB2 - (TGFBR1+TGFBR2) | TGFB2 | TGFBR1&TGFBR2 | ENSG00000092969 | ENSG00000106799&ENSG00000163513 | TGFB2^TGFBR1&TGFBR2 | ENSG00000092969^ENSG00000106799&ENSG00000163513 |
# interaction columns:
int_columns = ('ligand_symbol', 'receptor_symbol')
2 - Data Preprocessing¶
RNA-seq for CCC¶
Organize data to create tensor
First, define contexts
contexts = ['Month-01', 'Month-03', 'Month-06', 'Month-10']
Generate list of RNA-seq data containing all contexts (time points)
Here, we convert the single-cell expression levels into cell-type expression levels. A basic context-wise preprocessing is performed here, keeping only genes that are expressed at least in 4 single cells, and cell types per context with at least 20 cells.
Important: A list of aggregated gene expression matrices must be used for building the 4D-communication tensor. Each of the gene expression matrices is for one of the contexts, following the same order as the previouslist (contexts).
The aggregation here corresponds to mean expression value across cells of each cell type. Other aggregation methods could be used by changing the parameter method='average' in the aggregate_single_cell() function. Additionally, other gene expression values could be passed, using other preprocessing approaches such as log(x+1) or batch correction methods.
rnaseq_matrices = []
for context in tqdm(contexts):
meta_context = rnaseq.obs.loc[(rnaseq.obs['orig.ident'] == context) \
& (rnaseq.obs['celltype'] != 'Other') & (rnaseq.obs['celltype'] != 'MC')]
# Remove celltype per context that has few single cells
min_sc_number = 20
excluded_sc = []
for idx, row in (meta_context.groupby(['celltype'])[['celltype']].count() >= min_sc_number).iterrows():
if ~row['celltype']:
excluded_sc.append(idx)
meta_context = meta_context.loc[~meta_context['celltype'].isin(excluded_sc)]
cells = list(meta_context.index)
meta_context.index.name = 'barcode'
tmp_data = rnaseq[cells]
# Keep genes in each sample with at least 4 single cells expressing it
sc.pp.filter_genes(tmp_data, min_cells=4)
# Normalize and Log1p
sc.pp.normalize_total(tmp_data, target_sum=1e6)
sc.pp.log1p(tmp_data)
# Aggregate gene expression of single cells into cell types
exp_df = c2c.preprocessing.aggregate_single_cells(rnaseq_data=tmp_data.to_df(),
metadata=meta_context,
barcode_col='barcode',
celltype_col='celltype',
method='average',
)
rnaseq_matrices.append(exp_df)
100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00, 1.90s/it]
rnaseq_matrices is the list we will use for building the tensor based on protein-based ligand-receptor pairs.
LR pairs¶
Remove bidirectionality in the list of protein-based ligand-receptor pairs. That is, remove repeated interactions where both interactions are the same but in different order:
From this list:
| Ligand | Receptor |
|---|---|
| Protein A | Protein B |
| Protein B | Protein A |
We will have:
| Ligand | Receptor |
|---|---|
| Protein A | Protein B |
lr_pairs = c2c.preprocessing.ppi.remove_ppi_bidirectionality(ppi_data=lr_pairs,
interaction_columns=int_columns
)
Removing bidirectionality of PPI network
lr_pairs.shape
(1988, 17)
Generate a dictionary with function info for each LR pairs.
Keys are LIGAND_NAME^RECEPTOR_NAME (this is the same nomenclature that will be used when building the protein-based 4D-communication tensor later), and values are the function in the annotation column in the dataframe containing ligand-receptor pairs. Other functional annotations of the LR pairs could be used if available.
ppi_functions = dict()
for idx, row in lr_pairs.iterrows():
ppi_functions[row['interaction_symbol']] = row['annotation']
3 - Predict transcription factor activities with DecoupleR¶
Here, we use the python version of DecoupleR to predict Transcription Factor (TF) activities based on their target gene expression, as indicated in this tutorial.
We start collecting the database linking TFs and their target genes:
collectri = dc.op.collectri(organism="human")
collectri.head()
| source | target | weight | resources | references | sign_decision | |
|---|---|---|---|---|---|---|
| 0 | MYC | TERT | 1.0 | DoRothEA-A;ExTRI;HTRI;NTNU.Curated;Pavlidis202... | 10022128;10491298;10606235;10637317;10723141;1... | PMID |
| 1 | SPI1 | BGLAP | 1.0 | ExTRI | 10022617 | default activation |
| 2 | SMAD3 | JUN | 1.0 | ExTRI;NTNU.Curated;TFactS;TRRUST | 10022869;12374795 | PMID |
| 3 | SMAD4 | JUN | 1.0 | ExTRI;NTNU.Curated;TFactS;TRRUST | 10022869;12374795 | PMID |
| 4 | STAT5A | IL2 | 1.0 | ExTRI | 10022878;11435608;17182565;17911616;22854263;2... | default activation |
Then, we processs the single-cell data to obtain gene expression as required for decoupler inputs:
adata = rnaseq.copy()
# Normalize and Log1p
sc.pp.normalize_total(adata, target_sum=1e6)
sc.pp.log1p(adata)
We then proceed to compute the TF activities
dc.mt.ulm(data=adata, net=collectri)
And gather the activities into a AnnData object:
tf_scores = dc.pp.get_obsm(adata=adata, key="score_ulm")
Similar to the CCC part where we prepared the gene expression data, here we convert the single-cell TF activities into cell-type activity levels.
A basic context-wise preprocessing is performed here, keeping only cell types per context with at least 20 cells.
tf_matrices = []
# This is to keep track of min and max aggregated values per context
max_matrices = pd.DataFrame()
min_matrices = pd.DataFrame()
for context in tqdm(contexts):
meta_context = rnaseq.obs.loc[(rnaseq.obs['orig.ident'] == context) \
& (rnaseq.obs['celltype'] != 'Other') & (rnaseq.obs['celltype'] != 'MC')]
# Remove celltype per context that has few single cells
min_sc_number = 20
excluded_sc = []
for idx, row in (meta_context.groupby(['celltype'])[['celltype']].count() >= min_sc_number).iterrows():
if ~row['celltype']:
excluded_sc.append(idx)
meta_context = meta_context.loc[~meta_context['celltype'].isin(excluded_sc)]
cells = list(meta_context.index)
meta_context.index.name = 'barcode'
tmp_data = tf_scores[cells]
# Aggregate gene expression of single cells into cell types
exp_df = c2c.preprocessing.aggregate_single_cells(rnaseq_data=tmp_data.to_df(),
metadata=meta_context,
barcode_col='barcode',
celltype_col='celltype',
method='average',
)
tf_matrices.append(exp_df)
max_matrices = pd.concat([max_matrices, exp_df.max(axis=1)], axis=1)
min_matrices = pd.concat([min_matrices, exp_df.min(axis=1)], axis=1)
100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 11.06it/s]
Once we colected the TF activities (in tf_matrices), we proceed to min-max normalize them, so we get the Z-scores in a range of 0 and 1; otherwise these values could not be directly used by Tensor-cell2cell as it requires non-negative values:
tf_activities = [((tf.T - min_matrices.min(axis=1)) / (max_matrices.max(axis=1) - min_matrices.min(axis=1))).T for tf in tf_matrices]
4 - Build Tensors¶
Build 4D-Communication Tensor¶
Here we use as input the list of gene expression matrices (rnaseq_matrices) that were aggregated into a cell-type granularity. This list contains the expression matrices of all samples/contexts.
The following functions perform all the steps to build a 4D-communication tensor:

The parameter communication_score indicates the scoring function to use.
how='outer_cells' considers only LR pairs that are present in all contexts, while all cell types in the whole dataset are kept, regardless they appear only in one or a few contexts.
complex_sep='&' is used to specify that the list of ligand-receptor pairs contains protein complexes and that subunits are separated by '&'. If the list does not have complexes, use complex_sep=None instead.
tensor_lr = c2c.tensor.InteractionTensor(rnaseq_matrices=rnaseq_matrices,
ppi_data=lr_pairs,
context_names=contexts,
how='outer_cells',
complex_sep='&',
interaction_columns=int_columns,
communication_score='expression_gmean',
order_labels=['Contexts', 'Protein LRIs', 'Sender Cells', 'Receiver Cells'],
device='cuda' if use_gpu else 'cpu'
)
Getting expression values for protein complexes Building tensor for the provided context
tensor_lr.tensor.shape
(4, 422, 5, 5)
Build 3D TF Activity Tensor¶
To build our 3D tensor with dimensions as contexts, receiver cells, and transcription factors, we need to make sure that all cell types in the 4D tensor are present in each matrix (for those missing we will assign a NaN value).
We get the list of cell types:
celltypes = tensor_lr.order_names[-1]
Then we reindex all matrices with transcription factors (this will generate missing columns, and fill them with NaNs):
tf_activities_nan = [tf.reindex(columns=celltypes) for tf in tf_activities]
Now we can use these DecoupleR outputs directly to generate our 3D transcription activity tensor. First we generate just the tensor containing the values using tensorly:
tensor_3d = tl.tensor(tf_activities_nan)
tensor_3d.shape
(4, 728, 5)
We then obtain the name of TFs present in the data:
tf_names = tf_activities_nan[0].index.tolist()
Now we take this tensor to generate an InteractionTensor object using cell2cell:
tensor_tf = c2c.tensor.PreBuiltTensor(tensor_3d,
order_names=[contexts, tf_names, celltypes],
order_labels=['Contexts', 'TFs', 'Receiver Cells']
)
tensor_tf.tensor.shape
(4, 728, 5)
5 - Generate coupled tensors¶
After constructing our 4D communication tensor and 3D TF activity tensor, we can create a CoupledInteractionTensor object in Tensor-cell2cell v2. This object will later be used to perform CTCA.
The key step here is to define which dimensions should be coupled between the tensors. In this example:
- Context dimensions are coupled directly, since they are identical across both tensors.
- Cell type dimension (in the TF tensor) can be coupled either with the sender or receiver dimension (in the communication tensor):
- Couple to sender: TF activity is aligned with the cells producing the ligand of each LR pair. This suggests that TFs may regulate or coordinate with the expression of those ligands in the sender cells.
- Couple to receiver: TF activity is aligned with the cells receiving the ligand via the receptor. This suggests that TFs are potentially activated downstream of the receptor, capturing the intracellular signaling coordinated with the communication programs.
In this tutorial, we choose the second option (coupling to the receiver). This means we focus on TF activity triggered by LR interactions in the CCC program.
Concretely, the context dimension corresponds to dim 0 in both tensors, while the receiver cell dimension is dim 3 in the 4D communication tensor and dim 2 in the 3D TF activity tensor.
coupled_tensor = c2c.tensor.CoupledInteractionTensor(tensor1=tensor_lr,
tensor2=tensor_tf,
tensor1_name='cell2cell',
tensor2_name='decoupler',
# Below we specify how dims of tensor1 are coupled with dims of tensor2
mode_mapping={'shared' : [(0,0), (3,2)]}
)
We can observe that this object includes both tensors simultaneously.
coupled_tensor.shape
((4, 422, 5, 5), (4, 728, 5))
6 - Run Analysis¶
Perform tensor factorization¶
Next, we apply a non-negative canonical coupled component analysis (CTCA). Briefly, this tensor decomposition identifies a low-rank tensor (here, a rank of 10) for each modality's tensor that better approximates each original tensor. These low-rank tensors can be represented as the sum of a set of rank-1 tensors (10 of them in this case). Each rank-1 tensor represents a factor in the decomposition and can be further represented as the outer product of n vectors, where n represents the number of tensor dimensions. Each vector represents one of the n tensor dimensions for that factor and its values, corresponding to individual elements in each dimension, represent the factor loadings.
In the CTCA, we have vectors that are shared between the coupled tensors (here contexts and receiver cells), while we have other vectors that are private for each modality (LR pair and sender cells for the 4D communication tensor; transcription factors for the 3D TF activity tensor). The loadings shared between both tensors behave exactly the same in both modalities, representing that we are capturing LR pairs and TFs from each modality that present similar trends in each factor. If each tensor was decomposed separately with the standard TCA in Tensor-cell2cell v1, we would not ensure that the key LR pair and TFs for each factor would behave the same, as this could lead to loadings for the context and receiver cells with different behaviors between modalities.
To illustrate how performing the CTCA for identifying LR pairs acting coordinated with TF activities, we skipped the Elbow Analysis, which can be found in our previous tutorial for protein- and metabolite-based LR pairs. Instead, we run the factorization using a manual number of 10 factors:
coupled_tensor.compute_tensor_factorization(rank=10, # This is just for illustrative purposes - to keep a low number of factors
init='svd',
random_state=888,
runs=1, tol=1e-8, n_iter_max=500 # Robust version of tensor decomposition
)
7 - Results¶
Metadata¶
Generate a list containing metadata for each tensor order/dimension - Later used for coloring factor plots
This is a list containing metadata for each dimension of the tensor (contexts, LR pairs, sender cells, receiver cells). Each metadata corresponds to a dataframe with info of each element in the respective dimension.
We start with the protein-based tensor.
meta_tf = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_lr,
metadata_dicts=[None, ppi_functions, None, None],
fill_with_order_elements=True
)
We continue with the TF activity tensor.
tf_functions = {k: 'TF' for k in tensor_tf.order_names[1]}
meta_tf2 = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_tf,
metadata_dicts=[None, tf_functions, None],
fill_with_order_elements=True
)
Generate unified metadata for CTCA results
By using the metadata generated separately for each tensor, we generate a new metadata that incorporates metadata for the shared and private dimensions between both modalities.
new_meta_tf = coupled_tensor.reorder_metadata(meta_tf, meta_tf2)
Plot factor loadings¶
After performing the CTCA, a set of loadings is obtained in a factor-specific way, for each tensor. These loadings provide details of what elements of each dimension are important within each factor. Here we plot first the loadings for the shared dimensions, followed by the private dimensions of the first and second tensor (LR pairs and TFs, respectively).
# Color palettes for each of the coupled tensor dimensions, ordered as order_sorting list below
cmaps = ['viridis', 'tab20', 'tab20', 'Dark2', 'Dark2']
factors, axes = c2c.plotting.tensor_factors_plot(interaction_tensor=coupled_tensor,
metadata=new_meta_tf,
sample_col='Element',
group_col='Category',
meta_cmaps=cmaps,
order_sorting=['Contexts', 'Sender Cells', 'Receiver Cells',
'Protein LRIs', 'TFs'
],
fontsize=14,
filename=output_folder + 'Tensor-Factorization.pdf'
)
Each factor represents a context-dependent communication pattern. In this visualization, each row is a factor and each column is a tensor dimension in that factor (Shared dimensions: contexts and receiver-cells; Private dimensions: LR pairs, sender-cells, and transcription factors).
The loadings represent the contribution of each element to that pattern, and their combinations between all dimensions represent the overall pattern. For example, in Factor 2, the identified communication pattern occurs in early stages (month 1), where all cell types but GABAergic neurons participate as sender cells. In particular, the signaling by receiver cells seems linked to glial cells and progenitors. Then we can identify the top LR pairs and TFs contributing from cognate modalities. Thus, we can say that the receptors of these LR pairs act in coordination with the TFs in the receiver cells.
Top LR pairs and TFs¶
Top-10 LR pairs
for i in range(coupled_tensor.rank):
print(coupled_tensor.get_top_factor_elements('Protein LRIs', 'Factor {}'.format(i+1), 10))
print('')
PTN^NCL 0.464370 PTN^PTPRZ1 0.441668 MDK^PTPRZ1 0.404244 MDK^NCL 0.392486 CD99^CD99 0.185811 MDK^LRP1 0.182649 PTN^SDC3 0.156745 CDH2^CDH2 0.147477 PTN^SDC1 0.096327 MDK^SDC1 0.093366 Name: Factor 1, dtype: float64 MDK^NCL 0.352663 CDH2^CDH2 0.304189 CD99^CD99 0.281941 MDK^SDC2 0.281588 PTN^NCL 0.204081 MDK^ITGA6&ITGB1 0.178264 PTN^SDC2 0.163884 GAS6^AXL 0.158380 MDK^LRP1 0.157087 MPZL1^MPZL1 0.110423 Name: Factor 2, dtype: float64 PTN^PTPRZ1 0.512311 MDK^PTPRZ1 0.496919 PTN^NCL 0.328947 MDK^NCL 0.268683 CD99^CD99 0.253674 CDH2^CDH2 0.175823 MDK^ITGA6&ITGB1 0.175633 NCAM1^FGFR1 0.143054 PTN^SDC2 0.129606 MDK^LRP1 0.102070 Name: Factor 3, dtype: float64 MDK^NCL 0.494203 CD99^CD99 0.426234 MDK^PTPRZ1 0.408034 MDK^SDC2 0.193602 PTN^NCL 0.175574 MDK^LRP1 0.157664 CADM1^CADM1 0.151180 PTN^PTPRZ1 0.135335 EFNA3^EPHA5 0.129642 EFNA3^EPHA4 0.126296 Name: Factor 4, dtype: float64 MDK^NCL 0.438771 CDH2^CDH2 0.284537 CD99^CD99 0.273331 PTN^NCL 0.221741 MDK^SDC2 0.202529 MDK^LRP1 0.193363 CADM1^CADM1 0.167788 MPZL1^MPZL1 0.147171 EFNB3^EPHB6 0.132789 NCAM1^NCAM1 0.129871 Name: Factor 5, dtype: float64 MDK^NCL 0.374700 MDK^SDC2 0.359838 DLL3^NOTCH2 0.284931 NCAM1^FGFR1 0.272678 DLK1^NOTCH2 0.248256 CDH2^CDH2 0.220813 MDK^ITGA6&ITGB1 0.187641 CD99^CD99 0.179746 MDK^SDC4 0.169932 MDK^LRP1 0.169848 Name: Factor 6, dtype: float64 PTN^PTPRZ1 0.423821 MDK^PTPRZ1 0.371672 PTN^NCL 0.324885 PTN^SDC3 0.294610 MDK^NCL 0.258867 CD99^CD99 0.190757 MDK^LRP1 0.186956 CDH2^CDH2 0.184021 PTN^SDC2 0.154907 MDK^SDC2 0.121523 Name: Factor 7, dtype: float64 CDH2^CDH2 0.521731 NCAM1^NCAM1 0.438277 MDK^NCL 0.435396 PTN^NCL 0.282249 CADM1^CADM1 0.269076 CDH4^CDH4 0.170086 NAMPT^INSR 0.119082 EFNB2^EPHB2 0.106536 EFNB1^EPHB2 0.098939 SEMA3C^NRP1&PLXNA3 0.081387 Name: Factor 8, dtype: float64 MDK^SDC2 0.455132 PTN^SDC2 0.441950 PTN^NCL 0.405290 MDK^NCL 0.359242 CD99^CD99 0.297494 MDK^LRP1 0.179619 MDK^SDC4 0.126331 CDH2^CDH2 0.121484 PTN^SDC4 0.111805 MDK^ITGA6&ITGB1 0.103127 Name: Factor 9, dtype: float64 NCAM1^NCAM1 0.419227 CADM1^CADM1 0.318216 NRXN1^NLGN2 0.220669 NRXN1^NLGN1 0.212132 CADM3^CADM3 0.175772 PTN^PTPRZ1 0.170832 MDK^PTPRZ1 0.168097 PTN^NCL 0.166271 NRXN1^NLGN3 0.162873 L1CAM^L1CAM 0.162465 Name: Factor 10, dtype: float64
Top 5 TFs
for i in range(coupled_tensor.rank):
print(coupled_tensor.get_top_factor_elements('TFs', 'Factor {}'.format(i+1), 5))
print('')
NEUROG1 0.117618 HEYL 0.111762 NEUROD2 0.109378 DACH1 0.108111 CRX 0.105775 Name: Factor 1, dtype: float64 TCF7L1 0.102091 CTCFL 0.101045 ELF5 0.101010 MXI1 0.100094 HES5 0.099838 Name: Factor 2, dtype: float64 HOXA7 0.078940 HDGF 0.074023 HMGA2 0.073552 SOX3 0.073537 RBPJ 0.072943 Name: Factor 3, dtype: float64 ZNF143 0.113813 CTCFL 0.107025 ZGPAT 0.105305 EN1 0.104683 SREBF2 0.103721 Name: Factor 4, dtype: float64 KLF3 0.077042 BHLHE41 0.076295 HLX 0.075530 NHLH2 0.074782 KDM5A 0.074587 Name: Factor 5, dtype: float64 SOX9 0.111740 ZNF384 0.109957 TCF7L2 0.108080 FEZF1 0.105497 VHL 0.103428 Name: Factor 6, dtype: float64 HIC1 0.095895 HOXD3 0.090289 HIVEP2 0.087248 GTF2I 0.084096 CIITA 0.082000 Name: Factor 7, dtype: float64 MYOCD 0.106479 HCFC1 0.101568 TBXT 0.096313 RFXANK 0.095671 SHOX2 0.095202 Name: Factor 8, dtype: float64 ONECUT1 0.094022 MITF 0.092153 SP4 0.089893 ELF5 0.089841 CRX 0.089732 Name: Factor 9, dtype: float64 HOXA11 0.176306 GFI1 0.167475 MSX1 0.166739 HOXC8 0.163315 GATA5 0.162388 Name: Factor 10, dtype: float64
Export Loadings¶
THESE VALUES CAN BE USED FOR DOWNSTREAM ANALYSES
coupled_tensor.export_factor_loadings(output_folder + 'Coupled-Loadings.xlsx')
Coupled tensor factor loadings saved to ./results/Tensor-cell2cell/Coupled-Loadings.xlsx
8 - Downstream Analyses¶
Generate factor-specific networks¶
We can generate factor-specific communication networks connecting sender-to-receiver cells.
networks = c2c.analysis.tensor_downstream.get_factor_specific_ccc_networks(coupled_tensor.factors,
sender_label='Sender Cells',
receiver_label='Receiver Cells')
Generate matrix of cell-cell pairs by factors
network_by_factors = c2c.analysis.tensor_downstream.flatten_factor_ccc_networks(networks, orderby='receivers')
Select cell-cell pairs with high potential of interaction
_ = plt.hist(network_by_factors.values.flatten(), bins = 20)
# To consider only cell pairs with high potential to interact in each factor
cci_threshold = 0.2
Visualize networks of factors with significant differences between groups
net_fig, net_ax = c2c.plotting.ccc_networks_plot(coupled_tensor.factors,
included_factors=None,
ccc_threshold=cci_threshold, # Only important communication
nrows=2,
panel_size=(8, 8),
network_layout='circular',
filename=output_folder + 'Networks.pdf'
)
Clustermaps¶
Similarly, we can visualize the contribution of each dimension element across factors by plotting their loadings
factor_sorted = ['Factor {}'.format(f) for f in range(1, coupled_tensor.rank+1)]
Cluster cell-cell pairs by their potential across factors
Here we display the outer product between the sender and receiver cell dimensions, which are the value governing the factor-specific networks above.
cci_cm = c2c.plotting.loading_clustermap(network_by_factors[factor_sorted],
use_zscore=False,
loading_threshold=cci_threshold, # To consider relevant CCIs
figsize=(20, 9),
tick_fontsize=16,
cmap='YlOrBr',
filename=output_folder + 'Clustermap-Factor-specific-CCI.pdf',
row_cluster=False
)
Cluster LR pairs by their importance across factors
We can plot the key LR pairs from the CCC modality and TFs from the TF activity modality. First we identify what could represent a high loading given their separate distributions.
_ = plt.hist(coupled_tensor.factors['Protein LRIs'].values.flatten(), bins = 20)
_ = plt.hist(coupled_tensor.factors['TFs'].values.flatten(), bins = 20)
And set and arbitrary threshold to filter out those that have low loadings
# To consider only most relevant LR pairs
lr_threshold = 0.1
tf_threshold = 0.1
Since we have loadings from different modalities, we want to visualize them in a comparable way, so we normalize each of them by their maximum value in each case.
df1 = coupled_tensor.factors['Protein LRIs'].copy()
df1 = df1[(df1.T > lr_threshold).any()]
df1 = (df1 / df1.max())
df2 = coupled_tensor.factors['TFs'].copy()
df2 = df2[(df2.T > tf_threshold).any()]
df2 = (df2 / df2.max())
lr_cm = c2c.plotting.loading_clustermap(df1,
loading_threshold=lr_threshold, # To consider only top LRs
use_zscore=False,
figsize=(10, 5),
cbar_fontsize=20,
tick_fontsize=10,
cmap='YlOrBr',
filename=output_folder + 'Clustermap-Factor-specific-ProtLRs.pdf',
row_cluster=False
)
lr_cm = c2c.plotting.loading_clustermap(df2,
loading_threshold=tf_threshold, # To consider only top LRs
use_zscore=False,
figsize=(16, 5),
cbar_fontsize=20,
tick_fontsize=10,
cmap='YlOrBr',
filename=output_folder + 'Clustermap-Factor-specific-TFs.pdf',
row_cluster=False
)
Given that we max-scaled them, we can also put them together in an unified plot
df = pd.concat([df1, df2])
lr_cm = c2c.plotting.loading_clustermap(df,
loading_threshold=0.1, # To consider only top LRs and TFs
use_zscore=False,
figsize=(24, 5),
cbar_fontsize=20,
tick_fontsize=10,
cmap='YlOrBr',
filename=output_folder + 'Clustermap-Factor-specific-LR-TFs.pdf',
row_cluster=False
)