Patterns of cell-cell communication and transcription factors (DecoupleR)¶

Previously we showed how Tensor-cell2cell can be used to extract patterns of multiple modalities of cell-cell communication (CCC). Here we will show how our CTCA can be extended to couple CCC and downstream transcription factor activity. Here we use the tools cell2cell and decoupleR-py to predict CCC and transcription factor activity, respectively.

In this case, we use samples from cortical brain organoids across different time points, previously published by Trujillo et al (2019).

In [1]:

Copied!

use_gpu = False

import tensorly as tl
if use_gpu:
    tl.set_backend('pytorch')
use_gpu = False

import tensorly as tl
if use_gpu:
    tl.set_backend('pytorch')

Importing packages to use

In [2]:

Copied!





import cell2cell as c2c
import scanpy as sc
import decoupler as dc

import numpy as np
import pandas as pd

from tqdm import tqdm

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import cell2cell as c2c
import scanpy as sc
import decoupler as dc

import numpy as np
import pandas as pd

from tqdm import tqdm

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [3]:

Copied!

import warnings
warnings.filterwarnings('ignore')
import warnings
warnings.filterwarnings('ignore')

IMPORTANT: In this notebook, the version 0.8.0 of cell2cell is used. Older versions do not implement the Coupled Tensor Component Analysis (CTCA).

1 - Load Data¶

Specify folders where the data is located, and where the outputs will be written:

In [4]:

Copied!





import os

data_folder = './data/'
directory = os.fsencode(data_folder)

output_folder = './results/Tensor-cell2cell/'
if not os.path.isdir(output_folder):
    os.makedirs(output_folder)
import os

data_folder = './data/'
directory = os.fsencode(data_folder)

output_folder = './results/Tensor-cell2cell/'
if not os.path.isdir(output_folder):
    os.makedirs(output_folder)

RNA-seq data¶

A preprocessed version of the original paper (Trujillo et al, 2019) can be found here

In [5]:

Copied!

rnaseq = sc.read_h5ad(data_folder + '/annotated_seurat_norm_harmony_2022.h5ad')
rnaseq = sc.read_h5ad(data_folder + '/annotated_seurat_norm_harmony_2022.h5ad')

In [6]:

Copied!

rnaseq = rnaseq.raw.to_adata()
rnaseq = rnaseq.raw.to_adata()

In [7]:

Copied!

rnaseq.obs.head(2)
rnaseq.obs.head(2)

Out[7]:

	orig.ident	nCount_RNA	nFeature_RNA	percent.mt	RNA_snn_res.0.5	seurat_clusters	celltype	old.ident
Month-01_AAACCTGAGAATTGTG-1	Month-01	6417.0	2431	2.898551	11	11	Other	11
Month-01_AAACCTGAGCAACGGT-1	Month-01	8522.0	2464	2.663694	6	6	Progenitor	6

Ligand-Receptor pairs¶

Different databases of ligand-receptor interactions could be used. We previously created a repository that includes many available DBs (https://github.com/LewisLabUCSD/Ligand-Receptor-Pairs). In this tutorial, we employ the ligand-receptor pairs from CellChat (https://doi.org/10.1038/s41467-021-21246-9), which includes multimeric protein complexes.

In [8]:

Copied!

lr_pairs = pd.read_csv('https://raw.githubusercontent.com/LewisLabUCSD/Ligand-Receptor-Pairs/master/Human/Human-2020-Jin-LR-pairs.csv')
lr_pairs = lr_pairs.astype(str)
lr_pairs = pd.read_csv('https://raw.githubusercontent.com/LewisLabUCSD/Ligand-Receptor-Pairs/master/Human/Human-2020-Jin-LR-pairs.csv')
lr_pairs = lr_pairs.astype(str)

In [9]:

Copied!

lr_pairs.head(2)
lr_pairs.head(2)

Out[9]:

	interaction_name	pathway_name	ligand	receptor	agonist	antagonist	co_A_receptor	co_I_receptor	evidence	annotation	interaction_name_2	ligand_symbol	receptor_symbol	ligand_ensembl	receptor_ensembl	interaction_symbol	interaction_ensembl
0	TGFB1_TGFBR1_TGFBR2	TGFb	TGFB1	TGFbR1_R2	TGFb agonist	TGFb antagonist	nan	TGFb inhibition receptor	KEGG: hsa04350	Secreted Signaling	TGFB1 - (TGFBR1+TGFBR2)	TGFB1	TGFBR1&TGFBR2	ENSG00000105329	ENSG00000106799&ENSG00000163513	TGFB1^TGFBR1&TGFBR2	ENSG00000105329^ENSG00000106799&ENSG00000163513
1	TGFB2_TGFBR1_TGFBR2	TGFb	TGFB2	TGFbR1_R2	TGFb agonist	TGFb antagonist	nan	TGFb inhibition receptor	KEGG: hsa04350	Secreted Signaling	TGFB2 - (TGFBR1+TGFBR2)	TGFB2	TGFBR1&TGFBR2	ENSG00000092969	ENSG00000106799&ENSG00000163513	TGFB2^TGFBR1&TGFBR2	ENSG00000092969^ENSG00000106799&ENSG00000163513

In [10]:

Copied!

# interaction columns:
int_columns = ('ligand_symbol', 'receptor_symbol')
# interaction columns:
int_columns = ('ligand_symbol', 'receptor_symbol')

2 - Data Preprocessing¶

RNA-seq for CCC¶

Organize data to create tensor

First, define contexts

In [11]:

Copied!

contexts = ['Month-01', 'Month-03', 'Month-06', 'Month-10']
contexts = ['Month-01', 'Month-03', 'Month-06', 'Month-10']

Generate list of RNA-seq data containing all contexts (time points)

Here, we convert the single-cell expression levels into cell-type expression levels. A basic context-wise preprocessing is performed here, keeping only genes that are expressed at least in 4 single cells, and cell types per context with at least 20 cells.

Important: A list of aggregated gene expression matrices must be used for building the 4D-communication tensor. Each of the gene expression matrices is for one of the contexts, following the same order as the previouslist (contexts).

The aggregation here corresponds to mean expression value across cells of each cell type. Other aggregation methods could be used by changing the parameter method='average' in the aggregate_single_cell() function. Additionally, other gene expression values could be passed, using other preprocessing approaches such as log(x+1) or batch correction methods.

In [12]:

Copied!





rnaseq_matrices = []

for context in tqdm(contexts):
    meta_context = rnaseq.obs.loc[(rnaseq.obs['orig.ident'] == context) \
                                  & (rnaseq.obs['celltype'] != 'Other') & (rnaseq.obs['celltype'] != 'MC')]
    
    # Remove celltype per context that has few single cells
    min_sc_number = 20
    excluded_sc = []
    for idx, row in (meta_context.groupby(['celltype'])[['celltype']].count() >= min_sc_number).iterrows():
        if ~row['celltype']:
            excluded_sc.append(idx)
        
    meta_context = meta_context.loc[~meta_context['celltype'].isin(excluded_sc)]
    cells = list(meta_context.index)

    meta_context.index.name = 'barcode'
    tmp_data = rnaseq[cells]
    # Keep genes in each sample with at least 4 single cells expressing it
    sc.pp.filter_genes(tmp_data, min_cells=4)
    
    # Normalize and Log1p
    sc.pp.normalize_total(tmp_data, target_sum=1e6)
    sc.pp.log1p(tmp_data)

    # Aggregate gene expression of single cells into cell types
    exp_df = c2c.preprocessing.aggregate_single_cells(rnaseq_data=tmp_data.to_df(),
                                                      metadata=meta_context,
                                                      barcode_col='barcode',
                                                      celltype_col='celltype',
                                                      method='average',
                                                     )

    rnaseq_matrices.append(exp_df)
rnaseq_matrices = []

for context in tqdm(contexts):
    meta_context = rnaseq.obs.loc[(rnaseq.obs['orig.ident'] == context) \
                                  & (rnaseq.obs['celltype'] != 'Other') & (rnaseq.obs['celltype'] != 'MC')]
    
    # Remove celltype per context that has few single cells
    min_sc_number = 20
    excluded_sc = []
    for idx, row in (meta_context.groupby(['celltype'])[['celltype']].count() >= min_sc_number).iterrows():
        if ~row['celltype']:
            excluded_sc.append(idx)
        
    meta_context = meta_context.loc[~meta_context['celltype'].isin(excluded_sc)]
    cells = list(meta_context.index)

    meta_context.index.name = 'barcode'
    tmp_data = rnaseq[cells]
    # Keep genes in each sample with at least 4 single cells expressing it
    sc.pp.filter_genes(tmp_data, min_cells=4)
    
    # Normalize and Log1p
    sc.pp.normalize_total(tmp_data, target_sum=1e6)
    sc.pp.log1p(tmp_data)

    # Aggregate gene expression of single cells into cell types
    exp_df = c2c.preprocessing.aggregate_single_cells(rnaseq_data=tmp_data.to_df(),
                                                      metadata=meta_context,
                                                      barcode_col='barcode',
                                                      celltype_col='celltype',
                                                      method='average',
                                                     )

    rnaseq_matrices.append(exp_df)

100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00,  1.90s/it]

rnaseq_matrices is the list we will use for building the tensor based on protein-based ligand-receptor pairs.

LR pairs¶

Remove bidirectionality in the list of protein-based ligand-receptor pairs. That is, remove repeated interactions where both interactions are the same but in different order:

From this list:

Ligand	Receptor
Protein A	Protein B
Protein B	Protein A

We will have:

Ligand	Receptor
Protein A	Protein B

In [13]:

Copied!

lr_pairs = c2c.preprocessing.ppi.remove_ppi_bidirectionality(ppi_data=lr_pairs, 
                                                             interaction_columns=int_columns
                                                             )
lr_pairs = c2c.preprocessing.ppi.remove_ppi_bidirectionality(ppi_data=lr_pairs, 
                                                             interaction_columns=int_columns
                                                             )

Removing bidirectionality of PPI network

In [14]:

Copied!

lr_pairs.shape
lr_pairs.shape

Out[14]:

(1988, 17)

Generate a dictionary with function info for each LR pairs.

Keys are LIGAND_NAME^RECEPTOR_NAME (this is the same nomenclature that will be used when building the protein-based 4D-communication tensor later), and values are the function in the annotation column in the dataframe containing ligand-receptor pairs. Other functional annotations of the LR pairs could be used if available.

In [15]:

Copied!

ppi_functions = dict()

for idx, row in lr_pairs.iterrows():
    ppi_functions[row['interaction_symbol']] = row['annotation']
ppi_functions = dict()

for idx, row in lr_pairs.iterrows():
    ppi_functions[row['interaction_symbol']] = row['annotation']

3 - Predict transcription factor activities with DecoupleR¶

Here, we use the python version of DecoupleR to predict Transcription Factor (TF) activities based on their target gene expression, as indicated in this tutorial.

We start collecting the database linking TFs and their target genes:

In [16]:

Copied!

collectri = dc.op.collectri(organism="human")
collectri.head()
collectri = dc.op.collectri(organism="human")
collectri.head()

Out[16]:

	source	target	weight	resources	references	sign_decision
0	MYC	TERT	1.0	DoRothEA-A;ExTRI;HTRI;NTNU.Curated;Pavlidis202...	10022128;10491298;10606235;10637317;10723141;1...	PMID
1	SPI1	BGLAP	1.0	ExTRI	10022617	default activation
2	SMAD3	JUN	1.0	ExTRI;NTNU.Curated;TFactS;TRRUST	10022869;12374795	PMID
3	SMAD4	JUN	1.0	ExTRI;NTNU.Curated;TFactS;TRRUST	10022869;12374795	PMID
4	STAT5A	IL2	1.0	ExTRI	10022878;11435608;17182565;17911616;22854263;2...	default activation

Then, we processs the single-cell data to obtain gene expression as required for decoupler inputs:

In [17]:

Copied!

adata = rnaseq.copy()

# Normalize and Log1p
sc.pp.normalize_total(adata, target_sum=1e6)
sc.pp.log1p(adata)
adata = rnaseq.copy()

# Normalize and Log1p
sc.pp.normalize_total(adata, target_sum=1e6)
sc.pp.log1p(adata)

We then proceed to compute the TF activities

In [18]:

Copied!

dc.mt.ulm(data=adata, net=collectri)
dc.mt.ulm(data=adata, net=collectri)

And gather the activities into a AnnData object:

In [19]:

Copied!

tf_scores = dc.pp.get_obsm(adata=adata, key="score_ulm")
tf_scores = dc.pp.get_obsm(adata=adata, key="score_ulm")

Similar to the CCC part where we prepared the gene expression data, here we convert the single-cell TF activities into cell-type activity levels.

A basic context-wise preprocessing is performed here, keeping only cell types per context with at least 20 cells.

In [20]:

Copied!





tf_matrices = []

# This is to keep track of min and max aggregated values per context
max_matrices = pd.DataFrame()
min_matrices = pd.DataFrame()

for context in tqdm(contexts):
    meta_context = rnaseq.obs.loc[(rnaseq.obs['orig.ident'] == context) \
                                  & (rnaseq.obs['celltype'] != 'Other') & (rnaseq.obs['celltype'] != 'MC')]
    
    # Remove celltype per context that has few single cells
    min_sc_number = 20
    excluded_sc = []
    for idx, row in (meta_context.groupby(['celltype'])[['celltype']].count() >= min_sc_number).iterrows():
        if ~row['celltype']:
            excluded_sc.append(idx)
        
    meta_context = meta_context.loc[~meta_context['celltype'].isin(excluded_sc)]
    cells = list(meta_context.index)

    meta_context.index.name = 'barcode'
    tmp_data = tf_scores[cells]

    # Aggregate gene expression of single cells into cell types
    exp_df = c2c.preprocessing.aggregate_single_cells(rnaseq_data=tmp_data.to_df(),
                                                      metadata=meta_context,
                                                      barcode_col='barcode',
                                                      celltype_col='celltype',
                                                      method='average',
                                                     )

    tf_matrices.append(exp_df)
    max_matrices = pd.concat([max_matrices, exp_df.max(axis=1)], axis=1)
    min_matrices = pd.concat([min_matrices, exp_df.min(axis=1)], axis=1)
tf_matrices = []

# This is to keep track of min and max aggregated values per context
max_matrices = pd.DataFrame()
min_matrices = pd.DataFrame()

for context in tqdm(contexts):
    meta_context = rnaseq.obs.loc[(rnaseq.obs['orig.ident'] == context) \
                                  & (rnaseq.obs['celltype'] != 'Other') & (rnaseq.obs['celltype'] != 'MC')]
    
    # Remove celltype per context that has few single cells
    min_sc_number = 20
    excluded_sc = []
    for idx, row in (meta_context.groupby(['celltype'])[['celltype']].count() >= min_sc_number).iterrows():
        if ~row['celltype']:
            excluded_sc.append(idx)
        
    meta_context = meta_context.loc[~meta_context['celltype'].isin(excluded_sc)]
    cells = list(meta_context.index)

    meta_context.index.name = 'barcode'
    tmp_data = tf_scores[cells]

    # Aggregate gene expression of single cells into cell types
    exp_df = c2c.preprocessing.aggregate_single_cells(rnaseq_data=tmp_data.to_df(),
                                                      metadata=meta_context,
                                                      barcode_col='barcode',
                                                      celltype_col='celltype',
                                                      method='average',
                                                     )

    tf_matrices.append(exp_df)
    max_matrices = pd.concat([max_matrices, exp_df.max(axis=1)], axis=1)
    min_matrices = pd.concat([min_matrices, exp_df.min(axis=1)], axis=1)

100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 11.06it/s]

Once we colected the TF activities (in tf_matrices), we proceed to min-max normalize them, so we get the Z-scores in a range of 0 and 1; otherwise these values could not be directly used by Tensor-cell2cell as it requires non-negative values:

In [21]:

Copied!

tf_activities = [((tf.T - min_matrices.min(axis=1)) / (max_matrices.max(axis=1) - min_matrices.min(axis=1))).T for tf in tf_matrices]
tf_activities = [((tf.T - min_matrices.min(axis=1)) / (max_matrices.max(axis=1) - min_matrices.min(axis=1))).T for tf in tf_matrices]

4 - Build Tensors¶

Build 4D-Communication Tensor¶

Here we use as input the list of gene expression matrices (rnaseq_matrices) that were aggregated into a cell-type granularity. This list contains the expression matrices of all samples/contexts.

The following functions perform all the steps to build a 4D-communication tensor:

4dtensor

The parameter communication_score indicates the scoring function to use.

how='outer_cells' considers only LR pairs that are present in all contexts, while all cell types in the whole dataset are kept, regardless they appear only in one or a few contexts.

complex_sep='&' is used to specify that the list of ligand-receptor pairs contains protein complexes and that subunits are separated by '&'. If the list does not have complexes, use complex_sep=None instead.

In [22]:

Copied!





tensor_lr = c2c.tensor.InteractionTensor(rnaseq_matrices=rnaseq_matrices,
                                         ppi_data=lr_pairs,
                                         context_names=contexts,
                                         how='outer_cells',
                                         complex_sep='&',
                                         interaction_columns=int_columns,
                                         communication_score='expression_gmean',
                                         order_labels=['Contexts', 'Protein LRIs', 'Sender Cells', 'Receiver Cells'],
                                         device='cuda' if use_gpu else 'cpu'
                                        )
tensor_lr = c2c.tensor.InteractionTensor(rnaseq_matrices=rnaseq_matrices,
                                         ppi_data=lr_pairs,
                                         context_names=contexts,
                                         how='outer_cells',
                                         complex_sep='&',
                                         interaction_columns=int_columns,
                                         communication_score='expression_gmean',
                                         order_labels=['Contexts', 'Protein LRIs', 'Sender Cells', 'Receiver Cells'],
                                         device='cuda' if use_gpu else 'cpu'
                                        )

Getting expression values for protein complexes
Building tensor for the provided context

In [23]:

Copied!

tensor_lr.tensor.shape
tensor_lr.tensor.shape

Out[23]:

(4, 422, 5, 5)

Build 3D TF Activity Tensor¶

To build our 3D tensor with dimensions as contexts, receiver cells, and transcription factors, we need to make sure that all cell types in the 4D tensor are present in each matrix (for those missing we will assign a NaN value).

We get the list of cell types:

In [24]:

Copied!

celltypes = tensor_lr.order_names[-1]
celltypes = tensor_lr.order_names[-1]

Then we reindex all matrices with transcription factors (this will generate missing columns, and fill them with NaNs):

In [25]:

Copied!

tf_activities_nan = [tf.reindex(columns=celltypes) for tf in tf_activities]
tf_activities_nan = [tf.reindex(columns=celltypes) for tf in tf_activities]

Now we can use these DecoupleR outputs directly to generate our 3D transcription activity tensor. First we generate just the tensor containing the values using tensorly:

In [26]:

Copied!

tensor_3d = tl.tensor(tf_activities_nan)
tensor_3d.shape
tensor_3d = tl.tensor(tf_activities_nan)
tensor_3d.shape

Out[26]:

(4, 728, 5)

We then obtain the name of TFs present in the data:

In [27]:

Copied!

tf_names = tf_activities_nan[0].index.tolist()
tf_names = tf_activities_nan[0].index.tolist()

Now we take this tensor to generate an InteractionTensor object using cell2cell:

In [28]:

Copied!





tensor_tf = c2c.tensor.PreBuiltTensor(tensor_3d, 
                                      order_names=[contexts, tf_names, celltypes],
                                      order_labels=['Contexts', 'TFs', 'Receiver Cells']
                                     )
tensor_tf = c2c.tensor.PreBuiltTensor(tensor_3d, 
                                      order_names=[contexts, tf_names, celltypes],
                                      order_labels=['Contexts', 'TFs', 'Receiver Cells']
                                     )

In [29]:

Copied!

tensor_tf.tensor.shape
tensor_tf.tensor.shape

Out[29]:

(4, 728, 5)

5 - Generate coupled tensors¶

After constructing our 4D communication tensor and 3D TF activity tensor, we can create a CoupledInteractionTensor object in Tensor-cell2cell v2. This object will later be used to perform CTCA.

The key step here is to define which dimensions should be coupled between the tensors. In this example:

Context dimensions are coupled directly, since they are identical across both tensors.
Cell type dimension (in the TF tensor) can be coupled either with the sender or receiver dimension (in the communication tensor):
- Couple to sender: TF activity is aligned with the cells producing the ligand of each LR pair. This suggests that TFs may regulate or coordinate with the expression of those ligands in the sender cells.
- Couple to receiver: TF activity is aligned with the cells receiving the ligand via the receptor. This suggests that TFs are potentially activated downstream of the receptor, capturing the intracellular signaling coordinated with the communication programs.

In this tutorial, we choose the second option (coupling to the receiver). This means we focus on TF activity triggered by LR interactions in the CCC program.

Concretely, the context dimension corresponds to dim 0 in both tensors, while the receiver cell dimension is dim 3 in the 4D communication tensor and dim 2 in the 3D TF activity tensor.

In [30]:

Copied!





coupled_tensor = c2c.tensor.CoupledInteractionTensor(tensor1=tensor_lr,
                                                     tensor2=tensor_tf,
                                                     tensor1_name='cell2cell',
                                                     tensor2_name='decoupler',
                                                      # Below we specify how dims of tensor1 are coupled with dims of tensor2
                                                     mode_mapping={'shared' : [(0,0), (3,2)]}
                                                    )
coupled_tensor = c2c.tensor.CoupledInteractionTensor(tensor1=tensor_lr,
                                                     tensor2=tensor_tf,
                                                     tensor1_name='cell2cell',
                                                     tensor2_name='decoupler',
                                                      # Below we specify how dims of tensor1 are coupled with dims of tensor2
                                                     mode_mapping={'shared' : [(0,0), (3,2)]}
                                                    )

We can observe that this object includes both tensors simultaneously.

In [31]:

Copied!

coupled_tensor.shape
coupled_tensor.shape

Out[31]:

((4, 422, 5, 5), (4, 728, 5))

6 - Run Analysis¶

Perform tensor factorization¶

Next, we apply a non-negative canonical coupled component analysis (CTCA). Briefly, this tensor decomposition identifies a low-rank tensor (here, a rank of 10) for each modality's tensor that better approximates each original tensor. These low-rank tensors can be represented as the sum of a set of rank-1 tensors (10 of them in this case). Each rank-1 tensor represents a factor in the decomposition and can be further represented as the outer product of n vectors, where n represents the number of tensor dimensions. Each vector represents one of the n tensor dimensions for that factor and its values, corresponding to individual elements in each dimension, represent the factor loadings.

In the CTCA, we have vectors that are shared between the coupled tensors (here contexts and receiver cells), while we have other vectors that are private for each modality (LR pair and sender cells for the 4D communication tensor; transcription factors for the 3D TF activity tensor). The loadings shared between both tensors behave exactly the same in both modalities, representing that we are capturing LR pairs and TFs from each modality that present similar trends in each factor. If each tensor was decomposed separately with the standard TCA in Tensor-cell2cell v1, we would not ensure that the key LR pair and TFs for each factor would behave the same, as this could lead to loadings for the context and receiver cells with different behaviors between modalities.

To illustrate how performing the CTCA for identifying LR pairs acting coordinated with TF activities, we skipped the Elbow Analysis, which can be found in our previous tutorial for protein- and metabolite-based LR pairs. Instead, we run the factorization using a manual number of 10 factors:

In [32]:

Copied!





coupled_tensor.compute_tensor_factorization(rank=10, # This is just for illustrative purposes - to keep a low number of factors
                                            init='svd',
                                            random_state=888,
                                            runs=1, tol=1e-8, n_iter_max=500 # Robust version of tensor decomposition
                                           )
coupled_tensor.compute_tensor_factorization(rank=10, # This is just for illustrative purposes - to keep a low number of factors
                                            init='svd',
                                            random_state=888,
                                            runs=1, tol=1e-8, n_iter_max=500 # Robust version of tensor decomposition
                                           )

7 - Results¶

Metadata¶

Generate a list containing metadata for each tensor order/dimension - Later used for coloring factor plots

This is a list containing metadata for each dimension of the tensor (contexts, LR pairs, sender cells, receiver cells). Each metadata corresponds to a dataframe with info of each element in the respective dimension.

We start with the protein-based tensor.

In [33]:

Copied!





meta_tf = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_lr,
                                              metadata_dicts=[None, ppi_functions, None, None],
                                              fill_with_order_elements=True
                                             )
meta_tf = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_lr,
                                              metadata_dicts=[None, ppi_functions, None, None],
                                              fill_with_order_elements=True
                                             )

We continue with the TF activity tensor.

In [34]:

Copied!

tf_functions = {k: 'TF' for k in tensor_tf.order_names[1]}
tf_functions = {k: 'TF' for k in tensor_tf.order_names[1]}

In [35]:

Copied!





meta_tf2 = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_tf,
                                              metadata_dicts=[None, tf_functions, None],
                                              fill_with_order_elements=True
                                             )
meta_tf2 = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor_tf,
                                              metadata_dicts=[None, tf_functions, None],
                                              fill_with_order_elements=True
                                             )

Generate unified metadata for CTCA results

By using the metadata generated separately for each tensor, we generate a new metadata that incorporates metadata for the shared and private dimensions between both modalities.

In [36]:

Copied!

new_meta_tf = coupled_tensor.reorder_metadata(meta_tf, meta_tf2)
new_meta_tf = coupled_tensor.reorder_metadata(meta_tf, meta_tf2)

Plot factor loadings¶

After performing the CTCA, a set of loadings is obtained in a factor-specific way, for each tensor. These loadings provide details of what elements of each dimension are important within each factor. Here we plot first the loadings for the shared dimensions, followed by the private dimensions of the first and second tensor (LR pairs and TFs, respectively).

In [37]:

Copied!

# Color palettes for each of the coupled tensor dimensions, ordered as order_sorting list below
cmaps = ['viridis', 'tab20', 'tab20', 'Dark2', 'Dark2']
# Color palettes for each of the coupled tensor dimensions, ordered as order_sorting list below
cmaps = ['viridis', 'tab20', 'tab20', 'Dark2', 'Dark2']

In [38]:

Copied!





factors, axes = c2c.plotting.tensor_factors_plot(interaction_tensor=coupled_tensor,
                                                 metadata=new_meta_tf,
                                                 sample_col='Element',
                                                 group_col='Category',
                                                 meta_cmaps=cmaps,
                                                 order_sorting=['Contexts', 'Sender Cells', 'Receiver Cells',
                                                                'Protein LRIs', 'TFs'
                                                               ],
                                                 fontsize=14,
                                                 filename=output_folder + 'Tensor-Factorization.pdf'
                                                )
factors, axes = c2c.plotting.tensor_factors_plot(interaction_tensor=coupled_tensor,
                                                 metadata=new_meta_tf,
                                                 sample_col='Element',
                                                 group_col='Category',
                                                 meta_cmaps=cmaps,
                                                 order_sorting=['Contexts', 'Sender Cells', 'Receiver Cells',
                                                                'Protein LRIs', 'TFs'
                                                               ],
                                                 fontsize=14,
                                                 filename=output_folder + 'Tensor-Factorization.pdf'
                                                )

No description has been provided for this image

Each factor represents a context-dependent communication pattern. In this visualization, each row is a factor and each column is a tensor dimension in that factor (Shared dimensions: contexts and receiver-cells; Private dimensions: LR pairs, sender-cells, and transcription factors).

The loadings represent the contribution of each element to that pattern, and their combinations between all dimensions represent the overall pattern. For example, in Factor 2, the identified communication pattern occurs in early stages (month 1), where all cell types but GABAergic neurons participate as sender cells. In particular, the signaling by receiver cells seems linked to glial cells and progenitors. Then we can identify the top LR pairs and TFs contributing from cognate modalities. Thus, we can say that the receptors of these LR pairs act in coordination with the TFs in the receiver cells.

Top LR pairs and TFs¶

Top-10 LR pairs

In [39]:

Copied!

for i in range(coupled_tensor.rank):
    print(coupled_tensor.get_top_factor_elements('Protein LRIs', 'Factor {}'.format(i+1), 10))
    print('')
for i in range(coupled_tensor.rank):
    print(coupled_tensor.get_top_factor_elements('Protein LRIs', 'Factor {}'.format(i+1), 10))
    print('')

PTN^NCL       0.464370
PTN^PTPRZ1    0.441668
MDK^PTPRZ1    0.404244
MDK^NCL       0.392486
CD99^CD99     0.185811
MDK^LRP1      0.182649
PTN^SDC3      0.156745
CDH2^CDH2     0.147477
PTN^SDC1      0.096327
MDK^SDC1      0.093366
Name: Factor 1, dtype: float64

MDK^NCL            0.352663
CDH2^CDH2          0.304189
CD99^CD99          0.281941
MDK^SDC2           0.281588
PTN^NCL            0.204081
MDK^ITGA6&ITGB1    0.178264
PTN^SDC2           0.163884
GAS6^AXL           0.158380
MDK^LRP1           0.157087
MPZL1^MPZL1        0.110423
Name: Factor 2, dtype: float64

PTN^PTPRZ1         0.512311
MDK^PTPRZ1         0.496919
PTN^NCL            0.328947
MDK^NCL            0.268683
CD99^CD99          0.253674
CDH2^CDH2          0.175823
MDK^ITGA6&ITGB1    0.175633
NCAM1^FGFR1        0.143054
PTN^SDC2           0.129606
MDK^LRP1           0.102070
Name: Factor 3, dtype: float64

MDK^NCL        0.494203
CD99^CD99      0.426234
MDK^PTPRZ1     0.408034
MDK^SDC2       0.193602
PTN^NCL        0.175574
MDK^LRP1       0.157664
CADM1^CADM1    0.151180
PTN^PTPRZ1     0.135335
EFNA3^EPHA5    0.129642
EFNA3^EPHA4    0.126296
Name: Factor 4, dtype: float64

MDK^NCL        0.438771
CDH2^CDH2      0.284537
CD99^CD99      0.273331
PTN^NCL        0.221741
MDK^SDC2       0.202529
MDK^LRP1       0.193363
CADM1^CADM1    0.167788
MPZL1^MPZL1    0.147171
EFNB3^EPHB6    0.132789
NCAM1^NCAM1    0.129871
Name: Factor 5, dtype: float64

MDK^NCL            0.374700
MDK^SDC2           0.359838
DLL3^NOTCH2        0.284931
NCAM1^FGFR1        0.272678
DLK1^NOTCH2        0.248256
CDH2^CDH2          0.220813
MDK^ITGA6&ITGB1    0.187641
CD99^CD99          0.179746
MDK^SDC4           0.169932
MDK^LRP1           0.169848
Name: Factor 6, dtype: float64

PTN^PTPRZ1    0.423821
MDK^PTPRZ1    0.371672
PTN^NCL       0.324885
PTN^SDC3      0.294610
MDK^NCL       0.258867
CD99^CD99     0.190757
MDK^LRP1      0.186956
CDH2^CDH2     0.184021
PTN^SDC2      0.154907
MDK^SDC2      0.121523
Name: Factor 7, dtype: float64

CDH2^CDH2             0.521731
NCAM1^NCAM1           0.438277
MDK^NCL               0.435396
PTN^NCL               0.282249
CADM1^CADM1           0.269076
CDH4^CDH4             0.170086
NAMPT^INSR            0.119082
EFNB2^EPHB2           0.106536
EFNB1^EPHB2           0.098939
SEMA3C^NRP1&PLXNA3    0.081387
Name: Factor 8, dtype: float64

MDK^SDC2           0.455132
PTN^SDC2           0.441950
PTN^NCL            0.405290
MDK^NCL            0.359242
CD99^CD99          0.297494
MDK^LRP1           0.179619
MDK^SDC4           0.126331
CDH2^CDH2          0.121484
PTN^SDC4           0.111805
MDK^ITGA6&ITGB1    0.103127
Name: Factor 9, dtype: float64

NCAM1^NCAM1    0.419227
CADM1^CADM1    0.318216
NRXN1^NLGN2    0.220669
NRXN1^NLGN1    0.212132
CADM3^CADM3    0.175772
PTN^PTPRZ1     0.170832
MDK^PTPRZ1     0.168097
PTN^NCL        0.166271
NRXN1^NLGN3    0.162873
L1CAM^L1CAM    0.162465
Name: Factor 10, dtype: float64

Top 5 TFs

In [40]:

Copied!

for i in range(coupled_tensor.rank):
    print(coupled_tensor.get_top_factor_elements('TFs', 'Factor {}'.format(i+1), 5))
    print('')
for i in range(coupled_tensor.rank):
    print(coupled_tensor.get_top_factor_elements('TFs', 'Factor {}'.format(i+1), 5))
    print('')

NEUROG1    0.117618
HEYL       0.111762
NEUROD2    0.109378
DACH1      0.108111
CRX        0.105775
Name: Factor 1, dtype: float64

TCF7L1    0.102091
CTCFL     0.101045
ELF5      0.101010
MXI1      0.100094
HES5      0.099838
Name: Factor 2, dtype: float64

HOXA7    0.078940
HDGF     0.074023
HMGA2    0.073552
SOX3     0.073537
RBPJ     0.072943
Name: Factor 3, dtype: float64

ZNF143    0.113813
CTCFL     0.107025
ZGPAT     0.105305
EN1       0.104683
SREBF2    0.103721
Name: Factor 4, dtype: float64

KLF3       0.077042
BHLHE41    0.076295
HLX        0.075530
NHLH2      0.074782
KDM5A      0.074587
Name: Factor 5, dtype: float64

SOX9      0.111740
ZNF384    0.109957
TCF7L2    0.108080
FEZF1     0.105497
VHL       0.103428
Name: Factor 6, dtype: float64

HIC1      0.095895
HOXD3     0.090289
HIVEP2    0.087248
GTF2I     0.084096
CIITA     0.082000
Name: Factor 7, dtype: float64

MYOCD     0.106479
HCFC1     0.101568
TBXT      0.096313
RFXANK    0.095671
SHOX2     0.095202
Name: Factor 8, dtype: float64

ONECUT1    0.094022
MITF       0.092153
SP4        0.089893
ELF5       0.089841
CRX        0.089732
Name: Factor 9, dtype: float64

HOXA11    0.176306
GFI1      0.167475
MSX1      0.166739
HOXC8     0.163315
GATA5     0.162388
Name: Factor 10, dtype: float64

Export Loadings¶

THESE VALUES CAN BE USED FOR DOWNSTREAM ANALYSES

In [41]:

Copied!

coupled_tensor.export_factor_loadings(output_folder + 'Coupled-Loadings.xlsx')
coupled_tensor.export_factor_loadings(output_folder + 'Coupled-Loadings.xlsx')

Coupled tensor factor loadings saved to ./results/Tensor-cell2cell/Coupled-Loadings.xlsx

8 - Downstream Analyses¶

Generate factor-specific networks¶

We can generate factor-specific communication networks connecting sender-to-receiver cells.

In [42]:

Copied!

networks = c2c.analysis.tensor_downstream.get_factor_specific_ccc_networks(coupled_tensor.factors, 
                                                                           sender_label='Sender Cells',
                                                                           receiver_label='Receiver Cells')
networks = c2c.analysis.tensor_downstream.get_factor_specific_ccc_networks(coupled_tensor.factors, 
                                                                           sender_label='Sender Cells',
                                                                           receiver_label='Receiver Cells')

Generate matrix of cell-cell pairs by factors

In [43]:

Copied!

network_by_factors = c2c.analysis.tensor_downstream.flatten_factor_ccc_networks(networks, orderby='receivers')
network_by_factors = c2c.analysis.tensor_downstream.flatten_factor_ccc_networks(networks, orderby='receivers')

Select cell-cell pairs with high potential of interaction

In [44]:

Copied!

_ = plt.hist(network_by_factors.values.flatten(), bins = 20)
_ = plt.hist(network_by_factors.values.flatten(), bins = 20)

In [45]:

Copied!

# To consider only cell pairs with high potential to interact in each factor
cci_threshold = 0.2
# To consider only cell pairs with high potential to interact in each factor
cci_threshold = 0.2

Visualize networks of factors with significant differences between groups

In [46]:

Copied!





net_fig, net_ax = c2c.plotting.ccc_networks_plot(coupled_tensor.factors,
                                                 included_factors=None,
                                                 ccc_threshold=cci_threshold, # Only important communication
                                                 nrows=2,
                                                 panel_size=(8, 8),
                                                 network_layout='circular',
                                                 filename=output_folder + 'Networks.pdf'
                                                )
net_fig, net_ax = c2c.plotting.ccc_networks_plot(coupled_tensor.factors,
                                                 included_factors=None,
                                                 ccc_threshold=cci_threshold, # Only important communication
                                                 nrows=2,
                                                 panel_size=(8, 8),
                                                 network_layout='circular',
                                                 filename=output_folder + 'Networks.pdf'
                                                )

Clustermaps¶

Similarly, we can visualize the contribution of each dimension element across factors by plotting their loadings

In [47]:

Copied!

factor_sorted = ['Factor {}'.format(f) for f in range(1, coupled_tensor.rank+1)]
factor_sorted = ['Factor {}'.format(f) for f in range(1, coupled_tensor.rank+1)]

Cluster cell-cell pairs by their potential across factors

Here we display the outer product between the sender and receiver cell dimensions, which are the value governing the factor-specific networks above.

In [48]:

Copied!





cci_cm = c2c.plotting.loading_clustermap(network_by_factors[factor_sorted],
                                         use_zscore=False, 
                                         loading_threshold=cci_threshold, # To consider relevant CCIs
                                         figsize=(20, 9),
                                         tick_fontsize=16,
                                         cmap='YlOrBr',
                                         filename=output_folder + 'Clustermap-Factor-specific-CCI.pdf',
                                         row_cluster=False
                                        )
cci_cm = c2c.plotting.loading_clustermap(network_by_factors[factor_sorted],
                                         use_zscore=False, 
                                         loading_threshold=cci_threshold, # To consider relevant CCIs
                                         figsize=(20, 9),
                                         tick_fontsize=16,
                                         cmap='YlOrBr',
                                         filename=output_folder + 'Clustermap-Factor-specific-CCI.pdf',
                                         row_cluster=False
                                        )

Cluster LR pairs by their importance across factors

We can plot the key LR pairs from the CCC modality and TFs from the TF activity modality. First we identify what could represent a high loading given their separate distributions.

In [49]:

Copied!

_ = plt.hist(coupled_tensor.factors['Protein LRIs'].values.flatten(), bins = 20)
_ = plt.hist(coupled_tensor.factors['Protein LRIs'].values.flatten(), bins = 20)

In [50]:

Copied!

_ = plt.hist(coupled_tensor.factors['TFs'].values.flatten(), bins = 20)
_ = plt.hist(coupled_tensor.factors['TFs'].values.flatten(), bins = 20)

And set and arbitrary threshold to filter out those that have low loadings

In [51]:

Copied!

# To consider only most relevant LR pairs
lr_threshold = 0.1
tf_threshold = 0.1
# To consider only most relevant LR pairs
lr_threshold = 0.1
tf_threshold = 0.1

Since we have loadings from different modalities, we want to visualize them in a comparable way, so we normalize each of them by their maximum value in each case.

In [52]:

Copied!





df1 = coupled_tensor.factors['Protein LRIs'].copy()
df1 = df1[(df1.T > lr_threshold).any()]
df1 = (df1 / df1.max())

df2 = coupled_tensor.factors['TFs'].copy()
df2 = df2[(df2.T > tf_threshold).any()]
df2 = (df2 / df2.max())
df1 = coupled_tensor.factors['Protein LRIs'].copy()
df1 = df1[(df1.T > lr_threshold).any()]
df1 = (df1 / df1.max())

df2 = coupled_tensor.factors['TFs'].copy()
df2 = df2[(df2.T > tf_threshold).any()]
df2 = (df2 / df2.max())

In [53]:

Copied!





lr_cm = c2c.plotting.loading_clustermap(df1,
                                        loading_threshold=lr_threshold, # To consider only top LRs
                                        use_zscore=False,
                                        figsize=(10, 5),
                                        cbar_fontsize=20,
                                        tick_fontsize=10,
                                        cmap='YlOrBr',
                                        filename=output_folder + 'Clustermap-Factor-specific-ProtLRs.pdf',
                                        row_cluster=False
                                       )
lr_cm = c2c.plotting.loading_clustermap(df1,
                                        loading_threshold=lr_threshold, # To consider only top LRs
                                        use_zscore=False,
                                        figsize=(10, 5),
                                        cbar_fontsize=20,
                                        tick_fontsize=10,
                                        cmap='YlOrBr',
                                        filename=output_folder + 'Clustermap-Factor-specific-ProtLRs.pdf',
                                        row_cluster=False
                                       )

In [54]:

Copied!





lr_cm = c2c.plotting.loading_clustermap(df2,
                                        loading_threshold=tf_threshold, # To consider only top LRs
                                        use_zscore=False,
                                        figsize=(16, 5),
                                        cbar_fontsize=20,
                                        tick_fontsize=10,
                                        cmap='YlOrBr',
                                        filename=output_folder + 'Clustermap-Factor-specific-TFs.pdf',
                                        row_cluster=False
                                       )
lr_cm = c2c.plotting.loading_clustermap(df2,
                                        loading_threshold=tf_threshold, # To consider only top LRs
                                        use_zscore=False,
                                        figsize=(16, 5),
                                        cbar_fontsize=20,
                                        tick_fontsize=10,
                                        cmap='YlOrBr',
                                        filename=output_folder + 'Clustermap-Factor-specific-TFs.pdf',
                                        row_cluster=False
                                       )

Given that we max-scaled them, we can also put them together in an unified plot

In [55]:

Copied!

df = pd.concat([df1, df2])
df = pd.concat([df1, df2])

In [56]:

Copied!





lr_cm = c2c.plotting.loading_clustermap(df,
                                        loading_threshold=0.1, # To consider only top LRs and TFs
                                        use_zscore=False,
                                        figsize=(24, 5),
                                        cbar_fontsize=20,
                                        tick_fontsize=10,
                                        cmap='YlOrBr',
                                        filename=output_folder + 'Clustermap-Factor-specific-LR-TFs.pdf',
                                        row_cluster=False
                                       )
lr_cm = c2c.plotting.loading_clustermap(df,
                                        loading_threshold=0.1, # To consider only top LRs and TFs
                                        use_zscore=False,
                                        figsize=(24, 5),
                                        cbar_fontsize=20,
                                        tick_fontsize=10,
                                        cmap='YlOrBr',
                                        filename=output_folder + 'Clustermap-Factor-specific-LR-TFs.pdf',
                                        row_cluster=False
                                       )