# Documentation for *cell2cell*

This documentation covers the *cell2cell* suite, which includes the regular cell2cell
and Tensor-cell2cell tools. The former infers cell-cell interactions
and communication in a single sample or context, while the latter deconvolves complex patterns
of cell-cell communication across multiple samples or contexts simultaneously into interpretable factors,
each representing a pattern of communication.

Here, multiple classes and functions are implemented to facilitate the analyses, including a variety of visualizations to simplify the interpretation of results:

- **cell2cell.analysis**: Includes simplified pipelines for running the analyses, and functions for downstream analyses of Tensor-cell2cell.
- **cell2cell.clustering**: Includes multiple scipy-based functions for performing clustering methods.
- **cell2cell.core**: Includes the core functions for inferring cell-cell interactions and communication. It includes scoring methods, cell classes, and interaction spaces.
- **cell2cell.datasets**: Includes toy datasets and annotations for testing functions in basic scenarios.
- **cell2cell.external**: Includes built-in approaches borrowed from other tools to avoid incompatibilities (e.g. UMAP, tensorly, and PCoA).
- **cell2cell.io**: Includes functions for opening and saving diverse types of files.
- **cell2cell.plotting**: Includes all the visualization options that *cell2cell* offers.
- **cell2cell.preprocessing**: Includes functions for manipulating data and variables (e.g. data preprocessing, integration, permutation, among others).
- **cell2cell.spatial**: Includes filtering of cell-cell interactions results given intercellular distance, as well as defining neighborhoods by grids or moving windows.
- **cell2cell.stats**: Includes statistical analyses such as enrichment analysis, multiple test correction methods, permutation approaches, and Gini coefficient.
- **cell2cell.tensor**: Includes all functions pertinent to the analysis of *Tensor-cell2cell*.
- **cell2cell.utils**: Includes general utilities for analyzing networks and performing parallel computing.

Below, all the inputs, parameters (including their different options), and outputs are detailed. Source code of the functions is also included.

## `analysis`

`special`

### `cell2cell_pipelines`

#### `BulkInteractions`

Interaction class with all necessary methods to run the cell2cell pipeline on a bulk RNA-seq dataset. Cells here could be represented by tissues, samples or any bulk organization of cells.

###### Parameters

rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment. Columns are samples and rows are genes.

ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.

metadata : pandas.DataFrame, default=None Metadata associated with the samples in the RNA-seq dataset.

interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.

communication_score : str, default='expression_thresholding' Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. Available communication_scores are:

```
- 'expression_thresholding' : Computes the joint presence of a ligand from a
sender cell and of a receptor on a receiver cell
from binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression of a ligand
from a sender cell and the expression of a receptor on a
receiver cell.
- 'expression_product' : Computes the product between the expression of a
ligand from a sender cell and the expression of a
receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
```
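The four options above can be compared on a single ligand-receptor pair. The following is a minimal sketch in plain Python, not the library's implementation: the function name `communication_score`, the example expression values, and the strict "greater than" binarization are assumptions made for illustration.

```python
import math


def communication_score(ligand_expr, receptor_expr,
                        method="expression_thresholding", threshold=10.0):
    """Illustrative score for one ligand-receptor pair (not the library's code)."""
    if method == "expression_thresholding":
        # Binarize each partner, then require both to be present.
        return int(ligand_expr > threshold) * int(receptor_expr > threshold)
    if method == "expression_mean":
        return (ligand_expr + receptor_expr) / 2.0
    if method == "expression_product":
        return ligand_expr * receptor_expr
    if method == "expression_gmean":
        return math.sqrt(ligand_expr * receptor_expr)
    raise ValueError(f"Unknown communication score: {method!r}")


# Hypothetical expression values: ligand in the sender, receptor in the receiver.
ligand, receptor = 40.0, 2.5
methods = ("expression_thresholding", "expression_mean",
           "expression_product", "expression_gmean")
scores = {m: communication_score(ligand, receptor, m) for m in methods}
print(scores)
```

In this toy case the thresholding score is 0 because the receptor falls under the threshold, whereas the continuous scores still report a non-zero value. That difference is the main trade-off between the binary and continuous options.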

cci_score : str, default='bray_curtis' Scoring function to aggregate the communication scores between a pair of cells. It computes an overall potential of cell-cell interactions. Options:

```
- 'bray_curtis' : Bray-Curtis-like score.
- 'jaccard' : Jaccard-like score.
- 'count' : Number of LR pairs that the pair of cells use.
- 'icellnet' : Sum of the L-R expression product of a pair of cells
```
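To make the aggregation step concrete, the sketch below computes the four scores for one pair of cells from per-LR-pair expression values. It uses the textbook Bray-Curtis and Jaccard similarity forms on binarized values; whether the library uses exactly these formulas is an assumption, and `cci_scores` is a hypothetical helper name.

```python
def cci_scores(ligand_expr, receptor_expr, threshold=10.0):
    """Aggregate per-LR-pair values into overall scores (illustrative formulas)."""
    # Binarize: 1 if the partner is expressed above the threshold, else 0.
    lig = [int(x > threshold) for x in ligand_expr]
    rec = [int(x > threshold) for x in receptor_expr]
    # LR pairs for which both partners are present.
    shared = sum(l * r for l, r in zip(lig, rec))
    # Partners present in the sender plus partners present in the receiver.
    total = sum(lig) + sum(rec)
    return {
        "count": shared,
        "bray_curtis": 2 * shared / total if total else 0.0,
        "jaccard": shared / (total - shared) if total > shared else 0.0,
        # Sum of ligand x receptor expression products, on the raw values.
        "icellnet": sum(l * r for l, r in zip(ligand_expr, receptor_expr)),
    }


# Hypothetical expression of four ligands (cell A) and their receptors (cell B).
ligands = [25.0, 0.0, 12.0, 3.0]
receptors = [30.0, 15.0, 1.0, 0.0]
result = cci_scores(ligands, receptors)
print(result)
```

The count-based scores depend only on which partners pass the threshold, while the expression-product sum is dominated by highly expressed pairs.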

cci_type : str, default='undirected' Specifies whether to compute the cci_score in a directed or undirected way. For a pair of cells A and B, directed means that the ligands are considered only from cell A and the receptors only from cell B, or vice versa, while undirected simultaneously considers signaling from cell A to cell B and from cell B to cell A.
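The difference between the two modes can be illustrated with a simple count score. In this sketch (an assumption about the mechanics, not the library's code), the undirected case is obtained by adding the reversed version of every ligand-receptor pair to the list, so that one pass over the list covers both signaling directions. The gene names and presence values are hypothetical.

```python
def make_bidirectional(lr_pairs):
    """Add the reversed (B, A) version of every (A, B) pair, without duplicates."""
    result = list(lr_pairs)
    seen = set(lr_pairs)
    for a, b in lr_pairs:
        if (b, a) not in seen:
            seen.add((b, a))
            result.append((b, a))
    return result


def count_score(pairs, cell_1, cell_2):
    """Count pairs whose first partner is present in cell_1 and second in cell_2."""
    return sum(cell_1[a] * cell_2[b] for a, b in pairs)


# Hypothetical binarized expression (1 = present) in two cells.
cell_a = {"L1": 1, "R1": 0, "L2": 1, "R2": 1}
cell_b = {"L1": 0, "R1": 1, "L2": 1, "R2": 0}
lr_pairs = [("L1", "R1"), ("L2", "R2")]

a_to_b = count_score(lr_pairs, cell_a, cell_b)            # directed: A sends, B receives
b_to_a = count_score(lr_pairs, cell_b, cell_a)            # directed: B sends, A receives
undirected = count_score(make_bidirectional(lr_pairs), cell_a, cell_b)
print(a_to_b, b_to_a, undirected)
```

With a count score, the undirected value equals the sum of the two directed values and does not change when the two cells are swapped.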

sample_col : str, default='sampleID' Column-name for the samples in the metadata.

group_col : str, default='tissue' Column-name for the grouping information associated with the samples in the metadata.

expression_threshold : float, default=10 Threshold value to binarize gene expression when using communication_score='expression_thresholding'. Units have to be the same as the rnaseq_data matrix (e.g., TPMs, counts, etc.).

complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".

complex_agg_method : str, default='min' Method to aggregate the expression value of multiple genes in a complex.

```
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
```
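As an illustration of how `complex_sep` and `complex_agg_method` interact, the sketch below splits a partner name on the separator and aggregates the subunit expression values. The helper `aggregate_complex` and the expression values are hypothetical; only the three aggregation options come from the description above.

```python
import math


def aggregate_complex(partner, expression, complex_sep="&", method="min"):
    """Return one expression value for a (possibly multimeric) interaction partner."""
    # A partner without the separator is treated as a single-gene "complex".
    subunits = partner.split(complex_sep) if complex_sep else [partner]
    values = [expression[gene] for gene in subunits]
    if method == "min":
        return min(values)
    if method == "mean":
        return sum(values) / len(values)
    if method == "gmean":
        return math.prod(values) ** (1.0 / len(values))
    raise ValueError(f"Unknown aggregation method: {method!r}")


# Hypothetical expression values for the subunits of "CD74&CD44".
expression = {"CD74": 8.0, "CD44": 2.0, "MIF": 5.0}
for method in ("min", "mean", "gmean"):
    print(method, aggregate_complex("CD74&CD44", expression, method=method))
```

Using the minimum treats the least expressed subunit as limiting for the whole complex, which is a conservative choice compared with the two means.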

verbose : boolean, default=False Whether to print the steps of the analysis.

###### Attributes

rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment. Columns are samples and rows are genes.

metadata : pandas.DataFrame Metadata associated with the samples in the RNA-seq dataset.

index_col : str Column-name for the samples in the metadata.

group_col : str Column-name for the grouping information associated with the samples in the metadata.

ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.

complex_sep : str Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".

complex_agg_method : str Method to aggregate the expression value of multiple genes in a complex.

```
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
```

ref_ppi : pandas.DataFrame Reference list of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication. It could be the same as 'ppi_data' if ppi_data is not bidirectional (that is, contains ProtA-ProtB interaction as well as ProtB-ProtA interaction). ref_ppi must be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction).
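The requirement that the reference list contains each interaction only once can be sketched as follows. This is a plain-Python illustration with a hypothetical helper name, not the library's `remove_ppi_bidirectionality` function; it keeps the first orientation in which each pair appears.

```python
def drop_reversed_duplicates(pairs):
    """Keep one orientation of each interaction, in order of first appearance."""
    seen = set()
    kept = []
    for a, b in pairs:
        key = frozenset((a, b))  # same key for (A, B) and (B, A)
        if key not in seen:
            seen.add(key)
            kept.append((a, b))
    return kept


pairs = [("ProtA", "ProtB"), ("ProtB", "ProtA"), ("ProtC", "ProtD")]
reference = drop_reversed_duplicates(pairs)
print(reference)
```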

interaction_columns : tuple Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.

analysis_setup : dict Contains main setup for running the cell-cell interactions and communication analyses. Three main setups are needed (passed as keys):

```
- 'communication_score' : is the type of communication score used to detect
    active ligand-receptor pairs between each pair of cells.
    It can be:
        - 'expression_thresholding'
        - 'expression_product'
        - 'expression_mean'
        - 'expression_gmean'
- 'cci_score' : is the scoring function to aggregate the communication
    scores.
    It can be:
        - 'bray_curtis'
        - 'jaccard'
        - 'count'
        - 'icellnet'
- 'cci_type' : is the type of interaction between two cells. If it is
    undirected, all ligands and receptors are considered from both cells.
    If it is directed, ligands from one cell and receptors from the other
    are considered separately with respect to ligands from the second
    cell and receptors from the first one.
    So, it can be:
        - 'undirected'
        - 'directed'
```

cutoff_setup : dict Contains two keys: 'type' and 'parameter'. The 'type' key specifies how the cutoff or threshold is obtained, while 'parameter' is the value used to binarize the expression values. The key 'type' can be:

```
- 'local_percentile' : computes the value of a given percentile, for each
gene independently. In this case, the parameter corresponds to the
percentile to compute, as a float value between 0 and 1.
- 'global_percentile' : computes the value of a given percentile from all
genes and samples simultaneously. In this case, the parameter
corresponds to the percentile to compute, as a float value between
0 and 1. All genes have the same cutoff.
- 'file' : load a cutoff table from a file. Parameter in this case is the
path of that file. It must contain the same genes as index and same
samples as columns.
- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in each sample. This allows the use of specific cutoffs for
each sample. The columns here must be the same as the ones in the
rnaseq_data.
- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in only one column. These cutoffs will be applied to
all samples.
- 'constant_value' : binarizes the expression. Evaluates whether
expression is greater than the value input in the parameter.
```
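The percentile-based and constant cutoffs differ in whether each gene gets its own threshold. The sketch below is a plain-Python stand-in for the DataFrame operations, under stated assumptions: the helper names are hypothetical, the percentile uses linear interpolation between ranks, and the toy expression values are invented.

```python
def percentile(values, q):
    """Percentile for q in [0, 1], with linear interpolation between ranks."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def compute_cutoffs(expression, cutoff_type, parameter):
    """Return one cutoff per gene for three of the cutoff types described above."""
    if cutoff_type == "local_percentile":
        # Each gene gets the percentile of its own values across samples.
        return {gene: percentile(vals, parameter) for gene, vals in expression.items()}
    if cutoff_type == "global_percentile":
        # One percentile over all genes and samples, shared by every gene.
        pooled = [v for vals in expression.values() for v in vals]
        cutoff = percentile(pooled, parameter)
        return {gene: cutoff for gene in expression}
    if cutoff_type == "constant_value":
        return {gene: parameter for gene in expression}
    raise ValueError(f"Unsupported cutoff type in this sketch: {cutoff_type!r}")


def binarize(expression, cutoffs):
    """1 where expression is strictly greater than the gene's cutoff, else 0."""
    return {gene: [int(v > cutoffs[gene]) for v in vals]
            for gene, vals in expression.items()}


# Hypothetical expression matrix: genes as keys, three samples per gene.
expression = {"GeneA": [0.0, 10.0, 20.0], "GeneB": [100.0, 200.0, 300.0]}
local = compute_cutoffs(expression, "local_percentile", 0.5)
global_ = compute_cutoffs(expression, "global_percentile", 0.5)
constant = compute_cutoffs(expression, "constant_value", 10)
print(binarize(expression, local))
print(binarize(expression, global_))
print(binarize(expression, constant))
```

A local percentile adapts to the expression range of each gene, while a global percentile or a constant applies the same bar to every gene, so lowly expressed genes are rarely called present.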

interaction_space : cell2cell.core.interaction_space.InteractionSpace Interaction space that contains all the required elements to perform the cell-cell interaction and communication analysis between every pair of cells. After performing the analyses, the results are stored in this object.

## Source code in `cell2cell/analysis/cell2cell_pipelines.py`

```
class BulkInteractions:
    '''Interaction class with all necessary methods to run the cell2cell pipeline
    on a bulk RNA-seq dataset. Cells here could be represented by tissues, samples
    or any bulk organization of cells.

    Parameters
    ----------
    rnaseq_data : pandas.DataFrame
        Gene expression data for a bulk RNA-seq experiment. Columns are samples
        and rows are genes.
    ppi_data : pandas.DataFrame
        List of protein-protein interactions (or ligand-receptor pairs) used
        for inferring the cell-cell interactions and communication.
    metadata : pandas.DataFrame, default=None
        Metadata associated with the samples in the RNA-seq dataset.
    interaction_columns : tuple, default=('A', 'B')
        Contains the names of the columns where to find the partners in a
        dataframe of protein-protein interactions. If the list is for
        ligand-receptor pairs, the first column is for the ligands and the second
        for the receptors.
    communication_score : str, default='expression_thresholding'
        Type of communication score to infer the potential use of a given
        ligand-receptor pair by a pair of cells/tissues/samples.
        Available communication_scores are:
        - 'expression_thresholding' : Computes the joint presence of a ligand from a
            sender cell and of a receptor on a receiver cell
            from binarizing their gene expression levels.
        - 'expression_mean' : Computes the average between the expression of a ligand
            from a sender cell and the expression of a receptor on a
            receiver cell.
        - 'expression_product' : Computes the product between the expression of a
            ligand from a sender cell and the expression of a
            receptor on a receiver cell.
        - 'expression_gmean' : Computes the geometric mean between the expression
            of a ligand from a sender cell and the
            expression of a receptor on a receiver cell.
    cci_score : str, default='bray_curtis'
        Scoring function to aggregate the communication scores between a pair of
        cells. It computes an overall potential of cell-cell interactions.
        Options:
        - 'bray_curtis' : Bray-Curtis-like score.
        - 'jaccard' : Jaccard-like score.
        - 'count' : Number of LR pairs that the pair of cells use.
        - 'icellnet' : Sum of the L-R expression product of a pair of cells
    cci_type : str, default='undirected'
        Specifies whether computing the cci_score in a directed or undirected
        way. For a pair of cells A and B, directed means that the ligands are
        considered only from cell A and receptors only from cell B or vice versa.
        While undirected simultaneously considers signaling from cell A to
        cell B and from cell B to cell A.
    sample_col : str, default='sampleID'
        Column-name for the samples in the metadata.
    group_col : str, default='tissue'
        Column-name for the grouping information associated with the samples
        in the metadata.
    expression_threshold : float, default=10
        Threshold value to binarize gene expression when using
        communication_score='expression_thresholding'. Units have to be the
        same as the rnaseq_data matrix (e.g., TPMs, counts, etc.).
    complex_sep : str, default=None
        Symbol that separates the protein subunits in a multimeric complex.
        For example, '&' is the complex_sep for a list of ligand-receptor pairs
        where a protein partner could be "CD74&CD44".
    complex_agg_method : str, default='min'
        Method to aggregate the expression value of multiple genes in a
        complex.
        - 'min' : Minimum expression value among all genes.
        - 'mean' : Average expression value among all genes.
        - 'gmean' : Geometric mean expression value among all genes.
    verbose : boolean, default=False
        Whether printing or not steps of the analysis.

    Attributes
    ----------
    rnaseq_data : pandas.DataFrame
        Gene expression data for a bulk RNA-seq experiment. Columns are samples
        and rows are genes.
    metadata : pandas.DataFrame
        Metadata associated with the samples in the RNA-seq dataset.
    index_col : str
        Column-name for the samples in the metadata.
    group_col : str
        Column-name for the grouping information associated with the samples
        in the metadata.
    ppi_data : pandas.DataFrame
        List of protein-protein interactions (or ligand-receptor pairs) used for
        inferring the cell-cell interactions and communication.
    complex_sep : str
        Symbol that separates the protein subunits in a multimeric complex.
        For example, '&' is the complex_sep for a list of ligand-receptor pairs
        where a protein partner could be "CD74&CD44".
    complex_agg_method : str
        Method to aggregate the expression value of multiple genes in a
        complex.
        - 'min' : Minimum expression value among all genes.
        - 'mean' : Average expression value among all genes.
        - 'gmean' : Geometric mean expression value among all genes.
    ref_ppi : pandas.DataFrame
        Reference list of protein-protein interactions (or ligand-receptor pairs) used
        for inferring the cell-cell interactions and communication. It could be the
        same as 'ppi_data' if ppi_data is not bidirectional (that is, contains
        ProtA-ProtB interaction as well as ProtB-ProtA interaction). ref_ppi must
        be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction).
    interaction_columns : tuple
        Contains the names of the columns where to find the partners in a
        dataframe of protein-protein interactions. If the list is for
        ligand-receptor pairs, the first column is for the ligands and the second
        for the receptors.
    analysis_setup : dict
        Contains main setup for running the cell-cell interactions and communication
        analyses.
        Three main setups are needed (passed as keys):
        - 'communication_score' : is the type of communication score used to detect
            active ligand-receptor pairs between each pair of cell.
            It can be:
            - 'expression_thresholding'
            - 'expression_product'
            - 'expression_mean'
            - 'expression_gmean'
        - 'cci_score' : is the scoring function to aggregate the communication
            scores.
            It can be:
            - 'bray_curtis'
            - 'jaccard'
            - 'count'
            - 'icellnet'
        - 'cci_type' : is the type of interaction between two cells. If it is
            undirected, all ligands and receptors are considered from both cells.
            If it is directed, ligands from one cell and receptors from the other
            are considered separately with respect to ligands from the second
            cell and receptor from the first one.
            So, it can be:
            - 'undirected'
            - 'directed'
    cutoff_setup : dict
        Contains two keys: 'type' and 'parameter'. The first key represent the
        way to use a cutoff or threshold, while parameter is the value used
        to binarize the expression values.
        The key 'type' can be:
        - 'local_percentile' : computes the value of a given percentile, for each
            gene independently. In this case, the parameter corresponds to the
            percentile to compute, as a float value between 0 and 1.
        - 'global_percentile' : computes the value of a given percentile from all
            genes and samples simultaneously. In this case, the parameter
            corresponds to the percentile to compute, as a float value between
            0 and 1. All genes have the same cutoff.
        - 'file' : load a cutoff table from a file. Parameter in this case is the
            path of that file. It must contain the same genes as index and same
            samples as columns.
        - 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
            for each gene in each sample. This allows to use specific cutoffs for
            each sample. The columns here must be the same as the ones in the
            rnaseq_data.
        - 'single_col_matrix' : a dataframe must be provided, containing a cutoff
            for each gene in only one column. These cutoffs will be applied to
            all samples.
        - 'constant_value' : binarizes the expression. Evaluates whether
            expression is greater than the value input in the parameter.
    interaction_space : cell2cell.core.interaction_space.InteractionSpace
        Interaction space that contains all the required elements to perform the
        cell-cell interaction and communication analysis between every pair of cells.
        After performing the analyses, the results are stored in this object.
    '''
    def __init__(self, rnaseq_data, ppi_data, metadata=None, interaction_columns=('A', 'B'),
                 communication_score='expression_thresholding', cci_score='bray_curtis', cci_type='undirected',
                 sample_col='sampleID', group_col='tissue', expression_threshold=10, complex_sep=None,
                 complex_agg_method='min', verbose=False):
        # Placeholders
        self.rnaseq_data = rnaseq_data
        self.metadata = metadata
        self.index_col = sample_col
        self.group_col = group_col
        self.analysis_setup = dict()
        self.cutoff_setup = dict()
        self.complex_sep = complex_sep
        self.complex_agg_method = complex_agg_method
        self.interaction_columns = interaction_columns

        # Analysis setup
        self.analysis_setup['communication_score'] = communication_score
        self.analysis_setup['cci_score'] = cci_score
        self.analysis_setup['cci_type'] = cci_type
        self.analysis_setup['ccc_type'] = cci_type

        # Initialize PPI
        genes = list(rnaseq_data.index)
        ppi_data_ = ppi.filter_ppi_by_proteins(ppi_data=ppi_data,
                                               proteins=genes,
                                               complex_sep=complex_sep,
                                               upper_letter_comparison=False,
                                               interaction_columns=self.interaction_columns)
        self.ppi_data = ppi.remove_ppi_bidirectionality(ppi_data=ppi_data_,
                                                        interaction_columns=self.interaction_columns,
                                                        verbose=verbose)
        if self.analysis_setup['cci_type'] == 'undirected':
            self.ref_ppi = self.ppi_data.copy()
            self.ppi_data = ppi.bidirectional_ppi_for_cci(ppi_data=self.ppi_data,
                                                          interaction_columns=self.interaction_columns,
                                                          verbose=verbose)
        else:
            self.ref_ppi = None

        # Thresholding
        self.cutoff_setup['type'] = 'constant_value'
        self.cutoff_setup['parameter'] = expression_threshold

        # Interaction Space
        self.interaction_space = initialize_interaction_space(rnaseq_data=self.rnaseq_data,
                                                              ppi_data=self.ppi_data,
                                                              cutoff_setup=self.cutoff_setup,
                                                              analysis_setup=self.analysis_setup,
                                                              complex_sep=complex_sep,
                                                              complex_agg_method=complex_agg_method,
                                                              interaction_columns=self.interaction_columns,
                                                              verbose=verbose)

    def compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True):
        '''Computes overall CCI scores for each pair of cells.

        Parameters
        ----------
        cci_score : str, default=None
            Scoring function to aggregate the communication scores between
            a pair of cells. It computes an overall potential of cell-cell
            interactions. If None, it will use the one stored in the
            attribute analysis_setup of this object.
            Options:
            - 'bray_curtis' : Bray-Curtis-like score.
            - 'jaccard' : Jaccard-like score.
            - 'count' : Number of LR pairs that the pair of cells use.
            - 'icellnet' : Sum of the L-R expression product of a pair of cells
        use_ppi_score : boolean, default=False
            Whether using a weight of LR pairs specified in the ppi_data
            to compute the scores.
        verbose : boolean, default=True
            Whether printing or not steps of the analysis.
        '''
        self.interaction_space.compute_pairwise_cci_scores(cci_score=cci_score,
                                                           use_ppi_score=use_ppi_score,
                                                           verbose=verbose)

    def compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None,
                                              interaction_columns=None, cells=None, cci_type=None, verbose=True):
        '''Computes the communication scores for each LR pairs in
        a given pair of sender-receiver cell

        Parameters
        ----------
        communication_score : str, default=None
            Type of communication score to infer the potential use of
            a given ligand-receptor pair by a pair of cells/tissues/samples.
            If None, the score stored in the attribute analysis_setup
            will be used.
            Available communication_scores are:
            - 'expression_thresholding' : Computes the joint presence of a
                ligand from a sender cell and of
                a receptor on a receiver cell from
                binarizing their gene expression levels.
            - 'expression_mean' : Computes the average between the expression
                of a ligand from a sender cell and the
                expression of a receptor on a receiver cell.
            - 'expression_product' : Computes the product between the expression
                of a ligand from a sender cell and the
                expression of a receptor on a receiver cell.
            - 'expression_gmean' : Computes the geometric mean between the expression
                of a ligand from a sender cell and the
                expression of a receptor on a receiver cell.
        use_ppi_score : boolean, default=False
            Whether using a weight of LR pairs specified in the ppi_data
            to compute the scores.
        ref_ppi_data : pandas.DataFrame, default=None
            Reference list of protein-protein interactions (or
            ligand-receptor pairs) used for inferring the cell-cell
            interactions and communication. It could be the same as
            'ppi_data' if ppi_data is not bidirectional (that is,
            contains ProtA-ProtB interaction as well as ProtB-ProtA
            interaction). ref_ppi must be undirected (contains only
            ProtA-ProtB and not ProtB-ProtA interaction). If None
            the one stored in the attribute ref_ppi will be used.
        interaction_columns : tuple, default=None
            Contains the names of the columns where to find the
            partners in a dataframe of protein-protein interactions.
            If the list is for ligand-receptor pairs, the first column
            is for the ligands and the second for the receptors. If
            None, the one stored in the attribute interaction_columns
            will be used.
        cells : list, default=None
            List of cells to consider.
        cci_type : str, default=None
            Type of interaction between two cells. Used to specify
            if we want to consider a LR pair in both directions.
            It can be:
            - 'undirected'
            - 'directed'
            If None, 'directed' will be used.
        verbose : boolean, default=True
            Whether printing or not steps of the analysis.
        '''
        if interaction_columns is None:
            interaction_columns = self.interaction_columns  # Used only for ref_ppi_data
        if ref_ppi_data is None:
            ref_ppi_data = self.ref_ppi
        if cci_type is None:
            cci_type = 'directed'
        self.analysis_setup['ccc_type'] = cci_type
        self.interaction_space.compute_pairwise_communication_scores(communication_score=communication_score,
                                                                     use_ppi_score=use_ppi_score,
                                                                     ref_ppi_data=ref_ppi_data,
                                                                     interaction_columns=interaction_columns,
                                                                     cells=cells,
                                                                     cci_type=cci_type,
                                                                     verbose=verbose)

    @property
    def interaction_elements(self):
        '''Returns the interaction elements within an interaction space.'''
        if hasattr(self.interaction_space, 'interaction_elements'):
            return self.interaction_space.interaction_elements
        else:
            return None
```

##### `interaction_elements`

`property`

`readonly`

Returns the interaction elements within an interaction space.

##### `compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True)`

Computes overall CCI scores for each pair of cells.

###### Parameters

cci_score : str, default=None Scoring function to aggregate the communication scores between a pair of cells. It computes an overall potential of cell-cell interactions. If None, it will use the one stored in the attribute analysis_setup of this object. Options:

```
- 'bray_curtis' : Bray-Curtis-like score.
- 'jaccard' : Jaccard-like score.
- 'count' : Number of LR pairs that the pair of cells use.
- 'icellnet' : Sum of the L-R expression product of a pair of cells
```

use_ppi_score : boolean, default=False Whether to use the weights of LR pairs specified in ppi_data when computing the scores.

verbose : boolean, default=True Whether to print the steps of the analysis.
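Conceptually, this method fills a cell-by-cell matrix with one aggregated score per pair of cells. The sketch below shows that idea with a directed count score and optional per-LR-pair weights, in the spirit of `use_ppi_score`. How the library applies such weights is an assumption here, and `pairwise_cci` and the toy profiles are hypothetical.

```python
from itertools import product


def pairwise_cci(profiles, weights=None):
    """Directed count-like score for every ordered (sender, receiver) pair."""
    cells = sorted(profiles)
    n_pairs = len(profiles[cells[0]]["ligands"])
    if weights is None:
        weights = [1.0] * n_pairs  # unweighted: every LR pair counts equally
    matrix = {}
    for sender, receiver in product(cells, repeat=2):
        lig = profiles[sender]["ligands"]
        rec = profiles[receiver]["receptors"]
        matrix[(sender, receiver)] = sum(l * r * w for l, r, w in zip(lig, rec, weights))
    return matrix


# Hypothetical binarized presence of three ligands and their receptors per cell.
profiles = {
    "A": {"ligands": [1, 0, 1], "receptors": [0, 1, 0]},
    "B": {"ligands": [0, 1, 0], "receptors": [1, 0, 1]},
}
unweighted = pairwise_cci(profiles)
weighted = pairwise_cci(profiles, weights=[1.0, 0.5, 0.25])
print(unweighted)
print(weighted)
```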

## Source code in `cell2cell/analysis/cell2cell_pipelines.py`

```
def compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True):
    '''Computes overall CCI scores for each pair of cells.

    Parameters
    ----------
    cci_score : str, default=None
        Scoring function to aggregate the communication scores between
        a pair of cells. It computes an overall potential of cell-cell
        interactions. If None, it will use the one stored in the
        attribute analysis_setup of this object.
        Options:
        - 'bray_curtis' : Bray-Curtis-like score.
        - 'jaccard' : Jaccard-like score.
        - 'count' : Number of LR pairs that the pair of cells use.
        - 'icellnet' : Sum of the L-R expression product of a pair of cells
    use_ppi_score : boolean, default=False
        Whether using a weight of LR pairs specified in the ppi_data
        to compute the scores.
    verbose : boolean, default=True
        Whether printing or not steps of the analysis.
    '''
    self.interaction_space.compute_pairwise_cci_scores(cci_score=cci_score,
                                                       use_ppi_score=use_ppi_score,
                                                       verbose=verbose)
```

##### `compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None, interaction_columns=None, cells=None, cci_type=None, verbose=True)`

Computes the communication scores for each LR pair in a given pair of sender-receiver cells.

###### Parameters

communication_score : str, default=None Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. If None, the score stored in the attribute analysis_setup will be used. Available communication_scores are:

```
- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
```

use_ppi_score : boolean, default=False Whether to use the weights of LR pairs specified in ppi_data when computing the scores.

ref_ppi_data : pandas.DataFrame, default=None Reference list of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication. It could be the same as 'ppi_data' if ppi_data is not bidirectional (bidirectional meaning that it contains the ProtA-ProtB interaction as well as the ProtB-ProtA interaction). ref_ppi must be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction). If None, the one stored in the attribute ref_ppi will be used.

interaction_columns : tuple, default=None Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors. If None, the one stored in the attribute interaction_columns will be used.

cells : list, default=None List of cells to consider.

cci_type : str, default=None Type of interaction between two cells. Used to specify whether an LR pair is considered in both directions. If None, 'directed' will be used. It can be:

```
- 'undirected'
- 'directed'
```

verbose : boolean, default=True Whether to print the steps of the analysis.
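The result of this step can be pictured as a table with one row per LR pair and one column per ordered sender-receiver pair. The sketch below builds such a table with the expression-product score and an optional `cells` subset. It is a plain-Python illustration with hypothetical names and values, not the structure the library returns.

```python
from itertools import product


def communication_matrix(expression, lr_pairs, cells=None):
    """Product-based score for every LR pair and ordered (sender, receiver) pair."""
    # Restrict to the requested cells, or use all cells when none are given.
    cells = list(expression) if cells is None else list(cells)
    matrix = {}
    for ligand, receptor in lr_pairs:
        matrix[(ligand, receptor)] = {
            (sender, receiver): expression[sender][ligand] * expression[receiver][receptor]
            for sender, receiver in product(cells, repeat=2)
        }
    return matrix


# Hypothetical expression of one ligand and its receptor in two cells.
expression = {"A": {"L1": 4.0, "R1": 0.0}, "B": {"L1": 1.0, "R1": 9.0}}
lr_pairs = [("L1", "R1")]
full = communication_matrix(expression, lr_pairs)
only_a = communication_matrix(expression, lr_pairs, cells=["A"])
print(full[("L1", "R1")])
```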

## Source code in `cell2cell/analysis/cell2cell_pipelines.py`

```
def compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None,
                                          interaction_columns=None, cells=None, cci_type=None, verbose=True):
    '''Computes the communication scores for each LR pairs in
    a given pair of sender-receiver cell

    Parameters
    ----------
    communication_score : str, default=None
        Type of communication score to infer the potential use of
        a given ligand-receptor pair by a pair of cells/tissues/samples.
        If None, the score stored in the attribute analysis_setup
        will be used.
        Available communication_scores are:
        - 'expression_thresholding' : Computes the joint presence of a
            ligand from a sender cell and of
            a receptor on a receiver cell from
            binarizing their gene expression levels.
        - 'expression_mean' : Computes the average between the expression
            of a ligand from a sender cell and the
            expression of a receptor on a receiver cell.
        - 'expression_product' : Computes the product between the expression
            of a ligand from a sender cell and the
            expression of a receptor on a receiver cell.
        - 'expression_gmean' : Computes the geometric mean between the expression
            of a ligand from a sender cell and the
            expression of a receptor on a receiver cell.
    use_ppi_score : boolean, default=False
        Whether using a weight of LR pairs specified in the ppi_data
        to compute the scores.
    ref_ppi_data : pandas.DataFrame, default=None
        Reference list of protein-protein interactions (or
        ligand-receptor pairs) used for inferring the cell-cell
        interactions and communication. It could be the same as
        'ppi_data' if ppi_data is not bidirectional (that is,
        contains ProtA-ProtB interaction as well as ProtB-ProtA
        interaction). ref_ppi must be undirected (contains only
        ProtA-ProtB and not ProtB-ProtA interaction). If None
        the one stored in the attribute ref_ppi will be used.
    interaction_columns : tuple, default=None
        Contains the names of the columns where to find the
        partners in a dataframe of protein-protein interactions.
        If the list is for ligand-receptor pairs, the first column
        is for the ligands and the second for the receptors. If
        None, the one stored in the attribute interaction_columns
        will be used.
    cells : list, default=None
        List of cells to consider.
    cci_type : str, default=None
        Type of interaction between two cells. Used to specify
        if we want to consider a LR pair in both directions.
        It can be:
        - 'undirected'
        - 'directed'
        If None, 'directed' will be used.
    verbose : boolean, default=True
        Whether printing or not steps of the analysis.
    '''
    if interaction_columns is None:
        interaction_columns = self.interaction_columns  # Used only for ref_ppi_data
    if ref_ppi_data is None:
        ref_ppi_data = self.ref_ppi
    if cci_type is None:
        cci_type = 'directed'
    self.analysis_setup['ccc_type'] = cci_type
    self.interaction_space.compute_pairwise_communication_scores(communication_score=communication_score,
                                                                 use_ppi_score=use_ppi_score,
                                                                 ref_ppi_data=ref_ppi_data,
                                                                 interaction_columns=interaction_columns,
                                                                 cells=cells,
                                                                 cci_type=cci_type,
                                                                 verbose=verbose)
```

#### `SingleCellInteractions`

Interaction class with all necessary methods to run the cell2cell pipeline on a single-cell RNA-seq dataset.

###### Parameters

rnaseq_data : pandas.DataFrame or scanpy.AnnData Gene expression data for a single-cell RNA-seq experiment. If it is a dataframe, columns are single cells and rows are genes, while if it is an AnnData object, columns are genes and rows are single cells.

ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.

metadata : pandas.DataFrame Metadata containing the cell type of each single cell in the RNA-seq dataset.

interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.

communication_score : str, default='expression_thresholding' Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. Available communication_scores are:

```
- 'expression_thresholding' : Computes the joint presence of a ligand from a
sender cell and of a receptor on a receiver cell
from binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression of a ligand
from a sender cell and the expression of a receptor on a
receiver cell.
- 'expression_product' : Computes the product between the expression of a
ligand from a sender cell and the expression of a
receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
```
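
As a rough illustration of how the four score types differ, the sketch below evaluates each one for a single ligand-receptor pair. It is not the library's implementation: the helper `toy_communication_score` and the expression values are hypothetical, and the strict `>` comparison is an assumption borrowed from the description of the 'constant_value' cutoff type.

```python
import math


def toy_communication_score(ligand, receptor, method, threshold=0.2):
    """Hypothetical helper: score one ligand-receptor pair for one sender-receiver pair."""
    if method == "expression_thresholding":
        # Binarize both partners, then require their joint presence.
        return float(ligand > threshold and receptor > threshold)
    if method == "expression_mean":
        return (ligand + receptor) / 2
    if method == "expression_product":
        return ligand * receptor
    if method == "expression_gmean":
        return math.sqrt(ligand * receptor)
    raise ValueError(f"Unknown communication score: {method}")


# Made-up aggregated expression values for the ligand and the receptor.
ligand_in_sender = 0.5
receptor_in_receiver = 0.125

methods = (
    "expression_thresholding",
    "expression_mean",
    "expression_product",
    "expression_gmean",
)
scores = {
    m: toy_communication_score(ligand_in_sender, receptor_in_receiver, m)
    for m in methods
}
print(scores)
```

With these values the receptor falls below the threshold, so the thresholding score is 0 even though the three continuous scores are non-zero.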

cci_score : str, default='bray_curtis' Scoring function to aggregate the communication scores between a pair of cells. It computes an overall potential of cell-cell interactions. Options:

```
- 'bray_curtis' : Bray-Curtis-like score.
- 'jaccard' : Jaccard-like score.
- 'count' : Number of LR pairs that the pair of cells use.
- 'icellnet' : Sum of the L-R expression product of a pair of cells
```
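
The sketch below shows one plausible reading of these options on toy vectors. The exact formulas used by cell2cell are not given in this section, so the Bray-Curtis-like and Jaccard-like expressions here are the standard similarity coefficients applied to binarized communication vectors and should be read as an assumption. The function name `toy_cci_scores` and all values are hypothetical.

```python
import numpy as np


def toy_cci_scores(sender, receiver):
    """Hypothetical helper: aggregate per-LR-pair values into overall scores.

    `sender` holds the ligand values of one cell and `receiver` the receptor
    values of the other cell, both ordered by the same list of LR pairs.
    """
    a = np.asarray(sender, dtype=float)
    b = np.asarray(receiver, dtype=float)
    shared = np.sum(a * b)
    total = np.sum(a ** 2) + np.sum(b ** 2)
    return {
        "bray_curtis": 2 * shared / total,    # assumed Bray-Curtis-like form
        "jaccard": shared / (total - shared),  # assumed Jaccard-like form
        "sum_of_products": shared,
    }


# Binarized presence of ligands (cell A) and receptors (cell B) for 4 LR pairs.
binary = toy_cci_scores([1, 1, 0, 1], [1, 0, 1, 1])
# With binary inputs, the sum of products is the number of LR pairs in use ('count').
count_score = binary["sum_of_products"]

# With continuous expression, the same sum matches the 'icellnet' description.
icellnet_score = toy_cci_scores([2.0, 1.0, 0.0, 0.5], [1.5, 0.0, 3.0, 4.0])["sum_of_products"]

print(binary["bray_curtis"], binary["jaccard"], count_score, icellnet_score)
```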

cci_type : str, default='undirected' Specifies whether to compute the cci_score in a directed or undirected way. For a pair of cells A and B, directed means that ligands are considered only from cell A and receptors only from cell B, or vice versa, whereas undirected simultaneously considers signaling from cell A to cell B and from cell B to cell A.

expression_threshold : float, default=0.2 Threshold value to binarize gene expression when using communication_score='expression_thresholding'. Units have to be the same as the aggregated gene expression matrix (e.g., counts, fraction of cells with non-zero counts, etc.).

aggregation_method : str, default='nn_cell_fraction' Specifies the method to use to aggregate gene expression of single cells into their respective cell types. Used to perform the CCI analysis since it is on the cell types rather than single cells. Options are:

```
- 'nn_cell_fraction' : Among the single cells composing a cell type, it
calculates the fraction of single cells with non-zero count values
of a given gene.
- 'average' : Computes the average gene expression among the single cells
composing a cell type for a given gene.
```
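
To make the two aggregation options concrete, here is a minimal pandas sketch, assuming a genes-by-cells DataFrame and a metadata table with 'barcodes' and 'cell_types' columns. The function `toy_aggregate` and the data are hypothetical and are not the library's own aggregation routine.

```python
import pandas as pd

# Toy expression matrix: rows are genes, columns are single cells.
expression = pd.DataFrame(
    {"c1": [0, 4], "c2": [2, 0], "c3": [6, 0], "c4": [0, 0]},
    index=["GeneA", "GeneB"],
)
# Toy metadata mapping each barcode to a cell type.
metadata = pd.DataFrame(
    {"barcodes": ["c1", "c2", "c3", "c4"], "cell_types": ["T", "T", "B", "B"]}
)


def toy_aggregate(expression, metadata, method="nn_cell_fraction",
                  barcode_col="barcodes", celltype_col="cell_types"):
    """Hypothetical helper: collapse single cells into one column per cell type."""
    labels = metadata.set_index(barcode_col)[celltype_col]
    cells_by_genes = expression.T  # rows are now single cells
    if method == "nn_cell_fraction":
        # Fraction of cells in each type with a non-zero count for each gene.
        values = (cells_by_genes > 0).astype(float)
    elif method == "average":
        values = cells_by_genes.astype(float)
    else:
        raise ValueError(f"Unknown aggregation method: {method}")
    # Labels are aligned to the cells through the shared barcode index.
    return values.groupby(labels).mean().T


fraction = toy_aggregate(expression, metadata, method="nn_cell_fraction")
average = toy_aggregate(expression, metadata, method="average")
print(fraction)
print(average)
```

The choice matters for `expression_threshold`: with 'nn_cell_fraction' the threshold is a fraction of cells, whereas with 'average' it is in expression units.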

barcode_col : str, default='barcodes' Column-name for the single cells in the metadata.

celltype_col : str, default='cell_types' Column name in the metadata used to group single cells into cell types for the selected aggregation method.

complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".

complex_agg_method : str, default='min' Method to aggregate the expression value of multiple genes in a complex.

```
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
```
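
The following sketch illustrates how a complex such as "CD74&CD44" could be reduced to a single expression value under each option. The helper `toy_complex_expression` and the expression values are hypothetical, not part of cell2cell.

```python
import numpy as np


def toy_complex_expression(partner, expression, complex_sep="&", method="min"):
    """Hypothetical helper: aggregate the subunits of a multimeric partner."""
    subunits = partner.split(complex_sep) if complex_sep else [partner]
    values = np.array([expression[gene] for gene in subunits], dtype=float)
    if method == "min":
        return float(values.min())
    if method == "mean":
        return float(values.mean())
    if method == "gmean":
        # Geometric mean: n-th root of the product of the n subunit values.
        return float(values.prod() ** (1.0 / len(values)))
    raise ValueError(f"Unknown complex aggregation method: {method}")


# Made-up expression values for the two subunits.
expression = {"CD74": 8.0, "CD44": 2.0}
results = {
    m: toy_complex_expression("CD74&CD44", expression, method=m)
    for m in ("min", "mean", "gmean")
}
print(results)
```

Using 'min' is the most conservative choice, since the complex is treated as only as abundant as its least expressed subunit.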

verbose : boolean, default=False Whether to print the steps of the analysis.

###### Attributes

rnaseq_data : pandas.DataFrame or scanpy.AnnData Gene expression data for a single-cell RNA-seq experiment. If it is a DataFrame, columns are single cells and rows are genes; if it is an AnnData object, columns are genes and rows are single cells.

metadata : pandas.DataFrame Metadata containing the cell type of each single cell in the RNA-seq dataset.

index_col : str Column-name for the single cells in the metadata.

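group_col : str Column name in the metadata used to group single cells into cell types for the selected aggregation method.

ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.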

complex_sep : str Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".

complex_agg_method : str Method to aggregate the expression value of multiple genes in a complex.

```
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
```

ref_ppi : pandas.DataFrame Reference list of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication. It could be the same as 'ppi_data' if ppi_data is not bidirectional (that is, contains ProtA-ProtB interaction as well as ProtB-ProtA interaction). ref_ppi must be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction).
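
The difference between an undirected reference list and a bidirectional list can be sketched with pandas as follows. The two helpers are hypothetical stand-ins written for illustration; they are not the library's `ppi` functions and may differ from them in detail (for example, in which duplicate is kept).

```python
import pandas as pd


def toy_make_bidirectional(ppi, interaction_columns=("A", "B")):
    """Hypothetical helper: add the ProtB-ProtA row for every ProtA-ProtB row."""
    a, b = interaction_columns
    # Swap the two partner columns, then restore the original column order.
    swapped = ppi.rename(columns={a: b, b: a})[list(ppi.columns)]
    return pd.concat([ppi, swapped], ignore_index=True).drop_duplicates(ignore_index=True)


def toy_remove_bidirectionality(ppi, interaction_columns=("A", "B")):
    """Hypothetical helper: keep only the first orientation of each pair."""
    a, b = interaction_columns
    # An order-independent key makes ProtA-ProtB and ProtB-ProtA compare equal.
    key = ppi.apply(lambda row: tuple(sorted((row[a], row[b]))), axis=1)
    return ppi.loc[~key.duplicated()].reset_index(drop=True)


ref_ppi = pd.DataFrame({"A": ["L1", "L2"], "B": ["R1", "R2"]})
bidirectional = toy_make_bidirectional(ref_ppi)
restored = toy_remove_bidirectionality(bidirectional)
print(bidirectional)
print(restored)
```

In this toy case `restored` has the same two rows as `ref_ppi`, which is the sense in which the reference list contains only one orientation of each interaction.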

interaction_columns : tuple Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.

analysis_setup : dict Contains main setup for running the cell-cell interactions and communication analyses. Three main setups are needed (passed as keys):

```
- 'communication_score' : is the type of communication score used to detect
active ligand-receptor pairs between each pair of cells.
It can be:
- 'expression_thresholding'
- 'expression_product'
- 'expression_mean'
- 'expression_gmean'
- 'cci_score' : is the scoring function to aggregate the communication
scores.
It can be:
- 'bray_curtis'
- 'jaccard'
- 'count'
- 'icellnet'
- 'cci_type' : is the type of interaction between two cells. If it is
undirected, all ligands and receptors are considered from both cells.
If it is directed, ligands from one cell and receptors from the other
are considered separately with respect to ligands from the second
cell and receptor from the first one.
So, it can be:
- 'undirected'
- 'directed'
```

cutoff_setup : dict Contains two keys: 'type' and 'parameter'. The first key represents how the cutoff or threshold is applied, while 'parameter' is the value used to binarize the expression values. The key 'type' can be:

```
- 'local_percentile' : computes the value of a given percentile, for each
gene independently. In this case, the parameter corresponds to the
percentile to compute, as a float value between 0 and 1.
- 'global_percentile' : computes the value of a given percentile from all
genes and samples simultaneously. In this case, the parameter
corresponds to the percentile to compute, as a float value between
0 and 1. All genes have the same cutoff.
- 'file' : load a cutoff table from a file. Parameter in this case is the
path of that file. It must contain the same genes as index and same
samples as columns.
- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in each sample. This allows using specific cutoffs for
each sample. The columns here must be the same as the ones in the
rnaseq_data.
- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in only one column. These cutoffs will be applied to
all samples.
- 'constant_value' : binarizes the expression. Evaluates whether
expression is greater than the value input in the parameter.
```
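
A small sketch of three of these cutoff types may help to see how they differ. It assumes a genes-by-samples DataFrame and a strict greater-than comparison for every type (the text above states this only for 'constant_value'). The function `toy_binarize` and the data are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy aggregated expression: rows are genes, columns are samples or cell types.
expression = pd.DataFrame(
    {"s1": [0.0, 1.0], "s2": [2.0, 3.0], "s3": [4.0, 11.0]},
    index=["GeneA", "GeneB"],
)


def toy_binarize(expression, cutoff_type, parameter):
    """Hypothetical helper: binarize expression under a given cutoff setup."""
    if cutoff_type == "local_percentile":
        # One cutoff per gene, computed across that gene's samples.
        cutoffs = expression.quantile(parameter, axis=1)
        return expression.gt(cutoffs, axis=0).astype(int)
    if cutoff_type == "global_percentile":
        # A single cutoff computed from all genes and samples together.
        cutoff = float(np.quantile(expression.to_numpy(), parameter))
        return (expression > cutoff).astype(int)
    if cutoff_type == "constant_value":
        return (expression > parameter).astype(int)
    raise ValueError(f"Unsupported cutoff type in this sketch: {cutoff_type}")


local = toy_binarize(expression, "local_percentile", 0.5)
global_ = toy_binarize(expression, "global_percentile", 0.5)
constant = toy_binarize(expression, "constant_value", 0.5)
print(local, global_, constant, sep="\n\n")
```

Here the per-gene median marks only `s3` as expressed for both genes, while the global median (2.5) also marks `GeneB` in `s2`.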

interaction_space : cell2cell.core.interaction_space.InteractionSpace Interaction space that contains all the required elements to perform the cell-cell interaction and communication analysis between every pair of cells. After performing the analyses, the results are stored in this object.

aggregation_method : str Specifies the method to use to aggregate gene expression of single cells into their respective cell types. Used to perform the CCI analysis since it is on the cell types rather than single cells. Options are:

```
- 'nn_cell_fraction' : Among the single cells composing a cell type, it
calculates the fraction of single cells with non-zero count values
of a given gene.
- 'average' : Computes the average gene expression among the single cells
composing a cell type for a given gene.
```

ccc_permutation_pvalues : pandas.DataFrame Contains the P-values of the permutation analysis on the communication scores.

cci_permutation_pvalues : pandas.DataFrame Contains the P-values of the permutation analysis on the CCI scores.

__adata : boolean Auxiliary variable used for storing whether rnaseq_data is an AnnData object.

## Source code in `cell2cell/analysis/cell2cell_pipelines.py`

```
class SingleCellInteractions:
    '''Interaction class with all necessary methods to run the cell2cell pipeline
on a single-cell RNA-seq dataset.
Parameters
----------
rnaseq_data : pandas.DataFrame or scanpy.AnnData
Gene expression data for a single-cell RNA-seq experiment. If it is a
DataFrame, columns are single cells and rows are genes; if it is
an AnnData object, columns are genes and rows are single cells.
ppi_data : pandas.DataFrame
List of protein-protein interactions (or ligand-receptor pairs) used
for inferring the cell-cell interactions and communication.
metadata : pandas.DataFrame
Metadata containing the cell type of each single cell in the
RNA-seq dataset.
interaction_columns : tuple, default=('A', 'B')
Contains the names of the columns where to find the partners in a
dataframe of protein-protein interactions. If the list is for
ligand-receptor pairs, the first column is for the ligands and the second
for the receptors.
communication_score : str, default='expression_thresholding'
Type of communication score to infer the potential use of a given ligand-
receptor pair by a pair of cells/tissues/samples.
Available communication_scores are:
- 'expression_thresholding' : Computes the joint presence of a ligand from a
sender cell and of a receptor on a receiver cell
from binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression of a ligand
from a sender cell and the expression of a receptor on a
receiver cell.
- 'expression_product' : Computes the product between the expression of a
ligand from a sender cell and the expression of a
receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
cci_score : str, default='bray_curtis'
Scoring function to aggregate the communication scores between a pair of
cells. It computes an overall potential of cell-cell interactions.
Options:
- 'bray_curtis' : Bray-Curtis-like score.
- 'jaccard' : Jaccard-like score.
- 'count' : Number of LR pairs that the pair of cells use.
- 'icellnet' : Sum of the L-R expression product of a pair of cells
cci_type : str, default='undirected'
Specifies whether to compute the cci_score in a directed or undirected
way. For a pair of cells A and B, directed means that ligands are
considered only from cell A and receptors only from cell B, or vice versa,
whereas undirected simultaneously considers signaling from cell A to
cell B and from cell B to cell A.
expression_threshold : float, default=0.2
Threshold value to binarize gene expression when using
communication_score='expression_thresholding'. Units have to be the
same as the aggregated gene expression matrix (e.g., counts, fraction
of cells with non-zero counts, etc.).
aggregation_method : str, default='nn_cell_fraction'
Specifies the method to use to aggregate gene expression of single
cells into their respective cell types. Used to perform the CCI
analysis since it is on the cell types rather than single cells.
Options are:
- 'nn_cell_fraction' : Among the single cells composing a cell type, it
calculates the fraction of single cells with non-zero count values
of a given gene.
- 'average' : Computes the average gene expression among the single cells
composing a cell type for a given gene.
barcode_col : str, default='barcodes'
Column-name for the single cells in the metadata.
celltype_col : str, default='cell_types'
Column name in the metadata used to group single cells into cell types
for the selected aggregation method.
complex_sep : str, default=None
Symbol that separates the protein subunits in a multimeric complex.
For example, '&' is the complex_sep for a list of ligand-receptor pairs
where a protein partner could be "CD74&CD44".
complex_agg_method : str, default='min'
Method to aggregate the expression value of multiple genes in a
complex.
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
verbose : boolean, default=False
Whether printing or not steps of the analysis.
Attributes
----------
rnaseq_data : pandas.DataFrame or scanpy.AnnData
Gene expression data for a single-cell RNA-seq experiment. If it is a
DataFrame, columns are single cells and rows are genes; if it is
an AnnData object, columns are genes and rows are single cells.
metadata : pandas.DataFrame
Metadata containing the cell type of each single cell in the
RNA-seq dataset.
index_col : str
Column-name for the single cells in the metadata.
group_col : str
Column-name in the metadata for the grouping single cells into cell types
by the selected aggregation method.
ppi_data : pandas.DataFrame
List of protein-protein interactions (or ligand-receptor pairs) used for
inferring the cell-cell interactions and communication.
complex_sep : str
Symbol that separates the protein subunits in a multimeric complex.
For example, '&' is the complex_sep for a list of ligand-receptor pairs
where a protein partner could be "CD74&CD44".
complex_agg_method : str
Method to aggregate the expression value of multiple genes in a
complex.
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
ref_ppi : pandas.DataFrame
Reference list of protein-protein interactions (or ligand-receptor pairs) used
for inferring the cell-cell interactions and communication. It could be the
same as 'ppi_data' if ppi_data is not bidirectional (that is, contains
ProtA-ProtB interaction as well as ProtB-ProtA interaction). ref_ppi must
be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction).
interaction_columns : tuple
Contains the names of the columns where to find the partners in a
dataframe of protein-protein interactions. If the list is for
ligand-receptor pairs, the first column is for the ligands and the second
for the receptors.
analysis_setup : dict
Contains main setup for running the cell-cell interactions and communication
analyses.
Three main setups are needed (passed as keys):
- 'communication_score' : is the type of communication score used to detect
active ligand-receptor pairs between each pair of cells.
It can be:
- 'expression_thresholding'
- 'expression_product'
- 'expression_mean'
- 'expression_gmean'
- 'cci_score' : is the scoring function to aggregate the communication
scores.
It can be:
- 'bray_curtis'
- 'jaccard'
- 'count'
- 'icellnet'
- 'cci_type' : is the type of interaction between two cells. If it is
undirected, all ligands and receptors are considered from both cells.
If it is directed, ligands from one cell and receptors from the other
are considered separately with respect to ligands from the second
cell and receptor from the first one.
So, it can be:
- 'undirected'
- 'directed'
cutoff_setup : dict
Contains two keys: 'type' and 'parameter'. The first key represents
how the cutoff or threshold is applied, while 'parameter' is the value used
to binarize the expression values.
The key 'type' can be:
- 'local_percentile' : computes the value of a given percentile, for each
gene independently. In this case, the parameter corresponds to the
percentile to compute, as a float value between 0 and 1.
- 'global_percentile' : computes the value of a given percentile from all
genes and samples simultaneously. In this case, the parameter
corresponds to the percentile to compute, as a float value between
0 and 1. All genes have the same cutoff.
- 'file' : load a cutoff table from a file. Parameter in this case is the
path of that file. It must contain the same genes as index and same
samples as columns.
- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in each sample. This allows to use specific cutoffs for
each sample. The columns here must be the same as the ones in the
rnaseq_data.
- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in only one column. These cutoffs will be applied to
all samples.
- 'constant_value' : binarizes the expression. Evaluates whether
expression is greater than the value input in the parameter.
interaction_space : cell2cell.core.interaction_space.InteractionSpace
Interaction space that contains all the required elements to perform the
cell-cell interaction and communication analysis between every pair of cells.
After performing the analyses, the results are stored in this object.
aggregation_method : str
Specifies the method to use to aggregate gene expression of single
cells into their respective cell types. Used to perform the CCI
analysis since it is on the cell types rather than single cells.
Options are:
- 'nn_cell_fraction' : Among the single cells composing a cell type, it
calculates the fraction of single cells with non-zero count values
of a given gene.
- 'average' : Computes the average gene expression among the single cells
composing a cell type for a given gene.
ccc_permutation_pvalues : pandas.DataFrame
Contains the P-values of the permutation analysis on the
communication scores.
cci_permutation_pvalues : pandas.DataFrame
Contains the P-values of the permutation analysis on the
CCI scores.
__adata : boolean
Auxiliary variable used for storing whether rnaseq_data
is an AnnData object.
'''
    compute_pairwise_cci_scores = BulkInteractions.compute_pairwise_cci_scores
    compute_pairwise_communication_scores = BulkInteractions.compute_pairwise_communication_scores
    interaction_elements = BulkInteractions.interaction_elements

    def __init__(self, rnaseq_data, ppi_data, metadata, interaction_columns=('A', 'B'),
                 communication_score='expression_thresholding', cci_score='bray_curtis', cci_type='undirected',
                 expression_threshold=0.20, aggregation_method='nn_cell_fraction', barcode_col='barcodes',
                 celltype_col='cell_types', complex_sep=None, complex_agg_method='min', verbose=False):
        # Placeholders
        self.rnaseq_data = rnaseq_data
        self.metadata = metadata
        self.index_col = barcode_col
        self.group_col = celltype_col
        self.aggregation_method = aggregation_method
        self.analysis_setup = dict()
        self.cutoff_setup = dict()
        self.complex_sep = complex_sep
        self.complex_agg_method = complex_agg_method
        self.interaction_columns = interaction_columns
        self.ccc_permutation_pvalues = None
        self.cci_permutation_pvalues = None
        if isinstance(rnaseq_data, scanpy.AnnData):
            self.__adata = True
            genes = list(rnaseq_data.var.index)
        else:
            self.__adata = False
            genes = list(rnaseq_data.index)

        # Analysis
        self.analysis_setup['communication_score'] = communication_score
        self.analysis_setup['cci_score'] = cci_score
        self.analysis_setup['cci_type'] = cci_type
        self.analysis_setup['ccc_type'] = cci_type

        # Initialize PPI
        ppi_data_ = ppi.filter_ppi_by_proteins(ppi_data=ppi_data,
                                               proteins=genes,
                                               complex_sep=complex_sep,
                                               upper_letter_comparison=False,
                                               interaction_columns=interaction_columns)
        self.ppi_data = ppi.remove_ppi_bidirectionality(ppi_data=ppi_data_,
                                                        interaction_columns=interaction_columns,
                                                        verbose=verbose)
        if self.analysis_setup['cci_type'] == 'undirected':
            self.ref_ppi = self.ppi_data
            self.ppi_data = ppi.bidirectional_ppi_for_cci(ppi_data=self.ppi_data,
                                                          interaction_columns=interaction_columns,
                                                          verbose=verbose)
        else:
            self.ref_ppi = None

        # Thresholding
        self.cutoff_setup['type'] = 'constant_value'
        self.cutoff_setup['parameter'] = expression_threshold

        # Aggregate single-cell RNA-Seq data
        self.aggregated_expression = rnaseq.aggregate_single_cells(rnaseq_data=self.rnaseq_data,
                                                                   metadata=self.metadata,
                                                                   barcode_col=self.index_col,
                                                                   celltype_col=self.group_col,
                                                                   method=self.aggregation_method,
                                                                   transposed=self.__adata)

        # Interaction Space
        self.interaction_space = initialize_interaction_space(rnaseq_data=self.aggregated_expression,
                                                              ppi_data=self.ppi_data,
                                                              cutoff_setup=self.cutoff_setup,
                                                              analysis_setup=self.analysis_setup,
                                                              complex_sep=self.complex_sep,
                                                              complex_agg_method=self.complex_agg_method,
                                                              interaction_columns=self.interaction_columns,
                                                              verbose=verbose)

    def permute_cell_labels(self, permutations=100, evaluation='communication', fdr_correction=True, random_state=None,
                            verbose=False):
        '''Performs permutation analysis of cell-type labels. Detects
        significant CCI or communication scores.

        Parameters
        ----------
        permutations : int, default=100
            Number of permutations where in each of them a random
            shuffle of cell-type labels is performed, followed by
            computing CCI or communication scores to create a null
            distribution.
        evaluation : str, default='communication'
            Whether calculating P-values for CCI or communication scores.
            - 'interactions' : For CCI scores.
            - 'communication' : For communication scores.
        fdr_correction : boolean, default=True
            Whether performing a multiple test correction after
            computing P-values. In this case corresponds to an
            FDR or Benjamini-Hochberg correction, using an alpha
            equal to 0.05.
        random_state : int, default=None
            Seed for randomization.
        verbose : boolean, default=False
            Whether printing or not steps of the analysis.
        '''
        if evaluation == 'communication':
            if 'communication_matrix' not in self.interaction_space.interaction_elements.keys():
                raise ValueError('Run the method compute_pairwise_communication_scores() before permutation analysis.')
            score = self.interaction_space.interaction_elements['communication_matrix'].copy()
        elif evaluation == 'interactions':
            if not hasattr(self.interaction_space, 'distance_matrix'):
                raise ValueError('Run the method compute_pairwise_interactions() before permutation analysis.')
            score = self.interaction_space.interaction_elements['cci_matrix'].copy()
        else:
            raise ValueError('Not a valid evaluation')

        randomized_scores = []
        analysis_setup = self.analysis_setup.copy()
        ppi_data = self.ppi_data
        if (evaluation == 'communication') & (self.analysis_setup['cci_type'] != self.analysis_setup['ccc_type']):
            analysis_setup['cci_type'] = analysis_setup['ccc_type']
            if self.analysis_setup['cci_type'] == 'directed':
                ppi_data = ppi.bidirectional_ppi_for_cci(ppi_data=self.ppi_data,
                                                         interaction_columns=self.interaction_columns,
                                                         verbose=verbose)
            elif self.analysis_setup['cci_type'] == 'undirected':
                ppi_data = self.ref_ppi

        for i in tqdm(range(permutations), disable=not verbose):
            # Use a different, reproducible seed for each permutation
            if random_state is not None:
                seed = random_state + i
            else:
                seed = random_state
            randomized_meta = manipulate_dataframes.shuffle_cols_in_df(df=self.metadata.reset_index(),
                                                                       columns=self.group_col,
                                                                       random_state=seed)
            aggregated_expression = rnaseq.aggregate_single_cells(rnaseq_data=self.rnaseq_data,
                                                                  metadata=randomized_meta,
                                                                  barcode_col=self.index_col,
                                                                  celltype_col=self.group_col,
                                                                  method=self.aggregation_method,
                                                                  transposed=self.__adata)
            interaction_space = initialize_interaction_space(rnaseq_data=aggregated_expression,
                                                             ppi_data=ppi_data,
                                                             cutoff_setup=self.cutoff_setup,
                                                             analysis_setup=analysis_setup,
                                                             complex_sep=self.complex_sep,
                                                             complex_agg_method=self.complex_agg_method,
                                                             interaction_columns=self.interaction_columns,
                                                             verbose=False)
            if evaluation == 'communication':
                interaction_space.compute_pairwise_communication_scores(verbose=False)
                randomized_scores.append(interaction_space.interaction_elements['communication_matrix'].values.flatten())
            elif evaluation == 'interactions':
                interaction_space.compute_pairwise_cci_scores(verbose=False)
                randomized_scores.append(interaction_space.interaction_elements['cci_matrix'].values.flatten())

        randomized_scores = np.array(randomized_scores)
        base_scores = score.values.flatten()
        pvals = np.ones(base_scores.shape)
        n_pvals = len(base_scores)
        randomized_scores = randomized_scores.reshape((-1, n_pvals))
        for i in range(n_pvals):
            # The observed score is appended to its own null distribution
            dist = randomized_scores[:, i]
            dist = np.append(dist, base_scores[i])
            pvals[i] = permutation.compute_pvalue_from_dist(obs_value=base_scores[i],
                                                            dist=dist,
                                                            consider_size=True,
                                                            comparison='different'
                                                            )
        pval_df = pd.DataFrame(pvals.reshape(score.shape), index=score.index, columns=score.columns)

        if fdr_correction:
            symmetric = manipulate_dataframes.check_symmetry(df=pval_df)
            if symmetric:
                pval_df = multitest.compute_fdrcorrection_symmetric_matrix(X=pval_df,
                                                                           alpha=0.05)
            else:
                pval_df = multitest.compute_fdrcorrection_asymmetric_matrix(X=pval_df,
                                                                            alpha=0.05)

        if evaluation == 'communication':
            self.ccc_permutation_pvalues = pval_df
        elif evaluation == 'interactions':
            self.cci_permutation_pvalues = pval_df
        return pval_df
```

##### `interaction_elements`

`property` `readonly`

Returns the interaction elements within an interaction space.

##### `compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True)`

Computes overall CCI scores for each pair of cells.

###### Parameters

cci_score : str, default=None Scoring function to aggregate the communication scores between a pair of cells. It computes an overall potential of cell-cell interactions. If None, it will use the one stored in the attribute analysis_setup of this object. Options:

```
- 'bray_curtis' : Bray-Curtis-like score.
- 'jaccard' : Jaccard-like score.
- 'count' : Number of LR pairs that the pair of cells use.
- 'icellnet' : Sum of the L-R expression product of a pair of cells
```

use_ppi_score : boolean, default=False Whether to weight the LR pairs by a score specified in the ppi_data when computing the scores.

verbose : boolean, default=True Whether to print the steps of the analysis.

## Source code in `cell2cell/analysis/cell2cell_pipelines.py`

```
def compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True):
    '''Computes overall CCI scores for each pair of cells.

    Parameters
    ----------
    cci_score : str, default=None
        Scoring function to aggregate the communication scores between
        a pair of cells. It computes an overall potential of cell-cell
        interactions. If None, it will use the one stored in the
        attribute analysis_setup of this object.
        Options:
        - 'bray_curtis' : Bray-Curtis-like score.
        - 'jaccard' : Jaccard-like score.
        - 'count' : Number of LR pairs that the pair of cells use.
        - 'icellnet' : Sum of the L-R expression product of a pair of cells
    use_ppi_score : boolean, default=False
        Whether using a weight of LR pairs specified in the ppi_data
        to compute the scores.
    verbose : boolean, default=True
        Whether printing or not steps of the analysis.
    '''
    self.interaction_space.compute_pairwise_cci_scores(cci_score=cci_score,
                                                       use_ppi_score=use_ppi_score,
                                                       verbose=verbose)
```

##### `compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None, interaction_columns=None, cells=None, cci_type=None, verbose=True)`

Computes the communication scores for each LR pair in a given pair of sender-receiver cells.

###### Parameters

communication_score : str, default=None Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. If None, the score stored in the attribute analysis_setup will be used. Available communication_scores are:

```
- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
```

ref_ppi_data : pandas.DataFrame, default=None Reference list of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication. It could be the same as 'ppi_data' if ppi_data is not bidirectional (that is, contains ProtA-ProtB interaction as well as ProtB-ProtA interaction). ref_ppi must be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction). If None, the one stored in the attribute ref_ppi will be used.

interaction_columns : tuple, default=None Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors. If None, the one stored in the attribute interaction_columns will be used.

cells : list, default=None List of cells to consider.

cci_type : str, default=None Type of interaction between two cells. Used to specify whether an LR pair is considered in both directions. It can be:

```
- 'undirected'
- 'directed'
If None, 'directed' will be used.
```

verbose : boolean, default=True Whether to print the steps of the analysis.

## Source code in `cell2cell/analysis/cell2cell_pipelines.py`

```
def compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None,
                                          interaction_columns=None, cells=None, cci_type=None, verbose=True):
    '''Computes the communication scores for each LR pair in
    a given pair of sender-receiver cells

    Parameters
    ----------
    communication_score : str, default=None
        Type of communication score to infer the potential use of
        a given ligand-receptor pair by a pair of cells/tissues/samples.
        If None, the score stored in the attribute analysis_setup
        will be used.
        Available communication_scores are:
        - 'expression_thresholding' : Computes the joint presence of a
                                      ligand from a sender cell and of
                                      a receptor on a receiver cell from
                                      binarizing their gene expression levels.
        - 'expression_mean' : Computes the average between the expression
                              of a ligand from a sender cell and the
                              expression of a receptor on a receiver cell.
        - 'expression_product' : Computes the product between the expression
                                 of a ligand from a sender cell and the
                                 expression of a receptor on a receiver cell.
        - 'expression_gmean' : Computes the geometric mean between the expression
                               of a ligand from a sender cell and the
                               expression of a receptor on a receiver cell.
    use_ppi_score : boolean, default=False
        Whether using a weight of LR pairs specified in the ppi_data
        to compute the scores.
    ref_ppi_data : pandas.DataFrame, default=None
        Reference list of protein-protein interactions (or
        ligand-receptor pairs) used for inferring the cell-cell
        interactions and communication. It could be the same as
        'ppi_data' if ppi_data is not bidirectional (that is,
        contains ProtA-ProtB interaction as well as ProtB-ProtA
        interaction). ref_ppi must be undirected (contains only
        ProtA-ProtB and not ProtB-ProtA interaction). If None,
        the one stored in the attribute ref_ppi will be used.
    interaction_columns : tuple, default=None
        Contains the names of the columns where to find the
        partners in a dataframe of protein-protein interactions.
        If the list is for ligand-receptor pairs, the first column
        is for the ligands and the second for the receptors. If
        None, the one stored in the attribute interaction_columns
        will be used.
    cells : list, default=None
        List of cells to consider.
    cci_type : str, default=None
        Type of interaction between two cells. Used to specify
        if we want to consider a LR pair in both directions.
        It can be:
        - 'undirected'
        - 'directed'
        If None, 'directed' will be used.
    verbose : boolean, default=True
        Whether printing or not steps of the analysis.
    '''
    if interaction_columns is None:
        interaction_columns = self.interaction_columns  # Used only for ref_ppi_data
    if ref_ppi_data is None:
        ref_ppi_data = self.ref_ppi
    if cci_type is None:
        cci_type = 'directed'
    # Remember the direction used for communication scores
    self.analysis_setup['ccc_type'] = cci_type
    self.interaction_space.compute_pairwise_communication_scores(communication_score=communication_score,
                                                                 use_ppi_score=use_ppi_score,
                                                                 ref_ppi_data=ref_ppi_data,
                                                                 interaction_columns=interaction_columns,
                                                                 cells=cells,
                                                                 cci_type=cci_type,
                                                                 verbose=verbose)
```
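As a rough illustration of the four `communication_score` options described above, the following plain-Python sketch scores a single ligand-receptor pair. The function name `communication_score` and its `cutoff` argument are hypothetical and only stand in for the binarization step; in cell2cell the cutoffs come from `cutoff_setup` and the scores are computed on whole matrices.

```python
import math


def communication_score(ligand_expr, receptor_expr, method, cutoff=0.0):
    """Score one ligand-receptor pair between a sender and a receiver cell.

    Hypothetical helper for illustration only; not the cell2cell API.
    """
    if method == 'expression_thresholding':
        # Binarize both partners, then require joint presence
        return float(ligand_expr > cutoff and receptor_expr > cutoff)
    if method == 'expression_mean':
        return (ligand_expr + receptor_expr) / 2
    if method == 'expression_product':
        return ligand_expr * receptor_expr
    if method == 'expression_gmean':
        return math.sqrt(ligand_expr * receptor_expr)
    raise ValueError('Not a valid communication_score')


# Ligand expression in the sender is 4.0, receptor expression in the receiver is 9.0
methods = ('expression_thresholding', 'expression_mean',
           'expression_product', 'expression_gmean')
scores = {m: communication_score(4.0, 9.0, m, cutoff=5.0) for m in methods}
```

With a cutoff of 5.0 the ligand is considered absent, so the thresholding score is 0 even though the other three scores are positive.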

#####
`permute_cell_labels(self, permutations=100, evaluation='communication', fdr_correction=True, random_state=None, verbose=False)`

Performs permutation analysis of cell-type labels. Detects significant CCI or communication scores.

###### Parameters

permutations : int, default=100 Number of permutations. In each of them, cell-type labels are randomly shuffled and CCI or communication scores are recomputed to build a null distribution.

evaluation : str, default='communication' Whether to calculate P-values for CCI or communication scores.

```
- 'interactions' : For CCI scores.
- 'communication' : For communication scores.
```

fdr_correction : boolean, default=True Whether to perform a multiple test correction after computing P-values. Here, this is a Benjamini-Hochberg FDR correction with an alpha of 0.05.

random_state : int, default=None Seed for randomization.

verbose : boolean, default=False Whether to print the steps of the analysis.

## Source code in `cell2cell/analysis/cell2cell_pipelines.py`

```
def permute_cell_labels(self, permutations=100, evaluation='communication', fdr_correction=True, random_state=None,
                        verbose=False):
    '''Performs permutation analysis of cell-type labels. Detects
    significant CCI or communication scores.

    Parameters
    ----------
    permutations : int, default=100
        Number of permutations where in each of them a random
        shuffle of cell-type labels is performed, followed by
        computing CCI or communication scores to create a null
        distribution.

    evaluation : str, default='communication'
        Whether calculating P-values for CCI or communication scores.

        - 'interactions' : For CCI scores.
        - 'communication' : For communication scores.

    fdr_correction : boolean, default=True
        Whether performing a multiple test correction after
        computing P-values. In this case corresponds to an
        FDR or Benjamini-Hochberg correction, using an alpha
        equal to 0.05.

    random_state : int, default=None
        Seed for randomization.

    verbose : boolean, default=False
        Whether printing or not steps of the analysis.
    '''
    if evaluation == 'communication':
        if 'communication_matrix' not in self.interaction_space.interaction_elements.keys():
            raise ValueError('Run the method compute_pairwise_communication_scores() before permutation analysis.')
        score = self.interaction_space.interaction_elements['communication_matrix'].copy()
    elif evaluation == 'interactions':
        if not hasattr(self.interaction_space, 'distance_matrix'):
            raise ValueError('Run the method compute_pairwise_interactions() before permutation analysis.')
        score = self.interaction_space.interaction_elements['cci_matrix'].copy()
    else:
        raise ValueError('Not a valid evaluation')

    randomized_scores = []

    analysis_setup = self.analysis_setup.copy()
    ppi_data = self.ppi_data
    if (evaluation == 'communication') & (self.analysis_setup['cci_type'] != self.analysis_setup['ccc_type']):
        analysis_setup['cci_type'] = analysis_setup['ccc_type']
        if self.analysis_setup['cci_type'] == 'directed':
            ppi_data = ppi.bidirectional_ppi_for_cci(ppi_data=self.ppi_data,
                                                     interaction_columns=self.interaction_columns,
                                                     verbose=verbose)
        elif self.analysis_setup['cci_type'] == 'undirected':
            ppi_data = self.ref_ppi

    for i in tqdm(range(permutations), disable=not verbose):
        if random_state is not None:
            seed = random_state + i
        else:
            seed = random_state

        randomized_meta = manipulate_dataframes.shuffle_cols_in_df(df=self.metadata.reset_index(),
                                                                   columns=self.group_col,
                                                                   random_state=seed)

        aggregated_expression = rnaseq.aggregate_single_cells(rnaseq_data=self.rnaseq_data,
                                                              metadata=randomized_meta,
                                                              barcode_col=self.index_col,
                                                              celltype_col=self.group_col,
                                                              method=self.aggregation_method,
                                                              transposed=self.__adata)

        interaction_space = initialize_interaction_space(rnaseq_data=aggregated_expression,
                                                         ppi_data=ppi_data,
                                                         cutoff_setup=self.cutoff_setup,
                                                         analysis_setup=analysis_setup,
                                                         complex_sep=self.complex_sep,
                                                         complex_agg_method=self.complex_agg_method,
                                                         interaction_columns=self.interaction_columns,
                                                         verbose=False)

        if evaluation == 'communication':
            interaction_space.compute_pairwise_communication_scores(verbose=False)
            randomized_scores.append(interaction_space.interaction_elements['communication_matrix'].values.flatten())
        elif evaluation == 'interactions':
            interaction_space.compute_pairwise_cci_scores(verbose=False)
            randomized_scores.append(interaction_space.interaction_elements['cci_matrix'].values.flatten())

    randomized_scores = np.array(randomized_scores)
    base_scores = score.values.flatten()
    pvals = np.ones(base_scores.shape)
    n_pvals = len(base_scores)
    randomized_scores = randomized_scores.reshape((-1, n_pvals))
    for i in range(n_pvals):
        dist = randomized_scores[:, i]
        dist = np.append(dist, base_scores[i])
        pvals[i] = permutation.compute_pvalue_from_dist(obs_value=base_scores[i],
                                                        dist=dist,
                                                        consider_size=True,
                                                        comparison='different'
                                                        )
    pval_df = pd.DataFrame(pvals.reshape(score.shape), index=score.index, columns=score.columns)

    if fdr_correction:
        symmetric = manipulate_dataframes.check_symmetry(df=pval_df)
        if symmetric:
            pval_df = multitest.compute_fdrcorrection_symmetric_matrix(X=pval_df,
                                                                       alpha=0.05)
        else:
            pval_df = multitest.compute_fdrcorrection_asymmetric_matrix(X=pval_df,
                                                                        alpha=0.05)

    if evaluation == 'communication':
        self.ccc_permutation_pvalues = pval_df
    elif evaluation == 'interactions':
        self.cci_permutation_pvalues = pval_df
    return pval_df
```
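The permutation logic above can be pictured with a small, self-contained sketch: shuffle the labels, recompute a statistic, append the observed value to the null distribution, and derive a two-sided empirical P-value that is then corrected with Benjamini-Hochberg. The helpers `empirical_pvalue` and `benjamini_hochberg`, the toy data, and the choice of statistic (mean expression in one cell type) are assumptions for illustration. They are not the implementations in `cell2cell.stats`, whose exact P-value formula may differ.

```python
import random


def empirical_pvalue(obs_value, null_values):
    """Two-sided empirical P-value; the observed value is added to the null."""
    dist = list(null_values) + [obs_value]
    n = len(dist)
    upper = sum(v >= obs_value for v in dist) / n
    lower = sum(v <= obs_value for v in dist) / n
    return min(1.0, 2 * min(upper, lower))


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: one gene measured in ten single cells of two cell types
labels = ['T'] * 5 + ['B'] * 5
expression = [5.0, 6.0, 5.5, 6.5, 6.0, 1.0, 0.5, 1.5, 1.0, 0.8]


def mean_expression(cell_type, cell_labels):
    values = [e for e, lab in zip(expression, cell_labels) if lab == cell_type]
    return sum(values) / len(values)


observed = mean_expression('T', labels)
random_state = 0
null = []
for i in range(99):
    # One seed per permutation, mirroring `random_state + i` in the method above
    rng = random.Random(random_state + i)
    shuffled = labels[:]
    rng.shuffle(shuffled)
    null.append(mean_expression('T', shuffled))

pval = empirical_pvalue(observed, null)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Because the observed value is part of the distribution, the P-value can never be exactly zero, which is why more permutations are needed to resolve smaller P-values.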

####
`initialize_interaction_space(rnaseq_data, ppi_data, cutoff_setup, analysis_setup, excluded_cells=None, complex_sep=None, complex_agg_method='min', interaction_columns=('A', 'B'), verbose=True)`

Initializes an InteractionSpace object to perform the analyses.

###### Parameters

rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment or a single-cell experiment after aggregation into cell types. Columns are samples and rows are genes.

ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.

cutoff_setup : dict Contains two keys: 'type' and 'parameter'. The first key represents the way to use a cutoff or threshold, while 'parameter' is the value used to binarize the expression values. The key 'type' can be:

```
- 'local_percentile' : computes the value of a given percentile, for each
gene independently. In this case, the parameter corresponds to the
percentile to compute, as a float value between 0 and 1.
- 'global_percentile' : computes the value of a given percentile from all
genes and samples simultaneously. In this case, the parameter
corresponds to the percentile to compute, as a float value between
0 and 1. All genes have the same cutoff.
- 'file' : load a cutoff table from a file. Parameter in this case is the
path of that file. It must contain the same genes as index and same
samples as columns.
- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in each sample. This allows to use specific cutoffs for
each sample. The columns here must be the same as the ones in the
rnaseq_data.
- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in only one column. These cutoffs will be applied to
all samples.
- 'constant_value' : binarizes the expression. Evaluates whether
expression is greater than the value input in the parameter.
```

analysis_setup : dict Contains main setup for running the cell-cell interactions and communication analyses. Three main setups are needed (passed as keys):

```
- 'communication_score' : is the type of communication score used to detect
active ligand-receptor pairs between each pair of cells.
It can be:
- 'expression_thresholding'
- 'expression_product'
- 'expression_mean'
- 'expression_gmean'
- 'cci_score' : is the scoring function to aggregate the communication
scores.
It can be:
- 'bray_curtis'
- 'jaccard'
- 'count'
- 'icellnet'
- 'cci_type' : is the type of interaction between two cells. If it is
undirected, all ligands and receptors are considered from both cells.
If it is directed, ligands from one cell and receptors from the other
are considered separately with respect to ligands from the second
cell and receptor from the first one.
So, it can be:
- 'undirected'
- 'directed'
```

excluded_cells : list, default=None List of cells in the rnaseq_data to be excluded. If None, all cells are considered.

complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".

complex_agg_method : str, default='min' Method to aggregate the expression value of multiple genes in a complex.

```
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
```

interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.

verbose : boolean, default=True Whether to print the steps of the analysis.

###### Returns

interaction_space : cell2cell.core.interaction_space.InteractionSpace Interaction space that contains all the required elements to perform the cell-cell interaction and communication analysis between every pair of cells. After performing the analyses, the results are stored in this object.
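To make the difference between the 'local_percentile' and 'global_percentile' cutoff types more concrete, here is a minimal plain-Python sketch on a toy expression table. The `percentile` helper, the gene names, and the values are hypothetical, and a strict "greater than" comparison is assumed for binarization, as described for 'constant_value'. The library itself operates on pandas DataFrames.

```python
def percentile(values, q):
    """Linear-interpolation percentile for q between 0 and 1 (illustrative helper)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Rows are genes, columns are cell types (toy values)
expression = {
    'LIG1': {'T': 0.0, 'B': 2.0, 'NK': 10.0},
    'REC1': {'T': 4.0, 'B': 6.0, 'NK': 8.0},
}
q = 0.5

# 'local_percentile': one cutoff per gene, computed across samples
local_cutoffs = {gene: percentile(row.values(), q) for gene, row in expression.items()}

# 'global_percentile': a single cutoff from all genes and samples
all_values = [v for row in expression.values() for v in row.values()]
global_cutoff = percentile(all_values, q)

# Binarize: 1 if expression is greater than the cutoff, else 0
local_binary = {gene: {cell: int(v > local_cutoffs[gene]) for cell, v in row.items()}
                for gene, row in expression.items()}
global_binary = {gene: {cell: int(v > global_cutoff) for cell, v in row.items()}
                 for gene, row in expression.items()}
```

In this toy case REC1 in B cells is called absent under the local cutoff (6.0) but present under the global cutoff (5.0), which shows how the choice of `type` can change which ligand-receptor pairs count as active.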

## Source code in `cell2cell/analysis/cell2cell_pipelines.py`

```
def initialize_interaction_space(rnaseq_data, ppi_data, cutoff_setup, analysis_setup, excluded_cells=None,
                                 complex_sep=None, complex_agg_method='min', interaction_columns=('A', 'B'),
                                 verbose=True):
    '''Initializes an InteractionSpace object to perform the analyses

    Parameters
    ----------
    rnaseq_data : pandas.DataFrame
        Gene expression data for a bulk RNA-seq experiment or a single-cell
        experiment after aggregation into cell types. Columns are samples
        and rows are genes.

    ppi_data : pandas.DataFrame
        List of protein-protein interactions (or ligand-receptor pairs) used
        for inferring the cell-cell interactions and communication.

    cutoff_setup : dict
        Contains two keys: 'type' and 'parameter'. The first key represents the
        way to use a cutoff or threshold, while parameter is the value used
        to binarize the expression values.
        The key 'type' can be:

        - 'local_percentile' : computes the value of a given percentile, for each
            gene independently. In this case, the parameter corresponds to the
            percentile to compute, as a float value between 0 and 1.
        - 'global_percentile' : computes the value of a given percentile from all
            genes and samples simultaneously. In this case, the parameter
            corresponds to the percentile to compute, as a float value between
            0 and 1. All genes have the same cutoff.
        - 'file' : load a cutoff table from a file. Parameter in this case is the
            path of that file. It must contain the same genes as index and same
            samples as columns.
        - 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
            for each gene in each sample. This allows to use specific cutoffs for
            each sample. The columns here must be the same as the ones in the
            rnaseq_data.
        - 'single_col_matrix' : a dataframe must be provided, containing a cutoff
            for each gene in only one column. These cutoffs will be applied to
            all samples.
        - 'constant_value' : binarizes the expression. Evaluates whether
            expression is greater than the value input in the parameter.

    analysis_setup : dict
        Contains main setup for running the cell-cell interactions and communication
        analyses.
        Three main setups are needed (passed as keys):

        - 'communication_score' : is the type of communication score used to detect
            active ligand-receptor pairs between each pair of cell.
            It can be:

            - 'expression_thresholding'
            - 'expression_product'
            - 'expression_mean'
            - 'expression_gmean'

        - 'cci_score' : is the scoring function to aggregate the communication
            scores.
            It can be:

            - 'bray_curtis'
            - 'jaccard'
            - 'count'
            - 'icellnet'

        - 'cci_type' : is the type of interaction between two cells. If it is
            undirected, all ligands and receptors are considered from both cells.
            If it is directed, ligands from one cell and receptors from the other
            are considered separately with respect to ligands from the second
            cell and receptor from the first one.
            So, it can be:

            - 'undirected'
            - 'directed'

    excluded_cells : list, default=None
        List of cells in the rnaseq_data to be excluded. If None, all cells
        are considered.

    complex_sep : str, default=None
        Symbol that separates the protein subunits in a multimeric complex.
        For example, '&' is the complex_sep for a list of ligand-receptor pairs
        where a protein partner could be "CD74&CD44".

    complex_agg_method : str, default='min'
        Method to aggregate the expression value of multiple genes in a
        complex.

        - 'min' : Minimum expression value among all genes.
        - 'mean' : Average expression value among all genes.

    interaction_columns : tuple, default=('A', 'B')
        Contains the names of the columns where to find the partners in a
        dataframe of protein-protein interactions. If the list is for
        ligand-receptor pairs, the first column is for the ligands and the second
        for the receptors.

    verbose : boolean, default=True
        Whether printing or not steps of the analysis.

    Returns
    -------
    interaction_space : cell2cell.core.interaction_space.InteractionSpace
        Interaction space that contains all the required elements to perform the
        cell-cell interaction and communication analysis between every pair of cells.
        After performing the analyses, the results are stored in this object.
    '''
    if excluded_cells is None:
        excluded_cells = []

    included_cells = sorted(list((set(rnaseq_data.columns) - set(excluded_cells))))

    interaction_space = ispace.InteractionSpace(rnaseq_data=rnaseq_data[included_cells],
                                                ppi_data=ppi_data,
                                                gene_cutoffs=cutoff_setup,
                                                communication_score=analysis_setup['communication_score'],
                                                cci_score=analysis_setup['cci_score'],
                                                cci_type=analysis_setup['cci_type'],
                                                complex_sep=complex_sep,
                                                complex_agg_method=complex_agg_method,
                                                interaction_columns=interaction_columns,
                                                verbose=verbose)
    return interaction_space
```
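The roles of `complex_sep` and `complex_agg_method` can be sketched as follows. The helper `complex_expression` and the expression values are hypothetical and only mimic the documented behavior ('min' or 'mean' over the subunits of a complex such as "CD74&CD44"). They are not the library's internal routine.

```python
def complex_expression(partner, gene_expression, complex_sep='&', agg_method='min'):
    """Aggregate the expression of all subunits of a (possibly multimeric) partner.

    Illustrative helper only; not part of the cell2cell API.
    """
    subunits = partner.split(complex_sep) if complex_sep else [partner]
    values = [gene_expression[gene] for gene in subunits]
    if agg_method == 'min':
        # The least expressed subunit limits the complex
        return min(values)
    if agg_method == 'mean':
        return sum(values) / len(values)
    raise ValueError("agg_method must be 'min' or 'mean'")


# Toy expression values for one cell type
gene_expression = {'CD74': 8.0, 'CD44': 2.0, 'MIF': 5.0}

receptor_min = complex_expression('CD74&CD44', gene_expression, agg_method='min')
receptor_mean = complex_expression('CD74&CD44', gene_expression, agg_method='mean')
ligand = complex_expression('MIF', gene_expression)
```

Using 'min' is the more conservative choice, since a complex is only scored as highly expressed when every subunit is.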

###
`tensor_downstream`

####
`compute_gini_coefficients(result, sender_label='Sender Cells', receiver_label='Receiver Cells')`

Computes the Gini coefficient of the distribution of edge weights in each factor-specific cell-cell communication network. Factors are obtained from the tensor decomposition with Tensor-cell2cell.

###### Parameters

result : any Tensor class in cell2cell.tensor.tensor or a dict Either a Tensor type or a dictionary which resulted from the tensor decomposition. If it is a dict, it should be the one in, for example, InteractionTensor.factors

sender_label : str Label for the dimension of sender cells. Usually found in InteractionTensor.order_labels

receiver_label : str Label for the dimension of receiver cells. Usually found in InteractionTensor.order_labels

###### Returns

gini_df : pandas.DataFrame Dataframe containing the Gini coefficient of each factor from a tensor decomposition. Calculated on the factor-specific cell-cell communication networks.

## Source code in `cell2cell/analysis/tensor_downstream.py`

```
def compute_gini_coefficients(result, sender_label='Sender Cells', receiver_label='Receiver Cells'):
    '''
    Computes Gini coefficient on the distribution of edge weights
    in each factor-specific cell-cell communication network. Factors
    obtained from the tensor decomposition with Tensor-cell2cell.

    Parameters
    ----------
    result : any Tensor class in cell2cell.tensor.tensor or a dict
        Either a Tensor type or a dictionary which resulted from the tensor
        decomposition. If it is a dict, it should be the one in, for example,
        InteractionTensor.factors

    sender_label : str
        Label for the dimension of sender cells. Usually found in
        InteractionTensor.order_labels

    receiver_label : str
        Label for the dimension of receiver cells. Usually found in
        InteractionTensor.order_labels

    Returns
    -------
    gini_df : pandas.DataFrame
        Dataframe containing the Gini coefficient of each factor from
        a tensor decomposition. Calculated on the factor-specific
        cell-cell communication networks.
    '''
    if hasattr(result, 'factors'):
        result = result.factors
        if result is None:
            raise ValueError('A tensor factorization must be run on the tensor before calling this function.')
    elif isinstance(result, dict):
        pass
    else:
        raise ValueError('result is not of a valid type. It must be an InteractionTensor or a dict.')

    factors = sorted(list(set(result[sender_label].columns) & set(result[receiver_label].columns)))

    ginis = []
    for f in factors:
        factor_net = get_joint_loadings(result=result,
                                        dim1=sender_label,
                                        dim2=receiver_label,
                                        factor=f
                                        )

        gini = gini_coefficient(factor_net.values.flatten())
        ginis.append((f, gini))
    gini_df = pd.DataFrame.from_records(ginis, columns=['Factor', 'Gini'])
    return gini_df
```
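As an intuition for what the Gini coefficient measures here, the sketch below builds two toy factor-specific networks as outer products of sender and receiver loadings and applies a textbook mean-absolute-difference Gini formula. The `gini` and `outer` helpers are assumptions for illustration. The `gini_coefficient` function used by cell2cell may use a different but related estimator, so exact values can differ.

```python
def gini(values):
    """Textbook Gini coefficient: mean absolute difference over twice the mean."""
    values = list(values)
    n = len(values)
    total = sum(values)
    if total == 0:
        return 0.0
    diff_sum = sum(abs(a - b) for a in values for b in values)
    # 2 * n**2 * mean equals 2 * n * total
    return diff_sum / (2 * n * total)


def outer(vec1, vec2):
    """Flattened outer product of two loading vectors (edge weights of a network)."""
    return [a * b for a in vec1 for b in vec2]


# Communication spread evenly across all sender-receiver pairs
even_network = outer([0.5, 0.5], [0.5, 0.5])
# Communication concentrated in a single sender-receiver pair
focused_network = outer([1.0, 0.0], [1.0, 0.0])

gini_even = gini(even_network)
gini_focused = gini(focused_network)
```

A value near 0 suggests that a factor involves all cell pairs to a similar degree, while a value closer to 1 suggests that a few cell pairs dominate the factor.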

####
`flatten_factor_ccc_networks(networks, orderby='senders')`

Flattens all adjacency matrices in the factor-specific cell-cell communication networks. It generates a matrix where rows are cell-cell pairs and columns are factors.

###### Parameters

networks : dict A dictionary containing a pandas.DataFrame for each of the factors (factor names are the keys of the dict). These dataframes are the adjacency matrices of the CCC networks.

orderby : str Order of the flattened cell-cell pairs. Options are 'senders' and 'receivers'. 'senders' flattens the matrices so that all cell-cell pairs with the same sender cell are placed next to each other. 'receivers' does the same, but groups pairs by receiver cell instead.

###### Returns

flatten_networks : pandas.DataFrame A dataframe wherein each column contains a factor-specific network. Rows are the directed cell-cell pairs.

## Source code in `cell2cell/analysis/tensor_downstream.py`

```
def flatten_factor_ccc_networks(networks, orderby='senders'):
    '''
    Flattens all adjacency matrices in the factor-specific
    cell-cell communication networks. It generates a matrix
    where rows are cell-cell pairs and columns are factors.

    Parameters
    ----------
    networks : dict
        A dictionary containing a pandas.DataFrame for each of the factors
        (factor names are the keys of the dict). These dataframes are the
        adjacency matrices of the CCC networks.

    orderby : str
        Order of the flatten cell-cell pairs. Options are 'senders' and
        'receivers'. 'senders' means to flatten the matrices in a way that
        all cell-cell pairs with a same sender cell are put next to each other.
        'receivers' means the same, but by considering the receiver cell instead.

    Returns
    -------
    flatten_networks : pandas.DataFrame
        A dataframe wherein each column contains a factor-specific network.
        Rows are the directed cell-cell pairs.
    '''
    senders = sorted(set.intersection(*[set(v.index) for v in networks.values()]))
    receivers = sorted(set.intersection(*[set(v.columns) for v in networks.values()]))

    if orderby == 'senders':
        cell_pairs = [s + ' --> ' + r for s in senders for r in receivers]
        flatten_order = 'C'
    elif orderby == 'receivers':
        cell_pairs = [s + ' --> ' + r for r in receivers for s in senders]
        flatten_order = 'F'
    else:
        raise ValueError("`orderby` must be either 'senders' or 'receivers'.")

    data = np.asarray([v.values.flatten(flatten_order) for v in networks.values()]).T
    flatten_networks = pd.DataFrame(data=data,
                                    index=cell_pairs,
                                    columns=list(networks.keys())
                                    )
    return flatten_networks
```
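The effect of `orderby` can be seen on a tiny adjacency matrix. This plain-Python sketch reproduces the row-major ('C') and column-major ('F') orderings used in the source above without NumPy. The cell names and weights are made up for illustration.

```python
senders = ['A', 'B']
receivers = ['A', 'B', 'C']

# adjacency[i][j] is the edge weight from senders[i] to receivers[j]
adjacency = [[1, 2, 3],
             [4, 5, 6]]

# orderby='senders': row-major ('C') flattening, pairs grouped by sender
by_senders = {s + ' --> ' + r: adjacency[i][j]
              for i, s in enumerate(senders)
              for j, r in enumerate(receivers)}

# orderby='receivers': column-major ('F') flattening, pairs grouped by receiver
by_receivers = {s + ' --> ' + r: adjacency[i][j]
                for j, r in enumerate(receivers)
                for i, s in enumerate(senders)}
```

Both orderings contain the same label-to-weight mapping. Only the order of the cell-cell pairs changes, which matters when the flattened networks are plotted or compared side by side.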

####
`get_factor_specific_ccc_networks(result, sender_label='Sender Cells', receiver_label='Receiver Cells')`

Generates adjacency matrices for each of the factors obtained from a tensor decomposition. These matrices represent a cell-cell communication directed network.

###### Parameters

result : any Tensor class in cell2cell.tensor.tensor or a dict Either a Tensor type or a dictionary which resulted from the tensor decomposition. If it is a dict, it should be the one in, for example, InteractionTensor.factors

sender_label : str Label for the dimension of sender cells. Usually found in InteractionTensor.order_labels

receiver_label : str Label for the dimension of receiver cells. Usually found in InteractionTensor.order_labels

###### Returns

networks : dict A dictionary containing a pandas.DataFrame for each of the factors (factor names are the keys of the dict). These dataframes are the adjacency matrices of the CCC networks.

## Source code in `cell2cell/analysis/tensor_downstream.py`

```
def get_factor_specific_ccc_networks(result, sender_label='Sender Cells', receiver_label='Receiver Cells'):
    '''
    Generates adjacency matrices for each of the factors
    obtained from a tensor decomposition. These matrices represent a
    cell-cell communication directed network.

    Parameters
    ----------
    result : any Tensor class in cell2cell.tensor.tensor or a dict
        Either a Tensor type or a dictionary which resulted from the tensor
        decomposition. If it is a dict, it should be the one in, for example,
        InteractionTensor.factors

    sender_label : str
        Label for the dimension of sender cells. Usually found in
        InteractionTensor.order_labels

    receiver_label : str
        Label for the dimension of receiver cells. Usually found in
        InteractionTensor.order_labels

    Returns
    -------
    networks : dict
        A dictionary containing a pandas.DataFrame for each of the factors
        (factor names are the keys of the dict). These dataframes are the
        adjacency matrices of the CCC networks.
    '''
    if hasattr(result, 'factors'):
        result = result.factors
        if result is None:
            raise ValueError('A tensor factorization must be run on the tensor before calling this function.')
    elif isinstance(result, dict):
        pass
    else:
        raise ValueError('result is not of a valid type. It must be an InteractionTensor or a dict.')

    factors = sorted(list(set(result[sender_label].columns) & set(result[receiver_label].columns)))

    networks = dict()
    for f in factors:
        networks[f] = get_joint_loadings(result=result,
                                         dim1=sender_label,
                                         dim2=receiver_label,
                                         factor=f
                                         )
    return networks
```

####
`get_joint_loadings(result, dim1, dim2, factor)`

Creates the joint loading distribution between two tensor dimensions for a given factor output from decomposition.

###### Parameters

result : any Tensor class in cell2cell.tensor.tensor or a dict Either a Tensor type or a dictionary which resulted from the tensor decomposition. If it is a dict, it should be the one in, for example, InteractionTensor.factors

dim1 : str One of the tensor dimensions (options are in the keys of the dict, or interaction.factors.keys())

dim2 : str A second tensor dimension (options are in the keys of the dict, or interaction.factors.keys())

factor : str One of the factors output from the decomposition (e.g. 'Factor 1').

###### Returns

joint_dist : pandas.DataFrame Joint distribution of factor loadings for the specified dimensions. Rows correspond to elements in dim1 and columns to elements in dim2.

## Source code in `cell2cell/analysis/tensor_downstream.py`

```
def get_joint_loadings(result, dim1, dim2, factor):
    """
    Creates the joint loading distribution between two tensor dimensions for a
    given factor output from decomposition.

    Parameters
    ----------
    result : any Tensor class in cell2cell.tensor.tensor or a dict
        Either a Tensor type or a dictionary which resulted from the tensor
        decomposition. If it is a dict, it should be the one in, for example,
        InteractionTensor.factors

    dim1 : str
        One of the tensor dimensions (options are in the keys of the dict,
        or interaction.factors.keys())

    dim2 : str
        A second tensor dimension (options are in the keys of the dict,
        or interaction.factors.keys())

    factor: str
        One of the factors output from the decomposition (e.g. 'Factor 1').

    Returns
    -------
    joint_dist : pandas.DataFrame
        Joint distribution of factor loadings for the specified dimensions.
        Rows correspond to elements in dim1 and columns to elements in dim2.
    """
    if hasattr(result, 'factors'):
        result = result.factors
        if result is None:
            raise ValueError('A tensor factorization must be run on the tensor before calling this function.')
    elif isinstance(result, dict):
        pass
    else:
        raise ValueError('result is not of a valid type. It must be an InteractionTensor or a dict.')

    assert dim1 in result.keys(), 'The specified dimension ' + dim1 + ' is not present in the `result` input'
    assert dim2 in result.keys(), 'The specified dimension ' + dim2 + ' is not present in the `result` input'

    vec1 = result[dim1][factor]
    vec2 = result[dim2][factor]

    # Calculate the outer product
    joint_dist = pd.DataFrame(data=np.outer(vec1, vec2),
                              index=vec1.index,
                              columns=vec2.index)

    joint_dist.index.name = dim1
    joint_dist.columns.name = dim2
    return joint_dist
```
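The joint loading distribution is an outer product of two loading vectors for the same factor. A minimal sketch with nested dictionaries in place of a pandas DataFrame, using made-up loadings for one factor:

```python
# Loadings of one factor in two tensor dimensions (toy values)
sender_loadings = {'T': 0.8, 'B': 0.2}
receiver_loadings = {'T': 0.5, 'B': 0.25, 'NK': 0.25}

# Outer product: rows are senders (dim1), columns are receivers (dim2)
joint = {sender: {receiver: s_load * r_load
                  for receiver, r_load in receiver_loadings.items()}
         for sender, s_load in sender_loadings.items()}

# Sum of all entries equals the product of the two loading sums
total = sum(value for row in joint.values() for value in row.values())
```

Each entry can be read as the weight of a directed sender-to-receiver edge in the factor-specific communication network.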

####
`get_lr_by_cell_pairs(result, lr_label, sender_label, receiver_label, order_cells_by='receivers', factor=None, cci_threshold=None, lr_threshold=None)`

Returns a dataframe containing the product loadings of a specific combination of ligand-receptor pair and sender-receiver pair.

###### Parameters

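result : any Tensor class in cell2cell.tensor.tensor or a dict Either a Tensor type or a dictionary which resulted from the tensor decomposition. If it is a dict, it should be the one in, for example, InteractionTensor.factors

lr_label : str Label for the dimension of the ligand-receptor pairs. Usually found in InteractionTensor.order_labels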

sender_label : str Label for the dimension of sender cells. Usually found in InteractionTensor.order_labels

receiver_label : str Label for the dimension of receiver cells. Usually found in InteractionTensor.order_labels

order_cells_by : str, default='receivers' Order of the returned dataframe. Options are 'senders' and 'receivers'. 'senders' orders the dataframe so that all cell-cell pairs with the same sender cell are placed next to each other. 'receivers' does the same, but groups pairs by receiver cell instead.

factor : str, default=None Name of the factor to be used to compute the product loadings. If None, all factors will be included to compute them.

cci_threshold : float, default=None Threshold to be applied on the product loadings of the sender-receiver pairs. If specified, only cell-cell pairs with a product loading above the threshold in at least one of the included factors are kept in the returned dataframe.

lr_threshold : float, default=None Threshold to be applied on the ligand-receptor loadings. If specified, only LR pairs with a loading above the threshold in at least one of the included factors are kept in the returned dataframe.

###### Returns

cci_lr : pandas.DataFrame Dataframe containing the product loadings of a specific combination of ligand-receptor pair and sender-receiver pair. If the factor is specified, the returned dataframe will contain the product loadings of that factor. If the factor is not specified, the returned dataframe will contain the product loadings across all factors.

## Source code in `cell2cell/analysis/tensor_downstream.py`

```
def get_lr_by_cell_pairs(result, lr_label, sender_label, receiver_label, order_cells_by='receivers', factor=None,
                         cci_threshold=None, lr_threshold=None):
    '''
    Returns a dataframe containing the product loadings of a specific combination
    of ligand-receptor pair and sender-receiver pair.

    Parameters
    ----------
    result : any Tensor class in cell2cell.tensor.tensor or a dict
        Either a Tensor type or a dictionary which resulted from the tensor
        decomposition. If it is a dict, it should be the one in, for example,
        InteractionTensor.factors

    lr_label : str
        Label for the dimension of the ligand-receptor pairs. Usually found in
        InteractionTensor.order_labels

    sender_label : str
        Label for the dimension of sender cells. Usually found in
        InteractionTensor.order_labels

    receiver_label : str
        Label for the dimension of receiver cells. Usually found in
        InteractionTensor.order_labels

    order_cells_by : str, default='receivers'
        Order of the returned dataframe. Options are 'senders' and
        'receivers'. 'senders' means to order the dataframe in a way that
        all cell-cell pairs with a same sender cell are put next to each others.
        'receivers' means the same, but by considering the receiver cell instead.

    factor : str, default=None
        Name of the factor to be used to compute the product loadings.
        If None, all factors will be included to compute them.

    cci_threshold : float, default=None
        Threshold to be applied on the product loadings of the sender-cell pairs.
        If specified, only cell-cell pairs with a product loading above the
        threshold at least in one of the factors included will be included
        in the returned dataframe.

    lr_threshold : float, default=None
        Threshold to be applied on the ligand-receptor loadings.
        If specified, only LR pairs with a loading above the
        threshold at least in one of the factors included will be included
        in the returned dataframe.

    Returns
    -------
    cci_lr : pandas.DataFrame
        Dataframe containing the product loadings of a specific combination
        of ligand-receptor pair and sender-receiver pair. If the factor is specified,
        the returned dataframe will contain the product loadings of that factor.
        If the factor is not specified, the returned dataframe will contain the
        product loadings across all factors.
    '''
    if hasattr(result, 'factors'):
        result = result.factors
        if result is None:
            raise ValueError('A tensor factorization must be run on the tensor before calling this function.')
    elif isinstance(result, dict):
        pass
    else:
        raise ValueError('result is not of a valid type. It must be an InteractionTensor or a dict.')

    assert lr_label in result.keys(), 'The specified dimension ' + lr_label + ' is not present in the `result` input'
    assert sender_label in result.keys(), 'The specified dimension ' + sender_label + ' is not present in the `result` input'
    assert receiver_label in result.keys(), 'The specified dimension ' + receiver_label + ' is not present in the `result` input'

    # Sort factors
    sorted_factors = sorted(result[lr_label].columns, key=lambda x: int(x.split(' ')[1]))

    # Get CCI network per factor
    networks = get_factor_specific_ccc_networks(result=result,
                                                sender_label=sender_label,
                                                receiver_label=receiver_label)

    # Flatten networks
    network_by_factors = flatten_factor_ccc_networks(networks=networks, orderby=order_cells_by)

    # Get final dataframe
    df1 = network_by_factors[sorted_factors]
    df2 = result[lr_label][sorted_factors]

    if factor is not None:
        df1 = df1[factor]
        df2 = df2[factor]
        if cci_threshold is not None:
            df1 = df1[(df1 > cci_threshold)]
        if lr_threshold is not None:
            df2 = df2[(df2 > lr_threshold)]
        data = pd.DataFrame(np.outer(df1, df2), index=df1.index, columns=df2.index)
    else:
        if cci_threshold is not None:
            df1 = df1[(df1.T > cci_threshold).any()]  # Top sender-receiver pairs
        if lr_threshold is not None:
            df2 = df2[(df2.T > lr_threshold).any()]  # Top LR Pairs
        data = np.matmul(df1, df2.T)

    cci_lr = pd.DataFrame(data.T.values,
                          columns=df1.index,
                          index=df2.index
                          )

    cci_lr.columns.name = 'Sender-Receiver Pair'
    cci_lr.index.name = 'Ligand-Receptor Pair'
    return cci_lr
```
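The product loadings returned by this function amount to multiplying, factor by factor, the loading of a sender-receiver pair with the loading of a ligand-receptor pair, and summing over the included factors. The following plain-Python sketch uses hypothetical loadings and a hypothetical helper `product_loadings` that omits the thresholding steps:

```python
# Flattened cell-pair loadings per factor (toy values)
cell_pair_loadings = {
    'T --> B': {'Factor 1': 0.4, 'Factor 2': 0.0},
    'B --> T': {'Factor 1': 0.1, 'Factor 2': 0.5},
}
# Ligand-receptor loadings per factor (toy values)
lr_loadings = {
    'L1^R1': {'Factor 1': 0.5, 'Factor 2': 0.0},
    'L2^R2': {'Factor 1': 0.0, 'Factor 2': 1.0},
}


def product_loadings(cell_pairs, lr_pairs, factor=None):
    """Rows are LR pairs, columns are sender-receiver pairs (illustrative helper)."""
    if factor is not None:
        factors = [factor]
    else:
        factors = sorted(next(iter(lr_pairs.values())))
    return {lr: {pair: sum(pair_load[f] * lr_load[f] for f in factors)
                 for pair, pair_load in cell_pairs.items()}
            for lr, lr_load in lr_pairs.items()}


all_factors = product_loadings(cell_pair_loadings, lr_loadings)
factor_1 = product_loadings(cell_pair_loadings, lr_loadings, factor='Factor 1')
```

With all factors, `L2^R2` is associated with `B --> T` through Factor 2. Restricting to Factor 1 removes that contribution, which illustrates why the `factor` argument changes the result.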

###
`tensor_pipelines`

####
`run_tensor_cell2cell_pipeline(interaction_tensor, tensor_metadata, copy_tensor=False, rank=None, tf_optimization='regular', random_state=None, backend=None, device=None, elbow_metric='error', smooth_elbow=False, upper_rank=25, tf_init='random', tf_svd='numpy_svd', cmaps=None, sample_col='Element', group_col='Category', fig_fontsize=14, output_folder=None, output_fig=True, fig_format='pdf', **kwargs)`

Runs basic pipeline of Tensor-cell2cell (excluding downstream analyses).

###### Parameters

interaction_tensor : cell2cell.tensor.BaseTensor A communication tensor generated with any of the tensor classes in cell2cell.tensor.

tensor_metadata : list List of pandas dataframes with metadata information for the elements of each dimension in the tensor. A column named as given in `sample_col` contains the name of each element in the tensor, while another column named as given in `group_col` contains the metadata or grouping information of each element.

copy_tensor : boolean, default=False Whether generating a copy of the original tensor to avoid modifying it.

rank : int, default=None Rank of the Tensor Factorization (number of factors to deconvolve the original tensor). If None, it will be automatically inferred from an elbow analysis.

tf_optimization : str, default='regular' Defines whether to run a more thorough optimization (more iterations, more independent factorization runs, and a lower tolerance) or a faster one (fewer iterations, fewer factorization runs, and a higher tolerance). Options are:

```
- 'regular' : It uses 100 max iterations, 1 factorization run, and 1e-7 tolerance.
Faster to run.
- 'robust' : It uses 500 max iterations, 100 factorization runs, and 1e-8 tolerance.
Slower to run.
```
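The two presets can be read as a lookup table. The sketch below mirrors the values hard-coded in the pipeline source listed in this section; the names `TF_PRESETS` and `resolve_tf_optimization` are hypothetical and are not part of the cell2cell API.

```python
# Hypothetical helper, not part of cell2cell: it only restates the preset
# values that appear in the source of run_tensor_cell2cell_pipeline.
TF_PRESETS = {
    'regular': {'elbow_runs': 10, 'tf_runs': 1, 'tol': 1e-7, 'n_iter_max': 100},
    'robust': {'elbow_runs': 20, 'tf_runs': 100, 'tol': 1e-8, 'n_iter_max': 500},
}


def resolve_tf_optimization(tf_optimization='regular'):
    """Return a copy of the factorization settings for a preset name."""
    if tf_optimization not in TF_PRESETS:
        raise ValueError("`tf_optimization` must be either 'robust' or 'regular'.")
    return dict(TF_PRESETS[tf_optimization])


settings = resolve_tf_optimization('robust')
print(settings['n_iter_max'], settings['tol'])
```

Note that the preset also changes the number of runs used in the elbow analysis, not only the final factorization.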

random_state : int, default=None Seed for randomization.

backend : str, default=None Backend that TensorLy will use to perform calculations on this tensor. When None, the currently active backend is used, which is usually 'numpy'. Options are: {'cupy', 'jax', 'mxnet', 'numpy', 'pytorch', 'tensorflow'}

device : str, default=None Device to use when the backend allows multiple devices. Options are: {'cpu', 'cuda:0', None}

elbow_metric : str, default='error' Metric to perform the elbow analysis (y-axis).

```
- 'error' : Normalized error to compute the elbow.
- 'similarity' : Similarity based on CorrIndex (1-CorrIndex).
```

smooth_elbow : boolean, default=False Whether to smooth the elbow-analysis curve with a Savitzky-Golay filter.

upper_rank : int, default=25 Upper bound of ranks to explore with the elbow analysis.

tf_init : str, default='random' Initialization method for computing the Tensor Factorization. Options are: {'svd', 'random'}

tf_svd : str, default='numpy_svd' Function to compute the SVD for initializing the Tensor Factorization. Acceptable values are those in tensorly.SVD_FUNS.

cmaps : list, default=None A list of colormaps used for coloring elements in each dimension. The length of this list must equal the number of dimensions of the tensor. If None, default colormaps are selected according to the number of dimensions of the tensor (up to 5).
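In the pipeline source listed in this section, the default colormaps used when `cmaps` is None depend on the number of tensor dimensions. A minimal sketch of that selection follows; the function name `default_cmaps` is hypothetical.

```python
def default_cmaps(dim):
    """Pick default colormap names for a tensor with `dim` dimensions.

    Hypothetical helper restating the selection logic in the pipeline source.
    """
    cmap_5d = ['tab10', 'viridis', 'Dark2_r', 'tab20', 'tab20']
    cmap_4d = ['plasma', 'Dark2_r', 'tab20', 'tab20']
    if dim == 5:
        return cmap_5d
    if dim <= 4:
        # Keep the last `dim` entries, so a 3D tensor drops 'plasma'.
        return cmap_4d[-dim:]
    raise ValueError('Tensors with more than 5 dimensions are not supported')


print(default_cmaps(3))
```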

sample_col : str, default='Element' Name of the column containing the element names in the metadata.

group_col : str, default='Category' Name of the column containing the metadata or grouping information for each element in the metadata.

fig_fontsize : int, default=14 Font size of the tick labels. Axis labels will be 1.2 times the fontsize.

output_folder : str, default=None Path to the folder where the figures generated will be saved. If None, figures will not be saved.

output_fig : boolean, default=True Whether to generate the figures with matplotlib.

fig_format : str, default='pdf' Format to store figures when an `output_folder` is specified and `output_fig` is True. Otherwise, this is not necessary.
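As a sketch of how `output_folder` and `fig_format` combine, the snippet below rebuilds the three output paths the same way the pipeline source in this section does. The function name `output_filenames` and the folder name are hypothetical.

```python
def output_filenames(output_folder, fig_format='pdf'):
    """Return (elbow, factorization, loadings) paths, or Nones if no folder is given."""
    if output_folder is None:
        return None, None, None
    elbow = output_folder + '/Elbow.{}'.format(fig_format)
    factorization = output_folder + '/Tensor-Factorization.{}'.format(fig_format)
    # The loadings are always exported as an Excel file, whatever `fig_format` is.
    loadings = output_folder + '/Loadings.xlsx'
    return elbow, factorization, loadings


print(output_filenames('results', fig_format='svg'))
```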

**kwargs : dict Extra arguments for the tensor factorization according to inputs in tensorly.

###### Returns

interaction_tensor : cell2cell.tensor.tensor.BaseTensor Either the original input `interaction_tensor` or a copy of it. This also stores the results from running the Tensor-cell2cell pipeline in the corresponding attributes.

## Source code in `cell2cell/analysis/tensor_pipelines.py`

```
def run_tensor_cell2cell_pipeline(interaction_tensor, tensor_metadata, copy_tensor=False, rank=None,
tf_optimization='regular', random_state=None, backend=None, device=None,
elbow_metric='error', smooth_elbow=False, upper_rank=25, tf_init='random',
tf_svd='numpy_svd', cmaps=None, sample_col='Element', group_col='Category',
fig_fontsize=14, output_folder=None, output_fig=True, fig_format='pdf', **kwargs):
'''
Runs basic pipeline of Tensor-cell2cell (excluding downstream analyses).
Parameters
----------
interaction_tensor : cell2cell.tensor.BaseTensor
A communication tensor generated with any of the tensor class in
cell2cell.tensor.
tensor_metadata : list
List of pandas dataframes with metadata information for elements of each
dimension in the tensor. A column called as the variable `sample_col` contains
the name of each element in the tensor while another column called as the
variable `group_col` contains the metadata or grouping information of each
element.
copy_tensor : boolean, default=False
Whether generating a copy of the original tensor to avoid modifying it.
rank : int, default=None
Rank of the Tensor Factorization (number of factors to deconvolve the original
tensor). If None, it will be automatically inferred from an elbow analysis.
tf_optimization : str, default='regular'
It defines whether performing an optimization with higher number of iterations,
independent factorization runs, and higher resolution (lower tolerance),
or with lower number of iterations, factorization runs, and resolution.
Options are:
- 'regular' : It uses 100 max iterations, 1 factorization run, and 1e-7 tolerance.
Faster to run.
- 'robust' : It uses 500 max iterations, 100 factorization runs, and 1e-8 tolerance.
Slower to run.
random_state : int, default=None
Seed for randomization.
backend : str, default=None
Backend that TensorLy will use to perform calculations
on this tensor. When None, the default backend used is
the currently active backend, usually is ('numpy'). Options are:
{'cupy', 'jax', 'mxnet', 'numpy', 'pytorch', 'tensorflow'}
device : str, default=None
Device to use when backend allows multiple devices. Options are:
{'cpu', 'cuda:0', None}
elbow_metric : str, default='error'
Metric to perform the elbow analysis (y-axis).
- 'error' : Normalized error to compute the elbow.
- 'similarity' : Similarity based on CorrIndex (1-CorrIndex).
smooth_elbow : boolean, default=False
Whether smoothing the elbow-analysis curve with a Savitzky-Golay filter.
upper_rank : int, default=25
Upper bound of ranks to explore with the elbow analysis.
tf_init : str, default='random'
Initialization method for computing the Tensor Factorization.
{'svd', 'random'}
tf_svd : str, default='numpy_svd'
Function to compute the SVD for initializing the Tensor Factorization,
acceptable values in tensorly.SVD_FUNS
cmaps : list, default=None
A list of colormaps used for coloring elements in each dimension. The length
of this list must equal the number of dimensions of the tensor. If None,
default colormaps are selected according to the number of dimensions.
sample_col : str, default='Element'
Name of the column containing the element names in the metadata.
group_col : str, default='Category'
Name of the column containing the metadata or grouping information for each
element in the metadata.
fig_fontsize : int, default=14
Font size of the tick labels. Axis labels will be 1.2 times the fontsize.
output_folder : str, default=None
Path to the folder where the figures generated will be saved.
If None, figures will not be saved.
output_fig : boolean, default=True
Whether generating the figures with matplotlib.
fig_format : str, default='pdf'
Format to store figures when an `output_folder` is specified
and `output_fig` is True. Otherwise, this is not necessary.
**kwargs : dict
Extra arguments for the tensor factorization according to inputs in
tensorly.
Returns
-------
interaction_tensor : cell2cell.tensor.tensor.BaseTensor
Either the original input `interaction_tensor` or a copy of it.
This also stores the results from running the Tensor-cell2cell
pipeline in the corresponding attributes.
'''
if copy_tensor:
interaction_tensor = interaction_tensor.copy()
dim = len(interaction_tensor.tensor.shape)
### OUTPUT FILENAMES ###
if output_folder is None:
elbow_filename = None
tf_filename = None
loading_filename = None
else:
elbow_filename = output_folder + '/Elbow.{}'.format(fig_format)
tf_filename = output_folder + '/Tensor-Factorization.{}'.format(fig_format)
loading_filename = output_folder + '/Loadings.xlsx'
### PALETTE COLORS FOR ELEMENTS IN TENSOR DIMS ###
if cmaps is None:
cmap_5d = ['tab10', 'viridis', 'Dark2_r', 'tab20', 'tab20']
cmap_4d = ['plasma', 'Dark2_r', 'tab20', 'tab20']
if dim == 5:
cmaps = cmap_5d
elif dim <= 4:
cmaps = cmap_4d[-dim:]
else:
raise ValueError('Tensor of dimension higher than 5 is not supported')
assert len(cmaps) == dim, "`cmaps` must have the same length as the number of dimensions in the tensor."
### FACTORIZATION PARAMETERS ###
if tf_optimization == 'robust':
elbow_runs = 20
tf_runs = 100
tol = 1e-8
n_iter_max = 500
elif tf_optimization == 'regular':
elbow_runs = 10
tf_runs = 1
tol = 1e-7
n_iter_max = 100
else:
raise ValueError("`tf_optimization` must be either 'robust' or 'regular'.")
if backend is not None:
tl.set_backend(backend)
if device is not None:
interaction_tensor.to_device(device=device)
### ANALYSIS ###
# Elbow
if rank is None:
print('Running Elbow Analysis')
fig1, error = interaction_tensor.elbow_rank_selection(upper_rank=upper_rank,
runs=elbow_runs,
init=tf_init,
svd=tf_svd,
automatic_elbow=True,
metric=elbow_metric,
output_fig=output_fig,
smooth=smooth_elbow,
random_state=random_state,
fontsize=fig_fontsize,
filename=elbow_filename,
tol=tol, n_iter_max=n_iter_max,
**kwargs
)
rank = interaction_tensor.rank
# Factorization
print('Running Tensor Factorization')
interaction_tensor.compute_tensor_factorization(rank=rank,
init=tf_init,
svd=tf_svd,
random_state=random_state,
runs=tf_runs,
normalize_loadings=True,
tol=tol, n_iter_max=n_iter_max,
**kwargs
)
### EXPORT RESULTS ###
if output_folder is not None:
print('Generating Outputs')
interaction_tensor.export_factor_loadings(loading_filename)
if output_fig:
fig2, axes = tensor_factors_plot(interaction_tensor=interaction_tensor,
metadata=tensor_metadata,
sample_col=sample_col,
group_col=group_col,
meta_cmaps=cmaps,
fontsize=fig_fontsize,
filename=tf_filename
)
return interaction_tensor
```

## `clustering`


### `cluster_interactions`

#### `compute_distance(data_matrix, axis=0, metric='euclidean')`

Computes the pairwise distance between elements in a matrix of shape m x n. Uses the function scipy.spatial.distance.pdist

###### Parameters

data_matrix : pandas.DataFrame or ndarray An m x n matrix used to compute the distances.

axis : int, default=0 To decide on which elements to compute the distance. If axis=0, the distances will be between elements in the rows, while axis=1 will lead to distances between elements in the columns.

metric : str, default='euclidean' The distance metric to use. The distance function can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.

###### Returns

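D : ndarray A square, symmetric distance matrix, obtained by expanding the condensed output of scipy.spatial.distance.pdist with scipy.spatial.distance.squareform. Entry D[i, j] holds dist(u=X[i], v=X[j]). In the condensed form, for each i and j (where i < j < m, and m is the number of original observations), that distance is stored in entry m * i + j - ((i + 2) * (i + 1)) // 2.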
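To make the relation between the square matrix and the condensed indexing concrete, here is a dependency-free sketch using Euclidean distances. It does not call SciPy; the helper names `square_distances` and `condensed_index` are hypothetical.

```python
import math
from itertools import combinations


def square_distances(rows):
    """Build a symmetric m x m matrix of Euclidean distances between rows."""
    m = len(rows)
    D = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        D[i][j] = D[j][i] = math.dist(rows[i], rows[j])
    return D


def condensed_index(i, j, m):
    """Position of the pair (i, j), with i < j < m, in a condensed distance vector."""
    return m * i + j - ((i + 2) * (i + 1)) // 2


rows = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = square_distances(rows)
# Condensed form: the upper triangle read row by row.
condensed = [D[i][j] for i, j in combinations(range(len(rows)), 2)]
print(D[1][2], condensed[condensed_index(1, 2, len(rows))])
```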

## Source code in `cell2cell/clustering/cluster_interactions.py`

```
def compute_distance(data_matrix, axis=0, metric='euclidean'):
    '''Computes the pairwise distance between elements in a
    matrix of shape m x n. Uses the function
    scipy.spatial.distance.pdist

    Parameters
    ----------
    data_matrix : pandas.DataFrame or ndarray
        An m x n matrix used to compute the distances

    axis : int, default=0
        To decide on which elements to compute the distance.
        If axis=0, the distances will be between elements in
        the rows, while axis=1 will lead to distances between
        elements in the columns.

    metric : str, default='euclidean'
        The distance metric to use. The distance function can be 'braycurtis',
        'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice',
        'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski',
        'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao',
        'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.

    Returns
    -------
    D : ndarray
        A square, symmetric distance matrix, obtained by expanding the
        condensed output of pdist with squareform. Entry D[i, j] holds
        dist(u=X[i], v=X[j]). In the condensed form, for each i and j
        (where i < j < m, and m is the number of original observations),
        that distance is stored in entry
        m * i + j - ((i + 2) * (i + 1)) // 2.
    '''
    if isinstance(data_matrix, pd.DataFrame):
        data = data_matrix.values
    else:
        data = data_matrix
    if axis == 0:
        # Distances between rows, expanded from condensed to square form
        D = sp.distance.squareform(sp.distance.pdist(data, metric=metric))
    elif axis == 1:
        # Distances between columns
        D = sp.distance.squareform(sp.distance.pdist(data.T, metric=metric))
    else:
        raise ValueError('Not valid axis. Use 0 or 1.')
    return D
```

#### `compute_linkage(distance_matrix, method='ward', optimal_ordering=True)`

Returns a linkage for a given distance matrix using a specific method.

###### Parameters

distance_matrix : numpy.ndarray A square array containing the distance between a given row and a given column. Diagonal elements must be zero.

method : str, 'ward' by default Method to compute the linkage. It could be:

```
- 'single'
- 'complete'
- 'average'
- 'weighted'
- 'centroid'
- 'median'
- 'ward'
For more details, go to:
https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.cluster.hierarchy.linkage.html
```

optimal_ordering : boolean, default=True Whether to sort the leaves of the dendrogram to have a minimal distance between successive leaves. For more information, see scipy.cluster.hierarchy.optimal_leaf_ordering

###### Returns

Z : numpy.ndarray The hierarchical clustering encoded as a linkage matrix.

## Source code in `cell2cell/clustering/cluster_interactions.py`

```
def compute_linkage(distance_matrix, method='ward', optimal_ordering=True):
    '''
    Returns a linkage for a given distance matrix using a specific method.

    Parameters
    ----------
    distance_matrix : numpy.ndarray
        A square array containing the distance between a given row and a
        given column. Diagonal elements must be zero.

    method : str, 'ward' by default
        Method to compute the linkage. It could be:

        - 'single'
        - 'complete'
        - 'average'
        - 'weighted'
        - 'centroid'
        - 'median'
        - 'ward'
        For more details, go to:
        https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.cluster.hierarchy.linkage.html

    optimal_ordering : boolean, default=True
        Whether to sort the leaves of the dendrogram to have a minimal distance
        between successive leaves. For more information, see
        scipy.cluster.hierarchy.optimal_leaf_ordering

    Returns
    -------
    Z : numpy.ndarray
        The hierarchical clustering encoded as a linkage matrix.
    '''
    if isinstance(distance_matrix, pd.DataFrame):
        # Copy so that zeroing the diagonal below does not alter the input
        data = distance_matrix.values.copy()
    else:
        data = distance_matrix.copy()
    if not (data.transpose() == data).all():
        raise ValueError('The matrix is not symmetric')

    np.fill_diagonal(data, 0.0)

    # Compute linkage
    D = sp.distance.squareform(data)
    Z = hc.linkage(D, method=method, optimal_ordering=optimal_ordering)
    return Z
```

#### `get_clusters_from_linkage(linkage, threshold, criterion='maxclust', labels=None)`

Gets clusters from a linkage given a threshold and a criterion.

###### Parameters

linkage : numpy.ndarray The hierarchical clustering encoded with the matrix returned by the linkage function (Z).

threshold : float The threshold to apply when forming flat clusters.

criterion : str, 'maxclust' by default The criterion to use in forming flat clusters. Depending on the criterion, the threshold has different meanings. More information on: https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.cluster.hierarchy.fcluster.html

labels : array-like, None by default List of labels of the elements contained in the linkage. The order must match the order they were provided when generating the linkage.

###### Returns

clusters : dict A dictionary containing the clusters obtained. The keys correspond to the cluster numbers and the values to a list with element names given the labels, or the element index based on the linkage.
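The shape of the returned dictionary can be illustrated without SciPy. The sketch below takes a hand-written list of flat cluster IDs, standing in for the output of scipy.cluster.hierarchy.fcluster, and groups elements the way the description above states; `group_by_cluster` is a hypothetical name.

```python
def group_by_cluster(cluster_ids, labels=None):
    """Map each cluster ID to the labels (or positions) of its members."""
    clusters = {}
    for i, c in enumerate(cluster_ids):
        member = labels[i] if labels is not None else i
        clusters.setdefault(c, []).append(member)
    return clusters


# Illustrative cluster IDs, one per element, in the order used to build the linkage.
cluster_ids = [1, 2, 1, 3]
print(group_by_cluster(cluster_ids, labels=['A', 'B', 'C', 'D']))
print(group_by_cluster(cluster_ids))
```

Because members are matched to cluster IDs by position, `labels` must follow the same order as the rows of the distance matrix used for the linkage.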

## Source code in `cell2cell/clustering/cluster_interactions.py`

```
def get_clusters_from_linkage(linkage, threshold, criterion='maxclust', labels=None):
    '''
    Gets clusters from a linkage given a threshold and a criterion.

    Parameters
    ----------
    linkage : numpy.ndarray
        The hierarchical clustering encoded with the matrix returned by
        the linkage function (Z).

    threshold : float
        The threshold to apply when forming flat clusters.

    criterion : str, 'maxclust' by default
        The criterion to use in forming flat clusters. Depending on the
        criterion, the threshold has different meanings. More information on:
        https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.cluster.hierarchy.fcluster.html

    labels : array-like, None by default
        List of labels of the elements contained in the linkage. The order
        must match the order they were provided when generating the linkage.

    Returns
    -------
    clusters : dict
        A dictionary containing the clusters obtained. The keys correspond to
        the cluster numbers and the values to a list with element names given the
        labels, or the element index based on the linkage.
    '''
    cluster_ids = hc.fcluster(linkage, threshold, criterion=criterion)
    clusters = dict()
    # One empty list per cluster number
    for c in np.unique(cluster_ids):
        clusters[c] = []

    # Assign each element (by label or by index) to its cluster
    for i, c in enumerate(cluster_ids):
        if labels is not None:
            clusters[c].append(labels[i])
        else:
            clusters[c].append(i)
    return clusters
```

## `core`


### `cci_scores`

#### `compute_braycurtis_like_cci_score(cell1, cell2, ppi_score=None)`

Calculates a Bray-Curtis-like score for the interaction between two cells based on their intercellular protein-protein interactions such as ligand-receptor interactions.

###### Parameters

cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the sender.

cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the receiver.

###### Returns

cci_score : float Overall score for the interaction between a pair of cell-types/tissues/samples. In this case, it is a Bray-Curtis-like score.
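As a worked illustration of the score, the sketch below restates the formula from the source listed below, 2 * sum(c1 * c2 * w) / (sum(c1^2 * w) + sum(c2^2 * w)), in plain Python. It assumes inputs without NaN values (the library code uses `np.nansum`), and the name `braycurtis_like` is hypothetical.

```python
def braycurtis_like(c1, c2, ppi_score=None):
    """Bray-Curtis-like score for two aligned vectors of PPI partner values.

    c1 holds the sender's values for the first partner of each PPI,
    c2 the receiver's values for the second partner.
    """
    if len(c1) == 0 or len(c2) == 0:
        return 0.0
    if ppi_score is None:
        ppi_score = [1.0] * len(c1)  # every PPI weighs the same
    numerator = 2 * sum(a * b * w for a, b, w in zip(c1, c2, ppi_score))
    denominator = (sum(a * a * w for a, w in zip(c1, ppi_score))
                   + sum(b * b * w for b, w in zip(c2, ppi_score)))
    return numerator / denominator if denominator else 0.0


sender = [1.0, 0.0, 1.0]
receiver = [1.0, 1.0, 0.0]
print(braycurtis_like(sender, receiver))
```

With binary inputs, the score is the number of PPIs active on both sides, doubled, over the total number of active partners in the two cells.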

## Source code in `cell2cell/core/cci_scores.py`

```
def compute_braycurtis_like_cci_score(cell1, cell2, ppi_score=None):
    '''Calculates a Bray-Curtis-like score for the interaction between
    two cells based on their intercellular protein-protein
    interactions such as ligand-receptor interactions.

    Parameters
    ----------
    cell1 : cell2cell.core.cell.Cell
        First cell-type/tissue/sample to compute interaction
        between a pair of them. In a directed interaction,
        this is the sender.

    cell2 : cell2cell.core.cell.Cell
        Second cell-type/tissue/sample to compute interaction
        between a pair of them. In a directed interaction,
        this is the receiver.

    Returns
    -------
    cci_score : float
        Overall score for the interaction between a pair of
        cell-types/tissues/samples. In this case, it is a
        Bray-Curtis-like score.
    '''
    c1 = cell1.weighted_ppi['A'].values
    c2 = cell2.weighted_ppi['B'].values
    if (len(c1) == 0) or (len(c2) == 0):
        return 0.0

    if ppi_score is None:
        ppi_score = np.array([1.0] * len(c1))

    # Bray Curtis similarity
    numerator = 2 * np.nansum(c1 * c2 * ppi_score)
    denominator = np.nansum(c1 * c1 * ppi_score) + np.nansum(c2 * c2 * ppi_score)

    if denominator == 0.0:
        return 0.0

    cci_score = numerator / denominator

    # `is np.nan` is an identity test and misses computed NaNs; use np.isnan
    if np.isnan(cci_score):
        return 0.0
    return cci_score
```

#### `compute_count_score(cell1, cell2, ppi_score=None)`

Calculates the number of active protein-protein interactions for the interaction between two cells, which could be the number of active ligand-receptor interactions.

###### Parameters

cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the sender.

cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the receiver.

###### Returns

cci_score : float Overall score for the interaction between a pair of cell-types/tissues/samples.
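A minimal sketch of the counting rule, assuming inputs without NaN values: a PPI is considered active when the product of the sender value, the receiver value and the PPI weight is different from zero. The name `count_active` is hypothetical.

```python
def count_active(c1, c2, ppi_score=None):
    """Count PPIs whose weighted product of partner values is non-zero."""
    if ppi_score is None:
        ppi_score = [1.0] * len(c1)
    return sum(1 for a, b, w in zip(c1, c2, ppi_score) if a * b * w != 0)


sender = [1.0, 0.0, 2.0, 1.0]
receiver = [1.0, 1.0, 0.0, 3.0]
print(count_active(sender, receiver))
```

Only the first and last PPIs have both partners present, so they are the only ones counted.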

## Source code in `cell2cell/core/cci_scores.py`

```
def compute_count_score(cell1, cell2, ppi_score=None):
    '''Calculates the number of active protein-protein interactions
    for the interaction between two cells, which could be the number
    of active ligand-receptor interactions.

    Parameters
    ----------
    cell1 : cell2cell.core.cell.Cell
        First cell-type/tissue/sample to compute interaction
        between a pair of them. In a directed interaction,
        this is the sender.

    cell2 : cell2cell.core.cell.Cell
        Second cell-type/tissue/sample to compute interaction
        between a pair of them. In a directed interaction,
        this is the receiver.

    Returns
    -------
    cci_score : float
        Overall score for the interaction between a pair of
        cell-types/tissues/samples.
    '''
    c1 = cell1.weighted_ppi['A'].values
    c2 = cell2.weighted_ppi['B'].values

    if (len(c1) == 0) or (len(c2) == 0):
        return 0.0

    if ppi_score is None:
        ppi_score = np.array([1.0] * len(c1))

    mult = c1 * c2 * ppi_score
    cci_score = np.nansum(mult != 0)  # Count all active pathways (different to zero)

    # `is np.nan` is an identity test and misses computed NaNs; use np.isnan
    if np.isnan(cci_score):
        return 0.0
    return cci_score
```

#### `compute_icellnet_score(cell1, cell2, ppi_score=None)`

Calculates the sum of communication scores for the interaction between two cells. Based on ICELLNET.

###### Parameters

cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the sender.

cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the receiver.

###### Returns

cci_score : float Overall score for the interaction between a pair of cell-types/tissues/samples.

## Source code in `cell2cell/core/cci_scores.py`

```
def compute_icellnet_score(cell1, cell2, ppi_score=None):
    '''Calculates the sum of communication scores
    for the interaction between two cells. Based on ICELLNET.

    Parameters
    ----------
    cell1 : cell2cell.core.cell.Cell
        First cell-type/tissue/sample to compute interaction
        between a pair of them. In a directed interaction,
        this is the sender.

    cell2 : cell2cell.core.cell.Cell
        Second cell-type/tissue/sample to compute interaction
        between a pair of them. In a directed interaction,
        this is the receiver.

    Returns
    -------
    cci_score : float
        Overall score for the interaction between a pair of
        cell-types/tissues/samples.
    '''
    c1 = cell1.weighted_ppi['A'].values
    c2 = cell2.weighted_ppi['B'].values

    if (len(c1) == 0) or (len(c2) == 0):
        return 0.0

    if ppi_score is None:
        ppi_score = np.array([1.0] * len(c1))

    mult = c1 * c2 * ppi_score
    cci_score = np.nansum(mult)

    # `is np.nan` is an identity test and misses computed NaNs; use np.isnan
    if np.isnan(cci_score):
        return 0.0
    return cci_score
```

#### `compute_jaccard_like_cci_score(cell1, cell2, ppi_score=None)`

Calculates a Jaccard-like score for the interaction between two cells based on their intercellular protein-protein interactions such as ligand-receptor interactions.

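###### Parameters

cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the sender.

cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the receiver.
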
###### Returns

cci_score : float Overall score for the interaction between a pair of cell-types/tissues/samples. In this case it is a Jaccard-like score.
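The Jaccard-like score shares its numerator and squared terms with the Bray-Curtis-like score, which gives the identity J = BC / (2 - BC). The sketch below checks this on a small example; it assumes inputs without NaN values, and all function names are hypothetical.

```python
import math


def weighted_terms(c1, c2, ppi_score=None):
    """Return (sum of weighted products, sum of weighted squares) for two vectors."""
    if ppi_score is None:
        ppi_score = [1.0] * len(c1)
    cross = sum(a * b * w for a, b, w in zip(c1, c2, ppi_score))
    squares = sum((a * a + b * b) * w for a, b, w in zip(c1, c2, ppi_score))
    return cross, squares


def jaccard_like(c1, c2, ppi_score=None):
    cross, squares = weighted_terms(c1, c2, ppi_score)
    denominator = squares - cross
    return cross / denominator if denominator else 0.0


def braycurtis_like(c1, c2, ppi_score=None):
    cross, squares = weighted_terms(c1, c2, ppi_score)
    return 2 * cross / squares if squares else 0.0


sender = [1.0, 0.0, 1.0]
receiver = [1.0, 1.0, 0.0]
j = jaccard_like(sender, receiver)
bc = braycurtis_like(sender, receiver)
# Both scores rank pairs identically; Jaccard-like is never the larger one.
print(j, bc, math.isclose(j, bc / (2 - bc)))
```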

## Source code in `cell2cell/core/cci_scores.py`

```
def compute_jaccard_like_cci_score(cell1, cell2, ppi_score=None):
    '''Calculates a Jaccard-like score for the interaction between
    two cells based on their intercellular protein-protein
    interactions such as ligand-receptor interactions.

    Parameters
    ----------
    cell1 : cell2cell.core.cell.Cell
        First cell-type/tissue/sample to compute interaction
        between a pair of them. In a directed interaction,
        this is the sender.

    cell2 : cell2cell.core.cell.Cell
        Second cell-type/tissue/sample to compute interaction
        between a pair of them. In a directed interaction,
        this is the receiver.

    Returns
    -------
    cci_score : float
        Overall score for the interaction between a pair of
        cell-types/tissues/samples. In this case it is a
        Jaccard-like score.
    '''
    c1 = cell1.weighted_ppi['A'].values
    c2 = cell2.weighted_ppi['B'].values
    if (len(c1) == 0) or (len(c2) == 0):
        return 0.0

    if ppi_score is None:
        ppi_score = np.array([1.0] * len(c1))

    # Extended Jaccard similarity
    numerator = np.nansum(c1 * c2 * ppi_score)
    denominator = np.nansum(c1 * c1 * ppi_score) + np.nansum(c2 * c2 * ppi_score) - numerator

    if denominator == 0.0:
        return 0.0

    cci_score = numerator / denominator

    # `is np.nan` is an identity test and misses computed NaNs; use np.isnan
    if np.isnan(cci_score):
        return 0.0
    return cci_score
```

#### `matmul_bray_curtis_like(A_scores, B_scores, ppi_score=None)`

Computes Bray-Curtis-like scores using matrices of proteins by cell-types/tissues/samples.

###### Parameters

A_scores : array-like Matrix of size NxM, where N are the proteins in the first column of a list of PPIs and M are the cell-types/tissues/samples.

B_scores : array-like Matrix of size NxM, where N are the proteins in the second column of a list of PPIs and M are the cell-types/tissues/samples.

###### Returns

bray_curtis : numpy.array Matrix MxM, representing the CCI score for all pairs of cell-types/tissues/samples. In directed interactions, the vertical axis (axis 0) represents the senders, while the horizontal axis (axis 1) represents the receivers.
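The following sketch shows the vectorized computation on two small N x M matrices (N = 3 PPIs, M = 2 cell types), with all PPI weights equal to 1. It restates the source listed below in compact NumPy; the name `bray_curtis_matrix` is hypothetical, and the example assumes no cell type has an all-zero column (which would produce a division by zero).

```python
import numpy as np


def bray_curtis_matrix(A_scores, B_scores):
    """Bray-Curtis-like scores for every sender (rows) and receiver (columns)."""
    numerator = A_scores.T @ B_scores                     # (M, M) sums of products
    A_module = (A_scores ** 2).sum(axis=0)                # (M,) per-sender squares
    B_module = (B_scores ** 2).sum(axis=0)                # (M,) per-receiver squares
    denominator = A_module[:, None] + B_module[None, :]   # broadcast to (M, M)
    return 2.0 * numerator / denominator


# Rows are PPIs; columns are cell types.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
B = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
scores = bray_curtis_matrix(A, B)
print(scores)
```

Entry `scores[0, 1]` equals 1 because the first cell type's values in `A` match the second cell type's values in `B` for every PPI.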

## Source code in `cell2cell/core/cci_scores.py`

```
def matmul_bray_curtis_like(A_scores, B_scores, ppi_score=None):
    '''Computes Bray-Curtis-like scores using matrices of proteins by
    cell-types/tissues/samples.

    Parameters
    ----------
    A_scores : array-like
        Matrix of size NxM, where N are the proteins in the first
        column of a list of PPIs and M are the
        cell-types/tissues/samples.

    B_scores : array-like
        Matrix of size NxM, where N are the proteins in the second
        column of a list of PPIs and M are the
        cell-types/tissues/samples.

    Returns
    -------
    bray_curtis : numpy.array
        Matrix MxM, representing the CCI score for all pairs of
        cell-types/tissues/samples. In directed interactions,
        the vertical axis (axis 0) represents the senders, while
        the horizontal axis (axis 1) represents the receivers.
    '''
    if ppi_score is None:
        ppi_score = np.array([1.0] * A_scores.shape[0])
    ppi_score = ppi_score.reshape((len(ppi_score), 1))

    numerator = np.matmul(np.multiply(A_scores, ppi_score).transpose(), B_scores)

    A_module = np.sum(np.multiply(np.multiply(A_scores, A_scores), ppi_score), axis=0)
    B_module = np.sum(np.multiply(np.multiply(B_scores, B_scores), ppi_score), axis=0)
    denominator = A_module.reshape((A_module.shape[0], 1)) + B_module

    bray_curtis = np.divide(2.0*numerator, denominator)
    return bray_curtis
```

#### `matmul_cosine(A_scores, B_scores, ppi_score=None)`

Computes cosine-similarity scores using matrices of proteins by cell-types/tissues/samples.

###### Parameters

A_scores : array-like Matrix of size NxM, where N are the proteins in the first column of a list of PPIs and M are the cell-types/tissues/samples.

B_scores : array-like Matrix of size NxM, where N are the proteins in the second column of a list of PPIs and M are the cell-types/tissues/samples.

###### Returns

cosine : numpy.array Matrix MxM, representing the CCI score for all pairs of cell-types/tissues/samples. In directed interactions, the vertical axis (axis 0) represents the senders, while the horizontal axis (axis 1) represents the receivers.

## Source code in `cell2cell/core/cci_scores.py`

```
def matmul_cosine(A_scores, B_scores, ppi_score=None):
    '''Computes cosine-similarity scores using matrices of proteins by
    cell-types/tissues/samples.

    Parameters
    ----------
    A_scores : array-like
        Matrix of size NxM, where N are the proteins in the first
        column of a list of PPIs and M are the
        cell-types/tissues/samples.

    B_scores : array-like
        Matrix of size NxM, where N are the proteins in the second
        column of a list of PPIs and M are the
        cell-types/tissues/samples.

    Returns
    -------
    cosine : numpy.array
        Matrix MxM, representing the CCI score for all pairs of
        cell-types/tissues/samples. In directed interactions,
        the vertical axis (axis 0) represents the senders, while
        the horizontal axis (axis 1) represents the receivers.
    '''
    if ppi_score is None:
        ppi_score = np.array([1.0] * A_scores.shape[0])
    ppi_score = ppi_score.reshape((len(ppi_score), 1))

    numerator = np.matmul(np.multiply(A_scores, ppi_score).transpose(), B_scores)

    A_module = np.sum(np.multiply(np.multiply(A_scores, A_scores), ppi_score), axis=0) ** 0.5
    B_module = np.sum(np.multiply(np.multiply(B_scores, B_scores), ppi_score), axis=0) ** 0.5
    denominator = A_module.reshape((A_module.shape[0], 1)) * B_module

    cosine = np.divide(numerator, denominator)
    return cosine
```

#### `matmul_count_active(A_scores, B_scores, ppi_score=None)`

Computes the count of active protein-protein interactions used for intercellular communication using matrices of proteins by cell-types/tissues/samples.

###### Parameters

A_scores : array-like Matrix of size NxM, where N are the proteins in the first column of a list of PPIs and M are the cell-types/tissues/samples.

B_scores : array-like Matrix of size NxM, where N are the proteins in the second column of a list of PPIs and M are the cell-types/tissues/samples.

###### Returns

counts : numpy.array Matrix MxM, representing the CCI score for all pairs of cell-types/tissues/samples. In directed interactions, the vertical axis (axis 0) represents the senders, while the horizontal axis (axis 1) represents the receivers.
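The matrix product yields a true count only when the inputs are binary (0/1): each term of the sum is then 1 exactly when both partners of a PPI are present. With continuous values, the same product is a sum of weighted products rather than a count. A small sketch, with hypothetical variable names and all PPI weights equal to 1:

```python
import numpy as np

# Binarized presence/absence: rows are PPIs, columns are cell types.
A = np.array([[1, 0],
              [0, 1],
              [1, 1]])
B = np.array([[1, 1],
              [1, 0],
              [0, 1]])

counts = A.T @ B  # (M, M): senders on axis 0, receivers on axis 1

# Explicit check for one pair: PPIs where sender 0 and receiver 1 are both active.
explicit = int(np.sum((A[:, 0] != 0) & (B[:, 1] != 0)))
print(counts, explicit)
```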

## Source code in `cell2cell/core/cci_scores.py`

```
def matmul_count_active(A_scores, B_scores, ppi_score=None):
    '''Computes the count of active protein-protein interactions
    used for intercellular communication using matrices of proteins by
    cell-types/tissues/samples.

    Parameters
    ----------
    A_scores : array-like
        Matrix of size NxM, where N are the proteins in the first
        column of a list of PPIs and M are the
        cell-types/tissues/samples.

    B_scores : array-like
        Matrix of size NxM, where N are the proteins in the second
        column of a list of PPIs and M are the
        cell-types/tissues/samples.

    Returns
    -------
    counts : numpy.array
        Matrix MxM, representing the CCI score for all pairs of
        cell-types/tissues/samples. In directed interactions,
        the vertical axis (axis 0) represents the senders, while
        the horizontal axis (axis 1) represents the receivers.
    '''
    if ppi_score is None:
        ppi_score = np.array([1.0] * A_scores.shape[0])
    ppi_score = ppi_score.reshape((len(ppi_score), 1))

    counts = np.matmul(np.multiply(A_scores, ppi_score).transpose(), B_scores)
    return counts
```

#### `matmul_jaccard_like(A_scores, B_scores, ppi_score=None)`

Computes Jaccard-like scores using matrices of proteins by cell-types/tissues/samples.

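###### Parameters

A_scores : array-like Matrix of size NxM, where N are the proteins in the first column of a list of PPIs and M are the cell-types/tissues/samples.

B_scores : array-like Matrix of size NxM, where N are the proteins in the second column of a list of PPIs and M are the cell-types/tissues/samples.
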
###### Returns

jaccard : numpy.array Matrix MxM, representing the CCI score for all pairs of cell-types/tissues/samples. In directed interactions, the vertical axis (axis 0) represents the senders, while the horizontal axis (axis 1) represents the receivers.

## Source code in `cell2cell/core/cci_scores.py`

```
def matmul_jaccard_like(A_scores, B_scores, ppi_score=None):
    '''Computes Jaccard-like scores using matrices of proteins by
    cell-types/tissues/samples.

    Parameters
    ----------
    A_scores : array-like
        Matrix of size NxM, where N are the proteins in the first
        column of a list of PPIs and M are the
        cell-types/tissues/samples.

    B_scores : array-like
        Matrix of size NxM, where N are the proteins in the second
        column of a list of PPIs and M are the
        cell-types/tissues/samples.

    Returns
    -------
    jaccard : numpy.array
        Matrix MxM, representing the CCI score for all pairs of
        cell-types/tissues/samples. In directed interactions,
        the vertical axis (axis 0) represents the senders, while
        the horizontal axis (axis 1) represents the receivers.
    '''
    if ppi_score is None:
        ppi_score = np.array([1.0] * A_scores.shape[0])
    ppi_score = ppi_score.reshape((len(ppi_score), 1))

    numerator = np.matmul(np.multiply(A_scores, ppi_score).transpose(), B_scores)

    A_module = np.sum(np.multiply(np.multiply(A_scores, A_scores), ppi_score), axis=0)
    B_module = np.sum(np.multiply(np.multiply(B_scores, B_scores), ppi_score), axis=0)
    denominator = A_module.reshape((A_module.shape[0], 1)) + B_module - numerator

    jaccard = np.divide(numerator, denominator)
    return jaccard
```

### `cell`

#### `Cell`

Specific cell-type/tissue/organ element in an RNA-seq dataset.

###### Parameters

sc_rnaseq_data : pandas.DataFrame A gene expression matrix. Contains only one column, which corresponds to a specific cell-type/tissue/sample, while the genes are rows. The column name will be the label of the instance.

verbose : boolean, default=True Whether to print the steps of the analysis.

###### Attributes

id : int ID number of the instance generated.

type : str Name of the respective cell-type/tissue/sample.

rnaseq_data : pandas.DataFrame Copy of sc_rnaseq_data.

weighted_ppi : pandas.DataFrame Dataframe created from a list of protein-protein interactions, in which the columns of the interacting proteins are replaced by a score or a preprocessed gene expression of the respective proteins.

## Source code in `cell2cell/core/cell.py`

```
class Cell:
    '''Specific cell-type/tissue/organ element in an RNAseq dataset.

    Parameters
    ----------
    sc_rnaseq_data : pandas.DataFrame
        A gene expression matrix. Contains only one column, which
        corresponds to a cell-type/tissue/sample, while the genes
        are rows. The column name will be the label of the instance.
    verbose : boolean, default=True
        Whether to print the steps of the analysis.

    Attributes
    ----------
    id : int
        ID number of the instance generated.
    type : str
        Name of the respective cell-type/tissue/sample.
    rnaseq_data : pandas.DataFrame
        Copy of sc_rnaseq_data.
    weighted_ppi : pandas.DataFrame
        Dataframe created from a list of protein-protein interactions;
        here the columns of the interacting proteins are replaced by
        a score or a preprocessed gene expression of the respective
        proteins.
    '''
    _id_counter = 0  # Number of active instances
    _id = 0  # Unique ID

    def __init__(self, sc_rnaseq_data, verbose=True):
        self.id = Cell._id
        Cell._id_counter += 1
        Cell._id += 1

        self.type = str(sc_rnaseq_data.columns[-1])

        # RNAseq datasets
        self.rnaseq_data = sc_rnaseq_data.copy()
        self.rnaseq_data.columns = ['value']

        # Binary ppi datasets
        self.weighted_ppi = pd.DataFrame(columns=['A', 'B', 'score'])

        # Object created
        if verbose:
            print("New cell instance created for " + self.type)

    def __del__(self):
        Cell._id_counter -= 1

    def __str__(self):
        return str(self.type)

    __repr__ = __str__
```
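
The two class-level counters in the listing play different roles: `_id` only ever grows and supplies unique IDs, while `_id_counter` tracks how many instances are currently alive. The sketch below isolates that pattern in a hypothetical stand-in class, `ToyCell`, which is not part of cell2cell and drops the pandas-related attributes.

```python
class ToyCell:
    """Hypothetical stand-in for Cell that keeps only the counter logic."""
    _id_counter = 0  # Number of active instances
    _id = 0          # Next unique ID; never decremented

    def __init__(self, label):
        self.id = ToyCell._id
        ToyCell._id_counter += 1
        ToyCell._id += 1
        self.type = str(label)

    def __del__(self):
        # Only the count of live instances goes down; IDs are not reused.
        ToyCell._id_counter -= 1

    def __str__(self):
        return self.type

    __repr__ = __str__


cells = [ToyCell(name) for name in ("T cell", "B cell", "Macrophage")]
ids = [c.id for c in cells]
active = ToyCell._id_counter
labels = [str(c) for c in cells]
print(ids, active, labels)
```

Because IDs come from `_id` rather than `_id_counter`, deleting an instance and creating a new one should not hand out an ID that was already used.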

####
`get_cells_from_rnaseq(rnaseq_data, cell_columns=None, verbose=True)`

Creates new instances of Cell based on the RNAseq data of each cell-type/tissue/sample in a gene expression matrix.

###### Parameters

rnaseq_data : pandas.DataFrame Gene expression data for an RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes.

cell_columns : array-like, default=None List of names of cell-types/tissues/samples in the dataset to be used. If None, all columns will be used.

verbose : boolean, default=True Whether to print the steps of the analysis.

###### Returns

cells : dict Dictionary containing all Cell instances generated from an RNAseq dataset. The keys of this dictionary are the names of the corresponding Cell instances.

## Source code in `cell2cell/core/cell.py`

```
def get_cells_from_rnaseq(rnaseq_data, cell_columns=None, verbose=True):
    '''
    Creates new instances of Cell based on the RNAseq data of each
    cell-type/tissue/sample in a gene expression matrix.

    Parameters
    ----------
    rnaseq_data : pandas.DataFrame
        Gene expression data for an RNA-seq experiment. Columns are
        cell-types/tissues/samples and rows are genes.
    cell_columns : array-like, default=None
        List of names of cell-types/tissues/samples in the dataset
        to be used. If None, all columns will be used.
    verbose : boolean, default=True
        Whether to print the steps of the analysis.

    Returns
    -------
    cells : dict
        Dictionary containing all Cell instances generated from an RNAseq dataset.
        The keys of this dictionary are the names of the corresponding Cell instances.
    '''
    if verbose:
        print("Generating objects according to RNAseq datasets provided")
    cells = dict()
    if cell_columns is None:
        cell_columns = rnaseq_data.columns

    for cell in cell_columns:
        cells[cell] = Cell(rnaseq_data[[cell]], verbose=verbose)
    return cells
```

###
`communication_scores`

####
`aggregate_ccc_matrices(ccc_matrices, method='gmean')`

Aggregates matrices of communication scores. Each matrix has the communication scores across all pairs of cell-types/tissues/samples for a different pair of interacting proteins.

###### Parameters

ccc_matrices : list List of matrices of communication scores. Each matrix is for a specific pair of interacting proteins.

method : str, default='gmean' Method to aggregate the matrices element-wise. Options are:

```
- 'gmean' : Geometric mean in an element-wise way.
- 'sum' : Sum in an element-wise way.
- 'mean' : Mean in an element-wise way.
```

###### Returns

aggregated_ccc_matrix : numpy.array A matrix containing aggregated communication scores from multiple PPIs. Its shape is MxM, where M is the number of cell-types/tissues/samples. In directed interactions, the vertical axis (axis 0) represents the senders, while the horizontal axis (axis 1) represents the receivers.

## Source code in `cell2cell/core/communication_scores.py`

```
def aggregate_ccc_matrices(ccc_matrices, method='gmean'):
    '''Aggregates matrices of communication scores. Each
    matrix has the communication scores across all pairs
    of cell-types/tissues/samples for a different
    pair of interacting proteins.

    Parameters
    ----------
    ccc_matrices : list
        List of matrices of communication scores. Each matrix
        is for a specific pair of interacting proteins.
    method : str, default='gmean'
        Method to aggregate the matrices element-wise.
        Options are:
        - 'gmean' : Geometric mean in an element-wise way.
        - 'sum' : Sum in an element-wise way.
        - 'mean' : Mean in an element-wise way.

    Returns
    -------
    aggregated_ccc_matrix : numpy.array
        A matrix containing aggregated communication scores
        from multiple PPIs. Its shape is MxM, where M is the number of
        cell-types/tissues/samples. In directed interactions, the
        vertical axis (axis 0) represents the senders, while the
        horizontal axis (axis 1) represents the receivers.
    '''
    if method == 'gmean':
        aggregated_ccc_matrix = gmean(ccc_matrices)
    elif method == 'sum':
        aggregated_ccc_matrix = np.nansum(ccc_matrices, axis=0)
    elif method == 'mean':
        aggregated_ccc_matrix = np.nanmean(ccc_matrices, axis=0)
    else:
        raise ValueError("Not a valid method")
    return aggregated_ccc_matrix
```
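
The three aggregation options can be compared on a small example. The sketch below uses two invented 2x2 matrices and plain NumPy. The listing calls a `gmean` function whose import is not shown, so the geometric mean is written out with logarithms here instead of assuming where that function comes from. That form requires strictly positive scores.

```python
import numpy as np

# Two toy 2x2 communication-score matrices, one per ligand-receptor pair.
ccc_matrices = [np.array([[1.0, 4.0], [2.0, 8.0]]),
                np.array([[4.0, 1.0], [8.0, 2.0]])]
stacked = np.stack(ccc_matrices)            # shape (n_pairs, M, M)

agg_sum = np.nansum(stacked, axis=0)        # element-wise sum
agg_mean = np.nanmean(stacked, axis=0)      # element-wise arithmetic mean
# Element-wise geometric mean written out by hand (positive values only).
agg_gmean = np.exp(np.mean(np.log(stacked), axis=0))

print(agg_sum, agg_mean, agg_gmean, sep="\n")
```

Each result keeps the MxM shape of the inputs; only the axis that indexes the interacting protein pairs is collapsed.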

####
`compute_ccc_matrix(prot_a_exp, prot_b_exp, communication_score='expression_product')`

Computes communication scores for a specific protein-protein interaction using vectors of gene expression levels for a given interacting protein produced by different cell-types/tissues/samples.

###### Parameters

prot_a_exp : array-like Vector with gene expression levels for an interacting protein A in a given PPI. Coordinates are different cell-types/tissues/samples.

prot_b_exp : array-like Vector with gene expression levels for an interacting protein B in a given PPI. Coordinates are different cell-types/tissues/samples.

communication_score : str, default='expression_product' Scoring function for computing the communication score. Options are:

```
- 'expression_product' : Multiplication between the expression
of the interacting proteins.
- 'expression_mean' : Average between the expression
of the interacting proteins.
- 'expression_gmean' : Geometric mean between the expression
of the interacting proteins.
```

###### Returns

communication_scores : numpy.array Matrix MxM, representing the CCC scores of a specific PPI across all pairs of cell-types/tissues/samples, where M is the number of cell-types/tissues/samples. In directed interactions, the vertical axis (axis 0) represents the senders, while the horizontal axis (axis 1) represents the receivers.

## Source code in `cell2cell/core/communication_scores.py`

```
def compute_ccc_matrix(prot_a_exp, prot_b_exp, communication_score='expression_product'):
    '''Computes communication scores for a specific
    protein-protein interaction using vectors of gene expression
    levels for a given interacting protein produced by
    different cell-types/tissues/samples.

    Parameters
    ----------
    prot_a_exp : array-like
        Vector with gene expression levels for an interacting protein A
        in a given PPI. Coordinates are different cell-types/tissues/samples.
    prot_b_exp : array-like
        Vector with gene expression levels for an interacting protein B
        in a given PPI. Coordinates are different cell-types/tissues/samples.
    communication_score : str, default='expression_product'
        Scoring function for computing the communication score.
        Options are:
        - 'expression_product' : Multiplication between the expression
            of the interacting proteins.
        - 'expression_mean' : Average between the expression
            of the interacting proteins.
        - 'expression_gmean' : Geometric mean between the expression
            of the interacting proteins.

    Returns
    -------
    communication_scores : numpy.array
        Matrix MxM, representing the CCC scores of a specific PPI
        across all pairs of cell-types/tissues/samples, where M is the
        number of cell-types/tissues/samples. In directed interactions, the
        vertical axis (axis 0) represents the senders, while the
        horizontal axis (axis 1) represents the receivers.
    '''
    if communication_score == 'expression_product':
        communication_scores = np.outer(prot_a_exp, prot_b_exp)
    elif communication_score == 'expression_mean':
        communication_scores = (np.outer(prot_a_exp, np.ones(prot_b_exp.shape))
                                + np.outer(np.ones(prot_a_exp.shape), prot_b_exp)) / 2.
    elif communication_score == 'expression_gmean':
        communication_scores = np.sqrt(np.outer(prot_a_exp, prot_b_exp))
    else:
        raise ValueError("Not a valid communication_score")
    return communication_scores
```
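
A small worked example may help with the sender/receiver orientation of the resulting matrix. The sketch below reproduces the three scoring options with NumPy for one hypothetical ligand-receptor pair across three cell types. The expression values are invented.

```python
import numpy as np

ligand = np.array([2.0, 0.0, 8.0])     # protein A expression in cell types 0, 1, 2
receptor = np.array([1.0, 2.0, 0.5])   # protein B expression in cell types 0, 1, 2

# Entry [i, j] pairs the ligand from sender i with the receptor from receiver j.
product = np.outer(ligand, receptor)
mean = (ligand[:, None] + receptor[None, :]) / 2.0
gmean = np.sqrt(product)

print(product)
print(mean)
print(gmean)
```

Cell type 1 does not express the ligand, so its row is zero under the product and geometric mean, whereas the arithmetic mean still assigns it a positive score whenever the receiver expresses the receptor.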

####
`get_binary_scores(cell1, cell2, ppi_score=None)`

Computes binary communication scores for all protein-protein interactions between a pair of cell-types/tissues/samples. This corresponds to an AND function between binary values for each interacting protein coming from each cell.

###### Parameters

cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the sender.

cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the receiver.

ppi_score : array-like, default=None An array with a weight for each PPI. The weight multiplies the communication scores.

###### Returns

communication_scores : numpy.array An array with the communication scores for each intercellular PPI.

## Source code in `cell2cell/core/communication_scores.py`

```
def get_binary_scores(cell1, cell2, ppi_score=None):
    '''Computes binary communication scores for all
    protein-protein interactions between a pair of
    cell-types/tissues/samples. This corresponds to
    an AND function between binary values for each
    interacting protein coming from each cell.

    Parameters
    ----------
    cell1 : cell2cell.core.cell.Cell
        First cell-type/tissue/sample to compute the communication
        score. In a directed interaction, this is the sender.
    cell2 : cell2cell.core.cell.Cell
        Second cell-type/tissue/sample to compute the communication
        score. In a directed interaction, this is the receiver.
    ppi_score : array-like, default=None
        An array with a weight for each PPI. The weight
        multiplies the communication scores.

    Returns
    -------
    communication_scores : numpy.array
        An array with the communication scores for each intercellular
        PPI.
    '''
    c1 = cell1.weighted_ppi['A'].values
    c2 = cell2.weighted_ppi['B'].values

    if (len(c1) == 0) or (len(c2) == 0):
        return 0.0

    if ppi_score is None:
        ppi_score = np.array([1.0] * len(c1))

    communication_scores = c1 * c2 * ppi_score
    return communication_scores
```
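
The AND behaviour follows from multiplying 0/1 values. The sketch below shows this with plain arrays standing in for `cell1.weighted_ppi['A']` and `cell2.weighted_ppi['B']`. The values and the weights are invented for illustration.

```python
import numpy as np

# Binarized expression after thresholding, one entry per ligand-receptor pair.
sender_ligands = np.array([1, 1, 0, 0])        # stands in for cell1.weighted_ppi['A']
receiver_receptors = np.array([1, 0, 1, 0])    # stands in for cell2.weighted_ppi['B']
ppi_weights = np.array([1.0, 1.0, 1.0, 0.5])   # hypothetical per-pair weights

# A pair scores only when both partners are present; the weight then scales it.
scores = sender_ligands * receiver_receptors * ppi_weights
logical_and = np.logical_and(sender_ligands, receiver_receptors)

print(scores)
print(logical_and)
```

With weights other than 1, the scores are no longer strictly binary, but a pair is still nonzero only where both partners pass the threshold.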

####
`get_continuous_scores(cell1, cell2, ppi_score=None, method='expression_product')`

Computes continuous communication scores for all protein-protein interactions between a pair of cell-types/tissues/samples. This corresponds to a specific scoring function between preprocessed continuous expression values for each interacting protein coming from each cell.

###### Parameters

cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the sender.

cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the receiver.

ppi_score : array-like, default=None An array with a weight for each PPI. The weight multiplies the communication scores.

method : str, default='expression_product' Scoring function for computing the communication score. Options are:

- 'expression_product' : Multiplication between the expression of the interacting proteins. One coming from cell1 and the other from cell2.
- 'expression_mean' : Average between the expression of the interacting proteins. One coming from cell1 and the other from cell2.
- 'expression_gmean' : Geometric mean between the expression of the interacting proteins. One coming from cell1 and the other from cell2.

###### Returns

communication_scores : numpy.array An array with the communication scores for each intercellular PPI.

## Source code in `cell2cell/core/communication_scores.py`

```
def get_continuous_scores(cell1, cell2, ppi_score=None, method='expression_product'):
    '''Computes continuous communication scores for all
    protein-protein interactions between a pair of
    cell-types/tissues/samples. This corresponds to
    a specific scoring function between preprocessed continuous
    expression values for each interacting protein coming from
    each cell.

    Parameters
    ----------
    cell1 : cell2cell.core.cell.Cell
        First cell-type/tissue/sample to compute the communication
        score. In a directed interaction, this is the sender.
    cell2 : cell2cell.core.cell.Cell
        Second cell-type/tissue/sample to compute the communication
        score. In a directed interaction, this is the receiver.
    ppi_score : array-like, default=None
        An array with a weight for each PPI. The weight
        multiplies the communication scores.
    method : str, default='expression_product'
        Scoring function for computing the communication score.
        Options are:
        - 'expression_product' : Multiplication between the expression
            of the interacting proteins. One coming from cell1 and the
            other from cell2.
        - 'expression_mean' : Average between the expression
            of the interacting proteins. One coming from cell1 and the
            other from cell2.
        - 'expression_gmean' : Geometric mean between the expression
            of the interacting proteins. One coming from cell1 and the
            other from cell2.

    Returns
    -------
    communication_scores : numpy.array
        An array with the communication scores for each intercellular
        PPI.
    '''
    c1 = cell1.weighted_ppi['A'].values
    c2 = cell2.weighted_ppi['B'].values

    if method == 'expression_product':
        communication_scores = score_expression_product(c1, c2)
    elif method == 'expression_mean':
        communication_scores = score_expression_mean(c1, c2)
    elif method == 'expression_gmean':
        communication_scores = np.sqrt(score_expression_product(c1, c2))
    else:
        raise ValueError('{} is not implemented yet'.format(method))

    if ppi_score is None:
        ppi_score = np.array([1.0] * len(c1))

    communication_scores = communication_scores * ppi_score
    return communication_scores
```
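
To see how the three methods and the PPI weights combine, the sketch below evaluates them on invented expression vectors. The arrays stand in for the `weighted_ppi` columns of the two cells, and nothing here calls cell2cell itself.

```python
import numpy as np

c1 = np.array([4.0, 0.0, 9.0])          # sender expression, first partner of each pair
c2 = np.array([1.0, 5.0, 4.0])          # receiver expression, second partner
ppi_score = np.array([1.0, 1.0, 0.5])   # hypothetical per-pair weights

# Each method scores pair by pair, then the weight multiplies the result.
scores = {
    "expression_product": c1 * c2 * ppi_score,
    "expression_mean": (c1 + c2) / 2.0 * ppi_score,
    "expression_gmean": np.sqrt(c1 * c2) * ppi_score,
}

for name, values in scores.items():
    print(name, values)
```

For the second pair the sender expression is zero: the product and geometric mean give 0, while the arithmetic mean still gives 2.5. That difference is worth keeping in mind when choosing a method.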

####
`score_expression_mean(c1, c2)`

Computes the expression mean score.

###### Parameters

c1 : array-like A 1D-array containing the preprocessed expression values for the interactors in the first column of a list of protein-protein interactions.

c2 : array-like A 1D-array containing the preprocessed expression values for the interactors in the second column of a list of protein-protein interactions.

###### Returns

(c1 + c2)/2. : array-like Average of vectors.

## Source code in `cell2cell/core/communication_scores.py`

```
def score_expression_mean(c1, c2):
    '''Computes the expression mean score.

    Parameters
    ----------
    c1 : array-like
        A 1D-array containing the preprocessed expression values
        for the interactors in the first column of a list of
        protein-protein interactions.
    c2 : array-like
        A 1D-array containing the preprocessed expression values
        for the interactors in the second column of a list of
        protein-protein interactions.

    Returns
    -------
    (c1 + c2)/2. : array-like
        Average of vectors.
    '''
    if (len(c1) == 0) or (len(c2) == 0):
        return 0.0
    return (c1 + c2)/2.
```

####
`score_expression_product(c1, c2)`

Computes the expression product score

###### Parameters

c1 : array-like A 1D-array containing the preprocessed expression values for the interactors in the first column of a list of protein-protein interactions.

c2 : array-like A 1D-array containing the preprocessed expression values for the interactors in the second column of a list of protein-protein interactions.

###### Returns

c1 * c2 : array-like Multiplication of vectors.

## Source code in `cell2cell/core/communication_scores.py`

```
def score_expression_product(c1, c2):
    '''Computes the expression product score

    Parameters
    ----------
    c1 : array-like
        A 1D-array containing the preprocessed expression values
        for the interactors in the first column of a list of
        protein-protein interactions.
    c2 : array-like
        A 1D-array containing the preprocessed expression values
        for the interactors in the second column of a list of
        protein-protein interactions.

    Returns
    -------
    c1 * c2 : array-like
        Multiplication of vectors.
    '''
    if (len(c1) == 0) or (len(c2) == 0):
        return 0.0
    return c1 * c2
```

###
`interaction_space`

####
```
InteractionSpace
```

Interaction space that contains all the required elements to perform the analysis between every pair of cells.

###### Parameters

rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment or a single-cell experiment after aggregation into cell types. Columns are cell-types/tissues/samples and rows are genes.

ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.

gene_cutoffs : dict Contains two keys: 'type' and 'parameter'. The first key represents the way to use a cutoff or threshold, while 'parameter' is the value used to binarize the expression values. The key 'type' can be:

```
- 'local_percentile' : computes the value of a given percentile, for each
gene independently. In this case, the parameter corresponds to the
percentile to compute, as a float value between 0 and 1.
- 'global_percentile' : computes the value of a given percentile from all
genes and samples simultaneously. In this case, the parameter
corresponds to the percentile to compute, as a float value between
0 and 1. All genes have the same cutoff.
- 'file' : load a cutoff table from a file. Parameter in this case is the
path of that file. It must contain the same genes as index and same
samples as columns.
- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in each sample. This allows to use specific cutoffs for
each sample. The columns here must be the same as the ones in the
rnaseq_data.
- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in only one column. These cutoffs will be applied to
all samples.
- 'constant_value' : binarizes the expression. Evaluates whether
expression is greater than the value input in the parameter.
```

communication_score : str, default='expression_thresholding' Type of communication score used to detect active ligand-receptor pairs between each pair of cells. See cell2cell.core.communication_scores for more details. It can be:

```
- 'expression_thresholding'
- 'expression_product'
- 'expression_mean'
- 'expression_gmean'
```

cci_score : str, default='bray_curtis' Scoring function to aggregate the communication scores. See cell2cell.core.cci_scores for more details. It can be:

```
- 'bray_curtis'
- 'jaccard'
- 'count'
- 'icellnet'
```

cci_type : str, default='undirected' Type of interaction between two cells. If it is undirected, all ligands and receptors are considered from both cells. If it is directed, ligands from one cell and receptors from the other are considered separately with respect to ligands from the second cell and receptor from the first one. So, it can be:

```
- 'undirected'
- 'directed'
```

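cci_matrix_template : pandas.DataFrame, default=None A matrix of shape MxM where M are cell-types/tissues/samples. This is used as a template for storing CCI scores. It may be useful for specifying which pairs of cells to consider.

complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".

complex_agg_method : str, default='min' Method to aggregate the expression value of multiple genes in a complex. Options are:

```
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
```

interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.

verbose : boolean, default=True Whether to print the steps of the analysis.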
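
As a rough illustration of the `gene_cutoffs` dictionary and of what binarizing by a cutoff means, the sketch below applies a 'constant_value' cutoff and a per-gene percentile cutoff with plain NumPy. It does not call cell2cell. The toy matrix is invented, and the strict `>` comparison in the percentile case is an assumption, since the text above only states it for 'constant_value'.

```python
import numpy as np

# Rows are genes, columns are cell types (toy values).
expression = np.array([[0.0, 5.0, 10.0],
                       [2.0, 2.0, 8.0]])

# Dictionary shape described above: a 'type' and its 'parameter'.
gene_cutoffs = {"type": "constant_value", "parameter": 4.0}
binary_constant = (expression > gene_cutoffs["parameter"]).astype(int)

# 'local_percentile' idea: one cutoff per gene, here the 0.5 quantile of each row.
local_cutoffs = np.quantile(expression, 0.5, axis=1, keepdims=True)
binary_local = (expression > local_cutoffs).astype(int)   # '>' is assumed here

print(binary_constant)
print(binary_local)
```

A single constant applies the same bar to every gene, while a local percentile adapts the bar to each gene's own expression range.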

###### Attributes

communication_score : str Type of communication score used to detect active ligand-receptor pairs between each pair of cells. See cell2cell.core.communication_scores for more details. It can be:

```
- 'expression_thresholding'
- 'expression_product'
- 'expression_mean'
- 'expression_gmean'
```

cci_score : str Scoring function to aggregate the communication scores. See cell2cell.core.cci_scores for more details. It can be:

```
- 'bray_curtis'
- 'jaccard'
- 'count'
- 'icellnet'
```

cci_type : str Type of interaction between two cells. If it is undirected, all ligands and receptors are considered from both cells. If it is directed, ligands from one cell and receptors from the other are considered separately with respect to ligands from the second cell and receptor from the first one. So, it can be:

```
- 'undirected'
- 'directed'
```

ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.

modified_rnaseq : pandas.DataFrame Preprocessed gene expression data for a bulk or single-cell RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes. The preprocessing may correspond to scoring the gene expression as binary or continuous values depending on the scoring function for cell-cell interactions/communication.

interaction_elements : dict Dictionary containing all the pairs of cells considered (under the key 'pairs'), the Cell instances (under the key 'cells'), which include all cells/tissues/organs with their associated datasets (rna_seq, weighted_ppi, etc.), and a Cell-Cell Interaction Matrix to store CCI scores (under the key 'cci_matrix'). A communication matrix is also stored in this object when the communication scores are computed in the InteractionSpace class (under the key 'communication_matrix').

distance_matrix : pandas.DataFrame Contains distances for each pair of cells, computed from the CCI scores previously obtained (and stored in interaction_elements['cci_matrix']).
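
Judging from the source listing of `compute_pairwise_cci_scores`, the distance appears to be derived in one of two ways: as `1 - score` for bounded scores such as the Bray-Curtis-like and Jaccard-like ones, and as a regularized `1 - score / (score + mean)` for unbounded scores such as 'count' and 'icellnet'. The sketch below reproduces both conversions on invented matrices with NumPy arrays instead of the DataFrames used by the class.

```python
import numpy as np

# Bounded scores in [0, 1], e.g. Bray-Curtis-like.
cci = np.array([[0.9, 0.2],
                [0.2, 0.6]])
distance_bounded = 1.0 - cci
np.fill_diagonal(distance_bounded, 0.0)     # ignore autocrine interactions

# Unbounded scores, e.g. number of active ligand-receptor pairs.
counts = np.array([[10.0, 2.0],
                   [2.0, 6.0]])
mean = np.nanmean(counts)
distance_regularized = 1.0 - counts / (counts + mean)
np.fill_diagonal(distance_regularized, 0.0)

print(distance_bounded)
print(distance_regularized)
```

The regularized form keeps distances between 0 and 1 for non-negative scores without rescaling by the maximum, at the cost of depending on the mean of the whole matrix.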

## Source code in `cell2cell/core/interaction_space.py`

```
class InteractionSpace():
    '''
    Interaction space that contains all the required elements to perform
    the analysis between every pair of cells.

    Parameters
    ----------
    rnaseq_data : pandas.DataFrame
        Gene expression data for a bulk RNA-seq experiment or a single-cell
        experiment after aggregation into cell types. Columns are
        cell-types/tissues/samples and rows are genes.
    ppi_data : pandas.DataFrame
        List of protein-protein interactions (or ligand-receptor pairs) used
        for inferring the cell-cell interactions and communication.
    gene_cutoffs : dict
        Contains two keys: 'type' and 'parameter'. The first key represents the
        way to use a cutoff or threshold, while 'parameter' is the value used
        to binarize the expression values.
        The key 'type' can be:
        - 'local_percentile' : computes the value of a given percentile, for each
            gene independently. In this case, the parameter corresponds to the
            percentile to compute, as a float value between 0 and 1.
        - 'global_percentile' : computes the value of a given percentile from all
            genes and samples simultaneously. In this case, the parameter
            corresponds to the percentile to compute, as a float value between
            0 and 1. All genes have the same cutoff.
        - 'file' : load a cutoff table from a file. Parameter in this case is the
            path of that file. It must contain the same genes as index and same
            samples as columns.
        - 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
            for each gene in each sample. This allows to use specific cutoffs for
            each sample. The columns here must be the same as the ones in the
            rnaseq_data.
        - 'single_col_matrix' : a dataframe must be provided, containing a cutoff
            for each gene in only one column. These cutoffs will be applied to
            all samples.
        - 'constant_value' : binarizes the expression. Evaluates whether
            expression is greater than the value input in the parameter.
    communication_score : str, default='expression_thresholding'
        Type of communication score used to detect active ligand-receptor
        pairs between each pair of cells. See
        cell2cell.core.communication_scores for more details.
        It can be:
        - 'expression_thresholding'
        - 'expression_product'
        - 'expression_mean'
        - 'expression_gmean'
    cci_score : str, default='bray_curtis'
        Scoring function to aggregate the communication scores. See
        cell2cell.core.cci_scores for more details.
        It can be:
        - 'bray_curtis'
        - 'jaccard'
        - 'count'
        - 'icellnet'
    cci_type : str, default='undirected'
        Type of interaction between two cells. If it is undirected, all ligands
        and receptors are considered from both cells. If it is directed, ligands
        from one cell and receptors from the other are considered separately with
        respect to ligands from the second cell and receptors from the first one.
        So, it can be:
        - 'undirected'
        - 'directed'
    cci_matrix_template : pandas.DataFrame, default=None
        A matrix of shape MxM where M are cell-types/tissues/samples. This
        is used as a template for storing CCI scores. It may be useful
        for specifying which pairs of cells to consider.
    complex_sep : str, default=None
        Symbol that separates the protein subunits in a multimeric complex.
        For example, '&' is the complex_sep for a list of ligand-receptor pairs
        where a protein partner could be "CD74&CD44".
    complex_agg_method : str, default='min'
        Method to aggregate the expression value of multiple genes in a
        complex.
        - 'min' : Minimum expression value among all genes.
        - 'mean' : Average expression value among all genes.
        - 'gmean' : Geometric mean expression value among all genes.
    interaction_columns : tuple, default=('A', 'B')
        Contains the names of the columns where to find the partners in a
        dataframe of protein-protein interactions. If the list is for
        ligand-receptor pairs, the first column is for the ligands and the second
        for the receptors.
    verbose : boolean, default=True
        Whether to print the steps of the analysis.

    Attributes
    ----------
    communication_score : str
        Type of communication score used to detect active ligand-receptor
        pairs between each pair of cells. See
        cell2cell.core.communication_scores for more details.
        It can be:
        - 'expression_thresholding'
        - 'expression_product'
        - 'expression_mean'
        - 'expression_gmean'
    cci_score : str
        Scoring function to aggregate the communication scores. See
        cell2cell.core.cci_scores for more details.
        It can be:
        - 'bray_curtis'
        - 'jaccard'
        - 'count'
        - 'icellnet'
    cci_type : str
        Type of interaction between two cells. If it is undirected, all ligands
        and receptors are considered from both cells. If it is directed, ligands
        from one cell and receptors from the other are considered separately with
        respect to ligands from the second cell and receptors from the first one.
        So, it can be:
        - 'undirected'
        - 'directed'
    ppi_data : pandas.DataFrame
        List of protein-protein interactions (or ligand-receptor pairs) used
        for inferring the cell-cell interactions and communication.
    modified_rnaseq : pandas.DataFrame
        Preprocessed gene expression data for a bulk or single-cell RNA-seq experiment.
        Columns are cell-types/tissues/samples and rows are genes. The preprocessing
        may correspond to scoring the gene expression as binary or continuous values
        depending on the scoring function for cell-cell interactions/communication.
    interaction_elements : dict
        Dictionary containing all the pairs of cells considered (under
        the key 'pairs'), the Cell instances (under the key 'cells'),
        which include all cells/tissues/organs with their associated datasets
        (rna_seq, weighted_ppi, etc.), and a Cell-Cell Interaction Matrix
        to store CCI scores (under the key 'cci_matrix'). A communication matrix
        is also stored in this object when the communication scores are
        computed in the InteractionSpace class (under the key
        'communication_matrix').
    distance_matrix : pandas.DataFrame
        Contains distances for each pair of cells, computed from
        the CCI scores previously obtained (and stored in
        interaction_elements['cci_matrix']).
    '''
    def __init__(self, rnaseq_data, ppi_data, gene_cutoffs, communication_score='expression_thresholding',
                 cci_score='bray_curtis', cci_type='undirected', cci_matrix_template=None, complex_sep=None,
                 complex_agg_method='min', interaction_columns=('A', 'B'), verbose=True):

        self.communication_score = communication_score
        self.cci_score = cci_score
        self.cci_type = cci_type

        if self.communication_score == 'expression_thresholding':
            if 'type' in gene_cutoffs.keys():
                cutoff_values = cutoffs.get_cutoffs(rnaseq_data=rnaseq_data,
                                                    parameters=gene_cutoffs,
                                                    verbose=verbose)
            else:
                raise ValueError("If dataframe is not included in gene_cutoffs, please provide the type of method to obtain them.")
        else:
            cutoff_values = None

        prot_a = interaction_columns[0]
        prot_b = interaction_columns[1]
        self.ppi_data = ppi_data.copy()
        if ('A' in self.ppi_data.columns) and (prot_a != 'A'):
            self.ppi_data = self.ppi_data.drop(columns='A')
        if ('B' in self.ppi_data.columns) and (prot_b != 'B'):
            self.ppi_data = self.ppi_data.drop(columns='B')
        self.ppi_data = self.ppi_data.rename(columns={prot_a : 'A', prot_b : 'B'})
        if 'score' not in self.ppi_data.columns:
            self.ppi_data = self.ppi_data.assign(score=1.0)

        self.modified_rnaseq = integrate_data.get_modified_rnaseq(rnaseq_data=rnaseq_data,
                                                                  cutoffs=cutoff_values,
                                                                  communication_score=self.communication_score,
                                                                  )

        self.interaction_elements = generate_interaction_elements(modified_rnaseq=self.modified_rnaseq,
                                                                  ppi_data=self.ppi_data,
                                                                  cci_matrix_template=cci_matrix_template,
                                                                  cci_type=self.cci_type,
                                                                  complex_sep=complex_sep,
                                                                  complex_agg_method=complex_agg_method,
                                                                  verbose=verbose)

        self.interaction_elements['ppi_score'] = self.ppi_data['score'].values

    def pair_cci_score(self, cell1, cell2, cci_score='bray_curtis', use_ppi_score=False, verbose=True):
        '''
        Computes a CCI score for a pair of cells.

        Parameters
        ----------
        cell1 : cell2cell.core.cell.Cell
            First cell-type/tissue/sample to compute the communication
            score. In a directed interaction, this is the sender.
        cell2 : cell2cell.core.cell.Cell
            Second cell-type/tissue/sample to compute the communication
            score. In a directed interaction, this is the receiver.
        cci_score : str, default='bray_curtis'
            Scoring function to aggregate the communication scores between
            a pair of cells. It computes an overall potential of cell-cell
            interactions.
            Options:
            - 'bray_curtis' : Bray-Curtis-like score
            - 'jaccard' : Jaccard-like score
            - 'count' : Number of LR pairs that the pair of cells uses
            - 'icellnet' : Sum of the L-R expression product of a pair of cells
        use_ppi_score : boolean, default=False
            Whether to use the weights of LR pairs specified in the ppi_data
            to compute the scores.
        verbose : boolean, default=True
            Whether to print the steps of the analysis.

        Returns
        -------
        cci_score : float
            Overall score for the interaction between a pair of
            cell-types/tissues/samples. The kind of score depends
            on the cci_score option used.
        '''
        if verbose:
            print("Computing interaction score between {} and {}".format(cell1.type, cell2.type))

        if use_ppi_score:
            ppi_score = self.ppi_data['score'].values
        else:
            ppi_score = None
        # Calculate cell-cell interaction score
        if cci_score == 'bray_curtis':
            cci_value = cci_scores.compute_braycurtis_like_cci_score(cell1, cell2, ppi_score=ppi_score)
        elif cci_score == 'jaccard':
            cci_value = cci_scores.compute_jaccard_like_cci_score(cell1, cell2, ppi_score=ppi_score)
        elif cci_score == 'count':
            cci_value = cci_scores.compute_count_score(cell1, cell2, ppi_score=ppi_score)
        elif cci_score == 'icellnet':
            cci_value = cci_scores.compute_icellnet_score(cell1, cell2, ppi_score=ppi_score)
        else:
            raise NotImplementedError("CCI score {} to compute pairwise cell-interactions is not implemented".format(cci_score))
        return cci_value
def compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True):
'''Computes overall CCI scores for each pair of cells.
Parameters
----------
cci_score : str, default=None
Scoring function to aggregate the communication scores between
a pair of cells. It computes an overall potential of cell-cell
interactions. If None, it will use the one stored in the
attribute analysis_setup of this object.
Options:
- 'bray_curtis' : Bray-Curtis-like score
- 'jaccard' : Jaccard-like score
- 'count' : Number of LR pairs that the pair of cells uses
- 'icellnet' : Sum of the L-R expression product of a pair of cells
use_ppi_score : boolean, default=False
Whether using a weight of LR pairs specified in the ppi_data
to compute the scores.
verbose : boolean, default=True
Whether printing or not steps of the analysis.
Returns
-------
self.interaction_elements['cci_matrix'] : pandas.DataFrame
Contains CCI scores for each pair of cells
'''
if cci_score is None:
cci_score = self.cci_score
else:
assert isinstance(cci_score, str)
### Compute pairwise physical interactions
if verbose:
print("Computing pairwise interactions")
# Compute pair by pair
for pair in self.interaction_elements['pairs']:
cell1 = self.interaction_elements['cells'][pair[0]]
cell2 = self.interaction_elements['cells'][pair[1]]
cci_value = self.pair_cci_score(cell1,
cell2,
cci_score=cci_score,
use_ppi_score=use_ppi_score,
verbose=verbose)
self.interaction_elements['cci_matrix'].at[pair[0], pair[1]] = cci_value
if self.cci_type == 'undirected':
self.interaction_elements['cci_matrix'].at[pair[1], pair[0]] = cci_value
# Compute using matmul -> Too slow and uses a lot of memory TODO: Try to optimize this
# if cci_score == 'bray_curtis':
# cci_matrix = cci_scores.matmul_bray_curtis_like(self.interaction_elements['A_score'],
# self.interaction_elements['B_score'])
# self.interaction_elements['cci_matrix'] = pd.DataFrame(cci_matrix,
# index=self.interaction_elements['cell_names'],
# columns=self.interaction_elements['cell_names']
# )
# Generate distance matrix
if cci_score not in ['count', 'icellnet']:
self.distance_matrix = self.interaction_elements['cci_matrix'].apply(lambda x: 1 - x)
else:
#self.distance_matrix = self.interaction_elements['cci_matrix'].div(self.interaction_elements['cci_matrix'].max().max()).apply(lambda x: 1 - x)
# Regularized distance
mean = np.nanmean(self.interaction_elements['cci_matrix'])
self.distance_matrix = self.interaction_elements['cci_matrix'].div(self.interaction_elements['cci_matrix'] + mean).apply(lambda x: 1 - x)
np.fill_diagonal(self.distance_matrix.values, 0.0) # Make diagonal zero (delete autocrine-interactions)
def pair_communication_score(self, cell1, cell2, communication_score='expression_thresholding',
use_ppi_score=False, verbose=True):
'''Computes a communication score for each protein-protein interaction
between a pair of cells.
Parameters
----------
cell1 : cell2cell.core.cell.Cell
First cell-type/tissue/sample to compute the communication
score. In a directed interaction, this is the sender.
cell2 : cell2cell.core.cell.Cell
Second cell-type/tissue/sample to compute the communication
score. In a directed interaction, this is the receiver.
communication_score : str, default='expression_thresholding'
Type of communication score to infer the potential use of
a given ligand-receptor pair by a pair of cells/tissues/samples.
If None, the score stored in the attribute analysis_setup
will be used.
Available communication_scores are:
- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
use_ppi_score : boolean, default=False
Whether using a weight of LR pairs specified in the ppi_data
to compute the scores.
verbose : boolean, default=True
Whether printing or not steps of the analysis.
Returns
-------
communication_scores : numpy.array
An array with the communication scores for each intercellular
PPI.
'''
# TODO: Implement communication scores
if verbose:
print("Computing communication score between {} and {}".format(cell1.type, cell2.type))
# Check that new score is the same type as score used to build interaction space (binary or continuous)
if (communication_score in ['expression_product', 'expression_correlation', 'expression_mean', 'expression_gmean']) \
& (self.communication_score in ['expression_thresholding', 'differential_combinations']):
raise ValueError('Cannot use {} for this interaction space'.format(communication_score))
if (communication_score in ['expression_thresholding', 'differential_combinations']) \
& (self.communication_score in ['expression_product', 'expression_correlation', 'expression_mean', 'expression_gmean']):
raise ValueError('Cannot use {} for this interaction space'.format(communication_score))
if use_ppi_score:
ppi_score = self.ppi_data['score'].values
else:
ppi_score = None
if communication_score in ['expression_thresholding', 'differential_combinations']:
communication_value = communication_scores.get_binary_scores(cell1=cell1,
cell2=cell2,
ppi_score=ppi_score)
elif communication_score in ['expression_product', 'expression_correlation', 'expression_mean', 'expression_gmean']:
communication_value = communication_scores.get_continuous_scores(cell1=cell1,
cell2=cell2,
ppi_score=ppi_score,
method=communication_score)
else:
raise NotImplementedError(
"Communication score {} to compute pairwise cell-communication is not implemented".format(communication_score))
return communication_value
def compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None,
interaction_columns=('A', 'B'), cells=None, cci_type=None, verbose=True):
'''Computes the communication scores for each LR pair in
a given pair of sender-receiver cells.
Parameters
----------
communication_score : str, default=None
Type of communication score to infer the potential use of
a given ligand-receptor pair by a pair of cells/tissues/samples.
If None, the score stored in the attribute analysis_setup
will be used.
Available communication_scores are:
- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
use_ppi_score : boolean, default=False
Whether using a weight of LR pairs specified in the ppi_data
to compute the scores.
ref_ppi_data : pandas.DataFrame, default=None
Reference list of protein-protein interactions (or
ligand-receptor pairs) used for inferring the cell-cell
interactions and communication. It could be the same as
'ppi_data' if ppi_data is not bidirectional (that is,
contains ProtA-ProtB interaction as well as ProtB-ProtA
interaction). ref_ppi must be undirected (contains only
ProtA-ProtB and not ProtB-ProtA interaction). If None
the one stored in the attribute ref_ppi will be used.
interaction_columns : tuple, default=('A', 'B')
Contains the names of the columns where to find the
partners in a dataframe of protein-protein interactions.
If the list is for ligand-receptor pairs, the first column
is for the ligands and the second for the receptors. If
None, the one stored in the attribute interaction_columns
will be used
cells : list, default=None
List of cells to consider.
cci_type : str, default=None
Type of interaction between two cells. Used to specify
if we want to consider a LR pair in both directions.
It can be:
- 'undirected'
- 'directed'
If None, the one stored in the attribute analysis_setup
will be used.
verbose : boolean, default=True
Whether printing or not steps of the analysis.
Returns
-------
self.interaction_elements['communication_matrix'] : pandas.DataFrame
Contains communication scores for each LR pair in a
given pair of sender-receiver cells.
'''
if communication_score is None:
communication_score = self.communication_score
else:
assert isinstance(communication_score, str)
# Cells to consider
if cells is None:
cells = self.interaction_elements['cell_names']
# Labels:
if cci_type is None:
cell_pairs = self.interaction_elements['pairs']
elif cci_type != self.cci_type:
cell_pairs = generate_pairs(cells, cci_type)
else:
#cell_pairs = generate_pairs(cells, self.cci_type) # Think about other scenarios that may need this line
cell_pairs = self.interaction_elements['pairs']
col_labels = ['{};{}'.format(pair[0], pair[1]) for pair in cell_pairs]
# Ref PPI data
if ref_ppi_data is None:
ref_index = self.ppi_data.apply(lambda row: (row['A'], row['B']), axis=1)
keep_index = list(range(self.ppi_data.shape[0]))
else:
ref_ppi = ref_ppi_data.copy()
prot_a = interaction_columns[0]
prot_b = interaction_columns[1]
if ('A' in ref_ppi.columns) & (prot_a != 'A'):
ref_ppi = ref_ppi.drop(columns='A')
if ('B' in ref_ppi.columns) & (prot_b != 'B'):
ref_ppi = ref_ppi.drop(columns='B')
ref_ppi = ref_ppi.rename(columns={prot_a: 'A', prot_b: 'B'})
ref_index = list(ref_ppi.apply(lambda row: (row['A'], row['B']), axis=1).values)
keep_index = list(pd.merge(self.ppi_data, ref_ppi, how='inner').index)
# DataFrame to Store values
communication_matrix = pd.DataFrame(index=ref_index, columns=col_labels)
### Compute pairwise physical interactions
if verbose:
print("Computing pairwise communication")
for i, pair in enumerate(cell_pairs):
cell1 = self.interaction_elements['cells'][pair[0]]
cell2 = self.interaction_elements['cells'][pair[1]]
comm_score = self.pair_communication_score(cell1,
cell2,
communication_score=communication_score,
use_ppi_score=use_ppi_score,
verbose=verbose)
kept_values = comm_score.flatten()[keep_index]
communication_matrix[col_labels[i]] = kept_values
self.interaction_elements['communication_matrix'] = communication_matrix
```

##### `compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True)`

Computes overall CCI scores for each pair of cells.

###### Parameters

cci_score : str, default=None Scoring function to aggregate the communication scores between a pair of cells. It computes an overall potential of cell-cell interactions. If None, it will use the one stored in the attribute analysis_setup of this object. Options:

```
- 'bray_curtis' : Bray-Curtis-like score
- 'jaccard' : Jaccard-like score
- 'count' : Number of LR pairs that the pair of cells uses
- 'icellnet' : Sum of the L-R expression product of a pair of cells
```

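use_ppi_score : boolean, default=False Whether to use the weights of LR pairs specified in ppi_data to compute the scores.

verbose : boolean, default=True Whether to print the steps of the analysis.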

###### Returns

self.interaction_elements['cci_matrix'] : pandas.DataFrame Contains CCI scores for each pair of cells
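
Besides filling the CCI matrix, this method derives a distance matrix from it. The following is a minimal sketch of that conversion, not the library's implementation: `cci_to_distance` is a hypothetical helper, the input matrix is toy data, and bounded scores such as the Bray-Curtis-like one are assumed to lie in [0, 1].

```python
import numpy as np
import pandas as pd

def cci_to_distance(cci_matrix, cci_score):
    """Convert a square CCI matrix into a distance matrix (hypothetical helper)."""
    scores = cci_matrix.to_numpy(dtype=float)
    if cci_score not in ('count', 'icellnet'):
        # Scores assumed to lie in [0, 1]: distance is the complement
        dist = 1.0 - scores
    else:
        # Unbounded scores: regularize with the matrix mean before taking 1 - x
        mean = np.nanmean(scores)
        dist = 1.0 - scores / (scores + mean)
    # Zero the diagonal, i.e. ignore autocrine interactions
    np.fill_diagonal(dist, 0.0)
    return pd.DataFrame(dist, index=cci_matrix.index, columns=cci_matrix.columns)

# Toy 'count' matrix for two cell types
counts = pd.DataFrame([[4, 2], [2, 0]], index=['A', 'B'], columns=['A', 'B'])
print(cci_to_distance(counts, 'count'))
```

With the toy counts above the matrix mean is 2, so the A-B distance is 1 - 2 / (2 + 2) = 0.5.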

## Source code in `cell2cell/core/interaction_space.py`

```
def compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True):
'''Computes overall CCI scores for each pair of cells.
Parameters
----------
cci_score : str, default=None
Scoring function to aggregate the communication scores between
a pair of cells. It computes an overall potential of cell-cell
interactions. If None, it will use the one stored in the
attribute analysis_setup of this object.
Options:
- 'bray_curtis' : Bray-Curtis-like score
- 'jaccard' : Jaccard-like score
- 'count' : Number of LR pairs that the pair of cells uses
- 'icellnet' : Sum of the L-R expression product of a pair of cells
use_ppi_score : boolean, default=False
Whether using a weight of LR pairs specified in the ppi_data
to compute the scores.
verbose : boolean, default=True
Whether printing or not steps of the analysis.
Returns
-------
self.interaction_elements['cci_matrix'] : pandas.DataFrame
Contains CCI scores for each pair of cells
'''
if cci_score is None:
cci_score = self.cci_score
else:
assert isinstance(cci_score, str)
### Compute pairwise physical interactions
if verbose:
print("Computing pairwise interactions")
# Compute pair by pair
for pair in self.interaction_elements['pairs']:
cell1 = self.interaction_elements['cells'][pair[0]]
cell2 = self.interaction_elements['cells'][pair[1]]
cci_value = self.pair_cci_score(cell1,
cell2,
cci_score=cci_score,
use_ppi_score=use_ppi_score,
verbose=verbose)
self.interaction_elements['cci_matrix'].at[pair[0], pair[1]] = cci_value
if self.cci_type == 'undirected':
self.interaction_elements['cci_matrix'].at[pair[1], pair[0]] = cci_value
# Compute using matmul -> Too slow and uses a lot of memory TODO: Try to optimize this
# if cci_score == 'bray_curtis':
# cci_matrix = cci_scores.matmul_bray_curtis_like(self.interaction_elements['A_score'],
# self.interaction_elements['B_score'])
# self.interaction_elements['cci_matrix'] = pd.DataFrame(cci_matrix,
# index=self.interaction_elements['cell_names'],
# columns=self.interaction_elements['cell_names']
# )
# Generate distance matrix
if cci_score not in ['count', 'icellnet']:
self.distance_matrix = self.interaction_elements['cci_matrix'].apply(lambda x: 1 - x)
else:
#self.distance_matrix = self.interaction_elements['cci_matrix'].div(self.interaction_elements['cci_matrix'].max().max()).apply(lambda x: 1 - x)
# Regularized distance
mean = np.nanmean(self.interaction_elements['cci_matrix'])
self.distance_matrix = self.interaction_elements['cci_matrix'].div(self.interaction_elements['cci_matrix'] + mean).apply(lambda x: 1 - x)
np.fill_diagonal(self.distance_matrix.values, 0.0) # Make diagonal zero (delete autocrine-interactions)
```

##### `compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None, interaction_columns=('A', 'B'), cells=None, cci_type=None, verbose=True)`

Computes the communication scores for each LR pair in a given pair of sender-receiver cells.

###### Parameters

communication_score : str, default=None Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. If None, the score stored in the attribute analysis_setup will be used. Available communication_scores are:

```
- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
```

use_ppi_score : boolean, default=False Whether to use the weights of LR pairs specified in ppi_data to compute the scores.

ref_ppi_data : pandas.DataFrame, default=None Reference list of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication. It could be the same as 'ppi_data' if ppi_data is not bidirectional (bidirectional meaning that it contains the ProtA-ProtB interaction as well as the ProtB-ProtA interaction). ref_ppi must be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction). If None, the one stored in the attribute ref_ppi will be used.

interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors. If None, the one stored in the attribute interaction_columns will be used.

cells : list, default=None List of cells to consider.

cci_type : str, default=None Type of interaction between two cells. Used to specify if we want to consider a LR pair in both directions. It can be:

- 'undirected'
- 'directed'

If None, the one stored in the attribute analysis_setup will be used.

verbose : boolean, default=True Whether to print the steps of the analysis.

###### Returns

self.interaction_elements['communication_matrix'] : pandas.DataFrame Contains communication scores for each LR pair in a given pair of sender-receiver cells.
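
The layout of the resulting matrix can be sketched as follows: one row per LR pair and one column per cell pair, labeled `'sender;receiver'`. The LR pairs, cell names, and scores are toy values chosen for illustration; only the label format is taken from the source listing.

```python
import pandas as pd

# Toy LR pairs and directed cell pairs (illustrative values only)
ppi_data = pd.DataFrame({'A': ['L1', 'L2'], 'B': ['R1', 'R2']})
cell_pairs = [('T', 'B'), ('B', 'T')]

# Rows are (ligand, receptor) tuples; columns are 'sender;receiver' labels
row_index = list(zip(ppi_data['A'], ppi_data['B']))
col_labels = ['{};{}'.format(sender, receiver) for sender, receiver in cell_pairs]
communication_matrix = pd.DataFrame(index=row_index, columns=col_labels)

# Fill one column per cell pair with made-up scores, in LR-pair order
toy_scores = {('T', 'B'): [1.0, 0.0], ('B', 'T'): [0.5, 2.0]}
for label, pair in zip(col_labels, cell_pairs):
    communication_matrix[label] = toy_scores[pair]

# A column label splits back into its sender and receiver
sender, receiver = col_labels[0].split(';')
print(communication_matrix)
```

Because the labels are joined with `;`, cell names that themselves contain a semicolon would not split back cleanly; that is a property of this label format, not something the sketch guards against.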

## Source code in `cell2cell/core/interaction_space.py`

```
def compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None,
interaction_columns=('A', 'B'), cells=None, cci_type=None, verbose=True):
'''Computes the communication scores for each LR pair in
a given pair of sender-receiver cells.
Parameters
----------
communication_score : str, default=None
Type of communication score to infer the potential use of
a given ligand-receptor pair by a pair of cells/tissues/samples.
If None, the score stored in the attribute analysis_setup
will be used.
Available communication_scores are:
- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
use_ppi_score : boolean, default=False
Whether using a weight of LR pairs specified in the ppi_data
to compute the scores.
ref_ppi_data : pandas.DataFrame, default=None
Reference list of protein-protein interactions (or
ligand-receptor pairs) used for inferring the cell-cell
interactions and communication. It could be the same as
'ppi_data' if ppi_data is not bidirectional (that is,
contains ProtA-ProtB interaction as well as ProtB-ProtA
interaction). ref_ppi must be undirected (contains only
ProtA-ProtB and not ProtB-ProtA interaction). If None
the one stored in the attribute ref_ppi will be used.
interaction_columns : tuple, default=('A', 'B')
Contains the names of the columns where to find the
partners in a dataframe of protein-protein interactions.
If the list is for ligand-receptor pairs, the first column
is for the ligands and the second for the receptors. If
None, the one stored in the attribute interaction_columns
will be used
cells : list, default=None
List of cells to consider.
cci_type : str, default=None
Type of interaction between two cells. Used to specify
if we want to consider a LR pair in both directions.
It can be:
- 'undirected'
- 'directed'
If None, the one stored in the attribute analysis_setup
will be used.
verbose : boolean, default=True
Whether printing or not steps of the analysis.
Returns
-------
self.interaction_elements['communication_matrix'] : pandas.DataFrame
Contains communication scores for each LR pair in a
given pair of sender-receiver cells.
'''
if communication_score is None:
communication_score = self.communication_score
else:
assert isinstance(communication_score, str)
# Cells to consider
if cells is None:
cells = self.interaction_elements['cell_names']
# Labels:
if cci_type is None:
cell_pairs = self.interaction_elements['pairs']
elif cci_type != self.cci_type:
cell_pairs = generate_pairs(cells, cci_type)
else:
#cell_pairs = generate_pairs(cells, self.cci_type) # Think about other scenarios that may need this line
cell_pairs = self.interaction_elements['pairs']
col_labels = ['{};{}'.format(pair[0], pair[1]) for pair in cell_pairs]
# Ref PPI data
if ref_ppi_data is None:
ref_index = self.ppi_data.apply(lambda row: (row['A'], row['B']), axis=1)
keep_index = list(range(self.ppi_data.shape[0]))
else:
ref_ppi = ref_ppi_data.copy()
prot_a = interaction_columns[0]
prot_b = interaction_columns[1]
if ('A' in ref_ppi.columns) & (prot_a != 'A'):
ref_ppi = ref_ppi.drop(columns='A')
if ('B' in ref_ppi.columns) & (prot_b != 'B'):
ref_ppi = ref_ppi.drop(columns='B')
ref_ppi = ref_ppi.rename(columns={prot_a: 'A', prot_b: 'B'})
ref_index = list(ref_ppi.apply(lambda row: (row['A'], row['B']), axis=1).values)
keep_index = list(pd.merge(self.ppi_data, ref_ppi, how='inner').index)
# DataFrame to Store values
communication_matrix = pd.DataFrame(index=ref_index, columns=col_labels)
### Compute pairwise physical interactions
if verbose:
print("Computing pairwise communication")
for i, pair in enumerate(cell_pairs):
cell1 = self.interaction_elements['cells'][pair[0]]
cell2 = self.interaction_elements['cells'][pair[1]]
comm_score = self.pair_communication_score(cell1,
cell2,
communication_score=communication_score,
use_ppi_score=use_ppi_score,
verbose=verbose)
kept_values = comm_score.flatten()[keep_index]
communication_matrix[col_labels[i]] = kept_values
self.interaction_elements['communication_matrix'] = communication_matrix
```

##### `pair_cci_score(self, cell1, cell2, cci_score='bray_curtis', use_ppi_score=False, verbose=True)`

Computes a CCI score for a pair of cells.

###### Parameters

cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the sender.

cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the receiver.

cci_score : str, default='bray_curtis' Scoring function to aggregate the communication scores between a pair of cells. It computes an overall potential of cell-cell interactions. If None, it will use the one stored in the attribute analysis_setup of this object. Options:

```
- 'bray_curtis' : Bray-Curtis-like score
- 'jaccard' : Jaccard-like score
- 'count' : Number of LR pairs that the pair of cells uses
- 'icellnet' : Sum of the L-R expression product of a pair of cells
```

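use_ppi_score : boolean, default=False Whether to use the weights of LR pairs specified in ppi_data to compute the scores.

verbose : boolean, default=True Whether to print the steps of the analysis.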

###### Returns

cci_score : float Overall score for the interaction between a pair of cell-types/tissues/samples. The kind of score depends on the cci_score option selected.
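
As a rough illustration of how a score name selects a scoring function, the sketch below dispatches on `cci_score` through a dictionary. The two scorers are written from the one-line descriptions of `'count'` and `'icellnet'` above and are assumptions, not the functions in `cci_scores`; all names are hypothetical and the expression values are toy data for one sender and one receiver.

```python
import numpy as np

def count_score(ligands, receptors):
    """Number of LR pairs where both partners are expressed (> 0)."""
    return float(np.sum((ligands > 0) & (receptors > 0)))

def icellnet_score(ligands, receptors):
    """Sum over LR pairs of ligand expression times receptor expression."""
    return float(np.nansum(ligands * receptors))

# Map each supported score name to its scorer (hypothetical registry)
SCORERS = {'count': count_score, 'icellnet': icellnet_score}

def toy_pair_cci_score(ligands, receptors, cci_score='count'):
    """Look up the scorer by name; unknown names raise NotImplementedError."""
    if cci_score not in SCORERS:
        raise NotImplementedError("CCI score {} is not implemented".format(cci_score))
    ligands = np.asarray(ligands, dtype=float)
    receptors = np.asarray(receptors, dtype=float)
    return SCORERS[cci_score](ligands, receptors)

sender_ligands = [2.0, 0.0, 1.0]      # ligand expression in the sender
receiver_receptors = [3.0, 5.0, 0.0]  # receptor expression in the receiver
print(toy_pair_cci_score(sender_ligands, receiver_receptors, 'count'))     # 1.0
print(toy_pair_cci_score(sender_ligands, receiver_receptors, 'icellnet'))  # 6.0
```

Only the first LR pair has both partners expressed, so the count is 1 and the product sum is 2 x 3 = 6.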

## Source code in `cell2cell/core/interaction_space.py`

```
def pair_cci_score(self, cell1, cell2, cci_score='bray_curtis', use_ppi_score=False, verbose=True):
'''
Computes a CCI score for a pair of cells.
Parameters
----------
cell1 : cell2cell.core.cell.Cell
First cell-type/tissue/sample to compute the communication
score. In a directed interaction, this is the sender.
cell2 : cell2cell.core.cell.Cell
Second cell-type/tissue/sample to compute the communication
score. In a directed interaction, this is the receiver.
cci_score : str, default='bray_curtis'
Scoring function to aggregate the communication scores between
a pair of cells. It computes an overall potential of cell-cell
interactions. If None, it will use the one stored in the
attribute analysis_setup of this object.
Options:
- 'bray_curtis' : Bray-Curtis-like score
- 'jaccard' : Jaccard-like score
- 'count' : Number of LR pairs that the pair of cells uses
- 'icellnet' : Sum of the L-R expression product of a pair of cells
use_ppi_score : boolean, default=False
Whether using a weight of LR pairs specified in the ppi_data
to compute the scores.
verbose : boolean, default=True
Whether printing or not steps of the analysis.
Returns
-------
cci_score : float
Overall score for the interaction between a pair of
cell-types/tissues/samples. The kind of score depends on
the cci_score option selected.
'''
if verbose:
print("Computing interaction score between {} and {}".format(cell1.type, cell2.type))
if use_ppi_score:
ppi_score = self.ppi_data['score'].values
else:
ppi_score = None
# Calculate cell-cell interaction score
if cci_score == 'bray_curtis':
cci_value = cci_scores.compute_braycurtis_like_cci_score(cell1, cell2, ppi_score=ppi_score)
elif cci_score == 'jaccard':
cci_value = cci_scores.compute_jaccard_like_cci_score(cell1, cell2, ppi_score=ppi_score)
elif cci_score == 'count':
cci_value = cci_scores.compute_count_score(cell1, cell2, ppi_score=ppi_score)
elif cci_score == 'icellnet':
cci_value = cci_scores.compute_icellnet_score(cell1, cell2, ppi_score=ppi_score)
else:
raise NotImplementedError("CCI score {} to compute pairwise cell-interactions is not implemented".format(cci_score))
return cci_value
```

##### `pair_communication_score(self, cell1, cell2, communication_score='expression_thresholding', use_ppi_score=False, verbose=True)`

Computes a communication score for each protein-protein interaction between a pair of cells.

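###### Parameters

cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the sender.

cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the receiver.

communication_score : str, default='expression_thresholding' Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. If None, the score stored in the attribute analysis_setup will be used. Available communication_scores are: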

```
- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
```

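use_ppi_score : boolean, default=False Whether to use the weights of LR pairs specified in ppi_data to compute the scores.

verbose : boolean, default=True Whether to print the steps of the analysis.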

###### Returns

communication_scores : numpy.array An array with the communication scores for each intercellular PPI.
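
The four options can be read as simple element-wise operations on the ligand expression of the sender and the receptor expression of the receiver. The sketch below follows those descriptions with toy data. The function names are hypothetical, and binarizing with `>= cutoff` is an assumption, since the thresholding rule is not shown in this section.

```python
import numpy as np

def binarize(expression, cutoff):
    """Mark a gene as present (1.0) when its expression reaches the cutoff."""
    return (np.asarray(expression, dtype=float) >= cutoff).astype(float)

def toy_binary_scores(ligand, receptor, cutoff):
    """Joint presence: 1.0 only when both partners pass the cutoff."""
    return binarize(ligand, cutoff) * binarize(receptor, cutoff)

def toy_continuous_scores(ligand, receptor, method='expression_product'):
    """Per-LR-pair score from sender ligand and receiver receptor expression."""
    ligand = np.asarray(ligand, dtype=float)
    receptor = np.asarray(receptor, dtype=float)
    if method == 'expression_mean':
        return (ligand + receptor) / 2.0
    if method == 'expression_product':
        return ligand * receptor
    if method == 'expression_gmean':
        return np.sqrt(ligand * receptor)
    raise NotImplementedError("Communication score {} is not implemented".format(method))

ligand = [4.0, 0.0, 1.0]    # one value per LR pair, sender side
receptor = [1.0, 9.0, 1.0]  # one value per LR pair, receiver side
print(toy_binary_scores(ligand, receptor, cutoff=1.0).tolist())                 # [1.0, 0.0, 1.0]
print(toy_continuous_scores(ligand, receptor, 'expression_mean').tolist())      # [2.5, 4.5, 1.0]
print(toy_continuous_scores(ligand, receptor, 'expression_product').tolist())   # [4.0, 0.0, 1.0]
print(toy_continuous_scores(ligand, receptor, 'expression_gmean').tolist())     # [2.0, 0.0, 1.0]
```

Note how the second LR pair scores 4.5 under the mean but 0 under the product and geometric mean, because the ligand is absent in the sender.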

## Source code in `cell2cell/core/interaction_space.py`

```
def pair_communication_score(self, cell1, cell2, communication_score='expression_thresholding',
use_ppi_score=False, verbose=True):
'''Computes a communication score for each protein-protein interaction
between a pair of cells.
Parameters
----------
cell1 : cell2cell.core.cell.Cell
First cell-type/tissue/sample to compute the communication
score. In a directed interaction, this is the sender.
cell2 : cell2cell.core.cell.Cell
Second cell-type/tissue/sample to compute the communication
score. In a directed interaction, this is the receiver.
communication_score : str, default='expression_thresholding'
Type of communication score to infer the potential use of
a given ligand-receptor pair by a pair of cells/tissues/samples.
If None, the score stored in the attribute analysis_setup
will be used.
Available communication_scores are:
- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
use_ppi_score : boolean, default=False
Whether using a weight of LR pairs specified in the ppi_data
to compute the scores.
verbose : boolean, default=True
Whether printing or not steps of the analysis.
Returns
-------
communication_scores : numpy.array
An array with the communication scores for each intercellular
PPI.
'''
# TODO: Implement communication scores
if verbose:
print("Computing communication score between {} and {}".format(cell1.type, cell2.type))
# Check that new score is the same type as score used to build interaction space (binary or continuous)
if (communication_score in ['expression_product', 'expression_correlation', 'expression_mean', 'expression_gmean']) \
& (self.communication_score in ['expression_thresholding', 'differential_combinations']):
raise ValueError('Cannot use {} for this interaction space'.format(communication_score))
if (communication_score in ['expression_thresholding', 'differential_combinations']) \
& (self.communication_score in ['expression_product', 'expression_correlation', 'expression_mean', 'expression_gmean']):
raise ValueError('Cannot use {} for this interaction space'.format(communication_score))
if use_ppi_score:
ppi_score = self.ppi_data['score'].values
else:
ppi_score = None
if communication_score in ['expression_thresholding', 'differential_combinations']:
communication_value = communication_scores.get_binary_scores(cell1=cell1,
cell2=cell2,
ppi_score=ppi_score)
elif communication_score in ['expression_product', 'expression_correlation', 'expression_mean', 'expression_gmean']:
communication_value = communication_scores.get_continuous_scores(cell1=cell1,
cell2=cell2,
ppi_score=ppi_score,
method=communication_score)
else:
raise NotImplementedError(
"Communication score {} to compute pairwise cell-communication is not implemented".format(communication_score))
return communication_value
```
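
The two `ValueError` checks in the listing above amount to requiring that the requested score belongs to the same family (binary or continuous) as the score the interaction space was built with. A compact sketch of that rule, using the score names from the listing and hypothetical helper names, could look like this. Unlike the listing, it also rejects an unknown `built_with` name.

```python
BINARY_SCORES = {'expression_thresholding', 'differential_combinations'}
CONTINUOUS_SCORES = {'expression_product', 'expression_correlation',
                     'expression_mean', 'expression_gmean'}

def score_family(name):
    """Return 'binary' or 'continuous' for a known communication score name."""
    if name in BINARY_SCORES:
        return 'binary'
    if name in CONTINUOUS_SCORES:
        return 'continuous'
    raise NotImplementedError("Communication score {} is not implemented".format(name))

def check_compatible(requested, built_with):
    """Raise ValueError when the two scores belong to different families."""
    if score_family(requested) != score_family(built_with):
        raise ValueError('Cannot use {} for this interaction space'.format(requested))

# Same family: passes silently
check_compatible('expression_mean', 'expression_product')

# Different families: rejected
try:
    check_compatible('expression_mean', 'expression_thresholding')
except ValueError as error:
    print(error)
```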

#### `generate_interaction_elements(modified_rnaseq, ppi_data, cci_type='undirected', cci_matrix_template=None, complex_sep=None, complex_agg_method='min', interaction_columns=('A', 'B'), verbose=True)`

Create all elements needed to perform the analyses of pairwise cell-cell interactions/communication. Corresponds to the interaction elements used by the class InteractionSpace.

###### Parameters

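modified_rnaseq : pandas.DataFrame Preprocessed gene expression data for a bulk or single-cell RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes. The preprocessing may correspond to scoring the gene expression as binary or continuous values depending on the scoring function for cell-cell interactions/communication.

ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.

cci_type : str, default='undirected' Specifies whether to compute the cci_score in a directed or undirected way. For a pair of cells A and B, directed means that the ligands are considered only from cell A and receptors only from cell B, or vice versa. Undirected simultaneously considers signaling from cell A to cell B and from cell B to cell A.

cci_matrix_template : pandas.DataFrame, default=None A matrix of shape MxM where M is the number of cell-types/tissues/samples. This is used as a template for storing CCI scores. It may be useful for specifying which pairs of cells to consider.

complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".

complex_agg_method : str, default='min' Method to aggregate the expression value of multiple genes in a complex.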

```
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
```

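interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.

verbose : boolean, default=True Whether to print the steps of the analysis.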

###### Returns

interaction_elements : dict Dictionary containing all the pairs of cells considered (under the key 'pairs'), Cell instances (under the key 'cells'), which include all cells/tissues/organs with their associated datasets (rna_seq, weighted_ppi, etc.), and a Cell-Cell Interaction Matrix to store CCI scores (under the key 'cci_matrix'). A communication matrix is also stored in this object when the communication scores are computed in the InteractionSpace class (under the key 'communication_matrix').
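
To make the role of `complex_sep` and `complex_agg_method` concrete, the sketch below aggregates the subunits of a complex such as "CD74&CD44" into a single expression value per cell type. It is an illustration under stated assumptions, not the library's `add_complexes_to_expression`: `aggregate_complex` is a hypothetical helper, the expression table is toy data, and every subunit is assumed to be present in the table.

```python
import pandas as pd

def aggregate_complex(rnaseq, complex_name, complex_sep='&', agg_method='min'):
    """Aggregate subunit expression of one complex (hypothetical helper)."""
    subunits = complex_name.split(complex_sep)
    expr = rnaseq.loc[subunits]  # rows are genes, columns are cell types
    if agg_method == 'min':
        return expr.min(axis=0)
    if agg_method == 'mean':
        return expr.mean(axis=0)
    if agg_method == 'gmean':
        # Geometric mean: n-th root of the product of the n subunits
        return expr.prod(axis=0) ** (1.0 / len(subunits))
    raise ValueError("Unknown agg_method: {}".format(agg_method))

# Toy expression table: two subunit genes, two cell types
rnaseq = pd.DataFrame({'T': [4.0, 1.0], 'B': [2.0, 8.0]}, index=['CD74', 'CD44'])

for method in ('min', 'mean', 'gmean'):
    print(method, aggregate_complex(rnaseq, 'CD74&CD44', '&', method).tolist())
```

With `'min'`, a complex counts as expressed only as much as its least expressed subunit, which is the most conservative of the three choices.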

## Source code in `cell2cell/core/interaction_space.py`

```
def generate_interaction_elements(modified_rnaseq, ppi_data, cci_type='undirected', cci_matrix_template=None,
complex_sep=None, complex_agg_method='min', interaction_columns=('A', 'B'),
verbose=True):
'''Create all elements needed to perform the analyses of pairwise
cell-cell interactions/communication. Corresponds to the interaction
elements used by the class InteractionSpace.
Parameters
----------
modified_rnaseq : pandas.DataFrame
Preprocessed gene expression data for a bulk or single-cell RNA-seq experiment.
Columns are cell-types/tissues/samples and rows are genes. The preprocessing
may correspond to scoring the gene expression as binary or continuous values
depending on the scoring function for cell-cell interactions/communication.
ppi_data : pandas.DataFrame
List of protein-protein interactions (or ligand-receptor pairs) used for
inferring the cell-cell interactions and communication.
cci_type : str, default='undirected'
Specifies whether computing the cci_score in a directed or undirected
way. For a pair of cells A and B, directed means that the ligands are
considered only from cell A and receptors only from cell B or vice versa.
While undirected simultaneously considers signaling from cell A to
cell B and from cell B to cell A.
cci_matrix_template : pandas.DataFrame, default=None
A matrix of shape MxM where M are cell-types/tissues/samples. This
is used as template for storing CCI scores. It may be useful
for specifying which pairs of cells to consider.
complex_sep : str, default=None
Symbol that separates the protein subunits in a multimeric complex.
For example, '&' is the complex_sep for a list of ligand-receptor pairs
where a protein partner could be "CD74&CD44".
complex_agg_method : str, default='min'
Method to aggregate the expression value of multiple genes in a
complex.
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
interaction_columns : tuple, default=('A', 'B')
Contains the names of the columns where to find the partners in a
dataframe of protein-protein interactions. If the list is for
ligand-receptor pairs, the first column is for the ligands and the second
for the receptors.
verbose : boolean, default=True
Whether printing or not steps of the analysis.
Returns
-------
interaction_elements : dict
Dictionary containing all the pairs of cells considered (under
the key of 'pairs'), Cell instances (under key 'cells')
which include all cells/tissues/organs with their associated datasets
(rna_seq, weighted_ppi, etc) and a Cell-Cell Interaction Matrix
to store CCI scores(under key 'cci_matrix'). A communication matrix
is also stored in this object when the communication scores are
computed in the InteractionSpace class (under key
'communication_score')
'''
if verbose:
print('Creating Interaction Space')
# Include complex expression
if complex_sep is not None:
col_a_genes, complex_a, col_b_genes, complex_b, complexes = get_genes_from_complexes(ppi_data=ppi_data,
complex_sep=complex_sep,
interaction_columns=interaction_columns
)
modified_rnaseq = add_complexes_to_expression(rnaseq_data=modified_rnaseq,
complexes=complexes,
agg_method=complex_agg_method
)
# Cells
cell_instances = list(modified_rnaseq.columns) # @Erick, check if position 0 of columns contain index header.
cell_number = len(cell_instances)
# Generate pairwise interactions
pairwise_interactions = generate_pairs(cell_instances, cci_type)
# Interaction elements
interaction_elements = {}
interaction_elements['cell_names'] = cell_instances
interaction_elements['pairs'] = pairwise_interactions
interaction_elements['cells'] = cell.get_cells_from_rnaseq(modified_rnaseq, verbose=verbose)
# Cell-specific scores in PPIs
# For matmul functions
#interaction_elements['A_score'] = np.array([], dtype=np.int64)#.reshape(ppi_data.shape[0],0)
#interaction_elements['B_score'] = np.array([], dtype=np.int64)#.reshape(ppi_data.shape[0],0)
# For 'for' loop
for cell_instance in interaction_elements['cells'].values():
cell_instance.weighted_ppi = integrate_data.get_weighted_ppi(ppi_data=ppi_data,
modified_rnaseq_data=cell_instance.rnaseq_data,
column='value', # value is in each cell
)
#interaction_elements['A_score'] = np.hstack([interaction_elements['A_score'], cell_instance.weighted_ppi['A'].values])
#interaction_elements['B_score'] = np.hstack([interaction_elements['B_score'], cell_instance.weighted_ppi['B'].values])
# Cell-cell interaction matrix
if cci_matrix_template is None:
interaction_elements['cci_matrix'] = pd.DataFrame(np.zeros((cell_number, cell_number)),
columns=cell_instances,
index=cell_instances)
else:
interaction_elements['cci_matrix'] = cci_matrix_template
return interaction_elements
```
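The three `complex_agg_method` options can be illustrated with a minimal, standalone sketch. It does not call cell2cell: the helper name `aggregate_complex` and the expression values are hypothetical, and only the documented options ('min', 'mean', 'gmean') and the `complex_sep` convention are taken from the docstring above.

```python
from statistics import geometric_mean, mean


def aggregate_complex(complex_name, expression, complex_sep='&', agg_method='min'):
    """Hypothetical helper: aggregate subunit expression for one complex."""
    # Split "CD74&CD44" into its subunits and look up each expression value.
    subunits = complex_name.split(complex_sep)
    values = [expression[gene] for gene in subunits]
    if agg_method == 'min':
        return min(values)
    if agg_method == 'mean':
        return mean(values)
    if agg_method == 'gmean':
        # geometric_mean requires strictly positive values.
        return geometric_mean(values)
    raise ValueError("agg_method must be 'min', 'mean', or 'gmean'")


# Made-up expression values for a single cell type.
expression = {'CD74': 8.0, 'CD44': 2.0}

complex_min = aggregate_complex('CD74&CD44', expression, agg_method='min')
complex_mean = aggregate_complex('CD74&CD44', expression, agg_method='mean')
complex_gmean = aggregate_complex('CD74&CD44', expression, agg_method='gmean')
print(complex_min, complex_mean, complex_gmean)
```

With 'min', a complex is treated as being only as expressed as its least expressed subunit, which is the most conservative of the three choices.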

####
`generate_pairs(cells, cci_type, self_interaction=True, remove_duplicates=True)`

Generates a list of pairs of interacting cell-types/tissues/samples.

###### Parameters

cells : list A list of cell-type/tissue/sample names.

cci_type : str, Type of interactions. Options are:

```
- 'directed' : Directed cell-cell interactions, so pair A-B is different
to pair B-A and both are considered.
- 'undirected' : Undirected cell-cell interactions, so pair A-B is equal
to pair B-A and just one of them is considered.
```

self_interaction : boolean, default=True Whether considering autocrine interactions (pair A-A, B-B, etc).

remove_duplicates : boolean, default=True Whether removing duplicates when a list of cells is passed and names are duplicated. If False and a list [A, A, B] is passed, pairs could be [A-A, A-A, A-B, A-A, A-A, A-B, B-A, B-A, B-B] when self_interaction is True and cci_type is 'directed'. In the same scenario but when remove_duplicates is True, the resulting list would be [A-A, A-B, B-A, B-B].

###### Returns

pairs : list List with pairs of interacting cell-types/tissues/samples.

## Source code in `cell2cell/core/interaction_space.py`

```
def generate_pairs(cells, cci_type, self_interaction=True, remove_duplicates=True):
    '''Generates a list of pairs of interacting cell-types/tissues/samples.

    Parameters
    ----------
    cells : list
        A list of cell-type/tissue/sample names.

    cci_type : str,
        Type of interactions.
        Options are:

        - 'directed' : Directed cell-cell interactions, so pair A-B is different
            to pair B-A and both are considered.
        - 'undirected' : Undirected cell-cell interactions, so pair A-B is equal
            to pair B-A and just one of them is considered.

    self_interaction : boolean, default=True
        Whether considering autocrine interactions (pair A-A, B-B, etc).

    remove_duplicates : boolean, default=True
        Whether removing duplicates when a list of cells is passed and names are
        duplicated. If False and a list [A, A, B] is passed, pairs could be
        [A-A, A-A, A-B, A-A, A-A, A-B, B-A, B-A, B-B] when self_interaction is True
        and cci_type is 'directed'. In the same scenario but when remove_duplicates
        is True, the resulting list would be [A-A, A-B, B-A, B-B].

    Returns
    -------
    pairs : list
        List with pairs of interacting cell-types/tissues/samples.
    '''
    if self_interaction:
        if cci_type == 'directed':
            pairs = list(itertools.product(cells, cells))
            #pairs = list(itertools.combinations(cells + cells, 2)) # Directed
        elif cci_type == 'undirected':
            pairs = list(itertools.combinations(cells, 2)) + [(c, c) for c in cells]  # Undirected
        else:
            raise NotImplementedError("CCI type has to be directed or undirected")
    else:
        if cci_type == 'directed':
            pairs_ = list(itertools.product(cells, cells))
            pairs = []
            for p in pairs_:
                if p[0] == p[1]:
                    continue
                else:
                    pairs.append(p)
        elif cci_type == 'undirected':
            pairs = list(itertools.combinations(cells, 2))
        else:
            raise NotImplementedError("CCI type has to be directed or undirected")
    if remove_duplicates:
        pairs = list(set(pairs))  # Remove duplicates
    return pairs
```
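The pair counts produced by each option can be checked with a short standalone sketch that uses only `itertools`. It mirrors the logic of the listing above but is not the library code. In particular, `itertools.permutations` is swapped in here for the explicit filtering loop and is equivalent only when cell names are unique.

```python
import itertools

cells = ['A', 'B', 'C']

# Directed, with autocrine pairs: every ordered combination (n * n pairs).
directed = list(itertools.product(cells, cells))

# Undirected, with autocrine pairs: unordered pairs plus (c, c) for each cell.
undirected = list(itertools.combinations(cells, 2)) + [(c, c) for c in cells]

# Directed, without autocrine pairs: ordered pairs of distinct cells.
directed_no_self = list(itertools.permutations(cells, 2))

# Deduplicating through a set discards the original order, so sort the
# result if a reproducible order matters downstream.
deduplicated = sorted(set(itertools.product(['A', 'A', 'B'], repeat=2)))

print(len(directed), len(undirected), len(directed_no_self))
print(deduplicated)
```

For n uniquely named cells this gives n² directed pairs and n(n+1)/2 undirected pairs when autocrine interactions are included, and n(n-1) directed pairs when they are excluded.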

##
`datasets`

`special`

###
`anndata`

####
`balf_covid(filename='BALF-COVID19-Liao_et_al-NatMed-2020.h5ad')`

BALF samples from COVID-19 patients. The data consist of 63k immune and epithelial cells from the lungs of 3 control, 3 moderate COVID-19, and 6 severe COVID-19 patients.

This dataset was previously published in [1], and this object contains the raw counts for the annotated cell types available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE145926

References: [1] Liao, M., Liu, Y., Yuan, J. et al. Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19. Nat Med 26, 842–844 (2020). https://doi.org/10.1038/s41591-020-0901-9

###### Parameters

```
filename : str, default='BALF-COVID19-Liao_et_al-NatMed-2020.h5ad'
Path to the h5ad file in case it was manually downloaded.
```

###### Returns

```
Annotated data matrix.
```

## Source code in `cell2cell/datasets/anndata.py`

```
def balf_covid(filename='BALF-COVID19-Liao_et_al-NatMed-2020.h5ad'):
    """BALF samples from COVID-19 patients

    The data consists in 63k immune and epithelial cells in lungs
    from 3 control, 3 moderate COVID-19, and 6 severe COVID-19 patients.

    This dataset was previously published in [1], and this object contains
    the raw counts for the annotated cell types available in:
    https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE145926

    References:
    [1] Liao, M., Liu, Y., Yuan, J. et al.
        Single-cell landscape of bronchoalveolar immune cells in patients
        with COVID-19. Nat Med 26, 842–844 (2020).
        https://doi.org/10.1038/s41591-020-0901-9

    Parameters
    ----------
    filename : str, default='BALF-COVID19-Liao_et_al-NatMed-2020.h5ad'
        Path to the h5ad file in case it was manually downloaded.

    Returns
    -------
    Annotated data matrix.
    """
    url = 'https://zenodo.org/record/7535867/files/BALF-COVID19-Liao_et_al-NatMed-2020.h5ad'
    adata = read(filename, backup_url=url)
    return adata
```

###
`gsea_data`

####
`gsea_msig(organism='human', pathwaydb='GOBP', readable_name=False)`

Load a MSigDB from a gmt file

###### Parameters

organism : str, default='human' Organism for which the DB will be loaded. Available options are {'human', 'mouse'}.

pathwaydb : str, default='GOBP' Molecular Signature Database to load. Available options are {'GOBP', 'KEGG', 'Reactome'}.

readable_name : boolean, default=False If True, the pathway names are transformed to a more readable format. That is, removing underscores and pathway DB name at the beginning.

###### Returns

pathway_per_gene : defaultdict Dictionary containing all genes in the DB as keys, and their values are lists with their pathway annotations.

## Source code in `cell2cell/datasets/gsea_data.py`

```
def gsea_msig(organism='human', pathwaydb='GOBP', readable_name=False):
    '''Load a MSigDB from a gmt file

    Parameters
    ----------
    organism : str, default='human'
        Organism for whom the DB will be loaded.
        Available options are {'human', 'mouse'}.

    pathwaydb : str, default='GOBP'
        Molecular Signature Database to load.
        Available options are {'GOBP', 'KEGG', 'Reactome'}

    readable_name : boolean, default=False
        If True, the pathway names are transformed to a more readable format.
        That is, removing underscores and pathway DB name at the beginning.

    Returns
    -------
    pathway_per_gene : defaultdict
        Dictionary containing all genes in the DB as keys, and
        their values are lists with their pathway annotations.
    '''
    _check_pathwaydb(organism, pathwaydb)

    pathway_per_gene = load_gmt(readable_name=readable_name, **PATHWAY_DATA[organism][pathwaydb])
    return pathway_per_gene
```
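The shape of the returned `pathway_per_gene` dictionary can be seen in a minimal sketch that parses GMT-formatted lines (gene set name, description, then gene symbols, tab-separated). The function `parse_gmt_lines`, its `db_prefix` argument, and the two gene sets are hypothetical. This is not the library's `load_gmt`, and the exact readable-name transformation in cell2cell may differ.

```python
from collections import defaultdict


def parse_gmt_lines(lines, readable_name=False, db_prefix='GOBP_'):
    """Hypothetical parser: map each gene to the pathways that contain it."""
    pathway_per_gene = defaultdict(list)
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 3:
            continue  # Skip empty or malformed lines.
        pathway, genes = fields[0], fields[2:]
        if readable_name:
            # Assumed transformation: drop the DB prefix, then the underscores.
            if pathway.startswith(db_prefix):
                pathway = pathway[len(db_prefix):]
            pathway = pathway.replace('_', ' ')
        for gene in genes:
            pathway_per_gene[gene].append(pathway)
    return pathway_per_gene


# Two made-up gene sets in GMT layout.
gmt_lines = [
    'GOBP_CELL_ADHESION\tdescription\tCD44\tITGB1\n',
    'GOBP_CELL_MIGRATION\tdescription\tCD44\n',
]
pathway_per_gene = parse_gmt_lines(gmt_lines, readable_name=True)
print(dict(pathway_per_gene))
```

Keying the dictionary by gene rather than by pathway makes it cheap to look up every annotation of a ligand or receptor, which is the direction needed when annotating interaction partners.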

###
`heuristic_data`

####
```
HeuristicGOTerms
```

GO terms for contact and secreted proteins.

###### Attributes

contact_go_terms : list List of GO terms associated with proteins that participate in contact interactions (usually on the surface of cells).

mediator_go_terms : list List of GO terms associated with secreted proteins that mediate intercellular interactions or communication.

## Source code in `cell2cell/datasets/heuristic_data.py`

```
class HeuristicGOTerms:
    '''GO terms for contact and secreted proteins.

    Attributes
    ----------
    contact_go_terms : list
        List of GO terms associated with proteins that
        participate in contact interactions (usually
        on the surface of cells).

    mediator_go_terms : list
        List of GO terms associated with secreted
        proteins that mediate intercellular interactions
        or communication.
    '''
    def __init__(self):
        self.contact_go_terms = ['GO:0007155',  # Cell adhesion
                                 'GO:0022608',  # Multicellular organism adhesion
                                 'GO:0098740',  # Multiorganism cell adhesion
                                 'GO:0098743',  # Cell aggregation
                                 'GO:0030054',  # Cell-junction #
                                 'GO:0009986',  # Cell surface #
                                 'GO:0097610',  # Cell surface furrow
                                 'GO:0007160',  # Cell-matrix adhesion
                                 'GO:0043235',  # Receptor complex,
                                 'GO:0008305',  # Integrin complex,
                                 'GO:0043113',  # Receptor clustering
                                 'GO:0009897',  # External side of plasma membrane #
                                 'GO:0038023',  # Signaling receptor activity #
                                 ]

        self.mediator_go_terms = ['GO:0005615',  # Extracellular space
                                  'GO:0005576',  # Extracellular region
                                  'GO:0031012',  # Extracellular matrix
                                  'GO:0005201',  # Extracellular matrix structural constituent
                                  'GO:1990430',  # Extracellular matrix protein binding
                                  'GO:0048018',  # Receptor ligand activity #
                                  ]
```

###
`random_data`

####
`generate_random_cci_scores(cell_number, labels=None, symmetric=True, random_state=None)`

Generates a square cell-cell interaction matrix with random scores.

###### Parameters

cell_number : int Number of cells.

labels : list, default=None List containing labels for each cell. The length of this list must match cell_number.

symmetric : boolean, default=True Whether generating a symmetric CCI matrix.

random_state : int, default=None Seed for randomization.

###### Returns

cci_matrix : pandas.DataFrame Matrix with rows and columns as cells. Values represent a random CCI score between 0 and 1.

## Source code in `cell2cell/datasets/random_data.py`

```
def generate_random_cci_scores(cell_number, labels=None, symmetric=True, random_state=None):
    '''Generates a square cell-cell interaction
    matrix with random scores.

    Parameters
    ----------
    cell_number : int
        Number of cells.

    labels : list, default=None
        List containing labels for each cell. Length of
        this list must match the cell_number.

    symmetric : boolean, default=True
        Whether generating a symmetric CCI matrix.

    random_state : int, default=None
        Seed for randomization.

    Returns
    -------
    cci_matrix : pandas.DataFrame
        Matrix with rows and columns as cells. Values
        represent a random CCI score between 0 and 1.
    '''
    if labels is not None:
        assert len(labels) == cell_number, "Length of labels must match cell_number"
    else:
        labels = ['Cell-{}'.format(n) for n in range(1, cell_number+1)]

    if random_state is not None:
        np.random.seed(random_state)
    cci_scores = np.random.random((cell_number, cell_number))
    if symmetric:
        # Average the matrix with its transpose to make it symmetric
        cci_scores = (cci_scores + cci_scores.T) / 2.
    cci_matrix = pd.DataFrame(cci_scores, index=labels, columns=labels)
    return cci_matrix
```
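The symmetrization step, averaging the matrix with its transpose, can be sketched without NumPy. The function `random_symmetric_scores` is a hypothetical pure-Python analogue. Unlike the listing above, which seeds NumPy's global generator, it uses a local `random.Random` instance, so its numbers will not match those of the library function.

```python
import random


def random_symmetric_scores(cell_number, random_state=None):
    """Hypothetical pure-Python analogue of the symmetric branch."""
    rng = random.Random(random_state)
    raw = [[rng.random() for _ in range(cell_number)] for _ in range(cell_number)]
    # Average each entry with its transposed counterpart: (M + M^T) / 2.
    return [[(raw[i][j] + raw[j][i]) / 2.0 for j in range(cell_number)]
            for i in range(cell_number)]


scores = random_symmetric_scores(4, random_state=0)
is_symmetric = all(scores[i][j] == scores[j][i]
                   for i in range(4) for j in range(4))
print(is_symmetric)
```

Because each averaged entry lies between two values drawn from [0, 1), the scores stay in that range after symmetrization.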

####
`generate_random_metadata(cell_labels, group_number)`

Randomly assigns groups to cell labels.

###### Parameters

cell_labels : list A list of cell labels.

group_number : int Number of major groups of cells.

###### Returns

metadata : pandas.DataFrame DataFrame containing the major groups that each cell received randomly (under column 'Group'). Cells are under the column 'Cell'.

## Source code in `cell2cell/datasets/random_data.py`

```
def generate_random_metadata(cell_labels, group_number):
    '''Randomly assigns groups to cell labels.

    Parameters
    ----------
    cell_labels : list
        A list of cell labels.

    group_number : int
        Number of major groups of cells.

    Returns
    -------
    metadata : pandas.DataFrame
        DataFrame containing the major groups that each cell
        received randomly (under column 'Group'). Cells are
        under the column 'Cell'.
    '''
    metadata = pd.DataFrame()
    metadata['Cell'] = cell_labels

    groups = list(range(1, group_number+1))
    metadata['Group'] = metadata['Cell'].apply(lambda x: np.random.choice(groups, 1)[0])
    return metadata
```

####
`generate_random_ppi(max_size, interactors_A, interactors_B=None, random_state=None, verbose=True)`

Generates a random list of protein-protein interactions.

###### Parameters

max_size : int Maximum number of interactions to obtain. Because the PPIs are obtained by independently resampling interactors A and B rather than by creating all possible combinations (which may demand too much memory), some PPIs can be duplicated, and dropping them yields fewer PPIs than max_size.

interactors_A : list A list of protein names to include in the first column of the PPIs.

interactors_B : list, default=None A list of protein names to include in the second column of the PPIs. If None, interactors_A will be used as interactors_B too.

random_state : int, default=None Seed for randomization.

verbose : boolean, default=True Whether printing or not steps of the analysis.

###### Returns

ppi_data : pandas.DataFrame DataFrame containing a list of protein-protein interactions. It has three columns: 'A', 'B', and 'score' for interactors A, B and weights of interactions, respectively.

## Source code in `cell2cell/datasets/random_data.py`

```
def generate_random_ppi(max_size, interactors_A, interactors_B=None, random_state=None, verbose=True):
    '''Generates a random list of protein-protein interactions.

    Parameters
    ----------
    max_size : int
        Maximum size of interactions to obtain. Since the PPIs
        are obtained by independently resampling interactors A and B
        rather than creating all possible combinations (it may demand too much
        memory), some PPIs can be duplicated and dropping them
        results in a smaller number of PPIs than the max_size.

    interactors_A : list
        A list of protein names to include in the first column of
        the PPIs.

    interactors_B : list, default=None
        A list of protein names to include in the second column
        of the PPIs. If None, interactors_A will be used as
        interactors_B too.

    random_state : int, default=None
        Seed for randomization.

    verbose : boolean, default=True
        Whether printing or not steps of the analysis.

    Returns
    -------
    ppi_data : pandas.DataFrame
        DataFrame containing a list of protein-protein interactions.
        It has three columns: 'A', 'B', and 'score' for interactors
        A, B and weights of interactions, respectively.
    '''
    if interactors_B is not None:
        assert max_size <= len(interactors_A)*len(interactors_B), "The maximum size can't be greater than all combinations between partners A and B"
    else:
        assert max_size <= len(interactors_A)**2, "The maximum size can't be greater than all combinations of partners A"

    if verbose:
        print('Generating random PPI network.')

    def small_block_ppi(size, interactors_A, interactors_B, random_state):
        if random_state is not None:
            random_state += 1
        if interactors_B is None:
            interactors_B = interactors_A

        col_A = resample(interactors_A, n_samples=size, random_state=random_state)
        col_B = resample(interactors_B, n_samples=size, random_state=random_state)

        ppi_data = pd.DataFrame()
        ppi_data['A'] = col_A
        ppi_data['B'] = col_B
        # assign() returns a new DataFrame, so the result has to be kept
        ppi_data = ppi_data.assign(score=1.0)

        ppi_data = ppi.remove_ppi_bidirectionality(ppi_data, ('A', 'B'), verbose=verbose)
        ppi_data = ppi_data.drop_duplicates()
        ppi_data.reset_index(inplace=True, drop=True)
        return ppi_data

    ppi_data = small_block_ppi(max_size*2, interactors_A, interactors_B, random_state)

    # TODO: This part needs to be fixed, it does not converge to the max_size -> len((set(A)) * len(set(B) - set(A)))
    # while ppi_data.shape[0] < size:
    #     if random_state is not None:
    #         random_state += 2
    #     b = small_block_ppi(size, interactors_A, interactors_B, random_state)
    #     print(b)
    #     ppi_data = pd.concat([ppi_data, b])
    #     ppi_data = ppi.remove_ppi_bidirectionality(ppi_data, ('A', 'B'), verbose=verbose)
    #     ppi_data = ppi_data.drop_duplicates()
    #     ppi_data.dropna()
    #     ppi_data.reset_index(inplace=True, drop=True)
    #     print(ppi_data.shape[0])

    if ppi_data.shape[0] > max_size:
        ppi_data = ppi_data.loc[list(range(max_size)), :]
    ppi_data.reset_index(inplace=True, drop=True)
    return ppi_data
```
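Why `max_size` is only an upper bound can be seen in a small pure-Python sketch of the same idea: draw partners independently, then drop repeated interactions. The function `sample_ppi` is hypothetical. It assumes that removing bidirectionality treats A-B and B-A as one interaction, and it uses the standard `random` module instead of the `resample` call in the listing above.

```python
import random


def sample_ppi(max_size, interactors_a, interactors_b=None, random_state=None):
    """Hypothetical sketch: resample partners, then drop repeated pairs."""
    rng = random.Random(random_state)
    if interactors_b is None:
        interactors_b = interactors_a
    seen = set()
    pairs = []
    # Oversample with 2 * max_size draws, as in the listing above.
    for _ in range(max_size * 2):
        a, b = rng.choice(interactors_a), rng.choice(interactors_b)
        key = frozenset((a, b))  # A-B and B-A count as the same interaction.
        if key not in seen:
            seen.add(key)
            pairs.append((a, b))
    # Truncate in case more than max_size unique pairs were drawn.
    return pairs[:max_size]


proteins = ['P1', 'P2', 'P3']
ppis = sample_ppi(9, proteins, random_state=1)
print(len(ppis), ppis)
```

Three proteins allow at most six unordered pairs (three heterotypic plus three self-interactions), so a request for nine can never be met even though 9 <= 3**2 passes the size check.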

####
`generate_random_rnaseq(size, row_names, random_state=None, verbose=True)`

Generates an RNA-seq dataset that is normally distributed gene-wise and size-normalized (each column sums to one million).

###### Parameters

size : int Number of cell-types/tissues/samples (columns).

row_names : array-like List containing the name of genes (rows).

random_state : int, default=None Seed for randomization.

verbose : boolean, default=True Whether printing or not steps of the analysis.

###### Returns

df : pandas.DataFrame Dataframe containing gene expression given the list of genes for each cell-type/tissue/sample.

## Source code in `cell2cell/datasets/random_data.py`

```
def generate_random_rnaseq(size, row_names, random_state=None, verbose=True):
    '''
    Generates an RNA-seq dataset that is normally distributed gene-wise and size
    normalized (each column sums up to a million).

    Parameters
    ----------
    size : int
        Number of cell-types/tissues/samples (columns).

    row_names : array-like
        List containing the name of genes (rows).

    random_state : int, default=None
        Seed for randomization.

    verbose : boolean, default=True
        Whether printing or not steps of the analysis.

    Returns
    -------
    df : pandas.DataFrame
        Dataframe containing gene expression given the list
        of genes for each cell-type/tissue/sample.
    '''
    if verbose:
        print('Generating random RNA-seq dataset.')
    columns = ['Cell-{}'.format(c) for c in range(1, size+1)]

    if random_state is not None:
        np.random.seed(random_state)
    data = np.random.randn(len(row_names), len(columns))  # Normal distribution
    # Shift each gene (row) by the absolute value of its minimum so that
    # no value is negative (renamed from `min` to avoid shadowing the builtin)
    row_min = np.abs(np.amin(data, axis=1))
    row_min = row_min.reshape((len(row_min), 1))

    data = data + row_min
    df = pd.DataFrame(data, index=row_names, columns=columns)
    if verbose:
        print('Normalizing random RNA-seq dataset (into TPM)')
    df = rnaseq.scale_expression_by_sum(df, axis=0, sum_value=1e6)
    return df
```
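The two transformations applied to the random values, a gene-wise shift followed by a column-wise rescaling, can be sketched in plain Python. The helper `shift_and_scale` and the input values are hypothetical; the sketch stands in for the NumPy broadcasting and the `scale_expression_by_sum` call in the listing above, whose implementation is not shown here.

```python
def shift_and_scale(data, sum_value=1e6):
    """Hypothetical sketch: make rows non-negative, then scale columns."""
    # Shift each gene (row) by the absolute value of its minimum.
    shifted = [[v + abs(min(row)) for v in row] for row in data]
    n_cols = len(shifted[0])
    col_sums = [sum(row[j] for row in shifted) for j in range(n_cols)]
    # Rescale each cell (column) so that it sums to sum_value.
    return [[row[j] / col_sums[j] * sum_value for j in range(n_cols)]
            for row in shifted]


# Made-up values: 3 genes (rows) x 3 cells (columns).
raw = [[-1.0, 0.5, 2.0],
       [0.3, -0.2, 1.1],
       [1.5, 0.0, -0.5]]
scaled = shift_and_scale(raw)
column_totals = [sum(row[j] for row in scaled) for j in range(3)]
print(column_totals)
```

The shift guarantees non-negative expression values, and the rescaling makes columns comparable in the same way that TPM-like normalization does.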

###
`toy_data`

####
`generate_toy_distance()`

Generates a square matrix with cell-cell distance.

###### Returns

distance : pandas.DataFrame DataFrame with Euclidean-like distance between each pair of cells in the toy RNA-seq dataset.

## Source code in `cell2cell/datasets/toy_data.py`

```
def generate_toy_distance():
    '''Generates a square matrix with cell-cell distance.

    Returns
    -------
    distance : pandas.DataFrame
        DataFrame with Euclidean-like distance between each
        pair of cells in the toy RNA-seq dataset.
    '''
    data = np.asarray([[0.0, 10.0, 12.0, 5.0, 3.0],
                       [10.0, 0.0, 15.0, 8.0, 9.0],
                       [12.0, 15.0, 0.0, 4.5, 7.5],
                       [5.0, 8.0, 4.5, 0.0, 6.5],
                       [3.0, 9.0, 7.5, 6.5, 0.0],
                       ])
    distance = pd.DataFrame(data,
                            index=['C1', 'C2', 'C3', 'C4', 'C5'],
                            columns=['C1', 'C2', 'C3', 'C4', 'C5']
                            )
    return distance
```

####
`generate_toy_metadata()`

Generates metadata for cells in the toy RNA-seq dataset.

###### Returns

metadata : pandas.DataFrame DataFrame with metadata for each cell. Metadata contains the major groups of those cells.

## Source code in `cell2cell/datasets/toy_data.py`

```
def generate_toy_metadata():
    '''Generates metadata for cells in the toy RNA-seq dataset.

    Returns
    -------
    metadata : pandas.DataFrame
        DataFrame with metadata for each cell. Metadata contains the
        major groups of those cells.
    '''
    data = np.asarray([['C1', 'G1'],
                       ['C2', 'G2'],
                       ['C3', 'G3'],
                       ['C4', 'G3'],
                       ['C5', 'G1']
                       ])
    metadata = pd.DataFrame(data, columns=['#SampleID', 'Groups'])
    return metadata
```

####
`generate_toy_ppi(prot_complex=False)`

Generates a toy list of protein-protein interactions.

###### Parameters

prot_complex : boolean, default=False Whether including PPIs where interactors could contain multimeric complexes.

###### Returns

ppi : pandas.DataFrame Dataframe containing PPIs. Columns are 'A' (first interacting partners), 'B' (second interacting partners) and 'score' for weighting each PPI.

## Source code in `cell2cell/datasets/toy_data.py`

```
def generate_toy_ppi(prot_complex=False):
    '''Generates a toy list of protein-protein interactions.

    Parameters
    ----------
    prot_complex : boolean, default=False
        Whether including PPIs where interactors could contain
        multimeric complexes.

    Returns
    -------
    ppi : pandas.DataFrame
        Dataframe containing PPIs. Columns are 'A' (first interacting
        partners), 'B' (second interacting partners) and 'score'
        for weighting each PPI.
    '''
    if prot_complex:
        data = np.asarray([['Protein-A', 'Protein-B'],
                           ['Protein-B', 'Protein-C'],
                           ['Protein-C', 'Protein-A'],
                           ['Protein-B', 'Protein-B'],
                           ['Protein-B', 'Protein-A'],
                           ['Protein-E', 'Protein-F'],
                           ['Protein-F', 'Protein-F'],
                           ['Protein-C&Protein-E', 'Protein-F'],
                           ['Protein-B', 'Protein-E'],
                           ['Protein-A&Protein-B', 'Protein-F'],
                           ])
    else:
        data = np.asarray([['Protein-A', 'Protein-B'],
                           ['Protein-B', 'Protein-C'],
                           ['Protein-C', 'Protein-A'],
                           ['Protein-B', 'Protein-B'],
                           ['Protein-B', 'Protein-A'],
                           ['Protein-E', 'Protein-F'],
                           ['Protein-F', 'Protein-F'],
                           ['Protein-C', 'Protein-F'],
                           ['Protein-B', 'Protein-E'],
                           ['Protein-A', 'Protein-F'],
                           ])
    ppi = pd.DataFrame(data, columns=['A', 'B'])
    ppi = ppi.assign(score=1.0)
    return ppi
```

####
`generate_toy_rnaseq()`

Generates a toy RNA-seq dataset

###### Returns

rnaseq : pandas.DataFrame DataFrame containing the toy RNA-seq dataset. Columns are cells and rows are genes.

## Source code in `cell2cell/datasets/toy_data.py`

```
def generate_toy_rnaseq():
    '''Generates a toy RNA-seq dataset

    Returns
    -------
    rnaseq : pandas.DataFrame
        DataFrame containing the toy RNA-seq dataset. Columns
        are cells and rows are genes.
    '''
    data = np.asarray([[5, 10, 8, 15, 2],
                       [15, 5, 20, 1, 30],
                       [18, 12, 5, 40, 20],
                       [9, 30, 22, 5, 2],
                       [2, 1, 1, 27, 15],
                       [30, 11, 16, 5, 12],
                       ])
    rnaseq = pd.DataFrame(data,
                          index=['Protein-A', 'Protein-B', 'Protein-C', 'Protein-D', 'Protein-E', 'Protein-F'],
                          columns=['C1', 'C2', 'C3', 'C4', 'C5']
                          )
    rnaseq.index.name = 'gene_id'
    return rnaseq
```

##
`external`

`special`

###
`goenrich`

####
`gene2go(filename, experimental=False, tax_id=9606, **kwds)`

Reads a GO annotation file.

:param filename: path to the annotation file (passed to `pandas.read_csv`)

:param experimental: use only experimentally validated annotations

:param tax_id: filter according to taxon

## Source code in `cell2cell/external/goenrich.py`

```
def gene2go(filename, experimental=False, tax_id=9606, **kwds):
    """ read go-annotation file

    :param filename: path to the annotation file (passed to pandas.read_csv)
    :param experimental: use only experimentally validated annotations
    :param tax_id: filter according to taxon
    """
    defaults = {'comment': '#',
                'names': GENE2GO_COLUMNS}
    defaults.update(kwds)
    result = pd.read_csv(filename, sep='\t', **defaults)

    # Keep only the annotations of the requested taxon
    retain_mask = result.tax_id == tax_id
    result.drop(result.index[~retain_mask], inplace=True)

    if experimental:
        retain_mask = result.Evidence.isin(EXPERIMENTAL_EVIDENCE)
        result.drop(result.index[~retain_mask], inplace=True)

    return result
```
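The two filters applied by `gene2go` (by taxon, then optionally by evidence code) can be sketched on in-memory rows. The function `filter_annotations` and the rows are hypothetical. The `tax_id` and `Evidence` field names come from the listing above, while `GeneID`, `GO_ID`, and the contents of `EXPERIMENTAL_CODES` are assumptions, since the `EXPERIMENTAL_EVIDENCE` constant is not shown in this section.

```python
# Assumed set of experimental GO evidence codes; the actual
# EXPERIMENTAL_EVIDENCE constant is not shown in this section.
EXPERIMENTAL_CODES = {'EXP', 'IDA', 'IPI', 'IMP', 'IGI', 'IEP'}


def filter_annotations(rows, tax_id=9606, experimental=False):
    """Hypothetical sketch of the two-step filter applied by gene2go."""
    # Step 1: keep only the annotations of the requested taxon.
    kept = [row for row in rows if row['tax_id'] == tax_id]
    # Step 2: optionally keep only experimentally supported annotations.
    if experimental:
        kept = [row for row in kept if row['Evidence'] in EXPERIMENTAL_CODES]
    return kept


# Made-up annotation rows.
rows = [
    {'tax_id': 9606, 'GeneID': 1, 'GO_ID': 'GO:0007155', 'Evidence': 'IDA'},
    {'tax_id': 9606, 'GeneID': 2, 'GO_ID': 'GO:0005615', 'Evidence': 'IEA'},
    {'tax_id': 10090, 'GeneID': 3, 'GO_ID': 'GO:0007155', 'Evidence': 'EXP'},
]
human_all = filter_annotations(rows)
human_experimental = filter_annotations(rows, experimental=True)
print(len(human_all), len(human_experimental))
```

Note that the defaults differ between the two readers in this module: `gene2go` keeps all evidence codes unless `experimental=True` is passed, whereas `goa` filters to experimental evidence by default.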

####
`goa(filename, experimental=True, **kwds)`

Reads a GO annotation file.

:param filename: path to the annotation file (passed to `pandas.read_csv`)

:param experimental: use only experimentally validated annotations

## Source code in `cell2cell/external/goenrich.py`

```
def goa(filename, experimental=True, **kwds):
    """ read go-annotation file

    :param filename: path to the annotation file (passed to pandas.read_csv)
    :param experimental: use only experimentally validated annotations
    """
    defaults = {'comment': '!',
                'names': GENE_ASSOCIATION_COLUMNS}

    # The evidence column is needed for filtering, so make sure it is loaded
    if experimental and 'usecols' in kwds:
        kwds['usecols'] += ('evidence_code',)

    defaults.update(kwds)
    result = pd.read_csv(filename, sep='\t', **defaults)

    if experimental:
        retain_mask = result.evidence_code.isin(EXPERIMENTAL_EVIDENCE)
        result.drop(result.index[~retain_mask], inplace=True)

    return result
```

####
`ontology(file)`

Reads an ontology from a file.

:param file: file path or file handle

## Source code in `cell2cell/external/goenrich.py`

```
def ontology(file):
    """ read ontology from file
    :param file: file path or file handle
    """
    O = nx.DiGraph()

    if isinstance(file, str):
        f = open(file)
        we_opened_file = True
    else:
        f = file
        we_opened_file = False

    try:
        tokens = _tokenize(f)
        terms = _filter_terms(tokens)
        entries = _parse_terms(terms)
        nodes, edges = zip(*entries)
        O.add_nodes_from(nodes)
        O.add_edges_from(itertools.chain.from_iterable(edges))
        O.graph['roots'] = {data['name'] : n for n, data in O.nodes.items()
                            if data['name'] == data['namespace']}
    finally:
        if we_opened_file:
            f.close()

    for
```