Documentation for cell2cell
This documentation is for our cell2cell suite, which includes the regular cell2cell and Tensor-cell2cell tools. The former is for inferring cell-cell interactions and communication in one sample or context, while the latter is for deconvolving complex patterns of cell-cell communication across multiple samples or contexts simultaneously into interpretable factors representing patterns of communication.
Here, multiple classes and functions are implemented to facilitate the analyses, including a variety of visualizations to simplify the interpretation of results:
- cell2cell.analysis : Includes simplified pipelines for running the analyses, and functions for downstream analyses of Tensor-cell2cell.
- cell2cell.clustering : Includes multiple scipy-based functions for performing clustering methods.
- cell2cell.core : Includes the core functions for inferring cell-cell interactions and communication. It includes scoring methods, cell classes, and interaction spaces.
- cell2cell.datasets : Includes toy datasets and annotations for testing functions in basic scenarios.
- cell2cell.external : Includes built-in approaches borrowed from other tools to avoid incompatibilities (e.g. UMAP, tensorly, and PCoA).
- cell2cell.io : Includes functions for opening and saving diverse types of files.
- cell2cell.plotting : Includes all the visualization options that cell2cell offers.
- cell2cell.preprocessing : Includes functions for manipulating data and variables (e.g. data preprocessing, integration, permutation, among others).
- cell2cell.spatial : Includes filtering of cell-cell interactions results given intercellular distance, as well as defining neighborhoods by grids or moving windows.
- cell2cell.stats : Includes statistical analyses such as enrichment analysis, multiple test correction methods, permutation approaches, and Gini coefficient.
- cell2cell.tensor : Includes all functions pertinent to the analysis of Tensor-cell2cell.
- cell2cell.utils : Includes general utilities for analyzing networks and performing parallel computing.
Below, all the inputs, parameters (including their different options), and outputs are detailed. Source code of the functions is also included.
analysis
cell2cell_pipelines
BulkInteractions
Interaction class with all necessary methods to run the cell2cell pipeline on a bulk RNA-seq dataset. Cells here could be represented by tissues, samples or any bulk organization of cells.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment. Columns are samples and rows are genes.
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
metadata : pandas.DataFrame, default=None Metadata associated with the samples in the RNA-seq dataset.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
communication_score : str, default='expression_thresholding' Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. Available communication_scores are:
- 'expression_thresholding' : Computes the joint presence of a ligand from a
sender cell and of a receptor on a receiver cell
from binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression of a ligand
from a sender cell and the expression of a receptor on a
receiver cell.
- 'expression_product' : Computes the product between the expression of a
ligand from a sender cell and the expression of a
receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
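The four options above can be illustrated with a minimal plain-Python sketch for a single ligand-receptor pair. The function name `communication_score` and the example values are illustrative assumptions, not part of cell2cell's API; the library itself works on dataframes and may differ in details such as how ties at the threshold are handled.

```python
import math

def communication_score(ligand_expr, receptor_expr,
                        method="expression_thresholding", threshold=10.0):
    """Score one ligand-receptor pair for one sender-receiver pair (illustrative only)."""
    if method == "expression_thresholding":
        # Binarize each gene, then require both partners to be present.
        return int(ligand_expr > threshold) * int(receptor_expr > threshold)
    if method == "expression_mean":
        return (ligand_expr + receptor_expr) / 2.0
    if method == "expression_product":
        return ligand_expr * receptor_expr
    if method == "expression_gmean":
        return math.sqrt(ligand_expr * receptor_expr)
    raise ValueError(f"Unknown communication score: {method}")

METHODS = ("expression_thresholding", "expression_mean",
           "expression_product", "expression_gmean")

# Ligand expressed at 16 in the sender, receptor at 4 in the receiver.
scores = {m: communication_score(16.0, 4.0, method=m, threshold=10.0)
          for m in METHODS}
```

With these values the receptor falls below the threshold, so the thresholding score is 0 while the continuous scores remain positive.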
cci_score : str, default='bray_curtis' Scoring function to aggregate the communication scores between a pair of cells. It computes an overall potential of cell-cell interactions. Options:
- 'bray_curtis' : Bray-Curtis-like score.
- 'jaccard' : Jaccard-like score.
- 'count' : Number of LR pairs that the pair of cells use.
- 'icellnet' : Sum of the L-R expression product of a pair of cells
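As a rough illustration of how such scores aggregate per-pair values, the sketch below applies the usual Bray-Curtis and Jaccard similarity formulas to two aligned vectors: the ligand values in one cell and the receptor values in the other. The function `cci_score` is hypothetical, and the assumption that cell2cell's "-like" scores follow exactly these formulas is mine; consult the source code for the precise definitions.

```python
def cci_score(ligands, receptors, method="bray_curtis"):
    """Aggregate per-LR-pair values into one score (illustrative only).

    ligands[i] and receptors[i] describe the i-th LR pair: the ligand value
    in one cell and the receptor value in the other cell.
    """
    shared = sum(l * r for l, r in zip(ligands, receptors))
    total_l = sum(l * l for l in ligands)
    total_r = sum(r * r for r in receptors)
    if method in ("count", "icellnet"):
        # 'count' assumes binary inputs; 'icellnet' assumes expression values.
        return shared
    if method == "bray_curtis":
        denom = total_l + total_r
        return 2 * shared / denom if denom else 0.0
    if method == "jaccard":
        denom = total_l + total_r - shared
        return shared / denom if denom else 0.0
    raise ValueError(f"Unknown CCI score: {method}")

# Binarized presence of four ligands in cell A and their receptors in cell B.
ligands_in_a = [1, 1, 0, 1]
receptors_in_b = [1, 0, 0, 1]

bray_curtis = cci_score(ligands_in_a, receptors_in_b, "bray_curtis")
jaccard = cci_score(ligands_in_a, receptors_in_b, "jaccard")
count = cci_score(ligands_in_a, receptors_in_b, "count")
```

Here two LR pairs are usable by the pair of cells, so the count is 2, while the two similarity scores normalize that count by how many ligands and receptors each cell expresses.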
cci_type : str, default='undirected' Specifies whether to compute the cci_score in a directed or undirected way. For a pair of cells A and B, directed means that ligands are considered only from cell A and receptors only from cell B, or vice versa, whereas undirected simultaneously considers signaling from cell A to cell B and from cell B to cell A.
sample_col : str, default='sampleID' Column-name for the samples in the metadata.
group_col : str, default='tissue' Column-name for the grouping information associated with the samples in the metadata.
expression_threshold : float, default=10 Threshold value to binarize gene expression when using communication_score='expression_thresholding'. Units have to be the same as the rnaseq_data matrix (e.g., TPMs, counts, etc.).
complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
complex_agg_method : str, default='min' Method to aggregate the expression value of multiple genes in a complex.
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
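To make the interplay between complex_sep and complex_agg_method concrete, here is a small plain-Python sketch. The helper `complex_expression` and the expression values are hypothetical; only the three aggregation options come from the description above.

```python
import math

def complex_expression(partner, expression, complex_sep="&", method="min"):
    """Aggregate the expression of all subunits of a (possibly multimeric) partner."""
    # A partner without the separator is treated as a single gene.
    subunits = partner.split(complex_sep) if complex_sep else [partner]
    values = [expression[gene] for gene in subunits]
    if method == "min":
        return min(values)
    if method == "mean":
        return sum(values) / len(values)
    if method == "gmean":
        return math.prod(values) ** (1.0 / len(values))
    raise ValueError(f"Unknown aggregation method: {method}")

# Hypothetical expression values for one sample.
expression = {"CD74": 8.0, "CD44": 2.0}

agg_min = complex_expression("CD74&CD44", expression, method="min")
agg_mean = complex_expression("CD74&CD44", expression, method="mean")
agg_gmean = complex_expression("CD74&CD44", expression, method="gmean")
```

The 'min' option is the most conservative choice, since the complex is only as abundant as its least expressed subunit.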
verbose : boolean, default=False Whether to print the steps of the analysis.
Attributes
rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment. Columns are samples and rows are genes.
metadata : pandas.DataFrame Metadata associated with the samples in the RNA-seq dataset.
index_col : str Column-name for the samples in the metadata.
group_col : str Column-name for the grouping information associated with the samples in the metadata.
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
complex_sep : str Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
complex_agg_method : str Method to aggregate the expression value of multiple genes in a complex.
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
ref_ppi : pandas.DataFrame Reference list of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication. It could be the same as 'ppi_data' if ppi_data is not bidirectional (that is, contains ProtA-ProtB interaction as well as ProtB-ProtA interaction). ref_ppi must be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction).
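The difference between a bidirectional list and an undirected reference list can be sketched in plain Python. The helper `to_undirected` is hypothetical and only illustrates the idea of keeping one orientation per interaction; it is not how cell2cell builds ref_ppi internally.

```python
def to_undirected(ppi_pairs):
    """Keep the first orientation seen for each interaction (illustrative only)."""
    seen = set()
    undirected = []
    for prot_a, prot_b in ppi_pairs:
        key = frozenset((prot_a, prot_b))  # the same key for A-B and B-A
        if key not in seen:
            seen.add(key)
            undirected.append((prot_a, prot_b))
    return undirected

# A bidirectional list contains both ProtA-ProtB and ProtB-ProtA.
bidirectional = [("ProtA", "ProtB"), ("ProtB", "ProtA"), ("ProtC", "ProtD")]
reference = to_undirected(bidirectional)
```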
interaction_columns : tuple Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
analysis_setup : dict Contains main setup for running the cell-cell interactions and communication analyses. Three main setups are needed (passed as keys):
- 'communication_score' : is the type of communication score used to detect
active ligand-receptor pairs between each pair of cells.
It can be:
- 'expression_thresholding'
- 'expression_product'
- 'expression_mean'
- 'expression_gmean'
- 'cci_score' : is the scoring function to aggregate the communication
scores.
It can be:
- 'bray_curtis'
- 'jaccard'
- 'count'
- 'icellnet'
- 'cci_type' : is the type of interaction between two cells. If it is
undirected, all ligands and receptors are considered from both cells.
If it is directed, ligands from one cell and receptors from the other
are considered separately with respect to ligands from the second
cell and receptor from the first one.
So, it can be:
- 'undirected'
- 'directed'
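A dictionary with these three keys might look like the sketch below. The `check_analysis_setup` helper is hypothetical and is shown only to make the allowed values explicit; cell2cell may or may not validate the dictionary in this way.

```python
ALLOWED_VALUES = {
    "communication_score": {"expression_thresholding", "expression_product",
                            "expression_mean", "expression_gmean"},
    "cci_score": {"bray_curtis", "jaccard", "count", "icellnet"},
    "cci_type": {"undirected", "directed"},
}

def check_analysis_setup(setup):
    """Return a list of problems found in an analysis_setup dict (hypothetical helper)."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key not in setup:
            problems.append(f"missing key: {key}")
        elif setup[key] not in allowed:
            problems.append(f"invalid value for {key}: {setup[key]!r}")
    return problems

analysis_setup = {
    "communication_score": "expression_thresholding",
    "cci_score": "bray_curtis",
    "cci_type": "undirected",
}
problems = check_analysis_setup(analysis_setup)
```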
cutoff_setup : dict Contains two keys: 'type' and 'parameter'. The first key represents the way to use a cutoff or threshold, while 'parameter' is the value used to binarize the expression values. The key 'type' can be:
- 'local_percentile' : computes the value of a given percentile, for each
gene independently. In this case, the parameter corresponds to the
percentile to compute, as a float value between 0 and 1.
- 'global_percentile' : computes the value of a given percentile from all
genes and samples simultaneously. In this case, the parameter
corresponds to the percentile to compute, as a float value between
0 and 1. All genes have the same cutoff.
- 'file' : load a cutoff table from a file. Parameter in this case is the
path of that file. It must contain the same genes as index and same
samples as columns.
- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in each sample. This allows to use specific cutoffs for
each sample. The columns here must be the same as the ones in the
rnaseq_data.
- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in only one column. These cutoffs will be applied to
all samples.
- 'constant_value' : binarizes the expression. Evaluates whether
expression is greater than the value input in the parameter.
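The percentile-based and constant cutoffs can be sketched in plain Python as follows. The helpers `percentile`, `compute_cutoffs` and `binarize` are hypothetical, the linear-interpolation percentile is an assumption, and the 'file', 'multi_col_matrix' and 'single_col_matrix' types are omitted because they simply supply the cutoffs directly.

```python
def percentile(values, q):
    """Percentile with linear interpolation; q is a fraction between 0 and 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def compute_cutoffs(expression, cutoff_setup):
    """Return one cutoff per gene for a {gene: {sample: value}} table."""
    kind, param = cutoff_setup["type"], cutoff_setup["parameter"]
    if kind == "local_percentile":
        # Each gene gets its own cutoff.
        return {g: percentile(v.values(), param) for g, v in expression.items()}
    if kind == "global_percentile":
        # One cutoff computed from all genes and samples together.
        pooled = [x for v in expression.values() for x in v.values()]
        cutoff = percentile(pooled, param)
        return {g: cutoff for g in expression}
    if kind == "constant_value":
        return {g: param for g in expression}
    raise ValueError(f"Cutoff type not covered by this sketch: {kind}")

def binarize(expression, cutoffs):
    """1 where expression is greater than the gene's cutoff, else 0."""
    return {g: {s: int(x > cutoffs[g]) for s, x in v.items()}
            for g, v in expression.items()}

expression = {
    "GeneA": {"S1": 0.0, "S2": 10.0, "S3": 20.0},
    "GeneB": {"S1": 100.0, "S2": 200.0, "S3": 300.0},
}
local = compute_cutoffs(expression, {"type": "local_percentile", "parameter": 0.5})
global_ = compute_cutoffs(expression, {"type": "global_percentile", "parameter": 0.5})
constant = compute_cutoffs(expression, {"type": "constant_value", "parameter": 10})
binary = binarize(expression, constant)
```

In this toy table the local cutoffs differ per gene, whereas the global cutoff is shared and therefore marks the lowly expressed GeneA as absent in every sample.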
interaction_space : cell2cell.core.interaction_space.InteractionSpace Interaction space that contains all the required elements to perform the cell-cell interaction and communication analysis between every pair of cells. After performing the analyses, the results are stored in this object.
Source code in cell2cell/analysis/cell2cell_pipelines.py
interaction_elements
property
Returns the interaction elements within an interaction space.
compute_pairwise_cci_scores(cci_score=None, use_ppi_score=False, verbose=True)
Computes overall CCI scores for each pair of cells.
Parameters
cci_score : str, default=None Scoring function to aggregate the communication scores between a pair of cells. It computes an overall potential of cell-cell interactions. If None, it will use the one stored in the attribute analysis_setup of this object. Options:
- 'bray_curtis' : Bray-Curtis-like score.
- 'jaccard' : Jaccard-like score.
- 'count' : Number of LR pairs that the pair of cells use.
- 'icellnet' : Sum of the L-R expression product of a pair of cells
use_ppi_score : boolean, default=False Whether to use the weights of LR pairs specified in ppi_data when computing the scores.
verbose : boolean, default=True Whether to print the steps of the analysis.
Source code in cell2cell/analysis/cell2cell_pipelines.py
compute_pairwise_communication_scores(communication_score=None, use_ppi_score=False, ref_ppi_data=None, interaction_columns=None, cells=None, cci_type=None, verbose=True)
Computes the communication scores for each LR pair in a given pair of sender-receiver cells.
Parameters
communication_score : str, default=None Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. If None, the score stored in the attribute analysis_setup will be used. Available communication_scores are:
- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
use_ppi_score : boolean, default=False Whether to use the weights of LR pairs specified in ppi_data when computing the scores.
ref_ppi_data : pandas.DataFrame, default=None Reference list of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication. It could be the same as 'ppi_data' if ppi_data is not bidirectional (that is, contains ProtA-ProtB interaction as well as ProtB-ProtA interaction). ref_ppi must be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction). If None the one stored in the attribute ref_ppi will be used.
interaction_columns : tuple, default=None Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors. If None, the one stored in the attribute interaction_columns will be used.
cells : list, default=None List of cells to consider.
cci_type : str, default=None Type of interaction between two cells. Used to specify if we want to consider a LR pair in both directions. It can be:
- 'undirected'
- 'directed'
If None, 'directed' will be used.
verbose : boolean, default=True Whether to print the steps of the analysis.
Source code in cell2cell/analysis/cell2cell_pipelines.py
SingleCellInteractions
Interaction class with all necessary methods to run the cell2cell pipeline on a single-cell RNA-seq dataset.
Parameters
rnaseq_data : pandas.DataFrame or scanpy.AnnData Gene expression data for a single-cell RNA-seq experiment. If it is a dataframe, columns are single cells and rows are genes, while if it is an AnnData object, columns are genes and rows are single cells.
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
metadata : pandas.DataFrame Metadata containing the cell type of each single cell in the RNA-seq dataset.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
communication_score : str, default='expression_thresholding' Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. Available communication_scores are:
- 'expression_thresholding' : Computes the joint presence of a ligand from a
sender cell and of a receptor on a receiver cell
from binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression of a ligand
from a sender cell and the expression of a receptor on a
receiver cell.
- 'expression_product' : Computes the product between the expression of a
ligand from a sender cell and the expression of a
receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
cci_score : str, default='bray_curtis' Scoring function to aggregate the communication scores between a pair of cells. It computes an overall potential of cell-cell interactions. Options:
- 'bray_curtis' : Bray-Curtis-like score.
- 'jaccard' : Jaccard-like score.
- 'count' : Number of LR pairs that the pair of cells use.
- 'icellnet' : Sum of the L-R expression product of a pair of cells
cci_type : str, default='undirected' Specifies whether to compute the cci_score in a directed or undirected way. For a pair of cells A and B, directed means that ligands are considered only from cell A and receptors only from cell B, or vice versa, whereas undirected simultaneously considers signaling from cell A to cell B and from cell B to cell A.
expression_threshold : float, default=0.2 Threshold value to binarize gene expression when using communication_score='expression_thresholding'. Units have to be the same as the aggregated gene expression matrix (e.g., counts, fraction of cells with non-zero counts, etc.).
aggregation_method : str, default='nn_cell_fraction' Specifies the method to use to aggregate gene expression of single cells into their respective cell types. Used to perform the CCI analysis since it is on the cell types rather than single cells. Options are:
- 'nn_cell_fraction' : Among the single cells composing a cell type, it
calculates the fraction of single cells with non-zero count values
of a given gene.
- 'average' : Computes the average gene expression among the single cells
composing a cell type for a given gene.
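The two aggregation options can be illustrated with a short plain-Python sketch. The function `aggregate_by_cell_type`, the nested-dict layout and the toy counts are assumptions made for illustration; cell2cell performs this step on dataframes or AnnData objects.

```python
def aggregate_by_cell_type(expression, cell_types, method="nn_cell_fraction"):
    """Aggregate {gene: {barcode: count}} into {gene: {cell type: value}}."""
    aggregated = {}
    for gene, counts in expression.items():
        # Group the single-cell values of this gene by cell type.
        per_type = {}
        for barcode, value in counts.items():
            per_type.setdefault(cell_types[barcode], []).append(value)
        if method == "nn_cell_fraction":
            # Fraction of single cells with a non-zero count.
            aggregated[gene] = {ct: sum(v > 0 for v in vals) / len(vals)
                                for ct, vals in per_type.items()}
        elif method == "average":
            aggregated[gene] = {ct: sum(vals) / len(vals)
                                for ct, vals in per_type.items()}
        else:
            raise ValueError(f"Unknown aggregation method: {method}")
    return aggregated

# Hypothetical counts for one gene in four single cells of two cell types.
expression = {"GeneA": {"c1": 0, "c2": 4, "c3": 2, "c4": 6}}
cell_types = {"c1": "T", "c2": "T", "c3": "B", "c4": "B"}

fractions = aggregate_by_cell_type(expression, cell_types, "nn_cell_fraction")
averages = aggregate_by_cell_type(expression, cell_types, "average")
```

Note that the default expression_threshold of 0.2 is on the scale of the 'nn_cell_fraction' output, so a different threshold is likely needed when switching to 'average'.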
barcode_col : str, default='barcodes' Column-name for the single cells in the metadata.
celltype_col : str, default='celltypes' Column-name in the metadata used for grouping single cells into cell types by the selected aggregation method.
complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
complex_agg_method : str, default='min' Method to aggregate the expression value of multiple genes in a complex.
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
verbose : boolean, default=False Whether to print the steps of the analysis.
Attributes
rnaseq_data : pandas.DataFrame or scanpy.AnnData Gene expression data for a single-cell RNA-seq experiment. If it is a dataframe, columns are single cells and rows are genes, while if it is an AnnData object, columns are genes and rows are single cells.
metadata : pandas.DataFrame Metadata containing the cell type of each single cell in the RNA-seq dataset.
index_col : str Column-name for the single cells in the metadata.
group_col : str Column-name in the metadata used for grouping single cells into cell types by the selected aggregation method.
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
complex_sep : str Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
complex_agg_method : str Method to aggregate the expression value of multiple genes in a complex.
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
ref_ppi : pandas.DataFrame Reference list of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication. It could be the same as 'ppi_data' if ppi_data is not bidirectional (that is, contains ProtA-ProtB interaction as well as ProtB-ProtA interaction). ref_ppi must be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction).
interaction_columns : tuple Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
analysis_setup : dict Contains main setup for running the cell-cell interactions and communication analyses. Three main setups are needed (passed as keys):
- 'communication_score' : is the type of communication score used to detect
active ligand-receptor pairs between each pair of cells.
It can be:
- 'expression_thresholding'
- 'expression_product'
- 'expression_mean'
- 'expression_gmean'
- 'cci_score' : is the scoring function to aggregate the communication
scores.
It can be:
- 'bray_curtis'
- 'jaccard'
- 'count'
- 'icellnet'
- 'cci_type' : is the type of interaction between two cells. If it is
undirected, all ligands and receptors are considered from both cells.
If it is directed, ligands from one cell and receptors from the other
are considered separately with respect to ligands from the second
cell and receptor from the first one.
So, it can be:
- 'undirected'
- 'directed'
cutoff_setup : dict Contains two keys: 'type' and 'parameter'. The first key represents the way to use a cutoff or threshold, while 'parameter' is the value used to binarize the expression values. The key 'type' can be:
- 'local_percentile' : computes the value of a given percentile, for each
gene independently. In this case, the parameter corresponds to the
percentile to compute, as a float value between 0 and 1.
- 'global_percentile' : computes the value of a given percentile from all
genes and samples simultaneously. In this case, the parameter
corresponds to the percentile to compute, as a float value between
0 and 1. All genes have the same cutoff.
- 'file' : load a cutoff table from a file. Parameter in this case is the
path of that file. It must contain the same genes as index and same
samples as columns.
- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in each sample. This allows to use specific cutoffs for
each sample. The columns here must be the same as the ones in the
rnaseq_data.
- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in only one column. These cutoffs will be applied to
all samples.
- 'constant_value' : binarizes the expression. Evaluates whether
expression is greater than the value input in the parameter.
interaction_space : cell2cell.core.interaction_space.InteractionSpace Interaction space that contains all the required elements to perform the cell-cell interaction and communication analysis between every pair of cells. After performing the analyses, the results are stored in this object.
aggregation_method : str Specifies the method to use to aggregate gene expression of single cells into their respective cell types. Used to perform the CCI analysis since it is on the cell types rather than single cells. Options are:
- 'nn_cell_fraction' : Among the single cells composing a cell type, it
calculates the fraction of single cells with non-zero count values
of a given gene.
- 'average' : Computes the average gene expression among the single cells
composing a cell type for a given gene.
ccc_permutation_pvalues : pandas.DataFrame Contains the P-values of the permutation analysis on the communication scores.
cci_permutation_pvalues : pandas.DataFrame Contains the P-values of the permutation analysis on the CCI scores.
__adata : boolean Auxiliary variable used for storing whether rnaseq_data is an AnnData object.
Source code in cell2cell/analysis/cell2cell_pipelines.py
permute_cell_labels(permutations=100, evaluation='communication', fdr_correction=True, random_state=None, verbose=False)
Performs permutation analysis of cell-type labels. Detects significant CCI or communication scores.
Parameters
permutations : int, default=100 Number of permutations. In each permutation, the cell-type labels are randomly shuffled and the CCI or communication scores are recomputed to build a null distribution.
evaluation : str, default='communication' Whether to calculate P-values for CCI or communication scores.
- 'interactions' : For CCI scores.
- 'communication' : For communication scores.
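The general idea of a label-permutation test can be sketched in plain Python. In this simplified stand-in, the mean of a per-cell value within one cell type replaces the CCI or communication score, and the `(exceed + 1) / (permutations + 1)` P-value estimate is an assumption; the function `permutation_pvalue` is hypothetical and not cell2cell's implementation.

```python
import random

def permutation_pvalue(values, labels, target_label,
                       permutations=100, random_state=None):
    """One-sided P-value for the mean of `values` within `target_label`."""
    rng = random.Random(random_state)

    def group_mean(current_labels):
        group = [v for v, lab in zip(values, current_labels) if lab == target_label]
        return sum(group) / len(group)

    observed = group_mean(labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # random reassignment of cell-type labels
        if group_mean(shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed labeling as one permutation.
    return (exceed + 1) / (permutations + 1)

# Toy data: the three "T" cells have clearly higher values than the "B" cells.
values = [9, 8, 7, 1, 1, 1, 1, 1, 1, 1]
labels = ["T"] * 3 + ["B"] * 7
p_value = permutation_pvalue(values, labels, "T", permutations=200, random_state=0)
```

Fixing random_state makes the shuffles, and therefore the P-value, reproducible across runs.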
fdr_correction : boolean, default=True Whether performing a multiple test correction after computing P-values. In this case corresponds to an FDR or Benjamini-Hochberg correction, using an alpha equal to 0.05.
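For reference, the Benjamini-Hochberg adjustment can be written in a few lines of plain Python. This is a generic sketch of the standard procedure, not the code path used by cell2cell; the function name `benjamini_hochberg` and the example P-values are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(pvals)
significant = [p <= 0.05 for p in adjusted]  # alpha = 0.05
```

Only the smallest P-value remains significant after correction in this example, which shows how the adjustment becomes stricter as more tests are performed.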
random_state : int, default=None Seed for randomization.
verbose : boolean, default=False Whether to print the steps of the analysis.
Source code in cell2cell/analysis/cell2cell_pipelines.py
initialize_interaction_space(rnaseq_data, ppi_data, cutoff_setup, analysis_setup, excluded_cells=None, complex_sep=None, complex_agg_method='min', interaction_columns=('A', 'B'), verbose=True)
Initializes an InteractionSpace object to perform the analyses.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment or a single-cell experiment after aggregation into cell types. Columns are samples and rows are genes.
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
cutoff_setup : dict Contains two keys: 'type' and 'parameter'. The first key represents the way to use a cutoff or threshold, while 'parameter' is the value used to binarize the expression values. The key 'type' can be:
- 'local_percentile' : computes the value of a given percentile, for each
gene independently. In this case, the parameter corresponds to the
percentile to compute, as a float value between 0 and 1.
- 'global_percentile' : computes the value of a given percentile from all
genes and samples simultaneously. In this case, the parameter
corresponds to the percentile to compute, as a float value between
0 and 1. All genes have the same cutoff.
- 'file' : load a cutoff table from a file. Parameter in this case is the
path of that file. It must contain the same genes as index and same
samples as columns.
- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in each sample. This allows to use specific cutoffs for
each sample. The columns here must be the same as the ones in the
rnaseq_data.
- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in only one column. These cutoffs will be applied to
all samples.
- 'constant_value' : binarizes the expression. Evaluates whether
expression is greater than the value input in the parameter.
analysis_setup : dict Contains main setup for running the cell-cell interactions and communication analyses. Three main setups are needed (passed as keys):
- 'communication_score' : is the type of communication score used to detect
active ligand-receptor pairs between each pair of cells.
It can be:
- 'expression_thresholding'
- 'expression_product'
- 'expression_mean'
- 'expression_gmean'
- 'cci_score' : is the scoring function to aggregate the communication
scores.
It can be:
- 'bray_curtis'
- 'jaccard'
- 'count'
- 'icellnet'
- 'cci_type' : is the type of interaction between two cells. If it is
undirected, all ligands and receptors are considered from both cells.
If it is directed, ligands from one cell and receptors from the other
are considered separately with respect to ligands from the second
cell and receptor from the first one.
So, it can be:
- 'undirected'
- 'directed'
excluded_cells : list, default=None List of cells in the rnaseq_data to be excluded. If None, all cells are considered.
complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
complex_agg_method : str, default='min' Method to aggregate the expression value of multiple genes in a complex.
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
interaction_space : cell2cell.core.interaction_space.InteractionSpace Interaction space that contains all the required elements to perform the cell-cell interaction and communication analysis between every pair of cells. After performing the analyses, the results are stored in this object.
Source code in cell2cell/analysis/cell2cell_pipelines.py
tensor_downstream
compute_gini_coefficients(result, sender_label='Sender Cells', receiver_label='Receiver Cells')
Computes Gini coefficient on the distribution of edge weights in each factor-specific cell-cell communication network. Factors obtained from the tensor decomposition with Tensor-cell2cell.
Parameters
result : any Tensor class in cell2cell.tensor.tensor or a dict Either a Tensor type or a dictionary which resulted from the tensor decomposition. If it is a dict, it should be the one in, for example, InteractionTensor.factors
sender_label : str Label for the dimension of sender cells. Usually found in InteractionTensor.order_labels
receiver_label : str Label for the dimension of receiver cells. Usually found in InteractionTensor.order_labels
Returns
gini_df : pandas.DataFrame Dataframe containing the Gini coefficient of each factor from a tensor decomposition. Calculated on the factor-specific cell-cell communication networks.
Source code in cell2cell/analysis/tensor_downstream.py
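As an illustration of what a Gini coefficient over edge weights measures, here is a minimal NumPy sketch. It is not the library's implementation: gini_coefficient is a hypothetical helper, and treating every entry of the adjacency matrix (self-loops included) as an edge weight is an assumption.

```python
import numpy as np


def gini_coefficient(values):
    """Hypothetical helper: Gini coefficient of non-negative values (0 = even, near 1 = concentrated)."""
    x = np.sort(np.asarray(values, dtype=float).ravel())
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Rank-based form of the Gini coefficient for values sorted in ascending order.
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1.0) / n)


# A network where all edges weigh the same versus one dominated by a single edge.
uniform_network = np.ones((3, 3))
concentrated_network = np.array([[0.0, 0.0], [0.0, 1.0]])
gini_uniform = gini_coefficient(uniform_network)
gini_concentrated = gini_coefficient(concentrated_network)
```

A factor whose network has a high coefficient is driven by a few sender-receiver pairs, while a low coefficient indicates communication spread evenly across pairs.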
flatten_factor_ccc_networks(networks, orderby='senders')
Flattens all adjacency matrices in the factor-specific cell-cell communication networks. It generates a matrix where rows are factors and columns are cell-cell pairs.
Parameters
networks : dict A dictionary containing a pandas.DataFrame for each of the factors (factor names are the keys of the dict). These dataframes are the adjacency matrices of the CCC networks.
orderby : str Order of the flattened cell-cell pairs. Options are 'senders' and 'receivers'. 'senders' flattens the matrices so that all cell-cell pairs with the same sender cell are placed next to each other. 'receivers' does the same, but grouping by the receiver cell instead.
Returns
flatten_networks : pandas.DataFrame A dataframe in which each row contains a factor-specific network. Columns are the directed cell-cell pairs.
Source code in cell2cell/analysis/tensor_downstream.py
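The two orderby options can be sketched with plain pandas. This is an illustrative sketch, not the library's code: the helper flatten_networks and the " -> " separator used in the column labels are assumptions made for the example.

```python
import pandas as pd


def flatten_networks(networks, orderby="senders"):
    """Hypothetical helper: one row per factor, one column per directed cell-cell pair."""
    flattened = {}
    for factor, adjacency in networks.items():
        if orderby == "senders":
            # Pairs sharing a sender are adjacent (row-major walk of the matrix).
            pairs = [(s, r) for s in adjacency.index for r in adjacency.columns]
        elif orderby == "receivers":
            # Pairs sharing a receiver are adjacent (column-major walk of the matrix).
            pairs = [(s, r) for r in adjacency.columns for s in adjacency.index]
        else:
            raise ValueError("orderby must be 'senders' or 'receivers'")
        labels = [f"{s} -> {r}" for s, r in pairs]
        flattened[factor] = pd.Series([adjacency.loc[s, r] for s, r in pairs], index=labels)
    return pd.DataFrame(flattened).T


# Toy adjacency matrix: rows are senders, columns are receivers.
cells = ["T", "B"]
networks = {"Factor 1": pd.DataFrame([[1, 2], [3, 4]], index=cells, columns=cells)}
by_senders = flatten_networks(networks, orderby="senders")
by_receivers = flatten_networks(networks, orderby="receivers")
```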
get_factor_specific_ccc_networks(result, sender_label='Sender Cells', receiver_label='Receiver Cells')
Generates adjacency matrices for each of the factors obtained from a tensor decomposition. These matrices represent a cell-cell communication directed network.
Parameters
result : any Tensor class in cell2cell.tensor.tensor or a dict Either a Tensor type or a dictionary which resulted from the tensor decomposition. If it is a dict, it should be the one in, for example, InteractionTensor.factors
sender_label : str Label for the dimension of sender cells. Usually found in InteractionTensor.order_labels
receiver_label : str Label for the dimension of receiver cells. Usually found in InteractionTensor.order_labels
Returns
networks : dict A dictionary containing a pandas.DataFrame for each of the factors (factor names are the keys of the dict). These dataframes are the adjacency matrices of the CCC networks.
Source code in cell2cell/analysis/tensor_downstream.py
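One way to picture a factor-specific network is as the outer product of the sender and receiver loadings of that factor. The sketch below assumes exactly that, and also assumes the factors dict maps each dimension label to a dataframe with elements as rows and factors as columns. Both points are assumptions for illustration, and factor_networks is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def factor_networks(factors, sender_label="Sender Cells", receiver_label="Receiver Cells"):
    """Hypothetical helper: adjacency matrix per factor as an outer product of loadings."""
    senders = factors[sender_label]
    receivers = factors[receiver_label]
    networks = {}
    for factor in senders.columns:
        # Entry (i, j) is sender loading i times receiver loading j.
        outer = np.outer(senders[factor].values, receivers[factor].values)
        networks[factor] = pd.DataFrame(outer, index=senders.index, columns=receivers.index)
    return networks


# Toy loadings (made up) for a single factor.
factors = {
    "Sender Cells": pd.DataFrame({"Factor 1": [0.5, 1.0]}, index=["T", "B"]),
    "Receiver Cells": pd.DataFrame({"Factor 1": [2.0, 0.0]}, index=["T", "B"]),
}
networks = factor_networks(factors)
```

Rows of each matrix are senders and columns are receivers, so the network is directed: the entry for B sending to T can differ from the entry for T sending to B.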
get_joint_loadings(result, dim1, dim2, factor)
Creates the joint loading distribution between two tensor dimensions for a given factor output from decomposition.
Parameters
result : any Tensor class in cell2cell.tensor.tensor or a dict Either a Tensor type or a dictionary which resulted from the tensor decomposition. If it is a dict, it should be the one in, for example, InteractionTensor.factors
dim1 : str One of the tensor dimensions (options are in the keys of the dict, or interaction.factors.keys())
dim2 : str A second tensor dimension (options are in the keys of the dict, or interaction.factors.keys())
factor: str One of the factors output from the decomposition (e.g. 'Factor 1').
Returns
joint_dist : pandas.DataFrame Joint distribution of factor loadings for the specified dimensions. Rows correspond to elements in dim1 and columns to elements in dim2.
Source code in cell2cell/analysis/tensor_downstream.py
get_lr_by_cell_pairs(result, lr_label, sender_label, receiver_label, order_cells_by='receivers', factor=None, cci_threshold=None, lr_threshold=None)
Returns a dataframe containing the product loadings of a specific combination of ligand-receptor pair and sender-receiver pair.
Parameters
result : any Tensor class in cell2cell.tensor.tensor or a dict Either a Tensor type or a dictionary which resulted from the tensor decomposition. If it is a dict, it should be the one in, for example, InteractionTensor.factors
lr_label : str Label for the dimension of the ligand-receptor pairs. Usually found in InteractionTensor.order_labels
sender_label : str Label for the dimension of sender cells. Usually found in InteractionTensor.order_labels
receiver_label : str Label for the dimension of receiver cells. Usually found in InteractionTensor.order_labels
order_cells_by : str, default='receivers' Order of the returned dataframe. Options are 'senders' and 'receivers'. 'senders' orders the dataframe so that all cell-cell pairs with the same sender cell are placed next to each other. 'receivers' does the same, but grouping by the receiver cell instead.
factor : str, default=None Name of the factor to be used to compute the product loadings. If None, all factors will be included to compute them.
cci_threshold : float, default=None Threshold to be applied on the product loadings of the sender-receiver cell pairs. If specified, only cell-cell pairs with a product loading above the threshold in at least one of the included factors will appear in the returned dataframe.
lr_threshold : float, default=None Threshold to be applied on the ligand-receptor loadings. If specified, only LR pairs with a loading above the threshold in at least one of the included factors will appear in the returned dataframe.
Returns
cci_lr : pandas.DataFrame Dataframe containing the product loadings of a specific combination of ligand-receptor pair and sender-receiver pair. If the factor is specified, the returned dataframe will contain the product loadings of that factor. If the factor is not specified, the returned dataframe will contain the product loadings across all factors.
Source code in cell2cell/analysis/tensor_downstream.py
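The idea of a product loading can be sketched for a single factor as the ligand-receptor loading multiplied by the sender and receiver loadings. The code below is an illustration under that assumption, not the library's implementation: the helper lr_by_cell_pairs_sketch, the layout of the factors dict (one dataframe per dimension, factors as columns), the " -> " label separator, and the sender-major column order are all hypothetical choices. Thresholds and aggregation across factors are left out.

```python
import numpy as np
import pandas as pd


def lr_by_cell_pairs_sketch(factors, lr_label, sender_label, receiver_label, factor):
    """Hypothetical helper: LR pairs (rows) by sender-receiver pairs (columns) for one factor."""
    lr = factors[lr_label][factor]
    senders = factors[sender_label][factor]
    receivers = factors[receiver_label][factor]
    # Sender-major order: all pairs with the same sender are adjacent.
    pair_labels = [f"{s} -> {r}" for s in senders.index for r in receivers.index]
    pair_loadings = np.outer(senders.values, receivers.values).ravel()
    product = np.outer(lr.values, pair_loadings)
    return pd.DataFrame(product, index=lr.index, columns=pair_labels)


# Toy loadings (made up) for one factor.
factors = {
    "Ligand-Receptor Pairs": pd.DataFrame({"Factor 1": [2.0, 0.5]}, index=["L1^R1", "L2^R2"]),
    "Sender Cells": pd.DataFrame({"Factor 1": [1.0, 0.5]}, index=["T", "B"]),
    "Receiver Cells": pd.DataFrame({"Factor 1": [0.0, 2.0]}, index=["T", "B"]),
}
cci_lr = lr_by_cell_pairs_sketch(
    factors, "Ligand-Receptor Pairs", "Sender Cells", "Receiver Cells", "Factor 1"
)
```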
tensor_pipelines
run_tensor_cell2cell_pipeline(interaction_tensor, tensor_metadata, copy_tensor=False, rank=None, tf_optimization='regular', random_state=None, backend=None, device=None, elbow_metric='error', smooth_elbow=False, upper_rank=25, tf_init='random', tf_svd='numpy_svd', cmaps=None, sample_col='Element', group_col='Category', fig_fontsize=14, output_folder=None, output_fig=True, fig_format='pdf', **kwargs)
Runs basic pipeline of Tensor-cell2cell (excluding downstream analyses).
Parameters
interaction_tensor : cell2cell.tensor.BaseTensor A communication tensor generated with any of the tensor classes in cell2cell.tensor.
tensor_metadata : list
List of pandas dataframes with metadata information for the elements of each
dimension in the tensor. In each dataframe, the column named by sample_col
contains the name of each element in the tensor, while the column named by
group_col contains the metadata or grouping information of each
element.
copy_tensor : boolean, default=False Whether to generate a copy of the original tensor to avoid modifying it.
rank : int, default=None Rank of the Tensor Factorization (number of factors to deconvolve the original tensor). If None, it will be automatically inferred from an elbow analysis.
tf_optimization : str, default='regular' Whether to run the optimization with a higher number of iterations, more independent factorization runs, and higher resolution (lower tolerance), or with fewer iterations, fewer factorization runs, and lower resolution. Options are:
- 'regular' : It uses 100 max iterations, 1 factorization run, and 10e-7 tolerance.
Faster to run.
- 'robust' : It uses 500 max iterations, 100 factorization runs, and 10e-8 tolerance.
Slower to run.
random_state : int, default=None Seed for randomization.
backend : str, default=None Backend that TensorLy will use to perform calculations on this tensor. When None, the currently active backend is used, which is usually 'numpy'. Options are:
device : str, default=None Device to use when backend allows multiple devices. Options are:
elbow_metric : str, default='error' Metric to perform the elbow analysis (y-axis).
- 'error' : Normalized error to compute the elbow.
- 'similarity' : Similarity based on CorrIndex (1-CorrIndex).
smooth_elbow : boolean, default=False Whether to smooth the elbow-analysis curve with a Savitzky-Golay filter.
upper_rank : int, default=25 Upper bound of ranks to explore with the elbow analysis.
tf_init : str, default='random' Initialization method for computing the Tensor Factorization.
tf_svd : str, default='numpy_svd' Function to compute the SVD for initializing the Tensor Factorization, acceptable values in tensorly.SVD_FUNS
cmaps : list, default=None A list of colormaps used for coloring elements in each dimension. The length of this list is equal to the number of dimensions of the tensor. If None, all dimensions will be colored with the colormap 'gist_rainbow'.
sample_col : str, default='Element' Name of the column containing the element names in the metadata.
group_col : str, default='Category' Name of the column containing the metadata or grouping information for each element in the metadata.
fig_fontsize : int, default=14 Font size of the tick labels. Axis labels will be 1.2 times the fontsize.
output_folder : str, default=None Path to the folder where the figures generated will be saved. If None, figures will not be saved.
output_fig : boolean, default=True Whether to generate the figures with matplotlib.
fig_format : str, default='pdf'
Format to store figures when an output_folder is specified
and output_fig is True. Otherwise, this is not necessary.
**kwargs : dict Extra arguments for the tensor factorization according to inputs in tensorly.
Returns
interaction_tensor : cell2cell.tensor.tensor.BaseTensor
Either the original input interaction_tensor or a copy of it.
This also stores the results from running the Tensor-cell2cell
pipeline in the corresponding attributes.
Source code in cell2cell/analysis/tensor_pipelines.py
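The expected shape of tensor_metadata can be illustrated with a small pandas sketch: one dataframe per tensor dimension, each with a column named by sample_col and another named by group_col. The helper build_dimension_metadata, the assumption of a 4-dimensional tensor (contexts, ligand-receptor pairs, senders, receivers), and all element and group names below are hypothetical.

```python
import pandas as pd


def build_dimension_metadata(elements, groups, sample_col="Element", group_col="Category"):
    """Hypothetical helper: metadata dataframe for the elements of one tensor dimension."""
    return pd.DataFrame(
        {sample_col: list(elements), group_col: [groups[e] for e in elements]}
    )


# One dataframe per dimension, in the same order as the tensor dimensions (assumed here).
tensor_metadata = [
    build_dimension_metadata(["Sample1", "Sample2"], {"Sample1": "Control", "Sample2": "Treated"}),
    build_dimension_metadata(["L1^R1", "L2^R2"], {"L1^R1": "Cytokine", "L2^R2": "Adhesion"}),
    build_dimension_metadata(["T", "B"], {"T": "Lymphoid", "B": "Lymphoid"}),
    build_dimension_metadata(["T", "B"], {"T": "Lymphoid", "B": "Lymphoid"}),
]
```

If different column names are used, the same names would be passed as sample_col and group_col so the pipeline can locate them.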
clustering
cluster_interactions
compute_distance(data_matrix, axis=0, metric='euclidean')
Computes the pairwise distance between elements in a matrix of shape m x n. Uses the function scipy.spatial.distance.pdist.
Parameters
data_matrix : pandas.DataFrame or ndarray An m x n matrix used to compute the distances.
axis : int, default=0 To decide on which elements to compute the distance. If axis=0, the distances will be between elements in the rows, while axis=1 will lead to distances between elements in the columns.
metric : str, default='euclidean' The distance metric to use. The distance function can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
Returns
D : ndarray A condensed distance matrix. For each pair i and j (with i < j < m, where m is the number of original observations), the metric dist(u=X[i], v=X[j]) is computed and stored in entry m * i + j - ((i + 2) * (i + 1)) // 2.
Source code in cell2cell/clustering/cluster_interactions.py
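The axis option and the condensed indexing formula can be checked with a short SciPy sketch. The helpers pairwise_distance and condensed_index are hypothetical names used only for this illustration; the calls to pdist and squareform are standard SciPy.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def pairwise_distance(data_matrix, axis=0, metric="euclidean"):
    """Hypothetical helper: condensed distances between rows (axis=0) or columns (axis=1)."""
    values = np.asarray(data_matrix, dtype=float)
    if axis == 1:
        # pdist always compares rows, so transpose to compare columns.
        values = values.T
    elif axis != 0:
        raise ValueError("axis must be 0 or 1")
    return pdist(values, metric=metric)


def condensed_index(i, j, m):
    """Position of the pair (i, j), with i < j < m, in the condensed distance vector."""
    return m * i + j - ((i + 2) * (i + 1)) // 2


X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D_rows = pairwise_distance(X, axis=0)   # 3 rows give 3 pairwise distances
D_cols = pairwise_distance(X, axis=1)   # 2 columns give 1 pairwise distance
square = squareform(D_rows)             # redundant square form, handy for lookups
```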
compute_linkage(distance_matrix, method='ward', optimal_ordering=True)
Returns a linkage for a given distance matrix using a specific method.
Parameters
distance_matrix : numpy.ndarray A square array containing the distance between a given row and a given column. Diagonal elements must be zero.
method : str, 'ward' by default Method to compute the linkage. It could be:
- 'single'
- 'complete'
- 'average'
- 'weighted'
- 'centroid'
- 'median'
- 'ward'
For more details, go to:
https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.cluster.hierarchy.linkage.html
optimal_ordering : boolean, default=True Whether to sort the leaves of the dendrogram to have a minimal distance between successive leaves. For more information, see scipy.cluster.hierarchy.optimal_leaf_ordering
Returns
Z : numpy.ndarray The hierarchical clustering encoded as a linkage matrix.
Source code in cell2cell/clustering/cluster_interactions.py
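Because scipy.cluster.hierarchy.linkage works on condensed distances, a square distance matrix first needs to be converted. The following is a minimal sketch of that step, assuming a symmetric matrix with a zero diagonal; linkage_from_square is a hypothetical helper and not the library's code.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform


def linkage_from_square(distance_matrix, method="ward", optimal_ordering=True):
    """Hypothetical helper: linkage matrix from a square, symmetric distance matrix."""
    # squareform validates symmetry and the zero diagonal, then condenses the matrix.
    condensed = squareform(np.asarray(distance_matrix, dtype=float), checks=True)
    return hierarchy.linkage(condensed, method=method, optimal_ordering=optimal_ordering)


# Four points on a line forming two obvious groups: {0, 1} and {10, 11}.
positions = np.array([0.0, 1.0, 10.0, 11.0])
distance_matrix = np.abs(positions[:, None] - positions[None, :])
Z = linkage_from_square(distance_matrix)
```

Each row of Z describes one merge: the two clusters joined, the distance between them, and the size of the new cluster.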
get_clusters_from_linkage(linkage, threshold, criterion='maxclust', labels=None)
Gets clusters from a linkage given a threshold and a criterion.
Parameters
linkage : numpy.ndarray The hierarchical clustering encoded with the matrix returned by the linkage function (Z).
threshold : float The threshold to apply when forming flat clusters.
criterion : str, 'maxclust' by default The criterion to use in forming flat clusters. Depending on the criterion, the threshold has different meanings. More information on: https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.cluster.hierarchy.fcluster.html
labels : array-like, None by default List of labels of the elements contained in the linkage. The order must match the order they were provided when generating the linkage.
Returns
clusters : dict A dictionary containing the clusters obtained. The keys correspond to the cluster numbers and the values to a list with element names given the labels, or the element index based on the linkage.
Source code in cell2cell/clustering/cluster_interactions.py
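The returned dictionary can be pictured with a short sketch built on scipy.cluster.hierarchy.fcluster. With criterion='maxclust', the threshold is the maximum number of flat clusters. The helper clusters_from_linkage is a hypothetical name for this illustration.

```python
from collections import defaultdict

import numpy as np
from scipy.cluster import hierarchy


def clusters_from_linkage(linkage, threshold, criterion="maxclust", labels=None):
    """Hypothetical helper: map each flat cluster number to the elements it contains."""
    assignments = hierarchy.fcluster(linkage, threshold, criterion=criterion)
    if labels is None:
        # Fall back to the element positions used when building the linkage.
        labels = list(range(len(assignments)))
    clusters = defaultdict(list)
    for label, cluster_id in zip(labels, assignments):
        clusters[int(cluster_id)].append(label)
    return dict(clusters)


# Four one-dimensional observations forming two well separated groups.
points = np.array([[0.0], [1.0], [10.0], [11.0]])
Z = hierarchy.linkage(points, method="ward")
clusters = clusters_from_linkage(Z, threshold=2, labels=["a", "b", "c", "d"])
```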
core
cci_scores
compute_braycurtis_like_cci_score(cell1, cell2, ppi_score=None)
Calculates a Bray-Curtis-like score for the interaction between two cells based on their intercellular protein-protein interactions such as ligand-receptor interactions.
Parameters
cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the sender.
cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the receiver.
Returns
cci_score : float Overall score for the interaction between a pair of cell-types/tissues/samples. In this case, it is a Bray-Curtis-like score.
Source code in cell2cell/core/cci_scores.py
compute_count_score(cell1, cell2, ppi_score=None)
Calculates the number of active protein-protein interactions for the interaction between two cells, which could be the number of active ligand-receptor interactions.
Parameters
cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the sender.
cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the receiver.
Returns
cci_score : float Overall score for the interaction between a pair of cell-types/tissues/samples.
Source code in cell2cell/core/cci_scores.py
compute_icellnet_score(cell1, cell2, ppi_score=None)
Calculates the sum of communication scores for the interaction between two cells. Based on ICELLNET.
Parameters
cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the sender.
cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the receiver.
Returns
cci_score : float Overall score for the interaction between a pair of cell-types/tissues/samples.
Source code in cell2cell/core/cci_scores.py
compute_jaccard_like_cci_score(cell1, cell2, ppi_score=None)
Calculates a Jaccard-like score for the interaction between two cells based on their intercellular protein-protein interactions such as ligand-receptor interactions.
Parameters
cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the sender.
cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute interaction between a pair of them. In a directed interaction, this is the receiver.
Returns
cci_score : float Overall score for the interaction between a pair of cell-types/tissues/samples. In this case it is a Jaccard-like score.
Source code in cell2cell/core/cci_scores.py
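To give a feeling for how the Bray-Curtis-like and Jaccard-like scores behave, here is a NumPy sketch on two vectors of per-interaction values, one per cell. The exact formulas below are assumptions for illustration (the text above does not spell them out), the optional ppi_score weighting is left out, and both function names are hypothetical.

```python
import numpy as np


def braycurtis_like(c1, c2):
    """Assumed form: 2 * sum(c1 * c2) / (sum(c1^2) + sum(c2^2))."""
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    numerator = 2.0 * np.nansum(c1 * c2)
    denominator = np.nansum(c1 * c1) + np.nansum(c2 * c2)
    return 0.0 if denominator == 0 else float(numerator / denominator)


def jaccard_like(c1, c2):
    """Assumed form: sum(c1 * c2) / (sum(c1^2) + sum(c2^2) - sum(c1 * c2))."""
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    shared = np.nansum(c1 * c2)
    denominator = np.nansum(c1 * c1) + np.nansum(c2 * c2) - shared
    return 0.0 if denominator == 0 else float(shared / denominator)


# Binary example: ligands available in the sender, receptors available in the receiver.
ligands_in_sender = [1, 1, 0, 1]
receptors_in_receiver = [1, 0, 0, 1]
bc_score = braycurtis_like(ligands_in_sender, receptors_in_receiver)
jaccard_score = jaccard_like(ligands_in_sender, receptors_in_receiver)
```

Under these assumed forms, both scores range from 0 (no shared active interactions) to 1 (every partner present in one cell is matched in the other) for binary inputs, so they are normalized by how many partners each cell expresses, unlike a raw count.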
matmul_bray_curtis_like(A_scores, B_scores, ppi_score=None)
Computes Bray-Curtis-like scores using matrices of proteins by cell-types/tissues/samples.
Parameters
A_scores : array-like Matrix of size NxM, where N are the proteins in the first column of a list of PPIs and M are the cell-types/tissues/samples.
B_scores : array-like Matrix of size NxM, where N are the proteins in the second column of a list of PPIs and M are the cell-types/tissues/samples.
Returns
bray_curtis : numpy.array Matrix MxM, representing the CCI score for all pairs of cell-types/tissues/samples. In directed interactions, the vertical axis (axis 0) represents the senders, while the horizontal axis (axis 1) represents the receivers.
Source code in cell2cell/core/cci_scores.py
matmul_cosine(A_scores, B_scores, ppi_score=None)
Computes cosine-similarity scores using matrices of proteins by cell-types/tissues/samples.
Parameters
A_scores : array-like Matrix of size NxM, where N are the proteins in the first column of a list of PPIs and M are the cell-types/tissues/samples.
B_scores : array-like Matrix of size NxM, where N are the proteins in the second column of a list of PPIs and M are the cell-types/tissues/samples.
Returns
cosine : numpy.array Matrix MxM, representing the CCI score for all pairs of cell-types/tissues/samples. In directed interactions, the vertical axis (axis 0) represents the senders, while the horizontal axis (axis 1) represents the receivers.
Source code in cell2cell/core/cci_scores.py
matmul_count_active(A_scores, B_scores, ppi_score=None)
Computes the count of active protein-protein interactions used for intercellular communication using matrices of proteins by cell-types/tissues/samples.
Parameters
A_scores : array-like Matrix of size NxM, where N are the proteins in the first column of a list of PPIs and M are the cell-types/tissues/samples.
B_scores : array-like Matrix of size NxM, where N are the proteins in the second column of a list of PPIs and M are the cell-types/tissues/samples.
Returns
counts : numpy.array Matrix MxM, representing the CCI score for all pairs of cell-types/tissues/samples. In directed interactions, the vertical axis (axis 0) represents the senders, while the horizontal axis (axis 1) represents the receivers.
Source code in cell2cell/core/cci_scores.py
matmul_jaccard_like(A_scores, B_scores, ppi_score=None)
Computes Jaccard-like scores using matrices of proteins by cell-types/tissues/samples.
Parameters
A_scores : array-like Matrix of size NxM, where N are the proteins in the first column of a list of PPIs and M are the cell-types/tissues/samples.
B_scores : array-like Matrix of size NxM, where N are the proteins in the second column of a list of PPIs and M are the cell-types/tissues/samples.
Returns
jaccard : numpy.array Matrix MxM, representing the CCI score for all pairs of cell-types/tissues/samples. In directed interactions, the vertical axis (axis 0) represents the senders, while the horizontal axis (axis 1) represents the receivers.
Source code in cell2cell/core/cci_scores.py
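The matrix versions compute the score for every sender-receiver pair at once. The sketch below shows how an MxM matrix could arise from two NxM matrices, using an assumed Bray-Curtis-like form (twice the cross-product over the sum of squared norms). The formula, the omission of ppi_score, and the function name are assumptions made for illustration only.

```python
import numpy as np


def matmul_bray_curtis_like_sketch(A_scores, B_scores):
    """Hypothetical sketch: MxM Bray-Curtis-like scores, senders on axis 0, receivers on axis 1."""
    A = np.asarray(A_scores, dtype=float)
    B = np.asarray(B_scores, dtype=float)
    # Entry (i, j): interactions where cell i has partner A and cell j has partner B.
    numerator = 2.0 * (A.T @ B)
    a_squared = (A * A).sum(axis=0)   # one value per sender cell
    b_squared = (B * B).sum(axis=0)   # one value per receiver cell
    denominator = a_squared[:, None] + b_squared[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        scores = np.where(denominator > 0, numerator / denominator, 0.0)
    return scores


# Two PPIs (rows) and two cells (columns), with binary values.
A_scores = np.array([[1, 0], [1, 1]])
B_scores = np.array([[1, 1], [0, 1]])
scores = matmul_bray_curtis_like_sketch(A_scores, B_scores)
```

The result is generally not symmetric: scores[0, 1] describes cell 0 sending to cell 1, while scores[1, 0] describes the opposite direction.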
cell
Cell
Specific cell-type/tissue/organ element in a RNAseq dataset.
Parameters
sc_rnaseq_data : pandas.DataFrame A gene expression matrix. Contains only one column, which corresponds to a specific cell-type/tissue/sample, while the genes are rows. The column name will be the label of the instance.
verbose : boolean, default=True Whether to print the steps of the analysis.
Attributes
id : int ID number of the instance generated.
type : str Name of the respective cell-type/tissue/sample.
rnaseq_data : pandas.DataFrame Copy of sc_rnaseq_data.
weighted_ppi : pandas.DataFrame Dataframe created from a list of protein-protein interactions, in which the columns of the interacting proteins are replaced by a score or a preprocessed gene expression of the respective proteins.
Source code in cell2cell/core/cell.py
get_cells_from_rnaseq(rnaseq_data, cell_columns=None, verbose=True)
Creates new instances of Cell based on the RNAseq data of each cell-type/tissue/sample in a gene expression matrix.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for a RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes.
cell_columns : array-like, default=None List of names of cell-types/tissues/samples in the dataset to be used. If None, all columns will be used.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
cells : dict Dictionary containing all Cell instances generated from a RNAseq dataset. The keys of this dictionary are the names of the corresponding Cell instances.
Source code in cell2cell/core/cell.py
communication_scores
aggregate_ccc_matrices(ccc_matrices, method='gmean')
Aggregates matrices of communication scores. Each matrix has the communication scores across all pairs of cell-types/tissues/samples for a different pair of interacting proteins.
Parameters
ccc_matrices : list List of matrices of communication scores. Each matrix is for a specific pair of interacting proteins.
method : str, default='gmean' Method to aggregate the matrices element-wise. Options are:
- 'gmean' : Geometric mean in an element-wise way.
- 'sum' : Sum in an element-wise way.
- 'mean' : Mean in an element-wise way.
Returns
aggregated_ccc_matrix : numpy.array A matrix containing aggregated communication scores from multiple PPIs. Its shape is MxM, where M are all cell-types/tissues/samples. In directed interactions, the vertical axis (axis 0) represents the senders, while the horizontal axis (axis 1) represents the receivers.
Source code in cell2cell/core/communication_scores.py
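The three aggregation options can be sketched element-wise with NumPy. This is an illustration, not the library's implementation; aggregate_matrices is a hypothetical helper and the input matrices are made up.

```python
import numpy as np


def aggregate_matrices(ccc_matrices, method="gmean"):
    """Hypothetical helper: element-wise aggregation of equally shaped score matrices."""
    stacked = np.stack([np.asarray(m, dtype=float) for m in ccc_matrices], axis=0)
    if method == "gmean":
        # Geometric mean: n-th root of the element-wise product of n matrices.
        return np.prod(stacked, axis=0) ** (1.0 / stacked.shape[0])
    if method == "sum":
        return stacked.sum(axis=0)
    if method == "mean":
        return stacked.mean(axis=0)
    raise ValueError(f"Unknown method: {method}")


# Communication scores of two PPIs across the same two cell types.
ppi_1 = [[1.0, 4.0], [0.0, 2.0]]
ppi_2 = [[4.0, 1.0], [5.0, 8.0]]
agg_gmean = aggregate_matrices([ppi_1, ppi_2], method="gmean")
agg_sum = aggregate_matrices([ppi_1, ppi_2], method="sum")
agg_mean = aggregate_matrices([ppi_1, ppi_2], method="mean")
```

Note that with the geometric mean, a zero score in any single matrix sets the aggregated entry to zero, whereas the sum and mean only lower it.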
compute_ccc_matrix(prot_a_exp, prot_b_exp, communication_score='expression_product')
Computes communication scores for a specific protein-protein interaction using vectors of gene expression levels for a given interacting protein produced by different cell-types/tissues/samples.
Parameters
prot_a_exp : array-like Vector with gene expression levels for an interacting protein A in a given PPI. Coordinates are different cell-types/tissues/samples.
prot_b_exp : array-like Vector with gene expression levels for an interacting protein B in a given PPI. Coordinates are different cell-types/tissues/samples.
communication_score : str, default='expression_product' Scoring function for computing the communication score. Options are:
- 'expression_product' : Multiplication between the expression
of the interacting proteins.
- 'expression_mean' : Average between the expression
of the interacting proteins.
- 'expression_gmean' : Geometric mean between the expression
of the interacting proteins.
Returns
communication_scores : numpy.array Matrix MxM, representing the CCC scores of a specific PPI across all pairs of cell-types/tissues/samples. M are all cell-types/tissues/samples. In directed interactions, the vertical axis (axis 0) represents the senders, while the horizontal axis (axis 1) represents the receivers.
Source code in cell2cell/core/communication_scores.py
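For one PPI, the MxM matrix pairs the expression of protein A in every sender with the expression of protein B in every receiver. A minimal NumPy sketch of the three scoring options follows; ccc_matrix_sketch is a hypothetical helper and the expression values are made up.

```python
import numpy as np


def ccc_matrix_sketch(prot_a_exp, prot_b_exp, communication_score="expression_product"):
    """Hypothetical helper: entry (i, j) scores protein A in cell i with protein B in cell j."""
    a = np.asarray(prot_a_exp, dtype=float)
    b = np.asarray(prot_b_exp, dtype=float)
    if communication_score == "expression_product":
        return np.outer(a, b)
    if communication_score == "expression_mean":
        # Broadcasting builds every (sender, receiver) combination.
        return (a[:, None] + b[None, :]) / 2.0
    if communication_score == "expression_gmean":
        return np.sqrt(np.outer(a, b))
    raise ValueError(f"Unknown communication score: {communication_score}")


prot_a_exp = [1.0, 4.0]   # protein A expression in two cell types
prot_b_exp = [9.0, 0.0]   # protein B expression in the same two cell types
product = ccc_matrix_sketch(prot_a_exp, prot_b_exp, "expression_product")
mean = ccc_matrix_sketch(prot_a_exp, prot_b_exp, "expression_mean")
gmean = ccc_matrix_sketch(prot_a_exp, prot_b_exp, "expression_gmean")
```

The product and geometric mean give zero whenever one partner is not expressed, while the mean can still be positive in that case.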
get_binary_scores(cell1, cell2, ppi_score=None)
Computes binary communication scores for all protein-protein interactions between a pair of cell-types/tissues/samples. This corresponds to an AND function between binary values for each interacting protein coming from each cell.
Parameters
cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the sender.
cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the receiver.
ppi_score : array-like, default=None An array with a weight for each PPI. The weight multiplies the communication scores.
Returns
communication_scores : numpy.array An array with the communication scores for each intercellular PPI.
Source code in cell2cell/core/communication_scores.py
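The AND behaviour described above can be sketched on plain arrays: for 0/1 values, multiplying the two vectors is equivalent to a logical AND, and the optional weights then scale each interaction. The helper binary_scores_sketch takes arrays instead of Cell objects, which is a simplification made for this example.

```python
import numpy as np


def binary_scores_sketch(c1, c2, ppi_score=None):
    """Hypothetical helper: AND of binary partner values, optionally weighted per PPI."""
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    if ppi_score is None:
        # No weights given: every PPI counts the same.
        ppi_score = np.ones(c1.shape)
    return c1 * c2 * np.asarray(ppi_score, dtype=float)


sender_partners = [1, 1, 0]     # first-column partners active in the sender
receiver_partners = [1, 0, 0]   # second-column partners active in the receiver
unweighted = binary_scores_sketch(sender_partners, receiver_partners)
weighted = binary_scores_sketch(sender_partners, receiver_partners, ppi_score=[0.5, 1.0, 1.0])
```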
get_continuous_scores(cell1, cell2, ppi_score=None, method='expression_product')
Computes continuous communication scores for all protein-protein interactions between a pair of cell-types/tissues/samples. This corresponds to a specific scoring function between preprocessed continuous expression values for each interacting protein coming from each cell.
Parameters
cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the sender.
cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the receiver.
ppi_score : array-like, default=None An array with a weight for each PPI. The weight multiplies the communication scores.
method : str, default='expression_product' Scoring function for computing the communication score. Options are:
- 'expression_product' : Multiplication between the expression of the interacting proteins. One coming from cell1 and the other from cell2.
- 'expression_mean' : Average between the expression of the interacting proteins. One coming from cell1 and the other from cell2.
- 'expression_gmean' : Geometric mean between the expression of the interacting proteins. One coming from cell1 and the other from cell2.
Returns
communication_scores : numpy.array An array with the communication scores for each intercellular PPI.
Source code in cell2cell/core/communication_scores.py
score_expression_mean(c1, c2)
Computes the expression mean score
Parameters
c1 : array-like A 1D-array containing the preprocessed expression values for the interactors in the first column of a list of protein-protein interactions.
c2 : array-like A 1D-array containing the preprocessed expression values for the interactors in the second column of a list of protein-protein interactions.
Returns
(c1 + c2)/2. : array-like Average of vectors.
Source code in cell2cell/core/communication_scores.py
score_expression_product(c1, c2)
Computes the expression product score
Parameters
c1 : array-like A 1D-array containing the preprocessed expression values for the interactors in the first column of a list of protein-protein interactions.
c2 : array-like A 1D-array containing the preprocessed expression values for the interactors in the second column of a list of protein-protein interactions.
Returns
c1 * c2 : array-like Multiplication of vectors.
Source code in cell2cell/core/communication_scores.py
interaction_space
InteractionSpace
Interaction space that contains all the required elements to perform the analysis between every pair of cells.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment or a single-cell experiment after aggregation into cell types. Columns are cell-types/tissues/samples and rows are genes.
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
gene_cutoffs : dict Contains two keys: 'type' and 'parameter'. The first key represents the way to use a cutoff or threshold, while 'parameter' is the value used to binarize the expression values. The key 'type' can be:
- 'local_percentile' : computes the value of a given percentile, for each
gene independently. In this case, the parameter corresponds to the
percentile to compute, as a float value between 0 and 1.
- 'global_percentile' : computes the value of a given percentile from all
genes and samples simultaneously. In this case, the parameter
corresponds to the percentile to compute, as a float value between
0 and 1. All genes have the same cutoff.
- 'file' : load a cutoff table from a file. Parameter in this case is the
path of that file. It must contain the same genes as index and same
samples as columns.
- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in each sample. This allows to use specific cutoffs for
each sample. The columns here must be the same as the ones in the
rnaseq_data.
- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in only one column. These cutoffs will be applied to
all samples.
- 'constant_value' : binarizes the expression. Evaluates whether
expression is greater than the value input in the parameter.
communication_score : str, default='expression_thresholding' Type of communication score used to detect active ligand-receptor pairs between each pair of cells. See cell2cell.core.communication_scores for more details. It can be:
- 'expression_thresholding'
- 'expression_product'
- 'expression_mean'
- 'expression_gmean'
cci_score : str, default='bray_curtis' Scoring function to aggregate the communication scores. See cell2cell.core.cci_scores for more details. It can be:
- 'bray_curtis'
- 'jaccard'
- 'count'
- 'icellnet'
cci_type : str, default='undirected' Type of interaction between two cells. If it is undirected, all ligands and receptors are considered from both cells. If it is directed, ligands from one cell and receptors from the other are considered separately with respect to ligands from the second cell and receptors from the first one. So, it can be:
- 'undirected'
- 'directed'
cci_matrix_template : pandas.DataFrame, default=None A matrix of shape MxM where M are cell-types/tissues/samples. This is used as template for storing CCI scores. It may be useful for specifying which pairs of cells to consider.
complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
complex_agg_method : str, default='min' Method to aggregate the expression value of multiple genes in a complex.
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
verbose : boolean, default=True Whether to print the steps of the analysis.
Attributes
communication_score : str Type of communication score used to detect active ligand-receptor pairs between each pair of cells. See cell2cell.core.communication_scores for more details. It can be:
- 'expression_thresholding'
- 'expression_product'
- 'expression_mean'
- 'expression_gmean'
cci_score : str Scoring function to aggregate the communication scores. See cell2cell.core.cci_scores for more details. It can be:
- 'bray_curtis'
- 'jaccard'
- 'count'
- 'icellnet'
cci_type : str Type of interaction between two cells. If it is undirected, all ligands and receptors are considered from both cells. If it is directed, ligands from one cell and receptors from the other are considered separately with respect to ligands from the second cell and receptors from the first one. So, it can be:
- 'undirected'
- 'directed'
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
modified_rnaseq_data : pandas.DataFrame Preprocessed gene expression data for a bulk or single-cell RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes. The preprocessing may correspond to scoring the gene expression as binary or continuous values depending on the scoring function for cell-cell interactions/communication.
interaction_elements : dict Dictionary containing all the pairs of cells considered (under the key 'pairs'), Cell instances (under the key 'cells') which include all cells/tissues/organs with their associated datasets (rna_seq, weighted_ppi, etc.) and a Cell-Cell Interaction Matrix to store CCI scores (under the key 'cci_matrix'). A communication matrix is also stored in this object when the communication scores are computed in the InteractionSpace class (under the key 'communication_matrix').
distance_matrix : pandas.DataFrame Contains distances for each pair of cells, computed from the CCI scores previously obtained (and stored in interaction_elements['cci_matrix']).
Source code in cell2cell/core/interaction_space.py
compute_pairwise_cci_scores(cci_score=None, use_ppi_score=False, verbose=True)
Computes overall CCI scores for each pair of cells.
Parameters
cci_score : str, default=None Scoring function to aggregate the communication scores between a pair of cells. It computes an overall potential of cell-cell interactions. If None, it will use the one stored in the attribute analysis_setup of this object. Options:
- 'bray_curtis' : Bray-Curtis-like score
- 'jaccard' : Jaccard-like score
- 'count' : Number of LR pairs that the pair of cells uses
- 'icellnet' : Sum of the L-R expression product of a pair of cells
use_ppi_score : boolean, default=False Whether to weight the LR pairs with the scores specified in the ppi_data when computing the scores.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
self.interaction_elements['cci_matrix'] : pandas.DataFrame Contains CCI scores for each pair of cells
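The four scoring options can be illustrated with a minimal, dependency-free sketch. The function names and the exact formulas below are assumptions made for illustration (the usual Bray-Curtis and Jaccard forms applied to per-LR-pair values of a sender and a receiver); the formulas implemented in cell2cell.core.cci_scores may differ in detail.

```python
def _shared_and_total(ligands, receptors):
    # ligands[i] and receptors[i] are the values of the two partners of LR pair i
    shared = sum(l * r for l, r in zip(ligands, receptors))
    total = sum(l * l for l in ligands) + sum(r * r for r in receptors)
    return shared, total


def bray_curtis_like(ligands, receptors):
    # Assumed form: 2 * shared / (sender total + receiver total)
    shared, total = _shared_and_total(ligands, receptors)
    return 2 * shared / total if total else 0.0


def jaccard_like(ligands, receptors):
    # Assumed form: shared / (sender total + receiver total - shared)
    shared, total = _shared_and_total(ligands, receptors)
    return shared / (total - shared) if total - shared else 0.0


def count_score(ligands, receptors):
    # With binary inputs this is the number of LR pairs both cells can use
    return sum(l * r for l, r in zip(ligands, receptors))


def icellnet_like(ligands, receptors):
    # Sum of the ligand-receptor expression products (continuous inputs)
    return sum(l * r for l, r in zip(ligands, receptors))


binary_ligands = [1, 0, 1, 1]    # sender expresses the ligands of pairs 0, 2, 3
binary_receptors = [1, 1, 0, 1]  # receiver expresses the receptors of pairs 0, 1, 3
print(bray_curtis_like(binary_ligands, binary_receptors))
print(jaccard_like(binary_ligands, binary_receptors))
print(count_score(binary_ligands, binary_receptors))
```

Under these assumed formulas, the Bray-Curtis-like and Jaccard-like scores are bounded between 0 and 1 for binary inputs, while the count and ICELLNET-like scores grow with the number of LR pairs.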
Source code in cell2cell/core/interaction_space.py
compute_pairwise_communication_scores(communication_score=None, use_ppi_score=False, ref_ppi_data=None, interaction_columns=('A', 'B'), cells=None, cci_type=None, verbose=True)
Computes the communication scores for each LR pair in a given pair of sender-receiver cells.
Parameters
communication_score : str, default=None Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. If None, the score stored in the attribute analysis_setup will be used. Available communication_scores are:
- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
use_ppi_score : boolean, default=False Whether to weight the LR pairs with the scores specified in the ppi_data when computing the scores.
ref_ppi_data : pandas.DataFrame, default=None Reference list of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication. It could be the same as 'ppi_data' if ppi_data is not bidirectional (that is, if it does not contain both the ProtA-ProtB and the ProtB-ProtA interaction). ref_ppi must be undirected (it contains only ProtA-ProtB and not the ProtB-ProtA interaction). If None, the one stored in the attribute ref_ppi will be used.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors. If None, the one stored in the attribute interaction_columns will be used.
cells : list, default=None List of cells to consider.
cci_type : str, default=None Type of interaction between two cells. Used to specify whether to consider a LR pair in both directions. If None, the one stored in the attribute analysis_setup will be used. It can be:
- 'undirected'
- 'directed'
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
self.interaction_elements['communication_matrix'] : pandas.DataFrame Contains communication scores for each LR pair in a given pair of sender-receiver cells.
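As a rough illustration of the four communication scores for a single ligand-receptor pair, consider the sketch below. The function name, the `cutoff` parameter, and the use of `>=` for binarization are assumptions for illustration only; in the library, binarization is part of the preprocessing of the expression data rather than of the scoring call.

```python
import math


def sketch_communication_score(ligand, receptor,
                               method="expression_thresholding", cutoff=1.0):
    """Score one LR pair from the ligand expression in the sender cell
    and the receptor expression in the receiver cell."""
    if method == "expression_thresholding":
        # Joint presence after binarizing both values (cutoff is hypothetical)
        return int(ligand >= cutoff) * int(receptor >= cutoff)
    if method == "expression_mean":
        return (ligand + receptor) / 2
    if method == "expression_product":
        return ligand * receptor
    if method == "expression_gmean":
        return math.sqrt(ligand * receptor)
    raise ValueError(f"Unknown communication score: {method}")


for name in ("expression_thresholding", "expression_mean",
             "expression_product", "expression_gmean"):
    print(name, sketch_communication_score(4.0, 9.0, method=name))
```

The thresholding score is binary, whereas the other three are continuous and keep information about expression magnitude.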
Source code in cell2cell/core/interaction_space.py
pair_cci_score(cell1, cell2, cci_score='bray_curtis', use_ppi_score=False, verbose=True)
Computes a CCI score for a pair of cells.
Parameters
cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the sender.
cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the receiver.
cci_score : str, default='bray_curtis' Scoring function to aggregate the communication scores between a pair of cells. It computes an overall potential of cell-cell interactions. If None, it will use the one stored in the attribute analysis_setup of this object. Options:
- 'bray_curtis' : Bray-Curtis-like score
- 'jaccard' : Jaccard-like score
- 'count' : Number of LR pairs that the pair of cells uses
- 'icellnet' : Sum of the L-R expression product of a pair of cells
use_ppi_score : boolean, default=False Whether to weight the LR pairs with the scores specified in the ppi_data when computing the scores.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
cci_score : float Overall score for the interaction between a pair of cell-types/tissues/samples. The kind of score depends on the cci_score parameter (by default, a Bray-Curtis-like score).
Source code in cell2cell/core/interaction_space.py
pair_communication_score(cell1, cell2, communication_score='expression_thresholding', use_ppi_score=False, verbose=True)
Computes a communication score for each protein-protein interaction between a pair of cells.
Parameters
cell1 : cell2cell.core.cell.Cell First cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the sender.
cell2 : cell2cell.core.cell.Cell Second cell-type/tissue/sample to compute the communication score. In a directed interaction, this is the receiver.
communication_score : str, default='expression_thresholding' Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. If None, the score stored in the attribute analysis_setup will be used. Available communication_scores are:
- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
binarizing their gene expression levels.
- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
use_ppi_score : boolean, default=False Whether to weight the LR pairs with the scores specified in the ppi_data when computing the scores.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
communication_scores : numpy.array An array with the communication scores for each intercellular PPI.
Source code in cell2cell/core/interaction_space.py
generate_interaction_elements(modified_rnaseq, ppi_data, cci_type='undirected', cci_matrix_template=None, complex_sep=None, complex_agg_method='min', interaction_columns=('A', 'B'), verbose=True)
Create all elements needed to perform the analyses of pairwise cell-cell interactions/communication. Corresponds to the interaction elements used by the class InteractionSpace.
Parameters
modified_rnaseq : pandas.DataFrame Preprocessed gene expression data for a bulk or single-cell RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes. The preprocessing may correspond to scoring the gene expression as binary or continuous values depending on the scoring function for cell-cell interactions/communication.
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
cci_type : str, default='undirected' Specifies whether to compute the cci_score in a directed or undirected way. For a pair of cells A and B, directed means that the ligands are considered only from cell A and the receptors only from cell B, or vice versa. Undirected, in contrast, simultaneously considers signaling from cell A to cell B and from cell B to cell A.
cci_matrix_template : pandas.DataFrame, default=None A matrix of shape MxM where M are cell-types/tissues/samples. This is used as template for storing CCI scores. It may be useful for specifying which pairs of cells to consider.
complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
complex_agg_method : str, default='min' Method to aggregate the expression value of multiple genes in a complex.
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
interaction_elements : dict Dictionary containing all the pairs of cells considered (under the key 'pairs'), Cell instances (under the key 'cells') which include all cells/tissues/organs with their associated datasets (rna_seq, weighted_ppi, etc.) and a Cell-Cell Interaction Matrix to store CCI scores (under the key 'cci_matrix'). A communication matrix is also stored in this object when the communication scores are computed in the InteractionSpace class (under the key 'communication_matrix').
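To make the roles of complex_sep and complex_agg_method concrete, the following sketch aggregates the expression of the subunits of a multimeric complex. The helper name and the plain dictionary of expression values are hypothetical and only stand in for the DataFrame-based handling in the library.

```python
import math


def sketch_aggregate_complex(name, expression, complex_sep="&", method="min"):
    """Return one expression value for a complex such as 'CD74&CD44'."""
    subunits = name.split(complex_sep)
    values = [expression[gene] for gene in subunits]
    if method == "min":
        return min(values)
    if method == "mean":
        return sum(values) / len(values)
    if method == "gmean":
        # Geometric mean: n-th root of the product of n values
        return math.prod(values) ** (1 / len(values))
    raise ValueError(f"Unknown aggregation method: {method}")


expression = {"CD74": 4.0, "CD44": 9.0}  # toy values for one cell type
for method in ("min", "mean", "gmean"):
    print(method, sketch_aggregate_complex("CD74&CD44", expression, method=method))
```

The 'min' option is the most conservative choice, since the least expressed subunit limits how much of the complex can be assembled.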
Source code in cell2cell/core/interaction_space.py
generate_pairs(cells, cci_type, self_interaction=True, remove_duplicates=True)
Generates a list of pairs of interacting cell-types/tissues/samples.
Parameters
cells : list A list of cell-type/tissue/sample names.
cci_type : str, Type of interactions. Options are:
- 'directed' : Directed cell-cell interactions, so pair A-B is different
to pair B-A and both are considered.
- 'undirected' : Undirected cell-cell interactions, so pair A-B is equal
to pair B-A and just one of them is considered.
self_interaction : boolean, default=True Whether to consider autocrine interactions (pair A-A, B-B, etc.).
remove_duplicates : boolean, default=True Whether to remove duplicates when a list of cells is passed and names are duplicated. If False and a list [A, A, B] is passed, pairs could be [A-A, A-A, A-B, A-A, A-A, A-B, B-A, B-A, B-B] when self_interaction is True and cci_type is 'directed'. In the same scenario but when remove_duplicates is True, the resulting list would be [A-A, A-B, B-A, B-B].
Returns
pairs : list List with pairs of interacting cell-types/tissues/samples.
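The pairing logic described above can be sketched with itertools. This is an illustrative re-implementation under stated assumptions, not the library's code: the function name is hypothetical, and the order of the returned pairs may differ from what generate_pairs produces.

```python
from itertools import combinations_with_replacement, product


def sketch_generate_pairs(cells, cci_type, self_interaction=True,
                          remove_duplicates=True):
    if remove_duplicates:
        # dict.fromkeys keeps the first occurrence of each name, in order
        cells = list(dict.fromkeys(cells))
    if cci_type == "directed":
        # A-B and B-A are different pairs, so take the full Cartesian product
        pairs = product(cells, repeat=2)
    elif cci_type == "undirected":
        # A-B equals B-A, so keep only one of the two orientations
        pairs = combinations_with_replacement(cells, 2)
    else:
        raise ValueError("cci_type must be 'directed' or 'undirected'")
    # Drop autocrine pairs (A-A, B-B, ...) unless they were requested
    return [(a, b) for a, b in pairs if self_interaction or a != b]


print(sketch_generate_pairs(["A", "A", "B"], "directed"))
print(sketch_generate_pairs(["A", "A", "B"], "directed", remove_duplicates=False))
print(sketch_generate_pairs(["A", "A", "B"], "undirected"))
```

With remove_duplicates=False and cci_type='directed', the sketch reproduces the nine pairs listed in the parameter description.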
Source code in cell2cell/core/interaction_space.py
datasets
anndata
balf_covid(filename='BALF-COVID19-Liao_et_al-NatMed-2020.h5ad')
BALF samples from COVID-19 patients. The data consists of 63k immune and epithelial cells in lungs from 3 control, 3 moderate COVID-19, and 6 severe COVID-19 patients.
This dataset was previously published in [1], and this object contains the raw counts for the annotated cell types available in: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE145926
References: [1] Liao, M., Liu, Y., Yuan, J. et al. Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19. Nat Med 26, 842–844 (2020). https://doi.org/10.1038/s41591-020-0901-9
Parameters
filename : str, default='BALF-COVID19-Liao_et_al-NatMed-2020.h5ad'
Path to the h5ad file in case it was manually downloaded.
Returns
Annotated data matrix.
Source code in cell2cell/datasets/anndata.py
gsea_data
gsea_msig(organism='human', pathwaydb='GOBP', readable_name=False)
Load a MSigDB from a gmt file
Parameters
organism : str, default='human' Organism for whom the DB will be loaded. Available options are {'human', 'mouse'}.
pathwaydb: str, default='GOBP' Molecular Signature Database to load. Available options are {'GOBP', 'KEGG', 'Reactome'}
readable_name : boolean, default=False If True, the pathway names are transformed to a more readable format. That is, removing underscores and pathway DB name at the beginning.
Returns
pathway_per_gene : defaultdict Dictionary containing all genes in the DB as keys, and their values are lists with their pathway annotations.
Source code in cell2cell/datasets/gsea_data.py
heuristic_data
HeuristicGOTerms
GO terms for contact and secreted proteins.
Attributes
contact_go_terms : list List of GO terms associated with proteins that participate in contact interactions (usually on the surface of cells).
mediator_go_terms : list List of GO terms associated with secreted proteins that mediate intercellular interactions or communication.
Source code in cell2cell/datasets/heuristic_data.py
random_data
generate_random_cci_scores(cell_number, labels=None, symmetric=True, random_state=None)
Generates a square cell-cell interaction matrix with random scores.
Parameters
cell_number : int Number of cells.
labels : list, default=None List containing a label for each cell. The length of this list must match the cell_number.
symmetric : boolean, default=True Whether to generate a symmetric CCI matrix.
random_state : int, default=None Seed for randomization.
Returns
cci_matrix : pandas.DataFrame Matrix with rows and columns as cells. Values represent a random CCI score between 0 and 1.
Source code in cell2cell/datasets/random_data.py
generate_random_metadata(cell_labels, group_number)
Randomly assigns groups to cell labels.
Parameters
cell_labels : list A list of cell labels.
group_number : int Number of major groups of cells.
Returns
metadata : pandas.DataFrame DataFrame containing the major groups that each cell received randomly (under column 'Group'). Cells are under the column 'Cell'.
Source code in cell2cell/datasets/random_data.py
generate_random_ppi(max_size, interactors_A, interactors_B=None, random_state=None, verbose=True)
Generates a random list of protein-protein interactions.
Parameters
max_size : int Maximum number of interactions to obtain. The PPIs are obtained by independently resampling interactors A and B rather than by creating all possible combinations (which may demand too much memory). As a result, some PPIs can be duplicated, and dropping them may leave fewer PPIs than max_size.
interactors_A : list A list of protein names to include in the first column of the PPIs.
interactors_B : list, default=None A list of protein names to include in the second column of the PPIs. If None, interactors_A will be used as interactors_B too.
random_state : int, default=None Seed for randomization.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
ppi_data : pandas.DataFrame DataFrame containing a list of protein-protein interactions. It has three columns: 'A', 'B', and 'score' for interactors A, B and weights of interactions, respectively.
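The reason the result can be smaller than max_size is easier to see in a small sketch. The version below is a simplified stand-in (hypothetical function name, standard-library random module, tuples instead of a DataFrame, no 'score' column) for the resample-then-deduplicate idea.

```python
import random


def sketch_random_ppi(max_size, interactors_a, interactors_b=None,
                      random_state=None):
    rng = random.Random(random_state)
    if interactors_b is None:
        interactors_b = interactors_a
    # Resample each column independently, with replacement
    column_a = rng.choices(interactors_a, k=max_size)
    column_b = rng.choices(interactors_b, k=max_size)
    # Drop repeated (A, B) pairs while keeping the first-seen order
    return list(dict.fromkeys(zip(column_a, column_b)))


ppis = sketch_random_ppi(50, ["P1", "P2", "P3"], random_state=0)
print(len(ppis))  # at most 9 here: only 3 x 3 distinct pairs exist
```

Requesting 50 interactions from three proteins can yield at most nine unique pairs, so the output is necessarily shorter than max_size.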
Source code in cell2cell/datasets/random_data.py
generate_random_rnaseq(size, row_names, random_state=None, verbose=True)
Generates a RNA-seq dataset that is normally distributed gene-wise and size normalized (each column sums up to a million).
Parameters
size : int Number of cell-types/tissues/samples (columns).
row_names : array-like List containing the name of genes (rows).
random_state : int, default=None Seed for randomization.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
df : pandas.DataFrame Dataframe containing gene expression given the list of genes for each cell-type/tissue/sample.
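Size normalization, where each column sums to a million, can be sketched as follows. The function name, the mean and standard deviation of the normal draws, and the clipping at zero are assumptions for illustration; the library returns a pandas.DataFrame rather than a dictionary of lists.

```python
import random


def sketch_random_rnaseq(size, row_names, random_state=None):
    rng = random.Random(random_state)
    data = {}
    for sample in range(size):
        # Hypothetical distribution parameters; negative draws are clipped to 0
        raw = [max(rng.gauss(10.0, 2.0), 0.0) for _ in row_names]
        total = sum(raw)
        # Rescale so that the column adds up to one million
        data[f"Sample{sample + 1}"] = [value * 1e6 / total for value in raw]
    return data


toy = sketch_random_rnaseq(3, ["GeneA", "GeneB", "GeneC", "GeneD"], random_state=0)
for sample, values in toy.items():
    print(sample, round(sum(values)))
```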
Source code in cell2cell/datasets/random_data.py
toy_data
generate_toy_distance()
Generates a square matrix with cell-cell distance.
Returns
distance : pandas.DataFrame DataFrame with Euclidean-like distance between each pair of cells in the toy RNA-seq dataset.
Source code in cell2cell/datasets/toy_data.py
generate_toy_metadata()
Generates metadata for cells in the toy RNA-seq dataset.
Returns
metadata : pandas.DataFrame DataFrame with metadata for each cell. Metadata contains the major groups of those cells.
Source code in cell2cell/datasets/toy_data.py
generate_toy_ppi(prot_complex=False)
Generates a toy list of protein-protein interactions.
Parameters
prot_complex : boolean, default=False Whether to include PPIs where interactors could contain multimeric complexes.
Returns
ppi : pandas.DataFrame Dataframe containing PPIs. Columns are 'A' (first interacting partners), 'B' (second interacting partners) and 'score' for weighting each PPI.
Source code in cell2cell/datasets/toy_data.py
generate_toy_rnaseq()
Generates a toy RNA-seq dataset
Returns
rnaseq : pandas.DataFrame DataFrame containing the toy RNA-seq dataset. Columns are cells and rows are genes.
Source code in cell2cell/datasets/toy_data.py
external
goenrich
gene2go(filename, experimental=False, tax_id=9606, **kwds)
read go-annotation file
Source code in cell2cell/external/goenrich.py
goa(filename, experimental=True, **kwds)
read go-annotation file
Source code in cell2cell/external/goenrich.py
ontology(file)
read ontology from file
Source code in cell2cell/external/goenrich.py
sgd(filename, experimental=False, **kwds)
read yeast genome database go-annotation file
Source code in cell2cell/external/goenrich.py
gseapy
generate_lr_geneset(lr_list, complex_sep=None, lr_sep='^', pathway_per_gene=None, organism='human', pathwaydb='GOBP', min_pathways=15, max_pathways=10000, readable_name=False, output_folder=None)
Generate a gene set from a list of LR pairs.
Parameters
lr_list : list List of LR pairs.
complex_sep : str, default=None Separator of the members of a complex. If None, the ligand and receptor are assumed to be single genes.
lr_sep : str, default='^' Separator of the ligand and receptor in the LR pair.
pathway_per_gene : dict, default=None
Dictionary with genes as keys and pathways as values.
You can pass this if you are using different annotations than those
available resources in cell2cell.datasets.gsea_data.gsea_msig().
organism : str, default='human' Organism for whom the DB will be loaded. Available options are {'human', 'mouse'}.
pathwaydb: str, default='GOBP' Molecular Signature Database to load. Available options are {'GOBP', 'KEGG', 'Reactome'}
min_pathways : int, default=15 Minimum number of pathways that a LR pair can be annotated to.
max_pathways : int, default=10000 Maximum number of pathways that a LR pair can be annotated to.
readable_name : boolean, default=False If True, the pathway names are transformed to a more readable format.
output_folder : str, default=None Path to store the GMT file. If None, it stores the gmt file in the current directory.
Returns
lr_set : dict Dictionary with pathways as keys and LR pairs as values.
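The interplay between lr_sep and complex_sep can be shown by splitting an LR pair name into its genes. The helper below is hypothetical and covers only the name parsing; how pathways are then assigned to each LR pair is not described here and is left out of the sketch.

```python
def sketch_split_lr_pair(lr_pair, lr_sep="^", complex_sep=None):
    """Split 'LIGAND^RECEPTOR' into lists of ligand and receptor genes."""
    ligand, receptor = lr_pair.split(lr_sep)
    if complex_sep is None:
        # Without a complex separator, each partner is taken as a single gene
        return [ligand], [receptor]
    return ligand.split(complex_sep), receptor.split(complex_sep)


print(sketch_split_lr_pair("MIF^CD74&CD44", complex_sep="&"))
print(sketch_split_lr_pair("MIF^CD74&CD44"))
```

Leaving complex_sep as None treats "CD74&CD44" as one gene name, which would typically fail to match any pathway annotation.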
Source code in cell2cell/external/gseapy.py
load_gmt(filename, backup_url=None, readable_name=False)
Load a GMT file.
Parameters
filename : str Path to the GMT file.
backup_url : str, default=None URL to download the GMT file from if not present locally.
readable_name : boolean, default=False If True, the pathway names are transformed to a more readable format. That is, removing underscores and pathway DB name at the beginning.
Returns
pathway_per_gene : dict Dictionary with genes as keys and pathways as values.
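For orientation, a GMT file stores one gene set per line as tab-separated fields: the set name, a description, and then the member genes. The sketch below inverts such lines into a gene-to-pathways dictionary. The function name and the exact rule used for readable names (dropping everything up to the first underscore, then replacing underscores with spaces) are assumptions, not the library's implementation.

```python
from collections import defaultdict


def sketch_parse_gmt(lines, readable_name=False):
    pathway_per_gene = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        name, genes = fields[0], fields[2:]
        if readable_name:
            # e.g. 'GOBP_CELL_ADHESION' -> 'CELL ADHESION' (assumed rule)
            name = name.split("_", 1)[-1].replace("_", " ")
        for gene in genes:
            if gene:
                pathway_per_gene[gene].append(name)
    return pathway_per_gene


toy_gmt = [
    "GOBP_CELL_ADHESION\tdescription\tCD44\tITGB1",
    "GOBP_IMMUNE_RESPONSE\tdescription\tCD44\tCD74",
]
print(dict(sketch_parse_gmt(toy_gmt, readable_name=True)))
```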
Source code in cell2cell/external/gseapy.py
run_gsea(loadings, lr_set, output_folder, weight=1, min_size=15, permutations=999, processes=6, random_state=6, significance_threshold=0.05)
Run GSEA using the LR gene set.
Parameters
loadings : pandas.DataFrame Dataframe with the loadings of the LR pairs for each factor.
lr_set : dict Dictionary with pathways as keys and LR pairs as values. LR pairs must match the indexes in the loadings dataframe.
output_folder : str Path to the output folder.
weight : int, default=1 Weight to use for score underlying the GSEA (parameter p).
min_size : int, default=15 Minimum number of LR pairs that a pathway must contain.
permutations : int, default=999 Number of permutations to use for the GSEA. The total permutations will be this number plus 1 (this extra case is the unpermuted one).
processes : int, default=6 Number of processes to use for the GSEA.
random_state : int, default=6 Random seed to use for the GSEA.
significance_threshold : float, default=0.05 Significance threshold to use for the FDR correction.
Returns
pvals : pandas.DataFrame Dataframe containing the P-values for each pathway (rows) in each of the factors (columns).
score : pandas.DataFrame Dataframe containing the Normalized Enrichment Scores (NES) for each pathway (rows) in each of the factors (columns).
gsea_df : pandas.DataFrame Dataframe with the detailed GSEA results.
Source code in cell2cell/external/gseapy.py
pcoa
pcoa(distance_matrix, method='eigh', number_of_dimensions=0, inplace=False)
Perform Principal Coordinate Analysis.
Principal Coordinate Analysis (PCoA) is a method similar to Principal Components Analysis (PCA) with the difference that PCoA operates on distance matrices, typically with non-Euclidean and thus ecologically meaningful distances like UniFrac in microbiome research.
In ecology, the Euclidean distance preserved by PCA is often not a good choice because it deals poorly with double zeros. Species have unimodal distributions along environmental gradients, so if a species is absent from two sites, it can't be known whether an environmental variable is too high in one of them and too low in the other, or too low in both, etc. On the other hand, if a species is present in two sites, that means that the sites are similar.
Note that the returned eigenvectors are not normalized to unit length.
Parameters
distance_matrix : pandas.DataFrame
A distance matrix.
method : str, optional
Eigendecomposition method to use in performing PCoA.
By default, uses SciPy's eigh, which computes exact
eigenvectors and eigenvalues for all dimensions. The alternate
method, fsvd, uses faster heuristic eigendecomposition but loses
accuracy. The magnitude of accuracy lost is dependent on dataset.
number_of_dimensions : int, optional
Dimensions to reduce the distance matrix to. This number determines
how many eigenvectors and eigenvalues will be returned.
By default, equal to the number of dimensions of the distance matrix,
as default eigendecomposition using SciPy's eigh method computes
all eigenvectors and eigenvalues. If using fast heuristic
eigendecomposition through fsvd, a desired number of dimensions
should be specified. Note that the default eigendecomposition
method eigh does not natively support a specifying number of
dimensions to reduce a matrix to, so if this parameter is specified,
all eigenvectors and eigenvalues will be simply be computed with no
speed gain, and only the number specified by number_of_dimensions
will be returned. Specifying a value of 0, the default, will
set number_of_dimensions equal to the number of dimensions of the
specified distance_matrix.
inplace : bool, optional
If true, centers a distance matrix in-place in a manner that reduces
memory consumption.
Returns
OrdinationResults Object that stores the PCoA results, including eigenvalues, the proportion explained by each of them, and transformed sample coordinates.
See Also
OrdinationResults
Notes
If the distance is not Euclidean (for example if it is a semimetric and the triangle inequality doesn't hold), negative eigenvalues can appear. There are different ways to deal with that problem (see Legendre & Legendre 1998, Section 9.2.3), but none are currently implemented here. However, a warning is raised whenever negative eigenvalues appear, allowing the user to decide if they can be safely ignored.
Source code in cell2cell/external/pcoa.py
pcoa_biplot(ordination, y)
Compute the projection of descriptors into a PCoA matrix. This implementation is as described in Chapter 9 of Legendre & Legendre, Numerical Ecology 3rd edition.
Parameters
ordination: OrdinationResults
The computed principal coordinates analysis of dimensions (n, c) where
the matrix y will be projected onto.
y: DataFrame
Samples by features table of dimensions (n, m). These can be
environmental features or abundance counts. This table should be
normalized in cases of dimensionally heterogeneous physical variables.
Returns
OrdinationResults
The modified input object that includes projected features onto the
ordination space in the features attribute.
Source code in cell2cell/external/pcoa.py
pcoa_utils
center_distance_matrix(distance_matrix, inplace=False)
Centers a distance matrix. Note: If the distance used was Euclidean, pairwise distances needn't be computed from the data table Y because F_matrix = Y.dot(Y.T) (if Y has been centered). But since we're expecting distance_matrix to be non-Euclidean, we do the following computation as per Numerical Ecology (Legendre & Legendre 1998).
Parameters
distance_matrix : 2D array_like Distance matrix.
inplace : bool, optional Whether or not to center the given distance matrix in-place, which is more efficient in terms of memory and computation.
Source code in cell2cell/external/pcoa_utils.py
corr(x, y=None)
Computes correlation between columns of x, or x and y.
Correlation is covariance of (columnwise) standardized matrices,
so each matrix is first centered and scaled to have variance one,
and then their covariance is computed.
Parameters
x : 2D array_like
Matrix of shape (n, p). Correlation between its columns will
be computed.
y : 2D array_like, optional
Matrix of shape (n, q). If provided, the correlation is
computed between the columns of x and the columns of
y. Else, it's computed between the columns of x.
Returns
correlation
Matrix of computed correlations. Has shape (p, p) if y is
not provided, else has shape (p, q).
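The statement that correlation is the covariance of standardized columns can be checked with a small, dependency-free sketch for a single pair of columns. The function names are hypothetical, and the choice of ddof=1 for both the scaling and the covariance is an assumption; any consistent choice gives the same correlation.

```python
import math


def sketch_standardize(column, ddof=1):
    # Center the column and scale it to unit variance
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - ddof))
    return [(v - mean) / std for v in column]


def sketch_corr(x_column, y_column, ddof=1):
    # Covariance of the two standardized columns
    zx = sketch_standardize(x_column, ddof)
    zy = sketch_standardize(y_column, ddof)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x_column) - ddof)


print(sketch_corr([1, 2, 3, 4], [2, 4, 6, 8]))
print(sketch_corr([1, 2, 3], [1, 3, 2]))
```

For the matrix version described above, the same computation would be repeated for every pair of columns to fill a (p, p) or (p, q) result.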
Source code in cell2cell/external/pcoa_utils.py
e_matrix(distance_matrix)
Compute E matrix from a distance matrix. Squares and divides by -2 the input elementwise. Eq. 9.20 in Legendre & Legendre 1998.
f_matrix(E_matrix)
Compute F matrix from E matrix. Centring step: for each element, the means of the corresponding row and column are subtracted, and the mean of the whole matrix is added. Eq. 9.21 in Legendre & Legendre 1998.
Source code in cell2cell/external/pcoa_utils.py
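The two steps described for e_matrix and f_matrix can be sketched in plain Python as follows. This is a minimal illustration, not the library code; the function names e_matrix_sketch and f_matrix_sketch are hypothetical, and the input is assumed to be a square list of lists.

```python
def e_matrix_sketch(distance_matrix):
    """E = -0.5 * D**2, computed elementwise (Eq. 9.20)."""
    return [[-0.5 * d * d for d in row] for row in distance_matrix]

def f_matrix_sketch(e):
    """Subtract row and column means, then add the grand mean (Eq. 9.21)."""
    n = len(e)
    row_means = [sum(row) / n for row in e]
    col_means = [sum(col) / n for col in zip(*e)]
    grand_mean = sum(row_means) / n
    return [[e[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

# Pairwise distances between three points located at 0, 1 and 3 on a line.
d = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 2.0],
     [3.0, 2.0, 0.0]]
f = f_matrix_sketch(e_matrix_sketch(d))
```

Because these distances are Euclidean, f matches Y.dot(Y.T) for the centered coordinates (-4/3, -1/3, 5/3), and every row and column of f sums to zero.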
mean_and_std(a, axis=None, weights=None, with_mean=True, with_std=True, ddof=0)
Compute the weighted average and standard deviation along the specified axis.
Parameters
a : array_like
Calculate average and standard deviation of these values.
axis : int, optional
Axis along which the statistics are computed. The default is
to compute them on the flattened array.
weights : array_like, optional
An array of weights associated with the values in a. Each
value in a contributes to the average according to its
associated weight. The weights array can either be 1-D (in
which case its length must be the size of a along the given
axis) or of the same shape as a. If weights=None, then all
data in a are assumed to have a weight equal to one.
with_mean : bool, optional, defaults to True
Compute average if True.
with_std : bool, optional, defaults to True
Compute standard deviation if True.
ddof : int, optional, defaults to 0
It means delta degrees of freedom. Variance is calculated by
dividing by n - ddof (where n is the number of
elements). By default it computes the maximum likelihood
estimator.
Returns
average, std
Return the average and standard deviation along the specified
axis. If any of them was not required, returns None instead.
Source code in cell2cell/external/pcoa_utils.py
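A minimal sketch of the weighted statistics described above, assuming a flat list of values and ddof=0 only. The name weighted_mean_and_std is hypothetical, and axis handling and the ddof correction are left out.

```python
import math

def weighted_mean_and_std(values, weights=None):
    """Weighted average and standard deviation of a flat list (ddof=0)."""
    if weights is None:
        weights = [1.0] * len(values)  # every value gets a weight of one
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, math.sqrt(variance)

mean, std = weighted_mean_and_std([1.0, 2.0, 3.0, 4.0])
w_mean, w_std = weighted_mean_and_std([1.0, 3.0], weights=[3.0, 1.0])
```

In the second call, the value 1.0 counts three times as much as 3.0, which pulls the average down to 1.5.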
scale(a, weights=None, with_mean=True, with_std=True, ddof=0, copy=True)
Scale array by columns to have weighted average 0 and standard deviation 1.
Parameters
a : array_like
2D array whose columns are standardized according to the
weights.
weights : array_like, optional
Array of weights associated with the columns of a. By
default, the scaling is unweighted.
with_mean : bool, optional, defaults to True
Center columns to have 0 weighted mean.
with_std : bool, optional, defaults to True
Scale columns to have unit weighted std.
ddof : int, optional, defaults to 0
If with_std is True, variance is calculated by dividing by n
- ddof (where n is the number of elements). By default it
computes the maximum likelihood estimator.
copy : bool, optional, defaults to True
If True, return a standardized copy of a. If False, perform the
standardization in place.
Returns
2D ndarray
Scaled array.
Notes
Wherever std equals 0, it is replaced by 1 in order to avoid division by zero.
Source code in cell2cell/external/pcoa_utils.py
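The note about zero standard deviations can be illustrated with an unweighted sketch in plain Python. The name scale_columns is hypothetical, and weights and ddof are omitted for brevity.

```python
import math

def scale_columns(rows):
    """Standardize each column to mean 0 and (population) std 1."""
    n = len(rows)
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        if std == 0:
            std = 1.0  # constant column: replace 0 by 1 to avoid dividing by zero
        scaled_cols.append([(v - mean) / std for v in col])
    # Transpose back to the original rows-by-columns layout.
    return [list(row) for row in zip(*scaled_cols)]

# The second column is constant, so it is only centered.
scaled = scale_columns([[1.0, 5.0], [3.0, 5.0]])
```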
svd_rank(M_shape, S, tol=None)
Matrix rank of M given its singular values S.
See np.linalg.matrix_rank for a rationale on the tolerance
(we're not using that function because it doesn't let us reuse a
precomputed SVD).
Source code in cell2cell/external/pcoa_utils.py
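A sketch of the idea behind svd_rank, assuming the default tolerance follows np.linalg.matrix_rank (largest singular value times the largest dimension times machine epsilon). The name svd_rank_sketch is hypothetical.

```python
import sys

def svd_rank_sketch(m_shape, singular_values, tol=None):
    """Count the singular values that exceed a tolerance."""
    if tol is None:
        # Assumed default, in the spirit of np.linalg.matrix_rank.
        tol = max(singular_values) * max(m_shape) * sys.float_info.epsilon
    return sum(1 for s in singular_values if s > tol)

# The third singular value is numerically zero for a 4 x 3 matrix.
rank = svd_rank_sketch((4, 3), [5.0, 2.0, 1e-18])
strict_rank = svd_rank_sketch((4, 3), [5.0, 2.0, 1e-18], tol=3.0)
```

Passing precomputed singular values avoids running a second SVD just to obtain the rank.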
umap
run_umap(rnaseq_data, axis=1, metric='euclidean', min_dist=0.4, n_neighbors=8, random_state=None, **kwargs)
Runs UMAP on an expression matrix.
Parameters
rnaseq_data : pandas.DataFrame A dataframe of gene expression values wherein the rows are the genes or embeddings of a dimensionality reduction method and columns the cells, tissues or samples.
axis : int, default=1 An axis of the dataframe (0 across rows, 1 across columns). Across rows means that the UMAP is to compare genes, while across columns is to compare cells, tissues or samples.
metric : str, default='euclidean' The distance metric to use. The distance function can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
min_dist: float, default=0.4
The effective minimum distance between embedded points. Smaller values
will result in a more clustered/clumped embedding where nearby points
on the manifold are drawn closer together, while larger values will
result in a more even dispersal of points. The value should be set
relative to the spread value, which determines the scale at which
embedded points will be spread out.
n_neighbors: float, default=8 The size of local neighborhood (in terms of number of neighboring sample points) used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. In general values should be in the range 2 to 100.
random_state : int, default=None Seed for randomization.
**kwargs : dict Extra arguments for UMAP as defined in umap.UMAP.
Returns
umap_df : pandas.DataFrame Dataframe containing the UMAP embeddings for the axis analyzed. Contains columns 'umap1' and 'umap2'.
Source code in cell2cell/external/umap.py
io
directories
create_directory(pathname)
Creates a directory.
Uses a path to create a directory. It creates all intermediate folders before creating the leaf folder.
Parameters
pathname : str Full path of the folder to create.
Source code in cell2cell/io/directories.py
get_files_from_directory(pathname, dir_in_filepath=False)
Obtains a list of filenames in a folder.
Parameters
pathname : str Full path of the folder to explore.
dir_in_filepath : boolean, default=False
Whether adding pathname to the filenames
Returns
filenames : list A list containing the names (strings) of the files in the folder.
Source code in cell2cell/io/directories.py
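The behavior described for the two directory helpers can be approximated with the standard library alone. This is a sketch, not the code in directories.py; the folder and file names are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Create nested folders in one call; intermediate folders are made too.
    target = os.path.join(base, "results", "figures")
    os.makedirs(target, exist_ok=True)
    created = os.path.isdir(target)

    # Add two empty files so that there is something to list.
    for name in ("a.tsv", "b.tsv"):
        open(os.path.join(target, name), "w").close()

    # Names only, which is what dir_in_filepath=False suggests.
    filenames = sorted(
        f for f in os.listdir(target)
        if os.path.isfile(os.path.join(target, f))
    )
    # Names joined with the folder, which is what dir_in_filepath=True suggests.
    filepaths = [os.path.join(target, f) for f in filenames]
```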
read_data
load_cutoffs(cutoff_file, gene_column=None, drop_nangenes=True, log_transformation=False, verbose=True, **kwargs)
Loads a table of cutoff or thresholding values for each gene.
Parameters
cutoff_file : str Absolute path to a file containing thresholding values for genes. Genes are rows and threshold values are in the only column beyond the one containing the gene names.
gene_column : str, default=None Column name where the gene labels are contained. If None, the first column will be assumed to contain gene names.
drop_nangenes : boolean, default=True Whether dropping empty genes across all columns.
log_transformation : boolean, default=False Whether applying a log10 transformation on the data.
verbose : boolean, default=True Whether printing or not steps of the analysis.
**kwargs : dict Extra arguments passed to the function cell2cell.io.read_data.load_table when loading files.
Returns
cutoff_data : pandas.DataFrame Dataframe with the cutoff values for each gene. Rows are genes and just one column is included, which corresponds to 'value', wherein the thresholding or cutoff values are contained.
Source code in cell2cell/io/read_data.py
load_go_annotations(goa_file, experimental_evidence=True, verbose=True)
Loads GO annotations for each gene in a given organism.
Parameters
goa_file : str Absolute path to a ga file. It can also be a URL, for example: goa_file = 'http://current.geneontology.org/annotations/wb.gaf.gz'
experimental_evidence : boolean, default=True Whether considering only annotations with experimental evidence (at least one article/evidence).
verbose : boolean, default=True Whether printing or not steps of the analysis.
Returns
goa : pandas.DataFrame Dataframe containing information about GO term annotations of each gene for a given organism according to the ga file.
Source code in cell2cell/io/read_data.py
load_go_terms(go_terms_file, verbose=True)
Loads GO term information from an obo-basic file.
Parameters
go_terms_file : str Absolute path to an obo file. It can also be a URL, for example: go_terms_file = 'http://purl.obolibrary.org/obo/go/go-basic.obo'
verbose : boolean, default=True Whether printing or not steps of the analysis.
Returns
go_terms : networkx.Graph NetworkX Graph containing GO terms datasets from .obo file.
Source code in cell2cell/io/read_data.py
load_metadata(metadata_file, cell_labels=None, index_col=None, **kwargs)
Loads a metadata table for a given list of cells.
Parameters
metadata_file : str Absolute path to a file containing a metadata table for cell-types/tissues/samples in an RNA-seq dataset.
cell_labels : list, default=None List of cell-types/tissues/samples to consider. Names must match the labels in the metadata table. These names must be contained in the values of the column indicated by index_col.
index_col : str, default=None Column to be considered the index of the metadata. If None, the index will be the row numbers.
**kwargs : dict Extra arguments passed to the function cell2cell.io.read_data.load_table when loading files.
Returns
meta : pandas.DataFrame Metadata for the cell-types/tissues/samples provided.
Source code in cell2cell/io/read_data.py
load_ppi(ppi_file, interaction_columns, sort_values=None, score=None, rnaseq_genes=None, complex_sep=None, dropna=False, strna='', upper_letter_comparison=False, verbose=True, **kwargs)
Loads a list of protein-protein interactions from a table and returns it in a simplified format.
Parameters
ppi_file : str Absolute path to a file containing a list of protein-protein interactions.
interaction_columns : tuple Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors. Example: ('partner_A', 'partner_B').
sort_values : str, default=None Name of the column used for sorting the table. If it is not None, the whole dataframe will be sorted by that column in ascending order.
score : str, default=None Column name of a column containing weights to consider in the cell-cell interactions/communication analyses. If None, no weights are used and PPIs are assumed to have an equal contribution to CCI and CCC scores.
rnaseq_genes : list, default=None List of genes in an RNA-seq dataset to filter the list of PPIs. If None, the entire list will be used.
complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44". If None, it is assumed that the list does not contain complexes.
dropna : boolean, default=False Whether dropping PPIs with any missing information.
strna : str, default='' If dropna is False, missing values will be filled with strna.
upper_letter_comparison : boolean, default=False Whether making uppercase the gene names in the expression matrices and the protein names in the ppi_data to match their names and integrate their respective expression level. Useful when there are inconsistencies in the names between the expression matrix and the ligand-receptor annotations.
**kwargs : dict Extra arguments passed to the function cell2cell.io.read_data.load_table when loading files.
Returns
simplified_ppi : pandas.DataFrame A simplified list of PPIs. In this case, interaction_columns are renamed into 'A' and 'B' for the first and second interacting proteins, respectively. A third column 'score' is included, containing weights of PPIs.
Source code in cell2cell/io/read_data.py
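To make the simplified format concrete, here is a small plain-Python sketch of the renaming and filtering described above. It is not the code of load_ppi: the name simplify_ppi is hypothetical, rows are assumed to be dictionaries, the weight of 1.0 for unscored PPIs is an assumption, and complexes (complex_sep) are not handled.

```python
def simplify_ppi(rows, interaction_columns, score=None, rnaseq_genes=None):
    """Rename the partner columns to 'A' and 'B' and attach a 'score' column."""
    col_a, col_b = interaction_columns
    genes = None if rnaseq_genes is None else set(rnaseq_genes)
    simplified = []
    for row in rows:
        a, b = row[col_a], row[col_b]
        # Optionally keep only pairs whose two partners were measured.
        if genes is not None and not {a, b} <= genes:
            continue
        # Without a score column every PPI gets the same weight (assumed 1.0).
        weight = 1.0 if score is None else row[score]
        simplified.append({"A": a, "B": b, "score": weight})
    return simplified

rows = [
    {"partner_A": "TGFB1", "partner_B": "TGFBR1"},
    {"partner_A": "CXCL12", "partner_B": "CXCR4"},
]
ppi = simplify_ppi(rows, ("partner_A", "partner_B"),
                   rnaseq_genes=["TGFB1", "TGFBR1", "CXCL12"])
```

Only the first pair survives here because CXCR4 is missing from the gene list.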
load_rnaseq(rnaseq_file, gene_column, drop_nangenes=True, log_transformation=False, verbose=True, **kwargs)
Loads a gene expression matrix for an RNA-seq experiment. Preprocessing steps can be done on-the-fly.
Parameters
rnaseq_file : str Absolute path to a file containing a gene expression matrix. Genes are rows and cell-types/tissues/samples are columns.
gene_column : str Column name where the gene labels are contained.
drop_nangenes : boolean, default=True Whether dropping empty genes across all columns.
log_transformation : boolean, default=False Whether applying a log10 transformation on the data.
verbose : boolean, default=True Whether printing or not steps of the analysis.
**kwargs : dict Extra arguments passed to the function cell2cell.io.read_data.load_table when loading files.
Returns
rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment or a single-cell experiment after aggregation into cell types. Columns are cell-types/tissues/samples and rows are genes.
Source code in cell2cell/io/read_data.py
load_table(filename, format='auto', sep='\t', sheet_name=0, compression=None, verbose=True, **kwargs)
Opens a file containing a table into a pandas dataframe.
Parameters
filename : str Absolute path to a file storing a table.
format : str, default='auto' Format of the file. Options are:
- 'auto' : Automatically determines the format given
the file extension. Files ending with .gz will be
considered as tsv files.
- 'excel' : An excel file, either .xls or .xlsx
- 'csv' : Comma separated value format
- 'tsv' : Tab separated value format
- 'txt' : Text file
sep : str, default='\t' Separation between columns. Examples are: '\t', ' ', ';', ',', etc.
sheet_name : str, int, list, or None, default=0 Strings are used for sheet names. Integers are used in zero-indexed sheet positions. Lists of strings/integers are used to request multiple sheets. Specify None to get all sheets. Available cases:
- Defaults to 0: 1st sheet as a DataFrame
- 1: 2nd sheet as a DataFrame
- "Sheet1": Load sheet with name “Sheet1”
- [0, 1, "Sheet5"]: Load first, second and sheet named
“Sheet5” as a dict of DataFrame
- None: All sheets.
compression : str, or None, default=None For on-the-fly decompression of on-disk data. If 'infer', detects compression from the following extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise no decompression). If using 'zip', the ZIP file must contain only one data file to be read in. Set to None for no decompression. Options: {'infer', 'gzip', 'bz2', 'zip', 'xz', None}
verbose : boolean, default=True Whether printing or not steps of the analysis.
**kwargs : dict Extra arguments for loading files with the respective pandas function given the format of the file.
Returns
table : pandas.DataFrame Dataframe containing the table stored in a file.
Source code in cell2cell/io/read_data.py
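The 'auto' option can be pictured as a lookup on the file extension. The sketch below follows the options listed above; the name detect_format is hypothetical, and raising an error for unknown extensions is an assumption, since the behavior in that case is not documented here.

```python
import os

def detect_format(filename):
    """Guess the table format from the file extension."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in (".xls", ".xlsx"):
        return "excel"
    if extension == ".csv":
        return "csv"
    if extension in (".tsv", ".gz"):
        return "tsv"  # files ending with .gz are treated as tsv files
    if extension == ".txt":
        return "txt"
    raise ValueError("Unrecognized file extension: " + extension)

formats = [detect_format(name)
           for name in ("counts.tsv.gz", "ppi.XLSX", "meta.csv")]
```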
load_tables_from_directory(pathname, extension, sep='\t', sheet_name=0, compression=None, verbose=True, **kwargs)
Opens all tables with the same extension in a folder.
Parameters
pathname : str Full path of the folder to explore.
extension : str Extension of the file. Options are:
- 'excel' : An excel file, either .xls or .xlsx
- 'csv' : Comma separated value format
- 'tsv' : Tab separated value format
- 'txt' : Text file
sep : str, default='\t' Separation between columns. Examples are: '\t', ' ', ';', ',', etc.
sheet_name : str, int, list, or None, default=0 Strings are used for sheet names. Integers are used in zero-indexed sheet positions. Lists of strings/integers are used to request multiple sheets. Specify None to get all sheets. Available cases:
- Defaults to 0: 1st sheet as a DataFrame
- 1: 2nd sheet as a DataFrame
- "Sheet1": Load sheet with name “Sheet1”
- [0, 1, "Sheet5"]: Load first, second and sheet named
“Sheet5” as a dict of DataFrame
- None: All sheets.
compression : str, or None, default=None For on-the-fly decompression of on-disk data. If 'infer', detects compression from the following extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise no decompression). If using 'zip', the ZIP file must contain only one data file to be read in. Set to None for no decompression. Options: {'gzip', 'bz2', 'zip', 'xz', None}
verbose : boolean, default=True Whether printing or not steps of the analysis.
**kwargs : dict Extra arguments for loading files with the respective pandas function given the format of the file.
Returns
data : dict Dictionary containing the tables (pandas.DataFrame) loaded from the files. Keys are the filenames without the extension and values are the dataframes.
Source code in cell2cell/io/read_data.py
load_tensor(filename, backend=None, device=None)
Imports a communication tensor that could be used with Tensor-cell2cell.
Parameters
filename : str Absolute path to a file storing a communication tensor that was previously saved by using pickle.
backend : str, default=None Backend that TensorLy will use to perform calculations on this tensor. When None, the currently active backend is used, which is usually 'numpy'. Options are:
device : str, default=None Device to use when backend allows using multiple devices. Options are:
Returns
interaction_tensor : cell2cell.tensor.BaseTensor A communication tensor generated with any of the tensor class in cell2cell.tensor.
Source code in cell2cell/io/read_data.py
load_tensor_factors(filename)
Imports factors previously exported from a tensor decomposition done in a cell2cell.tensor.BaseTensor-like object.
Parameters
filename : str Absolute path to an excel file containing the factors, their loadings, and element names for each of the dimensions of a previously decomposed tensor.
Returns
factors : collections.OrderedDict An ordered dictionary wherein keys are the names of each tensor dimension, and values are the loadings in a pandas.DataFrame. In this dataframe, rows are the elements of the respective dimension and columns are the factors from the tensor factorization. Values are the corresponding loadings.
Source code in cell2cell/io/read_data.py
load_variable_with_pickle(filename)
Imports a large size variable stored in a file previously exported with pickle.
Parameters
filename : str Absolute path to a file storing a python variable that was previously created by using pickle.
Returns
variable : a python variable The variable of interest.
Source code in cell2cell/io/read_data.py
save_data
export_variable_with_pickle(variable, filename)
Exports a large size variable in a python readable way using pickle.
Parameters
variable : a python variable Variable to export
filename : str Complete path to the file wherein the variable will be stored. For example: /home/user/variable.pkl
Source code in cell2cell/io/save_data.py
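The export and import helpers above are described as thin wrappers around pickle. A round trip with the standard library alone looks roughly like this; the variable contents and the file name are made up for the example.

```python
import os
import pickle
import tempfile

variable = {"factors": [0.1, 0.9], "rank": 2}

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "variable.pkl")
    # Export: serialize the variable into a binary file.
    with open(filename, "wb") as handle:
        pickle.dump(variable, handle)
    # Import: read the same file back into a new object.
    with open(filename, "rb") as handle:
        restored = pickle.load(handle)
```

Pickle files can execute arbitrary code when loaded, so only files from trusted sources should be opened this way.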
plotting
aesthetics
generate_legend(color_dict, loc='center left', bbox_to_anchor=(1.01, 0.5), ncol=1, fancybox=True, shadow=True, title='Legend', fontsize=14, sorted_labels=True, ax=None)
Adds a legend to a previous plot or displays an independent legend given specific colors for labels.
Parameters
color_dict : dict Dictionary containing tuples in the RGBA format for indicating colors of major groups of cells. Keys are the labels and values are the RGBA tuples.
loc : str, default='center left' Alignment of the legend given the location specified in bbox_to_anchor.
bbox_to_anchor : tuple, default=(1.01, 0.5) Location of the legend in a (X, Y) format. For example, if you want your axes legend located at the figure's top right-hand corner instead of the axes' corner, simply specify the corner's location and the coordinate system of that location, which in this case would be (1, 1).
ncol : int, default=1 Number of columns to display the legend.
fancybox : boolean, default=True Whether round edges should be enabled around the FancyBboxPatch which makes up the legend's background.
shadow : boolean, default=True Whether to draw a shadow behind the legend.
title : str, default='Legend' Title of the legend box
fontsize : int, default=14 Size of the text in the legends.
sorted_labels : boolean, default=True Whether alphabetically sorting the labels.
fig : matplotlib.figure.Figure, default=None Figure object to add a legend. If fig=None and ax=None, a new empty figure will be generated.
ax : matplotlib.axes.Axes, default=None Axes instance for a plot.
Returns
legend1 : matplotlib.legend.Legend A legend object in a figure.
Source code in cell2cell/plotting/aesthetics.py
get_colors_from_labels(labels, cmap='gist_rainbow', factor=1)
Generates colors for each label in a list given a colormap
Parameters
labels : list A list of labels to assign a color.
cmap : str, default='gist_rainbow' A matplotlib color palette name.
factor : int, default=1 Factor to amplify the separation of colors.
Returns
colors : dict A dictionary where the keys are the labels and the values correspond to the assigned colors.
Source code in cell2cell/plotting/aesthetics.py
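The shape of the returned dictionary can be illustrated without matplotlib. In the sketch below, evenly spaced hues from the standard colorsys module stand in for a matplotlib colormap, so the actual colors differ from those of 'gist_rainbow'. The name colors_from_labels is hypothetical and the factor parameter is not modeled.

```python
import colorsys

def colors_from_labels(labels):
    """Assign one RGBA tuple per unique label, using evenly spaced hues."""
    unique = sorted(set(labels))
    colors = {}
    for i, label in enumerate(unique):
        hue = i / len(unique)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        colors[label] = (r, g, b, 1.0)  # fully opaque
    return colors

colors = colors_from_labels(["T cell", "B cell", "T cell", "NK"])
```

A dictionary of this form, with labels as keys and RGBA tuples as values, matches what the color_dict and colors parameters in this module are documented to accept.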
map_colors_to_metadata(metadata, ref_df=None, colors=None, sample_col='#SampleID', group_col='Groups', cmap='gist_rainbow')
Assigns a color to elements in a dataframe containing metadata.
Parameters
metadata : pandas.DataFrame A dataframe with metadata for specific elements.
ref_df : pandas.DataFrame A dataframe whose columns contains a subset of elements in the metadata.
colors : dict, default=None Dictionary containing tuples in the RGBA format for indicating colors of major groups of cells. If colors is specified, cmap will be ignored.
sample_col : str, default='#SampleID' Column in the metadata for elements to color.
group_col : str, default='Groups' Column in the metadata containing the major groups of the elements to color.
cmap : str, default='gist_rainbow' Name of the color palette for coloring the major groups of elements.
Returns
new_colors : pandas.DataFrame A pandas dataframe where the index is the list of elements in the sample_col and the column group_col contains the colors assigned to each element given their groups.
Source code in cell2cell/plotting/aesthetics.py
ccc_plot
clustermap_ccc(interaction_space, metadata=None, sample_col='#SampleID', group_col='Groups', meta_cmap='gist_rainbow', colors=None, cell_labels=('SENDER-CELL', 'RECEIVER-CELL'), metric='jaccard', method='ward', optimal_leaf=True, excluded_cells=None, title='', only_used_lr=True, cbar_title='Presence', cbar_fontsize=12, row_fontsize=8, col_fontsize=8, filename=None, **kwargs)
Generates a clustermap (heatmap + dendrograms from a hierarchical clustering) based on CCC scores for each LR pair in every cell-cell pair.
Parameters
interaction_space : cell2cell.core.interaction_space.InteractionSpace Interaction space that contains a distance matrix after running the method compute_pairwise_communication_scores. Alternatively, this object can be a numpy-array or a pandas DataFrame. It can also be a SingleCellInteractions or a BulkInteractions object after running the method compute_pairwise_communication_scores.
metadata : pandas.DataFrame, default=None Metadata associated with the cells, cell types or samples in the matrix containing CCC scores. If None, cells will not be colored by major groups.
sample_col : str, default='#SampleID' Column in the metadata for the cells, cell types or samples in the matrix containing CCC scores.
group_col : str, default='Groups' Column in the metadata containing the major groups of cells, cell types or samples in the matrix with CCC scores.
meta_cmap : str, default='gist_rainbow' Name of the color palette for coloring the major groups of cells.
colors : dict, default=None Dictionary containing tuples in the RGBA format for indicating colors of major groups of cells. If colors is specified, meta_cmap will be ignored.
cell_labels : tuple, default=('SENDER-CELL','RECEIVER-CELL') A tuple containing the labels for indicating the group colors of sender and receiver cells if metadata or colors are provided.
metric : str, default='jaccard' The distance metric to use. The distance function can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
method : str, default='ward' Clustering method for computing a linkage as in scipy.cluster.hierarchy.linkage
optimal_leaf : boolean, default=True Whether sorting the leaf of the dendrograms to have a minimal distance between successive leaves. For more information, see scipy.cluster.hierarchy.optimal_leaf_ordering
excluded_cells : list, default=None List containing cell names that are present in the interaction_space object but that will be excluded from this plot.
title : str, default='' Title of the clustermap.
only_used_lr : boolean, default=True Whether displaying or not only LR pairs that were used at least by one pair of cells. If True, those LR pairs that were not used will not be displayed.
cbar_title : str, default='Presence' Title for the colorbar, depending on the score employed.
cbar_fontsize : int, default=12 Font size for the colorbar title as well as labels for axes X and Y.
row_fontsize : int, default=8 Font size for the rows in the clustermap (ligand-receptor pairs).
col_fontsize : int, default=8 Font size for the columns in the clustermap (sender-receiver cell pairs).
filename : str, default=None Path to save the figure. If None, the figure is not saved.
**kwargs : dict Dictionary containing arguments for the seaborn.clustermap function.
Returns
fig : seaborn.matrix.ClusterGrid A seaborn ClusterGrid instance.
Source code in cell2cell/plotting/ccc_plot.py
cci_plot
clustermap_cci(interaction_space, method='ward', optimal_leaf=True, metadata=None, sample_col='#SampleID', group_col='Groups', meta_cmap='gist_rainbow', colors=None, excluded_cells=None, title='', cbar_title='CCI score', cbar_fontsize=18, filename=None, **kwargs)
Generates a clustermap (heatmap + dendrograms from a hierarchical clustering) based on CCI scores of cell-cell pairs.
Parameters
interaction_space : cell2cell.core.interaction_space.InteractionSpace Interaction space that contains a distance matrix after running the method compute_pairwise_cci_scores. Alternatively, this object can be a numpy-array or a pandas DataFrame. It can also be a SingleCellInteractions or a BulkInteractions object after running the method compute_pairwise_cci_scores.
method : str, default='ward' Clustering method for computing a linkage as in scipy.cluster.hierarchy.linkage
optimal_leaf : boolean, default=True Whether sorting the leaf of the dendrograms to have a minimal distance between successive leaves. For more information, see scipy.cluster.hierarchy.optimal_leaf_ordering
metadata : pandas.DataFrame, default=None Metadata associated with the cells, cell types or samples in the matrix containing CCI scores. If None, cells will not be colored by major groups.
sample_col : str, default='#SampleID' Column in the metadata for the cells, cell types or samples in the matrix containing CCI scores.
group_col : str, default='Groups' Column in the metadata containing the major groups of cells, cell types or samples in the matrix with CCI scores.
meta_cmap : str, default='gist_rainbow' Name of the color palette for coloring the major groups of cells.
colors : dict, default=None Dictionary containing tuples in the RGBA format for indicating colors of major groups of cells. If colors is specified, meta_cmap will be ignored.
excluded_cells : list, default=None List containing cell names that are present in the interaction_space object but that will be excluded from this plot.
title : str, default='' Title of the clustermap.
cbar_title : str, default='CCI score' Title for the colorbar, depending on the score employed.
cbar_fontsize : int, default=18 Font size for the colorbar title as well as labels for axes X and Y.
filename : str, default=None Path to save the figure. If None, the figure is not saved.
**kwargs : dict Dictionary containing arguments for the seaborn.clustermap function.
Returns
hier : seaborn.matrix.ClusterGrid A seaborn ClusterGrid instance.
Source code in cell2cell/plotting/cci_plot.py
circular_plot
circos_plot(interaction_space, sender_cells, receiver_cells, ligands, receptors, excluded_score=0, metadata=None, sample_col='#SampleID', group_col='Groups', meta_cmap='Set2', cells_cmap='Pastel1', colors=None, ax=None, figsize=(10, 10), fontsize=14, legend=True, ligand_label_color='dimgray', receptor_label_color='dimgray', filename=None)
Generates the circos plot in the exact order that sender and receiver cells are provided. Similarly, ligands and receptors are sorted by the order they are input.
Parameters
interaction_space : cell2cell.core.interaction_space.InteractionSpace Interaction space that contains a distance matrix after running the method compute_pairwise_communication_scores. Alternatively, this object can be a SingleCellInteractions or a BulkInteractions object after running the method compute_pairwise_communication_scores.
sender_cells : list List of cells to be included as senders.
receiver_cells : list List of cells to be included as receivers.
ligands : list List of genes/proteins to be included as ligands produced by the sender cells.
receptors : list List of genes/proteins to be included as receptors produced by the receiver cells.
excluded_score : float, default=0 Rows that have a communication score equal to or lower than this value will be dropped from the network.
metadata : pandas.DataFrame, default=None Metadata associated with the cells, cell types or samples in the matrix containing CCC scores. If None, colors will be assigned only to individual cells.
sample_col : str, default='#SampleID' Column in the metadata for the cells, cell types or samples in the matrix containing CCC scores.
group_col : str, default='Groups' Column in the metadata containing the major groups of cells, cell types or samples in the matrix with CCC scores.
meta_cmap : str, default='Set2' Name of the matplotlib color palette for coloring the major groups of cells.
cells_cmap : str, default='Pastel1' Name of the color palette for coloring individual cells.
colors : dict, default=None Dictionary containing tuples in the RGBA format for indicating colors of major groups of cells. If colors is specified, meta_cmap will be ignored.
ax : matplotlib.axes.Axes, default=None Axes instance for a plot.
figsize : tuple, default=(10, 10) Size of the figure (width*height), each in inches.
fontsize : int, default=14 Font size for ligand and receptor labels.
legend : boolean, default=True Whether including legends for cell and cell group colors as well as ligand/receptor colors.
ligand_label_color : str, default='dimgray' Name of the matplotlib color for the labels of ligands.
receptor_label_color : str, default='dimgray' Name of the matplotlib color for the labels of receptors.
filename : str, default=None Path to save the figure. If None, the figure is not saved.
Returns
ax : matplotlib.axes.Axes Axes instance containing a circos plot.
Source code in cell2cell/plotting/circular_plot.py
determine_small_radius(coordinate_dict)
Computes the radius of a circle whose diameter is the distance between the center of two nodes.
Parameters
coordinate_dict : dict A dictionary containing the coordinates to plot each node.
Returns
radius : float Half of the distance between the centers of two nodes.
Source code in cell2cell/plotting/circular_plot.py
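The computation described above can be sketched as follows. This is a minimal illustration and not the library's code: it assumes that each value in coordinate_dict is an (x, y) pair and that the two nodes of interest are the first two entries. The helper name half_distance_between_first_two is hypothetical.

```python
import math

def half_distance_between_first_two(coordinate_dict):
    """Half the Euclidean distance between the centers of the first two nodes.

    Assumption: values are (x, y) pairs; the actual layout of
    coordinate_dict in cell2cell may differ.
    """
    keys = list(coordinate_dict)
    (x1, y1), (x2, y2) = coordinate_dict[keys[0]], coordinate_dict[keys[1]]
    return math.hypot(x2 - x1, y2 - y1) / 2.0

# Two nodes that are 5 units apart (a 3-4-5 triangle).
coords = {"A^L1": (0.0, 0.0), "A^L2": (3.0, 4.0)}
radius = half_distance_between_first_two(coords)
```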
generate_circos_legend(cell_legend, signal_legend=None, meta_legend=None, fontsize=14, ax=None)
Adds legends to circos plot.
Parameters
cell_legend : dict Dictionary containing the colors for the cells.
signal_legend : dict, default=None Dictionary containing the colors for the LR pairs in a given pair of cells. Corresponds to the colors of the links.
meta_legend : dict, default=None Dictionary containing the colors for the cells given their major groups.
fontsize : int, default=14 Size of the labels in the legend.
Source code in cell2cell/plotting/circular_plot.py
get_arc_angles(G, sorting_feature=None)
Obtains the angles of polar coordinates to plot nodes as arcs of a circumference.
Parameters
G : networkx.Graph or networkx.DiGraph A networkx graph.
sorting_feature : str, default=None A node attribute present in the dictionary associated with each node. The values associated with this attribute will be used for sorting the nodes.
Returns
angles : dict A dictionary containing the angles for positioning the nodes in polar coordinates. Keys are the node names and values are tuples with angles for the start and end of the arc that represents a node.
Source code in cell2cell/plotting/circular_plot.py
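To make the shape of the returned dictionary concrete, here is a small sketch that assigns each node an arc of equal size. It is only an illustration under stated assumptions: it works on a plain list of node names instead of a networkx graph, it expresses angles in degrees, and it gives every node the same arc width. The function name equal_arc_angles and the sort_key argument are hypothetical.

```python
def equal_arc_angles(nodes, sort_key=None):
    """Map each node to a (start, end) arc, splitting 360 degrees evenly.

    Assumptions: equal arc widths and degrees as the unit; the real
    function may size or orient arcs differently.
    """
    ordered = sorted(nodes, key=sort_key) if sort_key is not None else list(nodes)
    if not ordered:
        return {}
    step = 360.0 / len(ordered)
    return {node: (i * step, (i + 1) * step) for i, node in enumerate(ordered)}

# Nodes are sorted by name before arcs are assigned.
angles = equal_arc_angles(["B", "A", "C", "D"], sort_key=str)
```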
get_cartesian(theta, radius, center=(0, 0), angle='radians')
Performs a polar to cartesian coordinates conversion.
Parameters
theta : float or ndarray An angle for a polar coordinate.
radius : float The radius in a polar coordinate.
center : tuple, default=(0,0) The center of the circle in the cartesian coordinates.
angle : str, default='radians' Type of angle that theta is. Options are:
- 'degrees' : from 0 to 360
- 'radians' : from 0 to 2*numpy.pi
Returns
(x, y) : tuple Cartesian coordinates for the X and Y axes, respectively.
Source code in cell2cell/plotting/circular_plot.py
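The conversion is the standard one, x = cx + r*cos(theta) and y = cy + r*sin(theta). The sketch below mirrors the documented parameters with the standard library only; the name polar_to_cartesian is hypothetical, and the real function may additionally accept arrays for theta.

```python
import math

def polar_to_cartesian(theta, radius, center=(0, 0), angle="radians"):
    """Convert a polar coordinate (theta, radius) around center to (x, y)."""
    if angle == "degrees":
        theta = math.radians(theta)
    elif angle != "radians":
        raise ValueError("angle must be 'degrees' or 'radians'")
    x = center[0] + radius * math.cos(theta)
    y = center[1] + radius * math.sin(theta)
    return x, y

# A point at 90 degrees on a circle of radius 2 centered at (1, 1).
x, y = polar_to_cartesian(90, 2.0, center=(1, 1), angle="degrees")
```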
get_node_colors(G, coloring_feature=None, cmap='viridis')
Generates colors for each node in a network given one of their properties.
Parameters
G : networkx.Graph A graph containing a list of nodes
coloring_feature : str, default=None A node attribute present in the dictionary associated with each node. The values associated with this attribute will be used for coloring the nodes.
cmap : str, default='viridis' Name of a matplotlib color palette for coloring the nodes.
Returns
node_colors : dict A dictionary wherein each key is a node and values are tuples containing colors in the RGBA format.
feature_colors : dict A dictionary wherein each key is a value for the attribute of nodes in the coloring_feature property and values are tuples containing colors in the RGBA format.
Source code in cell2cell/plotting/circular_plot.py
get_readable_ccc_matrix(ccc_matrix)
Transforms a CCC matrix from an InteractionSpace instance into a readable dataframe.
Parameters
ccc_matrix : pandas.DataFrame A dataframe containing the communication scores for a given combination between a pair of sender-receiver cells and a ligand-receptor pair. Columns are pairs of cells and rows are LR pairs.
Returns
readable_ccc : pandas.DataFrame A dataframe containing flat information in each row about communication scores for a given pair of cells and a specific LR pair. A row contains the sender and receiver cells as well as the ligand and the receptor participating in an interaction and their respective communication score. Columns are: ['sender', 'receiver', 'ligand', 'receptor', 'communication_score']
Source code in cell2cell/plotting/circular_plot.py
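The reshaping from a wide matrix to one row per interaction can be sketched without pandas. In this illustration the matrix is modeled as a nested dictionary keyed first by (ligand, receptor) and then by (sender, receiver); that layout, and the name flatten_ccc_matrix, are assumptions made for the example and do not describe how cell2cell encodes its row and column labels.

```python
def flatten_ccc_matrix(ccc_matrix):
    """Turn {(ligand, receptor): {(sender, receiver): score}} into flat rows."""
    rows = []
    for (ligand, receptor), scores in ccc_matrix.items():
        for (sender, receiver), score in scores.items():
            rows.append({
                "sender": sender,
                "receiver": receiver,
                "ligand": ligand,
                "receptor": receptor,
                "communication_score": score,
            })
    return rows

# One LR pair scored for two sender-receiver pairs.
matrix = {("TGFB1", "TGFBR1"): {("T", "B"): 0.7, ("B", "T"): 0.0}}
readable = flatten_ccc_matrix(matrix)
```

Each resulting row carries the five fields listed in the Returns entry above.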
sort_nodes(sender_cells, receiver_cells, ligands, receptors)
Sorts cells by senders first and alphabetically and creates pairs of senders-ligands. If senders and receivers share cells, it creates pairs of senders-receptors for those shared cells. Then sorts receivers cells and creates pairs of receivers-receptors, for those cells that are not shared with senders.
Parameters
sender_cells : list List of sender cells to sort.
receiver_cells : list List of receiver cells to sort.
ligands : list List of ligands to sort.
receptors : list List of receptors to sort.
Returns
sorted_nodes : dict A dictionary where keys are the nodes of cells-proteins and values are the position they obtained (a ranking from 0 to N, where N is the total number of nodes).
Source code in cell2cell/plotting/circular_plot.py
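The ordering rule described above can be sketched as follows. This is an illustration under assumptions: node names are built as cell + separator + protein with a hypothetical "^" separator, and ligands and receptors are sorted alphabetically within each cell. The name rank_nodes is also hypothetical.

```python
def rank_nodes(sender_cells, receiver_cells, ligands, receptors, sep="^"):
    """Rank cell-protein nodes: senders first, then receiver-only cells."""
    nodes = []
    receivers = set(receiver_cells)
    for cell in sorted(set(sender_cells)):
        # Every sender is paired with each ligand.
        nodes.extend(f"{cell}{sep}{lig}" for lig in sorted(ligands))
        # Cells that are also receivers get their receptors right away.
        if cell in receivers:
            nodes.extend(f"{cell}{sep}{rec}" for rec in sorted(receptors))
    for cell in sorted(receivers - set(sender_cells)):
        # Receiver-only cells are paired with receptors at the end.
        nodes.extend(f"{cell}{sep}{rec}" for rec in sorted(receptors))
    return {node: rank for rank, node in enumerate(nodes)}

ranks = rank_nodes(["T", "B"], ["B", "M"], ["L1"], ["R1"])
```

Here "B" is both a sender and a receiver, so its ligand and receptor nodes are adjacent, while "M" appears only after all senders.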
factor_plot
ccc_networks_plot(factors, included_factors=None, sender_label='Sender Cells', receiver_label='Receiver Cells', ccc_threshold=None, panel_size=(8, 8), nrows=2, network_layout='spring', edge_color='magenta', edge_width=25, edge_arrow_size=20, edge_alpha=0.25, node_color='#210070', node_size=1000, node_alpha=0.9, node_label_size=20, node_label_alpha=0.7, node_label_offset=(0.1, -0.2), factor_title_size=36, filename=None)
Plots factor-specific cell-cell communication networks resulting from decomposition with Tensor-cell2cell.
Parameters
factors : dict Ordered dictionary containing a dataframe with the factor loadings for each dimension/order of the tensor.
included_factors : list, default=None Factors to be included. Factor names must be the same as the key values in the factors dictionary.
sender_label : str Label for the dimension of sender cells. It is one key of the factors dict.
receiver_label : str Label for the dimension of receiver cells. It is one key of the factors dict.
ccc_threshold : float, default=None Threshold to consider only edges with a higher weight than this value.
panel_size : tuple, default=(8, 8) Size of one subplot or network (width*height), each in inches.
nrows : int, default=2 Number of rows in the set of subplots.
network_layout : str, default='spring' Visualization layout of the networks. It uses algorithms implemented in NetworkX, including:
- 'spring' : Fruchterman-Reingold force-directed algorithm.
- 'circular' : Position nodes on a circle.
edge_color : str, default='magenta' Color of the edges in the network.
edge_width : int, default=25 Thickness of the edges in the network.
edge_arrow_size : int, default=20 Size of the arrow of an edge pointing towards the receiver cells.
edge_alpha : float, default=0.25 Transparency of the edges. Values must be between 0 and 1. Higher values indicate less transparency.
node_color : str, default="#210070" Color of the nodes in the network.
node_size : int, default=1000 Size of the nodes in the network.
node_alpha : float, default=0.9 Transparency of the nodes. Values must be between 0 and 1. Higher values indicate less transparency.
node_label_size : int, default=20 Size of the labels for the node names.
node_label_alpha : float, default=0.7 Transparency of the node labels. Values must be between 0 and 1. Higher values indicate less transparency.
node_label_offset : tuple, default=(0.1, -0.2) Offset values to move the node labels away from the center of the nodes.
factor_title_size : int, default=36 Size of the subplot titles. Each network has a title like 'Factor 1', 'Factor 2', ... ,'Factor R'.
filename : str, default=None Path to save the figure. If None, the figure is not saved.
Returns
fig : matplotlib.figure.Figure A matplotlib figure.
axes : matplotlib.axes.Axes or array of Axes Matplotlib axes representing the subplots containing the networks.
Source code in cell2cell/plotting/factor_plot.py
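To illustrate how ccc_threshold prunes a factor-specific network, the sketch below builds edge weights for one factor and keeps only those above the threshold. It assumes that an edge weight is the product of the sender loading and the receiver loading in that factor; this page does not state how weights are computed, so treat that as an assumption. The name factor_edges is hypothetical.

```python
def factor_edges(sender_loadings, receiver_loadings, ccc_threshold=None):
    """Return {(sender, receiver): weight} for a single factor.

    Assumption: weight = sender loading * receiver loading.
    """
    edges = {}
    for sender, s_load in sender_loadings.items():
        for receiver, r_load in receiver_loadings.items():
            weight = s_load * r_load
            # Keep the edge only if it is strictly above the threshold.
            if ccc_threshold is None or weight > ccc_threshold:
                edges[(sender, receiver)] = weight
    return edges

senders = {"T cell": 0.8, "B cell": 0.1}
receivers = {"T cell": 0.5, "Monocyte": 0.9}
edges = factor_edges(senders, receivers, ccc_threshold=0.3)
```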
context_boxplot(context_loadings, metadict, included_factors=None, group_order=None, statistical_test='Mann-Whitney', pval_correction='benjamini-hochberg', text_format='star', nrows=1, figsize=(12, 6), cmap='tab10', title_size=14, axis_label_size=12, group_label_rotation=45, ylabel='Context Loadings', dot_color='lightsalmon', dot_edge_color='brown', filename=None, verbose=False)
Plots a boxplot to compare the loadings of context groups in each of the factors resulting from a tensor decomposition.
Parameters
context_loadings : pandas.DataFrame Dataframe containing the loadings of each of the contexts from a tensor decomposition. Rows are contexts and columns are the factors obtained.
metadict : dict
A dictionary containing the group to which each of the contexts
belongs. Keys correspond to the indexes in context_loadings
and values are the respective groups. For example:
metadict={'Context 1' : 'Group 1', 'Context 2' : 'Group 1',
'Context 3' : 'Group 2', 'Context 4' : 'Group 2'}
included_factors : list, default=None Factors to be included. Factor names must be the same as column elements in the context_loadings.
group_order : list, default=None
Order of the groups to plot the boxplots. Considering the
example of the metadict, it could be:
group_order=['Group 1', 'Group 2'] or
group_order=['Group 2', 'Group 1']
If None, the order that groups are found in metadict
will be considered.
statistical_test : str, default='Mann-Whitney' The statistical test to compare context groups within each factor. Options include: 't-test_ind', 't-test_welch', 't-test_paired', 'Mann-Whitney', 'Mann-Whitney-gt', 'Mann-Whitney-ls', 'Levene', 'Wilcoxon', 'Kruskal'.
pval_correction : str, default='benjamini-hochberg' Multiple test correction method to reduce false positives. Options include: 'bonferroni', 'bonf', 'Bonferroni', 'holm-bonferroni', 'HB', 'Holm-Bonferroni', 'holm', 'benjamini-hochberg', 'BH', 'fdr_bh', 'Benjamini-Hochberg', 'fdr_by', 'Benjamini-Yekutieli', 'BY', None
text_format : str, default='star' Format to display the results of the statistical test. Options are:
- 'star', to display P-values < 1e-4 as "****"; < 1e-3 as "***";
< 1e-2 as "**"; < 0.05 as "*", and < 1 as "ns".
- 'simple', to display P-values < 1e-5 as "1e-5"; < 1e-4 as "1e-4";
< 1e-3 as "0.001"; < 1e-2 as "0.01"; and < 5e-2 as "0.05".
nrows : int, default=1 Number of rows to generate the subplots.
figsize : tuple, default=(12, 6) Size of the figure (width*height), each in inches.
cmap : str, default='tab10' Name of the color palette for coloring the major groups of contexts.
title_size : int, default=14 Font size of the title in each of the factor boxplots.
axis_label_size : int, default=12 Font size of the labels for X and Y axes.
group_label_rotation : int, default=45 Angle of rotation for the tick labels in the X axis.
ylabel : str, default='Context Loadings' Label for the Y axis.
dot_color : str, default='lightsalmon' A matplotlib color for the dots representing individual contexts in the boxplot. For more info see: https://matplotlib.org/stable/gallery/color/named_colors.html
dot_edge_color : str, default='brown' A matplotlib color for the edge of the dots in the boxplot. For more info see: https://matplotlib.org/stable/gallery/color/named_colors.html
filename : str, default=None Path to save the figure. If None, the figure is not saved.
verbose : boolean, default=False Whether to print out the result of the pairwise statistical tests in each of the factors.
Returns
fig : matplotlib.figure.Figure A matplotlib figure.
axes : matplotlib.axes.Axes or array of Axes Matplotlib axes representing the subplots containing the boxplots.
Source code in cell2cell/plotting/factor_plot.py
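The 'star' option of text_format maps P-values to symbols using the thresholds listed above. A minimal sketch of that mapping, with a hypothetical helper name and strict inequalities as written in the parameter description:

```python
def pvalue_to_star(pvalue):
    """Map a P-value to the 'star' annotation described for text_format."""
    if pvalue < 1e-4:
        return "****"
    if pvalue < 1e-3:
        return "***"
    if pvalue < 1e-2:
        return "**"
    if pvalue < 0.05:
        return "*"
    return "ns"  # not significant

labels = [pvalue_to_star(p) for p in (5e-5, 5e-4, 0.005, 0.03, 0.5)]
```

How the plotting function treats values that fall exactly on a threshold is not specified here, so the boundary behavior of this sketch is an assumption.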
loading_clustermap(loadings, loading_threshold=0.0, use_zscore=True, metric='euclidean', method='ward', optimal_leaf=True, figsize=(15, 8), heatmap_lw=0.2, cbar_fontsize=12, tick_fontsize=10, cmap=None, cbar_label=None, filename=None, **kwargs)
Plots a clustermap of the tensor-factorization loadings from one tensor dimension or the joint loadings from multiple tensor dimensions.
Parameters
loadings : pandas.DataFrame Loadings for a given tensor dimension after running the tensor decomposition. Rows are the elements in one dimension or joint pairs/n-tuples in multiple dimensions. It is recommended that the loadings resulting from the decomposition should be l2-normalized prior to their use, by considering all dimensions together. For example, take the factors dictionary found in any InteractionTensor or any BaseTensor derived class, and execute cell2cell.tensor.normalize(factors).
loading_threshold : float, default=0.0 Threshold to filter out elements in the loadings dataframe. This plot considers elements with loadings greater than this threshold in at least one of the factors.
use_zscore : boolean, default=True Whether to convert loadings to z-scores across factors.
metric : str, default='euclidean' The distance metric to use. The distance function can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
method : str, default='ward' Method to compute the linkage. It could be:
- 'single'
- 'complete'
- 'average'
- 'weighted'
- 'centroid'
- 'median'
- 'ward'
For more details, go to:
https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.cluster.hierarchy.linkage.html
optimal_leaf : boolean, default=True Whether to sort the leaves of the dendrograms to have a minimal distance between successive leaves. For more information, see scipy.cluster.hierarchy.optimal_leaf_ordering
figsize : tuple, default=(15, 8) Size of the figure (width*height), each in inches.
heatmap_lw : float, default=0.2 Width of the lines that will divide each cell.
cbar_fontsize : int, default=12 Font size for the colorbar title.
tick_fontsize : int, default=10 Font size for ticks in the x and y axes.
cmap : str, default=None Name of the color palette for coloring the heatmap. If None, cmap='Blues' would be used when use_zscore=False; and cmap='vlag' when use_zscore=True.
cbar_label : str, default=None Label for the color bar. If None, the default label will be 'Z-scores across factors' or 'Loadings', depending on whether use_zscore is True or False, respectively.
filename : str, default=None Path to save the figure. If None, the figure is not saved.
**kwargs : dict Dictionary containing arguments for the seaborn.clustermap function.
Returns
cm : seaborn.matrix.ClusterGrid A seaborn ClusterGrid instance.
Source code in cell2cell/plotting/factor_plot.py
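The two preprocessing options, loading_threshold and use_zscore, can be pictured with a small standard-library sketch. It models the loadings as a dictionary of element name to a list of per-factor loadings, and it standardizes each element across its factors using the population standard deviation; both choices, as well as the name filter_and_zscore, are assumptions for illustration and may differ from what the plotting function does internally.

```python
from statistics import mean, pstdev

def filter_and_zscore(loadings, loading_threshold=0.0, use_zscore=True):
    """Keep elements above the threshold in any factor, then z-score per element."""
    kept = {
        name: values
        for name, values in loadings.items()
        if any(v > loading_threshold for v in values)
    }
    if not use_zscore:
        return kept
    scaled = {}
    for name, values in kept.items():
        mu, sigma = mean(values), pstdev(values)
        # A constant row has no spread; report zeros instead of dividing by 0.
        scaled[name] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    return scaled

loadings = {
    "LR1": [0.9, 0.1, 0.2],
    "LR2": [0.05, 0.02, 0.01],  # never exceeds 0.5, so it is dropped
    "LR3": [0.0, 0.6, 0.0],
}
z = filter_and_zscore(loadings, loading_threshold=0.5)
```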
pcoa_plot
pcoa_3dplot(interaction_space, metadata=None, sample_col='#SampleID', group_col='Groups', pcoa_method='eigh', meta_cmap='gist_rainbow', colors=None, excluded_cells=None, title='', axis_fontsize=14, legend_fontsize=12, figsize=(6, 5), view_angles=(30, 135), filename=None)
Projects the cells into a Euclidean space (PCoA) given their distances based on their CCI scores. Then, plots each cell by its first three coordinates in a 3D scatter plot.
Parameters
interaction_space : cell2cell.core.interaction_space.InteractionSpace Interaction space that contains a distance matrix after running the method compute_pairwise_cci_scores. Alternatively, this object can be a numpy array or a pandas DataFrame. It can also be a SingleCellInteractions or a BulkInteractions object after running the method compute_pairwise_cci_scores.
metadata : pandas.DataFrame, default=None Metadata associated with the cells, cell types or samples in the matrix containing CCC scores. If None, cells will not be colored by major groups.
sample_col : str, default='#SampleID' Column in the metadata for the cells, cell types or samples in the matrix containing CCI scores.
group_col : str, default='Groups' Column in the metadata containing the major groups of cells, cell types or samples in the matrix with CCI scores.
pcoa_method : str, default='eigh'
Eigendecomposition method to use in performing PCoA.
By default, uses SciPy's eigh, which computes exact
eigenvectors and eigenvalues for all dimensions. The alternate
method, fsvd, uses faster heuristic eigendecomposition but loses
accuracy. The magnitude of accuracy lost is dependent on dataset.
meta_cmap : str, default='gist_rainbow' Name of the color palette for coloring the major groups of cells.
colors : dict, default=None Dictionary containing tuples in the RGBA format for indicating colors of major groups of cells. If colors is specified, meta_cmap will be ignored.
excluded_cells : list, default=None List containing cell names that are present in the interaction_space object but that will be excluded from this plot.
title : str, default='' Title of the PCoA 3D plot.
axis_fontsize : int, default=14 Size of the font for the labels of each axis (X, Y and Z).
legend_fontsize : int, default=12 Size of the font for labels in the legend.
figsize : tuple, default=(6, 5) Size of the figure (width*height), each in inches.
view_angles : tuple, default=(30, 135) Rotation angles of the plot. Set the elevation and azimuth of the axes.
filename : str, default=None Path to save the figure. If None, the figure is not saved.
Returns
results : dict Dictionary that contains:
- 'fig' : matplotlib.figure.Figure, containing the whole figure
- 'axes' : matplotlib.axes.Axes, containing the axes of the 3D plot
- 'ordination' : Ordination or projection obtained from the PCoA
- 'distance_matrix' : Distance matrix used to perform the PCoA (usually in interaction_space.distance_matrix)
Source code in cell2cell/plotting/pcoa_plot.py
pval_plot
dot_plot(sc_interactions, evaluation='communication', significance=0.05, senders=None, receivers=None, figsize=(16, 9), tick_size=8, cmap='PuOr', filename=None)
Generates a dot plot for the CCI or communication scores given their P-values. The size of each dot is given by -log10(P-value) and its color by the value of the CCI or communication score.
Parameters
sc_interactions : cell2cell.analysis.cell2cell_pipelines.SingleCellInteractions Interaction class with all necessary methods to run the cell2cell pipeline on a single-cell RNA-seq dataset. The method permute_cell_labels() must be run before generating this plot.
evaluation : str, default='communication' P-values of CCI or communication scores used for this plot.
- 'interactions' : For CCI scores
- 'communication' : For communication scores
significance : float, default=0.05 The significance threshold to be plotted. LR pairs or cell-cell pairs with at least one P-value below this threshold will be considered.
senders : list, default=None Optional filter to plot specific sender cells.
receivers : list, default=None Optional filter to plot specific receiver cells.
figsize : tuple, default=(16, 9) Size of the figure (width*height), each in inches.
tick_size : int, default=8 Specifies the size of ticklabels as well as the maximum size of the dots.
cmap : str, default='PuOr' A matplotlib color palette name.
filename : str, default=None Path to save the figure. If None, the figure is not saved.
Returns
fig : matplotlib.figure.Figure Figure object made with matplotlib
Source code in cell2cell/plotting/pval_plot.py
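The relationship between P-values, the significance filter and dot sizes can be sketched as follows. The example models P-values as a nested dictionary (row label to column label to P-value), filters rows only, and clips P-values at a small floor so that a P-value of 0 does not produce an infinite size. The dictionary layout, the floor argument and the name dot_sizes are assumptions for illustration, not part of the cell2cell API.

```python
import math

def dot_sizes(pvals, significance=0.05, floor=1e-300):
    """Return -log10(P) for rows with at least one P-value below significance."""
    kept = {}
    for row, row_pvals in pvals.items():
        if any(p < significance for p in row_pvals.values()):
            kept[row] = {
                col: -math.log10(max(p, floor)) for col, p in row_pvals.items()
            }
    return kept

pvals = {
    "L1^R1": {"A;B": 0.01, "A;C": 0.5},
    "L2^R2": {"A;B": 0.2, "A;C": 0.9},  # no significant entry, dropped
}
sizes = dot_sizes(pvals)
```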
generate_dot_plot(pval_df, score_df, significance=0.05, xlabel='', ylabel='', cbar_title='Score', cmap='PuOr', figsize=None, label_size=20, title_size=20, tick_size=14, filename=None, min_row_height=0.3, reference_height=1.0)
Generates a dot plot for given P-values and respective scores with improved spacing.
Parameters
pval_df : pandas.DataFrame A dataframe containing the P-values, with multiple elements in both rows and columns
score_df : pandas.DataFrame
A dataframe containing the scores that were tested. Rows and
columns must be the same as in pval_df.
significance : float, default=0.05 The significance threshold to be plotted. LR pairs or cell-cell pairs with at least one P-value below this threshold will be considered.
xlabel : str, default='' Name or label of the X axis.
ylabel : str, default='' Name or label of the Y axis.
cbar_title : str, default='Score'
A title for the colorbar associated with the scores in
score_df. It is usually the name of the score.
cmap : str, default='PuOr' A matplotlib color palette name.
figsize : tuple, default=None Size of the figure (width*height), each in inches. If None, it will be automatically calculated based on the data.
label_size : int, default=20 Specifies the size of the labels of both X and Y axes.
title_size : int, default=20 Specifies the size of the title of the colorbar and P-val sizes.
tick_size : int, default=14 Specifies the size of ticklabels as well as the maximum size of the dots.
filename : str, default=None Path to save the figure. If None, the figure is not saved.
min_row_height : float, default=0.3 Minimum height per row in inches to prevent dot overlap.
reference_height : float, default=1.0 Fixed height in inches for the reference legend subplot.
Returns
fig : matplotlib.figure.Figure Figure object made with matplotlib
Source code in cell2cell/plotting/pval_plot.py
tensor_plot
generate_plot_df(interaction_tensor)
Generates a melted dataframe with loadings for each element in all dimensions across factors.
Parameters
interaction_tensor : cell2cell.tensor.BaseTensor A communication tensor generated with any of the tensor classes in cell2cell.tensor
Returns
plot_df : pandas.DataFrame A dataframe containing loadings for every element of all dimensions across factors from the decomposition. Rows are loadings of individual elements of each dimension in a given factor, while columns are the following list: ['Factor', 'Variable', 'Value', 'Order']
Source code in cell2cell/plotting/tensor_plot.py
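The long ("melted") layout with the four columns listed above can be sketched with plain dictionaries. The input here is a simplified stand-in for the factors of a decomposed tensor: dimension name to element name to a list of loadings, one per factor. That structure, the 'Factor N' naming, the use of the dimension name as the 'Order' value, and the name melt_factors are all assumptions made for the example.

```python
def melt_factors(factors):
    """Flatten {dimension: {element: [loadings]}} into one row per loading."""
    rows = []
    for order, elements in factors.items():
        for variable, values in elements.items():
            for i, value in enumerate(values, start=1):
                rows.append({
                    "Factor": f"Factor {i}",
                    "Variable": variable,
                    "Value": value,
                    "Order": order,
                })
    return rows

factors = {
    "Contexts": {"C1": [0.2, 0.8]},
    "Sender Cells": {"T": [0.5, 0.1], "B": [0.3, 0.4]},
}
plot_rows = melt_factors(factors)
```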
plot_coupled_elbow(loss_dict, elbow=None, figsize=(4, 2.25), ylabel='Normalized Error', fontsize=14, filename=None, show_individual=False, tensor1_name='Tensor1', tensor2_name='Tensor2')
Plots the errors of an elbow analysis for coupled tensors with a single run.
Parameters
loss_dict : dict Dictionary with keys 'tensor1', 'tensor2', and 'combined', each containing a list of (rank, error) tuples.
elbow : int, default=None Rank to mark with a red dot. Usually used to represent the detected elbow.
figsize : tuple, default=(4, 2.25) Figure size, width by height
ylabel : str, default='Normalized Error' Label for the y-axis
fontsize : int, default=14 Fontsize for axis labels.
filename : str, default=None Path to save the figure of the elbow analysis. If None, the figure is not saved.
show_individual : bool, default=False Whether to show individual tensor errors (tensor1, tensor2) alongside the combined error. If False, only the combined error is shown.
tensor1_name : str, default='Tensor1' Name for the first tensor to use in the legend.
tensor2_name : str, default='Tensor2' Name for the second tensor to use in the legend.
Returns
fig : matplotlib.figure.Figure Figure object made with matplotlib
Source code in cell2cell/plotting/tensor_plot.py
plot_coupled_factorization_errors(errors1, errors2, combined_errors, tensor1_name='Tensor 1', tensor2_name='Tensor 2', figsize=(10, 5), fontsize=12, show_individual=True, filename=None)
Plots the factorization errors across iterations for coupled tensor decomposition.
Parameters
errors1 : list List of reconstruction errors for the first tensor at each iteration.
errors2 : list List of reconstruction errors for the second tensor at each iteration.
combined_errors : list List of combined weighted reconstruction errors at each iteration.
tensor1_name : str, default='Tensor 1' Name for the first tensor to use in the legend.
tensor2_name : str, default='Tensor 2' Name for the second tensor to use in the legend.
figsize : tuple, default=(10, 5) Figure size (width, height).
fontsize : int, default=12 Font size for labels and legend.
show_individual : bool, default=True Whether to show individual tensor errors or only combined error.
filename : str, default=None Path to save the figure. If None, the figure is not saved.
Returns
fig : matplotlib.figure.Figure Figure object made with matplotlib.
Source code in cell2cell/plotting/tensor_plot.py
plot_elbow(loss, elbow=None, figsize=(4, 2.25), ylabel='Normalized Error', fontsize=14, filename=None)
Plots the errors of an elbow analysis with just one run of a tensor factorization for each rank.
Parameters
loss : list List of tuples with (x, y) coordinates for the elbow analysis. X values are the different ranks and Y values are the errors of each decomposition.
elbow : int, default=None X coordinate to color the error as red. Usually used to represent the detected elbow.
figsize : tuple, default=(4, 2.25) Figure size, width by height
ylabel : str, default='Normalized Error' Label for the y-axis
fontsize : int, default=14 Fontsize for axis labels.
filename : str, default=None Path to save the figure of the elbow analysis. If None, the figure is not saved.
Returns
fig : matplotlib.figure.Figure Figure object made with matplotlib
Source code in cell2cell/plotting/tensor_plot.py
plot_factorization_errors(errors, figsize=(8, 5), fontsize=12, filename=None)
Plots the factorization errors across iterations for a tensor decomposition.
Parameters
errors : list List of reconstruction errors at each iteration of the factorization.
figsize : tuple, default=(8, 5) Figure size (width, height).
fontsize : int, default=12 Font size for labels and title.
filename : str, default=None Path to save the figure. If None, the figure is not saved.
Returns
fig : matplotlib.figure.Figure Figure object made with matplotlib.
Source code in cell2cell/plotting/tensor_plot.py
plot_multiple_run_coupled_elbow(all_loss, elbow=None, ci='95%', figsize=(4, 2.25), ylabel='Normalized Error', fontsize=14, smooth=False, filename=None, show_individual=False, tensor1_name='Tensor1', tensor2_name='Tensor2')
Plots the errors/similarities of a coupled elbow analysis with multiple runs of tensor factorizations for each rank.
Parameters
----------
all_loss : dict
Dictionary containing arrays with metrics associated with multiple runs for
each tensor. Keys are 'tensor1', 'tensor2', and 'combined'. Each value is an
array of shape (runs, upper_rank).
elbow : int, default=None
X coordinate to color the metric as red. Usually used to represent the detected
elbow.
ci : str, default='95%'
Confidence interval for representing the multiple runs in each rank.
{'std', '95%'}
figsize : tuple, default=(4, 2.25)
Figure size, width by height
ylabel : str, default='Normalized Error'
Label for the y-axis. Should be 'Normalized Error' for error metric or
'Similarity
(1-CorrIndex)' for similarity metric.
fontsize : int, default=14
Fontsize for axis labels.
smooth : boolean, default=False
Whether smoothing the curve with a Savitzky-Golay filter.
filename : str, default=None
Path to save the figure of the elbow analysis. If None, the figure is not
saved.
show_individual : boolean, default=False
Whether to show individual tensor metrics (tensor1, tensor2) alongside the
combined metric. If False, only the combined metric is shown.
tensor1_name : str, default='Tensor1'
Name for the first tensor to use in the legend.
tensor2_name : str, default='Tensor2'
Name for the second tensor to use in the legend.
Returns
fig : matplotlib.figure.Figure
Figure object made with matplotlib
Source code in cell2cell/plotting/tensor_plot.py
plot_multiple_run_elbow(all_loss, elbow=None, ci='95%', figsize=(4, 2.25), ylabel='Normalized Error', fontsize=14, smooth=False, filename=None)
Plots the errors of an elbow analysis with multiple runs of a tensor factorization for each rank.
Parameters
all_loss : ndarray Array containing the errors associated with multiple runs for a given rank. This array is of shape (runs, upper_rank).
elbow : int, default=None X coordinate to color the error as red. Usually used to represent the detected elbow.
ci : str, default='95%' Confidence interval for representing the multiple runs in each rank.
figsize : tuple, default=(4, 2.25) Figure size, width by height
ylabel : str, default='Normalized Error' Label for the y-axis
fontsize : int, default=14 Fontsize for axis labels.
smooth : boolean, default=False Whether to smooth the curve with a Savitzky-Golay filter.
filename : str, default=None Path to save the figure of the elbow analysis. If None, the figure is not saved.
Returns
fig : matplotlib.figure.Figure Figure object made with matplotlib
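As a rough illustration of what the plot summarizes, the sketch below reduces an all_loss array of shape (runs, upper_rank) to a mean curve with an interval per rank. It does not call cell2cell: summarize_runs is a hypothetical helper, and the normal-approximation formula used for '95%' is an assumption rather than the library's confirmed computation.

```python
import numpy as np

def summarize_runs(all_loss, ci="95%"):
    """Hypothetical helper: mean error per rank plus an interval across runs."""
    all_loss = np.asarray(all_loss, dtype=float)
    mean = all_loss.mean(axis=0)
    std = all_loss.std(axis=0, ddof=1)  # sample std across runs (needs >= 2 runs)
    if ci == "std":
        half_width = std
    elif ci == "95%":
        # Assumption: normal approximation of the 95% interval of the mean.
        half_width = 1.96 * std / np.sqrt(all_loss.shape[0])
    else:
        raise ValueError("ci must be 'std' or '95%'")
    return mean, mean - half_width, mean + half_width

# Toy errors: 3 runs (rows) for ranks 1 to 4 (columns).
all_loss = np.array([
    [0.9, 0.6, 0.45, 0.40],
    [0.8, 0.5, 0.40, 0.38],
    [1.0, 0.7, 0.50, 0.42],
])
mean, lower, upper = summarize_runs(all_loss, ci="std")
```

Each column of all_loss corresponds to one rank, so the returned arrays have one entry per rank on the x-axis.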
Source code in cell2cell/plotting/tensor_plot.py
reorder_dimension_elements(factors, reorder_elements, metadata=None)
Reorders elements in the dataframes including factor loadings.
Parameters
factors : dict Ordered dictionary containing a dataframe with the factor loadings for each dimension/order of the tensor.
reorder_elements : dict Dictionary for reordering elements in each tensor dimension. Keys of this dictionary can be any or all of the keys in interaction_tensor.factors. Values are lists with the names or labels of the elements in a tensor dimension. For example, for the context dimension, all elements included in interaction_tensor.factors['Context'].index must be present.
metadata : list, default=None
List of pandas dataframes with metadata information for elements of each
dimension in the tensor. A column named after the value of sample_col contains
the name of each element in the tensor, while another column named after the
value of group_col contains the metadata or grouping information of each
element.
Returns
reordered_factors : dict Ordered dictionary containing a dataframe with the factor loadings for each dimension/order of the tensor. This dictionary includes the new orders.
new_metadata : list, default=None
List of pandas dataframes with metadata information for elements of each
dimension in the tensor. A column named after the value of sample_col contains
the name of each element in the tensor, while another column named after the
value of group_col contains the metadata or grouping information of each
element. In this case, elements are sorted according to reorder_elements.
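The reordering described above can be sketched with plain pandas. The example below does not call cell2cell: reorder_loadings is a hypothetical helper, and the dimension names and loadings are toy values. It only illustrates how a reorder_elements dictionary maps onto the index of each loadings dataframe.

```python
from collections import OrderedDict

import pandas as pd

# Toy loadings: one dataframe per tensor dimension (rows are elements).
factors = OrderedDict([
    ("Contexts", pd.DataFrame({"Factor 1": [0.1, 0.7, 0.2]},
                              index=["C1", "C2", "C3"])),
    ("Sender Cells", pd.DataFrame({"Factor 1": [0.5, 0.5]},
                                  index=["T cell", "B cell"])),
])
reorder_elements = {"Contexts": ["C3", "C1", "C2"]}

def reorder_loadings(factors, reorder_elements):
    """Hypothetical helper: reorder rows of the dimensions named in reorder_elements."""
    reordered = OrderedDict()
    for dim, df in factors.items():
        if dim in reorder_elements:
            new_order = reorder_elements[dim]
            # Every element of the dimension must appear exactly once.
            if sorted(new_order) != sorted(df.index):
                raise ValueError(f"All elements of '{dim}' must be present")
            reordered[dim] = df.loc[new_order]
        else:
            reordered[dim] = df.copy()
    return reordered

reordered = reorder_loadings(factors, reorder_elements)
```

Dimensions that are not keys of reorder_elements keep their original order.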
Source code in cell2cell/plotting/tensor_plot.py
tensor_factors_plot(interaction_tensor, order_labels=None, order_sorting=None, reorder_elements=None, metadata=None, sample_col='Element', group_col='Category', meta_cmaps=None, fontsize=20, plot_legend=True, filename=None)
Plots the loadings for each element in each dimension of the tensor, generated by a tensor factorization.
Parameters
interaction_tensor : cell2cell.tensor.BaseTensor A communication tensor generated with any of the tensor classes in cell2cell.tensor.
order_labels : list, default=None List with the labels of each dimension to use in the plot. If None, the default names given when factorizing the tensor will be used.
order_sorting : list, default=None List specifying the order of dimensions to plot. Can be either:
- List of indices: [0, 2, 1, 3] to reorder dimensions by position
- List of dimension names: ['Contexts', 'Sender Cells', 'Receiver Cells', 'Ligand-Receptor Pairs']
If None, uses the original order.
reorder_elements : dict, default=None Dictionary for reordering elements in each tensor dimension. Keys of this dictionary can be any or all of the keys in interaction_tensor.factors. Values are lists with the names or labels of the elements in a tensor dimension. For example, for the context dimension, all elements included in interaction_tensor.factors['Context'].index must be present.
metadata : list, default=None
List of pandas dataframes with metadata information for elements of each
dimension in the tensor. A column named after the value of sample_col contains
the name of each element in the tensor, while another column named after the
value of group_col contains the metadata or grouping information of each
element.
sample_col : str, default='Element' Name of the column containing the element names in the metadata.
group_col : str, default='Category' Name of the column containing the metadata or grouping information for each element in the metadata.
meta_cmaps : list, default=None A list of colormaps used for coloring elements in each dimension. The length of this list is equal to the number of dimensions of the tensor. If None, all dimensions will be colored with the colormap 'gist_rainbow'.
fontsize : int, default=20 Font size of the tick labels. Axis labels will be 1.2 times the fontsize.
plot_legend : boolean, default=True Whether to plot the legends for the coloring of each element in their respective dimensions.
filename : str, default=None Path to save the figure. If None, the figure is not saved.
Returns
fig : matplotlib.figure.Figure Figure object made with matplotlib
axes : matplotlib.axes.Axes or array of Axes List of Axes for each subplot in the figure.
Source code in cell2cell/plotting/tensor_plot.py
tensor_factors_plot_from_loadings(factors, rank=None, order_labels=None, order_sorting=None, reorder_elements=None, metadata=None, sample_col='Element', group_col='Category', meta_cmaps=None, fontsize=20, plot_legend=True, filename=None)
Plots the loadings for each element in each dimension of the tensor, generated by a tensor factorization.
Parameters
factors : collections.OrderedDict An ordered dictionary wherein keys are the names of each tensor dimension, and values are the loadings in a pandas.DataFrame. In this dataframe, rows are the elements of the respective dimension and columns are the factors from the tensor factorization. Values are the corresponding loadings.
rank : int, default=None Number of factors generated from the decomposition
order_labels : list, default=None
List with the labels of each dimension to use in the plot. If None, the
default names given when factorizing the tensor will be used. These labels
should be provided in the original order of the tensor dimensions.
If order_sorting is provided, labels will be automatically reordered to fit
the new order. If order_sorting is not provided, the labels will be used
in the original order of the tensor dimensions.
order_sorting : list, default=None List specifying the order of dimensions to plot. Can be either:
- List of indices: [0, 2, 1, 3] to reorder dimensions by position
- List of dimension names: ['Contexts', 'Sender Cells', 'Receiver Cells', 'Ligand-Receptor Pairs']
If None, uses the original order.
reorder_elements : dict, default=None Dictionary for reordering elements in each tensor dimension. Keys of this dictionary can be any or all of the keys in interaction_tensor.factors. Values are lists with the names or labels of the elements in a tensor dimension. For example, for the context dimension, all elements included in interaction_tensor.factors['Context'].index must be present.
metadata : list, default=None
List of pandas dataframes with metadata information for elements of each
dimension in the tensor. A column named after the value of sample_col contains
the name of each element in the tensor, while another column named after the
value of group_col contains the metadata or grouping information of each
element.
sample_col : str, default='Element' Name of the column containing the element names in the metadata.
group_col : str, default='Category' Name of the column containing the metadata or grouping information for each element in the metadata.
meta_cmaps : list, default=None A list of colormaps used for coloring elements in each dimension. The length of this list is equal to the number of dimensions of the tensor. If None, all dimensions will be colored with the colormap 'gist_rainbow'.
fontsize : int, default=20 Font size of the tick labels. Axis labels will be 1.2 times the fontsize.
plot_legend : boolean, default=True Whether to plot the legends for the coloring of each element in their respective dimensions.
filename : str, default=None Path to save the figure. If None, the figure is not saved.
Returns
fig : matplotlib.figure.Figure Figure object made with matplotlib
axes : matplotlib.axes.Axes or array of Axes List of Axes for each subplot in the figure.
Source code in cell2cell/plotting/tensor_plot.py
umap_plot
umap_biplot(umap_df, figsize=(8, 8), ax=None, show_axes=True, show_legend=True, hue=None, cmap='tab10', fontsize=20, filename=None)
Plots a UMAP biplot for the UMAP embeddings.
Parameters
umap_df : pandas.DataFrame Dataframe containing the UMAP embeddings for the axis analyzed. It must contain columns 'umap1' and 'umap2'. If a hue column is provided in the parameter 'hue', that column must be present in this dataframe.
figsize : tuple, default=(8, 8) Size of the figure (width*height), each in inches.
ax : matplotlib.axes.Axes, default=None The matplotlib axes containing a plot.
show_axes : boolean, default=True Whether to show lines, ticks and ticklabels of both axes.
show_legend : boolean, default=True Whether to include the legend when a hue is provided.
hue : vector or key in 'umap_df' Grouping variable that will produce points with different colors. Can be either categorical or numeric, although color mapping will behave differently in the latter case.
cmap : str, default='tab10' Name of the color palette for coloring elements with UMAP embeddings.
fontsize : int, default=20 Fontsize of the axis labels (UMAP1 and UMAP2).
filename : str, default=None Path to save the figure. If None, the figure is not saved.
Returns
fig : matplotlib.figure.Figure
A matplotlib Figure instance.
ax : matplotlib.axes.Axes
The matplotlib axes containing the plot.
Source code in cell2cell/plotting/umap_plot.py
preprocessing
cutoffs
get_constant_cutoff(rnaseq_data, constant_cutoff=10)
Generates a cutoff/threshold dataframe for all genes in rnaseq_data assigning a constant value as the cutoff.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment or a single-cell experiment after aggregation into cell types. Columns are cell-types/tissues/samples and rows are genes.
constant_cutoff : float, default=10 Cutoff or threshold assigned to each gene.
Returns
cutoffs : pandas.DataFrame A dataframe containing the value corresponding to the cutoff or threshold assigned to each gene. Rows are genes and the column corresponds to 'value'. All values are the same and correspond to the constant_cutoff.
Source code in cell2cell/preprocessing/cutoffs.py
get_cutoffs(rnaseq_data, parameters, verbose=True)
This function creates cutoff/threshold values for genes in rnaseq_data and the respective cells/tissues/samples by a given method or parameter.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment or a single-cell experiment after aggregation into cell types. Columns are cell-types/tissues/samples and rows are genes.
parameters : dict This dictionary must contain a 'parameter' key and a 'type' key. The first one is the respective parameter to compute the threshold or cutoff values. The type corresponds to the approach to compute the values according to the parameter employed. Options of 'type' that can be used:
- 'local_percentile' : computes the value of a given percentile,
for each gene independently. In this case,
the parameter corresponds to the percentile
to compute, as a float value between 0 and 1.
- 'global_percentile' : computes the value of a given percentile
from all genes and samples simultaneously.
In this case, the parameter corresponds to
the percentile to compute, as a float value
between 0 and 1. All genes have the same cutoff.
- 'file' : loads a cutoff table from a file. Parameter in this case is
the path of that file. It must contain the same genes as
index and same samples as columns.
- 'multi_col_matrix' : a dataframe must be provided, containing a
cutoff for each gene in each sample. This allows
specific cutoffs to be used for each sample. The
columns here must be the same as the ones in the
rnaseq_data.
- 'single_col_matrix' : a dataframe must be provided, containing a
cutoff for each gene in only one column. These
cutoffs will be applied to all samples.
- 'constant_value' : binarizes the expression. Evaluates whether
expression is greater than the value input in
the 'parameter'.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
cutoffs : pandas.DataFrame Dataframe wherein rows are genes in rnaseq_data. Depending on the type in the parameters dictionary, it may have only one column ('value') or the same columns that rnaseq_data has, generating specific cutoffs for each cell/tissue/sample.
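To make the shape of the parameters dictionary concrete, the sketch below builds one dictionary per simple 'type' listed above. The file path is a hypothetical placeholder, and check_parameters is a hypothetical helper that only validates the keys; it is not part of cell2cell.

```python
# Each dictionary pairs a documented 'type' with a matching 'parameter'.
local_params = {"type": "local_percentile", "parameter": 0.75}
global_params = {"type": "global_percentile", "parameter": 0.75}
constant_params = {"type": "constant_value", "parameter": 10}
file_params = {"type": "file", "parameter": "path/to/cutoffs.tsv"}  # hypothetical path

# The six 'type' options listed in the documentation above.
DOCUMENTED_TYPES = {
    "local_percentile", "global_percentile", "file",
    "multi_col_matrix", "single_col_matrix", "constant_value",
}

def check_parameters(parameters):
    """Hypothetical helper: check that both keys exist and 'type' is documented."""
    missing = {"type", "parameter"} - set(parameters)
    if missing:
        raise KeyError(f"Missing keys: {sorted(missing)}")
    if parameters["type"] not in DOCUMENTED_TYPES:
        raise ValueError(f"Unknown type: {parameters['type']!r}")
    return True
```

Following the signature documented above, a dictionary such as local_params would be passed as the second argument, as in get_cutoffs(rnaseq_data, local_params).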
Source code in cell2cell/preprocessing/cutoffs.py
get_global_percentile_cutoffs(rnaseq_data, percentile=0.75)
Obtains a global value associated with a given percentile across cells/tissues/samples and genes in rnaseq_data.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment or a single-cell experiment after aggregation into cell types. Columns are cell-types/tissues/samples and rows are genes.
percentile : float, default=0.75 This is the percentile to be computed.
Returns
cutoffs : pandas.DataFrame A dataframe containing the value corresponding to the percentile across the dataset. Rows are genes and the column corresponds to 'value'. All values here are the same global percentile.
Source code in cell2cell/preprocessing/cutoffs.py
get_local_percentile_cutoffs(rnaseq_data, percentile=0.75)
Obtains a local value associated with a given percentile across cells/tissues/samples for each gene in rnaseq_data.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment or a single-cell experiment after aggregation into cell types. Columns are cell-types/tissues/samples and rows are genes.
percentile : float, default=0.75 This is the percentile to be computed.
Returns
cutoffs : pandas.DataFrame A dataframe containing the value corresponding to the percentile across the genes. Rows are genes and the column corresponds to 'value'.
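The difference between the local and global percentile cutoffs can be sketched with pandas and NumPy. This is an illustration under assumptions, not the library's implementation: the helper names are hypothetical and the interpolation used by quantile may differ from what cell2cell does internally.

```python
import numpy as np
import pandas as pd

# Toy expression matrix: rows are genes, columns are cell types.
rnaseq_data = pd.DataFrame(
    {"CellA": [0.0, 10.0, 4.0], "CellB": [2.0, 30.0, 4.0], "CellC": [4.0, 50.0, 4.0]},
    index=["GeneX", "GeneY", "GeneZ"],
)

def local_percentile(rnaseq_data, percentile=0.75):
    """Hypothetical helper: one cutoff per gene, computed across its samples."""
    return rnaseq_data.quantile(percentile, axis=1).to_frame("value")

def global_percentile(rnaseq_data, percentile=0.75):
    """Hypothetical helper: one cutoff from all values, repeated for every gene."""
    global_value = np.quantile(rnaseq_data.to_numpy(), percentile)
    return pd.DataFrame({"value": [global_value] * len(rnaseq_data)},
                        index=rnaseq_data.index)

local_cutoffs = local_percentile(rnaseq_data, percentile=0.5)
global_cutoffs = global_percentile(rnaseq_data, percentile=0.5)
```

In both cases the result has genes as rows and a single 'value' column; only the local version can differ from gene to gene.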
Source code in cell2cell/preprocessing/cutoffs.py
find_elements
find_duplicates(element_list)
Function based on: https://stackoverflow.com/a/5419576/12032899 Finds duplicate items and lists their index locations.
Parameters
element_list : list List of elements
Returns
duplicate_dict : dict Dictionary with duplicate items. Keys are the items, and values are lists with the respective indexes where they are.
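A minimal sketch of this idea, using only the standard library, is shown below. The function name duplicates_with_indexes is hypothetical and the code is a re-implementation for illustration, not the library source.

```python
from collections import defaultdict

def duplicates_with_indexes(element_list):
    """Hypothetical helper: map each repeated item to the positions where it occurs."""
    positions = defaultdict(list)
    for index, element in enumerate(element_list):
        positions[element].append(index)
    # Keep only the items that occur more than once.
    return {element: idx for element, idx in positions.items() if len(idx) > 1}

result = duplicates_with_indexes(["a", "b", "a", "c", "b", "a"])
# result == {"a": [0, 2, 5], "b": [1, 4]}
```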
Source code in cell2cell/preprocessing/find_elements.py
get_element_abundances(element_lists)
Computes the fraction of occurrence of each element in a list of lists.
Parameters
element_lists : list List of lists of elements. Elements will be counted only once in each of the lists.
Returns
abundance_dict : dict
Dictionary containing the number of times that an
element was present, divided by the total number of
lists in element_lists.
Source code in cell2cell/preprocessing/find_elements.py
get_elements_over_fraction(abundance_dict, fraction)
Obtains a list of elements whose fraction of occurrence is at least the threshold.
Parameters
abundance_dict : dict Dictionary containing the number of times that an element was present, divided by the total number of possible occurrences.
fraction : float Threshold to filter the elements. Elements with at least this threshold will be included.
Returns
elements : list List of elements that met the fraction criteria.
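The two functions above are naturally used together: first compute the fraction of lists in which each element appears, then keep the elements that reach a threshold. The sketch below re-implements that logic with the standard library for illustration; the helper names are hypothetical.

```python
from collections import Counter

def element_abundances(element_lists):
    """Hypothetical helper: fraction of lists in which each element occurs."""
    n_lists = len(element_lists)
    if n_lists == 0:
        return {}
    counts = Counter()
    for elements in element_lists:
        counts.update(set(elements))  # count each element once per list
    return {element: count / n_lists for element, count in counts.items()}

def elements_over_fraction(abundance_dict, fraction):
    """Hypothetical helper: elements whose abundance is at least `fraction`."""
    return [e for e, abundance in abundance_dict.items() if abundance >= fraction]

lists = [["A", "B", "A"], ["A", "C"], ["A", "B"], ["B"]]
abundances = element_abundances(lists)
selected = sorted(elements_over_fraction(abundances, 0.5))
```

Here "A" and "B" each appear in three of the four lists, while "C" appears in only one, so only "A" and "B" pass a threshold of 0.5.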
Source code in cell2cell/preprocessing/find_elements.py
gene_ontology
find_all_children_of_go_term(go_terms, go_term_name, output_list, verbose=True)
Finds all children GO terms (below in hierarchy) of a given GO term.
Parameters
go_terms : networkx.Graph NetworkX Graph containing GO terms datasets from .obo file. It could be loaded using cell2cell.io.read_data.load_go_terms(filename).
go_term_name : str Specific GO term whose children will be found. For example: 'GO:0007155'.
output_list : list List used to perform a Depth First Search and find the children in a recursive way. The children are written into this list automatically.
verbose : boolean, default=True Whether to print the steps of the analysis.
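The recursive pattern described for output_list, where results are accumulated in a list passed by the caller, can be sketched as follows. This example uses a plain dictionary with placeholder term names instead of a NetworkX graph loaded from an .obo file, so both the data structure and the collect_children helper are assumptions made for illustration.

```python
# Toy hierarchy: each key maps a term to its direct children (placeholder names).
children_of = {
    "root": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
}

def collect_children(children_of, term, output_list):
    """Hypothetical helper: depth-first search that appends descendants in place."""
    for child in children_of.get(term, []):
        if child not in output_list:  # skip terms reached through another parent
            output_list.append(child)
            collect_children(children_of, child, output_list)

found = []
collect_children(children_of, "root", found)
# found == ["a", "c", "b", "d"]
```

As in the documented function, nothing is returned: the caller creates the list and reads the children from it afterwards.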
Source code in cell2cell/preprocessing/gene_ontology.py
find_go_terms_from_keyword(go_terms, keyword, verbose=False)
Uses a keyword to find related GO terms.
Parameters
go_terms : networkx.Graph NetworkX Graph containing GO terms datasets from .obo file. It could be loaded using cell2cell.io.read_data.load_go_terms(filename).
keyword : str Keyword to be included in the names of retrieved GO terms.
verbose : boolean, default=False Whether to print the steps of the analysis.
Returns
go_filter : list List containing all GO terms related to a keyword.
Source code in cell2cell/preprocessing/gene_ontology.py
get_genes_from_go_hierarchy(go_annotations, go_terms, go_filter, go_header='GO', gene_header='Gene', verbose=False)
Obtains genes associated with specific GO terms and their children GO terms (below in the hierarchy).
Parameters
go_annotations : pandas.DataFrame Dataframe containing information about GO term annotations of each gene for a given organism according to the ga file. Can be loaded with the function cell2cell.io.read_data.load_go_annotations().
go_terms : networkx.Graph NetworkX Graph containing GO terms datasets from .obo file. It could be loaded using cell2cell.io.read_data.load_go_terms(filename).
go_filter : list List containing one or more GO-terms to find associated genes.
go_header : str, default='GO' Column name wherein GO terms are located in the dataframe.
gene_header : str, default='Gene' Column name wherein genes are located in the dataframe.
verbose : boolean, default=False Whether to print the steps of the analysis.
Returns
genes : list List of genes that are associated with GO-terms contained in go_filter, and related to the children GO terms of those terms.
Source code in cell2cell/preprocessing/gene_ontology.py
get_genes_from_go_terms(go_annotations, go_filter, go_header='GO', gene_header='Gene', verbose=True)
Finds genes associated with specific GO-terms.
Parameters
go_annotations : pandas.DataFrame Dataframe containing information about GO term annotations of each gene for a given organism according to the ga file. Can be loaded with the function cell2cell.io.read_data.load_go_annotations().
go_filter : list List containing one or more GO-terms to find associated genes.
go_header : str, default='GO' Column name wherein GO terms are located in the dataframe.
gene_header : str, default='Gene' Column name wherein genes are located in the dataframe.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
genes : list List of genes that are associated with GO-terms contained in go_filter.
Source code in cell2cell/preprocessing/gene_ontology.py
integrate_data
get_modified_rnaseq(rnaseq_data, cutoffs=None, communication_score='expression_thresholding')
Preprocess gene expression into values used by a communication scoring function (either continuous or binary).
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes.
cutoffs : pandas.DataFrame A dataframe containing the value corresponding to cutoff or threshold assigned to each gene. Rows are genes and columns could be either 'value' for a single threshold for all cell-types/tissues/samples or the names of cell-types/tissues/samples for thresholding in a specific way. They could be obtained through the function cell2cell.preprocessing.cutoffs.get_cutoffs()
communication_score : str, default='expression_thresholding' Type of communication score used to detect active ligand-receptor pairs between each pair of cells. See cell2cell.core.communication_scores for more details. It can be:
- 'expression_thresholding'
- 'expression_product'
- 'expression_mean'
- 'expression_gmean'
Returns
modified_rnaseq : pandas.DataFrame Preprocessed gene expression given a communication scoring function to use. Rows are genes and columns are cell-types/tissues/samples.
Source code in cell2cell/preprocessing/integrate_data.py
get_ppi_dict_from_go_terms(ppi_data, go_annotations, go_terms, contact_go_terms, mediator_go_terms=None, use_children=True, go_header='GO', gene_header='Gene', interaction_columns=('A', 'B'), verbose=True)
Filters a complete list of protein-protein interactions into sublists containing proteins involved in different kinds of intercellular interactions, by provided lists of GO terms.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
go_annotations : pandas.DataFrame Dataframe containing information about GO term annotations of each gene for a given organism according to the ga file. Can be loaded with the function cell2cell.io.read_data.load_go_annotations().
go_terms : networkx.Graph NetworkX Graph containing GO terms datasets from .obo file. It could be loaded using cell2cell.io.read_data.load_go_terms(filename).
contact_go_terms : list GO terms for selecting proteins participating in cell contact interactions (e.g. surface proteins, receptors).
mediator_go_terms : list, default=None GO terms for selecting proteins participating in mediated or secreted signaling (e.g. extracellular proteins, ligands). If None, only interactions involved in cell contacts will be returned.
use_children : boolean, default=True Whether to consider the children GO terms (below in the hierarchy) of the ones passed as inputs (contact_go_terms and mediator_go_terms).
go_header : str, default='GO' Column name wherein GO terms are located in the dataframe.
gene_header : str, default='Gene' Column name wherein genes are located in the dataframe.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
ppi_dict : dict Dictionary containing lists of PPIs involving proteins that participate in different kinds of intercellular interactions. Options are under the keys:
- 'contacts' : Contains proteins participating in cell contact
interactions (e.g. surface proteins, receptors)
- 'mediated' : Contains proteins participating in mediated or
secreted signaling (e.g. ligand-receptor interactions)
- 'combined' : Contains both 'contacts' and 'mediated' PPIs.
- 'complete' : Contains all combinations of interactions between
ligands, receptors, surface proteins, etc.
If mediator_go_terms input is None, this dictionary will contain
PPIs only for 'contacts'.
Source code in cell2cell/preprocessing/integrate_data.py
get_ppi_dict_from_proteins(ppi_data, contact_proteins, mediator_proteins=None, interaction_columns=('A', 'B'), bidirectional=True, verbose=True)
Filters a complete list of protein-protein interactions into sublists containing proteins involved in different kinds of intercellular interactions, by provided lists of proteins.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
contact_proteins : list Protein names of proteins participating in cell contact interactions (e.g. surface proteins, receptors).
mediator_proteins : list, default=None Protein names of proteins participating in mediated or secreted signaling (e.g. extracellular proteins, ligands). If None, only interactions involved in cell contacts will be returned.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
bidirectional : boolean, default=True Whether to duplicate PPIs in both directions of interaction. That is, if the list contains the ProtA-ProtB interaction, the ProtB-ProtA interaction will also be included.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
ppi_dict : dict Dictionary containing lists of PPIs involving proteins that participate in different kinds of intercellular interactions. Options are under the keys:
- 'contacts' : Contains proteins participating in cell contact
interactions (e.g. surface proteins, receptors)
- 'mediated' : Contains proteins participating in mediated or
secreted signaling (e.g. ligand-receptor interactions)
- 'combined' : Contains both 'contacts' and 'mediated' PPIs.
- 'complete' : Contains all combinations of interactions between
ligands, receptors, surface proteins, etc.
If mediator_proteins input is None, this dictionary will contain
PPIs only for 'contacts'.
Source code in cell2cell/preprocessing/integrate_data.py
get_thresholded_rnaseq(rnaseq_data, cutoffs)
Binarizes an RNA-seq dataset given cutoff or threshold values.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes.
cutoffs : pandas.DataFrame A dataframe containing the value corresponding to cutoff or threshold assigned to each gene. Rows are genes and columns could be either 'value' for a single threshold for all cell-types/tissues/samples or the names of cell-types/tissues/samples for thresholding in a specific way. They could be obtained through the function cell2cell.preprocessing.cutoffs.get_cutoffs()
Returns
binary_rnaseq_data : pandas.DataFrame Gene expression preprocessed into binary values given cutoffs or thresholds, either general or specific, for all cell-types/tissues/samples.
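The thresholding step can be illustrated with a short pandas sketch that handles both cutoff layouts described above: a single 'value' column, or one column per sample. The binarize helper is hypothetical, and the strict "greater than" comparison is an assumption taken from the description of 'constant_value'; the library may treat values equal to the cutoff differently.

```python
import pandas as pd

rnaseq_data = pd.DataFrame(
    {"CellA": [5.0, 20.0], "CellB": [12.0, 0.0]},
    index=["GeneX", "GeneY"],
)
cutoffs = pd.DataFrame({"value": [10.0, 10.0]}, index=["GeneX", "GeneY"])

def binarize(rnaseq_data, cutoffs):
    """Hypothetical helper: 1 where expression exceeds the cutoff, else 0."""
    if list(cutoffs.columns) == ["value"]:
        # One threshold per gene, applied to every sample (aligned on genes).
        mask = rnaseq_data.gt(cutoffs["value"], axis=0)
    else:
        # One threshold per gene and sample; align rows and columns first.
        aligned = cutoffs.loc[rnaseq_data.index, rnaseq_data.columns]
        mask = rnaseq_data.gt(aligned)
    return mask.astype(int)

binary = binarize(rnaseq_data, cutoffs)
```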
Source code in cell2cell/preprocessing/integrate_data.py
get_weighted_ppi(ppi_data, modified_rnaseq_data, column='value', interaction_columns=('A', 'B'))
Assigns preprocessed gene expression values to proteins in a list of protein-protein interactions.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
modified_rnaseq_data : pandas.DataFrame Preprocessed gene expression given a communication scoring function to use. Rows are genes and columns are cell-types/tissues/samples.
column : str, default='value' Column name to consider the gene expression values.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
Returns
weighted_ppi : pandas.DataFrame List of protein-protein interactions that contains gene expression values instead of the names of interacting proteins. Gene expression values are preprocessed given a communication scoring function to use.
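A minimal sketch of the substitution described above, written with pandas only: each protein name in the interaction columns is replaced by its preprocessed expression value. The weight_ppi helper and the toy protein names are hypothetical and do not reproduce the library's code.

```python
import pandas as pd

ppi_data = pd.DataFrame({"A": ["LigX", "LigY"], "B": ["RecX", "RecY"]})
modified_rnaseq_data = pd.DataFrame(
    {"value": [1.0, 0.0, 1.0, 1.0]},
    index=["LigX", "LigY", "RecX", "RecY"],
)

def weight_ppi(ppi_data, modified_rnaseq_data, column="value",
               interaction_columns=("A", "B")):
    """Hypothetical helper: swap protein names for their expression values."""
    weighted = ppi_data.copy()
    expression = modified_rnaseq_data[column]  # Series indexed by gene name
    for col in interaction_columns:
        weighted[col] = weighted[col].map(expression)
    return weighted

weighted_ppi = weight_ppi(ppi_data, modified_rnaseq_data)
```

In this sketch, proteins missing from the expression table would end up as NaN, since map leaves unmatched names without a value.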
Source code in cell2cell/preprocessing/integrate_data.py
manipulate_dataframes
check_presence_in_dataframe(df, elements, columns=None)
Searches for elements in a dataframe and returns those that are present in the dataframe.
Parameters
df : pandas.DataFrame A dataframe
elements : list List of elements to find in the dataframe. They must be a data type contained in the dataframe.
columns : list, default=None Names of columns to consider in the search. If None, all columns are used.
Returns
found_elements : list List of elements in the input list that were found in the dataframe.
Source code in cell2cell/preprocessing/manipulate_dataframes.py
check_symmetry(df)
Checks whether a dataframe is symmetric.
Parameters
df : pandas.DataFrame A dataframe.
Returns
symmetric : boolean Whether a dataframe is symmetric.
Source code in cell2cell/preprocessing/manipulate_dataframes.py
convert_to_distance_matrix(df)
Converts a symmetric dataframe into a distance dataframe. That is, diagonal elements are all zero.
Parameters
df : pandas.DataFrame A dataframe.
Returns
df_ : pandas.DataFrame A copy of df, but with all diagonal elements with a value of zero.
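The symmetry check and the conversion to a distance matrix can be sketched with NumPy and pandas as follows. The helper names are hypothetical, and the use of a numerical tolerance in the symmetry check is an assumption made for this example.

```python
import numpy as np
import pandas as pd

labels = ["CellA", "CellB", "CellC"]
df = pd.DataFrame(
    [[1.0, 0.2, 0.5],
     [0.2, 1.0, 0.3],
     [0.5, 0.3, 1.0]],
    index=labels, columns=labels,
)

def is_symmetric(df):
    """Hypothetical helper: square shape and values equal to their transpose."""
    values = df.to_numpy()
    return values.shape[0] == values.shape[1] and np.allclose(values, values.T)

def to_distance_matrix(df):
    """Hypothetical helper: copy of df with zeros on the diagonal."""
    values = df.to_numpy(copy=True)
    np.fill_diagonal(values, 0)
    return pd.DataFrame(values, index=df.index, columns=df.columns)

distance_df = to_distance_matrix(df)
```

The input dataframe is left untouched because the values are copied before the diagonal is overwritten.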
Source code in cell2cell/preprocessing/manipulate_dataframes.py
shuffle_cols_in_df(df, columns, shuffling_number=1, random_state=None)
Randomly shuffles specific columns in a dataframe.
Parameters
df : pandas.DataFrame A dataframe.
columns : list Names of columns to shuffle.
shuffling_number : int, default=1 Number of shuffles per column.
random_state : int, default=None Seed for randomization.
Returns
df_ : pandas.DataFrame A shuffled dataframe.
Source code in cell2cell/preprocessing/manipulate_dataframes.py
shuffle_dataframe(df, shuffling_number=1, axis=0, random_state=None)
Randomly shuffles a whole dataframe across a given axis.
Parameters
df : pandas.DataFrame A dataframe.
shuffling_number : int, default=1 Number of shuffles per column.
axis : int, default=0 An axis of the dataframe (0 across rows, 1 across columns). Across rows means that each column is shuffled independently, and across columns means that each row is shuffled independently.
random_state : int, default=None Seed for randomization.
Returns
df_ : pandas.DataFrame A shuffled dataframe.
Source code in cell2cell/preprocessing/manipulate_dataframes.py
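To make the axis semantics concrete, here is a small sketch of shuffling across rows, where every column is permuted independently of the others. It is written with NumPy and pandas under that stated assumption and is not the library's code; `shuffle_each_column` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def shuffle_each_column(df, random_state=None):
    """Hypothetical helper: permute the values of every column independently."""
    rng = np.random.default_rng(random_state)
    out = df.copy()
    for col in out.columns:
        # A fresh permutation per column breaks the row-wise pairing of values.
        out[col] = rng.permutation(out[col].to_numpy())
    return out

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [10, 20, 30, 40]})
shuffled = shuffle_each_column(df, random_state=0)
```

Each column keeps its own set of values and the index labels stay in place; only the assignment of values to rows changes.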
shuffle_rows_in_df(df, rows, shuffling_number=1, random_state=None)
Randomly shuffles specific rows in a dataframe.
Parameters
df : pandas.DataFrame A dataframe.
rows : list Names of rows (or indexes) to shuffle.
shuffling_number : int, default=1 Number of shuffles per row.
random_state : int, default=None Seed for randomization.
Returns
df_.T : pandas.DataFrame A shuffled dataframe.
Source code in cell2cell/preprocessing/manipulate_dataframes.py
subsample_dataframe(df, n_samples, random_state=None)
Randomly subsamples rows of a dataframe.
Parameters
df : pandas.DataFrame A dataframe.
n_samples : int Number of samples, rows in this case. If n_samples is larger than the number of rows, the entire dataframe will be returned, but shuffled.
random_state : int, default=None Seed for randomization.
Returns
subsampled_df : pandas.DataFrame A subsampled and shuffled dataframe.
Source code in cell2cell/preprocessing/manipulate_dataframes.py
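The behavior described for n_samples can be approximated with `pandas.DataFrame.sample`. This is only a sketch under the assumption that rows are drawn without replacement; `subsample_rows` is a hypothetical name.

```python
import pandas as pd

def subsample_rows(df, n_samples, random_state=None):
    """Hypothetical helper: draw at most n_samples rows, in random order."""
    # Capping at len(df) returns the whole dataframe, shuffled, when n_samples is too large.
    n = min(n_samples, len(df))
    return df.sample(n=n, random_state=random_state)

cells = pd.DataFrame({'gene1': [1, 2, 3, 4]}, index=['c1', 'c2', 'c3', 'c4'])
small = subsample_rows(cells, 2, random_state=0)
everything = subsample_rows(cells, 10, random_state=0)
```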
ppi
bidirectional_ppi_for_cci(ppi_data, interaction_columns=('A', 'B'), verbose=True)
Makes a list of protein-protein interactions bidirectional. That is, it repeats a PPI such as ProtA-ProtB in the other direction (ProtB-ProtA) if the reversed interaction is not already present.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
interaction_columns : tuple Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
bi_ppi_data : pandas.DataFrame Bidirectional ppi_data. Contains duplicated PPIs in both directions. That is, it contains both ProtA-ProtB and ProtB-ProtA interactions.
Source code in cell2cell/preprocessing/ppi.py
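A minimal pandas sketch of the idea, assuming a two-column dataframe of partners; `make_bidirectional` is a hypothetical name and this is not the library's implementation.

```python
import pandas as pd

def make_bidirectional(ppi, interaction_columns=('A', 'B')):
    """Hypothetical helper: add the reversed version of every PPI that is missing."""
    a, b = interaction_columns
    # Swap the two partner columns, then restore the original column order.
    swapped = ppi.rename(columns={a: b, b: a})[ppi.columns]
    both = pd.concat([ppi, swapped], ignore_index=True)
    return both.drop_duplicates(subset=[a, b]).reset_index(drop=True)

ppi = pd.DataFrame({'A': ['ProtA', 'ProtC', 'ProtD'],
                    'B': ['ProtB', 'ProtD', 'ProtC']})
bi_ppi = make_bidirectional(ppi)
```

In this toy input, ProtC-ProtD already appears in both directions, so only ProtB-ProtA is added.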
filter_complex_ppi_by_proteins(ppi_data, proteins, complex_sep='&', upper_letter_comparison=True, interaction_columns=('A', 'B'))
Filters a list of protein-protein interactions that is known to contain protein complexes, keeping only interacting proteins or subunits present in a list of specific protein or gene names.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
proteins : list A list of protein names to filter PPIs.
complex_sep : str, default='&' Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
upper_letter_comparison : boolean, default=True Whether to convert to uppercase the protein names in the list of proteins and the names in the ppi_data so that they can be matched. Useful when there are inconsistencies between the names that come from an expression matrix and those from ligand-receptor annotations.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
Returns
integrated_ppi : pandas.DataFrame A filtered list of PPIs, containing protein complexes in some cases, by a given list of proteins or gene names.
Source code in cell2cell/preprocessing/ppi.py
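The following sketch shows one way such a filter could work. It assumes that an interaction is kept only when every subunit of both partners is in the protein list, which the description above does not state explicitly; `keep_ppis_with_all_subunits` is a hypothetical name.

```python
import pandas as pd

def keep_ppis_with_all_subunits(ppi, proteins, complex_sep='&',
                                interaction_columns=('A', 'B')):
    """Hypothetical helper: keep PPIs whose proteins and subunits are all in `proteins`."""
    allowed = {p.upper() for p in proteins}  # case-insensitive comparison

    def present(partner):
        # A partner may be a single protein or a complex such as "CD74&CD44".
        return all(s.upper() in allowed for s in partner.split(complex_sep))

    a, b = interaction_columns
    mask = ppi[a].map(present) & ppi[b].map(present)
    return ppi[mask].reset_index(drop=True)

lr_pairs = pd.DataFrame({'A': ['MIF', 'TGFB1'],
                         'B': ['CD74&CD44', 'TGFBR1&TGFBR2']})
expressed = ['mif', 'CD74', 'CD44', 'TGFB1', 'TGFBR1']
filtered = keep_ppis_with_all_subunits(lr_pairs, expressed)
```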
filter_ppi_by_proteins(ppi_data, proteins, complex_sep=None, upper_letter_comparison=True, interaction_columns=('A', 'B'))
Filters a list of protein-protein interactions to contain only interacting proteins in a list of specific protein or gene names.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
proteins : list A list of protein names to filter PPIs.
complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
upper_letter_comparison : boolean, default=True Whether to convert to uppercase the protein names in the list of proteins and the names in the ppi_data so that they can be matched. Useful when there are inconsistencies between the names that come from an expression matrix and those from ligand-receptor annotations.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
Returns
integrated_ppi : pandas.DataFrame A filtered list of PPIs by a given list of proteins or gene names.
Source code in cell2cell/preprocessing/ppi.py
filter_ppi_network(ppi_data, contact_proteins, mediator_proteins=None, reference_list=None, bidirectional=True, interaction_type='contacts', interaction_columns=('A', 'B'), verbose=True)
Filters a list of protein-protein interactions to contain interacting proteins involved in different kinds of cell-cell interactions/communication.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
contact_proteins : list Protein names of proteins participating in cell contact interactions (e.g. surface proteins, receptors).
mediator_proteins : list, default=None Protein names of proteins participating in mediated or secreted signaling (e.g. extracellular proteins, ligands). If None, only interactions involved in cell contacts will be returned.
reference_list : list, default=None Reference list of protein names. When it is not None, filtered PPIs from contact_proteins and mediator_proteins are kept only if those proteins are also present in this list.
bidirectional : boolean, default=True Whether to duplicate PPIs in both directions of interaction. That is, if the list contains the ProtA-ProtB interaction, the ProtB-ProtA interaction will also be included.
interaction_type : str, default='contacts' Type of intercellular interactions/communication in which the proteins have to be involved. Available types are:
- 'contacts' : Contains proteins participating in cell contact interactions (e.g. surface proteins, receptors)
- 'mediated' : Contains proteins participating in mediated or secreted signaling (e.g. ligand-receptor interactions)
- 'combined' : Contains both 'contacts' and 'mediated' PPIs.
- 'complete' : Contains all combinations of interactions between ligands, receptors, surface proteins, etc.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
new_ppi_data : pandas.DataFrame A filtered list of PPIs by a given list of proteins or gene names depending on the type of intercellular communication.
Source code in cell2cell/preprocessing/ppi.py
get_all_to_all_ppi(ppi_data, proteins, interaction_columns=('A', 'B'))
Filters a list of protein-protein interactions to contain only proteins in a given list in both columns of interacting partners.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
proteins : list A list of protein names to filter PPIs.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
Returns
new_ppi_data : pandas.DataFrame A filtered list of PPIs by a given list of proteins or gene names.
Source code in cell2cell/preprocessing/ppi.py
get_filtered_ppi_network(ppi_data, contact_proteins, mediator_proteins=None, reference_list=None, interaction_type='contacts', interaction_columns=('A', 'B'), verbose=True)
Filters a list of protein-protein interactions to contain interacting proteins involved in different kinds of cell-cell interactions/communication.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
contact_proteins : list Protein names of proteins participating in cell contact interactions (e.g. surface proteins, receptors).
mediator_proteins : list, default=None Protein names of proteins participating in mediated or secreted signaling (e.g. extracellular proteins, ligands). If None, only interactions involved in cell contacts will be returned.
reference_list : list, default=None Reference list of protein names. When it is not None, filtered PPIs from contact_proteins and mediator_proteins are kept only if those proteins are also present in this list.
interaction_type : str, default='contacts' Type of intercellular interactions/communication in which the proteins have to be involved. Available types are:
- 'contacts' : Contains proteins participating in cell contact interactions (e.g. surface proteins, receptors)
- 'mediated' : Contains proteins participating in mediated or secreted signaling (e.g. ligand-receptor interactions)
- 'combined' : Contains both 'contacts' and 'mediated' PPIs.
- 'complete' : Contains all combinations of interactions between ligands, receptors, surface proteins, etc.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
new_ppi_data : pandas.DataFrame A filtered list of PPIs by a given list of proteins or gene names depending on the type of intercellular communication.
Source code in cell2cell/preprocessing/ppi.py
get_genes_from_complexes(ppi_data, complex_sep='&', interaction_columns=('A', 'B'))
Gets protein/gene names for individual proteins (subunits when in complex) in a list of PPIs. If protein is a complex, for example ProtA&ProtB, it will return ProtA and ProtB separately.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
complex_sep : str, default='&' Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
Returns
col_a_genes : list List of protein/gene names for proteins and subunits in the first column of interacting partners.
complex_a : list List of list of subunits of each complex that were present in the first column of interacting partners and that were returned as subunits in the previous list.
col_b_genes : list List of protein/gene names for proteins and subunits in the second column of interacting partners.
complex_b : list List of list of subunits of each complex that were present in the second column of interacting partners and that were returned as subunits in the previous list.
complexes : dict Dictionary where keys are the complex names in the list of PPIs, while values are list of subunits for the respective complex names.
Source code in cell2cell/preprocessing/ppi.py
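The splitting of complexes into subunits can be pictured with a plain-Python sketch for a single column of partners. The helper name `split_complexes` is hypothetical, and the real function returns more outputs (one set per column) than this simplified version.

```python
def split_complexes(partners, complex_sep='&'):
    """Hypothetical helper: individual gene names plus a complex-to-subunits dictionary."""
    genes, complexes = [], {}
    for name in partners:
        if complex_sep in name:
            subunits = name.split(complex_sep)
            complexes[name] = subunits   # e.g. "CD74&CD44" -> ["CD74", "CD44"]
            genes.extend(subunits)
        else:
            genes.append(name)
    return sorted(set(genes)), complexes

genes, complexes = split_complexes(['MIF', 'CD74&CD44', 'CD44'])
```

A dictionary of this shape (complex name as key, list of subunits as value) is what the complexes output above describes.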
get_one_group_to_other_ppi(ppi_data, proteins_a, proteins_b, interaction_columns=('A', 'B'))
Filters a list of protein-protein interactions to contain specific proteins in the first column of interacting partners and other specific proteins in the second column.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
proteins_a : list A list of protein names to filter the first column of interacting proteins in a list of PPIs.
proteins_b : list A list of protein names to filter the second column of interacting proteins in a list of PPIs.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
Returns
new_ppi_data : pandas.DataFrame A list of PPIs filtered by the given lists of proteins or gene names.
Source code in cell2cell/preprocessing/ppi.py
preprocess_ppi_data(ppi_data, interaction_columns, sort_values=None, score=None, rnaseq_genes=None, complex_sep=None, dropna=False, strna='', upper_letter_comparison=True, verbose=True)
Preprocesses a list of protein-protein interactions by removing bidirectionality and keeping the minimum number of columns.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
interaction_columns : tuple Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
sort_values : str, default=None Column name to sort PPIs in an ascending manner. If None, sorting is not done.
rnaseq_genes : list, default=None List of protein or gene names to filter the PPIs.
complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
dropna : boolean, default=False Whether dropping incomplete PPIs (with NaN values).
strna : str, default='' String to replace empty or NaN values with.
upper_letter_comparison : boolean, default=True Whether making uppercase the gene names in the expression matrices and the protein names in the ppi_data to match their names and integrate their respective expression level. Useful when there are inconsistencies in the names between the expression matrix and the ligand-receptor annotations.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
simplified_ppi : pandas.DataFrame A simplified list of protein-protein interactions. It contains neither duplicated interactions in both directions (if ProtA-ProtB and ProtB-ProtA interactions are present, only the one that appears first is kept) nor extra columns beyond the interacting ones. It contains only three columns: 'A', 'B', 'score', wherein 'A' and 'B' are the interacting partners in the PPI and 'score' represents a weight of the interaction for computing cell-cell interactions/communication.
Source code in cell2cell/preprocessing/ppi.py
remove_ppi_bidirectionality(ppi_data, interaction_columns, verbose=True)
Removes duplicate interactions. For example, when ProtA-ProtB and ProtB-ProtA interactions are present in the dataset, only one of them will be kept.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
interaction_columns : tuple Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
unidirectional_ppi : pandas.DataFrame List of protein-protein interactions without duplicated interactions in both directions (if ProtA-ProtB and ProtB-ProtA interactions are present, only the one that appears first is kept).
Source code in cell2cell/preprocessing/ppi.py
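A small sketch of removing reversed duplicates while keeping the first occurrence, as the return value above describes. It treats each pair as unordered; `drop_reverse_duplicates` is a hypothetical name and this is not the library's code.

```python
import pandas as pd

def drop_reverse_duplicates(ppi, interaction_columns=('A', 'B')):
    """Hypothetical helper: keep only the first of ProtA-ProtB / ProtB-ProtA."""
    a, b = interaction_columns
    seen, keep = set(), []
    for pair in zip(ppi[a], ppi[b]):
        key = frozenset(pair)        # unordered, so (A, B) and (B, A) collide
        keep.append(key not in seen)
        seen.add(key)
    return ppi.loc[keep].reset_index(drop=True)

ppi = pd.DataFrame({'A': ['ProtA', 'ProtB', 'ProtC'],
                    'B': ['ProtB', 'ProtA', 'ProtD']})
unidirectional = drop_reverse_duplicates(ppi)
```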
simplify_ppi(ppi_data, interaction_columns, score=None, verbose=True)
Reduces a dataframe of protein-protein interactions into a simpler version with only three columns (A, B and score).
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
interaction_columns : tuple Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
score : str, default=None Column name where weights for the PPIs are specified. If None, a default score of one is automatically assigned to each PPI.
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
A simplified dataframe of protein-protein interactions with only three columns: 'A', 'B', 'score', wherein 'A' and 'B' are the interacting partners in the PPI and 'score' represents a weight of the interaction for computing cell-cell interactions/communication.
Source code in cell2cell/preprocessing/ppi.py
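The three-column layout can be sketched as follows. The helper `to_three_columns` and the toy column names ('ligand', 'receptor', 'source') are hypothetical; the default score of one follows the parameter description above.

```python
import pandas as pd

def to_three_columns(ppi, interaction_columns, score=None):
    """Hypothetical helper: reduce a PPI dataframe to columns 'A', 'B' and 'score'."""
    a, b = interaction_columns
    out = pd.DataFrame({'A': ppi[a].values, 'B': ppi[b].values})
    # Without a score column, every interaction gets a weight of one.
    out['score'] = 1.0 if score is None else ppi[score].values
    return out

lr_pairs = pd.DataFrame({'ligand': ['MIF'], 'receptor': ['CD74&CD44'], 'source': ['db1']})
simple = to_three_columns(lr_pairs, ('ligand', 'receptor'))
```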
rnaseq
add_complexes_to_expression(rnaseq_data, complexes, agg_method='min')
Adds multimeric complexes into the gene expression matrix. Their expression values are aggregated from the respective subunits composing them according to agg_method (by default, the minimum expression value among the subunits).
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes.
complexes : dict Dictionary where keys are the complex names in the list of PPIs, while values are list of subunits for the respective complex names.
agg_method : str, default='min' Method to aggregate the expression value of multiple genes in a complex.
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
Returns
tmp_rna : pandas.DataFrame Gene expression data for RNA-seq experiment containing multimeric complex names. Their expression values are aggregated from the respective subunits composing them according to agg_method (by default, the minimum expression value). Columns are cell-types/tissues/samples and rows are genes.
Source code in cell2cell/preprocessing/rnaseq.py
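To illustrate the aggregation methods, here is a sketch that appends one row per complex to a genes-by-samples dataframe. `add_complex_rows` is a hypothetical name, and skipping complexes with missing subunits is an assumption of this sketch rather than documented behavior.

```python
import numpy as np
import pandas as pd

def add_complex_rows(rnaseq, complexes, agg_method='min'):
    """Hypothetical helper: add a row per complex, aggregated over its subunits."""
    aggregators = {
        'min': lambda block: block.min(axis=0),
        'mean': lambda block: block.mean(axis=0),
        # Geometric mean; only meaningful for strictly positive values.
        'gmean': lambda block: np.exp(np.log(block).mean(axis=0)),
    }
    out = rnaseq.copy()
    for name, subunits in complexes.items():
        # Assumption: a complex is added only when all of its subunits are measured.
        if all(s in rnaseq.index for s in subunits):
            out.loc[name] = aggregators[agg_method](rnaseq.loc[subunits])
    return out

rnaseq = pd.DataFrame({'T': [5.0, 2.0, 1.0], 'B': [0.0, 3.0, 1.0]},
                      index=['CD74', 'CD44', 'MIF'])
complexes = {'CD74&CD44': ['CD74', 'CD44']}
with_min = add_complex_rows(rnaseq, complexes, agg_method='min')
with_mean = add_complex_rows(rnaseq, complexes, agg_method='mean')
```

With 'min', a complex is treated as expressed only as much as its least expressed subunit, which is why a zero in one subunit yields a zero for the complex.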
aggregate_single_cells(rnaseq_data, metadata, barcode_col='barcodes', celltype_col='cell_types', method='average', transposed=True)
Aggregates gene expression of single cells into cell types for each gene.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for a single-cell RNA-seq experiment. Columns are single cells and rows are genes. If columns are genes and rows are single cells, specify transposed=True.
metadata : pandas.DataFrame Metadata containing the cell type of each single cell in the RNA-seq dataset.
barcode_col : str, default='barcodes' Column-name for the single cells in the metadata.
celltype_col : str, default='cell_types' Column-name in the metadata used for grouping single cells into cell types by the selected aggregation method.
method : str, default='average' Specifies the method to use to aggregate gene expression of single cells into their respective cell types. Used to perform the CCI analysis since it is on the cell types rather than single cells. Options are:
- 'nn_cell_fraction' : Among the single cells composing a cell type, it calculates the fraction of single cells with non-zero count values of a given gene.
- 'average' : Computes the average gene expression among the single cells composing a cell type for a given gene.
transposed : boolean, default=True Whether the rnaseq_data is organized with columns as genes and rows as single cells.
Returns
agg_df : pandas.DataFrame Dataframe containing the gene expression values that were aggregated by cell types. Columns are cell types and rows are genes.
Source code in cell2cell/preprocessing/rnaseq.py
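The two aggregation options can be sketched with a pandas groupby. This sketch assumes the expression matrix has genes as rows and single cells as columns, and `aggregate_by_cell_type` is a hypothetical name rather than the library's function.

```python
import pandas as pd

def aggregate_by_cell_type(rnaseq, metadata, barcode_col='barcodes',
                           celltype_col='cell_types', method='average'):
    """Hypothetical helper: genes x cells -> genes x cell types."""
    # Cell-type label of every column (single cell) of the expression matrix.
    labels = metadata.set_index(barcode_col)[celltype_col].reindex(rnaseq.columns).to_numpy()
    cells_by_genes = rnaseq.T
    if method == 'average':
        agg = cells_by_genes.groupby(labels).mean()
    elif method == 'nn_cell_fraction':
        # Fraction of cells with a non-zero value for each gene.
        agg = (cells_by_genes != 0).astype(float).groupby(labels).mean()
    else:
        raise ValueError(f'Unknown method: {method}')
    return agg.T

rnaseq = pd.DataFrame({'c1': [2.0, 0.0], 'c2': [0.0, 0.0], 'c3': [4.0, 6.0]},
                      index=['G1', 'G2'])
metadata = pd.DataFrame({'barcodes': ['c1', 'c2', 'c3'],
                         'cell_types': ['T', 'T', 'B']})
avg = aggregate_by_cell_type(rnaseq, metadata, method='average')
frac = aggregate_by_cell_type(rnaseq, metadata, method='nn_cell_fraction')
```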
divide_expression_by_max(rnaseq_data, axis=1)
Normalizes each gene value given the max value across an axis.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes.
axis : int, default=1 Axis to perform the max-value normalization. Options are {0 for normalizing across rows (column-wise) or 1 for normalizing across columns (row-wise)}.
Returns
new_data : pandas.DataFrame A gene expression data for RNA-seq experiment with normalized values. Columns are cell-types/tissues/samples and rows are genes.
Source code in cell2cell/preprocessing/rnaseq.py
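The axis options can be illustrated with a short pandas sketch. `divide_by_max` is a hypothetical name; note that in this sketch a row or column whose maximum is zero would produce NaN values.

```python
import pandas as pd

def divide_by_max(rnaseq, axis=1):
    """Hypothetical helper: divide every value by the maximum along the given axis."""
    maxima = rnaseq.max(axis=axis)
    # axis=1: one maximum per gene (row); axis=0: one maximum per sample (column).
    return rnaseq.div(maxima, axis=1 - axis)

expr = pd.DataFrame({'S1': [2.0, 1.0], 'S2': [4.0, 1.0]}, index=['G1', 'G2'])
by_gene = divide_by_max(expr, axis=1)    # each gene is scaled by its own maximum
by_sample = divide_by_max(expr, axis=0)  # each sample is scaled by its own maximum
```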
divide_expression_by_mean(rnaseq_data, axis=1)
Normalizes each gene value given the mean value across an axis.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes.
axis : int, default=1 Axis to perform the mean-value normalization. Options are {0 for normalizing across rows (column-wise) or 1 for normalizing across columns (row-wise)}.
Returns
new_data : pandas.DataFrame A gene expression data for RNA-seq experiment with normalized values. Columns are cell-types/tissues/samples and rows are genes.
Source code in cell2cell/preprocessing/rnaseq.py
drop_empty_genes(rnaseq_data)
Drops genes that are all zeroes and/or without expression values for all cell-types/tissues/samples.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes.
Returns
data : pandas.DataFrame A gene expression data for RNA-seq experiment without empty genes. Columns are cell-types/tissues/samples and rows are genes.
Source code in cell2cell/preprocessing/rnaseq.py
log10_transformation(rnaseq_data, addition=1e-06)
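Log-transforms gene expression values in a gene expression matrix for an RNA-seq experiment.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes.
addition : float, default=1e-06 Value added to every expression value before the log-transformation, so that values are log10(expression + addition).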
Returns
data : pandas.DataFrame A gene expression data for RNA-seq experiment with log-transformed values. Values are log10(expression + addition). Columns are cell-types/tissues/samples and rows are genes.
Source code in cell2cell/preprocessing/rnaseq.py
scale_expression_by_sum(rnaseq_data, axis=0, sum_value=1000000.0)
Normalizes all samples to sum up to the same scale factor.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for RNA-seq experiment. Columns are cell-types/tissues/samples and rows are genes.
axis : int, default=0 Axis to perform the global-scaling normalization. Options are {0 for normalizing across rows (column-wise) or 1 for normalizing across columns (row-wise)}.
sum_value : float, default=1e6 Scaling factor. The normalized axis will sum up to this value.
Returns
scaled_data : pandas.DataFrame A gene expression data for RNA-seq experiment with scaled values. All rows or columns, depending on the specified axis sum up to the same value. Columns are cell-types/tissues/samples and rows are genes.
Source code in cell2cell/preprocessing/rnaseq.py
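As a worked illustration of global scaling followed by a log-transformation, the sketch below handles only the column-wise case (each sample sums to sum_value). The helper names are hypothetical and the code is not taken from the library.

```python
import numpy as np
import pandas as pd

def scale_columns_to_sum(rnaseq, sum_value=1e6):
    """Hypothetical helper: rescale every column (sample) to add up to sum_value."""
    return rnaseq / rnaseq.sum(axis=0) * sum_value

def log10_with_offset(rnaseq, addition=1e-6):
    """Hypothetical helper: log10(expression + addition), finite even for zeros."""
    return np.log10(rnaseq + addition)

counts = pd.DataFrame({'S1': [1.0, 3.0], 'S2': [2.0, 2.0]}, index=['G1', 'G2'])
scaled = scale_columns_to_sum(counts)      # S1 becomes [250000, 750000]
logged = log10_with_offset(pd.DataFrame({'S1': [0.0]}, index=['G1']))
```

The small offset is what keeps zero counts from turning into negative infinity after the logarithm.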
signal
smooth_curve(values, window_length=None, polyorder=3, **kwargs)
Apply a Savitzky-Golay filter to an array to smooth the curve.
Parameters
values : array-like An array or list of values.
window_length : int, default=None Size of the window of values to use to smooth the curve.
polyorder : int, default=3 The order of the polynomial used to fit the samples.
**kwargs : dict Extra arguments for the scipy.signal.savgol_filter function.
Returns
smooth_values : array-like An array or list of values representing the smooth curve.
Source code in cell2cell/preprocessing/signal.py
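Since the extra arguments are passed to scipy.signal.savgol_filter, a direct call to that SciPy function gives a feel for the smoothing. The window length of 21 and the noisy sine curve are arbitrary choices for this sketch; how the library picks a window when window_length is None is not shown here.

```python
import numpy as np
from scipy.signal import savgol_filter

x = np.linspace(0, 2 * np.pi, 101)
rng = np.random.default_rng(0)
noisy = np.sin(x) + rng.normal(scale=0.2, size=x.size)

# An odd window longer than the polynomial order; each window is fit with a cubic.
smooth = savgol_filter(noisy, window_length=21, polyorder=3)

# A curve that already is a cubic polynomial passes through the filter unchanged.
cubic = x ** 3 - x
cubic_filtered = savgol_filter(cubic, window_length=21, polyorder=3)
```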
spatial
distances
celltype_pair_distance(df1, df2, method='min', distance='euclidean')
Calculates the distance between two sets of data points (single cell coordinates) represented by df1 and df2. It supports two distance metrics: Euclidean and Manhattan distances. The method parameter allows you to specify how the distances between the two sets are aggregated.
Parameters
df1 : pandas.DataFrame The first set of single cell coordinates.
df2 : pandas.DataFrame The second set of single cell coordinates.
method : str, default='min' The aggregation method for the calculated distances. It can be one of 'min', 'max', or 'mean'.
distance : str, default='euclidean' The distance metric to use. It can be 'euclidean' or 'manhattan'.
Returns
agg_dist : numpy.float The aggregated distance between the two sets of data points based on the specified method and distance metric.
Source code in cell2cell/spatial/distances.py
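The combination of a distance metric and an aggregation method can be sketched with NumPy broadcasting. `aggregate_pair_distance` is a hypothetical name, and the sketch assumes both dataframes contain only coordinate columns in the same order.

```python
import numpy as np
import pandas as pd

def aggregate_pair_distance(df1, df2, method='min', distance='euclidean'):
    """Hypothetical helper: aggregate all cell-to-cell distances between two groups."""
    a = df1.to_numpy(dtype=float)
    b = df2.to_numpy(dtype=float)
    diff = a[:, None, :] - b[None, :, :]     # every cell in df1 against every cell in df2
    if distance == 'euclidean':
        d = np.sqrt((diff ** 2).sum(axis=-1))
    elif distance == 'manhattan':
        d = np.abs(diff).sum(axis=-1)
    else:
        raise ValueError(f'Unknown distance: {distance}')
    return float({'min': d.min, 'max': d.max, 'mean': d.mean}[method]())

group_a = pd.DataFrame({'X': [0.0, 1.0], 'Y': [0.0, 0.0]})
group_b = pd.DataFrame({'X': [4.0, 4.0], 'Y': [0.0, 3.0]})
closest = aggregate_pair_distance(group_a, group_b, method='min')
farthest = aggregate_pair_distance(group_a, group_b, method='max')
farthest_manhattan = aggregate_pair_distance(group_a, group_b, method='max', distance='manhattan')
```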
pairwise_celltype_distances(df, group_col, coord_cols=['X', 'Y'], method='min', distance='euclidean', pairs=None)
Calculates pairwise distances between groups of single cells. It computes an aggregate distance between all possible combinations of groups.
Parameters
df : pandas.DataFrame A dataframe where each row is a single cell, and there are columns containing spatial coordinates and cell group.
group_col : str The name of the column that defines the groups for which distances are calculated.
coord_cols : list, default=['X', 'Y'] The list of column names that represent the coordinates of the single cells.
pairs : list A list of specific group pairs for which distances should be calculated. If not provided, all possible combinations of group pairs will be considered.
Returns
distances : pandas.DataFrame The pairwise distances between groups based on the specified group column. In this dataframe rows and columns are the cell groups used to compute distances.
Source code in cell2cell/spatial/distances.py
filtering
dist_filter_liana(liana_outputs, distances, max_dist, min_dist=0, source_col='source', target_col='target', keep_dist=False)
Filters a dataframe with outputs from LIANA based on a distance threshold applied to another dataframe containing distances between cell groups.
Parameters
liana_outputs : pandas.DataFrame Dataframe containing the results from LIANA, where rows are pairs of ligand-receptor interactions by pair of source-target cell groups.
distances : pandas.DataFrame Square dataframe containing distances between pairs of cell groups.
max_dist : float The distance threshold used to filter the pairs from the liana_outputs dataframe.
min_dist : float, default=0 The minimum distance between cell pairs to consider them in the interaction tensor.
source_col : str, default='source' Column name in both dataframes that represents the source cell groups.
target_col : str, default='target' Column name in both dataframes that represents the target cell groups.
keep_dist : bool, default=False To determine whether to keep the 'distance' column in the filtered output. If set to True, the 'distance' column will be retained; otherwise, it will be dropped and the LIANA dataframe will contain the original columns.
Returns
filtered_liana_outputs : pandas.DataFrame A dataframe containing the pairs from the liana_outputs dataframe that meet the distance threshold criteria.
Source code in cell2cell/spatial/filtering.py
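A sketch of this kind of distance-based filtering with pandas is shown below. It assumes a square distance dataframe indexed by cell group on both axes and inclusive thresholds; `filter_by_distance` and the toy column 'lr_pair' are hypothetical, and the real function's handling of the 'distance' column is not reproduced.

```python
import pandas as pd

def filter_by_distance(liana, distances, max_dist, min_dist=0,
                       source_col='source', target_col='target'):
    """Hypothetical helper: keep rows whose source-target distance is within the thresholds."""
    # Look up the distance of every source-target pair in the square dataframe.
    pair_dist = pd.Series(
        [distances.loc[s, t] for s, t in zip(liana[source_col], liana[target_col])],
        index=liana.index,
    )
    return liana[(pair_dist >= min_dist) & (pair_dist <= max_dist)]

groups = ['T', 'B', 'M']
distances = pd.DataFrame([[0, 10, 50], [10, 0, 30], [50, 30, 0]],
                         index=groups, columns=groups)
liana = pd.DataFrame({'source': ['T', 'T'], 'target': ['B', 'M'],
                      'lr_pair': ['MIF^CD74', 'MIF^CD74']})
nearby = filter_by_distance(liana, distances, max_dist=20)
```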
dist_filter_tensor(interaction_tensor, distances, max_dist, min_dist=0, source_axis=2, target_axis=3)
Filters an Interaction Tensor based on intercellular distances between cell types.
Parameters
interaction_tensor : cell2cell.tensor.BaseTensor A communication tensor generated with any of the tensor classes in cell2cell.tensor
distances : pandas.DataFrame Square dataframe containing distances between pairs of cell groups. It must contain all cell groups that act as sender and receiver cells in the tensor.
max_dist : float The maximum distance between cell pairs to consider them in the interaction tensor.
min_dist : float, default=0 The minimum distance between cell pairs to consider them in the interaction tensor.
source_axis : int, default=2 The index indicating the axis in the tensor corresponding to sender cells.
target_axis : int, default=3 The index indicating the axis in the tensor corresponding to receiver cells.
Returns
new_interaction_tensor : cell2cell.tensor.BaseTensor A tensor with communication scores made zero for cell type pairs with intercellular distance over the distance threshold.
Source code in cell2cell/spatial/filtering.py
neighborhoods
add_sliding_window_info_to_adata(adata, window_mapping)
Adds window information to the AnnData object's .obs DataFrame. Each window is represented as a column, and cells/spots belonging to a window are marked with a 1.0, while others are marked with a 0.0. It modifies the adata object in place.
Parameters
adata : AnnData The AnnData object to which the window information will be added.
window_mapping : dict A dictionary mapping each window to a set of cell/spot indices or barcodes. This is the output from the create_sliding_windows function.
Source code in cell2cell/spatial/neighborhoods.py
calculate_window_size(adata, num_windows)
Calculates the window size required to fit a specified number of windows across the width of the coordinate space in spatial transcriptomics data.
Parameters
adata : AnnData The AnnData object containing spatial transcriptomics data. The spatial coordinates must be stored in adata.obsm['spatial'].
num_windows : int The desired number of windows to fit across the width of the coordinate space.
Returns
window_size : float The calculated size of each window to fit the specified number of windows across the width of the coordinate space.
Source code in cell2cell/spatial/neighborhoods.py
create_sliding_windows(adata, window_size, stride)
Maps windows to the cells they contain based on spatial transcriptomics data. Returns a dictionary where keys are window identifiers and values are sets of cell indices.
Parameters
adata : AnnData The AnnData object containing spatial transcriptomics data. The spatial coordinates must be stored in adata.obsm['spatial'].
window_size : float The size of each square window along each dimension.
stride : float The stride with which the window moves along each dimension.
Returns
window_mapping : dict A dictionary mapping each window to a set of cell indices that fall within that window.
Source code in cell2cell/spatial/neighborhoods.py
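The mapping of overlapping windows to cells can be sketched on a plain coordinate array, standing in for adata.obsm['spatial']. The function name `sliding_windows`, the use of the window's lower-left corner as its key, and the half-open window bounds are all assumptions of this sketch.

```python
import numpy as np

def sliding_windows(coords, window_size, stride):
    """Hypothetical helper: map each square window to the indices of the cells inside it."""
    coords = np.asarray(coords, dtype=float)
    x_min, y_min = coords.min(axis=0)
    x_max, y_max = coords.max(axis=0)
    mapping = {}
    x0 = x_min
    while x0 <= x_max:
        y0 = y_min
        while y0 <= y_max:
            inside = ((coords[:, 0] >= x0) & (coords[:, 0] < x0 + window_size) &
                      (coords[:, 1] >= y0) & (coords[:, 1] < y0 + window_size))
            if inside.any():
                # Key: lower-left corner of the window; value: set of cell indices.
                mapping[(float(x0), float(y0))] = set(np.flatnonzero(inside).tolist())
            y0 += stride
        x0 += stride
    return mapping

coords = [(0, 0), (1, 1), (3, 3)]
windows = sliding_windows(coords, window_size=2, stride=1)
```

Because the stride is smaller than the window size, a cell can belong to several windows, which is the main difference from a non-overlapping grid.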
create_spatial_grid(adata, num_bins, copy=False)
Segments spatial transcriptomics data into a square grid based on spatial coordinates and annotates each cell or spot with its corresponding grid position.
Parameters
adata : AnnData The AnnData object containing spatial transcriptomics data. The spatial coordinates must be stored in adata.obsm['spatial']. This object is either modified in place or a copy is returned based on the copy parameter.
num_bins : int The number of bins (squares) along each dimension of the grid. The grid is square, so this number applies to both the horizontal and vertical divisions.
copy : bool, default=False If True, the function operates on and returns a copy of the input AnnData object. If False, the function modifies the input AnnData object in place.
Returns
adata_ : AnnData or None
If copy=True, a new AnnData object with added grid annotations is returned. Otherwise, the input object is modified in place and None is returned.
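The grid annotation can be pictured with the following sketch, which assigns each point a (column, row) bin. It is only an illustration under stated assumptions: `grid_positions` is a hypothetical name, each axis is scaled by its own range, and points on the upper edge are clamped into the last bin. The library may define the bins differently.

```python
def grid_positions(coords, num_bins):
    """Hypothetical sketch: (column, row) bin of each point in a num_bins x num_bins grid."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    min_x, min_y = min(xs), min(ys)
    # Fall back to 1.0 when all points share a coordinate, so the division is defined.
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0

    positions = []
    for x, y in coords:
        # Points lying exactly on the upper edge are clamped into the last bin.
        col = min(int((x - min_x) / span_x * num_bins), num_bins - 1)
        row = min(int((y - min_y) / span_y * num_bins), num_bins - 1)
        positions.append((col, row))
    return positions


# Toy (x, y) coordinates standing in for adata.obsm['spatial'].
coords = [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0), (9.0, 1.0)]
cells_to_bins = grid_positions(coords, num_bins=2)
```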
Source code in cell2cell/spatial/neighborhoods.py
stats
enrichment
fisher_representation(sample_size, class_in_sample, population_size, class_in_population)
Performs an analysis of enrichment/depletion based on observation in a sample. It computes a p-value using Fisher's exact test.
Parameters
sample_size : int Size of the sample obtained or number of elements obtained from the analysis.
class_in_sample : int Number of elements of a given class that are contained in the sample. This is the class to be tested.
population_size : int Size of the sampling space. That is, the total number of possible elements to be chosen when sampling.
class_in_population : int Number of elements of a given class that are contained in the population. This is the class to be tested.
Returns
results : dict A dictionary containing the odds ratios and p-values for depletion and enrichment analysis.
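One way to relate the four parameters above to Fisher's exact test is through a 2x2 contingency table. The sketch below builds such a table and its sample odds ratio. The layout (class membership by sampled versus not sampled) and both helper names are assumptions for illustration, so the library's table may be arranged differently.

```python
def contingency_table(sample_size, class_in_sample, population_size, class_in_population):
    """Hypothetical layout: rows = in class / not in class, columns = sampled / not sampled."""
    in_class_rest = class_in_population - class_in_sample
    other_sampled = sample_size - class_in_sample
    other_rest = (population_size - sample_size) - in_class_rest
    table = [[class_in_sample, in_class_rest],
             [other_sampled, other_rest]]
    if any(value < 0 for row in table for value in row):
        raise ValueError("counts are inconsistent with each other")
    return table


def sample_odds_ratio(table):
    """Cross-product ratio (a * d) / (b * c) of a 2x2 table."""
    (a, b), (c, d) = table
    if b * c == 0:
        return float("inf")  # no finite ratio when the denominator is zero
    return (a * d) / (b * c)


table = contingency_table(sample_size=10, class_in_sample=6,
                          population_size=100, class_in_population=20)
odds = sample_odds_ratio(table)
```

A table like this can be handed to scipy.stats.fisher_exact with alternative='greater' for enrichment or alternative='less' for depletion.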
Source code in cell2cell/stats/enrichment.py
hypergeom_representation(sample_size, class_in_sample, population_size, class_in_population)
Performs an analysis of enrichment/depletion based on observation in a sample. It computes p-values from a hypergeometric distribution.
Parameters
sample_size : int Size of the sample obtained or number of elements obtained from the analysis.
class_in_sample : int Number of elements of a given class that are contained in the sample. This is the class to be tested.
population_size : int Size of the sampling space. That is, the total number of possible elements to be chosen when sampling.
class_in_population : int Number of elements of a given class that are contained in the population. This is the class to be tested.
Returns
p_vals : tuple A tuple containing the p-values for depletion and enrichment analysis, respectively.
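The two p-values can be understood as the lower and upper tails of a hypergeometric distribution. The sketch below computes them with exact binomial coefficients from the standard library. It assumes depletion means P(X <= class_in_sample) and enrichment means P(X >= class_in_sample); the function name is hypothetical and the library's exact tail conventions may differ.

```python
from math import comb


def hypergeom_tails(sample_size, class_in_sample, population_size, class_in_population):
    """Hypothetical sketch: (depletion, enrichment) tail probabilities."""
    total = comb(population_size, sample_size)
    others = population_size - class_in_population

    def pmf(k):
        # Probability of drawing exactly k class members in the sample.
        return comb(class_in_population, k) * comb(others, sample_size - k) / total

    k_min = max(0, sample_size - others)
    k_max = min(sample_size, class_in_population)
    depletion = sum(pmf(k) for k in range(k_min, class_in_sample + 1))
    enrichment = sum(pmf(k) for k in range(class_in_sample, k_max + 1))
    return depletion, enrichment


# All five sampled elements belong to a class covering half of a population of ten.
p_depletion, p_enrichment = hypergeom_tails(5, 5, 10, 5)
```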
Source code in cell2cell/stats/enrichment.py
gini
gini_coefficient(distribution)
Computes the Gini coefficient of an array of values. Code borrowed from: https://stackoverflow.com/questions/39512260/calculating-gini-coefficient-in-python-numpy
Parameters
distribution : array-like An array of values representing the distribution to be evaluated.
Returns
gini : float Gini coefficient for the evaluated distribution.
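For orientation, the Gini coefficient of non-negative values can be computed with the standard closed form on sorted data, G = sum((2i - n - 1) * x_i) / (n * sum(x)). The sketch below uses that textbook formula in plain Python. It is not the borrowed snippet referenced above, and the name `gini` is chosen here only for illustration.

```python
def gini(values):
    """Gini coefficient of non-negative values via the sorted closed-form expression."""
    xs = sorted(float(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


equal = gini([1, 1, 1, 1])       # perfectly even distribution
unequal = gini([0, 0, 0, 1])     # everything concentrated in one element
```

A value of 0 indicates a perfectly even distribution, while values approaching 1 indicate concentration in a few elements.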
Source code in cell2cell/stats/gini.py
multitest
compute_fdrcorrection_asymmetric_matrix(X, alpha=0.1)
Computes an FDR correction (Benjamini-Hochberg procedure) on an asymmetric matrix of p-values. Here, the correction is performed for every value in X.
Parameters
X : pandas.DataFrame An asymmetric dataframe of P-values.
alpha : float, default=0.1 Error rate of the FDR correction. Must be 0 < alpha < 1.
Returns
adj_X : pandas.DataFrame An asymmetric dataframe with adjusted P-values of X.
Source code in cell2cell/stats/multitest.py
compute_fdrcorrection_symmetric_matrix(X, alpha=0.1)
Computes an FDR correction (Benjamini-Hochberg procedure) on a symmetric matrix of p-values. Here, only the diagonal and values on the upper triangle are considered to avoid repetition with the lower triangle.
Parameters
X : pandas.DataFrame A symmetric dataframe of P-values.
alpha : float, default=0.1 Error rate of the FDR correction. Must be 0 < alpha < 1.
Returns
adj_X : pandas.DataFrame A symmetric dataframe with adjusted P-values of X.
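The idea of correcting only the diagonal and upper triangle can be sketched in plain Python as follows. This is a minimal illustration using nested lists instead of a pandas.DataFrame. Both function names are hypothetical, and the Benjamini-Hochberg step is a textbook version rather than the library's own routine.

```python
def benjamini_hochberg(pvalues):
    """Textbook Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def fdr_symmetric(matrix):
    """Adjust the diagonal and upper triangle once, then mirror into the lower triangle."""
    size = len(matrix)
    upper = [(i, j) for i in range(size) for j in range(i, size)]
    adjusted = benjamini_hochberg([matrix[i][j] for i, j in upper])
    result = [[0.0] * size for _ in range(size)]
    for (i, j), value in zip(upper, adjusted):
        result[i][j] = value
        result[j][i] = value
    return result


pvals = [[0.01, 0.04],
         [0.04, 0.03]]
adj = fdr_symmetric(pvals)
```

Counting each symmetric pair once keeps the number of tests, and therefore the size of the correction, from being inflated by duplicates. Adjusted values can then be compared against alpha.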
Source code in cell2cell/stats/multitest.py
permutation
compute_pvalue_from_dist(obs_value, dist, consider_size=False, comparison='upper')
Computes the probability of observing a value in a given distribution.
Parameters
obs_value : float An observed value used to get a p-value from a distribution.
dist : array-like A simulated or empirical distribution of values used to compare the observed value and get a p-value.
consider_size : boolean, default=False Whether to take the size of the distribution into account, so that small probabilities are never lower than the reciprocal of that size.
comparison : str, default='upper' Type of hypothesis testing:
- 'lower' : Lower-tailed, whether the value is smaller than most
of the values in the distribution.
- 'upper' : Upper-tailed, whether the value is greater than most
of the values in the distribution.
- 'different' : Two-tailed, whether the value is different than
most of the values in the distribution.
Returns
pval : float P-value obtained from comparing the observed value and values in the distribution.
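The three comparison modes and the consider_size floor can be illustrated with a small empirical p-value routine. The sketch below is an assumption-laden stand-in, not the library's code: the function name is hypothetical, ties count toward the tail, and the two-tailed case doubles the smaller tail.

```python
def empirical_pvalue(obs_value, dist, consider_size=False, comparison="upper"):
    """Hypothetical sketch of an empirical p-value from a null distribution."""
    n = len(dist)
    if n == 0:
        raise ValueError("dist must not be empty")
    frac_ge = sum(v >= obs_value for v in dist) / n  # upper tail, ties included
    frac_le = sum(v <= obs_value for v in dist) / n  # lower tail, ties included
    if comparison == "upper":
        pval = frac_ge
    elif comparison == "lower":
        pval = frac_le
    elif comparison == "different":
        pval = min(1.0, 2 * min(frac_ge, frac_le))  # assumption: doubled smaller tail
    else:
        raise ValueError("comparison must be 'lower', 'upper' or 'different'")
    if consider_size:
        pval = max(pval, 1.0 / n)  # never report less than 1 / len(dist)
    return pval


null_dist = list(range(1, 101))  # toy null distribution: 1, 2, ..., 100
p_upper = empirical_pvalue(96, null_dist)
p_floor = empirical_pvalue(1000, null_dist, consider_size=True)
```

The floor matters when the observed value exceeds every permuted value: without it the estimate is 0, which overstates what a finite number of permutations can support.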
Source code in cell2cell/stats/permutation.py
pvalue_from_dist(obs_value, dist, label='', consider_size=False, comparison='upper')
Computes a p-value for an observed value given a simulated or empirical distribution. It plots the distribution and prints the p-value.
Parameters
obs_value : float An observed value used to get a p-value from a distribution.
dist : array-like A simulated or empirical distribution of values used to compare the observed value and get a p-value.
label : str, default='' Label used for the histogram plot. Useful for identifying it across multiple plots.
consider_size : boolean, default=False Whether to take the size of the distribution into account, so that small probabilities are never lower than the reciprocal of that size.
comparison : str, default='upper' Type of hypothesis testing:
- 'lower' : Lower-tailed, whether the value is smaller than most
of the values in the distribution.
- 'upper' : Upper-tailed, whether the value is greater than most
of the values in the distribution.
- 'different' : Two-tailed, whether the value is different than
most of the values in the distribution.
Returns
fig : matplotlib.figure.Figure Figure that shows the histogram for dist.
pval : float P-value obtained from comparing the observed value and values in the distribution.
Source code in cell2cell/stats/permutation.py
random_switching_ppi_labels(ppi_data, genes=None, random_state=None, interaction_columns=('A', 'B'), permuted_column='both')
Randomly permutes the labels of interacting proteins in a list of protein-protein interactions.
Parameters
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
genes : list, default=None List of genes, with names matching proteins in the PPIs, to exclusively consider in the analysis.
random_state : int, default=None Seed for randomization.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
permuted_column : str, default='both' Column among the interaction_columns to permute. Options are:
- 'first' : To permute labels considering only proteins in the first
column.
- 'second' : To permute labels considering only proteins in the second
column.
- 'both' : To permute labels considering all the proteins in the list.
Returns
ppi_data_ : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) with randomly permuted labels of proteins.
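Label switching can be pictured as a random one-to-one relabeling of the proteins, which keeps the shape of the interaction list while scrambling which protein sits where. The sketch below works on a plain list of pairs rather than a pandas.DataFrame. The function name and the one-to-one relabeling strategy are assumptions and may not match the library's procedure.

```python
import random


def shuffle_ppi_labels(pairs, permuted_column="both", random_state=None):
    """Hypothetical sketch: relabel interaction partners with a random one-to-one mapping."""
    rng = random.Random(random_state)
    if permuted_column == "first":
        labels = sorted({a for a, _ in pairs})
    elif permuted_column == "second":
        labels = sorted({b for _, b in pairs})
    elif permuted_column == "both":
        labels = sorted({p for pair in pairs for p in pair})
    else:
        raise ValueError("permuted_column must be 'first', 'second' or 'both'")

    shuffled = labels[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(labels, shuffled))  # old label -> new label

    if permuted_column == "first":
        return [(relabel[a], b) for a, b in pairs]
    if permuted_column == "second":
        return [(a, relabel[b]) for a, b in pairs]
    return [(relabel[a], relabel[b]) for a, b in pairs]


# Toy ligand-receptor pairs (first column: ligands, second column: receptors).
pairs = [("L1", "R1"), ("L2", "R2"), ("L3", "R1")]
permuted = shuffle_ppi_labels(pairs, permuted_column="first", random_state=0)
```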
Source code in cell2cell/stats/permutation.py
run_label_permutation(rnaseq_data, ppi_data, genes, analysis_setup, cutoff_setup, permutations=10000, permuted_label='gene_labels', excluded_cells=None, consider_size=True, verbose=False)
Permutes a label before computing cell-cell interaction scores.
Parameters
rnaseq_data : pandas.DataFrame Gene expression data for a bulk RNA-seq experiment or a single-cell experiment after aggregation into cell types. Columns are cell-types/tissues/samples and rows are genes.
ppi_data : pandas.DataFrame List of protein-protein interactions (or ligand-receptor pairs) used for inferring the cell-cell interactions and communication.
genes : list List of genes in rnaseq_data to exclusively consider in the analysis.
analysis_setup : dict Contains main setup for running the cell-cell interactions and communication analyses. Three main setups are needed (passed as keys):
- 'communication_score' : is the type of communication score used to detect
active ligand-receptor pairs between each pair of cells.
It can be:
- 'expression_thresholding'
- 'expression_product'
- 'expression_mean'
- 'cci_score' : is the scoring function to aggregate the communication
scores.
It can be:
- 'bray_curtis'
- 'jaccard'
- 'count'
- 'cci_type' : is the type of interaction between two cells. If it is
undirected, all ligands and receptors are considered from both cells.
If it is directed, ligands from one cell and receptors from the other
are considered separately with respect to ligands from the second
cell and receptors from the first one.
So, it can be:
- 'undirected'
- 'directed'
cutoff_setup : dict Contains two keys: 'type' and 'parameter'. The first key represents the way to use a cutoff or threshold, while 'parameter' is the value used to binarize the expression values. The key 'type' can be:
- 'local_percentile' : computes the value of a given percentile, for each
gene independently. In this case, the parameter corresponds to the
percentile to compute, as a float value between 0 and 1.
- 'global_percentile' : computes the value of a given percentile from all
genes and samples simultaneously. In this case, the parameter
corresponds to the percentile to compute, as a float value between
0 and 1. All genes have the same cutoff.
- 'file' : load a cutoff table from a file. Parameter in this case is the
path of that file. It must contain the same genes as index and same
samples as columns.
- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in each sample. This allows using specific cutoffs for
each sample. The columns here must be the same as the ones in the
rnaseq_data.
- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
for each gene in only one column. These cutoffs will be applied to
all samples.
- 'constant_value' : binarizes the expression. Evaluates whether
expression is greater than the value input in the parameter.
permutations : int, default=10000 Number of permutations. In each one, labels are randomly shuffled and CCI scores are then computed to build a null distribution.
permuted_label : str, default='gene_labels' Label to be permuted. Types are:
- 'genes' : Permutes cell-labels in a gene-specific way.
- 'gene_labels' : Permutes the labels of genes in the RNA-seq dataset.
- 'cell_labels' : Permutes the labels of cell-types/tissues/samples
in the RNA-seq dataset.
excluded_cells : list, default=None List of cells to exclude from the analysis.
consider_size : boolean, default=True Whether to take the size of the distribution into account, so that small probabilities are never lower than the reciprocal of that size.
verbose : boolean, default=False Whether printing or not steps of the analysis.
Returns
cci_pvals : pandas.DataFrame Matrix where rows and columns are cell-types/tissues/samples and each value is a P-value for the corresponding CCI score.
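As a concrete illustration of the two setup dictionaries described above, the sketch below writes one possible configuration and checks it against the options listed in this section. The percentile value 0.75 is an arbitrary example, and `check_setups` is a hypothetical helper that is not part of the documented API.

```python
# Options copied from the parameter descriptions above.
ANALYSIS_OPTIONS = {
    "communication_score": {"expression_thresholding", "expression_product", "expression_mean"},
    "cci_score": {"bray_curtis", "jaccard", "count"},
    "cci_type": {"undirected", "directed"},
}
CUTOFF_TYPES = {
    "local_percentile", "global_percentile", "file",
    "multi_col_matrix", "single_col_matrix", "constant_value",
}


def check_setups(analysis_setup, cutoff_setup):
    """Hypothetical validator: raise ValueError on an option not listed in the docs."""
    for key, options in ANALYSIS_OPTIONS.items():
        if analysis_setup.get(key) not in options:
            raise ValueError(f"{key!r} must be one of {sorted(options)}")
    if cutoff_setup.get("type") not in CUTOFF_TYPES:
        raise ValueError(f"'type' must be one of {sorted(CUTOFF_TYPES)}")
    if "parameter" not in cutoff_setup:
        raise ValueError("cutoff_setup needs a 'parameter' entry")
    return True


analysis_setup = {
    "communication_score": "expression_thresholding",
    "cci_score": "bray_curtis",
    "cci_type": "undirected",
}
cutoff_setup = {
    "type": "local_percentile",
    "parameter": 0.75,  # illustrative percentile, given as a float between 0 and 1
}
is_valid = check_setups(analysis_setup, cutoff_setup)
```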
Source code in cell2cell/stats/permutation.py
tensor
coupled_factorization
coupled_non_negative_parafac(tensor1, tensor2, rank, mode_mapping, mask1=None, mask2=None, n_iter_max=100, init='svd', svd='truncated_svd', tol=1e-06, random_state=None, verbose=0, normalize_factors=False, return_errors=False, cvg_criterion='abs_rec_error', balance_errors=True, manual_weights=(0.5, 0.5), separate_weights=True)
Performs coupled non-negative CP decomposition on two tensors with flexible mode mapping.
Parameters
tensor1 : tensorly.tensor First tensor to factorize.
tensor2 : tensorly.tensor Second tensor to factorize.
rank : int Number of components for the factorization.
mode_mapping : dict or int/list (for backward compatibility) Mode mapping specification. Can be:
- dict : {'shared': [(t1_mode, t2_mode), ...]}, listing the pairs of shared modes.
- int/list : non_shared_modes (for backward compatibility with same-dimension tensors).
mask1 : tensorly.tensor, default=None Mask for the first tensor.
mask2 : tensorly.tensor, default=None Mask for the second tensor.
n_iter_max : int, default=100 Maximum number of iterations.
init : str, default='svd' Initialization method. Options are {'svd', 'random'}.
svd : str, default='truncated_svd' SVD function to use.
tol : float, default=1e-06 Convergence tolerance.
random_state : int, default=None Random state for reproducibility.
verbose : bool, default=False Whether to print progress.
normalize_factors : bool, default=False Whether to normalize factors.
return_errors : bool, default=False Whether to return reconstruction errors.
cvg_criterion : str, default='abs_rec_error' Convergence criterion. Options are {'abs_rec_error', 'rec_error'}.
balance_errors : bool, default=True Whether to balance errors based on tensor sizes.
manual_weights : tuple, default=(0.5, 0.5) Manual weights (weight1, weight2) for importance of tensors in the factorization. Weights should be positive. Example: (2.0, 1.0) gives tensor1 twice the importance of tensor2 in both the factorization and the combined error metric. If None, automatic weight calculation is performed to have weight1 and weight2 inversely proportional to non-shared mode dimensions of each tensor.
separate_weights : bool, default=True Whether to use separate weights for each tensor during optimization.
Returns
cp_tensor1 : CPTensor CP decomposition result for tensor1.
cp_tensor2 : CPTensor CP decomposition result for tensor2.
errors : tuple, optional
Reconstruction errors for both tensors, if return_errors is True.
Examples
Two tensors with different dimensions but shared modes 0,1
tensor1 = tl.random.random((10, 20, 30, 40))
tensor2 = tl.random.random((10, 20, 50))
mode_mapping = {'shared': [(0, 0), (1, 1)]}
cp1, cp2 = coupled_non_negative_parafac(tensor1, tensor2, rank=5,
                                        mode_mapping=mode_mapping)
Using manual weights to prioritize tensor1
cp1, cp2 = coupled_non_negative_parafac(tensor1, tensor2, rank=5,
                                        mode_mapping=mode_mapping,
                                        manual_weights=(2.0, 1.0))
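To show how a mode_mapping dictionary separates shared from tensor-specific modes, and what weights "inversely proportional to non-shared mode dimensions" might look like, here is a small sketch using the shapes from the example above. Both helpers are hypothetical. Taking the product of the non-shared dimensions as the size, and normalizing the weights to sum to 1, are assumptions rather than documented behavior.

```python
from math import prod


def split_modes(shape1, shape2, mode_mapping):
    """Hypothetical helper: indices of the tensor-specific (non-shared) modes."""
    shared = mode_mapping["shared"]
    for m1, m2 in shared:
        if shape1[m1] != shape2[m2]:
            raise ValueError(f"shared modes {m1} and {m2} differ in size")
    shared1 = {m1 for m1, _ in shared}
    shared2 = {m2 for _, m2 in shared}
    specific1 = [m for m in range(len(shape1)) if m not in shared1]
    specific2 = [m for m in range(len(shape2)) if m not in shared2]
    return specific1, specific2


def inverse_size_weights(shape1, shape2, mode_mapping):
    """Hypothetical weights, inversely proportional to the non-shared sizes (sum to 1)."""
    specific1, specific2 = split_modes(shape1, shape2, mode_mapping)
    size1 = prod(shape1[m] for m in specific1)
    size2 = prod(shape2[m] for m in specific2)
    raw1, raw2 = 1.0 / size1, 1.0 / size2
    total = raw1 + raw2
    return raw1 / total, raw2 / total


mode_mapping = {"shared": [(0, 0), (1, 1)]}
specific = split_modes((10, 20, 30, 40), (10, 20, 50), mode_mapping)
weights = inverse_size_weights((10, 20, 30, 40), (10, 20, 50), mode_mapping)
```

Under these assumptions the larger tensor receives the smaller weight, which keeps it from dominating the combined error.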
Source code in cell2cell/tensor/coupled_factorization.py
coupled_tensor
CoupledInteractionTensor
Coupled Tensor Factorization for two interaction tensors with flexible mode mapping.
This class performs simultaneous non-negative CP decomposition on two tensors that can have different numbers of dimensions but share some modes. The mode mapping explicitly specifies which dimensions are shared and which are tensor-specific.
Parameters
tensor1 : cell2cell.tensor.BaseTensor First interaction tensor (e.g., InteractionTensor, PreBuiltTensor).
tensor2 : cell2cell.tensor.BaseTensor Second interaction tensor (e.g., InteractionTensor, PreBuiltTensor).
mode_mapping : dict or int/list (for backward compatibility) Mode mapping specification. Can be:
- dict : {'shared': [(t1_mode, t2_mode), ...]}, listing the pairs of shared modes.
- int/list : non_shared_modes (for backward compatibility with same-dimension tensors).
tensor1_name : str, default='Tensor1' Name for the first tensor (used in factor labeling).
tensor2_name : str, default='Tensor2' Name for the second tensor (used in factor labeling).
auto_sort_shared : bool, default=True Whether to automatically sort the shared modes in both tensors if the order of elements in any shared mode does not match. The elements in tensor2 are reordered to match the order in tensor1.
balance_errors : bool, default=True Whether to balance the errors based on tensor-specific dimensions.
device : str, default=None Device to use when backend allows using multiple devices.
Attributes
tensor1 : tensorly.tensor First tensor object.
tensor2 : tensorly.tensor Second tensor object.
mode_mapping : dict The mode mapping specification.
cp1 : CPTensor CP decomposition result for tensor1.
cp2 : CPTensor CP decomposition result for tensor2.
factors1 : dict Factor loadings for tensor1.
factors2 : dict Factor loadings for tensor2.
factors : dict Combined factor loadings with shared and tensor-specific factors.
factorization_errors1_ : list List of reconstruction errors for tensor1 at each iteration of the coupled tensor factorization. Only available after running compute_tensor_factorization.
factorization_errors2_ : list List of reconstruction errors for tensor2 at each iteration of the coupled tensor factorization. Only available after running compute_tensor_factorization.
combined_errors_ : list List of combined weighted reconstruction errors at each iteration of the coupled tensor factorization. The weighting follows the balance_errors parameter. Only available after running compute_tensor_factorization.
manual_weights : tuple Manual weights (weight1, weight2) for importance of tensors in the factorization. Weights should be positive. Example: (2.0, 1.0) gives tensor1 twice the importance of tensor2 in both the factorization and the combined error metric.
Source code in cell2cell/tensor/coupled_tensor.py
shape
property
Return shapes of both tensors
compute_tensor_factorization(rank, tf_type='coupled_non_negative_cp', init='svd', svd='truncated_svd', random_state=None, runs=1, normalize_loadings=True, var_ordered_factors=True, n_iter_max=100, tol=1e-06, balance_errors=None, manual_weights=(0.5, 0.5), verbose=False, **kwargs)
Performs coupled tensor factorization on both tensors.
Parameters
rank : int Number of components for the factorization.
tf_type : str, default='coupled_non_negative_cp' Type of Tensor Factorization.
init : str, default='svd' Initialization method. Options are {'svd', 'random'}.
svd : str, default='truncated_svd' SVD function to use.
random_state : int, default=None Random state for reproducibility.
runs : int, default=1 Number of models to choose among and find the lowest error.
normalize_loadings : boolean, default=True Whether normalizing the loadings in each factor.
var_ordered_factors : boolean, default=True Whether ordering factors by variance explained.
n_iter_max : int, default=100 Maximum number of iterations.
tol : float, default=1e-06 Convergence tolerance.
balance_errors : bool, default=None Whether to balance the errors based on tensor-specific dimensions. If None, the value set when the CoupledInteractionTensor was initialized is used.
manual_weights : tuple, default=(0.5, 0.5) Manual weights (weight1, weight2) for importance of tensors in the factorization. Weights should be positive. Example: (2.0, 1.0) gives tensor1 twice the importance of tensor2 in both the factorization and the combined error metric. If None, automatic weight calculation is performed to have weight1 and weight2 inversely proportional to non-shared mode dimensions of each tensor.
verbose : boolean, default=False Whether printing or not steps of the analysis.
**kwargs : dict Extra arguments for the tensor factorization.
Source code in cell2cell/tensor/coupled_tensor.py
copy()
elbow_rank_selection(upper_rank=50, runs=20, tf_type='coupled_non_negative_cp', init='random', svd='truncated_svd', metric='error', random_state=None, n_iter_max=100, tol=1e-06, automatic_elbow=True, manual_elbow=None, smooth=False, mask1=None, mask2=None, balance_errors=None, manual_weights=(0.5, 0.5), ci='std', figsize=(4, 2.25), fontsize=14, filename=None, output_fig=True, show_individual=False, verbose=False, **kwargs)
Elbow analysis on the error/similarity achieved by the Coupled Tensor Factorization.
Parameters
upper_rank : int, default=50 Upper bound of ranks to explore with the elbow analysis.
runs : int, default=20 Number of tensor factorization performed for a given rank.
tf_type : str, default='coupled_non_negative_cp' Type of Tensor Factorization.
init : str, default='random' Initialization method. {'svd', 'random'}
svd : str, default='truncated_svd' Function to compute the SVD.
metric : str, default='error' Metric to perform the elbow analysis.
- 'error' : Normalized error to compute the elbow.
- 'similarity' : Similarity based on CorrIndex (1-CorrIndex).
random_state : int, default=None Seed for randomization.
n_iter_max : int, default=100 Maximum number of iterations.
tol : float, default=1e-06 Convergence tolerance.
automatic_elbow : boolean, default=True Whether using an automatic strategy to find the elbow.
manual_elbow : int, default=None
Rank to highlight. Considered only when automatic_elbow=False.
smooth : boolean, default=False Whether smoothing the curve.
mask1 : tensorly.tensor, default=None Mask for the first tensor.
mask2 : tensorly.tensor, default=None Mask for the second tensor.
balance_errors : bool, default=None Whether to balance the errors based on tensor-specific dimensions. If None, the value set when the CoupledInteractionTensor was initialized is used.
manual_weights : tuple, default=(0.5, 0.5) Manual weights (weight1, weight2) for importance of tensors in the factorization. Weights should be positive. Example: (2.0, 1.0) gives tensor1 twice the importance of tensor2 in both the factorization and the combined error metric. If None, automatic weight calculation is performed to have weight1 and weight2 inversely proportional to non-shared mode dimensions of each tensor.
ci : str, default='std' Confidence interval. {'std', '95%'}
figsize : tuple, default=(4, 2.25) Figure size.
fontsize : int, default=14 Font size for axis labels.
filename : str, default=None Path to save the figure.
output_fig : boolean, default=True Whether generating the figure.
show_individual : boolean, default=False Whether to show individual tensor metrics alongside the combined metric. Applies to both 'error' and 'similarity' metrics when runs > 1. If True, plots will show tensor1, tensor2, and combined metrics. If False, only shows the combined metric.
verbose : boolean, default=False Whether printing or not steps of the analysis.
**kwargs : dict Extra arguments for the tensor factorization.
Returns
fig : matplotlib.figure.Figure Figure object made with matplotlib
loss : dict Dictionary with 'tensor1', 'tensor2', and 'combined' keys, each containing a list of (rank, value) tuples for the respective metric.
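Given a list of (rank, value) tuples like the ones in the returned dictionary, an elbow can be located with a simple geometric heuristic: normalize both axes and pick the point farthest from the straight line joining the first and last points. The sketch below implements that heuristic with a hypothetical function name. It is one common approach and not necessarily the strategy used when automatic_elbow=True.

```python
from math import hypot


def elbow_by_distance(curve):
    """Hypothetical sketch: rank whose point lies farthest from the chord of the curve."""
    if len(curve) < 3:
        raise ValueError("need at least three (rank, value) points")
    ranks = [r for r, _ in curve]
    values = [v for _, v in curve]
    # Rescale both axes to [0, 1] so the result does not depend on the units.
    span_r = (max(ranks) - min(ranks)) or 1.0
    span_v = (max(values) - min(values)) or 1.0
    pts = [((r - min(ranks)) / span_r, (v - min(values)) / span_v) for r, v in curve]

    (x0, y0), (x1, y1) = pts[0], pts[-1]
    length = hypot(x1 - x0, y1 - y0)
    # Perpendicular distance of every point to the line through the two endpoints.
    distances = [
        abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / length
        for x, y in pts
    ]
    return ranks[distances.index(max(distances))]


# Toy error curve, sorted by rank, in the (rank, value) format described above.
combined = [(1, 1.0), (2, 0.5), (3, 0.3), (4, 0.25), (5, 0.22), (6, 0.2)]
elbow_rank = elbow_by_distance(combined)
```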
Source code in cell2cell/tensor/coupled_tensor.py
excluded_value_fraction(tensor='both')
Returns the fraction of excluded values in the tensor(s)
Source code in cell2cell/tensor/coupled_tensor.py
explained_variance()
Calculate explained variance for coupled factorization
Source code in cell2cell/tensor/coupled_tensor.py
export_factor_loadings(filename, save_separate=False)
Export factor loadings to Excel file
Source code in cell2cell/tensor/coupled_tensor.py
get_factorization_errors(plot=False, tensor1_name=None, tensor2_name=None, figsize=(10, 5), fontsize=12, show_individual=True, filename=None)
Retrieves the factorization errors across iterations for coupled tensor factorization.
Source code in cell2cell/tensor/coupled_tensor.py
get_top_factor_elements(order_name, factor_name, top_number=10, tensor='unified')
Get top elements for a given factor
Source code in cell2cell/tensor/coupled_tensor.py
missing_fraction(tensor='both')
Returns the fraction of values that are missing (NaNs) in the tensor(s)
Source code in cell2cell/tensor/coupled_tensor.py
reorder_metadata(metadata1, metadata2)
Reorder metadata to match the factor ordering used by the coupled tensor.
Parameters
metadata1 : list List of DataFrames/metadata for tensor1 factors, in original tensor1 mode order
metadata2 : list List of DataFrames/metadata for tensor2 factors, in original tensor2 mode order
Returns
reordered_metadata : list Metadata reordered to match self.factors ordering: [shared_modes, tensor1_specific_modes, tensor2_specific_modes]
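The target ordering [shared_modes, tensor1_specific_modes, tensor2_specific_modes] can be illustrated with plain lists. The standalone function below is hypothetical: it takes mode_mapping explicitly, whereas the documented method takes only the two metadata lists, and it assumes that metadata for shared modes is taken from tensor1.

```python
def reorder_for_coupled(metadata1, metadata2, mode_mapping):
    """Hypothetical sketch: order metadata as shared, tensor1-specific, tensor2-specific."""
    shared = mode_mapping["shared"]
    shared1 = [m1 for m1, _ in shared]
    shared2 = {m2 for _, m2 in shared}

    # Shared modes first (assumption: their metadata comes from tensor1).
    reordered = [metadata1[m] for m in shared1]
    # Then modes found only in tensor1, followed by modes found only in tensor2.
    reordered += [md for m, md in enumerate(metadata1) if m not in shared1]
    reordered += [md for m, md in enumerate(metadata2) if m not in shared2]
    return reordered


# Strings stand in for the per-mode metadata DataFrames.
metadata1 = ["contexts_1", "lr_pairs_1", "senders_1", "receivers_1"]
metadata2 = ["contexts_2", "lr_pairs_2", "features_2"]
ordered = reorder_for_coupled(metadata1, metadata2, {"shared": [(0, 0), (1, 1)]})
```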
Source code in cell2cell/tensor/coupled_tensor.py
sparsity_fraction(tensor='both')
Returns the fraction of values that are zeros in the tensor(s)
Source code in cell2cell/tensor/coupled_tensor.py
to_device(device)
Move tensors to specified device
Parameters
device : str Device name to use for the decomposition. Options could be 'cpu', 'cuda', 'gpu', depending on the backend used with tensorly.
Source code in cell2cell/tensor/coupled_tensor.py
write_file(filename)
external_scores
dataframes_to_tensor(context_df_dict, sender_col, receiver_col, ligand_col, receptor_col, score_col, how='inner', outer_fraction=0.0, lr_fill=np.nan, cell_fill=np.nan, lr_sep='^', dup_aggregation='max', context_order=None, order_labels=None, sort_elements=True, device=None)
Generates an InteractionTensor from a dictionary containing dataframes for all contexts.
Parameters
context_df_dict : dict Dictionary containing a dataframe for each context. The dataframe must contain columns containing sender cells, receiver cells, ligands, receptors, and communication scores, separately. Keys are context names and values are dataframes.
sender_col : str Name of the column containing the sender cells in all context dataframes.
receiver_col : str Name of the column containing the receiver cells in all context dataframes.
ligand_col : str Name of the column containing the ligands in all context dataframes.
receptor_col : str Name of the column containing the receptors in all context dataframes.
score_col : str Name of the column containing the communication scores in all context dataframes.
how : str, default='inner' Approach to consider cell types and genes present across multiple contexts.
- 'inner' : Considers only cell types and LR pairs that are present in all
contexts (intersection).
- 'outer' : Considers all cell types and LR pairs that are present
across contexts (union).
- 'outer_lrs' : Considers only cell types that are present in all
contexts (intersection), while all LR pairs that are
present across contexts (union).
- 'outer_cells' : Considers only LR pairs that are present in all
contexts (intersection), while all cell types that are
present across contexts (union).
outer_fraction : float, default=0.0
Threshold to filter the elements when how includes any outer option.
Elements with a fraction abundance across contexts (in context_df_dict)
at least this threshold will be included. When this value is 0, considers
all elements across the samples. When this value is 1, it acts as using
how='inner'.
lr_fill : float, default=numpy.nan Value to fill communication scores when a ligand-receptor pair is not present across all contexts.
cell_fill : float, default=numpy.nan Value to fill communication scores when a cell is not present across all ligand-receptor pairs or all contexts.
lr_sep : str, default='^' Separation character to join ligands and receptors into a LR pair name.
dup_aggregation : str, default='max' Approach to aggregate communication score if there are multiple instances of an LR pair for a specific sender-receiver pair in one of the dataframes.
- 'max' : Maximum of the multiple instances
- 'min' : Minimum of the multiple instances
- 'mean' : Average of the multiple instances
- 'median' : Median of the multiple instances
context_order : list, default=None List used to sort the contexts when building the tensor. Elements must be all elements in context_df_dict.keys().
order_labels : list, default=None List containing the labels for each order or dimension of the tensor. For example: ['Contexts', 'Ligand-Receptor Pairs', 'Sender Cells', 'Receiver Cells']
sort_elements : boolean, default=True Whether alphabetically sorting elements in the InteractionTensor. The Context Dimension is not sorted if a 'context_order' list is provided.
device : str, default=None Device to use when backend is pytorch. Options are:
Returns
interaction_tensor : cell2cell.tensor.PreBuiltTensor A communication tensor generated for the Tensor-cell2cell pipeline.
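The lr_sep, dup_aggregation, how, and outer_fraction options above can be illustrated with a small sketch. The helper names (aggregate_duplicates, select_elements) and the toy data are assumptions made for this example; they are not part of cell2cell, and the internals of dataframes_to_tensor may differ.

```python
from collections import Counter

import pandas as pd


def aggregate_duplicates(df, sender_col, receiver_col, ligand_col,
                         receptor_col, score_col, lr_sep="^",
                         dup_aggregation="max"):
    """Hypothetical helper: one score per (sender, receiver, LR pair)."""
    out = df.copy()
    # Join ligand and receptor names into a single LR pair label.
    out["lr_pair"] = out[ligand_col] + lr_sep + out[receptor_col]
    grouped = out.groupby([sender_col, receiver_col, "lr_pair"])[score_col]
    # dup_aggregation is one of 'max', 'min', 'mean', 'median'.
    return grouped.agg(dup_aggregation).reset_index()


def select_elements(elements_per_context, outer=False, outer_fraction=0.0):
    """Hypothetical helper: keep elements seen in enough contexts."""
    n_contexts = len(elements_per_context)
    counts = Counter(e for elems in elements_per_context for e in set(elems))
    # 'inner' behaves like an outer option with a threshold of 1.
    threshold = outer_fraction if outer else 1.0
    return sorted(e for e, c in counts.items() if c / n_contexts >= threshold)


context_1 = pd.DataFrame({
    "sender": ["T", "T", "B"],
    "receiver": ["B", "B", "T"],
    "ligand": ["L1", "L1", "L2"],
    "receptor": ["R1", "R1", "R2"],
    "score": [0.2, 0.8, 0.5],
})
agg = aggregate_duplicates(context_1, "sender", "receiver",
                           "ligand", "receptor", "score")

lr_per_context = [["L1^R1", "L2^R2"], ["L1^R1"], ["L1^R1", "L3^R3"]]
inner_lrs = select_elements(lr_per_context)
outer_lrs = select_elements(lr_per_context, outer=True)
```

In this toy case the duplicated T-to-B row for L1^R1 collapses to a single score, and only L1^R1 survives the intersection across the three contexts.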
Source code in cell2cell/tensor/external_scores.py
factor_manipulation
normalize_factors(factors)
L2-normalizes the factors considering all tensor dimensions from a tensor decomposition result
Parameters
factors : dict
Ordered dictionary containing a dataframe with the factor loadings for each
dimension/order of the tensor. This is the result from a tensor decomposition,
it can be found as the attribute factors in any tensor class derived from the
class BaseTensor (e.g. BaseTensor.factors).
Returns
norm_factors : dict The normalized factors.
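A minimal sketch of L2 normalization of factor loadings is shown below. It assumes each factor column is scaled independently within each dimension; the helper name l2_normalize_factors is hypothetical, and normalize_factors itself may handle the norms across dimensions differently.

```python
import numpy as np
import pandas as pd


def l2_normalize_factors(factors):
    """Hypothetical helper: scale every factor column to unit L2 norm."""
    norm_factors = {}
    for order_name, loadings in factors.items():
        norms = np.linalg.norm(loadings.values, axis=0)
        norms[norms == 0] = 1.0  # leave all-zero columns untouched
        norm_factors[order_name] = loadings / norms
    return norm_factors


factors = {
    "Contexts": pd.DataFrame(
        {"Factor 1": [3.0, 4.0], "Factor 2": [0.0, 2.0]}, index=["C1", "C2"]
    ),
    "Sender Cells": pd.DataFrame(
        {"Factor 1": [1.0, 1.0], "Factor 2": [5.0, 0.0]}, index=["T", "B"]
    ),
}
norm_factors = l2_normalize_factors(factors)
```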
Source code in cell2cell/tensor/factor_manipulation.py
shuffle_factors(factors, axis=0)
factorization
normalized_error(reference_tensor, reconstructed_tensor)
Computes a normalized error between two tensors
Parameters
reference_tensor : ndarray list A tensor that could be a list of lists, a multidimensional numpy array or a tensorly.tensor. This tensor is the input of a tensor decomposition and used as reference in the normalized error for a new tensor reconstructed from the factors of the tensor decomposition.
reconstructed_tensor : ndarray list A tensor that could be a list of lists, a multidimensional numpy array or a tensorly.tensor. This tensor is an approximation of the reference_tensor by using the resulting factors of a tensor decomposition to compute it.
Returns
norm_error : float The normalized error between a reference tensor and a reconstructed tensor. The error is normalized by dividing by the Frobenius norm of the reference tensor.
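The sketch below shows one way such a normalized error could be computed with NumPy. It assumes the numerator is the Frobenius norm of the difference between both tensors; the function name normalized_error_sketch is hypothetical and is not the library implementation.

```python
import numpy as np


def normalized_error_sketch(reference_tensor, reconstructed_tensor):
    """Assumed form: ||reference - reconstructed||_F / ||reference||_F."""
    reference = np.asarray(reference_tensor, dtype=float)
    reconstructed = np.asarray(reconstructed_tensor, dtype=float)
    # With no `ord` and no `axis`, np.linalg.norm flattens the array and
    # returns its 2-norm, which equals the Frobenius norm of the tensor.
    difference_norm = np.linalg.norm(reference - reconstructed)
    return difference_norm / np.linalg.norm(reference)


reference = np.ones((2, 3, 4))
perfect = normalized_error_sketch(reference, reference)
worst = normalized_error_sketch(reference, np.zeros((2, 3, 4)))
```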
Source code in cell2cell/tensor/factorization.py
metrics
correlation_index(factors_1, factors_2, tol=5e-16, method='stacked')
CorrIndex implementation to assess tensor decomposition outputs. From [1] Sobhani et al 2022 (https://doi.org/10.1016/j.sigpro.2022.108457). Metric is scaling and column-permutation invariant, wherein each column is a factor.
Parameters
factors_1 : dict
Ordered dictionary containing a dataframe with the factor loadings for each
dimension/order of the tensor. This is the result from a tensor decomposition,
it can be found as the attribute factors in any tensor class derived from the
class BaseTensor (e.g. BaseTensor.factors).
factors_2 : dict Similar to factors_1 but coming from another tensor decomposition of a tensor with equal shape.
tol : float, default=5e-16 Precision threshold below which to call the CorrIndex score 0.
method : str, default='stacked' Method to obtain the CorrIndex by comparing the A matrices from two decompositions. Possible options are:
- 'stacked' : The original method implemented in [1]. Here all A matrices from the same decomposition are
vertically concatenated, building a big A matrix for each decomposition.
- 'max_score' : This computes the CorrIndex for each pair of A matrices (i.e. between A_1 in factors_1 and
factors_2, between A_2 in factors_1 and factors_2, and so on). Then the max score is
selected (the most conservative approach). In other words, it selects the max score among the
CorrIndexes computed dimension-wise.
- 'min_score' : Similar to 'max_score', but the min score is selected (the least conservative approach).
- 'avg_score' : Similar to 'max_score', but the avg score is selected.
Returns
score : float CorrIndex metric [0,1]; lower score indicates higher similarity between matrices
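For orientation, here is a sketch of a CorrIndex-style score using the 'stacked' strategy, written from the description above. The helper names (stack_factors, corr_index_sketch) are hypothetical, the factor matrices are plain arrays rather than dataframes, and the scoring formula is an assumption based on the cited definition, so it may differ in detail from correlation_index.

```python
import numpy as np


def stack_factors(factors):
    """Vertically concatenate the factor matrices of one decomposition."""
    return np.concatenate(
        [np.asarray(m, dtype=float) for m in factors.values()], axis=0
    )


def corr_index_sketch(matrix_1, matrix_2, tol=5e-16):
    """Assumed CorrIndex form; columns are factors, no all-zero columns."""
    a = np.asarray(matrix_1, dtype=float)
    b = np.asarray(matrix_2, dtype=float)
    a = a / np.linalg.norm(a, axis=0)  # unit-norm columns
    b = b / np.linalg.norm(b, axis=0)
    c = np.abs(a.T @ b)  # absolute column-to-column correlations
    n = c.shape[0] + c.shape[1]
    score = (np.sum(np.abs(c.max(axis=1) - 1))
             + np.sum(np.abs(c.max(axis=0) - 1))) / n
    return 0.0 if score < tol else float(score)


factors_1 = {
    "Contexts": np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
    "Sender Cells": np.array([[2.0, 1.0], [0.0, 3.0]]),
}
# Same decomposition with columns swapped and uniformly rescaled.
factors_2 = {name: 3.0 * m[:, ::-1] for name, m in factors_1.items()}

same = corr_index_sketch(stack_factors(factors_1), stack_factors(factors_2))

other = stack_factors(factors_1).copy()
other[:, 1] = [1.0, 0.0, 0.0, 0.0, 0.0]  # replace the second factor
different = corr_index_sketch(stack_factors(factors_1), other)
```

Swapping and rescaling the columns leaves the score at (numerically) zero, while replacing one factor raises it, which matches the stated invariances.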
Source code in cell2cell/tensor/metrics.py
pairwise_correlation_index(factors, tol=5e-16, method='stacked')
Computes the CorrIndex between all pairs of factors
Parameters
factors : list
List with multiple Ordered dictionaries, each containing a dataframe with
the factor loadings for each dimension/order of the tensor. This is the
result from a tensor decomposition, it can be found as the attribute
factors in any tensor class derived from the class BaseTensor
(e.g. BaseTensor.factors).
tol : float, default=5e-16 Precision threshold below which to call the CorrIndex score 0.
method : str, default='stacked' Method to obtain the CorrIndex by comparing the A matrices from two decompositions. Possible options are:
- 'stacked' : The original method implemented in [1]. Here all A matrices from the same decomposition are
vertically concatenated, building a big A matrix for each decomposition.
- 'max_score' : This computes the CorrIndex for each pair of A matrices (i.e. between A_1 in factors_1 and
factors_2, between A_2 in factors_1 and factors_2, and so on). Then the max score is
selected (the most conservative approach). In other words, it selects the max score among the
CorrIndexes computed dimension-wise.
- 'min_score' : Similar to 'max_score', but the min score is selected (the least conservative approach).
- 'avg_score' : Similar to 'max_score', but the avg score is selected.
Returns
scores : pd.DataFrame Dataframe with CorrIndex metric for each pair of decompositions. This metric bounds are [0,1]; lower score indicates higher similarity between matrices
Source code in cell2cell/tensor/metrics.py
subset
find_element_indexes(interaction_tensor, elements, axis=0, remove_duplicates=True, keep='first', original_order=False)
Finds the location/indexes of a list of elements in one of the axes of an InteractionTensor.
Parameters
interaction_tensor : cell2cell.tensor.BaseTensor A communication tensor generated with any of the tensor classes in cell2cell.tensor
elements : list A list of names for the elements to find in one of the axes.
axis : int, default=0 An axis of the interaction_tensor, representing one of its dimensions.
remove_duplicates : boolean, default=True
Whether removing duplicated names in elements.
keep : str, default='first' Determines which duplicates (if any) to keep. Options are:
- first : Drop duplicates except for the first occurrence.
- last : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
original_order : boolean, default=False
Whether keeping the original order of the elements in
interaction_tensor.order_names[axis] or keeping the
new order as indicated in elements.
Returns
indexes : list List of indexes for the elements that were found in the indicated axis of the interaction_tensor.
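The interplay of remove_duplicates, keep, and original_order can be sketched in plain Python. The function find_indexes below is a hypothetical stand-in that works on a list of axis names instead of an InteractionTensor; it only mirrors the behavior described above.

```python
from collections import Counter


def find_indexes(axis_names, elements, remove_duplicates=True,
                 keep="first", original_order=False):
    """Hypothetical stand-in operating on a plain list of axis names."""
    elements = list(elements)
    if remove_duplicates:
        if keep == "first":
            elements = list(dict.fromkeys(elements))
        elif keep == "last":
            elements = list(dict.fromkeys(reversed(elements)))[::-1]
        else:  # keep is False: drop every name that appears more than once
            counts = Counter(elements)
            elements = [e for e in elements if counts[e] == 1]
    position = {name: i for i, name in enumerate(axis_names)}
    # Names that are absent from the axis are silently skipped.
    indexes = [position[e] for e in elements if e in position]
    if original_order:
        indexes = sorted(indexes)  # follow the order of the tensor axis
    return indexes


axis_names = ["Context 1", "Context 2", "Context 3", "Context 4"]
elements = ["Context 3", "Context 1", "Context 3", "Missing"]

as_requested = find_indexes(axis_names, elements)
as_in_tensor = find_indexes(axis_names, elements, original_order=True)
no_repeats = find_indexes(axis_names, elements, keep=False)
```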
Source code in cell2cell/tensor/subset.py
subset_metadata(tensor_metadata, interaction_tensor, sample_col='Element')
Subsets the metadata of an InteractionTensor to contain only elements in a reference InteractionTensor (interaction_tensor).
Parameters
tensor_metadata : list
List of pandas dataframes with metadata information for elements of each
dimension in the tensor. A column called as the variable sample_col contains
the name of each element in the tensor while another column called as the
variable group_col contains the metadata or grouping information of each
element.
interaction_tensor : cell2cell.tensor.BaseTensor A communication tensor generated with any of the tensor classes in cell2cell.tensor. This tensor is used as the reference to subset the metadata. The subset metadata will contain only elements that are present in this tensor, so if the metadata was originally built for another tensor, the elements that are exclusive to that original tensor will be excluded.
sample_col : str, default='Element' Name of the column containing the element names in the metadata.
Returns
subset_metadata : list
List of pandas dataframes with metadata information for elements contained
in interaction_tensor.order_names. It is a subset of tensor_metadata.
Source code in cell2cell/tensor/subset.py
subset_tensor(interaction_tensor, subset_dict, remove_duplicates=True, keep='first', original_order=False)
Subsets an InteractionTensor to contain only specific elements in respective dimensions.
Parameters
interaction_tensor : cell2cell.tensor.BaseTensor A communication tensor generated with any of the tensor classes in cell2cell.tensor
subset_dict : dict Dictionary to subset the tensor. It must contain the axes or dimensions that will be subset as the keys of the dictionary, and the values correspond to lists of element names for the respective axes or dimensions. Those axes that are not present in this dictionary will not be subset. E.g. {0 : ['Context 1', 'Context2'], 1: ['LR 10', 'LR 100']}
remove_duplicates : boolean, default=True
Whether removing duplicated names in elements.
keep : str, default='first' Determines which duplicates (if any) to keep. Options are:
- first : Drop duplicates except for the first occurrence.
- last : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
original_order : boolean, default=False
Whether keeping the original order of the elements in
interaction_tensor.order_names or keeping the
new order as indicated in the lists in the subset_dict.
Returns
subset_tensor : cell2cell.tensor.BaseTensor
A copy of interaction_tensor that was subset to contain
only the elements specified for the respective axis in the
subset_dict. Corresponds to a communication tensor
generated with any of the tensor classes in cell2cell.tensor
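As a rough illustration of what subsetting by a subset_dict means, the sketch below applies the same idea to a bare NumPy array and its lists of element names. The helper subset_array is hypothetical; subset_tensor itself operates on cell2cell tensor objects and also handles duplicates and ordering options.

```python
import numpy as np


def subset_array(tensor, order_names, subset_dict):
    """Hypothetical helper: keep only the named elements on each axis."""
    new_names = [list(names) for names in order_names]
    for axis, wanted in subset_dict.items():
        idx = [order_names[axis].index(name)
               for name in wanted if name in order_names[axis]]
        tensor = np.take(tensor, idx, axis=axis)  # returns a copy
        new_names[axis] = [order_names[axis][i] for i in idx]
    return tensor, new_names


full = np.arange(24).reshape(2, 3, 4)
order_names = [
    ["Context 1", "Context 2"],
    ["LR 1", "LR 2", "LR 3"],
    ["T", "B", "NK", "Mono"],
]
subset, subset_names = subset_array(
    full, order_names, {0: ["Context 2"], 1: ["LR 3", "LR 1"]}
)
```

Axes that are absent from the dictionary (here axis 2) are left untouched, as described for subset_dict.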
Source code in cell2cell/tensor/subset.py
tensor
BaseTensor
Empty base tensor class that contains the main functions for the Tensor Factorization of a Communication Tensor
Attributes
communication_score : str Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. Available communication_scores are:
- 'expression_mean' : Computes the average between the expression of a ligand
from a sender cell and the expression of a receptor on a
receiver cell.
- 'expression_product' : Computes the product between the expression of a
ligand from a sender cell and the expression of a
receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
how : str Approach to consider cell types and genes present across multiple contexts.
- 'inner' : Considers only cell types and genes that are present in all
contexts (intersection).
- 'outer' : Considers all cell types and genes that are present
across contexts (union).
- 'outer_genes' : Considers only cell types that are present in all
contexts (intersection), while all genes that are
present across contexts (union).
- 'outer_cells' : Considers only genes that are present in all
contexts (intersection), while all cell types that are
present across contexts (union).
outer_fraction : float
Threshold to filter the elements when how includes any outer option.
Elements with a fraction abundance across samples (in rnaseq_matrices)
at least this threshold will be included. When this value is 0, considers
all elements across the samples. When this value is 1, it acts as using
how='inner'.
tensor : tensorly.tensor Tensor object created with the library tensorly.
genes : list List of strings detailing the genes used through all contexts. Obtained depending on the attribute 'how'.
cells : list List of strings detailing the cells used through all contexts. Obtained depending on the attribute 'how'.
order_names : list List of lists containing the string names of each element in each of the dimensions or orders in the tensor. For a 4D-Communication tensor, the first list should contain the names of the contexts, the second the names of the ligand-receptor interactions, the third the names of the sender cells and the fourth the names of the receiver cells.
order_labels : list List of labels for dimensions or orders in the tensor.
tl_object : ndarray list
A tensorly object containing a list of initialized factors of the tensor
decomposition where element i is of shape (tensor.shape[i], rank).
norm_tl_object : ndarray list
A tensorly object containing a list of initialized factors of the tensor
decomposition where element i is of shape (tensor.shape[i], rank). This
results from normalizing the factor loadings of the tl_object.
factors : dict Ordered dictionary containing a dataframe with the factor loadings for each dimension/order of the tensor.
rank : int Rank of the Tensor Factorization (number of factors to deconvolve the original tensor).
mask : ndarray list Helps avoiding missing values during a tensor factorization. A mask should be a boolean array of the same shape as the original tensor and should be 0 where the values are missing and 1 everywhere else.
explained_variance : float Explained variance score for a tensor factorization.
explained_variance_ratio_ : ndarray list Percentage of variance explained by each of the factors. Only present when "normalize_loadings" is True. Otherwise, it is None.
loc_nans : ndarray list
An array of shape equal to tensor with ones where NaN values were assigned
when building the tensor. Other values are zeros. It stores the
location of the NaN values.
loc_zeros : ndarray list
An array of shape equal to tensor with ones where zeros that are not in
loc_nans are located. Other values are assigned a zero. It tracks the
real zero values rather than NaN values that were converted to zero.
elbow_metric : str Stores the metric used to perform the elbow analysis (y-axis).
- 'error' : Normalized error to compute the elbow.
- 'similarity' : Similarity based on CorrIndex (1-CorrIndex).
elbow_metric_mean : ndarray list Metric computed from the elbow analysis for each of the different ranks evaluated. This list contains (X,Y) pairs where X values are the different ranks and Y values are the mean value of the metric employed. This mean is computed from multiple runs, or the values for just one run. Metric could be the normalized error of the decomposition or the similarity between multiple runs with different initialization, based on the CorrIndex.
elbow_metric_raw : ndarray list
Similar to elbow_metric_mean, but instead of containing (X, Y) pairs,
it is an array of shape runs by ranks that were used for the analysis.
It contains all the metrics for each run in each of the evaluated ranks.
factorization_errors_ : list List of reconstruction errors at each iteration of the tensor factorization. Only available after running compute_tensor_factorization. Each element is the normalized reconstruction error for that iteration.
shape : tuple Shape of the tensor.
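The three communication_score options listed in the attributes above reduce to simple arithmetic on a ligand expression value and a receptor expression value. The function below is a hypothetical sketch of that arithmetic for a single sender-receiver pair, not the library implementation.

```python
import math


def communication_score(ligand_expr, receptor_expr, method="expression_mean"):
    """Hypothetical helper: score for one ligand-receptor pair."""
    if method == "expression_mean":
        return (ligand_expr + receptor_expr) / 2
    if method == "expression_product":
        return ligand_expr * receptor_expr
    if method == "expression_gmean":
        return math.sqrt(ligand_expr * receptor_expr)
    raise ValueError(f"Unknown communication score: {method}")


# Ligand expressed at 4.0 in the sender, receptor at 9.0 in the receiver.
mean_score = communication_score(4.0, 9.0, "expression_mean")
product_score = communication_score(4.0, 9.0, "expression_product")
gmean_score = communication_score(4.0, 9.0, "expression_gmean")
```

Note that the product and the geometric mean are zero whenever either partner is not expressed, whereas the mean is not.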
Source code in cell2cell/tensor/tensor.py
shape
property
Returns the shape of the tensor
compute_tensor_factorization(rank, tf_type='non_negative_cp', init='svd', svd='truncated_svd', random_state=None, runs=1, normalize_loadings=True, var_ordered_factors=True, n_iter_max=100, tol=1e-06, verbose=False, **kwargs)
Performs a Tensor Factorization. There are no returns, instead the attributes factors, rank, and factorization_errors_ of the Tensor class are updated.
Parameters
rank : int Rank of the Tensor Factorization (number of factors to deconvolve the original tensor).
tf_type : str, default='non_negative_cp' Type of Tensor Factorization.
- 'non_negative_cp' : Non-negative PARAFAC through the traditional ALS.
- 'non_negative_cp_hals' : Non-negative PARAFAC through the Hierarchical ALS.
It reaches an optimal solution faster than the
traditional ALS, but it does not allow a mask.
- 'parafac' : PARAFAC through the traditional ALS. It allows negative loadings.
- 'constrained_parafac' : PARAFAC through the traditional ALS. It allows
negative loadings. Also, it incorporates L1 and L2
regularization, includes a 'non_negative' option, and
allows constraining the sparsity of the decomposition.
For more information, see
http://tensorly.org/stable/modules/generated/tensorly.decomposition.constrained_parafac.html#tensorly.decomposition.constrained_parafac
init : str, default='svd' Initialization method for computing the Tensor Factorization.
svd : str, default='truncated_svd' Function to use to compute the SVD, acceptable values in tensorly.SVD_FUNS
random_state : int, default=None Seed for randomization.
runs : int, default=1 Number of models to choose among and find the lowest error. This helps to avoid local minima when using runs > 1.
normalize_loadings : boolean, default=True Whether normalizing the loadings in each factor to unit Euclidean length.
var_ordered_factors : boolean, default=True
Whether ordering factors by the variance they explain. The order is from
highest to lowest variance. normalize_loadings must be True. Otherwise,
this parameter is ignored.
tol : float, default=1e-06
Tolerance for the decomposition algorithm to stop when the variation in
the reconstruction error is less than the tolerance. Lower tol helps
to improve the solution obtained from the decomposition, but it takes
longer to run.
n_iter_max : int, default=100
Maximum number of iterations to reach an optimal solution with the
decomposition algorithm. Higher n_iter_max helps to improve the solution
obtained from the decomposition, but it takes longer to run.
verbose : boolean, default=False Whether printing or not steps of the analysis.
**kwargs : dict Extra arguments for the tensor factorization according to inputs in tensorly.
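How tol and n_iter_max interact can be seen in a toy loop. This is only a sketch of a generic stopping rule, assuming the algorithm compares consecutive reconstruction errors; run_until_converged and toy_step are hypothetical names, and the actual criterion is the one implemented in tensorly.

```python
def run_until_converged(step, n_iter_max=100, tol=1e-6):
    """Hypothetical loop: stop on small error change or iteration cap."""
    errors = []
    for iteration in range(n_iter_max):
        errors.append(step(iteration))
        # Stop once the error changes by less than tol between iterations.
        if iteration > 0 and abs(errors[-2] - errors[-1]) < tol:
            break
    return errors


def toy_step(iteration):
    """Stand-in for one update; the error halves at every iteration."""
    return 0.5 ** (iteration + 1)


strict = run_until_converged(toy_step, n_iter_max=100, tol=1e-6)
loose = run_until_converged(toy_step, n_iter_max=100, tol=1e-2)
capped = run_until_converged(toy_step, n_iter_max=5, tol=1e-6)
```

A list such as `strict` plays the role of factorization_errors_: one error per iteration, with a lower tol or a higher n_iter_max allowing more iterations before stopping.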
Source code in cell2cell/tensor/tensor.py
copy()
elbow_rank_selection(upper_rank=50, runs=20, tf_type='non_negative_cp', init='random', svd='truncated_svd', metric='error', random_state=None, n_iter_max=100, tol=1e-06, automatic_elbow=True, manual_elbow=None, smooth=False, mask=None, ci='std', figsize=(4, 2.25), fontsize=14, filename=None, output_fig=True, verbose=False, **kwargs)
Elbow analysis on the error achieved by the Tensor Factorization for selecting the number of factors to use. A plot is made with the results.
Parameters
upper_rank : int, default=50 Upper bound of ranks to explore with the elbow analysis.
runs : int, default=20 Number of tensor factorization performed for a given rank. Each factorization varies in the seed of initialization.
tf_type : str, default='non_negative_cp' Type of Tensor Factorization.
- 'non_negative_cp' : Non-negative PARAFAC through the traditional ALS.
- 'non_negative_cp_hals' : Non-negative PARAFAC through the Hierarchical ALS.
It reaches an optimal solution faster than the
traditional ALS, but it does not allow a mask.
- 'parafac' : PARAFAC through the traditional ALS. It allows negative loadings.
- 'constrained_parafac' : PARAFAC through the traditional ALS. It allows
negative loadings. Also, it incorporates L1 and L2
regularization, includes a 'non_negative' option, and
allows constraining the sparsity of the decomposition.
For more information, see
http://tensorly.org/stable/modules/generated/tensorly.decomposition.constrained_parafac.html#tensorly.decomposition.constrained_parafac
init : str, default='random' Initialization method for computing the Tensor Factorization.
svd : str, default='truncated_svd' Function to compute the SVD, acceptable values in tensorly.SVD_FUNS
metric : str, default='error' Metric to perform the elbow analysis (y-axis)
- 'error' : Normalized error to compute the elbow.
- 'similarity' : Similarity based on CorrIndex (1-CorrIndex).
random_state : int, default=None Seed for randomization.
tol : float, default=1e-06
Tolerance for the decomposition algorithm to stop when the variation in
the reconstruction error is less than the tolerance. Lower tol helps
to improve the solution obtained from the decomposition, but it takes
longer to run.
n_iter_max : int, default=100
Maximum number of iterations to reach an optimal solution with the
decomposition algorithm. Higher n_iter_max helps to improve the solution
obtained from the decomposition, but it takes longer to run.
automatic_elbow : boolean, default=True Whether using an automatic strategy to find the elbow. If True, the method implemented by the package kneed is used.
manual_elbow : int, default=None
Rank or number of factors to highlight in the curve of error achieved by
the Tensor Factorization. This input is considered only when
automatic_elbow=False
smooth : boolean, default=False Whether smoothing the curve with a Savitzky-Golay filter.
mask : ndarray list, default=None Helps avoiding missing values during a tensor factorization. A mask should be a boolean array of the same shape as the original tensor and should be 0 where the values are missing and 1 everywhere else.
ci : str, default='std' Confidence interval for representing the multiple runs in each rank.
figsize : tuple, default=(4, 2.25) Figure size, width by height
fontsize : int, default=14 Fontsize for axis labels.
filename : str, default=None Path to save the figure of the elbow analysis. If None, the figure is not saved.
output_fig : boolean, default=True Whether generating the figure with matplotlib.
verbose : boolean, default=False Whether printing or not steps of the analysis.
**kwargs : dict Extra arguments for the tensor factorization according to inputs in tensorly.
Returns
fig : matplotlib.figure.Figure Figure object made with matplotlib
loss : list List of normalized errors for each rank. Here the errors are the average across distinct runs for each rank.
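The relation between per-run metrics (shape runs by ranks, as in elbow_metric_raw) and the averaged (rank, value) pairs can be sketched as follows. The function summarize_runs and the toy numbers are assumptions for illustration, and the sketch assumes the evaluated ranks start at 1.

```python
import numpy as np


def summarize_runs(raw_metric):
    """Hypothetical helper: average a runs-by-ranks array per rank."""
    raw_metric = np.asarray(raw_metric, dtype=float)
    ranks = range(1, raw_metric.shape[1] + 1)  # assumed ranks 1..upper_rank
    means = raw_metric.mean(axis=0)
    stds = raw_metric.std(axis=0)  # spread across runs, e.g. for ci='std'
    return list(zip(ranks, means.tolist())), stds


# Toy errors: 3 runs (rows) evaluated at ranks 1 to 4 (columns).
raw = [
    [0.9, 0.6, 0.45, 0.44],
    [0.8, 0.5, 0.40, 0.41],
    [1.0, 0.7, 0.50, 0.50],
]
mean_pairs, spread = summarize_runs(raw)
```

In this toy curve the mean error stops improving after rank 3, which is the kind of flattening an elbow analysis looks for.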
Source code in cell2cell/tensor/tensor.py
excluded_value_fraction()
Returns the fraction of excluded values in the tensor, given the values that are masked in tensor.mask
Returns
excluded_fraction : float Fraction of missing/excluded values in the tensor.
Source code in cell2cell/tensor/tensor.py
explained_variance()
Computes the explained variance score for a tensor decomposition. Inspired by the function sklearn.metrics.explained_variance_score.
Returns
explained_variance : float Explained variance score for a tensor factorization.
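Since the method is described as inspired by scikit-learn's explained variance score, the sketch below applies that score, 1 - Var(y - y_hat) / Var(y), to flattened tensors. The function name explained_variance_sketch is hypothetical, and the sketch ignores masks and missing values, which the library method may treat differently.

```python
import numpy as np


def explained_variance_sketch(reference_tensor, reconstructed_tensor):
    """Assumed form: 1 - Var(reference - reconstruction) / Var(reference)."""
    y = np.asarray(reference_tensor, dtype=float).ravel()
    y_hat = np.asarray(reconstructed_tensor, dtype=float).ravel()
    return 1.0 - np.var(y - y_hat) / np.var(y)


reference = np.array([[1.0, 2.0], [3.0, 4.0]])
perfect = explained_variance_sketch(reference, reference)
constant = explained_variance_sketch(reference, np.full((2, 2), 2.5))
```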
Source code in cell2cell/tensor/tensor.py
export_factor_loadings(filename)
Exports the factor loadings of the tensor into an Excel file
Parameters
filename : str Full path and filename to store the file. E.g., '/home/user/Loadings.xlsx'
Source code in cell2cell/tensor/tensor.py
get_factorization_errors(plot=False, figsize=(8, 5), fontsize=12, filename=None)
Retrieves the factorization errors across iterations and optionally plots them.
Source code in cell2cell/tensor/tensor.py
get_top_factor_elements(order_name, factor_name, top_number=10)
Obtains the top-elements with the highest loadings for a given factor
Parameters
order_name : str Name of the dimension/order in the tensor according to the keys of the dictionary in BaseTensor.factors. The attribute factors is built once the tensor factorization is run.
factor_name : str Name of one of the factors. E.g., 'Factor 1'
top_number : int, default=10 Number of top-elements to return
Returns
top_elements : pandas.DataFrame A dataframe with the loadings of the top-elements for the given factor.
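Conceptually, retrieving the top elements amounts to sorting one column of a loadings dataframe. The sketch below does this on a toy factors dictionary; top_factor_elements is a hypothetical helper and the loadings are made up for the example.

```python
import pandas as pd


def top_factor_elements(factors, order_name, factor_name, top_number=10):
    """Hypothetical helper: highest loadings of one factor in one dimension."""
    loadings = factors[order_name][factor_name]
    return loadings.sort_values(ascending=False).head(top_number).to_frame()


factors = {
    "Sender Cells": pd.DataFrame(
        {"Factor 1": [0.1, 0.7, 0.4], "Factor 2": [0.9, 0.2, 0.3]},
        index=["T", "B", "NK"],
    )
}
top_two = top_factor_elements(factors, "Sender Cells", "Factor 1", top_number=2)
```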
Source code in cell2cell/tensor/tensor.py
missing_fraction()
Returns the fraction of values that are missing (NaNs) in the tensor, given the values that are in tensor.loc_nans
Returns
missing_fraction : float Fraction of values that are missing (NaNs).
Source code in cell2cell/tensor/tensor.py
sparsity_fraction()
Returns the fraction of values that are zeros in the tensor, given the values that are in tensor.loc_zeros
Returns
sparsity_fraction : float Fraction of values that are real zeros.
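The difference between missing values and real zeros can be made concrete with a small array. The sketch below rebuilds arrays analogous to loc_nans, loc_zeros, and mask from toy data; it is an illustration under those assumptions, not the code used by the class.

```python
import numpy as np

# Toy scores: NaN marks values that were missing when building the tensor.
raw = np.array([[0.0, 1.5, np.nan],
                [2.0, 0.0, np.nan]])

loc_nans = np.isnan(raw).astype(int)       # 1 where a NaN was assigned
tensor = np.nan_to_num(raw, nan=0.0)       # NaNs converted to zeros
loc_zeros = ((tensor == 0) & (loc_nans == 0)).astype(int)  # real zeros only
mask = (loc_nans == 0).astype(int)         # 0 where missing, 1 elsewhere

missing_fraction = loc_nans.sum() / loc_nans.size
sparsity_fraction = loc_zeros.sum() / loc_zeros.size
```

Here two of six values are missing and two are real zeros, so both fractions equal one third even though the converted tensor contains four zeros.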
Source code in cell2cell/tensor/tensor.py
to_device(device)
Move tensors to specified device
Parameters
device : str Device name to use for the decomposition. Options could be 'cpu', 'cuda', 'gpu', depending on the backend used with tensorly.
Source code in cell2cell/tensor/tensor.py
write_file(filename)
Exports this object into a pickle file.
Parameters
filename : str Complete path to the file wherein the variable will be stored. For example: /home/user/variable.pkl
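Exporting to and restoring from a pickle file follows the standard library pattern sketched below. The helpers write_pickle and read_pickle and the toy object are assumptions for illustration; a plain dictionary stands in for a tensor object.

```python
import os
import pickle
import tempfile


def write_pickle(obj, filename):
    """Hypothetical helper: store any picklable object in a file."""
    with open(filename, "wb") as handle:
        pickle.dump(obj, handle)


def read_pickle(filename):
    """Hypothetical helper: load the object back from the file."""
    with open(filename, "rb") as handle:
        return pickle.load(handle)


toy_object = {"rank": 3, "order_labels": ["Contexts", "Sender Cells"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "variable.pkl")
    write_pickle(toy_object, path)
    restored = read_pickle(path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.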
Source code in cell2cell/tensor/tensor.py
InteractionTensor
Bases: BaseTensor
4D-Communication Tensor built from gene expression matrices for different contexts and a list of ligand-receptor pairs
Parameters
rnaseq_matrices : list A list with dataframes of gene expression wherein the rows are the genes and columns the cell types, tissues or samples.
ppi_data : pandas.DataFrame A dataframe containing protein-protein interactions (rows). It has to contain at least two columns, one for the first protein partner in the interaction as well as the second protein partner.
order_labels : list, default=None List containing the labels for each order or dimension of the tensor. For example: ['Contexts', 'Ligand-Receptor Pairs', 'Sender Cells', 'Receiver Cells']
context_names : list, default=None A list of strings containing the names of the corresponding contexts to each rnaseq_matrix. The length of this list must match the length of the list rnaseq_matrices.
how : str, default='inner' Approach to consider cell types and genes present across multiple contexts.
- 'inner' : Considers only cell types and genes that are present in all
contexts (intersection).
- 'outer' : Considers all cell types and genes that are present
across contexts (union).
- 'outer_genes' : Considers only cell types that are present in all
contexts (intersection), while all genes that are
present across contexts (union).
- 'outer_cells' : Considers only genes that are present in all
contexts (intersection), while all cell types that are
present across contexts (union).
outer_fraction : float, default=0.0
Threshold to filter the elements when how includes any outer option.
Elements with a fraction abundance across samples (in rnaseq_matrices)
at least this threshold will be included. When this value is 0, considers
all elements across the samples. When this value is 1, it acts as using
how='inner'.
communication_score : str, default='expression_mean' Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. Available communication_scores are:
- 'expression_mean' : Computes the average between the expression of a ligand
from a sender cell and the expression of a receptor on a
receiver cell.
- 'expression_product' : Computes the product between the expression of a
ligand from a sender cell and the expression of a
receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
complex_agg_method : str, default='min' Method to aggregate the expression value of multiple genes in a complex.
- 'min' : Minimum expression value among all genes.
- 'mean' : Average expression value among all genes.
- 'gmean' : Geometric mean expression value among all genes.
upper_letter_comparison : boolean, default=True Whether to convert to uppercase the gene names in the expression matrices and the protein names in ppi_data, so that the names match and their respective expression levels can be integrated. Useful when there are inconsistencies in the names between the expression matrix and the ligand-receptor annotations.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
group_ppi_by : str, default=None Column name in the list of PPIs used for grouping individual PPIs into major groups such as signaling pathways.
group_ppi_method : str, default='gmean' Method for aggregating multiple PPIs into major groups.
- 'mean' : Computes the average communication score among all PPIs of the
group for a given pair of cells/tissues/samples
- 'gmean' : Computes the geometric mean of the communication scores among all
PPIs of the group for a given pair of cells/tissues/samples
- 'sum' : Computes the sum of the communication scores among all PPIs of the
group for a given pair of cells/tissues/samples
device : str, default=None Device to use when backend allows using multiple devices. Options are:
verbose : boolean, default=False Whether to print the steps of the analysis.
Source code in cell2cell/tensor/tensor.py
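The communication_score and complex_agg_method options described above reduce to simple arithmetic on expression values. The following is a minimal sketch of that arithmetic, assuming scalar expression values; the function names and numbers are hypothetical and do not reproduce the library's internals.

```python
import math


def communication_score(ligand_expr, receptor_expr, method='expression_mean'):
    """Score one ligand-receptor pair for one sender-receiver pair."""
    if method == 'expression_mean':
        return (ligand_expr + receptor_expr) / 2
    if method == 'expression_product':
        return ligand_expr * receptor_expr
    if method == 'expression_gmean':
        return math.sqrt(ligand_expr * receptor_expr)
    raise ValueError(f'Unknown communication score: {method}')


def complex_expression(subunit_exprs, method='min'):
    """Aggregate the expression of the subunits of a multimeric complex."""
    if method == 'min':
        return min(subunit_exprs)
    if method == 'mean':
        return sum(subunit_exprs) / len(subunit_exprs)
    if method == 'gmean':
        return math.prod(subunit_exprs) ** (1 / len(subunit_exprs))
    raise ValueError(f'Unknown aggregation method: {method}')


# Hypothetical values: a ligand in the sender cell and the two subunits of
# a receptor complex written as "CD74&CD44" in the receiver cell
ligand = 4.0
receptor = complex_expression([9.0, 16.0], method='min')  # limiting subunit

scores = {
    method: communication_score(ligand, receptor, method)
    for method in ('expression_mean', 'expression_product', 'expression_gmean')
}
print(scores)
```

With 'min', the least expressed subunit limits the complex, so a complex is only scored as highly as its scarcest component.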
PreBuiltTensor
Bases: BaseTensor
Initializes a cell2cell.tensor.BaseTensor with a prebuilt communication tensor
Parameters
tensor : ndarray list Prebuilt tensor. Could be a list of lists, a numpy array or a tensorly.tensor.
order_names : list List of lists containing the string names of each element in each of the dimensions or orders in the tensor. For a 4D-Communication tensor, the first list should contain the names of the contexts, the second the names of the ligand-receptor interactions, the third the names of the sender cells and the fourth the names of the receiver cells.
order_labels : list, default=None List containing the labels for each order or dimension of the tensor. For example: ['Contexts', 'Ligand-Receptor Pairs', 'Sender Cells', 'Receiver Cells']
mask : ndarray list, default=None Helps to ignore missing values during a tensor factorization. A mask should be a boolean array of the same shape as the original tensor and should be 0 where the values are missing and 1 everywhere else.
loc_nans : ndarray list, default=None
An array of shape equal to tensor with ones where NaN values were assigned
when building the tensor. Other values are zeros. It stores the
location of the NaN values.
device : str, default=None Device to use when backend allows using multiple devices. Options are:
Source code in cell2cell/tensor/tensor.py
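The mask and loc_nans conventions are opposite: the mask holds 0 at missing positions, while loc_nans holds 1 there. The sketch below derives both from a small 2D slice with a NaN. The helper name nan_indicators is hypothetical, and whether a mask is derived from NaN positions in a given workflow is an assumption of this example.

```python
import math


def nan_indicators(matrix):
    """Return (mask, loc_nans) for a 2D list that may contain NaNs."""
    loc_nans = [[1 if math.isnan(value) else 0 for value in row] for row in matrix]
    mask = [[1 - flag for flag in row] for row in loc_nans]
    return mask, loc_nans


# Toy slice of a tensor with one missing value
slice_2d = [[0.5, float('nan')],
            [0.0, 1.2]]

mask, loc_nans = nan_indicators(slice_2d)
print(mask)
print(loc_nans)
```

Note that the real zero at position (1, 0) is not masked: only missing values are.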
aggregate_ccc_tensor(ccc_tensor, ppi_data, group_ppi_by=None, group_ppi_method='gmean')
Aggregates communication scores of multiple PPIs into major groups (e.g., pathways) in a communication tensor
Parameters
ccc_tensor : ndarray list List of directed cell-cell communication matrices, one for each ligand-receptor pair in ppi_data. These matrices contain the communication score for pairs of cells for the corresponding PPI. This tensor represents a 3D-communication tensor for the context.
ppi_data : pandas.DataFrame A dataframe containing protein-protein interactions (rows). It has to contain at least two columns, one for the first protein partner in the interaction as well as the second protein partner.
group_ppi_by : str, default=None Column name in the list of PPIs used for grouping individual PPIs into major groups such as signaling pathways.
group_ppi_method : str, default='gmean' Method for aggregating multiple PPIs into major groups.
- 'mean' : Computes the average communication score among all PPIs of the
group for a given pair of cells/tissues/samples
- 'gmean' : Computes the geometric mean of the communication scores among all
PPIs of the group for a given pair of cells/tissues/samples
- 'sum' : Computes the sum of the communication scores among all PPIs of the
group for a given pair of cells/tissues/samples
Returns
aggregated_tensor : ndarray list List of directed cell-cell communication matrices, one for each major group of ligand-receptor pairs in ppi_data. These matrices contain the communication score for pairs of cells for the corresponding PPI group. This tensor represents a 3D-communication tensor for the context, but for major groups instead of individual PPIs.
Source code in cell2cell/tensor/tensor.py
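The grouping step can be pictured as collapsing the matrices that share a group label into a single matrix, entry by entry. The sketch below does this with nested lists; aggregate_by_group, the group labels and the score values are hypothetical, and the real function reads the labels from the group_ppi_by column of ppi_data.

```python
import math
from collections import defaultdict


def aggregate_by_group(ccc_tensor, groups, method='gmean'):
    """Collapse per-PPI matrices into one matrix per group label."""
    members = defaultdict(list)
    for matrix, group in zip(ccc_tensor, groups):
        members[group].append(matrix)

    aggregated = {}
    for group, matrices in members.items():
        n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
        result = []
        for i in range(n_rows):
            row = []
            for j in range(n_cols):
                values = [m[i][j] for m in matrices]
                if method == 'mean':
                    row.append(sum(values) / len(values))
                elif method == 'gmean':
                    row.append(math.prod(values) ** (1 / len(values)))
                elif method == 'sum':
                    row.append(sum(values))
                else:
                    raise ValueError(f'Unknown method: {method}')
            result.append(row)
        aggregated[group] = result
    return aggregated


# Three PPIs (sender x receiver matrices); two belong to the same pathway
ccc_tensor = [
    [[1.0, 4.0], [9.0, 16.0]],
    [[4.0, 4.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
groups = ['TGFb', 'TGFb', 'WNT']

by_gmean = aggregate_by_group(ccc_tensor, groups, method='gmean')
by_sum = aggregate_by_group(ccc_tensor, groups, method='sum')
```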
build_context_ccc_tensor(rnaseq_matrices, ppi_data, how='inner', outer_fraction=0.0, communication_score='expression_product', complex_sep=None, upper_letter_comparison=True, interaction_columns=('A', 'B'), group_ppi_by=None, group_ppi_method='gmean', verbose=True)
Builds a 4D-Communication tensor. Takes the gene expression matrices and the list of PPIs to compute the communication scores between the interacting cells for each PPI. This is done for each context.
Parameters
rnaseq_matrices : list A list with dataframes of gene expression wherein the rows are the genes and columns the cell types, tissues or samples.
ppi_data : pandas.DataFrame A dataframe containing protein-protein interactions (rows). It has to contain at least two columns, one for the first protein partner in the interaction as well as the second protein partner.
how : str, default='inner' Approach to consider cell types and genes present across multiple contexts.
- 'inner' : Considers only cell types and genes that are present in all
contexts (intersection).
- 'outer' : Considers all cell types and genes that are present
across contexts (union).
- 'outer_genes' : Considers only cell types that are present in all
contexts (intersection), while all genes that are
present across contexts (union).
- 'outer_cells' : Considers only genes that are present in all
contexts (intersection), while all cell types that are
present across contexts (union).
outer_fraction : float, default=0.0
Threshold to filter the elements when how includes any outer option.
Elements with a fraction abundance across samples (in rnaseq_matrices)
at least this threshold will be included. When this value is 0, considers
all elements across the samples. When this value is 1, it acts as using
how='inner'.
communication_score : str, default='expression_product' Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. Available communication_scores are:
- 'expression_mean' : Computes the average between the expression of a ligand
from a sender cell and the expression of a receptor on a
receiver cell.
- 'expression_product' : Computes the product between the expression of a
ligand from a sender cell and the expression of a
receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
complex_sep : str, default=None Symbol that separates the protein subunits in a multimeric complex. For example, '&' is the complex_sep for a list of ligand-receptor pairs where a protein partner could be "CD74&CD44".
upper_letter_comparison : boolean, default=True Whether to convert to uppercase the gene names in the expression matrices and the protein names in ppi_data, so that the names match and their respective expression levels can be integrated. Useful when there are inconsistencies in the names between the expression matrix and the ligand-receptor annotations.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
group_ppi_by : str, default=None Column name in the list of PPIs used for grouping individual PPIs into major groups such as signaling pathways.
group_ppi_method : str, default='gmean' Method for aggregating multiple PPIs into major groups.
- 'mean' : Computes the average communication score among all PPIs of the
group for a given pair of cells/tissues/samples
- 'gmean' : Computes the geometric mean of the communication scores among all
PPIs of the group for a given pair of cells/tissues/samples
- 'sum' : Computes the sum of the communication scores among all PPIs of the
group for a given pair of cells/tissues/samples
verbose : boolean, default=True Whether to print the steps of the analysis.
Returns
tensors : list List of 3D-Communication tensors for each context. This list corresponds to the 4D-Communication tensor.
genes : list List of genes included in the tensor.
cells : list List of cells included in the tensor.
ppi_names: list List of names for each of the PPIs included in the tensor. Used as labels for the elements in the cognate tensor dimension (in the attribute order_names of the InteractionTensor)
mask_tensor: numpy.array Mask used to exclude values in the tensor. When using how='outer' it masks missing values (e.g., cell types that are not present in a given context), while with how='inner' the mask_tensor is None.
Source code in cell2cell/tensor/tensor.py
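The interplay between how and outer_fraction can be sketched for a single kind of element (cell types or genes). The function below is a hypothetical illustration, not the library's code: it keeps the intersection for 'inner', and for an outer option keeps every element present in at least outer_fraction of the contexts.

```python
from collections import Counter


def select_elements(element_sets, how='inner', outer_fraction=0.0):
    """Pick the elements to keep across contexts for one kind of element."""
    sets = [set(elements) for elements in element_sets]
    if how == 'inner':
        return sorted(set.intersection(*sets))
    # Outer option: count in how many contexts each element appears
    counts = Counter(element for s in sets for element in s)
    n_contexts = len(sets)
    return sorted(e for e, c in counts.items() if c / n_contexts >= outer_fraction)


# Hypothetical cell types observed in three contexts
cells_per_context = [{'T', 'B', 'NK'}, {'T', 'B'}, {'T', 'Mac'}]

inner = select_elements(cells_per_context, how='inner')
outer = select_elements(cells_per_context, how='outer')
outer_half = select_elements(cells_per_context, how='outer', outer_fraction=0.5)
outer_all = select_elements(cells_per_context, how='outer', outer_fraction=1.0)
```

Under this reading, 'outer_genes' applies the intersection to cell types and the union to genes, and 'outer_cells' does the reverse.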
generate_ccc_tensor(rnaseq_data, ppi_data, communication_score='expression_product', interaction_columns=('A', 'B'))
Computes a 3D-Communication tensor for a given context based on the gene expression matrix and the list of PPIs
Parameters
rnaseq_data : pandas.DataFrame Gene expression matrix for a given context, sample or condition. Rows are genes and columns are cell types/tissues/samples.
ppi_data : pandas.DataFrame A dataframe containing protein-protein interactions (rows). It has to contain at least two columns, one for the first protein partner in the interaction as well as the second protein partner.
communication_score : str, default='expression_product' Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. Available communication_scores are:
- 'expression_mean' : Computes the average between the expression of a ligand
from a sender cell and the expression of a receptor on a
receiver cell.
- 'expression_product' : Computes the product between the expression of a
ligand from a sender cell and the expression of a
receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
interaction_columns : tuple, default=('A', 'B') Contains the names of the columns where to find the partners in a dataframe of protein-protein interactions. If the list is for ligand-receptor pairs, the first column is for the ligands and the second for the receptors.
Returns
ccc_tensor : ndarray list List of directed cell-cell communication matrices, one for each ligand-receptor pair in ppi_data. These matrices contain the communication score for pairs of cells for the corresponding PPI. This tensor represents a 3D-communication tensor for the context.
Source code in cell2cell/tensor/tensor.py
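To make the shape of the 3D-communication tensor concrete, the sketch below builds one sender-by-receiver matrix per ligand-receptor pair using the 'expression_product' score. The function name, the nested-dictionary expression format and the values are assumptions for illustration; the documented function takes pandas.DataFrames.

```python
def ccc_tensor_for_context(expression, lr_pairs, cells):
    """Build one directed sender x receiver matrix per ligand-receptor pair.

    expression maps gene -> {cell type -> expression value}.
    """
    tensor = []
    for ligand, receptor in lr_pairs:
        matrix = [
            [expression[ligand][sender] * expression[receptor][receiver]
             for receiver in cells]
            for sender in cells
        ]
        tensor.append(matrix)
    return tensor


# Hypothetical expression values for one context
expression = {
    'L1': {'A': 2.0, 'B': 0.0},
    'R1': {'A': 1.0, 'B': 3.0},
}
cells = ['A', 'B']
lr_pairs = [('L1', 'R1')]

ccc_tensor = ccc_tensor_for_context(expression, lr_pairs, cells)
print(ccc_tensor)
```

The matrices are directed: here cell A sending to cell B scores 6.0, while B sending to A scores 0.0 because B does not express the ligand.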
generate_tensor_metadata(interaction_tensor, metadata_dicts, fill_with_order_elements=True)
Uses a list of dicts (or None when a dict is missing) to generate a list of metadata for each order in the tensor.
Parameters
interaction_tensor : cell2cell.tensor.BaseTensor A communication tensor.
metadata_dicts : list A list of dictionaries. Each dictionary represents an order of the tensor. In an interaction tensor these orders should be contexts, LR pairs, sender cells and receiver cells. The keys are the elements in each order (they are contained in interaction_tensor.order_names) and the values are the categories that each element will be assigned as metadata.
fill_with_order_elements : boolean, default=True Whether to use each element of a dimension as its own metadata when a None is passed instead of a dictionary for the respective order/dimension. If True, each element in that order will be used as its own metadata; otherwise, that dimension will not contain metadata.
Returns
metadata : list A list of pandas.DataFrames that will be used as an input of cell2cell.plotting.tensor_factors_plot.
Source code in cell2cell/tensor/tensor.py
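The mapping from metadata_dicts to per-order metadata can be sketched as follows. This is a simplified stand-in that returns lists of (element, category) tuples instead of pandas.DataFrames; the function name build_metadata is hypothetical, and filling with None when fill_with_order_elements is False is an assumption of this sketch rather than confirmed library behavior.

```python
def build_metadata(order_names, metadata_dicts, fill_with_order_elements=True):
    """Pair every element of every order with a category."""
    metadata = []
    for names, mapping in zip(order_names, metadata_dicts):
        if mapping is not None:
            categories = [mapping[name] for name in names]
        elif fill_with_order_elements:
            categories = list(names)            # each element is its own category
        else:
            categories = [None] * len(names)    # assumption: no metadata assigned
        metadata.append(list(zip(names, categories)))
    return metadata


# Two orders only, to keep the example short
order_names = [['ctx1', 'ctx2'], ['A', 'B']]
metadata_dicts = [{'ctx1': 'Control', 'ctx2': 'Treated'}, None]

metadata = build_metadata(order_names, metadata_dicts)
print(metadata)
```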
interactions_to_tensor(interactions, experiment='single_cell', context_names=None, how='inner', outer_fraction=0.0, communication_score='expression_product', upper_letter_comparison=True, verbose=True)
Takes a list of Interaction pipelines (see classes in cell2cell.analysis.pipelines) and generates a communication tensor.
Parameters
interactions : list List of Interaction pipelines. The Interactions must all be either BulkInteractions or SingleCellInteractions.
experiment : str, default='single_cell' Type of Interaction pipelines in the list. Either 'single_cell' or 'bulk'.
context_names : list List of context names or labels for each of the Interaction pipelines. This list must match the length of interactions, and the labels have to follow the same order.
how : str, default='inner' Approach to consider cell types and genes present across multiple contexts.
- 'inner' : Considers only cell types and genes that are present in all
contexts (intersection).
- 'outer' : Considers all cell types and genes that are present
across contexts (union).
- 'outer_genes' : Considers only cell types that are present in all
contexts (intersection), while all genes that are
present across contexts (union).
- 'outer_cells' : Considers only genes that are present in all
contexts (intersection), while all cell types that are
present across contexts (union).
outer_fraction : float, default=0.0
Threshold to filter the elements when how includes any outer option.
Elements with a fraction abundance across samples at least this
threshold will be included. When this value is 0, considers
all elements across the samples. When this value is 1, it acts as using
how='inner'.
communication_score : str, default='expression_product' Type of communication score to infer the potential use of a given ligand-receptor pair by a pair of cells/tissues/samples. Available communication_scores are:
- 'expression_mean' : Computes the average between the expression of a ligand
from a sender cell and the expression of a receptor on a
receiver cell.
- 'expression_product' : Computes the product between the expression of a
ligand from a sender cell and the expression of a
receptor on a receiver cell.
- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
expression of a receptor on a receiver cell.
upper_letter_comparison : boolean, default=True Whether to convert to uppercase the gene names in the expression matrices and the protein names in ppi_data, so that the names match and their respective expression levels can be integrated. Useful when there are inconsistencies in the names between the expression matrix and the ligand-receptor annotations.
Returns
tensor : cell2cell.tensor.InteractionTensor A 4D-communication tensor.
Source code in cell2cell/tensor/tensor.py
tensor_manipulation
concatenate_interaction_tensors(interaction_tensors, axis, order_labels, remove_duplicates=False, keep='first', mask=None, device=None)
Concatenates interaction tensors in a given tensor dimension or axis.
Parameters
interaction_tensors : list List of any tensor class in cell2cell.tensor.
axis : int The axis along which the arrays will be joined. If axis is None, arrays are flattened before use.
order_labels : list List of labels for dimensions or orders in the tensor.
remove_duplicates : boolean, default=False Whether removing duplicated names in the concatenated axis.
keep : str, default='first' Determines which duplicates (if any) to keep. Options are:
- first : Drop duplicates except for the first occurrence.
- last : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
mask : ndarray list Helps to ignore missing values during a tensor factorization. A mask should be a boolean array of the same shape as the original tensor and should be 0 where the values are missing and 1 everywhere else. This must have the same shape as the concatenated tensor.
device : str, default=None Device to use when backend is pytorch. Options are:
Returns
concatenated_tensor : cell2cell.tensor.PreBuiltTensor Final tensor after concatenation. It is a PreBuiltTensor that works as any interaction tensor based on the class BaseTensor.
Source code in cell2cell/tensor/tensor_manipulation.py
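The keep options follow the familiar first/last/False convention for handling duplicated names. The sketch below shows which positions along the concatenated axis would survive under each option; the helper positions_to_keep and the context names are hypothetical and not taken from the library.

```python
def positions_to_keep(names, keep='first'):
    """Return the indices along the concatenated axis that survive deduplication."""
    positions = {}
    for index, name in enumerate(names):
        positions.setdefault(name, []).append(index)

    kept = []
    for indices in positions.values():
        if keep == 'first':
            kept.append(indices[0])
        elif keep == 'last':
            kept.append(indices[-1])
        elif keep is False:
            if len(indices) == 1:   # drop every name that appears more than once
                kept.append(indices[0])
        else:
            raise ValueError(f'Unknown keep option: {keep}')
    return sorted(kept)


# Names along the context axis after joining two tensors
names = ['ctx1', 'ctx2'] + ['ctx2', 'ctx3']

keep_first = positions_to_keep(names, keep='first')
keep_last = positions_to_keep(names, keep='last')
keep_none = positions_to_keep(names, keep=False)
```

The same indices would then be used to slice the concatenated values and, if provided, the mask, so that both keep the same shape.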
utils
networks
export_network_to_cytoscape(network, filename)
Exports a network into a JSON file that is readable by the software Cytoscape.
Parameters
network : networkx.Graph, networkx.DiGraph or a pandas.DataFrame A networkx Graph or Directed Graph, or an adjacency matrix, wherein rows and columns are nodes and values represent the weight of the respective edge.
filename : str Path to save the network into a Cytoscape-readable format (JSON file in this case). E.g. '/home/user/network.json'
Source code in cell2cell/utils/networks.py
export_network_to_gephi(network, filename, format='excel', network_type='Undirected')
Exports a network into a spreadsheet that is readable by the software Gephi.
Parameters
network : networkx.Graph, networkx.DiGraph or a pandas.DataFrame A networkx Graph or Directed Graph, or an adjacency matrix, wherein rows and columns are nodes and values represent the weight of the respective edge.
filename : str Path to save the network into a Gephi-readable format.
format : str, default='excel' Format to export the spreadsheet. Options are:
- 'excel' : An excel file, either .xls or .xlsx
- 'csv' : Comma separated value format
- 'tsv' : Tab separated value format
network_type : str, default='Undirected' Type of edges in the network. They could be either 'Undirected' or 'Directed'.
Source code in cell2cell/utils/networks.py
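A spreadsheet export of this kind amounts to turning an adjacency matrix into an edge table. The sketch below writes such a table as delimited text. The function edge_table is hypothetical, and the column names Source, Target, Weight and Type are an assumption based on Gephi's usual edge-table headers, not something stated in this documentation.

```python
import csv
import io


def edge_table(adjacency, network_type='Undirected', delimiter=','):
    """Render a nested-dict adjacency matrix as a delimited edge table."""
    nodes = list(adjacency)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator='\n')
    writer.writerow(['Source', 'Target', 'Weight', 'Type'])  # assumed headers
    for i, source in enumerate(nodes):
        for j, target in enumerate(nodes):
            if network_type == 'Undirected' and j < i:
                continue  # write each undirected edge only once
            weight = adjacency[source][target]
            if weight != 0:
                writer.writerow([source, target, weight, network_type])
    return buffer.getvalue()


# Hypothetical symmetric adjacency matrix between two cell types
adjacency = {
    'A': {'A': 0, 'B': 0.5},
    'B': {'A': 0.5, 'B': 0},
}

undirected_csv = edge_table(adjacency)
directed_tsv = edge_table(adjacency, network_type='Directed', delimiter='\t')
print(undirected_csv)
```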
generate_network_from_adjacency(adjacency_matrix, package='networkx')
Generates a network or graph object from an adjacency matrix.
Parameters
adjacency_matrix : pandas.DataFrame An adjacency matrix, wherein rows and columns are nodes and values represent the weight of the respective edge.
package : str, default='networkx' Package or python library to build the network. Implemented options are {'networkx'}. Support for 'igraph' is planned.
Returns
network : graph-like A graph object built with a python-library for networks.
Source code in cell2cell/utils/networks.py
parallel_computing
agents_number(n_jobs)
Computes the number of agents/cores/threads that the computer can really provide given a number of jobs/threads requested.
Parameters
n_jobs : int Number of threads for parallelization.
Returns
agents : int Number of threads that the computer can really provide.
Source code in cell2cell/utils/parallel_computing.py
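One plausible way to reconcile a requested number of jobs with the cores actually available is to clamp the request, as sketched below. The function available_agents, its total parameter, and the joblib-like treatment of negative values (where -1 means all cores) are assumptions for illustration and may differ from the library's rules.

```python
import os


def available_agents(n_jobs, total=None):
    """Clamp a requested number of jobs to what the machine can provide."""
    total = total or os.cpu_count() or 1
    if n_jobs < 0:
        # Assumed convention: -1 means all cores, -2 all but one, and so on
        return max(1, total + 1 + n_jobs)
    if n_jobs == 0:
        return 1
    return min(n_jobs, total)


# 'total' is fixed here so the example is reproducible on any machine
print(available_agents(4, total=8))
print(available_agents(16, total=8))
print(available_agents(-1, total=8))
```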
parallel_spatial_ccis(inputs)
Parallel computing in cell2cell.analysis.pipelines.SpatialSingleCellInteractions