LIANA x Tensor-cell2cell Quickstart (R)
This tutorial provides an abbreviated version of Tutorials 01-06 in order to give a quick overview of the pipeline to go from a raw counts matrix to downstream cell-cell communication (CCC) analyses. By combining LIANA and Tensor-cell2cell, we get a general framework that can robustly incorporate many existing CCC inference tools, ultimately retriving consensus communication scores for any sample and analyzing all those samples together to identify context-dependent communication programs.
Initial Setups
Enabling GPU use
First, if you are using a NVIDIA GPU with CUDA cores, set use_gpu=TRUE
and enable PyTorch with the following code block. Otherwise, set use_gpu=FALSE
or skip this part
[1]:
use_gpu <- TRUE
library(reticulate, quietly = TRUE)
[2]:
reticulate::use_condaenv("ccc_protocols", required = TRUE)
[3]:
if (use_gpu){
device <- 'cuda:0'
tensorly <- reticulate::import('tensorly')
tensorly$set_backend('pytorch')
}else{
device <- NULL
}
Libraries
Then, import all the packages we will use in this tutorial:
[4]:
suppressMessages({
library(dplyr, quietly = TRUE)
library(tidyr, quietly = TRUE)
library(purrr, quietly = TRUE)
library(forcats, quietly = TRUE)
library(textshape, quietly = TRUE)
library(ggpubr, quietly = TRUE)
library(rstatix, quietly = TRUE)
library(reshape2, quietly = TRUE)
library(tibble, quietly = TRUE)
library(scater, quietly = TRUE)
library(scuttle, quietly = TRUE)
library(ggplot2, quietly = TRUE)
library(liana, quietly = TRUE)
library(decoupleR, quietly = TRUE)
# library("ExperimentHub")
c2c <- reticulate::import(module = "cell2cell")
})
[5]:
# library(showtext)
# showtext_auto()
Directories
Afterwards, specify the data and output directories:
[6]:
data_folder = '../../data/quickstart_pbmc/'
output_folder = '../../data/quickstart_pbmc/outputs'
for (folder_ in c(data_folder, output_folder)){
if (!dir.exists(folder_)){
dir.create(folder_)
}
}
Loading Data
We begin by loading the single-cell transcriptomics data. For this tutorial, we will use a dataset of ~25k PBMCs from 8 pooled patient lupus samples, each before and after IFN-beta stimulation (Kang et al., 2018; GSE96583). Originally preprocessed for pertpy.
We can download the data and store it in the SingleCellExperiment format:
[7]:
sce <- readRDS(url('https://zenodo.org/records/10069528/files/kang_counts_25k.RDS?download=1'))
Preprocess Expression
Note, we do not include a batch correction step as Tensor-cell2cell can get robust decomposition results without this; however, we have extensive analysis and discuss regarding this topic in Supplementary Tutorial 01
Quality-Control Filtering
The loaded data has already been pre-processed to a degree and comes with cell annotations. Nevertheless, let’s highlight some of the key steps. To ensure that any noisy cells or features are removed, we filter any non-informative cells and genes:
[8]:
# basic filters
sce <- scater::addPerCellQC(sce)
min.features <- 300
min.cells <- 5
[9]:
# basic outlier filtering
qc <- scater::perCellQCMetrics(sce)
# basic feature filtering
sce <- sce[rowSums(counts(sce) >= 1) >= min.cells, ]
# basic cells filtering
sce <- sce[, colSums(counts(sce) >= 1) >= min.features]
We additionally remove high mitochondrial content:
[10]:
is.mito <- grep("MT-", rownames(sce))
per.cell <- scuttle::perCellQCMetrics(sce, subsets=list(Mito=is.mito))
colData(sce)[['subsets_Mito_percent']] <- per.cell$subsets_Mito_percent
sce <- sce[, colData(sce)$subsets_Mito_percent < 15]
sce
class: SingleCellExperiment
dim: 14658 24297
metadata(0):
assays(1): counts
rownames(14658): AL627309.1 RP11-206L10.2 ... S100B PRMT2
rowData names(1): name
colnames(24297): AAACATACATTTCC-1 AAACATACCAGAAA-1 ... TTTGCATGGGACGA-2
TTTGCATGTCTTAC-2
colData names(18): nCount_RNA nFeature_RNA ... total
subsets_Mito_percent
reducedDimNames(2): X_pca X_umap
mainExpName: NULL
altExpNames(0):
Normalization
Normalized counts are usually obtained in two essential steps, the first being count depth scaling which ensures that the measured count depths are comparable across cells. This is then usually followed up with log1p transformation, which essentially stabilizes the variance of the counts and enables the use of linear metrics downstream:
[11]:
scale.factor <- 1e4
lib.sizes <- colSums(counts(sce)) / scale.factor
assay(sce, 'logcounts') <- scuttle::normalizeCounts(sce, size.factors=lib.sizes, log=FALSE, center.size.factors=FALSE)
assay(sce, 'logcounts') <- log1p(assay(sce, 'logcounts'))
Let’s see what our data looks like:
[12]:
head(colData(sce))
DataFrame with 6 rows and 18 columns
nCount_RNA nFeature_RNA tsne1 tsne2 condition
<numeric> <integer> <numeric> <numeric> <factor>
AAACATACATTTCC-1 3017 877 -27.640373 14.96663 ctrl
AAACATACCAGAAA-1 2481 713 -27.493646 28.92489 ctrl
AAACATACCATGCA-1 703 337 -10.468194 -5.98439 ctrl
AAACATACCTCGCT-1 3420 850 -24.367997 20.42928 ctrl
AAACATACCTGGTA-1 3158 1111 27.952170 24.15974 ctrl
AAACATACGATGAA-1 1869 635 -0.470236 -25.39871 ctrl
cluster cell_type patient nCount_SCT nFeature_SCT
<integer> <factor> <factor> <numeric> <integer>
AAACATACATTTCC-1 9 CD14+ Monocytes patient_1016 1704 711
AAACATACCAGAAA-1 9 CD14+ Monocytes patient_1256 1614 662
AAACATACCATGCA-1 3 CD4 T cells patient_1488 908 337
AAACATACCTCGCT-1 9 CD14+ Monocytes patient_1256 1738 653
AAACATACCTGGTA-1 4 Dendritic cells patient_1039 1857 928
AAACATACGATGAA-1 5 CD4 T cells patient_1488 1525 634
integrated_snn_res.0.4 seurat_clusters sample cell_abbr
<factor> <factor> <factor> <factor>
AAACATACATTTCC-1 1 1 ctrl&1016 CD14
AAACATACCAGAAA-1 1 1 ctrl&1256 CD14
AAACATACCATGCA-1 6 6 ctrl&1488 CD4T
AAACATACCTCGCT-1 1 1 ctrl&1256 CD14
AAACATACCTGGTA-1 12 12 ctrl&1039 DCs
AAACATACGATGAA-1 2 2 ctrl&1488 CD4T
sum detected total subsets_Mito_percent
<numeric> <integer> <numeric> <numeric>
AAACATACATTTCC-1 3017 877 3017 0
AAACATACCAGAAA-1 2481 713 2481 0
AAACATACCATGCA-1 703 337 703 0
AAACATACCTCGCT-1 3420 850 3420 0
AAACATACCTGGTA-1 3158 1111 3158 0
AAACATACGATGAA-1 1869 635 1869 0
Define columns to use for downstream analysis:
[13]:
sample_col <- 'sample'
condition_col <- 'condition'
idents_col <- 'cell_abbr' # cell types column from SCE metadata
Deciphering Cell-Cell Communication
we will use LIANA to infer the ligand-receptor interactions for each sample. LIANA is highly modularized and it natively implements the formulations of a number of methods, including CellPhoneDBv2, Connectome, log2FC, NATMI, SingleCellSignalR, CellChat, a geometric mean, as well as a consensus in the form of a rank aggregate from any combination of methods
[14]:
liana::show_methods()
- 'connectome'
- 'logfc'
- 'natmi'
- 'sca'
- 'cellphonedb'
- 'cytotalk'
- 'call_squidpy'
- 'call_cellchat'
- 'call_connectome'
- 'call_sca'
- 'call_italk'
- 'call_natmi'
LIANA classifies the scoring functions from the different methods into two categories: those that infer the “Magnitude” and “Specificity” of interactions. We define the “Magnitude” of an interaction as a measure of the strength of the interaction’s expression, and the “Specificity” of an interaction is a measure of how specific an interaction is to a given pair of clusters. Generally, these categories are complementary, and the magnitude of the interaction is often a proxy of the specificity of the interaction. For example, a ligand-receptor interaction with a high magnitude score in a given pair of cell types is likely to also be specific, and vice versa.
For clarity, here we map each output score (column names in the above dataframe) to the scoring method and scoring type:
Method Name |
Magnitude Score |
Specificity Score |
---|---|---|
CellPhoneDB |
lr.mean |
cellphonedb.pvalue |
Connectome |
prod_weight |
weight_sc |
log2FC |
None |
logfc_comb |
NATMI |
prod_weight |
edge_specificity |
SingleCellSignalR (sca) |
LRscore |
None |
When considering ligand-receptor prior knowledge resources, a common theme is the trade-off between coverage and quality, and similarly each resource comes with its own biases. In this regard, LIANA builds on OmniPath29 as any of the resources in LIANA are obtained via OmniPath. These include the expert-curated resources of CellPhoneDBv223, CellChat27, ICELLNET30, connectomeDB202025, CellTalkDB31, as well as 10 others. LIANA further provides a consensus expert-curated resource from the aforementioned five resources, along with some curated interactions from SignaLink. In this protocol, we will use the consensus resource from liana, though any of the other resources are available via LIANA, and one can also use liana with their own custom resource.
[15]:
liana::show_resources()
- 'Default'
- 'Consensus'
- 'Baccin2019'
- 'CellCall'
- 'CellChatDB'
- 'Cellinker'
- 'CellPhoneDB'
- 'CellTalkDB'
- 'connectomeDB2020'
- 'EMBRACE'
- 'Guide2Pharma'
- 'HPMR'
- 'ICELLNET'
- 'iTALK'
- 'Kirouac2010'
- 'LRdb'
- 'Ramilowski2015'
- 'OmniPath'
- 'MouseConsensus'
Selecting any of the lists of ligand-receptor pairs in LIANA can be done through the following command (here we select the aforementioned “consensus” resource:
[16]:
lr_pairs <- liana::select_resource('Consensus')
[17]:
# # check if ITGAD is any of the receptor.complex strings
# df %>% filter(stringr::str_detect(receptor.complex, 'ITGAD'))
Next, we can run LIANA on each sample across the available methods. By default, LIANA calculates an aggregate rank across these mtehods using a re-implementation of the RobustRankAggregate method, and generates a probability distribution for ligand-receptors that are ranked consistently better than expected under a null hypothesis (See Appendix 2). The consensus of ligand-receptor interactions across methods can therefore be treated as a p-value.
[18]:
sce <- liana_bysample(sce = sce,
sample_col = sample_col, # context dimension column from SCE metadata
idents_col = idents_col, # cell types column from SCE metadata
resource = 'Consensus',
expr_prop = 0.1, # must be expressed in expr_prop fraction of cells
min_cells = 5,
assay.type = 'logcounts', # run on log- and library-normalized counts
aggregate_how = 'both', # aggregate magnitude and specificity
verbose = TRUE,
inplace = TRUE,
permutation.params=list(nperms=100),
return_all=TRUE
)
Current sample: ctrl&101
Running LIANA with `cell_abbr` as labels!
Cell identities with less than 5 cells: Mega were removed!
Warning message in exec(output, ...):
“2414 genes and/or 0 cells were removed as they had no counts!”
Warning message:
“`invoke()` is deprecated as of rlang 0.4.0.
Please use `exec()` or `inject()` instead.
This warning is displayed once every 8 hours.”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Warning message:
“`progress_estimated()` was deprecated in dplyr 1.0.0.
ℹ The deprecated feature was likely used in the liana package.
Please report the issue at <https://github.com/saezlab/liana/issues>.”
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: ctrl&107
Running LIANA with `cell_abbr` as labels!
Warning message in exec(output, ...):
“3320 genes and/or 0 cells were removed as they had no counts!”
Warning message in FUN(...):
“no within-block comparison between Mega and CD4T”
Warning message in FUN(...):
“no within-block comparison between Mega and CD14”
Warning message in FUN(...):
“no within-block comparison between Mega and B”
Warning message in FUN(...):
“no within-block comparison between Mega and NK”
Warning message in FUN(...):
“no within-block comparison between Mega and CD8T”
Warning message in FUN(...):
“no within-block comparison between Mega and FGR3”
Warning message in FUN(...):
“no within-block comparison between Mega and DCs”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: ctrl&1015
Running LIANA with `cell_abbr` as labels!
Cell identities with less than 5 cells: Mega were removed!
Warning message in exec(output, ...):
“1038 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: ctrl&1016
Running LIANA with `cell_abbr` as labels!
Cell identities with less than 5 cells: Mega were removed!
Warning message in exec(output, ...):
“1657 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: ctrl&1039
Running LIANA with `cell_abbr` as labels!
Warning message in exec(output, ...):
“3495 genes and/or 0 cells were removed as they had no counts!”
Warning message in FUN(...):
“no within-block comparison between Mega and CD4T”
Warning message in FUN(...):
“no within-block comparison between Mega and CD14”
Warning message in FUN(...):
“no within-block comparison between Mega and B”
Warning message in FUN(...):
“no within-block comparison between Mega and NK”
Warning message in FUN(...):
“no within-block comparison between Mega and CD8T”
Warning message in FUN(...):
“no within-block comparison between Mega and FGR3”
Warning message in FUN(...):
“no within-block comparison between Mega and DCs”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: ctrl&1244
Running LIANA with `cell_abbr` as labels!
Cell identities with less than 5 cells: Mega were removed!
Warning message in exec(output, ...):
“1464 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: ctrl&1256
Running LIANA with `cell_abbr` as labels!
Cell identities with less than 5 cells: Mega were removed!
Warning message in exec(output, ...):
“1313 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: ctrl&1488
Running LIANA with `cell_abbr` as labels!
Cell identities with less than 5 cells: Mega were removed!
Warning message in exec(output, ...):
“1383 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: stim&101
Running LIANA with `cell_abbr` as labels!
Warning message in exec(output, ...):
“2216 genes and/or 0 cells were removed as they had no counts!”
Warning message in FUN(...):
“no within-block comparison between Mega and CD4T”
Warning message in FUN(...):
“no within-block comparison between Mega and CD14”
Warning message in FUN(...):
“no within-block comparison between Mega and B”
Warning message in FUN(...):
“no within-block comparison between Mega and NK”
Warning message in FUN(...):
“no within-block comparison between Mega and CD8T”
Warning message in FUN(...):
“no within-block comparison between Mega and FGR3”
Warning message in FUN(...):
“no within-block comparison between Mega and DCs”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: stim&107
Running LIANA with `cell_abbr` as labels!
Cell identities with less than 5 cells: Mega were removed!
Warning message in exec(output, ...):
“3606 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: stim&1015
Running LIANA with `cell_abbr` as labels!
Cell identities with less than 5 cells: Mega were removed!
Warning message in exec(output, ...):
“1392 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: stim&1016
Running LIANA with `cell_abbr` as labels!
Warning message in exec(output, ...):
“1920 genes and/or 0 cells were removed as they had no counts!”
Warning message in FUN(...):
“no within-block comparison between Mega and CD4T”
Warning message in FUN(...):
“no within-block comparison between Mega and CD14”
Warning message in FUN(...):
“no within-block comparison between Mega and B”
Warning message in FUN(...):
“no within-block comparison between Mega and NK”
Warning message in FUN(...):
“no within-block comparison between Mega and CD8T”
Warning message in FUN(...):
“no within-block comparison between Mega and FGR3”
Warning message in FUN(...):
“no within-block comparison between Mega and DCs”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: stim&1039
Running LIANA with `cell_abbr` as labels!
Cell identities with less than 5 cells: Mega were removed!
Warning message in exec(output, ...):
“3196 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: stim&1244
Running LIANA with `cell_abbr` as labels!
Cell identities with less than 5 cells: Mega were removed!
Warning message in exec(output, ...):
“2000 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: stim&1256
Running LIANA with `cell_abbr` as labels!
Cell identities with less than 5 cells: Mega were removed!
Warning message in exec(output, ...):
“1471 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
Current sample: stim&1488
Running LIANA with `cell_abbr` as labels!
Warning message in exec(output, ...):
“1227 genes and/or 0 cells were removed as they had no counts!”
Warning message in FUN(...):
“no within-block comparison between Mega and CD4T”
Warning message in FUN(...):
“no within-block comparison between Mega and CD14”
Warning message in FUN(...):
“no within-block comparison between Mega and B”
Warning message in FUN(...):
“no within-block comparison between Mega and NK”
Warning message in FUN(...):
“no within-block comparison between Mega and CD8T”
Warning message in FUN(...):
“no within-block comparison between Mega and FGR3”
Warning message in FUN(...):
“no within-block comparison between Mega and DCs”
LIANA: LR summary stats calculated!
Now Running: Natmi
Now Running: Connectome
Now Running: Logfc
Now Running: Sca
Now Running: Cellphonedb
Now aggregating natmi
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca
Now aggregating cellphonedb
Aggregating Ranks
Now aggregating natmi
Now aggregating connectome
Now aggregating logfc
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb
Aggregating Ranks
The parameterrs used here are as follows:
sce
stands for SingleCellExperiment, and we pass here with an object with a single sample/context.resource
is the name of any of the resources available in lianaidents_col
corresponds to the cell group label stored inadata.obs
.assay.type
indicates which counts assys in the SCE object to use, here the log-normalized counts are assigned tologcounts
, other options include the raw UMI counts stored incounts
expr_prop
is the expression proportion threshold (in terms of cells per cell type expressing the protein)threshold of expression for any protein subunit involved in the interaction, according to which we keep or discard the interactions.min_cells
is the minimum number of cells per cell type required for a cell type to be considered in the analysisaggregate_how
specifies whether to aggregate on magnitude score types, specificity score types, or bothverbose
is a boolean that indicates whether to print the progress of the functioninplace
is a boolean that indicates whether storing the results in place, i.e. toadata.uns[“liana_res”]
.
Let’s see what the results look like:
[19]:
liana_res <- sce@metadata$liana_res %>%
bind_rows(.id = sample_col) %>%
mutate(!!sample_col := factor(.[[sample_col]], levels = unique(.[[sample_col]])))
head(liana_res)
sample | source | target | ligand.complex | receptor.complex | magnitude_rank | specificity_rank | magnitude_mean_rank | natmi.prod_weight | sca.LRscore | cellphonedb.lr.mean | specificity_mean_rank | natmi.edge_specificity | connectome.weight_sc | logfc.logfc_comb | cellphonedb.pvalue |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
<fct> | <chr> | <chr> | <chr> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
ctrl&101 | FGR3 | CD14 | TIMP1 | CD63 | 1.789712e-13 | 3.292801e-07 | 1.000000 | 20.35529 | 0.9750467 | 4.603019 | 192.375 | 0.10078179 | 1.4100654 | 2.550752 | 0 |
ctrl&101 | CD14 | CD14 | TIMP1 | CD63 | 1.431769e-12 | 3.292801e-07 | 2.000000 | 17.40347 | 0.9730681 | 4.203108 | 221.125 | 0.08616690 | 1.2365363 | 2.373106 | 0 |
ctrl&101 | FGR3 | DCs | TIMP1 | CD63 | 4.832222e-12 | 6.258644e-07 | 3.000000 | 15.74580 | 0.9717250 | 4.185146 | 250.625 | 0.07795956 | 1.1598158 | 1.713355 | 0 |
ctrl&101 | DCs | CD14 | TIMP1 | CD63 | 1.145416e-11 | 7.412884e-07 | 4.000000 | 15.38291 | 0.9714029 | 3.929365 | 257.125 | 0.07616284 | 1.1177537 | 1.805500 | 0 |
ctrl&101 | NK | NK | B2M | KLRD1 | 6.138712e-11 | 3.292801e-07 | 6.333333 | 10.83300 | 0.9661083 | 3.916777 | 219.375 | 0.09286514 | 1.6962340 | 1.242624 | 0 |
ctrl&101 | CD14 | DCs | TIMP1 | CD63 | 9.163325e-11 | 1.395631e-06 | 6.000000 | 13.46242 | 0.9694913 | 3.785236 | 294.625 | 0.06665424 | 0.9862868 | 1.535708 | 0 |
This dataframe provides the results from running each method, as well as the consensus scores across the methods. In our case, we are interested in the magnitude consensus scored denoted by the 'magnitude_rank'
column.
We can visualize the output as a dotplot across multiple samples:
[20]:
ligand_complex <- 'B2M'
receptor_complex <- c("KLRD1", "LILRB2", "CD3D")
source_labels <- c("CD4T", "B", "FGR3")
target_labels <- c("CD8T", 'DCs', 'CD14')
colour <- 'magnitude_rank'
size <- "specificity_rank"
lv <- liana_res %>%
filter(source %in% source_labels) %>%
filter(target %in% target_labels) %>%
filter(ligand.complex %in% ligand_complex) %>%
filter(receptor.complex %in% receptor_complex) %>%
unite(lr.pairs, ligand.complex, receptor.complex, sep = ' -> ') %>%
mutate(lr.pairs = factor(lr.pairs, levels = unique(lr.pairs))) %>%
# inverse ranks
mutate_at(vars(specificity_rank, magnitude_rank), ~-log10(. + 1e-10))
[21]:
h_ = 16
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)
ggplot(lv, aes(x = target, y = source, color = .data[[colour]], size=.data[[size]])) +
geom_point() +
facet_wrap(sample ~ lr.pairs, ncol = 12, scales='free_x') +
theme_bw() +
theme(axis.text.x = element_text(angle = 90))

Let’s export it in case we want to access it later:
[22]:
write.csv(liana_res, file.path(output_folder, '/rLIANA_by_sample.csv'))
Alternatively, one could just export the whole SingleCellExperiment object, together with the ligand-receptor results stored:
[23]:
saveRDS(sce, file.path(output_folder, '/sce_processed.rds'))
Comparing cell-cell communication across multiple samples
We can use Tensor-cell2cell to infer context-dependent CCC patterns from multiple samples simultaneously. To do so, we must first restructure the communication scores (LIANA’s output) into a 4D-Communication Tensor.
The tensor is built as follows: we create matrices with the communication scores for each of the ligand-receptor pairs within the same sample, then generate a 3D tensor for each sample, and finally concatenate them to form the 4D tensor:
First, we generate a list containing all samples from our SingleCellExperiment object. Here, we can find the names of each of the samples in the sample
column of the adata.obs information:
[24]:
sorted_samples <- sort(names(sce@metadata$liana_res))
Then we can directly pass the communication scores from LIANA to build the 3D tensors for each sample (panel c in last figure), and concatenate them, with the following function:
[25]:
# build the tensor
context_df_dict <- liana:::preprocess_scores(context_df_dict = sce@metadata[['liana_res']],
invert = TRUE, # transform the scores
invert_fun = function(x) 1-x, # Transformation function
non_negative = TRUE, # fills negative values
non_negative_fill = 0 # set negative values to 0
)
tensor <- liana_tensor_c2c(context_df_dict = context_df_dict,
sender_col = "source", # Column name of the sender cells
receiver_col = "target", # Column name of the receiver cells
ligand_col = "ligand.complex", # Column name of the ligands
receptor_col = "receptor.complex", # Column name of the receptors
score_col = 'magnitude_rank', # Column name of the communication scores to use
how='outer_cells', # What to include across all samples
outer_fraction=1/3, # Fraction of samples as threshold to include cells and LR pairs.
context_order=sorted_samples, # Order to store the contexts in the tensor
conda_env = 'ccc_protocols', # used to pass an existing conda env with cell2cell
build_only = TRUE, # set this to FALSE to combine the downstream rank selection and decomposition steps all here
device = device # Device to use when backend is pytorch.
)
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
Inverting `magnitude_rank`!
[1] 0
Loading `ccc_protocols` Conda Environment
Building the tensor using magnitude_rank...
The key parameters when building a tensor are:
context_df_dict
is the dataframe containing the results from LIANA, usually located insce@metadata[['liana_res']]
.sender_col
,receiver_col
,ligand_col
,receptor_col
, andscore_col
are the column names in the dataframe containing the samples, sender cells, receiver cells, ligands, receptors, and communication scores, respectively. Each row of the dataframe contains a unique combination of these elements.invert
andinvert_fun
control the function we use to convert the communication score before using it to build the tensor. In this case, the ‘magnitude_rank’ score generated by LIANA considers low values as the most important ones, ranging from 0 to 1. In contrast, Tensor-cell2cell requires higher values to be the most important scores, so here we pass a function (function(x) 1-x
) to adapt LIANA’s magnitude-rank scores (subtracts the LIANA’s score from 1).how
controls which ligand-receptor pairs and cell types to include when building the tensor. This decision depends on whether the missing values across a number of samples for both ligand-receptor interactions and sender-receiver cell pairs are considered to be biologically-relevant. Options are:'inner'
is the more strict option since it only considers only cell types and LR pairs that are present in all contexts (intersection).'outer'
considers all cell types and LR pairs that are present across contexts (union).'outer_lrs'
considers only cell types that are present in all contexts (intersection), while all LR pairs that are present across contexts (union).'outer_cells'
considers only LR pairs that are present in all contexts (intersection), while all cell types that are present across contexts (union).
outer_frac
controls the elements to include in the union scenario of thehow
options. Only elements that are present at least in this fraction of samples/contexts will be included. When this value is 0, the tensor includes all elements across the samples. When this value is 1, it acts as usinghow='inner'
.context_order
is a list specifying the order of the samples. The order of samples does not affect the results, but it is useful for posterior visualizations.
We can check the shape of this tensor to verify the number of samples, LR pairs, sender cells, adn receiver cells, respectively:
[26]:
tensor$shape
torch.Size([16, 467, 7, 7])
We can export our tensor:
[27]:
reticulate::py_save_object(object = tensor,
filename = file.path(output_folder, 'pbmc.pkl'))
Then, we can load it with:
[28]:
tensor <- reticulate::py_load_object(filename = file.path(output_folder, 'pbmc.pkl'))
Perform Tensor Factorization
Now that we have built the tensor and its metadata, we can run Tensor Component Analysis via Tensor-cell2cell with one simple command that we implemented for our unified framework:
[29]:
tensor <- liana::decompose_tensor(tensor = tensor,
rank = 7, # Number of factors to perform the factorization. If NULL, it is automatically determined by an elbow analysis
tf_optimization = 'regular', # To define how robust we want the analysis to be.
seed = 0, # Random seed for reproducibility
factors_only = FALSE,
)
Decomposing the tensor...
Key parameters are:
rank
is the number of factors or latent patterns we want to obtain from the analysis. You can either indicate a specific number or leave it asNone
to perform the decomposition with a suggested number from an elbow analysis.tf_optimization
indicates whether running the analysis in the'regular'
or the'robust'
way. The'regular'
way runs the tensor decomposition less number of times than the robust way to select an optimal result. Additionally, the former employs less strict convergence parameters to obtain optimal results than the latter, which is also translated into a faster generation of results. Important: When usingtf_optimization='robust'
the analysis takes much longer to run than usingtf_optimization='regular'
. However, the latter may generate less robust results.seed
is the seed for randomization. It controls the randomization used when initializing the optimization algorithm that performs the tensor decomposition. It is useful for reproducing the same result every time that the analysis is run. IfNULL
, a different randomization will be used each time.
[30]:
print(paste0('The estimated tensor rank is: ', tensor$rank))
[1] "The estimated tensor rank is: 7"
[31]:
# # Estimate standard error
# error_average <- tensor$elbow_metric_raw %>%
# t() %>%
# as.data.frame() %>%
# mutate(rank=row_number()) %>%
# pivot_longer(-rank, names_to = "run_no", values_to = "error") %>%
# group_by(rank) %>%
# summarize(average = mean(error),
# N = n(),
# SE.low = average - (sd(error)/sqrt(N)),
# SE.high = average + (sd(error)/sqrt(N))
# )
# # plot
# error_average %>%
# ggplot(aes(x=rank, y=average), group=1) +
# geom_line(col='red') +
# geom_ribbon(aes(ymin = SE.low, ymax = SE.high), alpha = 0.1) +
# geom_vline(xintercept = tensor$rank, colour='darkblue') + # rank of interest
# theme_bw() +
# labs(y="Error", x="Rank")
[32]:
sce@metadata$tensor_res <- liana:::format_c2c_factors(tensor$factors)
h_ = 13
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)
plot_c2c_overview(sce = sce, group_col = condition_col, sample_col = sample_col)

The figure representing the loadings in each factor generated here can be interpreted by interconnecting all dimensions within a single factor. For example, if we take one of these factors, the cell-cell communication program occurs in each sample proportionally to their loadings. Then, this signature can be interpreted with the loadings of the ligand-receptor pairs, sender cells, and receiver cells. Ligands in high-loading ligand-receptor pairs are sent predominantly by high-loading sender cells, and interact with the cognate receptors on the high-loadings receiver cells.
Downstream Visualizations: Making sense of the factors
After running the decomposition, the results are stored in the factors
attribute of the tensor object. This attribute is a dictionary containing the loadings for each of the elements in every tensor dimension. Loadings delineate the importance of that particular element to that factor. Keys are the names of the different dimension.
[33]:
names(tensor$factors)
- 'contexts'
- 'interactions'
- 'senders'
- 'receivers'
We can inspect the loadings of the samples, for example, located under the key 'Contexts'
:
[34]:
tensor$factors[['contexts']]
Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 | Factor 6 | Factor 7 | |
---|---|---|---|---|---|---|---|
<dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | |
ctrl&101 | 0.2753879 | 0.2505314 | 0.2690610 | 0.2148839 | 3.192032e-10 | 0.2126195 | 0.2452438 |
ctrl&1015 | 0.3073317 | 0.2915613 | 0.2911225 | 0.2295018 | 2.224032e-02 | 0.2348130 | 0.2771375 |
ctrl&1016 | 0.2563827 | 0.2189345 | 0.2331933 | 0.2479214 | 2.264486e-02 | 0.2436604 | 0.2151357 |
ctrl&1039 | 0.2990260 | 0.2944541 | 0.2803146 | 0.2287531 | 1.721946e-02 | 0.2271619 | 0.2805458 |
ctrl&107 | 0.2986128 | 0.2080037 | 0.2611433 | 0.2766160 | 5.844633e-05 | 0.2261954 | 0.2447433 |
ctrl&1244 | 0.2911649 | 0.2651903 | 0.2765718 | 0.2586165 | 6.973772e-04 | 0.2399731 | 0.2612663 |
ctrl&1256 | 0.2946287 | 0.2670734 | 0.2826619 | 0.1966789 | 4.576398e-02 | 0.2093361 | 0.2895984 |
ctrl&1488 | 0.2718269 | 0.2215898 | 0.2579891 | 0.2383075 | 2.803234e-07 | 0.1915762 | 0.2818053 |
stim&101 | 0.2334743 | 0.2181655 | 0.2463230 | 0.2190290 | 3.502629e-01 | 0.2658829 | 0.2658360 |
stim&1015 | 0.1985835 | 0.3148001 | 0.2269200 | 0.2395267 | 4.272637e-01 | 0.2638037 | 0.2652130 |
stim&1016 | 0.2111627 | 0.2675103 | 0.2396157 | 0.2353978 | 3.159429e-01 | 0.2588787 | 0.2261185 |
stim&1039 | 0.2230813 | 0.2231751 | 0.2152400 | 0.2542573 | 2.852869e-01 | 0.2738154 | 0.1957329 |
stim&107 | 0.2012976 | 0.1892416 | 0.2373782 | 0.3299971 | 3.240025e-01 | 0.2905101 | 0.2175857 |
stim&1244 | 0.1902167 | 0.2339165 | 0.2275528 | 0.3002013 | 3.769016e-01 | 0.2738758 | 0.2494139 |
stim&1256 | 0.2092426 | 0.2733710 | 0.2208163 | 0.2139765 | 3.677670e-01 | 0.2668102 | 0.2410888 |
stim&1488 | 0.1764407 | 0.2241510 | 0.2143701 | 0.2797826 | 3.575881e-01 | 0.2941726 | 0.2202224 |
Downstream Visualizations
We can use these loadings to compare pairs of sample major groups with boxplots and statistical tests:
[35]:
sample_loadings <- liana::get_c2c_factors(sce,
sample_col=sample_col,
group_col=condition_col) %>%
purrr::pluck("contexts") %>%
pivot_longer(-c('condition', 'context'),
names_to = "factor",
values_to = "loadings") %>%
mutate(factor = factor(factor)) #%>%
# mutate(factor = fct_relevel(factor, "Factor.10", after=9))
h_ = 10
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)
# Do pairwise tests
pwc <- sample_loadings %>%
group_by(factor) %>%
rstatix::pairwise_t_test(
loadings ~ condition,
p.adjust.method = "fdr"
)
[36]:
# plot
pwc <- pwc %>% add_xy_position(x = condition_col)
ggboxplot(sample_loadings, x = condition_col, y = "loadings", add = "point", fill=condition_col) +
facet_wrap(~factor, ncol = 4) +
stat_pvalue_manual(pwc) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Using the loadings for any dimension, we can also generate heatmaps for the elements with loadings above a certain threshold. Additionally, we can cluster these elements by the similarity of their loadings across all factors:
[37]:
library(ComplexHeatmap)
pushViewport(viewport(gp = gpar(fontfamily = "Arial")))
h_ = 10
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)
liana::plot_lr_heatmap(sce = sce, n = 5)
Loading required package: grid
========================================
ComplexHeatmap version 2.14.0
Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
Github page: https://github.com/jokergoo/ComplexHeatmap
Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
If you use it in published research, please cite either one:
- Gu, Z. Complex Heatmap Visualization. iMeta 2022.
- Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional
genomic data. Bioinformatics 2016.
The new InteractiveComplexHeatmap package can directly export static
complex heatmaps into an interactive Shiny app with zero effort. Have a try!
This message can be suppressed by:
suppressPackageStartupMessages(library(ComplexHeatmap))
========================================

Here patients are grouped by the importance that each communication pattern (factor) has in relation to the other patients. This captures combinations of related communication patterns that explain similarities and differences at a sample-specific resolution.
Overall CCI potential: Heatmap and network visualizations of sender-receiver cell pairs
In addition, we can also evaluate the overall interactions between sender-receiver cell pairs that are determinant for a given factor or program. We can do it through a heatmap where the X-axis represent the receiver cells and the Y-axis shows the receiver cells. Here, the potential of interaction is calculated as the outer product between the loadings for the sender and receiver cells dimensions of a particular factor. To illustrate this, we chose Factor 5, but this can be repeated for every factor obtained in the decomposition.
[38]:
selected_factor <- 'Factor.5'
liana::plot_c2c_cells(sce = sce,
factor_of_int = selected_factor,
name = "Loadings \nProduct")

Similarly, an interaction network can be created for each factor by using the loading product between sender and receiver cells. First we need to choose a threshold to indicate what pair of cells are interacting. This can be done as shown in the extended tutorial
[39]:
threshold = 0.075
Then, we can plot all networks we are interested in:
[40]:
# format for use with cell2cell
format_py_df<-function(df){
df<-df %>%
as.data.frame() %>%
textshape::column_to_rownames() %>%
select(-contains("condition"))
return(df)
}
factors <- liana::get_c2c_factors(sce = sce,
sample_col = sample_col,
group_col = condition_col)
py_factors<-unname(lapply(factors, function(df) format_py_df(df)))
py_factors = reticulate::py_dict(keys = names(factors), values = py_factors)
c2c$plotting$ccc_networks_plot(py_factors,
included_factors=c('Factor.1', 'Factor.5', 'Factor.6'),
sender_label='senders',
receiver_label='receivers',
ccc_threshold=threshold, # Only important communication
nrows=as.integer(1),
filename=file.path(output_folder, 'network_plot_qsp_r.png')
)
[[1]]
<Figure size 2400x800 with 3 Axes>
[[2]]
[[2]][[1]]
<Axes: title={'center': 'Factor.1'}>
[[2]][[2]]
<Axes: title={'center': 'Factor.5'}>
[[2]][[3]]
<Axes: title={'center': 'Factor.6'}>
Pathway Enrichment Analysis: Interpreting the context-driven communication
Classical Pathway Enrichment
As the number of inferred interactions increases, the interpretation of the inferred cell-cell communication networks becomes more challenging. To this end, we can perform pathway enrichment analysis to identify the general biological processes that are enriched in the inferred interactions. Here, we will perform classical gene set enrichment analysis with Reactome
gene sets.
For the pathway enrichment analysis with GSEA, we use ligand-receptor pairs instead of individual genes. Reactome
was initially designed to work with sets of genes, so first we need to generate ligand-receptor sets for each of it’s pathways:
[41]:
# Generate list with ligand-receptors pairs in DB
lr_pairs <- liana::select_resource('Consensus')[[1]] %>%
select(ligand = source_genesymbol, receptor = target_genesymbol)
lr_list <- lr_pairs %>%
unite('interaction', ligand, receptor, sep = '^') %>%
pull(interaction)
# Specify the organism and pathway database to use for building the LR set
organism = "human"
pathwaydb = "Reactome"
# Generate ligand-receptor gene sets
lr_set <- c2c$external$generate_lr_geneset(lr_list = lr_list,
complex_sep='_', # Separation symbol of the genes in the protein complex
lr_sep='^', # Separation symbol between a ligand and a receptor complex
organism=organism,
pathwaydb=pathwaydb,
readable_name=TRUE
)
Next, we can perform enrichment analysis on each factor using the loadings of the ligand-receptor pairs to obtain the normalized-enrichment scores (NES) and corresponding P-values from GSEA:
[42]:
lr_loadings = factors[['interactions']] %>%
column_to_rownames("lr")
gsea_res <- c2c$external$run_gsea(loadings=lr_loadings,
lr_set=lr_set,
output_folder=output_folder,
weight=as.integer(1),
min_size=as.integer(15),
permutations=as.integer(999),
processes=as.integer(6),
random_state=as.integer(6),
significance_threshold=as.numeric(0.05)
)
names(gsea_res) <- c('pvals', 'scores', 'gsea_df')
The enriched pathways for each factor are:
[43]:
gsea_df <- gsea_res$gsea_df
gsea_df %>%
filter(`Adj. P-value` < 0.05 & NES > 0) %>%
arrange(desc(abs(NES)))
Factor | Term | NES | P-value | Adj. P-value | |
---|---|---|---|---|---|
<chr> | <chr> | <dbl> | <dbl> | <dbl> | |
0 | Factor.1 | IMMUNOREGULATORY INTERACTIONS BETWEEN A LYMPHOID AND A NON LYMPHOID CELL | 1.208194 | 0.001000000 | 0.01611612 |
46 | Factor.3 | NEUTROPHIL DEGRANULATION | 1.156093 | 0.001000000 | 0.01611612 |
92 | Factor.5 | SIGNALING BY INTERLEUKINS | 1.137917 | 0.003003003 | 0.03453453 |
2 | Factor.1 | ADAPTIVE IMMUNE SYSTEM | 1.136481 | 0.001000000 | 0.01611612 |
93 | Factor.5 | IMMUNOREGULATORY INTERACTIONS BETWEEN A LYMPHOID AND A NON LYMPHOID CELL | 1.123732 | 0.002002002 | 0.02479402 |
47 | Factor.3 | IMMUNOREGULATORY INTERACTIONS BETWEEN A LYMPHOID AND A NON LYMPHOID CELL | 1.122184 | 0.002002002 | 0.02479402 |
23 | Factor.2 | INFECTIOUS DISEASE | 1.121033 | 0.001000000 | 0.01611612 |
48 | Factor.3 | ADAPTIVE IMMUNE SYSTEM | 1.116904 | 0.001000000 | 0.01611612 |
24 | Factor.2 | IMMUNOREGULATORY INTERACTIONS BETWEEN A LYMPHOID AND A NON LYMPHOID CELL | 1.116482 | 0.001000000 | 0.01611612 |
25 | Factor.2 | ADAPTIVE IMMUNE SYSTEM | 1.101746 | 0.001000000 | 0.01611612 |
49 | Factor.3 | INNATE IMMUNE SYSTEM | 1.099481 | 0.002002002 | 0.02479402 |
69 | Factor.4 | IMMUNOREGULATORY INTERACTIONS BETWEEN A LYMPHOID AND A NON LYMPHOID CELL | 1.095260 | 0.001001001 | 0.01611612 |
70 | Factor.4 | ADAPTIVE IMMUNE SYSTEM | 1.068020 | 0.001001001 | 0.01611612 |
117 | Factor.6 | CYTOKINE SIGNALING IN IMMUNE SYSTEM | 1.066745 | 0.001001001 | 0.01611612 |
The depleted pathways are:
[44]:
gsea_df %>%
filter(`Adj. P-value` < 0.05 & NES < 0) %>%
arrange(desc(abs(NES)))
Factor | Term | NES | P-value | Adj. P-value |
---|---|---|---|---|
<chr> | <chr> | <dbl> | <dbl> | <dbl> |
Finally, we can visualize the enrichment results using a dotplot:
[45]:
gsea.dotplot<-function(pval_df, score_df, significance = 0.05, font_size = 15){
pval_df <- pval_df[apply(pval_df<significance,1,any), apply(pval_df<significance, 2, any)] %>%
mutate_all(function(x) -1*log10(x + 1e-9))
score_df <- score_df[rownames(pval_df), colnames(pval_df)]
pval_df <- pval_df %>%
tibble::rownames_to_column('Annotation')
score_df <- score_df %>%
tibble::rownames_to_column('Annotation')
viz_df <- cbind(melt(pval_df), melt(score_df)[[3]])
names(viz_df) <- c('Annotation', 'Factor', 'Significance', 'NES')
dotplot <- ggplot(viz_df, aes(x = Factor, y = Annotation)) + geom_point(aes(size = Significance, color = NES)) +
scale_color_gradient2() + theme_bw(base_size = font_size) +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=0.5))+
scale_size(range = c(0, 8)) +
guides(size=guide_legend(title="-log10(p-value)"))
return(dotplot)
}
[46]:
h_ = 10
w_ = 12
options(repr.plot.height=h_, repr.plot.width=w_)
dotplot <- gsea.dotplot(pval_df = gsea_res$pvals,
score_df = gsea_res$scores,
significance = 0.05)
dotplot
Using Annotation as id variables
Using Annotation as id variables

Footprint Enrichment
Footprint enrichment analysis build upon classic geneset enrichment analysis, as instead of considering the genes involved in a biological activity, they consider the genes affected by the activity, or in other words the genes that change downstream of said activity (Dugourd and Saez-Rodriguez, 2019).
In this case, we will use the PROGENy pathway resource to perform footprint enrichment analysis. PROGENy was built in a data-driven manner using perturbation and cancer lineage data (Schubert et al, 2019), as a consequence it also assigns different importances or weights to each gene in its pathway genesets. To this end, we need an enrichment method that can take weights into account, and here we will use multi-variate linear models
from the decoupler-py
package to perform this analysis (Badia-i-Mompel et al., 2022).
Let’s load the PROGENy genesets and then convert them to sets of weighted ligand-receptor pairs:
[47]:
# obtain progeny gene sets
net <- decoupleR::get_progeny(organism = 'human', top=5000) %>%
dplyr::select(-p_value)
# convert to LR sets
progeny_lr <- liana::generate_lr_geneset(sce = sce,
resource = net)
Next, we can run the footprint enrichment analysis:
[48]:
# interaction loadings to matrix
mat <- factors$interactions %>%
textshape::column_to_rownames("lr") %>%
as.matrix()
# run enrichment analysis with decoupler
# (we fit a univariate linear model for each gene set)
# We don't consider genesets with minsize < 10
res <- decoupleR::run_ulm(mat = mat,
network = progeny_lr,
.source = "set",
.target = "lr",
minsize=10) %>%
mutate(p_adj = p.adjust(p_value, method = "fdr"))
Finally, we can visualize the results:
[49]:
res %>% # sig/isnig flag
mutate(significant = if_else(p_adj <= 0.05, "signif.", "not")) %>%
ggplot(aes(x=source, y=condition, shape=significant,
colour=score, size=-log10(p_value+1e-36))) +
geom_point() +
scale_colour_gradient2(high = "red", low="blue") +
scale_size_continuous(range = c(3, 12)) +
scale_shape_manual(values=c(21, 16)) +
theme_bw(base_size = 15) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
labs(x="Pathway",
y="Factor",
colour="Activity"
)

Let’s zoom in on Factor 5 which is associated with interferon-beta stimulation.
[50]:
selected_factor = 'Factor.5'
sample_loadings <- liana::get_c2c_factors(sce,
sample_col=sample_col,
group_col=condition_col) %>%
purrr::pluck("contexts") %>%
pivot_longer(-c(condition_col, 'context'),
names_to = "factor",
values_to = "loadings") %>%
mutate(factor = factor(factor))
res.factor <- res %>%
filter(condition == selected_factor) %>%
mutate(p_adj = p.adjust(p_value, method = "fdr")) %>%
arrange(desc(score)) %>%
mutate(source = factor(source, levels = source))
ggplot(data=res.factor, aes(x=score, y=source, fill = score)) +
geom_bar(stat="identity", width=0.5) + theme_bw() + scale_fill_gradient2(trans = 'reverse') +
ylab('') + xlab('Activity')
Warning message:
“Using an external vector in selections was deprecated in tidyselect 1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
# Was:
data %>% select(condition_col)
# Now:
data %>% select(all_of(condition_col))
See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.”

[ ]: