LIANA x Tensor-cell2cell Quickstart (R)

This tutorial provides an abbreviated version of Tutorials 01-06 in order to give a quick overview of the pipeline to go from a raw counts matrix to downstream cell-cell communication (CCC) analyses. By combining LIANA and Tensor-cell2cell, we get a general framework that can robustly incorporate many existing CCC inference tools, ultimately retriving consensus communication scores for any sample and analyzing all those samples together to identify context-dependent communication programs.

Initial Setups

Enabling GPU use

First, if you are using a NVIDIA GPU with CUDA cores, set use_gpu=TRUE and enable PyTorch with the following code block. Otherwise, set use_gpu=FALSE or skip this part

[1]:

use_gpu <- TRUE
library(reticulate, quietly = TRUE)

[2]:

reticulate::use_condaenv("ccc_protocols", required = TRUE)

[3]:

if (use_gpu){
    device <- 'cuda:0'
    tensorly <- reticulate::import('tensorly')
    tensorly$set_backend('pytorch')
}else{
    device <- NULL
}

Libraries

Then, import all the packages we will use in this tutorial:

[4]:

suppressMessages({
    library(dplyr, quietly = TRUE)
    library(tidyr, quietly = TRUE)
    library(purrr, quietly = TRUE)
    library(forcats, quietly = TRUE)
    library(textshape, quietly = TRUE)
    library(ggpubr, quietly = TRUE)
    library(rstatix, quietly = TRUE)
    library(reshape2, quietly = TRUE)
    library(tibble, quietly = TRUE)

    library(scater, quietly = TRUE)
    library(scuttle, quietly = TRUE)

    library(ggplot2, quietly = TRUE)

    library(liana, quietly = TRUE)
    library(decoupleR, quietly = TRUE)
#     library("ExperimentHub")
    c2c <- reticulate::import(module = "cell2cell")
})

[5]:

# library(showtext)
# showtext_auto()

Directories

Afterwards, specify the data and output directories:

[6]:

data_folder = '../../data/quickstart_pbmc/'
output_folder = '../../data/quickstart_pbmc/outputs'

for (folder_ in c(data_folder, output_folder)){
    if (!dir.exists(folder_)){
        dir.create(folder_)
    }
}

Loading Data

We begin by loading the single-cell transcriptomics data. For this tutorial, we will use a dataset of ~25k PBMCs from 8 pooled patient lupus samples, each before and after IFN-beta stimulation (Kang et al., 2018; GSE96583). Originally preprocessed for pertpy.

We can download the data and store it in the SingleCellExperiment format:

[7]:

sce <- readRDS(url('https://zenodo.org/records/10069528/files/kang_counts_25k.RDS?download=1'))

Preprocess Expression

Note, we do not include a batch correction step as Tensor-cell2cell can get robust decomposition results without this; however, we have extensive analysis and discuss regarding this topic in Supplementary Tutorial 01

Quality-Control Filtering

The loaded data has already been pre-processed to a degree and comes with cell annotations. Nevertheless, let’s highlight some of the key steps. To ensure that any noisy cells or features are removed, we filter any non-informative cells and genes:

[8]:

# basic filters
sce <- scater::addPerCellQC(sce)
min.features <- 300
min.cells <- 5

[9]:

# basic outlier filtering
qc <- scater::perCellQCMetrics(sce)

# basic feature filtering
sce <- sce[rowSums(counts(sce) >= 1) >= min.cells, ]

# basic cells filtering
sce <- sce[, colSums(counts(sce) >= 1) >= min.features]

We additionally remove high mitochondrial content:

[10]:

is.mito <- grep("MT-", rownames(sce))
per.cell <- scuttle::perCellQCMetrics(sce, subsets=list(Mito=is.mito))
colData(sce)[['subsets_Mito_percent']] <- per.cell$subsets_Mito_percent
sce <- sce[, colData(sce)$subsets_Mito_percent < 15]
sce

class: SingleCellExperiment
dim: 14658 24297
metadata(0):
assays(1): counts
rownames(14658): AL627309.1 RP11-206L10.2 ... S100B PRMT2
rowData names(1): name
colnames(24297): AAACATACATTTCC-1 AAACATACCAGAAA-1 ... TTTGCATGGGACGA-2
  TTTGCATGTCTTAC-2
colData names(18): nCount_RNA nFeature_RNA ... total
  subsets_Mito_percent
reducedDimNames(2): X_pca X_umap
mainExpName: NULL
altExpNames(0):

Normalization

Normalized counts are usually obtained in two essential steps, the first being count depth scaling which ensures that the measured count depths are comparable across cells. This is then usually followed up with log1p transformation, which essentially stabilizes the variance of the counts and enables the use of linear metrics downstream:

[11]:

scale.factor <- 1e4
lib.sizes <- colSums(counts(sce)) / scale.factor
assay(sce, 'logcounts') <- scuttle::normalizeCounts(sce, size.factors=lib.sizes, log=FALSE, center.size.factors=FALSE)
assay(sce, 'logcounts') <- log1p(assay(sce, 'logcounts'))

Let’s see what our data looks like:

[12]:

head(colData(sce))

DataFrame with 6 rows and 18 columns
                 nCount_RNA nFeature_RNA      tsne1     tsne2 condition
                  <numeric>    <integer>  <numeric> <numeric>  <factor>
AAACATACATTTCC-1       3017          877 -27.640373  14.96663      ctrl
AAACATACCAGAAA-1       2481          713 -27.493646  28.92489      ctrl
AAACATACCATGCA-1        703          337 -10.468194  -5.98439      ctrl
AAACATACCTCGCT-1       3420          850 -24.367997  20.42928      ctrl
AAACATACCTGGTA-1       3158         1111  27.952170  24.15974      ctrl
AAACATACGATGAA-1       1869          635  -0.470236 -25.39871      ctrl
                   cluster       cell_type      patient nCount_SCT nFeature_SCT
                 <integer>        <factor>     <factor>  <numeric>    <integer>
AAACATACATTTCC-1         9 CD14+ Monocytes patient_1016       1704          711
AAACATACCAGAAA-1         9 CD14+ Monocytes patient_1256       1614          662
AAACATACCATGCA-1         3 CD4 T cells     patient_1488        908          337
AAACATACCTCGCT-1         9 CD14+ Monocytes patient_1256       1738          653
AAACATACCTGGTA-1         4 Dendritic cells patient_1039       1857          928
AAACATACGATGAA-1         5 CD4 T cells     patient_1488       1525          634
                 integrated_snn_res.0.4 seurat_clusters    sample cell_abbr
                               <factor>        <factor>  <factor>  <factor>
AAACATACATTTCC-1                     1               1  ctrl&1016      CD14
AAACATACCAGAAA-1                     1               1  ctrl&1256      CD14
AAACATACCATGCA-1                     6               6  ctrl&1488      CD4T
AAACATACCTCGCT-1                     1               1  ctrl&1256      CD14
AAACATACCTGGTA-1                     12              12 ctrl&1039      DCs
AAACATACGATGAA-1                     2               2  ctrl&1488      CD4T
                       sum  detected     total subsets_Mito_percent
                 <numeric> <integer> <numeric>            <numeric>
AAACATACATTTCC-1      3017       877      3017                    0
AAACATACCAGAAA-1      2481       713      2481                    0
AAACATACCATGCA-1       703       337       703                    0
AAACATACCTCGCT-1      3420       850      3420                    0
AAACATACCTGGTA-1      3158      1111      3158                    0
AAACATACGATGAA-1      1869       635      1869                    0

Define columns to use for downstream analysis:

[13]:

sample_col <- 'sample'
condition_col <- 'condition'
idents_col <- 'cell_abbr' # cell types column from SCE metadata

Deciphering Cell-Cell Communication

we will use LIANA to infer the ligand-receptor interactions for each sample. LIANA is highly modularized and it natively implements the formulations of a number of methods, including CellPhoneDBv2, Connectome, log2FC, NATMI, SingleCellSignalR, CellChat, a geometric mean, as well as a consensus in the form of a rank aggregate from any combination of methods

[14]:

liana::show_methods()

'connectome'
'logfc'
'natmi'
'sca'
'cellphonedb'
'cytotalk'
'call_squidpy'
'call_cellchat'
'call_connectome'
'call_sca'
'call_italk'
'call_natmi'

LIANA classifies the scoring functions from the different methods into two categories: those that infer the “Magnitude” and “Specificity” of interactions. We define the “Magnitude” of an interaction as a measure of the strength of the interaction’s expression, and the “Specificity” of an interaction is a measure of how specific an interaction is to a given pair of clusters. Generally, these categories are complementary, and the magnitude of the interaction is often a proxy of the specificity of the interaction. For example, a ligand-receptor interaction with a high magnitude score in a given pair of cell types is likely to also be specific, and vice versa.

For clarity, here we map each output score (column names in the above dataframe) to the scoring method and scoring type:

Method Name	Magnitude Score	Specificity Score
CellPhoneDB	lr.mean	cellphonedb.pvalue
Connectome	prod_weight	weight_sc
log2FC	None	logfc_comb
NATMI	prod_weight	edge_specificity
SingleCellSignalR (sca)	LRscore	None

When considering ligand-receptor prior knowledge resources, a common theme is the trade-off between coverage and quality, and similarly each resource comes with its own biases. In this regard, LIANA builds on OmniPath29 as any of the resources in LIANA are obtained via OmniPath. These include the expert-curated resources of CellPhoneDBv223, CellChat27, ICELLNET30, connectomeDB202025, CellTalkDB31, as well as 10 others. LIANA further provides a consensus expert-curated resource from the aforementioned five resources, along with some curated interactions from SignaLink. In this protocol, we will use the consensus resource from liana, though any of the other resources are available via LIANA, and one can also use liana with their own custom resource.

[15]:

liana::show_resources()

'Default'
'Consensus'
'Baccin2019'
'CellCall'
'CellChatDB'
'Cellinker'
'CellPhoneDB'
'CellTalkDB'
'connectomeDB2020'
'EMBRACE'
'Guide2Pharma'
'HPMR'
'ICELLNET'
'iTALK'
'Kirouac2010'
'LRdb'
'Ramilowski2015'
'OmniPath'
'MouseConsensus'

Selecting any of the lists of ligand-receptor pairs in LIANA can be done through the following command (here we select the aforementioned “consensus” resource:

[16]:

lr_pairs <- liana::select_resource('Consensus')

8501ec5f30e04590a2b8a83e90de5e1e

[17]:

# # check if ITGAD is any of the receptor.complex strings
# df %>% filter(stringr::str_detect(receptor.complex, 'ITGAD'))

Next, we can run LIANA on each sample across the available methods. By default, LIANA calculates an aggregate rank across these mtehods using a re-implementation of the RobustRankAggregate method, and generates a probability distribution for ligand-receptors that are ranked consistently better than expected under a null hypothesis (See Appendix 2). The consensus of ligand-receptor interactions across methods can therefore be treated as a p-value.

[18]:

sce <- liana_bysample(sce = sce,
                      sample_col = sample_col, # context dimension column from SCE metadata
                      idents_col = idents_col, # cell types column from SCE metadata
                      resource = 'Consensus',
                      expr_prop = 0.1, # must be expressed in expr_prop fraction of cells
                      min_cells = 5,
                      assay.type = 'logcounts', # run on log- and library-normalized counts
                      aggregate_how = 'both', # aggregate magnitude and specificity
                      verbose = TRUE,
                      inplace = TRUE,
                      permutation.params=list(nperms=100),
                      return_all=TRUE
                      )

Current sample: ctrl&101

Running LIANA with `cell_abbr` as labels!

Cell identities with less than 5 cells: Mega were removed!

Warning message in exec(output, ...):
“2414 genes and/or 0 cells were removed as they had no counts!”
Warning message:
“`invoke()` is deprecated as of rlang 0.4.0.
Please use `exec()` or `inject()` instead.
This warning is displayed once every 8 hours.”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Warning message:
“`progress_estimated()` was deprecated in dplyr 1.0.0.
ℹ The deprecated feature was likely used in the liana package.
  Please report the issue at <https://github.com/saezlab/liana/issues>.”
Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: ctrl&107

Running LIANA with `cell_abbr` as labels!

Warning message in exec(output, ...):
“3320 genes and/or 0 cells were removed as they had no counts!”
Warning message in FUN(...):
“no within-block comparison between Mega and CD4T”
Warning message in FUN(...):
“no within-block comparison between Mega and CD14”
Warning message in FUN(...):
“no within-block comparison between Mega and B”
Warning message in FUN(...):
“no within-block comparison between Mega and NK”
Warning message in FUN(...):
“no within-block comparison between Mega and CD8T”
Warning message in FUN(...):
“no within-block comparison between Mega and FGR3”
Warning message in FUN(...):
“no within-block comparison between Mega and DCs”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: ctrl&1015

Running LIANA with `cell_abbr` as labels!

Cell identities with less than 5 cells: Mega were removed!

Warning message in exec(output, ...):
“1038 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: ctrl&1016

Running LIANA with `cell_abbr` as labels!

Cell identities with less than 5 cells: Mega were removed!

Warning message in exec(output, ...):
“1657 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: ctrl&1039

Running LIANA with `cell_abbr` as labels!

Warning message in exec(output, ...):
“3495 genes and/or 0 cells were removed as they had no counts!”
Warning message in FUN(...):
“no within-block comparison between Mega and CD4T”
Warning message in FUN(...):
“no within-block comparison between Mega and CD14”
Warning message in FUN(...):
“no within-block comparison between Mega and B”
Warning message in FUN(...):
“no within-block comparison between Mega and NK”
Warning message in FUN(...):
“no within-block comparison between Mega and CD8T”
Warning message in FUN(...):
“no within-block comparison between Mega and FGR3”
Warning message in FUN(...):
“no within-block comparison between Mega and DCs”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: ctrl&1244

Running LIANA with `cell_abbr` as labels!

Cell identities with less than 5 cells: Mega were removed!

Warning message in exec(output, ...):
“1464 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: ctrl&1256

Running LIANA with `cell_abbr` as labels!

Cell identities with less than 5 cells: Mega were removed!

Warning message in exec(output, ...):
“1313 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: ctrl&1488

Running LIANA with `cell_abbr` as labels!

Cell identities with less than 5 cells: Mega were removed!

Warning message in exec(output, ...):
“1383 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: stim&101

Running LIANA with `cell_abbr` as labels!

Warning message in exec(output, ...):
“2216 genes and/or 0 cells were removed as they had no counts!”
Warning message in FUN(...):
“no within-block comparison between Mega and CD4T”
Warning message in FUN(...):
“no within-block comparison between Mega and CD14”
Warning message in FUN(...):
“no within-block comparison between Mega and B”
Warning message in FUN(...):
“no within-block comparison between Mega and NK”
Warning message in FUN(...):
“no within-block comparison between Mega and CD8T”
Warning message in FUN(...):
“no within-block comparison between Mega and FGR3”
Warning message in FUN(...):
“no within-block comparison between Mega and DCs”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: stim&107

Running LIANA with `cell_abbr` as labels!

Cell identities with less than 5 cells: Mega were removed!

Warning message in exec(output, ...):
“3606 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: stim&1015

Running LIANA with `cell_abbr` as labels!

Cell identities with less than 5 cells: Mega were removed!

Warning message in exec(output, ...):
“1392 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: stim&1016

Running LIANA with `cell_abbr` as labels!

Warning message in exec(output, ...):
“1920 genes and/or 0 cells were removed as they had no counts!”
Warning message in FUN(...):
“no within-block comparison between Mega and CD4T”
Warning message in FUN(...):
“no within-block comparison between Mega and CD14”
Warning message in FUN(...):
“no within-block comparison between Mega and B”
Warning message in FUN(...):
“no within-block comparison between Mega and NK”
Warning message in FUN(...):
“no within-block comparison between Mega and CD8T”
Warning message in FUN(...):
“no within-block comparison between Mega and FGR3”
Warning message in FUN(...):
“no within-block comparison between Mega and DCs”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: stim&1039

Running LIANA with `cell_abbr` as labels!

Cell identities with less than 5 cells: Mega were removed!

Warning message in exec(output, ...):
“3196 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: stim&1244

Running LIANA with `cell_abbr` as labels!

Cell identities with less than 5 cells: Mega were removed!

Warning message in exec(output, ...):
“2000 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: stim&1256

Running LIANA with `cell_abbr` as labels!

Cell identities with less than 5 cells: Mega were removed!

Warning message in exec(output, ...):
“1471 genes and/or 0 cells were removed as they had no counts!”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

Current sample: stim&1488

Running LIANA with `cell_abbr` as labels!

Warning message in exec(output, ...):
“1227 genes and/or 0 cells were removed as they had no counts!”
Warning message in FUN(...):
“no within-block comparison between Mega and CD4T”
Warning message in FUN(...):
“no within-block comparison between Mega and CD14”
Warning message in FUN(...):
“no within-block comparison between Mega and B”
Warning message in FUN(...):
“no within-block comparison between Mega and NK”
Warning message in FUN(...):
“no within-block comparison between Mega and CD8T”
Warning message in FUN(...):
“no within-block comparison between Mega and FGR3”
Warning message in FUN(...):
“no within-block comparison between Mega and DCs”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

The parameterrs used here are as follows:

sce stands for SingleCellExperiment, and we pass here with an object with a single sample/context.
resource is the name of any of the resources available in liana
idents_col corresponds to the cell group label stored in adata.obs.
assay.type indicates which counts assys in the SCE object to use, here the log-normalized counts are assigned to logcounts, other options include the raw UMI counts stored in counts
expr_prop is the expression proportion threshold (in terms of cells per cell type expressing the protein)threshold of expression for any protein subunit involved in the interaction, according to which we keep or discard the interactions.
min_cells is the minimum number of cells per cell type required for a cell type to be considered in the analysis
aggregate_how specifies whether to aggregate on magnitude score types, specificity score types, or both
verbose is a boolean that indicates whether to print the progress of the function
inplace is a boolean that indicates whether storing the results in place, i.e. to adata.uns[“liana_res”].

Let’s see what the results look like:

[19]:

liana_res <- sce@metadata$liana_res %>%
            bind_rows(.id = sample_col) %>%
            mutate(!!sample_col := factor(.[[sample_col]], levels = unique(.[[sample_col]])))
head(liana_res)

A tibble: 6 × 16
sample	source	target	ligand.complex	receptor.complex	magnitude_rank	specificity_rank	magnitude_mean_rank	natmi.prod_weight	sca.LRscore	cellphonedb.lr.mean	specificity_mean_rank	natmi.edge_specificity	connectome.weight_sc	logfc.logfc_comb	cellphonedb.pvalue
<fct>	<chr>	<chr>	<chr>	<chr>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
ctrl&101	FGR3	CD14	TIMP1	CD63	1.789712e-13	3.292801e-07	1.000000	20.35529	0.9750467	4.603019	192.375	0.10078179	1.4100654	2.550752	0
ctrl&101	CD14	CD14	TIMP1	CD63	1.431769e-12	3.292801e-07	2.000000	17.40347	0.9730681	4.203108	221.125	0.08616690	1.2365363	2.373106	0
ctrl&101	FGR3	DCs	TIMP1	CD63	4.832222e-12	6.258644e-07	3.000000	15.74580	0.9717250	4.185146	250.625	0.07795956	1.1598158	1.713355	0
ctrl&101	DCs	CD14	TIMP1	CD63	1.145416e-11	7.412884e-07	4.000000	15.38291	0.9714029	3.929365	257.125	0.07616284	1.1177537	1.805500	0
ctrl&101	NK	NK	B2M	KLRD1	6.138712e-11	3.292801e-07	6.333333	10.83300	0.9661083	3.916777	219.375	0.09286514	1.6962340	1.242624	0
ctrl&101	CD14	DCs	TIMP1	CD63	9.163325e-11	1.395631e-06	6.000000	13.46242	0.9694913	3.785236	294.625	0.06665424	0.9862868	1.535708	0

This dataframe provides the results from running each method, as well as the consensus scores across the methods. In our case, we are interested in the magnitude consensus scored denoted by the 'magnitude_rank' column.

We can visualize the output as a dotplot across multiple samples:

[20]:

ligand_complex <- 'B2M'
receptor_complex <- c("KLRD1", "LILRB2", "CD3D")
source_labels <- c("CD4T", "B", "FGR3")
target_labels <- c("CD8T", 'DCs', 'CD14')
colour <- 'magnitude_rank'
size <- "specificity_rank"

lv <- liana_res %>%
    filter(source %in% source_labels) %>%
    filter(target %in% target_labels) %>%
    filter(ligand.complex %in% ligand_complex) %>%
    filter(receptor.complex %in% receptor_complex) %>%
    unite(lr.pairs, ligand.complex, receptor.complex, sep = ' -> ') %>%
    mutate(lr.pairs = factor(lr.pairs, levels = unique(lr.pairs))) %>%
    # inverse ranks
    mutate_at(vars(specificity_rank, magnitude_rank), ~-log10(. + 1e-10))

[21]:

h_ = 16
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)

ggplot(lv, aes(x = target, y = source, color = .data[[colour]], size=.data[[size]])) +
        geom_point() +
        facet_wrap(sample ~ lr.pairs, ncol = 12, scales='free_x') +
        theme_bw() +
        theme(axis.text.x = element_text(angle = 90))

../../_images/notebooks_ccc_R_QuickStart_pbmc_44_0.png

Let’s export it in case we want to access it later:

[22]:

write.csv(liana_res, file.path(output_folder, '/rLIANA_by_sample.csv'))

Alternatively, one could just export the whole SingleCellExperiment object, together with the ligand-receptor results stored:

[23]:

saveRDS(sce, file.path(output_folder, '/sce_processed.rds'))

Comparing cell-cell communication across multiple samples

We can use Tensor-cell2cell to infer context-dependent CCC patterns from multiple samples simultaneously. To do so, we must first restructure the communication scores (LIANA’s output) into a 4D-Communication Tensor.

The tensor is built as follows: we create matrices with the communication scores for each of the ligand-receptor pairs within the same sample, then generate a 3D tensor for each sample, and finally concatenate them to form the 4D tensor:

ccc-scores

First, we generate a list containing all samples from our SingleCellExperiment object. Here, we can find the names of each of the samples in the sample column of the adata.obs information:

[24]:

sorted_samples <- sort(names(sce@metadata$liana_res))

Then we can directly pass the communication scores from LIANA to build the 3D tensors for each sample (panel c in last figure), and concatenate them, with the following function:

[25]:

# build the tensor
context_df_dict <- liana:::preprocess_scores(context_df_dict = sce@metadata[['liana_res']],
                                             invert = TRUE, # transform the scores
                                             invert_fun = function(x) 1-x, # Transformation function
                                             non_negative = TRUE, # fills negative values
                                             non_negative_fill = 0 # set negative values to 0
                             )

tensor <- liana_tensor_c2c(context_df_dict = context_df_dict,
                          sender_col = "source", # Column name of the sender cells
                          receiver_col = "target", # Column name of the receiver cells
                          ligand_col = "ligand.complex", # Column name of the ligands
                          receptor_col = "receptor.complex", # Column name of the receptors
                          score_col = 'magnitude_rank', # Column name of the communication scores to use
                          how='outer_cells',  # What to include across all samples
                          outer_fraction=1/3, # Fraction of samples as threshold to include cells and LR pairs.
                          context_order=sorted_samples, # Order to store the contexts in the tensor
                          conda_env = 'ccc_protocols', # used to pass an existing conda env with cell2cell
                          build_only = TRUE, # set this to FALSE to combine the downstream rank selection and decomposition steps all here
                          device = device # Device to use when backend is pytorch.
                        )

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

Inverting `magnitude_rank`!

[1] 0

Loading `ccc_protocols` Conda Environment

Building the tensor using magnitude_rank...

The key parameters when building a tensor are:

context_df_dict is the dataframe containing the results from LIANA, usually located in sce@metadata[['liana_res']].
sender_col, receiver_col, ligand_col, receptor_col, and score_col are the column names in the dataframe containing the samples, sender cells, receiver cells, ligands, receptors, and communication scores, respectively. Each row of the dataframe contains a unique combination of these elements.
invert and invert_fun control the function we use to convert the communication score before using it to build the tensor. In this case, the ‘magnitude_rank’ score generated by LIANA considers low values as the most important ones, ranging from 0 to 1. In contrast, Tensor-cell2cell requires higher values to be the most important scores, so here we pass a function (function(x) 1-x) to adapt LIANA’s magnitude-rank scores (subtracts the LIANA’s score from 1).
how controls which ligand-receptor pairs and cell types to include when building the tensor. This decision depends on whether the missing values across a number of samples for both ligand-receptor interactions and sender-receiver cell pairs are considered to be biologically-relevant. Options are:
- 'inner' is the more strict option since it only considers only cell types and LR pairs that are present in all contexts (intersection).
- 'outer' considers all cell types and LR pairs that are present across contexts (union).
- 'outer_lrs' considers only cell types that are present in all contexts (intersection), while all LR pairs that are present across contexts (union).
- 'outer_cells' considers only LR pairs that are present in all contexts (intersection), while all cell types that are present across contexts (union).
outer_frac controls the elements to include in the union scenario of the how options. Only elements that are present at least in this fraction of samples/contexts will be included. When this value is 0, the tensor includes all elements across the samples. When this value is 1, it acts as using how='inner'.
context_order is a list specifying the order of the samples. The order of samples does not affect the results, but it is useful for posterior visualizations.

We can check the shape of this tensor to verify the number of samples, LR pairs, sender cells, adn receiver cells, respectively:

[26]:

tensor$shape

torch.Size([16, 467, 7, 7])

We can export our tensor:

[27]:

reticulate::py_save_object(object = tensor,
                           filename = file.path(output_folder, 'pbmc.pkl'))

Then, we can load it with:

[28]:

tensor <- reticulate::py_load_object(filename = file.path(output_folder, 'pbmc.pkl'))

Perform Tensor Factorization

Now that we have built the tensor and its metadata, we can run Tensor Component Analysis via Tensor-cell2cell with one simple command that we implemented for our unified framework:

[29]:

tensor <- liana::decompose_tensor(tensor = tensor,
                                  rank = 7, # Number of factors to perform the factorization. If NULL, it is automatically determined by an elbow analysis
                                  tf_optimization = 'regular', # To define how robust we want the analysis to be.
                                  seed = 0, # Random seed for reproducibility
                                  factors_only = FALSE,
                                  )

Decomposing the tensor...

Key parameters are:

rank is the number of factors or latent patterns we want to obtain from the analysis. You can either indicate a specific number or leave it as None to perform the decomposition with a suggested number from an elbow analysis.
tf_optimization indicates whether running the analysis in the 'regular' or the 'robust' way. The 'regular' way runs the tensor decomposition less number of times than the robust way to select an optimal result. Additionally, the former employs less strict convergence parameters to obtain optimal results than the latter, which is also translated into a faster generation of results. Important: When using tf_optimization='robust' the analysis takes much longer to run than using tf_optimization='regular'. However, the latter may generate less robust results.
seed is the seed for randomization. It controls the randomization used when initializing the optimization algorithm that performs the tensor decomposition. It is useful for reproducing the same result every time that the analysis is run. If NULL, a different randomization will be used each time.

[30]:

print(paste0('The estimated tensor rank is: ', tensor$rank))

[1] "The estimated tensor rank is: 7"

[31]:

# # Estimate standard error
# error_average <- tensor$elbow_metric_raw %>%
#     t() %>%
#     as.data.frame() %>%
#     mutate(rank=row_number()) %>%
#     pivot_longer(-rank, names_to = "run_no", values_to = "error") %>%
#     group_by(rank) %>%
#     summarize(average = mean(error),
#               N = n(),
#               SE.low = average - (sd(error)/sqrt(N)),
#               SE.high = average + (sd(error)/sqrt(N))
#            )

# # plot
# error_average %>%
#     ggplot(aes(x=rank, y=average), group=1) +
#     geom_line(col='red') +
#     geom_ribbon(aes(ymin = SE.low, ymax = SE.high), alpha = 0.1) +
#     geom_vline(xintercept = tensor$rank, colour='darkblue') + # rank of interest
#     theme_bw() +
#     labs(y="Error", x="Rank")

[32]:

sce@metadata$tensor_res <- liana:::format_c2c_factors(tensor$factors)

h_ = 13
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)

plot_c2c_overview(sce = sce, group_col = condition_col, sample_col = sample_col)

../../_images/notebooks_ccc_R_QuickStart_pbmc_67_0.png

The figure representing the loadings in each factor generated here can be interpreted by interconnecting all dimensions within a single factor. For example, if we take one of these factors, the cell-cell communication program occurs in each sample proportionally to their loadings. Then, this signature can be interpreted with the loadings of the ligand-receptor pairs, sender cells, and receiver cells. Ligands in high-loading ligand-receptor pairs are sent predominantly by high-loading sender cells, and interact with the cognate receptors on the high-loadings receiver cells.

Downstream Visualizations: Making sense of the factors

After running the decomposition, the results are stored in the factors attribute of the tensor object. This attribute is a dictionary containing the loadings for each of the elements in every tensor dimension. Loadings delineate the importance of that particular element to that factor. Keys are the names of the different dimension.

[33]:

names(tensor$factors)

'contexts'
'interactions'
'senders'
'receivers'

We can inspect the loadings of the samples, for example, located under the key 'Contexts':

[34]:

tensor$factors[['contexts']]

A data.frame: 16 × 7
	Factor 1	Factor 2	Factor 3	Factor 4	Factor 5	Factor 6	Factor 7
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
ctrl&101	0.2753879	0.2505314	0.2690610	0.2148839	3.192032e-10	0.2126195	0.2452438
ctrl&1015	0.3073317	0.2915613	0.2911225	0.2295018	2.224032e-02	0.2348130	0.2771375
ctrl&1016	0.2563827	0.2189345	0.2331933	0.2479214	2.264486e-02	0.2436604	0.2151357
ctrl&1039	0.2990260	0.2944541	0.2803146	0.2287531	1.721946e-02	0.2271619	0.2805458
ctrl&107	0.2986128	0.2080037	0.2611433	0.2766160	5.844633e-05	0.2261954	0.2447433
ctrl&1244	0.2911649	0.2651903	0.2765718	0.2586165	6.973772e-04	0.2399731	0.2612663
ctrl&1256	0.2946287	0.2670734	0.2826619	0.1966789	4.576398e-02	0.2093361	0.2895984
ctrl&1488	0.2718269	0.2215898	0.2579891	0.2383075	2.803234e-07	0.1915762	0.2818053
stim&101	0.2334743	0.2181655	0.2463230	0.2190290	3.502629e-01	0.2658829	0.2658360
stim&1015	0.1985835	0.3148001	0.2269200	0.2395267	4.272637e-01	0.2638037	0.2652130
stim&1016	0.2111627	0.2675103	0.2396157	0.2353978	3.159429e-01	0.2588787	0.2261185
stim&1039	0.2230813	0.2231751	0.2152400	0.2542573	2.852869e-01	0.2738154	0.1957329
stim&107	0.2012976	0.1892416	0.2373782	0.3299971	3.240025e-01	0.2905101	0.2175857
stim&1244	0.1902167	0.2339165	0.2275528	0.3002013	3.769016e-01	0.2738758	0.2494139
stim&1256	0.2092426	0.2733710	0.2208163	0.2139765	3.677670e-01	0.2668102	0.2410888
stim&1488	0.1764407	0.2241510	0.2143701	0.2797826	3.575881e-01	0.2941726	0.2202224

Downstream Visualizations

We can use these loadings to compare pairs of sample major groups with boxplots and statistical tests:

[35]:

sample_loadings <- liana::get_c2c_factors(sce,
                                          sample_col=sample_col,
                                          group_col=condition_col) %>%
                                          purrr::pluck("contexts") %>%
                                          pivot_longer(-c('condition', 'context'),
                                                names_to = "factor",
                                                values_to = "loadings") %>%
                                          mutate(factor = factor(factor)) #%>%
                                          # mutate(factor = fct_relevel(factor, "Factor.10", after=9))

h_ = 10
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)

# Do pairwise tests
pwc <- sample_loadings %>%
    group_by(factor) %>%
    rstatix::pairwise_t_test(
      loadings ~ condition,
      p.adjust.method = "fdr"
      )

[36]:

# plot
pwc <- pwc %>% add_xy_position(x = condition_col)
ggboxplot(sample_loadings, x = condition_col, y = "loadings", add = "point", fill=condition_col) +
    facet_wrap(~factor, ncol = 4) +
    stat_pvalue_manual(pwc) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

../../_images/notebooks_ccc_R_QuickStart_pbmc_76_0.png

Using the loadings for any dimension, we can also generate heatmaps for the elements with loadings above a certain threshold. Additionally, we can cluster these elements by the similarity of their loadings across all factors:

[37]:

library(ComplexHeatmap)
pushViewport(viewport(gp = gpar(fontfamily = "Arial")))

h_ = 10
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)

liana::plot_lr_heatmap(sce = sce,  n = 5)

Loading required package: grid

========================================
ComplexHeatmap version 2.14.0
Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
Github page: https://github.com/jokergoo/ComplexHeatmap
Documentation: http://jokergoo.github.io/ComplexHeatmap-reference

If you use it in published research, please cite either one:
- Gu, Z. Complex Heatmap Visualization. iMeta 2022.
- Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional
    genomic data. Bioinformatics 2016.


The new InteractiveComplexHeatmap package can directly export static
complex heatmaps into an interactive Shiny app with zero effort. Have a try!

This message can be suppressed by:
  suppressPackageStartupMessages(library(ComplexHeatmap))
========================================

../../_images/notebooks_ccc_R_QuickStart_pbmc_78_1.png

Here patients are grouped by the importance that each communication pattern (factor) has in relation to the other patients. This captures combinations of related communication patterns that explain similarities and differences at a sample-specific resolution.

Overall CCI potential: Heatmap and network visualizations of sender-receiver cell pairs

In addition, we can also evaluate the overall interactions between sender-receiver cell pairs that are determinant for a given factor or program. We can do it through a heatmap where the X-axis represent the receiver cells and the Y-axis shows the receiver cells. Here, the potential of interaction is calculated as the outer product between the loadings for the sender and receiver cells dimensions of a particular factor. To illustrate this, we chose Factor 5, but this can be repeated for every factor obtained in the decomposition.

[38]:

selected_factor <- 'Factor.5'
liana::plot_c2c_cells(sce = sce,
                   factor_of_int = selected_factor,
                   name = "Loadings \nProduct")

../../_images/notebooks_ccc_R_QuickStart_pbmc_82_0.png

Similarly, an interaction network can be created for each factor by using the loading product between sender and receiver cells. First we need to choose a threshold to indicate what pair of cells are interacting. This can be done as shown in the extended tutorial

[39]:

threshold = 0.075

Then, we can plot all networks we are interested in:

[40]:

# format for use with cell2cell
format_py_df<-function(df){
    df<-df %>%
    as.data.frame() %>%
    textshape::column_to_rownames() %>%
    select(-contains("condition"))

    return(df)
}
factors <- liana::get_c2c_factors(sce = sce,
                            sample_col = sample_col,
                            group_col = condition_col)
py_factors<-unname(lapply(factors, function(df) format_py_df(df)))
py_factors = reticulate::py_dict(keys = names(factors), values = py_factors)

c2c$plotting$ccc_networks_plot(py_factors,
                               included_factors=c('Factor.1', 'Factor.5', 'Factor.6'),
                               sender_label='senders',
                               receiver_label='receivers',
                               ccc_threshold=threshold, # Only important communication
                               nrows=as.integer(1),
                               filename=file.path(output_folder, 'network_plot_qsp_r.png')
                              )

[[1]]
<Figure size 2400x800 with 3 Axes>

[[2]]
[[2]][[1]]
<Axes: title={'center': 'Factor.1'}>

[[2]][[2]]
<Axes: title={'center': 'Factor.5'}>

[[2]][[3]]
<Axes: title={'center': 'Factor.6'}>

network_plot_qsp

Pathway Enrichment Analysis: Interpreting the context-driven communication

Classical Pathway Enrichment

As the number of inferred interactions increases, the interpretation of the inferred cell-cell communication networks becomes more challenging. To this end, we can perform pathway enrichment analysis to identify the general biological processes that are enriched in the inferred interactions. Here, we will perform classical gene set enrichment analysis with Reactome gene sets.

For the pathway enrichment analysis with GSEA, we use ligand-receptor pairs instead of individual genes. Reactome was initially designed to work with sets of genes, so first we need to generate ligand-receptor sets for each of it’s pathways:

[41]:

# Generate list with ligand-receptors pairs in DB
lr_pairs <- liana::select_resource('Consensus')[[1]] %>%
    select(ligand = source_genesymbol, receptor = target_genesymbol)
lr_list <- lr_pairs %>%
     unite('interaction', ligand, receptor, sep = '^') %>%
     pull(interaction)

# Specify the organism and pathway database to use for building the LR set
organism = "human"
pathwaydb = "Reactome"

# Generate ligand-receptor gene sets
lr_set <- c2c$external$generate_lr_geneset(lr_list = lr_list,
                                           complex_sep='_', # Separation symbol of the genes in the protein complex
                                           lr_sep='^', # Separation symbol between a ligand and a receptor complex
                                           organism=organism,
                                           pathwaydb=pathwaydb,
                                           readable_name=TRUE
                                           )

Next, we can perform enrichment analysis on each factor using the loadings of the ligand-receptor pairs to obtain the normalized-enrichment scores (NES) and corresponding P-values from GSEA:

[42]:

lr_loadings = factors[['interactions']] %>%
                column_to_rownames("lr")

gsea_res <- c2c$external$run_gsea(loadings=lr_loadings,
                               lr_set=lr_set,
                               output_folder=output_folder,
                               weight=as.integer(1),
                               min_size=as.integer(15),
                               permutations=as.integer(999),
                               processes=as.integer(6),
                               random_state=as.integer(6),
                               significance_threshold=as.numeric(0.05)
                              )
names(gsea_res) <- c('pvals', 'scores', 'gsea_df')

The enriched pathways for each factor are:

[43]:

gsea_df <- gsea_res$gsea_df
gsea_df %>%
    filter(`Adj. P-value` < 0.05 & NES > 0) %>%
    arrange(desc(abs(NES)))

A data.frame: 14 × 5
	Factor	Term	NES	P-value	Adj. P-value
	<chr>	<chr>	<dbl>	<dbl>	<dbl>
0	Factor.1	IMMUNOREGULATORY INTERACTIONS BETWEEN A LYMPHOID AND A NON LYMPHOID CELL	1.208194	0.001000000	0.01611612
46	Factor.3	NEUTROPHIL DEGRANULATION	1.156093	0.001000000	0.01611612
92	Factor.5	SIGNALING BY INTERLEUKINS	1.137917	0.003003003	0.03453453
2	Factor.1	ADAPTIVE IMMUNE SYSTEM	1.136481	0.001000000	0.01611612
93	Factor.5	IMMUNOREGULATORY INTERACTIONS BETWEEN A LYMPHOID AND A NON LYMPHOID CELL	1.123732	0.002002002	0.02479402
47	Factor.3	IMMUNOREGULATORY INTERACTIONS BETWEEN A LYMPHOID AND A NON LYMPHOID CELL	1.122184	0.002002002	0.02479402
23	Factor.2	INFECTIOUS DISEASE	1.121033	0.001000000	0.01611612
48	Factor.3	ADAPTIVE IMMUNE SYSTEM	1.116904	0.001000000	0.01611612
24	Factor.2	IMMUNOREGULATORY INTERACTIONS BETWEEN A LYMPHOID AND A NON LYMPHOID CELL	1.116482	0.001000000	0.01611612
25	Factor.2	ADAPTIVE IMMUNE SYSTEM	1.101746	0.001000000	0.01611612
49	Factor.3	INNATE IMMUNE SYSTEM	1.099481	0.002002002	0.02479402
69	Factor.4	IMMUNOREGULATORY INTERACTIONS BETWEEN A LYMPHOID AND A NON LYMPHOID CELL	1.095260	0.001001001	0.01611612
70	Factor.4	ADAPTIVE IMMUNE SYSTEM	1.068020	0.001001001	0.01611612
117	Factor.6	CYTOKINE SIGNALING IN IMMUNE SYSTEM	1.066745	0.001001001	0.01611612

The depleted pathways are:

[44]:

gsea_df %>%
    filter(`Adj. P-value` < 0.05 & NES < 0) %>%
    arrange(desc(abs(NES)))

A data.frame: 0 × 5
Factor	Term	NES	P-value	Adj. P-value
<chr>	<chr>	<dbl>	<dbl>	<dbl>

Finally, we can visualize the enrichment results using a dotplot:

[45]:

gsea.dotplot<-function(pval_df, score_df, significance = 0.05, font_size = 15){
    pval_df <- pval_df[apply(pval_df<significance,1,any), apply(pval_df<significance, 2, any)] %>%
            mutate_all(function(x) -1*log10(x + 1e-9))
    score_df <- score_df[rownames(pval_df), colnames(pval_df)]
    pval_df <- pval_df %>%
                    tibble::rownames_to_column('Annotation')
    score_df <- score_df %>%
                    tibble::rownames_to_column('Annotation')

    viz_df <- cbind(melt(pval_df), melt(score_df)[[3]])
    names(viz_df) <- c('Annotation', 'Factor', 'Significance', 'NES')


    dotplot <- ggplot(viz_df, aes(x = Factor, y = Annotation)) + geom_point(aes(size = Significance, color = NES)) +
                scale_color_gradient2() + theme_bw(base_size = font_size) +
                theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=0.5))+
                       scale_size(range = c(0, 8)) +
                       guides(size=guide_legend(title="-log10(p-value)"))
    return(dotplot)
}

[46]:

h_ = 10
w_ = 12
options(repr.plot.height=h_, repr.plot.width=w_)

dotplot <- gsea.dotplot(pval_df = gsea_res$pvals,
                       score_df = gsea_res$scores,
                       significance = 0.05)
dotplot

Using Annotation as id variables

Using Annotation as id variables

../../_images/notebooks_ccc_R_QuickStart_pbmc_100_1.png

Footprint Enrichment

Footprint enrichment analysis build upon classic geneset enrichment analysis, as instead of considering the genes involved in a biological activity, they consider the genes affected by the activity, or in other words the genes that change downstream of said activity (Dugourd and Saez-Rodriguez, 2019).

In this case, we will use the PROGENy pathway resource to perform footprint enrichment analysis. PROGENy was built in a data-driven manner using perturbation and cancer lineage data (Schubert et al, 2019), as a consequence it also assigns different importances or weights to each gene in its pathway genesets. To this end, we need an enrichment method that can take weights into account, and here we will use multi-variate linear models from the decoupler-py package to perform this analysis (Badia-i-Mompel et al., 2022).

Let’s load the PROGENy genesets and then convert them to sets of weighted ligand-receptor pairs:

[47]:

# obtain progeny gene sets
net <- decoupleR::get_progeny(organism = 'human', top=5000) %>%
    dplyr::select(-p_value)

# convert to LR sets
progeny_lr <- liana::generate_lr_geneset(sce = sce,
                                          resource = net)

Next, we can run the footprint enrichment analysis:

[48]:

# interaction loadings to matrix
mat <- factors$interactions %>%
    textshape::column_to_rownames("lr") %>%
  as.matrix()

# run enrichment analysis with decoupler
# (we fit a univariate linear model for each gene set)
# We don't consider genesets with minsize < 10
res <- decoupleR::run_ulm(mat = mat,
                          network = progeny_lr,
                          .source = "set",
                          .target = "lr",
                          minsize=10) %>%
  mutate(p_adj = p.adjust(p_value, method = "fdr"))

Finally, we can visualize the results:

[49]:

res %>% # sig/isnig flag
  mutate(significant = if_else(p_adj <= 0.05, "signif.", "not")) %>%
  ggplot(aes(x=source, y=condition, shape=significant,
             colour=score, size=-log10(p_value+1e-36))) +
  geom_point() +
  scale_colour_gradient2(high = "red", low="blue") +
  scale_size_continuous(range = c(3, 12)) +
  scale_shape_manual(values=c(21, 16)) +
  theme_bw(base_size = 15) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  labs(x="Pathway",
       y="Factor",
       colour="Activity"
       )

../../_images/notebooks_ccc_R_QuickStart_pbmc_107_0.png

Let’s zoom in on Factor 5 which is associated with interferon-beta stimulation.

[50]:

selected_factor = 'Factor.5'
sample_loadings <- liana::get_c2c_factors(sce,
                                          sample_col=sample_col,
                                          group_col=condition_col) %>%
                                          purrr::pluck("contexts") %>%
                                          pivot_longer(-c(condition_col, 'context'),
                                                names_to = "factor",
                                                values_to = "loadings") %>%
                                          mutate(factor = factor(factor))

res.factor <- res %>%
                filter(condition == selected_factor) %>%
                mutate(p_adj = p.adjust(p_value, method = "fdr")) %>%
                arrange(desc(score)) %>%
                mutate(source = factor(source, levels = source))

ggplot(data=res.factor, aes(x=score, y=source, fill = score)) +
  geom_bar(stat="identity", width=0.5) + theme_bw() + scale_fill_gradient2(trans = 'reverse') +
ylab('') + xlab('Activity')

Warning message:
“Using an external vector in selections was deprecated in tidyselect 1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
  # Was:
  data %>% select(condition_col)

  # Now:
  data %>% select(all_of(condition_col))

See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.”

../../_images/notebooks_ccc_R_QuickStart_pbmc_109_1.png

[ ]: