LIANA x Tensor-cell2cell Quickstart (R)

This tutorial provides an abbreviated version of Tutorials 01-06 in order to give a quick overview of the pipeline, going from a raw counts matrix to downstream cell-cell communication (CCC) analyses. By combining LIANA and Tensor-cell2cell, we obtain a general framework that can robustly incorporate many existing CCC inference tools, ultimately retrieving consensus communication scores for each sample and analyzing all samples together to identify context-dependent communication programs.

Initial Setups

Enabling GPU use

First, if you are using an NVIDIA GPU with CUDA cores, set use_gpu=TRUE and enable the PyTorch backend with the following code block. Otherwise, set use_gpu=FALSE or skip this part.

[1]:
use_gpu <- TRUE
library(reticulate, quietly = TRUE)
if (use_gpu){
    device <- 'cuda:0'
    tensorly <- reticulate::import('tensorly')
    tensorly$set_backend('pytorch')
}else{
    device <- NULL
}

Libraries

Then, import all the packages we will use in this tutorial:

[2]:
suppressMessages({
    library(dplyr, quietly = TRUE)
    library(tidyr, quietly = TRUE)
    library(purrr, quietly = TRUE)
    library(forcats, quietly = TRUE)
    library(textshape, quietly = TRUE)
    library(ggpubr, quietly = TRUE)
    library(rstatix, quietly = TRUE)
    library(reshape2, quietly = TRUE)
    library(tibble, quietly = TRUE)

    library(scater, quietly = TRUE)
    library(scuttle, quietly = TRUE)

    library(ggplot2, quietly = TRUE)

    library(liana, quietly = TRUE)
    library(decoupleR, quietly = TRUE)
    c2c <- reticulate::import(module = "cell2cell")
})

Directories

Afterwards, specify the data and output directories:

[3]:
data_folder <- '../../data/quickstart/'
output_folder <- '../../data/quickstart/outputs'

for (folder_ in c(data_folder, output_folder)){
    if (!dir.exists(folder_)){
        dir.create(folder_, recursive = TRUE) # create parent directories if needed
    }
}

Loading Data

We begin by loading the single-cell transcriptomics data. For this tutorial, we will use a lung dataset of 63k immune and epithelial cells across three control, three moderate, and six severe COVID-19 patients [21]. We can download the data and store it in the SingleCellExperiment format:

[4]:
options(timeout=600)
covid_data <- readRDS(url('https://zenodo.org/record/7706962/files/BALF-COVID19-Liao_et_al-NatMed-2020_SCE.rds'))

Preprocess Expression

Note that we do not include a batch-correction step, as Tensor-cell2cell can obtain robust decomposition results without it; however, we provide an extensive analysis and discussion of this topic in Supplementary Tutorial 01.

Quality-Control Filtering

The loaded data has already been pre-processed to a degree and comes with cell annotations. Nevertheless, let's highlight some of the key steps. To remove noisy measurements, we filter out non-informative cells and genes:

[5]:
# basic filters
covid_data <- scater::addPerCellQC(covid_data)
min.features <- 200
min.cells <- 3

sce_counts <- counts(covid_data)

if (min.features > 0) {
  nfeatures <- Matrix::colSums(x = sce_counts > 0)
  sce_counts <- sce_counts[, which(x = nfeatures >= min.features)]
}
# filter genes on the number of cells expressing
if (min.cells > 0) {
  num.cells <- Matrix::rowSums(x = sce_counts > 0)
  sce_counts <- sce_counts[which(x = num.cells >= min.cells), ]
}

covid_data<-covid_data[rownames(sce_counts), colnames(sce_counts)]

We additionally remove cells with high mitochondrial content:

[6]:
is.mito <- grep("MT-", rownames(covid_data))
per.cell <- scuttle::perCellQCMetrics(covid_data, subsets=list(Mito=is.mito))
colData(covid_data)[['subsets_Mito_percent']] <- per.cell$subsets_Mito_percent
covid_data <- covid_data[, colData(covid_data)$subsets_Mito_percent < 15]

This is followed by removing cells with a high number of detected features, which potentially represent more than one cell (doublets):

[7]:
covid_data <- covid_data[, colData(covid_data)$detected < 5500]

Normalization

Normalized counts are usually obtained in two essential steps: the first is count-depth scaling, which ensures that the measured count depths are comparable across cells; this is then typically followed by a log1p transformation, which stabilizes the variance of the counts and enables the use of linear metrics downstream:

[8]:
scale.factor <- 1e4
lib.sizes <- colSums(counts(covid_data)) / scale.factor
assay(covid_data, 'logcounts') <- scuttle::normalizeCounts(covid_data, size.factors=lib.sizes, log=FALSE, center.size.factors=FALSE)
assay(covid_data, 'logcounts') <- log1p(assay(covid_data, 'logcounts'))

Let’s see what our data looks like:

[9]:
head(colData(covid_data))
DataFrame with 6 rows and 13 columns
                      orig.ident      sample  sample_new     disease
                     <character> <character> <character> <character>
AAACCCACAGCTACAT-1_1        C100        C100         HC3           N
AAACCCATCCACGGGT-1_1        C100        C100         HC3           N
AAACCCATCCCATTCG-1_1        C100        C100         HC3           N
AAACGAACAAACAGGC-1_1        C100        C100         HC3           N
AAACGAAGTCGCACAC-1_1        C100        C100         HC3           N
AAACGAAGTCTATGAC-1_1        C100        C100         HC3           N
                         hasnCoV   cluster   cell.type   condition    ident
                     <character> <integer> <character> <character> <factor>
AAACCCACAGCTACAT-1_1           N        27           B     Control     C100
AAACCCATCCACGGGT-1_1           N        23 Macrophages     Control     C100
AAACCCATCCCATTCG-1_1           N         6           T     Control     C100
AAACGAACAAACAGGC-1_1           N        10 Macrophages     Control     C100
AAACGAAGTCGCACAC-1_1           N        10 Macrophages     Control     C100
AAACGAAGTCTATGAC-1_1           N         9           T     Control     C100
                           sum  detected     total subsets_Mito_percent
                     <numeric> <integer> <numeric>            <numeric>
AAACCCACAGCTACAT-1_1      3124      1377      3124             9.189882
AAACCCATCCACGGGT-1_1      1430       836      1430             0.909727
AAACCCATCCCATTCG-1_1      2342      1105      2342             6.319385
AAACGAACAAACAGGC-1_1     31378      4530     31378             9.981516
AAACGAAGTCGCACAC-1_1     12767      3409     12767             5.161745
AAACGAAGTCTATGAC-1_1      2198      1094      2198             8.871702

Deciphering Cell-Cell Communication

We will use LIANA to infer the ligand-receptor interactions for each sample. LIANA is highly modularized and natively implements the formulations of a number of methods, including CellPhoneDBv2, Connectome, log2FC, NATMI, SingleCellSignalR, and CellChat, as well as a geometric mean and a consensus in the form of a rank aggregate from any combination of methods.

[10]:
liana::show_methods()
  1. 'connectome'
  2. 'logfc'
  3. 'natmi'
  4. 'sca'
  5. 'cellphonedb'
  6. 'cytotalk'
  7. 'call_squidpy'
  8. 'call_cellchat'
  9. 'call_connectome'
  10. 'call_sca'
  11. 'call_italk'
  12. 'call_natmi'

LIANA classifies the scoring functions from the different methods into two categories: those that infer the “Magnitude” and “Specificity” of interactions. We define the “Magnitude” of an interaction as a measure of the strength of the interaction’s expression, and the “Specificity” of an interaction is a measure of how specific an interaction is to a given pair of clusters. Generally, these categories are complementary, and the magnitude of the interaction is often a proxy of the specificity of the interaction. For example, a ligand-receptor interaction with a high magnitude score in a given pair of cell types is likely to also be specific, and vice versa.

For clarity, here we map each output score (column names in LIANA's output dataframe, shown further below) to the corresponding method and score type:

Method Name               Magnitude Score   Specificity Score
CellPhoneDB               lr.mean           cellphonedb.pvalue
Connectome                prod_weight       weight_sc
log2FC                    None              logfc_comb
NATMI                     prod_weight       edge_specificity
SingleCellSignalR (sca)   LRscore           None

When considering ligand-receptor prior-knowledge resources, a common theme is the trade-off between coverage and quality, and each resource likewise comes with its own biases. In this regard, LIANA builds on OmniPath [29], as all of the resources in LIANA are obtained via OmniPath. These include the expert-curated resources of CellPhoneDBv2 [23], CellChat [27], ICELLNET [30], connectomeDB2020 [25], and CellTalkDB [31], as well as 10 others. LIANA further provides a consensus expert-curated resource built from the aforementioned five resources, along with some curated interactions from SignaLink. In this protocol, we will use the consensus resource from LIANA, though any of the other resources are available via LIANA, and one can also use LIANA with a custom resource.

[11]:
liana::show_resources()
  1. 'Default'
  2. 'Consensus'
  3. 'Baccin2019'
  4. 'CellCall'
  5. 'CellChatDB'
  6. 'Cellinker'
  7. 'CellPhoneDB'
  8. 'CellTalkDB'
  9. 'connectomeDB2020'
  10. 'EMBRACE'
  11. 'Guide2Pharma'
  12. 'HPMR'
  13. 'ICELLNET'
  14. 'iTALK'
  15. 'Kirouac2010'
  16. 'LRdb'
  17. 'Ramilowski2015'
  18. 'OmniPath'
  19. 'MouseConsensus'

Selecting any of the lists of ligand-receptor pairs in LIANA can be done with the following command (here we select the aforementioned 'Consensus' resource):

[12]:
lr_pairs <- liana::select_resource('Consensus')


Next, we can run LIANA on each sample across the available methods. By default, LIANA calculates an aggregate rank across these methods using a re-implementation of the RobustRankAggregate method, which models the probability that a given ligand-receptor interaction is ranked consistently better than expected under a null hypothesis (see Appendix 2). The consensus rank of a ligand-receptor interaction across methods can therefore be treated as a p-value.

[13]:
covid_data <- liana_bysample(sce = covid_data,
                             sample_col = "sample_new", # context dimension column from SCE metadata
                             idents_col = "cell.type", # cell types column from SCE metadata
                             resource = 'Consensus',
                             expr_prop = 0.1, # must be expressed in expr_prop fraction of cells
                              min_cells = 5,
                             assay.type = 'logcounts', # run on log- and library-normalized counts
                             aggregate_how = 'both', # aggregate magnitude and specificity
                             verbose = TRUE,
                             inplace = TRUE,
                             permutation.params=list(nperms=100),
#                              parallelize = TRUE,
#                              workers = 40
                        )
`sample_new` was converted to a factor!

Current sample: HC1

Running LIANA with `cell.type` as labels!

`Idents` were converted to factor

Cell identities with less than 5 cells: Plasma were removed!Cell identities with less than 5 cells: pDC were removed!

Warning message in exec(output, ...):
“5711 genes and/or 0 cells were removed as they had no counts!”
Warning message:
“`invoke()` is deprecated as of rlang 0.4.0.
Please use `exec()` or `inject()` instead.
This warning is displayed once every 8 hours.”
LIANA: LR summary stats calculated!

Now Running: Natmi

Now Running: Connectome

Now Running: Logfc

Now Running: Sca

Now Running: Cellphonedb

Warning message:
“`progress_estimated()` was deprecated in dplyr 1.0.0.
 The deprecated feature was likely used in the liana package.
  Please report the issue at <https://github.com/saezlab/liana/issues>.”
Now aggregating natmi

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: connectome”
Warning message in exec(output, ...):
“Unknown method name or missing specifics for: logfc”
Now aggregating sca

Now aggregating cellphonedb

Aggregating Ranks

Now aggregating natmi

Now aggregating connectome

Now aggregating logfc

Warning message in exec(output, ...):
“Unknown method name or missing specifics for: sca”
Now aggregating cellphonedb

Aggregating Ranks

(Analogous progress messages follow for the remaining samples HC2, HC3, M1-M3, and S1-S6; in each sample, cell identities with fewer than 5 cells (e.g. NK, B, Plasma, pDC, Mast, or Neutrophil, depending on the sample) and genes without counts are removed before the methods are run and their ranks aggregated.)

The parameters used here are as follows:

  • sce is the SingleCellExperiment object containing all samples; liana_bysample splits it into one object per sample/context using sample_col.

  • sample_col is the column in the colData of the SCE object that defines the sample/context of each cell.

  • resource is the name of any of the resources available in LIANA.

  • idents_col corresponds to the cell group (cell type) labels stored in the colData of the SCE object.

  • assay.type indicates which assay in the SCE object to use; here the log- and library-normalized counts are stored in logcounts, while the raw UMI counts are stored in counts.

  • expr_prop is the expression-proportion threshold, i.e. the minimum fraction of cells per cell type that must express every protein subunit involved in an interaction for that interaction to be kept.

  • min_cells is the minimum number of cells per cell type required for a cell type to be considered in the analysis.

  • aggregate_how specifies whether to aggregate the magnitude score types, the specificity score types, or both.

  • verbose is a boolean that indicates whether to print the progress of the function.

  • inplace is a boolean that indicates whether to store the results in place, i.e. in the metadata of the SCE object (covid_data@metadata$liana_res).
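
Since we ran liana_bysample with inplace=TRUE, the per-sample results are stored as a named list in the SCE metadata. As a quick optional check (not part of the original protocol), we can list the processed samples and peek at one of them:

# the results are a named list with one tibble of ligand-receptor scores per sample
names(covid_data@metadata$liana_res)
head(covid_data@metadata$liana_res[["HC1"]], 3)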

Let’s see what the results look like:

[14]:
liana_res <- covid_data@metadata$liana_res %>%
            bind_rows(.id = "sample_new") %>%
            mutate(sample_new = factor(sample_new, levels = unique(sample_new)))

head(liana_res)
A tibble: 6 × 16
sample_new | source      | target | ligand.complex | receptor.complex | magnitude_rank | specificity_rank | magnitude_mean_rank | natmi.prod_weight | sca.LRscore | cellphonedb.lr.mean | specificity_mean_rank | natmi.edge_specificity | connectome.weight_sc | logfc.logfc_comb | cellphonedb.pvalue
HC1        | T           | NK     | B2M            | CD3D             | 3.983438e-11   | 0.0134542495     | 1.000000            | 8.059861          | 0.9610254   | 3.410586            | 487.75                | 0.08327606             | 1.3008562            | 0.8719799        | 0
HC1        | Macrophages | NK     | B2M            | CD3D             | 3.186750e-10   | 0.0135072450     | 2.000000            | 8.059599          | 0.9610248   | 3.410500            | 487.00                | 0.08327335             | 1.3005565            | 0.8832177        | 0
HC1        | NK          | NK     | B2M            | CD3D             | 4.979297e-09   | 0.0181475450     | 3.666667            | 7.614378          | 0.9599465   | 3.264099            | 571.25                | 0.07867324             | 0.7909129            | 0.6978778        | 0
HC1        | T           | NK     | B2M            | KLRD1            | 1.366319e-08   | 0.0003588748     | 5.666667            | 6.865250          | 0.9579074   | 3.297900            | 211.75                | 0.17129308             | 6.9609202            | 0.8624079        | 0
HC1        | B           | NK     | B2M            | CD3D             | 2.039520e-08   | 0.0192306877     | 5.333333            | 7.518543          | 0.9597023   | 3.232586            | 595.00                | 0.07768305             | 0.6812107            | 0.6910179        | 0
HC1        | Macrophages | NK     | B2M            | KLRD1            | 2.039520e-08   | 0.0003588748     | 6.666667            | 6.865027          | 0.9579067   | 3.297814            | 210.75                | 0.17128751             | 6.9606205            | 0.8736456        | 0

This dataframe provides the results from running each method, as well as the consensus scores across the methods. In our case, we are interested in the consensus magnitude score, denoted by the 'magnitude_rank' column.
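
Because magnitude_rank behaves like an adjusted p-value, one straightforward use, shown here purely as an illustrative sketch with an arbitrary 0.05 cutoff, is to count how many consensus interactions per sample pass that threshold:

# count interactions per sample whose consensus magnitude rank falls below an illustrative 0.05 cutoff
liana_res %>%
    dplyr::filter(magnitude_rank <= 0.05) %>%
    dplyr::count(sample_new, name = "n_interactions")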

We can visualize the output as a dotplot across multiple samples:

[15]:
ligand_complex <- 'B2M'
receptor_complex <- c('CD3D', 'KLRD1')
source_labels <- c("B", "pDC", "Macrophages")
target_labels <- c("T", "Mast", "pDC", "NK")
sample_key <- 'sample_new'
colour <- 'magnitude_rank'
size <- "specificity_rank"

lv <- liana_res %>%
    filter(source %in% source_labels) %>%
    filter(target %in% target_labels) %>%
    filter(ligand.complex %in% ligand_complex) %>%
    filter(receptor.complex %in% receptor_complex) %>%
    unite(lr.pairs, ligand.complex, receptor.complex, sep = ' -> ') %>%
    mutate(lr.pairs = factor(lr.pairs, levels = unique(lr.pairs))) %>%
    # inverse ranks
    mutate_at(vars(specificity_rank, magnitude_rank), ~-log10(. + 1e-10))

h_ = 7
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)

ggplot(lv, aes(x = target, y = source, color = .data[[colour]], size=.data[[size]])) +
        geom_point() +
        facet_wrap(sample_new ~ lr.pairs, ncol = 12, scales='free_x') +
        theme_bw()
../../_images/notebooks_ccc_R_QuickStart_38_0.png

Let’s export it in case we want to access it later:

[16]:
write.csv(liana_res, file.path(output_folder, 'rLIANA_by_sample.csv'))

Alternatively, one could just export the whole SingleCellExperiment object, together with the ligand-receptor results stored:

[17]:
saveRDS(covid_data, file.path(output_folder, 'sce_processed.rds'))

Comparing cell-cell communication across multiple samples

We can use Tensor-cell2cell to infer context-dependent CCC patterns from multiple samples simultaneously. To do so, we must first restructure the communication scores (LIANA’s output) into a 4D-Communication Tensor.

The tensor is built as follows: we create matrices with the communication scores for each of the ligand-receptor pairs within the same sample, then generate a 3D tensor for each sample, and finally concatenate them to form the 4D tensor:

ccc-scores
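
To make the layout of the 4D-communication tensor concrete, here is a small toy illustration (not part of the pipeline; the names and sizes below are made up) of how samples, LR pairs, sender cells, and receiver cells map onto the four dimensions:

# Toy example only: a 4D array indexed as
# samples x LR pairs x sender cell types x receiver cell types
toy_tensor <- array(
    runif(2 * 3 * 4 * 4),
    dim = c(2, 3, 4, 4),
    dimnames = list(samples   = c("HC1", "S1"),
                    lr_pairs  = c("B2M^CD3D", "B2M^KLRD1", "CCL2^CCR2"),
                    senders   = c("B", "T", "NK", "Macrophages"),
                    receivers = c("B", "T", "NK", "Macrophages"))
)
dim(toy_tensor)                     # 2 3 4 4
toy_tensor["HC1", "B2M^CD3D", , ]   # sender x receiver score matrix of one LR pair in one sample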

First, we generate a list containing all samples from our SingleCellExperiment object. For visualization purposes, we sort them by COVID-19 severity. The sample names can be found in the sample_new column of the colData, and here also as the names of the per-sample LIANA results:

[18]:
sorted_samples = sort(names(covid_data@metadata$liana_res))

Then we can directly pass the communication scores from LIANA to build the 3D tensor for each sample (panel c in the figure above) and concatenate them into the 4D tensor, using the following functions:

[19]:
# build the tensor
context_df_dict <- liana:::preprocess_scores(context_df_dict = covid_data@metadata[['liana_res']],
                                             outer_frac = 1/3, # Fraction of samples as threshold to include cells and LR pairs.
                                             invert = TRUE, # transform the scores
                                             invert_fun = function(x) 1-x, # Transformation function
                                             non_negative = TRUE, # fills negative values
                                             non_negative_fill = 0 # set negative values to 0
                             )


tensor <- liana_tensor_c2c(context_df_dict = context_df_dict,
                          sender_col = "source", # Column name of the sender cells
                          receiver_col = "target", # Column name of the receiver cells
                          ligand_col = "ligand.complex", # Column name of the ligands
                          receptor_col = "receptor.complex", # Column name of the receptors
                          score_col = 'magnitude_rank', # Column name of the communication scores to use
                          how='outer',  # What to include across all samples
                          context_order=sorted_samples, # Order to store the contexts in the tensor
                          conda_env = 'ccc_protocols', # used to pass an existing conda env with cell2cell
                          build_only = TRUE, # set this to FALSE to combine the downstream rank selection and decomposition steps here
                          device = device # Device to use when backend is pytorch.
                        )
Inverting `magnitude_rank`! (repeated once per sample, 12 times in total)

[1] 0
Loading `ccc_protocols` Conda Environment

Building the tensor using magnitude_rank...

The key parameters when building a tensor are:

  • context_df_dict is a named list with one LIANA result dataframe per sample/context; here it comes from covid_data@metadata[['liana_res']] after pre-processing with preprocess_scores.

  • sender_col, receiver_col, ligand_col, receptor_col, and score_col are the column names in each dataframe containing the sender cells, receiver cells, ligands, receptors, and communication scores, respectively. Each row of a dataframe contains a unique combination of these elements.

  • invert and invert_fun control the function used to transform the communication score before building the tensor. The 'magnitude_rank' score generated by LIANA ranges from 0 to 1 and considers low values as the most important ones. In contrast, Tensor-cell2cell requires higher values to be the most important scores, so here we pass a function (function(x) 1-x) that subtracts LIANA's score from 1.

  • how controls which ligand-receptor pairs and cell types to include when building the tensor. This decision depends on whether missing values across a number of samples, for either ligand-receptor interactions or sender-receiver cell pairs, are considered biologically relevant. Options are:

    • 'inner' is the strictest option, since it considers only cell types and LR pairs that are present in all contexts (intersection).

    • 'outer' considers all cell types and LR pairs that are present across contexts (union).

    • 'outer_lrs' considers only cell types that are present in all contexts (intersection), but all LR pairs that are present across contexts (union).

    • 'outer_cells' considers only LR pairs that are present in all contexts (intersection), but all cell types that are present across contexts (union).

  • outer_frac controls which elements to include in the union scenario of the how options. Only elements present in at least this fraction of samples/contexts are included. When this value is 0, the tensor includes all elements found across the samples; when it is 1, it behaves like how='inner'.

  • context_order is a list specifying the order of the samples. The order of samples does not affect the results, but it is useful for posterior visualizations.

We can check the shape of this tensor to verify the number of samples, LR pairs, sender cells, and receiver cells, respectively:

[20]:
tensor$shape
torch.Size([12, 1054, 10, 10])
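
Because we built the tensor with how='outer', some sample/LR-pair/cell-type combinations are missing and are treated as masked values during the decomposition. Assuming the returned object is a cell2cell InteractionTensor, which exposes an excluded_value_fraction() helper, we can check how sparse the tensor is (an optional sanity check, not part of the original quickstart):

# fraction of tensor entries that are missing (NaN) after the union ('outer') build
tensor$excluded_value_fraction()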

We can export our tensor:

[21]:
reticulate::py_save_object(object = tensor,
                           filename = file.path(output_folder, 'BALF-Tensor-R.pkl'))

Then, we can load it with:

[22]:
tensor <- reticulate::py_load_object(filename = file.path(output_folder, 'BALF-Tensor-R.pkl'))

Perform Tensor Factorization

Now that we have built the tensor and its metadata, we can run Tensor Component Analysis via Tensor-cell2cell with one simple command that we implemented for our unified framework:

[23]:
tensor <- liana::decompose_tensor(tensor = tensor,
                                  rank = NULL, # Number of factors for the factorization. If NULL, it is automatically determined by an elbow analysis
                                  tf_optimization = 'robust', # To define how robust we want the analysis to be
                                  seed = 0, # Random seed for reproducibility
                                  factors_only = FALSE)
Estimating ranks...

Decomposing the tensor...

Key parameters are:

  • rank is the number of factors or latent patterns we want to obtain from the analysis. You can either indicate a specific number or leave it as NULL to perform the decomposition with the number suggested by an elbow analysis (see the sketch after this list for re-running with a manually chosen rank).

  • tf_optimization indicates whether to run the analysis in the 'regular' or the 'robust' way. The 'regular' option runs the tensor decomposition fewer times when selecting an optimal result and uses less strict convergence parameters, which makes it faster. Important: with tf_optimization='robust' the analysis takes much longer to run than with 'regular', but the 'regular' option may generate less robust results.

  • seed is the seed for randomization. It controls the initialization of the optimization algorithm that performs the tensor decomposition, and is useful for reproducing the same result every time the analysis is run. If NULL, a different randomization will be used each time.
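
If the elbow suggestion does not match your expectations, the decomposition can be re-run with a manually chosen rank; a minimal sketch (the rank of 8 used here is purely illustrative; pick a value based on the elbow plot below):

# re-run the factorization with an explicitly chosen number of factors
tensor_fixed <- liana::decompose_tensor(tensor = tensor,
                                        rank = 8, # illustrative user-defined number of factors
                                        tf_optimization = 'regular', # faster, less strict settings
                                        seed = 0,
                                        factors_only = FALSE)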

[24]:
print(paste0('The estimated tensor rank is: ', tensor$rank))
[1] "The estimated tensor rank is: 10"
[25]:
# Estimate standard error
error_average <- tensor$elbow_metric_raw %>%
    t() %>%
    as.data.frame() %>%
    mutate(rank=row_number()) %>%
    pivot_longer(-rank, names_to = "run_no", values_to = "error") %>%
    group_by(rank) %>%
    summarize(average = mean(error),
              N = n(),
              SE.low = average - (sd(error)/sqrt(N)),
              SE.high = average + (sd(error)/sqrt(N))
           )

# plot
error_average %>%
    ggplot(aes(x=rank, y=average), group=1) +
    geom_line(col='red') +
    geom_ribbon(aes(ymin = SE.low, ymax = SE.high), alpha = 0.1) +
    geom_vline(xintercept = tensor$rank, colour='darkblue') + # rank of interest
    theme_bw() +
    labs(y="Error", x="Rank")
../../_images/notebooks_ccc_R_QuickStart_60_0.png
[26]:
covid_data@metadata$tensor_res <- liana:::format_c2c_factors(tensor$factors)

h_ = 13
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)

plot_c2c_overview(sce = covid_data, group_col = 'condition', sample_col = 'sample_new')
../../_images/notebooks_ccc_R_QuickStart_61_0.png

The figure of loadings per factor generated here can be interpreted by interconnecting all dimensions within a single factor. For example, if we take one factor, the corresponding cell-cell communication program occurs in each sample proportionally to that sample's loading. The program itself is then interpreted through the loadings of the ligand-receptor pairs, sender cells, and receiver cells: ligands in high-loading ligand-receptor pairs are sent predominantly by high-loading sender cells and interact with their cognate receptors on high-loading receiver cells.
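
For example, to see which ligand-receptor pairs drive a given program, we can sort the interaction loadings of that factor. A minimal sketch, assuming (as in the cells below) that get_c2c_factors returns an interactions data frame with an lr column and Factor.1-Factor.10 columns:

# top-5 ligand-receptor pairs for Factor 8, ranked by their interaction loadings
factors <- liana::get_c2c_factors(sce = covid_data,
                                  sample_col = "sample_new",
                                  group_col = "condition")
factors$interactions %>%
    dplyr::arrange(dplyr::desc(Factor.8)) %>%
    dplyr::select(lr, Factor.8) %>%
    head(5)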

Downstream Visualizations: Making sense of the factors

After running the decomposition, the results are stored in the factors attribute of the tensor object. This attribute behaves like a named list containing the loadings of the elements in every tensor dimension. Loadings delineate the importance of a particular element to a given factor. The keys are the names of the different dimensions.

[27]:
names(tensor$factors)
  1. 'contexts'
  2. 'interactions'
  3. 'senders'
  4. 'receivers'

We can inspect the loadings of the samples, for example, located under the key 'contexts':

[28]:
tensor$factors[['contexts']]
A data.frame: 12 × 10
      Factor 1    Factor 2      Factor 3   Factor 4    Factor 5   Factor 6    Factor 7   Factor 8    Factor 9    Factor 10
HC1   0.57167751  0.0040322351  0.2486676  0.06951163  0.1992737  0.09138259  0.2109948  0.09901118  0.08743461  0.10624311
HC2   0.61345816  0.0009374141  0.2422200  0.07904458  0.1662704  0.01086045  0.1696340  0.05779721  0.05216325  0.09475538
HC3   0.28084552  0.0640010983  0.3343133  0.38925144  0.4410195  0.25950903  0.2944204  0.21048699  0.06730253  0.13640665
M1    0.17811552  0.0243369564  0.2788751  0.54256505  0.2649768  0.51976120  0.3917096  0.31559205  0.29760960  0.25299737
M2    0.15766080  0.0367470123  0.2851860  0.50009239  0.2689672  0.50643295  0.4055026  0.30275965  0.29582947  0.22803977
M3    0.25531387  0.1038309112  0.2800323  0.24746110  0.1943755  0.35284221  0.3694525  0.23726170  0.15349986  0.11353971
S1    0.16227683  0.6068011522  0.2968922  0.16380185  0.2758042  0.09672241  0.1598114  0.26295593  0.31008309  0.16405572
S2    0.05848442  0.4525174201  0.2955828  0.27341583  0.3142951  0.25263739  0.2916600  0.35102302  0.42103592  0.36295748
S3    0.17109768  0.2214141041  0.3145112  0.29934934  0.3125799  0.12573658  0.2879774  0.29225123  0.16905822  0.30484933
S4    0.10837201  0.2505245507  0.3078077  0.05209281  0.3427818  0.32328588  0.2711394  0.40252998  0.43096238  0.47937286
S5    0.04310614  0.4705861211  0.2580555  0.16272153  0.2119863  0.20446612  0.2239244  0.35042289  0.43328699  0.40211794
S6    0.15356158  0.2773903906  0.3073083  0.10633880  0.3504498  0.19429857  0.2616548  0.36386305  0.33164164  0.43490598

Downstream Visualizations

We can use these loadings to compare the major sample groups (conditions) in a pairwise manner, with boxplots and statistical tests:

[29]:
sample_loadings <- liana::get_c2c_factors(covid_data,
                                          sample_col='sample_new',
                                          group_col='condition') %>%
                                          purrr::pluck("contexts") %>%
                                          pivot_longer(-c("condition", "context"),
                                                names_to = "factor",
                                                values_to = "loadings") %>%
                                          mutate(factor = factor(factor)) %>%
                                          mutate(factor = fct_relevel(factor, "Factor.10", after=9))

h_ = 10
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)

# Do pairwise tests
pwc <- sample_loadings %>%
    group_by(factor) %>%
    rstatix::pairwise_t_test(
      loadings ~ condition,
      p.adjust.method = "fdr"
      )

# plot
pwc <- pwc %>% add_xy_position(x = "condition")
ggboxplot(sample_loadings, x = "condition", y = "loadings", add = "point", fill="condition") +
    facet_wrap(~factor, ncol = 4) +
    stat_pvalue_manual(pwc) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
../../_images/notebooks_ccc_R_QuickStart_69_0.png

Using the loadings for any dimension, we can also generate heatmaps for the elements with loadings above a certain threshold. Additionally, we can cluster these elements by the similarity of their loadings across all factors:

[30]:
h_ = 10
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)

liana::plot_lr_heatmap(sce = covid_data,  n = 5)
../../_images/notebooks_ccc_R_QuickStart_71_0.png

Here, samples (patients) are clustered by the importance that each communication pattern (factor) has for them relative to the other samples. This captures combinations of related communication patterns that explain similarities and differences at a sample-specific resolution. In this case, the samples cluster almost perfectly by COVID-19 severity, with moderate cases being more similar to the controls.

Overall CCI potential: Heatmap and network visualizations of sender-receiver cell pairs

In addition, we can evaluate the overall interaction between sender-receiver cell pairs that is determinant for a given factor or program. We can do this with a heatmap where the X-axis represents the receiver cells and the Y-axis the sender cells. Here, the potential of interaction is calculated as the outer product between the loadings of the sender and receiver cell dimensions of a particular factor. To illustrate this, we chose Factor 8, but this can be repeated for every factor obtained in the decomposition.
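
For reference, the heatmap below simply displays this outer product. A minimal sketch of computing it by hand, assuming the senders and receivers data frames returned by get_c2c_factors hold the cell-type labels in their first column and the loadings in columns named Factor.1-Factor.10:

# sender x receiver interaction potential for Factor 8, computed as the outer
# product of the sender and receiver loadings
factors <- liana::get_c2c_factors(sce = covid_data,
                                  sample_col = "sample_new",
                                  group_col = "condition")
send <- textshape::column_to_rownames(as.data.frame(factors$senders))
recv <- textshape::column_to_rownames(as.data.frame(factors$receivers))
cci_potential <- outer(send[, "Factor.8"], recv[, "Factor.8"])
dimnames(cci_potential) <- list(sender = rownames(send), receiver = rownames(recv))
round(cci_potential, 3)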

[31]:
selected_factor <- 'Factor.8'
liana::plot_c2c_cells(sce = covid_data,
                   factor_of_int = selected_factor,
                   name = "Loadings \nProduct")
../../_images/notebooks_ccc_R_QuickStart_75_0.png

Similarly, an interaction network can be created for each factor by using the loading product between sender and receiver cells. First, we need to choose a threshold to indicate which pairs of cells are considered to be interacting. This can be done as shown in the extended tutorial.

[32]:
threshold = 0.075

Then, we can plot all networks we are interested in:

[33]:
# format for use with cell2cell
format_py_df <- function(df){
    # convert to a base data.frame, move the first column to rownames,
    # and drop the grouping (condition) column if present
    df <- df %>%
        as.data.frame() %>%
        textshape::column_to_rownames() %>%
        select(-contains("condition"))
    return(df)
}
factors <- liana::get_c2c_factors(sce = covid_data,
                                  sample_col = "sample_new",
                                  group_col = "condition")
py_factors <- unname(lapply(factors, function(df) format_py_df(df)))
py_factors <- reticulate::py_dict(keys = names(factors), values = py_factors)

c2c$plotting$ccc_networks_plot(py_factors,
                               included_factors=c('Factor.2', 'Factor.8', 'Factor.9'),
                               sender_label='senders',
                               receiver_label='receivers',
                               ccc_threshold=threshold, # Only important communication
                               nrows=as.integer(1),
                               filename=file.path(output_folder, 'network_plot_qs_r.png')
                              )
[[1]]
<Figure size 2400x800 with 3 Axes>

[[2]]
[[2]][[1]]
<Axes: title={'center': 'Factor.2'}>

[[2]][[2]]
<Axes: title={'center': 'Factor.8'}>

[[2]][[3]]
<Axes: title={'center': 'Factor.9'}>


network_plot_qs

Pathway Enrichment Analysis: Interpreting the context-driven communication

Classical Pathway Enrichment

As the number of inferred interactions increases, the interpretation of the inferred cell-cell communication networks becomes more challenging. To address this, we can perform pathway enrichment analysis to identify the general biological processes that are enriched in the inferred interactions. Here, we will perform classical gene set enrichment analysis (GSEA) with KEGG pathways.

For the pathway enrichment analysis with GSEA, we use ligand-receptor pairs instead of individual genes. Since KEGG gene sets are defined at the level of individual genes, we first need to generate ligand-receptor sets for each of its pathways:

[34]:
# Generate list with ligand-receptors pairs in DB
lr_pairs <- liana::select_resource('Consensus')[[1]] %>%
    select(ligand = source_genesymbol, receptor = target_genesymbol)
lr_list <- lr_pairs %>%
     unite('interaction', ligand, receptor, sep = '^') %>%
     pull(interaction)

# Specify the organism and pathway database to use for building the LR set
organism = "human"
pathwaydb = "KEGG"

# Generate ligand-receptor gene sets
lr_set <- c2c$external$generate_lr_geneset(lr_list = lr_list,
                                           complex_sep='_', # Separation symbol of the genes in the protein complex
                                           lr_sep='^', # Separation symbol between a ligand and a receptor complex
                                           organism=organism,
                                           pathwaydb=pathwaydb,
                                           readable_name=TRUE
                                           )

Next, we can perform enrichment analysis on each factor using the loadings of the ligand-receptor pairs to obtain the normalized-enrichment scores (NES) and corresponding P-values from GSEA:

[35]:
factors <- liana::get_c2c_factors(sce = covid_data,
                            sample_col = "sample_new",
                            group_col = "condition")

lr_loadings = factors[['interactions']] %>%
                column_to_rownames("lr")

gsea_res <- c2c$external$run_gsea(loadings=lr_loadings,
                               lr_set=lr_set,
                               output_folder=output_folder,
                               weight=as.integer(1),
                               min_size=as.integer(15),
                               permutations=as.integer(999),
                               processes=as.integer(6),
                               random_state=as.integer(6),
                               significance_threshold=as.numeric(0.05)
                              )
names(gsea_res)<-c('pvals', 'scores', 'gsea_df')

The enriched pathways for each factor are:

[36]:
gsea_df <- gsea_res$gsea_df
gsea_df %>%
    filter(`Adj. P-value` < 0.05 & NES > 0) %>%
    arrange(desc(abs(NES)))
A data.frame: 14 × 5
      Factor     Term                                     NES       P-value      Adj. P-value
52    Factor.5   CHEMOKINE SIGNALING PATHWAY              1.546406  0.001000000  0.01444444
39    Factor.4   ANTIGEN PROCESSING AND PRESENTATION      1.479729  0.001000000  0.01444444
117   Factor.10  ANTIGEN PROCESSING AND PRESENTATION      1.364777  0.001000000  0.01444444
78    Factor.7   ANTIGEN PROCESSING AND PRESENTATION      1.361638  0.001000000  0.01444444
13    Factor.2   ANTIGEN PROCESSING AND PRESENTATION      1.356078  0.001000000  0.01444444
0     Factor.1   ANTIGEN PROCESSING AND PRESENTATION      1.345314  0.001000000  0.01444444
53    Factor.5   CYTOKINE CYTOKINE RECEPTOR INTERACTION   1.319703  0.001000000  0.01444444
40    Factor.4   CELL ADHESION MOLECULES CAMS             1.317278  0.001000000  0.01444444
91    Factor.8   CHEMOKINE SIGNALING PATHWAY              1.265677  0.005005005  0.04647505
14    Factor.2   CHEMOKINE SIGNALING PATHWAY              1.257363  0.003003003  0.03549004
26    Factor.3   ANTIGEN PROCESSING AND PRESENTATION      1.229543  0.004004004  0.04337671
104   Factor.9   ANTIGEN PROCESSING AND PRESENTATION      1.225197  0.005005005  0.04647505
1     Factor.1   CELL ADHESION MOLECULES CAMS             1.220750  0.001000000  0.01444444
16    Factor.2   CYTOKINE CYTOKINE RECEPTOR INTERACTION   1.170583  0.002002002  0.02602603

The depleted pathways are:

[37]:
gsea_df %>%
    filter(`Adj. P-value` < 0.05 & NES < 0) %>%
    arrange(desc(abs(NES)))
A data.frame: 0 × 5
Factor   Term    NES     P-value   Adj. P-value
<chr>    <chr>   <dbl>   <dbl>     <dbl>

Finally, we can visualize the enrichment results using a dotplot:

[38]:
gsea.dotplot <- function(pval_df, score_df, significance = 0.05, font_size = 15){
    # keep only rows/columns with at least one significant p-value, then -log10-transform
    pval_df <- pval_df[apply(pval_df < significance, 1, any), apply(pval_df < significance, 2, any)] %>%
            mutate_all(function(x) -1*log10(x + 1e-9))
    score_df <- score_df[rownames(pval_df), colnames(pval_df)]
    pval_df <- pval_df %>%
                    tibble::rownames_to_column('Annotation')
    score_df <- score_df %>%
                    tibble::rownames_to_column('Annotation')

    # combine significance and NES into one long dataframe for plotting
    viz_df <- cbind(melt(pval_df), melt(score_df)[[3]])
    names(viz_df) <- c('Annotation', 'Factor', 'Significance', 'NES')

    dotplot <- ggplot(viz_df, aes(x = Factor, y = Annotation)) +
                geom_point(aes(size = Significance, color = NES)) +
                scale_color_gradient2() +
                theme_bw(base_size = font_size) +
                theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5)) +
                scale_size(range = c(0, 8)) +
                guides(size = guide_legend(title = "-log10(p-value)"))
    return(dotplot)
}

[39]:
h_ = 8
w_ = 10
options(repr.plot.height=h_, repr.plot.width=w_)

dotplot <- gsea.dotplot(pval_df = gsea_res$pvals,
                       score_df = gsea_res$scores,
                       significance = 0.05)
dotplot
Using Annotation as id variables

Using Annotation as id variables

../../_images/notebooks_ccc_R_QuickStart_93_1.png

Footprint Enrichment

Footprint enrichment analysis builds upon classical gene set enrichment analysis: instead of considering the genes involved in a biological activity, it considers the genes affected by that activity, in other words the genes that change downstream of it (Dugourd and Saez-Rodriguez, 2019).

In this case, we will use the PROGENy pathway resource to perform footprint enrichment analysis. PROGENy was built in a data-driven manner using perturbation and cancer-lineage data (Schubert et al, 2019); as a consequence, it also assigns different importances, or weights, to each gene in its pathway gene sets. We therefore need an enrichment method that can take these weights into account, and here we will use univariate linear models, as implemented in the decoupleR package (Badia-i-Mompel et al., 2022).
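
To build intuition for what this weighted enrichment does, here is a self-contained toy sketch (using made-up numbers, not the real data) of a univariate linear model activity score: the loadings are regressed on the set's weights, and the t-value of the slope summarizes whether high-weight members tend to have high loadings.

# Toy illustration only: the idea behind a univariate linear model (ulm) enrichment score
set.seed(42)
loadings <- runif(200)                # pretend LR-pair loadings for one factor
weights  <- numeric(200)              # weight 0 for LR pairs outside the gene set
weights[1:20] <- rnorm(20, mean = 1)  # weights of the 20 LR pairs inside the set
fit <- lm(loadings ~ weights)
summary(fit)$coefficients["weights", "t value"]  # toy activity score for this set and factor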

Let’s load the PROGENy genesets and then convert them to sets of weighted ligand-receptor pairs:

[40]:
# obtain progeny gene sets
net <- decoupleR::get_progeny(organism = 'human', top=5000) %>%
    dplyr::select(-p_value)

# convert to LR sets
progeny_lr <- liana::generate_lr_geneset(sce = covid_data,
                                          resource = net)

Next, we can run the footprint enrichment analysis:

[41]:
# interaction loadings to matrix
mat <- factors$interactions %>%
    textshape::column_to_rownames("lr") %>%
  as.matrix()

# run enrichment analysis with decoupler
# (we fit a univariate linear model for each gene set)
# We don't consider genesets with minsize < 10
res <- decoupleR::run_ulm(mat = mat,
                          network = progeny_lr,
                          .source = "set",
                          .target = "lr",
                          minsize=10) %>%
  mutate(p_adj = p.adjust(p_value, method = "fdr"))

Finally, we can visualize the results:

[42]:
res %>% # add a significant / not-significant flag for plotting
  mutate(significant = if_else(p_adj <= 0.05, "signif.", "not")) %>%
  ggplot(aes(x=source, y=condition, shape=significant,
             colour=score, size=-log10(p_value+1e-36))) +
  geom_point() +
  scale_colour_gradient2(high = "red", low="blue") +
  scale_size_continuous(range = c(3, 12)) +
  scale_shape_manual(values=c(21, 16)) +
  theme_bw(base_size = 15) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  labs(x="Pathway",
       y="Factor",
       colour="Activity"
       )
../../_images/notebooks_ccc_R_QuickStart_100_0.png

Let's zoom in on Factor 8, which is associated with COVID-19 cases.

[43]:
selected_factor = 'Factor.8'
sample_loadings <- liana::get_c2c_factors(covid_data,
                                          sample_col='sample_new',
                                          group_col='condition') %>%
                                          purrr::pluck("contexts") %>%
                                          pivot_longer(-c("condition", "context"),
                                                names_to = "factor",
                                                values_to = "loadings") %>%
                                          mutate(factor = factor(factor)) %>%
                                          mutate(factor = forcats::fct_relevel(factor, "Factor.10", after=9))

res.factor <- res %>%
                filter(condition == selected_factor) %>%
                mutate(p_adj = p.adjust(p_value, method = "fdr")) %>%
                arrange(desc(score)) %>%
                mutate(source = factor(source, levels = source))

ggplot(data = res.factor, aes(x = score, y = source, fill = score)) +
    geom_bar(stat = "identity", width = 0.5) +
    theme_bw() +
    scale_fill_gradient2(trans = 'reverse') +
    ylab('') +
    xlab('Activity')
../../_images/notebooks_ccc_R_QuickStart_102_0.png