2. Ligand-Receptor Inference

As a result of the growing interest in cell-cell communication (CCC) inference, a number of computational tools in single-cell transcriptomics have emerged. Although, there many different categories and approaches to infer CCC events, in this tutorial we will focus on those that infer interactions between ligands and receptors, commonly referred to as ligand-receptor (LR) inference methods (e.g. (Efremova et al., 2020, Hou et al., 2020, Jin et al., 2021, Raredon et al., 2022). These tools typically rely on gene expression information as a proxy of protein abundance, and they work downstream of data pre-processing and acquisition of biologically-meaningful cell groups. These CCC tools infer intercellular interactions in a hypothesis-free manner, meaning that they infer all possible interactions between cell clusters, relying on prior knowledge of the potential interactions. Here, one group of cells is considered the source of the communication signal, sending a ligand, and the other is the receiver of the signal via its receptors. CCC events are thus represented as interactions between LR pairs, expressed by any combination of source and receiver cell groups.

The information about the interacting proteins is commonly extracted from prior knowledge resources. In the case of LR methods, the interactions can also be represented by heteromeric protein complexes (Efremova et al., 2020, Jin et al., 2021, Noël et al., 2021).

Here, we will use LIANA to obtain a consensus score for each LR interaction inferred by the different tools. Further, we will make use of LIANA’s consensus resource, which combines a number of expert-curated LR resources.

Environment Setup

[1]:

import os
import numpy as np
import pandas as pd
import scipy
import seaborn as sns
import matplotlib

import scanpy as sc
import cell2cell as c2c

import liana as li
import plotnine as p9

from tqdm.auto import tqdm

[2]:

import warnings
warnings.filterwarnings('ignore')

Directories

[3]:

data_path = '../../data/'
output_folder = os.path.join(data_path, 'liana-outputs/')
c2c.io.directories.create_directory(output_folder)

../../data/liana-outputs/ already exists.

Data Reminder

Here, we will re-load the processed data as processed in Tutorial 01.

[4]:

adata = sc.read_h5ad(os.path.join(data_path, 'processed.h5ad'))

just as a quick reminder, let’s visualize the cell types and samples in the data.

[5]:

adata.obs.head()

[5]:

	sample	sample_new	group	disease	hasnCoV	cluster	celltype	condition	n_genes_by_counts	total_counts	total_counts_mt	pct_counts_mt	n_genes
AAACCCACAGCTACAT_3	C100	HC3	HC	N	N	27.0	B	Control	1377	3124.0	287.0	9.186940	1377
AAACCCATCCACGGGT_3	C100	HC3	HC	N	N	23.0	Macrophages	Control	836	1430.0	13.0	0.909091	836
AAACCCATCCCATTCG_3	C100	HC3	HC	N	N	6.0	T	Control	1105	2342.0	148.0	6.319385	1105
AAACGAACAAACAGGC_3	C100	HC3	HC	N	N	10.0	Macrophages	Control	4530	31378.0	3132.0	9.981516	4530
AAACGAAGTCGCACAC_3	C100	HC3	HC	N	N	10.0	Macrophages	Control	3409	12767.0	659.0	5.161745	3409

[6]:

# plot pre-annotated cell types
sc.pl.umap(adata, color=['celltype']) # would do 'sample_new' only if we integrate

../../_images/notebooks_ccc_python_02-Infer-Communication-Scores_12_0.png

Predicting CCC events with LIANA

16ae349697a54979ab55438a5f56cf8a

Now that we have the preprocessed data loaded, we will use liana to score the interactions inferred by the different tools.

liana is highly modularized and it implements a number of methods to score LR interactions, we can list those with the following command:

[7]:

li.method.show_methods()

[7]:

Method Name	Magnitude Score	Specificity Score	Reference
CellPhoneDB	lr_means	cellphone_pvals	Efremova, M., Vento-Tormo, M., Teichmann, S.A....
Connectome	expr_prod	scaled_weight	Raredon, M.S.B., Yang, J., Garritano, J., Wang...
log2FC	None	lr_logfc	Dimitrov, D., Türei, D., Garrido-Rodriguez, M....
NATMI	expr_prod	spec_weight	Hou, R., Denisenko, E., Ong, H.T., Ramilowski,...
SingleCellSignalR	lrscore	None	Cabello-Aguilar, S., Alame, M., Kon-Sun-Tack, ...
CellChat	lr_probs	cellchat_pvals	Jin, S., Guerrero-Juarez, C.F., Zhang, L., Cha...
Rank_Aggregate	magnitude_rank	specificity_rank	Dimitrov, D., Türei, D., Garrido-Rodriguez, M....
Geometric Mean	lr_gmeans	gmean_pvals	CellPhoneDBv2's permutation approach applied t...

LIANA classifies interaction scores into two categories: those that infer the Magnitude and Specificity of interactions. The Magnitude of an interactions is a measure of the strength of the interaction’s expression, and the Specificity of an interaction is a measure of how specific is an interaction to a given pair of clusters. Generally, these categories are complementary, and the magnitude of the interaction is a proxy of the specificity of the interaction. For example, a ligand-receptor interaction with a high magnitude score is likely to be specific, and vice versa.

Scoring Functions

We will now describe the mathematical formulation of the magnitude and specificity scores we will use in this tutorial:

Shared Notations

k is the k-th ligand-receptor interaction

L - expression of ligand L

R - expression of receptor R

C - cell cluster

i - cell group i

j - cell group j

M - the library-size normalized and log1p-transformed gene expression matrix

X - normalized gene expression vector

Note that L & R can also denote the interaction of any other CCC event category, such as those that between membrane-bound or extracellular matrix proteins. Furthermore, in the case of complexes L & R denote the summarized* expression of the complex.

CellPhoneDBv2

Magnitude:

\[LRmean_{k,i,j} = \frac{L_{C_{i}} + R_{C_{j}}}{2}\]

Specificity: CellPhoneDBv2 introduced a permutation approach also adapted by other methods, see permutation formulation below.

Geometric Mean

Magnitude:

\[LRgeometric.mean_{k,i,j} = \sqrt{L_{C_{i}} \cdot R_{C_{j}}}\]

Specificity: An adaptation of CellPhoneDBv2’s permutation approach.

CellChat (a resource-agnostic adaptation)

Magnitude:

\[LRprob_{k,i,j} = \frac{TriMean(L_{C_{i}}) \cdot TriMean(R_{C_{j}})}{Kh + TriMean(L_{C_{i}}) \cdot TriMean(R_{C_{j}})}\]

where Kh = 0.5 by default and TriMean represents Tuckey’s Trimean function:

\[TriMean(X) = \frac{Q_{0.25}(X) + 2 \cdot Q_{0.5}(X) + Q_{0.75}(X)}{4}\]

Note that the original CellChat implementation also uses information of mediator proteins, which is specific to the CellChat resource. Since we can use any resource with liana, by default liana’s consensus resource, we will not use this information, and hence the implementation of CellChat’s LR_probability in LIANA was simplified to be resource-agnostic.

Specificity: An adaptation of CellPhoneDBv2’s permutation approach.

The specificity scores of these three method is calculated as follows:

\[p\text{-value}_{k,i,j} = \frac{1}{P} \sum_{p=1}^{P} [fun_{permuted}(L^*_{C_{i}}, R^*_{C_{j}}) \geq fun_{observed}(L^*_{C_{i}}, R^*_{C_{j}})]\]

where P is the number of permutations, and L* and R* are ligand and receptor expression summarized according by each method, i.e. arithmetic mean for CellPhoneDB and Geometric Mean, and TriMean for CellChat.

SingleCellSignalR

Magnitude:

\[LRscore_{k,i,j} = \frac{\sqrt{L_{C_{i}} R_{C_{j}}}}{\sqrt{L_{C_{i}} R_{C_{j}}} + \mu}\]

where mu is the mean of the expression matrix M

NATMI

Magnitude:

\[LRproduct_{k,i,j} = L_{C_{i}} R_{C_{j}}\]

Specificity:

\[SpecificityWeight_{k,i,j} = \frac{L_{C_{i}}}{\sum^{n} L_{C_{i}}} \cdot \frac{R_{C_{j}}}{\sum^{n} R_{C_{j}}}\]

Connectome

Magnitude:

\[LRproduct_{k,i,j} = L_{C_{i}} R_{C_{j}}\]

Specificity:

\[LRz.mean_{k,i,j} = \frac{z_{L_{C_{i}}} + z_{R_{C_{j}}}}{2}\]

where z is the z-score of the expression matrix M:

\[X_{z} = (X - mean(X)) / std(X)\]

log2FC

Specificity:

\[LRlog2FC_{k,i,j} = \frac{\text{Log2FC}_{C_i,L} + \text{Log2FC}_{C_j,R}}{2}\]

where log2FC for each gene is calculated as:

\[log2FC = \log_2\left(\text{mean}(X_i)\right) - \log_2\left(\text{mean}(X_{\text{not}_i})\right)\]

What the above equations show is that there are many commonalities between the different methods, yet there are also many variations in the way the magnitude and specificity scores are calculated.

\[I \left\{ L_{C_j}^{expr.prop} \geq 0.1 \text{ and } R_{C_j}^{expr.prop} \geq 0.1 \right\}\]

where liana considers interactions as occurring only if the ligand and receptor, and all of their subunits, are expressed by default in at least 0.1 of the cells (>= expr_prop) in cell clusters i, j. Any interactions that don’t pass these criteria are not returned by default, to return them the user can check the return_all_lrs parameter.

Score Distributions

[8]:

# pick a sample to infer the communication scores for
sample_name = 'C100'
sadata = adata[adata.obs['sample']==sample_name]
sadata

[8]:

View of AnnData object with n_obs × n_vars = 2550 × 24798
    obs: 'sample', 'sample_new', 'group', 'disease', 'hasnCoV', 'cluster', 'celltype', 'condition', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_genes'
    var: 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'celltype_colors', 'hvg', 'log1p', 'neighbors', 'pca', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'connectivities', 'distances'

The parameters that we will use are the following: - groupby corresponds to the cell group label stored in adata.obs. - use_raw is boolean that indicates whether to use the adata.raw slot, here the log-normalized counts are assigned to adata.X, other options include passing the name of a layer via the layer parameter or using the counts stored in adata.raw. - expr_prop is the expression proportion threshold (in terms of cells per cell type expressing the protein)threshold of expression for any protein subunit involved in the interaction, according to which we keep or discard the interactions. - min_cells is the minimum number of cells per cell type required for a cell type to be considered in the analysis - verbose is a boolean that indicates whether to print the progress of the function

(Other parameters are described in the documentation of the function, as well as in more detail below)

[9]:

%%time
li.method.rank_aggregate(sadata,
                           groupby='celltype',
                           resource_name = 'consensus',
                           expr_prop=0.1, # must be expressed in expr_prop fraction of cells
                           min_cells = 5,
                           n_perms = 1000,
                           use_raw = False, # run on log- and library-normalized counts
                           verbose = True,
                           inplace = True
                          )

Using `.X`!
5580 features of mat are empty, they will be removed.
The following cell identities were excluded: Plasma
Using resource `consensus`.
0.33 of entities in the resource are missing from the data.

Generating ligand-receptor stats for 2548 samples and 19218 features
Assuming that counts were `natural` log-normalized!
Running CellPhoneDB

100%|██████████████████████████████████████| 1000/1000 [00:03<00:00, 263.17it/s]

Running Connectome
Running log2FC
Running NATMI
Running SingleCellSignalR
Running CellChat

100%|███████████████████████████████████████| 1000/1000 [00:39<00:00, 25.45it/s]

CPU times: user 50.7 s, sys: 490 ms, total: 51.2 s
Wall time: 50.9 s

LIANA’s results are by default save to the liana_res slot in adata.uns

[10]:

liana_res = sadata.uns['liana_res'].copy()
liana_res.to_csv(os.path.join(output_folder, sample_name + '_aggregate_scores.csv'))
# only keep those that are not liana's ranks
liana_res.head()

[10]:

	source	target	ligand_complex	receptor_complex	lr_means	expr_prod	scaled_weight	lr_logfc	spec_weight	lrscore	lr_probs	specificity_rank	magnitude_rank
5665	NK	NK	B2M	KLRC1	3.738849	11.625175	1.746263	1.780719	0.093731	0.958522	0.278479	0.000145	1.314594e-10
5756	T	NK	B2M	KLRC1	3.751150	11.679413	1.770161	1.832493	0.094169	0.958614	0.277996	0.000143	4.431054e-10
5645	NK	NK	B2M	KLRD1	3.709523	11.315901	2.054310	1.865720	0.103861	0.957982	0.271253	0.000091	2.053588e-09
5731	T	NK	B2M	KLRD1	3.721824	11.368696	2.078208	1.917494	0.104346	0.958076	0.270777	0.000091	3.544038e-09
6624	NK	T	B2M	CD3D	3.706217	11.281032	1.257045	1.837345	0.069132	0.957920	0.267706	0.001595	6.921164e-09

Here, we can see all of the different scores for each method, as well as the consensus scores for magnitude and specificity.

As a reminder, each output score (column names in the above dataframe) can be mapped back to the scoring method and scoring type (i.e., magnitude vs specificity):

[11]:

li.method.show_methods()

[11]:

Method Name	Magnitude Score	Specificity Score	Reference
CellPhoneDB	lr_means	cellphone_pvals	Efremova, M., Vento-Tormo, M., Teichmann, S.A....
Connectome	expr_prod	scaled_weight	Raredon, M.S.B., Yang, J., Garritano, J., Wang...
log2FC	None	lr_logfc	Dimitrov, D., Türei, D., Garrido-Rodriguez, M....
NATMI	expr_prod	spec_weight	Hou, R., Denisenko, E., Ong, H.T., Ramilowski,...
SingleCellSignalR	lrscore	None	Cabello-Aguilar, S., Alame, M., Kon-Sun-Tack, ...
CellChat	lr_probs	cellchat_pvals	Jin, S., Guerrero-Juarez, C.F., Zhang, L., Cha...
Rank_Aggregate	magnitude_rank	specificity_rank	Dimitrov, D., Türei, D., Garrido-Rodriguez, M....
Geometric Mean	lr_gmeans	gmean_pvals	CellPhoneDBv2's permutation approach applied t...

To provide a better illustration of the different scores, we will now plot the distribution of scores for each of the methods.

[12]:

# convert to long format by index, and each score and value in different columns
liana_res = liana_res.loc[:, liana_res.columns[~liana_res.columns.str.contains(pat = 'rank')]]
liana_res = liana_res.melt(id_vars=['source', 'target', 'ligand_complex', 'receptor_complex'], var_name='score', value_name='value')

liana_res['score'] = liana_res['score'].astype('category')

[13]:

(p9.ggplot(liana_res, p9.aes(x='value', fill='score')) +
 p9.geom_density(alpha=0.5) +
 p9.facet_wrap('~score', scales='free') +
 p9.theme_bw() +
 p9.theme(figure_size=(8, 8))
 )

Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

../../_images/notebooks_ccc_python_02-Infer-Communication-Scores_40_1.png

[13]:

<Figure Size: (800 x 800)>

We can also observe the consistency between, for example, the magnitude scores between the various methods. Let’s do so by taking the correlation between them:

[14]:

magnitude_scores = ['lr_means', 'expr_prod', 'lrscore', 'lr_probs']
liana_corr = sadata.uns['liana_res'].copy()[magnitude_scores].corr(method = 'spearman')

[15]:

cm = c2c.plotting.clustermap_cci(liana_corr,
                                 method='ward',
                                 optimal_leaf=True,
                                 metadata=None,
                                 title='',
                                 cbar_title='Similarity',
                                 cmap='Blues_r',
                                 vmax=1.,
#                                  vmin=0.,
                                 annot=True,
                                 dendrogram_ratio=0.15,
                                figsize=(4,5))

font = matplotlib.font_manager.FontProperties(weight='bold', size=7)
for ax in [cm.ax_heatmap, cm.ax_cbar]:
    for tick in ax.get_xticklabels():
        tick.set_fontproperties(font)
    for tick in ax.get_yticklabels():
        tick.set_fontproperties(font)

    text = ax.yaxis.label
    text.set_font_properties(font)

Interaction space detected as a distance matrix

../../_images/notebooks_ccc_python_02-Infer-Communication-Scores_43_1.png

As can be seen, for this sample, CellChat has the most dissimilar communication scores.

This show the extent to which each of the scoring functions differ from each other. Unsurprisingly, independent evaluations have shown that the choice of method and/or resource leads to limited consensus in inferred predictions when using different tools (Dimitrov et al., 2022, Liu et al., 2022, Wang et al., 2022). To this end, we have implemented a consensus score that is a rank aggregate of the scores obtained by the different methods, and for the sake of these tutorials we will use this consensus score to rank the interactions, specifically the one that aggregates the magnitude scores from different functions. Aggregate scores are also non-negative, which is beneficial for decomposition with Tensor-cell2cell.

To further explore the effects of scoring methods, see Supplementary Tutorial S3)

LIANA’s Rank Aggregate

LIANA calculates an aggregate rank for both magnitude and specificity as defined above. The rank_aggregate function of liana uses a re-implementation of the RobustRankAggregate method by Kolde et al., 2012, and generates a probability distribution for ligand-receptors that are ranked consistently better than expected under a null hypothesis. It thus provides a consensus of the rank of the ligand-receptor interactions across methods, that can also be treated as a p-value.

In more detail, a rank aggregate is calculated for the magnitude and specificity scores from the methods in LIANA as follows:

First, a normalized rank matrix[0,1] is generated separately for magnitude and specificity as:

\[r_{i,j} = \frac{rank_{i,j}}{\max(rank_i)} \quad (1 \leq i \leq m, 1 \leq j \leq n)\]

where m is the number of score rank vectors, n is the length of each score vector (number of interactions), ranki,j is the rank of the j-th element (interaction) in the i-th score rank vector, and max(ranki) is the maximum rank in the i-th rank vector.

For each normalized rank vector r, we then ask how probable is it to obtain rnull(k) <= r(k), where rnull(k) is a rank vector generated under the null hypothesis. The RobustRankAggregate method expresses Kolde et al., 2012 the probability rnull(k) <= r(k) as βk,n(r), through a beta distribution. This entails that we obtain probabilities for each score vector r as:

\[p(r) = \underset{1, ..., n}{min} \beta_k,_n(r) * n\]

where we take the minimum probability ρ for each interaction across the score vectors, and we apply a Bonferroni correction to the p-values by multiplying them by n to account for multiple testing.

Single-Sample Dotplot

Now that we are familiar with the what LIANA’s aggregate scores are, let’s generate a basic dotplot with the most highly-ranked ligand-receptor interactions for this sample.

[16]:

li.pl.dotplot(
    adata=sadata,
    colour="magnitude_rank",
    size="specificity_rank",
    inverse_colour=True,  # we inverse sign since we want small p-values to have large sizes
    inverse_size=True,
    # We choose only the cell types which we wish to plot
    source_labels=["B", "pDC", "Macrophages"],
    target_labels=["T", "Mast", "pDC", "NK"],
    # since the rank_aggregate can also be interpreted as a probability distribution
    # we can again filter them according to their specificity significance
    # yet here the interactions are filtered according to
    # how consistently highly-ranked is their specificity across the methods
    filterby="specificity_rank",
    filter_lambda=lambda x: x <= 0.05,
    # again, we can also further order according to magnitude
    orderby="magnitude_rank",
    orderby_ascending=True,  # prioritize those with lowest values
    top_n=20,  # and we want to keep only the top 20 interactions
    figure_size=(9, 5),
    size_range=(1, 6),
)

Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

../../_images/notebooks_ccc_python_02-Infer-Communication-Scores_52_1.png

[16]:

<Figure Size: (900 x 500)>

Great! We have now obtained ligand-receptor predictions for a single sample. What we see here is that interactions are predicted across cell types and are typically specific to pairs of cell types.

Note that missing dots here would represent interactions for which the ligand and receptor are expressed below the expr_prop threshold.

Run LIANA by Sample

Now that we have familiarized ourselves with how ligand-receptor methods work and how the results look by sample, let’s run LIANA’s rank_aggregate on all of the samples in the dataset. These results will be used to generate a tensor of ligand-receptor interactions across contexts that will be decomposed into CCC patterns by Tensor-Cell2cell.

This is easily done with liana with the internal to the method instances function called by_sample. So, we will use the same parameters as before, but we will also add the by_sample parameter, which is a string that indicates the column in the adata.obs dataframe.

We relax the expr_prop parameter to have more consistent presence of LR interactions across contexts. See further discussions on the how parameter in Tutorial 03.

[17]:

%%time
li.mt.rank_aggregate.by_sample(adata,
                               sample_key='sample_new',
                               groupby='celltype',
                               resource_name = 'consensus',
                               expr_prop=0.1, # must be expressed in expr_prop fraction of cells
                               min_cells = 5,
                               n_perms = 100,
                               use_raw = False, # run on log- and library-normalized counts
                               verbose = True,
                               inplace = True
                              )

Now running: S6: 100%|██████████████████████████| 12/12 [03:38<00:00, 18.20s/it]

CPU times: user 3min 29s, sys: 10.3 s, total: 3min 40s
Wall time: 3min 38s

Check the results

Here, we can see that the results look very similar as before with the exception that we have an additional column for the sample name, and the results are now by sample.

[18]:

adata.uns['liana_res'].head()

[18]:

	sample_new	source	target	ligand_complex	receptor_complex	lr_means	expr_prod	scaled_weight	lr_logfc	spec_weight	lrscore	lr_probs	specificity_rank	magnitude_rank
0	HC1	Macrophages	NK	B2M	CD3D	3.410504	8.059611	1.300556	1.397895	0.083273	0.961040	0.221495	0.003713	1.698996e-09
1	HC1	T	NK	B2M	CD3D	3.410586	8.059861	1.300856	1.272266	0.083276	0.961041	0.221213	0.003713	6.256593e-09
2	HC1	NK	NK	B2M	CD3D	3.264099	7.614378	0.790913	1.113901	0.078673	0.959963	0.216816	0.006245	2.653267e-08
3	HC1	T	NK	B2M	KLRD1	3.297900	6.865250	6.960920	1.244892	0.171293	0.957924	0.214586	0.000092	9.767878e-08
4	HC1	Macrophages	NK	B2M	KLRD1	3.297818	6.865037	6.960620	1.370520	0.171288	0.957924	0.214861	0.000092	1.086199e-07

We can also generate a dotplot for the most highly-ranked ligand-receptor interactions for each sample. Let’s pick the first two distinct interaction in the list, and see how they look like in the dotplot_by_sample.

[19]:

plot = (li.pl.dotplot_by_sample(adata=adata,
                               colour='magnitude_rank',
                               size='specificity_rank',
                               source_labels=["B", "pDC", "Epithelial"],
                               target_labels=["Macrophages", "Mast", "pDC", "NK"],
                               ligand_complex = ['VIM', 'SCGB3A1'],
                               receptor_complex= ['CD44', 'MARCO'],
                               sample_key='sample_new',
                               inverse_colour=True,
                               inverse_size=True,
                               figure_size=(11, 8),
                               size_range=(1, 6),
                               ) +
         p9.labs(color='Magnitude rank', size='Specificity Rank'))

plot.save(output_folder + '/Dotplot-by-sample.pdf', height=8, width=11)
plot

Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt
Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

../../_images/notebooks_ccc_python_02-Infer-Communication-Scores_61_1.png

[19]:

<Figure Size: (1100 x 800)>

Here, we can already see that the ligand-receptor interactions are not only specific to cell types, but also to samples or contexts (see VIM -> CD44 with pDCs in source cell types). However, we can also see that this plot, even with just two ligand-receptor interactions visualized, starts to get a bit overwhelming. To this end, to make the most use of the hypothesis-free nature of the ligand-receptor interactions, in the next chapter we will use Tensor-Cell2cell to decompose the ligand-receptor interactions into interpretable CCC patterns across contexts.

Save Results for Tensor-Cell2cell

[20]:

adata.uns['liana_res'].to_csv(output_folder + '/LIANA_by_sample.csv', index=False)

Next Chapter: 03-Generate-Tensor.

Supplementary Information about LIANA

Independent Methods

Although, in these tutorials we will use the consensus score, one can also run the different methods independently via LIANA.

[21]:

from liana.method import cellphonedb, cellchat, geometric_mean, singlecellsignalr, natmi, connectome, logfc

Each of these is an instance of Class Method in LIANA and comes with it’s own metadata. For example, let’s examine the metadata of the CellPhoneDBv2 method:

[22]:

cellphonedb.describe()

CellPhoneDB uses `lr_means` and `cellphone_pvals` as measures of expression strength and interaction specificity, respectively

We can also check in what order one should interpret the results from each method:

[23]:

{ cellphonedb.magnitude : cellphonedb.magnitude_ascending }

[23]:

{'lr_means': False}

[24]:

{ cellphonedb.specificity : cellphonedb.specificity_ascending }

[24]:

{'cellphone_pvals': True}

Similarly, we can obtain the reference for the method:

[25]:

cellphonedb.reference

[25]:

'Efremova, M., Vento-Tormo, M., Teichmann, S.A. and Vento-Tormo, R., 2020. CellPhoneDB: inferring cell–cell communication from combined expression of multi-subunit ligand–receptor complexes. Nature protocols, 15(4), pp.1484-1506. '

Key Parameters

We already covered some of the parameters in LIANA, that can be used to customize the results. Here, we will go in more detail over some of the most important ones.

resource and resource_name enable the user to select the resource that they want to use for CCC inference. By default, liana will use the ‘consensus’ resource which combines a number of expert-curated ligand-receptor resources. However, one can also use the ‘cellphonedb’ resource, any of the resources that are available within liana by passing their resource_name (See below). Additionally, the user can pass their own resource via the resource parameter, which expects a pandas DataFrame.

[26]:

li.resource.show_resources()

[26]:

['baccin2019',
 'cellcall',
 'cellchatdb',
 'cellinker',
 'cellphonedb',
 'celltalkdb',
 'connectomedb2020',
 'consensus',
 'embrace',
 'guide2pharma',
 'hpmr',
 'icellnet',
 'italk',
 'kirouac2010',
 'lrdb',
 'mouseconsensus',
 'ramilowski2015']

expr_prop which we also used before is the proportion of cells that need to express a ligand-receptor pair for it to be considered as a potential ligand-receptor pair. This is a parameter that can be used to filter out lowly-expressed ligand-receptor pairs. This is common practice in CCC inference at the cluster level, as we make the assumption that the event occurs for all cells within that clusters. By default, it is assigned to 0.1 - i.e. 10% of cells need to express the ligand-receptor and pair, including all of their subunits, in the corresponding clusters.
return_all_lrs is related to the expr_prop parameter. If return_all_lrs is set to True, then all ligand-receptor pairs will be returned, regardless of whether they are expressed above the expr_prop threshold. When return_all_lrs is set to True, all interactions are returned and for all above expr_prop liana would calculate the score as usual, while the remainder get the worst possible score within the list of scores. Furthermore, a column with the lrs_to_keep flag is added to the adata.uns['liana_res'] dataframe that indicates whether the interaction and any of its members passed the expr_prop threshold. This flag is for example used by the li.multi.to_tensor_c2c described in the next tutorial. This is useful if one wants to use the expr_prop parameter to filter out lowly-expressed ligand-receptor pairs, but still wants to see the scores for all ligand-receptor pairs. By default, it is set to False, which means that only ligand-receptor pairs that are expressed above the expr_prop threshold will be returned.
supp_columns is a list of columns that can be added to the adata.uns['liana_res'] dataframe. These are typically other stats that are calculated for each ligand and receptor, such as the 1vsRest differential expression p-value, or any other stat from another method.

In addition to those parameters liana provides a number of other utility parameters as well as some method-specific ones, please refer to the documentation for more information.

[27]:

?cellphonedb

Alternative Resources

In addition to the resources already provided with LIANA, one can also use their own resource. This can be done by passing a pandas DataFrame to the resource parameter. The DataFrame should have the following columns: ligand, receptor, where subunits are separated by _.

Let’s load the immune-focused resource by Noël et al., 2021 for obtained from https://github.com/LewisLabUCSD/Ligand-Receptor-Pairs:

[28]:

resource = pd.read_csv("../../data/Human-2020-Noël-LR-pairs.csv")
resource.head(6)

[28]:

	Ligand 1	Ligand 2	Receptor 1	Receptor 2	Receptor 3	Alias	Family	Subfamily	Classifications	Source for interaction	PubMed ID	Comments
0	HLA-A	NaN	LILRB1	NaN	NaN	NaN	Antigen binding	NaN	Antigen binding	NaN	9285411; 9382880	NaN
1	HLA-B	NaN	LILRB1	NaN	NaN	NaN	Antigen binding	NaN	Antigen binding	NaN	9285411; 9382880	NaN
2	HLA-C	NaN	LILRB1	NaN	NaN	NaN	Antigen binding	NaN	Antigen binding	NaN	9285411; 9382880	NaN
3	HLA-F	NaN	LILRB1	NaN	NaN	NaN	Antigen binding	NaN	Antigen binding	NaN	9285411; 9382880	NaN
4	HLA-G	NaN	LILRB1	NaN	NaN	NaN	Antigen binding	NaN	Antigen binding	NaN	9285411; 9382880	NaN
5	IGHG1	IGLC1	FCGR3B	NaN	NaN	NaN	Antigen binding	NaN	Antigen binding	NaN	10917521	NaN

Now we need to just change it to be the same format as any other resource in liana, for example:

[29]:

li.resource.select_resource('consensus').head()

[29]:

	ligand	receptor
0	LGALS9	PTPRC
1	LGALS9	MET
2	LGALS9	CD44
3	LGALS9	LRP1
4	LGALS9	CD47

[30]:

# Unite ligand1 and ligand2 into complexes acc to liana input, and do the same for receptor1 and receptor2
resource['ligand'] = resource['Ligand 1'] + ('_' + resource['Ligand 2']).fillna('')
resource['receptor'] = resource['Receptor 1'] + ('_' + resource['Receptor 2']).fillna('')
resource = resource.loc[:, ['ligand', 'receptor']]
resource.head(6)

[30]:

	ligand	receptor
0	HLA-A	LILRB1
1	HLA-B	LILRB1
2	HLA-C	LILRB1
3	HLA-F	LILRB1
4	HLA-G	LILRB1
5	IGHG1_IGLC1	FCGR3B

[31]:

# run with any method, in this case singlecellsignalr
li.mt.singlecellsignalr(sadata, resource=resource, groupby='celltype', use_raw=False, verbose=True)

Using `.X`!
5580 features of mat are empty, they will be removed.
The following cell identities were excluded: Plasma
0.25 of entities in the resource are missing from the data.

Using provided `resource`.
Generating ligand-receptor stats for 2548 samples and 19218 features

[32]:

sadata.uns['liana_res']

[32]:

	ligand	ligand_complex	ligand_means	ligand_props	mat_mean	receptor	receptor_complex	receptor_means	receptor_props	source	target	lrscore
577	MIF	MIF	0.950994	0.666667	0.147544	CD74	CD74	4.767055	1.000000	Mast	mDC	0.935195
724	MIF	MIF	0.950994	0.666667	0.147544	CD74	CD74	4.359042	1.000000	Mast	pDC	0.932430
228	MIF	MIF	0.950994	0.666667	0.147544	CD74	CD74	4.029520	0.993441	Mast	Macrophages	0.929911
37	MIF	MIF	0.950994	0.666667	0.147544	CD74	CD74	3.754743	0.888889	Mast	B	0.927575
605	MIF	MIF	0.664859	0.354839	0.147544	CD74	CD74	4.767055	1.000000	NK	mDC	0.923466
...	...	...	...	...	...	...	...	...	...	...	...	...
559	JAG1	JAG1	0.062933	0.111504	0.147544	NOTCH1	NOTCH1	0.167743	0.194175	Macrophages	mDC	0.410506
169	MIF	MIF	0.159508	0.111111	0.147544	CXCR2	CXCR2	0.062348	0.118063	B	Macrophages	0.403305
298	CXCL8	CXCL8	0.155774	0.166667	0.147544	CXCR2	CXCR2	0.062348	0.118063	pDC	Macrophages	0.400458
94	IL1A	IL1A	0.084155	0.147830	0.147544	IL1R1	IL1R1_IL1RAP	0.086664	0.153846	Macrophages	Epithelial	0.366612
198	ICOSLG	ICOSLG	0.058154	0.109990	0.147544	ICOS	ICOS	0.090905	0.160949	Macrophages	Macrophages	0.330115

725 rows × 12 columns

Provided an appropriate resource, one could use liana with any ID category and any species, as long as the IDs of the resource match the .var_names in the adata object.