5. Downstream Analyses of Factor Loadings

Here we use the loadings obtained with Tensor-cell2cell for each element in their respective dimensions (contexts, ligand-receptor pairs, sender cells, receiver cells). These loadings represent the importance of each element in a given factor or latent pattern. We can use this information to gain more insights about the biological processes underlying each of the communication patterns.

[1]:
library(liana, quietly = TRUE)

library(dplyr, quietly = TRUE)
library(tibble, quietly = TRUE)

library(patchwork, quietly = TRUE)
library(ggplot2, quietly = TRUE)

library(purrr, quietly = TRUE)
library(forcats, quietly = TRUE)
library(tidyr, quietly = TRUE)
library(rstatix, quietly = TRUE)
library(ggpubr, quietly = TRUE)

library(reticulate, quietly = TRUE)
use_condaenv('ccc_protocols')
c2c <- reticulate::import(module = "cell2cell")

Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



Attaching package: ‘rstatix’


The following object is masked from ‘package:stats’:

    filter


Directories

[2]:
output_folder <- file.path('..', '..', 'data', 'tc2c-outputs')
data_path <- file.path('..', '..', 'data')

Load Data

Load the SingleCellExperiment object with the stored loadings obtained from the tensor factorization

[3]:
covid_data <- readRDS(file.path(data_path, 'covid_data_sce.rds'))

Downstream Analyses

Boxplots to statistically compare group of samples/contexts

The statistical test can be changed by modifying the parameters. An FDR correction is applied to the p-values.

statistical_test='t-test_ind'

pval_correction='benjamini-hochberg'

[4]:
sample_loadings <- liana::get_c2c_factors(covid_data,
                                          sample_col='sample_new',
                                          group_col='condition') %>%
                                          purrr::pluck("contexts") %>%
                                          pivot_longer(-c("condition", "context"),
                                                names_to = "factor",
                                                values_to = "loadings") %>%
                                          mutate(factor = factor(factor)) %>%
                                          mutate(factor = fct_relevel(factor, "Factor.10", after=9))
[5]:
h_ = 10
w_ = 15
options(repr.plot.height=h_, repr.plot.width=w_)

# Do pairwise tests
pwc <- sample_loadings %>%
    group_by(factor) %>%
    rstatix::pairwise_t_test(
      loadings ~ condition,
      p.adjust.method = "fdr"
      )

# plot
pwc <- pwc %>% add_xy_position(x = "condition")
ggboxplot(sample_loadings, x = "condition", y = "loadings", add = "point", fill="condition") +
    facet_wrap(~factor, ncol = 4) +
    stat_pvalue_manual(pwc) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
../../_images/notebooks_ccc_R_05-Downstream-Visualizations_9_0.png

Here, we see that the communication patterns (or context loadings) identified statistically significant patterns when comparing each pair of COVID-19 severities.

These factors thus represent differences in the ligand-receptor interactions as well as cell types participating in cell-cell communication among the different COVID-19 severities.

Correlation between Context Loadings & COVID-19 Severity

We first define the ranking of the severity, where higher values represent the worst severity.

[6]:
severity_rank<-function(x){
    if (x == 'Control'){
        ranking = 1
    }else if (x == 'Moderate COVID-19'){
        ranking = 2
    }else if (x == 'Severe COVID-19'){
        ranking = 3
    }
}
[7]:
factors <- liana::get_c2c_factors(sce = covid_data,
                            sample_col = "sample_new",
                            group_col = "condition")
sev_rank <- data.frame(condition = factors$contexts$condition,
                        severity.ranking = sapply(factors$contexts$condition, function(x) severity_rank(x)))
head(sev_rank)
A data.frame: 6 × 2
conditionseverity.ranking
<fct><dbl>
1Control 1
2Control 1
3Control 1
4Moderate COVID-192
5Moderate COVID-192
6Moderate COVID-192

Number of factors from the tensor decomposition

[8]:
nfactors <- dim(factors$contexts %>% dplyr::select(starts_with("Factor")))[[2]]

Spearman correlation

[9]:
for (i in 1:nfactors){
    factor <- paste0('Factor.', i)
    suppressWarnings({
        s.cor<-cor.test(factors$contexts[[factor]], sev_rank$severity.ranking,
            method = 'spearman')
    })

    s.r<-format(round(s.cor$estimate[['rho']], 2), nsmall = 2)
    s.p<-format(round(s.cor$p.value, 3), nsmall = 3)
    print(paste0(factor, ' has a Spearman correlation of ', s.r,
                 ' with an associated p-value of ', s.p))

}
[1] "Factor.1 has a Spearman correlation of -0.85 with an associated p-value of 0.000"
[1] "Factor.2 has a Spearman correlation of 0.88 with an associated p-value of 0.000"
[1] "Factor.3 has a Spearman correlation of 0.41 with an associated p-value of 0.186"
[1] "Factor.4 has a Spearman correlation of -0.10 with an associated p-value of 0.751"
[1] "Factor.5 has a Spearman correlation of 0.42 with an associated p-value of 0.173"
[1] "Factor.6 has a Spearman correlation of 0.02 with an associated p-value of 0.944"
[1] "Factor.7 has a Spearman correlation of -0.15 with an associated p-value of 0.646"
[1] "Factor.8 has a Spearman correlation of 0.79 with an associated p-value of 0.002"
[1] "Factor.9 has a Spearman correlation of 0.85 with an associated p-value of 0.000"
[1] "Factor.10 has a Spearman correlation of 0.83 with an associated p-value of 0.001"

Heatmaps

Cluster samples/contexts by their importance across factors

Here we provide a example on clustering samples given their respective loadings, colored by category of COVID-19 condition:

[10]:
liana::plot_context_heat(sce = covid_data,
                sample_col = "sample_new",
                group_col = "condition")
../../_images/notebooks_ccc_R_05-Downstream-Visualizations_18_0.png

Here patients are grouped by the importance that each communication pattern (factor) has in relation to the other patients. This captures combinations of related communication patterns that explain similarities and differences at a sample-specific resolution. In this case the differences are reflected with an almost perfect clustering by COVID-19 severity, where moderate cases are more similar to control cases.

Cluster LR pairs by their importance across factors

To consider important LR pairs in at least one factor, we can set a threshold to filter for the top n (fraction or number) of LR pairs. This value can be obtained in the way that user thinks is pertinent. Here we use n=05 to filter for the top 5 LR pairs per factor

[11]:
liana::plot_lr_heatmap(sce = covid_data,  n = 5)
../../_images/notebooks_ccc_R_05-Downstream-Visualizations_20_0.png

In this case, LR pairs are grouped according to how important they for one factor versus the others. This can give an idea of the molecular mechanisms that are crucial in each communication pattern.

Cluster sender-receiver pairs for an specific factor

Factor 8 is associated with COVID-19 cases (see boxplots). Thus, it would be interesting to determine the cellular interactions underlying this communicatino pattern. To do so, we define its Loading Product between senders and receiver cells (outer product of loadings for that factor), which represents the joint importance of the cell-cell pair:

[12]:
selected_factor <- 'Factor.8'
liana::plot_c2c_cells(sce = covid_data,
                   factor_of_int = selected_factor,
                   name = "Loadings \nProduct")
../../_images/notebooks_ccc_R_05-Downstream-Visualizations_22_0.png

Overall CCI potential: Network visualization of sender-receiver cell pairs

An interaction network can be created for each factor by using the loading product between sender and receiver cells. First we need to choose a threshold to indicate what pair of cells are interacting. To do so, we need to get all the outer products between the loadings for the sender and receiver cells dimensions across all factors.

[13]:
head(factors[['interactions']])
A tibble: 6 × 11
lrFactor.1Factor.2Factor.3Factor.4Factor.5Factor.6Factor.7Factor.8Factor.9Factor.10
<fct><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
A2M^LRP1 2.014718e-046.177029e-070.00010184092.616977e-031.016893e-065.631589e-071.216602e-013.608764e-021.469622e-041.185005e-04
ACTR2^ADRB2 3.939710e-023.842080e-020.01412205304.564203e-021.393886e-023.178997e-021.182422e-021.927887e-022.016811e-051.158559e-08
ACTR2^LDLR 1.093640e-041.604342e-160.01161694432.514870e-024.178628e-103.704964e-022.194405e-101.038034e-091.593195e-036.626642e-02
ADAM10^AXL 4.680507e-051.702355e-140.00155329483.183815e-021.170693e-024.584032e-025.565283e-028.909291e-081.652256e-118.912689e-11
ADAM10^CADM12.598824e-043.258244e-080.00115001104.352961e-103.441869e-021.577997e-044.419620e-041.123850e-031.777034e-051.144090e-06
ADAM10^CD44 5.699971e-032.898187e-020.02882838256.858978e-036.697052e-021.096958e-021.019256e-015.999368e-023.157115e-028.152719e-03
[14]:
# format for use with cell2cell
format_py_df<-function(df){
    df<-df %>%
    as.data.frame() %>%
    textshape::column_to_rownames() %>%
    select(-contains("condition"))

    return(df)
}
py_factors<-unname(lapply(factors, function(df) format_py_df(df)))
py_factors = reticulate::py_dict(keys = names(factors), values = py_factors)
[15]:
# Get all outer products as adjacency matrices, one per factor
networks <- c2c$analysis$tensor_downstream$get_factor_specific_ccc_networks(py_factors,
                                                                            sender_label = 'senders',
                                                                            receiver_label = 'receivers')

# Then, flatten the adjacency matrices
network_by_factors = c2c$analysis$tensor_downstream$flatten_factor_ccc_networks(networks, orderby='receivers')

# And we can plot the distributions of the weights for each factor-specific network
ggplot(data.frame(weights = unlist(unname(as.vector(network_by_factors)))), aes(x=weights)) +
geom_histogram(bins = 50) + theme_bw() + ylab('')

../../_images/notebooks_ccc_R_05-Downstream-Visualizations_27_0.png

With this distribution we can set a reasonable threshold. We can also apply some mathematical methods to select one, as for example choosing a percentile. Here we set a threshold of 0.075.

[16]:
threshold = 0.075

Then, we can plot all networks we are interested in:

[17]:
c2c$plotting$ccc_networks_plot(py_factors,
                               included_factors=c('Factor.2', 'Factor.8', 'Factor.9'),
                               sender_label='senders',
                               receiver_label='receivers',
                               ccc_threshold=threshold, # Only important communication
                               nrows=as.integer(1),
                               filename=file.path(data_path, 'tc2c-outputs', 'network_plot_05_r.png')
                              )
[[1]]
<Figure size 2400x800 with 3 Axes>

[[2]]
[[2]][[1]]
<Axes: title={'center': 'Factor.2'}>

[[2]][[2]]
<Axes: title={'center': 'Factor.8'}>

[[2]][[3]]
<Axes: title={'center': 'Factor.9'}>


network_plot_05

Gini coefficients of the factor-specific cell-cell communication

The Gini coefficient quantifies the extent to which sender-receiver cell pairs differ within a factor according to their importance as measured by their loadings. Here it is use to measure the imbalance of communication for a given factor. In other words, it identifies whether a few cells are key for an identified pattern (low Gini coefficient) or whether the pattern corresponds to a general process that involves most of the cell types (high Gini coefficient). Gini coefficients range between [0, 1].

Note that here we show ginis seperately for sender and receiver cell types per factor, while in Python we show the combined gini for sender and receiver cell types

[18]:
# Get loadings for source/sender Cell types
calculate_gini(factors$senders)
A tibble: 10 × 2
factorgini
<chr><dbl>
Factor.1 0.2662005
Factor.100.1829162
Factor.2 0.1913923
Factor.3 0.4460056
Factor.4 0.4506584
Factor.5 0.6984962
Factor.6 0.6161417
Factor.7 0.2631348
Factor.8 0.0631097
Factor.9 0.6547246
[19]:
# Get loadings for source/sender Cell types
calculate_gini(factors$receiver)
A tibble: 10 × 2
factorgini
<chr><dbl>
Factor.1 0.08451066
Factor.100.47924770
Factor.2 0.10965274
Factor.3 0.13653756
Factor.4 0.09516597
Factor.5 0.14736680
Factor.6 0.10340692
Factor.7 0.50656809
Factor.8 0.69806214
Factor.9 0.05691588

For senders, we observe that the lowest Gini coefficient was obtained by Factor 2, which seems to have more balanced cell loadings. Similarly, for receivers, the lowest Gini coefficient is in Factor 10. For each of these factors respectively, senders and receivers are more similar or homogeneous.