5. Downstream Analyses of Factor Loadings

Here we use the loadings obtained with Tensor-cell2cell for each element in their respective dimensions (contexts, ligand-receptor pairs, sender cells, receiver cells). These loadings represent the importance of each element in a given factor or latent pattern. We can use this information to gain more insights about the biological processes underlying each of the communication patterns.

[1]:
import cell2cell as c2c

import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
[2]:
import warnings
warnings.filterwarnings('ignore')

Directories

[3]:
output_folder = '../../data/tc2c-outputs/'
c2c.io.directories.create_directory(output_folder)
../../data/tc2c-outputs/ already exists.

Load Data

Load the loadings obtained from the tensor factorization

[4]:
factors = c2c.io.load_tensor_factors(output_folder + '/Loadings.xlsx')

Dictionary with the condition of each sample/context

This was generated when preprocessing the data in the Notebook to Generate the 4D-Tensor.

[5]:
from collections import defaultdict

element_dict = defaultdict(lambda: 'Unknown')
[6]:
context_dict = element_dict.copy()

context_dict.update({'HC1' : 'Control',
                     'HC2' : 'Control',
                     'HC3' : 'Control',
                     'M1' : 'Moderate COVID-19',
                     'M2' : 'Moderate COVID-19',
                     'M3' : 'Moderate COVID-19',
                     'S1' : 'Severe COVID-19',
                     'S2' : 'Severe COVID-19',
                     'S3' : 'Severe COVID-19',
                     'S4' : 'Severe COVID-19',
                     'S5' : 'Severe COVID-19',
                     'S6' : 'Severe COVID-19',
                    })

Downstream Analyses

Boxplots to statistically compare group of samples/contexts

The statistical test and the multiple test correction can be changed by modifying the parameters

statistical_test='t-test_ind'

pval_correction='benjamini-hochberg'

[7]:
groups_order = ['Control', 'Moderate COVID-19', 'Severe COVID-19']
fig_filename = output_folder + '/BALF-Severity-Boxplots.pdf'

_ = c2c.plotting.context_boxplot(context_loadings=factors['Contexts'],
                                 metadict=context_dict,
                                 nrows=3,
                                 figsize=(16, 12),
                                 group_order=groups_order,
                                 statistical_test='t-test_ind',
                                 pval_correction='fdr_bh',
                                 cmap='plasma',
                                 verbose=False,
                                 filename=fig_filename
                                )
../../_images/notebooks_ccc_python_05-Downstream-Visualizations_13_0.png

Here, we see that the communication patterns (or context loadings) identified statistically significant patterns when comparing each pair of COVID-19 severities.

These factors thus represent differences in the ligand-receptor interactions as well as cell types participating in cell-cell communication among the different COVID-19 severities.

Correlation between Context Loadings & COVID-19 Severity

We first define the ranking of the severity, where higher values represent the worst severity.

[8]:
def severity_rank(x):
    if x == 'Control':
        ranking = 1
    elif x == 'Moderate COVID-19':
        ranking = 2
    elif x == 'Severe COVID-19':
        ranking = 3
    return ranking

We generate a dataframe with the same indexes as the dataframe containing the loadings, then assign those rankings here.

[9]:
sev_rank = pd.DataFrame(index=factors['Contexts'].index)
sev_rank['Severity-Ranking'] = [severity_rank(context_dict[idx]) for idx in factors['Contexts'].index]
sev_rank.head()
[9]:
Severity-Ranking
HC1 1
HC2 1
HC3 1
M1 2
M2 2

Number of factors from the tensor decomposition

[10]:
nfactors = factors['Contexts'].shape[1]

Spearman correlation

[11]:
import scipy
for i in range(1, nfactors+1):
    factor = 'Factor {}'.format(i)
    print(factor, scipy.stats.spearmanr(factors['Contexts'][factor],
                                sev_rank['Severity-Ranking']))
Factor 1 SignificanceResult(statistic=-0.47795211939924326, pvalue=0.11604597293054225)
Factor 2 SignificanceResult(statistic=-0.3869136204660541, pvalue=0.21405134607765433)
Factor 3 SignificanceResult(statistic=0.7396878038321621, pvalue=0.005962089821850848)
Factor 4 SignificanceResult(statistic=0.47795211939924326, pvalue=0.11604597293054225)
Factor 5 SignificanceResult(statistic=0.8534859274986488, pvalue=0.0004135604992741732)
Factor 6 SignificanceResult(statistic=-0.31863474626616217, pvalue=0.3127697121445725)
Factor 7 SignificanceResult(statistic=0.6258896801656757, pvalue=0.02947559079163286)
Factor 8 SignificanceResult(statistic=0.11379812366648649, pvalue=0.7247294545217076)
Factor 9 SignificanceResult(statistic=0.2503558720662703, pvalue=0.4325597712381267)
Factor 10 SignificanceResult(statistic=0.6714089296322702, pvalue=0.01681347928292643)

Heatmaps

Cluster samples/contexts by their importance across factors

Here we provide a example on clustering samples (colored by COVID-19 severity). If you plan adding colors for additional data and patient’s properties. A detailed guide can be found here: https://www.chrisremmel.com/post/seaborn-color-labels/

Once you have a dataframe or a dictionary for coloring patients/samples, the parameter col_colors should be changed. Make sure to use the same sample labels as in factors['Contexts'].index

[12]:
# Generate color by COVID-19 severity for each sample
condition_colors = c2c.plotting.aesthetics.get_colors_from_labels(['Control', 'Moderate COVID-19', 'Severe COVID-19'],
                                                                  cmap='plasma')

# Map these colors to each sample name
color_dict = {k : condition_colors[v] for k, v in context_dict.items()}

# Generate a dataframe used as input for the clustermap
col_colors = pd.Series(color_dict)
col_colors = col_colors.to_frame()
col_colors.columns = ['COVID-19 Severity']

How the color dataframe looks like. If you want to show multiple properties, add new columns with the names you want to display them, and the respective colors.

[13]:
col_colors.head()
[13]:
COVID-19 Severity
HC1 (0.050383, 0.029803, 0.527975, 1.0)
HC2 (0.050383, 0.029803, 0.527975, 1.0)
HC3 (0.050383, 0.029803, 0.527975, 1.0)
M1 (0.610667, 0.090204, 0.619951, 1.0)
M2 (0.610667, 0.090204, 0.619951, 1.0)

Now generate the clustermap of samples given their respective loadings, colored by category of COVID-19 condition.

[14]:
sample_cm = c2c.plotting.loading_clustermap(factors['Contexts'],
                                            use_zscore=False, # Whether standardizing the loadings across factors
                                            col_colors=col_colors, # Change this to color by other properties
                                            figsize=(16, 6),
                                            dendrogram_ratio=0.3,
                                            cbar_fontsize=12,
                                            tick_fontsize=14,
                                            filename=output_folder + 'Clustermap-Contexts.pdf'
                                           )

#Insert legend
plt.sca(sample_cm.ax_heatmap)
legend = c2c.plotting.aesthetics.generate_legend(color_dict=condition_colors,
                                                 bbox_to_anchor=(1.1, 0.5), # Position of the legend (X, Y)
                                                 title='COVID-19 Severity'
                                                )
../../_images/notebooks_ccc_python_05-Downstream-Visualizations_29_0.png

Here patients are grouped by the importance that each communication pattern (factor) has in relation to the other patients. This captures combinations of related communication patterns that explain similarities and differences at a sample-specific resolution. In this case the differences are reflected with a clear clustering by COVID-19 severity, with some interferance of a few severe samples more similar to moderate cases.

Cluster LR pairs by their importance across factors

To consider important LR pairs in at least one factor, set a loading_threshold. This value can be adjusted to be more stringent control the number of ligand-receptor pairs. Here we use loading_threshold=0.1 to prioritize the interactions with high loadings. If we use loading_threshold=0, we would return all of pairs.

[15]:
lr_cm = c2c.plotting.loading_clustermap(factors['Ligand-Receptor Pairs'],
                                        loading_threshold=0.1, # To consider only top LRs
                                        use_zscore=False, # Whether standardizing the loadings across factors
                                        figsize=(28, 8),
                                        filename=output_folder + 'Clustermap-LRs.pdf',
                                        row_cluster=False # To avoid clustering of factors
                                       )
../../_images/notebooks_ccc_python_05-Downstream-Visualizations_32_0.png

In this case, LR pairs are grouped according to how important they for one factor versus the others. This can give an idea of the molecular mechanisms that are crucial in each communication pattern.

Cluster sender-receiver pairs for an specific factor

Factor 3 is associated with severe COVID-19 cases (see boxplots). Thus, it would be interesting to determine the cellular interactions underlying this communication pattern. To do so, we define its Loading Product between senders and receiver cells (outer product of loadings for that factor), which represents the joint importance of the cell-cell pair:

[16]:
selected_factor = 'Factor 3'
[17]:
loading_product = c2c.analysis.tensor_downstream.get_joint_loadings(factors,
                                                                    dim1='Sender Cells',
                                                                    dim2='Receiver Cells',
                                                                    factor=selected_factor,
                                                                   )
[18]:
lprod_cm = c2c.plotting.loading_clustermap(loading_product.T, # Remove .T to transpose the axes
                                           use_zscore=False, # Whether standardizing the loadings across factors
                                           figsize=(8, 8),
                                           filename=output_folder + 'Clustermap-CC-Pairs.pdf',
                                           cbar_label='Loading Product',
                                          )
../../_images/notebooks_ccc_python_05-Downstream-Visualizations_38_0.png

Evaluation of sender-receiver pair and LR pairs used in a specific factor

Similarly, we can check what LR pairs each cell-cell pair is using

[19]:
lr_cell_product = c2c.analysis.tensor_downstream.get_lr_by_cell_pairs(factors,
                                                                      lr_label='Ligand-Receptor Pairs',
                                                                      sender_label='Sender Cells',
                                                                      receiver_label='Receiver Cells',
                                                                      order_cells_by='receivers',
                                                                      factor=selected_factor,
                                                                      cci_threshold=None, # None for considering all cell-cell pairs.
                                                                      lr_threshold=0.075 # Prioritize important pairs. None for considering all of them
                                                                     )
[20]:
lsr_prod_cm = c2c.plotting.loading_clustermap(lr_cell_product,
                                              use_zscore=False, # Whether standardizing the loadings across factors
                                              figsize=(12, 24),
                                              filename=output_folder + 'Clustermap-LR-CC-Pairs.pdf',
                                              cbar_label='Loading Product',
                                              yticklabels=1
                                             )
../../_images/notebooks_ccc_python_05-Downstream-Visualizations_42_0.png

Overall CCI potential: Network visualization of sender-receiver cell pairs

An interaction network can be created for each factor by using the loading product between sender and receiver cells. First we need to choose a threshold to indicate what pair of cells are interacting. To do so, we need to get all the outer products between the loadings for the sender and receiver cells dimensions across all factors.

[21]:
# Get all outer products as adjacency matrices, one per factor
networks = c2c.analysis.tensor_downstream.get_factor_specific_ccc_networks(factors,
                                                                           sender_label='Sender Cells',
                                                                           receiver_label='Receiver Cells',
                                                                           )
[22]:
# Then, flatten the adjacency matrices
network_by_factors = c2c.analysis.tensor_downstream.flatten_factor_ccc_networks(networks, orderby='receivers')

# And we can plot the distributions of the weights for each factor-specific network
_ = plt.hist(network_by_factors.values.flatten(), bins = 50)
../../_images/notebooks_ccc_python_05-Downstream-Visualizations_46_0.png

With this distribution we can set a reasonable threshold. We can also apply some mathematical methods to select one, as for example choosing a percentile. Here we set a threshold of 0.075.

[23]:
threshold = 0.075

Then, we can plot all networks we are interested in:

[24]:
c2c.plotting.ccc_networks_plot(factors,
                               included_factors=['Factor 3', 'Factor 5', 'Factor 10'],
                               ccc_threshold=threshold, # Only important communication
                               nrows=1,
                               panel_size=(16, 16), # This changes the size of each figure panel.
                               filename=output_folder + 'Factor-Networks.pdf',
                              )
[24]:
(<Figure size 4800x1600 with 3 Axes>,
 array([<Axes: title={'center': 'Factor 3'}>,
        <Axes: title={'center': 'Factor 5'}>,
        <Axes: title={'center': 'Factor 10'}>], dtype=object))
../../_images/notebooks_ccc_python_05-Downstream-Visualizations_50_1.png

Gini coefficients of the factor-specific cell-cell communication

The Gini coefficient quantifies the extent to which sender-receiver cell pairs differ within a factor according to their importance as measured by their loadings. Here it is use to measure the imbalance of communication for a given factor. In other words, it identifies whether a few cells are key for an identified pattern (low Gini coefficient) or whether the pattern corresponds to a general process that involves most of the cell types (high Gini coefficient). Gini coefficients range between [0, 1]

[25]:
c2c.analysis.tensor_downstream.compute_gini_coefficients(factors,
                                                         sender_label='Sender Cells',
                                                         receiver_label='Receiver Cells'
                                                        ).sort_values('Gini')
[25]:
Factor Gini
0 Factor 1 0.333537
3 Factor 3 0.383925
8 Factor 8 0.511984
5 Factor 5 0.536594
2 Factor 2 0.537684
6 Factor 6 0.561459
4 Factor 4 0.574522
9 Factor 9 0.680455
7 Factor 7 0.696134
1 Factor 10 0.747488

We observe that the lowest Gini coefficient was obtained by factor 1, which seems to have more balanced cell loadings. Senders and/or receivers are more similar or homogeneous.