S4. Identify CCC patterns from spatial transcriptomics

This tutorial illustrates how LIANA and Tensor-cell2cell can be used to explore cell-cell communication in distinct regions of a tissue using spatial transcriptomics. Here each region of the tissue is treated as a separate context.

Initial Setups

Enabling GPU use

First, if you are using a NVIDIA GPU with CUDA cores, set use_gpu=True and enable PyTorch with the following code block. Otherwise, set use_gpu=False or skip this part

[1]:
use_gpu = True

if use_gpu:
    import tensorly as tl
    tl.set_backend('pytorch')

Libraries

Then, import all the packages we will use in this tutorial:

[2]:
import cell2cell as c2c
import decoupler as dc
import liana as li

import numpy as np
import pandas as pd
import scanpy as sc
import squidpy as sq

import matplotlib.pyplot as plt
import seaborn as sns

from scipy.spatial.distance import squareform
[3]:
import warnings
warnings.filterwarnings('ignore')

Directories

Afterwards, specify the data and output directories:

[4]:
output_folder = '../../data/spatial/'
c2c.io.directories.create_directory(output_folder)
../../data/spatial/ was created successfully.

Load data

Similar to the tutorial of using LIANA and MISTy, we will use an ischemic 10X Visium spatial slide from Kuppe et al., 2022. It is a tissue sample obtained from a patient with myocardial infarction, specifically focusing on the ischemic zone of the heart tissue.

The slide provides spatially-resolved information about the cellular composition and gene expression patterns within the tissue.

[5]:
adata = sc.read("kuppe_heart19.h5ad", backup_url='https://figshare.com/ndownloader/files/41501073?private_link=4744950f8768d5c8f68c')
[6]:
adata
[6]:
AnnData object with n_obs × n_vars = 4113 × 17703
    obs: 'in_tissue', 'array_row', 'array_col', 'sample', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'mt_frac', 'celltype_niche', 'molecular_niche'
    var: 'gene_ids', 'feature_types', 'genome', 'SYMBOL', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'mt', 'rps', 'mrp', 'rpl', 'duplicated'
    uns: 'spatial'
    obsm: 'compositions', 'mt', 'spatial'

Process data

Normalize data

[7]:
adata.layers['counts'] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

Visualize spot clusters or niches. These niches were defined by clustering spots from their cellular composition deconvoluted by using cell2location (Kuppe et al., 2022).

[8]:
sq.pl.spatial_scatter(adata, color=[None, 'celltype_niche'], size=1.3, palette='Set1')
../../_images/notebooks_ccc_python_S4_Spatial-Decomposition_17_0.png

Cellular composition obtained from cell2location

[9]:
# Rename to more informative names
full_names = {'Adipo': 'Adipocytes',
              'CM': 'Cardiomyocytes',
              'Endo': 'Endothelial',
              'Fib': 'Fibroblasts',
              'PC': 'Pericytes',
              'prolif': 'Proliferating',
              'vSMCs': 'Vascular_SMCs',
              }
# but only for the ones that are in the data
adata.obsm['compositions'].columns = [full_names.get(c, c) for c in adata.obsm['compositions'].columns]
[10]:
comps = li.ut.obsm_to_adata(adata, 'compositions')
[11]:
comps.var
[11]:
Adipocytes
Cardiomyocytes
Endothelial
Fibroblasts
Lymphoid
Mast
Myeloid
Neuronal
Pericytes
Proliferating
Vascular_SMCs
[12]:
# check key cell types
sq.pl.spatial_scatter(comps,
                      color=['Vascular_SMCs','Cardiomyocytes',
                             'Endothelial', 'Fibroblasts'],
                      size=1.3, ncols=2, img_alpha=0
                      )
../../_images/notebooks_ccc_python_S4_Spatial-Decomposition_22_0.png
[13]:
adata_og = adata.copy()

Defining spatial contexts in the tissue

For defining spatial contexts within a tissue using spatial transcriptomics we simply divide the tissue in different regions through a square grid of a specific number of bins in each of the axes.

In this case, we divide our tissue into a 4x4 grid. A new column will be create in the adata object, specifically adata.obs['grid_cell'], to assign each spot or cell a grid window.

[14]:
num_bins = 4
[15]:
c2c.spatial.create_spatial_grid(adata, num_bins=num_bins)
[16]:
sq.pl.spatial_scatter(adata, color='grid_cell', size=1.3, palette='tab20')
... storing 'grid_cell' as categorical
../../_images/notebooks_ccc_python_S4_Spatial-Decomposition_28_1.png

Here, cells or spots in one grid window do not get to interact with those that are within other grid windows, ignoring that cells in the interfaces between grid windows could also interact. For accounting for those cases, more complex approaches could be employed, as for example a sliding window strategy for defining contexts. In that case, spatial contexts will present some overlap in the cells or spots that are within them.

For simplicity we only tried the grid approach, which could be replaced by the sliding windows method, but it would also require to account for the overlap between spatial contexts. If you are interested in trying out this approach, see cell2cell.spatial.neighborhood.create_sliding_windows() and cell2cell.spatial.neighborhood.add_sliding_window_info_to_adata().

Deciphering cell-cell communication

Here we run LIANA to compute interactions between spots or cells aggregated by their corresponding cluster/niche annotations. Interactions are computed for each spatial context (grid window) by considering only cells annotated with such spatial context in the grid_cell column in the adata.obs DataFrame.

In this case we use the CellPhoneDB list of ligand-receptor interactions, and the CellPhoneDB method to infer CCC.

[17]:
cell_group = 'celltype_niche'
[18]:
lr_pairs = li.resource.select_resource('cellchatdb')
[19]:
li.mt.cellphonedb.by_sample(adata, resource=lr_pairs, groupby=cell_group, sample_key='grid_cell',
                            expr_prop=0.1, use_raw=False, verbose=True)
Now running: 3_3: 100%|█████████████████████████| 16/16 [01:16<00:00,  4.78s/it]

The results are located here:

[20]:
spatial_liana = adata.uns['liana_res']
[21]:
spatial_liana
[21]:
grid_cell ligand ligand_complex ligand_means ligand_props receptor receptor_complex receptor_means receptor_props source target lr_means cellphone_pvals
0 0_0 FN1 FN1 4.556507 1.000000 ITGAV ITGAV_ITGB1 2.021336 1.00 ctniche_1 ctniche_3 3.288921 0.083
1 0_0 FN1 FN1 4.556507 1.000000 ITGAV ITGAV_ITGB1 1.928708 1.00 ctniche_1 ctniche_7 3.242608 0.155
2 0_0 FN1 FN1 4.342961 1.000000 ITGAV ITGAV_ITGB1 2.021336 1.00 ctniche_3 ctniche_3 3.182149 0.299
3 0_0 FN1 FN1 4.312120 1.000000 ITGAV ITGAV_ITGB1 2.021336 1.00 ctniche_7 ctniche_3 3.166728 0.374
4 0_0 FN1 FN1 4.342961 1.000000 ITGAV ITGAV_ITGB1 1.928708 1.00 ctniche_3 ctniche_7 3.135835 0.499
... ... ... ... ... ... ... ... ... ... ... ... ... ...
195553 3_3 WNT3 WNT3 0.064199 0.200000 LRP6 FZD6_LRP6 0.081630 0.20 ctniche_3 ctniche_3 0.072915 0.538
195554 3_3 WNT3 WNT3 0.064199 0.200000 LRP6 FZD1_LRP6 0.081630 0.20 ctniche_3 ctniche_3 0.072915 0.538
195555 3_3 WNT3 WNT3 0.064199 0.200000 LRP6 FZD8_LRP6 0.081630 0.20 ctniche_3 ctniche_3 0.072915 0.538
195556 3_3 POMC POMC 0.052017 0.104478 MC1R MC1R 0.078427 0.11 ctniche_4 ctniche_5 0.065222 0.644
195557 3_3 SEMA3A SEMA3A 0.064199 0.200000 PLXNA4 NRP1_PLXNA4 0.064199 0.20 ctniche_3 ctniche_3 0.064199 0.116

195558 rows × 13 columns

Using LIANA results to build a 4D-communication tensor, with grid windows as the contexts in the 4th dimension.

[22]:
tensor = li.multi.to_tensor_c2c(liana_res=spatial_liana, # LIANA's dataframe containing results
                                sample_key='grid_cell', # Column name of the samples
                                source_key='source', # Column name of the sender cells
                                target_key='target', # Column name of the receiver cells
                                ligand_key='ligand_complex', # Column name of the ligands
                                receptor_key='receptor_complex', # Column name of the receptors
                                score_key='lr_means', # Column name of the communication scores to use
                                inverse_fun=None, # Transformation function
                                how='outer', # What to include across all samples
                                outer_fraction=1/4., # Fraction of samples as threshold to include cells and LR pairs.
                               )
100%|███████████████████████████████████████████| 16/16 [00:19<00:00,  1.21s/it]

Metadata for coloring the elements in the tensor.

Here we do not assign major groups, each element is colored separately.

[23]:
dimensions_dict = [None, None, None, None]
meta_tensor = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor,
                                                  metadata_dicts=dimensions_dict,
                                                  fill_with_order_elements=True
                                                 )

Tensor properties

[24]:
tensor.shape
[24]:
torch.Size([16, 700, 9, 9])
[25]:
tensor.excluded_value_fraction()
[25]:
0.7867879122495651
[26]:
tensor.sparsity_fraction()
[26]:
0.0

Run Tensor-cell2cell pipeline

For simplicity, we factorize the tensor into 8 factors or CCC patterns instead of using the elbow analysis to determine the number of factors.

[27]:
c2c.analysis.run_tensor_cell2cell_pipeline(tensor,
                                           meta_tensor,
                                           rank=8, # Number of factors to perform the factorization. If None, it is automatically determined by an elbow analysis
                                           tf_optimization='regular', # To define how robust we want the analysis to be.
                                           random_state=0, # Random seed for reproducibility
                                           device='cuda', # Device to use. If using GPU and PyTorch, use 'cuda'. For CPU use 'cpu'
                                           cmaps=['plasma', 'Dark2_r', 'Set1', 'Set1'],
                                           output_folder=output_folder, # Whether to save the figures in files. If so, a folder pathname must be passed
                                          )
Running Tensor Factorization
Generating Outputs
Loadings of the tensor factorization were successfully saved into ../../data/spatial//Loadings.xlsx
[27]:
<cell2cell.tensor.tensor.PreBuiltTensor at 0x7f9a75deb3d0>
../../_images/notebooks_ccc_python_S4_Spatial-Decomposition_48_2.png

Visualize CCC patterns in space

Each factor or CCC pattern contains loadings for the context dimension. These loadings could be mapped for each spot or cell depending on what spatial context (grid window) they belong to. Thus, we can visualize the importance of each context in each of the factors, and see how the CCC patterns behave in space.

This behavior in space is useful to understand what set of LR pairs are used by determinant sender-receiver spot pairs in each region of the tissue. Regions with higher scores, indicate that LR pairs of that factor are used more there by the senders and receivers with high loadings.

[28]:
# Generate columns in the adata.obs dataframe with the loading values of each factor
factor_names = list(tensor.factors['Contexts'].columns)
for f in factor_names:
    adata.obs[f] = adata.obs['grid_cell'].apply(lambda x: float(tensor.factors['Contexts'][f].to_dict()[x]))
    adata.obs[f] = pd.to_numeric(adata.obs[f])
[29]:
# These columns are used for coloring the tissue regions to associate them with each factor
sq.pl.spatial_scatter(adata, color=[None, 'celltype_niche']+factor_names, size=1.3, cmap='Blues', vmin=0.)
plt.suptitle(f'{num_bins}x{num_bins} Spatial Grid', fontsize=32, ha='left')
[29]:
Text(0.5, 0.98, '4x4 Spatial Grid')
../../_images/notebooks_ccc_python_S4_Spatial-Decomposition_52_1.png

For example, we can take Factor 5 here, and see that regions in the top area of the tissue, and the bottom left corner are associated with the CCC pattern. Here, main interacting spots are niches 1 and 2 (mainly composed of cardiomyocytes and endothelial cells), coinciding with the areas where they are mainly located.

Then, using the LR pair loadings, we can also identify LR interactions that are key in each factor, and link this with the spatial region with higher scores in the same factor.

[30]:
_ = c2c.plotting.loading_clustermap(loadings=tensor.factors['Ligand-Receptor Pairs'],
                                    loading_threshold=0.1,
                                    use_zscore=False,
                                    figsize=(28, 8),
                                    filename=None,
                                    row_cluster=False
                                   )
../../_images/notebooks_ccc_python_S4_Spatial-Decomposition_55_0.png

In factor 5, where interactions of niches 1 and 2 are associated with this CCC pattern, LR interactions such as NPPB^NPR1, THBS4^CD36, and FN1^ITGAV&ITGB1 are important.

Following the same idea, other downstream analyses could be performed (e.g., Pathway enrichment analysis, PROGENy analysis, among others - see this notebook), and use the resulting scores to associate them with important tissue regions per factor.

Finally, for comparing tissues from multiple patients, a 4D tensor could be built for the same tissue region across patients. So the 4th dimension would be a tissue region aligned and present across patients.

Different grid sizes

We can also explore changing the number of bins in the grid to see the impact on the analysis. With this we can focus on shorter- or longer-ranges of interactions.

[31]:
def run_pipeline_per_bins(adata, lr_pairs, cell_group, bin_list, output_folder=None):
    results = {}
    for num_bins in bin_list:
        if output_folder is not None:
            output_folder = output_folder + '/NumBins-{}/'.format(num_bins)
            c2c.io.directories.create_directory(output_folder)

        tmp_result = {}
        adata_ = adata.copy()
        c2c.spatial.create_spatial_grid(adata_, num_bins=num_bins)

        li.mt.cellphonedb.by_sample(adata_, resource=lr_pairs, groupby=cell_group, sample_key='grid_cell',
                                    expr_prop=0.1, use_raw=False, verbose=True)

        spatial_liana = adata_.uns['liana_res']
        tensor = li.multi.to_tensor_c2c(liana_res=spatial_liana, # LIANA's dataframe containing results
                                        sample_key='grid_cell', # Column name of the samples
                                        source_key='source', # Column name of the sender cells
                                        target_key='target', # Column name of the receiver cells
                                        ligand_key='ligand_complex', # Column name of the ligands
                                        receptor_key='receptor_complex', # Column name of the receptors
                                        score_key='lr_means', # Column name of the communication scores to use
                                        inverse_fun=None, # Transformation function
                                        how='outer', # What to include across all samples
                                        outer_fraction=1/4., # Fraction of samples as threshold to include cells and LR pairs.
                                       )

        dimensions_dict = [None, None, None, None]
        meta_tensor = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor,
                                                          metadata_dicts=dimensions_dict,
                                                          fill_with_order_elements=True
                                                         )

        c2c.analysis.run_tensor_cell2cell_pipeline(tensor,
                                                   meta_tensor,
                                                   rank=8, # Number of factors to perform the factorization. If None, it is automatically determined by an elbow analysis
                                                   tf_optimization='regular', # To define how robust we want the analysis to be.
                                                   random_state=0, # Random seed for reproducibility
                                                   device='cuda', # Device to use. If using GPU and PyTorch, use 'cuda'. For CPU use 'cpu'
                                                   cmaps=['plasma', 'Dark2_r', 'Set1', 'Set1'],
                                                   output_folder=output_folder, # Whether to save the figures in files. If so, a folder pathname must be passed
                                                  )

        # Generate columns in the adata_.obs dataframe with the loading values of each factor
        factor_names = list(tensor.factors['Contexts'].columns)
        for f in factor_names:
            adata_.obs[f] = adata_.obs['grid_cell'].apply(lambda x: float(tensor.factors['Contexts'][f].to_dict()[x]))
            adata_.obs[f] = pd.to_numeric(adata_.obs[f])

        tmp_result['tensor'] = tensor
        tmp_result['tensor_meta'] = meta_tensor
        tmp_result['adata'] = adata_
        results[num_bins] = tmp_result
    return results
[32]:
results = run_pipeline_per_bins(adata_og, lr_pairs, cell_group, bin_list=[2,3], output_folder=output_folder)
../../data/spatial//NumBins-2/ was created successfully.
Converting `grid_cell` to categorical!
Now running: 1_1: 100%|███████████████████████████| 4/4 [00:28<00:00,  7.17s/it]
100%|█████████████████████████████████████████████| 4/4 [00:06<00:00,  1.66s/it]
Running Tensor Factorization
Generating Outputs
Loadings of the tensor factorization were successfully saved into ../../data/spatial//NumBins-2//Loadings.xlsx
Converting `grid_cell` to categorical!
../../data/spatial//NumBins-2//NumBins-3/ was created successfully.
Now running: 2_2: 100%|███████████████████████████| 9/9 [00:45<00:00,  5.02s/it]
100%|█████████████████████████████████████████████| 9/9 [00:12<00:00,  1.40s/it]
Running Tensor Factorization
Generating Outputs
Loadings of the tensor factorization were successfully saved into ../../data/spatial//NumBins-2//NumBins-3//Loadings.xlsx
../../_images/notebooks_ccc_python_S4_Spatial-Decomposition_62_7.png
../../_images/notebooks_ccc_python_S4_Spatial-Decomposition_62_8.png

Let’s visualize the imprtant bins per factor for the different grids

[ ]:
for k, v in results.items():
    factor_names = list(v['tensor'].factors['Contexts'].columns)
    sq.pl.spatial_scatter(v['adata'], color=[None, 'celltype_niche']+factor_names, size=1.3, cmap='Blues', vmin=0.)
    plt.suptitle(f'{k}x{k} Spatial Grid', fontsize=32, ha='left')
../../_images/notebooks_ccc_python_S4_Spatial-Decomposition_64_0.png

As the size of the grid window increases, it becomes more challenging to discern regional differences within the tissue due to the coarser resolution provided by these larger windows. In this case, more cell-cell interactions are capture per window, making difficult to identify patterns associated with more specific spots. Conversely, smaller grid windows offer a finer resolution, enabling a clearer distinction and more detailed understanding of the variations between different tissue regions.

[ ]: