{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b141efa8",
   "metadata": {},
   "source": [
    "# 3. Generating a 4D-Communication Tensor from computed communication scores\n",
    "\n",
    "After inferring communication scores for combinations of ligand-receptor and sender-receiver cell pairs, we can use that information to identify context-dependent CCC patterns across multiple samples simultaneously by generating a 4D-Communication Tensor. LIANA conveniently outputs these scores as a dataframe that is easy to use for building our tensor.\n",
    "\n",
    "In this tutorial we will show you how to use the dataframe saved from LIANA to generate a 4D-Communication Tensor that can later be used with Tensor-cell2cell."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "739c7ec9",
   "metadata": {},
   "source": [
    "## Initial Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "326b8af3",
   "metadata": {},
   "source": [
    "**Import the necessary packages**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "465067f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cell2cell as c2c\n",
    "import liana as li\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fcd8c7e6",
   "metadata": {},
   "source": [
    "## Directories"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ce6752aa",
   "metadata": {},
   "outputs": [],
   "source": [
    "data_folder = '../../data/liana-outputs/'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "049a39ea",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "../../data/tc2c-outputs/ already exists.\n"
     ]
    }
   ],
   "source": [
    "output_folder = '../../data/tc2c-outputs/'\n",
    "c2c.io.directories.create_directory(output_folder)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a4a65ac6",
   "metadata": {},
   "source": [
    "## Load Data\n",
    "\n",
    "Open the dataframe containing the LIANA results for every sample/context (these can also be found in `adata.uns['liana_res']`). These results contain the communication scores for every combination of ligand-receptor pair and sender-receiver cell pair."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7927cffb",
   "metadata": {},
   "outputs": [],
   "source": [
    "liana_res = pd.read_csv(data_folder + 'LIANA_by_sample.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "844bd54a",
   "metadata": {},
   "source": [
    "## Create 4D-Communication Tensor"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8756f614",
   "metadata": {},
   "source": [
    "### Specify the order of the samples/contexts\n",
    "\n",
    "Here, we sort the sample names alphabetically, which groups the samples/contexts by the condition they belong to (HC or *Control*, M or *Moderate COVID-19*, S or *Severe COVID-19*)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "50ace1d2",
   "metadata": {},
   "outputs": [],
   "source": [
    "sorted_samples = sorted(liana_res['sample_new'].unique())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "3b0d6e91",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['HC1', 'HC2', 'HC3', 'M1', 'M2', 'M3', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sorted_samples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "f5753a64",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sample_new</th>\n",
       "      <th>source</th>\n",
       "      <th>target</th>\n",
       "      <th>ligand_complex</th>\n",
       "      <th>receptor_complex</th>\n",
       "      <th>lr_means</th>\n",
       "      <th>cellphone_pvals</th>\n",
       "      <th>expr_prod</th>\n",
       "      <th>scaled_weight</th>\n",
       "      <th>lr_logfc</th>\n",
       "      <th>spec_weight</th>\n",
       "      <th>lrscore</th>\n",
       "      <th>lr_probs</th>\n",
       "      <th>cellchat_pvals</th>\n",
       "      <th>specificity_rank</th>\n",
       "      <th>magnitude_rank</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>HC1</td>\n",
       "      <td>Macrophages</td>\n",
       "      <td>NK</td>\n",
       "      <td>B2M</td>\n",
       "      <td>CD3D</td>\n",
       "      <td>3.410504</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.059611</td>\n",
       "      <td>1.300556</td>\n",
       "      <td>1.397895</td>\n",
       "      <td>0.083273</td>\n",
       "      <td>0.961040</td>\n",
       "      <td>0.221495</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.003713</td>\n",
       "      <td>1.698996e-09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>HC1</td>\n",
       "      <td>T</td>\n",
       "      <td>NK</td>\n",
       "      <td>B2M</td>\n",
       "      <td>CD3D</td>\n",
       "      <td>3.410586</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.059861</td>\n",
       "      <td>1.300856</td>\n",
       "      <td>1.272266</td>\n",
       "      <td>0.083276</td>\n",
       "      <td>0.961041</td>\n",
       "      <td>0.221213</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.003713</td>\n",
       "      <td>6.256593e-09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>HC1</td>\n",
       "      <td>NK</td>\n",
       "      <td>NK</td>\n",
       "      <td>B2M</td>\n",
       "      <td>CD3D</td>\n",
       "      <td>3.264099</td>\n",
       "      <td>0.0</td>\n",
       "      <td>7.614378</td>\n",
       "      <td>0.790913</td>\n",
       "      <td>1.113901</td>\n",
       "      <td>0.078673</td>\n",
       "      <td>0.959963</td>\n",
       "      <td>0.216816</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.006245</td>\n",
       "      <td>2.653267e-08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>HC1</td>\n",
       "      <td>T</td>\n",
       "      <td>NK</td>\n",
       "      <td>B2M</td>\n",
       "      <td>KLRD1</td>\n",
       "      <td>3.297900</td>\n",
       "      <td>0.0</td>\n",
       "      <td>6.865250</td>\n",
       "      <td>6.960920</td>\n",
       "      <td>1.244892</td>\n",
       "      <td>0.171293</td>\n",
       "      <td>0.957924</td>\n",
       "      <td>0.214586</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000092</td>\n",
       "      <td>9.767878e-08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>HC1</td>\n",
       "      <td>Macrophages</td>\n",
       "      <td>NK</td>\n",
       "      <td>B2M</td>\n",
       "      <td>KLRD1</td>\n",
       "      <td>3.297818</td>\n",
       "      <td>0.0</td>\n",
       "      <td>6.865037</td>\n",
       "      <td>6.960620</td>\n",
       "      <td>1.370520</td>\n",
       "      <td>0.171288</td>\n",
       "      <td>0.957924</td>\n",
       "      <td>0.214861</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000092</td>\n",
       "      <td>1.086199e-07</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  sample_new       source target ligand_complex receptor_complex  lr_means  \\\n",
       "0        HC1  Macrophages     NK            B2M             CD3D  3.410504   \n",
       "1        HC1            T     NK            B2M             CD3D  3.410586   \n",
       "2        HC1           NK     NK            B2M             CD3D  3.264099   \n",
       "3        HC1            T     NK            B2M            KLRD1  3.297900   \n",
       "4        HC1  Macrophages     NK            B2M            KLRD1  3.297818   \n",
       "\n",
       "   cellphone_pvals  expr_prod  scaled_weight  lr_logfc  spec_weight   lrscore  \\\n",
       "0              0.0   8.059611       1.300556  1.397895     0.083273  0.961040   \n",
       "1              0.0   8.059861       1.300856  1.272266     0.083276  0.961041   \n",
       "2              0.0   7.614378       0.790913  1.113901     0.078673  0.959963   \n",
       "3              0.0   6.865250       6.960920  1.244892     0.171293  0.957924   \n",
       "4              0.0   6.865037       6.960620  1.370520     0.171288  0.957924   \n",
       "\n",
       "   lr_probs  cellchat_pvals  specificity_rank  magnitude_rank  \n",
       "0  0.221495             0.0          0.003713    1.698996e-09  \n",
       "1  0.221213             0.0          0.003713    6.256593e-09  \n",
       "2  0.216816             0.0          0.006245    2.653267e-08  \n",
       "3  0.214586             0.0          0.000092    9.767878e-08  \n",
       "4  0.214861             0.0          0.000092    1.086199e-07  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "liana_res.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce8ec903",
   "metadata": {},
   "source": [
    "## Generate tensor\n",
    "\n",
    "To generate the 4D-communication tensor, we need to create matrices with the communication scores for each of the ligand-receptor pairs within the same sample, then generate a 3D tensor for each sample, and finally concatenate them to form the 4D tensor.\n",
    "\n",
    "Briefly, we use the LIANA dataframe and communication scores to organize them as follows:\n",
    "\n",
    "![ccc-scores](https://github.com/earmingol/cell2cell/blob/master/docs/tutorials/ASD/figures/4d-tensor.png?raw=true)"
   ]
  },
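  {
   "cell_type": "markdown",
   "id": "a7d3e101",
   "metadata": {},
   "source": [
    "The construction described above can be sketched with NumPy on randomly generated toy scores. This is only an illustration under assumed sizes (`n_lr`, `n_cells`, and `n_samples` are hypothetical), not the code that LIANA or cell2cell runs internally; it just shows how per-pair matrices become a 3D tensor per sample and then a 4D tensor.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical sizes chosen only for illustration\n",
    "n_samples, n_lr, n_cells = 4, 3, 2\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "def build_sample_tensor():\n",
    "    # One sender x receiver matrix of toy scores per ligand-receptor pair\n",
    "    matrices = [rng.random((n_cells, n_cells)) for _ in range(n_lr)]\n",
    "    # Stack the matrices into a 3D tensor: (LR pairs, senders, receivers)\n",
    "    return np.stack(matrices)\n",
    "\n",
    "# One 3D tensor per sample, stacked along a new leading axis\n",
    "tensors_3d = [build_sample_tensor() for _ in range(n_samples)]\n",
    "tensor_4d = np.stack(tensors_3d)\n",
    "\n",
    "print(tensor_4d.shape)\n",
    "```\n",
    "\n",
    "The resulting shape follows the order (contexts, LR pairs, sender cells, receiver cells)."
   ]
  },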
  {
   "cell_type": "markdown",
   "id": "3b06f7eb",
   "metadata": {},
   "source": [
    "LIANA includes a function that does all these steps at once."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44395144",
   "metadata": {},
   "source": [
    "We will transform the communication scores from a set of 2D matrices per sample (one matrix per ligand-receptor pair) into a 4D tensor in which one dimension represents the sample/context.\n",
    "\n",
    "**The key parameters when building a tensor are:**\n",
    "\n",
    "- `non_negative` and `non_negative_fill`: Tensor-cell2cell expects non-negative values by default, so with `non_negative=True` any values below 0 are set to 0 (this is relevant, for example, if one wants to use the `LRlog2FC` score function). If you used a pipeline that generated negative scores, we suggest replacing them with 0; otherwise, by default, Tensor-cell2cell will treat them as NaN. Since we used the magnitude rank score, which is non-negative, these parameters won't affect our results.\n",
    "\n",
    "- `inverse_fun` is the function used to transform the communication score before building the tensor. LIANA's 'magnitude_rank' score ranges from 0 to 1 and treats low values as the most important ones. In contrast, Tensor-cell2cell requires higher values to be the most important, so here we pass a function (`lambda x: 1 - x`) that subtracts LIANA's magnitude-rank score from 1. If `None` is passed instead, no transformation is applied to the communication score. When using other scores from the methods implemented in LIANA, a similar transformation can be applied, depending on the parameters and assumptions of the scoring method.\n",
    "\n",
    "- `non_expressed_fill` indicates what value to assign to missing scores when LIANA was run with `return_all_lrs` set to `True` (i.e., those that did not pass LIANA's filters and were not inferred because ligands and/or receptors were not expressed; see parameter `expr_prop` in the [Notebook for Inferring the Communication Scores](./02-Infer-Communication-Scores.ipynb)). If `None` is passed, missing values will be treated as `numpy.nan` values. In this example, this is not used because we rely on the `outer_fraction` parameter (see below) to address this.\n",
    "- `how` controls which ligand-receptor pairs and cell types to include when building the tensor. This decision depends on whether the missing values across a number of samples for both ligand-receptor interactions and sender-receiver cell pairs are considered to be biologically-relevant. Options are:\n",
    "    - `'inner'` is the strictest option, since it considers only cell types and LR pairs that are present in all contexts (intersection).\n",
    "    - `'outer'` considers all cell types and LR pairs that are present across contexts (union).\n",
    "    - `'outer_lrs'` considers only cell types that are present in all contexts (intersection), but all LR pairs that are present across contexts (union).\n",
    "    - `'outer_cells'` considers only LR pairs that are present in all contexts (intersection), but all cell types that are present across contexts (union).\n",
    "\n",
    "\n",
    "\n",
    "**The following two parameters (`lr_fill` and `cell_fill`) indicate what value to assign to missing scores when `how` is not set to `'inner'`**, i.e., when there are cells or LR pairs that are not present in all contexts. During tensor component analysis, NaN values are masked so that they are not considered by the decomposition objective. As a result, NaNs are effectively imputed, i.e., treated as missing values that are potentially communicating, whereas if missing LRs are filled with a value such as 0, they are treated as biological zeroes (i.e., not communicating). For additional details and discussion regarding this parameter, please see the [missing indices benchmarking](../../tc2c_benchmark/scripts/missing_indices_consistency.ipynb).\n",
    "\n",
    "- `lr_fill` is the value used to fill communication scores when a ligand-receptor pair is not used by any cell type within a sample. Here we treat these cases as missing values by passing a `numpy.nan` value.\n",
    "\n",
    "\n",
    "- `cell_fill` is the value used to fill communication scores when a cell type is not using a given ligand-receptor pair within a sample. This value has priority over `lr_fill` if that ligand-receptor pair is used by at least one sender-receiver cell pair within the sample. Here we treat these cases as missing values by passing a `numpy.nan` value.\n",
    "\n",
    "\n",
    "- `outer_fraction` controls the elements to include in the union scenario of the `how` options. Only elements that are present at least in this fraction of samples/contexts will be included. When this value is 0, the tensor includes all elements across the samples. When this value is 1, it acts as using `how='inner'`.\n",
    "    \n",
    "    \n",
    "**In this case we will consider cell types and LR pairs that are present in the LIANA results of at least 1/3 of the samples**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "fd101435",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|███████████████████████████████████████████| 12/12 [00:18<00:00,  1.53s/it]\n"
     ]
    }
   ],
   "source": [
    "tensor = li.multi.to_tensor_c2c(liana_res=liana_res, # LIANA's dataframe containing results\n",
    "                                sample_key='sample_new', # Column name of the samples\n",
    "                                source_key='source', # Column name of the sender cells\n",
    "                                target_key='target', # Column name of the receiver cells\n",
    "                                ligand_key='ligand_complex', # Column name of the ligands\n",
    "                                receptor_key='receptor_complex', # Column name of the receptors\n",
    "                                score_key='magnitude_rank', # Column name of the communication scores to use\n",
    "                                non_negative = True, # set negative values to 0\n",
    "                                inverse_fun=lambda x: 1 - x, # Transformation function\n",
    "                                non_expressed_fill=None, # Value to replace missing values with \n",
    "                                how='outer', # What to include across all samples\n",
    "                                lr_fill=np.nan, # What to fill missing LRs with \n",
    "                                cell_fill = np.nan, # What to fill missing cell types with \n",
    "                                outer_fraction=1/3., # Fraction of samples as threshold to include cells and LR pairs.\n",
    "                                lr_sep='^', # How to separate ligand and receptor names to name LR pair\n",
    "                                context_order=sorted_samples, # Order to store the contexts in the tensor\n",
    "                                sort_elements=True # Whether sorting alphabetically element names of each tensor dim. Does not apply for context order if context_order is passed.\n",
    "                               )"
   ]
  },
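  {
   "cell_type": "markdown",
   "id": "a7d3e102",
   "metadata": {},
   "source": [
    "To make the `how` and `outer_fraction` options more concrete, the sketch below applies the same ideas to plain Python sets of hypothetical LR pairs. The helper `select_elements` and the toy names are assumptions made for illustration; the actual filtering inside `li.multi.to_tensor_c2c()` may differ in its details.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "# Hypothetical toy data: LR pairs detected in each of three samples\n",
    "lrs_per_sample = {\n",
    "    'S1': {'A^B', 'C^D', 'E^F'},\n",
    "    'S2': {'A^B', 'C^D'},\n",
    "    'S3': {'A^B', 'G^H'},\n",
    "}\n",
    "\n",
    "def select_elements(sets_by_sample, fraction):\n",
    "    # Keep elements present in at least `fraction` of the samples\n",
    "    counts = Counter(e for s in sets_by_sample.values() for e in s)\n",
    "    n_samples = len(sets_by_sample)\n",
    "    return sorted(e for e, c in counts.items() if c / n_samples >= fraction)\n",
    "\n",
    "# Intersection: elements shared by every sample (analogous to how='inner')\n",
    "inner = sorted(set.intersection(*lrs_per_sample.values()))\n",
    "# Union: elements seen in any sample (analogous to how='outer')\n",
    "outer = sorted(set.union(*lrs_per_sample.values()))\n",
    "# Union restricted to elements found in at least 2 of the 3 samples\n",
    "two_thirds = select_elements(lrs_per_sample, 2 / 3)\n",
    "\n",
    "print(inner)\n",
    "print(outer)\n",
    "print(two_thirds)\n",
    "```\n",
    "\n",
    "In this toy case, a fraction of 0 reproduces the union and a fraction of 1 reproduces the intersection, mirroring the description of `outer_fraction` above."
   ]
  },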
  {
   "cell_type": "markdown",
   "id": "39578913",
   "metadata": {},
   "source": [
    "## Evaluate some tensor properties"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0e7b706",
   "metadata": {},
   "source": [
    "### Tensor shape\n",
    "This indicates the number of elements in each tensor dimension: (Contexts, LR pairs, Sender cells, Receiver cells)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "57b1a6eb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(12, 1054, 10, 10)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tensor.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1e7b87a",
   "metadata": {},
   "source": [
    "### Missing values\n",
    "This represents the fraction of values that are missing. In this case, missing values are combinations of contexts x LR pairs x Sender cells x Receiver cells that did not have a communication score or were missing in the dataframes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "23f025e8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.906289531941809"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tensor.missing_fraction()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c87b373",
   "metadata": {},
   "source": [
    "### Sparsity\n",
    "This represents the fraction of values that are a real zero (excluding the missing values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "a4d357f1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.04997707147375079"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tensor.sparsity_fraction()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e2e04e2",
   "metadata": {},
   "source": [
    "### Fraction of excluded elements\n",
    "This represents the fraction of values that are ignored (masked) in the analysis. In this case it coincides with the missing values because we did not generate a new `tensor.mask` to manually ignore specific values. Instead, it automatically excluded the missing values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "8781aaaa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.906289531941809"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tensor.excluded_value_fraction() # Fraction of values in the tensor that are masked/missing"
   ]
  },
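  {
   "cell_type": "markdown",
   "id": "a7d3e103",
   "metadata": {},
   "source": [
    "As a rough intuition for these quantities, the sketch below computes analogous fractions on a small NumPy array. It assumes that each fraction is taken over the total number of elements and that NaN marks a missing value; the toy array is hypothetical, and the computation inside cell2cell may differ in its details.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical toy scores: NaN marks a missing value, 0.0 a real zero\n",
    "toy = np.array([[0.0, 0.5, np.nan, 0.9],\n",
    "                [np.nan, 0.0, 0.2, np.nan]])\n",
    "\n",
    "# Fraction of entries that are missing (NaN)\n",
    "missing_fraction = np.isnan(toy).sum() / toy.size\n",
    "\n",
    "# Fraction of entries that are real zeros (NaN never compares equal to 0)\n",
    "sparsity_fraction = (toy == 0).sum() / toy.size\n",
    "\n",
    "# A mask derived only from missing values: True means the entry is used\n",
    "mask = ~np.isnan(toy)\n",
    "excluded_fraction = 1 - mask.mean()\n",
    "\n",
    "print(missing_fraction, sparsity_fraction, excluded_fraction)\n",
    "```\n",
    "\n",
    "Because the mask here is built only from the missing entries, the excluded fraction equals the missing fraction, as in the tensor above."
   ]
  },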
  {
   "cell_type": "markdown",
   "id": "5f828991",
   "metadata": {},
   "source": [
    "## Prepare Tensor Metadata\n",
    "\n",
    "To interpret analyses of the tensor, we can assign groups to each sample/context and to every element in the other dimensions (LR pairs and cells).\n",
    "\n",
    "We can generate respective dictionaries manually or automatically from DBs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66983c9f",
   "metadata": {},
   "source": [
    "**Default dict to return Unknown if major groups are not present for a given element**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "5fd47a86",
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "element_dict = defaultdict(lambda: 'Unknown')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "228006e2",
   "metadata": {},
   "source": [
    "**Major groups of the samples/contexts**\n",
    "\n",
    "Please note that this `context_dict` could be directly generated from the `adata` object in the [Notebook for Inferring the Communication Scores](./02-Infer-Communication-Scores.ipynb) by using the command:\n",
    "\n",
    "```context_dict = adata.obs.set_index('sample_new')['condition'].sort_values().to_dict()```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "c2f83dde",
   "metadata": {},
   "outputs": [],
   "source": [
    "context_dict = element_dict.copy()\n",
    "\n",
    "context_dict.update({'HC1' : 'Control',\n",
    "                     'HC2' : 'Control',\n",
    "                     'HC3' : 'Control',\n",
    "                     'M1' : 'Moderate COVID-19',\n",
    "                     'M2' : 'Moderate COVID-19',\n",
    "                     'M3' : 'Moderate COVID-19',\n",
    "                     'S1' : 'Severe COVID-19',\n",
    "                     'S2' : 'Severe COVID-19',\n",
    "                     'S3' : 'Severe COVID-19',\n",
    "                     'S4' : 'Severe COVID-19',\n",
    "                     'S5' : 'Severe COVID-19',\n",
    "                     'S6' : 'Severe COVID-19',\n",
    "                    })\n",
    "dimensions_dict = [context_dict, None, None, None]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6bb84983",
   "metadata": {},
   "source": [
    "**Generate a list containing metadata for each tensor order/dimension - Later used for coloring factor plots**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "7c6dbf77",
   "metadata": {},
   "outputs": [],
   "source": [
    "meta_tensor = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor,\n",
    "                                              metadata_dicts=[context_dict, None, None, None],\n",
    "                                              fill_with_order_elements=True\n",
    "                                             )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba86c361",
   "metadata": {},
   "source": [
    "If you want to color the elements of another dimension by major groups, just replace the corresponding `None` in `metadata_dicts=[context_dict, None, None, None]` by a dictionary whose keys are the element names of the dimension  and the values are the major groups.  For example, if you want to color LR pairs, you should create a dictionary whose keys are the names from `tensor.order_names[1]`, and put that new dictionary (e.g. `lr_dict`) in `metadata_dicts=[context_dict, lr_dict, None, None]`. For sender and receiver cells, the same could be done."
   ]
  },
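  {
   "cell_type": "markdown",
   "id": "a7d3e104",
   "metadata": {},
   "source": [
    "As a minimal sketch of such a dictionary, the example below groups a few cell types into broader lineages. The grouping, the name `cell_dict`, and the list `cell_names` (a stand-in for `tensor.order_names[2]`) are hypothetical and only illustrate the expected structure: keys are element names and values are major groups.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "\n",
    "# Elements without an assigned group fall back to 'Unknown'\n",
    "cell_dict = defaultdict(lambda: 'Unknown')\n",
    "\n",
    "# Hypothetical grouping of cell types into major lineages\n",
    "cell_dict.update({'Macrophages': 'Myeloid',\n",
    "                  'NK': 'Lymphoid',\n",
    "                  'T': 'Lymphoid',\n",
    "                 })\n",
    "\n",
    "# Stand-in for tensor.order_names[2]; 'CellTypeX' is a made-up name\n",
    "cell_names = ['CellTypeX', 'Macrophages', 'NK', 'T']\n",
    "groups = [cell_dict[name] for name in cell_names]\n",
    "\n",
    "print(groups)\n",
    "```\n",
    "\n",
    "A dictionary like this could then take the place of the corresponding `None`, for example `metadata_dicts=[context_dict, None, cell_dict, cell_dict]` to color both sender and receiver cells."
   ]
  },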
  {
   "cell_type": "markdown",
   "id": "7b7a0659",
   "metadata": {},
   "source": [
    "## Export Tensor\n",
    "\n",
    "Here we will save the `tensor` as a pickle object with `cell2cell`, so we can use it later with other analyses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "d9d1701f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "../../data/tc2c-outputs//BALF-Tensor.pkl  was correctly saved.\n"
     ]
    }
   ],
   "source": [
    "c2c.io.export_variable_with_pickle(tensor, output_folder + '/BALF-Tensor.pkl')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef47baac",
   "metadata": {},
   "source": [
    "## Export Tensor Metadata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "9d635441",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "../../data/tc2c-outputs//BALF-Tensor-Metadata.pkl  was correctly saved.\n"
     ]
    }
   ],
   "source": [
    "c2c.io.export_variable_with_pickle(meta_tensor, output_folder + '/BALF-Tensor-Metadata.pkl')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "898ff029",
   "metadata": {},
   "source": [
    "**Make sure to use this pandas version to load the metadata in the future to avoid errors**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "48677744",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2.1.4'"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.__version__"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "940b9bf7",
   "metadata": {},
   "source": [
    "## Supplementary Information about Tensor-cell2cell\n",
    "\n",
    "The function `li.multi.to_tensor_c2c()` from LIANA that we used to build the tensor relies on the function `c2c.tensor.dataframes_to_tensor()` from cell2cell. We can use the cell2cell function directly for finer-grained parameter tuning, as follows:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc446b0d",
   "metadata": {},
   "source": [
    "First, we need to create a dictionary with sample names as keys and, as values, the dataframes containing the communication scores within each sample. Here we split the LIANA output to recreate that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "f5361583",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = dict(list(liana_res.groupby('sample_new')))"
   ]
  },
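  {
   "cell_type": "markdown",
   "id": "a7d3e105",
   "metadata": {},
   "source": [
    "The same split can be tried on a small, made-up dataframe to see what the resulting dictionary looks like. The rows and the name `toy_res` are hypothetical; only the column names mirror the LIANA output.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical miniature version of the LIANA results\n",
    "toy_res = pd.DataFrame({\n",
    "    'sample_new': ['HC1', 'HC1', 'M1'],\n",
    "    'source': ['T', 'NK', 'T'],\n",
    "    'target': ['NK', 'NK', 'T'],\n",
    "    'magnitude_rank': [0.1, 0.5, 0.9],\n",
    "})\n",
    "\n",
    "# groupby yields (sample name, sub-dataframe) pairs, which dict() collects\n",
    "toy_data = dict(list(toy_res.groupby('sample_new')))\n",
    "\n",
    "print(sorted(toy_data))\n",
    "print(toy_data['HC1'].shape)\n",
    "```\n",
    "\n",
    "Each value keeps all of the original columns, including the sample column, and only the rows belonging to that sample."
   ]
  },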
  {
   "cell_type": "markdown",
   "id": "892ef75f",
   "metadata": {},
   "source": [
    "This is, for example, the dataframe for the sample HC1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "834efe57",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sample_new</th>\n",
       "      <th>source</th>\n",
       "      <th>target</th>\n",
       "      <th>ligand_complex</th>\n",
       "      <th>receptor_complex</th>\n",
       "      <th>lr_means</th>\n",
       "      <th>cellphone_pvals</th>\n",
       "      <th>expr_prod</th>\n",
       "      <th>scaled_weight</th>\n",
       "      <th>lr_logfc</th>\n",
       "      <th>spec_weight</th>\n",
       "      <th>lrscore</th>\n",
       "      <th>lr_probs</th>\n",
       "      <th>cellchat_pvals</th>\n",
       "      <th>specificity_rank</th>\n",
       "      <th>magnitude_rank</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>HC1</td>\n",
       "      <td>Macrophages</td>\n",
       "      <td>NK</td>\n",
       "      <td>B2M</td>\n",
       "      <td>CD3D</td>\n",
       "      <td>3.410504</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.059611</td>\n",
       "      <td>1.300556</td>\n",
       "      <td>1.397895</td>\n",
       "      <td>0.083273</td>\n",
       "      <td>0.961040</td>\n",
       "      <td>0.221495</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.003713</td>\n",
       "      <td>1.698996e-09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>HC1</td>\n",
       "      <td>T</td>\n",
       "      <td>NK</td>\n",
       "      <td>B2M</td>\n",
       "      <td>CD3D</td>\n",
       "      <td>3.410586</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.059861</td>\n",
       "      <td>1.300856</td>\n",
       "      <td>1.272266</td>\n",
       "      <td>0.083276</td>\n",
       "      <td>0.961041</td>\n",
       "      <td>0.221213</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.003713</td>\n",
       "      <td>6.256593e-09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>HC1</td>\n",
       "      <td>NK</td>\n",
       "      <td>NK</td>\n",
       "      <td>B2M</td>\n",
       "      <td>CD3D</td>\n",
       "      <td>3.264099</td>\n",
       "      <td>0.0</td>\n",
       "      <td>7.614378</td>\n",
       "      <td>0.790913</td>\n",
       "      <td>1.113901</td>\n",
       "      <td>0.078673</td>\n",
       "      <td>0.959963</td>\n",
       "      <td>0.216816</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.006245</td>\n",
       "      <td>2.653267e-08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>HC1</td>\n",
       "      <td>T</td>\n",
       "      <td>NK</td>\n",
       "      <td>B2M</td>\n",
       "      <td>KLRD1</td>\n",
       "      <td>3.297900</td>\n",
       "      <td>0.0</td>\n",
       "      <td>6.865250</td>\n",
       "      <td>6.960920</td>\n",
       "      <td>1.244892</td>\n",
       "      <td>0.171293</td>\n",
       "      <td>0.957924</td>\n",
       "      <td>0.214586</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000092</td>\n",
       "      <td>9.767878e-08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>HC1</td>\n",
       "      <td>Macrophages</td>\n",
       "      <td>NK</td>\n",
       "      <td>B2M</td>\n",
       "      <td>KLRD1</td>\n",
       "      <td>3.297818</td>\n",
       "      <td>0.0</td>\n",
       "      <td>6.865037</td>\n",
       "      <td>6.960620</td>\n",
       "      <td>1.370520</td>\n",
       "      <td>0.171288</td>\n",
       "      <td>0.957924</td>\n",
       "      <td>0.214861</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000092</td>\n",
       "      <td>1.086199e-07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4218</th>\n",
       "      <td>HC1</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>LGALS3BP</td>\n",
       "      <td>CD33</td>\n",
       "      <td>0.546831</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.162326</td>\n",
       "      <td>-0.428778</td>\n",
       "      <td>-0.330954</td>\n",
       "      <td>0.065941</td>\n",
       "      <td>0.777816</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4219</th>\n",
       "      <td>HC1</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>C1QB</td>\n",
       "      <td>CD33</td>\n",
       "      <td>1.500021</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.499953</td>\n",
       "      <td>-0.674297</td>\n",
       "      <td>-0.418023</td>\n",
       "      <td>0.050734</td>\n",
       "      <td>0.860018</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4220</th>\n",
       "      <td>HC1</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>C1QA</td>\n",
       "      <td>CD33</td>\n",
       "      <td>1.491162</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.496815</td>\n",
       "      <td>-0.682417</td>\n",
       "      <td>-0.439526</td>\n",
       "      <td>0.058043</td>\n",
       "      <td>0.859639</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4221</th>\n",
       "      <td>HC1</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>LGALS1</td>\n",
       "      <td>CD69</td>\n",
       "      <td>1.350314</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.522723</td>\n",
       "      <td>-0.108787</td>\n",
       "      <td>0.406618</td>\n",
       "      <td>0.039087</td>\n",
       "      <td>0.862677</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4222</th>\n",
       "      <td>HC1</td>\n",
       "      <td>mDC</td>\n",
       "      <td>mDC</td>\n",
       "      <td>CIRBP</td>\n",
       "      <td>TREM1</td>\n",
       "      <td>0.634905</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.233897</td>\n",
       "      <td>-0.544721</td>\n",
       "      <td>-0.515300</td>\n",
       "      <td>0.013720</td>\n",
       "      <td>0.807776</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000e+00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4223 rows × 16 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     sample_new       source target ligand_complex receptor_complex  lr_means  \\\n",
       "0           HC1  Macrophages     NK            B2M             CD3D  3.410504   \n",
       "1           HC1            T     NK            B2M             CD3D  3.410586   \n",
       "2           HC1           NK     NK            B2M             CD3D  3.264099   \n",
       "3           HC1            T     NK            B2M            KLRD1  3.297900   \n",
       "4           HC1  Macrophages     NK            B2M            KLRD1  3.297818   \n",
       "...         ...          ...    ...            ...              ...       ...   \n",
       "4218        HC1            T      T       LGALS3BP             CD33  0.546831   \n",
       "4219        HC1            T      T           C1QB             CD33  1.500021   \n",
       "4220        HC1            T      T           C1QA             CD33  1.491162   \n",
       "4221        HC1            T      T         LGALS1             CD69  1.350314   \n",
       "4222        HC1          mDC    mDC          CIRBP            TREM1  0.634905   \n",
       "\n",
       "      cellphone_pvals  expr_prod  scaled_weight  lr_logfc  spec_weight  \\\n",
       "0                 0.0   8.059611       1.300556  1.397895     0.083273   \n",
       "1                 0.0   8.059861       1.300856  1.272266     0.083276   \n",
       "2                 0.0   7.614378       0.790913  1.113901     0.078673   \n",
       "3                 0.0   6.865250       6.960920  1.244892     0.171293   \n",
       "4                 0.0   6.865037       6.960620  1.370520     0.171288   \n",
       "...               ...        ...            ...       ...          ...   \n",
       "4218              1.0   0.162326      -0.428778 -0.330954     0.065941   \n",
       "4219              1.0   0.499953      -0.674297 -0.418023     0.050734   \n",
       "4220              1.0   0.496815      -0.682417 -0.439526     0.058043   \n",
       "4221              1.0   0.522723      -0.108787  0.406618     0.039087   \n",
       "4222              1.0   0.233897      -0.544721 -0.515300     0.013720   \n",
       "\n",
       "       lrscore  lr_probs  cellchat_pvals  specificity_rank  magnitude_rank  \n",
       "0     0.961040  0.221495             0.0          0.003713    1.698996e-09  \n",
       "1     0.961041  0.221213             0.0          0.003713    6.256593e-09  \n",
       "2     0.959963  0.216816             0.0          0.006245    2.653267e-08  \n",
       "3     0.957924  0.214586             0.0          0.000092    9.767878e-08  \n",
       "4     0.957924  0.214861             0.0          0.000092    1.086199e-07  \n",
       "...        ...       ...             ...               ...             ...  \n",
       "4218  0.777816  0.000000             1.0          1.000000    1.000000e+00  \n",
       "4219  0.860018  0.000000             1.0          1.000000    1.000000e+00  \n",
       "4220  0.859639  0.000000             1.0          1.000000    1.000000e+00  \n",
       "4221  0.862677  0.000000             1.0          1.000000    1.000000e+00  \n",
       "4222  0.807776  0.000000             1.0          1.000000    1.000000e+00  \n",
       "\n",
       "[4223 rows x 16 columns]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['HC1']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b892352",
   "metadata": {},
   "source": [
    "We can check which samples are included:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "327b7f4d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['HC1', 'HC2', 'HC3', 'M1', 'M2', 'M3', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6'])"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a9b52fd",
   "metadata": {},
   "source": [
    "As explained before, the `magnitude_rank` score needs to be inverted before using it with Tensor-cell2cell, because lower values of this rank indicate stronger communication. Thus, we replace it with `1 - magnitude_rank` in each of the sample dataframes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "70a7db25",
   "metadata": {},
   "outputs": [],
   "source": [
    "for sample, df in data.items():\n",
    "    # Invert the rank so that higher values indicate stronger communication\n",
    "    df['magnitude_rank'] = 1 - df['magnitude_rank']\n",
    "    data[sample] = df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a2aa71d",
   "metadata": {},
   "source": [
    "**The key parameters when building a tensor with cell2cell's `dataframes_to_tensor` function are:**\n",
    "\n",
    "- `how` controls which ligand-receptor pairs and cell types to include when building the tensor. The choice depends on how many samples contain scores for a given combination of ligand-receptor and sender-receiver cell pairs. Options are:\n",
    "    - `'inner'` is the strictest option, since it considers only cell types and LR pairs that are present in all contexts (intersection).\n",
    "    - `'outer'` considers all cell types and LR pairs that are present across contexts (union).\n",
    "    - `'outer_lrs'` considers only cell types that are present in all contexts (intersection), but all LR pairs that are present across contexts (union).\n",
    "    - `'outer_cells'` considers only LR pairs that are present in all contexts (intersection), but all cell types that are present across contexts (union).\n",
    "\n",
    "**The following two parameters (`lr_fill` and `cell_fill`) indicate what value to assign to missing scores when `how` is not set to `'inner'`**, i.e., when there are cells or LR pairs that are not present in all contexts. During tensor component analysis, NaN values are masked so that they are not considered by the decomposition objective. As a result, NaNs are treated as missing values that could potentially be communicating, whereas missing LRs filled with a value such as 0 are treated as biological zeroes (i.e., not communicating). For additional details and discussion regarding this choice, please see the [missing indices benchmarking](../../tc2c_benchmark/scripts/missing_indices_consistency.ipynb).\n",
    "\n",
    "- `lr_fill` is the value used to fill communication scores when a ligand-receptor pair is not used by any cell type within a sample. Here we treat these cases as missing values by passing `numpy.nan`.\n",
    "\n",
    "\n",
    "- `cell_fill` is the value used to fill communication scores when a cell type is not using a given ligand-receptor pair within a sample. This value has priority over `lr_fill` if that ligand-receptor pair is used by at least one sender-receiver cell pair within the sample. Here we treat these cases as missing values by passing `numpy.nan`.\n",
    "\n",
    "\n",
    "- `outer_fraction` controls which elements to include in the union scenarios of the `how` options.\n",
    "    Only elements that are present in at least this fraction of samples/contexts will be included.\n",
    "    When this value is 0, all elements across the samples are considered. When this value is 1, it is equivalent to using `how='inner'`.\n",
    "    \n",
    "    \n",
    "**In this case we will consider cell types and LR pairs that are in the LIANA results in at least 1/3 of the samples.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "7afc0936",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|███████████████████████████████████████████| 12/12 [00:18<00:00,  1.53s/it]\n"
     ]
    }
   ],
   "source": [
    "tensor = c2c.tensor.dataframes_to_tensor(context_df_dict=data,\n",
    "                                         sender_col='source', # Column name of the sender cells\n",
    "                                         receiver_col='target', # Column name of the receiver cells\n",
    "                                         ligand_col='ligand_complex', # Column name of the ligands\n",
    "                                         receptor_col='receptor_complex', # Column name of the receptors\n",
    "                                         score_col='magnitude_rank', # Column name of the communication scores\n",
    "                                         how='outer', # What to include across all samples\n",
    "                                         outer_fraction=1/3., # Fraction of samples as threshold to include cells and LR pairs.\n",
    "                                         lr_sep='^', # How to separate ligand and receptor names to name LR pair\n",
    "                                         context_order=sorted_samples, # Order to store the contexts in the tensor\n",
    "                                         sort_elements=True, # Whether sorting alphabetically element names of each tensor dim. Does not apply for context order if context_order is passed.\n",
    "                                         lr_fill=np.nan,\n",
    "                                         cell_fill=np.nan,\n",
    "                                        )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "e780acc9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(12, 1054, 10, 10)"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tensor.shape"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:ccc_protocols]",
   "language": "python",
   "name": "conda-env-ccc_protocols-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.0"
  },
  "vscode": {
   "interpreter": {
    "hash": "a89d9df9e41c144bbb86b791904f32fb0efeb7b488a88d676a8bce57017c9696"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}