{ "cells": [ { "cell_type": "markdown", "id": "dadb14c6", "metadata": {}, "source": [ "# 1. Preprocessing expression data" ] }, { "cell_type": "markdown", "id": "ad84acda", "metadata": {}, "source": [ "This tutorial demonstrates how to preprocess raw single-cell UMI counts to generate expression matrices that can be used as input to cell-cell communication tools. We recommend the [quality control](https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html) chapter in the Single-cell Best Practices book as a starting point for a detailed overview of QC and single-cell RNA-seq analysis pipelines in general.\n", "\n", "We demonstrate a typical workflow using the popular single-cell analysis software [scanpy](https://scanpy.readthedocs.io/en/stable/) to generate an AnnData object that can be used downstream. For these tutorials, we will use a lung [dataset](https://doi.org/10.1038/s41591-020-0901-9) of 63k immune and epithelial cells across three control, three moderate, and six severe COVID-19 patients.\n", "\n", "Details and caveats regarding [batch correction](https://www.nature.com/articles/s41592-018-0254-1), which removes technical variation while preserving biological variation between samples, are covered in the additional examples tutorial entitled \"S1_Batch_Correction\"." 
] }, { "cell_type": "markdown", "id": "6ada2ff1", "metadata": {}, "source": [ "## Prepare the object for cell-cell communication analysis" ] }, { "cell_type": "markdown", "id": "d8c2848d", "metadata": {}, "source": [ "Import the required packages:" ] }, { "cell_type": "code", "execution_count": 1, "id": "278dc12a", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "import os\n", "\n", "import scanpy as sc\n", "import pandas as pd\n", "import numpy as np\n", "\n", "import cell2cell as c2c" ] }, { "cell_type": "code", "execution_count": 2, "id": "8a517526", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "# Suppress warnings to keep the notebook output readable\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "code", "execution_count": 3, "id": "45e05da1", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "../../data/ already exists.\n" ] } ], "source": [ "# Directory where the data for these tutorials is stored\n", "data_path = '../../data/'\n", "c2c.io.directories.create_directory(data_path)" ] }, { "cell_type": "markdown", "id": "488c71b7", "metadata": {}, "source": [ "#### Loading" ] }, { "cell_type": "markdown", "id": "b939740f", "metadata": {}, "source": [ "The 12 samples can be downloaded as .h5 files from [here](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE145926). You can also download the cell metadata from [here](https://raw.githubusercontent.com/zhangzlab/covid_balf/master/all.cell.annotation.meta.txt).\n", "\n", "Alternatively, cell2cell has a helper function to load the data as an AnnData object:" ] }, { "cell_type": "code", "execution_count": 4, "id": "c3ac1116", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/html": [ "
\n", " | sample | \n", "sample_new | \n", "group | \n", "disease | \n", "hasnCoV | \n", "cluster | \n", "celltype | \n", "condition | \n", "
---|---|---|---|---|---|---|---|---|
AAACCCACAGCTACAT_3 | \n", "C100 | \n", "HC3 | \n", "HC | \n", "N | \n", "N | \n", "27.0 | \n", "B | \n", "Control | \n", "
AAACCCATCCACGGGT_3 | \n", "C100 | \n", "HC3 | \n", "HC | \n", "N | \n", "N | \n", "23.0 | \n", "Macrophages | \n", "Control | \n", "
AAACCCATCCCATTCG_3 | \n", "C100 | \n", "HC3 | \n", "HC | \n", "N | \n", "N | \n", "6.0 | \n", "T | \n", "Control | \n", "
AAACGAACAAACAGGC_3 | \n", "C100 | \n", "HC3 | \n", "HC | \n", "N | \n", "N | \n", "10.0 | \n", "Macrophages | \n", "Control | \n", "
AAACGAAGTCGCACAC_3 | \n", "C100 | \n", "HC3 | \n", "HC | \n", "N | \n", "N | \n", "10.0 | \n", "Macrophages | \n", "Control | \n", "