Single-cell and spatially resolved analysis uncovers cell heterogeneity of breast cancer

The heterogeneity and the complex cellular architecture have a crucial effect on breast cancer progression and response to treatment. However, deciphering the neoplastic subtypes and their spatial organization is still challenging. Here, we combine single-nucleus RNA sequencing (snRNA-seq) with a microarray-based spatial transcriptomics (ST) to identify cell populations and their spatial distribution in breast cancer tissues. Malignant cells are clustered into distinct subpopulations. These cell clusters not only have diverse features, origins and functions, but also emerge to the crosstalk within subtypes. Furthermore, we find that these subclusters are mapped in distinct tissue regions, where discrepant enrichment of stromal cell types are observed. We also inferred the abundance of these tumorous subpopulations by deconvolution of large breast cancer RNA-seq cohorts, revealing differential association with patient survival and therapeutic response. Our study provides a novel insight for the cellular architecture of breast cancer and potential therapeutic strategies. Supplementary Information The online version contains supplementary material available at 10.1186/s13045-022-01236-0.

phosphate-buffered saline (PBS) with DAPI. Nuclei suspensions were loaded to a MoFlo Astrios EQ Cell Sorter and sorted into a 1.5-ml tube.

Single-nucleus RNA-sequencing
10X-based libraries were acquired with the Chromium Single Cell V3.0 reagent kit following the manufacturer's protocol (10X Genomics). Nuclei suspensions containing around 500 nuclei per μl were loaded into nine independent lanes. Libraries were sequenced on a NovaSeq 6000 (Illumina) [2].

Single-nucleus RNA-sequencing analysis
The snRNA-seq data from 10X Genomics were aligned and quantified using the Cell Ranger Single-Cell Software. The preliminary filtered data generated from Cell Ranger was used for downstream analysis. Each 10x library was individually quality checked, and cells were filtered to ensure good gene coverage, a consistent range of read counts and low numbers of mitochondrial reads. At least 200 and no more than 5000 detected gene were required for each cell. No more than 15% mitochondrial reads were allowed per cell. Downstream statistical analyses were conducted using the Seurat (V4.0.4) software packages for R.
We processed approximately 8000-9000 cells in each sample. After quality control and filtering, 4,093 single nuclei in the BC-A sample and 5,917 single nuclei in the BC-B sample were retained. UMAP visualization showed major cell types containing fibroblasts, dendritic cells (DCs), endothelial cells, pericytes, macrophages, mast cells, T cells, luminal, and basal tumor cells by using canonical lineage markers.
To distinguish neoplastic cells from normal epithelial cells, single-cell copy number variation (CNV) profiles were estimated by using inferCNV package of R. Fibroblasts and endothelial cells were regarded as the negative control. FindMarkers function of Seurat was used to find differentially expression genes. For calling molecular subtypes using the PAM50 method, we processed 'pseudobulk' expression profiles for each tumor, in a similar manner to any bulk RNA-seq sample. KEGG enrichment and GSVA enrichment were performed using the MSigDB HALLMARK gene sets and the PROGENy gene sets. SCENIC analysis [3] was performed to measure the difference between cell clusters based on transcription factors or their target genes by using SCENIC package of R. Pseudotime analysis was performed with Monocle2 [4] to determine the dramatic translational relationships among cell types and clusters. Cellcell communication analysis was performed by using CellPhoneDB [5] and iTALK package of R.

Histology and immunohistochemical staining
Tumor tissue was fixed in 10% neutral buffered formalin for 24 h and then processed for paraffin embedding. Diagnostic tumor blocks were accessed for samples that did

Spatial transcriptomics
Tissue samples were embedded in optimal cutting temperature compound and stored at −80 °C. Tissue blocks were cut into 10-μm sections and processed using the Visium Spatial Gene Expression Kit (10x Genomics) according to the manufacturer's instructions. First, breast tissue permeabilization condition was optimized using the Visium Spatial Tissue Optimization Kit, which was found to be ideal at 12 min.
Sections were stained with H&E and imaged using a Leica DM6000 microscope under a 20× lens magnification, then processed for spatial transcriptomics. The resulting complementary DNA library was checked for quality control, then sequenced using an Illumina NovaSeq 6000 system. Cycling conditions were set for 28, 98 and 8 for Read 1, Read 2 and Read 3 (i7 index), respectively. Spots were annotated by a specialist breast pathologist using the Loupe v.4.0.0 software (10x Genomics).

Spatial transcriptomics analysis
Reads were demultiplexed and mapped to the reference genome GRCh38 using the

Deconvolution of bulk RNA-seq transcriptomics and spatial transcriptomics
We performed an integrative gene signature consisting of the top 20 differentially expressed genes in each cluster. ssGSEA was used to deconvolute predicted cell fractions from bulk transcript profiling datasets. Normalized METABRIC expression matrices, clinical information and PAM50 subtype classifications were obtained from METABRIC [6]. Tumor subgroups in the METABRIC cohort were identified using SKmeans-based consensus clustering of the predicted cell fraction from ssGSEA in each bulk METABRIC patient tumor.
We performed deconvolution of spatial tissue locations and the scaled deconvolution values for six epithelial subclusters were overlaid onto tissue spots. The ST dataset was classified into distinct regions based on SKmeans-based consensus clustering of the predicted cell fraction from MCPcounter [7]. We predicted the estimated scores of immune cells in each group by using MCPcounter.

Survival analysis of snRNA-seq signatures
To assess the impact of cell types described by snRNA-seq on clinical outcome, we assessed the association between gene signatures with patient overall survival in the METABRIC cohort. Based on the defined gene signatures, all the cases were divided into three discriminated subgroups (DSGs). Survival curves of the three groups were generated using the Kaplan-Meier method with the 'survival' package. We assessed the significance between each two groups using the log-rank test statistics. R package 'survminer' was used for visualization.

Chemotherapy resistance analysis
We examined public datasets of patients treated with chemotherapy. Normalized expression matrices and clinical information were obtained from GEO under accession number GSE123845 and GSE163882. CIBERSORTx [8] was used to predict the estimated scores of each gene signatures in two NAC cohorts. Statistical significance was determined using a two-sided t-test in a pairwise comparison of means between groups, with P values adjusted using the Benjamini-Hochberg procedure.