Clonal evolution in liver cancer at single-cell and single-variant resolution

Genetic heterogeneity of tumor is closely related to its clonal evolution, phenotypic diversity and treatment resistance, and such heterogeneity has only been characterized at single-cell sub-chromosomal scale in liver cancer. Here we reconstructed the single-variant resolution clonal evolution in human liver cancer based on single-cell mutational profiles. The results indicated that key genetic events occurred early during tumorigenesis, and an early metastasis followed by independent evolution was observed in primary liver tumor and intrahepatic metastatic portal vein tumor thrombus. By parallel single-cell RNA-Seq, the transcriptomic phenotype of HCC was found to be related with genetic heterogeneity. For the first time we reconstructed the single-cell and single-variant clonal evolution in human liver cancer, and dissection of both genetic and phenotypic heterogeneity will facilitate better understanding of their relationship.


Selection of mutations for single-cell target sequencing
Putative clonal and subclonal mutations were selected from WES-derived mutation list, and the target sites were further narrowed down by checking mutation prevalence in cBioPortal collected HCC samples [2] and Gene Set Enrichment Analysis [3]. A total of 57, 56, 57 and 64 mutation sites were selected for HCC1, HCC2, HCC9 and HCC8, respectively. HCC5 was not used as many mutations had low variant allele frequency (VAF) values, suggesting low tumor cell purity which was confirmed by scRNA-Seq. Primers were designed to amplify these sites from single-cell WGA product, and the amplification specificities of the primers were confirmed to ensure one PCR product for each primer pair.

Single-cell target sequencing
A total of 480 cells were used for single-cell target sequencing, with 96, 96, 96 and 192 single cells from HCC1, HCC2, HCC9 and HCC8, respectively. WGA product from each single cell was used as PCR template for 5 parallel PCR reactions with ~10 primer pairs in each reaction. PCR cycling program was adopted from User Guide for Access Array System (Fluidigm, 100-3770) for even amplification of multiple targets. mapped to GRCh37/hg19 with Tophat using default parameters, and numbers of the reads with reference or mutation site were counted.

Single-cell mutational status and clonal structure analysis
The following criteria were used to determine the mutational status of a given target site: if reads <3, it was defined as site with no coverage; if reads >=3, more than 1 read with mutation, and VAF >=0.1, then it was defined as mutated site; other conditions were defined as reference site. All target sites in single cells were further confirmed by manual inspection in Integrative Genomics Viewer [4]. We filtered genetic variations with poor coverage, and also removed those found in both We selected drop-out of normal allele for ADO assessment, which will generate VAF values approximating 1. After filtering sites with genuine homogenous mutations, except for HCC2 with rate of 56.9%, all samples showed good results with genetic variations having VAF values of 0.95~1 occupying 15.8%, 6.3%, 2.1% and 6.3% of total variations for HCC1, HCC9, HCC8-T and HCC8-PVTT.
The combination of mutations enabled inference of clones and clone-specific mutations in each patient, which could reconstruct their evolutionary relationship. For 4 HCC9, 17 cells were found to be mixture of more than one cell based on VAF analysis and removed before clonal analysis. Besides single-cell clonal analysis based on the mutational status, we also used nucleotide sequences at each site for evolutionary phylogenetic tree reconstruction. The combined sequences from all target sites in the single cells were aligned, and maximum parsimony tree was constructed using MEGA-X [5].
Single-cell libraries were pooled and sequenced on illumina HiSeq platform with 2 × 150 bp mode. The C1 mRNA Sequencing High Throughput Demultiplexer Script (Fluidigm) was used for de-multiplexing and generation of FASTQ files for each single cell. The sequencing reads were mapped to GRCh37/hg19 with Tophat using the default parameters, and the numbers of reads in each gene were counted. ERCC RNA Spike-in (Ambion, 4456740) was added as a technical control. SC3 was used for outlier identification and cell type identification [6]. Single-cell transcriptomes were filtered based on three parameters: total counts, total features, and ERCC count percentage. If all three parameters fall within 1 MAD (median of the absolute deviation) away from median then the cell was kept, and a total of 2064 cells from the 3200 cells passed the filtering. HCC8 samples were not analyzed because the size of tumor cell was out of the capture range of C1 HT IFC. Gene Ontology enrichment and transcriptional factor co-variance network analysis were done as previously described [1], and copy number inference from single-cell global transcriptional profiles was done with inferCNV software (https://github.com/broadinstitute/inferCNV).