Clinical specimens
11 Formalin-fixed paraffin-embedded (FFPE) tumors with combined LUAD and LUSC histology were identified, from which independent isolation of both histological components was possible (N = 11, Additional file 1: Table S1). As the components of these mixed histology tumors are not temporally ordered, we refer to the component parts of these mixed histology tumors as “T-LUAD” and “T-LUSC” with the T referring to histologic transformation, without presumption of directionality. We identified an additional 4 pre-transformation LUAD and 7 post-transformation LUSC cases for which tissue material was available (Additional file 1: Table S1). As controls, we included a group of never-transformed LUADs (N = 15) and a set of de novo LUSC samples (N = 11) (Additional file 1: Tables S2,3). All study subjects had provided signed informed consent for biospecimen analyses under an institutional review board-approved protocol.
Tissue isolation
For microdissection, hematoxylin and eosin (H&E)-stained FFPE tumor slides of tumors with combined LUAD/LUSC were evaluated by a pathologist. Multiple FFPE blocks of each tumor were reviewed, with the aim of selecting areas containing exclusively the LUAD or the LUSC component. Where individual slides with pure components were not available, slides containing both histologic components with complete physical separation were selected. Between 10 and 20 unstained sections (USS) at 10 μm prepared on uncharged slides from corresponding FFPE blocks were used for microdissection of each case. Every 10 sections, an additional section was stained with H&E for confirmation of histology. The areas corresponding to each histological component on the initial H&E were dissected using a clean blade and the tissue collected in 0.5-ml nuclease-free tubes for nucleic acid extraction. Alternatively, 1.0–1.5-mm core punches were made from LUAD and LUSC areas on the FFPE blocks and placed in 0.5-ml nuclease-free tubes for nucleic acid extraction, exclusively in cases where each histologic component was located in a different block and where no histologic cross-contamination was confirmed by pathological review.
DNA extraction
FFPE tissue was deparaffinized using heat treatment (90 °C for 10’ in 480 μL PBS and 20 μL 10% Tween 20), centrifugation (10,000 × g for 15’) and ice chill. Paraffin and supernatant were removed, and the pellet was washed with 1 mL 100% EtOH followed by an incubation overnight in 400 µl 1 M NaSCN for rehydration and impurity removal. Tissues were subsequently digested with 40 µl Proteinase K (600 mAU/ml) in 360 µl Buffer ATL at 55 °C. DNA isolation proceeded with the DNeasy Blood & Tissue Kit (QIAGEN catalog #69504) according to the manufacturer’s protocol modified by replacing AW2 buffer with 80% ethanol. DNA was eluted in 0.5X Buffer AE.
RNA/DNA dual extraction from FFPE tissue
FFPE sections were deparaffinized in mineral oil. Briefly, 800µL mineral oil (Fisher Scientific, #AC415080010) and 180µL Buffer PKD were mixed with the sections, Proteinase K was added for tissue digestion, and the sample was incubated at 56 °C for 15 min. Phase separation was encouraged with centrifugation, and the aqueous phase was chilled 3 min to precipitate RNA. After centrifugation for 15 min at 20,000 g, RNA-containing supernatant was removed for extraction, while DNA remained in the pellet. Nucleic acids were subsequently extracted using the AllPrep DNA/RNA Mini Kit (QIAGEN, #80204) according to the manufacturer’s instructions. RNA was eluted in nuclease-free water and DNA in 0.5X Buffer ATE.
Whole exome sequencing
For DNA samples, after PicoGreen quantification and quality control by Agilent BioAnalyzer, 100–500 ng of DNA were used to prepare libraries using the KAPA Hyper Prep Kit (Kapa Biosystems KK8504) with 8 cycles of PCR. After sample barcoding, 100 ng of library was captured by hybridization using the xGen Exome Research Panel v1.0 (IDT) according to the manufacturer’s protocol. PCR amplification of the post-capture libraries was carried out for 12 cycles.
For DNA library samples, after PicoGreen quantification and quality control by Agilent BioAnalyzer, 100 ng of library transferred from the DMP was captured by hybridization using the xGen Exome Research Panel v1.0 (IDT) according to the manufacturer’s protocol. PCR amplification of the post-capture libraries was carried out for 8 cycles.
Samples were run on a HiSeq 4000 in a 100 bp/100 bp paired end run, using the HiSeq 3000/4000 SBS Kit (Illumina). Normal and tumor samples had a median target coverage of 87X and 108X, respectively.
Whole exome analysis
We used a comprehensive in-house WES pipeline TEMPO—time-efficient mutational profiling in oncology (https://github.com/mskcc/tempo & https://ccstempo.netlify.app) that performs alignment using BWA-mem algorithm followed by mutation calling using Strelka2 and Mutect2 variant callers. The combined, annotated and filtered variant calls were used for downstream analysis. Details of the variant call processing are described at https://ccstempo.netlify.com/variant-annotation-and-filtering.html#somatic-snvs-and-indels and are previously described as well [12]. Copy number analysis was performed with FACETS (https://github.com/mskcc/facets), processed using facets-suite (https://github.com/mskcc/facets-suite), and manual reviewed and refitted using facets-preview (https://github.com/taylor-lab/facets-preview). To delineate mutational processes driving the acquisition of somatic alterations, mutational signatures were decomposed for all tumor samples that had a minimum of 5 single-nucleotide somatic mutations using the R package mutation signatures (https://github.com/mskcc/mutation-signatures). Further, a given signature was considered to be ‘dominant’ if the proportion of mutations contributing to the signature was at least 20% of all mutations detected in the sample.
Purity, ploidy, tumor mutational burden (TMB), genome doubling and cancer cell fractions for all mutations in all specimens were inferred from sequencing data. We estimated neoantigen load by taking the number of variant estimated to having strong class I MHC binding affinity by NetMHC 4.0 [13] and normalizing it by the TMB. We summarized the top occurring somatic variants located on cancer genes in an oncoprint using the R package ComplexHeatmaps version 2.0.0 (https://github.com/jokergoo/ComplexHeatmap) [14]. Cancer genes were genes defined as “OncoKB Annotated” on the Cancer Gene List (downloaded in June 2020, https://www.oncokb.org/cancerGenes). All other plots for this analysis were created using ggplot2 version 3.3.2 (https://github.com/tidyverse/ggplot2).
Comparison to TCGA
Somatic mutations and copy number alterations (CNAs) found in cancer genes in our T-LUAD samples were compared to those found in The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) cohort using a Fisher exact test. The mutations from TCGA-LUAD [15] were extracted using the R package TCGA mutations (https://github.com/PoisonAlien/TCGAmutations) and tested against our cohort mutations with maftools v.2.0.16 (https://github.com/PoisonAlien/maftools) [16]. Separately, a Fisher exact test was used to identify significant CNAs by comparing the number of samples with amplifications and deletions on particular genes in TCGA-LUAD, extracted from CbioPortal [17, 18], to the number of samples with gene level CNAs in our cohort. For both mutations and CNAs, genes with p < 0.05 were considered differentially altered. The results were summarized in a volcano plot using the R packages, EnhancedVolcano version 1.7.4 (https://github.com/kevinblighe/EnhancedVolcano) and ggplot.
Methylation sequencing
After PicoGreen quantification (ThermoFisher, #P11496) and quality control by Agilent BioAnalyzer, 170–750 ng of genomic DNA was sheared using a LE220-plus Focused-ultrasonicator (Covaris, #500569). Samples were cleaned using Sample Purification Beads from the TruSeq Methyl Capture EPIC LT Library Prep Kit (Illumina, #FC-151-1002) according to the manufacturer’s instructions with modifications. Briefly, samples were incubated for 5 min after addition of SPB, 50 µL RSB was added for resuspension, and resuspended samples were incubated for 2 min. Sequencing libraries were prepared using the KAPA Hyper Prep Kit (Kapa Biosystems KK8504) without PCR amplification. Post-ligation cleanup proceeded according to Illumina’s instructions with 110 µL Sample Purification Mix. After purification, 3–4 samples were pooled equimolar and methylome regions were captured using EPIC oligos. Capture pools were bisulfite converted and amplified with 11–12 cycles of PCR. Pools were sequenced on a NovaSeq 6000 or HiSeq 4000 in a 150/150 bp or 100 bp/100 bp paired end run, using the NovaSeq 6000 S4 Reagent Kit (300 Cycles) or HiSeq 3000/4000 SBS Kit (Illumina). The average number of read pairs per sample was 51 million.
DNA methyl capture EPIC data processing
The Bismark pipeline [19] was adopted to map bisulfite-treated EPIC sequencing reads and determine cytosine methylation states. Trim Galore v0.6.4 was used to remove raw reads with low-quality (less than 20) and adapter sequences. The trimmed sequence reads were C(G) to T(A) converted and mapped to similarly converted reference human genome (hg19) [20] using default Bowtie 2 [21] settings within Bismark. Duplicated reads were discarded. The remaining alignments were then used for cytosine methylation calling by Bismark methylation extractor.
Differential methylation analysis
Differentially methylated CpGs (DMCs) were identified using DSS R package [22, 23] on the basis of dispersion shrinkage followed by Wald statistical test for beta-binomial distributions. Any CpGs with FDR < 0.1 and methylation percentage difference greater than 10% were considered significant DMCs. Differentially methylated regions (DMRs) were subsequently called based on the DMCs. The called DMRs were required to satisfy the minimum length of 50bps and minimum 3 CpGs in the region; two neighboring DMRs were merged if less than 50bps apart, and significant CpGs were those that occupy at least 50% of all CpGs population in the called DMRs as default in DSS package. Pairwise comparisons were conducted for pre-transformation pre-transformation LUAD versus control LUAD, post-transformation LUSC versus de novo LUSC, and post-transformation LUSC versus pre-transformation LUAD. The DMRs were mapped to gene regions at promoters and gene bodies, and differential methylation levels were subsequently associated with differential gene expression values in selected pathways. In addition to pairwise comparisons, principal component analysis (PCA) and partial least square discriminant analysis (PLSDA) were also performed to classify samples into groups and identify influential CpGs using mixOmics R package [23].
Motif enrichment analysis
Differential methylation may influence transcription factor (TF) binding. To identify overrepresented known TF motifs due to differential methylation for the post-transformation LUSC compared with pre-transformation LUAD, “findMotifsGenome.pl” from HOMER [24] was applied to DMCs (± 50bps) overlapping with gene promoter regions. DMCs regions with hyper- and hypo-methylation in LUSC were explored separately to show the effects from different methylation status. The significantly enriched TFs were defined as those with q value ≤ 0.1.
RNA sequencing
Approximately 500 ng of FFPE RNA or 100 ng of fresh frozen RNA per sample was used for RNA library construction using the KAPA RNA Hyper library prep kit (Roche, Switzerland) per the manufacturer’s instructions with minor modifications. Customized adapters with unique molecular indexes (UMI) (Integrated DNA Technologies, US) and Sample-specific dual-indexes primers (Integrated DNA Technologies, US) were added to each library. The quantity of libraries was measured with Qubit (Thermo Fisher Scientific, US) and quality measured by TapStation Genomic DNA Assay (Agilent Technologies, US). Equal amounts of each RNA library (around 500 ng) were pooled for hybridization capture with IDT Whole Exome Panel V1 (Integrated DNA Technologies, US) using a customized capture protocol modified from NimbleGen SeqCap Target Enrichment system (Roche, Switzerland). The captured DNA libraries were then sequenced on an Illumina HiSeq4000 with paired end reads (2 Å ~ 100 bp), at 50millions reads/sample.
RNASeq analysis
In-line UMI sequences were trimmed from the sequencing reads with Marianas (https://github.com/mskcc/Marianas) and aligned to human GRCh37 genome using STAR 2.7.0 (https://github.com/alexdobin/STAR) [25] with Ensembl v75 gene annotation. Hybrid selection-specific metrics and alignment metrics were calculated for the BAM files using CalculateHsMetrics and CollectRnaSeqMetrics, respectively, from Picard Toolkit (https://github.com/broadinstitute/picard) to determine the quality of the capture.
We quantified RNA-seq reads with Kallisto v.0.45.0 [26] to obtain transcript counts and abundances. Kallisto was run with 100 bootstrap samples, sequence-based bias correction and in strand-specific mode, which processed only the fragments where the first read in a pair is pseudoaligned to the reverse strand of a transcript. Differential gene expression analysis, principle component analysis and transcript per million (TPM) normalization by size factors were done from Kallisto output files using Sleuth v0.30.0 run in gene mode [27]. Differentially expressed genes were identified using the Wald test. Genes were marked significant if the False Discovery Rates, q, calculated using the Benjamini–Hochberg method, were less than 0.05, and beta (Sleuth-based estimation of log2 fold change) > 1.25, which approximately correlated with a log2 fold change of 2 in our data. The log of the normalized TPM values for selected significant genes was rescaled using a z-score transformation and plotted in a heatmap using the ComplexHeatmap Library in R.
Pathway enrichment
Gene set enrichment analysis (GSEA) [28] was performed on full sets of gene expression data across the previously mentioned three comparisons. Genes were ranked on p value scores computed as − log10(p value) * (sign of beta). Gene set annotations were taken from Molecular Signatures Database (MSigDB v7.0.1) [28, 29]. The significance level of enrichment was evaluated using permutation test, and the p value was adjusted by Benjamini–Hochberg procedure. Any enriched gene sets with adjusted p value ≤ 0.1 were regarded as significant. This analysis was conducted using ClusterProfiler R package [30]. The enriched gene sets that are influenced by DMCs were selected, and pathway annotations concatenated manually to remove redundancy and achieve high level generality. When the pathway terms were merged, median enrichment score was taken as the new group enrichment score, p values were aggregated using Fisher’s method from the Aggregation R package [31], and core enrichment of genes was collapsed.
LUSC subtyping
Wilkerson’s model [32] was used to predict LUSC subtypes by a nearest centroid classification algorithm. An expression heatmap on centroid genes was produced by ‘ComplexHeatmap’ library in R. Chi-square test was performed to detect the association between cell types and subtypes.
Immunoblotting
Protein extraction and Western blot were performed as previously described [33] after quantification of protein extracts using the Bradford method (#5000205, Bio-Rad), running 10–30-ug aliquots in the gels. Western blot antibodies for Beta-catenin (#8480, Cell Signaling Technology), pAKT (#4060, Cell Signaling Technology), p40 (#67825, Cell Signaling Technology), EGFR (#4267, Cell Signaling Technology), EZH2 (#5246, Cell Signaling Technology), MYC (#13987, Cell Signaling Technology), pPRAS40 (#2997, Cell Signaling Technology), SOX2 (#3579, Cell Signaling Technology), Vinculin (#13901, Cell Signaling Technology) and actin (#3700, Cell Signaling Technology) were used. Immunohistochemistry antibodies for p40 (#AC13066A, BioCare), TTF-1 (#M3575, Dako), CK5/6 (#790-4554, Ventana) and MYC (#ab32072, Abcam) were used.
Protein extraction from FFPE
Protein extraction from punches on the LUAD and LUSC components of combined histology FFPE blocks was performed following the instructions from the Qproteome FFPE Tissue Kit (#37623, Qiagen), using one punch per extraction.
Phospho-kinase array
Protein samples were quantified with the Bradford method (#5000205, Bio-Rad), and 200-ug aliquots were used in the phospho-kinase array (#ARYC003C, R&D-Biotechne), which was performed using the manufacturer’s instructions. Quantification of spots was performed using the Image Studio software (Version 3.1, Li-Cor). Technical replicates (2 per array) per sample were averaged. Two-tailed Student’s T-test was performed on these values, comparing the T-LUAD and T-LUSC groups.
Cell line transductions
Lx462 cell line was derived by PDX dissociation using the tumor dissociation kit (#130-95-929, Miltenyi) and GentleMACS Octo Dissociator with Heaters (Miltenyi, #130-096-427) as indicated by the manufacturer. Dissociated cells were seeded in RPMI 1640 10% FBS and expanded in culture. PC9 cell line was purchased from Millipore Sigma (#90071810-VL). Both cell lines were regularly tested for Mycoplasma and maintained in RPMI 1640 10% FBS. Lentiviruses were produced as previously described [34] with a MYC overexpression plasmid (#10674, Addgene) and a myrAKT overexpression plasmid, kindly provided by Dr. Witte [35]. Cell lines were transduced at high MOI as previously described [34] with overnight virus incubation.
Xenografts and in vivo treatments
For cell line xenografts, 1 million cells were injected in a 1:1 mixture of PBS and Matrigel (#CB40234, Fisher) in the flanks of NOD.Cg-Prkdc < scid > Il2rg < tm1Wjl > /SzJ (NSG) mice. For patient-derived xenografts, tumor was dissociated as described above, and 1 million cells were injected in a 1:1 mixture of PBS and Matrigel (#CB40234, Fisher) in the flanks of NOD.Cg-Prkdc < scid > Il2rg < tm1Wjl > /SzJ (NSG) mice.
For treatments, 5–6 mice were engrafted per treatment arm (please see figure legends for details) until they reached 100–150 mm3. At that point, mice were randomized into groups and treated with either vehicle, osimertinib (25 mg/kg/day), ORS1 (100 mg/kg/day) or samotolisib (10 mg/kg/day), or combinations of osimertinib and ORS1, or of osimertinib and samotolisib, by oral gavage 5 days a week. Mice were killed when tumors reached ~ 1000 mm3 and fixed in formalin 10% O/N for paraffin embedding. Tumors and mice body weight were measured twice a week.
To generate osimertinib-resistant Lx462 tumors, these were treated with osimertinib in the previously mentioned conditions until relapse (~ 1000 mm3), when tumors were collected, dissociated and re-engrafted to continue osimertinib treatment. At second relapse on osimertinib, mice were considered resistant.