Autoantibody signature in hepatocellular carcinoma using seromics

Background: Alpha-fetoprotein (AFP) is a widely used biomarker for early detection of hepatocellular carcinoma (HCC). However, its low sensitivity and false-negative results call for more effective early diagnostic approaches for HCC. Methods: We employed a three-phase strategy to identify a serum autoantibody (AAb) signature for early diagnosis of HCC using a protein array-based approach. A total of 1253 serum samples from HCC patients, liver cirrhosis patients, and healthy controls were prospectively collected from three liver cancer centers in China. The Human Proteome Microarray, comprising 21,154 unique proteins, was first applied to identify AAb candidates in a discovery phase (n = 100) and to fabricate HCC-focused arrays. Then, an artificial neural network (ANN) model was used to discover AAbs for HCC detection in a test phase (n = 576) and a validation phase (n = 577), respectively. Results: Using the HCC-focused array, we identified and validated a novel 7-AAb panel containing CIAPIN1, EGFR, MAS1, SLC44A3, ASAH1, UBL7, and ZNF428 for effective HCC detection. The ANN model of this panel showed improved sensitivity (61.6–77.7%) compared to AFP (cutoff 400 ng/mL, 28.4–30.7%). Notably, it was able to detect AFP-negative HCC with AUC values of 0.841–0.948. For detection of early-stage HCC (BCLC 0/A), it outperformed AFP (cutoff 400 ng/mL) with an approximately 10% increase in AUC. Conclusions: The 7-AAb panel has potential clinical value for non-invasive early detection of HCC and provides new clues for understanding the immune response against hepatocarcinogenesis.


Background
Hepatocellular carcinoma (HCC) is one of the leading causes of cancer mortality worldwide [1]. The majority of HCC cases occur in patients with underlying liver disease, such as hepatitis B virus (HBV) infection and cirrhosis [2]. Over half of patients with HCC are diagnosed at advanced stages, which rules out curative therapies. Alpha-fetoprotein (AFP) is a widely used, yet imperfect, biomarker for early diagnosis of HCC. It has been reported that AFP (at a threshold level of 20 ng/mL) showed a low sensitivity of 40–60% with a specificity of 80–90% [3]. The low sensitivity, false negativity (e.g., a small HCC with a normal AFP level), and false positivity (e.g., liver function damage and certain gastrointestinal tumors) of AFP could reduce the chance of early diagnosis and thus lead to poor clinical outcomes, highlighting the need for more effective approaches for HCC detection.
Cancer-associated autoantibodies (AAbs) may develop early during carcinogenesis when cancer-associated antigens appear in premalignant or malignant lesions. The immune system can effectively amplify and memorize immune responses to those antigens, which makes AAbs appealing cancer biomarkers. For example, DHCR24 AAb was identified as a novel biomarker for disease progression of hepatitis C [4]. Likewise, it has been reported that AAbs against HCC1, CDKN2A, p53, CIP2A, and survivin could indicate the presence of HCC prior to clinical diagnosis [5]. In another study, AAbs against NPM-1, 14-3-3 zeta, and MDM2 were suggested to have diagnostic value for AFP-negative HCC patients (AFP < 20 ng/mL; AFP−HCC) [6]. Serum AAbs against EIF3A [7] and SF3B1 [8] were also reported as potential diagnostic biomarkers for HCC. However, the sensitivity and specificity of those selected AAbs remain limited, and further high-throughput unbiased screening with a large cohort and independent validation are still required. In addition, the heterogeneity of human biology in cancer suggests that cancer biomarkers need to be combined, in parallel or in tandem, in algorithms such as artificial neural networks (ANN) [9,10].
Protein microarrays are capable of presenting thousands of tumor-associated antigens to rapidly and globally identify AAb responses in serum (seromics) [11,12]. Known and predicted tumor antigens, such as p53 [13], GPR78 [14], HER2 [15], and HSP60 [16], have been employed in comprehensive protein arrays to profile the cancer immune response. In this regard, global AAb screening has identified high-performance AAb panels for early diagnosis of lung cancer [13] and Behcet disease [17]. Herein, the HuProt arrays, comprising 21,154 unique full-length proteins, were first employed to survey serum AAbs using HCC samples. Subsequently, HCC-focused arrays were fabricated with the candidate proteins identified in the HuProt arrays. A large cohort of 1253 serum samples, including HCC patients, liver cirrhosis (Cirrhotic) patients, and healthy controls (Healthy), was screened to develop a diagnostic model. A novel panel of 7 proteins, comprising CIAPIN1, EGFR, MAS1, SLC44A3, ASAH1, UBL7, and ZNF428, was discovered and evaluated for the early detection of HCC.

Human serum sample
The cohort comprised 1253 serum samples from 611 HCC patients, 249 cirrhotic patients, and 393 healthy controls. The samples were collected between January 2019 and August 2019 at Zhongshan Hospital of Fudan University, Eastern Hepatobiliary Surgery Hospital, and Cancer Hospital of Guangxi Medical University. All blood samples were processed identically to obtain serum. Briefly, 5 mL of venous blood was drawn from each individual (before any treatment or surgery) and kept at room temperature (RT) for 1 h until coagulated. Serum was recovered by centrifugation at 3000 rpm for 10 min and stored in aliquots at −80 °C until use. Informed consent was obtained for all samples used in this study, and ethical approval was obtained from each hospital.
Inclusion criteria for HCC patients in this study were: (1) pathological diagnosis of HCC (n = 446), or (2) for patients without pathological diagnosis, diagnosis of HCC by enhanced computed tomography, enhanced magnetic resonance imaging, or contrast-enhanced ultrasonography in combination with AFP or des-gamma carboxyprothrombin (n = 165); (3) absence of autoimmune diseases; and (4) no history of other cancers. All patients were free of hepatic encephalopathy and had an ECOG/WHO/Zubrod performance status of 0–1. Child-Pugh score, BCLC staging [18], TNM staging, and Chinese Liver Cancer staging [19] were assessed for each patient.
Diagnosis of liver cirrhosis was confirmed by enhanced magnetic resonance imaging or pathology. Healthy controls had normal liver biochemistry and no liver disease or alcohol abuse.

Serum AAb profiling on HuProt arrays
HuProt™ Human Proteome Microarray v3.0 was provided by CDI Laboratories, Inc (Mayaguez, PR). Each HuProt array comprises 21,154 unique proteins. A total of 100 serum samples from the discovery phase (I), including 50 HCC and 50 healthy controls, were applied to HuProt arrays. The microarray was taken out of −80 °C storage and incubated in blocking buffer (3% BSA in PBS) at RT for 3 h. A serum sample diluted 1:200 in binding buffer (1% BSA in PBST) was then added to the microarray and incubated at 4 °C overnight. After washing with PBST, the microarray was incubated with 1:1000 diluted Fluor conjugated goat anti-human IgG (532 nm) and donkey anti-human IgM (635 nm) (Jackson ImmunoResearch, West Grove, PA) at RT for 1 h in the dark. After washing with PBST, the microarray was rinsed with ddH2O and dried. The microarray was scanned with the LuxScan™ 10K-A (CapitalBio Corporation, Beijing, China). GenePix Pro 6.0 (Axon Instruments, Foster City, CA) was used to extract the foreground and background intensity of each spot. The signal for each spot (SNR) was defined as the ratio of the foreground median intensity to the background median intensity, as previously described [20].
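The SNR definition above can be illustrated with a minimal sketch. The function name `spot_snr` and the pixel-level input lists are hypothetical; in practice the foreground and background medians would come from the image analysis software rather than from raw pixel lists.

```python
from statistics import median


def spot_snr(foreground_pixels, background_pixels):
    """Ratio of the foreground median to the background median for one spot."""
    background = median(background_pixels)
    if background <= 0:
        # A non-positive background would make the ratio meaningless.
        raise ValueError("background median must be positive")
    return median(foreground_pixels) / background


# Hypothetical intensities for a single spot (illustrative values only).
example_snr = spot_snr([1200, 1500, 1350], [300, 310, 290])
```

Here the foreground median is 1350 and the background median is 300, so the example spot has an SNR of 4.5.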

HCC-focused arrays
After serum incubation on the HuProt arrays, autoantibody signals were detected, normalized [21], and quantified. Candidate proteins had to satisfy three criteria in the comparison of HCC vs. Healthy: (1) p value from the t test ≤ 0.05; (2) fold change (FC) ≥ 1.2; (3) positive ratio ≥ 10%. An HCC sample was considered positive when its signal was greater than the mean plus 2 × SD of the healthy controls, and the positive ratio was calculated as the number of positive HCC samples divided by the total number of HCC samples [22]. According to these criteria, 81 proteins were identified. An additional 19 AAbs, including CTRL, DCAF4L2, BIRC5, CCNB1IP1, GPR78, HM13, HSPA2, IMP3, KDM1A, MAPK1, RALA, RPLP0, SARNP, SF3A3, TSPAN13, TUBB6, XRCC5, CENPF, and CDKN2A, were selected from the general cancer literature, because we aimed to fabricate the HCC-focused arrays using candidate proteins from both our own experiment and the literature. Thus, a total of 100 proteins were picked to fabricate the HCC-focused arrays, which contained 14 identical subarrays on each slide (BC-BIO, Foshan, China). The subsequent assay process was similar to that described for the HuProt array, except that serum samples were diluted 1:100 per subarray.
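The three selection criteria for one protein can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the fold change is assumed to be the ratio of group means, the SD is assumed to be the sample standard deviation, and the t-test p value is taken as a precomputed input. All function names and data values are hypothetical.

```python
from statistics import mean, stdev


def positive_ratio(hcc, healthy):
    """Fraction of HCC signals above mean + 2 x SD of the healthy controls."""
    cutoff = mean(healthy) + 2 * stdev(healthy)  # sample SD (assumption)
    positives = sum(1 for value in hcc if value > cutoff)
    return positives / len(hcc)


def fold_change(hcc, healthy):
    """Ratio of group means (assumed definition of FC)."""
    return mean(hcc) / mean(healthy)


def is_candidate(hcc, healthy, p_value):
    """Apply the three criteria: p <= 0.05, FC >= 1.2, positive ratio >= 10%."""
    return (
        p_value <= 0.05
        and fold_change(hcc, healthy) >= 1.2
        and positive_ratio(hcc, healthy) >= 0.10
    )


# Hypothetical SNR values for one protein (illustrative only).
healthy_snr = [1.0, 1.1, 0.9, 1.0, 1.0]
hcc_snr = [1.0, 1.1, 2.0, 1.5, 0.9]
ratio = positive_ratio(hcc_snr, healthy_snr)
selected = is_candidate(hcc_snr, healthy_snr, p_value=0.03)
```

In this toy example two of the five HCC values exceed the cutoff, so the positive ratio is 0.4, and the protein passes all three criteria when the supplied p value is 0.03.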

Model development for HCC detection
For the ANN model, the number of hidden neurons was determined according to previous literature [23]. Using the formula $N_h = (4n^2 + 3)/(n^2 - 8)$, where $N_h$ is the number of hidden neurons and $n$ is the number of input neurons, $n = 7$ gives $N_h = 199/41 \approx 4.85$, so $N_h$ was set at 5 in our study. Thus, fully connected feedforward neural networks with 7 input nodes, 5 neurons in the hidden layer, and 2 output nodes were chosen. The back-propagation of error algorithm was used as the learning rule, and the average committee vote was used to classify samples [24–26]. For the test phase (II), 576 samples were randomly split into 10 equally sized groups. One ANN model was constructed using 90% of cases as the training set and the remaining 10% as the verification set. This procedure was repeated 10 times to obtain 10 ANN models. After the whole process was repeated 50 times, 500 ANN models had been developed. Each ANN model provided an output of 0 for control or 1 for HCC. The committee vote was performed by averaging all outputs, and the average was then used to classify the samples. The samples in the validation phase (III) were classified by the 500 ANN models as a blind test. Both the ANN models and AFP were evaluated using receiver operating characteristic (ROC) curve analysis.
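The repeated 10-fold splitting and committee vote described above can be sketched as follows. This is a minimal sketch, not the implementation used in the study: the learner `train_stand_in` is a deliberately trivial placeholder instead of the 7-5-2 network, the 0.5 cutoff on the averaged output is an assumption, and all names and toy data are hypothetical.

```python
import random
from statistics import mean


def train_stand_in(train_x, train_y):
    # Placeholder learner, NOT the 7-5-2 network from the text:
    # it thresholds the summed features midway between the class means.
    pos = [sum(x) for x, y in zip(train_x, train_y) if y == 1]
    neg = [sum(x) for x, y in zip(train_x, train_y) if y == 0]
    threshold = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if sum(x) > threshold else 0


def build_committee(xs, ys, train_fn, n_folds=10, n_repeats=50, seed=0):
    """Train one model per fold and repeat, giving n_folds * n_repeats models."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    models = []
    for _ in range(n_repeats):
        rng.shuffle(indices)  # new random split for every repeat
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for held_out in folds:
            held = set(held_out)  # the 10% verification set, unused by the stand-in
            train_idx = [i for i in indices if i not in held]
            models.append(train_fn([xs[i] for i in train_idx],
                                   [ys[i] for i in train_idx]))
    return models


def committee_vote(models, x, cutoff=0.5):
    """Average the 0/1 outputs of all models, then classify (cutoff is assumed)."""
    score = mean(m(x) for m in models)
    return score, int(score >= cutoff)


# Toy data: 20 controls (label 0) and 20 cases (label 1), 7 features each.
data_rng = random.Random(1)
labels = [0] * 20 + [1] * 20
features = [[data_rng.gauss(1.0 + y, 0.3) for _ in range(7)] for y in labels]
models = build_committee(features, labels, train_stand_in)
score, label = committee_vote(models, [2.0] * 7)
```

With the default settings the committee contains 10 × 50 = 500 models, matching the count in the text. Any learner that returns a callable producing 0 or 1 could be passed in place of the stand-in.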

Study design
This study included three phases (Fig. 1): discovery phase (I), test phase (II), and validation phase (III). In the discovery phase (I), serum samples from 50 HCC patients and 50 healthy controls were enrolled. These 100 samples were all obtained from Zhongshan Hospital and individually profiled on HuProt arrays for screening candidate proteins and fabricating the HCC-focused arrays. Then, samples from 282 HCC patients, 130 cirrhotic patients, and 164 healthy controls were collected from Zhongshan Hospital and used for model construction in the test phase (II). Finally, samples from 279 HCC patients, 119 cirrhotic patients, and 179 healthy controls collected from Eastern Hepatobiliary Surgery Hospital and Cancer Hospital of Guangxi Medical University were used for independent verification in the validation phase (III). The clinical data of patients are summarized in Additional file 5: Table S1. Clinical variables of each group in the test phase (II) and validation phase (III) were compared by Pearson's chi-squared test, and no statistically significant differences were found.
Fig. 1 Study design using seromics. A large cohort of 1253 serum samples, including 611 HCC patients, 249 patients with liver cirrhosis (cirrhotic), and 393 healthy controls (healthy), were enrolled for discovery and evaluation of potential serum AAbs as HCC diagnostic biomarkers.

AAb screening for construction of HCC-focused arrays
In the discovery phase (I), the HuProt arrays were employed to profile 100 serum samples collected from 50 HCC patients and 50 healthy controls (Additional file 1: Fig. S1). Candidate proteins had to satisfy three criteria in the comparison of HCC vs. Healthy, as described in the "Methods" section. In total, 81 proteins that were bound significantly more strongly by the autoantibodies of the HCC group than by those of the healthy group were identified (Fig. 2a). Together with an additional 19 proteins from the general cancer literature, a total of 100 proteins were printed to fabricate the HCC-focused arrays. Among these 19 AAbs, 17 were also present in the discovery HuProt array and satisfied 1–2 of the criteria. The other 2 AAbs, CENPF [16] and CDKN2A [5], were absent from the discovery HuProt array. The larger sample sizes in the test phase (II) and validation phase (III) would help to evaluate the discriminating capacity of these AAbs more accurately. The biological function and expression level of these 100 proteins were also investigated based on the HPA database and our previous multi-omics HCC data [27] (Additional files 2 & 3: Figs. S2 & S3). One serum sample (pooled from 10 randomly selected HCC individuals) was independently applied to a total of 47 different HCC-focused arrays to evaluate their potential variance. As shown in Fig. 2b, the variance was minimal, with an average correlation coefficient of 0.95.

Identification of AAb biomarkers for HCC detection
Fig. 2 Fabrication of HCC-focused arrays. a According to the screening results of HuProt arrays, 81 proteins (p ≤ 0.05, FC ≥ 1.2 and positive ratio ≥ 10%) were selected as potential candidates. A total of 100 proteins were printed to fabricate the HCC-focused arrays, including 19 proteins from previous reports. b Six representative HCC-focused arrays testing the same sample exhibited high reproducibility. The diagonal indicates the SNR distribution of the sample, the lower left indicates the bivariate scatter plot with a fitted line, and the upper right indicates the correlation coefficient and the significance (***p < 0.001). c HCC-focused arrays were incubated with samples from one HCC patient, one patient with liver cirrhosis, and one healthy control, respectively. Three-dimensional renderings of the signal intensities are shown, indicating that the array worked well.
Next, HCC-focused arrays were tested individually with serum samples from a large cohort. In the test phase (II), the signals of each protein were compared between HCC and healthy or cirrhotic samples, respectively. Examples of array images for HCC, cirrhotic, and healthy samples are provided in Fig. 2c. We identified a total of 55 potential biomarkers using the following criteria: p < 0.05, FC ≥ 1.2, and sensitivity > 15% with at least 90% specificity. Among them, 24 AAbs were able to classify HCC patients versus healthy, 17 AAbs were able to classify HCC versus cirrhotic, and the remaining AAbs were able to classify both HCC patients versus healthy and HCC versus cirrhotic (Additional file 6: Table S2).
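The criterion "sensitivity > 15% with at least 90% specificity" can be illustrated with a short sketch. The text does not state how the cutoff was chosen, so the approach below (scan observed values as cutoffs and report the highest sensitivity among those reaching the required specificity) is an assumption, as are the function name and the toy values.

```python
def sensitivity_at_specificity(cases, controls, min_specificity=0.90):
    """Highest sensitivity among cutoffs whose specificity is >= min_specificity.

    A sample is called positive when its signal is strictly above the cutoff.
    """
    best = 0.0
    for cutoff in sorted(set(cases) | set(controls)):
        specificity = sum(1 for v in controls if v <= cutoff) / len(controls)
        if specificity >= min_specificity:
            sensitivity = sum(1 for v in cases if v > cutoff) / len(cases)
            best = max(best, sensitivity)
    return best


# Hypothetical signals for one AAb (illustrative only).
control_signals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
case_signals = [4, 9.5, 11, 12, 3]
sens = sensitivity_at_specificity(case_signals, control_signals)
passes_filter = sens > 0.15
```

In this example a cutoff of 9 leaves nine of ten controls negative (90% specificity) and three of five cases positive, so the sensitivity is 0.6 and the marker would pass the 15% filter.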
To select predictors for model development, we performed 10-fold cross validation for the 55 potential biomarkers (Fig. 3a). The differential AAbs in each fold were used as input to a logistic regression that classified HCC patients versus controls. Within each fold, stepwise variable selection identified the most discriminative subset of the biomarker candidates [28]. Biomarker candidates selected in ten folds were characterized as predictors in a consensus logistic regression model, and 7 predictors were identified including CIAPIN1, EGFR, MAS1, SLC44A3, ASAH1, UBL7, and ZNF428 (Additional file 4: Fig. S4). The performance of the combinatorial 7 AAbs was then evaluated for HCC detection.
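The consensus step, keeping only candidates selected in every fold, can be sketched as a set intersection. The stepwise selection within each fold is not reproduced here; the per-fold lists below are made-up inputs, and the `EXTRA_*` names are hypothetical placeholders for candidates that were selected in only some folds.

```python
def consensus_predictors(selected_per_fold):
    """Return the candidates that were selected in every fold, sorted by name."""
    fold_sets = [set(fold) for fold in selected_per_fold]
    return sorted(set.intersection(*fold_sets))


panel = ["CIAPIN1", "EGFR", "MAS1", "SLC44A3", "ASAH1", "UBL7", "ZNF428"]
# Ten hypothetical folds: each selects the panel plus one fold-specific extra.
folds = [panel + [f"EXTRA_{i % 3}"] for i in range(10)]
consensus = consensus_predictors(folds)
```

Because each extra candidate appears in only some of the ten folds, only the seven panel members survive the intersection.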

Performance of the 7-AAb panel in test/validation phase
The correlations between any two proteins from the 7 predictors were calculated using all samples (HCC, cirrhotic, and healthy) in the test phase (II). The strongest correlation was found between MAS1 and ASAH1, with a coefficient of 0.77 (p < 0.001; Fig. 3b). It has been reported that neural network analysis is potentially more powerful than traditional statistical techniques when the interaction among variables is complex. Thus, an ANN model based on these 7 predictors was further explored in the test phase (II). We built a three-layer neural network with 7 input nodes, 5 hidden neurons, and 2 output neurons (Fig. 3c). The committee vote was performed by averaging all outputs, and the average was then used to classify the samples (Fig. 4). As shown in Table 1 [...] Table S3). We also explored the feasibility of the model in HCC patients with negative HBsAg (HBsAg−-HCC). Test phase (II) and validation phase (III) contained 46 and 30 HBsAg−-HCC patients, respectively. We found that the ANN model of this panel was able to efficiently distinguish HBsAg−-HCC patients from controls (AUC 0.822–0.932), superior to AFP at a cutoff of 400 ng/mL with an AUC of 0.567–0.647 (Additional file 7: Table S3).
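For readers less familiar with the AUC values reported here, the following sketch shows one standard way to compute an AUC from two sets of scores: the fraction of case-control pairs in which the case scores higher, with ties counted as one half. It is not necessarily the software routine used in the study, and the scores are hypothetical.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical committee-vote scores (illustrative only).
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
```

In this toy example 7.5 of the 9 case-control pairs favor the case, giving an AUC of about 0.83. An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation.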

The 7-AAb panel's performance for different HCC stages
Patients with early-stage HCC can benefit from curative treatments such as tumor resection, liver transplantation, or ablation [18]. The performance of our model for HCC patients at different stages was therefore also evaluated. The evaluation for different BCLC stages is provided in Table 3.

Discussion
Although pathological and radiological examination remains the "gold standard" for clinical diagnosis of cancers, liquid biopsy has shown appealing potential for early detection of HCC [29]. In this regard, tremendous efforts have been made to explore the early diagnostic potential of circulating micro-RNA signatures [30], cell-free DNA [31], metabolites [32], glycans [33], and DNA methylation patterns [34]. However, AFP is still the only widely used clinical protein biomarker for HCC diagnosis, although approximately 40% of HCC cases have a normal AFP level. Because proteins are stable and easy to detect, efforts have also been made to evaluate novel protein biomarkers for HCC detection, such as Dickkopf-1 [35] and Aldo-keto reductase family 1 member B10 [36].
Fig. 3 a The subjects were systematically rotated between ten folds. Within each fold, differential AAbs were determined comparing HCC patients to controls. The predictors for further model development were generated using the potential biomarkers, which worked in ten folds in the cross validation. b The correlations between any two proteins from the 7 predictors were calculated using all samples (HCC, cirrhotic, and healthy) in the test phase (II). The diagonal indicates the SNR distribution of the sample, the lower left indicates the bivariate scatter plot with a fitted line, and the upper right indicates the correlation coefficient and the significance (*p < 0.05, **p < 0.01, ***p < 0.001). c Schematic representation of the ANN model to predict HCC. Fully connected feedforward neural networks including 7 input nodes (7 predictors), 5 neurons in the hidden layer, and 2 output nodes were chosen. Back propagation of error algorithm was used as the learning rule, and the average committee vote was used to classify the patient samples.
Fig. 4 Workflow for the ANN model. For the test phase (II), 576 samples were randomly split into 10 equally sized groups. One ANN model was built using 90% of cases as training set and the remaining 10% as verification set. This procedure was performed 10 times to generate 10 ANN models. Five hundred ANN models were obtained after a total of 50 runs. Each ANN model provided the following outputs: 0 indicates healthy control and 1 indicates HCC. The committee vote was performed by averaging all outputs and then classifying the samples. The samples in the validation phase (III) used 500 ANN models for the blind test.
Based on three steps for biomarker classifier development [37], we focused on CIAPIN1, EGFR, MAS1, SLC44A3, ASAH1, UBL7, and ZNF428, which are mainly involved in activation of signaling cascades and apoptotic/metabolic processes. The molecular function of UBL7 is polyubiquitin modification-dependent protein binding, and loss of ubiquitin-proteasome players was suggested to lead to protein expression alteration and hepatocarcinogenesis [27]. CIAPIN1 was reported to play an important role in HCC proliferation through regulating the expression of cell cycle-related proteins [38]. EGFR is a transmembrane receptor tyrosine kinase and plays a key role in HCC development and progression [39]. The biological functions of MAS1, SLC44A3, ASAH1, and ZNF428 in HCC were rarely reported.
Here, we provided autoantibody clues for further exploring their biological significance in HCC. It has been reported that neural network analysis is potentially more useful than traditional statistical techniques when the relationship among variables is complex and non-linear [10]. The performance of the ANN-based 7-AAb model might be further improved through continued learning of the neural networks in future clinical application. However, there are several limitations in the present study. First, AAbs were reported to appear in multiple cancer types due to immune surveillance. On the other hand, this may indicate the potential of AAbs for monitoring various cancer types, similar to the pan-cancer diagnostic value of cfDNA alterations [40]. Based on previous literature, AAb against ASAH1 could be applied to monitor the progression of melanoma [41]. However, there were no significant differences in AAb