Fig. 1 | Journal of Hematology & Oncology

Fig. 1

From: Deep learning for differential diagnosis of malignant hepatic tumors based on multi-phase contrast-enhanced CT and clinical data

The flowchart of dataset setup, the architecture of the STIC model and its performance on primary malignant hepatic tumor classification. A This study consisted of 612 patients in the method development cohort and 111 patients in the external validation cohort, all pathologically diagnosed with HCC, ICC or metastatic liver cancer. B The STIC model contains four modules. The SpatialExtractor module is a deep CNN that uses convolutional layers to extract detailed spatial features from CECT images. The TemporalEncoder module uses a gated RNN to mine the pattern of change across CECT phases. In the Integration module, the output of the TemporalEncoder module is concatenated with the vector of dummy-encoded clinical variables. Finally, in the Classifier module, the Integration output is passed through the softmax activation function to perform the classification. C The ROC curves of five-fold cross-validation of the STIC model for classifying benign and malignant hepatic tumors in the preliminary study; the mean ROC curve was obtained by interpolating the ROC curves of the individual folds, with a mean AUC of 0.987. D Comparison of performance in differentiating HCC from ICC on the test set by ROC curve analysis. The AUC of the STIC model was 0.893 (95% CIs, 0.803–0.982), which was higher than 0.709 (95% CIs, 0.573–0.845) for the Naive RGB model and 0.766 (95% CIs, 0.644–0.888) for the Naive joint model. E Among the three models, the STIC model performed best in distinguishing the two primary malignant hepatic tumors, with an accuracy of 86.2% (95% CIs, 74.6%–93.9%), a sensitivity of 0.892 (95% CIs, 0.746–0.970) and a specificity of 0.810 (95% CIs, 0.581–0.946), where sensitivity and specificity are defined by treating HCC as positive and ICC as negative. The error bars represent 95% CIs calculated by the Wald Z method with continuity correction for accuracy, sensitivity and specificity and by the DeLong method for AUC.
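As an illustration of the panel E definitions, the following minimal Python sketch computes accuracy, sensitivity and specificity with HCC treated as positive and ICC as negative, and attaches a Wald interval with continuity correction to a single proportion. The function names and the example counts are hypothetical and are not taken from the study; the interval formula shown is one common textbook form and may differ in detail from the procedure the authors applied, so it is not expected to reproduce the reported intervals.

```python
import math


def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts.

    Positive class = HCC, negative class = ICC, as in the caption.
    tp/fn: HCC cases predicted HCC/ICC; tn/fp: ICC cases predicted ICC/HCC.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


def wald_ci_cc(successes, n, z=1.959964):
    """Wald interval with continuity correction for one proportion.

    Assumed form: p +/- (z * sqrt(p * (1 - p) / n) + 1 / (2 * n)),
    clipped to [0, 1]. z = 1.959964 gives a two-sided 95% interval.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n) + 1 / (2 * n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts, for illustration only (not the study's data).
tp, fn, tn, fp = 40, 10, 30, 20
acc, sens, spec = diagnostic_metrics(tp, fn, tn, fp)
acc_ci = wald_ci_cc(tp + tn, tp + fn + tn + fp)
sens_ci = wald_ci_cc(tp, tp + fn)
spec_ci = wald_ci_cc(tn, tn + fp)
```

The continuity term 1/(2n) widens the interval slightly, which matters most for small groups such as a minority tumor class in a test set.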
F By McNemar’s chi-squared test, the STIC model outperformed the Naive RGB model, with an increase of 25.9% (95% CIs 11.0%–40.7%, p value = 0.001) in accuracy and 0.270 (95% CIs 0.082–0.459, p value = 0.009) in sensitivity. It also outperformed the Naive joint model, with an increase of 17.2% (95% CIs 3.7%–30.8%, p value = 0.016) in accuracy and 0.189 (95% CIs 0.015–0.363, p value = 0.046) in sensitivity. G The distribution of the predicted scores for HCC and ICC under the three models. The two benchmark models produced much wider score distributions, whereas the proposed STIC model produced a more concentrated distribution of predicted scores for both HCC and ICC. H Comparison of the performance of the STIC model and the two benchmark models using different extractor backbones for binary classification of primary malignant hepatic tumors. By Cochran’s Q test, there were no significant differences in diagnostic performance among STIC models with different extractor backbones. For Naive RGB models with different extractor backbones, there were significant differences in sensitivity (p value < 0.001) and specificity (p value = 0.012). For Naive joint models with different extractor backbones, there were also significant differences in sensitivity (p value < 0.001) and specificity (p value < 0.001).
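The paired comparison in panel F can be sketched as follows. McNemar's test uses only the discordant cases, that is, test cases that one model classifies correctly and the other does not. This minimal Python sketch assumes the continuity-corrected chi-squared form of the test with one degree of freedom; the function names and the example data are hypothetical, and the caption does not state which variant (corrected, uncorrected or exact) the authors used.

```python
import math


def discordant_counts(correct_a, correct_b):
    """Count discordant pairs from per-case correctness flags.

    b: cases model A got right and model B got wrong.
    c: cases model A got wrong and model B got right.
    """
    b = sum(1 for a, o in zip(correct_a, correct_b) if a and not o)
    c = sum(1 for a, o in zip(correct_a, correct_b) if not a and o)
    return b, c


def mcnemar_cc(b, c):
    """McNemar chi-squared statistic with continuity correction.

    Returns (statistic, p_value). For a chi-squared variable with one
    degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical per-case correctness for two models on the same 8 cases.
model_a = [True, True, True, False, True, True, False, True]
model_b = [True, False, True, False, False, True, True, False]
b, c = discordant_counts(model_a, model_b)

# Hypothetical discordant counts for a larger test set.
stat, p_value = mcnemar_cc(15, 3)
```

Because both models are evaluated on the same test cases, a paired test of this kind is appropriate, whereas a two-sample comparison of proportions would ignore the pairing.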
