A novel non-invasive exhaled breath biopsy for the diagnosis and screening of breast cancer



Early detection is critical for improving the survival of breast cancer (BC) patients. Exhaled breath testing as a non-invasive technique might help to improve BC detection. However, the breath test accuracy for BC diagnosis is unclear.


This multi-center cohort study consecutively recruited 5047 women from four areas of China who underwent BC screening. Breath samples were collected through standardized breath collection procedures. Volatile organic compound (VOC) markers were identified by high-throughput breathomics analysis using high-pressure photon ionization–time-of-flight mass spectrometry (HPPI-TOFMS). Diagnostic models were constructed using the random forest algorithm in the discovery cohort and tested in three external validation cohorts.


A total of 465 (9.21%) participants were identified with BC. Ten optimal VOC markers were identified to distinguish the breath samples of BC patients from those of non-cancer women. A diagnostic model (BreathBC) consisting of 10 optimal VOC markers showed an area under the curve (AUC) of 0.87 in external validation cohorts. BreathBC-Plus, which combined 10 VOC markers with risk factors, achieved better performance (AUC = 0.94 in the external validation cohorts), superior to that of mammography and ultrasound. Overall, the BreathBC-Plus detection rates were 96.97% for ductal carcinoma in situ, 85.06%, 90.00%, 88.24%, and 100% for stages I, II, III, and IV BC, respectively, with a specificity of 87.70% in the external validation cohorts.


This is the largest study on breath tests to date. Considering the easy-to-perform procedure and high accuracy, these findings exemplify the potential applicability of breath tests in BC screening.

To the editor

Breast cancer (BC) is one of the most common cancers and a leading cause of death worldwide [1]. Early BC detection improves survival [2]. However, imaging-based BC screening methods are expensive and prone to overdiagnosis [3]. By detecting volatile organic compounds (VOCs) in exhaled breath [4], breath biopsy is a promising non-invasive strategy for early cancer detection [5]. However, the accuracy of the breath test for BC diagnosis has not been verified by multi-center clinical trials with sufficient sample sizes [4].

Herein, we enrolled 5047 women who underwent BC screening from six hospitals in four areas of China (Fig. 1 and Additional file 1: Figure S1). The discovery set included 216 BC patients and 2959 non-cancer women from three hospitals in Beijing, and the external validation set included 249 BC patients and 1545 non-cancer women from another three hospitals in Yantai, Wenzhou, and Guiyang, respectively (Additional file 1: Tables S1, S2). Most BC patients were diagnosed at early stages (Additional file 1: Table S3).

Fig. 1

Patient Enrollment and Study Design. This multi-center cohort study consecutively recruited women who underwent breast cancer screening at six hospitals in China. The participants were divided into the discovery cohort, used to identify candidate VOCs and construct diagnostic models, and the external validation cohorts, used to independently test the diagnostic value of the models. For model construction, the discovery dataset was randomly split into training, internal validation, and test datasets at a ratio of 5:2:3. The external validation cohorts enrolled women who underwent opportunistic breast cancer screening at Yantai and Wenzhou and women who underwent population-based breast cancer screening at Guiyang. For each participant, information on breast cancer risk factors and a breath sample were collected before the standard mammography and ultrasonography. The final diagnosis was based on the pathology result and a 6-month follow-up. 78 patients lost to follow-up were excluded. Abbreviations: BC, breast cancer; CAMS, Chinese Academy of Medical Sciences
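The enrollment figures can be cross-checked with simple arithmetic. The sketch below uses only counts stated in the text and in this caption; treating the 78 participants lost to follow-up as the gap between the 5047 women enrolled and the analyzed cohorts is an assumption, and all variable names are illustrative.

```python
# Counts stated in the text: BC patients and non-cancer women per cohort.
discovery = {"bc": 216, "non_cancer": 2959}
external = {"bc": 249, "non_cancer": 1545}
enrolled = 5047
lost_to_follow_up = 78  # assumption: excluded before the cohorts were formed

# Participants actually analyzed across both cohorts.
analyzed = sum(discovery.values()) + sum(external.values())

# BC cases as a share of all enrolled women, in percent.
total_bc = discovery["bc"] + external["bc"]
bc_share_of_enrolled = round(100 * total_bc / enrolled, 2)

print(analyzed, enrolled - lost_to_follow_up, total_bc, bc_share_of_enrolled)
```

Under that assumption the counts reconcile: the analyzed participants plus the 78 exclusions equal the number enrolled, and the 465 BC cases correspond to the reported 9.21%.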

Breath samples of 1.2 L from each participant were collected according to established procedures and analyzed by high-pressure photon ionization time-of-flight mass spectrometry (HPPI-TOFMS) (Additional file 1: Supplementary methods) [6]. HPPI-TOFMS has a higher throughput than earlier technologies and does not require pretreatment of exhaled breath [7]. The peak area of each VOC ion was then computed. Spectrum peak patterns and VOC correlation modules differed between the BC patients and controls (Additional file 1: Figures S2 and S3). Ten optimal VOC features were selected to differentiate the BC patients from non-cancer controls in the discovery cohort (Fig. 2A). Eight VOCs showed significantly higher peak areas in BC patients than in controls, and two VOCs showed substantially lower peak areas (Fig. 2B and Additional file 1: Table S4). Significant fold changes and diagnostic performances were identified for these 10 VOC ions (Additional file 1: Figure S4). The ions at m/z 28.0 and 40.0, which may correspond to ethylene and propyne or to fragment ions, showed the highest AUCs (Fig. 2C and Additional file 1: Table S4).
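A per-ion AUC can be read as the probability that a randomly chosen BC sample has a higher peak area than a randomly chosen control sample. The following is a minimal sketch of that rank-based (Mann-Whitney) calculation; the function name `auc_from_scores` and the peak-area values are hypothetical illustrations, not study data, and the study's own AUC procedure is not described at this level of detail.

```python
def auc_from_scores(case_values, control_values):
    """Rank-based AUC: P(case > control) + 0.5 * P(case == control)."""
    if not case_values or not control_values:
        raise ValueError("both groups need at least one value")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0   # case ranked above control
            elif case == control:
                wins += 0.5   # ties count as half a win
    return wins / (len(case_values) * len(control_values))


# Hypothetical peak areas for one VOC ion (arbitrary units, not study data).
bc_peak_areas = [5.1, 6.3, 4.8, 7.0]
control_peak_areas = [3.9, 4.8, 4.1, 5.5]

auc = auc_from_scores(bc_peak_areas, control_peak_areas)
print(auc)
```

For an ion that is lower in BC patients, this orientation yields a value below 0.5; swapping the two groups gives the complementary value.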

Fig. 2

The Workflow of Data Analysis, the Distribution of the Top Ten Volatile Organic Compound (VOC) Ions with High Contribution Coefficients in Model Construction, and the Performance of the BreathBC and BreathBC-Plus Breast Cancer Detection Models. A The workflow of data analysis and model construction. Breath samples were collected through standardized breath collection procedures using self-designed collectors and airbags and then analyzed by high-pressure photon ionization–time-of-flight mass spectrometry (HPPI-TOFMS). Data for 1500 VOC ions were detected across the m/z range of [20, 320) at an interval of 0.2. Using the random forest algorithm, the optimal 10 VOC ions were selected based on the feature importance or coefficient in model training. Two breast cancer detection models (BreathBC and BreathBC-Plus) were constructed using the breath VOC markers without or with risk factors, respectively. Both models were verified in the three external validation cohorts. B The ten optimal VOC ions differed significantly between patients with breast cancer and non-cancer women among all the participants in this study, with eight elevated VOCs and two decreased VOCs. C The receiver operating characteristic (ROC) curves and the associated areas under the curve (AUCs) for the diagnostic performance of the ten optimal VOC ions. D, E For the BreathBC model using 10 breath VOC markers, the diagnostic AUC was 0.96 (95% CI, 0.94–0.97) in the internal validation cohort, 0.95 (95% CI, 0.93–0.90) in the test cohort (D), and 0.87 in the external validation cohorts (E). F, G For the BreathBC-Plus model using both breath VOC markers and risk factors, the combined model performed better than the BreathBC model in the internal validation cohort and the test cohort (AUC = 0.97–0.98) (F) and the external validation cohorts (AUC = 0.94) (G).
Abbreviations: HPPI-TOFMS, high-pressure photon ionization–time-of-flight mass spectrometry; VOC, volatile organic compound; BC, breast cancer; HC, healthy control; AUC, area under the curve
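The count of 1500 VOC ions follows from the stated m/z range and interval. As a small sketch, assuming each ion is indexed by the left edge of a 0.2-wide bin on the half-open range [20, 320) (the constant names are illustrative):

```python
MZ_START = 20.0   # inclusive lower bound of the m/z range
MZ_STOP = 320.0   # exclusive upper bound of the m/z range
MZ_STEP = 0.2     # interval between adjacent m/z values

# Number of grid points on the half-open range [MZ_START, MZ_STOP).
n_bins = round((MZ_STOP - MZ_START) / MZ_STEP)

# Left edge of every bin, rounded to one decimal to avoid float drift.
mz_grid = [round(MZ_START + i * MZ_STEP, 1) for i in range(n_bins)]

print(n_bins, mz_grid[0], mz_grid[-1])
```

Under this assumption the grid runs from 20.0 to 319.8 and includes the m/z values 28.0 and 40.0 discussed in the text.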

The random forest algorithm [8] was employed as the classifier. For model construction, the discovery dataset was randomly split into training, internal validation, and test datasets at a ratio of 5:2:3. We constructed two BC detection models: BreathBC, which uses only the 10 VOC markers, and BreathBC-Plus, which uses both the VOC markers and risk factors (Fig. 2A).
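A random 5:2:3 partition of the discovery set can be sketched as follows. This is an illustration only: the helper `split_5_2_3`, the seed, and the use of integer sample IDs are assumptions, and the text does not state whether the authors' split was stratified by diagnosis or how rounding was handled.

```python
import random


def split_5_2_3(sample_ids, seed=0):
    """Shuffle IDs and cut them into training, internal validation, and test sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)   # reproducible shuffle
    n_train = len(ids) * 5 // 10       # 50% for training
    n_val = len(ids) * 2 // 10         # 20% for internal validation
    train = ids[:n_train]
    internal_val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]       # remaining ~30% for testing
    return train, internal_val, test


# Discovery set size from the text: 216 BC patients + 2959 non-cancer women.
discovery_ids = range(216 + 2959)
train, internal_val, test = split_5_2_3(discovery_ids)
print(len(train), len(internal_val), len(test))
```

Keeping the test portion separate from the internal validation portion allows model tuning on one and an untouched performance estimate on the other, before the external cohorts are used.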

BreathBC scores were higher in BC patients than in controls (0.66 ± 0.31 vs. 0.11 ± 0.15, p = 1.29 × 10⁻¹⁵³), regardless of tumor size, lymph node status, and molecular subtype (all p < 0.01, Additional file 1: Figure S5), and were correlated with tumor size (r = 0.41, p = 0.05; Additional file 1: Figure S6). The diagnostic AUC of the BreathBC model was 0.96 (95% CI, 0.94–0.97) in the internal validation cohort and 0.95 (95% CI, 0.93–0.90) in the test cohort (Fig. 2D, E, Additional file 1: Table S5). These AUCs are higher than those reported in previous studies using gas chromatography–mass spectrometry (GC–MS) (AUC = 0.67–0.93) [9,10,11] but lower than that reported for an electronic nose (AUC = 0.99; Additional file 1: Table S6) [12]. However, none of the previous methods was externally validated, and their sample sizes were relatively small. In the external validation cohorts, the BreathBC model achieved an AUC of 0.87, a sensitivity of 92.37% (230/249), and a specificity of 60.45% (934/1545; Additional file 1: Table S7).
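The external validation sensitivity and specificity follow directly from the reported counts. The sketch below recomputes them; the helper name `rate_percent` is illustrative, and the false-negative and false-positive counts are derived here by subtraction rather than quoted from the source.

```python
def rate_percent(numerator, denominator):
    """Return numerator / denominator as a percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


# Reported counts for BreathBC in the external validation cohorts.
true_positives, bc_patients = 230, 249
true_negatives, non_cancer_women = 934, 1545

sensitivity = rate_percent(true_positives, bc_patients)
specificity = rate_percent(true_negatives, non_cancer_women)

# Derived by subtraction (not stated in the source).
false_negatives = bc_patients - true_positives
false_positives = non_cancer_women - true_negatives

print(sensitivity, specificity, false_negatives, false_positives)
```

The recomputed rates match the reported 92.37% and 60.45%.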

Furthermore, the BreathBC-Plus diagnostic model was developed in the discovery cohort by combining BreathBC scores with traditional risk factors (Additional file 1: Supplementary methods). The combined model outperformed the BreathBC model in the internal validation cohort (AUC = 0.98), the test cohort (AUC = 0.97), and the external validation cohorts (AUC = 0.94) (Fig. 2F, G, Additional file 1: Table S5). In the external validation cohorts, BreathBC-Plus achieved a sensitivity of 89.16% (222/249) and a specificity of 87.70% (1355/1545; Additional file 1: Table S7). By stage, the detection rates in the external validation cohorts were 96.97% (32/33) for ductal carcinoma in situ (DCIS) and 85.06% (74/87), 90.00% (99/110), 88.24% (15/17), and 100% (2/2) for stages I, II, III, and IV BC, respectively (Additional file 1: Table S8). Notably, BreathBC-Plus outperformed mammography and ultrasound in diagnosis (Additional file 1: Figure S7, Table S9).
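The stage-specific detection rates and the overall sensitivity are tied together by the per-stage counts. As a quick consistency check using only the numbers reported above (the dictionary layout and variable names are illustrative):

```python
# (detected, total) BC cases per stage for BreathBC-Plus, external validation.
detected_by_stage = {
    "DCIS": (32, 33),
    "I": (74, 87),
    "II": (99, 110),
    "III": (15, 17),
    "IV": (2, 2),
}

# Detection rate per stage, in percent.
stage_rates = {
    stage: round(100 * detected / total, 2)
    for stage, (detected, total) in detected_by_stage.items()
}

# Summing over stages should reproduce the overall sensitivity counts.
total_detected = sum(detected for detected, _ in detected_by_stage.values())
total_cases = sum(total for _, total in detected_by_stage.values())
overall_sensitivity = round(100 * total_detected / total_cases, 2)

print(stage_rates, total_detected, total_cases, overall_sensitivity)
```

The per-stage counts sum to 222 detected of 249 cases, consistent with the overall sensitivity of 89.16%.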

This study has some limitations. First, although HPPI-TOFMS provides a high-throughput methodology for VOC analysis, the chemical compound associated with each MS peak has yet to be determined. Second, like most previous VOC studies, which focused on a single cancer type, this study aimed only to identify BC-specific VOC markers.

To our knowledge, this is the largest breathomics analysis study to date. Collectively, breath-based methods may provide supplemental or alternative screening strategies to detect early-stage BC and DCIS with performance comparable to that of imaging-based technologies.

Availability of data and materials

The supplementary data supporting this study's findings are openly available in the supplemental materials. Deidentified participant data and analytic code are available upon reasonable request to Dr. Jiaqi Liu.



Abbreviations

AUC: Area under the curve
BC: Breast cancer
CI: Confidence interval
DCIS: Ductal carcinoma in situ
GC–MS: Gas chromatography–mass spectrometry
HC: Healthy control
HPPI-TOFMS: High-pressure photon ionization time-of-flight mass spectrometry
ROC: Receiver operating characteristic
VOC: Volatile organic compound


  1. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA: Cancer J Clin. 2021;71(3):209–49.

  2. Crosby D, Bhatia S, Brindle KM, et al. Early detection of cancer. Science. 2022;375(6586):eaay9040.

  3. Clift AK, Dodwell D, Lord S, et al. The current status of risk-stratified breast screening. Br J Cancer. 2022;126(4):533–50.

  4. Hanna GB, Boshier PR, Markar SR, Romano A. Accuracy and methodologic challenges of volatile organic compound-based exhaled breath tests for cancer diagnosis: a systematic review and meta-analysis. JAMA Oncol. 2019;5(1):e182815.

  5. Wang P, Huang Q, Meng S, et al. Identification of lung cancer breath biomarkers based on perioperative breathomics testing: a prospective observational study. eClinicalMedicine. 2022;47:101384.

  6. Huang Q, Wang S, Li Q, et al. Assessment of breathomics testing using high-pressure photon ionization time-of-flight mass spectrometry to detect esophageal cancer. JAMA Netw Open. 2021;4(10):e2127042.

  7. Wang Y, Jiang J, Hua L, et al. High-pressure photon ionization source for TOFMS and its application for online breath analysis. Anal Chem. 2016;88(18):9047–55.

  8. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

  9. Phillips M, Cataneo RN, Ditkoff BA, et al. Volatile markers of breast cancer in the breath. Breast J. 2003;9(3):184–91.

  10. Phillips M, Cataneo RN, Ditkoff BA, et al. Prediction of breast cancer using volatile biomarkers in the breath. Breast Cancer Res Treat. 2006;99(1):19–21.

  11. Phillips M, Cataneo RN, Saunders C, Hope P, Schmitt P, Wai J. Volatile biomarkers in the breath of women with breast cancer. J Breath Res. 2010;4(2):026003.

  12. Yang HY, Wang YC, Peng HY, Huang CH. Breath biopsy of breast cancer using sensor array signals and machine learning analysis. Sci Rep. 2021;11(1):103.


We thank all the individuals, families, and physicians involved in the study for their participation.


This research was funded in part by the National Natural Science Foundation of China (82272938 to Jiaqi Liu), Beijing Nova Program (20220484059 to Jiaqi Liu), the CAMS Innovation Fund for Medical Sciences (2021-I2M-1-014 to Jiaqi Liu and 2021-I2M-C&T-A-015 to Xiang Wang), and the Beijing Hope Run Special Fund (LC2020B05 to Jiaqi Liu).

Author information



J.L., H.C., B.Z., Z.L., and X.W. conceived the study. J.L., Z.L., and X.W. administratively supported this study. J.L., B.Z., X.W., G.L., D.W., Y.W., Y.L., G.Q., Y.Z., Y.F., J.Z., X.T., Y.G., S.L., J.X., L.B., and C.L. enrolled the participants and collected study materials. J.L., H.C., Y.L., Y.F., J.Z., Y.G., S.L., and J.X. performed data cleaning and statistical analysis. J.L., H.C., Y.L., Z.J., H.X., Y.J., and J.Z. devised the algorithm and performed data analysis and interpretation. All authors wrote the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Bailin Zhang, Zhihua Liu or Xiang Wang.

Ethics declarations

Ethics approval and consent to participate

This study was reviewed and approved by the ethics committees at each participating hospital (Ethics number: 22/290-3492). Written informed consent was obtained from each participant.

Competing interests

The authors have no conflict of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

Supplementary Methods, Figures, and Tables.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Liu, J., Chen, H., Li, Y. et al. A novel non-invasive exhaled breath biopsy for the diagnosis and screening of breast cancer. J Hematol Oncol 16, 63 (2023).
