Abstract

OBJECTIVE: Chest radiographs are frequently used to diagnose community-acquired pneumonia (CAP) for children in the acute care setting. Natural language processing (NLP)-based tools may be incorporated into the electronic health record and combined with other clinical data to develop meaningful clinical decision support tools for this common pediatric infection. We sought to develop and internally validate NLP algorithms to identify pediatric chest radiograph (CXR) reports with pneumonia. MATERIALS AND METHODS: We performed a retrospective study of encounters for patients from six pediatric hospitals over a 3-year period. We utilized six NLP techniques: word embedding, support vector machines, extreme gradient boosting (XGBoost), light gradient boosting machines Naïve Bayes and logistic regression. We evaluated their performance of each model from a validation sample of 1,350 chest radiographs developed as a stratified random sample of 35% admitted and 65% discharged patients when both using expert consensus and diagnosis codes. RESULTS: Of 172,662 encounters in the derivation sample, 15.6% had a discharge diagnosis of pneumonia in a primary or secondary position. The median patient age in the derivation sample was 3.7 years (interquartile range, 1.4-9.5 years). In the validation sample, 185/1350 (13.8%) and 205/1350 (15.3%) were classified as pneumonia by content experts and by diagnosis codes, respectively. Compared to content experts, Naïve Bayes had the highest sensitivity (93.5%) and XGBoost had the highest F1 score (72.4). Compared to a diagnosis code of pneumonia, the highest sensitivity was again with the Naïve Bayes (80.1%), and the highest F1 score was with the support vector machine (53.0%). CONCLUSION: NLP algorithms can accurately identify pediatric pneumonia from radiography reports. Following external validation and implementation into the electronic health record, these algorithms can facilitate clinical decision support and inform large database research.

DOI 10.3389/FDGTH.2023.1104604