A CNN-based Active Learning Framework to Identify Mycobacteria in Digitized Ziehl-Neelsen Stained Human Tissues

Abstract

Tuberculosis is the most common mycobacterial disease that affects humans worldwide. Rapid and reliable diagnosis of mycobacterial infection is crucial to identify infected individuals, to initiate and monitor treatment, and to minimize or prevent transmission. Microscopic identification of acid-fast mycobacteria (AFB) in tissue sections is usually accomplished by examining Ziehl-Neelsen (ZN) stained slides, in which AFB appear bright red against a blue background. Because ZN-stained slides require time-consuming and meticulous screening by an experienced pathologist, our team developed a machine learning pipeline to classify digitized ZN-stained slides as AFB-positive or AFB-negative. The pipeline includes two convolutional neural network (CNN) models to recognize tiles containing AFB, and a logistic regression (LR) model to classify slides based on features from AFB-probability maps assembled from the CNN tile-based classification results. The first CNN was trained using tiles from 6 AFB-positive and 8 AFB-negative slides, and the second CNN was trained using the initial tile set expanded by additional tiles from 19 AFB-negative slides selected within an active learning framework. When evaluated on a separate set of tiles, the two CNNs yielded F1 scores of 99.03% and 98.75%, respectively, and were used to classify tiles in a separate set of 134 slides (46 AFB-positive and 88 AFB-negative). The classification yielded two AFB-probability maps per slide, one for each CNN. The LR model was then 10-fold cross-validated using the average of the feature vectors extracted from the AFB-probability maps generated by each CNN. Each feature vector consisted of seven features of the AFB-probability map histogram and the positive tile rate (PTR).
The sensitivity (87.13%), specificity (87.62%) and F1 score (80.18%) achieved by this model were superior to the baseline performance of PTR-based separation of slides, which yielded F1 scores of 73.13% for the AFB-probability maps produced by the CNN trained within the active learning framework and 66.67% for those produced by the CNN trained only on the initial set of slides. Our CNNs outperformed several recently published models for AFB detection. Active learning induced robust learning of features by the CNN and improved the LR classification of slides. In the 52 AFB-positive slides used in the pipeline development, the AFB were infrequent, predominantly single and only rarely found in small clusters. Our pipeline can classify slides and visualize suspected AFB-positive areas in each slide, and thus potentially facilitate evaluation of ZN-stained tissue sections for AFB.
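The slide-level feature extraction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the seven histogram features or the probability threshold used for the PTR, so this sketch assumes normalized counts over seven equal-width bins and a threshold of 0.5. All function names are hypothetical.

```python
from typing import List, Sequence


def positive_tile_rate(probs: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of tiles whose AFB probability reaches the threshold.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    if not probs:
        return 0.0
    return sum(1 for p in probs if p >= threshold) / len(probs)


def histogram_features(probs: Sequence[float], n_bins: int = 7) -> List[float]:
    """Normalized counts over equal-width bins on [0, 1].

    Stand-in for the paper's seven histogram features, which the
    abstract does not enumerate.
    """
    counts = [0] * n_bins
    for p in probs:
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(probs)
    return [c / total if total else 0.0 for c in counts]


def slide_features(probs: Sequence[float]) -> List[float]:
    """Eight-element vector: seven histogram features followed by the PTR."""
    return histogram_features(probs) + [positive_tile_rate(probs)]


def average_features(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise mean of the feature vectors from the two CNNs' maps."""
    return [(x + y) / 2 for x, y in zip(a, b)]


# Hypothetical tile probabilities for one slide, one list per CNN.
map_cnn1 = [0.1, 0.9, 0.2, 0.8]
map_cnn2 = [0.3, 0.7, 0.1, 0.6]
features = average_features(slide_features(map_cnn1), slide_features(map_cnn2))
```

The averaged vector would then serve as one row of the input to a slide-level classifier such as logistic regression, evaluated with 10-fold cross-validation as the abstract describes.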

Publication
Computerized Medical Imaging and Graphics 2020