Pre-processing of data
Before being accepted, each sample is tested to ensure that a cough has been captured at sufficient length and volume to be useful. A virtue of the proposed system is that almost any type of cough sample can be used. There are no strict requirements about how a tester must cough.
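The acceptance test described above can be pictured as a pair of simple checks on duration and loudness. The following is a minimal sketch only: the function names and both thresholds are hypothetical, since the source does not state how length and volume are measured or what limits are used, and it does not attempt to verify that the sound is actually a cough.

```python
import math

# Hypothetical thresholds; the source does not state the values it uses.
MIN_DURATION_S = 0.3   # shortest clip considered long enough
MIN_RMS = 0.02         # quietest clip considered loud enough (samples in [-1, 1])


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_usable(samples, sample_rate):
    """Return True if the clip passes both the length and the volume check."""
    duration_s = len(samples) / sample_rate
    return duration_s >= MIN_DURATION_S and rms(samples) >= MIN_RMS


# Synthetic clips for illustration: a 0.5 s tone, a near-silent clip, a very short clip.
RATE = 16000
loud = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE // 2)]
quiet = [0.001] * (RATE // 2)
short = loud[:1000]

print(is_usable(loud, RATE), is_usable(quiet, RATE), is_usable(short, RATE))
```

Only the first clip passes: the second is too quiet and the third is too short.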
Once accepted, audio samples are used to generate Mel-spectrograms and MFCCs (Mel-frequency cepstral coefficients) for further processing.
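Both features are built on the mel scale, a frequency axis that is roughly linear at low frequencies and logarithmic at high ones. As a rough illustration, the sketch below uses one common conversion formula (the HTK-style 2595 * log10(1 + f / 700)) to place band centers evenly on that scale. The choice of formula, the band count and the frequency range are assumptions. The source does not say which variant or tooling it uses, and in practice an audio library would normally compute these features.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of n_bands bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Illustrative values only: 8 bands between 0 Hz and 8 kHz.
centers = mel_band_centers(8, 0.0, 8000.0)
for c in centers:
    print(f"{c:8.1f} Hz")
```

The printed centers crowd together at low frequencies and spread out at high ones, which is the resolution trade-off the Mel-spectrogram and MFCC inherit.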
Model development and early results
A number of machine learning and deep learning COVID-19 models have been developed using the accepted cough audio samples. Each model examines an audio file to determine whether it carries an audio signature indicating that the tester is COVID-19 positive or negative.
For each model, one portion of the samples was used for training and a separate set was used to validate the COVID-19 model predictions, as shown in the table above.
A set of 1553 validation samples was used. 25 of these were COVID-19 positive; 1528, mostly obtained from public domain pneumonia and normal cough data, were negative. The model's accuracy is 95%, with 2 false negatives (COVID-19 positive samples identified as negative) and 75 false positives (negative samples identified as positive).
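The counts above determine the full confusion matrix, so the headline figure can be checked by hand. The calculation below is derived from those counts rather than quoted from the source. It shows that the 95% figure is consistent with overall accuracy, and it also gives the per-class rates that the same counts imply.

```python
# Counts reported for the validation set.
positives, negatives = 25, 1528
false_negatives, false_positives = 2, 75

true_positives = positives - false_negatives    # 23
true_negatives = negatives - false_positives    # 1453
total = positives + negatives                   # 1553

accuracy = (true_positives + true_negatives) / total
sensitivity = true_positives / positives        # recall on positive samples
specificity = true_negatives / negatives        # recall on negative samples
precision = true_positives / (true_positives + false_positives)

print(f"accuracy    {accuracy:.1%}")     # 95.0%
print(f"sensitivity {sensitivity:.1%}")  # 92.0%
print(f"specificity {specificity:.1%}")  # 95.1%
print(f"precision   {precision:.1%}")    # 23.5%
```

With only 25 positives among 1553 samples, accuracy is dominated by the negative class, so sensitivity and precision are worth reading alongside it.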
In another experiment, coughs were segmented, i.e. one or more cough segments were extracted from each recording. In this setting the COVID-19 model achieved accuracy above 85%, in one case reaching 93% on training and 86% on validation. Each cough segment is passed to the model for inference. Decisions are made by majority voting among the different models, together with the confidence level of each model's prediction.
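The source does not say how cough segments are extracted. One simple possibility, shown purely as an assumption, is to cut the recording wherever the short-term energy stays above a threshold. The function name, frame length, threshold and minimum run length below are all hypothetical.

```python
def segment_by_energy(samples, frame_len, threshold, min_frames=2):
    """Return (start, end) sample indices of runs of loud frames.

    A frame is "loud" when its mean absolute amplitude exceeds threshold.
    Runs shorter than min_frames frames are discarded.
    """
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        loud = sum(abs(s) for s in frame) / frame_len > threshold
        if loud and start is None:
            start = f                      # a loud run begins
        elif not loud and start is not None:
            if f - start >= min_frames:    # a loud run ends; keep it if long enough
                segments.append((start * frame_len, f * frame_len))
            start = None
    if start is not None and n_frames - start >= min_frames:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments


# Synthetic recording: silence, a burst, silence, a second burst, silence.
recording = [0.0] * 100 + [0.5] * 200 + [0.0] * 100 + [0.4] * 300 + [0.0] * 50
print(segment_by_energy(recording, frame_len=50, threshold=0.1))
# [(100, 300), (400, 700)]
```

Each returned span would then be cut out and passed to the model as its own input.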
The ROC curve and improving model accuracy
Evaluating the models with the ROC (Receiver Operating Characteristic) curve, and improving performance based on it, are critical next steps. To enhance performance and reduce misclassification, we are working to establish a confidence threshold using the ROC curve, and to use the number of pure neurons responsible for each class, rather than relying on the model's inference alone.
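For readers less familiar with ROC analysis, the sketch below shows how a threshold can be read off an ROC curve. It computes the true-positive and false-positive rates at each candidate threshold and picks the one that maximises Youden's J (TPR minus FPR). That selection criterion is an assumption, since the source does not say how its threshold is chosen, and the labels and scores are made up for illustration.

```python
def roc_points(labels, scores):
    """Return (threshold, fpr, tpr) tuples, one per distinct score.

    labels are 1 (positive) or 0 (negative); a sample is called positive
    when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((thr, fp / n_neg, tp / n_pos))
    return points


def best_threshold(points):
    """Pick the threshold maximising Youden's J = TPR - FPR (assumed criterion)."""
    return max(points, key=lambda p: p[2] - p[1])[0]


# Made-up labels and model scores, for illustration only.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]

points = roc_points(labels, scores)
threshold = best_threshold(points)
print(threshold)  # 0.8
```

Other criteria are equally plausible, for example fixing a maximum acceptable false-positive rate and taking the most sensitive threshold that satisfies it.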
Additionally, because the number of training samples available for ROC-based model validation is limited, we are pursuing an ensemble model that leverages all the models, which may enhance performance significantly.
Usually, a two-class classifier outputs the class whose probability (which we treat as the confidence level of the inference) is greater than 0.5. In our analysis we observed that most correct classifications have high confidence, while misclassifications tend to have a confidence of around 0.5, though there are exceptions. To reduce misclassification, we define a confidence threshold based on the ROC curve. When the confidence of a classification exceeds that threshold, we accept the class; otherwise, we mark the sample as inconclusive. For some samples, the confidence levels from the different models differ. In such cases, the decision is made by majority voting, which helps to reduce ambiguous decisions.
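The decision rule described above can be sketched as follows. The threshold value, the function names and the handling of ties are assumptions made for illustration. In particular, counting "inconclusive" as a vote like any other is one possible reading, not something the source specifies.

```python
from collections import Counter

THRESHOLD = 0.8  # hypothetical value; the source derives its threshold from the ROC curve


def classify(p_positive, threshold=THRESHOLD):
    """Map one model's probability of "positive" to a label.

    The predicted class is the one with probability above 0.5; it is
    accepted only if its confidence exceeds the threshold.
    """
    label = "positive" if p_positive > 0.5 else "negative"
    confidence = max(p_positive, 1.0 - p_positive)
    return label if confidence > threshold else "inconclusive"


def ensemble_decision(p_positives, threshold=THRESHOLD):
    """Majority vote over per-model labels; a tie for first place is inconclusive."""
    votes = Counter(classify(p, threshold) for p in p_positives)
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "inconclusive"
    return ranked[0][0]


print(classify(0.95))                        # positive
print(classify(0.55))                        # inconclusive
print(ensemble_decision([0.95, 0.90, 0.10])) # positive
print(ensemble_decision([0.95, 0.05]))       # inconclusive
```

Raising the threshold trades coverage for reliability: fewer samples receive a definite label, but those that do are less likely to be misclassified.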
We also analyse the neurons responsible for each class. We take the dense layer and identify which neurons are activated only for COVID-19 positive samples and which are activated only for negative samples. We call these pure neurons. During inference, we check the percentage of pure neurons activated for the given class, along with the confidence score, when making a decision. This also helps us make better inferences.
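One way to make the pure-neuron idea concrete is sketched below. It assumes that a neuron counts as "activated" when its output is above zero (as with a ReLU layer) and that a neuron is pure for a class if it fires for at least one sample of that class and for no sample of the other. Both definitions, and all names and activation values, are assumptions for illustration; the source does not give its exact criteria.

```python
def active_set(activations, eps=0.0):
    """Indices of neurons whose activation is above eps for one sample."""
    return {i for i, a in enumerate(activations) if a > eps}


def pure_neurons(pos_samples, neg_samples):
    """Return (pure_pos, pure_neg): neurons that fire for one class only."""
    ever_pos = set().union(*(active_set(s) for s in pos_samples))
    ever_neg = set().union(*(active_set(s) for s in neg_samples))
    return ever_pos - ever_neg, ever_neg - ever_pos


def pure_fraction(sample, pure):
    """Fraction of a class's pure neurons that fire for this sample."""
    if not pure:
        return 0.0
    return len(active_set(sample) & pure) / len(pure)


# Made-up dense-layer activations (4 neurons per sample).
pos_samples = [[0.9, 0.0, 0.4, 0.0], [0.7, 0.2, 0.0, 0.0]]
neg_samples = [[0.0, 0.3, 0.0, 0.8], [0.0, 0.0, 0.0, 0.5]]
pure_pos, pure_neg = pure_neurons(pos_samples, neg_samples)
print(sorted(pure_pos), sorted(pure_neg))  # [0, 2] [3]

new_sample = [0.6, 0.1, 0.0, 0.0]
print(pure_fraction(new_sample, pure_pos))  # 0.5
print(pure_fraction(new_sample, pure_neg))  # 0.0
```

A high fraction for the predicted class, together with a high confidence score, would support accepting the prediction; a low fraction would be a reason for caution.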
Learnings and gaps
- While the model performs reasonably well on a small validation set, it will need more acoustically defined data to support decisions with high confidence. We plan to obtain at least 300 samples of each category to train the model.
- All positive samples come from subjects who actually tested positive, but negative samples were collected from random healthy subjects who had no test result. Some of these negative subjects may therefore be positive without symptoms. This problem can be addressed by obtaining samples from subjects who have tested negative.
- We do not offer any guidance on how a cough should be recorded, on the assumption that with a large number of samples the differences in recording will not affect the outcome.
- Age- and gender-based analysis may yield better inference, but it is possible only with a good number of samples in each group.
- Along with the model-based approach, we need to identify parameters from the audio and spectrogram of a single cough recording that may differentiate the two types of samples. This will help to establish the correlation between the conventional parametric approach (acoustic analysis of cough) and the model-based approach. Moreover, these parameters can be used as inputs to ML/DL models for decision making.
- When the complete recording was used for inference, the results were inconsistent: multiple samples from the same subject (taken within a short period) produced different outcomes. In some cases this was due to either intra-subject variability in the cough samples or noise in the data. We addressed this by analysing single cough segments instead.
Can society safely unlock and achieve a “new normal”?
Key use cases
We foresee two important use cases.
First, screening for medical interventions. This would be done at hospitals by trained medical staff, for the treatment of infected patients.
Second, screening in public places such as airports, malls, restaurants and beyond, to arrest the spread.