The use of artificial intelligence (AI) techniques in day-to-day business activities is slowly but surely becoming ubiquitous. The capacity of these techniques to build mathematical models of complex, intuitive tasks with impressive accuracy has been reported time and again in the media. It is no wonder, then, that according to Adobe1, the proportion of enterprises using AI was expected to double in 2019 compared with 2018.
One of the most common tasks for which AI is used is classification: categorizing inputs such as numbers, images, or text into a set of pre-defined classes. Think of an algorithm distinguishing a cat from a dog in an image, and you have a classifier. There are more consequential, non-trivial use cases as well, such as identifying credit card fraud or ensuring that no sensitive information leaks from an organization. In such tasks, the classifier's job is to identify all cases of potential fraud or data leakage while ensuring that no actual incident slips through.
In AI terms, the task is to ensure the machine catches all true positives, that is, to ensure there are no false negatives. The cost of cases missed by such systems can be significant, with organizations facing regulatory penalties as well as reputational damage. Despite the costs involved, it can be extremely difficult to first reduce, and then completely eliminate, false negatives from classification results.
How to reduce false negatives
The right approach to classification helps deal with such cases. An effective approach uses a cascade of models to selectively reduce false negatives. The initial layer classifies records into positive and negative classes, while the second layer examines only the records labelled negative, looking for any positives hidden among them. In brief, the steps are: (1) the primary layer classifies all records into positive and negative; (2) the records labelled negative are passed to the secondary layer; (3) the secondary layer re-examines these records to surface any hidden positives; and (4) a record is treated as positive if either layer flags it. A sketch of this cascade follows.
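For illustration, here is a minimal sketch of such a cascade in Python with scikit-learn. The synthetic data, the choice of a logistic-regression primary model and a random-forest secondary model, and all parameters are assumptions for the example, not the configuration used in the experiment described below.

```python
# A minimal sketch of a two-layer cascade that targets false negatives.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data: label 1 = sensitive/positive.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

# Layer 1: primary classifier trained on all records.
primary = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Layer 2: secondary classifier trained only on records the primary labelled
# negative, so it learns to separate true negatives from hidden positives.
neg_mask = primary.predict(X_train) == 0
secondary = RandomForestClassifier(random_state=0).fit(X_train[neg_mask],
                                                       y_train[neg_mask])

def cascade_predict(X_new):
    """A record is positive if either layer flags it."""
    pred = primary.predict(X_new)
    neg = pred == 0
    if neg.any():
        pred[neg] = secondary.predict(X_new[neg])
    return pred

pred = cascade_predict(X_test)
fn = ((pred == 0) & (y_test == 1)).sum()
tp = ((pred == 1) & (y_test == 1)).sum()
print("Cascade FNR on the test split:", fn / (fn + tp))
```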
Validation of the multi-layered classification approach
In an experiment with four real-world datasets of emails tagged as alerts for potentially containing sensitive data, the primary classifier performed fairly well on one of the files, with a false negative rate (FNR) of 1.31%. Given the criticality of the decision, however, the false negatives had to be isolated.
The classification approach described above was then applied, using principal component analysis (PCA) for transformation and dimensionality reduction, followed by a support vector classifier (SVC) with a radial basis function (RBF) kernel, to identify the false negatives. This performed very well and reduced the FNR to 0.11%.
The improvement in FNR, as usual, comes at a price. The false positive rate (FPR), which stood at 2.48% after the primary model, increased to 8.76% after the secondary model (see Table 1). This FPR was deemed acceptable.
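For concreteness, a secondary model of the form described above might be assembled as in the sketch below. The scaler, the number of principal components, the class weighting, and the variable names in the commented fit call are illustrative assumptions, not the exact configuration used in the experiment.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Secondary classifier: PCA for transformation and dimensionality reduction,
# followed by an SVC with an RBF kernel.
secondary = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),
    SVC(kernel="rbf", class_weight="balanced"),
)

# Fit only on the records that the primary classifier predicted as negative
# (hypothetical variable names), so the model specializes in separating the
# hidden positives from the true negatives:
# secondary.fit(X_primary_negative, y_primary_negative)
```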
The metrics for all four files are given in Table 1. These were calculated by predicting labels for the whole dataset and comparing them with the original labels, so they represent the combined performance over the training and test sets. Across the files, the mean relative reduction in FNR achieved by this approach is 78.97%.
| File | Dataset size | Post-primary balanced accuracy | Post-primary FNR | Post-primary FPR | Post-secondary balanced accuracy | Post-secondary FNR | Post-secondary FPR |
|--------|--------|--------|--------|--------|--------|--------|--------|
| File 1 | 44593 | 0.9810 | 0.0131 | 0.0248 | 0.9556 | 0.0011 | 0.0876 |
| File 2 | 20000 | 0.9962 | 0.0055 | 0.0020 | 0.9621 | 0.0013 | 0.0746 |
| File 3 | 39430 | 0.9996 | 0.0004 | 0.0004 | 0.9991 | 0.0001 | 0.0017 |
| File 4 | 13313 | 0.9970 | 0.0048 | 0.0012 | 0.9972 | 0.0013 | 0.0044 |
Table 1: Performance metrics before and after the secondary classifier
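For reference, the metrics in Table 1 can be derived from the confusion matrix; a minimal sketch, assuming scikit-learn and arrays of true and predicted labels:

```python
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

def report(y_true, y_pred):
    # For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]].
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    fnr = fn / (fn + tp)   # fraction of actual positives that were missed
    fpr = fp / (fp + tn)   # fraction of actual negatives flagged as positive
    return balanced_accuracy_score(y_true, y_pred), fnr, fpr
```

The 78.97% figure quoted above is the mean of the per-file relative FNR reductions in Table 1, for example (0.0131 − 0.0011) / 0.0131 ≈ 91.6% for File 1.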
This approach to classification, being probabilistic in nature, cannot guarantee complete isolation of false negatives from the final output; it can only reduce them. For complete isolation, a deterministic set of rules still has to be written. As in any such exercise, domain expertise is critical, and domain experts would need to be involved in devising this rule set.
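Purely as an illustration, a deterministic rule layer applied after the cascade might look like the sketch below; the patterns and the helper name are hypothetical, and real rules would come from domain experts.

```python
import re

# Hypothetical post-cascade rules: escalate any record that matches an
# obviously sensitive pattern, regardless of the classifiers' verdict.
RULES = [
    re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),  # card-number-like
    re.compile(r"\bconfidential\b", re.IGNORECASE),
]

def final_label(text: str, cascade_label: int) -> int:
    """Return 1 (positive) if the cascade flagged the record or any rule fires."""
    if cascade_label == 1:
        return 1
    return int(any(rule.search(text) for rule in RULES))
```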
Conclusion
The primary advantage of applying this approach before devising the deterministic rule set is that it can lead to simpler rules that do not increase the false positive rate significantly. Additionally, using multiple classifiers allows each model to be simpler than a single, highly optimized model, which in turn can yield lower variance without a significant reduction in performance.
Reference
1 https://www.adobe.com/insights/15-stats-about-artificial-intelligence.html
Rohit Sardeshpande
AI Lead and Principal Consultant, HOLMES for Business, Wipro
Rohit focuses on applying AI methods and techniques in solving industry and business problems. He works in diverse domains, from banking to healthcare to cyber security, and identifies opportunities and develops products and solutions to build IP in that space. He is an expert in traditional machine learning and deep learning, with a focus on natural language processing.