We started working on our anomaly detection suite of solutions two years ago, and the platform has been live at Wipro and with our customers for well over a year now. Let us look back at how the product has evolved since we began. The journey started with Wipro datasets across process domains, uncovering anomalies and fraud in areas such as procure-to-pay, travel and expense claims, payroll processing, intellectual property theft and credentials misuse, to name a few. The results were encouraging: they led to significant effort savings and plugged leakages that had previously gone unnoticed. Below are some lessons learnt along the way.
Challenges in data gathering
The first and biggest challenge in developing a robust Machine Learning offering is the availability of real data to experiment with. As we expected, even with enterprise data available within the organization, we still had to clear jurisdictional regulations, identify the right datasets, and have logical access to them enabled. Along the way we learnt to deal with messy data, to handle data with quality issues, and to work with proxy data where the real data was not available.
Moving from data to insight
With the business problem defined and the datasets identified, the next step was to understand the nuances of the data and build the business logic and detection models that find the anomalies. Data handling consumes substantial effort, from building mechanisms to ingest the data to cleansing it, transforming it and tokenizing sensitive information. The core of the engine is the detection layer, which combines business rules with machine learning algorithms. For each fraud type, our data scientists worked closely with domain specialists, iterating over Machine Learning models, features and parameters to arrive at the best-performing combination and produce meaningful insights. Multiple models, including Logistic Regression, Modified Multi-Variate Gaussian, Boosting, and Bagging using Random Forest, were built and tested for their performance on various datasets and scenarios.
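To make the idea of a detection layer that combines business rules with a statistical model more concrete, here is a minimal sketch. It is not the production engine described above: the field names (`employee_id`, `amount`, `days_to_submit`), the approval limit, the threshold and the key are all hypothetical, and the scoring uses independent per-feature Gaussians as a simplified stand-in for a multivariate Gaussian model. Sensitive identifiers are tokenized with a keyed hash before they appear in the output.

```python
import hashlib
import hmac
import math
from statistics import mean, pstdev

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key, for illustration only


def tokenize(value, key=SECRET_KEY):
    """Replace a sensitive value with a keyed, deterministic token (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def fit_gaussian(rows):
    """Estimate a mean and standard deviation per feature from historical rows."""
    columns = list(zip(*rows))
    # Fall back to 1.0 if a feature is constant, to avoid dividing by zero.
    return [(mean(c), pstdev(c) or 1.0) for c in columns]


def anomaly_score(row, params):
    """Negative log-density under independent Gaussians; higher means more unusual."""
    score = 0.0
    for x, (mu, sigma) in zip(row, params):
        score += 0.5 * ((x - mu) / sigma) ** 2
        score += math.log(sigma * math.sqrt(2 * math.pi))
    return score


def rule_over_limit(claim, limit=1000.0):
    """Hypothetical business rule: a single claim above a fixed approval limit."""
    return claim["amount"] > limit


def detect(claims, history, threshold):
    """Flag a claim if the rule fires or the statistical score exceeds the threshold."""
    params = fit_gaussian(history)
    flagged = []
    for claim in claims:
        score = anomaly_score((claim["amount"], claim["days_to_submit"]), params)
        reasons = []
        if rule_over_limit(claim):
            reasons.append("rule:over_limit")
        if score > threshold:
            reasons.append("model:low_likelihood")
        if reasons:
            flagged.append(
                {"employee": tokenize(claim["employee_id"]), "reasons": reasons}
            )
    return flagged


# Made-up historical (amount, days_to_submit) pairs and new claims to score.
history = [(120.0, 3), (95.0, 5), (110.0, 4), (130.0, 2), (105.0, 6), (115.0, 4)]
claims = [
    {"employee_id": "E1001", "amount": 112.0, "days_to_submit": 4},
    {"employee_id": "E1002", "amount": 1500.0, "days_to_submit": 4},
    {"employee_id": "E1003", "amount": 118.0, "days_to_submit": 45},
]
flagged = detect(claims, history, threshold=10.0)
for item in flagged:
    print(item)
```

Keeping the rule and the model as separate signals, each recorded as a reason, lets a reviewer see why a record was flagged. In practice the threshold and the choice of model would be tuned per fraud type against labelled or reviewed cases.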