In my last blog, I stressed the importance of understanding and defining the problem before moving on to execution.
The steps that follow ‘problem identification’ are setting up the infrastructure, getting access to the relevant data based on the problem and its scope, and then running the detection models to deliver business outcomes. Let us now delve briefly into these stages:
Step 2 – Data Handling
Although this seems like an operational step, it requires a fair bit of attention and effort: the investment made here pays off by avoiding re-work and bottlenecks later in the process. Typically, data could be sourced from anywhere – the ERP, business systems, log data, timesheets, HR records and so on. Some or most of the data will require cleansing, wherein you might need to identify proxies for missing values. At this point, it is advisable to ensure compliance with privacy regulations, tokenize any sensitive information and verify that you conform to jurisdictional requirements, especially when handling personnel data, health records or financial records. In addition, when working with log data – network, physical and other access logs from multiple locations and servers – consistency across data sets should be maintained. For example, even date formats should be checked, as inconsistent formats can lead to skewed results.
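As an illustration, here is a minimal Python sketch of two of the hygiene tasks above – per-source timestamp normalization and tokenization of personnel identifiers. The column names, formats and salt are hypothetical; a production pipeline would manage the salt as a secret:

```python
import hashlib
import pandas as pd

# Hypothetical extracts: two log sources that record timestamps differently.
network_logs = pd.DataFrame({"employee_id": ["E1001", "E1002"],
                             "event_time": ["2015-10-02 09:14", "2015-10-02 17:41"]})
badge_logs = pd.DataFrame({"employee_id": ["E1002", "E1003"],
                           "event_time": ["02/10/2015 09:20", "02/10/2015 18:05"]})

# Parse each source with its own explicit format; silently guessing a single
# format across sources is exactly what produces skewed results.
network_logs["event_time"] = pd.to_datetime(network_logs["event_time"],
                                            format="%Y-%m-%d %H:%M", errors="coerce")
badge_logs["event_time"] = pd.to_datetime(badge_logs["event_time"],
                                          format="%d/%m/%Y %H:%M", errors="coerce")

def tokenize(value: str, salt: str = "replace-with-a-secret-salt") -> str:
    """Irreversibly replace a sensitive identifier with a token."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

# Combine sources and pseudonymize personnel identifiers before analysis.
events = pd.concat([network_logs, badge_logs], ignore_index=True)
events["employee_id"] = events["employee_id"].map(tokenize)
print(events.sort_values("event_time"))
```

Unparseable timestamps become NaT because of errors="coerce", so they can be flagged for review rather than silently corrupting downstream results.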
Step 3 – Model Execution
Execution of the detection algorithms is the core of the process; this step flags anomalous records for investigation. To execute it without hassle, a data scientist should ideally work closely with the business analyst to understand the data problem and iterate across machine learning models to arrive at the best-performing one. What is critical here is a decision on the desired levels of precision and recall, based on the impact of missing a true positive (an anomaly) versus the cost of investigating false positives. We have seen this to be a function of whether the business in question has recently burnt its fingers through an act of fraud and is hence prepared to live with more false positives and an extra investigation burden, as long as it can be assured of catching all frauds – for example, in areas such as data theft.
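To make that trade-off concrete, the sketch below (on synthetic data, with an assumed 95% recall floor) tunes a classifier's decision threshold so recall stays above the floor, then reports the precision – and hence the investigation load – that this choice implies:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labelled historical cases (1 = confirmed anomaly, ~3%).
X, y = make_classification(n_samples=5000, weights=[0.97], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]           # anomaly likelihood per record

precision, recall, thresholds = precision_recall_curve(y_te, scores)

# Business decision: a firm recently hit by fraud may insist on, say, 95% recall
# and accept whatever investigation load (precision) that implies.
target_recall = 0.95
meets_floor = recall[:-1] >= target_recall          # align with `thresholds`
threshold = thresholds[meets_floor][-1]             # strictest cut-off meeting the floor
print(f"threshold={threshold:.3f}  "
      f"precision={precision[:-1][meets_floor][-1]:.2f}  recall>={target_recall}")

flagged = scores >= threshold                       # records sent for investigation
```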
Step 4 – Investigations and Feedback Loop
The proof of the pudding is in the eating: it can be established only through the results of investigating the anomalies shared with the business team. What is important here is to recognize that the feedback from these findings goes a long way in improving the detection models in subsequent iterations, as well as in easing the trade-off between precision and recall.
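A minimal sketch of that loop, assuming a scikit-learn-style model and hypothetical investigator verdicts (1 = confirmed fraud, 0 = false positive):

```python
import numpy as np

def feedback_iteration(model, X_train, y_train, X_flagged, verdicts):
    """Fold investigator verdicts on flagged records back into the
    training set and refit, so the next iteration learns from them."""
    X_new = np.vstack([X_train, X_flagged])
    y_new = np.concatenate([y_train, verdicts])
    return model.fit(X_new, y_new), X_new, y_new
```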
Identifying and fixing process gaps is also critical. For instance, in a workforce-analytics engagement for a large professional services company, the findings established a negative correlation between management presence and incidents of credential sharing. A process fix was then recommended, mandating a minimum threshold of supervisory presence or visits at all locations to deter deviations.
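For intuition, a check of this kind can be as simple as a per-location correlation; the figures below are made up purely for illustration:

```python
import pandas as pd

# Hypothetical per-location aggregates; the real inputs would come from
# HR visit records and investigated incident counts.
sites = pd.DataFrame({
    "supervisor_visits_per_month":  [1, 2, 4, 6, 8, 10],
    "credential_sharing_incidents": [9, 8, 5, 3, 2, 1],
})
r = sites["supervisor_visits_per_month"].corr(sites["credential_sharing_incidents"])
print(f"Pearson r = {r:.2f}")   # strongly negative: more oversight, fewer incidents
```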
To conclude, while a solution built on open-source components is easy to stand up, the program needs executive sponsorship, collaboration between teams and definitive success criteria. Businesses should keep the overall vision in mind, ask the right questions and accept that course correction will be part of the journey to building a robust fraud detection model.
Bhavna Sachar, Senior Manager – Product Marketing, Wipro HOLMES™
Bhavna leads product marketing for Wipro HOLMES Business solutions. She is responsible for crafting compelling messaging and positioning across the solution suite in order to bridge the gap between engineering and sales functions. She also drives sales enablement and works with various teams to create content that differentiates Wipro HOLMES™ in the market. She has 10 years of diverse professional experience in the areas of business, process and organizational consulting, and customer-centricity programs.