Using analytics to predict fraud

Irish Tax and Customs tax officer describes how data mining fits in

Today, governments and their public sector agencies everywhere are under pressure to perform more efficiently and effectively; essentially, to do better with less. Tax and customs authorities are no exception. Many are dealing with decreased resources and ever-increasing risks, often in difficult economic circumstances.

Traditional methods for addressing risk have served many authorities well, but there is now a need to use more advanced methods to combat fraud, error and waste. To arm themselves for this battle, more and more tax and customs authorities have turned to data mining and analytics to improve their business processes, resulting in better compliance with new rules and regulations and better customer service.

Ultimately, it is the taxpayers and citizens who will benefit the most if the public sector adopts data mining as part of its day-to-day business. So if analytics can help to reduce fraud, error and waste, then the taxpayers deserve nothing less.

Duncan Cleary
Senior Statistician in Revenue for Irish Tax and Customs

So where does data mining fit into the risk analysis toolkit of a tax authority? Business rules aimed at detecting risk – and the intelligence gathered from differing channels – have their place and can be effective. Add data mining to the mix and you've got a powerful combination to prevent and detect fraud and error.

Data mining can be defined as the application of the scientific method, including statistical analyses, to large amounts of data to uncover valuable information from that data. It can often detect patterns in data that cannot be recognized manually, as well as make predictive estimates of outcomes of interest, such as the likelihood of a tax return containing errors.

Broadly speaking, there are three types of data mining that can be used to combat fraud:

Supervised techniques (also known as predictive analytics), where a target is predicted.
Semi-supervised techniques, where some business knowledge can direct the analyses.
Unsupervised techniques, such as segmentation, which are exploratory.

Fighting fraud with predictive analytics

Predictive analytics builds a model from a specific set of data that contains known outcomes for a particular target. This target could be the likelihood of a yield if a case is audited, the likely amount of that yield, the likelihood of a business failing, the likelihood of a claim for benefits or refunds being fraudulent, and so on. Models perform better where the target has been clearly defined. Techniques for creating predictive models are numerous, but it is often hard to beat the well-established warhorses: logistic regressions, decision trees and neural networks.

The real power of predictive models comes from their ability to score new cases against some target of interest, even if these cases or events have never been previously evaluated. Cases can be ranked in descending order of priority and worked according to resources and the severity of the risk. Feedback is critical to evaluating model performance, and improving models is an iterative and cyclical process. Feeding information back into the model will, over time, help reduce both the number of false positives (false alarms) and the number of actual bad cases escaping attention (false negatives).
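The score-and-rank idea can be sketched in a few lines. This is a minimal illustration only, assuming a logistic regression fitted by plain gradient descent on a handful of made-up audited cases; the two features (late filings and an expense-to-turnover ratio), the case labels and all values are hypothetical and do not come from any tax authority's data or models.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    """Estimated probability of the target; weights[0] is the intercept."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], features))
    return sigmoid(z)

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit a logistic regression by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            err = predict(weights, features) - label
            grad[0] += err
            for j, v in enumerate(features):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Hypothetical audited cases: (late filings in past 3 years, expense-to-turnover ratio)
history = [(0, 0.2), (1, 0.3), (0, 0.4), (3, 0.8), (4, 0.7), (2, 0.9), (0, 0.1), (3, 0.6)]
yielded = [0, 0, 0, 1, 1, 1, 0, 1]  # 1 = the audit produced a yield

weights = train_logistic(history, yielded)

# New cases that have never been audited are scored, then ranked highest risk first.
new_cases = {"A": (0, 0.2), "B": (4, 0.9), "C": (2, 0.5)}
scores = {case_id: predict(weights, feats) for case_id, feats in new_cases.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In practice the audit outcomes of the cases worked from such a list would be appended to the training data and the model refitted, which is the feedback loop described above.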

Many tax agencies are now using these predictive techniques, in conjunction with their other tools such as business rules and intelligence, to prevent and detect fraud and error. Some have even deployed these techniques in real time in their live transactional systems, including the Irish Tax and Customs authority.

Exploring fraud with unsupervised techniques

Unsupervised techniques can be a powerful means of understanding your case base. Often there is so much data available that it is difficult to understand the underlying structure of the population without using such methods as cluster analysis and segmentation. Cluster analysis is especially useful when no target is available: it can help identify groups in the population whose members are alike within the group but different from members of other groups.

Once segmented, cases can be assigned a group membership. This label can be used to determine treatment strategies, identify service channel options and even monitor the effectiveness of a tax authority's efforts to change taxpayer behavior over time.
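As a rough illustration of segmentation, the sketch below runs a bare-bones k-means on six made-up cases. It assumes two features already scaled to the range 0 to 1 and a fixed choice of starting centroids so that the result is repeatable; the data and the choice of k-means itself are illustrative assumptions, not a description of any authority's actual segmentation.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centroids, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave a centroid where it is if no point was assigned to it
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical cases: (scaled turnover, scaled number of employees)
cases = [(0.10, 0.20), (0.15, 0.10), (0.20, 0.25),
         (0.80, 0.90), (0.85, 0.80), (0.90, 0.95)]

labels, centroids = kmeans(cases, initial_centroids=[cases[0], cases[-1]])
print(labels)  # one segment label per case
```

The resulting label is the group membership referred to above; it can be stored against each case and used to choose a treatment strategy or service channel.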

Combining methods: semi-supervised techniques

Additional insight can be gained from overlaying outputs from unsupervised techniques with supervised techniques and vice versa. Some segments might emerge as inherently more risky. Semi-supervised techniques are therefore often useful where some business knowledge can be applied, even when only minimal training data is available.
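One simple way to picture this overlay is to take segment labels from an unsupervised step and combine them with the few audit outcomes that exist. The sketch below is an illustration under stated assumptions rather than a formal semi-supervised algorithm: the case identifiers, segment names and outcomes are all hypothetical.

```python
from collections import defaultdict

def segment_risk(segments, audited):
    """Observed yield rate per segment.

    segments: case_id -> segment label (from an unsupervised step)
    audited:  case_id -> 1 if the audit produced a yield, else 0
    """
    totals = defaultdict(lambda: [0, 0])  # segment -> [yielding audits, all audits]
    for case_id, outcome in audited.items():
        seg = segments[case_id]
        totals[seg][0] += outcome
        totals[seg][1] += 1
    return {seg: hits / n for seg, (hits, n) in totals.items()}

# Hypothetical segment memberships for eight cases, only five of which were audited.
segments = {"c1": "S1", "c2": "S1", "c3": "S1", "c4": "S1",
            "c5": "S2", "c6": "S2", "c7": "S2", "c8": "S2"}
audited = {"c1": 1, "c2": 1, "c3": 0, "c5": 0, "c6": 0}

rates = segment_risk(segments, audited)

# Unaudited cases inherit their segment's observed rate as a rough prior.
priors = {cid: rates[seg] for cid, seg in segments.items() if cid not in audited}
print(priors)
```

With so few audited cases the rates are very noisy, so they are better read as a prompt for business review than as a score in their own right.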

Outlier detection, where anomalies are identified and investigated, also can be an important weapon in the fight against fraud. With any population, there will always be anomalous cases, many of which will be perfectly legitimate. However, some cases may point to fraud, and these can be identified and investigated.
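A minimal sketch of outlier flagging follows. It assumes a robust rule based on the median absolute deviation (MAD) with the conventional cut-off of 3.5 for the modified z-score; the claim identifiers and amounts are made up, and the choice of rule is an illustrative assumption rather than a method named in the article.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return {case_id: modified z-score} for cases far from the median.

    values: case_id -> amount. The modified z-score is 0.6745 * (x - median) / MAD.
    """
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        return {}  # no spread around the median, so the score is undefined
    scores = {k: 0.6745 * (v - center) / mad for k, v in values.items()}
    return {k: s for k, s in scores.items() if abs(s) > threshold}

# Hypothetical refund claims, in euro
claims = {"R1": 120, "R2": 135, "R3": 128, "R4": 140,
          "R5": 131, "R6": 125, "R7": 980, "R8": 133}

flagged = mad_outliers(claims)
print(sorted(flagged))
```

A flag here only marks a case as unusual. As noted above, many anomalous cases are perfectly legitimate, so a flagged case is a candidate for investigation, not a finding of fraud.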

The network view of the case base is also becoming increasingly important in detecting fraud and error: group risk, and the propagation of risk through a network of connected entities, are becoming easier to detect using techniques such as social network analysis. Unstructured data – including text, voice, image and spatial data – has just begun to be used for fraud detection on a large scale, and its importance and usefulness will undoubtedly grow in the future. Because tax authorities are in the unusual position of holding population data, not just sample data, there are few limits to how data mining techniques can be used to improve their performance.
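Risk propagation through connected entities can be pictured with a small graph. The sketch below assumes a simple heuristic in which risk halves with every hop away from a known high-risk entity; the entities, links and decay factor are hypothetical, and this is not a specific social network analysis algorithm.

```python
from collections import defaultdict, deque

def propagate_risk(edges, seeds, decay=0.5):
    """Spread risk outward from seed entities, multiplying by `decay` per hop.

    edges: iterable of (entity, entity) links, treated as undirected
    seeds: entity -> known risk score between 0 and 1
    Each entity ends up with the highest risk that reaches it along any path.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    risk = {node: 0.0 for node in neighbours}
    risk.update(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        passed_on = risk[node] * decay
        for other in neighbours[node]:
            if passed_on > risk[other]:  # only ever raise a neighbour's risk
                risk[other] = passed_on
                queue.append(other)
    return risk

# Hypothetical links between companies, directors and agents
links = [("Co1", "DirX"), ("DirX", "Co2"), ("Co2", "AgentY"), ("Co3", "DirZ")]

risk = propagate_risk(links, seeds={"Co1": 1.0})
print(risk)
```

Entities with no path to a seed, such as Co3 and DirZ here, keep a risk of zero, which is what makes the group view different from scoring each case in isolation.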

Making analytics a core part of your processes

So what is there to prevent the use of data mining techniques in a tax authority? There are many potential obstacles: lack of good quality data, data that has not been integrated or merged, lack of skilled resources, lack of senior-level sponsorship, IT challenges and cultural challenges.

Do not let these – or other issues – stop you from using advanced analytics to detect fraud, or stop you from using data mining to improve an agency's performance. Starting with a small achievable project with a clearly defined goal can often be the first step on the path to success. The results do not need to be spectacular, but if they show how data mining can add value and potentially reduce fraud and error, then the case will be made and analytics can start to become a core part of the agency's business processes.

Bio: Duncan Cleary is a Senior Statistician in Revenue for Irish Tax and Customs. He specializes in the use of research and analytics methodologies and their application in the Irish Tax and Customs Authority, including the use of predictive analytics, customer segmentation, risk analyses, large scale surveys, evidence-based decision support, social network analysis and real-time risk.

Challenge

Like many tax agencies, the Irish Tax and Customs Authority needed an affordable solution to predict and prevent fraud.

Solution

SAS® Fraud and Improper Payments

Benefits

The Irish Tax and Customs Authority used SAS along with traditional fraud detection methods to reduce fraud and ultimately reduce costs to the Irish taxpayer.

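The results illustrated in this article are specific to the particular situations, business models, data input and computing environments described herein. Each SAS customer's experience is unique and depends on business and technical variables, so all statements must be considered non-typical. Actual savings, results and performance characteristics will vary depending on individual customer configurations and conditions. SAS does not guarantee or represent that every customer will achieve similar results. The only warranties for SAS products and services are those set forth in the express warranty statements in the written agreement for such products and services. Nothing herein should be construed as constituting an additional warranty. Customers have shared their successes with SAS as part of an agreed-upon contractual exchange or project success summary following a successful implementation of SAS software. Brand and product names are trademarks of their respective companies.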