This post is part of our Galvanize Capstone featured projects. This post was written by Sanhita Joshi and posted here with their permission.
The goal of this project is to detect signs of financial fraud in expense data collected from individuals whose cognitive abilities are compromised and who have someone else managing their finances. The data used is a test set from the State of Minnesota, provided by Michael Curran of Guide Change. Minnesota is currently the only state that collects this information digitally. The intent of the analysis is to learn patterns in elderly fraud and alert authorities to how these frauds happen.
Detecting Elderly Fraud Now
Previously, to detect elderly fraud, the state government used red flags such as a charitable donation over $100 or a single cash transaction over a certain amount. This approach handles such a large and varied dataset poorly, and it raises too many red flags. The outdated process wastes resources such as accountants’ time and, in turn, taxpayers’ money.
The other difficult part of this system is that cases where the person has a monthly income (or estate) of less than $3,000 are not investigated. Therefore, it is important that our analysis also finds outliers within low-income households.
Although some of these cases might have been investigated in detail, the dataset contains no information about that. Hence, the challenging part of the project is that the data is unlabeled, which makes this an unsupervised learning problem.
Improving Elderly Fraud Detection: My Algorithm
For this project, I designed an algorithm to read all the expense transactions of these individuals and find outliers. I made a two-dimensional plot for each expense category: the X-axis is the normalized expense in the category and the Y-axis is the number of transactions per day. Each point represents one individual. Because each plot shows outliers, I designed an algorithm to quantify them:
- For each category, the distance between each pair of points is calculated.
- For each point, the number of neighbors within the median of those distances is counted.
- Each individual is ranked by the number of neighbors they have.
- The ranks from all categories are aggregated to quantify outliers.
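The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the use of Euclidean distance, the tie handling in the ranking, and the choice to aggregate by summing ranks are all assumptions made for the example, and the sample data is made up.

```python
import numpy as np

def neighbor_counts(points):
    """For each point, count the other points within the median pairwise distance.

    `points` is an (n_individuals, 2) array: normalized expense and
    transactions per day. Euclidean distance is an assumption.
    """
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    # Median over each distinct pair (upper triangle, excluding the diagonal).
    upper = np.triu_indices(len(points), k=1)
    radius = np.median(dist[upper])
    within = dist <= radius
    np.fill_diagonal(within, False)  # a point is not its own neighbor
    return within.sum(axis=1)

def rank_by_isolation(counts):
    """Rank individuals so that rank 0 has the fewest neighbors.

    Ties are broken by position in the array (an assumption).
    """
    order = np.argsort(counts, kind="stable")
    ranks = np.empty(len(counts), dtype=int)
    ranks[order] = np.arange(len(counts))
    return ranks

def aggregate_ranks(category_points):
    """Sum the per-category ranks; a lower total means more outlying.

    Summation is an assumed aggregation rule. `category_points` maps a
    category name to an (n_individuals, 2) array, same row order throughout.
    """
    total = None
    for pts in category_points.values():
        ranks = rank_by_isolation(neighbor_counts(pts))
        total = ranks if total is None else total + ranks
    return total

# Made-up data: five individuals, two hypothetical expense categories.
example = {
    "groceries": [[0.10, 1.0], [0.12, 1.1], [0.11, 0.9], [0.13, 1.0], [0.90, 5.0]],
    "dining": [[0.20, 0.5], [0.22, 0.6], [0.21, 0.4], [0.23, 0.5], [0.95, 4.0]],
}
scores = aggregate_ranks(example)
most_outlying = int(np.argmin(scores))
```

In this sketch, the individual with the lowest aggregated score has the fewest neighbors across categories and would be the first candidate for review. Because the distance mixes two axes, the result depends on how each axis is scaled.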
The figure below shows the aggregated plot from this algorithm. Half the points are blue, or ‘normal,’ and the other half are red, or potential outliers.
Reading the Plot and Why This Is a Powerful Algorithm
1. From the individual expense category plots, limits on each spending category can be determined. Currently, the red flag for each individual expense category is set arbitrarily.
2. This algorithm helps find frequent yet low-dollar-value transactions; clever fraud might otherwise stay buried among high-dollar-value transactions.
3. The boundary between blue and red points is not very sharp; this implies that the algorithm can detect potential fraud that wouldn’t be noticed with simplistic red flags like those described above.
Now that we have a picture of regular spending patterns, it can be used:
- to inform authorities about outliers that may point to fraud
- to help the elderly budget better
The idea with this project is to identify people who may have been defrauded and find patterns from their data to help the vulnerable elderly population. Going forward, this algorithm can be used to do just that.
For more detailed information, please visit the project’s GitHub.