Fraud detection is one of the most challenging prediction problems for a machine learning system. At its heart, a machine learning fraud detector is a complex function that maps a set of attributes (IP address, address match, name match, typing speed, and device, among others) to a fraud outcome (such as identity theft). This function is learned from historical training examples of attribute-outcome pairs. Training accurate fraud detection models requires a large number of known fraud outcomes. In practice, no business wants to collect fraud outcomes by leaving its systems vulnerable to such attacks, which is where humans must step in. Human experts trained to detect and confirm fraud provide the machine with valuable outcome labels. In applications such as click prediction, the outcome variable (e.g., a click) is recorded automatically. Fraud is different: confirming whether a transaction or an application was truly fraudulent often requires human intervention. Beyond this fundamental requirement, other factors make it imperative for humans to be closely involved in machine learning based fraud systems.
Machines can only predict future behavior that’s representative of the past
The effectiveness of a machine learning system depends on how well the model generalizes, that is, how well it performs on previously unseen inputs. Generalization in turn depends on how well the training sample represents the unseen instances the model acts on. Unfortunately, the fraud game is inherently adversarial, so the problem isn’t stationary. As the system gets better at stopping old fraud strategies by learning from historical examples, fraudsters develop novel attack vectors to beat it. This severely limits the generalizability and shelf life of a fraud detection model, so people must constantly monitor its actions and performance.
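One simple way to watch for this decay is to track, week by week, what fraction of human-confirmed fraud the model actually flagged (its recall). The sketch below makes several assumptions. It uses a hypothetical `ReviewedCase` record holding a week index, the model's decision, and the human-confirmed label. The baseline and tolerance values are illustrative only. Recall can only be measured on fraud that humans have eventually confirmed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ReviewedCase:
    # Hypothetical record: one case whose outcome a human has confirmed.
    week: int               # week in which the case was decided
    model_flagged: bool     # whether the model flagged the case as fraud
    confirmed_fraud: bool   # human-confirmed outcome label


def weekly_recall(cases: Iterable[ReviewedCase]) -> Dict[int, float]:
    """Per week, the fraction of confirmed fraud cases that the model flagged."""
    total: Dict[int, int] = {}
    caught: Dict[int, int] = {}
    for case in cases:
        if not case.confirmed_fraud:
            continue  # recall only considers confirmed fraud
        total[case.week] = total.get(case.week, 0) + 1
        if case.model_flagged:
            caught[case.week] = caught.get(case.week, 0) + 1
    return {week: caught.get(week, 0) / n for week, n in sorted(total.items())}


def degraded_weeks(recall: Dict[int, float], baseline: float,
                   tolerance: float = 0.1) -> List[int]:
    """Weeks whose recall fell more than `tolerance` below the baseline."""
    return [week for week, r in recall.items() if r < baseline - tolerance]


cases = [
    ReviewedCase(week=1, model_flagged=True, confirmed_fraud=True),
    ReviewedCase(week=1, model_flagged=True, confirmed_fraud=True),
    ReviewedCase(week=1, model_flagged=False, confirmed_fraud=False),
    ReviewedCase(week=2, model_flagged=True, confirmed_fraud=True),
    ReviewedCase(week=2, model_flagged=False, confirmed_fraud=True),  # a new attack slips through
]
recall = weekly_recall(cases)
print(recall)                                # {1: 1.0, 2: 0.5}
print(degraded_weeks(recall, baseline=0.9))  # [2]
```

A flagged week is a prompt for a person to investigate, not an automatic retraining trigger. The drop may reflect a new attack vector or simply a small sample.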
Humans are aware of context and capable of logical reasoning
Machines cannot incorporate new information well; they look only at what they were trained to look at. For example, if there is a security breach at a large phone service or email provider, a human can take that knowledge into account while reviewing cases. The machine, however, cannot spontaneously update itself to react appropriately to the new information.
Often, such new information involves something that didn’t even exist when the model was trained, such as a vulnerable update to an existing piece of software. In practice, when this happens, humans design and deploy stop-gap rules to protect the newly discovered blind spot. These rules relieve pressure on the fraud detection system while the model is updated to reflect the changed situation.
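A stop-gap rule layer can be sketched as a list of named predicates checked before the model score is consulted. This is a minimal illustration under assumptions, not a prescribed design. The `app_version` attribute, the vulnerable version `"4.2.0"`, the 0.5 threshold, and the decision labels are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, str]
# A stop-gap rule returns True when a case must go to manual review regardless of the model.
Rule = Callable[[Case], bool]

# Hypothetical example: version "4.2.0" of the applicant's app was just found to be exploitable.
STOP_GAP_RULES: List[Tuple[str, Rule]] = [
    ("vulnerable_app_version", lambda case: case.get("app_version") == "4.2.0"),
]


def decide(case: Case, model_score: float, threshold: float = 0.5,
           rules: List[Tuple[str, Rule]] = STOP_GAP_RULES) -> Tuple[str, List[str]]:
    """Apply stop-gap rules first; fall back to the model score if none fire."""
    fired = [name for name, rule in rules if rule(case)]
    if fired:
        return "manual_review", fired
    return ("manual_review" if model_score >= threshold else "approve"), []


print(decide({"app_version": "4.2.0"}, model_score=0.1))  # ('manual_review', ['vulnerable_app_version'])
print(decide({"app_version": "4.1.9"}, model_score=0.1))  # ('approve', [])
```

Returning the names of the rules that fired gives reviewers a record of why a case bypassed the model. It also makes it easy to retire each rule once the retrained model covers the blind spot.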
Humans can also try out complex approaches as they review cases. It is common for a reviewer to contact the applicant to confirm whether fraud occurred. During the conversation, the human expert might do a variety of things. They could (a) ask the applicant questions about their past, (b) ask for their Social Security number, or (c) ask them to complete a set of tasks sent by email to confirm their identity. Machines, on the other hand, cannot carry out such detailed, wide-ranging approaches, which require acting on responses or feedback that cannot be predicted in advance.
An important consideration in fraud detection systems is how different cases link with one another. Most fraud detection systems build an underlying graph that connects cases through the attributes they share. The hypothesis is that a fraudster tries to get past the fraud system by making repeated attacks under different identities. However, all such attacks share some constant attribute, such as the device used to carry out the attack or the requested shipping address. The graph connects these otherwise disparate applications and identifies them as a single attack. Since the graph tracks only a subset of all possible attributes at any point in time, it can miss connections simply because the relevant attribute isn’t represented on it. A human has no such constraint and can improvise with attributes not available to the pre-designed graph. In other words, humans are free to reason about anything, not just a predetermined set of attributes.
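The sketch below illustrates the linking idea and its blind spot under a few assumptions. Cases are dicts of attribute values, and two cases are connected when they share a value of any tracked attribute. Rather than storing the graph explicitly, it computes the graph's connected components with a union-find structure. The attribute names and case data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def link_cases(cases: Dict[str, Dict[str, str]], attributes: List[str]) -> List[List[str]]:
    """Group case IDs into connected components of cases sharing any tracked attribute value."""
    parent = {case_id: case_id for case_id in cases}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen: Dict[tuple, str] = {}  # (attribute, value) -> first case that had it
    for case_id, attrs in cases.items():
        for attr in attributes:  # only tracked attributes create edges
            value = attrs.get(attr)
            if value is None:
                continue
            key = (attr, value)
            if key in first_seen:
                union(first_seen[key], case_id)
            else:
                first_seen[key] = case_id

    groups: Dict[str, List[str]] = defaultdict(list)
    for case_id in cases:
        groups[find(case_id)].append(case_id)
    return sorted(sorted(group) for group in groups.values())


applications = {
    "a1": {"name": "Ann", "device": "d1", "ship_to": "12 Elm St"},
    "a2": {"name": "Bob", "device": "d1", "ship_to": "9 Oak Ave"},
    "a3": {"name": "Cal", "device": "d7", "ship_to": "9 Oak Ave"},
    "a4": {"name": "Dee", "device": "d9", "ship_to": "5 Pine Rd"},
}
print(link_cases(applications, ["device", "ship_to"]))  # [['a1', 'a2', 'a3'], ['a4']]
print(link_cases(applications, ["device"]))             # [['a1', 'a2'], ['a3'], ['a4']]
```

The second call shows the limitation described above. When the shipping address is not tracked, `a3` is no longer linked to the others, even though a human looking at the raw applications could spot the shared address.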
Humans are better at annotating outliers and inliers
Machines can easily detect whether a case is an outlier (i.e., different from the instances the model was trained on). However, they cannot easily predict its outcome label, because the model was never trained on such an instance. Such outliers require human expertise and intuition. Mere extrapolation is seldom the right approach, and it is generally all a machine can do.
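As a minimal sketch of "detect the outlier, let a human label it", the code below flags any case with a feature more than a few standard deviations from its training mean and routes it to a person instead of the model. The per-feature z-score is a deliberately simple stand-in for whatever outlier detector a real system uses. The feature names and the limit of 4.0 are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

Row = Dict[str, float]


def fit_feature_stats(training_rows: List[Row]) -> Dict[str, Tuple[float, float]]:
    """Mean and population standard deviation of each feature in the training sample."""
    stats = {}
    for feature in training_rows[0]:
        values = [row[feature] for row in training_rows]
        stats[feature] = (statistics.mean(values), statistics.pstdev(values))
    return stats


def is_outlier(row: Row, stats: Dict[str, Tuple[float, float]], z_limit: float = 4.0) -> bool:
    """True if any feature lies more than `z_limit` standard deviations from its training mean."""
    for feature, (mean, std) in stats.items():
        if std == 0:
            if row[feature] != mean:  # constant in training, different now
                return True
            continue
        if abs(row[feature] - mean) / std > z_limit:
            return True
    return False


def route(row: Row, stats: Dict[str, Tuple[float, float]], z_limit: float = 4.0) -> str:
    """Send outliers to a human; let the model decide cases it has seen the like of."""
    return "human_review" if is_outlier(row, stats, z_limit) else "model_decision"


training = [
    {"typing_speed_wpm": 40, "account_age_days": 100},
    {"typing_speed_wpm": 50, "account_age_days": 200},
    {"typing_speed_wpm": 60, "account_age_days": 300},
]
stats = fit_feature_stats(training)
print(route({"typing_speed_wpm": 55, "account_age_days": 250}, stats))   # model_decision
print(route({"typing_speed_wpm": 500, "account_age_days": 250}, stats))  # human_review
```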
Similarly, when a certain attribute or instance is an inlier (i.e., it is observed far more often than it appears in the training sample), it could suggest an outcome different from what the model was trained on. However, the model cannot confirm this, because doing so requires context that only a human can take into account. The human expert can help determine whether the inlier results from a data issue (for example, all IPs incorrectly recorded as the same value) or from a fraudster repeatedly attacking the system from the same IP.
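The machine's part of this job, surfacing the surge, can be sketched by comparing each value's share of recent traffic with its share of the training sample. The judgment about what the surge means stays with a human. This is an illustrative sketch only. The ratio of 5.0, the minimum count of 20, and the pseudo-count of one for values unseen in training are all assumptions.

```python
from collections import Counter
from typing import List, Tuple


def surging_values(training_values: List[str], recent_values: List[str],
                   ratio: float = 5.0, min_count: int = 20) -> List[Tuple[str, int]]:
    """Values whose share of recent traffic is at least `ratio` times their training share."""
    train_counts = Counter(training_values)
    recent_counts = Counter(recent_values)
    n_train, n_recent = len(training_values), len(recent_values)
    flagged = []
    for value, count in recent_counts.items():
        if count < min_count:
            continue  # too rare to be worth a reviewer's time
        recent_share = count / n_recent
        # Pseudo-count of one so values never seen in training get a small, nonzero share.
        train_share = (train_counts[value] + 1) / (n_train + 1)
        if recent_share / train_share >= ratio:
            flagged.append((value, count))
    return sorted(flagged, key=lambda item: -item[1])  # biggest surges first


training_ips = ["10.0.0.%d" % i for i in range(100)]  # hypothetical: each IP seen once
recent_ips = ["10.0.0.1"] * 30 + ["10.0.0.%d" % i for i in range(2, 72)]
print(surging_values(training_ips, recent_ips))  # [('10.0.0.1', 30)]
```

The output is a queue for a reviewer, who then decides whether `10.0.0.1` is a logging bug or a repeat attacker.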
Machines can help address scale and manage human intervention through active learning
As a business scales, it attracts more frequent and novel attacks from fraudsters. It is hard for a rapidly growing, internet-scale business to hire and train enough human reviewers to keep up. Machines, however, can rank cases and, given the review capacity available, present only the most dubious ones for manual review. There are different strategies for choosing which cases to send to manual review, but most boil down to two goals: (1) refer complex cases where the model is uncertain about its decision, and (2) obtain the outcome labels that will most improve the next generation of the model.
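One common way to serve both goals is uncertainty sampling, a standard active learning strategy, though not necessarily the one any particular fraud team uses. The idea is to spend the review budget on the cases whose predicted fraud probability is closest to 0.5. These are the cases the model is least sure about, and their labels tend to be informative for retraining. The sketch assumes the model outputs calibrated probabilities and that the daily budget is a fixed number of cases. Case IDs and scores are hypothetical.

```python
import heapq
from typing import List, Tuple


def select_for_review(scored_cases: List[Tuple[str, float]], budget: int) -> List[str]:
    """Pick the `budget` cases whose fraud probability is closest to 0.5 (uncertainty sampling)."""
    if budget <= 0:
        return []
    most_uncertain = heapq.nsmallest(budget, scored_cases,
                                     key=lambda case: abs(case[1] - 0.5))
    return [case_id for case_id, _ in most_uncertain]


# Hypothetical (case_id, predicted fraud probability) pairs.
scored = [("t1", 0.02), ("t2", 0.48), ("t3", 0.97), ("t4", 0.61), ("t5", 0.35)]
print(select_for_review(scored, budget=2))  # ['t2', 't4']
```

Confident cases such as `t1` and `t3` are left to the model. This is how a fixed team of reviewers can keep pace as volume grows, since only the uncertain slice of traffic reaches them.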