In fact, some scientists have already warned about the overuse of machine learning, with some coining the term “AI solutionism” for the (flawed) view that AI will be able to solve all existing problems in their entirety.
Tech business is no different. Expectations surrounding machine learning are still sky high. It is no mistake - machine learning is definitely an extremely powerful tool. But there are limitations to its application. In some cases, the old school rule-based or manual review approach will deliver better results with less financial burden.
Statistical limitations in machine learning
Before we get to the specific applications where machine learning is best aided by other approaches, we should understand that there are certain statistical limitations hidden within the process.
One of the most important limitations of machine learning in business is concept drift. Machine learning models are stochastic: all results are predictions with some randomness, not purely deterministic outcomes. They also learn from historical data. As the underlying data changes, a model trained on older data cannot keep up on its own, causing less accurate predictions over time.
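One common way to notice concept drift is to keep scoring the model on recent, labeled examples and to compare that rolling accuracy with the accuracy measured at training time. The snippet below is a minimal sketch of that idea, not a definitive implementation: the class name, the window size, the baseline, and the tolerance are all assumptions made up for illustration.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Tracks accuracy over the most recent labeled predictions (hypothetical helper)."""

    def __init__(self, window_size: int, baseline_accuracy: float, tolerance: float):
        # Each entry is True where the prediction matched the real outcome.
        self.outcomes = deque(maxlen=window_size)
        self.baseline_accuracy = baseline_accuracy  # accuracy measured at training time
        self.tolerance = tolerance  # how much of a drop we accept before raising a flag

    def record(self, predicted, actual) -> None:
        self.outcomes.append(predicted == actual)

    def accuracy(self) -> float:
        if not self.outcomes:
            return float("nan")
        return sum(self.outcomes) / len(self.outcomes)

    def drift_suspected(self) -> bool:
        # Only judge once the window is full, to avoid noisy early alarms.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.baseline_accuracy - self.tolerance


# Made-up predictions and outcomes: two hits followed by two misses.
monitor = RollingAccuracyMonitor(window_size=4, baseline_accuracy=0.95, tolerance=0.10)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.drift_suspected())
```

A flag from a monitor like this does not fix anything by itself. It only tells us that the model probably needs fresh training data or a second look from a human.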
Another statistical limitation is associated with data science and insight generation. Machine learning models can be used to “read through” colossal amounts of data across different sets in order to discover correlations. However, as the mantra goes, correlation doesn’t equal causation. Not all methods are easy to understand, so unless we dig into what the model found and check its correlations against human reasoning, we risk misinterpreting the results, which can lead to detrimental outcomes.
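A small illustration of why correlations need a human sanity check: two series that both depend on a third factor will correlate strongly even though neither causes the other. The numbers below are invented purely for illustration (they are not real measurements), and the `pearson` helper is a plain hand-written Pearson correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Invented data: both series rise with temperature, the hidden common cause.
ice_cream_sales = [152, 178, 214, 238, 272, 301]
sunburn_cases = [29, 37, 41, 49, 53, 61]

# Prints a value close to 1, yet ice cream does not cause sunburn.
print(round(pearson(ice_cream_sales, sunburn_cases), 3))
```

A model would happily report this relationship. Deciding that temperature is the real driver takes domain knowledge.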
Finally, partly by design and partly because of computational limitations, reinforcement learning (RL) is limited as well. While it has been able to achieve seemingly impressive results in incredibly complex applications (e.g. video games such as Dota 2 or StarCraft), there are still huge drawbacks associated with optimization algorithms. Theoretically, RL models can achieve either a highly skilled application of one particular strategy or a decent application of several different strategies. How these RL limitations might cause future issues is a topic for another day.
Machine learning limitations in business
Outside the realm of abstract limitations lie the practical issues. In business, we are generally heavily constrained by money and time. Often, we are limited in what, where, and how efficiently we can apply machine learning models.
Even if we forego both abstract and practical limitations in machine learning, we would still have reason to apply manual or rule-based approaches. Humans reason in a completely different manner from machines.
Models can usually evaluate trends and historical data very accurately and quickly. They can then provide reasonably accurate statistical predictions about the best business decisions. However, they operate within a narrow domain and through a single layer of logic. Essentially, machines are detail-oriented, small-picture thinkers.
Humans are different. We can zoom out and take into account the entire view of the business. In many cases, a big picture outlook is necessary, as business decision making usually isn’t as simple as pulling up a few numbers and arriving at a conclusion.
Additionally, machine learning models cannot transfer experience (or information) from one domain to another. While there are ideas circulating about transfer learning, we have yet to arrive at a satisfactory way to transfer skills and abilities between domains in machine learning. Humans, on the other hand, can support their decision making with abilities drawn from a wide variety of domains of expertise through the use of heuristics.
Therefore, we shouldn’t be aiming to solve everything through machine learning. Old school solutions are not overshadowed by ML. They are supplemented by it.
Machine learning against phishing
Let’s take a simple, practical example from the usual day-to-day business activities - emails. Everyone with some skin in the game has received a phishing email. However, we don’t see most of them as they are blocked automatically.
Phishing emails seem like a great candidate for machine learning. They are plentiful, they look like the original but have key elements different or entirely missing, and they usually carry at least a few giveaway signs. All we need to do is label certain fields to check and the model will take care of the rest, right? Such an approach is far from optimal.
For one, phishing emails are a little like airplanes. Failure can yield drastic consequences. We wouldn’t accept a 1% failure rate in an engine. Similarly, we don’t want a 1% failure rate in anti-phishing processes, as leaking data from just one account can lead to many quickly compounding issues.
Unfortunately, machine learning models are hard to bring above 99.9% accuracy. In fact, for most businesses and processes even reaching 99% will be outside the realm of possibility. Only true tech giants and academic institutions should even attempt to reach such heights of accuracy. Add concept drift into the mix and the challenge goes from nearly insurmountable to impossible.
As mentioned above, a failure rate of 1% would be unacceptable in anti-phishing practices. Yet training a machine learning model to 99% accuracy is already a hard task. Instead of getting too hyped up about machine learning, we should be supplementing it with other approaches.
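To put these failure rates into numbers, here is a back-of-the-envelope sketch. The volume of 1,000 phishing emails is a hypothetical figure, and the second function assumes that every email is caught or missed independently of the others, which real campaigns will not strictly satisfy.

```python
def expected_misses(phishing_emails: int, detection_rate: float) -> float:
    """Expected number of phishing emails that slip past the filter."""
    return phishing_emails * (1.0 - detection_rate)


def chance_of_at_least_one_miss(phishing_emails: int, detection_rate: float) -> float:
    """Probability that at least one phishing email gets through,
    assuming each email is caught independently (a simplification)."""
    return 1.0 - detection_rate ** phishing_emails


# Hypothetical volume: 1,000 phishing emails aimed at one company.
for rate in (0.99, 0.999):
    misses = expected_misses(1000, rate)
    risk = chance_of_at_least_one_miss(1000, rate)
    print(f"detection rate {rate}: about {misses:.0f} missed, "
          f"{risk:.0%} chance of at least one getting through")
```

Under these assumptions, a 99% filter lets roughly 10 emails through, and even a 99.9% filter still has about a 63% chance of missing at least one. Since one compromised account is enough to cause damage, the model alone is not a sufficient defence.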
Integrating into rule-based and manual review processes
I might have sounded like a true machine learning sceptic. However, it’s not all doom and gloom. Going back to the video game example, something I left out was that the developers had specifically set a communication “ping” and a delay between inputs. Machine learning models can achieve greater efficiency in certain areas (e.g. in reaction times) than any human ever could.
Deep Q-Learning machine learning models completely outmatch humans in certain games. Source.
If machine learning models are better at some aspects of video games, the same will be true for other areas. Phishing will be no different. Machine learning models will be able to detect inconsistencies that could be difficult for humans to spot, and vice versa. While it may be possible to develop some Skynet-level anti-phishing model, businesses usually have other things to consider - mainly, the costs of such an undertaking.
Rule-based approaches have many benefits. For one, they have been utilized in numerous industries for a considerable period of time, allowing effortless access to best practices and domain-specific knowledge. Additionally, implementing a rule-based approach into (returning to our previous example) email protection is going to be tremendously more cost-efficient than training any machine learning model.
These approaches are also more transparent and accessible. Machine learning models are black boxes by design. Figuring them out takes considerable amounts of time. Compare the nature of these models with a rule-based system: even extremely complex rule-based systems can be quite quickly understood by someone experienced in that particular industry.
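To show how readable a rule-based check can be, here is a minimal sketch of an email filter. Everything in it is hypothetical: the `Email` fields, the trusted domain list, the wording list, and the rule names are invented for illustration and are not taken from any real product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Email:
    sender_domain: str
    reply_to_domain: str
    subject: str
    link_domains: List[str]


TRUSTED_DOMAINS = {"example.com"}  # hypothetical allow-list
URGENT_WORDS = ("urgent", "verify your account", "password expires")  # hypothetical


def triggered_rules(email: Email) -> List[str]:
    """Return the name of every rule the email trips, so each verdict can be explained."""
    reasons = []
    if email.sender_domain not in TRUSTED_DOMAINS:
        reasons.append("sender_not_trusted")
    if email.reply_to_domain != email.sender_domain:
        reasons.append("reply_to_mismatch")
    if any(word in email.subject.lower() for word in URGENT_WORDS):
        reasons.append("urgent_wording")
    if any(domain not in TRUSTED_DOMAINS for domain in email.link_domains):
        reasons.append("link_to_untrusted_domain")
    return reasons


# A made-up lookalike domain ("examp1e.com" with the digit 1) trips every rule.
suspicious = Email("examp1e.com", "mail.test", "URGENT: verify your account", ["examp1e.com"])
print(triggered_rules(suspicious))
```

Each flagged email comes with the list of rules it broke, so anyone familiar with email security can read the verdict and adjust the rules without retraining anything.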
We don’t really know what’s going under the hood of a particular model. Source.
Additionally, the inclusion of manual reviews might be a necessity for some aspects of business (e.g. anti-fraud). Some regulations might require businesses to comply with consumer requests for manual reviews if something has been flagged automatically.
However, manual reviews aren’t just for compliance with the law. They can be utilized to strike a better balance between red flags and letting things go through the system. As mentioned above, machine learning models are clearly better than humans at a specific set of tasks while humans outperform models at others. Thus, mixing rule-based systems, manual reviews, and machine learning models would allow a business to maximize revenue and minimize costs.
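One possible way to mix the three approaches is to act automatically only when the rules and the model agree, and to send everything else to a person. The function below is a sketch under that assumption. The `model_score` is taken to be a phishing probability from any classifier, and the function name and both thresholds are hypothetical values picked for illustration.

```python
from typing import List


def route_email(rule_hits: List[str], model_score: float,
                block_above: float = 0.9, allow_below: float = 0.1) -> str:
    """Decide what to do with an email given rule hits and a model score.

    rule_hits: names of rule-based checks the email tripped (empty if none).
    model_score: assumed phishing probability in [0, 1] from some classifier.
    """
    rules_flag = bool(rule_hits)
    if rules_flag and model_score >= block_above:
        return "block"  # rules and model both say phishing
    if not rules_flag and model_score <= allow_below:
        return "deliver"  # rules and model both say clean
    return "manual_review"  # disagreement or an uncertain score goes to a human


print(route_email(["reply_to_mismatch"], 0.97))  # both agree it is phishing
print(route_email([], 0.02))                     # both agree it is clean
print(route_email([], 0.95))                     # they disagree
```

Moving the two thresholds changes how much work lands on the review team, which is exactly the balance between raising red flags and letting things through.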
Finally, rule-based systems and manual reviews will come in handy whenever a model needs new training data either due to concept drift or “bad” training data. Both can produce unpredictable results over time, necessitating a return to the tried and true practices.
Instead of living through the hype and attempting to solve every issue under the sun with machine learning, we could just integrate the model into our current processes, which already have rule-based approaches and manual reviews in place. It’s a lot like the recommendations for protection against COVID-19 (which are hopefully nearing their end). Taking a single precaution (such as washing your hands) does fairly little, but combining all of them creates a formidable aegis.