Marketers continually focus on ways to increase their effectiveness, whether by increasing response rates to email programs or reducing the average cost per sale.
Selecting the right target audience is key to achieving those goals. With the wider availability of data sources such as social media and web analytics, marketers are moving toward predictive analytics to help separate likely respondents from non-respondents. The main tools in this effort fall into the category of machine learning.
The adoption of machine learning can be demonstrated in many ways, but one of the most dramatic statistics comes from Google search query volumes. Interest in “machine learning” has more than doubled in just the past four years, leaping from an index of 45 to 100 as of April 2015 (source: Google Trends). Similarly, marketing departments are projected to outspend traditional IT departments on the technology needed to handle these demands.
The Hidden Issue of Machine Learning
What many marketers don’t realize is that the most common approach to machine learning, in which only one algorithm is used, may deliver results well below what it could. Because the inner workings of predictive analytics are not immediately visible to most marketers, many are unaware that this is even a factor to consider.
Fortunately, the answer to the “one algorithm” approach isn’t terribly complicated. That answer is referred to as “modeling ensembles.”
Modeling ensembles can be defined as “the combination of multiple models to solve a particular problem.” In other words, one applies more than one model in a formal process. By combining multiple algorithms in a project, modeling ensembles can often provide substantial increases in performance.
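As a minimal sketch of the idea, the snippet below combines a logistic regression, a decision tree, and a small neural network by averaging their predicted probabilities (soft voting). It assumes scikit-learn is available; the synthetic dataset and model settings are purely illustrative stand-ins for a real response model.

```python
# Minimal sketch of a modeling ensemble: three different algorithms are
# combined by averaging their predicted probabilities (soft voting).
# The dataset here is synthetic and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import roc_auc_score

# Stand-in for a marketing response dataset (1 = responded, 0 = did not).
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("logit", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5)),
        ("nnet", MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000)),
    ],
    voting="soft",  # average predicted probabilities rather than hard votes
)
ensemble.fit(X_train, y_train)
print("Ensemble AUC:", roc_auc_score(y_test, ensemble.predict_proba(X_test)[:, 1]))
```

Soft voting is only one way to combine models; weighted averages or stacking (feeding the individual scores into a second-level model) are common alternatives.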
The Right Tool for the Job
Imagine you are a carpenter. Your task is to build a new home for a client. That client has provided you with the building site, the raw materials, ample time and a significant budget. That’s the good news.
The bad news is that your client informs you that you can only use the tools he provides. Worse, in what seems to be a complete break with reality, your client demands that you use only one type of saw. You can have as many saws as you want, but they all need to be identical: only one type of saw until everything is completed.
You panic: there should be circular saws for lumber, keyhole saws for cutting holes, scroll saws for cutting the trim, and so on. When you raise the issue with your boss, she replies, “Just pick your favorite and use that for everything.”
The right saw for the job
The above scenario is ridiculous, of course. No competent carpenter would ever accept such a restriction. After all, the “proper tool for the job” is a mantra of craftspeople everywhere.
However, considering only one type of machine learning algorithm for an entire project can be just as limiting. Different algorithms have different strengths and weaknesses, and combining them or using them for different parts of the project can make a lot of sense.
For example, researchers John Elder (Elder Research) and Steven Lee (University of Idaho) compared five different algorithms on six well-known public datasets and measured the relative error rate of each one. While it shouldn’t be a surprise that different algorithms produced different error rates, it is surprising that “every algorithm scored best or next-to-best on at least two of the six datasets” (Elder and Lee, 1997). Relying on just one type of algorithm could have been a very big mistake.
| Dataset | First Place | Second Place |
|---|---|---|
| Diabetes | Logistic Regression | Linear Vector Quantization |
| Gaussian | Neural Network | Logistic Regression |
| Hypothyroid | Neural Network | Decision Tree |
| German Credit | Projection Pursuit Regression | Logistic Regression |
| Waveform | Neural Network | Projection Pursuit Regression |
| Investment | Neural Network | Decision Tree |
Also note that while the neural network took first place on four of the datasets, it didn’t place at all on the other two and was next-to-last on the Diabetes dataset.
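This kind of comparison can be reproduced in spirit with a simple cross-validated bake-off. The sketch below assumes scikit-learn and uses a synthetic dataset as a stand-in, since the benchmark datasets from the study are not reproduced here; the point is the pattern of estimating each candidate algorithm’s error rate before committing to one.

```python
# Sketch of a head-to-head comparison: estimate each algorithm's error rate
# with cross-validation before committing to one. Synthetic data stands in
# for the benchmark datasets, which are not bundled here.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=15, random_state=1)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000),
    "k-Nearest Neighbors": KNeighborsClassifier(),
}

for name, model in candidates.items():
    accuracy = cross_val_score(model, X, y, cv=5).mean()  # 5-fold accuracy
    print(f"{name}: error rate = {1 - accuracy:.3f}")
```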
When to Use a Modeling Ensemble
Modeling ensembles can make significant contributions in the following areas:
- Reducing the risk of relying on only one algorithm type
- Reducing the overall error rate by combining the scores of different algorithms
- Leveraging different algorithms’ strengths at different points in the process, such as using recursive partitioning/decision trees for variable selection (because they are faster and easier to visualize) and a logistic regression for final scoring (because of the smoother curve of the final score outputs); a sketch of this two-stage pattern appears after this list
- Solving unique problems, such as creating a “filter model” to remove likely non-donors from a project that predicts final donation rates (removing the records that would likely contribute $0.00 lets the model focus on differences in donation rates among likely donors only)
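As a rough illustration of the variable-selection-then-scoring pattern mentioned above, the sketch below uses a decision tree to pick the most informative variables and a logistic regression to produce the final, smoother scores. It assumes scikit-learn; the data, tree depth, and selection threshold are illustrative assumptions, not a prescription.

```python
# Sketch of using different algorithms at different stages: a decision-tree
# step picks the most useful variables, and a logistic regression produces
# the final, smoother scores. Data and settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=5000, n_features=40, n_informative=8,
                           random_state=2)

two_stage = Pipeline([
    # Stage 1: tree-based variable selection (fast, easy to visualize).
    ("select", SelectFromModel(DecisionTreeClassifier(max_depth=5,
                                                      random_state=2))),
    # Stage 2: logistic regression on the surviving variables for final scoring.
    ("score", LogisticRegression(max_iter=1000)),
])
two_stage.fit(X, y)
scores = two_stage.predict_proba(X)[:, 1]  # smooth probability scores
print("Variables kept:", two_stage.named_steps["select"].get_support().sum())
```

The “filter model” idea follows the same two-stage logic: a first classifier screens out the records likely to contribute $0.00, and a second model, trained only on the remaining records, predicts donation rates.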
Drawbacks to Modeling Ensembles
Using modeling ensembles is not without its drawbacks, however:
- Multiple models take a good deal more time to build and combine effectively than just building a single model
- Interpreting the results of multiple scores is more complicated than looking at a single set of scores
- A project will reach diminishing returns over time for each additional algorithm added