Almost everyone has heard of artificial intelligence and machine learning. We all know that the suggestions we see on streaming services or e-commerce sites are somehow calculated by algorithms rather than manually curated by humans in cubicles. You might also know that in many cases job-seekers’ resumes, applications, and cover letters are read and screened by AI before they ever get in front of a human. How exactly these things work is hidden from us, but we know they are powerful, and we assume they are somehow “beyond” us or reserved for organizations with massive budgets.
Today, the technology behind these algorithms is accessible even to smaller businesses and organizations, and it gives a real advantage to those who leverage it. In this post, you’ll get an accessible overview of how machine learning works. I also hope you’ll come away with the beginnings of some ideas for transforming your own work with these tools. I’ll intentionally avoid jargon and replace some technical terms with everyday language—you can learn more about the jargon and field-specific terminology in the other machine learning posts.
Learn How it Works—In Under a Minute
While the math and algorithms are complex, understanding the process is actually pretty easy:
- A human defines the task for the computer by describing the goal (see below) and possibly also selecting an algorithm. Depending on the tool used, this could be a few minutes of clicking in an AutoML tool or it could be writing code that leverages existing widely-used algorithms.
- The algorithm is given a training dataset to learn from. This could be a database of pictures alongside text labels of the photos, or it could simply be a database or spreadsheet of rows and columns. In the same way that humans learn from taking in the world and from experience, this dataset is the “experience” you want the computer to look at and learn from. After training, the algorithm will have “learned” something from the training data and be ready to apply it to new things. This “learning” is really just the tweaking of a large collection of numbers inside the algorithm’s framework, numbers that typically start out as zeros or random values. They are adjusted until they best “fit” the “experience” of the training dataset.
- We apply the algorithm in its post-training “learned state” to some real-world examples that we didn’t let the computer learn from during its initial training. We intentionally held back some real-world examples so the computer couldn’t cheat and simply memorize the answer for each and every example we gave it; that’s not intelligence, artificial or otherwise, it’s simply having a good memory. At this stage we find out whether the algorithm has learned something that generalizes to new situations or whether it is actually pretty worthless on things it hasn’t yet seen.
- If everything is satisfactory, the trained algorithm can begin to be applied to business or organizational processes—ultimately to working smarter instead of simply working harder. (The short sketch just below shows what this whole process can look like in code.)
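If you’re curious what those steps look like in practice, here’s a minimal sketch using Python and the widely-used scikit-learn library. The dataset is synthetic and every setting is illustrative rather than a recommendation:

```python
# A minimal sketch of the learn-then-verify process with scikit-learn.
# The data is synthetic; in practice you'd load your own export.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# The "experience": rows of examples with known answers.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Hold back 20% of the examples so the model can't just memorize them.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Pick an algorithm and let it "learn" from the training rows.
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Find out whether the learning generalizes to examples it never saw.
predictions = model.predict(X_test)
print(f"Accuracy on unseen examples: {accuracy_score(y_test, predictions):.0%}")
```

The punchline is in the last two lines: the model is graded only on rows it never saw during training, which is exactly the “no cheating by memorizing” check described above.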
Essentially, this is not all that different from the way we as humans learn about the world. We are constantly processing and reprocessing our cumulative experience to date in order to be ready to understand things we haven’t seen yet and to make predictions. Interestingly, just as humans suffer from bias, overconfidence, and similar cognitive blind spots, machine learning has analogous traps, and there are important best practices for avoiding and working past them. One of the fantastic rewards of understanding machine learning is the insight it gives us into our own learning process, the reasons we sometimes get things very wrong, and the experiences most likely to propel us quickly to a better and more accurate understanding of the world.
A Real-World Example
Suppose you have 5,000 people who have indicated preliminary interest in what you provide: products, services, education, etc. You may have gotten these names from a form on your website or from people stopping by your booth at a conference/event, or they may be people who have purchased, attended events, or volunteered in the past. Suppose you need 200 people to take a specific action in order to hit your goals. The action could be a purchase, enrollment, commitment, etc. One way to achieve this goal is to blast out an email to all 5,000 people once a week and see what happens. If that only gets you to 75 people, how do you close the gap between 75 and your goal of 200? The natural thing might be to pick up the phone and start calling to build personal connection, explain what you provide, and explain the advantages you offer. Based on past trends, you know that 1 in 20 people you talk to will go to the next step and take the action you are hoping for. You’re going to need to call 2,500 people because if you get 1 in 20, you’ll get 125 from that 2,500, and you’ll close your gap from 75 to the goal of 200. (For more on this kind of math, see the blog post on funnels.) It will take you eighteen 8-hour work days to call 2,500 people on the list and you’ll be done. Have fun! Have fun? Let’s be honest: it’s going to take more than eighteen days because you’re not going to enjoy all that cold calling, and you’re going to want to do it in 2-hour chunks rather than 8-hour chunks.
On day two of calling, you realize that perhaps some of the names on the list are more likely than others to be the people you’re looking for. You run an analysis on past data and find that people who gave their name at an event and already had a brief in-person conversation are 30% more likely to take the next step you’re hoping for than people who just filled out a web form. This is great news: after crunching some numbers you estimate that if you call these event attendees first, you’ll only have to call about 2,000 names on the list instead of 2,500 to hit your goal of 125 more people.
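As a quick sanity check, here’s that funnel math in a few lines of Python, with the rates and counts taken straight from the example above:

```python
# Back-of-the-envelope funnel math from the example above.
goal = 200        # people you need to take the action
from_email = 75   # what the weekly email blast already gets you
gap = goal - from_email                  # 125 still needed

base_rate = 1 / 20                       # 1 in 20 calls converts
print(round(gap / base_rate))            # 2500 calls if you dial at random

event_rate = base_rate * 1.3             # event attendees: 30% more likely
print(round(gap / event_rate))           # 1923, i.e. roughly 2,000 calls
```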
Enter machine learning: let’s take your earlier insight about some people being more likely to proceed than others and put it on steroids. It turns out that the predictors of someone being more interested and more likely to take the desired action are multifactorial, which is another way of saying that a number of different things are predictive. In a sales database your predictors might include having made a prior purchase, having spent more than 20 minutes on the website, having asked a question via website chat, proximity to your physical location (ZIP code), and a host of other things. In a prospective student database, they might include the number of prior email clicks, how the person found out about your school, whether they have responded to prior text messages, whether they have an in-progress application, etc.
With machine learning, you create a training dataset and give it to the algorithm. First you filter your database for training data: people who have been in the system long enough that you know whether they were a “yes” or a “no”. Someone who has purchased or enrolled in the past is a “yes,” and someone who indicated interest but has gone stale (no longer reading emails, no recent website visits) is a “no” because we don’t think they are ever going to become a “yes”. Someone who only indicated interest a month ago isn’t part of the training dataset at all, because you don’t yet know whether they will ultimately be a “yes” or a “no”. Then you figure out what data you have on these people, and you give the algorithm all the fields you believe might be associated with or predictive of someone taking the next step you are hoping for. These possible predictor columns are called features, and there’s one final column, the “yes” or “no” column, which is ultimately what you want the computer to learn to predict. If you want to think statistics, the feature columns are sort of like the independent variables, the yes/no column is the dependent variable, and we want the computer to learn to predict the likelihood of a “yes” from all the other information we have on a person. ML has a number of advantages over classical statistics and logistic regression, especially if hitting the goal is all that matters and you aren’t trying to write up a research paper on the topic.
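To make that concrete, here’s a hypothetical sketch using the pandas and scikit-learn libraries. The file name and every column name are invented for illustration (your database export will differ), and the feature columns are assumed to already be numbers or 0/1 flags:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical CRM export: one row per contact, columns invented here.
contacts = pd.read_csv("contacts.csv")

# Keep only contacts whose outcome is already known: "yes" (purchased or
# enrolled) or "no" (went stale). Recent sign-ups whose outcome is still
# unknown are left out of training entirely.
training = contacts[contacts["outcome"].isin(["yes", "no"])]

# Feature columns: anything plausibly predictive of taking the next step.
features = ["prior_purchase", "minutes_on_site", "asked_chat_question",
            "email_clicks", "attended_event"]
X = training[features]
y = training["outcome"] == "yes"  # the label the computer learns to predict

# In practice you'd also hold back rows for validation, as sketched earlier.
model = RandomForestClassifier(random_state=42)
model.fit(X, y)
```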
Note that in situations involving employment or enrollment, you absolutely want to keep out of the algorithm anything that would introduce bias. Leave out columns like date of birth, age, gender, race, ethnicity, and income if you have that data. Leave out anything that might reasonably be associated with these as well, because your algorithm may well bring unacceptable bias into its predictions that way. In sales settings this is less of a concern because how you prioritize marketing and outreach is up to you. In settings where you are offering an opportunity like employment or education, you really need to make sure your algorithm doesn’t end up disadvantaging people based on demographics.
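In code, the first part of that can be as simple as dropping the sensitive columns before training. The column names below are invented, and spotting proxy variables (a ZIP code that tracks demographics, for instance) takes judgment that no one-liner can replace:

```python
# Remove demographic fields and likely proxies before training.
sensitive = ["date_of_birth", "age", "gender", "race", "ethnicity",
             "income", "zip_code"]
training = training.drop(columns=sensitive, errors="ignore")
```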
Real-World Outcomes
I’ve worked on problems very similar to the example above multiple times, and each time machine learning has been able to create and validate a predictor that really helps focus the time-consuming personal outreach on the people most likely to be interested. Here’s an example of the results from one such analysis:
- The algorithm identifies the top 150 of 5,000 people as being 1-in-4 likelihood to take the desired action. Compare that to the overall 1-in-20 likelihood. Call these 150 people, and you’re likely to have 37 people taking the desired next step in no time.
- The algorithm identifies the next 900 people on the priority list as being 1-in-9 likelihood to take the desired action—still way better than 1 in 20.
- The algorithm identifies the remaining ~4,000 people as being 1-in-35 likelihood to take the desired action.
If you’ve heard of the Pareto principle before, this should remind you of the 80/20 rule. What this ultimately means is that in a few hours of machine learning work, you’ve cut your expected workload from 2,500 calls to just 942: call all 150 top-priority people for roughly 37 conversions, then 792 of the 1-in-9 group for the remaining 88 (the sketch below walks through the arithmetic). It’s your choice: you can spend 146 hours calling 2,500 people or you can spend 2-3 hours building a satisfactory machine learning model and then spend 55 hours calling 942 people.
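Here is that arithmetic spelled out, using the tier likelihoods from the list above:

```python
# Where the 942 comes from: work the priority tiers from the top down.
still_needed = 125                  # the gap left after the email blast

top_calls = 150                     # tier 1: 1-in-4 likelihood
still_needed -= top_calls // 4      # 150 calls -> 37 conversions

mid_rate = 1 / 9                    # tier 2: 1-in-9 likelihood
mid_calls = round(still_needed / mid_rate)  # 88 more needed -> 792 calls

print(top_calls + mid_calls)        # 942 calls, versus 2,500 unprioritized
```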
There’s also some psychological momentum your team will have when most of the people you talk to on the phone are genuinely interested and 1 in 4 take the next step you’re hoping for, versus only 1 in 20. It makes the call campaign fun and encouraging instead of annoying. And you end up not wasting the time of people who aren’t all that likely to be interested in what you offer (the 1-in-35 people referenced above). Everybody wins. You can also use machine learning to decide who gets mailed promotional material, given that physical mail, unlike an email blast, is not free. Use your mailing budget the best possible way and have a greater proportion of your material actually read and considered rather than immediately trashed upon arrival.
One final note: if your ML solution or ML consultant is able to provide you with explainability data, you’ll also get some indication of which variables matter most in predicting an outcome. While not absolutely essential, the insights from this usually spawn productive strategy discussions.
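For the curious, here’s one common way to get that kind of output with scikit-learn, continuing the hypothetical model and features from the earlier sketch. Permutation importance is just one explainability technique among several:

```python
from sklearn.inspection import permutation_importance

# Measure how much accuracy drops when each feature is shuffled:
# a bigger drop means the model leans on that feature more heavily.
result = permutation_importance(model, X, y, n_repeats=10, random_state=42)
for name, score in sorted(zip(features, result.importances_mean),
                          key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```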
Getting Started
If working smarter sounds great, the good news is that it’s never been easier to get started. More and more databases and platforms come with basic machine learning tools built in. For one example and a sense of the effort involved, read the blog post on Salesforce’s AutoML product, Einstein Prediction Builder. Many widely-used CRM platforms have something like this, though each goes by a different name and varies in quality and capabilities. There’s a real chance that you’re already using a platform that supports basic integrated machine learning pipelines. There’s still a learning curve, but built-in tools like this can be a real advantage even if you need some initial consulting to get set up or to train your team.
Even if the cloud platforms you currently use don’t have anything built in, there are integration tools that can get you running with machine learning regardless of your current technology stack—even if your current dataset is as basic as a spreadsheet. You can also test the waters at relatively low cost with a one-time offline machine learning analysis of your database. Get all your records/contacts “scored” with a one-week turnaround, and then work through the list to see for yourself the ROI potential in your industry before going all the way to a live, integrated ML pipeline. United InfoLytics is ready to talk about consulting solutions that fit your budget and provide near-term ROI and cost savings.
Further Reading: Types of Machine Learning Tasks
Artificial intelligence and machine learning both aim to approximate the ways that humans approach certain learning tasks. Without going into the difference between AI and ML, you can rest assured that this is not about creating a machine that approximates or exceeds the whole of human intelligence, but instead about using learning algorithms to get computers to teach themselves to become good at a specific task a human has defined in advance. It turns out it’s often better and easier to write computer code that teaches a computer to teach itself than it is to write code where the programmer directly teaches the computer all the human knowledge that applies to the task at hand. The most common types of machine learning tasks include:
- CLASSIFICATION: answering the question “which is this?” or “what is this?” You give it a photo, it returns “dog” or “cat” or “banana.” Or you give it the name, length, actors, and producer of a movie and it predicts the category “thriller” or “drama.” Many algorithms go further and return probabilities for the different classification labels, such as “99% chance it’s a dog, 0.7% chance it’s a cat, and 0.3% something else.”
- BINARY CLASSIFICATION is a special case of classification where there are only two possible responses. It answers something like “is this a yes or is this a no?” Every transaction you attempt to make with your credit card goes through a lightning-fast binary classification system that returns “legitimate” or “fraudulent” in a few milliseconds. More often it returns a percent chance of a transaction being fraudulent rather than a simple “yes” or “no.” In sales or enrollment, you can use binary classification to predict the chance that each outstanding sales contact or prospective applicant will ultimately purchase or enroll (see the sketch after this list).
- REGRESSION: Estimating or predicting an unknown number using known information. This could be predicting a student’s end-of-year test score or estimating the market value of a home using known data about the student or the home. The main difference from classification is that the output is a number rather than a label or the probability of a label.
- CLUSTERING: Identifying similar records in a database, grouping things by similarity. I realize this may sound a bit like classification above, but the key here is that we aren’t coming up with labels that coherently describe the class of things in human language—we’re just saying that they are similar. Clustering is one approach behind the recommendation engines you’re used to on online shopping or streaming sites. These sites generally don’t need to put a precise label on a grouping of things; just knowing the items are similar is enough to recommend them.
- FORECASTING: Look at what has happened in the past and where things are right now and predict at regular intervals what will happen in the future. Think weather forecasting or predicting the movement of the price of a stock.
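To make the binary classification idea concrete, here’s a tiny continuation of the first sketch in this post, asking that model for percent chances instead of hard yes/no answers:

```python
# Ask the trained model from the first sketch for probabilities rather
# than hard labels. Each row is [P(no), P(yes)] for one unseen example.
probabilities = model.predict_proba(X_test)

for p_no, p_yes in probabilities[:3]:
    print(f"{p_yes:.0%} chance of 'yes', {p_no:.0%} chance of 'no'")
```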
These are all things that you and I can do with our human intelligence. Why bring a computer into it when we can do it ourselves? The advantages of machine learning are that it can do these tasks very quickly and sometimes even more accurately than the average human can. The speed factor means we can accomplish more with less effort and not spend time doing work that a computer can do for us. The accuracy advantage, which is not guaranteed but is often possible, means that the computer ultimately gets better than most humans at a very specific task. Even if the algorithm turns out to be 5% less accurate than a human, the speed with which it can approach its work makes it all worthwhile, and the algorithm can refer the situations that it is less confident about to a human for manual examination.