A review of Salesforce Einstein Prediction Builder

Salesforce isn’t exactly known for AI or machine learning (see the Gartner Magic Quadrant), but they are working at it. I was pleased when, in 2020, I discovered their Einstein Prediction Builder, which is (of course) a paid add-on but is available for basic usage for free. Any current Salesforce customer can use the “Try Einstein” program to see what is possible and do some real work with it too. Even if you don’t want to pay, you can use Prediction Builder to model up to 10 different prediction tasks. As of the date of writing (early 2023), you can have only one prediction turned on and active at any given time, meaning the free tier allows only one prediction to write scores onto records in real time as they are created or modified in the system. If you currently use Salesforce, read on to learn more. If not, United InfoLytics can still help you with your machine learning and predictive analytics needs on other platforms.

TL;DR: There’s a bit of a learning curve to ML, but you owe it to yourself to try it with a simple and affordable AutoML tool like this one. You can set it up yourself (read carefully below) or have our team set up a prediction for you, ensuring best practices in independent variable selection, training dataset filtering, etc.

Types of Predictions

What sorts of predictions is Einstein Prediction Builder capable of? There are two types: a binary (yes/no or true/false) prediction, or a numeric prediction made through regression (this option is currently in beta). A binary prediction looks at a Salesforce object (Lead, Contact, Account, Opportunity, etc.) and estimates the relative probability of an event occurring, usually a desired outcome like a student enrolling or a customer making another purchase. For those using Salesforce in sales, this would often be the chance of a given lead making a purchase, or the chance of a quote becoming a deal. The second type of prediction estimates a number, like total sales over the next 12 months across all current customers. This is a bit fancier because each prediction is a number (negative or positive, large or small) rather than a relative probability between 0 and 100. Regression can predict sales of $100 or $100,000, all using the same Prediction Builder.

Review of Einstein Prediction Builder

There are some things that Salesforce clearly got right with this product. Even if you don’t have any specific AI or machine learning training, you can get started with it, and it mostly works, even if you may encounter a few bugs. In a sense, they might have one of the more user-friendly interfaces for getting your feet wet in predictive analytics. And if you’re already on Salesforce, there’s no reason not to give it a try. Your data is already in the system, so there are no flat files to export and re-import into another machine learning system, and no APIs or data exchange to set up. Give yourself two hours to play with it and you might create something really useful! Or if you don’t have the two hours, let United InfoLytics do it for you! With minimal investment, you may end up with a really useful system that helps you focus your efforts on the accounts and contacts that are most likely to purchase, enroll, participate, give, etc.

Basically the process for setting up a binary prediction is:

  1. Pick an object you want to make predictions on. All the fields you want to reference must be on this object, and currently you cannot reference related objects. If you need to reference a related object, create a formula field that pulls in this data first.
  2. Define the training set. This is the data that we now know the answer to: which ones were a “yes” and which ones were ultimately a “no” or a “not yet.” Tell it which accounts, contacts, leads, or opportunities are “water under the bridge” such that we have a “yes” or a “no” or some reasonable estimate on this. You would want to, for example, exclude a lead from the training dataset if it came in over the last 60 days. We shouldn’t call a new lead that came in yesterday a “no” just because they haven’t purchased yet!
  3. Define a “yes” and a “no”. For many, this will be whether there was a sale or not. Help Salesforce know what criteria define a “yes” and they’ll assume all the others in the training set are “no”.
  4. Tell Salesforce which fields to use to make predictions. This is possibly the trickiest step, and it’s also the most important. You should include anything that you think might be reasonably likely to predict the outcome in some small way. Be careful not to include things like race, ethnicity, gender, etc. if the prediction needs to be equitable and must not learn biases from the training data.
  5. Think carefully about excluding fields that let in hindsight bias. The most important part of the prior step is avoiding hindsight bias by excluding any fields that are in some way populated as a result of a Salesforce record reaching the desired outcome. Let’s say that phone number is not generally gathered on most of the inquiry forms through which new leads and contacts come into the system, but the phone number is always gathered as part of the sale process. If you allow Prediction Builder to use the phone number field to make predictions, it’ll do a very funny thing: it’ll say that the most powerful predictor of someone making a purchase is whether they provided their phone number vs. leaving it blank! But think carefully here: it’s not actually true. The fact that they made a purchase is what caused them to enter their phone number, not the other way around. My rule of thumb here is simple: include fields that are usually gathered prior to the desired outcome occurring, and exclude any fields that are rarely or only sometimes collected up front but fully populated after the desired outcome.
  6. Hit go and grab some coffee. Once you are done defining your prediction, it’s time to let the machine do some “learning.” This is where the machine learning algorithms go to work, looking at all the training data where you supplied a “yes” or a “no” and developing a model of which other fields make the desired outcome more or less likely. Right now, the algorithms it uses are (I believe) just Logistic Regression and Random Forest. It tries them both and figures out which one does a better job on the data! It may take anywhere from 30 minutes to several hours to finish the learning process. Just go and start another task, and occasionally come back to refresh and check whether it has completed.
  7. Check the prediction scorecard. Salesforce has made a pretty easy-to-understand prediction scorecard that’ll help you avoid the two main problems you may encounter: a prediction that’s too good and a prediction that’s no good at all. If your prediction comes back too good, chances are high that you’ve allowed Salesforce to use a field that is tainted with the hindsight bias discussed above. If your prediction comes back pretty bad, you’ve either defined the machine learning problem poorly, or you’ve not given it the features that are most likely to be helpful and predictive, or your prediction problem just isn’t answerable with the data you currently have in your system and living on that object. Sometimes feature engineering is helpful here, like pulling in summary or calculated fields from other objects. By adding a few additional “features” to your prediction, you may get better performance that starts to make your prediction worth using in real life to prioritize leads, contacts, and accounts for focused follow-up, calls, emails and other outreach.
  8. When satisfied, turn on the prediction! If your scorecard comes back good, check that the features it reports as good predictors make sense to you as meaningful predictors, based on your experience in your field. Anything that seems bogus might just need to be removed from the prediction. Once you’re happy, turn on the prediction and go get some more coffee. Slowly, it’ll start filling in predictions on all the records that you’ve asked it to predict upon. Once this is complete, you can begin to act on these predictions. One great way to start is with a list view or report that sorts records from highest to lowest probability of reaching the desired outcome. Then start working the list (make those calls, send some text messages and emails) to see if the prediction plays out in real life. One interesting experiment here is to call the top 20 people on the list and the bottom 20 on the list. Hopefully you’ll see a significant difference in interest and likelihood of taking the desired action between the top 20 and the bottom 20. If so, you’ve shown that it works, and you should start using this prediction across all the business processes where you cannot reach out to every contact equally and need to spend more time, effort and money on some than on others. While mass email is cheap, phone calls are not. Send mass email to all your contacts, but save your phone calls for the people most likely to convert!
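
To make steps 2 and 5 a bit more concrete, here is a minimal Python sketch of the same reasoning done by hand on exported data. Everything in it is an assumption for illustration: the records, the field names (`created`, `purchased`, `phone`, `source`), the 0.5 gap threshold, and the helper functions are all hypothetical, and none of this is how Prediction Builder works internally. It drops leads that are too new to have a settled outcome, then compares how often each field is filled in for the “yes” group versus the “no” group. A field that is almost always populated for “yes” and rarely for “no” deserves a second look for hindsight bias.

```python
from datetime import date, timedelta

# Hypothetical lead records exported from Salesforce. The field names are
# illustrative only and are not part of any Salesforce API.
leads = [
    {"created": date(2022, 3, 1), "purchased": True, "phone": "555-0101", "source": "web"},
    {"created": date(2022, 4, 9), "purchased": True, "phone": "555-0102", "source": "event"},
    {"created": date(2022, 5, 2), "purchased": False, "phone": None, "source": "web"},
    {"created": date(2022, 6, 20), "purchased": False, "phone": None, "source": "web"},
    {"created": date(2022, 7, 11), "purchased": False, "phone": "555-0103", "source": "event"},
    {"created": date(2023, 1, 10), "purchased": False, "phone": None, "source": "web"},
]


def training_rows(rows, today, min_age_days=60):
    """Step 2: keep only records old enough for the outcome to be settled."""
    cutoff = today - timedelta(days=min_age_days)
    return [r for r in rows if r["created"] <= cutoff]


def fill_rate_by_outcome(rows, field, outcome="purchased"):
    """Step 5: share of records with `field` populated, split by outcome."""
    rates = {}
    for label in (True, False):
        group = [r for r in rows if r[outcome] is label]
        filled = sum(1 for r in group if r[field] not in (None, ""))
        rates[label] = filled / len(group) if group else float("nan")
    return rates


train = training_rows(leads, today=date(2023, 1, 15))
for field in ("phone", "source"):
    rates = fill_rate_by_outcome(train, field)
    gap = rates[True] - rates[False]
    # The 0.5 threshold is an arbitrary assumption; tune it to your data.
    flag = "possible hindsight bias" if gap > 0.5 else "ok"
    print(f"{field}: yes={rates[True]:.2f} no={rates[False]:.2f} -> {flag}")
```

In this toy data the lead created five days before the cutoff date is left out of training, `phone` is flagged because it is filled in for every purchaser but only a third of non-purchasers, and `source` passes because it is populated for everyone. A flag is only a prompt to ask when the field actually gets collected, not proof of a problem.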

Caveats and Bugs

In a short time of working with the system, I encountered two bugs / shortcomings:

  1. There is an undocumented issue where Prediction Builder is unable to “see” any records that haven’t been modified in the last two years. There is literally no reference to this in their documentation, and I had to go through a very long series of back and forth messages over a month for the problem to get escalated to someone who finally figured out that this was “intended behavior.” They said they would update the documentation on this, but honestly this is indicative of a very immature product if they think that all the training data should necessarily be from the last two years, and they don’t even document this “feature.”
  2. Sometimes you can have a working prediction in the system and go in to tweak some small thing (add or remove a feature column or restrict the training dataset in some way) and at the end it refuses to save your changes giving you an error message. The error message doesn’t make sense and there’s also no way to resolve it. All you can do is leave the page. Every time this has happened to me (2-3 times) I’ve lost all my work on a given prediction. They are unable to recover your work, and basically you have to start over on defining the nature of the prediction problem, the positive case, the negative case, the features to use in the prediction, etc. This to me is again a marker of this product being immature. I have high hopes that someday everyone using Salesforce intelligently will be using this tool, but maybe right now it’s only a chance to get your feet wet and start imagining what is possible with AI/ML across your Salesforce database. I certainly wouldn’t pay for it yet.

Technical Details for ML Professionals

If you’re a machine learning professional, you will likely be a bit disappointed by some simplifications of the product. For example, you cannot pick a scoring function for the AutoML process. I honestly don’t even know what it is as currently implemented. You cannot get any sort of area under curve for model evaluation or comparison. You cannot even see what the second best performing model was or what the hyperparameters were. That having been said, you can use your ML knowledge to quickly and efficiently get something built and into production in an hour of active work plus the necessary coffee breaks while the model is trained and scored. This is better than can be said for many other machine learning pipelines! If your company or your client is using Salesforce, it’s worth a try as the investment is very low and the return on investment possibly very high in terms of increased sales with decreased effort.
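
If you miss having an area under the curve, one workaround is to compute it yourself once predictions have had time to play out. The sketch below assumes you can export each record’s score together with the outcome that was later observed (for example from a report). That export, the sample numbers, and the `roc_auc` helper are my own assumptions for illustration and are not something Prediction Builder provides. It uses the rank interpretation of ROC AUC: the probability that a randomly chosen “yes” record was scored higher than a randomly chosen “no” record, with ties counted as half.

```python
def roc_auc(scores, outcomes):
    """ROC AUC via pairwise comparison of positives against negatives.

    Returns the fraction of (positive, negative) pairs in which the positive
    has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical export: prediction score (0-100) and the outcome observed later.
scores = [92, 85, 70, 64, 40, 33, 21, 10]
outcomes = [True, True, False, True, False, False, True, False]
print(f"AUC = {roc_auc(scores, outcomes):.3f}")  # prints AUC = 0.750
```

The pairwise loop is quadratic in the number of records, which is fine for a quick check on a few thousand rows. For larger exports a sort-based implementation or an established library would be the better choice.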

Peter VanWylen

Peter VanWylen loves data and enjoys helping people attain their goals with the right tools for data gathering, analysis, dashboards, and data science.