MT-ing My Head

Welcome to my soapbox of Salesforce technical writing, music, and miscellaneous stuff

An Intro to Predictive Modeling for Salesforce Peeps — April 16, 2021

An Intro to Predictive Modeling for Salesforce Peeps

I wanted to write about prediction because it’s seemingly everywhere: in marketing emails, in real estate, and even in criminal justice (blah).

An understanding of prediction and modeling can be pretty beneficial in business, however. Think lead scoring for new prospective clients or donors, employee attrition, or revenue forecasting. By understanding the capabilities of prediction, it might help us predict what data to collect in order to make more informed decisions.

Salesforce has released some A.I. products over the last few years under various names: Einstein, Tableau CRM, Wave Analytics. We can also create models ourselves, using different programming languages such as R and Python, or even in Excel.

First, a quote.

“All models are wrong, but some are useful”

George E. P. Box

Human nature can be unpredictable. We are also biased and have blind spots. I recently watched the documentary Coded Bias on Netflix, which showed how bad models can affect real people. So handle prediction with care.

So how does prediction work?

A predictive model tells us, based on existing criteria, what outcome is expected. For instance, what is someone’s income based on their race, gender, and profession?

Think of this as a mathematical formula similar to the ones we learn in high school for a straight line or a parabola. The inputs are x and the outputs are y.




To build a predictive model, we use mathematical methods to find a formula that provides an output that is as close as possible to the real output for the greatest number of records (e.g. people, schools, whatever).

We can use both categorical (e.g. picklists) and numeric values in prediction.

So let’s say we want to predict a donor’s 2021 donation amount. We can use a method called regression.

In order to build a model to predict someone’s next donation amount, we need a “training data set” – a report of donors who have made 2021 Donations. We choose fields we think may be relevant (called independent variables, e.g. Age, Gender, Last Donation Amount, and Average Donation Size) and a “target” dependent variable (a field called 2021 Donation Amount.)

Then, we use a tool (Excel, R, Python, or a calculator with a huge memory :P) to create the model. Our goal is to get the most accurate model, so we may remove variables that don’t add that much value.

Our final output might look like this:

ŷ (this is the predicted value of 2021 Donation Amount) = 10 (this is the y-intercept) + 1.4*Age + 1.5*GenderFemale(true=1 or false=0) + 1.02*LastDonationAmount + 0.01*AverageDonationSize

So to predict the next donation amount of a donor who is:

  • Age = 45
  • Gender = Male
  • LastDonationAmount=$200
  • AverageDonationSize=$250

We would calculate the predicted value here:

ŷ = 10 + (1.4*45) + (1.5*0)+(1.02*200)+(.01*250)

ŷ = $279.50

NOTE: This is not a real model, and I am also oversimplifying.

From this result, we can see that the predicted donation amount for 2021 is higher than the donor’s last donation amount and average donation. Perhaps, in our fictional world, donors are very generous in 2021.

Conclusion and Disclaimer

I am passionate about educating people on how different technology works and hopefully saving them some money. However, I am a student and I am writing about what I am learning in class at the current time, and I am not an expert.

If you’re interested in learning more, check out online courses on modeling with R or Python, and then apply these skills to your Salesforce data. Courses will be able to explain some of the road bumps to look out for (e.g. multicollinearity, non-normality) and go more into detail about how to figure out if your model is actually a reliable one.

The example above is what is called linear regression. Logistic regression is another type of model…a way to predict the likelihood of an event. A famous example is whether or not someone died on the Titanic, which, spoiler alert: given a number of variables, is predicted best gender (“women and children” first) and ticket level (a proxy for wealth). I don’t know which method would be quicker for me: watching the 2 1/2 hour Kate Winslet and Leonardo DiCaprio movie or trying to build a model in R, but I digress…

Happy modeling!

Analyzing Data from a Multiselect Picklist – Two Options — November 11, 2020

Analyzing Data from a Multiselect Picklist – Two Options

Today I’m writing about how to use multiselect picklists in an effective way using Salesforce or Excel.

The Business Problem

We took a survey of constituents. It integrated into Salesforce and populated a multi-select picklist. Let’s say that picklist was called “Fruits” and showed a survey participant’s favorite fruits.

First mistake was using a multi-select picklist! 😉
Report of results

This report is…okay. We can see that apples are popular by eyeballing the report. But when we try to count how many people selected each fruit, we run into a bit of an issue.

This is the stuff of nightmares.

So we need another way.

Two Solutions

We can solve this issue ~for free~ in two ways:

  1. The first method involves creating formula fields in Salesforce.
    1. Benefit: The data will always be current and accurate.
    2. Drawback: If you have many options and many multi-select picklists, you can run out of fields pretty quickly.
    3. Best for when: You don’t have lots of options, you need to see aggregated data in realtime (in a dashboard or report)
  2. The second method involves Excel.
    1. Benefit: You don’t waste Salesforce fields.
    2. Drawback: You have to do some periodic copying and pasting if you want live data; can’t create dashboards easily, etc.
    3. Best when: there are lots of options, the survey period is over, you just need raw data and don’t need dashboards

Solution 1: Salesforce Fields

The idea is to create a field for each multiselect picklist option, per this help article.

I created a formula checkbox field because they are visually appealing and can be summed easily. The formula is as follows:

if(includes( Fruits__c ,"Oranges"),true,false)

The result is something like this:

We would need one field for each answer choice.

Solution 2: Excel

The second option would be to export the results and use formulas in Excel to populate columns for each answer choice.

First, I export my report results to Excel.

Then, I find all of the multiselect options. I go to the field in Setup and click Printable View. I copy this list using Cmd/Ctrl+C.

Then, I create a new tab in my Excel spreadsheet. I paste the list into that tab.

Next, I go back to the original tab and use the Transpose function in cell C1 to list out the options horizontally across the header row.


Then I used a FIND formula in cell C2 to figure out if cell B2 had the word “Apples” in it. This is probably a total kludge, so if there’s a more efficient way to do it please let me know:


This is saying that if it finds the word “Apples” in the B column at any position (position > 0), return 1. If not, return 0. I used IFERROR because it gave me an error when nothing was found. This is probably because I am an Excel noob. But hey, it works.

The $’s are there so that I can drag the formula into the other columns and it will still work – the correct row and column “sticks” and other data is dynamic. So here’s the final product:

Now we can do cool stuff like sum them up per column and even create data visualizations.

How bout them apples?


These solutions turn something virtually unusable into something a little less unusable. Enjoy.

PS: I’m pretty sure another option would be to feed this data into Tableau and create formulas that way. But I don’t have Tableau. If someone wants to donate a license to me for the sake of the blog, LMK.

Protected: Saving Money with Visualforce Templates — November 9, 2020