MT-ing My Head

Welcome to my soapbox of Salesforce technical writing, music, and miscellaneous stuff

An Intro to Predictive Modeling for Salesforce Peeps — April 16, 2021

An Intro to Predictive Modeling for Salesforce Peeps

I wanted to write about prediction because it’s seemingly everywhere: in marketing emails, in real estate, and even in criminal justice (blah).

An understanding of prediction and modeling can be pretty beneficial in business, however. Think lead scoring for new prospective clients or donors, employee attrition, or revenue forecasting. By understanding the capabilities of prediction, it might help us predict what data to collect in order to make more informed decisions.

Salesforce has released some A.I. products over the last few years under various names: Einstein, Tableau CRM, Wave Analytics. We can also create models ourselves, using different programming languages such as R and Python, or even in Excel.

First, a quote.

“All models are wrong, but some are useful”

George E. P. Box

Human nature can be unpredictable. We are also biased and have blind spots. I recently watched the documentary Coded Bias on Netflix, which showed how bad models can affect real people. So handle prediction with care.

So how does prediction work?

A predictive model tells us, based on existing criteria, what outcome is expected. For instance, what is someone’s income based on their race, gender, and profession?

Think of this as a mathematical formula similar to the ones we learn in high school for a straight line or a parabola. The inputs are x and the outputs are y.

y=mx+b

y=x^2

y=1/x

To build a predictive model, we use mathematical methods to find a formula that provides an output that is as close as possible to the real output for the greatest number of records (e.g. people, schools, whatever).

We can use both categorical (e.g. picklists) and numeric values in prediction.

So let’s say we want to predict a donor’s 2021 donation amount. We can use a method called regression.

In order to build a model to predict someone’s next donation amount, we need a “training data set” – a report of donors who have made 2021 Donations. We choose fields we think may be relevant (called independent variables, e.g. Age, Gender, Last Donation Amount, and Average Donation Size) and a “target” dependent variable (a field called 2021 Donation Amount.)

Then, we use a tool (Excel, R, Python, or a calculator with a huge memory :P) to create the model. Our goal is to get the most accurate model, so we may remove variables that don’t add that much value.

Our final output might look like this:

ŷ (this is the predicted value of 2021 Donation Amount) = 10 (this is the y-intercept) + 1.4*Age + 1.5*GenderFemale(true=1 or false=0) + 1.02*LastDonationAmount + 0.01*AverageDonationSize

So to predict the next donation amount of a donor who is:

  • Age = 45
  • Gender = Male
  • LastDonationAmount=$200
  • AverageDonationSize=$250

We would calculate the predicted value here:

ŷ = 10 + (1.4*45) + (1.5*0)+(1.02*200)+(.01*250)

ŷ = $279.50

NOTE: This is not a real model, and I am also oversimplifying.

From this result, we can see that the predicted donation amount for 2021 is higher than the donor’s last donation amount and average donation. Perhaps, in our fictional world, donors are very generous in 2021.

Conclusion and Disclaimer

I am passionate about educating people on how different technology works and hopefully saving them some money. However, I am a student and I am writing about what I am learning in class at the current time, and I am not an expert.

If you’re interested in learning more, check out online courses on modeling with R or Python, and then apply these skills to your Salesforce data. Courses will be able to explain some of the road bumps to look out for (e.g. multicollinearity, non-normality) and go more into detail about how to figure out if your model is actually a reliable one.

The example above is what is called linear regression. Logistic regression is another type of model…a way to predict the likelihood of an event. A famous example is whether or not someone died on the Titanic, which, spoiler alert: given a number of variables, is predicted best gender (“women and children” first) and ticket level (a proxy for wealth). I don’t know which method would be quicker for me: watching the 2 1/2 hour Kate Winslet and Leonardo DiCaprio movie or trying to build a model in R, but I digress…

Happy modeling!

How I Fixed It: Improperly Formatted Contact Names — November 7, 2020

How I Fixed It: Improperly Formatted Contact Names

This is a series of posts about data cleanup in Salesforce. The goal is to IDENTIFY, FIX, AND PREVENT data issues.

The Problem

We have a series of forms (for making a donation or signing up for a program, for instance) that create Contact records in Salesforce. Some people type in all caps (e.g. BOB ROSS) or all lowercase (e.g. bob ross) and the records end up reflecting that.

Business Implications

  • Merge Fields in Mailers – A mass email or appeal letter will end up saying “Dear bob,” rather than “Dear Bob,”
  • Reporting to Funders – When we need to report program data to funders on a granular (constituent-based) level, the development team has to spend time formatting the names properly before sharing the report.

What I Did

Identifying the Records

After some exploratory data analysis and trial and error, I created two checkbox formula fields on the Contact called “Contact First Name Bad Format” and “Contact Last Name Bad Format” (I know, I know…silly field labels).

First Name Field

AND(LEN(FirstName)>2,NOT(CONTAINS(FirstName,".")),
OR(FirstName=LOWER(FirstName),FirstName=UPPER(FirstName)))

Last Name Field

AND(LEN(LastName)>1,NOT(CONTAINS(LastName,".")),
OR(LastName=LOWER(LastName),LastName=UPPER(LastName)))

These formulas looks for Contacts with the First Name or Last Name in all uppercase or lowercase. It ignores initials such as “DJ” for first names or “C.” for last names. People putting initials instead of their full last names into fields is a different issue….

Fixing the Records

I ran a report of all of the Contacts where the “Contact First Name Bad Format” OR “Contact Last Name Bad Format” fields were true. I made sure to include the 18-digit Case-Safe Contact ID in this report.

Then I exported the report, used the PROPER() function on Excel to change the first and last names to proper formatting, and used Apex Data Loader to update these (400 or so) records.

Preventing the Issue Moving Forward

To prevent this issue from happening in the future., I created two workflow rules using the a) the criteria in the above formula fields and b) field updates witht he solution mentioned in this post.

This workflow rule will detect if a person’s name is in all uppercase or lowercase letters and edit it accordingly:

Criteria

See formula fields above

Evaluation Criteria

Evaluate the rule when a record is created, and any time it’s edited to subsequently meet criteria

Actions

  • Field Update: Update First Name (Proper)
    • Field: First Name
    • Code: UPPER(LEFT(FirstName, 1)) & LOWER(RIGHT(FirstName, LEN(FirstName) – 1))
  • Field Update: Update Last Name (Proper)
    • Field: Last Name
    • Code: UPPER(LEFT(LastName, 1)) & LOWER(RIGHT(LastName, LEN(LastName) – 1))

Testing the Workflow Rules

First I manually created Contacts with the above names:

Contact NameExpected Result
JJ AbramsJJ Abrams
AUDRE LORDEAudre Lorde
audre lordeAudre Lorde
F. FITZGERALDF. Fitzgerald

Other Tests

TestExpected Result
Edit Contact manually with one of above namesFix name
Fill out the donation form with one of the above namesFix name and send acknowledgement with fixed name included
Fill out a program form with one of the above namesFix name
YESSSS! It worked.

What’s Left to Do

Tweak the field update to be more precise – There’s still some room for error here, since the formula only capitalizes the first letter of each field. Some exceptions that this would not update properly would be:

  • Gordon-Levitt
  • McDonald
  • DiFranco

Conclusion

Overall, this took about two hours to figure out and will likely save more than two hours of annoyance and manual editing in the future.

Feel free to comment with any questions or other ways you’ve solved this.