Tay
Many readers will probably be familiar with the recent news story about Microsoft’s AI bot, Tay, that was taught to say hateful things by users (see Microsoft silences its new A.I. bot Tay, after Twitter users teach it racism).
What is Unsupervised Learning?
In a nutshell, unsupervised learning means “let the data do the talking”: instead of training on labeled examples, it discovers patterns and relationships already present in the data.
Common approaches
- Latent variable models, which transform the observable variables into latent (hidden) variables that are inferred rather than directly observed
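To make the latent-variable idea concrete, here is a small sketch (not from the original post) using PCA, one of the simplest latent-variable techniques: five observed variables are simulated from just two hidden factors, and PCA infers directions that explain nearly all of the variance.

```python
import numpy as np

# Simulate 200 observations of 5 observed variables that are really
# driven by just 2 hidden (latent) factors, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
observed = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

# PCA via SVD of the mean-centered data: the right singular vectors are
# the inferred latent directions, ordered by explained variance.
centered = observed - observed.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

explained = singular_values**2 / (singular_values**2).sum()
print(explained[:2].sum())  # close to 1.0: two directions explain nearly everything
```

Nothing in this sketch was told which variables mattered — the two-factor structure was inferred from the data alone.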
Examples of systems using Unsupervised Learning that you already know
- Google’s understanding of foreign languages (see Wikipedia’s article on Google Translate) and synonyms (see How Does Google Use Latent Semantic Indexing (LSI)? [from 2005])
- Amazon’s “you may also like…”
- Netflix’s recommender system
Two places your business can quickly benefit from unsupervised learning
Lead Scoring
The problem
About half the companies with which Stand Sure works have some form of lead scoring in place. Lead scoring normally means using data to prioritize which prospects and opportunities get salesperson [human?] attention. Prospects with a score above a certain threshold get called on; those below it may receive marketing emails but generally do not get as much human attention [unless there are too few “good” leads].
Every customer lead scoring program that we have encountered to date suffers from the same problem: unreliability -- the scores do not align with outcomes, and the sales team loses faith in the scoring.
The reason for this unreliability is that organizations tend to score leads using explicit factors (some form of BANT (budget, authority, needs & timing) or RWA (ready, willing & able)). These factors are all necessary for a sale, but they are not sufficient for predicting that one customer is more likely to buy than another. Indeed, in a lot of organizations, the data for these factors is not collected until after a customer has interacted with a salesperson.
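A typical explicit-factor program can be sketched as follows. The factor names, weights, and threshold below are illustrative assumptions, not a real client rubric:

```python
# Hypothetical explicit BANT-style scoring: each confirmed factor adds points,
# and leads at or above a threshold get salesperson attention.
# The weights and threshold are illustrative assumptions, not a real rubric.
WEIGHTS = {"budget": 30, "authority": 25, "need": 25, "timing": 20}
THRESHOLD = 60

def score(lead):
    """Sum the weights of the factors this lead has confirmed."""
    return sum(weight for factor, weight in WEIGHTS.items() if lead.get(factor))

lead = {"budget": True, "authority": False, "need": True, "timing": True}
print(score(lead))               # 75
print(score(lead) >= THRESHOLD)  # True -> route to sales
```

Note that every input here is something a human has to collect and record -- which is exactly why this style of scoring tends to lag behind the actual buying signal.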
Consider, for example, a lead scoring model that, based on historical data, produces results like the following, where 1 = sale and 0 = no-sale:
| Actual | Model |
|--------|-------|
| 1 | 1 |
| 1 | 0 |
| 1 | 1 |
| 1 | 0 |
| 0 | 1 |
| 0 | 0 |
| 0 | 0 |
| 0 | 0 |
| 0 | 0 |
| 0 | 0 |
The team behind this model reported to management that it was 70% accurate, which is true. What it failed to report is that the model misclassified 50% of prospects that went on to buy (a 50% false negative rate (FNR)).
Still, 70% sounds pretty good…
When Stand Sure looked at the model, we computed a coefficient of determination (R²), which measures how much of the variance in the actual outcomes is explained by the model. In the example above, the R² value was 8⅓%, meaning the model explained almost none of what actually happened -- a coin toss would have done a better job. (See the Google Sheet at https://goo.gl/iNVIF8 for more statistics from this model.)
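The headline numbers can be checked directly against the table above (the R² figure comes from the linked Google Sheet and is not recomputed here):

```python
# The ten (actual, model) rows from the table above; 1 = sale, 0 = no-sale.
pairs = [(1, 1), (1, 0), (1, 1), (1, 0), (0, 1),
         (0, 0), (0, 0), (0, 0), (0, 0), (0, 0)]

# Accuracy: fraction of rows where the model matched the actual outcome.
accuracy = sum(actual == model for actual, model in pairs) / len(pairs)

# False negative rate: among actual buyers, fraction the model scored 0.
buyer_predictions = [model for actual, model in pairs if actual == 1]
false_negative_rate = buyer_predictions.count(0) / len(buyer_predictions)

print(accuracy)             # 0.7 -- the "70% accurate" headline
print(false_negative_rate)  # 0.5 -- half the real buyers were missed
```

The accuracy number is propped up by the many easy no-sale rows; the false negative rate is what the sales team actually feels.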
The solution
- More variables
- Sign-up page
- Session count
- Sign-up day of week
- Sign-up time of day
- Number of pages viewed in the sign-up session
- Clickstream cluster analysis
- ALL web users were assigned to data-driven groups
- For web users who signed up and became leads, cluster assignment was added to the data (it turns out that users that behave the same way online tend to have similar behaviors offline -- in this case, the leads from certain clusters were MUCH more likely to buy).
- The company set unit sales records.
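A minimal sketch of the clustering step is below. The clickstream features, values, and two-cluster structure are invented for illustration -- they are not client data -- and the real engagement would have used more features and more clusters:

```python
import numpy as np

# Illustrative (invented) clickstream features per web user:
# [session count, pages per session, seconds on the sign-up page].
rng = np.random.default_rng(42)
browsers = rng.normal([2, 3, 20], [1, 1, 5], size=(100, 3))       # casual visitors
researchers = rng.normal([8, 12, 90], [2, 3, 15], size=(100, 3))  # heavy researchers
users = np.vstack([browsers, researchers])

def kmeans(data, k, iters=50):
    """Minimal k-means: assign points to the nearest centroid, then re-average."""
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels

labels = kmeans(users, k=2)
# As described above, the cluster label then becomes one more column in each
# lead's record -- a data-driven feature the explicit BANT factors never capture.
```

No one told the algorithm what a “browser” or a “researcher” looks like; the groups fall out of the data, which is the whole point of the unsupervised approach.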
Conversion Rate Optimization (a.k.a. Fixing bad website User Experience (UX))
The problem
Most websites ____! Erm, most websites are designed from a vertical-centric point of view and not from a customer-centric POV [if you are contemplating a website redesign, it is generally best NOT to work with someone who makes websites for others in your industry -- you will end up in a “sea of sameness” and a set of pages that sells well to people in your industry, rather than to your customers].
Anecdotally, 90+% of the pages on most sites are visited by only a small fraction of users.
Users get confused & frustrated and leave.
The solution
- Association analysis (this technique is normally used for shopping cart analysis -- what items are commonly bought together)
- Pages that were commonly consumed together were identified
- Where it appeared to the business that the group of pages indicated confusion, content and navigation changes were made
- Pages that had high site exit rates were simplified and users were encouraged via navigation cues to get back on a non-exit trajectory.
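Basic association analysis can be sketched as co-occurrence counting over sessions. The page names and sessions below are invented for illustration:

```python
from collections import Counter
from itertools import combinations

# Invented page-view sessions (one list of pages per visitor session).
sessions = [
    ["pricing", "features", "contact"],
    ["pricing", "features"],
    ["blog", "features"],
    ["pricing", "contact"],
]

# Count how often each pair of pages appears in the same session --
# the raw "support" count behind basic association (market-basket) analysis.
pair_counts = Counter(
    pair
    for session in sessions
    for pair in combinations(sorted(set(session)), 2)
)

# High-support pairs show which pages users consume together; those groups
# were the candidates for the content and navigation changes described above.
print(pair_counts.most_common(2))
```

Real association-rule mining adds confidence and lift on top of these support counts, but even raw co-occurrence is enough to surface page groups worth reviewing.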
- Once the user experience (UX) was improved via the incremental changes suggested by association analysis, clustering analysis was performed.
- The conversion rate to lead for each cluster/group was assessed.
- Personas were created to describe the groups based on what content was most consumed by the members of the group.
- The page content and navigation were updated to better align with the perceived user intent.
- New lead collectors were created to more closely align with the user intent
- Downstream email marketing campaigns were designed to keep providing the members of each group with useful content
- Outcome - exponential lead growth [the image below shows quarterly lead counts -- the digital marketing budget remained constant throughout the period shown]