21 December 2015

Understanding Marginal Contribution and Shapley Values


Since we published our article Better Attribution: Using Clickstream Data and Shapley Analysis to Get More Accurate CPA & ROAS, we have received several questions about marginal contribution and Shapley values.


Marginal Contribution

A straightforward way to understand marginal contribution is to consider how to allocate the cost of building a new runway among four aircraft that need different runway lengths.

Aircraft   Runway Required (units)
A          8
B          11
C          13
D          18

[Figure: Runway Needs]

Aircraft D is the only one that needs the last 5 runway units (units 13 to 18).

Aircraft C and D are the only ones that need the 2 units before that (units 11 to 13).

B, C and D share the next 3 units (units 8 to 11).

All four aircraft need the first 8 units.

One way to allocate cost is to take the marginal cost (MC) for each segment and divide it by the number of beneficiaries.

Aircraft                 A+B+C+D   B+C+D   C+D   D
MC                       8         3       2     5
# Aircraft benefiting    4         3       2     1
Cost per aircraft        2         1       1     5

Thus, the cost/value allocated to each aircraft is as follows:


             A+B+C+D   B+C+D   C+D   D
Cost to A    2
Cost to B    2         1
Cost to C    2         1       1
Cost to D    2         1       1     5

Summing each row gives an equitable assignment of runway cost to each aircraft:

Assigned Cost
             A+B+C+D   B+C+D   C+D   D    Total
Cost to A    2                            2
Cost to B    2         1                  3
Cost to C    2         1       1          4
Cost to D    2         1       1     5    9
Total                                     18
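The segment-by-segment split above can be sketched in a few lines of Python. This is a minimal illustration, assuming one unit of cost per unit of runway (as the MC row implies); the function name allocate_runway_cost is our own, hypothetical choice.

```python
from fractions import Fraction

# Runway length required by each aircraft (units), from the first table.
required = {"A": 8, "B": 11, "C": 13, "D": 18}

def allocate_runway_cost(required):
    """Split each runway segment's marginal cost equally among the aircraft
    that need it, then total each aircraft's shares."""
    cost = {plane: Fraction(0) for plane in required}
    previous = 0
    # Walk the distinct runway lengths from shortest to longest.
    for length in sorted(set(required.values())):
        segment = length - previous  # marginal cost (MC) of this segment
        users = [p for p, need in required.items() if need >= length]
        for p in users:
            cost[p] += Fraction(segment, len(users))
        previous = length
    return cost

costs = allocate_runway_cost(required)
print({p: int(c) for p, c in costs.items()})  # {'A': 2, 'B': 3, 'C': 4, 'D': 9}
print(sum(costs.values()))                    # 18
```

Using Fraction keeps the shares exact, which matters when a segment does not divide evenly among its users.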

Shapley Value

A participant's Shapley value is its marginal contribution (cost or benefit) averaged over every possible order in which the participants could join. It is normally used in scenarios where the players can participate in different orders.
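Written out, that average takes the following form. The notation is ours, not the original article's: N is the set of n players, Π(N) is the set of all n! orderings, P_i^π is the set of players that come before player i in ordering π, and v(S) is the value (or cost) of a group S. The second line applies it to the Left glove in the glove game described later, taking its six rows in table order.

```latex
\phi_i = \frac{1}{n!} \sum_{\pi \in \Pi(N)} \Big[\, v\big(P_i^{\pi} \cup \{i\}\big) - v\big(P_i^{\pi}\big) \Big]

\phi_L = \frac{1}{3!}\,(0 + 0 + 1 + 1 + 1 + 1) = \frac{4}{6} = \frac{2}{3}
```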

Thus, in a 4-player scenario, all 4! = 24 orderings (permutations) must be considered when calculating the Shapley value:

ABCD
ABDC
ACBD
ACDB
ADBC
ADCB
BACD
BADC
BCAD
BCDA
BDAC
BDCA
CABD
CADB
CBAD
CBDA
CDAB
CDBA
DABC
DACB
DBAC
DBCA
DCAB
DCBA
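Rather than writing the orderings out by hand, they can be generated with Python's itertools.permutations. As a cross-check on the runway example, the sketch below also assumes, for illustration only, that the cost of any group of aircraft is the longest runway one of its members needs. Averaging each aircraft's marginal cost over the 24 orderings then reproduces the 2/3/4/9 allocation from the segment table.

```python
from fractions import Fraction
from itertools import permutations

players = "ABCD"
orders = ["".join(p) for p in permutations(players)]
print(len(orders))  # 24
print(orders[:3])   # ['ABCD', 'ABDC', 'ACBD']

# Assumed cost of a group: the longest runway any member needs (0 if empty).
required = {"A": 8, "B": 11, "C": 13, "D": 18}

def runway_cost(group):
    return max((required[p] for p in group), default=0)

shapley = {p: Fraction(0) for p in players}
for order in orders:
    for position, plane in enumerate(order):
        before = order[:position]
        # Marginal cost of adding this plane to the planes already present.
        shapley[plane] += runway_cost(before + plane) - runway_cost(before)
shapley = {p: total / len(orders) for p, total in shapley.items()}
print({p: int(v) for p, v in shapley.items()})  # {'A': 2, 'B': 3, 'C': 4, 'D': 9}
```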

In the runway scenario, rearranging the aircraft makes no sense.

The Glove Game

In many other cases, however, order does matter.

Consider the following scenario:

You draw gloves one at a time from a box containing 1 left glove (L) and 2 right gloves (R1 and R2).

The game is won as soon as you hold a pair, that is, the left glove plus either right glove.

To figure out how much each glove contributes to the win, you have to look at the order in which the gloves are drawn.

For each of the six possible draw orders, the glove that completes the pair gets credit for the win:

Glove 1   Glove 2   Glove 3   Win Credit
L         R1        R2        R1
L         R2        R1        R2
R1        L         R2        L
R1        R2        L         L
R2        L         R1        L
R2        R1        L         L

The Shapley value of the Left glove is ⅔ (it gets credit for 4 of the 6 wins); the Shapley value of each Right glove is ⅙ (each gets credit for 1 of the 6 wins).
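The credit counting in the table can be reproduced with a short Python sketch. It assumes the win rule stated above (a hand wins once it holds L and at least one right glove) and credits the glove whose draw first completes the pair; is_pair is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import permutations

gloves = ["L", "R1", "R2"]

def is_pair(hand):
    """A hand wins once it holds the left glove and at least one right glove."""
    return "L" in hand and any(g.startswith("R") for g in hand)

credit = {g: 0 for g in gloves}
orders = list(permutations(gloves))
for order in orders:
    for i, glove in enumerate(order):
        # Credit the glove whose draw turns a non-winning hand into a winning one.
        if is_pair(order[: i + 1]) and not is_pair(order[:i]):
            credit[glove] += 1
            break

shapley = {g: Fraction(c, len(orders)) for g, c in credit.items()}
print(shapley)  # {'L': Fraction(2, 3), 'R1': Fraction(1, 6), 'R2': Fraction(1, 6)}
```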

Application to Online Advertising

The application to online advertising is straightforward.

Build a Table of Media Permutations

First, build a table with all of the orderings.

If a particular media type contributes more than once, treat each occurrence as a separate media type when building the table. To find out whether a media type contributes more than once, look at the source of the first pageview of each session, then list all of the media types that brought a particular user to your site.

Media 1   Media 2   Media 3
L         R1        R2
L         R2        R1
R1        L         R2
R1        R2        L
R2        L         R1
R2        R1        L
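A minimal Python sketch of this first step, assuming the media types have already been labeled as in the glove example, with a repeat touch from the same source given its own label (here R1 and R2):

```python
from itertools import permutations

# Media types that touched a converting user; repeat touches get their own labels.
media = ["L", "R1", "R2"]

orderings = list(permutations(media))
print("Media 1\tMedia 2\tMedia 3")
for row in orderings:
    print("\t".join(row))
```

Because the input list is in the same order as the table's columns, permutations yields the rows in the same order as the table.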

Outcome Table

Next, record the outcome after each step, entering a "1" if a win (conversion) has occurred by that step and a "0" otherwise.

Order        Outcome @ 1   Outcome @ 2   Outcome @ 3
L, R1, R2    0             1             1
L, R2, R1    0             1             1
R1, L, R2    0             1             1
R1, R2, L    0             0             1
R2, L, R1    0             1             1
R2, R1, L    0             0             1

[In advertising, there will normally be some wins in column #1, and the number of wins will generally increase from left to right. Because we are extending the glove game example, column #1 never has a win, and column #3 simply repeats any win already recorded in column #2. Please see our previous article for a more realistic table.]
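The outcome table can be generated the same way. This sketch assumes a hypothetical converted() rule borrowed from the glove game (a win needs L plus any R); with real clickstream data, the outcomes would come from observed conversions instead.

```python
from itertools import permutations

media = ["L", "R1", "R2"]

def converted(touches):
    """Hypothetical win rule from the glove game: L plus at least one R."""
    return int("L" in touches and any(m.startswith("R") for m in touches))

outcomes = {}
for order in permutations(media):
    # Outcome @ k: 1 if the first k touches already produce a win, else 0.
    outcomes[order] = [converted(order[:k]) for k in range(1, len(order) + 1)]

for order, row in outcomes.items():
    print(", ".join(order), row)
```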

Marginal Contribution

Next, compute the marginal contribution of each step: for every column after the first, subtract the preceding column's value from the current column's value (the first column is simply carried over).

Order        Outcome @ 1   Outcome @ 2   Outcome @ 3
L, R1, R2    0             1             0
L, R2, R1    0             1             0
R1, L, R2    0             1             0
R1, R2, L    0             0             1
R2, L, R1    0             1             0
R2, R1, L    0             0             1

If you are using Google Sheets, the formula for this step is: =ARRAYFORMULA(F14:F19-I14:I19)
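Outside a spreadsheet, the same differencing takes one line per row. A minimal Python sketch, with the outcome table typed in from the previous step:

```python
# Outcome table (rows: orderings; columns: Outcome @ 1..3).
outcomes = [
    [0, 1, 1],  # L, R1, R2
    [0, 1, 1],  # L, R2, R1
    [0, 1, 1],  # R1, L, R2
    [0, 0, 1],  # R1, R2, L
    [0, 1, 1],  # R2, L, R1
    [0, 0, 1],  # R2, R1, L
]

def marginal_contributions(row):
    """Carry over the first column; every later column minus the one before it."""
    return [row[0]] + [row[k] - row[k - 1] for k in range(1, len(row))]

marginal = [marginal_contributions(row) for row in outcomes]
print(marginal)
```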


Figuring Out the Value of Each Media Player in Each Row

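Next, for each row, look up the marginal contribution of each media type, based on the position it occupies in that row's ordering.

Order        L    R1   R2
L, R1, R2    0    1    0
L, R2, R1    0    0    1
R1, L, R2    1    0    0
R1, R2, L    1    0    0
R2, L, R1    1    0    0
R2, R1, L    1    0    0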

If you are using Google Sheets, the formula for this step is: =SUMPRODUCT(($A14:$C14=L$13)*($I14:$K14))
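The SUMPRODUCT lookup amounts to pairing each position in an ordering with its marginal contribution, then reading the values off by media type. A Python sketch of that idea, using the tables above as input (the variable names are our own):

```python
media = ["L", "R1", "R2"]

# Orderings and their marginal contributions by position, from the tables above.
orders = [
    ("L", "R1", "R2"),
    ("L", "R2", "R1"),
    ("R1", "L", "R2"),
    ("R1", "R2", "L"),
    ("R2", "L", "R1"),
    ("R2", "R1", "L"),
]
marginal = [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]

by_media = []
for order, row in zip(orders, marginal):
    # Map each medium to the contribution at its position; this assumes each
    # medium appears at most once per ordering (repeats were relabeled).
    contribution = dict(zip(order, row))
    by_media.append([contribution[m] for m in media])

print(by_media)
```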

Shapley Value

Finally, sum each column and divide by the number of rows (orderings); the result is the Shapley value of each media type.

Order           L      R1     R2
L, R1, R2       0      1      0
L, R2, R1       0      0      1
R1, L, R2       1      0      0
R1, R2, L       1      0      0
R2, L, R1       1      0      0
R2, R1, L       1      0      0
Shapley value   0.67   0.17   0.17

If you are using Google Sheets, the formula for this step is: =SUM(L14:L19)/ROWS(L14:L19)
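The final averaging step, sketched in Python with exact fractions so that the ⅔ and ⅙ values are rounded only for display:

```python
from fractions import Fraction

media = ["L", "R1", "R2"]
# Per-media marginal contributions for each ordering, from the table above.
by_media = [[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]

# Shapley value: column sum divided by the number of orderings (rows).
shapley = {
    m: Fraction(sum(row[j] for row in by_media), len(by_media))
    for j, m in enumerate(media)
}
print({m: round(float(v), 2) for m, v in shapley.items()})  # {'L': 0.67, 'R1': 0.17, 'R2': 0.17}
```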

Conclusion

Building a table of every ordering of the media that contribute to a website conversion, and then computing the marginal contribution at each step, allows you to calculate the Shapley value of each media type.

Normalizing the Shapley values (dividing each one by their total) allows you to assign a win/conversion percentage to each media type.
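In this toy example, the Shapley values already sum to 1, so normalizing simply restates them as percentages. The sketch below shows the calculation; normalize is a hypothetical helper name.

```python
from fractions import Fraction

# Shapley values from the final table, kept exact as fractions.
shapley = {"L": Fraction(2, 3), "R1": Fraction(1, 6), "R2": Fraction(1, 6)}

def normalize(values):
    """Rescale so the values sum to 1, i.e. each medium's share of the wins."""
    total = sum(values.values())
    return {k: v / total for k, v in values.items()}

shares = normalize(shapley)
for medium, share in shares.items():
    print(f"{medium}: {float(share):.1%}")  # L: 66.7%, then R1: 16.7%, R2: 16.7%
```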

As shown in our previous article, doing this lets you assign cost and value to your online efforts more accurately, which in turn supports better decisions about your marketing investments.
