Question

Topic: Research/Metrics

RFM Scoring Questions

Posted by Anonymous on 500 Points
Hello, I have two questions regarding RFM analysis.

1. When performing RFM analysis, quintiles are used to assign the scores of 1 - 5 for Recency, Frequency, and Money. I'm wondering, why are quintiles used? Why not create 5 groups using k-means analysis and then assign scores of 1 - 5 based on the ranges created for those groups? It seems to me a more logical grouping of people, so the scores become more meaningful.

2. What should be used for Recency for monthly subscription type of products/services? I understand Frequency and Money, but it seems Recency doesn't come to play in this equation.

Thank you in advance for the responses.
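For concreteness, the quintile scoring in question 1 can be sketched in plain Python. This is a minimal, illustrative version (the data and variable names are made up, not from any particular tool): customers are ranked on one variable and the ranks are cut into five equal-count groups.

```python
def quintile_scores(values):
    """Assign scores 1-5 by rank quintile: the bottom fifth of values
    scores 1, the top fifth scores 5 (ties broken by input order)."""
    n = len(values)
    # Indices sorted from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        # rank 0..n-1 maps onto quintiles 1..5.
        scores[i] = rank * 5 // n + 1
    return scores

# Example: monetary values for ten customers.
spend = [120, 45, 300, 80, 60, 500, 150, 90, 220, 30]
print(quintile_scores(spend))  # → [3, 1, 5, 2, 2, 5, 4, 3, 4, 1]
```

A k-means alternative would cluster the same values into 5 groups of unequal size and score by cluster mean; the trade-offs are discussed in the responses below.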

RESPONSES

  • Posted by koen.h.pauwels on Accepted
    Very logical questions - are you an engineer who went into marketing :-)?

    Short answer to your questions: because quintiles and recency have been used before and are easy to understand. RFM analysis grew out of practical insights accumulated over the last three decades:
    1) the top quintile is 3 times more responsive to your direct marketing actions than the next quintile
    2) customers who bought more recently are much more likely to buy from you again than customers who did not

    The following website provides more details:
    https://www.dbmarketing.com/articles/Art149.htm

    Of course this does not mean we shouldn't try to improve current practice. For one, RFM analysis often seems to mix up responsiveness with profitability (you are right that only frequency and money enter the profitability equation). To me, a 2x2 juxtaposing profitability and responsiveness to marketing actions is more insightful.
  • Posted by steven.alker on Accepted
    Hi Dan

    I started to write a reply to this and realised that there was some info missing regarding the de-facto use of quintiles as a strictly defined statistical term, rather than it just being an arbitrary choice.

    I think that the number of divisions you split a sample into (5 in this instance) is probably arbitrary, but the use of 5 equal measures for RFM purposes is far from arbitrary.

    In fact I believe it can be shown that equal divisions of a linear scale drawn from the variable data are better at predicting purchasing behaviour. Weightings and non-linear divisions all produce less useful results. I’ve got a paper somewhere on this and I’ll try to find it for tomorrow.

    Best wishes



    Steve Alker
    Xspirt
  • Posted by steven.alker on Accepted
    Dear Dan

    You’ve got a couple of fairly decent answers to your question already, but I’m wondering if there’s something more you want to get at. One point needs to be added to the statistical views already offered: RFM analysis is an attempt to predict future buying behaviour from existing buying records. Concentrate on the fact that it is a predictor of behaviour. It is therefore advantageous for the analysis to produce the sharpest and clearest differentiation between existing customers, even if the parameters chosen to segment them appear somewhat arbitrary.

    Actually, the use of a statistical quintile, even whilst admitting that it is an arbitrary division of the data, is quite important. Other means of splitting up a sample of customer data, such as weightings or non-linear scales, produce less useful data for RFM scoring simply because they fail to clearly differentiate the groups most likely to make a purchase.

    There is some anecdotal merit based on the 80:20 rule, which is after all what a quintile expresses when written as a ratio. Empirically, a body of data on recency, frequency and monetary value of transactions is best segregated by buying behaviour (or likely buying behaviour) when you use a linear scale of 5 equal bands across the data you hold. That is the most significant point about RFM analysis – the bands must be equal, regardless of other interactions, and you should certainly resist any tendency to apply Bayesian statistics simply because you know more about the relationship between the variables than the plain figures imply.
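    One reading of the "5 equal bands on a linear scale" described above is equal-width bands over the observed range (as opposed to the equal-count quintiles mentioned elsewhere in the thread). A minimal sketch of that reading, with illustrative numbers:

```python
def equal_width_scores(values):
    """Split the observed range into five equal-width bands and score
    1 (lowest band) to 5 (highest band)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / 5
    scores = []
    for v in values:
        # Which band does v fall into?  The maximum value lands in band 5.
        band = int((v - lo) / width) + 1 if width else 1
        scores.append(min(band, 5))
    return scores

# Example: days since last purchase for six customers.
print(equal_width_scores([0, 10, 20, 30, 40, 50]))  # → [1, 2, 3, 4, 5, 5]
```

    Note the difference: equal-width bands can hold very unequal numbers of customers when the data are skewed, whereas quintiles always hold a fifth each.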

    An early attempt to weight RFM analysis consisted of splitting the three variables into quintiles of unequal width – for example, bands defined by the number of items or the value of purchases within each band, rather than the top fifth, second fifth and so on. This does not produce as consistent a segmentation when it comes to predicting future buying behaviour. There is a decent practical explanation of some of this from Arthur Middleton Hughes, an acknowledged expert in database marketing, in the middle of the following presentation:

    https://www.dbmarketing.com/Speeches/Non_Profit_Feb_4_2005.ppt

    The whole point about RFM analysis and practice appears to be that, done in the recommended way, it produces consistent results. There is a lack of analysis showing from first principles that it is the best way to use past behaviour as a predictor of future behaviour, and there may well be better models for certain instances. One thing in favour of the equal-quintile approach on three variables is that it forces the answers into arbitrarily sharp groups, where you know that a 555 score is going to be a better predictor of future buying behaviour, on average, than a 333 score.

    It can also benefit from the sensitivity analysis shown in the paper referred to by Koen. By plotting each of the three variables independently over its five quintiles, the one with the steepest gradient will have the best ability to differentiate buying behaviour. The cell scoring used (111 to 555) is essentially a vector product, which means that the most profitable groups of potential customers are more clearly defined by the quintile scale than by any other comparably simple way of cutting and analysing the data. This is clearly shown in the analysis from Hughes.
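    That sensitivity check can be sketched numerically rather than graphically. This is a hypothetical illustration, not from the Hughes paper: for each variable we take the average response rate per quintile and compare the overall slopes; the made-up rates below are purely for demonstration.

```python
def steepest_variable(response_by_quintile):
    """Return the variable whose response rate climbs most steeply across
    its quintiles (index 0 = quintile 1 ... index 4 = quintile 5)."""
    def gradient(rates):
        # Simple overall slope: rise from quintile 1 to quintile 5.
        return (rates[-1] - rates[0]) / 4
    return max(response_by_quintile,
               key=lambda v: gradient(response_by_quintile[v]))

# Illustrative response rates by quintile for each RFM variable.
rates = {
    "recency":   [0.01, 0.02, 0.04, 0.07, 0.12],
    "frequency": [0.02, 0.03, 0.04, 0.05, 0.06],
    "monetary":  [0.02, 0.02, 0.03, 0.05, 0.08],
}
print(steepest_variable(rates))  # → recency
```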

    As to the second part of your question, the best measure is again found by looking at the figures. Subscriptions are handled by looking at the total monetary expenditure over a consistent period of time. Recency is scored by looking at the last renewal and is again examined in the paper by Hughes.
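    As a sketch of that subscription measure (the dates here are illustrative), days since the last renewal can serve as the raw Recency value, which is then banded like any other variable:

```python
from datetime import date

def recency_days(last_renewal, today):
    """Recency for a subscription product: days since the last renewal."""
    return (today - last_renewal).days

today = date(2024, 6, 1)
last_renewals = {"A": date(2024, 5, 20), "B": date(2023, 12, 1)}
print({c: recency_days(d, today) for c, d in last_renewals.items()})
# → {'A': 12, 'B': 183}
```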

    One thing I like about this type of analysis is that you can carry it out today, for an ongoing project, but if you have enough historic data, you can check the validity of the technique from your existing figures. If an RFM analysis of the first four of five years of data accurately predicts what the fifth year yields, you have a rough idea that it is likely to be on target when choosing the sectors to approach in the sixth year!
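    That back-test can be sketched as a simple holdout check. The function and data names here are illustrative assumptions: score customers on the earlier years, then see how many of the top-scoring group actually bought in the held-out final year.

```python
def holdout_hit_rate(train_scores, holdout_buyers, top_fraction=0.2):
    """Rank customers by their RFM score from the earlier years and return
    the fraction of the top-scoring group who bought in the holdout year."""
    ranked = sorted(train_scores, key=train_scores.get, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    hits = sum(customer in holdout_buyers for customer in ranked[:k])
    return hits / k

# Scores from years 1-4, and the set of customers who bought in year 5.
scores = {"a": 25, "b": 20, "c": 9, "d": 4, "e": 1}
year5_buyers = {"a", "c"}
print(holdout_hit_rate(scores, year5_buyers, top_fraction=0.4))  # → 0.5
```

    If the hit rate in the top group is clearly higher than the base rate, the scoring has some predictive value for the coming year.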

    I’ve also looked at some alternatives, and whilst I have not yet produced any verifiable or reviewable results, I am coming to the belief that using neural networks to predict buying behaviour is also capable of producing results. The big problem with this technique is that no one, including the makers of neural-network software, knows which algorithm it will plump for to tie results to variables; it could be using some esoteric maths which happens to work, or it could be using quintiles! Have a look at NeuralWare as an example of a simple table-based neural-networking tool. Simply shovel in historic data, connect the variables to the historic outcome and ask it to predict the next year’s results.

    Again, you can do this with six years’ worth of data, using the first five years to predict the sixth, and once you are confident that this works, ask it for the seventh. It’s a brave marketing statistician who uses such techniques, because unless you go quite deeply into an analysis of the mechanism, no one has a clue what it’s doing apart from evolving the best fit!

    I hope that this is of some use in your quest.


    Steve Alker
    Xspirt
  • Posted on Author
    Thank you all for the responses. It sounds like I may be expecting too much from one scoring method.

    Our ultimate goals are twofold:

    1. Find our BEST customers and rank them by scoring them (so we know which are VIP, etc)
    2. Analyze our database to profile what constitutes our best customers so we can find more of them (externally).

    I tried scoring both ways: using quintiles and using k-means analysis. The k-means analysis gave me more meaningful scores for Goal #1, at least visually. The way I scored them was R x M, so the lowest score was 1 (1x1) and the highest was 25 (5x5). There weren't too many 25s...
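    A minimal sketch of that R x M scoring, with rank-based quintile scores and illustrative numbers:

```python
def scores_1_to_5(values):
    """Rank-based quintile scores: bottom fifth scores 1, top fifth 5."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    s = [0] * n
    for rank, i in enumerate(order):
        s[i] = rank * 5 // n + 1
    return s

def r_times_m(recency_scores, money_scores):
    """The R x M composite described above: 1 (1x1) up to 25 (5x5)."""
    return [r * m for r, m in zip(recency_scores, money_scores)]

# Illustrative: recency already scored so that 5 = most recent.
r = [5, 1, 4, 2, 3]
m = scores_1_to_5([500, 30, 220, 80, 120])
print(r_times_m(r, m))  # → [25, 1, 16, 4, 9]
```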

    Then I did it again the same way using quintiles, and there were a lot of 25s, which didn't really make sense, because our best customers really stand out even without scoring in terms of revenue and time.

    Perhaps there is a better way to model what I am trying to achieve?
  • Posted by joy.levin on Accepted
    One other possibility you might want to try is multiple regression analysis for lifetime value - there may be other variables in your database that are predictive of lifetime value.

    In addition to regression, there are also other models you might try:

    https://hbswk.hbs.edu/archive/1436.html

    Also, here's an article that compares lifetime value with RFM measurements:

    https://www.marketingnpv.com/articles/research/efficacy_of_clv_measure

    Good luck!
  • Posted on Author
    Thanks for your responses!
