A Guide for Customer Retention Analysis with SQL

Customer retention curves are essential to any business looking to understand its clients, and will go a long way towards explaining other things like sales figures or the impact of marketing initiatives. They are an easy way to visualize a key interaction between customers and the business.

Other techniques for customer retention

There are, of course, other ways to think about customer retention analysis. In particular, we might want to look at how to deal with returning customers: if in p1 there are 100 customers and in p2 80 of them return, in p3 do we want to keep the original 100 or look only at the 80?

In the above example, I talked only about period-to-period retention, but I also classified those who were “laggers,” or who took more than one month to return. A business may want, through targeted marketing efforts, to reward customers who keep a monthly visit pattern, or perhaps, to encourage those who visit less than once a month to come back more often.

So, another way to look at it would be to look at what proportion of our visitors in any given month are retained, how many are returning, and how many are new. In this case, we would want to change our perspective slightly, and look not at the next visit, but at the previous one.

Time_lapse_2 AS

    SELECT cust_id,


           lag(visit_month, 1) over (partition BY cust_id ORDER BY cust_id, visit_month)

     FROM visit_log

Time_diff_calculated_2 AS

    SELECT cust_id,



           visit_month — lag AS time_diff

     FROM time_lapse_2

Custs_categorized AS

    SELECT cust_id,



                    WHEN time_diff=1 THEN ‘retained’,

                    WHEN time_diff>1 THEN ‘returning’,

                    WHEN time_diff IS NULL THEN ‘new’

           END AS cust_type

FROM time_diff_calculated_2

So then, it’s simple to count the number of each type of customer in any given month.

SELECT visit_month,



FROM custs_categorized



With a simple export and graph, you will get something that looks like this:

Data visualized with Statsbot


Cohort tables

A popular way to analyze customer retention is through the use of cohorts, i.e. defining each user by their first visit, and then seeing how they return over time.
Ideally, our end product will be a table like this:

Well, actually, ideally, our table will show that the number of new users is increasing every month, and that retention is improving over time. But you get the idea!

In order to build this, we need first to establish the number of new users for each month, and the amount of time that they stick around. Then, getting the percentages will be easy.

Our first subquery will classify each customer according to their initial visit, and the length of time that they are retained.

first_visit AS

    SELECT cust_id,

           min(visit_month) AS first_month

    FROM visit_log

    GROUP BY 1

By month, we can then calculate the number of new users.

New_users AS

    SELECT first_month,

           count(DISTINCT cust_id) AS new_users

    FROM first_visit

    GROUP BY 1

We also want to get a picture of whether a new user in, say, month 1, returned in both month 2 and 3 or only in month 3. So, we’ll create a subquery that allows us to track the visit pattern of each customer.


FROM visit_log

     LEFT JOIN visit_log AS visit_tracker

            ON visit_log.cust_id = visit_tracker.cust_id

               AND visit_log.visit_month < visit_tracker.visit_month

Then, we’ll combine these to get a lovely final product. For each month a new user is recruited, we’ll have the size of their group, as well as, for each subsequent month, the percentage of those users who returned. Just like the table above!

In order to do that, we run the following script:

SELECT first_month,






           SELECT first_month,


                  Count(DISTINCT visit_tracker.cust_id) AS retained count(DISTINCT visit)tracker.cust_id)/new_users AS retention_percent

FROM      first_visit

LEFT JOIN visit_log AS visit_tracker

ON        visit_log.visit_month < visit_tracker.visit_month

LEFT JOIN new_users

ON        new_users.first_month=first_visit.first_month



This would give us a table which looks like this:

We can then do a pivot table in Excel (or similar) in order to get the table above, or use a business intelligence tool such as Statsbot to get data right from our database without coding SQL.


Producing such a table is, of course, only the first step, and a pretty useless one at that if we don’t know how to interpret the numbers.

Taking our original table again, I’ll run through the basics of cohort analysis.

First of all, we’re going to look at the number of new users per month. Are we managing to recruit a healthy number of clients for our business? Here we can see that we are recruiting between 500 and 600 new clients per month. Is this what we had aimed to do? This is a discussion that should be had with the marketing team.

Then, is the number of new clients per month rising? If not, why? Are there peaks and troughs? So, for example, we can see that there were only 504 new customers in period 6, but 583 in period 5 and 598 in period 7. These are huge swings in the numbers. If there is something to explain it, like we’re a business that caters to school groups and there was a week of school vacation during that period, fine. But we need to have an explanation.

After that, the actual retention percentages. One business objective is going to be to improve our retention over time, i.e. to have an upward trend as we go down the lines. So, if we retain a higher percentage of new users who joined in period 2 than those who joined in period 1, we are on track.

As we see in the table above, this is what happened: after 1 month, 63% of new users from period 1 returned, but 68% of new users from period 2 returned. So we’re doing well! Basically, our customer experience should be getting better over time, so that people are more likely to come back.

After that, another metric we will look at is the rate at which we lose people. It is expected that, over time, retention will fall. This is the reason that we always need to keep recruiting! But, if we can slow customer attrition, so that the fall between each period gets smaller, then we can spend more resources on serving client needs, and not just on trying to find new clients. This, in turn, will help retention, as clients will like that we are paying more attention to them.

Frequency of visits

One element that I have neglected until now is frequency of visits, which is the other side of customer retention analysis. It is generally accepted that a customer who remains loyal to your business is worth much more than a new customer, or even one whose engagement is patchy (i.e. returning customers).

Adding a count of visits per customer, per month, or average lag time will complete the picture of how customers are interacting with your business. You can also build cohorts with the amount of customers, revenue from returning customers, etc.


Final thoughts

Customer retention analysis will add depth to any business analysis, and allow decision makers to track not only the success of their recruitment strategy, but also how well they are performing in terms of customer experience. If your clients don’t come back, then there’s something to be worked on, either in terms of the quality of your product, or in terms of the relationship that you have with your customers. Retention analysis allows for easy flagging of this type of issue.

Bio: Luba Belokon is Content Jedi at Statsbot.

Original. Reposted with permission.