Retention and churn prediction in SQL

Learn more on customer retention and how to boost business growth by comparing user groups, constructing prediction models, and combating churn.

Picture of author and link to their profile
Thibaut Collette

July 4, 2023 · 4 min read

Milk churns in the countryside

In an era where the cost of acquiring new customers is on the rise, retention has become the golden tool for businesses.

I'd like to delve into the definition of retention, explore the use of cohorts for comparing retention across diverse user groups, and finally, demonstrate how you can construct lightweight prediction models using your data warehouse (with an example from BigQuery).

This will empower you and your team to retain users and combat some leaky user buckets.

Retention defined

In simple terms, retention is the measure of how many users return to your product over time. It's the percentage of active users at any given time after X retention windows from their first retention event.

To define your internal retention, you need to select a retention window and a retention event.

The retention window depends on your ideal persona's usage frequency:

  • Daily — This is common for apps such as social media platforms or messaging apps like WhatsApp, Facebook, and various games.
  • Weekly — This is applicable for productivity tools or educational apps like Trello or Husprey.
  • Monthly — This is often the case for services that don't require daily or weekly engagement but are still used regularly, like administrative tools such as Payfit or Alan.
  • Yearly — This applies to infrequently used or as-needed services. Tax preparation software like TurboTax or platforms like Airbnb are great examples.

The retention event is the event you use to consider a user active or retained. This can be any active event, such as logging in, opening your app, buying a product, or using a specific feature.

While it needs to make sense for your business, you should aim for consistency to allow for benchmark comparisons. For example, you might refer to the B2B User Retention Product Benchmark or ChartMogul SaaS retention report.

The power of cohorts

To fully grasp the concept of retention, we must talk about cohorts.

While you can compute retention for your entire user base, impact of potential improvements will be difficult to discern. This is where cohorts come into play.

A cohort is a group of users (or companies) that share a common characteristic.

By default, retention is computed using a time-based cohort, such as users who signed up in 2023. If you're more familiar with marketing terminology, you can think of this as "an audience".

In fact, you should define different cohorts:

  • by the sign-up year or month;
  • by country ;
  • by active features ;
  • by spending dollars ;
  • by demographics (when B2C) ;
  • and more...

The idea is to compare retention across different cohorts and prioritize work (product or customer success) towards the cohorts where the leverage is greatest.

However, a word of caution: make sure to check your cohort sizes. You might not want to spend too much time improving the retention of your smallest cohort!

Predict churn — at the account level

After assessing your retention levels and implementing new activation mechanisms to improve retention at the cohort level, it's time to combat churn at the account level.

However, it's easier to fight churn BEFORE someone actually churns rather than after. This is where predictions come into play.

Churn is the opposite of retention. It's when a user stops returning. Nowadays, it's easier than ever to create basic models to predict the probability of churn for an account or a user.

In BigQuery, for example, it's a three-step process:

  • Create a churn modeling table ;
  • Train a basic model in BigQuery ;
  • Compute churn probability for your users.
BigQuery churn prediction model

To create a churn modeling table in BigQuery, you can consider the following dimensions (or features 🤓):

user_id / company_id
churned: true / false (this will be the label used for training)

Next, you can train a classification model using a basic logistic regression, XGBoost, or more.  You can find the documentation to run such model or a pretty insightful example blog post.

Once training is done, you can now predict the churn probability for each account and share that with your CSM team.

You can know join forces to retain customers by manually reactivating them or including them in some specifically designed campaigns. Example in Game Developers.

Retention, while often complex to grasp, can be a key driver in any business's growth when used appropriately. It's time to say goodbye to the "leaky bucket".

This basic modeling can be deployed as one-shot in about a week of work at companies of "normal" scale. If you want to run it in production every day, a bit more tooling, checks will be needed.

Next step will be to discuss Net Dollar Retention (NDR): how to increase revenue per account over time. This often requires changes to the pricing model, (which can be more challenging to implement than small Customer Success Management campaigns). Best companies achieve constant account growth.

If you need examples of retention sql queries, have questions about implementing churn prediction in BigQuery, or want to share your own experiences, I'd be more than happy to chat.

PS: if you want more BigQuery hacks you can also check BigFunctions amazing work from Paul Marcombes

Husprey Logo

Learn more about Husprey

Husprey is a powerful, yet simple, platform that provides tools for Data Analysts to create SQL notebooks effortlessly, collaborate with their team and share their analyses with anyone.