Cohort analysis in Google Analytics: Which cohort size should I choose?

Cohort analysis is increasingly valued by e-commerce companies because, unlike short-term metrics, it reflects the long-term profitability of the business. Using it effectively in Google Analytics, however, is still full of challenges.

Probably the most frequently asked question from our clients regarding cohort analysis is:

Which cohort size should I choose?

As Google Analytics users, all of us have probably had the experience of visiting a cohort report, flipping through the different cohort sizes, realizing that the data from these options are not very different from each other, and then deciding to just use the default option (day) for analysis.

However, by doing that, we miss many valuable insights from the cohort analysis report, and we risk drawing bad conclusions from the noise in our data instead of the signal.

Google Analytics allows you to define your cohorts by day, week, or month, and defaults to the “day” option.

Let’s walk through each of the options carefully to see what they really mean, and how to use each one to get different insights from our analyses.

Day — The Short-Term Option

Let’s start with the “day” option, which is probably the one most analyses are run on, since it is the default.

If you select the “day” option, you will be able to see cohort data for up to 12 days after the day of the initial visit, depending on what day the cohort starts (this applies even if you set a date range longer than 12 days, such as 30 days).

For example, if you are trying to measure the cohort behavior of a group of users that visit your site on June 1st, you will be able to see how many of them returned for a visit all the way until June 13th, 12 days after their first visit.
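
The window in this example is plain date arithmetic. As a minimal sketch in Python (the helper name is hypothetical and the year is arbitrary; only the 12-day limit comes from the text above):

```python
from datetime import date, timedelta

# Hypothetical helper: the last day on which a "day" cohort can show data,
# assuming the 12-day limit described above.
def last_visible_day(cohort_start: date, max_periods: int = 12) -> date:
    return cohort_start + timedelta(days=max_periods)

# The year is arbitrary; only the month and day matter for the example.
print(last_visible_day(date(2023, 6, 1)))  # 2023-06-13
```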

In general, I would only recommend using “day” as a cohort option if your cohort size is consistently above 100 users for all days.

This is because, by selecting the “day” option, you are slicing your data into very small samples, and small samples are subject to large statistical swings.

These swings will cause your data to go up and down for no apparent reason, and might lead you to attribute a change to the wrong cause if you are not careful.
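
To get a rough feel for how large that swing can be, the sketch below estimates a 95% margin of error for an observed retention rate using the normal approximation for a proportion. The 10% retention rate and the cohort sizes are made-up inputs, and the approximation itself is crude for very small cohorts, so treat the output as an order of magnitude rather than a precise bound.

```python
import math

def retention_margin(cohort_size: int, retention_rate: float, z: float = 1.96) -> float:
    """Approximate margin of error for an observed retention rate.

    Uses the normal approximation for a proportion; z = 1.96 corresponds
    to a 95% confidence level.
    """
    return z * math.sqrt(retention_rate * (1 - retention_rate) / cohort_size)

# Illustrative inputs only: a 10% retention rate at three cohort sizes.
for n in (30, 100, 1000):
    margin = retention_margin(n, 0.10)
    print(f"{n:>5} users: 10% retention, +/- {margin * 100:.1f} points")
```

With these inputs, the margin shrinks from roughly 11 points at 30 users to about 6 points at 100 users and about 2 points at 1,000 users, which is why small daily cohorts can move a lot without anything real having changed.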

Furthermore, you should only use the “day” option with a specific goal in mind, such as measuring the short-term effectiveness of a specific campaign.

This is because your user data, even with a sample size large enough to avoid intrinsic statistical volatility, is subject to many external factors, such as the day of the week, shifts in the mix of traffic sources, etc.

While it is very exciting to go full detective mode and try to figure out every factor that causes one cohort to retain much better than another, going in without a solid objective will leave you with hardly any insights that you can generalize and act on.

It is also strongly recommended to pair the day option with some sort of segments, so you can remove a lot of external factors such as channels you don’t care about, unengaged visits, and so on.

Week — The General-Use Option

Now let’s talk about the “week” option, which I strongly recommend as the default for your cohort analysis if you want a general understanding of how well you are retaining users on your website.

If you select the “week” option, you can see cohort data for up to 12 weeks after each cohort’s first visit, depending on when the cohort started, and you can look at cohorts acquired up to 12 weeks ago.

I suggest using this as the default option because “week” gives you a lot more data to work with in each cohort, and it also guards against many sources of external noise, such as day-of-week seasonality (which disappears when you analyze your data by week).
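
The effect of pooling days into weeks can be sketched with made-up numbers. The data below is invented for illustration (weekend cohorts are assumed to retain worse than weekday cohorts), and the spread is measured with a simple population standard deviation.

```python
from statistics import pstdev

# Made-up daily cohorts for two weeks: (new users, users who returned).
# The last two days of each week stand in for weekend cohorts that are
# assumed to retain worse.
daily = [
    (120, 24), (110, 22), (130, 26), (125, 25), (115, 23), (90, 9), (80, 8),
    (118, 24), (112, 22), (128, 26), (122, 24), (117, 23), (92, 9), (78, 8),
]

daily_rates = [returned / users for users, returned in daily]

# Pool each run of 7 days into one weekly cohort.
weekly_rates = []
for start in range(0, len(daily), 7):
    week = daily[start:start + 7]
    users = sum(u for u, _ in week)
    returned = sum(r for _, r in week)
    weekly_rates.append(returned / users)

print("daily spread: ", round(pstdev(daily_rates), 3))
print("weekly spread:", round(pstdev(weekly_rates), 3))
```

Because every weekly cohort contains the same mix of weekdays and weekend days, the day-of-week pattern cancels out and the weekly rates sit much closer together than the daily ones.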

The graph below compares user retention under the “week” and “day” options. As you can see, weekly data fluctuates a lot less than daily data.

[Figure: user retention over time, “week” option compared with “day” option]

Remember, when looking at the cohort analysis table, always ignore the last entry in every row/column, because data collection for that period is not yet complete.
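
If you export the table and work on it outside Google Analytics, this rule is easy to apply in code. The sketch below assumes the table is stored as one list per cohort, with newer cohorts having shorter rows; the numbers and the function name are made up for illustration.

```python
# Made-up retention table: one row per cohort, one value per period since
# acquisition. Newer cohorts have had less time, so their rows are shorter.
table = [
    [100.0, 12.0, 8.0, 5.0],   # oldest cohort
    [100.0, 11.0, 7.0],
    [100.0, 13.0],
    [100.0],                   # newest cohort
]

def drop_incomplete(rows):
    """Remove the final entry of each row, which is still being collected."""
    return [row[:-1] for row in rows]

for row in drop_incomplete(table):
    print(row)
```

Note that the newest cohort ends up with no usable values at all, since its only entry is the period still in progress.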

However, the week option is not perfect, and one thing it falls really short on is measuring the immediate impact of your advertising campaigns or promotions.

Every time a customer visits your website, you have a certain window (usually a couple of days) to convince your customers to buy your product or submit a lead before they forget about your company completely, and that window is usually shorter than a week.

Therefore, when it comes to analyzing the conversion window of your customers for a specific campaign, the day option becomes a lot better than the week option, simply due to its recency (more on this later).

Month — The Loyalty Measure

The “month” option measures your user retention in the long term, and is therefore much more useful for the parts of your website that depend on long-term retention, such as a blog.

Sadly, Google Analytics only lets you go back three months (at most) in your analyses.

Given that the last entry in every row/column should be discarded due to incomplete data collection, you only have access to 6 numbers when analyzing your cohorts by month, which makes the analysis a lot less informative.
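
One way to arrive at the figure of 6 is to assume the monthly table is a triangle with four cohort rows (the current month plus the three before it), where each older cohort has one more column than the next. That table shape is an assumption for illustration, and the function name is hypothetical.

```python
def usable_cells(cohort_rows: int) -> int:
    """Count the cells left in a triangular cohort table after dropping
    the last (incomplete) entry of each row."""
    # Row i (newest cohort = 0) holds i + 1 values; dropping the last leaves i.
    return sum(i for i in range(cohort_rows))

print(usable_cells(4))    # 6: monthly table, 3 months back plus the current month
print(usable_cells(13))   # 78: the same assumption applied to 12 weeks back
```

Under the same assumption, the weekly table leaves many more complete values to work with, which is consistent with the recommendation to prefer the “week” option for general use.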

Conclusion:

In general, we recommend choosing the "week" option because it gives you enough data to analyze and smooths out much of the noise in the data.

You should choose the "day" option when you have a specific goal in mind, such as measuring the short-term effectiveness of a specific campaign.

The "month" option mainly measures your user retention in the long term.

Cohort analysis in practice is still full of challenges. Next time, we will bring you part 2 of this series, "When should I use each metric?" Stay tuned.