How to build a cohort analysis
Table of contents
This guide complements our article on cohort analysis. While the article explains the concepts and strategic issues involved, this guide shows you in concrete terms how to build your first analysis, step by step.
Stage 1: Structuring the database
This is the most important stage, because it is the quality of this database and its organisation that will determine the quality of your cohort model, avoiding overly large files. Bear in mind, however, that there are limits to this approach: for established businesses with substantial customer bases, your cohort analyses can be lengthy and consume a lot of computing power.
To carry out an effective cohort analysis, you will need two complementary databases: an invoicing database, which tracks transaction history, and a customer database, which centralises information about your customers. These two databases are interdependent and enrich each other to build up a complete picture of your business.
Billing basis
|
Invoice no
1
|
Customer name |
Customer ID
2
|
Invoice date
3
|
MRR Invoiced |
|---|---|---|---|---|
| INV-2501-001 | TechStart | CLT001 | 2025-01-01 | 299 € |
| INV-2501-002 | DataFlow | CLT002 | 2025-01-01 | 299 € |
| INV-2502-001 | TechStart | CLT001 | 2025-02-01 | 299 € |
| INV-2502-002 | DataFlow | CLT002 | 2025-02-01 | 299 € |
| INV-2502-003 | CloudOps | CLT003 | 2025-02-01 | 999 € |
| INV-2503-001 | TechStart | CLT001 | 2025-03-01 | 299 € |
| INV-2503-003 | CloudOps | CLT003 | 2025-03-01 | 999 € |
| INV-2504-001 | TechStart | CLT001 | 2025-04-01 | 299 € |
| INV-2504-003 | CloudOps | CLT003 | 2025-04-01 | 999 € |
| INV-2505-001 | TechStart | CLT001 | 2025-05-01 | 299 € |
Invoice number: This is the main key in your database, guaranteeing the uniqueness of each line and thus avoiding double counting, which would distort the analysis. It is preferable for its format to be 'text', to prevent Excel from converting it to 'Date' format, thereby jeopardising the concept of the unique identifier.
Customer identifier: This identifier is used as a source of truth to link invoicing (reflected by this database) with any information relating to the customer. So if, for example, your customer's name changes in the course of the transaction, you can trace this.
Invoice date (standard): Although in practice it is at the end of the month that MRR is observed, it is preferable to conduct the analysis at the first of the month to ensure that the formulas work properly and, above all, to simplify them.
You could enrich this database by adding qualitative columns such asRegion, Segment or Acquisition Channel. These additional dimensions would allow you to go even deeper in your analysis, for example by comparing retention rates according to geographical area or customer profile. However, this guide concentrates on the essentials for the sake of readability and efficiency: we have kept only the columns that are essential for calculating cohort metrics.
Customer base
The customer database complements the billing database by centralising stable and qualitative information about each customer. Unlike the billing database, which records each transaction, this database contains a single line for each customer, identified by the Customer ID. It contains permanent or slightly variable attributes: name, business segment, acquisition channel, key dates (acquisition, possible churn), and the last known MRR.
|
Client ID
1
|
Customer |
Region
2
|
Segment
3
|
Sector
4
|
Acquisition channel
5
|
Acquisition Date | Churn Date | Last MRR |
|---|---|---|---|---|---|---|---|---|
| CLT001 | TechStart | EMEA | Startup | Tech | Paid Search | Jan-25 | - | 299 € |
| CLT002 | DataFlow | Americas | SMB | Tech | Organic | Jan-25 | Mar-25 | 299 € |
| CLT003 | CloudOps | APAC | Enterprise | Finance | Outbound Sales | Feb-25 | - | 999 € |
| CLT004 | Startup Hub | EMEA | Startup | Tech | Referral | Mar-25 | - | 99 € |
| CLT009 | DataSync | APAC | SMB | Healthcare | Organic | Jan-25 | - | 499 € |
| CLT010 | CloudBase | EMEA | Startup | Retail | Referral | Jan-25 | - | 149 € |
| CLT011 | AutoDeploy | Americas | SMB | Tech | Paid Social | Jan-25 | Aug-25 | 249 € |
| CLT012 | SecureNet | APAC | Enterprise | Finance | Outbound Sales | Jan-25 | - | 699 € |
Each line represents a single customer with unchanging or rarely changed characteristics. TheMRR shown corresponds to the last known amount, providing a quick overview of the current value of the customer portfolio. For an overdue customer, the MRR for the last month of activity is displayed.
Customer identifier: This identifier is the link with the billing database. It is the key to cross-referencing transactional data with customer attributes for your segmented analyses.
Region: Allows you to segment your analyses geographically. For example, you can compare retention rates between EMEA, Americas and APAC, or identify the markets where your product is most successful.
Segment: Categorises your customers according to their sales profile (Startup, SMB, Enterprise). This segmentation often reveals distinct patterns of behaviour: Enterprise customers may have a higher MRR but a longer sales cycle, while Startups commit quickly but are sometimes more volatile.
Sector: Classifies your customers according to their industry (Tech, Finance, Healthcare, Retail, etc.). This segmentation can reveal sector-specific trends: certain industries may have different retention patterns or growth potential based on their market dynamics.
Acquisition channel: Identifies the customer's source of origin (Paid Search, Organic, Outbound Sales, Referral, etc.). This data is essential for assessing the quality of each channel: some may generate a lot of volume but with low retention, while others bring in fewer customers but with better LTV.
The relationship between the two databases is via the customer identifier: the billing database records each transaction with its customer identifier, while the customer database stores the customer's permanent attributes. This architecture allows you to navigate easily between the transactional view (billing) and the customer view (profile and characteristics).
Of course, these characteristics are just examples. Depending on your business (product, pricing, geographical scope, KPI, etc), they will differ.
Step 2: Creation of a monthly MRR tracker for each customer
Once you have structured your two databases, the next step is to build a table that summarises the MRR evolution of each customer, month by month. This consolidated table is the cornerstone of your cohort analysis: it will enable you to calculate all the essential metrics (MRR churn, expansion, Downsell, new MRR, ARPA) that we will detail in the following sections.
The structure of this table is based on the information you have already collected: the customer database determines the rows in the table (one customer = one row), as well as the analysis period for each customer (from the date of acquisition to the date of churn, or to the present day if the customer is still active). Labase Invoicing feeds the monthly values: for each month, it tells you the MRR invoiced by this customer.
The result is a table where each column represents a calendar month (from your minimum month to your maximum month of analysis), and each cell contains the customer's MRR for that month. Empty or zero cells indicate either that the customer has not yet been acquired, or that it has already expired. This visual representation makes it possible to identify customer trajectories at a glance: progressive upsells, stability, downsells or churn.
| Id | Customer | Acquisition date | Date Churn | Jan-25 | Feb-25 | Mar-25 | Apr-25 | May 25 | Jun-25 | Jun-25 | Aug-25 | Sep-25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CLT001 | TechStart | 2025-01-01 | - | 990 € | 990 € | 1 480 € | 1 480 € | 1 480 € | 1 480 € | 1 480 € | 1 480 € | 1 480 € |
| CLT002 | DataFlow | 2025-01-01 | 2025-03-01 | 2 990 € | 2 990 € | 2 990 € | - € | - € | - € | - € | - € | - € |
| CLT003 | CloudOps | 2025-02-01 | - | - € | 4 990 € | 9 990 € | 9 990 € | 9 990 € | 9 990 € | 9 990 € | - € | - € |
| CLT004 | Startup Hub | 2025-03-01 | - | - € | - € | 1 990 € | 1 990 € | 990 € | - € | - € | - € | - € |
| CLT005 | DevTools | 2025-01-01 | - | 1 490 € | 1 490 € | 1 490 € | 1 490 € | 1 490 € | 1 490 € | 1 490 € | 1 490 € | 1 490 € |
| CLT006 | WebFlow | 2025-01-01 | - | 1 990 € | 1 990 € | 2 990 € | 2 990 € | 2 990 € | 2 990 € | 2 990 € | 2 990 € | 2 990 € |
| CLT007 | DataSync | 2025-01-01 | - | 3 990 € | 3 990 € | 4 990 € | 4 990 € | 4 990 € | 4 990 € | 4 990 € | 4 990 € | 4 990 € |
| CLT008 | CloudBase | 2025-01-01 | - | 1 490 € | 1 490 € | 1 490 € | 1 490 € | 1 490 € | 1 490 € | 1 490 € | 1 490 € | 1 490 € |
Step 3: Consolidate lines by acquisition period
It is at this stage that the cohorts begin to emerge, because the criterion for the uniqueness of the rows in the table is no longer the customer, but the date of acquisition. Depending on the maturity of the business, you may consolidate by month, quarter or year of acquisition.
It is possible to proceed using two prisms: the MRR prism and the customer prism. These two approaches complement each other, so we're going to use both.
MRR cohorts
BEFORE
| Customer | Jan-26 | Feb-26 | Mar-26 | Apr-26 | May-26 | Jun-26 |
|---|---|---|---|---|---|---|
| CLT001 | 2 500 € | 2 500 € | 2 800 € | 2 800 € | 2 800 € | 3 200 € |
| CLT002 | 1 200 € | 1 200 € | 1 200 € | - | - | - |
| CLT003 | 1 800 € | 1 800 € | 1 800 € | 1 800 € | 1 800 € | 1 800 € |
| CLT004 | - | 3 500 € | 3 500 € | 3 500 € | 3 500 € | 3 500 € |
| CLT005 | - | - | 2 200 € | 2 200 € | 2 200 € | 2 200 € |
| CLT006 | - | - | - | 1 500 € | 1 500 € | 1 500 € |
AFTER
| Cohort | Jan-26 | Feb-26 | Mar-26 | Apr-26 | May-26 | Jun-26 |
|---|---|---|---|---|---|---|
| Jan-26 | 5 500 € | 5 500 € | 5 800 € | 4 600 € | 4 600 € | 5 000 € |
| Feb-26 | - | 3 500 € | 3 500 € | 3 500 € | 3 500 € | 3 500 € |
| Mar-26 | - | - | 2 200 € | 2 200 € | 2 200 € | 2 200 € |
| Apr-26 | - | - | - | 5 000 € | 5 000 € | 5 000 € |
| May-26 | - | - | - | - | 3 200 € | 3 200 € |
| Jun-26 | - | - | - | - | - | 4 700 € |
Customer cohorts
BEFORE
| Customer | Jan-26 | Feb-26 | Mar-26 | Apr-26 | May-26 | Jun-26 |
|---|---|---|---|---|---|---|
| CLT001 | 1 | 1 | 1 | 1 | 1 | 1 |
| CLT002 | 1 | 1 | 1 | - | - | - |
| CLT003 | 1 | 1 | 1 | 1 | 1 | 1 |
| CLT004 | - | 1 | 1 | 1 | 1 | 1 |
| CLT005 | - | - | 1 | 1 | 1 | 1 |
| CLT006 | - | - | - | 1 | 1 | 1 |
AFTER
| Cohort | Jan-26 | Feb-26 | Mar-26 | Apr-26 | May-26 | Jun-26 |
|---|---|---|---|---|---|---|
| Jan-25 | 3 | 3 | 3 | 2 | 2 | 2 |
| Feb-25 | - | 1 | 1 | 1 | 1 | 1 |
| Mar-25 | - | - | 1 | 1 | 1 | 1 |
| Apr-25 | - | - | - | 1 | 1 | 1 |
| May 25 | - | - | - | - | 1 | 1 |
| Jun-25 | - | - | - | - | - | 1 |
Thanks to these tables, it is now possible to draw initial conclusions about the behaviour of the cohorts, but the full analysis does not stop there.
Step 4: Transpose the matrix to introduce the evolutionary vision
Without this step, the analysis is limited to a behavioural evolution of the cohorts over the calendar months, and not over their lifetime in your database, which is of greater interest to us.
The logic is simple: determine the number of months elapsed between the month of acquisition and the month of activity observed in the column of the table. As with the previous step, this can be done from either the customer or MRR perspective.
MRR cohorts
BEFORE
| Cohort | Jan-26 | Feb-26 | Mar-26 | Apr-26 | May-26 | Jun-26 |
|---|---|---|---|---|---|---|
| Jan-25 | 5 500 € | 5 500 € | 5 800 € | 4 600 € | 4 600 € | 5 000 € |
| Feb-25 | - | 3 500 € | 3 500 € | 3 500 € | 3 500 € | 3 500 € |
| Mar-25 | - | - | 2 200 € | 2 200 € | 2 200 € | 2 200 € |
| Apr-25 | - | - | - | 4 800 € | 4 800 € | 4 800 € |
| May 25 | - | - | - | - | 3 200 € | 3 200 € |
| Jun-25 | - | - | - | - | - | 4 500 € |
AFTER
| Cohort | M0 | M1 | M2 | M3 | M4 | M5 |
|---|---|---|---|---|---|---|
| Jan-25 | 5 500 € | 5 500 € | 5 800 € ↑+5% | 4 600 € ↓-16% | 4 600 € ↓-16% | 5 000 € ↓-9% |
| Feb-25 | 3 500 € | 3 500 € | 3 500 € | 3 500 € | 3 500 € | - |
| Mar-25 | 2 200 € | 2 200 € | 2 200 € | 2 200 € | - | - |
| Apr-25 | 4 800 € | 4 800 € | 4 800 € | - | - | - |
| May 25 | 3 200 € | 3 200 € | - | - | - | - |
| Jun-25 | 4 500 € | - | - | - | - | - |
Note: The percentages displayed after M0 represent the NRR (Net Revenue Retention) in quasi-direct reading, i.e. NRR - 100%, so a value of +5% corresponds to a NRR of 105%. In other words, '↑' suggests a NRR > 100%, while '↓' suggests a contraction and therefore a NRR < 100%.
Customer cohorts
BEFORE
| Cohort | Jan-26 | Feb-26 | Mar-26 | Apr-26 | May-26 | Jun-26 |
|---|---|---|---|---|---|---|
| Jan-25 | 3 | 3 | 3 | 2 | 2 | 2 |
| Feb-25 | - | 1 | 1 | 1 | 1 | 1 |
| Mar-25 | - | - | 1 | 1 | 1 | 1 |
| Apr-25 | - | - | - | 2 | 2 | 2 |
| May 25 | - | - | - | - | 1 | 1 |
| Jun-25 | - | - | - | - | - | 2 |
AFTER
| Cohort | M0 | M1 | M2 | M3 | M4 | M5 |
|---|---|---|---|---|---|---|
| Jan-25 | 3 | 3 | 3 | 2 ↓-33% | 2 ↓-33% | 2 ↓-33% |
| Feb-25 | 1 | 1 | 1 | 1 | 1 | - |
| Mar-25 | 1 | 1 | 1 | 1 | - | - |
| Apr-25 | 2 | 2 | 2 | - | - | - |
| May 25 | 1 | 1 | - | - | - | - |
| Jun-25 | 2 | - | - | - | - | - |