I'm trying to create a cohort analysis table to calculate the retention over months.
The data consist of:
customer_id: A text column with id for each customer;
dt_purchase: A text column (YYYY-mm) with the year and month of the customer purchase;
purchase_seq: An integer column that shows the sequential number of the customer's purchase.
2020_03: A boolean where 0 indicates that the user has no purchases e 1 the user has a purchase in March 2020;
2020_04: A boolean where 0 indicates that the user has no purchases e 1 the user has a purchase in April 2020;
2020_05: A boolean where 0 indicates that the user has no purchases e 1 the user has a purchase in May 2020;
2020_06: A boolean where 0 indicates that the user has no purchases e 1 the user has a purchase in June 2020;
The table looks like:
customer_id
dt_purchase
purchase_sequence
2020_03
2020_04
2020_05
2020_06
12345
2020-04
1
0
1
0
0
12345
2020-05
2
0
0
1
0
12345
2020-06
3
0
0
0
1
54321
2020-04
1
0
1
0
0
54321
2020-06
2
0
0
0
1
What I wanting to achieve is a cohort graph like this one .
My question is: can I create a cohort analysis with the data? If yes, how can I proceed with the code?