
UCLU3 Unsupervised Learning - RFM calculations




This refers to the following code in the exercise 'Calculating RFM values':

rfm = df.groupby('CustomerID').agg({



     'InvoiceDay': lambda x: ref_date - x.max()})

The df will have repeated invoice numbers because each row represents one item purchased: if a transaction has 10 items, its invoice number appears on 10 rows. If so, shouldn't 'nunique' be used rather than 'count'?


My filtered df has 15,862 unique invoice numbers (I hope my filtering is correct). If we run print(rfm.Frequency.sum()) after using the given code, the result is 339,702, which matches the number of rows in the filtered df, that is, the number of item lines purchased rather than the number of invoices.
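
To illustrate the difference, here is a minimal sketch on made-up data. The column names 'CustomerID' and 'InvoiceNo' are assumed to match the exercise, the rows are invented, and this is not the exercise's actual dataset or solution:

```python
import pandas as pd

# Toy data, invented for illustration: each row is one item line,
# so an invoice with several items appears on several rows.
df = pd.DataFrame({
    'CustomerID': [1, 1, 1, 2, 2],
    'InvoiceNo': ['A1', 'A1', 'A2', 'B1', 'B1'],
})

# 'count' counts rows (item lines) per customer.
by_count = df.groupby('CustomerID')['InvoiceNo'].count()

# 'nunique' counts distinct invoice numbers per customer.
by_nunique = df.groupby('CustomerID')['InvoiceNo'].nunique()

print(by_count.sum())    # equals len(df), the number of item lines
print(by_nunique.sum())  # equals the number of distinct invoices here
```

In this toy case the 'count' total matches the row count, while the 'nunique' total matches the number of distinct invoices, which is the same pattern I see with 339,702 versus 15,862.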

Please advise whether we should use 'nunique' instead of 'count' in the code. Thank you.



