In the next section, you’ll learn how to define a scenario, train a model, and leverage the results to make predictions, take strategic actions, and make data-driven decisions that directly impact your business.
Each model will be introduced on a dedicated page with step-by-step instructions and a video tutorial, guiding you through the process from setup to actionable insights.
With the Multiclass Classification model, you can analyze feature importance for a target with 2-25 distinct values. Unlike binary classification, which deals with only two classes, multiclass classification handles multiple classes simultaneously.
To achieve the best results, we will cover the basics of the Model Scenario. In this scenario, you choose parameters related to the dataset and the model.
To run the model, you need to select a Target Feature first. This target is the variable or outcome that the model aims to predict or estimate. The Target Feature should be a text-type column (not a numerical or binary column).
You will be taken to the next step, where you can choose the Model Features you want the model to analyze. Graphite Note will automatically exclude features that are not suitable for the model and will provide a reason for each exclusion.
Moving forward, you'll see a comprehensive list of preprocessing steps that Graphite Note will apply to prepare your data for training. This enhances data quality, ensuring your model produces accurate results. Typically, these steps are performed by data scientists, but with our no-code machine learning platform, Graphite Note handles it for you. After reviewing the preprocessing steps, you can finish and Run Scenario.
The training duration may vary depending on the data volume, typically ranging from 1 to 10 minutes. The training will utilize 80% of the data to train various machine learning models and the remaining 20% to test these models and calculate relevant scores. Once completed, you will receive information about the best model based on the F1 value and details about training time.
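For readers curious what this looks like in code, here is a minimal sketch of the same idea - an 80/20 split with several candidate models ranked by F1 - using scikit-learn. The file name, column names, and candidate models are hypothetical, not Graphite Note's internals.

```python
# Hypothetical sketch of the 80/20 split and F1-based model selection that
# Graphite Note automates; assumes the features are already numeric.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

df = pd.read_csv("customers.csv")                      # hypothetical dataset
X, y = df.drop(columns=["segment"]), df["segment"]     # "segment" = target feature

# 80% of the rows train the candidate models, 20% are held out for scoring.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=42),
}
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # Weighted F1 averages the per-class F1 scores, which suits multiclass targets.
    scores[name] = f1_score(y_test, model.predict(X_test), average="weighted")

best = max(scores, key=scores.get)                     # the model marked in green
print(best, scores[best])
```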
To interpret the results after running your model, go to the Performance tab. Here, you can see the overall model performance post-training. Model evaluation metrics such as F1 Score, Accuracy, AUC, Precision, and Recall are displayed to assess the performance of classification models. Details on model metrics can also be found on the Accuracy Overview tab.
On the performance tab, you can explore six different views that provide insights related to model training and results: Key Drivers, Impact Analysis, Model Fit, Accuracy Overview, Training Results and Details.
Key Drivers indicate the importance of each column (feature) for the Model's predictions. The higher the reliance of the model on a feature, the more critical it is. Graphite uses permutation feature importance to determine these values.
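Permutation feature importance shuffles one column at a time and measures how much the model's score drops. A minimal scikit-learn sketch, reusing a fitted classifier and held-out split like those in the sketch above:

```python
# Minimal sketch of permutation feature importance, the method behind Key Drivers.
from sklearn.inspection import permutation_importance

result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
# The more the score drops when a column is shuffled, the more the model relies on it.
for name, importance in sorted(zip(X_test.columns, result.importances_mean),
                               key=lambda pair: -pair[1]):
    print(f"{name}: {importance:.3f}")
```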
The Impact Analysis tab allows you to select various features and analyze, using a bar chart, how changes in each feature affect the target feature. You can switch between Count and Percentage views.
The Model Fit Tab displays the performance of the trained model. It includes a stacked bar chart with percentages showing correct and incorrect predictions for each class of the multiclass target.
The Accuracy Overview tab features a Confusion Matrix to highlight classification errors, making it simple to identify if the model is confusing classes. For each class, it summarizes the number of correct and incorrect predictions. Find out more about Classification Confusion Matrix in our Understanding ML section.
On the Accuracy Overview tab, you'll find detailed information on correct and incorrect predictions (True positives and negatives / False positives and negatives). Model metrics are explained at the bottom of the section.
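As an illustration of what the Accuracy Overview summarizes, here is a minimal scikit-learn sketch that prints a confusion matrix and per-class precision, recall, and F1, again reusing the fitted model and held-out split from the earlier sketch:

```python
# Minimal sketch of the numbers behind the Accuracy Overview tab.
from sklearn.metrics import confusion_matrix, classification_report

y_pred = model.predict(X_test)
# Rows are actual classes, columns are predicted classes; off-diagonal cells
# show which classes the model confuses.
print(confusion_matrix(y_test, y_pred))
# Per-class precision, recall, and F1, plus overall accuracy.
print(classification_report(y_test, y_pred))
```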
In the Training Results Tab, you will find information about all the models automatically considered during the training process. Graphite ran several machine learning algorithms suitable for multiclass classification problems, using 80% of the data for training and 20% for testing. The best model, based on the F1 score, is chosen and marked in green in the models list.
The Details tab shows the results of the predictive model, presented in a table format. Each record includes the predicted label, predicted probability, and predicted correctness, offering insights into the model's predictions, confidence, and accuracy for each data point. Dataset test results can be exported to Excel by clicking the XLSX button in the right corner.
Once the model is trained, you can use it to predict future values, solve multi-class classification problems, and drive business decisions. Here are ways to take action with your Multiclass Classification model:
In Graphite Note, you can generate Actionable Insights using the Actionable Insights Input Form. Here, you can provide specific details about your business and objectives. This data is then combined with model training results (e.g., Multiclass Classification with Key Drivers) to produce a tailored analytics narrative aligned with your goals.
Actionable Insights leverage generative AI models to deliver these results. These insights are conclusions drawn from data that can be directly turned into actions or responses. You can access
Actionable Insights from the main navigation menu, provided you are subscribed to a Graphite Note plan that includes actionable insights queries.
After building and analyzing a predictive model using Graphite Note, the Predict function allows you to apply the model to new data. This enables you to forecast outcomes or target variables based on different feature combinations, providing actionable insights for decision-making.
You can share your prediction results with your team using the Notebook feature. With Notebooks, users can also run their own predictions on your Multiclass Classification model.
Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the Data Storytelling section.
A Timeseries Forecast Model is designed to predict future values by analyzing historical time-related data. To utilize this model, your dataset must include both time-based and numerical columns. In this tutorial, we'll cover the fundamentals of the Model Scenario to help you achieve optimal results.
For the Target Column, select a numeric value you want to predict. It's crucial to have values by day, week, or year. If some dates are repeated, you can aggregate them by taking their sum, average, etc.
Next, you can choose a Sequence Identifier Field to group records and generate an independent time series and forecast for each group. Keep in mind that these values shouldn't be unique; they must form a series, and a maximum of 500 unique values is allowed as the sequence identifier. If you don't want to generate an independent time series for each group, you can leave this option empty.
Then, select the Time/Date Column, specifying the column containing time-related values. The Time Interval represents the data frequency—choose daily for daily data, yearly for annual data, etc. With Forecast Horizon, decide how many days, weeks, or years you want to predict from the last date in your dataset.
The model performs well with seasonal data patterns. If your data shows a linear growth trend, select "additive" for Seasonality Mode; for exponential growth, select "multiplicative." For example, if you see annual patterns, set Yearly Seasonality to True. (TIP: Plotting your data beforehand can help you understand these patterns.) If you're unsure, the model will attempt to detect seasonality automatically.
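Graphite Note is no-code, but these scenario options map closely to concepts in open-source forecasting libraries. Purely as an illustration (not necessarily what Graphite Note runs internally), here is a minimal sketch of a seasonality mode, yearly seasonality, and a forecast horizon using Prophet; the file and column names are hypothetical:

```python
# Hypothetical sketch of Time Interval, Forecast Horizon, and Seasonality Mode.
import pandas as pd
from prophet import Prophet

df = pd.read_csv("daily_sales.csv")        # expects columns ds (date) and y (target)

model = Prophet(
    seasonality_mode="multiplicative",     # exponential growth patterns
    yearly_seasonality=True,               # an annual pattern was observed
)
model.fit(df)

future = model.make_future_dataframe(periods=90, freq="D")  # 90-day forecast horizon
forecast = model.predict(future)
# yhat is the primary prediction; yhat_lower/upper bound the uncertainty interval.
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```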
For daily or hourly intervals, you can access Advanced Parameters to add special dates, weekends, holidays, or limit the target value.
We are constantly enhancing our platform with new features and improving existing models. For your daily data, we've introduced some new capabilities that can significantly boost forecast accuracy. Now, you can limit your target predictions, remove outliers, and include country holidays and special events.
To set prediction limits, enter the minimum and maximum values for your target variable. For example, if you're predicting daily temperatures and know the maximum is 40°C, enter that value to prevent the model from predicting higher temperatures. This helps the model recognize the appropriate range of the Target Column. Additionally, you can use the Remove Days of the Week feature to exclude certain days from your predictions.
We added parameters for country holidays and special dates to improve model accuracy. Large deviations can occur around holidays, where stores see more customers than usual. By informing the model about these holidays, you can achieve more balanced and accurate predictions. To add holidays in Graphite Note, navigate to the advanced section of the Model Scenario and select the relevant country or countries.
Similarly, you can add promotions or events that affect your data by enabling Add special dates option. Enter the promotion name, start date, duration, and future dates. This ensures the model accounts for these events in future predictions.
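To make these advanced options concrete, here is a hedged sketch of the same ideas in Prophet terms - a known maximum for the target, a country holiday calendar, and a named promotion with past and future dates. The values are hypothetical, and this is an illustration rather than Graphite Note's implementation:

```python
# Hypothetical sketch: target ceiling, country holidays, and a special-date event.
import pandas as pd
from prophet import Prophet

df = pd.read_csv("daily_sales.csv")
df["cap"] = 40                              # known maximum for the target

promo = pd.DataFrame({
    "holiday": "summer_promo",
    "ds": pd.to_datetime(["2023-07-01", "2024-07-01"]),  # past and future occurrences
    "lower_window": 0,
    "upper_window": 6,                      # the promotion lasts seven days
})

model = Prophet(growth="logistic", holidays=promo)  # logistic growth respects the cap
model.add_country_holidays(country_name="US")       # built-in holiday calendar
model.fit(df)

future = model.make_future_dataframe(periods=90)
future["cap"] = 40                          # the ceiling must also cover future rows
forecast = model.predict(future)
```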
Combining these parameters provides more accurate results. The more information the model receives, the better the predictions.
In addition to adding holidays and special events, you can delete specific data points from your dataset. In Graphite Note, enter the start and end dates of the period you want to remove. For single-day periods, enter the same start and end date. You can remove multiple periods if necessary. Understanding your data and identifying outliers or irrelevant periods is crucial for accurate predictions. Removing these dates can help eliminate biases and improve model accuracy.
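In code terms, removing a period amounts to blanking the target for those dates before training; Prophet, for example, fits around missing values. A minimal sketch with a hypothetical date range, continuing from the frame above:

```python
# Hypothetical sketch: exclude an outlier period (e.g., an unusual sales spike).
import pandas as pd

df["ds"] = pd.to_datetime(df["ds"])
mask = (df["ds"] >= "2023-11-24") & (df["ds"] <= "2023-11-27")
df.loc[mask, "y"] = None                    # drop the values but keep the dates
```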
By following these steps, you can harness the full potential of your Timeseries Forecast Model, providing valuable insights and more accurate predictions for your business. Now it's your turn to do some modeling and explore your results!
After setting all parameters, it is time to Run Scenario and train the machine learning model.
The training duration may vary depending on the data volume, typically ranging from 1 to 10 minutes. Once completed, you will receive information about the trained model and details about training time.
The Model Fit Tab displays a graph with actual and predicted values. The primary prediction is shown with a yellow line, and the uncertainty interval is illustrated with a yellow shaded area. This visualization helps assess the model's performance.
If you used the Sequence Identifier Field, you can choose which value to analyze in each Model Result.
Trends and seasonality are key characteristics of time-series data that should be analyzed. The Trend Tab displays a graph illustrating the global trend that Graphite Note has detected from your historical data.
Seasonality represents the repeating patterns or cycles of behavior over time. Depending on your Time Interval, you can find one or two graphs in the Seasonality Tab. For daily data, one graph shows weekly patterns, while the other shows yearly patterns. For weekly and monthly data, the graph highlights recurring patterns throughout the year.
The Special Dates graph shows the percentage effects of the special dates and holidays in historical and future data.
The Details tab shows the results of the predictive model, presented in a table format. Each record pairs the historical value with the model's predicted value, giving you a point-by-point view of the forecast. Dataset test results can be exported to Excel by clicking the XLSX button in the right corner.
Once the model is trained, you can use it to predict future values and drive business decisions. Here are ways to take action with your Timeseries Forecast model:
After building and analyzing a predictive model using Graphite Note, the "Predict" function allows you to apply the model to new data. This enables you to forecast outcomes or target variables based on different feature combinations, providing actionable insights for decision-making.
You can share your prediction results with your team using the Notebook feature. With Notebooks, users can also run their own predictions on your Timeseries Forecast model.
To interpret the results after running your model, go to the Performance tab. Here, you can see the overall model performance post-training, with evaluation metrics displayed to assess the quality of the forecast.
On the Performance tab, you can explore five different views that provide insights related to model training and results: Model Fit, Trend, Seasonality, Special Dates, and Details.
Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the Data Storytelling section.
A regression model in machine learning is a type of predictive model used to estimate the relationship between a dependent variable (target feature) and one or more independent variables. It aims to predict continuous outcomes by fitting a line or curve to the data points, minimizing the difference between observed and predicted values. To get the best possible results, we will go through the basics of the Model Scenario. In Model Scenario, you select parameters related to the dataset and model.
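As a minimal illustration of that definition (not Graphite Note's internals), the sketch below fits a line to hypothetical housing data and reports R2 and RMSE:

```python
# Hypothetical sketch: fit a line minimizing the gap between observed and predicted.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

df = pd.read_csv("houses.csv")                         # hypothetical dataset
X, y = df[["size_sqm", "rooms", "age_years"]], df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

reg = LinearRegression().fit(X_train, y_train)         # ordinary least squares
pred = reg.predict(X_test)
print(r2_score(y_test, pred))                          # share of variance explained
print(mean_squared_error(y_test, pred) ** 0.5)         # RMSE, in target units
```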
To run the model, you have to choose a Target Feature first. The target refers to the variable or outcome that the model aims to predict or estimate. In this case, it should be a numerical column.
You will be taken to the next step, where you can choose the Model Features you want the model to analyze. Graphite Note will automatically exclude features that are not suitable for the model and will provide a reason for each exclusion.
Moving forward, you'll see a comprehensive list of preprocessing steps that Graphite Note will apply to prepare your data for training. This enhances data quality, ensuring your model produces accurate results. Typically, these steps are performed by data scientists, but with our no-code machine learning platform, Graphite Note handles it for you. After reviewing the preprocessing steps, you can finish and Run Scenario.
The training duration may vary depending on the data volume, typically ranging from 1 to 10 minutes. The training will utilize 80% of the data to train various machine learning models and the remaining 20% to test these models and calculate relevant scores. Once completed, you will receive information about the best model based on its evaluation score and details about training time.
To interpret the results after running your model, go to the Performance tab. Here, you can see the overall model performance post-training. Model evaluation metrics suited to regression (for example, R2, RMSE, and MAE) are displayed to assess performance. Details on model metrics can also be found on the Training Results tab.
On the performance tab, you can explore five different views that provide insights related to model training and results: Key Drivers, Impact Analysis, Model Fit, Training Results and Details.
Key Drivers indicate the importance of each column (feature) for the Model's predictions. The higher the reliance of the model on a feature, the more critical it is. Graphite uses permutation feature importance to determine these values.
The Impact Analysis tab allows you to select various features and analyze, using a bar chart, how changes in each feature affect the target feature. You can switch between Count and Percentage views.
The Model Fit Tab displays the performance of the trained model. It includes a stacked bar chart with percentages comparing known (historical) outcomes against the model's predicted outcomes.
In the Training Results Tab, you will find information about all the models automatically considered during the training process. Graphite ran several machine learning algorithms suitable for regression problems, using 80% of the data for training and 20% for testing. The best model, based on its evaluation score, is chosen and marked in green in the models list.
The Details tab shows the results of the predictive model, presented in a table format. Each record includes the actual and predicted values of the target, offering insight into the model's predictions and accuracy for each data point. Dataset test results can be exported to Excel by clicking the XLSX button in the right corner.
Once the model is trained, you can use it to predict future values, estimate continuous outcomes, and drive business decisions. Here are ways to take action with your Regression model:
In Graphite Note, you can generate Actionable Insights using the Actionable Insights Input Form. Here, you can provide specific details about your business and objectives. This data is then combined with model training results (e.g., Regression model training results) to produce a tailored analytics narrative aligned with your goals.
Actionable Insights leverage generative AI models to deliver these results. These insights are conclusions drawn from data that can be directly turned into actions or responses. You can access
Actionable Insights from the main navigation menu, provided you are subscribed to a Graphite Note plan that includes actionable insights queries.
After building and analyzing a predictive model using Graphite Note, the Predict function allows you to apply the model to new data. This enables you to forecast outcomes or target variables based on different feature combinations, providing actionable insights for decision-making.
You can share your prediction results with your team using the Notebook feature. With Notebooks, users can also run their own predictions on your Regression model.
Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the Data Storytelling section.
With General Segmentation, you can uncover hidden similarities in data, such as the relationship between product prices and customer purchase histories. This unsupervised algorithm groups data based on similarities among numerical variables.
To run this model in Graphite, first identify an ID column to distinguish between values (e.g., customers or products within groups). Next, select the numeric columns (features) from your dataset for segmentation.
Now comes the tricky part: data preprocessing! We rarely encounter high-quality data, so we must clean and transform it for optimal model results. What should you do with missing values? Either remove them or replace them with relevant values, such as the mean or a prediction.
For instance, if you have chosen Age and Height as numeric columns, Age might range between 10 and 80, while Height could range from 100 to 210. The algorithm could prioritize Height due to its higher values. To avoid this, you should transform/scale your data; consider standardizing or normalizing it.
Finally, you need to determine the number of groups you want. If you are not sure, Graphite will try to determine the best number of groups for you.
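To make the whole recipe concrete - scaling so no feature dominates, clustering, and letting the data suggest the number of groups - here is a minimal sketch using k-means, one common choice for this kind of segmentation (Graphite Note's exact algorithm may differ); the file and column names are hypothetical:

```python
# Hypothetical sketch of General Segmentation: scale, cluster, and pick k.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

df = pd.read_csv("customers.csv")
features = df[["age", "height"]].dropna()              # or impute missing values
X = StandardScaler().fit_transform(features)           # Age and Height now comparable

best_k, best_score = 2, -1.0
for k in range(2, 8):                                  # try several group counts
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    score = silhouette_score(X, labels)                # higher = cleaner separation
    if score > best_score:
        best_k, best_score = k, score
print(best_k, best_score)
```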
After reviewing all the steps, you can finish and Run Scenario. The training duration may vary depending on the data volume, typically ranging from 1 to 10 minutes. Once completed, your data will have been grouped into clusters, and you can explore the model results.
The model divides your data into clusters: groups of objects where objects in the same cluster are more similar to each other than to those in other clusters. It is therefore essential to compare the average values of the variables across all clusters, which is why the Cluster Summary Tab shows the differences between the clusters in a graph.
For example, in the picture above, you can see that customers in Cluster2 have the highest average value of the Total spend, unlike the customers in Cluster0.
Wouldn't it be interesting to explore each cluster by a numeric value, or each numeric value by cluster? That's why we have the By Cluster and By Numeric Value Tabs - each variable and cluster is analyzed by its minimum, maximum, first and third quartiles, etc.
There is also a Cluster Visualization Tab that shows the relationship between two variables and how they are distributed. You can change the measures to see different clusters and their distributions.
Last but not least, on the Details Tab, you can find a detailed table where you can see all relevant values which were used for the above results.
With the right dataset and a few clicks, you will get results that can considerably help your business - general segmentation helps you create marketing and business strategies for each detected group. It's all up to you now: collect your data and start modeling.
With the Binary Classification model, you can analyze feature importance in a binary column with two distinct values. This model also predicts likely outcomes based on various parameters. To achieve optimal results, we'll cover the basics of the Model Scenario, where you will select parameters related to your dataset and the model itself.
To run the scenario, you need to have a Target Feature, which must be a binary column. This means it should contain only two distinct values, such as Yes/No or 1/0.
In the next step, select the Model Features you wish to analyze. All features that fit into the model are selected by default, but you may deselect any features you do not want to use. Graphite Note automatically preprocesses your data for model training, excluding features that are unsuitable. You can view the list of excluded features and the reasons for their exclusion on the right side of the screen.
Moving forward, you'll see a comprehensive list of preprocessing steps that Graphite Note will apply to prepare your data for training. This enhances data quality, ensuring your model produces accurate results. Typically, these steps are performed by data scientists, but with our no-code machine learning platform, Graphite Note handles it for you. After reviewing the preprocessing steps, you can finish and Run Scenario.
The training duration may vary depending on the data volume, typically ranging from 1 to 10 minutes. The training will utilize 80% of the data to train various machine learning models and the remaining 20% to test these models and calculate relevant scores. Once completed, you will receive information about the best model based on the F1 value and details about training time.
To interpret the results after running your model, go to the Performance tab. Here, you can see the overall model performance post-training. Model evaluation metrics such as F1 Score, Accuracy, AUC, Precision, and Recall are displayed to assess the performance of classification models. Details on model metrics can also be found on the Accuracy Overview tab.
On the performance tab, you can explore six different views that provide insights related to model training and results: Key Drivers, Impact Analysis, Model Fit, Accuracy Overview, Training Results and Details.
Key Drivers indicate the importance of each column (feature) for the Model's predictions. The higher the reliance of the model on a feature, the more critical it is. Graphite uses permutation feature importance to determine these values.
The Impact Analysis tab allows you to select various features and analyze, using a bar chart, how changes in each feature affect the target feature. You can switch between Count and Percentage views.
The Model Fit Tab displays the performance of the trained model. It includes a stacked bar chart with percentages showing correct and incorrect predictions for binary values (1 or 0, Yes or No).
The Accuracy Overview tab features a Confusion Matrix to highlight classification errors, making it simple to identify if the model is confusing two classes. For each class, it summarizes the number of correct and incorrect predictions. Find out more about Classification Confusion Matrix in our Understanding ML section.
On the Accuracy Overview tab, you'll find detailed information on correct and incorrect predictions (True positives and negatives / False positives and negatives). Model metrics are explained at the bottom of the section.
In the Training Results Tab, you will find information about all the models automatically considered during the training process. Graphite ran several machine learning algorithms suitable for binary classification problems, using 80% of the data for training and 20% for testing. The best model, based on the F1 score, is chosen and marked in green in the models list.
The Details tab shows the results of the predictive model, presented in a table format. Each record includes the predicted label, predicted probability, and predicted correctness, offering insights into the model's predictions, confidence, and accuracy for each data point. Dataset test results can be exported to Excel by clicking the XLSX button in the right corner.
Once the model is trained, you can use it to predict future values, solve binary classification problems, and drive business decisions. Here are ways to take action with your Binary Classification model:
In Graphite Note, you can generate Actionable Insights using the Actionable Insights Input Form. Here, you can provide specific details about your business and objectives. This data is then combined with model training results (e.g., Binary Classification with Key Drivers) to produce a tailored analytics narrative aligned with your goals.
Actionable Insights leverage generative AI models to deliver these results. These insights are conclusions drawn from data that can be directly turned into actions or responses. You can access
Actionable Insights from the main navigation menu, provided you are subscribed to a Graphite Note plan that includes actionable insights queries.
After building and analyzing a predictive model using Graphite Note, the "Predict" function allows you to apply the model to new data. This enables you to forecast outcomes or target variables based on different feature combinations, providing actionable insights for decision-making.
You can share your prediction results with your team using the Notebook feature. With Notebooks, users can also run their own predictions on your Binary Classification model.
Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the Data Storytelling section.
Companies often spend a lot of time managing items or entities that contribute little to the profit margin. Not every item or entity in your shop has equal value - some cost more, some are used more frequently, and some are both. This is where ABC Pareto analysis steps in, helping companies focus on the right items and entities.
ABC analysis is a classification method in which items/entities are divided into three categories, A, B, and C.
Category A is typically the smallest category and consists of the most important items/entities ('the vital few'), while category C is the largest category and consists of the least valuable items/entities ('the trivial many').
To create the analysis, you need to define two parameters:
ID column - This represents the unique identifier or description of each entity being analyzed, such as a product ID or product name.
Numeric column - This is a measurable value used to categorize items into A, B, or C classes based on their relative importance. Common metrics include total sales volume, revenue, or usage frequency.
Since ABC inventory analysis divides items into 3 categories, let's analyze these categories by checking the Model Results. The results consist of 4 tabs: Overview, ABC Summary, Pareto Chart, and Details Tabs.
The Overview tab provides an actionable summary that supports data-driven decision-making by focusing on high-impact areas within the dataset. You’ll find a structured breakdown of entities within a chosen dimension (e.g., product_id), categorized based on a specific metric (e.g., price). This analysis highlights the contributions of different entities, focusing on the most impactful ones.
Key highlights in the Overview tab include:
• Category Breakdown: The dimension is divided into three categories:
• Category A: Top contributors representing few entities with a large share of the total metric.
• Category B: Mid-range contributors with moderate impact and growth potential.
• Category C: The largest group with the least individual impact.
• ABC Analysis Process: Explanation of sorting entities, calculating cumulative totals, and dynamically determining category boundaries based on cumulative contributions.
• Benefits and Next Steps: Highlights key points of the analysis. Encourages reviewing the Pareto Chart for visual insights, exploring detailed metrics, and identifying high-impact entities for strategic action.
• The left chart shows the percentage of entities in each category (A, B, and C), illustrating how they are divided within the selected dimension (product_id).
• The right chart highlights each category’s contribution to the total metric (freight_price), showing how a smaller portion of entities (Category A) accounts for the majority of the impact, while the larger portion (Category C) has a lesser effect.
Together, these charts emphasize the purpose of ABC Analysis: to identify the “vital few” entities (Category A) that drive the most value, supporting targeted decision-making.
In the picture above, we can see that 33.77% of the items belong to category A and they represent 50.55% of the total value, meaning the biggest profit comes from the items in category A!
ABC analysis, also called Pareto analysis, is based on the Pareto principle, which says that 80% of the results (output) come from 20% of the efforts (input). The Pareto Chart combines a bar graph and a line graph: each bar represents an item/entity in descending order, the height of the bar represents the value of that item/entity, and the curved orange line represents the cumulative percentage.
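The underlying arithmetic is straightforward: sort entities by the metric, accumulate their share of the total, and cut the running percentage into A, B, and C. Here is a minimal sketch with hypothetical column names and fixed 80/95 thresholds (Graphite Note determines the boundaries dynamically):

```python
# Hypothetical sketch of the ABC calculation behind the Pareto Chart.
import pandas as pd

df = pd.read_csv("products.csv")                       # columns: product_id, revenue
abc = df.groupby("product_id")["revenue"].sum().sort_values(ascending=False).to_frame()
abc["cum_pct"] = abc["revenue"].cumsum() / abc["revenue"].sum() * 100

def category(cum_pct: float) -> str:
    if cum_pct <= 80:
        return "A"                                     # 'the vital few'
    if cum_pct <= 95:
        return "B"
    return "C"                                         # 'the trivial many'

abc["category"] = abc["cum_pct"].apply(category)
print(abc.head())
```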
The Details tab provides a granular view of the dataset resulting from the ABC Analysis. Each row represents an entity along with the following key details:
• The metric used for categorization, indicating each entity’s contribution.
• The category assigned to each entity (A, B, or C) based on its relative impact.
• The cumulative percentage contribution of each entity to the total freight price, showing its share within the dataset.
This detailed breakdown allows users to identify specific high-impact entities in Category A, moderate contributors in Category B, and lower-impact entities in Category C, supporting data-driven prioritization and decision-making.
There is a long list of benefits from including ABC analysis in your business, such as improved inventory optimization and forecasting, reduced storage expenses, strategic pricing of the products, etc. With Graphite, all you have to do is upload your data, create the desired model, and explore the results.
Our intelligent system observes customers' shopping behavior without getting into the nitty-gritty technical details. It watches how recently each customer made a purchase, how often they come back, and how much they spend. The system notices patterns and groups customers accordingly.
This smart system doesn't need you to say, "Anyone who spends over $1000 is a champion." It figures out on its own who the champions are by comparing all the customers to one another.
When we talk about 'champion' customers in the context of RFM analysis, we're referring to those who are the most engaged, recent, and valuable. The system's approach to finding these champions is quite intuitive yet sophisticated.
Here's how it operates:
Observation: Just like a keen observer at a social event, the system starts by watching—collecting data on when each customer last made a purchase (Recency), how often they've made purchases over a certain period (Frequency), and how much they've spent in total (Monetary).
Comparison: Next, the system compares each customer to every other customer. It looks for natural groupings—clusters of customers who exhibit similar purchasing patterns. For example, it might notice a group of customers who shop frequently, no matter the amount they spend, and another group that makes less frequent but more high-value purchases.
Group Formation: Without being told what criteria to use, the system uses the data to form groups. Customers with the most recent purchases, highest frequency, and highest monetary value start to emerge as one group—these are your potential 'champions.' The system does this by measuring the 'distance' between customers in terms of RFM factors, grouping those who are closest together in their purchasing behavior.
Adjustment: The system then iterates, refining the groups by moving customers until the groups are as distinct and cohesive as possible. It's a process of adjustment and readjustment, seeking out the pattern that best fits the natural divisions in the data.
Finalization: Once the system settles on the best grouping, it has effectively ranked customers, identifying those who are the most valuable across all three RFM dimensions. These are your 'champions,' but the system also recognizes other groups, like new customers who've made a big initial purchase or long-time customers who buy less frequently but consistently.
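For the technically curious, the five steps above can be approximated in a few lines: compute R, F, and M per customer, scale them so 'distance' is meaningful, and let a clustering algorithm such as k-means form the groups. This is an illustrative sketch with hypothetical column names, not Graphite Note's internal code:

```python
# Hypothetical sketch of RFM grouping with k-means.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])
snapshot = tx["order_date"].max()                      # 'today' for recency purposes
rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

X = StandardScaler().fit_transform(rfm)                # make RFM 'distances' comparable
rfm["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X)
# The cluster with low recency, high frequency, and high monetary value
# is the 'champions' group.
print(rfm.groupby("cluster").mean())
```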
By using this method, the system takes on the complex task of understanding the many ways customers can be valuable to a business. It provides a nuanced view that goes beyond simple categorizations, recognizing the diversity of customer value. The result is a highly tailored strategy for customer engagement that aligns perfectly with the actual behaviors observed, allowing businesses to interact more effectively with each segment, especially the 'champions' who drive a significant portion of revenue.
Here’s why this machine learning approach is more powerful than manual labeling:
Adaptive Learning: The system continuously learns and adapts based on actual behavior, not on pre-set rules that might miss the nuances of how customers are interacting right now.
Time Efficiency: It saves you a mountain of time. No more going through lists of transactions manually to score each customer. The system does it instantly.
Personalized Grouping: Because it’s based on actual behavior, the system creates groups that are tailor-made for your specific customer base and business model, rather than relying on broad, one-size-fits-all categories.
Scalability: Whether you have a hundred customers or a million, this smart system can handle the job. Manual scoring becomes impractical as your customer base grows.
Unbiased Decisions: The system is objective, based purely on data. There’s no risk of human bias that might categorize customers based on assumptions or incomplete information.
In essence, this smart approach to customer grouping helps businesses focus their energy where it counts, creating a personalized experience for each customer, just like a thoughtful host at a party who knows exactly who likes what. It’s about making everyone feel special without having to ask them a single question.
In the RFM model in Graphite Note, the intelligent system categorizes customers into segments based on their Recency (R), Frequency (F), and Monetary (M) values, assigning scores from 0 to 4 for each of these three dimensions. With five scoring options for each RFM category (including the '0' score), this creates a comprehensive grid of potential combinations—resulting in a total of 125 unique segments (5 options for R x 5 options for F x 5 options for M = 125 segments).
This segmentation allows for a high degree of specificity. Each customer falls into a segment that accurately reflects their interaction with the business. For example, a customer who recently made a purchase (high Recency), buys often (high Frequency), and spends a lot (high Monetary) could fall into a segment scored as 4-4-4. This would indicate a highly valuable 'champion' customer.
On the other hand, a customer who made a purchase a long time ago (low Recency), buys infrequently (low Frequency), but when they do buy, they spend a significant amount (high Monetary), might be scored as 0-0-4, placing them in a different segment that suggests a different engagement strategy.
By scoring customers on a scale from 0 to 4 across all three dimensions, the business can pinpoint exact customer profiles. This precision allows for highly tailored marketing strategies. For example, those in the highest scoring segments might receive exclusive offers as a reward for their loyalty, while those in segments with room for growth might be targeted with re-engagement campaigns.
The use of 125 segments ensures that the business can differentiate not just between generally good and poor customers, but between various shades of customer behavior, tailoring approaches to nurture the potential value of each unique segment. This granularity facilitates nuanced understanding and actionability for marketing, sales, and customer relationship management.
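As a rough illustration of how 0-4 scores and 125 segments can arise (the exact scoring in Graphite Note may differ), the sketch below bins each dimension into five quantiles, continuing from the rfm table in the previous sketch:

```python
# Hypothetical sketch: 0-4 scores per dimension, 5 x 5 x 5 = 125 segments.
import pandas as pd

rfm["r_score"] = pd.qcut(rfm["recency"], 5, labels=[4, 3, 2, 1, 0])    # recent = high
# rank(method="first") breaks ties so qcut always finds five distinct bins.
rfm["f_score"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[0, 1, 2, 3, 4])
rfm["m_score"] = pd.qcut(rfm["monetary"], 5, labels=[0, 1, 2, 3, 4])

# "4-4-4" marks a champion; "0-0-4" a long-gone but big-spending customer.
rfm["segment"] = (rfm["r_score"].astype(str) + "-" +
                  rfm["f_score"].astype(str) + "-" +
                  rfm["m_score"].astype(str))
```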
Wouldn't it be great to tailor your marketing strategy to the identified groups of customers? That way, you can target each group with personalized offers, increase profit, improve unit economics, and more.
Recency - how long it’s been since a customer bought something from you or visited your website
Frequency - how often a customer buys from you, or how often they visit your website
Monetary - the average spend of a customer per visit, or the overall transaction value in a given period
Let's go through the RFM analysis inside Graphite Note. The dataset on which you will run your RFM Model must contain a time-related column, given that this report studies customer behavior over a period of time.
We need to distinguish all customers, so we need an identifier variable like Customer ID.
If you have data about Customer Names, great; if not, don't worry - just select the same column as in the Customer ID field.
Finally, we need to choose the numeric variable against which we will observe customer behavior, called Monetary (amount spent).
That's it, you are ready to run your first RFM Model.
On the RFM Scores Tab, we have an overview of the customers and their scores:
Then you have a ranking of each RFM segment (125 of them) represented in a table.
And finally, a chart showing the number of customers per RFM score.
Based on their combined RFM scores, customers are divided into the following groups:
lost customer
hibernating customer
can-not-lose customer
at-risk customer
about-to-sleep customer
need-attention customer
promising customer
new customer
potential loyal customer
loyal customer
champion customer.
All information related to these groups of customers, such as the number of customers, average monetary, average frequency, and average recency per group, can be found in the RFM Analysis Tab.
There is also a table at the end to summarize everything.
According to the Recency factor, which is defined as the number of days since the last purchase, we divide customers into 5 groups:
lost
lapsing
average activity
active
very active.
In the Recency Tab, we observe the behavior of the above groups, such as the number of customers, average monetary, average frequency, and average recency per group.
As Frequency is defined as the total number of purchases, customers can buy:
very rarely
rarely
regularly
frequently
very frequently.
Monetary is defined as the amount of money the customer spent, so the customer can be a:
very low spender
low spender
medium spender
high spender
very high spender.
All the values related to the first five tabs, with much more, can be found on the Details Tab, in the form of a table.
The RFM model columns outlined in your system provide a structured way to understand and leverage customer purchase behavior. Here’s how each column benefits the end user of the model:
Monetary: Indicates the total revenue a customer has generated. This helps prioritize customers who have contributed most to your revenue.
Avg_monetary: Shows the average spend per transaction. This can be used to gauge the spending level of different customer segments and tailor offers to match their spending habits.
Frequency: Reflects how often a customer purchases. This can inform retention strategies and indicate who might be receptive to more frequent communication.
Recency: Measures the time since the last purchase. This can help target re-engagement campaigns to customers who have recently interacted with your business.
Date_of_last_purchase & Date_of_first_purchase: These dates help track the customer lifecycle and can trigger communications at critical milestones.
Customer_age_days: The duration of the customer relationship. Long-standing customers might benefit from loyalty programs, while newer customers might be encouraged with welcome offers.
Recency_cluster, Frequency_cluster, and Monetary_cluster: These categorizations allow for segmentation at a granular level, helping customize strategies for groups of customers who share similar characteristics.
Rfm_cluster: This overall grouping combines recency, frequency, and monetary values, offering a holistic view of a customer's value and engagement, essential for creating differentiated customer journeys.
Recency_segment_name, Frequency_segment_name, and Monetary_segment_name: These descriptive labels provide intuitive insights into customer behavior and make it easier to understand the significance of each cluster for strategic planning.
Fm_cluster_sum: This score is a combined metric of frequency and monetary clusters, useful in prioritizing customers who are both frequent shoppers and high spenders.
Fm_segment_name and Rfm_segment_name: These labels offer a quick reference to the type of customer segment, simplifying the task of identifying and applying targeted marketing actions.
Seeking assurance about the model's accuracy and effectiveness? Here's how you can address these concerns:
Validation with Historical Data: Show how the model’s predictions align with actual customer behaviors observed historically. For instance, demonstrate how high RFM scores correlate with customers who have proven to be valuable.
Segmentation Analysis: Analyze the characteristics of customers within each RFM segment to validate that they make sense. For example, your top-tier RFM segment should clearly consist of customers who are recent, frequent, and high-spending.
Control Groups: Create control groups to test marketing strategies on different RFM segments and compare the outcomes. This can validate the effectiveness of segment-specific strategies suggested by the model.
A/B Testing: Implement A/B testing where different marketing approaches are applied to similar customer segments to see which performs better, thereby showcasing the model's utility in identifying the right targets for different strategies.
Benchmarking: Compare the RFM model’s performance against other segmentation models or against industry benchmarks to establish its effectiveness.
Do you wonder whether the changes you've made in your business have impacted new customers? Do you want to understand the needs of your user base or identify trends? You can do all that and much more with our new model, Customer Cohort Analysis.
A cohort is a subset of users or customers grouped by common characteristics or by their first purchase date. Cohort analysis is a type of behavioral analytics that allows you to track and compare the performance of cohorts over time.
With Graphite, you are only a few steps away from your Cohort model. Once you have selected your dataset, it is time to enter the parameters into the model. The Time/Date Column represents a time-related column.
After that, you have to select the Aggregation level.
For example, if monthly aggregation is selected, Graphite will generate Cohort Analysis with a monthly frequency.
Also, your dataset must contain Customer ID and Order ID/ Trx ID columns as required model parameters.
Last but not least, you have to select the Monetary (amount spent) variable, which represents the main parameter for your Cohort Analysis.
Additionally, you can break down and filter Cohorts by a business dimension (variable) which you select after you enable the checkbox.
That's it, your first Customer Cohort Analysis model is ready.
After you run your model, the first tab that appears is the Cohorts Tab.
Depending on the metric (the default is No of Customers), the results are presented as a heatmap.
In the example above, customers are grouped by the year in which they made their first purchase. Column 0 represents the number of customers per cohort (i.e., 4255 customers made their first purchase in 2018). Now we can see their activity year over year: 799 customers came back in 2019, 685 in 2020, and 118 in 2021.
If you switch your metric to Percentage, you will get results in percentages.
Let's track our Monetary column (in our case, the total amount spent per customer) and switch the metric to Amount to see how much money our customers spend through the years.
As you can see above, customers who made their first order in 2018 have spent 46.25M, and the 799 customers who came back in 2019 have spent 12.38M.
If you want to track the total amount spent through the years, switch the metric to Amount (Cumulative).
Essentially, we tracked the long-term relationships we have with our given groups (cohorts). On the other hand, we can compare different cohorts at the same stage in their lifetime. For example, for all cohorts, we can see the average revenue per customer two years after the first purchase: the average revenue per customer in the 2019 cohort (12.02K) is almost half that of the 2018 cohort (21.05K). Here is an opportunity to see what went wrong and shape a new business strategy.
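The heatmap behind these numbers is a pivot of first-purchase periods against activity periods. A minimal sketch with hypothetical column names:

```python
# Hypothetical sketch of the cohort table: customers per cohort year and period.
import pandas as pd

tx = pd.read_csv("orders.csv", parse_dates=["order_date"])
tx["order_year"] = tx["order_date"].dt.year
tx["cohort_year"] = tx.groupby("customer_id")["order_year"].transform("min")
tx["period"] = tx["order_year"] - tx["cohort_year"]    # column 0, 1, 2, ...

cohorts = tx.pivot_table(index="cohort_year", columns="period",
                         values="customer_id", aggfunc="nunique")
retention_pct = cohorts.div(cohorts[0], axis=0) * 100  # the Percentage metric
print(cohorts)
```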
If you broke down and filtered cohorts by a variable with fewer than 20 distinct values (the Repeat by parameter in the Model Scenario), you will get a separate Cohort Analysis for each value in the Repeat by Tab.
All the values related to the Cohorts and Repeat by Tabs, with much more, can be found on the Details Tab, in the form of a table.
Now it's your turn to track your customers' behavior, see when the best time for remarketing is, and learn how to improve customer retention.
In this report, we want to divide customers into returning and new customers (the most fundamental type of customer segmentation). New customers have made only one purchase from your business, while returning customers have made more than one.
Let’s go through their basic characteristics.
New customers are:
forming the foundation of your customer base
telling you whether your marketing campaigns are working (what to improve in your current offerings, what to add to your repertoire of products or services),
while returning customers are:
giving you feedback on your business (if you have a high number of returning customers it suggests that customers are finding value in your products or service)
saving you a lot of time, effort, and money.
Let's go through the New vs returning customer analysis inside Graphite. The dataset on which you will run your model must contain a time-related column.
Since the dataset contains data for a certain period, it's important to choose the aggregation level.
For example, if weekly aggregation is selected, Graphite will generate a new vs returning customers dataset with a weekly frequency.
Your dataset must also contain a Customer ID column.
Additionally, if you want, you can choose the Monetary (amount spent) variable.
With Graphite, compare absolute figures and percentages, and learn how many customers you are currently retaining on a daily, weekly, or monthly basis.
Depending on the aggregation level, the New vs Returning Tab shows the number of new and returning customers detected in each period.
For example, in December 2020, there were a total of 2.88k customers, of which 1.84K were new and 1.05K returning. You can also choose a daily representation that is more precise.
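Conceptually, the split is simple: a customer counts as new in the period of their first-ever purchase and as returning in any later period they appear in. A minimal sketch with hypothetical column names and monthly aggregation:

```python
# Hypothetical sketch of the new-vs-returning split per month.
import pandas as pd

tx = pd.read_csv("orders.csv", parse_dates=["order_date"])
tx["month"] = tx["order_date"].dt.to_period("M")       # the aggregation level
first_month = tx.groupby("customer_id")["month"].transform("min")
tx["type"] = (tx["month"] == first_month).map({True: "new", False: "returning"})

summary = tx.groupby(["month", "type"])["customer_id"].nunique().unstack(fill_value=0)
summary["retention_pct"] = summary["returning"] / summary.sum(axis=1) * 100
print(summary.tail())
```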
If you are interested in retention - the percentage of returning customers over a period - use the Retention % Tab.
Last but not least, on the Details Tab, you can find a detailed table where you can see all relevant values which were used for the above results.
Detecting early signs of reduced customer engagement is pivotal for businesses aiming to maintain loyalty. A notable signal of this disengagement is when a customer's once regular purchasing pattern starts to taper off, leading to a significant decrease in activity. Early detection of such trends allows marketing teams to take swift, proactive measures. By deploying effective retention strategies, such as offering tailored promotions or engaging in personalized communication, businesses can reinvigorate customer interest and mitigate the risk of losing them to competitors.
Our objective is to utilize a model that not only alerts us to customers with an increased likelihood of churn but also forecasts their potential purchasing activity and, importantly, estimates the total value they are likely to bring to the business over time.
These analytical needs are served by what is known in data science as Buy 'Til You Die (BTYD) models. These models track the lifecycle of a customer's interaction with a business, from the initial purchase to the last.
While customer churn models are well-established within contractual business settings, where customers are bound by the terms of service agreements, and churn risk can be anticipated as contracts draw to a close, non-contractual environments present a different challenge. In such settings, there are no defined end points to signal churn risk, making traditional classification models insufficient.
To address this complexity, our model adopts a probabilistic approach to customer behavior analysis, which does not rely on fixed contract terms but on behavioral patterns and statistical assumptions. By doing so, we can discern the likelihood of future transactions for every customer, providing a comprehensive and predictive understanding of customer engagement and value.
The Customer Lifetime Value (CLV) model is a robust tool employed to ascertain the projected revenue a customer will contribute over their entire relationship with a business. The model employs historical data to inform predictive assessments, offering valuable foresight for strategic decision-making. This insight assists companies in prioritizing resources and tailoring customer engagement strategies to maximize long-term profitability.
The CLV model executes a series of sophisticated calculations. Yet, its operations can be conceptualized in a straightforward manner:
Historical Analysis: The model comprehensively evaluates past customer transaction data, noting the frequency and monetary value of purchases alongside the tenure of the customer relationship.
Engagement Probability: It assesses the likelihood of a customer’s future engagement based on their past activities, effectively estimating the chances of a customer continuing to transact with the business.
Forecasting: With the accumulated data, the model projects the customer’s future transaction behavior, predicting how often they will make purchases and the potential value of these purchases.
Lifetime Value Calculation: Integrating these elements, the model calculates an aggregate figure representing the total expected revenue from a customer for a designated future period.
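For readers who want to see these four steps in code, the open-source lifetimes library implements well-known BTYD models (BG/NBD for purchase counts, Gamma-Gamma for spend). The sketch below is an illustration under those assumptions - column names are hypothetical, and this is not necessarily Graphite Note's implementation:

```python
# Hypothetical BTYD sketch: history -> engagement probability -> forecast -> CLV.
import pandas as pd
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.utils import summary_data_from_transaction_data

tx = pd.read_csv("orders.csv", parse_dates=["order_date"])
summary = summary_data_from_transaction_data(
    tx, "customer_id", "order_date", monetary_value_col="amount"
)

bgf = BetaGeoFitter(penalizer_coef=0.001)              # BG/NBD: purchase behavior
bgf.fit(summary["frequency"], summary["recency"], summary["T"])

# Probability-weighted number of purchases expected in the next 90 days.
summary["pred_purchases_90d"] = bgf.conditional_expected_number_of_purchases_up_to_time(
    90, summary["frequency"], summary["recency"], summary["T"]
)

repeaters = summary[summary["frequency"] > 0]          # Gamma-Gamma needs repeat buyers
ggf = GammaGammaFitter(penalizer_coef=0.001)
ggf.fit(repeaters["frequency"], repeaters["monetary_value"])
clv = ggf.customer_lifetime_value(
    bgf, repeaters["frequency"], repeaters["recency"], repeaters["T"],
    repeaters["monetary_value"], time=3, discount_rate=0.01,  # three months ahead
)
print(clv.head())
```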
The Customer Lifetime Value model uses historical customer data to predict the future value a customer will generate for a business. It leverages algorithms and statistical techniques to analyze customer behavior, purchase patterns, and other relevant factors to estimate the potential revenue a customer will bring over their lifetime.
The dataset on which you will run your model must contain a time-related column.
We need to distinguish all customers, so we need an identifier variable like Customer ID. If you might have data about Customer Names, great, if not, don't worry, just select the same column as in the Customer ID field.
We need to choose the numeric variable regard to which we will observe customer behavior, called Monetary (amount spent).
Finally, you need to choose the Starting Date from which you'd like to calculate this model for your dataset.
When you're looking at this option for calculating Customer Lifetime Value (CLV), think of it as setting a starting line for a race. The "race" in this case is the journey you're tracking: how much your customers will spend over time.
The "Starting Date for Customer Lifetime Value Calculation" is basically asking you when you want to start watching the race. You have a couple of choices:
Max Date: This is like saying, "I want to start watching the race from the last time we recorded someone crossing the line." It sets the starting point at the most recent date in your records where a customer made a purchase.
Today: Choosing this means you want to start tracking from right now, today. So any purchases made after today will count towards the CLV.
-- select date --: This would be an option if you want to pick a specific date to start from, other than today or the most recent date in your data.
Let's see how to interpret the results after we have run our model.
On the summary of repeat customers, we have:
the Total Repeat Customers: the customers that keep returning (the loyal customers)
the Total Historical Amount: the past earnings from loyal customers
the Average Spend per Repeat Customer
the Average no. of Repeat Purchases: shows the customers' loyalty with the average number of repeat purchases
the Average Probability Alive Next 90 days: estimates the likelihood that a customer stays active with your business in the next 90 days
the Predicted no. of Purchases Next 90 days: the number of purchases you can expect in the next 90 days based on our analysis
the Predicted Amount Next 90 days: the revenue you can expect in the next 90 days
CLV (Customer Lifetime Value): the average revenue that one customer generated in the past and will generate in the future
The CLV Insights Tab shows some charts on the lifetime of customers.
The forecasted number of purchases chart estimates the number of purchases that are expected to be made by returning customers over a specific period.
The forecasted amount chart is a graphical representation of the projected value of purchases to be made by returning customers over a certain period.
Finally, the average alive probability chart illustrates the average probability of a customer remaining active for a business over time, assuming no repeat purchases.
Last but not least, on the Details Tab, you can find a detailed table where you can see all relevant values which were used for the above results.
You can find full information about each column by clicking the link on the Details tab.
The Details Tab within the Customer Lifetime Value Model offers an extensive breakdown of metrics for in-depth analysis. Each column represents a specific aspect of customer data that is pivotal to understanding and predicting customer behavior and value to your business. Below are the descriptions of the available columns:
amount_sum
Description: This column showcases the total historical revenue generated by an individual customer. By analyzing this data, businesses can identify high-value customers and allocate marketing resources efficiently.
amount_count
Description: Reflects the total number of purchases by a customer. This frequency metric is invaluable for loyalty assessments and can inform retention strategies.
repeated_frequency
Description: Indicates the frequency of repeated purchases, highlighting customer loyalty. This metric can be leveraged for targeted engagement campaigns.
customer_age
Description: The duration of the customer's relationship with the business, measured in days since their first purchase. It helps in segmenting customers based on the length of the relationship.
average_monetary
Description: Average monetary value per purchase, providing insight into customer spending habits. Businesses can use this to predict future revenue from a customer segment.
probability_alive
Description: Displays the current probability of a customer being active. A score of 1 means a 100% probability that the customer is active, which helps prioritize engagement efforts.
probability_alive_7_30_60_90_365
Description: This column shows the probability of customers remaining active over various time frames without repeat purchases. It's critical for developing tailored customer retention plans.
predicted_no_purchases_7_30_60_90_365
Description: Predicts the number of future purchases within specific time frames. This forecast is essential for inventory planning and sales forecasting.
CVL_30_60_90_365
Description: Estimates potential customer value over different time frames, aiding in strategic financial planning and budget allocation for customer acquisition and retention.
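As a rough illustration, the historical columns above can be reconstructed with a pandas groupby, while the model-based columns come from a fitted model such as the BG/NBD sketch earlier; the exact definitions inside Graphite Note may differ:

```python
# Historical columns, named after the Details-tab columns above.
details = tx.groupby("customer_id").agg(
    amount_sum=("amount", "sum"),
    amount_count=("amount", "count"),
    first_purchase=("order_date", "min"))
details["repeated_frequency"] = details["amount_count"] - 1  # purchases after the first
details["customer_age"] = (snapshot - details["first_purchase"]).dt.days
details["average_monetary"] = details["amount_sum"] / details["amount_count"]

# Model-based columns, reusing the fitted bgf and summary frame from the
# earlier sketch (assumes both frames list customers in the same order).
details["probability_alive"] = bgf.conditional_probability_alive(
    summary["frequency"], summary["recency"], summary["T"])
details["predicted_no_purchases_90"] = \
    bgf.conditional_expected_number_of_purchases_up_to_time(
        90, summary["frequency"], summary["recency"], summary["T"])
```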
In this given example, we have a snapshot of customer data from the CLV model. The model considers various unique aspects of customer behavior to predict future engagement and value. Let's analyze the key data points and what they signify in a non-technical way, while emphasizing the model’s ability to tailor predictions to individual customer behavior:
amount_sum: This customer has brought in a total revenue of $4,584.14 to your business.
amount_count: They have made 108 purchases, which shows a high level of engagement with your store.
repeated_frequency: Out of these purchases, 106 are repeat purchases, suggesting strong customer loyalty.
customer_age: They have been a customer for 364 days, indicating a relatively long-term relationship with your business.
average_monetary: On average, they spend about $42.73 per transaction.
probability_alive: There’s an 85% to 86% chance that they are still actively engaging with your business, which is quite high.
probability_alive_7: Specifically, the probability that this customer will remain active in the next 7 days is about 44.48%.
Alex, with a remarkable 106 repeated purchases and a customer_age of 364 days, has shown a pattern of strong and consistent engagement. The average monetary value of their purchases is $42.73, contributing significantly to the revenue with a total amount_sum of $4,584.14. The current probability_alive is high, indicating Alex is likely still shopping.
However, even with this consistent past behavior, the probability_alive_7 drops to about 44.48%. This reflects a nuanced understanding of Alex's habits: because Alex buys so regularly, even a short gap is out of character, so the model reacts strongly when a week passes without a purchase.
On the other hand, we have Casey, who has made 2 purchases, with only 1 being a repeated transaction. Casey’s amount_sum is $185.93, with an average_monetary value of $84.44, and a customer_age of 135 days. Despite a high current probability_alive, the model shows a minimal decline to 83.73% in the probability_alive_7.
This slight decrease tells us that Casey's engagement is inherently more sporadic. The business doesn't expect Casey to make purchases with the same regularity as Alex. If Casey doesn't return for a week, it isn't alarming or out of character, as reflected in the gentle decline in their seven-day active probability.
The contrast in these profiles, painted by the CLV model, enables the business to craft distinct customer journeys for Alex and Casey. For Alex, it's about ensuring consistency and rewarding loyalty to maintain that habitual engagement. Perhaps an automated alert for engagement opportunities could be set up if they don't make their usual purchases.
For Casey, the strategy may involve creating moments that encourage repeat engagement, possibly through sporadic yet impactful touchpoints. Since Casey's behavior suggests openness to larger purchases, albeit less frequently, the focus could be on highlighting high-value items or exclusive offers that align with their sporadic engagement pattern.
The CLV model's behavioral predictions allow the business to personalize customer experiences, maximize the potential of each interaction, and strategically allocate resources to maintain and grow the value of each customer relationship over time. This bespoke approach is the essence of modern customer relationship management, as it aligns perfectly with the individualized tendencies of customers like Alex and Casey.
This detailed data is a treasure trove for businesses keen on data-driven decision-making. Here’s how to utilize the information effectively:
Custom Segmentation: Use customer_age, amount_sum, and average_monetary to segment your customers into meaningful groups, as sketched after this list.
Detect Churners: Use probability_alive to flag which customers are currently active, which suits non-contractual businesses such as eCommerce and Retail. A score of 0.1 means a 10% probability that the customer is active ("alive") for your business.
Targeted Marketing Campaigns: Leverage the repeated_frequency and probability_alive columns to identify customers for loyalty programs or re-engagement campaigns.
Revenue Projections: The CVL_30_60_90_365 column helps in projecting future revenue and understanding the long-term value of customer segments.
Strategic Planning: Use predicted_no_purchases_7_30_60_90_365 to plan for demand and stock management, and to set realistic sales targets.
By engaging with the columns in the Details Tab, users can extract actionable insights that can drive strategies aimed at optimizing customer lifetime value. Each metric can serve as a building block for a more nuanced, data-driven approach to customer relationship management.
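As referenced in the list above, here is a minimal sketch of the segmentation and churn-detection ideas, building on the hypothetical details frame from the previous snippet; the thresholds (0.10 for churn, three repeat purchases and 0.70 for loyalty) are arbitrary illustrative choices:

```python
# Custom segmentation: quartiles of historical revenue
# (ranking first keeps qcut happy when amounts repeat; assumes
# there are enough customers to form four groups).
details["value_segment"] = pd.qcut(
    details["amount_sum"].rank(method="first"), 4,
    labels=["low", "mid", "high", "top"])

# Detect likely churners: low current probability of being "alive".
details["likely_churned"] = details["probability_alive"] < 0.10

# Loyalty-program candidates: frequent and probably still active.
loyal = details[(details["repeated_frequency"] >= 3)
                & (details["probability_alive"] >= 0.70)]
```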
The model identifies customers based on three key factors: Recency (how recently a customer purchased), Frequency (how often they purchase), and Monetary value (how much they spend).
Now that we know how to run the model in Graphite Note, let's go through the Model Results, which consist of seven tabs. All results are visualized, because a visual summary of information makes it easier to identify patterns than looking through thousands of rows.
RFM model analysis ranks every customer in each of these three categories on a scale of 0 (worst) to 4 (best). An RFM score is then assigned to each customer by concatenating their Recency, Frequency, and Monetary digits; depending on their RFM score, customers can be segmented into meaningful categories, as sketched below.
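For illustration, a minimal pandas sketch of this 0-to-4 scoring and concatenation might look as follows; the input values are invented, and Graphite Note's exact binning may differ:

```python
import pandas as pd

# Hypothetical per-customer RFM inputs.
rfm = pd.DataFrame({
    "recency_days": [5, 40, 120, 300, 10, 75],
    "frequency":    [12,  4,   2,   1,  9,  3],
    "monetary":     [800, 150, 90, 30, 500, 120],
})

# Rank each dimension into five bins, 0 (worst) to 4 (best).
# For recency, fewer days since the last purchase is better, hence the negation.
rfm["R"] = pd.qcut((-rfm["recency_days"]).rank(method="first"), 5, labels=False)
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=False)
rfm["M"] = pd.qcut(rfm["monetary"].rank(method="first"), 5, labels=False)

# Concatenate the three digits into the final RFM score, e.g. "440".
rfm["RFM"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
```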
In the Frequency Tab, you can track the same behavior for the related groups as in the Recency Tab.
In the Monetary Tab, you can likewise track the same behavior for the related groups as in the Recency Tab.
The RFM Matrix Tab shows a matrix with the number of customers, the monetary sum and average, the average frequency, and the average recency, broken down by Recency, Frequency, and Monetary segments.
Now we will go through the Model Results, which consist of three tabs.
We are going to see several different metrics, among them three more: the Average Order Value, the Cumulative Average Order Value, and the Average Revenue per Customer.
The model results consist of four tabs.
The results in the Revenue New vs Returning Tab depend on the Model Scenario: if you selected a monetary variable in the scenario, you can observe its behavior broken down by new and returning customers.
Finally, the results consist of two tabs.