When you first create your model, you have to choose between many model types.
Before running your model's scenario, it helps to understand how the model is processed. First, the model is trained on 80% of the dataset. The remaining 20% is then used to test it and calculate the model score. A high model score means the trained model makes accurate predictions on the held-out test data.
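The 80/20 split can be sketched in a few lines of plain Python. This is a simplified illustration of the idea; Graphite Note's internal splitting logic may differ (for example, it could use stratified sampling):

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle the dataset and split it into training and test portions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)          # shuffle so the split is random
    cut = round(len(rows) * (1 - test_ratio))  # e.g. 80% of the rows
    return rows[:cut], rows[cut:]              # (train, test)

data = list(range(100))                        # stand-in for 100 dataset rows
train, test = train_test_split(data)
print(len(train), len(test))                   # 80 20
```

The model learns from the 80% portion; the 20% portion is kept aside so the score reflects performance on data the model has never seen.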
Data preprocessing is a crucial step in machine learning, enhancing model accuracy and performance by transforming and cleaning the raw data to remove inconsistencies, handle missing values, and scale features, and ensure compatibility with the chosen algorithm.
During preprocessing we can deal with:
Null values: if a column is 50% null or more, it will not be included in model training.
Missing values: for a numerical column, missing entries are replaced by the column average; for a categorical feature, they become "not_available".
One Hot Encoding: categorical data is transformed into numeric values before training a model, making it suitable for machine learning algorithms.
Class imbalance: fixing the unequal distribution of target classes, which is not ideal for training.
Normalization: rescaling the values of numerical columns for better training results.
Constants: if a column has only one unique value (a constant), it will not be included in model training.
Cardinality: if a column has a very high number of unique values, it will not be included in model training.
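Several of these per-column rules can be sketched in plain Python. The 50%-null and constant rules come from the list above; the `max_cardinality` threshold is an illustrative assumption, not Graphite Note's actual value:

```python
def preprocess_column(values, max_cardinality=50):
    """Apply the exclusion/imputation rules to one column's values.

    Returns (cleaned_values, "kept") or (None, reason) if the column
    should be excluded from training.
    """
    non_null = [v for v in values if v is not None]
    # Rule: 50% or more null -> exclude the column entirely.
    if len(non_null) * 2 <= len(values):
        return None, "too many nulls"
    # Rule: a single unique value (constant) carries no signal.
    if len(set(non_null)) == 1:
        return None, "constant column"
    if all(isinstance(v, (int, float)) for v in non_null):
        # Rule: impute missing numeric values with the column average.
        mean = sum(non_null) / len(non_null)
        return [v if v is not None else mean for v in values], "kept"
    # Rule: too many distinct categories -> exclude (high cardinality).
    if len(set(non_null)) > max_cardinality:
        return None, "high cardinality"
    # Rule: impute missing categories with the label "not_available".
    return [v if v is not None else "not_available" for v in values], "kept"

print(preprocess_column([10, None, 30]))   # ([10, 20.0, 30], 'kept')
print(preprocess_column([1, 1, 1]))        # (None, 'constant column')
```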
With the Binary Classification model, you can analyze feature importance in a binary column with two distinct values. This model also predicts likely outcomes based on various parameters. To achieve optimal results, we'll cover the basics of the Model Scenario, where you will select parameters related to your dataset and the model itself.
To run the scenario, you need to have a Target Feature, which must be a binary column. This means it should contain only two distinct values, such as Yes/No or 1/0.
In the next step, select the Model Features you wish to analyze. All features that fit into the model are selected by default, but you may deselect any features you do not want to use. Graphite Note automatically preprocesses your data for model training, excluding features that are unsuitable. You can view the list of excluded features and the reasons for their exclusion on the right side of the screen.
Moving forward, you'll see a comprehensive list of preprocessing steps that Graphite Note will apply to prepare your data for training. This enhances data quality, ensuring your model produces accurate results. Typically, these steps are performed by data scientists, but with our no-code machine learning platform, Graphite Note handles it for you. After reviewing the preprocessing steps, you can finish and Run Scenario.
The training duration may vary depending on the data volume, typically ranging from 1 to 10 minutes. The training will utilize 80% of the data to train various machine learning models and the remaining 20% to test these models and calculate relevant scores. Once completed, you will receive information about the best model based on the F1 value and details about training time.
To interpret the results after running your model, go to the Performance tab. Here, you can see the overall model performance post-training. Model evaluation metrics such as F1 Score, Accuracy, AUC, Precision, and Recall are displayed to assess the performance of classification models. Details on model metrics can also be found on the Accuracy Overview tab.
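Most of these metrics can be computed directly from the four confusion-matrix counts. A plain-Python sketch (AUC is omitted because it requires the full ranking of predicted probabilities, not just the counts):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of predicted positives, how many are right
    recall = tp / (tp + fn)      # of actual positives, how many were found
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }

print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
```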
On the Performance tab, you can explore six different views that provide insights related to model training and results: Key Drivers, Impact Analysis, Model Fit, Accuracy Overview, Training Results, and Details.
Key Drivers indicate the importance of each column (feature) for the Model's predictions. The higher the reliance of the model on a feature, the more critical it is. Graphite uses permutation feature importance to determine these values.
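Permutation feature importance works by shuffling one feature's column and measuring how much the model's score drops. A toy sketch of the technique (Graphite Note's exact implementation may differ, e.g. by averaging several shuffles):

```python
import random

def permutation_importance(predict, X, y, feature_idx, metric, seed=0):
    """Score drop after shuffling one feature column: larger drop = more important."""
    base = metric(y, [predict(row) for row in X])
    shuffled = [row[:] for row in X]
    col = [row[feature_idx] for row in shuffled]
    random.Random(seed).shuffle(col)           # break the feature/target link
    for row, v in zip(shuffled, col):
        row[feature_idx] = v
    permuted = metric(y, [predict(row) for row in shuffled])
    return base - permuted

accuracy = lambda truth, preds: sum(t == p for t, p in zip(truth, preds)) / len(truth)
model = lambda row: row[0]  # a toy "model" that ignores every feature but the first
X = [[0, 1], [1, 0], [0, 0], [1, 1]] * 5
y = [row[0] for row in X]
print(permutation_importance(model, X, y, feature_idx=1, metric=accuracy))  # 0.0
```

Shuffling feature 1 changes nothing because the toy model never looks at it, so its importance is zero; shuffling feature 0 would hurt accuracy and yield a positive importance.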
The Impact Analysis tab allows you to select various features and analyze, using a bar chart, how changes in each feature affect the target feature. You can switch between Count and Percentage views.
The Model Fit Tab displays the performance of the trained model. It includes a stacked bar chart with percentages showing correct and incorrect predictions for binary values (1 or 0, Yes or No).
The Accuracy Overview tab features a Confusion Matrix to highlight classification errors, making it simple to identify if the model is confusing two classes. For each class, it summarizes the number of correct and incorrect predictions. Find out more about Classification Confusion Matrix in our Understanding ML section.
On the Accuracy Overview tab, you'll find detailed information on correct and incorrect predictions (True positives and negatives / False positives and negatives). Model metrics are explained at the bottom of the section.
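A confusion matrix like the one on this tab can be built by counting (actual, predicted) pairs. A minimal sketch:

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

actual    = ["yes", "yes", "no", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "no", "yes"]
m = confusion_matrix(actual, predicted, ["yes", "no"])
print(m)  # [[2, 1], [1, 2]] — diagonal cells are the correct predictions
```

With labels ["yes", "no"], the four cells map to true positives, false negatives, false positives, and true negatives respectively.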
In the Training Results Tab, you will find information about all the models automatically considered during the training process. Graphite ran several machine learning algorithms suitable for binary classification problems, using 80% of the data for training and 20% for testing. The best model, based on the F1 score, is chosen and marked in green in the models list.
The Details tab shows the results of the predictive model, presented in a table format. Each record includes the predicted label, predicted probability, and predicted correctness, offering insights into the model's predictions, confidence, and accuracy for each data point. Dataset test results can be exported to Excel by clicking the XLSX button in the right corner.
Once the model is trained, you can use it to predict future values, solve binary classification problems, and drive business decisions. Here are ways to take action with your Binary Classification model:
In Graphite Note, you can generate Actionable Insights using the Actionable Insights Input Form. Here, you can provide specific details about your business and objectives. This data is then combined with model training results (e.g., Binary Classification with Key Drivers) to produce a tailored analytics narrative aligned with your goals.
Actionable Insights leverage generative AI models to deliver these results. These insights are conclusions drawn from data that can be directly turned into actions or responses. You can access
Actionable Insights from the main navigation menu, provided you are subscribed to a Graphite Note plan that includes actionable insights queries.
After building and analyzing a predictive model using Graphite Note, the "Predict" function allows you to apply the model to new data. This enables you to forecast outcomes or target variables based on different feature combinations, providing actionable insights for decision-making.
Create Notebook
You can share your prediction results with your team using the Notebook feature. With Notebooks, users can also run their own predictions on your Binary Classification model.
Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the Data Storytelling section.
A Timeseries Forecast Model is designed to predict future values by analyzing historical time-related data. To utilize this model, your dataset must include both time-based and numerical columns. In this tutorial, we'll cover the fundamentals of the Model Scenario to help you achieve optimal results. Within the Model Scenario, you'll select parameters related to your dataset and the model itself.
For the Target Column, select a numeric value you want to predict. It's crucial to have values by day, week, or year. If some dates are repeated, you can aggregate them by taking their sum, average, etc.
Next, choose a Sequence Identifier Field to group certain fields and generate an independent forecast for each time series. These values shouldn't be unique; they must form a series.
Then, select the Time/Date Column, specifying the column containing time-related values. The Time Interval represents the data frequency—choose daily for daily data, yearly for annual data, etc. With Forecast Horizon, decide how many days, weeks, or years you want to predict from the last date in your dataset.
The model performs well with seasonal data patterns. If your data shows a linear growth trend, select "additive" for Seasonality Mode; for exponential growth, select "multiplicative." For example, if you see annual patterns, set Yearly Seasonality to True. (TIP: Plotting your data beforehand can help you understand these patterns.) If you're unsure, the model will attempt to detect seasonality automatically.
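The difference between the two seasonality modes can be illustrated with a toy calculation. This is a conceptual sketch of additive vs. multiplicative seasonality, not the forecasting model's internals:

```python
def forecast_point(trend, seasonal_factor, mode):
    """Combine a trend level with a seasonal effect in either mode."""
    if mode == "additive":
        return trend + seasonal_factor        # constant-size seasonal swing
    if mode == "multiplicative":
        return trend * (1 + seasonal_factor)  # swing scales with the trend
    raise ValueError(f"unknown seasonality mode: {mode}")

# A +25-unit bump (additive) vs. a +25% bump (multiplicative) as the trend grows:
print(forecast_point(100, 25, "additive"), forecast_point(1000, 25, "additive"))
# 125 1025 — the seasonal bump stays 25 units at both trend levels
print(forecast_point(100, 0.25, "multiplicative"), forecast_point(1000, 0.25, "multiplicative"))
# 125.0 1250.0 — the bump grows from 25 to 250 as the trend grows
```

This is why additive suits linear growth (swings stay the same absolute size) while multiplicative suits exponential growth (swings grow with the level).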
For daily or hourly intervals, you can access Advanced Parameters to add special dates, weekends, holidays, or limit the target value.
We are constantly enhancing our platform with new features and improving existing models. For your daily data, we've introduced some new capabilities that can significantly boost forecast accuracy. Now, you can limit your target predictions, remove outliers, and include country holidays and special events.
To set prediction limits, enter the minimum and maximum values for your target variable. For example, if you're predicting daily temperatures and know the maximum is 40°C, enter that value to prevent the model from predicting higher temperatures. This helps the model recognize the appropriate range of the Target Column. Additionally, you can use the Remove Days of the Week feature to exclude certain days from your predictions.
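The effect of these two options can be sketched as a simple post-processing step over daily forecasts. This illustrates the behavior only; it is not Graphite Note's internal code:

```python
from datetime import date, timedelta

def postprocess(predictions, lo=None, hi=None, drop_weekdays=()):
    """Clamp each forecast into [lo, hi] and drop excluded days of the week."""
    out = {}
    for day, value in predictions.items():
        if day.weekday() in drop_weekdays:  # Monday=0 ... Sunday=6
            continue                        # e.g. a shop closed on Sundays
        if lo is not None:
            value = max(value, lo)
        if hi is not None:
            value = min(value, hi)          # e.g. temperatures capped at 40
        out[day] = value
    return out

# Three daily forecasts starting Saturday 2024-06-01; cap at 40, drop Sundays.
preds = {date(2024, 6, 1) + timedelta(days=i): v
         for i, v in enumerate([38.0, 41.5, 42.0])}
print(postprocess(preds, hi=40, drop_weekdays={6}))
```

The Sunday forecast is removed entirely, and Monday's 42.0 is clamped down to the 40 maximum.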
We added parameters for country holidays and special dates to improve model accuracy. Large deviations can occur around holidays, where stores see more customers than usual. By informing the model about these holidays, you can achieve more balanced and accurate predictions. To add holidays in Graphite Note, navigate to the advanced section of the Model Scenario and select the relevant country or countries.
Similarly, you can add promotions or events that affect your data. Enter the promotion name, start date, duration, and future dates. This ensures the model accounts for these events in future predictions.
Combining these parameters provides more accurate results. The more information the model receives, the better the predictions.
In addition to adding holidays and special events, you can delete specific data points from your dataset. In Graphite Note, enter the start and end dates of the period you want to remove. For single-day periods, enter the same start and end date. You can remove multiple periods if necessary. Understanding your data and identifying outliers or irrelevant periods is crucial for accurate predictions. Removing these dates can help eliminate biases and improve model accuracy.
By following these steps, you can harness the full potential of your Timeseries Forecast Model, providing valuable insights and more accurate predictions for your business. Now it's your turn to do some modeling and explore your results!
After setting all parameters, it is time to Run Scenario and train the machine learning model.
After running your model, review your results. The Model Performance section provides visual and numerical summaries of key metrics. You'll see values for four evaluation metrics, crucial for assessing your machine learning algorithm's performance.
R-squared determines the proportion of variance in the dependent variable that can be explained by the independent variable. MAPE (Mean absolute percentage error), MAE (Mean absolute error) and RMSE (Root mean squared error) are measures that describe the average difference between the actual and predicted value.
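All four metrics follow directly from their definitions; here is a plain-Python sketch:

```python
def regression_metrics(actual, predicted):
    """Compute R-squared, MAE, RMSE, and MAPE from paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "r2": 1 - ss_res / ss_tot,               # share of variance explained
        "mae": sum(abs(e) for e in errors) / n,  # mean absolute error
        "rmse": (ss_res / n) ** 0.5,             # root mean squared error
        "mape": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

print(regression_metrics([100, 200, 300], [110, 190, 310]))
```

Lower MAE, RMSE, and MAPE indicate smaller prediction errors; R-squared closer to 1 indicates a better fit.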
The results are organized into five tabs: Model Fit, Trend, Seasonality, Special Dates, and Details.
The Model Fit Tab features a graph displaying actual and predicted values. Besides the primary target value prediction (yellow line), the model shows a range of values, known as the uncertainty interval (yellow shaded area). This visualization helps you gauge your model's performance.
If you used the Sequence Identifier Field, you can choose which value to analyze in each Model Result.
Trends and seasonality are key characteristics of time-series data that should be analyzed. The Trend Tab displays a graph illustrating the global trend that Graphite Note has detected from your historical data.
Seasonality represents the repeating patterns or cycles of behavior over time. Depending on your Time Interval, you can find one or two graphs in the Seasonality Tab. For daily data, one graph shows weekly patterns, while the other shows yearly patterns. For weekly and monthly data, the graph highlights recurring patterns throughout the year.
The Special Dates graph shows the percentage effects of the special dates and holidays in historical and future data:
The Details Tab contains a comprehensive table with all the values related to the Model Fit Tab, along with additional information.
Once your model is trained, you can use it to fulfill its real function: predicting future values for the Target Column and trend. Use the Predict tab to set the Start Date and End Date of the time interval for which the prediction will be calculated.
After triggering the Predict button, a table with prediction results and trends becomes available. Predictions will have the same frequency as the trained model (for example, if the model is trained on daily data, predictions will be calculated for every day in the prediction interval; if it is trained on monthly data, predictions will be created for every month).
Besides the Predict option, where you manually enter prediction parameters (for timeseries, the Start Date and End Date), Graphite Note offers an API connection that can be used for two-way communication between a Graphite Note model and third-party external applications. You can use the API to programmatically make predictions by passing data to a model and retrieve the prediction results in real time. Details on how to use the API can be found in the REST API section.
To interact with the Graphite Note API and perform predictions, you need to make a POST request to the API endpoint. On the API tab you will find a generated code snippet containing the request that needs to be sent using cURL.
The request requires the Authorization header to be included. This header should be set to "Bearer [token]". Replace [token] with your unique token, which can be found on the account info page in the Graphite Note app, under the section displaying your current plan information.
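Besides cURL, any HTTP client can send the request. A Python sketch using only the standard library — note that the endpoint URL and the payload field names here are illustrative assumptions, so copy the real values from the generated snippet on your model's API tab:

```python
import json
import urllib.request

# Hypothetical endpoint for illustration only — use the URL from your API tab.
API_URL = "https://app.graphite-note.com/api/predict"

def build_predict_request(token, start_date, end_date):
    """Assemble the POST request with the Bearer token in the Authorization header."""
    # Field names below are placeholders; the real payload schema is in the snippet.
    payload = json.dumps({"startDate": start_date, "endDate": end_date}).encode()
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = build_predict_request("YOUR-TOKEN", "2024-07-01", "2024-07-31")
# urllib.request.urlopen(req) would send it; here we only inspect the headers.
print(req.get_header("Authorization"))  # Bearer YOUR-TOKEN
```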
Once your prediction model is prepared and you are using it to predict future timeseries outcomes, you can enable end users to run their own predictions with the Notebook feature. Notebooks let you easily and intuitively do your own Data Storytelling: create various visualizations with detailed descriptions, plot model results for better understanding, and enable users to make their own predictions. More about notebooks can be found in the Notebooks - Data Storytelling section.
Click the New Notebook button to open the new notebook creation wizard. Choose a notebook name, description, and attributes, and click Create.
You can use a notebook as a single place to present any of the following:
Text with descriptions and written insights about your data
New visualization - choose between different visualizations (such as bar chart, line chart, etc.) to show data from your dataset
MODEL RESULT - present model results containing predictions based on data fed to the model through the Predict tab or the API
MODEL ACTIONABLE INSIGHT - present recommended actionable insights prepared by generative AI.
In our case, we want to expose the Predict option to the final user, similar to how it is available on the Predict tab. Select the Model result option and you will be guided to the Model result visualization creator.
Choose the model you want to use, and you will be guided to the next step, where you will choose the model result you want to include in the notebook.
Choose Predict as the tab you would like to include in the notebook. The Predict tab enables users to make their own prediction interval selections and generate predictions based on the model.
Once the notebook options are saved, you will see the notebook frontend screen. You can always expand your notebook with additional text, visualizations, and model results.
You can share the notebook with other users by copying its URL link.
With the Multiclass Classification model, you can analyze feature importance for a target with 2-25 distinct values. Unlike binary classification, which deals with only two classes, multiclass classification handles multiple classes simultaneously.
To achieve the best results, we will cover the basics of the Model Scenario. In this scenario, you choose parameters related to the dataset and the model.
To run the model, you need to select a Target Feature first. This target is the variable or outcome that the model aims to predict or estimate. The Target Feature should be a text-type column (not a numerical or binary column).
You will be taken to the next step where you can choose all the Model Features you want to analyze. You can select which features the model will analyze. Graphite Note will automatically exclude some features that are not suitable for the model and will provide reasons for each exclusion.
Moving forward, you'll see a comprehensive list of preprocessing steps that Graphite Note will apply to prepare your data for training. This enhances data quality, ensuring your model produces accurate results. Typically, these steps are performed by data scientists, but with our no-code machine learning platform, Graphite Note handles it for you. After reviewing the preprocessing steps, you can finish and Run Scenario.
To interpret the results after running your model, go to the Performance tab. Here, you can see the overall model performance post-training. Model evaluation metrics such as F1 Score, Accuracy, AUC, Precision, and Recall are displayed to assess the performance of classification models. Details on model metrics can also be found on the Accuracy Overview tab.
On the Performance tab, you can explore six different views that provide insights related to model training and results: Key Drivers, Impact Analysis, Model Fit, Accuracy Overview, Training Results, and Details.
Key Drivers indicate the importance of each column (feature) for the Model's predictions. The higher the reliance of the model on a feature, the more critical it is. Graphite uses permutation feature importance to determine these values.
The Impact Analysis tab allows you to select various features and analyze, using a bar chart, how changes in each feature affect the target feature. You can switch between Count and Percentage views.
The Model Fit Tab displays the performance of the trained model. It includes a stacked bar chart with percentages showing correct and incorrect predictions for the multiclass target feature.
The Accuracy Overview tab features a Confusion Matrix to highlight classification errors, making it simple to identify if the model is confusing classes. For each class, it summarizes the number of correct and incorrect predictions. Find out more about Classification Confusion Matrix in our Understanding ML section.
On the Accuracy Overview tab, you'll find detailed information on correct and incorrect predictions (True positives and negatives / False positives and negatives). Model metrics are explained at the bottom of the section.
In the Training Results Tab, you will find information about all the models automatically considered during the training process. Graphite ran several machine learning algorithms suitable for multiclass classification problems, using 80% of the data for training and 20% for testing. The best model, based on the F1 score, is chosen and marked in green in the models list.
The Details tab shows the results of the predictive model, presented in a table format. Each record includes the predicted label, predicted probability, and predicted correctness, offering insights into the model's predictions, confidence, and accuracy for each data point. Dataset test results can be exported to Excel by clicking the XLSX button in the right corner.
Once the model is trained, you can use it to predict future values, solve multi-class classification problems, and drive business decisions. Here are ways to take action with your Multiclass Classification model:
In Graphite Note, you can generate Actionable Insights using the Actionable Insights Input Form. Here, you can provide specific details about your business and objectives. This data is then combined with model training results (e.g., Multiclass Classification with Key Drivers) to produce a tailored analytics narrative aligned with your goals.
Actionable Insights leverage generative AI models to deliver these results. These insights are conclusions drawn from data that can be directly turned into actions or responses. You can access
Actionable Insights from the main navigation menu, provided you are subscribed to a Graphite Note plan that includes actionable insights queries.
After building and analyzing a predictive model using Graphite Note, the "Predict" function allows you to apply the model to new data. This enables you to forecast outcomes or target variables based on different feature combinations, providing actionable insights for decision-making.
You can share your prediction results with your team using the Notebook feature. With Notebooks, users can also run their own predictions on your Multiclass Classification model.
Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the Data Storytelling section.
With the Regression model, you can see which regression algorithm best matches your dataset. To get the best possible results, we will go through the basics of the Model Scenario. In the Model Scenario, you select parameters related to the dataset and the model.
To run the model, you have to choose a Target Feature first. The target refers to the variable or outcome that the model aims to predict or estimate. In this case, it should be a numerical column.
The next step is to choose all the Model Features you want to analyze. You can select which features the model will analyze; features that are not suitable for the model are excluded, with the reason shown for each one.
Now you can finish the process and run the scenario.
Once the scenario runs, you have all the information about its status, the best model used, and the training time.
Let's see how to interpret the results after we have run our model.
First, you see the overall performance, based on the best model and its accuracy.
And then, the results consist of 5 tabs: Feature Importance, Feature Impact, Model Fit, Training Result, and Details Tabs.
To see which feature has more impact on the target, we have the Feature Importance Tab. It shows how much each feature impacts the target and, on the right, more details about each feature.
The Feature Impact Tab shows a chart where you can see how a chosen feature impacts the target; simply select the feature you want to analyze.
The Model Fit Tab contains a graph with actual and predicted values. You can see which one is correct and incorrect. With visualization, you can see how well or poorly your model is performing.
In the Training Results Tab, you have information about all the models considered, so you can see which one is the best; each is trained on 80% of the dataset and tested on the remaining 20% to measure accuracy.
In the end, a table with all the values related to the Model Fit Tab, and much more, can be found in the Details Tab.
After building and analyzing a predictive model using Graphite Note, the "Predict" function allows you to apply the model to new data. This enables you to forecast outcomes or target variables based on different feature combinations, providing actionable insights for decision-making.
Our intelligent system observes customers' shopping behavior without getting into the nitty-gritty technical details. It watches how recently each customer made a purchase, how often they come back, and how much they spend. The system notices patterns and groups customers accordingly.
This smart system doesn't need you to say, "Anyone who spends over $1000 is a champion." It figures out on its own who the champions are by comparing all the customers to one another.
When we talk about 'champion' customers in the context of RFM analysis, we're referring to those who are the most engaged, recent, and valuable. The system's approach to finding these champions is quite intuitive yet sophisticated.
Here's how it operates:
Observation: Just like a keen observer at a social event, the system starts by watching—collecting data on when each customer last made a purchase (Recency), how often they've made purchases over a certain period (Frequency), and how much they've spent in total (Monetary).
Comparison: Next, the system compares each customer to every other customer. It looks for natural groupings—clusters of customers who exhibit similar purchasing patterns. For example, it might notice a group of customers who shop frequently, no matter the amount they spend, and another group that makes less frequent but more high-value purchases.
Group Formation: Without being told what criteria to use, the system uses the data to form groups. Customers with the most recent purchases, highest frequency, and highest monetary value start to emerge as one group—these are your potential 'champions.' The system does this by measuring the 'distance' between customers in terms of RFM factors, grouping those who are closest together in their purchasing behavior.
Adjustment: The system then iterates, refining the groups by moving customers until the groups are as distinct and cohesive as possible. It's a process of adjustment and readjustment, seeking out the pattern that best fits the natural divisions in the data.
Finalization: Once the system settles on the best grouping, it has effectively ranked customers, identifying those who are the most valuable across all three RFM dimensions. These are your 'champions,' but the system also recognizes other groups, like new customers who've made a big initial purchase or long-time customers who buy less frequently but consistently.
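The five steps above closely resemble k-means clustering, sketched here on toy RFM points. The platform's actual clustering algorithm is not specified in this text, so treat this purely as an illustration of the group-and-adjust idea:

```python
def kmeans(points, centers, iterations=10):
    """Repeatedly assign each customer (an R/F/M point) to the nearest
    center, then move each center to the mean of its group."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            # "Distance" between customers in RFM space (squared Euclidean).
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            groups[nearest].append(p)
        # Adjustment step: move each center to the middle of its group.
        centers = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups

# (recency_days, frequency, monetary): champions vs. lapsed small spenders.
points = [(1, 9, 900), (2, 8, 950), (300, 1, 40), (280, 2, 30)]
centers, groups = kmeans(points, centers=[points[0], points[2]])
print(groups)  # the two champions group together, the two lapsed customers together
```

Observation corresponds to collecting the points, Comparison and Group Formation to the nearest-center assignment, Adjustment to moving the centers, and Finalization to the stable grouping after the loop.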
By using this method, the system takes on the complex task of understanding the many ways customers can be valuable to a business. It provides a nuanced view that goes beyond simple categorizations, recognizing the diversity of customer value. The result is a highly tailored strategy for customer engagement that aligns perfectly with the actual behaviors observed, allowing businesses to interact more effectively with each segment, especially the 'champions' who drive a significant portion of revenue.
Here’s why this machine learning approach is more powerful than manual labeling:
Adaptive Learning: The system continuously learns and adapts based on actual behavior, not on pre-set rules that might miss the nuances of how customers are interacting right now.
Time Efficiency: It saves you a mountain of time. No more going through lists of transactions manually to score each customer. The system does it instantly.
Personalized Grouping: Because it’s based on actual behavior, the system creates groups that are tailor-made for your specific customer base and business model, rather than relying on broad, one-size-fits-all categories.
Scalability: Whether you have a hundred customers or a million, this smart system can handle the job. Manual scoring becomes impractical as your customer base grows.
Unbiased Decisions: The system is objective, based purely on data. There’s no risk of human bias that might categorize customers based on assumptions or incomplete information.
In essence, this smart approach to customer grouping helps businesses focus their energy where it counts, creating a personalized experience for each customer, just like a thoughtful host at a party who knows exactly who likes what. It’s about making everyone feel special without having to ask them a single question.
In the RFM model in Graphite Note, the intelligent system categorizes customers into segments based on their Recency (R), Frequency (F), and Monetary (M) values, assigning scores from 0 to 4 for each of these three dimensions. With five scoring options for each RFM category (including the '0' score), this creates a comprehensive grid of potential combinations—resulting in a total of 125 unique segments (5 options for R x 5 options for F x 5 options for M = 125 segments).
This segmentation allows for a high degree of specificity. Each customer falls into a segment that accurately reflects their interaction with the business. For example, a customer who recently made a purchase (high Recency), buys often (high Frequency), and spends a lot (high Monetary) could fall into a segment scored as 4-4-4. This would indicate a highly valuable 'champion' customer.
On the other hand, a customer who made a purchase a long time ago (low Recency), buys infrequently (low Frequency), but when they do buy, they spend a significant amount (high Monetary), might be scored as 0-0-4, placing them in a different segment that suggests a different engagement strategy.
By scoring customers on a scale from 0 to 4 across all three dimensions, the business can pinpoint exact customer profiles. This precision allows for highly tailored marketing strategies. For example, those in the highest scoring segments might receive exclusive offers as a reward for their loyalty, while those in segments with room for growth might be targeted with re-engagement campaigns.
The use of 125 segments ensures that the business can differentiate not just between generally good and poor customers, but between various shades of customer behavior, tailoring approaches to nurture the potential value of each unique segment. This granularity facilitates nuanced understanding and actionability for marketing, sales, and customer relationship management.
Wouldn't it be great to tailor your marketing strategy to identified groups of customers? That way, you can target each group with personalized offers, increase profit, improve unit economics, and more.
The RFM Customer Segmentation Model segments customers based on three key factors:
Recency - how long it’s been since a customer bought something from you or visited your website
Frequency - how often a customer buys from you, or how often they visit your website
Monetary - the average spend of a customer per visit, or the overall transaction value in a given period
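The three raw factors can be derived from a customer's transaction history, sketched here as (purchase_date, amount) pairs. In this sketch, Monetary is computed as the total spend over the period:

```python
from datetime import date

def rfm_values(transactions, today):
    """Derive Recency, Frequency, and Monetary from one customer's transactions."""
    last_purchase = max(d for d, _ in transactions)
    return {
        "recency": (today - last_purchase).days,      # days since last purchase
        "frequency": len(transactions),               # number of purchases
        "monetary": sum(a for _, a in transactions),  # total amount spent
    }

history = [(date(2024, 5, 1), 120.0), (date(2024, 6, 15), 80.0)]
print(rfm_values(history, today=date(2024, 7, 1)))
# {'recency': 16, 'frequency': 2, 'monetary': 200.0}
```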
Let's go through the RFM analysis inside Graphite Note. The dataset on which you will run your RFM Model must contain a time-related column, given that this report studies customer behavior over a period of time.
We need to distinguish all customers, so we need an identifier variable like Customer ID.
If you have data about Customer Names, great; if not, don't worry - just select the same column as in the Customer ID field.
Finally, we need to choose the numeric variable with regard to which we will observe customer behavior, called Monetary (the amount spent).
That's it, you are ready to run your first RFM Model.
As we now know how to run RFM model analysis in Graphite Note, let's go through the Model Results. The results consist of 7 tabs: RFM Scores, RFM Analysis, Recency, Frequency, Monetary, RFM Matrix, and Details Tabs. All results are visualized because a visual summary of information makes it easier to identify patterns than looking through thousands of rows.
On the RFM Scores Tab, we have an overview of the customers and their scores:
Then you have a ranking of each RFM segment (125 of them) represented in a table.
And finally, a chart showing the number of customers per RFM score.
RFM model analysis ranks every customer in each of these three categories on a scale of 0 (worst) to 4 (best). After that, we assign an RFM score to each customer by concatenating their Recency, Frequency, and Monetary scores. Depending on their RFM score, customers can be grouped into the following categories:
lost customer
hibernating customer
can-not-lose customer
at-risk customer
about-to-sleep customer
need-attention customer
promising customer
new customer
potential loyal customer
loyal customer
champion customer.
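The 0-4 scoring and concatenation described above can be sketched as follows. The even-spread scoring here is a rough stand-in for true quintile binning, and the customers are invented:

```python
# Illustrative 0-4 scoring of each dimension, then concatenation into an RFM score.
customers = {
    "C1": {"recency_days": 12, "frequency": 9, "monetary": 900.0},
    "C2": {"recency_days": 300, "frequency": 1, "monetary": 40.0},
    "C3": {"recency_days": 45, "frequency": 4, "monetary": 350.0},
}

def scores_0_to_4(values, lower_is_better=False):
    """Spread distinct values evenly over 0..4 (a rough stand-in for quintiles)."""
    order = sorted(set(values), reverse=lower_is_better)
    if len(order) == 1:
        return {order[0]: 4}
    return {v: round(i * 4 / (len(order) - 1)) for i, v in enumerate(order)}

# For Recency, a smaller number of days is better.
r = scores_0_to_4([c["recency_days"] for c in customers.values()], lower_is_better=True)
f = scores_0_to_4([c["frequency"] for c in customers.values()])
m = scores_0_to_4([c["monetary"] for c in customers.values()])

rfm_scores = {
    name: f"{r[c['recency_days']]}{f[c['frequency']]}{m[c['monetary']]}"
    for name, c in customers.items()
}
print(rfm_scores)  # {'C1': '444', 'C2': '000', 'C3': '222'}
```

A "444" would map to a champion customer and "000" to a lost customer; the exact score-to-segment mapping Graphite uses is internal.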
All information related to these groups of customers, such as the number of customers, average monetary, average frequency, and average recency per group, can be found in the RFM Analysis Tab.
There is also a table at the end to summarize everything.
According to the Recency factor, which is defined as the number of days since the last purchase, we divide customers into 5 groups:
lost
lapsing
average activity
active
very active.
In the Recency Tab, we observe the behavior of the above groups, such as the number of customers, average monetary, average frequency, and average recency per group.
As Frequency is defined as the total number of purchases, customers can buy:
very rarely
rarely
regularly
frequently
very frequently.
In the Frequency Tab, you can track the same behavior for the related groups, as with the Recency Tab.
Monetary is defined as the amount of money the customer spent, so the customer can be a:
very low spender
low spender
medium spender
high spender
very high spender.
In the Monetary Tab, you can track the same behavior for the related groups, as with the Recency Tab.
The RFM Matrix Tab represents a matrix, showing the number of customers, monetary sum and average, average frequency, and average recency (with breakdown by Recency, Frequency, and Monetary segments).
All the values related to the first five tabs, with much more, can be found on the Details Tab, in the form of a table.
The RFM model columns outlined in your system provide a structured way to understand and leverage customer purchase behavior. Here’s how each column benefits the end user of the model:
Monetary: Indicates the total revenue a customer has generated. This helps prioritize customers who have contributed most to your revenue.
Avg_monetary: Shows the average spend per transaction. This can be used to gauge the spending level of different customer segments and tailor offers to match their spending habits.
Frequency: Reflects how often a customer purchases. This can inform retention strategies and indicate who might be receptive to more frequent communication.
Recency: Measures the time since the last purchase. This can help target re-engagement campaigns at customers who have not interacted with your business recently.
Date_of_last_purchase & Date_of_first_purchase: These dates help track the customer lifecycle and can trigger communications at critical milestones.
Customer_age_days: The duration of the customer relationship. Long-standing customers might benefit from loyalty programs, while newer customers might be encouraged with welcome offers.
Recency_cluster, Frequency_cluster, and Monetary_cluster: These categorizations allow for segmentation at a granular level, helping customize strategies for groups of customers who share similar characteristics.
Rfm_cluster: This overall grouping combines recency, frequency, and monetary values, offering a holistic view of a customer's value and engagement, essential for creating differentiated customer journeys.
Recency_segment_name, Frequency_segment_name, and Monetary_segment_name: These descriptive labels provide intuitive insights into customer behavior and make it easier to understand the significance of each cluster for strategic planning.
Fm_cluster_sum: This score is a combined metric of frequency and monetary clusters, useful in prioritizing customers who are both frequent shoppers and high spenders.
Fm_segment_name and Rfm_segment_name: These labels offer a quick reference to the type of customer segment, simplifying the task of identifying and applying targeted marketing actions.
Seeking assurance about the model's accuracy and effectiveness? Here's how you can address these concerns:
Validation with Historical Data: Show how the model’s predictions align with actual customer behaviors observed historically. For instance, demonstrate how high RFM scores correlate with customers who have proven to be valuable.
Segmentation Analysis: Analyze the characteristics of customers within each RFM segment to validate that they make sense. For example, your top-tier RFM segment should clearly consist of customers who are recent, frequent, and high-spending.
Control Groups: Create control groups to test marketing strategies on different RFM segments and compare the outcomes. This can validate the effectiveness of segment-specific strategies suggested by the model.
A/B Testing: Implement A/B testing where different marketing approaches are applied to similar customer segments to see which performs better, thereby showcasing the model's utility in identifying the right targets for different strategies.
Benchmarking: Compare the RFM model’s performance against other segmentation models or against industry benchmarks to establish its effectiveness.
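As a toy illustration of the Segmentation Analysis point, you could verify that a top-tier segment genuinely out-spends a bottom-tier one. The figures below are invented:

```python
# Sanity check: a sensible RFM model's top segment should out-spend its bottom one.
segments = {
    "champion": [420.0, 380.0, 510.0],  # invented per-customer spend
    "lost":     [15.0, 40.0, 22.0],
}
avg = {name: sum(vals) / len(vals) for name, vals in segments.items()}
assert avg["champion"] > avg["lost"]  # passes for a model that segments sensibly
print({name: round(a, 2) for name, a in avg.items()})
```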
Detecting early signs of reduced customer engagement is pivotal for businesses aiming to maintain loyalty. A notable signal of this disengagement is when a customer's once regular purchasing pattern starts to taper off, leading to a significant decrease in activity. Early detection of such trends allows marketing teams to take swift, proactive measures. By deploying effective retention strategies, such as offering tailored promotions or engaging in personalized communication, businesses can reinvigorate customer interest and mitigate the risk of losing them to competitors.
Our objective is to utilize a model that not only alerts us to customers with an increased likelihood of churn but also forecasts their potential purchasing activity and, importantly, estimates the total value they are likely to bring to the business over time.
These analytical needs are served by what is known in data science as Buy 'Til You Die (BTYD) models. These models track the lifecycle of a customer's interaction with a business, from the initial purchase to the last.
While customer churn models are well-established within contractual business settings, where customers are bound by the terms of service agreements, and churn risk can be anticipated as contracts draw to a close, non-contractual environments present a different challenge. In such settings, there are no defined end points to signal churn risk, making traditional classification models insufficient.
To address this complexity, our model adopts a probabilistic approach to customer behavior analysis, which does not rely on fixed contract terms but on behavioral patterns and statistical assumptions. By doing so, we can discern the likelihood of future transactions for every customer, providing a comprehensive and predictive understanding of customer engagement and value.
The Customer Lifetime Value (CLV) model is a robust tool employed to ascertain the projected revenue a customer will contribute over their entire relationship with a business. The model employs historical data to inform predictive assessments, offering valuable foresight for strategic decision-making. This insight assists companies in prioritizing resources and tailoring customer engagement strategies to maximize long-term profitability.
The CLV model executes a series of sophisticated calculations. Yet, its operations can be conceptualized in a straightforward manner:
Historical Analysis: The model comprehensively evaluates past customer transaction data, noting the frequency and monetary value of purchases alongside the tenure of the customer relationship.
Engagement Probability: It assesses the likelihood of a customer’s future engagement based on their past activities, effectively estimating the chances of a customer continuing to transact with the business.
Forecasting: With the accumulated data, the model projects the customer’s future transaction behavior, predicting how often they will make purchases and the potential value of these purchases.
Lifetime Value Calculation: Integrating these elements, the model calculates an aggregate figure representing the total expected revenue from a customer for a designated future period.
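A deliberately simplified sketch of these four steps follows. Real BTYD models (such as BG/NBD with Gamma-Gamma) fit these quantities statistically from transaction history; here the alive probability, purchase rate, and discount rate are assumed inputs, not fitted values:

```python
# Simplified CLV: expected revenue per period, discounted over a horizon.
def simple_clv(p_alive, purchases_per_period, avg_order_value, periods, discount_rate=0.01):
    """Expected revenue over `periods` periods, discounted period by period."""
    clv = 0.0
    for t in range(1, periods + 1):
        expected_revenue = p_alive * purchases_per_period * avg_order_value
        clv += expected_revenue / (1 + discount_rate) ** t
    return clv

# An invented customer: ~80% likely to stay active, ~2 purchases/month at ~$45.
print(round(simple_clv(0.8, 2, 45.0, periods=3), 2))  # -> 211.75
```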
The Customer Lifetime Value model uses historical customer data to predict the future value a customer will generate for a business. It leverages algorithms and statistical techniques to analyze customer behavior, purchase patterns, and other relevant factors to estimate the potential revenue a customer will bring over their lifetime.
The dataset on which you will run your model must contain a time-related column.
We need to distinguish all customers, so we need an identifier variable like Customer ID. If you have data about Customer Names, great; if not, don't worry, just select the same column as in the Customer ID field.
We need to choose the numeric variable by which we will observe customer behavior, called Monetary (amount spent).
Finally, you need to choose the Starting Date from which you'd like to calculate this model for your dataset.
When you're looking at this option for calculating Customer Lifetime Value (CLV), think of it as setting a starting line for a race. The "race" in this case is the journey you're tracking: how much your customers will spend over time.
The "Starting Date for Customer Lifetime Value Calculation" is basically asking you when you want to start watching the race. You have a couple of choices:
Max Date: This is like saying, "I want to start watching the race from the last time we recorded someone crossing the line." It sets the starting point at the most recent date in your records where a customer made a purchase.
Today: Choosing this means you want to start tracking from right now, today. So any purchases made after today will count towards the CLV.
-- select date --: This would be an option if you want to pick a specific date to start from, other than today or the most recent date in your data.
Let's see how to interpret the results after we have run our model.
And then, the results consist of 2 tabs: CLV Insights and Details Tabs.
On the summary of repeat customers, we have:
the Total Repeat Customers: the customers who keep coming back (the loyal customers)
the Total Historical Amount: the past earnings from loyal customers
the Average Spend per Repeat Customer
the Average no. of Repeat Purchases: shows the customers' loyalty with the average number of repeat purchases
the Average Probability Alive Next 90 Days: an estimate of the likelihood that a customer stays active for your business in the next 90 days
the Predicted no. of Purchases Next 90 Days: the number of purchases you can expect in the next 90 days based on our analysis
Predicted Amount Next 90 Days: the revenue you can expect in the next 90 days based on our predicted amount feature
CLV (Customer Lifetime Value): the average revenue that one customer has generated in the past and will generate in the future
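Several of these summary figures can be derived from per-customer purchase histories, as in this sketch (the data and the more-than-one-purchase rule for repeat customers are illustrative):

```python
# Deriving the repeat-customer summary figures from invented purchase histories.
history = {
    "C1": [50.0, 60.0, 40.0],  # amounts of each purchase
    "C2": [80.0],              # a one-time customer
    "C3": [30.0, 30.0],
}

# A repeat customer is one with more than one purchase.
repeaters = {c: amts for c, amts in history.items() if len(amts) > 1}

total_repeat_customers = len(repeaters)
total_historical_amount = sum(sum(amts) for amts in repeaters.values())
avg_spend_per_repeat_customer = total_historical_amount / total_repeat_customers
# Repeat purchases exclude each customer's first purchase.
avg_repeat_purchases = sum(len(a) - 1 for a in repeaters.values()) / total_repeat_customers

print(total_repeat_customers, total_historical_amount, avg_repeat_purchases)
```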
The CLV Insights Tab shows some charts on the lifetime of customers.
The forecasted number of purchases chart estimates the number of purchases that are expected to be made by returning customers over a specific period.
The forecasted amount chart is a graphical representation of the projected value of purchases to be made by returning customers over a certain period.
Finally, the average alive probability chart illustrates the average probability of a customer remaining active for a business over time, assuming no repeat purchases.
Last but not least, on the Details Tab, you can find a detailed table where you can see all relevant values which were used for the above results.
If you click the link on the Details Tab, you will find full information on each column.
The Details Tab within the Customer Lifetime Value Model offers an extensive breakdown of metrics for in-depth analysis. Each column represents a specific aspect of customer data that is pivotal to understanding and predicting customer behavior and value to your business. Below are the descriptions of the available columns:
amount_sum
Description: This column showcases the total historical revenue generated by an individual customer. By analyzing this data, businesses can identify high-value customers and allocate marketing resources efficiently.
amount_count
Description: Reflects the total number of purchases by a customer. This frequency metric is invaluable for loyalty assessments and can inform retention strategies.
repeated_frequency
Description: Indicates the frequency of repeated purchases, highlighting customer loyalty. This metric can be leveraged for targeted engagement campaigns.
customer_age
Description: The duration of the customer's relationship with the business, measured in days since their first purchase. It helps in segmenting customers based on the length of the relationship.
average_monetary
Description: Average monetary value per purchase, providing insight into customer spending habits. Businesses can use this to predict future revenue from a customer segment.
probability_alive
Description: Displays the current probability of a customer being active. A score of 1 means a 100% probability that the customer is active, aiding in prioritizing engagement efforts.
probability_alive_7_30_60_90_365
Description: This column shows the probability of customers remaining active over various time frames without repeat purchases. It's critical for developing tailored customer retention plans.
predicted_no_purchases_7_30_60_90_365
Description: Predicts the number of future purchases within specific time frames. This forecast is essential for inventory planning and sales forecasting.
CVL_30_60_90_365
Description: Estimates potential customer value over different time frames, aiding in strategic financial planning and budget allocation for customer acquisition and retention.
In this given example, we have a snapshot of customer data from the CLV model. The model considers various unique aspects of customer behavior to predict future engagement and value. Let's analyze the key data points and what they signify in a non-technical way, while emphasizing the model’s ability to tailor predictions to individual customer behavior:
amount_sum: This customer has brought in a total revenue of $4,584.14 to your business.
amount_count: They have made 108 purchases, which shows a high level of engagement with your store.
repeated_frequency: Out of these purchases, 106 are repeat purchases, suggesting a strong customer loyalty.
customer_age: They have been a customer for 364 days, indicating a relatively long-term relationship with your business.
average_monetary: On average, they spend about $42.73 per transaction.
probability_alive: There’s an 85% to 86% chance that they are still actively engaging with your business, which is quite high.
probability_alive_7: Specifically, the probability that this customer will remain active in the next 7 days is about 44.48%.
Alex, with a remarkable 106 repeated purchases and a customer_age of 364 days, has shown a pattern of strong and consistent engagement. The average monetary value of their purchases is $42.73, contributing significantly to the revenue with a total amount_sum of $4,584.14. The current probability_alive is high, indicating Alex is likely still shopping.
However, even with this consistent past behavior, the probability_alive_7 drops to about 44.48%. It highlights a nuanced understanding of Alex's habits; a sudden change in their routine is notable, which is why the model predicts a more significant impact if Alex were to alter their shopping pattern even slightly.
On the other hand, we have Casey, who has made 2 purchases, with only 1 being a repeated transaction. Casey’s amount_sum is $185.93, with an average_monetary value of $84.44, and a customer_age of 135 days. Despite a high current probability_alive, the model shows a minimal decline to 83.73% in the probability_alive_7.
This slight decrease tells us that Casey's engagement is inherently more sporadic. The business doesn't expect Casey to make purchases with the same regularity as Alex. If Casey doesn't return for a week, it isn't alarming or out of character, as reflected in the gentle decline in their seven-day active probability.
The contrast in these profiles, painted by the CLV model, enables the business to craft distinct customer journeys for Alex and Casey. For Alex, it's about ensuring consistency and rewarding loyalty to maintain that habitual engagement. Perhaps an automated alert for engagement opportunities could be set up if they don't make their usual purchases.
For Casey, the strategy may involve creating moments that encourage repeat engagement, possibly through sporadic yet impactful touchpoints. Since Casey's behavior suggests openness to larger purchases, albeit less frequently, the focus could be on highlighting high-value items or exclusive offers that align with their sporadic engagement pattern.
The CLV model's behavioral predictions allow the business to personalize customer experiences, maximize the potential of each interaction, and strategically allocate resources to maintain and grow the value of each customer relationship over time. This bespoke approach is the essence of modern customer relationship management, as it aligns perfectly with the individualized tendencies of customers like Alex and Casey.
This detailed data is a treasure trove for businesses keen on data-driven decision-making. Here’s how to utilize the information effectively:
Custom Segmentation: Use customer_age, amount_sum, and average_monetary to segment your customers into meaningful groups.
Detect Churners: Use probability_alive to segment customers that are currently active, for non-contractual businesses like eCommerce and Retail. A score of 0.1 means a 10% probability that the customer is active ("alive") for your business.
Targeted Marketing Campaigns: Leverage the repeated_frequency and probability_alive columns to identify customers for loyalty programs or re-engagement campaigns.
Revenue Projections: The CVL_30_60_90_365 column helps in projecting future revenue and understanding the long-term value of customer segments.
Strategic Planning: Use predicted_no_purchases_7_30_60_90_365 to plan for demand, stock management, and to set realistic sales targets.
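The Detect Churners idea above can be sketched as a simple threshold on probability_alive. The rows and the 0.3 cutoff below are illustrative, not values Graphite prescribes:

```python
# Flag likely churners: customers whose probability_alive falls below a cutoff.
details = [
    {"customer": "Alex",  "probability_alive": 0.86},
    {"customer": "Casey", "probability_alive": 0.84},
    {"customer": "Drew",  "probability_alive": 0.12},
]

CHURN_THRESHOLD = 0.3  # illustrative cutoff; tune to your business
likely_churners = [row["customer"] for row in details
                   if row["probability_alive"] < CHURN_THRESHOLD]
print(likely_churners)  # -> ['Drew']
```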
By engaging with the columns in the Details Tab, users can extract actionable insights that can drive strategies aimed at optimizing customer lifetime value. Each metric can serve as a building block for a more nuanced, data-driven approach to customer relationship management.
Do you wonder whether the changes you’ve made in your business have impacted new customers? Do you want to understand the needs of your user base or identify trends? All that and much more you can do with our new model, Customer Cohort Analysis.
A cohort is a subset of users or customers grouped by common characteristics or by their first purchase date. Cohort analysis is a type of behavioral analytics that allows you to track and compare the performance of cohorts over time.
With Graphite, you are only a few steps away from your Cohort model. Once you have selected your dataset, it is time to enter the parameters into the model. The Time/Date Column represents a time-related column.
After that, you have to select the Aggregation level.
For example, if monthly aggregation is selected, Graphite will generate Cohort Analysis with a monthly frequency.
Also, your dataset must contain Customer ID and Order ID/ Trx ID columns as required model parameters.
Last but not least, you have to select the Monetary (amount spent) variable, which represents the main parameter for your Cohort Analysis.
Additionally, you can break down and filter Cohorts by a business dimension (variable) which you select after you enable the checkbox.
That's it, your first Customer Cohort Analysis model is ready.
After you run your model, the first tab that appears is the Cohorts Tab.
Depending on the metric (the default is No of Customers), the results are presented as a heatmap.
In the example above, customers are grouped by the year in which they made their first purchase. Column 0 represents the number of customers per cohort (i.e., 4,255 customers made their first purchase in 2018). Now we can see their activity year over year: 799 of those customers came back in 2019, 685 in 2020, and 118 in 2021.
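The counting behind such a heatmap can be sketched as follows, grouping invented customers by first-purchase year and counting how many return in each later year:

```python
# Build a cohort matrix: first-purchase year -> {years-since-first -> customer count}.
transactions = [
    # (customer_id, year of purchase) - invented data
    ("C1", 2018), ("C1", 2019), ("C1", 2020),
    ("C2", 2018), ("C2", 2019),
    ("C3", 2019), ("C3", 2020),
]

first_year = {}
for cust, year in transactions:
    first_year[cust] = min(first_year.get(cust, year), year)

cohorts = {}  # cohort year -> {offset in years -> set of active customers}
for cust, year in transactions:
    cohort = first_year[cust]
    offset = year - cohort  # 0 = the cohort's first year
    cohorts.setdefault(cohort, {}).setdefault(offset, set()).add(cust)

counts = {c: {o: len(s) for o, s in offs.items()} for c, offs in cohorts.items()}
print(counts)  # {2018: {0: 2, 1: 2, 2: 1}, 2019: {0: 1, 1: 1}}
```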
If you switch your metric to Percentage, you will get results in percentages.
Let's track our Monetary column (in our case total amount spent per customer) and switch metric to Amount to see how much money our customers spend through the years.
As you can see above, customers that made their first order in 2018 have spent 46.25M, and the 799 customers that came back in 2019 have spent 12.38M.
In case you want to track the total amount spent through the years, switch metric to Amount (Cumulative).
Basically, we tracked the long-term relationships that we have with our given groups (cohorts). On the other hand, we can compare different cohorts at the same stage in their lifetime. For example, across all cohorts, we can compare the average revenue per customer two years after the first purchase: the average revenue per customer in the 2019 cohort (12.02K) is almost half that of the 2018 cohort (21.05K). Here is an opportunity to see what went wrong and shape a new business strategy.
If you broke down and filtered cohorts by a variable with fewer than 20 distinct values (the Repeat by parameter in the Model Scenario), you will get a separate Cohort Analysis for each value in the Repeat by Tab.
All the values related to the Cohorts and Repeat by Tabs, with much more, can be found on the Details Tab, in the form of a table.
Now it's your turn to track your customer's behavior, see when is the best time for remarketing, and how to improve customer retention.
Now we will go through the Model Results, which consist of 3 tabs.
We are going to see several metrics, among them the Average Order Value, the Cumulative Average Order Value, and the Average Revenue per Customer.
Often companies spend a lot of time managing items/entities that have a low contribution to the profit margin. Every item/entity inside your shop does not have equal value - some of them cost more, some are used more frequently, and some are both. This is where the ABC Pareto analysis steps in, which helps companies to focus on the right items/entities.
ABC analysis is a classification method in which items/entities are divided into three categories, A, B, and C.
Category A is typically the smallest category and consists of the most important items/entities ('the vital few'),
while category C is the largest category and consists of least valuable items/entities ('the trivial many').
This is the simplest model so far - only 2 columns are needed in your dataset.
You have to identify:
the ID column
the numeric column
An ID column in your dataset is usually a Product ID or name, SKU, etc. Based on the selected values, the data will be grouped by that column.
After that, you have to select the numeric column (feature) which represents the value of the ID column (for example, product/customer revenue or the number of sold units,...).
Since ABC inventory analysis divides items into 3 categories, let's analyze these categories by checking the Model Results. The results consist of 3 tabs: ABC Summary, Pareto Chart, and Details Tabs.
If we take a look at the ABC Summary Tab, we can see two pie charts - on the first one we can see the percentage of items in each category, while on the other one, we can see the total value (revenue) of each category.
In the picture above, we can see that 31.99% of the items belong to category A and they represent 68.88% of the total value, meaning the biggest profit comes from the items in category A!
The ABC analysis, also called Pareto analysis, is based on the Pareto principle, which says that 80% of the results (output) come from 20% of the efforts (input). The Pareto Chart is a combination of a bar and a line graph - it contains both bars and lines, where each bar represents an item/entity in descending order, while the height of the bar represents the value of the item/entity. The curved orange line represents the cumulative percentage of the item/entity.
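A sketch of the classification itself: rank items by value, accumulate each item's share of the total, and assign categories at cut-offs. The 80%/95% thresholds below are a common convention, not necessarily the ones Graphite uses:

```python
# ABC/Pareto classification by cumulative revenue share (invented items).
items = {"P1": 500.0, "P2": 250.0, "P3": 150.0, "P4": 60.0, "P5": 40.0}

total = sum(items.values())
running, categories = 0.0, {}
# Walk items in descending value order, tracking the cumulative share.
for item, value in sorted(items.items(), key=lambda kv: kv[1], reverse=True):
    running += value
    share = running / total
    categories[item] = "A" if share <= 0.80 else ("B" if share <= 0.95 else "C")

print(categories)  # {'P1': 'A', 'P2': 'A', 'P3': 'B', 'P4': 'C', 'P5': 'C'}
```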
Finally, you have all the information on each entity on a table in the Details Tab.
There is a long list of benefits from including ABC analysis in your business, such as improved inventory optimization and forecasting, reduced storage expenses, strategic pricing of the products, etc. With Graphite, all you have to do is upload your data, create the desired model, and explore the results.
Let’s go through their basic characteristics.
New customers are:
forming the foundation of your customer base
telling you if your marketing campaigns are working (improving current offerings, what to add to your repertoire of products or services)
while returning customers are:
giving you feedback on your business (if you have a high number of returning customers it suggests that customers are finding value in your products or service)
saving you a lot of time, effort, and money.
Let's go through the New vs returning customer analysis inside Graphite. The dataset on which you will run your model must contain a time-related column.
Since the dataset contains data for a certain period, it's important to choose the aggregation level.
For example, if weekly aggregation is selected, Graphite will generate a new vs returning customers dataset with a weekly frequency.
Your dataset must also contain Customer ID data.
Additionally, if you want, you can choose the Monetary (amount spent) variable.
With Graphite, compare absolute figures and percentages, and learn how many customers you are currently retaining on a daily, weekly, or monthly basis.
Depending on the aggregation level, you can see the number of distinct and returning customers detected in the period on the New vs Returning Tab.
For example, in December 2020, there were a total of 2.88K customers, of which 1.84K were new and 1.05K returning. You can also choose a daily representation that is more precise.
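The split behind such figures can be sketched like this: a customer counts as new in the period of their first purchase and as returning in any later period (the orders are invented):

```python
# Count new vs returning customers per period (here, per month).
orders = [
    # (customer_id, period) - invented data
    ("C1", "2020-11"), ("C2", "2020-12"), ("C1", "2020-12"), ("C3", "2020-12"),
]

seen = set()
summary = {}  # period -> {"new": n, "returning": n}
for cust, period in sorted(orders, key=lambda o: o[1]):
    bucket = summary.setdefault(period, {"new": 0, "returning": 0})
    if cust in seen:
        bucket["returning"] += 1
    else:
        bucket["new"] += 1
        seen.add(cust)

print(summary)  # {'2020-11': {'new': 1, 'returning': 0}, '2020-12': {'new': 2, 'returning': 1}}
```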
If you are interested in retention, the percentage of your returning customers, through a period, use the Retention % Tab.
Last but not least, on the Details Tab, you can find a detailed table where you can see all relevant values which were used for the above results.
With General segmentation, you can uncover hidden similarities in your data, such as links between the price of products or services and customers' purchasing history. It's an unsupervised algorithm that segments the data into groups, based on similarity between the numerical variables.
So let's see how you can run this model in Graphite. Firstly, you have to identify an ID column - that way you can identify the customer or product within the groups. After that, you have to select the numeric columns (features) from your dataset on which the segmentation will be based.
In the end, you need to determine the number of groups you want to get. In case you are not sure, Graphite will try to determine the best number of groups. But what about the model result? More about that in the next post!
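Under the hood, models like this typically rely on a clustering algorithm such as k-means. Here is a minimal Lloyd's-iteration sketch on two invented numeric features; Graphite's actual implementation and defaults may differ:

```python
# Minimal k-means (Lloyd's algorithm) on invented (income, spending score) pairs.
def kmeans(points, centroids, iterations=10):
    """Assign points to the nearest centroid, then move centroids to the mean."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        centroids = [
            tuple(sum(coords) / len(coords) for coords in zip(*group)) if group else c
            for group, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Six invented customers: (annual income, spending score).
points = [(15, 80), (16, 75), (17, 82), (90, 10), (95, 12), (88, 15)]
centroids, clusters = kmeans(points, centroids=[(0, 100), (100, 0)])
print(centroids)  # two cluster centers, one per detected group
```

In practice the number of groups (the number of starting centroids here) is the parameter Graphite either takes from you or tries to determine automatically.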
As the model divided your data into clusters, a group of objects where objects in the same cluster are more similar to each other than to those in other clusters, it is essential to compare the average values of the variables across all clusters. That's why in the Cluster Summary Tab you can see the differences between the clusters through the graph.
For example, in the picture above, you can see that customers in Cluster0 have the highest average value of the Spending Score, unlike the customers in Cluster3.
Wouldn't it be interesting to explore each cluster by a numeric value or each numeric value by a cluster? That's why we have the By Cluster and By Numeric Value Tabs - each variable and cluster are analyzed by their minimum and maximum, first and third quartiles, etc.
You can also have a Cluster Visualization Tab that shows the link between two arguments and how they are distributed.
You can change the measures to see different clusters and their distributions.
The devil is in the details - details are important, so be conscientious and pay attention to the small things. Last but not least, on the Details Tab, you can find a detailed table where you can see all relevant values which were used for the above results.
With the right dataset and a few clicks, you will get results that will considerably help you in your business - general segmentation helps you in creating marketing and business strategies for each detected group. It's all up to you now, collect your data and start modeling.
In this report, we want to divide customers into returning and new customers (the most fundamental type of customer segmentation). New customers have made only one purchase from your business, while returning ones have made more than one.
The model results consist of 4 tabs.
The results in the Revenue New vs Returning Tab depend on the Model Scenario: if you have selected a monetary variable there, you can observe its behavior for new versus returning customers.
Now we move to the tricky part, data preprocessing! We will rarely come across high-quality data - for the model to give the best possible results, we must do some cleaning and transformation. What to do with the missing values? You can either remove them or replace them with a corresponding value, such as the mean value or a prediction. For example, suppose you have chosen Age and Height as numeric columns. The values of the variable Age range between 10 and 80, while Height is between 100 and 210. The algorithm can give more importance to the Height variable because it has higher values than Age - if you decide to transform/scale your data, you can either standardize or normalize it.
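The two scaling options mentioned can be sketched on the illustrative Age and Height columns: min-max normalization rescales values to [0, 1], while standardization centers them on zero with unit variance:

```python
# Min-max normalization vs standardization on illustrative Age and Height values.
import statistics

def normalize(values):
    """Rescale values to the [0, 1] range (min-max normalization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Center values on zero with unit variance (z-scores)."""
    mean, std = statistics.mean(values), statistics.pstdev(values)
    return [(v - mean) / std for v in values]

age = [10, 25, 45, 80]
height = [100, 150, 180, 210]

print(normalize(age))     # both features now share the same 0-1 scale
print(normalize(height))  # so neither dominates by sheer magnitude
```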
Let's see how to interpret the results after we have run our model. The results consist of 5 tabs.