General Segmentation


Last updated 6 months ago


Model Scenario

With General Segmentation, you can uncover hidden similarities in data, such as the relationship between product prices and customer purchase histories. This unsupervised algorithm groups data based on similarities among numerical variables.


To run this model in Graphite, first select an ID column that identifies each record you want to group (for example, a customer or a product). Next, select the numeric columns (features) from your dataset to use for segmentation.

Now comes the tricky part: data preprocessing! We rarely encounter high-quality data, so we must clean and transform it for optimal model results. What should you do with missing values? Either remove them or replace them with relevant values, such as the mean or a prediction.
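The two options for missing values mentioned above can be sketched in a few lines of plain Python. This is only an illustration: the rows, the column names (`customer_id`, `age`, `total_spend`), and the helper functions are hypothetical and are not part of Graphite Note.

```python
from statistics import mean

# Hypothetical rows; None marks a missing value.
rows = [
    {"customer_id": "C1", "age": 34, "total_spend": 120.0},
    {"customer_id": "C2", "age": None, "total_spend": 80.0},
    {"customer_id": "C3", "age": 50, "total_spend": None},
    {"customer_id": "C4", "age": 42, "total_spend": 100.0},
]
features = ["age", "total_spend"]


def drop_missing(rows, features):
    """Option 1: keep only rows where every feature has a value."""
    return [r for r in rows if all(r[f] is not None for f in features)]


def impute_mean(rows, features):
    """Option 2: replace each missing value with the mean of its column."""
    means = {
        f: mean(r[f] for r in rows if r[f] is not None) for f in features
    }
    return [
        {**r, **{f: means[f] if r[f] is None else r[f] for f in features}}
        for r in rows
    ]


complete_rows = drop_missing(rows, features)  # rows C1 and C4 remain
imputed_rows = impute_mean(rows, features)    # all four rows remain
```

Dropping rows is the simpler choice but shrinks the dataset; imputing keeps every record at the cost of introducing estimated values.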

For instance, if you have chosen Age and Height as numeric columns, Age might range between 10 and 80, while Height could range from 100 to 210. The algorithm could give Height more weight simply because its values are larger. To avoid this, you should transform (scale) your data; consider standardizing or normalizing it.
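As a rough sketch of the two scaling options, the functions below standardize (z-scores) and normalize (min-max) a column in plain Python. The sample values and function names are assumptions chosen to match the Age and Height example; they do not describe how Graphite scales data internally.

```python
from statistics import mean, pstdev

# Hypothetical sample values for the two columns in the example.
ages = [10, 25, 40, 60, 80]
heights = [100, 150, 165, 180, 210]


def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (z-scores).

    Assumes the column is not constant (standard deviation > 0).
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def normalize(values):
    """Rescale linearly to the [0, 1] range (min-max scaling).

    Assumes the column is not constant (max > min).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


scaled_ages = standardize(ages)
scaled_heights = standardize(heights)
```

After either transformation, both columns live on a comparable scale, so neither dominates the distance between two records.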

Finally, choose the number of groups you want. If you are not sure, Graphite will try to determine the best number of groups.
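How Graphite picks the number of groups is not described here. One common heuristic, shown below purely as an assumption, is to run k-means for several candidate values and keep the one with the highest mean silhouette score. The sample points, the deterministic farthest-first initialization, and all function names are illustrative.

```python
import math


def init_centers(points, k):
    """Deterministic farthest-first start: each new center is the point
    farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iterations=50):
    """Plain k-means; returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette score: close to 1 means compact, well-separated clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-point clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items() if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical, already scaled 2-D points forming three visible groups.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.8),
]
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
```

On this toy data the heuristic selects three groups. On real data the scores are usually closer together, so treat the suggested number as a starting point and check that the resulting groups make business sense.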


After reviewing all the steps, you can finish and Run Scenario. The training duration may vary depending on the data volume, typically ranging from 1 to 10 minutes. The training will utilize 80% of the data to train various machine learning models and the remaining 20% to test these models and calculate relevant scores. Once completed, you will receive information about the best model based on the F1 value and details about training time.

Model Results


Cluster Summary

The model divides your data into clusters: groups in which objects are more similar to each other than to objects in other clusters. To understand how the clusters differ, compare the average values of the variables across all clusters. The Cluster Summary tab shows these differences in a graph.

For example, in the picture above, you can see that customers in Cluster2 have the highest average Total spend, unlike the customers in Cluster0.
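The comparison of averages described above can be reproduced outside the tool with a short sketch. The rows, the `cluster` column, and the feature names below are hypothetical; they only imitate the kind of output a segmentation model produces.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical segmentation output: one row per customer with its cluster.
rows = [
    {"customer_id": "C1", "cluster": "Cluster0", "total_spend": 100.0, "orders": 2},
    {"customer_id": "C2", "cluster": "Cluster0", "total_spend": 140.0, "orders": 4},
    {"customer_id": "C3", "cluster": "Cluster1", "total_spend": 300.0, "orders": 5},
    {"customer_id": "C4", "cluster": "Cluster1", "total_spend": 500.0, "orders": 7},
    {"customer_id": "C5", "cluster": "Cluster2", "total_spend": 900.0, "orders": 10},
    {"customer_id": "C6", "cluster": "Cluster2", "total_spend": 1100.0, "orders": 14},
]
features = ["total_spend", "orders"]


def cluster_summary(rows, features):
    """Average value of each feature within each cluster."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["cluster"]].append(row)
    return {
        cluster: {f: mean(r[f] for r in members) for f in features}
        for cluster, members in grouped.items()
    }


summary = cluster_summary(rows, features)
# Cluster with the highest average total spend.
top_cluster = max(summary, key=lambda c: summary[c]["total_spend"])
```

Here `top_cluster` is `"Cluster2"`, mirroring the example in the text.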

By Cluster and By Numeric Value

Wouldn't it be interesting to explore each cluster by numeric value, or each numeric value by cluster? That is what the By Cluster and By Numeric Value tabs are for: each variable and each cluster is described by its minimum and maximum, its first and third quartiles, and so on.
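A minimal sketch of such a per-cluster description, using the standard library's `statistics.quantiles`, is shown below. The spend values and the `describe` helper are hypothetical, and the exact quartile method Graphite uses is not stated, so the "inclusive" method here is an assumption.

```python
from statistics import quantiles

# Hypothetical values of one numeric column, grouped by cluster.
spend_by_cluster = {
    "Cluster0": [10, 20, 30, 40, 50],
    "Cluster1": [100, 200, 300, 400, 500],
}


def describe(values):
    """Minimum, quartiles, and maximum of a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


by_cluster = {
    cluster: describe(values) for cluster, values in spend_by_cluster.items()
}
```

Repeating the same call for every numeric column gives the "by numeric value" view of the same data.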


Cluster Visualization

The Cluster Visualization tab shows the relationship between two measures and how the clusters are distributed across them. You can change the measures to see how the clusters are distributed along different variables.


Details

Last but not least, on the Details Tab, you can find a detailed table where you can see all relevant values which were used for the above results.

With the right dataset and a few clicks, you will get results that can considerably help your business: general segmentation helps you create marketing and business strategies for each detected group. It's all up to you now: collect your data and start modeling.


To recap, the model results consist of 5 tabs: Cluster Summary, By Cluster, By Numeric Value, Cluster Visualization, and Details.
[Figures: General Segmentation; Cluster summary tab; Summary table; Explore by Cluster tab; Explore by Numeric Value tab]