General Segmentation Model

Model Scenario

With General Segmentation, you can uncover hidden similarities in your data, for example, between the prices of products or services and customers' purchase history. It is an unsupervised algorithm: it divides the data into groups based on how similar the rows are across the selected numeric variables, without needing any labeled examples.
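Graphite does not say which clustering algorithm it uses, so the following is only an illustration of the general idea. It is a minimal sketch of k-means (Lloyd's algorithm), a common unsupervised segmentation method, in plain Python. The function name `kmeans` and the deterministic farthest-first initialization are our own assumptions, not Graphite's implementation.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster numeric rows with Lloyd's k-means (illustrative sketch only).

    points: list of equal-length tuples of numbers, one per row.
    Returns one cluster label (0..k-1) per row.
    """
    # Farthest-first initialisation (our assumption): start from the first row,
    # then repeatedly add the row farthest from every centroid chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: every row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical, already-scaled (income, spending score) pairs forming two groups.
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
print(kmeans(customers, 2))  # [0, 0, 0, 1, 1, 1]
```

Each row ends up in the group whose center it is closest to, which is what "similarity between the numerical variables" means in practice for distance-based methods like this one.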

So let's see how you can run this model in Graphite. First, you choose an ID column, so that you can identify each customer or product within the groups. After that, you select the numeric columns (features) from your dataset on which the segmentation will be based.
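To make the ID/feature split concrete, here is a small sketch that reads a CSV with Python's standard `csv` module and separates the ID column from the numeric feature columns. The column names (`CustomerID`, `Gender`, `Age`, `SpendingScore`) and the helper `split_id_and_features` are hypothetical; in Graphite you make this selection in the interface rather than in code.

```python
import csv
import io


def split_id_and_features(rows, id_column, feature_columns):
    """Return (ids, feature_rows); empty cells become None for later cleaning."""
    ids, features = [], []
    for row in rows:
        ids.append(row[id_column])  # identifies the row; not used for clustering
        features.append(
            [float(row[name]) if row[name].strip() else None for name in feature_columns]
        )
    return ids, features


# Hypothetical dataset: Gender is not numeric, so it is not selected as a feature.
data = """CustomerID,Gender,Age,SpendingScore
C1,F,23,77
C2,M,,40
C3,F,58,12
"""
rows = list(csv.DictReader(io.StringIO(data)))
ids, X = split_id_and_features(rows, "CustomerID", ["Age", "SpendingScore"])
print(ids)  # ['C1', 'C2', 'C3']
print(X)    # [[23.0, 77.0], [None, 40.0], [58.0, 12.0]]
```

The ID travels alongside the features so that, after clustering, each group can be mapped back to concrete customers or products.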

Now we move to the tricky part: data preprocessing! Real-world data is rarely high quality, so for the model to give the best possible results, we must do some data cleaning and transformation. What should you do with missing values? You can either remove them or replace them with a suitable value, such as the column's mean or a predicted value.

Scaling matters too. For example, suppose you have chosen Age and Height as numeric columns. Age ranges from 10 to 80, while Height ranges from 100 to 210. Because the segmentation is based on the similarity between rows, the algorithm can give more importance to Height simply because its values are larger and spread over a wider range than Age. To prevent this, you can transform your data by standardizing or normalizing it.
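The preprocessing choices above can be sketched as three small functions: mean imputation for missing values, standardization (z-scores, so each column has mean 0 and standard deviation 1), and min-max normalization (each column rescaled to the range 0 to 1). This is a plain-Python illustration with hypothetical helper names; Graphite's internal implementation may differ. The example reuses the Age and Height ranges from the text.

```python
from statistics import fmean, pstdev


def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    means = [fmean(v for v in col if v is not None) for col in zip(*rows)]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def standardize(rows):
    """Z-score each column: (x - mean) / std, giving mean 0 and std 1."""
    stats = [(fmean(col), pstdev(col)) for col in zip(*rows)]
    return [[(v - m) / s if s else 0.0 for v, (m, s) in zip(row, stats)] for row in rows]


def normalize(rows):
    """Min-max scale each column to the range [0, 1]."""
    bounds = [(min(col), max(col)) for col in zip(*rows)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


# Hypothetical (Age, Height) rows; the second row is missing its Age.
rows = [[10, 100], [None, 150], [80, 210]]
filled = impute_mean(rows)  # the missing Age becomes (10 + 80) / 2 = 45.0
print(normalize(filled))    # Age and Height now both span 0.0 to 1.0
print(standardize(filled))  # each column now has mean 0 and standard deviation 1
```

Imputation runs before scaling, because the scaling statistics (mean, standard deviation, minimum, maximum) cannot be computed from missing values. After either transformation, a 1-unit difference no longer means something different for Age than for Height.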

Finally, you need to set the number of groups you want to get. If you are not sure, Graphite will try to determine the best number of groups for you. But what about the model results? More about that in the next section!
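The text does not say how Graphite picks the "best" number of groups. One common approach, shown here as an assumption rather than Graphite's documented method, is to cluster the data for several candidate values of k and keep the one with the highest mean silhouette coefficient, which rewards tight, well-separated groups. The sketch repeats a compact k-means so that it runs on its own; `kmeans`, `silhouette`, and `best_k` are hypothetical helpers.

```python
import math
from statistics import fmean


def kmeans(points, k, max_iter=100):
    """Compact Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (b - a) / max(a, b) over all rows."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; rank it last
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the row's own cluster.
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # usual convention for single-member clusters
            continue
        a = fmean(same)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            fmean(math.dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            for other in clusters
            if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return fmean(scores)


def best_k(points, candidates):
    """Return the candidate k whose clustering has the highest silhouette score."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))


# Hypothetical data with three well-separated groups.
points = [(1, 1), (1.5, 1), (1, 1.5), (8, 8), (8.5, 8), (8, 8.5), (1, 8), (1.5, 8), (1, 8.5)]
print(best_k(points, range(2, 6)))  # 3
```

Fewer groups than the data naturally has forces distant rows together (large a), while more groups splits tight groups apart (small b); both lower the score, so the peak points to a sensible k.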

Model Results

Let's see how to interpret the results after we have run our model. The results are organized into 5 tabs: Cluster Summary, By Cluster, By Numeric Value, Cluster Visualization, and Details.

Cluster Summary

Since the model divides your data into clusters (groups in which objects are more similar to each other than to objects in other clusters), it is essential to compare the average values of the variables across all clusters. That's why the Cluster Summary tab shows these differences between the clusters in a chart.

For example, in the picture above, you can see that customers in Cluster0 have the highest average value of the Spending Score, unlike the customers in Cluster3.
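The Cluster Summary comparison boils down to averaging each variable within each cluster. Here is a plain-Python sketch of that computation; the helper name and the data are hypothetical, the `Cluster<n>` labels borrow the naming from the example above, and Graphite presents the result as a chart rather than a dictionary.

```python
from collections import defaultdict
from statistics import fmean


def cluster_summary(rows, labels, feature_names):
    """Average value of every feature within each cluster."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        f"Cluster{label}": {
            name: fmean(column) for name, column in zip(feature_names, zip(*members))
        }
        for label, members in sorted(groups.items())
    }


# Hypothetical (Age, SpendingScore) rows with labels from a clustering run.
rows = [[20, 80], [25, 90], [60, 10], [55, 20]]
labels = [0, 0, 1, 1]
print(cluster_summary(rows, labels, ["Age", "SpendingScore"]))
# {'Cluster0': {'Age': 22.5, 'SpendingScore': 85.0}, 'Cluster1': {'Age': 57.5, 'SpendingScore': 15.0}}
```

Reading across one variable (for example SpendingScore) shows which cluster has the highest or lowest average, which is the comparison the chart makes visually.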

By Cluster and By Numeric Value

Wouldn't it be interesting to explore each cluster by numeric value, or each numeric value by cluster? That's why we have the By Cluster and By Numeric Value tabs, where each variable is analyzed within each cluster by its minimum and maximum, first and third quartiles, and so on.
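As a rough sketch of the statistics behind these tabs, the per-cluster minimum, quartiles, and maximum of one variable can be computed with Python's `statistics.quantiles`. The helper `describe_by_cluster` and the data are hypothetical. `method="inclusive"` interpolates linearly between data points; Graphite may use a different quartile convention, so its numbers can differ slightly for small clusters.

```python
from collections import defaultdict
from statistics import quantiles


def describe_by_cluster(values, labels):
    """Min, first quartile, median, third quartile, and max of one variable per cluster."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    summary = {}
    for label, vals in sorted(groups.items()):
        if len(vals) > 1:
            q1, median, q3 = quantiles(vals, n=4, method="inclusive")
        else:
            q1 = median = q3 = vals[0]  # older Python versions reject a single value here
        summary[label] = {"min": min(vals), "q1": q1, "median": median, "q3": q3, "max": max(vals)}
    return summary


# Hypothetical SpendingScore values and their cluster labels.
scores = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(describe_by_cluster(scores, labels)[0])
# {'min': 1, 'q1': 2.0, 'median': 3.0, 'q3': 4.0, 'max': 5}
```

Calling the function once per variable gives a "by numeric value" view; collecting one cluster's entries across all variables gives a "by cluster" view.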

Cluster Visualization

The Cluster Visualization tab shows the relationship between two variables and how the clusters are distributed across them.

You can change the measures to see different clusters and their distributions.


Details

The devil is in the details, so be conscientious and pay attention to the small things. Last but not least, the Details tab contains a detailed table with all the relevant values used to produce the results above.

With the right dataset and a few clicks, you will get results that can considerably help your business: general segmentation helps you create marketing and business strategies for each detected group. It's all up to you now, so collect your data and start modeling.
