To sign up, you must create an account; if you already have an account, you can log in.
Welcome to the Graphite Note documentation portal. These guides will show you how to predict, visualize, and analyze your data using machine learning with no code. Graphite Note is a powerful tool designed to democratize the power of data analysis and machine learning, making it accessible to individuals and teams of all skill levels. Whether you're a marketer looking to segment your audience, a sales team predicting lead conversions, or an operations manager forecasting product demand, Graphite Note is your go-to platform.
The platform is built with a user-friendly interface that allows you to connect your data, generate predictive models, and share your results with just a few clicks. It's not just about making predictions; it's about understanding them. With Graphite Note's built-in prescriptive analytics and data storytelling feature, you can transform complex data into meaningful narratives that drive strategic actions. This documentation will guide you through every step of the process, from setting up your account to making your first prediction. So let's dive in and start exploring the power of no-code machine learning with Graphite Note.
For every lexical term, you can check the machine learning glossary.
Each user in a team can have a role, which you can see on the “Users page” (more information under Team setup). There are two types of roles:
Administrator: can read and modify entities
Viewer: can only read entities, no editing allowed
The details are accessible through the Account tab in the top-right corner of Graphite Note, followed by selecting the Roles option from the drop-down menu.
If you want to change the properties of the viewer mode, you can click on the gear wheel to edit it.
From here you can deny access to certain modules or grant editing rights for them. You can also delete the role.
Create a Regression model on Demo Ads dataset
1. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".
2. Select the dataset you want to use to create a machine learning model. In this case we will select the "Ads" dataset to create a Regression analysis on marketing ads data.
3. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.
4. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.
5. To create a new model, click "Models" in the Graphite Note main menu.
6. You will get a list of available models. Click "New Model" to create a new one.
7. Select model type from our templates. In our case, we will select "Regression" by double clicking on its name.
8. Select the dataset you want to use to produce a model. We will use "Demo-Ads.csv".
9. Name your new model. We will call it "Regression on Demo-Ads".
10. Write a description of the model and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.
11. Click "Create" to create your demo model environment.
12. To set up a Regression model, first define the "Target Feature". That is the numeric column from your dataset that you'd like to make predictions about. In the case of Regression on the Ads dataset, the target feature is the "Clicks" column.
13. Click "Next" to get the list of model features that will be included in the scenario. The model relies upon each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.
14. To start training the model, click "Run scenario". This will take a sample of 80% of your data and train several machine learning models (a conceptual sketch of this split follows this walkthrough).
15. Wait a few moments and voilà! Your Regression model is trained. Click the "Performance" tab to get model insights and view the Key Drivers.
16. Explore the Regression Model by clicking on Impact Analysis, Model Fit and Training Results to get more insights on how the model is trained and set up.
17. If you want to turn your model into action, click the "Predict" tab in the main model menu.
18. You can produce your own "What-If" analysis based on existing training results. You can also import a fresh CSV dataset into the data model to make predictions on the target column. In our case that is "Clicks". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.
19. Use your model often to predict future behaviour, and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!
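Step 14 mentioned that "Run scenario" trains on an 80% sample of your data. For readers curious what that split looks like under the hood, here is a minimal Python sketch of the same idea using scikit-learn on an invented ads table (the column names and figures are assumptions for illustration, not the exact demo schema):

```python
# A minimal sketch of an 80/20 train/test split, the idea behind
# "Run scenario". All data and column names here are invented.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

ads = pd.DataFrame({
    "ad_spend":    [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    "impressions": [10, 19, 31, 42, 48, 61, 69, 82, 88, 101],
    "clicks":      [1, 2, 3, 5, 5, 6, 7, 8, 9, 10],
})

X, y = ads[["ad_spend", "impressions"]], ads["clicks"]
# 80% of the rows train the model; the held-out 20% measures accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on the held-out 20%:", model.score(X_test, y_test))
```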
There are three plans for Graphite Note, suited to different audiences: individuals, teams, or enterprises.
To help you choose which plan matches your project, you can start with a free trial, schedule a demo, or book a meeting with an expert.
Create Regression model on Demo CO2 Emission dataset
1. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".
2. Select the dataset you want to use to create the machine learning model. In this case, we will select CO2 Car Emissions dataset to create Regression Analysis on car emissions data.
3. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.
4. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.
5. To create a new model, click "Models" in the Graphite Note main menu.
6. You will get a list of available models. Click "New Model" to create a new one.
7. Select model type from our templates. In our case we will select "Regression" by double clicking on its name.
8. Select the dataset you want to use to produce the model. We will use "Demo-CO2-Car-Emissions-Canada.csv"
9. Name your new model. We will call it "Regression on Demo-CO2-Car-Emissions"
10. Write the model description and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.
11. Click "Create" to create your demo model environment.
12. To set up the Regression model, first define the "Target Feature". That is the numeric column from your dataset that you'd like to make predictions about. In the case of Regression on the car emissions dataset, the target feature is the "CO2 Emissions(g/km)" column.
13. Click "Next" to get the list of model features that will be included in the scenario. The model relies upon each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.
14. To start training the model, click "Run scenario". This will take a sample of 80% of your data and train several machine learning models.
15. Wait a few moments and voilà! Your Regression model is trained. Click the "Performance" tab to get model insights and view the Key Drivers.
16. Explore the Regression model by clicking on Impact Analysis and Training Results to get more insights on how the model is trained.
17. If you want to take your model into action, click on the "Predict" tab in the main model menu.
18. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset that the model will use to make predictions on the target column. In our case that is "CO2 Emissions(g/km)". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.
19. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!
Create Multi-Class Classification model on Demo Diamonds dataset
1. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".
2. Select the dataset you want to use to create a machine learning model. In this case we will select the Diamonds dataset to create a Multi-Class Classification analysis on diamond characteristics data.
3. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.
4. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.
5. To create a new model, click "Models" in the Graphite Note main menu.
6. You will get a list of available models. Click "New Model" to create a new one.
7. Select a model type from our templates. In our case we will select "Multi-Class Classification" by double clicking on its name.
8. Select the dataset you want to use to produce the model. We will use "Demo-Diamonds.csv".
9. Name your new model. We will call it "Multi-Class Classification on Demo-Diamonds"
10. Write a description of the model and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.
11. Click "Create" to create your demo model environment.
12. To set up a Multi-Class Classification model, first define the "Target Feature". That is the text column from your dataset you'd like to make predictions about. In the case of Multi-Class Classification on the Diamonds dataset, the target feature is the "Cut" column.
13. Click "Next" to get the list of model features that will be included in the scenario. The model relies on each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.
14. To start training the model, click "Run Scenario". This will take a sample of 80% of your data and train several machine learning models.
15. Wait a few moments and voilà! Your Multi-Class Classification model is trained. Click the "Performance" tab to get model insights and view the Key Drivers.
16. Explore the Multi-Class Classification model by clicking on Impact Analysis, Model Fit, Accuracy Overview, or Training Results to get more insights on how the model is trained and set up.
17. If you want to turn your model into action, click the "Predict" tab in the main model menu.
18. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset to make predictions on the target column. In our case that is "Cut". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.
19. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!
You can access the Account information page by clicking the Account tab in the top-right of Graphite Note, and then the Account info drop-down item.
This page features your personal information, including your team name, your current plan, and its usage (the data allowance still available and the number of users). You can rename your team and contact sales to change your plan.
Create RFM Customer Segmentation on Demo eCommerce Orders dataset
1. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".
2. Select the dataset you want to use to create a machine learning model. In this case we will select the eCommerce Orders dataset to create an RFM Customer Segmentation (Recency, Frequency, Monetary Value) analysis on ecommerce orders data.
3. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.
4. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.
5. To create a new model, click "Models" in the Graphite Note main menu.
6. You will get a list of available models. Click "New Model" to create a new one.
7. Select a model type from our templates. In our case we will select "RFM Customer Segmentation" by double clicking on its name.
8. Select the dataset you want to use to produce the model. We will use "Demo-eCommerce-Orders.csv".
9. Name your new model. We will call it "RFM customer segmentation on Demo-eCommerce-Orders".
10. Write a description of the model and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.
11. Click "Create" to create your demo model environment.
12. To set up the RFM model, you first need to identify and define a few parameters: "Time/Date Column", "Customer ID", "Customer Name" (optional), and "Monetary" (amount spent). In our case we will select "created_at" as the date, "user_id" as the customer, and "total" as the monetary parameter.
13. To start training the model, click "Run scenario".
14. Wait a few moments and voilà! Your RFM Customer Segmentation model is trained. Click the "Results" tab to get model insights.
15. You can navigate over the different tabs to get deep insights into the RFM analysis from different perspectives: Recency, Frequency, Monetary.
16. The "RFM Scores" tab shows a detailed explanation of the different scores, along with RFM segments and their descriptions.
17. The "RFM Analysis" tab gives you more details on the different segments.
18. The "RFM Matrix" tab shows the number of customers belonging to each RFM segment. You can export the matrix data to use the Customer IDs for different business actions (e.g. exporting a list of customers about to churn).
The "What Dataset Do I Need?" section of Graphite Note is a comprehensive resource designed to guide users through the intricacies of dataset selection and preparation for various machine learning models. This section is crucial for users, especially those without extensive AI expertise, as it provides clear, step-by-step instructions and examples on how to curate and structure data for different predictive analytics scenarios.
Key Features of the Section
Model-Specific Guidance: Each page within this section is tailored to a specific predictive model, such as cross-selling prediction, churn prediction, or customer segmentation. It outlines the type of data required, the format, and how to interpret and use the data effectively.
Sample Datasets and Templates: To make the process more user-friendly, the section includes sample datasets and templates. These examples showcase the necessary columns and data types, along with a brief explanation of each, helping users to model their datasets accurately.
Target Column Identification: A crucial aspect of preparing a dataset for machine learning is identifying the target column. This section provides clear guidance on selecting the appropriate target for different types of analyses, whether it's for classification, regression, or clustering.
Data Cleaning and Preparation Tips: Recognizing that data rarely comes in a ready-to-use format, this section offers valuable tips on cleaning and preparing data, ensuring that users start their predictive analytics journey on the right foot.
Real-World Applications and Use Cases: To bridge the gap between theory and practice, the section includes examples of real-world applications and use cases. This approach helps users understand how their data preparation efforts translate into actionable insights in various business contexts.
This page contains the most frequently asked questions.
Predictive analytics is a form of advanced analytics that uses both new and historical data to forecast future activity, behavior, and trends. It involves applying statistical analysis techniques, analytical queries, and automated machine learning algorithms to data sets to create predictive models that place a numerical value — or score — on the likelihood of a particular event happening.
Prescriptive analytics is a form of advanced analytics that examines data or content to answer the question "What should be done?" or "What can we do to make 'X' happen?". It is characterized by techniques such as graph analysis, simulation, complex event processing, neural networks, recommendation engines, heuristics, and machine learning.
At Graphite Note, we take data security very seriously. We employ robust security measures to ensure your data is protected at all times. This includes encryption of data at rest and in transit, regular security audits, and strict access controls. Read more here.
Graphite Note is designed to work with a wide range of data types. You can import data from various sources such as CSV files, databases, and data warehouses. The platform can handle structured, tabular data (like numerical and categorical data).
Importing data into Graphite Note is a straightforward process. You can upload data directly from your computer, or connect to a database or data warehouse. Our platform supports a variety of data formats, including CSV files and SQL databases.
We offer a range of support options to help you get the most out of Graphite Note. This includes a comprehensive knowledge base, video tutorials, and email support. Our dedicated support team is always ready to assist you with any questions or issues you may have.
Absolutely! Graphite Note is designed to be user-friendly and accessible to everyone, regardless of their technical background. Our no-code platform allows you to generate predictive and prescriptive analytics without needing to write a single line of code.
Graphite Note is versatile and can be beneficial to a wide range of industries. This includes but is not limited to retail, e-commerce, marketing, sales, finance, healthcare, and manufacturing. Any industry that relies on data to make informed decisions can benefit from our platform.
Graphite Note is a flexible platform that can be tailored to meet your specific business needs. Whether you're looking to improve customer retention, optimize your marketing campaigns, forecast sales, or identify trends, our platform can provide the insights you need to drive growth.
We offer a variety of resources to help new users get started with Graphite Note. This includes step-by-step tutorials, webinars, and a comprehensive knowledge base. We're committed to helping you get the most out of our platform and will work with you during onboarding.
You can easily re-upload a CSV file with Graphite Note here.
A tag is a keyword associated with a model or dataset. It is a tool to group your models to easily find them, as you can filter your list by tags.
Also, you can create tags and manage them by clicking on the Account tab in the top-right of Graphite Note, and then the Tags drop-down item. You can change its name, its color, and its description or delete it.
You can also create a tag directly when you are importing a dataset or a model by clicking on Select tag and then Create & Apply or you can choose an existing one and Apply it.
In programming, parsing refers to the process of analyzing and interpreting the structure of a sequence of characters or symbols according to a specific grammar or syntax. It is used in our application to understand and extract meaningful information from input data.
During parsing, a parser takes the input, which can be a program's source code or any other form of textual data, and breaks it down into a hierarchical structure that conforms to a predefined set of rules or grammar. This hierarchical structure is typically represented using a data structure such as an abstract syntax tree (AST) or a parse tree.
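As a small, general illustration of parsing (not Graphite Note's internal parser), Python's standard-library ast module turns source text into an abstract syntax tree:

```python
import ast

# Parse a small expression into an abstract syntax tree (AST).
tree = ast.parse("price * quantity + tax", mode="eval")

# The hierarchical structure reflects the grammar: the '+' node is the
# root, with the 'price * quantity' subtree nested beneath it.
print(ast.dump(tree, indent=2))
```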
Users can start their free trial of the SPROUT plan immediately. The trial is valid for 7 days. After seven days, if you want to continue using our service, you must subscribe to a plan by contacting our sales team.
The starter plan is primarily designed for individual users who want to upload CSV files and create machine learning models.
The starter plan has the same core functionality as higher plans, but with the following limitations:
Only one user in the workspace
Only the CSV connector
Total data source rows limited to 50k
APIs enable you to pull your predictions into your ERP, CRM, internal app, or website. It is a way to process or display predictions outside your Graphite Note account.
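As a hypothetical sketch of what that looks like from the caller's side, the snippet below posts a row of feature values to a prediction endpoint over HTTP. The URL, authentication scheme, and payload shape are placeholders invented for illustration; consult the API documentation in your account for the real details.

```python
import requests

# Hypothetical endpoint, key, and payload -- not the actual API schema.
API_URL = "https://example.com/api/v1/models/<model-id>/predict"
API_KEY = "<your-api-key>"

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"rows": [{"ad_spend": 500, "impressions": 48}]},
    timeout=30,
)
response.raise_for_status()
# Predictions are returned as JSON, ready for your ERP, CRM, or website.
print(response.json())
```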
With Graphite Note, you have the option to add a Dedicated Data Scientist to your team. This is an expert in machine learning and data science who can assist you and your team with any questions or concerns you may have. They can also provide hands-on support with tasks such as data cleaning and improving the performance of your models.
We can extend your trial beyond the one-week default period in certain circumstances. Don't hesitate to get in touch with us before the end of your trial if you'd like to discuss this further.
Graphite Note runs on platforms belonging to reputable leading service providers and vendors that uphold the highest security standards, specifically Amazon Web Services (AWS).
Create Binary Classification model on Demo Churn dataset
Get an overview of the Customer Churn demo dataset and how it can be used to create your new Graphite Note model in this video:
Or follow the instructions below for step-by-step guidance on how to use the Customer Churn demo dataset:
1. If you want to use Graphite Note demo datasets, click "Import DEMO Dataset".
2. Select the dataset you want to use to create a machine learning model. In this case we will select the Churn dataset to create a binary classification analysis on customer engagement data.
3. Once selected, the demo dataset will load into your account. The dataset view will automatically open.
4. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.
5. To create a new model, click "Models" in the Graphite Note main menu.
6. You will get a list of available models. Click "New Model" to create a new one.
7. Select model type from our templates. In our case we will select "Binary Classification" by double clicking on its name.
8. Select the dataset you want to use to produce the model. We will use "Demo-Churn.csv".
9. Name your new model. We will call it "Binary Classification on Demo-Churn".
10. Write a description of the model and select a tag. If you want to, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.
11. Click "Create" to create your demo model environment.
12. To set up a Binary Classification model, first define the "Target Feature". That is the binary column from your dataset that you'd like to make predictions about. In the case of Binary Classification on the Churn dataset, the target feature will be the "Churn" column.
13. Click "Next" to get the list of model features that will be included in the scenario. The model relies upon each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.
14. To start training the model, click "Run scenario". This will take a sample of 80% of your data and train several machine learning models.
15. Wait a few moments and voilà! Your Binary Classification model is trained. Click the "Performance" tab to get model insights and view the Key Drivers.
16. Explore the Binary Classification model by clicking on Impact Analysis and Training Results to get more insights on how the model is trained.
17. If you want to turn your model into action, click the "Predict" tab in the main model menu.
18. You can produce your own "What-If" analysis based on existing training results. You can also import a fresh CSV dataset with data the model will use to make predictions on the target column. In our case that is "Churn". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.
19. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!
Create Timeseries Forecast on Demo Monthly Car Sales dataset
1. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".
2. Select the dataset you want to use to create your advanced analytics model. In this case, we will select the Monthly Car Sales dataset to create a "Timeseries Forecast" analysis on car sales data.
3. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.
4. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.
5. To create a new model in the Graphite Note main menu, click on "Models".
6. You will get a list of available models. Click "New Model" to create a new one.
7. Select the model type from our templates. In our case, we will select "Timeseries Forecast" by double clicking on its name.
8. Select the dataset you want to use to produce the model. We will use "Demo-Monthly-Car-Sales.csv".
9. Name your new model. We will call it "Timeseries forecast on Demo-Monthly-Car-Sales".
10. Write a description of the model and select a tag. If you want to, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.
11. Click "Create" to create your demo model environment.
12. To set up the Timeseries Forecast analysis, first define the "Target Column". That is a numeric column from your dataset that you'd like to forecast. In the case of Timeseries on the monthly car sales dataset, the target column is "Sales".
13. If the dataset includes multiple time series sequences, you can select the field that will be used to uniquely identify each sequence. In the case of our demo dataset, we will not apply the Sequence Identifier field since we have only the "Sales" target column.
14. Click "Next" to open the "Time/Date Column" selection. Choose "Month" as the date column.
15. From the additional options below, choose "Monthly" as the time interval and define the "Forecast Horizon". We will set the forecast horizon to 6 months in the future.
16. Click "Next" to activate the "Seasonality" options step. Here, you can define the seasonality specifics of your forecast. If the time interval is set to daily, you will also have "Advanced options" available on the next step.
17. Click "Run Scenario" to train your timeseries forecast.
18. Wait a few moments and voilà! Your Timeseries forecast is trained. Click the "Performance" tab to get insights and view the graph with the original (historical) and predicted model data.
19. Explore more details on the "Trend", "Seasonality", and "Details" tabs.
20. If you want to turn your model into action, click the "Predict" tab in the main model menu.
21. You can produce your own forecast analysis based on the existing training results by selecting a Start and End date from the drop-down calendar and clicking the "Predict" button.
22. Use your model often to predict future sales results. The more you use and retrain your model, the smarter it becomes!
In your Account information, you can view the number of users allowed based on your plan. To invite users to your team, ensure you have available user slots and click on Invite user.
You will then be redirected to the “Users page” accessible through the Account tab in the top-right corner of Graphite Note, followed by selecting the Users option from the drop-down menu. On this page, you can see all the information on each user of your team.
From there, click on Invite user in the top-right corner of the page, add the email address, select the desired role, and then select Invite.
Create a Regression model on Demo Marketing Mix dataset
1. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".
2. Select the dataset you want to use to create your machine learning model. In this case, we will select the MMM dataset to create a Regression analysis on marketing mix and sales data.
3. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.
4. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.
5. To create a new model, click "Models" in the Graphite Note main menu.
6. You will get a list of available models. Click "New Model" to create a new one.
7. Select your model type from our templates. In our case we will select "Regression" by double clicking on its name.
8. Select the dataset you want to use to produce the model. We will use "Demo-MMM.csv".
9. Name your new model. We will call it "Regression on Demo-MMM".
10. Write a description of the model and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.
11. Click "Create" to create your demo model environment.
12. To set up your Regression model, first define the "Target Feature". That is the numeric column from your dataset that you'd like to make predictions about. In the case of Regression on the Marketing Mix and Sales dataset, the target feature is the "Sales" column.
13. Click "Next" to get the list of model features that will be included in the scenario. The model relies upon each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.
14. To start training the model, click "Run Scenario". This will take a sample of 80% of your data and train several machine learning models.
15. Wait a few moments and voilà! Your Regression model is trained. Click the "Performance" tab to get model insights and view the Key Drivers.
16. Explore the Regression model by clicking on Impact Analysis, Model Fit, and Training Results to get more insights on how the model is trained and set up.
17. If you want to turn your model into action, click the "Predict" tab in the main model menu.
18. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset with data the model will use to make predictions on the target column. In our case, that is "Sales". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.
19. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!
Create Binary Classification model on Demo Upsell dataset
1. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".
2. Select the dataset you want to use to create your machine learning model. In this case we will select the Upsell dataset to create a binary classification analysis on data about customers' additional purchases.
3. Once selected, the demo dataset will automatically load into your account and the dataset view will open immediately.
4. Adjust your dataset options on the Settings tab. Click Columns tab to view the list of available columns with their corresponding data types. Explore dataset details on Summary tab.
5. To create a new model in the Graphite Note main menu, click on "Models"
6. You will get a list of available models. Click on "New Model" to create a new one.
7. Select the model type from our templates. In our case we will select "Binary Classification" by double clicking on its name.
8. Select the dataset you want to use to produce the model. We will use "Demo-Upsell.csv".
9. Name your new model. We will call it "Binary Classification on Demo-Upsell".
10. Write a description of the model and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.
11. Click "Create" to create your demo model environment.
12. To set up a Binary Classification model, first define the "Target Feature". That is the binary column from your dataset that you'd like to make predictions about. In the case of Binary Classification on the Upsell dataset, the target feature will be the "Applied" column.
13. Click "Next" to get the list of model features that will be included in the model scenario. Your model relies upon each column (feature) to make accurate predictions. When training the model we will calculate which of the features are most important and behave as Key Drivers.
14. To start training your model click "Run scenario". This will take a sample of 80% of your data and train several machine learning models.
15. Wait a few moments and voilà! Your Binary Classification model is trained. Click the "Performance" tab to get model insights and view the Key Drivers.
16. Explore the Binary Classification model by clicking on Impact Analysis and Training Results to get more insights on how the model is trained.
17. If you want to turn your model into action, click the "Predict" tab in the main model menu.
18. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset to make predictions on the target column. In our case that is "Applied". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.
19. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!
Create Binary Classification model on Demo Lead Scoring dataset
Get an overview of the Lead Scoring demo dataset and how it can be used to create your new Graphite Note model in this video:
Or follow the instructions below for step-by-step guidance on how to use the Lead Scoring demo dataset:
1. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".
2. Select the dataset you want to use to create the machine learning model. In this case, we will select the Lead Scoring dataset to create a binary classification analysis on potential customer interaction data.
3. Once selected, the demo dataset will load directly to your account. The Dataset view will automatically open.
4. Adjust your dataset options on the Settings tab. Click Columns tab to view the list of available columns with their corresponding data types. Then explore the dataset details on Summary tab.
5. To create a new model, click "Models" in the Graphite Note main menu.
6. You will get a list of available models. Click "New Model" to create a new one.
7. Select the model type from our templates. In our case, we will select "Binary Classification" by double clicking on its name.
8. Select the dataset you want to use to produce the model. We will use "Demo-Lead-Scoring.csv".
9. Name your new model. We will call it "Binary Classification on Demo-Lead-Scoring".
10. Write a description of the model and select a tag. If you want to, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.
11. Click "Create" to create your demo model environment.
12. To set up a Binary Classification model, firstly, you need to define the "Target Feature". That is a binary column from your dataset that you'd like to make predictions about. In the case of Binary Classification on a Lead Scoring dataset, the target feature will be the "Converted" column.
13. Click "Next" to get the list of model features that will be included in the model scenario. The model relies upon each column (feature) to make accurate predictions. When training the model, it will calculate which of the features are most important and behave as Key Drivers.
14. To start training the model, click "Run Scenario". This will take a sample of 80% of your data and train several machine learning models.
15. Wait a few moments and voilà! Your Binary Classification model is trained. Click the "Performance" tab to get model insights and to view the Key Drivers.
16. Explore the "Binary Classification" model by clicking on the Impact Analysis and Training Results to get more insights on how the model is trained.
17. If you want to turn your model into action, click on "Predict" tab in the main model menu.
18. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset to make predictions on the target column. In our case that is "Converted". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.
19. Use your model often to predict future behaviour, and to learn which key drivers are impacting outcomes. The more you use and retrain your model, the smarter it becomes!
Create a Regression model on Demo Store Item Demand dataset
1. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".
2. Select the dataset you want to use to create your machine learning model. In this case, we will select the Store Item Demand dataset to create a Regression analysis on sales data across store locations.
3. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.
4. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.
5. To create a new model in the Graphite Note main menu, click on "Models".
6. You will get a list of available models. Click "New Model" to create a new one.
7. Select model type from our templates. In our case we will select "Regression" by double clicking on its name.
8. Select the dataset you want to use to produce the model. We will use "Demo-Store-Item-Demand.csv".
9. Name your new model. We will call it "Regression on Demo-Store-Item-Demand".
10. Write a description of the model and select a tag. If you want to, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.
11. Click "Create" to create your demo model environment.
12. To set up the Regression model, first define the "Target Feature". That is the numeric column from your dataset that you'd like to make predictions about. In the case of Regression on the Store Item Demand dataset, the target feature is the "Sales" column.
13. Click "Next" to get the list of model features that will be included in the model scenario. The model relies on each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as the Key Drivers.
14. To start training your model click "Run scenario". This will take a sample of 80% of your data and train several machine learning models.
15. Wait a few moments and voilà! Your Regression model is trained. Click the "Performance" tab to get model insights and view the Key Drivers.
16. Explore the Regression model by clicking on Impact Analysis, Model Fit, and Training Results to get more insights on how the model is trained and set up.
17. If you want to turn your model into action, click the "Predict" tab in the main model menu.
18. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset to make predictions on the target column. In our case that is "Sales". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.
19. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!
Predict Revenue is a critical task for businesses aiming to forecast future revenue streams accurately. This challenge is typically addressed using a time series forecasting model, which analyzes historical revenue data to predict future trends and patterns.
Dataset Essentials for Predict Revenue
A suitable dataset for Predict Revenue using time series forecasting should include:
Date/Time: The timestamp of revenue data, usually in daily, weekly, or monthly intervals.
Revenue: The total revenue recorded in each time period.
Seasonal Factors: Data on seasonal variations or events that might affect revenue.
Economic Indicators: Relevant economic factors that could influence revenue trends.
Marketing Spend: Information on marketing and advertising expenditures, if applicable.
An example dataset for Predict Revenue with time series forecasting might look like this:
Target Column: The Total Revenue column is the primary focus, as the model aims to forecast future values in this series.
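To make the forecasting step concrete, here is a hedged Python sketch that fits a trend-aware Holt-Winters model to an invented monthly revenue series and forecasts six months ahead (the figures and the choice of model are illustrative assumptions, not what Graphite Note runs internally):

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Invented monthly revenue with a mild upward trend (illustrative only).
revenue = pd.Series(
    [100, 104, 110, 108, 115, 121, 126, 124, 131, 138, 142, 147,
     151, 155, 162, 160, 168, 175],
    index=pd.date_range("2022-01-01", periods=18, freq="MS"),
)

# Fit an additive-trend exponential smoothing model and forecast the
# next 6 months -- a 6-period forecast horizon.
model = ExponentialSmoothing(revenue, trend="add").fit()
print(model.forecast(6))
```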
Steps to Success with Graphite Note
Data Collection: Compile historical revenue data along with any relevant external factors.
Time Series Analysis: Utilize Graphite Note to analyze the time series data and identify patterns.
Model Training: Train a time series forecasting model using the platform.
Model Evaluation: Continuously evaluate and adjust the model based on new data and changing market conditions.
Benefits of Predict Revenue with Time Series Forecasting
Accurate Financial Planning: Enables more precise budgeting and financial planning.
Strategic Decision Making: Informs strategic decisions with insights into future revenue trends.
Adaptability to Market Changes: Helps businesses adapt strategies in response to predicted market changes.
User-Friendly Analytics: Graphite Note's no-code approach makes sophisticated time series forecasting accessible to users without specialized statistical knowledge.
In summary, Predict Revenue with time series forecasting is an essential tool for businesses to anticipate future revenue trends. Graphite Note simplifies this complex task, allowing businesses to leverage their historical data for insightful and actionable revenue predictions.
Predictive Lead Scoring is a technique used to rank leads in terms of their likelihood to convert into customers. This approach typically employs a binary classification model, where each lead is classified as 'high potential' or 'low potential' based on various attributes and behaviors.
Dataset Essentials for Predictive Lead Scoring
To effectively implement Predictive Lead Scoring, a dataset with the following elements is essential:
Lead Demographics: Information such as age, location, and job title.
Engagement Metrics: Data on how the lead interacts with your business, like website visits, email opens, and download history.
Lead Source: The origin of the lead, such as organic search, referrals, or marketing campaigns.
Previous Interactions: History of past interactions, including calls, emails, or meetings.
Purchase History: If applicable, details of past purchases or subscriptions.
An example dataset for Predictive Lead Scoring might look like this:
Target Column: The Converted column is the target variable. It indicates whether the lead converted to a customer.
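As a rough illustration of the underlying technique, the sketch below trains a logistic-regression classifier on an invented leads table and scores each lead with a conversion probability (all column names and values are assumptions for illustration):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Invented leads table -- not a real schema.
leads = pd.DataFrame({
    "website_visits": [1, 8, 2, 12, 0, 9, 3, 15, 1, 11],
    "email_opens":    [0, 5, 1, 7, 0, 6, 2, 9, 0, 8],
    "converted":      [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

X, y = leads[["website_visits", "email_opens"]], leads["converted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = LogisticRegression().fit(X_train, y_train)
# predict_proba yields a conversion probability per lead -- the "score".
print(clf.predict_proba(X_test)[:, 1])
```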
Steps to Success with Graphite Note
Data Collection: Gather detailed and relevant data on leads.
Feature Selection: Choose the most predictive features for lead scoring.
Model Training: Utilize Graphite Note to train a binary classification model.
Model Evaluation: Test and refine the model for optimal performance.
Benefits of Predictive Lead Scoring
Efficient Lead Management: Prioritize leads with the highest conversion potential, optimizing sales efforts.
Personalized Engagement: Tailor interactions based on the lead's predicted preferences and potential.
Resource Optimization: Allocate marketing and sales resources more effectively.
Accessible Analytics: Graphite Note's no-code platform makes predictive lead scoring accessible to teams without deep technical expertise.
In summary, Predictive Lead Scoring is a powerful tool for optimizing sales and marketing strategies. With Graphite Note, businesses can leverage advanced analytics to score leads effectively, enhancing their conversion rates and overall efficiency.
Product Demand Forecast is a crucial process for businesses to predict future demand for their products. This task typically involves time series forecasting models, which analyze historical sales data to forecast future demand patterns.
Dataset Essentials for Product Demand Forecast
An effective dataset for Product Demand Forecast using time series forecasting should include:
Date/Time: The timestamp for each data point, typically daily, weekly, or monthly.
Product Sales: The number of units sold or the sales volume of each product.
Product Features: Characteristics of the product, such as category, price, or any special features.
Promotional Activities: Data on any marketing or promotional activities that might affect sales.
External Factors: Information on external factors like market trends, economic conditions, or seasonal events.
An example dataset for Product Demand Forecast might look like this:
Target Column: The Sales Volume column is the primary focus, as the model aims to forecast future sales volumes for each product.
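As a toy illustration of the forecasting idea, the sketch below computes a naive per-product baseline (the mean of the last three observed weeks) on invented sales rows; real demand models additionally capture trend, seasonality, and promotions:

```python
import pandas as pd

# Invented weekly sales for two products (illustrative only).
sales = pd.DataFrame({
    "week": pd.to_datetime(
        ["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22"] * 2
    ),
    "product": ["A"] * 4 + ["B"] * 4,
    "volume":  [120, 130, 125, 140, 60, 58, 65, 70],
})

# Naive baseline: forecast each product's next week as the mean of its
# last three observed weeks.
forecast = (
    sales.sort_values("week")
         .groupby("product")["volume"]
         .apply(lambda s: s.tail(3).mean())
)
print(forecast)
```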
Steps to Success with Graphite Note
Data Collection: Gather detailed sales data along with product features and external factors.
Time Series Analysis: Use Graphite Note to analyze the sales data over time, identifying trends and patterns.
Model Training: Train a time series forecasting model on the platform.
Model Evaluation: Regularly evaluate the model's performance and adjust it based on new data and market changes.
Benefits of Product Demand Forecast
Inventory Management: Helps in planning inventory levels to meet future demand, avoiding stockouts or overstock situations.
Strategic Marketing: Informs marketing strategies by predicting when demand for certain products will increase.
Resource Allocation: Assists in allocating resources efficiently based on predicted product demand.
Accessible Forecasting: Graphite Note's no-code platform makes advanced forecasting techniques accessible to a wider range of users.
In summary, Product Demand Forecast is vital for businesses to anticipate market demand and plan accordingly. With Graphite Note, this complex analytical task becomes manageable, enabling businesses to leverage their data for effective demand planning and strategic decision-making.
Media Mix Modeling (MMM) is a statistical analysis technique used to quantify the impact of various marketing channels on sales and other key performance indicators (KPIs). It helps businesses allocate their marketing budget more effectively by understanding the contribution of each channel to overall performance.
Dataset Essentials for Media Mix Modeling
A robust dataset for Media Mix Modeling should include:
Time Period: The specific dates or periods for which the data is collected.
Marketing Spend: The amount spent on each marketing channel during the period.
Sales Data: The total sales achieved in the same time period.
Channel Performance Metrics: Metrics like impressions, clicks, conversions, etc., for each channel.
External Factors: Information on external factors like economic conditions, competitor activities, or seasonal events.
Market Dynamics: Changes in market conditions, customer preferences, or product availability.
An example dataset for Media Mix Modeling might look like this:
Target Column: Total Sales
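To show the regression idea behind MMM in miniature, the sketch below fits a linear model that relates invented weekly channel spend to total sales; each coefficient approximates the incremental sales per unit of spend in that channel (data and column names are assumptions for illustration):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented weekly spend per channel and total sales (illustrative only).
mmm = pd.DataFrame({
    "tv_spend":     [20, 25, 30, 35, 40, 45, 50, 55],
    "search_spend": [10, 12, 9, 14, 16, 15, 18, 20],
    "social_spend": [5, 7, 6, 8, 9, 11, 10, 12],
    "total_sales":  [200, 230, 235, 270, 300, 315, 345, 375],
})

X, y = mmm[["tv_spend", "search_spend", "social_spend"]], mmm["total_sales"]
model = LinearRegression().fit(X, y)

# Each coefficient estimates incremental sales per unit of channel spend.
print(dict(zip(X.columns, model.coef_.round(2))))
```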
Steps to Success with Graphite Note
Data Compilation: Gather comprehensive data across all marketing channels and corresponding sales data.
Model Development: Use Graphite Note, Regression Model, to develop a statistical model that correlates marketing spend across various channels with sales outcomes.
Analysis and Insights: Analyze the model's output to understand the effectiveness of each marketing channel.
Strategic Decision Making: Apply these insights to optimize future marketing spends and strategies.
Benefits of Media Mix Modeling
Optimized Marketing Budget: Allocate marketing budgets more effectively across channels.
ROI Analysis: Understand the return on investment for each marketing channel.
Strategic Planning: Plan marketing strategies based on data-driven insights.
Adaptability: Adjust marketing strategies in response to changing market conditions and consumer behaviors.
Accessible Advanced Analytics: Graphite Note's no-code platform makes complex MMM accessible to teams without specialized statistical knowledge.
In summary, Media Mix Modeling is a powerful tool for businesses to optimize their marketing strategies based on comprehensive data analysis. With Graphite Note, this advanced capability becomes accessible, allowing for more informed and effective marketing budget allocation.
Predicting customer churn is a critical challenge for businesses aiming to retain their customers and reduce turnover. This problem typically involves a binary classification model, where the goal is to predict whether a customer is likely to leave or discontinue their use of a service or product in the near future.
Dataset Essentials for Customer Churn Prediction
A well-structured dataset is key to accurately predicting customer churn. Essential data elements include:
Customer Demographics: Age, gender, and other demographic factors that might influence customer loyalty.
Usage Patterns: Data on how frequently and in what manner customers use the product or service.
Customer Service Interactions: Records of customer support interactions, complaints, and resolutions.
Transaction History: Details of customer purchases, payment methods, and transaction frequency.
Engagement Metrics: Measures of customer engagement, such as email opens, website visits, or app usage.
A typical dataset for churn prediction might look like this:
Target Column: The Churned column is the target variable, indicating whether the customer has churned (Yes) or not (No).
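For intuition, here is a hedged sketch of a binary churn classifier on an invented customer table; the feature importances it reports are loosely analogous to the Key Drivers surfaced in the platform (schema and values are assumptions):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Invented customer rows -- not a real schema.
customers = pd.DataFrame({
    "tenure_months":   [2, 36, 5, 48, 1, 24, 3, 60, 6, 30],
    "support_tickets": [4, 0, 3, 1, 5, 1, 4, 0, 2, 1],
    "churned":         [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

X, y = customers[["tenure_months", "support_tickets"]], customers["churned"]
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances hint at which features drive churn predictions.
print(dict(zip(X.columns, clf.feature_importances_.round(2))))
```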
Steps to Success with Graphite Note
Data Gathering: Collect comprehensive and relevant customer data.
Feature Engineering: Identify and create features that are most indicative of churn.
Model Training: Use Graphite Note to train a binary classification model on your dataset.
Model Evaluation: Test the model's performance and refine it for better accuracy.
Benefits of Predicting Customer Churn
Proactive Customer Retention: Identifying at-risk customers allows businesses to take proactive steps to retain them.
Improved Customer Experience: Insights from churn prediction can guide improvements in products and services.
Cost Efficiency: Retaining existing customers is often more cost-effective than acquiring new ones.
Accessible Analytics: Graphite Note's no-code platform makes predictive analytics accessible, enabling businesses of all sizes to leverage AI for customer retention.
In summary, the Predict Customer Churn model is an invaluable tool for businesses focused on customer retention. Through Graphite Note, this advanced predictive capability becomes accessible to businesses without the need for extensive technical expertise, allowing them to make informed, data-driven decisions for customer retention strategies.
RFM Customer Segmentation: An Overview
RFM (Recency, Frequency, Monetary) customer segmentation is a method businesses use to categorize customers based on their purchasing behavior. This approach helps personalize marketing strategies, improve customer engagement, and increase sales.
The segmentation is based on three criteria:
Recency: How recently a customer made a purchase.
Frequency: How often they make purchases.
Monetary Value: How much money they spend.
Essential Dataset Components for RFM Segmentation
A robust dataset for effective RFM segmentation includes the following key elements:
Date (Recency): The date of each customer's last transaction, essential for assessing the 'Recency' aspect of RFM.
Customer ID: A unique identifier for each customer, crucial for tracking individual purchasing behaviors.
Monetary Spent (Monetary Value): The total amount spent by the customer in each transaction, to evaluate the 'Monetary' component of RFM.
Example Dataset for RFM Customer Segmentation
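As a stand-in for the example table, the sketch below builds a tiny invented orders table and derives the three RFM quantities per customer with pandas (all values are illustrative assumptions):

```python
import pandas as pd

# Invented transactions: one row per order (illustrative only).
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "created_at":  pd.to_datetime(["2024-05-01", "2024-06-20", "2024-03-15",
                                   "2024-04-02", "2024-06-25", "2024-01-10"]),
    "total":       [50.0, 80.0, 20.0, 35.0, 25.0, 200.0],
})

today = pd.Timestamp("2024-07-01")
rfm = orders.groupby("customer_id").agg(
    recency=("created_at", lambda d: (today - d.max()).days),  # days since last order
    frequency=("created_at", "count"),                         # number of orders
    monetary=("total", "sum"),                                 # total spent
)
print(rfm)
```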
Steps to Success with Graphite Note for RFM Segmentation
Data Collection: Gather comprehensive data including customer IDs, transaction dates, and amounts spent.
Data Analysis: Utilize Graphite Note to dissect the data, focusing on recency, frequency, and monetary values of customer transactions.
Segmentation Modeling: Employ models to segment customers based on RFM criteria, facilitating targeted marketing strategies.
Benefits of RFM Segmentation Using Graphite Note
Enhanced Marketing Strategies: Tailor marketing campaigns based on customer segments.
Improved Customer Engagement: Customize interactions based on individual customer behaviors.
Efficient Resource Allocation: Focus efforts on the most profitable customer segments.
Strategic Business Decisions: Make informed choices regarding customer relationship management and retention strategies.
In conclusion, RFM Customer Segmentation is a powerful approach for businesses seeking to understand and cater to their customers more effectively. Graphite Note offers a no-code platform that simplifies the analysis of customer data for RFM segmentation, enabling businesses to leverage their data for strategic advantage in customer engagement and retention.
When you open a dataset, you have five different tabs: Settings, Columns, View Data, Summary, and Association.
First, on the Settings tab, you can re-upload the dataset, rename it, and change the description and the tag. You also have the information on the type, the ID, the creation date, and the updated date.
The Columns tab provides the original name, the column name, the data type, and the data format of each column, all of which you can modify.
On the View Data tab, you have all the data with the number of columns and rows.
The Summary tab gives a simple analysis of each column with a graph.
For numerical columns, it counts the number of null values and calculates the sum, the mean, the standard deviation, the min, the max, the lower and upper quartiles, and the median.
For categorical columns, it counts the number of null values and unique values, and the min and max length.
The last part is the Association tab, which measures the relationship between two variables. The association between numerical variables is the correlation:
a zero correlation indicates no relationship between the variables
a correlation of +1 indicates a perfect positive correlation, meaning that as one variable goes up, the other goes up as well
a correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down
If needed, you can use the More details button to better understand the associations.
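For readers who want to see the correlation computed directly, here is a minimal pandas sketch on two invented numerical columns (data is illustrative only):

```python
import pandas as pd

# Two invented numerical columns (illustrative only).
df = pd.DataFrame({
    "ad_spend": [100, 200, 300, 400, 500],
    "clicks":   [12, 25, 33, 48, 55],
})

# Pearson correlation: close to +1 here because clicks rise with spend.
print(df["ad_spend"].corr(df["clicks"]))
```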
Data is an essential component of any data modeling and analysis process. The kind of data you need for modeling depends on the specific problem you are trying to solve. In general, the data should be relevant, accurate, and consistent, and it should cover a significant period. In some cases, you may also need to preprocess or transform the data to make it suitable for modeling.
If you are new to using Graphite Note or are looking for some examples to practice with, there are several popular datasets available that you can explore. Some examples include weather data, financial data, social media data, and sensor data. These datasets are often available in open-source repositories or can be downloaded from public sources, such as government websites, social media platforms, or financial databases.
Graphite Note is a powerful tool that allows you to predict, visualize and analyze data in real-time. With the right dataset, you can use Graphite Note to gain valuable insights and make informed decisions about your business or research. Whether you are analyzing financial data to predict market trends or monitoring sensor data to optimize your production processes, our platform can help you make sense of your data and identify patterns that would be difficult to detect otherwise.
While the kind of data you need may vary depending on your specific needs, there are several popular datasets that you can use to practice and explore the capabilities of Graphite Note. With the right dataset and a solid understanding of data modeling and analysis, you can unlock the full potential of Graphite Note and gain insights that will drive your business or research forward.
We have highlighted a few popular datasets so you can get to know our platform better. After that, it's all up to you - collect your data and start having insights and fun!
An education company named “X Education” sells online courses to industry professionals. Many professionals interested in the courses land on their website and browse for courses on any given day. This makes an excellent dataset for Binary Classification, with the target column "Converted" (YES/NO).
Use Graphite Note to gain valuable insights into your sales pipeline by identifying which leads are converting to customers and the factors that contribute to their success. With this information, you can optimize your sales strategy and improve your overall conversion rates.
In addition, our tool can also help you predict which new leads are most likely to convert to customers and provide a probability score for each lead. This can enable you to prioritize your sales efforts and focus on the leads with the highest conversion potential.
By leveraging our tool, you can gain a deeper understanding of your sales funnel and take proactive steps to improve your conversion rates, reduce churn, and increase revenue.
To get started, download the provided dataset and upload it to Graphite Note. Once uploaded, create a new Binary Classification model in Graphite Note with the 'Converted' variable as the Target Variable. This will allow you to predict which leads are most likely to convert to customers.
After training the model, explore the insights that it provides, such as the most important features for predicting conversion and the distribution of conversion probabilities. This can help you to gain a better understanding of the factors that contribute to lead conversion and make informed decisions about your sales strategy.
Finally, you can use the model to run a "what-if" scenario by predicting the conversion probability for new leads based on different scenarios or assumptions. This can help you to forecast the impact of changes in your sales approach or marketing efforts and make data-driven decisions.
By following these steps, you can leverage Graphite Note and the provided dataset to gain valuable insights into your sales pipeline, predict lead conversion, and optimize your sales strategy for better results.
A Telco company customer dataset. Each row represents a customer and each column contains the customer’s attributes. The dataset includes information about:
Customers who left the company – this will be our target column ("Churn").
Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies.
Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges.
Demographic info about customers – gender, age range, and if they have partners and dependents.
Use Graphite Note to gain valuable insights into your customer base and identify which customers are most likely to churn. By analyzing the factors that contribute to churn, you can optimize your retention strategy and reduce customer churn rates.
In addition, our tool can also help you predict which customers are at high risk of churning, and provide a probability score for each customer. This can enable you to take proactive steps to retain those customers with the highest churn risk, such as offering personalized promotions or improving their overall experience.
By leveraging our tool, you can gain a deeper understanding of your customer base and identify opportunities to reduce churn, increase retention rates, and ultimately drive revenue growth. With our predictive churn model, you can make data-driven decisions that lead to more satisfied customers and a stronger business.
To get started, download the provided dataset and upload it to Graphite Note. Once uploaded, create a new Binary Classification model in Graphite Note with the 'Churn' variable as the Target Variable.
This will allow you to predict which customers are most likely to churn.
After training the model, explore the insights that it provides, such as the most important features for predicting churn and the distribution of churn probabilities. This can help you to gain a better understanding of the factors that contribute to customer churn and make informed decisions about your retention strategy.
Finally, you can use the model to run a "what-if" scenario by predicting the churn probability for different groups of customers based on different scenarios or assumptions. This can help you to forecast the impact of changes in your retention approach or customer experience efforts and make data-driven decisions.
By following these steps, you can leverage Graphite Note and the provided dataset to gain valuable insights into your customer base, predict customer churn, and optimize your retention strategy for better results.
The dataset contains monthly data on car sales from 1960 to 1968. It is great for our time series forecast model with which you can predict sales for the upcoming months.
Use Graphite Note to gain valuable insights into your business operations and forecast future trends by analyzing time series data. With our advanced forecasting models, you can make informed decisions about your business and optimize your operations for better results.
Our tool enables you to analyze historical data and identify patterns and trends, such as seasonality or cyclical trends. This can help you to forecast future demand or performance and make data-driven decisions about resource allocation, capacity planning, or inventory management.
To get started, download the provided dataset and upload it to Graphite Note. Once uploaded, create a new Timeseries Forecast model in Graphite Note with:
Target Variable: Sales
Time/Date Column: Month
Time Interval: Monthly
After training the model, explore the insights that it provides, such as identifying patterns, seasonality, and trends. This can help you to forecast future performance, plan resources effectively, and optimize your operations.
Finally, you can use the model to run a "what-if" scenario by predicting future values.
This can help you to forecast the impact of changes in your business operations, such as changes in demand, capacity planning, or inventory management.
By following these steps, you can leverage Graphite Note to gain valuable insights into your business trends, forecast future performance, and optimize your operations for better results. With our advanced time series forecasting models, you can stay ahead of the competition and take advantage of new opportunities as they arise.
This is a demo CSV with orders for an imaginary eCommerce shop. You can use it for Timeseries forecasting, RFM model, Customer Lifetime Value Model, General Segmentation, or New vs Returning Customers model in Graphite.
A demo Mall Customers dataset from Kaggle. Ideal for General customer segmentation in Graphite.
First things first: you have to upload your CSV file(s) into Graphite.
To upload a CSV file:
Go to Create New on Datasets, or New Dataset when you are in the datasets list
Choose CSV file as the source type
Select or drop your CSV file
Select Parse file (you can rename columns or change their data types)
Choose your options: if you want, name the dataset, write a short description of the data, and add a tag
Select Create
Customer Lifetime Value (CLV) prediction is a process used by businesses to estimate the total value a customer will bring to the company over their entire relationship. This prediction helps in making informed decisions about marketing, sales, and customer service strategies.
Dataset Essentials for Customer Lifetime Value Prediction
A suitable dataset for CLV prediction should include:
Date: The date of each transaction or interaction with the customer.
Customer ID: A unique identifier for each customer.
Monetary Spent: The amount of money spent by the customer on each transaction.
An example dataset for Customer Lifetime Value prediction might look like this:

| Date       | Customer ID | Monetary Spent |
|------------|-------------|----------------|
| 2021-01-01 | C001        | $150           |
| 2021-01-15 | C002        | $200           |
| 2021-02-01 | C001        | $100           |
| 2021-02-15 | C003        | $250           |
| 2021-03-01 | C002        | $300           |
Steps to Success with Graphite Note
Data Collection: Compile transactional data including customer IDs and the amount spent.
Data Analysis: Use Graphite Note to analyze the data, focusing on customer purchase patterns and frequency.
Model Training: Train a model to predict the lifetime value of a customer based on their transaction history.
Benefits of Predicting Customer Lifetime Value
Targeted Marketing: Focus marketing efforts on high-value customers.
Customer Segmentation: Segment customers based on their predicted lifetime value.
Resource Allocation: Allocate resources more effectively by focusing on retaining high-value customers.
Personalized Customer Experience: Tailor customer experiences based on their predicted value to the business.
Strategic Decision-Making: Make informed decisions about customer acquisition and retention strategies.
In summary, predicting Customer Lifetime Value is crucial for businesses to understand the long-term value of their customers. Graphite Note facilitates this process by providing a no-code platform for analyzing customer data and predicting their lifetime value, enabling businesses to make data-driven decisions in customer relationship management.
If you are using SQL for your database, you can connect to it and write your own SQL.
To connect to your database:
Go to Create New on Datasets, or New Dataset when you are in the datasets list
Choose MySQL/MariaDB or PostgreSQL
Establish a connection
Enter your server hostname/IP address, database port, database user, database password, and database name
Select Check Connection
Ensure that your firewall accepts incoming requests from the following two IP addresses: 99.81.63.220 and 68.183.64.54
Write the desired SQL (for example, a simple SELECT over the tables you need) and click the Run SQL button to get your data
You should see all the columns from the selected dataset appearing. If necessary, you can change column names, data types, or data formats
Click on the Create button to create your dataset
Getting data from databases using SQL is often easier - you can tailor the dataset to your needs. By repeating the steps above, you can quickly pull your data and start running various models without writing a single line of machine learning code.
If you have collected more data related to your uploaded CSV, or the data has changed, you can use the re-upload option.
In other words, if you have a new file with new data, you can re-upload it and append it to the previous dataset. Just keep in mind that the file you select must have the same column structure as the previously uploaded file.
To re-upload a CSV file:
Go to Datasets list
Select the dataset you want to re-upload
Select Re-upload
Depending on your needs, you can select Append data
Select or drop your CSV file
Select Update
For example, this is useful for monthly data. If you receive new data in a CSV file every month and need to merge all months into one dataset, you can simply re-upload each month's file instead of repeating copy-and-paste steps - easier and faster.
Explore all Graphite no-code machine learning Models.
Explore the most popular Use Cases.
Data labeling is the process of tagging data with meaningful and informative labels to train machine learning models. In predictive analytics, labeled data is crucial as it provides the model with examples of correct behavior. This document will guide you through the process of preparing and labeling data for three predictive models:
Lead Scoring,
Churn Prediction,
and MQL to SQL Conversion.
Objective: Predict if a lead will convert into a customer.
Dataset Example:
| Lead ID | Industry | Company Size | Interactions | Converted |
|---------|----------|--------------|--------------|-----------|
| 001     | Tech     | 50-100       | 5            | Yes       |
| 002     | Finance  | 100-500      | 2            | No        |
Steps:
Data Collection: Gather data on leads, including their industry, company size, and interactions with your platform.
Labeling: For each lead, label them as 'Yes' if they converted into a customer and 'No' if they didn't.
Reasoning: Labeling helps the model understand patterns of conversion based on the features provided.
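If your conversion outcomes live in a separate system, labeling can also be done programmatically. Here is a small pandas sketch, assuming hypothetical files and a LeadID column:

```python
# Hypothetical sketch: label each lead 'Yes' if its ID appears in the
# converted-customers file, 'No' otherwise. File and column names are assumptions.
import pandas as pd

leads = pd.read_csv("leads.csv")          # LeadID, Industry, Company Size, ...
converted = pd.read_csv("customers.csv")  # LeadIDs of leads that became customers

leads["Converted"] = leads["LeadID"].isin(converted["LeadID"]).map(
    {True: "Yes", False: "No"}
)
leads.to_csv("labeled_leads.csv", index=False)
```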
Objective: Predict if a customer will churn or leave your service.
Dataset Example:
| Customer ID | Usage  | Support Tickets | Feedback Score | Churned |
|-------------|--------|-----------------|----------------|---------|
| A1          | 50 hrs | 2               | 4.5            | No      |
| B2          | 10 hrs | 5               | 2.8            | Yes     |
Steps:
Data Collection: Gather data on customer usage patterns, support interactions, and feedback scores.
Labeling: For each customer, label them as 'Yes' if they churned and 'No' if they continued using your service.
Reasoning: Labeling helps the model identify signs of customer dissatisfaction or reduced engagement, which might lead to churn.
Objective: Predict if a Marketing Qualified Lead (MQL) will become a Sales Qualified Lead (SQL).
Dataset Example:
| MQL ID | Webinars Attended | Content Downloaded | Email Engagement | Became SQL |
|--------|-------------------|--------------------|------------------|------------|
| M1     | 2                 | Yes                | 15%              | Yes        |
| M2     | 0                 | No                 | 5%               | No         |
Steps:
Data Collection: Gather data on MQLs, including their engagement with webinars, content downloads, and email interactions.
Labeling: For each MQL, label them as 'Yes' if they became an SQL and 'No' if they didn't.
Reasoning: Labeling helps the model recognize patterns of engagement that indicate a lead's readiness to move to the sales stage.
Data labeling is a foundational step in predictive analytics. By providing clear, accurate labels, you enable your predictive models to learn from past data and make accurate future predictions. Ensure your labels are consistent and based on well-defined criteria to achieve the best results with Graphite Note's no-code predictive analytics platform.
In this section, we will give you a few practical tips on how to improve the performance of your model, starting with how performance is measured.
The Confusion Matrix is a powerful diagnostic tool in classification tasks within predictive analytics. It presents a clear and concise layout for evaluating the performance of a classification model by showing the actual versus predicted values in a tabular format. The matrix allows users, regardless of their coding expertise, to assess the accuracy and effectiveness of a predictive model, providing insights into not only the number of correct and incorrect predictions but also the type of errors made.
A confusion matrix for a binary classification problem consists of four components:
True Positives (TP): The number of instances that were predicted as positive and are actually positive.
False Positives (FP): The number of instances that were predicted as positive but are actually negative.
True Negatives (TN): The number of instances that were predicted as negative and are actually negative.
False Negatives (FN): The number of instances that were predicted as negative but are actually positive.
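For readers who want to see these four counts computed directly, here is a minimal scikit-learn sketch (the labels below are toy values):

```python
# Compute TP, FP, TN, FN for a binary classifier with scikit-learn.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")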
In the context of Graphite Note, a no-code predictive analytics platform, the confusion matrix serves several key purposes:
Performance Measurement: It quantifies the performance of a classification model, offering a visual representation of the model's ability to correctly or incorrectly predict categories.
Error Analysis: By breaking down the types of errors (FP and FN), the matrix aids in understanding specific areas where the model may require improvement.
Decision Support: The confusion matrix supports decision-making by highlighting the balance between sensitivity (or recall) and precision, which can be crucial for business outcomes.
Model Tuning: Users can leverage the insights from the confusion matrix to adjust model parameters and thresholds to optimize for certain predictive behaviors.
Communication Tool: It acts as a straightforward communication tool for stakeholders to grasp the results of a classification model without delving into complex statistical jargon.
In the example confusion matrix (Model Performance -> Accuracy Overview):
There are 799 instances where the model correctly predicted the positive class (TP).
There are 15622 instances where the model incorrectly predicted the positive class (FP).
There are 348 instances where the model failed to identify the positive class (FN).
There are 18159 instances where the model correctly identified the negative class (TN).
The high number of FP and FN relative to TP suggests a potential imbalance or a need for model refinement to improve predictive accuracy.
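To make that imbalance concrete, the standard metrics can be derived from these four counts (a quick sketch using the numbers above):

```python
# Derive standard metrics from the example counts above.
tp, fp, fn, tn = 799, 15622, 348, 18159

precision = tp / (tp + fp)                   # ~0.049: few positive predictions are correct
recall    = tp / (tp + fn)                   # ~0.697: most actual positives are found
accuracy  = (tp + tn) / (tp + fp + fn + tn)  # ~0.543
print(precision, recall, accuracy)
```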
The classification confusion matrix is an integral part of the model evaluation in Graphite Note, enabling users to make informed decisions about the deployment and iteration of their predictive models.
Example weekly sales dataset for time-series forecasting (column names are indicative):

| Date       | Sales   | Event    | Trend  | Spend  |
|------------|---------|----------|--------|--------|
| 2021-01-01 | $10,000 | New Year | Stable | $2,000 |
| 2021-01-08 | $12,000 | None     | Stable | $2,500 |
| 2021-01-15 | $15,000 | None     | Growth | $3,000 |
| 2021-01-22 | $13,000 | None     | Growth | $2,800 |
| 2021-01-29 | $11,000 | None     | Stable | $2,200 |
Example lead-scoring dataset (column names are indicative):

| Lead ID | Age | State | Role      | Visits | Interactions | Source   | Contacts | Converted |
|---------|-----|-------|-----------|--------|--------------|----------|----------|-----------|
| L1001   | 30  | NY    | Manager   | 10     | 5            | Organic  | 0        | Yes       |
| L1002   | 42  | CA    | Analyst   | 3      | 2            | Referral | 1        | No        |
| L1003   | 35  | TX    | Developer | 8      | 7            | Campaign | 2        | Yes       |
| L1004   | 28  | FL    | Designer  | 5      | 3            | Organic  | 0        | No        |
| L1005   | 45  | WA    | Executive | 12     | 10           |          | 3        | Yes       |
Example product demand dataset (column names are indicative):

| Date       | Product | Units | Price | Promotion   | Trend     | Holiday  |
|------------|---------|-------|-------|-------------|-----------|----------|
| 2021-01-01 | ProdA   | 150   | $20   | None        | Stable    | New Year |
| 2021-01-08 | ProdB   | 200   | $25   | Discount    | Growing   | None     |
| 2021-01-15 | ProdC   | 180   | $30   | Ad Campaign | Declining | None     |
| 2021-01-22 | ProdA   | 170   | $20   | None        | Stable    | None     |
| 2021-01-29 | ProdB   | 220   | $25   | Email Blast | Growing   | None     |
Example monthly revenue dataset (column names are indicative):

| Month    | Revenue | Expenses | Marketing | Other Costs | Pipeline | Trend     | Event        |
|----------|---------|----------|-----------|-------------|----------|-----------|--------------|
| Jan 2021 | $20,000 | $15,000  | $5,000    | $3,000      | $100,000 | Stable    | New Year     |
| Feb 2021 | $25,000 | $18,000  | $4,000    | $3,500      | $120,000 | Growth    | Valentine's  |
| Mar 2021 | $22,000 | $20,000  | $6,000    | $4,000      | $110,000 | Stable    | None         |
| Apr 2021 | $18,000 | $17,000  | $5,500    | $4,500      | $105,000 | Declining | Easter       |
| May 2021 | $20,000 | $19,000  | $7,000    | $4,000      | $115,000 | Growth    | Memorial Day |
Example customer churn dataset (column names are indicative):

| Customer ID | Age | Gender | Income | Usage    | Support Tickets | Last Activity | Churn |
|-------------|-----|--------|--------|----------|-----------------|---------------|-------|
| 2001        | 32  | F      | 58000  | 20 hours | 2               | 30 days ago   | No    |
| 2002        | 40  | M      | 72000  | 15 hours | 0               | 60 days ago   | Yes   |
| 2003        | 25  | F      | 45000  | 35 hours | 3               | 10 days ago   | No    |
| 2004        | 29  | M      | 50000  | 25 hours | 1               | 45 days ago   | No    |
| 2005        | 47  | F      | 65000  | 10 hours | 4               | 90 days ago   | Yes   |
When you first create your model, you have to choose among several model types.
Before running your model's scenario, it helps to understand how the model is trained. First, 80% of the dataset is used to train the model; the remaining 20% is then used to test it and calculate the model score. A high model score means the trained model is accurate on data it has not seen during training.
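The 80/20 split Graphite Note performs is the standard train/test split. For reference, here is how it looks in scikit-learn (the file name and target column are hypothetical):

```python
# Standard 80/20 train/test split, shown with scikit-learn for illustration.
# "leads.csv" and the "Converted" target column are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("leads.csv")
X, y = df.drop(columns=["Converted"]), df["Converted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # 20% held out for scoring
)
```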
Data preprocessing is a crucial step in machine learning. It enhances model accuracy and performance by transforming and cleaning the raw data to remove inconsistencies, handle missing values, scale features, and ensure compatibility with the chosen algorithm.
During preprocessing, Graphite Note deals with the following (a rough code sketch follows the list):
null values: if a column is 50% null or more, it will not be included in model training
missing values: in a numerical column they are replaced by the column average; in a categorical column they become "not_available"
One Hot Encoding: categorical data is transformed into numeric values before training, to make it suitable for machine learning algorithms
class imbalance: fixing an unequal distribution of the target classes, which is not ideal for training
normalization: rescaling the values of numerical columns for better training results
constants: if a column has only one unique value (a constant), it will not be included in model training
cardinality: if a column has a very high number of unique values, it will not be included in model training.
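Here is a rough pandas sketch of a few of these steps, purely for illustration (Graphite Note performs them automatically; the file name is hypothetical):

```python
# Illustrative preprocessing steps, roughly matching the list above.
import pandas as pd

df = pd.read_csv("data.csv")                       # hypothetical dataset

df = df.loc[:, df.isna().mean() < 0.5]             # drop columns that are >= 50% null
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].mean())   # numeric: impute with the average
    else:
        df[col] = df[col].fillna("not_available")  # categorical: placeholder value
df = df.loc[:, df.nunique() > 1]                   # drop constant columns
df = pd.get_dummies(df)                            # one-hot encode categorical features
```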
Introduction
In predictive modeling, key drivers (or influencers) are pivotal in discerning which features within a dataset most significantly impact the target variable. These influencers provide insights into the relative importance of each variable, enabling data scientists and analysts to understand and predict outcomes more accurately.
By highlighting the strongest predictors, key influencers inform the prioritization of features for model optimization, ensuring that models are precise and interpretable in real-world scenarios. This foundational understanding is crucial for refining models and aligning them closely with the underlying patterns and trends present in the data.
Reading Key Drivers
When examining the visualization of key influencers in Graphite Note models, you'll find features ranked by their influence on the target variable, from most to least important.
This ranking allows for a quick assessment of which factors are pivotal in the model's predictions.
By observing the length and direction of the bars associated with each feature, one can gauge the strength of influence they have on the target outcome.
The image shows a data visualization explaining how different amounts of interaction with website pages (measured in page visits) influence whether someone will take a specific action, labeled "Applied," with "YES" being the action taken.
For a high number of page visits, between 29.33 and 35, the likelihood of taking the action increases significantly—by more than double (2.26 times more likely).
For a moderate number of page visits, between 12.33 and 18, the action is still more likely but less so than the higher range—1.65 times more likely.
At a lower number of page visits, between 6.67 and 12.33, the action becomes less likely than the baseline by a factor of 1.37.
For very few page visits, less than 6.67, the likelihood of action drops drastically to less than half (2.36 times less likely).
The percentages and observations indicate how many cases fall within each range and how many of those cases resulted in the action "Applied" being taken. The visualization communicates that more engagement with the website (as measured by page visits) generally increases the likelihood of the desired action occurring.
Statistical Methodology Used
Graphite Note uses advanced statistical functions designed to calculate the influence of features on a target variable.
It employs a method of grouping the data by the feature and target columns and then counting occurrences. The calculations performed within this function aim to determine the proportion of each feature's categories contributing to a specific target value. The influence is quantified by comparing the observed proportion of the target value within each feature category against a weighted average, yielding an 'index value' that indicates the relative influence of each category on the target outcome. The function is robust, allowing for different data types in the target column, and ensures that only relevant categories with sufficient data are included in the final analysis.
Graphite Note applies a quantitative analysis here, in which numeric features (like 'Website Pages') are divided into bins or ranges.
The function then calculates the change in the likelihood of the target outcome (e.g., 'Applied' being 'YES') when the feature values fall within those bins. This calculation is done by comparing the base likelihood of the target outcome with the likelihood when the feature is within a specific bin, hence the multipliers like "increases by 2.26x" for certain ranges.
The analysis would remove any non-relevant categories (based on minimum percentage and row thresholds) and sort the results to clearly show which ranges of the feature increase or decrease the likelihood of the target outcome.
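To illustrate the idea, here is a simplified sketch of the likelihood-multiplier calculation (this is not Graphite Note's internal code; the dataset and column names are hypothetical):

```python
# Simplified sketch of the binning + likelihood-multiplier idea described above.
import pandas as pd

df = pd.read_csv("leads.csv")                # hypothetical dataset
base_rate = (df["Applied"] == "YES").mean()  # baseline likelihood of the target

df["visits_bin"] = pd.cut(df["Website Pages"], bins=6)  # bin the numeric feature
rate_per_bin = df.groupby("visits_bin", observed=True)["Applied"].apply(
    lambda s: (s == "YES").mean()
)
multipliers = rate_per_bin / base_rate       # e.g. 2.26 => "2.26x more likely"
print(multipliers.sort_values(ascending=False))
```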
Now that you have imported your CSV file, or connected to your database, you can create your model.
To create your model:
Select a model type from our templates.
In this example, we are going with a Timeseries Model to predict our future sales!
Select a source Dataset for this Model. Pay attention to the required columns which are different from model to model.
If you don't have a time-related column, such as dates, in your dataset, you cannot perform time-series forecasting.
Define the name and description of the Model and put a tag on it if you like
In the Model Scenario, identify all parameters and relevant columns from your dataset; these depend on the model you chose (which column contains dates, which one you want to predict, and so on).
Run the model and enjoy insights.
Don't forget to explore the Prediction, Trend, Seasonality, and Details Tab for additional insights.
And that’s it, your first Model is created and ready for exploration.
A Timeseries Forecast Model is designed to predict future values by analyzing historical time-related data. To utilize this model, your dataset must include both time-based and numerical columns. In this tutorial, we'll cover the fundamentals of the Model Scenario to help you achieve optimal results. Within the Model Scenario, you'll select parameters related to your dataset and the model itself.
For the Target Column, select a numeric value you want to predict. It's crucial to have values by day, week, or year. If some dates are repeated, you can aggregate them by taking their sum, average, etc.
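For example, if a date appears more than once, you might aggregate it beforehand. A small pandas sketch (file and column names are hypothetical):

```python
# Aggregate repeated dates so there is one value per day before forecasting.
# "orders.csv", "date", and "sales" are hypothetical names.
import pandas as pd

df = pd.read_csv("orders.csv", parse_dates=["date"])
daily = df.groupby("date", as_index=False)["sales"].sum()  # or .mean(), etc.
```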
Next, choose a Sequence Identifier Field to group certain fields and generate an independent forecast for each time series. These values shouldn't be unique; they must form a series.
Then, select the Time/Date Column, specifying the column containing time-related values. The Time Interval represents the data frequency—choose daily for daily data, yearly for annual data, etc. With Forecast Horizon, decide how many days, weeks, or years you want to predict from the last date in your dataset.
The model performs well with seasonal data patterns. If your data shows a linear growth trend, select "additive" for Seasonality Mode; for exponential growth, select "multiplicative." For example, if you see annual patterns, set Yearly Seasonality to True. (TIP: Plotting your data beforehand can help you understand these patterns.) If you're unsure, the model will attempt to detect seasonality automatically.
For daily or hourly intervals, you can access Advanced Parameters to add special dates, weekends, holidays, or limit the target value.
We are constantly enhancing our platform with new features and improving existing models. For your daily data, we've introduced some new capabilities that can significantly boost forecast accuracy. Now, you can limit your target predictions, remove outliers, and include country holidays and special events.
To set prediction limits, enter the minimum and maximum values for your target variable. For example, if you're predicting daily temperatures and know the maximum is 40°C, enter that value to prevent the model from predicting higher temperatures. This helps the model recognize the appropriate range of the Target Column. Additionally, you can use the Remove Days of the Week feature to exclude certain days from your predictions.
We added parameters for country holidays and special dates to improve model accuracy. Large deviations can occur around holidays, where stores see more customers than usual. By informing the model about these holidays, you can achieve more balanced and accurate predictions. To add holidays in Graphite Note, navigate to the advanced section of the Model Scenario and select the relevant country or countries.
Similarly, you can add promotions or events that affect your data. Enter the promotion name, start date, duration, and future dates. This ensures the model accounts for these events in future predictions.
Combining these parameters provides more accurate results. The more information the model receives, the better the predictions.
In addition to adding holidays and special events, you can delete specific data points from your dataset. In Graphite Note, enter the start and end dates of the period you want to remove. For single-day periods, enter the same start and end date. You can remove multiple periods if necessary. Understanding your data and identifying outliers or irrelevant periods is crucial for accurate predictions. Removing these dates can help eliminate biases and improve model accuracy.
By following these steps, you can harness the full potential of your Timeseries Forecast Model, providing valuable insights and more accurate predictions for your business. Now it's your turn to do some modeling and explore your results!
After setting all parameters, it is time to Run Scenario and train the machine learning model.
After running your model, review your results. The Model Performance section provides visual and numerical summaries of key metrics. You'll see values for four evaluation metrics, crucial for assessing your machine learning algorithm's performance.
R-squared measures the proportion of variance in the dependent variable that can be explained by the independent variables. MAPE (mean absolute percentage error), MAE (mean absolute error), and RMSE (root mean squared error) describe the average difference between actual and predicted values.
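For reference, the same four metrics can be computed with scikit-learn (toy values shown):

```python
# Compute the four evaluation metrics named above with scikit-learn.
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

actual    = [100, 120, 130, 110]
predicted = [ 98, 125, 128, 115]
print("R2:  ", r2_score(actual, predicted))
print("MAE: ", mean_absolute_error(actual, predicted))
print("MAPE:", mean_absolute_percentage_error(actual, predicted))
print("RMSE:", mean_squared_error(actual, predicted) ** 0.5)
```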
The results are organized into five tabs: Model Fit, Trend, Seasonality, Special Dates, and Details.
The Model Fit Tab features a graph displaying actual and predicted values. Besides the primary target value prediction (yellow line), the model shows a range of values, known as the uncertainty interval (yellow shaded area). This visualization helps you gauge your model's performance.
If you used the Sequence Identifier Field, you can choose which value to analyze in each Model Result.
Trends and seasonality are key characteristics of time-series data that should be analyzed. The Trend Tab displays a graph illustrating the global trend that Graphite Note has detected from your historical data.
Seasonality represents the repeating patterns or cycles of behavior over time. Depending on your Time Interval, you can find one or two graphs in the Seasonality Tab. For daily data, one graph shows weekly patterns, while the other shows yearly patterns. For weekly and monthly data, the graph highlights recurring patterns throughout the year.
The Special Dates graph shows the percentage effect of special dates and holidays in historical and future data.
The Details Tab contains a comprehensive table with all the values related to the Model Fit Tab, along with additional information.
Once your model is trained, you can use it to fulfill its real function: predicting future values for the Target Column and trend. Use the Predict tab to set the Start Date and End Date of the time interval for which the prediction will be calculated.
After triggering the Predict button, a table with prediction results and trends becomes available. Predictions have the same frequency as the trained model: if the model is trained on daily data, predictions are calculated for every day in the prediction interval; if it is trained on monthly data, predictions are created for every month.
Besides the Predict option, where you manually enter prediction parameters (for time series, the Start and End dates), Graphite Note offers an API connection that enables two-way communication between a Graphite Note model and third-party external applications. You can use the API to programmatically make predictions by passing data to a model and retrieve the results in real time. Details on how to use the API can be found in the REST API section.
To interact with the Graphite Note API and perform predictions, you need to make a POST request to the API endpoint. On the API tab, you will find a generated code snippet containing the request to be sent using cURL.
The request must include the Authorization header, set to "Bearer [token]". Replace [token] with your unique token, which can be found on the account info page in the Graphite Note app, under the section displaying your current plan information.
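For illustration, here is the equivalent request in Python. The endpoint URL and payload below are placeholders - copy the real ones from your model's API tab:

```python
# Illustrative only: the URL and payload are placeholders, not the real API shape.
import requests

API_URL = "https://app.graphite-note.com/api/predict/<model-id>"  # placeholder URL
TOKEN = "your-token-here"  # from the account info page

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"startDate": "2024-01-01", "endDate": "2024-03-31"},    # placeholder payload
)
print(response.json())  # prediction results returned in real time
```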
Once your prediction model is prepared and you are using it to predict future outcomes of your time series, you can let end users run their own predictions with the Notebook feature. Notebooks let you easily and intuitively do your own data storytelling: create visualizations with detailed descriptions, plot model results for better understanding, and enable users to make their own predictions. You can find more about notebooks in the Notebooks - Data Storytelling section.
Click the New Notebook button to open the notebook creation wizard. Choose a notebook name, description, and attributes, and click Create.
You can use a notebook as a single place to present any of the following:
Text - descriptions and written insights about your data
New Visualization - choose between different visualizations (such as bar charts and line charts) to show data from your dataset
Model Result - present model results containing predictions based on data fed to the model through the Predict tab or the API
Model Actionable Insight - present recommended actionable insights prepared by generative AI
In our case, we want to expose the Predict option to the final user, similar to how it appears on the Predict tab. Select the Model Result option and you will be guided to the Model Result visualization creator.
Choose the model you want to use; in the next step, choose the model result you want to include in the notebook.
Choose Predict as the tab you would like to include. The Predict tab lets users make their own prediction interval selections and generate predictions from the model.
Once the notebook options are saved, you will see the notebook frontend screen. You can always expand your notebook with additional text, visualizations, and model results.
You can share the notebook with other users by copying the URL link.
With the Multiclass Classification model, you can analyze feature importance for a target with 2-25 distinct values. Unlike binary classification, which deals with only two classes, multiclass classification handles multiple classes simultaneously.
To achieve the best results, we will cover the basics of the Model Scenario. In this scenario, you choose parameters related to the dataset and the model.
To run the model, you need to select a Target Feature first. This target is the variable or outcome that the model aims to predict or estimate. The Target Feature should be a text-type column (not a numerical or binary column).
You will be taken to the next step where you can choose all the Model Features you want to analyze. You can select which features the model will analyze. Graphite Note will automatically exclude some features that are not suitable for the model and will provide reasons for each exclusion.
Moving forward, you'll see a comprehensive list of preprocessing steps that Graphite Note will apply to prepare your data for training. This enhances data quality, ensuring your model produces accurate results. Typically, these steps are performed by data scientists, but with our no-code machine learning platform, Graphite Note handles it for you. After reviewing the preprocessing steps, you can finish and Run Scenario.
To interpret the results after running your model, go to the Performance tab. Here, you can see the overall model performance post-training. Model evaluation metrics such as F1 Score, Accuracy, AUC, Precision, and Recall are displayed to assess the performance of classification models. Details on model metrics can also be found on the Accuracy Overview tab.
On the performance tab, you can explore six different views that provide insights related to model training and results: Key Drivers, Impact Analysis, Model Fit, Accuracy Overview, Training Results and Details.
Key Drivers indicate the importance of each column (feature) for the Model's predictions. The higher the reliance of the model on a feature, the more critical it is. Graphite uses permutation feature importance to determine these values.
The Impact Analysis tab allows you to select various features and analyze, using a bar chart, how changes in each feature affect the target feature. You can switch between Count and Percentage views.
The Model Fit Tab displays the performance of the trained model. It includes a stacked bar chart with percentages showing correct and incorrect predictions for the multiclass target feature.
The Accuracy Overview tab features a Confusion Matrix to highlight classification errors, making it simple to identify if the model is confusing classes. For each class, it summarizes the number of correct and incorrect predictions. Find out more about Classification Confusion Matrix in our Understanding ML section.
On the Accuracy Overview tab, you'll find detailed information on correct and incorrect predictions (True positives and negatives / False positives and negatives). Model metrics are explained at the bottom of the section.
In the Training Results Tab, you will find information about all the models automatically considered during the training process. Graphite ran several machine learning algorithms suitable for multiclass classification problems, using 80% of the data for training and 20% for testing. The best model, based on the F1 score, is chosen and marked in green in the models list.
The Details tab shows the results of the predictive model, presented in a table format. Each record includes the predicted label, predicted probability, and predicted correctness, offering insights into the model's predictions, confidence, and accuracy for each data point. Dataset test results can be exported to Excel by clicking the XLSX button in the right corner.
Once the model is trained, you can use it to predict future values, solve multi-class classification problems, and drive business decisions. Here are ways to take action with your Multiclass Classification model:
In Graphite Note, you can generate Actionable Insights using the Actionable Insights Input Form. Here, you can provide specific details about your business and objectives. This data is then combined with model training results (e.g., Multiclass Classification with Key Drivers) to produce a tailored analytics narrative aligned with your goals.
Actionable Insights leverage generative AI models to deliver these results. These insights are conclusions drawn from data that can be directly turned into actions or responses. You can access Actionable Insights from the main navigation menu, provided you are subscribed to a Graphite Note plan that includes actionable insights queries.
After building and analyzing a predictive model using Graphite Note, the "Predict" function allows you to apply the model to new data. This enables you to forecast outcomes or target variables based on different feature combinations, providing actionable insights for decision-making.
You can share your prediction results with your team using the Notebook feature. With Notebooks, users can also run their own predictions on your Multiclass Classification model.
Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the Data Storytelling section.
Detecting early signs of reduced customer engagement is pivotal for businesses aiming to maintain loyalty. A notable signal of this disengagement is when a customer's once regular purchasing pattern starts to taper off, leading to a significant decrease in activity. Early detection of such trends allows marketing teams to take swift, proactive measures. By deploying effective retention strategies, such as offering tailored promotions or engaging in personalized communication, businesses can reinvigorate customer interest and mitigate the risk of losing them to competitors.
Our objective is to utilize a model that not only alerts us to customers with an increased likelihood of churn but also forecasts their potential purchasing activity and, importantly, estimates the total value they are likely to bring to the business over time.
These analytical needs are served by what is known in data science as Buy 'Til You Die (BTYD) models. These models track the lifecycle of a customer's interaction with a business, from the initial purchase to the last.
While customer churn models are well-established within contractual business settings, where customers are bound by the terms of service agreements, and churn risk can be anticipated as contracts draw to a close, non-contractual environments present a different challenge. In such settings, there are no defined end points to signal churn risk, making traditional classification models insufficient.
To address this complexity, our model adopts a probabilistic approach to customer behavior analysis, which does not rely on fixed contract terms but on behavioral patterns and statistical assumptions. By doing so, we can discern the likelihood of future transactions for every customer, providing a comprehensive and predictive understanding of customer engagement and value.
The Customer Lifetime Value (CLV) model is a robust tool employed to ascertain the projected revenue a customer will contribute over their entire relationship with a business. The model employs historical data to inform predictive assessments, offering valuable foresight for strategic decision-making. This insight assists companies in prioritizing resources and tailoring customer engagement strategies to maximize long-term profitability.
The CLV model executes a series of sophisticated calculations. Yet, its operations can be conceptualized in a straightforward manner:
Historical Analysis: The model comprehensively evaluates past customer transaction data, noting the frequency and monetary value of purchases alongside the tenure of the customer relationship.
Engagement Probability: It assesses the likelihood of a customer’s future engagement based on their past activities, effectively estimating the chances of a customer continuing to transact with the business.
Forecasting: With the accumulated data, the model projects the customer’s future transaction behavior, predicting how often they will make purchases and the potential value of these purchases.
Lifetime Value Calculation: Integrating these elements, the model calculates an aggregate figure representing the total expected revenue from a customer for a designated future period.
The Customer Lifetime Value model uses historical customer data to predict the future value a customer will generate for a business. It leverages algorithms and statistical techniques to analyze customer behavior, purchase patterns, and other relevant factors to estimate the potential revenue a customer will bring over their lifetime.
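Conceptually, these steps mirror what open-source BTYD implementations do. Here is a minimal sketch using the Python lifetimes package, offered purely for illustration - it is not necessarily Graphite Note's internal implementation, and the file and column names are assumptions:

```python
# Sketch of a BTYD-style CLV calculation with the open-source `lifetimes` package.
# Illustrative only -- not Graphite Note's internal implementation.
import pandas as pd
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.utils import summary_data_from_transaction_data

df = pd.read_csv("transactions.csv")  # hypothetical: Date, CustomerID, Amount
summary = summary_data_from_transaction_data(
    df, customer_id_col="CustomerID", datetime_col="Date", monetary_value_col="Amount"
)

bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(summary["frequency"], summary["recency"], summary["T"])

# Probability a customer is still "alive" and expected purchases in the next 90 days
summary["p_alive"] = bgf.conditional_probability_alive(
    summary["frequency"], summary["recency"], summary["T"]
)
summary["purchases_90d"] = bgf.conditional_expected_number_of_purchases_up_to_time(
    90, summary["frequency"], summary["recency"], summary["T"]
)

# Monetary model on repeat customers, then CLV over the next 3 months
repeat = summary[summary["frequency"] > 0]
ggf = GammaGammaFitter(penalizer_coef=0.001)
ggf.fit(repeat["frequency"], repeat["monetary_value"])
clv = ggf.customer_lifetime_value(
    bgf, repeat["frequency"], repeat["recency"], repeat["T"],
    repeat["monetary_value"], time=3, discount_rate=0.01,  # time in months
)
```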
The dataset on which you will run your model must contain a time-related column.
We need to distinguish all customers, so we need an identifier variable like Customer ID. If you have data about customer names, great; if not, don't worry - just select the same column as in the Customer ID field.
We also need to choose the numeric variable with respect to which we will observe customer behavior, called Monetary (amount spent).
Finally, you need to choose the Starting Date from which you'd like to calculate this model for your dataset.
When you're looking at this option for calculating Customer Lifetime Value (CLV), think of it as setting a starting line for a race. The "race" in this case is the journey you're tracking: how much your customers will spend over time.
The "Starting Date for Customer Lifetime Value Calculation" is basically asking you when you want to start watching the race. You have a couple of choices:
Max Date: This is like saying, "I want to start watching the race from the last time we recorded someone crossing the line." It sets the starting point at the most recent date in your records where a customer made a purchase.
Today: Choosing this means you want to start tracking from right now, today. So any purchases made after today will count towards the CLV.
-- select date --: This would be an option if you want to pick a specific date to start from, other than today or the most recent date in your data.
Let's see how to interpret the results after we have run our model.
The results consist of two tabs: the CLV Insights and Details Tabs.
On the summary of repeat customers, we have:
Total Repeat Customers: the customers that keep returning (the loyal customers)
Total Historical Amount: the past earnings from loyal customers
Average Spend per Repeat Customer
Average no. of Repeat Purchases: shows customer loyalty through the average number of repeat purchases
Average Probability Alive Next 90 Days: the estimated likelihood that a customer stays active with your business in the next 90 days
Predicted no. of Purchases Next 90 Days: the number of purchases you can expect in the next 90 days based on our analysis
Predicted Amount Next 90 Days: the revenue you can expect in the next 90 days
CLV (Customer Lifetime Value): the average revenue that one customer generated in the past and will generate in the future
The CLV Insights Tab shows some charts on the lifetime of customers.
The forecasted number of purchases chart estimates the number of purchases that are expected to be made by returning customers over a specific period.
The forecasted amount chart is a graphical representation of the projected value of purchases to be made by returning customers over a certain period.
Finally, the average alive probability chart illustrates the average probability of a customer remaining active for a business over time, assuming no repeat purchases.
Last but not least, on the Details Tab, you can find a detailed table where you can see all relevant values which were used for the above results.
You can find details about each column by clicking the link on the Details Tab.
The Details Tab within the Customer Lifetime Value Model offers an extensive breakdown of metrics for in-depth analysis. Each column represents a specific aspect of customer data that is pivotal to understanding and predicting customer behavior and value to your business. Below are the descriptions of the available columns:
amount_sum
Description: This column showcases the total historical revenue generated by an individual customer. By analyzing this data, businesses can identify high-value customers and allocate marketing resources efficiently.
amount_count
Description: Reflects the total number of purchases by a customer. This frequency metric is invaluable for loyalty assessments and can inform retention strategies.
repeated_frequency
Description: Indicates the frequency of repeated purchases, highlighting customer loyalty. This metric can be leveraged for targeted engagement campaigns.
customer_age
Description: The duration of the customer's relationship with the business, measured in days since their first purchase. It helps in segmenting customers based on the length of the relationship.
average_monetary
Description: Average monetary value per purchase, providing insight into customer spending habits. Businesses can use this to predict future revenue from a customer segment.
probability_alive
Description: Displays the current probability of a customer being active. A score of 1 means a 100% probability that the customer is active, aiding in prioritizing engagement efforts.
probability_alive_7_30_60_90_365
Description: This column shows the probability of customers remaining active over various time frames without repeat purchases. It's critical for developing tailored customer retention plans.
predicted_no_purchases_7_30_60_90_365
Description: Predicts the number of future purchases within specific time frames. This forecast is essential for inventory planning and sales forecasting.
CVL_30_60_90_365
Description: Estimates potential customer value over different time frames, aiding in strategic financial planning and budget allocation for customer acquisition and retention.
In this given example, we have a snapshot of customer data from the CLV model. The model considers various unique aspects of customer behavior to predict future engagement and value. Let's analyze the key data points and what they signify in a non-technical way, while emphasizing the model’s ability to tailor predictions to individual customer behavior:
amount_sum: This customer has brought in a total revenue of $4,584.14 to your business.
amount_count: They have made 108 purchases, which shows a high level of engagement with your store.
repeated_frequency: Out of these purchases, 106 are repeat purchases, suggesting a strong customer loyalty.
customer_age: They have been a customer for 364 days, indicating a relatively long-term relationship with your business.
average_monetary: On average, they spend about $42.73 per transaction.
probability_alive: There’s an 85% to 86% chance that they are still actively engaging with your business, which is quite high.
probability_alive_7: Specifically, the probability that this customer will remain active in the next 7 days is about 44.48%.
Alex, with a remarkable 106 repeated purchases and a customer_age of 364 days, has shown a pattern of strong and consistent engagement. The average monetary value of their purchases is $42.73, contributing significantly to the revenue with a total amount_sum of $4,584.14. The current probability_alive is high, indicating Alex is likely still shopping.
However, even with this consistent past behavior, the probability_alive_7 drops to about 44.48%. It highlights a nuanced understanding of Alex's habits; a sudden change in their routine is notable, which is why the model predicts a more significant impact if Alex were to alter their shopping pattern even slightly.
On the other hand, we have Casey, who has made 2 purchases, with only 1 being a repeated transaction. Casey’s amount_sum is $185.93, with an average_monetary value of $84.44, and a customer_age of 135 days. Despite a high current probability_alive, the model shows a minimal decline to 83.73% in the probability_alive_7.
This slight decrease tells us that Casey's engagement is inherently more sporadic. The business doesn't expect Casey to make purchases with the same regularity as Alex. If Casey doesn't return for a week, it isn't alarming or out of character, as reflected in the gentle decline in their seven-day active probability.
The contrast in these profiles, painted by the CLV model, enables the business to craft distinct customer journeys for Alex and Casey. For Alex, it's about ensuring consistency and rewarding loyalty to maintain that habitual engagement. Perhaps an automated alert for engagement opportunities could be set up if they don't make their usual purchases.
For Casey, the strategy may involve creating moments that encourage repeat engagement, possibly through sporadic yet impactful touchpoints. Since Casey's behavior suggests openness to larger purchases, albeit less frequently, the focus could be on highlighting high-value items or exclusive offers that align with their sporadic engagement pattern.
The CLV model's behavioral predictions allow the business to personalize customer experiences, maximize the potential of each interaction, and strategically allocate resources to maintain and grow the value of each customer relationship over time. This bespoke approach is the essence of modern customer relationship management, as it aligns perfectly with the individualized tendencies of customers like Alex and Casey.
This detailed data is a treasure trove for businesses keen on data-driven decision-making. Here’s how to utilize the information effectively:
Custom Segmentation: Use customer_age, amount_sum, and average_monetary to segment your customers into meaningful groups.
Detect Churners: Use probability_alive to segment customers currently active, for non-contractual businesses like eCommerce and retail. A score of 0.1 means a 10% probability that the customer is active ("alive") for your business.
Targeted Marketing Campaigns: Leverage the repeated_frequency and probability_alive columns to identify customers for loyalty programs or re-engagement campaigns.
Revenue Projections: The CVL_30_60_90_365 column helps in projecting future revenue and understanding the long-term value of customer segments.
Strategic Planning: Use predicted_no_purchases_7_30_60_90_365 to plan for demand, stock management, and to set realistic sales targets.
By engaging with the columns in the Details Tab, users can extract actionable insights that can drive strategies aimed at optimizing customer lifetime value. Each metric can serve as a building block for a more nuanced, data-driven approach to customer relationship management.
With the Binary Classification model, you can analyze feature importance in a binary column with two distinct values. This model also predicts likely outcomes based on various parameters. To achieve optimal results, we'll cover the basics of the Model Scenario, where you will select parameters related to your dataset and the model itself.
To run the scenario, you need to have a Target Feature, which must be a binary column. This means it should contain only two distinct values, such as Yes/No or 1/0.
In the next step, select the Model Features you wish to analyze. All features that fit into the model are selected by default, but you may deselect any features you do not want to use. Graphite Note automatically preprocesses your data for model training, excluding features that are unsuitable. You can view the list of excluded features and the reasons for their exclusion on the right side of the screen.
Moving forward, you'll see a comprehensive list of preprocessing steps that Graphite Note will apply to prepare your data for training. This enhances data quality, ensuring your model produces accurate results. Typically, these steps are performed by data scientists, but with our no-code machine learning platform, Graphite Note handles it for you. After reviewing the preprocessing steps, you can finish and Run Scenario.
The training duration may vary depending on the data volume, typically ranging from 1 to 10 minutes. The training will utilize 80% of the data to train various machine learning models and the remaining 20% to test these models and calculate relevant scores. Once completed, you will receive information about the best model based on the F1 value and details about training time.
To interpret the results after running your model, go to the Performance tab. Here, you can see the overall model performance post-training. Model evaluation metrics such as F1 Score, Accuracy, AUC, Precision, and Recall are displayed to assess the performance of classification models. Details on model metrics can also be found on the Accuracy Overview tab.
On the performance tab, you can explore six different views that provide insights related to model training and results: Key Drivers, Impact Analysis, Model Fit, Accuracy Overview, Training Results and Details.
Key Drivers indicate the importance of each column (feature) for the Model's predictions. The higher the reliance of the model on a feature, the more critical it is. Graphite uses permutation feature importance to determine these values.
The Impact Analysis tab allows you to select various features and analyze, using a bar chart, how changes in each feature affect the target feature. You can switch between Count and Percentage views.
The Model Fit Tab displays the performance of the trained model. It includes a stacked bar chart with percentages showing correct and incorrect predictions for binary values (1 or 0, Yes or No).
The Accuracy Overview tab features a Confusion Matrix to highlight classification errors, making it simple to identify if the model is confusing two classes. For each class, it summarizes the number of correct and incorrect predictions. Find out more about Classification Confusion Matrix in our Understanding ML section.
On the Accuracy Overview tab, you'll find detailed information on correct and incorrect predictions (True positives and negatives / False positives and negatives). Model metrics are explained at the bottom of the section.
In the Training Results Tab, you will find information about all the models automatically considered during the training process. Graphite ran several machine learning algorithms suitable for binary classification problems, using 80% of the data for training and 20% for testing. The best model, based on the F1 score, is chosen and marked in green in the models list.
The Details tab shows the results of the predictive model, presented in a table format. Each record includes the predicted label, predicted probability, and predicted correctness, offering insights into the model's predictions, confidence, and accuracy for each data point. Dataset test results can be exported to Excel by clicking the XLSX button in the right corner.
Once the model is trained, you can use it to predict future values, solve binary classification problems, and drive business decisions. Here are ways to take action with your Binary Classification model:
In Graphite Note, you can generate Actionable Insights using the Actionable Insights Input Form. Here, you can provide specific details about your business and objectives. This data is then combined with model training results (e.g., Binary Classification with Key Drivers) to produce a tailored analytics narrative aligned with your goals.
Actionable Insights leverage generative AI models to deliver these results. These insights are conclusions drawn from data that can be directly turned into actions or responses. You can access Actionable Insights from the main navigation menu, provided you are subscribed to a Graphite Note plan that includes actionable insights queries.
After building and analyzing a predictive model using Graphite Note, the "Predict" function allows you to apply the model to new data. This enables you to forecast outcomes or target variables based on different feature combinations, providing actionable insights for decision-making.
Create Notebook
You can share your prediction results with your team using the Notebook feature. With Notebooks, users can also run their own predictions on your Binary Classification model.
Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the Data Storytelling section.
With the Regression model, you can see which regression algorithm best fits your dataset. To get the best possible results, we will go through the basics of the Model Scenario, where you select parameters related to the dataset and model.
To run the model, you have to choose a Target Feature first. The target refers to the variable or outcome that the model aims to predict or estimate. In this case, it should be a numerical column.
The next step is to choose the Model Features that you want to analyze. You can choose which features the model will analyze; some columns may not be suitable for the model, and Graphite Note shows the reason for each one.
Now you can finish the process and run the scenario.
Once training finishes, you will see the status, the best model used, and the training time.
Let's see how to interpret the results after we have run our model.
First, you can see the overall performance, based on the best model and its accuracy.
The results consist of 5 tabs: Feature Importance, Feature Impact, Model Fit, Training Results, and Details.
To see which features have the most impact on the target, use the Feature Importance Tab. It shows how much each feature impacts the target, with more details on the right.
The Feature Impact Tab presents a chart showing how a chosen feature impacts the target; you can select whichever feature you want to analyze.
The Model Fit Tab contains a graph with actual and predicted values. You can see which one is correct and incorrect. With visualization, you can see how well or poorly your model is performing.
In the Training Results Tab, you have information about all the models that were trained on 80% of the dataset and tested on the remaining 20%, so you can see which one performed best.
Finally, a table with all the values related to the Model Fit Tab, and much more, can be found on the Details Tab.
After building and analyzing a predictive model using Graphite Note, the "Predict" function allows you to apply the model to new data. This enables you to forecast outcomes or target variables based on different feature combinations, providing actionable insights for decision-making.
With General segmentation, you can find out hidden similarities between the data, such as the similarity between the price of the product or services provided to the purchasing history of the customers. It's an unsupervised algorithm that segments the data into groups, based on some kind of similarity between the numerical variables.
So let's see how you can run this model in Graphite. Firstly, you have to identify an ID column - that way you can identify the customer or product within the groups. After that, you have to select the numeric columns (features) from your dataset on which the segmentation will be based.
In the end, you need to determine the number of groups you want to get. In case you are not sure, Graphite will try to determine the best number of groups. But what about the model result? More about that in the next post!
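Before moving on, here is a minimal sketch of this kind of numeric segmentation using k-means; Graphite Note's exact algorithm and defaults may differ, and the file and column names below are hypothetical:

```python
# Minimal k-means segmentation sketch (hypothetical file and columns).
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("mall_customers.csv")
features = ["Age", "Annual Income", "Spending Score"]

X = StandardScaler().fit_transform(df[features])   # scale so no feature dominates
df["Cluster"] = KMeans(n_clusters=4, n_init=10).fit_predict(X)

# Average feature values per cluster, as shown in the Cluster Summary Tab
print(df.groupby("Cluster")[features].mean())
```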
The model divides your data into clusters: groups of objects where objects in the same cluster are more similar to each other than to those in other clusters. It is therefore essential to compare the average values of the variables across all clusters, which is why the Cluster Summary Tab shows the differences between clusters in a graph.
For example, in the picture above, you can see that customers in Cluster0 have the highest average value of the Spending Score, unlike the customers in Cluster3.
Wouldn't it be interesting to explore each cluster by a numeric value, or each numeric value by cluster? That's why we have the By Cluster and By Numeric Value Tabs: each variable and cluster is analyzed by its minimum and maximum, first and third quartiles, etc.
There is also a Cluster Visualization Tab that shows the relationship between two selected measures and how the clusters are distributed. You can change the measures to see different clusters and their distribution.
The devil is in the details - details are important, so be conscientious and pay attention to the small things. Last but not least, on the Details Tab, you can find a detailed table where you can see all relevant values which were used for the above results.
With the right dataset and a few clicks, you will get results that will considerably help you in your business - general segmentation helps you in creating marketing and business strategies for each detected group. It's all up to you now, collect your data and start modeling.
After you have created your notebook, we will go through some basic visualization tools (in case you missed how to create one, see the Create Notebook section).
Data visualization gives us a clear idea of what the information means by giving it visual context through maps or graphs. This makes the data more natural for the human mind to comprehend, making it easier to identify trends, patterns, and outliers within large data sets.
Once you have created a notebook, to visualize we have to:
Select New visualization
Select a dataset; a CSV file you uploaded or a dataset obtained from a model you ran.
Select Visualization Type. Depending on the type you choose, you will configure it with some combination of the following:
Add category, which represents the abscissa (x-axis) of the coordinate system.
Add series, which represents the ordinate (y-axis). With a wide range of colors, you can choose different types of chart lines.
Add column, which creates a table from the selected columns.
Add for the Primary Measure.
You can create visualizations with different datasets - there is no restriction that all visualizations within a Notebook must be created from the same dataset.
Do you wonder whether the changes you've made in your business have impacted new customers, or do you want to understand the needs of your user base or identify trends? All that and much more you can do with our new model, Customer Cohort Analysis.
A cohort is a subset of users or customers grouped by common characteristics or by their first purchase date. Cohort analysis is a type of behavioral analytics that allows you to track and compare the performance of cohorts over time.
With Graphite, you are only a few steps away from your Cohort model. Once you have selected your dataset, it is time to enter the parameters into the model. The Time/Date Column represents a time-related column.
After that, you have to select the Aggregation level.
For example, if monthly aggregation is selected, Graphite will generate Cohort Analysis with a monthly frequency.
Also, your dataset must contain Customer ID and Order ID/ Trx ID columns as required model parameters.
Last but not least, you have to select the Monetary (amount spent) variable, which represents the main parameter for your Cohort Analysis.
Additionally, you can break down and filter Cohorts by a business dimension (variable) which you select after you enable the checkbox.
That's it, your first Customer Cohort Analysis model is ready.
After you run your model, the first tab that appears is the Cohorts Tab.
Depending on the metric (the default is No of Customers), the results are presented through a graphical heatmap representation.
In the example above, groups of customers are grouped by year when they made their first purchase. Column 0 represents the number of customers per cohort (i.e. 4255 customers made their first purchase in 2018). Now we can see their activity year to year: 799 customers came back in 2019, 685 in 2020, and 118 in 2021.
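To make the mechanics concrete, here is a hedged pandas sketch that builds the same kind of yearly cohort table (the file and column names are hypothetical, not Graphite Note's internals):

```python
# Sketch of a yearly cohort table (hypothetical "orders.csv").
import pandas as pd

df = pd.read_csv("orders.csv", parse_dates=["OrderDate"])
df["OrderYear"]  = df["OrderDate"].dt.year
df["CohortYear"] = df.groupby("CustomerID")["OrderDate"].transform("min").dt.year
df["Period"]     = df["OrderYear"] - df["CohortYear"]   # years since first purchase

# Rows are cohorts, columns are years since first purchase (column 0 is the
# cohort size), values are distinct customers active in that period.
cohorts = (df.groupby(["CohortYear", "Period"])["CustomerID"]
             .nunique().unstack(fill_value=0))
print(cohorts)
```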
If you switch your metric to Percentage, you will get results in percentages.
Let's track our Monetary column (in our case total amount spent per customer) and switch metric to Amount to see how much money our customers spend through the years.
As you can see above, customers that made their first order in 2018 have spent 46.25M, and the 799 customers that came back in 2019 have spent 12.38M.
In case you want to track the total amount spent through the years, switch metric to Amount (Cumulative).
Basically, we tracked the long-term relationships that we have with our given groups (cohorts). On the other hand, we can compare different cohorts at the same stage in their lifetime. For example, for all the cohorts, we can see the average revenue per customer two years after their first purchase: the average revenue per customer in the 2019 cohort (12.02K) is almost half that of the 2018 cohort (21.05K). Here is an opportunity to see what went wrong and devise a new business strategy.
In case you broke down and filtered cohorts by a variable with less than 20 distinct values (parameter Repeat by in Model Scenario), for each value you will get a separate Cohort Analysis in the Repeat by Tab.
All the values related to the Cohorts and Repeat by Tabs, with much more, can be found on the Details Tab, in the form of a table.
Now it's your turn to track your customer's behavior, see when is the best time for remarketing, and how to improve customer retention.
Predict Dataset: Applying Model to a Specific Dataset
The "Predict Dataset" functionality in Graphite Note allows users to apply a successfully trained and deployed model to a specific dataset for generating predictions. This section provides a comprehensive guide on how to utilize this feature to make informed decisions.
Once a model has been trained and deployed within Graphite Note, it can be used to make predictions on a specific dataset. The dataset must have the same structure (columns) as the one the model was trained on. Graphite Note will add new columns to the dataset containing the model's predictions and scores for each row.
Select Dataset to Predict:
Navigate to the "Predict Dataset" section.
Choose the specific dataset you want to apply the model to. This dataset must have the same columns that the model was trained on.
Verify Dataset Structure:
Ensure that the selected dataset has the same structure (columns) as the trained model. This alignment is crucial for accurate predictions.
Apply Model to Dataset:
Select the dataset by double-clicking it.
Graphite Note will add new columns to the selected dataset containing the model's predictions and scores for each row.
Analyze Predictions:
View the results within Graphite Note's user-friendly interface.
Analyze the predictions to understand trends, patterns, and insights that can guide decision-making.
Make Decisions:
Utilize the predictions to make informed decisions aligned with your business goals and strategies.
Export to Excel:
If desired, you can export the dataset with the added prediction columns to Excel for further analysis or sharing with stakeholders.
Column Alignment: The selected dataset must have the same columns as the ones the model was trained on. Mismatched columns may lead to incorrect predictions.
Real-time Application: Graphite Note's Predict Dataset feature provides real-time application of models to datasets, enabling quick insights.
Customizable Analysis: Tailor the analysis of predictions within Graphite Note to suit your specific needs and preferences.
The "Predict Dataset" feature in Graphite Note streamlines the process of applying trained models to specific datasets. By ensuring alignment between the model and dataset structure, users can generate accurate predictions that drive actionable insights.
The main idea behind Graphite Notebook is to do your own data storytelling; create various visualizations with detailed descriptions, plot model results for better understanding, etc.
To create your notebook:
Go to Create New on Notebooks, or New Notebook when you are in the notebook list
Our intelligent system observes customers' shopping behavior without getting into the nitty-gritty technical details. It watches how recently each customer made a purchase, how often they come back, and how much they spend. The system notices patterns and groups customers accordingly.
This smart system doesn't need you to say, "Anyone who spends over $1000 is a champion." It figures out on its own who the champions are by comparing all the customers to one another.
When we talk about 'champion' customers in the context of RFM analysis, we're referring to those who are the most engaged, recent, and valuable. The system's approach to finding these champions is quite intuitive yet sophisticated.
Here's how it operates:
Observation: Just like a keen observer at a social event, the system starts by watching—collecting data on when each customer last made a purchase (Recency), how often they've made purchases over a certain period (Frequency), and how much they've spent in total (Monetary).
Comparison: Next, the system compares each customer to every other customer. It looks for natural groupings—clusters of customers who exhibit similar purchasing patterns. For example, it might notice a group of customers who shop frequently, no matter the amount they spend, and another group that makes less frequent but more high-value purchases.
Group Formation: Without being told what criteria to use, the system uses the data to form groups. Customers with the most recent purchases, highest frequency, and highest monetary value start to emerge as one group—these are your potential 'champions.' The system does this by measuring the 'distance' between customers in terms of RFM factors, grouping those who are closest together in their purchasing behavior.
Adjustment: The system then iterates, refining the groups by moving customers until the groups are as distinct and cohesive as possible. It's a process of adjustment and readjustment, seeking out the pattern that best fits the natural divisions in the data.
Finalization: Once the system settles on the best grouping, it has effectively ranked customers, identifying those who are the most valuable across all three RFM dimensions. These are your 'champions,' but the system also recognizes other groups, like new customers who've made a big initial purchase or long-time customers who buy less frequently but consistently.
By using this method, the system takes on the complex task of understanding the many ways customers can be valuable to a business. It provides a nuanced view that goes beyond simple categorizations, recognizing the diversity of customer value. The result is a highly tailored strategy for customer engagement that aligns perfectly with the actual behaviors observed, allowing businesses to interact more effectively with each segment, especially the 'champions' who drive a significant portion of revenue.
Here’s why this machine learning approach is more powerful than manual labeling:
Adaptive Learning: The system continuously learns and adapts based on actual behavior, not on pre-set rules that might miss the nuances of how customers are interacting right now.
Time Efficiency: It saves you a mountain of time. No more going through lists of transactions manually to score each customer. The system does it instantly.
Personalized Grouping: Because it’s based on actual behavior, the system creates groups that are tailor-made for your specific customer base and business model, rather than relying on broad, one-size-fits-all categories.
Scalability: Whether you have a hundred customers or a million, this smart system can handle the job. Manual scoring becomes impractical as your customer base grows.
Unbiased Decisions: The system is objective, based purely on data. There’s no risk of human bias that might categorize customers based on assumptions or incomplete information.
In essence, this smart approach to customer grouping helps businesses focus their energy where it counts, creating a personalized experience for each customer, just like a thoughtful host at a party who knows exactly who likes what. It’s about making everyone feel special without having to ask them a single question.
In the RFM model in Graphite Note, the intelligent system categorizes customers into segments based on their Recency (R), Frequency (F), and Monetary (M) values, assigning scores from 0 to 4 for each of these three dimensions. With five scoring options for each RFM category (including the '0' score), this creates a comprehensive grid of potential combinations—resulting in a total of 125 unique segments (5 options for R x 5 options for F x 5 options for M = 125 segments).
This segmentation allows for a high degree of specificity. Each customer falls into a segment that accurately reflects their interaction with the business. For example, a customer who recently made a purchase (high Recency), buys often (high Frequency), and spends a lot (high Monetary) could fall into a segment scored as 4-4-4. This would indicate a highly valuable 'champion' customer.
On the other hand, a customer who made a purchase a long time ago (low Recency), buys infrequently (low Frequency), but when they do buy, they spend a significant amount (high Monetary), might be scored as 0-0-4, placing them in a different segment that suggests a different engagement strategy.
By scoring customers on a scale from 0 to 4 across all three dimensions, the business can pinpoint exact customer profiles. This precision allows for highly tailored marketing strategies. For example, those in the highest scoring segments might receive exclusive offers as a reward for their loyalty, while those in segments with room for growth might be targeted with re-engagement campaigns.
The use of 125 segments ensures that the business can differentiate not just between generally good and poor customers, but between various shades of customer behavior, tailoring approaches to nurture the potential value of each unique segment. This granularity facilitates nuanced understanding and actionability for marketing, sales, and customer relationship management.
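To make the scoring concrete, here is a simplified, quintile-based sketch of 0-4 RFM scoring. Note that Graphite Note itself derives the groups by clustering, as described above, and the file and column names here are hypothetical:

```python
# Simplified quintile-based RFM scoring sketch (hypothetical "orders.csv").
import pandas as pd

orders = pd.read_csv("orders.csv", parse_dates=["OrderDate"])
now = orders["OrderDate"].max()

rfm = orders.groupby("CustomerID").agg(
    recency=("OrderDate", lambda d: (now - d.max()).days),
    frequency=("OrderDate", "count"),
    monetary=("Amount", "sum"),
)

# Quintile scores 0-4; recency is inverted, since more recent is better.
rfm["R"] = pd.qcut(rfm["recency"].rank(method="first"), 5, labels=[4, 3, 2, 1, 0])
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[0, 1, 2, 3, 4])
rfm["M"] = pd.qcut(rfm["monetary"].rank(method="first"), 5, labels=[0, 1, 2, 3, 4])

# Concatenating the three digits yields one of the 125 possible segments.
rfm["RFM_score"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
print(rfm.head())
```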
Wouldn't it be great to tailor your marketing strategy around identified groups of customers? That way, you can target each group with personalized offers, increase profit, improve unit economics, etc.
Recency - how long it’s been since a customer bought something from you or visited your website
Frequency - how often a customer buys from you, or how often they visit your website
Monetary - the average spend of a customer per visit, or the overall transaction value in a given period
Let's go through the RFM analysis inside Graphite Note. The dataset on which you will run your RFM Model must contain a time-related column, given that this report studies customer behavior over some time.
We need to distinguish all customers, so we need an identifier variable like Customer ID.
If you have data about Customer Names, great; if not, don't worry, just select the same column as in the Customer ID field.
Finally, we need to choose the numeric variable with regard to which we will observe customer behavior, called Monetary (amount spent).
That's it, you are ready to run your first RFM Model.
On the RFM Scores Tab, we have an overview of the customers and their scores:
Then you have a ranking of each RFM segment (125 of them) represented in a table.
And finally, a chart showing the number of customers per RFM score.
lost customer
hibernating customer
can-not-lose customer
at-risk customer
about-to-sleep customer
need-attention customer
promising customer
new customer
potential loyal customer
loyal customer
champion customer.
All information related to these groups of customers, such as the number of customers, average monetary, average frequency, and average recency per group, can be found in the RFM Analysis Tab.
There is also a table at the end to summarize everything.
According to the Recency factor, which is defined as the number of days since the last purchase, we divide customers into 5 groups:
lost
lapsing
average activity
active
very active.
In the Recency Tab, we observe the behavior of the above groups, such as the number of customers, average monetary, average frequency, and average recency per group.
As Frequency is defined as the total number of purchases, customers can buy:
very rarely
rarely
regularly
frequently
very frequently.
Monetary is defined as the amount of money the customer spent, so the customer can be a:
very low spender
low spender
medium spender
high spender
very high spender.
All the values related to the first five tabs, with much more, can be found on the Details Tab, in the form of a table.
The RFM model columns outlined in your system provide a structured way to understand and leverage customer purchase behavior. Here’s how each column benefits the end user of the model:
Monetary: Indicates the total revenue a customer has generated. This helps prioritize customers who have contributed most to your revenue.
Avg_monetary: Shows the average spend per transaction. This can be used to gauge the spending level of different customer segments and tailor offers to match their spending habits.
Frequency: Reflects how often a customer purchases. This can inform retention strategies and indicate who might be receptive to more frequent communication.
Recency: Measures the time since the last purchase. This can help target re-engagement campaigns at customers who have not interacted with your business recently.
Date_of_last_purchase & Date_of_first_purchase: These dates help track the customer lifecycle and can trigger communications at critical milestones.
Customer_age_days: The duration of the customer relationship. Long-standing customers might benefit from loyalty programs, while newer customers might be encouraged with welcome offers.
Recency_cluster, Frequency_cluster, and Monetary_cluster: These categorizations allow for segmentation at a granular level, helping customize strategies for groups of customers who share similar characteristics.
Rfm_cluster: This overall grouping combines recency, frequency, and monetary values, offering a holistic view of a customer's value and engagement, essential for creating differentiated customer journeys.
Recency_segment_name, Frequency_segment_name, and Monetary_segment_name: These descriptive labels provide intuitive insights into customer behavior and make it easier to understand the significance of each cluster for strategic planning.
Fm_cluster_sum: This score is a combined metric of frequency and monetary clusters, useful in prioritizing customers who are both frequent shoppers and high spenders.
Fm_segment_name and Rfm_segment_name: These labels offer a quick reference to the type of customer segment, simplifying the task of identifying and applying targeted marketing actions.
Seeking assurance about the model's accuracy and effectiveness? Here's how you can address these concerns:
Validation with Historical Data: Show how the model’s predictions align with actual customer behaviors observed historically. For instance, demonstrate how high RFM scores correlate with customers who have proven to be valuable.
Segmentation Analysis: Analyze the characteristics of customers within each RFM segment to validate that they make sense. For example, your top-tier RFM segment should clearly consist of customers who are recent, frequent, and high-spending.
Control Groups: Create control groups to test marketing strategies on different RFM segments and compare the outcomes. This can validate the effectiveness of segment-specific strategies suggested by the model.
A/B Testing: Implement A/B testing where different marketing approaches are applied to similar customer segments to see which performs better, thereby showcasing the model's utility in identifying the right targets for different strategies.
Benchmarking: Compare the RFM model’s performance against other segmentation models or against industry benchmarks to establish its effectiveness.
In this report, we want to divide customers into returning and new customers (this is the most fundamental type of customer segmentation). New customers have made only one purchase from your business, while returning ones have made more than one.
Let’s go through their basic characteristics.
New customers are:
forming the foundation of your customer base
telling you whether your marketing campaigns are working (and informing what to improve in current offerings or add to your repertoire of products or services)
while returning customers are:
giving you feedback on your business (if you have a high number of returning customers it suggests that customers are finding value in your products or service)
saving you a lot of time, effort, and money.
Let's go through the New vs returning customer analysis inside Graphite. The dataset on which you will run your model must contain a time-related column.
Since the dataset contains data for a certain period, it's important to choose the aggregation level.
For example, if weekly aggregation is selected, Graphite will generate a new vs returning customers dataset with a weekly frequency.
Your dataset must also contain a Customer ID column.
Additionally, if you want, you can choose the Monetary (amount spent) variable.
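As a rough illustration of the underlying computation (not Graphite Note's internal code; the file and column names are hypothetical), the split can be derived from each customer's first purchase date:

```python
# Sketch of a monthly new-vs-returning split (hypothetical "orders.csv").
import pandas as pd

df = pd.read_csv("orders.csv", parse_dates=["OrderDate"])
df["Month"] = df["OrderDate"].dt.to_period("M")
first_month = df.groupby("CustomerID")["OrderDate"].transform("min").dt.to_period("M")

# A customer is "new" in the period of their first purchase, "returning" after that.
df["Type"] = (df["Month"] == first_month).map({True: "New", False: "Returning"})

summary = (df.groupby(["Month", "Type"])["CustomerID"]
             .nunique().unstack(fill_value=0))
print(summary)
```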
With Graphite, compare absolute figures and percentages, and learn how many customers you are currently retaining on a daily, weekly, or monthly basis.
Depending on the aggregation level, you can see the number of new and returning customers detected in each period on the New vs Returning Tab.
For example, in December 2020, there were a total of 2.88k customers, of which 1.84K were new and 1.05K returning. You can also choose a daily representation that is more precise.
If you are interested in retention, the percentage of your returning customers, through a period, use the Retention % Tab.
Last but not least, on the Details Tab, you can find a detailed table where you can see all relevant values which were used for the above results.
Getting started guide
This section offers a comprehensive overview of testing and command usage, including detailed instructions. If you're new to the API, we recommend starting with the quick start guide. It's a straightforward way to validate your setup and ensure you begin on the right track. We value your feedback and strive to provide the best experience possible. If you encounter any challenges with commands that should be included in the API or its documentation, our dedicated support team is ready to assist you. Feel free to reach out via our in-app chat or by email, and we'll be more than happy to guide you in the right direction or incorporate any necessary updates.
Start your interaction with Graphite API
To interact with the Graphite Note API and perform predictions using a specific model, you need to make a POST request to the API endpoint. The following example demonstrates how to make such a request using cURL - a command-line tool and library that allows you to transfer data using various protocols, including HTTP and HTTPS.
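As a hedged sketch of the same request in Python's requests library (the endpoint path is an assumption, and the model code, token, and payload values are placeholders to substitute with your own):

```python
# Hypothetical sketch of the prediction request described above.
import requests

MODEL_CODE = "your-model-code"   # from the ID section of the model's settings tab
TOKEN = "your-api-token"         # from the account info page

response = requests.post(
    f"https://app.graphite-note.com/api/model/{MODEL_CODE}/predict",  # assumed URL
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    json={"data": {"predict_values": [[
        {"alias": "Lead Source", "selectedValue": "bing"},      # illustrative values
        {"alias": "Lead Origin", "selectedValue": "API"},
    ]]}},
)
print(response.status_code, response.json())
```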
Companies often spend a lot of time managing items/entities that contribute little to the profit margin. Not every item/entity in your shop has equal value - some cost more, some are used more frequently, and some are both. This is where ABC analysis steps in, helping companies focus on the right items/entities.
ABC analysis is a classification method in which items/entities are divided into three categories, A, B, and C.
Category A is typically the smallest category and consists of the most important items/entities ('the vital few'),
while category C is the largest category and consists of least valuable items/entities ('the trivial many').
So far, this is the simplest model: only 2 columns are needed in your dataset.
You have to identify:
the ID column
the numeric column
An ID column in your dataset is usually a Product ID or name, SKU, etc. Based on the selected values, the data will be grouped by that column.
After that, you have to select the numeric column (feature) which represents the value of the ID column (for example, product/customer revenue or the number of sold units,...).
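As a rough sketch of the computation (the 70%/90% cut-offs are illustrative, not Graphite Note's documented thresholds, and the file and column names are hypothetical):

```python
# ABC split by cumulative value share (illustrative thresholds).
import pandas as pd

df = pd.read_csv("products.csv")                 # columns: ProductID, Revenue
df = df.sort_values("Revenue", ascending=False)
share = df["Revenue"].cumsum() / df["Revenue"].sum()

# Items covering the first ~70% of value -> A, next ~20% -> B, rest -> C.
df["Category"] = pd.cut(share, bins=[0, 0.7, 0.9, 1.0], labels=["A", "B", "C"])
print(df.groupby("Category")["Revenue"].agg(["count", "sum"]))
```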
If we take a look at the ABC Summary Tab, we can see two pie charts - on the first one we can see the percentage of items in each category, while on the other one, we can see the total value (revenue) of each category.
In the picture above, we can see that 31.99% of the items belong to category A and they represent 68.88% of the total value, meaning the biggest profit comes from the items in category A!
Finally, you have all the information on each entity in a table on the Details Tab.
There is a long list of benefits from including ABC analysis in your business, such as improved inventory optimization and forecasting, reduced storage expenses, strategic pricing of the products, etc. With Graphite, all you have to do is upload your data, create the desired model, and explore the results.
JSON response structures for various models
Binary classification, Regression, Multiclass classification
The following models have similar JSON structure: Binary classification, Logistic regression, Multiclass classification
The JSON structure consists of a root object with a key-value pair, where the key is "data" and the value is an object containing two keys: "columns" and "data".
"data"
: This key maps to an array of data objects. Each data object within the array represents a specific entry or prediction result. In this example, there are two data objects.
Each data object contains key-value pairs representing the column names and their corresponding values. For example, the first data object has the values "NO", "API", "bing", 0.935, 0.065, and "1" for the keys "Label", "Lead Origin", "Lead Source", "Score_NO", "Score_YES", and "Total Time Spent on Website", respectively.
The second data object follows a similar pattern, with different values for each key.
"columns"
: This key maps to an array of column names. In this example, the array contains three column names: "Total Time Spent on Website", "Lead Origin", and "Lead Source". These column names define the fields or attributes associated with each data entry.
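Assembled from that description, the response looks roughly like this (only the first data object is shown; the second follows the same pattern with different values):

```json
{
  "data": {
    "columns": ["Total Time Spent on Website", "Lead Origin", "Lead Source"],
    "data": [
      {
        "Label": "NO",
        "Lead Origin": "API",
        "Lead Source": "bing",
        "Score_NO": 0.935,
        "Score_YES": 0.065,
        "Total Time Spent on Website": "1"
      }
    ]
  }
}
```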
Timeseries model
The JSON structure consists of a root object with a key-value pair, where the key is "data" and the value is an array. The array contains three elements representing different pieces of information. The last element in the data array (sequenceID) is directly related to the "sequenceID" sent in the request, and the number of other elements depends on the date range sent in the request.
"date"
and "predicted"
: The first two elements within the "data" array represent specific dates and their corresponding predicted values. Each element is an object with two key-value pairs: "date" and "predicted". The "date" represents a specific date and time in ISO 8601 format, and the "predicted" holds the corresponding predicted value for that date. In the given example, the predicted values for the dates "2023-04-17T00:00:00.000Z" and "2023-04-18T00:00:00.000Z" are 40.4385672276 and 41.1831442568, respectively.
"sequenceID"
: The third element within the "data" array represents the sequence ID. It is an object with a single key-value pair: "sequenceID" and its corresponding value. In this example, the sequence ID is represented as "A". If your dataset includes multiple time series sequences, you should choose a field that uniquely identifies each sequence (e.g., product ID, store ID, etc.). This will allow Graphite Note to generate independent forecasts for each individual time series. We don't allow fields with too many sequences (unique values) here.
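Putting the pieces from the description together, the response looks roughly like this:

```json
{
  "data": [
    { "date": "2023-04-17T00:00:00.000Z", "predicted": 40.4385672276 },
    { "date": "2023-04-18T00:00:00.000Z", "predicted": 41.1831442568 },
    { "sequenceID": "A" }
  ]
}
```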
This JSON structure is used to convey the predicted values for different dates in a Timeseries model. Each date is associated with its predicted value, and the sequence ID provides additional context or identification for the timeseries data.
Now we move to the tricky part: data preprocessing! We will rarely come across high-quality data - for the model to give the best possible results, we must do some data cleaning and transformation. What to do with the missing values? You can either remove them or replace them with a corresponding value, such as the mean value or a prediction. For example, suppose you have chosen Age and Height as numeric columns. The values of the variable Age range between 10 and 80, while Height is between 100 and 210. The algorithm can give more importance to the Height variable because it has higher values than Age - in case you decide to transform/scale your data, you can either standardize or normalize it.
Let's see how to interpret the results after we have run our model. The results consist of 5 tabs: Cluster Summary, By Cluster, By Numeric Value, Cluster Visualization, and Details Tabs.
Now we will go through the Model Results, which consist of 3 tabs: Cohorts, Repeat by, and Details.
We are going to see different metrics such as the No of Customers, the Percentage, the Amount, and the Amount (Cumulative), but there are 3 more metrics: the Average Order Value, the Cumulative Average Order Value, and the Average Revenue per Customer.
Name your notebook. Additionally, you can add a description to your notebook. Also, you can select an existing tag (to connect your notebook with datasets and models) or create a new one.
You can now easily add and delete different text and visualization blocks. If you are not satisfied with a block's position, you can easily move it. To speed things up, you can even clone each block. Your first Notebook is created and ready for exploring.
The RFM model identifies customers based on three key factors:
As we now know how to run the RFM model in Graphite Note, let's go through the Model Results. The results consist of 7 tabs: RFM Scores, RFM Analysis, Recency, Frequency, Monetary, RFM Matrix, and Details. All results are visualized, because a visual summary of information makes it easier to identify patterns than looking through thousands of rows.
RFM model analysis ranks every customer in each of these three categories on a scale of 0 (worst) to 4 (best). After that, we assign an RFM score to each customer by concatenating their numbers for Recency, Frequency, and Monetary value. Depending upon their RFM score, customers can be segregated into the following categories:
In the Frequency Tab, you can track the same behavior of the related groups as in the Recency Tab.
In the Monetary Tab, you can track the same behavior of the related groups as in the Recency Tab.
The RFM Matrix Tab represents a matrix showing the number of customers, monetary sum and average, average frequency, and average recency (with a breakdown by Recency, Frequency, and Monetary segments).
The model results consist of 4 tabs: New vs Returning, Revenue New vs Returning, Retention %, and Details.
The results in the Revenue New vs Returning Tab depend on the Model Scenario: if you have selected a monetary variable in the Model Scenario, you can observe its behavior for new and returning customers.
Since ABC inventory analysis divides items into 3 categories, let's analyze these categories by checking the Model Results. The results consist of 3 tabs: ABC Summary, Pareto Chart, and Details Tabs.
ABC analysis, also called Pareto analysis, is based on the Pareto principle, which says that 80% of the results (output) come from 20% of the efforts (input). The Pareto Chart is a combination of a bar and a line graph: each bar represents an item/entity in descending order, the height of the bar represents the value of the item/entity, and the curved orange line represents the cumulative percentage of the items/entities.
Graphite enforces rate limits for API requests to ensure fair usage and prevent abuse. The system utilizes two levels of rate limiting: global and tenant-specific.
The system monitors overall API traffic to count the number of requests made within the last minute. If this count exceeds the configured global rate limit, further API requests are denied.
Additionally, the system tracks API usage on a per-tenant basis. If the count of API requests made by the current tenant within the last minute surpasses the specified rate limit, further API requests are denied.
The rate limits are configured as follows:
Note: Rate Limit Exceeded
When the API rate limit is reached, the system will deny further requests, and the API response should include the HTTP status code 429 (Too Many Requests). This status code indicates that the client has made too many requests within a specified time frame. It is essential for clients to handle this response code gracefully by adjusting their request frequency or implementing backoff strategies.
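A minimal sketch of such a backoff strategy, assuming Python's requests library (the function name and retry parameters are illustrative, not part of the Graphite Note API):

```python
# Exponential backoff for HTTP 429 responses (illustrative helper).
import time
import requests

def post_with_backoff(url, max_retries=5, **kwargs):
    delay = 1
    response = None
    for _ in range(max_retries):
        response = requests.post(url, **kwargs)
        if response.status_code != 429:   # not rate-limited: return immediately
            return response
        time.sleep(delay)                 # wait before retrying
        delay *= 2                        # double the wait on each attempt
    return response                       # still rate-limited after all retries
```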
JSON request structures for various models
Binary classification, Regression, Multiclass classification
The following models have similar JSON structure: Binary classification, Logistic regression, Multiclass classification
The JSON structure represents a data object with a key "data" mapping to an array called "predict_values". The "predict_values" array contains multiple elements, each representing a set of data. Each set of data is represented as an array of objects. Each object within the array represents a key-value pair, where the required "alias" is the key and the required "selectedValue" is the corresponding value. The keys "Lead Source", "Lead Origin", and "Converted" are common in each object, but their values differ, representing different attributes or properties of the data.
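As a sketch assembled from that description (the selected values are illustrative), a request body might look like this:

```json
{
  "data": {
    "predict_values": [
      [
        { "alias": "Lead Source", "selectedValue": "bing" },
        { "alias": "Lead Origin", "selectedValue": "API" },
        { "alias": "Converted", "selectedValue": "NO" }
      ]
    ]
  }
}
```

Timeseries model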
JSON structure represents a data object with a key "data" mapping to an object called "predict_values". The "predict_values" object contains three key-value pairs. The required keys are "startDate", "endDate", and "sequenceID", and their corresponding values are "2023-04-17", "2023-04-18", and "A" respectively.
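Using the example values above, that request body looks like this:

```json
{
  "data": {
    "predict_values": {
      "startDate": "2023-04-17",
      "endDate": "2023-04-18",
      "sequenceID": "A"
    }
  }
}
```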
Receiving a response
Upon successful execution of the API request, you will receive a response containing the prediction results or any relevant information based on the model and data provided. The format and structure of the response will vary depending on the specific model and endpoint used.
This is a Timeseries period prediction example:
The response contains a single key-value pair:
"data"
: The key "data"
maps to an array of objects that represents the prediction results or relevant information.
Make sure to handle the response appropriately in your code to process the prediction results or handle any potential errors returned by the API. More about the structure in the next section.
Necessary information to make a request to an API endpoint for the Graphite Note application
The base URL for the API endpoint is:
Replace [model-code] in the URL with the code of the specific model you want to use for predictions.
The easiest way to get the [model-code] is to edit the particular model on the model list page. The model code is located in the ID section of the model's Settings tab.
The request requires the following headers to be included:
Authorization: This header should be set to "Bearer [token]". Replace [token] with your unique token. The token can be found by accessing the account info page in the Graphite Note app, under the section displaying your current plan information.
Content-Type: This header should be set to "application/json" to indicate that the request payload is in JSON format.
We have prepared 12 distinct demo datasets that you can upload and use in your Graphite Note account. These demo datasets include dummy data from a variety of business scenarios and serve as an excellent starting point for building and running your initial Graphite Note machine learning models. Instructions on how to proceed with the demo datasets are provided on the following pages.
Create a Regression model on Demo Housing Prices dataset
1. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".
2. Select the dataset you want to use to create a machine learning model. In this case, we will select the Housing-Prices dataset to create a Regression Analysis on house price historical data.
3. Once selected, the demo dataset will load directly to your account. The Dataset view will automatically open.
4. Adjust your dataset options on the Settings tab. Click Columns tab to view list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.
5. To create a new model in the Graphite Note main menu click on "Models"
6. You will get a list of available models. Click on "New Model" to create a new one.
7. Select the model type from our templates. In our case, we will select "Regression" by double clicking on its name.
8. Select the dataset you want to use to produce the model. We will use "Demo-Housing-Prices.csv"
9. Name your new model. We will call it "Regression on Demo-Housing-Prices"
10. Write the description of the model and select a tag. If you want to, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.
11. Click "Create" to create your demo model environment.
12. To set up a Regression Model, firstly, you need to define the "Target Feature". That is a numeric column from your dataset that you'd like to make predictions about. In the case of Regression on Demo Housing Prices, the dataset target feature is "Price" column.
13. Click "Next" to get the list of model features that will be included into scenario. Model relies upon each column (feature) to make accurate predictions. When training model we will calculate which of the features are most important and behave as Key Drivers.
14. To start training the model, click "Run scenario". This will take a sample of 80% of your data and train several machine learning models.
15. Wait a few moments and voilà! Your Regression model is trained. Click on the "Performance" tab to get model insights and view the Key Drivers.
16. Explore the Regression Model by clicking on Impact Analysis, Model Fit, and Training Results to get more insight into how the model is trained and set up.
17. If you want to take your model into action, click on the "Predict" tab in the main model menu.
18. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset with data the model will use to make predictions on the target column. In our case, that is "Price". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.
19. Use your model often to predict future behaviour, and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!
Create General Segmentation on Demo Mall Customers dataset
1. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".
2. Select the dataset you want to use to create your machine learning model. In this case, we will select the "Mall Customers" dataset to create a General Segmentation analysis on customer engagement data.
3. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.
4. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns, with their corresponding data types. Explore the dataset details on Summary tab.
5. To create a new model in the Graphite Note main menu, click on "Models"
6. You will get a list of available models. Click on "New Model" to create a new one.
7. Select the model type from our templates. In our case, we will select "General Segmentation" by double clicking on its name.
8. Select the dataset you want to use to produce the model. We will use "Demo-Mall-Customers.csv"
9. Name your new model. We will call it "General Segmentation on Demo-Mall-Customers".
10. Write a description of the model and select a tag. If you want to, you can also create new tag from the pop-up "Tags" window that will appear on the screen.
11. Click "Create" to create your demo model environment.
12. To set up your General Segmentation model, you first need to define the "Feature" columns. These are the numeric column (or columns) from your dataset on which segmentation will be based. In the case of General Segmentation on the Mall Customers dataset, the numeric features will be the "Age", "Annual Income", and "Spending Score" columns.
13. To start training the model click "Run Scenario". This will train your model based on the uploaded dataset.
14. Wait a few moments and voilà! Your General Segmentation model is trained. Click on the "Results" tab to get model insights and explore segmentation clusters.
15. Navigate over different tabs to get insights from high level "Cluster Summary" to "By Cluster" charts and tables.
16. Use "Cluster Visualisations" to view the scatter plot visualisations of cluster members.
Predictive Ads Performance is a process where businesses forecast the effectiveness of their advertising campaigns, particularly focusing on metrics like clicks, conversions, or engagement. This task typically involves regression or classification models, depending on the specific goals of the prediction.
Dataset Essentials for Predictive Ads Performance
A comprehensive dataset for Predictive Ads Performance focusing on predicting clicks should include:
Date/Time: The timestamp for when the ad was run.
Ad Characteristics: Details about the ad, such as format, content, placement, and duration.
Target Audience: Information about the audience targeted by the ad, like demographics, interests, or behaviors.
Spending: The amount spent on each ad campaign.
External Factors: Any external factors that might influence ad performance, such as market trends or seasonal events.
Historical Performance Data: Past performance metrics of similar ads.
An example dataset for Predictive Ads Performance with the target column being clicks might look like this:
| Date | Ad ID | Ad Format | Target Audience | Spending | Market Trend | Seasonal Event | Clicks |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2021-01-01 | A101 | Video | 18-25 | $500 | Stable | New Year | 300 |
| 2021-01-08 | A102 | Image | 26-35 | $750 | Growing | None | 450 |
| 2021-01-15 | A103 | Banner | 36-45 | $600 | Declining | None | 350 |
| 2021-01-22 | A104 | Video | 46-55 | $800 | Stable | None | 500 |
| 2021-01-29 | A105 | Image | 18-25 | $700 | Growing | None | 600 |
Target Column: The Clicks column is the primary focus, as the model aims to forecast the number of clicks each ad will receive.
Steps to Success with Graphite Note
Data Collection: Compile detailed data on past ad campaigns, including spending, audience, and performance metrics.
Feature Engineering: Identify and create features that are most indicative of ad performance.
Model Training: Use Graphite Note's Regression Model to train a model that can predict the number of clicks based on the ad characteristics and other factors (a rough sketch of this task appears after this list).
Model Evaluation: Test the model's accuracy and refine it for better performance.
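For readers who want to see the shape of the task outside the platform, here is a hedged sketch; the model choice, file, and column names are illustrative, not a description of Graphite Note's internals:

```python
# Minimal click-prediction sketch (hypothetical "ads_performance.csv").
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("ads_performance.csv")
X = pd.get_dummies(df.drop(columns=["Clicks"]))   # one-hot encode categoricals
y = df["Clicks"]                                  # the target column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("R^2 on the held-out 20%:", model.score(X_test, y_test))
```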
Benefits of Predictive Ads Performance
Optimized Ad Spending: Predict which ads are likely to perform best and allocate budget accordingly.
Targeted Campaigns: Tailor ads to the audience segments most likely to engage.
Performance Insights: Gain insights into what makes an ad successful and apply these learnings to future campaigns.
Accessible Analytics: Graphite Note's no-code platform makes predictive analytics accessible, enabling businesses to leverage AI for ad performance prediction without needing deep technical expertise.
In summary, Predictive Ads Performance is a valuable tool for businesses looking to maximize the impact of their advertising efforts. With Graphite Note, this advanced capability becomes accessible, allowing for data-driven decisions in ad campaign management.
The Predict Cross Selling problem is a common challenge faced by businesses looking to maximize their sales opportunities by identifying additional products or services that a customer is likely to purchase. This predictive model falls under the multi-class classification category, where the objective is to predict the likelihood of a customer buying various products, based on their past purchasing behavior and other relevant data.
Dataset Essentials for Cross Selling
To effectively train a machine learning model for cross selling, you need a well-structured dataset that includes:
Customer Demographics: Information like age, gender, and income, which can influence purchasing decisions.
Purchase History: Detailed records of past purchases, indicating which products a customer has bought.
Engagement Metrics: Data on customer interactions with marketing campaigns, website visits, and other engagement indicators.
Product Details: Information about the products, such as category, price, and any special features.
A typical dataset might look like this:
| Customer ID | Age | Gender | Income | Purchased Product1 | Purchased Product2 | Purchased Product3 | ... | Target_Product |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1001 | 28 | M | 50000 | Yes | No | Yes | ... | Product2 |
| 1002 | 34 | F | 65000 | No | Yes | No | ... | Product3 |
| 1003 | 45 | M | 80000 | Yes | Yes | Yes | ... | Product4 |
| 1004 | 30 | F | 54000 | No | No | Yes | ... | Product1 |
| 1005 | 50 | M | 62000 | Yes | No | No | ... | Product2 |
Target Column: The Target_Product column is crucial as it represents the product that the model will predict the customer is most likely to purchase next.
Steps to Success with Graphite Note
Data Collection: Gather comprehensive, clean, and well-structured data.
Feature Selection: Identify the most relevant features that could influence the model's predictions.
Model Training: Utilize Graphite Note's intuitive platform to train your multi-class classification model (a rough sketch of the task appears after this list).
Evaluation and Iteration: Continuously assess and refine the model for better accuracy and relevance.
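As a hedged sketch of the multi-class setup outside the platform (file and feature names are illustrative; the target is the Target_Product column from the table above):

```python
# Minimal multi-class cross-sell sketch (hypothetical "cross_sell.csv").
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("cross_sell.csv")
X = pd.get_dummies(df.drop(columns=["Target_Product"]))
y = df["Target_Product"]                    # one class per candidate product

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = RandomForestClassifier().fit(X_train, y_train)
print(clf.predict(X_test[:3]))              # most likely next product for 3 customers
```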
The Advantage of Predict Cross Selling
Enhanced Customer Experience: By understanding customer preferences, businesses can offer more personalized recommendations.
Increased Sales Opportunities: Identifying potential cross-sell products can significantly boost sales.
Data-Driven Decision Making: Removes guesswork from marketing and sales strategies, relying on data-driven insights.
Accessibility: With Graphite Note, even non-technical users can build and deploy these models, making advanced analytics accessible to all.
In conclusion, the Predict Cross Selling model is a powerful tool in the arsenal of any business looking to enhance its sales strategy. With Graphite Note, this complex task becomes manageable, allowing businesses to leverage their data for maximum impact.
There are plenty of free sources to find free datasets for machine learning.
Here is a list of some of the most popular ones.
For each dataset, it is necessary to determine its quality. Several characteristics describe high-quality data, but it is essential to point out accuracy, reliability, and completeness. High-quality data should be precise and error-free; otherwise, your data is misleading and inefficient. If your data is not complete, it is harder to use because of the lack of information. And what if your data is ambiguous or vague? Then you cannot trust it; it's unreliable.
By googling terms like free datasets for machine learning, time-series dataset, classification dataset, etc., you will see many links to different sources. But which of them include high-quality data? We will list a few sources, but it is essential to know that some of them include data with drawbacks. Therefore, you have to be familiar with the characteristics of a good dataset.
Kaggle is a big data science competition platform for predictive modeling and analytics. There are plenty of datasets you can use to learn artificial intelligence and machine learning. Most of the data is accurate and referenced, so you can test or improve your skills or even work on projects that could help people.
Each dataset has its usability score and description. Within the dataset, there are various tabs such as Tasks, Code, Discussions, etc. Most datasets are related to different projects, so you can find other trained and tested models on the same datasets. On Kaggle, you can find a big community of data analysts, data scientists, and machine learning engineers who can evaluate your work and give you valuable tips for further development.
The UCI Machine Learning Repository is a database of high-quality and real-world datasets for machine learning algorithms. Datasets are well known in terms of exciting properties and expected good results; they can be an example of valuable baselines for comparisons. On the other hand, the datasets are small and already pre-processed.
GitHub is one of the world's largest communities of developers. The primary purpose of GitHub is to be a code repository service. In most cases, a project includes the datasets its code is applied to; you will need to spend a little more time to find the dataset you want, but it will be worth it.
data.world is a large data community where people discover data and share analysis. Inside almost every project, there are some available datasets. When searching, you must be very precise to get the desired results.
Of course, there are many more sources, depending on your need. For example, if you need economic and financial datasets, you can visit World Bank Open Data, Yahoo Finance, EU Open Data Portal, etc.
Once you have found your dataset, it’s Graphite time; run several models and create various reports using visualizations and tables. With Graphite, it's easier to make business decisions. Maybe you are just a few clicks away from the turning point of your career.
This section of the Graphite Note user documentation will guide you through the process of merging multiple datasets into one.
Merging datasets allows you to combine data from different sources or related data for more comprehensive analysis.
To begin the process, navigate to the main menu and select the "Merge Dataset" option. This will open a new window where you can start the merging process.
In the new window, you will see fields to enter the name and description of your new merged dataset. This helps you identify the purpose of the merged dataset for future reference. You can also add optional tags to further categorize your dataset.
Next, you will select the first dataset you want to merge from the dropdown menu. Repeat this step to select the second dataset.
After selecting your datasets, choose the type of join you want to perform: inner, left, right, or outer. The type of join determines how the datasets are combined based on the values in the key columns.
Then, select the key columns on which to merge the datasets. These are the columns that the datasets have in common and will be used to align the data.
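For intuition about how the four join types behave, here is a small pandas sketch as a stand-in for the merge Graphite Note performs (toy data, hypothetical column names):

```python
# How inner/left/right/outer joins differ on a shared key column.
import pandas as pd

customers = pd.DataFrame({"CustomerID": [1, 2, 3], "Name": ["Ana", "Ben", "Cara"]})
orders    = pd.DataFrame({"CustomerID": [2, 3, 4], "Amount": [100, 250, 80]})

# inner keeps only matching keys; left/right keep all rows from one side;
# outer keeps every row from both sides, filling gaps with NaN.
for how in ["inner", "left", "right", "outer"]:
    merged = customers.merge(orders, on="CustomerID", how=how)
    print(how, "->", len(merged), "rows")
```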
Now, you will choose which columns you want to include in your new merged dataset. You can select columns from either or both of the original datasets.
Once you've selected your columns, you can use the "Test This Merge" button to preview the merged rows. This allows you to check that the datasets are merging as expected before finalizing the process.
If you're happy with the preview of the merged dataset, click the "Create" button to finalize the merge. Your new merged dataset will now be available for use in your Graphite Note projects.
Remember, merging datasets is a powerful tool for combining and analyzing data in Graphite Note. By following these steps, you can easily merge datasets to gain new insights from your data.