# Welcome

Welcome to the Graphite Note documentation portal. These guides show you how to predict, visualize, and analyze your data using machine learning with no code.

Graphite Note is a powerful tool designed to democratize data analysis and machine learning, making them accessible to individuals and teams of all skill levels. Whether you're a marketer looking to segment your audience, a sales team predicting lead conversions, or an operations manager forecasting product demand, Graphite Note is your go-to platform.

The platform is built with a user-friendly interface that allows you to connect your data, generate predictive models, and share your results with just a few clicks. It's not just about making predictions; it's about understanding them. With Graphite Note's built-in prescriptive analytics and data storytelling feature, you can transform complex data into meaningful narratives that drive strategic actions.

This documentation will guide you through every step of the process, from setting up your account to making your first prediction. So let's dive in and start exploring the power of no-code machine learning with Graphite Note.

For definitions of any terms used here, see the [machine learning glossary](https://graphite-note.com/comprehensive-ai-and-machine-learning-glossary).

# Sign up

Welcome to the Graphite Note app! To get started, [create an account](https://app.graphite-note.com/#/signup) or [log in](https://app.graphite-note.com/#/login) if you already have one. With an active account, you can upload datasets and create machine learning models in just minutes.

Refer to our documentation to learn more about machine learning and how to apply ML models to your data to boost your business, uncover insights, and make predictions.

# Subscription Plans Graphite Note offers two options tailored to different user needs: Sandbox and Enterprise. **Sandbox** is a learning and exploration environment designed for teams who want to experiment with no-code predictive templates, understand the platform interface, and validate early ideas. It is not intended for production use, decision impact analysis, or enterprise workloads. Causal models, uplift analysis, and prescriptive activation tools are not available in Sandbox. You can get started in Sandbox directly from the Graphite Note website. **Enterprise** is a full Decision Intelligence Suite combining the platform with hands-on delivery. It includes predictive and causal modeling, prescriptive playbooks and activation, dedicated high-performance infrastructure, custom model and user configuration, onboarding and managed services, and a dedicated account and success team. Enterprise plans are tailored based on the scope of decisions, data complexity, and delivery objectives. ISO 27001 and ISO 42001 certified. Graphite Note also offers special discounts for academic institutions and startups. Many teams choose to start with a focused pilot or a single decision use case to prove impact before expanding into a broader Enterprise engagement. Sample strategy materials are available on request. To explore the platform, you can start in Sandbox or book a demo with our team to discuss an Enterprise engagement. To help you choose the best plan, you can start with a 14-day [free trial](https://app.graphite-note.com/#/signup) or [schedule a demo](https://graphite-note.com/book-demo/) with our data science experts. For the latest details on pricing and plan features, please visit our official [Graphite Note Pricing Page](https://graphite-note.com/no-code-machine-learning-pricing/). # Profile information You can access the Profile information setup by clicking on the small user profile icon located in the top-right corner of the interface. 
This icon typically displays the first letters of your name and surname (e.g., “CP” for Chris Parker). Once clicked, a dropdown menu appears where you can select Profile to view or edit your profile settings.

This page is where users can manage their personal information and customize their profile settings:

* **User Code:** A unique identifier for the user (e.g., a system-generated code like “e056ac433952”).
* **Name:** Displays the user’s name. This can be updated to reflect the preferred name.
* **AI Generated Content Language:** Lets the user select the language for AI-generated content (e.g., “English”). All AI-generated content for models in Graphite Note is generated in the selected language. After changing the language, rerun the model to regenerate its content in the newly selected language.
* **Select Avatar:** Allows users to choose a personalized avatar color to visually represent their profile.
* **Email:** Shows the registered email address of the user.
* **Role:** Indicates the user’s role in the system (e.g., “Administrator” or “Viewer”).
* **Password:** Offers an option to change the user’s password with a Change Password button.

# Account information

You can access the Account information page by clicking on the small wheel icon (⚙️) in the top-right corner of Graphite Note, and then selecting the Account info drop-down item. This page shows your account information, including your active plan, plan usage statistics, and the features included in the plan.

***

### Assistance

A toggle option to enable assistance from the Graphite Note Support Team. By enabling this feature, users grant the support team access to their datasets, models, and notebooks. This facilitates quicker issue resolution and support for content creation. The Assistance option is enabled by default.

### Current Plan

Displays the subscription plan the account is currently on. Click on the Contact Sales button if you wish to upgrade, downgrade, or discuss your subscription options.

### API Token

A secure token (partially masked on the Account info page for privacy) that users can use to integrate Graphite Note with other applications or systems. It provides a unique, secure key for API access, enabling integration with external systems in several ways:

* **Dataset API** - enables users to easily populate their datasets by sending data directly to Graphite Note, ensuring seamless data integration.
* **Prediction API** - allows users to request predictions based on attributes they provide, leveraging Graphite Note’s machine learning models to generate accurate business forecasts.
* **Model Results API** - lets users fetch the outputs of their trained models in a structured, paginated format. This is especially useful for viewing or processing large prediction result sets in batches.

Click the eye icon next to the token to view the full API token. For more information about using the API token, see the [REST API section](/graphite-note-documentation/rest-api/api-introduction).
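The Model Results API is described above as returning results in a paginated format. As an illustration only, the Python sketch below shows one way a client might walk through such pages in batches. The `fetch_page` callable, the `data` and `has_more` field names, and the page-number scheme are all assumptions made for this example, not the documented Graphite Note API; refer to the REST API section for the actual endpoints and response formats.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical page shape used in this sketch: {"data": [...rows...], "has_more": bool}
Page = Dict[str, Any]


def iter_result_rows(
    fetch_page: Callable[[int, int], Page],
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield rows one at a time by requesting consecutive pages.

    `fetch_page(page_number, page_size)` is a caller-supplied function that
    would perform the real authenticated request. Its name, arguments, and
    return shape are assumptions made for this sketch.
    """
    page_number = 1
    while True:
        page = fetch_page(page_number, page_size)
        rows = page.get("data", [])
        for row in rows:
            yield row
        # Stop when the page is empty or no further pages are reported.
        if not rows or not page.get("has_more", False):
            break
        page_number += 1


def make_fake_fetcher(all_rows: List[dict]) -> Callable[[int, int], Page]:
    """Build an in-memory stand-in for the real request, for demonstration."""
    def fetch_page(page_number: int, page_size: int) -> Page:
        start = (page_number - 1) * page_size
        chunk = all_rows[start:start + page_size]
        return {"data": chunk, "has_more": start + page_size < len(all_rows)}
    return fetch_page


if __name__ == "__main__":
    fake_rows = [{"id": i, "prediction": i % 2} for i in range(7)]
    rows = list(iter_result_rows(make_fake_fetcher(fake_rows), page_size=3))
    print(len(rows))  # 7
```

Passing the request function in as a parameter keeps the paging loop independent of any particular HTTP client, so the same loop can be exercised with the in-memory fetcher shown here before it is pointed at a real endpoint.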

*** ### Plan usage Gives a quick overview of your plan limits, usage, and enabled features: * Max Data Rows in Database: Shows the total data rows allowed (e.g., 10M), how many are used (e.g., 2.28M), and how many are left (e.g., 7.72M). A progress bar helps you track usage. * Max Number of Users: Displays the total users allowed (e.g., 10), how many are used (e.g., 1), and available slots (e.g., 9). You can invite new users with the Invite User link. * Available Dataset Plans: Lists supported dataset types like CSV, database integrations, model data, merged datasets, and BigQuery. * Additional Features: Confirms whether key features like AI insights, API access, white-labeled notebooks, and advanced model settings are enabled under your plan.

Plan usage screen showing options included in current plan

# Roles In Graphite Note, every user within a team is assigned a role that determines their access level and permissions. These roles help control what users can view or modify within the platform. Roles can be managed on the Roles Administration Page, accessed by clicking the wheel icon (⚙️) in the top-right corner and selecting Roles.

***

### Role Types

By default, Graphite Note provides two predefined roles that cover the most common access needs:

* **Administrator:** Grants full access to read and modify all entities, including datasets, models, notebooks, users, and system settings. The Administrator role cannot be deleted and serves as the foundation for managing basic access levels.
* **Viewer:** Provides read-only access, allowing users to view entities but not edit or modify them.

Administrators can create custom roles tailored to the specific needs of their organization, with detailed control over permissions. To create a custom role, click the New Role button, provide a name and description, and define permissions (e.g., Read & Modify or No Access) for each module.

With both default and customizable roles, Graphite Note provides a robust system for managing user access and ensuring data security across your team.

***

### Role Assignment

Roles are assigned to users on the [Users Page](/graphite-note-documentation/account-and-team-setup/users). When inviting a new user to the team, administrators select a role for them, ensuring the appropriate level of access. Existing roles can also be updated later to adapt to changing team responsibilities.

***

### Viewing Assigned Role

Users can see their currently assigned role on the [Profile Information](/graphite-note-documentation/account-and-team-setup/profile-information) page, accessible via the profile dropdown menu in the top-right corner of the platform. This transparency allows users to understand their permissions within the system.

# Users

The Users Page in Graphite Note provides administrators with tools to manage team members, assign roles, and update user details. It can be accessed by clicking the wheel icon (⚙️) in the top-right corner and selecting Users.

*** ### User List Displays all team members with details such as user code, name, AI content language, email, assigned role, and activation status. Administrators can manage users directly from this list.

***

### Inviting New Users

To invite a new user, click the Invite user button on the Users Page, enter their email address, and assign a role (e.g., Viewer, Administrator, or a custom role). Once the invitation is sent, a user profile is automatically created within the system, even before the user accepts the invitation. This allows administrators to manage and edit the user’s details, such as their name, role, or preferences, immediately after the invitation is issued. The profile becomes fully active once the user accepts the invitation via email.

***

### Editing Invited user

Once a user is invited, administrators can edit their details by clicking the gear icon under the Action column on the Users Page. This opens the Edit User Panel, where the following information can be found:

* **User Code:** A unique identifier for the user (non-editable).
* **Name:** Modify the user’s display name to ensure accuracy or reflect changes.
* **AI Generated Content Language:** Select the user’s preferred language for AI-generated content.
* **Select Avatar:** Customize the user’s profile avatar by choosing from a range of color options for better visual distinction.
* **Email:** Update the user’s registered email address.
* **Role:** Change the user’s assigned role (e.g., Viewer, Administrator, or a custom role) to adjust their access permissions.

Changes are saved by clicking the Save button, ensuring the user profile reflects updated details. This flexibility allows for seamless management of team members’ information and roles. # Tags The Tags Page in Graphite Note is used to manage tags that help in organizing and distinguishing datasets, models, and notebooks. Tags improve searchability and allow users to categorize resources effectively. Note that only one tag can be assigned to each dataset, model, or notebook. The Tags Page can be accessed by clicking the wheel icon (⚙️) in the top-right corner and selecting Tags.

### Tag List Displays all created tags with their name, description, color, and a preview of how the tag appears. *** ### Creating new Tag To create a new tag, users can define its name and description while selecting a color for visual distinction. This allows tags to be uniquely identifiable and easy to manage. Tags can also be edited or deleted using the action icons, providing flexibility for keeping the tagging system up-to-date.

### Tag creation in Datasets, Models and Notebooks

Tags can also be created while creating a new dataset, model, or notebook, or from the settings of an existing one. This keeps tagging a flexible and integrated part of resource management.
# Logs The Logs Page in Graphite Note lets you review a complete history of model-training runs—showing start / finish times, durations, statuses, and model codes for quick auditing and troubleshooting. Use it to confirm that a model finished successfully, inspect failures, or copy a model-code for API calls. You can open the Logs Page by clicking the wheel icon (⚙️) in the top-right corner and selecting Logs.

For a full breakdown of every column and filter, see the [Model Execution Logs](/graphite-note-documentation/graphite-note-models/advanced-ml-model-settings/model-execution-logs) page. # FAQ This page contains the most Frequently Asked Questions ### **What is Predictive Analytics?** Predictive analytics is a form of advanced analytics that uses both new and historical data to forecast future activity, behavior, and trends. It involves applying statistical analysis techniques, analytical queries, and automated machine learning algorithms to data sets to create predictive models that place a numerical value — or score — on the likelihood of a particular event happening. ### **What is Prescriptive Analytics?** Prescriptive analytics is a form of advanced analytics that examines data or content to answer the question "What should be done?" or "What can we do to make 'X' happen?". It is characterized by techniques such as graph analysis, simulation, complex event processing, neural networks, recommendation engines, heuristics, and machine learning. ### What is AutoML? AutoML (Automated Machine Learning) is the process of automating the end-to-end process of applying machine learning to real-world problems. It aims to make machine learning accessible to non-experts and to improve efficiency for experts by automating tasks such as data preprocessing, feature engineering, model selection, and hyperparameter tuning. ### **How does Graphite Note ensure data security?** At Graphite Note, we take data security very seriously. We employ robust security measures to ensure your data is protected at all times. This includes encryption of data at rest and in transit, regular security audits, and strict access controls. [Read more here](https://graphite-note.com/data-security-and-protection). ### **What types of data can I use with Graphite Note?** Graphite Note is designed to work with a wide range of data types. 
You can import data from various sources such as CSV files, databases, data warehouses. The platform can handle structured, tabular data (like numerical and categorical data). ### **How can I import my data into Graphite Note?** Importing data into Graphite Note is a straightforward process. You can upload data directly from your computer, connect to a database, or your data warehouse. Our platform supports a variety of data formats, including CSV, and SQL databases. ### **What kind of support is available if I need help using Graphite Note?** We offer a range of support options to help you get the most out of Graphite Note. This includes a comprehensive knowledge base, video tutorials, and email support. Our dedicated support team is always ready to assist you with any questions or issues you may have. ### **Can I use Graphite Note if I don't have a background in data science?** Absolutely! Graphite Note is designed to be user-friendly and accessible to everyone, regardless of their technical background. Our no-code platform allows you to generate predictive and prescriptive analytics without needing to write a single line of code. ### **What industries can benefit from using Graphite Note?** Graphite Note is versatile and can be beneficial to a wide range of industries. This includes but is not limited to retail, e-commerce, marketing, sales, finance, healthcare, and manufacturing. Any industry that relies on data to make informed decisions can benefit from our platform. ### **How can Graphite Note help with my specific business needs?** Graphite Note is a flexible platform that can be tailored to meet your specific business needs. Whether you're looking to improve customer retention, optimize your marketing campaigns, forecast sales, or identify trends, our platform can provide the insights you need to drive growth. ### **What training resources are available for new users?** We offer a variety of resources to help new users get started with Graphite Note. 
This includes step-by-step tutorials, webinars, and a comprehensive knowledge base. We're committed to helping you get the most out of our platform and will work with you during onboarding.

### How to update a model with new data?

You can easily re-upload a CSV file with Graphite Note. This allows you to update or append new data to your existing dataset. More info on data update and append can be found [here](/graphite-note-documentation/datasets/data-sources/import-data-from-csv-file/re-upload-or-append-csv).

### What is a tag?

A tag is a keyword associated with a model or dataset. Tags let you group your models so they are easy to find, because you can filter your list by tag. You can create and manage tags by clicking on the *Account* tab in the top-right of Graphite Note, and then the *Tags* drop-down item. You can also create a tag directly when you are importing a dataset or creating a model or notebook by clicking on *Select tag* and then *Create & Apply*. More info about tags can be found [here](/graphite-note-documentation/account-and-team-setup/tags).

### What does parsing mean?

In programming, parsing refers to the process of analyzing and interpreting the structure of a sequence of characters or symbols according to a specific grammar or syntax. It is used in our application to understand and extract meaningful information from input data. During parsing, a parser takes the input, which can be a program's source code or any other form of textual data, and breaks it down into a hierarchical structure that conforms to a predefined set of rules or grammar. This hierarchical structure is typically represented using a data structure such as an abstract syntax tree (AST) or a parse tree.

### How does the free trial work?

Users can start their free trial of the SPROUT plan immediately. The trial is valid for 14 days. After 14 days, if you want to continue using our service, you must subscribe to a plan by contacting our sales team.
### What's the difference between the starter (SPROUT) and other plans?

The starter plan is primarily designed for individual users who want to upload CSV files and create machine learning models. The starter plan has the same core functionality as higher plans but with the following limitations:

* Only one user in the workspace
* Only the CSV connector
* A maximum of 3 models can be created
* Total data source rows limited to 1 million

### What is API access?

APIs enable you to easily upload your datasets by sending data directly to Graphite Note (Dataset APIs) or to pull your predictions into your ERP, CRM, internal app, or website (Prediction APIs). It is a way to process or display predictions outside your Graphite Note account. More about APIs can be found [here](/graphite-note-documentation/rest-api/api-introduction).

### What is a Dedicated Data Scientist?

With Graphite Note, you have the option to add a Dedicated Data Scientist to your team. This is an expert in machine learning and data science who can assist you and your team with any questions or concerns you may have. They can also provide hands-on support with tasks such as data cleaning and improving the performance of your models.

### Can I extend my trial?

We can extend your trial beyond the default period in certain circumstances. Don't hesitate to get in touch with us before the end of your trial if you'd like to discuss this further.

### Special Discounts for Academia and Startups?

Our SaaS platform provides no-code predictive analytics, making data analysis accessible to everyone. As a company, we are dedicated to supporting the academic and startup communities and offer generous discounts to these groups. If you want to hear more about our offerings, don’t hesitate to reach out to us!

### Reputable Provider

Graphite Note runs on platforms belonging to reputable leading service providers and vendors that uphold the highest security standards, specifically Amazon Web Services (AWS).
### I want to know more about Data Science and Machine Learning terms

For definitions of any terms, see the [machine learning glossary](https://graphite-note.com/comprehensive-ai-and-machine-learning-glossary).

### How is Graphite Note different than Gen AI?

In summary, Graphite Note is a specialized tool for predictive analytics and data-driven decision-making, whereas Generative AI focuses on creating new content based on prompts. Both have unique strengths, but they serve vastly different business and creative needs.

**Graphite Note** specializes in machine learning tasks like regression, classification, and actionable insights based on structured datasets. It’s tailored for business scenarios such as sales forecasting, churn prediction, or lead scoring.

**Generative AI** excels at generating unstructured outputs such as creative writing, dialogue generation, or designing visuals based on prompts.

### How to change password in Graphite Note?

To change your password in Graphite Note, follow these steps:

1. Access Your Profile: Click on the small user profile icon in the top-right corner of the interface. This icon usually displays the first letters of your name and surname.
2. Open Profile Settings: From the dropdown menu, select Profile to access your personal settings.
3. Change Password: Locate the Password section, where you will find a Change Password button. Click this button to update your password.

This ensures a secure way to manage your account credentials. If you forget your password, use the password recovery option on the login page.
# What is Graphite Note?

Graphite Note is a no-code machine learning platform that enables data analysts to create predictive models in minutes. By connecting your data, the platform automatically preprocesses it, allowing you to train AI models without coding. It identifies key drivers affecting predictions and provides actionable insights through generative AI, facilitating data-driven decision-making. Graphite Note supports various industries with prebuilt model templates, making complex data analysis accessible and efficient.

# Graphite Note Insights Lifecycle

No-code, Automated Machine Learning for Data Analytics Teams

### Data to Insights Lifecycle

* **Dataset:** Begin with a dataset containing historical data.
* **Feature Selection:** Identify the most important variables (features) for the model.
* **Best Algorithm Search:** Test different algorithms to find the best fit for your data.
* **Model Generation:** Create a predictive model based on selected features and the best algorithm.
* **Model Tuning:** Fine-tune the model’s parameters to improve accuracy.
* **Model Deployment:** Deploy the final model for real-world usage.
* **Explore Key Drivers:** Analyze the key factors influencing the model’s predictions.
* **Explore What-If Scenarios:** Test different hypothetical situations to see their impact.
* **Predict Future Outcomes:** Use the model to forecast future trends or outcomes.
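To make the "Best Algorithm Search" step in the lifecycle above more concrete, the following Python sketch scores two candidate learners on held-out data and keeps the better one. It is a minimal illustration under stated assumptions: the two candidates (a majority-class baseline and a one-feature threshold rule), the toy tenure data, and all function names are hypothetical and do not describe the algorithms or search procedure Graphite Note actually runs.

```python
from typing import Callable, Dict, List, Tuple

Row = List[float]
Model = Callable[[Row], int]
Fitter = Callable[[List[Row], List[int]], Model]


def fit_majority(X: List[Row], y: List[int]) -> Model:
    """Baseline: always predict the most common label in the training data."""
    majority = 1 if sum(y) * 2 >= len(y) else 0
    return lambda row: majority


def fit_stump(X: List[Row], y: List[int]) -> Model:
    """One-feature threshold rule chosen to maximise training accuracy."""
    best = (-1.0, 0, 0.0, 1)  # (accuracy, column, threshold, label_below)
    for col in range(len(X[0])):
        for threshold in sorted({row[col] for row in X}):
            for label_below in (0, 1):
                hits = sum(
                    1 for row, label in zip(X, y)
                    if (label_below if row[col] < threshold else 1 - label_below) == label
                )
                best = max(best, (hits / len(y), col, threshold, label_below))
    _, col, threshold, label_below = best
    return lambda row: label_below if row[col] < threshold else 1 - label_below


def accuracy(model: Model, X: List[Row], y: List[int]) -> float:
    """Share of rows where the prediction matches the label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)


def search_best(
    candidates: Dict[str, Fitter],
    train: Tuple[List[Row], List[int]],
    valid: Tuple[List[Row], List[int]],
) -> Tuple[str, Dict[str, float]]:
    """Fit every candidate on the training split and score it on the validation split."""
    (X_train, y_train), (X_valid, y_valid) = train, valid
    scores = {
        name: accuracy(fit(X_train, y_train), X_valid, y_valid)
        for name, fit in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Toy data: one feature (tenure in months); churn = 1 when tenure is under 10.
X = [[float(month)] for month in range(1, 25)]
y = [1 if row[0] < 10 else 0 for row in X]
train = (X[0::2], y[0::2])  # odd months
valid = (X[1::2], y[1::2])  # even months

best_name, scores = search_best({"majority": fit_majority, "stump": fit_stump}, train, valid)
print(best_name, scores)
```

Scoring on a split that was not used for fitting is what makes the comparison meaningful: in this toy run the threshold rule fits its training rows perfectly yet still misclassifies one validation row, which only the held-out score reveals.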

# Introduction to Machine Learning

In this section, we’ll explore the core machine learning concepts that underpin the Graphite Note solution. You’ll learn about the algorithms and techniques used to analyze data, make predictions, and uncover valuable insights. By understanding these foundational principles, you’ll gain a deeper appreciation of how Graphite Note leverages machine learning to deliver powerful analytical capabilities.

# What is Machine Learning

#### Machine learning is a method that uses data to teach computers to recognize patterns and key drivers, allowing them to predict future outcomes without being explicitly programmed.

No-code machine learning is a simplified approach to machine learning that allows users to build, train, and deploy machine learning models without needing to write any code. This makes advanced data analysis accessible to non-technical users, empowering business teams to harness machine learning insights without relying on data scientists or programmers.

In no-code machine learning, platforms like Graphite Note provide intuitive interfaces where users can import data, select features, and train models through guided steps. Machine learning uses data to teach computers to recognize patterns and key drivers so that they can predict future outcomes. In a no-code environment, this process is automated, allowing users to set up predictive models by simply uploading data and selecting key variables, all through a user-friendly, visual workflow.

By removing the complexity of coding, no-code machine learning enables organizations to leverage powerful data insights faster, supporting better business decisions and allowing companies to respond more quickly to market demands.

# Data Analytics Maturity

From Business Intelligence (BI) to Artificial Intelligence (AI)

Analytics maturity represents an organization’s progression in leveraging data to drive insights and decisions. This journey typically follows four levels:

**1. Descriptive Analytics:** The foundation of analytics maturity, focused on answering “What happened?” Descriptive analytics relies on reporting and data mining to summarize past events. Most organizations begin here, gaining basic insights by understanding historical data. **2. Diagnostic Analytics:** Building on descriptive insights, diagnostic analytics answers “Why did it happen?” by drilling deeper into data patterns and trends. Using techniques such as query drill-downs, diagnostic analytics provides context and explanations, helping organizations understand the causes of past events. Traditional organizations often operate within this descriptive and diagnostic phase. **3. Predictive Analytics:** Moving into more advanced analytics, predictive analytics addresses “What will happen?” by utilizing machine learning and AI to forecast future outcomes. Through statistical simulations and data models, predictive analytics enables organizations to anticipate trends, customer behavior, and potential risks. Elevating to this level empowers organizations to make more proactive, data-driven decisions and gain a competitive edge. **4. Prescriptive Analytics:** At the highest level of analytics maturity, prescriptive analytics answers “What should I do?” It combines machine learning, AI, and mathematical optimization to recommend actions that lead to desired outcomes. By offering actionable guidance, prescriptive analytics not only predicts future scenarios but also prescribes the best course of action, allowing organizations to optimize decisions and drive strategic growth. While many organizations remain in the descriptive and diagnostic phases, those aiming to stay competitive and drive innovation must elevate their analytics capabilities. **Graphite Note** is designed to accelerate this journey, helping organizations seamlessly transition into predictive and prescriptive analytics. 
By embracing machine learning and AI through **Graphite Note**, companies can transform their data into a strategic asset, enabling proactive decision-making and unlocking new avenues for operational efficiency and business growth.

# Machine Learning Workflow

A typical machine learning workflow consists of several key stages that build upon each other.

#### 1. Problem Definition

The first step is clearly defining the analytical problem. At this stage, the goal is to determine what type of prediction or insight is needed. Examples include:

* Predicting whether a customer will churn
* Estimating future sales or demand
* Classifying leads as high or low conversion probability

Clearly defining the objective helps determine which machine learning approach should be used, such as classification, regression, or segmentation.

***

#### 2. Data Collection

Once the problem is defined, relevant data must be gathered. Data may come from various sources such as:

* CRM systems
* transaction databases
* marketing platforms
* operational systems

The quality and completeness of the data strongly influence the accuracy and usefulness of the resulting models.

***

#### 3. Data Preparation and Exploratory Data Analysis (EDA)

Before building any models, the dataset must be examined and prepared. This stage typically includes:

* inspecting dataset structure
* identifying missing values
* detecting outliers
* understanding distributions
* analyzing relationships between variables

Exploratory Data Analysis is essential for identifying potential issues and understanding which features may be important predictors.

***

#### 4. Modeling

After the data has been prepared, machine learning algorithms are used to train predictive models. Depending on the problem, different types of models may be applied:

* Binary Classification
* Regression
* Multiclass Classification
* Segmentation or clustering

The goal of this stage is to learn patterns from historical data that can be used to make predictions on new observations.

***

#### 5. Model Evaluation

Once a model is trained, its performance must be evaluated using appropriate metrics. For example:

* classification models may be evaluated using metrics such as accuracy, precision, recall, and confusion matrices
* regression models may be evaluated using error metrics such as RMSE or MAE

Evaluation helps determine whether the model generalizes well to unseen data and whether it is suitable for real-world decision making.

***

#### 6. Deployment and Decision Making

The final step is operationalizing the model. Predictions generated by the model can be used to support business decisions such as:

* prioritizing high-value customers
* targeting marketing campaigns
* optimizing pricing strategies
* forecasting future demand

In modern decision intelligence platforms, the focus is not only on prediction but on turning model outputs into practical actions.

***

### Why the Machine Learning Workflow Matters

Following a structured workflow ensures that machine learning projects remain reliable, interpretable, and aligned with real-world objectives. Skipping early steps such as data exploration or preparation can lead to misleading models and poor predictions.

By understanding each stage of the workflow, users can build stronger analytical intuition and better interpret the insights generated by predictive models.

# Machine Learning concepts

The following sections cover some of the most important machine learning concepts.

# Key Drivers

**Introduction**

In predictive modeling, key drivers (or influencers) are pivotal in discerning which features within a dataset most significantly impact the target variable. These influencers provide insights into the relative importance of each variable, enabling data scientists and analysts to understand and predict outcomes more accurately.

By highlighting the strongest predictors, key influencers inform the prioritization of features for model optimization, ensuring that models are precise and interpretable in real-world scenarios. This foundational understanding is crucial for refining models and aligning them closely with the underlying patterns and trends present in the data.

***

**Reading Key Drivers**

When examining the visualization of key influencers in Graphite Note Models, you'll find the features listed on the left, ranked from most to least important by their influence on the target variable.

Visual representation of most important Key drivers

This ranking allows for a quick assessment of which factors are pivotal in the model's predictions. By observing the length and direction of the bars associated with each feature, one can gauge the strength of influence they have on the target outcome. *** #### Understanding Key driver influence The image displays a visual breakdown of how different tenure values (the length of time a customer has stayed with a service) influence the likelihood of customer churn, specifically the outcome “Churn = Yes”.

The chart divides tenure into several ranges (or bins) and shows how each range increases or decreases the likelihood of churn compared to the average baseline. The churn likelihood is expressed as a multiplier: * Customers with short tenure (1.93 to 7.58 months) are 2.18 times more likely to churn, the highest risk group in this analysis. * Those in the 7.58 to 13.17 range also show an elevated risk, 1.46x more likely to churn. * Churn likelihood continues to decrease with tenure — between 13.17 and 18.75 months, customers are 1.27x more likely to churn than average. * In contrast, customers with tenure between 18.75 and 24.33 months are 1.04x less likely to churn — a marginal decrease, but moving into safer territory. * This trend improves further for tenure ranges of 24.33 to 29.92 and 29.92 to 35.5 months, both showing a 1.18x decrease in churn risk. {% hint style="info" %} **Key takeaway:** Longer-tenured customers churn less. The risk of churn is significantly higher during the first 18 months of a customer’s lifecycle and begins to drop off afterward. This suggests that businesses should focus retention efforts early in the customer journey — especially within the first year — where churn risk is highest. {% endhint %} *** **Statistical Methodology Used** Graphite Note uses a smart and transparent approach to show how each feature (or column) in your dataset influences the prediction outcome—like whether a customer is likely to churn. First, it calculates feature importance by checking how much the model relies on each column to make accurate predictions. This is done using a method called permutation importance: the system shuffles the values of one feature and observes how much the model’s performance drops. The bigger the drop, the more important that feature is. For numeric columns such as “tenure,” Graphite Note then goes one step further. 
It splits the values into logical ranges (called bins) and looks at how likely the predicted outcome (e.g., Churn = Yes) is for each range. It compares the average churn rate across the full dataset with the churn rate in each of those bins. If the churn rate in a bin is higher than average, it says the likelihood “increases” (e.g., by 2.18x); if it’s lower, it says it “decreases” (e.g., by 1.18x). This lets you clearly see which value ranges increase or reduce the chances of a specific outcome. To keep insights relevant, the system automatically removes categories or ranges that don’t have enough data to be meaningful. This makes the analysis not only accurate but also easy to act on—giving you clarity on what drives results and where to focus your strategy.
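
The two steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Graphite Note's actual implementation: the data is synthetic, the model is a plain least-squares scorer, performance is measured as mean squared error, and the helper names (`permutation_importance`, `bin_lifts`) and the `min_count` cutoff are hypothetical. The multiplier is computed here as the bin's churn rate divided by the overall churn rate; how a value below 1 maps to the "decrease" label shown in the chart is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up for illustration): churn gets less likely as
# tenure grows, and the "noise" column is unrelated to churn.
n = 5000
tenure = rng.uniform(0, 36, n)
noise = rng.normal(size=n)
churn = (rng.uniform(size=n) < 0.6 * np.exp(-tenure / 12)).astype(float)
X = np.column_stack([tenure, noise])

# Hold out the last 20% of rows for measuring performance.
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = churn[:split], churn[split:]


def fit_linear(X_fit, y_fit):
    """Fit a least-squares linear scorer and return a predict function."""
    design = np.column_stack([np.ones(len(X_fit)), X_fit])
    coef, *_ = np.linalg.lstsq(design, y_fit, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def mse(predict, X_eval, y_eval):
    """Mean squared error of the predictions (lower is better)."""
    return float(np.mean((predict(X_eval) - y_eval) ** 2))


def permutation_importance(predict, X_eval, y_eval, rng, n_repeats=5):
    """Average increase in error after shuffling each column in turn."""
    baseline = mse(predict, X_eval, y_eval)
    importances = []
    for col in range(X_eval.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_shuffled = X_eval.copy()
            X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
            increases.append(mse(predict, X_shuffled, y_eval) - baseline)
        importances.append(float(np.mean(increases)))
    return importances


def bin_lifts(values, outcome, n_bins=6, min_count=30):
    """Outcome rate in each equal-width bin divided by the overall rate."""
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    bin_index = np.digitize(values, edges[1:-1])  # integers 0 .. n_bins - 1
    overall_rate = outcome.mean()
    lifts = []
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.sum() < min_count:
            continue  # too few rows to say anything meaningful
        lifts.append((edges[b], edges[b + 1], outcome[in_bin].mean() / overall_rate))
    return lifts


predict = fit_linear(X_train, y_train)
importances = permutation_importance(predict, X_test, y_test, rng)
lifts = bin_lifts(tenure, churn)

for name, score in zip(["tenure", "noise"], importances):
    print(f"{name}: error increase after shuffling = {score:.4f}")
for low, high, lift in lifts:
    print(f"tenure {low:5.2f} to {high:5.2f}: {lift:.2f}x the average churn rate")
```

In this sketch, shuffling `tenure` should hurt the model far more than shuffling `noise`, and the shortest-tenure bin should show a multiplier above 1 while the longest-tenure bin falls below 1.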

# Confusion Matrix

#### Purpose

The Confusion Matrix is a powerful diagnostic tool in classification tasks within predictive analytics. It presents a clear and concise layout for evaluating the performance of a classification model by showing the actual versus predicted values in a tabular format. The matrix allows users, regardless of their coding expertise, to assess the accuracy and effectiveness of a predictive model, providing insights into not only the number of correct and incorrect predictions but also the type of errors made.

#### Components of the Confusion Matrix

A confusion matrix for a binary classification problem consists of four components:

* **True Positives (TP)**: The number of instances that were predicted as positive and are actually positive.
* **False Positives (FP)**: The number of instances that were predicted as positive but are actually negative.
* **True Negatives (TN)**: The number of instances that were predicted as negative and are actually negative.
* **False Negatives (FN)**: The number of instances that were predicted as negative but are actually positive.

#### Applications

In the context of Graphite Note, a no-code predictive analytics platform, the confusion matrix serves several key purposes:

1. **Performance Measurement**: It quantifies the performance of a classification model, offering a visual representation of the model's ability to correctly or incorrectly predict categories.
2. **Error Analysis**: By breaking down the types of errors (FP and FN), the matrix aids in understanding specific areas where the model may require improvement.
3. **Decision Support**: The confusion matrix supports decision-making by highlighting the balance between sensitivity (or recall) and precision, which can be crucial for business outcomes.
4. **Model Tuning**: Users can leverage the insights from the confusion matrix to adjust model parameters and thresholds to optimize for certain predictive behaviors.
5. **Communication Tool**: It acts as a straightforward communication tool for stakeholders to grasp the results of a classification model without delving into complex statistical jargon.

#### Interpretation

In the example confusion matrix (Model Performance -> Accuracy Overview):

* There are 799 instances where the model correctly predicted the positive class (TP).
* There are 15622 instances where the model incorrectly predicted the positive class (FP).
* There are 348 instances where the model failed to identify the positive class (FN).
* There are 18159 instances where the model correctly identified the negative class (TN).

False positives far outnumber true positives, so precision is low (799 / (799 + 15622) ≈ 4.9%), even though recall is about 70% (799 / (799 + 348)). This pattern suggests a potential class imbalance or a need for model refinement to improve predictive accuracy.

#### Conclusion

The classification confusion matrix is an integral part of model evaluation in Graphite Note, enabling users to make informed decisions about the deployment and iteration of their predictive models.

# Supervised vs Unsupervised ML

In machine learning, supervised and unsupervised learning are the two main approaches:
### 1. Supervised Learning In supervised learning, the model is trained on a labeled dataset. This means we provide both input data and the corresponding output labels to the model. The goal is for the model to learn the relationship between inputs and outputs so it can predict new, unseen data. Common examples include classification (e.g., email spam detection) and regression (e.g., predicting house prices). *For example, if you have an image dataset labeled with “cat” or “dog,” the model learns to classify new images as either a cat or dog based on this training.*
*** #### Supervised learning on Diamonds dataset example In this example, we have a dataset containing information about diamonds. The supervised machine learning approach focuses on predicting a specific target column based on other features in the dataset.

* Target is a Number (Regression): If the target column is numerical (e.g., “Price”), the goal is to predict the diamond’s price based on features like cut, color, and clarity. This is called regression.
* Target is Text (Classification): If the target is categorical (e.g., “Cut” with values like Ideal, Very Good), the goal is to classify diamonds into categories based on their characteristics. This is known as classification.

***

### 2. Unsupervised Learning

In unsupervised learning, the model is given only input data without any labeled outputs. The goal is to find patterns or groupings within the data. Common tasks here are clustering (e.g., grouping customers by purchasing behavior) and dimensionality reduction (e.g., simplifying data for visualization).

*For example, if you have images without labels, the model could group similar images together (like cats in one group and dogs in another) based on visual similarities.*

***

#### Unsupervised learning on Diamonds dataset example

In unsupervised learning, there is no target column or labeled output provided. Instead, the model analyzes patterns within the data to group or cluster similar items together.

In this diamond dataset example:

* We don’t specify a target column (like price or cut); instead, the goal is to find natural groupings of data points based on their features.
* Here, clustering is used to identify groups of diamonds with similar Carat Weight and Price characteristics.
* The scatter plot on the right shows how the diamonds are grouped into different clusters (e.g., cluster0, cluster1, etc.), revealing patterns in the data without needing predefined labels.

This approach is useful when you want the model to identify hidden structures or patterns within the data. Unsupervised learning is often used for customer segmentation, anomaly detection, and recommendation systems.

# Exploratory Data Analysis (EDA)

### Purpose

Exploratory Data Analysis (EDA) is the process of examining and understanding a dataset before applying machine learning models. It allows analysts to investigate the structure, quality, and relationships within the data in order to uncover patterns, detect anomalies, and form hypotheses about what may influence the target variable.

EDA serves as a diagnostic phase of the machine learning workflow. Instead of immediately building predictive models, analysts first explore the data to ensure that it is reliable, meaningful, and suitable for modeling. Without proper exploratory analysis, models may learn misleading patterns caused by data errors, outliers, or incorrect assumptions.

***

### Key Objectives of Exploratory Data Analysis

EDA helps answer several critical questions about a dataset.

#### Understanding Dataset Structure

The first step in EDA is understanding the structure of the dataset. This includes examining:

* the number of rows and columns
* the data types of each feature
* whether values are missing or incomplete

This step provides a high-level overview of the dataset and reveals potential data quality issues early in the process.
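
For readers who also work with data outside the platform, the structure checks listed above can be sketched with pandas. This is only an illustration: the table is made up, and the column names (`carat`, `cut`, `price`) are hypothetical stand-ins for a diamonds-style dataset.

```python
import pandas as pd

# Tiny made-up table; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "carat": [0.23, 0.31, None, 0.70],
        "cut": ["Ideal", "Very Good", "Ideal", None],
        "price": [326, 335, 402, 2757],
    }
)

n_rows, n_cols = df.shape         # number of rows and columns
column_types = df.dtypes          # data type of each feature
missing_counts = df.isna().sum()  # missing values per column

print(f"{n_rows} rows x {n_cols} columns")
print(column_types)
print(missing_counts)
```

The three results correspond to the three checks in the list: size, data types, and missing values.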
*** #### Exploring Feature Distributions Analyzing how data values are distributed is an important part of exploratory analysis. For numerical features, analysts often examine: * histograms to understand distributions * boxplots to detect outliers * summary statistics such as mean, median, and standard deviation

These techniques help identify whether values are concentrated in specific ranges or whether unusual observations are present. *** #### Understanding Relationships Between Variables EDA also focuses on identifying relationships between variables. Scatter plots and correlation matrices are commonly used to determine whether certain features move together or influence one another. For example, a scatter plot might reveal that as one variable increases, another variable also tends to increase. Understanding these relationships helps identify features that may be strong predictors for future modeling. *** #### Detecting Outliers and Data Quality Issues Real-world datasets often contain incorrect or extreme values. These anomalies may arise from data entry errors, system issues, or unusual observations. Outliers can significantly influence statistical measures and machine learning models. Detecting and addressing these values is therefore an important step in preparing the dataset for analysis. *** #### Identifying Feature Types Another key objective of EDA is distinguishing between different types of features. Features typically fall into two categories: * Numerical features, which represent measurable quantities * Categorical features, which represent groups or labels Different analytical techniques and preprocessing methods apply to each type. Recognizing these differences helps guide the next stages of analysis and modeling. *** ### Why Exploratory Data Analysis Is Important Exploratory Data Analysis plays a crucial role in building reliable machine learning models. By thoroughly understanding the dataset before modeling begins, analysts can: * detect data quality problems early * identify important predictors * understand feature relationships * reduce the risk of building misleading models In many real-world machine learning projects, a significant portion of time is spent performing exploratory analysis and data preparation. 
This careful examination of the data ensures that subsequent modeling steps are built on a strong and trustworthy foundation. # Numerical vs Categorical Features ### Purpose Understanding the difference between numerical and categorical features is a fundamental step in data analysis and machine learning. Features represent the variables or attributes used to describe observations within a dataset. Identifying the type of each feature helps determine which analytical techniques, visualizations, and preprocessing methods should be applied before building a machine learning model. In predictive analytics, different feature types require different handling. Some algorithms operate directly on numerical values, while categorical variables must often be transformed into numerical representations before they can be used in modeling. Recognizing the distinction between these feature types is therefore essential for preparing datasets correctly and ensuring that machine learning models can interpret the data effectively. *** ### Numerical Features Numerical features represent measurable quantities and are expressed as numbers. These values may be continuous (such as price or temperature) or discrete (such as counts or quantities).
Examples of numerical features include: * product price * transaction amount * customer age * quantity purchased * physical measurements Numerical features allow direct mathematical operations such as addition, subtraction, averaging, and correlation analysis. Because of this, they are commonly used in statistical analysis and machine learning algorithms. In exploratory data analysis, numerical variables are typically examined using visualizations such as histograms, boxplots, or scatter plots to understand their distributions and relationships with other variables. *** ### Categorical Features Categorical features represent labels or groups rather than measurable values. These variables describe qualitative attributes and typically consist of a limited set of categories. Examples of categorical features include: * product category * customer segment * payment method * geographic region * subscription plan type
Unlike numerical features, categorical values cannot be directly used in many machine learning algorithms because mathematical operations on categories are not meaningful. Before modeling, categorical variables are usually converted into numerical representations using techniques such as: * one-hot encoding * label encoding * target encoding This transformation allows machine learning algorithms to process categorical information while preserving the meaning of the categories. *** ### Why Feature Types Matter Identifying feature types influences several important aspects of data analysis and machine learning preparation: **Choice of visualization** Different feature types require different visualization methods. * Numerical features → histograms, boxplots, scatter plots * Categorical features → bar charts or frequency tables
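
The distinction between the two feature types, and the one-hot encoding mentioned above, can be sketched with pandas. The table, its column names, and its values are made up for illustration; this is not how Graphite Note processes data internally.

```python
import pandas as pd

# Made-up transactions; names and values are illustrative only.
df = pd.DataFrame(
    {
        "transaction_amount": [19.9, 5.0, 42.5, 12.0],
        "quantity": [2, 1, 5, 1],
        "payment_method": ["card", "cash", "card", "transfer"],
    }
)

# Separate the columns by feature type.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Numerical features: summary statistics (mean, std, quartiles).
numeric_summary = df[numerical_cols].describe()

# Categorical features: a frequency table (count per category).
frequency_table = df["payment_method"].value_counts()

# One-hot encoding: one indicator column per category.
encoded = pd.get_dummies(df, columns=["payment_method"])

print(numerical_cols, categorical_cols)
print(numeric_summary)
print(frequency_table)
print(encoded.columns.tolist())
```

One-hot encoding is used here because it makes no assumption about an ordering between categories; label or target encoding would be alternatives with different trade-offs.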
**Data preprocessing** Numerical features may require scaling or normalization, while categorical features must often be encoded into numeric form. **Algorithm compatibility** Some algorithms can naturally handle categorical variables, while others require all inputs to be numerical. Understanding the nature of each feature ensures that the dataset is prepared correctly and that models can learn meaningful patterns from the data. *** ## Correlation and Multicollinearity ### Purpose Correlation analysis is used to measure the strength and direction of the relationship between numerical variables. It helps analysts understand how changes in one variable are associated with changes in another. In machine learning and predictive analytics, correlation analysis is often performed during exploratory data analysis to identify which features may have predictive value and how variables interact with one another. At the same time, correlation analysis can reveal an important phenomenon known as multicollinearity, where multiple features carry very similar information. Detecting such relationships helps improve model stability and interpretability. *** ### Understanding Correlation Correlation measures how strongly two numerical variables move together. Correlation values typically range between −1 and +1. * Positive correlation indicates that two variables tend to increase together. * Negative correlation indicates that when one variable increases, the other tends to decrease. * Correlation close to zero suggests little or no linear relationship. For example, in many business datasets, purchase quantity and total revenue may exhibit strong positive correlation because larger purchases often lead to higher revenue. Correlation analysis provides an initial indication of which features might influence a target variable in predictive models. *** ### Interpreting Correlation Strength
The strength of correlation can be interpreted as follows: * Strong correlation – variables move closely together and may provide strong predictive signals. * Moderate correlation – variables share some relationship but are not perfectly aligned. * Weak correlation – variables show little consistent relationship. While correlation does not imply causation, it is a useful tool for identifying patterns that warrant further analysis. *** ### Multicollinearity Multicollinearity occurs when two or more features in a dataset are strongly correlated with each other. In such cases, the variables provide overlapping information about the same underlying phenomenon. For example, several variables describing the size of a product may all be closely related, meaning they convey similar information. Multicollinearity can create challenges for certain machine learning models, particularly linear models such as linear regression or logistic regression. When multiple features provide similar information, the model may struggle to determine which feature is truly responsible for the observed effect. *** ### Why Detecting Multicollinearity Is Important Identifying multicollinearity helps improve both model stability and interpretability. When highly correlated features are present, analysts may choose to: * remove redundant variables * combine variables into a single feature * select algorithms that are less sensitive to multicollinearity Understanding feature relationships ensures that models are built using meaningful and non-redundant inputs. # Outliers and Data Quality Checks ### Purpose Outliers are data points that differ significantly from the majority of observations in a dataset. They may represent unusual events, measurement errors, or incorrect data entries. Detecting and handling outliers is an important part of exploratory data analysis because extreme values can strongly influence statistical results and machine learning models. 
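
A small Python sketch shows how strongly a single extreme value can move a statistic. The numbers are made up, and the interquartile-range (IQR) rule used to flag the value is one common convention, not necessarily the method Graphite Note applies.

```python
import statistics

# Made-up measurements; the last value is an obvious extreme.
values = [12, 14, 15, 15, 16, 17, 18, 250]
without_extreme = values[:-1]

mean_with = statistics.mean(values)
mean_without = statistics.mean(without_extreme)
median_with = statistics.median(values)

# IQR rule: flag points more than 1.5 * IQR beyond the quartiles,
# the same idea a boxplot uses to draw its whiskers.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
flagged = [v for v in values if v < lower_fence or v > upper_fence]

print(f"mean without the extreme value: {mean_without:.2f}")
print(f"mean with the extreme value:    {mean_with:.2f}")
print(f"median with the extreme value:  {median_with:.2f}")
print(f"flagged by the IQR rule:        {flagged}")
```

The mean nearly triples once the extreme value is included, while the median barely moves, which is why robust statistics and visual checks are useful during exploratory analysis.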
Data quality checks aim to ensure that the dataset accurately represents real-world conditions. By identifying anomalies, inconsistencies, or impossible values, analysts can prevent models from learning misleading patterns. *** ### What Are Outliers? An outlier is an observation that lies far outside the typical range of values in a dataset. Outliers can occur for several reasons: * measurement errors * incorrect data entry * rare but valid events * system or sensor failures For example, if most values in a dataset fall within a certain range but one observation is dramatically higher or lower than the rest, that observation may be considered an outlier. *** ### Why Outliers Matter Outliers can significantly affect statistical measures such as averages, correlations, and standard deviations. When extreme values are present, they may distort the patterns that machine learning algorithms attempt to learn. Some potential consequences include: * misleading correlations * unstable model coefficients * reduced predictive performance * models that focus too heavily on rare cases Because of these risks, outliers should always be investigated during exploratory analysis. *** ### Detecting Outliers Several techniques are commonly used to identify unusual observations: **Boxplots** Boxplots highlight the central distribution of values and clearly display points that fall outside the typical range. **Histograms** Histograms reveal the distribution of values and may expose extreme values in the tails of the distribution. **Scatter plots** Scatter plots help detect unusual observations when comparing relationships between two variables. These visualizations allow analysts to quickly spot anomalies and assess whether they represent genuine observations or potential data issues. *** ### Data Quality Checks In addition to identifying outliers, data quality checks focus on detecting inconsistencies or impossible values within a dataset. 
These checks may include:

* verifying that numerical values fall within realistic ranges
* identifying missing values
* detecting duplicate records
* checking for invalid measurements or formatting errors

Ensuring high data quality is essential because machine learning models rely entirely on the information provided in the dataset.

***

### Why Data Quality Is Critical for Machine Learning

Machine learning models learn patterns directly from the data they receive. If the data contains errors, inconsistencies, or unrealistic values, the resulting predictions may be unreliable.

By performing careful data quality checks and addressing potential issues early in the workflow, analysts create a stronger foundation for building accurate and trustworthy predictive models.

# Demo Datasets

We have prepared 12 distinct demo datasets that you can upload and use in your Graphite Note account. These demo datasets include dummy data from a variety of business scenarios and serve as an excellent starting point for building and running your initial Graphite Note machine learning models. Instructions on how to proceed with each demo dataset are provided on the following pages.

# Ads

Create a Regression model on the Demo Ads dataset.

1\. If you want to use Graphite Note demo datasets, click "Import DEMO Dataset".

2\. Select the dataset you want to use to create a machine learning model. In this case, we will select the "Ads" dataset to create a Regression Analysis on marketing ads data.

3\. Once selected, the demo dataset will load directly into your account. The dataset view will open automatically.

4\. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore dataset details on the Summary tab.

5\. To create a new model, click "Models" in the Graphite Note main menu.

6\. You will get a list of available models. Click "New Model" to create a new one.

7\. Select a model type from our templates. In our case, we will select "Regression" by double-clicking its name.

8\. Select the dataset you want to use to produce the model. We will use "Demo-Ads.csv".

9\. Name your new model. We will call it "Regression on Demo-Ads".

10\. Write a description of the model and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that appears on the screen.

11\. Click "Create" to create your demo model environment.

12\. To set up a Regression model, first define the "Target Feature". This is the numeric column from your dataset that you'd like to make predictions about. For Regression on the Ads dataset, the target feature is the "Clicks" column.

13\. Click "Next" to get the list of model features that will be included in the scenario. The model relies on each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.

14\. To start training the model, click "Run scenario". This will take a sample of 80% of your data and train several machine learning models.

15\. Wait a few moments and voilà! Your Regression model is trained. Click the "Performance" tab to get model insights and view Key Drivers.

16\. Explore the Regression model by clicking Impact Analysis, Model Fit, and Training Results to get more insight into how the model was trained and set up.

17\. If you want to put your model into action, click the "Predict" tab in the main model menu.

18\. You can produce your own "What-If" analysis based on existing training results. You can also import a fresh CSV dataset into the model to make predictions on the target column. In our case, that is "Clicks". Keep in mind that the dataset you upload needs to contain the same feature columns as your model.

19\. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. Retraining your model as new data arrives helps keep its predictions up to date.
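
Step 18 requires the uploaded file to contain the same feature columns as the model. If you prepare prediction files outside the platform, a local pre-check along these lines may help. It is a hypothetical sketch using only the Python standard library, not a Graphite Note feature; apart from "Clicks", the column names are invented for illustration.

```python
import csv
import io


def read_header(csv_text):
    """Return the column names from the first row of a CSV document."""
    return next(csv.reader(io.StringIO(csv_text)))


def missing_features(training_columns, new_columns, target):
    """List feature columns the model was trained on that the new file lacks."""
    required = [c for c in training_columns if c != target]
    return [c for c in required if c not in new_columns]


# Hypothetical headers: only "Clicks" comes from the walkthrough above.
training_csv = "Campaign,Impressions,Spend,Clicks\nA,1000,50.0,40\n"
new_csv = "Campaign,Impressions\nB,2000\n"

missing = missing_features(read_header(training_csv), read_header(new_csv), "Clicks")
print(missing)  # columns to add before uploading
```

The target column is excluded from the check because it is what the model predicts, so a prediction file does not need to supply it.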


# Churn

Create a Binary Classification model on the Demo Churn dataset.

Get an overview of the Customer Churn demo dataset and how it can be used to create your new Graphite Note model in this video:

{% embed url="" %}

Or follow the instructions below for step-by-step guidance on how to use the Customer Churn demo dataset:

1\. If you want to use Graphite Note demo datasets, click "Import DEMO Dataset".

2\. Select the dataset you want to use to create a machine learning model. In this case, we will select the Churn dataset to create a binary classification analysis on customer engagement data.

3\. Once selected, the demo dataset will load into your account. The dataset view will open automatically.

4\. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore dataset details on the Summary tab.

5\. To create a new model, click "Models" in the Graphite Note main menu.

6\. You will get a list of available models. Click "New Model" to create a new one.

7\. Select a model type from our templates. In our case, we will select "Binary Classification" by double-clicking its name.

8\. Select the dataset you want to use to produce the model. We will use "Demo-Churn.csv".

9\. Name your new model. We will call it "Binary Classification on Demo-Churn".

10\. Write a description of the model and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that appears on the screen.

11\. Click "Create" to create your demo model environment.

12\. To set up a Binary Classification model, first define the "Target Feature". This is the binary column from your dataset that you'd like to make predictions about. For Binary Classification on the Churn dataset, the target feature is the "Churn" column.

13\. Click "Next" to get the list of model features that will be included in the scenario. The model relies on each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.

14\. To start training the model, click "Run scenario". This will take a sample of 80% of your data and train several machine learning models.

15\. Wait a few moments and voilà! Your Binary Classification model is trained. Click the "Performance" tab to get model insights and view Key Drivers.

16\. Explore the Binary Classification model by clicking Impact Analysis and Training Results to get more insight into how the model was trained.

17\. If you want to put your model into action, click the "Predict" tab in the main model menu.

18\. You can produce your own "What-If" analysis based on existing training results. You can also import a fresh CSV dataset that the model will use to make predictions on the target column. In our case, that is "Churn". Keep in mind that the dataset you upload needs to contain the same feature columns as your model.

19\. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. Retraining your model as new data arrives helps keep its predictions up to date.
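
When reviewing a binary classifier such as this churn model, the four confusion matrix counts can be converted into accuracy, precision, and recall. The helper below is a hypothetical sketch, and the counts are the sample values from the Confusion Matrix page, not results from the Demo-Churn dataset.

```python
def classification_metrics(tp, fp, fn, tn):
    """Headline metrics from confusion matrix counts.

    Assumes every denominator is non-zero.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "precision": tp / (tp + fp),     # share of predicted positives that are right
        "recall": tp / (tp + fn),        # share of actual positives that are found
    }


# Sample counts taken from the Confusion Matrix page.
metrics = classification_metrics(tp=799, fp=15622, fn=348, tn=18159)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Looking at precision and recall alongside accuracy matters most when one class is much rarer than the other, which is common in churn data.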


# CO2 Emission

Create a Regression model on the Demo CO2 Emission dataset.

1\. If you want to use Graphite Note demo datasets, click "Import DEMO Dataset".

2\. Select the dataset you want to use to create the machine learning model. In this case, we will select the CO2 Car Emissions dataset to create a Regression Analysis on car emissions data.

3\. Once selected, the demo dataset will load directly into your account. The dataset view will open automatically.

4\. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore dataset details on the Summary tab.

5\. To create a new model, click "Models" in the Graphite Note main menu.

6\. You will get a list of available models. Click "New Model" to create a new one.

7\. Select a model type from our templates. In our case, we will select "Regression" by double-clicking its name.

8\. Select the dataset you want to use to produce the model. We will use "Demo-CO2-Car-Emissions-Canada.csv".

9\. Name your new model. We will call it "Regression on Demo-CO2-Car-Emissions".

10\. Write a description of the model and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that appears on the screen.

11\. Click "Create" to create your demo model environment.

12\. To set up a Regression model, first define the "Target Feature". This is the numeric column from your dataset that you'd like to make predictions about. For Regression on the car emissions dataset, the target feature is the "CO2 Emissions(g/km)" column.

13\. Click "Next" to get the list of model features that will be included in the scenario. The model relies on each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.

14\. To start training the model, click "Run scenario". This will take a sample of 80% of your data and train several machine learning models.

15\. Wait a few moments and voilà! Your Regression model is trained. Click the "Performance" tab to get model insights and view the Key Drivers.

16\. Explore the Regression model by clicking Impact Analysis and Training Results to get more insight into how the model was trained.

17\. If you want to put your model into action, click the "Predict" tab in the main model menu.

18\. You can produce your own "What-If" analysis based on existing training results. You can also import a fresh CSV dataset that the model will use to make predictions on the target column. In our case, that is "CO2 Emissions(g/km)". Keep in mind that the dataset you upload needs to contain the same feature columns as your model.

19\. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. Retraining your model as new data arrives helps keep its predictions up to date.
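
Step 14 mentions that training uses an 80% sample of the data. The general idea of holding out the remaining rows to measure regression error can be sketched with NumPy. This is an illustration under assumptions, not Graphite Note's training procedure: the data is synthetic, `engine_size` is a hypothetical feature, and a single straight-line fit stands in for the several models the platform trains.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (made up): one feature with a noisy linear effect on the target.
n = 200
engine_size = rng.uniform(1.0, 6.0, n)
co2 = 120 + 35 * engine_size + rng.normal(0, 10, n)

# Shuffle row indices, then keep 80% for training and 20% for evaluation.
order = rng.permutation(n)
cut = int(0.8 * n)
train_idx, test_idx = order[:cut], order[cut:]

# Fit a straight line on the training rows only.
slope, intercept = np.polyfit(engine_size[train_idx], co2[train_idx], deg=1)
predicted = intercept + slope * engine_size[test_idx]

# Error metrics on the held-out rows.
errors = predicted - co2[test_idx]
rmse = float(np.sqrt(np.mean(errors ** 2)))
mae = float(np.mean(np.abs(errors)))

print(f"train rows: {len(train_idx)}, test rows: {len(test_idx)}")
print(f"RMSE: {rmse:.2f}  MAE: {mae:.2f}")
```

Measuring error on rows the model never saw during fitting gives a more honest picture of how it may behave on new data.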

# Diamonds

Create a Multi-Class Classification model on the Demo Diamonds dataset.

1\. If you want to use Graphite Note demo datasets, click "Import DEMO Dataset".

2\. Select the dataset you want to use to create a machine learning model. In this case, we will select the Diamonds dataset to create a Multi-Class analysis on diamond characteristics data.

3\. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.

4\. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.

5\. To create a new model, click on "Models" in the Graphite Note main menu.

6\. You will get a list of available models. Click on "New Model" to create a new one.

7\. Select a model type from our templates. In our case, we will select "Multi-Class Classification" by double-clicking on its name.

8\. Select the dataset you want to use to produce the model. We will use "Demo-Diamonds.csv".

9\. Name your new model. We will call it "Multi-Class Classification on Demo-Diamonds".

10\. Write a description of the model and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.

11\. Click "Create" to create your demo model environment.

12\. To set up a Multi-Class model, you first need to define the "Target Feature". That is the text column from your dataset that you'd like to make predictions about. In the case of Multi-Class on the Diamonds dataset, the target feature is the "Cut" column.

13\. Click "Next" to get the list of model features that will be included in the scenario. The model relies on each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.

14\. To start training the model, click "Run Scenario". This will take a sample of 80% of your data and train several machine learning models.

15\. Wait for a few moments and Voilà! Your Multi-Class Classification model is trained. Click on the "Performance" tab to get model insights and view the Key Drivers.

16\. Explore the Multi-Class model by clicking on Impact Analysis, Model Fit, Accuracy Overview, or Training Results to get more insights on how the model is trained and set up.

17\. If you want to take your model into action, click on the "Predict" tab in the main model menu.

18\. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset to make predictions on the target column. In our case that is "Cut". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.

19\. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!
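Step 14 mentions that training takes a sample of 80% of your data. For readers curious what such a split looks like, here is a minimal sketch. The exact sampling procedure used by Graphite Note is not described here, so the random shuffle, the fixed seed, and the function name `split_rows` are assumptions made for illustration.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and split it into training and held-out parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 dataset rows
train, held_out = split_rows(rows)
print(len(train), len(held_out))
```

The held-out rows are the ones a model has not seen during training, which is what makes them useful for judging how well it generalizes.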

# eCommerce Orders

Create RFM Customer Segmentation on Demo eCommerce Orders dataset

1\. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".

2\. Select the dataset you want to use to create a machine learning model. In this case, we will select the eCommerce Orders dataset to create an RFM Customer Segmentation (Recency, Frequency, Monetary Value) analysis on ecommerce orders data.

3\. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.

4\. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.

5\. To create a new model, click on "Models" in the Graphite Note main menu.

6\. You will get a list of available models. Click on "New Model" to create a new one.

7\. Select a model type from our templates. In our case, we will select "RFM Customer Segmentation" by double-clicking on its name.

8\. Select the dataset you want to use to produce the model. We will use "Demo-eCommerce-Orders.csv".

9\. Name your new model. We will call it "RFM customer segmentation on Demo-eCommerce-Orders".

10\. Write a description of the model and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.

11\. Click "Create" to create your demo model environment.

12\. Click this text field.

13\. To set up the RFM model, you first need to identify and define a few parameters. These are: "Time/Date Column", "Customer ID", "Customer Name" (optional), and "Monetary" (amount spent). In our case, we will select "created\_at" as the date, "user\_id" as the customer, and "total" as the monetary parameter.

14\. To start training the model, click "Run scenario".

15\. Wait for a few moments and Voilà! Your RFM Customer Segmentation model is trained. Click on the "Results" tab to get model insights.

16\. You can navigate over different tabs to get deep insights into RFM analysis from different perspectives: Recency, Frequency, Monetary.

17\. The "RFM Scores" tab shows a detailed explanation of the different scores, along with RFM segments and descriptions.

18\. The "RFM Analysis" tab gives you more details on the different segments.

19\. The "RFM Matrix" tab shows the number of customers belonging to each RFM segment. You can export the matrix data to use Customer IDs for different business actions (e.g. exporting a list of customers who are about to churn).
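To make the three RFM measures concrete, the sketch below computes recency, frequency, and monetary value per customer from a handful of orders. It reuses the column names selected in the tutorial ("created\_at", "user\_id", "total"), but the order rows are hypothetical, and the scoring and segment rules that Graphite Note applies on top of these measures are not reproduced here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical orders using the column names chosen in the tutorial.
orders = [
    {"user_id": 1, "created_at": date(2024, 1, 5), "total": 40.0},
    {"user_id": 1, "created_at": date(2024, 3, 1), "total": 60.0},
    {"user_id": 2, "created_at": date(2023, 11, 20), "total": 15.0},
]


def rfm_table(orders, as_of):
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for order in orders:
        customer = order["user_id"]
        frequency[customer] += 1            # number of orders
        monetary[customer] += order["total"]  # total amount spent
        if customer not in last_order or order["created_at"] > last_order[customer]:
            last_order[customer] = order["created_at"]
    return {
        customer: (
            (as_of - last_order[customer]).days,  # days since the latest order
            frequency[customer],
            monetary[customer],
        )
        for customer in last_order
    }


table = rfm_table(orders, as_of=date(2024, 3, 31))
for customer, (recency, freq, spent) in table.items():
    print(customer, recency, freq, spent)
```

A low recency with high frequency and monetary values typically points to the most engaged customers, while a high recency suggests a customer who may be drifting away.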


# Housing Prices

Create a Regression model on Demo Housing Prices dataset

1\. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".

2\. Select the dataset you want to use to create a machine learning model. In this case, we will select the Housing-Prices dataset to create a "Regression Analysis" on historical house price data.

3\. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.

4\. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.

5\. To create a new model, click on "Models" in the Graphite Note main menu.

6\. You will get a list of available models. Click on "New Model" to create a new one.

7\. Select the model type from our templates. In our case, we will select "Regression" by double-clicking on its name.

8\. Select the dataset you want to use to produce the model. We will use "Demo-Housing-Prices.csv".

9\. Name your new model. We will call it "Regression on Demo-Housing-Prices".

10\. Write a description of the model and select a tag. If you want to, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.

11\. Click "Create" to create your demo model environment.

12\. To set up a Regression model, you first need to define the "Target Feature". That is a numeric column from your dataset that you'd like to make predictions about. In the case of Regression on the Demo Housing Prices dataset, the target feature is the "Price" column.

13\. Click "Next" to get the list of model features that will be included in the scenario. The model relies on each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.

14\. To start training the model, click "Run scenario". This will take a sample of 80% of your data and train several machine learning models.

15\. Wait for a few moments and Voilà! Your Regression model is trained. Click on the "Performance" tab to get model insights and view the Key Drivers.

16\. Explore the Regression model by clicking on Impact Analysis, Model Fit, and Training Results to get more insights on how the model is trained and set up.

17\. If you want to take your model into action, click on the "Predict" tab in the main model menu.

18\. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset that the model will use to make predictions on the target column. In our case, that is "Price". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.

19\. Use your model often to predict future behaviour, and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!


# Lead Scoring

Binary Classification Model on Demo Lead Scoring dataset.

Get an overview of Lead Scoring demo dataset and how it can be used to create your new Graphite Note model in this video:

{% embed url="" %}

Or follow instructions below to get step by step guidance on how to use Lead Scoring demo dataset:

1\. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".

2\. Select the dataset you want to use to create the machine learning model. In this case, we will select **"Lead Scoring dataset"** to create **binary classification analysis** on potential customer interactions data.

3\. Once selected, the demo dataset will load directly to your account. The Dataset view will automatically open.

4\. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Then explore the dataset details on the Summary tab.

5\. Click "Models".

6\. You will get a list of available models. Click on "New Model" to create a new one.

7\. Select the model type from our templates. In our case, we will **select "Binary Classification"** by double-clicking on its name.

8\. Select the dataset you want to use to produce the model. We will use "Demo-Lead-Scoring.csv".

9\. Name your new model. We will call it "Binary Classification on Demo-Lead-Scoring".

10\. Write a description of the model and select a tag. If you want to, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.

11\. Click "Create" to create your demo model environment.

12\. To set up a Binary Classification model, firstly, you need to define the "Target Feature". That is a binary column from your dataset that you'd like to make predictions about. In the case of Binary Classification on a Lead Scoring dataset, the target feature will be the "Converted" column.

13\. Click "Next" to get the list of model features that will be included in the model scenario. The model relies upon each column (feature) to make accurate predictions. When training the model, it will calculate which of the features are most important and behave as Key Drivers.

14\. To start training the model, click "Run Scenario". This will take a sample of 80% of your data and train several machine learning models.

15\. Wait for a few moments and Voilà! Your Binary Classification model is trained. Click on the "Performance" tab to get model insights and to view the Key Drivers.

16\. Explore the "Binary Classification" model by clicking on Impact Analysis and Training Results to get more insights on how the model is trained.

17\. If you want to turn your model into action, click on the "Predict" tab in the main model menu.

18\. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset to make predictions on the target column. In our case that is "Converted". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.

19\. Use your model often to predict future behaviour, and to learn which key drivers are impacting outcomes. The more you use and retrain your model, the smarter it becomes!
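Step 12 requires the target feature to be a binary column. If you prepare your own dataset, a quick local check like the sketch below can confirm that the target holds exactly two distinct values. The sample rows are hypothetical, and the helpers `target_values` and `is_binary_target` are illustrative names rather than part of Graphite Note.

```python
import csv
import io


def target_values(csv_text, target_column):
    """Return the sorted distinct non-empty values found in the target column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = set()
    for row in reader:
        value = (row.get(target_column) or "").strip()
        if value:  # skip blank cells
            values.add(value)
    return sorted(values)


def is_binary_target(csv_text, target_column):
    """A binary target has exactly two distinct values."""
    return len(target_values(csv_text, target_column)) == 2


# Hypothetical sample using the tutorial's target column name.
sample = "LeadID,Converted\nL1,Yes\nL2,No\nL3,Yes\n"

print(target_values(sample, "Converted"))
print(is_binary_target(sample, "Converted"))
```

If the check finds three or more values, stray spellings such as "yes" and "Yes" are a common cause worth looking for.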


# Mall Customers

Create General Segmentation on Demo Mall Customers dataset

1\. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".

2\. Select the dataset you want to use to create your machine learning model. In this case, we will select "Mall Customers dataset", to create General Segmentation analysis on customer engagement data.

3\. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.

4\. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.

5\. To create a new model, click on "Models" in the Graphite Note main menu.

6\. You will get a list of available models. Click on "New Model" to create a new one.

7\. Select the model type from our templates. In our case, we will select "General Segmentation" by double-clicking on its name.

8\. Select the dataset you want to use to produce the model. We will use "Demo-Mall-Customers.csv".

9\. Name your new model. We will call it "General Segmentation on Demo-Mall-Customers".

10\. Write a description of the model and select a tag. If you want to, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.

11\. Click "Create" to create your demo model environment.

12\. To set up your General Segmentation model, you first need to define the "Feature" columns. These are the numeric column (or columns) from your dataset on which the segmentation will be based. In the case of General Segmentation on the Mall Customers dataset, the numeric features will be the "Age", "Annual Income" and "Spending Score" columns.

13\. To start training the model, click "Run Scenario". This will train your model based on the uploaded dataset.

14\. Wait for a few moments and Voilà! Your General Segmentation model is trained. Click on the "Results" tab to get model insights and explore segmentation clusters.

15\. Navigate over the different tabs to get insights, from the high-level "Cluster Summary" to the "By Cluster" charts and tables.

16\. Use "Cluster Visualisations" to view the scatter plot visualisations of cluster members.
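To give a feel for how segmentation groups customers by numeric features, here is a small k-means sketch in plain Python. This section does not state which clustering algorithm Graphite Note uses, so k-means is an assumption chosen for illustration, and the six customer rows (Age, Annual Income, Spending Score) are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points with plain k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


# Hypothetical customers: (Age, Annual Income, Spending Score).
customers = [
    (20, 15, 80), (22, 17, 78), (21, 16, 82),
    (55, 90, 15), (58, 95, 12), (60, 88, 18),
]

labels = kmeans(customers, k=2)
print(labels)
```

The sketch leaves the features on their original scales. In practice, features with very different ranges are often rescaled before clustering so that no single column dominates the distance.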

# Marketing Mix

Create a Regression model on Demo Marketing Mix dataset

1\. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".

2\. Select the dataset you want to use to create your machine learning model. In this case, we will select the **MMM dataset** to create a **Regression analysis on marketing mix and sales data**.

3\. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.

4\. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.

5\. To create a new model, click on "Models" in the Graphite Note main menu.

6\. You will get a list of available models. Click on "New Model" to create a new one.

7\. Select your model type from our templates. In our case, we will select "Regression" by double-clicking on its name.

8\. Select the dataset you want to use to produce the model. We will use "Demo-MMM.csv".

9\. Name your new model. We will call it "Regression on Demo-MMM".

10\. Write a description of the model and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.

11\. Click "Create" to create your demo model environment.

12\. To set up your Regression model, you first need to define the "Target Feature". That is a numeric column from your dataset that you'd like to make predictions about. In the case of **Regression on the Marketing Mix and Sales dataset**, the target feature is the "Sales" column.

13\. Click "Next" to get the list of model features that will be included in the scenario. The model relies on each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.

14\. To start training the model, click "Run Scenario". This will take a sample of 80% of your data and train several machine learning models.

15\. Wait for a few moments and Voilà! Your Regression model is trained. Click on the "Performance" tab to get model insights and view the Key Drivers.

16\. Explore the Regression model by clicking on Impact Analysis, Model Fit, and Training Results to get more insights on how the model is trained and set up.

17\. If you want to take your model into action, click on the "Predict" tab in the main model menu.

18\. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset that the model will use to make predictions on the target column. In our case, that is "Sales". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.

19\. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!
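The idea behind a What-If analysis is to change one or more inputs and compare the new prediction with the baseline. The sketch below illustrates only that idea. The `predict_sales` function is a hypothetical stand-in with made-up coefficients, not the model Graphite Note trains, and the column names `tv_spend` and `radio_spend` are assumptions rather than the actual columns of the demo dataset.

```python
def predict_sales(features):
    """Stand-in for a trained model: a hypothetical linear formula."""
    return 50.0 + 0.04 * features["tv_spend"] + 0.1 * features["radio_spend"]


def what_if(baseline, changes):
    """Return (baseline prediction, scenario prediction, difference)."""
    scenario = {**baseline, **changes}  # copy the baseline, then apply the changes
    before = predict_sales(baseline)
    after = predict_sales(scenario)
    return before, after, after - before


baseline = {"tv_spend": 1000.0, "radio_spend": 200.0}
before, after, delta = what_if(baseline, {"tv_spend": 1500.0})
print(f"baseline={before:.1f} scenario={after:.1f} change={delta:.1f}")
```

Only the changed input differs between the two predictions, so the difference can be attributed to that change under this simplified model.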


# Car Sales

Create Timeseries on Demo monthly car sales dataset

1\. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".

2\. Select the dataset you want to use to create your advanced analytics model. In this case, we will select the **Monthly Car Sales dataset** to create a "Timeseries Forecast" analysis on car sales data.

3\. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.

4\. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.

5\. To create a new model, click on "Models" in the Graphite Note main menu.

6\. You will get a list of available models. Click on "New Model" to create a new one.

7\. Select the model type from our templates. In our case, we will select "Timeseries Forecast" by double-clicking on its name.

8\. Select the dataset you want to use to produce the model. We will use "Demo-Monthly-Car-Sales.csv".

9\. Name your new model. We will call it **"Timeseries forecast on Demo-Monthly-Car-Sales"**.

10\. Write a description of the model and select a tag. If you want to, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.

11\. Click "Create" to create your demo model environment.

12\. To set up a Timeseries forecast analysis, you first need to define the "Target Column". That is a numeric column from your dataset that you'd like to forecast. In the case of Timeseries on the monthly car sales dataset, the target column is "Sales".

13\. If the dataset includes multiple time series sequences, you can select a field that will be used to uniquely identify each sequence. In the case of our demo dataset, we will not apply the Sequence Identifier field, since we have only the "Sales" target column.

14\.

15\. Click "Next" to open "Time/Date Column" selection. Choose "Month" as date column.

16\. From the additional options below, choose "Monthly" as the time interval and define the "Forecast Horizon". We will set the forecast horizon to 6 months in the future.

17\. Click "Next" to activate the "Seasonality" options step. Here, you can define the seasonality specifics of your forecast. If the time interval is set to daily, you will also have "Advanced options" available on the next step.

18\. Click "Run Scenario" to train your timeseries forecast.

19\. Wait for a few moments and Voilà! Your Timeseries forecast is trained. Click on the "Performance" tab to get insights and view the graph with original (historical) and predicted model data.

20\. Explore more details on the "Trend", "Seasonality" and "Details" tabs.

21\. If you want to turn your model into action, click on the "Predict" tab in the main model menu.

22\. You can produce your own Forecast analysis based on the existing training results by selecting the Start and End dates from the drop-down calendar and clicking on the "Predict" button.

23\. Use your model often to predict future sales results. The more you use and retrain your model, the smarter it becomes!
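A forecast horizon of 6 months with a monthly time interval means the forecast covers the six months that follow the last month in your historical data. The sketch below lists those months. The last historical month used here is a hypothetical value, since the date range of the demo dataset is not stated in this guide, and `future_months` is an illustrative helper.

```python
from datetime import date


def future_months(last_month, horizon):
    """Return the first day of each of the next `horizon` months."""
    months = []
    year, month = last_month.year, last_month.month
    for _ in range(horizon):
        month += 1
        if month > 12:  # roll over into the next year
            month = 1
            year += 1
        months.append(date(year, month, 1))
    return months


# Hypothetical last month of history; the real dataset's range may differ.
forecast_months = future_months(date(2023, 10, 1), 6)
for m in forecast_months:
    print(m.strftime("%Y-%m"))
```

The same date range is what you would select with the Start and End dates on the "Predict" tab.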

# Store Item Demand

Create a Regression model on Demo Store Item Demand dataset

1\. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".

2\. Select the dataset you want to use to create your machine learning model. In this case, we will select the Store Item Demand dataset to create a Regression analysis on sales data across store locations.

3\. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.

4\. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.

5\. To create a new model, click on "Models" in the Graphite Note main menu.

6\. You will get a list of available models. Click on "New Model" to create a new one.

7\. Select the model type from our templates. In our case, we will select "Regression" by double-clicking on its name.

8\. Select the dataset you want to use to produce the model. We will use "Demo-Store-Item-Demand.csv".

9\. Name your new model. We will call it "Regression on Demo-Store-Item-Demand".

10\. Write a description of the model and select a tag. If you want to, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.

11\. Click "Create" to create your demo model environment.

12\. To set up a **Regression model**, you first need to define the "Target Feature". That is the numeric column from your dataset that you'd like to make predictions about. In the case of Regression on the Store Item Demand dataset, the target feature is the "Sales" column.

13\. Click "Next" to get the list of model features that will be included in the scenario. The model relies on each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.

14\. To start training your model, click "Run scenario". This will take a sample of 80% of your data and train several machine learning models.

15\. Wait for a few moments and Voilà! Your Regression model is trained. Click on the "Performance" tab to get model insights and view the Key Drivers.

16\. Explore the Regression model by clicking on Impact Analysis, Model Fit, and Training Results to get more insights on how the model is trained and set up.

17\. If you want to take your model into action, click on the "Predict" tab in the main model menu.

18\. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset to make predictions on the target column. In our case that is "Sales". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.

19\. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!

# Upsell

Create Binary Classification model on Demo Upsell dataset

1\. If you want to use Graphite Note demo datasets click "Import DEMO Dataset".

2\. Select the dataset you want to use to create your machine learning model. In this case, we will select the Upsell dataset to create a binary classification analysis on data about additional purchases by customers.

3\. Once selected, the demo dataset will load directly to your account. The dataset view will automatically open.

4\. Adjust your dataset options on the Settings tab. Click the Columns tab to view the list of available columns with their corresponding data types. Explore the dataset details on the Summary tab.

5\. To create a new model in the Graphite Note main menu, click on "Models"

6\. You will get a list of available models. Click on "New Model" to create a new one.

7\. Select the model type from our templates. In our case, we will select "Binary Classification" by double-clicking on its name.

8\. Select the dataset you want to use to produce the model. We will use "Demo-Upsell.csv".

9\. Name your new model. We will call it "Binary Classification on Demo-Upsell".

10\. Write a description of the model and select a tag. If you want, you can also create a new tag from the pop-up "Tags" window that will appear on the screen.

11\. Click "Create" to create your demo model environment.

12\. To set up a Binary Classification model, you first need to define the "Target Feature". That is a binary column from your dataset that you'd like to make predictions about. In the case of Binary Classification on the Upsell dataset, the target feature will be the "Applied" column.

13\. Click "Next" to get the list of model features that will be included in the model scenario. Your model relies on each column (feature) to make accurate predictions. When training the model, we will calculate which of the features are most important and behave as Key Drivers.

14\. To start training your model, click "Run scenario". This will take a sample of 80% of your data and train several machine learning models.

15\. Wait for a few moments and Voilà! Your Binary Classification model is trained. Click on the "Performance" tab to get model insights and view the Key Drivers.

16\. Explore the Binary Classification model by clicking on Impact Analysis and Training Results to get more insights on how the model is trained.

17\. If you want to take your model into action, click on the "Predict" tab in the main model menu.

18\. You can produce your own What-If analysis based on existing training results. You can also import a fresh CSV dataset to make predictions on the target column. In our case that is "Applied". Keep in mind, the dataset you are uploading needs to contain the same feature columns as your model.

19\. Use your model often to predict future behaviour and to learn which key drivers are impacting the outcomes. The more you use and retrain your model, the smarter it becomes!


# What Dataset do I need for my use case?

The "What Dataset Do I Need?" section of Graphite Note is a comprehensive resource designed to guide users through the intricacies of dataset selection and preparation for various machine learning models. This section is crucial for users, especially those without extensive AI expertise, as it provides clear, step-by-step instructions and examples on how to curate and structure data for different predictive analytics scenarios.

**Key Features of the Section**

1. **Model-Specific Guidance:** Each page within this section is tailored to a specific predictive model, such as cross-selling prediction, churn prediction, or customer segmentation. It outlines the type of data required, the format, and how to interpret and use the data effectively.
2. **Sample Datasets and Templates:** To make the process more user-friendly, the section includes sample datasets and templates. These examples showcase the necessary columns and data types, along with a brief explanation of each, helping users to model their datasets accurately.
3. **Target Column Identification:** A crucial aspect of preparing a dataset for machine learning is identifying the target column. This section provides clear guidance on selecting the appropriate target for different types of analyses, whether it's for classification, regression, or clustering.
4. **Data Cleaning and Preparation Tips:** Recognizing that data rarely comes in a ready-to-use format, this section offers valuable tips on cleaning and preparing data, ensuring that users start their predictive analytics journey on the right foot.
5. **Real-World Applications and Use Cases:** To bridge the gap between theory and practice, the section includes examples of real-world applications and use cases. This approach helps users understand how their data preparation efforts translate into actionable insights in various business contexts.

# Predict Cross Selling: Dataset

The Predict Cross Selling problem is a common challenge faced by businesses looking to maximize their sales opportunities by identifying additional products or services that a customer is likely to purchase. This predictive model falls under the multi-class classification category, where the objective is to predict the likelihood of a customer buying various products, based on their past purchasing behavior and other relevant data.

**Dataset Essentials for Cross Selling**

To effectively train a machine learning model for cross selling, you need a well-structured dataset that includes:

* **Customer Demographics:** Information like age, gender, and income, which can influence purchasing decisions.
* **Purchase History:** Detailed records of past purchases, indicating which products a customer has bought.
* **Engagement Metrics:** Data on customer interactions with marketing campaigns, website visits, and other engagement indicators.
* **Product Details:** Information about the products, such as category, price, and any special features.

A typical dataset might look like this:

| CustomerID | Age | Gender | AnnualIncome | Product1\_Purchased | Product2\_Purchased | Product3\_Purchased | ... | Target\_Product |
| ---------- | --- | ------ | ------------ | ------------------- | ------------------- | ------------------- | --- | --------------- |
| 1001       | 28  | M      | 50000        | Yes                 | No                  | Yes                 | ... | Product2        |
| 1002       | 34  | F      | 65000        | No                  | Yes                 | No                  | ... | Product3        |
| 1003       | 45  | M      | 80000        | Yes                 | Yes                 | Yes                 | ... | Product4        |
| 1004       | 30  | F      | 54000        | No                  | No                  | Yes                 | ... | Product1        |
| 1005       | 50  | M      | 62000        | Yes                 | No                  | No                  | ... | Product2        |

**Target Column:** The **Target\_Product** column is crucial as it represents the product that the model will predict the customer is most likely to purchase next.

**Steps to Success with Graphite Note**

1. **Data Collection:** Gather comprehensive, clean, and well-structured data.
2. **Feature Selection:** Identify the most relevant features that could influence the model's predictions.
3. **Model Training:** Utilize Graphite Note's intuitive platform to train your multi-class classification model.
4. **Evaluation and Iteration:** Continuously assess and refine the model for better accuracy and relevance.

**The Advantage of Predict Cross Selling**

* **Enhanced Customer Experience:** By understanding customer preferences, businesses can offer more personalized recommendations.
* **Increased Sales Opportunities:** Identifying potential cross-sell products can significantly boost sales.
* **Data-Driven Decision Making:** Removes guesswork from marketing and sales strategies, relying on data-driven insights.
* **Accessibility:** With Graphite Note, even non-technical users can build and deploy these models, making advanced analytics accessible to all.

In conclusion, the Predict Cross Selling model is a powerful tool in the arsenal of any business looking to enhance its sales strategy. With Graphite Note, this complex task becomes manageable, allowing businesses to leverage their data for maximum impact.

# Predict Customer Churn: Dataset

Predicting customer churn is a critical challenge for businesses aiming to retain their customers and reduce turnover. This problem typically involves a binary classification model, where the goal is to predict whether a customer is likely to leave or discontinue their use of a service or product in the near future.

**Dataset Essentials for Customer Churn Prediction**

A well-structured dataset is key to accurately predicting customer churn. Essential data elements include:

* **Customer Demographics:** Age, gender, and other demographic factors that might influence customer loyalty.
* **Usage Patterns:** Data on how frequently and in what manner customers use the product or service.
* **Customer Service Interactions:** Records of customer support interactions, complaints, and resolutions.
* **Transaction History:** Details of customer purchases, payment methods, and transaction frequency.
* **Engagement Metrics:** Measures of customer engagement, such as email opens, website visits, or app usage.

A typical dataset for churn prediction might look like this:

| CustomerID | Age | Gender | AnnualIncome | MonthlyUsage | SupportCalls | LastPurchase | Churned |
| ---------- | --- | ------ | ------------ | ------------ | ------------ | ------------ | ------- |
| 2001       | 32  | F      | 58000        | 20 hours     | 2            | 30 days ago  | No      |
| 2002       | 40  | M      | 72000        | 15 hours     | 0            | 60 days ago  | Yes     |
| 2003       | 25  | F      | 45000        | 35 hours     | 3            | 10 days ago  | No      |
| 2004       | 29  | M      | 50000        | 25 hours     | 1            | 45 days ago  | No      |
| 2005       | 47  | F      | 65000        | 10 hours     | 4            | 90 days ago  | Yes     |

**Target Column:** The **Churned** column is the target variable, indicating whether the customer has churned (Yes) or not (No).

**Steps to Success with Graphite Note**

1. **Data Gathering:** Collect comprehensive and relevant customer data. 2. **Feature Engineering:** Identify and create features that are most indicative of churn. 3. **Model Training:** Use Graphite Note to train a binary classification model on your dataset. 4. **Model Evaluation:** Test the model's performance and refine it for better accuracy. **Benefits of Predicting Customer Churn** * **Proactive Customer Retention:** Identifying at-risk customers allows businesses to take proactive steps to retain them. * **Improved Customer Experience:** Insights from churn prediction can guide improvements in products and services. * **Cost Efficiency:** Retaining existing customers is often more cost-effective than acquiring new ones. * **Accessible Analytics:** Graphite Note's no-code platform makes predictive analytics accessible, enabling businesses of all sizes to leverage AI for customer retention. In summary, the Predict Customer Churn model is an invaluable tool for businesses focused on customer retention. Through Graphite Note, this advanced predictive capability becomes accessible to businesses without the need for extensive technical expertise, allowing them to make informed, data-driven decisions for customer retention strategies. # Predictive Lead Scoring: Dataset Predictive Lead Scoring is a technique used to rank leads in terms of their likelihood to convert into customers. This approach typically employs a binary classification model, where each lead is classified as 'high potential' or 'low potential' based on various attributes and behaviors. **Dataset Essentials for Predictive Lead Scoring** To effectively implement Predictive Lead Scoring, a dataset with the following elements is essential: * **Lead Demographics:** Information such as age, location, and job title. * **Engagement Metrics:** Data on how the lead interacts with your business, like website visits, email opens, and download history. 
* **Lead Source:** The origin of the lead, such as organic search, referrals, or marketing campaigns.
* **Previous Interactions:** History of past interactions, including calls, emails, or meetings.
* **Purchase History:** If applicable, details of past purchases or subscriptions.

An example dataset for Predictive Lead Scoring might look like this:

| LeadID | Age | Location | Job Title | Website Visits | Email Opens | Lead Source | Past Purchases | Converted |
| ------ | --- | -------- | --------- | -------------- | ----------- | ----------- | -------------- | --------- |
| L1001  | 30  | NY       | Manager   | 10             | 5           | Organic     | 0              | Yes       |
| L1002  | 42  | CA       | Analyst   | 3              | 2           | Referral    | 1              | No        |
| L1003  | 35  | TX       | Developer | 8              | 7           | Campaign    | 2              | Yes       |
| L1004  | 28  | FL       | Designer  | 5              | 3           | Organic     | 0              | No        |
| L1005  | 45  | WA       | Executive | 12             | 10          | Email       | 3              | Yes       |

**Target Column:** The **Converted** column is the target variable. It indicates whether the lead converted to a customer.

**Steps to Success with Graphite Note**

1. **Data Collection:** Gather detailed and relevant data on leads. 2. **Feature Selection:** Choose the most predictive features for lead scoring. 3. **Model Training:** Utilize Graphite Note to train a binary classification model. 4. **Model Evaluation:** Test and refine the model for optimal performance. **Benefits of Predictive Lead Scoring** * **Efficient Lead Management:** Prioritize leads with the highest conversion potential, optimizing sales efforts. * **Personalized Engagement:** Tailor interactions based on the lead's predicted preferences and potential. * **Resource Optimization:** Allocate marketing and sales resources more effectively. * **Accessible Analytics:** Graphite Note's no-code platform makes predictive lead scoring accessible to teams without deep technical expertise. In summary, Predictive Lead Scoring is a powerful tool for optimizing sales and marketing strategies. With Graphite Note, businesses can leverage advanced analytics to score leads effectively, enhancing their conversion rates and overall efficiency.
# Predict Revenue : Dataset

Predict Revenue is a critical task for businesses aiming to forecast future revenue streams accurately. This challenge is typically addressed using a time series forecasting model, which analyzes historical revenue data to predict future trends and patterns.

**Dataset Essentials for Predict Revenue**

A suitable dataset for Predict Revenue using time series forecasting should include:

* **Date/Time:** The timestamp of revenue data, usually in daily, weekly, or monthly intervals.
* **Revenue:** The total revenue recorded in each time period.
* **Seasonal Factors:** Data on seasonal variations or events that might affect revenue.
* **Economic Indicators:** Relevant economic factors that could influence revenue trends.
* **Marketing Spend:** Information on marketing and advertising expenditures, if applicable.

An example dataset for Predict Revenue with time series forecasting might look like this:

| Date       | Total Revenue | Seasonal Event | Economic Indicator | Marketing Spend |
| ---------- | ------------- | -------------- | ------------------ | --------------- |
| 2021-01-01 | $10,000       | New Year       | Stable             | $2,000          |
| 2021-01-08 | $12,000       | None           | Stable             | $2,500          |
| 2021-01-15 | $15,000       | None           | Growth             | $3,000          |
| 2021-01-22 | $13,000       | None           | Growth             | $2,800          |
| 2021-01-29 | $11,000       | None           | Stable             | $2,200          |

**Target Column:** The **Total Revenue** column is the primary focus, as the model aims to forecast future values in this series.

**Steps to Success with Graphite Note**

1. **Data Collection:** Compile historical revenue data along with any relevant external factors. 2. **Time Series Analysis:** Utilize Graphite Note to analyze the time series data and identify patterns. 3. **Model Training:** Train a time series forecasting model using the platform. 4. **Model Evaluation:** Continuously evaluate and adjust the model based on new data and changing market conditions. **Benefits of Predict Revenue with Time Series Forecasting** * **Accurate Financial Planning:** Enables more precise budgeting and financial planning. * **Strategic Decision Making:** Informs strategic decisions with insights into future revenue trends. * **Adaptability to Market Changes:** Helps businesses adapt strategies in response to predicted market changes. * **User-Friendly Analytics:** Graphite Note's no-code approach makes sophisticated time series forecasting accessible to users without specialized statistical knowledge. In summary, Predict Revenue with time series forecasting is an essential tool for businesses to anticipate future revenue trends. Graphite Note simplifies this complex task, allowing businesses to leverage their historical data for insightful and actionable revenue predictions. # Product Demand Forecast: Dataset Product Demand Forecast is a crucial process for businesses to predict future demand for their products. This task typically involves time series forecasting models, which analyze historical sales data to forecast future demand patterns. **Dataset Essentials for Product Demand Forecast** An effective dataset for Product Demand Forecast using time series forecasting should include: * **Date/Time:** The timestamp for each data point, typically daily, weekly, or monthly. * **Product Sales:** The number of units sold or the sales volume of each product. * **Product Features:** Characteristics of the product, such as category, price, or any special features. 
* **Promotional Activities:** Data on any marketing or promotional activities that might affect sales.
* **External Factors:** Information on external factors like market trends, economic conditions, or seasonal events.

An example dataset for Product Demand Forecast might look like this:

| Date       | ProductID | Sales Volume | Price | Promotion   | Market Trend | Seasonal Event |
| ---------- | --------- | ------------ | ----- | ----------- | ------------ | -------------- |
| 2021-01-01 | ProdA     | 150          | $20   | None        | Stable       | New Year       |
| 2021-01-08 | ProdB     | 200          | $25   | Discount    | Growing      | None           |
| 2021-01-15 | ProdC     | 180          | $30   | Ad Campaign | Declining    | None           |
| 2021-01-22 | ProdA     | 170          | $20   | None        | Stable       | None           |
| 2021-01-29 | ProdB     | 220          | $25   | Email Blast | Growing      | None           |

**Target Column:** The **Sales Volume** column is the primary focus, as the model aims to forecast future sales volumes for each product.

**Steps to Success with Graphite Note**

1. **Data Collection:** Gather detailed sales data along with product features and external factors. 2. **Time Series Analysis:** Use Graphite Note to analyze the sales data over time, identifying trends and patterns. 3. **Model Training:** Train a time series forecasting model on the platform. 4. **Model Evaluation:** Regularly evaluate the model's performance and adjust it based on new data and market changes. **Benefits of Product Demand Forecast** * **Inventory Management:** Helps in planning inventory levels to meet future demand, avoiding stockouts or overstock situations. * **Strategic Marketing:** Informs marketing strategies by predicting when demand for certain products will increase. * **Resource Allocation:** Assists in allocating resources efficiently based on predicted product demand. * **Accessible Forecasting:** Graphite Note's no-code platform makes advanced forecasting techniques accessible to a wider range of users. In summary, Product Demand Forecast is vital for businesses to anticipate market demand and plan accordingly. With Graphite Note, this complex analytical task becomes manageable, enabling businesses to leverage their data for effective demand planning and strategic decision-making. # Predictive Ads Performance: Dataset Predictive Ads Performance is a process where businesses forecast the effectiveness of their advertising campaigns, particularly focusing on metrics like clicks, conversions, or engagement. This task typically involves regression or classification models, depending on the specific goals of the prediction. **Dataset Essentials for Predictive Ads Performance** A comprehensive dataset for Predictive Ads Performance focusing on predicting clicks should include: * **Date/Time:** The timestamp for when the ad was run. * **Ad Characteristics:** Details about the ad, such as format, content, placement, and duration. 
* **Target Audience:** Information about the audience targeted by the ad, like demographics, interests, or behaviors.
* **Spending:** The amount spent on each ad campaign.
* **External Factors:** Any external factors that might influence ad performance, such as market trends or seasonal events.
* **Historical Performance Data:** Past performance metrics of similar ads.

An example dataset for Predictive Ads Performance with the target column being clicks might look like this:

| Date       | AdID | Format | Audience Age Group | Spending | Market Trend | Seasonal Event | Clicks |
| ---------- | ---- | ------ | ------------------ | -------- | ------------ | -------------- | ------ |
| 2021-01-01 | A101 | Video  | 18-25              | $500     | Stable       | New Year       | 300    |
| 2021-01-08 | A102 | Image  | 26-35              | $750     | Growing      | None           | 450    |
| 2021-01-15 | A103 | Banner | 36-45              | $600     | Declining    | None           | 350    |
| 2021-01-22 | A104 | Video  | 46-55              | $800     | Stable       | None           | 500    |
| 2021-01-29 | A105 | Image  | 18-25              | $700     | Growing      | None           | 600    |

**Target Column:** The **Clicks** column is the primary focus, as the model aims to forecast the number of clicks each ad will receive.

**Steps to Success with Graphite Note**

1. **Data Collection:** Compile detailed data on past ad campaigns, including spending, audience, and performance metrics. 2. **Feature Engineering:** Identify and create features that are most indicative of ad performance. 3. **Model Training:** Use Graphite Note, Regression Model, to train a model that can predict the number of clicks based on the ad characteristics and other factors. 4. **Model Evaluation:** Test the model's accuracy and refine it for better performance. **Benefits of Predictive Ads Performance** * **Optimized Ad Spending:** Predict which ads are likely to perform best and allocate budget accordingly. * **Targeted Campaigns:** Tailor ads to the audience segments most likely to engage. * **Performance Insights:** Gain insights into what makes an ad successful and apply these learnings to future campaigns. * **Accessible Analytics:** Graphite Note's no-code platform makes predictive analytics accessible, enabling businesses to leverage AI for ad performance prediction without needing deep technical expertise. In summary, Predictive Ads Performance is a valuable tool for businesses looking to maximize the impact of their advertising efforts. With Graphite Note, this advanced capability becomes accessible, allowing for data-driven decisions in ad campaign management.
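To make the regression idea concrete, here is a minimal Python sketch that fits a straight line to the **Spending** and **Clicks** columns of the example table. It is only an illustration of what a regression does: the single-feature least-squares fit and the helper names `fit_line` and `predict_clicks` are assumptions made for this example, not a description of how Graphite Note trains its Regression Model.

```python
# Illustrative sketch only: a one-feature least-squares fit on the example table.
# This is NOT Graphite Note's algorithm; helper names are hypothetical.

spending = [500, 750, 600, 800, 700]  # "Spending" column, in dollars
clicks = [300, 450, 350, 500, 600]    # "Clicks" column (the target)


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(spending, clicks)


def predict_clicks(spend):
    """Predict clicks for a given ad spend using the fitted line."""
    return slope * spend + intercept


print(f"clicks = {slope:.2f} * spending + {intercept:.1f}")
print(f"predicted clicks for $1,000 spend: {predict_clicks(1000):.1f}")
```

With only five rows and one feature, the fitted line is a teaching aid rather than a usable model; a real model would use all the ad characteristics and far more history.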
# Media Mix Modeling (MMM): Dataset

Media Mix Modeling (MMM) is a statistical analysis technique used to quantify the impact of various marketing channels on sales and other key performance indicators (KPIs). It helps businesses allocate their marketing budget more effectively by understanding the contribution of each channel to overall performance.

**Dataset Essentials for Media Mix Modeling**

A robust dataset for Media Mix Modeling should include:

* **Time Period:** The specific dates or periods for which the data is collected.
* **Marketing Spend:** The amount spent on each marketing channel during the period.
* **Sales Data:** The total sales achieved in the same time period.
* **Channel Performance Metrics:** Metrics like impressions, clicks, conversions, etc., for each channel.
* **External Factors:** Information on external factors like economic conditions, competitor activities, or seasonal events.
* **Market Dynamics:** Changes in market conditions, customer preferences, or product availability.

An example dataset for Media Mix Modeling might look like this:

| Time Period | TV Spend | Digital Spend | Radio Spend | Print Spend | Total Sales | Economic Condition | Seasonal Event |
| ----------- | -------- | ------------- | ----------- | ----------- | ----------- | ------------------ | -------------- |
| Jan 2021    | $20,000  | $15,000       | $5,000      | $3,000      | $100,000    | Stable             | New Year       |
| Feb 2021    | $25,000  | $18,000       | $4,000      | $3,500      | $120,000    | Growth             | Valentine's    |
| Mar 2021    | $22,000  | $20,000       | $6,000      | $4,000      | $110,000    | Stable             | None           |
| Apr 2021    | $18,000  | $17,000       | $5,500      | $4,500      | $105,000    | Declining          | Easter         |
| May 2021    | $20,000  | $19,000       | $7,000      | $4,000      | $115,000    | Growth             | Memorial Day   |

**Target Column:** The **Total Sales** column is the target variable.

**Steps to Success with Graphite Note**

1. **Data Compilation:** Gather comprehensive data across all marketing channels and corresponding sales data. 2. **Model Development:** Use Graphite Note, Regression Model, to develop a statistical model that correlates marketing spend across various channels with sales outcomes. 3. **Analysis and Insights:** Analyze the model's output to understand the effectiveness of each marketing channel. 4. **Strategic Decision Making:** Apply these insights to optimize future marketing spends and strategies. **Benefits of Media Mix Modeling** * **Optimized Marketing Budget:** Allocate marketing budgets more effectively across channels. * **ROI Analysis:** Understand the return on investment for each marketing channel. * **Strategic Planning:** Plan marketing strategies based on data-driven insights. * **Adaptability:** Adjust marketing strategies in response to changing market conditions and consumer behaviors. * **Accessible Advanced Analytics:** Graphite Note's no-code platform makes complex MMM accessible to teams without specialized statistical knowledge. In summary, Media Mix Modeling is a powerful tool for businesses to optimize their marketing strategies based on comprehensive data analysis. With Graphite Note, this advanced capability becomes accessible, allowing for more informed and effective marketing budget allocation. # Customer Lifetime Value Prediction : Dataset Customer Lifetime Value (CLV) prediction is a process used by businesses to estimate the total value a customer will bring to the company over their entire relationship. This prediction helps in making informed decisions about marketing, sales, and customer service strategies.

**Dataset Essentials for Customer Lifetime Value Prediction**

A suitable dataset for CLV prediction should include:

* **Date:** The date of each transaction or interaction with the customer.
* **Customer ID:** A unique identifier for each customer.
* **Monetary Spent:** The amount of money spent by the customer on each transaction.

An example dataset for Customer Lifetime Value prediction might look like this:

| Date       | Customer ID | Monetary Spent |
| ---------- | ----------- | -------------- |
| 2021-01-01 | C001        | $150           |
| 2021-01-15 | C002        | $200           |
| 2021-02-01 | C001        | $100           |
| 2021-02-15 | C003        | $250           |
| 2021-03-01 | C002        | $300           |

**Steps to Success with Graphite Note**

1. **Data Collection:** Compile transactional data including customer IDs and the amount spent.
2. **Data Analysis:** Use Graphite Note to analyze the data, focusing on customer purchase patterns and frequency.
3. **Model Training:** Train a model to predict the lifetime value of a customer based on their transaction history.

**Benefits of Predicting Customer Lifetime Value**

* **Targeted Marketing:** Focus marketing efforts on high-value customers.
* **Customer Segmentation:** Segment customers based on their predicted lifetime value.
* **Resource Allocation:** Allocate resources more effectively by focusing on retaining high-value customers.
* **Personalized Customer Experience:** Tailor customer experiences based on their predicted value to the business.
* **Strategic Decision-Making:** Make informed decisions about customer acquisition and retention strategies.

In summary, predicting Customer Lifetime Value is crucial for businesses to understand the long-term value of their customers. Graphite Note facilitates this process by providing a no-code platform for analyzing customer data and predicting their lifetime value, enabling businesses to make data-driven decisions in customer relationship management.
# RFM Customer Segmentation : Dataset

**RFM Customer Segmentation: An Overview**\
RFM (Recency, Frequency, Monetary) customer segmentation is a method businesses use to categorize customers based on their purchasing behavior. This approach helps personalize marketing strategies, improve customer engagement, and increase sales.

The segmentation is based on three criteria:

* **Recency**: How recently a customer made a purchase.
* **Frequency**: How often they make purchases.
* **Monetary Value**: How much money they spend.

**Essential Dataset Components for RFM Segmentation**\
A robust dataset for effective RFM segmentation includes the following key elements:

1. **Date (Recency):** The date of each customer transaction. The most recent date per customer determines the 'Recency' aspect of RFM, and the number of transactions determines 'Frequency'.
2. **Customer ID:** A unique identifier for each customer, crucial for tracking individual purchasing behaviors.
3. **Monetary Spent (Monetary Value):** The amount spent by the customer in each transaction, used to evaluate the 'Monetary' component of RFM.

**Example Dataset for RFM Customer Segmentation**

| Date       | Customer ID | Monetary Spent |
| ---------- | ----------- | -------------- |
| 2021-01-01 | C001        | $150           |
| 2021-01-15 | C002        | $200           |
| 2021-02-01 | C001        | $100           |
| 2021-02-15 | C003        | $250           |
| 2021-03-01 | C002        | $300           |

**Steps to Success with Graphite Note for RFM Segmentation**

1. **Data Collection:** Gather comprehensive data including customer IDs, transaction dates, and amounts spent.
2. **Data Analysis:** Utilize Graphite Note to dissect the data, focusing on recency, frequency, and monetary values of customer transactions.
3. **Segmentation Modeling:** Employ models to segment customers based on RFM criteria, facilitating targeted marketing strategies.

**Benefits of RFM Segmentation Using Graphite Note**

* **Enhanced Marketing Strategies:** Tailor marketing campaigns based on customer segments.
* **Improved Customer Engagement:** Customize interactions based on individual customer behaviors.
* **Efficient Resource Allocation:** Focus efforts on the most profitable customer segments.
* **Strategic Business Decisions:** Make informed choices regarding customer relationship management and retention strategies.
In conclusion, RFM Customer Segmentation is a powerful approach for businesses seeking to understand and cater to their customers more effectively. Graphite Note offers a no-code platform that simplifies the analysis of customer data for RFM segmentation, enabling businesses to leverage their data for strategic advantage in customer engagement and retention.
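For readers who want to see how the three RFM values come out of a transaction table, the following Python sketch computes them for the example dataset. It is a simplified illustration, not Graphite Note's implementation: the function name `rfm_table` is hypothetical, and measuring recency in days from the latest transaction in the data is an assumption made for this example.

```python
from datetime import date

# Rows from the example table: (Date, Customer ID, Monetary Spent in dollars).
transactions = [
    ("2021-01-01", "C001", 150.0),
    ("2021-01-15", "C002", 200.0),
    ("2021-02-01", "C001", 100.0),
    ("2021-02-15", "C003", 250.0),
    ("2021-03-01", "C002", 300.0),
]


def rfm_table(rows, reference_date=None):
    """Aggregate transactions into recency (days), frequency, and monetary per customer.

    Assumption: if no reference date is given, recency is measured from the
    latest transaction date found in the data.
    """
    parsed = [(date.fromisoformat(day), customer, amount) for day, customer, amount in rows]
    if reference_date is None:
        reference_date = max(day for day, _, _ in parsed)

    summary = {}
    for day, customer, amount in parsed:
        entry = summary.setdefault(customer, {"last": day, "frequency": 0, "monetary": 0.0})
        entry["last"] = max(entry["last"], day)   # most recent purchase so far
        entry["frequency"] += 1                   # number of transactions
        entry["monetary"] += amount               # total amount spent

    return {
        customer: {
            "recency_days": (reference_date - entry["last"]).days,
            "frequency": entry["frequency"],
            "monetary": entry["monetary"],
        }
        for customer, entry in summary.items()
    }


rfm = rfm_table(transactions)
for customer, values in sorted(rfm.items()):
    print(customer, values)
```

Segmentation then groups customers by these three values, for example by ranking each value and combining the ranks; the ranking scheme is a separate design choice not shown here.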
# Dataset examples - from online sources Data is an essential component of any data modeling and analysis process. The kind of data you need for modeling depends on the specific problem you are trying to solve. In general, the data should be relevant, accurate, and consistent, and it should cover a significant period. In some cases, you may also need to preprocess or transform the data to make it suitable for modeling. If you are new to using Graphite Note or are looking for some examples to practice with, there are several popular datasets available that you can explore. Some examples include weather data, financial data, social media data, and sensor data. These datasets are often available in open-source repositories or can be downloaded from public sources, such as government websites, social media platforms, or financial databases.

Graphite Note is a powerful tool that allows you to predict, visualize and analyze data in real-time. With the right dataset, you can use Graphite Note to gain valuable insights and make informed decisions about your business or research. Whether you are analyzing financial data to predict market trends or monitoring sensor data to optimize your production processes, our platform can help you make sense of your data and identify patterns that would be difficult to detect otherwise. While the kind of data you need may vary depending on your specific needs, there are several popular datasets that you can use to practice and explore the capabilities of Graphite Note. With the right dataset and a solid understanding of data modeling and analysis, you can unlock the full potential of Graphite Note and gain insights that will drive your business or research forward. We have highlighted a few popular datasets so you can get to know our platform better. After that, it's all up to you - collect your data and start having insights and fun! Explore all Graphite no-code machine learning Models [here](https://graphite-note.com/no-code-machine-learning-models). Explore the most popular Use Cases [here](https://graphite-note.com/use-cases-no-code-machine-learning). ## 1. [Lead Scoring Dataset](https://graphite-note.com/wp-content/uploads/2024/03/Demo-Graphite-Note-dataset-Lead-Scoring.csv) ( source: Kaggle) An education company named “X Education” sells online courses to industry professionals. Many professionals interested in the courses land on their website and browse for courses on any given day—an excellent dataset for Binary Classification, with a target column "**Converted**" (YES/NO). #### 1.1 Usage Use Graphite Note to gain valuable insights into your sales pipeline by identifying which leads are converting to customers and the factors that contribute to their success. With this information, you can optimize your sales strategy and improve your overall conversion rates. 
In addition, our tool can also help you predict which new leads are most likely to convert to customers and provide a probability score for each lead. This can enable you to prioritize your sales efforts and focus on the leads with the highest conversion potential. By leveraging our tool, you can gain a deeper understanding of your sales funnel and take proactive steps to improve your conversion rates, reduce churn, and increase revenue. #### 1.2 Model Type
Binary Classification

#### 1.3 How to Build This Model To get started, download the provided dataset and upload it to Graphite Note. Once uploaded, create a new Binary Classification model in Graphite Note with the 'Converted' variable as the Target Variable. This will allow you to predict which leads are most likely to convert to customers. After training the model, explore the insights that it provides, such as the most important features for predicting conversion and the distribution of conversion probabilities. This can help you to gain a better understanding of the factors that contribute to lead conversion and make informed decisions about your sales strategy. #### 1.4 What-if? Finally, you can use the model to run a "what-if" scenario by predicting the conversion probability for new leads based on different scenarios or assumptions. This can help you to forecast the impact of changes in your sales approach or marketing efforts and make data-driven decisions. By following these steps, you can leverage Graphite Note and the provided dataset to gain valuable insights into your sales pipeline, predict lead conversion, and optimize your sales strategy for better results. [Predictive Lead Scoring Live Demo](https://app.graphite-note.com/#/public/notebook/86228b91572a) ## 2. [Customer Churn Dataset](https://graphite-note.com/wp-content/uploads/2024/03/Demo-Graphite-Note-dataset-Churn.csv) ( source: Kaggle) A Telco company customer dataset. Each row represents a customer and each column contains the customer’s attributes. The dataset includes information about: Customers who left the company – that will be our target column, ("**Churn**"). Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies. Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges. 
Demographic info about customers – gender, age range, and if they have partners and dependents. #### 2.1 Usage Use Graphite Note to gain valuable insights into your customer base and identify which customers are most likely to churn. By analyzing the factors that contribute to churn, you can optimize your retention strategy and reduce customer churn rates. In addition, our tool can also help you predict which customers are at high risk of churning, and provide a probability score for each customer. This can enable you to take proactive steps to retain those customers with the highest churn risk, such as offering personalized promotions or improving their overall experience. By leveraging our tool, you can gain a deeper understanding of your customer base and identify opportunities to reduce churn, increase retention rates, and ultimately drive revenue growth. With our predictive churn model, you can make data-driven decisions that lead to more satisfied customers and a stronger business. #### 2.2 Model Type
Binary Classification

#### 2.3 How to Build This Model To get started, download the provided dataset and upload it to Graphite Note. Once uploaded, create a new Binary Classification model in Graphite Note with the 'Churn' variable as the Target Variable. This will allow you to predict which customers are most likely to churn. After training the model, explore the insights that it provides, such as the most important features for predicting churn and the distribution of churn probabilities. This can help you to gain a better understanding of the factors that contribute to customer churn and make informed decisions about your retention strategy. #### 2.4 What-if? Finally, you can use the model to run a "what-if" scenario by predicting the churn probability for different groups of customers based on different scenarios or assumptions. This can help you to forecast the impact of changes in your retention approach or customer experience efforts and make data-driven decisions. By following these steps, you can leverage Graphite Note and the provided dataset to gain valuable insights into your customer base, predict customer churn, and optimize your retention strategy for better results. [Predictive Customer Churn Live Demo](https://app.graphite-note.com/#/public/notebook/77e487e2b12c) ## 3. [Car Sales ](https://graphite-note.com/wp-content/uploads/2024/03/Demo-Graphite-Note-dataset-Monthly-Car-Sales.csv)(source: GitHub) The dataset contains monthly data on car sales from 1960 to 1968. It is great for our time series forecast model with which you can predict sales for the upcoming months. #### 3.1 Usage Use Graphite Note to gain valuable insights into your business operations and forecast future trends by analyzing time series data. With our advanced forecasting models, you can make informed decisions about your business and optimize your operations for better results. Our tool enables you to analyze historical data and identify patterns and trends, such as seasonality or cyclical trends. 
This can help you to forecast future demand or performance and make data-driven decisions about resource allocation, capacity planning, or inventory management. #### 3.2 Model Type
Timeseries Forecast

#### 3.3 How to Build This Model

To get started, download the provided dataset and upload it to Graphite Note. Once uploaded, create a new Timeseries Forecast model in Graphite Note with:

* The 'Sales' variable as the Target Variable
* Time/Date Column: Month
* Time Interval: Monthly

After training the model, explore the insights that it provides, such as identifying patterns, seasonality, and trends. This can help you to forecast future performance, plan resources effectively, and optimize your operations.

#### 3.4 What-if?

Finally, you can use the model to run a "what-if" scenario by predicting future values. This can help you to forecast the impact of changes in your business operations, such as changes in demand, capacity planning, or inventory management. By following these steps, you can leverage Graphite Note to gain valuable insights into your business trends, forecast future performance, and optimize your operations for better results. With our advanced time series forecasting models, you can stay ahead of the competition and take advantage of new opportunities as they arise.

[Time series Forecasting Live Demo](https://app.graphite-note.com/#/public/notebook/a7786aff906e)

## 4. [eCommerce orders example](https://graphite-note.com/wp-content/uploads/2024/03/Demo-Graphite-Note-dataset-eCommerce-Orders.csv)

This is a demo CSV with orders for an imaginary eCommerce shop. You can use it for Timeseries forecasting, RFM model, Customer Lifetime Value Model, General Segmentation, or New vs Returning Customers model in Graphite.

## 7. [Mall Customers](https://graphite-note.com/wp-content/uploads/2024/03/Demo-Graphite-Note-dataset-Mall-Customers.csv) (source: Kaggle)

A demo Mall Customers dataset from Kaggle. Ideal for General customer segmentation in Graphite.

# Free datasets for Machine Learning

There are plenty of sources of free datasets for machine learning. Here is a list of some of the most popular ones.
For each dataset, it is necessary to determine its quality. Several characteristics describe high-quality data; the most important are accuracy, reliability, and completeness. High-quality data should be precise and error-free; otherwise, it is misleading. Incomplete data is harder to use because information is missing. Ambiguous or vague data is unreliable and cannot be trusted.

## Free datasets for machine learning

Searching the web for terms like "free datasets for machine learning", "time-series dataset", or "classification dataset" returns many links to different sources. But which of them include high-quality data? We will list a few sources, but keep in mind that some of their datasets have drawbacks. Therefore, you need to be familiar with the characteristics of a good dataset.

## Kaggle

[Kaggle](https://www.kaggle.com/) is a big data science competition platform for predictive modeling and analytics. There are plenty of datasets you can use to learn artificial intelligence and machine learning. Most of the data is accurate and referenced, so you can test or improve your skills or even work on projects that could help people. Each dataset has its own usability score and description. Within each dataset, there are various tabs such as Tasks, Code, and Discussions. Most datasets are related to different projects, so you can find other models trained and tested on the same datasets. On Kaggle, you can find a big community of data analysts, data scientists, and machine learning engineers who can evaluate your work and give you valuable tips for further development.

## UCI Machine Learning Repository

[The UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets.php) is a database of high-quality, real-world datasets for machine learning algorithms. The datasets are well known for their interesting properties and expected results, so they can serve as valuable baselines for comparison. On the other hand, the datasets are small and already pre-processed.

## GitHub

[GitHub](https://github.com/search?q=datasets) is one of the world’s largest communities of developers. The primary purpose of GitHub is to be a code repository service. In most cases, a project includes the datasets it was applied to; you will need to spend a little more time to find the dataset you want, but it will be worth it.

## data.world

[data.world](https://data.world/search) is a large data community where people discover data and share analysis. Almost every project contains some available datasets. When searching, you must be very precise to get the desired results.

Of course, there are many more sources, depending on your needs. For example, if you need economic and financial datasets, you can visit [World Bank Open Data](https://data.worldbank.org/), [Yahoo Finance](https://finance.yahoo.com/), [EU Open Data Portal](https://data.europa.eu/data/datasets?locale=en\&minScoring=0), etc.

Once you have found your dataset, it’s Graphite time; run several models and create various reports using visualizations and tables. With Graphite, it's easier to make business decisions. Maybe you are just a few clicks away from the turning point of your career.

# Introduction

When you open a dataset, you have five different tabs: [Settings](#settings), [Columns](#columns), [View Data](#view-data), [Summary](#summary), and [Association](#association).

### Settings

First, on the **Settings** tab, you can re-upload the dataset, rename it, and change the description and the tag. The tab also shows the dataset's type, ID, creation date, and last-updated date.

### Columns

For each column, the **Columns** tab shows the original name, the column name, the data type, and the data format, which you can modify.

### View Data

On the **View Data** tab, you have all the data with the number of columns and rows.

### Summary

The **Summary** tab gives a simple analysis of each column with a graph. For numerical columns, it counts the null values and calculates the sum, the mean, the standard deviation, the minimum, the maximum, the lower and upper quantiles, and the median. For categorical columns, it counts the null values and the unique values, and reports the minimum and maximum length.
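To make these statistics concrete, here is a minimal Python sketch that computes the same kind of summary outside the platform. It is not Graphite Note's implementation: the function names and sample columns are hypothetical, and the sketch assumes that the lower and upper quantiles are the 25% and 75% quartiles and that the standard deviation is the sample standard deviation.

```python
import statistics


def summarize_numeric(values):
    """Summarize a numeric column; None marks a null value."""
    present = [v for v in values if v is not None]
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # Assumption: "lower/upper quantile" means the 25% and 75% quartiles.
    lower_q, median, upper_q = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "null_count": len(values) - len(present),
        "sum": sum(present),
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),  # assumption: sample standard deviation
        "min": min(present),
        "max": max(present),
        "lower_quantile": lower_q,
        "median": median,
        "upper_quantile": upper_q,
    }


def summarize_categorical(values):
    """Summarize a categorical column; None marks a null value."""
    present = [v for v in values if v is not None]
    lengths = [len(v) for v in present]
    return {
        "null_count": len(values) - len(present),
        "unique_count": len(set(present)),
        "min_length": min(lengths),
        "max_length": max(lengths),
    }


# Hypothetical sample columns.
ages = [34, 45, None, 29, 52]
cities = ["Zagreb", "Split", None, "Zagreb", "Rijeka"]

numeric_summary = summarize_numeric(ages)
categorical_summary = summarize_categorical(cities)
print(numeric_summary)
print(categorical_summary)
```

Running a check like this on a sample of your data before uploading can help you spot columns with many nulls or unexpected value ranges.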

### Association

The last tab is **Association**, which measures the relationship between two variables. The association between numerical variables is the correlation:

* a zero correlation indicates no linear relationship between the variables
* a correlation of +1 indicates a perfect positive correlation, meaning that as one variable goes up, the other goes up as well
* a correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down

If needed, you can use the *More details* button to better understand the associations.
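The three cases can be illustrated with a short Python sketch. It assumes the correlation in question is the Pearson correlation coefficient, which the text does not state explicitly; the function name and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical sample values.
spend = [1, 2, 3, 4, 5]
rises_with_spend = [2, 4, 6, 8, 10]   # goes up as spend goes up
falls_with_spend = [10, 8, 6, 4, 2]   # goes down as spend goes up
u_shaped = [4, 1, 0, 1, 4]            # depends on spend, but not linearly

positive = pearson(spend, rises_with_spend)
negative = pearson(spend, falls_with_spend)
no_linear = pearson(spend, u_shaped)
print(positive, negative, no_linear)
```

The last case is a reminder that a correlation near zero rules out a linear relationship only; the two variables can still be related in another way.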

# Data sources # Import data from CSV file **Overview** Graphite Note’s CSV integration allows you to import your data from CSV files. This guide will walk you through the steps to import your CSV data into Graphite Note. **Steps to Import Data** 1. **Create a New Dataset** * Option 1: Go to the homepage and click on "Create" under Datasets. * Option 2: From the datasets list, click on "New Dataset."


2. **Select CSV file** * Choose "CSV" as your data source and click "Next".

3. **Enter Dataset Information** * Name: Provide a name for the dataset. * Description: Add a short description of the data. * Tags: Add tags for better organization. * Click "Next" to proceed.

4. **Choose Parsing Options** * Configure your parsing options to properly interpret the CSV file. * **Delimiter:** Choose the character that separates values in your CSV (e.g., comma, semicolon). * **Header:** Indicate whether your CSV file has a header row. * **Convert Empty Values to Null:** Convert 'empty string' values to null. By default, this is turned on. * **File Encoding:** Select the file character encoding. The default is Unicode (UTF-8). * **Skip Empty Rows:** Skip empty rows at the beginning or end of the file. By default, Graphite Note will ignore those lines. * Click on "Parse file" to process the file.

5. **Review parsed data**
Review the parsed data and make any necessary adjustments: * **Rename Columns:** Change column names if needed. * **Change Data Types:** Adjust the data types of your columns as required. * Click on the "Create" button to finalize and create your dataset. #### CSV filesize upload limits Graphite Note limits the file size for CSV uploads to 50 MB. This restriction ensures that the platform maintains optimal performance, prevents excessive resource consumption, and ensures fast processing times for data uploads. If users need to work with larger datasets, alternative methods such as connecting directly to a database (Postgres, MySQL, BigQuery) can be used. #### Next Steps Now that your data is imported and prepared, you can proceed to create a model without writing any code. Simply go to the [Models](https://app.gitbook.com/o/tMDHUVlAYCj1y8ercRu6/s/gnR78y9L7FDWeb4jdvdW/models/introduction) and follow the steps to build and deploy your model using Graphite Note's intuitive interface. # Re-upload or append CSV If you've collected additional data related to your previously uploaded CSV file or there have been changes to the existing data, you can use the re-upload option. This allows you to update or append new data to your existing dataset. Expanding your dataset with more records can benefit your machine learning model, but the impact depends on the quality and relevance of the new data. Learn more about the benefits of dataset expansion [here](/graphite-note-documentation/datasets/prepare-your-data/expanding-datasets). #### How to Re-upload a CSV File: 1. **Go to the Datasets List**: * Navigate to the list of datasets.

2. **Select the Dataset**: * Choose the dataset you wish to re-upload.

3. **Select Re-upload**: * Click on the 'Re-upload' option.

4. **Choose Options and upload**:
   * **Decide on data append:**
     * Depending on your needs, you can choose 'Append data' to add new records to the existing dataset. If 'Append data' is turned off, the new dataset will overwrite the existing one.
   * **Upload Your File**:
     * Select or drag and drop your CSV file. Ensure the file has the same column structure as the previously uploaded file.
   * Click **Update** to complete the re-upload process.
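Because the re-uploaded file must have the same column structure as the original, it can help to compare the two files locally before uploading. The following Python sketch shows one way to do that with the standard `csv` module. The helper names and the sample data are hypothetical, and the check is only a local approximation of what the platform expects.

```python
import csv
import io


def read_csv_text(text):
    """Return (column names, rows) for CSV content held in a string."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows


def append_if_compatible(existing_text, new_text):
    """Append new rows to existing rows only if the headers match exactly."""
    existing_cols, existing_rows = read_csv_text(existing_text)
    new_cols, new_rows = read_csv_text(new_text)
    if existing_cols != new_cols:
        raise ValueError(f"Column mismatch: {existing_cols} vs {new_cols}")
    return existing_cols, existing_rows + new_rows


# Hypothetical sample files.
existing = "customer_id,amount\n1,100\n2,250\n"
additional = "customer_id,amount\n3,75\n"
renamed = "customer,amount\n3,75\n"

columns, combined = append_if_compatible(existing, additional)
print(columns, len(combined))
```

Passing `renamed` instead of `additional` raises a `ValueError`, which is the kind of mismatch worth catching before a re-upload.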

# CSV upload troubleshooting tips If you’re experiencing problems when uploading CSV files to Graphite Note, this page provides a concise list of common issues and recommended solutions. ### 1. File Too Large **Symptom:** You receive an error indicating that the file exceeds the size limit. **Cause:** Graphite Note enforces a maximum 50 MB limit for CSV uploads. **Solution:** * Split the dataset into multiple smaller CSV files. * Remove unnecessary columns or rows to reduce file size. * Connect directly to a database (e.g., PostgreSQL, MySQL, BigQuery) for larger datasets. *** ### 2. Incorrect Delimiters or Quoting **Symptom:** Data is misaligned, or you see parsing errors. **Cause:** Inconsistent delimiter usage (e.g., commas in some rows, semicolons in others) or unescaped quotes. **Solution:** * Ensure a single, consistent delimiter throughout (commas or semicolons). * Escape or quote fields that contain the delimiter (e.g., "123, Main Street"). *** ### 3. Encoding / Special Character Issues **Symptom:** Special characters (e.g., accented letters, non-Latin scripts) appear as garbled text or question marks. **Cause:** The CSV file is not saved in UTF-8 format. **• Solution:** * Save or export the file in UTF-8. * Convert the file encoding using a text editor (e.g., VSCode, Notepad++) if CSV UTF-8 is not available in your spreadsheet software. *** ### 4. Missing or Inconsistent Headers **Symptom:** Your upload fails, or certain columns are not recognized. **Cause:** The first row does not contain meaningful column names, or column names differ between rows. **Solution:** * Always include a header row with consistent, clear column names. * Avoid special characters or spaces in headers; use underscores or camelCase instead. *** ### 5. Date/Time Format Confusion **Symptom:** Dates appear as 0000-00-00 or fail to parse entirely. **Cause:** Inconsistent date/time formats (e.g., mixing MM/DD/YYYY with YYYY-MM-DD). 
**Solution:** * Standardize all date fields to a single format (e.g., ISO 8601: YYYY-MM-DD HH:MM:SS). * Confirm your spreadsheet application doesn’t convert date formats automatically. *** ### 6. Empty or Corrupted Rows **Symptom:** You see blank rows or the upload process halts unexpectedly. **Cause:** Hidden rows, line breaks, or a corrupted export process. **Solution:** * Remove any hidden or blank rows in your spreadsheet before exporting. * Open the CSV in a plain text editor to confirm each row is valid and properly delimited. *** ### 7. Network or Timeout Errors **Symptom:** The file upload keeps failing or timing out. **Cause:** Unstable internet connection or very large file size near the 50 MB limit. **Solution:** * Split the file if it’s too large. * Retry on a stronger or wired internet connection. *** ### Additional Resources [**CSV File creating and formatting**](/graphite-note-documentation/datasets/prepare-your-data/csv-file-creating-and-formatting)\ Learn how to format and encode your CSV files for best results. [**Creating a CSV from Excel**](/graphite-note-documentation/datasets/prepare-your-data/csv-file-creating-and-formatting#how-to-create-a-csv-file-from-excel)\ Step-by-step instructions on exporting your data to CSV format. [**Dataset API**](/graphite-note-documentation/rest-api/api-introduction)\ Explore a code-free approach to sending data directly to Graphite Note without manual file uploads. *** ### Still Need Help? If you’ve tried these suggestions and still experience issues, contact our support team by emailing . We’re here to help you resolve any CSV upload problems and ensure your data is correctly imported. # MySQL Connector **Overview** The MySQL connector in Graphite Note allows you to import your data from a MySQL database or run custom SQL queries directly within the platform. 
**Prerequisites** Before starting, ensure your firewall allows incoming requests from the following IP address: * 99.81.63.220 **Steps to Import Data** **1. Create a New Dataset** * Option 1: Go to the homepage and click on "Create" under Datasets. * Option 2: From the datasets list, click on "New Dataset."

**2. Select MySQL** * Choose "MySQL" as your dataset source and click "Next".

**3. Enter Dataset Information** * Name: Provide a name for the dataset. * Description: Add a short description of the data. * Tags: Add tags for better organization. * Click "Next" to proceed.

**4. Establish a Connection** Fill in the following connection details: * Server Name: Enter the hostname or IP address of your MySQL instance. * Database Port: Enter the port number for your database (typically 3306). * Database User: Provide the username for the database. * Database Password: Enter the password for the database user. * Database Name: Specify the name of the database you wish to connect to. * **SSL (Secure Sockets Layer):** SSL is a protocol for encrypting information over the internet. If your MySQL instance requires SSL, ensure you enable this option.

**2. Select MariaDB** * Choose "MariaDB" as your dataset source and click "Next".

**3. Enter Dataset Information** * Name: Provide a name for the dataset. * Description: Add a short description of the data. * Tags: Add tags for better organization. * Click "Next" to proceed.

**4. Establish a Connection** Fill in the following connection details: * Server Name: Enter the hostname or IP address of your MariaDB instance. * Database Port: Enter the port number for your database (typically 3306). * Database User: Provide the username for the database. * Database Password: Enter the password for the database user. * Database Name: Specify the name of the database you wish to connect to. * **SSL (Secure Sockets Layer):** SSL is a protocol for encrypting information over the internet. If your MariaDB instance requires SSL, ensure you enable this option.

**2. Select PostgreSQL** * Choose "PostgreSQL" as your dataset source and click "Next".


**3. Enter Dataset Information** * Name: Provide a name for the dataset. * Description: Add a short description of the data. * Tags: Add tags for better organization. * Click "Next" to proceed.

**4. Establish a Connection**

Fill in the following connection details:

* Server Name: Enter the hostname or IP address of your PostgreSQL instance.
* Database Port: Enter the port number for your database (typically 5432).
* Database User: Provide the username for the database.
* Database Password: Enter the password for the database user.
* Database Name: Specify the name of the database you wish to connect to.
* **SSL (Secure Sockets Layer):** SSL is a protocol for encrypting information over the internet. If your PostgreSQL instance requires SSL, ensure you enable this option.
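If the connection fails, a quick first check is whether anything is listening on the server name and port you entered. The sketch below uses only Python's `socket` module and runs against a throwaway local listener so that it works anywhere. The function name is hypothetical. Note that this only tests reachability from the machine where you run it; it does not confirm that your firewall accepts requests from Graphite Note's IP address.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Demo against a temporary local listener (not a real database).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

reachable_while_open = port_is_reachable("127.0.0.1", demo_port)
listener.close()
reachable_after_close = port_is_reachable("127.0.0.1", demo_port)
print(reachable_while_open, reachable_after_close)
```

For a real check, call `port_is_reachable` with your own server name and database port, for example port 5432 for a default PostgreSQL setup.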

**2. Select Redshift** * Choose "Redshift" as your dataset source and click "Next".

**3. Enter Dataset Information** * Name: Provide a name for the dataset. * Description: Add a short description of the data. * Tags: Add tags for better organization. * Click "Next" to proceed.

**4. Establish a Connection**

Fill in the following connection details:

* Server Name: Enter the hostname or IP address of your Redshift instance.
* Database Port: Enter the port number for your database (typically 5439).
* Database User: Provide the username for the database.
* Database Password: Enter the password for the database user.
* Database Name: Specify the name of the database you wish to connect to.
* **SSL (Secure Sockets Layer):** SSL is a protocol for encrypting information over the internet. If your Redshift instance requires SSL, ensure you enable this option.

**2. Select Big Query** * Choose "Big Query" as your dataset source and click "Next".

**3. Enter Dataset Information** * Name: Provide a name for the dataset. * Description: Add a short description of the data. * Tags: Add tags for better organization. * Click "Next" to proceed.

#### 4. **Configure Big Query Connection** * **Project ID**: Enter your Google Cloud project ID. * **Dataset ID**: Enter the Big Query dataset ID. * **Table ID**: Enter the table ID if you want to import data from a specific table.

#### 5. Download JSON Key from BigQuery To enable Graphite Note to access your BigQuery data, you'll need to provide a service account key in JSON format: 1. **Create a Service Account:** * Navigate to the Google Cloud Console. * Select your project or create a new one. * Go to **IAM & Admin** > **Service Accounts**. * Click on **+ CREATE SERVICE ACCOUNT**. * Provide a name and description for the service account, then click **CREATE**. 2. **Grant Permissions:** * Assign the necessary roles, such as **BigQuery Data Viewer** and **BigQuery Job User**. These roles allow the service account to view datasets and execute queries. 3. **Create Key:** * In the service account list, click on the created service account. * Go to the **Keys** tab and click **ADD KEY** > **Create new key**. * Select **JSON** as the key type and click **Create**. This will download the JSON key file to your computer. #### 6. Upload JSON Key to Graphite Note 1. **Access the Dataset Creation Page:** * Return to the Graphite Note platform where you left off at the "Create a New Dataset" step. 2. **Upload JSON Key:** * You will be prompted to upload the JSON key file. Click on **Upload JSON Key** and select the file you downloaded from the Google Cloud Console. 3. **Check Connection:** * After uploading the JSON key, click **Check Connection** to ensure that Graphite Note can successfully connect to your BigQuery instance. #### 7. Review data Review the data and make any necessary adjustments: * **Rename Columns:** Change column names if needed. * **Change Data Types:** Adjust the data types of your columns as required. * Click on the "Create" button to finalize and create your dataset. #### Next Steps Now that your data is imported and prepared, you can proceed to create a model without writing any code. 
Simply go to the [Models](https://app.gitbook.com/o/tMDHUVlAYCj1y8ercRu6/s/gnR78y9L7FDWeb4jdvdW/models/introduction) and follow the steps to build and deploy your model using Graphite Note's intuitive interface.

# MS SQL Connector

**Overview**

The MS SQL connector in Graphite Note allows you to import your data from an MS SQL database or run custom SQL queries directly within the platform.

**Prerequisites**

Before starting, ensure your firewall allows incoming requests from the following IP address:

* 99.81.63.220

**Steps to Import Data**

**1. Create a New Dataset**

* Option 1: Go to the homepage and click on "Create" under Datasets.
* Option 2: From the datasets list, click on "New Dataset."

**2. Select MS SQL** * Choose "MS SQL" as your dataset source and click "Next".

**3. Enter Dataset Information** * Name: Provide a name for the dataset. * Description: Add a short description of the data. * Tags: Add tags for better organization. * Click "Next" to proceed.

**4. Establish a Connection** Fill in the following connection details: * Server Name: Enter the hostname or IP address of your MS SQL instance. * Database Port: Enter the port number for your database (typically 1433). * Database User: Provide the username for the database. * Database Password: Enter the password for the database user. * Database Name: Specify the name of the database you wish to connect to. * **SSL (Secure Sockets Layer):** SSL is a protocol for encrypting information over the internet. If your MS SQL instance requires SSL, ensure you enable this option.

**5. Check the Connection** * Click on "Check Connection" to validate the connection details. **6. Write and Run SQL** * Write the desired SQL query to fetch your data. * Click on the "Run SQL" button to execute the query and retrieve the data. **7. Review and Adjust Data** * You should see all the columns from the selected dataset appearing. * If necessary, you can change column names, data types, or data formats. **8. Create the Dataset** * Click on the "Create" button to finalize and create your dataset. **Troubleshooting Connection Issues** Ensure your firewall settings are configured to accept incoming requests from the IP address mentioned above. This is crucial for establishing a successful connection between Graphite Note and your MS SQL Server. **Import Process** Once the connection is validated: * Graphite Note will initiate the data import process. * The duration of this process depends on the size of your dataset. Small datasets will import in a few minutes, while larger datasets may take longer. **Next Steps** * Now that your data is imported and prepared, you can proceed to create a model without writing any code. Simply go to the [Models](https://app.gitbook.com/o/tMDHUVlAYCj1y8ercRu6/s/gnR78y9L7FDWeb4jdvdW/models/introduction) and follow the steps to build and deploy your model using Graphite Note's intuitive interface. # Oracle Connector Oracle Connector for Graphite Note will soon be available for public use. This connector will enable seamless integration with Oracle databases, enhancing data accessibility and supporting advanced analysis within the Graphite Note platform. Stay tuned for its release to unlock efficient Oracle data connectivity. # Prepare your Data data prep # Data Labeling ### **Introduction** Data labeling is the process of tagging data with meaningful and informative labels to train machine learning models. In predictive analytics, labeled data is crucial as it provides the model with examples of correct behavior. 
This document will guide you through the process of preparing and labeling data for three predictive models: Lead Scoring, Churn Prediction, and MQL to SQL Conversion. ### **1. Lead Scoring Model (Converted: Yes/No)** **Objective:** Predict if a lead will convert into a customer. **Dataset Example:** | Lead\_ID | Industry | Company\_Size | Interaction\_Count | Converted | | -------- | -------- | ------------- | ------------------ | --------- | | 001 | Tech | 50-100 | 5 | Yes | | 002 | Finance | 100-500 | 2 | No | **Steps:** 1. **Data Collection:** Gather data on leads, including their industry, company size, and interactions with your platform. 2. **Labeling:** For each lead, label them as 'Yes' if they converted into a customer and 'No' if they didn't. 3. **Reasoning:** Labeling helps the model understand patterns of conversion based on the features provided. ### **2. Churn Prediction Model (Churned: Yes/No)** **Objective:** Predict if a customer will churn or leave your service. **Dataset Example:** | Customer\_ID | Monthly\_Usage | Support\_Tickets | Feedback\_Score | Churned | | ------------ | -------------- | ---------------- | --------------- | ------- | | A1 | 50 hrs | 2 | 4.5 | No | | B2 | 10 hrs | 5 | 2.8 | Yes | **Steps:** 1. **Data Collection:** Gather data on customer usage patterns, support interactions, and feedback scores. 2. **Labeling:** For each customer, label them as 'Yes' if they churned and 'No' if they continued using your service. 3. **Reasoning:** Labeling helps the model identify signs of customer dissatisfaction or reduced engagement, which might lead to churn. ### **3. MQL to SQL Conversion Model (Converted: Yes/No)** **Objective:** Predict if a Marketing Qualified Lead (MQL) will become a Sales Qualified Lead (SQL). 
**Dataset Example:** | MQL\_ID | Webinar\_Attendance | Downloaded\_Content | Email\_Click\_Rate | Converted | | ------- | ------------------- | ------------------- | ------------------ | --------- | | M1 | 2 | Yes | 15% | Yes | | M2 | 0 | No | 5% | No | **Steps:** 1. **Data Collection:** Gather data on MQLs, including their engagement with webinars, content downloads, and email interactions. 2. **Labeling:** For each MQL, label them as 'Yes' if they became an SQL and 'No' if they didn't. 3. **Reasoning:** Labeling helps the model recognize patterns of engagement that indicate a lead's readiness to move to the sales stage. ### **Conclusion** Data labeling is a foundational step in predictive analytics. By providing clear, accurate labels, you enable your predictive models to learn from past data and make accurate future predictions. Ensure your labels are consistent and based on well-defined criteria to achieve the best results with Graphite Note's no-code predictive analytics platform. # Expanding datasets Expanding your dataset with more records can be beneficial for your machine learning model, but the impact depends on the quality and relevance of the new data. To learn how to re-upload or append data to your existing CSV dataset go [here](/graphite-note-documentation/datasets/data-sources/import-data-from-csv-file/re-upload-or-append-csv). Here’s how adding more records might help: #### 1. Improved Generalization More data generally helps the model to learn better and generalize to unseen data. If your initial dataset was limited, your model might have overfitted to the specific patterns in that data. Adding more data helps the model capture a wider range of patterns, leading to better performance on new data. #### 2. Reducing Overfitting When your dataset is small, the model may learn noise or irrelevant patterns (overfitting). 
Expanding the dataset introduces more variety, making it harder for the model to memorize specific samples, thereby helping to reduce overfitting.

#### 3. Better Representing the Data Distribution

A larger dataset often better represents the underlying data distribution, especially if the new records cover more edge cases, outliers, or scenarios that were underrepresented in the original dataset. This helps the model become more robust and perform well across a wider range of inputs.

#### 4. Enhanced Model Accuracy

In most cases, expanding your dataset improves the accuracy of the model, especially if the model is data-hungry (like deep learning models). More data means more examples for the model to learn from, allowing it to better predict future outcomes.

#### 5. Handling Class Imbalance

If your dataset suffers from class imbalance (e.g., if one class has far more records than another in a classification problem), adding more records from the minority class can make your dataset more balanced, improving the model’s ability to predict minority classes correctly.

#### Considerations:

* Quality over Quantity: Simply adding more data isn’t always beneficial if the additional data is noisy, irrelevant, or incorrectly labeled. High-quality, representative data is more important than just increasing the size of the dataset.
* Data Diversity: Adding data that captures a wider variety of features or scenarios is more helpful than adding redundant or very similar data points. If the new data points are too similar to the existing ones, the impact on model performance might be minimal.
* Graphite Note plan limits: Keep in mind that expanding the dataset will increase both the computational requirements for training the model and the total number of dataset rows used in your current plan. Find out more about Graphite Note plans [here](/graphite-note-documentation/account-and-team-setup/subscription-plans).
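To see how new records change class balance, you can compare the share of each target class before and after the expansion. This is a minimal Python sketch with hypothetical labels and a hypothetical helper name; it only measures the balance and does not train a model.

```python
from collections import Counter


def class_shares(labels):
    """Return the share of each class label as a fraction of all records."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical churn labels: the original data is heavily imbalanced.
original_labels = ["No"] * 90 + ["Yes"] * 10
# Hypothetical new records that mostly come from the minority class.
new_labels = ["Yes"] * 30 + ["No"] * 10

before = class_shares(original_labels)
after = class_shares(original_labels + new_labels)
print(before)
print(after)
```

In this made-up example the minority class grows from 10% to roughly 29% of the records, which is the effect described under Handling Class Imbalance.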
# Merging datasets This section of the Graphite Note user documentation will guide you through the process of merging multiple datasets into one. Merging datasets allows you to combine data from different sources or related data for more comprehensive analysis. ### Steps to Merge Datasets #### 1. Select Merge Dataset from the Menu ![](/files/ISW5GQmeYxICeltjNSuX) To begin the process, navigate to the main menu and select the "Merge Dataset" option. This will open a new window where you can start the merging process. #### 2. Enter Name and Description of the Merged Dataset In the new window, you will see fields to enter the name and description of your new merged dataset. This helps you identify the purpose of the merged dataset for future reference. You can also add optional tags to further categorize your dataset.

#### 3. Select Datasets to Merge and Define the Type of Join

Next, you will select the first dataset you want to merge from the dropdown menu. Repeat this step to select the second dataset. After selecting your datasets, choose the type of join you want to perform: inner, left, right, or outer. The type of join determines how the datasets are combined based on the values in the key columns.
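The difference between the four join types is easiest to see on a small example. The following Python sketch joins two lists of rows on a shared key column. It is an illustration of the join semantics only, not Graphite Note's implementation. The function, the `customer_id` key, and the sample rows are hypothetical, and the sketch assumes the two datasets share no column names other than the key.

```python
def merge_rows(left, right, key, how="inner"):
    """Join two lists of dict rows on `key` using an inner, left, right, or outer join."""
    if how not in {"inner", "left", "right", "outer"}:
        raise ValueError(f"Unknown join type: {how}")
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}

    # Index the right dataset by key value for quick lookup.
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)

    merged = []
    matched_keys = set()
    for left_row in left:
        matches = right_index.get(left_row[key], [])
        if matches:
            matched_keys.add(left_row[key])
            merged.extend({**left_row, **right_row} for right_row in matches)
        elif how in ("left", "outer"):
            # Keep the unmatched left row; right-hand columns stay empty (None).
            merged.append({**dict.fromkeys(right_cols), **left_row})
    if how in ("right", "outer"):
        for right_row in right:
            if right_row[key] not in matched_keys:
                # Keep the unmatched right row; left-hand columns stay empty (None).
                merged.append({**dict.fromkeys(left_cols), **right_row})
    return merged


# Hypothetical sample datasets.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cy"},
]
orders = [
    {"customer_id": 2, "amount": 50},
    {"customer_id": 3, "amount": 20},
    {"customer_id": 4, "amount": 75},
]

row_counts = {how: len(merge_rows(customers, orders, "customer_id", how))
              for how in ("inner", "left", "right", "outer")}
print(row_counts)
```

An inner join keeps only keys present in both datasets, a left or right join keeps every row of one side, and an outer join keeps every row of both sides.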

Then, select the key columns on which to merge the datasets. These are the columns that the datasets have in common and will be used to align the data. #### 4. Select Columns for Your New Merged Dataset

Now, you will choose which columns you want to include in your new merged dataset. You can select columns from either or both of the original datasets. Once you've selected your columns, you can use the "Test This Merge" button to preview the merged rows. This allows you to check that the datasets are merging as expected before finalizing the process. #### 5. Create Your Merged Dataset If you're happy with the preview of the merged dataset, click the "Create" button to finalize the merge. Your new merged dataset will now be available for use in your Graphite Note projects. Remember, merging datasets is a powerful tool for combining and analyzing data in Graphite Note. By following these steps, you can easily merge datasets to gain new insights from your data. # CSV File creating and formatting This page outlines best practices for preparing CSV files that can be easily parsed and imported into Graphite Note. ## Formatting Recommendations ### 1. Include a Header Row • The first line should clearly list column names (e.g., customer\_id, date, purchase\_amount). • Avoid using spaces or special characters in header names; if needed, use underscores or camelCase. *** ### 2. Check Delimiters • Ensure the file uses a consistent delimiter (most commonly a comma , or semicolon ;). • If your data contains commas within fields, enclose those fields in quotes (e.g., "123, Main Street"). *** ### 3. Use UTF-8 Encoding • Always save or export your file in UTF-8 format to avoid character encoding issues. • This helps ensure Graphite Note can accurately parse all characters, including accented letters. *** ### 4. Date and Time Formats • Use consistent date formats (e.g., YYYY-MM-DD) or a well-defined datetime format (e.g., YYYY-MM-DD HH:MM:SS). • This helps Graphite Note accurately recognize date/time fields. *** ### 5. 
Handling Special Characters • If your data contains quotes, commas, or other punctuation, ensure those fields are properly escaped or wrapped in quotes to prevent parsing errors. *** ### 6. Remove Unused Columns • If possible, eliminate columns that you do not plan to analyze. This helps keep the CSV file size smaller and easier to handle. *** ### 7. Adhere to the 50 MB File Size Limit • Graphite Note limits CSV uploads to 50 MB to ensure optimal performance and resource usage. • If your file exceeds 50 MB, try splitting it into multiple CSVs or removing unnecessary columns and rows. • For significantly larger datasets, consider connecting directly to a database (e.g., PostgreSQL, MySQL, BigQuery) to feed data into Graphite Note. *** ## How to Create a CSV File from Excel? #### 1. Open Your Spreadsheet in Excel * Make sure you review and clean up any unnecessary columns or rows before exporting. #### 2. Save As CSV * Click on File → Save As (or Save a Copy in newer versions). * Select CSV (Comma delimited) or CSV UTF-8 (Comma delimited) in the Save as type dropdown menu. * Choose a file name and location, then click Save. #### 3. Verify Encoding * If possible, choose CSV UTF-8 when saving. * If CSV UTF-8 is not available, you may need to convert the file encoding separately (e.g., using a text editor like VSCode or Notepad++). #### 4. Confirm Data Integrity * Open the resulting CSV file in a plain text editor to verify that the column names and any special characters appear correctly. *** # Introduction Graphite Note offers a suite of powerful machine learning and advanced analytics models designed to empower businesses to make data-driven decisions efficiently. Each model is tailored to address specific business needs, from forecasting future trends to segmenting customer bases. With these models, users can transform raw data into actionable insights quickly and without the need for complex coding.

Here’s a quick introduction to each type of model: 1\. **Timeseries Forecast**: Ideal for predicting future values in timeseries data, such as sales or demand, based on historical patterns and seasonality. 2\. **Binary Classification**: Used to classify data into two distinct groups (e.g., yes/no or true/false) based on historical data patterns. 3\. **Multi-Class Classification**: Expands classification to multiple categories, allowing predictions across several possible outcomes. 4\. **Regression**: A model that predicts a continuous numeric value (e.g., sales amount or customer age) based on other input features. 5\. **General Segmentation:** Unsupervised learning that groups similar entities together, helpful in creating customer or product segments based on numeric similarities. 6\. **RFM Customer Segmentation**: A specialized segmentation technique that segments customers based on Recency, Frequency, and Monetary value, aiding in targeted marketing. 7\. **Customer Lifetime Value**: Predicts the future value of customers, estimating metrics like repeat purchase date and overall customer value for better retention strategies. 8\. **New vs Returning Customers**: Provides insights into customer behavior by segmenting new and returning customers over various time frames (daily, weekly, monthly, etc.). 9\. **Customer Cohort Analysis**: Groups customers based on their first purchase date, allowing businesses to analyze behavior patterns over time. 10\. **ABC Analysis**: Categorizes items into A, B, and C categories based on their impact on a chosen metric, helping prioritize resources on high-impact items. # Preprocessing Data In Graphite Note, data preparation is divided into two main steps to ensure optimal results, with all tasks handled automatically so you don’t have to worry about them. 
Data preprocessing is a crucial step in machine learning. It improves model accuracy and performance by transforming and cleaning the raw data: removing inconsistencies, handling missing values, scaling features, and ensuring compatibility with the chosen algorithm.

### Step 1: Exclusion of Columns

**Features Not Fit for Model:** Graphite automatically excludes columns that aren’t suitable for modeling, such as date/datetime columns, to ensure only relevant features are used in training.

***

### Step 2: Preprocessing

To achieve the best results, Graphite Note takes care of several preprocessing steps:

* **Null Values:** Graphite identifies and processes null values based on best practices. If a column is 50% null or more, it is not included in model training.
* **Missing Values:** Missing values are managed automatically to maintain data integrity. In a numerical column, missing values are replaced with the column average; in a categorical column, they become "not\_available".
* **One-Hot Encoding:** Categorical variables are automatically transformed using one-hot encoding, converting categories into numerical formats suitable for model training.
* **Fix Imbalance:** Graphite addresses class imbalance in classification tasks, correcting the unequal distribution of the target class and ensuring a balanced representation of classes.
* **Normalization:** Numeric columns are scaled to a uniform range, ensuring consistent data for models that require normalized input.
* **Constants:** Columns with constant values, which don’t contribute useful information, are identified and excluded from the dataset.
* **Cardinality:** Graphite optimizes high-cardinality categorical columns for model performance, handling complex categorical data effectively.
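Several of the rules above can be sketched in a few lines of Python. This is an illustration of the stated rules under assumptions, not Graphite Note's actual code: all function names and the sample table are hypothetical, normalization is assumed to be min-max scaling to the range 0 to 1, and imbalance and cardinality handling are left out.

```python
def null_share(column):
    """Fraction of values in the column that are null (None)."""
    return sum(value is None for value in column) / len(column)


def is_numeric(column):
    return all(isinstance(v, (int, float)) for v in column if v is not None)


def preprocess(table):
    """table maps column name -> list of values; returns model-ready columns."""
    prepared = {}
    for name, column in table.items():
        if null_share(column) >= 0.5:
            continue  # Null Values: drop columns that are 50% null or more
        if is_numeric(column):
            present = [v for v in column if v is not None]
            mean = sum(present) / len(present)
            filled = [mean if v is None else v for v in column]  # Missing Values
            if len(set(filled)) == 1:
                continue  # Constants: no useful information
            low, high = min(filled), max(filled)
            # Normalization: assumed min-max scaling to [0, 1]
            prepared[name] = [(v - low) / (high - low) for v in filled]
        else:
            filled = ["not_available" if v is None else v for v in column]
            if len(set(filled)) == 1:
                continue  # Constants
            for category in sorted(set(filled)):
                # One-Hot Encoding: one 0/1 column per category
                prepared[f"{name}_{category}"] = [int(v == category) for v in filled]
    return prepared


# Hypothetical sample table.
table = {
    "age": [20, None, 40, 60],
    "plan": ["basic", None, "pro", "basic"],
    "country": ["HR", "HR", "HR", "HR"],   # constant column
    "notes": [None, None, "x", None],      # 75% null
}
prepared = preprocess(table)
print(prepared)
```

In this example `country` and `notes` are dropped, `age` is imputed and scaled, and `plan` becomes three 0/1 columns.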

In traditional data science projects, these steps would require manual effort from data scientists, including data cleaning, encoding, scaling, and testing, often involving a significant amount of time and expertise. Graphite Note automates this entire process, completing these steps in seconds and allowing users to focus on insights and decision-making rather than data preparation. # Machine Learning Models In the next section, you’ll learn how to define a scenario, train a model, and leverage the results to make predictions, take strategic actions, and make data-driven decisions that directly impact your business. Each model will be introduced on a dedicated page with step-by-step instructions and a video tutorial, guiding you through the process from setup to actionable insights. # Timeseries Forecast ## Model Scenario {% embed url="" %} Watch a video on how to build a Time Series forecast based on demo dataset {% endembed %} A **Timeseries Forecast Model** is designed to predict future values by analyzing historical time-related data. To utilize this model, your dataset must include both time-based and numerical columns. In this tutorial, we'll cover the fundamentals of the **Model Scenario** to help you achieve optimal results.

*** ### Target Column For the **Target Column**, select a numeric value you want to predict. It's crucial to have values by day, week, or year. If some dates are repeated, you can aggregate them by taking their sum, average, etc.

***

### Sequence and Time dimension setup

Next, you can choose a **Sequence Identifier Field** to group your data and generate an independent time series and forecast for each group. Keep in mind that these values shouldn't be unique to each row; they must form a series, and a maximum of 500 unique values is allowed as a sequence identifier. If you don't want to generate independent time series for each group, you can leave this option empty.

*** Then, select the **Time/Date Column**, specifying the column containing time-related values. The **Time Interval** represents the data frequency—choose daily for daily data, yearly for annual data, etc. With **Forecast Horizon**, decide how many days, weeks, or years you want to predict from the last date in your dataset. If your dataset includes external factors that influence the target—such as marketing spend, weather, or pricing—you can add them as regressors. Simply select up to 5 additional columns in the *Regressors* field. These values will help the model better understand patterns and improve forecast accuracy. More info can be found in the [Regressors](/graphite-note-documentation/graphite-note-models/advanced-ml-model-settings/regressors) section. > :warning: Note: Since you’ve selected additional regressors for this model, predictions can’t be generated directly within the app interface. This is because the model requires future values for each regressor to make accurate predictions. Providing this data manually can be impractical, especially for large datasets. Instead, please use our Prediction API to submit your regressor values and generate forecasts programmatically.

*** The model performs well with seasonal data patterns. If your data shows a linear growth trend, select "additive" for **Seasonality Mode**; for exponential growth, select "multiplicative." For example, if you see annual patterns, set **Yearly Seasonality** to True. (TIP: Plotting your data beforehand can help you understand these patterns.) If you're unsure, the model will attempt to detect seasonality automatically.
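The difference between the two seasonality modes can be illustrated with a few lines of Python. This is a conceptual sketch under the usual definitions (additive: the seasonal effect is a fixed amount added to the trend; multiplicative: the seasonal effect is a factor that scales with the trend). The function names and numbers are made up for this example and do not describe Graphite Note's internals.

```python
def additive(trend, seasonal_offsets):
    """Seasonal effect is added to the trend as a fixed amount."""
    return [t + s for t, s in zip(trend, seasonal_offsets)]


def multiplicative(trend, seasonal_factors):
    """Seasonal effect scales with the level of the trend."""
    return [t * f for t, f in zip(trend, seasonal_factors)]


trend = [100, 200, 400, 800]

# Additive: the seasonal swing stays at +/-10 units however large the trend gets.
additive_series = additive(trend, [10, -10, 10, -10])

# Multiplicative: the seasonal swing is +/-10% of the trend, so it grows with it.
multiplicative_series = multiplicative(trend, [1.1, 0.9, 1.1, 0.9])

print(additive_series)
print([round(v, 2) for v in multiplicative_series])
```

If the seasonal swings in a plot of your data stay roughly the same size over time, the additive mode is usually the better match; if they widen as the values grow, multiplicative is the more natural choice.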

For daily or hourly intervals, you can access **Advanced Parameters** to add special dates, weekends, holidays, or limit the target value. *** ### Advanced Parameters We are constantly enhancing our platform with new features and improving existing models. For your daily data, we've introduced some new capabilities that can significantly boost forecast accuracy. Now, you can limit your target predictions, remove outliers, and include country holidays and special events.

Selecting advanced parameters like special dates and holidays

To set prediction limits, enter the minimum and maximum values for your target variable. For example, if you're predicting daily temperatures and know the maximum is 40°C, enter that value to prevent the model from predicting higher temperatures. This helps the model recognize the appropriate range of the **Target Column**. Additionally, you can use the **Remove Days of the Week** feature to exclude certain days from your predictions. *** ### Country holidays and special dates We added parameters for country holidays and special dates to improve model accuracy. Large deviations can occur around holidays, where stores see more customers than usual. By informing the model about these holidays, you can achieve more balanced and accurate predictions. To add holidays in Graphite Note, navigate to the advanced section of the **Model Scenario** and select the relevant country or countries.

Similarly, you can add promotions or events that affect your data by enabling the **Add special dates** option. Enter the promotion name, start date, duration, and future dates. This ensures the model accounts for these events in future predictions.

Combining these parameters provides more accurate results. The more information the model receives, the better the predictions. *** ### Removing data points In addition to adding holidays and special events, you can delete specific data points from your dataset. In Graphite Note, enter the start and end dates of the period you want to remove. For single-day periods, enter the same start and end date. You can remove multiple periods if necessary. Understanding your data and identifying outliers or irrelevant periods is crucial for accurate predictions. Removing these dates can help eliminate biases and improve model accuracy.

By following these steps, you can harness the full potential of your **Timeseries Forecast Model**, providing valuable insights and more accurate predictions for your business. Now it's your turn to do some modeling and explore your results!

***

### Training model

After setting all parameters, it is time to **Run Scenario** and train the **Machine Learning** model.

Before you do, you can choose to toggle **Enable Model Results Dataset**. This option creates a dedicated dataset that compares the model’s historical predictions with actual values, enabling detailed seasonality, trend analysis, and performance evaluation. These results will be accessible in the Performance tab once the model has finished training.

*** The training duration may vary depending on the data volume, typically ranging from 1 to 10 minutes. The training will utilize 80% of the data to train various machine learning models and the remaining 20% to test these models and calculate relevant scores. Once completed, you will receive information about the best model and details about training time.

***

## Model Performance

To interpret the results after running your model, you will be taken to the Performance tab. Here, on the Overview screen, you can see the overall model performance post-training. Forecast evaluation metrics such as R-Squared, MAPE, MAE, and RMSE are displayed to assess the accuracy of the forecast. On the Performance tab, you can explore six different views that provide insights related to model training and results: Overview, [Model Fit](#model-fit), [Trend](#trend), [Seasonality](#seasonality), [Special Dates](#special-dates), and [Details](#details).

***

### Overview

The Overview screen provides a summary of your time series model’s forecasting accuracy. It displays key performance metrics to help you quickly assess how well the model predicts your target variable (e.g., *sales*). The following metrics are shown:

* R-Squared – Indicates how much of the variance in your target variable is explained by the model (closer to 100% is better).
* MAPE (Mean Absolute Percentage Error) – Shows the average percentage error between predicted and actual values (lower is better).
* MAE (Mean Absolute Error) – Represents the average absolute difference between predicted and actual values (lower is better).
* RMSE (Root Mean Squared Error) – Measures the average magnitude of the prediction error, giving more weight to large errors (lower is better).

These metrics offer a quick and easy way to understand how reliable your forecast is, before diving deeper into model fit, trends, seasonality, and special date effects.

***

### Model Fit

The **Model Fit Tab** displays a graph with actual and predicted values. The primary prediction is shown with a yellow line, and the uncertainty interval is illustrated with a yellow shaded area. This visualization helps assess the model's performance. If you used the **Sequence Identifier Field**, you can choose which value to analyze in each **Model Result**.
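As a rough illustration of the four Overview metrics (R-Squared, MAPE, MAE, RMSE), the sketch below computes them from a list of actual and predicted values using their standard textbook definitions. The function name `forecast_metrics` and the sample numbers are assumptions for this example; Graphite Note calculates these values for you, and its exact computation is not shown in this guide.

```python
from math import sqrt


def forecast_metrics(actual, predicted):
    """Standard definitions of R-squared, MAPE, MAE and RMSE.

    Hypothetical helper for illustration. MAPE is undefined when an
    actual value is zero, so this sketch assumes non-zero actuals.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n * 100

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    r2 = 1 - ss_res / ss_tot

    return {"r2": r2, "mape": mape, "mae": mae, "rmse": rmse}


metrics = forecast_metrics(
    actual=[100, 200, 300, 400],
    predicted=[110, 190, 310, 390],
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because RMSE squares each error before averaging, a single large miss raises RMSE more than it raises MAE, which is why comparing the two gives a hint about outlier errors.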

*** ### Trend Trends and seasonality are key characteristics of time-series data that should be analyzed. The **Trend Tab** displays a graph illustrating the global trend that Graphite Note has detected from your historical data.

*** ### Seasonality Seasonality represents the repeating patterns or cycles of behavior over time. Depending on your **Time Interval**, you can find one or two graphs in the **Seasonality Tab**. For daily data, one graph shows weekly patterns, while the other shows yearly patterns. For weekly and monthly data, the graph highlights recurring patterns throughout the year.

Seasonality graph showing patterns in historical data

*** ### Special Dates The Special Dates graph shows the percentage effects of the special dates and holidays in historical and future data. Special Dates are only supported for the Daily Time interval set in the scenario.

*** ### Details The **Details** tab shows the results of the predictive model, presented in a table format. Each record includes the predicted label, predicted probability, and predicted correctness, offering insights into the model's predictions, confidence, and accuracy for each data point. Dataset test results can be exported into Excel by clicking on the XLSX button in the right corner.

Model results in a dataset that can be exported to Excel

*** ## Take actions with Timeseries forecast After building and analyzing a time series forecasting model in Graphite Note, the Predict function allows you to apply the model to real-world use cases. This enables you to forecast future values based on historical patterns and known sequences, helping you plan more effectively and make proactive business decisions. There are two ways to use prediction in **Time Series Forecast** models: * [What-If Scenario Predictions](#what-if-scenario-predictions) * [API Predictions](#api-predictions) #### **What-If Scenario Predictions** In the Predict screen, you can select a specific sequence ID (such as a product, store, or region) and define a forecast period using start and end dates. After submitting your selection, Graphite Note generates a forecast for that time range, showing predicted values in tabular format. This is useful for: * Exploring how trends will evolve for specific time series * Planning inventory, staffing, or production based on expected values * Reviewing changes over different time periods or entities

{% hint style="info" %} ⚠️ Important: If your model includes additional regressors, predictions cannot be generated through the app interface. This is because the model requires future values for each regressor, which must be supplied at the time of prediction. To generate predictions with regressors, please use the Graphite Note Prediction API, where you can programmatically provide the necessary inputs. {% endhint %} #### API Predictions For time series forecasting models, Graphite Note also supports predictions via the Prediction API. This is ideal for integrating forecasts into dashboards, external systems, or automated workflows.

API request definition that can be used to set up API call from third party tool
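A time series prediction request is built around three inputs: `startDate`, `endDate`, and `sequenceID`. The Python sketch below only assembles and validates those three values. The helper name `build_forecast_request`, the ISO date format, and the flat dictionary layout are assumptions for illustration; the actual endpoint URL, authentication, and request envelope are not shown in this guide and should be taken from the Prediction API documentation.

```python
import json
from datetime import date


def build_forecast_request(start_date, end_date, sequence_id):
    """Assemble the three forecast inputs into a dictionary.

    Hypothetical helper: the field names come from this guide, but the
    surrounding request structure is NOT the documented API format.
    """
    start = date.fromisoformat(start_date)  # assumes YYYY-MM-DD input
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("endDate must not be earlier than startDate")
    return {
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "sequenceID": sequence_id,
    }


payload = build_forecast_request("2025-01-01", "2025-03-31", "Product A")
print(json.dumps(payload, indent=2))
```

Validating the date range before sending the request catches the most common input mistake early, without spending an API call on it.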

Instead of using the interface manually, you send a simple API request with three key inputs:

* startDate: The beginning of the forecast period
* endDate: The end of the forecast period
* sequenceID: The identifier for the time series you want to forecast (e.g., “Product A”)

This API makes it easy to fetch forecasts programmatically and use them wherever you need, whether that’s a planning tool, a custom app, or a business report. For full details, visit the [Prediction API documentation](https://docs.graphite-note.com/api/predict).

{% hint style="info" %} Timeseries predictions can be made only with API Prediction request v1. {% endhint %}

***

### Create Notebook

You can share your prediction results with your team using the Notebook feature. With Notebooks, users can also run their own predictions on your Timeseries Forecast model. Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the [Data Storytelling section](/graphite-note-documentation/notebooks/what-is-notebook).

# Binary Classification

## Model Scenario

With the **Binary Classification** model, you can analyze feature importance in a binary column with two distinct values. This model also predicts likely outcomes based on various parameters. To achieve optimal results, we'll cover the basics of the **Model Scenario**, where you will select parameters related to your dataset and the model itself.

*** ### Target feature To run the scenario, you need to have a **Target Feature**, which must be a binary column. This means it should contain only two distinct values, such as Yes/No or 1/0.

*** ### Model features In the next step, select the **Model Features** you wish to analyze. All features that fit into the model are selected by default, but you may deselect any features you do not want to use. Graphite Note automatically preprocesses your data for model training, excluding features that are unsuitable. You can view the list of excluded features and the reasons for their exclusion on the right side of the screen.

*** ### Advanced parameters The Advanced Parameters step in model creation allows users to fine-tune their model settings, enabling behavior similar to how a data scientist would approach the task. These parameters are designed for advanced customization, but for most users, it is recommended to leave the default settings as they are to ensure optimal performance. Users can explore and adjust these parameters to tailor the model to specific needs. For detailed explanations of the different advanced parameter settings, refer to the [Advanced Parameters](#advanced-parameters) section.

*** ### Actionable Insights Goal
The Generate Actionable Insights section allows users to enable the automatic generation of actionable insights based on model predictions, enhanced with the capabilities of generative AI. The insights are generated in the language specified in the [User profile information page](/graphite-note-documentation/account-and-team-setup/profile-information) under the AI Generated Content Language settings. You can activate this feature by checking the Generate Actionable Insights box. Once enabled, the system will use model predictions to create insights tailored to your needs. Specify the primary objective of the analytics by completing the Goal field. This includes choosing an action (e.g., “Increase” or “Decrease”) and the specific metric or outcome (e.g., “Churn”). These inputs guide the insights generation process. Additional Context is an optional field to provide extra details about your business, target audience, or specific focus areas. Examples might include demographics (e.g., focusing on age group 25-35) or market focus (e.g., targeting the European market). This helps align the generated insights with your business narrative.

*** ### Model Training Moving forward, you'll see a comprehensive list of preprocessing steps that Graphite Note will apply to prepare your data for training. This enhances data quality, ensuring your model produces accurate results. Typically, these steps are performed by data scientists, but with our no-code machine learning platform, Graphite Note handles it for you. After reviewing the preprocessing steps, you can finish and **Run Scenario** that will start model training.

The training duration may vary depending on the data volume, typically ranging from 1 to 10 minutes. The training will utilize 80% of the data to train various machine learning models and the remaining 20% to test these models and calculate relevant scores. Once completed, you will receive information about the best model based on the F1 value and details about training time.

*** ## Model Performance To interpret the results after running your model, go to the Performance tab. Here, you can see the overall model performance post-training. Model evaluation metrics such as F1 Score, Accuracy, AUC, Precision, and Recall are displayed to assess the performance of classification models. Details on model metrics can also be found on the [Accuracy Overview](#accuracy-overview) tab.

On the performance tab, you can explore seven different views that provide insights related to model training and results: [Overview](#overview), [Key Drivers](#key-drivers), [Impact Analysis](#impact-analysis), [Model Fit](#model-fit), [Accuracy Overview](#accuracy-overview), [Training Results](#training-results) and [Details](#details). *** ### Overview When you open the Performance tab, you land first on the Overview screen. This page gives you a quick snapshot of model quality—showing headline metrics (F1, Accuracy, AUC, Precision, Recall for classification, or R²/MAE for regression) along with a built-in Model Health Check that highlights class imbalance, data-set volume, stability, and possible leakage. For a deeper breakdown of every KPI and diagnostic panel, see the [Model Overview](/graphite-note-documentation/graphite-note-models/advanced-ml-model-settings/model-overview) section of the documentation.

Model Overview with Health check details

*** ### Key Drivers Key Drivers indicate the importance of each column (feature) for the Model's predictions. The higher the reliance of the model on a feature, the more critical it is. Graphite uses permutation feature importance to determine these values.
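To make the idea of permutation feature importance more concrete, here is a minimal Python sketch of the general technique: shuffle one feature at a time and measure how much the model's score drops. The toy `predict` rule, the sample rows, and the use of accuracy as the score are assumptions for this example; the scoring metric and implementation details Graphite Note uses are not described in this guide.

```python
import random


def permutation_importance(predict, rows, labels, n_repeats=5, seed=0):
    """Average drop in accuracy when one feature's values are shuffled.

    Hypothetical helper illustrating the general technique.
    """
    rng = random.Random(seed)

    def accuracy(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    importances = {}
    for feature in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[feature] for r in rows]
            rng.shuffle(shuffled)
            # Replace only this feature; all other columns stay untouched.
            permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(permuted))
        importances[feature] = sum(drops) / n_repeats
    return importances


def predict(row):
    """Toy stand-in for a trained model: short tenure means churn."""
    return "Yes" if row["tenure"] < 12 else "No"


rows = [
    {"tenure": t, "color": c}
    for t, c in [(1, "red"), (3, "blue"), (6, "red"), (9, "blue"),
                 (15, "red"), (24, "blue"), (36, "red"), (48, "blue")]
]
labels = [predict(r) for r in rows]

print(permutation_importance(predict, rows, labels))
```

Shuffling `color` never changes a prediction, so its importance is exactly zero, while shuffling `tenure` breaks the link between the feature and the outcome and lowers accuracy. The larger the drop, the more the model relies on that feature.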

*** ### Impact Analysis The Impact Analysis tab allows you to select various features and analyze, using a bar chart, how changes in each feature affect the target feature. You can switch between Count and Percentage views.

*** ### Model Fit The **Model Fit Tab** displays the performance of the trained model. It includes a stacked bar chart with percentages showing correct and incorrect predictions for binary values (1 or 0, Yes or No).

*** ### Accuracy Overview The **Accuracy Overview** tab features a **Confusion Matrix** to highlight classification errors, making it simple to identify if the model is confusing two classes. For each class, it summarizes the number of correct and incorrect predictions. Find out more about the [Classification Confusion Matrix](/graphite-note-documentation/understanding-machine-learning/machine-learning-concepts/confusion-matrix) in our Understanding ML section.

On the Accuracy Overview tab, you'll find detailed information on correct and incorrect predictions (True positives and negatives / False positives and negatives). Model metrics are explained at the bottom of the section.
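The relationship between the four confusion matrix counts and the headline classification metrics can be sketched in a few lines of Python. The formulas are the standard definitions of Accuracy, Precision, Recall, and F1; the function name and the example counts are made up for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive standard metrics from confusion matrix counts.

    tp/tn: true positives/negatives, fp/fn: false positives/negatives.
    Hypothetical helper for illustration.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

F1 is useful when classes are imbalanced, because a model that always predicts the majority class can score a high accuracy while its recall, and therefore its F1, stays low.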

*** ### Training Results In the **Training Results Tab**, you will find information about all the models automatically considered during the training process. Graphite ran several machine learning algorithms suitable for binary classification problems, using 80% of the data for training and 20% for testing. The best model, based on the F1 score, is chosen and marked in green in the models list.

*** ### Details **Details tab** shows the results of the predictive model, presented in a table format. Each record includes the predicted label, predicted probability, and predicted correctness, offering insights into the model's predictions, confidence, and accuracy for each data point. Dataset test results can be exported into Excel by clicking on the XLSX button in the right corner.

*** ## Take actions with Binary Classification Once the model is trained, you can use it to predict future values, solve binary classification problems, and drive business decisions. Here are ways to take action with your **Binary Classification** model: ### Actionable Insights If you enabled Generate Actionable Insights while defining your scenario, the trained model produces two distinct insight layers that appear under the Actionable Insights screen: * Strategic Summary – an executive-level brief that turns the model’s key drivers into clear business goals, KPIs, and evidence-based strategies. Use this narrative when you need to present findings to leadership, define high-level initiatives, or align cross-functional teams.
* Feature Insights – a driver-by-driver deep dive that shows feature importance, value-range multipliers (for example, “tenure 1–6 months increases churn 2.2×”), and plain-language recommendations for each range or category. Refer to this view when you want granular guidance for pricing, segmentation, or campaign design. Both tabs are generated automatically by Graphite Note’s generative-AI engine as soon as training completes. Open Actionable Insights to review them, then follow the suggestions to move from prediction to measurable business impact. For a detailed walkthrough of each tab, see the dedicated [*Actionable Insights*](#actionable-insights) documentation page.

Strategic Summary in Actionable Insights

*** ### Predict The Predict Screen allows users to generate predictions based on their trained model by providing input values for specific features. This screen bridges the gap between model insights and actionable application, enabling users to explore hypothetical situations or process large-scale predictions efficiently. There are three ways to use prediction in Graphite Note:
1. What-If Scenario Predictions, 2. CSV File or Dataset Predictions, 3. API Predictions. #### **What-If Scenario Predictions** You can manually input values for relevant features (e.g., tenure, TotalCharges, InternetService) to simulate specific scenarios. Once the values are entered, clicking the Predict button provides the predicted outcome along with a probability score for each possible result (e.g., Churn is Yes or Churn is No). Results are displayed as probability scores, giving users insights into the likelihood of different outcomes based on the input features.

#### **CSV File or Dataset Predictions** You can upload a CSV file containing data for multiple observations to generate predictions in bulk, or use an existing dataset already uploaded to Graphite Note for batch predictions.

#### API Predictions For binary classification models, you can generate predictions using the Graphite Note Prediction API, allowing you to integrate your model into external tools, websites, or automated workflows. Instead of using the platform manually, you send data to Graphite Note via an API request, and the system responds with a predicted outcome (e.g., *Yes* or *No*) along with the probability score. This is ideal for real-time decision-making, such as detecting churn, approving applications, or flagging risk. Graphite Note supports two versions of the Prediction API: * **v1 Endpoint** – Uses field aliases and is compatible with existing setups. * **v2 Endpoint (Recommended)** – Uses the actual column names from your dataset and is easier for new users to work with. For full details, visit the [Prediction API documentation](https://docs.graphite-note.com/api/predict).

*** ### Create Notebook You can share your prediction results with your team using the Notebook feature. With Notebooks, users can also run their own predictions on your Binary Classification model. Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the [Data Storytelling section](/graphite-note-documentation/notebooks/what-is-notebook).

# Multiclass Classification

## Model Scenario

With the Multiclass Classification model, you can analyze feature importance for a target column with 2-25 distinct values. Unlike binary classification, which deals with only two classes, multiclass classification handles multiple classes simultaneously. To achieve the best results, we will cover the basics of the **Model Scenario**. In this scenario, you choose parameters related to the dataset and the model.

*** ### Target feature To run the model, you need to select a **Target Feature** first. This target is the variable or outcome that the model aims to predict or estimate. The Target Feature should be a text-type column (not a numerical or binary column).

Selecting target feature in Multiclass model scenario

*** ### Model features In the next step, choose the **Model Features** you want the model to analyze. Graphite Note will automatically exclude some features that are not suitable for the model and will provide reasons for each exclusion.

*** ### Actionable Insights Goal The Generate Actionable Insights section allows users to enable the automatic generation of actionable insights based on model predictions, enhanced with the capabilities of generative AI. The insights are generated in the language specified in the [User profile information page](/graphite-note-documentation/account-and-team-setup/profile-information) under the AI Generated Content Language settings. You can activate this feature by checking the Generate Actionable Insights box. Once enabled, the system will use model predictions to create insights tailored to your needs. Specify the primary objective of the analytics by completing the Goal field. This includes choosing an action (e.g., “Increase” or “Decrease”) and the specific metric or outcome (e.g., frequency of “Revenue Class” to be "High"). These inputs guide the insights generation process. Additional Context is an optional field to provide extra details about your business, target audience, or specific focus areas. Examples might include demographics (e.g., focusing on age group 25-35) or market focus (e.g., targeting the European market). This helps align the generated insights with your business narrative.

*** ### Model training Moving forward, you'll see a comprehensive list of preprocessing steps that Graphite Note will apply to prepare your data for training. This enhances data quality, ensuring your model produces accurate results. Typically, these steps are performed by data scientists, but with our no-code machine learning platform, Graphite Note handles it for you. After reviewing the preprocessing steps, you can finish and **Run Scenario**.

Evaluation metrics showing overall model performance

On the Performance tab, you can explore seven different views that provide insights related to model training and results: [Overview](#overview), [Key Drivers](#key-drivers), [Impact Analysis](#impact-analysis), [Model Fit](#model-fit), [Accuracy Overview](#accuracy-overview), [Training Results](#training-results) and [Details](#details).

***

### Overview

When you open the Performance tab, you land first on the Overview screen. This page gives you a quick snapshot of model quality, showing headline metrics (F1, Accuracy, AUC, Precision, Recall for classification, or R²/MAE for regression) along with a built-in Model Health Check that highlights class imbalance, data-set volume, stability, and possible leakage. For a deeper breakdown of every KPI and diagnostic panel, see the [Model Overview](https://docs.graphite-note.com/graphite-note-documentation/models/advanced-ml-model-settings/model-overview) section of the documentation.

Key drivers in Multiclass Classification model

Impact analysis in Multiclass Classification model

Comparison of correct and incorrect predictions in Model Fit section

*** ### Accuracy Overview The **Accuracy Overview** tab features a **Confusion Matrix** to highlight classification errors, making it simple to identify if the model is confusing classes. For each class, it summarizes the number of correct and incorrect predictions. Find out more about the [Classification Confusion Matrix](https://docs.graphite-note.com/graphite-note-documentation/~/changes/E2AMshWWxt1bMPRlFt2a/understanding-machine-learning/confusion-matrix) in our Understanding ML section.

Confusion Matrix in Multiclass Classification

*** ### Training Results The Training Results tab displays every algorithm Graphite Note evaluated for your multiclass classification problem. In the example above, 75 % of the data was used for training and 25 % for testing. Each candidate model is shown with its key classification metrics—F1, Accuracy, AUC, MCC, Precision, and Recall—so you can compare performance at a glance. The model that scores highest on the primary metric (F1 by default) is highlighted in green and marked with a check icon. Selecting any row expands a Model Hyper Parameters panel, revealing the exact settings (learning rate, max depth, class weights, and more) used in that run. A Copy Parameters button lets you grab this JSON block for re-use or further tuning outside Graphite Note.

Training results with model Hyper Parameters

*** ### Details **Details tab** shows the results of the predictive model, presented in a table format. Each record includes the predicted label, predicted probability, and predicted correctness, offering insights into the model's predictions, confidence, and accuracy for each data point. Dataset test results can be exported into Excel by clicking the XLSX button in the right corner.

*** ## Take actions with Multiclass Classification Once the model is trained, you can use it to predict future values, solve multi-class classification problems, and drive business decisions. Here are ways to take action with your **Multiclass Classification** model: ### Actionable Insights If you enabled Generate Actionable Insights while defining your scenario, the trained model produces two distinct insight layers that appear under the Actionable Insights screen: * Strategic Summary – an executive-level brief that turns the model’s key drivers into clear business goals, KPIs, and evidence-based strategies. Use this narrative when you need to present findings to leadership, define high-level initiatives, or align cross-functional teams. * Feature Insights – a driver-by-driver deep dive that shows feature importance, value-range multipliers (for example, “tenure 1–6 months increases churn 2.2×”), and plain-language recommendations for each range or category. Refer to this view when you want granular guidance for pricing, segmentation, or campaign design. Both tabs are generated automatically by Graphite Note’s generative-AI engine as soon as training completes. Open Actionable Insights to review them, then follow the suggestions to move from prediction to measurable business impact. For a detailed walkthrough of each tab, see the dedicated *Actionable Insights* documentation page.

*** ### Predict After building and analyzing a predictive model using Graphite Note, the **Predict** function allows you to apply the model to new data. This enables you to forecast outcomes or target variables based on different feature combinations, providing actionable insights for decision-making. There are three ways to use prediction in Graphite Note:
1. What-If Scenario Predictions, 2. CSV File or Dataset Predictions, 3. API Predictions. #### **What-If Scenario Predictions** You can manually input values for relevant features (e.g., Price, Clarity, Cut) to simulate specific scenarios. Once the values are entered, clicking the Predict button provides the predicted outcome along with a probability score for each possible result. Results are displayed as probability scores, giving users insights into the likelihood of different outcomes based on the input features.

#### **API Predictions** You can also make predictions using the Graphite Note Prediction API, which allows you to connect your trained model with other tools or systems — like dashboards, web apps, or automated workflows. Instead of uploading data manually, you send it directly to Graphite Note through a simple API request. The API will return the model’s prediction along with the probability for each possible outcome. There are two versions of the Prediction API you can choose from: * **v1 Endpoint** – Uses field aliases and is great for keeping compatibility with older projects. * **v2 Endpoint** (Recommended) – Uses actual column names from your dataset, making it easier to understand and use. See how to get started in the [Prediction API documentation](/graphite-note-documentation/rest-api/prediction-api).

*** ### Create Notebook You can share your prediction results with your team using the Notebook feature. With Notebooks, users can also run their own predictions on your Multiclass Classification model. Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the [Data Storytelling section](https://docs.graphite-note.com/graphite-note-documentation/~/changes/E2AMshWWxt1bMPRlFt2a/notebooks-data-storytelling/what-is-notebook). # Regression ## Model Scenario A regression model in machine learning is a type of predictive model used to estimate the relationship between a dependent variable (target feature) and one or more independent variables. It aims to predict continuous outcomes by fitting a line or curve to the data points, minimizing the difference between observed and predicted values. To get the best possible results, we will go through the basics of the **Model Scenario**. In Model Scenario, you select parameters related to the dataset and model.

*** ### Target feature To run the model, you have to choose a **Target Feature** first. The target refers to the variable or outcome that the model aims to predict or estimate. In this case, it should be a numerical column.

Selecting target feature in Regression model scenario

*** ### Actionable Insights Goal The Generate Actionable Insights section allows users to enable the automatic generation of actionable insights based on model predictions, enhanced with the capabilities of generative AI. The insights are generated in the language specified in the [User profile information page](https://docs.graphite-note.com/graphite-note-documentation/account-and-team-setup/profile-information) under the AI Generated Content Language settings. You can activate this feature by checking the Generate Actionable Insights box. Once enabled, the system will use model predictions to create insights tailored to your needs. Specify the primary objective of the analytics by completing the Goal field. This includes choosing an action (e.g., “Increase” or “Decrease”) and the specific metric or outcome (e.g., average of “Monthly Charges” ). These inputs guide the insights generation process. Additional Context is an optional field to provide extra details about your business, target audience, or specific focus areas. Examples might include demographics (e.g., focusing on age group 25-35) or market focus (e.g., targeting the European market). This helps align the generated insights with your business narrative.

On the performance tab, you can explore six different views that provide insights related to model training and results: Overview, [Key Drivers](#key-drivers), [Impact Analysis](#impact-analysis), [Model Fit](#model-fit), [Training Results](#training-results) and [Details](#details). *** #### Overview Opening the Performance tab on a regression model takes you to Overview, where top-line fit metrics—R², MAPE, MAE, RMSE, and MSE—summarise accuracy and error at a glance. Just below, auto-generated text describes your dataset structure, target-column distribution, and the run parameters used, giving instant context for how those metrics were achieved. For deeper diagnostics, see the full [Model Overview](/graphite-note-documentation/graphite-note-models/advanced-ml-model-settings/model-overview) documentation.
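As a rough illustration of what the five Overview metrics measure, the sketch below computes them from paired actual and predicted values using their standard textbook definitions. The function name `regression_metrics` and the sample numbers are hypothetical and not part of Graphite Note; the platform calculates these values for you.

```python
import math

def regression_metrics(actual, predicted):
    """Return R2, MAPE, MAE, RMSE, and MSE for paired lists of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is 0; this sketch assumes none are.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n * 100
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    ss_res = sum(e * e for e in errors)                   # unexplained variation
    r2 = 1 - ss_res / ss_tot
    return {"R2": r2, "MAPE": mape, "MAE": mae, "RMSE": math.sqrt(mse), "MSE": mse}

# Illustrative values only
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```

R² closer to 1 and lower error values (MAPE, MAE, RMSE, MSE) indicate a better fit.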

*** #### **Key Drivers** Key Drivers indicate the importance of each column (feature) for the Model's predictions. The higher the reliance of the model on a feature, the more critical it is. Graphite uses permutation feature importance to determine these values.
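The idea behind permutation feature importance can be sketched in a few lines: shuffle one column at a time and measure how much the model's error grows. The code below is a minimal, generic illustration, not Graphite Note's internal implementation; `toy_model`, the error metric, and the sample data are assumptions made for the example.

```python
import random

def mean_abs_error(model, rows, targets):
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Average increase in error when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_abs_error(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(mean_abs_error(model, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical model that relies on feature 0 and ignores feature 1
def toy_model(row):
    return 3 * row[0]

rows = [[float(i), float((i * 7) % 5)] for i in range(10)]
targets = [3 * r[0] for r in rows]
importances = permutation_importance(toy_model, rows, targets)
```

Shuffling the ignored feature leaves the error unchanged, so its importance is zero, while shuffling the feature the model relies on increases the error.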

*** ### **Impact Analysis** The Impact Analysis tab allows you to select various features and analyze, using a bar chart, how changes in each feature affect the target feature. You can switch between Count and Percentage views.

*** ### **Model Fit** The **Model Fit Tab** displays the performance of the trained model. It includes a stacked bar chart with percentages showing comparison between known outcomes (historical) and model predicted outcomes.

*** ### **Training Results** The Training Results tab lists every regression algorithm Graphite Note tried during the automated training run. In the example above, 75 % of the data (5 282 rows) was used for training and 25 % (1 761 rows) for testing. For each candidate model you’ll see the key fit metrics side-by-side—R-Squared, MAPE, MAE, RMSE, and MSE—so you can compare accuracy and error at a glance. The model that delivers the best result for the primary metric (R-Squared by default) is shaded green and marked with a check icon. When you click a row, Graphite reveals the Model Hyper Parameters panel beneath the table, showing the exact settings (e.g., learning-rate, max\_depth, regularisation values) that produced the winning run. This makes it easy to audit, reproduce, or fine-tune your model outside the no-code environment if needed.

*** ### **Details** The **Details tab** shows the results of the predictive model, presented in a table format. Each record includes the predicted label, predicted probability, and predicted correctness, offering insights into the model's predictions, confidence, and accuracy for each data point. Dataset test results can be exported into Excel by clicking on the XLSX button in the right corner.

*** ## Take actions with Regression Once the model is trained, you can use it to predict continuous numerical outcomes and drive business decisions. Here are ways to take action with your **Regression** model: ### **Actionable Insights** If you enabled Generate Actionable Insights while defining your scenario, the trained model produces two distinct insight layers that appear under the Actionable Insights screen: * Strategic Summary – an executive-level brief that turns the model’s key drivers into clear business goals, KPIs, and evidence-based strategies. Use this narrative when you need to present findings to leadership, define high-level initiatives, or align cross-functional teams. * Feature Insights – a driver-by-driver deep dive that shows feature importance, value-range multipliers (for example, “tenure 1–6 months increases churn 2.2×”), and plain-language recommendations for each range or category. Refer to this view when you want granular guidance for pricing, segmentation, or campaign design. Both tabs are generated automatically by Graphite Note’s generative-AI engine as soon as training completes. Open Actionable Insights to review them, then follow the suggestions to move from prediction to measurable business impact. For a detailed walkthrough of each tab, see the dedicated [*Actionable Insights*](#actionable-insights) documentation page.

Actionable Insights in Regression with Strategic Summary and Feature Insights

*** ### **Predict** After building and analyzing a predictive model using Graphite Note, the **Predict** function allows you to apply the model to new data. This enables you to forecast outcomes or target variables based on different feature combinations, providing actionable insights for decision-making. There are three ways to use prediction in Graphite Note:
1. What-If Scenario Predictions, 2. CSV File or Dataset Predictions, 3. API Predictions. #### **What-If Scenario Predictions** You can manually input values for relevant features (e.g., tenure, PaymentMethod, Contract) to simulate specific scenarios. Once the values are entered, clicking the Predict button provides the predicted outcome as a numerical value.

#### API Predictions For regression models, you can also generate predictions using the Graphite Note Prediction API, which lets you connect your trained model to other tools, apps, or workflows. Instead of uploading files or entering data manually, you send the input values to Graphite Note through an API request, and the system returns the predicted numerical value based on your model — such as future sales, revenue, or energy consumption. There are two versions of the Prediction API available: * **v1 Endpoint** – Uses field aliases and is best for older setups. * **v2 Endpoint (Recommended)** – Uses actual column names from your dataset and is easier to work with, especially for new users. See how to get started in the [Prediction API documentation](/graphite-note-documentation/rest-api/prediction-api).

*** ### Create Notebook You can share your prediction results with your team using the Notebook feature. With Notebooks, users can also run their own predictions on your Regression model. Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the [Data Storytelling section](https://docs.graphite-note.com/graphite-note-documentation/~/changes/E2AMshWWxt1bMPRlFt2a/notebooks-data-storytelling/what-is-notebook). # General Segmentation ## Model Scenario With **General Segmentation,** you can uncover hidden similarities in data, such as the relationship between product prices and customer purchase histories. This unsupervised algorithm groups data based on similarities among numerical variables.

*** To run this model in Graphite, first identify an **ID column** to distinguish between values (e.g., customers or products within groups). Next, select the **numeric columns** (features) from your dataset for segmentation.

Now comes the tricky part: data preprocessing! We rarely encounter high-quality data, so we must clean and transform it for optimal model results. What should you do with **missing values**? Either remove them or replace them with relevant values, such as the mean or a prediction. Scale matters too. For instance, if you have chosen Age and Height as numeric columns, Age might range between 10 and 80, while Height could range from 100 to 210. The algorithm could prioritize Height due to its higher values. To avoid this, you should **transform/scale** your data; consider standardizing or normalizing it. Finally, you need to determine the **number of groups** you want to get. If you are not sure, Graphite will try to determine the best number of groups. The Model Results section below explains how to interpret the output.
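To make the scaling step concrete, here is a minimal sketch of the two common transformations mentioned above: standardization (z-scores) and min-max normalization. The helper names and sample values are illustrative assumptions; in Graphite Note you choose the transformation in the scenario settings rather than writing code.

```python
import statistics

def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def min_max(values):
    """Rescale to the range 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative values matching the Age and Height ranges in the text
ages = [10, 25, 40, 80]
heights = [100, 150, 180, 210]

ages_z = standardize(ages)
heights_z = standardize(heights)
ages_01 = min_max(ages)
```

After either transformation, Age and Height are on comparable scales, so neither column dominates the distance calculation simply because its raw numbers are larger.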

*** After reviewing all the steps, you can finish and **Run Scenario**. The training duration may vary depending on the data volume, typically ranging from 1 to 10 minutes. The training will utilize 80% of the data to train various machine learning models and the remaining 20% to test these models and calculate relevant scores. Once completed, you will receive information about the best model based on the F1 value and details about training time.

## Model Results Let's see how to interpret the results after we have run our model. The results consist of 5 tabs: [Cluster Summary](#cluster-summary), [By Cluster](#by-cluster-and-by-numeric-value), [By Numeric Value](#by-cluster-and-by-numeric-value), [Cluster Visualization](#cluster-visualization), and [Details](#details).

*** ### Cluster Summary As the model divided your data into clusters, a group of objects where objects in the same cluster are more similar to each other than to those in other clusters, it is essential to compare the average values of the variables across all clusters. That's why in the **Cluster Summary Tab** you can see the differences between the clusters through the graph.

For example, in the picture above, you can see that customers in Cluster2 have the highest average value of the Total spend, unlike the customers in Cluster0.
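The comparison shown in the Cluster Summary chart boils down to averaging each variable per cluster. A minimal sketch, assuming hypothetical column names (`cluster`, `total_spend`) and made-up values:

```python
from collections import defaultdict

def cluster_means(rows, cluster_key, value_key):
    """Average of one numeric column for each cluster label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[cluster_key]] += row[value_key]
        counts[row[cluster_key]] += 1
    return {c: totals[c] / counts[c] for c in totals}

# Illustrative rows only
customers = [
    {"cluster": "Cluster0", "total_spend": 120.0},
    {"cluster": "Cluster0", "total_spend": 80.0},
    {"cluster": "Cluster2", "total_spend": 900.0},
    {"cluster": "Cluster2", "total_spend": 1100.0},
]
means = cluster_means(customers, "cluster", "total_spend")
top_cluster = max(means, key=means.get)  # cluster with the highest average spend
```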

### By Cluster and By Numeric Value Wouldn't it be interesting to explore each cluster by a numeric value or each numeric value by a cluster? That's why we have the **By Cluster** and **By Numeric Value** tabs: each variable and cluster is summarized by its minimum and maximum, first and third quartiles, and similar statistics.
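The statistics in these tabs form the familiar five-number summary. The sketch below shows one way to compute it with Python's standard library; the `describe` helper is hypothetical, and the quartile method Graphite Note uses is not documented here, so the "inclusive" method is an assumption.

```python
import statistics

def describe(values):
    """Five-number summary: min, first quartile, median, third quartile, max."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}

summary = describe([1, 2, 3, 4, 5])
```

Applying `describe` to the values of one variable within one cluster reproduces the kind of breakdown these tabs display.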

*** ### Cluster Visualization The **Cluster Visualization Tab** shows the relationship between two measures and how the clusters are distributed across them. You can change the measures to see different clusters and their distribution.

*** ### Details Last but not least, on the **Details Tab**, you can find a detailed table where you can see all relevant values which were used for the above results.

With the right dataset and a few clicks, you will get results that will considerably help you in your business - general segmentation helps you in creating marketing and business strategies for each detected group. It's all up to you now, collect your data and start modeling. *** # RFM Customer Segmentation {% embed url="" %} Watch a video on how to build a RFM Customer Segmentation based on demo dataset {% endembed %} ### RFM Customer Model - How it Works? Our intelligent system observes customers' shopping behavior without getting into the nitty-gritty technical details. It watches how recently each customer made a purchase, how often they come back, and how much they spend. The system notices patterns and groups customers accordingly. This smart system doesn't need you to say, "Anyone who spends over $1000 is a champion." It figures out on its own who the champions are by comparing all the customers to one another. When we talk about 'champion' customers in the context of RFM analysis, we're referring to those who are the most engaged, recent, and valuable. The system's approach to finding these champions is quite intuitive yet sophisticated.

Here's how it operates: 1. **Observation**: Just like a keen observer at a social event, the system starts by watching—collecting data on when each customer last made a purchase (Recency), how often they've made purchases over a certain period (Frequency), and how much they've spent in total (Monetary). 2. **Comparison**: Next, the system compares each customer to every other customer. It looks for natural groupings—clusters of customers who exhibit similar purchasing patterns. For example, it might notice a group of customers who shop frequently, no matter the amount they spend, and another group that makes less frequent but more high-value purchases. 3. **Group Formation**: **Without being told what criteria to use**, the system uses the data to form groups. Customers with the most recent purchases, highest frequency, and highest monetary value start to emerge as one group—these are your potential 'champions.' The system does this by measuring the 'distance' between customers in terms of RFM factors, grouping those who are closest together in their purchasing behavior. 4. **Adjustment**: The system then iterates, refining the groups by moving customers until the groups are as distinct and cohesive as possible. It's a process of adjustment and readjustment, seeking out the pattern that best fits the natural divisions in the data. 5. **Finalization**: Once the system settles on the best grouping, it has effectively ranked customers, identifying those who are the most valuable across all three RFM dimensions. These are your 'champions,' but the system also recognizes other groups, like new customers who've made a big initial purchase or long-time customers who buy less frequently but consistently. By using this method, the system takes on the complex task of understanding the many ways customers can be valuable to a business. It provides a nuanced view that goes beyond simple categorizations, recognizing the diversity of customer value. 
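The observe, compare, group, and adjust steps above resemble the classic k-means clustering loop. The source does not name the algorithm Graphite Note uses, so the sketch below is only an assumption-based illustration: the function names are hypothetical, and the RFM values are made up and pre-scaled so that higher means better on every dimension.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]

def kmeans(points, centroids, max_iter=100):
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            # Move the centroid to the mean of its members; keep it if the group is empty
            updated.append(tuple(sum(d) / len(members) for d in zip(*members)) if members else old)
        if updated == centroids:  # groups stopped changing
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Illustrative (recency, frequency, monetary) values scaled to 0..1
points = [(0.9, 0.8, 0.9), (1.0, 0.9, 0.8), (0.1, 0.2, 0.1), (0.2, 0.1, 0.2)]
centroids, labels = kmeans(points, centroids=[points[0], points[2]])
```

The first two customers end up in one group (high on all three dimensions) and the last two in another, without any threshold being specified in advance.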
The result is a highly tailored strategy for customer engagement that aligns perfectly with the actual behaviors observed, allowing businesses to interact more effectively with each segment, especially the 'champions' who drive a significant portion of revenue. ### Why is ML better than rules-based segmentation? Here’s why this machine learning approach is more powerful than manual labeling: 1. **Adaptive Learning**: The system continuously learns and adapts based on actual behavior, not on pre-set rules that might miss the nuances of how customers are interacting right now. 2. **Time Efficiency**: It saves you a mountain of time. No more going through lists of transactions manually to score each customer. The system does it instantly. 3. **Personalized Grouping**: Because it’s based on actual behavior, the system creates groups that are tailor-made for your specific customer base and business model, rather than relying on broad, one-size-fits-all categories. 4. **Scalability**: Whether you have a hundred customers or a million, this smart system can handle the job. Manual scoring becomes impractical as your customer base grows. 5. **Unbiased Decisions**: The system is objective, based purely on data. There’s no risk of human bias that might categorize customers based on assumptions or incomplete information. In essence, this smart approach to customer grouping helps businesses focus their energy where it counts, creating a personalized experience for each customer, just like a thoughtful host at a party who knows exactly who likes what. It’s about making everyone feel special without having to ask them a single question. ### RFM Model Scores In the RFM model in Graphite Note, the intelligent system categorizes customers into segments based on their Recency (R), Frequency (F), and Monetary (M) values, assigning scores from 0 to 4 for each of these three dimensions. 
With five scoring options for each RFM category (including the '0' score), this creates a grid of 125 unique segments (5 options for R × 5 options for F × 5 options for M = 125 segments). This segmentation allows for a high degree of specificity. Each customer falls into a segment that reflects their interaction with the business. For example, a customer who recently made a purchase (high Recency score), buys often (high Frequency score), and spends a lot (high Monetary score) could fall into a segment scored as 4-4-4. This would indicate a highly valuable 'champion' customer. On the other hand, a customer who made a purchase a long time ago (low Recency score) and buys infrequently (low Frequency score), but spends a significant amount when they do buy (high Monetary score), might be scored as 0-0-4, placing them in a different segment that suggests a different engagement strategy. By scoring customers on a scale from 0 to 4 across all three dimensions, the business can pinpoint exact customer profiles. This precision allows for highly tailored marketing strategies. For example, those in the highest scoring segments might receive exclusive offers as a reward for their loyalty, while those in segments with room for growth might be targeted with re-engagement campaigns. The use of 125 segments ensures that the business can differentiate not just between generally good and poor customers, but between various shades of customer behavior, tailoring approaches to nurture the potential value of each unique segment. This granularity supports more nuanced decisions in marketing, sales, and customer relationship management. ### Model Scenario Wouldn't it be great to tailor your marketing strategy to the identified groups of customers? That way, you can target each group with personalized offers, increase profit, improve unit economics, etc. 
The [RFM](https://graphite-note.com/rfm-analysis-inspiring-stories-ecommerce-saas) [Customer Segmentation](https://graphite-note.com/machine-learning-for-customer-segmentation) Model groups customers based on **three key factors**: * **Recency** - how long it’s been since a customer bought something from you or visited your website * **Frequency** - how often a customer buys from you or visits your website * **Monetary** - the average spend of a customer per visit, or the overall transaction value in a given period Let's go through the RFM analysis inside Graphite Note. The dataset on which you run your RFM Model must contain a **time-related column**, because the analysis studies customer behavior over time.

We need to distinguish all customers, so we need an identifier variable like **Customer ID**.

If you have data about **Customer Names**, great; if not, don't worry, just select the same column as in the Customer ID field. Finally, choose the numeric variable against which customer behavior is measured, called **Monetary (amount spent)**.

That's it, you are ready to run your first RFM Model. ### RFM Model Results Now that we know how to run [RFM model analysis](https://www.investopedia.com/terms/r/rfm-recency-frequency-monetary-value.asp) in Graphite Note, let's go through the Model Results. The results consist of 7 tabs: [RFM Scores](#rfm-scores), [RFM Analysis](#rfm-analysis), [Recency](#recency), [Frequency](#frequency), [Monetary](#monetary), [RFM Matrix](#rfm-matrix), and [Details](#details). All results are visualized because a visual summary of information makes it easier to identify patterns than looking through thousands of rows.

### RFM Scores On the **RFM Scores Tab**, we have an overview of the customers and their scores:

Then you have a ranking of each RFM segment (125 of them) represented in a table.

And finally, a chart showing the number of customers per RFM score.

### RFM Analysis RFM model analysis ranks every customer in each of these three categories on a scale of 0 (worst) to 4 (best). After that, we assign an RFM score to each customer by concatenating their scores for [Recency](#recency), [Frequency](#frequency), and [Monetary](#monetary) value. Depending on their RFM score, customers can be grouped into the following categories: * lost customer * hibernating customer * can-not-lose customer * at-risk customer * about-to-sleep customer * need-attention customer * promising customer * new customer * potential loyal customer * loyal customer * champion customer. All information related to these groups of customers, such as the number of customers, average monetary, average frequency, and average recency per group, can be found in the **RFM Analysis Tab**.
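The concatenation step and the resulting 125 possible scores can be illustrated with a short sketch. The `rfm_score` helper is hypothetical; the hyphenated format simply mirrors scores such as 4-4-4 used elsewhere in this guide.

```python
from itertools import product

def rfm_score(recency, frequency, monetary):
    """Combine three 0-4 scores into a single RFM label such as '4-4-4'."""
    for score in (recency, frequency, monetary):
        if score not in range(5):
            raise ValueError("each score must be an integer from 0 to 4")
    return f"{recency}-{frequency}-{monetary}"

# Every possible combination of the three scores: 5 * 5 * 5 = 125
all_segments = [rfm_score(r, f, m) for r, f, m in product(range(5), repeat=3)]
champion = rfm_score(4, 4, 4)
```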

There is also a table at the end to summarize everything.

### Recency According to the Recency factor, which is defined as the number of days since the last purchase, we divide customers into 5 groups: * lost * lapsing * average activity * active * very active. In the **Recency Tab**, we observe the behavior of the above groups, such as the number of customers, average monetary, average frequency, and average recency per group.

### Frequency As Frequency is defined as the total number of purchases, customers can buy: * very rarely * rarely * regularly * frequently * very frequently. In the **Frequency Tab**, you can track the same behavior of the related groups as in the [Recency](#recency) Tab.

### Monetary Monetary is defined as the amount of money the customer spent, so the customer can be a: * very low spender * low spender * medium spender * high spender * very high spender. In the **Monetary Tab**, you can track the same behavior of the related groups as in the [Recency](#recency) Tab.
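Using the three definitions above (days since last purchase, total number of purchases, and amount spent), the raw RFM values can be derived from a transaction list as sketched below. The tuple layout, the `rfm_values` helper, and the sample data are assumptions for illustration; Graphite Note derives these values from the columns you select in the Model Scenario.

```python
from datetime import date

def rfm_values(transactions, as_of):
    """transactions: list of (customer_id, purchase_date, amount) tuples."""
    summary = {}
    for customer_id, purchase_date, amount in transactions:
        entry = summary.setdefault(
            customer_id, {"last": purchase_date, "frequency": 0, "monetary": 0.0}
        )
        entry["last"] = max(entry["last"], purchase_date)
        entry["frequency"] += 1       # total number of purchases
        entry["monetary"] += amount   # total amount spent
    return {
        cid: {
            "recency_days": (as_of - e["last"]).days,  # days since last purchase
            "frequency": e["frequency"],
            "monetary": e["monetary"],
        }
        for cid, e in summary.items()
    }

# Illustrative transactions only
transactions = [
    ("C1", date(2024, 1, 5), 50.0),
    ("C1", date(2024, 3, 1), 70.0),
    ("C2", date(2023, 11, 20), 200.0),
]
rfm = rfm_values(transactions, as_of=date(2024, 3, 31))
```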

### RFM Matrix The **RFM Matrix Tab** represents a matrix, showing the number of customers, monetary sum and average, average frequency, and average recency (with breakdown by [Recency](#recency), [Frequency](#frequency), and [Monetary](#monetary) segments).

### Details All the values related to the previous tabs, and much more, can be found on the **Details Tab**, in the form of a table.

The RFM model columns outlined in your system provide a structured way to understand and leverage customer purchase behavior. Here’s how each column benefits the end user of the model: 1. **Monetary**: Indicates the total revenue a customer has generated. This helps prioritize customers who have contributed most to your revenue. 2. **Avg\_monetary**: Shows the average spend per transaction. This can be used to gauge the spending level of different customer segments and tailor offers to match their spending habits. 3. **Frequency**: Reflects how often a customer purchases. This can inform retention strategies and indicate who might be receptive to more frequent communication. 4. **Recency**: Measures the time since the last purchase. This can help target re-engagement campaigns to customers who have recently interacted with your business. 5. **Date\_of\_last\_purchase** & **Date\_of\_first\_purchase**: These dates help track the customer lifecycle and can trigger communications at critical milestones. 6. **Customer\_age\_days**: The duration of the customer relationship. Long-standing customers might benefit from loyalty programs, while newer customers might be encouraged with welcome offers. 7. **Recency\_cluster**, **Frequency\_cluster**, and **Monetary\_cluster**: These categorizations allow for segmentation at a granular level, helping customize strategies for groups of customers who share similar characteristics. 8. **Rfm\_cluster**: This overall grouping combines recency, frequency, and monetary values, offering a holistic view of a customer's value and engagement, essential for creating differentiated customer journeys. 9. **Recency\_segment\_name**, **Frequency\_segment\_name**, and **Monetary\_segment\_name**: These descriptive labels provide intuitive insights into customer behavior and make it easier to understand the significance of each cluster for strategic planning. 10. 
**Fm\_cluster\_sum**: This score is a combined metric of frequency and monetary clusters, useful in prioritizing customers who are both frequent shoppers and high spenders. 11. **Fm\_segment\_name** and **Rfm\_segment\_name**: These labels offer a quick reference to the type of customer segment, simplifying the task of identifying and applying targeted marketing actions. ### Testing the Model: Seeking assurance about the model's accuracy and effectiveness? Here's how you can address these concerns: 1. **Validation with Historical Data**: Show how the model’s predictions align with actual customer behaviors observed historically. For instance, demonstrate how high RFM scores correlate with customers who have proven to be valuable. 2. **Segmentation Analysis**: Analyze the characteristics of customers within each RFM segment to validate that they make sense. For example, your top-tier RFM segment should clearly consist of customers who are recent, frequent, and high-spending. 3. **Control Groups**: Create control groups to test marketing strategies on different RFM segments and compare the outcomes. This can validate the effectiveness of segment-specific strategies suggested by the model. ### Building Trust in the Model: 1. **A/B Testing**: Implement A/B testing where different marketing approaches are applied to similar customer segments to see which performs better, thereby showcasing the model's utility in identifying the right targets for different strategies. 2. **Benchmarking**: Compare the RFM model’s performance against other segmentation models or against industry benchmarks to establish its effectiveness. # Customer Lifetime Value {% embed url="" %} Watch a video on how to build a Customer Lifetime Value model based on demo dataset {% endembed %} ## Introduction Detecting early signs of reduced customer engagement is pivotal for businesses aiming to maintain loyalty. 
A notable signal of this disengagement is when a customer's once regular purchasing pattern starts to taper off, leading to a significant decrease in activity. Early detection of such trends allows marketing teams to take swift, proactive measures. By deploying effective retention strategies, such as offering tailored promotions or engaging in personalized communication, businesses can reinvigorate customer interest and mitigate the risk of losing them to competitors. Our objective is to utilize a model that not only alerts us to customers with an increased likelihood of churn but also forecasts their potential purchasing activity and, importantly, estimates the total value they are likely to bring to the business over time.

These analytical needs are served by what is known in data science as **Buy 'Til You Die** (BTYD) models. These models track the lifecycle of a customer's interaction with a business, from the initial purchase to the last. While customer churn models are well-established within contractual business settings, where customers are bound by the terms of service agreements, and churn risk can be anticipated as contracts draw to a close, **non-contractual environments** present a different challenge. In such settings, there are no defined end points to signal churn risk, making traditional classification models insufficient.

To address this complexity, our model adopts a probabilistic approach to customer behavior analysis, which does not rely on fixed contract terms but on behavioral patterns and statistical assumptions. By doing so, *we can discern the likelihood of future transactions for every customer, providing a comprehensive and predictive understanding of customer engagement and value*. #### Customer Lifetime Value - How it Works? The Customer Lifetime Value (CLV) model is a robust tool employed to ascertain the projected revenue a customer will contribute over their entire relationship with a business. The model employs historical data to inform predictive assessments, offering valuable foresight for strategic decision-making. This insight assists companies in prioritizing resources and tailoring customer engagement strategies to maximize long-term profitability. The CLV model executes a series of sophisticated calculations. Yet, its operations can be conceptualized in a straightforward manner: 1. **Historical Analysis**: The model comprehensively evaluates past customer transaction data, noting the frequency and monetary value of purchases alongside the tenure of the customer relationship. 2. **Engagement Probability**: It assesses the likelihood of a customer’s future engagement based on their past activities, effectively estimating the chances of a customer continuing to transact with the business. 3. **Forecasting**: With the accumulated data, the model projects the customer’s future transaction behavior, predicting how often they will make purchases and the potential value of these purchases. 4. **Lifetime Value Calculation**: Integrating these elements, the model calculates an aggregate figure representing the total expected revenue from a customer for a designated future period. 
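The four steps above can be mimicked with a deliberately simplified calculation. This is not the probabilistic BTYD model Graphite Note runs: the function `simple_clv`, its formula, and the input values are assumptions chosen only to show how history, engagement probability, and a forecast horizon combine into a single value.

```python
def simple_clv(purchases, total_spend, tenure_days, p_alive, horizon_days):
    """Rough expected revenue over a future horizon (simplified stand-in)."""
    purchase_rate = purchases / tenure_days        # 1. historical purchases per day
    avg_order_value = total_spend / purchases      # 1. historical spend per purchase
    # 2-3. scale the historical rate by the chance the customer is still active
    expected_purchases = p_alive * purchase_rate * horizon_days
    # 4. expected revenue for the chosen period
    return expected_purchases * avg_order_value

# Illustrative inputs: 10 purchases worth 500 over 200 days, 80% likely still active
clv_90 = simple_clv(purchases=10, total_spend=500.0, tenure_days=200,
                    p_alive=0.8, horizon_days=90)
```

With these inputs the customer is expected to make about 3.6 purchases in the next 90 days at an average of 50 each, or roughly 180 in revenue.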
*** ## Model Scenario To set up the model, you’ll need to configure a few key fields: * Time/Date Column: Select the column that contains the date of each transaction (e.g., invoice date, order date). This tells the model when each customer activity occurred. * Customer ID: Choose a column that uniquely identifies each customer. This ensures that all purchases are correctly grouped under the same customer. * Customer Name *(optional)*: If your dataset includes customer names, you can select this field to display names in the results. If not, simply use the same column as Customer ID. * Monetary (amount spent): Select a numeric column that shows the amount spent per transaction. This is the main metric used to calculate lifetime value. * Starting Date for CLV Calculation: Decide from which point in time you’d like to start calculating the customer lifetime value. You can choose: * Max Date: Uses the latest date available in your dataset. * Today: Starts the CLV calculation from the current date. * Custom Date: Manually select any specific date to serve as the calculation start point.
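The three Starting Date options differ only in which reference date they produce. A minimal sketch, with hypothetical option keys (`max_date`, `today`, `custom`) standing in for the choices in the interface:

```python
from datetime import date

def clv_start_date(option, transaction_dates, custom_date=None, today=None):
    """Resolve the CLV calculation start date for one of the three options."""
    if option == "max_date":
        return max(transaction_dates)      # latest date in the dataset
    if option == "today":
        return today or date.today()       # current date
    if option == "custom":
        if custom_date is None:
            raise ValueError("the custom option requires custom_date")
        return custom_date                 # manually chosen date
    raise ValueError(f"unknown option: {option}")

# Illustrative dates only
dates = [date(2024, 1, 5), date(2024, 3, 1), date(2023, 11, 20)]
start_max = clv_start_date("max_date", dates)
start_custom = clv_start_date("custom", dates, custom_date=date(2024, 6, 1))
```

Max Date is usually the natural choice for historical datasets that stop well before the current date, since Today would treat the gap since the last record as customer inactivity.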

*** ## Model Results Once you’ve run your CLV scenario, the Results tab will present a comprehensive view of how valuable your repeat customers are, how long they are expected to remain active, and what you can expect in future revenue. The Results section is split into four core views: * [Overview](#overview) * [CLV Insights](#clv-insights) * [Segments](#segments) * [Details](#details) Each view gives you unique perspectives on your customer base and revenue predictions. Below is a guide for interpreting each one. *** ### Overview The Overview tab gives you a high-level summary of your repeat customers and their expected future behavior, all in one place.

Overview tab with high level summary and KPIs

At the top, you’ll see key metrics that describe the health and value of your customer base: * Total Repeat Customers – how many customers made more than one purchase * Total Historical Amount – the total revenue generated by these repeat customers * Average Spend per Customer – average revenue per repeat customer * Average No. of Repeat Purchases – how many purchases each customer made on average * Average Probability Alive (Next 90 Days) – how likely they are to stay active in the near future * Predicted No. of Purchases and Amount (Next 90 Days) – expected orders and revenue in the next three months * CLV – estimated lifetime value per customer
These numbers give you a snapshot of customer loyalty, revenue contribution, and future potential.

Below that, in the **Deep Dive into Customer Behavior** section, you’ll find a breakdown of how these insights are calculated. It includes values like total revenue per customer, number of purchases, average spend, customer age, and the predicted likelihood of staying active.

The Overview helps you better understand which customers are most engaged, most valuable, or at risk, so you can take action where it matters most.

***

### CLV Insights

This tab offers time-based forecasts in visual form to help you understand how customer behavior and value change over time. Three main charts are presented:

**Forecasted Number of Purchases**

* A line chart showing how many purchases are expected in 7, 30, 60, 90, and 365 days.
* The curve reflects expected buying activity, helping you estimate short- and long-term customer engagement. A steeper curve suggests strong future buying behavior, while a flatter line indicates less projected activity.

The forecasted number of purchases chart in CLV Insights

**Forecasted Amount** * This chart mirrors the number of purchases, but focuses on revenue. It shows how much your customers are likely to spend within the same time intervals: 7, 30, 60, 90, and 365 days. * This helps you plan revenue forecasts and guide sales or marketing campaigns based on expected spend. For example, a sharp increase toward 365 days may indicate strong long-term potential.

The forecasted amount chart in CLV Insights

**Average Alive Probability**

* This chart illustrates the average probability that a customer remains active (i.e., still engaged and likely to purchase again) over time, assuming no repeat purchases.
* The curve shows how this probability gradually declines as time passes without interaction. For example, if the probability drops significantly by 365 days, it signals a potential churn issue in the long run.

The average alive probability chart in CLV Insights

Together, these insights give you a forward-looking view of how much value your existing customers will bring in both volume and revenue, so you can plan smarter. *** ### Segments The Segments tab provides a visual breakdown of your customer base according to their likelihood of remaining active.
* The Customer Segmentation Based on Probability Alive pie chart groups customers into five risk levels (Very Low Risk, Low Risk, Medium Risk, High Risk, and Very High Risk) based on their probability of staying active. This segmentation helps you easily identify which customers are most likely to churn and which are loyal.
* The Customer Churn Risk Analysis bar chart simplifies this further by splitting customers into two groups using a 50% probability threshold: those more likely to stay and those more likely to leave. This chart offers a quick snapshot of retention risk across your entire repeat customer base.

Customer Segmentation Based on Probability Alive in Segments tab

Together, these visuals help you prioritize your marketing and retention efforts by highlighting which segments need attention and which are driving your recurring revenue. *** ### Details The Details tab presents a full breakdown of all customers used in the Customer Lifetime Value (CLV) model, showing individualized metrics for each one. This table allows you to explore how the model calculates and segments customer lifetime behavior on a granular level.

Each row represents one customer, and each column offers specific insight into their engagement, loyalty, spending behavior, and predicted future value. You can filter or search by any column to find customers of interest (e.g., high spenders, churn risks, new users, etc.). **Key Columns Explained:** Here are some of the most relevant fields available in the table: * amount\_sum - Total amount spent historically by the customer. * amount\_count – Number of total purchases made. * repeated\_frequency – How many of those purchases were repeat purchases. * customer\_age – Age of the customer in days since their first transaction. * average\_monetary – Average amount spent per transaction. * probability\_alive – Current probability that the customer is still active (1 = 100%). * probability\_alive\_segment – Segment grouping based on risk (e.g., Very Low Risk, High Risk). * probability\_alive\_50\_perc – Whether the customer’s probability to stay is above or below 50%. * predicted\_no\_purchases\_7\_30\_60\_90\_365 – Forecasted number of purchases the customer is likely to make in the next 7, 30, 60, 90, or 365 days. * CLV\_30\_60\_90\_365 – Estimated monetary value the customer is likely to bring in the next 30, 60, 90, or 365 days. * avg\_days\_between\_purchases – Average time between each purchase. * days\_since\_last\_purchase – Time since their last transaction. Each metric helps you evaluate customer loyalty, frequency, predicted churn, and future potential revenue. You can export this data into Excel for further analysis or integrate it with marketing and CRM workflows. Because each row represents one individual customer, and all the key metrics are already calculated (such as purchase frequency, churn risk, and expected revenue), you can immediately use this information to guide your next steps. 
*** #### Using Details to Understand Individual Customer Behavior The Customer Lifetime Value (CLV) model helps you see beyond averages by offering detailed insights into each customer’s unique behavior. Let’s compare two very different shopping profiles found in model results.

**Customer 1 – Loyal and Consistent** This customer shows clear signs of long-term engagement: * They’ve been active for nearly 4 years (1,474 days), * Have made 7 purchases, of which 6 are repeat transactions, * And contributed over €1,100 in total revenue. What stands out most is their very high probability of continued activity—over 98%, both now and over the next 7 days. The predicted Customer Lifetime Value (CLV) over the next year is also significant. This customer is a clear example of high loyalty, high predictability, and strong recurring value.

**Customer 2 – Occasional and Unpredictable** By contrast, this customer represents a much riskier profile: * Despite being in the system for 3 years (1,097 days), they’ve only made 2 purchases, * With just 1 repeat, * And a total spend of around €380. Even though their average transaction value is quite high, their behavior is infrequent and highly unpredictable. The model assigns them a low probability of being active (33%), both currently and in the upcoming week. Their 1-year CLV forecast is modest, and they fall into a high-risk segment.

**What This Tells Us**

This comparison highlights how businesses can tailor engagement strategies:

* For Customer 1, the goal is retention and appreciation: they’ve proven loyal and should be nurtured with rewards or exclusive offers to maintain momentum.
* For Customer 2, the opportunity lies in re-engagement: offering incentives or personalized outreach could help turn a sporadic buyer into a more regular one.

By understanding these behavioral patterns, teams can move beyond surface metrics and take targeted, meaningful actions to increase long-term customer value.

***

## Take actions with CLV

The true power of Customer Lifetime Value (CLV) modeling lies not only in understanding historical customer behavior but in using that understanding to shape future strategy. Just as binary classification allows us to distinguish between likely churners and loyal customers, CLV reveals how much value each individual customer is expected to generate and how likely they are to remain engaged. Here’s how to utilize the information effectively:

* **Custom Segmentation**: Use `customer_age`, `amount_sum`, and `average_monetary` to segment your customers into meaningful groups.
* **Detect Churners**: Use `probability_alive` to identify which customers are still active in non-contractual businesses such as eCommerce and retail. A score of 0.1 means a 10% probability that the customer is still active ("alive") for your business.
* **Targeted Marketing Campaigns**: Leverage the `repeated_frequency` and `probability_alive` columns to identify customers for loyalty programs or re-engagement campaigns.
* **Revenue Projections**: The `CLV_30_60_90_365` column helps in projecting future revenue and understanding the long-term value of customer segments.
* **Strategic Planning**: Use `predicted_no_purchases_7_30_60_90_365` to plan for demand, manage stock, and set realistic sales targets.
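As an illustration of how these columns might be combined once the Details table has been exported, here is a small Python sketch. The rows and the `customer_id` key are made up. `CLV_365` is a hypothetical name for the 365-day CLV column, since the exact exported column names may differ. The two-repeat cut-off for win-back candidates is also an assumption. Only the 50% activity threshold comes from the Segments tab described above.

```python
# Hypothetical rows exported from the Details tab (values are invented).
details = [
    {"customer_id": "A", "probability_alive": 0.98, "repeated_frequency": 6, "CLV_365": 410.0},
    {"customer_id": "B", "probability_alive": 0.33, "repeated_frequency": 1, "CLV_365": 60.0},
    {"customer_id": "C", "probability_alive": 0.45, "repeated_frequency": 4, "CLV_365": 150.0},
]

ALIVE_THRESHOLD = 0.5   # 50% split, as in the Customer Churn Risk Analysis chart
MIN_REPEATS = 2         # assumed cut-off for "worth winning back"


def split_by_activity(rows, threshold=ALIVE_THRESHOLD):
    """Split customers into likely-active and at-risk groups.

    Assumption: a probability exactly at the threshold counts as active.
    """
    likely_active = [r for r in rows if r["probability_alive"] >= threshold]
    at_risk = [r for r in rows if r["probability_alive"] < threshold]
    return likely_active, at_risk


likely_active, at_risk = split_by_activity(details)

# Re-engagement list: at-risk customers who have bought repeatedly before.
win_back = [r["customer_id"] for r in at_risk if r["repeated_frequency"] >= MIN_REPEATS]

# Simple revenue projection: sum of 365-day CLV over likely-active customers.
projected_revenue = sum(r["CLV_365"] for r in likely_active)

print(win_back, projected_revenue)
```

The same filtering can be done in a spreadsheet or BI tool; the point is that each row already carries the metrics needed to build a campaign list.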
By engaging with the columns in the Details Tab, users can extract actionable insights that can drive strategies aimed at optimizing customer lifetime value. Each metric can serve as a building block for a more nuanced, data-driven approach to customer relationship management. *** ### Actionable Insights The Actionable Insights section translates complex CLV model results into clear, strategic takeaways. It identifies which customer segments pose the greatest revenue risk due to low retention probabilities and quantifies the financial impact of improving retention. By simulating small increases in customer engagement across segments, it reveals the potential uplift in revenue and highlights the most effective levers—such as win-back offers, onboarding nudges, or tailored campaigns. This enables businesses to prioritize actions where they’ll generate the most value, turning predictive insights into measurable impact.

*** ### Model Results API All CLV model results can be retrieved programmatically using the Model Results API. This is especially helpful when: * You want to analyze CLV outputs or take actions in external tools (BI or CRM) * You need to trigger automated actions based on CLV scores—for example, enrolling a “Very High Risk” customer in a retention campaign. The API returns structured outputs for each customer, including total spend, purchase frequency, risk segment, and predicted future value—enabling custom workflows, real-time applications, or dashboard integrations. For implementation details, refer to the [Model Results API documentation](/graphite-note-documentation/rest-api/model-results-api).

*** ### Notebooks With the Notebook feature in Graphite Note, you can present CLV model outputs in a compelling, shareable format. These notebooks are ideal for team collaboration, reporting, or embedding into partner-facing portals. Notebooks act as your storytelling layer—bringing data to life and turning predictions into strategy. For more information, refer to the [Data Storytelling section](/graphite-note-documentation/notebooks/what-is-notebook). *** # Customer Cohort Analysis ## Model Scenario Do you want to understand how customer behavior changes over time, identify key trends, or measure the long-term impact of your business decisions? The Customer Cohort Analysis model in Graphite Note helps you do exactly that by grouping customers based on their first interaction or shared characteristics, and tracking their behavior over time.

A cohort is a group of customers who share a common attribute, most commonly the time of their first purchase. This model enables you to analyze how these cohorts perform over time, such as how often they return, how much they spend, and when their engagement drops off. This is a powerful tool for identifying growth opportunities, evaluating retention strategies, and improving customer lifetime value. *** #### Setting Up the Model Creating a Customer Cohort Analysis in Graphite Note is straightforward. Here’s what you need to configure: * **Time/Date Column:** Select a time-based column (e.g., order date or signup date) to define when customers enter a cohort. * **Aggregation Level:** Choose how to group the data over time—monthly, weekly, or daily. For example, selecting monthly will track cohorts by the month they made their first purchase. * **Customer ID & Transaction ID:** These are required. Customer ID identifies unique customers, while Transaction ID (Order ID) helps track purchases. * **Monetary Column:** Select a column that represents the monetary value (e.g., total amount spent) for each transaction. This is the key metric used to evaluate customer behavior. * **Optional Breakdown (RepeatBy):** You can break down cohort analysis by a business dimension (e.g., region, product category) by enabling the Repeat by option. This is available for variables with fewer than 20 unique values.

Once you’ve configured the settings, your Customer Cohort model is ready to run. *** ## Model Results The results of your cohort analysis are divided into three tabs: [Cohorts](#cohorts), [Repeat by](#repeat-by), and [Details](#details).

***

### Cohorts

This tab displays heatmaps that show how different customer cohorts behave over time. You can switch between several metrics.

#### Results representation

For each metric in the Cohort Analysis (like Amount, Percentage, or Number of Customers), the results are shown in two formats to help you better understand your customer behavior over time:

* **Line Chart (on top):** Shows how each cohort (group of customers that started in the same period) performs over time. Each line represents one cohort, and the chart helps you quickly spot trends, for example, how spending or retention drops off or grows.
* **Cohort Table with Heatmap (below):** Shows the exact values for each cohort and time period (e.g., Year 0, Year 1, Year 2…). The color intensity makes it easy to see where values are high or low, helping you spot strong-performing or weak-performing cohorts at a glance.

Results representation in Customer Cohorts

Together, these two visuals give a complete picture: a bird’s-eye trend view and detailed numbers in one place.

***

#### **Metrics**

**Percentage:** This metric shows you, in percentages, how many people from each group (cohort) are still active in the months or years after their first purchase, compared to how many started in that group. For example, let’s say 1,000 people made their first purchase in January 2022. If 300 of them came back and purchased again in February, the percentage shown for that month would be 30%. If only 150 came back in March, it would show 15%.

**Number of Customers:** Shows how many customers from each cohort made repeat purchases in subsequent time periods. Imagine a group of people who all made their first purchase in 2018. The chart then shows how many of them continued to buy again in the years that followed. For example, in 2019, a portion of that 2018 group returned and made another purchase. In 2020, a smaller part came back again, and so on. This pattern helps you understand how long people stay active after their first order and how engagement drops or changes over time for each starting group.

**Amount:** This metric shows how much money each group of customers (cohort) spent in each time period after their first purchase. For example, you can see how much the 2018 cohort spent in their first year, then how much they spent in the second year, and so on. It helps you understand how spending behavior changes over time: do customers keep buying, or does spending drop off?

**Cumulative Amount:** Instead of showing how much was spent in each individual year, this view adds it all up across the years. It tells you the total amount each cohort has spent from their first year up to the current period. This way, you can track the long-term value of each customer group and compare which cohorts brought in more total revenue over time.

**Average Order Value / Revenue Per Customer:** This metric looks at how much revenue each customer brings on average. It’s especially useful when you want to compare the quality of cohorts, not just their size. For example, even if two groups have the same number of customers, one may bring in more revenue per person. This helps you identify high-value customer groups and refine your marketing or sales strategies.
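To make the Percentage metric concrete, the sketch below computes it from a handful of orders in Python. It is an illustration under stated assumptions, not the platform's code: the sample orders and the `months_between` and `cohort_retention` helpers are hypothetical, cohorts are defined by the month of first purchase, and monthly aggregation is used.

```python
from collections import defaultdict

# Hypothetical orders: (customer_id, (year, month) of the order).
orders = [
    ("C1", (2022, 1)), ("C2", (2022, 1)), ("C3", (2022, 1)), ("C4", (2022, 1)),
    ("C1", (2022, 2)), ("C2", (2022, 2)),
    ("C1", (2022, 3)),
    ("C5", (2022, 2)), ("C5", (2022, 3)),
]


def months_between(start, end):
    """Number of whole months from one (year, month) to another."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1])


def cohort_retention(orders):
    """Return {(cohort, month_offset): percent of the cohort that was active}."""
    # A customer's cohort is the month of their first purchase.
    first_purchase = {}
    for customer_id, period in orders:
        if customer_id not in first_purchase or period < first_purchase[customer_id]:
            first_purchase[customer_id] = period

    # Distinct customers active in each (cohort, offset) cell.
    active = defaultdict(set)
    for customer_id, period in orders:
        cohort = first_purchase[customer_id]
        active[(cohort, months_between(cohort, period))].add(customer_id)

    table = {}
    for (cohort, offset), customers in active.items():
        cohort_size = len(active[(cohort, 0)])  # everyone is active at offset 0
        table[(cohort, offset)] = 100.0 * len(customers) / cohort_size
    return table


retention = cohort_retention(orders)
print(retention[((2022, 1), 1)])  # share of the January cohort active in February
```

Replacing the customer count in each cell with a sum of order amounts would give the Amount metric, and a running total of those sums per cohort would give the Cumulative Amount.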

*** ### Repeat By If you enabled the Repeat by option when setting up the model, this tab will show separate cohort analyses for each value in the selected variable (e.g., each country or product type). This allows you to explore cohort behavior within specific segments of your business.

Repeat by with different Categories that are shown separately

*** ### Details All numerical results from the Cohorts and Repeat by tabs are available here in table format. You can export this data for further analysis, reporting, or dashboard integration.

*** ## Take actions with Customer Cohort Analysis ### Uncover patterns Customer Cohort Analysis helps you go beyond basic reporting by uncovering long-term behavioral patterns across customer groups. Once the model is trained, you can use it to answer important questions about customer retention, spending habits, and lifecycle performance. Here’s how you can take action with Customer Cohort Analysis in Graphite Note: * Track Retention Over Time: Understand how many customers from each cohort return in subsequent periods. This helps evaluate how well your business retains users and where drop-offs occur. * Evaluate Monetization Strategies: See how much each cohort is spending over time. You can compare cohorts to understand whether recent strategy changes (like pricing, discounts, or onboarding) are improving lifetime value. * Spot Trends and Anomalies: Use the visual charts to identify changes in customer behavior. Are newer cohorts spending less or more than older ones? Are there periods of rapid drop-off? Use these insights to adapt your customer journey. * Compare Segments with “Repeat By”: Analyze how different customer segments (like product category, region, or channel) behave over time. This allows for fine-tuned marketing and retention strategies targeted at specific groups. * Optimize Remarketing Timing: By identifying when customer engagement typically drops off, you can time your outreach campaigns to re-engage users right when it matters most.

By analyzing how customer behavior evolves over time, Customer Cohort Analysis turns your raw transaction data into clear, actionable decisions that improve retention, increase revenue, and sharpen your business strategy. ### Create Notebooks You can share your prediction results with your team using the Notebook feature. Notebooks allow you to create various visualizations with detailed descriptions. You can plot model results for better understanding and enable users to make their own predictions. For more information, refer to the [Data Storytelling section](/graphite-note-documentation/notebooks/what-is-notebook). # ABC Pareto Analysis ## Model Scenario Often companies spend a lot of time managing items/entities that have a low contribution to the profit margin. Every item/entity inside your shop does not have equal value - some of them cost more, some are used more frequently, and some are both. This is where the [ABC Pareto analysis](https://en.wikipedia.org/wiki/Pareto_principle) steps in, which helps companies to **focus on the right items/entities.**

ABC analysis is a classification method in which items/entities are divided into three categories: A, B, and C.

* Category A is typically the smallest category and consists of the most important items/entities ('the vital few').
* Category C is the largest category and consists of the least valuable items/entities ('the trivial many').

To create the analysis, you need to define two parameters:

* ID column - This represents the unique identifier or description of each entity being analyzed, such as a product ID or product name.
* Numeric column - This is a measurable value used to categorize items into A, B, or C classes based on their relative importance. Common metrics include total sales volume, revenue, or usage frequency.

Choosing parameters for ABC Analysis scenario

***

## Model Results

Since ABC inventory analysis divides items into three categories, let's analyze these categories by checking the Model Results. The results consist of four tabs: [Overview](#overview), [ABC Summary](#abc-summary), [Pareto Chart](#pareto-chart), and [Details](#details).

### Overview

The Overview tab provides an actionable summary that supports data-driven decision-making by focusing on high-impact areas within the dataset. You’ll find a structured breakdown of entities within a chosen dimension (e.g., product\_id), categorized based on a specific metric (e.g., price). This analysis highlights the contributions of different entities, focusing on the most impactful ones.

Key highlights in the Overview tab include:

* Category Breakdown: The dimension is divided into three categories:
  * **Category A:** Top contributors representing few entities with a large share of the total metric.
  * **Category B:** Mid-range contributors with moderate impact and growth potential.
  * **Category C:** The largest group with the least individual impact.
* ABC Analysis Process: Explanation of sorting entities, calculating cumulative totals, and dynamically determining category boundaries based on cumulative contributions.
* Benefits and Next Steps: Highlights key points of the analysis. Encourages reviewing the Pareto Chart for visual insights, exploring detailed metrics, and identifying high-impact entities for strategic action.

***

### ABC Summary

* The left chart shows the percentage of entities in each category (A, B, and C), illustrating how they are divided within the selected dimension (product\_id).
* The right chart highlights each category’s contribution to the total metric (freight\_price), showing how a smaller portion of entities (Category A) accounts for the majority of the impact, while the larger portion (Category C) has a lesser effect.

Together, these charts emphasize the purpose of ABC Analysis: to identify the “vital few” entities (Category A) that drive the most value, supporting targeted decision-making.

In the picture above, we can see that 33.77% of the items belong to category A and they represent 50.55% of the total value, meaning the biggest profit comes from the items in category A! *** ### Pareto Chart The [ABC analysis](https://en.wikipedia.org/wiki/ABC_analysis), also called Pareto analysis, is based on the Pareto principle, which says that 80% of the results (output) come from 20% of the efforts (input). The **Pareto Chart** is a combination of a bar and a line graph - it contains both bars and lines, where each bar represents an item/entity in descending order, while the height of the bar represents the value of the item/entity. The curved orange line represents the cumulative percentage of the item/entity.
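The sorting and cumulative-percentage logic behind the Pareto chart can be sketched in a few lines of Python. This is an illustrative example only. Graphite Note determines category boundaries dynamically, whereas the sketch assumes fixed, conventional cut-offs of 80% and 95% of the cumulative total. The `abc_classify` function and the sample values are hypothetical.

```python
def abc_classify(values, a_cutoff=80.0, b_cutoff=95.0):
    """Rank entities by value and assign A/B/C from the cumulative share.

    values: dict mapping entity ID -> numeric metric.
    a_cutoff / b_cutoff: assumed cumulative-percentage boundaries.
    Returns a list of (entity, value, cumulative_pct, category) tuples,
    ordered from the largest value to the smallest.
    """
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)

    result = []
    running = 0.0
    for entity, value in ranked:
        running += value
        cumulative_pct = 100.0 * running / total
        if cumulative_pct <= a_cutoff:
            category = "A"
        elif cumulative_pct <= b_cutoff:
            category = "B"
        else:
            category = "C"
        result.append((entity, value, cumulative_pct, category))
    return result


# Hypothetical revenue per product.
revenue = {"P1": 500, "P2": 300, "P3": 100, "P4": 60, "P5": 40}
for row in abc_classify(revenue):
    print(row)
```

The bars of the Pareto chart correspond to the sorted values and the line to `cumulative_pct`. With different boundaries, the same data would split differently, which is why the shares reported in the ABC Summary need not match an 80/20 rule.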

***

### Details

The Details tab provides a granular view of the dataset resulting from the ABC Analysis. Each row represents an entity along with the following key details:

* The metric used for categorization, indicating each entity’s contribution.
* The category assigned to each entity (A, B, or C) based on its relative impact.
* The cumulative percentage contribution of each entity to the total freight price, showing its share within the dataset.

This detailed breakdown allows users to identify specific high-impact entities in Category A, moderate contributors in Category B, and lower-impact entities in Category C, supporting data-driven prioritization and decision-making.

There is a long list of benefits from including ABC analysis in your business, such as improved inventory optimization and forecasting, reduced storage expenses, strategic pricing of the products, etc. With Graphite, all you have to do is upload your data, create the desired model, and explore the results. # New vs Returning Customers ## Model Scenario In this report, we want to divide customers into returning and new customers (this is the most fundamental type of [customer segmentation](https://graphite-note.com/machine-learning-for-customer-segmentation)). The new customers have made only one purchase from your business, while the returning ones have made more than one.

Let’s go through their basic characteristics. New customers are: * forming the foundation of your customer base * telling you if your marketing campaigns are working (improving current offerings, what to add to your repertoire of products or services) while returning customers are: * giving you feedback on your business (if you have a high number of returning customers it suggests that customers are finding value in your products or service) * saving you a lot of time, effort, and money. Let's go through the New vs returning customer analysis inside Graphite. The dataset on which you will run your model must contain a **time-related column**.

Since the dataset contains data for a certain period, it's important to choose the **aggregation level**.

For example, if weekly aggregation is selected, Graphite will generate a new vs returning customers dataset with a weekly frequency. The dataset must also contain a **Customer ID** column.

Additionally, if you want, you can choose the **Monetary (amount spent)** variable.

With Graphite, compare absolute figures and percentages, and learn how many customers you are currently retaining on a daily, weekly, or monthly basis.

## Model Results

The model results consist of four tabs: [New vs Returning](#new-vs-returning), [Retention %](#retention), [Revenue New vs Returning](#revenue-new-vs-returning), and [Details](#details).

### New vs Returning

Depending on the aggregation level, you can see the number of new and returning customers detected in the period on the **New vs Returning Tab**.

Chart showing New and Returning customers in a period in a monthly representation

For example, in December 2020, there were a total of 2.88K customers, of which 1.84K were new and 1.05K were returning. You can also choose a daily representation, which is more granular.

Chart showing New and Returning customers in a period in a daily representation

### Retention % If you are interested in retention, the percentage of your returning customers, through a period, use the **Retention % Tab**.

Chart showing the percentage of returning customers in a period in a monthly representation
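The counts and the retention percentage shown in these tabs can be approximated with a short Python sketch. It is a simplified illustration, not the platform's implementation. It assumes a customer counts as new in the period of their first purchase and as returning in any later period, and that Retention % is the share of returning customers among all customers active in the period. The sample orders and the `new_vs_returning` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical orders: (customer_id, period as "YYYY-MM"), monthly aggregation.
orders = [
    ("C1", "2020-11"), ("C2", "2020-11"),
    ("C1", "2020-12"), ("C3", "2020-12"), ("C4", "2020-12"),
    ("C1", "2020-12"),  # a second order in the same month counts once
]


def new_vs_returning(orders):
    """Return {period: {"new": n, "returning": n, "retention_pct": x}}."""
    # Period of each customer's first purchase ("YYYY-MM" sorts chronologically).
    first_period = {}
    for customer_id, period in orders:
        if customer_id not in first_period or period < first_period[customer_id]:
            first_period[customer_id] = period

    # Distinct customers active in each period.
    customers_by_period = defaultdict(set)
    for customer_id, period in orders:
        customers_by_period[period].add(customer_id)

    report = {}
    for period in sorted(customers_by_period):
        customers = customers_by_period[period]
        new = sum(1 for c in customers if first_period[c] == period)
        returning = len(customers) - new
        report[period] = {
            "new": new,
            "returning": returning,
            "retention_pct": 100.0 * returning / len(customers),
        }
    return report


report = new_vs_returning(orders)
print(report["2020-12"])
```

Switching the period key from month to week or day would reproduce the other aggregation levels.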

### Revenue New vs Returning

The results in the **Revenue New vs Returning Tab** depend on the Model Scenario: if you have selected a monetary variable in the [Model Scenario](#model-scenario), you can observe how it behaves for new and returning customers.

Chart showing revenue from new and returning customers in a period in a monthly representation

### Details Last but not least, on the **Details Tab**, you can find a detailed table where you can see all relevant values which were used for the above results.

# Basket Analysis ### About **Model** In many retail and e-commerce environments, customers rarely purchase a single item. Instead, they buy groups of items that naturally belong together. Understanding these hidden relationships is invaluable for improving store layout, driving cross-sell and upsell strategies, designing bundles, and planning promotions.

Basket Analysis uncovers these meaningful combinations. It identifies which items frequently appear together in the same transaction and highlights strong associations among them. The model returns three key insights: * **Frequent Items:** Which products appear most often. * **Pair Rules:** Which two items tend to be purchased together. * **Triplet Rules:** Which three-item combinations frequently co-occur. Basket Analysis is powered by association-rule mining and uses standard industry metrics—**support**, **confidence**, and **lift**—to measure the strength of product relationships. The goal is simple:\ Help companies understand what customers naturally buy together so they can take immediate action to increase revenue, improve merchandising, and optimize stock decisions. *** ### **Required Columns** To run a Basket Analysis, your dataset must include three required fields:

#### **1. Order ID** A unique identifier representing each customer transaction or basket.\ Examples: invoice number, bill number, transaction ID. #### **2. Items** The name or identifier of the purchased product in each row.\ Every row represents *one item inside a specific transaction*. #### **3. Item Quantities** The number of units sold for that item within the transaction. Graphite Note automatically handles multiple units, duplicates inside the same order, and cleans the dataset before analysis. *** ### **Advanced Settings** Basket Analysis includes configurable thresholds that balance model performance, speed, and insight quality. These parameters allow users to control how rare or frequent an itemset must be to be considered relevant.

#### **Step 1: Ignore Very Rare Items** This parameter filters out items that appear in only a tiny fraction of transactions. * **Why it matters:**\ Very rare items introduce noise and lead to extremely large rule sets with little business value. * **Allowed range:** 0.000001 – 0.05 * **Example:** A value of 0.00005 (0.005%) means any item appearing in less than 0.005% of transactions will be excluded. Higher thresholds produce faster, cleaner results. *** #### **Step 2: Minimum Basket Frequency** This determines how common a pair or triplet must be to be analyzed. * **Why it matters:**\ Setting a minimum ensures you focus on patterns with meaningful sample size. * **Allowed range:** 0.0001 – 0.20 * **Example:** 0.001 (0.1%) means a pair must appear in at least 0.1% of all orders. Lower values allow rare but potentially interesting combinations to surface.\ Higher values return only very frequent itemsets. *** #### **Step 3: Rule Strength Metric** You can choose how association strength is evaluated: * **Lift** (default): Measures how much more likely items appear together than by random chance. * **Confidence:** Probability that B is purchased when A is purchased. Use Lift when you want reliable cross-sell signals.\ Use Confidence when evaluating predictability. **Minimum Rule Strength** filters out weak relationships. * **Lift range:** 1–50 * **Confidence range:** 0.05–1.0 *** #### **Exclude Items (Optional)** You can remove specific items from the analysis (e.g., shipping fees, packaging material). Enter comma-separated, quoted names.\ Example: `"SERVICE FEE", "PACKAGING"`. *** ## **Model Results** Once the scenario runs, Basket Analysis provides four structured results sections: *** ### **Overview**

The Overview page summarizes your dataset after cleaning and transformation. It includes: * **Transactions Analyzed** – Total number of baskets processed. * **Total Unique Items (before cleaning)** – All item types found in the dataset. * **Items Kept After Cleaning** – Items retained after removing ultra-rare entries. * **Average Basket Size** – Typical number of items per transaction. * **Pruning Ratio** – Percentage of items removed due to rarity thresholds. This gives you a clear view of data quality and model scope before diving into pair and triplet patterns. *** ### **Frequent Items** This tab highlights the **Top Items by Support**, showing which products appear most often across all transactions.

Support is defined as:

> **Support = (Number of transactions containing the item) ÷ (Total transactions)**

This view is useful for:

* Identifying top sellers
* Planning replenishment and stock levels
* Understanding core product demand patterns

Each listed item displays:

* The number of orders it appears in
* Its support percentage
* A visual bar indicator for quick comparison

***

### **Pair Rules**

Pair Rules uncover strong two-item associations.\
This section answers:\
**“If a customer buys item A, which item B do they also tend to buy?”**

Each rule includes: * **Occurrences** – How many transactions contain both items. * **Confidence** – Probability B appears when A appears. * **Support** – How common this pair is across all transactions. * **Lift** – How much stronger the association is compared to random chance. High-lift rules indicate meaningful cross-sell opportunities—ideal for: * Recommender systems * Combo deals * Checkout suggestions * Shelf placement optimization *** ### **Triplet Rules** Triplet Rules identify combinations of **three items** that frequently appear together.\ This reveals natural bundles and multi-product purchasing patterns.

For each triplet, you see: * **Occurrences across transactions** * **Confidence of the three-item combination** * **Support percentage** * **Lift score**, showing the strength of the trio relationship Triplet rules are often used to design bundles, seasonal sets, curated gift packs, or promotions around complementary items. *** ### **Details** The Details tab provides full access to: * Frequency tables for items * All pairwise rules (in sortable table form) * All triplet rules * Internal metrics used in rule creation * Filtered itemsets after pruning * Support and confidence thresholds applied

This is the best place for analysts who want granular, technical output. *** ## **Actionable Insights** Graphite Note automatically generates an **Executive Strategy Brief**, converting the statistical results into clear business recommendations. The brief includes: #### **1. Executive Summary** A plain-language overview of what the analysis revealed, including key product relationships and opportunities. #### **2. Metrics Explained (for Business Users)** Definitions and clear examples of: * **Support** * **Confidence** * **Lift** Written so non-technical stakeholders can understand what makes a rule strong or weak.
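To make the three metrics concrete, here is a minimal sketch that computes support, confidence, and lift for item pairs using the standard textbook definitions. The function name `pair_rules`, the output field names, and the sample baskets are illustrative assumptions; this is not Graphite Note's internal implementation.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets):
    """Return support, confidence, and lift for every ordered pair rule A -> B.

    Hypothetical helper for illustration only.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicate items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n  # share of all baskets containing both items
        for x, y in ((a, b), (b, a)):
            confidence = both / item_counts[x]        # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append({
                "if": x, "then": y, "occurrences": both,
                "support": support, "confidence": confidence, "lift": lift,
            })
    return rules


# Made-up baskets, purely for demonstration.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread"],
    ["milk"],
]
rules = pair_rules(baskets)
for rule in rules:
    print(rule)
```

In this toy data, butter appears in two baskets and both contain bread, so the rule butter → bread has a confidence of 1.0. Its lift is above 1 because bread appears in only three of the four baskets overall.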

#### **3. What We Analyzed**

Overview of dataset filtering, thresholds applied, and the focus of the model (frequent items, pairs, triplets).

#### **4. Recommended Actions**

Graphite Note generates ready-to-implement steps such as:

* Cross-sell suggestions
* Bundle ideas
* Merchandising improvements
* Promotional opportunities
* Product placement strategies

All recommendations are grounded directly in the discovered rules.

***

## **API Access**

You can retrieve model results programmatically through the **Model Results API**.\
This allows automated ingestion into dashboards, recommendation engines, or custom applications.

#### **Python Example**

```python
import requests

# Endpoint for fetching model results
url = "https://app.graphite-note.com/api/model/fetch-result/"

headers = {
    "Authorization": "Bearer ",
    "Content-Type": "application/json"
}

# Request the first page of results, 100 records per page
payload = {
    "page-size": 100,
    "page": 1
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())
```

The API returns paginated data, including:

* Frequent items
* Pair rules
* Triplet rules
* All model metadata
* All computed metrics

This enables seamless integration into downstream systems.

# Advanced ML model settings

The Advanced ML Model Settings section in Graphite Note is designed for users who want to go beyond the basics. Whether you’re optimizing a forecast, enriching your model with external signals, or exploring deeper insights from your data, these advanced tools help you unlock the full potential of machine learning without writing a single line of code.

This section covers:

* [Actionable Insights](/graphite-note-documentation/graphite-note-models/advanced-ml-model-settings/actionable-insights)\
  Enable strategic and feature-level AI-generated insights to guide real business decisions.
* [Advanced Parameters](/graphite-note-documentation/graphite-note-models/advanced-ml-model-settings/advanced-parameters)\
  Fine-tune internal model behavior (e.g., handling of outliers, binning strategies, or custom filters).
* [Model Health Check](/graphite-note-documentation/graphite-note-models/advanced-ml-model-settings/model-overview)\
  Get automatic diagnostics and visual feedback to evaluate how well your model is performing.
* [Regressors](/graphite-note-documentation/graphite-note-models/advanced-ml-model-settings/regressors)\
  Add external variables (e.g., marketing spend, weather, events) to improve time series accuracy.

Explore each subpage to learn how these features work, when to use them, and how they can boost the performance and interpretability of your ML models in Graphite Note.

# Actionable insights

The Actionable Insights feature in Graphite Note provides tailored, data-driven recommendations generated using Generative AI. It is available for binary classification, multiclass classification, and regression models. Once the model is trained, Graphite Note automatically generates two layers of insight, one for strategic planning and one for data-driven operational decisions.

To generate Actionable Insights in Graphite Note, you must enable them during the model scenario definition process.

These insights are available in two dedicated tabs:

* [Strategic Summary](#strategic-summary-tab) – executive-level narrative and high-level strategy
* [Feature Insights](#feature-insights-tab) – field-level exploration of key drivers and metrics

Actionable Insights definition in model Scenario

#### Strategic Summary Tab The Strategic Summary tab provides a high-level, narrative-style document that synthesizes the most important findings from your model into a clear business strategy. This view is designed for business leaders, strategists, and cross-functional teams. This automatically generated report includes: * A summary of top feature drivers and patterns influencing your target outcome (e.g., churn, conversion, revenue) * A clear, high-level business goal based on the model output * A set of Strategic Directions — each linked to specific feature behaviors, explaining the “why” and “how” behind trends * Concrete, evidence-based Actionable Insights that suggest changes to pricing, packaging, service tiers, or customer engagement * A table of Objectives, Goals, and KPIs aligned with each strategic recommendation * Root-cause analysis frameworks (e.g., Five Whys, Impact/Effort Matrix, JTBD) that structure the logic behind the recommendations * Observations about anomalies or outliers found in the data distribution {% hint style="info" %} *Use this tab when you want to present insights to leadership, define strategic initiatives, or translate ML outputs into decisions that align with your company’s goals.* {% endhint %} #### Feature Insights Tab The Feature Insights tab takes you deeper into the behavior of individual features and their statistical relationship with the model’s output. Each feature includes: * Feature importance (impact on prediction) * Distribution analysis (e.g., skewness, groupings, bins) * Impact multipliers – how much each range or category increases or decreases likelihood of the target outcome * Narrative interpretation – human-readable explanations of what’s driving the behavior and suggestions for mitigation or action Example: * *TotalCharges*: High churn risk found in low and very high ranges, suggesting pricing and customer value strategies. 
* *Contract*: Month-to-month contracts show the highest churn risk, guiding strategy toward longer-term offerings. {% hint style="info" %} *Use this tab when you need granular insights for optimizing pricing, targeting specific customer groups, or building campaigns based on churn/engagement signals.* {% endhint %} *** ### How Actionable Insights Work: • Actionable insights are automatically generated during the model training process if this option is enabled. • Once the model is trained, the results are presented on the Actionable Insights Tab, offering users prescriptive analytics tailored to their specific business needs. • These insights analyze the key drivers that most significantly influence the target outcome (e.g., churn, customer segmentation, or sales trends).

Actionable Insights with Strategic Summary tab

*** ### Benefits of Actionable Insights: • Understand Key Drivers: Gain a clear understanding of which factors have the greatest impact on your predictions, such as customer tenure, spending patterns, or product features. • Actionable Recommendations: Receive specific, practical strategies to address identified trends, such as improving customer retention or targeting the right customer segments. • Business Alignment: Tailored narratives help you align insights with your business goals, ensuring data-driven actions that lead to measurable improvements. *** ### Language Customization: The language used in the actionable insights can be adjusted via the [User ](/graphite-note-documentation/account-and-team-setup/profile-information)[profile information](/graphite-note-documentation/account-and-team-setup/profile-information) page, allowing users to receive insights in their preferred language for enhanced understanding. # Advanced parameters The Advanced Parameters section in Graphite Note provides users with the ability to fine-tune their machine learning models for [Binary classification](/graphite-note-documentation/graphite-note-models/machine-learning-models/binary-classification), [Multiclass classification](/graphite-note-documentation/graphite-note-models/machine-learning-models/multiclass-classification), and [Regression](/graphite-note-documentation/graphite-note-models/machine-learning-models/regression) tasks. These parameters mimic the adjustments a data scientist would make to optimize model performance.

While advanced parameters offer flexibility and control, changes to these settings can significantly impact model training and behavior. Users are advised to adjust them cautiously and only with a clear understanding of their effects. *** ### Training Dataset Size * Description: Specifies the proportion of the dataset to be used for training the model, while the remaining portion is reserved for testing. For example, a value of 0.75 means 75% of the data is used to train the model, and 25% is used to evaluate its performance. * Default Value: 0.75 (75% training and 25% testing). * Impact: Adjusting the training dataset size affects the balance between model learning and evaluation: • A higher training size (e.g., 0.85) gives the model more data to learn from, which can improve its ability to recognize patterns. However, it leaves less data for testing, which may limit the ability to accurately assess how well the model will perform on new data. • A lower training size (e.g., 0.6) reserves more data for testing, providing a better evaluation of the model’s generalization to unseen data. However, this reduces the data available for training, which might result in a less accurate model. Choosing the right balance ensures the model has enough data to learn effectively while leaving sufficient data for reliable testing and validation.
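The effect of the Training Dataset Size value can be sketched with a plain shuffled split. This is an illustrative example only: the function name `split_train_test` and the fixed `seed` are assumptions, and Graphite Note's own splitting logic (for example, whether it stratifies by class) is not described here.

```python
import random


def split_train_test(rows, train_size=0.75, seed=42):
    """Shuffle rows and split them into training and testing portions.

    Hypothetical helper for illustration only.
    """
    if not 0 < train_size < 1:
        raise ValueError("train_size must be between 0 and 1 (exclusive)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    cut = int(len(shuffled) * train_size)   # index separating train from test
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 dataset rows
train, test = split_train_test(rows, train_size=0.75)
print(len(train), len(test))
```

With 100 rows and a value of 0.75, 75 rows go to training and 25 are held back for testing. Raising the value to 0.85 would leave only 15 rows for evaluation.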

*** ### Algorithms to Run * **Description:** A list of machine learning algorithms that will be evaluated and compared during model training. The available algorithms depend on the type of task, with separate sets of algorithms for Regression and Classification (Binary and Multiclass). * *Regression algorithms:* Linear Regression, Ridge Regression, Decision Tree, Random Forest, Support Vector Machine, Light Gradient Boosting Machine, K-Nearest Neighbors

* *Binary and Multiclass Classification algorithms:* K-Nearest Neighbors, Decision Tree, Random Forest, Logistic Regression, LightGBM, Gradient Boosting Classifier, AdaBoost, Multi-Layer Perceptron.

Algorithms you can run for Binary and Multiclass Classification

* **Impact:** By choosing from these algorithms, users can experiment to identify the best-performing model for their specific use case. Selecting an appropriate algorithm based on the task type ensures optimal results and efficient model training. *** ### Sort Models By * **Description:** The Sort Models By option allows users to rank classification models (binary and multiclass) based on specific evaluation metrics after training. This helps users identify the best-performing model for their specific goals. Note that this option is available only for classification tasks and is not applicable to regression models.
Users can sort models by the following metrics: • Accuracy: Measures the proportion of correct predictions among all predictions. • AUC (Area Under the Curve): Indicates how well the model distinguishes between classes; higher values indicate better performance. • F1 Score: The harmonic mean of precision and recall, balancing both metrics. • Precision: The proportion of correctly predicted positive cases out of all positive predictions. • Recall: The proportion of actual positives correctly identified by the model. * **Default Value:** The default metric for sorting is F1 Score, as it balances precision and recall, making it suitable for many classification tasks.
* **Impact:** Sorting models allows users to prioritize and identify the best-performing model based on a specific metric that aligns with their business or project needs. For instance: • If minimizing false negatives is critical, users might prioritize Recall. • If balancing precision and recall is essential, F1 Score would be a better choice.
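As a rough illustration of how these sorting metrics relate to each other, the sketch below derives Accuracy, Precision, Recall, and F1 Score from true and predicted labels. The function name and the sample labels are made up for demonstration. AUC is omitted because it needs predicted probabilities rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem.

    Hypothetical helper for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here the model misses two of the four actual positives, so Recall (0.5) is lower than Precision (about 0.67). This is the kind of gap that matters when false negatives are costly.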

*** ### Probability Threshold * **Description:** Sets the decision threshold for classifying probabilities in binary classification models. For example, if the threshold is set to 0.5, the model will classify predictions with a probability above 50% as “positive” and below 50% as “negative.” Note that this option is available only for classification tasks and is not applicable to regression models. * **Default Value:** 0.5. * **Impact:** Adjusting the threshold changes how the model makes decisions, which can influence its behavior in identifying positive and negative outcomes. • A lower threshold (e.g., 0.3) makes the model more likely to classify predictions as “positive.” This increases sensitivity (catching more actual positives) but may also increase false positives (incorrectly predicting positives). • A higher threshold (e.g., 0.7) makes the model more conservative in predicting positives. This increases specificity (fewer false positives) but may miss some true positives, leading to more false negatives.

* Simple Example: Imagine you are using a model to detect spam emails: • A low threshold might flag more emails as spam, including some legitimate ones (false positives). • A high threshold might avoid labeling legitimate emails as spam but could miss some actual spam emails (false negatives).
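The spam example can be sketched in a few lines. The probabilities below are invented, and treating a probability exactly equal to the threshold as "negative" is an assumption, since the description above only covers values above and below the threshold.

```python
def classify(probabilities, threshold=0.5):
    """Label each probability as positive if it is above the threshold."""
    return ["positive" if p > threshold else "negative" for p in probabilities]


# Made-up spam probabilities for four emails.
spam_probabilities = [0.10, 0.35, 0.55, 0.80]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, classify(spam_probabilities, threshold))
```

Lowering the threshold to 0.3 flags three of the four emails as spam, while raising it to 0.7 flags only the most certain one.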
Choosing the right threshold depends on what is more important for your use case—minimizing missed positives or avoiding false alarms. For most general scenarios, the default value of 0.5 works well. *** ### Remove Multicollinearity * **Description:** A toggle to remove highly correlated features from the dataset to address multicollinearity issues. When enabled, the model will automatically exclude features that are too similar to each other. * **Default Value:** True. * **Impact:** Removing multicollinearity improves model stability, interpretability, and performance by ensuring that features are independent and not redundant. • *What is Multicollinearity?* Multicollinearity occurs when two or more features in a dataset are highly correlated, meaning they provide overlapping information to the model. For example, “Total Price” and “Price per Unit” might be highly correlated because one depends on the other. • *Why is it a Problem?* When features are highly correlated, the model struggles to determine which feature is actually influencing the prediction. This can lead to instability in the model’s results and make it harder to interpret which features are important. • *How Does Removing It Help?* By removing one of the correlated features, the model focuses only on unique, non-redundant information. This makes the model more reliable and easier to understand. **ELI5** -Imagine you are solving a puzzle, but you have duplicate pieces that fit in the same spot. Removing the duplicate pieces makes it easier to complete the puzzle and understand how each piece fits. Similarly, removing multicollinearity helps the model work more efficiently and effectively.

*** ### Multicollinearity Threshold * Description: Defines the correlation threshold (e.g., 0.95) to determine which features in the dataset are considered multicollinear. If the correlation between two features exceeds this threshold, one of them will be removed. This option is only available if the Remove Multicollinearity toggle is set to True. * Default Value: 0.95. * Impact: Adjusting the multicollinearity threshold helps control how strictly the model identifies and removes redundant features. This improves model interpretability, simplifies feature selection, and ensures that only unique and valuable information is used for predictions.
• *What Does the Threshold Do?* The threshold determines how strong the correlation between two features must be for them to be considered “too similar.” For example: • A threshold of 0.95 means that features with a correlation of 95% or more are considered redundant. • A lower threshold (e.g., 0.85) will remove more features because it considers lower correlations as redundant. • *Why Does It Matter?:* Highly correlated features confuse the model because they provide the same or overlapping information. By setting the threshold, you decide how much overlap is acceptable before a feature is removed. **ELI5** -Think of the threshold like deciding how similar two books need to be before you donate one of them to save space. If the books tell almost the same story (high correlation), you keep just one. The same logic applies to features in your dataset!
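One possible way to apply such a threshold is sketched below, assuming absolute Pearson correlation and a simple rule that keeps the first feature of any correlated pair. Both choices are assumptions made for illustration; the exact correlation measure and tie-breaking rule used by Graphite Note are not specified here. The feature names and values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def drop_collinear(features, threshold=0.95):
    """Keep a feature only if its correlation with every kept feature
    does not exceed the threshold (hypothetical helper)."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up feature columns.
features = {
    "price_per_unit": [1.0, 2.0, 3.0, 4.0, 5.0],
    "total_price": [2.1, 3.9, 6.2, 8.0, 9.9],      # nearly a multiple of price_per_unit
    "tenure_months": [12.0, 3.0, 40.0, 7.0, 25.0],
}
kept = drop_collinear(features, threshold=0.95)
print(kept)
```

In this toy data, `total_price` is almost perfectly correlated with `price_per_unit`, so it is dropped, while `tenure_months` is kept because it carries different information.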

*** ### Enable Model Tuning * **Description:** When enabled, applies hyperparameter tuning to optimize the model’s configuration. Hyperparameter tuning adjusts internal settings of the algorithm to find the combination that delivers the best results. * **Default Value:** False. * **Impact:** Enabling model tuning can significantly improve the model’s accuracy and overall performance by finding the optimal settings for how the algorithm works. However, this process requires additional training time, as the system runs multiple tests to identify the best configuration.
• *What is Hyperparameter Tuning?* Think of hyperparameters as “knobs” that control how a model learns. For example, in a Random Forest algorithm, hyperparameters might decide how many decision trees to use or how deep each tree can grow. Tuning adjusts these knobs to find the best combination for your specific data. • *Why Enable Model Tuning?* Without tuning, the model uses default settings, which might not be the best for your dataset. Tuning customizes the algorithm, helping it perform better by maximizing accuracy or minimizing errors. • *What’s the Trade-off?* Tuning takes more time because the system tests many combinations of hyperparameters to find the best one. This makes training longer, but the results are usually more accurate and reliable. \ **ELI5** - Imagine you’re baking a cake and adjusting the temperature and baking time to get the perfect result. Hyperparameter tuning is like trying different combinations of time and temperature to make the cake just right. Enabling this feature ensures your “cake” (model) performs its best!
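The idea of testing several "knob" settings and keeping the best one can be illustrated with a tiny grid search. The sketch below tunes a single hyperparameter, `k`, of a hand-written nearest-neighbour classifier on invented one-dimensional data. It only demonstrates the general technique; the search strategy, algorithms, and hyperparameter ranges that Graphite Note uses are not described here.

```python
def knn_predict(train, x, k):
    """Predict a 0/1 label by majority vote among the k nearest training points."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0


def accuracy(train, validation, k):
    """Share of validation points that are predicted correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


# Made-up (value, label) pairs.
train = [(1.0, 0), (2.0, 0), (2.5, 1), (3.0, 0),
         (4.0, 1), (6.0, 1), (7.0, 1), (8.0, 1)]
validation = [(1.5, 0), (3.5, 0), (6.5, 1), (7.5, 1)]

# Candidate values of the hyperparameter to try.
grid = {"k": [1, 3, 5]}

# Evaluate every candidate and keep the one with the highest validation score.
scores = {k: accuracy(train, validation, k) for k in grid["k"]}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

Each extra candidate value means one more training-and-evaluation round, which is why enabling tuning lengthens training time.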

*** ### Remove Outliers * **Description:** Specifies whether to remove outliers—extreme or unusual data points—from the dataset based on a defined threshold. If set to True, you can adjust the Outliers Threshold option to determine which data points are considered outliers. * **Default Value:** False.

* **Impact:** Removing outliers can improve model performance by eliminating data points that are far from the majority of the data and could negatively affect predictions. However, removing too many points might result in losing important information, so it’s essential to set the threshold carefully. • *What are Outliers?* Outliers are data points that are very different from the rest of your dataset. For example, if most customers spend $100 to $200 monthly but one customer spends $10,000, that’s an outlier. • *Why Remove Them?* Outliers can confuse the model because they don’t represent typical behavior. For example, if the model tries to adjust for the $10,000 spender, it might make poor predictions for customers in the normal $100-$200 range. • *What Happens if You Enable This?* When you set Remove Outliers to True, you can choose an Outliers Threshold to decide how far a data point must be from the average to be removed. This helps keep only relevant and meaningful data for training the model. \ **ELI5** - Imagine you’re cooking and one ingredient is wildly over-measured compared to the rest. Removing that extreme amount ensures your dish tastes balanced. Similarly, removing outliers ensures your model isn’t influenced by extreme, unusual data points. *** ### Outliers Threshold * **Description:** The Outliers Threshold defines the proportion of data points that are considered outliers. For example, setting the threshold to 0.05 means that 5% of the most extreme data points in the dataset will be treated as outliers and removed. This option is available only if the Remove Outliers toggle is set to True. * **Default Value:** 0.05 (5% of data points are considered outliers) * **Impact:** Adjusting the threshold controls how strict the model is in identifying and removing outliers. • A lower threshold (e.g., 0.02) is stricter and identifies fewer but more extreme outliers. 
This ensures that only the most unusual data points are removed, preserving the majority of the data. • A higher threshold (e.g., 0.1) is less strict and removes a larger portion of the data. This can be useful for datasets with significant variability but might risk removing useful information.
By setting the threshold appropriately, users can ensure that extreme values that could negatively affect the model’s performance are removed while retaining as much meaningful data as possible. This balance is crucial for improving model accuracy and ensuring the dataset represents typical patterns.
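A simple way to picture a proportion-based outlier threshold is to rank data points by their distance from the mean and drop the most extreme share. This is a sketch under that assumption; the detection method Graphite Note actually applies is not specified here, and the function name and spend values are invented.

```python
from statistics import mean


def remove_outliers(values, threshold=0.05):
    """Drop the `threshold` share of points that lie farthest from the mean.

    Hypothetical helper for illustration only.
    """
    n_remove = round(len(values) * threshold)
    if n_remove == 0:
        return list(values)
    centre = mean(values)
    # Indices ordered from most to least extreme.
    ranked = sorted(range(len(values)),
                    key=lambda i: abs(values[i] - centre),
                    reverse=True)
    to_drop = set(ranked[:n_remove])
    return [v for i, v in enumerate(values) if i not in to_drop]


# 19 typical monthly spend values plus one extreme customer (made-up data).
spend = [100 + 5 * i for i in range(19)] + [10_000]
cleaned = remove_outliers(spend, threshold=0.05)
print(len(spend), len(cleaned), max(cleaned))
```

With 20 values and a threshold of 0.05, exactly one point (the $10,000 spender) is removed. A threshold of 0.1 would remove two points, including one ordinary customer.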

# Model Overview

This documentation page explains the Model Overview in Graphite Note, including key performance metrics, health checks, and suggestions to improve dataset quality.

The Model Overview page in Graphite Note helps you assess the quality and reliability of your trained model before using it for predictions. This section appears under the Performance tab after model training and provides diagnostics including performance metrics, dataset analysis, and training behavior insights.

The detailed Model Health Check section helps users assess how reliable and stable the model is, and whether it is ready to be used for prediction or needs improvement.

The Overview tab in Graphite Note is divided into three sections, each designed to help you understand a different aspect of your machine learning model’s performance, quality, and readiness for real-world application:
* [Model Overview](#model-overview) * [Model Health Check](#model-health-check) * [Potential Dataset Improvements](#potential-dataset-improvements) Depending on the model type — classification, regression, or time series forecast — different metrics and diagnostics will be shown.

*** ### Model Overview summarizes all technical elements of your model setup. It includes: * **Dataset Characteristics and Columns** A list of included columns, their data quality (e.g., 0% nulls), and categorical/numerical split. * **Target Column Analysis** A breakdown of the target variable — its class distribution, type (binary, multiclass, continuous), and any known imbalances. * **Model Run Parameters** Key configuration settings: which models were tested, how imbalance was handled, percentage split between training/testing, outlier removal, and multicollinearity treatment. * **Data Preprocessing** A summary of how your data was cleaned and transformed — including imputation, normalization (e.g., z-score), encoding, and feature engineering. * **Model Comparison & Selection** An overview of tested models (e.g., Logistic Regression, Random Forest, LightGBM) and which model performed best based on your primary metric (e.g., F1 score). * **Performance Metrics** Full display of KPIs including Accuracy, F1, AUC, Precision, Recall (for classification), or R², MAE, MAPE, RMSE (for regression and time series). * **Train vs. Test Consistency** Indicates how the model performed on unseen test data compared to training. Small performance drops are acceptable; large gaps should be investigated. * **Next Steps** Instructions for what you can do next — such as using the Predict tab to forecast on new data, or exporting results via the Graphite Note API. *** ### Performance KPIs Performance KPIs shown on the top of Overview screen depend on Model type
For Binary & Multiclass Classification Models, you’ll see: * **F1 Score:** Balance between precision and recall; useful for imbalanced classes. * **Accuracy:** Overall percentage of correct predictions. * **AUC** (Area Under Curve): Measures how well the model distinguishes between classes. * **Precision:** How many predicted positives were actually correct. * **Recall:** How many actual positives were correctly predicted.

Performance KPIs on Binary Classification model

For Regression Models, you’ll see:

* **R-Squared:** Measures how much variance in the target variable is explained by the model.
* **MAPE** (Mean Absolute Percentage Error): Shows prediction error as a percentage of actual values.
* **MAE** (Mean Absolute Error): Average of absolute differences between predicted and actual values.
* **RMSE** (Root Mean Squared Error): Square root of MSE; penalizes larger errors more than MAE.
* **MSE** (Mean Squared Error): Average of squared differences between predicted and actual values.
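For reference, the regression KPIs can be computed from actual and predicted values as in the sketch below, which uses the standard definitions. The function name and the sample numbers are illustrative. MAPE as written here assumes no actual value is zero.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, MAPE (in percent), and R-squared."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = sqrt(mse)
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n * 100

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    r2 = 1 - ss_res / ss_tot

    return {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape, "r2": r2}


# Made-up actual and predicted values.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]
kpis = regression_metrics(actual, predicted)
print(kpis)
```

In this example MAE is 15 while RMSE is about 15.8. RMSE is higher because the two larger errors (20 units) are weighted more heavily once squared.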

For Time Series Forecasting Models, you’ll see the same regression-style KPIs but without explanation popovers: * **R-Squared** * **MAPE** * **MAE** * **RMSE**

These values give you an immediate sense of how accurate your forecast is. Since time series involves predicting over future dates, low MAPE and RMSE values typically indicate strong forecasting performance. *** ### Model Health Check This section provides a deeper explanation of the internal diagnostics automatically generated by Graphite Note during model training. These insights include: * **Target Column Imbalance**\ Graphite Note analyzes the distribution of classes in your target variable to identify potential imbalance issues. For classification models, it shows the number and percentage of each class. If the dataset is heavily skewed toward one class, the model may perform well overall (especially in accuracy) but fail to capture the minority class — an issue especially relevant for binary and multiclass models. For regression and time series, this check is not applicable. * **Dataset Volume**\ We automatically evaluate the ratio between the number of rows and the number of features (columns). A healthy rows-to-columns ratio (e.g., 352:1) typically provides better generalization power. A low ratio might indicate overfitting risk or the need for more data.
* **Performance Stability**\ This compares the model’s performance between the training and testing datasets. Large gaps between training and test scores may point to overfitting. A model that performs well on both is likely to generalize better to unseen data. Metrics such as F1, AUC, and Accuracy (for classification) or R² and MAPE (for regression and time series) are considered here. * **Data Leakage Risk**\ If the model achieves perfect scores across all metrics (100% accuracy, precision, recall), it may indicate data leakage — meaning the model might have access to target-related information it shouldn’t. This results in artificially inflated performance. Feature importance is also analyzed to detect irrelevant or suspiciously strong signals. *** ### Potential Dataset Improvements Based on model diagnostics, Graphite Note suggests ways to improve your model through engineered or additional features. These include: * Engagement scores * Payment behavior * Usage trends * Support interaction history * Customer demographics or lifetime value proxies These tips help guide users toward actionable ways to increase model value and robustness. # Regressors #### What Are Regressors in Time Series Forecasting? In machine learning, regressors (also called external regressors or exogenous variables) are additional features that help improve the accuracy of a model’s predictions. They represent outside influences—factors that may not be part of your target or time column, but that still affect the outcome you’re trying to forecast. For time series models, regressors are often used to account for real-world conditions that can influence trends. 
These might include: * Marketing Spend (e.g., ad campaigns that drive demand) * Weather Conditions (e.g., temperature affecting ice cream sales) * Holiday Flags or Events (e.g., Black Friday spikes) * Competitor Pricing or economic indicators * Stock Levels, seasonal promotions, or other business-specific signals *** #### Why Use Regressors? Adding regressors allows your time series model to go beyond simply looking at past values. Instead, it starts to understand the why behind patterns. Benefits include: * Improved Forecast Accuracy: The model can better explain variations in the data. * Causal Insights: You get clearer visibility into which external factors are driving changes. * Smarter Planning: You can simulate what might happen *if* certain inputs (like budget or weather) change in the future. *** #### When Should You Use Regressors? Use regressors when: * You know that external factors strongly affect the target value. * You want to improve your forecast by modeling the impact of these factors. * You can provide future values for the regressors during prediction (important!). For example, if you’re forecasting product demand and you know that advertising spend or pricing will change next month, including those as regressors gives the model a much better chance of anticipating that change. *** {% hint style="info" %} You can select up to 5 regressor columns when training your Time series model in Graphite Note. {% endhint %} {% hint style="info" %} Once a model includes regressors, it requires future values of those regressors to make forecasts. Because entering those values manually in the UI isn’t practical, forecasts for Graphite Note models with regressors can only be generated via the Prediction API. {% endhint %} # Regressors in action Regressors help time series forecasting models “think ahead” by adding relevant external factors that influence the predicted outcome—such as promotions, price changes, or weather conditions. 
This makes your forecast more intelligent and better aligned with real-world scenarios.
#### 📊 Key Takeaways from Our Case Study: * Adding regressors to a sales forecast increased accuracy by 36%. * Regressors helped the model understand why sales might spike or drop—beyond just seasonal patterns. * They capture cause-effect relationships that time-based signals alone can’t fully explain. * Results were visible immediately: better performance metrics like lower error rates (e.g., MAE, RMSE) and more accurate predictions during key events. *** #### ⚡ Before vs After Regressors
Below you can see a direct visual comparison of model performance before and after applying regressors:

*** #### ✅ Why It Matters Regressors turn a simple forecast into a strategic tool. They give your model context about what’s happening in the business, allowing it to: * Anticipate demand shifts more accurately * Understand the impact of external drivers * Provide more actionable insights for planning and decision-making *** #### 📖 Full Article Read the full story and real-life example on our blog: [Forecasting That Thinks Ahead: How Regressors Improved Accuracy by 36%](https://graphite-note.com/forecasting-that-thinks-ahead-how-regressors-improved-accuracy-by-36/)\ \ ![](/files/fl0nsom9iRB4SJBbjpPN) # Frequency alignment #### Frequency Alignment: Matching Regressors to Your Model’s Time Granularity Every regressor must supply exactly one value for each timestamp in your target series, no more and no less. Graphite Note enforces this 1:1 rule to ensure your external features line up perfectly with what you’re forecasting. * Daily models → one regressor value per date * Hourly models → one regressor value per hour * Weekly models → one regressor value per week, etc. If your raw data has a finer granularity than your model (e.g., minute-level web clicks for a daily forecast), you must aggregate it (sum, mean, max, etc.) so each day has a single number. Conversely, you can’t “stretch” a single daily value into multiple hourly slots without interpolation; each timestamp needs its own authentic input.\ \ ***Numerical Regressors (e.g. Cost per Unit)*** * *Aggregate before uploading.* * *Choose a summary statistic that fits your use case:* * *Average (mean unit cost)* * *Sum (total cost volume)* * *Max/Min (peak or baseline cost)* * *Result: one number per date (or hour/week) that aligns 1:1 with your target.* \ ***Categorical Regressors (e.g. 
Category Code, Weather Label)*** * *Ensure only one category value per timestamp.* * *If multiple occur:* * *Select the most representative label (e.g., “predominant” weather condition),* * *Encode logic-based rules (e.g., if any “Rain” occurs → is\_Rain = 1),* * *Or use proportions: % Category A = count(A) ÷ total.* By strictly aligning frequencies—one regressor row for each target timestamp—you eliminate timing mismatches and give your model clean, reliable inputs. # Model execution logs ### Overview The Execution Logs dialog (open via ⚙️ > Logs) records every model run across your workspace. It captures metadata such as start/end time, model type, hyper-parameters, and test-set metrics—providing a single place to verify, audit, and debug model training at scale.

***

### Key Features

| Column | What it tells you |
| ------------------------------ | ------------------------------------------------------------------------------------------- |
| Start time / Finished time | UTC timestamps marking the beginning and end of training. |
| Duration | How long the run took (e.g., 2m37s). |
| Status | done-ok, error, or running; useful for spotting failures quickly. |
| Model name & Model Type | Friendly name plus classification / regression / time-series, etc. |
| Model Code | Unique 12-character hash, required for API calls (/prediction, /fetch-result, etc.). |
| Tenant code | Internal workspace identifier (visible for multi-tenant admins). |
| Actionable Insights goal | “Show Value…” link with the text prompt you supplied when enabling AI insights. |
| Model advanced run parameters | All non-default parameters: outlier threshold, collinearity cutoff, imbalance handling, etc. |
| Metrics for test dataset | F1, Accuracy, AUC for classification; R², MAE, RMSE for regression/time-series. |
| Trained model hyper-parameters | Captures grid-search results or any user-defined hyper-settings. |
| Dataset shape | Rows and columns fed into the trainer after preprocessing. |

***

### Filters and Search

* Use the 🔍 field beneath each header to search by model name, code, or date.
* Click the funnel icon to show only errors, a specific model type, or a date range.

***

### Typical Workflows

* Confirm completion – refresh logs to ensure today’s run shows done-ok.
* Grab model code for API endpoints without opening the model UI.
* Compare durations to detect unusually long or short runs, hinting at data issues.
* Audit hyper-parameters before sharing results with stakeholders.
* Investigate failures (error status) and cross-reference with advanced parameters.

***

### Best Practices

* Refresh first – Click Refresh Logs after a run to pull the latest status.
* Export for audit – Copy rows or take a screenshot before purging old models.
* Track trends – Rising durations or frequent errors can indicate growing data size or schema drift. * Secure access – Only Admins can view logs; restrict role permissions if needed. # Improve your ML Models Even when a platform handles most of the heavy lifting, a little preparation and post-training care can unlock noticeably better predictions. Improving your machine learning models is a process. With better data, clear structure, and a few practical tweaks, you’ll get results that are not only accurate—but also actionable. Graphite Note is here to guide you every step of the way. ### Start Simple: Use Default Settings If you’re not familiar with advanced machine learning concepts, it’s best to leave the [Advanced Parameters](/graphite-note-documentation/graphite-note-models/advanced-ml-model-settings/advanced-parameters) unchanged. The default settings in Graphite Note are optimized to give solid performance in most scenarios. You don’t need to tweak anything unless you’re confident about what each parameter does.
*** ### **Always Review the Model Overview & Health Check** Once your model is trained, always review the Model Overview screen. This includes the Model Health Check, which will highlight potential issues like class imbalance or overfitting. The “Potential Dataset Improvements” section will also suggest ways to improve the dataset used for training. Don’t skip this part; it’s your shortcut to better model quality.
*** ### Low F1 score or **R-Squared**? Don’t Panic. If your classification model shows a low **F1 score** or your regression model has a weak **R-Squared** value, that doesn’t always mean it’s a “bad” model. It could simply mean the model doesn’t have enough quality data to learn from, or the problem you’re solving is more complex and needs richer input. Instead of judging only by the score, explore what your model is telling you: * Are there enough examples of each target value (Yes/No, or different classes)? * Do features (columns) have enough variation and filled values? * Are there hidden patterns that could become visible with more data? {% hint style="info" %} **Low score DOES NOT MEAN bad model.** *A classifier with an F1 of 0.45 in a domain where random chance is 0.05 may be excellent; an R² of 0.30 on financial time series can be perfectly usable for ranking decisions.* Context matters! {% endhint %} *** ### Tips to Improve Your Model Here are practical, beginner-friendly tips to level up your ML models in Graphite Note **Improve Data Quality and Consistency** * Fill missing values if possible, or clean them using simple methods (like replacing with average or most frequent value). * Avoid empty or half-filled columns—features need data to be useful. * Ensure consistent formatting (e.g., “Yes” and “yes” should be the same). **Add More Data** * More data = more signal. The model performs better when it sees more examples. * For time series models, more historical data is crucial. One or two months of data won’t give strong predictions. Ideally, use several years of data if available. * If you’re predicting daily or weekly events (e.g., sales), having frequent and consistent data points is key. If a product only appears once every 3 months, the model won’t have enough information to learn from it. **Enrich Your Dataset** * More features help the model “see” more patterns. 
* Add meaningful attributes like customer type, region, channel, or weather—anything that might influence the outcome. * These additional features become Key Drivers in your analysis and help unlock deeper insights. **Use Derived Features for Time Context** * If you want to simulate time series prediction using regression, create new columns from your date: extract Day, Month, Year, Weekday, IsWeekend, etc. * These features help regression models understand temporal patterns without needing a full time series setup. **Ensure Good Target Balance** * For classification models, ensure your target column isn’t too skewed. A model trained with 95% “No” and 5% “Yes” will struggle to predict the “Yes” cases. * If imbalance exists, Graphite Note shows it in the Health Check—use techniques like resampling or gathering more examples of the underrepresented class. **Avoid Redundant or Highly-Correlated Features** * When building a regression model, skip predictor columns that are direct arithmetic combinations of each other. * For instance, if you’re forecasting Revenue and you already include Price and UnitsSold as inputs, do not also feed Revenue (or Price × UnitsSold) back in as a feature—the model would essentially be learning from its own answer, leading to multicollinearity and unreliable coefficients. Keep either the individual components *or* the combined metric, but never both. **Keep It Relevant** * Remove features that are irrelevant or only available after the event happens. These can lead to data leakage, giving your model false confidence. * Always think: “Would I have this information *before* making the prediction?” **Name Your Features Clearly** * Use clear, intuitive names for your columns. This helps when reviewing Key Drivers, insights, and actionable recommendations. **Monitor Model Drift** * If you’re using live data, monitor how your model performs over time. 
A model trained six months ago may become outdated if customer behavior or market conditions change. * Periodically retrain your model with fresh data. *** ### **Additional Levers for Improvement** * Feature selection / dimensionality reduction – remove redundant or noisy variables to cut over-fitting. * Automated hyper-parameter search – once you’re comfortable, use grid/random/Bayesian search *systematically* instead of one-off tweaks. * Cross-validation – prefer k-fold or rolling-window CV to a single split, especially for small or non-IID data. * Regular retraining & monitoring – schedule retrains when fresh data arrives and set up drift alerts on key metrics. * Human-in-the-loop review – domain experts can spot impossible predictions long before metrics change. * Ensembling – combine diverse algorithms (e.g., gradient-boosted trees + neural nets) for stability. * Explainability tools – SHAP values or partial-dependence plots reveal where the model is right *and* where it is fragile. *** ### **Iterate Methodically** 1. Plan a single change (e.g., add holiday indicator). 2. Retrain and document both the new metric values *and* qualitative observations. 3. Compare to the previous run; keep the change only if it *consistently* helps. 4. Repeat: small, controlled experiments beat guessing. # What is Notebook? The main idea behind Graphite Notebook is to do your own [data storytelling](https://graphite-note.com/machine-learning-automation-and-data-storytelling): create various visualizations with detailed descriptions, plot model results for better understanding, and more. # My first Notebook To create your notebook: 1. Go to *Create New* on *Notebooks*, or *New Notebook* when you are in the notebook list

2. Name your notebook. Additionally, you can add a description to your notebook. Also, you can select an existing [tag](/graphite-note-documentation/faq#what-is-a-tag) (to connect your notebook with datasets and models) or create a new one.

You can now easily add and delete different text and visualization blocks ([more about visualization](/graphite-note-documentation/notebooks/data-visualization)). If you are not satisfied with the block position, you can easily move it. To speed things up, you can even clone each block. Your first Notebook is created and ready for exploring. # Data Visualization After you have created your notebook, we will go through some basic visualization tools (in case you missed how to create one, click [here](/graphite-note-documentation/notebooks/my-first-notebook)). > *Data visualization gives us a clear idea of what the information means by giving it visual context through maps or graphs. This makes the data more natural for the human mind to comprehend, making it easier to identify trends, patterns, and outliers within large data sets.* Once you have created a notebook, to visualize we have to: 1. Select *New visualization*

2. Select a dataset; a CSV file you uploaded or a dataset obtained from a model you ran.

3. Select Visualization Type. Depending on what you want, you can select: ### Combination Graph

1. Select *Add category*, which represents the abscissa of the coordinate system. 2. Select *Add series*, which represents the ordinate of the coordinate system.\ With a wide range of colors, you can choose different types of chart lines.

### Table

1. Select *Add column*; create a table from selected columns.

### Pie Chart

1. Select *Add category*; which represents the abscissa of the coordinate system. 2. Select *Add series*; which represents the ordinate of the coordinate system.

### Scatter Chart

1. Select *Add* for Primary Measure 2. Select *Add series*; which represents the ordinate of the coordinate system.

You can create visualizations with different datasets - there is no restriction that all visualizations within a Notebook must be created from the same dataset. # Notebook Settings # API Introduction Graphite Note offers four powerful APIs to enhance your data-driven workflows. Whether you need to upload fresh data, generate predictions, or retrieve model outputs, these APIs are designed for seamless integration and rapid deployment. * [Dataset API](/graphite-note-documentation/rest-api/dataset-api): enables users to easily populate their datasets by sending data directly to Graphite Note, ensuring seamless data integration. * [Prediction API](/graphite-note-documentation/rest-api/prediction-api): allows users to request predictions based on attributes they provide, leveraging Graphite Note’s machine learning models to generate accurate business forecasts. * [Model Results API](/graphite-note-documentation/rest-api/model-results-api): lets users fetch the outputs of their trained models in a structured, paginated format. This is especially useful for viewing or processing large prediction result sets in batches. * [Model Info API](/graphite-note-documentation/rest-api/model-info-api): provides clean and structured metadata about your models, including codes, names, timestamps, dataset linkage, and full model configuration (excluding bulky training artifacts), making it ideal for MLOps, auditing, and cataloging.

APIs available in Graphite Note environment

Each API is straightforward to use, allowing you to integrate predictive analytics into your existing systems rapidly and effectively. This section offers a comprehensive overview of testing and command usage, including detailed instructions on how to make the most of these APIs. If you're new to the API, we recommend starting with the quick start guide. It's a straightforward solution designed to validate your setup and ensure you begin on the right track.\ \ We value your feedback and strive to provide the best experience possible. If you encounter any challenges with commands that should be included in the API or its documentation, our dedicated support team is ready to assist you. Feel free to reach out to us via our in-app chat or by email, and we'll be more than happy to guide you in the right direction or incorporate any necessary updates. # Dataset API The Dataset API enables users to easily populate their datasets by sending data directly to Graphite Note, ensuring seamless data integration.

*** Use this API to create new datasets directly in Graphite Note environment, specifying the dataset's structure. This API is particularly useful for automating the setup of datasets during the onboarding process, allowing for easy integration with client-specific data requirements. *** In the following sections, you will find more details about the Dataset API. # Create Use this endpoint to define the columns, types, and other properties for a new dataset tailored to your needs. ### Create a Dataset To create a new dataset, follow these steps: ```python url = 'https://app.graphite-note.com/api/dataset-create' ``` 1. To create a new dataset, make a POST request to `/dataset-create` with the required parameters in the request body. 2. To specify the dataset structure, include an array of column definitions with each column's name, alias, type, subtype, and optional format. * **Type** can be: * `measure` * `dimension` * **Subtype** can be: * `text` * `numeric` * `date` * `datetime` The response will include key details about the created dataset, including the dataset code, table name, and the number of columns. 
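The type and subtype rules above can be checked on the client side before a request is sent. The following is a minimal sketch under stated assumptions: `make_column` is a hypothetical helper (not part of the Graphite Note API), and defaulting the alias to the column name is an illustrative choice. Only the field names and the allowed values come from the description above.

```python
# Hypothetical helper, not part of the Graphite Note API: builds one entry
# for the `columns` array and rejects undocumented type/subtype values.
ALLOWED_TYPES = {"measure", "dimension"}
ALLOWED_SUBTYPES = {"text", "numeric", "date", "datetime"}


def make_column(name, col_type, subtype, alias=None, fmt=None):
    """Return a column definition dict for the dataset-create payload."""
    if col_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}, got {col_type!r}")
    if subtype not in ALLOWED_SUBTYPES:
        raise ValueError(f"subtype must be one of {sorted(ALLOWED_SUBTYPES)}, got {subtype!r}")
    column = {
        "name": name,
        "alias": alias or name,  # assumption: alias defaults to the name
        "type": col_type,
        "subtype": subtype,
    }
    if fmt is not None:  # `format` is optional, so it is only added when given
        column["format"] = fmt
    return column


columns = [
    make_column("InvoiceNo", "dimension", "text"),
    make_column("Quantity", "measure", "numeric", fmt="#,###.##"),
]
```

Validating locally like this catches a misspelled type or subtype before the request leaves your system; how the server itself reports such errors is not covered here.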
*** ### **Example Usage** **Creating a New Dataset** For example, making a POST request to the following URL with the provided JSON body would result in the response below:\ \ **Request** ```json POST /dataset-create Authorization: Bearer YOUR-TENANT-TOKEN { "user-code": "0f02b4d4f9ae", "columns": [ { "name": "InvoiceNo", "alias": "InvoiceNo", "type": "dimension", "subtype": "text" }, { "name": "StockCode", "alias": "StockCode", "type": "dimension", "subtype": "text" }, { "name": "Description", "alias": "Description", "type": "dimension", "subtype": "text" }, { "name": "Quantity", "alias": "Quantity", "type": "measure", "subtype": "numeric", "format": "#,###.##" }, { "name": "InvoiceDate", "alias": "InvoiceDate", "type": "dimension", "subtype": "datetime", "format": "Y-d-m H:i:s" }, { "name": "UnitPrice", "alias": "UnitPrice", "type": "measure", "subtype": "numeric", "format": "#,###.##" }, { "name": "CustomerID", "alias": "CustomerID", "type": "measure", "subtype": "numeric", "format": "#,###.##" } ], "name": "Client onboarding dataset creation" } ``` **Response** ```json { "data": { "dataset-code": "eca2ad3940e3", "table-name": "dataset_csv_eca2ad3940e3", "columns": 7 } } ``` This request creates a dataset with the specified columns, each having unique names, types, and formats tailored to client onboarding requirements. 
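Before moving on to filling the dataset, a client may want to sanity-check the create response. The sketch below is an illustration only: `extract_dataset_code` is a hypothetical helper, and it assumes the response always has the shape shown above, with `columns` reporting the number of columns created.

```python
def extract_dataset_code(request_payload, response_body):
    """Return the dataset code after checking the reported column count."""
    data = response_body["data"]
    expected = len(request_payload["columns"])
    if data["columns"] != expected:
        raise ValueError(f"sent {expected} columns, server reported {data['columns']}")
    return data["dataset-code"]


# Stand-in request with seven columns, mirroring the example above
request_payload = {"columns": [{"name": f"col_{i}"} for i in range(7)]}

# Response body copied from the example above
response_body = {
    "data": {
        "dataset-code": "eca2ad3940e3",
        "table-name": "dataset_csv_eca2ad3940e3",
        "columns": 7,
    }
}

dataset_code = extract_dataset_code(request_payload, response_body)
```

The returned code is the value later passed as `dataset-code` when filling the dataset.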
### Example Python Implementation

```python
import requests

# Replace with your actual tenant token
tenant_token = "YOUR-TOKEN"

# Dataset creation endpoint
url = "https://app.graphite-note.com/api/dataset-create"

# Payload for dataset creation
payload = {
    "user-code": "YOUR-CODE",  # replace with your actual user code
    "columns": [
        {"name": "order", "alias": "order", "type": "dimension", "subtype": "text"},
        {"name": "customer_id", "alias": "customer_id", "type": "dimension", "subtype": "text"},
        {
            "name": "number_of_items",
            "alias": "number_of_items",
            "type": "measure",
            "subtype": "numeric",
            "format": "#,###.##",
        },
    ],
    "name": "API dataset test",
}

# Headers with the bearer token
headers = {
    "Authorization": f"Bearer {tenant_token}",
    "Content-Type": "application/json",
}

# Send the POST request
response = requests.post(url, json=payload, headers=headers)

# Print response
print("Status code:", response.status_code)
print("Response body:", response.json())
```

# Fill

Use this API to populate an existing dataset with new data. It lets you insert or append rows into a pre-defined dataset, which is useful for keeping the datasets behind your Graphite Note ML models up to date.

```python
fill_url = "https://app.graphite-note.com/api/dataset-fill"
```

### **Fill a Dataset**

To populate a dataset, follow these steps:

1. Make a POST request to `/dataset-fill` with the required parameters in the request body.
2. Include the `user-code` and `dataset-code` to identify the dataset and the user making the request.
3. Define the structure of the data by specifying the `columns` parameter, which includes details like column names, aliases, types, subtypes, and optional formats.
4. Provide the data to be inserted via the `insert-data` parameter.
   * If `compressed` is `false`, the data should be formatted as a **JSON-escaped string**.
   * If `compressed` is `true`, the data should be gzipped and then **base64-encoded**.
{% hint style="info" %} If you have a large dataset, it is a good idea to call dataset-fill in batches of, for example, 10,000 rows. A working example is below. {% endhint %} **Example of `insert-data` with JSON (when `compressed: false`):** ```json { "insert-data": "[[\"536365\",\"85123A\",\"WHITE HANGING HEART T-LIGHT HOLDER\",6,\"2010-12-01 08:26:00\",2.55,17850], [\"536365\",\"71053\",\"WHITE METAL LANTERN\",6,\"2010-12-01 08:26:00\",3.39,17850]]", "compressed": false } ``` 5. Use the `append` parameter to indicate whether to append the data to the existing dataset (`true`) or truncate the dataset before inserting the new data (`false`). 6. Use the `compressed` parameter to specify whether the data is **gzip compressed** (`true`) or not (`false`). ### **Example Usage** **Filling a Dataset with Base64-encoded Data** For example, making a POST request to the following URL with the provided JSON body would result in the response below: ```json POST /dataset-fill Authorization: Bearer YOUR-TENANT-TOKEN { "user-code": "0f02b4d4f9ae", "dataset-code": "eca2ad3940e3", "columns": [ { "name": "InvoiceNo", "alias": "InvoiceNo", "type": "dimension", "subtype": "text" }, { "name": "StockCode", "alias": "StockCode", "type": "dimension", "subtype": "text" }, { "name": "Description", "alias": "Description", "type": "dimension", "subtype": "text" }, { "name": "Quantity", "alias": "Quantity", "type": "measure", "subtype": "numeric", "format": "#,###.##" }, { "name": "InvoiceDate", "alias": "InvoiceDate", "type": "dimension", "subtype": "datetime", "format": "Y-d-m H:i:s" }, { "name": "UnitPrice", "alias": "UnitPrice", "type": "measure", "subtype": "numeric", "format": "#,###.##" }, { "name": "CustomerID", "alias": "CustomerID", "type": "measure", "subtype": "numeric", "format": "#,###.##" } ], "insert-data": "H4sIAAAAAAAACtT9W4/...NDtGHlBlNUP4lMXrO9OfaUk2XMReE/t///68rEPhOT+AC", "compressed": true, "append": true } ``` **Response** ```json { "data": { "status": "success", "details": { 
"dataset-code": "eca2ad3940e3", "rows-count": 518125 } } }
```

The response will confirm the status of the operation, including the dataset code and the number of rows inserted.

***

### **Sample Python Code**

This Python script reads data from a PostgreSQL database, converts it to JSON, compresses it using gzip, and encodes it as a Base64 string, ready to be sent via API requests. It then sends the data to Graphite Note in batches of 10,000 rows.

```python
import base64
import gzip
import json
import time
from io import BytesIO

import psycopg2
import requests

# Database connection settings (fill in your own values)
db_config = {
    "host": "localhost",
    "user": "",
    "password": "",
    "port": "",
    "database": "dwh",
}

tenant_token = "YOUR-TOKEN"
dataset_code = "GN-DATASET-CODE"
user_code = "GN-USER-CODE"

fill_url = "https://app.graphite-note.com/api/dataset-fill"
complete_url = "https://app.graphite-note.com/api/dataset-complete"
batch_size = 10000

columns = [
    {"name": "order", "alias": "order_uuid", "type": "dimension", "subtype": "text"},
    {"name": "customer_id", "alias": "customer_id", "type": "dimension", "subtype": "text"},
    {"name": "number_of_items", "alias": "number_of_items", "type": "measure",
     "subtype": "numeric", "format": "#,###.##"},
]

headers = {
    "Authorization": f"Bearer {tenant_token}",
    "Content-Type": "application/json",
}


def compress_data(data):
    """Serialize rows to JSON, gzip them, and return a Base64 string."""
    json_string = json.dumps(data)
    buffer = BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="w") as f:
        f.write(json_string.encode("utf-8"))
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def send_batch(batch, batch_number):
    """Send one batch of rows to /dataset-fill and report the timing."""
    print(f"\nSending batch #{batch_number} with {len(batch)} rows...")
    start_time = time.perf_counter()
    payload = {
        "user-code": user_code,
        "dataset-code": dataset_code,
        "columns": columns,
        "insert-data": compress_data(batch),
        "compressed": True,
        "append": True,
    }
    response = requests.post(fill_url, headers=headers, json=payload)
    elapsed = round(time.perf_counter() - start_time, 2)
    print(f"Batch #{batch_number} response: {response.status_code} (Time: {elapsed} seconds)")
    if response.status_code != 200:
        print(response.text)


def send_completion():
    """Tell Graphite Note that all batches have been sent."""
    payload = {"user-code": user_code, "dataset-code": dataset_code}
    response = requests.post(complete_url, headers=headers, json=payload)
    print(f"\nComplete response: {response.status_code}")
    if response.status_code != 200:
        print(response.text)


# === MAIN EXECUTION ===
try:
    conn = psycopg2.connect(**db_config)
    # A named cursor is server-side, so rows are streamed instead of loaded at once
    cursor = conn.cursor(name="stream_cursor")
    cursor.itersize = batch_size
    # "order" is a reserved word in PostgreSQL, so the column name must be quoted
    cursor.execute('SELECT "order", customer_id, number_of_items FROM public.my_dwh_db')

    batch = []
    batch_number = 1
    for row in cursor:
        # Keep numbers as-is; convert every other value to a string
        clean_row = [str(val) if not isinstance(val, (int, float)) else val for val in row]
        batch.append(clean_row)
        if len(batch) >= batch_size:
            send_batch(batch, batch_number)
            batch = []
            batch_number += 1

    # Send the final, partially filled batch
    if batch:
        send_batch(batch, batch_number)

    send_completion()
except Exception as e:
    print("Error:", e)
finally:
    if "cursor" in locals():
        cursor.close()
    if "conn" in locals():
        conn.close()
```

# Complete

### Description

The /dataset-complete endpoint is designed to signal the end of the dataset insertion process. Once all batches have been inserted via the /dataset-fill endpoint, this method should be called to trigger the final dataset shape calculation and any other necessary post-processing steps.

```python
complete_url = "https://app.graphite-note.com/api/dataset-complete"
```

### Parameters

* user-code (string): Unique code identifying the user.
* dataset-code (string): A unique code for the dataset, if pre-defined.

### **Indicate end of dataset insertion**

To signal that dataset insertion is complete, follow these steps:

1. Make a POST request to `/dataset-complete` with the required parameters in the request body.
2. Include the `user-code` and `dataset-code` to identify the dataset and the user making the request.
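
Putting the fill and complete steps together, the sketch below shows how `insert-data` could be encoded in both modes and how the completion payload is built. It makes no network calls. The helper names (`encode_insert_data`, `decode_insert_data`, `build_complete_payload`) are hypothetical, and the decoder exists only to check the round trip locally; how the server actually decodes the data is an assumption based on the description of the `compressed` flag.

```python
import base64
import gzip
import json


def encode_insert_data(rows, compressed):
    """Encode rows for the `insert-data` field (hypothetical helper)."""
    json_string = json.dumps(rows)
    if not compressed:
        # Plain mode: the JSON text itself is sent as a string value
        return json_string
    # Compressed mode: gzip first, then Base64-encode the gzipped bytes
    gzipped = gzip.compress(json_string.encode("utf-8"))
    return base64.b64encode(gzipped).decode("ascii")


def decode_insert_data(value, compressed):
    """Inverse of encode_insert_data, used here only for a local check."""
    if compressed:
        value = gzip.decompress(base64.b64decode(value)).decode("utf-8")
    return json.loads(value)


def build_complete_payload(user_code, dataset_code):
    """Body for /dataset-complete: just the two identifying codes."""
    return {"user-code": user_code, "dataset-code": dataset_code}


rows = [
    ["536365", "85123A", "WHITE HANGING HEART T-LIGHT HOLDER", 6, "2010-12-01 08:26:00", 2.55, 17850],
    ["536365", "71053", "WHITE METAL LANTERN", 6, "2010-12-01 08:26:00", 3.39, 17850],
]

plain = encode_insert_data(rows, compressed=False)
packed = encode_insert_data(rows, compressed=True)
complete_payload = build_complete_payload("0f02b4d4f9ae", "eca2ad3940e3")
```

The compressed string begins with `H4sI`, the Base64 form of the gzip header bytes, which matches the start of the `insert-data` value in the fill example.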
### **Example Usage** For example, a POST request with the following header and JSON body would produce the response below: **Header** ``` POST /dataset-complete Authorization: Bearer YOUR-TENANT-TOKEN ``` **Request** ```json { "user-code": "0f02b4d4f9ae", "dataset-code": "a49932c0f135" } ``` **Response** ```json { "data": { "status": "success", "details": { "dataset-code": "a49932c0f135", "rows-count": 542289 } } } ``` # Prediction API Prediction API allows users to request predictions by sending data to a trained model in Graphite Note.

*** To interact with the Graphite Note API and perform predictions using a specific model, you need to make a POST request to the API endpoint. Depending on the model type and use case, you can use one of two versions of the API, each with its own request body format: * **v1 Endpoint**: for backward-compatible, alias-based input * **v2 Endpoint**: for simplified, column-name-based input (recommended for new users) *** In the following sections, you will find more details about the Prediction API. # Request v1 This page lists the information needed to make a request to the v1 API endpoint of the Graphite Note application. #### Method The API uses the **POST** method. *** #### Supported Models It supports the following model types: * ✅ Binary Classification (e.g., YES/NO, True/False outcomes) * ✅ Multiclass Classification (e.g., predicting one of several categories) * ✅ Regression (e.g., predicting a numeric value) *** #### Request URL The base URL for the API endpoint is: ``` https://app.graphite-note.com/api/v1/prediction/model/[model-code] ``` Replace `[model-code]` in the URL with the code of the specific model you want to use for predictions. *** #### Where can I find the model code? To easily find the `[model-code]`, open the specific model and navigate to the Settings tab. The model code can be found in the ID section.

# Headers The request requires the following headers to be included: * `Authorization`: This header should be set to "`Bearer [token]`". Replace `[token]` with your unique token. The token can be found by accessing the account info page in the Graphite Note app, under the section displaying your current plan information.

* `Content-Type`: This header should be set to "application/json" to indicate that the request payload is in JSON format.
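
The URL and header rules for the v1 endpoint can be combined into one small helper. This is a sketch under stated assumptions: `build_v1_request` is a hypothetical function name, and the placeholder values stand in for your real model code and token. It only assembles the URL and headers and does not send anything.

```python
BASE_URL = "https://app.graphite-note.com/api/v1/prediction/model"


def build_v1_request(model_code, token):
    """Return the (url, headers) pair for a v1 prediction request."""
    if not model_code or not token:
        raise ValueError("both model_code and token are required")
    url = f"{BASE_URL}/{model_code}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return url, headers


# Placeholder values; replace with your own model code and tenant token
url, headers = build_v1_request("YOUR-MODEL-CODE", "YOUR-TENANT-TOKEN")
```

The returned pair can be passed to an HTTP client, for example `requests.post(url, headers=headers, json=body)`, where `body` follows the v1 request body format, which is not covered in this section.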