Preprocessing Data

In Graphite Note, data preparation is divided into two main steps, both handled automatically so you don’t have to manage them yourself. Data preprocessing is a crucial step in machine learning: it improves model accuracy and performance by cleaning and transforming the raw data to remove inconsistencies, handle missing values, scale features, and ensure compatibility with the chosen algorithm.

Step 1: Exclusion of Columns

Features Not Fit for Model: Graphite automatically excludes columns that aren’t suitable for modeling, such as date/datetime columns, to ensure only relevant features are used in training.
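As a rough illustration of this exclusion step, the sketch below drops date/datetime columns from a table before training. It assumes pandas, which this page does not mention. The function name drop_datetime_columns and the sample columns are hypothetical, and Graphite Note's internal implementation may differ.

```python
import pandas as pd


def drop_datetime_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without date/datetime columns (hypothetical helper)."""
    # Select both timezone-naive and timezone-aware datetime columns.
    datetime_cols = df.select_dtypes(include=["datetime", "datetimetz"]).columns
    return df.drop(columns=datetime_cols)


# Made-up sample data, for illustration only.
df = pd.DataFrame({
    "signup_date": pd.to_datetime(["2024-01-05", "2024-02-11", "2024-03-20"]),
    "age": [34, 29, 41],
    "plan": ["basic", "pro", "basic"],
})

features = drop_datetime_columns(df)
print(list(features.columns))
```

Only columns already stored with a datetime type are detected here. Dates kept as plain text would need to be parsed first.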

Step 2: Preprocessing

To achieve the best results, Graphite Note takes care of several preprocessing steps:

• Null Values: Graphite Note identifies and processes null values based on best practices. If a column is 50% null or more, it is not included in model training.

• Missing Values: Missing values are managed automatically to maintain data integrity. In a numerical column, each missing value is replaced with the column average; in a categorical column, it is replaced with "not_available".

• One-Hot Encoding: Categorical variables are automatically transformed using one-hot encoding, converting categories into numerical formats suitable for model training.

• Fix Imbalance: Graphite addresses class imbalance in classification tasks, correcting an unequal distribution of the target classes and ensuring a balanced representation of classes.

• Normalization: Numeric columns are scaled to a uniform range, ensuring consistent data for models that require normalized input.

• Constants: Columns with constant values, which don’t contribute useful information, are identified and excluded from the dataset.

• Cardinality: Graphite optimizes high-cardinality categorical columns for model performance, handling complex categorical data effectively.
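Several of the steps above can be sketched in a few lines of pandas. This is a minimal illustration under stated assumptions, not Graphite Note's actual implementation. pandas, the function name preprocess, the order of the steps, and min-max scaling to the range 0 to 1 are all assumptions. Only the 50% null threshold, the average fill, and the "not_available" label come from this page. Class-imbalance and high-cardinality handling are left out because the page does not describe the method used.

```python
import pandas as pd

NULL_THRESHOLD = 0.5  # from the text: columns that are 50% null or more are excluded


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of the null, constant, missing-value, scaling, and encoding steps."""
    # Null values: keep only columns with less than 50% nulls.
    null_share = df.isna().mean()
    out = df.loc[:, null_share < NULL_THRESHOLD]

    # Constants: drop columns with a single distinct non-null value.
    # (Checking before imputation is an assumption of this sketch.)
    out = out.loc[:, out.nunique(dropna=True) > 1]

    numeric_cols = out.select_dtypes(include="number").columns
    categorical_cols = [c for c in out.columns if c not in numeric_cols]

    # Missing values: column average for numeric columns.
    numeric = out[numeric_cols]
    numeric = numeric.fillna(numeric.mean())

    # Normalization: min-max scaling to [0, 1] (assumed scaling method).
    # The span is never zero because constant columns were dropped above.
    span = numeric.max() - numeric.min()
    numeric = (numeric - numeric.min()) / span

    # Missing values: "not_available" for categorical columns, then one-hot encoding.
    categorical = out[categorical_cols].fillna("not_available")
    encoded = pd.get_dummies(categorical, dtype=float)

    return pd.concat([numeric, encoded], axis=1)


# Made-up sample data, for illustration only.
raw = pd.DataFrame({
    "age": [20.0, 30.0, None, 40.0],
    "plan": ["basic", None, "pro", "basic"],
    "country": ["US", "US", "US", "US"],   # constant, so it is dropped
    "notes": [None, None, "vip", None],    # 75% null, so it is dropped
})

clean = preprocess(raw)
print(clean)
```

In this example the missing age is filled with the average (30) before scaling, and the missing plan becomes its own one-hot column, plan_not_available. The sketch expects at least one categorical column to remain after the exclusions.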

In traditional data science projects, these steps would require manual effort from data scientists, including data cleaning, encoding, scaling, and testing, often involving a significant amount of time and expertise. Graphite Note automates this entire process, completing these steps in seconds and allowing users to focus on insights and decision-making rather than data preparation.


