Preprocessing Data
Last updated
In Graphite Note, data preparation is divided into two main steps: excluding features that are not fit for modeling, and preprocessing the remaining data. Both steps run automatically, so you don't need to handle them yourself.
Features Not Fit for Model: Graphite automatically excludes columns that aren’t suitable for modeling, such as date/datetime columns, to ensure only relevant features are used in training.
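For illustration only, here is a minimal Python sketch of this kind of column exclusion. It is not Graphite Note's implementation: it assumes the dataset is held as a dictionary of column lists and detects date columns from the Python type of their values. The function name `drop_date_columns` is hypothetical.

```python
from datetime import date, datetime


def drop_date_columns(columns: dict[str, list]) -> dict[str, list]:
    """Return a copy of the table without columns whose non-null values are all dates."""
    kept = {}
    for name, values in columns.items():
        non_null = [v for v in values if v is not None]
        # datetime is a subclass of date, so this check covers both date and datetime.
        is_date_column = bool(non_null) and all(isinstance(v, date) for v in non_null)
        if not is_date_column:
            kept[name] = values
    return kept


table = {
    "signup_date": [date(2024, 1, 5), date(2024, 2, 9), None],
    "last_login": [datetime(2024, 3, 1, 8, 30), datetime(2024, 3, 2, 9, 0), datetime(2024, 3, 3, 10, 15)],
    "plan": ["basic", "pro", "basic"],
    "monthly_spend": [10.0, 45.5, 12.0],
}
print(sorted(drop_date_columns(table)))  # ['monthly_spend', 'plan']
```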
Preprocessing: To achieve the best results, Graphite Note then applies the following preprocessing steps to the remaining features:
• Null Values: Graphite identifies null values and handles them according to established best practices, which may include imputing missing values or removing rows that contain too many nulls.
• Missing Values: Missing values are handled automatically to maintain data integrity, using a method suited to the data type, such as filling the gaps (for example, with the column average) or excluding them.
• One-Hot Encoding: Categorical variables are automatically transformed using one-hot encoding, converting categories into numerical formats suitable for model training.
• Fix Imbalance: Graphite addresses class imbalance in classification tasks, ensuring a balanced representation of classes.
• Normalization: Numeric columns are scaled to a uniform range, ensuring consistent data for models that require normalized input.
• Constants: Columns with constant values, which don’t contribute useful information, are identified and excluded from the dataset.
• Cardinality: Graphite handles high-cardinality categorical columns (columns with many distinct values) so that they do not degrade model performance.
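As an illustration of the null and missing value handling described above, the following minimal Python sketch drops rows with too many nulls and fills the remaining gaps with the column mean (numeric columns) or the most frequent value (other columns). Graphite Note's actual rules are not documented here; the 50% threshold, the mean/mode strategy, and the function names `drop_sparse_rows` and `impute` are assumptions.

```python
from statistics import mean, mode

# Hypothetical helpers: rows are dicts that all share the same keys.


def drop_sparse_rows(rows: list[dict], max_null_fraction: float = 0.5) -> list[dict]:
    """Keep rows whose share of None values is at most max_null_fraction."""
    kept = []
    for row in rows:
        null_share = sum(v is None for v in row.values()) / len(row)
        if null_share <= max_null_fraction:
            kept.append(row)
    return kept


def impute(rows: list[dict]) -> list[dict]:
    """Replace None with the column mean (numeric columns) or mode (other columns)."""
    fills = {}
    for col in rows[0]:
        present = [r[col] for r in rows if r[col] is not None]
        if not present:
            continue  # no observed values to learn from; leave this column's nulls as-is
        if all(isinstance(v, (int, float)) for v in present):
            fills[col] = mean(present)
        else:
            fills[col] = mode(present)
    return [{c: (fills.get(c) if v is None else v) for c, v in r.items()} for r in rows]


rows = [
    {"age": 30, "city": "Zagreb", "income": 4000.0},
    {"age": None, "city": "Split", "income": 3000.0},
    {"age": 40, "city": None, "income": 5000.0},
    {"age": 50, "city": "Zagreb", "income": None},
    {"age": None, "city": None, "income": None},  # all nulls: dropped
]
cleaned = impute(drop_sparse_rows(rows))
print(cleaned[1]["age"], cleaned[2]["city"], cleaned[3]["income"])  # 40 Zagreb 4000.0
```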
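To make the one-hot encoding step concrete, here is a minimal Python sketch of the technique: each category becomes its own 0/1 column, and exactly one of those columns is set to 1 for each row. This is a generic illustration, not Graphite Note's code, and the function name `one_hot` is hypothetical.

```python
def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted category names and one 0/1 indicator row per input value."""
    categories = sorted(set(values))
    position = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        indicator = [0] * len(categories)
        indicator[position[v]] = 1  # exactly one column is "hot" per row
        rows.append(indicator)
    return categories, rows


names, encoded = one_hot(["red", "green", "red", "blue"])
print(names)    # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```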
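The normalization step can be sketched with min-max scaling, which maps each numeric column linearly onto the range 0 to 1. The document does not say which scaling method Graphite Note uses; min-max is assumed here only as one common choice, and `min_max_scale` is a hypothetical name.

```python
def min_max_scale(values: list[float]) -> list[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0] * len(values)  # avoid dividing by zero
    return [(v - lo) / span for v in values]


print(min_max_scale([10.0, 20.0, 30.0]))  # [0.0, 0.5, 1.0]
```

In practice the minimum and maximum are usually computed on the training data only and then reused for new data, so that the model sees consistently scaled inputs.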
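Detecting constant columns is straightforward: a column with only one distinct value cannot help a model tell rows apart. The minimal Python sketch below illustrates the idea under the assumption that the table is a dictionary of column lists; `drop_constant_columns` is a hypothetical name, not part of Graphite Note.

```python
def drop_constant_columns(columns: dict[str, list]) -> dict[str, list]:
    """Keep only columns that contain at least two distinct values."""
    return {name: vals for name, vals in columns.items() if len(set(vals)) > 1}


table = {
    "country": ["HR", "HR", "HR", "HR"],  # constant: carries no information
    "age": [34, 29, 41, 29],
    "churned": [0, 1, 0, 0],
}
print(list(drop_constant_columns(table)))  # ['age', 'churned']
```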
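One simple way to fix class imbalance is random oversampling: rows from smaller classes are duplicated at random until every class is as large as the biggest one. The document does not state which technique Graphite Note uses (alternatives include undersampling or synthetic methods such as SMOTE), so this Python sketch is only an illustration; `random_oversample` is a hypothetical name.

```python
import random
from collections import Counter


def random_oversample(X: list, y: list, seed: int = 0) -> tuple[list, list]:
    """Duplicate randomly chosen rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)  # fixed seed for reproducible output
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(members)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


X = [[1.0], [1.2], [0.9], [1.1], [5.0]]
y = ["no", "no", "no", "no", "yes"]
X_bal, y_bal = random_oversample(X, y)
print(sorted(Counter(y_bal).items()))  # [('no', 4), ('yes', 4)]
```

Resampling like this is normally applied to the training split only, so that the evaluation data keeps the real class distribution.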
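A common way to tame high-cardinality columns is to group rare categories under a shared label, which keeps the number of one-hot columns manageable. This Python sketch shows that technique; Graphite Note's own approach is not described, so the `min_count` threshold, the "Other" label, and the function name `group_rare_categories` are all assumptions.

```python
from collections import Counter


def group_rare_categories(values: list[str], min_count: int = 2, other: str = "Other") -> list[str]:
    """Replace categories seen fewer than min_count times with a shared placeholder label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


cities = ["Zagreb", "Split", "Zagreb", "Rijeka", "Osijek", "Split"]
print(group_rare_categories(cities))
# ['Zagreb', 'Split', 'Zagreb', 'Other', 'Other', 'Split']
```

Here four distinct cities shrink to three categories; on real data with thousands of rare values the reduction is much larger.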
In traditional data science projects, these steps would require manual effort from data scientists, including data cleaning, encoding, scaling, and testing, often involving a significant amount of time and expertise. Graphite Note automates this entire process, completing these steps in seconds and allowing users to focus on insights and decision-making rather than data preparation.