Improve Your ML Models
Even when a platform handles most of the heavy lifting, a little preparation and post-training care can unlock noticeably better predictions. Improving your machine learning models is an iterative process: with better data, clear structure, and a few practical tweaks, you’ll get results that are not only accurate but also actionable. Graphite Note is here to guide you every step of the way.
If you’re not familiar with advanced machine learning concepts, it’s best to leave the advanced settings unchanged. The defaults in Graphite Note are tuned for solid performance in most scenarios, so you don’t need to tweak anything unless you’re confident about what each parameter does.
Once your model is trained, always review the Model Overview screen. This includes the Model Health Check, which will highlight potential issues like class imbalance or overfitting. The “Potential Dataset Improvements” section will also suggest ways to improve the dataset used for training. Don’t skip this part—it’s your shortcut to better model quality.
If your classification model shows a low F1 score or your regression model has a weak R-Squared value, that doesn’t always mean it’s a “bad” model. It could simply mean the model doesn’t have enough quality data to learn from, or the problem you’re solving is more complex and needs richer input.
Instead of judging only by the score, explore what your model is telling you:
Are there enough examples of each target value (Yes/No, or different classes)?
Do features (columns) have enough variation and filled values?
Are there hidden patterns that could become visible with more data?
Here are practical, beginner-friendly tips to level up your ML models in Graphite Note:
Improve Data Quality and Consistency
Fill missing values if possible, or clean them using simple methods (like replacing with average or most frequent value).
Avoid empty or half-filled columns—features need data to be useful.
Ensure consistent formatting (e.g., “Yes” and “yes” should be the same).
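If you prepare your data in Python before uploading it, a minimal pandas sketch of these cleanup steps might look like the following. The file and column names (customers.csv, monthly_spend, channel, is_subscribed) are placeholders for illustration:

```python
import pandas as pd

# Hypothetical example: clean a customer dataset before uploading to Graphite Note
df = pd.read_csv("customers.csv")

# Fill missing numeric values with the column average
df["monthly_spend"] = df["monthly_spend"].fillna(df["monthly_spend"].mean())

# Fill missing categorical values with the most frequent value
df["channel"] = df["channel"].fillna(df["channel"].mode()[0])

# Normalize inconsistent formatting so "Yes" and "yes" become one category
df["is_subscribed"] = df["is_subscribed"].str.strip().str.capitalize()

df.to_csv("customers_clean.csv", index=False)
```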
Add More Data
More data = more signal. The model performs better when it sees more examples.
For time series models, more historical data is crucial. One or two months of data won’t give strong predictions. Ideally, use several years of data if available.
If you’re predicting daily or weekly events (e.g., sales), having frequent and consistent data points is key. If a product only appears once every 3 months, the model won’t have enough information to learn from it.
Enrich Your Dataset
More features help the model “see” more patterns.
Add meaningful attributes like customer type, region, channel, or weather—anything that might influence the outcome.
These additional features become Key Drivers in your analysis and help unlock deeper insights.
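If you assemble your dataset outside Graphite Note, a simple join can pull those attributes in. A minimal pandas sketch, assuming hypothetical sales.csv and customers.csv files that share a customer_id column:

```python
import pandas as pd

sales = pd.read_csv("sales.csv")          # one row per order (placeholder file)
customers = pd.read_csv("customers.csv")  # customer_id, customer_type, region

# Left join keeps every sale and adds the extra attributes where available
enriched = sales.merge(customers, on="customer_id", how="left")
enriched.to_csv("sales_enriched.csv", index=False)
```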
Use Derived Features for Time Context
If you want to simulate time series prediction using regression, create new columns from your date: extract Day, Month, Year, Weekday, IsWeekend, etc.
These features help regression models understand temporal patterns without needing a full time series setup.
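A minimal pandas sketch of this idea, assuming a hypothetical order_date column:

```python
import pandas as pd

# Parse the date column while loading (file and column names are placeholders)
df = pd.read_csv("sales.csv", parse_dates=["order_date"])

# Derive time-context features a regression model can use directly
df["Day"] = df["order_date"].dt.day
df["Month"] = df["order_date"].dt.month
df["Year"] = df["order_date"].dt.year
df["Weekday"] = df["order_date"].dt.dayofweek   # 0 = Monday
df["IsWeekend"] = df["Weekday"].isin([5, 6])    # Saturday or Sunday
```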
Ensure Good Target Balance
For classification models, ensure your target column isn’t too skewed. A model trained with 95% “No” and 5% “Yes” will struggle to predict the “Yes” cases.
If imbalance exists, Graphite Note shows it in the Health Check—use techniques like resampling or gathering more examples of the underrepresented class.
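If you want to check the skew yourself before training, a minimal sketch with pandas and scikit-learn follows. The churned column is a hypothetical Yes/No target, and upsampling the minority class is just one of several resampling options:

```python
import pandas as pd
from sklearn.utils import resample

df = pd.read_csv("churn.csv")  # placeholder dataset

# Inspect the target distribution first
print(df["churned"].value_counts(normalize=True))

# Simple upsampling: duplicate minority-class rows until classes match
majority = df[df["churned"] == "No"]
minority = df[df["churned"] == "Yes"]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up])
```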
Avoid Redundant or Highly Correlated Features
When building a regression model, skip predictor columns that are direct arithmetic combinations of each other.
For instance, if you’re forecasting Revenue and you already include Price and UnitsSold as inputs, do not also feed Revenue (or Price × UnitsSold) back in as a feature. The model would essentially be learning from its own answer, leading to multicollinearity and unreliable coefficients. Keep either the individual components or the combined metric, but never both.
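A quick way to spot such overlap before training is a correlation check. A minimal pandas sketch, with hypothetical column names:

```python
import pandas as pd

df = pd.read_csv("sales.csv")  # placeholder dataset

# Inspect pairwise correlations between numeric candidate features
corr = df[["Price", "UnitsSold", "Discount", "Revenue"]].corr()
print(corr)

# If Revenue is the target, exclude it (and any direct arithmetic
# combination of Price and UnitsSold) from the feature list
features = ["Price", "UnitsSold", "Discount"]
```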
Keep It Relevant
Remove features that are irrelevant or only available after the event happens. These can lead to data leakage, giving your model false confidence.
Always think: “Would I have this information before making the prediction?”
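One rough heuristic for catching leakage: a feature that correlates almost perfectly with the target is often only known after the event. A minimal pandas sketch of that check, assuming a hypothetical numeric target column:

```python
import pandas as pd

df = pd.read_csv("orders.csv")  # placeholder dataset

# Features that correlate near-perfectly with the target are suspicious:
# they may only exist after the outcome, which would leak the answer
correlations = df.corr(numeric_only=True)["refunded_amount"].drop("refunded_amount")
suspicious = correlations[correlations.abs() > 0.95]
print(suspicious)
```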
Name Your Features Clearly
Use clear, intuitive names for your columns. This helps when reviewing Key Drivers, insights, and actionable recommendations.
Monitor Model Drift
If you’re using live data, monitor how your model performs over time. A model trained six months ago may become outdated if customer behavior or market conditions change.
Periodically retrain your model with fresh data.
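If you want to quantify drift on your own data, one common metric is the Population Stability Index (PSI), which compares the distribution of a feature at training time against fresh data. A minimal NumPy sketch; the 0.2 threshold is a widely used rule of thumb, not a hard rule:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between training data and fresh data.
    Values above roughly 0.2 are commonly treated as meaningful drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0) and division by zero
    a_pct = np.clip(a_pct, 1e-6, None)
    return np.sum((a_pct - e_pct) * np.log(a_pct / e_pct))

# Usage with hypothetical column names:
# drift_score = psi(train["monthly_spend"], recent["monthly_spend"])
```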
Go Further with Advanced Techniques
Feature selection / dimensionality reduction – remove redundant or noisy variables to cut overfitting.
Automated hyper-parameter search – once you’re comfortable, use grid/random/Bayesian search systematically instead of one-off tweaks.
Cross-validation – prefer k-fold or rolling-window CV to a single split, especially for small or non-IID data. Together with hyper-parameter search, this is sketched after this list.
Regular retraining & monitoring – schedule retrains when fresh data arrives and set up drift alerts on key metrics.
Human-in-the-loop review – domain experts can spot impossible predictions long before metrics change.
Ensembling – combine diverse algorithms (e.g., gradient-boosted trees + neural nets) for stability.
Explainability tools – SHAP values or partial-dependence plots reveal where the model is right and where it is fragile.
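These techniques typically live outside a no-code workflow, but for context, here is a minimal scikit-learn sketch combining an automated hyper-parameter search with k-fold cross-validation. The data is synthetic and the parameter grid is purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary-classification data as a stand-in for a real dataset
X, y = make_classification(n_samples=500, random_state=42)

# A small, systematic grid instead of one-off manual tweaks;
# 5-fold cross-validation gives a more stable estimate than a single split
search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid={"n_estimators": [100, 200], "max_depth": [2, 3]},
    cv=5,
    scoring="f1",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```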
Iterate with Controlled Experiments
Plan a single change (e.g., add a holiday indicator).
Retrain and document both the new metric values and qualitative observations.
Compare to the previous run; keep the change only if it consistently helps.
Repeat: small, controlled experiments beat guessing. A simple way to keep track is an experiment log, sketched below.
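A minimal sketch of such a log, with placeholder metric values, so runs can be compared objectively instead of from memory:

```python
import csv
import os
from datetime import date

log_path = "experiment_log.csv"  # hypothetical log file
new_file = not os.path.exists(log_path) or os.path.getsize(log_path) == 0

# One row per experiment: what changed, the metrics, and observations
run = {
    "date": date.today().isoformat(),
    "change": "added holiday indicator",        # placeholder values
    "f1": 0.81,
    "notes": "recall on the 'Yes' class improved",
}

with open(log_path, "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=run.keys())
    if new_file:
        writer.writeheader()  # header only once, for a fresh file
    writer.writerow(run)
```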