Introduction
In predictive modeling, key drivers (or influencers) are pivotal in discerning which features within a dataset most significantly impact the target variable. These influencers provide insights into the relative importance of each variable, enabling data scientists and analysts to understand and predict outcomes more accurately.
By highlighting the strongest predictors, key influencers inform the prioritization of features for model optimization, ensuring that models are precise and interpretable in real-world scenarios. This foundational understanding is crucial for refining models and aligning them closely with the underlying patterns and trends present in the data.
Reading Key Drivers
When examining the visualization of key influencers in Graphite Note Models, you'll find the features listed on the left, ordered from most to least important according to their influence on the target variable.
This ranking allows for a quick assessment of which factors are pivotal in the model's predictions.
By observing the length and direction of the bar associated with each feature, you can gauge how strongly, and in which direction, that feature influences the target outcome.
The example visualization shows how the number of website page visits influences whether someone takes a specific action, labeled "Applied", where "YES" means the action was taken.
For a high number of page visits, between 29.33 and 35, the likelihood of taking the action more than doubles (2.26 times more likely).
For a moderate number of page visits, between 12.33 and 18, the action is still more likely than the baseline, but less so than in the higher range (1.65 times more likely).
At a lower number of page visits, between 6.67 and 12.33, the action becomes less likely than the baseline by a factor of 1.37.
For very few page visits, less than 6.67, the likelihood of action drops drastically to less than half (2.36 times less likely).
The percentages and observations indicate how many cases fall within each range and how many of those cases resulted in the action "Applied" being taken. The visualization communicates that more engagement with the website (as measured by page visits) generally increases the likelihood of the desired action occurring.
Statistical Methodology Used
Graphite Note uses a statistical function designed to calculate the influence of each feature on a target variable.
It employs a method of grouping the data by the feature and target columns and then counting occurrences. The calculations performed within this function aim to determine the proportion of each feature's categories contributing to a specific target value. The influence is quantified by comparing the observed proportion of the target value within each feature category against a weighted average, yielding an 'index value' that indicates the relative influence of each category on the target outcome. The function is robust, allowing for different data types in the target column, and ensures that only relevant categories with sufficient data are included in the final analysis.
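The grouping-and-counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not Graphite Note's actual code: the function name `category_influence`, the sample column names, and the choice of the overall target rate as the "weighted average" (it equals the average of the per-category rates weighted by category size) are all assumptions.

```python
from collections import Counter


def category_influence(rows, feature, target, target_value):
    """Map each feature category to (rate within category) / (overall rate).

    Hypothetical helper: an index above 1 means the target value is more
    common in that category than overall, below 1 means less common.
    """
    if not rows:
        raise ValueError("no rows to analyze")

    # Group by (feature, target) and by feature alone, counting occurrences.
    pair_counts = Counter((row[feature], row[target]) for row in rows)
    category_counts = Counter(row[feature] for row in rows)

    # Overall rate of the target value, i.e. the size-weighted average
    # of the per-category rates (assumed baseline).
    hits = sum(n for (_, t), n in pair_counts.items() if t == target_value)
    if hits == 0:
        raise ValueError("target value never occurs")
    overall_rate = hits / len(rows)

    return {
        category: (pair_counts[(category, target_value)] / n) / overall_rate
        for category, n in category_counts.items()
    }


# Invented sample data: 4 "Email" rows (3 applied), 6 "Ads" rows (1 applied).
rows = (
    [{"channel": "Email", "applied": "YES"}] * 3
    + [{"channel": "Email", "applied": "NO"}] * 1
    + [{"channel": "Ads", "applied": "YES"}] * 1
    + [{"channel": "Ads", "applied": "NO"}] * 5
)
index = category_influence(rows, "channel", "applied", "YES")
for category, value in index.items():
    print(f"{category}: index {value:.3f}")
```

In this invented sample the overall "YES" rate is 4 out of 10, so a category where 3 of 4 rows are "YES" gets an index of 0.75 / 0.4 = 1.875.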
Here, Graphite Note performs a quantitative analysis in which numeric features (like 'Website Pages') are divided into bins or ranges.
The function then calculates the change in the likelihood of the target outcome (e.g., 'Applied' being 'YES') when the feature values fall within those bins. This calculation is done by comparing the base likelihood of the target outcome with the likelihood when the feature is within a specific bin, hence the multipliers like "increases by 2.26x" for certain ranges.
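As a rough sketch of the binning and multiplier calculation, the Python below splits a numeric feature into ranges and compares each range's likelihood with the base likelihood. Several things here are assumptions rather than documented behavior: equal-width bins (the ranges in the example above happen to be evenly spaced), the function names `bin_multipliers` and `describe`, and the convention of reporting a ratio below 1 as its reciprocal ("N times less likely").

```python
def bin_multipliers(values, outcomes, n_bins, positive="YES"):
    """Return (low, high, ratio) per non-empty bin.

    ratio = likelihood of `positive` inside the bin / base likelihood.
    Equal-width binning is an assumption made for this sketch.
    """
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("feature has a single value; nothing to bin")
    base = sum(o == positive for o in outcomes) / len(outcomes)
    if base == 0:
        raise ValueError("positive outcome never occurs")

    width = (high - low) / n_bins
    stats = [[0, 0] for _ in range(n_bins)]  # [rows, positive rows] per bin
    for value, outcome in zip(values, outcomes):
        # The maximum value would land in bin n_bins; clamp it into the last bin.
        i = min(int((value - low) / width), n_bins - 1)
        stats[i][0] += 1
        stats[i][1] += outcome == positive

    return [
        (low + i * width, low + (i + 1) * width, (hits / n) / base)
        for i, (n, hits) in enumerate(stats)
        if n > 0
    ]


def describe(ratio):
    """Phrase a ratio as a multiplier (the reciprocal convention is assumed)."""
    if ratio >= 1:
        return f"{ratio:.2f}x more likely"
    if ratio == 0:
        return "outcome not observed in this range"
    return f"{1 / ratio:.2f}x less likely"


# Invented sample data: page visits and whether the visitor applied.
visits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
applied = ["YES", "NO", "NO", "NO",
           "YES", "YES", "NO", "NO",
           "YES", "YES", "YES", "NO"]
for lo_edge, hi_edge, ratio in bin_multipliers(visits, applied, n_bins=3):
    print(f"{lo_edge:.2f} to {hi_edge:.2f}: {describe(ratio)}")
```

With this sample the base likelihood is 6 of 12, so the lowest range (1 "YES" in 4 rows) has a ratio of 0.5, which the sketch reports as "2.00x less likely".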
The analysis removes any categories that fall below the minimum percentage and minimum row thresholds, then sorts the results to show clearly which ranges of the feature increase or decrease the likelihood of the target outcome.
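The filtering and sorting step might look like the following Python sketch. The threshold values (`min_rows=5`, `min_pct=1.0`), the function name `filter_and_sort`, the record layout, and the descending sort order are hypothetical; the text above does not state the actual defaults.

```python
def filter_and_sort(results, total_rows, min_rows=5, min_pct=1.0):
    """Drop sparsely populated categories, then sort by index, highest first.

    A category is kept only if it has at least `min_rows` rows AND covers
    at least `min_pct` percent of all rows (both thresholds are assumed).
    """
    kept = [
        r for r in results
        if r["rows"] >= min_rows and 100.0 * r["rows"] / total_rows >= min_pct
    ]
    return sorted(kept, key=lambda r: r["index"], reverse=True)


# Invented example records: label, number of rows, and index value.
results = [
    {"label": "range A", "rows": 400, "index": 0.40},
    {"label": "range B", "rows": 90, "index": 2.10},
    {"label": "range C", "rows": 3, "index": 5.00},   # too few rows, dropped
    {"label": "range D", "rows": 507, "index": 1.30},
]
ranked = filter_and_sort(results, total_rows=1000)
for r in ranked:
    print(r["label"], r["index"])
```

Filtering before sorting keeps a category with a striking index but only a handful of rows from appearing at the top of the ranking.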