Machine Learning Models
When you first create a model, you have to choose among several model types.
Data preprocessing
Before running your model's scenario, it helps to understand how the model is processed. First, the model is trained: 80% of the dataset is used to learn from. The remaining 20% is then used to test the model and calculate the model score. A high model score means the trained model is accurate, with predictions close to the actual values in the test data.
Data preprocessing is a crucial step in machine learning. It improves model accuracy and performance by cleaning and transforming the raw data: removing inconsistencies, handling missing values, scaling features, and ensuring compatibility with the chosen algorithm.
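The 80/20 train/test flow can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (`train_test_split`, `accuracy`), the toy dataset, and the threshold "model" are hypothetical and are not the product's actual implementation.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predictions, targets):
    """Fraction of predictions that match the true targets."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

# Toy dataset (hypothetical): the label is True when the feature is >= 50.
rows = [(x, x >= 50) for x in range(100)]
train, test = train_test_split(rows)

# "Training": learn a threshold halfway between the two classes,
# using only the training rows.
highest_false = max(x for x, label in train if not label)
lowest_true = min(x for x, label in train if label)
threshold = (highest_false + lowest_true) / 2

# "Testing": score the learned threshold on the held-out 20%.
predictions = [x >= threshold for x, _ in test]
score = accuracy(predictions, [label for _, label in test])
print(f"train={len(train)} test={len(test)} score={score:.2f}")
```

The key point is that the score is computed only on rows the model never saw during training, so it estimates how the model behaves on new data.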
During preprocessing, the following cases are handled:
null values: if 50% or more of a column's values are null, the column is excluded from model training
missing values: in a numerical column, missing values are replaced by the column average; in a categorical column, they are replaced by "not_available"
One Hot Encoding: categorical data is transformed into numeric values before training so that it is suitable for machine learning algorithms
fix imbalance: correcting an unequal distribution of the target classes, which is not ideal for training
normalization: rescaling the values of numerical columns to improve training results
constants: if a column has only one unique value (a constant), the column is excluded from model training
cardinality: if a column has a high number of unique values, the column is excluded from model training.
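The rules above can be sketched as a small pure-Python function. It is a minimal illustration, not the product's actual code: the function name `preprocess`, the `max_cardinality` threshold of 20, the choice of min-max scaling for normalization, and applying the cardinality rule only to categorical columns are all assumptions made for the example. Class imbalance handling is left out.

```python
def preprocess(columns, max_cardinality=20):
    """Apply the preprocessing rules to a dict of column name -> list of values.

    None marks a null/missing value. max_cardinality is an assumed threshold.
    """
    result = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Null values: drop columns that are 50% null or more.
        if len(present) <= len(values) / 2:
            continue
        distinct = set(present)
        # Constants: drop columns with a single unique value.
        if len(distinct) <= 1:
            continue
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        if numeric:
            # Missing values: replace with the column average.
            mean = sum(present) / len(present)
            filled = [mean if v is None else v for v in values]
            # Normalization: min-max rescaling to [0, 1] (assumed method).
            low, high = min(filled), max(filled)
            result[name] = [(v - low) / (high - low) for v in filled]
        else:
            # Cardinality: drop categorical columns with too many unique values.
            if len(distinct) > max_cardinality:
                continue
            # Missing values: replace with the "not_available" category.
            filled = ["not_available" if v is None else v for v in values]
            # One-hot encoding: one 0/1 column per category.
            for category in sorted(set(filled), key=str):
                result[f"{name}={category}"] = [int(v == category) for v in filled]
    return result

# Hypothetical example table.
table = {
    "age": [20, 30, None, 40],                  # one missing numeric value
    "city": ["Paris", None, "Lyon", "Paris"],   # one missing category
    "country": ["FR", "FR", "FR", "FR"],        # constant column
    "notes": [None, None, "x", None],           # 75% null
}
clean = preprocess(table)
print(sorted(clean))
```

In this example, "country" is dropped as a constant and "notes" is dropped for being mostly null, while "age" is imputed and rescaled and "city" is expanded into one column per category.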