Time-series Forecast Model

Model Scenario

The Time-series Forecast Model is a machine learning model that predicts future values from previously observed, time-ordered values. It can only be run on a dataset that contains at least one time-related column and one numeric column. This section goes through the basics of the Model Scenario, where you select the parameters related to the dataset and the model, so that you get the best possible results.

The Target Column is the numeric value that you want to predict. It is essential to have values recorded by day, week, or year; if some dates are repeated, their values can be combined by taking the sum, average, etc.
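
To picture what combining repeated dates means, here is a minimal Python sketch. The function name `aggregate_by_date` and the sample sales figures are hypothetical and only illustrate the idea; they are not part of Graphite.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def aggregate_by_date(rows, how="sum"):
    """Collapse rows that share a date into one value per date.

    rows: iterable of (date, value) pairs; how: "sum" or "mean".
    """
    if how not in ("sum", "mean"):
        raise ValueError(f"unsupported aggregation: {how}")
    buckets = defaultdict(list)
    for day, value in rows:
        buckets[day].append(value)
    reducer = sum if how == "sum" else mean
    # Return one value per date, in chronological order.
    return {day: reducer(values) for day, values in sorted(buckets.items())}

# Illustrative data: 1 January appears twice.
sales = [
    (date(2024, 1, 1), 10.0),
    (date(2024, 1, 1), 5.0),
    (date(2024, 1, 2), 7.0),
]
daily_sum = aggregate_by_date(sales, how="sum")
daily_mean = aggregate_by_date(sales, how="mean")
```

With the sum, 1 January becomes 15.0; with the average, it becomes 7.5. Which one is right depends on what the target represents (totals such as revenue usually call for a sum, rates or prices for an average).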

Next, you can choose a Sequence Identifier Field, which groups rows so that an independent forecast is generated for each time series. The field must not contain only unique values; each value has to repeat across the rows that belong to the same series.

After that, you must select the Time/Date Column, that is, the column that contains the time-related values. The Time Interval represents the frequency of the data. For example, if you have daily data, select daily; if you have annual data, select yearly. With Forecast Horizon, you choose how many days, weeks, or years you want to predict, counted from the last date in the dataset.
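
As a rough illustration of how Time Interval and Forecast Horizon work together, the sketch below lists the dates a forecast would cover. The helper `future_dates` is hypothetical, and it handles only daily and weekly intervals to stay short (monthly and yearly steps need calendar arithmetic).

```python
from datetime import date, timedelta

def future_dates(last_date, horizon, interval="daily"):
    """Return `horizon` dates that follow `last_date` at the given interval."""
    steps = {"daily": timedelta(days=1), "weekly": timedelta(weeks=1)}
    if interval not in steps:
        raise ValueError(f"unsupported interval: {interval}")
    # Start at 1 so the first forecast date comes after the last observed date.
    return [last_date + steps[interval] * i for i in range(1, horizon + 1)]

# Last date in the dataset is 30 December 2024; predict 3 days ahead.
horizon_dates = future_dates(date(2024, 12, 30), horizon=3, interval="daily")
```

Here the horizon of 3 covers 31 December 2024, 1 January 2025, and 2 January 2025.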

The model works particularly well for data with seasonal patterns. If your data shows a linear growth trend, select additive for Seasonality Mode; if it shows an exponential growth trend, select multiplicative. If you notice the same behavior on an annual basis, you can set Yearly Seasonality to True (TIP: plotting the data before modeling helps you understand it better). If you're not sure, don't worry: the model will try to detect any seasonality automatically.
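
The difference between the two modes can be pictured with a toy example. This sketch uses the common textbook convention (additive: trend plus seasonal effect; multiplicative: trend scaled by a seasonal fraction); it is an assumption for illustration and not a description of Graphite's internals.

```python
def additive(trend, seasonal):
    """Seasonal effect is added to the trend in absolute units."""
    return [t + s for t, s in zip(trend, seasonal)]

def multiplicative(trend, seasonal):
    """Seasonal effect scales with the trend (0.2 means +20% of the level)."""
    return [t * (1 + s) for t, s in zip(trend, seasonal)]

trend = [100.0, 200.0, 400.0]  # illustrative, fast-growing level

# Additive: the seasonal bump stays at +20 units however large the level is.
add_series = additive(trend, [20.0, 20.0, 20.0])

# Multiplicative: the seasonal bump is +20% of the level, so it grows with it.
mul_series = multiplicative(trend, [0.2, 0.2, 0.2])

add_bumps = [y - t for y, t in zip(add_series, trend)]
mul_bumps = [y - t for y, t in zip(mul_series, trend)]
```

In the additive series the seasonal bump stays the same size, while in the multiplicative series it doubles each time the level doubles. If the seasonal swings in your plotted data widen as the values grow, multiplicative is usually the better fit.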

If you choose a daily or hourly interval, Advanced Parameters become available for adding special dates, weekends, and holidays, or for limiting the target value.

Model Results

After you have run your model, let's look at the results. Model Results summarizes all relevant results through visualizations and numerical values. First, you can see the values of four evaluation metrics: R-squared, MAPE, MAE, and RMSE. Evaluating a machine learning model is an important part of any project.

The results consist of 5 tabs: Model Fit, Trend, Seasonality, Special Dates, and Details Tabs.

R-squared measures the proportion of variance in the dependent variable that can be explained by the independent variables. MAPE (mean absolute percentage error), MAE (mean absolute error), and RMSE (root mean squared error) each describe the average difference between the actual and predicted values.
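
For readers who want to see how these four metrics are calculated, here is a small sketch using their standard textbook definitions. The function name `evaluation_metrics` and the sample numbers are hypothetical, and Graphite may differ in details such as whether MAPE is reported as a percentage or a fraction.

```python
import math

def evaluation_metrics(actual, predicted):
    """Compute R-squared, MAPE (in percent), MAE, and RMSE."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the actual values, so it is undefined if any of them is zero.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    r2 = 1 - ss_res / ss_tot

    return {"r2": r2, "mape": mape, "mae": mae, "rmse": rmse}

# Illustrative values only.
metrics = evaluation_metrics(
    actual=[100.0, 200.0, 300.0],
    predicted=[110.0, 190.0, 300.0],
)
```

MAE and RMSE are expressed in the units of the target, with RMSE penalizing large errors more heavily. MAPE is relative, which makes it easier to compare across series of different scale.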

Model Fit

The Model Fit Tab contains a graph of actual and predicted values. Besides the main prediction of the target value, the model predicts a range of values for every future date, bounded by a lower and an upper value; this range is known as the uncertainty interval. The visualization shows how well or poorly your model is performing.

If you used the Sequence Identifier Field, you can choose the value that you want to analyze (you can do the same for each Model Result).

Trend

Trend and seasonality are characteristics of time-series data that can be visually identified in time-series plots, so it's important to analyze them too. In the Trend Tab, the graph shows the global trend that Graphite detected in the historical data.

Seasonality

Seasonality represents the repeating patterns or cycles of behavior over time. Depending on your Time Interval, the Seasonality Tab shows one or two graphs. If you have daily data, the first graph shows a detected pattern in the historical data that repeats every week, while the second shows a pattern that repeats every year. For example, in the picture above, you can see that the biggest positive impact happens on Mondays. For weekly and monthly data, the graph shows detected patterns that repeat every week/month through the year.

Details

Finally, the Details Tab contains a table with all the values related to the Model Fit Tab, and much more.

Later, we will talk about the Special Dates Tab.

By running this model, you get fast, high-quality insights into your business; even one new piece of information obtained from the data can considerably help your future business decisions. Now it's your turn to do some modeling and explore your results.

Limit Target Prediction

We develop new things daily, but we also upgrade existing models. For daily data, we added some new features that, with a little effort, can considerably improve your forecast accuracy. You can now limit your target prediction, remove outliers, and add country holidays and special events.

First, let's go through prediction limits. Once you have selected the main parameters of the model, you can limit the target variable. If you know that the variable reaches a certain minimum or maximum, enter it in the appropriate field. For example, say you want to predict daily temperature. If you know the maximum temperature is 40 °C, you pass that value to the model so that it does not predict higher values. You can limit both the target minimum and maximum.
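
The effect of a limit can be pictured as keeping every predicted value inside the allowed range. The sketch below simply clamps a list of forecasts; this is an assumption made for illustration, since how Graphite enforces the limits internally is not described here. The function name `clamp_forecast` and the temperatures are hypothetical.

```python
def clamp_forecast(values, minimum=None, maximum=None):
    """Keep every forecast value within the optional [minimum, maximum] range."""
    clamped = []
    for value in values:
        if minimum is not None:
            value = max(value, minimum)
        if maximum is not None:
            value = min(value, maximum)
        clamped.append(value)
    return clamped

# Illustrative daily temperature forecasts with a known maximum of 40 °C.
raw_forecast = [38.5, 41.2, 43.0, 39.9]
limited = clamp_forecast(raw_forecast, maximum=40.0)
```

The two values above 40 °C are pulled down to 40.0, while the others are left unchanged.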

If there are large oscillations in the data, it is difficult for the model to recognize the minimum and maximum values of the Target Column; that's why it needs your help. Over the next few posts, we will go through the remaining features, which have a greater impact on modeling than the target limitation, so stay tuned!

Country holidays and special dates

Besides the target prediction limitation, we added two new parameters that are related to country holidays and special dates. Make yourself at home, because we will go through these parameters that can significantly improve model accuracy.

There are cases where you notice large deviations for certain days in the data or in the results of the model. For example, on days around holidays, stores record more customers than during the rest of the year, but the model gives too much importance to those days, so the predicted values are much higher than expected. If the model is "informed" about these holidays, we get much better results, because a balance emerges in the data. In Graphite, we added a new parameter, Country Holidays: all you have to do is go to the advanced part inside the Model Scenario and select the country or countries whose holidays you want to add.

Adding holidays can improve the evaluation metrics, although not uniformly: MAPE, MAE, RMSE, and R-squared can get worse in some cases and better in others.

On the other hand, if you have various promotions or events during the year that affect your data, you can add them to the model. It is important, however, to know when these promotions or events will occur in the future. To do that in Graphite, you enter the name of the promotion, its start date, how many days it lasted (or will last), and all its future dates.
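
To make the inputs concrete, the following sketch expands a promotion's name, start dates, and duration into the individual days it covers. The helper `expand_event` and the "Spring Sale" example are hypothetical; in Graphite you enter these values in the interface rather than in code.

```python
from datetime import date, timedelta

def expand_event(name, start_dates, duration_days):
    """Return a (date, name) pair for every day covered by each occurrence."""
    days = []
    for start in start_dates:
        for offset in range(duration_days):
            days.append((start + timedelta(days=offset), name))
    return days

# One past occurrence and one known future occurrence, each lasting 3 days.
promo_days = expand_event(
    "Spring Sale",
    start_dates=[date(2024, 3, 1), date(2025, 3, 1)],
    duration_days=3,
)
```

This produces six marked days: 1 to 3 March in both 2024 and 2025. Listing the future occurrence is what lets a forecast account for the promotion ahead of time.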

By combining these two parameters, you can get much better results. The more information the model receives, the more accurate the prediction will be.

Remove data points

Last but not least, besides adding country holidays and special events to your model, you can also delete some data points from your dataset.

In Graphite, all you have to do is enter the start and end times of the period you want to delete. If the period lasts just one day, the start and end dates should be the same. You can also remove multiple periods. Giving the model as much information as possible is important for getting the most accurate prediction. To do that, you have to know your data: what type of data you are managing, which dates you need to pay attention to, which days are outliers, etc.

Removing data points is useful when a special event occurred but you know nothing about that period; otherwise, the model carries the influence of that period over to the same dates in the past and the future. Removing these dates can give you a more accurate prediction.
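
The removal rule described above (inclusive start and end dates, with a one-day period having the same start and end) can be sketched as follows. The function `remove_periods` and the sample rows are hypothetical and only illustrate the behavior.

```python
from datetime import date

def remove_periods(rows, periods):
    """Drop rows whose date falls inside any (start, end) period, inclusive."""
    def is_removed(day):
        return any(start <= day <= end for start, end in periods)
    return [(day, value) for day, value in rows if not is_removed(day)]

# Illustrative data: 1 to 7 May 2024.
rows = [(date(2024, 5, d), float(d)) for d in range(1, 8)]

kept = remove_periods(rows, [
    (date(2024, 5, 2), date(2024, 5, 3)),  # a two-day period
    (date(2024, 5, 6), date(2024, 5, 6)),  # a single day: start equals end
])
kept_days = [day.day for day, _ in kept]
```

After removing both periods, only 1, 4, 5, and 7 May remain in the data.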
