The problem with forecasting? You are always wrong!

Crystal Ball - Forecasting

You are asked to forecast the weather. So what do you do? You simply take the best forecast method available and come up with high-precision forecasts! I wish it were this easy …

As you might expect, forecasting is not that simple. First, what is the best forecast method? Deep Neural Networks, Double Exponential Smoothing, ARIMA, Cubic Splines? The list goes on forever. Even once you have decided on a method, you still have to tune it. Most methods have anywhere from a few to many parameters, which have to be set correctly to get consistently good forecasts.

In my years as a scientific researcher in the field of time series forecasting and machine learning, I had to struggle with these hurdles as well. This article discusses important aspects of the forecasting process and presents my approach to time series forecasting. By the end, you will have a structured way to approach this complex problem.


So what questions do you have to ask yourself, and which tasks do you probably have to carry out, to come up with accurate forecasts? Here is a list of steps I can think of:

  • Preprocess the data
    • Remove outliers
    • Fill in missing data
    • Remove invalid data
    • Smooth the data
    • Normalise it to a certain range
  • Determine characteristics of your time series
  • Choose your model depending on the characteristics
  • Tune the parameters of the chosen model
  • Define the performance measures to evaluate the accuracy of the model
  • Execute the model (train + run)
  • Evaluate the outcome (statistical analysis, plotting, etc.)


Before you can start, let's first think about what data you have readily available and what inputs might be valuable for your target. To forecast the weather, you might consider satellite photos from different sources, different weather models, and other inputs. On the other hand, using more inputs (called features) does not necessarily improve anything. You might only end up complicating the underlying problem and making it harder for your learner to build its model.

Data is usually noisy and contains faulty or missing data points. Therefore, your first step is to clean the data. You can simply remove invalid data points and outliers, or you can try to fill these points with reasonable values (such as the average of nearby data points).
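As a minimal sketch of such a cleaning step, the following Python snippet marks outliers as missing and then fills every gap with the mean of its nearest valid neighbours. The function names, the modified z-score rule, and the cut-off of 3.5 are my assumptions for illustration; other outlier rules work just as well.

```python
import math
from statistics import median


def remove_outliers(values, threshold=3.5):
    """Replace points far from the median with NaN (modified z-score rule).

    Assumption: 3.5 is a commonly used cut-off for the modified z-score,
    not a value prescribed by this article.
    """
    valid = [v for v in values if not math.isnan(v)]
    med = median(valid)
    mad = median(abs(v - med) for v in valid)  # median absolute deviation
    if mad == 0:
        return list(values)  # no spread, so nothing can be flagged
    return [
        math.nan
        if not math.isnan(v) and 0.6745 * abs(v - med) / mad > threshold
        else v
        for v in values
    ]


def fill_gaps(values):
    """Fill each NaN with the mean of its nearest valid neighbours."""
    filled = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        left = next((values[j] for j in range(i - 1, -1, -1)
                     if not math.isnan(values[j])), None)
        right = next((values[j] for j in range(i + 1, len(values))
                      if not math.isnan(values[j])), None)
        neighbours = [x for x in (left, right) if x is not None]
        filled[i] = sum(neighbours) / len(neighbours) if neighbours else math.nan
    return filled


# Hypothetical temperature readings with one gap and one faulty spike.
temperatures = [20.0, 21.0, math.nan, 23.0, 95.0, 22.0, 21.5]
cleaned = fill_gaps(remove_outliers(temperatures))
```

Whether you drop or fill a point depends on the method you use later: some forecast methods need evenly spaced data, in which case filling is the safer choice.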

Many machine learning methods perform better when the data is standardised, that is, rescaled to zero mean and unit variance, and roughly follows a Gaussian distribution.

If you use Artificial Neural Networks (ANNs), you usually have to normalise the input values to a range of [0;1] or [-1;1], depending on the activation function you are using. Otherwise, features with drastically higher values might dominate, and the learner might not be able to learn from the other features. Support Vector Machines, which in their basic form do two-class classification, expect the labels to be -1 or 1.
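Both kinds of rescaling can be sketched in a few lines of plain Python. The function names are mine; in practice you would compute the scaling parameters on the training data only and reuse them for the test data.

```python
from statistics import mean, pstdev


def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale values to the range [low, high]."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [low for _ in values]  # constant input: map everything to low
    span = largest - smallest
    return [low + (v - smallest) * (high - low) / span for v in values]


def standardise(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


unit_range = min_max_scale([10, 20, 30])             # [0.0, 0.5, 1.0]
symmetric = min_max_scale([10, 20, 30], -1.0, 1.0)   # [-1.0, 0.0, 1.0]
z_scores = standardise([2, 4, 4, 4, 5, 5, 7, 9])
```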

Determine time series characteristics

Time series usually have one or more of the following characteristics:

  • stationary behaviour (same average over a certain time span)
  • trend (increasing or decreasing)
  • seasonality (repeating patterns)
  • irregular parts (chaotic behaviour)

A trend is a long-term increase or decrease in the level of a time series. Seasonality is defined by seasonal factors with a fixed period, e.g. the yearly pattern in recorded temperature. A time series has a cyclic component if it exhibits fluctuations with no fixed period. A time series is stationary when the random mechanism producing it does not change over time.

You can find out whether a time series exhibits one or more of these behaviours by plotting it and inspecting the plot. There are also statistical approaches that test for the presence or absence of these and other time series characteristics.
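As a rough illustration of such statistical checks, the sketch below computes the autocorrelation at a given lag (a high value at lag p hints at a seasonal pattern with period p) and the least-squares slope against the time index (a crude trend indicator). These two measures are my choice for the example; they are simple indicators, not formal tests.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    mu = mean(series)
    denominator = sum((x - mu) ** 2 for x in series)
    if denominator == 0:
        return 0.0  # constant series: no structure to correlate
    numerator = sum((series[t] - mu) * (series[t - lag] - mu)
                    for t in range(lag, len(series)))
    return numerator / denominator


def trend_slope(series):
    """Least-squares slope of the series against its time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator


# Hypothetical series that repeats every 4 steps.
seasonal = [0, 1, 0, -1] * 6
at_period = autocorrelation(seasonal, 4)       # strongly positive
at_half_period = autocorrelation(seasonal, 2)  # strongly negative
slope = trend_slope([1, 3, 5, 7, 9])           # 2.0
```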

Choose your model

Time series forecasting methods can be categorised into qualitative and quantitative approaches, parametric and non-parametric regression, machine learning algorithms, and ensemble forecasting techniques.

You can use your findings from the previous step to make a more informed selection of forecast candidates. For example, if your time series has seasonal patterns, you might want to consider methods which are specifically designed to cover those aspects (e.g. SARIMA or Seasonal Exponential Smoothing).

Usually, I do not select a single method but 3-5 different methods with different strengths and weaknesses. I execute all methods on the same data and combine their forecasts into a final comprehensive forecast.
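How the individual forecasts are combined is a design decision of its own. One simple option, shown here purely as an assumption and not as the specific scheme used above, is a (weighted) average per forecast step; the function name and the example numbers are hypothetical.

```python
def combine_forecasts(forecasts, weights=None):
    """Combine per-method forecasts into one series by (weighted) averaging.

    forecasts: one list of forecast values per method, all of equal length.
    weights:   optional weight per method; defaults to equal weights.
    """
    horizon = len(forecasts[0])
    if any(len(f) != horizon for f in forecasts):
        raise ValueError("all forecasts must cover the same horizon")
    if weights is None:
        weights = [1.0] * len(forecasts)
    total = sum(weights)
    return [
        sum(w * f[step] for w, f in zip(weights, forecasts)) / total
        for step in range(horizon)
    ]


# Two hypothetical methods forecasting two steps ahead.
equal = combine_forecasts([[10.0, 12.0], [14.0, 16.0]])             # [12.0, 14.0]
weighted = combine_forecasts([[10.0, 12.0], [14.0, 16.0]], [3, 1])  # [11.0, 13.0]
```

The weights could, for instance, reflect how well each method performed on past data.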

The selection of a model does not depend on performance alone. Sometimes, you want to look inside your method to see why it does what it does. This could be to figure out where you have to tweak your model to improve its accuracy, or simply to understand the underlying model that leads to the forecasts. Many models are difficult for humans to visualise or understand. Neural Networks with their huge number of neurons may perform well, but you may never fully understand why they work. On the other hand, Learning Classifier Systems are designed in a way that their rule base is human-readable.

Another question is the forecast horizon you are targeting. Do you want to create short-term forecasts, let's say from 5 minutes up to 2 hours, or are you forecasting the weather for the next few days or weeks (long-term forecasts)?

Parameter tuning

Most machine learning algorithms have parameters which have to be selected more or less carefully for the problem at hand. You cannot test every possible value, as the number of combinations is too large. Also, not all parameters necessarily have the same influence. So figure out which parameters matter most, try values across a wide range, and get a feeling for them. Then you can narrow down to a certain range and try some more settings before you decide on values. There is no perfect setting!
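This coarse-to-fine idea can be sketched for a single parameter. As a stand-in model I assume simple exponential smoothing with its smoothing factor alpha, scored by the mean absolute one-step-ahead error; the function names, the number of rounds, and the grid size are illustrative choices.

```python
def ses_error(series, alpha):
    """Mean absolute one-step-ahead error of simple exponential smoothing."""
    level = series[0]
    errors = []
    for actual in series[1:]:
        errors.append(abs(actual - level))  # the current level is the forecast
        level = alpha * actual + (1 - alpha) * level
    return sum(errors) / len(errors)


def coarse_to_fine(series, steps=5, rounds=3):
    """Search alpha in [0, 1]: scan a coarse grid, then zoom in on the best value."""
    low, high = 0.0, 1.0
    best_alpha = low
    for _ in range(rounds):
        width = (high - low) / steps
        candidates = [min(high, low + i * width) for i in range(steps + 1)]
        best_alpha = min(candidates, key=lambda a: ses_error(series, a))
        # Narrow the search range to the neighbourhood of the current best value.
        low = max(0.0, best_alpha - width)
        high = min(1.0, best_alpha + width)
    return best_alpha


# For a steadily rising series, the search drifts towards alpha = 1.
rising = [float(i) for i in range(1, 21)]
tuned_alpha = coarse_to_fine(rising)
```

With several parameters, the same pattern applies per parameter, but the number of grid points grows quickly, which is why it pays to find the influential parameters first.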

Performance measures

Without evaluation and error measures, we cannot determine which parameter setting or which method is better for the given scenario. Again, a variety of metrics from different categories are available, all with their pros and cons. I would advise you not to rely on a single metric but on a combination of a few to cover different aspects. Commonly used measures are Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). Besides those, I like to use Theil's U statistic, a relative accuracy measure.


A forecast method is considered accurate if the deviations between its forecasts and the actual values are low. Measures based on percentage errors, such as MAPE, have the advantage of being scale-independent and can be used to compare forecast methods across different data sets. RMSE and MAE are representatives of scale-dependent measures. They cannot be used to compare across data sets with different scales.
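For reference, here is a plain-Python sketch of these measures. Theil's U exists in several variants; the version below is the one often called U2, which relates the forecast errors to those of a naive no-change forecast, and picking this variant is my assumption. Values below 1 mean the method beats the naive forecast.

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error (scale-dependent)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Squared Error (scale-dependent, penalises large errors more)."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent; undefined if an actual value is 0."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def theils_u2(actual, forecast):
    """Theil's U (U2 variant, an assumption): error relative to a no-change forecast."""
    steps = range(len(actual) - 1)
    numerator = sum(((forecast[t + 1] - actual[t + 1]) / actual[t]) ** 2 for t in steps)
    denominator = sum(((actual[t + 1] - actual[t]) / actual[t]) ** 2 for t in steps)
    return math.sqrt(numerator / denominator)


# Hypothetical actual values and forecasts.
actual = [100.0, 110.0, 120.0, 130.0]
forecast = [102.0, 108.0, 123.0, 127.0]
scores = {
    "MAE": mae(actual, forecast),
    "RMSE": rmse(actual, forecast),
    "MAPE": mape(actual, forecast),
    "U2": theils_u2(actual, forecast),
}
```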

As many forecast methods have a random initialisation phase, you should average the evaluation results across several runs. For example, ANNs depend on the random initial weights assigned to their neurons. Cross-validation is a common procedure, where you split the data set into k parts. In each of the k runs, you use k-1 parts for training and the remaining part for testing.
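The k-fold split described here can be sketched as an index generator. The function name is hypothetical, and I assume contiguous folds so that each test part is an unbroken stretch of the series.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so that sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# 10 data points split into 3 folds of sizes 4, 3 and 3.
splits = list(k_fold_splits(10, 3))
```

Note that in most folds this scheme trains on data that lies after the test part. For time series, a forward-chaining split that only trains on earlier data is often preferred.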


This article laid out a systematic approach to time series forecasting, although we only scratched the surface of most steps. Feel free to start from here and dig deeper where you feel your forecasts can benefit the most.

Remember, forecasts are usually wrong, but our goal is to be as accurate as possible! Until we get there, constant improvement is what we are striving for.