# How to detect outliers for improved demand forecasting

### Winsorization

This first idea is a rather simplistic approach of defining a certain minimum and maximum range in which the data will simply be disregarded. Statistically, this margin is defined as the percentile, which means the value below which x% of the observations in a group will fall. For example, 99% of the demand observations for a product will be lower than its 99th percentile.

This approach may be efficient, but it can also result in some problems, such as detecting fake outliers in a dataset without outliers (see figure 1 below); or in the case of real outliers, it doesn’t sufficiently reduce the value keeping it at a level way above our expectation. It is an  approach that ends up requiring a very accurate critical analysis by the planner, so it is not as efficient and reliable as it should be, as it would take out the high and low values of a dataset, even if they aren’t exceptional per se.

Figure 1: Winsorization of a simple dataset without outlier

### Standard Deviation

Another approach would be to use the demand variation around the historical average and exclude the values that are exceptionally far from this average, according to a certain range between two thresholds centered on the demand.

Figure 2: Distance to the mean

With this method, in a situation without outliers, we do not change any demand observation (we keep all values in this case), and in another situation with an outlier (see Y2 below), we do not change the low or high demand points but only the actual outlier (from 100 to 49 in the figure below).

Figure 3: Outlier detection based on standard deviation (Normalization)

Although it may seem to solve all the issues arising from the Winsorization approach, the actual limitation will arise when you have a product with a trend or seasonality. In this case, its historical average will not accurately represent the actual average for that given period, and consequently the thresholds may not correctly limit a possible outlier.

### Error standard deviation

In order to solve the drawbacks from the Standard Deviation and Winsorization let’s go back to the definition of an outlier. An outlier is a value that you didn’t expect. In other words, it is a value far away from your prediction (i.e. your forecast). To spot outliers, we must therefore analyze the forecast error and see which periods are exceptionally wrong. If we compute the error for our forecast we would obtain a mean error and a standard deviation. You can refer to the book by Nicolas Vandeput to see real-life examples and more details. The figure below illustrates our results obtained from our forecast.

Figure 4: Outlier detection via forecast error