In this section, I’ll illustrate how to identify and delete outliers using the boxplot.stats function in R. The following R code creates a new vector without outliers:
x_out_rm <- x[!x %in% boxplot.stats(x)$out] # Remove outliers
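To see what this line does step by step, here is a minimal sketch on a small made-up vector (x_demo is a hypothetical example, not the data used in this tutorial). boxplot.stats returns a list whose out element holds the values beyond the whiskers, and %in% is then used to drop them:

```r
x_demo <- c(2, 3, 3, 4, 4, 5, 5, 6, 50)    # Hypothetical data with one extreme value

out_demo <- boxplot.stats(x_demo)$out      # Values flagged as outliers
out_demo
# 50

x_demo_rm <- x_demo[!x_demo %in% out_demo] # Keep everything that was not flagged
x_demo_rm
# 2 3 3 4 4 5 5 6
```

Note that %in% matches by value, so every copy of a flagged value is removed from the vector.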
Let’s check how many values we have removed:
length(x) - length(x_out_rm) # Count removed observations
# 10
We have removed ten values from our data. Note that we inserted only five outliers in the data creation process above. In other words, we deleted five values that are not real outliers (more about that below).
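The reason lies in the rule behind boxplot.stats: with the default coef = 1.5, every value that lies more than 1.5 times the hinge spread (roughly the interquartile range) beyond the lower or upper hinge is flagged, and in a large normally distributed sample a few legitimate values typically land out there. The following sketch rebuilds that rule by hand. The seed, the sample size, and the planted values are assumptions chosen for illustration and are not the data from above:

```r
set.seed(123)                                      # Hypothetical seed
x_sim <- rnorm(1000)                               # Assumed: 1000 standard normal values
x_sim[1:5] <- c(8, 11, -6, 15, -20)                # Assumed: five planted outliers

hinges <- fivenum(x_sim)[c(2, 4)]                  # Lower and upper hinge
spread <- diff(hinges)                             # Hinge spread (close to the IQR)
lower  <- hinges[1] - 1.5 * spread                 # Lower fence
upper  <- hinges[2] + 1.5 * spread                 # Upper fence

out_manual <- x_sim[x_sim < lower | x_sim > upper] # Values outside the fences
length(out_manual)                                 # Number of flagged values

identical(out_manual, boxplot.stats(x_sim)$out)    # Compare with boxplot.stats
```

Any flagged value beyond the five planted ones is an ordinary draw from the normal distribution that happens to lie in the tail.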
However, now we can draw another boxplot without outliers:
boxplot(x_out_rm) # Create boxplot without outliers

The output of the previous R code is shown in Figure 2: a boxplot without outliers.
Important note: Outlier deletion is a very controversial topic in statistical theory. Any removal of outliers might delete valid values, which can bias the analysis of a data set.
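A small simulation can make this concrete. The sketch below uses hypothetical data (x_clean, with an arbitrary seed) that contains no contaminated values at all, applies the same boxplot.stats removal step, and compares the standard deviation before and after:

```r
set.seed(456)                                          # Hypothetical seed
x_clean <- rnorm(1000)                                 # Assumed: clean data, nothing planted

x_clean_rm <- x_clean[!x_clean %in% boxplot.stats(x_clean)$out] # Remove flagged values

length(x_clean) - length(x_clean_rm)                   # Valid values that were deleted
sd(x_clean)                                            # Spread of the full sample
sd(x_clean_rm)                                         # Spread after removal
```

Whenever values are removed here, they come from the tails of a valid sample, so the trimmed standard deviation will usually understate the true spread.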
Furthermore, I have shown you a very simple technique for detecting outliers in R using the boxplot.stats function (have a look at the documentation of boxplot.stats for more details). However, there are much more advanced techniques, such as machine-learning-based anomaly detection.