How to Remove Outliers in R

In this section, I’ll illustrate how to identify and delete outliers using the boxplot.stats function in R. The following R code creates a new vector that excludes all values flagged as outliers:

x_out_rm <- x[!x %in% boxplot.stats(x)$out] # Remove outliers
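To see what boxplot.stats actually flags, here is a minimal sketch that recomputes its rule by hand on a small, made-up vector (x_demo is hypothetical, not the data from this tutorial). A value counts as an outlier when it lies more than 1.5 times the hinge spread below the lower hinge or above the upper hinge, where the hinges come from fivenum().

```r
# Hypothetical example data: the values 1 to 20 plus two extreme values
x_demo <- c(1:20, 45, 60)

# Tukey's five-number summary; elements 2 and 4 are the lower and upper hinges
h <- fivenum(x_demo)
spread <- h[4] - h[2]            # hinge spread (close to, but not always equal to, the IQR)

# Whisker limits used by boxplot.stats() with its default coef = 1.5
lower <- h[2] - 1.5 * spread
upper <- h[4] + 1.5 * spread

manual_out <- x_demo[x_demo < lower | x_demo > upper]
manual_out                       # 45 60
boxplot.stats(x_demo)$out        # 45 60 (same values)
```

Because boxplot.stats relies on fivenum() hinges rather than quantile(), its limits can differ slightly from a 1.5 * IQR rule built on quantile() or IQR().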

Let’s check how many values we have removed:

length(x) - length(x_out_rm) # Count removed observations # 10
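If you also want to know which observations were removed, not only how many, a logical index helps. The sketch below uses the hypothetical vector x_demo; note that %in% returns FALSE for NA, so missing values are kept rather than removed by this approach.

```r
# Hypothetical example data
x_demo <- c(1:20, 45, 60)

# TRUE for every value that boxplot.stats() flags as an outlier
is_out <- x_demo %in% boxplot.stats(x_demo)$out

sum(is_out)      # number of flagged observations # 2
which(is_out)    # their positions in x_demo # 21 22

# Equivalent to x[!x %in% boxplot.stats(x)$out]
x_demo_rm <- x_demo[!is_out]
length(x_demo) - length(x_demo_rm)   # 2
```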

We have removed ten values from our data. Note that we inserted only five outliers in the data creation process above. In other words, we deleted five values that are not real outliers (more about that below).
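The data creation code is not part of this section, so the sketch below uses simulated standard normal data as a stand-in (an assumption). It shows one plausible reason why the boxplot rule also flags regular values: for normally distributed data, roughly 0.7% of all observations fall outside the whiskers even when no outliers were inserted, so in a sample of a few hundred or thousand values several genuine observations are flagged.

```r
# Theoretical share of standard normal values beyond the 1.5 * IQR whiskers
q1 <- qnorm(0.25)
q3 <- qnorm(0.75)
upper_fence <- q3 + 1.5 * (q3 - q1)
p_theory <- 2 * pnorm(upper_fence, lower.tail = FALSE)   # both tails
p_theory   # approx. 0.007

# Empirical check: simulated data without any inserted outliers
set.seed(1)
y <- rnorm(1e5)
p_sim <- mean(y %in% boxplot.stats(y)$out)
p_sim      # also close to 0.007
```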

Now we can draw another boxplot, this time of the cleaned data:

boxplot(x_out_rm) # Create boxplot without outliers

The output of the previous R code is shown in Figure 2: a boxplot of the data after the outliers have been removed.
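A side-by-side plot can make the effect of the removal easier to judge. The sketch below again uses the hypothetical vector x_demo; the second call shows the outline = FALSE argument of boxplot(), which only hides the outlier points in the plot and leaves the data unchanged. Keep in mind that a boxplot of the cleaned vector recomputes its hinges, so values that were inside the whiskers before can occasionally show up as new outlier points.

```r
# Hypothetical example data and its cleaned version
x_demo <- c(1:20, 45, 60)
x_demo_rm <- x_demo[!x_demo %in% boxplot.stats(x_demo)$out]

# Original and cleaned data next to each other
boxplot(list(original = x_demo, cleaned = x_demo_rm),
        main = "Before and after outlier removal")

# Alternative: keep all values, but do not draw the outlier points
boxplot(x_demo, outline = FALSE)
```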

Important note: Outlier deletion is a very controversial topic in statistics. Any removal of outliers risks deleting valid values, which can bias the analysis of a data set.
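To make the risk of bias concrete, the following sketch (simulated data, not the tutorial's data set) applies the same removal step to normally distributed values that contain no real outliers. Because only the most extreme values are dropped, the standard deviation of the cleaned vector underestimates the true value of 1.

```r
set.seed(42)
y <- rnorm(1e4)                            # simulated data with no real outliers
y_rm <- y[!y %in% boxplot.stats(y)$out]    # same removal step as above

length(y) - length(y_rm)   # some regular values are removed anyway
sd(y)                      # close to the true standard deviation of 1
sd(y_rm)                   # smaller than sd(y): the spread is underestimated
```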

Furthermore, I have shown you only a very simple technique for detecting outliers in R, based on the boxplot rule (have a look at the documentation of boxplot.stats for more details). Much more advanced techniques exist, such as machine-learning-based anomaly detection.
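One detail from that documentation worth knowing is the coef argument of boxplot.stats (default 1.5), which sets how many box lengths beyond the hinges the whiskers may reach. A larger value flags only more extreme points, as this small sketch on the hypothetical vector x_demo shows.

```r
x_demo <- c(1:20, 45, 60)   # hypothetical example data

boxplot.stats(x_demo)$out             # default coef = 1.5: 45 60
boxplot.stats(x_demo, coef = 3)$out   # wider whiskers: 60 only
```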
