Gregor Gorjanc (gg): Trimmed standard deviation

2008-11-26

Trimmed standard deviation

R provides the regular mean, i.e., the sum of the values divided by the number of values, as well as a trimmed version. In the latter, a proportion of the smallest and the largest values is removed before averaging. This can be handy to evaluate the effect of outliers on the mean. Check this out:

## Sample 100 values from standard normal distribution
x <- rnorm(n=100)
## Add an outlier
x <- c(x, 100)
## Calculate the mean
mean(x)
## [1] 1.018909
## Calculate the trimmed mean
mean(x, trim=0.01)
## [1] 0.04981092
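
To make the trimming concrete, here is a minimal sketch of what it amounts to. It uses a small made-up vector v (an assumption for illustration, not the simulated data above) so that the numbers are reproducible. The trimming drops floor(n * trim) values from each end of the sorted data, the same rule as the lo and hi indices in mean.default:

```r
## Hypothetical toy data: 10 regular values plus one outlier
v <- c(1:10, 100)
trim <- 0.1
n <- length(v)
## Number of values dropped from EACH end of the sorted data
k <- floor(n * trim)
## Keep the middle part: here the values 2, 3, ..., 10
kept <- sort(v)[(k + 1):(n - k)]
## The mean of the kept values ...
mean(kept)
## ... should agree with the built-in trimmed mean
mean(v, trim = trim)
```

With the 101 values above and trim=0.01, the same rule gives floor(1.01) = 1, so only the single smallest and the single largest value (the outlier) are dropped.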

I will not talk here about the effect of outliers on the mean; the above example can be used to explore it. I wanted to do the same trick with the standard deviation, but found out that this is not possible out of the box in R (I could be wrong). So I modified the mean.default function to do that. It is not perfect, but it serves my purpose. Here are the results for the above data:

sd.trim(x)
## [1] 9.997196
sd.trim(x, trim=0.01)
## [1] 0.9841417

And the code:

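sd.trim <- function(x, trim=0, na.rm=FALSE, ...)
{
  ## Same input checks as in mean.default
  if(!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
    warning("argument is not numeric or logical: returning NA")
    return(NA_real_)
  }
  ## Optionally drop missing values before trimming
  if(na.rm) x <- x[!is.na(x)]
  if(!is.numeric(trim) || length(trim) != 1)
    stop("'trim' must be numeric of length one")
  n <- length(x)
  if(trim > 0 && n > 0) {
    if(is.complex(x)) stop("trimmed sd is not defined for complex data")
    ## mean.default returns the median in this case; here we return 0
    if(trim >= 0.5) return(0)
    ## Drop floor(n * trim) values from each end of the sorted data
    lo <- floor(n * trim) + 1
    hi <- n + 1 - lo
    ## Partial sort is enough: only positions lo and hi must be in place
    x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
  }
  ## Standard deviation of the values that are left
  sd(x)
}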
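
As a quick sanity check of the idea, the following sketch recomputes a trimmed standard deviation by hand on a small made-up vector. The helper trimmed_sd_by_hand and the data v are hypothetical names introduced only for this illustration. Unlike sd.trim above, the helper always drops missing values and returns NA (rather than 0) when fewer than two values are left, which is an assumption of this sketch and not the behaviour of the function in the post:

```r
## Hypothetical helper, NOT the sd.trim function above: a plain
## "sort, cut both tails, take sd" version for illustration
trimmed_sd_by_hand <- function(v, trim = 0) {
  v <- sort(v[!is.na(v)])      # always drops NAs (assumption of this sketch)
  n <- length(v)
  k <- floor(n * trim)         # values dropped from each end
  if (n - 2 * k < 2) return(NA_real_)  # sd needs at least two values
  sd(v[(k + 1):(n - k)])
}

## Hypothetical toy data: 10 regular values plus one outlier
v <- c(1:10, 100)
## Untrimmed sd is dominated by the outlier
sd(v)
## Trimmed sd: the sd of the values 2, 3, ..., 10
trimmed_sd_by_hand(v, trim = 0.1)
sd(2:10)
```

Keep in mind that trimming also cuts the tails of clean data, so the trimmed value will tend to be smaller than the ordinary standard deviation even when there are no outliers.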

2 comments:

Freddy Lopez said...: Thanks for your post. It is not difficult to program, but I'm lazy today... I will share and cite your link. Thank you again.; 6 December 2010 at 15:37