## 2008-11-26

### Trimmed standard deviation

R provides the regular mean (sum the values and divide by their number) as well as a trimmed version. In the latter, a proportion of the smallest and of the largest values is removed before averaging. This can be handy to evaluate the effect of outliers on the mean. Check this out:
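```r
## Sample 100 values from the standard normal distribution
## (no seed is set, so your numbers will differ from the output below)
x <- rnorm(n=100)
## Add a single outlier
x <- c(x, 100)
## Calculate the mean
mean(x)
## [1] 1.018909
## Calculate the trimmed mean (1% trimmed from each end)
mean(x, trim=0.01)
## [1] 0.04981092
```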
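To see what `trim=0.01` actually removes, here is a small sketch that reproduces the trimmed mean by hand. It follows the index arithmetic in `mean.default`: with `n` values, `floor(n * trim)` of them are dropped from each end before averaging. The seed is an arbitrary choice, so the numbers will not match the output above.

```r
## Reproduce mean(x, trim = 0.01) by hand (seed chosen arbitrarily)
set.seed(1)
x <- c(rnorm(n = 100), 100)   # 100 standard normal values plus one outlier
n <- length(x)                # 101 values
trim <- 0.01
k <- floor(n * trim)          # values dropped from EACH end: floor(1.01) = 1
kept <- sort(x)[(k + 1):(n - k)]
length(kept)                  # 99 values remain
all.equal(mean(kept), mean(x, trim = trim))
```

With 101 values, a 1% trim drops exactly the smallest value and the outlier, which is why the trimmed mean falls back close to 0.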
I will not discuss the effect of outliers on the mean here; the example above is just a starting point for exploring it. I wanted to do the same trick with the standard deviation, but found that R does not offer this out of the box (I could be wrong). So I adapted the mean.default function to do it. It is not perfect, but it serves my purpose. Here are the results for the above data:
```r
sd.trim(x)
## [1] 9.997196
sd.trim(x, trim=0.01)
## [1] 0.9841417
```
And the code:
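```r
## Trimmed standard deviation, adapted from mean.default():
## drop floor(length(x) * trim) values from each end, then call sd()
sd.trim <- function(x, trim=0, na.rm=FALSE, ...)
{
    ## sd() only handles numeric or logical input, so reject everything else
    if(!is.numeric(x) && !is.logical(x)) {
        warning("argument is not numeric or logical: returning NA")
        return(NA_real_)
    }
    if(na.rm) x <- x[!is.na(x)]
    if(!is.numeric(trim) || length(trim) != 1L)
        stop("'trim' must be numeric of length one")
    n <- length(x)
    if(trim > 0 && n > 0) {
        ## with NAs left in (na.rm = FALSE) the trimmed sd is undefined
        if(anyNA(x)) return(NA_real_)
        ## trim >= 0.5 would leave at most the middle value(s); return 0 by convention
        if(trim >= 0.5) return(0)
        lo <- floor(n * trim) + 1
        hi <- n + 1 - lo
        ## a partial sort is enough to place the kept values in positions lo:hi
        x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
    }
    sd(x)
}
```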
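As a rough, self-contained cross-check of the numbers above, the sketch below computes the standard deviation of the trimmed sample directly, keeping the same sorted positions `(k + 1)` to `(n - k)` that `sd.trim` keeps. The seed is an arbitrary assumption, so the values will not match the ones printed earlier; only their rough size should.

```r
## Effect of one outlier on sd, and of trimming it away (arbitrary seed)
set.seed(42)
x <- c(rnorm(n = 100), 100)             # standard normal sample plus an outlier
n <- length(x)
k <- floor(n * 0.01)                    # 1 value dropped from each end
s_all  <- sd(x)                         # dominated by the outlier (around 10)
s_trim <- sd(sort(x)[(k + 1):(n - k)])  # close to 1, like the clean data
c(untrimmed = s_all, trimmed = s_trim)
```

With `sd.trim` defined, `sd.trim(x, trim = 0.01)` should agree with `s_trim` up to rounding, since both compute the ordinary `sd()` of the same 99 values.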