## 2008-11-26

### Trimmed standard deviation

R provides the regular mean (sum the values and divide by the number of values) as well as a trimmed version. In the latter, a proportion of the smallest and the largest values is removed before averaging. This can be handy for evaluating the effect of outliers on the mean. Check this out:
```r
## Sample 100 values from standard normal distribution
x <- rnorm(n=100)
## Add an outlier
x <- c(x, 100)
## Calculate the mean
mean(x)
##  1.018909
## Calculate the trimmed mean
mean(x, trim=0.01)
##  0.04981092
```
I will not discuss the effect of outliers on the mean here; the example above can be used to explore it. I wanted to do the same trick with the standard deviation, but found that this is not possible out of the box in R (I could be wrong). So I modified the mean.default function to do it. It is not perfect, but it serves my purpose. Here are the results for the above data:
```r
sd.trim(x)
##  9.997196
sd.trim(x, trim=0.01)
##  0.9841417
```
And the code:
```r
sd.trim <- function(x, trim=0, na.rm=FALSE, ...){
  ## Accept only numeric, complex or logical input (as in mean.default)
  if(!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
    warning("argument is not numeric or logical: returning NA")
    return(NA_real_)
  }
  if(na.rm) x <- x[!is.na(x)]
  if(!is.numeric(trim) || length(trim) != 1)
    stop("'trim' must be numeric of length one")
  n <- length(x)
  if(trim > 0 && n > 0) {
    if(is.complex(x)) stop("trimmed sd is not defined for complex data")
    ## Trimming half or more from each end leaves no spread
    if(trim >= 0.5) return(0)
    ## Positions of the first and last values to keep
    lo <- floor(n * trim) + 1
    hi <- n + 1 - lo
    ## Partial sort is enough to place the values at lo and hi
    x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
  }
  sd(x)
}
```
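To see which values the index arithmetic actually keeps, here is a minimal sketch that does the trimming by hand on a tiny vector. The data and the name `kept` are made up for illustration and are not part of the function above; `lo` and `hi` follow the same formulas as in `sd.trim`.

```r
## Illustrative data (an assumption, not from the post):
## nine small values and one outlier
x <- c(1:9, 100)
trim <- 0.1
n <- length(x)

## Same index arithmetic as in sd.trim
lo <- floor(n * trim) + 1   # first kept position: 2
hi <- n + 1 - lo            # last kept position: 9
kept <- sort(x)[lo:hi]      # values 2..9; the 1 and the 100 are dropped

sd(x)                 # untrimmed, dominated by the outlier
sd(kept)              # sqrt(6), about 2.449
mean(x, trim = trim)  # 5.5, the same as mean(kept)
```

With `trim = 0.1` and ten values, exactly one value is dropped from each end, so the outlier no longer contributes to the spread. A full sort is used here for readability; the partial sort in `sd.trim` should give the same kept range.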