2008-11-26

Sweave.sh plays with cacheSweave

I have added support for caching to Sweave.sh script as implemented in cacheSweave R package written by Roger D. Peng. Now, one can set caching on for chunks that are time consuming (data import, some calculations, ...) and the Sweaving process will reuse the cached objects each time they are needed. Read the details about the cacheSweave package in the package vignette. Option --cache for Sweave.sh script should also be easy to understand. However, here is a minimalist example:
\documentclass{article}
\usepackage{Sweave}
\begin{document}

<<setup>>=
n <- 10
s <- 15
@

Let us first simulate \Sexpr{n} values from a normal distribution and add a \Sexpr{s} sec pause to show the effect of caching.

<<simulate, cache=true>>=
x <- rnorm(n)
Sys.sleep(s)
@

Now print the values:

<<print, results=verbatim>>=
print(x)
@

\end{document}
Now, one can run the following command:
Sweave.sh --cache test.Rnw
and the output on the command line is:
Run Sweave and postprocess with LaTeX directly from the command line
- cache mode via cacheSweave R package

R version 2.8.0 (2008-10-20)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(package='cacheSweave'); Sweave(file='test.Rnw', driver=cacheSweaveDriver);
Loading required package: filehash
filehash: Simple key-value database (2.0 2008-08-03)
Loading required package: stashR
A Set of Tools for Administering SHared Repositories (0.3-2 2008-04-30)
Writing to file test.tex
Processing code chunks ...
1 : echo term verbatim (label=setup)
2 : echo term verbatim (label=simulate)
3 : echo term verbatim (label=print)

You can now run LaTeX on 'test.tex'
When you repeat the Sweaving process, which you more or less always do, there is no need to wait for 15 second since cacheSweave package takes the x object from the cache! Excellent job Roger!

Trimmed standard deviation

R provides you with a regular mean, i.e., sum the values and divide them by the number of values, as well as with the trimmed version. In the later, we remove a proportion of the smallest and the largest values. This can be handy to evaluate the effect of outliers on the mean. Check this out:
## Sample 100 values from standard normal distribution
x <- rnorm(n=100)
## Add an outlier
x <- c(x, 100)
## Calculate the mean
mean(x)
## [1] 1.018909
## Calculate the trimmed mean
mean(x, trim=0.01)
## [1] 0.04981092
I will not talk here about the effect of outliers on the mean. Above example can be used for the exploration. I wanted to do the same trick with standard deviation, but found out that this is not possible out of the box in R. I coul be wrong-. Well, I modified the mean.default function to do that. It is not perfect, but serves my purpose. Here are the result for the above data:
sd.trim(x)
## [1] 9.997196
sd.trim(x, trim=0.01)
## [1] 0.9841417
And the code:
sd.trim <- function(x, trim=0, na.rm=FALSE, ...)
{
if(!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
warning("argument is not numeric or logical: returning NA")
return(NA_real_)
}
if(na.rm) x <- x[!is.na(x)]
if(!is.numeric(trim) || length(trim) != 1)
stop("'trim' must be numeric of length one")
n <- length(x)
if(trim > 0 && n > 0) {
if(is.complex(x)) stop("trimmed sd are not defined for complex data")
if(trim >= 0.5) return(0)
lo <- floor(n * trim) + 1
hi <- n + 1 - lo
x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
}
sd(x)
}

2008-11-19

Version control (CVS and SVN) cleints for MS Windows

Version control is a good thing. I have some experience with CVS and SVN. Installing the clients to "talk" with the server is very simple under Linux, i.e., something like:
apt-get install cvs svn
It is quite some time since I "discovered" the TortoiseSVN, which is a client for MS Windows. It is really nice, since it integrates smoothly with the Windows Explorer (the yellow folder icon). Today, I also "discovered" that there is also TortoiseCVS. I usually do not do any development under MS Windows, but it is very handy to be able to do a change or two and commit the changes from MS Windows.

In my opinion using the open source programs (R, FireFox, ThunderBird, ...) can improve the working experience with MS Windows a lot.

2008-11-17

PopG genetic simulation program

Teaching the concepts of population genetics can be tedious and boring if one does not have a feeling for the meaning of the "equations" and "numbers". Students at our department always have problems with these concepts. Therefore, I bundled some R scripts in the DemoPopQuanGen package, but students have a hard time installing the package and running the demos. I guess the learning curve is too steep for the majority of the students.

It seems that the PopG genetic simulation program can do more or less anything we need for the population genetics part. From now on I will use this nice program to show the concepts and I am sure students will like this more than R. Though, it is nice to be able to do some more fancy stuff in R, but one really needs at least some feeling about the programming.

2008-11-15

Mailing lists for quantitative and/or statistical genetics

It is always good to have someone to ask some questions or to disscus relevant issues. Mailing lists or forums are a very good place for this. Unfortunatelly, there is to my knowledge no place on the net, where one would find all people working in this field in "one place". The field of quantitative and/or statistical genetics is very scattered among various "disciplines" (animal breeding, plant breeding, human genetics, ...). I somewhat monitor the following places:
Does anyone know of any other list?

P.S. If you do not want to have an email jam in your inbox, I warmly recomend you to use something like Bloglines.com, which allows you to create an e-mail to subscribe to a particular list. Then you can read the e-mails as any other RSS news.

2008-11-08

Genetic diversity in Alpine sheep breeds - emphasis on Slovenian breeds

Dalvit et al. have published a paper "Genetic diversity in Alpine sheep breeds" in Small Ruminant Research. They attempted to study the genetic diversity --> similarity via microsatellite molecular markers in breeds of sheep kept in the region of Alps. Their study involved animals of the following breeds:
  • Italy
    • Bergamasca (TIHO, SRR paper)
    • Biellese (TIHO)
    • Schwarzbraunes Bergschaf = Black-brown mountain sheep (TIHO)
    • Tiroler Bergschaf = Tiroler mountain sheep (TIHO)
    • Schnalserschaf = ??? sheep (???)
  • Germany
  • Slovenia
My interest in this article is of course due to the Slovenian breeds of sheep: the Bovec sheep and the Jezersko-Solčava sheep breed. The main result regarding these two breeds are that Bovec, Jezersko-Solčava and Carinthian were in the same cluster according to the Reynolds' genetic distance (e.g. see this lecture). This is partly an expected result, since to my knowledge Jezersko-Solčava and Carinthian are phenotypically very similar and the origin and history of these two breeds are similar. Sometimes a name Seeländerschaf was also used for Brillenschaf, which basically means Jezersko (~ lake land) sheep. The following Fst and mean molecular coancestry (MC) values were obtained:
  • Jezersko-Solčava - Carinthian, Fst = 0.053, MC = 0.188
  • Jezersko-Solčava - Bovec, Fst = 0.056 , MC = 0.201
  • Bovec - Carinthian, Fst = 0.084, MC = 0.176
Surprisingly, the molecular coancestry showed that Jezersko-Solčava breed has a bit more similarities with Bovec than with Carinthian breed, though Fst was lower for Jezersko-Solčava - Carinthian pair. The difference for mean molecular coancestry is not large, but I would say that Jezersko-Solčava and Carinthian are phenotypically much more similar. Bovec sheep is phenotypically quite different to Jezersko-Solčava. There surely have been different ways of selection of Bovec and Jezersko-Solčava breed since the former is today used for milk production, while the later is not milked at all and used as a meat type sheep i.e. only for rearing lambs. In study of Dalvit et al. Carinthian animals were sampled from Germany. I wonder if the same results would be obtained with animals from Austria. Additionally, TIHO site "states" that there has been some introgression of White mountain (Weisses Bergschaf) breed into Carinthian breed in Germany. However, this does not mean that Carinthian breed in Germany today still has any of the "genes" from White mountain. For example, in Slovenia Romanov rams were used in some flocks of Jezersko-Solčava sheep, but those animals were never treated as Jezersko-Solčava, but as a separate breed called improved Jezersko-Solčava or JSR in short. JSR breed is today a breed with the biggest population among sheep in Slovenia. Jezersko-Solčava breeds is maintained in its "original" environment, but also in some other parts of the Slovenia.

It was also interesting that Jezersko-Solčava and Carinthian were not clustered with Bergamasca. Historical records for both breeds say that at some time in the past breeders used Bergamasca and Paduaner rams. It seems that only some introgression of Bergamasca was done. The following Fst and molecular coancestries (MC) were obtained:
  • Jezersko-Solčava - Bergamasca, Fst = 0.050, MC = 0.185
  • Carinthian - Bergamasca, Fst = 0.077, MC = 0.160
  • Jezersko-Solčava - Carinthian, Fst = 0.053, MC = 0.188
The mean molecular coancestry for the Jezersko-Solčava - Bergamasca pair was practically of the same value as for the Jezersko-Solčava - Carinthian pair, while Carinthian - Bergamasca pair had a bit lower value. I again think this is not consistent with the history of Jezersko-Solčava and Carinthian breed. Bergamasca breed was involved in both populations, but that was some time ago. I would expect more similarities between animals of Jezersko-Solčava and Carinthian breed. These peculiarities could be due to the Germans vs Austrian Carinthian population, chosen set of microsatellites, substructures etc.

P.S. Slightly different results were obtained in "another version" as published in "Best Practices Manual for sheep and goat breeding". In that version, Jezersko-Solcava and Carinthian breed has greater similarity than Jezersko-Solcava (or Carinthian) and Bovec breed.

P.S. See also the "Atlas of Alpine sheep breeds" by Feldmann et al.

2008-11-06