Gregor Gorjanc (gg): 01/01/2009

2009-01-29

More software for statistical/quantitative genetics

Today there was a message on the ACTEON list about the TM site, which hosts two programs: TM (threshold and censored models) and GS3 (genomic selection) - see here.

MCMCglmm package for R

Jarrod Hadfield has published the MCMCglmm package on CRAN. The package fits generalised linear mixed models via MCMC methods. Below is the abstract from the vignette. The list of supported models is quite impressive. Nice job, Jarrod! This is not Jarrod's first package - there is also the (at least to me) interesting MasterBayes package.

MCMCglmm is a package for fitting Generalised Linear Mixed Models using Markov chain Monte Carlo techniques. Most commonly used distributions like the normal and the Poisson are supported together with some useful but less popular ones like the zero-inflated Poisson and the multinomial. Missing values and left, right and interval censoring are accommodated for all traits. The package also supports multi-trait models where the multiple responses can follow different types of distribution. The package allows various residual and random effect variance structures to be specified including heterogeneous variances, unstructured covariance matrices and random regression (e.g. random slope models). Three special types of variance structure that can be specified are those associated with pedigrees (animal models), phylogenies (the comparative method) and measurement error (meta-analysis). The package makes heavy use of results in Sorensen and Gianola [2002] and Davis [2006] which taken together result in what is hopefully a fast and efficient routine. Most small to medium sized problems should take seconds to a few minutes, but large problems (> 20,000 records) are possible. My interest is in evolutionary biology so there are also several functions for applying tensor analysis [Rice, 2004] to real data and functions for visualising and comparing matrices.

2009-01-21

Sweave.sh plays with weaver

After adding support for the cacheSweave package (see here) I have also added support for the weaver package to my Sweave.sh script. The experimental version is available here. After testing and feedback I will upload it to CRAN. One can use this new feature with the command (see previous post for the Sweave file):

Sweave.sh --weaver test.Rnw

This will turn caching on. If one has a Sweave file with code chunks specifying cache=true, but would like to recompute everything that is cached, the following command can be used:

Sweave.sh --skipweaver test.Rnw

I noticed that weaver does not play nicely with the example from my previous posts, i.e., it still waits for 15 seconds. My guess is that cacheSweave is a more complete implementation, but I might be wrong!

2009-01-17

Versions of Sweave.sh

There are now many places where one can find "my" Sweave.sh shell script for running Sweave and post-processing with LaTeX directly from the command line. I published the first version of Sweave.sh here (web page of the department). Later I moved my pages to Google Pages and uploaded the script to CRAN. There has been at least one update (adding support for cacheSweave) since the first version; the version at CRAN is the most recent one and can be treated as "the official and stable" version. New versions will be uploaded directly to CRAN, while the old site will be switched off in the near future.

Use of include and input in Sweave documents

When you write a large structured document using LaTeX, it is wise to use the \input{} and/or \include{} commands (see here for a nice description). However, you cannot "fully" use these two commands with Sweave documents. You can use them, but when you weave the master file, the files pulled in by \input{} and \include{} are not woven! The author of Sweave implemented the \SweaveInput{} command instead. If you still want to use \input{} and/or \include{}, you can take the following approach.

Say we have a master file file0.Rnw, which includes file1.Rnw and file2.Rnw. Then you first need to run Sweave on file1.Rnw and file2.Rnw, and only then on file0.Rnw. Warning! - there can be no interaction between the files, which basically means that you can use this approach only if each file represents a complete analysis. This might be a suitable Makefile in such cases:

all: sweave # Compile the whole document
	Sweave.sh --latex2pdf file0.Rnw

sweave: # Sweave individual files
	Sweave.sh file1.Rnw file2.Rnw

If the master file is a pure LaTeX file, you can change the all target to:

all: sweave # Compile the whole document
	Sweave.sh --noweave --latex2pdf file0.tex
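
The weaving order described above (included files first, master file last) can also be worked out from the master file itself. The following Python sketch is only an illustration: the helper name weave_order is hypothetical, and it assumes that every file pulled in by \input{} or \include{} has an .Rnw source with the same base name. It does not run Sweave.sh, it only lists the files in the order in which they would have to be processed.

```python
import re

# Matches \input{name} and \include{name}. \SweaveInput{} is deliberately
# not matched, because Sweave handles that command itself.
_INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def weave_order(master_name, master_text):
    """Return the .Rnw files to weave: included files first, master last."""
    order = []
    for line in master_text.splitlines():
        code = line.split("%", 1)[0]  # crude way to skip LaTeX comments
        for name in _INCLUDE_RE.findall(code):
            stem = name.strip()
            if stem.endswith((".tex", ".Rnw")):
                stem = stem[:-4]
            rnw = stem + ".Rnw"  # assumption: an .Rnw source exists
            if rnw not in order:
                order.append(rnw)
    order.append(master_name)
    return order


# Hypothetical master file, following the file0/file1/file2 example.
master = r"""
\documentclass{article}
\begin{document}
\input{file1}
\include{file2}
% \input{old-draft}
\end{document}
"""

print(weave_order("file0.Rnw", master))
```

The commented-out \input{} line is ignored, so the result lists file1.Rnw and file2.Rnw before file0.Rnw, which matches the order of the sweave and all targets in the Makefile.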

Creating R Packages: A Tutorial

See here.

2009-01-16

Lifetime production of ewes of the Bovec and Improved Bovec breeds

In the graduation thesis of Kendi Perčič (PDF 575 kB) we analysed the lifetime production of ewes of the Bovec and Improved Bovec breeds. We have now prepared part of this work for publication in a journal - this time a Croatian one. You can see the paper here.

2009-01-15

Sending human genomes via email

Looks like this is now possible. See the paper by Christley et al. (2009).

Christley et al. (2009) Human genomes as email attachments. Bioinformatics 25(2): 274-275. doi:10.1093/bioinformatics/btn582

2009-01-12

A comment on "repairing" embryos

See here.

2009-01-10

Sketch of browser "wars"

Source

2009-01-09

EU petition: STOP Long Animal Transports

If you live in the EU, please go to http://www.8hours.eu and add your vote against long animal transports. Thank you!

Using Makefile to ease the repeated compilation of LaTeX source

I like LaTeX, but it can be tedious if you want an "instant" check of the produced output. Usually, there is no need to check the output very often - basically you can write the whole article, report, ... and then compile the LaTeX source. However, when I create a presentation, I often check what it looks like. LyX can also be very handy for "instant" checking! Two days ago I gave the LaTeX Beamer package a try, and I used the following Makefile for the repeated compilation of the Sweave source file (using LaTeX to create the presentation and R to do the computations and plotting) - here are the PDF and the source file.

You need to be careful with the formatting of the Makefile, i.e., the lines below a target (say line two) need to start with a TAB and not with spaces!

all:
	Sweave.sh --latex2pdf talk.Rnw; make rm

tex: # Sweave --> LaTeX
	Sweave.sh talk.Rnw; make rm

rm: # Remove some other files
	rm -f .pdf Rplots.pdf *.out *.nav *.snm *.log *.tex *.aux *.toc

With this Makefile I only needed to type make in the terminal or, better, press the "up" key to repeat the previous (make) command in order to compile the Sweave source file.

Unix tools in MS Windows command terminal (prompt)

Previously, I wrote about using Rtools when one already has Cygwin installed on an MS Windows machine. The "solution" was to avoid putting Cygwin into the PATH variable and to create a new script that adds Rtools to the PATH variable on the fly. The first part of that solution meant that I was not able to use the Unix tools (that are installed with Cygwin) in the MS Windows command terminal (prompt). For example, I was preparing a presentation using the LaTeX Beamer package, and I usually use a Makefile to ease the repeated compilation. In order to be able to use make and other Unix shell tools, I created another startup script (CMD_Cygwin.bat) with the content shown below.

rem --- Add Cygwin and current folder to the PATH ---
set PATH=.;%PATH%;c:\cygwin\bin;c:\cygwin\usr\bin;c:\cygwin\sbin;c:\cygwin\usr\local\bin

rem --- Start the Command Prompt ---
cmd

2009-01-07

New York Times on R

This article in the New York Times surely shows that R is one of the mainstream statistical packages.

Update: See also a follow-up article.

2009-01-06

Backup (export) blogger blog

Open http://draft.blogger.com
Go to settings
Choose export
Save XML file

2009-01-05

Drawing pedigree examples using Graphviz

Graphviz is a nice choice for drawing pedigrees. See my recent post about drawing pedigrees. There, I wrote that I would probably use Pedigraph for large and complex pedigrees and Graphviz directly for small ones. I needed to draw two simple pedigrees today, so I decided to bite the bullet and try the Graphviz approach - in the past I drew pedigrees using MS PowerPoint, which is a very good piece of software, but the process of drawing was tedious. I needed two simple pedigrees to demonstrate the calculation of the coefficient of inbreeding. I used the same approach as shown by David Duffy (example, his homepage, more info). The examples are:

half-brother and half-sister mating (Graphviz file)

08 Rodovnik A

grandfather and granddaughter mating (Graphviz file)

08 Rodovnik B
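
For readers who want to try this, here is a minimal Python sketch that writes a Graphviz description of the half-brother and half-sister mating. It is not the Graphviz file linked above and it does not reproduce David Duffy's layout: the function name pedigree_to_dot and the individual identifiers are made up for illustration, and parents are simply connected to their offspring with plain edges.

```python
def pedigree_to_dot(pedigree, name="pedigree"):
    """Build Graphviz DOT text from (individual, sire, dam, sex) records.

    Unknown parents are given as None; sex is "M" or "F".
    """
    shapes = {"M": "box", "F": "circle"}  # usual pedigree drawing convention
    lines = ["digraph %s {" % name, "  rankdir=TB;"]
    # One node per individual, shaped by sex.
    for ind, _sire, _dam, sex in pedigree:
        lines.append('  "%s" [shape=%s];' % (ind, shapes[sex]))
    # One edge from each known parent to the offspring.
    for ind, sire, dam, _sex in pedigree:
        for parent in (sire, dam):
            if parent is not None:
                lines.append('  "%s" -> "%s";' % (parent, ind))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical identifiers: S is the common sire, D1 and D2 are unrelated
# dams, A and B are half-sibs, and X is their inbred offspring.
half_sib_mating = [
    ("S", None, None, "M"),
    ("D1", None, None, "F"),
    ("D2", None, None, "F"),
    ("A", "S", "D1", "M"),
    ("B", "S", "D2", "F"),
    ("X", "A", "B", "F"),
]

print(pedigree_to_dot(half_sib_mating))
```

Saving the printed text to a file, say pedigree.dot, and running dot -Tpdf pedigree.dot -o pedigree.pdf should give a simple drawing with males as boxes and females as circles.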

2009-01-02

A quantitative measure of Hardy-Weinberg equilibrium

I came across a quantitative measure of Hardy-Weinberg equilibrium (HWE). It is due to Olson and Foley and it is very simple:

theta = H^2 / (4PQ),

where H is the frequency (probability) of heterozygotes (say A/B), P is the frequency of the "first" homozygotes (say A/A), and Q likewise for the other homozygotes (say B/B). Let p and q be the allele frequencies of alleles A and B, respectively. Under HWE P = p^2, H = 2pq, and Q = q^2, so that theta = (2pq)^2 / (4p^2q^2) = 1. Too many heterozygotes will raise theta above 1, while theta below 1 corresponds to too few heterozygotes. I like this measure. Up to now I have done quite a few tests of HWE using the Chi-square test or the MCMC method (see above link for the details), but I never really checked what the reason for the deviation from HWE was. The measure by Olson and Foley can shed some more light on this.
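
As a small illustration, theta can be computed from observed genotype counts. This is only a sketch: the function name hwe_theta is hypothetical, the counts are invented, and the observed genotype proportions are simply plugged in for H, P, and Q (an assumption of this example, not a procedure taken from Olson and Foley).

```python
def hwe_theta(n_aa, n_ab, n_bb):
    """Return theta = H^2 / (4 P Q) computed from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    hom_a = n_aa / n  # P: frequency of A/A
    het = n_ab / n    # H: frequency of A/B
    hom_b = n_bb / n  # Q: frequency of B/B
    if hom_a == 0 or hom_b == 0:
        raise ValueError("theta is undefined when a homozygote class is empty")
    return het ** 2 / (4 * hom_a * hom_b)


# Invented counts for 100 individuals in each case.
print(hwe_theta(25, 50, 25))  # exact HWE proportions for p = q = 0.5
print(hwe_theta(20, 60, 20))  # more heterozygotes than expected
print(hwe_theta(35, 30, 35))  # fewer heterozygotes than expected
```

The first call gives a theta of 1, the second a value above 1, and the third a value below 1. Since the sample size cancels, the same number is obtained directly from the counts as n_ab^2 / (4 n_aa n_bb).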

Olson JM, Foley M (1996) Testing for homogeneity of Hardy-Weinberg disequilibrium using data sampled from several populations. Biometrics 52: 971–979.

Accessing files (read and write) with dual boot (Windows & Linux)

If you have a computer with a dual boot system (you have installed Windows and Linux on the same computer and you decide which one to use at computer startup) you can easily access files in read and write mode using EXT2 IFS on the Windows side and Linux-NTFS on the Linux side. I have used both of them for quite some time without any problems.

Update: I experienced a problem with EXT2 IFS when accessing the disk partitions of a freshly installed Ubuntu 8.10. Using the mountdiag tool (as recommended here) I get this message:

C:\ggorjan\Desktop>mountdiag d:
The volume has an Ext2/Ext3 file system, but the Ext2 IFS 1.11 software did not
mount it because the file system has an inode size unequal to 128 bytes (inode
size: 256 bytes).
The only way to solve it is to back up the volume's files and format the file
system: give the mkfs.ext3 utility the -I 128 switch. Finally, restore all
backed-up files.
After that, the Ext2 IFS software should be able to access the volume.

I did not see this error in the past. I did some googling and found out that newer Linux distributions use a larger inode size (whatever this means). I could reformat the Linux partitions, but I found out that there are also other solutions besides EXT2 IFS - see this article for more info. Since I want write support, it comes down to using either EXT2 IFS or Ext2fsd. I was a bit afraid to use Ext2fsd because I do not know it, but I also did not know EXT2 IFS in the past, and the posts at the Ubuntu forums gave me confidence. What I did:

installed Ext2fsd
assigned the drive letters
rebooted
after that I was still not able to create a new file on a Linux (Ext3) partition (from MS Windows) --> went to the Ext2fsd folder and started the Ext2 Volume Manager --> right-clicked on a partition --> Ext2 Management --> unchecked the read-only option
rebooted
voila!
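
A side note on the inode size from the mountdiag message: the value is stored in the ext2/ext3 superblock, which starts 1024 bytes into the volume. The Python sketch below only illustrates where the number lives. The function name ext2_inode_size is hypothetical and the superblock is a synthetic one built in memory, not data read from my disk.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # the superblock starts 1024 bytes into the volume
EXT2_MAGIC = 0xEF53       # magic number shared by ext2 and ext3


def ext2_inode_size(superblock):
    """Return the inode size recorded in a raw ext2/ext3 superblock."""
    (magic,) = struct.unpack_from("<H", superblock, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/ext3 superblock")
    (rev_level,) = struct.unpack_from("<I", superblock, 76)
    if rev_level == 0:
        return 128  # revision 0 file systems always use 128-byte inodes
    (inode_size,) = struct.unpack_from("<H", superblock, 88)
    return inode_size


# Synthetic superblock for illustration only (not read from a real disk).
fake = bytearray(1024)
struct.pack_into("<H", fake, 56, EXT2_MAGIC)  # magic number
struct.pack_into("<I", fake, 76, 1)           # revision level 1
struct.pack_into("<H", fake, 88, 256)         # inode size in bytes
print(ext2_inode_size(fake))
```

On a real system the 1024 bytes would have to be read from the partition device at SUPERBLOCK_OFFSET, which normally requires administrator rights. On the Linux side, tune2fs -l reports the same value as "Inode size".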