Gregor Gorjanc (gg): 2010

2010-12-31

Simple reparameterization to improve convergence in linear mixed models

Quite some time ago I had an opportunity to visit Luis Alberto Garcia-Cortes at INIA (Madrid, Spain), where I was introduced to Bayesian statistics and McMC methods. We were also doing some research in that area. We started to write a short note, which has been rejected for publication due to lack of "breadth". We failed to find time to work on this topic further on. Therefore, I published the note in our local journal. The note can be found here or embeded bellow.

Simple reparameterization to improve convergence in linear mixed models

2010-12-18

ASReml-R: Storing A inverse as a sparse matrix

I was testing ASReml-R program (an R package that links propriety ASReml binaries that can be used only with valid licence) this week and had to do some manipulations with the numerator relationship matrix (A). ASReml-R provides a function (asreml.Ainverse) that can create inverse of A directly from the pedigree as this inverse is needed in pedigree based mixed model. Bulding inverse of A directly from a pedigree is a well known result dating back to Henderson in 1970s or so. The funny thing is that it is cheaper to setup inverse of A directly than to setup up first A and then to invert it. In addition, inverse of A is very spare so it is easy/cheap to store it. Documentation for asreml.Ainverse has a tiny example of usage. Since the result of this function is a list with several elements (data.frame with "triplets" for non-zero elements of inverse of A, inbreeding coefficients, ...) example also shows how to create a matrix object in R as shown bellow:

library(package="asreml")
## Create test pedigree
ped <- data.frame( me=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                  dad=c(0, 0, 0, 1, 1, 2, 4, 5, 7,  9),
                  mum=c(0, 0, 0, 1, 1, 2, 6, 6, 8,  9))
## Create inverse of A in triplet form
tmp <- asreml.Ainverse(pedigree=ped)$ginv
## Create a "proper" matrix
AInv <- asreml.sparse2mat(x=tmp)
## Print AInv
AInv

So the inverse of A would be:

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]      [,9]     [,10]
[1,]     5    0    0   -2   -2    0  0.0  0.0  0.000000  0.000000
[2,]     0    3    0    0    0   -2  0.0  0.0  0.000000  0.000000
[3,]     0    0    1    0    0    0  0.0  0.0  0.000000  0.000000
[4,]    -2    0    0    3    0    1 -2.0  0.0  0.000000  0.000000
[5,]    -2    0    0    0    3    1  0.0 -2.0  0.000000  0.000000
[6,]     0   -2    0    1    1    4 -2.0 -2.0  0.000000  0.000000
[7,]     0    0    0   -2    0   -2  4.5  0.5 -1.000000  0.000000
[8,]     0    0    0    0   -2   -2  0.5  4.5 -1.000000  0.000000
[9,]     0    0    0    0    0    0 -1.0 -1.0  4.909091 -2.909091
[10,]    0    0    0    0    0    0  0.0  0.0 -2.909091  2.909091

However, this is problematic as it creates a dense matrix - zero values are also stored (you can see them). If we would have 1,000 individuals, such a matrix would consume 7.6 Mb of RAM (= (((1000 * (1000 + 1)) / 2) * 16) / 2^20). This is not a lot, but with 10,000 individuals we would already need 763 Mb of RAM, which can create some bottlenecks. A solution is to create a sparse matrix using the Matrix R package. Luckily we have all the ingredients prepared by asreml.Ainverse function - the triplets. However, the essential R code is a bit daunting and I had to test several options before I figured it out - code from my previous post helped;)

## Load package
library(package="Matrix")
## Number of pedigree members
nI <- nrow(ped)
## Store inverse of A in sparse form
AInv2 <- as(new("dsTMatrix",
                 Dim=c(nI, nI),
                 uplo="L",
                 i=(as.integer(tmp$Row) - 1L),
                 j=(as.integer(tmp$Column) - 1L),
                 x=tmp$Ainverse),
             "dsCMatrix")
## Add row and column names - optional
dimnames(AInv2) <- list(attr(x=tmp, which="rowNames"),
                        attr(x=tmp, which="rowNames"))
## Print AInv
AInv2

And the inverse of A is now:

10 x 10 sparse Matrix of class "dsCMatrix"
[[ suppressing 10 column names ‘1’, ‘2’, ‘3’ ... ]]
                                       
1   5  . . -2 -2  .  .    .    .         .  
2   .  3 .  .  . -2  .    .    .         .  
3   .  . 1  .  .  .  .    .    .         .  
4  -2  . .  3  .  1 -2.0  .    .         .  
5  -2  . .  .  3  1  .   -2.0  .         .  
6   . -2 .  1  1  4 -2.0 -2.0  .         .  
7   .  . . -2  . -2  4.5  0.5 -1.000000  .  
8   .  . .  . -2 -2  0.5  4.5 -1.000000  .  
9   .  . .  .  .  . -1.0 -1.0  4.909091 -2.909091
10  .  . .  .  .  .  .    .   -2.909091  2.909091

you can clearly see the structure and it soon becomes obvious why such a storage is more efficient.

If we want to go back from matrix to triplet form (this might be useful if we want to create a matrix for programs as ASReml) we can use the following code:

## Convert back to triplet form - first the matrix
tmp2 <- as(AInv2, "dsTMatrix")
## ...                          - now to data.frame
tmp3 <- data.frame(Row=tmp2@i + 1, Column=tmp2@j + 1, Ainverse=tmp2@x)
## Sort
tmp3 <- tmp3[order(tmp3$Row, tmp3$Column), ]
## Test that we get the same stuff
any((tmp[, 3] - tmp3[, 3]) > 0)
## ASReml-R specificities
attr(x=tmp3, which="rowNames") <- rownames(tmp)
attr(x=tmp3, which="geneticGroups") <- c(0, 0)

2010-10-28

Nice slides on Bayesian and frequentist view of doing stats

Here - by George Casella.

2010-10-12

Recent talks

I have been traveling a lot in the past few months. I am settling down for a while and have some time to publish talks I had. I have to acknowledge the support of my department for the travels as well as Ingelin Steinsland from NTNU for invitation. Now the talks:

Graphical Model Representation of Pedigree Based Mixed Model, 32nd ITI, Cavtat, Croatia; see paper

Introduction to Animal Breeding with Examples of (Non-)Gaussian Traits, INLA group meeting, NTNU, Department of Mathematical Sciences, Trondheim, Norway

Statistical Models in Animal Breeding, INLA group meeting, NTNU, Department of Mathematical Sciences, Trondheim, Norway

Animal Breeding Applications of Pedigree Based Mixed Model, House sparrow "lunch” meeting, NTNU, Department of Biology, Trondheim, Norway

Evaluation of Multiple Allele Probabilities at a Single Locus for Ungenotyped Members of Complex Pedigrees, Applied Statistics 2010, Ribno, Slovenia

Graphical Model View of Additive Genetic Models, UGA (link1, link2), Athens, Georgia, USA

Uporaba statistike na področju kvantitativne genetike, IBMI, Ljubljana, Slovenia

2010-08-09

9th WCGALP is finnished

9th WCGALP is finnished. It was a great conference with more than 1300 participants. For me it was quite stresfull to scheudle the list of presentations I would like to attend as there were six sessions in parallel. Well, I managed more or less to get maximum out of it. I believe we can say that the premise of this confernce was genomic selection, but there were also some other interesting stuff on genetics/genomics applied to livestock species. It was fun to met old and new colleagues and friends.

My contribution (Flexible Bayesian Inference of Animal Model Parameters Using BUGS Program) was chosen to be presented as a poster. I hoped for a talk, but this is part of the "game". I got quite some inquries at the poster session, so I am satisfied.

For LaTeX enthusiasts: I created poster using beamerposter package using the tweaked style from Martin Weiglhofer. It is a great LaTeX package, but it took me quite long to get the poster done - the initial layout and setting was fast, but tweaks (positioning a bit left, right, up, down, ...) were quite time consuming. MS PowerPoint still rocks in this regard, though the typsetting is much much nicer with LaTeX.

Flexible Bayesian Inference of Animal Model Parameters Using BUGS Program - poster

2010-07-28

IPSUR book used LyX-Sweave

Jay Kerns wrote a book titled "Introduction to Probability and Statistics Using R" or IPSUR for short. He has put up a web site for the book and Rcmdr plugin. There are at least two very important points from my side (I did not read the book, yet): 1. it is free to download (thank you Jay!!!) and 2. it was written with the help of LyX-Sweave. This shows that combination of LyX and Sweave is really powerful.

2010-07-01

Some links on dog and cat genetics (mainly disorders)

Bellow is a set of links I accidentally came accross today:

Online Mendelian Inheritance in Animals (OMIA)
LIDA - designed to collect, organise and disseminate information on the prevalence of inherited disorders among Australian cats and dogs
http://www.dr-addie.com/Conditions.htm
Inherited disorders in cats - confirmed and suspected
http://www.hsvma.org/pdf/fact_sheets/guide-to-congenital-and-heritable-disorders.pdf
http://www.vetsci.usyd.edu.au/research/disorders/documents/pop_structure.pdf

2010-06-30

Drawing pedigree examples using the kinship R package

I have previously provided sort of an overview about plotting the pedigrees, then specifically using the Graphiviz, while I have lately used the TikZ LaTeX (see slides 11-15) system (see more example). The later gives great (beautiful) results, but at the cost of writing TikZ code - it is not that horible, just time consuming - the same applies to Graphviz. Is there a quick way to plot a pedigree if we already have the data in the file. It is possible to do it in R using the kinship package. Here is an example:

ped <- data.frame( id=c( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11),
fid=c( 0, 0, 0, 3, 3, 5, 3, 6, 6, 0, 9),
mid=c( 0, 0, 0, 2, 1, 4, 4, 7, 7, 0, 10),
sex=c( 2, 2, 1, 2, 1, 1, 2, 1, 1, 2, 2),
aff=0)
ped[11, "aff"] <- 1

library(package="kinship")
ped <- with(ped, pedigree(id=id, dadid=fid, momid=mid, sex=sex, affected=aff))
plot(ped)

Which gives the following result. It is not great, but it is informative and easy to do. From a practical point of implementation all pedigree members need to have both parents known or no parents known.

2010-06-29

Relationship between correlation and cosine

See very nice blog post by John D Cook here.

Sweave.sh in Eclipse-StatET

Sébastien Bihorel sent the following instructions on how to use my sweave.sh shell script in Eclipse-StatET.

1- First, you need to know the path to your TEXINPUTS settings. Type R CMD env |grep TEXINPUTS in a shell. In my installation (opensuse 11.2), the shell returned the following
TEXINPUTS=.::/usr/lib/R/share/texmf:

2- Edit your .bashrc file (located in your home directory) and add the following statement
export TEXINPUTS

3- Download Gregor Gorjanc's sweave.sh script at http://cran.r-project.org/contrib/extra/scripts/Sweave.sh. Copy it to /usr/local/bin, rename the file to sweave (mv sweave.sh sweave) and make the file executable (chmod +ax sweave) if it is not already.

4- Open Eclipse and create a new Program in External Tools Configuration using the following settings:
>Main:
- location: /usr/local/bin/sweave
- working directory: ${container_loc}
- Arguments: -ld ${resource_loc}

>Refresh:
no selection

>Build:
Build before launch: checked
The project containing the selected resource: checked
Include referenced projects: checked

>Environment: added a new variable
Variable: TEXINPUTS
Value: .:/usr/lib/R/share/texmf: <- Use here the path obtained at step 1 (for some reason you have to add .: before the path and : after. Do not use quotes around the Value) Append environment to native environment: checked >Common:
Local file: checked
Console encoding: default - inherited (UTF-8)
Standard Input and Ouput: Allocate console
Launch in background: checked

Now you should be able to see sweave as a new program in your main screen. Hope it helps

Sebastien

2010-05-29

Beta koeficient iz asociacijske študije

Od Matjaža Stanonika sem prejel sledeče vprašanje:

Interpretirati moram rezultate članka, ki so podani v beta koeficientu. Če sem prav ugotovil je to v slovenščini standardizirani koeficient korelacije. Vendar pa si kljub temu ne znam praktično razložiti rezultatov. Primer iz članka: nek SNP vpliva na nivo LDL-ja v telesu. Na določeni populaciji so ugotovili, da je beta koeficient za ta SNP 0,10. Sedaj pa ne vem, kaj praktično pomeni ta številka.

Moj odgovor:

Predvidevam, da je govora o vrednostih+ v tabeli 3. Vzemimo SNP rs646776c (prvi v tabeli). Ocena za beta koeficient za ta SNP znaša −0.16 s standardno napako (v oklepaju) 0.01.

Kaj je beta koeficient v tem primeru? Pod tabelo je napisano: "Beta-coefficient (β) represents the proportion of 1 s.d. change in standardized LDL cholesterol residual (mean = 0, s.d. = 1 after adjustment for age, age2, gender, and diabetes status) per copy of the allele modeled".

Če prav razumem so naredili sledeče. Izmerili so holesterol vrste LDL (fenotip) in najprej te vrednosti korigirali za vpliv starosti (kot kvadratno regresijo), spol in status diabetesa - predidevam, da so vse vplive vključili v statistični model, ki ga lahko zapišemo kot y = m + x + x^2 + s + d + e, kjer je y - fenotip, m - srednja vrednost populacije, x - starost, s - spol (1 - moški, 2 - ženski), d - diabetes (0 - ne, 1 - da) in e - nepojasnjeni ostanek. Vrednosti (e v enačbi), ki so jih dobili iz korekcije so standardizirali, tako da je bilo povprečje 0 in standardni odklon 1. To pomeni, da lahko po analogiji normalne porazdelitve pričakujemo minimum pri ~ -3 in maksimum pri ~ +3.

Beta koeficient je tako ocena vpliva zamenjave enega SNP allela na povprečje korigiranih vrednostih. Če ima nek osebek genotiip A1A1 nekdo drugi pa A1A2, potem pričakujemo, da se bosta ta dva osebka v povprečju razlikovala za -0.16+-0.01 standardne deviacije holesterola LDL - standardna napaka je praktično zanemarljivo majhna. Pri osebku z genotipom A2A2 pa 2*-0.16+-0.01 standardne deviacije holesterola LDL. Tole s standardno deviacijo je malo zapleteno. Recimo, da je v neki populaciji (vrednosti si bom izmislil!!!) povprečje za holesterol LDL 100 enot in standardna deviacija 10 enot. Potem bi razlika med A1A1 in A1A2 bila -0.16 * 10 enot = -1.6 enot.

2010-04-24

Nice writing on the use of McMC in applied work

See "Inference from simulations and monitoring convergence1" by Gelman and Shirley. The especially point out that quite few McMC samples are needed to get a reasonable picture of posterior distribution, while more samples are needed if the precise knowledge of posterior mean or any similar quantity is needed.

2010-03-30

Update on "LyX and Sweave with R on Windows XP or Vista"

As previously Jeff contributed a not on how to setup LyX and Sweave on Win XP or Vista. See here for the note.

P.S. (2010-04-01) See also LyX Wiki

2010-03-24

Ilustracije velikonočnice (Pulsatilla Grandis)

Velikonočnica je enostavno povedano zelo lepa. V Sloveniji raste le na nekaj mestih. Med najbolj znanimi rastišči je vsekakor Boč. V veliko večjem številu (mogoče tudi zaradi manjše obljudenosti v preteklosti) pa jo je moč najti na rastišču Boletina - Ponikva. Zemljišče je bilo še nedolgo nazaj last družine Gorjanc - kmetija mojih starih staršev in strica. Ob vpisu na študij sem v prvem letniku precej aktivno opazoval rastišče. V tem času sem sošolko Matejo Verbič nahecal, da je narisala (glej spodaj) velikonočnico v različnih stadijih na podlagi mojih fotografij - v naravi sem razgrnil rastje okoli velikonočnice in prislonil bel papir ter naredil nekaj posnetkov. Meni (laiku) so matejine ilustracije naravnost fantastične, a sem z leti in drugimi zanimivostmi povsem pozabil na njih. Pred kratkim sem našel ilustracije pri pospravljanju omare se odločil, da jih "objavim" na spletu in tako delim z "vsemi". Avtorstvo ilustracij je seveda še vedno od Mateje! Originalnih fotografij žal nisem več našel.

Najprej nekaj "naključnih" povezav glede velikonočnice:

Sedaj pa ilustracije:

From Bloggerjeve slike

2010-02-27

9th WCGALP: Flexible Bayesian Inference of Animal Model Parameters Using BUGS Program

Conference madness is continuing ;) Bellow is my contribution for 9th WCGALP (World Congress on Genetics Applied to Livestock Production), which is held every four years. The above site describes it as: "This congress is the premier meeting point for scientists around the world involved in genetic improvement of livestock". My contribution is again on fitting so called animal model (pedigree based mixed model) in BUGS. The contributions must be very short (only four pages), so there is not much to show. I hope the contribution is going to be accepted so that I can spread this idea among the animal breeders.

Flexible Bayesian Inference of Animal Model Parameters Using BUGS Program

2010-02-25

Graphical Model Representation of Pedigree Based Mixed Model

Bellow is a link to my contribution for 32nd International Conference on Information Technology Iinerfaces. This contribution adds more "formal stuff" to previous work.

Graphical Model Representation of Pedigree Based Mixed Model

2010-02-12

WAMWiki - wild animal models wiki

WAMWiki is a great place with tutorials on how to fit animal models with ASREML, WOMBAT and MCMCglmm. Perhaps I could add my work with animal models in BUGS in the future ;) There are several things there (testing for random effects, bivariate models, ...) and a nice description of pedantics software, that can report about information in the pedigree - structure, ...

2010-02-09

LyX and Sweave with R on Windows XP or Vista

Jeﬀ Laake contributed instructions on how to setup LyX to work with Sweave on Windows XP or Vista so that others might benefit. I did not read the instructions thorougly! He expanded the work of Murat. As far as I understand, in comparison to my approach they avoid the need for unix shell, which is nice!

P.S. (2010-04-01) See also LyX Wiki

2010-02-08

LyX-Sweave - use of Ctrl+Enter

Yihui writes:

I have been wondering if it is possible to use Enter instead of Ctrl+Enter in the Sweave code chunk in LyX. It is not quite convenient to press Ctrl+Enter, what's more, we are not able to copy R code and paste it into LyX from other editors (vice versa). I see we are allowed to use a single carriage return in LaTeX code embedded in LyX. Why must Sweave code use Ctrl+Enter? Thanks!

My reply:

LyX prohibits the use of enter key for anything else than starting a new paragraph - this is default behaviour which is great for writing, but notorious for using Sweave with LyX. When you press enter you get a new paragraph, which would start a new chunk of code or is it new paragraph with default style - I do not remember the details at the moment. I agree that this is far from handy, but I was not able to find a solution for this. However, as you mention, it is possible to use ERT LaTeX boxes, to overcome this.

2010-01-29

LyX-Sweave on Ubuntu (Karmic)

Bret Collier sends the following install instructions for LyX-Sweave combination that worked for him on his Ubuntu (Karmic) setup.

Nothing was wrong with your instructions at http://www.bioconductor.org/CRAN/contrib/extra/lyx under install from what I can tell (other than where I put noweb.sty).

Here are the install steps I used (which basically follow the ones you provide on CRAN but I tried to list exactly what I did) to install Lyx 1.6.4 on my Linux Ubuntu Karmic Dual Core 32bit laptop with R v2.10.1 (2009-12-14). Note there are some minor differences with my install routine and what I found at- http://blog.nextbiomotif.com/2009/10/primer-to-sweave-with-lyx.html. Location of Sweave.sty is different.

1: I used Synaptic package manager to install Lyx and LaTex (tex-live). I updated R using sudo apt-get install and sudo apt-get install r-base before anything else.

2. Find the so-called LyX user directory or LyX library directory via the Help -> About LyX menu within LyX (mine was /usr/share/lyx/)

3. Save the 'preferences' file from CRAN (I saved unedited) into- /usr/share/lyx/

4. I saved all the literate-*.* files from CRAN to the layouts sub-directory of the Lyx user directory- /usr/share/lyx/layouts/

5. I put noweb.sty into- /usr/share/texmf/tex/latex/

6. I put Sweave.sty into- /usr/share/texmf/

7. I put any other .sty into- /usr/share/texmf/tex/latex/

8. I used texhash (or sudo texhash) to make sure all the .sty are located.

9. I made sure R was in the path (opened Terminal and typed in "R")

10. I used sudo lyx to open Lyx, then I went Tools->Reconfigure, got the reconfigure successful window, then I closed Lyx

11: Then I opened Lyx and everything was working on the Lyx end.

2010-01-21

R AnalyticFlow

R AnalyticFlow seems to be a nice tool to have a good overview over the analysis. The same kind of mode is available in Orange and SAS Enterprise Guide. I have not tried it yet, though. Does anyone have any experiences with it?

Update on 2010-07-31: there is also Red-R, which is a "sort of mixture" of R and Orange.

R AnalyticFlow

Orange

SAS Enterprise Guide