Gregor Gorjanc (gg): 2011

2011-12-16

VIPER - cool pedigree&genotype data view

Interested in genetics - in looking genotype data on pedigrees? Check out VIPER (Visual Interactive Pedigree ExploreR). Looks really cool!!!

2011-10-25

Ocenjevanje parametrov v Bayesovski statistiki (IBMI)

Prosojnice z današnje predstavitve o ocenjevanju parametrov v Bayesovski statistiki na IBMI so na voljo tukaj.

Ocenjevanje parametrov v Bayesovski statistiki

2011-10-13

Recently discovered study resources

Khan Academy - a set of 2600 free videos from basic to more advances quantitative skills
Introduction to Artifical Inteligence - Stanford on-line course for masses
Machine Learning - Stanford on-line course for masses

2011-10-05

2011-10-04

ASD 2011 - Primošten

Animal Science Days (ASD) conference was held in Primošten this year. I participated with the following contributions as a co-author:

Lambing Interval in Jezersko-Solčava and Improved Jezersko-Solčava Breed. PDF
Partitioning of Genetic Trends by Originin Croatian Simmental Cattle. PDF
Estimation of Variance Components for Litter Size in the First and Later Parities in Improved Jezersko-Solcava Sheep. PDF
The Effect of Coenzyme Q10 and Lipoic Acid Added to the Feed of Hens on Physical Characteristics of Eggs. PDF

2011-09-30

InterBull: Partitioning of international genetic trends by origin in Brown Swiss bulls

I attended the InterBull meeting this year in Stavanger (Norway), which was jointly organised with the EAAP conference in the same place. I participated with contribution titled "Partitioning of international genetic trends by origin in Brown Swiss bulls" co-authored with colleagues. In essence we partitioned breeding values of Brown Swiss animals by origin of selection and summarized those partitions as origin specific genetic trends. This gives us an opportunity to evaluate the effect of selection performed in different countries and how this affects the global genetic trends. Results for this breed are quite shocking! Look into the paper and talk (see bellow) for more ;)

We were looking forward for comments and/or critiques about the applied method from the audience, but did not get any direct questions. Several people approached me after the talk and one of the comments was that our method is nothing more than multiplying breeding values by the share of genes coming from different origin. This is not true and I will demonstrate this in with a simple example.

Let us assume that we have a simple small pedigree as shown bellow with R code, where column names are: id =individual code, fid = father code, mid = mother code, ori = origin, and bv = breeding value. This pedigree could represent situation where we constantly use foreign sires - here only two generations are being shown. Origin represents the country of registering/selecting aninmal.

## Simple example
example <- data.frame( id=c(  1,   2,   3,   4,   5),
                      fid=c( NA,  NA,   1,  NA,   3),
                      mid=c( NA,  NA,   2,  NA,   4),
                      ori=c("A", "B", "A", "B", "A"),
                       bv=c(100, 106, 104, 106, 103))

Now we would like to perform gene proportion analysis and partitioning of supplied breeding values, both according to origin. This is easy to achieve "by hand" for this simple example (see paper bellow for maths), but tedious for bigger examples. I have wrote an R package partAGV (not yet publicly available, but you can contact me) that can be used for such analyses.

## For gene proportion analysis
example$gp <- 1
 
library(partAGV)
partAGV(example, colAGV=6:5)
 
## Gene proportions
##    id  fid  mid ori gp gp_pa gp_w gp_A gp_B
## 1   1 <NA> <NA>   A  1     0    1 1.00 0.00
## 2   2 <NA> <NA>   B  1     0    1 0.00 1.00
## 3   3    1    2   A  1     1    0 0.50 0.50
## 4   4 <NA> <NA>   B  1     0    1 0.00 1.00
## 5   5    3    4   A  1     1    0 0.25 0.75
 
## Partitions of breeding values 
##    id  fid  mid ori  bv bv_pa bv_w  bv_A  bv_B
## 1   1 <NA> <NA>   A 100     0  100 100.0   0.0
## 2   2 <NA> <NA>   B 106     0  106   0.0 106.0
## 3   3    1    2   A 104   103    1  51.0  53.0
## 4   4 <NA> <NA>   B 106     0  106   0.0 106.0
## 5   5    3    4   A 103   105   -2  23.5  79.5

As we can see gene proportions are as expected: 1/2 for each origin in individual 3 and 1/4 vs. 3/4 for individual 5. Partitions of breeding values show that in animal 5 we 23.5 (out of 105) is attributed to selection work done in country A and 79.5 is attributed to selection work done in country B. Now, if we multiply breeding value of individual 5 (105) by origin specific gene proportions (1/4 and 3/4) we get 25.75 and 77.5, which is similar to partitions, but not the same. This shows that our method enables separation of gene flow and selection work preformed by particular country. In order to see this algebraically we can write breeding values of this individual as:

a_5 = 1/2 a_3 + 1/2 a_4 + w_5

= 1/2 (1/2 a_1 + 1/2 a_2 + w_3) + 1/2 w_4 + w_5

= 1/4 a_1 + 1/4 a_2 + 1/2 w_3 + 1/2 w_4 + w_5

= 1/4 w_1 + 1/4 w_2 + 1/2 w_3 + 1/2 w_4 + w_5

= 1/4 100 + 1/4 106 + 1/2 1 + 1/2 106 + -2

= 25 + 26.5 + 0.5 + 53 + -2

If we now collect terms specific to each origin we get:

a_5_A = 25 + 0.5 + + -2 = 23.5

a_5_B = 26.5 + 53 = 79.5

Partitioning of international genetic trends by origin in Brown Swiss bulls

2011-09-28

Polyploidy in sugarcane

While reading UseR conference abstracts I came across this sentence: "Sugarcane is polypoid, i.e., has 8 to 14 copies of every chromosome, with individual alleles in varying numbers." Vau! This generates really complex genotype system. Say we have biallelic gene with alleles being A and B. In diploids the possible genotypes are AA, AB, and BB. Given the above sentence in sugarcane possible genotypes are any permutation of A's and B's in a series of 8 to 14 alleles. I am not sure if 9, 11, and 13 are also allowed, that is having even number of chromosomes. In any case such permutations result in really large numbers!

Thinking about this a bit further it appears that the whole system is not that complex once we realize that genotyping does not tell as about the order of alleles (we can not distinguish between AB and BA), which simplifies from all possible permutations to all possible combinations, e.g., for biallelic gene in tetraploids this would correspond to 5 combinations and 16 permutations.

Bellow is an R snippet that shows how to enumerate all possible combinations or permutations

## Load package having nice combinatorial functions
library(package="gtools")
 
## Specify alleles - just two for simplicity
alleles <- c("A", "B")
 
## Possible genotypes for diploids
combinations(n=length(alleles), r=2, v=alleles, repeats.allowed=TRUE)
##      [,1] [,2]
## [1,] "A"  "A" 
## [2,] "A"  "B" 
## [3,] "B"  "B" 
 
## Possible genotypes for tetraploids
combinations(n=length(alleles), r=4, v=alleles, repeats.allowed=TRUE)ž
##      [,1] [,2] [,3] [,4]
## [1,] "A"  "A"  "A"  "A" 
## [2,] "A"  "A"  "A"  "B" 
## [3,] "A"  "A"  "B"  "B" 
## [4,] "A"  "B"  "B"  "B" 
## [5,] "B"  "B"  "B"  "B" 
 
permutations(n=length(alleles), r=4, v=alleles, repeats.allowed=TRUE)
##       [,1] [,2] [,3] [,4]
##  [1,] "A"  "A"  "A"  "A" 
##  [2,] "A"  "A"  "A"  "B" 
##  [3,] "A"  "A"  "B"  "A" 
##  [4,] "A"  "A"  "B"  "B" 
##  [5,] "A"  "B"  "A"  "A" 
##  [6,] "A"  "B"  "A"  "B" 
##  [7,] "A"  "B"  "B"  "A" 
##  [8,] "A"  "B"  "B"  "B" 
##  [9,] "B"  "A"  "A"  "A" 
## [10,] "B"  "A"  "A"  "B" 
## [11,] "B"  "A"  "B"  "A" 
## [12,] "B"  "A"  "B"  "B" 
## [13,] "B"  "B"  "A"  "A" 
## [14,] "B"  "B"  "A"  "B" 
## [15,] "B"  "B"  "B"  "A" 
## [16,] "B"  "B"  "B"  "B" 
 
## Possible genotypes for 8-14 ploids
spectrum <- seq(from=8, to=14, by=2)
nS <- length(spectrum)
retC <- vector(mode="list", length=nS)
retP <- vector(mode="list", length=nS)
for(i in 1:nS) {
  retC[[i]] <- combinations(n=length(alleles), r=spectrum[i], v=alleles, repeats.allowed=TRUE)
  retP[[i]] <- permutations(n=length(alleles), r=spectrum[i], v=alleles, repeats.allowed=TRUE)
}
combC <- sapply(retC, nrow)
combP <- sapply(retP, nrow)
cbind(spectrum, combC, combP)
##      spectrum combC combP
## [1,]        8     9   256
## [2,]       10    11  1024
## [3,]       12    13  4096
## [4,]       14    15 16384

Created by Pretty R at inside-R.org

2011-08-12

Setup up the inverse of additive relationship matrix in R

Additive genetic covariance between individuals is one of the key concepts in (quantitative) genetics. When doing the prediction of additive genetic values for pedigree members, we need the inverse of the so called numerator relationship matrix (NRM) or simply A. Matrix A has off-diagonal entries equal to numerator of Wright's relationship coefficient and diagonal elements equal to 1 + inbreeding coefficient. I have blogged before about setting up such inverse in R using routine from the ASReml-R program or importing the inverse from the CFC program. However, this is not the only way to "skin this cat" in R. I am aware of the following attempts to provide this feature in R for various things (the list is probably incomplete and I would grateful if you point me to other implementations):

pedigree R package has function makeA() and makeAinv() with obvious meanings; there is also calcG() if you have a lot of marker data instead of pedigree information; there are also some other very handy functions calcInbreeding(), orderPed(), trimPed(), etc.
pedigreemm R package does not have direct implementation to get A inverse, but has all the needed ingredients, which makes the package even more interesting
MCMCglmm R package has function inverseA() which works with pedigree or phlyo objects; there are also handy functions such as prunePed(), rbv(), sm2asreml(), etc.
kinship and kinship2 R packages have function kinship() to setup kinship matrix, which is equal to the half of A; there are also nice functions for plotting pedigrees etc. (see also here)
see also a series of R scripts for relationship matrices

As I described before, the interesting thing is that setting up inverse of A is easier and cheaper than setting up A and inverting it. This is very important for large applications. This is an old result using the following matrix theory. We can decompose symmetric positive definite matrix as A = LU = LL' (Cholesky decomposition) or as A = LDU = LDL' (Generalized Cholesky decomposition), where L (U) is lower (upper) triangular, and D is diagonal matrix. Note that L and U in previous two equations are not the same thing (L from Cholesky is not equal to L from Generalized Cholesky decomposition)! Sorry for sloppy notation. In order to confuse you even more note that Henderson usually wrote A = TDT'. We can even do A = LSSU, where S diagonal is equal to the square root of D diagonal. This can get us back to A = LU = LL' as LSSU = LSSL' = LSS'L' = LS(LS)' = L'L (be ware of sloppy notation)! The inverse rule says that inv(A) = inv(LDU) = inv(U) inv(D) inv(L) = inv(L)' inv(D) inv(L) = inv(L)' inv(S) inv(S) inv(L). I thank to Martin Maechler for pointing out to the last (obviously) bit to me. In Henderson's notation this would be inv(A) = inv(T)' inv(D) inv(T) = inv(T)' inv(S) inv(S) inv(T) Uf ... The important bit is that with NRM (aka A) inv(L) has nice simple structure - it shows the directed graph of additive genetic values in pedigree, while inv(D) tells us about the precision (inverse variance) of additive genetic values given the additive genetic values of parents and therefore depends on knowledge of parents and their inbreeding (the more they are inbred less variation can we expect in their progeny). Both inv(L) and inv(D) are easier to setup.

Packages MCMCglmm and pedigree give us inv(A) directly (we can also get inv(D) in MCMCglmm), but pedigreemm enables us to play around with the above matrix algebra and graph theory. First we need a small example pedigree. Bellow is an example with 10 members and there is also some inbreeding and some individuals have both, one, or no parents known. It is hard to see inbreeding directly from the table, but we will improve that later (see also here).

ped <- data.frame( id=c(  1,   2,   3,   4,   5,   6,   7,   8,   9,  10),
                  fid=c( NA,  NA,   2,   2,   4,   2,   5,   5,  NA,   8),
                  mid=c( NA,  NA,   1,  NA,   3,   3,   6,   6,  NA,   9))

Now we will create an object of a pedigree class and show the A = U'U stuff:

## install.packages(pkgs="pedigreemm")
library(package="pedigreemm")
 
ped2 <- with(ped, pedigree(sire=fid, dam=mid, label=id))
 
U <- relfactor(ped2)
A &lt;- crossprod(U)
 
round(U, digits=2)
## 10 x 10 sparse Matrix of class "dtCMatrix"                                     
##  [1,] 1 . 0.50 .    0.25 0.25 0.25 0.25 . 0.12
##  [2,] . 1 0.50 0.50 0.50 0.75 0.62 0.62 . 0.31
##  [3,] . . 0.71 .    0.35 0.35 0.35 0.35 . 0.18
##  [4,] . . .    0.87 0.43 .    0.22 0.22 . 0.11
##  [5,] . . .    .    0.71 .    0.35 0.35 . 0.18
##  [6,] . . .    .    .    0.71 0.35 0.35 . 0.18
##  [7,] . . .    .    .    .    0.64 .    . .   
##  [8,] . . .    .    .    .    .    0.64 . 0.32
##  [9,] . . .    .    .    .    .    .    1 0.50
## [10,] . . .    .    .    .    .    .    . 0.66

## To check

U - chol(A)

round(A, digits=2)
## 10 x 10 sparse Matrix of class "dsCMatrix"                                     
##  [1,] 1.00 .    0.50 .    0.25 0.25 0.25 0.25 .   0.12
##  [2,] .    1.00 0.50 0.50 0.50 0.75 0.62 0.62 .   0.31
##  [3,] 0.50 0.50 1.00 0.25 0.62 0.75 0.69 0.69 .   0.34
##  [4,] .    0.50 0.25 1.00 0.62 0.38 0.50 0.50 .   0.25
##  [5,] 0.25 0.50 0.62 0.62 1.12 0.56 0.84 0.84 .   0.42
##  [6,] 0.25 0.75 0.75 0.38 0.56 1.25 0.91 0.91 .   0.45
##  [7,] 0.25 0.62 0.69 0.50 0.84 0.91 1.28 0.88 .   0.44
##  [8,] 0.25 0.62 0.69 0.50 0.84 0.91 0.88 1.28 .   0.64
##  [9,] .    .    .    .    .    .    .    .    1.0 0.50
## [10,] 0.12 0.31 0.34 0.25 0.42 0.45 0.44 0.64 0.5

1.

ote tha

pedigreem

m package uses Matrix classes in order to store only what we need to store, e.g., matrix U is triangular (t in "dtCMatrix") and matrix A is symmetric (s in "dsCMatrix"). To show the generalized Cholesky A = LDU (or using Henderson notation A = TDT') we use gchol() from the bdsmatrix R package. Matrix T shows the "flow" of genes in pedigree.

## install.packages(pkgs="bdsmatrix")
library(package="bdsmatrix")
tmp &lt;- gchol(as.matrix(A))
D &lt;- diag(tmp)
(T <- as(as.matrix(tmp), "dtCMatrix"))
## 10 x 10 sparse Matrix of class "dtCMatrix"                                     
##  [1,] 1.000 .      .    .     .    .    . .   .   .
##  [2,] .     1.0000 .    .     .    .    . .   .   .
##  [3,] 0.500 0.5000 1.00 .     .    .    . .   .   .
##  [4,] .     0.5000 .    1.000 .    .    . .   .   .
##  [5,] 0.250 0.5000 0.50 0.500 1.00 .    . .   .   .
##  [6,] 0.250 0.7500 0.50 .     .    1.00 . .   .   .
##  [7,] 0.250 0.6250 0.50 0.250 0.50 0.50 1 .   .   .
##  [8,] 0.250 0.6250 0.50 0.250 0.50 0.50 . 1.0 .   .
##  [9,] .     .      .    .     .    .    . .   1.0 .
## [10,] 0.125 0.3125 0.25 0.125 0.25 0.25 . 0.5 0.5 1

## To chec

lt;

- T %*% diag(sqrt(D))

L - t(U)

Now the A inverse part (inv(A) = inv(T)' inv(D) inv(T) = inv(T)' inv(S) inv(S) inv(T) using Henderson's notation, note that ). The nice thing is that pedigreemm authors provided functions to get inv(T) and D.

(TInv <- as(ped2, "sparseMatrix"))
## 10 x 10 sparse Matrix of class "dtCMatrix" (unitriangular)            
## 1   1.0  .    .    .    .    .   .  .    .   .
## 2   .    1.0  .    .    .    .   .  .    .   .
## 3  -0.5 -0.5  1.0  .    .    .   .  .    .   .
## 4   .   -0.5  .    1.0  .    .   .  .    .   .
## 5   .    .   -0.5 -0.5  1.0  .   .  .    .   .
## 6   .   -0.5 -0.5  .    .    1.0 .  .    .   .
## 7   .    .    .    .   -0.5 -0.5 1  .    .   .
## 8   .    .    .    .   -0.5 -0.5 .  1.0  .   .
## 9   .    .    .    .    .    .   .  .    1.0 .
## 10  .    .    .    .    .    .   . -0.5 -0.5 1
round(DInv <- Diagonal(x=1/Dmat(ped2)), digits=2)
## 10 x 10 diagonal matrix of class "ddiMatrix"
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    .    .    .    .    .    .    .    .     .
##  [2,]    .    1    .    .    .    .    .    .    .     .
##  [3,]    .    .    2    .    .    .    .    .    .     .
##  [4,]    .    .    . 1.33    .    .    .    .    .     .
##  [5,]    .    .    .    .    2    .    .    .    .     .
##  [6,]    .    .    .    .    .    2    .    .    .     .
##  [7,]    .    .    .    .    .    . 2.46    .    .     .
##  [8,]    .    .    .    .    .    .    . 2.46    .     .
##  [9,]    .    .    .    .    .    .    .    .    1     .
## [10,]    .    .    .    .    .    .    .    .    .  2.33
 
round(t(TInv) %*% DInv %*% TInv, digits=2)
## 10 x 10 sparse Matrix of class "dgCMatrix"
## ...
round(crossprod(sqrt(DInv) %*% TInv), digits=2)

## 10 x 10 sparse Matrix of class "dsCMatrix"

##  [1,]  1.5  0.50 -1.0  .     .     .     .     .     .     .   
##  [2,]  0.5  2.33 -0.5 -0.67  .    -1.00  .     .     .     .   
##  [3,] -1.0 -0.50  3.0  0.50 -1.00 -1.00  .     .     .     .   
##  [4,]  .   -0.67  0.5  1.83 -1.00  .     .     .     .     .   
##  [5,]  .    .    -1.0 -1.00  3.23  1.23 -1.23 -1.23  .     .   
##  [6,]  .   -1.00 -1.0  .     1.23  3.23 -1.23 -1.23  .     .   
##  [7,]  .    .     .    .    -1.23 -1.23  2.46  .     .     .   
##  [8,]  .    .     .    .    -1.23 -1.23  .     3.04  0.58 -1.16
##  [9,]  .    .     .    .     .     .     .     0.58  1.58 -1.16
## [10,]  .    .     .    .     .     .     .    -1.16 -1.16

.3

# T

o c

heck

so

ve

(A

) - crossprod(sqrt(DInv) %*% TInv)

The second method (using crossprod) is preferred as it leads directly to symmetric matrix (dsCMatrix), which stores only upper or lower triangle. And make sure you do not do crossprod(TInv %*% sqrt(DInv)) as it is the wrong order of matrices.

As promised we will display (plot) pedigree by use of conversion functions of matrix objects to graph objects using the following code. Two examples are provided using the graph and igraph packages. The former does a very good job on this example, but otherwise igraph seems to have much nicer support for editing etc.

## source("http://www.bioconductor.org/biocLite.R")
## biocLite(pkgs=c("graph", "Rgraphviz"))
library(package="graph")
library(package="Rgraphviz")
g <- as(t(TInv), "graph")
plot(g)

## install.packages(pkgs="igraph")
library(package="igraph")
i &lt;- igraph.from.graphNEL(graphNEL=g)
V(i)$label <- 1:10
plot(i, layout=layout.kamada.kawai)
## tkplot(i)