2010-12-18

ASReml-R: Storing A inverse as a sparse matrix

I was testing ASReml-R program (an R package that links propriety ASReml binaries that can be used only with valid licence) this week and had to do some manipulations with the numerator relationship matrix (A). ASReml-R provides a function (asreml.Ainverse) that can create inverse of A directly from the pedigree as this inverse is needed in pedigree based mixed model. Bulding inverse of A directly from a pedigree is a well known result dating back to Henderson in 1970s or so. The funny thing is that it is cheaper to setup inverse of A directly than to setup up first A and then to invert it. In addition, inverse of A is very spare so it is easy/cheap to store it. Documentation for asreml.Ainverse has a tiny example of usage. Since the result of this function is a list with several elements (data.frame with "triplets" for non-zero elements of inverse of A, inbreeding coefficients, ...) example also shows how to create a matrix object in R as shown bellow:
library(package="asreml")
## Create test pedigree
ped <- data.frame( me=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
dad=c(0, 0, 0, 1, 1, 2, 4, 5, 7, 9),
mum=c(0, 0, 0, 1, 1, 2, 6, 6, 8, 9))
## Create inverse of A in triplet form
tmp <- asreml.Ainverse(pedigree=ped)$ginv
## Create a "proper" matrix
AInv <- asreml.sparse2mat(x=tmp)
## Print AInv
AInv
So the inverse of A would be:
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]      [,9]     [,10]
[1,] 5 0 0 -2 -2 0 0.0 0.0 0.000000 0.000000
[2,] 0 3 0 0 0 -2 0.0 0.0 0.000000 0.000000
[3,] 0 0 1 0 0 0 0.0 0.0 0.000000 0.000000
[4,] -2 0 0 3 0 1 -2.0 0.0 0.000000 0.000000
[5,] -2 0 0 0 3 1 0.0 -2.0 0.000000 0.000000
[6,] 0 -2 0 1 1 4 -2.0 -2.0 0.000000 0.000000
[7,] 0 0 0 -2 0 -2 4.5 0.5 -1.000000 0.000000
[8,] 0 0 0 0 -2 -2 0.5 4.5 -1.000000 0.000000
[9,] 0 0 0 0 0 0 -1.0 -1.0 4.909091 -2.909091
[10,] 0 0 0 0 0 0 0.0 0.0 -2.909091 2.909091
However, this is problematic as it creates a dense matrix - zero values are also stored (you can see them). If we would have 1,000 individuals, such a matrix would consume 7.6 Mb of RAM (= (((1000 * (1000 + 1)) / 2) * 16) / 2^20). This is not a lot, but with 10,000 individuals we would already need 763 Mb of RAM, which can create some bottlenecks. A solution is to create a sparse matrix using the Matrix R package. Luckily we have all the ingredients prepared by asreml.Ainverse function - the triplets. However, the essential R code is a bit daunting and I had to test several options before I figured it out - code from my previous post helped;)
## Load package
library(package="Matrix")
## Number of pedigree members
nI <- nrow(ped)
## Store inverse of A in sparse form
AInv2 <- as(new("dsTMatrix",
Dim=c(nI, nI),
uplo="L",
i=(as.integer(tmp$Row) - 1L),
j=(as.integer(tmp$Column) - 1L),
x=tmp$Ainverse),
"dsCMatrix")
## Add row and column names - optional
dimnames(AInv2) <- list(attr(x=tmp, which="rowNames"),
attr(x=tmp, which="rowNames"))
## Print AInv
AInv2
And the inverse of A is now:
10 x 10 sparse Matrix of class "dsCMatrix"
[[ suppressing 10 column names ‘1’, ‘2’, ‘3’ ... ]]

1 5 . . -2 -2 . . . . .
2 . 3 . . . -2 . . . .
3 . . 1 . . . . . . .
4 -2 . . 3 . 1 -2.0 . . .
5 -2 . . . 3 1 . -2.0 . .
6 . -2 . 1 1 4 -2.0 -2.0 . .
7 . . . -2 . -2 4.5 0.5 -1.000000 .
8 . . . . -2 -2 0.5 4.5 -1.000000 .
9 . . . . . . -1.0 -1.0 4.909091 -2.909091
10 . . . . . . . . -2.909091 2.909091
you can clearly see the structure and it soon becomes obvious why such a storage is more efficient.

If we want to go back from matrix to triplet form (this might be useful if we want to create a matrix for programs as ASReml) we can use the following code:
## Convert back to triplet form - first the matrix
tmp2 <- as(AInv2, "dsTMatrix")
## ... - now to data.frame
tmp3 <- data.frame(Row=tmp2@i + 1, Column=tmp2@j + 1, Ainverse=tmp2@x)
## Sort
tmp3 <- tmp3[order(tmp3$Row, tmp3$Column), ]
## Test that we get the same stuff
any((tmp[, 3] - tmp3[, 3]) > 0)
## ASReml-R specificities
attr(x=tmp3, which="rowNames") <- rownames(tmp)
attr(x=tmp3, which="geneticGroups") <- c(0, 0)

4 comments:

Anonymous said...

Possible typo: Both code snippets are identical (for full and sparse inverse).

Gorjanc Gregor said...

Tnx! It is fixed now. Typing R code is a pain with blogger as it keeps destroying the "layout" - spacing between letters.

Luis said...

Thanks for the code. Nice to see other people doing fun things with asreml-r; it is a shame that the package is proprietary limiting the number of users.

Gorjanc Gregor said...

For reference, in the above code we can omit as(..., "dsCMatrix") part as we can easily also work with the "dsTMatrix" class, especially if we want to export triplets to some file as has been show above (then you do not need to transform back to the "dsTMatrix" class).