2012-03-31

GBLUP example in R

Shirin Amiri was asking about GBLUP (genomic BLUP) and based on her example I set up the following R script to show how GBLUP works. Note that this is the so called marker model, where we estimate allele substitution effects of the markers and not individual based model, where genomic breeding values are inferred directly. The code:

```library(package="MatrixModels")

dat <- data.frame( y=c(1.5, 2.35, 3.4, 2.31, 1.53),
s1=c("AA", "Aa", "Aa", "AA", "aa"),
s2=c("Bb", "BB", "bb", "BB", "bb"),
s3=c("cc", "Cc", "Cc", "cc", "CC"),
s4=c("Dd", "Dd", "DD", "dd", "Dd"))

cols <- paste("s", 1:4, sep="")
dat[, cols] <- lapply(dat[, cols], function(z) as.integer(z) - 1)

lambda <- 1

(y <- dat\$y)
(X <- model.Matrix(y ~  1,                     data=dat, sparse=TRUE))
(Z <- model.Matrix(y ~ -1 + s1 + s2 + s3 + s4, data=dat, sparse=TRUE))

XX <- crossprod(X)
XZ <- crossprod(X, Z)
ZZ <- crossprod(Z)

Xy <- crossprod(X, y)
Zy <- crossprod(Z, y)

(LHS <- rBind(cBind(XX,    XZ),
cBind(t(XZ), ZZ + Diagonal(n=dim(ZZ)) * lambda)))

(RHS <- rBind(Xy,
Zy))

(sol <- solve(LHS, RHS))

(GEBV <- Z %*% sol[-1])```

And the transcript:

```R version 2.14.2 (2012-02-29)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>
> library(package="MatrixModels")

Attaching package: ‘Matrix’

The following object(s) are masked from ‘package:base’:

det

Attaching package: ‘MatrixModels’

The following object(s) are masked from ‘package:stats’:

getCall

>
> dat <- data.frame( y=c(1.5, 2.35, 3.4, 2.31, 1.53),
+                   s1=c("AA", "Aa", "Aa", "AA", "aa"),
+                   s2=c("Bb", "BB", "bb", "BB", "bb"),
+                   s3=c("cc", "Cc", "Cc", "cc", "CC"),
+                   s4=c("Dd", "Dd", "DD", "dd", "Dd"))
>
> cols <- paste("s", 1:4, sep="")
> dat[, cols] <- lapply(dat[, cols], function(z) as.integer(z) - 1)
>
> lambda <- 1
>
> (y <- dat\$y)
 1.50 2.35 3.40 2.31 1.53
> (X <- model.Matrix(y ~  1,                     data=dat, sparse=TRUE))
"dsparseModelMatrix": 5 x 1 sparse Matrix of class "dgCMatrix"
(Intercept)
1           1
2           1
3           1
4           1
5           1
@ assign:  0
@ contrasts:
named list()
> (Z <- model.Matrix(y ~ -1 + s1 + s2 + s3 + s4, data=dat, sparse=TRUE))
"dsparseModelMatrix": 5 x 4 sparse Matrix of class "dgCMatrix"
s1 s2 s3 s4
1  2  1  .  1
2  1  2  1  1
3  1  .  1  2
4  2  2  .  .
5  .  .  2  1
@ assign:  1 2 3 4
@ contrasts:
named list()
>
> XX <- crossprod(X)
> XZ <- crossprod(X, Z)
> ZZ <- crossprod(Z)
>
> Xy <- crossprod(X, y)
> Zy <- crossprod(Z, y)
>
> (LHS <- rBind(cBind(XX,    XZ),
+               cBind(t(XZ), ZZ + Diagonal(n=dim(ZZ)) * lambda)))
5 x 5 sparse Matrix of class "dgCMatrix"
(Intercept) s1 s2 s3 s4
(Intercept)           5  6  5  4  5
s1                    6 11  8  2  5
s2                    5  8 10  2  3
s3                    4  2  2  7  5
s4                    5  5  3  5  8
>
> (RHS <- rBind(Xy,
+               Zy))
5 x 1 Matrix of class "dgeMatrix"
[,1]
(Intercept) 11.09
s1          13.37
s2          10.82
s3           8.81
s4          12.18
>
> (sol <- solve(LHS, RHS))
5 x 1 Matrix of class "dgeMatrix"
[,1]
(Intercept)  1.65468864
s1           0.05223443
s2           0.08655678
s3          -0.05223443
s4           0.45586081
>
> (GEBV <- Z %*% sol[-1])
5 x 1 Matrix of class "dgeMatrix"
[,1]
1 0.6468864
2 0.6289744
3 0.9117216
4 0.2775824
5 0.3513919
>
> proc.time()
user  system elapsed
2.330   0.070   2.412 ``` Anonymous said...

This is not a correct implementation of a genomic based prediction.

Gregor Gorjanc said...

Can you please comment which part is not correct?

Maybe first comment meant that this is not GBLUP but it is SNP-BLUP as you have not used genomic relationship matrix in your model.

Gregor Gorjanc said...

Correct, though with the caveat that SNP-BLUP and G-BLUP are equivalent models so we still get GEBV at the end as shown.

Unknown said...

XX <- crossprod(X)
gives me
Error in crossprod(x, y) :
requires numeric/complex matrix/vector arguments

Unknown said...

The modified scripted as below works just fine on my rstudio:
```
library(package="MatrixModels")

dat <- data.frame( y=c(1.5, 2.35, 3.4, 2.31, 1.53),
s1=c("AA", "Aa", "Aa", "AA", "aa"),
s2=c("Bb", "BB", "bb", "BB", "bb"),
s3=c("cc", "Cc", "Cc", "cc", "CC"),
s4=c("Dd", "Dd", "DD", "dd", "Dd"))

cols <- paste("s", 1:4, sep="")
dat[, cols] <- lapply(dat[, cols], function(z) as.integer(z) - 1)

lambda <- 1

dat
(y <- dat\$y)
(X <- as.matrix(model.Matrix(y ~ 1, data=dat, sparse=TRUE)))
(Z <- as.matrix(model.Matrix(y ~ -1 + s1 + s2 + s3 + s4, data=dat, sparse=TRUE)))

XX <- crossprod(X)
XZ <- crossprod(X, Z)
ZZ <- crossprod(Z)

Xy <- crossprod(X, y)
Zy <- crossprod(Z, y)

(LHS <- rbind(cbind(XX, XZ),
cbind(t(XZ), ZZ + diag(dim(ZZ)) * lambda)))

(RHS <- rbind(Xy,
Zy))

(sol <- solve(LHS, RHS))

(GEBV <- Z %*% sol[-1])
```