- Mathematical Monk channel featuring Information Theory, Machine Learning, and Probability Primer
- Coursera featuring loads of material from Princeton, Stanford, Michigan, and Penn
- Udacity
2012-04-21
Some more study video material
2012-04-17
Simulated data for genomic selection and GWAS using a combination of coalescent and gene drop methods
Our (Hickey and Gorjanc) paper describing simulation using coalescent and gene drop methods has been published in the new open-access journal G3. This contribution is part of the Genomic Selection collection of the Genetics Society of America, which is nicely described here.
2012-04-15
Chalk letter generator
Here is a cool online application that takes a chalkboard scene with Bart Simpson and adds your text to it.
2012-04-08
Small pedigree based mixed model example
Pedigree-based mixed models (often called animal models, because they model animal performance) are the cornerstone of animal breeding and quantitative genetics. There are many programs that can be used to analyze your data with these models (e.g., ASREML, BLUPf90, MATVEC, MiXBLUP & MiX99, SurvivalKit, PEST/VCE, WOMBAT, ...). There are also R packages you can use: pedigreemm and MCMCglmm. If you want to write your own program, you can take the example code below as a starting point. The code shows, on a simple example, the essence of building the system of equations that needs to be solved. Note that this is meant only for demonstration purposes and small-scale analyses. In addition, the variance components are assumed known here. To understand the model for this simple example a bit better, it is first written out in equation form and its graphical model view is shown as well.
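In standard notation, matching the objects y, X, Z, AInv, and alpha built in the code below, the model and Henderson's mixed model equations are:

y = X\beta + Za + e, \qquad
a \sim N(0,\, A\sigma^2_a), \qquad
e \sim N(0,\, I\sigma^2_e), \qquad
\alpha = \sigma^2_e / \sigma^2_a

\begin{bmatrix}
X'X & X'Z \\
Z'X & Z'Z + A^{-1}\alpha
\end{bmatrix}
\begin{bmatrix} \hat{\beta} \\ \hat{a} \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z'y \end{bmatrix}

The numerator relationship matrix A never needs to be formed explicitly: with the standard decomposition A = TDT' (T derived from the pedigree, D the diagonal matrix of Mendelian sampling variances), its inverse is A^{-1} = (T^{-1})' D^{-1} T^{-1}, which is exactly what crossprod(sqrt(DInv) %*% TInv) computes in the code.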
Graphical model view of the simple pedigree-based mixed model example

The code:
options(width=200)

### --- Required packages ---

## install.packages(pkg=c("pedigreemm", "MatrixModels"))

library(package="pedigreemm")   ## pedigree functions
library(package="MatrixModels") ## sparse matrices

### --- Data ---

## NOTE:
##  - some individuals have one or both parents (un)known
##  - some individuals have phenotype (un)known
##  - some individuals have repeated phenotype observations
example <- data.frame(
  individual=c( 1,   2,   2,  3,   4,   5,  6,  7,  8,  9,  10),
      father=c(NA,  NA,  NA,  2,   2,   4,  2,  5,  5, NA,   8),
      mother=c(NA,  NA,  NA,  1,  NA,   3,  3,  6,  6, NA,   9),
   phenotype=c(NA, 103, 106, 98, 101, 106, 93, NA, NA, NA, 109),
       group=c(NA,   1,   1,  1,   2,   2,  2, NA, NA, NA,   1))

## Variance components
sigma2e <- 1
sigma2a <- 3
(h2 <- sigma2a / (sigma2a + sigma2e))

### --- Setup data ---

## Make sure each individual has only one record in pedigree
ped <- example[!duplicated(example$individual), 1:3]

## Factors (this eases building the design matrix considerably)
example$individual <- factor(example$individual)
example$group      <- factor(example$group)

## Phenotype data
dat <- example[!is.na(example$phenotype), ]

### --- Setup MME ---

## Phenotype vector
(y <- dat$phenotype)

## Design matrix for the "fixed" effects (group)
(X <- model.Matrix(~ group, data=dat, sparse=TRUE))

## Design matrix for the "random" effects (individual)
(Z <- model.Matrix(~ individual - 1, data=dat, sparse=TRUE))

## Inverse additive relationship matrix (A = TDT', so AInv = t(TInv) %*% DInv %*% TInv)
ped2 <- pedigree(sire=ped$father, dam=ped$mother, label=ped$individual)
TInv <- as(ped2, "sparseMatrix")
DInv <- Diagonal(x=1/Dmat(ped2))
AInv <- crossprod(sqrt(DInv) %*% TInv)

## Variance ratio
alpha <- sigma2e / sigma2a

## Mixed Model Equations (MME)

## ... Left-Hand Side (LHS) without pedigree prior
(LHS0 <- rBind(cBind(crossprod(X),    crossprod(X, Z)),
               cBind(crossprod(Z, X), crossprod(Z, Z))))

## ... Left-Hand Side (LHS) with pedigree prior
round(
  LHS <- rBind(cBind(crossprod(X),    crossprod(X, Z)),
               cBind(crossprod(Z, X), crossprod(Z, Z) + AInv * alpha)), digits=1)

## ... Right-Hand Side (RHS)
(RHS <- rBind(crossprod(X, y),
              crossprod(Z, y)))

### --- Solutions ---

## Solve
LHSInv <- solve(LHS)
sol <- LHSInv %*% RHS

## Standard errors (strictly, prediction error variances = squared standard errors)
se <- diag(LHSInv) * sigma2e

## Reliabilities
r2 <- 1 - diag(LHSInv) * alpha

## Accuracies
r <- sqrt(r2)

## Printout
cBind(sol, se, r, r2)
And the transcript:
R version 2.14.2 (2012-02-29)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> 
> options(width=200)
> 
> ### --- Required packages ---
> 
> ## install.packages(pkg=c("pedigreemm", "MatrixModels"))
> 
> library(package="pedigreemm")   ## pedigree functions
Loading required package: lme4
Loading required package: Matrix
Loading required package: lattice

Attaching package: ‘Matrix’

The following object(s) are masked from ‘package:base’:

    det

Attaching package: ‘lme4’

The following object(s) are masked from ‘package:stats’:

    AIC, BIC

> library(package="MatrixModels") ## sparse matrices
> 
> ### --- Data ---
> 
> ## NOTE:
> ##  - some individuals have one or both parents (un)known
> ##  - some individuals have phenotype (un)known
> ##  - some individuals have repeated phenotype observations
> example <- data.frame(
+   individual=c( 1,   2,   2,  3,   4,   5,  6,  7,  8,  9,  10),
+       father=c(NA,  NA,  NA,  2,   2,   4,  2,  5,  5, NA,   8),
+       mother=c(NA,  NA,  NA,  1,  NA,   3,  3,  6,  6, NA,   9),
+    phenotype=c(NA, 103, 106, 98, 101, 106, 93, NA, NA, NA, 109),
+        group=c(NA,   1,   1,  1,   2,   2,  2, NA, NA, NA,   1))
> 
> ## Variance components
> sigma2e <- 1
> sigma2a <- 3
> (h2 <- sigma2a / (sigma2a + sigma2e))
[1] 0.75
> 
> ### --- Setup data ---
> 
> ## Make sure each individual has only one record in pedigree
> ped <- example[!duplicated(example$individual), 1:3]
> 
> ## Factors (this eases building the design matrix considerably)
> example$individual <- factor(example$individual)
> example$group      <- factor(example$group)
> 
> ## Phenotype data
> dat <- example[!is.na(example$phenotype), ]
> 
> ### --- Setup MME ---
> 
> ## Phenotype vector
> (y <- dat$phenotype)
[1] 103 106  98 101 106  93 109
> 
> ## Design matrix for the "fixed" effects (group)
> (X <- model.Matrix(~ group, data=dat, sparse=TRUE))
"dsparseModelMatrix": 7 x 2 sparse Matrix of class "dgCMatrix"
   (Intercept) group2
2            1      .
3            1      .
4            1      .
5            1      1
6            1      1
7            1      1
11           1      .
@ assign: 0 1
@ contrasts:
$group
[1] "contr.treatment"

> 
> ## Design matrix for the "random" effects (individual)
> (Z <- model.Matrix(~ individual - 1, data=dat, sparse=TRUE))
"dsparseModelMatrix": 7 x 10 sparse Matrix of class "dgCMatrix"
   [[ suppressing 10 column names ‘individual1’, ‘individual2’, ‘individual3’ ... ]]
2  . 1 . . . . . . . .
3  . 1 . . . . . . . .
4  . . 1 . . . . . . .
5  . . . 1 . . . . . .
6  . . . . 1 . . . . .
7  . . . . . 1 . . . .
11 . . . . . . . . . 1
@ assign: 1 1 1 1 1 1 1 1 1 1
@ contrasts:
$individual
[1] "contr.treatment"

> 
> ## Inverse additive relationship matrix
> ped2 <- pedigree(sire=ped$father, dam=ped$mother, label=ped$individual)
> TInv <- as(ped2, "sparseMatrix")
> DInv <- Diagonal(x=1/Dmat(ped2))
> AInv <- crossprod(sqrt(DInv) %*% TInv)
> 
> ## Variance ratio
> alpha <- sigma2e / sigma2a
> 
> ## Mixed Model Equations (MME)
> 
> ## ... Left-Hand Side (LHS) without pedigree prior
> (LHS0 <- rBind(cBind(crossprod(X),    crossprod(X, Z)),
+                cBind(crossprod(Z, X), crossprod(Z, Z))))
12 x 12 sparse Matrix of class "dgCMatrix"
   [[ suppressing 12 column names ‘(Intercept)’, ‘group2’, ‘individual1’ ... ]]
(Intercept)  7 3 . 2 1 1 1 1 . . . 1
group2       3 3 . . . 1 1 1 . . . .
individual1  . . . . . . . . . . . .
individual2  2 . . 2 . . . . . . . .
individual3  1 . . . 1 . . . . . . .
individual4  1 1 . . . 1 . . . . . .
individual5  1 1 . . . . 1 . . . . .
individual6  1 1 . . . . . 1 . . . .
individual7  . . . . . . . . . . . .
individual8  . . . . . . . . . . . .
individual9  . . . . . . . . . . . .
individual10 1 . . . . . . . . . . 1
> 
> ## ... Left-Hand Side (LHS) with pedigree prior
> round(
+   LHS <- rBind(cBind(crossprod(X),    crossprod(X, Z)),
+                cBind(crossprod(Z, X), crossprod(Z, Z) + AInv * alpha)), digits=1)
12 x 12 sparse Matrix of class "dgCMatrix"
   [[ suppressing 12 column names ‘(Intercept)’, ‘group2’, ‘individual1’ ... ]]
(Intercept)  7 3    .    2.0  1.0  1.0  1.0  1.0    .    .    .  1.0
group2       3 3    .    .    .    1.0  1.0  1.0    .    .    .    .
individual1  . .  0.5  0.2 -0.3    .    .    .    .    .    .    .
individual2  2 .  0.2  2.8 -0.2 -0.2    . -0.3    .    .    .    .
individual3  1 . -0.3 -0.2  2.0  0.2 -0.3 -0.3    .    .    .    .
individual4  1 1    . -0.2  0.2  1.6 -0.3    .    .    .    .    .
individual5  1 1    .    . -0.3 -0.3  2.1  0.4 -0.4 -0.4    .    .
individual6  1 1    . -0.3 -0.3    .  0.4  2.1 -0.4 -0.4    .    .
individual7  . .    .    .    .    . -0.4 -0.4  0.8    .    .    .
individual8  . .    .    .    .    . -0.4 -0.4    .  1.0  0.2 -0.4
individual9  . .    .    .    .    .    .    .    .  0.2  0.5 -0.4
individual10 1 .    .    .    .    .    .    .    . -0.4 -0.4  1.8
> 
> ## ... Right-Hand Side (RHS)
> (RHS <- rBind(crossprod(X, y),
+               crossprod(Z, y)))
12 x 1 Matrix of class "dgeMatrix"
             [,1]
(Intercept)   716
group2        300
individual1     0
individual2   209
individual3    98
individual4   101
individual5   106
individual6    93
individual7     0
individual8     0
individual9     0
individual10  109
> 
> ### --- Solutions ---
> 
> ## Solve
> LHSInv <- solve(LHS)
> sol <- LHSInv %*% RHS
> 
> ## Standard errors
> se <- diag(LHSInv) * sigma2e
> 
> ## Reliabilities
> r2 <- 1 - diag(LHSInv) * alpha
> 
> ## Accuracies
> r <- sqrt(r2)
Warning message:
In sqrt(r2) : NaNs produced
> 
> ## Printout
> cBind(sol, se, r, r2)
12 x 4 Matrix of class "dgeMatrix"
                          se         r         r2
 [1,] 104.76444809 1.901479 0.6051228  0.3661736
 [2,]  -4.65061921 1.376348 0.7356748  0.5412175
 [3,]  -2.62685846 2.432448 0.4349528  0.1891839
 [4,]  -0.77797260 1.983301 0.5821509  0.3388997
 [5,]  -4.32927399 1.979138 0.5833415  0.3402874
 [6,]   1.54997883 2.316720 0.4772421  0.2277600
 [7,]   3.18706240 2.604540 0.3630701  0.1318199
 [8,]  -5.07852788 2.508673 0.4046922  0.1637758
 [9,]  -0.94573274 3.459904       NaN -0.1533013
[10,]  -0.08765651 3.534636       NaN -0.1782121
[11,]   2.11218764 2.511203 0.4036487  0.1629323
[12,]   2.82742681 2.009539 0.5745901  0.3301538
> 
> proc.time()
   user  system elapsed
  3.350   0.070   3.425
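As a possible cross-check (just a sketch, not run here), the same animal model can also be fitted with the pedigreemm() function from the pedigreemm package; note that it estimates the variance components by REML instead of taking sigma2a and sigma2e as known, so its predictions will generally differ somewhat from the MME solutions above.

## Sketch of a cross-check with pedigreemm(); assumes dat and ped2 from the code above
fitMM <- pedigreemm(phenotype ~ group + (1 | individual),
                    data=dat, pedigree=list(individual=ped2))
summary(fitMM)          ## REML estimates of the variance components
ranef(fitMM)$individual ## predicted breeding values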
2012-04-04
Regression - covariate adjustment
Linear regression is one of the key concepts in statistics [wikipedia1, wikipedia2]. However, people often confuse the meaning of the parameters of linear regression - the intercept tells us the average value of y at x=0, while the slope tells us how much y changes on average when x increases by one unit - exactly as in a linear function, though we talk about averages here due to noise.
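As a tiny illustration of that interpretation, here is a simulated example with made-up values a = 10 for the intercept and b = 2 for the slope:

## Simulate y0 = 10 + 2*x0 + noise and read the parameters off the fit
set.seed(1)
x0 <- runif(n=100, min=0, max=5)
y0 <- 10 + 2 * x0 + rnorm(n=100, sd=1)
coef(lm(y0 ~ x0))
## The fitted (Intercept) is close to 10, the average value of y0 at x0=0,
## and the fitted slope is close to 2, the average change in y0 per unit of x0.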
Today a colleague got confused about the meaning of adjusting (shifting) a covariate (the x variable) and its effect on the parameter estimates. By shifting the x scale, we also shift the point at which the intercept is estimated. I made the following graph to demonstrate this point for a nested regression of y on x within a group factor with two levels. The R code to produce these plots is shown at the bottom.
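In symbols, shifting the covariate by a constant k (the code below uses k = 60) only re-expresses the intercept, while the slope is untouched:

E[y \mid x] = a + b\,x = \underbrace{(a + b\,k)}_{\text{new intercept}} + b\,(x - k)

With the coefficients printed in the code below this checks out: 3.06785 + 0.31967 * 60 ≈ 22.248, the intercept of fit2n, and 2.32448 - 0.09077 * 60 ≈ -3.122, its g2 coefficient, while the slopes 0.31967 and -0.09077 are identical in both parameterizations.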
Regression - covariate adjustment
library(package="MASS")

## Simulate a covariate and response for two groups with different means and slopes
x1 <- mvrnorm(n=100, mu=50, Sigma=20, empirical=TRUE)
x2 <- mvrnorm(n=100, mu=70, Sigma=20, empirical=TRUE)
mu1 <- mu2 <- 4
b1 <- 0.300
b2 <- 0.250
y1 <- mu1 + b1 * x1 + rnorm(n=100, sd=1)
y2 <- mu2 + b2 * x2 + rnorm(n=100, sd=1)

x  <- c(x1, x2)
xK <- x - 60                        ## covariate shifted by 60
y  <- c(y1, y2)
g  <- factor(rep(c(1, 2), each=100))

par(mfrow=c(2, 1), pty="m", bty="l")

## Nested regression of y on x within group (original covariate scale)
(fit1n <- lm(y ~ g + x + x:g))
## (Intercept)          g2           x        g2:x
##     3.06785     2.32448     0.31967    -0.09077
beta <- coef(fit1n)
plot(y ~ x, col=c("blue", "red")[g], ylim=c(0, max(y)), xlim=c(0, max(x)), pch=19, cex=0.25)
points(x=mean(x1), y=mean(y1), pch=19)
points(x=mean(x2), y=mean(y2), pch=19)
abline(v=c(mean(x1), mean(x2)), lty=2, col="gray")
abline(h=c(mean(y1), mean(y2)), lty=2, col="gray")
points(x=0, y=beta["(Intercept)"], pch=19, col="blue")
points(x=0, y=beta["(Intercept)"] + beta["g2"], pch=19, col="red")
z <- 0:max(x)
lines(y= beta["(Intercept)"]               +  beta["x"]                  * z, x=z, col="blue")
lines(y=(beta["(Intercept)"] + beta["g2"]) + (beta["x"] + beta["g2:x"]) * z, x=z, col="red")

## Same model with the shifted covariate xK = x - 60
(fit2n <- lm(y ~ g + xK + xK:g))
## (Intercept)          g2          xK       g2:xK
##    22.24824    -3.12153     0.31967    -0.09077
beta <- coef(fit2n)
plot(y ~ x, col=c("blue", "red")[g], ylim=c(0, max(y)), xlim=c(0, max(x)), pch=19, cex=0.25)
points(x=mean(x1), y=mean(y1), pch=19)
points(x=mean(x2), y=mean(y2), pch=19)
abline(v=c(mean(x1), mean(x2)), lty=2, col="gray")
abline(h=c(mean(y1), mean(y2)), lty=2, col="gray")
abline(v=60, lty=2, col="gray")
points(x=60, y=beta["(Intercept)"], pch=19, col="blue")
points(x=60, y=beta["(Intercept)"] + beta["g2"], pch=19, col="red")
z <- 0:max(x) - 60
lines(y= beta["(Intercept)"]               +  beta["xK"]                   * z, x=z + 60, col="blue")
lines(y=(beta["(Intercept)"] + beta["g2"]) + (beta["xK"] + beta["g2:xK"]) * z, x=z + 60, col="red")
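To convince yourself that the shift is only a reparameterization, you can compare the two fits directly (a small check assuming the objects fit1n and fit2n created above):

## The two parameterizations describe the same fitted model
all.equal(fitted(fit1n), fitted(fit2n)) ## TRUE (up to numerical tolerance)
## Only the intercept-related coefficients change with the shift; the slopes are identical
coef(fit1n)
coef(fit2n)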