Linear regression is one of the key concepts in statistics [wikipedia1, wikipedia2]. However, people are often confuse the meaning of parameters of linear regression - the intercept tells us the average value of y at x=0, while the slope tells us how much change of y can we expect on average when we change x for one unit - exactly the same as in the linear function, though we use averages here due to noise.
Today colleague got confused with the meaning of adjusting covariate (x variable) and the effect of parameter estimates. By shifting the x scale, we also shift the point at which intercept is estimated. I made the following graph to demonstrate this point in the case of nested regression of y on x within a group factor having two levels. R code to produce this plots is shown on bottom.
Regression - covariate adjustment
library(package="MASS") x1 <- mvrnorm(n=100, mu=50, Sigma=20, empirical=TRUE) x2 <- mvrnorm(n=100, mu=70, Sigma=20, empirical=TRUE) mu1 <- mu2 <- 4 b1 <- 0.300 b2 <- 0.250 y1 <- mu1 + b1 * x1 + rnorm(n=100, sd=1) y2 <- mu2 + b2 * x2 + rnorm(n=100, sd=1) x <- c(x1, x2) xK <- x - 60 y <- c(y1, y2) g <- factor(rep(c(1, 2), each=100)) par(mfrow=c(2, 1), pty="m", bty="l") (fit1n <- lm(y ~ g + x + x:g)) ## (Intercept) g2 x g2:x ## 3.06785 2.32448 0.31967 -0.09077 beta <- coef(fit1n) plot(y ~ x, col=c("blue", "red")[g], ylim=c(0, max(y)), xlim=c(0, max(x)), pch=19, cex=0.25) points(x=mean(x1), y=mean(y1), pch=19) points(x=mean(x2), y=mean(y2), pch=19) abline(v=c(mean(x1), mean(x2)), lty=2, col="gray") abline(h=c(mean(y1), mean(y2)), lty=2, col="gray") points(x=0, y=beta["(Intercept)"], pch=19, col="blue") points(x=0, y=beta["(Intercept)"] + beta["g2"], pch=19, col="red") z <- 0:max(x) lines(y= beta["(Intercept)"] + beta["x"] * z , x=z, col="blue") lines(y=(beta["(Intercept)"] + beta["g2"]) + (beta["x"] + beta["g2:x"]) * z, x=z, col="red") (fit2n <- lm(y ~ g + xK + xK:g)) ## (Intercept) g2 xK g2:xK ## 22.24824 -3.12153 0.31967 -0.09077 beta <- coef(fit2n) plot(y ~ x, col=c("blue", "red")[g], ylim=c(0, max(y)), xlim=c(0, max(x)), pch=19, cex=0.25) points(x=mean(x1), y=mean(y1), pch=19) points(x=mean(x2), y=mean(y2), pch=19) abline(v=c(mean(x1), mean(x2)), lty=2, col="gray") abline(h=c(mean(y1), mean(y2)), lty=2, col="gray") abline(v=60, lty=2, col="gray") points(x=60, y=beta["(Intercept)"], pch=19, col="blue") points(x=60, y=beta["(Intercept)"] + beta["g2"], pch=19, col="red") z <- 0:max(x) - 60 lines(y= beta["(Intercept)"] + beta["xK"] * z , x=z + 60, col="blue") lines(y=(beta["(Intercept)"] + beta["g2"]) + (beta["xK"] + beta["g2:xK"]) * z, x=z + 60, col="red")
5 comments:
Thanks for the post. Correct me if I'm wrong but are with the second model the intercept is 22 yet the lines still cross x=0 below y=5? I haven't had a chance to read the code in full but wondered if I've missed something.
Thanks, are your graphs wrong? The intercept for the second model is 22 yet the lines cross at roughly y=3 still? Haven't had a chance to fully read your code so might be missing something.
Thanks, are your graphs wrong? The intercept for the second model is 22 yet the lines cross at roughly y=3 still? Haven't had a chance to fully read your code so might be missing something.
Intercept is a value of y where x = 0. Since I shifted x by 60 xK=(x-60 we get larger intercept value at xK = 0 which is actually x = 60.
thank you very much boru
Post a Comment