2008-07-29

PROC MIXED vs. lmer()

I am using R for a few years now. I like it a lot. Previously I used SAS (notably its modules SAS/BASE, SAS/STAT, SAS/IML, ...), because that is the statistical package of choice at my department. I had a hard time when I switched to R, mainly because I was thinking in the SAS way. However, I made a switch because I liked Rs' open-source nature and its vibrant community. Up to now I did not want to make any comparisons between them. I simply use R because I like it and get along well with using it. Recently I helped a student to fit a quite big model. She used SAS and PROC MIXED failed due to "out of memory". I was surprised, since SAS is known for its great stability and performance with big datasets. I was almoast sure that function lmer() in lme4 package in R will fail also, but could not resist to try it out. To my surprise, the model was fitted without problems. Perhaps I could tweak SAS to use more memory, but I did not invest time to study that option. Maybe new versions of SAS would perform better.


I used the following code in SAS 8.02:


proc mixed data=podatkiratio;
class genotype litter eage sex type year month herd hys;
model bw = genotype litter eage sex type year month;
random herd hys;
run;
quit;



This is the code for R 2.6.0, lme4 0.99875-9:

lmer(bw ~ genotype + litter + eage + sex + type +
year + month +
(1|herd) + (1|hys),
data=podatki)


And the results, which include also the description of the model size:

Linear mixed-effects model fit by REML
Formula: bw ~ genotype + litter + eage + sex + type + year + month + (1|herd) (1|hys)
Data: podatki
AIC BIC logLik MLdeviance REMLdeviance
152403 152899 -76149 152011 152297
Random effects:
Groups Name Variance Std.Dev.
herd (Intercept) 0.313 0.559
hys (Intercept) 0.147 0.383
Residual 0.297 0.545
number of obs: 85035, groups: herd, 346; hys, 9653

Fixed effects:
...
## genotype, 13 levels
## litter, 3 levels
## eage, 9 levels
## sex, 2 levels
## type, 2 levels
## year, 16 levels
## month, 12 levels


I am using SAS and R under Windows XP, running on Intel(R) Core(TM)2 CPU 6300 @ 2x1.86GHz with 2 GB RAM.

11 comments:

Anonymous said...

Gregor,

Thanks for the nice example.

I share your preference for R.

I have used SAS for 25 years. It has marvelous capabilities but in my opinion it is essentially is a work-horse production software. It is good for automating routine analyses that are done many times.

R forces me to think carefully about the assumptions and logic behind an analysis. When I take the time to try an analysis in R (e.g. using lmer) I always learn something new about statistical methodology or concepts. When I use SAS I learn nothing except maybe some new SAS idiosyncracy.

Dave LeBlond
Nonclinical Statistician
Abbott Pharmaceutical
North Chicago, IL, USA

Gregor Gorjanc said...

Dave, I fully agree with all the points you stated. Amen!

Alex said...

Hi Dave,

A plea for help:

I'm trying to fit the very simple SAS model:

proc mixed method=reml;
class plot day
model tracks= ;
random plot day;
run;

Using the R code:


lmer(tracks ~ (1|day)+(1|plot))

#(day and plot specified as.factors and tracks as.numeric)


Which seems to work, but gives quite different variance estimates from SAS. ( That is what I am after, the 3 variance components, day, plot and residual).


Any tips?! Thanks!

Alex.

Alex said...

P.S. You can contact me at alex@diment.org

if you don't want to put it on the blog comment...

Thanks!

Gregor Gorjanc said...

Alex,

I would use exactly the same code as you did. I suggest you verify that the data is the same - compute the overall and group summaries (nobs, mean, sd, ...) in SAS and R.

Doc Bush said...

Hi Alex,

In R using the lmer function, did you specify REML=TRUE? I don't know if it is the default.

Gregor Gorjanc said...

I am sure that it is default in recent versions. Alex did not mention the version of lme4 package.

Alex said...

OK. I'm still struggling with this one after a year!
I've had several people confirm my results in both R and SAS. The code for SAS is as listed above.

The data is here: http://gist.github.com/297797
and R script here: http://gist.github.com/297801

if anyone (Gregor?!) wants a go at solving this mystery...

Gregor Gorjanc said...

In an offline e-mail exchange Douglas Bates reproduced the results and it appeared that the difference in behaviour between SAS and R was in the default staring values, which lead to one or another solution.

Cécile said...

Hi Gregor,

I'm a french PhD student in statistic models, I'm using R for 2 years, and I also share your preference. I've already used mixed models in R, with specific packages for genetic as pedigreemm (wich use lme4), kinship (wich use nlme). So I had compared lme4 and nlme for mixed models in R and I prefer the Douglas Bates' package lme4, more recent, more quick, better documentation ...

But now, I have to traduce a SAS code in R code to try to soluce a problem of convergence, and I don't understand well the SAS code, and how to obtain the same model with R.
So I ask your help :

We have a few data :
Vache = cow
Période = replication
Trait. = traitement
PL = Milk production

In this experiment, only 3 cows for the first replication, and 2 cows for the second replication (so I think we have a problem of unbalanced data ?)

Vache Période Trait. PL
1 7427 1 Témoin 18.25
2 7461 1 Témoin 17.80
3 7528 1 Témoin 16.15
4 7427 1 36h 32.00
5 7461 1 36h 24.45
6 7528 1 36h 22.40
7 7427 2 Témoin 16.90
8 7461 2 Témoin 16.50
10 7427 2 36h 30.60
11 7461 2 36h 32.20


For the variable PL, the SAS code work and give this result :

proc mixed data=...;
class periode vache trait;
model PL = periode trait;
repeated periode / type=cs sub=vache;
lsmeans trait / pdiff;
run;

LSMeans :
Trait 1 17.3694 SE=1.4655 DF=2 t-value=11.85
Trait 2 28.5794 SE=1.4655 DF=2 t-value=19.50

Type 3 test of fixed effect
periode numDF=1 DenDF=1 F-value=1.14 p-value = 0.5
trait numDF=1 DenDF=1 F-value=28.69 p-value = 0.03


So I try different models in lmer() to find the same results, but neither model gives me what I want.

model1=lmer(PL~Trait. +Période + 1|Vache , data=lait)

model2=lmer(PL~ Trait. + Période + Période|Vache , data=lait)

model3=lmer(PL~ Trait. + Période + (1+Période|Vache) + (0+Période|Vache) , data=lait)

model4=lmer(PL~ Trait. + (1+Période|Vache) + (0+Période|Vache) , data=lait)

model5=lmer(PL~ Trait. + Période + (1|Vache) + (0+Période|Vache) , data=lait)


I don't understand how work "repeated" in SAS, I have read the guideline of SAS about this but I don't find an example similar of mine.

I've seen that you know how traduce SAS mixed model in R model, so have you got an idea to soluce my problem?

Thanks for your help and for your very interesting blog.

Best,

Cécile SAUDER
PhD student in statistics
INRA Rennes, France

Gregor Gorjanc said...

To be honest I am also confused with SAS covariance structures regarding repeated data. However, I believe SAS has a richer set of covariance structures. What is the meaning of type=cs? I used to know this:( The key to work your problem out is to get the same covariance structures in both programs. Compare what kind of covariance estimates (structure) you get in SAS and in R.