2008-09-09

Excel 2007 for statistics?

Microsoft Excel is probably one of the most widely used programs for performing various "calculations". I agree that spreadsheets are very useful, and for some tasks MS Excel just excels. However, MS Excel has problems when it comes to statistics. There have been quite a few reports about the inaccuracy of its statistical routines. For the 2007 version, McCullough and Heiser (2008) wrote this in the abstract:


Excel 2007, like its predecessors, fails a standard set of intermediate-level accuracy tests in three areas: statistical distributions, random number generation, and estimation. Additional errors in specific Excel procedures are discussed. Microsoft’s continuing inability to correctly fix errors is discussed. No statistical procedure in Excel should be used until Microsoft documents that the procedure is correct; it is not safe to assume that Microsoft Excel’s statistical procedures give the correct answer. Persons who wish to conduct statistical analyses should use some other package.


MS Excel is not the only spreadsheet software. Yalta (2008) compared MS Excel with some others and wrote this in the abstract:

We provide an assessment of the statistical distributions in Microsoft® Excel versions 97 through 2007 along with two competing spreadsheet programs, namely Gnumeric 1.7.11 and OpenOffice.org Calc 2.3.0. We find that the accuracy of various statistical functions in Excel 2007 range from unacceptably bad to acceptable but significantly inferior in comparison to alternative implementations. In particular, for the binomial, Poisson, inverse standard normal, inverse beta, inverse student’s t, and inverse F distributions, it is possible to obtain results with zero accurate digits as shown with numerical examples.


McCullough (2008) also contributed this:

Microsoft attempted to implement the Wichmann–Hill RNG in Excel 2003 and failed; it did not just produce numbers between zero and unity, it would also produce negative numbers. Microsoft issued a patch that allegedly fixed the problem so that the patched Excel 2003 and Excel 2007 now implement the Wichmann–Hill RNG, at least according to Microsoft. We show that whatever RNG it is that Microsoft has implemented in these versions of Excel, it is not the Wichmann–Hill RNG. Microsoft has now failed twice to implement the dozen lines of code that define the Wichmann–Hill RNG.
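
To give a feeling for how small those "dozen lines" are, here is a minimal Python sketch of the Wichmann–Hill generator as it is usually described (algorithm AS 183): three small linear congruential generators whose scaled outputs are summed and reduced modulo 1. The function name `wichmann_hill` and the seed arguments are my own choices for illustration, and this is not the code from the paper or from Excel.

```python
def wichmann_hill(seed1, seed2, seed3):
    """Yield uniform floats in [0, 1) using the Wichmann-Hill (AS 183) recurrences.

    The function name and argument names are hypothetical, chosen for this sketch.
    Each seed is assumed to be an integer between 1 and 30000.
    """
    ix, iy, iz = seed1, seed2, seed3
    while True:
        # Three independent multiplicative congruential generators.
        ix = (171 * ix) % 30269
        iy = (172 * iy) % 30307
        iz = (170 * iz) % 30323
        # Sum the three scaled states and keep only the fractional part.
        yield (ix / 30269 + iy / 30307 + iz / 30323) % 1.0


if __name__ == "__main__":
    gen = wichmann_hill(1, 2, 3)
    for _ in range(5):
        print(next(gen))
```

Because the last step takes the fractional part of a sum of non-negative terms, a generator written this way cannot return a negative number, which is exactly the kind of failure reported for Excel 2003.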


Additionally, the default graphs made in MS Excel are not really good ones. It depends a lot on the user to make a good graph. I have seen a lot of so-called chartjunk coming from Excel. Of course the blame is on the users, but software should help us make things easier and not harder! Su (2008) wrote this in the abstract:

The purpose of default settings in a graphic tool is to make it easy to produce good graphics that accord with the principles of statistical graphics... If the defaults do not embody these principles, then the only way to produce good graphics is to be sufficiently familiar with the principles of statistical graphics. This paper shows that Excel graphics defaults do not embody the appropriate principles. Users who want to use Excel are advised to know the principles of good graphics well enough so that they can choose the appropriate options to override the defaults. Microsoft® should overhaul the Excel graphics engine so that its defaults embody the principles of statistical graphics and make it easy for non-experts to produce good graphs.


After all these anti-Microsoft abstracts, I really, really start to wonder what we get for the money we spend on the software.
