*Perhaps that's because I'm a programmer, and Python is a general purpose progra...

_delirium · on Feb 22, 2011

There are GUI statistics apps for people who just want the common case, Dropbox-style: packages like Weka for data mining / predictive statistics, SPSS for descriptive statistics, and a dozen other such things.

The statisticians who choose to use a programming language like R or Python typically do it because they actually do want a programming language. I mean, that's why Bell Labs statisticians invented S (R's predecessor) to begin with.

ltjohnson · on Feb 22, 2011

I am a statistician who does both research and applied work.

I use R for three reasons: (1) It's Free Software; (2) It's a programming language; (3) Other statisticians use it so it's easier for me to collaborate.

There are the usual supporting arguments for (1). For (2), I've only used SAS a little bit, and it was extremely unpleasant to use for non-built-in stuff, which makes research harder for no good reason. For (3), I have nothing against Python but most other statisticians don't use it. If I want to share my work in R, it's easy (statisticians know how to install R packages). If I want to share my work in Python, I first have to teach [most] other statisticians how to use Python. There's nothing wrong with that, but why raise the start-up cost for them?

tl;dr I conjecture that most statisticians don't want what the author is suggesting. Also, there are plenty of companies that are trying to do what the author is asking for, but most of them seem to miss the desired sweet spot, or charge lots of money, or both. I haven't taken a survey of the available software in quite some time.

crocowhile · on Feb 22, 2011

Can you recommend a book to get started with R?

ltjohnson · on Feb 22, 2011

No, but I can give some suggestions. It would help to know what you want to do.

First of all, you need to decide if you want a language reference, or an application guide, as R books fall into those two categories.

If you have a specific type of work in mind (bio-informatics, data mining, data visualization, ...) I'd say to find a book that focuses on that topic. I haven't looked in a while, but I haven't seen a general R book that I like, so anything I suggest there would be guessing on my part.

There are plenty of good references on the web. I'd start by looking at the material available from the R web site:

R's core manuals [1] are typically correct and reasonable to use. The "Introduction to R" guide will get you up to speed fairly well if you already know another programming language. There is also the contributed documentation [2]. I haven't gone through these, so I can't say much about them, or promise that they are up-to-date. I suspect not, as R develops rapidly. The one reference I can recommend highly is "The R Inferno" by Patrick Burns [3]. This is not a starter guide, but something you read after one. It gives excellent advice on avoiding common pitfalls in R.

[1] http://cran.r-project.org/manuals.html

[2] http://cran.r-project.org/other-docs.html

[3] http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

crocowhile · on Feb 22, 2011

Thanks. I do biology with a limited amount of data and my needs are very basic. Here is some software I wrote to do sleep analysis in Drosophila: http://www.pysolo.net

So far I could satisfy most of my statistics needs with the functions in numpy and scipy, but occasionally I need to do something slightly more fancy, and R, I guess, is the way to go.

ltjohnson · on Feb 23, 2011

Possibly. R is really great at doing "fancy" statistical analyses. It's very lousy at doing things like text manipulation. When I have a project that needs some text manipulation on the front end, I frequently use other tools (Python, vi, sed, ...) on the front end to beat text data into a nicer form for R. I couldn't say without knowing more about your project.
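The front-end cleanup described above could look something like the following sketch in Python. The input format, the `messy_lines_to_csv` name, and the sample lines are all invented for illustration; the only assumption on the R side is that the result gets loaded with `read.csv`.

```python
import csv
import io
import re

def messy_lines_to_csv(lines):
    """Convert lines like 'fly_01 ;  sleep= 412min' into CSV text.

    The input format is a made-up example, not from any real project.
    """
    pattern = re.compile(r"^\s*(\w+)\s*;\s*sleep\s*=\s*(\d+)\s*min\s*$")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "sleep_min"])  # header row for read.csv
    for line in lines:
        match = pattern.match(line)
        if match is None:
            continue  # skip blank or malformed lines
        writer.writerow([match.group(1), int(match.group(2))])
    return out.getvalue()

# Hypothetical messy input, including a blank and a malformed line.
raw = ["fly_01 ;  sleep= 412min", "", "fly_02;sleep=398 min", "garbage"]
csv_text = messy_lines_to_csv(raw)
```

Silently skipping malformed lines is a choice made for brevity here; for real data it is probably safer to log or count them.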

stevenbedrick · on Feb 22, 2011

I always seem to come back to "Introductory Statistics With R".[1] It gives a lot of examples of how to do "the day-to-day stuff". Also, since, as the title suggests, the statistical contents are mostly (very) introductory in nature, it's really easy for me as a reader to decipher what's going on in each example: it's easy to tell which parts are specific to the example itself and which parts are generic to R, if that makes any sense.

[1] http://www.powells.com/biblio/65-9780387790534-0

brianto2010 · on Feb 22, 2011

Here's a site I always go to for reference: http://statmethods.net/

bennylope · on Feb 22, 2011

If you really want a book I would recommend "Data Analysis and Graphics Using R" by Maindonald and Braun, http://books.google.com/books?id=d7OeVD6SKBsC.

kenjackson · on Feb 22, 2011

Right. I wasn't saying that such packages don't exist; of course they do. I was pointing out that the reason a programming language looks good to a programmer and not a statistician is due to domain expertise. And of course the common trap programmers fall into is assuming the domain is programming.

And don't lump R in with Python. Any good statistician would have your neck. You mention S, but again S doesn't look anything like Python either.

masklinn · on Feb 22, 2011

I only see him "lumping R in with Python" in that they're both full-blown programming languages and TFAA apparently hates them both because they're programming languages.

_delirium is merely pointing out that there are push-button packages for statistics, and that statisticians using programming languages (be they statistics-oriented or not) usually do so because they want to or because they need to (as the push-button stuff is not sufficient for their needs, for instance).

jules · on Feb 22, 2011

I'm pretty sure that Python makes a lot more sense to mathematicians than the special purpose syntax of SAS. List comprehensions: mathematicians use set comprehensions all the time. First class functions: same.
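To make the comparison concrete, here is a minimal sketch; all names in it are made up for illustration. The set comprehension reads much like set-builder notation, {x² : x ∈ S, x even}, and `compose` treats functions as ordinary values, the way f ∘ g does on paper.

```python
# Illustrative only: the names below are hypothetical, not from any library.
S = range(10)

# Set-builder notation {x^2 : x in S, x even} as a set comprehension.
even_squares = {x * x for x in S if x % 2 == 0}

# The list-comprehension form keeps order (and would keep duplicates).
even_squares_list = [x * x for x in S if x % 2 == 0]

def compose(f, g):
    """Return the function x -> f(g(x)), i.e. f composed with g."""
    return lambda x: f(g(x))

def double(x):
    return 2 * x

def increment(x):
    return x + 1

# Functions are passed around and returned like any other value.
double_then_increment = compose(increment, double)
```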

If you just need graphs and pivot tables, use some GUI tool.

scott_s · on Feb 22, 2011

That looks close to as simple as possible, if you assume Python is to be used. My point about R was that even in a language designed for statistics, I saw dependence on common programming concepts.

zzleeper · on Feb 22, 2011

It reminds me of Stata, which has a somewhat better syntax than SAS.

(E.g., the weird SAS code gets replaced by something as simple as tabulate x y.)
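For comparison, a two-way frequency table of the kind that tabulate command gives can be sketched in plain Python. This is only an illustration: the `crosstab` helper and the sample data are invented here, and it makes no attempt to reproduce Stata's output format.

```python
from collections import Counter

def crosstab(xs, ys):
    """Count how often each (x, y) pair occurs, as a nested dict.

    Hypothetical helper: rows are the distinct x values, columns the
    distinct y values, and cells are joint frequencies.
    """
    counts = Counter(zip(xs, ys))
    rows = sorted(set(xs))
    cols = sorted(set(ys))
    # Counter returns 0 for pairs that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Made-up sample data.
x = ["a", "a", "b", "b", "b"]
y = ["yes", "no", "yes", "yes", "no"]
table = crosstab(x, y)
```

It is clearly more typing than the one-liner, which is rather the point being made about syntax.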