Perhaps that's because I'm a programmer, and Python is a general-purpose programming language.
Exactly. You shouldn't have to be a programmer to do statistics, just like you shouldn't have to be a network engineer to share files. What if Dropbox had stuff in there about HTTP, ports, levels of service, bandwidth, etc.? You'd probably say, "Great! I always wanted to specify that Dropbox use SSL4.7 draft B over CDMA EvoX.1 -- who wouldn't?"
When you're designing a DSL, make it as simple as possible. And if you have time, in v2, give it hooks to break out and do crazy stuff... but the 90% case should be simple as pi.
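To make the "simple common case, escape hatch in v2" idea concrete, here is a minimal Python sketch. The `describe` function and its `extra` hook are hypothetical, invented for illustration; they are not taken from any real statistics package. The default call needs no options at all, and the optional callable is the "break out and do crazy stuff" hook.

```python
from statistics import mean, median, stdev
from typing import Callable, Iterable, Optional


def describe(values: Iterable[float],
             extra: Optional[Callable[[list], dict]] = None) -> dict:
    """Hypothetical DSL-style entry point: the 90% case takes just the data."""
    data = list(values)
    # The common case: a fixed, sensible set of summaries with no knobs to turn.
    summary = {
        "n": len(data),
        "mean": mean(data),
        "median": median(data),
        "stdev": stdev(data),  # sample standard deviation (n - 1 denominator)
    }
    # The escape hatch: a caller-supplied function that can add any extra results.
    if extra is not None:
        summary.update(extra(data))
    return summary


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
print(describe([1, 2, 3], extra=lambda d: {"range": max(d) - min(d)}))
```

The design choice is that nothing about the hook leaks into the simple call: a user who never needs it never sees it.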
There are GUI statistics apps for people who just want the common case, Dropbox-style: packages like Weka for data mining / predictive statistics, SPSS for descriptive statistics, and a dozen other such things.
The statisticians who choose to use a programming language like R or Python typically do it because they actually do want a programming language. I mean, that's why Bell Labs statisticians invented S (R's predecessor) to begin with.
I am a statistician that does both research and applied work.
I use R for three reasons: (1) It's Free Software; (2) It's a programming language; (3) Other statisticians use it so it's easier for me to collaborate.
There are the usual supporting arguments for (1). For (2): I've only used SAS a little, and it was extremely unpleasant to use for anything that isn't built in, which makes research harder for no good reason. For (3): I have nothing against Python, but most other statisticians don't use it. If I want to share my work in R, it's easy (statisticians know how to install R packages). If I want to share my work in Python, I first have to teach [most] other statisticians how to use Python. There's nothing wrong with that, but why raise the start-up cost for them?
tl;dr I conjecture that most statisticians don't want what the author is suggesting. Also, there are plenty of companies that are trying to do what the author is asking for, but most of them seem to miss the desired sweet spot, or charge lots of money, or both. I haven't taken a survey of the available software in quite some time.
No, but I can give some suggestions. It would help to know what you want to do.
First of all, you need to decide if you want a language reference, or an application guide, as R books fall into those two categories.
If you have a specific type of work in mind (bioinformatics, data mining, data visualization, ...), I'd say find a book that focuses on that topic. I haven't looked in a while, but I haven't seen a general R book that I like, so anything I suggested there would be a guess on my part.
There are plenty of good references on the web. I'd start by looking at the material available from the R web site:
R's core manuals [1] are typically correct and reasonable to use. The "Introduction to R" guide will get you up to speed fairly well if you already know another programming language. There is also the contributed documentation [2]. I haven't gone through these, so I can't say much about them, or promise that they are up-to-date. I suspect not, as R develops rapidly. The one reference I can recommend highly is "The R Inferno" by Patrick Burns [3]. This is not a starter guide, but something you read after one. It gives excellent advice on avoiding common pitfalls in R.
Thanks. I do biology with a limited amount of data, and my needs are very basic. Here is some software I wrote to do sleep analysis in Drosophila: http://www.pysolo.net
So far I have been able to cover most of my statistics needs with the functions in numpy and scipy, but occasionally I need to do something slightly fancier, and R, I guess, is the way to go.
Possibly. R is really great at doing "fancy" statistical analyses. It's very lousy at things like text manipulation. When a project needs some text manipulation up front, I frequently use other tools (Python, vi, sed, ...) to beat the text data into a nicer form for R. I couldn't say more without knowing more about your project.
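As an illustration of that kind of front-end cleanup, here is a minimal Python sketch that pulls records out of free-form text and emits tidy CSV for R. The raw line format (`fly 12 ; day 3 ; sleep=412 min`) and the column names are invented for this example; they are not pysolo's actual output, and a real file would need its own pattern.

```python
import csv
import io
import re

# Hypothetical raw format, e.g. "fly 12 ; day 3 ; sleep=412 min" (case and spacing vary).
LINE_RE = re.compile(
    r"fly\s*(\d+)\s*;\s*day\s*(\d+)\s*;\s*sleep\s*=\s*(\d+)\s*min",
    re.IGNORECASE,
)


def text_to_csv(raw: str) -> str:
    """Extract (fly, day, sleep_min) records and return them as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["fly", "day", "sleep_min"])  # header row for R
    for line in raw.splitlines():
        match = LINE_RE.search(line)
        if match:  # skip comments, blank lines, and anything that doesn't parse
            writer.writerow(match.groups())
    return out.getvalue()


raw = "# exported data\nfly 12 ; day 3 ; sleep=412 min\n\nFLY 7;day 1;sleep = 30 min\n"
print(text_to_csv(raw))
```

Written to a file, the result can be loaded in R with `read.csv()`, leaving R to do the part it is good at.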
I always seem to come back to "Introductory Statistics With R".[1] It gives a lot of examples of how to do "the day-to-day stuff". Also, since the statistical content is mostly (very) introductory, as the title suggests, it's really easy for me as a reader to decipher what's going on in each example; it's easy to tell which parts are specific to the example itself and which parts are generic to R, if that makes any sense.
Right. I wasn't saying that such packages don't exist; of course they do. I was pointing out that the reason a programming language looks good to a programmer and not to a statistician is due to domain expertise. And of course the common trap programmers fall into is assuming the domain is programming.
And don't lump R in with Python. Any good statistician would have your neck. You mention S, but again, S doesn't look anything like Python either.
I only see him "lumping R in with Python" in that they're both full-blown programming languages and TFAA apparently hates them both because they're programming languages.
_delirium is merely pointing out that there are push-button packages for statistics, and that statisticians who use programming languages (be they statistics-oriented or not) usually do so because they want to or because they need to (as the push-button stuff is not sufficient for their needs, for instance).
I'm pretty sure that Python makes a lot more sense to mathematicians than the special-purpose syntax of SAS. List comprehensions: mathematicians use set comprehensions all the time. First-class functions: same.
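A small sketch of that correspondence in Python: set-builder notation maps almost symbol for symbol onto a set comprehension, and first-class functions let you treat operations, including statistical summaries, as ordinary values. The names `compose`, `summaries`, and `sample` are illustrative only.

```python
from statistics import mean, median

# Set-builder notation {x^2 : x in S, x odd} as a Python set comprehension.
S = {1, 2, 3, 4, 5}
odd_squares = {x**2 for x in S if x % 2 == 1}  # {1, 9, 25}


# First-class functions: functions can be passed in and returned like any value,
# e.g. composition (f o g)(x) = f(g(x)).
def compose(f, g):
    return lambda x: f(g(x))


square_after_increment = compose(lambda x: x**2, lambda x: x + 1)

# The same idea applied to statistics: a table of summary functions mapped over one sample.
summaries = {"mean": mean, "median": median, "max": max}
sample = [3, 1, 4, 1, 5]
results = {name: f(sample) for name, f in summaries.items()}
print(odd_squares, square_after_increment(3), results)
```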
If you just need graphs and pivot tables, use some GUI tool.
That looks close to as simple as possible, if you assume Python is to be used. My point about R was that even in a language designed for statistics, I saw dependence on common programming concepts.