Most programs aren't fizzbuzz. This is basically hand-optimizing a microbenchmark. While I'm sure compilers can always be better, I'm not sure this particular example generalizes.
But hand optimizing a few microbenchmarks like this tells us how far we might be able to improve compilers.
And from the looks of it, at least 100x is possible. It's a shame therefore that even a 1% performance increase in a compiler release is considered pretty noteworthy.
This is a very specific problem that lends itself to some fantastically CPU-friendly optimizations. Doing a single hash-map lookup here on the hot path would kill performance, and let's not even get started on any serious I/O (disk or network), as that would be multiple orders of magnitude slower. Not many problems are just "cat something extremely predictable into /dev/null".
And so what? That's what this code does, and so a compiler should be able to generate the optimal assembly. Why would a compiler need to consider something that the code in question is not doing?
How would the compiler know that? This code for example is only optimal if you are piping it somewhere instead of displaying on the terminal. There is no way, even in principle, for the compiler to know that this program is always run with stdout connected to a pipe.
Moreover, if you mean optimal in the absolute sense, pretty sure that is equivalent to solving the halting problem.
If what you're asking is: why can't computers take all surrounding fuzzy context into account and do precisely the right thing, the answer is: because we haven't invented strong AI yet.
Ah, sorry, I misunderstood you. I thought you were placing the emphasis on what the code was doing, rather than where it was being piped. But why do you think that most of the possible optimisations would only be relevant if the output were being piped to somewhere other than a terminal?
Also, why would making it absolutely optimal be "equivalent to solving the halting problem"? (Though either way, no, I'm not talking about producing 100.0000% optimal code - clearly there's plenty of daylight between the current degree of optimality and that one.)
> Ah, sorry, I misunderstood you. I thought you were placing the emphasis on what the code was doing, rather than where it was being piped. But why do you think that most of the possible optimisations would only be relevant if the output were being piped to somewhere other than a terminal?
One of the primary optimizations is using vmsplice to avoid memory copies. vmsplice only works if the file descriptor is a pipe.
> Also, why would making it absolutely optimal be "equivalent to solving the halting problem"?
Or more specifically: if a program always halts (without producing output), then the best optimization would be to have it halt immediately; but to know that, you would have to have solved the halting problem.
You're right, of course; absolutes are a bit of a distraction, and we just need something good for 99% of programs. However, that's still really hard, and an optimization applied in the wrong context will slow a program down. There's still lots of room for compilers to grow, but determining whether the kinds of optimizations used in this example even apply is probably at best NP-complete (I don't have any evidence for that, though).
Ah, thank you for a really interesting reply. I appreciate it.
> One of the primary optimizations is using vmsplice to avoid memory copies. vmsplice only works if the file descriptor is a pipe.
Thanks - I hadn't actually read the code in question. I now realise I should have. That's a totally fair point.
That being said, in the general case, there are clearly far more optimisations to be made. Do you think GCC-from-C is really theoretically limited to producing assembly code which is two thirds as performant as handwritten assembly?
Ahhh, I getchu. Yes, that technically means that a compiler can't produce 100% optimal code in 100% of cases. However, it doesn't mean it can't produce 100% optimal code in 99.99999% of cases, or 99.99999% optimal code in 100% of cases, as you say yourself.
This is more of an academic issue about formal properties of a compiler, rather than a practical issue with the optimisations it could make in the overwhelming majority of real-world cases.
That being said, I'm very sympathetic to your point. The (mostly toy) compilers I've written have been extremely weak in the optimisation department. I have a lot of respect for people who are able to do that, and a lot of respect for the challenge it poses.
(I do hope that the growth of big data, and developers' increasing indifference to privacy, might have the salutary effect that compiler authors gain access to more information about the programs their compilers are compiling, out in the real world ... which, at least going by my own experience, would be invaluable in making decisions about particular optimisations. Sadly 99% of optimisations are not peephole optimisations.)
Yeah you could optimize GCC to work really well for those problems, there's a great ROI for the "write something trivial really quickly into /dev/null" class of problems.
Let us know how this goes if you embark on this adventure.
If you genuinely understood me as saying that, then I should clarify, to be absolutely foolproof:
That is not what I'm saying. I'm talking about having a compiler which could produce more optimal code in any given case, or most given cases. I obviously am not suggesting that compilers optimise this and only this case.
> I'm talking about having a compiler which could produce more optimal code in any given case, or most given cases
Funny coincidence, that's exactly what the vast majority of compiler code is for, with tens (or hundreds) of thousands of man-years of work spent on that.
My point, again, is about having a compiler that can produce more optimal code than existing compilers (and whether there's a theoretical upper bound preventing them from more closely approaching the handwritten assembly in this case).
I'm not quite sure what else you could have thought the word 'more' was referring to?
But nearly all problems boil down to nearly zero actual work...
Take booting up a computer, for example. All that actually needs to be done is to display the login screen. And we know from 60 fps video playback that the hardware is capable of displaying a frame inside 16 milliseconds.
You can create an operating system that just displays a static image within milliseconds yourself (shouldn't take you more than a few weeks), it would work even when compiled at -O0.