That would make sense if Heroku didn't know which requests were Ruby requests and which weren't. But it seems like they did, if only because the set of VIPs (virtual IPs) landing on the router could be tagged as 'for Ruby' or 'not Ruby', and the router could then pick the appropriate routing algorithm.
Understand that keeping millisecond-accurate state on 10 machines is doable, on 100 machines it's hard, and on a few thousand machines? It really starts to break down. One way I've seen that done is that on ingress a request is wrapped in a message for server X, which is taken out of the 'free' pool, and then when server X returns the answer back through the router it gets added back into the free pool. But the next-order effect is that list insertion and removal have different sorts of behaviors. If you push freed servers onto the end of the list and pop them from the front (a round robin approach) you get good distribution, but you sometimes send things 'far' away when they could be served locally. If you push and pop things from the front you get some really hot servers and some really cold servers. Early on Google played some games which were designed to maximize the use of available network backbone bandwidth (it's always oversubscribed from the server to the 'net'). Like any of the more interesting problems, it starts off easy and then gets harder and harder.
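To make the hot/cold effect concrete, here is a rough Python sketch of the free pool idea. Everything in it is made up for illustration (the `simulate` function, the policy names "fifo" and "lifo", one request per tick, a fixed service time), and it ignores locality and the distributed state problem entirely. It only shows how the choice of which end freed servers go back onto changes the distribution.

```python
from collections import Counter, deque


def simulate(policy, n_servers=8, n_requests=80, service_ticks=3):
    """Count requests served per server under a given free-pool policy.

    Toy model (an assumption, not any real router): one request arrives
    per tick and every request takes exactly `service_ticks` ticks.
    """
    free = deque(range(n_servers))  # ids of idle servers
    busy = deque()                  # (finish_tick, server_id), oldest first
    served = Counter()

    for tick in range(n_requests):
        # Servers that have finished go back into the free pool.
        while busy and busy[0][0] <= tick:
            _, server = busy.popleft()
            if policy == "fifo":
                free.append(server)      # back of the list: round robin
            else:
                free.appendleft(server)  # front of the list: stack-like
        # Requests are always handed to whatever sits at the front.
        server = free.popleft()
        served[server] += 1
        busy.append((tick + service_ticks, server))

    return [served[s] for s in range(n_servers)]


print("fifo:", simulate("fifo"))
print("lifo:", simulate("lifo"))
```

With these toy numbers the "fifo" pool spreads the load evenly over all eight servers, while the "lifo" pool keeps reusing the same three and never touches the other five. A real router would also have to deal with variable service times and with keeping the pool consistent across many machines, which is where it gets hard.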