
My recollection, assuming it's the same machine I'm thinking of, is that it wasn't reserved for our team; rather, we left a do-nothing job permanently allocated to it, in order to prevent some other poor sucker from getting their job scheduled on it. (Because we, through painful experience, were well aware the machine had hardware problems; but we had long since given up on convincing the responsible parties to take it out of the pool, since it passed all their internal tests every time we complained. I don't remember how long this situation existed before someone finally took it out back and shot it.)

Could be a different incident and a different machine, though. I'm sure this story happened more than once.



Maybe a different machine? I meant that it was not in one of the general-purpose clusters: the entire pool was dedicated and a random team couldn't request Borg quota in it. For years, though, half of the Oregon datacenter was special for one reason or another.

The infamous machine did go through repairs and part swaps many times, as you could see from its long and troubled hwops history.

The worst machines were the zombies with NICs bad enough to break Stubby RPCs, but still passing heartbeat checks. Or breaking connections only when (re)using specific ports. Fun times!



