
Not really profound, but a good, quick way to think about clock cycles: a 1 MHz machine executes 1 clock cycle in 1 microsecond (i.e. 1 MHz is 1,000,000 Hz, and there are 1 million microseconds per second). So a reasonable machine, say 2 GHz, executes 2000 cycles per microsecond. So when the article says 124 clock cycles to execute a shift, that's really saying that on a 2 GHz machine the instruction will take about 124/2000 microseconds (0.062 microseconds, or 62 ns), which means that 16 shifts or so will eat up an entire microsecond of CPU time. In low-latency environments, that's bad. (Consider: a top-tier high-frequency trading firm does about 20 microseconds "round the diamond", the time from receiving a market snapshot to the time that a decision is made and a trade is executed and put on the wire. Granted, they're not on 2 GHz machines, but these things add up.)
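For anyone who wants to play with the numbers, here is a minimal sketch of that arithmetic in C. The function names are made up for illustration, and it assumes the instruction runs back to back with nothing else overlapping it, which is the simple model used above.

```c
#include <assert.h>

/* Convert a cycle count to microseconds at a given clock rate in MHz.
   A 1 MHz clock completes one cycle per microsecond, so an f MHz clock
   completes f cycles per microsecond. */
double cycles_to_microseconds(double cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return cycles / clock_mhz;
}

/* How many back-to-back operations of a given cycle cost fit into one
   microsecond at the given clock rate (hypothetical helper). */
double ops_per_microsecond(double cycles_per_op, double clock_mhz)
{
    assert(cycles_per_op > 0.0);
    return clock_mhz / cycles_per_op;
}
```

Calling `cycles_to_microseconds(124, 2000)` gives the 124/2000 figure, and `ops_per_microsecond(124, 2000)` comes out a little over 16, matching the "16 shifts or so" estimate.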

Also interesting, light travels about 1000 ft in a microsecond.



It is more complicated than this on modern processors, because latency and throughput are no longer the same thing. An instruction may take 6 clock cycles to complete (latency), but if 3 independent instructions can be in flight at once, the effective cost is 2 clock cycles per operation.

The Intel i7 has 6 operation execution ports per core, 3 of which can be used for basic integer operations. Shifts execute in a single clock cycle, but since there are 3 ALUs it is possible to execute 3 shifts per clock cycle.

There is a class of low-latency algorithm optimizations based on making sure that all operation ports are being used nearly every clock cycle. It is a special type of fixed, low-level parallelism that most software does not do very well by default. I've seen intentional and careful ALU saturation generate 2x performance gains for algorithms. The resulting code often does not look very different, since it is about slightly reorganizing highly localized dependencies. (This is how hyperthreading works: unused execution ports in a clock cycle are given to a second thread to use.)
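A rough sketch of what "slightly reorganizing highly localized dependencies" can look like in C. This is only an illustration with made-up function names, not code from any real project. The first version has one dependency chain, so each add has to wait for the one before it. The second splits the work across three independent accumulators, which gives the core the chance to issue the adds on separate ALUs in the same cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One dependency chain: each add depends on the previous total. */
uint64_t sum_one_chain(const uint32_t *values, size_t n)
{
    assert(values != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* Three independent chains: a, b and c do not depend on each other,
   so the adds inside one loop iteration can overlap. */
uint64_t sum_three_chains(const uint32_t *values, size_t n)
{
    assert(values != NULL || n == 0);
    uint64_t a = 0, b = 0, c = 0;
    size_t i = 0;
    for (; i + 3 <= n; i += 3) {
        a += values[i];
        b += values[i + 1];
        c += values[i + 2];
    }
    /* Handle the 0 to 2 leftover elements. */
    for (; i < n; i++)
        a += values[i];
    return a + b + c;
}
```

Both functions return the same result. Whether the second is actually faster depends on the chip and the compiler (an optimizing compiler may unroll or vectorize the first one on its own), so treat any speedup as something to measure rather than assume.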


I think the clock cycles referred to a very old chip, which definitely cannot run at 2 GHz. I believe modern processors would do a shift in a single cycle.

Modern compilers will replace multiplies by constant powers of 2 with shifts, so you can write the multiply and keep your code cleaner and more readable.
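To make that concrete, here is a small C sketch (function names are hypothetical). For unsigned integers the multiply and the shift are the same operation, so the compiler is free to pick whichever is cheaper and you can write the one that states your intent.

```c
#include <assert.h>
#include <stdint.h>

/* Written for readability: the intent is "eight times x". */
uint32_t times_eight(uint32_t x)
{
    return x * 8u;
}

/* The hand-written shift: same result for every uint32_t value,
   including when the result wraps around modulo 2^32. */
uint32_t shift_left_three(uint32_t x)
{
    return x << 3;
}

/* General form: multiplying by 2^k is a left shift by k bits. */
uint32_t scale_by_power_of_two(uint32_t x, unsigned k)
{
    assert(k < 32); /* shifting by the full width or more is undefined in C */
    return x << k;
}
```

Compiling both of the first two functions with optimizations on and comparing the generated assembly is an easy way to check what your own compiler does.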



