Hacker News
Wade not in unknown C++ waters. Shift operators. (viva64.com)
18 points by AndreyKarpov on Aug 18, 2012 | hide | past | favorite | 8 comments


This article makes little mention of the fact that shifting a positive, signed integer can result in undefined behaviour if said shift "would" cause the "sign-bit" to be modified. I find that behaviour far more alarming than shifting negative numbers.

An interesting paper on a related subject:

https://wiki.engr.illinois.edu/download/attachments/15820390...

To entice you to read it, a quote:

To find the time bombs, we altered IOC’s overflow handler to return a random value from any integer operation whose behavior is undefined by the C or C++ standard. This creates a high probability that the application will break in an observable way if its execution actually depends on the results of an undefined operation. Perhaps amusingly, when operating in this mode, IOC is still a standards-conforming C or C++ compiler—the standard places no requirements on what happens to a program following the execution of an operation with undefined behavior.


C and its descendants have some really, really scary integer overflow/underflow and type promotion behaviour. As a lot of my code lately runs in the kernel, I'm now extremely paranoid about overflow/underflow bugs, and it's amazing how much innocuous-looking code can be a security issue. I'm amazed that even the newer languages like Go don't really tackle this problem. Signed integer overflow is at least not left undefined, but wrap-around semantics are just as good at hiding bugs.

I don't know what exactly the best solution is, but I'd probably default to arbitrary-precision integers which have a highly efficient implementation for native word-sized values and a safe, non-overflowing fallback that can still be made to be pretty fast. Repeated growth checks could be eliminated in many places by some basic static analysis. If you find your code would be faster without the safety, you can always selectively switch to unsafe semantics.

Basically, I think the current situation of overflow-by-default is a premature optimisation that's causing far too many bugs.


I don't have any data on the performance costs of using arbitrary precision integers in all code, but I'd like to see that if you know of any studies. I would have thought the overhead could be quite significant as you would have to keep a length variable for every single integer and check it every function call...

The use of 64-bit integers may significantly reduce the number of places where wrap-around can occur, but I think we (C, C++, Go, etc. developers) have to develop a sense of code smell for overflow situations, so that "innocuous-looking code" stands out. That, or static analysis.


64-bit integers don't really solve the bugs, they just make them slightly less probable. They basically don't help against malicious input in the general case.

The most famous example for this kind of bug is probably the binary search implementation in Java:

http://googleresearch.blogspot.com/2006/06/extra-extra-read-...

The interesting thing about this bug, and many like it, is that you don't need real arbitrary-precision arithmetic to avoid it. The result of the expression (a + b) / 2 can be proven to fit safely into 32 bits if a and b are known to fit into 32 bits as well - the compiler just needs to store the result of the intermediate expression (a + b) in 33 bits, or to transform it into a mathematically equivalent expression that has the same properties as the original with its 33-bit intermediate.

Of course, this can only be applied to a subset of algorithms - I suspect it quickly devolves into the halting problem in general. But maybe new languages could specify that such halting-problem cases need to be resolved manually (explicitly choose between arbitrary-precision arithmetic and truncation), while the simpler cases are deduced automatically. I guess that means in addition to uint32_t and uint64_t we'd probably need more specific types, such as "positive (nonzero) integer in the range 1 to 1 << 32". Which would be tedious if you had to specify it explicitly all the time, so you'd need good inference and solid support for generic functions.


While people like to talk about the "sign bit", I think it's easier to remember this as part of the generic "don't overflow integers" rule - which is how the part of the standard quoted in the article words it: in terms of whether the shifted number is representable.


Not really profound, but a good, quick way to think about clock cycles: a 1 MHz machine executes 1 clock cycle in 1 microsecond (i.e. 1 MHz is 1,000,000 Hz, and there are 1 million microseconds per second). So a reasonable machine, say 2 GHz, executes 2000 cycles per microsecond. So when the article says 124 clock cycles to execute a shift, that's really saying that on a 2 GHz machine the instruction will take about 124/2000 microseconds - which means that 16 shifts or so will eat up an entire microsecond of CPU time. In low-latency environments, that's bad. (Consider: a top-tier high-frequency trading firm does about 20 microseconds "round the diamond" - the time from receiving a market snapshot to the time that a decision is made and a trade is executed and put on the wire. Granted, they're not on 2 GHz machines, but these things add up.)

Also interesting, light travels about 1000 ft in a microsecond.


It is more complicated than this on modern processors, where the relationship between latency and throughput matters. An instruction may require 6 clock cycles (latency) to complete, but if three such operations can be in flight at once, the effective throughput is one operation every 2 clock cycles.

The Intel i7 has 6 operation execution ports per core, 3 of which can be used for basic integer operations. Shifts execute in a single clock cycle, but since there are 3 ALUs it is possible to execute 3 shifts per clock cycle.

There is a class of low-latency algorithm optimizations based on making sure that all execution ports are being used nearly every clock cycle. It is a special type of fixed, low-level parallelism that most software does not exploit very well by default. I've seen intentional and careful ALU saturation generate 2x performance gains for algorithms. The resulting code often does not look very different, since the work is mostly about slightly reorganizing highly localized dependencies. (This is also how hyperthreading works: execution ports left unused in a clock cycle are given to a second thread.)


I think the clock-cycle figures refer to a very old chip, which definitely cannot run at 2 GHz. I believe modern processors do a shift in a single cycle.

Also, modern compilers will replace multiplications by constant powers of two with shifts, so you can write the multiply and keep your code cleaner and more readable.




