Theoretically, yes. Practically, I think that’s a “sufficiently smart compiler” class of problem, insanely hard to solve. That's especially true because WASM is usually JIT-compiled, and a JIT simply doesn’t have time for expensive optimizations.
Integer SIMD is weird on AMD64. Even state-of-the-art C++ compilers fail to emit optimal code for rather simple use cases. A trivial example is computing the sum of bytes: I have yet to see a compiler that optimizes that code into the instructions behind _mm[256]_sad_epu8 / _mm[256]_add_epi64.
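To make the sum-of-bytes example concrete, here is a rough sketch of the naive loop next to a hand-written SSE2 version. The function names are mine, not from any library, and the SIMD path is only compiled when the compiler defines `__SSE2__`; otherwise it falls back to the scalar loop.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// The plain loop one would hope the compiler vectorizes well.
inline uint64_t sum_bytes_scalar(const uint8_t* p, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

// Hand-written version (hypothetical helper name).
inline uint64_t sum_bytes_sad(const uint8_t* p, size_t n) {
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        // Sum of absolute differences against zero adds up each group of
        // 8 bytes into one 64-bit lane; accumulate those lanes.
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }
    uint64_t lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    uint64_t s = lanes[0] + lanes[1];
    for (; i < n; ++i) s += p[i];  // leftover tail bytes
    return s;
#else
    return sum_bytes_scalar(p, n);  // no SSE2: same result, scalar path
#endif
}
```

Both functions return the same value for any input; the second one just spells out the instruction selection instead of hoping the auto-vectorizer finds it.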
> Theoretically, yes. Practically, I think that’s a “sufficiently smart compiler” class of problem, insanely hard to solve. That's especially true because WASM is usually JIT-compiled, and a JIT simply doesn’t have time for expensive optimizations.
Detecting every way of doing a 32-bit multiply with a 64-bit mul operator is impossible, yes. But there only needs to be one way of doing it that the compiler knows about, and then people can use that idiom.
It's not pretty, but it works. Compare the common scalar int rotate: x86 can do it in one instruction, but C doesn't have an operator for it. The way to do it in C is to use an idiom that optimizers are known to recognize[1].
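For reference, a minimal sketch of the kind of rotate idiom meant here. The function name is made up, and whether a given compiler version turns it into a single rotate instruction is something to check in the generated assembly rather than assume.

```cpp
#include <cstdint>

// Rotate-left written with shifts and an OR. Masking both shift counts
// keeps them in 0..31, so no shift is ever by 32 (which would be UB).
inline uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << (n & 31u)) | (x >> ((32u - n) & 31u));
}
```

For example, `rotl32(0x12345678u, 8)` yields `0x34567812u`, and a count of 0 or 32 returns the input unchanged.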
> there only needs to be one way of doing it that the compiler knows about
Too much magic for my taste. If the compiler will be doing that anyway, why not expose an intrinsic we can use? The SSE instruction in question is rather efficient to emulate on NEON: it only takes two intrinsics, vmovn_u64 and vmull_u32.
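Roughly what I mean, as a sketch. I'm assuming the SSE instruction under discussion is the one behind `_mm_mul_epu32` (multiply the low 32 bits of each 64-bit lane into a full 64-bit product); the wrapper name and the array-based signature are invented for illustration, and the scalar branch is only there so the snippet builds on other targets.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical portable wrapper: out[i] = low32(a[i]) * low32(b[i]), i = 0..1.
inline void mul_lo32_to_64(const uint64_t a[2], const uint64_t b[2], uint64_t out[2]) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_mul_epu32(va, vb));
#elif defined(__ARM_NEON)
    uint64x2_t va = vld1q_u64(a);
    uint64x2_t vb = vld1q_u64(b);
    // vmovn_u64 keeps the low 32 bits of each lane, vmull_u32 widens the product.
    vst1q_u64(out, vmull_u32(vmovn_u64(va), vmovn_u64(vb)));
#else
    for (int i = 0; i < 2; ++i)
        out[i] = (a[i] & 0xFFFFFFFFu) * (b[i] & 0xFFFFFFFFu);
#endif
}
```

The upper 32 bits of every input lane are ignored on all three paths, which is the whole point of the operation.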
The same goes for scalar code. When I need to rotate an integer, I normally use intrinsics instead of relying on the compiler to optimize the code. C++ even added these to the standard library recently, in the <bit> header in C++20.
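A small sketch of the standard-library route, assuming a toolchain that ships <bit>. The wrapper name is mine; it uses `std::rotl` when the library advertises it through the `__cpp_lib_bitops` feature-test macro and otherwise falls back to shifts, so it still builds in pre-C++20 modes.

```cpp
#include <cstdint>
#if __has_include(<bit>)
#include <bit>
#endif

// Hypothetical wrapper: rotate left by n bits; a negative n rotates right.
inline uint32_t rotate_left(uint32_t x, int n) {
#if defined(__cpp_lib_bitops)
    return std::rotl(x, n);  // C++20 library function
#else
    // Fallback for older standards: reduce the count modulo 32 first.
    const unsigned s = static_cast<unsigned>(n) & 31u;
    return (x << s) | (x >> ((32u - s) & 31u));
#endif
}
```

With this the intent is stated in the code itself, so a later edit can't silently knock it off a pattern the optimizer happens to recognize.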
IMO, relying on such compiler optimizations is fragile in the long run, for two reasons.
1. These are undocumented implementation details. Compiler developers don’t make any guarantees they will continue to support these things in exactly the same way.
2. Most real-life software is developed by multiple people. It’s too easy for developers to overlook comments and slightly change the code in a way that the compiler no longer recognizes as the shortcut.