Best method found: Bit group moving (about 33 cycles on superscalar processors):
x = (x & 0x00000080) | ((x & 0x000a1012) << 1) | ((x & 0x21000000) << 2) | ((x & 0x00100000) << 5) | rol(x & 0x08050001, 6) | ((x & 0x00008040) << 8) | rol(x & 0x40000200, 10) | ((x & 0x02000000) >> 21) | rol(x & 0x10000004, 14) | ((x & 0x04800000) >> 11) | rol(x & 0x00000520, 22) | ((x & 0x00000800) >> 8) | ((x & 0x00000008) << 25) | ((x & 0x00004000) >> 5) | ((x & 0x00200000) >> 4) | ((x & 0x80002000) >> 2) | ((x & 0x00400000) >> 1);
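The expression above relies on a rol helper that is not shown on this page. The following is a minimal sketch of how it might be wrapped for 32-bit words. It assumes rol means "rotate left", and the names permute_bits and check_is_permutation are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed meaning of rol: rotate a 32-bit word left by r bits. */
static inline uint32_t rol(uint32_t x, unsigned r) {
    r &= 31;
    return (x << r) | (x >> ((32 - r) & 31));
}

/* The generated expression, one bit group per line (hypothetical wrapper). */
static uint32_t permute_bits(uint32_t x) {
    return (x & 0x00000080)
         | ((x & 0x000a1012) << 1)
         | ((x & 0x21000000) << 2)
         | ((x & 0x00100000) << 5)
         | rol(x & 0x08050001, 6)
         | ((x & 0x00008040) << 8)
         | rol(x & 0x40000200, 10)
         | ((x & 0x02000000) >> 21)
         | rol(x & 0x10000004, 14)
         | ((x & 0x04800000) >> 11)
         | rol(x & 0x00000520, 22)
         | ((x & 0x00000800) >> 8)
         | ((x & 0x00000008) << 25)
         | ((x & 0x00004000) >> 5)
         | ((x & 0x00200000) >> 4)
         | ((x & 0x80002000) >> 2)
         | ((x & 0x00400000) >> 1);
}

/* Sanity check: each single-bit input must give a distinct single-bit output. */
static void check_is_permutation(void) {
    uint32_t seen = 0;
    for (unsigned i = 0; i < 32; i++) {
        uint32_t y = permute_bits((uint32_t)1 << i);
        assert(y != 0 && (y & (y - 1)) == 0);  /* exactly one bit set */
        assert((seen & y) == 0);               /* target bit not used before */
        seen |= y;
    }
    assert(seen == 0xFFFFFFFFu);               /* all 32 target bits reached */
}
```

Each term masks out one group of bits that all move by the same distance and shifts or rotates the whole group at once, which is why 17 terms are enough for 32 bits. Calling check_is_permutation() once confirms that the masks are disjoint and that no two bits land on the same target.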
See documentation to
pext and pdep can be emulated with compress_right and expand_right.
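To illustrate what such an emulation has to compute, here is a bit-by-bit reference sketch for 32-bit words. It assumes the usual semantics: pext gathers the bits of x selected by mask m into the low end of the result, and pdep scatters the low bits of x to the positions selected by m. The function names compress_right_ref and expand_right_ref are hypothetical; this is not the author's compress_right and expand_right, which may use different signatures and a much faster algorithm.

```c
#include <stdint.h>

/* Reference semantics of pext: pack the bits of x selected by m to the right. */
static uint32_t compress_right_ref(uint32_t x, uint32_t m) {
    uint32_t result = 0;
    unsigned k = 0;                       /* next free position in the result */
    for (unsigned i = 0; i < 32; i++) {
        if ((m >> i) & 1u) {
            result |= ((x >> i) & 1u) << k;
            k++;
        }
    }
    return result;
}

/* Reference semantics of pdep: spread the low bits of x to the positions in m. */
static uint32_t expand_right_ref(uint32_t x, uint32_t m) {
    uint32_t result = 0;
    unsigned k = 0;                       /* next source bit to take from x */
    for (unsigned i = 0; i < 32; i++) {
        if ((m >> i) & 1u) {
            result |= ((x >> k) & 1u) << i;
            k++;
        }
    }
    return result;
}
```

With these definitions, expanding a compressed value with the same mask restores the selected bits: expand_right_ref(compress_right_ref(x, m), m) equals x & m.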
This result is not necessarily the best possible, but at least several methods were tried and compared.
See also some notes on the inner workings.
There is an even better calculator, calcperm.*, which works for various word sizes (Pascal and C++ sources).
Error reports, comments or questions? E-mail: firstname.lastname@example.org