Comments or questions? E-mail: info@sirrida.de
This is the description of several bit permutation and other miscellaneous routines as they appear in perm_bas.* which are used by the test programs testperm.* and the permutation code generator in source, calcperm.pas resp. calcperm.cpp.
Is the given number odd? This is not a SWAR operation. This function is intrinsic in Pascal.
Parameters:
Return true for an odd number and false for an even number.
Characterization: Auxiliary function.
Greatest common divisor. This is not a SWAR operation.
Parameters:
Calculate the greatest common divisor of a and b.
Characterization: Auxiliary function.
Calculate multiplicative inverse. This is not a SWAR operation.
Parameters:
Calculate multiplicative inverse, i.e. mul_inv(x)*x==1. The result is defined for all odd values of x. For even x we simply return 0.
Characterization: Auxiliary function.
Rotate left a complete word.
Parameters:
Rotate x rot times to the left.
This should be optimized by the compiler to a single assembly instruction if inlined.
Characterization:
Auxiliary function.
Related functions:
rol_lo,
bswap.
Rotate left a single subword. This is not a SWAR operation.
Parameters:
Rotate x rot times to the left.
Only the low 2sw bits of x are processed and returned; the remaining bits are zeroed.
Characterization:
Auxiliary function.
Related functions:
rol,
rolc_lo.
Rotate left and complement a single subword. This is not a SWAR operation.
Parameters:
Rotate x rot times to the left and complement each bit as it is rotated. In other words: Each bit that falls out is inverted and re-inserted on the other end.
Only the low 2sw bits of x are processed and returned; the remaining bits are zeroed.
Characterization:
Auxiliary function.
Related functions:
rol_lo,
frolc.
Convert a number to Gray code. This is not a SWAR operation.
Parameters:
Convert a number to Gray code. This is the inverse of inverse_gray_code.
Characterization:
Auxiliary function.
Related functions:
inverse_gray_code.
Convert a number to inverse Gray code. This is not a SWAR operation.
Parameters:
Convert a number to the inverse Gray code. This is the inverse of gray_code.
Characterization:
Auxiliary function.
Related functions:
gray_code.
Count the set bits. Also known as population count.
Parameters:
Newer x86 processors know this operation as
POPCNT.
A number x has even/odd parity if
nr_1bits(x)
is even/odd.
Characterization: Auxiliary function.
This function counts the leading zero bits of x, i.e. the length of the contiguous string of zero bits from the most significant toward the least significant bit.
Parameters:
The result is bits if x is zero.
Newer x86 processors know this operation as
LZCNT (which has a dedicated CPUID feature flag).
It can easily be emulated with the older instruction BSR and some glue code.
Characterization:
Auxiliary function.
Related functions:
nr_trailing_0bits.
This function counts the trailing zero bits of x, i.e. the length of the contiguous string of zero bits from the least significant toward the most significant bit.
Parameters:
The result is bits if x is zero.
This operation is part of the BMI1 instruction set
and named TZCNT.
The older BSF instruction almost does the same
(albeit with an undefined behavior for x=0).
Characterization:
Auxiliary function.
Related functions:
nr_leading_0bits.
This function analyzes whether x seen as a bit string contains maximally one contiguous string of one bits.
Parameters:
Characterization: Auxiliary function.
This function performs general trailing bit modifications, i.e. manipulation of rightmost bits. See manipulating rightmost bits for a detailed description.
Parameters:
Characterization: Auxiliary function.
This is the bit equivalent to if m then x else y (Pascal) or m?x:y (C/C++).
Parameters:
Characterization: Auxiliary function.
Here is the implementation:
t_bits blend(t_bits m, t_bits x, t_bits y) { return (m & x) | (~m & y); }
Set mask denoting odd subword values, i.e. propagate each least significant bit to the left.
Parameters:
Characterization: Auxiliary function.
This is a permutation primitive. It is also called delta swap.
Parameters:
The subset of bits of x
masked by m are exchanged with the ones
shifted left by shift.
To work as described,
the mask and the shifted mask must not have bits in common,
i.e. m & (m << shift) == 0
and no bits must be shifted away,
i.e. ((m << shift) >> shift) == m.
If also m ^ (m << shift) == all_bits,
the function
bit_permute_step_simple
is a simpler replacement.
On superscalar x86 processors a proper implementation of this routine
typically needs 5 cycles.
Superscalar ARM processors need at least 4 cycles.
Characterization:
Auxiliary function.
Related functions:
bit_permute_step_simple,
bit_permute_step2.
See also bit permutation primitives.
Here is the implementation:
t_bits bit_permute_step(t_bits x, t_bits m, t_uint shift) { t_bits t; t = ((x >> shift) ^ x) & m; x = (x ^ t) ^ (t << shift); return x; }
An implementation in x86 assembler with constant mask and shift (eax ⇒ eax):
mov edx,eax shr eax,SHIFT xor eax,edx and eax,MASK xor edx,eax shl eax,SHIFT xor eax,edx
This is a permutation primitive.
Parameters:
The subset of bits of x
masked by m are exchanged with the ones
shifted left by shift.
To work as described,
the mask and the shifted mask must not have bits in common,
i.e. m & (m << shift) == 0
and no bits must be shifted away,
i.e. ((m << shift) >> shift) == m.
Also all relevant bits n must be affected:
nr_1bits(bit_permute_step_simple(n,mask,shift)) = nr_1bits(n)
or
((m ^ (m << shift)) & relevant_bits) == relevant_bits
since all bits not caught by m and m << shift
are cleared.
If all this is not the case
the needed function is
bit_permute_step.
On superscalar x86 processors a proper implementation of this routine
typically needs 4 cycles.
For future processors it might be calculated in 3 cycles
when RORX (BMI2) is used or when MOV instructions cost no latency.
Superscalar ARM processors need 2 at least cycles.
Characterization:
Auxiliary function.
Related functions:
bit_permute_step,
bit_permute_step2.
Here is the implementation:
t_bits bit_permute_step_simple(t_bits x, t_bits m, t_uint shift) { return ((x & m) << shift) | ((x >> shift) & m); }
An implementation in x86 assembler with constant mask and shift (eax ⇒ eax):
mov edx,eax shr eax,SHIFT and edx,MASK and eax,MASK shl edx,SHIFT or eax,edx
This is a permutation primitive. It is also called delta swap. It can be used to extend to functionality of bit_permute_step to multiple words.
Parameters:
The subset of bits of *x1
masked by m are exchanged with
the ones of *x2 masked by m << shift.
To work as described,
no bits must be shifted away,
i.e. ((m << shift) >> shift) == m.
If x1 == x2 (same address),
the mask and the shifted mask must not have bits in common,
i.e. m & (m << shift) == 0;
also the function
bit_permute_step
is a simpler replacement in this case.
If also m ^ (m << shift) == all_bits,
the function
bit_permute_step_simple
is a yet simpler replacement.
Characterization:
Auxiliary function.
Related functions:
bit_permute_step,
bit_permute_step_simple.
See also bit permutation primitives.
Here is the implementation:
void bit_permute_step2(t_bits* x1, t_bits* x2, t_bits m, t_uint shift) { t_bits t; t = ((*x2 >> shift) ^ *x1) & m; *x1 = *x1 ^ t; *x2 = *x2 ^ (t << shift); }
An implementation in x86 assembler with constant mask and shift and x1 (eax) and x2 (edx) as values instead of references which should cost 5 cycles on superscalar x86 processors (eax, edx ⇒ eax, edx):
mov ecx,edx shr edx,SHIFT xor edx,eax and edx,MASK xor eax,edx shl edx,SHIFT xor edx,ecx
This function supplies the identity index vector, i.e. for all i: tgt[i]=i.
Parameters:
To apply the resulting permutation parameters you may use gen_benes.
Characterization:
Auxiliary function.
Topics:
Related functions:
gen_benes,
invert_bpc,
random_perm.
This function calculates the index vector as needed as a parameter for the generator of the inverse permutation with a Beneš network.
Parameters:
To apply the resulting permutation parameters you may use gen_benes.
Characterization:
Auxiliary function.
Topics:
Related functions:
gen_benes,
identity_perm,
invert_bpc,
random_perm.
This function calculates a randomized index vector as needed e.g. as a parameter for the Beneš network generator.
Parameters:
To apply the resulting permutation parameters you may use gen_benes.
Characterization:
Auxiliary function.
Topics:
Related functions:
gen_benes,
identity_perm,
invert_perm.
This function collects the referenced source indexes from the given index vector as a bit set.
Parameters:
Characterization:
Auxiliary function.
Topics:
Related functions:
identity_perm,
invert_bpc,
used_target_bits.
This function collects target indexes, i.e. the positions of specified (i.e. ≠ no_index) indexes, from the given index vector as a bit set.
Parameters:
Characterization:
Auxiliary function.
Topics:
Related functions:
identity_perm,
invert_bpc,
used_target_bits.
Complement one bit index.
Parameters:
This is effectively the building block of
general_reverse_bits.
On superscalar x86 processors a proper implementation of this routine
typically needs 3…4 cycles.
Characterization:
Normal function.
Topic:
Bit index operations.
Related functions:
bit_index_swap,
bit_index_swap_complement,
general_reverse_bits,
bswap,
permute_bpc.
Exchange two bit indexes.
Parameters:
This is effectively the building block of
shuffle,
unshuffle,
and
bit_index_ror.
On superscalar x86 processors a proper implementation of this routine
typically needs 5 cycles.
Characterization:
Normal function.
Topic:
Bit index operations.
Related functions:
bit_index_complement,
bit_index_swap_complement,
bit_index_ror,
transpose,
shuffle,
unshuffle,
shuffle_power,
unshuffle_power,
permute_bpc.
Exchange and complement two bit indexes.
Parameters:
This function combines
bit_index_swap
and two times
bit_index_complement.
On superscalar x86 processors a proper implementation of this routine
typically needs 5 cycles.
Characterization:
Normal function.
Topic:
Bit index operations.
Related functions:
bit_index_complement,
bit_index_swap,
permute_bpc.
Rotate an bit index field to the right.
Parameters:
0 ≤ field+ofs ≤ ld_bits.
The bit index field is rotated to the right by rot.
All other bit indexes are left unchanged.
Characterization:
Normal function.
Topic:
Bit index operations.
Related functions:
bit_index_swap,
transpose,
shuffle_power,
unshuffle_power,
shuffle,
unshuffle,
permute_bpc.
Transpose matrixes.
Parameters:
0 ≤ sw_entities+ld_row+ld_col ≤ ld_bits.
This function is essentially the same as
bit_index_ror,
albeit with slightly different parameters.
Characterization:
Normal function.
Topic:
Bit index operations.
Related functions:
bit_index_ror,
permute_bpc.
This function calculates powers of shuffle.
Parameters:
0 ≤ sw1 ≤ sw2 ≤ ld_bits.
This function is essentially the same as
bit_index_ror,
albeit with slightly different parameters.
Characterization:
Normal function.
Topics:
Bit index operations,
Shuffle and unshuffle.
Related functions:
bit_index_ror,
unshuffle_power,
shuffle,
permute_bpc.
This function calculates powers of unshuffle.
Parameters:
0 ≤ sw1 ≤ sw2 ≤ ld_bits.
This function is essentially the same as
bit_index_ror,
albeit with slightly different parameters.
Characterization:
Normal function.
Topics:
Bit index operations,
Shuffle and unshuffle.
Related functions:
bit_index_ror,
shuffle_power,
unshuffle,
permute_bpc.
This function calculates general BPC permutations.
Parameters:
To generate the reverse permutation you may use invert_bpc.
Characterization:
Normal function.
Topics:
Generalized bit reversal,
Bit index operations,
BPC permutations.
Related functions:
bit_index_complement,
bit_index_swap,
bit_index_swap_complement,
general_reverse_bits,
invert_bpc.
This function calculates the index vector and complement set as needed as parameters for the inverse BPC permutations.
Parameters:
To apply the resulting permutation parameters you may use permute_bpc.
Characterization:
Auxiliary function.
Topics:
Generalized bit reversal,
Bit index operations,
BPC permutations.
Related functions:
bit_index_complement,
bit_index_swap,
bit_index_swap_complement,
general_reverse_bits,
permute_bpc,
invert_perm.
Swap all subwords of given levels.
Parameters:
Characterization:
Normal function.
Topic:
Generalized bit reversal.
Related functions:
bswap,
prim_swap,
bit_index_complement,
permute_bpc.
This function exchanges the byte order and is typically implementable by an assembler instruction.
Parameters:
= general_reverse_bits(x,~7)
On x86 the assembler instructions BSWAP and MOVBE do the job
and act on 32 and 64 bit arguments.
For 16 bit you can use
ROL ax,16
or
BSWAP al,ah
or similar instead.
For SSE usage (128 bit) you can apply the PSHUFB instruction.
Characterization:
Auxiliary function.
Topic:
Generalized bit reversal.
Related functions:
rol,
general_reverse_bits,
bit_index_complement.
Swap all denoted subwords of given levels.
Parameters:
The highest bit of m has the special function to invert the swapping step order.
Characterization:
Normal function.
Topic:
Swap by primitives.
Related functions:
general_reverse_bits.
Shuffle/zip/interlace entities.
Parameters:
0 ≤ sw1 < sw2 ≤ ld_bits.
This function is the inverse of
unshuffle.
Characterization:
Normal function.
Topic:
Shuffle and unshuffle
Related functions:
unshuffle,
shuffle_power,
unshuffle_power,
bit_index_swap.
Unshuffle/unzip/uninterlace entities.
Parameters:
0 ≤ sw1 < sw2 ≤ ld_bits
This function is the inverse of
shuffle.
Characterization:
Normal function.
Topic:
Shuffle and unshuffle.
Related functions:
shuffle,
shuffle_power,
unshuffle_power,
bit_index_swap.
Compress (gather) bits of the masks itself.
Parameters:
For all corresponding functions the following holds:
compress_mask*(m, …) = compress*(m,m, …)
Characterization:
Normal function.
Topic:
Compress and expand.
Related functions:
expand*,
expand_flip*.
Generators for apply_compress* and apply_expand*.
Parameters:
Characterization:
Function to generate a configuration.
Topic:
Compress and expand.
Related functions:
compress*,
expand*,
gen_cef*.
Compress (gather) selected bits to the given direction using a precalculated configuration.
Parameters:
Characterization:
Function needing a configuration.
Topic:
Compress and expand.
Related functions:
compress*,
apply_expand*,
compress_mask*.
Expand (scatter) bits from the given direction to selected places using a precalculated configuration.
Parameters:
Characterization:
Function needing a configuration.
Topic:
Compress and expand.
Related functions:
expand*,
apply_compress*.
Compress (gather) selected bits to the given direction.
Parameters:
Use
gen_ce*
and
apply_compress*
with the same direction
if you use the same mask more than once.
The instruction for x86 processors named
PEXT
implements
compress_right with sw=ld_bits.
Characterization:
Normal function (compound, with temporary configuration).
Topic:
Compress and expand.
Related functions:
expand*,
compress_flip*.
Expand (scatter) bits from the given direction to selected places.
Parameters:
Use
gen_ce*
and
apply_expand*
with the same direction
if you use the same mask more than once.
The instruction for x86 processors named
PDEP
implements
expand_right with sw=ld_bits.
Characterization:
Normal function (compound, with temporary configuration).
Topic:
Compress and expand.
Related functions:
compress*,
expand_flip*.
One butterfly step/stage.
Parameters:
For the bit layout of m please look at the diagram in
Butterfly network.
The unused bits should be zero, i.e.
m & a_bfly_mask[sw] = m.
Characterization:
Normal function.
Topic:
Butterfly network.
Related functions:
bfly,
ibfly.
Used to apply configured butterfly network on x configured by gen_frot, gen_vrot or gen_cef*.
Parameters:
This function is the inverse of ibfly.
Characterization:
Function needing a configuration.
Topic:
Butterfly network.
Related functions:
ibfly,
butterfly.
Used to apply configured inverse butterfly network on x configured by gen_frot, gen_vrot or gen_cef*.
Parameters:
This function is the inverse of bfly.
Characterization:
Function needing a configuration.
Topic:
Butterfly network.
Related functions:
bfly,
butterfly.
Return the parity of a permutation given by a precalculated configuration of a butterfly network. This is false for even parity and true for odd parity.
Parameters:
Characterization:
Function needing a configuration.
Topic:
Butterfly network.
Related functions:
bfly,
butterfly.
Rotate all subwords in the given direction by the given amount.
Generators for
bfly (rotate right, fror)
and
ibfly (rotate left, frol).
Parameters:
Characterization:
Function to generate a configuration.
Topic:
Rotate via butterfly.
Related functions:
fror /
frol /
frot,
gen_vrot.
Rotate all subwords in the given direction
by the amount given in the corresponding subwords.
Generators for
bfly (rotate right, vror_bfly)
and
ibfly (rotate left, vrol_bfly).
Parameters:
Characterization:
Function to generate a configuration.
Topic:
Rotate via butterfly.
Related functions:
vror_bfly /
vrol_bfly /
vrot_bfly,
gen_frot.
Rotate all subwords in the given direction: fror_bfly rotates right and frol_bfly rotates left.
Parameters:
Use gen_frot and bfly (rotate right) or ibfly (rotate left). if you use the same rotation value more than once.
The alternative routines fror / frol / frot are much faster and do not need any butterfly stuff.
Characterization:
Normal function (compound, with temporary configuration).
Topic:
Rotate via butterfly.
Related functions:
vror /
vrol /
vrot,
fror /
frol /
frot.
Rotate all subwords in the given direction by the amount given in the corresponding subwords: vror_bfly rotates right and vrol_bfly rotates left.
Parameters:
Use gen_vrot and bfly (rotate right) or ibfly (rotate left). if you use the same rotation values (rot) more than once.
Characterization:
Normal function (compound, with temporary configuration).
Topic:
Rotate via butterfly.
Related functions:
vror /
vrol /
vrot,
fror /
frol /
frot.
Rotate all subwords in the given direction by the given amount: fror rotates right and frol rotates left.
Parameters:
These routines return the same values as the corresponding routines with the name postfix _bfly (fror_bfly / frol_bfly / frot_bfly) but do not need any butterfly stuff and should be much faster.
Characterization:
Normal function.
Topic:
Rotate via butterfly.
Related functions:
vror /
vrol /
vrot,
fror_bfly /
frol_bfly /
frot_bfly,
frorc /
frolc /
frotc.
Rotate and complement all subwords in the given direction by the given amount: frorc rotates right and frolc rotates left.
Parameters:
Characterization:
Normal function.
Related functions:
frolc,
vror /
vrol /
vrot,
fror /
frol /
frot.
Rotate all subwords in the given direction by the amount given in the corresponding subwords: vror rotates right and vrol rotates left.
Parameters:
Use gen_vrot and bfly (rotate right) or ibfly (rotate left). if you use the same rotation values (rot) more than once.
These routines return the same values as the corresponding routines with the name postfix _bfly (vror_bfly / vrol_bfly / vrot_bfly) but do not need any butterfly stuff and should be faster.
Characterization:
Normal function (compound, with temporary configuration).
Topic:
Rotate via butterfly.
Related functions:
vror_bfly /
vrol_bfly /
vrot_bfly,
fror /
frol /
frot.
Generators for bfly (expand_flip*) and ibfly (compress_flip*).
Parameters:
Characterization:
Function to generate a configuration.
Topic:
Compress/expand-flip.
Related functions:
compress_flip*,
expand_flip*,
gen_ce*.
Compress (gather) selected bits to the given direction while stuffing the remaining ones in inverse order into the remaining places.
Parameters:
Use
gen_cef*
and
ibfly
if you use the same mask more than once.
This function is the inverse of
expand_flip*,
Characterization:
Normal function (compound, with temporary configuration).
Topic:
Compress/expand-flip.
Related functions:
expand_flip*,
compress*.
Expand (scatter) bits from the given direction to selected places
while stuffing the remaining ones in inverse order into the remaining places.
This function is the inverse of
expand_flip*,
Parameters:
Use gen_cef* and ibfly if you use the same mask more than once.
Characterization:
Normal function (compound, with temporary configuration).
Topic:
Compress/expand-flip.
Related functions:
compress_flip*,
expand*.
Simulate one omega step/stage.
Parameters:
Only the lower bits of the subwords of m are used.
The remaining bits should be zero, i.e.
m & a_bfly_mask[sw-1] = m
This function is the inverse of
flip*,
Characterization:
Normal function.
Topic:
Omega-flip network.
Related functions:
flip.
Simulate one flip step/stage.
Parameters:
Only the lower bits of the subwords of m are used.
The remaining bits should be zero, i.e.
m & a_bfly_mask[sw-1] = m
This function is the inverse of
flip*,
Characterization:
Normal function.
Topic:
Omega-flip network.
Related functions:
omega.
Generator for benes_fwd / benes_bwd.
Parameters:
This function effectively generates the configuration for a Beneš network which consists of a Butterfly network and a following inverse Butterfly network.
Characterization:
Function to generate a configuration.
Topic:
Beneš network.
Related functions:
gen_benes_ex.
benes_fwd /
benes_bwd.
Generator for benes_fwd_ex / benes_bwd_ex.
Parameters:
This function effectively generates the configuration for a
reordered
Beneš network.
This function is fortunate if you make use of the reordering,
otherwise the standard order in
gen_benes
suffices.
Characterization:
Function to generate a configuration.
Topic:
Beneš network.
Related functions:
gen_benes.
benes_fwd_ex /
benes_bwd_ex.
Perform an arbitrary permutation using a precalculated configuration.
Parameters:
For benes_fwd the index array given to
gen_benes
selected source indexes,
and for
benes_bwd the index array
selected target indexes.
These functions are the inverse of each other.
Characterization:
Function needing a configuration.
Topic:
Beneš network.
Related functions:
gen_benes.
Perform an arbitrary permutation using a precalculated configuration.
Parameters:
For benes_fwd_ex the index array given to
gen_benes_ex
selected source indexes,
and for
benes_bwd_ex the index array
selected target indexes.
These functions are the inverse of each other.
Characterization:
Function needing a configuration.
Topic:
Beneš network.
Related functions:
gen_benes_ex.
Return the parity of an arbitrary permutation given by a precalculated configuration of a Beneš network. This is false for even parity and true for odd parity.
Parameters:
Characterization:
Function needing a configuration.
Topic:
Beneš network.
Related functions:
gen_benes.