Back to main page

Function descriptions


Back to main page

Contents

Comments or questions? E-mail: info@sirrida.de


(To top) Intro

This is the description of several bit permutation and other miscellaneous routines as they appear in perm_bas.* which are used by the test programs testperm.* and the permutation code generator in source, calcperm.pas resp. calcperm.cpp.


(To top) odd

odd

Is the given number odd? This is not a SWAR operation. This function is intrinsic in Pascal.

Parameters:

Return true for an odd number and false for an even number.

Characterization: Auxiliary function.


(To top) gcd

gcd

Greatest common divisor. This is not a SWAR operation.

Parameters:

Calculate the greatest common divisor of a and b.

Characterization: Auxiliary function.


(To top) mul_inv

mul_inv

Calculate multiplicative inverse. This is not a SWAR operation.

Parameters:

Calculate multiplicative inverse, i.e. mul_inv(x)*x==1. The result is defined for all odd values of x. For even x we simply return 0.

Characterization: Auxiliary function.


(To top) rol

rol

Rotate left a complete word.

Parameters:

Rotate x rot times to the left.

This should be optimized by the compiler to a single assembly instruction if inlined.

Characterization: Auxiliary function.
Related functions: rol_lo, bswap.


(To top) rol_lo

rol_lo

Rotate left a single subword. This is not a SWAR operation.

Parameters:

Rotate x rot times to the left.

Only the low 2sw bits of x are processed and returned; the remaining bits are zeroed.

Characterization: Auxiliary function.
Related functions: rol, rolc_lo.


(To top) rolc_lo

rolc_lo

Rotate left and complement a single subword. This is not a SWAR operation.

Parameters:

Rotate x rot times to the left and complement each bit as it is rotated. In other words: Each bit that falls out is inverted and re-inserted on the other end.

Only the low 2sw bits of x are processed and returned; the remaining bits are zeroed.

Characterization: Auxiliary function.
Related functions: rol_lo, frolc.


(To top) gray_code

gray_code

Convert a number to Gray code. This is not a SWAR operation.

Parameters:

Convert a number to Gray code. This is the inverse of inverse_gray_code.

Characterization: Auxiliary function.
Related functions: inverse_gray_code.


(To top) inverse_gray_code

inverse_gray_code

Convert a number to inverse Gray code. This is not a SWAR operation.

Parameters:

Convert a number to the inverse Gray code. This is the inverse of gray_code.

Characterization: Auxiliary function.
Related functions: gray_code.


(To top) nr_1bits

nr_1bits

Count the set bits. Also known as population count.

Parameters:

Newer x86 processors know this operation as POPCNT.
A number x has even/odd parity if nr_1bits(x) is even/odd.

Characterization: Auxiliary function.


(To top) nr_leading_0bits

nr_leading_0bits

This function counts the leading zero bits of x, i.e. the length of the contiguous string of zero bits from the most significant toward the least significant bit.

Parameters:

The result is bits if x is zero.
Newer x86 processors know this operation as LZCNT (which has a dedicated CPUID feature flag). It can easily be emulated with the older instruction BSR and some glue code.

Characterization: Auxiliary function.
Related functions: nr_trailing_0bits.


(To top) nr_trailing_0bits

nr_trailing_0bits

This function counts the trailing zero bits of x, i.e. the length of the contiguous string of zero bits from the least significant toward the most significant bit.

Parameters:

The result is bits if x is zero.
This operation is part of the BMI1 instruction set and named TZCNT. The older BSF instruction almost does the same (albeit with an undefined behavior for x=0).

Characterization: Auxiliary function.
Related functions: nr_leading_0bits.


(To top) is_contiguous_1bits

is_contiguous_1bits

This function analyzes whether x seen as a bit string contains maximally one contiguous string of one bits.

Parameters:

Characterization: Auxiliary function.


(To top) tbm

tbm

This function performs general trailing bit modifications, i.e. manipulation of rightmost bits. See manipulating rightmost bits for a detailed description.

Parameters:

Characterization: Auxiliary function.


(To top) blend

blend

This is the bit equivalent to if m then x else y (Pascal) or m?x:y (C/C++).

Parameters:

Characterization: Auxiliary function.

Here is the implementation:

  t_bits blend(t_bits m, t_bits x, t_bits y) {
    return (m & x) | (~m & y);
    }
  

(To top) simd_odd

simd_odd

Set mask denoting odd subword values, i.e. propagate each least significant bit to the left.

Parameters:

Characterization: Auxiliary function.


(To top) bit_permute_step

bit_permute_step

This is a permutation primitive. It is also called delta swap.

Parameters:

The subset of bits of x masked by m are exchanged with the ones shifted left by shift.
To work as described, the mask and the shifted mask must not have bits in common, i.e. m & (m << shift) == 0 and no bits must be shifted away, i.e. ((m << shift) >> shift) == m.
If also m ^ (m << shift) == all_bits, the function bit_permute_step_simple is a simpler replacement.
On superscalar x86 processors a proper implementation of this routine typically needs 5 cycles. Superscalar ARM processors need at least 4 cycles.

Characterization: Auxiliary function.
Related functions: bit_permute_step_simple, bit_permute_step2.

See also bit permutation primitives.

Here is the implementation:

  t_bits bit_permute_step(t_bits x, t_bits m, t_uint shift) {
    t_bits t;
    t = ((x >> shift) ^ x) & m;
    x = (x ^ t) ^ (t << shift);
    return x;
    }
  

An implementation in x86 assembler with constant mask and shift (eaxeax):

  mov edx,eax
  shr eax,SHIFT
  xor eax,edx
  and eax,MASK
  xor edx,eax
  shl eax,SHIFT
  xor eax,edx
  

(To top) bit_permute_step_simple

bit_permute_step_simple

This is a permutation primitive.

Parameters:

The subset of bits of x masked by m are exchanged with the ones shifted left by shift.
To work as described, the mask and the shifted mask must not have bits in common, i.e. m & (m << shift) == 0 and no bits must be shifted away, i.e. ((m << shift) >> shift) == m.
Also all relevant bits n must be affected:
nr_1bits(bit_permute_step_simple(n,mask,shift)) = nr_1bits(n)
or
((m ^ (m << shift)) & relevant_bits) == relevant_bits
since all bits not caught by m and m << shift are cleared.
If all this is not the case the needed function is bit_permute_step.
On superscalar x86 processors a proper implementation of this routine typically needs 4 cycles. For future processors it might be calculated in 3 cycles when RORX (BMI2) is used or when MOV instructions cost no latency. Superscalar ARM processors need 2 at least cycles.

Characterization: Auxiliary function.
Related functions: bit_permute_step, bit_permute_step2.

Here is the implementation:

  t_bits bit_permute_step_simple(t_bits x, t_bits m, t_uint shift) {
    return ((x & m) << shift) | ((x >> shift) & m);
    }
  

An implementation in x86 assembler with constant mask and shift (eaxeax):

  mov edx,eax
  shr eax,SHIFT
  and edx,MASK
  and eax,MASK
  shl edx,SHIFT
  or eax,edx
  

(To top) bit_permute_step2

bit_permute_step2

This is a permutation primitive. It is also called delta swap. It can be used to extend to functionality of bit_permute_step to multiple words.

Parameters:

The subset of bits of *x1 masked by m are exchanged with the ones of *x2 masked by m << shift.
To work as described, no bits must be shifted away, i.e. ((m << shift) >> shift) == m.
If x1 == x2 (same address), the mask and the shifted mask must not have bits in common, i.e. m & (m << shift) == 0; also the function bit_permute_step is a simpler replacement in this case.
If also m ^ (m << shift) == all_bits, the function bit_permute_step_simple is a yet simpler replacement.

Characterization: Auxiliary function.
Related functions: bit_permute_step, bit_permute_step_simple.

See also bit permutation primitives.

Here is the implementation:

  void bit_permute_step2(t_bits* x1, t_bits* x2, t_bits m, t_uint shift) {
    t_bits t;
    t = ((*x2 >> shift) ^ *x1) & m;
    *x1 = *x1 ^ t;
    *x2 = *x2 ^ (t << shift);
    }
  

An implementation in x86 assembler with constant mask and shift and x1 (eax) and x2 (edx) as values instead of references which should cost 5 cycles on superscalar x86 processors (eax, edxeax, edx):

  mov ecx,edx
  shr edx,SHIFT
  xor edx,eax
  and edx,MASK
  xor eax,edx
  shl edx,SHIFT
  xor edx,ecx
  

(To top) identity_perm

identity_perm

This function supplies the identity index vector, i.e. for all i: tgt[i]=i.

Parameters:

To apply the resulting permutation parameters you may use gen_benes.

Characterization: Auxiliary function.
Topics:
Related functions: gen_benes, invert_bpc, random_perm.


(To top) invert_perm

invert_perm

This function calculates the index vector as needed as a parameter for the generator of the inverse permutation with a Beneš network.

Parameters:

To apply the resulting permutation parameters you may use gen_benes.

Characterization: Auxiliary function.
Topics:
Related functions: gen_benes, identity_perm, invert_bpc, random_perm.


(To top) random_perm

random_perm

This function calculates a randomized index vector as needed e.g. as a parameter for the Beneš network generator.

Parameters:

To apply the resulting permutation parameters you may use gen_benes.

Characterization: Auxiliary function.
Topics:
Related functions: gen_benes, identity_perm, invert_perm.


(To top) used_source_bits

used_source_bits

This function collects the referenced source indexes from the given index vector as a bit set.

Parameters:

Characterization: Auxiliary function.
Topics:
Related functions: identity_perm, invert_bpc, used_target_bits.


(To top) used_target_bits

used_target_bits

This function collects target indexes, i.e. the positions of specified (i.e. ≠ no_index) indexes, from the given index vector as a bit set.

Parameters:

Characterization: Auxiliary function.
Topics:
Related functions: identity_perm, invert_bpc, used_target_bits.


(To top) bit_index_complement

bit_index_complement

Complement one bit index.

Parameters:

This is effectively the building block of general_reverse_bits.
On superscalar x86 processors a proper implementation of this routine typically needs 3…4 cycles.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_swap, bit_index_swap_complement, general_reverse_bits, bswap, permute_bpc.


(To top) bit_index_swap

bit_index_swap

Exchange two bit indexes.

Parameters:

This is effectively the building block of shuffle, unshuffle, and bit_index_ror.
On superscalar x86 processors a proper implementation of this routine typically needs 5 cycles.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_complement, bit_index_swap_complement, bit_index_ror, transpose, shuffle, unshuffle, shuffle_power, unshuffle_power, permute_bpc.


(To top) bit_index_swap_complement

bit_index_swap_complement

Exchange and complement two bit indexes.

Parameters:

This function combines bit_index_swap and two times bit_index_complement.
On superscalar x86 processors a proper implementation of this routine typically needs 5 cycles.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_complement, bit_index_swap, permute_bpc.


(To top) bit_index_ror

bit_index_ror

Rotate an bit index field to the right.

Parameters:

0 ≤ field+ofsld_bits.
The bit index field is rotated to the right by rot. All other bit indexes are left unchanged.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_swap, transpose, shuffle_power, unshuffle_power, shuffle, unshuffle, permute_bpc.


(To top) transpose

transpose

Transpose matrixes.

Parameters:

0 ≤ sw_entities+ld_row+ld_colld_bits.
This function is essentially the same as bit_index_ror, albeit with slightly different parameters.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_ror, permute_bpc.


(To top) shuffle_power

shuffle_power

This function calculates powers of shuffle.

Parameters:

0 ≤ sw1sw2ld_bits.
This function is essentially the same as bit_index_ror, albeit with slightly different parameters.

Characterization: Normal function.
Topics: Bit index operations, Shuffle and unshuffle.
Related functions: bit_index_ror, unshuffle_power, shuffle, permute_bpc.


(To top) unshuffle_power

unshuffle_power

This function calculates powers of unshuffle.

Parameters:

0 ≤ sw1sw2ld_bits.
This function is essentially the same as bit_index_ror, albeit with slightly different parameters.

Characterization: Normal function.
Topics: Bit index operations, Shuffle and unshuffle.
Related functions: bit_index_ror, shuffle_power, unshuffle, permute_bpc.


(To top) permute_bpc

permute_bpc

This function calculates general BPC permutations.

Parameters:

To generate the reverse permutation you may use invert_bpc.

Characterization: Normal function.
Topics: Generalized bit reversal, Bit index operations, BPC permutations.
Related functions: bit_index_complement, bit_index_swap, bit_index_swap_complement, general_reverse_bits, invert_bpc.


(To top) invert_bpc

invert_bpc

This function calculates the index vector and complement set as needed as parameters for the inverse BPC permutations.

Parameters:

To apply the resulting permutation parameters you may use permute_bpc.

Characterization: Auxiliary function.
Topics: Generalized bit reversal, Bit index operations, BPC permutations.
Related functions: bit_index_complement, bit_index_swap, bit_index_swap_complement, general_reverse_bits, permute_bpc, invert_perm.


(To top) general_reverse_bits

general_reverse_bits

Swap all subwords of given levels.

Parameters:

Characterization: Normal function.
Topic: Generalized bit reversal.
Related functions: bswap, prim_swap, bit_index_complement, permute_bpc.


(To top) bswap

bswap

This function exchanges the byte order and is typically implementable by an assembler instruction.

Parameters:

= general_reverse_bits(x,~7)
On x86 the assembler instructions BSWAP and MOVBE do the job and act on 32 and 64 bit arguments. For 16 bit you can use ROL ax,16 or BSWAP al,ah or similar instead. For SSE usage (128 bit) you can apply the PSHUFB instruction.

Characterization: Auxiliary function.
Topic: Generalized bit reversal.
Related functions: rol, general_reverse_bits, bit_index_complement.


(To top) prim_swap

prim_swap

Swap all denoted subwords of given levels.

Parameters:

The highest bit of m has the special function to invert the swapping step order.

Characterization: Normal function.
Topic: Swap by primitives.
Related functions: general_reverse_bits.


(To top) shuffle

shuffle

Shuffle/zip/interlace entities.

Parameters:

0 ≤ sw1 < sw2ld_bits.
This function is the inverse of unshuffle.

Characterization: Normal function.
Topic: Shuffle and unshuffle
Related functions: unshuffle, shuffle_power, unshuffle_power, bit_index_swap.


(To top) unshuffle

unshuffle

Unshuffle/unzip/uninterlace entities.

Parameters:

0 ≤ sw1 < sw2ld_bits
This function is the inverse of shuffle.

Characterization: Normal function.
Topic: Shuffle and unshuffle.
Related functions: shuffle, shuffle_power, unshuffle_power, bit_index_swap.


(To top) compress_mask*

compress_mask_right
compress_mask_left
compress_mask

Compress (gather) bits of the masks itself.

Parameters:

For all corresponding functions the following holds:
compress_mask*(m, …) = compress*(m,m, …)

Characterization: Normal function.
Topic: Compress and expand.
Related functions: expand*, expand_flip*.


(To top) gen_ce*

gen_ce_right
gen_ce_left

Generators for apply_compress* and apply_expand*.

Parameters:

Characterization: Function to generate a configuration.
Topic: Compress and expand.
Related functions: compress*, expand*, gen_cef*.


(To top) apply_compress*

apply_compress_right
apply_compress_left

Compress (gather) selected bits to the given direction using a precalculated configuration.

Parameters:

Characterization: Function needing a configuration.
Topic: Compress and expand.
Related functions: compress*, apply_expand*, compress_mask*.


(To top) apply_expand*

apply_expand_right
apply_expand_left

Expand (scatter) bits from the given direction to selected places using a precalculated configuration.

Parameters:

Characterization: Function needing a configuration.
Topic: Compress and expand.
Related functions: expand*, apply_compress*.


(To top) compress*

compress_right
compress_left
compress

Compress (gather) selected bits to the given direction.

Parameters:

Use gen_ce* and apply_compress* with the same direction if you use the same mask more than once.
The instruction for x86 processors named PEXT implements compress_right with sw=ld_bits.

Characterization: Normal function (compound, with temporary configuration).
Topic: Compress and expand.
Related functions: expand*, compress_flip*.


(To top) expand*

expand_right
expand_left
expand

Expand (scatter) bits from the given direction to selected places.

Parameters:

Use gen_ce* and apply_expand* with the same direction if you use the same mask more than once.
The instruction for x86 processors named PDEP implements expand_right with sw=ld_bits.

Characterization: Normal function (compound, with temporary configuration).
Topic: Compress and expand.
Related functions: compress*, expand_flip*.


(To top) butterfly

butterfly

One butterfly step/stage.

Parameters:

For the bit layout of m please look at the diagram in Butterfly network. The unused bits should be zero, i.e.
m & a_bfly_mask[sw] = m.

Characterization: Normal function.
Topic: Butterfly network.
Related functions: bfly, ibfly.


(To top) bfly

bfly

Used to apply configured butterfly network on x configured by gen_frot, gen_vrot or gen_cef*.

Parameters:

This function is the inverse of ibfly.

Characterization: Function needing a configuration.
Topic: Butterfly network.
Related functions: ibfly, butterfly.


(To top) ibfly

ibfly

Used to apply configured inverse butterfly network on x configured by gen_frot, gen_vrot or gen_cef*.

Parameters:

This function is the inverse of bfly.

Characterization: Function needing a configuration.
Topic: Butterfly network.
Related functions: bfly, butterfly.


(To top) bfly_parity

bfly_parity

Return the parity of a permutation given by a precalculated configuration of a butterfly network. This is false for even parity and true for odd parity.

Parameters:

Characterization: Function needing a configuration.
Topic: Butterfly network.
Related functions: bfly, butterfly.


(To top) gen_frot

gen_frot

Rotate all subwords in the given direction by the given amount.
Generators for bfly (rotate right, fror) and ibfly (rotate left, frol).

Parameters:

Characterization: Function to generate a configuration.
Topic: Rotate via butterfly.
Related functions: fror / frol / frot, gen_vrot.


(To top) gen_vrot

gen_vrot

Rotate all subwords in the given direction by the amount given in the corresponding subwords.
Generators for bfly (rotate right, vror_bfly) and ibfly (rotate left, vrol_bfly).

Parameters:

Characterization: Function to generate a configuration.
Topic: Rotate via butterfly.
Related functions: vror_bfly / vrol_bfly / vrot_bfly, gen_frot.


(To top) fror_bfly / frol_bfly / frot_bfly

fror_bfly
frol_bfly
frot_bfly

Rotate all subwords in the given direction: fror_bfly rotates right and frol_bfly rotates left.

Parameters:

Use gen_frot and bfly (rotate right) or ibfly (rotate left). if you use the same rotation value more than once.

The alternative routines fror / frol / frot are much faster and do not need any butterfly stuff.

Characterization: Normal function (compound, with temporary configuration).
Topic: Rotate via butterfly.
Related functions: vror / vrol / vrot, fror / frol / frot.


(To top) vror_bfly / vrol_bfly / vrot_bfly

vror_bfly
vrol_bfly
vrot_bfly

Rotate all subwords in the given direction by the amount given in the corresponding subwords: vror_bfly rotates right and vrol_bfly rotates left.

Parameters:

Use gen_vrot and bfly (rotate right) or ibfly (rotate left). if you use the same rotation values (rot) more than once.

Characterization: Normal function (compound, with temporary configuration).
Topic: Rotate via butterfly.
Related functions: vror / vrol / vrot, fror / frol / frot.


(To top) fror / frol / frot

fror
frol
frot

Rotate all subwords in the given direction by the given amount: fror rotates right and frol rotates left.

Parameters:

These routines return the same values as the corresponding routines with the name postfix _bfly (fror_bfly / frol_bfly / frot_bfly) but do not need any butterfly stuff and should be much faster.

Characterization: Normal function.
Topic: Rotate via butterfly.
Related functions: vror / vrol / vrot, fror_bfly / frol_bfly / frot_bfly, frorc / frolc / frotc.


(To top) frorc / frolc / frotc

frorc
frolc
frotc

Rotate and complement all subwords in the given direction by the given amount: frorc rotates right and frolc rotates left.

Parameters:

Characterization: Normal function.
Related functions: frolc, vror / vrol / vrot, fror / frol / frot.


(To top) vror / vrol / vrot

vror
vrol
vrot

Rotate all subwords in the given direction by the amount given in the corresponding subwords: vror rotates right and vrol rotates left.

Parameters:

Use gen_vrot and bfly (rotate right) or ibfly (rotate left). if you use the same rotation values (rot) more than once.

These routines return the same values as the corresponding routines with the name postfix _bfly (vror_bfly / vrol_bfly / vrot_bfly) but do not need any butterfly stuff and should be faster.

Characterization: Normal function (compound, with temporary configuration).
Topic: Rotate via butterfly.
Related functions: vror_bfly / vrol_bfly / vrot_bfly, fror / frol / frot.


(To top) gen_cef*

gen_cef_right
gen_cef_left

Generators for bfly (expand_flip*) and ibfly (compress_flip*).

Parameters:

Characterization: Function to generate a configuration.
Topic: Compress/expand-flip.
Related functions: compress_flip*, expand_flip*, gen_ce*.


(To top) compress_flip*

compress_flip_right
compress_flip_left
compress_flip

Compress (gather) selected bits to the given direction while stuffing the remaining ones in inverse order into the remaining places.

Parameters:

Use gen_cef* and ibfly if you use the same mask more than once.
This function is the inverse of expand_flip*,

Characterization: Normal function (compound, with temporary configuration).
Topic: Compress/expand-flip.
Related functions: expand_flip*, compress*.


(To top) expand_flip*

expand_flip_right
expand_flip_left
expand_flip

Expand (scatter) bits from the given direction to selected places while stuffing the remaining ones in inverse order into the remaining places.
This function is the inverse of expand_flip*,

Parameters:

Use gen_cef* and ibfly if you use the same mask more than once.

Characterization: Normal function (compound, with temporary configuration).
Topic: Compress/expand-flip.
Related functions: compress_flip*, expand*.


(To top) omega

omega

Simulate one omega step/stage.

Parameters:

Only the lower bits of the subwords of m are used. The remaining bits should be zero, i.e.
m & a_bfly_mask[sw-1] = m
This function is the inverse of flip*,

Characterization: Normal function.
Topic: Omega-flip network.
Related functions: flip.


(To top) flip

flip

Simulate one flip step/stage.

Parameters:

Only the lower bits of the subwords of m are used. The remaining bits should be zero, i.e.
m & a_bfly_mask[sw-1] = m
This function is the inverse of flip*,

Characterization: Normal function.
Topic: Omega-flip network.
Related functions: omega.


(To top) gen_benes

gen_benes

Generator for benes_fwd / benes_bwd.

Parameters:

This function effectively generates the configuration for a Beneš network which consists of a Butterfly network and a following inverse Butterfly network.

Characterization: Function to generate a configuration.
Topic: Beneš network.
Related functions: gen_benes_ex. benes_fwd / benes_bwd.


(To top) gen_benes_ex

gen_benes_ex

Generator for benes_fwd_ex / benes_bwd_ex.

Parameters:

This function effectively generates the configuration for a reordered Beneš network.
This function is fortunate if you make use of the reordering, otherwise the standard order in gen_benes suffices.

Characterization: Function to generate a configuration.
Topic: Beneš network.
Related functions: gen_benes. benes_fwd_ex / benes_bwd_ex.


(To top) benes_fwd / benes_bwd

benes_fwd
benes_bwd

Perform an arbitrary permutation using a precalculated configuration.

Parameters:

For benes_fwd the index array given to gen_benes selected source indexes, and for benes_bwd the index array selected target indexes.
These functions are the inverse of each other.

Characterization: Function needing a configuration.
Topic: Beneš network.
Related functions: gen_benes.


(To top) benes_fwd_ex / benes_bwd_ex

benes_fwd_ex
benes_bwd_ex

Perform an arbitrary permutation using a precalculated configuration.

Parameters:

For benes_fwd_ex the index array given to gen_benes_ex selected source indexes, and for benes_bwd_ex the index array selected target indexes.
These functions are the inverse of each other.

Characterization: Function needing a configuration.
Topic: Beneš network.
Related functions: gen_benes_ex.


(To top) benes_parity

benes_parity

Return the parity of an arbitrary permutation given by a precalculated configuration of a Beneš network. This is false for even parity and true for odd parity.

Parameters:

Characterization: Function needing a configuration.
Topic: Beneš network.
Related functions: gen_benes.


You may bookmark this page as http://programming.sirrida.de?perm_fn.html.
Last change: 2018-05-07