Function descriptions

Intro
Auxiliary routines
Bit index operations
Generalized bit reversal
- general_reverse_bits
- bswap
Swap by primitives
- prim_swap
Shuffle and unshuffle
- shuffle
- unshuffle
Compress and expand
- Compress bit masks
  - compress_mask*
- Generate configuration
  - gen_ce*
- Usage
  - apply_compress*
  - apply_expand*
- Compound
  - compress*
  - expand*
Butterfly network
Rotate via butterfly
- Generate configuration
  - gen_frot
  - gen_vrot
- Compound
  - fror_bfly / frol_bfly / frot_bfly
  - vror_bfly / vrol_bfly / vrot_bfly
SWAR rotate
Compress/expand-flip via butterfly
- Generate configuration
  - gen_cef*
- Compound
  - compress_flip*
  - expand_flip*
Omega/flip
- omega
- flip
Beneš network
- Generate configuration
  - gen_benes
  - gen_benes_ex
- Usage

Comments or questions? E-mail: info@sirrida.de

Intro

This is the description of several bit permutation and other miscellaneous routines as they appear in perm_bas.* which are used by the test programs testperm.* and the permutation code generator in source, calcperm.pas resp. calcperm.cpp.

`odd`

Is the given number odd? This is not a SWAR operation. This function is intrinsic in Pascal.

Parameters:

x: in t_int
Number to be analyzed.
result: out t_bool

Return true for an odd number and false for an even number.

Characterization: Auxiliary function.

`gcd`

Greatest common divisor. This is not a SWAR operation.

Parameters:

a: in t_longint
One value > 0.
b: in t_longint
One value > 0.
result: out t_longint

Calculate the greatest common divisor of a and b.

Characterization: Auxiliary function.

`mul_inv`

Calculate multiplicative inverse. This is not a SWAR operation.

Parameters:

x: in t_bits
The value to be treaded.
result: out t_bits

Calculate multiplicative inverse, i.e. mul_inv(x)*x==1. The result is defined for all odd values of x. For even x we simply return 0.

Characterization: Auxiliary function.

`rol`

Rotate left a complete word.

Parameters:

x: in t_bits
The value to be rotated.
rot: in t_int
The rotate count. Negative values will rotate right.
Since this operation is cyclic by the operation size, only the low ld_bits bits are needed, and there is no limit on the value.
result: out t_bits

Rotate x rot times to the left.

This should be optimized by the compiler to a single assembly instruction if inlined.

Characterization: Auxiliary function.
Related functions: rol_lo, bswap.

`rol_lo`

Rotate left a single subword. This is not a SWAR operation.

Parameters:

x: in t_bits
The value to be rotated.
rot: in t_int
The rotate count. Negative values will rotate/complement right.
Since this operation is cyclic by 2^sw, only the low sw bits are needed and there is no limit on the value.
sw: in t_subword
log₂ of the number of affected bits. Range: 0…ld_bits.
result: out t_bits

Rotate x rot times to the left.

Only the low 2^sw bits of x are processed and returned; the remaining bits are zeroed.

Characterization: Auxiliary function.
Related functions: rol, rolc_lo.

`rolc_lo`

Rotate left and complement a single subword. This is not a SWAR operation.

Parameters:

x: in t_bits
The value to be rotated.
rot: in t_int
The rotate count. Negative values will rotate/complement right.
Since this operation is cyclic by 2^sw+1, only the low sw+1 bits are needed and there is no limit on the value. The +1 is due to the complementation.
sw: in t_subword
log₂ of the number of affected bits. Range: 0…ld_bits.
result: out t_bits

Rotate x rot times to the left and complement each bit as it is rotated. In other words: Each bit that falls out is inverted and re-inserted on the other end.

Only the low 2^sw bits of x are processed and returned; the remaining bits are zeroed.

Characterization: Auxiliary function.
Related functions: rol_lo, frolc.

`gray_code`

Convert a number to Gray code. This is not a SWAR operation.

Parameters:

x: in t_bits
The value to be treaded.
result: out t_bits

Convert a number to Gray code. This is the inverse of inverse_gray_code.

Characterization: Auxiliary function.
Related functions: inverse_gray_code.

`inverse_gray_code`

Convert a number to inverse Gray code. This is not a SWAR operation.

Parameters:

x: in t_bits
The value to be treaded.
result: out t_bits

Convert a number to the inverse Gray code. This is the inverse of gray_code.

Characterization: Auxiliary function.
Related functions: gray_code.

`nr_1bits`

Count the set bits. Also known as population count.

Parameters:

x: in t_bits
The value to be analyzed.
result: out t_int

Newer x86 processors know this operation as POPCNT.
A number x has even/odd parity if nr_1bits(x) is even/odd.

Characterization: Auxiliary function.

`nr_leading_0bits`

This function counts the leading zero bits of x, i.e. the length of the contiguous string of zero bits from the most significant toward the least significant bit.

Parameters:

x: in t_bits
The value to be analyzed.
result: out t_int

The result is bits if x is zero.
Newer x86 processors know this operation as LZCNT (which has a dedicated CPUID feature flag). It can easily be emulated with the older instruction BSR and some glue code.

Characterization: Auxiliary function.
Related functions: nr_trailing_0bits.

`nr_trailing_0bits`

This function counts the trailing zero bits of x, i.e. the length of the contiguous string of zero bits from the least significant toward the most significant bit.

Parameters:

x: in t_bits
The value to be analyzed.
result: out t_int

The result is bits if x is zero.
This operation is part of the BMI1 instruction set and named TZCNT. The older BSF instruction almost does the same (albeit with an undefined behavior for x=0).

Characterization: Auxiliary function.
Related functions: nr_leading_0bits.

`is_contiguous_1bits`

This function analyzes whether x seen as a bit string contains maximally one contiguous string of one bits.

Parameters:

x: in t_bits
The value to be analyzed.
result: out t_bool
true only if x is zero or contains only one contiguous string of one bits.

Characterization: Auxiliary function.

`tbm`

This function performs general trailing bit modifications, i.e. manipulation of rightmost bits. See manipulating rightmost bits for a detailed description.

Parameters:

x: in t_bits
The value to be acted upon.
mode: in t_int
The function index (mode) to be applied.
result: out t_bool

Characterization: Auxiliary function.

`blend`

This is the bit equivalent to if m then x else y (Pascal) or m?x:y (C/C++).

Parameters:

m: in t_bits
The blending mask.
x: in t_bits
The source bits selected when the mask bits are 1.
y: in t_bits
The source bits selected when the mask bits are 0.
result: out t_bits

Characterization: Auxiliary function.

Here is the implementation:

  t_bits blend(t_bits m, t_bits x, t_bits y) {
    return (m & x) | (~m & y);
    }

`simd_odd`

Set mask denoting odd subword values, i.e. propagate each least significant bit to the left.

Parameters:

x: in t_bits
The source.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.
result: out t_bits

Characterization: Auxiliary function.

`bit_permute_step`

This is a permutation primitive. It is also called delta swap.

Parameters:

x: in t_bits
The value to be acted upon.
m: in t_bits
The mask.
shift: in t_uint
The shift value. Range: 0…bits-1.
result: out t_bits

Characterization: Auxiliary function.
Related functions: bit_permute_step_simple, bit_permute_step2.

`bit_permute_step_simple`

This is a permutation primitive.

Parameters:

x: in t_bits
The value to be acted upon.
m: in t_bits
The mask.
shift: in t_uint
The shift value. Range: 0…bits-1.
result: out t_bits

The subset of bits of x masked by m are exchanged with the ones shifted left by shift.
To work as described, the mask and the shifted mask must not have bits in common, i.e. m & (m << shift) == 0 and no bits must be shifted away, i.e. ((m << shift) >> shift) == m.
Also all relevant bits n must be affected:
nr_1bits(bit_permute_step_simple(n,mask,shift)) = nr_1bits(n)
or
((m ^ (m << shift)) & relevant_bits) == relevant_bits
since all bits not caught by m and m << shift are cleared.
If all this is not the case the needed function is bit_permute_step.
On superscalar x86 processors a proper implementation of this routine typically needs 4 cycles. For future processors it might be calculated in 3 cycles when RORX (BMI2) is used or when MOV instructions cost no latency. Superscalar ARM processors need 2 at least cycles.

Characterization: Auxiliary function.
Related functions: bit_permute_step, bit_permute_step2.

Here is the implementation:

  t_bits bit_permute_step_simple(t_bits x, t_bits m, t_uint shift) {
    return ((x & m) << shift) | ((x >> shift) & m);
    }

An implementation in x86 assembler with constant mask and shift (eax ⇒ eax):

  mov edx,eax
  shr eax,SHIFT
  and edx,MASK
  and eax,MASK
  shl edx,SHIFT
  or eax,edx

`bit_permute_step2`

This is a permutation primitive. It is also called delta swap. It can be used to extend to functionality of bit_permute_step to multiple words.

Parameters:

x1: in out t_bits
The first value to be acted upon.
x2: in out t_bits
The second value to be acted upon.
m: in t_bits
The mask.
shift: in t_uint
The shift value. Range: 0…bits-1.

The subset of bits of *x1 masked by m are exchanged with the ones of *x2 masked by m << shift.
To work as described, no bits must be shifted away, i.e. ((m << shift) >> shift) == m.
If x1 == x2 (same address), the mask and the shifted mask must not have bits in common, i.e. m & (m << shift) == 0; also the function bit_permute_step is a simpler replacement in this case.
If also m ^ (m << shift) == all_bits, the function bit_permute_step_simple is a yet simpler replacement.

Characterization: Auxiliary function.
Related functions: bit_permute_step, bit_permute_step_simple.

`identity_perm`

This function supplies the identity index vector, i.e. for all i: tgt[i]=i.

Parameters:

tgt: out ta_subword
The index vector (0…ld_bits-1) for the identity permutation. Range: 0…ld_bits-1.

To apply the resulting permutation parameters you may use gen_benes.

Characterization: Auxiliary function.
Topics:
Related functions: gen_benes, invert_bpc, random_perm.

`invert_perm`

This function calculates the index vector as needed as a parameter for the generator of the inverse permutation with a Beneš network.

Parameters:

src: in ta_subword
The original index vector (0…ld_bits-1). Range: 0…ld_bits-1, or -1 (no_index) for unspecified. All specified values must be different.
tgt: out ta_subword
The index vector (0…ld_bits-1) for the inverted permutation. Range: 0…ld_bits-1.

To apply the resulting permutation parameters you may use gen_benes.

Characterization: Auxiliary function.
Topics:
Related functions: gen_benes, identity_perm, invert_bpc, random_perm.

`random_perm`

This function calculates a randomized index vector as needed e.g. as a parameter for the Beneš network generator.

Parameters:

tgt: out ta_subword
The index vector (0…ld_bits-1) for a randomized permutation. Range: 0…ld_bits-1.

To apply the resulting permutation parameters you may use gen_benes.

Characterization: Auxiliary function.
Topics:
Related functions: gen_benes, identity_perm, invert_perm.

`used_source_bits`

This function collects the referenced source indexes from the given index vector as a bit set.

Parameters:

perm: in ta_subword
The original index vector (0…ld_bits-1). Range: 0…ld_bits-1, or -1 (no_index) for unspecified. All specified values must be different.
result: out t_bits
The set of used source indexes.

Characterization: Auxiliary function.
Topics:
Related functions: identity_perm, invert_bpc, used_target_bits.

`used_target_bits`

This function collects target indexes, i.e. the positions of specified (i.e. ≠ no_index) indexes, from the given index vector as a bit set.

Parameters:

perm: in ta_subword
The original index vector (0…ld_bits-1). Range: 0…ld_bits-1, or -1 (no_index) for unspecified. All specified values must be different.
result: out t_bits
The set of specified indexes.

Characterization: Auxiliary function.
Topics:
Related functions: identity_perm, invert_bpc, used_target_bits.

`bit_index_complement`

Complement one bit index.

Parameters:

x: in t_bits
The source value.
k: in t_subword
The offending bit index. Range: 0…ld_bits-1.
result: out t_bits

This is effectively the building block of general_reverse_bits.
On superscalar x86 processors a proper implementation of this routine typically needs 3…4 cycles.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_swap, bit_index_swap_complement, general_reverse_bits, bswap, permute_bpc.

`bit_index_swap`

Exchange two bit indexes.

Parameters:

x: in t_bits
The source value.
j: in t_subword
The first offending bit index. Range: 0…ld_bits-1.
k: in t_subword
The second offending bit index. Range: 0…ld_bits-1.
result: out t_bits

This is effectively the building block of shuffle, unshuffle, and bit_index_ror.
On superscalar x86 processors a proper implementation of this routine typically needs 5 cycles.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_complement, bit_index_swap_complement, bit_index_ror, transpose, shuffle, unshuffle, shuffle_power, unshuffle_power, permute_bpc.

`bit_index_swap_complement`

Exchange and complement two bit indexes.

Parameters:

x: in t_bits
The source value.
j: in t_subword
The first offending bit index. Range: 0…ld_bits-1.
k: in t_subword
The second offending bit index. Range: 0…ld_bits-1.
result: out t_bits

This function combines bit_index_swap and two times bit_index_complement.
On superscalar x86 processors a proper implementation of this routine typically needs 5 cycles.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_complement, bit_index_swap, permute_bpc.

`bit_index_ror`

Rotate an bit index field to the right.

Parameters:

x: in t_bits
The source value.
ofs: in t_subword
The number of lower bit index to be left unchanged (entities). Range: 0…ld_bits-1.
field: in t_subword
The number of bit indexes to be rotated. Range: 0…ld_bits-1.
rot: in t_int
The rotate count. Since this operation is cyclic modulo field there is no practical limit on the value.
result: out t_bits

0 ≤ field+ofs ≤ ld_bits.
The bit index field is rotated to the right by rot. All other bit indexes are left unchanged.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_swap, transpose, shuffle_power, unshuffle_power, shuffle, unshuffle, permute_bpc.

`transpose`

Transpose matrixes.

Parameters:

x: in t_bits
The source value.
ld_fields: in t_subword
log₂ of the width of the bit fields (entities) to be moved. Range: 0…ld_bits-1.
ld_col: in t_subword
log₂ of the matrix width (columns) of the matrix. Range: 0…ld_bits-1.
ld_row: in t_subword
log₂ of the matrix height (rows) of the matrix. Range: 0…ld_bits-1.
result: out t_bits

0 ≤ sw_entities+ld_row+ld_col ≤ ld_bits.
This function is essentially the same as bit_index_ror, albeit with slightly different parameters.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_ror, permute_bpc.

`shuffle_power`

This function calculates powers of shuffle.

Parameters:

x: in t_bits
The source value.
sw1: in t_subword
log₂ of the width of the bit fields (entities) to be moved. Range: 0…ld_bits-1.
sw2: in t_subword
log₂ of affected bit fields. Range: 0…ld_bits-1.
pwr: in t_int
The number of shuffle operations to be emulated. Since this operation is cyclic modulo field there is no practical limit on the value.
result: out t_bits

0 ≤ sw1 ≤ sw2 ≤ ld_bits.
This function is essentially the same as bit_index_ror, albeit with slightly different parameters.

Characterization: Normal function.
Topics: Bit index operations, Shuffle and unshuffle.
Related functions: bit_index_ror, unshuffle_power, shuffle, permute_bpc.

`unshuffle_power`

This function calculates powers of unshuffle.

Parameters:

x: in t_bits
The source value.
sw1: in t_subword
log₂ of the width of the bit fields (entities) to be moved. Range: 0…ld_bits-1.
sw2: in t_subword
log₂ of affected bit fields. Range: 0…ld_bits-1.
pwr: in t_int
The number of unshuffle operations to be emulated. Since this operation is cyclic modulo field there is no practical limit on the value.
result: out t_bits

0 ≤ sw1 ≤ sw2 ≤ ld_bits.
This function is essentially the same as bit_index_ror, albeit with slightly different parameters.

Characterization: Normal function.
Topics: Bit index operations, Shuffle and unshuffle.
Related functions: bit_index_ror, shuffle_power, unshuffle, permute_bpc.

`permute_bpc`

This function calculates general BPC permutations.

Parameters:

x: in t_bits
The source value.
tgt: in ta_subword
The vector (0…ld_bits-1) of target bit indexes. Range: 0…ld_bits-1. All ld_bits must be different.
k: in t_subword_set
The bit set of inversions (set of t_subword).
result: out t_bits

To generate the reverse permutation you may use invert_bpc.

Characterization: Normal function.
Topics: Generalized bit reversal, Bit index operations, BPC permutations.
Related functions: bit_index_complement, bit_index_swap, bit_index_swap_complement, general_reverse_bits, invert_bpc.

`invert_bpc`

This function calculates the index vector and complement set as needed as parameters for the inverse BPC permutations.

Parameters:

src: in ta_subword
The original index vector (0…ld_bits-1). Range: 0…ld_bits-1. All ld_bits must be different.
src_k: in t_subword_set
The original bit set of inversions.
tgt: out ta_subword
The index vector (0…ld_bits-1) for the inverted permutation. Range: 0…ld_bits-1.
tgt_k: out t_subword_set
The bit set of inversions for the inverted permutation (set of t_subword).

To apply the resulting permutation parameters you may use permute_bpc.

Characterization: Auxiliary function.
Topics: Generalized bit reversal, Bit index operations, BPC permutations.
Related functions: bit_index_complement, bit_index_swap, bit_index_swap_complement, general_reverse_bits, permute_bpc, invert_perm.

`general_reverse_bits`

Swap all subwords of given levels.

Parameters:

x: in t_bits
The source value.
k: in t_int
The bit set of levels.
result: out t_bits

Characterization: Normal function.
Topic: Generalized bit reversal.
Related functions: bswap, prim_swap, bit_index_complement, permute_bpc.

`bswap`

This function exchanges the byte order and is typically implementable by an assembler instruction.

Parameters:

x: in t_bits
The value to be acted upon.
result: out t_bits

= general_reverse_bits(x,~7)
On x86 the assembler instructions BSWAP and MOVBE do the job and act on 32 and 64 bit arguments. For 16 bit you can use ROL ax,16 or BSWAP al,ah or similar instead. For SSE usage (128 bit) you can apply the PSHUFB instruction.

Characterization: Auxiliary function.
Topic: Generalized bit reversal.
Related functions: rol, general_reverse_bits, bit_index_complement.

`prim_swap`

Swap all denoted subwords of given levels.

Parameters:

x: in t_bits
The source value.
m: in t_bits
The bit set marking the subwords to be swapped.
result: out t_bits

The highest bit of m has the special function to invert the swapping step order.

Characterization: Normal function.
Topic: Swap by primitives.
Related functions: general_reverse_bits.

`shuffle`

Shuffle/zip/interlace entities.

Parameters:

x: in t_bits
The source value.
sw1: in t_subword
log₂ of the length of the entities to move. This describes the size of what to interlace.
sw2: in t_subword
log₂ of the (independent) areas. This describes the size of where to interlace.
result: out t_bits

0 ≤ sw1 < sw2 ≤ ld_bits.
This function is the inverse of unshuffle.

Characterization: Normal function.
Topic: Shuffle and unshuffle
Related functions: unshuffle, shuffle_power, unshuffle_power, bit_index_swap.

`unshuffle`

Unshuffle/unzip/uninterlace entities.

Parameters:

x: in t_bits
The source value.
sw1: in t_subword
log₂ of the length of the entities to move. This describes the size of what to uninterlace.
sw2: in t_subword
log₂ of the (independent) areas. This describes the size of where to uninterlace.
result: out t_bits

0 ≤ sw1 < sw2 ≤ ld_bits
This function is the inverse of shuffle.

Characterization: Normal function.
Topic: Shuffle and unshuffle.
Related functions: shuffle, shuffle_power, unshuffle_power, bit_index_swap.

`compress_mask*`

`compress_mask_right`
`compress_mask_left`
`compress_mask`

Compress (gather) bits of the masks itself.

Parameters:

m: in t_bits
The mask whose bits shall be compressed.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.
d: in t_direction // only for compress_mask
The direction as a parameter.
result: out t_bits
The compressed mask. The other bits are zeroed.

For all corresponding functions the following holds:
compress_mask*(m, …) = compress*(m,m, …)

Characterization: Normal function.
Topic: Compress and expand.
Related functions: expand*, expand_flip*.

`gen_ce*`

`gen_ce_right`
`gen_ce_left`

Generators for apply_compress* and apply_expand*.

Parameters:

self: out tr_bfly
Resulting configuration.
m: in t_bits
The mask of bits marking the places to be extracted from or to be deposited at.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.

Characterization: Function to generate a configuration.
Topic: Compress and expand.
Related functions: compress*, expand*, gen_cef*.

`apply_compress*`

`apply_compress_right`
`apply_compress_left`

Compress (gather) selected bits to the given direction using a precalculated configuration.

Parameters:

self: in tr_bfly
Configuration as calculated by gen_ce*.
x: in t_bits
The value where the bits shall be extracted from.
result: out t_bits
The extracted bits. The other bits are zeroed.

Characterization: Function needing a configuration.
Topic: Compress and expand.
Related functions: compress*, apply_expand*, compress_mask*.

`apply_expand*`

`apply_expand_right`
`apply_expand_left`

Expand (scatter) bits from the given direction to selected places using a precalculated configuration.

Parameters:

self: in tr_bfly
Configuration as calculated by gen_ce*.
x: in t_bits
The value where the bits shall be taken from.
result: out t_bits
The deposited bits. The other bits are zeroed.

Characterization: Function needing a configuration.
Topic: Compress and expand.
Related functions: expand*, apply_compress*.

`compress*`

`compress_right`
`compress_left`
`compress`

Compress (gather) selected bits to the given direction.

Parameters:

x: in t_bits
The value where the bits shall be extracted from.
m: in t_bits
The mask of bits marking the places to be extracted from.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.
d: in t_direction // only for compress
The direction as a parameter.
result: out t_bits
The extracted bits. The other bits are zeroed.

Use gen_ce* and apply_compress* with the same direction if you use the same mask more than once.
The instruction for x86 processors named PEXT implements compress_right with sw=ld_bits.

Characterization: Normal function (compound, with temporary configuration).
Topic: Compress and expand.
Related functions: expand*, compress_flip*.

`expand*`

`expand_right`
`expand_left`
`expand`

Expand (scatter) bits from the given direction to selected places.

Parameters:

x: in t_bits
The value where the bits shall be taken from.
m: in t_bits
The mask of bits marking the places to be deposited at.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.
d: in t_direction // only for expand
The direction as a parameter.
result: out t_bits
The deposited bits. The other bits are zeroed.

Use gen_ce* and apply_expand* with the same direction if you use the same mask more than once.
The instruction for x86 processors named PDEP implements expand_right with sw=ld_bits.

Characterization: Normal function (compound, with temporary configuration).
Topic: Compress and expand.
Related functions: compress*, expand_flip*.

`butterfly`

One butterfly step/stage.

Parameters:

x: in t_bits
The source value.
m: in t_bits
This is the butterfly steering mask.
sw: in t_subword
This is the butterfly stage number. Range: 0…ld_bits-1.
result: out t_bits
The shuffled bits.

For the bit layout of m please look at the diagram in Butterfly network. The unused bits should be zero, i.e.
m & a_bfly_mask[sw] = m.

Characterization: Normal function.
Topic: Butterfly network.
Related functions: bfly, ibfly.

`bfly`

Used to apply configured butterfly network on x configured by gen_frot, gen_vrot or gen_cef*.

Parameters:

self: in tr_bfly
The [inverse] butterfly configuration.
x: in t_bits
The source value.
result: out t_bits

This function is the inverse of ibfly.

Characterization: Function needing a configuration.
Topic: Butterfly network.
Related functions: ibfly, butterfly.

`ibfly`

Used to apply configured inverse butterfly network on x configured by gen_frot, gen_vrot or gen_cef*.

Parameters:

self: in tr_bfly
The [inverse] butterfly configuration.
x: in t_bits
The source value.
result: out t_bits

This function is the inverse of bfly.

Characterization: Function needing a configuration.
Topic: Butterfly network.
Related functions: bfly, butterfly.

`bfly_parity`

Return the parity of a permutation given by a precalculated configuration of a butterfly network. This is false for even parity and true for odd parity.

Parameters:

self: in tr_bfly
Configuration as calculated by a suitable generator function.
result: out t_bool
The (odd) parity.

Characterization: Function needing a configuration.
Topic: Butterfly network.
Related functions: bfly, butterfly.

`gen_frot`

Rotate all subwords in the given direction by the given amount.
Generators for bfly (rotate right, fror) and ibfly (rotate left, frol).

Parameters:

self: out tr_bfly
Resulting configuration.
rot: in t_int
The rotate count.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.

Characterization: Function to generate a configuration.
Topic: Rotate via butterfly.
Related functions: fror / frol / frot, gen_vrot.

`gen_vrot`

Rotate all subwords in the given direction by the amount given in the corresponding subwords.
Generators for bfly (rotate right, vror_bfly) and ibfly (rotate left, vrol_bfly).

Parameters:

self: out tr_bfly
Resulting configuration.
rot: in t_bits
The "mask" containing subwords holding the rotate counts for the corresponding subwords of the usage routine's source.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.

Characterization: Function to generate a configuration.
Topic: Rotate via butterfly.
Related functions: vror_bfly / vrol_bfly / vrot_bfly, gen_frot.

`fror_bfly` / `frol_bfly` / `frot_bfly`

`fror_bfly`
`frol_bfly`
`frot_bfly`

Rotate all subwords in the given direction: fror_bfly rotates right and frol_bfly rotates left.

Parameters:

x: in t_bits
The source value.
rot: in t_int
The rotate count.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.
d: in t_direction // only for frot
The direction as a parameter.
result: out t_bits

Use gen_frot and bfly (rotate right) or ibfly (rotate left). if you use the same rotation value more than once.

The alternative routines fror / frol / frot are much faster and do not need any butterfly stuff.

Characterization: Normal function (compound, with temporary configuration).
Topic: Rotate via butterfly.
Related functions: vror / vrol / vrot, fror / frol / frot.

`vror_bfly` / `vrol_bfly` / `vrot_bfly`

`vror_bfly`
`vrol_bfly`
`vrot_bfly`

Rotate all subwords in the given direction by the amount given in the corresponding subwords: vror_bfly rotates right and vrol_bfly rotates left.

Parameters:

x: in t_bits
The source value.
rot: in t_bits
The "mask" containing subwords holding the rotate counts for the corresponding subwords of x.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.
d: in t_direction // only for vrot_bfly
The direction as a parameter.
result: out t_bits

Use gen_vrot and bfly (rotate right) or ibfly (rotate left). if you use the same rotation values (rot) more than once.

Characterization: Normal function (compound, with temporary configuration).
Topic: Rotate via butterfly.
Related functions: vror / vrol / vrot, fror / frol / frot.

`fror` / `frol` / `frot`

`fror`
`frol`
`frot`

Rotate all subwords in the given direction by the given amount: fror rotates right and frol rotates left.

Parameters:

x: in t_bits
The source value.
rot: in t_int
The rotate count.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.
d: in t_direction // only for frot
The direction as a parameter.
result: out t_bits

These routines return the same values as the corresponding routines with the name postfix _bfly (fror_bfly / frol_bfly / frot_bfly) but do not need any butterfly stuff and should be much faster.

Characterization: Normal function.
Topic: Rotate via butterfly.
Related functions: vror / vrol / vrot, fror_bfly / frol_bfly / frot_bfly, frorc / frolc / frotc.

`frorc` / `frolc` / `frotc`

`frorc`
`frolc`
`frotc`

Rotate and complement all subwords in the given direction by the given amount: frorc rotates right and frolc rotates left.

Parameters:

x: in t_bits
The source value.
rot: in t_int
The rotate count.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.
d: in t_direction // only for frotc
The direction as a parameter.
result: out t_bits

Characterization: Normal function.
Related functions: frolc, vror / vrol / vrot, fror / frol / frot.

`vror` / `vrol` / `vrot`

`vror`
`vrol`
`vrot`

Rotate all subwords in the given direction by the amount given in the corresponding subwords: vror rotates right and vrol rotates left.

Parameters:

x: in t_bits
The source value.
rot: in t_bits
The "mask" containing subwords holding the rotate counts for the corresponding subwords of x.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.
d: in t_direction // only for vrot
The direction as a parameter.
result: out t_bits

Use gen_vrot and bfly (rotate right) or ibfly (rotate left). if you use the same rotation values (rot) more than once.

These routines return the same values as the corresponding routines with the name postfix _bfly (vror_bfly / vrol_bfly / vrot_bfly) but do not need any butterfly stuff and should be faster.

Characterization: Normal function (compound, with temporary configuration).
Topic: Rotate via butterfly.
Related functions: vror_bfly / vrol_bfly / vrot_bfly, fror / frol / frot.

`gen_cef*`

`gen_cef_right`
`gen_cef_left`

Generators for bfly (expand_flip*) and ibfly (compress_flip*).

Parameters:

self: out tr_bfly
Resulting configuration.
m: in t_bits
The mask of bits marking the places to be extracted from or to be deposited at.
The other bits will be stored in inverse order into the remaining places.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.

Characterization: Function to generate a configuration.
Topic: Compress/expand-flip.
Related functions: compress_flip*, expand_flip*, gen_ce*.

`compress_flip*`

`compress_flip_right`
`compress_flip_left`
`compress_flip`

Compress (gather) selected bits to the given direction while stuffing the remaining ones in inverse order into the remaining places.

Parameters:

x: in t_bits
The source value.
m: in t_bits
The mask of bits marking the places to be extracted from.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.
d: in t_direction // only for compress_flip
The direction as a parameter.
result: out t_bits

Use gen_cef* and ibfly if you use the same mask more than once.
This function is the inverse of expand_flip*,

Characterization: Normal function (compound, with temporary configuration).
Topic: Compress/expand-flip.
Related functions: expand_flip*, compress*.

`expand_flip*`

`expand_flip_right`
`expand_flip_left`
`expand_flip`

Expand (scatter) bits from the given direction to selected places while stuffing the remaining ones in inverse order into the remaining places.
This function is the inverse of expand_flip*,

Parameters:

x: in t_bits
The source value.
m: in t_bits
The mask of bits marking the places to be deposited at.
sw: in t_subword
This is log₂ of the subword size. Range: 0…ld_bits.
d: in t_direction // only for expand_flip
The direction as a parameter.
result: out t_bits

Use gen_cef* and ibfly if you use the same mask more than once.

Characterization: Normal function (compound, with temporary configuration).
Topic: Compress/expand-flip.
Related functions: compress_flip*, expand*.

`omega`

Simulate one omega step/stage.

Parameters:

x: in t_bits
The source value.
m: in t_bits
This is the butterfly steering mask.
sw: in t_subword
This is the butterfly stage number. Range: 0…ld_bits-1.
result: out t_bits

Only the lower bits of the subwords of m are used. The remaining bits should be zero, i.e.
m & a_bfly_mask[sw-1] = m
This function is the inverse of flip*,

Characterization: Normal function.
Topic: Omega-flip network.
Related functions: flip.

`flip`

Simulate one flip step/stage.

Parameters:

x: in t_bits
The source value.
m: in t_bits
This is the butterfly steering mask.
sw: in t_subword
This is the butterfly stage number. Range: 0…ld_bits-1.
result: out t_bits

Only the lower bits of the subwords of m are used. The remaining bits should be zero, i.e.
m & a_bfly_mask[sw-1] = m
This function is the inverse of flip*,

Characterization: Normal function.
Topic: Omega-flip network.
Related functions: omega.

`gen_benes`

Generator for benes_fwd / benes_bwd.

Parameters:

self: out tr_benes
Resulting configuration.
c_tgt: in ta_index
The array of source bit indexes for each target bit. The indexes must be ≥0. This bit index array must not contain dupes.

This function effectively generates the configuration for a Beneš network which consists of a Butterfly network and a following inverse Butterfly network.

Characterization: Function to generate a configuration.
Topic: Beneš network.
Related functions: gen_benes_ex. benes_fwd / benes_bwd.

`gen_benes_ex`

Generator for benes_fwd_ex / benes_bwd_ex.

Parameters:

self: out tr_benes
Resulting configuration.
c_tgt: in ta_index
The array of source bit indexes for each target bit. The indexes may be -1 to denote "don't care" entries. This bit index array must not contain dupes.
a_stage: in ta_subword
The stage order of the Butterfly network; and the inverse order of the inverse Butterfly network. The stages must not contain dupes.

This function effectively generates the configuration for a reordered Beneš network.
This function is fortunate if you make use of the reordering, otherwise the standard order in gen_benes suffices.

Characterization: Function to generate a configuration.
Topic: Beneš network.
Related functions: gen_benes. benes_fwd_ex / benes_bwd_ex.

`benes_fwd` / `benes_bwd`

`benes_fwd`
`benes_bwd`

Perform an arbitrary permutation using a precalculated configuration.

Parameters:

self: in tr_benes
Configuration as calculated by gen_benes.
x: in t_bits
The value where the source bits shall be taken from.
result: out t_bits
The permuted bits.

For benes_fwd the index array given to gen_benes selected source indexes, and for benes_bwd the index array selected target indexes.
These functions are the inverse of each other.

Characterization: Function needing a configuration.
Topic: Beneš network.
Related functions: gen_benes.

`benes_fwd_ex` / `benes_bwd_ex`

`benes_fwd_ex`
`benes_bwd_ex`

Perform an arbitrary permutation using a precalculated configuration.

Parameters:

self: in tr_benes
Configuration as calculated by gen_benes_ex.
x: in t_bits
The value where the source bits shall be taken from.
result: out t_bits
The permuted bits.

For benes_fwd_ex the index array given to gen_benes_ex selected source indexes, and for benes_bwd_ex the index array selected target indexes.
These functions are the inverse of each other.

Characterization: Function needing a configuration.
Topic: Beneš network.
Related functions: gen_benes_ex.

`benes_parity`

Return the parity of an arbitrary permutation given by a precalculated configuration of a Beneš network. This is false for even parity and true for odd parity.

Parameters:

self: in tr_benes
Configuration as calculated by gen_benes.
result: out t_bool
The (odd) parity.

Characterization: Function needing a configuration.
Topic: Beneš network.
Related functions: gen_benes.

You may bookmark this page as http://programming.sirrida.de?perm_fn.html.
Last change: 2018-05-07

Function descriptions

Contents

Intro

odd

odd

gcd

gcd

mul_inv

mul_inv

rol

rol

rol_lo

rol_lo

rolc_lo

rolc_lo

gray_code

gray_code

inverse_gray_code

inverse_gray_code

nr_1bits

nr_1bits

nr_leading_0bits

nr_leading_0bits

nr_trailing_0bits

nr_trailing_0bits

is_contiguous_1bits

is_contiguous_1bits

tbm

tbm

blend

blend

simd_odd

simd_odd

bit_permute_step

bit_permute_step

bit_permute_step_simple

bit_permute_step_simple

bit_permute_step2

bit_permute_step2

identity_perm

identity_perm

invert_perm

invert_perm

random_perm

random_perm

used_source_bits

used_source_bits

used_target_bits

used_target_bits

bit_index_complement

bit_index_complement

bit_index_swap

bit_index_swap

bit_index_swap_complement

bit_index_swap_complement

bit_index_ror

bit_index_ror

transpose

transpose

shuffle_power

shuffle_power

unshuffle_power

unshuffle_power

permute_bpc

permute_bpc

invert_bpc

invert_bpc

general_reverse_bits

general_reverse_bits

bswap

bswap

prim_swap

prim_swap

shuffle

shuffle

unshuffle

unshuffle

compress_mask*

compress_mask_right compress_mask_left compress_mask

gen_ce*

`odd`

`odd`

`gcd`

`gcd`

`mul_inv`

`mul_inv`

`rol`

`rol`

`rol_lo`

`rol_lo`

`rolc_lo`

`rolc_lo`

`gray_code`

`gray_code`

`inverse_gray_code`

`inverse_gray_code`

`nr_1bits`

`nr_1bits`

`nr_leading_0bits`

`nr_leading_0bits`

`nr_trailing_0bits`

`nr_trailing_0bits`

`is_contiguous_1bits`

`is_contiguous_1bits`

`tbm`

`tbm`

`blend`

`blend`

`simd_odd`

`simd_odd`

`bit_permute_step`

`bit_permute_step`

`bit_permute_step_simple`

`bit_permute_step_simple`

`bit_permute_step2`

`bit_permute_step2`

`identity_perm`

`identity_perm`

`invert_perm`

`invert_perm`

`random_perm`

`random_perm`

`used_source_bits`

`used_source_bits`

`used_target_bits`

`used_target_bits`

`bit_index_complement`

`bit_index_complement`

`bit_index_swap`

`bit_index_swap`

`bit_index_swap_complement`

`bit_index_swap_complement`

`bit_index_ror`

`bit_index_ror`

`transpose`

`transpose`

`shuffle_power`

`shuffle_power`

`unshuffle_power`

`unshuffle_power`

`permute_bpc`

`permute_bpc`

`invert_bpc`

`invert_bpc`

`general_reverse_bits`

`general_reverse_bits`

`bswap`

`bswap`

`prim_swap`

`prim_swap`

`shuffle`

`shuffle`

`unshuffle`

`unshuffle`

`compress_mask*`

`compress_mask_right`
`compress_mask_left`
`compress_mask`

`gen_ce*`

`gen_ce_right`
`gen_ce_left`

`apply_compress*`

`apply_compress_right`
`apply_compress_left`