 # Function descriptions

Back to main page

## Intro

This is the description of several bit permutation and other miscellaneous routines as they appear in perm_bas.* which are used by the test programs testperm.* and the permutation code generator in source, calcperm.pas resp. calcperm.cpp.

## odd

### odd

Is the given number odd? This is not a SWAR operation. This function is intrinsic in Pascal.

Parameters:

• x: in t_int
Number to be analyzed.
• result: out t_bool

Return true for an odd number and false for an even number.

Characterization: Auxiliary function.

## gcd

### gcd

Greatest common divisor. This is not a SWAR operation.

Parameters:

• a: in t_longint
One value > 0.
• b: in t_longint
One value > 0.
• result: out t_longint

Calculate the greatest common divisor of a and b.

Characterization: Auxiliary function.

## mul_inv

### mul_inv

Calculate multiplicative inverse. This is not a SWAR operation.

Parameters:

• x: in t_bits
• result: out t_bits

Calculate multiplicative inverse, i.e. mul_inv(x)*x==1. The result is defined for all odd values of x. For even x we simply return 0.

Characterization: Auxiliary function.

## rol

### rol

Rotate left a complete word.

Parameters:

• x: in t_bits
The value to be rotated.
• rot: in t_int
The rotate count. Negative values will rotate right.
Since this operation is cyclic by the operation size, only the low ld_bits bits are needed, and there is no limit on the value.
• result: out t_bits

Rotate x rot times to the left.

This should be optimized by the compiler to a single assembly instruction if inlined.

Characterization: Auxiliary function.
Related functions: rol_lo, bswap.

## rol_lo

### rol_lo

Rotate left a single subword. This is not a SWAR operation.

Parameters:

• x: in t_bits
The value to be rotated.
• rot: in t_int
The rotate count. Negative values will rotate/complement right.
Since this operation is cyclic by 2sw, only the low sw bits are needed and there is no limit on the value.
• sw: in t_subword
log2 of the number of affected bits. Range: 0…ld_bits.
• result: out t_bits

Rotate x rot times to the left.

Only the low 2sw bits of x are processed and returned; the remaining bits are zeroed.

Characterization: Auxiliary function.
Related functions: rol, rolc_lo.

## rolc_lo

### rolc_lo

Rotate left and complement a single subword. This is not a SWAR operation.

Parameters:

• x: in t_bits
The value to be rotated.
• rot: in t_int
The rotate count. Negative values will rotate/complement right.
Since this operation is cyclic by 2sw+1, only the low sw+1 bits are needed and there is no limit on the value. The +1 is due to the complementation.
• sw: in t_subword
log2 of the number of affected bits. Range: 0…ld_bits.
• result: out t_bits

Rotate x rot times to the left and complement each bit as it is rotated. In other words: Each bit that falls out is inverted and re-inserted on the other end.

Only the low 2sw bits of x are processed and returned; the remaining bits are zeroed.

Characterization: Auxiliary function.
Related functions: rol_lo, frolc.

## gray_code

### gray_code

Convert a number to Gray code. This is not a SWAR operation.

Parameters:

• x: in t_bits
• result: out t_bits

Convert a number to Gray code. This is the inverse of inverse_gray_code.

Characterization: Auxiliary function.
Related functions: inverse_gray_code.

## inverse_gray_code

### inverse_gray_code

Convert a number to inverse Gray code. This is not a SWAR operation.

Parameters:

• x: in t_bits
• result: out t_bits

Convert a number to the inverse Gray code. This is the inverse of gray_code.

Characterization: Auxiliary function.
Related functions: gray_code.

## nr_1bits

### nr_1bits

Count the set bits. Also known as population count.

Parameters:

• x: in t_bits
The value to be analyzed.
• result: out t_int

Newer x86 processors know this operation as POPCNT.
A number x has even/odd parity if nr_1bits(x) is even/odd.

Characterization: Auxiliary function.

## nr_leading_0bits

This function counts the leading zero bits of x, i.e. the length of the contiguous string of zero bits from the most significant toward the least significant bit.

Parameters:

• x: in t_bits
The value to be analyzed.
• result: out t_int

The result is bits if x is zero.
Newer x86 processors know this operation as LZCNT (which has a dedicated CPUID feature flag). It can easily be emulated with the older instruction BSR and some glue code.

Characterization: Auxiliary function.
Related functions: nr_trailing_0bits.

## nr_trailing_0bits

### nr_trailing_0bits

This function counts the trailing zero bits of x, i.e. the length of the contiguous string of zero bits from the least significant toward the most significant bit.

Parameters:

• x: in t_bits
The value to be analyzed.
• result: out t_int

The result is bits if x is zero.
This operation is part of the BMI1 instruction set and named TZCNT. The older BSF instruction almost does the same (albeit with an undefined behavior for x=0).

Characterization: Auxiliary function.

## is_contiguous_1bits

### is_contiguous_1bits

This function analyzes whether x seen as a bit string contains maximally one contiguous string of one bits.

Parameters:

• x: in t_bits
The value to be analyzed.
• result: out t_bool
true only if x is zero or contains only one contiguous string of one bits.

Characterization: Auxiliary function.

## tbm

### tbm

This function performs general trailing bit modifications, i.e. manipulation of rightmost bits. See manipulating rightmost bits for a detailed description.

Parameters:

• x: in t_bits
The value to be acted upon.
• mode: in t_int
The function index (mode) to be applied.
• result: out t_bool

Characterization: Auxiliary function.

## blend

### blend

This is the bit equivalent to if m then x else y (Pascal) or m?x:y (C/C++).

Parameters:

• m: in t_bits
• x: in t_bits
The source bits selected when the mask bits are 1.
• y: in t_bits
The source bits selected when the mask bits are 0.
• result: out t_bits

Characterization: Auxiliary function.

Here is the implementation:

```  t_bits blend(t_bits m, t_bits x, t_bits y) {
return (m & x) | (~m & y);
}
```

## simd_odd

### simd_odd

Set mask denoting odd subword values, i.e. propagate each least significant bit to the left.

Parameters:

• x: in t_bits
The source.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.
• result: out t_bits

Characterization: Auxiliary function.

## bit_permute_step

### bit_permute_step

This is a permutation primitive. It is also called delta swap.

Parameters:

• x: in t_bits
The value to be acted upon.
• m: in t_bits
• shift: in t_uint
The shift value. Range: 0…bits-1.
• result: out t_bits

The subset of bits of x masked by m are exchanged with the ones shifted left by shift.
To work as described, the mask and the shifted mask must not have bits in common, i.e. m & (m << shift) == 0 and no bits must be shifted away, i.e. ((m << shift) >> shift) == m.
If also m ^ (m << shift) == all_bits, the function bit_permute_step_simple is a simpler replacement.
On superscalar x86 processors a proper implementation of this routine typically needs 5 cycles. Superscalar ARM processors need at least 4 cycles.

Characterization: Auxiliary function.
Related functions: bit_permute_step_simple, bit_permute_step2.

Here is the implementation:

```  t_bits bit_permute_step(t_bits x, t_bits m, t_uint shift) {
t_bits t;
t = ((x >> shift) ^ x) & m;
x = (x ^ t) ^ (t << shift);
return x;
}
```

An implementation in x86 assembler with constant mask and shift (eaxeax):

```  mov edx,eax
shr eax,SHIFT
xor eax,edx
xor edx,eax
shl eax,SHIFT
xor eax,edx
```

## bit_permute_step_simple

### bit_permute_step_simple

This is a permutation primitive.

Parameters:

• x: in t_bits
The value to be acted upon.
• m: in t_bits
• shift: in t_uint
The shift value. Range: 0…bits-1.
• result: out t_bits

The subset of bits of x masked by m are exchanged with the ones shifted left by shift.
To work as described, the mask and the shifted mask must not have bits in common, i.e. m & (m << shift) == 0 and no bits must be shifted away, i.e. ((m << shift) >> shift) == m.
Also all relevant bits n must be affected:
or
((m ^ (m << shift)) & relevant_bits) == relevant_bits
since all bits not caught by m and m << shift are cleared.
If all this is not the case the needed function is bit_permute_step.
On superscalar x86 processors a proper implementation of this routine typically needs 4 cycles. For future processors it might be calculated in 3 cycles when RORX (BMI2) is used or when MOV instructions cost no latency. Superscalar ARM processors need 2 at least cycles.

Characterization: Auxiliary function.
Related functions: bit_permute_step, bit_permute_step2.

Here is the implementation:

```  t_bits bit_permute_step_simple(t_bits x, t_bits m, t_uint shift) {
return ((x & m) << shift) | ((x >> shift) & m);
}
```

An implementation in x86 assembler with constant mask and shift (eaxeax):

```  mov edx,eax
shr eax,SHIFT
shl edx,SHIFT
or eax,edx
```

## bit_permute_step2

### bit_permute_step2

This is a permutation primitive. It is also called delta swap. It can be used to extend to functionality of bit_permute_step to multiple words.

Parameters:

• x1: in out t_bits
The first value to be acted upon.
• x2: in out t_bits
The second value to be acted upon.
• m: in t_bits
• shift: in t_uint
The shift value. Range: 0…bits-1.

The subset of bits of *x1 masked by m are exchanged with the ones of *x2 masked by m << shift.
To work as described, no bits must be shifted away, i.e. ((m << shift) >> shift) == m.
If x1 == x2 (same address), the mask and the shifted mask must not have bits in common, i.e. m & (m << shift) == 0; also the function bit_permute_step is a simpler replacement in this case.
If also m ^ (m << shift) == all_bits, the function bit_permute_step_simple is a yet simpler replacement.

Characterization: Auxiliary function.
Related functions: bit_permute_step, bit_permute_step_simple.

Here is the implementation:

```  void bit_permute_step2(t_bits* x1, t_bits* x2, t_bits m, t_uint shift) {
t_bits t;
t = ((*x2 >> shift) ^ *x1) & m;
*x1 = *x1 ^ t;
*x2 = *x2 ^ (t << shift);
}
```

An implementation in x86 assembler with constant mask and shift and x1 (eax) and x2 (edx) as values instead of references which should cost 5 cycles on superscalar x86 processors (eax, edxeax, edx):

```  mov ecx,edx
shr edx,SHIFT
xor edx,eax
xor eax,edx
shl edx,SHIFT
xor edx,ecx
```

## identity_perm

### identity_perm

This function supplies the identity index vector, i.e. for all i: tgt[i]=i.

Parameters:

• tgt: out ta_subword
The index vector (0…ld_bits-1) for the identity permutation. Range: 0…ld_bits-1.

To apply the resulting permutation parameters you may use gen_benes.

Characterization: Auxiliary function.
Topics:
Related functions: gen_benes, invert_bpc, random_perm.

## invert_perm

### invert_perm

This function calculates the index vector as needed as a parameter for the generator of the inverse permutation with a Beneš network.

Parameters:

• src: in ta_subword
The original index vector (0…ld_bits-1). Range: 0…ld_bits-1, or -1 (no_index) for unspecified. All specified values must be different.
• tgt: out ta_subword
The index vector (0…ld_bits-1) for the inverted permutation. Range: 0…ld_bits-1.

To apply the resulting permutation parameters you may use gen_benes.

Characterization: Auxiliary function.
Topics:
Related functions: gen_benes, identity_perm, invert_bpc, random_perm.

## random_perm

### random_perm

This function calculates a randomized index vector as needed e.g. as a parameter for the Beneš network generator.

Parameters:

• tgt: out ta_subword
The index vector (0…ld_bits-1) for a randomized permutation. Range: 0…ld_bits-1.

To apply the resulting permutation parameters you may use gen_benes.

Characterization: Auxiliary function.
Topics:
Related functions: gen_benes, identity_perm, invert_perm.

## used_source_bits

### used_source_bits

This function collects the referenced source indexes from the given index vector as a bit set.

Parameters:

• perm: in ta_subword
The original index vector (0…ld_bits-1). Range: 0…ld_bits-1, or -1 (no_index) for unspecified. All specified values must be different.
• result: out t_bits
The set of used source indexes.

Characterization: Auxiliary function.
Topics:
Related functions: identity_perm, invert_bpc, used_target_bits.

## used_target_bits

### used_target_bits

This function collects target indexes, i.e. the positions of specified (i.e. ≠ no_index) indexes, from the given index vector as a bit set.

Parameters:

• perm: in ta_subword
The original index vector (0…ld_bits-1). Range: 0…ld_bits-1, or -1 (no_index) for unspecified. All specified values must be different.
• result: out t_bits
The set of specified indexes.

Characterization: Auxiliary function.
Topics:
Related functions: identity_perm, invert_bpc, used_target_bits.

## bit_index_complement

### bit_index_complement

Complement one bit index.

Parameters:

• x: in t_bits
The source value.
• k: in t_subword
The offending bit index. Range: 0…ld_bits-1.
• result: out t_bits

This is effectively the building block of general_reverse_bits.
On superscalar x86 processors a proper implementation of this routine typically needs 3…4 cycles.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_swap, bit_index_swap_complement, general_reverse_bits, bswap, permute_bpc.

## bit_index_swap

### bit_index_swap

Exchange two bit indexes.

Parameters:

• x: in t_bits
The source value.
• j: in t_subword
The first offending bit index. Range: 0…ld_bits-1.
• k: in t_subword
The second offending bit index. Range: 0…ld_bits-1.
• result: out t_bits

This is effectively the building block of shuffle, unshuffle, and bit_index_ror.
On superscalar x86 processors a proper implementation of this routine typically needs 5 cycles.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_complement, bit_index_swap_complement, bit_index_ror, transpose, shuffle, unshuffle, shuffle_power, unshuffle_power, permute_bpc.

## bit_index_swap_complement

### bit_index_swap_complement

Exchange and complement two bit indexes.

Parameters:

• x: in t_bits
The source value.
• j: in t_subword
The first offending bit index. Range: 0…ld_bits-1.
• k: in t_subword
The second offending bit index. Range: 0…ld_bits-1.
• result: out t_bits

This function combines bit_index_swap and two times bit_index_complement.
On superscalar x86 processors a proper implementation of this routine typically needs 5 cycles.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_complement, bit_index_swap, permute_bpc.

## bit_index_ror

### bit_index_ror

Rotate an bit index field to the right.

Parameters:

• x: in t_bits
The source value.
• ofs: in t_subword
The number of lower bit index to be left unchanged (entities). Range: 0…ld_bits-1.
• field: in t_subword
The number of bit indexes to be rotated. Range: 0…ld_bits-1.
• rot: in t_int
The rotate count. Since this operation is cyclic modulo field there is no practical limit on the value.
• result: out t_bits

0 ≤ field+ofsld_bits.
The bit index field is rotated to the right by rot. All other bit indexes are left unchanged.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_swap, transpose, shuffle_power, unshuffle_power, shuffle, unshuffle, permute_bpc.

## transpose

### transpose

Transpose matrixes.

Parameters:

• x: in t_bits
The source value.
• ld_fields: in t_subword
log2 of the width of the bit fields (entities) to be moved. Range: 0…ld_bits-1.
• ld_col: in t_subword
log2 of the matrix width (columns) of the matrix. Range: 0…ld_bits-1.
• ld_row: in t_subword
log2 of the matrix height (rows) of the matrix. Range: 0…ld_bits-1.
• result: out t_bits

0 ≤ sw_entities+ld_row+ld_colld_bits.
This function is essentially the same as bit_index_ror, albeit with slightly different parameters.

Characterization: Normal function.
Topic: Bit index operations.
Related functions: bit_index_ror, permute_bpc.

## shuffle_power

### shuffle_power

This function calculates powers of shuffle.

Parameters:

• x: in t_bits
The source value.
• sw1: in t_subword
log2 of the width of the bit fields (entities) to be moved. Range: 0…ld_bits-1.
• sw2: in t_subword
log2 of affected bit fields. Range: 0…ld_bits-1.
• pwr: in t_int
The number of shuffle operations to be emulated. Since this operation is cyclic modulo field there is no practical limit on the value.
• result: out t_bits

0 ≤ sw1sw2ld_bits.
This function is essentially the same as bit_index_ror, albeit with slightly different parameters.

Characterization: Normal function.
Topics: Bit index operations, Shuffle and unshuffle.
Related functions: bit_index_ror, unshuffle_power, shuffle, permute_bpc.

## unshuffle_power

### unshuffle_power

This function calculates powers of unshuffle.

Parameters:

• x: in t_bits
The source value.
• sw1: in t_subword
log2 of the width of the bit fields (entities) to be moved. Range: 0…ld_bits-1.
• sw2: in t_subword
log2 of affected bit fields. Range: 0…ld_bits-1.
• pwr: in t_int
The number of unshuffle operations to be emulated. Since this operation is cyclic modulo field there is no practical limit on the value.
• result: out t_bits

0 ≤ sw1sw2ld_bits.
This function is essentially the same as bit_index_ror, albeit with slightly different parameters.

Characterization: Normal function.
Topics: Bit index operations, Shuffle and unshuffle.
Related functions: bit_index_ror, shuffle_power, unshuffle, permute_bpc.

## permute_bpc

### permute_bpc

This function calculates general BPC permutations.

Parameters:

• x: in t_bits
The source value.
• tgt: in ta_subword
The vector (0…ld_bits-1) of target bit indexes. Range: 0…ld_bits-1. All ld_bits must be different.
• k: in t_subword_set
The bit set of inversions (set of t_subword).
• result: out t_bits

To generate the reverse permutation you may use invert_bpc.

Characterization: Normal function.
Topics: Generalized bit reversal, Bit index operations, BPC permutations.
Related functions: bit_index_complement, bit_index_swap, bit_index_swap_complement, general_reverse_bits, invert_bpc.

## invert_bpc

### invert_bpc

This function calculates the index vector and complement set as needed as parameters for the inverse BPC permutations.

Parameters:

• src: in ta_subword
The original index vector (0…ld_bits-1). Range: 0…ld_bits-1. All ld_bits must be different.
• src_k: in t_subword_set
The original bit set of inversions.
• tgt: out ta_subword
The index vector (0…ld_bits-1) for the inverted permutation. Range: 0…ld_bits-1.
• tgt_k: out t_subword_set
The bit set of inversions for the inverted permutation (set of t_subword).

To apply the resulting permutation parameters you may use permute_bpc.

Characterization: Auxiliary function.
Topics: Generalized bit reversal, Bit index operations, BPC permutations.
Related functions: bit_index_complement, bit_index_swap, bit_index_swap_complement, general_reverse_bits, permute_bpc, invert_perm.

## general_reverse_bits

### general_reverse_bits

Swap all subwords of given levels.

Parameters:

• x: in t_bits
The source value.
• k: in t_int
The bit set of levels.
• result: out t_bits

Characterization: Normal function.
Topic: Generalized bit reversal.
Related functions: bswap, prim_swap, bit_index_complement, permute_bpc.

## bswap

### bswap

This function exchanges the byte order and is typically implementable by an assembler instruction.

Parameters:

• x: in t_bits
The value to be acted upon.
• result: out t_bits

= general_reverse_bits(x,~7)
On x86 the assembler instructions BSWAP and MOVBE do the job and act on 32 and 64 bit arguments. For 16 bit you can use ROL ax,16 or BSWAP al,ah or similar instead. For SSE usage (128 bit) you can apply the PSHUFB instruction.

Characterization: Auxiliary function.
Topic: Generalized bit reversal.
Related functions: rol, general_reverse_bits, bit_index_complement.

## prim_swap

### prim_swap

Swap all denoted subwords of given levels.

Parameters:

• x: in t_bits
The source value.
• m: in t_bits
The bit set marking the subwords to be swapped.
• result: out t_bits

The highest bit of m has the special function to invert the swapping step order.

Characterization: Normal function.
Topic: Swap by primitives.
Related functions: general_reverse_bits.

## shuffle

### shuffle

Shuffle/zip/interlace entities.

Parameters:

• x: in t_bits
The source value.
• sw1: in t_subword
log2 of the length of the entities to move. This describes the size of what to interlace.
• sw2: in t_subword
log2 of the (independent) areas. This describes the size of where to interlace.
• result: out t_bits

0 ≤ sw1 < sw2ld_bits.
This function is the inverse of unshuffle.

Characterization: Normal function.
Topic: Shuffle and unshuffle
Related functions: unshuffle, shuffle_power, unshuffle_power, bit_index_swap.

## unshuffle

### unshuffle

Unshuffle/unzip/uninterlace entities.

Parameters:

• x: in t_bits
The source value.
• sw1: in t_subword
log2 of the length of the entities to move. This describes the size of what to uninterlace.
• sw2: in t_subword
log2 of the (independent) areas. This describes the size of where to uninterlace.
• result: out t_bits

0 ≤ sw1 < sw2ld_bits
This function is the inverse of shuffle.

Characterization: Normal function.
Topic: Shuffle and unshuffle.
Related functions: shuffle, shuffle_power, unshuffle_power, bit_index_swap.

## compress_mask*

Compress (gather) bits of the masks itself.

Parameters:

• m: in t_bits
The mask whose bits shall be compressed.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.
• d: in t_direction   // only for compress_mask
The direction as a parameter.
• result: out t_bits
The compressed mask. The other bits are zeroed.

For all corresponding functions the following holds:

Characterization: Normal function.
Topic: Compress and expand.
Related functions: expand*, expand_flip*.

## gen_ce*

### gen_ce_rightgen_ce_left

Generators for apply_compress* and apply_expand*.

Parameters:

• self: out tr_bfly
Resulting configuration.
• m: in t_bits
The mask of bits marking the places to be extracted from or to be deposited at.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.

Characterization: Function to generate a configuration.
Topic: Compress and expand.
Related functions: compress*, expand*, gen_cef*.

## apply_compress*

### apply_compress_rightapply_compress_left

Compress (gather) selected bits to the given direction using a precalculated configuration.

Parameters:

• self: in tr_bfly
Configuration as calculated by gen_ce*.
• x: in t_bits
The value where the bits shall be extracted from.
• result: out t_bits
The extracted bits. The other bits are zeroed.

Characterization: Function needing a configuration.
Topic: Compress and expand.

## apply_expand*

### apply_expand_rightapply_expand_left

Expand (scatter) bits from the given direction to selected places using a precalculated configuration.

Parameters:

• self: in tr_bfly
Configuration as calculated by gen_ce*.
• x: in t_bits
The value where the bits shall be taken from.
• result: out t_bits
The deposited bits. The other bits are zeroed.

Characterization: Function needing a configuration.
Topic: Compress and expand.
Related functions: expand*, apply_compress*.

## compress*

### compress_rightcompress_leftcompress

Compress (gather) selected bits to the given direction.

Parameters:

• x: in t_bits
The value where the bits shall be extracted from.
• m: in t_bits
The mask of bits marking the places to be extracted from.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.
• d: in t_direction   // only for compress
The direction as a parameter.
• result: out t_bits
The extracted bits. The other bits are zeroed.

Use gen_ce* and apply_compress* with the same direction if you use the same mask more than once.
The instruction for x86 processors named PEXT implements compress_right with sw=ld_bits.

Characterization: Normal function (compound, with temporary configuration).
Topic: Compress and expand.
Related functions: expand*, compress_flip*.

## expand*

### expand_rightexpand_leftexpand

Expand (scatter) bits from the given direction to selected places.

Parameters:

• x: in t_bits
The value where the bits shall be taken from.
• m: in t_bits
The mask of bits marking the places to be deposited at.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.
• d: in t_direction   // only for expand
The direction as a parameter.
• result: out t_bits
The deposited bits. The other bits are zeroed.

Use gen_ce* and apply_expand* with the same direction if you use the same mask more than once.
The instruction for x86 processors named PDEP implements expand_right with sw=ld_bits.

Characterization: Normal function (compound, with temporary configuration).
Topic: Compress and expand.
Related functions: compress*, expand_flip*.

## butterfly

### butterfly

One butterfly step/stage.

Parameters:

• x: in t_bits
The source value.
• m: in t_bits
This is the butterfly steering mask.
• sw: in t_subword
This is the butterfly stage number. Range: 0…ld_bits-1.
• result: out t_bits
The shuffled bits.

For the bit layout of m please look at the diagram in Butterfly network. The unused bits should be zero, i.e.

Characterization: Normal function.
Topic: Butterfly network.
Related functions: bfly, ibfly.

## bfly

### bfly

Used to apply configured butterfly network on x configured by gen_frot, gen_vrot or gen_cef*.

Parameters:

• self: in tr_bfly
The [inverse] butterfly configuration.
• x: in t_bits
The source value.
• result: out t_bits

This function is the inverse of ibfly.

Characterization: Function needing a configuration.
Topic: Butterfly network.
Related functions: ibfly, butterfly.

## ibfly

### ibfly

Used to apply configured inverse butterfly network on x configured by gen_frot, gen_vrot or gen_cef*.

Parameters:

• self: in tr_bfly
The [inverse] butterfly configuration.
• x: in t_bits
The source value.
• result: out t_bits

This function is the inverse of bfly.

Characterization: Function needing a configuration.
Topic: Butterfly network.
Related functions: bfly, butterfly.

## bfly_parity

### bfly_parity

Return the parity of a permutation given by a precalculated configuration of a butterfly network. This is false for even parity and true for odd parity.

Parameters:

• self: in tr_bfly
Configuration as calculated by a suitable generator function.
• result: out t_bool
The (odd) parity.

Characterization: Function needing a configuration.
Topic: Butterfly network.
Related functions: bfly, butterfly.

## gen_frot

### gen_frot

Rotate all subwords in the given direction by the given amount.
Generators for bfly (rotate right, fror) and ibfly (rotate left, frol).

Parameters:

• self: out tr_bfly
Resulting configuration.
• rot: in t_int
The rotate count.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.

Characterization: Function to generate a configuration.
Topic: Rotate via butterfly.
Related functions: fror / frol / frot, gen_vrot.

## gen_vrot

### gen_vrot

Rotate all subwords in the given direction by the amount given in the corresponding subwords.
Generators for bfly (rotate right, vror_bfly) and ibfly (rotate left, vrol_bfly).

Parameters:

• self: out tr_bfly
Resulting configuration.
• rot: in t_bits
The "mask" containing subwords holding the rotate counts for the corresponding subwords of the usage routine's source.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.

Characterization: Function to generate a configuration.
Topic: Rotate via butterfly.
Related functions: vror_bfly / vrol_bfly / vrot_bfly, gen_frot.

## fror_bfly / frol_bfly / frot_bfly

### fror_bflyfrol_bflyfrot_bfly

Rotate all subwords in the given direction: fror_bfly rotates right and frol_bfly rotates left.

Parameters:

• x: in t_bits
The source value.
• rot: in t_int
The rotate count.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.
• d: in t_direction   // only for frot
The direction as a parameter.
• result: out t_bits

Use gen_frot and bfly (rotate right) or ibfly (rotate left). if you use the same rotation value more than once.

The alternative routines fror / frol / frot are much faster and do not need any butterfly stuff.

Characterization: Normal function (compound, with temporary configuration).
Topic: Rotate via butterfly.
Related functions: vror / vrol / vrot, fror / frol / frot.

## vror_bfly / vrol_bfly / vrot_bfly

### vror_bflyvrol_bflyvrot_bfly

Rotate all subwords in the given direction by the amount given in the corresponding subwords: vror_bfly rotates right and vrol_bfly rotates left.

Parameters:

• x: in t_bits
The source value.
• rot: in t_bits
The "mask" containing subwords holding the rotate counts for the corresponding subwords of x.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.
• d: in t_direction   // only for vrot_bfly
The direction as a parameter.
• result: out t_bits

Use gen_vrot and bfly (rotate right) or ibfly (rotate left). if you use the same rotation values (rot) more than once.

Characterization: Normal function (compound, with temporary configuration).
Topic: Rotate via butterfly.
Related functions: vror / vrol / vrot, fror / frol / frot.

## fror / frol / frot

### frorfrolfrot

Rotate all subwords in the given direction by the given amount: fror rotates right and frol rotates left.

Parameters:

• x: in t_bits
The source value.
• rot: in t_int
The rotate count.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.
• d: in t_direction   // only for frot
The direction as a parameter.
• result: out t_bits

These routines return the same values as the corresponding routines with the name postfix _bfly (fror_bfly / frol_bfly / frot_bfly) but do not need any butterfly stuff and should be much faster.

Characterization: Normal function.
Topic: Rotate via butterfly.
Related functions: vror / vrol / vrot, fror_bfly / frol_bfly / frot_bfly, frorc / frolc / frotc.

## frorc / frolc / frotc

### frorcfrolcfrotc

Rotate and complement all subwords in the given direction by the given amount: frorc rotates right and frolc rotates left.

Parameters:

• x: in t_bits
The source value.
• rot: in t_int
The rotate count.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.
• d: in t_direction   // only for frotc
The direction as a parameter.
• result: out t_bits

Characterization: Normal function.
Related functions: frolc, vror / vrol / vrot, fror / frol / frot.

## vror / vrol / vrot

### vrorvrolvrot

Rotate all subwords in the given direction by the amount given in the corresponding subwords: vror rotates right and vrol rotates left.

Parameters:

• x: in t_bits
The source value.
• rot: in t_bits
The "mask" containing subwords holding the rotate counts for the corresponding subwords of x.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.
• d: in t_direction   // only for vrot
The direction as a parameter.
• result: out t_bits

Use gen_vrot and bfly (rotate right) or ibfly (rotate left). if you use the same rotation values (rot) more than once.

These routines return the same values as the corresponding routines with the name postfix _bfly (vror_bfly / vrol_bfly / vrot_bfly) but do not need any butterfly stuff and should be faster.

Characterization: Normal function (compound, with temporary configuration).
Topic: Rotate via butterfly.
Related functions: vror_bfly / vrol_bfly / vrot_bfly, fror / frol / frot.

## gen_cef*

### gen_cef_rightgen_cef_left

Generators for bfly (expand_flip*) and ibfly (compress_flip*).

Parameters:

• self: out tr_bfly
Resulting configuration.
• m: in t_bits
The mask of bits marking the places to be extracted from or to be deposited at.
The other bits will be stored in inverse order into the remaining places.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.

Characterization: Function to generate a configuration.
Topic: Compress/expand-flip.
Related functions: compress_flip*, expand_flip*, gen_ce*.

## compress_flip*

### compress_flip_rightcompress_flip_leftcompress_flip

Compress (gather) selected bits to the given direction while stuffing the remaining ones in inverse order into the remaining places.

Parameters:

• x: in t_bits
The source value.
• m: in t_bits
The mask of bits marking the places to be extracted from.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.
• d: in t_direction   // only for compress_flip
The direction as a parameter.
• result: out t_bits

Use gen_cef* and ibfly if you use the same mask more than once.
This function is the inverse of expand_flip*,

Characterization: Normal function (compound, with temporary configuration).
Topic: Compress/expand-flip.
Related functions: expand_flip*, compress*.

## expand_flip*

### expand_flip_rightexpand_flip_leftexpand_flip

Expand (scatter) bits from the given direction to selected places while stuffing the remaining ones in inverse order into the remaining places.
This function is the inverse of expand_flip*,

Parameters:

• x: in t_bits
The source value.
• m: in t_bits
The mask of bits marking the places to be deposited at.
• sw: in t_subword
This is log2 of the subword size. Range: 0…ld_bits.
• d: in t_direction   // only for expand_flip
The direction as a parameter.
• result: out t_bits

Use gen_cef* and ibfly if you use the same mask more than once.

Characterization: Normal function (compound, with temporary configuration).
Topic: Compress/expand-flip.
Related functions: compress_flip*, expand*.

## omega

### omega

Simulate one omega step/stage.

Parameters:

• x: in t_bits
The source value.
• m: in t_bits
This is the butterfly steering mask.
• sw: in t_subword
This is the butterfly stage number. Range: 0…ld_bits-1.
• result: out t_bits

Only the lower bits of the subwords of m are used. The remaining bits should be zero, i.e.
This function is the inverse of flip*,

Characterization: Normal function.
Topic: Omega-flip network.
Related functions: flip.

## flip

### flip

Simulate one flip step/stage.

Parameters:

• x: in t_bits
The source value.
• m: in t_bits
This is the butterfly steering mask.
• sw: in t_subword
This is the butterfly stage number. Range: 0…ld_bits-1.
• result: out t_bits

Only the lower bits of the subwords of m are used. The remaining bits should be zero, i.e.
This function is the inverse of flip*,

Characterization: Normal function.
Topic: Omega-flip network.
Related functions: omega.

## gen_benes

### gen_benes

Generator for benes_fwd / benes_bwd.

Parameters:

• self: out tr_benes
Resulting configuration.
• c_tgt: in ta_index
The array of source bit indexes for each target bit. The indexes must be ≥0. This bit index array must not contain dupes.

This function effectively generates the configuration for a Beneš network which consists of a Butterfly network and a following inverse Butterfly network.

Characterization: Function to generate a configuration.
Topic: Beneš network.
Related functions: gen_benes_ex. benes_fwd / benes_bwd.

## gen_benes_ex

### gen_benes_ex

Generator for benes_fwd_ex / benes_bwd_ex.

Parameters:

• self: out tr_benes
Resulting configuration.
• c_tgt: in ta_index
The array of source bit indexes for each target bit. The indexes may be -1 to denote "don't care" entries. This bit index array must not contain dupes.
• a_stage: in ta_subword
The stage order of the Butterfly network; and the inverse order of the inverse Butterfly network. The stages must not contain dupes.

This function effectively generates the configuration for a reordered Beneš network.
This function is fortunate if you make use of the reordering, otherwise the standard order in gen_benes suffices.

Characterization: Function to generate a configuration.
Topic: Beneš network.
Related functions: gen_benes. benes_fwd_ex / benes_bwd_ex.

## benes_fwd / benes_bwd

### benes_fwdbenes_bwd

Perform an arbitrary permutation using a precalculated configuration.

Parameters:

• self: in tr_benes
Configuration as calculated by gen_benes.
• x: in t_bits
The value where the source bits shall be taken from.
• result: out t_bits
The permuted bits.

For benes_fwd the index array given to gen_benes selected source indexes, and for benes_bwd the index array selected target indexes.
These functions are the inverse of each other.

Characterization: Function needing a configuration.
Topic: Beneš network.
Related functions: gen_benes.

## benes_fwd_ex / benes_bwd_ex

### benes_fwd_exbenes_bwd_ex

Perform an arbitrary permutation using a precalculated configuration.

Parameters:

• self: in tr_benes
Configuration as calculated by gen_benes_ex.
• x: in t_bits
The value where the source bits shall be taken from.
• result: out t_bits
The permuted bits.

For benes_fwd_ex the index array given to gen_benes_ex selected source indexes, and for benes_bwd_ex the index array selected target indexes.
These functions are the inverse of each other.

Characterization: Function needing a configuration.
Topic: Beneš network.
Related functions: gen_benes_ex.

## benes_parity

### benes_parity

Return the parity of an arbitrary permutation given by a precalculated configuration of a Beneš network. This is false for even parity and true for odd parity.

Parameters:

• self: in tr_benes
Configuration as calculated by gen_benes.
• result: out t_bool
The (odd) parity.

Characterization: Function needing a configuration.
Topic: Beneš network.
Related functions: gen_benes.