Contents
Comments or questions? E-mail:
info@sirrida.de
Intro
This is a description of
SIMD
/
SWAR
operations
as well as some routines performing them.
You can
download
an
implementation of the routines.
The library was partly inspired by the proposal of
standardizing bit operations
in C++ and could serve as such.
Some stuff found on other places of this site is repeated here.
The routines described here
usually have an additional parameter which describes the
subword size.
During the history of processors the register sizes
have been increased several times, usually by doubling the size.
For example, the first microprocessor
Intel 4004
(1971)
had registers of a size of only 4 bit.
Later cousins had 8, 16, 32, and 64 bit register sizes.
While these wide registers allow for huge numbers,
very often only a small subrange of these numbers are really needed
thus wasting memory, processing capacity and power usage.
On the other hand many operations act on several elements in the same way.
In order to cope with this,
some architectures sported some kind of mini-vectors
such as
3d-now (AMD),
MMX, SSE, AVX (Intel)
or
Altivec (PowerPC).
There are means to simulate some these functions
with a fair amount of overhead
(but without accessing each subelement one by one)
as is done in my library.
The emulation is most effective for small subwords within large words
as the overhead per element is minimized.
The acronym "SWAR" was coined in fall 1996 by
Henry G. (Hank) Dietz
and
Randall J. Fisher
as an abbreviation to "SIMD within a register"
where "SIMD" means "single instruction, multiple data".
Some operations such as the binary operations NOT, AND, OR and XOR
are inherently working in parallel on arbitrary subwords
since there is no interaction between bits on different places.
However most others such as addition have to deal with such interactions.
As an example we work out how to get
addition to work on subwords
without letting carries from one subword disturb other subwords
which will occur for an unprepared addition of whole words.
3 bit addition:
+ |
000 |
001 |
010 |
011 |
100 |
101 |
110 |
111 |
000 |
000 |
001 |
010 |
011 |
100 |
101 |
110 |
111 |
001 |
001 |
010 |
011 |
100 |
101 |
110 |
111 |
1000 |
010 |
010 |
011 |
100 |
101 |
110 |
111 |
1000 |
1001 |
011 |
011 |
100 |
101 |
110 |
111 |
1000 |
1001 |
1010 |
100 |
100 |
101 |
110 |
111 |
1000 |
1001 |
1010 |
1011 |
101 |
101 |
110 |
111 |
1000 |
1001 |
1010 |
1011 |
1100 |
110 |
110 |
111 |
1000 |
1001 |
1010 |
1011 |
1100 |
1101 |
111 |
111 |
1000 |
1001 |
1010 |
1011 |
1100 |
1101 |
1110 |
As you may notice there is no carry to the next subword
(overflow, extra leftmost digit)
when both operands have their MSB clear (yellow quadrant).
One approach is to simply use these smaller values,
i.e. set the MSB always to zero ("spacer bits")
before and after the addition.
This is very fast and easy but we loose these spacer bits for calculation.
More interesting is utilizing these bits too
at the expense of some more effort.
The 3 other quadrants can be constructed to be without overflow
from the simple one
by XORing the result's MSB with the MSB of both operands
since XOR is a carry-less addition of single bits.
Amazingly, this even works when we additionally add 1,
i.e. a+b+1.
We also call this 1 "carry" (c).
This extra 1 can be utilized e.g. for subtraction (see below).
Thus we may write the expected sum without carries between subwords as follows
(M: MSB mask = 100, xx: mask of mantissa = 011):
simple_sum = (a & xx) + (b & xx) + c
sum = simple_sum ^ (a & M) ^ (b & M) = simple_sum ^ ((a ^ b) & M)
We can thus perform multiple additions at once on disjunct bit fields
such as 8 4-bit-fields within a 32 bit word.
All we need to let this kind of calculation
create the expected results is the relevant masks
such as MSB.
In this case even irregular subword partitions are possible
which might be of interest, e.g.
for holding dimension vectors
or color values ("Hi-Color": 5+6+5 bits) of pixels.
By the way:
a-b can be calculated as a+(-b) = a+(~b+1) = a+~b+1
using the algorithm just described.
Moreover, a-b-1 = a+(-b)-1 = a+(~b+1)-1 = a+~b.
Overview
I have started to provide a quite comprehensive library of such functions
by using features of newer versions of C++.
You will find these functions as well as some test code in
swartmps.cpp.
Several routines are presumably published here for the first time.
On this site I assume that the underlying machine has a
word
size of a power of 2 using two's complement for
integer numbers.
For the very rare non-conforming systems
I have not taken any precautions
and thus you should not use the routines on them
but with extreme care and exhaustive tests.
In most cases it is easily possible to provide means to
support subword sizes which are not powers of 2 or even non-uniform ones
but this implies more complex parameterizing,
thus for the sake of simplicity this is left out.
Concrete classes
For convenience the template classes have standard implementations
for the usual 8, 16, 32 and 64 bit number types.
If supported, 128 bit is also implemented.
There are these concrete types:
Base \ Bits |
8 |
16 |
32 |
64 |
128 |
t_bits<...> |
t_bits_8 |
t_bits_16 |
t_bits_32 |
t_bits_64 |
t_bits_128 |
t_simd<...> |
t_simd_8 |
t_simd_16 |
t_simd_32 |
t_simd_64 |
t_simd_128 |
t_bfly<...> |
t_bfly_8 |
t_bfly_16 |
t_bfly_32 |
t_bfly_64 |
t_bfly_128 |
t_cef<...> |
t_cef_8 |
t_cef_16 |
t_cef_32 |
t_cef_64 |
t_cef_128 |
t_ce_right<...> |
t_ce_right_8 |
t_ce_right_16 |
t_ce_right_32 |
t_ce_right_64 |
t_ce_right_128 |
t_ce_left<...> |
t_ce_left_8 |
t_ce_left_16 |
t_ce_left_32 |
t_ce_left_64 |
t_ce_left_128 |
t_vrot<...> |
t_vrot_8 |
t_vrot_16 |
t_vrot_32 |
t_vrot_64 |
t_vrot_128 |
Also, there some test routines which check the integrity of the library
and which can be seen as a sample application as well.
General functions which ought to be in std::
-
M_abs(x,y): Absolute value.
-
M_gcd(x,y): Greatest common divisor.
-
M_odd(x): Return true for odd values, false for even.
-
M_sgn(x): Return the sign: <0: -1, =0: 0, >0: 1
-
M_min(x,y): Minimum value.
-
M_max(x,y): Maximum value.
-
M_abs_diff(x,y):
Equivalent to abs(x-y) via a larger type;
works for signed and unsigned.
Result type ought to be unsigned.
-
M_sgn_diff(x,y):
Equivalent to sgn(x-y) via a larger type;
works for signed and unsigned.
t_subword
This is the enumeration type t_subword of all subword sizes.
It can be casted to int and then interpreted as the
binary logarithm (log2)
of the subword size in bits, i.e.
(int) (t_subword::swar_byte) == 3
A byte has 23 = 8 bits.
Name |
log2(bits) |
bits |
bit | 0 | 1 |
nyp | 1 | 2 |
nibble | 2 | 4 |
byte | 3 | 8 |
word | 4 | 16 |
dword | 5 | 32 |
qword | 6 | 64 |
dqword | 7 | 128 |
t_tbm_mode
This is the enumeration type t_tbm_mode of all modes for
tbm
(trailing bit modification operations), see
t_bits_base.
Name |
Value |
Description |
lo_0 | 0x00 | Set lower bits to 0 |
lo_1 | 0x01 | Set lower bits to 1 |
lo_mask | 0x01 | Set lower bits (mask) |
tgt_0 | 0x00 | Set found least significant bit to 0 |
tgt_1 | 0x02 | Set found least significant bit to 1 |
tgt_mask | 0x02 | Set found least significant bit (mask) |
hi_0 | 0x00 | Set upper bits to 0 |
hi_1 | 0x04 | Set upper bits to 1 |
hi_keep | 0x08 | Keep upper bits |
hi_invert | 0x0c | Invert upper bits |
hi_mask | 0x0c | Treat upper bits (mask) |
search_0 | 0x00 | Search for least significant 0 bit |
search_1 | 0x10 | Search for least significant 1 bit |
search_mask | 0x10 | Search for least significant (mask) |
t_round_mode
This is the enumeration type t_round_mode of all
floating point rounding modes, see also
rounding modes of Java.
In some cases this mode can also be applied to integer arithmetic.
For unsigned functions u/d are preferred over f/c (same meaning).
Suffix |
Name |
Description |
-2 | -1.6 | -1.5 | -1.4 |
-1 | -0.6 | -0.5 | -0.4 |
0 | 0.4 | 0.5 | 0.6 |
1 | 1.4 | 1.5 | 1.6 |
2 |
e |
half_even |
round, *.5: to next even, banker's rounding, x87:0 |
-2 | -2 | -2 | -1 |
-1 | -1 | 0 | 0 |
0 | 0 | 0 | 1 |
1 | 1 | 2 | 2 |
2 |
f |
floor |
floor, toward -infinity, sar, x87:1 |
-2 | -2 | -2 | -2 |
-1 | -1 | -1 | -1 |
0 | 0 | 0 | 0 |
1 | 1 | 1 | 1 |
2 |
c |
ceil |
ceil, toward infinity, x87:2 |
-2 | -1 | -1 | -1 |
-1 | 0 | 0 | 0 |
0 | 1 | 1 | 1 |
1 | 2 | 2 | 2 |
2 |
d |
down |
trunc, toward 0, chop, div, x87:3 |
-2 | -1 | -1 | -1 |
-1 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
1 | 1 | 1 | 1 |
2 |
u |
up |
away from 0 |
-2 | -2 | -2 | -2 |
-1 | -1 | -1 | -1 |
0 | 1 | 1 | 1 |
1 | 2 | 2 | 2 |
2 |
o |
half_odd |
round, *.5: to next odd |
-2 | -2 | -1 | -1 |
-1 | -1 | -1 | 0 |
0 | 0 | 1 | 1 |
1 | 1 | 1 | 2 |
2 |
hf |
half_floor |
round, *.5: floor |
-2 | -2 | -2 | -1 |
-1 | -1 | -1 | 0 |
0 | 0 | 0 | 1 |
1 | 1 | 1 | 2 |
2 |
hc |
half_ceil |
round, *.5: ceil |
-2 | -2 | -1 | -1 |
-1 | -1 | 0 | 0 |
0 | 0 | 1 | 1 |
1 | 1 | 2 | 2 |
2 |
hd |
half_down |
round, *.5: trunk |
-2 | -2 | -1 | -1 |
-1 | -1 | 0 | 0 |
0 | 0 | 0 | 1 |
1 | 1 | 1 | 2 |
2 |
hu |
half_up |
round, *.5: away from 0 |
-2 | -2 | -2 | -1 |
-1 | -1 | -1 | 0 |
0 | 0 | 1 | 1 |
1 | 1 | 2 | 2 |
2 |
t_bits_base
The template class t_bits_base is the
base implementation for bit operations acting on a whole type.
Rotating
-
t_bits_base::rol(x,rot)
Rotate x left by rot.
Gives correct results for all values of rot.
Should usually be optimizable to one instruction (ROL).
-
t_bits_base::ror(x,rot)
Rotate x right by rot.
Gives correct results for all values of rot.
Should usually be optimizable to one instruction (ROR).
Shifting
-
t_bits_base::shl_fast(x,shift)
Logically/arithmetically shift x left by shift.
Only as guaranteed by the C standard, i.e. 0 ≤ shift < bits.
Should usually be implementable by one instruction (SHL).
-
t_bits_base::shl_safe(x,shift)
Logically/arithmetically shift x left by shift.
Gives correct results for all values of shift;
please note that shift is unsigned.
-
t_bits_base::shr_fast(x,shift)
Logically shift x right by shift.
Only as guaranteed by the C standard, i.e. 0 ≤ shift < bits.
Should usually be implementable by one instruction (SHR).
-
t_bits_base::shr_safe(x,shift)
Logically shift x right by shift.
Gives correct results for all values of shift;
please note that shift is unsigned.
-
t_bits_base::sar_fast(x,shift)
Arithmetically shift right, duplicating the most significant (sign) bit.
Only as guaranteed by the C standard, i.e. 0 ≤ shift < bits.
Should usually be optimizable to one instruction (SAR).
-
t_bits_base::sar_safe(x,shift)
Arithmetically shift x right by shift.
Gives correct results for all values of shift;
please note that shift is unsigned.
Counting bits
-
t_bits_base::nr_1bits(x)
Count the set bits, aka population count.
On newer x86 implementable as POPCNT.
See Hacker's Delight, 5.1 "Counting 1-Bits".
-
t_bits_base::nr_0bits(x)
Count the reset bits.
-
t_bits_base::nr_trailing_0bits(x)
On x86 implementable as TZCNT or using BSF.
Also implementable by multiplying x&-x with a De Bruijn number,
shifting, and a lookup table.
Special case 0: word size in bits.
See Hacker's Delight, 5.4 "Counting Trailing 0's".
-
t_bits_base::nr_trailing_1bits(x)
Number of contiguous least significant set bits.
-
t_bits_base::nr_leading_0bits(x)
On x86 implementable using BSR or as LZCNT.
Special case 0: word size in bits.
See Hacker's Delight, 5.4 "Counting Trailing 0's".
-
t_bits_base::nr_leading_1bits(x)
Number of contiguous most significant set bits
-
t_bits_base::is_parity_odd(x)
Count the set bits and return whether this is an odd number.
Result as mask.
-
t_bits_base::is_parity_oddh(x)
Count the set bits and return whether this is an odd number.
Result as MSB.
-
t_bits_base::is_parity_oddl(x)
Count the set bits and return whether this is an odd number.
Result as LSB.
Miscellaneous
-
t_bits_base::sign_extend(x,b)
Sign extended the low b bits of x.
-
t_bits_base::auto blend(m,x,y)
The bit equivalent to m?x:y.
-
t_bits_base::gray_code(x)
Gray code.
See Hacker's Delight, 13.1 "Gray Code"
-
t_bits_base::inv_gray_code(x)
Inverse Gray code.
See Hacker's Delight, 13.1 "Gray Code"
-
t_bits_base::is_contiguous_1bits(x)
Is x a contiguous string of 1 bits?
-
t_bits_base::tbm(x,mode)
General trailing bit modification operations.
Mode: The operating mode, see t_tbm_mode.
All of these operations can be performed by ≤ 3 instructions.
Some of these operations are realized as BMI1 or TBM instructions.
-
t_bits_base::isign(x,y)
The Fortran function ISIGN: copy sign for integer values
(y≥0) ? abs(x) : -abs(x)
Should be very fast.
See Hacker's Delight, 2.9 "Transfer of Sign"
-
t_bits_base::xsign(x,y)
Simple transfer sign
(y≥0) ? x : -x
Should be very fast.
Permuting
-
t_bits_base::bit_permute_step(x,m,shift)
Can be replaced by bit_permute_step_simple,
if for the relevant bits n the following holds:
nr_1bits(bit_permute_step_simple(n, m, shift)) = nr_1bits(n)
shift should be in 0..bits-1.
This is a candidate for a new instruction.
x86: ≥ 6/5 cycles
ARM: ≥ 4/4 cycles
-
t_bits_base::bit_permute_step_simple(x,m,shift)
Simplified replacement of bit_permute_step
Can always be replaced by bit_permute_step (not vice-versa).
shift should be in 0..bits-1.
x86: ≥ 5/4 (5/3) cycles
ARM: ≥ 3/2 cycles
-
t_bits_base::bit_permute_step2(x1,x2,m,shift)
Extended variant of bit_permute_step.
Will be slow if not inlined.
shift should be in 0..bits-1.
This is a candidate for a new instruction.
-
t_bits_base::bit_permute_step_rot(x,m,shift)
This is similar to bit_permute_step but rotating instead of shifting.
shift should be in 0..bits-1.
This is a candidate for a new instruction.
x86: ≥ 6/5 cycles
ARM: ≥ 4/4 cycles
-
t_bits_base::bit_permute_step2_rot(x1,x2,m,shift)
This is similar to bit_permute_step2 but rotating instead of shifting.
Extended variant of bit_permute_step_rot.
Will be slow if not inlined.
shift should be in 0..bits-1.
This is a candidate for a new instruction.
Multiplication
-
t_bits_base::mul_inv(x)
Calculate multiplicative inverse, i.e. mul_inv(x)*x==1.
The result is defined for all odd values of x.
For even x we simply return 0.
See Hacker's Delight, 10.16 "Exact division by constants"
Multiplicative inverse modulo 2**bits by Newton's method.
Carry-less multiplication
"Carry-less multiplication" (cl_mul).
The properties are quite similar to the usual multiplication (and addition).
Properties (^ is the XOR operator; * is the cl_mul operator):
- Associative: a^(b^c) = (a^b)^c, a*(b*c) = (a*b)*c
- Commutative: a^b = b^a, a*b = b*a
- Distributive: a*(b^c) = (a*b)^(a*c), (a^b)*c = (a*c)^(b*c)
- Neutral element: 0^a = a^0 = a, 1*a = a*1 = a
- Zero element: 0*a = a*0 = 0
- ^ is always invertible: a^a = 0
- * is invertible for odd values: x*inv(x) = inv(x)*x = 1
- Shift function: shl(a,x)*shl(b,y) = shl(a*b,x+y)
cl_mul and xor form a commutative ring.
-
t_bits_base::cl_mat_inv(src,inv)
Bit matrix inversion by Gauß/Jordan elimination.
-
t_bits_base::cl_mul(x,y)
Carry-less multiplication, lower bits of result.
Invertible by cl_mul_inv for odd numbers.
-
t_bits_base::cl_mul_inv(x)
Calculate carry-less multiplicative inverse,
i.e. cl_mul(cl_mul_inv(x),x) == 1.
The result is defined for all odd values of x.
For even x we simply return 0.
-
t_bits_base::cl_power(x,y)
x to the y'th via cl_mul.
For negative y see cl_mul_inv.
Cyclic carry-less multiplication
Here is funny idea I've never seen elsewhere:
"Cyclic carry-less multiplication" (ccl_mul).
The properties are quite similar to the usual carry-less multiplication.
Properties (^ is the XOR operator; * is the ccl_mul operator):
- Associative: a^(b^c) = (a^b)^c, a*(b*c) = (a*b)*c
- Commutative: a^b = b^a, a*b = b*a
- Distributive: a*(b^c) = (a*b)^(a*c), (a^b)*c = (a*c)^(b*c)
- Neutral element: 0^a = a^0 = a, 1*a = a*1 = a
- Zero element: 0*a = a*0 = 0
- ^ is always invertible: a^a = 0
- * is invertible for odd popcnt: x*inv(x) = inv(x)*x = 1
- Rotate function: rot(a,x)*rot(b,y) = rot(a*b,x+y)
ccl_mul and xor form a commutative ring.
I've implemented the inverse function by a bit matrix inversion
by Gauß/Jordan elimination.
Not nice, not fast, but feasible.
-
t_bits_base::ccl_mul(x,y)
Cyclic carry-less multiplication.
Invertible by ccl_mul_inv for numbers with odd parity.
-
t_bits_base::ccl_mul_inv(x)
Calculate cyclic carry-less multiplicative inverse,
i.e. ccl_mul(x, ccl_mul_inv(x))==1.
The result is defined for all values of x with odd popcount.
For x with even popcount we simply return 0.
-
t_bits_base::ccl_power(x,y)
x to the y'th via ccl_mul.
For negative y see ccl_mul_inv.
Operations on low bits with specified bit count
-
t_bits_base::mask_ex(b)
Calculate the mask for b bits to zero out the remaining bits.
-
t_bits_base::ror_ex(x,r,b)
Rotate right low b bits of x by r;
remaining bits are zeroed out.
= rol_ex(x,-r,b);
-
t_bits_base::rol_ex(x,r,b)
Rotate left low b bits of x by r;
remaining bits are zeroed out.
= ror_ex(x,-r,b);
-
t_bits_base::shr_ex(x,r,b)
Shift right low b bits of x by r;
remaining bits are zeroed out.
-
t_bits_base::shl_ex(x,r,b)
Shift left low b bits of x by r;
remaining bits are zeroed out.
-
t_bits_base::sar_ex(x,r,b)
Arithmetically (duplicating the MSB) shift right low b bits of x by r;
remaining bits are sign extended.
-
t_bits_base::sal_ex(x,r,b)
Shift left low b bits of x by r duplicating the LSB;
remaining bits are zeroed out.
-
t_bits_base::shr1_ex(x,r,b)
Shift right low b bits of x by r shifting in 1 bits;
remaining bits are zeroed out.
-
t_bits_base::shl1_ex(x,r,b)
Shift left low b bits of x by r shifting in 1 bits;
remaining bits are zeroed out.
-
t_bits_base::ccl_mul_ex(x,y,b)
Cyclic carry-less multiplication.
Invertible by ccl_mul_inv_ex for numbers with odd parity.
-
t_bits_base::ccl_mul_inv_ex(x,b)
Calculate cyclic carry-less multiplicative inverse,
i.e. ccl_mul(x, ccl_mul_inv(x))==1.
The result is defined for all values of x with odd popcount.
For x with even popcount we simply return 0.
-
t_bits_base::ccl_power_ex(x,y,b)
x to the y'th via ccl_mul.
For negative y see ccl_mul_inv_ex.
t_simd_base
Base implementation for SWAR operations using t_bits.
The base type is splitted in subwords as described by
the parameter sw.
Basic functions
-
t_simd_base::constant(sw,c)
Fill every subword with c, i.e. broadcast.
Also implementable by inductive doubling.
-
t_simd_base::element_mask(sw,i)
Create a mask for the subword #i.
-
t_simd_base::extract(sw,x,i)
Extract the subword #i of x.
-
t_simd_base::implant(sw,x,e,i)
Implant e into the subword #i of x.
Count bits/elements
-
t_simd_base::nr_1bits(sw,x)
Count the number of set bits.
Equivalent to loop i in 0..sw-1: hadd_u(sw, x)
-
t_simd_base::nr_0bits(sw,x)
Count the reset bits
-
t_simd_base::nr_trailing_0bits(sw,x)
Count the number of trailing zero bits.
~x&(x-1): smear least significant 1 bit to the left.
This is simpler than the parallel prefix method.
-
t_simd_base::nr_trailing_1bits(sw,x)
Number of contiguous least significant set bits
-
t_simd_base::nr_leading_0bits(sw,x)
Count the number of leading zero bits.
-
t_simd_base::nr_leading_1bits(sw,x)
Number of contiguous most significant set bits
-
t_simd_base::contains_0(sw,x)
True iff there is at least one subword with the value 0.
See http://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
See Paul Curtis, 2004-12-15 in http://hackersdelight.org/corres.txt
See also http://9.douban.com/subject/9065343/
See discussion on
http://stackoverflow.com/questions/20021066/
how-the-glibc-strlen-implementation-works
Thereby we can also find the lowest zero subword.
Not suitable to find all zeros, i.e. this can not replace eq0
(subwords left to the lowest zero might be flagged as zero).
-
t_simd_base::nr_trailing_ne0(sw,x)
See http://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
See Paul Curtis, 2004-12-15 in http://hackersdelight.org/corres.txt
See contains_0.
-
t_simd_base::is_parity_odd(x)
Count the set bits and return whether this is an odd number
by setting mask.
-
t_simd_base::is_parity_oddl(x)
Count the set bits and return whether this is an odd number
by setting low bit.
-
t_simd_base::is_parity_oddh(x)
Count the set bits and return whether this is an odd number
by setting high bit.
Math
Wrap around, ignore overflows
-
t_simd_base::add(sw,x,y)
Add with wrap-around x+y.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
-
t_simd_base::sub(sw,x,y)
Subtract with wrap-around x-y.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
-
t_simd_base::add1(sw,x,y)
Add with wrap-around x+y+1.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
-
t_simd_base::sub1(sw,x,y)
Subtract with wrap-around x-y-1.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
-
t_simd_base::addc(sw,x,y,carry)
Add with wrap-around x+y+carry.
Only least significant bits of carry are used, i.e. carry may be 0 or 1.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
-
t_simd_base::subc(sw,x,y,carry)
Subtract with wrap-around x-y-carry.
Only least significant bits of carry are used, i.e. carry may be 0 or 1.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
-
t_simd_base::inc(sw,x)
Increment with wrap-around.
=add(sw, x, constant(sw, 1))
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
-
t_simd_base::dec(sw,x)
Decrement with wrap-around.
=sub(sw, x, constant(sw, 1))
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
-
t_simd_base::neg(sw,x)
Negate with wrap-around.
=sub(sw, 0, x)
Saturate overflows
-
t_simd_base::add_sat_u(sw,x,y)
Unsigned saturated add x+y.
See Falk Hueffner 2004-01-23 "Multibyte (or SIMD) arithmetic"
in http://hackersdelight.org/corres.txt
-
t_simd_base::sub_sat_u(sw,x,y)
Unsigned saturated subtract x-y.
See Falk Hueffner 2004-01-23 "Multibyte (or SIMD) arithmetic"
in http://hackersdelight.org/corres.txt
-
t_simd_base::add1_sat_u(sw,x,y)
Unsigned saturated add x+y+1.
See Falk Hueffner 2004-01-23 "Multibyte (or SIMD) arithmetic"
in http://hackersdelight.org/corres.txt
-
t_simd_base::sub1_sat_u(sw,x,y)
Unsigned saturated subtract x-y-1.
See Falk Hueffner 2004-01-23 "Multibyte (or SIMD) arithmetic"
in http://hackersdelight.org/corres.txt
-
t_simd_base::addc_sat_u(sw,x,y,carry)
Unsigned saturated add x+y+carry.
Only least significant bits of carry are used, i.e. carry may be 0 or 1.
See Falk Hueffner 2004-01-23 "Multibyte (or SIMD) arithmetic"
in http://hackersdelight.org/corres.txt
-
t_simd_base::subc_sat_u(sw,x,y,carry)
Unsigned saturated subtract x-y-carry.
Only least significant bits of carry are used, i.e. carry may be 0 or 1.
See Falk Hueffner 2004-01-23 "Multibyte (or SIMD) arithmetic"
in http://hackersdelight.org/corres.txt
-
t_simd_base::add_sat_s(sw,x,y)
Signed saturated add x+y.
See Falk Hueffner 2004-01-23 "Multibyte (or SIMD) arithmetic"
in http://hackersdelight.org/corres.txt
-
t_simd_base::sub_sat_s(sw,x,y)
Signed saturated sub x-y.
Derived from add_sat_s.
-
t_simd_base::add1_sat_s(sw,x,y)
Signed saturated add x+y+1.
See Falk Hueffner 2004-01-23 "Multibyte (or SIMD) arithmetic"
in http://hackersdelight.org/corres.txt
-
t_simd_base::sub1_sat_s(sw,x,y)
Signed saturated sub x-y-1.
Derived from add_sat_s.
-
t_simd_base::addc_sat_s(sw,x,y,carry)
Signed saturated add x+y+carry.
Only least significant bits of carry are used, i.e. carry may be 0 or 1.
See Falk Hueffner 2004-01-23 "Multibyte (or SIMD) arithmetic"
in http://hackersdelight.org/corres.txt
-
t_simd_base::subc_sat_s(sw,x,y,carry)
Signed saturated sub x-y-carry.
Only least significant bits of carry are used, i.e. carry may be 0 or 1.
Derived from add_sat_s.
-
t_simd_base::neg_sat_s(sw,x)
Signed saturated negate;
the only value which will be saturated is the
smallest possible signed number; see abs for a similar problem.
=sub_sat_s(sw,0,x).
Detect overflows
-
t_simd_base::overflow_add_u(sw,x,y)
Detect overflow in unsigned addition x+y.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
See HD 2.13 "Overflow Detection" (Unsigned Add/Subtract)
-
t_simd_base::overflow_sub_u(sw,x,y)
Detect overflow in unsigned subtraction x-y.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
See HD 2.13 "Overflow Detection" (Unsigned Add/Subtract)
-
t_simd_base::overflow_add_s(sw,x,y)
Detect overflow in signed addition x+y.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
See HD 2.13 "Overflow Detection" (Signed Add/Subtract)
-
t_simd_base::overflow_sub_s(sw,x,y)
Detect overflow in signed subtraction x-y.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
See HD 2.13 "Overflow Detection" (Signed Add/Subtract)
-
t_simd_base::overflow_add1_u(sw,x,y)
Detect overflow in unsigned addition x+y+1.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
See HD 2.13 "Overflow Detection" (Unsigned Add/Subtract)
-
t_simd_base::overflow_sub1_u(sw,x,y)
Detect overflow in unsigned subtraction x-y-1.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
See HD 2.13 "Overflow Detection" (Unsigned Add/Subtract)
-
t_simd_base::overflow_add1_s(sw,x,y)
Detect overflow in signed addition x+y+1.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
See HD 2.13 "Overflow Detection" (Signed Add/Subtract)
-
t_simd_base::overflow_sub1_s(sw,x,y)
Detect overflow in signed subtraction x-y-1.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
See HD 2.13 "Overflow Detection" (Signed Add/Subtract)
-
t_simd_base::overflow_addc_u(sw,x,y,carry)
Detect overflow in unsigned addition x+y+carry.
Only least significant bits of carry are used, i.e. carry may be 0 or 1.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
See HD 2.13 "Overflow Detection" (Unsigned Add/Subtract)
-
t_simd_base::overflow_subc_u(sw,x,y,carry)
Detect overflow in unsigned subtraction x-y-carry.
Only least significant bits of carry are used, i.e. carry may be 0 or 1.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
See HD 2.13 "Overflow Detection" (Unsigned Add/Subtract)
-
t_simd_base::overflow_addc_s(sw,x,y,carry)
Detect overflow in signed addition x+y+carry.
Only least significant bits of carry are used, i.e. carry may be 0 or 1.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
See HD 2.13 "Overflow Detection" (Signed Add/Subtract)
-
t_simd_base::overflow_subc_s(sw,x,y,carry)
Detect overflow in signed subtraction x-y-carry.
Only least significant bits of carry are used, i.e. carry may be 0 or 1.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
See HD 2.13 "Overflow Detection" (Signed Add/Subtract)
Averages
-
t_simd_base::avgd_u(sw,x,y)
Unsigned average rounding down, i.e. (x+y)>>1.
See Falk Hueffner 2004-01-23 "Multibyte (or SIMD) arithmetic"
in http://hackersdelight.org/corres.txt
-
t_simd_base::avgu_u(sw,x,y)
Unsigned average rounding up, i.e. (x+y+1)>>1.
See Falk Hueffner 2004-01-23 "Multibyte (or SIMD) arithmetic"
in http://hackersdelight.org/corres.txt
-
t_simd_base::avgf_s(sw,x,y)
Signed average rounding toward -∞.
-
t_simd_base::avgc_s(sw,x,y)
Signed average rounding toward ∞.
Miscellaneous
-
t_simd_base::abs_diff_u(sw,x,y)
abs(x-y) for unsigned values.
=sub_sat_u(sw, x, y)+sub_sat_u(sw, y, x);
Alternative: ((x-y) xor (x<y)) - (x<y).
-
t_simd_base::abs_diff_s(sw,x,y)
abs(x-y) for signed values resulting in unsigned values.
-
t_simd_base::sgn_diff_u(sw,x,y)
sgn(x-y) for unsigned values resulting in unsigned values.
-
t_simd_base::sgn_diff_s(sw,x,y)
sgn(x-y) for signed values.
-
t_simd_base::smul_lo(sw,c,x)
Multiply c with subwords of x yielding the lower bits (scalar multiplication).
Suitable for signed and unsigned values.
-
t_simd_base::smul_hi_u(sw,c,x)
Multiply c with subwords of x yielding the upper bits (scalar multiplication).
Unsigned values.
-
t_simd_base::smul_hi_s(sw,c,x)
Multiply c with subwords of x yielding the upper bits (scalar multiplication).
Signed values.
-
t_simd_base::smule_u(sw,c,x)
Multiply c with even subwords (#0, #2, #4, ...) of x (scalar multiplication).
Suitable for unsigned values.
sw must not be the maximum value, i.e. there must be at least 2 subwords.
-
t_simd_base::smule_s(sw,c,x)
Multiply c with even subwords (#0, #2, #4, ...) of x (scalar multiplication).
Suitable for signed values.
sw must not be the maximum value, i.e. there must be at least 2 subwords.
-
t_simd_base::mul_lo(sw,x,y)
Multiply corresponding subwords of x and y yielding the lower bits.
Suitable for signed and unsigned values.
Stupid emulation due to absence of a matching instruction/operation.
Emulation of the multiplication usually
is far too slow (O(subword size)).
Comparisons
Comparisons returning MSB
-
t_simd_base::eq0h(sw,x)
Set MSB if x==0.
See HD 6.1 "Find First 0-Byte"
-
t_simd_base::ne0h(sw,x)
Set MSB if x!=0.
-
t_simd_base::eqh(sw,x,y)
Set MSB if x==y.
-
t_simd_base::neh(sw,x,y)
Set MSB if x!=y.
-
t_simd_base::leh_u(sw,x,y)
Set MSB if x≤y (unsigned).
HD 6.1 "Searching for a Value in a Given Range"
-
t_simd_base::lth_u(sw,x,y)
Set MSB if x<y (unsigned).
-
t_simd_base::geh_u(sw,x,y)
Set MSB if x≥y (unsigned).
-
t_simd_base::gth_u(sw,x,y)
Set MSB if x>y (unsigned).
-
t_simd_base::lt0h_s(sw,x)
Set MSB if x<0 (signed), i.e. propagate high (sign) bit.
-
t_simd_base::leh_s(sw,x,y)
Set MSB if x≤y (signed).
-
t_simd_base::lth_s(sw,x,y)
Set MSB if x<y (signed).
-
t_simd_base::geh_s(sw,x,y)
Set MSB if x≥y (signed).
-
t_simd_base::gth_s(sw,x,y)
Set MSB if x>y (signed).
-
t_simd_base::oddh(sw,x)
Set MSB for odd values, i.e. propagate low bit to the left.
Comparisons returning LSB
-
t_simd_base::eq0l(sw,x)
Set LSB if x==0.
See HD 6.1 "Find First 0-Byte"
-
t_simd_base::ne0l(sw,x)
Set LSB if x!=0.
-
t_simd_base::eql(sw,x,y)
Set LSB if x==y.
-
t_simd_base::nel(sw,x,y)
Set LSB if x!=y.
-
t_simd_base::lel_u(sw,x,y)
Set LSB if x≤y (unsigned).
HD 6.1 "Searching for a Value in a Given Range"
-
t_simd_base::ltl_u(sw,x,y)
Set LSB if x<y (unsigned).
-
t_simd_base::gel_u(sw,x,y)
Set LSB if x≥y (unsigned).
-
t_simd_base::gtl_u(sw,x,y)
Set LSB if x>y (unsigned).
-
t_simd_base::lt0l_s(sw,x)
Set LSB if x<0 (signed), i.e. propagate high (sign) bit.
-
t_simd_base::lel_s(sw,x,y)
Set LSB if x≤y (signed).
-
t_simd_base::ltl_s(sw,x,y)
Set LSB if x<y (signed).
-
t_simd_base::gel_s(sw,x,y)
Set LSB if x≥y (signed).
-
t_simd_base::gtl_s(sw,x,y)
Set LSB if x>y (signed).
-
t_simd_base::oddl(sw,x)
Set LSB for odd values, i.e. propagate low bit to the left.
Comparisons returning masks
-
t_simd_base::eq0(sw,x)
Set mask denoting x==0.
See HD 6.1 "Find First 0-Byte"
-
t_simd_base::ne0(sw,x)
Set mask denoting x!=0.
-
t_simd_base::eq(sw,x,y)
Set mask denoting x==y.
-
t_simd_base::ne(sw,x,y)
Set mask denoting x!=y.
-
t_simd_base::le_u(sw,x,y)
Set mask denoting x≤y (unsigned).
HD 6.1 "Searching for a Value in a Given Range"
-
t_simd_base::lt_u(sw,x,y)
Set mask denoting x<y (unsigned).
-
t_simd_base::ge_u(sw,x,y)
Set mask denoting x≥y (unsigned).
-
t_simd_base::gt_u(sw,x,y)
Set mask denoting x>y (unsigned).
-
t_simd_base::lt0_s(sw,x)
Set mask denoting x<0 (signed), i.e. propagate high (sign) bit.
-
t_simd_base::le_s(sw,x,y)
Set mask denoting x≤y (signed).
-
t_simd_base::lt_s(sw,x,y)
Set mask denoting x<y (signed).
-
t_simd_base::ge_s(sw,x,y)
Set mask denoting x≥y (signed).
-
t_simd_base::gt_s(sw,x,y)
Set mask denoting x>y (signed).
-
t_simd_base::odd(sw,x)
Set mask denoting odd values, i.e. propagate low bit to the left.
Other operations
-
t_simd_base::max0_s(sw,x)
Clamp negative signed numbers to 0.
=max_s(sw, x, 0)
-
t_simd_base::max_u(sw,x,y)
Unsigned maximum.
=add(sw, sub_sat_u(sw, x, y), y);
HD 2.19 "Doz, Max, Min"
-
t_simd_base::min_u(sw,x,y)
Unsigned minimum.
HD 2.19 "Doz, Max, Min"
-
t_simd_base::max_s(sw,x,y)
Signed maximum.
-
t_simd_base::min_s(sw,x,y)
Signed minimum.
-
t_simd_base::abs(sw,x)
Absolute values.
The result contains unsigned values,
however smallest possible signed number remains as is;
the similar function nabs does not have this problem.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
-
t_simd_base::nabs(sw,x)
Negated absolute values, i.e. -abs(x).
The result contains signed values.
See HD 2.18 "Multibyte Add, Subtract, Absolute Value"
-
t_simd_base::sgn(sw,x)
Sign of x, i.e. <0: -1, =0: 0, >0: 1.
See eq0.
See HD 6.1 "Find First 0-Byte"
-
t_simd_base::sgn_sat(sw,x)
Sign of x enlarged to saturate.
<0: minimum value, =0: 0, >0: maximum value
Shift by common amount
-
t_simd_base::shl(sw,x,shift)
Shift left, shifting in 0 bits; also known as logical shift.
Gives correct results for all values of shift;
please note that shift is unsigned.
-
t_simd_base::shr(sw,x,shift)
Shift right, shifting in 0 bits; also known as logical shift.
Gives correct results for all values of shift;
please note that shift is unsigned.
-
t_simd_base::sal(sw,x,shift)
Shift left, duplicating the least significant bit.
Gives correct results for all values of shift;
please note that shift is unsigned.
-
t_simd_base::sar(sw,x,shift)
Arithmetically shift right, duplicating the most significant (sign) bit.
Gives correct results for all values of shift;
please note that shift is unsigned.
This is also known as arithmetic shift.
-
t_simd_base::shl1(sw,x,shift)
Shift left, shifting in 1 bits.
Gives correct results for all values of shift;
please note that shift is unsigned.
-
t_simd_base::shr1(sw,x,shift)
Shift right, shifting in 1 bits.
Gives correct results for all values of shift;
please note that shift is unsigned.
-
t_simd_base::shl_sat_s(sw,x,shift)
-
t_simd_base::shl_sat_u(sw,x,shift)
-
t_simd_base::sal_sat_s(sw,x,shift)
-
t_simd_base::sal_sat_u(sw,x,shift)
-
t_simd_base::shl1_sat_s(sw,x,shift)
-
t_simd_base::shl1_sat_u(sw,x,shift)
Rotate by common amount
-
t_simd_base::rol(sw,x,rot)
Every (1 << (t_int)(sw)) bits are rotated left.
Gives correct results for all values of rot.
x: value
sw: log_2(#bits), must be ≤ld_bits
rot: rotate count
-
t_simd_base::ror(sw,x,rot)
Every (1 << (t_int)(sw)) bits are rotated right.
Gives correct results for all values of rot.
x: value
sw: log_2(#bits), must be ≤ld_bits
rot: rotate count
-
t_simd_base::rolc(sw,x,rot)
Every (1 << (t_int)(sw)) bits are rotated left
complementing the shifted in bits.
Gives correct results for all values of rot.
x: value
sw: log_2(#bits), must be ≤ld_bits
rot: rotate count
-
t_simd_base::rorc(sw,x,rot)
Every (1 << (t_int)(sw)) bits are rotated right
complementing the shifted in bits.
Gives correct results for all values of rot.
x: value
sw: log_2(#bits), must be ≤ld_bits
rot: rotate count
-
t_simd_base::rold(sw,l,h,rot)
Rotate every (1 << (t_int)(sw)) bits of l and h
where the bit groups of l and h are considered to be one group.
This can be used to implement many kinds of shift and rotate.
Gives correct results for all values of rot.
l, h: value
sw: log_2(#bits), must be ≤ld_bits
rot: rotate count
Example: l=hgfedcba, h=HGFEDCBA, sw=2, rot=1: gfeHcbaD
-
t_simd_base::rord(sw,l,h,rot)
Rotate every (1 << (t_int)(sw)) bits of l and h
where the bit groups of l and h are considered to be one group.
This can be used to implement many kinds of shift and rotate.
Gives correct results for all values of rot.
l, h: value
sw: log_2(#bits), must be ≤ld_bits
rot: rotate count
-
t_simd_base::do_rold(sw,l,h,rot)
Variant of rold modifying *l and *h.
Rotate every (1 << (t_int)(sw)) bits of *l and *h
where the bit groups of l and h are considered to be one group.
This can be used to implement many kinds of shift and rotate.
Gives correct results for all values of rot.
*l, *h: value
sw: log_2(#bits), must be ≤ld_bits
rot: rotate count
Example: l=hgfedcba, h=HGFEDCBA, sw=2, rot=1: gfeHcbaD
-
t_simd_base::do_rord(sw,l,h,rot)
Variant of rord modifying *l and *h.
Rotate every (1 << (t_int)(sw)) bits of *l and *h
where the bit groups of l and h are considered to be one group.
This can be used to implement many kinds of shift and rotate.
Gives correct results for all values of rot.
*l, *h: value
sw: log_2(#bits), must be ≤ld_bits
rot: rotate count
Shift by fieldwise specified amount
-
t_simd_base::vshl(sw,x,shift)
Field variable variant of shl.
Gives correct results for all values of shift;
please note that shift values are treated as unsigned.
-
t_simd_base::vshr(sw,x,shift)
Field variable variant of shr.
Gives correct results for all values of shift;
please note that shift values are treated as unsigned.
-
t_simd_base::vsar(sw,x,shift)
Field variable variant of sar.
Gives correct results for all values of shift;
please note that shift values are treated as unsigned.
-
t_simd_base::vsal(sw,x,shift)
Field variable variant of sal.
Gives correct results for all values of shift;
please note that shift values are treated as unsigned.
-
t_simd_base::vshl1(sw,x,shift)
Field variable variant of shl1.
Gives correct results for all values of shift;
please note that shift values are treated as unsigned.
-
t_simd_base::vshr1(sw,x,shift)
Field variable variant of shr1.
Gives correct results for all values of shift;
please note that shift values are treated as unsigned.
-
t_simd_base::vshl_sat_s(sw,x,shift)
-
t_simd_base::vshl_sat_u(sw,x,shift)
-
t_simd_base::vsal_sat_s(sw,x,shift)
-
t_simd_base::vsal_sat_u(sw,x,shift)
-
t_simd_base::vshl1_sat_s(sw,x,shift)
-
t_simd_base::vshl1_sat_u(sw,x,shift)
Rotate by fieldwise specified amount
-
t_simd_base::vrol(sw,x,rot)
Field variable variant of rol.
Gives correct results for all values of rot.
-
t_simd_base::vror(sw,x,rot)
Field variable variant of ror.
Gives correct results for all values of rot.
-
t_simd_base::vrolc(sw,x,rot)
Field variable variant of rolc.
Gives correct results for all values of rot.
-
t_simd_base::vrorc(sw,x,rot)
Field variable variant of rorc.
Gives correct results for all values of rot.
-
t_simd_base::vrold(sw,l,h,rot)
Field variable variant of rold.
Gives correct results for all values of rot.
-
t_simd_base::vrord(sw,l,h,rot)
Field variable variant of rord.
Gives correct results for all values of rot.
-
t_simd_base::do_vrold(sw,l,h,rot)
Variant of vrold modifying *l and *h.
Gives correct results for all values of rot.
-
t_simd_base::do_vrord(sw,l,h,rot)
Variant of vrord modifying *l and *h.
Gives correct results for all values of rot.
Special functions
-
t_simd_base::gray_code(sw,x)
See Hacker's Delight, 13.1 "Gray Code"
x ^ (x >> 1)
-
t_simd_base::inv_gray_code(sw,x)
See Hacker's Delight, 13.1 "Gray Code"
-
t_simd_base::tbm(sw,x,mode)
General trailing bit modification operations.
Mode: The operating mode, see t_tbm_mode.
For whole words:
All of these operations can be performed by ≤ 3 instructions.
Some of these operations are realized as BMI1 or TBM instructions.
Horizontal math
-
t_simd_base::hand(sw,x)
AND of neighboring subwords.
When processing a loop i in 0..(t_int)(sw): subword == -1.
-
t_simd_base::hor(sw,x)
OR of neighboring subwords.
When processing a loop i in 0..(t_int)(sw): subword != 0.
-
t_simd_base::hxor(sw,x)
XOR of neighboring subwords.
When processing a loop i in 0..(t_int)(sw): Gray code / Parity.
-
t_simd_base::hadd_u(sw,x)
Sum of neighboring unsigned subwords.
See nr_1bits.
-
t_simd_base::hadd_s(sw,x)
Sum of neighboring signed subwords.
-
t_simd_base::hsub_u(sw,x)
Difference between neighboring unsigned subwords giving signed results.
Calculate lower - higher subword for all pairs.
-
t_simd_base::hsub_s(sw,x)
Difference between neighboring signed subwords giving signed results.
Calculate lower - higher subword for all pairs.
-
t_simd_base::havgd_u(sw,x)
Unsigned average rounding down,
i.e. (x+y)>>1, of neighboring unsigned subwords.
-
t_simd_base::havgu_u(sw,x)
Unsigned average rounding up,
i.e. (x+y+1)>>1, of neighboring unsigned subwords.
-
t_simd_base::hmax_u(sw,x)
Maximum between neighboring unsigned subwords.
HD 2.19 "Doz, Max, Min"
max(a, b) = max(0, a-b)+b
-
t_simd_base::hmin_u(sw,x)
Minimum between neighboring unsigned subwords.
HD 2.19 "Doz, Max, Min"
min(a, b) = a-max(0, a-b)
-
t_simd_base::hmax_s(sw,x)
Maximum between neighboring signed subwords.
HD 2.19 "Doz, Max, Min"
max(a, b) = max(0, a-b)+b
-
t_simd_base::hmin_s(sw,x)
Minimum between neighboring signed subwords.
HD 2.19 "Doz, Max, Min"
min(a, b) = a-max(0, a-b)
-
t_simd_base::hmovzx(sw,x)
Zero extend even (lower) elements.
-
t_simd_base::hmovsx(sw,x)
Sign extend even (lower) elements.
-
t_simd_base::hmul_u(sw,x)
Product of neighboring unsigned subwords.
Stupid emulation due to absence of a matching instruction/operation.
-
t_simd_base::hmul_s(sw,x)
Product of neighboring signed subwords.
Stupid emulation due to absence of a matching instruction/operation.
Shuffle operations
-
t_simd_base::butterfly(sw,x,m)
One butterfly step/stage.
sw: 0..ld_bits-1
m & a_bfly_mask[sw] should be == m
-
t_simd_base::bit_index_complement(x,k)
See ARM System Developer's Guide, 7.6.2 "Bit Permutations"
as used in the loop of general_reverse_bits.
-
t_simd_base::bit_index_swap(x,j,k)
See ARM System Developer's Guide, 7.6.2 "Bit Permutations"
See shuffle, unshuffle
-
t_simd_base::bit_index_swap_complement(x,j,k)
See ARM System Developer's Guide, 7.6.2 "Bit Permutations"
-
t_simd_base::bit_index_ror(x,rot,ofs,field)
Rotate an bit index field to the right by rot
by executing the necessary calls to bit_index_swap.
q+field+ofs=ld_bits
q: upper bit indexes (unchanged)
field: size of the affected bit string
ofs: lower bit indexes (unchanged)
-
t_simd_base::transpose(x,ld_fields,ld_col,ld_row)
Transpose bit matrixes.
ld_fields: width of the bit fields
ld_col: ld(bit columns)
ld_row: ld(bit rows)
-
t_simd_base::shuffle_power(x,sw1,sw2,pwr)
pwr times shuffle(x, sw1, sw2)
See Hacker's Delight, 7.6/7.8 "Rearrangements and Index Transformations"
-
t_simd_base::unshuffle_power(x,sw1,sw2,pwr)
pwr times unshuffle(x, sw1, sw2)
See Hacker's Delight, 7.6/7.8 "Rearrangements and Index Transformations"
-
t_simd_base::general_reverse_bits(x,k)
Swap all subwords of given levels.
See Hacker's Delight, 7.1 "Generalized Bit Reversal"
k: set of t_subword, i.e. one bit per subword size.
This is a candidate for a new instruction.
-
t_simd_base::bswap(x)
Exchange byte order.
This can be expressed in assembler:
bits = 8: n/a
bits = 16: "xchg al, ah" or "rol ax, 16"
bits = 32: "bswap eax"
bits = 64: "bswap rax"
bits = 128: "xchg rax, rdx; bswap rax; bswap rdx"
-
t_simd_base::prim_swap(x,m)
Swap by primitives.
Generalization of general_reverse_bits.
See "Matters Computational" by Joerg Arndt, "A restricted method"
See Hacker's Delight, 7.1 "Generalized Bit Reversal"
-
t_simd_base::shuffle(x,sw1,sw2)
Shuffle/zip/interlace entities.
See Hacker's Delight, 7.2 "Shuffling Bits"
sw1: log_2(subword_length): entities to move
sw2: log_2(word_length): moving area
0 ≤ sw1 < sw2 ≤ ld_bits
Example: sw1=2, sw2=5: Shuffle nibbles in dword
= shuffle_power(x, sw1, sw2, 1)
-
t_simd_base::unshuffle(x,sw1,sw2)
Unshuffle/unzip/uninterlace entities.
See Hacker's Delight, 7.2 "Shuffling Bits"
sw1: log_2(subword_length): entities to move
sw2: log_2(word_length): moving area
0 ≤ sw1 < sw2 ≤ ld_bits
Example: sw1=0, sw2=3: Unshuffle bits in bytes
= unshuffle_power(x, sw1, sw2, 1)
CEF operations (via aux object)
-
t_simd_base::compress_flip_right(sw,x,m)
-
t_simd_base::expand_flip_right(sw,x,m)
-
t_simd_base::compress_flip_left(sw,x,m)
-
t_simd_base::expand_flip_left(sw,x,m)
CE operations (via aux object)
-
t_simd_base::compress_right(sw,x,m)
-
t_simd_base::expand_right(sw,x,m)
-
t_simd_base::compress_left(sw,x,m)
-
t_simd_base::expand_left(sw,x,m)
t_bfly_base
This structure is used to hold the configuration of
butterfly-based operations as well as compress and expand.
-
t_bfly_base::bfly(x)
Apply butterfly configured network on x
-
t_bfly_base::ibfly(x)
Apply inverse butterfly configured network on x
-
t_bfly_base::bfly_parity()
Return the parity of a permutation given by a butterfly network.
This is false for even parity and true for odd parity.
t_cef_base
Compress/Expand flip left/right.
-
t_cef_base::gen_cef_right(sw,m)
Scatter/gather-flip, compress/expand+flip.
Generate configuration for [inverse] butterfly network.
To compress use ibfly.
To expand use bfly.
-
t_cef_base::gen_cef_left(sw,m)
Scatter/gather-flip, compress/expand+flip.
Generate configuration for [inverse] butterfly network.
To compress use ibfly.
To expand use bfly.
-
t_cef_base::gen_right(sw,m)
-
t_cef_base::gen_left(sw,m)
-
t_cef_base::compress(x)
-
t_cef_base::expand(x)
t_ce_right_base
Compress/Expand right, rest filled with 0.
-
t_ce_right_base::gen_ce_right(sw,m)
See Hacker's Delight, 7.4 "Compress, or Generalized Extract"
See Hacker's Delight, 7.5 "Expand, or Generalized Insert"
To compress use apply_compress_right.
To expand use apply_expand_right.
-
t_ce_right_base::apply_compress_right(x)
See Hacker's Delight, 7.4 "Compress, or Generalized Extract"
This should be configured by gen_ce_right.
-
t_ce_right_base::apply_expand_right(x)
See Hacker's Delight, 7.4 "Compress, or Generalized Extract"
See Hacker's Delight, 7.5 "Expand, or Generalized Insert"
This should be configured by gen_ce_right.
(a & b) | (~a & c) See ((b ^ c) & a) ^ c
-
t_ce_right_base::gen(sw,m)
-
t_ce_right_base::compress(x)
-
t_ce_right_base::expand(x)
t_ce_left_base
Compress/Expand left, rest filled with 0.
-
t_ce_left_base::gen_ce_left(sw,m)
See Hacker's Delight, 7.4 "Compress, or Generalized Extract"
See Hacker's Delight, 7.5 "Expand, or Generalized Insert"
To compress use apply_compress_left.
To expand use apply_expand_left.
-
t_ce_left_base::apply_compress_left(x)
See Hacker's Delight, 7.4 "Compress, or Generalized Extract"
This should be configured by gen_ce_left.
-
t_ce_left_base::apply_expand_left(x)
See Hacker's Delight, 7.4 "Compress, or Generalized Extract"
See Hacker's Delight, 7.5 "Expand, or Generalized Insert"
This should be configured by gen_ce_left.
(a & b) | (~a & c) See ((b ^ c) & a) ^ c
-
t_ce_left_base::gen(sw,m)
-
t_ce_left_base::compress(x)
-
t_ce_left_base::expand(x)
t_vrot_base
Faster variant of t_simd_base::vrol/vror
when using the same rotate counts repeatedly.
-
t_vrot_base::gen_vrol(sw,rot)
Field variable rotate.
Generate configuration for [inverse] butterfly network.
Simulate rolc for every subword of the masks.
-
t_vrot_base::vrol(x)
-
t_vrot_base::vror(x)
Index
Here is a list of words, types and functions
and where you find them.
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
- abs, t_simd_base::
- abs_diff_s, t_simd_base::
- abs_diff_u, t_simd_base::
- add1, t_simd_base::
- add1_sat_s, t_simd_base::
- add1_sat_u, t_simd_base::
- add, t_simd_base::
- add_sat_s, t_simd_base::
- add_sat_u, t_simd_base::
- addc, t_simd_base::
- addc_sat_s, t_simd_base::
- addc_sat_u, t_simd_base::
- apply_compress_left, t_ce_left_base::
- apply_compress_right, t_ce_right_base::
- apply_expand_left, t_ce_left_base::
- apply_expand_right, t_ce_right_base::
- avgc_s, t_simd_base::
- avgd_u, t_simd_base::
- avgf_s, t_simd_base::
- avgu_u, t_simd_base::
- bfly, t_bfly_base::
- bfly_parity, t_bfly_base::
- bit_index_complement, t_simd_base::
- bit_index_ror, t_simd_base::
- bit_index_swap, t_simd_base::
- bit_index_swap_complement, t_simd_base::
- bit_permute_step2, t_bits_base::
- bit_permute_step2_rot, t_bits_base::
- bit_permute_step, t_bits_base::
- bit_permute_step_rot, t_bits_base::
- bit_permute_step_simple, t_bits_base::
- bswap, t_simd_base::
- butterfly, t_simd_base::
- ccl_mul, t_bits_base::
- ccl_mul_ex, t_bits_base::
- ccl_mul_inv, t_bits_base::
- ccl_mul_inv_ex, t_bits_base::
- ccl_power, t_bits_base::
- ccl_power_ex, t_bits_base::
- cl_mat_inv, t_bits_base::
- cl_mul, t_bits_base::
- cl_mul_inv, t_bits_base::
- cl_power, t_bits_base::
- compress, t_cef_base::
- compress, t_ce_left_base::
- compress, t_ce_right_base::
- compress_flip_left, t_simd_base::
- compress_flip_right, t_simd_base::
- compress_left, t_simd_base::
- compress_right, t_simd_base::
- constant, t_simd_base::
- contains_0, t_simd_base::
- dec, t_simd_base::
- do_rold, t_simd_base::
- do_rord, t_simd_base::
- do_vrold, t_simd_base::
- do_vrord, t_simd_base::
- element_mask, t_simd_base::
- eq0, t_simd_base::
- eq0h, t_simd_base::
- eq0l, t_simd_base::
- eq, t_simd_base::
- eqh, t_simd_base::
- eql, t_simd_base::
- expand, t_cef_base::
- expand, t_ce_left_base::
- expand, t_ce_right_base::
- expand_flip_left, t_simd_base::
- expand_flip_right, t_simd_base::
- expand_left, t_simd_base::
- expand_right, t_simd_base::
- extract, t_simd_base::
- gen, t_ce_left_base::
- gen, t_ce_right_base::
- general_reverse_bits, t_simd_base::
- gen_cef_left, t_cef_base::
- gen_cef_right, t_cef_base::
- gen_ce_left, t_ce_left_base::
- gen_ce_right, t_ce_right_base::
- gen_left, t_cef_base::
- gen_right, t_cef_base::
- gen_vrol, t_vrot_base::
- ge_s, t_simd_base::
- ge_s, t_simd_base::
- ge_s, t_simd_base::
- ge_u, t_simd_base::
- ge_u, t_simd_base::
- ge_u, t_simd_base::
- gray_code, t_bits_base::
- gray_code, t_simd_base::
- gt_s, t_simd_base::
- gth_s, t_simd_base::
- gtl_s, t_simd_base::
- gt_u, t_simd_base::
- gth_u, t_simd_base::
- gtl_u, t_simd_base::
- hadd_s, t_simd_base::
- hadd_u, t_simd_base::
- hand, t_simd_base::
- havgd_u, t_simd_base::
- havgu_u, t_simd_base::
- hmax_s, t_simd_base::
- hmax_u, t_simd_base::
- hmin_s, t_simd_base::
- hmin_u, t_simd_base::
- hmovsx, t_simd_base::
- hmovzx, t_simd_base::
- hmul_s, t_simd_base::
- hmul_u, t_simd_base::
- hor, t_simd_base::
- hsub_s, t_simd_base::
- hsub_u, t_simd_base::
- hxor, t_simd_base::
- ibfly, t_bfly_base::
- implant, t_simd_base::
- inc, t_simd_base::
- inv_gray_code, t_bits_base::
- inv_gray_code, t_simd_base::
- isign, t_bits_base::
- is_contiguous_1bits, t_bits_base::
- is_parity_odd, t_bits_base::
- is_parity_oddh, t_simd_base::
- is_parity_oddl, t_simd_base::
- le_s, t_simd_base::
- leh_s, t_simd_base::
- lel_s, t_simd_base::
- le_u, t_simd_base::
- leh_u, t_simd_base::
- lel_u, t_simd_base::
- lt0_s, t_simd_base::
- lt0h_s, t_simd_base::
- lt0l_s, t_simd_base::
- lt_s, t_simd_base::
- lth_s, t_simd_base::
- ltl_s, t_simd_base::
- lt_u, t_simd_base::
- lth_u, t_simd_base::
- ltl_u, t_simd_base::
- mask_ex, t_bits_base::
- max0_s, t_simd_base::
- max_s, t_simd_base::
- max_u, t_simd_base::
- min_s, t_simd_base::
- min_u, t_simd_base::
- mul_inv, t_bits_base::
- mul_lo, t_simd_base::
- M_abs
- M_abs_diff
- M_gcd
- M_max
- M_min
- M_odd
- M_sgn
- M_sgn_diff
- nabs, t_simd_base::
- ne0, t_simd_base::
- ne0h, t_simd_base::
- ne0l, t_simd_base::
- ne, t_simd_base::
- neh, t_simd_base::
- nel, t_simd_base::
- neg, t_simd_base::
- neg_sat_s, t_simd_base::
- nr_0bits, t_bits_base::
- nr_0bits, t_simd_base::
- nr_1bits, t_bits_base::
- nr_1bits, t_simd_base::
- nr_leading_0bits, t_bits_base::
- nr_leading_0bits, t_simd_base::
- nr_leading_1bits, t_bits_base::
- nr_leading_1bits, t_simd_base::
- nr_trailing_0bits, t_bits_base::
- nr_trailing_0bits, t_simd_base::
- nr_trailing_1bits, t_bits_base::
- nr_trailing_1bits, t_simd_base::
- nr_trailing_ne0, t_simd_base::
- odd, t_simd_base::
- overflow_add1_s, t_simd_base::
- overflow_add1_u, t_simd_base::
- overflow_add_s, t_simd_base::
- overflow_add_u, t_simd_base::
- overflow_addc_s, t_simd_base::
- overflow_addc_u, t_simd_base::
- overflow_sub1_s, t_simd_base::
- overflow_sub1_u, t_simd_base::
- overflow_sub_s, t_simd_base::
- overflow_sub_u, t_simd_base::
- overflow_subc_s, t_simd_base::
- overflow_subc_u, t_simd_base::
- prim_swap, t_simd_base::
- rol, t_bits_base::
- rol, t_simd_base::
- rolc, t_simd_base::
- rold, t_simd_base::
- rol_ex, t_bits_base::
- ror, t_bits_base::
- ror, t_simd_base::
- rorc, t_simd_base::
- rord, t_simd_base::
- ror_ex, t_bits_base::
- sal, t_simd_base::
- sal_ex, t_bits_base::
- sal_sat_s, t_simd_base::
- sal_sat_u, t_simd_base::
- sar, t_simd_base::
- sar_ex, t_bits_base::
- sar_fast, t_bits_base::
- sar_safe, t_bits_base::
- sgn, t_simd_base::
- sgn_diff_s, t_simd_base::
- sgn_diff_u, t_simd_base::
- sgn_sat, t_simd_base::
- shl1, t_simd_base::
- shl1_ex, t_bits_base::
- shl1_sat_s, t_simd_base::
- shl1_sat_u, t_simd_base::
- shl, t_simd_base::
- shl_ex, t_bits_base::
- shl_fast, t_bits_base::
- shl_safe, t_bits_base::
- shl_safe, t_bits_base::
- shl_sat_s, t_simd_base::
- shl_sat_u, t_simd_base::
- shr1, t_simd_base::
- shr1_ex, t_bits_base::
- shr, t_simd_base::
- shr_ex, t_bits_base::
- shr_fast, t_bits_base::
- shr_safe, t_bits_base::
- shuffle, t_simd_base::
- shuffle_power, t_simd_base::
- sign_extend, t_bits_base::
- smul_hi_s, t_simd_base::
- smul_hi_u, t_simd_base::
- smul_lo, t_simd_base::
- sub1, t_simd_base::
- sub1_sat_s, t_simd_base::
- sub1_sat_u, t_simd_base::
- sub, t_simd_base::
- sub_sat_s, t_simd_base::
- sub_sat_u, t_simd_base::
- subc, t_simd_base::
- subc_sat_s, t_simd_base::
- subc_sat_u, t_simd_base::
- tbm, t_bits_base::
- tbm, t_simd_base::
- transpose, t_simd_base::
- t_bfly_base
- t_bits_base
- t_cef_base
- t_ce_left_base
- t_ce_right_base
- t_round_mode
- t_simd_128
- t_simd_16
- t_simd_32
- t_simd_64
- t_simd_8
- t_simd_base
- t_subword
- t_tbm_mode
- t_vrot_base
- unshuffle, t_simd_base::
- unshuffle_power, t_simd_base::
- vrol, t_simd_base::
- vrol, t_vrot_base::
- vrolc, t_simd_base::
- vrold, t_simd_base::
- vror, t_simd_base::
- vror, t_vrot_base::
- vrorc, t_simd_base::
- vrord, t_simd_base::
- vsal, t_simd_base::
- vsal_sat_s, t_simd_base::
- vsal_sat_u, t_simd_base::
- vsar, t_simd_base::
- vshl1, t_simd_base::
- vshl1_sat_s, t_simd_base::
- vshl1_sat_u, t_simd_base::
- vshl, t_simd_base::
- vshl_sat_s, t_simd_base::
- vshl_sat_u, t_simd_base::
- vshr1, t_simd_base::
- vshr, t_simd_base::
- xsign, t_bits_base::
You may
bookmark
this page as
http://programming.sirrida.de?swar.html.
Last change: 2016-07-02