This is a collection of programming hints and techniques. Although primarily dedicated to Pascal and/or C/C++ many ideas are applicable for other programming languages as well. In some cases also a compiler developer can profit.
Comments or questions? E-mail: info@sirrida.de
The implementation of Free Pascal's case for strings
is not more than an else-if chain where the strings
are compared one by one;
this effectively is only "syntactic sugar".
This applies at least to versions up to 2.7.1 (2013-01).
It is a shame that Free Pascal creates such poor code.
It is even more a shame that Delphi has no string-case at all,
at least up to the
version XE3
(2013-01-24)
although this language feature was demanded many times.
As you can see in
Dispatching on strings in theory,
the compiler could do much better.
On work we often have quite large string-case constructs.
As written above Delphi does not know about a string-case.
Therefore as a kludge to speed up our Delphi programs
I often pre-dispatch on the first character and then use an
else-if chain.
This is very simple and
reasonably fast (at least considering the simplicity),
but still quite readable and maintainable.
The goto statements are somewhat ugly but there is no real alternative.
This transformation is essentially an example of
what the compiler should have done.
I have special cased the empty string because
the empty string is represented by a nil pointer;
access to any character of it typically results in an access violation.
This is true for all dynamic string types.
label _ELSE; … if s='' then // very fast, preventing AV … else begin case s[1] of // s is non-empty, medium fast 'a': begin if false then else if s='abc' then … else if s='axx' then … else goto _ELSE; end; 'z': begin if false then else if s='zzz' then … else goto _ELSE; end; else begin _ELSE: … end; end; end;
Using a precompiler could also be a solution to problems like this
but being forced to always apply an extra compilation step is probably overkill.
Unfortunately Pascal typically has no such thing as a preprocessor.
See also the chapter
Switch on strings (C/C++).
Have you ever tried to program a counting loop in C
which works correctly in all cases,
especially including empty ranges and
the complete range of a type such as 0…255
for unsigned char aka byte without using a bigger type?
Your first attempt probably resulted in an endless loop such as this:
unsigned char i; for (i=0; i<=255; ++i) {…}
Here obviously i<=255 is always true.
The problem is that C's for loop
is essentially only a while loop
with an additional initialization and an incrementing part,
i.e. only
syntactic sugar.
While being quite flexible
allowing for almost any kind of loop
this construction is much more clumsy and error-prone than
Pascal's and Modula's for statement:
The looping variable must be written at least 3 times
and you must pay attention to use the correct relation operator
which must match the incrementing or decrementing step.
Also, you should pay attention that the comparison is as simple as
possible, i.e. that the ending value is only a variable or constant.
It's just too easy to make it wrong.
As far as I can see
there is no general solution using only one simple conditional expression;
also a simple while loop is unable to iterate over the whole range.
In the following I assume that the ending value is constant for the whole loop.
The looping variable should be considered read-only
in the loop body and undefined elsewhere.
We do not want to duplicate the loop body.
One valid general solution for a loop
on v
using type t
over values from s to e
is the following:
t e = … t s = … if (s <= e) { t v=s; while (true) { … if (v == e) break; ++v; } }
It would be nice if we could find a macro which does the stuff. Unfortunately a single macro forces us to pack the statement list into one macro parameter, i.e.
FOR(t,v,s,e, statement_list )
This not only looks strange, i.e. ending a statement list with ")", but also trashes line numbering for these statements if multiple lines are used. In case of a syntax error the compiler would not give us a helpful line number. Also this effectively disables the some of the macros mentioned in Switch on strings (C/C++) where line numbers are part of the inner magic.
A pair of matching macros
FOR1(t,v,s,e) statement_list FOR2(v)
obviously does the job:
#define FOR1(t,v,s,e) \ { \ t v,_e_; \ v = (s); \ _e_ = (e); \ if (v <= _e_) { \ while (true) { \ { #define FOR2(v) \ } \ if (v == _e_) \ break; \ ++v; \ } \ } \ }
This however is ugly because we have to provide the looping variable twice. We also must pay attention that these macros match, especially when we want to provide macros for incrementing and decrementing loops.
Much better is a looping macro and a generalized END macro which does not need any parameters, e.g.:
FOR(unsigned char, v, 0, 255) statement_list END
This can be realized as follows:
#define FOR(t,v,s,e) \ { \ t _v_, _e_; \ _v_ = (s); \ _e_ = (e); \ if (_v_ <= _e_) { \ switch (0) { \ while (_v_ != _e_) { \ ++_v_; \ default: \ { \ const t v = _v_; #define END }}}}}
Here I give up some simplicity and possibly some speed to achieve the goal.
You can find this macro and other related macros for
Modula-style control flow statements in
modula.h
and a small test program for the loop macros in
testloop.c.
By the way:
The many seemingly superfluous braces are needed
in order to be able to use the same END macro
not only for this FOR macro, but also for the
string switch macros
and all other related macros.
You will find their formal grammar in
modula.txt.
For data exchange purposes it is necessary to be explicit
about the endian (i.e. byte order) of stored values.
I have often seen the usage of converting routines
which do a byte swap if needed;
as far as I can see this is error-prone and not very elegant.
To cope with the problem in an easy way
we often use special objects/records/structs/classes which make
the endian explicit;
the access thereof is done by methods which perform the byte swaps;
doing so makes it nearly impossible to use them in a wrong way.
Additionally the source code directly shows the endian
in the definition of the transfer record.
Several such records
(such as a 3 byte Motorola type for scanner usage, a reminiscent of SCSI)
can easily be combined into byte packed records which can be
directly exchanged with the file I/O system.
Obviously it is often necessary to force byte-wise structure packing
with a compiler directive.
You should always use the same interface for all endian types.
It might make sense to force the exclusive usage of such types in
exchange records as a company policy.
You can find a simple example in endian.*.
Free Pascal defines one of these symbols:
Obviously this method to divide the storage from its access can easily be extended to other uses as well such as dealing with uncommon floating point formats.
If you want to acquire a singleton or need to build a structure only once as the hash maps in Switch on strings (C/C++), the simplest solution is this:
bool ready = false; … if (!ready) { // Acquire singleton / build structure ready = true; }
However this is not at all thread-safe. Instead you may use the following locking strategy (the first if should be as fast as possible):
volatile bool ready = false; … if (!ready) { // Set a load barrier // Enter critical section if (!ready) { // Acquire singleton / build structure // Set a store barrier ready = true; } // Leave critical section }
This is the fastest method I could find.
Unfortunately this pattern is not without problems,
especially on multi-processor machines,
which are related to memory access ordering and caching.
volatile is necessary to force ready
to be always be fetched from memory,
because otherwise other threads have no chance to see a change
in the second if, when ready is cached in a register;
moreover the second if could be optimized away.
The critical section and the second if
is necessary to be thread-safe.
The memory barriers are necessary when there are multiple processors
which use out-of-order memory accesses.
For many systems these barriers are not necessary.
See also an article about
Double-checked locking.
Assuming that numbers are processed in
two's complement representation
(i.e. -1 = ~0)
several simple operations are possible
which look for the rightmost clear / set bit.
Unfortunately there is no similar set of operations for the leftmost bits.
The (yet?) only x86 instruction which can mirror bits is
AMD's
new
XOP
instruction
VPPERM.
As an example the source bit pattern
…x01…
stands for an arbitrary number of bits (possibly none)
followed by a 0 bit
followed a contiguous row of 1 bits (possibly none);
this means that we have to look for the rightmost 0 bit.
There is e.g. an operation
…001…
which zeros out all the x bits
[x & ~(x+1)].
Another operation
…~11…
inverts all x bits and toggles the rightmost 0 bit
[~x | (~x-1)].
For some of the non-trivial operations there are
special instructions
for newer x86 processors which are from the sets
BMI1
("bit modification instructions")
and
TBM
("trailing bit modification", currently AMD only).
The given x86 instruction sequences modify the eax register.
Please note that the table sports hints when you move your
mouse over the fields.
To illustrate a concrete example
we take the binary number 1011 (decimal 11)
and apply the INC instruction
(…x01… => …x10…).
We locate the rightmost 0 bit (1011)
and then invert this 0 and all 1 bits to the right of it;
we get 1100 (decimal 12).
Let us repeat this action.
We locate the rightmost 0 bit 1100
and then invert this 0 and all 1 bits to the right of it (there are none);
we get 1101 (decimal 13).
Indeed, this function increments the given number.
Source | Target | Mode | Instruction | Set | Formula(s) | Implementation in x86 assembler | 0 | -1 |
---|---|---|---|---|---|---|---|---|
…x01… Search for lowest 0 bit |
…000… | 0x00 | MOV | 0 | xor eax,eax / mov eax,0 | 0 | 0 | |
…001… | 0x01 | x & ~(x+1); x & (~x-1) | lea edx,[eax+1]; not edx; and eax,edx | 0 | -1 | |||
…010… | 0x02 | BLCIC | TBM | ~x & (x+1) | lea edx,[eax+1]; not eax; and eax,edx | 1 | 0 | |
…011… | 0x03 | BLCMSK | TBM | x ^ (x+1) | lea edx,[eax+1]; xor eax,edx | 1 | -1 | |
…100… | 0x04 | ~(x ^ (x+1)); x ^ ~(x+1) | lea edx,[eax+1]; xor eax,edx; not eax | -2 | 0 | |||
…101… | 0x05 | BLCI | TBM | x | ~(x+1); x | (~x-1) | lea edx,[eax+1]; not edx; or eax,edx | -2 | -1 | |
…110… | 0x06 | T1MSKC | TBM | ~x | (x+1) | lea edx,[eax+1]; not eax; or eax,edx | -1 | 0 | |
…111… | 0x07 | MOV | -1 | or eax,-1 / mov eax,-1 | -1 | -1 | ||
…x00… | 0x08 | BLCFILL | TBM | x & (x+1) | lea edx,[eax+1]; and eax,edx | 0 | 0 | |
…x01… | 0x09 | - | x | - | 0 | -1 | ||
…x10… | 0x0a | INC | x+1 | inc eax | 1 | 0 | ||
…x11… | 0x0b | BLCS | TBM | x | (x+1) | lea edx,[eax+1]; or eax,edx | 1 | -1 | |
…~00… | 0x0c | ~(x | (x+1)); ~x & ~(x+1) | lea edx,[eax+1]; or eax,edx; not eax | -2 | 0 | |||
…~01… | 0x0d | ~x-1; ~(x+1); -2-x | not eax; dec eax | -2 | -1 | |||
…~10… | 0x0e | NOT | ~x | not eax | -1 | 0 | ||
…~11… | 0x0f | ~(x & (x+1)); ~x | ~(x+1) | lea edx,[eax+1]; and eax,edx; not eax | -1 | -1 | |||
…x10… Search for lowest 1 bit |
…000… | 0x10 | MOV | 0 | xor eax,eax / mov eax,0 | 0 | 0 | |
…001… | 0x11 | TZMSK | TBM | ~x & (x-1); ~(x | -x); (x & -x)-1 | lea edx,[eax-1]; not eax; and eax,edx | -1 | 0 | |
…010… | 0x12 | BLSI | BMI1 | x & -x; x & (~x+1) | mov edx,eax; neg eax; and eax,edx | 0 | 1 | |
…011… | 0x13 | BLSMSK | BMI1 | x ^ (x-1); -x ^ ~x | lea edx,[eax-1]; xor eax,edx | -1 | 1 | |
…100… | 0x14 | x ^ -x | mov edx,eax; neg eax; xor eax,edx | 0 | -2 | |||
…101… | 0x15 | BLSIC | TBM | ~x | (x-1) | lea edx,[eax-1]; not eax; or eax,edx | -1 | -2 | |
…110… | 0x16 | x | -x | mov edx,eax; neg eax; or eax,edx | 0 | -1 | |||
…111… | 0x17 | MOV | -1 | or eax,-1 / mov eax,-1 | -1 | -1 | ||
…x00… | 0x18 | BLSR | BMI1 | x & (x-1) | lea edx,[eax-1]; and eax,edx | 0 | -2 | |
…x01… | 0x19 | DEC | x-1 | dec eax | -1 | -2 | ||
…x10… | 0x1a | - | x | - | 0 | -1 | ||
…x11… | 0x1b | BLSFILL | TBM | x | (x-1) | lea edx,[eax-1]; or eax,edx | -1 | -1 | |
…~00… | 0x1c | ~(x | (x-1)); -x & ~x | lea edx,[eax-1]; or eax,edx; not eax | 0 | 0 | |||
…~01… | 0x1d | NOT | ~x | not eax | -1 | 0 | ||
…~10… | 0x1e | NEG | -x | neg eax | 0 | 1 | ||
…~11… | 0x1f | ~(x & (x-1)); -x | ~x | lea edx,[eax-1]; and eax,edx; not eax | -1 | 1 |
I assigned the operation modes
in order to demonstrate additional properties.
Let us define the function
tbm(x, mode)
where mode is the operation mode.
You will find an implementation in
perm_bas.*.
Generally the following holds:
tbm(x, mode) = tbm(x, mode & 0x11) | tbm(x, mode & 0x12) | tbm(x, mode & 0x1c)
For (mode & 0x08) == 0 (upper bits are filled with 0 or 1) the following holds:
tbm(x, mode) = ~tbm(x, mode ^ 0x07) tbm(x, mode) = tbm(~x, mode ^ 0x10)
For (mode & 0x08) != 0 (upper bits are kept or inverted) the following holds:
tbm(x, mode) = ~tbm(~x, mode ^ 0x13) tbm(x, mode) = tbm(~x, mode ^ 0x14) tbm(x, mode) = tbm(x+q, mode ^ 0x10)
Here q = 1 for (mode & 0x10) == 0 and q = -1 for (mode & 0x10) != 0.
Would not it be nice to be able to externalize the logic of looping
or other "framing routines"?
There is a solution:
iterators!
It is a shame that this simple mechanism has not caught on
but instead is usually mimicked by inferior and/or complicated constructions;
try to implement an iterator which steps a tree.
Life could be so simple.
Caution: Sarcasm present in this section.
Once upon a time there were some clever folks who designed ALGOL 58 aka IAL (1958). This language proposed a special parameter passing method which was called pass-by-name and allowed to evaluate parameters in the context of the caller whenever the called procedure wanted to do so. At least in ALGOL-60 (1960) this feature really worked. Some gurus such as Donald E. Knuth liked it but it seems that most programmers have not understood its striking value. As it is very simple to abuse this feature, especially when using it to also write to parameters, it became discredited and consequently was more or less banned from subsequent languages…
A different approach specialized for loops was implemented in Alphard and a bit later in CLU (1975). The idea was to provide a special loop statement where most of the loop control was performed by an "iterator".
The
C++ STL
features iterators which are somewhat flexible
but neither simple nor are intuitively or fool-proof to use.
The delegates of
C++11
do the intended job but are
also neither simple nor readable nor produce perfect code.
C/C++ does not support nested functions.
MetaWare
aka
Synopsys
however demonstrated a collection of compilers (C/C++/Pascal)
which supported nested functions and iterators probably since 1985.
The committees seem to shy away from introducing simple things like
nested functions because
"nobody needs these and it can be emulated anyway",
also, communities like
BOOST
seem to prefer hacks to clean solutions
in the very same spirit.
This is IMHO programming with pincers and hammers.
C# also features iterators but these can not delegate the yield
to other routines nor can act recursively.
Instead of doing what is needed in a straight-forward way,
they do a
monstrous job
of code morphing,
thereby generating a class and a state machine.
Bizarre!
The delegates of C# do the intended job like the ones of C++ but are
also too complicated.
There must be a reason why some containers such as lists
offer a dedicated iterator method such as
ForEach
which is much faster and simpler than the
foreach
language feature…
C# does not support nested functions.
ISO Pascal features procedural parameters capable of transferring nested procedures for a long time. Also, Free Pascal supports nested procedures as parameters since version 2.6.0. Unfortunately Borland/Embarcadero did not implement them until now although their Turbo-Vision (1990) sketched iterators by using nested procedures in an unsafe way.
IMHO the solution was present almost from the very beginning
but maybe the necessities were not really understood correctly.
All that is needed is the ability to transfer a block of code
with the meaning of a nested procedure;
evidently this block should have access to all variables as usual.
The potential looping should not be thought
to be the duty of the calling code,
hence break is not an option.
Also, there is no need to generate any object,
capture any variable
or to switch between contexts or even coroutines;
simple function calls are sufficient;
needless to say, the appropriate scope parameter
must be provided as is usual when dealing with nested functions.
Several years ago
I published a
working emulation of iterators
for Delphi on the Internet
but got almost no feedback.
I also posted a
feature request
on Borland's site.
Same result.
Nevertheless this emulation has become an essential tool in
the code of the company I work for
and proved to greatly simplify life by avoiding complicated alternatives.
What is your opinion? Please tell me, e-mail: info@sirrida.de