These are my proposals for Pascal,
but some ideas are applicable to other programming languages as well.
I posted most of the proposals in a discussion of Modernized Pascal
and/or in the development forum of Free Pascal.
Comments or questions? E-mail: info@sirrida.de
In many cases it makes sense to allow an additional trailing separator as well as allowing for empty lists, especially when dealing with conditional compilation. As an additional benefit it makes it easier to move around lines of code.
Here are some (more or less stupid) examples:
    // no hassle with the last comma, empty uses (if neither a nor b defined)
    uses
      (*$ifdef a *)
      unit_a,
      (*$endif *)
      (*$ifdef b *)
      unit_b,
      (*$endif *)
      ;
    // empty case statement (if neither x1 nor x2 defined)
    case expr of
      (*$ifdef x1 *)
      1: write('x1');
      (*$endif *)
      (*$ifdef x2 *)
      2: write('x2');
      (*$endif *)
    end;

    // no hassle with the last semicolon: extensible parameters
    procedure f(
      const s: string;
      x,y: integer;
      );

    // no hassle with the last comma: extensible sets / arrays
    function_call([
      1,
      2,
      ]);

    // no hassle with the last comma: extensible parameters
    writeln(
      var_1,
      var_2,
      );

    // empty type list (if neither x1 nor x2 defined)
    // in some cases types must be in one block
    type
      (*$ifdef x1 *)
      t1=integer;
      (*$endif *)
      (*$ifdef x2 *)
      t2=integer;
      (*$endif *)
There should be no problems concerning compatibility and implementation.
What about better readable numbers such as 100_000 or $_1234_5678 (note the "_" signs)?
A number should still start with "$" or a digit. Every "_" after that is simply accepted and ignored.
A base should be specifiable as well, e.g.
These ideas are stolen from e.g. Ada.
There should be no problems concerning compatibility and implementation.
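The rule for "_" in numbers can be emulated at run time today. The following is only a sketch of the intended lexical rule, not compiler code; `parse_number` is a hypothetical helper name, and it relies on the RTL's `StrToInt64` accepting the "$" prefix for hexadecimal input.

```pascal
program digit_separators;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: accepts a literal that starts with '$' or a digit
// and ignores every '_' inside it, as described in the proposal.
function parse_number(const s: string): Int64;
var
  cleaned: string;
  i: Integer;
begin
  if (s = '') or not (s[1] in ['$', '0'..'9']) then
    raise EConvertError.Create('a number must start with "$" or a digit');
  cleaned := '';
  for i := 1 to Length(s) do
    if s[i] <> '_' then
      cleaned := cleaned + s[i];
  Result := StrToInt64(cleaned); // handles the '$' (hexadecimal) prefix
end;

begin
  WriteLn(parse_number('100_000'));     // 100000
  WriteLn(parse_number('$_1234_5678')); // 305419896 = $12345678
end.
```

A leading "_" is rejected by this sketch, which matches the requirement that a number begins with "$" or a digit.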
I would like to propose to enhance the case statement for floats. What is needed then is the ability to test for open ranges and half open ranges.
    case float_expr of
      <2       : ; // match all below 2
      >=2 .. <4: ; // match 2<=x<4
      4        : ; // match 4
      >4 .. <=6: ; // match 4<x<=6
      8 .. 9   : ; // match 8<=x<=9
      >9 .. 10 : ; // match 9<x<=10
    end;
In ranges the left expression should be less than the right one; therefore the first relational operator should be ">" or ">=" and the second one "<" or "<=". If an operator is absent, ">=" is assumed on the left and "<=" on the right.
In Visual Basic .NET an "is" is added before the relational operator.
An alternative syntax idea could be ranges like 4<..<=6.
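For comparison, this is roughly what has to be written today to get the behaviour of the proposed float case. It is a sketch only; the function name `classify` and the returned labels are made up for illustration.

```pascal
program float_case_emulation;
{$mode objfpc}{$H+}

// Returns the label of the branch the proposed CASE would select.
function classify(x: Double): string;
begin
  if x < 2 then                      Result := '<2'
  else if (x >= 2) and (x < 4) then  Result := '>=2 .. <4'
  else if x = 4 then                 Result := '4'
  else if (x > 4) and (x <= 6) then  Result := '>4 .. <=6'
  else if (x >= 8) and (x <= 9) then Result := '8 .. 9'
  else if (x > 9) and (x <= 10) then Result := '>9 .. 10'
  else                               Result := 'no match';
end;

begin
  WriteLn(classify(1.5)); // <2
  WriteLn(classify(4.0)); // 4
  WriteLn(classify(7.0)); // no match
  WriteLn(classify(9.5)); // >9 .. 10
end.
```

The if-chain repeats the tested expression in every branch and hides the fact that the ranges are meant to be disjoint, which is what the case syntax would make visible.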
The string case currently implemented in Free Pascal allows ranges, which is nice but rarely what is actually needed. What is typically needed instead is matching of string beginnings (prefixes). A more or less stupid idea for the syntax could be:
    case str_expr of
      'match'.. :; // match all strings beginning with 'match'
    end;
However the parser needs to be changed to accept the missing second argument of "..".
See also speeding up case of string and other chapters nearby as well as the chapters about hashing.
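Until such a prefix case exists, the matching has to be spelled out by hand. A minimal sketch, assuming a case-sensitive comparison; `starts_with` and `classify` are hypothetical names used only here.

```pascal
program prefix_case_emulation;
{$mode objfpc}{$H+}

// Hypothetical helper: true if s begins with prefix (case-sensitive).
function starts_with(const prefix, s: string): Boolean;
begin
  Result := Copy(s, 1, Length(prefix)) = prefix;
end;

// Emulates: case s of 'match'.. : ...; end;
function classify(const s: string): string;
begin
  if starts_with('match', s) then
    Result := 'prefix match'
  else
    Result := 'other';
end;

begin
  WriteLn(classify('matchbox')); // prefix match
  WriteLn(classify('mat'));      // other
end.
```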
What about arrays with an unspecified upper bound such as this?:
type ta_byte = array [0..] of byte;
The lower bound 0 could be optional. No bounds checking should occur on accesses to such arrays. The size of such a type should be 0.
Example usages:
I often would like to explicitly discard a function result such as
void my_func(x);
The usual workaround is
my_func(x); // using "extended syntax"
or
dummy := my_func(x); // dummy is not used, this gives a hint
Neither workaround expresses the intention well. There should be a way to allow the new syntax and at the same time provide a hint/warning for the workaround with the extended syntax.
There should be no problems concerning compatibility (extra switch) and implementation.
May I propose multiple assignments, e.g.
a,b := 0;
or
a,b += c+d;
The right hand expression should be evaluated once and assigned
(independently and in no particular order) to all the entries
to the left of ":=".
The operator "+=" is a C operator copied by Free Pascal.
In Delphi you would write inc(v,x).
See below.
A variant of the theme is the assignment of multiple initialized global variables:
var a,b: integer = 0;
I would like to propose an enhanced replacement for the special assignment operators +=, -=, *= and /=:
More Pascal-like would be constructs of the form "operator :=" such as
    a + := 1;
    a div := 3;
    a or := b;
    a xor := 3;
    a shl := 1;
Together with my prior proposal of multiple assignments there could be stuff like below as well:
    a,b + := c+d;
    a,b shl := 1;
I am not sure how to extend this to monadic operators such as "-" and "not".
The C operators can be thought of as a composition of a binary operator ("+", "-", "*", "/") and the C assignment operator "=", so my solution should be the natural counterpart in Pascal.
My proposal is to replace the operators "op="
by one operator "op" and a
following ":=".
This can be further enhanced to work together with my other proposal for
multiple assignments.
Examples:
    a + := 1;
    a + := pi;      // float
    a + := 'bla';   // string
    a + := [1,2];   // set
    a < := b;       // booleans, this is ugly but should be OK (but not with C-style operators), see PS
    a div := 3;
    a or := b;
    a xor := 3;
    a shl := 1;
    f()^ + := 1;    // f might have side-effects!
    a[i+j] * := 2;  // @a[i+j] only evaluated once
    a,b + := c+d;   // c+d only evaluated once
    a,b shl := 1;
    a^.b[f(i)], c^.d[g(j)] + := h(k); // f,g,h might have side-effects, stuff evaluated only once
I cannot think of a more readable and more Pascal-like solution. To me the C-like counterparts are less readable and look like a too limited quick hack. The Delphi-style replacements such as inc/dec/include/exclude are even less usable, since neither the assignment nor the operation itself is visible any longer, and they are too wordy. A new pseudo-procedure for every such operator is surely not the best idea.
There are some reasons for my solution:
I object to the existing Free Pascal operators "+=", "-=", "/=", "*=" because:
Borland's alternative procedure notation with inc, dec, include, exclude is not better since they introduce a flood of global names and hide the operator nature.
The main reason for the proposal is not saving keystrokes but functionality and speed. It is meant as an optimizing aid: the compiler can skip common subexpression elimination in this case without loss, and side-effects are avoided. All parts of the statement should be evaluated once. Saving keystrokes and not having to write things twice are welcome side-effects.
This has also been the reason for including these: inc, dec, include, exclude, "+=", "-=", "*=", "/=".
I have often missed e.g. "* :=" and "shl :=".
Semantics of "a op := expr":
    p := @a; // only once!
    p^ := p^ op expr;
Semantics of "obj.prop op := expr":
    p := @obj; // address of object, only once!
    p^.prop := p^.prop op expr; // access property
Semantics of "a,b := expr":
    tmp := expr; // only once!
    a := tmp;
    b := tmp;
Semantics of "a,b op := expr":
    tmp := expr; // only once!
    p := @a; // only once!
    p^ := p^ op tmp;
    p := @b; // only once!
    p^ := p^ op tmp;
BTW: "a[random(2)] + := 1" is not equivalent to "a[random(2)] := a[random(2)] + 1" due to the side-effect of random()!
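The "only once" rule can be checked with today's Pascal by writing the expansion by hand. The sketch below emulates "a[next_index] + := delta"; `next_index`, `add_in_place` and the counter `calls` are hypothetical names chosen for the demonstration, with `next_index` standing in for any index expression that has a side-effect.

```pascal
program compound_assign_semantics;
{$mode objfpc}{$H+}

type
  tp_integer = ^Integer;

var
  a: array [0..1] of Integer = (10, 20);
  calls: Integer = 0;

// Index expression with a visible side-effect: it counts its own calls.
function next_index: Integer;
begin
  Inc(calls);
  Result := calls mod 2;
end;

// Emulates "a[next_index] + := delta" as specified above.
procedure add_in_place(delta: Integer);
var
  p: tp_integer;
begin
  p := @a[next_index]; // only once!
  p^ := p^ + delta;
end;

begin
  add_in_place(1);
  WriteLn(a[0], ' ', a[1], ' calls=', calls); // 10 21 calls=1
  add_in_place(1);
  WriteLn(a[0], ' ', a[1], ' calls=', calls); // 11 21 calls=2
end.
```

Writing `a[next_index] := a[next_index] + 1` instead would call `next_index` twice per statement and read one element while writing the other.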
Let me propose some kinds of type attributes which are handled by the compiler. They mostly have no consequences to the generated code but can catch several common programming bugs. I am not really sure about the syntax.
    CONST
      km = 1000 * m;
      mm = 0.001 * m;
      inch = 25.4 * mm;
      N = kg * m/(s*s); // Newton
      C = A*s; // Coulomb
      minute = 60 * s;
      speed_light = 2.99792458e+8 * m/s; // aka c
      earth_standard_velocity = 9.80665 * m/(s*s);
To avoid confusion, at least these constants should be defined in separate modules or other name spaces so that they can be qualified with the name space, e.g. dimensions.m or physics.c.
Assume you have a record containing several fields
where some of them have the same type,
e.g. forename, surname, street, or city in personnel records.
Also assume that you do not want to pack these into an array
but nevertheless want to take the same actions upon them
such as searching or sorting
without repeating the relevant code,
i.e. access them like array entries.
There ought to be a facility to specify a field (at offset o)
independent of the used entity (base v).
This is where my proposal comes in handy.
Under the hood, the compiler, the runtime library
and especially the runtime type information deals therewith,
but there was yet no corresponding high level mechanism
to express it.
The internal assembler features a similar syntax,
albeit one which is not always evident and not without ambiguity.
Syntax example,
see hints on green lines for an unoptimized (and untested)
realization in assembler:
    TYPE
      ta_integer = array [0..1000] of integer;
      tr_rec = RECORD
        a,b: integer;
        c: array [0..1] of integer;
        END;
    VAR
      v1,v2: tr_rec;
      o1,o2: tr_rec ^ integer; // offset type
      v3: array [3..5] of integer;
      o3: ta_integer ^ integer;
      p: ^ integer;
    BEGIN
      o1 := tr_rec @ a; // offset value
      v1.[o1] := v2.[o1]; // offset usage
      o2 := v1 @ c[1]; // another offset value (using type of v1)
      p := @ v1.[o2]; // another offset usage
      if o1 = o2 then ...
      o3 := v3 @ [4]; // yet another offset value (array offset)
      v3[3] := v3.[o3]; // yet another offset usage
      ...
    END.
The syntax proposal overloads the pointer and respectively address operator.
An alternative syntax might be
"OFFSET tr_rec OF integer"
and
"OFFSET(tr_rec,a)".
The symbol pair ".[" denotes "access at offset"
and should be applicable on
records,
objects,
classes and
arrays.
This allows fields to be addressed similar to array elements.
It might help to speed up multiple array accesses on the
same offset because
the multiplication can happen earlier.
In the implementation in assembler
the offset is simply added to the base address.
Nested offsets might be accessible like nested arrays as
".[i, j]".
A simple overloaded "[" would do the job too but
looks like an array access and thus
hides that we apply offsets.
Better be explicit.
There is no sensible default value as 0 might or might not be valid.
In contrast to general offsets,
these specialized offsets cannot be added, subtracted or even scaled.
They should be comparable for equivalence
(also via CASE statements);
other relations are possible but rarely make sense.
The WITH statement could be enhanced to handle offsets as well.
You may emulate the functionality as follows:
    TYPE
      tp_integer = ^integer;
      ta_8u = ARRAY [0..maxint] OF byte;
      tpa_8u = ^ta_8u;
      tpr_rec = ^tr_rec;
    VAR
      v: tr_rec;
      o: integer; // tr_rec ^ integer;
    BEGIN
      o := integer(@tpr_rec(nil)^.a); // o :=tr_rec @ a;
      tp_integer(@tpa_8u(@v)^[o])^ := v.b; // v.[o] := v.b;
      ...
    END.
As you can see, the emulation is easy but requires some nasty things: we assume that nil is zero, we cast a pointer to an integer, and there are several more casts. An alternative approach using "extended syntax" (pointer arithmetic) is a little better:
o:=PChar(@tpr_rec(nil)^.a)-PChar(nil); // o :=tr_rec @ a;
tp_integer(PChar(@v)+o)^ := v.b; // v.[o] := v.b;
My proposal, however, is clean and type-safe and does not need any casts.
An alternative emulation involving dispatcher procedures/methods
is as clean as my solution but many times more expensive.
ARRAY [0..maxint] as well as ta_integer from intro
are examples of a poor substitute of a
half open array definition.
Offsets could be convertible to and from (unsigned) integers in order to make them applicable to general routines using untyped pointers and/or parameters. This is obviously as unsafe as other hacks such as pointer conversion but allows for things which are not possible otherwise.
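For reference, the pointer-arithmetic emulation can be assembled into a complete program. This is a sketch under the same assumptions as the emulation above (nil is zero, pointers may be cast to integers); the helper `rec_int` is a hypothetical name standing in for the proposed "v.[o]", and `PtrUInt` is used in place of the proposed offset type "tr_rec ^ integer".

```pascal
program offset_emulation;
{$mode objfpc}{$H+}

type
  tr_rec = record
    a, b: Integer;
    c: array [0..1] of Integer;
  end;
  tpr_rec = ^tr_rec;
  tp_integer = ^Integer;

// Hypothetical helper: address of the integer at offset o inside v,
// i.e. what the proposal would write as @v.[o].
function rec_int(var v: tr_rec; o: PtrUInt): tp_integer;
begin
  Result := tp_integer(PChar(@v) + o);
end;

var
  v: tr_rec;
  o_a, o_c1: PtrUInt; // stand-ins for "tr_rec ^ integer"

begin
  v.a := 1;
  v.b := 42;
  v.c[0] := 0;
  v.c[1] := 7;
  o_a  := PtrUInt(@tpr_rec(nil)^.a);     // o_a  := tr_rec @ a;
  o_c1 := PtrUInt(@tpr_rec(nil)^.c[1]);  // o_c1 := tr_rec @ c[1];
  rec_int(v, o_a)^ := v.b;               // v.[o_a] := v.b;
  WriteLn(v.a);                          // 42
  WriteLn(rec_int(v, o_c1)^);            // 7
end.
```

Nothing stops a caller from passing an arbitrary number as the offset here, which is exactly the lack of type safety the proposed offset types are meant to remove.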
Sketched example from intro text:
    type
      t_relation = (less,equal,greater);
      tr_personnel = record
        forename: string;
        surname: string;
        birthday: date;
        street: string;
        zip_code: integer;
        city: string;
      end;
      tr_personnel_string = tr_personnel ^ string; // offset type
      t_compare_personnel_string = class
        field: tr_personnel_string;
        function compare(const a,b:tr_personnel):t_relation;
      end;

    function compare_string(const a,b:string):t_relation;
    begin
      if a=b then
        result:=equal
      else if a<b then
        result:=less
      else
        result:=greater;
    end;

    function t_compare_personnel_string.compare(const a,b:tr_personnel):t_relation;
    begin
      result:=compare_string(a.[field], b.[field]); // offset usage
    end;

    const
      fields: array [0..3] of tr_personnel_string = (
        tr_personnel @ forename, // offset value
        tr_personnel @ surname,
        tr_personnel @ street,
        tr_personnel @ city);

    var
      compare_personnel_string: t_compare_personnel_string;
      index: 0..3;
    begin
      compare_personnel_string := t_compare_personnel_string.create;
      ...
      // Select cities to be compared
      compare_personnel_string.field := tr_personnel @ city; // offset value
      ...
      // Select streets to be compared
      compare_personnel_string.field := tr_personnel @ street; // offset value
      ...
      // Select a string field to be compared
      index := 2; // tr_personnel @ street
      compare_personnel_string.field := fields[index];
      ...
      // Assuming l_personnel contains a collection of tr_personnel.
      // Sort them according to field:
      l_personnel.sort(compare_personnel_string);
      ...
A built-in 3-way comparison operator would be helpful for compare_string and would make t_relation obsolete…
While Delphi's method references (delegates) are very handy and useful
combining an object reference with a method reference,
sometimes it makes sense to have a facility which does not combine these.
A typical situation is that you want to preselect a method which
should be called for every object of a collection
or when you want to sort using different comparing methods,
and you do not want to duplicate steering code.
As with offsets,
you can emulate this, at a cost, by dispatcher procedures/methods.
C++ supports only such a
mechanism
but the resulting code is not as fast as the delegates mentioned above
because in contrast to these
the method resolution (dispatching) is deferred to call time.
Also, there is no way to specify the kind of the call
which can be virtual or non-virtual;
this would simplify dispatching.
The deferred dispatching task can be solved by a flag and resolved at runtime
or by using thunks.
Multiple inheritance (especially virtual multiple inheritance) of C++
makes things even worse.
Thus, using this mechanism to simulate
delegates is needlessly complicated and limited.
In other words: both delegates and C++ style method references are needed.
Methods can directly be invoked
and can carry as additional parameters
The meaning of virtual and dynamic is effectively the same
but differs in implementation.
Unfortunately and without convincing reason Delphi
does not allow the very useful combination of
static (misleading word)
and
virtual/dynamic.
This asininity is also present in C++ and C#,
which do not even know about proper class references.
Declaration: ret_type "(" class_type "::*" method_ref ")" "(" parameters ")"
Assignment: method_ref "=" "&" class_type "::" method_name
Usage via object: "(" object ".*" method_ref ")" "(" parameters ")"
Usage via pointer to object: "(" object_pointer "->*" method_ref ")" "(" parameters ")"
Example:
    class t {
    public:
      int m(char x) { return 0; }
    };

    typedef int (t::*tpm) (char x); // declaration of pointer to method type

    void test() {
      t o; // object
      t *po = &o; // pointer to object
      tpm pm; // declaration of pointer to method variable
      pm = &t::m; // assignment
      int i = (o.*pm)('x'); // usage via object
      int j = (po->*pm)('x'); // usage via pointer to object
    }
Declaration: dispatch [class_type] "^" kind method_signature
Assignment: method_ref ":=" (class_type | object) "@" method_name
Usage: (class_type | object) ".[" method_ref "]" ["(" parameters ")"]
method_signature: (PROCEDURE ["(" parameters ")"]) | (FUNCTION ["(" parameters ")"] ":" ret_type)
dispatch: ABSOLUTE | VIRTUAL | DYNAMIC | VARIANT
kind: STATIC | CLASS | [OBJECT]
Example:
    type
      t = class
      public
        function m(x:char):integer;
      end;
      tpm = absolute t ^ object function(x:char):integer; // declaration of method reference type

    function t.m(x:char):integer;
    begin
      result := 0;
    end;

    procedure test();
    var
      o: t; // reference to object
      pm: tpm; // declaration of method reference variable
      i,j: integer;
    begin
      pm := t @ m; // assignment
      i := o.[pm]('x'); // usage
    end;
The proposed syntax is very similar to the one proposed for
calculations with offsets.
However, the mandatory keyword for the dispatching mechanism
tells the compiler that we want to map methods instead of fields.
In the declaration,
class_type is mandatory for object/class methods
and optional only for static methods.
Applying class_type on usage (call) evidently
does not make sense for object methods
but can be used for static and class methods.
As an extension,
class_type might be also an INTERFACE.
Currently Delphi only supports object interfaces for
object methods of classes;
here, properties effectively being syntactic sugar
must be implemented by such methods.
The implementation of method references via INTERFACE
is essentially the same as for virtual object methods.
As methods may be non-virtual, virtual and dynamic in Object Pascal,
there are the dispatching styles
ABSOLUTE, VIRTUAL and DYNAMIC.
In context with an object,
these method references which point to regular methods (non-static)
shall be assignable to delegates;
likewise the static ones
shall be assignable to procedure pointers.
The used calling convention is part of the method types.
Usually, different calling conventions are incompatible
and not convertible between each other.
When dispatching on VARIANT style method references
one of the other dispatching styles is chosen at runtime.
A small RTL routine can do the job.
To do this, some kind of flag must be stored along with the method references.
Let us assume for 32 bit code that
procedure addresses are aligned 4 (lowest 2 bits are 0)
and
VMT offsets may be signed but do not need more than 31 bits.
DMT indexes already have an even smaller range (16 bit).
Hence, I propose the following:
Lower 2 bits of VARIANT style method reference Q:
For 16 and 64 bit code similar assumptions hold or could be forced. As a last resort an additional flag could be used.
dispatch \ kind | static | class | object |
---|---|---|---|
absolute | Details | Details | Details |
virtual | Details | Details | Details |
dynamic | Details | Details | Details |
variant | Details | Details | Details |
Here are some other ideas, not fully specified and only sketched. Keywords are displayed in uppercase letters.
    FOR VAR i: type := ...
    FOR VAR i := ... // implicit type

    FOR REPEAT count DO
      ...

    FOR VAR i := start REPEAT count DO // start..start+count-1
      ...
    FOR VAR i := start DOWNTO REPEAT count DO // start..start-count+1
      ...
    CONST s = [1,2];
    ...
    CASE e OF
      s: ...
or
    CASE e OF
      [1,2]: ...
instead of
    CASE e OF
      1,2: ...
    LOCAL
      // declarations
    BEGIN
      ...
    END
An alternative keyword could be BLOCK or DECLARE. The local declarations shall only be visible inside LOCAL…END.
    WITH v: expr DO
      ...
This form can (and IMHO should) replace the classical unnamed form. The compiler should optionally output a hint or warning when using unnamed WITH.
    CASE rel_expr OF
      <: ...
      =: ...
      >: ...
    END;
or, similar to FORTRAN's "three-way if", we could write in Pascal the expression if(rel_expr,less_expr,equal_expr,greater_expr); evidently this syntax could be applied to boolean values as well, such as if(bool_expr,true_expr,false_expr) (comparable to C's ternary "?:" operator). Some other languages such as Perl, PHP (since version 7), Ruby and Groovy support such a three-way comparison with the "spaceship operator" "<=>". As an extension such expressions might be chainable in a short-circuit way such that the first relation yielding less or greater determines the result prematurely.
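A three-way comparison and its short-circuit chaining can be approximated with ordinary functions today. This is a sketch only: `compare3`, `t_person` and `compare_person` are hypothetical names, and `t_relation` is taken from the personnel example earlier on this page.

```pascal
program three_way_compare;
{$mode objfpc}{$H+}

type
  t_relation = (less, equal, greater);
  t_person = record
    surname, forename: string;
  end;

function compare3(a, b: Integer): t_relation; overload;
begin
  if a < b then Result := less
  else if a > b then Result := greater
  else Result := equal;
end;

function compare3(const a, b: string): t_relation; overload;
begin
  if a < b then Result := less
  else if a > b then Result := greater
  else Result := equal;
end;

// Chained comparison: the first relation that is not "equal" decides.
function compare_person(const p, q: t_person): t_relation;
begin
  Result := compare3(p.surname, q.surname);
  if Result = equal then
    Result := compare3(p.forename, q.forename);
end;

var
  p, q: t_person;

begin
  case compare3(3, 5) of // stands in for "CASE rel_expr OF <: =: >:"
    less:    WriteLn('less');
    equal:   WriteLn('equal');
    greater: WriteLn('greater');
  end;                   // prints "less"
  p.surname := 'Smith'; p.forename := 'Bob';
  q.surname := 'Smith'; q.forename := 'Alice';
  WriteLn(compare_person(p, q) = greater); // TRUE
end.
```

With a built-in operator the two `compare3` overloads would disappear and the chaining in `compare_person` could become a single expression.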