Forum: War Ensemble BBS

locals (was: Coroutines in Forth)

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sat Apr 25 04:47:12 2026

From Newsgroup: comp.lang.forth

Paul Rubin <[email protected]d> writes:

There's also the realization that computer memory except for a few
specialized Forth chips is always made from RAM. So ideological
devotion to a pure stack VM seems to pass up perfectly good hardware
capabilities.

With competent Forth compilers, the machine code is 1) the same when
using stack operations, when using the return stack, or when using
locals, and 2) no RAM access happens (unless the compiler runs out
of registers). This is demonstrated by lxf on the 3DUP variants
<[email protected]>; to spare you having to
look this posting up, here's the relevant part:

|: 3dup.1 ( a b c -- a b c a b c ) >r 2dup r@ -rot r> ;
|: 3dup.2 ( a b c -- a b c a b c ) 2 pick 2 pick 2 pick ;
|: 3dup.3 {: a b c :} a b c a b c ;
|: 3dup.4 ( a b c -- a b c a b c ) dup 2over rot ;
|
|These four ways of expressing 3DUP are all compiled to exactly the
|same code by lxf/ntf:
|
| 804FC0A 8B4500 mov eax , [ebp]
| 804FC0D 8945F4 mov [ebp-Ch] , eax
| 804FC10 8B4504 mov eax , [ebp+4h]
| 804FC13 8945F8 mov [ebp-8h] , eax
| 804FC16 895DFC mov [ebp-4h] , ebx
| 804FC19 8D6DF4 lea ebp , [ebp-Ch]
| 804FC1C C3 ret near
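
A minimal sketch for checking that the four variants quoted above at least behave the same, assuming a Forth-2012 system with {: ... :} locals (e.g. a recent Gforth). SAME6? is a hypothetical helper introduced only for this check; it is not part of the quoted posting.

```forth
\ The four 3DUP variants from the quoted posting.
: 3dup.1 ( a b c -- a b c a b c ) >r 2dup r@ -rot r> ;
: 3dup.2 ( a b c -- a b c a b c ) 2 pick 2 pick 2 pick ;
: 3dup.3 {: a b c :} a b c a b c ;
: 3dup.4 ( a b c -- a b c a b c ) dup 2over rot ;

\ Hypothetical helper: true if the six top items are 1 2 3 1 2 3.
: same6? ( x1 x2 x3 x4 x5 x6 -- flag )
   3 = swap 2 = and swap 1 = and
   swap 3 = and swap 2 = and swap 1 = and ;

1 2 3 3dup.1 same6? .   \ prints -1
1 2 3 3dup.2 same6? .   \ prints -1
1 2 3 3dup.3 same6? .   \ prints -1
1 2 3 3dup.4 same6? .   \ prints -1
```

This only shows that the variants are equivalent in behaviour. Whether they compile to the same machine code depends on the system and has to be checked with its disassembler (Gforth has SEE-CODE for this).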

That leads to the questions in this discussion:

1) Should we optimize for less competent compilers? Why?

a) If yes, should we optimize all code, or only the part of the
code that is actually executed frequently?

2) Are there other criteria for deciding between the alternatives?
Which ones?

Gforth does support address-like locals if you want to use them.

Gforth has provided variable-flavoured locals since I implemented
locals (in 1994), because I had the idea that using ! is preferable to
using TO, but in practice I did not use variable-flavoured locals, and
instead preferred to avoid TO by defining locals where their value is
known, and then just using them (possibly defining additional locals
instead of using TO on existing locals). And AFAIK others have rarely
used variable-flavoured locals, either.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/
--- Synchronet 3.21f-Linux NewsLink 1.2

From Paul Rubin@[email protected] to comp.lang.forth on Fri Apr 24 23:21:28 2026

From Newsgroup: comp.lang.forth

[email protected] (Anton Ertl) writes:

With competent Forth compilers, the machine code is 1) the same when
using stack operations, when using the return stack, or when using
locals

"Competent Forth compilers" there describes what by Forth standards
would be called quite fancy optimizing compilers ("analytic compilers").
They are a significant technical feat and there aren't that many of
them. Traditionally Forth has been implemented as simple interpreters.

In that case, a pure stack VM seems to ignore capabilities of the
underlying hardware. Particularly, the stack's memory actually
being RAM. Doesn't PICK go back to the earliest days of Forth, as a way
to bypass the limitation?

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sat Apr 25 05:26:47 2026

From Newsgroup: comp.lang.forth

Hans Bezemer <[email protected]> writes:

If you want to use a language that is "ideologically devoted" to the
architecture, maybe you shouldn't use Forth at all - and stick with C.

I don't see anything about C that is closer to the hardware than Forth
is, and I think that both languages are about equally '"ideologically
devoted" to the architecture'. In particular, a C local variable is
no closer to a register (the most efficient hardware feature for
storing data) than a stack item or return stack item is, and register
allocation of any of the three is similarly difficult (with big
differences in difficulty between solutions that provide some register
allocation to those that are so reliable that you usually count on
them).

Given the stuff I read about Chuck Moore's goals in designing Forth
and what I read about the development of BCPL, B, and C, it's not too
surprising that they are close to the hardware of the time when they
were designed. It is interesting that both Forth and C standards
(and, to some extent, implementations) have not reflected newer
architectural features such as SIMD instructions. At least they
managed to reflect different machine-word sizes (BLISS didn't,
resulting in differences between BLISS-10, BLISS-11, and BLISS-32, and
its losing against C despite having superior compilers for more than a
decade).

I know there are situations when there are six values on the data stack
and four on the return stack which leave you with few other options. But
you can always use vanilla variables or an extra stack (which is trivial
to implement) to remedy that.

Using Forth means being resourceful. Not to choose the most convenient
and lazy solution imaginable.

According to <https://www.dictionary.com/browse/resourceful>:

|able to deal skillfully and promptly with new situations,
|difficulties, etc.

Forth systems that do not implement locals are not a new situation.
So do you mean to say that it is a difficulty? I would agree. That's
fine if you are using a tiny system and do not want to use an
umbilical/tethered system, but if the system is big enough to support
locals, the lack of locals shows the laziness of the system
implementor.

But blaming the programmer for the system implementor's failings is a
tactic used widely by system implementors (in the C world as well as
in the Forth world), and they often find some arguments that appeal to
elitism (i.e., only the chosen ones can use this programming language
for the elite as it should be used, and the others should program in
Python or "should never have been allowed to touch a keyboard" (Ulrich
Drepper)), and enough people fall for this that they repeat such
arguments and come up with additional arguments of this kind.

In any case, why should it be better to use an inconvenient solution
that requires more work rather than a convenient solution that
requires less work (i.e., is lazy)?

For me virtues in programming are to produce correct code, to produce
it quickly, the code should use the resources economically (which does
not mean that saving a few bytes on a machine with GBs of memory is
virtuous), and the code should be readable and maintainable.

- anton

From Paul Rubin@[email protected] to comp.lang.forth on Fri Apr 24 23:55:16 2026

From Newsgroup: comp.lang.forth

[email protected] (Anton Ertl) writes:

I don't see anything about C that is closer to the hardware than Forth
is, and I think that both languages are about equally '"ideologically
devoted" to the architecture'. In particular, a C local variable is
no closer to a register (the most efficient hardware feature for
storing data) than a stack item or return stack item is, and register
allocation of any of the three is similarly difficult...

I believe early C compilers didn't attempt much if any register
allocation. You could say "register int x" to manually assign a
register to x if one was available. You were limited to 2 or 3 of those
on the PDP-11. Local variables in C otherwise lived in the stack. The
difference was that the C compiler generated straightforward assembly
code to access those variables even when they were in the stack
interior. You didn't have to use ROT or juggle stuff to the R stack to
get to the inner elements.

In assembler, you could also program in a stack-oriented style yet
straightforwardly access the inner elements. Forth for whatever reason
chose strict stack discipline (with some loopholes like PICK). I
understand wanting to stay with purity of a model, but a more
hardware-sympathetic model would have been "stack implemented in RAM".

So I still don't understand the benefit of the "pure abstract stack"
approach, other than for a few weird special CPU's.

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sat Apr 25 06:43:23 2026

From Newsgroup: comp.lang.forth

Paul Rubin <[email protected]d> writes:

[email protected] (Anton Ertl) writes:

With competent Forth compilers, the machine code is 1) the same when
using stack operations, when using the return stack, or when using
locals

"Competent Forth compilers" there describes what by Forth standards
would be called quite fancy optimizing compilers ("analytic compilers").
They are a significant technical feat and there aren't that many of
them. Traditionally Forth has been implemented as simple interpreters.

And traditionally Forth has been implemented without locals, for the
same reason: It takes less memory and, for the system implementor,
less work; on current non-tiny machines, the latter aspect still
exists, and IMO is a big motivation for anti-locals advocacy (i.e., a
sour-grapes argument).

It's a bit perverse: You argue for locals with simple implementations,
while anti-locals advocates argue against locals with simple
implementations.

And because it's more work, there are fewer sophisticated than simple
systems. But who cares how many there are? The question is what
programmers and users use and what their goals are.

In any case, when it comes to performance measurements on "simple
interpreters" like the Gforth of 1994, Forth code with locals usually
turns out to be slower and consume more memory than Forth code using
(and trying to avoid) stack juggling. E.g., my paper [ertl94l]
contains the following comparison:

              locals
          with      without   ratio
max        3.56us    2.69us   1.32
strcmp    83.20us   70.50us   1.18

Numbers from a 486DX2/66, strcmp compares a string with 17 characters
with itself.

The explanation given is:

|The slowdown factor of using locals is due to the execution of more
|primitives (e.g., 14 instead of 12 per character in
|"strcmp"). Originally there was also a large overhead due to fetching
|inline arguments, resulting in slowdowns of 1.58 for "max" and 1.41
|for "strcmp". This overhead has been eliminated mostly by using
|versions of the primitives specialized for frequent inline arguments
|(e.g., "8lp+!" as specialization of "lp+!#" with the inline
|argument 8).
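
As a small worked check of the numbers above, here is a sketch that recomputes the ratio column with integer arithmetic and compares it with the ratio of primitive counts. RATIO% is a hypothetical helper, not something from the paper; it returns the ratio as a truncated percentage.

```forth
\ Hypothetical helper: with/without as a percentage, truncated.
: ratio% ( with without -- percent ) >r 100 r> */ ;

\ Times entered in units of 0.01us.
356  269  ratio% .   \ max:    prints 132  (3.56us / 2.69us)
8320 7050 ratio% .   \ strcmp: prints 118  (83.20us / 70.50us)

\ Primitives executed per character in strcmp.
14 12 ratio% .       \ prints 116
```

The primitive-count ratio (about 1.17) is close to the measured 1.18 for strcmp, which fits the quoted explanation that the remaining slowdown comes from executing more primitives.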

@InProceedings{ertl94l,
author = "M. Anton Ertl",
title = "Automatic Scoping of Local Variables",
booktitle = "EuroForth~'94 Conference Proceedings",
year = "1994",
address = "Winchester, UK",
pages = "31--37",
url = "https://www.complang.tuwien.ac.at/papers/ertl94l.ps.gz",
abstract = "In the process of lifting the restrictions on using
locals in Forth, an interesting problem poses
itself: What does it mean if a local is defined in a
control structure? Where is the local visible? Since
the user can create every possible control structure
in ANS Forth, the answer is not as simple as it may
seem. Ideally, the local is visible at a place if
the control flow {\em must} pass through the
definition of the local to reach this place. This
paper discusses locals in general, the visibility
problem, its solution, the consequences and the
implementation as well as related programming style
questions."
}

It might be interesting to measure this again on current hardware with
the current, somewhat more sophisticated, but not yet "competent"
Gforth, and maybe I will, at some other time. However, looking at the
code for Gforth for 3DUP.3 compared to the others, Gforth still uses
more primitives (even with superinstructions) and more machine
instructions. From <[email protected]>:

: 3dup.1 ( a b c -- a b c a b c ) >r 2dup r@ -rot r> ;
: 3dup.2 ( a b c -- a b c a b c ) 2 pick 2 pick 2 pick ;
: 3dup.3 {: a b c :} a b c a b c ;
: 3dup.4 ( a b c -- a b c a b c ) dup 2over rot ;

And here's the gforth-fast code on AMD64:

3dup.1                3dup.2                3dup.3                3dup.4
>r 1->0               third 1->2            >l >l 1->1            dup 1->1
  mov -$08[r14],r13     mov r15,$10[r10]    >l 1->1                 mov [r10],r13
  sub r14,$08         third 2->3              mov -$08[rbp],r13     sub r10,$08
2dup 0->2               mov r9,$08[r10]       mov rdx,$08[r10]    2over 1->3
  mov r13,$10[r10]    third 3->1              mov rax,rbp           mov r15,$18[r10]
  mov r15,$08[r10]      mov [r10],r13         add r10,$10           mov r9,$10[r10]
i 2->3                  sub r10,$18           lea rbp,-$10[rbp]   rot 3->1
  mov r9,[r14]          mov $10[r10],r15      mov -$10[rax],rdx     mov [r10],r15
-rot 3->2               mov $08[r10],r9       mov r13,[r10]         sub r10,$10
  mov [r10],r9        ;s 1->1               >l @local0 1->1         mov $08[r10],r9
  sub r10,$08           mov rbx,[r14]       @local0 1->1          ;s 1->1
r> 2->1                 add r14,$08           mov rax,rbp           mov rbx,[r14]
  mov -$08[r10],r15     mov rax,[rbx]         lea rbp,-$08[rbp]     add r14,$08
  sub r10,$10           jmp eax               mov -$08[rax],r13     mov rax,[rbx]
  mov $10[r10],r13                          @local1 1->2            jmp eax
  mov r13,[r14]                               mov r15,$08[rbp]
  add r14,$08                               @local2 2->1
;s 1->1                                       mov -$08[r10],r15
  mov rbx,[r14]                               sub r10,$10
  add r14,$08                                 mov $10[r10],r13
  mov rax,[rbx]                               mov r13,$10[rbp]
  jmp eax                                   @local0 1->2
                                              mov r15,$00[rbp]
                                            @local1 2->3
                                              mov r9,$08[rbp]
                                            @local2 3->1
                                              mov -$10[r10],r9
                                              sub r10,$18
                                              mov $10[r10],r15
                                              mov $18[r10],r13
                                              mov r13,$10[rbp]
                                            lit 1->2
                                            #24
                                              mov r15,$50[rbx]
                                            lp+! 2->1
                                              add rbp,r15
                                            ;s 1->1
                                              mov rbx,[r14]
                                              add r14,$08
                                              mov rax,[rbx]
                                              jmp eax

[Note that for a superinstruction like ">l >l" or ">l @local0", all
threaded code cells are shown, the first as superinstruction, and the
remaining ones as the simple primitive in that threaded-code slot; but
the other threaded-code slots have no separate code generated.]

You seem to argue that the random-access aspect of locals provides a
performance advantage on simple systems, but in most cases, code using
locals is at a performance disadvantage on such systems (and
traditionalists have often used that to argue against locals).

In that case, a pure stack VM seems to ignore capabilities of the
underlying hardware. Particularly, the stack's memory actually
being RAM.

Keeping at least one stack item in a register leads to a smaller and
faster implementation, and is not more complex than keeping all the
stack memory in RAM. It does require enough registers, however (i.e.,
you do not use this technique on the 6502).

Doesn't PICK go back to the earliest days of Forth, as a way
to bypass the limitation?

A way to use RAM that is less frowned upon by Forth traditionalists is
(global) variables. The fact that the use of global variables is
frowned upon in the wider programming community for various reasons
seems to pour oil into the fire of their elitism.

- anton

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sat Apr 25 08:21:41 2026

From Newsgroup: comp.lang.forth

Paul Rubin <[email protected]d> writes:

I believe early C compilers didn't attempt much if any register
allocation.

Yes, they did not allocate auto variables (what we consider locals) to
registers.

The
difference was that the C compiler generated straightforward assembly
code to access those variables even when they were in the stack
interior. You didn't have to use ROT or juggle stuff to the R stack to
get to the inner elements.

That's the same with unsophisticated locals implementations like those
of Gforth (I do not mention other Forth systems with such
implementations to protect the guilty).

Forth for whatever reason
chose strict stack discipline (with some loopholes like PICK). I
understand wanting to stay with purity of a model, but a more
hardware-sympathetic model would have been "stack implemented in RAM".

What do you mean by that? Forth already provides PICK. ROLL and
-ROLL are either slow to implement in RAM or require significant
sophistication. In addition, Gforth has

: stick ( x0 x1 ... xu x u -- x x1 ... xu ) \ gforth-internal
\ replace x0 with x; e.g., 5 PICK 1+ 5 STICK increments the 6th
\ stack element (not recommended).
2 + cells sp@ + ! ;

which is used in the Gforth source code 7 times (compared to 20 times
for PICK, 4 for FOURTH, 38 for THIRD, 308 for OVER and 1128 for DUP),
always with colon-sys-xt-offset as U, so STICK is only used to
manipulate colon-sys control-flow stack items. I have also had little
appetite to use it elsewhere.
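
To make the stack effect concrete, here is a small sketch that copies the definition above under the hypothetical name MY-STICK (to avoid clashing with a system that already provides STICK). It assumes a system with SP@ and a data stack that grows towards lower addresses, as in Gforth.

```forth
\ Copy of the STICK definition above under a hypothetical name.
: my-stick ( x0 x1 ... xu x u -- x x1 ... xu )
   2 + cells sp@ + ! ;

10 20 30         \ x0=10 x1=20 x2=30
99 2 my-stick    \ overwrite x0 (u=2) with 99
.s               \ the stack now holds 99 20 30
drop 2drop

1 2 3 4 5 6
5 pick 1+ 5 my-stick   \ increment the 6th stack element
.s                     \ the stack now holds 2 2 3 4 5 6
2drop 2drop 2drop
```

The second example is the "5 PICK 1+ 5 STICK" case from the comment in the definition.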

In general, in Forth programming one copies things from various places
in stacks with DUP, OVER, PICK, and R@; sometimes you do not need the
item in its original place any more, then you SWAP, ROT or ROLL it
instead of keeping it on the stack and dropping it later (and the item
might be in the way). Very occasionally, you copy an item deeper into
the stack, as with TUCK, or -ROT or -ROLL it out of the way.

But overwriting an existing stack item with something else as done by
STICK is not something we tend to do, and this also shows in the
absence of such words for the top few stack items (while 1 PICK is
OVER, there is no word that corresponds to 1 STICK). I think the
reason why it is not done is that we avoid keeping dead stack items
around that we might overwrite. Such dead stack items would often be
in the way.

And if someone has the desire for having a storage location that they
want to overwrite, Forth has locals (although I avoid overwriting
them, too, see <https://net2o.de/gforth/Locals-programming-style.html>).

So I still don't understand the benefit of the "pure abstract stack"
approach, other than for a few weird special CPU's.

The benefit of not implementing locals is that implementing the Forth
system takes less time and the resulting system is smaller.

PICK tends to be frowned upon because it is a code smell that suggests
that you have too much going on on the stack, which makes the program
hard to understand, and you should be looking for alternatives.

ROLL and -ROLL are avoided for the same reason and because they are
slow on many implementations.

As for STICK, see above.

- anton

From albert@[email protected] to comp.lang.forth on Sat Apr 25 11:27:36 2026

From Newsgroup: comp.lang.forth

In article <[email protected]>,
Paul Rubin <[email protected]d> wrote:

[email protected] (Anton Ertl) writes:

I don't see anything about C that is closer to the hardware than Forth
is, and I think that both languages are about equally '"ideologically
devoted" to the architecture'. In particular, a C local variable is
no closer to a register (the most efficient hardware feature for
storing data) than a stack item or return stack item is, and register
allocation of any of the three is similarly difficult...

I believe early C compilers didn't attempt much if any register
allocation. You could say "register int x" to manually assign a
register to x if one was available. You were limited to 2 or 3 of those
on the PDP-11. Local variables in C otherwise lived in the stack. The
difference was that the C compiler generated straightforward assembly
code to access those variables even when they were in the stack
interior. You didn't have to use ROT or juggle stuff to the R stack to
get to the inner elements.

In assembler, you could also program in a stack-oriented style yet
straightforwardly access the inner elements. Forth for whatever reason
chose strict stack discipline (with some loopholes like PICK). I
understand wanting to stay with purity of a model, but a more
hardware-sympathetic model would have been "stack implemented in RAM".

There are more loopholes, once you think of it.
Suppose you have a recursive integration algorithm. Define an object
that contains all relevant recursive data. Allocate it on the data
stack ( DSP@ size - DSP! ) and make it the current object (DSP@ ^recdat !)
Free the stack once you're done ( DSP@ size + DSP! ) .
In this context you are using normal floats, not weird locals, and
your choice of normal or single floats.
[ More politically correct is probably to ALLOCATE FREE for no
clear benefit. ]

So I still don't understand the benefit of the "pure abstract stack"
approach, other than for a few weird special CPU's.

Groetjes Albert
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From albert@[email protected] to comp.lang.forth on Sat Apr 25 11:43:30 2026

From Newsgroup: comp.lang.forth

In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
<SNIP>

              locals
          with      without   ratio
max        3.56us    2.69us   1.32
strcmp    83.20us   70.50us   1.18

Interestingly, I don't allow complicated definitions with assembler
implementations in ciforth.
E.g. + XOR 0< EXECUTE are all low level, not much more.
String handling and move operation are the exception, because
they are both simpler and faster in low level.
Simpler is the argument (especially for i86).
Faster is the bonus.

<SNIP>

- anton

Groetjes Albert

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sat Apr 25 10:22:16 2026

From Newsgroup: comp.lang.forth

[email protected] writes:

String handling and move operation are the exception, because
they are both simpler and faster in low level.
Simpler is the argument (especially for i86).
Faster is the bonus.

In other words, Forth without locals is not well suited for words
that have so much active data. That is also reflected in hardware
designed for Forth, which got additional registers like A or B (or
additional capabilities for the top of the return stack register R),
which make it simpler and faster to implement such words.

A definition of STRCMP in the paper is

: strcmp { addr1 u1 addr2 u2 -- n }
addr1 addr2
u1 u2 min 0
?do { s1 s2 }
s1 c@ s2 c@ - ?dup
if
unloop exit
then
s1 char+ s2 char+
loop
2drop
u1 u2 - ;

So in the loop we have a loop count (on the return stack), two cursors
(s1 and s2) into the compared strings, and within the loop body we
additionally have the two characters, for a total of five live values,
three of which survive across iterations and are changed in every
iteration. One could implement it as

\ untested, and the following versions, too
: strcmp { addr1 u1 addr2 u2 -- n }
u1 u2 min 0
?do
addr1 i + c@ addr2 i + c@ - ?dup
if
unloop exit
then
loop
u1 u2 - ;

where only one of the values changes in each iteration, but now the
?DO...LOOP cannot be replaced with a version that does not store a
second value but counts down (or up) to 0, so now we have a total of 6
live values, four of which survive across iterations, and one is
changed on every iteration.

One can reduce this by one value by keeping one of the cursors in the
loop counter:

: strcmp {: addr1 u1 addr2 u2 -- n :}
addr2 addr1 - {: offset :}
u1 u2 min addr1 + addr1 ?do
i c@ i offset + c@ - ?dup
if
unloop exit
then
loop
u1 u2 - ;

So now we have five live values in the body of the loop at the same
time, three of which live across iterations, and one of which changes
in each iteration. Keeping the loop parameters separate significantly
lessens the load on the data stack.

Let's see if we can eliminate the local from the loop body:

: strcmp {: addr1 u1 addr2 u2 -- n :}
addr2 addr1 - ( offset )
u1 u2 min addr1 + addr1 ?do ( offset )
i c@ over i + c@ - ?dup
if
nip unloop exit
then
loop
drop u1 u2 - ;

That leaves stack purists with the task of eliminating the locals from
the prologue and epilogue of this word. Two items have to be stored
across the loop, or the difference could be computed speculatively and
only one item stored across the loop. And the computations before the
loop involve four values alive at the same time (fortunately addr2
does not live long). Let's see:

: strcmp ( addr1 u1 addr2 u2 -- n )
rot 2dup - >r ( addr1 addr2 u2 u1 R: n1 )
min -rot over - ( u12 addr1 offset R: n1 )
swap rot bounds ( offset limit start R: n1 )
?do ( offset R: n1 loop-sys )
i c@ over i + c@ - ?dup
if
nip unloop r> drop exit
then
loop
drop r> negate ;

As can be seen by the many stack comments, the stack load here is more
than I can easily deal with.

Maybe a stack purist can improve on that. But can he improve it
enough to make it as easy to understand as any of the versions with
locals?
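
Since the versions above are marked untested, here is a minimal test sketch for the variant that keeps one cursor in the loop counter. It assumes Gforth, which accepts a second {: ... :} within one definition; STRCMP.3 and CHECK-STRCMP are hypothetical names used only for this sketch.

```forth
\ The loop-counter-as-cursor version from above, under a new name.
: strcmp.3 {: addr1 u1 addr2 u2 -- n :}
   addr2 addr1 - {: offset :}
   u1 u2 min addr1 + addr1 ?do
      i c@ i offset + c@ - ?dup
      if unloop exit then
   loop
   u1 u2 - ;

: check-strcmp ( -- )
   s" abc"  s" abc" strcmp.3 .    \ prints 0   (equal)
   s" abc"  s" abd" strcmp.3 .    \ prints -1  ('c' - 'd')
   s" abcd" s" abc" strcmp.3 . ;  \ prints 1   (u1 - u2)
check-strcmp
```

The same three cases can be run against the other versions to check that they agree in sign as well as in the zero/non-zero result.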

- anton

From Hans Bezemer@[email protected] to comp.lang.forth on Sat Apr 25 15:43:06 2026

From Newsgroup: comp.lang.forth

On 25-04-2026 07:26, Anton Ertl wrote:

Hans Bezemer <[email protected]> writes:

I don't see anything about C that is closer to the hardware than Forth
is, and I think that both languages are about equally '"ideologically
devoted" to the architecture'. In particular, a C local variable is
no closer to a register (the most efficient hardware feature for
storing data) than a stack item or return stack item is, and register
allocation of any of the three is similarly difficult (with big
differences in difficulty between solutions that provide some register
allocation to those that are so reliable that you usually count on
them).

Well, you're actually shooting at Paul Rubin - not at me. Thank you! I
take all the help I can get!

Using Forth means being resourceful. Not to choose the most convenient
and lazy solution imaginable.

According to <https://www.dictionary.com/browse/resourceful>:

|able to deal skillfully and promptly with new situations,
|difficulties, etc.

That's EXACTLY what I meant!

Forth systems that do not implement locals are not a new situation.
So do you mean to say that it is a difficulty?

You're completely beside the point I wanted to make. I meant the design
or algorithm one has to implement.

But blaming the programmer for the system implementor's failings is a
tactic used widely by system implementors (in the C world as well as
in the Forth world).

YAGNI is not a "system implementer's failing". It is a choice he made,
because you (a) really don't need it - or (b) if you need it you can add
it yourself. Which all seems very Forth like.

(..) and they often find some arguments that appeal to
elitism (i.e., only the chosen ones can use this programming language
for the elite as it should be used, and the others should program in
Python or "should never have been allowed to touch a keyboard" (Ulrich
Drepper)).

It's your own pal Bernd that said: "A good programmer will write even
better code in Forth. A bad programmer will write abysmal code in Forth.
And I'm sorry to say - but most programmers are quite bad."

So, either you agree with him or we have an unfortunate departure of one
of the foremost members of Gforth. Because this states - in no
uncertain terms - that Forth programmers *ARE* elite.

Which in itself is a defensible position. I mean - we're 0.1% of the
programming population according to TIOBE. I blame it solely on our
inability to procreate, but you may put up some other viable explanation.

Moore himself thinks we're elite: "I must say that I'm appalled at the
code I see. Because all this code suffers the same failings, I conclude
it's not a sporadic problem."

I mean - there is nothing wrong with being a subpar programmer. Plenty
of languages to choose from - and still get bread on the table.

Of course, it's expected that one states that "All humans are equal -
even if they're programming". That's the time we live in.

But I quote Jan Cremer, a famous Dutch writer: "'I'm okay and you're
okay.' That sounds quite nice. But 'I'm okay and you're a dick' feels
much better."

Humanity can be divided into four groups:
1. Those who cannot write Forth;
2. Those who tried Forth, but failed;
3. Those who pretend to write Forth, but still fail;
4. Those who can write Forth.

I mean: the truth must be said. I'm Dutch. I can't help myself.

In any case, why should it be better to use an inconvenient solution
that requires more work rather than a convenient solution that
requires less work (i.e., is lazy)?

It would be better to think deeply, find an original solution and learn.
Like Albert with his brilliant ;: word.

For me virtues in programming are to produce correct code, to produce
it quickly, the code should use the resources economically (which does
not mean that saving a few bytes on a machine with GBs of memory is
virtuous), and the code should be readable and maintainable.

Well, to me it's something different. Who cares what you or I think.
It's about what you can prove decisively.

Hans Bezemer


From peter@[email protected] to comp.lang.forth on Sat Apr 25 16:07:47 2026

From Newsgroup: comp.lang.forth

On Sat, 25 Apr 2026 10:22:16 GMT
[email protected] (Anton Ertl) wrote:

[email protected] writes:

String handling and move operation are the exception, because
they are both simpler and faster in low level.
Simpler is the argument (especially for i86).
Faster is the bonus.

In other words, Forth without locals is not well suited for words
that have so much active data. That is also reflected in hardware
designed for Forth, which got additional registers like A or B (or
additional capabilities for the top of the return stack register R),
which make it simpler and faster to implement such words.

A definition of STRCMP in the paper is

: strcmp { addr1 u1 addr2 u2 -- n }
addr1 addr2
u1 u2 min 0
?do { s1 s2 }
s1 c@ s2 c@ - ?dup
if
unloop exit
then
s1 char+ s2 char+
loop
2drop
u1 u2 - ;

So in the loop we have a loop count (on the return stack), two cursors
(s1 and s2) into the compared strings, and within the loop body we
additionally have the two characters, for a total of five live values,
three of which survive across iterations and are changed in every
iteration. One could implement it as

\ untested, and the following versions, too
: strcmp { addr1 u1 addr2 u2 -- n }
u1 u2 min 0
?do
addr1 i + c@ addr2 i + c@ - ?dup
if
unloop exit
then
loop
u1 u2 - ;

where only one of the values changes in each iteration, but now the
?DO...LOOP cannot be replaced with a version that does not store a
second value but counts down (or up) to 0, so now we have a total of 6
live values, four of which survive across iterations, and one is
changed on every iteration.

One can reduce this by one value by keeping one of the cursors in the
loop counter:

: strcmp {: addr1 u1 addr2 u2 -- n :}
addr2 addr1 - {: offset :}
u1 u2 min addr1 + addr1 ?do
i c@ i offset + c@ - ?dup
if
unloop exit
then
loop
u1 u2 - ;

So now we have five live values in the body of the loop at the same
time, three of which live across iterations, and one of which changes
in each iteration. Keeping the loop parameters separate significantly
lessens the load on the data stack.

Let's see if we can eliminate the local from the loop body:

: strcmp {: addr1 u1 addr2 u2 -- n :}
addr2 addr1 - ( offset )
u1 u2 min addr1 + addr1 ?do ( offset )
i c@ over i + c@ - ?dup
if
nip unloop exit
then
loop
drop u1 u2 - ;

That leaves stack purists with the task of eliminating the locals from
the prologue and epilogue of this word. Two items have to be stored
across the loop, or the difference could be computed speculatively and
only one item stored across the loop. And the computations before the
loop involve four values alive at the same time (fortunately addr2
does not live long). Let's see:

: strcmp ( addr1 u1 addr2 u2 -- n )
rot 2dup - >r ( addr1 addr2 u2 u1 R: n1 )
min -rot over - ( u12 addr1 offset R: n1 )
swap rot bounds ( offset limit start R: n1 )
?do ( offset R: n1 loop-sys )
i c@ over i + c@ - ?dup
if
nip unloop r> drop exit
then
loop
drop r> negate ;

As can be seen by the many stack comments, the stack load here is more
than I can easily deal with.

Maybe a stack purist can improve on that. But can he improve it
enough to make it as easy to understand as any of the versions with
locals?

I recently reviewed the string comparison for search-wordlist
and came up with the following

The string stored in the word header is already uppercased.
So string comparison will be case insensitive

: UC ( c -- c' ) \ uppercase char
dup $61 $7B within $20 and - ;

: NCOMP4 ( addr n addr' n' - f) \ 0 is match
dup >r
begin
rot = while \ str cstr
r> dup 1- >r
while \ str cstr
swap count uc \ cstr str' s1
rot count \ str' s1 cstr' c1
repeat
2drop r> drop 0 exit
then
2drop r> drop 1 ;

First iteration in the loop it does not compare chars but the length!

BR
Peter

- anton


From Hans Bezemer@[email protected] to comp.lang.forth on Sat Apr 25 17:38:11 2026

From Newsgroup: comp.lang.forth

On 25-04-2026 16:07, peter wrote:

On Sat, 25 Apr 2026 10:22:16 GMT
[email protected] (Anton Ertl) wrote:

[email protected] writes:

String handling and move operation are the exception, because
they are both simpler and faster in low level.
Simpler is the argument (especially for i86).
Faster is the bonus.

In other words, Forth without locals is not well suited for words
that have so much active data. That is also reflected in hardware
designed for Forth, which got additional registers like A or B (or
additional capabilities for the top of the return stack register R),
which make it simpler and faster to implement such words.

A definition of STRCMP in the paper is

: strcmp { addr1 u1 addr2 u2 -- n }
addr1 addr2
u1 u2 min 0
?do { s1 s2 }
s1 c@ s2 c@ - ?dup
if
unloop exit
then
s1 char+ s2 char+
loop
2drop
u1 u2 - ;

So in the loop we have a loop count (on the return stack), two cursors
(s1 and s2) into the compared strings, and within the loop body we
additionally have the two characters, for a total of five live values,
three of which survive across iterations and are changed in every
iteration. One could implement it as

\ untested, and the following versions, too
: strcmp { addr1 u1 addr2 u2 -- n }
addr1 addr2
u1 u2 min 0
?do
addr1 i + c@ addr2 i + c@ - ?dup
if
unloop exit
then
loop
u1 u2 - ;

where only one of the values changes in each iteration, but now the
?DO...LOOP cannot be replaced with a version that does not store a
second value but counts down (or up) to 0, so now we have a total of 6
live values, four of which survive across iterations, and one is
changed on every iteration.

One can reduce this by one value by keeping one of the cursors in the
loop counter:

: strcmp {: addr1 u1 addr2 u2 -- n :}
addr2 addr1 - {: offset :}
u1 u2 min addr1 + addr1 ?do
i c@ i offset + c@ - ?dup
if
unloop exit
then
loop
u1 u2 - ;

So now we have five live values in the body of the loop at the same
time, three of which live across iterations, and one of which changes
in each iteration. Keeping the loop parameters separate significantly
lessens the load on the data stack.

Let's see if we can eliminate the local from the loop body:

: strcmp {: addr1 u1 addr2 u2 -- n :}
addr2 addr1 - ( offset )
u1 u2 min addr1 + addr1 ?do ( offset )
i c@ over i + c@ - ?dup \ c1 - c2, same sign as the versions above
if
nip unloop exit
then
loop
drop u1 u2 - ;

That leaves stack purists with the task of eliminating the locals from
the prologue and epilogue of this word. Two items have to be stored
across the loop, or the difference could be computed speculatively and
only one item stored across the loop. And the computations before the
loop involve four values alive at the same time (fortunately addr2
does not live long). Let's see:

: strcmp ( addr1 u1 addr2 u2 -- n )
rot 2dup - >r ( addr1 addr2 u2 u1 R: n1 )
min -rot over - ( u12 addr1 offset R: n1 )
swap rot bounds ( offset limit start R: n1 )
?do ( offset R: n1 loop-sys )
i c@ over i + c@ - ?dup
if
nip unloop r> drop exit
then
loop
drop r> negate ;

As can be seen by the many stack comments, the stack load here is more
than I can easily deal with.

Maybe a stack purist can improve on that. But can he improve it
enough to make it as easy to understand as any of the versions with
locals?

I recently reviewed the string comparison for search-wordlist
and came up with the following

The string stored in the word header is already uppercased.
So string comparison will be case insensitive

: UC ( c -- c' ) \ uppercase char
dup $61 $7B within $20 and - ;

: NCOMP4 ( addr n addr' n' - f) \ 0 is match
dup >r
begin
rot = while \ str cstr
r> dup 1- >r
while \ str cstr
swap count uc \ cstr str' s1
rot count \ str' s1 cstr' c1
repeat
2drop r> drop 0 exit
then
2drop r> drop 1 ;

First iteration in the loop it does not compare chars but the length!

BR
Peter

This one is about a third bigger than yours - if we disregard the "UC",
that is:

: comp
rot over - if drop 2drop true exit then
0 ?do
over i chars + c@ over i chars + c@ -
if drop drop unloop true exit then
loop drop drop false
;

In 4tH, it is even visually more compact:

: comp
rot over - if drop 2drop true ;then
0 ?do over i [] c@ over i [] c@ - if drop drop unloop true ;then loop
drop drop false
;

The extra length comes mainly from the three different possible exits:
- It's not the same size (first line);
- It's not the same content (exit within loop);
- It's the same thing (after loop).

I can't say I particularly like the use of "COUNT" here - because it
actually represents "C@+" - except for the first run. Neither am I very
happy with the BEGIN..WHILE..WHILE..REPEAT..THEN construct - but that's
not your fault ;-)

All that being said, I cannot deny it is a clever piece of code using
the full capabilities of the language, bravo!

Hans Bezemer

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sat Apr 25 17:21:11 2026

From Newsgroup: comp.lang.forth

Hans Bezemer <[email protected]> writes:

On 25-04-2026 07:26, Anton Ertl wrote:

Hans Bezemer <[email protected]> writes:

[reinserted deleted, relevant context]

If you want to use a language that is "ideologically devoted" to the
architecture, maybe you shouldn't use Forth at all - and stick with C.

I don't see anything about C that is closer to the hardware than Forth
is, and I think that both languages are about equally '"ideologically
devoted" to the architecture'. In particular, a C local variable is
no closer to a register (the most efficient hardware feature for
storing data) than a stack item or return stack item is, and register
allocation of any of the three is similarly difficult (with big
differences in difficulty between solutions that provide some register
allocation to those that are so reliable that you usually count on
them).

Well, you're actually shooting at Paul Rubin - not at me. Thank you! I
take all the help I can get!

Actually, this whole paragraph is a reaction to your statement, not
his. You deleted it for whatever reason, so I reinserted it.
Concerning Paul Rubin, just because he is wrong does not mean you are
right.

(..) and they often find some arguments that appeal to
elitism (i.e., only the chosen ones can use this programming language
for the elite as it should be used, and the others should program in
Python or "should never have been allowed to touch a keyboard" (Ulrich
Drepper).

It's your own pal Bernd that said: "A good programmer will write even
better code in Forth. A bad programmer will write abysmal code in Forth.
And I'm sorry to say - but most programmers are quite bad."

So, either you agree with him or we have an unfortunate departure of one
of the most foremost members of Gforth. Because this states - in no
uncertain words - that Forth programmers *ARE* elite.

What departure? We disagree on a number of things.

And the issue is not whether Forth programmers or any other
programmers are elite, but that many programmers think that they are
elite (whether they are or aren't) and that the designers or advocates
of deficient programming systems make use of that to dupe them, along
the lines of: "You as elite programmers can cope with this deficiency
[of course they don't call it a deficiency], it's only subpar
programmers [more elaborate denigrations are common, see Ulrich
Drepper] who complain about it."

In the case of Forth and locals this tactic has not worked very well,
so even Forth, Inc. (who have been the most vocal among the commercial
Forth providers about their dislike of locals) have implemented
locals. But of course we see the echo of all of this still around
here.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/

From albert@[email protected] to comp.lang.forth on Sun Apr 26 00:34:19 2026

From Newsgroup: comp.lang.forth

In article <nnd$1196d1a5$0da70c85@6de98b5b6c1b0418>,
Hans Bezemer <[email protected]> wrote:
<SNIP>

It would be better to think deeply, find an original solution and learn.
Like Albert with his brilliant ;: word.

Chuck Moore invented and coined the ;: word.
I came up with CO, which is similar, or maybe the same.

<SNIP>

Hans Bezemer

--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From albert@[email protected] to comp.lang.forth on Sun Apr 26 00:51:56 2026

From Newsgroup: comp.lang.forth

In article <[email protected]>,
Anton Ertl <[email protected]> wrote:

[email protected] writes:

String handling and move operation are the exception, because
they are both simpler and faster in low level.
Simpler is the argument (especially for i86).
Faster is the bonus.

In other words, Forth without locals is not well suited for words
that have so much active data. That is also reflected in hardware
designed for Forth, which got additional registers like A or B (or
additional capabilities for the top of the return stack register R),
which make it simpler and faster to implement such words.

A definition of STRCMP in the paper is

: strcmp { addr1 u1 addr2 u2 -- n }
addr1 addr2
u1 u2 min 0
?do { s1 s2 }
s1 c@ s2 c@ - ?dup
if
unloop exit
then
s1 char+ s2 char+
loop
2drop
u1 u2 - ;

Compare this with
REPZ CMPSB ; Intel 86
once the registers are filled.
There are some extra instructions to massage the resulting
zero/carry into the required form (-1/0/1).

I choose to implement the primitive CORA
HEADER( {MEMORY},{CORA},{CORA},{addr1 addr2 len --- n},{CIF},
{Compare the memory areas at forthvar({addr1}) and forthvar({addr2})
over a length forthvar({len}) .
For the first bytes that differ, return -1 if the byte
from forthvar({addr1}) is less (unsigned) than the one from
forthvar({addr2}), and 1 if it is greater.
If all forthvar({len}) bytes are equal, return zero. }
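
The behaviour described in that header can be modelled in high-level Forth. The following is only a sketch of the documented result, not ciforth's actual CORA (which the post describes as a primitive built around REPZ CMPSB). The name CORA-MODEL is made up, and the sketch assumes one address unit per byte.

```forth
\ Sketch: high-level model of the documented CORA result.
\ CORA-MODEL is a made-up name; assumes 1 address unit = 1 byte.
: cora-model ( addr1 addr2 len -- n )
  0 ?do                           ( addr1 addr2 )
    over i + c@  over i + c@      ( addr1 addr2 c1 c2 )
    2dup <> if                    \ first differing byte decides
      < if -1 else 1 then         ( addr1 addr2 n )  \ bytes are 0..255, so < acts unsigned
      nip nip unloop exit
    then
    2drop                         ( addr1 addr2 )
  loop
  2drop 0 ;                       \ all len bytes equal

s" abc" drop s" abd" drop 3 cora-model .   \ negative: 'c' is less than 'd'
s" abc" drop s" abd" drop 2 cora-model .   \ zero: only the first two bytes are compared
```

Because the length is a single parameter, a caller that wants strcmp semantics still has to compare the two string lengths itself after the common prefix matches.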

- anton

--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From albert@[email protected] to comp.lang.forth on Sun Apr 26 01:13:55 2026

From Newsgroup: comp.lang.forth

In article <nnd$548d4f1b$1e104571@905dda44db1f54ae>,
Hans Bezemer <[email protected]> wrote:

This one is about a third bigger than yours - if we disregard the "UC",
that is:

: comp
rot over - if drop 2drop true exit then
0 ?do
over i chars + c@ over i chars + c@ -
if drop drop unloop true exit then
loop drop drop false
;

In 4tH, it is even visually more compact:

: comp
rot over - if drop 2drop true ;then
0 ?do over i [] c@ over i [] c@ - if drop drop unloop true ;then loop
drop drop false
;

The extra length comes mainly from the three different possible exits:
- It's not the same size (first line);
- It's not the same content (exit within loop);
- It's the same thing (after loop).

I can't say I particularly like the use of "COUNT" here - because it
actually represents "C@+" - except for the first run. Neither am I very
happy with the BEGIN..WHILE..WHILE..REPEAT..THEN construct - but that's
not your fault ;-)

All that being said, I cannot deny it is a clever piece of code using
the full capabilities of the language, bravo!

The corresponding word in ciforth is
( caddr len dea -- matchflag dea )
: ~MATCH >R OVER R@ >NFA @ $@ CORA R> SWAP ;
The pointer in the input stream is compared to the name over the
length of the name. (Its length is ignored).
This works because the names of words in the
dictionary cannot contain spaces. It also finds := in the context
of a pascal interpreter where the text is ":=(a+eb)" provided
:= is a PREFIX word.
(Also note that CORA can serve to make a strcmp. )

Hans Bezemer

--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From dxf@[email protected] to comp.lang.forth on Sun Apr 26 15:21:19 2026

From Newsgroup: comp.lang.forth

On 26/04/2026 3:21 am, Anton Ertl wrote:

...
In the case of Forth and locals this tactic has not worked very well,
so even Forth, Inc. (who have been the most vocal among the commercial
Forth providers about their dislike of locals) have implemented
locals.

Well, they seemed reluctant to adopt {: :} having previously implemented
and used ANS LOCALS| .

If one examines Forth Inc's use of locals within SwiftForth, one finds it
is the exception and confined to 'subpar' code. That they use locals so
infrequently suggests to me it was more effort deciding *whether* to use
locals. This, I can understand perfectly - 'Shall I use forth, or shall I
use locals?' - as they represent different mindsets. I just can't do that.
I feel no need to do that.

AFAIK Forth Inc doesn't offer locals for its SwiftX products.

...


From Paul Rubin@[email protected] to comp.lang.forth on Sat Apr 25 22:40:01 2026

From Newsgroup: comp.lang.forth

[email protected] (Anton Ertl) writes:

And traditionally Forth has been implemented without locals, for the
same reason: It takes less memory and, for the system implementor,
less work

A simple implementation of locals doesn't sound like that much work?
Mostly you need a runtime scheme to make sure the locals are cleaned up
in case of exceptions being thrown. If you're willing to ignore the
standard you don't need to complicate the text interpreter much. I
remember Mark Wills' TI99/4A Forth simply reserved 4 extra return stack
cells at each level of subroutine call, that you could treat like
registers. Maybe that burns too much memory for a very small CPU. I've
imagined some alternate versions of COLON, e.g.
: foo ( ... ) ; \ regular colon, no locals
1: foo ( ... ) ; \ one local called A
2: foo (... ) ; \ two locals, A and B
...
4: foo (... ) ; \ four locals: A, B, C, D.

In any case, when it comes to performance measurements on "simple
interpreters" like the Gforth of 1994, Forth code with locals usually
turns out to be slower and consume more memory than Forth code using
(and trying to avoid) stack juggling.

The slowdown doesn't surprise me but it's not that big a deal, compared
to the slowdown of using interpreted Forth instead of assembly language
in the first place. Words that don't use the locals won't be affected.

... looking at the code for Gforth for 3DUP.3 compared to the others,
Gforth still uses more primitives ...

That's a lot of code in the expansion! I wonder how it will look in a
simple interpreter.

You seem to argue that the random-access aspect of locals provides a
performance advantage on simple systems, but in most cases, code using
locals is at a performance disadvantage on such systems

Well, if the slowdown is less than say 2x, I'd say the code cleanup
matters more, due to the traditional 90/10 rule (maybe now 99/1) of
where CPU cycles go. Code the hot spots for speed and the rest for
convenience.

(and traditionalists have often used that to argue against locals).

The REAL traditionalists (machine language programmers) can use the same
argument against Forth itself.

Keeping at least one stack item in a register leads to a smaller and
faster implementation, and is not more complex than keeping all the
stack memory in RAM.

That's only with a fancy compiler AND a requirement of the application
code having statically determined stack effects. Traditional words like
?DUP would confuse this scheme amirite?

A way to use RAM that is less frowned upon by Forth traditionalists is
(global) variables. The fact that the use of global variables is
frowned upon in the wider programming community for various reasons
seems to pour oil into the fire of their elitism.

I see what you mean by that. But, whole-program C compilers do
something like register allocation to re-use those "global" cells when
sets of them won't be needed at the same time. The Forth approach would
need either a similar fancy compiler, or else require the programmer to
do an error-prone manual memory layout process, or else burn memory
unnecessarily for those cells whose usage doesn't overlap.

Currently I'm thinking about an 8051 part which has 256 bytes of RAM, so
that issue is potentially significant.

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sun Apr 26 05:55:04 2026

From Newsgroup: comp.lang.forth

Paul Rubin <[email protected]d> writes:

Hans Bezemer <[email protected]> writes:

We do have N>R (https://forth-standard.org/standard/tools/NtoR). So if
the whole problem is "there is no more room on the FP stack", there is
a way out.

That must be pretty new (it's not in gforth 0.7.3)

It was accepted into Forth-200x at the 2010 standards meeting.

so I wonder how
helpful it really is.

We have two uses in the Gforth sources. I.e., not particularly
useful.

In any case, it does not help with FP stack limitations at all,
because N>R transfers cells from the data stack to the return stack.

My take on FP stack depth limitations in some systems is that you use
as much FP stack as you need, and a Forth system (like Gforth) where
you can make the FP stack as deep as available memory and address
space permit, and publish that. Maybe it will inspire the system
implementors with shallow FP stacks to provide deep FP stacks, at
least optionally.

However, when I did something that required a deep FP stack (adding up
an array with pairwise addition
<[email protected]>), I actually worked
around the limitations of systems that only provide a shallow FP
stack. But that was easy enough in that case.
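
For readers unfamiliar with pairwise addition, a recursive sketch follows. It is not the code from the cited posting and does not include the workaround for shallow FP stacks mentioned above; the name PAIRWISE-SUM is made up. It holds one partial sum on the FP stack per recursion level, so the FP stack depth it needs grows with the logarithm of the element count.

```forth
\ Sketch: recursive pairwise summation of u floats starting at f-addr.
\ PAIRWISE-SUM is a made-up name; needs the floating-point word set.
: pairwise-sum ( f-addr u -- ) ( F: -- r )
  dup 2 u< if                       \ base cases: 0 or 1 element
    if f@ else drop 0e then exit
  then
  2dup 2/ recurse                   ( f-addr u ) ( F: r1 )  \ sum of first half
  dup 2/ >r                         ( f-addr u ) ( R: half )
  r@ - swap r> floats + swap        ( f-addr' u' )          \ second half
  recurse f+ ;                      ( F: r1+r2 )

falign here 4 floats allot constant arr   \ small demo array
1e arr f!             2e arr float+ f!
3e arr 2 floats + f!  4e arr 3 floats + f!
arr 4 pairwise-sum f.               \ adds 1+2 and 3+4 first, then the two sums
```

With a million elements the recursion is about 20 levels deep, which already exceeds what a system with a hardware-limited FP stack can hold.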

Concerning systems with FP stack limits, AFAIK VFX has FP packages
that support very deep stacks, including the SSE-based package that
used to be the default in VFX64 for a while.

iForth implements a deep stack: it uses the 387 stack within a
definition and stores the FP stack items that are on the 387 stack to
memory on calls, and if the FP stack would overflow from the
computations within a word. I think this is a good approach: Much FP
computation time is spent in words that do not call other words, or at
least the FP stack items do not live across the calls. iForth seems
to overdo it, however, even code like

: bar
dup f@ cell+ dup f@ cell+ dup f@ cell+
dup f@ cell+ dup f@ cell+ dup f@ cell+
f+ f+ f+ f+ f+ ;

which uses only 6 FP stack items does not produce the obvious code,
but something significantly longer: It first performs 6 FLD
instructions corresponding to the 6 F@, then stores 4 FP items,
presumably on the memory FP stack, and only then starts the additions
(interleaved with some other code).

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/

From Paul Rubin@[email protected] to comp.lang.forth on Sun Apr 26 00:28:06 2026

From Newsgroup: comp.lang.forth

[email protected] (Anton Ertl) writes:

In any case, it does not help with FP stack limitations at all,
because N>R transfers cells from the data stack to the return stack.

In the code I mentioned, I wasn't running out of FP stack space, but
rather, I didn't see how to write the function in any non-horrible way
without using FP locals. Horrible ways included: 1) implementing a
separate FP stack in memory for intermediate values during the
recursion, or 2) using ugly hacks to stash FP values on the regular data
stack.

N>R was suggested as a way to implement horribleness #2 but it would
actually have to be FN>R or something like that.

From peter@[email protected] to comp.lang.forth on Sun Apr 26 09:57:52 2026

From Newsgroup: comp.lang.forth

On Sun, 26 Apr 2026 05:55:04 GMT
[email protected] (Anton Ertl) wrote:

Paul Rubin <[email protected]d> writes:

Hans Bezemer <[email protected]> writes:

We do have N>R (https://forth-standard.org/standard/tools/NtoR). So if
the whole problem is "there is no more room on the FP stack", there is
a way out.

That must be pretty new (it's not in gforth 0.7.3)

It was accepted into Forth-200x at the 2010 standards meeting.

so I wonder how
helpful it really is.

We have two uses in the Gforth sources. I.e., not particularly
useful.

In any case, it does not help with FP stack limitations at all,
because N>R transfers cells from the data stack to the return stack.

My take on FP stack depth limitations in some systems is that you use
as much FP stack as you need, and a Forth system (like Gforth) where
you can make the FP stack as deep as available memory and address
space permit, and publish that. Maybe it will inspire the system
implementors with shallow FP stacks to provide deep FP stacks, at
least optionally.

However, when I did something that required a deep FP stack (adding up
an array with pairwise addition
<[email protected]>), I actually worked
around the limitations of systems that only provide a shallow FP
stack. But that was easy enough in that case.

Concerning systems with FP stack limits, AFAIK VFX has FP packages
that support very deep stacks, including the SSE-based package that
used to be the default in VFX64 for a while.

iForth implements a deep stack: it uses the 387 stack within a
definition and stores the FP stack items that are on the 387 stack to
memory on calls, and if the FP stack would overflow from the
computations within a word. I think this is a good approach: Much FP
computation time is spent in words that do not call other words, or at
least the FP stack items do not live across the calls. iForth seems
to overdo it, however, even code like

: bar
dup f@ cell+ dup f@ cell+ dup f@ cell+
dup f@ cell+ dup f@ cell+ dup f@ cell+
f+ f+ f+ f+ f+ ;

which uses only 6 FP stack items does not produce the obvious code,
but something significantly longer: It first performs 6 FLD
instructions corresponding to the 6 F@, then stores 4 FP items,
presumably on the memory FP stack, and only then starts the additions
(interleaved with some other code).

- anton

lxf uses the cpu FP stack. I think that is one of the worse decisions
I made for it. It will fail on all but the simplest complex fp math
operations. For lxf64 a priority was to have a separate in memory
FP stack. It has worked out very well!

Your bar example becomes

see bar
0x4282B0 C5FB100B vmovsd xmm1, qword [rbx]
0x4282B4 4883C308 add rbx, 0x8
0x4282B8 C5FB1013 vmovsd xmm2, qword [rbx]
0x4282BC 4883C308 add rbx, 0x8
0x4282C0 C5FB101B vmovsd xmm3, qword [rbx]
0x4282C4 4883C308 add rbx, 0x8
0x4282C8 C5FB1023 vmovsd xmm4, qword [rbx]
0x4282CC 4883C308 add rbx, 0x8
0x4282D0 C5FB102B vmovsd xmm5, qword [rbx]
0x4282D4 4883C308 add rbx, 0x8
0x4282D8 C5FB1033 vmovsd xmm6, qword [rbx]
0x4282DC 4883C308 add rbx, 0x8
0x4282E0 C5CB58F5 vaddsd xmm6, xmm6, xmm5
0x4282E4 C5CB58F4 vaddsd xmm6, xmm6, xmm4
0x4282E8 C5CB58F3 vaddsd xmm6, xmm6, xmm3
0x4282EC C5CB58F2 vaddsd xmm6, xmm6, xmm2
0x4282F0 C5CB58F1 vaddsd xmm6, xmm6, xmm1
0x4282F4 C4C17B1145F8 vmovsd qword [r13-0x8], xmm0
0x4282FA C5FB10C6 vmovsd xmm0, xmm0, xmm6
0x4282FE 4D8D6DF8 lea r13, [r13-0x8]
0x428302 C3 ret
83 bytes, 21 instructions

As seen all the fp registers can be put to good use!

BR
Peter


From dxf@[email protected] to comp.lang.forth on Sun Apr 26 19:55:06 2026

From Newsgroup: comp.lang.forth

On 26/04/2026 5:28 pm, Paul Rubin wrote:

[email protected] (Anton Ertl) writes:

In any case, it does not help with FP stack limitations at all,
because N>R transfers cells from the data stack to the return stack.

In the code I mentioned, I wasn't running out of FP stack space, but
rather, I didn't see how to write the function in any non-horrible way
without using FP locals. Horrible ways included: 1) implementing a
separate FP stack in memory for intermediate values during the
recursion, or 2) using ugly hacks to stash FP values on the regular data
stack.

N>R was suggested as a way to implement horribleness #2 but it would
actually have to be FN>R or something like that.

Probably the flocals are more complicated but users rarely look beyond
the interface.


From albert@[email protected] to comp.lang.forth on Sun Apr 26 14:34:29 2026

From Newsgroup: comp.lang.forth

In article <[email protected]>,
peter <[email protected]> wrote:
<SNIP>

lxf uses the cpu FP stack. I think that is one of the worse decisions
I made for it. It will fail on all but the simplest complex fp math
operations. For lxf64 a priority was to have a separate in memory
FP stack. It has worked out very well!

You give up on portable fp programs. The standard guarantees only
8 items. So for "all but the simplest" you must take care.

P.S. The example Anton gave is silly.
MHX undoubtedly programs more like this:

[ this example is ciforth code. Only 64 bits floats. ]
WANT -fp-
: BAR 0_ ( addr -- ) ( F: -- f )
6 0 DO
DUP F@ F+ CELL+ CELL+
LOOP DROP ;

How convenient, I could have added 100 floats !

In building the transputer Forth I was obliged to generate
Chebyshev approximations for every transcendental function.
You have to do that too if you forego the Intel fp stack.

The Intel internal stack gives FSIN FEXP etc. in a single instruction.
CODE FCOS FCOS, NEXT, END-CODE
CODE FLOG FLDLG2, FXCH, ST1| FYL2X, NEXT, END-CODE
CODE 0_ FLDZ, NEXT, END-CODE

BR
Peter

Groetjes Albert
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From albert@[email protected] to comp.lang.forth on Sun Apr 26 14:55:52 2026

From Newsgroup: comp.lang.forth

In article <[email protected]>,
Paul Rubin <[email protected]d> wrote:

[email protected] (Anton Ertl) writes:

And traditionally Forth has been implemented without locals, for the
same reason: It takes less memory and, for the system implementor,
less work

A simple implementation of locals doesn't sound like that much work?
Mostly you need a runtime scheme to make sure the locals are cleaned up
in case of exceptions being thrown. If you're willing to ignore the
standard you don't need to complicate the text interpreter much. I

I get by with one screen, ignoring the standard only in the sense that
LOCALs should be recursive.

For a long time FORTRAN was the way to go. There are no locals in
FORTRAN, only (c-speak) static variables, no recursion.
Modules, including "named commons" only perform name hiding amongst each
other.

For decades the equivalent in Forth was
(This is present in a lot of Marcel Hendrix programs. )
PRIVATES
VARIABLE x PRIVATE
: aap .... x .. ;
DEPRIVE
This prevents visibility of x in the remainder of the program.
It doesn't catch on.

Also namespaces ("wordlists") have the same functionality for hiding.
You can emulate a FORTRAN program using wordlists.
This is much more powerful than defining LOCAL then DLOCAL then FLOCAL
then DFLOCAL then scratching your head inventing arrays of xxLOCAL
stuff.
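
As a sketch of the wordlist approach, the following hides a variable using only standard search-order words. It is not the PRIVATES/DEPRIVE implementation shown above; the wordlist name AAP-PRIVATE and the body of AAP are made up for illustration.

```forth
\ Sketch: hide X from the rest of the program with a standard wordlist.
\ AAP-PRIVATE is a made-up name; needs the search-order word set.
wordlist constant aap-private
get-order aap-private swap 1+ set-order   \ search aap-private first
get-current aap-private set-current       ( old-wid )  \ define into it
variable x   0 x !
set-current                               \ later definitions are public again
: aap ( n -- sum ) x @ + dup x ! ;        \ running total kept in the hidden X
previous                                  \ X can no longer be found by name

5 aap .   3 aap .                         \ AAP still works without a visible X
```

As with a FORTRAN static variable, there is a single X shared by all calls, so this gives name hiding but not recursion-safe storage.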

Groetjes Albert
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From Hans Bezemer@[email protected] to comp.lang.forth on Sun Apr 26 15:08:28 2026

From Newsgroup: comp.lang.forth

On 25-04-2026 19:21, Anton Ertl wrote:

Hans Bezemer <[email protected]> writes:

On 25-04-2026 07:26, Anton Ertl wrote:

Hans Bezemer <[email protected]> writes:

[reinserted deleted, relevant context]

If you want to use a language that is "ideologically devoted" to the
architecture, maybe you shouldn't use Forth at all - and stick with C.

I don't see anything about C that is closer to the hardware than Forth
is, and I think that both languages are about equally '"ideologically
devoted" to the architecture'. In particular, a C local variable is
no closer to a register (the most efficient hardware feature for
storing data) than a stack item or return stack item is, and register
allocation of any of the three is similarly difficult (with big
differences in difficulty between solutions that provide some register
allocation to those that are so reliable that you usually count on
them).

Well, you're actually shooting at Paul Rubin - not at me. Thank you! I
take all the help I can get!

Actually, this whole paragraph is a reaction to your statement, not
his. You deleted it for whatever reason, so I reinserted it.
Concerning Paul Rubin, just because he is wrong does not mean you are
right.

I leave it here, because it doesn't hurt my point in any way whatsoever.

What you obviously fail to recognize is that I'm just using a debating
technique. You see, I'm not too interested in hardware. To me it's just
a bottle to get to the soda - i.e. you need hardware to run a program.
I'm not completely ignorant on the subject, but I'm not an expert by any
measure.

So what I do is *assume* the statement is true - and work out the
consequences. In this case, if Forth is not the right language for an
x86_64 architecture, why not turn to the most logical candidate (in this
case, C, because it features local variables) instead of manhandling
this alien concept into the Forth language?

Yes, it would also mean I'm using an "inferior language", but watch my
face and see how much I care.

What you actually do is nullify his original statement, so there is no
reason for either local variables or me changing my favorite language.

Effectively, it's turned into a Catch-22. Look it up, if you don't know
what this means. So yes - I win in both cases. Take it or leave it. :-)

(..) and they often find some arguments that appeal to
elitism (i.e., only the chosen ones can use this programming language
for the elite as it should be used, and the others should program in
Python or "should never have been allowed to touch a keyboard" (Ulrich
Drepper).

It's your own pal Bernd that said: "A good programmer will write even
better code in Forth. A bad programmer will write abysmal code in Forth.
And I'm sorry to say - but most programmers are quite bad."

So, either you agree with him or we have an unfortunate departure of one
of the foremost members of Gforth. Because this states - in no
uncertain terms - that Forth programmers *ARE* elite.

What departure? We disagree on a number of things.

You must be great friends! :-)

And the issue is not whether Forth programmers or any other
programmers are elite, but that many programmers think that they are
elite (whether they are or aren't) and that the designers or advocates
of deficient programming systems make use of that to dupe them, along
the lines of: "You as elite programmers can cope with this deficiency
[of course they don't call it a deficiency], it's only subpar
programmers [more elaborate denigrations are common, see Ulrich
Drepper] who complain about it."

"Deficient" can be considered a secondary quality in Locke's ideas on
properties, which makes the entire discussion futile at best. BTW, the
same goes for "elite or subpar programmers". In order to validate such a
discussion one might need to agree on standards. Which we rarely do, in
my experience ;-)

But I noticed you get triggered when dividing the world into "elite" and
"subpar" programmers. Nietzsche called that (real or performed) humility
"slave mentality". I see that a lot in governmental agencies. Everyone
is afraid to stand up and put out their ideas - because you never know
who is gonna punish you for challenging the boss.

I can tell you with utmost certainty it kills innovation - and drives
your best people out of your organization. So I can't stand it. I truly
believe challenging ideas - no matter how established - is the only way
forward. The point is, the only true difference between ideas is which
ones achieve the desired result - and which don't.

The mindset classical Forth breeds has done wonders for me. And that
experience simply cannot be denied.

In the case of Forth and locals this tactic has not worked very well,
so even Forth, Inc. (who have been the most vocal among the commercial
Forth providers about their dislike of locals) have implemented
locals. But of course we see the echo of all of this still around
here.

That is the most ridiculous argument I've ever seen appear from your
hand. Really! Let me take myself as an example. I'm *NOT* a fan of
locals, agree?

But I have and maintain *FOUR* different "locals" libraries and *THREE*
preprocessor libraries. Darn, I even got libraries for PICK, ROLL and
?DUP - which I usually refuse to touch without having a cross in my
hands.

You know, I think this statement says more about you than about me or
Forth, Inc. Yeah, sometimes I port a library with locals or deep stack
operators.

And I think there are situations where one of my users might need those
cursed words. So, I see no need to say: "You are not worthy of Forth. Go
to hell, you sinner - and repent for your questionable choices!"

They chose how they use my product. They don't need me to do that for
them. So I provide it, no problem.

But thank you for providing this insight. Although it is *completely*
contrary to your last argument, it explains a lot.

Hans Bezemer


From Hans Bezemer@[email protected] to comp.lang.forth on Sun Apr 26 15:10:37 2026

From Newsgroup: comp.lang.forth

On 26-04-2026 00:34, [email protected] wrote:

In article <nnd$1196d1a5$0da70c85@6de98b5b6c1b0418>,
Hans Bezemer <[email protected]> wrote:
<SNIP>

It would be better to think deeply, find an original solution and learn.
Like Albert with his brilliant ;: word.

Chuck Moore invented and coined the ;: word.
I came up with CO, which is similar, or maybe the same.

Thank you for that correction! Consider my mistake as a sign of respect ;-)

Hans Bezemer


From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sun Apr 26 09:50:59 2026

From Newsgroup: comp.lang.forth

Paul Rubin <[email protected]d> writes:

[email protected] (Anton Ertl) writes:

And traditionally Forth has been implemented without locals, for the
same reason: It takes less memory and, for the system implementor,
less work

A simple implementation of locals doesn't sound like that much work?

Bernd Paysan wrote a simple locals implementation <https://cgit.git.savannah.gnu.org/cgit/gforth.git/tree/locals.fs>
that takes 84 SLOC:

[~/gforth:167833] cat locals.fs|grep -v '^\\'|grep -v '^$'|wc -l
84

When loaded it takes 3096 bytes on a 32-bit gforth-0.4.0, so at least
1500 bytes on a system with 16-bit cells. Given the memory limits in
the old days, it's no surprise that they did without that at first.
Later a number of Forth programmers were proud of their skill in
working without locals and found reasons (or, maybe, justifications)
why it was still relevant when memory was no longer so scarce. You
can read some of those reasons in this thread.

I've imagined some alternate versions of COLON, e.g.

   : foo ( ... ) ;    \ regular colon, no locals
   1: foo ( ... ) ;   \ one local called A
   2: foo ( ... ) ;   \ two locals, A and B
   ...
   4: foo ( ... ) ;   \ four locals: A, B, C, D.

If you cannot choose the names, locals lose a lot of their benefits in
making the code more understandable (OTOH, mathematicians have made do
with similar naming schemes for a long time). You might then just as
well work with >R >R >R >R and R@, R'@, 2 RPICK and 3 RPICK.

The slowdown doesn't surprise me but it's not that big a deal, compared
to the slowdown of using interpreted Forth instead of assembly language
in the first place.

It means that the line of argument about locals making better use of
the random-access memory provided by hardware does not hold water.

As for assembly language, that has been part of Forth since the
beginning, and telling people to write code words has not only been
suggested in cases where more performance was necessary than
high-level Forth provided [1], but also in cases like the strcmp
example where so many values are active at the same time that
high-level Forth without locals becomes cumbersome. We have also seen
that in this thread.

[1] In a reversal of earlier Forth marketing, IIRC VFX was later
described as having the benefit of no longer needing to write code
words.

... looking at the code for Gforth for 3DUP.3 compared to the others,
Gforth still uses more primitives ...

That's a lot of code in the expansion! I wonder how it will look in a
simple interpreter.

In the code you see the threaded code interspersed with the native
code. If you ignore the native code, you see what a simple
interpreter would see (if it had a locals implementation that produced
code similar to that of Gforth).

You seem to argue that the random-access aspect of locals provides a
performance advantage on simple systems, but in most cases, code using
locals is at a performance disadvantage on such systems

Well, if the slowdown is less than say 2x, I'd say the code cleanup
matters more, due to the traditional 90/10 rule (maybe now 99/1) of
where CPU cycles go. Code the hot spots for speed and the rest for
convenience.

So it's "code cleanup", not making use of hardware facilities for
efficiency on simple interpreters, that you see as the benefit of
locals.

Keeping at least one stack item in a register leads to a smaller and
faster implementation, and is not more complex than keeping all the
stack memory in RAM.

That's only with a fancy compiler AND a requirement of the application
code having statically determined stack effects. Traditional words like
?DUP would confuse this scheme amirite?

No. No fancy compiler; the compiler does not know about how the stack
is represented. No statically determined stack effect necessary,
because every word begins and ends in the same stack representation.
Even with multi-representation stack-caching as used since Gforth 0.7
(which does require more compiler smarts), no statically determined
stack effect is necessary, because the code generator returns to the
canonical state on control-flow.

?DUP also benefits. Implementation when TOS is in memory:

  tmp = sp[0]
  if tmp == 0 goto done
  sp = sp - cell
  sp[0] = tmp
done:
  NEXT

Implementation when TOS is in a register:

  if TOS == 0 goto done
  sp = sp - cell
  sp[0] = TOS   # if SP points to the second item
done:
  NEXT

So the first instruction is omitted. The code that gcc generates
for Gforth (TOS in memory for gforth, TOS in register for gforth-fast)
is suboptimal, but if you really want, you can inspect it with SEE
?DUP and puzzle out which instruction corresponds to which part of the
pseudocode above, and which instructions are just a sign of
suboptimality.

A way to use RAM that is less frowned upon by Forth traditionalists is
(global) variables. The fact that the use of global variables is
frowned upon in the wider programming community for various reasons
seems to pour oil into the fire of their elitism.

I see what you mean by that. But, whole-program C compilers do
something like register allocation to re-use those "global" cells when
sets of them won't be needed at the same time. The Forth approach would
need either a similar fancy compiler, or else require the programmer to
do an error-prone manual memory layout process, or else burn memory
unnecessarily for those cells whose usage doesn't overlap.

Yes. It's even worse: Such variables are often user variables. But
looking at the usage of such things in Forth systems, we have user
variables like BASE and HLD (in F83, HOLDPTR in gforth). They are
used across multiple words, and the fact that you don't have to pass
them and put them into a local has been touted as an advantage over
locals: Definitions that use global variables are easier to factor.

BASE lives during the whole session, and its memory cannot be reused.
The memory of HLD lives only between <# and #>, and could be reused,
but has not been.
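
As a minimal sketch of the factoring point above (the word names U>STR and U>HEXSTR are made up for illustration and do not come from any system mentioned here): the conversion word reads BASE and uses the hold pointer implicitly, so neither has to be passed on the stack or captured in a local.

```forth
\ Hypothetical helper: convert u to a string in the current BASE.
\ BASE and the hold pointer (HLD / HOLDPTR) are used implicitly.
: u>str ( u -- c-addr len )
    0 <# #s #> ;

\ Hypothetical wrapper: switch BASE to hex for one conversion and
\ restore the previous value afterwards.
: u>hexstr ( u -- c-addr len )
    base @ >r  hex u>str  r> base ! ;
```

The returned string lives in the transient pictured-numeric-output buffer, so it should be used or copied before the next conversion.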

In any case, this approach is not taken often, and when it is, often
to good effect (that may be survivor's bias). I don't see a lot of
overlap with the cases where one uses locals, but one can argue that
it reduces stack pressure in those places where one would otherwise be
tempted to use locals.

Another case of reducing stack pressure is ?DO...LOOP and friends.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/

From Hans Bezemer@[email protected] to comp.lang.forth on Sun Apr 26 16:22:50 2026

From Newsgroup: comp.lang.forth

On 26-04-2026 11:50, Anton Ertl wrote:

Paul Rubin <[email protected]d> writes:

[email protected] (Anton Ertl) writes:

And traditionally Forth has been implemented without locals, for the
same reason: It takes less memory and, for the system implementor,
less work

A simple implementation of locals doesn't sound like that much work?

Bernd Paysan wrote a simple locals implementation <https://cgit.git.savannah.gnu.org/cgit/gforth.git/tree/locals.fs>
that takes 84 SLOC:

With all respect to Bernd, but yeah - compare that to this 0.5 SLOC implementation of local:

: local r> swap dup >r @ >r ;: r> r> ! ;

Hans Bezemer


From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sun Apr 26 14:03:03 2026

From Newsgroup: comp.lang.forth

peter <[email protected]> writes:

I recently reviewed the string comparison for search-wordlist
and came up with the following

The string stored in the word header is already uppercased.
So string comparison will be case insensitive

: UC ( c -- c' ) \ uppercase char
    dup $61 $7B within $20 and - ;

: NCOMP4 ( addr n addr' n' - f) \ 0 is match
    dup >r
    begin
        rot = while            \ str cstr
        r> dup 1- >r
    while                      \ str cstr
        swap count uc          \ cstr str' s1
        rot count              \ str' s1 cstr' c1
    repeat
        2drop r> drop 0 exit
    then
    2drop r> drop 1 ;

First iteration in the loop it does not compare chars but the length!

Clever, but, at least without comment, too clever.

This code, and, more clearly, Hans Bezemer's version demonstrate that
STR= is easier than COMPARE, STRCMP, or STR<, because you can deal
with the case of length difference right at the start, whereas the
latter words have to check the characters up to the end of the shorter
string first before dealing with the length. This shows the greatest
benefit in cases like

s" 0123456789abcdefg" s" 0123456789abcdefgh" strcmp

As for STRCMP, I have measured the five versions shown in my earlier
posting (whole program posted below), with the bugs fixed, and the
?DUP IF replaced by DUP IF ... THEN DROP, because it produces better
code.
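
To make that replacement concrete, here is a small sketch of the two idioms side by side. The words DIFF-A and DIFF-B are hypothetical and exist only for this illustration; both return the same results. The measurements in this posting only say that the second form compiles to better code; one plausible reason (an assumption, not something measured here) is that DUP always leaves the same number of stack items, while ?DUP leaves one or two depending on run-time data.

```forth
\ Variant A: ?DUP leaves one or two items depending on the value.
: diff-a ( c1 c2 -- n true | false )
    - ?dup if true exit then false ;

\ Variant B: DUP always leaves two items; the zero case drops the
\ extra copy explicitly after THEN.
: diff-b ( c1 c2 -- n true | false )
    - dup if true exit then drop false ;
```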

I have also included the following versions:

: strcmp { addr1 u1 addr2 u2 -- n }
    u1 u2 min 0
    ?do
        addr1 c@ addr2 c@ - ?dup
        if
            unloop exit
        then
        addr1 char+ TO addr1
        addr2 char+ TO addr2
    loop
    u1 u2 - ;

This comes from the '94 paper and is the version that uses TO instead
of defining new locals at every iteration. Paul Rubin will love the
code that current Gforth produces for "addr2 char+ TO addr2":

<strcmp+$E0> @local2 1->2
$7F337DA71BBA: mov 0x10(%rbp),%r15
<strcmp+$E8> char+ 2->2
$7F337DA71BBE: add $0x1,%r15
<strcmp+$F0> !local2 2->1
$7F337DA71BC2: mov %r15,0x10(%rbp)

The TO <local> code was not that efficient in earlier Gforth versions.

The other version I added is:

: strcmp ( addr1 u1 addr2 u2 -- n )
    rot 2dup 2>r min 0 ?do ( addr1 addr2 )
        over c@ over c@ - dup if
            nip nip 2rdrop unloop exit then
        drop
        char+ swap char+ swap
    loop
    2drop r> r> - ;

This is the STRCMP3 from <[email protected]>
and may be the locals-less version I compared against in the '94
paper.

I also included your version (without the UC call) and Hans Bezemer's
version.

I benchmarked two Forth systems, gforth-fast and gforth-itc.
gforth-itc uses indirect-threaded code and should perform similar to
the "simple interpreters" that Paul Rubin had in mind.

I ran three different benchmarks on these words, which performed the
following a number of times:

s" 0123456789abcdefg" 2dup strcmp drop                      \ bench1
s" 0123456789abcdefg" s" 2123456789abcdefg" strcmp drop     \ bench2
s" 0123456789abcdefg" s" 0123456789abcdefgh" strcmp drop    \ bench3

In bench1 the strings are equal and everything has to be compared. In
bench2 the strings have the same length, but differ in the first char,
so the loop can terminate after the first char. In bench3 the strings
have different length, but all chars that both strings have are the
same. In the latter case versionpeter and versionbezemer have an
advantage from not performing the same functionality.
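
A minimal sketch of why a pure equality test gets away so cheaply in bench3 (MY-STR= is a hypothetical name, and the flag convention here is true for equal, unlike the two measured versions, which return 0 for a match): the lengths are compared first, and the character loop only runs when they agree.

```forth
\ Equality only: a length mismatch is decided before any character
\ is fetched.
: my-str= ( c-addr1 u1 c-addr2 u2 -- flag )
    rot over <> if                 \ lengths differ?
        drop 2drop false exit
    then                           ( c-addr1 c-addr2 u )
    0 ?do
        over i + c@ over i + c@ <> if
            2drop false unloop exit
        then
    loop
    2drop true ;
```

A full STRCMP cannot take this shortcut, because its result depends on the first differing character even when the lengths differ.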

The cycle counts are per invocation of STRCMP, including benchmark
overhead.

The benchmarks were run on a Ryzen 8700G (Zen4).

In addition to the cycles, I also show the bytes of the native code of
the whole word in gforth-fast on AMD64 (without the final jmp (2
Bytes)), and of the loop (including the code for the if...then).

    Bytes   |  cycles gforth-fast  |   cycles gforth-itc    |
strcmp loop | bench1 bench2 bench3 | bench1  bench2  bench3 |
  262   127 |  109.5   16.6  109.4 | 1732.7   147.4  1724.5 | version0
  303   151 |  164.2   17.2  164.4 | 1714.1   170.4  1613.5 | version1
  257   122 |  105.3   17.4  105.1 | 1496.7   166.4  1493.0 | version2
  280   113 |   98.6   19.2   99.0 | 1230.1   194.4  1116.2 | version3
  267   118 |   91.2   17.9   91.2 | 1268.6   198.4  1269.0 | version4
  273   108 |   89.9   17.0   90.0 | 1136.0   178.4  1138.9 | version5
  261   128 |  121.1   14.6  118.5 | 1221.4   131.3  1213.3 | version6
  210   142 |  137.5   15.4    9.5 | 1244.4   155.3    78.3 | versionpeter
  260   119 |  107.8   16.4    9.8 | 1186.2   134.5    71.3 | versionbezemer

So the champion among the full-featured strcmps for bench1 and bench3
is version5, for bench2 version6. The str= variants are much faster
for bench3 (of course), but slower than several other versions for
bench1 and slower than version6 for bench2. The native code size is
smallest for version2 (among the full-featured strcmp
implementations), so the locals-less versions do not win everything.

So locals-less (version5 and version6) is somewhat faster on both
gforth-fast and gforth-itc.

lxf has a more efficient locals implementation. Let's see how it
fares. It does not support the usage in version1, so I leave that
out.

     cycles lxf
bench1 bench2 bench3
  79.9   12.0   79.9  version0
  99.6   12.0   99.6  version2
  98.8   14.1   98.1  version3
  86.0   13.2   86.0  version4
  84.1   12.6   84.2  version5
  88.7   10.0   92.8  version6
  98.3   10.0    6.0  versionpeter
  72.1    9.5    6.0  versionbezemer

On lxf version0 (with locals) is the fastest for bench1 and bench3,
and version6 is the fastest for bench2. Hans Bezemer's version wins
everything if we are only interested in str= functionality.

And here's the code (measurement scripts at the bottom):
----------------------------------------------------------
[defined] version0 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
    u1 u2 min 0
    ?do
        addr1 c@ addr2 c@ - dup
        if
            unloop exit
        then
        drop
        addr1 char+ TO addr1
        addr2 char+ TO addr2
    loop
    u1 u2 - ;
[then]

[defined] version1 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
    addr1 addr2
    u1 u2 min 0
    ?do {: s1 s2 :}
        s1 c@ s2 c@ - dup
        if
            unloop exit
        then
        drop s1 char+ s2 char+
    loop
    2drop
    u1 u2 - ;
[then]

[defined] version2 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
    u1 u2 min 0
    ?do
        addr1 i + c@ addr2 i + c@ - dup
        if
            unloop exit
        then
        drop
    loop
    u1 u2 - ;
[then]

[defined] version3 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
    addr2 addr1 - {: offset :}
    u1 u2 min addr1 + addr1 ?do
        i c@ i offset + c@ - dup
        if
            unloop exit
        then
        drop
    loop
    u1 u2 - ;
[then]

[defined] version4 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
    addr2 addr1 - ( offset )
    u1 u2 min addr1 + addr1 ?do ( offset )
        dup i + c@ i c@ - dup
        if
            nip negate unloop exit
        then
        drop
    loop
    drop u1 u2 - ;
[then]

[defined] version5 [if]
: strcmp ( addr1 u1 addr2 u2 -- n )
    rot 2dup - >r   ( addr1 addr2 u1 u2 R: n1 )
    min -rot over - ( u12 addr1 offset R: n1 )
    swap rot bounds ( offset limit start R: n1 )
    ?do ( offset R: n1 loop-sys )
        dup i + c@ i c@ - dup
        if
            nip negate unloop r> drop exit
        then
        drop
    loop
    drop r> negate ;
[then]

[defined] version6 [if]
[undefined] 2rdrop [if]
    : 2rdrop postpone 2r> postpone 2drop ; immediate
[then]

: strcmp ( addr1 u1 addr2 u2 -- n )
    rot 2dup 2>r min 0 ?do ( addr1 addr2 )
        over c@ over c@ - dup if
            nip nip 2rdrop unloop exit then
        drop
        char+ swap char+ swap
    loop
    2drop r> r> - ;
[then]

[defined] versionpeter [if]
\ from <[email protected]>
\ renamed and deleted the call to UC
: strcmp ( addr n addr' n' - f) \ 0 is match
    dup >r
    begin
        rot = while            \ str cstr
        r> dup 1- >r
    while                      \ str cstr
        swap count             \ cstr str' s1
        rot count              \ str' s1 cstr' c1
    repeat
        2drop r> drop 0 exit
    then
    2drop r> drop 1 ;
[then]

[defined] versionbezemer [if]
\ from <nnd$548d4f1b$1e104571@905dda44db1f54ae>
\ renamed
: strcmp
    rot over - if drop 2drop true exit then
    0 ?do
        over i chars + c@ over i chars + c@ -
        if drop drop unloop true exit then
    loop drop drop false
;
[then]

[defined] t{ [if]
t{ s" abc" s" abc" strcmp -> 0 }t
t{ s" abc" s" abcd" strcmp -> -1 }t
t{ s" abc" s" abd" strcmp -> -1 }t
t{ s" abd" s" abc" strcmp -> 1 }t
t{ s" cbc" s" abc" strcmp -> 2 }t
t{ s" abc" s" adc" strcmp -> -2 }t
[then]

\ Benchmarks

[undefined] iterations [if]
100000000 constant iterations
[then]

: benchmark ( c-addr1 u1 c-addr2 u2 -- )
    iterations 0 do
        2over 2over strcmp drop
    loop
    2drop 2drop ;

: bench1
    s" 0123456789abcdefg" 2dup benchmark ;

: bench2
    s" 0123456789abcdefg" s" 2123456789abcdefg" benchmark ;

: bench3
    s" 0123456789abcdefg" s" 0123456789abcdefgh" benchmark ;

0 [if]
# bash script for producing the cycles
IFS=":"
for i in 0 1 2 3 4 5 6 peter bezemer; do
    for forthit in gforth-fast:100000000 gforth-itc:10000000; do
        fields=($forthit); forth="${fields[0]}"; iterations="${fields[1]}"
        for bench in 1 2 3; do
            perf stat --log-fd 3 -x, -e cycles:u $forth -e "create version$i $iterations constant iterations" ~/forth/strcmp.4th -e "bench$bench bye" 3>&1 >/dev/null|
                awk -F, '{printf "%6.1f ",$1/'$iterations'}'
        done
    done
    echo version$i
done
IFS=":"
for i in 0 2 3 4 5 6 peter bezemer; do
    forth=lxf; iterations=100000000
    for bench in 1 2 3; do
        perf stat --log-fd 3 -x, -e cycles:u $forth "create version$i $iterations constant iterations include $HOME/forth/strcmp.4th bench$bench bye" 3>&1 >/dev/null|
            awk -F, '{printf "%6.1f ",$1/'$iterations'}'
    done
    echo version$i
done
[then]
--------------------------------------------------------------

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sun Apr 26 17:04:39 2026

From Newsgroup: comp.lang.forth

Hans Bezemer <[email protected]> writes:

On 26-04-2026 11:50, Anton Ertl wrote:

Bernd Paysan wrote a simple locals implementation
<https://cgit.git.savannah.gnu.org/cgit/gforth.git/tree/locals.fs>
that takes 84 SLOC:

With all respect to Bernd, but yeah - compare that to this 0.5 SLOC
implementation of local:

: local r> swap dup >r @ >r ;: r> r> ! ;

Let's see:

[~:167902] gforth-0.5.0
GForth 0.5.0, Copyright (C) 1995-2000 Free Software Foundation, Inc.
GForth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
warnings off include locals.fs ok
ok
: local r> swap dup >r @ >r ;: r> r> ! ;
*the terminal*:1: Undefined word
: local r> swap dup >r @ >r ;: r> r> ! ;
^^
Backtrace:
$F7B5A158 throw
$F7B6418C no.extensions

Although, admittedly, while Bernd Paysan's locals.fs loads, it does
not work AFAICT (I tried it on gforth-0.4 and gforth-0.5; it does not
load on gforth-0.6 and later). Apparently it had bitrotted between
the time when it was written in 1992 and gforth-0.4 in 1998.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/

From dxf@[email protected] to comp.lang.forth on Mon Apr 27 11:12:03 2026

From Newsgroup: comp.lang.forth

On 26/04/2026 7:50 pm, Anton Ertl wrote:

Paul Rubin <[email protected]d> writes:
...

I've imagined some alternate versions of COLON, e.g.

   : foo ( ... ) ;    \ regular colon, no locals
   1: foo ( ... ) ;   \ one local called A
   2: foo ( ... ) ;   \ two locals, A and B
   ...
   4: foo ( ... ) ;   \ four locals: A, B, C, D.

If you cannot choose the names, locals lose a lot of their benefits in
making the code more understandable (OTOH, mathematicians have made do
with similar naming schemes for a long time). You might then just as
well work with >R >R >R >R and R@, R'@, 2 RPICK and 3 RPICK.

That Julian Noble (among others) felt the need for FTRAN INTRAN etc informs
what scientists and academics really want - and it's a long way from the
'stack based' locals offered by most forth systems. The latter represent
a concession to forth before a user has even begun to consider identifiers.
To an outsider, forth locals do nothing to ameliorate what they see as
fundamentally broken about the language. ISTM if a forther has conceded to
use stack-based locals, he can certainly make choices about what form
identifiers take.


From dxf@[email protected] to comp.lang.forth on Mon Apr 27 11:51:17 2026

From Newsgroup: comp.lang.forth

On 27/04/2026 3:04 am, Anton Ertl wrote:

Hans Bezemer <[email protected]> writes:

On 26-04-2026 11:50, Anton Ertl wrote:

Bernd Paysan wrote a simple locals implementation
<https://cgit.git.savannah.gnu.org/cgit/gforth.git/tree/locals.fs>
that takes 84 SLOC:

With all respect to Bernd, but yeah - compare that to this 0.5 SLOC
implementation of local:

: local r> swap dup >r @ >r ;: r> r> ! ;

Let's see:

[~:167902] gforth-0.5.0
GForth 0.5.0, Copyright (C) 1995-2000 Free Software Foundation, Inc.
GForth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
warnings off include locals.fs ok
ok
: local r> swap dup >r @ >r ;: r> r> ! ;
*the terminal*:1: Undefined word

That only tells what Gforth doesn't have. DX-Forth comes with four
variants of locals. The next release will include a variant of FSL's
flocals. It's not an endorsement of locals, rather a way of saying
there's lots of ways to skin a cat and one isn't necessarily best.


From peter@[email protected] to comp.lang.forth on Mon Apr 27 09:31:03 2026

From Newsgroup: comp.lang.forth

On Sun, 26 Apr 2026 14:03:03 GMT
[email protected] (Anton Ertl) wrote:

peter <[email protected]> writes:

I recently reviewed the string comparison for search-wordlist
and came up with the following

The string stored in the word header is already uppercased.
So string comparison will be case insensitive

: UC ( c -- c' ) \ uppercase char
dup $61 $7B within $20 and - ;

: NCOMP4 ( addr n addr' n' - f) \ 0 is match
dup >r
begin
rot = while \ str cstr
r> dup 1- >r
while \ str cstr
swap count uc \ cstr str' s1
rot count \ str' s1 cstr' c1
repeat
2drop r> drop 0 exit
then
2drop r> drop 1 ;

First iteration in the loop it does not compare chars but the length!

Clever, but, at least without comment, too clever.

This code, and, more clearly, Hans Bezemers version demonstrate that
STR= is easier than COMPARE, STRCMP, or STR<, because you can deal
with the case of length difference right at the start, whereas the
latter words have to check the characters up to the end of the shorter
string first before dealing with the length. This shows the greatest
benefit in cases like

s" 0123456789abcdefg" s" 0123456789abcdefgh" strcmp

As for STRCMP, I have measured the five versions shown in my earlier
posting (whole program posted below), with the bugs fixed, and the
?DUP IF replaced by DUP IF ... THEN DROP, because it produces better
code.

I have also included the following versions:

: strcmp { addr1 u1 addr2 u2 -- n }
u1 u2 min 0
?do
addr1 c@ addr2 c@ - ?dup
if
unloop exit
then
addr1 char+ TO addr1
addr2 char+ TO addr2
loop
u1 u2 - ;

This comes from the '94 paper and is the version that uses TO instead
of defining new locals at every iteration. Paul Rubin will love the
code that current Gforth produces for "addr2 char+ TO addr2":

<strcmp+$E0> @local2 1->2
$7F337DA71BBA: mov 0x10(%rbp),%r15
<strcmp+$E8> char+ 2->2
$7F337DA71BBE: add $0x1,%r15
<strcmp+$F0> !local2 2->1
$7F337DA71BC2: mov %r15,0x10(%rbp)

The TO <local> code was not that efficient in earlier Gforth versions.

The other version I added is:

: strcmp ( addr1 u1 addr2 u2 -- n )
rot 2dup 2>r min 0 ?do ( addr1 addr2 )
over c@ over c@ - dup if
nip nip 2rdrop unloop exit then
drop
char+ swap char+ swap
loop
2drop r> r> - ;

This is the STRCMP3 from <[email protected]>
and may be the locals-less version I compared against in the '94
paper.

I also included your version (without the UC call) and Hans Bezemer's version.

I benchmarked two Forth systems, gforth-fast and gforth-itc.
gforth-itc uses indirect-threaded code and should perform similar to
the "simple interpreters" that Paul Rubin had in mind.

I ran three different benchmarks on these words, which performed the
following a number of times:

s" 0123456789abcdefg" 2dup strcmp drop                      \ bench1
s" 0123456789abcdefg" s" 2123456789abcdefg" strcmp drop     \ bench2
s" 0123456789abcdefg" s" 0123456789abcdefgh" strcmp drop    \ bench3

In bench1 the strings are equal and everything has to be compared. In
bench2 the strings have the same length, but differ in the first char,
so the loop can terminate after the first char. In bench3 the strings
have different length, but all chars that both strings have are the
same. In the latter case versionpeter and versionbezemer have an
advantage from not performing the same functionality.

The cycle counts are per invocation of STRCMP, including benchmark
overhead.

The benchmarks were run on a Ryzen 8700G (Zen4).

In addition to the cycles, I also show the bytes of the native code of
the whole word in gforth-fast on AMD64 (without the final jmp (2
Bytes)), and of the loop (including the code for the if...then).

Bytes | cycles gforth-fast | cycles gforth-itc |
strcmp loop|bench1 bench2 bench3 | bench1 bench2 bench3 |
262 127 | 109.5 16.6 109.4 | 1732.7 147.4 1724.5 | version0
303 151 | 164.2 17.2 164.4 | 1714.1 170.4 1613.5 | version1
257 122 | 105.3 17.4 105.1 | 1496.7 166.4 1493.0 | version2
280 113 | 98.6 19.2 99.0 | 1230.1 194.4 1116.2 | version3
267 118 | 91.2 17.9 91.2 | 1268.6 198.4 1269.0 | version4
273 108 | 89.9 17.0 90.0 | 1136.0 178.4 1138.9 | version5
261 128 | 121.1 14.6 118.5 | 1221.4 131.3 1213.3 | version6
210 142 | 137.5 15.4 9.5 | 1244.4 155.3 78.3 | versionpeter
260 119 | 107.8 16.4 9.8 | 1186.2 134.5 71.3 | versionbezemer

So the champion among the full-featured strcmps for bench1 and bench3
is version5, for bench2 version6. The str= variants are much faster
for bench3 (of course), but slower than several other versions for
bench1 and slower than version6 for bench2. The native code size is
smallest for version2 (among the full-featured strcmp
implementations), so the locals-less versions do not win everything.

So locals-less (version5 and version6) is somewhat faster on both
gforth-fast and gforth-itc.

lxf has a more efficient locals implementation. Let's see how it
fares. It does not support the usage in version1, so I leave that
away.

cycles lxf
bench1 bench2 bench3
79.9 12.0 79.9 version0
99.6 12.0 99.6 version2
98.8 14.1 98.1 version3
86.0 13.2 86.0 version4
84.1 12.6 84.2 version5
88.7 10.0 92.8 version6
98.3 10.0 6.0 versionpeter
72.1 9.5 6.0 versionbezemer

On lxf version0 (with locals) is the fastest for bench1 and bench3,
and version6 is the fastest for bench2. Hans Bezemer's version wins
everything if we are only interested in str= functionality.

Anton, thanks for running all these tests.
I have now also run them on my Ryzen 9950X.
There is an error in version6 that I corrected:
2rdrop needs to come after unloop. On lxf64, which uses registers for
loop parameters, this is necessary!
version3 does not run, as lxf64 does not support defining locals
several times. I will see if this can be changed.
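
For reference, a sketch of how version6 reads with that correction applied. This is reconstructed from the description above, not copied from the file that was actually benchmarked. UNLOOP first discards the loop parameters, wherever the system keeps them, and only then does 2RDROP remove the two lengths saved with 2>R.

```forth
[undefined] 2rdrop [if]
    : 2rdrop postpone 2r> postpone 2drop ; immediate
[then]

: strcmp ( addr1 u1 addr2 u2 -- n )
    rot 2dup 2>r min 0 ?do ( addr1 addr2 )
        over c@ over c@ - dup if
            \ discard the loop parameters first, then the saved lengths
            nip nip unloop 2rdrop exit then
        drop
        char+ swap char+ swap
    loop
    2drop r> r> - ;
```

With the original order, 2RDROP would act on whatever sits on top of the return stack inside the loop. That only works by accident on systems that keep exactly two loop-parameter cells there.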

I needed also to change the log-fd to 5 to get it to run.
The tests are run with Debian under WSL2.

Here are the results

lxf64
  59.1   10.0   57.6  version0
  48.1   10.0   48.4  version2
  43.0   10.7   42.5  version4
  42.2    9.1   42.2  version5
  55.1    9.0   55.0  version6
  65.7    8.0    6.0  versionpeter
  32.8    9.0    4.2  versionbezemer

lxf
  64.2    8.5   64.2  version0
 112.3   10.2   90.1  version2
  78.8   10.6   75.6  version4
  88.1    9.4   88.2  version5
 112.2    7.5  114.7  version6
  71.0    8.2    7.4  versionpeter
  50.9    8.3    4.3  versionbezemer

There is a significant impact in having loop parameters in registers!
Versions 2 and 6 are interesting for lxf. The full stat gives some more
info (sorry for the long lines).
Here is version 2 compared to version 0:

Peter@R9950WSL:/mnt/d/Dev/forth/lxf32v17$ perf stat ./lxf "create version2 100000000 constant iterations include strcmp.4th bench1 bye"

Performance counter stats for './lxf create version2 100000000 constant iterations include strcmp.4th bench1 bye':

1,955.50 msec task-clock:u # 0.998 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
64 page-faults:u # 32.728 /sec
10,973,742,845 cycles:u # 5.612 GHz
966,332,718 stalled-cycles-frontend:u # 8.81% frontend cycles idle
34,901,611,693 instructions:u # 3.18 insn per cycle
# 0.03 stalled cycles per insn
3,900,350,964 branches:u # 1.995 G/sec
36,727 branch-misses:u # 0.00% of all branches

1.960183288 seconds time elapsed

1.955783000 seconds user
0.000000000 seconds sys

peter@R9950WSL:/mnt/d/Dev/forth/lxf32v17$ perf stat ./lxf "create version0 100000000 constant iterations include strcmp.4th bench1 bye"

Performance counter stats for './lxf create version0 100000000 constant iterations include strcmp.4th bench1 bye':

1,158.97 msec task-clock:u # 0.996 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
64 page-faults:u # 55.221 /sec
6,415,119,211 cycles:u # 5.535 GHz
4,510,117 stalled-cycles-frontend:u # 0.07% frontend cycles idle
38,301,605,801 instructions:u # 5.97 insn per cycle
# 0.00 stalled cycles per insn
3,900,348,894 branches:u # 3.365 G/sec
19,563 branch-misses:u # 0.00% of all branches

1.163667408 seconds time elapsed

1.151174000 seconds user
0.007966000 seconds sys
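
For reference, the per-invocation figures in the tables are the cycles:u count divided by the number of iterations (the measurement script does this division with awk). A minimal sketch of the same arithmetic in Forth, assuming a system with 64-bit cells so that the intermediate product fits in one cell; the word names tenths-per-iter and .per-iter are made up for this illustration and are not part of strcmp.4th:

```forth
\ Assumes 64-bit cells. Both words are hypothetical helpers for illustration.
: tenths-per-iter ( cycles iterations -- n )  \ n = 10*cycles/iterations, rounded
    >r 10 * r@ 2/ + r> / ;

: .per-iter ( cycles iterations -- )          \ print with one decimal place
    tenths-per-iter 0 <# # [char] . hold #s #> type space ;

6415119211  100000000 .per-iter   \ version0 run above: prints 64.2
10973742845 100000000 .per-iter   \ version2 run above: prints 109.7
```

The first value matches the 64.2 in the lxf table; the second is close to the 112.3 listed there, which comes from a separate run.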

BR
Peter

And here's the code (measurement scripts at the bottom):
----------------------------------------------------------
[defined] version0 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
    u1 u2 min 0
    ?do
        addr1 c@ addr2 c@ - dup
        if
            unloop exit
        then
        drop
        addr1 char+ TO addr1
        addr2 char+ TO addr2
    loop
    u1 u2 - ;
[then]

[defined] version1 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
    addr1 addr2
    u1 u2 min 0
    ?do {: s1 s2 :}
        s1 c@ s2 c@ - dup
        if
            unloop exit
        then
        drop s1 char+ s2 char+
    loop
    2drop
    u1 u2 - ;
[then]

[defined] version2 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
    u1 u2 min 0
    ?do
        addr1 i + c@ addr2 i + c@ - dup
        if
            unloop exit
        then
        drop
    loop
    u1 u2 - ;
[then]

[defined] version3 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
    addr2 addr1 - {: offset :}
    u1 u2 min addr1 + addr1 ?do
        i c@ i offset + c@ - dup
        if
            unloop exit
        then
        drop
    loop
    u1 u2 - ;
[then]

[defined] version4 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
    addr2 addr1 - ( offset )
    u1 u2 min addr1 + addr1 ?do ( offset )
        dup i + c@ i c@ - dup
        if
            nip negate unloop exit
        then
        drop
    loop
    drop u1 u2 - ;
[then]

[defined] version5 [if]
: strcmp ( addr1 u1 addr2 u2 -- n )
    rot 2dup - >r    ( addr1 addr2 u1 u2 R: n1 )
    min -rot over -  ( u12 addr1 offset R: n1 )
    swap rot bounds  ( offset limit start R: n1 )
    ?do              ( offset R: n1 loop-sys )
        dup i + c@ i c@ - dup
        if
            nip negate unloop r> drop exit
        then
        drop
    loop
    drop r> negate ;
[then]

[defined] version6 [if]
[undefined] 2rdrop [if]
    : 2rdrop postpone 2r> postpone 2drop ; immediate
[then]

: strcmp ( addr1 u1 addr2 u2 -- n )
    rot 2dup 2>r min 0 ?do ( addr1 addr2 )
        over c@ over c@ - dup if
            \ UNLOOP has to come before 2RDROP (the fix discussed in this thread)
            nip nip unloop 2rdrop exit then
        drop
        char+ swap char+ swap
    loop
    2drop r> r> - ;
[then]

[defined] versionpeter [if]
\ from <[email protected]>
\ renamed and deleted the call to UC
: strcmp ( addr n addr' n' - f) \ 0 is match
    dup >r
    begin
        rot = while \ str cstr
            r> dup 1- >r
        while \ str cstr
            swap count \ cstr str' s1
            rot count \ str' s1 cstr' c1
    repeat
        2drop r> drop 0 exit
    then
    2drop r> drop 1 ;
[then]

[defined] versionbezemer [if]
\ from <nnd$548d4f1b$1e104571@905dda44db1f54ae>
\ renamed
: strcmp
    rot over - if drop 2drop true exit then
    0 ?do
        over i chars + c@ over i chars + c@ -
        if drop drop unloop true exit then
    loop drop drop false
;
[then]

[defined] t{ [if]
t{ s" abc" s" abc" strcmp -> 0 }t
t{ s" abc" s" abcd" strcmp -> -1 }t
t{ s" abc" s" abd" strcmp -> -1 }t
t{ s" abd" s" abc" strcmp -> 1 }t
t{ s" cbc" s" abc" strcmp -> 2 }t
t{ s" abc" s" adc" strcmp -> -2 }t
[then]

\ Benchmarks

[undefined] iterations [if]
100000000 constant iterations
[then]

: benchmark ( c-addr1 u1 c-addr2 u2 -- )
    iterations 0 do
        2over 2over strcmp drop
    loop
    2drop 2drop ;

: bench1
    s" 0123456789abcdefg" 2dup benchmark ;

: bench2
    s" 0123456789abcdefg" s" 2123456789abcdefg" benchmark ;

: bench3
    s" 0123456789abcdefg" s" 0123456789abcdefgh" benchmark ;

0 [if]
# bash script for producing the cycles
IFS=":"
for i in 0 1 2 3 4 5 6 peter bezemer; do
    for forthit in gforth-fast:100000000 gforth-itc:10000000; do
        fields=($forthit); forth="${fields[0]}"; iterations="${fields[1]}"
        for bench in 1 2 3; do
            perf stat --log-fd 3 -x, -e cycles:u $forth -e "create version$i $iterations constant iterations" ~/forth/strcmp.4th -e "bench$bench bye" 3>&1 >/dev/null|
            awk -F, '{printf "%6.1f ",$1/'$iterations'}'
        done
    done
    echo version$i
done
IFS=":"
for i in 0 2 3 4 5 6 peter bezemer; do
    forth=lxf; iterations=100000000
    for bench in 1 2 3; do
        perf stat --log-fd 3 -x, -e cycles:u $forth "create version$i $iterations constant iterations include $HOME/forth/strcmp.4th bench$bench bye" 3>&1 >/dev/null|
        awk -F, '{printf "%6.1f ",$1/'$iterations'}'
    done
    echo version$i
done
[then]
--------------------------------------------------------------

- anton

--- Synchronet 3.21f-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Mon Apr 27 07:53:58 2026

From Newsgroup: comp.lang.forth

peter <[email protected]> writes:

On Sun, 26 Apr 2026 14:03:03 GMT
[email protected] (Anton Ertl) wrote:

I benchmarked two Forth systems, gforth-fast and gforth-itc.
gforth-itc uses indirect-threaded code and should perform similarly to
the "simple interpreters" that Paul Rubin had in mind.

I ran three different benchmarks on these words, which performed the
following a number of times:

s" 0123456789abcdefg" 2dup strcmp drop \ bench1
s" 0123456789abcdefg" s" 2123456789abcdefg" strcmp drop \ bench2
s" 0123456789abcdefg" s" 0123456789abcdefgh" strcmp drop \ bench3

In bench1 the strings are equal and everything has to be compared. In
bench2 the strings have the same length, but differ in the first char,
so the loop can terminate after the first char. In bench3 the strings
have different lengths, but all chars that both strings have are the
same. In the latter case versionpeter and versionbezemer have an
advantage because they only test for equality and can exit as soon as
the lengths differ, so they do not perform the same functionality.

The cycle counts are per invocation of STRCMP, including benchmark overhead.

The benchmarks are run on a Ryzen 8700G (Zen4).

In addition to the cycles, I also show the bytes of the native code of
the whole word in gforth-fast on AMD64 (without the final jmp (2
Bytes)), and of the loop (including the code for the if...then).

   Bytes    |  cycles gforth-fast  |  cycles gforth-itc   |
strcmp loop | bench1 bench2 bench3 | bench1 bench2 bench3 |
   262  127 |  109.5   16.6  109.4 | 1732.7  147.4 1724.5 | version0
   303  151 |  164.2   17.2  164.4 | 1714.1  170.4 1613.5 | version1
   257  122 |  105.3   17.4  105.1 | 1496.7  166.4 1493.0 | version2
   280  113 |   98.6   19.2   99.0 | 1230.1  194.4 1116.2 | version3
   267  118 |   91.2   17.9   91.2 | 1268.6  198.4 1269.0 | version4
   273  108 |   89.9   17.0   90.0 | 1136.0  178.4 1138.9 | version5
   261  128 |  121.1   14.6  118.5 | 1221.4  131.3 1213.3 | version6
   210  142 |  137.5   15.4    9.5 | 1244.4  155.3   78.3 | versionpeter
   260  119 |  107.8   16.4    9.8 | 1186.2  134.5   71.3 | versionbezemer

...

lxf has a more efficient locals implementation. Let's see how it
fares. It does not support the usage in version1, so I leave that
one out.

     cycles lxf
bench1 bench2 bench3
  79.9   12.0   79.9 version0
  99.6   12.0   99.6 version2
  98.8   14.1   98.1 version3
  86.0   13.2   86.0 version4
  84.1   12.6   84.2 version5
  88.7   10.0   92.8 version6
  98.3   10.0    6.0 versionpeter
  72.1    9.5    6.0 versionbezemer

And, to top it off, sf64 and vfx64, after correcting the bug in
version6 that you pointed out:

 cycles sf-4.0.0-RC89 |  cycles vfx64 5.43   |
 bench1 bench2 bench3 | bench1 bench2 bench3 |
  195.1   62.0  194.5 |  124.2   42.2  123.3 | version0
  136.3   63.0  136.2 |  200.4  124.1  204.4 | version2
  143.7   69.6  143.4 |   90.7   36.7   91.3 | version4
  115.1   36.0  114.1 |  102.0   30.2  101.8 | version5
  132.8   38.0  133.3 |   85.8   28.2   88.2 | version6
  182.0   19.0    9.0 |   95.7   10.2    6.2 | versionpeter
  224.9   40.2    8.0 |   63.2   29.2    6.2 | versionbezemer

Interesting performance variations.

Anton, thanks for running all these tests.
I have now also run them on my Ryzen 9950X.
There is an error in version 6 that I corrected:
2rdrop needs to come after unloop. On lxf64, which uses registers for
loop parameters, this is necessary!

Thanks. In sf64 and vfx64 this change is necessary, too.

I also needed to change the log-fd to 5 to get it to run.
The tests are run with Debian under WSL2.

WSL2 supports performance counters. Great!

What happens with log-fd=3?

Here are the results

lxf64
59.1 10.0 57.6 version0
48.1 10.0 48.4 version2
43.0 10.7 42.5 version4
42.2 9.1 42.2 version5
55.1 9.0 55.0 version6
65.7 8.0 6.0 versionpeter
32.8 9.0 4.2 versionbezemer

lxf
64.2 8.5 64.2 version0
112.3 10.2 90.1 version2
78.8 10.6 75.6 version4
88.1 9.4 88.2 version5
112.2 7.5 114.7 version6
71.0 8.2 7.4 versionpeter
50.9 8.3 4.3 versionbezemer

There is a significant impact in having loop parameters in registers!
version 2 and 6 are interesting for lxf. The full stat gives some more
info.

Not any info that I find helpful. But my guess is as follows: Keeping
the loop index in memory has reliably meant that counted loops take at
least 5 cycles per iteration. In recent processors (from this decade
or a little earlier), hardware can perform zero-cycle store-to-load
forwarding, but it is not reliable. So my guess is that in version2
and version6 we are seeing cases where this hardware optimization has
not worked. So, yes, keeping loop parameters that change in registers
is a good idea even on recent CPUs.

The differences between Zen4 and Zen5 on lxf are significant, but I
guess that if you take the average, you get the picture of small
progress that I see on various websites.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/
--- Synchronet 3.21f-Linux NewsLink 1.2

From peter@[email protected] to comp.lang.forth on Mon Apr 27 11:52:41 2026

From Newsgroup: comp.lang.forth

On Mon, 27 Apr 2026 07:53:58 GMT
[email protected] (Anton Ertl) wrote:

[...]

There is a significant impact in having loop parameters in registers!
version 2 and 6 are interesting for lxf. The full stat gives some more
info.

Not any info that I find helpful. But my guess is as follows: Keeping
the loop index in memory has reliably meant that counted loops take at
least 5 cycles per iteration. In recent processors (from this decade
or a little earlier), hardware can perform zero-cycle store-to-load
forwarding, but it is not reliable. So my guess is that in version2
and version6 we are seeing cases where this hardware optimization has
not worked. So, yes, keeping loop parameters that change in registers
is a good idea even on recent CPUs.

The differences between Zen4 and Zen5 on lxf are significant, but I
guess that if you take the average, you get the picture of small
progress that I see on various websites.

- anton

I think that code placement in memory plays a role. Look at these two runs of version 6:

Performance counter stats for './lxf create version6 100000000 constant iterations include strcmp.4th bench1 bye':

2,008.59 msec task-clock:u # 0.998 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
64 page-faults:u # 31.863 /sec
11,290,674,838 cycles:u # 5.621 GHz
1,554,765,442 stalled-cycles-frontend:u # 13.77% frontend cycles idle
34,301,643,546 instructions:u # 3.04 insn per cycle
# 0.05 stalled cycles per insn
3,900,356,221 branches:u # 1.942 G/sec
32,958 branch-misses:u # 0.00% of all branches

2.013169221 seconds time elapsed

2.004809000 seconds user
0.003993000 seconds sys

Compare with this one, where I only allot 16 bytes in the code segment before loading:

Performance counter stats for './lxf create version6 100000000 constant iterations 16 allot-c include strcmp.4th bench1 bye':

1,202.67 msec task-clock:u # 0.996 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
64 page-faults:u # 53.215 /sec
6,630,029,444 cycles:u # 5.513 GHz
439,780,595 stalled-cycles-frontend:u # 6.63% frontend cycles idle
34,301,649,298 instructions:u # 5.17 insn per cycle
# 0.01 stalled cycles per insn
3,900,358,028 branches:u # 3.243 G/sec
146,947 branch-misses:u # 0.00% of all branches

1.207212030 seconds time elapsed

1.202879000 seconds user
0.000000000 seconds sys

I have observed the same behavior on other benchmarks.
BR
Peter

--- Synchronet 3.21f-Linux NewsLink 1.2

From Hans Bezemer@[email protected] to comp.lang.forth on Tue Apr 28 08:21:37 2026

From Newsgroup: comp.lang.forth

On 26-04-2026 19:04, Anton Ertl wrote:

Hans Bezemer <[email protected]> writes:

On 26-04-2026 11:50, Anton Ertl wrote:

Bernd Paysan wrote a simple locals implementation
<https://cgit.git.savannah.gnu.org/cgit/gforth.git/tree/locals.fs>
that takes 84 SLOC:

With all respect to Bernd, but yeah - compare that to this 0.5 SLOC
implementation of local:

: local r> swap dup >r @ >r ;: r> r> ! ;

Let's see:

[~:167902] gforth-0.5.0
GForth 0.5.0, Copyright (C) 1995-2000 Free Software Foundation, Inc.
GForth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
warnings off include locals.fs ok
ok
: local r> swap dup >r @ >r ;: r> r> ! ;
*the terminal*:1: Undefined word
: local r> swap dup >r @ >r ;: r> r> ! ;
^^
Backtrace:
$F7B5A158 throw
$F7B6418C no.extensions

Although, admittedly, while Bernd Paysan's locals.fs loads, it does
not work AFAICT (I tried it on gforth-0.4 and gforth-0.5; it does not
load on gforth-0.6 and later). Apparently it had bitrotted between
the time when it was written in 1992 and gforth-0.4 in 1998.

- anton

Oh dear, huge Gforth doesn't feature a ;: word? Let me help you. From
the humble 4tH repository:

: ;: >r ; : local r> swap dup >r @ >r ;: r> r> ! ;

Well, that boosts it to almost 0.625 SLOC. It's almost bloatware!

Now. Let's see how it performs:

Gforth 0.7.9_20250321
Authors: Anton Ertl, Bernd Paysan, Jens Wilke et al., for more type
`authors'
Copyright © 2025 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<https://gnu.org/licenses/gpl.html>
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `help' for basic help
: ;: >r ; : local r> swap dup >r @ >r ;: r> r> ! ; ok
variable x ok
: test x local x ! x ? cr ; ok
ok
25 x ! 12 test x ? cr 12
25
ok

Well - it actually works! It's amazing! What a solid piece of software engineering!
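
To make it easier to follow what the one-liner does, here is the same pair of definitions spread out with stack comments. This is only a sketch of how it appears to work: it relies on return addresses being single cells that R> and >R can move around, which the standard does not guarantee, so it is an assumption that holds on the Gforth session shown above and may not hold on systems that inline or keep return addresses elsewhere. The word TEST below returns the value instead of printing it, a small change made for illustration.

```forth
\ Same definitions as in the post, reformatted and commented.
\ Assumption: return addresses are ordinary cells on the return stack.
: ;: ( ret -- )
    >r ;          \ the EXIT of ;: now jumps to ret; the caller of ;: stays on the return stack

: local ( a-addr -- )
    r>            \ ( a-addr ret )  ret = rest of the word that called LOCAL
    swap dup >r   \ ( ret a-addr )  R: a-addr
    @ >r          \ ( ret )         R: a-addr old-value
    ;:            \ run the rest of the caller; its EXIT comes back here
    r> r> ! ;     \ ( old-value a-addr ) put the old value back

variable x
: test ( n -- n' )  x local  x !  x @ ;

25 x !  12 test .  x @ .   \ prints 12 25
```

The caller never returns to its own caller directly: its EXIT lands in the tail of LOCAL, which restores the variable and only then returns.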

Hans Bezemer

--- Synchronet 3.21f-Linux NewsLink 1.2

From Hans Bezemer@[email protected] to comp.lang.forth on Tue Apr 28 14:34:39 2026

From Newsgroup: comp.lang.forth

On 27-04-2026 03:12, dxf wrote:

On 26/04/2026 7:50 pm, Anton Ertl wrote:

Paul Rubin <[email protected]d> writes:
...

I've
imagined some alternate versions of COLON, e.g.
: foo ( ... ) ; \ regular colon, no locals
1: foo ( ... ) ; \ one local called A
2: foo (... ) ; \ two locals, A and B
...
4: foo (... ) ; \ four locals: A, B, C, D.

As a matter of fact, this thingy creates locals:

: ;: >r ; : local r> swap dup >r @ >r ;: r> r> ! ;

If you cannot choose the names, locals lose a lot of their benefits in
making the code more understandable (OTOH, mathematicians have made do
with similar naming schemes for a long time). You might then just as
well work with >R >R >R >R and R@, R'@, 2 RPICK and 3 RPICK.

That Julian Noble (among others) felt the need for FTRAN INTRAN etc informs
what scientists and academics really want - and it's a long way from the
'stack based' locals offered by most forth systems. The latter represent
a concession to forth before a user has even begun to consider identifiers.

To an outsider, forth locals do nothing to ameliorate what they see as
fundamentally broken about the language. ISTM if a forther has conceded to
use stack-based locals, he can certainly make choices about what form
identifiers take.

You can do that with the 4tH preprocessor. This uses the above code:

: ;: >r ; : local r> swap dup >r @ >r ;: r> r> ! ;

variable x \ var x
variable y \ var y

: multiply ( n1 n2 -- n1*n2)
x local \ turn into local
y local \ turn into local

let y=; let x=; \ take values from the stack

let x = (x * y); \ multiply them
let x,|. cr|; \ get x, perform ". cr"
;

23 x ! \ prove it is a local
7 6 multiply \ multiply 7 by 6
x ? cr \ now let's check that

And this is the output:

$ pp4th -x testme.4pp
42
23

Note that the output of the preprocessor works fine on a vanilla Forth:

Gforth 0.7.9_20250321
Authors: Anton Ertl, Bernd Paysan, Jens Wilke et al., for more type
`authors'
Copyright © 2025 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<https://gnu.org/licenses/gpl.html>
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `help' for basic help
: ;: >r ; : local r> swap dup >r @ >r ;: r> r> ! ; ok
ok
variable x ok
variable y ok
ok
: multiply compiling
x local compiling
y local compiling
compiling
y ! x ! compiling
compiling
x @ y @ * x ! compiling
x @ . cr compiling
; ok
ok
23 x ! ok
7 6 multiply 42
ok
x ? cr 23
ok

Hans Bezemer

--- Synchronet 3.21f-Linux NewsLink 1.2

From Gerry Jackson@[email protected] to comp.lang.forth on Wed Apr 29 12:44:30 2026

From Newsgroup: comp.lang.forth

On 28/04/2026 13:34, Hans Bezemer wrote:

As a matter of fact, this thingy creates locals:

: ;: >r ; : local r> swap dup >r @ >r ;: r> r> ! ;

LOCAL can also be defined as:
: local r> over @ rot 2>r ;: 2r> ! ;
which I guess you won't like, but is a bit shorter. It also survives
your pre-processor conversion of 2>r to >r >r, similarly 2r>
--
Gerry
--- Synchronet 3.21f-Linux NewsLink 1.2

From Hans Bezemer@[email protected] to comp.lang.forth on Wed Apr 29 14:37:28 2026

From Newsgroup: comp.lang.forth

On 29-04-2026 13:44, Gerry Jackson wrote:

On 28/04/2026 13:34, Hans Bezemer wrote:

As a matter of fact, this thingy creates locals:

: ;: >r ; : local r> swap dup >r @ >r ;: r> r> ! ;

LOCAL can also be defined as:
: local r> over @ rot 2>r ;: 2r> ! ;
which I guess you won't like, but is a bit shorter. It also survives
your pre-processor conversion of 2>r to >r >r, similarly 2r>

I don't say you're wrong, but there is some logic to this madness:

1. In 4tH, "2>R" is the same as ">R >R". The compiler expands it like
that. So -- there is no advantage to do "2>R". Yes, you can do "2R@",
but not "R@". It won't be portable;

2. I don't consider "2>R" an optimization. To me it is an operator on
a different type. There *HAS* to be a connection between the two values.
Like addr/count, double number, array/size, etc. To me it means I'm
dealing with a "two cells type". My future me will thank me.

And that's why I don't agree with you ;-)
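
For comparison, standard Forth specifies 2>R as equivalent to SWAP >R >R rather than >R >R, so the pair keeps its order on the return stack and R@ afterwards returns the second cell of the pair. A small sketch that shows the difference on a standard system; the word names are made up for illustration, and 4tH, as described in point 1 above, expands 2>R differently:

```forth
\ Hypothetical demo words; standard 2>R, 2R> and R@ semantics are assumed.
: pair-via-2>r  ( x1 x2 -- x2 x1 )  2>r r> r> ;       \ 2>R leaves x2 on top of the return stack
: pair-via->r   ( x1 x2 -- x1 x2 )  >r >r r> r> ;     \ >R >R leaves x1 on top instead
: top-after-2>r ( x1 x2 -- x2 )     2>r r@ 2r> 2drop ;

1 2 pair-via-2>r . .     \ prints 1 2  (stack was 2 1, top printed first)
1 2 pair-via->r . .      \ prints 2 1  (stack was 1 2)
1 2 top-after-2>r .      \ prints 2
```

As long as 2>R is always paired with 2R> (and >R >R with R> R>), both conventions round-trip; they only become visible when the two styles are mixed.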

Hans Bezemer

--- Synchronet 3.21f-Linux NewsLink 1.2

From Hans Bezemer@[email protected] to comp.lang.forth on Wed Apr 29 14:44:02 2026

From Newsgroup: comp.lang.forth

On 29-04-2026 14:37, Hans Bezemer wrote:

On 29-04-2026 13:44, Gerry Jackson wrote:

On 28/04/2026 13:34, Hans Bezemer wrote:

As a matter of fact, this thingy creates locals:

: ;: >r ; : local r> swap dup >r @ >r ;: r> r> ! ;

LOCAL can also be defined as:
: local r> over @ rot 2>r ;: 2r> ! ;
which I guess you won't like, but is a bit shorter. It also survives
your pre-processor conversion of 2>r to >r >r, similarly 2r>

I don't say you're wrong, but there is some logic to this madness:

1. In 4tH, "2>R" is the same as ">R >R". The compiler expands it like
that. So -- there is no advantage to do "2>R". Yes, you can do "2R@",
but not "R@". It won't be portable;

QED:

Addr| Opcode Operand Argument

0| branch 2 ;:
1| >r 0
2| exit 0
3| branch 14 local
4| r> 0
5| over 0
6| @ 0
7| rot 0
8| >r 0
9| >r 0
10| call 0 ;:
11| r> 0
12| r> 0
13| ! 0
14| exit 0

No trickery. That's the way it is. :-)

Hans Bezemer

--- Synchronet 3.21f-Linux NewsLink 1.2

From Paul Rubin@[email protected] to comp.lang.forth on Fri May 1 23:50:04 2026

From Newsgroup: comp.lang.forth

[email protected] (Anton Ertl) writes:

If you cannot choose the names... You might then just as well work with
>R >R >R >R and R@, R'@, 2 RPICK and 3 RPICK.

That actually might be a workable idea. Thanks.

In the code you see the threaded code interspersed with the native
code. If you ignore the native code, you see what a simple
interpreter would see (if it had a locals implementation that produced
code similar to that of Gforth).

I wonder if gforth would get less code bloat if you added some
primitives for pushing more than one local. E.g. 2>L, 3>L, etc. would
push that many stack elements to LOCAL0, LOCAL1, LOCAL2. Then there
wouldn't be that big chunk of replicated code.

So it's "code cleanup", not making use of hardware facilities for
efficiency on simple interpreters, that you see as the benefit of
locals.

Well, I had hoped to get both, but yeah, ultimately cleaner and more
reliable code takes precedence in most situations, by the 90/10 rule.

Even with multi-representation stack-caching as used since Gforth 0.7
(which does require more compiler smarts), no statically determined
stack effect is necessary, because the code generator returns to the
canonical state on control-flow.

I see, yeah, but that means stack juggling to get to the canonical
state.

... we have user variables like BASE and HLD (in F83, HOLDPTR in
gforth). They are used across multiple words, and the fact that you
don't have to pass them and put them into a local has been touted as
an advantage over locals: Definitions that use global variables are
easier to factor.

Urgggh...
--- Synchronet 3.22a-Linux NewsLink 1.2

From Paul Rubin@[email protected] to comp.lang.forth on Fri May 1 23:54:11 2026

From Newsgroup: comp.lang.forth

[email protected] (Anton Ertl) writes:

You might then just as well work with >R >R >R >R and R@, R'@, 2 RPICK
and 3 RPICK.

But, now you have to avoid mixing that style with using the R stack for temporaries, including stuff like loop indexes which sometimes go
there. And you have to clean up the R stack before returning, and maybe arrange for that to happen in case of an exception.

Flashforth has a separate P stack which can be used for temporaries
within a word, but I don't remember how cleanup is handled, if at all.
--- Synchronet 3.22a-Linux NewsLink 1.2

From dxf@[email protected] to comp.lang.forth on Sat May 2 17:36:25 2026

From Newsgroup: comp.lang.forth

On 2/05/2026 4:54 pm, Paul Rubin wrote:

...
Flashforth has a separate P stack which can be used for temporaries
within a word, but I don't remember how cleanup is handled, if at all.

It's a cpu register - not a stack. For re-entrancy the old value must first
be pushed onto the cpu stack before loading the new one. IIRC FF has a word
that combines those. Basically a variable.

--- Synchronet 3.22a-Linux NewsLink 1.2

From Paul Rubin@[email protected] to comp.lang.forth on Sat May 2 01:11:53 2026

From Newsgroup: comp.lang.forth

dxf <[email protected]> writes:

It's a cpu register - not a stack. For re-entrancy the old value must first
be pushed onto the cpu stack before loading the new one. IIRC FF has a word
that combines those. Basically a variable.

Aha, thanks, I mis-remembered how it worked.

https://pajacobs-ghub.github.io/flashforth/ff5-quick-ref.html#_the_p_register

--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sat May 2 10:34:29 2026

From Newsgroup: comp.lang.forth

Paul Rubin <[email protected]d> writes:

[email protected] (Anton Ertl) writes:

[...]

I wonder if gforth would get less code bloat if you added some
primitives for pushing more than one local. E.g. 2>L, 3>L, etc. would
push that many stack elements to LOCAL0, LOCAL1, LOCAL2. Then there
wouldn't be that big chunk of replicated code.

Gforth has superinstructions for the sequences

sequence count AMD64 code gforth-fast

>l >l 62 len= 4+ 26+ 3
>l >l >l 9 len= 4+ 34+ 3
>l >l >l >l 5 len= 4+ 42+ 3

l f>l 2 len= 4+ 42+ 3

l @local0 20 len= 4+ 11+ 3

l lit f@localn 1 len= 4+ 24+ 3

compared to

primitive count AMD64 code gforth-fast

l 67 len= 4+ 18+ 3

l 10 len= 4+ 23+ 3

The counts are static occurrences of the code for this primitive, i.e.,
if for a sequence >l >l the superinstruction is selected, the
superinstruction's count is increased, while the count of >l stays the
same. All of the data is for gforth-fast with disabled static stack
caching. With static stack caching enabled, there are more variants
of >l that the counts distribute over.

As far as native code is concerned, these superinstructions already
give the benefit of the additional primitives you suggest. On the threaded-code side there is a threaded-code slot for each >L. That
simplifies the implementation of superinstructions: no need to
rearrange the threaded code when the decision about the use of superinstructions is taken.

We could add some special mechanism to the locals implementation that
produces 2>L etc. instead of just producing a sequence of >Ls, and
letting the ordinary superinstruction mechanism in Gforth combine
them. But would such a mechanism cost less in code size than the
62+9*2+5*3=95 cells that it saves? Not to mention the development and maintenance effort.
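
The 95 cells can be checked with a few lines of Forth. This is just the arithmetic from the paragraph above, under the reading that the three superinstruction rows stand for sequences of two, three and four >L, each of which would shrink to a single threaded-code slot; the word name cells-saved is made up for illustration:

```forth
\ Hypothetical helper: n consecutive >L collapse into one slot, saving n-1 cells.
: cells-saved ( occurrences n -- cells )  1- * ;

62 2 cells-saved   \ >l >l
 9 3 cells-saved   \ >l >l >l
 5 4 cells-saved   \ >l >l >l >l
+ + .              \ prints 95
```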

While we are at it, here are the other locals-related primitives:

primitive count AMD64 code gforth-fast
@localn 1 len= 4+ 5+ 3
@local0 115 len= 4+ 11+ 3
@local1 87 len= 4+ 11+ 3
@local2 62 len= 4+ 11+ 3
@local3 31 len= 4+ 11+ 3
@local4 18 len= 4+ 11+ 3
@local5 16 len= 4+ 11+ 3
@local6 10 len= 4+ 11+ 3
@local7 4 len= 4+ 11+ 3
!localn 0 len= 4+ 16+ 3
!local0 4 len= 4+ 11+ 3
!local1 1 len= 4+ 11+ 3
!local2 4 len= 4+ 11+ 3
!local3 2 len= 4+ 11+ 3
!local4 0 len= 4+ 11+ 3
!local5 0 len= 4+ 11+ 3
!local6 0 len= 4+ 11+ 3
!local7 0 len= 4+ 11+ 3
+!localn 3 len= 4+ 16+ 3
lp+n 82 len= 4+ 3+ 3
f@localn 11 len= 4+ 24+ 3
lp@ 14 len= 4+ 10+ 3
lp+! 67 len= 4+ 10+ 3
lp- 3 len= 4+ 4+ 3
lp+ 36 len= 4+ 4+ 3
lp+2 33 len= 4+ 4+ 3
lp! 12 len= 4+ 10+ 3

Even with multi-representation stack-caching as used since Gforth 0.7
(which does require more compiler smarts), no statically determined
stack effect is necessary, because the code generator returns to the
canonical state on control-flow.

I see, yeah, but that means stack juggling to get to the canonical
state.

In clf, "stack juggling" usually means using words like ROT (see the
cartoon about ROT in Starting Forth). That's not what happens here.

What happens is that there is code that performs a transition between
stack representations. Between any two primitives, as well as at the
start and end of a sequence, the code generator can insert such code.
It uses a shortest-path algorithm to find the shortest native-code
sequence for the threaded-code sequence. The result is never longer
than the native-code sequence that you get when you always use the implementation of the primitive that starts in the canonical
representation and ends in the canonical representation, and it often
is shorter.

Here is the usage of the transitions between the stack representations
on AMD64 for the gforth.fi image:

trans count AMD64 code gforth-fast
1-0 2932 len= 0+ 7+ 3
2-0 944 len= 0+ 12+ 3
3-0 135 len= 0+ 16+ 3
0-1 152 len= 0+ 8+ 3
2-1 87 len= 0+ 10+ 3
3-1 35 len= 0+ 14+ 3
0-2 39 len= 0+ 12+ 3
1-2 151 len= 0+ 10+ 3
3-2 0 len= 0+ 13+ 3
0-3 15 len= 0+ 15+ 3
1-3 48 len= 0+ 15+ 3
2-3 5 len= 0+ 13+ 3

The transition is shown as M-N, where M is the number of stack items
in registers before the transition and N is the number of stack items
in registers after the transition. The high number of transitions
with N=0 is interesting, given that the canonical representation is 1.

My impression is, that for a primitive that pushes a value, such as
lit, r@ or @local0, the code generator selects the transition to 0
followed by the 0-1 variant of the primitive over the 1-1 variant of
the primitive when the code size is the same.

... we have user variables like BASE and HLD (in F83, HOLDPTR in
gforth). They are used across multiple words, and the fact that you
don't have to pass them and put them into a local has been touted as
an advantage over locals: Definitions that use global variables are
easier to factor.

Urgggh...

When that is combined with proper wrapping, it can be a useful
mechanism. E.g., environment variables in shell scripts work that
way, or the graphics state in Postscript. But it's something that
needs a lot of restraint to avoid creating a mess. Hanson and
Proebsting presented a case (and efficient implementation) for this
kind of mechanism:

@InProceedings{hanson&proebsting01,
author = {David R. Hanson and Todd A. Proebsting},
title = {Dynamic Variables},
crossref = {sigplan01},
pages = {264--273},
annote = {}
}

@Proceedings{sigplan01,
booktitle = "SIGPLAN '01 Conference on Programming Language
Design and Implementation",
title = "SIGPLAN '01 Conference on Programming Language
Design and Implementation",
year = "2001",
key = "PLDI '01"
}
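
As a small illustration of the "proper wrapping" mentioned above, here is one common way to set a global such as BASE for the duration of a call and restore it afterwards, even if the called word throws. It is a sketch in standard Forth (CATCH and THROW from the exception word set), not the mechanism from the paper; the names with-base and u>string are made up for this example:

```forth
\ Hypothetical wrapper: run xt with BASE temporarily set to u.
: with-base ( i*x xt u -- j*x )
    base @ >r  base !    \ save the old BASE, install the new one
    catch                \ ( j*x 0 | i*x ior )
    r> base !            \ restore BASE on both the normal and the error path
    throw ;              \ re-raise the exception, if there was one

: u>string ( u -- c-addr len )  0 <# #s #> ;   \ formats u in the current BASE

decimal
255 ' u>string 16 with-base type   \ prints FF
base @ .                           \ prints 10
```

The caller does not have to pass the base through every word in between, which is the factoring advantage described above, while the wrapper keeps the change from leaking out.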

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sat May 2 15:58:27 2026

From Newsgroup: comp.lang.forth

Paul Rubin <[email protected]d> writes:

[email protected] (Anton Ertl) writes:

You might then just as well work with >R >R >R >R and R@, R'@, 2 RPICK
and 3 RPICK.

But, now you have to avoid mixing that style with using the R stack for
temporaries, including stuff like loop indexes which sometimes go there.

Standard locals have some of these restrictions, too, but not all of
them. Concerning counted loops, you may want to take their return
stack usage into account when RPICKing.

And you have to clean up the R stack before returning,

Yes. What's easier to implement is often harder to use.

and maybe
arrange for that to happen in case of an exception.

THROW resets the return stack to the CATCH depth, so no extra work is
necessary.
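
A small sketch of that behaviour, using only the standard exception word set; the names `leaky` and `guarded` are made up for illustration.

```forth
: leaky ( -- )
   1 >r 2 >r          \ two temporaries on the return stack
   -1 throw ;         \ thrown before any r> could clean them up

: guarded ( -- ior )
   ['] leaky catch ;  \ THROW unwinds the return stack to CATCH's frame

guarded .             \ prints -1; GUARDED itself returns normally
```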

- anton

From Ruvim@[email protected] to comp.lang.forth on Thu Jun 4 22:26:49 2026

From Newsgroup: comp.lang.forth

On 2026-04-08 10:34, [email protected] wrote:

In article <10r3nfo$33464$[email protected]>,
Gerry Jackson <[email protected]> wrote:

On 07/04/2026 12:35, [email protected] wrote:

A similar situation applies to "TO must scan". It turns out there
is no standard program that can detect this. It steers implementation
towards a scanning TO.

On the contrary, Ruvim posted some code that is a standard program and
which distinguishes between a parsing TO and one that sets a flag for a
following VALUE to act on.

I can't find the post but the gist of it was (I think):

1 value v1
2 value v2 immediate
: test to v2 v1 ;

Yes, something similar.

Running test with
3 test
A parsing TO will set v2 to 3

Yes, and this is specified by the standard.

A flagging TO will execute v2 during compilation of test because it is
immediate. So test will set v1 to 3 leaving v2 unchanged.

No, it doesn't. It leaves garbage on the stack during compilation,
which usually leads to an error.
I leave it up to the reader whether this counts as a standard program.

The provided program conforms to the Forth-94 standard and later versions.

A system that fails that test does not conform to the standard with
respect to `to`.
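
The test under discussion can be written out with the expected results as comments. This is only a sketch built from the three lines quoted above; the expected values are what follows from a parsing `to` as the standard specifies it, while a system with a flagging `to` may instead fail while compiling `test`.

```forth
1 value v1
2 value v2 immediate           \ IMMEDIATE applies to the most recent definition
: test ( x -- y ) to v2 v1 ;   \ a parsing TO consumes the name v2 at compile time

3 test .    \ conforming system: stores 3 in v2, then pushes v1, so prints 1
v2 .        \ conforming system: prints 3
```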

Historically, there were two approaches to implement `to`: "parsing" and "non-parsing" [1]. Forth-94 formally specified the "parsing" approach:

| ANS Forth explicitly requires that TO must parse,
| so that TO's effect will be predictable when
| it is used at the end of the parse area.

OTOH, it disallowed applying the words `postpone` and `[compile]` to
`to` [2]. Perhaps, this was done as a concession to implementations that adhered to the "non-parsing" approach, to prevent behavior variations in
a standard program caused by deviations in implementations of `to`.

[1] <https://forthhub.github.io/forth-sf-net/standard/dpans/dpansa6.htm#A.6.2.2295>
[2] <https://forthhub.github.io/forth-sf-net/standard/dpans/dpans6.htm#6.2.2295>

Here is an example of a program that relies on a parsing `to`, but is
not compliant due to that very ambiguous condition regarding `postpone`.
Let's introduce a multiple assignment construct of the following form:
`1 2 3 to( a b c )`.

: ?comp ( -- ) state @ if exit then -14 throw ;

: equals ( sd2 sd1 -- flag )
    dup 3 pick <> if 2drop 2drop false exit then
    compare 0=
;
: source-offset ( -- u )
    >in @
;
: set-source-offset ( u -- )
    source nip over u< invert if >in ! exit then
    -18 throw \ "parsed string overflow"
;

synonym take-lexeme-maybe parse-name

: take-lexeme ( "ccc" -- sd )
    take-lexeme-maybe dup if exit then -16 throw
;

: to( ( "ccc<rparen>" -- ) \ " a b c )"
    ?comp source-offset ( u.offset )
    take-lexeme s" )" equals if drop exit then
    ( u.offset ) recurse ( u.offset )
    source-offset swap set-source-offset
      postpone to
    set-source-offset
; immediate

\ usage example

0 value a
0 value b

: init-foo ( -- ) 2 3 to( a b ) ;

init-foo a . b . \ it should print "2 3"

I overlooked this clever example.
So I guess my VALUE is non compliant, proven by a contrived test.

It still makes no sense to forbid a flagging implementation.
(Also VALUE's don't make sense, anyway.)

I recently used an immediate value for conditional compilation in code similar to the following.

0 value [building-target] immediate ( -- flag )

: start-building-target
...
true to [building-target]
;

: some-word
...
[building-target] [if]
...
[then]
...
;

Gerry

Groetjes Albert

--
Ruvim


From Ruvim@[email protected] to comp.lang.forth on Thu Jun 4 22:30:37 2026

From Newsgroup: comp.lang.forth

On 2026-04-09 10:12, Stephen Pelc wrote:

On 7 Apr 2026 at 21:55:37 CEST, "Gerry Jackson" <[email protected]> wrote:

On 07/04/2026 12:35, [email protected] wrote:

A similar situation applies to "TO must scan". It turns out there
is no standard program that can detect this. It steers implementation
towards a scanning TO.

On the contrary, Ruvim posted some code that is a standard program and
which distinguishes between a parsing TO and one that sets a flag for a
following VALUE to act on.

VFX sets a flag for TO and friends and has done so for 30+ years. We have no intention of changing despite the cleverness of Ruvim's detection scheme. We take the "as if" position because
a) it simplifies implementation.
b) no user has complained.

As far as I can see, this approach does not simplify implementation to
any significant extent; instead, it limits the use cases. In VFX, it
also breaks `find` for `to` and other similar words. As a user, I would complain.

To ensure that all operators in VfxForth parse the parse area for their immediate argument, it suffices to modify the word `operator` as follows:

: take-name>xt ( "ccc" -- xt )
bl word find ?undef
;
: translate-xt ( any xt -- any )
state @ if compile, else execute then
;
: operator \ n -- ; define an operator
create
here swap , OperatorChain @ , OperatorChain !
immediate
does> @ OperatorType !
take-name>xt translate-xt
;

This also makes `find` work correctly for `to` and other similar words.

Stephen

--
Ruvim


From Ruvim@[email protected] to comp.lang.forth on Fri Jun 5 16:24:59 2026

From Newsgroup: comp.lang.forth

On 2026-06-04 22:30, Ruvim wrote:

On 2026-04-09 10:12, Stephen Pelc wrote:

On 7 Apr 2026 at 21:55:37 CEST, "Gerry Jackson" <[email protected]>
wrote:

On 07/04/2026 12:35, [email protected] wrote:

A similar situation applies to "TO must scan". It turns out there
is no standard program that can detect this. It steers implementation
towards a scanning TO.

On the contrary, Ruvim posted some code that is a standard program and
which distinguishes between a parsing TO and one that sets a flag for a
following VALUE to act on.

VFX sets a flag for TO and friends and has done so for 30+ years. We have no intention of changing despite the cleverness of Ruvim's detection scheme. We take the "as if" position because
a) it simplifies implementation.
b) no user has complained.

As far as I can see, this approach does not simplify implementation to
any significant extent; instead, it limits the use cases. In VFX, it
also breaks `find` for `to` and other similar words. As a user, I would complain.

To ensure that all operators in VfxForth parse the parse area for their immediate argument, it suffices to modify the word `operator` as follows:

: take-name>xt ( "ccc" -- xt )
    bl word find ?undef
;
: translate-xt ( any xt -- any )
    state @ if compile, else execute then
;
: operator      \ n -- ; define an operator
    create
      here swap , OperatorChain @ , OperatorChain !
      immediate
    does> @ OperatorType !
      take-name>xt translate-xt
;

This also makes `find` work correctly for `to` and other similar words.

In VFX Forth 5.43, the above implementation still does not work for an immediate value (a child of `value` marked as immediate), because
`compile,` is broken for immediate words.

Namely, `compile,` executes the xt of an immediate word instead of
compiling it. Thus, the following test fails:
t{ : [foo] 0 ; immediate -> }t
t{ :noname [ 1 ' [foo] compile, ?dup nip ] literal ; execute -> 0 1 }t

In VFX, the last line results in ( 0 ) instead of ( 0 1 ).

Even if we fix this problem, there is another one. In VFX, each child of `value` has its own compiler, and when a child of `value` is made
immediate, its compiler xt is *replaced* with an xt that performs the *interpretation semantics* for that child.

It is a design flaw to use the same slot for both a helper definition
that performs compilation with optimizations and a helper definition
that performs non-default compilation semantics.

A possible approach is to associate an optimizer with xt, and a helper
for non-default compilation semantics with nt.

--
Ruvim

From Ruvim@[email protected] to comp.lang.forth on Sat Jun 6 12:54:15 2026

From Newsgroup: comp.lang.forth

On 2026-06-04 22:26, Ruvim wrote:

On 2026-04-08 10:34, [email protected] wrote:

In article <10r3nfo$33464$[email protected]>,
Gerry Jackson <[email protected]> wrote:

On 07/04/2026 12:35, [email protected] wrote:

A similar situation applies to "TO must scan". It turns out there
is no standard program that can detect this. It steers implementation
towards a scanning TO.

On the contrary, Ruvim posted some code that is a standard program and
which distinguishes between a parsing TO and one that sets a flag for a
following VALUE to act on.

I can't find the post but the gist of it was (I think):

1 value v1
2 value v2 immediate
: test to v2 v1 ;

Yes, something similar.

Running test with
3 test
A parsing TO will set v2 to 3

Yes, and this is specified by the standard.

A flagging TO will execute v2 during compilation of test because it is
immediate. So test will set v1 to 3 leaving v2 unchanged.

No, it doesn't. It leaves garbage on the stack during compilation,
which usually leads to an error.
I leave it up to the reader whether this counts as a standard program.

The provided program conforms to the Forth-94 standard and later versions.

A system that fails that test does not conform to the standard with
respect to `to`.

Historically, there were two approaches to implement `to`: "parsing" and "non-parsing" [1]. Forth-94 formally specified the "parsing" approach:

    | ANS Forth explicitly requires that TO must parse,
    | so that TO's effect will be predictable when
    | it is used at the end of the parse area.

OTOH, it disallowed applying the words `postpone` and `[compile]` to
`to` [2]. Perhaps, this was done as a concession to implementations that adhered to the "non-parsing" approach, to prevent behavior variations in
a standard program caused by deviations in implementations of `to`.

[1] <https://forthhub.github.io/forth-sf-net/standard/dpans/dpansa6.htm#A.6.2.2295>
[2] <https://forthhub.github.io/forth-sf-net/standard/dpans/dpans6.htm#6.2.2295>

Here is an example of a program that relies on a parsing `to`, but is
not compliant due to that very ambiguous condition regarding `postpone`. Let's introduce a multiple assignment construct of the following form:
`1 2 3 to( a b c )`.

: ?comp ( -- ) state @ if exit then -14 throw ;

: equals ( sd2 sd1 -- flag )
    dup 3 pick <> if 2drop 2drop false exit then
    compare 0=
;
: source-offset ( -- u )
    >in @
;
: set-source-offset ( u -- )
    source nip over u< invert if >in ! exit then
    -18 throw \ "parsed string overflow"
;

synonym take-lexeme-maybe parse-name

: take-lexeme ( "ccc" -- sd )
    take-lexeme-maybe dup if exit then -16 throw
;

: to( ( "ccc<rparen>" -- ) \ " a b c )"
    ?comp source-offset ( u.offset )
    take-lexeme s" )" equals if drop exit then
    ( u.offset ) recurse ( u.offset )
    source-offset swap set-source-offset
      postpone to
    set-source-offset
; immediate

\ usage example

0 value a
0 value b

: init-foo ( -- ) 2 3 to( a b ) ;

init-foo a . b . \ it should print "2 3"

Do you know a Forth system in which `to` parses the parse area and in
which the definition for `to(` given above *does not* work?

How do you implement a construct `to( ... )` that works both in
interpretation state and in compilation state?

Obviously, we should remove `?comp` and store the offsets on the return
stack.

Also, we could replace `postpone to` with
`state @ if postpone to else ['] to execute then`

In classic single-xt systems, things are simpler: it suffices to replace
it with `['] to execute`.

Interestingly, the Recognizer API does not help in implementing this construct.

--
Ruvim


From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sat Jun 6 16:17:24 2026

From Newsgroup: comp.lang.forth

Ruvim <[email protected]> writes:

Here is an example of a program that relies on a parsing `to`, but is
not compliant due to that very ambiguous condition regarding `postpone`.
Let's introduce a multiple assignment construct of the following form:
`1 2 3 to( a b c )`.

: ?comp ( -- ) state @ if exit then -14 throw ;

: equals ( sd2 sd1 -- flag )
dup 3 pick <> if 2drop 2drop false exit then
compare 0=
;
: source-offset ( -- u )
>in @
;
: set-source-offset ( u -- )
source nip over u< invert if >in ! exit then
-18 throw \ "parsed string overflow"
;

synonym take-lexeme-maybe parse-name

: take-lexeme ( "ccc" -- sd )
take-lexeme-maybe dup if exit then -16 throw
;

: to( ( "ccc<rparen>" -- ) \ " a b c )"
?comp source-offset ( u.offset )
take-lexeme s" )" equals if drop exit then
( u.offset ) recurse ( u.offset )
source-offset swap set-source-offset
postpone to
set-source-offset
; immediate

\ usage example

0 value a
0 value b

: init-foo ( -- ) 2 3 to( a b ) ;

init-foo a . b . \ it should print "2 3"

Do you know a Forth system in which `to` parses the parse area and in
which the definition for `to(` given above *does not* work?

Depending on what you mean by "work". Anything that contains ?COMP is deficient by design.

How do you implement a construct `to( ... )` that works both in
interpretation state and in compilation state?

I don't. As for how someone else could do it, see below.

Also, we could replace `postpone to` with
`state @ if postpone to else ['] to execute then`

Also deficient by design.

Interestingly, the Recognizer API does not help in implementing this
construct.

One way would be to add an immediate word TO( that changes rec-forth
(the default recognizer sequence; damn renamings) to a special
recognizer. This special recognizer just pushes every string it
should recognize onto a TO(-stack, except when it recognizes ")".

When it recognizes ")", it restores the original REC-FORTH, takes the
top string "<word>" off the TO(-stack, constructs a string "TO
<word>", and EVALUATEs it (you may try to take precautions such that
the right TO is found, but the standard gives us little to play with
here). Repeat until the TO(-stack is empty. Not even non-standard
POSTPONE TO is needed, and it also works with non-parsing TO
implementations. It does not work if the user has defined TO to mean
something else (the curse of EVALUATE).

However, instead of going for recognizers, you might play the same
trick by letting TO( parse the strings up to ")" and push them on the TO(-stack. And that's simpler to implement, so yes, the recognizer
API does not help here.

But then recognizers are not designed for more than a word (the string recognizer is already a stretch). So what Gforth has is a REC-TO that recognizes "-><word>".

So here's the implementation (untested):

: to(
0 0 2>r begin
parse-name dup 0= abort" unfinished TO("
2dup 2>r
s" )" str= until
2r> 2drop \ get rid of ")"
begin
2r> dup while
[: "to " type ;] >string-execute evaluate
\ freeing the strings is left as exercise to the reader
repeat
2drop \ get rid of 0 0
; immediate
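
For systems without Gforth's `>string-execute`, quotations, and `"..."` string literals, the same EVALUATE-based idea can be sketched with standard words only. This is an illustration under stated assumptions, not the code from the post above: `to(-max`, `to(-buf`, and `to(-assign` are hypothetical names, the 80-character limit is arbitrary, and the string "to <name>" is assembled in a fixed buffer before being passed to EVALUATE.

```forth
\ Sketch only: standard words (core, core-ext, string word set).
80 constant to(-max                       \ assumed upper bound on name length
create to(-buf  3 to(-max + chars allot   \ room for "to " plus the name

: to(-assign ( i*x c-addr u -- j*x )      \ evaluate "to <name>"
   dup to(-max u> abort" name too long for TO("
   s" to " to(-buf swap cmove             \ buffer starts with "to "
   tuck to(-buf 3 chars + swap cmove      \ append the name after it
   3 + to(-buf swap evaluate ;

: to( ( i*x "name1 ... nameN )" -- j*x )
   0 0 2>r                                \ sentinel: a zero-length string
   begin
      parse-name dup 0= abort" unfinished TO("
      2dup 2>r  s" )" compare 0=
   until
   2r> 2drop                              \ discard ")"
   begin 2r> dup while to(-assign repeat  \ last name is assigned first
   2drop ; immediate                      \ drop the sentinel

\ usage example from the thread
0 value a
0 value b
: init-foo ( -- ) 2 3 to( a b ) ;
init-foo a . b .   \ the thread's usage example expects "2 3"
```

Like the Gforth version, this sketch inherits the usual EVALUATE caveat: it uses whatever `to` is visible in the search order when `to(` runs.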

Gforth has an API for defining words with user-defined TO <https://net2o.de/gforth/Words-with-user_002ddefined-TO-etc_002e.html>,
but there is currently no proper API for defining words that perform
the function of TO or one of its siblings (+TO etc), in particular
there is no API that would support a user-defined REC-TO or TO(. This
shows in using internal words like TO-SLOTS in REC-TO.

Given the large differences between TO implementations in systems, I
expect that we will have a hard time (as in: it won't happen)
standardizing TO-related APIs.

- anton

From Ruvim@[email protected] to comp.lang.forth on Sat Jun 6 21:09:08 2026

From Newsgroup: comp.lang.forth

On 2026-06-06 16:17, Anton Ertl wrote:

Ruvim <[email protected]> writes:

Here is an example of a program that relies on a parsing `to`, but is
not compliant due to that very ambiguous condition regarding `postpone`.
Let's introduce a multiple assignment construct of the following form:
`1 2 3 to( a b c )`.

: ?comp ( -- ) state @ if exit then -14 throw ;

: equals ( sd2 sd1 -- flag )
dup 3 pick <> if 2drop 2drop false exit then
compare 0=
;
: source-offset ( -- u )
>in @
;
: set-source-offset ( u -- )
source nip over u< invert if >in ! exit then
-18 throw \ "parsed string overflow"
;

synonym take-lexeme-maybe parse-name

: take-lexeme ( "ccc" -- sd )
take-lexeme-maybe dup if exit then -16 throw
;

: to( ( "ccc<rparen>" -- ) \ " a b c )"
?comp source-offset ( u.offset )
take-lexeme s" )" equals if drop exit then
( u.offset ) recurse ( u.offset )
source-offset swap set-source-offset
postpone to
set-source-offset
; immediate

\ usage example

0 value a
0 value b

: init-foo ( -- ) 2 3 to( a b ) ;

init-foo a . b . \ it should print "2 3"

Do you know a Forth system in which `to` parses the parse area and in
which the definition for `to(` given above *does not* work?

Depending on what you mean by "work".

By "does not work" I mean that the provided program does not behave as specified in the usage example (a kind of test).

Anything that contains ?COMP is deficient by design.

Do you mean that preventing accidental execution of some word in interpretation state (by throwing an exception) is deficient?

How do you implement a construct `to( ... )` that works both in
interpretation state and in compilation state?

I don't. As for how someone else could do it, see below.

Also, we could replace `postpone to` with
`state @ if postpone to else ['] to execute then`

Also deficient by design.

Do you mean an ambiguous condition on postponing and ticking `to`?

Otherwise, if you mean something other than applying `postpone` to
immediate words, please, clarify.

Interestingly, the Recognizer API does not help in implementing this
construct.

One way would be to add an immediate word TO( that changes rec-forth
(the default recognizer sequence; damn renamings) to a special
recognizer. This special recognizer just pushes every string it
should recognize onto a TO(-stack, except when it recognizes ")".

When it recognizes ")", it restores the original REC-FORTH, takes the
top string "<word>" off the TO(-stack, constructs a string "TO
<word>", and EVALUATEs it (you may try to take precautions such that
the right TO is found, but the standard gives us little to play with
here). Repeat until the TO(-stack is empty. Not even non-standard
POSTPONE TO is needed, and it also works with non-parsing TO
implementations. It does not work if the user has defined TO to mean something else (the curse of EVALUATE).

However, instead of going for recognizers, you might play the same
trick by letting TO( parse the strings up to ")" and push them on the TO(-stack. And that's simpler to implement,
so yes, the recognizer API does not help here.

Agreed.

But then recognizers are not designed for more than a word (the string recognizer is already a stretch). So what Gforth has is a REC-TO that recognizes "-><word>".

So here's the implementation (untested):

: to(
0 0 2>r begin
parse-name dup 0= abort" unfinished TO("
2dup 2>r
s" )" str= until
2r> 2drop \ get rid of ")"
begin
2r> dup while
[: "to " type ;] >string-execute evaluate
\ freeing the strings is left as exercise to the reader
repeat
2drop \ get rid of 0 0
; immediate

Thus, this implementation is more complex and unhygienic [2], and these drawbacks are introduced solely for the sake of a few Forth systems that provide non-standard `to`. The cost seems unjustified.

[2] It is unhygienic, as it requires `to` to be present in the context.
See: <https://en.wikipedia.org/wiki/Hygienic_macro>

Gforth has an API for defining words with user-defined TO <https://net2o.de/gforth/Words-with-user_002ddefined-TO-etc_002e.html>,
but there is currently no proper API for defining words that perform
the function of TO or one of its siblings (+TO etc), in particular
there is no API that would support a user-defined REC-TO or TO(. This
shows in using internal words like TO-SLOTS in REC-TO.

There could be a basic factor similar to `defer!`:

`execute-setter` Execution: ( any1 xt1 -- )
Set `xt1` to return `any1` on execution. An ambiguous condition exists
if `xt1` cannot be set to return `any1`.
`xt1` is the execution token of a word created with `value`, `2value`,
or `fvalue`.

Given the large differences between TO implementations in systems, I
expect that we will have a hard time (as in: it won't happen)
standardizing TO-related APIs.

A side note: I think this is another argument against methods based on
`TO` or `IS` in the Recognizers API, and in APIs in general.

As far as I can see, all implementations that perform parsing (as
specified in the standard) meet the reasonable expectations.

But the implementations that do not perform parsing vary in their
behavior. It is interesting to consider which other modern systems,
besides VfxForth, iForth, and ciForth, fall into this category.

In most Forth systems `to` is a parsing word: <https://github.com/search?q=NOT+is%3Afork+language%3Aforth+%2F%28%5E%7C+%29%3A+to+%2F&type=code>

- anton

--
Ruvim


From Stephen Pelc@[email protected] to comp.lang.forth on Sun Jun 7 11:59:20 2026

From Newsgroup: comp.lang.forth

On 6 Jun 2026 at 18:17:24 CEST, "Anton Ertl" <Anton Ertl> wrote:

Ruvim <[email protected]> writes:

Here is an example of a program that relies on a parsing `to`, but is
not compliant due to that very ambiguous condition regarding `postpone`.
Let's introduce a multiple assignment construct of the following form:
`1 2 3 to( a b c )`.

: ?comp ( -- ) state @ if exit then -14 throw ;

: equals ( sd2 sd1 -- flag )
dup 3 pick <> if 2drop 2drop false exit then
compare 0=
;
: source-offset ( -- u )
>in @
;
: set-source-offset ( u -- )
source nip over u< invert if >in ! exit then
-18 throw \ "parsed string overflow"
;

synonym take-lexeme-maybe parse-name

: take-lexeme ( "ccc" -- sd )
take-lexeme-maybe dup if exit then -16 throw
;

: to( ( "ccc<rparen>" -- ) \ " a b c )"
?comp source-offset ( u.offset )
take-lexeme s" )" equals if drop exit then
( u.offset ) recurse ( u.offset )
source-offset swap set-source-offset
postpone to
set-source-offset
; immediate

\ usage example

0 value a
0 value b

: init-foo ( -- ) 2 3 to( a b ) ;

init-foo a . b . \ it should print "2 3"

Do you know a Forth system in which `to` parses the parse area and in
which the definition for `to(` given above *does not* work?

Depending on what you mean by "work". Anything that contains ?COMP is deficient by design.

How do you implement a construct `to( ... )` that works both in
interpretation state and in compilation state?

I don't. As for how someone else could do it, see below.

Also, we could replace `postpone to` with
`state @ if postpone to else ['] to execute then`

Also deficient by design.

Interestingly, the Recognizer API does not help in implementing this
construct.

One way would be to add an immediate word TO( that changes rec-forth
(the default recognizer sequence; damn renamings) to a special
recognizer. This special recognizer just pushes every string it
should recognize onto a TO(-stack, except when it recognizes ")".

When it recognizes ")", it restores the original REC-FORTH, takes the
top string "<word>" off the TO(-stack, constructs a string "TO
<word>", and EVALUATEs it (you may try to take precautions such that
the right TO is found, but the standard gives us little to play with
here). Repeat until the TO(-stack is empty. Not even non-standard
POSTPONE TO is needed, and it also works with non-parsing TO
implementations. It does not work if the user has defined TO to mean something else (the curse of EVALUATE).

However, instead of going for recognizers, you might play the same
trick by letting TO( parse the strings up to ")" and push them on the TO(-stack. And that's simpler to implement, so yes, the recognizer
API does not help here.

But then recognizers are not designed for more than a word (the string recognizer is already a stretch). So what Gforth has is a REC-TO that recognizes "-><word>".

So here's the implementation (untested):

: to(
0 0 2>r begin
parse-name dup 0= abort" unfinished TO("
2dup 2>r
s" )" str= until
2r> 2drop \ get rid of ")"
begin
2r> dup while
[: "to " type ;] >string-execute evaluate
\ freeing the strings is left as exercise to the reader
repeat
2drop \ get rid of 0 0
; immediate

Gforth has an API for defining words with user-defined TO <https://net2o.de/gforth/Words-with-user_002ddefined-TO-etc_002e.html>,
but there is currently no proper API for defining words that perform
the function of TO or one of its siblings (+TO etc), in particular
there is no API that would support a user-defined REC-TO or TO(. This
shows in using internal words like TO-SLOTS in REC-TO.

Given the large differences between TO implementations in systems, I
expect that we will have a hard time (as in: it won't happen)
standardizing TO-related APIs.

- anton

Surely this only serves to demonstrate that recognisers are not the answer
to all maidens' prayers.

Stephen
--
Stephen Pelc, [email protected]
Wodni & Pelc GmbH
Vienna, Austria
Tel: +44 (0)7803 903612, +34 649 662 974 http://www.vfxforth.com/downloads/VfxCommunity/
free VFX Forth downloads

From Ruvim@[email protected] to comp.lang.forth on Sun Jun 7 13:31:04 2026

From Newsgroup: comp.lang.forth

On 2026-06-07 11:53, Stephen Pelc wrote:

On 7 Apr 2026 at 21:55:37 CEST, "Gerry Jackson" <[email protected]> wrote:

On 07/04/2026 12:35, [email protected] wrote:

A similar situation applies to "TO must scan". It turns out there
is no standard program that can detect this. It steers implementation
towards a scanning TO.

On the contrary, Ruvim posted some code that is a standard program and
which distinguishes between a parsing TO and one that sets a flag for a
following VALUE to act on.

I can't find the post but the gist of it was (I think):

1 value v1
2 value v2 immediate
: test to v2 v1 ;
Running test with
3 test
A parsing TO will set v2 to 3
A flagging TO will execute v2 during compilation of test because it is
immediate. So test will set v1 to 3 leaving v2 unchanged.

The test depends on being able to define V2 as IMMEDIATE.

In a standard Forth system, a child of `value`, as well as *any*
user-defined named definition (with the exception of `synonym`
children), can be made immediate. A child of synonym inherits immediacy
from its original word.

<https://forth-standard.org/standard/core/IMMEDIATE>

Where in the standard does it specify that children of VALUE are not IMMEDIATE?

It is specified in the glossary entry 6.2.2405 VALUE <https://forth-standard.org/standard/core/VALUE>

Namely, it specifies "_name_ Execution", where _name_ is a child of
`value`. And it does not specify the compilation semantics in
"Compilation:" section.

Therefore, we apply 3.4.3.3 Compilation semantics: <https://forth-standard.org/standard/usage#usage:compile>

| Unless otherwise specified in a "Compilation:" section
| of the glossary entry, the compilation semantics
| of a Forth definition shall be to append
| its execution semantics to the execution semantics
| of the current definition.

Thus, a word created by `value` is not an immediate word unless the user applies `immediate` to it.

This does not prevent us from performing any optimizations when adding
the execution semantics of a `value` child to the current definition.

If it does so specify, then the test is implementation dependent,
and contradicts the intention of the standard.

Be careful what you wish for.

VFX and its predecessor ProForth have used a flagging TO for well over 30 years and I am not breaking the world's largest Forth application
(32 bit, over 30 Mb binary) for a language lawyer and a probably invalid test.

I would like to check whether such a change in `to` and `value` will
break this application.

In any case, nothing prevents VfxForth from providing such a `value`
word whose children cannot be made immediate. However, this should be documented, without passing off such a `value` as a standard word.

Ditto for the provided word `to`, which is not a parsing word (whereas
the standard `to` is required to perform parsing).

--
Ruvim


From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sun Jun 7 13:23:35 2026

From Newsgroup: comp.lang.forth

Ruvim <[email protected]> writes:

On 2026-06-06 16:17, Anton Ertl wrote:

Ruvim <[email protected]> writes:

Here is an example of a program that relies on a parsing `to`, but is
not compliant due to that very ambiguous condition regarding `postpone`.
Let's introduce a multiple assignment construct of the following form:
`1 2 3 to( a b c )`.

: ?comp ( -- ) state @ if exit then -14 throw ;

: equals ( sd2 sd1 -- flag )
dup 3 pick <> if 2drop 2drop false exit then
compare 0=
;
: source-offset ( -- u )
>in @
;
: set-source-offset ( u -- )
source nip over u< invert if >in ! exit then
-18 throw \ "parsed string overflow"
;

synonym take-lexeme-maybe parse-name

: take-lexeme ( "ccc" -- sd )
take-lexeme-maybe dup if exit then -16 throw
;

: to( ( "ccc<rparen>" -- ) \ " a b c )"
?comp source-offset ( u.offset )
take-lexeme s" )" equals if drop exit then
( u.offset ) recurse ( u.offset )
source-offset swap set-source-offset
postpone to
set-source-offset
; immediate

\ usage example

0 value a
0 value b

: init-foo ( -- ) 2 3 to( a b ) ;

init-foo a . b . \ it should print "2 3"

Do you know a Forth system in which `to` parses the parse area and in
which the definition for `to(` given above *does not* work?

Depending on what you mean by "work".

By "does not work" I mean that the provided program does not behave as
specified in the usage example (a kind of test).

That is already satisfied by:

: to-b-to-a to b to a ;

: to(
')' parse 2drop
postpone to-b-to-a ; immediate

A much simpler implementation that does not have the deficiency
discussed below. It also demonstrates that you cannot use a test as a specification.

Anything that contains ?COMP is deficient by design.

Do you mean that preventing accidental execution of some word in
interpretation state (by throwing an exception) is deficient?

Any word that is not the text interpreter and that uses "STATE @" is
deficient.

Also, we could replace `postpone to` with
`state @ if postpone to else ['] to execute then`

Also deficient by design.

Do you mean an ambiguous condition on postponing and ticking `to`?

I mean the STATE @, but true, that's not the only deficiency.

But then recognizers are not designed for more than a word (the string
recognizer is already a stretch). So what Gforth has is a REC-TO that
recognizes "-><word>".

So here's the implementation (untested):

: to(
0 0 2>r begin
parse-name dup 0= abort" unfinished TO("
2dup 2>r
s" )" str= until
2r> 2drop \ get rid of ")"
begin
2r> dup while
[: ." to " type ;] >string-execute evaluate \ builds and evaluates "to <name>"
\ freeing the strings is left as exercise to the reader
repeat
2drop \ get rid of 0 0
; immediate

Thus, this implementation is more complex and unhygienic [2], and these
drawbacks are introduced solely for the sake of a few Forth systems that
provide non-standard `to`. The cost seems unjustified.

[2] It is unhygienic, as it requires `to` to be present in the context.
See: <https://en.wikipedia.org/wiki/Hygienic_macro>

Accidental capture of identifiers is not the only problem of
EVALUATE-based macros, even in this case. Another is accidental
nonvisibility of TO. And yet, the EVALUATE-based implementation of
TO( is superior to the one you give above in several aspects:

1) Shorter. And I would say it is less complex (as evidenced by its shortness), but you claim that it is more complex without explaining
why you think so.

2) No STATE @ deficiency.

3) Implementable in standard Forth (>STRING-EXECUTE can be implemented
in standard Forth), while the implementation you give above uses
non-standard POSTPONE TO and ['] TO.

Gforth has an API for defining words with user-defined TO
<https://net2o.de/gforth/Words-with-user_002ddefined-TO-etc_002e.html>,
but there is currently no proper API for defining words that perform
the function of TO or one of its siblings (+TO etc), in particular
there is no API that would support a user-defined REC-TO or TO(. This
shows in using internal words like TO-SLOTS in REC-TO.

There could be a basic factor similar to `defer!`:

`execute-setter` Execution: ( any1 xt1 -- )
Set `xt1` to return `any1` on execution. An ambiguous condition exists
if `xt1` cannot be set to return `any1`.
`xt1` is the execution token of a word created with `value`, `2value`,
or `fvalue`.

Gforth has an internal word (TO):

method (to) ( val operation xt -- ) \ gforth-internal paren-to
\G @i{xt} is of a value like word @i{name}. Stores @i{val} @code{to}
\G @i{name}. @i{operation} selects between @code{to} (0), @code{+to} (1),
\G @code{addr} (2), @code{action-of} (3) and @code{is} (4).

There is a reason why this is not a supported word.

And for the compilation semantics:

: (to), ( xt -- ) ( generated code: v -- )
\g in compiled @code{to @i{name}}, xt is that of @i{name}. This
\g word generates code for storing v (of type appropriate for
\g @i{name}) there. This word is a factor of @code{to}.

Again, not a supported word.

One might use these as follows:

variable to(-sem \ true/false for compilation/interpretation semantics

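: to(1 ( ... "name1 ... namen<rparen>" -- )
0 >r begin
parse-name dup 0= abort" unfinished TO("
2dup s" )" str= 0= while \ keep the name: STR= consumes its arguments
find-name dup 0= -13 and throw >r
repeat
2drop \ get rid of ")"
begin
r> dup while
name>interpret \ check for 0 is left as exercise to the reader
to(-sem @ if (to), else 0 swap (to) then
repeat
drop
; \ not immediate: it is a factor called from TO(-INT and TO(-COMP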

: to(-int false to(-sem ! to(1 ;
: to(-comp true to(-sem ! to(1 ;
' to(-int ' to(-comp interpret/compile: to(

If you think that this is too complex, I agree. Don't do
multi-parsing words like TO(. Ideally, don't do parsing words at all.

There is one thing that this TO( cannot do: It does not work as
intended inside ]] ... [[. And if somebody writes

POSTPONE TO( a b )

they will not get the equivalent of

POSTPONE ->b POSTPONE ->a

Can we deal with this by defining a recognizer for TO(, which could
then use the TRANSLATE-TO designed for REC-TO?

'translate-to' ( n xt - translation ) gforth-experimental
xt belongs to a value-flavoured (or defer-flavoured) word, n is the
index into the 'to-table:' for xt (see Words with user-defined TO etc.).
Interpreting run-time: '( ... -- ... )'
Perform the to-action with index n in the 'to-table:' of xt. Additional
stack effects depend on n and xt.

It is probably possible to do that, but it would be complex: The
recognizer would produce a translation that contains all the xts of
the value-like words, and a TRANSLATE-TO(. The TRANSLATE-TO( action
for interpreting would have to get all of the xts out of the way. Then
it would get the xt for the last one and perform TRANSLATE-TO
INTERPRETING; repeat for the next one, and repeat until all the xts
are done. The action for compiling and postponing could be
implemented in a similar way (but call COMPILING and POSTPONING
instead of INTERPRETING), or maybe simpler because there is no need to
get the xts out of the way.

In any case, that's quite a bit of work. Maybe someone (Stephen
Pelc?) thinks this demonstrates that the recognizer words are too
limited. I think it demonstrates that TO( is not a good idea.

Given the large differences between TO implementations in systems, I
expect that we will have a hard time (as in: it won't happen)
standardizing TO-related APIs.

A side note: I think this is another argument against methods based on
`TO` or `IS` in the Recognizers API, and in APIs in general.

There are no such words in <https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11?hideDiff#reply-1623>.
REC-FORTH is proposed as a deferred word, but there is no requirement
to use IS on it. And unlike for value-flavoured words, there are
non-parsing words for dealing with deferred words: DEFER@ and DEFER!.
So if you want to define IS( ... ), there is no need to concern
yourself with non-standard words like (TO).
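
As a rough illustration of that point, an interpretation-only `IS(` might be sketched in standard Forth on top of `DEFER!`. This is a sketch under assumptions, not something taken from Gforth or from the proposal: the name `is(`, its stack effect, and the restriction to a single input line and to interpretation state are all choices made here for illustration.

```forth
\ Hypothetical IS( for interpretation state only (illustrative sketch).
\ Usage: ' x1 ' x2 is( d1 d2 )   sets d1 to x1 and d2 to x2.
: is( ( xt1 ... xtn "name1 ... namen<rparen>" -- )
  >in @ parse-name                     ( ... u.in c-addr u )
  dup 0= abort" unfinished IS("
  s" )" compare 0= if drop exit then   \ the closing paren ends the list
  >in ! '                              \ re-parse the same name with tick
  >r recurse r>                        \ handle the remaining names first
  defer! ;                             \ the last name gets the top xt

defer d1  defer d2
: one 1 ;  : two 2 ;
' one ' two is( d1 d2 )
d1 . d2 .
```

The recursion is what reverses the order: the last name in the list is paired with the topmost xt. With the definitions shown, the final line should print 1 and then 2. Only `DEFER!`, `PARSE-NAME`, `COMPARE` and `'` are used, so no system-specific factor of `IS` is needed.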

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sun Jun 7 15:25:03 2026

From Newsgroup: comp.lang.forth

Stephen Pelc <[email protected]> writes:

Surely this only serves to demonstrate that recognisers are not the answer
to all maidens' prayers.

I have no idea what all maidens pray for, but this very much has the
smell of a straw-man argument.

- anton

From Ruvim@[email protected] to comp.lang.forth on Sun Jun 7 17:44:45 2026

From Newsgroup: comp.lang.forth

On 2026-06-07 13:23, Anton Ertl wrote:

Ruvim <[email protected]> writes:

On 2026-06-06 16:17, Anton Ertl wrote:

Ruvim <[email protected]> writes:

Here is an example of a program that relies on a parsing `to`, but is
not compliant due to that very ambiguous condition regarding `postpone`.
Let's introduce a multiple assignment construct of the following form:
`1 2 3 to( a b c )`.

: ?comp ( -- ) state @ if exit then -14 throw ;

: equals ( sd2 sd1 -- flag )
dup 3 pick <> if 2drop 2drop false exit then
compare 0=
;
: source-offset ( -- u )
>in @
;
: set-source-offset ( u -- )
source nip over u< invert if >in ! exit then
-18 throw \ "parsed string overflow"
;

synonym take-lexeme-maybe parse-name

: take-lexeme ( "ccc" -- sd )
take-lexeme-maybe dup if exit then -16 throw
;

: to( ( "ccc<rparen>" -- ) \ " a b c )"
?comp source-offset ( u.offset )
take-lexeme s" )" equals if drop exit then
( u.offset ) recurse ( u.offset )
source-offset swap set-source-offset
postpone to
set-source-offset
; immediate

\ usage example

0 value a
0 value b

: init-foo ( -- ) 2 3 to( a b ) ;

init-foo a . b . \ it should print "2 3"

Do you know a Forth system in which `to` parses the parse area and in
which the definition for `to(` given above *does not* work?

Depending on what you mean by "work".

By "does not work" I mean that the provided program does not behave as
specified in the usage example (a kind of test).

That is already satisfied by:

: to-b-to-a to b to a ;

: to(
')' parse 2drop
postpone to-b-to-a ; immediate

No. I referred to the specific definition for `to(`, have a look again:

| Do you know a Forth system in which
| `to` parses the parse area and in which
| the definition for `to(` given above
| *does not* work?

Whether the definition works as expected is checked by
`init-foo a . b .`

To be tested automatically, the last two lines can be written as:

t{ 0 value a 0 value b -> }t
t{ :noname 2 3 to( a b ) ; execute a b -> 2 3 }t

A much simpler implementation that does not have the deficiency
discussed below. It also demonstrates that you cannot use a test as a specification.

I don't use a test as a specification. Actually, I use the whole
program, including the definition for `to(` to test a Forth system.

Anything that contains ?COMP is deficient by design.

Do you mean that preventing accidental execution of some word in
interpretation state (by throwing an exception) is deficient?

Any word that is not the text interpreter and that uses "STATE @" is deficient.

I would classify my word `to(` as a text interpreter. Is it still deficient?

[...]

So here's the implementation (untested):

: to(
0 0 2>r begin
parse-name dup 0= abort" unfinished TO("
2dup 2>r
s" )" str= until
2r> 2drop \ get rid of ")"
begin
2r> dup while
[: ." to " type ;] >string-execute evaluate
\ freeing the strings is left as exercise to the reader
repeat
2drop \ get rid of 0 0
; immediate

Thus, this implementation is more complex and unhygienic [2], and these
drawbacks are introduced solely for the sake of a few Forth systems that
provide non-standard `to`. The cost seems unjustified.

[2] It is unhygienic, as it requires `to` to be present in the context.
See: <https://en.wikipedia.org/wiki/Hygienic_macro>

Accidental capture of identifiers is not the only problem of
EVALUATE-based macros, even in this case. Another is accidental
nonvisibility of TO. And yet, the EVALUATE-based implementation of
TO( is superior to the one you give above in several aspects:

1) Shorter. And I would say it is less complex (as evidenced by its shortness), but you claim that it is more complex without explaining
why you think so.

Your definition for `to(` is visually longer: approximately 27 lexemes
vs 16 lexemes. But this depends on the basis (it is different). If we
take the standard words as the basis, I expect the approach with two
loops and `evaluate` will be longer than the approach with recursion and `postpone`.

Also, the approach with two loops contains more conditions than the
approach with one recursion.

2) No STATE @ deficiency.

Your definition also contains `STATE @`, but indirectly, inside
`evaluate`. In this respect, I see no conceptual differences.

3) Implementable in standard Forth (>STRING-EXECUTE can be implemented
in standard Forth), while the implementation you give above uses
non-standard POSTPONE TO and ['] TO.

This non-standard use was the main purpose of my implementation. I
wonder which Forth systems (that provide standard compliant `to`) behave
in unexpected ways under these conditions.

[...]

Further comments later.

--
Ruvim


From anton@[email protected] (Anton Ertl) to comp.lang.forth on Mon Jun 8 05:04:29 2026

From Newsgroup: comp.lang.forth

Ruvim <[email protected]> writes:

On 2026-06-07 13:23, Anton Ertl wrote:

Ruvim <[email protected]> writes:

On 2026-06-06 16:17, Anton Ertl wrote:

Ruvim <[email protected]> writes:

Do you know a Forth system in which `to` parses the parse area and in
which the definition for `to(` given above *does not* work?

Depending on what you mean by "work".

By "does not work" I mean that the provided program does not behave as
specified in the usage example (a kind of test).

That is already satisfied by:

: to-b-to-a to b to a ;

: to(
')' parse 2drop
postpone to-b-to-a ; immediate

[...]

Whether the definition works as expected is checked by
`init-foo a . b .`

To be tested automatically, the last two lines can be written as:

t{ 0 value a 0 value b -> }t
t{ :noname 2 3 to( a b ) ; execute a b -> 2 3 }t

My point stands.

But I see your point, too. You have some program where your test case
does not behave as you intended on some Forth system.

But what's the point of your question?

If there is some system where TO parses and where your program does
not behave as intended, then what?

If every system with parsing TO behaves for your program as you
intended, then what? This just shows that POSTPONE TO works as you
intend for this program and these systems, not in general.

Any word that is not the text interpreter and that uses "STATE @" is
deficient.

I would classify my word `to(` as a text interpreter. Is it still deficient?

Yes, because it is no text interpreter, and playing Humpty-Dumpty does
not change that. E.g.,

to( : foo 1 + . ; 2 foo )

does not print 3.

[...]

So here's the implementation (untested):

: to(
0 0 2>r begin
parse-name dup 0= abort" unfinished TO("
2dup 2>r
s" )" str= until
2r> 2drop \ get rid of ")"
begin
2r> dup while
[: ." to " type ;] >string-execute evaluate
\ freeing the strings is left as exercise to the reader
repeat
2drop \ get rid of 0 0
; immediate

Thus, this implementation is more complex and unhygienic [2], and these
drawbacks are introduced solely for the sake of a few Forth systems that
provide non-standard `to`. The cost seems unjustified.

[2] It is unhygienic, as it requires `to` to be present in the context.
See: <https://en.wikipedia.org/wiki/Hygienic_macro>

Accidental capture of identifiers is not the only problem of
EVALUATE-based macros, even in this case.

...

If we
take the standard words as the basis, I expect the approach with two
loops and `evaluate` will be longer than the approach with recursion and
`postpone`.

If you take the standard words as the basis, then you cannot use
POSTPONE TO.

2) No STATE @ deficiency.

Your definition also contains `STATE @`, but indirectly, inside
`evaluate`.

Yes, good point, as mentioned: Accidental capture of identifiers is
not the only problem of EVALUATE-based macros, even in this case. The
problem is that the text interpreter inside the EVALUATE invocations
uses the STATE at the run-time of TO(, not at the parsing time.
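
The effect can be reproduced with a much smaller EVALUATE-based macro. The following is a minimal sketch in standard Forth; the names `add1`, `inc-a` and `add1-later` are made up for illustration and do not appear in the thread.

```forth
\ An EVALUATE-based macro: what it does depends on STATE when it RUNS.
: add1 ( i*x -- j*x ) s" 1 +" evaluate ; immediate

\ Used directly in a definition, add1 runs while the system is compiling,
\ so "1 +" is compiled into inc-a.
: inc-a ( n -- n+1 ) add1 ;

\ POSTPONE moves the execution of add1 to the run-time of add1-later.
: add1-later ( i*x -- j*x ) postpone add1 ;

5 inc-a .        \ the compiled 1 + runs here
5 add1-later .   \ EVALUATE sees interpretation state: 1 + runs at once,
                 \ nothing is compiled
```

Both lines should print 6, but for different reasons: in the second case nothing was compiled, the text "1 +" was simply interpreted when `add1-later` ran. The same happens to an EVALUATE-based `TO(` when it is ticked or postponed and then executed in a state other than the one in which its argument list was written.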

One way to handle that would be to disallow using ', ['], POSTPONE and [COMPILE] on TO(, but

1) The Forth standard does not give us a way to enforce that.

2) As your TO( implementation shows, such a restriction can become a
hindrance.

Another way to handle that is to have a recognizer that deals with
TO(. That solves objection 1) above, but not necessarily objection
2). However, maybe the user who wants to do something which would
require POSTPONEing or ticking a word TO( can manage to do what
they want with the recognizer, the translator, or the translator's
actions.

In any case, the interpreting action of TRANSLATE-TO( could EVALUATE
"TO b" and "TO a" in interpret state, while the compiling action could
do that in compile state. For the postponing action one could store
the translation data with POSTPONE SLITERAL and the like, and postpone
the word that performs the compiling action.

This solves the STATE @ problem of EVALUATE, but the accidental
capture of TO or accidental non-visibility of TO would not be solved.

- anton

From Ruvim@[email protected] to comp.lang.forth on Mon Jun 8 07:28:25 2026

From Newsgroup: comp.lang.forth

On 2026-06-07 13:23, Anton Ertl wrote:
[...]

There is one thing that this TO( cannot do: It does not work as
intended inside ]] ... [[. And if somebody writes

POSTPONE TO( a b )

they will not get the equivalent of

POSTPONE ->b POSTPONE ->a

This behavior is unexpected, because `POSTPONE TO( a b )` is
grammatically incorrect in Forth. Since, in this case, while compiling
the containing definition, `postpone` appends the compilation semantics
of `to(` to the current definition. And the fragment "a b )" remains in
the parse area.

This is a known problem of the `]] .. [[` construct, stemming from the
fact that it relies on `postpone`. Inside this construct, parsing words
do not parse.

My `c{ ... }c` construct [3] does not have this drawback, but it is more complex to implement. Nevertheless, it should work on standard Forth
systems.

[3] <https://github.com/ruv/forth-on-forth/blob/master/c-state.readme.txt>

Can we deal with this by defining a recognizer for TO(, which could
then use the TRANSLATE-TO designed for REC-TO?

This should be solved on the level of `compile,` and `lit,`, (where
parsing is already complete), not on the level of `postpone`.

'translate-to' ( n xt - translation ) gforth-experimental
xt belongs to a value-flavoured (or defer-flavoured) word, n is the
index into the 'to-table:' for xt (see Words with user-defined TO etc.).
Interpreting run-time: '( ... -- ... )'
Perform the to-action with index n in the 'to-table:' of xt. Additional
stack effects depend on n and xt.

It is probably possible to do that, but it would be complex: The
recognizer would produce a translation that contains all the xts of
the value-like words, and a TRANSLATE-TO(.

But a recognizer must not parse the parse area. Additional parsing, if
it is needed, is performed by the translator that the recognizer returns.

The TRANSLATE-TO( action
for interpreting would have to get all of the xts out of the way. Then
it would get the xt for the last one and perform TRANSLATE-TO
INTERPRETING; repeat for the next one, and repeat until all the xts
are done. The action for compiling and postponing could be
implemented in a similar way (but call COMPILING and POSTPONING
instead of INTERPRETING), or maybe simpler because there is no need to
get the xts out of the way.

In any case, that's quite a bit of work. Maybe someone (Stephen
Pelc?) thinks this demonstrates that the recognizer words are too
limited. I think it demonstrates that TO( is not a good idea.

I think, it demonstrates that you shouldn't drive nails with a microscope.

--
Ruvim

From Ruvim@[email protected] to comp.lang.forth on Mon Jun 8 15:53:55 2026

From Newsgroup: comp.lang.forth

On 2026-06-08 05:04, Anton Ertl wrote:

Ruvim <[email protected]> writes:

On 2026-06-07 13:23, Anton Ertl wrote:

[...]

Any word that is not the text interpreter and that uses "STATE @" is
deficient.

I would classify my word `to(` as a text interpreter. Is it still deficient?

Yes, because it is no text interpreter, and playing Humpty-Dumpty does
not change that. E.g.,

to( : foo 1 + . ; 2 foo )

does not print 3.

Ah, you meant the Forth text interpreter. I see.

I understood/meant the term "text interpreter" in a more general sense.
I usually prefer to use the term "translator" for that (see below),
since interpretation often implies execution.

[...]

So here's the implementation (untested):

: to(
0 0 2>r begin
parse-name dup 0= abort" unfinished TO("
2dup 2>r
s" )" str= until
2r> 2drop \ get rid of ")"
begin
2r> dup while
[: ." to " type ;] >string-execute evaluate
\ freeing the strings is left as exercise to the reader
repeat
2drop \ get rid of 0 0
; immediate

[...]

2) No STATE @ deficiency.

Your definition also contains `STATE @`, but indirectly, inside
`evaluate`.

Yes, good point, as mentioned: Accidental capture of identifiers is
not the only problem of EVALUATE-based macros, even in this case.

The problem is that the text interpreter inside the EVALUATE
invocations uses the STATE at the run-time of TO(, not at the parsing
time.

That is, the execution semantics of `to(` uses `STATE` and that is a
problem. At the same time, the execution semantics of `evaluate` also
uses `STATE`, and that is not a problem. Correct?

Why do you think it is a problem in the first place, but not in the
second place?

There are words that translate a fragment of the input source (a lexical block) into some other form according to their own rules. I call them
text translators (or, translators of the input source).

For example, the standard word `code` can be implemented as a text
translator. <https://forth-standard.org/standard/tools/CODE>

Moreover, in this sense, all immediate parsing words are text translators.

Translation (as a process) can depend on STATE, as well as on other
pieces of the lexical context.

It is totally unclear why you think that a text translator should not
use STATE.

--
Ruvim

From Ruvim@[email protected] to comp.lang.forth on Tue Jun 9 10:25:10 2026

From Newsgroup: comp.lang.forth

On 2026-06-08 05:04, Anton Ertl wrote:
[...]

One way to handle that would be to disallow using ', ['], POSTPONE and [COMPILE] on TO(, but

1) The Forth standard does not give us a way to enforce that.

2) As your TO( implementation shows, such a restriction can become a hindrance.

Another way to handle that is to have a recognizer that deals with
TO(.
That solves objection 1) above, but not necessarily objection 2).

In 2019, I considered an approach that uses recognizers for lexemes
"to", "action-of", "is", etc, instead of providing the corresponding
Forth words, simply to technically disallow ticking them (i.e.,
obtaining their execution tokens). I then abandoned this approach. The
reasons are as follows.

1. This approach contradicts the spirit of the section "3.4.3.2
Interpretation semantics", which says:

| A system shall be capable of executing,
| in interpretation state, all of the definitions
| from the Core word set and any definitions included
| from the optional word sets or word set extensions
| whose interpretation semantics are defined
| by this standard.

And other sections, e.g. "3.4.2 Finding definition names", which says:
| A system shall be capable of finding the definition names
| defined by this standard

2. Providing a recognizer for a lexeme instead of providing a word (of
the same name as the lexeme) does not prevent us from obtaining an
execution token, since we can still obtain it like this:

[undefined] perceive [if]
: perceive ( sd -- any td | 0 ) rec-forth ;
[then]

[:
[ s" to" perceive ?found 2lit, ] execute
;] ( xt )
constant xt-of(to)

\ NB: in this approach, "to" is not a Forth word.
\ Then, the following definition for `[to]` should work
\ as the standard `to` word.
: [to] xt-of(to) execute ; immediate

This works in the Gforth and SP-Forth/4 recognizers implementations.

However, maybe the user who wants to do something which would
require POSTPONEing or ticking a word TO( can manage to do what
they want with the recognizer, the translator, or the translator's
actions.

Currently, in other Forth systems, obtaining an execution token may
differ from the example above, but there are no conceptual limitations.

So, conceptually, recognizers *allow* us to obtain/create an execution
token for any *unordinary word*, which identifies the behavior that
implements the interpretation semantics in interpretation state and the compilation semantics in compilation state for the word.

In general, recognizers are suitable in the following cases:
- the behavior cannot be implemented with a single word (as for a
string literal starter, numeric literals, etc);
- the recognizing of a lexeme (or the beginning of a construct)
should not depend on the search order (or be shadowed by words from the
search order);
- reuse of the Forth text interpreter for translating text by
different rules (especially, through nested input sources; for example, `execute-parsing` can be implemented using the Recognizer API);

Otherwise, using a parsing word is also perfectly suitable.

Whether it is a parsing word or another construct, it is better if the affected lexical block (its beginning and end) is visually marked.

Therefore, `to( foo )` is better than `to foo`.

--
Ruvim


From Ruvim@[email protected] to comp.lang.forth on Wed Jun 10 19:45:42 2026

From Newsgroup: comp.lang.forth

On 2026-06-09 10:25, Ruvim wrote:

In general, recognizers are suitable in the following cases:
- the behavior cannot be implemented with a single word (as for a
string literal starter, numeric literals, etc);
- the recognizing of a lexeme (or the beginning of a construct)
should not depend on the search order (or be shadowed by words from the search order);
- reuse of the Forth text interpreter for translating text by
different rules (especially, through nested input sources; for example, `execute-parsing` can be implemented using the Recognizer API);

Otherwise, using a parsing word is also perfectly suitable.

Whether it is a parsing word or another construct, it is better if the affected lexical block (its beginning and end) is visually marked.

Therefore, `to( foo )` is better than `to foo`.

The case of `to( ... )`, if implemented using a nested call to the Forth
text interpreter, falls into the third category — reusing the Forth text interpreter. However, we don't have a Forth text interpreter factor that
would apply only to the next lexical block in the input source. And even
if we did, it would not simplify the implementation, since in this case
the contained lexemes must be translated in *reverse order*.

Furthermore, we currently cannot use recognizers (namely, the perceptor,
the recognizer used by the Forth text interpreter) because we have
neither `to` factors applicable to typed data objects (results of recognizers), nor `to` factors applicable to xt of
`value`/`2value`/`fvalue` children and to local-id of local
variables/values.

But if we do have the latter `to` factors, then using them and the
perceptor, the word `to(` can be implemented as follows (with a bit of compatibility layer at the beginning):

: equals ( sd1 sd2 -- flag ) compare 0= ;

synonym take-lexeme-maybe parse-name

: extract-lexeme ( -- sd )
begin take-lexeme-maybe dup if exit then
2drop refill 0=
until -39 throw \ "unexpected end of the input source"
;

' translate-xtval-setter constant td-xtval-setter
' translate-localid-setter constant td-localid-setter

: to(
extract-lexeme 2dup ")" equals if 2drop exit then
perceive ?found ( tdo ) \ "tdo" stands for "typed data object"
tdo>xtval? if ( xtval )
td-xtval-setter 2>r
else ( tdo ) tdo>localid? if ( localid )
?comp td-localid-setter 2>r
else -32 throw then then
recurse 2r> execute
; immediate

Using the perceptor and the `to` factors allows us to make the `to( ...
)` construct multi-line more easily than using the top-level word `to`,
since we don't have to allocate and free strings.

This approach works in SP-Forth/4.

The word `td-xtval-setter ( -- td )` returns the same type descriptor as
a proper recognizer returns for "->foo", where "foo" is a child of
`value`.

The word `td-localid-setter ( -- td )` returns the same type descriptor
as a proper recognizer returns for "->bar", where "bar" is a local variable.

The word `tdo>xtval? ( tdo1 -- xtval true | tdo1 false )` converts tdo
to xtval (a subtype of xt).

The word `tdo>localid? ( tdo1 -- localid true | tdo1 false )` converts
tdo to localid.

Note: Above I use the data type symbol "tdo" rather than the data type
symbol "translation" proposed by Anton, as the latter is too confusing in
this context, because the English "translation" means either an act of
translating, or a result of translating [1].

[1] <https://en.wiktionary.org/wiki/translation#Noun>

--
Ruvim

From Ruvim@[email protected] to comp.lang.forth on Sun Jun 14 12:41:34 2026

From Newsgroup: comp.lang.forth

On 2026-06-07 13:23, Anton Ertl wrote:
[...]

<https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11?hideDiff#reply-1623>.
REC-FORTH is proposed as a deferred word, but there is no requirement
to use IS on it. And unlike for value-flavoured words, there are
non-parsing words for dealing with deferred words: DEFER@ and DEFER!.

As I recall, you had objections regarding the `BASE` variable
(specifically, the use of `@` and `!` as methods to get/set its value). I
agree with them as well.

These objections also apply to the use of `DEFER@` and `DEFER!` (or `ACTION-OF` and `IS`) to get/set the value of a `DEFER` child.

For example, you wrote on 2024-10-05 in <[email protected]>:

| Forth-94 seems to have had some of that, though,
| with words like GET-CURRENT and SET-CURRENT instead
| of a (user) variable CURRENT that had existing
| practice at the time.
| I wish they had defined GET-BASE and SET-BASE
| instead of BASE.

I would like to understand why you do not apply the same reasoning to
the Recognizers API variant you proposed.

So if you want to define IS( ... ), there is no need to concern
yourself with non-standard words like (TO).

In the case of the Recognizer API, the issue related to the words `IS`/`ACTION-OF`/`DEFER@`/`DEFER!` concerns the party providing the API (including a program that acts as a "wrapper" for the system while
providing standard system functionality), rather than a program that
merely uses the API.

--
Ruvim


From anton@[email protected] (Anton Ertl) to comp.lang.forth on Mon Jun 15 17:17:25 2026

From Newsgroup: comp.lang.forth

Ruvim <[email protected]> writes:

On 2026-06-07 13:23, Anton Ertl wrote:
[...]

<https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11?hideDiff#reply-1623>.
REC-FORTH is proposed as a deferred word, but there is no requirement
to use IS on it. And unlike for value-flavoured words, there are
non-parsing words for dealing with deferred words: DEFER@ and DEFER!.

As I recall, you had objections regarding the `BASE` variable
(specifically, the use of `@` and `!` as methods to get/set its value). I
agree with them as well.

These objections also apply to the use of `DEFER@` and `DEFER!` (or
`ACTION-OF` and `IS`) to get/set the value of a `DEFER` child.

For example, you wrote on 2024-10-05 in
<[email protected]>:

| Forth-94 seems to have had some of that, though,
| with words like GET-CURRENT and SET-CURRENT instead
| of a (user) variable CURRENT that had existing
| practice at the time.
| I wish they had defined GET-BASE and SET-BASE
| instead of BASE.

I would like to understand why you do not apply the same reasoning to
the Recognizers API variant you proposed.

The reason why I would have preferred SET-BASE GET-BASE over the
(user) variable BASE is that it would have allowed to do the first
stage of two-stage division in SET-BASE, and then perform the second
stage in #.

If BASE had been a UVALUE, that would have been ok, too, because I
could have used SET-TO on BASE to perform the same optimization.

Eventually I found a different way to achieve much of the same
benefit, see <[email protected]>.

- anton

From Ruvim@[email protected] to comp.lang.forth on Tue Jun 16 17:34:12 2026

From Newsgroup: comp.lang.forth

On 2026-06-15 17:17, Anton Ertl wrote:

Ruvim <[email protected]> writes:

On 2026-06-07 13:23, Anton Ertl wrote:
[...]

<https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11?hideDiff#reply-1623>.
REC-FORTH is proposed as a deferred word, but there is no requirement
to use IS on it. And unlike for value-flavoured words, there are
non-parsing words for dealing with deferred words: DEFER@ and DEFER!.

As I recall, you had objections regarding the `BASE` variable
(specifically, the use of `@` and `!` as methods to get/set its value). I
agree with them as well.

These objections also apply to the use of `DEFER@` and `DEFER!` (or
`ACTION-OF` and `IS`) to get/set the value of a `DEFER` child.

For example, you wrote on 2024-10-05 in
<[email protected]>:

| Forth-94 seems to have had some of that, though,
| with words like GET-CURRENT and SET-CURRENT instead
| of a (user) variable CURRENT that had existing
| practice at the time.
| I wish they had defined GET-BASE and SET-BASE
| instead of BASE.

I would like to understand why you do not apply the same reasoning to
the Recognizers API variant you proposed.

The reason why I would have preferred SET-BASE GET-BASE over the
(user) variable BASE is that it would have allowed me to do the first
stage of two-stage division in SET-BASE, and then perform the second
stage in #.

If BASE had been a UVALUE, that would have been ok, too, because I
could have used SET-TO on BASE to perform the same optimization.

So, by all appearances, you see the advantages of using setters/getters.

`set-to` in Gforth lets you configure a setter for a child of `value`
or `defer` that `to` will then use. The problem with the `set-to`
approach is that it cannot be used in a standard program.

In contrast, getters and setters implemented as separate *ordinary*
words can be defined or redefined within a standard program. And they
are far simpler.

Eventually I found a different way to achieve much of the same
benefit, see <[email protected]>.

Thank you for the reference.

Other legacy variables are `>in` and `state`.

In each case, there may be specific reasons why a getter or setter is
needed.

However, the general objection against variables in an API is that they
do not allow a system (or a wrapper) to have additional actions that are
performed when getting or setting the value.

This objection also applies to the words created using `value` and
`defer` (perhaps to a lesser extent in complex systems like Gforth, but
fully so regarding simpler systems and programs/wrappers).
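
To illustrate the point, here is a small sketch in standard Forth. All names (`setting-cell`, `get-setting`, `set-setting`, `setting-changes`) are hypothetical and stand in for any API item. With getter and setter words, a wrapper can add an action simply by redefining the setter, which is not possible when the API exposes a variable and callers use `!` directly.

```forth
\ The API as the system might provide it (hypothetical names).
variable setting-cell   0 setting-cell !
: get-setting ( -- x ) setting-cell @ ;
: set-setting ( x -- ) setting-cell ! ;

\ A wrapper in a standard program adds an action to the setter.
\ Inside the new definition, SET-SETTING still finds the old word.
variable setting-changes   0 setting-changes !
: set-setting ( x -- )
   1 setting-changes +!      \ additional action: count the updates
   set-setting ;             \ then delegate to the original setter
```

Only code compiled after the redefinition sees the wrapped setter; code that was compiled earlier keeps calling the original one.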

--
Ruvim
--- Synchronet 3.22a-Linux NewsLink 1.2
