Forum: War Ensemble BBS

ciforth model

From albert@[email protected] to comp.lang.forth on Thu Apr 16 15:38:47 2026

From Newsgroup: comp.lang.forth

The cimodel (ciforth model) is based on a well-defined memory map.

The most important difference from all preceding models is the
central role of the dictionary entry concept.
The dictionary is subdivided into non-overlapping entries,
with no other data in it.

Memory model

- os system header (or disk boot code)
- boot code
- bespoke dictionary
  - dictionary entries, each consisting of
    - header
    - memory possessed by the previous header
HERE
- free dictionary
- task frame
- buffers

There is a HIP (high level interpreter pointer) that points to high
level code. NEXT executes the dictionary entry pointed to by HIP.

Only the part up to HERE is present in the executable.

The os system header is the obligatory OS-dependent code that makes the
file loadable by the OS (or bootable from disk).
Definitions of segments may also be found here.

Boot code
The boot code first saves the external interface,
for example the parameters that are passed to an executable.
Then it initializes the data stack pointer DSP, the return stack pointer RSP,
the high level interpreter pointer HIP, and possibly other registers
used for special purposes.
It imposes a structure on the task frame, and possibly on the buffers, based on
pertinent data that is stored in so-called user variables.
(In order to save the system, the user variables are changed
to reflect the new state of the system.)
The HIP is made to point to the first command
of the word COLD. Then this first command is executed,
which is called "doing NEXT".

Bespoke dictionary
The reserved part of the dictionary is subdivided into dictionary
entries.
A dictionary entry (d.e.) consists of
- a name (a string, i.e. possibly of variable length)
- fixed-size fields, normally one cell of the current Forth each:
  - C Code pointer to what can be executed.
  - D Data pointer to what can be fetched or stored.
  - F A bit array of flags, e.g. the immediate flag.
  - L Link information to other dictionary entries.
  - N Contains a name or points to a name.
- possibly optional cells like
  - S Points to source code.
  - X Whatever serves.
- data possessed by the preceding entry.
This data may be machine code, interpreted code or plain data.

Free dictionary
The free dictionary can be allocated and then becomes bespoke.
Strings, functions, buffers, and headers all can be allocated.

Task frame
A task frame consists of a data stack, a terminal input buffer,
a return stack and user variables.
It is called a task frame because it is replicated when multiple
tasks are running concurrently.
Note that the data stack can run down to HERE, as in the FIG model.

The buffers
The buffers are 1 Kbyte buffers. They are used for a block
system, which plays an important role as a library.
They are locked while in use and unlocked afterwards, and can also
serve for nested file includes.

Indirect threading model
A pointer to the first fixed field of a header is called a dea,
dictionary entry address.
Indirect threaded code means that the program counter is loaded
with the contents of the C field of a dea, thus effecting an indirect
jump through the C field.

An entry is identified by its handle, the dea. All manipulations and properties
of "definitions", "words", "functions" and "buffers" refer to
this address.
Instead of having a plethora of relations between different fields,
one finds the properties of a dea by going through its fields.

A dea containing a low level definition has a pointer to machine code
in C.

A variable or buffer contains code that returns the content
of the D field, which points to storage (in general directly
after the header).
A constant contains code that returns the content of the
D field, where there is no implication of that being a pointer.

A high level definition contains a pointer to a specific
machine code called DOCOL in C.
The D pointer points to an area (in general directly after
the header) where a sequence of dea's is stored.
Executing the definition means executing these dea's in order.

An object (CREATE DOES> construct) contains a pointer to specific
machine code called DODOES in C.
The D pointer points to an area where a pointer to the DOES>
code resides, followed by a data area.

ANNEX
In the 64-bit era a string constant is
- a cell containing the length in bytes
- string itself, not necessarily one char per byte.
- alignment to 8 bytes.
All fields are one cell.

However, we could squeeze it for a 16-bit system without logically affecting the model.

code field: one byte, an offset into a code area of 256 bytes.
data field: 16-bit pointer.
name field: 4 bytes, the first 3 chars and the last char. Only 7 bits
of each char are used; the 8th bits serve as flags.
flag field: hidden in the name.
link field: 8-bit offset; d.e. are at most 256 bytes long.
The total of 8 bytes will put even the original fig model to shame.
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.
--- Synchronet 3.21f-Linux NewsLink 1.2

From dxf@[email protected] to comp.lang.forth on Fri Apr 17 11:44:49 2026

On 16/04/2026 11:38 pm, [email protected] wrote:

...
However, we could squeeze it for a 16-bit system without logically affecting the model.

code field: one byte, an offset into a code area of 256 bytes.
data field: 16-bit pointer.
name field: 4 bytes, the first 3 chars and the last char. Only 7 bits
of each char are used; the 8th bits serve as flags.
flag field: hidden in the name.
link field: 8-bit offset; d.e. are at most 256 bytes long.
The total of 8 bytes will put even the original fig model to shame.

Don't know who was first, but Fig-Forth's variable-length names are something
that Forth, Inc. and pretty much everyone else adopted. Moore attempted to defend
'3 chars plus count' but to no avail. That ship had sailed.

From Paul Rubin@[email protected] to comp.lang.forth on Fri Apr 17 00:27:21 2026

[email protected] writes:

However, we could squeeze it for a 16-bit system without logically affecting the model.

These days for such a constrained target, it's probably best to tether
from a bigger machine.

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Fri Apr 17 07:29:44 2026

dxf <[email protected]> writes:

name field: 4 bytes, the first 3 chars and the last char. Only 7 bits
of each char are used; the 8th bits serve as flags.

...

Don't know who was first, but Fig-Forth's variable-length names are something
that Forth, Inc. and pretty much everyone else adopted. Moore attempted to defend
'3 chars plus count' but to no avail. That ship had sailed.

Looking at the traditional length+3 chars and
[email protected]'s 3 first and last, at least one pair of
words in Forth-94 conflicts on both systems, and WORDS could show them
as REA?????? and REA*E, respectively. It would be interesting to
determine (say, by checking the words from an existing Forth system),
which scheme produces more conflicts.

Moore continues with this approach in Color Forth, but he uses some
compression approach to usually store more characters in the number of
bits he reserves for the name (IIRC 2 cells, with cell sizes of 20
bits, 18 bits, and 32 bits on different hardware). I don't remember
if he stores the length.

Another option would be to store a hash value that is computed using
all characters in the name. If a good hash function is used, the
probability of a conflict is relatively small with, e.g. 4000 names in
a wordlist (about the number of names that Gforth has in the Forth
wordlist), and even the 28 bits that [email protected]
provides. The probability of no conflict is approximately

((2^28-1)/(2^28))^((4000*3999)/2)

i.e.

1 28 lshift s>f fdup 1e f- fswap f/ 4000 dup 1- * 2/ s>f f** f.

The result is 0.97, i.e., there is a 3% probability of conflict for
these numbers.

The disadvantage of this approach is that WORDS or SEE cannot even
show the little about the name that Chuck Moore's approaches or [email protected]'s approach shows. But then, if you are so
pressed for memory that you use one of these approaches, why not also
save the memory for WORDS and SEE?

Another disadvantage is that the system cannot tell if a redefinition
warning comes from a hash conflict or from the name actually being
redefined; but it shares this disadvantage with all approaches that do
not store the full name.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/

From albert@[email protected] to comp.lang.forth on Fri Apr 17 12:10:34 2026

In article <[email protected]>,
Anton Ertl <[email protected]> wrote:

dxf <[email protected]> writes:

name field: 4 bytes, the first 3 chars and the last char. Only 7 bits
of each char are used; the 8th bits serve as flags.

...

Don't know who was first, but Fig-Forth's variable-length names are something
that Forth, Inc. and pretty much everyone else adopted. Moore attempted to defend
'3 chars plus count' but to no avail. That ship had sailed.

Looking at the traditional length+3 chars and
[email protected]'s 3 first and last, at least one pair of
words in Forth-94 conflicts on both systems, and WORDS could show them
as REA?????? and REA*E, respectively. It would be interesting to
determine (say, by checking the words from an existing Forth system),
which scheme produces more conflicts.

The point was that even an extremely impractical Forth can be implemented
with this model as a guideline, to counter the accusation of extreme waste
that I expected.

This is a realistic header in this model.
The name takes 24 bytes or 3 cells: a pointer to an area, a preceding count,
and a 1-byte area padded to 8 bytes.

2513 # *********
2514 # * + *
2515 # *********
2516 #
2517 21c3 00000000 .balign 8,0x00
2517 00
2518 N_PLUS: # Name string
2519 21c8 01000000 .quad 1 # Name string
2519 00000000 # Name string
2520 21d0 2B .ASCII "+" # Name string
2521 21d1 00000000 .balign 8,0x00 # Name string
2521 000000
2522 PLUS: # 0x21D8 is the handle
2523 21d8 00000000 .quad X_PLUS # code
2523 00000000
2524 21e0 00000000 .quad PLUS+HEADSIZE # data ignored
2524 00000000
2525 21e8 00000000 .quad 0x0 # flags, empty
2525 00000000
2526 21f0 00000000 .quad ZLESS # link pointer
2526 00000000
2527 21f8 00000000 .quad N_PLUS # points to name
2527 00000000
2528 2200 00000000 .quad 0 # source field
2528 00000000
2529 2208 00000000 .quad 0 # extra field (spare)
2529 00000000
2530
2531 X_PLUS:
2532
2533 2210 58 POP %RAX #(S1) <- (S1) + (S2)
2534 2211 5B POP %RBX
2535 2212 4801D8 ADD %RAX,%RBX
2536 2215 50 PUSH %RAX
2537 2216 48AD LODSQ # NEXT
2538 2218 FF20 JMP QWORD PTR[%RAX]
2539

<SNIP>

Print the name for + :
HEX 21D8 >NFA $@ TYPE

A total of 10 cells for the header alone.
Who cares?

lina+ -a
AMDX86 ciforth beta 2026Apr12
WANT UNUSED
OK
UNUSED .
134221795712 OK

Groetjes Albert

From albert@[email protected] to comp.lang.forth on Fri Apr 17 12:13:20 2026

In article <[email protected]>,
Paul Rubin <[email protected]d> wrote:

[email protected] writes:

However, we could squeeze it for a 16-bit system without logically affecting the model.

These days for such a constrained target, it's probably best to tether
from a bigger machine.

What was the argument about? See my answer to Anton Ertl.

Groetjes Albert

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sat Apr 18 10:26:11 2026

[email protected] writes:

The point was that even an extremely impractical Forth can be implemented
with this model as a guideline, to counter the accusation of extreme waste
that I expected.

The users have voted with their feet: They usually have used the
"extremely wasteful" fig-Forth with the default settings (names with
up to 31 chars) rather than using the "extremely impractical" option to
only store the first n chars of a name (with n being configurable in fig-Forth). "Wastefulness" won over "Impracticality" so convincingly
that even Forth, Inc. switched from "impracticality" to
"wastefulness", as well as almost everyone else. The exception is
Chuck Moore, who continues with the "impractical" approach in
ColorForth. Even in the Forth universe, few people seem to use
ColorForth and I have not heard of systems that follow its approach to
names.

Concerning memory consumption, Gforth's development version includes quite a bit of meta-information for two purposes:

1) How the threaded code relates to the source code, not just where
each definition starts, but also where words are used.

2) How the machine code addresses relate to what is COMPILE,d, to get
proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.

Both sets of information are in big tables that are stored
out-of-line, not with the headers, and each takes about as much space
as the inline stuff (headers, threaded code, other bodies); the
information where each definition is defined is stored in the headers,
however. The result is that gforth.fi takes 2.3MB on my system; for comparison, the inline dictionary stuff is 686464 bytes, the native
code of 465671 bytes for gforth-fast and 892251 for gforth (also
out-of-line, both not in the image).

One may consider this wasteful, but I consider it good use of the RAM
that our machines have; my PC from 1993 had 16MB, the one from 2015
16GB, my current one 64GB.

Concerning header size, here's what we have in Gforth (on a 64-bit system):

here 5 constant five here over - dump
403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............

I.e., 6 cells for a word with a name that fits in one cell, and one
cell of body. The cells are:

403AC9C0: Name padded with spaces at the front to align it to a cell boundary
403AC9C8: Name length and flags
403AC9D0: link field (pointer to previous word in the same wordlist)
403AC9D8: header methods (pointer to a method table)
403AC9E0: code field (contains the code address of docon in this case)
403AC9E8: body aka parameter field, contains the value in this case

For more information, read

@InProceedings{paysan&ertl19,
author = {Bernd Paysan and M. Anton Ertl},
title = {The new {Gforth} Header},
crossref = {euroforth19},
pages = {5--20},
url = {http://www.euroforth.org/ef19/papers/paysan.pdf},
url-slides = {http://www.euroforth.org/ef19/papers/paysan-slides.pdf},
video = {https://wiki.forth-ev.de/doku.php/events:ef2019:header},
OPTnote = {refereed},
abstract = {The new Gforth header is designed to directly
implement the requirements of Forth-94 and
Forth-2012. Every header is an object with a fixed
set of fields (code, parameter, count, name, link)
and methods (\texttt{execute}, \texttt{compile,},
\texttt{(to)}, \texttt{defer@}, \texttt{does},
\texttt{name>interpret}, \texttt{name>compile},
\texttt{name>string}, \texttt{name>link}). The
implementation of each method can be changed
per-word (prototype-based object-oriented
programming). We demonstrate how to use these
features to implement optimization of constants,
\texttt{fvalue}, \texttt{defer}, \texttt{immediate},
\texttt{to} and other dual-semantics words, and
\texttt{synonym}.}
}

@Proceedings{euroforth19,
title = {35th EuroForth Conference},
booktitle = {35th EuroForth Conference},
year = {2019},
key = {EuroForth'19},
url = {http://www.euroforth.org/ef19/papers/proceedings.pdf}
}

There have been a few changes since that paper: The body address is
now used as nt and xt, and the "name length and flags" field has
gained another flag or two.

- anton

From peter@[email protected] to comp.lang.forth on Sat Apr 18 18:11:48 2026

On Sat, 18 Apr 2026 10:26:11 GMT
[email protected] (Anton Ertl) wrote:

[email protected] writes:

The point was that even an extremely impractical Forth can be implemented
with this model as a guideline, to counter the accusation of extreme waste
that I expected.

The users have voted with their feet: They usually have used the
"extremely wasteful" fig-Forth with the default settings (names with
up to 31 chars) rather than using the "extremely impractical" option to
only store the first n chars of a name (with n being configurable in fig-Forth). "Wastefulness" won over "Impracticality" so convincingly
that even Forth, Inc. switched from "impracticality" to
"wastefulness", as well as almost everyone else. The exception is
Chuck Moore, who continues with the "impractical" approach in
ColorForth. Even in the Forth universe, few people seem to use
ColorForth and I have not heard of systems that follow its approach to
names.

Concerning memory consumption, Gforth's development version includes quite a bit of meta-information for two purposes:

1) How the threaded code relates to the source code, not just where
each definition starts, but also where words are used.

2) How the machine code addresses relate to what is COMPILE,d, to get
proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.

Both sets of informations are in big tables that are stored
out-of-line, not with the headers, and each takes about as much space
as the inline stuff (headers, threaded code, other bodies); the
information where each definition is defined is stored in the headers, however. The result is that gforth.fi takes 2.3MB on my system; for comparison, the inline dictionary stuff is 686464 bytes, the native
code of 465671 bytes for gforth-fast and 892251 for gforth (also
out-of-line, both not in the image).

One may consider this wasteful, but I consider it good use of the RAM
that our machines have; my PC from 1993 had 16MB, the one from 2015
16GB, my current one 64GB.

Concerning header size, here's what we have in Gforth (on a 64-bit system):

here 5 constant five here over - dump
403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............

I.e., 6 cells for a word with a name that fits in one cell, and one
cell of body. The cells are:

403AC9C0: Name padded with spaces at the front to align it to a cell boundary
403AC9C8: Name length and flags
403AC9D0: link field (pointer to previous word in the same wordlist)
403AC9D8: header methods (pointer to a method table)
403AC9E0: code field (contains the code address of docon in this case)
403AC9E8: body aka parameter field, contains the value in this case

I like the idea to have the name at the beginning. I might try that also.
In ntf64/lxf64 I have a compact header! Your example becomes
n' five 16 - 32 dump
0000000071EFF0 08 F0 71 00 30 70 42 00 A8 EF 71 00 03 00 10 00 ..q.0pB...q.....
0000000071F000 A4 46 49 56 45 00 00 00 26 05 25 00 00 00 00 00 .FIVE...&.%.....
71EFF0: xt to token code
71EFF4: xt to native code
71EFF8: link field
71EFFC: length of token code
71EFFE: length of native code
71F000: flags (3 bits) and count (5 bits)
71F001: name zero extended to 8 byte alignment
71F008: token code
Compilation is first to token code then this is sent to the native code generator
SEE decompiles the token code and SEEA the native code
Code Headers and Data have separate memory regions
A constant has no data associated with it.
The constant is stored in the code.
five in this example is a macro and will be inlined if called from
another definition.
see five macro
Address OP Instruction
0x71F008 26 05 LIT1 5
0x71F00A 25 RET
3 bytes, 2 instructions
ok
seea five
0x427030 48895DF8 mov qword [rbp-0x8], rbx
0x427034 48C7C305000000 mov rbx, 0x5
0x42703B 488D6DF8 lea rbp, [rbp-0x8]
0x42703F C3 ret
16 bytes, 4 instructions
ticking a word gives a double xt
' five h. $004270300071F008
This way I can store an xt and both the token and native code can use it.
It works as I compile the executable to be loaded at 0x400000.
BR
Peter

From albert@[email protected] to comp.lang.forth on Sat Apr 18 20:57:21 2026

In article <[email protected]>,
Anton Ertl <[email protected]> wrote:

Concerning memory consumption, Gforth's development version includes
quite a bit of meta-information for two purposes:

1) How the threaded code relates to the source code, not just where
each definition starts, but also where words are used.

2) How the machine code addresses relate to what is COMPILE,d, to get
proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.

Both sets of informations are in big tables that are stored
out-of-line, not with the headers, and each takes about as much space
as the inline stuff (headers, threaded code, other bodies); the
information where each definition is defined is stored in the headers,
however. The result is that gforth.fi takes 2.3MB on my system; for
comparison, the inline dictionary stuff is 686464 bytes, the native
code of 465671 bytes for gforth-fast and 892251 for gforth (also
out-of-line, both not in the image).

Given that ctags understands Forth code, this seems to be a duplicated
effort.
ctags --lang=forth *.frt
The advantage is that you can use emacs (or other sophisticated
editors) to go to a function in a familiar way that you were used
to in other languages too.
For those not familiar with ctags: in addition to definitions
it also finds references. It is also blindingly fast:
under 100 ms for hundreds of files.

One may consider this wasteful, but I consider it good use of the RAM
that our machines have; my PC from 1993 had 16MB, the one from 2015
16GB, my current one 64GB.

Concerning header size, here's what we have in Gforth (on a 64-bit system):

here 5 constant five here over - dump
403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............

I.e., 6 cells for a word with a name that fits in one cell, and one
cell of body. The cells are:

This was exactly my point.

From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sun Apr 19 11:08:26 2026

[email protected] writes:

In article <[email protected]>,
Anton Ertl <[email protected]> wrote:

Concerning memory consumption, Gforth's development version includes
quite a bit of meta-information for two purposes:

1) How the threaded code relates to the source code, not just where
each definition starts, but also where words are used.

This is used for making backtraces more informative.

2) How the machine code addresses relate to what is COMPILE,d, to get
proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.

3) There is also the where table that records where each word is
actually used in the loaded source code, whether there is threaded
code for it or not; it records interpretive use as well as immediate
words where the threaded code is for a different word, if there is
threaded code at all.

Both sets of informations are in big tables that are stored
out-of-line, not with the headers, and each takes about as much space
as the inline stuff (headers, threaded code, other bodies); the
information where each definition is defined is stored in the headers,
however. The result is that gforth.fi takes 2.3MB on my system; for
comparison, the inline dictionary stuff is 686464 bytes, the native
code of 465671 bytes for gforth-fast and 892251 for gforth (also
out-of-line, both not in the image).

Given that ctags understands Forth code, this seems to be a duplicated
effort.
ctags --lang=forth *.frt

I have run ctags and etags with this option on the files from the Gray directory, and neither finds any of the definitions of TERM (which
exist in calc.fs and oberon.fs); that's probably because they have
been defined with a user-defined defining word.

Due to this shortcoming of etags (and ctags), gforth has included
etags.fs since very early, which really understands Forth code,
because it hooks into the Gforth system and records a tag whenever a
named word is defined. I dimly remember that we also have done ctags
support (for vi users), but do not find anything about it at the
moment.

The advantage is that you can use emacs (or other sophisticated
editors) to go to a function in a familiar way that you were used
to in other languages too.

For a long time, I thought that etags.fs is sufficient and we do not
need to add LOCATE to Gforth, but once we implemented LOCATE, I found
that I use it much more often than M-. (forth-find-tag).

For those not familiar with ctags: in addition to definitions
it also finds references. It is also blindingly fast:
under 100 ms for hundreds of files.

So you may be claiming that ctags covers the job of the where table.
I do not see how to achieve that. ctags has an option --cxref, but it
just outputs the definitions in a different format. E.g., when I say

ctags --lang=forth --cxref *.fs

it shows:

...
disjoint? 187 gray.fs : disjoint?
empty 105 gray.fs : empty
...

which shows the definitions of these words, but not the uses. And it
also does not show the uses of DUP. By contrast, if I include
gray.fs, and then say WHERE DUP, I get the following:

[... 1128 lines of DUP uses in other files ...]
gray.fs:85:2: dup @ = ; 1128
gray.fs:88:2: dup ! ; 1129
gray.fs:112:2: dup cells/set ! 1130
gray.fs:120:3: dup @ , 1131
gray.fs:295:18: source-location dup 2@ swap cr type 1132
gray.fs:354:2: dup follow-set @ subset? 0= \ would everything stay the same 1133
gray.fs:357:22: follow-set @ union dup follow-set ! 1134
gray.fs:385:2: dup pass2 1135
gray.fs:460:19: operand1 compute dup if 1136
gray.fs:499:2: dup operand1 propagate 1137

with the DUP use being highlighted. The where table, which consumes
quite a bit of memory (827_776 bytes in gforth.fi for a 64-bit
system), contains that information. I do not see anything in the
ctags/etags manual that provides this functionality. So the LOCATE
information (a cell for each dictionary entry) may be seen as
duplicate information, but the WHERE information does not duplicate
anything.

One may consider this wasteful, but I consider it good use of the RAM
that our machines have; my PC from 1993 had 16MB, the one from 2015
16GB, my current one 64GB.

Concerning header size, here's what we have in Gforth (on a 64-bit system):
here 5 constant five here over - dump
403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............

I.e., 6 cells for a word with a name that fits in one cell, and one
cell of body. The cells are:

This was exactly my point.

Your point was that Gforth uses 6 cells for a word with a name that
fits in a cell and where the body takes one cell?

- anton

From albert@[email protected] to comp.lang.forth on Mon Apr 20 13:39:34 2026

In article <[email protected]>,
Anton Ertl <[email protected]> wrote:

[email protected] writes:

In article <[email protected]>,
Anton Ertl <[email protected]> wrote:

Concerning memory consumption, Gforth's development version includes
quite a bit of meta-information for two purposes:

1) How the threaded code relates to the source code, not just where
each definition starts, but also where words are used.

This is used for making backtraces more informative.

2) How the machine code addresses relate to what is COMPILE,d, to get
proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.

3) There is also the where table that records where each word is
actually used in the loaded source code, whether there is threaded
code for it or not; it records interpretive use as well as immediate
words where the threaded code is for a different word, if there is
threaded code at all.

There is no argument that Gforth supplies more functionality.

<SNIP>

I have run ctags and etags with this option on the files from the Gray
directory, and neither finds any of the definitions of TERM (which
exist in calc.fs and oberon.fs); that's probably because they have
been defined with a user-defined defining word.

This is a weak point of ctags. It was not conceived with an extensible
language in mind.

editors) to go to a function in a familiar way that you were used
to in other languages too.

For a long time, I thought that etags.fs is sufficient and we do not
need to add LOCATE to Gforth, but once we implemented LOCATE, I found
that I use it much more often than M-. (forth-find-tag).

For those not familiar with ctags: in addition to definitions
it also finds references. It is also blindingly fast:
under 100 ms for hundreds of files.

So you may be claiming that ctags covers the job of the where table.
I do not see how to achieve that. ctags has an option --cxref, but it
just outputs the definitions in a different format. E.g., when I say

<SNIP>

ctags --lang=forth --cxref *.fs

<SNIP>

with the DUP use being highlighted. The where table, which consumes
quite a bit of memory (827_776 bytes in gforth.fi for a 64-bit
system), contains that information. I do not see anything in the
ctags/etags manual that provides this functionality. So the LOCATE information (a cell for each dictionary entry) may be seen as
duplicate information, but the WHERE information does not duplicate
anything.

I rarely use cross references. From my editor, the cross reference
of ctags suffices. Again, no argument that Gforth's facilities are more comprehensive.

One may consider this wasteful, but I consider it good use of the RAM that our machines have; my PC from 1993 had 16MB, the one from 2015
16GB, my current one 64GB.

Concerning header size, here's what we have in Gforth (on a 64-bit system):
here 5 constant five here over - dump
403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............

I.e., 6 cells for a word with a name that fits in one cell, and one
cell of body. The cells are:

This was exactly my point.

Your point was that Gforth uses 6 cells for a word with a name that
fits in a cell and where the body takes one cell?

The point was that in this time and age you don't try to save a
few cells here and there.

The bottom line is: in view of the superior cross reference, is
that an argument to trade mpeforth for gforth? Or can we make
do with the weaker ctags (or dispense with such a facility)?

- anton

--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Hans Bezemer@[email protected] to comp.lang.forth on Tue Apr 21 19:39:19 2026

From Newsgroup: comp.lang.forth

On 17-04-2026 09:29, Anton Ertl wrote:

dxf <[email protected]> writes:

name field: 4 bytes, holding the first 3 chars and the last char. Only 7 bits
per char are used; the 8th bits serve as flags.

...

Don't know who was first, but Fig-Forth's variable-length names are something that Forth Inc and pretty much everyone adopted. Moore attempted to defend '3 chars plus count', but to no avail. That ship had sailed.

Looking at the traditional length+3 chars and
[email protected]'s 3 first and last, at least one pair of
words in Forth-94 conflicts on both systems, and WORDS could show them
as REA?????? and REA*E, respectively. It would be interesting to
determine (say, by checking the words from an existing Forth system),
which scheme produces more conflicts.
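To make the comparison concrete, here is a minimal sketch of a "first 3 chars plus last char" key, 7 bits per char, packed into 28 bits. The word names NTH-CHAR and NAME-KEY and the bit layout are hypothetical choices for illustration; flag bits and case folding are not modelled. Running such a key over a system's word list is one way the count of conflicts could be measured.

```forth
\ Hypothetical sketch: pack the first 3 chars and the last char of a
\ name, 7 bits each, into a 28-bit key.  Flag bits and case folding
\ are not modelled; the bit layout is an assumption.
: nth-char ( c-addr u i -- c )  \ i-th char (7 bits), or 0 past the end
  tuck > if + c@ $7F and else 2drop 0 then ;

: name-key ( c-addr u -- x )
  2dup dup 1- 0 max nth-char 21 lshift >r  \ last char -> bits 21..27
  2dup 2 nth-char 14 lshift >r             \ 3rd char  -> bits 14..20
  2dup 1 nth-char  7 lshift >r             \ 2nd char  -> bits  7..13
  0 nth-char                               \ 1st char  -> bits  0..6
  r> or r> or r> or ;
```

With this key, READ-FILE and READ-LINE map to the same value, as do aapx1 and aapy1, while SIZE and SIZO do not.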

Moore continues with this approach in Color Forth, but he uses some compression approach to usually store more characters in the number of
bits he reserves for the name (IIRC 2 cells, with cell sizes of 20
bits, 18 bits, and 32 bits on different hardware). I don't remember
if he stores the length.

Another option would be to store a hash value that is computed using
all characters in the name. If a good hash function is used, the
probability of a conflict is relatively small with, e.g. 4000 names in
a wordlist (about the number of names that Gforth has in the Forth
wordlist), and even the 28 bits that [email protected]
provides. The probability of no conflict is approximately

((2^28-1)/(2^28))^((4000*3999)/2)

i.e.

1 28 lshift s>f fdup 1e f- fswap f/ 4000 dup 1- * 2/ s>f f** f.

The result is 0.97, i.e., there is a 3% probability of conflict for
these numbers.
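The same figure follows from the usual birthday-problem approximation (1 - 1/N)^k ≈ e^(-k/N), with N = 2^28 possible keys and k = 4000*3999/2 name pairs:

```latex
P_{\text{no conflict}}
  = \left(\frac{2^{28}-1}{2^{28}}\right)^{\frac{4000 \cdot 3999}{2}}
  \approx \exp\!\left(-\frac{7\,998\,000}{268\,435\,456}\right)
  \approx e^{-0.0298}
  \approx 0.97
```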

The disadvantage of this approach is that WORDS or SEE cannot even
show the little about the name that Chuck Moore's approaches or [email protected]'s approach shows. But then, if you are so
pressed for memory that you use one of these approaches, why not also
save the memory for WORDS and SEE?

Another disadvantage is that the system cannot tell if a redefinition
warning comes from a hash conflict or from the name actually being
redefined; but it shares this disadvantage with all approaches that do
not store the full name.

- anton

It depends a lot on the hashing routine used. FNV1a is particularly
good, having only 4 collisions on 215,000 words.

(https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633)

I use it in my uBasic/4tH interpreter to convert labels to line numbers :-)
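For reference, a minimal sketch of the 32-bit FNV-1a hash in standard Forth, using the published FNV offset basis and prime. This is not the uBasic/4tH code; it assumes cells of at least 32 bits, and reducing the result to fewer bits (e.g. the 28 bits discussed above) would be a separate step.

```forth
\ Minimal sketch of 32-bit FNV-1a (offset basis and prime from the
\ published FNV parameters).  The AND keeps the result to 32 bits on
\ systems with 64-bit cells.
$811C9DC5 constant fnv-offset
$01000193 constant fnv-prime

: fnv1a32 ( c-addr u -- hash )
  over + swap                   \ end start
  fnv-offset rot rot            \ hash end start
  ?do
    i c@ xor                    \ hash := hash XOR byte
    fnv-prime * $FFFFFFFF and   \ hash := hash * prime mod 2^32
  loop ;
```

The standard test vectors are: the empty string hashes to $811C9DC5, and "a" hashes to $E40C292C.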

Yeah, labels don't need to be sequential. It's not ZX BASIC. After a
decade I have yet to see a collision. Full disclosure: the maximum source
size is 16K, which is about 300-440 lines.

My only program that comes close to 16K has about 65 subroutines, most of 'em one-liners. Yeah, it's been heavily Forthified. :-)

4tH itself is a pseudo compiler and has (apart from a BRANCH
instruction) no headers. It does have a great disassembler, though. With symbols. ;-)

Thanks to Aaron!

Hans Bezemer

From albert@[email protected] to comp.lang.forth on Wed Apr 22 22:48:31 2026

From Newsgroup: comp.lang.forth

In article <[email protected]>,
Anton Ertl <[email protected]> wrote:

Another disadvantage is that the system cannot tell if a redefinition
warning comes from a hash conflict or from the name actually being
redefined; but it shares this disadvantage with all approaches that do
not store the full name.

In this context a hash conflict is a redefinition. Avoid, unless
you intend to hide an earlier definition.

- anton

Groetjes Albert

From Paul Rubin@[email protected] to comp.lang.forth on Fri Apr 24 10:38:34 2026

From Newsgroup: comp.lang.forth

[email protected] writes:

In this context a hash conflict is a redefinition. Avoid,

Unclear how to avoid, other than by storing the entire name so you can
detect the conflict.

From albert@[email protected] to comp.lang.forth on Sat Apr 25 11:54:01 2026

From Newsgroup: comp.lang.forth

In article <[email protected]>,
Paul Rubin <[email protected]d> wrote:

[email protected] writes:

In this context a hash conflict is a redefinition. Avoid,

Unclear how to avoid, other than by storing the entire name so you can
detect the conflict.

If you have a conflict, you rename the new offending definition,
as you do now. If you redefine a word, it is intentional, and
you can ignore the message.
ciforth can handle 10,000-character names, and can even store them in ALLOCATEd space.

The essential point of ciforth is that there is one handle (dictionary
entry address) to characterize a word/definition/procedure/data-object.
Use that for everything to pass around.
No words with "no name", only data structures where the name is left a
zero pointer.

Note that the model can be valid even if you use 4-character names.
Jeez!

Groetjes Albert

From Paul Rubin@[email protected] to comp.lang.forth on Sat Apr 25 13:22:05 2026

From Newsgroup: comp.lang.forth

[email protected] writes:

If you have a conflict, you rename the new offending definition,
as you do now.

How do you know when there is a conflict? We're talking about a hash collision, right? Are we supposed to guarantee that the hash function
won't change between interpreter versions and that sort of thing?

"As you do now": well, no; I've never used a Forth that faced this
issue. All the ones I've used have stored the entire name instead of
hashing. I thought (or at least hoped) that the different lossy
compression schemes from the early days were historical artifacts due to
the very small machines of the era. By the time of the Commodore 64,
those tricks were not needed.

From albert@[email protected] to comp.lang.forth on Sun Apr 26 14:05:52 2026

From Newsgroup: comp.lang.forth

In article <[email protected]>,
Paul Rubin <[email protected]d> wrote:

[email protected] writes:

If you have a conflict, you rename the new offending definition,
as you do now.

How do you know when there is a conflict? We're talking about a hash collision, right? Are we supposed to guarantee that the hash function
won't change between interpreter versions and that sort of thing?

You know there is a conflict because of the message:
: aapx1 ; ISN'T UNIQUE \ because there was aapy1
You do not want a hash conflict, as the hash replaces the name.

"As you do now": well, no; I've never used a Forth that faced this

Yes, you do encounter name collisions; see below.

issue. All the ones I've used have stored the entire name instead of hashing. I thought (or at least hoped) that the different lossy
compression schemes from the early days were historical artifacts due to
the very small machines of the era. By the time of the Commodore 64,
those tricks were not needed.

You defined a constant SIZE, and then discovered that
the name was used in another part of the program. You then
rename it to THINGO-SIZE or some such.
This doesn't change a bit if you use 3+last names. Collisions are
more probable, but the first FIG-Forths were usable.

If you have a SIZE and then you define size, you have a conflict
caused by case-insensitivity. You rename the second size.
: SIZE ; ISN'T UNIQUE \ because there was size

How is this different?

Groetjes Albert

From peter@[email protected] to comp.lang.forth on Thu May 21 10:28:17 2026

From Newsgroup: comp.lang.forth

On Sat, 18 Apr 2026 10:26:11 GMT
[email protected] (Anton Ertl) wrote:

[email protected] writes:

The argument was that an extreme impractical Forth can be implemented
with this model as guideline, to counter the argument of extreme waste
that I expected.

The users have voted with their feet: They usually have used the
"extremely wasteful" fig-Forth with the default settings (names with
up to 31-char) rather than using the "extremely impractical" option to
only store the first n chars of a name (with n being configurable in fig-Forth). "Wastefulness" won over "Impracticality" so convincingly
that even Forth, Inc. switched from "impracticality" to
"wastefulness", as well as almost everyone else. The exception is
Chuck Moore, who continues with the "impractical" approach in
ColorForth. Even in the Forth universe, few people seem to use
ColorForth and I have not heard of systems that follow its approach to
names.

Concerning memory consumption, Gforth's development version includes quite a bit of meta-information for two purposes:

1) How the threaded code relates to the source code, not just where
each definition starts, but also where words are used.

2) How the machine code addresses relate to what is COMPILE,d, to get
proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.

Both sets of information are in big tables that are stored
out-of-line, not with the headers, and each takes about as much space
as the inline stuff (headers, threaded code, other bodies); the
information where each definition is defined is stored in the headers, however. The result is that gforth.fi takes 2.3MB on my system; for comparison, the inline dictionary stuff is 686464 bytes, the native
code of 465671 bytes for gforth-fast and 892251 for gforth (also
out-of-line, both not in the image).

One may consider this wasteful, but I consider it good use of the RAM
that our machines have; my PC from 1993 had 16MB, the one from 2015
16GB, my current one 64GB.

Concerning header size, here's what we have in Gforth (on a 64-bit system):

here 5 constant five here over - dump
403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............

I.e., 6 cells for a word with a name that fits in one cell, and one
cell of body. The cells are:

403AC9C0: Name padded with spaces at the front to align it to a cell boundary
403AC9C8: Name length and flags
403AC9D0: link field (pointer to previous word in the same wordlist)
403AC9D8: header methods (pointer to a method table)
403AC9E0: code field (contains the code address of docon in this case)
403AC9E8: body aka parameter field, contains the value in this case

For more information, read

I did go ahead and change the LXF64 header to something more like the
gforth one! This is what it looks like:

\ offset length purpose
\ -24-8n 8+8n counted name aligned and patched with zeros n=0,1,2,3
\ -16 8 xt (xt token + xt native)
\ -8 4 link
\ -4 2 Tlen Token code length
\ -2 2 Nlen Native code length
\ 0 1 flag byte <- NT points here
\ 1 1 offset to name from NT
\ 2 2 unused
\ 4 4 pointer to translate-name
\ 8 Tlen token code
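The table translates directly into address arithmetic on the NT. A minimal sketch of accessor words, assuming the offsets exactly as listed; NT>FLAG and NT>TRANS are names that appear later in the thread, while the other names are hypothetical.

```forth
\ Hypothetical accessors for the LXF64 header layout above (offsets
\ relative to the NT, which points at the flag byte).  NT>FLAG and
\ NT>TRANS appear later in the thread; the other names are invented.
: nt>xt    ( nt -- addr ) 16 - ;  \ 8 bytes: token xt + native xt
: nt>link  ( nt -- addr )  8 - ;  \ 4 bytes: link
: nt>tlen  ( nt -- addr )  4 - ;  \ 2 bytes: token code length
: nt>nlen  ( nt -- addr )  2 - ;  \ 2 bytes: native code length
: nt>flag  ( nt -- addr ) ;       \ 1 byte: flags
: nt>trans ( nt -- addr )  4 + ;  \ 4 bytes: pointer to translator
: nt>code  ( nt -- addr )  8 + ;  \ Tlen bytes of token code
: nt>name  ( nt -- c-addr u )     \ byte at NT+1 = distance back to name
  dup 1+ c@ - count ;
```

In the FIVE dump that follows, the NT is $7227B8: the byte at NT+1 is $18 (24), which leads back to the counted name at $7227A0, and the Tlen field holds 3, matching the three bytes of token code at $7227C0.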

Your example of five becomes

align-h here-h 5 constant five here-h over - dump
000000007227A0 04 46 49 56 45 00 00 00 C0 27 72 00 30 68 42 00 .FIVE....'r.0hB.
000000007227B0 F8 D0 71 00 03 00 10 00 20 18 00 00 50 00 A0 00 ..q..... ...P...
000000007227C0 26 05 25 00 00 00 00 00 00 00 00 00 00 00 00 00 &.%.............

I have the counted name aligned and zero padded at the start.
This will allow to compare 8 bytes at a time

: NCOMP ( addr addr' -- f ) \ compare counted name strings, 0 = match
dup c@ 1+
0 ?do
over i + @ over i + @
<> if 2drop unloop true exit then
8 +loop 2drop false ;

I have checked forth-wordlist and 71.5% of all words will require only one comparison. 1.5% will require more than 2 comparisons.
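NCOMP relies on both names being stored cell-aligned and zero-padded to a whole number of cells, so that the padding bytes compare equal. A minimal sketch of how such a name might be laid down; NAME-BUF, is a hypothetical helper, and NCOMP is repeated (stepping by CELL rather than a literal 8) so the block is self-contained.

```forth
\ Hypothetical helper: lay down a counted name cell-aligned and
\ zero-padded to whole cells, so NCOMP can compare a cell at a time.
: name-buf, ( c-addr u -- addr )
  align here >r
  dup 1+ aligned dup allot  r@ swap erase  \ reserve and zero the cells
  dup r@ c!                                \ count byte
  r@ 1+ swap move                          \ the characters
  r> ;

\ NCOMP as above, stepping by CELL instead of a literal 8.
: ncomp ( addr addr' -- f )  \ f = 0: names match
  dup c@ 1+ 0 ?do
    over i + @ over i + @
    <> if 2drop unloop true exit then
  cell +loop 2drop false ;
```

Because the count byte sits in the first cell, names of different lengths differ on the first comparison.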

The other interesting change I did was to put in a link to translate-name.
Each word now knows how to interpret, compile and postpone itself!

I have now 3 standard word types
translate-name
translate-name-immediate
translate-name-macro

This takes away all checks of the flag and following conditionals.
I could actually remove the flag byte.

I also introduced SET-TRANSLATOR that sets the translator of the
last defined word. This lets me define all state smart words
without state! S" illustrates this:

: [S"]
34 parse slit ; immediate

' ht-execute
:noname drop postpone [S"] ;
:noname drop [n'] [S"] lit, postpone ht-execute ;
create translate-s"
, , ,

: S"
34 parse dup >r pocket dup >r swap move r> r> ;

translate-s" set-translator

ht-execute executes the NT. [n'] returns the NT

C", TO, ACTION-OF, IS and S\" are implemented in similar ways.
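The translator record can be modelled in a few lines. This is a hypothetical sketch, not LXF64's code: it assumes the record is three cells laid down by `, , ,` after pushing the interpret, compile and postpone xts in that order (as in the S" example), so memory holds postpone, compile, interpret. The word names are invented for illustration.

```forth
\ Hypothetical sketch of a per-word translator record; this is not
\ LXF64's actual code.  The xts are pushed in the order interpret,
\ compile, postpone and stored with , , , so memory ends up as
\ postpone | compile | interpret.
: translator: ( xt-int xt-comp xt-post "name" -- )
  create , , , ;
: trans>postpone  ( trans -- xt ) @ ;
: trans>compile   ( trans -- xt ) cell+ @ ;
: trans>interpret ( trans -- xt ) 2 cells + @ ;

\ The text interpreter selects the action once, from STATE, so the
\ individual words no longer need to be state-smart.
: translate ( i*x nt trans -- j*x )
  state @ if trans>compile else trans>interpret then execute ;
```

With this layout, fetching the second cell yields the compile action.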

Now the recognizers are starting to make good sense!

BR
Peter

@InProceedings{paysan&ertl19,
author = {Bernd Paysan and M. Anton Ertl},
title = {The new {Gforth} Header},
crossref = {euroforth19},
pages = {5--20},
url = {http://www.euroforth.org/ef19/papers/paysan.pdf},
url-slides = {http://www.euroforth.org/ef19/papers/paysan-slides.pdf},
video = {https://wiki.forth-ev.de/doku.php/events:ef2019:header},
OPTnote = {refereed},
abstract = {The new Gforth header is designed to directly
implement the requirements of Forth-94 and
Forth-2012. Every header is an object with a fixed
set of fields (code, parameter, count, name, link)
and methods (\texttt{execute}, \texttt{compile,},
\texttt{(to)}, \texttt{defer@}, \texttt{does},
\texttt{name>interpret}, \texttt{name>compile},
\texttt{name>string}, \texttt{name>link}). The
implementation of each method can be changed
per-word (prototype-based object-oriented
programming). We demonstrate how to use these
features to implement optimization of constants,
\texttt{fvalue}, \texttt{defer}, \texttt{immediate},
\texttt{to} and other dual-semantics words, and
\texttt{synonym}.}
}

@Proceedings{euroforth19,
title = {35th EuroForth Conference},
booktitle = {35th EuroForth Conference},
year = {2019},
key = {EuroForth'19},
url = {http://www.euroforth.org/ef19/papers/proceedings.pdf}
}

There have been a few changes since that paper: The body address is
now used as nt and xt, and the "name length and flags" field has
gained another flag or two.

- anton


From minforth@[email protected] to comp.lang.forth on Fri May 22 12:04:00 2026

From Newsgroup: comp.lang.forth

Am 21.05.2026 um 10:28 schrieb peter:

On Sat, 18 Apr 2026 10:26:11 GMT
[email protected] (Anton Ertl) wrote:

[email protected] writes:

The argument was that an extreme impractical Forth can be implemented
with this model as guideline, to counter the argument of extreme waste
that I expected.

The users have voted with their feet: They usually have used the
"extremely wasteful" fig-Forth with the default settings (names with
up to 31-char) rather than using the "extremely impractical" option to
only store the first n chars of a name (with n being configurable in
fig-Forth). "Wastefulness" won over "Impracticality" so convincingly
that even Forth, Inc. switched from "impracticality" to
"wastefulness", as well as almost everyone else. The exception is
Chuck Moore, who continues with the "impractical" approach in
ColorForth. Even in the Forth universe, few people seem to use
ColorForth and I have not heard of systems that follow its approach to
names.

Concerning memory consumption, Gforth's development version includes quite a bit of meta-information for two purposes:

1) How the threaded code relates to the source code, not just where
each definition starts, but also where words are used.

2) How the machine code addresses relate to what is COMPILE,d, to get
proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.

Both sets of information are in big tables that are stored
out-of-line, not with the headers, and each takes about as much space
as the inline stuff (headers, threaded code, other bodies); the
information where each definition is defined is stored in the headers,
however. The result is that gforth.fi takes 2.3MB on my system; for
comparison, the inline dictionary stuff is 686464 bytes, the native
code of 465671 bytes for gforth-fast and 892251 for gforth (also
out-of-line, both not in the image).

One may consider this wasteful, but I consider it good use of the RAM
that our machines have; my PC from 1993 had 16MB, the one from 2015
16GB, my current one 64GB.

Concerning header size, here's what we have in Gforth (on a 64-bit system):
here 5 constant five here over - dump
403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............

I.e., 6 cells for a word with a name that fits in one cell, and one
cell of body. The cells are:

403AC9C0: Name padded with spaces at the front to align it to a cell boundary
403AC9C8: Name length and flags
403AC9D0: link field (pointer to previous word in the same wordlist)
403AC9D8: header methods (pointer to a method table)
403AC9E0: code field (contains the code address of docon in this case)
403AC9E8: body aka parameter field, contains the value in this case

For more information, read

I did go ahead and change the LXF64 header to something more like the
gforth one! This is what it looks like:

\ offset length purpose
\ -24-8n 8+8n counted name aligned and patched with zeros n=0,1,2,3
\ -16 8 xt (xt token + xt native)
\ -8 4 link
\ -4 2 Tlen Token code length
\ -2 2 Nlen Native code length
\ 0 1 flag byte <- NT points here
\ 1 1 offset to name from NT
\ 2 2 unused
\ 4 4 pointer to translate-name
\ 8 Tlen token code

Your example of five becomes

align-h here-h 5 constant five here-h over - dump
000000007227A0 04 46 49 56 45 00 00 00 C0 27 72 00 30 68 42 00 .FIVE....'r.0hB.
000000007227B0 F8 D0 71 00 03 00 10 00 20 18 00 00 50 00 A0 00 ..q..... ...P...
000000007227C0 26 05 25 00 00 00 00 00 00 00 00 00 00 00 00 00 &.%.............

I have the counted name aligned and zero padded at the start.
This will allow to compare 8 bytes at a time

: NCOMP ( addr addr' -- f ) \ compare counted name strings, 0 = match
dup c@ 1+
0 ?do
over i + @ over i + @
<> if 2drop unloop true exit then
8 +loop 2drop false ;

I have checked forth-wordlist and 71.5% of all words will require only one comparison. 1.5% will require more than 2 comparisons.

The other interesting change I did was to put in a link to translate-name. Each word now knows how to interpret, compile and postpone itself!

I have now 3 standard word types
translate-name
translate-name-immediate
translate-name-macro

This takes away all checks of the flag and following conditionals.
I could actually remove the flag byte.

I also introduced SET-TRANSLATOR that sets the translator of the
last defined word. This lets me define all state smart words
without state! S" illustrates this:

: [S"]
34 parse slit ; immediate

' ht-execute
:noname drop postpone [S"] ;
:noname drop [n'] [S"] lit, postpone ht-execute ;
create translate-s"
, , ,

: S"
34 parse dup >r pocket dup >r swap move r> r> ;

translate-s" set-translator

ht-execute executes the NT. [n'] returns the NT

C", TO, ACTION-OF, IS and S\" are implemented in similar ways.

Now the recognizers are starting to make good sense!

My approach from a different angle: minimize header size AND have
dual execution tokens for STATE-independent definitions

Header struct: <name> (lfa,nfa,ccfa,cfa)
<name> name string
lfa link field address to preceding header
nfa pointer to <name>
ccfa compile-time code field address
cfa execution-time code field address

Different word types are recognised by comparing cfa with ccfa:
ccfa empty: "normal" word
cfa empty: compile-only word
cfa=ccfa: immediate word
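A minimal sketch of that classification, assuming "empty" means a zero field and that the remaining case (both fields set and different) is a word with dual execution tokens. Both assumptions and all names here are hypothetical.

```forth
\ Hypothetical classifier for the header scheme above.  "Empty" is
\ assumed to mean a zero field; the fourth case (both fields set and
\ different) is assumed to be a word with dual execution tokens.
0 constant normal-word
1 constant compile-only-word
2 constant immediate-word
3 constant dual-xt-word

: word-kind ( ccfa-contents cfa-contents -- kind )
  over 0= if 2drop normal-word       exit then  \ no compile-time xt
  dup  0= if 2drop compile-only-word exit then  \ no execution-time xt
  = if immediate-word else dual-xt-word then ;
```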


From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sat May 23 18:12:20 2026

From Newsgroup: comp.lang.forth

peter <[email protected]> writes:

I did go ahead and change the LXF64 header to something more like the
gforth one! This is what it looks like:

\ offset length purpose
\ -24-8n 8+8n counted name aligned and patched with zeros n=0,1,2,3
\ -16 8 xt (xt token + xt native)
\ -8 4 link
\ -4 2 Tlen Token code length
\ -2 2 Nlen Native code length
\ 0 1 flag byte <- NT points here
\ 1 1 offset to name from NT
\ 2 2 unused
\ 4 4 pointer to translate-name
\ 8 Tlen token code

...

The other interesting change I did was to put in a link to translate-name.
Each word now knows how to interpret, compile and postpone itself!

I have now 3 standard word types
translate-name
translate-name-immediate
translate-name-macro

This takes away all checks of the flag and following conditionals.
I could actually remove the flag byte.

In Gforth we did this by making the implementations of NAME>INTERPRET
and NAME>COMPILE word-specific:

Words with default compilation semantics have DEFAULT-NAME>COMP as implementation, immediate words have IMM>COMP as implementation, and
other words (e.g., S") have other implementations.

\ the actual implementation is a bit different, but this is the
\ easier-to-understand version.
: default-name>comp ( nt -- xt1 xt2 )
name>interpret ['] compile, ;

: imm>comp ( nt -- xt1 xt2 )
name>interpret ['] execute ;
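From the user's side, the pair returned by NAME>COMPILE is simply executed to perform a word's compilation semantics, whether the word is default (xt plus COMPILE,) or immediate (xt plus EXECUTE). A minimal sketch, assuming a system that provides FIND-NAME and NAME>COMPILE (e.g. Gforth); COMPILE-BY-NAME is a hypothetical name.

```forth
\ Minimal sketch, assuming a system with FIND-NAME and NAME>COMPILE
\ (e.g. Gforth).  NAME>COMPILE returns ( x xt ); EXECUTEing xt performs
\ the word's compilation semantics, whatever kind of word it is.
: compile-by-name ( c-addr u -- )
  find-name dup 0= -13 and throw   \ -13: undefined word
  name>compile execute ;

\ Usage: build SQ without naming DUP and * directly in the source.
: sq ( n -- n*n ) [ s" dup" compile-by-name  s" *" compile-by-name ] ;
```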

In Gforth translate-name does not differentiate between different
kinds of words; it always produces "nt translate-name" on success, and NAME>COMPILE takes care of the differences. My guess is that you do
it differently because you do not have NAME>COMPILE. Am I correct?

I also introduced SET-TRANSLATOR that sets the translator of the
last defined word. This lets me define all state smart words
without state! S" illustrates this:

: [S"]
34 parse slit ; immediate

' ht-execute
:noname drop postpone [S"] ;
:noname drop [n'] [S"] lit, postpone ht-execute ;
create translate-s"
, , ,

: S"
34 parse dup >r pocket dup >r swap move r> r> ;

translate-s" set-translator

Interesting.

ht-execute executes the NT. [n'] returns the NT

So you have NTs. Do you have NT>COMPILE? If so, the differences
between default and immediate and other words should already be
implemented there.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/

From peter@[email protected] to comp.lang.forth on Sat May 23 23:09:33 2026

From Newsgroup: comp.lang.forth

On Sat, 23 May 2026 18:12:20 GMT
[email protected] (Anton Ertl) wrote:

peter <[email protected]> writes:

I did go ahead and change the LXF64 header to something more like the
gforth one! This is what it looks like:

\ offset length purpose
\ -24-8n 8+8n counted name aligned and patched with zeros n=0,1,2,3
\ -16 8 xt (xt token + xt native)
\ -8 4 link
\ -4 2 Tlen Token code length
\ -2 2 Nlen Native code length
\ 0 1 flag byte <- NT points here
\ 1 1 offset to name from NT
\ 2 2 unused
\ 4 4 pointer to translate-name
\ 8 Tlen token code

...

The other interesting change I did was to put in a link to translate-name.
Each word now knows how to interpret, compile and postpone itself!

I have now 3 standard word types
translate-name
translate-name-immediate
translate-name-macro

This takes away all checks of the flag and following conditionals.
I could actually remove the flag byte.

In Gforth we did this by making the implementations of NAME>INTERPRET
and NAME>COMPILE word-specific:

Words with default compilation semantics have DEFAULT-NAME>COMP as implementation, immediate words have IMM>COMP as implementation, and
other words (e.g., S") have other implementations.

\ the actual implementation is a bit different, but this is the
\ easier-to-understand version.
: default-name>comp ( nt -- xt1 xt2 )
name>interpret ['] compile, ;

: imm>comp ( nt -- xt1 xt2 )
name>interpret ['] execute ;

I studied your linked document and slides and understood it worked
something like that. Seeing your VT table gave me the idea to
put in a link to the translate record.

In Gforth translate-name does not differentiate between different
kinds of words; it always produces "nt translate-name" on success, and NAME>COMPILE takes care of the differences. My guess is that you do
it differently because you do not have NAME>COMPILE. Am I correct?

No, I also have NAME>COMPILE, but it is no longer used.

I also introduced SET-TRANSLATOR that sets the translator of the
last defined word. This lets me define all state smart words
without state! S" illustrates this:

: [S"]
34 parse slit ; immediate

' ht-execute
:noname drop postpone [S"] ;
:noname drop [n'] [S"] lit, postpone ht-execute ;
create translate-s"
, , ,

: S"
34 parse dup >r pocket dup >r swap move r> r> ;

translate-s" set-translator

Interesting.

ht-execute executes the NT. [n'] returns the NT

So you have NTs. Do you have NT>COMPILE? If so, the differences
between default and immediate and other words should already be
implemented there.

HT stands for header token. I started using that long before the NT
was introduced. They are of course the same.

name>compile was defined as:

: NAME>COMPILE ( nt -- w xt )
dup nt>flag c@ 64 and if ['] ht-execute else ['] ht-compile, then ;

Now it is

: NAME>COMPILE ( nt -- w xt )
dup nt>trans l@ cell+ @ ;

ht-compile, is defined as

: HT-COMPILE, ( ht -- )
dup nt>flag c@ 32 and
if ht-expand-macro exit then
ht-,call ;

If the word is a macro it is expanded otherwise an ordinary
call is compiled.
About 50% of all words are macros.

With the 3 translate-name-xxx all the flag testing is gone!
The right execution path is set at creation time.

The ability to set a specific translation record for the
state smart words comes as an extra benefit.

BR
Peter

- anton


From peter@[email protected] to comp.lang.forth on Sun May 24 10:07:09 2026

From Newsgroup: comp.lang.forth

On Sat, 23 May 2026 18:12:20 GMT
[email protected] (Anton Ertl) wrote:

peter <[email protected]> writes:

I did go ahead and change the LXF64 header to something more like the
gforth one! This is what it looks like:

\ offset length purpose
\ -24-8n 8+8n counted name aligned and patched with zeros n=0,1,2,3
\ -16 8 xt (xt token + xt native)
\ -8 4 link
\ -4 2 Tlen Token code length
\ -2 2 Nlen Native code length
\ 0 1 flag byte <- NT points here
\ 1 1 offset to name from NT
\ 2 2 unused
\ 4 4 pointer to translate-name
\ 8 Tlen token code

...

The other interesting change I did was to put in a link to translate-name.
Each word now knows how to interpret, compile and postpone itself!

I have now 3 standard word types
translate-name
translate-name-immediate
translate-name-macro

This takes away all checks of the flag and following conditionals.
I could actually remove the flag byte.

In Gforth we did this by making the implementations of NAME>INTERPRET
and NAME>COMPILE word-specific:

Words with default compilation semantics have DEFAULT-NAME>COMP as implementation, immediate words have IMM>COMP as implementation, and
other words (e.g., S") have other implementations.

\ the actual implementation is a bit different, but this is the
\ easier-to-understand version.
: default-name>comp ( nt -- xt1 xt2 )
name>interpret ['] compile, ;

: imm>comp ( nt -- xt1 xt2 )
name>interpret ['] execute ;

I studied your linked document and slides and understood it worked
something like that. Seeing your VT table gave me the idea to
put in a link to the translate record.

In Gforth translate-name does not differentiate between different
kinds of words; it always produces "nt translate-name" on success, and NAME>COMPILE takes care of the differences. My guess is that you do
it differently because you do not have NAME>COMPILE. Am I correct?

No, I also have NAME>COMPILE, but it is no longer used.

I also introduced SET-TRANSLATOR that sets the translator of the
last defined word. This lets me define all state smart words
without state! S" illustrates this:

: [S"]
34 parse slit ; immediate

' ht-execute
:noname drop postpone [S"] ;
:noname drop [n'] [S"] lit, postpone ht-execute ;
create translate-s"
, , ,

: S"
34 parse dup >r pocket dup >r swap move r> r> ;

translate-s" set-translator

Interesting.

ht-execute executes the NT. [n'] returns the NT

So you have NTs. Do you have NT>COMPILE? If so, the differences
between default and immediate and other words should already be
implemented there.

HT stands for header token. I started using that long before the NT
was introduced. They are of course the same.

name>compile was defined as:

: NAME>COMPILE ( nt -- w xt )
dup nt>flag c@ 64 and if ['] ht-execute else ['] ht-compile, then ;

Now it is

: NAME>COMPILE ( nt -- w xt )
dup nt>trans l@ cell+ @ ;

ht-compile, is defined as

: HT-COMPILE, ( ht -- )
dup nt>flag c@ 32 and
if ht-expand-macro exit then
ht-,call ;

If the word is a macro it is expanded otherwise an ordinary
call is compiled.
About 50% of all words are macros.

With the 3 translate-name-xxx all the flag testing is gone!
The right execution path is set at creation time.

The ability to set a specific translation record for the
state smart words comes as an extra benefit.

To integrate the recognizers really well, I made REC-NAME the
primary name-finding function. FIND-NAME is then defined as:

: FIND-NAME ( caddr u -- ht | 0)
rec-name dup if drop then ;

Translate-none returns a null pointer in my system.

BR
Peter

- anton


From anton@[email protected] (Anton Ertl) to comp.lang.forth on Mon May 25 13:34:48 2026

From Newsgroup: comp.lang.forth

peter <[email protected]> writes:

On Sat, 23 May 2026 18:12:20 GMT
[email protected] (Anton Ertl) wrote:

peter <[email protected]> writes:

The other interesting change I did was to put in a link to translate-name.
Each word now knows how to interpret, compile and postpone itself!

I have now 3 standard word types
translate-name
translate-name-immediate
translate-name-macro

This takes away all checks of the flag and following conditionals.
I could actually remove the flag byte.

In Gforth we did this by making the implementations of NAME>INTERPRET
and NAME>COMPILE word-specific:

Words with default compilation semantics have DEFAULT-NAME>COMP as
implementation, immediate words have IMM>COMP as implementation, and
other words (e.g., S") have other implementations.

\ the actual implementation is a bit different, but this is the
\ easier-to-understand version.
: default-name>comp ( nt -- xt1 xt2 )
name>interpret ['] compile, ;

: imm>comp ( nt -- xt1 xt2 )
name>interpret ['] execute ;
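How a text interpreter can use the ( w xt ) pair from NAME>COMPILE, so that all word-specific differences stay inside NAME>COMPILE, can be sketched as follows. It assumes a system that provides FIND-NAME, NAME>INTERPRET and NAME>COMPILE (Gforth does); DO-NAME, DUP-NOW and MY-DUP are hypothetical names for illustration.

```forth
\ Assumes FIND-NAME, NAME>INTERPRET and NAME>COMPILE (e.g. Gforth).
\ DO-NAME, DUP-NOW and MY-DUP are hypothetical names.
: do-name ( i*x nt -- j*x )
  state @ if
    name>compile execute           \ ( w xt ): perform compilation semantics
  else
    name>interpret                 \ xt, or 0 if no interpretation semantics
    dup 0= abort" no interpretation semantics"
    execute
  then ;

3 s" dup" find-name do-name . .    \ interpreting: prints 3 3

: dup-now ( -- ) s" dup" find-name do-name ; immediate
: my-dup ( n -- n n ) dup-now ;    \ DUP's compilation semantics ran here
```

DO-NAME never looks at a flag; whether DUP is compiled or executed is decided by what NAME>COMPILE returns.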

I studied your linked document and slides and understood it worked
something like that. Seeing your VT table gave me the idea to
put in a link to the translate record.

Nowadays we call the table HM, for header methods. VT is too generic.
In development Gforth, you can see the header methods for a word by
using .HM on its NT. E.g.:

``+ .hm
opt: $7FA3C4A363D8
to: n/a
extra: $0

int: default-name>int
comp: default-name>comp
string: named>string
link: named>link

: NAME>COMPILE ( nt -- w xt )
dup nt>trans l@ cell+ @ ;

That's interesting. Instead of defining TRANSLATE-NAME's compilation
action in terms of NAME>COMPILE, you put the differences between
different names into TRANSLATE-NAME, and implement NAME>COMPILE by
accessing the internals of TRANSLATE-NAME.

With the 3 translate-name-xxx all the flag testing is gone!

Yes, we also eliminated nearly all flags with the new header format.
We kept a compile-only flag (for warning about compile-only words),
and added an obsolete flag (for warning about words that are going to
be removed from a future Gforth), because warnings do not introduce
complicated control flow.

The ability to set a specific translation record for the
state smart words comes as an extra benefit.

I guess you mean words with non-immediate non-default compilation
semantics, and yes, being able to tell the Forth system how it should
treat such a word at text interpretation time avoids the unpleasant
surprises that STATE-smart immediate words (that try to figure out at
run-time by inspecting STATE what they should do, but the STATE at
run-time does not provide information about whether their
interpretation semantics or compilation semantics is performed).

To really integrate the recognizers well I made REC-NAME the
primary name finding function. FIND-NAME is then defined as:

: FIND-NAME ( caddr u -- ht | 0)
rec-name dup if drop then ;

Translate-none returns a null pointer in my system.

Yes, we have written about the idea of unifying recognizers and
wordlists [paysan20]. Development Gforth implements this idea. E.g.,
if you do

s" dup" forth-wordlist execute

you find a translation on the stack, consisting of the nt of DUP and
of TRANSLATE-NAME. The implementations of FIND-NAME-IN and FIND-NAME
are:

: find-name-in ( c-addr u wid -- nt | 0 ) \ gforth
execute translate-none = IF 0 THEN ;

: find-name ( c-addr u -- nt | 0 ) \ gforth
['] rec-name find-name-in ;

The latter makes use of the fact that the recognizer sequence in
REC-NAME can be treated as a wordlist. That relies on the fact that
only wordlists are in the search order (and the search-order is in the
deferred word REC-NAME). If you put, e.g., REC-NUMBER into the search
order, the result will be that FIND-NAME will push a single-cell or
double-cell number when you pass it something that is recognized by
that REC-NUMBER. But that's the usual fare in Forth, if you hold it
wrong, it produces the wrong result.
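A small model of the nested-recognizer idea, including the REC-NUMBER caveat, assuming every recognizer returns either one token cell plus a translator or just a "not recognized" sentinel, and that BASE is decimal. Every name ending in -X is hypothetical; FIND-IN-X follows the shape of the FIND-NAME-IN shown above. Because REC-SEQUENCE-X is itself a recognizer, it can be passed wherever a wordlist-as-recognizer is expected, and since it contains a number recognizer, looking up "42" "finds" the number 42.

```forth
\ Hypothetical sentinel and placeholder translators.
create tr-none-x 0 ,     \ "not recognized"
create tr-word-x 0 ,     \ found in a wordlist
create tr-num-x  0 ,     \ unsigned single-cell number

: rec-forth-x ( c-addr u -- xt tr-word-x | tr-none-x )
  forth-wordlist search-wordlist if tr-word-x else tr-none-x then ;

: rec-unsigned-x ( c-addr u -- n tr-num-x | tr-none-x )
  dup 0= if 2drop tr-none-x exit then
  0 0 2swap >number nip            ( ud u-unconverted )
  if 2drop tr-none-x else drop tr-num-x then ;

\ A recognizer sequence: try each recognizer in order.
create recs-x  ' rec-forth-x ,  ' rec-unsigned-x ,
2 constant #recs-x

: rec-sequence-x ( c-addr u -- tok tr | tr-none-x )
  #recs-x 0 do
    2dup recs-x i cells + @ execute  ( c-addr u tok tr | c-addr u tr-none-x )
    dup tr-none-x <> if 2swap 2drop unloop exit then
    drop
  loop
  2drop tr-none-x ;

: find-in-x ( c-addr u xt-rec -- tok | 0 )
  execute tr-none-x = if 0 then ;

s" 42" ' rec-sequence-x find-in-x .   \ prints 42: the number was "found"
```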

The current proposal proposes the nested recognizers, but does not
require wordlists to work as recognizers.

@InProceedings{paysan20,
author = {Bernd Paysan and M. Anton Ertl},
title = {The Grand Recognizer Unification},
crossref = {euroforth20},
pages = {19--22},
url = {http://www.euroforth.org/ef20/papers/paysan.pdf},
url-slides = {http://www.euroforth.org/ef20/papers/paysan-slides.pdf},
video = {https://www.youtube.com/watch?v=VUi6uYqIbTI},
OPTnote = {not refereed},
abstract = {There is an obvious similarity between the search
order and a recognizer sequence, which has led to
similarities in proposed words (e.g.,
\code{get-recognizer} is modeled on
\code{get-order}). By turning word lists into
recognizers, we unify these concepts. We also turn
recognizer sequences (and by extension the search
order) into a recognizer, which allows nestable
recognizer sequences and wordlist sequences in the
search order. The implementation becomes simpler,
too.}
}
@Proceedings{euroforth20,
title = {36th EuroForth Conference},
booktitle = {36th EuroForth Conference},
year = {2020},
key = {EuroForth'20},
url = {http://www.euroforth.org/ef20/papers/proceedings.pdf}
}

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/

From peter@[email protected] to comp.lang.forth on Wed May 27 10:42:43 2026

On Mon, 25 May 2026 13:34:48 GMT
[email protected] (Anton Ertl) wrote:

To really integrate the recognizers well I made REC-NAME the
primary name finding function. FIND-NAME is then defined as:

: FIND-NAME ( caddr u -- ht | 0)
rec-name dup if drop then ;

Translate-none returns a null pointer in my system.

Yes, we have written about the idea of unifying recognizers and
wordlists [paysan20]. Development Gforth implements this idea. E.g.,
if you do

s" dup" forth-wordlist execute

you find a translation on the stack, consisting of the nt of DUP and
of TRANSLATE-NAME. The implementations of FIND-NAME-IN and FIND-NAME
are:

: find-name-in ( c-addr u wid -- nt | 0 ) \ gforth
execute translate-none = IF 0 THEN ;

: find-name ( c-addr u -- nt | 0 ) \ gforth
['] rec-name find-name-in ;

The latter makes use of the fact that the recognizer sequence in
REC-NAME can be treated as a wordlist. That relies on the fact that
only wordlists are in the search order (and the search-order is in the
deferred word REC-NAME). If you put, e.g., REC-NUMBER into the search
order, the result will be that FIND-NAME will push a single-cell or
double-cell number when you pass it something that is recognized by
that REC-NUMBER. But that's the usual fare in Forth, if you hold it
wrong, it produces the wrong result.

My first reaction when the idea of recognizers was presented was that
they should be combined with word-lists and search order. I have never
followed up on that idea. Interesting that you are doing it.

Instead, in the latest version REC-NAME has been slimmed down to avoid
unnecessary work. In LXF64 I store the names in uppercase. At find time
the parsed string also needs to be upper-cased and a hash calculated.
I now do this once and use the result for all comparisons in the
different word-lists. REC-NAME is defined as:

: REC-NAME ( c-addr n -- 0| ht translator )
  dup 0= if nip exit then            \ empty string
  #order @ 0= if 2drop 0 exit then   \ empty order
  copy-upcase-hash                   \ hash namestring in namebuf
  #order @
  begin
    dup
  while
    1- >r
    dup                              \ hash hash R:ordernr
    r@ cells context + @ swap        \ hash wid hash
    hash>bucket @ namebuf swap       \ hash name bucket
    search-bucket2                   \ hash nt|0
    dup
    if r>drop nip dup nt>trans l@ exit then
    drop r>
  repeat
  nip ;

copy-upcase-hash takes care of the work in just one loop and places
the string in namebuf. Earlier it was 3 passes over the string!
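A single pass that upper-cases, copies and hashes the name might look like the following. This is a sketch, not LXF64's COPY-UPCASE-HASH: the counted-string layout of NAMEBUF, the 31-character limit, the hash = hash*31 + char formula and the helper CHAR-UPCASE (ASCII only) are all assumptions.

```forth
\ Sketch with assumed layout: NAMEBUF is a counted string, names are
\ truncated to 31 characters, hash = hash*31 + char. ASCII only.
31 constant max-name
create namebuf  max-name 1+ chars allot

: char-upcase ( c -- C )
  dup [char] a [char] z 1+ within if 32 - then ;

: copy-upcase-hash ( c-addr u -- hash )
  max-name min                       \ truncate overlong names
  dup namebuf c!                     \ store the count
  0 swap 0 ?do                       ( c-addr hash )
    over i chars + c@ char-upcase    ( c-addr hash C )
    dup namebuf char+ i chars + c!   \ copy the upper-cased char
    swap 31 * +                      ( c-addr hash' )
  loop
  nip ;

\ "dup", "Dup" and "DUP" all give the same hash and the same NAMEBUF.
```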

hash>bucket calculates the bucket to search from the hash and wid.
Wordlists can have a different number of buckets.
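If each wordlist records its own bucket count, the bucket address can be computed from the hash and the wid roughly as below. The layout (one count cell followed by the bucket cells) and the names MAKE-HASHED-WL, SMALL-WL and BIG-WL are hypothetical; UM/MOD treats the hash as unsigned, so a hash that has wrapped around still gives a valid index.

```forth
\ Hypothetical layout: a wordlist table is one cell holding the bucket
\ count, followed by that many bucket cells (each a chain head).
: make-hashed-wl ( n "name" -- )
  create dup ,  0 ?do 0 , loop ;

: hash>bucket ( wid hash -- a-addr )
  over @ >r  0 r> um/mod drop        ( wid index )  \ unsigned hash mod count
  1+ cells + ;                       \ skip the count cell

8  make-hashed-wl small-wl
64 make-hashed-wl big-wl
\ The same hash lands in different buckets: 2081 mod 8 = 1, 2081 mod 64 = 33.
```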

This saved 2-3 ms in recompiling the whole Forth system!
Hardly measurable but still a 5% improvement!

The current proposal proposes the nested recognizers, but does not
require wordlists to work as recognizers.

I checked the text and slides but did not like everything that was
presented. I then downloaded a fresh tarball from gforth.org and
followed the instructions to install it on a Debian WSL instance.
It was not a good idea! Running install-deps installed 150 packages
totaling 640 MB! It looked like mostly graphics stuff. I run only slim,
console-only installations of Linux!

I think it should have warned me before starting the installation!

Although I saw install-deps compile SWIG with Forth support, configure
did not find the freshly compiled copy, and SWIG support was not available.

The gforth binary runs just fine and I could confirm that the
implementation was different and improved.

I think some instructions on how to compile without all this bloat
are needed. The bloat looks to be mainly for producing documentation
in different formats.

BR
Peter
