• ciforth model

    From albert@[email protected] to comp.lang.forth on Thu Apr 16 15:38:47 2026
    From Newsgroup: comp.lang.forth

    The cimodel is based on a memory map that is well defined.

    The most important difference from all preceding models is the
    central role of the dictionary entry concept.
    The dictionary is subdivided into mutually exclusive entries
    with no other data.

    Memory model

    - os system header (or disk boot code)
    - boot code
    - bespoke dictionary
        - dictionary entries
            - header
            - memory possessed by the previous header
    HERE
    - free dictionary
    - task frame
    - buffers

    There is a HIP (high level interpreter pointer) that points to high
    level code. NEXT executes the dictionary entry pointed to by HIP.

    Only the part up to HERE is present in the executable.

    The os system header is the obligatory OS-dependent code that makes the
    file loadable by the OS (or bootable from disk).
    Segment definitions may also be found here.

    boot code
    The boot code contains a safeguard of the external interface,
    for example the parameters that are passed to an executable.
    Then it initializes the data stack pointer DSP, the return stack
    pointer RSP, and the high level interpreter pointer HIP, and possibly
    other registers for special purposes.
    It imposes a structure on the task frame, and possibly the buffers,
    based on pertinent data that is stored in so-called user variables.
    (In order to save the system, user variables are changed
    to reflect the new state of the system.)
    The HIP is made to point to the first command
    of the word COLD. Then this first command is executed,
    which is indicated by "doing NEXT".

    Bespoke dictionary
    The reserved part of the dictionary is subdivided into dictionary
    entries.
    A d.e. is
    - name (string, i.e. possibly variable length)
    - Fixed size fields, normally the cell-size of the current Forth
        - C Code pointer to what can be executed.
        - D Data pointer to what can be fetched or stored.
        - F A bit array of flags, e.g. immediate flag.
        - L Link information to other dictionary entries.
        - N Contains a name or points to a name
      And possibly optional cells like
        - S Points to source code
        - X Whatever serves
    - Data possessed by the preceding entry.
      This data may be machine code, interpreted code or plain data.
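
    The field list above can be read as a record with one slot per letter.
    The following is a minimal Python model of that reading, not ciforth's
    actual layout: the class name DictEntry, the helper find and the bit
    value chosen for IMMEDIATE are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DictEntry:
    """One dictionary entry (d.e.): the fixed fields plus the data it owns."""
    name: str                            # N: the name (or a pointer to it)
    code: Optional[Callable] = None      # C: what can be executed
    data: object = None                  # D: what can be fetched or stored
    flags: int = 0                       # F: bit array, e.g. the immediate flag
    link: Optional["DictEntry"] = None   # L: link to another entry
    source: Optional[str] = None         # S: optional, points to source code
    extra: object = None                 # X: optional, whatever serves


IMMEDIATE = 0x01  # hypothetical bit position, not taken from ciforth


def find(latest, name):
    """Walk the link fields from the newest entry back to the oldest."""
    entry = latest
    while entry is not None:
        if entry.name == name:
            return entry
        entry = entry.link
    return None


# Two entries chained through L, newest first.
zless = DictEntry(name="0<")
plus = DictEntry(name="+", link=zless)
```

    Everything that is known about a word is reached from the one handle,
    which is the point the post makes about the dea.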

    Free dictionary
    The free dictionary can be allocated and then becomes bespoke.
    Strings, functions, buffers, and headers all can be allocated.

    Task frame
    A task frame consists of a data stack, terminal input buffer,
    return stack and user variables.
    It is so called because it is replicated if multiple
    tasks are running concurrently.
    Note that the data stack can run down to HERE, like in the FIG-model.

    The buffers
    The buffers are 1 Kbyte buffers. They are used for a block
    system that plays an important role as a library.
    They are locked and unlocked while in use and can serve
    for nested includes of files as well.

    Indirect threading model.
    A pointer to the first fixed field of a header is called a dea,
    dictionary entry address.
    Indirect threaded code means that the program counter is loaded
    with the content of the C field of a dea, thus effecting an indirect
    jump via the C field.

    An entry is identified by this handle. All manipulations and properties
    of "definitions", "words", "functions" and "buffers" refer to
    this address.
    Instead of having a plethora of relations between different fields,
    one finds the properties of a dea by passing through its fields.


    A dea containing a low level definition has a pointer to machine code
    in C.

    A variable or buffer contains code that returns the content
    of the D field pointing to storage (in general directly
    after the header.)
    A constant contains code that returns the content of the
    D field, where there is no implication of that being a pointer.

    A high level definition contains a pointer to a specific
    machine code called DOCOL in C.
    The D pointer points to an area (in general directly after
    the header) where a sequence of dea's is stored.
    Executing the definition means executing these dea's in order.

    An object (CREATE DOES> construct) contains a pointer to specific
    machine code called DODOES in C.
    The D pointer points to an area where a pointer to the DOES>
    code resides, followed by a data area.
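
    The entry kinds described above differ only in what C points to and how
    D is used. The following toy model in Python is a sketch under stated
    assumptions: a Python call stands in for the indirect jump through C,
    the names Entry, VM, docol, docon and do_exit are illustrative, and
    only low level, constant and high level entries are covered (no
    variables, no DODOES).

```python
class Entry:
    """A dea reduced to the fields the inner interpreter needs."""
    def __init__(self, name, code, data=None):
        self.name = name
        self.code = code   # C field: what gets executed
        self.data = data   # D field: a value, or a list of deas


class VM:
    def __init__(self):
        self.ds = []       # data stack
        self.rs = []       # return stack
        self.hip = None    # (thread, index), or None when nothing is running

    def next(self):
        """NEXT: execute the entry HIP points to, then advance HIP."""
        thread, i = self.hip
        dea = thread[i]
        self.hip = (thread, i + 1)
        dea.code(self, dea)          # stands in for the jump through C

    def run(self, dea):
        self.hip = None
        dea.code(self, dea)
        while self.hip is not None:
            self.next()


def docol(vm, dea):
    """High level definition: D points to a sequence of deas."""
    vm.rs.append(vm.hip)
    vm.hip = (dea.data, 0)


def do_exit(vm, dea):
    """Return from a high level definition."""
    vm.hip = vm.rs.pop()


def docon(vm, dea):
    """Constant: return the content of D itself."""
    vm.ds.append(dea.data)


def plus_code(vm, dea):
    """Low level definition: C points straight at the 'machine code'."""
    b = vm.ds.pop()
    a = vm.ds.pop()
    vm.ds.append(a + b)


EXIT = Entry("EXIT", do_exit)
PLUS = Entry("+", plus_code)
TWO = Entry("2", docon, 2)
THREE = Entry("3", docon, 3)
FIVE = Entry("FIVE", docol, [TWO, THREE, PLUS, EXIT])
TEN = Entry("TEN", docol, [FIVE, FIVE, PLUS, EXIT])

vm = VM()
vm.run(TEN)
```

    Running TEN nests through FIVE twice via the return stack and leaves a
    single value on the data stack.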

    ANNEX
    In the 64 bits era a string constant is
    - a cell containing the length in bytes
    - string itself, not necessarily one char per byte.
    - alignment to 8 bytes.
    All fields are one cell.

    However we could squeeze for 16 bits, without logically affecting the model.

    code field: one byte, an offset into a code area of 256 bytes.
    data field: 16 bit pointer
    name field: 4 bytes, the first 3 chars and the last char. Only 7 bits
    each; the 8th bits count as flags
    flag field: hidden in the name
    link field: one byte offset, d.e. are at most 256 bytes long.
    The total of 8 bytes will put even the original fig model to shame.
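
    The 8 byte total can be checked by packing such a header. This is a
    sketch only: the post gives the field sizes but not their order, byte
    order, or what happens to names shorter than three characters, so the
    order code/data/name/link, the little-endian data pointer and the zero
    padding are all assumptions, as are the function names.

```python
import struct


def pack_name(name, flags=0):
    """4 bytes: the first three characters and the last one, 7 bits each.
    The top bit of each byte carries one of four flag bits."""
    chars = [ord(c) & 0x7F for c in name[:3]]
    chars += [0] * (3 - len(chars))        # assumed padding for short names
    chars.append(ord(name[-1]) & 0x7F)
    return bytes(c | (((flags >> i) & 1) << 7) for i, c in enumerate(chars))


def pack_header(code_off, data_ptr, name, link_off, flags=0):
    """1 byte code offset, 16 bit data pointer, 4 byte name, 1 byte link."""
    return struct.pack("<BH4sB", code_off, data_ptr,
                       pack_name(name, flags), link_off)


header = pack_header(0x10, 0x1234, "READ-LINE", 0x20)
```

    With this packing READ-FILE and READ-LINE produce the same name field,
    which is the kind of conflict any scheme that drops characters has to
    live with.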
    --
    The Chinese government is satisfied with its military superiority over USA.
    The next 5 year plan has as primary goal to advance life expectancy
    over 80 years, like Western Europe.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From dxf@[email protected] to comp.lang.forth on Fri Apr 17 11:44:49 2026
    From Newsgroup: comp.lang.forth

    On 16/04/2026 11:38 pm, [email protected] wrote:
    ...
    However we could squeeze for 16 bits, without logically affecting the model.

    code field: one byte, an offset to a code area of 256 bytes.
    data field: 16 bit pointer
    name filed: 4 byte, 3 first and last char. Only 7 bits,
    8th bit counts are flags
    flag field: hidden in the name
    link field: 256 bit offset, d.e. are at most 256 byte long.
    The total of 8 bytes will put even the original fig model to shame.

    Don't know who was first, but Fig-forth's variable length names are something
    that Forth Inc and pretty much everyone adopted. Moore attempted to defend
    '3 chars plus count' but to no avail. That ship had sailed.

  • From Paul Rubin@[email protected] to comp.lang.forth on Fri Apr 17 00:27:21 2026
    From Newsgroup: comp.lang.forth

    [email protected] writes:
    However we could squeeze for 16 bits, without logically affecting the model.

    These days for such a constrained target, it's probably best to tether
    from a bigger machine.
  • From anton@[email protected] (Anton Ertl) to comp.lang.forth on Fri Apr 17 07:29:44 2026
    From Newsgroup: comp.lang.forth

    dxf <[email protected]> writes:
    name filed: 4 byte, 3 first and last char. Only 7 bits,
    8th bit counts are flags
    ...
    Don't know who was first but Fig-forth's variable length names is something
    that Forth Inc and pretty much everyone adopted. Moore attempted to defend
    '3 chars plus count' but to no avail. That ship had sailed.

    Looking at the traditional length+3 chars and
    [email protected]'s 3 first and last, at least one pair of
    words in Forth-94 conflicts on both systems, and WORDS could show them
    as REA?????? and REA*E, respectively. It would be interesting to
    determine (say, by checking the words from an existing Forth system),
    which scheme produces more conflicts.
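
    The comparison suggested here is easy to set up. The following Python
    sketch assumes the two schemes reduce a name to the keys shown; the
    function names and the small sample list are illustrative, and a real
    measurement would feed in the word list of an existing Forth system.

```python
from collections import defaultdict


def key_len3(name):
    """Traditional scheme: length plus the first three characters."""
    return (len(name), name[:3])


def key_first3_last(name):
    """The 16 bit squeeze: first three characters plus the last one."""
    return (name[:3], name[-1])


def conflicts(names, key):
    """Group names by key and return the groups with more than one name."""
    groups = defaultdict(set)
    for n in names:
        groups[key(n.upper())].add(n.upper())
    return [sorted(g) for g in groups.values() if len(g) > 1]


sample = ["READ-FILE", "READ-LINE", "DUP", "DROP",
          "2DUP", "2DROP", "CREATE", "CR"]

by_length = conflicts(sample, key_len3)
by_last = conflicts(sample, key_first3_last)
```

    On this sample both schemes report the same single conflict,
    READ-FILE against READ-LINE; comparing the lengths of the two result
    lists on a full word list would answer the question.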

    Moore continues with this approach in Color Forth, but he uses some
    compression approach to usually store more characters in the number of
    bits he reserves for the name (IIRC 2 cells, with cell sizes of 20
    bits, 18 bits, and 32 bits on different hardware). I don't remember
    if he stores the length.

    Another option would be to store a hash value that is computed using
    all characters in the name. If a good hash function is used, the
    probability of a conflict is relatively small with, e.g. 4000 names in
    a wordlist (about the number of names that Gforth has in the Forth
    wordlist), and even the 28 bits that [email protected]
    provides. The probability of no conflict is approximately

    ((2^28-1)/(2^28))^((4000*3999)/2)

    i.e.

    1 28 lshift s>f fdup 1e f- fswap f/ 4000 dup 1- * 2/ s>f f** f.

    The result is 0.97, i.e., there is a 3% probability of conflict for
    these numbers.
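
    The same estimate can be reproduced outside Forth. A short Python
    check, assuming as the formula does that every pair of names collides
    independently with probability 2^-bits; the function name is
    illustrative.

```python
import math


def p_no_conflict(bits, names):
    """Probability that no pair among `names` hashed names collides."""
    pairs = names * (names - 1) // 2
    # log1p keeps precision for the tiny per-pair collision probability.
    return math.exp(pairs * math.log1p(-(2.0 ** -bits)))


p = p_no_conflict(28, 4000)
print(round(p, 2))        # 0.97
```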

    The disadvantage of this approach is that WORDS or SEE cannot even
    show the little about the name that Chuck Moore's approaches or
    [email protected]'s approach shows. But then, if you are so
    pressed for memory that you use one of these approaches, why not also
    save the memory for WORDS and SEE?

    Another disadvantage is that the system cannot tell if a redefinition
    warning comes from a hash conflict or from the name actually being
    redefined; but it shares this disadvantage with all approaches that do
    not store the full name.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/
  • From albert@[email protected] to comp.lang.forth on Fri Apr 17 12:10:34 2026
    From Newsgroup: comp.lang.forth

    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:
    dxf <[email protected]> writes:
    name filed: 4 byte, 3 first and last char. Only 7 bits,
    8th bit counts are flags
    ...
    Don't know who was first but Fig-forth's variable length names is something
    that Forth Inc and pretty much everyone adopted. Moore attempted to defend
    '3 chars plus count' but to no avail. That ship had sailed.

    Looking at the traditional length+3 chars and
    [email protected]'s 3 first and last, at least one pair of
    words in Forth-94 conflicts on both systems, and WORDS could show them
    as REA?????? and REA*E, respectively. It would be interesting to
    determine (say, by checking the words from an existing Forth system),
    which scheme produces more conflicts.

    The argument was that an extremely impractical Forth can be implemented
    with this model as guideline, to counter the argument of extreme waste
    that I expected.

    This is a realistic header in this model.
    The name takes 24 bytes or 3 cells: a pointer to an area, a preceding count
    and a 1 byte area padded to 8 bytes.


                            # *********
                            # *   +   *
                            # *********
                            #
    21c3 0000000000         .balign 8,0x00
    N_PLUS:                                     # Name string
    21c8 0100000000000000   .quad 1             # Name string
    21d0 2B                 .ASCII "+"          # Name string
    21d1 00000000000000     .balign 8,0x00      # Name string
    PLUS:                                       # 0x21D8 is the handle
    21d8 0000000000000000   .quad X_PLUS        # code
    21e0 0000000000000000   .quad PLUS+HEADSIZE # data ignored
    21e8 0000000000000000   .quad 0x0           # flags, empty
    21f0 0000000000000000   .quad ZLESS         # link pointer
    21f8 0000000000000000   .quad N_PLUS        # points to name
    2200 0000000000000000   .quad 0             # source field
    2208 0000000000000000   .quad 0             # extra field (spare)

    X_PLUS:
    2210 58                 POP %RAX            #(S1) <- (S1) + (S2)
    2211 5B                 POP %RBX
    2212 4801D8             ADD %RAX,%RBX
    2215 50                 PUSH %RAX
    2216 48AD               LODSQ               # NEXT
    2218 FF20               JMP QWORD PTR[%RAX]

    <SNIP>

    Print the name for + :
    HEX 21D8 >NFA $@ TYPE

    A total of 10 cells for the header alone.
    Who cares?

    lina+ -a
    AMDX86 ciforth beta 2026Apr12
    WANT UNUSED
    OK
    UNUSED .
    134221795712 OK


    Groetjes Albert
  • From albert@[email protected] to comp.lang.forth on Fri Apr 17 12:13:20 2026
    From Newsgroup: comp.lang.forth

    In article <[email protected]>,
    Paul Rubin <[email protected]d> wrote:
    [email protected] writes:
    However we could squeeze for 16 bits, without logically affecting the model.

    These days for such a constrained target, it's probably best to tether
    from a bigger machine.

    What was the argument about? See my answer to Anton Ertl.

    Groetjes Albert
  • From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sat Apr 18 10:26:11 2026
    From Newsgroup: comp.lang.forth

    [email protected] writes:
    The argument was that an extreme impractical Forth can be implemented
    with this model as guideline, to counter the argument of extreme waste
    that I expected.

    The users have voted with their feet: They usually have used the
    "extremely wasteful" fig-Forth with the default settings (names with
    up to 31-char) rather than using the "extremely impractical" option to
    only store the first n chars of a name (with n being configurable in
    fig-Forth). "Wastefulness" won over "Impracticality" so convincingly
    that even Forth, Inc. switched from "impracticality" to
    "wastefulness", as well as almost everyone else. The exception is
    Chuck Moore, who continues with the "impractical" approach in
    ColorForth. Even in the Forth universe, few people seem to use
    ColorForth and I have not heard of systems that follow its approach to
    names.

    Concerning memory consumption, Gforth's development version includes
    quite a bit of meta-information for two purposes:

    1) How the threaded code relates to the source code, not just where
    each definition starts, but also where words are used.

    2) How the machine code addresses relate to what is COMPILE,d, to get
    proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.

    Both sets of information are in big tables that are stored
    out-of-line, not with the headers, and each takes about as much space
    as the inline stuff (headers, threaded code, other bodies); the
    information where each definition is defined is stored in the headers,
    however. The result is that gforth.fi takes 2.3MB on my system; for
    comparison, the inline dictionary stuff is 686464 bytes, and the native
    code is 465671 bytes for gforth-fast and 892251 for gforth (also
    out-of-line, both not in the image).

    One may consider this wasteful, but I consider it good use of the RAM
    that our machines have; my PC from 1993 had 16MB, the one from 2015
    16GB, my current one 64GB.

    Concerning header size, here's what we have in Gforth (on a 64-bit system):

    here 5 constant five here over - dump
    403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00      five........
    403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00  ..:@.........U..
    403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00  `.0@............

    I.e., 6 cells for a word with a name that fits in one cell, and one
    cell of body. The cells are:

    403AC9C0: Name padded with spaces at the front to align it to a cell boundary
    403AC9C8: Name length and flags
    403AC9D0: link field (pointer to previous word in the same wordlist)
    403AC9D8: header methods (pointer to a method table)
    403AC9E0: code field (contains the code address of docon in this case)
    403AC9E8: body aka parameter field, contains the value in this case
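
    The six cells can be pulled out of the dump mechanically. A small
    Python sketch, assuming little-endian 64-bit cells as the byte order in
    the dump suggests; the variable names are illustrative, and the split
    between count and flag bits inside the second cell is not given in the
    post, so that cell is shown whole.

```python
import struct

# The 48 bytes shown in the dump, starting at 403AC9C0.
dump = bytes.fromhex(
    "20 20 20 20 66 69 76 65  04 00 00 00 00 00 00 00 "
    "08 C7 3A 40 00 00 00 00  81 EA A1 9D EA 55 00 00 "
    "60 AF 30 40 00 00 00 00  05 00 00 00 00 00 00 00"
)

# One 8-byte name cell followed by five little-endian cells.
name, count_flags, link, methods, code, body = struct.unpack("<8s5Q", dump)

print(name)                 # name cell, padded with spaces at the front
print(count_flags)          # name length and flags
print(hex(link))            # previous word in the same wordlist
print(hex(methods))         # method table
print(hex(code))            # code address of docon
print(body)                 # the constant's value
```

    For this word the second cell holds just 4, which matches the length
    of "five".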

    For more information, read

    @InProceedings{paysan&ertl19,
    author = {Bernd Paysan and M. Anton Ertl},
    title = {The new {Gforth} Header},
    crossref = {euroforth19},
    pages = {5--20},
    url = {http://www.euroforth.org/ef19/papers/paysan.pdf},
    url-slides = {http://www.euroforth.org/ef19/papers/paysan-slides.pdf},
    video = {https://wiki.forth-ev.de/doku.php/events:ef2019:header},
    OPTnote = {refereed},
    abstract = {The new Gforth header is designed to directly
    implement the requirements of Forth-94 and
    Forth-2012. Every header is an object with a fixed
    set of fields (code, parameter, count, name, link)
    and methods (\texttt{execute}, \texttt{compile,},
    \texttt{(to)}, \texttt{defer@}, \texttt{does},
    \texttt{name>interpret}, \texttt{name>compile},
    \texttt{name>string}, \texttt{name>link}). The
    implementation of each method can be changed
    per-word (prototype-based object-oriented
    programming). We demonstrate how to use these
    features to implement optimization of constants,
    \texttt{fvalue}, \texttt{defer}, \texttt{immediate},
    \texttt{to} and other dual-semantics words, and
    \texttt{synonym}.}
    }

    @Proceedings{euroforth19,
    title = {35th EuroForth Conference},
    booktitle = {35th EuroForth Conference},
    year = {2019},
    key = {EuroForth'19},
    url = {http://www.euroforth.org/ef19/papers/proceedings.pdf}
    }

    There have been a few changes since that paper: The body address is
    now used as nt and xt, and the "name length and flags" field has
    gained another flag or two.

    - anton
  • From peter@[email protected] to comp.lang.forth on Sat Apr 18 18:11:48 2026
    From Newsgroup: comp.lang.forth

    On Sat, 18 Apr 2026 10:26:11 GMT
    [email protected] (Anton Ertl) wrote:
    Concerning header size, here's what we have in Gforth (on a 64-bit system):

    here 5 constant five here over - dump
    403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
    403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
    403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............

    I.e., 6 cells for a word with a name that fits in one cell, and one
    cell of body. The cells are:

    403AC9C0: Name padded with spaces at the front to align it to a cell boundary
    403AC9C8: Name length and flags
    403AC9D0: link field (pointer to previous word in the same wordlist)
    403AC9D8: header methods (pointer to a method table)
    403AC9E0: code field (contains the code address of docon in this case)
    403AC9E8: body aka parameter field, contains the value in this case

    I like the idea to have the name at the beginning. I might try that also.
    In ntf64/lxf64 I have a compact header! Your example becomes

    n' five 16 - 32 dump
    0000000071EFF0 08 F0 71 00 30 70 42 00 A8 EF 71 00 03 00 10 00 ..q.0pB...q.....
    0000000071F000 A4 46 49 56 45 00 00 00 26 05 25 00 00 00 00 00 .FIVE...&.%.....

    71EFF0: xt to token code
    71EFF4: xt to native code
    71EFF8: link field
    71EFFC: length of token code
    71EFFE: length of native code
    71F000: flags (3 bits) and count (5 bits)
    71F001: name zero extended to 8 byte alignment
    71F008: token code

    Compilation is first to token code; this is then sent to the native code
    generator.
    SEE decompiles the token code and SEEA the native code.
    Code, headers and data have separate memory regions.
    A constant has no data associated with it.
    The constant is stored in the code.
    five in this example is a macro and will be inlined if called from
    another definition.

    see five macro
    Address OP Instruction
    0x71F008 26 05 LIT1 5
    0x71F00A 25 RET
    3 bytes, 2 instructions
    ok
    seea five
    0x427030 48895DF8 mov qword [rbp-0x8], rbx
    0x427034 48C7C305000000 mov rbx, 0x5
    0x42703B 488D6DF8 lea rbp, [rbp-0x8]
    0x42703F C3 ret
    16 bytes, 4 instructions
    ticking a word gives a double xt
    ' five h. $004270300071F008
    This way I can store an xt and both the token and native code can use it.
    It works as I compile the executable to be loaded at 0x400000.
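
    The compact header and the double xt can be decoded from the dump in a
    few lines. This Python sketch rests on assumptions read off the dump
    rather than stated in the post: little-endian fields, the count in the
    low 5 bits of the flags/count byte (0xA4 gives 4, the length of FIVE),
    and the native xt in the upper half of the double xt. All names are
    illustrative.

```python
import struct

BASE = 0x71EFF0   # address of the first dumped byte

dump = bytes.fromhex(
    "08 F0 71 00 30 70 42 00 A8 EF 71 00 03 00 10 00 "
    "A4 46 49 56 45 00 00 00 26 05 25 00 00 00 00 00"
)

# Three 32-bit fields followed by two 16-bit lengths.
token_xt, native_xt, link, token_len, native_len = struct.unpack_from(
    "<IIIHH", dump, 0)

flags_count = dump[16]
count = flags_count & 0x1F        # assumed: count in the low 5 bits
flags = flags_count >> 5          # assumed: flags in the high 3 bits
name = dump[17:17 + count].decode("ascii")

# The token code sits where the token xt points.
start = token_xt - BASE
token_code = dump[start:start + token_len]

# Both addresses fit in 32 bits, so one 64-bit cell can hold the pair.
double_xt = (native_xt << 32) | token_xt
print(name, token_code.hex(), hex(double_xt))
```

    The three token bytes found this way are the ones SEE lists for five,
    and the native length of 16 agrees with the SEEA output.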
    BR
    Peter
  • From albert@[email protected] to comp.lang.forth on Sat Apr 18 20:57:21 2026
    From Newsgroup: comp.lang.forth

    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:

    Concerning memory consumption, Gforth's development version includes
    quite a bit of meta-information for two purposes:

    1) How the threaded code relates to the source code, not just where
    each definition starts, but also where words are used.

    2) How the machine code addresses relate to what is COMPILE,d, to get
    proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.

    Both sets of informations are in big tables that are stored
    out-of-line, not with the headers, and each takes about as much space
    as the inline stuff (headers, threaded code, other bodies); the
    information where each definition is defined is stored in the headers,
    however. The result is that gforth.fi takes 2.3MB on my system; for
    comparison, the inline dictionary stuff is 686464 bytes, the native
    code of 465671 bytes for gforth-fast and 892251 for gforth (also
    out-of-line, both not in the image).

    Given that ctags understands Forth code, this seems to be a duplicated
    effort.
    ctags --lang=forth *.frt
    The advantage is that you can use emacs (or other sophisticated
    editors) to go to a function in the familiar way that you are used
    to from other languages.
    For those not familiar with ctags: in addition to definitions
    it also finds references. It is also blindingly fast:
    under 100 ms for hundreds of files.


    One may consider this wasteful, but I consider it good use of the RAM
    that our machines have; my PC from 1993 had 16MB, the one from 2015
    16GB, my current one 64GB.

    Concerning header size, here's what we have in Gforth (on a 64-bit system):

    here 5 constant five here over - dump
    403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
    403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
    403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............

    I.e., 6 cells for a word with a name that fits in one cell, and one
    cell of body. The cells are:

    This was exactly my point.

  • From anton@[email protected] (Anton Ertl) to comp.lang.forth on Sun Apr 19 11:08:26 2026
    From Newsgroup: comp.lang.forth

    [email protected] writes:
    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:

    Concerning memory consumption, Gforth's development version includes
    quite a bit of meta-information for two purposes:

    1) How the threaded code relates to the source code, not just where
    each definition starts, but also where words are used.

    This is used for making backtraces more informative.

    2) How the machine code addresses relate to what is COMPILE,d, to get
    proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.

    3) There is also the where table that records where each word is
    actually used in the loaded source code, whether there is threaded
    code for it or not; it records interpretive use as well as immediate
    words where the threaded code is for a different word, if there is
    threaded code at all.

    Both sets of informations are in big tables that are stored
    out-of-line, not with the headers, and each takes about as much space
    as the inline stuff (headers, threaded code, other bodies); the
    information where each definition is defined is stored in the headers,
    however. The result is that gforth.fi takes 2.3MB on my system; for
    comparison, the inline dictionary stuff is 686464 bytes, the native
    code of 465671 bytes for gforth-fast and 892251 for gforth (also
    out-of-line, both not in the image).

    In view that ctags understands Forth code this seems to be a duplicate
    effort.
    ctags --lang=forth *.frt

    I have run ctags and etags with this option on the files from the Gray
    directory, and neither finds any of the definitions of TERM (which
    exist in calc.fs and oberon.fs); that's probably because they have
    been defined with a user-defined defining word.

    Due to this shortcoming of etags (and ctags), gforth has included
    etags.fs since very early, which really understands Forth code,
    because it hooks into the Gforth system and records a tag whenever a
    named word is defined. I dimly remember that we also have done ctags
    support (for vi users), but do not find anything about it at the
    moment.

    The advantage is that you can use emacs (or other sophisticated
    editors) to go to a function in a familiar way that you were used
    to in other languages too.

    For a long time, I thought that etags.fs is sufficient and we do not
    need to add LOCATE to Gforth, but once we implemented LOCATE, I found
    that I use it much more often than M-. (forth-find-tag).

    For those not familiar with ctags, in additions to definitions
    it finds also references. It also is blindingly fast.
    Under 100 mS for hundreds of files.

    So you may be claiming that ctags covers the job of the where table.
    I do not see how to achieve that. ctags has an option --cxref, but it
    just outputs the definitions in a different format. E.g., when I say

    ctags --lang=forth --cxref *.fs

    it shows:

    ...
    disjoint? 187 gray.fs : disjoint?
    empty 105 gray.fs : empty
    ...

    which shows the definitions of these words, but not the uses. And it
    also does not show the uses of DUP. By contrast, if I include
    gray.fs, and then say WHERE DUP, I get the following:

    [... 1128 lines of DUP uses in other files ...]
    gray.fs:85:2: dup @ = ; 1128
    gray.fs:88:2: dup ! ; 1129
    gray.fs:112:2: dup cells/set ! 1130
    gray.fs:120:3: dup @ , 1131
    gray.fs:295:18: source-location dup 2@ swap cr type 1132
    gray.fs:354:2: dup follow-set @ subset? 0= \ would everything stay the same 1133
    gray.fs:357:22: follow-set @ union dup follow-set ! 1134
    gray.fs:385:2: dup pass2 1135
    gray.fs:460:19: operand1 compute dup if 1136
    gray.fs:499:2: dup operand1 propagate 1137

    with the DUP use being highlighted. The where table, which consumes
    quite a bit of memory (827_776 bytes in gforth.fi for a 64-bit
    system), contains that information. I do not see anything in the
    ctags/etags manual that provides this functionality. So the LOCATE
    information (a cell for each dictionary entry) may be seen as
    duplicate information, but the WHERE information does not duplicate
    anything.
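
    What a where table holds can be illustrated with a few lines of Python.
    This is a rough sketch and not how Gforth builds its table: Gforth
    records uses as the text interpreter processes the source, whereas the
    code below only splits lines on whitespace, so it would also count
    names inside comments and strings and the names being defined. The
    function name, the sample file demo.fs and the zero-based columns are
    assumptions.

```python
from collections import defaultdict


def build_where(files):
    """files maps a file name to its source text.
    Returns word -> list of (file, line, column, source line)."""
    where = defaultdict(list)
    for fname, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            col = 0
            for token in line.split():
                # Only whitespace lies between col and the next token.
                col = line.index(token, col)
                where[token.upper()].append(
                    (fname, lineno, col, line.strip()))
                col += len(token)
    return where


files = {"demo.fs": ": square  dup * ;\n: cube  dup square * ;\n"}
where = build_where(files)

for fname, lineno, col, text in where["DUP"]:
    print(f"{fname}:{lineno}:{col}: {text}")
```

    A tags file stores one location per definition; a table like this one
    stores one location per use, which is why it grows with the amount of
    source loaded.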

    One may consider this wasteful, but I consider it good use of the RAM
    that our machines have; my PC from 1993 had 16MB, the one from 2015
    16GB, my current one 64GB.

    Concerning header size, here's what we have in Gforth (on a 64-bit system):

    here 5 constant five here over - dump
    403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
    403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
    403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............

    I.e., 6 cells for a word with a name that fits in one cell, and one
    cell of body. The cells are:

    This was exactly my point.

    Your point was that Gforth uses 6 cells for a word with a name that
    fits in a cell and where the body takes one cell?

    - anton
  • From albert@[email protected] to comp.lang.forth on Mon Apr 20 13:39:34 2026
    From Newsgroup: comp.lang.forth

    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:
    [email protected] writes:
    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:

    Concerning memory consumption, Gforth's development version includes
    quite a bit of meta-information for two purposes:

    1) How the threaded code relates to the source code, not just where
    each definition starts, but also where words are used.

    This is used for making backtraces more informative.

    2) How the machine code addresses relate to what is COMPILE,d, to get
    proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.

    3) There is also the where table that records where each word is
    actually used in the loaded source code, whether there is threaded
    code for it or not; it records interpretive use as well as immediate
    words where the threaded code is for a different word, if there is
    threaded code at all.

    There is no argument that the gforth supplies more functionality.

    <SNIP>
    I have run ctags and etags with this option on the files from the Gray
    directory, and both do not find any of the definitions of TERM (which
    exist in calc.fs and oberon.fs); that's probably because they have
    been defined with a user-defined defining word.

    This is a weak point of ctags. It was not conceived with an extensible
    language in mind.

    editors) to go to a function in a familiar way that you were used
    to in other languages too.

    For a long time, I thought that etags.fs is sufficient and we do not
    need to add LOCATE to Gforth, but once we implemented LOCATE, I found
    that I use it much more often than M-. (forth-find-tag).

    For those not familiar with ctags: in addition to definitions,
    it also finds references. It is also blindingly fast:
    under 100 ms for hundreds of files.

    So you may be claiming that ctags covers the job of the where table.
    I do not see how to achieve that. ctags has an option --cxref, but it
    just outputs the definitions in a different format. E.g., when I say


    <SNIP>
    ctags --lang=forth --cxref *.fs
    <SNIP>
    with the DUP use being highlighted. The where table, which consumes
    quite a bit of memory (827_776 bytes in gforth.fi for a 64-bit
    system), contains that information. I do not see anything in the
    ctags/etags manual that provides this functionality. So the LOCATE
    information (a cell for each dictionary entry) may be seen as
    duplicate information, but the WHERE information does not duplicate
    anything.

    I rarely use cross references. From my editor, the cross references
    of ctags suffice. Again, no argument that Gforth's facilities are more
    comprehensive.

    One may consider this wasteful, but I consider it good use of the RAM
    that our machines have; my PC from 1993 had 16MB, the one from 2015
    16GB, my current one 64GB.

    Concerning header size, here's what we have in Gforth (on a 64-bit system):

    here 5 constant five here over - dump
    403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
    403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
    403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............

    I.e., 6 cells for a word with a name that fits in one cell, and one
    cell of body. The cells are:

    This was exactly my point.

    Your point was that Gforth uses 6 cells for a word with a name that
    fits in a cell and where the body takes one cell?

    The point was that in this day and age you don't try to save a
    few cells here and there.

    The bottom line: is the superior cross reference an argument
    to trade mpeforth for gforth? Or can we make do with the weaker
    ctags (or dispense with such a facility)?


    - anton
    --
    The Chinese government is satisfied with its military superiority over USA.
    The next 5 year plan has as primary goal to advance life expectancy
    over 80 years, like Western Europe.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Hans Bezemer@[email protected] to comp.lang.forth on Tue Apr 21 19:39:19 2026
    From Newsgroup: comp.lang.forth

    On 17-04-2026 09:29, Anton Ertl wrote:
    dxf <[email protected]> writes:
    name field: 4 bytes, 3 first and last char. Only 7 bits each;
    the 8th bits are used as flags
    ...
    Don't know who was first but Fig-forth's variable length names is something
    that Forth Inc and pretty much everyone adopted. Moore attempted to defend
    '3 chars plus count' but to no avail. That ship had sailed.

    Looking at the traditional length+3 chars and
    [email protected]'s 3 first and last, at least one pair of
    words in Forth-94 conflicts on both systems, and WORDS could show them
    as REA?????? and REA*E, respectively. It would be interesting to
    determine (say, by checking the words from an existing Forth system),
    which scheme produces more conflicts.
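
    Such a comparison could be sketched as follows. This is a minimal
    Python sketch, not code from any of the systems discussed: the
    function names and the tiny sample list are made up for
    illustration, and READ-FILE/READ-LINE is presumably the Forth-94
    pair meant above. For a real answer, replace SAMPLE with the word
    list of an existing system.

```python
from collections import defaultdict

def key_len3(name):
    """Traditional scheme: length plus the first three characters."""
    return (len(name), name[:3])

def key_3last(name):
    """Alternative scheme: first three characters plus the last one."""
    return (name[:3], name[-1])

def conflicts(names, key):
    """Return the groups of names that the given scheme cannot tell apart."""
    groups = defaultdict(list)
    for name in names:
        groups[key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical sample; a real test would use the full word list of a system.
SAMPLE = ["READ-FILE", "READ-LINE", "CHAR+", "CHARS",
          "CREATE", "CREATE-FILE", "DUP", "DROP", "SWAP"]

for label, key in (("length+3", key_len3), ("3+last", key_3last)):
    found = conflicts(SAMPLE, key)
    print(label, len(found), found)
```

    In this sample READ-FILE and READ-LINE collide under both schemes,
    CHAR+ and CHARS only under length+3, and CREATE and CREATE-FILE only
    under 3+last, so the outcome on a real word list depends on which
    kind of near-duplicate is more common.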

    Moore continues with this approach in Color Forth, but he uses some
    compression approach to usually store more characters in the number of
    bits he reserves for the name (IIRC 2 cells, with cell sizes of 20
    bits, 18 bits, and 32 bits on different hardware). I don't remember
    if he stores the length.

    Another option would be to store a hash value that is computed using
    all characters in the name. If a good hash function is used, the
    probability of a conflict is relatively small with, e.g. 4000 names in
    a wordlist (about the number of names that Gforth has in the Forth
    wordlist), and even the 28 bits that [email protected]
    provides. The probability of no conflict is approximately

    ((2^28-1)/(2^28))^((4000*3999)/2)

    i.e.

    1 28 lshift s>f fdup 1e f- fswap f/ 4000 dup 1- * 2/ s>f f** f.

    The result is 0.97, i.e., there is a 3% probability of conflict for
    these numbers.
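
    The same estimate can be recomputed for other name counts and hash
    widths. This is a minimal Python sketch of the formula above; the
    function names are made up for illustration, and it assumes an
    ideal hash with uniformly distributed values. The second function
    is the usual exponential approximation of the same quantity.

```python
import math

def p_no_conflict(names: int, bits: int) -> float:
    """((2^bits - 1) / 2^bits) ^ (names * (names - 1) / 2)"""
    pairs = names * (names - 1) // 2
    slots = 2 ** bits
    return ((slots - 1) / slots) ** pairs

def p_no_conflict_exp(names: int, bits: int) -> float:
    """Approximation exp(-pairs / 2^bits), valid while pairs << 2^(2*bits)."""
    pairs = names * (names - 1) // 2
    return math.exp(-pairs / 2 ** bits)

for bits in (28, 32):
    print(bits, round(p_no_conflict(4000, bits), 4),
          round(p_no_conflict_exp(4000, bits), 4))
```

    For 4000 names and 28 bits both functions give about 0.97, matching
    the result above; a few more hash bits reduce the conflict
    probability quickly.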

    The disadvantage of this approach is that WORDS or SEE cannot even
    show the little about the name that Chuck Moore's approaches or
    [email protected]'s approach shows. But then, if you are so
    pressed for memory that you use one of these approaches, why not also
    save the memory for WORDS and SEE?

    Another disadvantage is that the system cannot tell if a redefinition
    warning comes from a hash conflict or from the name actually being
    redefined; but it shares this disadvantage with all approaches that do
    not store the full name.

    - anton

    It depends a lot on the hashing routine used. FNV1a is particularly
    good, having only 4 collisions on 215,000 words.

    (https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633)

    I use it in my uBasic/4tH interpreter to convert labels to line numbers :-)

    Yeah, labels don't need to be sequential. It's not ZX BASIC. After a
    decade I still have to see a collision. Full disclosure, max. source
    size is 16K, that's about 300-440 lines.
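
    For reference, 32-bit FNV-1a is short enough to show in full. This
    is a minimal Python sketch using the published 32-bit FNV offset
    basis and prime; it is not the code from uBasic/4tH. The fold to 28
    bits is an assumption added here to match the 28-bit name field
    discussed earlier in the thread.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193

def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: xor each byte in, then multiply by the FNV prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the result in 32 bits
    return h

def xor_fold(h: int, bits: int) -> int:
    """Reduce a 32-bit hash to `bits` bits by xor-ing the high part into the low part."""
    return ((h >> bits) ^ h) & ((1 << bits) - 1)

def name_hash_28(name: str) -> int:
    """Hypothetical helper: a 28-bit hash of a word name."""
    return xor_fold(fnv1a_32(name.encode("ascii")), 28)

for name in ("DUP", "READ-FILE", "READ-LINE"):
    print(name, hex(name_hash_28(name)))
```

    Unlike the truncation schemes, every character of the name
    contributes to the stored value, so READ-FILE and READ-LINE are no
    more likely to collide than any other pair.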

    In my only 16K like program there are about 65 subroutines, most of 'em
    one-liners. Yeah, it's been heavily Forthified. :-)

    4tH itself is a pseudo compiler and has (apart from a BRANCH
    instruction) no headers. It does have a great disassembler, though. With
    symbols. ;-)

    Thanks to Aaron!

    Hans Bezemer
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From albert@[email protected] to comp.lang.forth on Wed Apr 22 22:48:31 2026
    From Newsgroup: comp.lang.forth

    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:
    Another disadvantage is that the system cannot tell if a redefinition
    warning comes from a hash conflict or from the name actually being
    redefined; but it shares this disadvantage with all approaches that do
    not store the full name.

    In this context a hash conflict is a redefinition. Avoid, unless
    you intend to hide an earlier definition.


    - anton

    Groetjes Albert
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Paul Rubin@[email protected] to comp.lang.forth on Fri Apr 24 10:38:34 2026
    From Newsgroup: comp.lang.forth

    [email protected] writes:
    In this context a hash conflict is a redefinition. Avoid,

    Unclear how to avoid, other than by storing the entire name so you can
    detect the conflict.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From albert@[email protected] to comp.lang.forth on Sat Apr 25 11:54:01 2026
    From Newsgroup: comp.lang.forth

    In article <[email protected]>,
    Paul Rubin <[email protected]d> wrote:
    [email protected] writes:
    In this context a hash conflict is a redefinition. Avoid,

    Unclear how to avoid, other than by storing the entire name so you can
    detect the conflict.

    If you have a conflict, you rename the new offending definition,
    as you do now. If you redefine a word, it is intentional, and
    you can ignore the message.
    ciforth can use 10,000 character names, even store it in ALLOCATEd space.

    The essential point of ciforth is that there is one handle (dictionary
    entry address) to characterize a word/definition/procedure/data-object.
    Use that for everything to pass around.
    No words with "no name", only data structures where the name is left a
    zero pointer.
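
    The idea of a single handle can be illustrated with a toy model.
    This is a minimal Python sketch, not ciforth's actual layout: the
    class, its field names and the helper functions are hypothetical,
    and `None` stands in for the zero name pointer.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class DictEntry:
    """Toy dictionary entry; the object itself plays the role of the handle."""
    name: Optional[str]                  # None stands in for a zero name pointer
    code: Callable[["DictEntry"], Any]   # behaviour, given the entry itself
    data: Any = None                     # body / parameter data
    link: Optional["DictEntry"] = None   # previously defined entry

def execute(entry: DictEntry) -> Any:
    """Run an entry; callers only ever pass the handle around."""
    return entry.code(entry)

def find(latest: Optional[DictEntry], name: str) -> Optional[DictEntry]:
    """Walk the chain from the newest entry; nameless entries never match."""
    entry = latest
    while entry is not None:
        if entry.name == name:
            return entry
        entry = entry.link
    return None

def do_constant(entry: DictEntry) -> Any:
    """Behaviour of a constant: return the stored value."""
    return entry.data

five = DictEntry("five", do_constant, data=5)
anonymous = DictEntry(None, do_constant, data=42, link=five)

print(execute(find(anonymous, "five")), execute(anonymous))
```

    The nameless entry is executed through the same handle as the named
    one; it simply cannot be found by name.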

    Note that the model can be valid even if you use 4 character names.
    Jeez!

    Groetjes Albert
    --- Synchronet 3.21f-Linux NewsLink 1.2