...
However, we could squeeze this for 16 bits without logically affecting the model.
code field: one byte, an offset into a code area of 256 bytes.
data field: 16-bit pointer
name field: 4 bytes, the first 3 chars and the last char. Only 7 bits each;
the 8th bits count as flags
flag field: hidden in the name
link field: one-byte offset; d.e. (dictionary entries) are at most 256 bytes long.
The total of 8 bytes will put even the original fig model to shame.
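To make the 8-byte count concrete, here is a rough Python sketch of such a header. The field order, the little-endian byte order, NUL padding for names shorter than three characters, and putting one flag bit in the top bit of each name byte are all assumptions for illustration; the names pack_header and name_key are hypothetical.

```python
import struct

def pack_header(name, code_offset, data_ptr, link_offset, flags=0):
    """Pack one 8-byte header: code(1) + data(2) + name(4) + link(1)."""
    if not name or any(ord(c) > 0x7F for c in name):
        raise ValueError("name must be non-empty 7-bit ASCII")
    if not 0 <= flags <= 0xF:
        raise ValueError("only four flag bits are available")
    # First three characters (NUL-padded, an assumption) plus the last one.
    chars = (name[:3].ljust(3, "\0") + name[-1]).encode("ascii")
    # Bit 7 of name byte i carries flag bit i (assumed placement).
    name_field = bytes(c | (((flags >> i) & 1) << 7)
                       for i, c in enumerate(chars))
    # "<" = little-endian, no padding: B = 1 byte, H = 2 bytes, 4s = 4 bytes.
    return struct.pack("<BH4sB", code_offset, data_ptr, name_field, link_offset)

def name_key(header):
    """Return the 28 name bits (flag bits masked off) used for lookup."""
    return bytes(b & 0x7F for b in header[3:7])

hdr = pack_header("READ-FILE", code_offset=0x12, data_ptr=0x3456,
                  link_offset=8, flags=0b0001)
print(len(hdr), hdr.hex(), name_key(hdr))
```

The four name bytes with their top bits removed give the 28 bits of name information referred to later in the thread.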
> However we could squeeze for 16 bits, without logically affecting the model.
> ...name field: 4 byte, 3 first and last char. Only 7 bits,
> 8th bit counts are flags
Don't know who was first, but Fig-forth's variable-length names are
something that Forth Inc and pretty much everyone adopted. Moore
attempted to defend '3 chars plus count' but to no avail. That ship
had sailed.
dxf <[email protected]> writes:
>> ...name field: 4 byte, 3 first and last char. Only 7 bits,
>> 8th bit counts are flags
> Don't know who was first but Fig-forth's variable length names is something
> that Forth Inc and pretty much everyone adopted. Moore attempted to defend
> '3 chars plus count' but to no avail. That ship had sailed.
Looking at the traditional length+3 chars and
[email protected]'s 3 first and last, at least one pair of
words in Forth-94 conflicts on both systems, and WORDS could show them
as REA?????? and REA*E, respectively. It would be interesting to
determine (say, by checking the words from an existing Forth system),
which scheme produces more conflicts.
- anton
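The comparison suggested above could be run along these lines. This is only a sketch: the word list is a small hand-picked sample rather than the vocabulary of a real system, so it says nothing about which scheme wins overall, and the function names are made up for the example. READ-FILE and READ-LINE appear to be a pair that matches the REA?????? / REA*E description.

```python
from collections import defaultdict

def key_len3(name):
    """Traditional scheme: name length plus the first three characters."""
    return (len(name), name[:3])

def key_3last(name):
    """Proposed scheme: first three characters plus the last character."""
    return (name[:3], name[-1])

def conflicts(names, key):
    """Group distinct names by key; return the groups holding more than one name."""
    groups = defaultdict(set)
    for n in names:
        groups[key(n.upper())].add(n.upper())
    return [sorted(g) for g in groups.values() if len(g) > 1]

# Hand-picked sample of standard words, not a measurement.
words = ["READ-FILE", "READ-LINE", "CELL+", "CELLS",
         "RESIZE", "RESIZE-FILE", "DUP", "DROP"]

for scheme in (key_len3, key_3last):
    print(scheme.__name__, conflicts(words, scheme))
```

To get an actual answer, replace the sample list with the output of WORDS from an existing system.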
[email protected] writes:
> However we could squeeze for 16 bits, without logically affecting the model.
These days for such a constrained target, it's probably best to tether
from a bigger machine.
The argument was that an extreme impractical Forth can be implemented
with this model as guideline, to counter the argument of extreme waste
that I expected.
[email protected] writes:
> The argument was that an extreme impractical Forth can be implemented
> with this model as guideline, to counter the argument of extreme waste
> that I expected.
The users have voted with their feet: They usually have used the
"extremely wasteful" fig-Forth with the default settings (names with
up to 31 chars) rather than using the "extremely impractical" option to
only store the first n chars of a name (with n being configurable in
fig-Forth). "Wastefulness" won over "Impracticality" so convincingly
that even Forth, Inc. switched from "impracticality" to
"wastefulness", as well as almost everyone else. The exception is
Chuck Moore, who continues with the "impractical" approach in
ColorForth. Even in the Forth universe, few people seem to use
ColorForth and I have not heard of systems that follow its approach to
names.
Concerning memory consumption, Gforth's development version includes
quite a bit of meta-information for two purposes:
1) How the threaded code relates to the source code, not just where
each definition starts, but also where words are used.
2) How the machine code addresses relate to what is COMPILE,d, to get
proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.
Both sets of information are in big tables that are stored
out-of-line, not with the headers, and each takes about as much space
as the inline stuff (headers, threaded code, other bodies); the
information where each definition is defined is stored in the headers,
however. The result is that gforth.fi takes 2.3MB on my system; for
comparison, the inline dictionary stuff is 686464 bytes, the native
code is 465671 bytes for gforth-fast and 892251 for gforth (also
out-of-line, both not in the image).
One may consider this wasteful, but I consider it good use of the RAM
that our machines have; my PC from 1993 had 16MB, the one from 2015
16GB, my current one 64GB.
Concerning header size, here's what we have in Gforth (on a 64-bit system):
here 5 constant five here over - dump
403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............
I.e., 6 cells for a word with a name that fits in one cell, and one
cell of body. The cells are:
403AC9C0: Name padded with spaces at the front to align it to a cell boundary
403AC9C8: Name length and flags
403AC9D0: link field (pointer to previous word in the same wordlist)
403AC9D8: header methods (pointer to a method table)
403AC9E0: code field (contains the code address of docon in this case)
403AC9E8: body aka parameter field, contains the value in this case
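For readers who want to check the dump against the cell list, the following Python sketch splits the 48 bytes shown above into six 64-bit cells. It assumes a little-endian machine (which the byte order in the dump suggests) and that no flag bits are set in the count cell of this particular word; the variable names are just for illustration.

```python
import struct

# Bytes copied from the dump above (three 16-byte rows).
dump = bytes.fromhex(
    "2020202066697665" "0400000000000000"
    "08C73A4000000000" "81EAA19DEA550000"
    "60AF304000000000" "0500000000000000"
)

name_cell = dump[:8]
# "<5Q": the five little-endian unsigned 64-bit cells after the name cell.
count_flags, link, methods, code, body = struct.unpack("<5Q", dump[8:])

# The name is right-aligned in its cell and padded with spaces at the front.
name = name_cell.decode("ascii").lstrip(" ")

print(f"name            {name!r}")
print(f"length+flags    {count_flags:#x}")
print(f"link field      {link:#x}")
print(f"header methods  {methods:#x}")
print(f"code field      {code:#x}")
print(f"body            {body}")
```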
For more information, read:
@InProceedings{paysan&ertl19,
author = {Bernd Paysan and M. Anton Ertl},
title = {The new {Gforth} Header},
crossref = {euroforth19},
pages = {5--20},
url = {http://www.euroforth.org/ef19/papers/paysan.pdf},
url-slides = {http://www.euroforth.org/ef19/papers/paysan-slides.pdf},
video = {https://wiki.forth-ev.de/doku.php/events:ef2019:header},
OPTnote = {refereed},
abstract = {The new Gforth header is designed to directly
implement the requirements of Forth-94 and
Forth-2012. Every header is an object with a fixed
set of fields (code, parameter, count, name, link)
and methods (\texttt{execute}, \texttt{compile,},
\texttt{(to)}, \texttt{defer@}, \texttt{does},
\texttt{name>interpret}, \texttt{name>compile},
\texttt{name>string}, \texttt{name>link}). The
implementation of each method can be changed
per-word (prototype-based object-oriented
programming). We demonstrate how to use these
features to implement optimization of constants,
\texttt{fvalue}, \texttt{defer}, \texttt{immediate},
\texttt{to} and other dual-semantics words, and
\texttt{synonym}.}
}
@Proceedings{euroforth19,
title = {35th EuroForth Conference},
booktitle = {35th EuroForth Conference},
year = {2019},
key = {EuroForth'19},
url = {http://www.euroforth.org/ef19/papers/proceedings.pdf}
}
There have been a few changes since that paper: The body address is
now used as nt and xt, and the "name length and flags" field has
gained another flag or two.
- anton
In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
> Concerning memory consumption, Gforth's development version includes
> quite a bit of meta-information for two purposes:
> 1) How the threaded code relates to the source code, not just where
> each definition starts, but also where words are used.
> 2) How the machine code addresses relate to what is COMPILE,d, to get
> proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.
> Both sets of information are in big tables that are stored
> out-of-line, not with the headers, and each takes about as much space
> as the inline stuff (headers, threaded code, other bodies); the
> information where each definition is defined is stored in the headers,
> however. The result is that gforth.fi takes 2.3MB on my system; for
> comparison, the inline dictionary stuff is 686464 bytes, the native
> code of 465671 bytes for gforth-fast and 892251 for gforth (also
> out-of-line, both not in the image).
Given that ctags understands Forth code, this seems to be a duplicate
effort.
ctags --lang=forth *.frt
The advantage is that you can use emacs (or other sophisticated
editors) to go to a function in a familiar way that you were used
to in other languages too.
For those not familiar with ctags: in addition to definitions,
it also finds references. It is also blindingly fast:
under 100 ms for hundreds of files.
> One may consider this wasteful, but I consider it good use of the RAM
> that our machines have; my PC from 1993 had 16MB, the one from 2015
> 16GB, my current one 64GB.
> Concerning header size, here's what we have in Gforth (on a 64-bit system):
> here 5 constant five here over - dump
> 403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
> 403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
> 403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............
> I.e., 6 cells for a word with a name that fits in one cell, and one
> cell of body. The cells are:
This was exactly my point.
[email protected] writes:
> In article <[email protected]>,
> Anton Ertl <[email protected]> wrote:
>> Concerning memory consumption, Gforth's development version includes
>> quite a bit of meta-information for two purposes:
>> 1) How the threaded code relates to the source code, not just where
>> each definition starts, but also where words are used.
This is used for making backtraces more informative.
>> 2) How the machine code addresses relate to what is COMPILE,d, to get
>> proper decompilation with SEE-CODE, SIMPLE-SEE, and SEE.
3) There is also the where table that records where each word is
actually used in the loaded source code, whether there is threaded
code for it or not; it records interpretive use as well as immediate
words where the threaded code is for a different word, if there is
threaded code at all.
> ctags --lang=forth *.frt
I have run ctags and etags with this option on the files from the Gray
directory, and neither finds any of the definitions of TERM (which
exist in calc.fs and oberon.fs); that's probably because they have
been defined with a user-defined defining word.
> The advantage is that you can use emacs (or other sophisticated
> editors) to go to a function in a familiar way that you were used
> to in other languages too.
For a long time, I thought that etags.fs is sufficient and we do not
need to add LOCATE to Gforth, but once we implemented LOCATE, I found
that I use it much more often than M-. (forth-find-tag).
> For those not familiar with ctags, in addition to definitions
> it finds also references. It also is blindingly fast.
> Under 100 ms for hundreds of files.
So you may be claiming that ctags covers the job of the where table.
I do not see how to achieve that. ctags has an option --cxref, but it
just outputs the definitions in a different format. E.g., when I say
ctags --lang=forth --cxref *.fs<SNIP>
with the DUP use being highlighted. The where table, which consumes
quite a bit of memory (827_776 bytes in gforth.fi for a 64-bit
system), contains that information. I do not see anything in the
ctags/etags manual that provides this functionality. So the LOCATE
information (a cell for each dictionary entry) may be seen as
duplicate information, but the WHERE information does not duplicate
anything.
>> One may consider this wasteful, but I consider it good use of the RAM
>> that our machines have; my PC from 1993 had 16MB, the one from 2015
>> 16GB, my current one 64GB.
>> Concerning header size, here's what we have in Gforth (on a 64-bit system):
>> here 5 constant five here over - dump
>> 403AC9C0: 20 20 20 20 66 69 76 65 - 04 00 00 00 00 00 00 00 five........
>> 403AC9D0: 08 C7 3A 40 00 00 00 00 - 81 EA A1 9D EA 55 00 00 ..:@.........U..
>> 403AC9E0: 60 AF 30 40 00 00 00 00 - 05 00 00 00 00 00 00 00 `.0@............
>> I.e., 6 cells for a word with a name that fits in one cell, and one
>> cell of body. The cells are:
> This was exactly my point.
Your point was that Gforth uses 6 cells for a word with a name that
fits in a cell and where the body takes one cell?
- anton
dxf <[email protected]> writes:
>> ...name field: 4 byte, 3 first and last char. Only 7 bits,
>> 8th bit counts are flags
> Don't know who was first but Fig-forth's variable length names is something
> that Forth Inc and pretty much everyone adopted. Moore attempted to defend
> '3 chars plus count' but to no avail. That ship had sailed.
Looking at the traditional length+3 chars and
[email protected]'s 3 first and last, at least one pair of
words in Forth-94 conflicts on both systems, and WORDS could show them
as REA?????? and REA*E, respectively. It would be interesting to
determine (say, by checking the words from an existing Forth system),
which scheme produces more conflicts.
Moore continues with this approach in Color Forth, but he uses some
compression approach to usually store more characters in the number of
bits he reserves for the name (IIRC 2 cells, with cell sizes of 20
bits, 18 bits, and 32 bits on different hardware). I don't remember
if he stores the length.
Another option would be to store a hash value that is computed using
all characters in the name. If a good hash function is used, the
probability of a conflict is relatively small with, e.g. 4000 names in
a wordlist (about the number of names that Gforth has in the Forth
wordlist), and even the 28 bits that [email protected]
provides. The probability of no conflict is approximately
((2^28-1)/(2^28))^((4000*3999)/2)
i.e.
1 28 lshift s>f fdup 1e f- fswap f/ 4000 dup 1- * 2/ s>f f** f.
The result is 0.97, i.e., there is a 3% probability of conflict for
these numbers.
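The same estimate can be reproduced outside Forth. The Python sketch below evaluates the pairwise approximation given above and, as a cross-check, the exact birthday-style product; both assume an ideal hash that spreads names uniformly over the 2^28 values. The function names are illustrative only.

```python
import math

def p_no_conflict(names, bits):
    """Pairwise approximation: ((2^bits - 1) / 2^bits) ^ (names*(names-1)/2)."""
    slots = 2 ** bits
    pairs = names * (names - 1) // 2
    return ((slots - 1) / slots) ** pairs

def p_no_conflict_exact(names, bits):
    """Exact product of (1 - i/2^bits) for i = 0 .. names-1, summed in logs."""
    slots = 2 ** bits
    return math.exp(sum(math.log1p(-i / slots) for i in range(names)))

approx = p_no_conflict(4000, 28)
exact = p_no_conflict_exact(4000, 28)
print(f"approx {approx:.4f}  exact {exact:.4f}  conflict {1 - approx:.1%}")
```

The two values agree to several decimal places for these numbers, so the simpler formula is adequate here.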
The disadvantage of this approach is that WORDS or SEE cannot even
show the little about the name that Chuck Moore's approaches or
[email protected]'s approach shows. But then, if you are so
pressed for memory that you use one of these approaches, why not also
save the memory for WORDS and SEE?
Another disadvantage is that the system cannot tell if a redefinition
warning comes from a hash conflict or from the name actually being
redefined; but it shares this disadvantage with all approaches that do
not store the full name.
- anton
> Another disadvantage is that the system cannot tell if a redefinition
> warning comes from a hash conflict or from the name actually being
> redefined; but it shares this disadvantage with all approaches that do
> not store the full name.
> - anton
In this context a hash conflict is a redefinition. Avoid,
[email protected] writes:
> In this context a hash conflict is a redefinition. Avoid,
Unclear how to avoid, other than by storing the entire name so you can
detect the conflict.
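A small Python sketch may make the point about undetectable conflicts concrete. It models a wordlist that keeps only a hash of each name; zlib.crc32 truncated to the chosen width stands in for "a good hash function" (an assumption, not anything proposed in the thread), and the class and helper names are hypothetical. With the width reduced to 8 bits a colliding pair is found quickly, and the wordlist reports it exactly as it would report a genuine redefinition.

```python
import zlib

class HashedWordlist:
    """Wordlist that keeps only a hash of each name, not the name itself."""

    def __init__(self, bits=28):
        self.mask = (1 << bits) - 1
        self.entries = {}  # hash value -> body

    def _hash(self, name):
        # crc32 is a stand-in hash chosen for the sketch.
        return zlib.crc32(name.upper().encode("ascii")) & self.mask

    def define(self, name, body):
        """Store the word; return True if this looks like a redefinition."""
        h = self._hash(name)
        looks_redefined = h in self.entries
        self.entries[h] = body
        return looks_redefined

    def find(self, name):
        return self.entries.get(self._hash(name))

def first_collision(wordlist, prefix="W"):
    """Try generated names until two different ones share a hash value."""
    seen = {}
    i = 0
    while True:
        name = f"{prefix}{i}"
        h = wordlist._hash(name)
        if h in seen:
            return seen[h], name
        seen[h] = name
        i += 1

wl = HashedWordlist(bits=8)       # tiny width so a collision shows up fast
a, b = first_collision(wl)
print(a, b, wl.define(a, 1), wl.define(b, 2), wl.find(a))
```

After the second define, looking up the first name returns the second word's body, which is the failure mode that only storing the full name would let the system detect.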