This posting is a more general reflection about designing types in
Forth; it just uses recognizers as example.
On 17.09.2025 at 18:53, Anton Ertl wrote:
> This posting is a more general reflection about designing types in
> Forth; it just uses recognizers as example.
My gut feeling is that the standard Forth word zoo is already big
enough. Why should one define return types now, after more than half
a century of Forth's history? This is beyond me.
minforth writes:
> My gut feeling is that the standard Forth word zoo is already big
> enough. Why should one define return types now, after more than half
> a century of Forth's history? This is beyond me.
There is a discussion of 0 vs. R:FAIL in at least one of the versions
of Matthias Trute's proposal, and a less thorough discussion of 0
vs. NOTFOUND (=R:FAIL) in Bernd Paysan's proposal. To see an
advantage of TRANSLATE-NONE (=R:FAIL=NOTFOUND), consider:
: postpone ( "name" -- ) \ core
\g Compiles the compilation semantics of @i{name}.
parse-name rec-forth postponing ; immediate
REC-FORTH is the system recognizer, which produces a translation (a representation of the parsed word/number/etc. on the stack). If the recognizer does not recognize "name", it produces TRANSLATE-NONE.
POSTPONING then performs the postpone action for the translation. A straightforward implementation of translation tokens (the top cell of
a translation) is the address of a table of three xts:
create translate-...
' ... , \ interpreting
' ... , \ compiling
' ... , \ postponing
For TRANSLATE-NONE that would be:
: undefined-word #-13 throw ;
create translate-none
' undefined-word ,
' undefined-word ,
' undefined-word ,
And POSTPONING can then be implemented as:
: postponing ( translation -- )
2 cells + @ execute ;
However, if you use 0 instead of TRANSLATE-NONE, you would have to special-case that in POSTPONING:
: postponing ( translation -- )
dup 0= if -13 throw then
2 cells + @ execute ;
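As a rough illustration of the table layout above, the following sketch builds a stand-in for TRANSLATE-NONE plus three accessors and shows that the failure case needs no special test. All DEMO- names are hypothetical and chosen so that they do not clash with a system's own INTERPRETING, COMPILING and POSTPONING; the cell order (interpreting, compiling, postponing) is the one used in this posting and is an assumption of the sketch, not something the proposal mandates.

```forth
\ Sketch only: DEMO-* names are hypothetical, not part of the proposal.
: demo-undefined-word ( -- ) -13 throw ;

create demo-translate-none      \ stands in for TRANSLATE-NONE
' demo-undefined-word ,         \ interpreting
' demo-undefined-word ,         \ compiling
' demo-undefined-word ,         \ postponing

\ Each accessor fetches one xt from the table and executes it.
: demo-interpreting ( translation -- ) @ execute ;
: demo-compiling    ( translation -- ) cell+ @ execute ;
: demo-postponing   ( translation -- ) 2 cells + @ execute ;

\ Every action of the "none" token throws, so callers need no 0 check.
\ CATCH leaves one stale cell below the throw code, hence the DROP.
demo-translate-none ' demo-postponing catch . drop   \ prints -13
```

With 0 as the failure value, each of the three accessors would need its own test before the fetch.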
- anton
On lxf64 each individual recognizer returns 0 when no match was found.
The last recognizer to be tested is REC-ABORT (same as your REC-NONE).
As a consequence REC-FORTH will never fail!
The interpret word thus becomes very simple
M: STATE-TRANSLATING ( trans -- ) \ get the right xt for the current state
2 state @ + cells+ @ execute ;
: INTERPRET2 ( -- )
begin parse-name
dup while
forth-recognize state-translating
repeat 2drop
?stack ;
peter writes:
> On lxf64 each individual recognizer returns 0 when no match was found.
> The last recognizer to be tested is REC-ABORT (same as your REC-NONE).
REC-NONE is the neutral element of recognizer sequences, i.e., as far
as the sequence is concerned, a noop. You can prepend REC-NONE to a recognizer sequence and the sequence will produce the same result.
The implementation of REC-NONE is:
: rec-none ( c-addr u -- translation )
2drop translate-none ;
I doubt that your REC-ABORT works like that. My guess is that your
REC-ABORT is:
: rec-abort -13 throw ;
and I will work with that guess in the following.
> As a consequence REC-FORTH will never fail!
In the proposal, any recognizer and recognizer sequence, including
that in REC-FORTH, can have TRANSLATE-NONE as a result, which
indicates that the recognizer (sequence) did not recognize the string.
> The interpret word thus becomes very simple
> M: STATE-TRANSLATING ( trans -- ) \ get the right xt for the current state
> 2 state @ + cells+ @ execute ;
> : INTERPRET2 ( -- )
> begin parse-name
> dup while
> forth-recognize state-translating
> repeat 2drop
> ?stack ;
The same implementation can be used with the proposal (but it calls FORTH-RECOGNIZE by a new name: REC-FORTH) and TRANSLATE-NONE.
Compared to what I presented, the order of xts in the
TRANSLATE-... tables is reversed, so POSTPONING would become even
simpler:
: POSTPONING ( translation -- )
@ execute ;
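To make the reversed table order concrete, here is a small sketch with a table whose first cell is the postponing action. The DEMO- names are hypothetical. The quoted STATE-TRANSLATING computes its index as 2 plus the contents of STATE, which presumably relies on STATE holding -1 while compiling; since the standard only promises zero or non-zero, the sketch normalises the flag first.

```forth
\ Sketch only: DEMO-* names are hypothetical.
variable demo-last                     \ records which action ran
: demo-post ( -- ) 0 demo-last ! ;
: demo-comp ( -- ) 1 demo-last ! ;
: demo-int  ( -- ) 2 demo-last ! ;

create demo-translate                  \ reversed order: postponing first
' demo-post ,  ' demo-comp ,  ' demo-int ,

: demo-postponing-rev ( translation -- ) @ execute ;

\ interpreting -> index 2, compiling -> index 1
: demo-state-translating ( translation -- )
  state @ 0<> 2 + cells + @ execute ;

\ An immediate word lets us run the dispatcher while compiling.
: demo-probe ( -- ) demo-translate demo-state-translating ; immediate

demo-translate demo-state-translating  demo-last @ .   \ 2 (interpreting)
: demo-user demo-probe ;               demo-last @ .   \ 1 (compiling)
demo-translate demo-postponing-rev     demo-last @ .   \ 0 (postponing)
```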
One difference is that, for an unrecognized string, the -13 throw is
done later, when performing the action of the translation.
The benefit of having TRANSLATE-NONE and doing the -13 throw in its
actions, instead of hard-coded in REC-FORTH is that REC-FORTH contains
just another recognizer sequence, that recognizer sequences behave
like recognizers, and thus are nestable, and that you can write code
like
( c-addr u ) rec-something ( translation ) postponing
and it will work without you having to put REC-ABORT at the end of REC-SOMETHING.
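The nesting argument can be tried out with a toy model. The sketch below is not the proposal's implementation: the DEMO- names, the counted-table layout of a sequence and the single-string recognizer are all assumptions made for illustration. It shows a sequence being used as an element of another sequence, and a "none" recognizer acting as a neutral element.

```forth
\ Sketch only: every DEMO-* name and the table layout are hypothetical.
create demo-translate-none 0 ,         \ stands in for TRANSLATE-NONE
create demo-translate-x    0 ,         \ token for the one string we know

: demo-rec-none ( c-addr u -- translation ) 2drop demo-translate-none ;
: demo-rec-x ( c-addr u -- translation )
  s" x" compare 0= if demo-translate-x else demo-translate-none then ;

\ A sequence is a cell count followed by that many recognizer xts.
: demo-rec-seq ( c-addr u seq -- translation )
  dup @ swap cell+ swap                \ c-addr u addr n
  0 ?do                                \ c-addr u addr
    >r 2dup r@ @ execute               \ c-addr u translation   R: addr
    dup demo-translate-none <> if
      nip nip r> drop unloop exit      \ first match wins
    then
    drop r> cell+                      \ try the next recognizer
  loop
  drop 2drop demo-translate-none ;

create demo-seq-inner  1 , ' demo-rec-x ,
: demo-rec-inner ( c-addr u -- translation ) demo-seq-inner demo-rec-seq ;

\ The inner sequence is used like any other recognizer, and the
\ prepended DEMO-REC-NONE does not change the result.
create demo-seq-outer  2 , ' demo-rec-none , ' demo-rec-inner ,
: demo-rec-outer ( c-addr u -- translation ) demo-seq-outer demo-rec-seq ;

s" x" demo-rec-outer demo-translate-x    = .   \ -1
s" y" demo-rec-outer demo-translate-none = .   \ -1
```

Because the failure value is an ordinary translation, the outer sequence needs no knowledge of whether an element is a plain recognizer or another sequence.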
However, the current proposal does not propose to standardize
POSTPONING etc., but leaves it to the standard text interpreter and
standard POSTPONE to perform the translation actions. So, as long as
we don't standardize these words, one could also have a recognizer
sequence
' rec-abort ' rec-forth 2 recognizer-sequence: rec-forth-abort
and let the text interpreter and POSTPONE call REC-FORTH-ABORT instead
of REC-FORTH. But if we want to leave the option open to standardize POSTPONING etc. in the future, the proposed approach is more flexible.
- anton
On Sat, 20 Sep 2025 07:25:54 GMT
Anton Ertl wrote:
> I doubt that your REC-ABORT works like that. My guess is that your
> REC-ABORT is:
> : rec-abort -13 throw ;
:NONAME -13 throw ;
dup-t
dup-t
CREATE TRANSLATE-ABORT
,-d-t ,-d-t ,-d-t
: REC-ABORT ( addr len -- nt)
>msg translate-abort ;
>msg saves the string to be able to print the name in the abort message.
Apart from the names that have changed (I have not updated them yet) I see only
minor differences. One is that the individual recognizers return 0 on failure.
I am using a linked list of recognizers but that is only an implementation
detail.
I do not see that the "standard" would mandate an array.
I have introduced a vocabulary-like word for managing the recognizers.
It looks like
' rec-local recognizer: Locals-recognizer
This does 3 things
- It gives the recognizer a name.
- It inserts the recognizer in the list just behind the number recognizers
- Executing it moves the recognizers to the top of the list
peter writes:
> :NONAME -13 throw ;
> dup-t
> dup-t
> CREATE TRANSLATE-ABORT
> ,-d-t ,-d-t ,-d-t
> : REC-ABORT ( addr len -- nt)
> >msg translate-abort ;
Ok, your TRANSLATE-ABORT is TRANSLATE-NONE, and your REC-ABORT is
REC-NONE.
> >msg saves the string to be able to print the name in the abort message
Might be cleaner than Gforth's current mechanism (I don't remember how
that works).
> Apart from the names that has changed (I have not updated them yet) I see only
> minor differences. One being the individual recognizer returning 0 on fail.
So the usual recognizers return 0 for not-recognized, but REC-FORTH
returns TRANSLATE-NONE. Interesting twist. What about other
recognizer sequences?
> I am using a linked list of recognizers but that is only an implementation
> detail.
> I do not see that the "standard" would mandate an array.
The standard does not mandate any particular implementation of
REC-FORTH or of recognizer sequences.
> I have introduce a vocabulary like word for managing the recognizers
> It looks like
> ' rec-local recognizer: Locals-recognizer
> This does 3 things
> - It gives the recognizer a name.
> - It inserts the recognizer in the list just behind the number recognizers
But before other recognizers behind the number recognizers?
> - Executing it moves the recognizers to the top of the list
I am confused. Under what circumstances does the "insert just behind" happen, and when "move to the top"?
And what scenario do you have in mind that makes this behaviour
useful?
- anton
It has been useful when testing new recognizers.
peter writes:
> It has been useful when testing new recognizers.
I have been doing that by directly calling the recognizer. E.g.
s" `dup" rec-tick . .
s" `fkjfd" rec-tick .
s" dup" rec-tick .
- anton
On Sat, 20 Sep 2025 07:25:54 GMT
Anton Ertl wrote:
> I doubt that your REC-ABORT works like that. My guess is that your
> REC-ABORT is:
> : rec-abort -13 throw ;
:NONAME -13 throw ;
dup-t
dup-t
CREATE TRANSLATE-ABORT
,-d-t ,-d-t ,-d-t
: REC-ABORT ( addr len -- nt)
>msg translate-abort ;
The -t and -d-t endings are due to this being metacompiled.
>msg saves the string to be able to print the name in the abort message.
> Compared to what I presented, the order of xts in the
> TRANSLATE-... tables is reversed, so POSTPONING would become even
> simpler:
> : POSTPONING ( translation -- )
> @ execute ;
: postpone ( "name" -- )
parse-name forth-recognize @ execute ; immediate
I implemented your proposal from February and it has worked as expected.
The float recognizer has been cleaned up: the deferred words are gone, and it is
now installed when the float package is included.
I have also adjusted ORDER to show the recognizers as well:
order
Order: $0070�01C0 Forth
$0070�01E8 Root
Current: $0070�01C0 Forth
Loaded recognizers:
$0070�18E8 Locals-recognizer
$0070�0648 Word-recognizer
$0070�0620 Number-recognizer
$0070�1C88 Float-recognizer
$0070�1D28 String-recognizer
$0070�19B0 Tick-recognizer
$0070�1948 To-recognizer
$0070�1970 To2-recognizer
$0070�1A20 Only-recognizer
$0070�0600 Abort not found
ok
BR
Peter
All this is accomplished by a PREFIX flag (compare IMMEDIATE)
and a provision that advances the interpreter pointer by
the length of the prefix, not by the length of the word passed
to it.
It is believable that the system presented above is more powerful,
but I would love to see examples of what it can do that warrant the
complexity. I would also love to see whether the examples could not
be done with my simpler setup.
Recently I presented the Roman number prefix. How does
that look in the recognizer system presented?
On 21.09.2025 at 10:37, [email protected] wrote:
<snip>
> Recently I presented the Roman number prefix. How does
> that look in the recognizer presented.
FWIW I also use suffixes for recognizers:
let M be a matrix
M´ auto-transposed
M~ auto-inverted
On Sat, 20 Sep 2025 17:57:30 GMT
Anton Ertl wrote:
> I have been doing that by directly calling the recognizer. E.g.
> s" `dup" rec-tick . .
> s" `fkjfd" rec-tick .
> s" dup" rec-tick .
I do that also.
A bit difficult with the string recognizer!
In article <[email protected]>,
minforth wrote:
> FWIW I also use suffixes for recognizers:
> let M be a matrix
> M´ auto-transposed
> M~ auto-inverted
The ability to use suffixes doesn't necessarily contribute to
power. It adds confusion and difficulty to parse.
Try it
Program a suffix aided recognizer for Roman numbers:
MMXIIX:R
peter writes:
> I do that also.
> A bit difficult with the string recognizer!
In theory:
s\" \"abc\"" rec-string scan-translate-string = . type \ -1 "abc"
Trying to pass the result of the REC-STRING to INTERPRETING has
revealed interesting restrictions in Gforth's implementation of the
results of REC-STRING:
s\" \"abc\"" rec-string interpreting
*the terminal*:26:25: error: Scanned string not in input buffer
parse-name "abc" rec-string scan-translate-string = . type \ -1 "abc"
parse-name "abc" rec-string interpreting
*the terminal*:29:29: error: Invalid memory address
parse-name "abc" rec-string >>>interpreting<<<
- anton
Unlike your approach, one can write roman numerals in the same way
that we learned in school.
Does it warrant the complexity? I think
that already the benefit of not having to FIND-NAME for all prefixes
of a word we search for is a good reason to avoid your approach.
The interface to the recognizer proposal has seen some renaming and
other changes in the recent committee meeting, and you find the
updated code for recognizing and printing roman numerals in:
https://www.complang.tuwien.ac.at/forth/programs/roman-numerals.4th
The recognizer stuff conforms with the recent proposal, the rest of
the code uses Gforth extensions.
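For readers without the file at hand, the conversion at the heart of such a recognizer might look roughly like the sketch below. It is not the code at the URL above: the DEMO- names are hypothetical, only upper-case digits are handled, and non-canonical numerals are accepted without complaint. A real recognizer would wrap the result in whatever translation the current proposal specifies for numbers.

```forth
\ Sketch only: DEMO-* names are hypothetical, not the linked program.
: demo-roman-digit ( char -- n )       \ n=0 if char is no roman digit
  case
    [char] I of    1 endof
    [char] V of    5 endof
    [char] X of   10 endof
    [char] L of   50 endof
    [char] C of  100 endof
    [char] D of  500 endof
    [char] M of 1000 endof
    0 swap                             \ default: 0 under the selector
  endcase ;

: demo-roman>n ( c-addr u -- n true | false )
  dup 0= if 2drop false exit then      \ empty string: not recognized
  0 0 2swap over + swap ?do            \ total prev
    i c@ demo-roman-digit              \ total prev d
    dup 0= if drop 2drop false unloop exit then
    >r                                 \ total prev      R: d
    dup r@ < if 2* - else drop then    \ undo prev if it was subtractive
    r@ + r>                            \ total' d
  loop
  drop true ;

s" XIV" demo-roman>n . .               \ -1 14
s" DUP" demo-roman>n .                 \ 0
```

The sketch also shows why the choice of syntax matters: strings such as MIX or DIV consist only of roman digits and would be claimed by such a recognizer unless it is tried after the name recognizer.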
- anton
In article <[email protected]>,
Anton Ertl wrote:
> Unlike your approach, one can write roman numerals in the same way
> that we learned in school.
I think that it is good that the roman numerals are distinguished.
VI M L X become suddenly reserved words.
There is a reason that we
prefer $CD and $DEADBEEF.
> Does it warrant the complexity? I think
> that already the benefit of not having to FIND-NAME for all prefixes
> of a word we search for is a good reason to avoid your approach.
That is probably a misunderstanding of my approach.
I understand that in your recognizer system all recognizers are tried
in succession.
Using a prefix like 0x for hex, the lookup for 0x is the same
as the lookup for `` 1 CONSTANT 0x '' , so no separate mechanism is
needed.
Only after the word is found, a prefix is handled differently, compare
immediate words. Such lookup is probably even less effort.
Assume we have a PREFIX $ for hex.
Think of a unix environment, where $ is used for environment variables
and we want 0x for hex.
NAMESPACE unix \ That is VOCABULARY with a built-in ALSO
unix DEFINITIONS
' $ ALIAS 0x
\ Warning: is not unique.
: $ PARSE-NAME GET-ENV POSTPONE DLITERAL ; IMMEDIATE PREFIX
...
...
PREVIOUS DEFINITIONS
As soon as you kick unix out of the search order, $ is again the
prefix for hex and 0xCD is no more recognized.
P.S. GET-ENV leaves a double. Adding POSTPONE DLITERAL makes that
$XXXX can be used in compilation mode.
> Only after the word is found, a prefix is handled differently, compare
> immediate words. Such lookup is probably even less effort.
My understanding is that if the user types
123456789
into the text interpreter, your text interpreter will search for
123456789
12345678
1234567
123456
12345
1234
123
12
1
and fail at the first 8 attempts, and finally match the ninth, and
only then try to convert the string into a number. By contrast, with
recognizers, every recognizer (including REC-NAME) only has to deal
with the full string, and most other recognizers have simpler and
cheaper checks than REC-NAME.
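The lookup pattern described above can be sketched as follows. This models only the understanding stated in this posting and is not taken from any actual prefix implementation; the DEMO- names and the private wordlist with a dummy 0x entry are hypothetical.

```forth
\ Sketch only: DEMO-* names are hypothetical.
wordlist constant demo-prefixes
get-current demo-prefixes set-current
: 0x ;                                 \ dummy entry standing in for a prefix
set-current

variable demo-probes                   \ number of wordlist searches made

\ Search for the string, then for ever shorter leading substrings.
: demo-longest-prefix ( c-addr u -- c-addr u' )  \ u'=0: nothing matched
  0 demo-probes !
  begin dup while
    1 demo-probes +!
    2dup demo-prefixes search-wordlist if drop exit then
    1-
  repeat ;

s" 0xDEADBEEF" demo-longest-prefix nip .  demo-probes @ .   \ 2 9
```

For a ten-character string with a two-character prefix, nine searches are made before the match; a recognizer looks at the full string once.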
> As soon as you kick unix out of the search order, $ is again the
> prefix for hex and 0xCD is no more recognized.
Gforth has REC-ENV and that is active by default, and there is usually
no reason to eliminate it from the system recognizer sequence. You
write ${HOME}.
> P.S. GET-ENV leaves a double. Adding POSTPONE DLITERAL makes that
> $XXXX can be used in compilation mode.
It seems that your approach embraces state-smartness. By contrast,
one benefit of recognizers is that they make it unnecessary to use
words like S" or TO that often are implemented as state-smart words,
or require unconventional mechanisms to avoid that.
- anton
In article <[email protected]>,
Anton Ertl wrote:
<SNIP>
> and fail at the first 8 attempts, and finally match the ninth, and
> only then try to convert the string into a number. By contrast, with
> recognizers, every recognizer (including REC-NAME) only has to deal
> with the full string, and most other recognizers have simpler and
> cheaper checks than REC-NAME.
No. 123456789 is looked up in the Forth wordlist, fails, then in the
minimum search wordlist.
' & ^ 0 1 2 3 4 5 6 7
8 9 - + " FORTH
1234556789 matches the prefix 1.
> Gforth has REC-ENV and that is active by default, and there is usually
> no reason to eliminate it from the system recognizer sequence. You
> write ${HOME}.
Does that invalidate the example?
> It seems that your approach embraces state-smartness. By contrast,
> one benefit of recognizers is that they make it unnecessary to use
> words like S" or TO that often are implemented as state-smart words,
> or require unconventional mechanisms to avoid that.
No I don't. Numbers have always been state-smart, although you
won't admit to it.
In my system you can't postpone numbers, so that cannot lead to
problems.
[email protected] writes:
> No. 123456789 is looked up in the Forth wordlist, fails, then in the
> minimum search wordlist.
> ' & ^ 0 1 2 3 4 5 6 7
> 8 9 - + " FORTH
> 1234556789 matches the prefix 1.
How so? Linear search through the wordlist, with prefix matching?
That's even slower than the approach outlined above (when that
approach is implemented using hash tables).
And how does matching "0r" for your roman numerals work, if "0rM"
matches the prefix "0"?
> Does that invalidate the example?
It means that we can mix standard syntax for hex numbers and
environment variables freely, without having to shadow one with the
other, or kicking one to be able to use the other.
> No I don't. Numbers have always been state-smart, although you
> won't admit to it.
A state-smart 123 would behave like
: 123
123 state @ if postpone literal then ; immediate
By contrast, a normal 123 behaves like
: 123
123 ;
Here is a test
: p123 postpone 123 ; : test [ p123 ] ; test .
Let's see how it works (outputs are shown with preceding "\ "):
: 123 \ compiling
123 state @ if postpone literal then ; immediate
\ *terminal*:2:40: warning: defined literal 123 as word ok
: p123 postpone 123 ; : test [ p123 ] ; test .
\ *the terminal*:3:39: error: Control structure mismatch
\ : p123 postpone 123 ; : test [ p123 ] >>>;<<< test
Now with a freshly started system:
: 123 \ compiling
123 ;
\ *terminal*:2:7: warning: defined literal 123 as word ok
: p123 postpone 123 ; : test [ p123 ] ; test . \ 123 ok
Now with a freshly started system:
: p123 postpone 123 ; : test [ p123 ] ; test . \ 123 ok
The last example uses REC-NUM to recognize 123. It behaves like the
normal (not state-smart) word 123, showing that numbers are not
state-smart.
You may say that in a traditional system the last test will not work,
because POSTPONE does not work with numbers. That's true, but not
proof of any state-smartness. It just means that we have to look at
the implementation to decide it. And in the traditional
implementation the text interpreter decides whether to perform the
interpretation or compilation semantics of a number, whereas in a
state-smart word, these two semantics are the same (immediate), and
the word itself decides when it is run what to do, based on STATE. No
such thing happens with numbers, so they are not state-smart, not even
in a traditional system.
> In my system you can't postpone numbers, so that cannot lead to
> problems.
That is certainly a good idea if you have made the mistake of
embracing state-smartness in your system, but it is another
disadvantage of your approach compared to recognizers.
- anton
In article <[email protected]>,
Anton Ertl wrote:
> How so? Linear search through the wordlist, with prefix matching?
> That's even slower than the approach outlined above (when that
> approach is implemented using hash tables).
I use a simple Forth and there linear search is acceptable for me.
> And how does matching "0r" for your roman numerals work, if "0rM"
> matches the prefix "0"?
Normal precedence rules for Forth. 0r is later defined so it is
probed earlier.
> You may say that in a traditional system the last test will not work,
> because POSTPONE does not work with numbers. That's true, but not
> proof of any state-smartness. It just means that we have to look at
> the implementation to decide it. And in the traditional
> implementation the text interpreter decides whether to perform the
> interpretation or compilation semantics of a number, whereas in a
> state-smart word, these two semantics are the same (immediate), and
> the word itself decides when it is run what to do, based on STATE. No
> such thing happens with numbers, so they are not state-smart, not even
> in a traditional system.
You have tried to explain this to me several times, but this is the clearest.
I terminate denotation words with [COMPILE] LITERAL or [COMPILE] DLITERAL.
Suppose I change it to a system where INTERPRET checks whether an
immediate word left something on the stack (assuming a separate
compilation check) and only in compilation mode adds a LITERAL
(compiles LIT and the number).
In that case denotations don't end with [COMPILE] LITERAL/DLITERAL.
Would that be an acceptable implementation?
To end the controversy, maybe I have to admit I have smart numbers,
but I manage to be ISO-94 compliant.
"AAP"
OK
POSTPONE "AAP"
POSTPONE "AAP" ? ciforth ERROR # 15 : CANNOT FIND WORD TO BE POSTPONED
Maybe not ISO-2012 compliant.
This is the closest I have come
s\" \"" rec-string Hej Peter" interpreting cr type
Hej Peter ok
It still gives me an extra space that is needed to separate rec-string
from the continuation of the string.
It works because rec-string does
the parsing of the rest of the string. If the parsing is done in
interpreting the string would have to be after that!
peter writes:
> This is the closest I have come
> s\" \"" rec-string Hej Peter" interpreting cr type
> Hej Peter ok
> I still gives me an extra space that is needed to separate rec-string
> from the continuation of the string.
Yes, if you want to avoid that space, you lose a lot of the
interactive testing benefit.
But at least that works on lxf64.
> It works because rec-string does
> the parsing of the rest of the string. If the parsing is done in
> interpreting the string would have to be after that!
The recommendation in the proposals is that recognizers don't do
additional parsing (in order to be callable by, e.g., LOCATE or other
tools), and that the translator should do the parsing.
- anton
minforth writes:
> FWIW I also use suffixes for recognizers:
> let M be a matrix
> M´ auto-transposed
> M~ auto-inverted
Can you give an example of a matrix with your matrix recognizer?
[email protected] writes:
> Normal precedence rules for Forth. 0r is later defined so it is
> probed earlier.
So defining a prefix "0r" shadows all earlier-defined words starting
with "0r"? One has to choose the prefixes well, but the same is true
for what recognizers should recognize.
However, LITERAL is a standard word that a conforming implementation
cannot implement in a state-smart way.
: lit, postpone literal ;
: foo [ 1 lit, ] ;
foo . \ 1
(Gforth, iForth, SwiftForth64, and VFX64 process this example correctly).
> To end the controversy, maybe I have to admit I have smart numbers,
> but I manage to be ISO-94 compliant.
- anton
On Tue, 23 Sep 2025 17:25:18 GMT
Anton Ertl wrote:
> Yes, if you want to avoid that space, you lose a lot of the
> interactive testing benefit.
> But at least that works on lxf64.
If I do like this it works
s\" \"" drop 0 rec-string Hej Peter" interpreting cr type
Hej Peter ok
> The recommendation in the proposals is that recognizers don't do
> additional parsing (in order to be callable by, e.g., LOCATE or other
> tools), and that the translator should do the parsing.
Yes I think that would be a better solution. But it has its problems.
Today I have:
\ Recognizer for "text strings"
: adj ( len -- len' ) \ if we are at the end of the parse area
\ we need to adjust what we step back
source nip >in @ = + ;
' noop \ interpret action
:noname postpone sliteral ; \ compile action
:noname postpone sliteral postpone 2lit, ; \ postpone action
translator: translate-string
: rec-string ( addr len -- xi | 0 )
swap c@ '"' =
if adj negate >in +! (s\") translate-string
else drop 0 then ;
I can easily move (s\") (the interpretive part of S\")
into the translator.
But if I also move adj negate >in +! there, I need to pass the len as well.
That is not as clean as doing the adjustment in rec-string.
Honestly, I think using the string recognizer outside of recognizing
is difficult.
peter
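The stepping-back that rec-string does can be tried in isolation. The following is a minimal sketch in standard Forth, not peter's lxf64 code: only ADJ is taken from the post above, while REPARSE-AFTER-QUOTE and DEMO are hypothetical names introduced for illustration. It assumes that PARSE-NAME consumes one trailing delimiter when there is one, which is what ADJ compensates for.

```forth
\ ADJ as posted above: TRUE is -1, so one is subtracted from len
\ when >IN already sits at the end of the parse area.
: adj ( len -- len' )
  source nip >in @ = + ;

\ Hypothetical helper: rewind >IN to just after the first character
\ of the token (the opening quote), then parse to the closing quote.
: reparse-after-quote ( c-addr u -- c-addr2 u2 )
  nip adj negate >in +!
  [char] " parse ;

\ Hypothetical demo word: fetch the next token the way the text
\ interpreter would, then hand it to the helper.
: demo ( "<quote>text<quote>" -- )
  parse-name reparse-after-quote cr type ;

demo "Hej Peter"
```

Under the stated assumption this should type Hej Peter without the extra space, because the token passed in is `"Hej` and the rest of the string is picked up by the second parse.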
In article <[email protected]>,
...
However, LITERAL is a standard word that a conforming implementation
cannot implement in a state-smart way.
: lit, postpone literal ;
: foo [ 1 lit, ] ;
foo . \ 1
This shows me how to lift this defect. Rename LITERAL to (LIT) and
define
: LITERAL 'LIT , , ; IMMEDIATE
Then the above test succeeds.
The interpretation syntax of LITERAL is undefined.
LIT, is a sneaky way to have an interpretation syntax.
Normal is
: foo [ 1 ] LITERAL ;
In the standard:
LITERAL :
Interpretation: Interpretation syntax for this word is undefined.
What if the standard said:
execution of this word while in interpret mode is an ambiguous condition
Then I would gladly throw an exception if anybody tried it, and the examples wouldn't fly.
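For comparison, here is a small sketch that puts the two spellings side by side. LIT, is the definition from the quoted test; FOO1, FOO2, SMART-LITERAL and SMART-LIT, are hypothetical names used only for illustration. On a system where LITERAL is not state-smart, both definitions should compile the same literal.

```forth
: lit, ( x -- ) postpone literal ;   \ from the quoted test

: foo1 ( -- 1 ) [ 1 lit, ] ;         \ literal compiled through LIT,
: foo2 ( -- 1 ) [ 1 ] literal ;      \ the "normal" spelling

foo1 . foo2 .

\ Hypothetical state-smart variant, for illustration only.
: smart-literal ( x -- | x )
  state @ if postpone literal then ; immediate
: smart-lit, ( x -- | x ) postpone smart-literal ;

\ Not executed here: STATE is 0 between [ and ], so SMART-LIT,
\ compiles nothing and leaves the 1 on the stack.
\ : bar [ 1 smart-lit, ] ;
```

The commented-out BAR shows why a state-smart LITERAL fails the test: the decision is taken from STATE at the time LIT, runs, not from the context in which LITERAL was postponed.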
On 23.09.2025 at 19:23, Anton Ertl wrote:
minforth <[email protected]> writes:
FWIW I also use suffixes for recognizers:
let M be a matrix
M´ auto-transposed
M~ auto-inverted
Can you give an example of a matrix with your matrix recognizer?
To be fair, here MinForth displays the matrix/vector stack in the
QUIT prompt:
MinForth 3.6 (64 bit) (fp matrix)
# 0 0 matrix mat ok
# m[ 1 2 3 ; 4 5 6 ] ok
minforth <[email protected]> writes:
On 23.09.2025 at 19:23, Anton Ertl wrote:
minforth <[email protected]> writes:
FWIW I also use suffixes for recognizers:
let M be a matrix
M´ auto-transposed
M~ auto-inverted
Can you give an example of a matrix with your matrix recognizer?
To be fair, here MinForth displays the matrix/vector stack in the
QUIT prompt:
MinForth 3.6 (64 bit) (fp matrix)
# 0 0 matrix mat ok
# m[ 1 2 3 ; 4 5 6 ] ok
Given this syntax, a parsing word M[ suggests itself to me (although I
generally dislike parsing words and probably would choose a different
syntax); or maybe a word that switches to a matrix interpreter
(possibly implemented using the recognizer words), with ] switching
back. Why did you choose to use a recognizer?
- anton
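To make the parsing-word alternative concrete, here is a minimal sketch in standard Forth. It is not MinForth's implementation: this M[ is a hypothetical word that handles unsigned integers on a single line only, skips the ; row separator, and leaves the numbers and their count on the data stack rather than building a matrix.

```forth
\ Hypothetical parsing word: collect numbers up to the closing ] .
: m[ ( "n1 ... nk ]" -- n1 ... nk k )
  0 >r                                  \ element count on the return stack
  begin parse-name dup while
    2dup s" ]" compare 0= if            \ closing bracket: finish
      2drop r> exit
    then
    2dup s" ;" compare 0= if            \ row separator: ignored here
      2drop
    else
      0 0 2swap >number 2drop drop      \ convert, keep the low cell
      r> 1+ >r
    then
  repeat
  2drop r> ;                            \ end of line reached without ]

m[ 1 2 3 ; 4 5 6 ] .s
```

A real matrix word would also record the row length at each ; and store the elements somewhere other than the data stack; the sketch only shows that the syntax can be handled without a recognizer.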
In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
minforth <[email protected]> writes:
On 23.09.2025 at 19:23, Anton Ertl wrote:
minforth <[email protected]> writes:
FWIW I also use suffixes for recognizers:
let M be a matrix
M´ auto-transposed
M~ auto-inverted
Can you give an example of a matrix with your matrix recognizer?
To be fair, here MinForth displays the matrix/vector stack in the
QUIT prompt:
MinForth 3.6 (64 bit) (fp matrix)
# 0 0 matrix mat ok
# m[ 1 2 3 ; 4 5 6 ] ok
Given this syntax, a parsing word M[ suggests itself to me (although I
generally dislike parsing words and probably would choose a different
syntax); or maybe a word that switches to a matrix interpreter
(possibly implemented using the recognizer words), with ] switching
back. Why did you choose to use a recognizer?
WORDLIST suggests a different solution, with a wordlist MATRIX:
MAT( adds MATRIX to the search order
)MAT removes MATRIX from the search order
Circumstances may prevent this, but I think this is the situation
wordlists are intended for: creating a different interpretation/compile environment.
In principle yes, but wordlists don't hook themselves into the Forth interpreter. IMO this is the only novelty of recognizers.
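The MAT( ... )MAT idea can be sketched with the standard Search-Order words. This is an illustration under assumptions, not code from the thread: MATRIX-WL and HELLO are hypothetical names, and both bracket words are made immediate here so that they also take effect inside a colon definition.

```forth
wordlist constant matrix-wl        \ hypothetical stand-in for MATRIX

\ Put MATRIX-WL on top of the search order.
: mat( ( -- ) get-order matrix-wl swap 1+ set-order ; immediate

\ Drop the topmost wordlist again. It does not check that the
\ topmost wordlist really is MATRIX-WL.
: )mat ( -- ) get-order nip 1- set-order ; immediate

\ Define one word that is only visible between MAT( and )MAT .
get-current matrix-wl set-current
: hello ( -- ) ." in matrix" ;
set-current

mat( hello )mat
```

After )MAT the word HELLO is no longer found, which is the "different environment" effect. What the sketch does not give is any hook for tokens that are not dictionary words, such as M´ or M~.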
[email protected] writes:
This shows me how to Lift this defect. Rename LITERAL to (LIT) and
define
: LITERAL 'LIT , , ; IMMEDIATE
Looks good.
In the standard:
LITERAL :
Interpretation: Interpretation syntax for this word is undefined.
Has ISO changed the text? Forth-94 and Forth-2012 say:
|Interpretation:
|Interpretation semantics for this word are undefined.
What if the standard says
execution of this word while in interpret mode is an ambiguous condition
It does not, and that's a good thing.