Assume you're implementing a language which has a function of setting
an individual character in a string.

That's a design mistake in the language, and I know no language that
has this misfeature.
Instead, what we see is one language (Python3) that has an even worse misfeature: You can set an individual code point in a string; see
above for the things you get when you overwrite code points.
But why would one want to set individual code points?
Thomas Koenig <[email protected]> writes:
E.g., consider the following Gforth code (others can tell you how to
do it in Python):
"Ko\u0308nig" cr type
The output is:
König
That is, the second character consists of two Unicode code points, the
"o" and the "\u0308" (Combining Diaeresis).
(I think that somewhere along the way from the Forth system to the
xterm through copying and pasting into Emacs the second character has
become precomposed, but that's probably just as well, so you can see
what I see).
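For reference, a quick Python sketch of the same string (Python indexes strings by code point):

```python
# The same string in Python: "o" followed by U+0308 (Combining Diaeresis)
# displays as one character but is two code points.
s = "Ko\u0308nig"
print(s)        # renders as König in a capable terminal
print(len(s))   # 6 — Python's len() counts code points, not characters
```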
If I replace the third code point with an e, I get "Koenig". So by overwriting one code point, I insert a character into the string.
If instead I replace the second code point with a "\u0316" (Combining
Grave Accent Below):
"K\u0316\u0308nig" cr type
I get this (which looks as expected in my xterm, but not in Emacs)
K̖̈nig
The first character is now a K with a diaresis above and an accent
grave below and there are now a total of 4 characters, but still 6
code points in the string; the second character has been deleted by
this code-point replacement.
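For those following along in Python (strings there are immutable, so the "overwrite" is spelled with slices), a sketch of both replacements:

```python
# Reproducing the two code-point replacements described above.
s = "Ko\u0308nig"              # code points: K, o, U+0308, n, i, g

t = s[:2] + "e" + s[3:]        # replace the third code point with "e"
print(t)                       # Koenig — still 6 code points, but now
                               # 6 characters: one has been inserted

u = s[:1] + "\u0316" + s[2:]   # replace the second code point instead
print(u)                       # K with both marks, then nig — still 6
                               # code points, but only 4 characters:
                               # one has been deleted
```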
I think people in Japan should be able to use printf by using プリントフ
There is way too much "english" in the way computers are being used.
It is similar to Anthropomorphizing animal behavior.
[Anton Ertl:]
[Thomas Koenig:]
Assume you're implementing a language which has a function of setting
an individual character in a string.

That's a design mistake in the language, and I know no language that
has this misfeature.
I suspect "individual character" meant "code point" above.
Does Unicode even have the notion of "character", really?
Instead, what we see is one language (Python3) that has an even worse
misfeature: You can set an individual code point in a string; see
above for the things you get when you overwrite code points.
I think it's fairly common for languages that started with strings
as "arrays of 8bit chars".
Emacs Lisp has this misfeature as well (and so does Common Lisp). 🙁
It's really hard to get rid of it, even though it's used *very* rarely.
In ELisp, strings are represented internally as utf-8 (tho it pretends
to be an array of code points), so an assignment that replaces a single
char can require reallocating the array!
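A quick illustration (in Python, for lack of ELisp here) of why the byte length, and hence the allocation, can change when one code point replaces another in a UTF-8-backed string:

```python
# Code points occupy 1-4 bytes in UTF-8, so swapping one code point for
# another can change the byte length and force a reallocation.
for cp in ("o", "\u00f6", "\u20ac", "\U0001f642"):
    print(f"U+{ord(cp):04X} -> {len(cp.encode('utf-8'))} byte(s)")
# U+006F -> 1 byte(s)
# U+00F6 -> 2 byte(s)
# U+20AC -> 3 byte(s)
# U+1F642 -> 4 byte(s)
```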
But why would one want to set individual code points?
Because you know your string only contains "characters" made of a single
code point?
E.g. your string contains the representation of the border of a table
(to be displayed in a tty), and you want to "move" the `+` of a column
separator (or a prettier version that takes advantage of the wider
choice offered by Unicode).
Stefan Monnier <[email protected]> writes:
Does Unicode even have the notion of "character", really?
AFAIK it does not. But applications like palindrome checkers care
about characters, not code points.
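For illustration, a minimal Python sketch of such a checker, assuming the simplified rule that a "character" is a base code point plus its trailing combining marks (real grapheme segmentation, e.g. for ZWJ emoji sequences, is more involved):

```python
import unicodedata

def clusters(s):
    """Split s into simplified "characters": a base code point plus any
    immediately-following combining marks.  A rough sketch only."""
    out = []
    for ch in s:
        if out and unicodedata.combining(ch):
            out[-1] += ch          # attach combining mark to its base
        else:
            out.append(ch)
    return out

def is_palindrome(s):
    cs = clusters(s)
    return cs == cs[::-1]

s = "ono\u0308no"                  # onöno, with a decomposed ö
print(is_palindrome(s))            # True — character-wise it reads the same
print(s == s[::-1])                # False — naive code-point reversal fails
```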
It seems to me (in my vast ignorance) that names for things should be
written in the most appropriate set of characters in the language of
the person/thing being named.
Then when such a name is "sent out to be displayed" that it is a property
of the display what character set(s) it can properly emit, and thereby
alter the string of characters as appropriate to its capabilities.
For example:: Take > "K\u0316\u0308nig" cr type ==> K̖̈nig
When displayed on an ASCII-only line printer it would be written Koenig
When displayed on an enhanced ASCII printer it would be written König
When displayed on a fully functional printer it would be written K̖̈nig
Only the display device needs to understand this mapping and NOT the
program/software/device holding the string.
I think people in Japan should be able to use printf by using プリントフ
There is way too much "english" in the way computers are being used.
Anton Ertl <[email protected]> schrieb:
Stefan Monnier <[email protected]> writes:
Does Unicode even have the notion of "character", really?
AFAIK it does not. But applications like palindrome checkers care
about characters, not code points.
Considering the huge market for palindrome checkers, that is a
real concern, especially if they involve characters for which
UTF-32 is not sufficient, such as smileys.
Is there any language whose characters cannot be represented in
UTF-32?
[email protected] (MitchAlsup1) writes:
It seems to me (in my vast ignorance) that names for things should be
written in the most appropriate set of characters in the language of
the person/thing being named.
Then when such a name is "sent out to be displayed" that it is a property
of the display what character set(s) it can properly emit, and thereby
alter the string of characters as appropriate to its capabilities.
For example:: Take > "K\u0316\u0308nig" cr type ==> K̖̈nig
When displayed on an ASCII-only line printer it would be written Koenig
When displayed on an enhanced ASCII printer it would be written König
When displayed on a fully functional printer it would be written K̖̈nig
Why do you think that K̖̈nig should be written as Koenig or König?
However, for König
Unicode specifies that the precomposed form is
König. And if you want a transcription into ASCII with the knowledge
that it's German, the result would be Koenig.
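A quick Python sketch of that normalization step, using the standard unicodedata module:

```python
import unicodedata

s = "Ko\u0308nig"                       # decomposed: o + combining diaeresis
nfc = unicodedata.normalize("NFC", s)   # precomposed form
print(nfc == "K\u00f6nig")              # True
print(len(s), len(nfc))                 # 6 5 — one code point fewer after NFC
# The German ASCII transcription (ö -> oe) is language-specific knowledge;
# Unicode normalization alone will not produce "Koenig".
```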
Anton Ertl <[email protected]> schrieb:
[email protected] (MitchAlsup1) writes:
It seems to me (in my vast ignorance) that names for things should be
written in the most appropriate set of characters in the language of
the person/thing being named.
Then when such a name is "sent out to be displayed" that it is a property
of the display what character set(s) it can properly emit, and thereby
alter the string of characters as appropriate to its capabilities.
For example:: Take > "K\u0316\u0308nig" cr type ==> K̖̈nig
When displayed on an ASCII-only line printer it would be written Koenig
When displayed on an enhanced ASCII printer it would be written König
When displayed on a fully functional printer it would be written K̖̈nig
Why do you think that K̖̈nig should be written as Koenig or König?
On my display, this read K, n with a diacritic and something close to
a cedilla under the n.
However, for König
Again, the diaresis is over the n, not the o.
Unicode specifies that the precomposed form is
König. And if you want a transcription into ASCII with the knowledge
that it's German, the result would be Koenig.
This is actually sometimes a (fairly minor) problem because the
name on my passport actually reads "König" (o-diacritic), but
people without knowledge of German tend to transcribe this as
"Konig", whereas I transcribe it as "Koenig" on official forms
such as the one I need to fill out prior to entering the US.
This is why modern EU passports have a canonical form of the
name, which then is "KOENIG".
Canonical simplification of the 'ø' character is either 'o' or 'oe', and passports and airline tickets differ, something which can cause all
sorts of issues with US passport control.
Anton Ertl <[email protected]> schrieb:
Why do you think that K̖̈nig should be written as Koenig or König?
On my display, this read K, n with a diacritic and something close to
a cedilla under the n.
However, for König
Again, the diaresis is over the n, not the o.
Terje Mathisen <[email protected]> schrieb:
Canonical simplification of the 'ø' character is either 'o' or 'oe', and
passports and airline tickets differ, something which can cause all
sorts of issues with US passport control.
Reminds me of either "Asterix and the Great Crossing" or "Asterix
and the Normans", where Viking speech was indicated by having
slashes through letters (like ø). When Obelix tries to speak
their language, he also applies slashes, but does so randomly
(like through a c) so nobody can understand him.
Hmm... a challenge, can this be represented as Unicode codepoints?
Considering the huge market for palindrome checkers, that is a
real concern, especially if they involve characters for which
UTF-32 is not sufficient, such as smileys.
Is there any language whose characters cannot be represented in
UTF-32?
A similar concept was implemented in COBOL, where the designers thought
that having to write
ADD A TO B GIVING C
or somesuch makes programming easier than writing
C = A+B
in FORTRAN.
I think people in Japan should be able to use printf by using プリントフ
There is way too much "english" in the way computers are being used.
It is similar to Anthropomorphizing animal behavior.
and
because it was supposedly "self documenting", easier for managers, etc.
to see how the program worked.
Remember back in the early 8-bit days of computing, and before them,
when schools were exposing children to PDP-8 computers?
Children were learning to program computers in BASIC.
Obviously, here, if children in other countries used modified versions
of BASIC that used keywords in their own natural language, it would be
much easier for them to get started with programming than if the
keywords were simply arbitrary strings of letters, taken from a
foreign language of which they may not necessarily have any knowledge.
If Algol was supposed to be an _international_ algorithmic language,
why weren't its keywords taken from Latin or Esperanto, instead of
English?
Historical note: Algol was originally called IAL; remember what JOVIAL
stood for.
John Savard wrote:
Historical note: Algol was originally called IAL; remember what
JOVIAL stood for.
Who was Joe ?? in Jovial
MitchAlsup1 wrote:
John Savard wrote:
Historical note: Algol was originally called IAL; remember what
JOVIAL stood for.
Who was Joe ?? in Jovial
Just in case you weren't joking,
Jules Own Version of the International Algorithmic Language
Jules was Jules Schwartz
https://en.wikipedia.org/wiki/Jules_Schwartz
Assume you're implementing a language which has a function of setting
an individual character in a string.

That's a design mistake in the language, and I know no language that
has this misfeature.

I suspect "individual character" meant "code point" above.

I meant character, not code point, as should have become clear from
the following. I think that Thomas Koenig meant "character", too, but
he may have been unaware of the difference between "character" and
"Unicode code point".
OTOH, most code can be implemented fine as working on strings, without
knowing how many characters there are in the string (and it then does
not need to know about code points, either).
Emacs Lisp has this misfeature as well (and so does Common Lisp). 🙁
It's really hard to get rid of it, even though it's used *very* rarely.
In ELisp, strings are represented internally as utf-8 (tho it pretends
to be an array of code points), so an assignment that replaces a single
char can require reallocating the array!

One way forward might be to also provide a string-oriented API with
byte (code unit) indices, and recommend that people use that instead
of the inefficient code-point-indexed API.
Because you know your string only contains "characters" made of a single
code point?
This incorrect "knowledge" may be the reason why Emacs 27.1 displays
K̖̈nig
as if the first three-code-point character actually was three characters.
E.g. your string contains the representation of the border of a table
(to be displayed in a tty), and you want to "move" the `+` of a column
separator (or a prettier version that takes advantage of the wider
choice offered by Unicode).

These kinds of things involve additional complications.
Assume you're implementing a language which has a function of setting
an individual character in a string.

That's a design mistake in the language, and I know no language that
has this misfeature.

I suspect "individual character" meant "code point" above.

I meant character, not code point, as should have become clear from
the following. I think that Thomas Koenig meant "character", too, but
he may have been unaware of the difference between "character" and
"Unicode code point".
I don't know of any language (or even library) that supports the notion
of "character" for Unicode strings. 🙁
OTOH, most code can be implemented fine as working on strings, without
knowing how many characters there are in the string (and it then does
not need to know about code points, either).
Indeed, most operations on strings are conversion of things to strings,
concatenation of strings, search (typically for a substring or a regexp),
extraction of substring where the boundaries result from an earlier
search, and parsing (which at the bottom relies often on some sort of
regexp or equivalent system).
All of those work just fine on a UTF-8 sequence of bytes.
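A sketch in Python of why byte-level search is safe here: UTF-8 is self-synchronizing, so a match of a well-formed UTF-8 needle always lands on a code-point boundary, and the resulting byte indices can be used for slicing.

```python
# Substring search on raw UTF-8 bytes.
text = "Ko\u0308nig und K\u00f6nig".encode("utf-8")
needle = "K\u00f6nig".encode("utf-8")
i = text.find(needle)
print(i)                                   # byte offset of the match
print(text[i:i + len(needle)].decode())    # König
```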
Emacs Lisp has this misfeature as well (and so does Common Lisp). 🙁
It's really hard to get rid of it, even though it's used *very* rarely.
In ELisp, strings are represented internally as utf-8 (tho it pretends
to be an array of code points), so an assignment that replaces a single
char can require reallocating the array!

One way forward might be to also provide a string-oriented API with
byte (code unit) indices, and recommend that people use that instead
of the inefficient code-point-indexed API.
I think the long term solution for ELisp will be to declare strings as
basically immutable.
Because you know your string only contains "characters" made of a single
code point?
This incorrect "knowledge" may be the reason why Emacs 27.1 displays
K̖̈nig
as if the first three-code-point character actually was three characters.
No, the above seems like a problem in the redisplay code, and that code
is quite aware of combining characters and stuff. You're probably
seeing simply a missing rule to allow composition/shaping of your word.
(the composition/shaping library operates on whole strings at a time,
but Emacs tends to be quite conservative about the string-chunks it
sends to that library).
I recommend you `M-x report-emacs-bug`. The fix should be fairly simple.
E.g. your string contains the representation of the border of a table
(to be displayed in a tty), and you want to "move" the `+` of a column
separator (or a prettier version that takes advantage of the wider
choice offered by Unicode).

These kinds of things involve additional complications.
Very much so, indeed. It usually breaks down in many different ways
because of the common-but-not-guaranteed assumptions.
Stefan
I meant character, not code point, as should have become clear from
the following. I think that Thomas Koenig meant "character", too, but
he may have been unaware of the difference between "character" and
"Unicode code point".
I don't know of any language (or even library) that supports the notion
of "character" for Unicode strings.
On Sat, 18 May 2024 17:11:32 -0000 (UTC), "Stephen Fuld" <[email protected]d> wrote:
and
because it was supposedly "self documenting", easier for managers,
etc. to see how the program worked.
Of course, if they designed COBOL that way, why did they include a
statement that let you re-direct GOTO statements from elsewhere in a
program?
I mean, that was just asking for dishonest programmers to direct the
odd pennies into their bank accounts and so on.
John Savard wrote:
On Sat, 18 May 2024 17:11:32 -0000 (UTC), "Stephen Fuld"
<[email protected]d> wrote:
and
because it was supposedly "self documenting", easier for managers,
etc. to see how the program worked.
Of course, if they designed COBOL that way, why did they include a
statement that let you re-direct GOTO statements from elsewhere in a
program?
That feature (Alter GOTO) was also in Fortran, as the, long since
deprecated, assigned GOTO statement.
Rumor has it that the AD statement was regularly abused,
Stefan Monnier <[email protected]> writes:
Does Unicode even have the notion of "character", really?
AFAIK it does not.
I don't know of any language (or even library) that supports the notion
of "character" for Unicode strings. 🙁
Algol 60 does not standardize a program representation in characters (a
grave mistake fixed by most later programming languages ...
John Savard wrote:
Historical note: Algol was originally called IAL; remember what JOVIAL
stood for.
Who was Joe ?? in Jovial
If Algol was supposed to be an _international_ algorithmic language,
why weren't its keywords taken from Latin or Esperanto, instead of
English?
On Wed, 22 May 2024 15:38:51 -0400, Stefan Monnier wrote:
I don't know of any language (or even library) that supports the notion
of "character" for Unicode strings. 🙁
Surely a “character” (or “grapheme” I think is (one of) the Unicode terms)
is (represented by) a non-combining code point combined with all the
immediately-following combining code points.
According to Lawrence D'Oliveiro <[email protected]d>:
On Wed, 22 May 2024 15:38:51 -0400, Stefan Monnier wrote:
I don't know of any language (or even library) that supports the
notion of "character" for Unicode strings. 🙁
Surely a “character” (or “grapheme” I think is (one of) the Unicode
terms) is (represented by) a non-combining code point combined with all
the immediately-following combining code points.
Take another look at the table I referred to yesterday. When you have
ZWJ the rules of what combines with what get awfully complicated.
On Mon, 27 May 2024 15:16:13 -0000 (UTC), John Levine wrote:
According to Lawrence D'Oliveiro <[email protected]d>:
On Wed, 22 May 2024 15:38:51 -0400, Stefan Monnier wrote:
I don't know of any language (or even library) that supports the
notion of "character" for Unicode strings. 🙁
Surely a “character” (or “grapheme” I think is (one of) the Unicode
terms) is (represented by) a non-combining code point combined with all
the immediately-following combining code points.
Take another look at the table I referred to yesterday. When you have
ZWJ the rules of what combines with what get awfully complicated.
ZWJ is classed as “punctuation”, and has no combining class. So it forms a
“character” or “grapheme” in its own right.
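For what it's worth, Python's unicodedata agrees that ZWJ carries no combining class (it reports the general category as Cf, a format character):

```python
import unicodedata

zwj = "\u200d"  # ZERO WIDTH JOINER
print(unicodedata.combining(zwj))   # 0 — no combining class
print(unicodedata.category(zwj))    # Cf — a format character
# Grapheme segmentation therefore cannot rely on combining classes alone:
# ZWJ emoji sequences join characters that each have class 0.
```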
Really, you need to look at that combined emoji table I told you about yesterday.
On Tue, 28 May 2024 01:25:38 -0000 (UTC), John Levine wrote:
Really, you need to look at that combined emoji table I told you about
yesterday.
I’m just telling you what the official Unicode spec says.
On Sun, 19 May 2024 15:32:49 -0600, John Savard wrote:
If Algol was supposed to be an _international_ algorithmic language,
why weren't its keywords taken from Latin or Esperanto, instead of
English?
Much of its syntax came from mathematics, which is international.
Semi-related question: are there non-English equivalents for mathematical operators like “grad”, “div” and “curl”?
Um, so am I. Those nine code point things are supposed to display
as a single little picture, regardless of what some other bit of
the spec may assert about ZWJ.
Anyway, the Emacs Lisp functions right-char (and, after testing, also
left-char, forward-char, and backward-char) support the notion of
character at least for some scripts. That may be the result of an
interaction with the redisplay code that you mention later, but in
that case it's that code that knows about characters in Unicode.
Indeed, the concept is somewhat visible, but it's not really exposed in
the language. I think what you're seeing is implemented elsewhere than
in `forward-char`, it's a part of the interactive loop which sees that
after `forward-char` you end up "in the middle" of a composition and it
moves the point further, based on information that mostly belongs to the
redisplay code.
Try `C-u 2 C-f` and I suspect you'll see that it doesn't always advance
by 2 characters but rather it advances by "2 code points + rounding up
to the next character boundary".
On Mon, 20 May 2024 11:46:20 GMT, Anton Ertl wrote:
Algol 60 does not standardize a program representation in characters (a
grave mistake fixed by most later programming languages ...
That would likely not have been considered feasible in 1960, given the
wide variation in character sets between computer systems.
Confirmed. So Emacs Lisp has a codepoint-oriented interface and then
needs to compensate for that elsewhere. This does not indicate that a
codepoint-oriented interface is a good idea, rather the opposite.
Lawrence D'Oliveiro <[email protected]d> writes:
On Mon, 20 May 2024 11:46:20 GMT, Anton Ertl wrote:
Algol 60 does not standardize a program representation in characters
(a grave mistake fixed by most later programming languages ...
That would likely not have been considered feasible in 1960, given the
wide variation in character sets between computer systems.
COBOL did it. LISP did it.
It's just that the Algol 60 committee did not want to go there.
They wanted symbols like “÷”, “×”, “↑”, “≤”, “≥”, “≠”, “≡”, “⊃”, “∨”,
“∧”, “¬” ... you get the idea. I don’t think any computer system on earth
could provide all those symbols at the time, or even, say, 20 years
later.
Lawrence D'Oliveiro wrote:
snip
They wanted symbols like [...]
See APL. So many symbols that the language is almost impossible to
read without a significant investment in learning them.
https://en.wikipedia.org/wiki/APL_syntax_and_symbols#Monadic_functions
"Stephen Fuld" <[email protected]d> writes:
Lawrence D'Oliveiro wrote:
snip
They wanted symbols like [...]
See APL. So many symbols that the language is almost impossible to
read without a significant investment in learning them.
The problem with learning APL is not the character set. APL without
any special characters (which I actually have some experience using)
is still unlike any other programming language that existed in the
1960s or 1970s.
I don’t think any computer system on earth could
provide all those symbols at the time, or even, say, 20 years later.
Tim Rentsch wrote:
"Stephen Fuld" <[email protected]d> writes:
Lawrence D'Oliveiro wrote:
snip
They wanted symbols like [...]
See APL. So many symbols that the language is almost impossible to
read without a significant investment in learning them.
https://en.wikipedia.org/wiki/APL_syntax_and_symbols#Monadic_functions
The problem with learning APL is not the character set. APL without
any special characters (which I actually have some experience using)
is still unlike any other programming language that existed in the
1960s or 1970s.
OK, but my main point was to show, by counter example, the error of
Lawrence's statement quoted below
I don't think any computer system on earth could provide all those
symbols at the time, or even, say, 20 years later.
If the part about the difficulty of learning APL was wrong, then I
apologise.
I'm not sure the codepoint-oriented API is the best option, but it's not
completely clear what *is* the best option. You mention a byte-oriented
API and you might be right that it's a better option, but in the case of
Emacs that's what we used in Emacs-20.1 but it worked really poorly
because of backward compatibility issues. I think if we started from
scratch now (i.e. without having to contend with backward compatibility,
and with a better understanding of Unicode (which barely existed back
then)) it might work better, indeed, but that's not been an option
The problem with learning APL is not the character set. APL without
any special characters (which I actually have some experience using)
is still unlike any other programming language that existed in the
1960s or 1970s.
Lawrence D'Oliveiro <[email protected]d> writes:
On Mon, 20 May 2024 11:46:20 GMT, Anton Ertl wrote:
Algol 60 does not standardize a program representation in characters (a
grave mistake fixed by most later programming languages ...
That would likely not have been considered feasible in 1960, given the >>wide variation in character sets between computer systems.
COBOL did it. LISP did it. It was feasible in 1960. It's just that
the Algol 60 committee did not want to go there.
And so did Fortran. They all did it by severely curtailing their allowed
character sets.
It's just that the Algol 60 committee did not want to go there.
They wanted symbols like “÷”, “×”, “↑”, “≤”, “≥”, “≠”, “≡”, “⊃”, “∨”,
“∧”, “¬” ... you get the idea. I don’t think any computer system on earth
could provide all those symbols at the time, or even, say, 20 years later.
If the part about the difficulty of learning APL was wrong, then I
apologise.
But they _were_ fairly U.S. - centric, and Algol was *not*. For
example,
On Thu, 30 May 2024 06:12:11 -0000 (UTC), "Stephen Fuld" <[email protected]d> wrote:
If the part about the difficulty of learning APL was wrong, then I
apologise.
I would not say that it was wrong. APL "without special characters"
was achieved by way of a transliteration scheme, where short codes
represented the special characters. So instead of memorizing funny
shapes, you memorized cryptic abbreviations.
So the character set was _still_ the source of the difficulty of
learning APL even if you happened to be using an implementation that
didn't have any special characters.
Stefan Monnier <[email protected]> writes:
I'm not sure the codepoint-oriented API is the best option, but it's not
completely clear what *is* the best option. You mention a byte-oriented
API and you might be right that it's a better option, but in the case of
Emacs that's what we used in Emacs-20.1 but it worked really poorly
because of backward compatibility issues. I think if we started from
scratch now (i.e. without having to contend with backward compatibility,
and with a better understanding of Unicode (which barely existed back
then)) it might work better, indeed, but that's not been an option
Plus, editors are among the very few uses where you have to deal with
individual characters, so the "treat it as opaque string" approach
that works so well for most other code is not good enough there. The
command-line editor of Gforth is one case where we use the xchar words
(those for dealing with code points of UTF-8).
- anton
On 5/30/2024 11:25 AM, Anton Ertl wrote:
Stefan Monnier <[email protected]> writes:
I'm not sure the codepoint-oriented API is the best option, but it's not
completely clear what *is* the best option. You mention a byte-oriented
API and you might be right that it's a better option, but in the case of
Emacs that's what we used in Emacs-20.1 but it worked really poorly
because of backward compatibility issues. I think if we started from
scratch now (i.e. without having to contend with backward compatibility,
and with a better understanding of Unicode (which barely existed back
then)) it might work better, indeed, but that's not been an option
Plus, editors are among the very few uses where you have to deal with
individual characters, so the "treat it as opaque string" approach
that works so well for most other code is not good enough there. The
command-line editor of Gforth is one case where we use the xchar words
(those for dealing with code points of UTF-8).
Yeah.
For text editors, this is one of the few cases it makes sense to use 32
or 64 bit characters (say, combining the 'character' with some
additional metadata such as formatting).
Though, one thing that makes sense for text editors is if only the
"currently being edited" lines are fully unpacked, whereas the others
can remain in a more compact form (such as UTF-8), and are then unpacked
as they come into view (say, treating the editor window as a 32-entry
modulo cache or similar).
For the rest, say, one can have, say, a big buffer, with an array of
lines giving the location and size of the line's text in the buffer.
If a line is modified, it can be reallocated at the end of the buffer,
and if the buffer gets full, it can be "repacked" and/or expanded as
needed. When written back to a file, the buffer lines can be emitted
in-order to the text file.
Not entirely sure how other text editors manage things here, not really
looked into it.
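The append-and-repack scheme described above can be sketched as a toy Python class (a hypothetical illustration; `LineBuffer` and its methods are invented names, not any real editor's API):

```python
# A toy sketch: lines live as (offset, length) entries into one big byte
# buffer; editing a line appends its new text at the end and repoints the
# table entry, leaving the old bytes as garbage until a repack.
class LineBuffer:
    def __init__(self, text):
        self.buf = bytearray()
        self.lines = []                      # (offset, length) per line
        for line in text.split("\n"):
            self._append(line)

    def _append(self, line):
        b = line.encode("utf-8")
        self.lines.append((len(self.buf), len(b)))
        self.buf += b

    def get(self, i):
        off, n = self.lines[i]
        return self.buf[off:off + n].decode("utf-8")

    def replace(self, i, new):
        b = new.encode("utf-8")
        self.lines[i] = (len(self.buf), len(b))  # repoint; others stay put
        self.buf += b

    def repack(self):
        # Rebuild the buffer, dropping bytes no line points at anymore.
        packed = LineBuffer(self.text())
        self.buf, self.lines = packed.buf, packed.lines

    def text(self):
        return "\n".join(self.get(i) for i in range(len(self.lines)))
```

Replacing line 1 of "ab\ncd" with "xyz" leaves the stale "cd" bytes in the buffer until repack(), but text() already yields "ab\nxyz".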
- anton
BGB wrote:
On 5/30/2024 11:25 AM, Anton Ertl wrote:
Stefan Monnier <[email protected]> writes:
I'm not sure the codepoint-oriented API is the best option, but it's not
completely clear what *is* the best option. You mention a byte-oriented
API and you might be right that it's a better option, but in the case of
Emacs that's what we used in Emacs-20.1 but it worked really poorly
because of backward compatibility issues. I think if we started from
scratch now (i.e. without having to contend with backward compatibility,
and with a better understanding of Unicode (which barely existed back
then)) it might work better, indeed, but that's not been an option
Plus, editors are among the very few uses where you have to deal with
individual characters, so the "treat it as opaque string" approach
that works so well for most other code is not good enough there. The
command-line editor of Gforth is one case where we use the xchar words
(those for dealing with code points of UTF-8).
Yeah.
For text editors, this is one of the few cases it makes sense to use 32
or 64 bit characters (say, combining the 'character' with some
additional metadata such as formatting).
Though, one thing that makes sense for text editors is if only the
"currently being edited" lines are fully unpacked, whereas the others
can remain in a more compact form (such as UTF-8), and are then unpacked
as they come into view (say, treating the editor window as a 32-entry
modulo cache or similar).
For the rest, say, one can have, say, a big buffer, with an array of
lines giving the location and size of the line's text in the buffer.
In a modern text editor, one can paste in {*.xls tables, *.jpg, *.gif, ..}
along with text from different fonts and different backgrounds on a per
character basis.
If a line is modified, it can be reallocated at the end of the buffer,
and if the buffer gets full, it can be "repacked" and/or expanded as
needed. When written back to a file, the buffer lines can be emitted
in-order to the text file.
Not entirely sure how other text editors manage things here, not really
looked into it.
If you think about it with the above features, you quickly realize it
is not just text anymore.
- anton
On 5/31/2024 12:21 PM, MitchAlsup1 wrote:
For the rest, say, one can have, say, a big buffer, with an array of
lines giving the location and size of the line's text in the buffer.
In a modern text editor, one can paste in {*.xls tables, *.jpg, *.gif,
..}
along with text from different fonts and different backgrounds on a per
character basis.
Errm, I think we call this a word processor, not a text editor.
Granted, text editors don't usually store font or formatting information
in the text itself, but rather it exists temporarily for things like
"syntax highlighting".
If a line is modified, it can be reallocated at the end of the buffer,
and if the buffer gets full, it can be "repacked" and/or expanded as
needed. When written back to a file, the buffer lines can be emitted
in-order to the text file.
Not entirely sure how other text editors manage things here, not really
looked into it.
If you think about it with the above features, you quickly realize it
is not just text anymore.
But, word processors are their own category...
Typically, they also have their own specialized formats (though, "big
blob of XML inside a ZIP package" seems to have become popular).
Whereas text-editors typically use plain ASCII/UTF-8/UTF-16 files...
The great "feature creep" in text editors is mostly that modern ones
support syntax highlighting and emojis.
An intermediate option would be a WYSIWYG editor that does MediaWiki or
Markdown. Though, annoyingly, there don't seem to be any that exist as
standalone desktop programs (seemingly invariably they are written in
JavaScript or similar and intended to operate inside a browser).
I might eventually need to get around to writing something like this
(mostly because I use MediaWiki notation for some of my own
documentation). Also arguably more advanced than the system used by
"info" and "man", though a tool along these lines could make sense (but
possibly as an intermediate, with an interface more like "man" but able
to jump between documents more like "info").
Also, bug hunt is annoying. Find/fix one bug, but more bugs remain...
My project is seemingly in a rather buggy state right at the moment.
But, I guess, did add things like file redirection and similar, along
with a few more standard commands.
So, in the working version, things like "cat file1 > file2"
or "program > file" and similar are now technically possible...
But, also, everything has turned into a crapstorm of crashes...
- anton
U.S.-centric vs U.S. eccentric.
http://www.cs.yale.edu/homes/perlis-alan/quotes.html
Actually I am pretty sure that "eccentric" is not a fair
characterisation of his personality, but can't resist.
BGB wrote:
On 5/31/2024 12:21 PM, MitchAlsup1 wrote:
For the rest, say, one can have, say, a big buffer, with an array of
lines giving the location and size of the line's text in the buffer.
In a modern text editor, one can paste in {*.xls tables, *.jpg, *.gif,
..}
along with text from different fonts and different backgrounds on a per
character basis.
Errm, I think we call this a word processor, not a text editor.
So, you are calling AOL e-mail editor a word processor ???
And every modern forum editor (this one not included) word processors
According to Michael S <[email protected]>:
U.S.-centric vs U.S. eccentric.
http://www.cs.yale.edu/homes/perlis-alan/quotes.html
Actually I am pretty sure that "eccentric" is not a fair
characterisation of his personality, but can't resist.
He was my thesis advisor and he was pretty eccentric. In a nice way,
but still quite a character.
[email protected] (MitchAlsup1) writes:
BGB wrote:
On 5/31/2024 12:21 PM, MitchAlsup1 wrote:
For the rest, say, one can have, say, a big buffer, with an array of
lines giving the location and size of the line's text in the buffer.

In a modern text editor, one can paste in {*.xls tables, *.jpg, *.gif, ..}
along with text from different fonts and different backgrounds on a per
character basis.
Errm, I think we call this a word processor, not a text editor.
So, you are calling AOL e-mail editor a word processor ???
Yep.
And every modern forum editor (this one not included) word processors
Yep. They're certainly not text editors along the lines of vim or emacs.