Forum: War Ensemble BBS

Concertina IV Has Arrived

From quadi@[email protected] to comp.arch on Tue May 19 20:14:37 2026

From Newsgroup: comp.arch

It had to happen?
I was not sure if it ever could happen.
There was Concertina II - an attempt at a practical ISA, unlike the
original Concertina, which was merely illustrative.
But it had a block structure, which was highly criticized. And I had to
admit it was overly complicated. And so I went on and used the Concertina
III designation again - for a CISC-like instruction set with variable-
length instructions. The price was, though, switching to register banks of
16 registers instead of 32.
The IBM 360 had banks of 16 registers, and more modern CISC designs, like
the 680x0 and the x86 have banks of only eight. Only RISC designs can
offer banks of 32 registers.
And yet it seemed so tantalizingly close - that it might just be possible, using what I've learned about squeezing an ISA into the available opcode space, to go back to banks of 32 registers.
I found it to be possible - at a price.
It could be done, but I wouldn't have much space left for 16-bit short instructions.
Even if I had lots of space for 16-bit short instructions, though, they
would still, just by being 16 bits long, where the banks of registers have
32 registers in them, be badly compromised.
And so I decided to offer only a very limited set of 16-bit short instructions, and to chiefly provide... 24-bit short instructions.
I didn't want to depart from the example of the 680x0 and the System/360
to allow instructions to start on odd bytes, but it seemed like I had no choice if I wanted to offer a reasonably complete set of short
instructions at all.
Concertina IV is described at:
http://www.quadibloc.com/arch/cw01int.htm

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Wed May 20 00:03:27 2026

From Newsgroup: comp.arch

I've made my first change to Concertina IV. I'm not happy with the way
things were before the change or the way they are now, so I may change it again.

The 16-bit short instructions only have 12 free bits available. That's not much to work with when there are 32 registers in each register bank.

Initially, I settled on four bits of opcode, along with the basic register specification scheme used for the 15-bit paired short instructions in Concertina II.
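
To make the squeeze concrete, here is a back-of-the-envelope sketch of the bit budget. It assumes plain full-width register fields (5 bits to name one of 32 registers); the function name and the flat field layout are illustrative assumptions and are not the actual Concertina IV register specification scheme.

```python
def opcode_bits_left(free_bits: int, register_fields: int,
                     registers_per_bank: int = 32) -> int:
    """Bits left for the opcode after reserving full-width register fields."""
    # 5 bits address 32 registers, 4 bits address 16 registers
    bits_per_register = (registers_per_bank - 1).bit_length()
    return free_bits - register_fields * bits_per_register


# Two-operand instruction squeezed into the 12 free bits:
with_32_registers = opcode_bits_left(12, 2)       # two 5-bit fields
with_16_registers = opcode_bits_left(12, 2, 16)   # two 4-bit fields
print(with_32_registers, with_16_registers)
```

With full 5-bit fields only two opcode bits remain, so reaching four opcode bits means restricting which registers a short instruction can name.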

But choosing single and double precision floating-point as the only two
types supported didn't rest easily with me. Single precision isn't really precise enough to be useful, or so I've heard.

The alternative of supporting 48-bit intermediate precision and double precision, while it appeals to me personally... is clearly untenable.
Medium is a nonstandard data type, and so it would not be widely used.

So instead I decided to only support double precision, and use the extra
bits to allow additional ways to specify registers.

The result, of course, is messy.

So I'm considering going back to the earlier format, but instead of
supporting two floating-point data types, to support one integer type and
one floating type. But which integer type? 32-bit integer, or 64-bit long?

I could get more bits by going to _paired_ instructions. But I have some
free space between 32-bit instructions so that I could just add those
while keeping 16-bit short instructions.

And this also led me to thinking about something else.

I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So integer operations must sign-extend if they're on values shorter than 64 bits.

Propagating a bit takes time.

So should I design the ALU so that the sign extension takes place after
the rest of the instruction, and allow another 32-bit (or shorter) integer instruction to use results when they're ready, before sign extension? Is
that just normal efficiency, or wasteful complexity?

In any case, I think I've come up with something that is a reasonable compromise I can live with after all.

John Savard

From quadi@[email protected] to comp.arch on Wed May 20 00:34:59 2026

From Newsgroup: comp.arch

On Wed, 20 May 2026 00:03:27 +0000, quadi wrote:

In any case, I think I've come up with something that is a reasonable compromise I can live with after all.

And what was that compromise?

When it comes to floating-point types, there was only one that I valued
above all the rest, so I couldn't decide what second one to use.

With integer types, on the other hand, there were two types that I
couldn't decide between.

So go with it!

Support two integer types - even with room for logical as well as
arithmetic operations - but with a more limited specification of source
and destination registers... and one floating-point type.

Done!

John Savard

From MitchAlsup@[email protected] to comp.arch on Wed May 20 01:35:01 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

I've made my first change to Concertina IV. I'm not happy with the way things were before the change or the way they are now, so I may change it again.

The 16-bit short instructions only have 12 free bits available. That's not much to work with when there are 32 registers in each register bank.

Initially, I settled on four bits of opcode, along with the basic register specification scheme used for the 15-bit paired short instructions in Concertina II.

But choosing single and double precision floating-point as the only two types supported didn't rest easily with me. Single precision isn't really precise enough to be useful, or so I've heard.

Everything you have heard is both true and false::

There are many applications where DP is de rigueur {galactic simulations} smaller precision simply will not do. Many of these would like to go
FP128 but performance is not there yet.
There is a growing demand for FP16 and FP8 data types for memory-size
and BW reasons.
There is a growing background need for FP128, too.

The alternative of supporting 48-bit intermediate precision and double precision, while it appeals to me personally... is clearly untenable.
Medium is a nonstandard data type, and so it would not be widely used.

So instead I decided to only support double precision, and use the extra bits to allow additional ways to specify registers.

My 66000 started out that way and the compiler showed that this choice sucks.

The result, of course, is messy.

No, it becomes unacceptable when FP32 takes 3 instructions while FP64
takes but 1.

So I'm considering going back to the earlier format, but instead of supporting two floating-point data types, to support one integer type and one floating type. But which integer type? 32-bit integer, or 64-bit long?

You will find you have no <marketable> choice; you need to support::

Integer{S8, S16, S32, S64, U8, U16, U32, U64}
Float {FP8, FP16, FP32, FP64 and some way to get FP128}

I could get more bits by going to _paired_ instructions. But I have some free space between 32-bit instructions so that I could just add those
while keeping 16-bit short instructions.

And this also led me to thinking about something else.

I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So integer operations must sign-extend if they're on values shorter than 64 bits.

Go LE all the way. LE won; get over BE thinking.

As far as integers go: all calculations produce proper integer values
in the 64-bit destination register.
S8 has range [-128..127]
u8 has range [0..255]
...
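
One way to read "proper integer values in the 64-bit destination register" is that a narrow result is wrapped to its type's range and then sign- or zero-extended to fill the register. The sketch below assumes that reading; `normalize` and `as_register` are hypothetical helper names, not My 66000 operations.

```python
def normalize(result: int, bits: int, signed: bool) -> int:
    """Wrap an exact result to the range of a `bits`-wide integer type."""
    low = result & ((1 << bits) - 1)      # keep only the low `bits` bits
    if signed and (low >> (bits - 1)):    # sign bit set: the value is negative
        low -= 1 << bits
    return low


def as_register(value: int) -> int:
    """64-bit two's-complement pattern that holds `value`."""
    return value & 0xFFFFFFFFFFFFFFFF


# S8: 127 + 1 wraps to -128, and the register holds it sign-extended.
s8 = normalize(127 + 1, 8, signed=True)
# U8: 255 + 1 wraps to 0, and the upper 56 bits stay zero.
u8 = normalize(255 + 1, 8, signed=False)
print(s8, hex(as_register(s8)), u8)
```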

Propagating a bit takes time.

A solved HW gate-level problem.

So should I design the ALU so that the sign extension takes place after
the rest of the instruction, and allow another 32-bit (or shorter) integer instruction to use results when they're ready, before sign extension? Is that just normal efficiency, or wasteful complexity?

All the sign and zero stuff goes "in the CARRY chain".

In any case, I think I've come up with something that is a reasonable compromise I can live with after all.

John Savard


From quadi@[email protected] to comp.arch on Wed May 20 02:09:08 2026

From Newsgroup: comp.arch

On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So
integer operations must sign-extend if they're on values shorter than
64 bits.

Go LE all the way. LE won; get over BE thinking.

a) I didn't think this really had anything to do with little-endian versus big-endian.

b) Yes, little-endian is more popular, but that's just because the PDP-11, 8080, and 6502 happened to choose it. Little-endian doesn't work as well
*if* you also want to put packed decimal values in registers.

As far as integers go: all calculations produce proper integer values in
the 64-bit destination register.
S8 has range [-128..127]
u8 has range [0..255]
...

If you have 64-bit registers, then if you want to avoid a gap between the
sign in a 32-bit number and the sign of a 64-bit number by placing the
32-bit number on the most significant side, a 32-bit 1 is equal to a 64-bit 4,294,967,296.
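
A small sketch of the two placements, for illustration only. The helper names are made up here; the point is just that a right-aligned 32-bit value needs sign extension to read correctly as a 64-bit value, while a left-aligned one needs none but is scaled by 2**32.

```python
MASK64 = (1 << 64) - 1


def right_aligned(value32: int) -> int:
    """Sign-extend a 32-bit pattern into a 64-bit register (value in the low half)."""
    value32 &= 0xFFFFFFFF
    if value32 & 0x80000000:
        value32 |= 0xFFFFFFFF00000000
    return value32


def left_aligned(value32: int) -> int:
    """Place a 32-bit pattern in the high half; the sign bits already line up."""
    return (value32 & 0xFFFFFFFF) << 32


def as_signed64(pattern: int) -> int:
    """Read a 64-bit pattern as a two's-complement integer."""
    pattern &= MASK64
    return pattern - (1 << 64) if pattern >> 63 else pattern


print(as_signed64(right_aligned(1)), as_signed64(left_aligned(1)))
```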

Propagating a bit takes time.

A solved HW gate-level problem.

That's good news, then I don't have a problem. I figured the solution
would be to use slightly slower gates with larger current output.

John Savard

From quadi@[email protected] to comp.arch on Wed May 20 02:26:42 2026

From Newsgroup: comp.arch

On Wed, 20 May 2026 02:09:08 +0000, quadi wrote:

On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So
integer operations must sign-extend if they're on values shorter than
64 bits.

Go LE all the way. LE won; get over BE thinking.

a) I didn't think this really had anything to do with little-endian
versus big-endian.

b) Yes, little-endian is more popular, but that's just because the
PDP-11, 8080, and 6502 happened to choose it. Little-endian doesn't work as well
*if* you also want to put packed decimal values in registers.

As far as integers go: all calculations produce proper integer values
in the 64-bit destination register.
S8 has range [-128..127]
u8 has range [0..255]
...

If you have 64-bit registers, then if you want to avoid a gap between
the sign in a 32-bit number and the sign of a 64-bit number by placing
the 32-bit number on the most significant side, a 32-bit 1 is equal to
a 64-bit 4,294,967,296.

While the majority of computers nowadays are little-endian, back in the
old days only a very few computers treated fixed-point numbers as
fractions in the range [-1,1) instead of as integers.

Those that did that either wasted a bit in double-word integers, or
required one to do a right shift by one bit after doing a multiplication
if you wanted the result of the multiplication to correspond with treating
the numbers as integers instead.
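
The one-bit shift can be seen in a small model. This is only a sketch under assumptions: a 32-bit word and a fractional machine that keeps the binary point directly after the sign bit in both single and double words. It is not a model of any particular historical machine.

```python
from fractions import Fraction


def as_fraction(raw: int, bits: int) -> Fraction:
    """Interpret a signed `bits`-wide integer pattern as a fraction in [-1, 1)."""
    return Fraction(raw, 1 << (bits - 1))


def fractional_product(a_raw: int, b_raw: int) -> int:
    """Double-word product on the assumed fractional machine.

    The raw integer product is shifted left one place so the binary point
    stays right after the sign bit; the low bit of the result is always 0.
    """
    return (a_raw * b_raw) << 1


def integer_product(double_raw: int) -> int:
    """Shift right by one to recover the ordinary integer product."""
    return double_raw >> 1


d = fractional_product(3, 5)
print(d, integer_product(d))
```

The always-zero low bit is the "wasted" bit; the right shift is the extra step needed to get the integer interpretation back.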

This, not little-endian versus big-endian, was what I was talking about
not doing.

John Savard

From quadi@[email protected] to comp.arch on Wed May 20 07:21:04 2026

From Newsgroup: comp.arch

On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:

Everything you have heard is both true and false::

There are many applications where DP is de rigueur {galactic
simulations} smaller precision simply will not do. Many of these would
like to go FP128 but performance is not there yet.
There is a growing demand for FP16 and FP8 data types for memory-size
and BW reasons.
There is a growing background need for FP128, too.

I'm aware of all of this.

You will find you have no <marketable> choice; you need to support::

Integer{S8, S16, S32, S64, U8, U16, U32, U64}
Float {FP8, FP16, FP32, FP64 and some way to get FP128}

I *do* intend to support them all. However, U8, U16, U32, and U64 don't
get special instructions; the compiler will just have to remember the
meaning of the condition codes for signed numbers when doing comparisons
on unsigned numbers.

Actually, though, that does mean I have to modify the conditional branch instructions. One will actually want to test for combinations of less,
equal, and greater when overflow is present, and I've assumed that some combinations can be excluded!

So in commenting on a different part of my design entirely, you've pointed
out an important flaw I will have to correct.
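
For illustration, here is one well-known way an unsigned comparison can be derived from a signed one. It is a generic identity, sketched with hypothetical helper names; the thread does not say how Concertina IV's condition codes or branch conditions are actually encoded.

```python
MASK64 = (1 << 64) - 1


def as_signed64(pattern: int) -> int:
    """Read a 64-bit pattern as a two's-complement integer."""
    pattern &= MASK64
    return pattern - (1 << 64) if pattern >> 63 else pattern


def unsigned_less(a: int, b: int) -> bool:
    """a < b as unsigned 64-bit values, using only a signed comparison.

    If the top bits agree, signed and unsigned order are the same.
    If they differ, the signed answer is exactly inverted.
    """
    a &= MASK64
    b &= MASK64
    signed_less = as_signed64(a) < as_signed64(b)
    top_bits_differ = bool(((a ^ b) >> 63) & 1)
    return signed_less != top_bits_differ


samples = [0, 1, 2**63 - 1, 2**63, 2**64 - 1]
print(all(unsigned_less(a, b) == (a < b) for a in samples for b in samples))
```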

It's just that the pigeonhole principle prevents me, quite effectively,
from supporting them all *in 16-bit short instructions with only 12 bits available*. I don't care what marketing says; I believe engineering when
they say they can't do the impossible.

John Savard

From anton@[email protected] (Anton Ertl) to comp.arch on Wed May 20 05:38:07 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

b) Yes, little-endian is more popular, but that's just because the PDP-11, 8080, and 6502 happened to choose it.

Thinking about it:

* The last descendant of the PDP-11 was canceled long before the most
prominent big-endian architecture (SPARC) was canceled, and long
before Power switched its Linux support to little-endian, so the
PDP-11 had little, if any, influence on the outcome.

* 8080: Yes, because AMD64 inherited its byte order from it. But if
we go to the origin here, it's not the 8080 and not the 8008, but
the Datapoint 2200, which is remarkable, because it was designed as
a terminal for mainframes, and S/360 is big-endian.
<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:

|The fact that most laptops and cloud computers today store numbers
|in little-endian format is carried forward from the original
|Datapoint 2200. Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries. Microprocessors descended from the
|Datapoint 2200 (the 8008, Z80, and the x86 chips used in most
|laptops and cloud computers today) kept the little-endian format
|used by that original Datapoint 2200.

* 6502: Yes, because ARM A64 inherited its byte order from it. The
6502 is remarkable because it is a child of the 6800, which is
big-endian. So the choice of little-endian byte order was
deliberate.

RISC-V inherits its original byte order from the descendents of 8080
and 6502. The ISA manual comments on this:

|We originally chose little-endian byte ordering for the RISC-V memory
|system because little-endian systems are currently dominant
|commercially (all x86 systems; iOS, Android, and Windows for ARM). A
|minor point is that we have also found little-endian memory systems to
|be more natural for hardware designers. However, certain application
|areas, such as IP networking, operate on big-endian data structures,
|and certain legacy code bases have been built assuming big-endian
|processors, so we have defined big-endian and bi-endian variants of
|RISC-V.
[...]
|We further make the instruction parcels themselves little-endian to
|decouple the instruction encoding from the memory system endianness
|altogether.

I expect that big-endian RISC-V's will be as common as big-endian
Alphas and big-endian ARMs (all Alphas and ARMs after a certain point
in time support a big-endian mode), i.e., not at all.

Little-endian doesn't work as well
*if* you also want to put packed decimal values in registers.

It certainly does. I know it because we had a group exercise in
assembly language on 80286s that dealt with BCD numbers, and we split
the project into submodules, one for each student. In integration
testing we found that we had forgotten to specify the byte order in
our interface descriptions. In our group, two students (including
me) had chosen little-endian and IIRC two had chosen big-endian. I
saw no sign that doing the BCD stuff in little-endian byte order
worked poorly.

With the BCD support of instruction sets typically requiring piecing
together the complete operation of suboperations of less than full
length (e.g., bytes on the 6502 and the 80(2)86), little-endian is
actually easier. When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go
backwards from there. This is especially relevant if you do not want
to completely unroll the loop that handles these bytes.
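
The loop shape described above can be sketched as follows. This is an illustrative model of byte-at-a-time packed BCD addition with a carry, not 6502 or 8086 code; the function name and the byte layout (two digits per byte, least significant byte at the lowest address) are assumptions.

```python
def add_packed_bcd_le(a: bytes, b: bytes) -> tuple:
    """Add two equal-length little-endian packed BCD numbers.

    Returns (sum_bytes, carry_out). The loop walks ascending addresses,
    so it never has to find the end of the number first.
    """
    assert len(a) == len(b)
    out = bytearray(len(a))
    carry = 0
    for i in range(len(a)):
        low = (a[i] & 0x0F) + (b[i] & 0x0F) + carry
        low_carry = 1 if low > 9 else 0
        if low_carry:
            low -= 10                      # decimal adjust of the low digit
        high = (a[i] >> 4) + (b[i] >> 4) + low_carry
        carry = 1 if high > 9 else 0
        if carry:
            high -= 10                     # decimal adjust of the high digit
        out[i] = (high << 4) | low
    return bytes(out), carry


# 1999 + 0001, stored least significant byte first
total, carry_out = add_packed_bcd_le(bytes([0x99, 0x19]), bytes([0x01, 0x00]))
print(total.hex(), carry_out)
```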

Note that the 6502 includes BCD support with its decimal mode, and the designers of the 6502 obviously did not agree with the claim you made
above.

When the 8080 added BCD support in form of the DAA instruction (the
8086 added DAS), the byte order decision had already been made with
the Datapoint 2200, but if they really thought that decimal operation
is a good reason for big-endian byte order, they could have done what
the 6502 had done and switched the byte order around from its
ancestors.

On the other hand, given that the 6502 and 8080 BCD support worked on
bytes, the programmers were free to choose any byte order they prefer,
as our student project proved. Maybe some (how many?) of the
programmers who wrote BCD code for the 6502 and for the 8080 and its descendants actually chose a big-endian format. Things get more
interesting if the granularity of BCD support is bigger than a byte,
e.g., on the HPPA or IIRC S/360.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From Thomas Koenig@[email protected] to comp.arch on Wed May 20 08:10:10 2026

From Newsgroup: comp.arch

Anton Ertl <[email protected]> schrieb:

* 8080: Yes, because AMD64 inherited its byte order from it. But if
we go to the origin here, it's not the 8080 and not the 8008, but
the Datapoint 2200, which is remarkable, because it was designed as
a terminal for mainframes, and S/360 is big-endian.
<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:

|The fact that most laptops and cloud computers today store numbers
|in little-endian format is carried forward from the original
|Datapoint 2200. Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries. Microprocessors descended from the
|Datapoint 2200 (the 8008, Z80, and the x86 chips used in most
|laptops and cloud computers today) kept the little-endian format
|used by that original Datapoint 2200.

For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.

See https://www.righto.com/2014/12/inside-intel-1405-die-photos-of-shift.html
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From Bernd Linsel@[email protected] to comp.arch on Wed May 20 10:42:44 2026

From Newsgroup: comp.arch

On 5/20/26 04:09, quadi wrote:

b) Yes, little-endian is more popular, but that's just because the
PDP-11, 8080, and 6502 happened to choose it. Little-endian doesn't work as well *if* you also want to put packed decimal values in registers.

For packed decimals that are processed in memory, little endian is
superior to big endian, because you don't have to look for the LSB when performing an addition, you can proceed bytewise on ascending addresses.

As a consequence, packed decimals in registers should also be little
endian, conceding that the classic byte-wise representation is
skewed (but when displaying words, the reading order is natural).
--
Bernd Linsel

From anton@[email protected] (Anton Ertl) to comp.arch on Wed May 20 08:36:05 2026

From Newsgroup: comp.arch

Thomas Koenig <[email protected]> writes:

Anton Ertl <[email protected]> schrieb:

<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:

|[...] Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries.

[...]

For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.

Looks plausible at first, but when I think about it some more, both
claims are wrong.

Yes, you start with the least significant bit, but given that the
architecture is not bit-addressed, this is irrelevant.

The architecture is byte-addressed, and the ALU only works on a single
byte, so the ALU does not work any better for little-endian than for big-endian.

For the 6502 dealing with carries in addressing, both in the relative addressing of conditional branches, and in the indexed addressing
modes with 16-bit base addresses, little-endian made the
implementation a little simpler. The Datapoint 2200 does not have
indexed addressing modes, so relative branches may have been the issue
(if the DataPoint 2200 has them).
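
A simplified model of the indexed-addressing point, for illustration: with the low byte of the base address stored first, the 8-bit add can begin as soon as the first operand byte arrives, and the carry is known by the time the high byte is fetched. The function below is a sketch under that assumption and is not a cycle-accurate description of the 6502.

```python
def indexed_address_le(memory: bytes, operand_at: int, index: int) -> int:
    """Compute base + index for a 16-bit little-endian base address,
    consuming the operand bytes in the order they are fetched."""
    low = memory[operand_at]             # first fetch: low byte of the base
    low_sum = low + (index & 0xFF)       # 8-bit add can start immediately
    carry = low_sum >> 8
    high = memory[operand_at + 1]        # second fetch: high byte of the base
    high_sum = (high + carry) & 0xFF     # carry is already available here
    return (high_sum << 8) | (low_sum & 0xFF)


# Base address 0x12F0 stored low byte first, index register holds 0x20
address = indexed_address_le(bytes([0xF0, 0x12]), 0, 0x20)
print(hex(address))
```

With big-endian operand order, the high byte would arrive first and would have to be held until the low-byte add had produced its carry.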

Did I miss any other reason why little-endian byte order is easier to
implement on these processors than big-endian?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From Thomas Koenig@[email protected] to comp.arch on Wed May 20 10:37:39 2026

From Newsgroup: comp.arch

Anton Ertl <[email protected]> schrieb:

Thomas Koenig <[email protected]> writes:

Anton Ertl <[email protected]> schrieb:

<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:

|[...] Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries.

[...]

For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.

Looks plausible at first, but when I think about it some more, both
claims are wrong.

Unfortunately, you are mistaken.

Yes, you start with the least significant bit, but given that the architecture is not bit-addressed, this is irrelevant.

JMP with a two-byte address was little-endian on the Datapoint 2200,
and so had to be on the Intel 8008, which had to be binary compatible
with the TTL CPU of the 2200.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From scott@[email protected] (Scott Lurndal) to comp.arch on Wed May 20 15:03:22 2026

From Newsgroup: comp.arch

[email protected] (Anton Ertl) writes:

quadi <[email protected]d> writes:

b) Yes, little-endian is more popular, but that's just because the PDP-11, 8080, and 6502 happened to choose it.

Thinking about it:

With the BCD support of instruction sets typically requiring piecing
together the complete operation of suboperations of less than full
length (e.g., bytes on the 6502 and the 80(2)86), little-endian is
actually easier. When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go
backwards from there. This is especially relevant if you do not want
to completely unroll the loop that handles these bytes.

The B3500 had a clever algorithm for adding BCD numbers. The
addend and augend could each be from 1 to 100 digits in length.
The algorithm would start adding from the lowest (most significant
digit in the longest operand) address of each operand adding
each digit in turn.

"The processor uses an adder that accumulates two fields
from the most significant to the least significant digit
positions. Reverse addition, as incorporated in the
B2500 and B3500 systems has the advantage of detecting
an overflow condition prior to altering the receiving field"

The algorithm used a 9's counter to track the leading
digits.
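
As a rough illustration of how most-significant-first addition can work at all, here is a generic sketch that defers runs of 9s until it is known whether a carry will ripple back through them. It is an assumption-laden reconstruction of the general idea, not the B2500/B3500 hardware algorithm; the function name and the list-of-digits representation are made up for the example.

```python
def add_digits_msd_first(a, b):
    """Add two equal-length digit lists, most significant digit first.

    Returns (result_digits, overflow). A digit is committed to the output
    only once no later carry can change it.
    """
    out = []
    pending = None   # last digit < 9 that a later carry could still bump
    nines = 0        # run of 9s following `pending`
    overflow = False
    for da, db in zip(a, b):
        s = da + db
        if s == 9:
            nines += 1                      # outcome depends on later digits
        elif s < 9:
            if pending is not None:
                out.append(pending)         # no carry can reach it any more
            out.extend([9] * nines)
            pending, nines = s, 0
        else:
            if pending is None:
                overflow = True             # carry out of the top digit;
                                            # nothing has been committed yet
            else:
                out.append(pending + 1)
            out.extend([0] * nines)         # the run of 9s rolls over to 0s
            pending, nines = s - 10, 0
    if pending is not None:
        out.append(pending)
    out.extend([9] * nines)
    return out, overflow


print(add_digits_msd_first([1, 9, 9, 9], [0, 0, 0, 1]))
```

In this sketch an overflow can only be flagged while the output is still empty, which matches the advantage described in the quoted manual text.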

From quadi@[email protected] to comp.arch on Wed May 20 15:28:16 2026

From Newsgroup: comp.arch

On Wed, 20 May 2026 05:38:07 +0000, Anton Ertl wrote:

* The last descendant of the PDP-11 was canceled long before the most
prominent big-endian architecture (SPARC) was canceled, and long
before Power switched its Linux support to little-endian, so the
PDP-11 had little, if any, influence on the outcome.

The reason I blame the PDP-11 for everything is that it was a hugely influential machine. It was widely used in academic settings, and it was
also the machine for which UNIX was first widely distributed.

When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go backwards from there. This is especially relevant if you do not want to completely unroll the loop that handles these bytes.

This is the reason little-endian was popular for small processors. It is
no longer relevant if a processor has a 64-bit data bus. And, of course,
it applies equally to binary and BCD.

The reason I claim that BCD support strongly favors big-endian byte order
is this:

Character strings are, of course, in "big endian" order; that is,
normally, a character string is written in memory with successive
characters at increasing addresses - and, at least in languages that are written from left to right, numerals appear in texts with the most
significant digit first.

So if one has a hardware instruction to convert from BCD to the string representation of numbers, such as UNPK or EDIT, then those two representations should have the same endian-ness.

And if one wants to use the same ALU for binary and BCD arithmetic, then
those have to have the same endianness.

John Savard

From scott@[email protected] (Scott Lurndal) to comp.arch on Wed May 20 15:32:41 2026

From Newsgroup: comp.arch

Bernd Linsel <[email protected]> writes:

On 5/20/26 04:09, quadi wrote:

b) Yes, little-endian is more popular, but that's just because the
PDP-11, 8080, and 6502 happened to choose it. Little-endian doesn't work as well *if* you also want to put packed decimal values in registers.

For packed decimals that are processed in memory, little endian is
superior to big endian, because you don't have to look for the LSB when
performing an addition, you can proceed bytewise on ascending addresses.

Burroughs figured that problem out a half century ago, and were
able to add two big-endian BCD numbers memory-to-memory. Overflow
was detected before the receiving field was modified (without
intermediate or internal storage) by counting leading 9s.


From anton@[email protected] (Anton Ertl) to comp.arch on Wed May 20 11:04:44 2026

From Newsgroup: comp.arch

Thomas Koenig <[email protected]> writes:

Anton Ertl <[email protected]> schrieb:

Thomas Koenig <[email protected]> writes:

Anton Ertl <[email protected]> schrieb:

<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:

|[...] Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries.

[...]

For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.

Looks plausible at first, but when I think about it some more, both
claims are wrong.

Unfortunately, you are mistaken.

A claim without any supporting argument.

Yes, you start with the least significant bit, but given that the
architecture is not bit-addressed, this is irrelevant.

JMP with a two-byte address was little-endian on the Datapoint 2200,

Yes, but is the bit-serial memory the reason for that? No, the ALU is
not involved, and they could just have decided to represent the
address in big-endian byte order, and load the 16 bits into the PC (or
next-PC) register.

The conditional jump instructions of the Datapoint 2200 also have
absolute target addresses and don't involve the ALU.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From anton@[email protected] (Anton Ertl) to comp.arch on Wed May 20 15:42:03 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

On Wed, 20 May 2026 05:38:07 +0000, Anton Ertl wrote:

* The last descendant of the PDP-11 was canceled long before the most
prominent big-endian architecture (SPARC) was canceled, and long
before Power switched its Linux support to little-endian, so the
PDP-11 had little, if any, influence on the outcome.

The reason I blame the PDP-11 for everything is that it was a hugely influential machine. It was widely used in academic settings, and it was also the machine for which UNIX was first widely distributed.

But its byte order was not influential into this century. Unix and
its applications are portable, including between byte orders (or at
least they were, when there were still enough machines of either byte
order around that one could test that). And somehow the PDP-11 and
its offspring did not capture the workstation market and the server
market that evolved from that, and which constituted the Unix
markets.

Instead, the big-endian 68000 and its offspring dominated that market
for a while, and was replaced with RISCs later, which had the same
byte order as the earlier machines from the same company (i.e.,
little-endian for DEC and big-endian for the others). And when the
market for workstations and servers on RISCs shrank to almost
nothing, not only did these big-endian machines vanish, but the
offspring of the PDP-11 as well (and actually before some of the
big-endian RISCs). What remains of this world is AIX on Power, and I
have no idea how many installations there still are.

Linux on Power was switched to little-endian with the introduction of OpenPower, not because of the PDP-11 descendants, but because of the
Datapoint 2200 descendants. And the Datapoint 2200 (announced in June
1970) was probably not influenced by the PDP-11 (announced in January
1970).

When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go
backwards from there. This is especially relevant if you do not want to
completely unroll the loop that handles these bytes.

This is the reason little-endian was popular for small processors. It is
no longer relevant if a processor has a 64-bit data bus. And, of course,
it applies equally to binary and BCD.

If the numbers fit in one granule, yes, that benefit does not matter.
But 64 bits are not enough for all binary numbers and probably not for
all BCD numbers, either: the decimal FP people were not satisfied with
the 15-digit mantissa that are easily possible with their
representations in 64 bits; they did not even define a decimal64
format last I checked. So will 16-digit BCD numbers be satisfactory?

The reason I claim that BCD support strongly favors big-endian byte order
is this:

Character strings are, of course, in "big endian" order; that is,
normally, a character string is written in memory with successive
characters at increasing addresses - and, at least in languages that are written from left to right, numerals appear in texts with the most significant digit first.

So if one has a hardware instruction to convert from BCD to the string representation of numbers, such as UNPK or EDIT, then those two representations should have the same endian-ness.

Reality check: Modern architectures tend to have byte-swap and shuffle instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.

For doing it for more than one granule, you have to pay the big-endian
cost on that conversion (for storing into the string, the loading of
the BCD number would still be in little-endian order), but at least
not for the arithmetic operations.

And if one wants to use the same ALU for binary and BCD arithmetic, then
those have to have the same endianness.

Sure, but that's not a reason to use big-endian byte order, see above.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@[email protected] to comp.arch on Wed May 20 17:01:56 2026

From Newsgroup: comp.arch

Anton Ertl <[email protected]> schrieb:

Thomas Koenig <[email protected]> writes:

Anton Ertl <[email protected]> schrieb:

Thomas Koenig <[email protected]> writes:

Anton Ertl <[email protected]> schrieb:

<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:

|[...] Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries.

[...]

For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.

Looks plausible at first, but when I think about it some more, both
claims are wrong.

Unfortunately, you are mistaken.

A claim without any supporting argument.

Then maybe some more explanation is needed. It is sometimes difficult
to think back to the limitations those designers faced.

The 2200 did not have byte-addressable memory; memory contents only
could be used when they bubbled up through the shift registers.
Otherwise, the CPU had to wait. (It was a silicon version of the
mercury delay lines of the UNIVAC I).

So, how do you add or subtract values in memory? From low to high
value, saving carries. You then have a choice of either loading
them in sequence, in a single go, or to load the high value,
wait for half a microsecond and then load the low value.

Would you build such a machine in big-endian or little-endian?

(And yes, it seems negative branches could take ~ 500 cycles, as
well.)
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Wed May 20 17:25:21 2026

From Newsgroup: comp.arch

On Wed, 20 May 2026 15:42:03 +0000, Anton Ertl wrote:

quadi <[email protected]d> writes:

Character strings are, of course, in "big endian" order; that is,
normally, a character string is written in memory with successive
characters at increasing addresses - and, at least in languages that are
written from left to right, numerals appear in texts with the most
significant digit first.

So if one has a hardware instruction to convert from BCD to the string
representation of numbers, such as UNPK or EDIT, then those two
representations should have the same endian-ness.

Reality check: Modern architectures tend to have byte-swap and shuffle instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.

An additional instruction is an additional instruction! But I think you
simply mean that the hardware is present. I'm not saying that BCD can't be implemented in a little-endian architecture; I'm saying it's much easier
to understand and define when BCD and character strings and binary all go
the same way - and the byte order of character strings is fixed.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Wed May 20 17:47:59 2026

From Newsgroup: comp.arch

According to Bernd Linsel <[email protected]>:

For packed decimals that are processed in memory, little endian is
superior to big endian, because you don't have to look for the LSB when
performing an addition, you can proceed bytewise on ascending addresses.

It depends what you're doing. If you're doing arithmetic, you need to start at the low end. If you're packing or unpacking or editing for display, you need to start at the high end. My understanding is that back in the day when performance
mattered, the applications that used BCD arithmetic typically did one arithmetic
operation on each value, so the pack/edit mattered more.
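To make the "arithmetic starts at the low end" point concrete, here is a minimal Python sketch of packed-decimal addition on a little-endian layout. The function names and the equal-length, unsigned operands are assumptions for illustration only; no real machine's instruction is being modelled.

```python
from typing import Tuple


def bcd_add_le(a: bytes, b: bytes) -> Tuple[bytes, int]:
    """Add two equal-length packed-BCD numbers stored low byte first.

    Returns the packed sum and the carry out of the top digit.
    """
    assert len(a) == len(b)
    out = bytearray()
    carry = 0
    for x, y in zip(a, b):                  # ascending addresses, low byte first
        lo = (x & 0x0F) + (y & 0x0F) + carry
        carry, lo = divmod(lo, 10)          # decimal carry out of the low digit
        hi = (x >> 4) + (y >> 4) + carry
        carry, hi = divmod(hi, 10)          # decimal carry out of the high digit
        out.append((hi << 4) | lo)
    return bytes(out), carry


def bcd_add_be(a: bytes, b: bytes) -> Tuple[bytes, int]:
    """Same addition for a big-endian layout: walk from the last byte back."""
    result, carry = bcd_add_le(a[::-1], b[::-1])
    return result[::-1], carry
```

With the little-endian layout the loop can start at the operand address itself; the big-endian wrapper first has to get to the far end of the field, which is the cost being argued about here.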

Having looked into this in some detail: when IBM used big-endian order on
S/360 and DEC used little-endian on the PDP-11, neither documented the
reasons for the byte order choice at all. Not even a little bit.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Wed May 20 18:07:14 2026

From Newsgroup: comp.arch

According to Scott Lurndal <[email protected]>:

The B3500 had a clever algorithm for adding BCD numbers. The
addend and augend could each be from 1 to 100 digits in length.
The algorithm would start adding from the lowest (most significant
digit in the longest operand) address of each operand adding
each digit in turn.

"The processor uses an adder that accumulates two fields
from the most significant to the least significant digit
positions. Reverse addition, as incorporated in the
B2500 and B3500 systems has the advantage of detecting
an overflow condition prior to altering the receiving field"

The algorithm used a 9's counter to track the leading
digits.

How did it handle carries? Let's say you're adding

099999999999999999999999999999999999999999999999999
000000000000000000000000000000000000000000000000001

If it starts at the high digit, it won't know until it gets to the end
that it has to propagate carries all the way back to the beginning.

S/360 had operand lengths in the instructions so even though it
addressed the high byte, it could do one add and get the address
of the low byte. On S/370 and later machines with virtual memory
it was more complicated since it had to check and be sure that all
of the pages where the operands resided were available.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Wed May 20 17:30:49 2026

From Newsgroup: comp.arch

Thomas Koenig <[email protected]> writes:

The 2200 did not have byte-addressable memory; memory contents only
could be used when they bubbled up through the shift registers.
Otherwise, the CPU had to wait. (It was a silicon version of the
mercury delay lines of the UNIVAC I).

So, how do you add or subtract values in memory? From low to high
value, saving carries. You then have a choice of either loading
them in sequence, in a single go, or to load the high value,
wait for half a microsecond and then load the low value.

The Datapoint 2200 has only instructions for adding or subtracting the
bits of a byte. For adding two 16-bit values X and Y, you load the
LSB of X and the LSB of Y, add them, store the result, load the MSB of
X and MSB of Y, adc them, and store the result.
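The add/adc sequence just described can be sketched in Python as follows. The helper names are hypothetical, and the 8-bit accumulator is modelled with plain integers; this only illustrates the carry chaining, not actual Datapoint 2200 code.

```python
def add8(a: int, b: int, carry_in: int = 0):
    """8-bit add with carry in; returns (result byte, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16_via_bytes(x: int, y: int):
    """Add two 16-bit values using only 8-bit operations."""
    x_lo, x_hi = x & 0xFF, x >> 8
    y_lo, y_hi = y & 0xFF, y >> 8
    lo, carry = add8(x_lo, y_lo)            # plain add on the low bytes
    hi, carry = add8(x_hi, y_hi, carry)     # adc on the high bytes
    return (hi << 8) | lo, carry
```

Note that nothing in the sketch cares where the low and high bytes sit in memory; the only ordering constraint is that the low bytes are added before the high bytes.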

Given that you have only HL for memory access, and several registers,
if the LSBs and MSBs are adjacent, you probably first want to load the
LSB and MSB of X (and in that case, there is no preferred order), and
add the LSB of Y, move A to some other register, then move the MSB of
X into A, and adc the MSB of Y, then store the LSB and MSB of the
result (again, no preferred order). And note that for any new address
you access, you have to change at least L between the memory accesses,
and maybe also H.

Even with that kind of drum-like memory, how will little-endian
provide a benefit? At best in the memory accesses to Y, but only if
the other stuff that is going on between these two memory accesses
does not advance the memory chip across the MSB (if the MSB is
actually in the same memory chip as the LSB).

And in any case, this is pure software convention. There is nothing
in the architecture that tells programmers how to arrange the two
bytes of a 16-bit data number. They could also do an array for the
LSBs and an array for the MSBs (structure-of-array style), and then
one would not need so many registers for intermediate storage. Load
LSB of X, (update L), add LSB of Y, (update L), store LSB of the
result, then (update L and maybe H), load MSB of X, (update L), adc
MSB of Y, (update L) store the MSB of the result.

The only thing in the architecture that actually specifies
little-endian byte order is in the control-flow instructions where the
byte order of the target address is little-endian. But bit-serial
memory is not the reason for that, implementing these instructions
with a big-endian target address would have been just as fast and just
as hard.

Would you build such a machine in big-endian or little-endian?

It's not about what I would do, but about what is little-endian about
the Datapoint 2200, and if there were technical reasons for that. I
don't see any.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Wed May 20 18:13:22 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So
integer operations must sign-extend if they're on values shorter than
64 bits.

Go LE all the way. LE won; get over BE thinking.

a) I didn't think this really had anything to do with little-endian versus big-endian.

b) Yes, little-endian is more popular, but that's just because the PDP-11, 8080, and 6502 happened to choose it. Little-endian doesn't work as well *if* you also want to put packed decimal values in registers.

BE's advantage is only when packed decimal is both not a power of 2 in
size, and residing in memory. Once in a register those advantages vanish.
One could make a LE in MEM PD solution work with modern resource counts,
too.

As far as integers go: all calculations produce proper integer values in the 64-bit destination register.
s8 has range [-128..127]
u8 has range [0..255]
...

If you have 64 bit registers, then if you want to avoid a gap between the
sign in a 32-bit number and the sign of a 64-bit number by placing the
32-bit number on the most significant side, a 32-bit 1 is equal to a
64-bit 4,294,967,296.

Propagating a bit takes time.

A solved HW gate-level problem.

That's good news, then I don't have a problem. I figured the solution
would be to use slightly slower gates with larger current output.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From scott@[email protected] (Scott Lurndal) to comp.arch on Wed May 20 19:03:01 2026

From Newsgroup: comp.arch

John Levine <[email protected]> writes:

According to Scott Lurndal <[email protected]>:

The B3500 had a clever algorithm for adding BCD numbers. The
addend and augend could each be from 1 to 100 digits in length.
The algorithm would start adding from the lowest (most significant
digit in the longest operand) address of each operand adding
each digit in turn.

"The processor uses an adder that accumulates two fields
from the most significant to the least significant digit
positions. Reverse addition, as incorporated in the
B2500 and B3500 systems has the advantage of detecting
an overflow condition prior to altering the receiving field"

The algorithm used a 9's counter to track the leading
digits.

How did it handle carries? Let's say you're adding

099999999999999999999999999999999999999999999999999
000000000000000000000000000000000000000000000000001

A value that overflows the size of the receiving field
cannot be represented, so the overflow toggle is set and
the instruction terminates _without modifying the
receiving field_.

The size of the receiving field is the larger of the
two source fields. So

ADD 0508 000000 100000 200000

would add the 5 digit value at address 0 to the
8 digit value at address 100000 and store the
result at address 200000.

If it starts at the high digit, it won't know until it gets to the end
that it has to propagate carries all the way back to the beginning.

Actually, that's the clever part. They count 9s.

Example 1: 10 digit receiving field, 10 digit addend, 1 digit augend:

Memory contents before:

000000: 9999999999
000010: 1

ADD 1001 000000 000010 000020

The result of the instruction is that the overflow toggle
will be set and the destination field will remain unmodified.

The algorithm implicitly fills leading zeros into
the shorter operand.

The first digit of the addend operand is read. '9' in
this case. The first digit of the augend is added (in this
case, implicitly zero) and the result is 9. A special
register (the 9's counter) is incremented and the algorithm
proceeds to the next digit. Wash, rinse and repeat until
reaching the last digit, where the sum of 9 + 1 will overflow
a single digit, so the instruction terminates with overflow.

If in the case you showed above, there was a zero in the
first digit of both operands, there is no possibility of
overflow and the algorithm will simply process each
digit of the addend+augend sequentially from higher
magnitude to lower magnitude. It delays writing each
digit of the sum (other than the last) until it knows
the following digit doesn't overflow. If it does
overflow, it increments the delayed value before
writing. To the extent that there are multiple sequential
9s in the sum, when the next digit would overflow, the
processor uses the 9's counter and the saved digit to
store the correct digits to the receiving field.

There's a flow chart in 1025475_B2500_B3500_RefMan_Oct69.pdf
which is available on bitsavers.
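As a rough illustration of the 9s-counting idea described above, here is a Python sketch. It is my reading of the description in this post, not a transcription of the flow chart in the manual; the function name, the digit-string operands and the returned tuple are all assumptions.

```python
from typing import Optional, Tuple


def add_msd_first(addend: str, augend: str) -> Tuple[Optional[str], bool]:
    """Add two decimal digit strings, most significant digit first.

    Returns (sum digits, overflow). On overflow nothing has been
    written yet, so the sum is None.
    """
    n = max(len(addend), len(augend))
    a, b = addend.zfill(n), augend.zfill(n)   # implicit leading zeros
    out = []            # digits already committed to the receiving field
    pending = None      # one delayed digit, not yet written
    nines = 0           # run of 9s seen after the delayed digit
    for da, db in zip(a, b):
        s = int(da) + int(db)
        if s == 9:
            nines += 1                        # a later carry could ripple through
        elif s < 9:
            # No carry can pass this digit: commit what was held back.
            if pending is not None:
                out.append(pending)
            out.extend([9] * nines)
            pending, nines = s, 0
        else:
            # Carry: bump the delayed digit, the 9s behind it become 0s.
            if pending is None:
                return None, True             # carry out of the top: overflow
            out.append(pending + 1)
            out.extend([0] * nines)
            pending, nines = s - 10, 0
    if pending is not None:
        out.append(pending)
    out.extend([9] * nines)
    return "".join(str(d) for d in out), False
```

In this sketch an overflow can only be detected while nothing has been committed, which matches the claim that the receiving field is left unmodified. A long run of 9s is held in the counter and written in one burst once the following digit settles the carry question.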
--- Synchronet 3.22a-Linux NewsLink 1.2

From Terje Mathisen@[email protected] to comp.arch on Wed May 20 21:33:05 2026

From Newsgroup: comp.arch

Anton Ertl wrote:

quadi <[email protected]d> writes:

On Wed, 20 May 2026 05:38:07 +0000, Anton Ertl wrote:

* The last descendent of the PDP-11 was canceled long before the most
prominent big-endian architecture (SPARC) was canceled, and long
before Power switched its Linux support to little-endian, so the
PDP-11 had little, if any, influence on the outcome.

The reason I blame the PDP-11 for everything is that it was a hugely
influential machine. It was widely used in academic settings, and it was
also the machine for which UNIX was first widely distributed.

But its byte order was not influential into this century. Unix and
its applications are portable, including between byte orders (or at
least they were, when there were still enough machines of either byte
order around that one could test that). And somehow the PDP-11 and
its offspring did not capture the workstation market and the server
market that evolved from that, and which constituted the Unix
markets.

Instead, the big-endian 68000 and its offspring dominated that market
for a while, and was replaced with RISCs later, which had the same
byte order as the earlier machines from the same company (i.e.,
little-endian for DEC and big-endian for the others). And when the
market for workstations and servers on RISCs shrank down to almost
nothing, not only did these big-endian machines vanish, but the
offspring of the PDP-11 as well (and actually before some of the
big-endian RISCs). What remains of this world is AIX on Power, and I
have no idea how many installations there still are.

Linux on Power was switched to little-endian with the introduction of
OpenPower, not because of the PDP-11 descendants, but because of the
Datapoint 2200 descendants. And the Datapoint 2200 (announced in June
1970) was probably not influenced by the PDP-11 (announced in January
1970).

When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go
backwards from there. This is especially relevant if you do not want to
completely unroll the loop that handles these bytes.

This is the reason little-endian was popular for small processors. It is
no longer relevant if a processor has a 64-bit data bus. And, of course,
it applies equally to binary and BCD.

If the numbers fit in one granule, yes, that benefit does not matter.
But 64 bits are not enough for all binary numbers and probably not for
all BCD numbers, either: the decimal FP people were not satisfied with
the 15-digit mantissa that are easily possible with their
representations in 64 bits; they did not even define a decimal64
format last I checked. So will 16-digit BCD numbers be satisfactory?

ieee754 does define decimal64, decimal128 and even decimal32, but the
first two have pretty much all the actual usage, probably (?) decimal128
as the majority, at least for all accumulators.

The reason I claim that BCD support strongly favors big-endian byte order
is this:

Character strings are, of course, in "big endian" order; that is,
normally, a character string is written in memory with successive
characters at increasing addresses - and, at least in languages that are
written from left to right, numerals appear in texts with the most
significant digit first.

So if one has a hardware instruction to convert from BCD to the string
representation of numbers, such as UNPK or EDIT, then those two
representations should have the same endian-ness.

Reality check: Modern architectures tend to have byte-swap and shuffle instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.

BCD-to-ASCII, with the input in an AVX 32-byte register, so up to 64
digits, would start with an exchange of the high and low 16-byte halves,
then a permute of each half to reverse the order. The final single-cycle
operation is the only overhead of the little- vs big-endian inputs.

Next we duplicate the input by unpacking the high and low 16 bytes, each
byte value into one of 16 16-bit shorts with the leading byte 0, then (in
parallel) you copy and mask the low nybble while shifting all shorts up
by 4 bits, then use the same all-15 mask to save the high nybbles.
OR these two back together, and do the same for the other half of the
original input. About 15-20 cycles in total with well under 10% being
the byte order swap.
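A scalar Python sketch of the same steps, for readers who want to see the data flow without the SIMD details. The function name is hypothetical, and the OR with 0x30 to reach the ASCII digit range is my addition to complete the conversion; the real AVX sequence would do all bytes in parallel.

```python
def bcd_le_to_ascii(packed: bytes) -> str:
    """Convert little-endian packed BCD to a left-to-right digit string."""
    big_endian = packed[::-1]               # the byte-order swap
    out = bytearray()
    for byte in big_endian:
        out.append(0x30 | (byte >> 4))      # high nybble becomes the first digit
        out.append(0x30 | (byte & 0x0F))    # low nybble becomes the second digit
    return out.decode("ascii")
```

The reversal is a single step at the front; everything after it is identical for either byte order, which is the point being made about the relative cost.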

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
--- Synchronet 3.22a-Linux NewsLink 1.2

From Terje Mathisen@[email protected] to comp.arch on Wed May 20 21:45:57 2026

From Newsgroup: comp.arch

Scott Lurndal wrote:

John Levine <[email protected]> writes:

According to Scott Lurndal <[email protected]>:

The B3500 had a clever algorithm for adding BCD numbers. The
addend and augend could each be from 1 to 100 digits in length.
The algorithm would start adding from the lowest (most significant
digit in the longest operand) address of each operand adding
each digit in turn.

"The processor uses an adder that accumulates two fields
from the most significant to the least significant digit
positions. Reverse addition, as incorporated in the
B2500 and B3500 systems has the advantage of detecting
an overflow condition prior to altering the receiving field"

The algorithm used a 9's counter to track the leading
digits.

How did it handle carries? Let's say you're adding

099999999999999999999999999999999999999999999999999
000000000000000000000000000000000000000000000000001

A value that overflows the size of the receiving field
cannot be represented, so the overflow toggle is set and
the instruction terminates _without modifying the
receiving field_.

The size of the receiving field is the larger of the
two source fields. So

ADD 0508 000000 100000 200000

would add the 5 digit value at address 0 to the
8 digit value at address 100000 and store the
result at address 200000.

If it starts at the high digit, it won't know until it gets to the end
that it has to propagate carries all the way back to the beginning.

Actually, that's the clever part. They count 9s.

Example 1: 10 digit receiving field, 10 digit addend, 1 digit augend:

Memory contents before:

000000: 9999999999
000010: 1

ADD 1001 000000 000010 000020

The example he showed had an 11 digit receive field so it would not
overflow, but the two inputs would cause a full carry propagate all the
way to the top digit.

The result of the instruction is that the overflow toggle
will be set and the destination field will remain unmodified.

The algorithm implicitly fills leading zeros into
the shorter operand.

The first digit of the addend operand is read. '9' in
this case. The first digit of the augend is added (in this
case, implicitly zero) and the result is 9. A special
register (the 9's counter) is incremented and the algorithm
proceeds to the next digit. Wash, rinse and repeat until
reaching the last digit, where the sum of 9 + 1 will overflow
a single digit, so the instruction terminates with overflow.

If in the case you showed above, there was a zero in the
first digit of both operands, there is no possibility of

That's what he showed afair?

overflow and the algorithm will simply process each
digit of the addend+augend sequentially from higher
magnitude to lower magnitude. It delays writing each
digit of the sum (other than the last) until it knows
the following digit doesn't overflow. If it does
overflow, it increments the delayed value before
writing. To the extent that there are multiple sequential
9s in the sum, when the next digit would overflow, the
processor uses the 9's counter and the saved digit to
store the correct digits to the receiving field.

There's a flow chart in 1025475_B2500_B3500_RefMan_Oct69.pdf
which is available on bitsavers.

So it did process them top-down, but delayed writing anything to the
output field until it was known that it would not overflow, and the same
happened for every subsequent partial sum of 9.

Yeah, that works but it probably caused some output hiccups when a long
chain of potential carries finally resolved. :-)

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Wed May 20 22:50:04 2026

From Newsgroup: comp.arch

On Wed, 20 May 2026 18:07:14 +0000, John Levine wrote:

On S/370 and later machines with virtual memory it was more complicated
since it had to check and be sure that all of the pages where the
operands resided were available.

Yes, since while the System/360 gave you an error if you tried to use unaligned operands in memory, this restriction was abolished with the System/370. Only an unaligned operand can possibly cross a page boundary, since pages have a power-of-two size greater than the size of any data
type.

But this means that even on the System/370, it's a rare event that an instruction will refer to an unaligned operand. So that there is some
extra overhead for unaligned values might well have been considered acceptable.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu May 21 00:06:54 2026

From Newsgroup: comp.arch

On Wed, 20 May 2026 07:21:04 +0000, quadi wrote:

So in commenting on a different part of my design entirely, you've
pointed out an important flaw I will have to correct.

It's possible that I panicked needlessly, and the conditional branches I support, being the conventional set, are indeed sufficient for unsigned
values as well; for them, they would have alternate names in assembler,
but no additional types of branch perhaps are needed.

I will have to review this point, however, to be sure.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Thu May 21 00:37:39 2026

From Newsgroup: comp.arch

It appears that quadi <[email protected]d> said:

On Wed, 20 May 2026 18:07:14 +0000, John Levine wrote:

On S/370 and later machines with virtual memory it was more complicated
since it had to check and be sure that all of the pages where the
operands resided were available.

Yes, since while the System/360 gave you an error if you tried to use
unaligned operands in memory, this restriction was abolished with the
System/370. Only an unaligned operand can possibly cross a page boundary,
since pages have a power-of-two size greater than the size of any data
type.

While that is true for the RX and RS instructions that do loads and
stores and arithmetic operations, it is not at all true for the SS
instructions common in commercial code.

They have two storage operands with the length specified in the second
byte of the instruction. Even on S/360 there is no alignment
requirement for any of the operands. In most cases it can tell the
sizes of the operands at the time the instruction is decoded, e.g.,
decimal add (AP) has two four-bit length codes that say how long each
operand is and move characters (MVC) has a single 8-bit length code
that applies to both operands.
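When the lengths are known at decode time, working out which pages an operand touches is simple arithmetic. A small Python sketch, with a 4096-byte page size and the function name chosen purely for illustration:

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def pages_touched(address: int, length: int, page_size: int = PAGE_SIZE):
    """Return the page numbers covered by an operand of the given length."""
    first = address // page_size
    last = (address + length - 1) // page_size
    return list(range(first, last + 1))
```

An unaligned 16-byte field at address 4090 gives `[0, 1]`, so both pages have to be available before the instruction can complete, whereas an aligned 8-byte operand always lands in a single page.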

But sometimes it is not that simple. Translate and test (TRT) has
a string operand with a length, and a second 256 byte table operand.
It fetches the bytes from the string one at a time, looks them up
in the table, and stops as soon as the looked up value is non-zero,
putting the address of the source byte and the lookup values in
R1 and R2. Only the bytes actually fetched have to be resident.
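The early-exit behaviour is what makes the page question awkward, and a short Python sketch shows it. This is a simplified model: memory is a flat `bytes` object, the function name is hypothetical, and the result is returned as a tuple rather than placed in registers.

```python
def translate_and_test(memory: bytes, start: int, length: int, table: bytes):
    """Scan `length` bytes from `start`; stop at the first nonzero table entry.

    Returns (address of the source byte, table value), or None if every
    looked-up value was zero.
    """
    assert len(table) == 256
    for addr in range(start, start + length):
        value = table[memory[addr]]         # look the source byte up in the table
        if value != 0:
            return addr, value              # bytes past addr are never fetched
    return None
```

Because the loop can stop anywhere, the set of bytes that must be resident depends on the data, not just on the length in the instruction.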

The Edit instruction (ED) takes a packed decimal operand and
a pattern, with the length specifying the length of the pattern.
It goes through the pattern a byte at a time with some pattern
bytes ("digit selector") taking the next digit from the input
operand and others just copied literally. The length of the
input operand depends on the contents of the pattern.

To make this work S/370 and its successors first do a trial
execution of the instruction without storing anything to see
if it causes a page fault. If not, it then redoes the
instruction for real, storing the result. I suspect that
if they had known how soon S/370 would add paging to the 360
architecture, they might have designed these instructions
differently.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu May 21 02:18:27 2026

From Newsgroup: comp.arch

On Thu, 21 May 2026 00:37:39 +0000, John Levine wrote:

To make this work S/370 and its successors first do a trial execution of
the instruction without storing anything to see if it causes a page
fault. If not, it then redoes the instruction for real, storing the
result. I suspect that if they had known how soon S/370 would add
paging to the 360 architecture, they might have designed these
instructions differently.

When I first read that, I thought that you meant they would have designed
it differently when they designed the 370, but, of course, the
instructions already existed. After I realized my mistake, of course, I
also knew that back in 1964 or before, there was really no way that they
could possibly have known that.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu May 21 02:33:51 2026

From Newsgroup: comp.arch

On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

So instead I decided to only support double precision, and use the
extra bits to allow additional ways to specify registers.

My 66000 started out that way and the compiler showed that this choice
sucks.

The good news is that this only concerns the 16-bit short instructions. A compiler can choose to ignore them if it can't handle them.

Currently, the 16-bit instructions provide the following:

All the basic operate instructions for two integer types; they can only operate on the first eight integer registers.

The basic floating operate instructions for one floating-point type; the register specification is the one used with Concertina II's paired 15-bit operate instructions; choose one of four banks of eight registers, and
both operands must be in that bank.

The idea is that it can be used for efficient pipelined code where four sequences of instructions which are independent are interleaved.

Everything else is straightforward; the 24-bit short instructions and all
the 32-bit and longer instructions that operate on registers allow the use
of all 32 registers in a bank.

Of course, though, the other restrictions are still present - seven
choices for an index register, seven choices for a base register (for each
of three displacement sizes, 20, 16, and 12 bits).

I think I have indeed achieved the goal which, when I started out, I
thought might prove to be an "impossible dream" - combining what a CISC instruction set offers with what a RISC instruction set offers, and yet
doing so without making the instructions longer than they usually are in
those instruction types.

Except for register-to-register operate instructions being 24 bits instead
of 16 bits, this has been achieved - but for a very limited subset of the possible register-to-register operate instructions, chosen by me as the
ones I think are the most useful and popular - and I realize the choice is subjective and hence potentially controversial - the 16-bit instruction
length is retained!

I think it's an ISA that, in this respect, has achieved more than anyone
could have expected!

Now, of course, whether or not this is an achievement that anyone cares
about, that anyone wants, that anyone is interested in... well, I don't
know.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu May 21 06:12:11 2026

From Newsgroup: comp.arch

I had a tiny bit of unused opcode space within the 32-bit operate instructions.

As well, there were a couple of lengths of instructions longer than 32
bits which were allocated more opcode space than they actually needed.

That let me move those two lengths of instructions, plus one other length
of instructions longer than 32 bits which kept its entire, though small,
allocation of opcode space, into that unused space.

And that let me increase the opcode space allocated to 16-bit short instructions from 1/16th of the opcode space to 3/32nds of the opcode
space.

Which allowed me to give them a much simpler and plainer format, of which
it finally could be argued - without the claim being utterly laughable -
that they offer just about what 16-bit short instructions do in a CISC architecture.

So now the 16-bit short instructions have all 96 basic operate opcodes, so they can perform all the basic operations on all the basic integer and floating-point types.

They are in all cases now limited to just the first eight registers. So
this is inferior to the System/360, which has sixteen, but it matches the 680x0 which had eight.
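A quick arithmetic check of why 3/32nds is the figure that matters, as a Python sketch. It assumes the share is measured over all 2^16 patterns of a 16-bit word and that each short instruction carries two register fields of eight choices each; neither assumption is stated in the post.

```python
from fractions import Fraction

ENCODINGS_16BIT = 2 ** 16                   # all possible 16-bit patterns

old_slots = int(Fraction(1, 16) * ENCODINGS_16BIT)
new_slots = int(Fraction(3, 32) * ENCODINGS_16BIT)

OPCODES = 96                                # basic operate opcodes
REGISTERS = 8                               # first eight registers only
needed = OPCODES * REGISTERS * REGISTERS    # assumed two register fields

print(old_slots, new_slots, needed)         # 4096 6144 6144
```

Under those assumptions 96 opcodes with two 3-bit register fields need exactly 6144 encodings, which does not fit in 1/16th of the space but fits 3/32nds with nothing to spare.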

Finally, I have achieved my dream, insane and useless though it may be!

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Thu May 21 06:29:29 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

When I first read that, I thought that you meant they would have designed
it differently when they designed the 370, but, of course, the
instructions already existed. After I realized my mistake, of course, I
also knew that back in 1964 or before, there was really no way that they
could possibly have known that.

The Atlas existed in 1962 and did have paging. So it was possible.
Is it excusable that the S/360 designers did not consider this
development at the time? Probably, although according to <https://en.wikipedia.org/wiki/Atlas_(computer)> "it was a 1959
description of Muse [the 1959 name for Atlas] that gave CDC ideas that significantly accelerated the development of the 6600 and allowed it
to be delivered earlier than originally estimated".

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu May 21 07:03:47 2026

From Newsgroup: comp.arch

On Thu, 21 May 2026 00:06:54 +0000, quadi wrote:

I will have to review this point, however, to be sure.

Although I have not yet completed that review, it has become apparent
that, since I want the compare instruction to produce a correct result for signed numbers even if one is comparing, say, a positive number and a
negative number which are both over half of the maximum possible magnitude
for their format... it will be necessary to have a special compare
instruction for unsigned integers.

Since there is opcode space for that readily available, though, there is
no difficulty in adding that.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Thu May 21 10:29:12 2026

From Newsgroup: comp.arch

Terje Mathisen <[email protected]> writes:

Anton Ertl wrote:

But 64 bits are not enough for all binary numbers and probably not for
all BCD numbers, either: the decimal FP people were not satisfied with
the 15-digit mantissa that are easily possible with their
representations in 64 bits; they did not even define a decimal64
format last I checked. So will 16-digit BCD numbers be satisfactory?

ieee754 does define decimal64, decimal128 and even decimal32, but the
first two have pretty much all the actual usage, probably (?) decimal128
as the majority, at least for all accumulators.

I should check half-known things before I make claims in a posting.

Anyway, looking at <https://en.wikipedia.org/wiki/Decimal64_floating-point_format>, I see
that Decimal64 even has 16 digits of mantissa. So 15 digits is not
enough. (And, as an aside, they complicated things by not specifying
a 54-bit mantissa, but combining the exponent with the upper bits of
the mantissa).

To the point: these 16 digits are not enough, as the lack of
popularity of decimal64 (even relative to decimal128) shows, so 64-bit
BCD numbers are not enough in all cases, either.

Reality check: Modern architectures tend to have byte-swap and shuffle
instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.

BCD-to-ASCII, with the input in an AVX 32-byte register, so up to 64
digits, would start with an exchange of the high and low 16-byte halves,
then a permute of each half to reverse the order. The final single-cycle
operation is the only overhead of the little vs high-endian inputs.

Next we duplicate the input by unpacking the high and low 16 bytes into
each byte value into 16 16-bit shorts, with the leading byte 0, then (in
parallel) you copy and mask the low nybble while shifting all shorts up
by 4 bits, then use the same all-15 mask to save the high nybbles.
OR these two back together, and do the same for the other half of the
original input. About 15-20 cycles in total with well under 10% being
the byte order swap.

My thinking was along the lines of using VPERMB to do the
byte-swapping, the duplicating, and the unpacking in one step. E.g.,
if you have a 64-bit BCD number 1234567890123456 as the following
sequence of bytes

56 34 12 90 78 56 34 12

Then you have the index vector

7 7 6 6 5 5 4 4 3 3 2 2 1 1 0 0

and VPERMB xmm1, xmm2, xmm3

(where the BCD number is in xmm3 and the index vector is in xmm2) will
put the following in xmm1:

12 12 34 34 56 56 78 78 90 90 12 12 34 34 56 56

So no extra instruction for the byte swapping.

The problem is that I now would like a masked parallel byte shift to
shift the even-indexed bytes right by 4 bits, but I don't find
parallel byte shifts. I guess the answer is to let the VPERMB arrange
the result as follows

1234 1234 5678 5678 9012 9012 3456 3456
^^^^ ^^^^ ^^^^ ^^^^

then use a masked VPSRLW for shifting the marked 16-bit pieces to the
right by 4 bits, resulting in

0123 1234 0567 5678 0901 9012 0345 3456

Now use VPSHUFB or VPERMB to rearrange the bytes in the intended order:

01 12 23 34 45 56 67 78 89 90 01 12 23 34 45 56

Now mask away the top 4 bits of each byte with VPAND and turn it into
ASCII by VPORing every byte with 0x30.

And the whole thing can be done with BCD numbers of up to 64 digits
per pass.

The absence of VPSRLB caused an additional instruction, but that's
also necessary for dealing with big-endian BCD numbers. So storing
the BCD numbers in little-endian format costs no additional
instruction.

VPERMB is not in AVX2, so if you want to limit yourself to that,
little-endian needs an extra instruction indeed.
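
The data flow of the steps above can be modelled in scalar code. This is only a sketch of the byte movements, not SIMD code: the function name is made up for illustration, the shift is applied per byte rather than through the masked VPSRLW trick, and the output is in the display order used in the post (most significant digit first).

```python
def bcd_to_ascii_permute_first(bcd_le: bytes) -> str:
    """Scalar model: permute/duplicate, shift, mask, then OR with 0x30."""
    n = len(bcd_le)
    # Index vector 7 7 6 6 ... 0 0: each source byte is fetched twice,
    # most significant byte first (this is the byte swap and the
    # duplication in one permute, as VPERMB would do it).
    index = [src for src in range(n - 1, -1, -1) for _ in range(2)]
    duplicated = [bcd_le[i] for i in index]
    # Shift the first copy of each byte right by 4 so that the high
    # digit lands in the low nibble; the second copy keeps the low digit.
    shifted = [b >> 4 if pos % 2 == 0 else b
               for pos, b in enumerate(duplicated)]
    # Mask away the top nibble (VPAND) and turn into ASCII (VPOR 0x30).
    return bytes((b & 0x0F) | 0x30 for b in shifted).decode("ascii")


# The example from the post: 1234567890123456 stored little-endian.
example = bytes([0x56, 0x34, 0x12, 0x90, 0x78, 0x56, 0x34, 0x12])
print(bcd_to_ascii_permute_first(example))
```

Because the index vector already reads the source bytes from last to first, the model makes the point of the post visible: the little-endian storage order costs no separate step.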

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Thu May 21 11:52:51 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

On Wed, 20 May 2026 15:42:03 +0000, Anton Ertl wrote:

quadi <[email protected]d> writes:
Reality check: Modern architectures tend to have byte-swap and shuffle
instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.

An additional instruction is an additional instruction!

There is no additional instruction. VPERMB does the byte swapping and
byte duplication at the same time, see <[email protected]>.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Thu May 21 12:04:40 2026

From Newsgroup: comp.arch

[email protected] (Anton Ertl) writes:

Terje Mathisen <[email protected]> writes:

BCD-to-ASCII, with the input in an AVX 32-byte register, so up to 64
digits, would start with an exchange of the high and low 16-byte halves,
then a permute of each half to reverse the order. The final single-cycle
operation is the only overhead of the little vs high-endian inputs.

Next we duplicate the input by unpacking the high and low 16 bytes into
each byte value into 16 16-bit shorts, with the leading byte 0, then (in
parallel) you copy and mask the low nybble while shifting all shorts up
by 4 bits, then use the same all-15 mask to save the high nybbles.
OR these two back together, and do the same for the other half of the
original input. About 15-20 cycles in total with well under 10% being
the byte order swap.

My thinking was along the lines of using VPERMB to do the
byte-swapping, the duplicating, and the unpacking in one step. E.g.,
if you have a 64-bit BCD number 1234567890123456 as the following
sequence of bytes

56 34 12 90 78 56 34 12

Then you have the index vector

7 7 6 6 5 5 4 4 3 3 2 2 1 1 0 0

and VPERMB xmm1, xmm2, xmm3

(where the BCD number is in xmm3 and the index vector is in xmm2) will
put the following in xmm1:

12 12 34 34 56 56 78 78 90 90 12 12 34 34 56 56

So no extra instruction for the byte swapping.

The problem is that I now would like a masked parallel byte shift to
shift the even-indexed bytes right by 4 bits, but I don't find
parallel byte shifts. I guess the answer is to let the VPERMB arrange
the result as follows

1234 1234 5678 5678 9012 9012 3456 3456
^^^^ ^^^^ ^^^^ ^^^^

then use a masked VPSRLW for shifting the marked 16-bit pieces to the
right by 4 bits, resulting in

0123 1234 0567 5678 0901 9012 0345 3456

Now use VPSHUFB or VPERMB to rearrange the bytes in the intended order:

01 12 23 34 45 56 67 78 89 90 01 12 23 34 45 56

I have a better approach:

First do the shifting with, e.g. VPSRLW, with the result in a new
register. So you now have

56 34 12 90 78 56 34 12 #original data
0563 0129 0785 0341 #shifted version

Now you use VPERMT2B to rearrange the bytes from both registers into a
third one, doing the byte-swapping while you are at it, resulting in:

41 12 03 34 85 56 07 78 29 90 01 12 63 34 05 56

The remainder uses VPAND and VPOR, as described earlier.

If you have BCD numbers with more than 64, but at most 128 digits, the
first step would only have to be performed once. You would then use
two VPERMI2B instructions with different index inputs to produce the
64 least significant and the 64 most significant digits, and the VPAND
and VPOR would also have to be duplicated.

So 4 central instructions for a BCD number with up to 64 digits, and 7
for up to 128 digits. In addition, you need the VPERMT2B index, the
VPSRLW shift amounts and the other operand for VPAND and VPOR in
registers, but if you are converting a lot of BCD numbers, you may
already have them in registers when you convert the next BCD number.
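
The four-step variant (shift, two-source permute, AND, OR) can be modelled the same way. Again this is a scalar sketch with a made-up function name, restricted to one 64-bit BCD number: the shift is modelled on 16-bit lanes as VPSRLW would do it, and the two-source permute is modelled as a lookup into the concatenation of the original and the shifted bytes.

```python
import struct


def bcd64_to_ascii_shift_first(bcd_le: bytes) -> str:
    """Scalar model of: VPSRLW, two-source byte permute, VPAND, VPOR."""
    assert len(bcd_le) == 8
    # Step 1: logical right shift by 4 in each 16-bit lane. Afterwards
    # the low nibble of shifted byte i is the high digit of original
    # byte i; the junk in the high nibbles is masked away in step 3.
    words = struct.unpack("<4H", bcd_le)
    shifted = struct.pack("<4H", *(w >> 4 for w in words))
    # Step 2: two-source permute. Indices 0..7 select original bytes,
    # 8..15 select shifted bytes. Walking the source from byte 7 down
    # to byte 0 does the byte swap as part of the same permute.
    table = bcd_le + shifted
    index = []
    for src in range(7, -1, -1):
        index += [8 + src, src]      # high digit first, then low digit
    permuted = bytes(table[i] for i in index)
    # Steps 3 and 4: keep the low nibble, then OR in ASCII '0'.
    return bytes((b & 0x0F) | 0x30 for b in permuted).decode("ascii")


example = bytes([0x56, 0x34, 0x12, 0x90, 0x78, 0x56, 0x34, 0x12])
print(bcd64_to_ascii_shift_first(example))
```

The index list built in step 2 plays the role of the permute index register mentioned above, so it would be computed once and reused when many numbers are converted.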

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu May 21 13:13:23 2026

From Newsgroup: comp.arch

On Thu, 21 May 2026 06:12:11 +0000, quadi wrote:

Finally, I have achieved my dream, insane and useless though it may be!

Someone once suggested that, if a genie grants you three wishes, you
should use one of them to wish for more wishes.

Well, I have taken the opportunity to squeeze one more little thing into
the instruction set that Concertina III had, but this time I could not
squeeze quite as many of them in... 16-bit prefixes for instructions,
which allow the instruction set to be extended.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu May 21 13:22:55 2026

From Newsgroup: comp.arch

On Thu, 21 May 2026 13:13:23 +0000, quadi wrote:

Well, I have taken the opportunity to squeeze one more little thing into
the instruction set that Concertina III had, but this time I could not squeeze quite as many of them in... 16-bit prefixes for instructions,
which allow the instruction set to be extended.

I've taken the opportunity now, before things go on, to modify this
addition in one important way: I've precluded the possibility that the complexity of instruction length encoding might grow without bounds by specifying the length scheme now for any prefixed instructions that might
be added.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu May 21 13:42:09 2026

From Newsgroup: comp.arch

On Thu, 21 May 2026 07:03:47 +0000, quadi wrote:

it will be necessary to have a special
compare instruction for unsigned integers.

I have now back-propagated this needful change to Concertina II. The description of Concertina III hadn't gotten to the point where this would
be placed.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From scott@[email protected] (Scott Lurndal) to comp.arch on Thu May 21 14:36:13 2026

From Newsgroup: comp.arch

Terje Mathisen <[email protected]> writes:

Scott Lurndal wrote:

<snip>

overflow and the algorithm will simply process each
digit of the addend+augend sequentially from higher
magnitude to lower magnitude. It delays writing each
digit of the sum (other than the last) until it knows
the following digit doesn't overflow. If it does
overflow, it increments the delayed value before
writing. To the extent that there are multiple sequential
9s in the sum, when the next digit would overflow, the
processor uses the 9's counter and the saved digit to
store the correct digits to the receiving field.

There's a flow chart in 1025475_B2500_B3500_RefMan_Oct69.pdf
which is available on bitsavers.
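
The delayed-write scheme described above can be sketched as follows. This is a reconstruction from the prose only, not from the flow chart in the reference manual; the function name, the equal-length operands, and the way overflow is reported are assumptions made for illustration.

```python
def add_decimal_msd_first(a: str, b: str) -> tuple[str, bool]:
    """Add two equal-length decimal strings, most significant digit first.

    A digit is only written once it is known whether a carry from
    the right will reach it. Runs of 9s are counted, not stored.
    Returns (sum digits, overflow flag).
    """
    assert len(a) == len(b) and a.isdigit() and b.isdigit()
    out = []
    held = 0     # saved digit; starts as a virtual digit above the field
    nines = 0    # the "9's counter"
    for da, db in zip(a, b):
        s = int(da) + int(db)
        if s == 9:
            nines += 1           # a later carry could ripple through this
            continue
        carry = 1 if s > 9 else 0
        out.append(held + carry)                  # saved digit, maybe bumped
        out.extend([0 if carry else 9] * nines)   # 9s turn into 0s on carry
        held = s - 10 * carry
        nines = 0
    out.append(held)
    out.extend([9] * nines)
    overflow = out[0] != 0       # carry out of the most significant digit
    return "".join(str(d) for d in out[1:]), overflow


print(add_decimal_msd_first("0999", "0001"))
print(add_decimal_msd_first("95", "07"))
```

In the first call nothing can be written until the last digit pair arrives, which is the kind of long stall on a chain of potential carries discussed in the replies.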

So it did process them top-down, but delayed writing anything to the
output field until it was known that it would not overflow, and the same
happened for every subsequent partial sum of 9.

Yeah, that works but it probably caused some output hiccups when a long
chain of potential carries finally resolved. :-)

The maximum size of an operand was 100 digits.

To add to the potential for a long hiccup, each of the operands
could be indirect, which in turn could point to indirect
operands ad infinitum. A processor timer was started with
each instruction, and if it expired before the instruction
finished, the processor would raise a fault and the application
would be terminated.

There were also search table and linked list instructions, which had
variable timing depending on the number of entries in the
list or table (the instruction timer would handle infinite
loops in the list).

--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Thu May 21 15:41:17 2026

From Newsgroup: comp.arch

According to quadi <[email protected]d>:

On Thu, 21 May 2026 00:37:39 +0000, John Levine wrote:

result. I suspect that if they had known how soon S/370 would add
paging to the 360 architecture, they might have designed these
instructions differently.

When I first read that, I thought that you meant they would have designed
it differently when they designed the 370, but, of course, the
instructions already existed. After I realized my mistake, of course, I
also knew that back in 1964 or before, there was really no way that they
could possibly have known that.

According to Pugh et al., IBM Research was quite aware of Atlas and
was doing its own work on one-level store and time sharing. They were
also close to CTSS at MIT Project MAC. Atlas' performance was terrible
(later solved partly by better paging schemes but mostly by larger
real memory) and I get the impression that there was an internal
institutional bias that only batch was real computing and time sharing
was somewhere between a niche and a fad.

The MIT people were deeply disappointed when S/360 had no memory
mapping at all, which led to Multics switching from IBM to GE for its
new computer. IBM then came out with the 360/67 which had quite decent
virtual memory but it was too late. It didn't help that its intended
main operating system was TSS which was overambitious and didn't work.
Lucky for them CP/67 escaped from the lab to become VM/370.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Thu May 21 18:26:32 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

So instead I decided to only support double precision, and use the
extra bits to allow additional ways to specify registers.

My 66000 started out that way and the compiler showed that this choice sucks.

The good news is that this only concerns the 16-bit short instructions. A compiler can choose to ignore them if it can't handle them.

Currently, the 16-bit instructions provide the following:

All the basic operate instructions for two integer types; they can only operate on the first eight integer registers.

I suspect you (and compiler) will end up not liking the restriction.

The basic floating operate instructions for one floating-point type; the register specification is the one used with Concertina II's paired 15-bit operate instructions; choose one of four banks of eight registers, and
both operands must be in that bank.

I suspect you (and compiler) will end up not liking the restriction.

The idea is that it can be used for efficient pipelined code where four sequences of instructions which are independent are interleaved.

I suspect you (and compiler) will end up not finding that much parallelism.

Everything else is straightforward; the 24-bit short instructions and all the 32-bit and longer instructions that operate on registers allow the use of all 32 registers in a bank.

Of course, though, the other restrictions are still present - seven
choices for an index register, seven choices for a base register (for each of three displacement sizes, 20, 16, and 12 bits).

I think I have indeed achieved the goal which, when I started out, I
thought might prove to be an "impossible dream" - combining what a CISC instruction set offers with what a RISC instruction set offers, and yet doing so without making the instructions longer than they usually are in those instruction types.

Except for register-to-register operate instructions being 24 bits instead of 16 bits, this has been achieved - but for a very limited subset of the possible register-to-register operate instructions, chosen by me as the
ones I think are the most useful and popular - and I realize the choice is subjective and hence potentially controversial - the 16-bit instruction length is retained!

I think it's an ISA that, in this respect, has achieved more than anyone could have expected!

Now, of course, whether or not this is an achievement that anyone cares about, that anyone wants, that anyone is interested in... well, I don't know.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Thu May 21 18:32:48 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Thu, 21 May 2026 00:06:54 +0000, quadi wrote:

I will have to review this point, however, to be sure.

Although I have not yet completed that review, it has become apparent
that, since I want the compare instruction to produce a correct result for signed numbers even if one is comparing, say, a positive number and a negative number which are both over half of the maximum possible magnitude for their format... it will be necessary to have a special compare instruction for unsigned integers.

Or a wider condition register !

Since there is opcode space for that readily available, though, there is
no difficulty in adding that.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu May 21 22:14:51 2026

From Newsgroup: comp.arch

On Thu, 21 May 2026 07:03:47 +0000, quadi wrote:

Although I have not yet completed that review, it has become apparent
that, since I want the compare instruction to produce a correct result
for signed numbers even if one is comparing, say, a positive number and
a negative number which are both over half of the maximum possible
magnitude for their format... it will be necessary to have a special
compare instruction for unsigned integers.

I have now given the matter thought, and I found that it would indeed be necessary to add an extra bit to all the conditional jump, branch, or set
flag instructions to indicate the test was being applied to the condition
code settings left after an integer arithmetic instruction on integers
deemed to be unsigned.

Amazingly enough, however, it turned out that in each case there was no difficulty in finding the additional opcode space that was needed.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu May 21 22:21:53 2026

From Newsgroup: comp.arch

On Thu, 21 May 2026 18:32:48 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

it will be necessary to have a special
compare instruction for unsigned integers.

Or a wider condition register !

A wider condition register isn't enough by itself.

I have now realized that I will have to add a bit to the conditional
branch instructions. Amazingly, though, that bit was readily available
without much trouble.

In the case of conditional branches after integer arithmetic, a wider condition register might be needed, although it seems that carry,
overflow, negative, and zero will suffice.

The compare instruction in my ISA _does not_ return the same condition
codes as the subtract instruction. So if I compare bytes, the compare instruction will correctly indicate that -100 is less than 100. The fact
that if you subtracted -100 from 100 as byte values, you wouldn't get 200, since that doesn't fit into a signed byte, but the negative value -56, is neither here nor there.

Because of this special handling of the MSB, I do need a different compare instruction - not just the modified branch instructions for unsigned
values - to yield correct behavior.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu May 21 23:44:34 2026

From Newsgroup: comp.arch

On Thu, 21 May 2026 18:26:32 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

Currently, the 16-bit instructions provide the following:

All the basic operate instructions for two integer types; they can only
operate on the first eight integer registers.

I suspect you (and compiler) will end up not liking the restriction.

I don't like the restriction, but since there's not much opcode space available, there's not much I can do.

The basic floating operate instructions for one floating-point type;
the register specification is the one used with Concertina II's paired
15-bit operate instructions; choose one of four banks of eight
registers, and both operands must be in that bank.

I suspect you (and compiler) will end up not liking the restriction.

The compiler will, indeed, probably have difficulty dealing with a kind of restriction that no one else has ever put in an ISA.

But this is moot now. I've found some additional opcode space for 16-bit
short instructions. Not much, just enough to increase the available opcode space by a factor of 1.5.

So now all the operations are restricted to only the first eight registers
- but 16-bit short instructions now support all the basic data types.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu May 21 23:46:05 2026

From Newsgroup: comp.arch

On Thu, 21 May 2026 22:14:51 +0000, quadi wrote:

Amazingly enough, however, it turned out that in each case there was no difficulty in finding the additional opcode space that was needed.

I even managed to find enough opcode space to increase the size of the displacement field from 8 bits to 9 bits in all the branch instructions,
so that having 24-bit short instructions doesn't shorten their range.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Fri May 22 02:20:14 2026

From Newsgroup: comp.arch

On Thu, 21 May 2026 23:46:05 +0000, quadi wrote:

I even managed to find enough opcode space to increase the size of the displacement field from 8 bits to 9 bits in all the branch instructions,
so that having 24-bit short instructions doesn't shorten their range.

However, there were a number of serious mistakes on the page, which I have
now corrected.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Fri May 22 07:22:05 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

Although I have not yet completed that review, it has become apparent
that, since I want the compare instruction to produce a correct result for
signed numbers even if one is comparing, say, a positive number and a
negative number which are both over half of the maximum possible magnitude
for their format... it will be necessary to have a special compare
instruction for unsigned integers.

The fact that IA-32/AMD64 and ARM A64 do not have a special compare
instruction for unsigned integers (and manage to do with NCVZ) shows
that this is unnecessary. What you do for your "if (-100<100)" case
is encode it (on AMD64) as

cmpb %r8b, %r9b #note that AT&T syntax has the arguments reversed
jnl target
... code to execute if r9<r8 ...
target:

And JNL (jump if not less) tests for N=V (the Intel manual writes SF=OF).

See <https://www.felixcloutier.com/x86/jcc>
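
To see why N and V together are enough for the -100 versus 100 case, here is a small model of an 8-bit compare. It is a sketch only: the function names are made up, and the flags are modelled in the x86 style, where C after a subtraction means a borrow.

```python
def cmp8_flags(a: int, b: int) -> dict:
    """Flags after an 8-bit compare that computes a - b and drops the result."""
    ua, ub = a & 0xFF, b & 0xFF
    diff = (ua - ub) & 0xFF
    return {
        "N": diff >> 7,                                # sign of wrapped result
        "Z": int(diff == 0),
        "C": int(ua < ub),                             # borrow, as on x86
        "V": (((ua ^ ub) & (ua ^ diff)) >> 7) & 1,     # signed overflow
        "wrapped": diff - 256 if diff & 0x80 else diff,
    }


def signed_less(a: int, b: int) -> bool:
    """The JL condition: N != V (JNL is the opposite, N == V)."""
    f = cmp8_flags(a, b)
    return f["N"] != f["V"]


print(cmp8_flags(-100, 100))   # wrapped result is +56, but V is set
print(signed_less(-100, 100))
print(signed_less(100, -100))
```

In the first call the wrapped difference has the wrong sign, yet the overflow flag records exactly that, so the N != V test still gives the right answer without a special compare.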

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Fri May 22 07:35:36 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

The compare instruction in my ISA _does not_ return the same condition
codes as the subtract instruction. So if I compare bytes, the compare
instruction will correctly indicate that -100 is less than 100. The fact
that if you subtracted -100 from 100 as byte values, you wouldn't get 200,
since that doesn't fit into a signed byte, but the negative value -56, is
neither here nor there.

8086, IA-32, AMD64, and AFAIK ARM A64 produce the same condition codes
for compare and subtract instructions. That the subtract instruction
writes back the result does not influence the condition codes. The
fact that you see an overflow/underflow if you byte-subtract/compare
100 with -100 and want to interpret the result as a signed byte is
reflected in the overflow flag for both subtract and compare, and the conditional jumps for signed <, <=, >, >= take the overflow flag into
account (as well as the sign flag, and, in some cases, the zero flag).

Because of this special handling of the MSB, I do need a different compare
instruction - not just the modified branch instructions for unsigned
values - to yield correct behavior.

You only need that if your flags are insufficiently expressive (i.e.,
less powerful than NCZV).

An interesting case is PowerPC (and Power). It stores < = > flags
(for comparisons; for other instructions it's <0, =0, and >0) and a
sticky overflow flag in one of the CRs (for many instructions, CR0,
for comparison instructions, the CR can be selected). It has an
overflow flag and a carry flag elsewhere, so it could use the
subtraction instruction together with these flags for both signed and
unsigned conditional branches, but instead it has unsigned and signed comparisons, and the conditional branches are only conditional on
flags in a CR register.
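
The PowerPC style can be modelled by letting the comparison itself decide between the signed and the unsigned reading. This is a sketch for 32-bit operands; the function names follow the cmpw/cmplw mnemonics, but the dictionaries are only an illustration and the SO bit is left out.

```python
def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def cmpw(a: int, b: int) -> dict:
    """Signed compare: sets the <, >, = bits of a CR field."""
    sa, sb = to_signed32(a), to_signed32(b)
    return {"LT": sa < sb, "GT": sa > sb, "EQ": sa == sb}


def cmplw(a: int, b: int) -> dict:
    """Unsigned (logical) compare: same three bits, unsigned reading."""
    ua, ub = a & 0xFFFFFFFF, b & 0xFFFFFFFF
    return {"LT": ua < ub, "GT": ua > ub, "EQ": ua == ub}


print(cmpw(0xFFFFFFFF, 1))    # -1 versus 1
print(cmplw(0xFFFFFFFF, 1))   # 4294967295 versus 1
```

The conditional branch then only has to test one of three bits, which is the trade-off described above: two compare instructions instead of two families of branch conditions.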

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Fri May 22 15:48:18 2026

From Newsgroup: comp.arch

On Fri, 22 May 2026 07:35:36 +0000, Anton Ertl wrote:

You only need that if your flags are insufficiently expressive (i.e.,
less powerful than NCZV).

While the System/360 had only two condition code bits, I do plan to have
full VZNC bits. However, unlike the System/360, I do not have a complete
set of sixteen conditional branch instructions. I just have twelve: eight instructions for testing between negative, zero, and positive nonzero in
any combination, and instructions for separately testing for carry and overflow.

However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to
fix that instead, so I likely will rework this part of the ISA into
something more conventional.

I want a compare instruction which, for integers, isn't fooled by
overflows - and overflows happen at a different point in the two's
complement number circle for signed and unsigned; for unsigned, basically carry takes the role of overflow. And I don't want to have to do two instructions for the conditional branch afterwards to handle that. So I
_may_ still need a separate compare unsigned, even though the rest of your points are well taken.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Fri May 22 21:22:48 2026

From Newsgroup: comp.arch

On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:

However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to
fix that instead, so I likely will rework this part of the ISA into
something more conventional.

I have made the first set of changes, using five-bit condition code fields
to nicely and fully handle both the signed and unsigned cases; I checked
what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.
(Worse yet, it used separate condition codes for floating-point numbers,
which makes sense, given that they were originally in a coprocessor, but
that means an extra set of instructions is needed.)

So, while it used a four-bit condition code field, I needed a five-bit one.

I did notice it didn't just always fail the signed tests if overflow was present; instead, in that case it switched plus and minus. Given that, and treating carry the same way for unsigned tests, you likely are right that
an unsigned compare is not needed. Oh, wait; my assumed behavior that everything should just fail if there's an overflow... is reasonable for floating-point numbers.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Sat May 23 08:36:49 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

On Fri, 22 May 2026 07:35:36 +0000, Anton Ertl wrote:

You only need that if your flags are insufficiently expressive (i.e.,
less powerful than NCZV).

While the System/360 had only two condition code bits, I do plan to have
full VZNC bits. However, unlike the System/360,

The S/360 is a mess as far as dealing with conditions is concerned.
Or is there a great underlying principle involved, and I fail to see
it? I doubt it, for the following reasons: 1) I have not come across
any description that explained the underlying principle, and in fact I
have come across few descriptions at all. 2) In the 62 years that
S/360 has been available, it has not found any successors in its
particular approach to conditions.

So my recommendation is that you look at other architectures for
inspiration. 8086, 88000, MIPS/Alpha/RISC-V (including the
differences between them), and IA-64 all have quite different
approaches that are worthy of study. And if you want to look for
something unproven, look at <http://www.complang.tuwien.ac.at/anton/tmp/carry.pdf>.

I do not have a complete
set of sixteen conditional branch instructions. I just have twelve: eight
instructions for testing between negative, zero, and positive nonzero in
any combination, and instructions for separately testing for carry and
overflow.

...

I want a compare instruction which, for integers, isn't fooled by
overflows - and overflows happen at a different point in the two's
complement number circle for signed and unsigned; for unsigned, basically
carry takes the role of overflow. And I don't want to have to do two
instructions for the conditional branch afterwards to handle that.

What's this thing about "two instructions for the conditional branch afterwards"? On the 8086, if you want to branch on signed <, you use
JL, and if you want to branch on unsigned <, you use JB; each of them
is one instruction (and the 8086 has IIRC signed and unsigned <= > >=,
too).

If you mean the opcode space, then yes, you may use less opcode space
if you have a signed and unsigned comparison, and fewer conditional
branches (depending on what proportion of your opcode space the
respective instructions take). You can also save opcode space by
leaving away the <= and > conditions (reverse the operands of < and >=).
One question in such a design is if there are cases where you
want to have the unsigned and signed conditions for the same operands,
but it's probably rare enough that it is not a big disadvantage that
you need to use both comparison instructions for those cases (at least
I have never seen a complaint about this aspect of PowerPC).
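
The operand-swap idea can be sketched in a few lines of Python. This is only an illustration under assumed names: `cmp_signed`, `cmp_unsigned`, `branch_lt`, `branch_ge`, `is_gt`, `is_le` and the 32-bit width are invented here and are not taken from any real ISA.

```python
BITS = 32  # assumed register width, for illustration only

def as_unsigned(x):
    return x & ((1 << BITS) - 1)

def as_signed(x):
    x = as_unsigned(x)
    return x - (1 << BITS) if x >> (BITS - 1) else x

def cmp_signed(a, b):
    """Hypothetical signed compare: returns 'lt', 'eq' or 'gt'."""
    a, b = as_signed(a), as_signed(b)
    return "lt" if a < b else "gt" if a > b else "eq"

def cmp_unsigned(a, b):
    """Hypothetical unsigned compare on the same bit patterns."""
    a, b = as_unsigned(a), as_unsigned(b)
    return "lt" if a < b else "gt" if a > b else "eq"

# Only two ordering branches are encoded ...
def branch_lt(result):
    return result == "lt"

def branch_ge(result):
    return result != "lt"

# ... and the other two come from swapping the compare operands.
def is_gt(cmp, a, b):
    return branch_lt(cmp(b, a))   # a > b   <=>  b < a

def is_le(cmp, a, b):
    return branch_ge(cmp(b, a))   # a <= b  <=>  b >= a
```

With the all-ones bit pattern as the first operand, `is_gt(cmp_signed, -1, 1)` is false while `is_gt(cmp_unsigned, -1, 1)` is true, which is the case where both comparison instructions would be needed on the same operands.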

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From anton@[email protected] (Anton Ertl) to comp.arch on Sat May 23 09:28:45 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:

However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to
fix that instead, so I likely will rework this part of the ISA into
something more conventional.

I have made the first set of changes, using five-bit condition code fields
to nicely and fully handle both the signed and unsigned cases; I checked
what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.

I see four tests for unsigned conditions on the 68000 <https://en.wikibooks.org/wiki/68000_Assembly/Conditional_Tests>:

HI >
LS <=
CC >=
CS <

For the signed ones there is

GT >
LE <=
GE >=
LT <

my assumed behavior that
everything should just fail if there's an overflow... is reasonable for
floating-point numbers.

The usual setup is that FP operations silently overflow to +INF and
underflow to -INF. They do set sticky flags (called "exceptions" in
the IEEE FP standard) on various conditions, including on overflows,
but also on rounding errors ("inexact").

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From MitchAlsup@[email protected] to comp.arch on Sat May 23 16:19:35 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:

However, if I have enough opcode space to add a U bit to all the conditional branch instructions, then I also have enough opcode space to fix that instead, so I likely will rework this part of the ISA into something more conventional.

I have made the first set of changes, using five-bit condition code fields to nicely and fully handle both the signed and unsigned cases; I checked what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.
(Worse yet, it used separate condition codes for floating-point numbers, which makes sense, given that they were originally in a coprocessor, but that means an extra set of instructions is needed.)

So, while it used a four-bit condition code field, I needed a five-bit one.

x86 uses COZAP but this includes P=parity, which it is unlikely you do.
Thus, 4 bits are sufficient to define 16-states, of which you only need 10-states signless{EQ, NEQ}, signed{>=, >, <, <=}, unsigned{>=, >, <, <=}.

I did notice it didn't just always fail the signed tests if overflow was present; instead, in that case it switched plus and minus. Given that, and treating carry the same way for unsigned tests, you likely are right that
an unsigned compare is not needed. Oh, wait; my assumed behavior that everything should just fail if there's an overflow... is reasonable for floating-point numbers.

John Savard


From scott@[email protected] (Scott Lurndal) to comp.arch on Sat May 23 16:38:57 2026

From Newsgroup: comp.arch

MitchAlsup <[email protected]d> writes:

quadi <[email protected]d> posted:

On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:

However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to
fix that instead, so I likely will rework this part of the ISA into
something more conventional.

I have made the first set of changes, using five-bit condition code fields
to nicely and fully handle both the signed and unsigned cases; I checked
what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.
(Worse yet, it used separate condition codes for floating-point numbers,
which makes sense, given that they were originally in a coprocessor, but
that means an extra set of instructions is needed.)

So, while it used a four-bit condition code field, I needed a five-bit one.

x86 uses COZAP but this includes P=parity, which it is unlikely you do.
Thus, 4 bits are sufficient to define 16-states, of which you only need
10-states signless{EQ, NEQ}, signed{>=, >, <, <=}, unsigned{>=, >, <, <=}.

ARM includes the Q flag (saturation).


From scott@[email protected] (Scott Lurndal) to comp.arch on Sat May 23 16:46:40 2026

From Newsgroup: comp.arch

[email protected] (Anton Ertl) writes:

quadi <[email protected]d> writes:

On Fri, 22 May 2026 07:35:36 +0000, Anton Ertl wrote:

You only need that if your flags are insufficiently expressive (i.e.,
less powerful than NCZV).

While the System/360 had only two condition code bits, I do plan to have full VZNC bits. However, unlike the System/360,

The S/360 is a mess as far as dealing with conditions is concerned.
Or is there a great underlying principle involved, and I fail to see
it? I doubt it, for the following reasons: 1) I have not come across
any description that explained the underlying principle, and in fact I
have come across few descriptions at all. 2) In the 62 years that
S/360 has been available, it has not found any successors in its
particular approach to conditions.

The B3500 had three bits: Overflow, COM Low and COM High. The
V-Series added COM null, used by the search linked list (SLT)
instruction when the search key wasn't found.

Condition     Flags
------------  --------------
EQUAL         COML=1, COMH=1
Less Than     COML=1, COMH=0
Greater Than  COML=0, COMH=1
NULL          COML=0, COMH=0


From quadi@[email protected] to comp.arch on Sat May 23 17:01:10 2026

From Newsgroup: comp.arch

On Sat, 23 May 2026 09:28:45 +0000, Anton Ertl wrote:

quadi <[email protected]d> writes:

On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:

I have made the first set of changes, using five-bit condition code
fields to nicely and fully handle both the signed and unsigned cases; I
checked what the Motorola 68000 did, and found that it only provided a
complete set of tests for signed values, but only two tests for unsigned
ones.

I see four tests for unsigned conditions on the 68000 <https://en.wikibooks.org/wiki/68000_Assembly/Conditional_Tests>:

HI >
LS <=
CC >=
CS <

For the signed ones there is

GT >
LE <=
GE >=
LT <

What I was going by was Table 3-19 on page 3-19 of the M68000 Family Programmer's Reference Manual on the Internet Archive from Bitsavers; it
gives the available condition code tests on the architecture as:

0000  True
0001  False
0010  High              not C and not Z
0011  Low or Same       C or Z
0100  Carry Clear
0101  Carry Set
0110  Not Equal         not Z
0111  Equal             Z
1000  Overflow Clear    not V
1001  Overflow Set      V
1010  Plus              not N
1011  Minus             N
1100  Greater or Equal  (N and V) or (not N and not V)
1101  Less Than         (N and not V) or (not N and V)
1110  Greater Than      (N and V and not Z) or (not N and not V and not Z)
1111  Less or Equal     Z or (N and not V) or (not N and V)

I took Low or Same as unsigned, and Plus, Minus, Greater or Equal, Less
Than, Greater Than, and Less or Equal as signed.

John Savard

From Robert Finch@[email protected] to comp.arch on Sat May 23 14:15:46 2026

From Newsgroup: comp.arch

On 2026-05-23 5:28 a.m., Anton Ertl wrote:

quadi <[email protected]d> writes:

On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:

However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to
fix that instead, so I likely will rework this part of the ISA into
something more conventional.

I have made the first set of changes, using five-bit condition code fields
to nicely and fully handle both the signed and unsigned cases; I checked
what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.

I see four tests for unsigned conditions on the 68000 <https://en.wikibooks.org/wiki/68000_Assembly/Conditional_Tests>:

HI >
LS <=
CC >=
CS <

CS may also be called LO
CC may also be called HS

For the signed ones there is

GT >
LE <=
GE >=
LT <

my assumed behavior that
everything should just fail if there's an overflow... is reasonable for
floating-point numbers.

The usual setup is that FP operations silently overflow to +INF and
underflow to -INF. They do set sticky flags (called "exceptions" in

Methinks overflow could be to +/- INF and underflow to zero or a denormal.

the IEEE FP standard) on various conditions, including on overflows,
but also on rounding errors ("inexact").

- anton

If one has CVNZ it is enough for both signed and unsigned integer
conditional testing using only four bits.

The CVNZ could be repurposed for float comparisons. V = INF. C=inexact
for instance.


From anton@[email protected] (Anton Ertl) to comp.arch on Sat May 23 18:37:39 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

On Sat, 23 May 2026 09:28:45 +0000, Anton Ertl wrote:

I see four tests for unsigned conditions on the 68000
<https://en.wikibooks.org/wiki/68000_Assembly/Conditional_Tests>:

HI >
LS <=
CC >=
CS <

For the signed ones there is

GT >
LE <=
GE >=
LT <

What I was going by was Table 3-19 on page 3-19 of the M68000 Family
Programmer's Reference Manual on the Internet Archive from Bitsavers; it
gives the available condition code tests on the architecture as:

0000  True
0001  False
0010  High              not C and not Z
0011  Low or Same       C or Z
0100  Carry Clear
0101  Carry Set
0110  Not Equal         not Z
0111  Equal             Z
1000  Overflow Clear    not V
1001  Overflow Set      V
1010  Plus              not N
1011  Minus             N
1100  Greater or Equal  (N and V) or (not N and not V)
1101  Less Than         (N and not V) or (not N and V)
1110  Greater Than      (N and V and not Z) or (not N and not V and not Z)
1111  Less or Equal     Z or (N and not V) or (not N and V)

I took Low or Same as unsigned, and Plus, Minus, Greater or Equal, Less
Than, Greater Than, and Less or Equal as signed.

Carry Clear (CC) is unsigned >=
Carry Set (CS) is unsigned <

after a CMP or SUB instruction.
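
A brute-force check makes this concrete. The Python sketch below is not from the Motorola manual: `cmp_flags`, `TESTS`, `signed` and `check_all` are names made up for illustration, the width is shrunk to 8 bits so every operand pair can be tried, and the CC and CS rows are filled in as "not C" and "C" as described above. It models the flags left by a compare (dst - src) and evaluates the four unsigned and four signed ordering tests from the quoted table.

```python
def cmp_flags(dst, src, bits=8):
    """Flags after a 68000-style compare, which computes dst - src."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    dst &= mask
    src &= mask
    res = (dst - src) & mask
    return {
        "N": bool(res & sign),
        "Z": res == 0,
        # Signed overflow: operands differ in sign and the result's
        # sign differs from dst's sign.
        "V": bool((dst ^ src) & (dst ^ res) & sign),
        "C": dst < src,  # borrow out of the subtraction
    }

# "(N and V) or (not N and not V)" from the table is written as N == V.
TESTS = {
    "HI": lambda f: not f["C"] and not f["Z"],
    "LS": lambda f: f["C"] or f["Z"],
    "CC": lambda f: not f["C"],
    "CS": lambda f: f["C"],
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["N"] == f["V"] and not f["Z"],
    "LE": lambda f: f["Z"] or f["N"] != f["V"],
}

def signed(x, bits=8):
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def check_all(bits=8):
    """Compare every test with Python's own comparison for all pairs."""
    for a in range(1 << bits):
        for b in range(1 << bits):
            f = cmp_flags(a, b, bits)
            sa, sb = signed(a, bits), signed(b, bits)
            expected = {
                "HI": a > b, "LS": a <= b, "CC": a >= b, "CS": a < b,
                "GT": sa > sb, "LE": sa <= sb, "GE": sa >= sb, "LT": sa < sb,
            }
            for name, test in TESTS.items():
                if bool(test(f)) != expected[name]:
                    return False
    return True
```

In this model `check_all()` returns True, so HI, LS, CC and CS behave as the four unsigned orderings and GT, LE, GE and LT as the four signed ones, all from the same four flag bits.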

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From John Levine@[email protected] to comp.arch on Sat May 23 19:33:46 2026

From Newsgroup: comp.arch

According to Anton Ertl <[email protected]>:

The S/360 is a mess as far as dealing with conditions is concerned.
Or is there a great underlying principle involved, and I fail to see
it? I doubt it, for the following reasons: 1) I have not come across
any description that explained the underlying principle, and in fact I
have come across few descriptions at all. 2) In the 62 years that
S/360 has been available, it has not found any successors in its
particular approach to conditions.

I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.

I agree that nobody else did that, and in retrospect it was an overoptimization.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From MitchAlsup@[email protected] to comp.arch on Sat May 23 20:01:07 2026

From Newsgroup: comp.arch

Robert Finch <[email protected]> posted:

On 2026-05-23 5:28 a.m., Anton Ertl wrote:

quadi <[email protected]d> writes:

On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:

However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to
fix that instead, so I likely will rework this part of the ISA into
something more conventional.

I have made the first set of changes, using five-bit condition code fields
to nicely and fully handle both the signed and unsigned cases; I checked
what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.

I see four tests for unsigned conditions on the 68000 <https://en.wikibooks.org/wiki/68000_Assembly/Conditional_Tests>:

HI >
LS <=
CC >=
CS <

CS may also be called LO
CC may also be called HS

For the signed ones there is

GT >
LE <=
GE >=
LT <

my assumed behavior that
everything should just fail if there's an overflow... is reasonable for
floating-point numbers.

The usual setup is that FP operations silently overflow to +INF and underflow to -INF. They do set sticky flags (called "exceptions" in

Methinks overflow could be to +/- INF and underflow to zero or a denormal.

IEEE defines OVERFLOW as finite becomes signed infinite.
IEEE defines UNDERFLOW as finite becomes signed sub-finite*.
Sub-finite ={deNormal or zero}
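
These two definitions can be observed with ordinary double-precision arithmetic. The Python sketch below assumes the default round-to-nearest mode and an IEEE 754 binary64 `float`; the variable names are arbitrary. Python does not expose the sticky exception flags for hardware floats, so only the results are shown, not the flags.

```python
import math
import sys

big = sys.float_info.max      # largest finite double
tiny = sys.float_info.min     # smallest positive *normal* double

# Overflow: a finite result too large to represent becomes signed infinity.
overflowed = big * 2.0
neg_overflowed = -big * 2.0

# Underflow: a result below the normal range becomes a denormal ...
subnormal = tiny / 4.0

# ... or, once it is below the smallest denormal, a signed zero.
smallest = sys.float_info.min * sys.float_info.epsilon  # 2**-1074
flushed = smallest / 2.0
neg_flushed = -smallest / 2.0

print(overflowed, neg_overflowed)              # inf -inf
print(0.0 < subnormal < tiny)                  # True
print(flushed, math.copysign(1.0, neg_flushed))  # 0.0 -1.0
```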

the IEEE FP standard) on various conditions, including on overflows,
but also on rounding errors ("inexact").

- anton

If one has CVNZ it is enough for both signed and unsigned integer conditional testing using only four bits.

The CVNZ could be repurposed for float comparisons. V = INF. C=inexact
for instance.


From MitchAlsup@[email protected] to comp.arch on Sat May 23 20:03:34 2026

From Newsgroup: comp.arch

John Levine <[email protected]> posted:

According to Anton Ertl <[email protected]>:

The S/360 is a mess as far as dealing with conditions is concerned.
Or is there a great underlying principle involved, and I fail to see
it? I doubt it, for the following reasons: 1) I have not come across
any description that explained the underlying principle, and in fact I
have come across few descriptions at all. 2) In the 62 years that
S/360 has been available, it has not found any successors in its
particular approach to conditions.

I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.

S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running
out of PSW space.

I agree that nobody else did that, and in retrospect it was an overoptimization.


From John Levine@[email protected] to comp.arch on Sat May 23 20:09:54 2026

From Newsgroup: comp.arch

According to MitchAlsup <[email protected]d>:

I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.

S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running
out of PSW space.

They'd also have been better off making the addresses 32 bits and not
putting junk in the high byte, which caused endless pain later, but
they were really really worried about making low end models with 8K
bytes usable.

Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat addressing.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From MitchAlsup@[email protected] to comp.arch on Sat May 23 22:15:30 2026

From Newsgroup: comp.arch

John Levine <[email protected]> posted:

According to MitchAlsup <[email protected]d>:

I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.

S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running
out of PSW space.

They'd also have been better off making the addresses 32 bits and not
putting junk in the high byte, which caused endless pain later, but
they were really really worried about making low end models with 8K
bytes usable.

Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat addressing.

B+X+D addressing only got 12-bits
B+D addressing was for RS and SS instructions

I think they thought they were saving on complexity and HW logic, but
I think the whole RS and SS could have used a "more regular format pattern"; and they (IBM) would have been better off long term.

But that was "Oh so long ago."

From John Levine@[email protected] to comp.arch on Sun May 24 01:43:29 2026

From Newsgroup: comp.arch

According to MitchAlsup <[email protected]d>:

Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat
addressing.

B+X+D addressing only got 12-bits
B+D addressing was for RS and SS instructions

four bits of B, 12 bits of D, 16 bit addresses
you're right that RX used another four bits.

I think they thought they were saving on complexity and HW logic, but

We don't have to guess. "Architecture of the IBM System/360" by Amdahl, Blaauw, and Brooks in the IBM Systems Journal in April 1964 described a lot
of the reasoning, and they wrote a whole book about it.

They had to make a lot of other design decisions like 6 vs 8 bit
bytes, ones- vs twos-complement, length fields vs word marks for
variable length data, stack vs registers, floating point format (they
blew that one).

They said that the combination of a full length base register and a
short displacement "gives consequent gains in instruction density. The base-register approach was adopted, and then augmented, for some
instructions, with a second level of indexing."
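
For readers who have not used the scheme, the address arithmetic can be sketched as below. This is a simplified Python model, not IBM's definition: the function name `effective_address` and the register-file list are invented for illustration. It assumes a 4-bit base field, a 12-bit displacement, an optional 4-bit index field (RX format), 24-bit addresses, and the convention that register number 0 in a base or index field contributes zero.

```python
ADDR_MASK = (1 << 24) - 1   # 24-bit flat address space

def effective_address(regs, base, disp, index=0):
    """Base + displacement, optionally plus index (B+X+D).

    regs  : list of 16 general-register values
    base  : 4-bit base register number (0 means "no base")
    disp  : 12-bit unsigned displacement from the instruction
    index : 4-bit index register number (0 means "no index")
    """
    if not 0 <= disp < 4096:
        raise ValueError("displacement must fit in 12 bits")
    b = regs[base] if base else 0
    x = regs[index] if index else 0
    return (b + x + disp) & ADDR_MASK

regs = [0] * 16
regs[12] = 0x012000          # base register covering a 4 KB window
regs[3] = 0x40               # index, e.g. an array offset
print(hex(effective_address(regs, 12, 0x0FF)))            # 0x120ff
print(hex(effective_address(regs, 12, 0x0FF, index=3)))   # 0x1213f
```

The instruction itself carries only 16 bits of address information (4 + 12), yet any location in the 24-bit space is reachable once a base register points near it.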

In retrospect, B+X+D was probably a mistake since I believe that
double indexing is rarely used, and easy to do with an extra register
add. On the other hand, it's not obvious what a better use of the X
field would have been. I suppose they could have made instructions
three operand, e.g.

A Rx,Ry,B(D)

would add the memory operand to Ry and put it in Rx but it was a long
time until compilers could make good use of that.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From MitchAlsup@[email protected] to comp.arch on Sun May 24 03:10:27 2026

From Newsgroup: comp.arch

John Levine <[email protected]> posted:

According to MitchAlsup <[email protected]d>:

Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat
addressing.

B+X+D addressing only got 12-bits
B+D addressing was for RS and SS instructions

four bits of B, 12 bits of D, 16 bit addresses
you're right that RX used another four bits.

I think they thought they were saving on complexity and HW logic, but

We don't have to guess. "Architecture of the IBM System/360" by Amdahl, Blaauw, and Brooks in the IBM Systems Journal in April 1964 described a lot of the reasoning, and they wrote a whole book about it.

They had to make a lot of other design decisions like 6 vs 8 bit
bytes, ones- vs twos-complement, length fields vs word marks for
variable length data, stack vs registers, floating point format (they
blew that one).

They said that the combination of a full length base register and a
short displacement "gives consequent gains in instruction density. The base-register approach was adopted, and then augmented, for some instructions, with a second level of indexing."

In retrospect, B+X+D was probably a mistake since I believe that
double indexing is rarely used, and easy to do with an extra register
add.

That is the view of MIPS and RISC_V
That is not the view of x86 or ARM or My 66000 or Mc 88K

On the other hand, it's not obvious what a better use of the X
field would have been. I suppose they could have made instructions
three operand, e.g.

A Rx,Ry,B(D)

would add the memory operand to Ry and put it in Rx but it was a long
time until compilers could make good use of that.

Agreed about time it took compiler to be taught how to use it.


From quadi@[email protected] to comp.arch on Sun May 24 13:30:42 2026

From Newsgroup: comp.arch

On Sat, 23 May 2026 20:09:54 +0000, John Levine wrote:

Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat addressing.

12 bits, of course. And they felt that 12 bits were enough because memory
was such an issue back then.

In hindsight, of course having a two-bit condition code was a "mistake".
But C hadn't been invented yet, so nobody knew there would be any real use
for unsigned integers.

And the PSW really was full - when IBM went to System/370, they had to repurpose a bit in the PSW that was already assigned to an existing
feature, ASCII mode. Since nobody ever used it, however, using it instead
for the System/370's "Extended Control Mode", wherein the PSW *did* get doubled in length was possible.

John Savard

From quadi@[email protected] to comp.arch on Sun May 24 13:32:41 2026

From Newsgroup: comp.arch

On Sat, 23 May 2026 20:03:34 +0000, MitchAlsup wrote:

S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running out
of PSW space.

It's not as if these problems were impossible to fix.

Remember the System/370, and its Extended Control Mode? All they lost was
the ability to switch the computer into an ASCII mode nobody ever used.

John Savard

From quadi@[email protected] to comp.arch on Sun May 24 13:49:26 2026

From Newsgroup: comp.arch

On Sun, 24 May 2026 13:32:41 +0000, quadi wrote:

On Sat, 23 May 2026 20:03:34 +0000, MitchAlsup wrote:

S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running
out of PSW space.

Remember the System/370, and its Extended Control Mode? All they lost
was the ability to switch the computer into an ASCII mode nobody ever
used.

Come to think of it, though, that fact doesn't make you wrong. They
would have been better off defining it as 128 bits long in the first
place, since one thing they _couldn't_ do with Extended Control Mode was change the condition codes from two bits to full NZVC, since user programs
had to remain compatible.

Of course, though, people must have been able to get C compilers working
on z/Architecture, despite inefficiencies, or it wouldn't be possible to install Linux on those machines.

John Savard

From quadi@[email protected] to comp.arch on Sun May 24 13:57:12 2026

From Newsgroup: comp.arch

On Sun, 24 May 2026 01:43:29 +0000, John Levine wrote:

In retrospect, B+X+D was probably a mistake since I believe that double indexing is rarely used, and easy to do with an extra register add. On
the other hand, it's not obvious what a better use of the X field would
have been. I suppose they could have made instructions three operand,
e.g.

A Rx,Ry,B(D)

would add the memory operand to Ry and put it in Rx but it was a long
time until compilers could make good use of that.

Since there were three-address machines back in the days before general registers, I am surprised to hear that they didn't know how to write
compilers that made use of such a field.

But the "better use of the X field" is obvious - make the displacement
field 16 bits instead of 12 bits. Except, of course, that this would have killed the SS format of instructions.

But I don't agree that B+X+D is a bad thing. An extra register add is an
extra instruction. And it's not rarely used; it's used every time an array
is accessed, and arrays are often accessed in inner loops!

Of course, there are counterarguments. B+X+D, when used, involves an extra
add inside the instruction. Doesn't that take time too? Wouldn't it be
better to add just once at the beginning of the loop?

The thing is, though, there's also *register pressure* to think about.
Plus, the extra add inside the instruction just means a three-input add,
and once one recalls how *multipliers* are designed, one realizes that
this extra add, though it may still take time, does not take _much_ time.
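
The multiplier trick alluded to here is the carry-save adder. The Python sketch below is a bit-level model only, with made-up function names (`carry_save`, `three_input_add`) and an assumed 24-bit address width: one layer of full adders reduces three inputs to two words without any carry rippling between bit positions, and a single ordinary adder then finishes the job.

```python
def carry_save(a, b, c):
    """3:2 compressor: three words in, a sum word and a carry word out.

    Each bit position is an independent full adder, so this layer costs
    one full-adder delay regardless of the word width.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def three_input_add(base, index, disp, bits=24):
    """base + index + disp using one carry-save layer and one real add."""
    s, carry = carry_save(base, index, disp)
    return (s + carry) & ((1 << bits) - 1)   # the only carry-propagating add

print(hex(three_input_add(0x012000, 0x40, 0x0FF)))   # 0x1213f
```

Compared with a two-input address add, the extra cost in this model is the one full-adder layer in front of the carry-propagating adder, which is the sense in which the third input does not take much time.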

John Savard


From quadi@[email protected] to comp.arch on Sun May 24 14:14:41 2026

From Newsgroup: comp.arch

On Sun, 24 May 2026 13:49:26 +0000, quadi wrote:

Of course, though, people must have been able to get C compilers working
on z/Architecture, despite inefficiencies, or it wouldn't be possible to install Linux on those machines.

I did a search, and found that z/Architecture added add-with-carry, subtract-with-borrow, and LLGF and LLGH which appear to be UL and ULH in
my architectures.

John Savard

From anton@[email protected] (Anton Ertl) to comp.arch on Sun May 24 09:32:07 2026

From Newsgroup: comp.arch

John Levine <[email protected]> writes:

I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.

Some additional possible reasons:

Most[1] architectures before the S/360 use ones-complement or
sign/magnitude representation for integers, and trap on overflow [2],
and I guess they have separate comparison instructions (I don't know
that much about these machines, so I may be wrong here). So there was
no need for flags indicating signed overflow (V), or unsigned overflow
(C). Having only two flag bits was good enough to represent the three
possible outcomes of a comparison.

[1] Zuse chose twos-complement in the early 1940s. I don't know if he
stuck with that in his later machines.

[2] Reading the IBM 704 manual, it just says for some instructions
that "Ac overflow is possible", but does not describe the
behaviour. For division, the IBM 704 has "divide or halt" and
"divide or proceed", so I guess that trapping in the modern sense
was not yet on the table.

S/360 also supports the trap-on-overflow behaviour for signed
arithmetics, but one can turn the trapping off. Arithmetic
instructions set the flags in different ways depending on whether they
are signed or unsigned. So S/360 has a separate add-signed (A) and add-unsigned (AL) instruction; thanks to 2s-complement arithmetics,
when they don't trap, they produce the same result in the target
register, but different behaviour in the flags and in trapping.
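
The difference can be sketched as follows. This Python model is an approximation written from memory of the Principles of Operation, so the condition-code assignments in the docstrings should be treated as assumptions to check against the manual; the function names `add_signed` and `add_logical` are invented, and the fixed-point-overflow trap is not modeled at all.

```python
MASK32 = 0xFFFFFFFF

def add_signed(a, b):
    """Model of A/AR. Assumed cc: 0 zero, 1 negative, 2 positive, 3 overflow."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    sa, sb, sr = a >> 31, b >> 31, result >> 31
    if sa == sb and sr != sa:      # same-sign operands, different-sign sum
        cc = 3
    elif result == 0:
        cc = 0
    elif sr:
        cc = 1
    else:
        cc = 2
    return result, cc

def add_logical(a, b):
    """Model of AL/ALR. Assumed cc: bit value 2 = carry out, bit value 1 = nonzero."""
    total = (a & MASK32) + (b & MASK32)
    result = total & MASK32
    carry = total >> 32
    cc = (2 if carry else 0) + (1 if result else 0)
    return result, cc

for a, b in [(0x7FFFFFFF, 1), (0xFFFFFFFF, 1)]:
    print(hex(a), hex(b), add_signed(a, b), add_logical(a, b))
```

In both example pairs the 32-bit result is identical for the two instructions; only the two-bit condition code differs (3 versus 1 in the first case, 0 versus 2 in the second).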

I expect that this all costs in control logic, so more constrained
processors like the 6502 and the 8080 then later went with NCZV. The
PDP-11 too AFAIK, but that may be due to the features of the bit-slice
ALUs available when the PDP-11 was designed. These machines also did
not have as many encoding bits to waste on separate signed and
unsigned integer arithmetic, thanks to their very narrow memory
bandwidth.

The architectures before the S/360 do not provide support for multi-word
integer arithmetic (unless you count the digit-serial and
character-serial machines), and S/360 does not, either. It takes
until 1990 for IBM to add ALCR (add with carry-in) to the
architecture. For architectures with smaller word sizes like the
PDP-11, the 6502 and 8080, the need for multi-word integer arithmetic
was much greater.

Interestingly, the IBM 704 has the ACL instruction, an unsigned
addition with carry-in, like the ESA/390's ALCR.

Bottom line: When the S/360 was designed, the design of 2s-complement
machines was in its infancy (if we ignore Zuse, and the S/360
designers may have ignored him), so it was not known how to design the
flags for them.

One other aspect that may have played a role is that various S/360 implementations included compatibility modes for earlier IBM models,
and they may have designed the flags with that in mind. However, given
the vast differences between a 36-bit machine with sign/magnitude representation (IBM 704) and the S/360, implementing different flags
for the different architectures was probably just a minor
complication. Moreover, different S/360 models offer compatibility
for different older architectures, where the flags probably were
different.

Concerning the question about why IBM chose big-endian for the S/360.
I see <https://en.wikipedia.org/wiki/IBM_704#Registers> that already
the IBM 704 used big-endian bit-numbering. As long as you only have
one width at which to talk to registers or memory, that's as good as little-endian. It only becomes an issue if you talk to registers at
different widths (e.g., 32-bit and 64-bit Power(PC)), and likewise,
for memory it only becomes an issue when you talk to memory at
different widths; i.e., not word-addressed machines, but
byte-addressed machines.
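
A small Python illustration of the mixed-width case, using the standard `struct` module on a byte buffer standing in for byte-addressed memory (the value and variable names are arbitrary): a 32-bit store followed by a 16-bit load from the same address gives different answers under the two byte orders.

```python
import struct

value = 0x00001234                       # small enough to fit in 16 bits

le = struct.pack("<I", value)            # little-endian 32-bit store
be = struct.pack(">I", value)            # big-endian 32-bit store

# Load 16 bits from the *same* address (offset 0) in each memory image.
le_half = struct.unpack_from("<H", le, 0)[0]
be_half = struct.unpack_from(">H", be, 0)[0]

# On the big-endian image the low half lives two bytes further on.
be_low_half = struct.unpack_from(">H", be, 2)[0]

print(hex(le_half), hex(be_half), hex(be_low_half))   # 0x1234 0x0 0x1234
```

If every access used the full 32-bit width, the two images would be indistinguishable to the program, which is the single-width case described above.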

For FP the machines have different widths from early on, but they tend
not to access the halves of a double-precision number, so the
difference between big- and little-endian rarely makes a difference
there.

Actually, one does see some effects of big-endian bit numbering in the
IBM 704, because the Accumulator has additional bits, and they are
called P and Q (with little-endian bit ordering starting with bit 0,
they would just be called 35 and 36). Also the 15-bit index registers
run from bit 3 (MSB) to bit 17 (LSB), not from 0 to 17.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From John Levine@[email protected] to comp.arch on Sun May 24 15:12:48 2026

From Newsgroup: comp.arch

According to quadi <[email protected]d>:

On Sun, 24 May 2026 01:43:29 +0000, John Levine wrote:

In retrospect, B+X+D was probably a mistake since I believe that double
indexing is rarely used, and easy to do with an extra register add. On
the other hand, it's not obvious what a better use of the X field would
have been. I suppose they could have made instructions three operand,
e.g.

A Rx,Ry,B(D)

would add the memory operand to Ry and put it in Rx but it was a long
time until compilers could make good use of that.

Since there were three-address machines back in the days before general
registers, I am surprised to hear that they didn't know how to write
compilers that made use of such a field.

Optimizing compilers largely meant Fortran, which came from the single address 70x series. Human programmers did all sorts of clever tricks but it took a while to get compilers to do it. It probably needed graph coloring register allocation which wasn't invented until 1980.

But the "better use of the X field" is obvious - make the displacement
field 16 bits instead of 12 bits. Except, of course, that this would have
killed the SS format of instructions.

Or worse had some instructions with 12 bit displacement and some with 16
which would have been a programming nightmare.

But I don't agree that B+X+D is a bad thing. An extra register add is an
extra instruction. And it's not rarely used; it's used every time an array
is accessed, and arrays are often accessed in inner loops!

A decent optimizing compiler will do strength reduction so there's a register pointing at the array and stepping through it. You're right about register pressure but with 16 registers it shouldn't be hard to find one for an inner loop value.
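A small C sketch of what that transformation amounts to, written out by hand. The function names are made up for the example, and a real compiler would do this on its intermediate code rather than on the source.

```c
#include <stddef.h>

/* Indexed form: each access needs base + i*sizeof(long), i.e. a second
   register (or a B+X+D style address) inside the inner loop. */
long sum_indexed(const long *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* What strength reduction effectively produces: one pointer register
   stepping through the array, so plain B+D addressing is enough. */
long sum_stepped(const long *a, size_t n)
{
    long total = 0;
    for (const long *p = a, *end = a + n; p != end; p++)
        total += *p;
    return total;
}
```

Both functions return the same value; the second simply keeps a single address register live in the loop instead of a base plus an index.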
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Sun May 24 15:24:22 2026

From Newsgroup: comp.arch

According to quadi <[email protected]d>:

On Sat, 23 May 2026 20:09:54 +0000, John Levine wrote:

Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat
addressing.

12 bits, of course. And they felt that 12 bits were enough because memory
was such an issue back then.

It was also to force all addresses to be base relative to make code relocatable.

You should read the 1964 paper. It's not very long. Here's a copy:

https://www.ece.ucdavis.edu/~vojin/CLASSES/EEC272/S2005/Papers/IBM360-Amdahl_april64.pdf

In hindsight, of course having a two-bit condition code was a "mistake".
But C hadn't been invented yet, so nobody knew there would be any real use
for unsigned integers.

Sure they did. S/360 had separate unsigned versions of add and subtract instructions. The results were the same but the condition codes were
different and the unsigned versions couldn't overflow. There were also arithmetic and logical shifts.
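To illustrate "same result, different condition codes", here is a rough C sketch. It does not reproduce the actual S/360 condition code values; the struct and the function name `add32` are invented for the example, and it only reports the two facts that the arithmetic and logical variants would distinguish.

```c
#include <stdint.h>
#include <stdbool.h>

struct add_result {
    uint32_t sum;      /* identical bit pattern for both interpretations */
    bool     carry;    /* what a logical (unsigned) add would report */
    bool     overflow; /* what an arithmetic (signed) add would report */
};

struct add_result add32(uint32_t a, uint32_t b)
{
    struct add_result r;
    r.sum = a + b;                 /* wraps modulo 2^32 */
    r.carry = r.sum < a;           /* carry out of the top bit */
    /* Signed overflow: operands have the same sign, sum has the other. */
    r.overflow = ((~(a ^ b)) & (a ^ r.sum)) >> 31;
    return r;
}
```

For instance, 0x7FFFFFFF + 1 overflows as a signed add but produces no carry, while 0xFFFFFFFF + 1 produces a carry but no signed overflow.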

And the PSW really was full - when IBM went to System/370, they had to
repurpose a bit in the PSW that was already assigned to an existing
feature, ASCII mode. Since nobody ever used it, however, using it instead
for the System/370's "Extended Control Mode", wherein the PSW *did* get
doubled in length was possible.

Yup.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Sun May 24 16:39:25 2026

From Newsgroup: comp.arch

On Sun, 24 May 2026 15:24:22 +0000, John Levine wrote:

Sure they did. S/360 had separate unsigned versions of add and subtract instructions. The results were the same but the condition codes were different and the unsigned versions couldn't overflow.

Ah, I didn't remember that!

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Sun May 24 16:44:26 2026

From Newsgroup: comp.arch

On Sun, 24 May 2026 15:12:48 +0000, John Levine wrote:

According to quadi <[email protected]d>:

But the "better use of the X field" is obvious - make the displacement
field 16 bits instead of 12 bits. Except, of course, that this would
have killed the SS format of instructions.

Or worse had some instructions with 12 bit displacement and some with 16 which would have been a programming nightmare.

Of course, the z/Architecture does have instructions with 20 bit
displacements as well as 12 bits. But unlike the case where only the SS instructions have a 12-bit displacement, it has a complete set of
instructions in each size.

And my Concertina II and IV also have 12, 16, and 20 bit displacements -
but it uses a different set of registers as the base registers for each,
and also has a complete set of instructions for each, thus avoiding the nightmare.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Sun May 24 17:06:52 2026

From Newsgroup: comp.arch

According to MitchAlsup <[email protected]d>:

In retrospect, B+X+D was probably a mistake since I believe that
double indexing is rarely used, and easy to do with an extra register
add.

That is the view of MIPS and RISC_V
That is not the view of x86 or ARM or My 66000 or Mc 88K

I suppose, but I don't think any of them reserved four instruction bits
for an index register that's rarely used. On x86 it's one bit in the r/m
field and arguably not even that since it's part of a three bit field
that's overloaded as a register number, or in 32 bit mode one address
form out of 8 that takes an extra byte for the base and index registers.

Vax also had double indexing, but it was an extra prefix byte in the
address field that said add register N scaled by the operand size to
whatever other address followed, so there it was one address mode out of
16.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Sun May 24 17:16:35 2026

From Newsgroup: comp.arch

On Sun, 24 May 2026 09:32:07 +0000, Anton Ertl wrote:

Most[1] architectures before the S/360 use ones-complement or
sign/magnitude representation for integers, and trap on overflow [2],

It makes sense to trap on a floating-point overflow, but trapping on an integer overflow is usually a terrible idea.

Before the System/360, it's definitely true that one's complement and
sign-magnitude representations of integers were valid options for designers.
I'm not sure of their relative frequency.

I do know of a claim made by one maker of a 24-bit computer in its
advertising literature, and I suspect it did represent the situation then.

Sign-magnitude was what the IBM 704 and its descendants used. As a result,
it was the... aspirational... integer representation.

One's complement was very popular back then - simpler to implement than sign-magnitude, but almost equivalent, in some sense. Thus, one's
complement was the preferred representation in the PDP-4, which also had a limited two's complement capability.

And two's complement was the simplest to implement, and thus chosen where
cost savings were paramount. So the PDP-5 used two's complement.

And then the IBM 360 came along, and woke everyone up to the fact that
there was no real reason to use anything but two's complement.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Sun May 24 17:30:40 2026

From Newsgroup: comp.arch

On Sun, 24 May 2026 09:32:07 +0000, Anton Ertl wrote:

Concerning the question about why IBM chose big-endian for the S/360

...I'm not really aware that they had a choice.

Some machines before the IBM System/360 did use little-endian ordering for multiple words, to simplify handling the carries when adding pairs of
words.
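The carry argument can be sketched in C as follows. This is only an illustration with an invented function name, using 32-bit words stored least significant word first; the machines in question had other word sizes.

```c
#include <stdint.h>
#include <stddef.h>

/* Adds two n-word integers stored least significant word first.
   Walking the words in ascending address order lets the carry ripple
   naturally from each word into the next one fetched.
   Returns the carry out of the top word. */
unsigned add_multiword(uint32_t *dst, const uint32_t *a,
                       const uint32_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        dst[i] = (uint32_t)t;          /* low 32 bits of the partial sum */
        carry = (unsigned)(t >> 32);   /* 0 or 1 */
    }
    return carry;
}
```

With the most significant word first, the same loop would have to run from the highest address downwards, or the hardware would have to fetch the second word before it could start.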

Until the PDP-11 came along, though, _nobody_ thought of putting the characters inside a word starting at the least-significant end, so that
the ordering of bytes would be consistent with the ordering of words.

Until the PDP-11 came along, therefore, little-endian wasn't a "thing".
The most significant part of a two-word integer might be placed
second, so that you could fetch the parts in forward order and start
adding right away, but that wasn't part of a philosophy.

The System/360 _only_ did BCD arithmetic with the SS instructions, it
didn't put BCD in registers. So it wasn't forced to use big-endian by my
consistency argument; binary values could still have been little-endian if
they had preferred. But the different machines in the System/360 family
had different bus widths.

So they couldn't just be little-endian at the 16-bit level; they would
have had to have been consistent. I suppose they could have thought of it first even if they didn't have the PDP-11 to copy from. But because almost
all their machines were microcoded, they were in a position to do things
like working backwards from the end of a number to do arithmetic to avoid having a severe cost penalty for big-endian.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Sun May 24 17:32:10 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Sun, 24 May 2026 09:32:07 +0000, Anton Ertl wrote:

Most[1] architectures before the S/360 use ones-complement or
sign/magnitude representation for integers, and trap on overflow [2],

It makes sense to trap on a floating-point overflow, but trapping on an integer overflow is usually a terrible idea.

So, detecting something went wrong and you should inform the programmer
is a bad idea ???

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Sun May 24 21:39:42 2026

From Newsgroup: comp.arch

On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.

So, detecting something went wrong and you should inform the programmer
is a bad idea ???

No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Sun May 24 22:07:18 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.

So, detecting something went wrong and you should inform the programmer
is a bad idea ???

No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior. Otherwise, programs like random number generators wouldn't work.

They work just fine using unSigned integers.
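A minimal C sketch of the point: a linear congruential generator written with an unsigned type relies on wraparound that the language defines, so no overflow trap is involved. The function name is made up, and the constants are the widely published "Numerical Recipes" pair, used here only as an example.

```c
#include <stdint.h>

/* One step of a 32-bit linear congruential generator.
   uint32_t arithmetic wraps modulo 2^32 by definition in C, so the
   multiply and add below never constitute signed overflow. */
uint32_t lcg_next(uint32_t state)
{
    return state * 1664525u + 1013904223u;
}
```

Calling `lcg_next` repeatedly on its own result yields the sequence; whether signed adds trap is irrelevant to it.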

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From Chris M. Thomasson@[email protected] to comp.arch on Sun May 24 15:22:46 2026

From Newsgroup: comp.arch

On 5/24/2026 3:07 PM, MitchAlsup wrote:

quadi <[email protected]d> posted:

On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.

So, detecting something went wrong and you should inform the programmer
is a bad idea ???

No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.

They work just fine using unSigned integers.

Ditto!

[...]

--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Mon May 25 01:04:36 2026

From Newsgroup: comp.arch

On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:

You will find you have no <marketable> choice; you need to support::

Integer{S8, S16, S32, S64, U8, U16, U32, U64}
Float {FP8, FP16, FP32, FP64 and some way to get FP128}

After realizing that I did need a second instruction for unsigned
_division_ I then learned, to my shock, that division was not one, but
two, instructions, at least in my architecture, for integers.

And there didn't seem to be enough opcode space left for Divide Extensibly Unsigned.

I was able to re-adjust the 32-bit operate instructions so that the two
places where only 96 opcodes were provided for the basic operate
instructions could now provide 128 opcodes.

The 16-bit and 24-bit short instructions could not be so modified. But
there were a few unused opcodes; so Divide Extensibly Unsigned could still
fit in, just out of place.

But that meant that this one operation would be missing from the
minimum-length immediate instructions, and would still be treated as
outside the basic instruction set, so its immediate instructions would be
16 bits longer.

The Pigeonhole Principle has finally bit me!

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Mon May 25 01:29:37 2026

From Newsgroup: comp.arch

On Mon, 25 May 2026 01:04:36 +0000, quadi wrote:

The 16-bit and 24-bit short instructions could not be so modified. But
there were a few unused opcodes; so Divide Extensibly Unsigned could
still fit in, just out of place.

But that meant that this one operation would be missing from the
minimum-length immediate instructions, and would still be treated as
out of the basic instruction set, getting immediate instructions that
were 16 bits longer, for them.

I have found a way around even that problem. There is no use for a "swap immediate" instruction, so I'll put Divide Extensibly Unsigned in its
spot, so it will be in the columns for its types, and put the swap instruction, another exotic one, in the out-of-place spots left over.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From David Brown@[email protected] to comp.arch on Mon May 25 10:23:00 2026

From Newsgroup: comp.arch

On 24/05/2026 23:39, quadi wrote:

On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.

So, detecting something went wrong and you should inform the programmer
is a bad idea ???

No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.

John Savard

That does not make sense. Code such as random number generators should
be written so that they are correct in the language they are written in.
If that is C, signed integer overflow is UB while unsigned integers
have wrapping behaviour - thus if your code depends on wrapping, and it
is written in C, it needs to use unsigned types or compiler-specific extensions, flags, etc. (Or C23 ckd_add and other checked arithmetic functions.)

If it is written in Zig, you need to use the specific modulo arithmetic functions even for unsigned arithmetic. If it is written in Java,
signed integer arithmetic is fine.

It all depends on the language and/or any options the language and tools
might support - and code should be written to work correctly according
to the language rules.

The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).

--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Mon May 25 14:28:21 2026

From Newsgroup: comp.arch

David Brown <[email protected]> writes:

On 24/05/2026 23:39, quadi wrote:

On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.

Most programming environments I have had contact with don't trap on floating-point overflow.

So, detecting something went wrong and you should inform the programmer
is a bad idea ???

The question is if an integer overflow means that something went
wrong. Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).

The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).

This supposedly helpful feature has been neglected by C compiler
developers, and you see in the progression from MIPS (1986) to Alpha
(1992) and then RISC-V (2011) that the hardware architects have
accepted that:

MIPS: add traps on signed overflow, you need to write addu if you
don't want that.

Alpha: add ignores signed overflow, you need to write addv if you want
the trapping.

RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
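To give an idea of what "pretty involved" means, here is a C sketch of the general case, using only a wrapping add and two comparisons, which is roughly the shape of the instruction sequence a compiler has to emit when no trapping add exists. The function name is invented; the cast back to int64_t relies on the usual two's complement conversion (implementation-defined before C23).

```c
#include <stdint.h>
#include <stdbool.h>

/* Signed add with overflow detection built from a wrapping add and
   comparisons. Stores the wrapped sum in *sum and returns true if the
   mathematically correct result does not fit in int64_t. */
bool add_overflows(int64_t a, int64_t b, int64_t *sum)
{
    int64_t s = (int64_t)((uint64_t)a + (uint64_t)b); /* wrapping add */
    *sum = s;
    /* Without overflow, adding a negative b must make the sum smaller
       than a, and adding a non-negative b must not. Any disagreement
       means the add wrapped. */
    return (b < 0) != (s < a);
}
```

That is an add, two compares and a branch for each checked addition, where MIPS add or Alpha addv needs a single instruction.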

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From David Brown@[email protected] to comp.arch on Mon May 25 17:18:18 2026

From Newsgroup: comp.arch

On 25/05/2026 16:28, Anton Ertl wrote:

David Brown <[email protected]> writes:

On 24/05/2026 23:39, quadi wrote:

On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.

Most programming environments I have had contact with don't trap on floating-point overflow.

So, detecting something went wrong and you should inform the programmer
is a bad idea ???

The question is if an integer overflow means that something went
wrong.

At the source code level, that is often the case - but not always. I
think it is quite clear that if you do something the language does not
allow, the code is wrong, but it might give the correct results for some
tools nonetheless. And overflow will often mean something went wrong
even when the language (or compiler options) specifically allow it. At
the object code level, things may be different again. (For an obvious example, if you are using a double-width integer type then the source
code may have no overflow but the implementation might use two "add-with-carry" instructions where overflow is a natural part of the implementation.)

Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).

An awkward thing about using trap on overflow is determining how
precisely it is defined. Supposing you have the expression "a + b - a".
Perhaps "a + b" overflows. I would hope that when using debug-related
compiler flags such as "-fsanitize=signed-integer-overflow", a compiler
would check for overflow on "a + b", and report it at runtime.
(Unfortunately, gcc does not do that unless the partial expression is
assigned to a variable.) But in "normal" usage, I'd expect the
expression to be simplified, resulting in just "b" and no overflow.

If "trap on overflow" has precise semantics in the code, then this
disables a range of useful optimisations and re-arrangements. If it is
just "use trapping arithmetic instructions", then it will miss many
possible cases of actual overflow in the code, which we might want to
catch. And "trap on overflow" might either trigger when there is no
overflow in the original code, or hinder optimisations. (Consider the expression "x / 2 + y / 2" - the compiler could implement that as a
combined "(x + y) / 2", but that might introduce overflow.)

It is not easy to see how a tool can avoid false positives and false
negatives and also conveniently optimise and re-arrange code.

The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).

This supposedly helpful feature has been neglected by C compiler
developers, and you see in the progression from MIPS (1986) to Alpha
(1992) and then RISC-V (2011) that the hardware architects have
accepted that:

MIPS: add traps on signed overflow, you need to write addu if you
don't want that.

Alpha: add ignores signed overflow, you need to write addv if you want
the trapping.

RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).

- anton

Compilers have not always been good at taking advantage of all the
features provided by hardware - nor have languages been good at exposing
the possibilities in the language so that programmers can take advantage
of them.

--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Mon May 25 16:45:07 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:

You will find you have no <marketable> choice; you need to support::

Integer{S8, S16, S32, S64, U8, U16, U32, U64}
Float {FP8, FP16, FP32, FP64 and some way to get FP128}

After realizing that I did need a second instruction for unsigned
_division_ I then learned, to my shock, that division was not one, but
two, instructions, at least in my architecture, for integers.

And there didn't seem to be enough opcode space left for Divide Extensibly Unsigned.

My 66000 has an instruction bit that denotes the signedness of integer calculations {Signed, unSigned}. This bit is available as another OpCode
bit for non-integer calculation instructions.

I was able to re-adjust the 32-bit operate instructions so that the two places where only 96 opcodes were provided for the basic operate instructions could now provide 128 opcodes.

The 16-bit and 24-bit short instructions could not be so modified. But
there were a few unused opcodes; so Divide Extensibly Unsigned could still fit in, just out of place.

But that meant that this one operation would be missing from the
minimum-length immediate instructions, and would still be treated as out
of the basic instruction set, getting immediate instructions that were 16
bits longer, for them.

The Pigeonhole Principle has finally bit me!

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Mon May 25 16:49:59 2026

From Newsgroup: comp.arch

[email protected] (Anton Ertl) posted:

David Brown <[email protected]> writes:

On 24/05/2026 23:39, quadi wrote:

On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:

-----------------

This supposedly helpful feature has been neglected by C compiler
developers, and you see in the progression from MIPS (1986) to Alpha
(1992) and then RISC-V (2011) that the hardware architects have
accepted that:

MIPS: add traps on signed overflow, you need to write addu if you
don't want that.

Alpha: add ignores signed overflow, you need to write addv if you want
the trapping.

RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).

The worst of all possible semantic encodings

- anton

--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Mon May 25 16:43:07 2026

From Newsgroup: comp.arch

David Brown <[email protected]> writes:

On 25/05/2026 16:28, Anton Ertl wrote:

Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different
instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).

An awkward thing about using trap on overflow is determining how
precisely it is defined. Supposing you have the expression "a + b - a".
Perhaps "a + b" overflows. I would hope that when using debug-related
compiler flags such as "-fsanitize=signed-integer-overflow", a compiler
would check for overflow on "a + b", and report it at runtime.
(Unfortunately, gcc does not do that unless the partial expression is
assigned to a variable.) But in "normal" usage, I'd expect the
expression to be simplified, resulting in just "b" and no overflow.

OTOH, cases like a+b+c where the result is in range, while an
intermediate result is out of range are one of the reasons why I
prefer -fwrapv over -ftrapv. As for your preference of nasal demons,
given enough information, the compiler might "optimize" "a+b-a" into,
e.g., 0.
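The a+b+c case can be shown with a short C sketch. To keep the example free of undefined behaviour it does the wrapping in uint32_t rather than relying on -fwrapv; the function name is made up, and the final cast assumes the usual two's complement conversion.

```c
#include <stdint.h>

/* Sums three values with wrapping (modulo 2^32) arithmetic, which is
   what -fwrapv gives for a 32-bit int. The intermediate a+b may be out
   of range, yet the final result is correct whenever the true sum fits. */
int32_t sum3_wrapping(int32_t a, int32_t b, int32_t c)
{
    uint32_t t = (uint32_t)a + (uint32_t)b;  /* may wrap */
    t += (uint32_t)c;                        /* may wrap back */
    return (int32_t)t;
}
```

For example, `sum3_wrapping(INT32_MAX, 1, -1)` returns INT32_MAX even though INT32_MAX + 1 is out of range; a trapping implementation would stop at the first add.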

Anyway, the definition of -ftrapv is not very precise; for gcc-12.2:

|'-ftrapv'
| This option generates traps for signed overflow on addition,
| subtraction, multiplication operations.

As for what gcc-12.2 does for your example on AMD64:

long foo(long a, long b)
{
  return a+b-a;
}

is compiled with gcc -O3 -ftrapv to:

0: 48 89 f0 mov %rsi,%rax
3: c3 ret

If "trap on overflow" has precise semantics in the code, then this
disables a range of useful optimisations and re-arrangements. If it is
just "use trapping arithmetic instructions", then it will miss many
possible cases of actual overflow in the code, which we might want to
catch.

Which would you prefer by default?

The gcc developers apparently took the latter approach, even when you
ask for -ftrapv explicitly. So what, IYO, speaks against doing that
by default on machines like MIPS and Alpha.

And "trap on overflow" might either trigger when there is no
overflow in the original code, or hinder optimisations. (Consider the
expression "x / 2 + y / 2" - the compiler could implement that as a
combined "(x + y) / 2", but that might introduce overflow.)

x/2+y/2 produces a different result from (x+y)/2 when both x and y are
odd integers.
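A tiny C check of that claim, with invented function names. C integer division truncates toward zero, so each odd operand loses a half in the separate form.

```c
/* The expression as written in the source. */
long half_sum_separate(long x, long y)
{
    return x / 2 + y / 2;
}

/* The "combined" form a compiler must NOT substitute for it. */
long half_sum_combined(long x, long y)
{
    return (x + y) / 2;
}
```

With x = 3 and y = 5 the first gives 1 + 2 = 3 and the second gives 8 / 2 = 4; with x = 2 and y = 4 both give 3.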

gcc-12.2 compiles

long bar(long x, long y)
{
  return x/2+y/2;
}

on AMD64 to:

gcc -O3 -ftrapv           gcc -O3
mov  %rdi,%rax            mov  %rdi,%rax
sub  $0x8,%rsp            mov  %rsi,%rdx
shr  $0x3f,%rax           shr  $0x3f,%rax
add  %rax,%rdi            shr  $0x3f,%rdx
mov  %rsi,%rax            add  %rdi,%rax
shr  $0x3f,%rax           add  %rsi,%rdx
sar  %rdi                 sar  %rax
add  %rax,%rsi            sar  %rdx
sar  %rsi                 add  %rdx,%rax
call __addvdi3@PLT        ret
add  $0x8,%rsp
ret

so the -ftrapv introduces an additional mov and a call; I would have
expected that the + would be compiled to an ADD instruction followed
by a JO instruction.
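For comparison, the add-then-branch shape can be requested explicitly. The sketch below uses `__builtin_add_overflow`, which is a GCC/Clang extension rather than standard C, and the function name `add_or_abort` is invented. On AMD64 these compilers typically turn it into an add followed by a conditional jump on the overflow flag, though that depends on compiler version and options.

```c
#include <stdlib.h>

/* Signed add that aborts on overflow instead of calling a library
   routine. __builtin_add_overflow stores the wrapped sum in its third
   argument and returns nonzero if the true result did not fit. */
long add_or_abort(long a, long b)
{
    long sum;
    if (__builtin_add_overflow(a, b, &sum))
        abort();
    return sum;
}
```

Writing `return add_or_abort(x/2, y/2);` in bar() would then give the inline check rather than the call to __addvdi3.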

Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:

gcc -O3 -ftrapv             gcc -O3
lui   gp,0x0                srl  v0,a0,0x1f
addiu gp,gp,0               srl  v1,a1,0x1f
addu  gp,gp,t9              addu v0,v0,a0
srl   v1,a0,0x1f            addu a1,v1,a1
lw    t9,__addvsi3(gp)      sra  v0,v0,0x1
srl   v0,a1,0x1f            sra  a1,a1,0x1
addiu sp,sp,-32             jr   ra
addu  a0,v1,a0              addu v0,v0,a1
addu  a1,v0,a1
sra   a0,a0,0x1
sw    ra,28(sp)
sw    gp,16(sp)
jalr  t9
sra   a1,a1,0x1
lw    ra,28(sp)
jr    ra
addiu sp,sp,32

The call costs a lot of overhead.

It is not easy to see how a tool can avoid false positives and false
negatives and also conveniently optimise and re-arrange code.

It can't. But it does not try to avoid false negatives even when
explicitly asked for trapping on overflow.

If some overflow trapping (where it can be done without additional
instructions) were preferable to no overflow trapping, gcc would compile
signed adds that survive after optimization into add on MIPS rather
than addu, by default. Given that it does not, the GCC developers
probably found out that it is not preferable. I guess they would get
too many customer complaints, including for "relevant" code, i.e.,
code where the usual "it's UB, so your code is broken" excuse does not
work.

The fact that they don't even try to make -ftrapv produce efficient
code indicates that there is no "relevant" interest in efficient
-ftrapv. It would be interesting to know who came up with the idea of
adding -ftrapv, and why they are still keeping it.

Compilers have not always been good at taking advantage of all the
features provided by hardware

GCC is pretty good at implementing -fwrapv. For the two examples
above, "gcc -O3 -fwrapv" produces the same code on AMD64 and MIPS as
"gcc -O3".

nor have languages been good at exposing
the possibilities in the language so that programmers can take advantage
of them.

Yes. But I leave that for another day.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Mon May 25 19:20:01 2026

From Newsgroup: comp.arch

[email protected] (Anton Ertl) posted:

David Brown <[email protected]> writes:

On 25/05/2026 16:28, Anton Ertl wrote:

Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different
instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).

An awkward thing about using trap on overflow is determining how
precisely it is defined. Supposing you have the expression "a + b - a".
Perhaps "a + b" overflows. I would hope that when using debug-related
compiler flags such as "-fsanitize=signed-integer-overflow", a compiler
would check for overflow on "a + b", and report it at runtime.
(Unfortunately, gcc does not do that unless the partial expression is
assigned to a variable.) But in "normal" usage, I'd expect the
expression to be simplified, resulting in just "b" and no overflow.

OTOH, cases like a+b+c where the result is in range, while an
intermediate result is out of range are one of the reasons why I
prefer -fwrapv over -ftrapv. As for your preference of nasal demons,
given enough information, the compiler might "optimize" "a+b-a" into,
e.g., 0.

a/0/b/

Anyway, the definition of -ftrapv is not very precise; for gcc-12.2:

|'-ftrapv'
| This option generates traps for signed overflow on addition,
| subtraction, multiplication operations.

As for what gcc-12.2 does for your example on AMD64:

long foo(long a, long b)
{
  return a+b-a;
}

is compiled with gcc -O3 -ftrapv to:

0: 48 89 f0 mov %rsi,%rax
3: c3 ret

If "trap on overflow" has precise semantics in the code, then this
disables a range of useful optimisations and re-arrangements. If it is
just "use trapping arithmetic instructions", then it will miss many
possible cases of actual overflow in the code, which we might want to
catch.

Which would you prefer by default?

What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.

The gcc developers apparently took the latter approach, even when you
ask for -ftrapv explicitly. So what, IYO, speaks against doing that
by default on machines like MIPS and Alpha.

Both architectures got this one wrong--IMO--and so does RISC-V.

And "trap on overflow" might either trigger when there is no
overflow in the original code, or hinder optimisations. (Consider the
expression "x / 2 + y / 2" - the compiler could implement that as a
combined "(x + y) / 2", but that might introduce overflow.)

x/2+y/2 produces a different result from (x+y)/2 when both x and y are
odd integers.

gcc-12.2 compiles

long bar(long x, long y)
{
  return x/2+y/2;
}

on AMD64 to:

gcc -O3 -ftrapv           gcc -O3
mov  %rdi,%rax            mov  %rdi,%rax
sub  $0x8,%rsp            mov  %rsi,%rdx
shr  $0x3f,%rax           shr  $0x3f,%rax
add  %rax,%rdi            shr  $0x3f,%rdx
mov  %rsi,%rax            add  %rdi,%rax
shr  $0x3f,%rax           add  %rsi,%rdx
sar  %rdi                 sar  %rax
add  %rax,%rsi            sar  %rdx
sar  %rsi                 add  %rdx,%rax
call __addvdi3@PLT        ret
add  $0x8,%rsp
ret

so the -ftrapv introduces an additional mov and a call; I would have
expected that the + would be compiled to an ADD instruction followed
by a JO instruction.
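
For comparison, a hedged sketch of an inline overflow check written by hand. The function name is hypothetical; __builtin_add_overflow is the GCC/Clang builtin. On x86-64 these compilers usually turn it into an add followed by a conditional jump on the overflow flag, though that depends on compiler version and options.

```c
#include <stdlib.h>

/* Hypothetical stand-in for what -ftrapv is meant to do for one addition:
   add, then abort if the signed result overflowed. */
long add_or_trap(long a, long b)
{
    long sum;
    if (__builtin_add_overflow(a, b, &sum))
        abort();   /* no recovery: mirrors "trap", not "report" */
    return sum;
}
```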

Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:

gcc -O3 -ftrapv         gcc -O3
lui gp,0x0              srl v0,a0,0x1f
addiu gp,gp,0           srl v1,a1,0x1f
addu gp,gp,t9           addu v0,v0,a0
srl v1,a0,0x1f          addu a1,v1,a1
lw t9,__addvsi3(gp)     sra v0,v0,0x1
srl v0,a1,0x1f          sra a1,a1,0x1
addiu sp,sp,-32         jr ra
addu a0,v1,a0           addu v0,v0,a1
addu a1,v0,a1
sra a0,a0,0x1
sw ra,28(sp)
sw gp,16(sp)
jalr t9
sra a1,a1,0x1
lw ra,28(sp)
jr ra
addiu sp,sp,32

The call costs a lot of overhead.

Architectures without overflow traps are notorious for excess instruction
count when overflow detection is desired or mandated.

It is not easy to see how a tool can avoid false positives and false
negatives and also conveniently optimise and re-arrange code.

It can't. But it does not try to avoid false negatives even when
explicitly asked for trapping on overflow.

Granted, Optimization can do a lot of strange code emission and movement
when one does not care about precise overflow semantics. But, as a whole,
we are a society where we want high HP automobiles more than we want safe automobiles ('we' not including *.gov's).

If some overflow trapping when it can be done without additional
instructions would be preferable over no overflow, gcc would compile
signed adds that survive after optimization into add on MIPS rather
than addu, by default. Given that it does not, the GCC developers
probably found out that it is not preferable. I guess they would get
too many customer complaints, including for "relevant" code, i.e.,
code where the usual "it's UB, so your code is broken" excuse does not
work.

It is much harder than that. For example: does a signed shift left
overflow when significant bits are shifted out ?? What if the
subsequent instruction shifts the result back and the pair are acting
as a bit-field extract ?? My 66000 has bit field extracts for exactly
this reason. Floating-point has a lot of these cases, too.
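
To illustrate the shift pair, here is a minimal sketch of a bit-field extract built from a left shift followed by a right shift. The name extract_bits is made up, and the sketch uses unsigned 64-bit values so that discarding high bits is well defined in C; it is not meant to describe any particular ISA's extract instruction.

```c
#include <stdint.h>

/* Extracts `width` bits starting at bit `pos` (bit 0 = least significant)
   by shifting left, then right. Requires width >= 1 and pos + width <= 64,
   which keeps both shift counts in the range 0..63. */
uint64_t extract_bits(uint64_t x, unsigned pos, unsigned width)
{
    uint64_t top_aligned = x << (64 - pos - width); /* drops high bits on purpose */
    return top_aligned >> (64 - width);             /* zero-fills from the left */
}
```

The left shift throws away significant bits by design, which is exactly the case where an overflow trap on the shift would be a false positive.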

The fact that they don't even try to make -ftrapv produce efficient
code indicates that there is no "relevant" interest in efficient
-ftrapv. It would be interesting to know who came up with the idea of
adding -ftrapv, and why they are still keeping it.

Compilers have not always been good at taking advantage of all the
features provided by hardware

GCC is pretty good at implementing -fwrapv. For the two examples
above, "gcc -O3 -fwrapv" produces the same code on AMD64 and MIPS as
"gcc -O3".

nor have languages been good at exposing
the possibilities in the language so that programmers can take advantage
of them.

Yes. But I leave that for another day.

A whole new kettle of fish...

- anton


From quadi@[email protected] to comp.arch on Mon May 25 20:26:24 2026

From Newsgroup: comp.arch

On Mon, 25 May 2026 10:23:00 +0200, David Brown wrote:

The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages.

Yes. And I am used to FORTRAN, which did not trap on integer overflows.

John Savard

From quadi@[email protected] to comp.arch on Mon May 25 20:32:15 2026

From Newsgroup: comp.arch

On Mon, 25 May 2026 19:20:01 +0000, MitchAlsup wrote:

[email protected] (Anton Ertl) posted:

David Brown <[email protected]> writes:

On 25/05/2026 16:28, Anton Ertl wrote:

Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers
have avoided making -ftrapv the default, even on platforms like MIPS
and Alpha where the implementation of -ftrapv just means to use
different instructions (e.g., add instead of addu on MIPS, and addv
instead of add on Alpha).

Both architectures got this one wrong--IMO--and so does RISC-V.

You may not have been replying to what Anton Ertl wrote above, since there
was a lot in between that I snipped. But it does mention two architectures that took an approach to trapping on integer overflow... that I also tend
to disagree with.

What I'm used to is the System/360. While it made the mistake of having
two condition code bits instead of NZVC, the idea of having "trap on
overflow" controlled by a bit in the PSW is... what I assumed to be normal
and correct.

I could be wrong, as I haven't examined that approach critically and given full consideration to the alternatives.

John Savard

From Thomas Koenig@[email protected] to comp.arch on Mon May 25 20:32:15 2026

From Newsgroup: comp.arch

David Brown <[email protected]> schrieb:

On 24/05/2026 23:39, quadi wrote:

On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.

So, detecting something went wrong and you should inform the programmer
is a bad idea ???

No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.

John Savard

That does not make sense. Code such as random number generators should
be written so that they are correct in the language they are written in.

In principle, yes.

In practice, people often used whatever "worked" on their systems.
Implementors have a certain right because they control what their
compiler does or does not do. But users did so, as well, with
Numerical Recipes a(n in)famous example.

And yes, this bites people. You can see this at https://gcc.gnu.org/gcc-13/porting_to.html :

# GCC 13 includes new optimizations which may change behavior
# on integer overflow. Traditional code, like linear congruential
# pseudo-random number generators in old programs and relying on
# a specific, non-standard behavior may now generate unexpected
# results. The option -fsanitize=undefined can be used to detect
# such code at runtime.

# It is recommended to use the intrinsic subroutine RANDOM_NUMBER for
# random number generators or, if the old behavior is desired, to use
# the -fwrapv option. Note that this option can impact performance.

If that is C, signed integer overflow is UB while unsigned integers
have wrapping behaviour - thus if your code depends on wrapping, and it
is written in C, it needs to use unsigned types or compiler-specific extensions, flags, etc. (Or C23 ckd_add and other checked arithmetic functions.)
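
As a concrete case, a minimal sketch of a linear congruential generator step written with an unsigned type, so that the wraparound it relies on is defined by the C standard. The function name is made up; the multiplier and increment are the well-known "quick and dirty" 32-bit constants from Numerical Recipes.

```c
#include <stdint.h>

/* One step of a 32-bit LCG. uint32_t arithmetic wraps modulo 2^32,
   so no compiler flag such as -fwrapv is required. */
uint32_t lcg_next(uint32_t state)
{
    return state * UINT32_C(1664525) + UINT32_C(1013904223);
}
```

Starting from state 0, the first value is simply the increment, 1013904223.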

If it is written in Zig, you need to use the specific modulo arithmetic functions even for unsigned arithmetic. If it is written in Java,
signed integer arithmetic is fine.

It all depends on the language and/or any options the language and tools might support - and code should be written to work correctly according
to the language rules.

Fortran has no standard way of implementing this unless you
restrict yourself to sizes which do not overflow a signed integer.
Implementing LCGRNGs was one reason why I pushed for unsigned
arithmetic (modulo 2**n) in Fortran. The attempt failed (not
taken up by WG5 after being endorsed by J3), but I implemented it
for gfortran anyway.

The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).

Sanitizers are also fairly good now, but of course cost performance.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From quadi@[email protected] to comp.arch on Mon May 25 20:34:41 2026

From Newsgroup: comp.arch

On Mon, 25 May 2026 16:49:59 +0000, MitchAlsup wrote:

[email protected] (Anton Ertl) posted:

RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).

The worst of all possible semantic encodings

Although I thought that making trapping on fixed-point overflow the
default is a bad idea, I agree that making it impossible to do so, or even test for fixed-point overflow, is a much worse idea.

John Savard

From quadi@[email protected] to comp.arch on Mon May 25 20:45:20 2026

From Newsgroup: comp.arch

On Mon, 25 May 2026 16:45:07 +0000, MitchAlsup wrote:

My 66000 has an instruction bit that denotes the signedness of integer calculations {Signed, unSigned}. This bit is available as another OpCode
bit for non-integer calculation instructions.

That's nice. It's not an option I can consider, as having lots of
orthogonal modifiers on instructions would tend to increase their length.
A major goal of the Concertina II, III, and IV architectures is for instructions not to be longer than similar instructions on the Motorola
68020 or the IBM System/360 if at all possible.

Basically, the selling point is... "Your programs only get 10% bigger, if that, and yet you have 32 registers, so they run faster!".

Or they _would_, if the design didn't have so many extra transistors for supporting both IBM-format and Intel-format Decimal Floating Point, old-
style IBM floats, simple floating (You too can work with numbers that go around the world 2 1/2 times!), packed decimal, mixed-radix arithmetic...

But, hey, supporting these things in hardware is faster than doing them in software!

And are people even going to _read_ the part of the manual that
explains... as is noted in the description of the original Concertina architecture...

This chip has 8-way simultaneous multi-threading, but only for programs
which do not make use of extensions to the register set.

Only two programs per core may use the extended register banks with 128 elements.

Only one program per core may use the vector registers for long vector instructions. The 256-bit short vector registers, on the other hand, like
the integer and floating-point registers, are available to all
simultaneous threads.

John Savard

From anton@[email protected] (Anton Ertl) to comp.arch on Mon May 25 20:32:35 2026

From Newsgroup: comp.arch

MitchAlsup <[email protected]d> writes:

[email protected] (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.

Why do you consider that desirable?

long bar(long x, long y)
{
  return x/2+y/2;
}

...

Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:

gcc -O3 -ftrapv         gcc -O3
lui gp,0x0              srl v0,a0,0x1f
addiu gp,gp,0           srl v1,a1,0x1f
addu gp,gp,t9           addu v0,v0,a0
srl v1,a0,0x1f          addu a1,v1,a1
lw t9,__addvsi3(gp)     sra v0,v0,0x1
srl v0,a1,0x1f          sra a1,a1,0x1
addiu sp,sp,-32         jr ra
addu a0,v1,a0           addu v0,v0,a1
addu a1,v0,a1
sra a0,a0,0x1
sw ra,28(sp)
sw gp,16(sp)
jalr t9
sra a1,a1,0x1
lw ra,28(sp)
jr ra
addiu sp,sp,32

The call costs a lot of overhead.

Architectures without overflow traps are notorious for excess instruction
count when overflow detection is desired or mandated.

MIPS' add traps on overflow. gcc could have emitted almost the same
code for gcc -O3 -ftrapv as for gcc -O3, except that the last
instruction would be an add, not an addu. But apparently nobody gives
a damn about the efficiency of -ftrapv, possibly rightly so.

If some overflow trapping when it can be done without additional
instructions would be preferable over no overflow, gcc would compile
signed adds that survive after optimization into add on MIPS rather
than addu, by default. Given that it does not, the GCC developers
probably found out that it is not preferable. I guess they would get
too many customer complaints, including for "relevant" code, i.e.,
code where the usual "it's UB, so your code is broken" excuse does not
work.

It is much harder than that. For example: does a signed shift left
overflow when significant bits are shifted out ??

-ftrapv specifies trapping on overflow only for additions,
subtractions, and multiplications.
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>

From BGB@[email protected] to comp.arch on Mon May 25 16:34:50 2026

From Newsgroup: comp.arch

On 5/25/2026 9:28 AM, Anton Ertl wrote:

David Brown <[email protected]> writes:

On 24/05/2026 23:39, quadi wrote:

On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.

Most programming environments I have had contact with don't trap on floating-point overflow.

Many just go Inf...

Division by zero is usually handled by going NaN.

Contrast with integer division by zero which does usually trap.

So, detecting something went wrong and you should inform the programmer
is a bad idea ???

The question is if an integer overflow means that something went
wrong. Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).

Integer overflow happens far too often for trapping to be a good solution.

We almost need a separate "integer that should not overflow" type, with
more explicit "do something special if it does" semantics.

Though, more likely to be useful would be a "detect if an overflow had happened" mechanism.

errno_t ovfstate;
__int_no_overflow x, y, z;
...
__start_errsense(&ovfstate);
z=x+y;
__end_errsense(&ovfstate);
if(ovfstate&ERRSENSE_FLAG_OVERFLOW)
...

Which would be awkward, but probably more useful than, say, raising a
signal and/or terminating the program.
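
Something in this spirit can be approximated today with the GCC/Clang overflow builtins. The sketch below is only an illustration: ovf_state and add_sense are invented names, not an existing compiler feature, and the "sticky flag" behaviour is an assumption about what such a mechanism might look like.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical sticky overflow state, loosely modelled on the
   __start_errsense/__end_errsense idea above. */
typedef struct { bool overflowed; } ovf_state;

/* Adds x and y with wraparound and records (but does not trap on) overflow. */
int add_sense(ovf_state *st, int x, int y)
{
    int z;
    if (__builtin_add_overflow(x, y, &z))
        st->overflowed = true;   /* sticky: stays set until the caller clears it */
    return z;                    /* the builtin stores the wrapped result */
}
```

The caller clears the flag, runs a block of arithmetic (for instance add_sense(&st, INT_MAX, 1)), and inspects st.overflowed once at the end instead of handling a signal.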

The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).

This supposedly helpful feature has been neglected by C compiler
developers, and you see in the progression from MIPS (1986) to Alpha
(1992) and then RISC-V (2011) that the hardware architects have
accepted that:

MIPS: add traps on signed overflow, you need to write addu if you
don't want that.

Alpha: add ignores signed overflow, you need to write addv if you want
the trapping.

RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
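
To give a feel for what "pretty involved" means, here is a minimal sketch of signed-overflow detection using only plain integer operations, the kind available on an ISA with neither flags nor a trapping add. The function name is made up, and the exact instruction sequence a compiler emits will differ.

```c
#include <stdint.h>

/* Returns 1 if a + b overflows int64_t, 0 otherwise. */
int add_overflows(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t sum = ua + ub;   /* wrapping add, defined for unsigned types */
    /* Overflow iff both operands have the same sign and the sum's sign
       differs from it: then the top bit of both XORs is set. */
    return (int)(((ua ^ sum) & (ub ^ sum)) >> 63);
}
```

That is an add plus several logical operations and a shift or compare, where MIPS add or Alpha addv need one instruction.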

In practice, given:
We have instructions like ADDW, etc, whose behavior is explicitly to sign-extend the results of 32-bit ADD;
Behavior in practice is often to meticulously follow wrap-on-overflow semantics;
Exceptions to wrap-on-overflow usually exist as edge cases;
Various programs exist that will actively break if wrap-on-overflow is
not the observed behavior in C land;
...

The expectation that 'int' can meaningfully do something other than
wrap on overflow is more of a fantasy.

Or like some other "portability boogeymen":
Non two's complement integer arithmetic;
Big endian machines;
Machines that don't allow unaligned loads and stores;
Types with sizes other than the "usually accepted" set;
...

The argument has often been, "but, 64-bit machines might not provide
native 32-bit arithmetic".

But, often in 64-bit machines, a pattern emerges:
Most ops are full 64-bit;
A subset of instructions have variants that produce sign and/or zero
extended results;
The instructions which produce these results, typically being, the ones
needed to preserve the usual wrap-on-overflow semantics in those places
where something could happen that would produce a deviation from the
expected semantics.

The ones that have zero-extension usually treating signed integers as zero-extended.

The reverse has also been done; treating unsigned as sign-extended, as
in the standard RISC-V ABI, but IMO this is stupid. Even in the absence
of a native zero-extension op (as in plain RV64G), the mess that results
from sign-extending unsigned is worse than the cost of explicit zero extension.

Best case here being to keep values using "native extension":
'int' : Always sign extended;
'unsigned int': Always zero extended.
Then 32-bit types are a strict subset of the 64-bit range, and
up-promotion becomes free. Not sure why some people don't see this as
obvious though. Well, and people keep making the choice of adding
garbage edge cases to RISC-V that would have been entirely unnecessary
if people weren't being stupid about the ABI rules.
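
A rough model of the "native extension" idea in C, with made-up function names and assuming 64-bit registers holding 32-bit values. Each function does a 32-bit wrapping add and then extends the result the way the type would want it kept in a register.

```c
#include <stdint.h>

/* 32-bit wrapping add, result kept sign-extended to 64 bits ('int'). */
int64_t add32_sext(int64_t a, int64_t b)
{
    uint32_t low = (uint32_t)a + (uint32_t)b;   /* wrap modulo 2^32 */
    /* Portable sign extension: flip the sign bit, then subtract 2^31. */
    return (int64_t)(low ^ UINT32_C(0x80000000)) - INT64_C(0x80000000);
}

/* 32-bit wrapping add, result kept zero-extended ('unsigned int'). */
uint64_t add32_zext(uint64_t a, uint64_t b)
{
    return (uint32_t)((uint32_t)a + (uint32_t)b);
}
```

In both cases the result is already a valid 64-bit value of the same numeric value, so promotion to a 64-bit type needs no further instruction.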

But yeah...

But, all this would not be expected to happen unless one accepts that it
is already generally accepted that wrap-on-overflow for 'int' and
similar is the only really practical or viable solution here.

Otherwise, recently:
In my case I decided to live with a "breaking change" in XG3 and to
change some things that may matter later. Then ended up tweaking some
other things on my annoyance list (since I was already breaking existing binaries, better to cluster breakage to a singular event if doing it).

ADD, ADDS.L, and ADDU.L have all been changed from Imm10u/n to Imm10s.
The Imm10u cases are now Imm10s;
The Imm10n sub-case is now dropped/reserved.
May be reused later.
This reclaims 3 out of the 20 Imm10 spots.
Was mostly a case of it being harder to justify the encoding space.
Old behavior will need to remain for XG1 and XG2.
In this case, XG3 will explicitly deviate from XG1 and XG2 here.
Does mean that XG3 now has less ADD/SUB Imm range than XG2, but...
Only goes from 97.1% hit rate to 95.9%,
no significant effect on overall code density.
Could use the RV Imm12 ops (ADDI / ADDIW), but:
Hit rate for the RV ops here is negligible;
Much of these also happen to miss on one or both registers.

The MULS.L and MULU.L ops were also switched to Imm10s.
This means all of the Imm10 ALU ops are now unified on Imm10s.

Relocated TST and TSTN from the F0-8 block (with the XMOV instructions)
to the F0-9 block (with the other CMPxx 3R ops).

A few very rarely used instructions were demoted from 32-bit to 64-bit encodings.

Have experimentally added some 32-bit:
Bcc Rm, Imm6s, (PC, Disp6s)
instructions, where:
Imm6s: Hits ~ 80% of these cases;
Disp6s: Hits ~ 60% of these cases;
Imm5s + Disp7s would hit slightly better, but,
would have needed more new decoder logic...
Resulting in it hitting about half over the:
Bcc Rm, Imm17s, (PC, Disp10s)
Cases, for an overall code-density improvement of ~ 0.5%, ...
Dominant use-case: Final compare-and-branch in a short "for()" loop.
Secondary use-case: Short non-predicated "if()" branches.
But, is out-weighed by said predicated "if()" branches.
Would likely see more use here if not using predication.
If it would have hit for 100% of these, would have saved ~ 1%.

This is debatable.

This reused the encoding spots previously used for the Load-Disp5us ops,
which still exist for XG1 and XG2 (decoder special-case handling), but
were N/A in XG3 (they would be in effect entirely redundant with the
Disp10s forms in XG3; but had non-redundant edge-cases in XG1 and XG2).

Like with the Imm17s+Disp10s ops, these will still depend on the IMMB extension, as they still need the same basic mechanism.

Was a fairly low-priority feature, in any case.

Seemingly running low on obvious optimization paths.

- anton


From MitchAlsup@[email protected] to comp.arch on Mon May 25 22:49:58 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Mon, 25 May 2026 10:23:00 +0200, David Brown wrote:

The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages.

Yes. And I am used to FORTRAN, which did not trap on integer overflows.

WATfor and WATfive trapped on integer overflows.

John Savard


From MitchAlsup@[email protected] to comp.arch on Mon May 25 22:51:42 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Mon, 25 May 2026 19:20:01 +0000, MitchAlsup wrote:

[email protected] (Anton Ertl) posted:

David Brown <[email protected]> writes:

On 25/05/2026 16:28, Anton Ertl wrote:

Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers
have avoided making -ftrapv the default, even on platforms like MIPS
and Alpha where the implementation of -ftrapv just means to use
different instructions (e.g., add instead of addu on MIPS, and addv
instead of add on Alpha).

Both architectures got this one wrong--IMO--and so does RISC-V.

You may not have been replying to what Anton Ertl wrote above, since there was a lot in between that I snipped. But it does mention two architectures that took an approach to trapping on integer overflow... that I also tend
to disagree with.

What I'm used to is the System/360. While it made the mistake of having
two condition code bits instead of NZVC, the idea of having "trap on overflow" controlled by a bit in the PSW is... what I assumed to be normal and correct.

And what My 66000 does....

I purport that ANY Industrial quality ISA should provide a means to
trap on integer overflow.

I could be wrong, as I haven't examined that approach critically and given full consideration to the alternatives.

John Savard


From MitchAlsup@[email protected] to comp.arch on Mon May 25 22:59:10 2026

From Newsgroup: comp.arch

Thomas Koenig <[email protected]> posted:

David Brown <[email protected]> schrieb:

On 24/05/2026 23:39, quadi wrote:

On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.

So, detecting something went wrong and you should inform the programmer
is a bad idea ???

No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.

John Savard

That does not make sense. Code such as random number generators should
be written so that they are correct in the language they are written in.

In principle, yes.

Principle is better in theory than in practice.

In practice, people often used whatever "worked" on their systems.

Face it, the poor slug writing the code may not have the faintest
grasp of the system qualities we are discussing, and does not care
to learn as long as he can slug through the writing and his program
does not blow up catastrophically while it is under his purview.

That defines a lot of what is wrong with SW programming today.

Implementors have a certain right because they control what their
compiler does or does not do.

You would be surprised at how little influence implementors have
on compilers and other software.

But users did so, as well, with
Numerical Recipes a(n in)famous example.

And yes, this bites people. You can see this at https://gcc.gnu.org/gcc-13/porting_to.html :

# GCC 13 includes new optimizations which may change behavior
# on integer overflow. Traditional code, like linear congruential
# pseudo-random number generators in old programs and relying on
# a specific, non-standard behavior may now generate unexpected
# results. The option -fsanitize=undefined can be used to detect
# such code at runtime.

My VAX favorite was:

for( int i = 1; i; i+=i )

Traps instead of exiting the loop normally.
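
For contrast, a sketch of the same doubling loop with an unsigned counter, where running off the top is defined to wrap to zero. The function name is made up and the counter is fixed at 32 bits for the sake of the example.

```c
#include <stdint.h>

/* Counts iterations of the doubling loop. With uint32_t, i takes the
   values 1, 2, 4, ..., 2^31 and then wraps to 0, which ends the loop. */
unsigned count_doublings(void)
{
    unsigned n = 0;
    for (uint32_t i = 1; i; i += i)
        n++;
    return n;
}
```

This version exits after 32 iterations on any conforming implementation, with no trap and no undefined behaviour.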

# It is recommended to use the intrinsic subroutine RANDOM_NUMBER for
# random number generators or, if the old behavior is desired, to use
# the -fwrapv option. Note that this option can impact performance.

If that is C, signed integer overflow is UB while unsigned integers
have wrapping behaviour - thus if your code depends on wrapping, and it
is written in C, it needs to use unsigned types or compiler-specific extensions, flags, etc. (Or C23 ckd_add and other checked arithmetic functions.)

If it is written in Zig, you need to use the specific modulo arithmetic functions even for unsigned arithmetic. If it is written in Java,
signed integer arithmetic is fine.

It all depends on the language and/or any options the language and tools might support - and code should be written to work correctly according
to the language rules.

Fortran has no standard way of implementing this unless you
restrict yourself to sizes which do not overflow a signed integer.

Old FORTRAN had no unSigned integer type and no way to avoid overflows.

Implementing LCGRNGs was one reason why I pushed for unsigned
arithmetic (modulo 2**n) in Fortran. The attempt failed (not
taken up by WG5 after being endorsed by J3), but I implemented it
for gfortran anyway.

The hardware, of course, cannot always enable trapping on overflow if it is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).

Sanitizers are also fairly good now, but of course cost performance.


From MitchAlsup@[email protected] to comp.arch on Mon May 25 23:00:32 2026

From Newsgroup: comp.arch

[email protected] (Anton Ertl) posted:

MitchAlsup <[email protected]d> writes:

[email protected] (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.

Why do you consider that desirable?

So you can debug production/released code to find subtle errors.

long bar(long x, long y)
{
  return x/2+y/2;
}

...

Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:

gcc -O3 -ftrapv         gcc -O3
lui gp,0x0              srl v0,a0,0x1f
addiu gp,gp,0           srl v1,a1,0x1f
addu gp,gp,t9           addu v0,v0,a0
srl v1,a0,0x1f          addu a1,v1,a1
lw t9,__addvsi3(gp)     sra v0,v0,0x1
srl v0,a1,0x1f          sra a1,a1,0x1
addiu sp,sp,-32         jr ra
addu a0,v1,a0           addu v0,v0,a1
addu a1,v0,a1
sra a0,a0,0x1
sw ra,28(sp)
sw gp,16(sp)
jalr t9
sra a1,a1,0x1
lw ra,28(sp)
jr ra
addiu sp,sp,32

The call costs a lot of overhead.

Architectures without overflow traps are notorious for excess instruction
count when overflow detection is desired or mandated.

MIPS' add traps on overflow. gcc could have emitted almost the same
code for gcc -O3 -ftrapv as for gcc -O3, except that the last
instruction would be an add, not an addu. But apparently nobody gives
a damn about the efficiency of -ftrapv, possibly rightly so.

If some overflow trapping when it can be done without additional
instructions would be preferable over no overflow, gcc would compile
signed adds that survive after optimization into add on MIPS rather
than addu, by default. Given that it does not, the GCC developers
probably found out that it is not preferable. I guess they would get
too many customer complaints, including for "relevant" code, i.e.,
code where the usual "it's UB, so your code is broken" excuse does not
work.

It is much harder than that. For example: does a signed shift left
overflow when significant bits are shifted out ??

-ftrapv specifies trapping on overflow only for additions,
subtractions, and multiplications.


From MitchAlsup@[email protected] to comp.arch on Mon May 25 23:03:03 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Mon, 25 May 2026 16:45:07 +0000, MitchAlsup wrote:

My 66000 has an instruction bit that denotes the signedness of integer calculations {Signed, unSigned}. This bit is available as another OpCode bit for non-integer calculation instructions.

That's nice. It's not an option I can consider, as having lots of
orthogonal modifiers on instructions would tend to increase their length.

And harm instruction Entropy.

A major goal of the Concertina II, III, and IV architectures is for instructions not to be longer than similar instructions on the Motorola 68020 or the IBM System/360 if at all possible.

Basically, the selling point is... "Your programs only get 10% bigger, if that, and yet you have 32 registers, so they run faster!".

Mine are getting 30% smaller and needing fewer instructions at the same
time

Or they _would_, if the design didn't have so many extra transistors for supporting both IBM-format and Intel-format Decimal Floating Point, old-style IBM floats, simple floating (You too can work with numbers that go around the world 2 1/2 times!), packed decimal, mixed-radix arithmetic...

But, hey, supporting these things in hardware is faster than doing them in software!

And are people even going to _read_ the part of the manual that
explains... as is noted in the description of the original Concertina architecture...

This chip has 8-way simultaneous multi-threading, but only for programs which do not make use of extensions to the register set.

Another One Bites the Dust.....

Only two programs per core may use the extended register banks with 128 elements.

Only one program per core may use the vector registers for long vector instructions. The 256-bit short vector registers, on the other hand, like the integer and floating-point registers, are available to all
simultaneous threads.

John Savard


From MitchAlsup@[email protected] to comp.arch on Mon May 25 23:05:06 2026

From Newsgroup: comp.arch

BGB <[email protected]> posted:

On 5/25/2026 9:28 AM, Anton Ertl wrote:

--------------

Integer overflow happens far too often for trapping to be a good solution.

Even on 64-bit variables/machines ??

From BGB@[email protected] to comp.arch on Mon May 25 20:02:52 2026

From Newsgroup: comp.arch

On 5/25/2026 3:34 PM, quadi wrote:

On Mon, 25 May 2026 16:49:59 +0000, MitchAlsup wrote:

[email protected] (Anton Ertl) posted:

RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).

The worst of all possible semantic encodings

Although I thought that making trapping on fixed-point overflow the
default is a bad idea, I agree that making it impossible to do so, or even test for fixed-point overflow, is a much worse idea.

Possibly true.

The lack of things like ADD-with-Carry or ADD-with-Overflow are
annoyance points on RISC-V.

Though, it is less obvious what a useful behavior is at the language level:
"signal()" ? ...
Something like try/catch (mostly N/A to C)?
Something similar to FENV_ACCESS?
...

Well, and that if trapping were applied globally:
Overhead due to trap detection/handling code causing excessive bloat;
Overflow traps from any code that naively assumes wrap-on-overflow
semantics;
...

In some codebases, it is already enough of a pain to hunt and fix all
the out-of-bounds and uninitialized variables mess.
Signed integer overflows would likely "turn it up to 11";
Then, how does one fix it? Ask that people start adding a bunch of casts
to make it work?...

One might say:
Add "if()" cases to deal with the overflows, but, ... this only makes
sense for cases where the overflows are not the expected behavior.

Then again, could maybe classify code, say:
1, signed, value doesn't (or shouldn't) go out-of-range;
2, unsigned, value doesn't (or shouldn't) go out-of-range;
3, signed, value is expected to be modulo;
4, unsigned, value is expected to be modulo.

"nasal demons" types assume 1 and 4 as dominant.
Or, 1 as exclusive vs 3.

For compilers, we often need to assume 3 and 4.
Because, failure to uphold 3 results in misbehaving programs.
And, if 3 were uncommon, RISC-V's "ADDW"/etc would be pure stupidity.
Instead:
Something like plain ADD plus ADDWU would have made sense.
But, they dropped ADDWU instead (also stupid IMO).

While, granted, a lot of 1 code likely exists, 3 code tends to generate
the vast majority of overflows; and if there is any reasonable
expectation for 'int' to overflow, and it is not desired for int to
overflow.

We mostly ignore 2 vs 4, because standard specifies 4 making 2 to be
purely a programming error, in which case "2" becomes "should have used
a bigger signed type instead".

Then again, could maybe make sense to add a semantic distinction, say:
"int" (plain):
Maybe a case could be made that overflow be assumed unexpected.
"signed int":
Maybe make separate from plain case, explicitly modulo;
So, could be made distinct;
Explicitly like the "unsigned" case in being modulo.
"unsigned int":
Remains the same, no real controversy here.

Or, say:
char, short, int, long, long long:
For code, assume that overflow may be unexpected / undesirable;
signed char, signed int, signed long, signed long long:
Assume signed modulo;
Compiler should, ideally, always produce wrap-on-overflow semantics.
unsigned ...:
Unsigned modulo.

For a compiler, then:
-ftrapv:
May ideally trap on lack of "signed";
Explicit "signed", continues to wrap.
-fwrapv:
Both default and signed will wrap.
Neither:
Dunno, probably better for compiler to assume "-fwrapv" semantics;
Maybe assume UB opts are safe if no "signed".
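
Whatever the flag setting, code that really wants modulo behavior can ask for it explicitly by doing the arithmetic in the unsigned type. A small sketch, assuming a typical platform where `int` is 32 bits; `wrap_add_i32` is an illustrative name, not an existing API.

```c
#include <stdint.h>

/* Illustrative helper: 32-bit signed add with explicit modulo-2^32
   semantics, independent of -fwrapv / -ftrapv. */
static int32_t wrap_add_i32(int32_t a, int32_t b)
{
    uint32_t r = (uint32_t)a + (uint32_t)b;   /* unsigned add wraps by definition */
    /* Map the bit pattern back to int32_t without relying on an
       implementation-defined out-of-range conversion. */
    if (r <= (uint32_t)INT32_MAX)
        return (int32_t)r;
    return (int32_t)(r - 0x80000000u) + INT32_MIN;
}
```

A compiler would normally be expected to reduce this to a single 32-bit add, though that is an expectation rather than a guarantee.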

Well, and for the programmer POV:
If assuming maximum portability:
Only unsigned overflow wrapping is "safe".
If assuming "any reasonable system":
Both will wrap in most cases;
Absent "-fwrapv", UB opts may occur in certain obscure edge cases.
Though usually in the form of "early" vs "late" type promotion;
In most cases, where it does occur, early promotion is benign.
Vs whatever "nasal demons" people may assert.
What else, that it late promotes?
(as "-fwrapv" semantics would dictate...)

Like, say:
int x;
long z;
...
z = 42 - x;
//Oh no! UB opt has turned this into a 64-bit RSUB instruction!

Yeah...
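
For what it is worth, the two interpretations only disagree when "42 - x" leaves the 32-bit range. A sketch that models both in well-defined C (the function names are invented here; the 32-bit wraparound is emulated with `uint32_t`):

```c
#include <stdint.h>

/* Late promotion: subtract in 32 bits with wraparound, then widen.
   This models what -fwrapv semantics would give for "z = 42 - x". */
static int64_t rsub42_late(int32_t x)
{
    uint32_t r = 42u - (uint32_t)x;           /* wraps modulo 2^32 */
    int64_t wide = (int64_t)r;
    /* sign-extend the 32-bit pattern by hand */
    return (r > (uint32_t)INT32_MAX) ? wide - 4294967296LL : wide;
}

/* Early promotion: widen first, then subtract in 64 bits. */
static int64_t rsub42_early(int32_t x)
{
    return 42 - (int64_t)x;
}
```

For x = INT32_MIN the early version yields 2147483690 while the late version yields -2147483606; for any x where the 32-bit result is in range the two agree.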

Granted, ATM, for BGBCC, it wouldn't make much difference. It could
maybe make sense to add a distinction either to strengthen semantic
analysis, or if I decided to change away from my existing "assume wrap
on overflow semantics as sole option" policy. Or maybe add an
"-fno-wrapv" option, with "wrapv" remaining the default but allowing an
opt-out, sort of like how there is an "-fptropts" option to "opt into"
strict-aliasing / TBAA semantics, vs the default of "assume every
explicit store may alias". Though, it may still assume that loads may be
cached and reordered, unless "volatile" is used, which explicitly
disallows caching and reordering loads; at present this is a little
"shotgun" and will basically disable caching throughout the whole basic
block, which works as a detractor to the "casually use volatile as a way
to dispel TBAA" interpretation (works on GCC, and is less adverse for
performance than the "use memcpy" option on some other compilers, ...).

Or, say:
Bare pointer cast and deref:
GCC: averse (falls afoul of default semantics);
MSVC: benign;
BGBCC: benign.
Volatile pointer cast and deref:
GCC: benign (doesn't use TBAA on volatile pointers);
MSVC: benign;
BGBCC: detrimental, disables caching and ld/st reordering;
Using memcpy:
GCC: benign;
MSVC:
Old (15+ years):
Averse (actually calls memcpy, significant impact);
Some intermediate versions would do an inline for "REP MOVSB".
Also kinda crap, but less bad vs calling "memcpy()".
Mostly only matters if still targeting WinXP or similar.
Newer: Mild detriment in some cases.
Inline loads/stores
may fail to optimize to plain register moves for locals.
BGBCC:
Mostly similar to newer MSVC here;
Works, just less efficient than plain "cast and deref".
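
For reference, the "using memcpy" variant looks like this in C. A minimal sketch, assuming `float` is a 32-bit IEEE binary32 value; `float_bits` is just an illustrative name.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit float");

/* Reads the bit pattern of a float without violating strict aliasing.
   The "bare pointer cast" equivalent would be *(uint32_t *)&f. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* fixed-size copy; whether it becomes a plain
                                   register move depends on the compiler */
    return u;
}
```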

...


From Stefan Monnier@[email protected] to comp.arch on Mon May 25 15:27:29 2026

From Newsgroup: comp.arch

An awkward thing about using trap on overflow is determining how
precisely it is defined.

Indeed, this is a nasty part of language design.

[ IMO, the only sane choice (beside wrapping and explicit `ckd_add`) is
to treat overflow not as an exception (in the sense of `try..catch`
thingies, not in the CPU hardware sense of the word) but as an
execution error comparable to memory exhaustion. ]
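
A sketch of that "execution error" treatment in C, assuming GCC or Clang: `__builtin_add_overflow` is their builtin, and C23's `ckd_add` from `<stdckdint.h>` would be the standard spelling where available. The name `add_or_die` is made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Adds two longs; an overflow is handled like memory exhaustion:
   report it and terminate, with no attempt at recovery. */
static long add_or_die(long a, long b)
{
    long r;
    if (__builtin_add_overflow(a, b, &r)) {   /* true when the sum does not fit */
        fputs("fatal: integer overflow\n", stderr);
        abort();
    }
    return r;
}
```

The point of the design is that callers get no error path to handle or forget, much like a typical `xmalloc` wrapper.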

Luckily, for `comp.arch` the same problem doesn't plague ISAs because
it's accepted that a CPU should stick religiously to the literal
semantics of the machine code, no matter how far it is from what
really happens inside the machine.

=== Stefan

From Thomas Koenig@[email protected] to comp.arch on Tue May 26 05:39:02 2026

From Newsgroup: comp.arch

quadi <[email protected]d> schrieb:

On Mon, 25 May 2026 10:23:00 +0200, David Brown wrote:

The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages.

Yes. And I am used to FORTRAN, which did not trap on integer overflows.

Incorrect.

Integer overflow is illegal in Fortran, so what the compiler then
does is not determined (see my post on random number generators).

Example:

$ cat overfl.f90
program main
integer :: a, b
a = 12345678
b = 2345678
print *,a*b
end program main
$ gfortran -fsanitize=undefined overfl.f90
$ ./a.out
overfl.f90:5:13: runtime error: signed integer overflow: 12345678 * 2345678 cannot be represented in type 'integer(kind=4)'
-1979197244
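
The value printed after the diagnostic is the exact product reduced modulo 2^32. A quick cross-check, written in C rather than Fortran and assuming 32-bit two's-complement integers (the function name is illustrative only):

```c
#include <stdint.h>

/* Returns the low 32 bits of a*b, reinterpreted as a signed value. */
static int64_t wrapped_product_i32(int32_t a, int32_t b)
{
    int64_t exact = (int64_t)a * (int64_t)b;          /* fits easily in 64 bits */
    uint32_t low = (uint32_t)((uint64_t)exact & 0xffffffffu);
    return (low > (uint32_t)INT32_MAX) ? (int64_t)low - 4294967296LL
                                       : (int64_t)low;
}
```

The exact product is 28958985279684; its low 32 bits are 2315770052, which read as a signed 32-bit value is -1979197244, matching the output above.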
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From David Brown@[email protected] to comp.arch on Tue May 26 08:18:17 2026

From Newsgroup: comp.arch

On 25/05/2026 18:43, Anton Ertl wrote:

David Brown <[email protected]> writes:

On 25/05/2026 16:28, Anton Ertl wrote:

Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different
instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).

An awkward thing about using trap on overflow is determining how
precisely it is defined. Supposing you have the expression "a + b - a".
Perhaps "a + b" overflows. I would hope that when using debug-related
compiler flags such as "-fsanitize=signed-integer-overflow", a compiler
would check for overflow on "a + b", and report it at runtime.
(Unfortunately, gcc does not do that unless the partial expression is
assigned to a variable.) But in "normal" usage, I'd expect the
expression to be simplified, resulting in just "b" and no overflow.

OTOH, cases like a+b+c where the result is in range, while an
intermediate result is out of range are one of the reasons why I
prefer -fwrapv over -ftrapv. As for your preference of nasal demons,
given enough information, the compiler might "optimize" "a+b-a" into,
e.g., 0.

Anyway, the definition of -ftrapv is not very precise; for gcc-12.2:

|'-ftrapv'
| This option generates traps for signed overflow on addition,
| subtraction, multiplication operations.

My understanding is that the GCC developers would rather deprecate
-ftrapv entirely, and encourage the use of -fsanitize instead as a way
to detect run-time errors. I don't know the details of the internals,
but I believe the GCC developers see the sanitize options as more
accurate and more likely to be further developed in the future.

As for what gcc-12.2 does for your example on AMD64:

long foo(long a, long b)
{
return a+b-a;
}

is compiled with gcc -O3 -ftrapv to:

0: 48 89 f0 mov %rsi,%rax
3: c3 ret

If "trap on overflow" has precise semantics in the code, then this
disables a range of useful optimisations and re-arrangements. If it is
just "use trapping arithmetic instructions", then it will miss many
possible cases of actual overflow in the code, which we might want to
catch.

Which would you prefer by default?

I don't know for sure. A "by default" choice has to be suitable for a
wide variety of users and a wide variety of cases, and preferably err on
the side of caution. For my own personal use, I'm happy with UB
overflow and would have preferred that as the default even for unsigned arithmetic (but of course with a way to specify wrapping when I need
it). But that's for /my/ use - I don't think that should necessarily be
the default for others. Let those who are willing to spend the time and effort learning the details and the care needed use compiler flags to
get the highest efficiency from their code, and let the defaults help
others catch their bugs. However, the logical endpoint of that is that
C should only be used by those that have a detailed understanding of the language and need it for peak efficiency, while other programmers should
work with other languages that have more error handling.

The gcc developers apparently took the latter approach, even when you
ask for -ftrapv explicitly. So what, IYO, speaks against doing that
by default on machines like MIPS and Alpha?

And "trap on overflow" might either trigger when there is no
overflow in the original code, or hinder optimisations. (Consider the
expression "x / 2 + y / 2" - the compiler could implement that as a
combined "(x + y) / 2", but that might introduce overflow.)

x/2+y/2 produces a different result from (x+y)/2 when both x and y are
odd integers.

True. Can we pretend that is not the case, and still see my point? The
point is that the compiler can, during re-arrangements, introduce new overflows as long as it knows the final results are correct (since the compiler knows the details of how instructions are actually implemented).
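
To make the odd-operand case concrete, here is a tiny C sketch (function names invented for the example) of the two forms:

```c
/* Halve-then-add: each division truncates toward zero separately. */
static long half_sum_separate(long x, long y) { return x / 2 + y / 2; }

/* Add-then-halve: a single truncation, but x + y itself may overflow. */
static long half_sum_combined(long x, long y) { return (x + y) / 2; }
```

With x = 3 and y = 5 the first gives 1 + 2 = 3 and the second gives 8 / 2 = 4, so a compiler could only make this substitution where it can prove the results agree.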

gcc-12.2 compiles

long bar(long x, long y)
{
return x/2+y/2;
}

on AMD64 to:

gcc -O3 -ftrapv              gcc -O3
mov    %rdi,%rax             mov    %rdi,%rax
sub    $0x8,%rsp             mov    %rsi,%rdx
shr    $0x3f,%rax            shr    $0x3f,%rax
add    %rax,%rdi             shr    $0x3f,%rdx
mov    %rsi,%rax             add    %rdi,%rax
shr    $0x3f,%rax            add    %rsi,%rdx
sar    %rdi                  sar    %rax
add    %rax,%rsi             sar    %rdx
sar    %rsi                  add    %rdx,%rax
call   __addvdi3@PLT         ret
add    $0x8,%rsp
ret

so the -ftrapv introduces an additional mov and a call; I would have
expected that the + would be compiled to an ADD instruction followed
by a JO instruction.

Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:

gcc -O3 -ftrapv              gcc -O3
lui    gp,0x0                srl    v0,a0,0x1f
addiu  gp,gp,0               srl    v1,a1,0x1f
addu   gp,gp,t9              addu   v0,v0,a0
srl    v1,a0,0x1f            addu   a1,v1,a1
lw     t9,__addvsi3(gp)      sra    v0,v0,0x1
srl    v0,a1,0x1f            sra    a1,a1,0x1
addiu  sp,sp,-32             jr     ra
addu   a0,v1,a0              addu   v0,v0,a1
addu   a1,v0,a1
sra    a0,a0,0x1
sw     ra,28(sp)
sw     gp,16(sp)
jalr   t9
sra    a1,a1,0x1
lw     ra,28(sp)
jr     ra
addiu  sp,sp,32

The call costs a lot of overhead.

Agreed. I don't know why GCC uses a function call here. In my quick
godbolt testing, clang uses the "add, jump-on-overflow" sequence.

Using

-fsanitize=signed-integer-overflow -fsanitize-trap

gives an add followed by a jump-on-overflow sequence.

It is not easy to see how a tool can avoid false positives and false
negatives and also conveniently optimise and re-arrange code.

It can't. But it does not try to avoid false negatives even when
explicitly asked for trapping on overflow.

If some overflow trapping when it can be done without additional
instructions would be preferable over no overflow, gcc would compile
signed adds that survive after optimization into add on MIPS rather
than addu, by default. Given that it does not, the GCC developers
probably found out that it is not preferable. I guess they would get
too many customer complaints, including for "relevant" code, i.e.,
code where the usual "it's UB, so your code is broken" excuse does not
work.

If "-ftrapv" is to have any use at all, then overflow is no longer UB -
it has to be defined to trap. But I have to conclude that in GCC,
-ftrapv is too vaguely defined and too inconsistently and inefficiently implemented to be of any use. This matches my understanding that the "-fsanitize=signed-integer-overflow -fsanitize-trap" flags are preferred
by the GCC developers.

The fact that they don't even try to make -ftrapv produce efficient
code indicates that there is no "relevant" interest in efficient
-ftrapv. It would be interesting to know who came up with the idea of
adding -ftrapv, and why they are still keeping it.

Compilers have not always been good at taking advantage of all the
features provided by hardware

GCC is pretty good at implementing -fwrapv. For the two examples
above, "gcc -O3 -fwrapv" produces the same code on AMD64 and MIPS as
"gcc -O3".

That is my experience too (though I expect your experience here vastly outweighs mine).

nor have languages been good at exposing
the possibilities in the language so that programmers can take advantage
of them.

Yes. But I leave that for another day.

Good idea :-)


From David Brown@[email protected] to comp.arch on Tue May 26 08:27:28 2026

From Newsgroup: comp.arch

On 26/05/2026 01:00, MitchAlsup wrote:

[email protected] (Anton Ertl) posted:

MitchAlsup <[email protected]d> writes:

[email protected] (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.

Why do you consider that desirable?

So you can debug production/released code to find subtle errors.

I think that when an unexpected error is detected (whether it is with
hardware acceleration, like trap on overflow, or via explicit generated
code), the way to handle it depends strongly on the situation. If a
debugger is present, then it is most helpful to lead to a debugger break
so that the developer can figure out what went wrong. When not
debugging, there is no sensible default handling that works for jet
engine controllers and video game frame generators.

But I do support the aim of having the same generated code when
debugging and when shipping - I am not a fan of "release" builds and
"debug" builds. (Of course you might temporarily do builds with
different flags while chasing down a particular bug.)
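
One way to picture "same generated code, different handling" is a hook that is consulted only when an overflow is detected. This is a sketch under assumptions, not a description of any existing runtime: the hook, the counter and the function names are all hypothetical, and `__builtin_add_overflow` is the GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: NULL means "shipping, just wrap"; a debugger or test
   harness can install a handler at run time without recompiling. */
typedef void (*overflow_hook)(const char *where);
static overflow_hook on_overflow = NULL;

/* Example handler that only counts; a real one might break into a debugger. */
static int overflow_count = 0;
static void count_overflow(const char *where) { (void)where; overflow_count++; }

static int64_t add_checked(int64_t a, int64_t b, const char *where)
{
    int64_t r;
    /* The builtin stores the wrapped sum in r and returns true on overflow. */
    if (__builtin_add_overflow(a, b, &r) && on_overflow != NULL)
        on_overflow(where);
    return r;
}
```

Setting `on_overflow = count_overflow;` switches detection on for the same binary; leaving it NULL gives plain wraparound.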


From quadi@[email protected] to comp.arch on Tue May 26 15:13:31 2026

From Newsgroup: comp.arch

On Sun, 24 May 2026 16:39:25 +0000, quadi wrote:

On Sun, 24 May 2026 15:24:22 +0000, John Levine wrote:

Sure they did. S/360 had separate unsigned versions of add and subtract
instructions. The results were the same but the condition codes were
different and the unsigned versions couldn't overflow.

Ah, I didn't remember that!

I just looked it up. It was, and is, the Add Logical instruction.
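
The "same result, different condition" idea can be modelled in C. This is only a sketch of the principle with invented names; it does not reproduce the actual System/360 condition-code encoding.

```c
#include <stdbool.h>
#include <stdint.h>

struct add_flags {
    uint32_t bits;         /* identical for the signed and the logical add */
    bool signed_overflow;  /* what a signed add would report */
    bool carry_out;        /* what an unsigned (logical) add would report */
};

static struct add_flags add32(uint32_t a, uint32_t b)
{
    struct add_flags f;
    f.bits = a + b;                                  /* wraps modulo 2^32 */
    f.carry_out = f.bits < a;                        /* unsigned wraparound */
    f.signed_overflow = (((a ^ f.bits) & (b ^ f.bits)) >> 31) != 0;
    return f;
}
```

For 0x7fffffff + 1 the signed interpretation overflows with no carry out; for 0xffffffff + 1 there is a carry out but no signed overflow.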

John Savard

From MitchAlsup@[email protected] to comp.arch on Tue May 26 18:02:51 2026

From Newsgroup: comp.arch

BGB <[email protected]> posted:

On 5/25/2026 3:34 PM, quadi wrote:

On Mon, 25 May 2026 16:49:59 +0000, MitchAlsup wrote:

[email protected] (Anton Ertl) posted:

RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).

The worst of all possible semantic encodings

Although I thought that making trapping on fixed-point overflow the
default is a bad idea, I agree that making it impossible to do so, or even test for fixed-point overflow, is a much worse idea.

Possibly true.

The lack of things like ADD-with-Carry or ADD-with-Overflow are
annoyance points on RISC-V.

Though, it is less obvious what a useful behavior is at the language level:
"signal()" ? ...
Something like try/catch (mostly N/A to C)?
Something similar to FENV_ACCESS?
...

The important property is that overflow is detected precisely.
Whether {trap, signal, throw} is performed is an environmental choice
not an ISA choice.

Well, and that if trapping were applied globally:
Overhead due to trap detection/handling code causing excessive bloat;
Overflow traps from any code that naively assumes wrap-on-overflow
semantics;
...

In some codebases, it is already enough of a pain to hunt and fix all
the out-of-bounds and uninitialized variables mess.
Signed integer overflows would likely "turn it up to 11";
Then, how does one fix it? Ask that people start adding a bunch of casts
to make it work?...

One might say:
Add "if()" cases to deal with the overflows, but, ... this only makes
sense for cases where the overflows are not the expected behavior.

If(overflow(??)) requires some flag to carry overflow from point of
detection to if(()).

And what happens if there is more than 1 overflow ??

Then again, could maybe classify code, say:
1, signed, value doesn't (or shouldn't) go out-of-range;
2, unsigned, value doesn't (or shouldn't) go out-of-range;
3, signed, value is expected to be modulo;
4, unsigned, value is expected to be modulo.

5, a language hint about in-range, wrap, trap, signal, throw

"nasal demons" types assume 1 and 4 as dominant.
Or, 1 as exclusive vs 3.

For compilers, we often need to assume 3 and 4.
Because, failure to uphold 3 results in misbehaving programs.
And, if 3 were uncommon, RISC-V's "ADDW"/etc would be pure stupidity.

You would prefer::

AND R7,Rleft,#~(~0<<31)
AND R8,Rright,#~(~0<<31)
ADD Rd,R7,R8
AND Rd,Rd,#~(~0<<31)

That is ADDW range limits operands and performs a shorter ADD.
Matching C's int a,b; semantic. In general the integer instructions
ending with W apply C's int properties to the arithmetic. If compilers
were (WERE) really good at range determination those instructions would
be unnecessary--but they are not.
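
For readers who do not have the RISC-V definition in their heads, a C model of what RV64 ADDW computes: add the low 32 bits and sign-extend the 32-bit result. The function name is illustrative.

```c
#include <stdint.h>

/* Model of RV64 ADDW: add the low 32 bits of both registers, ignore
   overflow, and sign-extend the 32-bit result to 64 bits. */
static int64_t addw_model(int64_t rs1, int64_t rs2)
{
    uint32_t sum = (uint32_t)rs1 + (uint32_t)rs2;    /* wraps modulo 2^32 */
    return (sum > (uint32_t)INT32_MAX) ? (int64_t)sum - 4294967296LL
                                       : (int64_t)sum;
}
```

The upper halves of the inputs are ignored entirely, which is why no separate range-limiting of the operands is needed first.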

I (My 66000) had to put in sized integer calculation reasons, and by
doing so, gained 2%-4% in code density and a bit more in latency.
-----------------------

From BGB@[email protected] to comp.arch on Tue May 26 14:28:56 2026

From Newsgroup: comp.arch

On 5/26/2026 1:02 PM, MitchAlsup wrote:

BGB <[email protected]> posted:

On 5/25/2026 3:34 PM, quadi wrote:

On Mon, 25 May 2026 16:49:59 +0000, MitchAlsup wrote:

[email protected] (Anton Ertl) posted:

RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).

The worst of all possible semantic encodings

Although I thought that making trapping on fixed-point overflow the
default is a bad idea, I agree that making it impossible to do so, or even test for fixed-point overflow, is a much worse idea.

Possibly true.

The lack of things like ADD-with-Carry or ADD-with-Overflow are
annoyance points on RISC-V.

Though, it is less obvious what a useful behavior is at the language level:
"signal()" ? ...
Something like try/catch (mostly N/A to C)?
Something similar to FENV_ACCESS?
...

The important property is that overflow is detected precisely.
Whether {trap, signal, throw} is performed is an environmental choice
not an ISA choice.

Yeah.

Say:
ADDV Rs, Rt, Rd
BT __trap_overflow

Which is how I would assume doing it, if I were to re-add ADDV to my ISA
(this had existed in SuperH and BJX1, but got lost along the way, but
could re-add if needed; just it was less often needed than even ADC/ADDC).

Well, and that if trapping were applied globally:
Overhead due to trap detection/handling code causing excessive bloat;
Overflow traps from any code that naively assumes wrap-on-overflow
semantics;
...

In some codebases, it is already enough of a pain to hunt and fix all
the out-of-bounds and uninitialized variables mess.
Signed integer overflows would likely "turn it up to 11";
Then, how does one fix it? Ask that people start adding a bunch of casts
to make it work?...

One might say:
Add "if()" cases to deal with the overflows, but, ... this only makes
sense for cases where the overflows are not the expected behavior.

If(overflow(??)) requires some flag to carry overflow from point of
detection to if(()).

And what happens if there is more than 1 overflow ??

Dunno.
You would need to set a start point and an end/detection point, and have
some way for the compiler to know to track overflows.

Say:
ADDV ...
OR?T Re, 0x100, Re

Then a way to feed Re back into C land to act upon.
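
At the C level this could look like a sticky flag between a start point and a detection point, loosely in the spirit of the floating-point exception flags. A sketch only: the names are hypothetical and `__builtin_add_overflow` is the GCC/Clang builtin.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sticky flag, playing the role of "Re": the arithmetic
   helper only ever sets it, it never clears it. */
static bool overflow_seen = false;

static int64_t add_sticky(int64_t a, int64_t b)
{
    int64_t r;
    if (__builtin_add_overflow(a, b, &r))  /* r receives the wrapped sum */
        overflow_seen = true;
    return r;
}

/* Sums n values; returns false if any intermediate add overflowed. */
static bool sum_checked(const int64_t *v, int n, int64_t *out)
{
    overflow_seen = false;                 /* start point */
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = add_sticky(acc, v[i]);
    *out = acc;
    return !overflow_seen;                 /* end / detection point */
}
```

With more than one overflow in the region the flag simply stays set, so the caller learns that something overflowed but not how many times or where.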

There could maybe either be a 32-bit variant (ADDV.L), or some shorthand
way to detect that the value has gone outside of 32-bit range.

Then again, could maybe classify code, say:
1, signed, value doesn't (or shouldn't) go out-of-range;
2, unsigned, value doesn't (or shouldn't) go out-of-range;
3, signed, value is expected to be modulo;
4, unsigned, value is expected to be modulo.

5, a language hint about in-range, wrap, trap, signal, throw

Well, possible, but C doesn't have any hints here...

But, yeah:
Leaving plain 'int' as the "probably shouldn't overflow" and 'signed
int' and 'unsigned int' as "wrap on overflow expected" could make sense.

"nasal demons" types assume 1 and 4 as dominant.
Or, 1 as exclusive vs 3.

For compilers, we often need to assume 3 and 4.
Because, failure to uphold 3 results in misbehaving programs.
And, if 3 were uncommon, RISC-V's "ADDW"/etc would be pure stupidity.

You would prefer::

AND R7,Rleft,#~(~0<<31)
AND R8,Rright,#~(~0<<31)
ADD Rd,R7,R8
AND Rd,Rd,#~(~0<<31)

That is ADDW range limits operands and performs a shorter ADD.
Matching C's int a,b; semantic. In general the integer instructions
ending with W apply C's int properties to the arithmetic. If compilers
were (WERE) really good at range determination those instructions would
be unnecessary--but they are not.

I (My 66000) had to put in sized integer calculation reasons, and by
doing so, gained 2%-4% in code density and a bit more in latency.
-----------------------

OK.

Ironically, the 4-op sequence above would have been a single "ADDWU" instruction in the RV BitManip drafts, but ADDWU was dropped as arguably
it didn't make a big enough difference on SPEC scores. They decided to
keep a whole bunch of other random crap though that serves no real
purpose other than to micro-optimize the benchmarks...

I revived this for my own extensions, but left out ADDIWU as it was
still not common enough to justify the encoding space cost (if one has jumbo-prefixes, this could be handled well enough via
immediate-synthesis, and the 64-bit encoding wasn't too bad for
something that is comparably infrequent).

...


From George Neuner@[email protected] to comp.arch on Tue May 26 15:29:08 2026

From Newsgroup: comp.arch

On Mon, 25 May 2026 23:05:06 GMT, MitchAlsup
<[email protected]d> wrote:

BGB <[email protected]> posted:

On 5/25/2026 9:28 AM, Anton Ertl wrote:

--------------

Integer overflow happens far too often for trapping to be a good solution.

Even on 64-bit variables/machines ??

Yes if there are options for 8/16/32 bit ops in 64 bit registers.

From Terje Mathisen@[email protected] to comp.arch on Tue May 26 22:09:28 2026

From Newsgroup: comp.arch

David Brown wrote:

On 26/05/2026 01:00, MitchAlsup wrote:

[email protected] (Anton Ertl) posted:

MitchAlsup <[email protected]d> writes:

[email protected] (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.

Why do you consider that desirable?

So you can debug production/released code to find subtle errors.

I think that when an unexpected error is detected (whether it is with hardware acceleration, like trap on overflow, or via explicit generated code), the way to handle it depends strongly on the situation. If a debugger is present, then it is most helpful to lead to a debugger break
so that the developer can figure out what went wrong. When not
debugging, there is no sensible default handling that works for jet
engine controllers and video game frame generators.

But I do support the aim of having the same generated code when
debugging and when shipping - I am not a fan of "release" builds and
"debug" builds. (Of course you might temporarily do builds with
different flags while chasing down a particular bug.)

I tend to like "Release with sometimes hard-to-grok debug info",
typically resulting in a separate file with a best effort debug map of
the executable.
Then I can at least get some help when running the debugger and trying
to binary search my way into the spot where the bug resides.
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From MitchAlsup@[email protected] to comp.arch on Tue May 26 20:54:30 2026

From Newsgroup: comp.arch

Terje Mathisen <[email protected]> posted:

David Brown wrote:

On 26/05/2026 01:00, MitchAlsup wrote:

[email protected] (Anton Ertl) posted:

MitchAlsup <[email protected]d> writes:

[email protected] (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.

Why do you consider that desirable?

So you can debug production/released code to find subtle errors.

I think that when an unexpected error is detected (whether it is with hardware acceleration, like trap on overflow, or via explicit generated code), the way to handle it depends strongly on the situation. If a debugger is present, then it is most helpful to lead to a debugger break so that the developer can figure out what went wrong. When not debugging, there is no sensible default handling that works for jet
engine controllers and video game frame generators.

But I do support the aim of having the same generated code when
debugging and when shipping - I am not a fan of "release" builds and "debug" builds. (Of course you might temporarily do builds with different flags while chasing down a particular bug.)

I tend to like "Release with sometimes hard-to-grok debug info",
typically resulting in a separate file with a best effort debug map of
the executable.

Encrypt the debug information (and put it in a {1234-5678-9101-1121-...} folder) so that only the owner (not licensee) of the code can debug
it.

Then I can at least get some help when running the debugger and trying
to binary search my way into the spot where the bug resides.

Terje


From BGB@[email protected] to comp.arch on Tue May 26 19:13:21 2026

From Newsgroup: comp.arch

On 5/26/2026 2:29 PM, George Neuner wrote:

On Mon, 25 May 2026 23:05:06 GMT, MitchAlsup <[email protected]d> wrote:

BGB <[email protected]> posted:

On 5/25/2026 9:28 AM, Anton Ertl wrote:

--------------

Integer overflow happens far too often for trapping to be a good solution.

Even on 64-bit variables/machines ??

Yes if there are options for 8/16/32 bit ops in 64 bit registers.

32-bit overflow is the dominant scenario here.
While 8 and 16-bit ranges do overflow readily, the normal semantics are
for them to auto-promote to 32 bits before then being narrowed back down
to 8 or 16 bits, so they don't count.
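
A small C illustration of that promotion rule, assuming the usual case where `int` is wider than 8 bits (the function names are made up):

```c
#include <stdint.h>

/* Both operands are promoted to int before the add, so the sum itself
   cannot overflow; only the final narrowing discards high bits. */
static int sum_as_int(uint8_t a, uint8_t b) { return a + b; }
static uint8_t sum_as_u8(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }
```

For 200 + 100 the first returns 300 and the second returns 44 (300 modulo 256); neither involves an overflowing operation in the C sense.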

Ironically, for my BS2 language, the semantics were in cases like this
to instead auto-promote to 64 bits; but can't really do this for C as it
gives different results in some cases (and early promotion is itself a
bug, even if early promotion would often be the most natural semantics
for a 64-bit machine).

Well, and there is the usual thing that one can't usually allow a
variable to hold values outside the range of what would be allowed for
that variable.

Well, except for floating-point types, where typically code doesn't care
about out-of-range values (if a value fails to go to 0 or Inf in a
computation in local variables, typically no one cares).

For float, it isn't obvious because the dynamic range of Binary32 is
already quite large. A "short float" effectively having Binary64's
dynamic range when used in scalar computations is a bit incredible, but
given these smaller formats are non-standard anyways, it is reasonable to
be like "these formats are only necessarily confined to their formal
range when in-memory, otherwise all bets are off".

Or: precision and dynamic range >= requested format.

Code can't entirely rely on the higher precision though, as the format
may also revert to its defined precision without warning (even if
intermediate computations may potentially wildly exceed it).

But, then again, this would be analogous to if one has an FPU with
native Binary128, occasionally performing "double" calculations at
Binary128 precision even though "double" is stated as Binary64.

Well, or implementing some operations by widening temporarily to a higher-precision format before narrowing the result.
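
A sketch of the difference, using standard `float` and `double` as stand-ins for a narrow format and its wider working format (IEEE binary32/binary64 assumed; the function names are invented):

```c
/* The intermediate is forced back to float (the volatile store rounds
   it), so a*b can overflow to +Inf before the division happens. */
static float scale_narrow(float a, float b)
{
    volatile float t = a * b;
    return t / b;
}

/* Intermediates are kept in double and narrowed only at the end. */
static float scale_wide(float a, float b)
{
    double t = (double)a * (double)b;
    return (float)(t / (double)b);
}
```

With a = b = 1e30f the narrow version returns +Inf while the wide version returns 1e30f, so code that relied on the overflow would see different behavior depending on where the narrowing happens.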

Though, OTOH, the main use-case for things like scalar "short float" is
more for saving memory in structs and arrays, not for trying to rely on
its crappy range and precision.

So, floating point math is very different from integer math in this regard.

...


From Stefan Monnier@[email protected] to comp.arch on Wed May 27 10:59:31 2026

From Newsgroup: comp.arch

MitchAlsup [2026-05-26 20:54:30] wrote:

Encrypt the debug information (and put it in
a {1234-5678-9101-1121-...} folder) so that only the owner (not
licensee) of the code can debug it.

I resent that. All code should be Free Software.

=== Stefan

From quadi@[email protected] to comp.arch on Wed May 27 18:19:49 2026

From Newsgroup: comp.arch

On Wed, 27 May 2026 10:59:31 -0400, Stefan Monnier wrote:

MitchAlsup [2026-05-26 20:54:30] wrote:

Encrypt the debug information (and put it in a
{1234-5678-9101-1121-...} folder) so that only the owner (not
licensee) of the code can debug it.

I resent that. All code should be Free Software.

It is wonderful that we have the open-source software movement.

However, people have the right to the fruit of their labors. To give them
away for free is generous, but it should remain a personal choice.

Of course, copyright has been misused, and deserves a critical
examination, not the sort of uncritical expansion given to it by
legislators in the United States - and imposed on the rest of the world by trade threats.

John Savard

From BGB@[email protected] to comp.arch on Wed May 27 15:24:09 2026

From Newsgroup: comp.arch

On 5/25/2026 5:59 PM, MitchAlsup wrote:

Thomas Koenig <[email protected]> posted:

David Brown <[email protected]> schrieb:

On 24/05/2026 23:39, quadi wrote:

On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

It makes sense to trap on a floating-point overflow, but trapping on an integer overflow is usually a terrible idea.

So, detecting something went wrong and you should inform the programmer is a bad idea ???

No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.

John Savard

That does not make sense. Code such as random number generators should
be written so that they are correct in the language they are written in.
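As a minimal sketch of what "correct in the language" can look like for a generator that depends on wrap-around: the wrap is requested explicitly instead of being left to overflow behaviour. This is roughly what unsigned arithmetic gives in C, where wrap-around is defined. The function name and the multiplier/increment constants are illustrative assumptions, not taken from any post in this thread.

```python
# Illustrative 32-bit linear congruential generator.
# The constants are assumptions chosen for the example only.
MASK32 = 0xFFFFFFFF

def lcg_next(state: int) -> int:
    """Advance the generator one step, wrapping explicitly to 32 bits."""
    # The mask makes the modulo-2**32 reduction part of the algorithm,
    # so nothing here depends on what the hardware does on overflow.
    return (1664525 * state + 1013904223) & MASK32
```

Written this way, an overflow trap never fires, because no intermediate value is required to overflow a fixed-width signed type.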

In principle, yes.

Principle is better in theory than in practice.

In practice, people often used whatever "worked" on their systems.

Face it, the poor slug writing the code may not have the faintest
grasp at the system qualities we are discussing, and does not care
to learn as long as he can slug through the writing and his program
not blow up catastrophically while it is under his purview.

That defines a lot of what is wrong with SW programming today.

Implementors have a certain right because they control what their
compiler does or does not do.

You would be surprised at how little influence implementors have
on compilers and other software.

Yeah.

You can design the ISA and compiler as one likes.
But, if existing C code breaks, well then this is not good.

One might think:
You know, wrap on overflow, and type promotion where it overflows and
wraps, and *then* promotes to the wider type on the final assignment, is
kinda stupid and sucks.

And, if one goes by "well, signed overflow is UB anyways", then they
should be able to turn it into a "promote first, then ADD" scenario (may
be both potentially faster, and less likely to lose information).

I would be inclined to agree.

But... there is old code around that will quietly break if the integer overflow and promotion doesn't follow the specific behavior that mimics
how it would have behaved on 32-bit systems.
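To make the two orderings concrete, here is a small Python model (a sketch, not BGBCC's actual behaviour). It assumes a 32-bit two's-complement "int" that wraps on overflow, which is what typical hardware does even though the C standard leaves signed overflow undefined. The helper names are hypothetical.

```python
def to_int32(x: int) -> int:
    """Reduce x to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

def add_wrap_then_widen(a: int, b: int) -> int:
    # Model of "add in 32 bits, wrap, then promote the result":
    # the high bits are already lost before the wider type sees them.
    return to_int32(a + b)

def add_widen_then_add(a: int, b: int) -> int:
    # Model of "promote first, then ADD": the sum of two 32-bit
    # values always fits in a 64-bit result, so nothing is lost.
    return a + b
```

With a = 0x7FFFFFFF and b = 1 the first ordering yields -2147483648 and the second yields 2147483648. Code that was tuned against the first result sees different values under the second, which is the kind of quiet breakage described above.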

I vaguely remember a case of this involving some robot enemies that
drive around in ROTT, where if the integer overflow failed to work in
just the right way, they would all miss their way-points and end up
crashing into walls or similar.

Where, the robot enemies followed a path defined as a series of
waypoints (in a grid world), and once the robot hits a particular spot
on the grid cell, it will change directions and head along the path.
But, the particular way the expression to handle this was written was sensitive to the type promotion and wrap-on-overflow semantics in C.

Also a similar case involving the "elevators", which were effectively
timed teleporters between different parts of the map (would close door,
play elevator sound, then right at the end as the door opens, it would teleport the player to the other location and initiate a screen shaking
effect at around the same time). If the overflow was wrong, the teleport
would fail and the player would still be in the original location.

One could fix this stuff with casts or similar, but, when does one draw
the line exactly?...

Easier sometimes to make it work than to try to justify that the code was already broken due to reliance on UB.

Well, and to match the behavior of the other compilers, needed to
implement the behavior the way ROTT expected.

Where, as noted, ROTT uses fixed-point math with "fixed" as a signed
32-bit integer, and some cases involve calculations with coordinates
well outside the world bounds with the seeming intention that these
high-order components simply disappear into the ether (with the world essentially treated as a wrapping modulo space).
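A rough sketch of that "wrapping modulo space" idea, in Python. The 16.16 split (16 integer bits, 16 fraction bits) is an assumption made for the illustration; the post only says that "fixed" is a signed 32-bit integer. All names are hypothetical.

```python
FRAC_BITS = 16  # assumed 16.16 layout, not confirmed by the source

def wrap_fixed(x: int) -> int:
    """Keep only the low 32 bits, interpreted as signed."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

def to_fixed(value: float) -> int:
    return wrap_fixed(int(round(value * (1 << FRAC_BITS))))

def from_fixed(value: int) -> float:
    return value / (1 << FRAC_BITS)

def fixed_add(a: int, b: int) -> int:
    # High-order bits that do not fit simply disappear, so the
    # coordinate space behaves like a ring rather than a line.
    return wrap_fixed(a + b)
```

For example, from_fixed(fixed_add(to_fixed(32767.5), to_fixed(1.0))) comes out as -32767.5 in this model: the coordinate walks off one edge and reappears at the other. If the addition were carried out in a wider type first, the result would be 32768.5 instead.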

But, as noted, it differed from my BS2 language, where the default was effectively to auto-promote values to the widest reasonable integer type
in these cases and then drop down to the final range afterwards (to
avoid some integer overflows in cases they would happen in C).

Well, and within BGBCC, there was some non-zero bleed-over between C and
BS2 (where originally I had been implementing BS2 via BGBCC, with the intention that it would compile to an IL image that would then be run in
the VM).

The original VM however, while fast, ended up with horrible code-bloat.
Had gotten creative with the use of the C preprocessor in ways that were ultimately a terrible idea (errm, trying to use it sorta like a
poor-man's version of C++ templates). Binaries got huge, build times
sucked. This VM was a dead end.

Ironically, some of my current ISA projects were built on some of the groundwork left by this experiment, but also as a warning for something
not to do.

Or, when I learned the merit of actually writing all the opcode handler functions and similar by hand and not trying to do combinatorial stuff
via the preprocessor.

Also for the follow up VM (for BS2), had gone back to ye-olde stack
machine (vs a Register IR model). But, some parts of this were relevant
to targeting an "actual CPU".

The way JX2VM works isn't too far removed from those VMs in some ways,
apart from JX2VM's general avoidance of getting too clever with the C preprocessor.

...

--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Sat May 30 04:02:45 2026

From Newsgroup: comp.arch

On Mon, 25 May 2026 23:03:03 +0000, MitchAlsup wrote:

Another One Bites the Dust.....

Yes, it certainly is true that Concertina IV retains a lot of baggage
which might be considered silly from even the original Concertina design.

And, since I have a "set flag" instruction still... I needed to have predicated instructions. So I added those in... giving an instruction
format which included either a predicated 32-bit instruction, or a
predicated pair of 16-bit short instructions... which now could have full register specifications! And with predicated instructions, I also brought
back the break bit.

So even without block structure, I brought back VLIW features!

I was so dismayed by how limited my 16-bit short instructions were, that
this was nice - but having two 16-bit short instructions inside a 48-bit instruction was not a gain on using 24-bit short instructions instead!

Well, I added a new 80-bit instruction format, which no longer allowed predication, but which allowed those short instructions to be used with
less overhead.

I felt I could do even better. I wanted to add 112-bit instructions, to
split the 16 bits of overhead between three pairs of these nicer short instructions. It was hard to find the opcode space for them, but I finally
did it.
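The arithmetic behind that progression can be checked with a few lines of Python. This is only a sketch: it assumes, as the description above suggests, that each long format spends 16 bits on overhead and fills the rest with 16-bit short instructions. The function name is made up for the example.

```python
OVERHEAD_BITS = 16   # per long instruction, as described in the post
SHORT_BITS = 16      # size of one embedded short instruction

def bits_per_short(total_bits: int) -> float:
    """Average cost in bits of one short instruction in a long format."""
    count = (total_bits - OVERHEAD_BITS) // SHORT_BITS
    return total_bits / count

for total in (48, 80, 112):
    print(total, bits_per_short(total))
```

Under these assumptions the 48-bit format costs 24 bits per short instruction (no better than a 24-bit short instruction), the 80-bit format costs 20, and the 112-bit format costs about 18.7.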

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Sat May 30 04:06:14 2026

From Newsgroup: comp.arch

On Mon, 25 May 2026 23:03:03 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

A major goal of the Concertina II, III, and IV architectures is for
instructions not to be longer than similar instructions on the Motorola
68020 or the IBM System/360 if at all possible.

Basically, the selling point is... "Your programs only get 10% bigger,
if that, and yet you have 32 registers, so they run faster!".

Mine are getting 30% smaller and needing fewer instructions at the same
time

Well, then you're obviously doing something amazing with MY 68000, and I
don't have the experience to know which modifier bits, if added, would
save instructions often enough to more than pay for the space they take up.

I have to be content with doing the best I can, despite not being capable
of doing much more than slavishly copying existing commercial
architectures.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Sat May 30 15:47:00 2026

From Newsgroup: comp.arch

On Sat, 30 May 2026 04:02:45 +0000, quadi wrote:

So even without block structure, I brought back VLIW features!

I had a little opcode space remaining. So now I have made what is perhaps
my maddest addition to the Concertina IV architecture yet!

In the normal instruction set of the Concertina IV, it was necessary to
extend the 32-bit instruction set to intrude, ever so slightly, into the portion of the opcode space where instructions begin with 11.

This was because in the 3/4 of the opcode space initially allocated to
32-bit instructions, there wasn't quite enough room for a Halfword Immediate instruction that was 32 bits long, but allowed all 32 registers to be used
as destination registers.

Well, for the primary instruction set, this was no real problem. It may
have made decoding the lengths of instructions less simple and elegant,
but there was still enough space for instructions longer than 32 bits and
for the short instructions, both 16-bit and 24-bit - which chopped that remaining space up into pieces anyways.

But in the 48-bit instructions with an instruction that can be predicated,
and the 80-bit and 112-bit instructions with two or three instructions
which can be indicated explicitly as parallelizable... there's a field
that can _only_ be used for a 32-bit instruction.

So in there, the opcode space of 32-bit instructions starting with 11 is almost completely unused... but I can't use it for paired 15-bit short instructions because of that Halfword Immediate instruction.

Well, now the Halfword Immediate instruction for that case has been
modified, so that paired short instructions including short instructions
other than register-to-register operate instructions can be used.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Sat May 30 19:03:18 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Mon, 25 May 2026 23:03:03 +0000, MitchAlsup wrote:

quadi <[email protected]d> posted:

A major goal of the Concertina II, III, and IV architectures is for
instructions not to be longer than similar instructions on the Motorola
68020 or the IBM System/360 if at all possible.

Basically, the selling point is... "Your programs only get 10% bigger,
if that, and yet you have 32 registers, so they run faster!".

Mine are getting 30% smaller and needing fewer instructions at the same time

Well, then you're obviously doing something amazing with MY 68000, and I

s/68/66/

don't have the experience to know which modifier bits, if added, would
save instructions often enough to more than pay for the space they take up.

1) never use instructions to paste constant bits together
2) never use LDs to fetch constants from data-memory
3) provide ENTER and EXIT to setup and tear-down stack frames
4) provide [Rbase + Rindex<<scale + Displacement] addressing
5) encode orthogonal features in a single encode field
6) spend years reading ASM code from your compiler

The rest (encoding) is the easy part.
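For readers unfamiliar with item 4, the address computation it names can be sketched as follows. This is a plain Python model of the arithmetic only; the parameter names and the 64-bit wrap are assumptions, and it says nothing about how My 66000 encodes the fields.

```python
def effective_address(rbase: int, rindex: int, scale: int,
                      displacement: int, width: int = 64) -> int:
    """Compute Rbase + (Rindex << scale) + Displacement, wrapped to width bits."""
    return (rbase + (rindex << scale) + displacement) & ((1 << width) - 1)
```

For instance, effective_address(0x1000, 5, 3, 16) addresses element 5 of an array of 8-byte items that starts 16 bytes past the base, all in one memory reference instead of a shift, an add, and a load.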

I have to be content with doing the best I can, despite not being capable
of doing much more than slavishly copying existing commercial
architectures.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Sat May 30 19:15:56 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Sat, 30 May 2026 04:02:45 +0000, quadi wrote:

So even without block structure, I brought back VLIW features!

I had a little opcode space remaining. So now I have made what is perhaps
my maddest addition to the Concertina IV architecture yet!

In the normal instruction set of the Concertina IV, it was necessary to extend the 32-bit instruction set to intrude, ever so slightly, into the portion of the opcode space where instructions begin with 11.

This was because in the 3/4 of the opcode space initially allocated to 32-bit instructions, there wasn't quite enough room for a Halfword Immediate instruction that was 32 bits long, but allowed all 32 registers to be used as destination registers.

Yet, My 66000 only has 29 instructions that use 16-bit (or larger) in instruction constants (immediates and displacements)--this includes 2 instructions for Branch on Bit, 2 instructions for branch on condition,
2 26-bit branch instructions, 13 Disp16 memory references, {9 integer,
and 2 miscellaneous instructions} with 16-bit immediates.

Only 29 from an OpCode space of 64 slots with 6 permanently reserved to
prevent executing code. So, only 1/2 my Major OpCode space is used with immediates--with 16-slots available for the future (22 if you count the reserved slots).

Well, for the primary instruction set, this was no real problem. It may
have made decoding the lengths of instructions less simple and elegant,
but there was still enough space for instructions longer than 32 bits and for the short instructions, both 16-bit and 24-bit - which chopped that remaining space up into pieces anyways.

It costs me only 6 gates (2 gates of delay) to decode the length of an instruction--whereas it takes 4 gates to decode S/360 2-bit code for instruction length.
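For reference, the S/360 2-bit length code mentioned here lives in the top two bits of the first opcode byte: 00 means a 2-byte instruction, 01 and 10 mean 4 bytes, and 11 means 6 bytes. A minimal Python sketch of that decode follows (the function name is hypothetical, and gate counts are not modelled):

```python
def s360_length_bytes(first_byte: int) -> int:
    """Instruction length in bytes from the first byte of a S/360 instruction."""
    code = (first_byte >> 6) & 0b11      # top two bits of the opcode
    return (2, 4, 4, 6)[code]            # 00 -> 2, 01/10 -> 4, 11 -> 6
```

So 0x1A (AR, an RR instruction) decodes as 2 bytes, 0x5A (A, RX) as 4 bytes, and 0xD2 (MVC, SS) as 6 bytes.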

But in the 48-bit instructions with an instruction that can be predicated, and the 80-bit and 112-bit instructions with two or three instructions
which can be indicated explicitly as parallelizable... there's a field
that can _only_ be used for a 32-bit instruction.

An architecture is just as much about what you leave out as what you
put in.

So in there, the opcode space of 32-bit instructions starting with 11 is almost completely unused... but I can't use it for paired 15-bit short instructions because of that Halfword Immediate instruction.

Based on my above: you should not need more than 1/2 OpCode space for instructions with 16-bit immediates.

Well, now the Halfword Immediate instruction for that case has been modified, so that paired short instructions including short instructions other than register-to-register operate instructions can be used.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Sun May 31 01:22:51 2026

From Newsgroup: comp.arch

On Sat, 30 May 2026 15:47:00 +0000, quadi wrote:

On Sat, 30 May 2026 04:02:45 +0000, quadi wrote:

So even without block structure, I brought back VLIW features!

I had a little opcode space remaining. So now I have made what is
perhaps my maddest addition to the Concertina IV architecture yet!

At least this reminded me that embedding instructions inside long
instructions is, in one very important respect, very different from having
a block structure for program code. So I have now added a warning about
how branching to an embedded instruction will not work unless a number of strict conditions are met.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Sun May 31 02:57:08 2026

From Newsgroup: comp.arch

On Sat, 30 May 2026 15:47:00 +0000, quadi wrote:

So in there, the opcode space of 32-bit instructions starting with 11 is almost completely unused... but I can't use it for paired 15-bit short instructions because of that Halfword Immediate instruction.

Well, now the Halfword Immediate instruction for that case has been
modified, so that paired short instructions including short instructions other than register-to-register operate instructions can be used.

I felt that this, while tempting, was still a crazy idea. But now I see
what my subconscious motivation could have been.

Adding this additional, seemingly redundant, short instruction
capability... now makes it possible to think of removing the one feature
of Concertina IV that I dislike the most: the 24-bit short instructions.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Sun May 31 12:05:00 2026

From Newsgroup: comp.arch

On Sun, 31 May 2026 01:22:51 +0000, quadi wrote:

At least this reminded me that embedding instructions inside long instructions is, in one very important respect, very different from
having a block structure for program code. So I have now added a warning about how branching to an embedded instruction will not work unless a
number of strict conditions are met.

And now I've added the Branch to Embedded instruction, which points to the larger instruction, and then indicates which embedded instruction within
it to which control is to be transferred as a method of avoiding these restrictions, should anyone ever need such an instruction.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Sun May 31 17:26:41 2026

From Newsgroup: comp.arch

On Sun, 31 May 2026 12:05:00 +0000, quadi wrote:

And now I've added the Branch to Embedded instruction, which points to
the larger instruction, and then indicates which embedded instruction
within it to which control is to be transferred as a method of avoiding
these restrictions, should anyone ever need such an instruction.

And now a minor change: since the opcode space was available, the shift instructions, not only the operate instructions, among the 24-bit short instructions, may now affect the condition codes.

Oh yes, and I've added 144-bit instructions that provide four embedded
32-bit instructions with an explicit indication of parallelism.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From Stephen Fuld@[email protected] to comp.arch on Sun May 31 11:05:37 2026

From Newsgroup: comp.arch

On 5/30/2026 12:15 PM, MitchAlsup wrote:

quadi <[email protected]d> posted:

snip

But in the 48-bit instructions with an instruction that can be predicated, and the 80-bit and 112-bit instructions with two or three instructions
which can be indicated explicitly as parallelizable... there's a field
that can _only_ be used for a 32-bit instruction.

An architecture is just as much about what you leave out as what you
put in.

John's answer - leave out as little as possible, preferably nothing! :-)
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Sun May 31 18:40:48 2026

From Newsgroup: comp.arch

Stephen Fuld <[email protected]d> posted:

On 5/30/2026 12:15 PM, MitchAlsup wrote:

quadi <[email protected]d> posted:

snip

But in the 48-bit instructions with an instruction that can be predicated, and the 80-bit and 112-bit instructions with two or three instructions
which can be indicated explicitly as parallelizable... there's a field
that can _only_ be used for a 32-bit instruction.

An architecture is just as much about what you leave out as what you
put in.

John's answer - leave out as little as possible, preferably nothing! :-)

Which is why his architecture is converging so rapidly.

NOT.
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Mon Jun 1 01:14:12 2026

From Newsgroup: comp.arch

On Wed, 20 May 2026 17:47:59 +0000, John Levine wrote:

Having looked into this in some detail, both when IBM used bigendian
order on S/360 and DEC used little-endian on the PDP-11, neither
documented the reasons for the byte order choice at all. Not even a
little bit.

I suppose that, at the time, it was something that nobody felt was
important enough to document.

But to people who were around back then, the reasons would have been
obvious.

IBM mainframes were designed to ooze quality! So here and there, an extra transistor or two was added if something seemed better. That's why the IBM 7090 used sign-magnitude arithmetic for integers.

And that's why the IBM 360 jumped ahead to the end of an integer and
worked backwards to add, because putting things in reverse order would
have shouted cheap.

Plus, the 360 came in a variety of bus widths. So when would you start
putting the small part first? (They didn't know the answer the PDP-11 came
up with. Nobody back then could even imagine it, it was so new.)

The original PDP-11 only came with a 16-bit bus. But its designers aspired
to the level of consistency that the 360 had, but they wanted to do it on
a rock-bottom minicomputer budget. DEC minis, in fact, were cheaper than
most other brands of minicomputer at the time.

So they were going to put the most significant 16-bit word of a 32-bit
integer last. But they got the brilliant idea - that more pedestrian
designers would never even have considered for a second, or even thought of as possible - of numbering the bytes in a word backwards too, so as to attain consistency.

The PDP-11 made little-endian a thing. It was so new that the people
designing the floating-point unit didn't get the memo. But the concept of making little-endian consistent, instead of something you did in one particular case, the case where something was twice the size of your
biggest register... that was only born with the PDP-11.
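The three byte layouts at issue can be compared with a short Python sketch. The "PDP" layout here follows the commonly described PDP-11 convention for 32-bit quantities (most significant 16-bit word first, each word stored little-endian); treat that as an assumption of the example. The function name is hypothetical.

```python
def byte_layouts(value: int):
    """Return (big-endian, little-endian, PDP-style) layouts of a 32-bit value."""
    big = value.to_bytes(4, "big")
    little = value.to_bytes(4, "little")
    high_word, low_word = value >> 16, value & 0xFFFF
    # High word first, but the bytes inside each word are little-endian.
    pdp = high_word.to_bytes(2, "little") + low_word.to_bytes(2, "little")
    return big, little, pdp

big, little, pdp = byte_layouts(0x0A0B0C0D)
print(big.hex(" "), "|", little.hex(" "), "|", pdp.hex(" "))
```

For 0x0A0B0C0D this gives 0a 0b 0c 0d (big-endian), 0d 0c 0b 0a (little-endian), and 0b 0a 0d 0c (the mixed layout), which is neither of the two consistent orders.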

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Mon Jun 1 01:40:59 2026

From Newsgroup: comp.arch

On Sun, 31 May 2026 11:05:37 -0700, Stephen Fuld wrote:

On 5/30/2026 12:15 PM, MitchAlsup wrote:

An architecture is just as much about what you leave out as what you
put in.

Those are words of wisdom, undoubtedly.

John's answer - leave out as little as possible, preferably nothing!

So why do I choose openly to defy good sense, and neglect them?

That's a fair question.

My answer, though, is a simple one. I've opened my eyes, and looked at the world around me.

When it comes to desktop computers, the ones people generally use when
trying to solve a problem more serious than could be dealt with on a smartphone... what processor is in them?

Well, there _is_ the Macintosh, which also used x86 for a time, but is now using ARM.

But in general, x86 is dominant. There's too much software written to run
on x86 Windows.

So what I've learned is that the world of computer architectures seems to
be like _Highlander_... "There can be only one".

And if that one leaves out a feature, then that means that feature is basically not available. I want everyone to have a chance to efficiently
solve their problems, whatever special instructions or data formats they
may need.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Mon Jun 1 02:20:36 2026

From Newsgroup: comp.arch

It appears that quadi <[email protected]d> said:

On Wed, 20 May 2026 17:47:59 +0000, John Levine wrote:

Having looked into this in some detail, both when IBM used bigendian
order on S/360 and DEC used little-endian on the PDP-11, neither
documented the reasons for the byte order choice at all. Not even a
little bit.

I suppose that, at the time, it was something that nobody felt was
important enough to document.

Evidently.

But to people who were around back then, the reasons would have been obvious.

As I may have said once or twice before, we have plenty of guesses, but
since there is no documentation, the guesses are a waste of time.

IBM mainframes were designed to ooze quality! So here and there, an extra transistor or two was added if something seemed better. That's why the IBM 7090 used sign-magnitude arithmetic for integers.

The 7090 used sign-magnitude because the vacuum tube 709 used sign magnitude because the 704 used sign-magnitude and they quite reasonably wanted to keep them program compatible. The preceding 701 was also sign-magnitude but had a strange addressing scheme which let you treat memory (which was flaky Williams tubes) as either 36 bit full words or 18 bit half words. Full words were addressed by even negative addresses from -0000 to -4094 while half words were even and odd positive addresses from +0000 to +4095. The 704 did not do that, thank heavens.

I presume you are aware that the 704 and successors did indexing by two's complement subtraction, which is not sign-magnitude. There is no documentation for that either, and I have looked quite hard. Pretty please, do not guess unless you can cite sources.

And that's why the IBM 360 jumped ahead to the end of an integer and
worked backwards to add, because putting things in reverse order would
have shouted cheap.

IBM's 702, 705, and 7080 decimal mainframes addressed the low digit of a number and I can assure you they were not cheap.

The original PDP-11 only came with a 16-bit bus. But its designers aspired to the level of consistency that the 360 had, but they wanted to do it on
a rock-bottom minicomputer budget. DEC minis, in fact, were cheaper than most other brands of minicomputer at the time.

I am familiar with this guess, but having looked at a lot of contemporary DEC documentation, there is no reason to believe it's true. If they saved any transistors by making it little-endian, the difference was trivial.

You should look at the DG Nova, designed by some DEC renegades, really cheap due
to using then-new MSI chips, and word addressed with a bigendian feel.

The PDP-11 made little-endian a thing. It was so new that the people designing the floating-point unit didn't get the memo.

Nor did the people designing the extended multiplier, but they got it
mostly consistent in the Vax.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@[email protected] to comp.arch on Mon Jun 1 05:36:10 2026

From Newsgroup: comp.arch

quadi <[email protected]d> schrieb:

So what I've learned is that the world of computer architectures seems to
be like _Highlander_... "There can be only one".

That is what people thought about the /360 until the Minis came
along, where companies were content with lower margins to serve
new markets and customers at lower margins, but higher volume.

And then RISC, and PCs... and the low end that PCs are being attacked
from right now is mobile devices, and ARM.

For this kind of cycle, I highly recommend reading https://en.wikipedia.org/wiki/The_Innovator%27s_Dilemma (the book,
not the Wikipedia article itself). It talks a lot about hard drives,
but parallels to computers are obvious.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Mon Jun 1 07:47:42 2026

From Newsgroup: comp.arch

John Levine <[email protected]> writes:

I presume you are aware that the 704 and successors did indexing by two's complement subtraction, which is not sign-magnitude.

Looking at the 704 manual <https://ia802904.us.archive.org/12/items/bitsavers_ibm7042466_32932660/24-6661-2_704_Manual_1955_text.pdf>,
it says:

|Type A instructions use two 15-bit fields (decrement and address)
|containing numbers in the octal range 00000 to 77777.

I did not find other descriptions of addresses; given this
description, it seems that the addresses and the index registers are
unsigned. Appendix A discusses binary arithmetic, but explains
subtraction with borrows rather than addition of the 2s-complement
(borrows is probably easier to understand given the background of the
readers, but adding a 1s-complement and one is easier to implement).
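The equivalence of the two descriptions of subtraction can be checked directly. The following Python sketch works on 15-bit unsigned fields, matching the field width quoted from the manual; the function names are hypothetical, and this is not a model of the 704's actual circuitry.

```python
BITS = 15
MASK = (1 << BITS) - 1

def sub_with_borrows(a: int, b: int) -> int:
    """Subtract bit by bit, rippling a borrow, as a manual might explain it."""
    result, borrow = 0, 0
    for i in range(BITS):
        diff = ((a >> i) & 1) - ((b >> i) & 1) - borrow
        borrow = 1 if diff < 0 else 0
        result |= (diff & 1) << i
    return result

def sub_via_complement(a: int, b: int) -> int:
    """Subtract by adding the 1s-complement of b plus one."""
    return (a + (~b & MASK) + 1) & MASK
```

Both give the same 15-bit result for any pair of inputs, for example 0o77776 for 5 minus 7, so a description in terms of borrows says nothing about which way the hardware was built.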

In any case, I don't think that the IBM 704 manual documents
2s-complement representation of negative numbers for any purpose.

So why did the S/360 architects go for 2s-complement?

One speculation is that they wanted 32-bit (unsigned) addresses and
wanted to be able to use the same adder for the addresses as for the
integers. But the S/360 only has 24-bit addresses, so going for,
e.g., sign-magnitude and only declaring the positive numbers <2^24 to
be valid addresses would also have worked with one adder.

An alternative speculation is that they really wanted to extend the
range of the S/360 implementations as far as possible, also on the
lower end, and the 2s-complement representation for negative numbers
is cheaper to implement, in particular when you implement a
bit-serial, nybble-serial, or somesuch machine.
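To illustrate why 2s-complement suits a serial implementation, here is a sketch of a bit-serial adder in Python: one full adder plus a one-bit carry register, fed least significant bit first. It is a software model under simplifying assumptions, not a description of any S/360 model, and the helper names are made up.

```python
def bit_serial_add(a: int, b: int, bits: int = 32) -> int:
    """Add two bit patterns one bit per step, carrying between steps."""
    result, carry = 0, 0
    for i in range(bits):                    # least significant bit first
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i       # sum bit of a full adder
        carry = (x & y) | (carry & (x ^ y))  # carry into the next step
    return result                            # final carry is dropped

def to_twos(value: int, bits: int = 32) -> int:
    return value & ((1 << bits) - 1)

def from_twos(pattern: int, bits: int = 32) -> int:
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern
```

The same loop handles positive and negative operands with no special cases: from_twos(bit_serial_add(to_twos(-5), to_twos(3))) is -2. A sign-magnitude machine would first have to compare signs and magnitudes to decide whether to add or subtract.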

[quadi <[email protected]d> said:]

The PDP-11 made little-endian a thing. It was so new that the people designing the floating-point unit didn't get the memo.

Nor did the people designing the extended multiplier, but they got it
mostly consistent in the Vax.

This all indicates that byte-ordering decisions worked like in our
student group. The "right" choice seemed so obvious to everyone that
we did not communicate about it nor document it nor document the
reasons for it, and different contributors took different "right"
choices.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Mon Jun 1 08:36:22 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

But they got the brilliant idea - that more pedestrian
designers would never even have considered for a second, or even thought of as possible - of numbering the bytes in a word backwards too, so as to attain consistency.

The designers of the DataPoint 2200 did that, too, in their
instruction encoding, for no technical reason that I am aware of. And
the Datapoint 2200 came out within months of the PDP-11, so it is
unlikely that they took inspiration from the PDP-11 in this decision.

When you introduce byte addressing, you have to take the byte ordering decision. Some designers decide for big-endian, and some for
little-endian, and the decision is mostly arbitrary. And, as John
Levine writes, the designers of the S/360 and the PDP-11 did not
document their reasons for that.

For the 6502, the decision is not arbitrary when implementing the
addressing modes "ABS,X", "ABS,Y" and "(IND),Y". So for that they
decided to go for little-endian to simplify the implementation.
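A sketch of why the low byte arriving first helps, using "ABS,X" as the example: with an 8-bit adder, the index can be added to the low address byte while the high byte is still being fetched, and only the carry has to be applied afterwards. This Python model is a simplification under that assumption (it does not reproduce exact 6502 cycle timing), and the function name is hypothetical.

```python
def abs_x_address(operand_bytes, x: int):
    """Form a 16-bit ABS,X address from little-endian operand bytes.

    Returns (address, page_crossed)."""
    low, high = operand_bytes          # low byte is fetched first
    total = low + x                    # 8-bit add can start immediately
    low_sum, carry = total & 0xFF, total >> 8
    # The high byte arrives afterwards; it only needs the carry applied.
    high = (high + carry) & 0xFF
    return (high << 8) | low_sum, bool(carry)
```

For operand bytes (0xF0, 0x12) and X = 0x20 this gives address 0x1310 with a page crossing. With big-endian operands the high byte would arrive first and would have to wait for the low-byte add before it could be finalized.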

Its predecessor, the 6800, does not have any operations, where 16-bit
numbers coming from memory are added to something else (at least I did
not find such operations), and therefore the decision could be made arbitrarily, and they decided for big-endian. But I think that the
6809 and the 68000 have addressing modes where the big-endian nature complicates the implementation.

I looked at how this turned out for the offspring of the Datapoint
2200: For the Z80, I did not find any instruction where the
little-endian byte order provided an advantage: when a 16-bit value is
accessed in memory, it is used directly instead of being added or
somesuch. For the 8088, in theory little-endian might provide an
advantage when it comes to addressing modes such as disp16[BX], but
AFAIK in practice the 8088 was internally mostly an 8086, with a
16-bit adder, so it loaded the whole 16-bit number anyway before doing
the full 16-bit add (am I wrong?). Likewise for the 386SX.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@[email protected] to comp.arch on Mon Jun 1 16:04:26 2026

From Newsgroup: comp.arch

Anton Ertl <[email protected]> schrieb:

So why did the S/360 architects go for 2s-complement?

Brooks (who was program manager for /360) writes about this in
"The Design of Design". Unique zero and unified hardware were
his main points, IIRC.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Paul Clayton@[email protected] to comp.arch on Sun May 31 18:38:28 2026

From Newsgroup: comp.arch

On 5/30/26 3:15 PM, MitchAlsup wrote:
[snip]

It costs me only 6 gates (2 gates of delay) to decode the length of an instruction--whereas it takes 4 gates to decode S/360 2-bit code for instruction length.

Does the current version of My 66000 have three instruction
lengths or four? You mentioned before dropping "large" constants
as store operands, but I am not certain what that means.

Earlier, if I understood correctly, the longest instruction was
a store of a 64-bit constant with a 64-bit displacement,
requiring five 32-bit words.

If My 66000 has the same variability in instruction length as
S/360 (three sizes), then presumably the extra length decode
effort provides some other advantage, perhaps more flexibility
in length allocation (with a 2-bit size indicator, major opcodes
can only be allocated at 25% granularity)?
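
To make the 25% granularity point concrete, here is a minimal Python sketch of the S/360 length rule, in which the top two bits of the opcode byte select the instruction length. The function name is made up for illustration; the four sample opcodes (AR, A, MVI, MVC) are ordinary S/360 instructions.

```python
def s360_length(opcode: int) -> int:
    """Instruction length in bytes, taken from the first (opcode) byte."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must be a single byte")
    top2 = (opcode >> 6) & 0b11
    # 00 -> 2 bytes (RR), 01 and 10 -> 4 bytes (RX, RS/SI), 11 -> 6 bytes (SS)
    return (2, 4, 4, 6)[top2]


for name, op in (("AR", 0x1A), ("A", 0x5A), ("MVI", 0x92), ("MVC", 0xD2)):
    print(name, s360_length(op))
```

Because the length is fixed by those two bits alone, each quarter of the 256 opcode values is tied to one length, and space cannot be shifted between lengths in smaller steps.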

There may be an advantage in having different lengths have
different detection speed.

Since My 66000 only uses the extra words for immediates, there
*may* even be an advantage to detecting some illegal opcodes and
speculating that such are from constant words. (An illegal
opcode field can indicate an immediate, a faulting instruction,
or a skipped instruction.) Such could introduce variable timing
for parsing a given fetch chunk, but that might be handled by
reducing the number of parsed instructions emitted and inserting
the slowly parsed instructions into the start of the next group
of parsed instructions.

My guess is that such would just be silly complexity even at 16-
wide parsing, especially given the likely minuscule (typical)
timing benefit (if any!). Process variation probably would have
vastly more impact on frequency than trying to exploit a
statistical bias in encoding. (The concept just seemed
interesting.)

Given that register dependencies also "carry", there may be some
opportunity for "width pipelining" (like the staggered ALUs of
the Pentium 4) in parsing, extracting register names, renaming
(at least with RAT-based renaming), and even insertion into a
scheduler. If a dependency means it would not be useful to
insert the operation into a scheduler, this additional delay
might be exploited.
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Mon Jun 1 17:59:28 2026

From Newsgroup: comp.arch

On Mon, 01 Jun 2026 02:20:36 +0000, John Levine wrote:

As I may have said once or twice before, we have plenty of guesses, but
since there is no documentation, the guesses are a waste of time.

I understand that you would like to have actual documentation. But it
doesn't appear to exist.

But I don't think my "guesses" are wild. I'm familiar with the other
computers that existed in those years, with the milieu in which the
System/360 and the PDP-11 existed.

I presume you are aware that the 704 and successors did indexing by
two's complement subtraction, which is not sign-magnitude. There is no documentation for that either, and I have looked quite hard. Pretty
please, do not guess unless you can cite sources.

I admit that the fact that one subtracts the index on an IBM 704 seems
very weird to me. Since the IBM 704 was made out of vacuum tubes, saving
them, instead of mere discrete transistors, let alone transistors on a microchip with a billion of them, was probably more important.

My guess that sign-magnitude arithmetic was regarded as more prestigious, until IBM outgrew that notion with the 360, does have a source, although
not an IBM source.

A 24-bit computer was advertised as having sign-magnitude integer
arithmetic, unlike cheaper machines which either used one's complement
integer arithmetic, or, even worse, two's complement integer arithmetic.

I think it was the DDP-24, but offhand I'm not completely sure.

To guess - or to attempt to derive intelligence from the available
information - one might think that IBM considered indexing to be less important or less visible than ordinary integer arithmetic per se.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Mon Jun 1 18:08:38 2026

From Newsgroup: comp.arch

On Mon, 01 Jun 2026 07:47:42 +0000, Anton Ertl wrote:

This all indicates that byte-ordering decisions worked like in our
student group. The "right" choice seemed so obvious to everyone that we
did not communicate about it nor document it nor document the reasons
for it, and different contributors took different "right"
choices.

As has been argued many times, byte-ordering is completely arbitrary, and
so either choice is just as good. Given that widespread belief, that kind
of behavior is not surprising.

Some will think that of course a computer should be little-endian, because arithmetic is faster and simpler that way (if you're doing any multi-word arithmetic).

Some will think that of course a computer should be big-endian, because
that's just the natural way we write numbers, and anything else would be hopelessly confusing.

As I've pointed out, though, there is *one* particular case where there actually is a genuine difference between big-endian and little-endian.

If, like the System/360, your computer performs BCD arithmetic and not
just binary arithmetic, and if, unlike the System/360, you did your BCD arithmetic in the same registers you use for binary arithmetic...

Then, because binary arithmetic is done in the same registers as BCD arithmetic, they should both have the same endianness.

And because BCD numbers are directly related to character strings
representing numbers - just take the last four bits of each digit
character - they ought to have the same endianness. And character strings
that represent numbers _are_ big-endian.
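
As a rough illustration of "take the last four bits of each digit character", here is a small Python sketch. The function name is invented, the input is assumed to be plain ASCII digits, and no sign nibble is handled.

```python
def digits_to_packed_bcd(text: str) -> bytes:
    """Pack decimal digit characters two per byte, first digit first."""
    if not text or any(c not in "0123456789" for c in text):
        raise ValueError("expected a non-empty string of decimal digits")
    if len(text) % 2:
        text = "0" + text              # pad to a whole number of bytes
    # The low four bits of each digit character are the digit's value.
    nibbles = [ord(c) & 0x0F for c in text]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


print(digits_to_packed_bcd("1234").hex())
```

The digits come out in the same address order they went in, so the packed number is big-endian without any reordering step.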

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Mon Jun 1 18:10:37 2026

From Newsgroup: comp.arch

On Mon, 01 Jun 2026 05:36:10 +0000, Thomas Koenig wrote:

quadi <[email protected]d> schrieb:

So what I've learned is that the world of computer architectures seems
to be like _Highlander_... "There can be only one".

That is what people thought about the /360 until the Minis came along,
where companies were content to serve new markets and customers at lower
margins, but higher volume.

And then RISC, and PCs... and the low end that PCs are being attacked
from right now is mobile devices, and ARM.

For this kind of cycle, I highly recommend reading
https://en.wikipedia.org/wiki/The_Innovator%27s_Dilemma (the book, not
the Wikipedia article itself). It talks a lot about hard drives, but
parallels to computers are obvious.

This made me think of a different kind of cycle, called the "wheel of reincarnation", discussed in a book on interactive graphical displays.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Mon Jun 1 18:01:36 2026

From Newsgroup: comp.arch

Thomas Koenig <[email protected]> writes:

Anton Ertl <[email protected]> schrieb:

So why did the S/360 architects go for 2s-complement?

Brooks (who was program manager for /360) writes about this in
"The Design of Design". Unique zero and unified hardware were
his main points, IIRC.

This made me remember: G. M. Amdahl, G. A. Blaauw, F. P. Brooks, Jr.:
"Architecture of the IBM System/360"
<https://www.ece.ucdavis.edu/~vojin/CLASSES/EEC272/S2005/Papers/IBM360-Amdahl_april64.pdf>,
which John Levine pointed to. It says on page 92:

|Sign representations. For the fixed-point arithmetic system, which is
|binary, the two's complement representation for negative numbers was
|selected. The well-known virtues of this system are the unique
|representation of zero and the absence of recomplementation. These
|substantial advantages are augmented by several properties especially
|useful in address arithmetic, particularly in the large models, where
|address arithmetic has its own hardware. With two's complement
|notation, this indexing hardware requires no true/complement gates
|and thus works faster. In the smaller, serial models, the fact that
|high-order bits of address arithmetic can be elided without changing
|the low-order bits also permits a gain in speed. The same truncation
|property simplifies double-precision calculations. Furthermore, for
|table calculation, rounding or truncation to an integer changes all
|variables in the same direction, thus giving a more acceptable
|distribution than does an absolute-value-plus-sign representation.
|
|The established commercial rounding convention made the use of
|complement notation awkward for decimal data; therefore,
|absolute-value-plus-sign is used here.

What is "recomplementation"?

As an aside: When listing authors in alphabetic order, choose your
co-authors wisely: You have a name like "Brooks", and yet only get the
last spot out of three:-).
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Mon Jun 1 18:13:34 2026

From Newsgroup: comp.arch

According to Anton Ertl <[email protected]>:

John Levine <[email protected]> writes:

I presume you are aware that the 704 and successors did indexing by two's
complement subtraction, which is not sign-magnitude.

Looking at the 704 manual
<https://ia802904.us.archive.org/12/items/bitsavers_ibm7042466_32932660/24-6661-2_704_Manual_1955_text.pdf>,
In any case, I don't think that the IBM 704 manual documents
2s-complement representation of negative numbers for any purpose.

The documentation was a bit sparse, but see item 7 on page 17.

The manual for the 7090 which had a superset of the 704's instruction set
is more complete. See "Complement Arithmetic" on page 10 where it says

Effective addresses are always formed in the computer by the addition
of the 2's complement of the contents of the index register.

https://bitsavers.org/pdf/ibm/7090/22-6528-4_7090Manual.pdf
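
A minimal sketch of what that sentence describes, written in Python. It assumes a 15-bit address field and treats the index register as the same width; the names are invented for illustration, and this is not a model of the actual adder circuitry.

```python
ADDRESS_BITS = 15                       # assumed address field width
MASK = (1 << ADDRESS_BITS) - 1


def effective_address(y: int, index: int) -> int:
    """Form Y - (index) by adding the two's complement of the index."""
    twos_complement = (~index + 1) & MASK
    return (y + twos_complement) & MASK


print(effective_address(1000, 3))
print(effective_address(2, 5))          # wraps around modulo 2**15
```

With an index of zero the two's complement is also zero, so the address field is used unchanged.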

So why did the S/360 architects go for 2s-complement?

One speculation ...

We don't have to guess, because they told us in the Amdahl et al article
in 1964 in the IBMSJ.

Sign representations. For the fixed-point arithmetic
system, which is binary, the two’s complement representa-
tion for negative numbers was selected. The well-known
virtues of this system are the unique representation
of zero and the absence of recomplementation. These
substantial advantages are augmented by several properties
especially useful in address arithmetic, particularly in the
large models, where address arithmetic has its own hard-
ware. With two’s complement notation, this indexing
hardware requires no true/complement gates and thus
works faster. In the smaller, serial models, the fact that
high-order bits of address arithmetic can be elided with-
out changing the low-order bits also permits a gain in
speed. The same truncation property simplifies double-
precision calculations. Furthermore, for table calculation,
rounding or truncation to an integer changes all variables
in the same direction, thus giving a more acceptable
distribution than does an absolute-value-plus-sign repre-
sentation.

They go on to explain why decimal numbers are still sign magnitude,
mostly because it made rounding easier, and float because it made
normalizing easier.
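
The truncation property in that passage can be checked with a short Python sketch. The helper names and the 32-bit word are assumptions made for illustration only.

```python
def low_bits(word: int, k: int) -> int:
    """Keep only the k low-order bits of a bit pattern."""
    return word & ((1 << k) - 1)


def twos(value: int, width: int = 32) -> int:
    """Two's complement bit pattern of value in the given width."""
    return value & ((1 << width) - 1)


def sign_magnitude(value: int, width: int = 32) -> int:
    """Sign-magnitude pattern: top bit is the sign, the rest is abs(value)."""
    sign = 1 << (width - 1) if value < 0 else 0
    return sign | abs(value)


a, b, k = 1000, -3, 8
true_low = low_bits(twos(a + b), k)

# Two's complement: adding just the low bytes already gives the right low byte.
twos_low = low_bits(low_bits(twos(a), k) + low_bits(twos(b), k), k)

# Sign-magnitude: the low bytes alone are not enough, because whether to add
# or subtract depends on the sign bits at the far end of the word.
naive_sm_low = low_bits(
    low_bits(sign_magnitude(a), k) + low_bits(sign_magnitude(b), k), k
)

print(hex(true_low), hex(twos_low), hex(naive_sm_low))
```

This is the sense in which a serial or narrow adder can drop the high-order bits of an address calculation without disturbing the low-order result.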

Nor did the people designing the extended multiplier, but they got it
mostly consistent in the Vax.

This all indicates that byte-ordering decisions worked like in our
student group. The "right" choice seemed so obvious to everyone that
we did not communicate about it nor document it nor document the
reasons for it, and different contributors took different "right"
choices.

That would seem to be the case. Sometimes things are obscure at the
time and obvious in retrospect, sometimes the converse.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Mon Jun 1 18:26:58 2026

From Newsgroup: comp.arch

According to Anton Ertl <[email protected]>:

|Sign representations. For the fixed-point arithmetic system, which is
|binary, the two's complement representation for negative numbers was
|selected. The well-known virtues of this system are the unique
|representation of zero and the absence of recomplementation.

What is "recomplementation"?

To do sign magnitude arithmetic, you basically do it in one's
complement: bit flip negative operands to make them one's complement,
do the arithmetic, then bit flip the result if it's negative. That
last bit flip is recomplementation.

Straight one's complement doesn't have the recomplementation but does
have end around carry if there's a carry out of the high bit, and
shares with sign-magnitude the question of how you handle +0 and -0
which are different bit patterns but mathematically equal.
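
A small Python sketch of that sequence may help. It assumes an 8-bit word (sign bit plus 7 magnitude bits) purely for illustration, does not detect overflow, and all the names are invented; it is not a model of any particular machine.

```python
WIDTH = 8
WORD_MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)
MAG_MASK = SIGN - 1


def encode(value: int) -> int:
    """Sign-magnitude word for a small integer."""
    if abs(value) > MAG_MASK:
        raise ValueError("magnitude does not fit")
    return (SIGN if value < 0 else 0) | abs(value)


def decode(word: int) -> int:
    mag = word & MAG_MASK
    return -mag if word & SIGN else mag


def flip_if_negative(word: int) -> int:
    """Sign-magnitude <-> one's complement: flip magnitude bits of negatives."""
    return word ^ MAG_MASK if word & SIGN else word


def ones_complement_add(x: int, y: int) -> int:
    total = x + y
    if total > WORD_MASK:                  # carry out of the high bit
        total = (total & WORD_MASK) + 1    # end-around carry
    return total


def sign_magnitude_add(a: int, b: int) -> int:
    total = ones_complement_add(flip_if_negative(a), flip_if_negative(b))
    return flip_if_negative(total)         # this last flip is the recomplementation


print(decode(sign_magnitude_add(encode(5), encode(-9))))
print(hex(sign_magnitude_add(encode(3), encode(-3))))   # a negative zero word
```

In this sketch, adding 3 and -3 yields the word 0x80, a negative zero, which is a different bit pattern from 0x00 even though both decode to zero.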
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Mon Jun 1 19:56:38 2026

From Newsgroup: comp.arch

On Sun, 31 May 2026 18:40:48 +0000, MitchAlsup wrote:

Which is why his architecture is converging so rapidly.

NOT.

Indeed, it's not converging as rapidly as I'd like.

I decided that one of my 32-bit instructions really needed to be allocated twice as much opcode space as I had originally given it.

Even if that meant dropping the 24-bit short instructions to make the
room! (Now that I have paired 15-bit short instructions, which also
include short shift instructions, and short branch instructions, I felt I didn't need them as badly, and I had disliked having instructions that
were an odd number of bytes long.)

Well, after making the changes, I still had room - 1/4 as much as I had
before - for 24-bit short instructions.

I wasn't happy. So I noticed that I actually had some unused space that I could squeeze out. So now the 24-bit short instructions have 1/2 as much
space as they used to, which meant the only thing I had to give up was the ability to change the condition codes. Fine, when you want to do that, use
a full 32-bit operate instruction. So I was happy.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Mon Jun 1 20:00:51 2026

From Newsgroup: comp.arch

According to quadi <[email protected]d>:

I admit that the fact that one subtracts the index on an IBM 704 seems
very weird to me. Since the IBM 704 was made out of vacuum tubes, saving
them, instead of mere discrete transistors, let alone transistors on a
microchip with a billion of them, was probably more important.

We can guess that someone thought that counting down indexes was important
but they turned out to be wrong. Fortran stored arrays in reverse order to make indexing easier.

My guess that sign-magnitude arithmetic was regarded as more prestigious,
until IBM outgrew that notion with the 360, does have a source, although
not an IBM source.

My equally uninformed guess is that their tab machines and their
commercial computers were decimal sign magnitude, so binary sign magnitude
was a short step away. It evidently took a while to realize that while the
two's complement negative representation seemed less intuitive, the logic
was a lot simpler.

A 24-bit computer was advertised as having sign-magnitude integer
arithmetic, unlike cheaper machines which either used one's complement
integer arithmetic, or, even worse, two's complement integer arithmetic.

I think it was the DDP-24, but offhand I'm not completely sure.

To guess - or to attempt to derive intelligence from the available
information - one might think that IBM considered indexing to be less
important or less visible than ordinary integer arithmetic per se.

John Savard

--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Mon Jun 1 21:57:21 2026

From Newsgroup: comp.arch

On Mon, 01 Jun 2026 20:00:51 +0000, John Levine wrote:

My equally uninformed guess is that their tab machines and their
commercial computers were decimal sign magnitude, so binary sign
magnitude was a short step away. It evidently took a while to realize
that while the two's complement negative representation seemed less
intuitive, the logic was a lot simpler.

I agree with that. Remember, IBM made tab machines long before they got
into computers, and commercial computers, not scientific ones, were their
core business later.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Mon Jun 1 23:40:45 2026

From Newsgroup: comp.arch

On Mon, 01 Jun 2026 19:56:38 +0000, quadi wrote:

Well, after making the changes, I still had room - 1/4 as much as I had before - for 24-bit short instructions.

I wasn't happy. So I noticed that I actually had some unused space that
I could squeeze out. So now the 24-bit short instructions have 1/2 as
much space as they used to, which meant the only thing I had to give up
was the ability to change the condition codes.

When it was 1/4 as much, I was no longer able to fit in a modified form of
the Halfword Immediate instruction as an embedded 32-bit instruction
strictly confined to the opcode space of 32-bit instructions that don't
begin with 11.

But when it went up to 1/2 as much, I didn't realize at first that I had
enough space to put that back in. So I've now made the fix.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Tue Jun 2 01:22:37 2026

From Newsgroup: comp.arch

John Levine <[email protected]> posted:

According to Anton Ertl <[email protected]>:

|Sign representations. For the fixed-point arithmetic system, which is
|binary, the two's complement representation for negative numbers was
|selected. The well-known virtues of this system are the unique
|representation of zero and the absence of recomplementation.

What is "recomplementation"?

To do sign magnitude arithmetic, you basically do it in one's
complement: bit flip negative operands to make them one's complement,
do the arithmetic, then bit flip the result if it's negative. That
last bit flip is recomplementation.

In microarchitecture, you can make the registers 2^(3+n)+1 bits long.
Then simply record that the mantissa is complemented (or not) when
used as an operand. We do this all the time in microarchitecture to
save gates/time/... depending on the implementation technology
constraints.

Straight one's complement doesn't have the recomplementation but does
have end around carry if there's a carry out of the high bit, and
shares with sign-magnitude the question of how you handle +0 and -0
which are different bit patterns but mathematically equal.

--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Tue Jun 2 01:52:28 2026

From Newsgroup: comp.arch

Paul Clayton <[email protected]> posted:

On 5/30/26 3:15 PM, MitchAlsup wrote:
[snip]

It costs me only 6 gates (2 gates of delay) to decode the length of an instruction--whereas it takes 4 gates to decode S/360 2-bit code for instruction length.

Does the current version of My 66000 have three instruction
lengths or four? You mentioned before dropping "large" constants
as store operands, but I am not certain what that means.

1-word, 2-words, and 3-words.

Earlier, if I understood correctly, the longest instruction was
a store of a 64-bit constant with a 64-bit displacement,
requiring five 32-bit words.

Yes, it was. We measured its use at 0.2%.

If My 66000 has the same variability in instruction length as
S/360 (three sizes), then presumably the extra length decode
effort provides some other advantage, perhaps more flexibility
in length allocation (with a 2-bit size indicator, major opcodes
can only be allocated at 25% granularity)?

There are 64-slots in the Major Opcode, 42 are in use, 6 permanently
reserved and 16 free for the future.

There may be an advantage in having different lengths have
different detection speed.

Only 1/8th of the Major group is allowed to have VLI. And all of
these have the same 4-bit encoding--which is called operand routing
and is responsible for {inversion, negation, constant substitution}

Since My 66000 only uses the extra words for immediates, there
*may* even be an advantage to detecting some illegal opcodes and
speculating that such are from constant words.

One of the reasons for the 6 permanently reserved slots is to prevent
that.

(An illegal
opcode field can indicate an immediate, a faulting instruction,
or a skipped instruction.) Such could introduce variable timing
for parsing a given fetch chunk, but that might be handled by
reducing the number of parsed instructions emitted and inserting
the slowly parsed instructions into the start of the next group
of parsed instructions.

My 66000 is specified such that ALL unspecified patterns must be
detected and raise UNIMPLEMENTED. And not just on Major OpCodes,
every unimplemented pattern must be detected. It is better to
prevent mayhem than to allow it to damage all future implementations
{no Carry when shift-count == 0 on x86 comes to mind}.

When performing LL/SC sequences--some sequences are not allowed
and will also raise UNIMPLEMENTED. Silently doing unexpected stuff
is worse than doing nothing.

My guess is that such would just be silly complexity even at 16-
wide parsing, especially given the likely minuscule (typical)
timing benefit (if any!). Process variation probably would have
vastly more impact on frequency than trying to exploit a
statistical bias in encoding. (The concept just seemed
interesting.)

Given that register dependencies also "carry", there may be some
opportunity for "width pipelining" (like the staggered ALUs of
the Pentium 4) in parsing, extracting register names, renaming
(at least with RAT-based renaming), and even insertion into a
scheduler. If a dependency means it would not be useful to
insert the operation into a scheduler, this additional delay
might be exploited.

--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Tue Jun 2 01:57:26 2026

From Newsgroup: comp.arch

According to MitchAlsup <[email protected]d>:

What is "recomplementation"?

To do sign magnitude arithmetic, you basically do it in one's
complement: bit flip negative operands to make them one's complement,
do the arithmetic, then bit flip the result if it's negative. That
last bit flip is recomplementation.

In microarchitecture, you can make the registers 2^(3+n)+1 bits long.
Then simply record that the mantissa is complemented (or not) when
used as an operand. We do this all the time in microarchitecture to
save gates/time/... depending on the implementation technology
constraints.

You can do that now, not so much when building computers out of vacuum
tubes in the 1950s.

Also, that works OK for registers, but at some point you need to
store values in memory at which point I'd think you'd need to do
the recomplementing.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From Stefan Monnier@[email protected] to comp.arch on Mon Jun 1 14:51:14 2026

From Newsgroup: comp.arch

quadi [2026-05-27 18:19:49] wrote:

On Wed, 27 May 2026 10:59:31 -0400, Stefan Monnier wrote:

MitchAlsup [2026-05-26 20:54:30] wrote:

Encrypt the debug information (and put it in a
{1234-5678-9101-1121-...} folder) so that only the owner (not
licensee) of the code can debug it.

I resent that. All code should be Free Software.

[...]

However, people have the right to the fruit of their labors. To give them away for free is generous, but it should remain a personal choice.

You don't need to encrypt the debug information of your programs in
order to earn a decent living.

=== Stefan
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Tue Jun 2 06:19:05 2026

From Newsgroup: comp.arch

On Mon, 01 Jun 2026 14:51:14 -0400, Stefan Monnier wrote:

quadi [2026-05-27 18:19:49] wrote:

However, people have the right to the fruit of their labors. To give
them away for free is generous, but it should remain a personal choice.

You don't need to encrypt the debug information of your programs in
order to earn a decent living.

Perhaps. But if someone can write a program that is so useful that it
could make him wealthy beyond the dreams of avarice, who am I to judge him
for seeking to maximize its revenue potential?

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Tue Jun 2 06:20:53 2026

From Newsgroup: comp.arch

On Mon, 01 Jun 2026 19:56:38 +0000, quadi wrote:

I decided that one of my 32-bit instructions really needed to be
allocated twice as much opcode space as I had originally given it.

There was another 32-bit instruction that was also short of opcode space -
but this time, I didn't even have to extensively reorganize the opcodes of other instructions in order to remedy that.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Tue Jun 2 07:47:36 2026

From Newsgroup: comp.arch

On Tue, 02 Jun 2026 06:19:05 +0000, quadi wrote:

On Mon, 01 Jun 2026 14:51:14 -0400, Stefan Monnier wrote:

quadi [2026-05-27 18:19:49] wrote:

However, people have the right to the fruit of their labors. To give
them away for free is generous, but it should remain a personal
choice.

You don't need to encrypt the debug information of your programs in
order to earn a decent living.

Perhaps. But if someone can write a program that is so useful that it
could make him wealthy beyond the dreams of avarice, who am I to judge
him for seeking to maximize its revenue potential?

Perhaps this answer was too casual, and a more detailed and serious answer
is needed.

To say that one doesn't "need" to encrypt debug information "to earn a
decent living" is true enough, but you're also implying that this is all anyone has the right to expect.

To me, this implies a mindset that says that everyone should remain a
laborer, and that it's wrong to transition to rent-seeking.

I don't share that view. While there are excesses in the free-enterprise system as we have it now, I have no quarrel with its basic principles. I
see the ownership of property, including intellectual property, and
including capital property, as fully legitimate.

So a person can write a program once and make a living from selling copies
of it, instead of just from providing services to its users. If the
program is good, there's nothing illegitimate about that. And to defend
the program against piracy and reverse-engineering is also basically legitimate.

However, to encrypt debug information is strange. Why would a copy of the debug information in any form be included with distributed copies of
software? I suppose it could be there in an encrypted form to be used in conjunction with remote diagnostic tools, in the case of software which
has to be maintained on customer premises, unlike mass-market applications.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From jgd@[email protected] (John Dallman) to comp.arch on Tue Jun 2 14:01:40 2026

From Newsgroup: comp.arch

In article <10uks4f$1dqo$[email protected]>, [email protected] (John Levine)
wrote:

Having looked into this in some detail, both when IBM used
bigendian order on S/360 and DEC used little-endian on the
PDP-11, neither documented the reasons for the byte order
choice at all. Not even a little bit.

Brooks and Blaauw, two of the S/360 architects, consider the subject in
their much later book _Computer Architecture_, on p. 99:

"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are
numbered from left to right, following the convention of writing
in Western culture."

That explains why IBM mainframes number the most significant bit as zero,
the opposite way around to all the platforms I've worked on, which number
the least significant bit as zero.

I've always found the latter convention helpful for doing hex arithmetic
in my head or on paper. I _think_ big-endian SPARC, MIPS and POWER all
regard the least significant bit as bit zero, but I can no longer easily
check that.
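
For anyone converting between the two numbering conventions, a minimal Python sketch follows. The function names are invented for illustration, and the word width has to be supplied because the IBM-style numbering depends on it.

```python
def msb0_to_lsb0(bit: int, width: int = 32) -> int:
    """Convert an MSB-is-bit-0 number to an LSB-is-bit-0 number."""
    if not 0 <= bit < width:
        raise ValueError("bit number out of range for this width")
    return width - 1 - bit


def weight_lsb0(bit: int) -> int:
    """With LSB-0 numbering, bit n always has weight 2**n."""
    return 1 << bit


def weight_msb0(bit: int, width: int = 32) -> int:
    """With MSB-0 numbering, the weight also depends on the word width."""
    return 1 << msb0_to_lsb0(bit, width)


print(weight_msb0(0, 32), weight_msb0(0, 64), weight_lsb0(0))
```

The width dependence is what makes the least-significant-bit-as-zero convention convenient for hex arithmetic: bit n means the same power of two on every word size.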

John
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Tue Jun 2 14:54:24 2026

From Newsgroup: comp.arch

On Tue, 02 Jun 2026 14:00:00 +0100, John Dallman wrote:

Brooks and Blaauw, two of the S/360 architects, consider the subject in
their much later book _Computer Architecture_, on p. 99:

"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are numbered
from left to right, following the convention of writing in Western
culture."

On the other hand, Arabic is written from right to left, and yet the Arabs also write numbers with the most significant digit on the left. Hence, little-endian would seem more logical to them for the same reason.

Since this is, therefore, a cultural matter, and not something universal,
like the laws of physics or mathematics, we can't tell what a man from
Mars would prefer.

So, while they can call it "the more logical convention", this isn't
something everyone would agree with. The famous article on the subject,
"On Holy Wars and a Plea for Peace", by Danny Cohen from 1981, thus argued
that which standard was chosen mattered much less than everyone choosing
the same one for compatibility, but he wasn't shy about expressing his
personal preference for little-endian, referring to those who practiced
big-endian as "outlaws".

The case for big-endian is...

It makes computers easier to understand for most people in Western
societies.
It makes core dumps easier to read.
Multi-precision compare is faster.

The case for little-endian is...

People don't need to poke around in core dumps or even program in
assembler very much these days. We have compilers.
Multi-precision add and subtract is faster, and it's much more common than compare.

At least, those are the usual arguments, and from that set of arguments,
it does seem like there's little difference and it's just a personal preference.
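
The two speed arguments can be sketched in a few lines of Python. This is only an illustration with invented function names, working on unsigned numbers held as byte strings; the point is the direction each loop walks through memory.

```python
def add_little_endian(a: bytes, b: bytes) -> bytes:
    """Add equal-length unsigned numbers stored least significant byte first."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    out = bytearray(len(a))
    carry = 0
    for i in range(len(a)):        # ascending addresses = ascending significance
        total = a[i] + b[i] + carry
        out[i] = total & 0xFF
        carry = total >> 8         # carry flows toward higher addresses
    return bytes(out)              # a final carry out is simply dropped here


def compare_big_endian(a: bytes, b: bytes) -> int:
    """Compare equal-length unsigned numbers stored most significant byte first."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    for x, y in zip(a, b):         # ascending addresses = descending significance
        if x != y:
            return -1 if x < y else 1   # decided at the first differing byte
    return 0


total = add_little_endian((0x01FF).to_bytes(2, "little"), (0x0001).to_bytes(2, "little"))
print(int.from_bytes(total, "little"))
print(compare_big_endian(bytes.fromhex("0200"), bytes.fromhex("01ff")))
```

In each case the operation can start with the byte at the lowest address, which is the first one fetched when memory is read in ascending order.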

But, as I've noted, I've finally come up with a more compelling
justification for big-endian. It still assumes that, if you're processing
text data, the text will come from a society that writes from left to
right.

Think of text records that include words and numbers in character format.
Like

00134700 John Smith
00250000 Richard Roe

and so on.

The numerical portion has the most significant digit on the left, the alphabetic portion has the first character on the left. Thus, these
characters will be stored in memory at succeeding addresses from left to right; the most significant digit is stored at the lower address.

So numbers as text strings are stored in big-endian order.

That means that it's simplest to convert a text number to a packed decimal number that's in the same order.

And an ALU that performs binary arithmetic can be modified to also perform decimal arithmetic by changing when carries take place out of each group
of four bits. If that's done, binary and decimal numbers ought to have the same endianness, so that one doesn't need two load and store instructions
for the accumulator or the registers.
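
Here is a rough Python sketch of decimal addition done that way, with the carry out of each group of four bits taken at ten instead of sixteen. It assumes unsigned packed decimal with no sign nibble, stored most significant byte first; the function name is invented for illustration.

```python
def bcd_add(a: bytes, b: bytes) -> bytes:
    """Add two equal-length unsigned packed-decimal numbers (big-endian)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    out = bytearray(len(a))
    carry = 0
    for i in range(len(a) - 1, -1, -1):    # least significant byte is last
        lo = (a[i] & 0x0F) + (b[i] & 0x0F) + carry
        carry = 1 if lo > 9 else 0         # decimal carry out of the low nibble
        if carry:
            lo -= 10
        hi = (a[i] >> 4) + (b[i] >> 4) + carry
        carry = 1 if hi > 9 else 0         # decimal carry out of the high nibble
        if carry:
            hi -= 10
        out[i] = (hi << 4) | lo
    if carry:
        raise OverflowError("result does not fit in the operand length")
    return bytes(out)


print(bcd_add(bytes.fromhex("00134700"), bytes.fromhex("00250000")).hex())
```

The two numeric fields from the sample records add up nibble by nibble without ever being converted to binary, and the result stays in the same byte order as the text it came from.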

I know some computers, regardless of endianness, number the least
significant bit as one instead of zero. Either way, this convention is considered to make sense for wiring a 12-bit DAC to a 16-bit data bus,
since now each number corresponds to the same power of two no matter how
wide your bus is.

Of course, one can argue for considering fixed-point numbers as fractions
in [-1,1), but that is needed far less often than using them as integers.
A few computers were designed this way; it meant that integers had to be represented with a wasted bit, or that a shift was usually needed after a multiply, so it did not get popular.

The IBM 360 made bit numbering consistent not just to make reading the
manuals easier, but out of habit - since their most recent previous
computer with a 64-bit word was the STRETCH (or 7030)... which had the
ability to do bit-addressing as a prominent feature. There, consistency actually mattered.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From Terje Mathisen@[email protected] to comp.arch on Tue Jun 2 17:50:38 2026

From Newsgroup: comp.arch

Stefan Monnier wrote:

quadi [2026-05-27 18:19:49] wrote:

On Wed, 27 May 2026 10:59:31 -0400, Stefan Monnier wrote:

MitchAlsup [2026-05-26 20:54:30] wrote:

Encrypt the debug information (and put it in a
{1234-5678-9101-1121-...} folder) so that only the owner (not
licensee) of the code can debug it.

I resent that. All code should be Free Software.

[...]

However, people have the right to the fruit of their labors. To give them away for free is generous, but it should remain a personal choice.

You don't need to encrypt the debug information of your programs in
order to earn a decent living.

I'd say rather the opposite!

In the current environment where every language is expected to be
compatible with a generic IDE like Visual Studio Code, via open source interface specifications, having a proprietary debug format seems like a
good way to strongly limit your potential customer base.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
--- Synchronet 3.22a-Linux NewsLink 1.2

From scott@[email protected] (Scott Lurndal) to comp.arch on Tue Jun 2 16:13:28 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

The case for big-endian is...

<snip>

It makes core dumps easier to read.

Actually the program that analyzes the core dump can handle
endianness without the programmer even being aware of it.

It's been more than half a century since programmers looked at raw
memory dumps.....

--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Tue Jun 2 17:04:55 2026

From Newsgroup: comp.arch

John Levine <[email protected]> posted:

According to MitchAlsup <[email protected]d>:

What is "recomplementation"?

To do sign magnitude arithmetic, you basically do it in one's
complement: bit flip negative operands to make them one's complement,
do the arithmetic, then bit flip the result if it's negative. That
last bit flip is recomplementation.
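
The scheme just described can be sketched in C as follows. This assumes a toy 8-bit word with the sign in bit 7 and a 7-bit magnitude; the names sm_to_oc, oc_to_sm, oc_add and sm_add are invented for the example, and overflow is not handled:

```c
#include <stdint.h>

/* Sign-magnitude to one's complement: for a negative operand, flip the
 * seven magnitude bits (the sign bit is already set). */
static uint8_t sm_to_oc(uint8_t sm)
{
    return (sm & 0x80u) ? (uint8_t)(sm ^ 0x7Fu) : sm;
}

/* One's complement back to sign-magnitude: the same flip, applied to a
 * negative result. This last step is the recomplementation. */
static uint8_t oc_to_sm(uint8_t oc)
{
    return (oc & 0x80u) ? (uint8_t)(oc ^ 0x7Fu) : oc;
}

/* One's complement addition with end-around carry. */
static uint8_t oc_add(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    sum = (sum & 0xFFu) + (sum >> 8);   /* add the carry-out back in */
    return (uint8_t)sum;
}

/* Sign-magnitude addition built from the three steps above. */
uint8_t sm_add(uint8_t a, uint8_t b)
{
    return oc_to_sm(oc_add(sm_to_oc(a), sm_to_oc(b)));
}
```

With this encoding, adding +5 (0x05) and -3 (0x83) gives +2 (0x02). Adding +5 and -5 gives 0x80, i.e. negative zero, which a real machine would have to deal with somehow.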

In microarchitecture, you can make the registers 2^(3+n)+1 bits long.
Then simply record that the mantissa is complemented (or not) when
used as an operand. We do this all the time in microarchitecture to
save gates/time/... depending on the implementation technology constraints.

You can do that now, not so much when building computers out of vacuum
tubes in the 1950s.

Also, that works OK for registers, but at some point you need to
store values in memory at which point I'd think you'd need to do
the recomplementing.

Sure, but there is plenty of time to re-complement when storing
the value.

--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Tue Jun 2 17:09:26 2026

From Newsgroup: comp.arch

[email protected] (John Dallman) posted:

In article <10uks4f$1dqo$[email protected]>, [email protected] (John Levine) wrote:

Having looked into this in some detail, both when IBM used
bigendian order on S/360 and DEC used little-endian on the
PDP-11, neither documented the reasons for the byte order
choice at all. Not even a little bit.

Brooks and Blaauw, two of the S/360 architects, consider the subject in
their much later book _Computer Architecture_, on p. 99:

"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are
numbered from left to right, following the convention of writing
in Western culture."

That explains why IBM mainframes number the most significant bit as zero,
the opposite way around to all the platforms I've worked on, which number
the least significant bit as zero.

I've always found the latter convention helpful for doing hex arithmetic
in my head or on paper. I _think_ big-endian SPARC, MIPS and POWER all
regard the least significant bit as bit zero, but I can no longer easily check that.

Do you want to isolate the register bit as::

bit = ((register) >> (register_bits - 1 - bit)) & 1;

or

bit = ((register) >> bit) & 1;
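
Spelled out as compilable C, the two conventions might look like this. The function names are hypothetical, and the register is modeled as a uint64_t holding a value that is reg_bits wide. With MSB-0 numbering the shift count has to be width minus one minus the bit number, so the register width leaks into every bit access:

```c
#include <stdint.h>

/* LSB-0 numbering: bit 0 is the least significant bit.
 * The register width is not needed. */
unsigned get_bit_lsb0(uint64_t reg, unsigned bit)
{
    return (unsigned)((reg >> bit) & 1u);
}

/* MSB-0 numbering: bit 0 is the most significant bit of a register
 * that is reg_bits wide (reg_bits <= 64, bit < reg_bits). */
unsigned get_bit_msb0(uint64_t reg, unsigned reg_bits, unsigned bit)
{
    return (unsigned)((reg >> (reg_bits - 1u - bit)) & 1u);
}
```

The least significant bit is always bit 0 in the first form, but is bit 31 or bit 63 in the second, depending on the width.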

John

--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Tue Jun 2 17:13:40 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Tue, 02 Jun 2026 14:00:00 +0100, John Dallman wrote:

Brooks and Blaauw, two of the S/360 architects, consider the subject in their much later book _Computer Architecture_, on p. 99:

"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are numbered
from left to right, following the convention of writing in Western
culture."

On the other hand, Arabic is written from right to left, and yet the Arabs also write numbers with the most significant digit on the left. Hence, little-endian would seem more logical to them for the same reason.

Chinese and Japanese are written top to bottom ...

Since this is, therefore, a cultural matter, and not something universal, like the laws of physics or mathematics, we can't tell what a man from
Mars would prefer.

Middle endian!! Start in the middle and then one step left followed by
one step right--more or less like PDP-11 FP.

So, while they can call it "the more logical convention", this isn't something everyone would agree with. The famous article on the subject,
"On Holy Wars and a Plea for Peace", by Danny Cohen from 1981 thus termed
it as being much less important which standard was chosen than for
everyone to choose the same one for compatibility, but he wasn't shy about expressing his personal preference for little-endian, referring to those
who practiced big-endian as "outlaws".

The case for big-endian is...

It makes computers easier to understand for most people in Western societies.

Just core dumps--they can be read without dumping hex on one side and characters on the other.

It makes core dumps easier to read.
Multi-precision compare is faster.

Not up to 256-bits.

The case for little-endian is...

It won.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Tue Jun 2 15:59:33 2026

From Newsgroup: comp.arch

[email protected] (John Dallman) writes:

Brooks and Blaauw, two of the S/360 architects, consider the subject in
their much later book _Computer Architecture_, on p. 99:

"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are
numbered from left to right, following the convention of writing
in Western culture."

That explains why IBM mainframes number the most significant bit as zero,
the opposite way around to all the platforms I've worked on, which number
the least significant bit as zero.

I've always found the latter convention helpful for doing hex arithmetic
in my head or on paper. I _think_ big-endian SPARC, MIPS and POWER all
regard the least significant bit as bit zero, but I can no longer easily check that.

Power(PC) gives the MSB bit of GPRs the number 0 and the LSB bit
number 63. It's not clear how that works in 32-bit implementations,
and if it plays a role at all. AFAICS, it plays no role (no
instructions refer to the bit number as defined in the manual).

The 68020 is bit-little-endian and byte-big-endian, and it has
bitfield instructions, and from what I have read, this has led to
problems (e.g., consider what to do if you have an array of 17-bit
fields: how do you access the nth element of the array?).
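
To make the 17-bit array question concrete, here is a sketch of the case where everything is numbered in one direction: bytes are big-endian and bit offsets count from the most significant bit of the first byte. This is a plain C model, not any particular machine's bit-field instruction, and read_field_be and read_elem17 are invented names. Element n then simply starts at bit offset 17*n:

```c
#include <stddef.h>
#include <stdint.h>

/* Read a width-bit field (width <= 32) starting at absolute bit offset
 * bitpos, where offset 0 is the most significant bit of buf[0] and
 * offsets increase toward less significant bits and higher addresses. */
uint32_t read_field_be(const uint8_t *buf, size_t bitpos, unsigned width)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < width; i++) {
        size_t p = bitpos + i;
        unsigned bit = (buf[p / 8] >> (7 - p % 8)) & 1u;
        value = (value << 1) | bit;
    }
    return value;
}

/* nth element of a packed array of 17-bit fields. */
uint32_t read_elem17(const uint8_t *buf, size_t n)
{
    return read_field_be(buf, n * 17, 17);
}
```

When the bit numbers within a byte run the other way from the byte addresses, the offset of element n is no longer a single multiply, which is presumably the kind of problem meant here.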

The 88000 is bit-little-endian and byte-big-endian (Section 2.2.3 of
the manual is quite clear about this at the start, and then discusses
the byte-little-endian option; AFAIK all 88000 machines are
byte-big-endian). It has bit-field instructions that specify the
bitfield as an offset from the LSB of the register and a width. Given
that the bitfield instructions work on registers, and the load
instructions require alignment, I don't expect the difference in order
to cause many problems; maybe confusion if you try to deal with bit
fields that cross words.

MIPS also is bit-little-endian; there are byte-big-endian and byte-little-endian machines with MIPS CPUs. MIPS64r2 has bit-field instructions that use little-endian bit order, and before MIPS64r6 it
also required alignment (MIPS64r6 allows either unaligned support or
trapping on unaligned access). With unaligned support and big-endian
byte order, problems like on the 68020 may arise.

SPARCv9 <https://www.cs.utexas.edu/~novak/sparcv9.pdf> is
bit-little-endian, and "uses big-endian byte order by default"
(3.2.1.2) and I am not aware of any little-endian SPARC machine.
AFAICS SPARC does not have instructions that use bit numbers, so the
numbering of bits in the manual does not have any effect on the
instruction set and programming.

My take is, that in a world with different access widths (e.g.,
accessing a register for a 32-bit value or a 64-bit value),
bit-big-endian is a bad idea. And we already see that in the IBM 704
manual which gives its most significant bits in its 38-bit accumulator
the names (starting from the most significant)

S (sign, maybe out of contest, but shown to the left of Q)
Q (not present in memory)
P (not present in memory; this is the carry bit for the ACL instruction)
1

If they had used bit-little-endian (and started at 0 instead of 1),
they could have called P 35, Q 36, and S could be called 37 (but given
that it is a sign/magnitude machine, S is ok).

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@[email protected] to comp.arch on Tue Jun 2 18:25:12 2026

From Newsgroup: comp.arch

MitchAlsup <[email protected]d> schrieb:

quadi <[email protected]d> posted:

On Tue, 02 Jun 2026 14:00:00 +0100, John Dallman wrote:

Brooks and Blaauw, two of the S/360 architects, consider the subject in
their much later book _Computer Architecture_, on p. 99:

"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are numbered from left to right, following the convention of writing in Western
culture."

On the other hand, Arabic is written from right to left, and yet the Arabs also write numbers with the most significant digit on the left. Hence,
little-endian would seem more logical to them for the same reason.

Chinese and Japanese are written top to bottom ...

In classical times, yes, but modern texts are written left to right.

Since this is, therefore, a cultural matter, and not something universal, like the laws of physics or mathematics, we can't tell what a man from
Mars would prefer.

Middle endian!! Start in the middle and then one step left followed by
one step right--more or less like PDP-11 FP.

Very Turing machine-like.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Stephen Fuld@[email protected] to comp.arch on Tue Jun 2 11:44:46 2026

From Newsgroup: comp.arch

On 6/2/2026 10:13 AM, MitchAlsup wrote:

Since this is, therefore, a cultural matter, and not something universal,
like the laws of physics or mathematics, we can't tell what a man from
Mars would prefer.

Middle endian!! Start in the middle and then one step left followed by
one step right--more or less like PDP-11 FP.

But then you have the "discussion" with those who want to start with a
step to the right, followed by one to the left! :-). And that doesn't
even address (pun intended), the issue of when you have an even number
of bits/bytes/words, do you start with the one to the right of the
"middle" or the left. :-)
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Tue Jun 2 19:07:38 2026

From Newsgroup: comp.arch

On Tue, 02 Jun 2026 18:25:12 +0000, Thomas Koenig wrote:

MitchAlsup <[email protected]d> schrieb:

Chinese and Japanese are written top to bottom ...

In classical times, yes, but modern texts are written left to right.

In Taiwan and Hong Kong, books written top to bottom, and then right to
left, and bound like books written right to left, were still being printed
in the 1960s.

As far as endianness is concerned, however, the Chinese wrote numbers with
the most significant digit on the top, so for purposes of this discussion, Chinese was big-endian even traditionally.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Tue Jun 2 19:10:51 2026

From Newsgroup: comp.arch

On Tue, 02 Jun 2026 17:50:38 +0200, Terje Mathisen wrote:

In the current environment where every language is expected to be
compatible with a generic IDE like Visual Studio Code, via open source interface specifications, having a proprietary debug format seems like a
good way to strongly limit your potential customer base.

You appear to have understood his post in a different way than I did.

I wasn't thinking of the kind of debug information provided by a compiler.

I was thinking of leaving debug information in when one was distributing software to customers.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Tue Jun 2 19:37:17 2026

From Newsgroup: comp.arch

Stephen Fuld <[email protected]d> posted:

On 6/2/2026 10:13 AM, MitchAlsup wrote:

Since this is, therefore, a cultural matter, and not something universal, like the laws of physics or mathematics, we can't tell what a man from
Mars would prefer.

Middle endian!! Start in the middle and then one step left followed by
one step right--more or less like PDP-11 FP.

But then you have the "discussion" with those who want to start with a
step to the right, followed by one to the left! :-). And that doesn't
even address (pun intended), the issue of when you have an even number
of bits/bytes/words, do you start with the one to the right of the
"middle" or the left. :-)

You could do random endian where an LFSR based on the address sequence determines MEL or MER.

--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Tue Jun 2 22:00:06 2026

From Newsgroup: comp.arch

According to John Dallman <[email protected]>:

In article <10uks4f$1dqo$[email protected]>, [email protected] (John Levine) wrote:

Having looked into this in some detail, both when IBM used
bigendian order on S/360 and DEC used little-endian on the
PDP-11, neither documented the reasons for the byte order
choice at all. Not even a little bit.

Brooks and Blaauw, two of the S/360 architects, consider the subject in
their much later book _Computer Architecture_, on p. 99:

"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are
numbered from left to right, following the convention of writing
in Western culture."

I'd forgotten about that. Given who they were it's not surprising they found their preconceptions to be "more logical".

On the next page they said (written in 1997):

"Unlike Swift's, the computer Endian controversy is not pointless. The Little Endian design has many complications in use; we much prefer the
Big Endian. Having two active conventions is very painful. Several recent
Big Endian RISC computers, including the MIPS, the Motorola 88000, and
the Intel i860 provide a data-movement operation that can perform the Big Endian-Little Endian permutation [Hennessy and Patterson, 1990]. We predict
that Little Endian addressing will die out, just as decimal addressing did."

Uh huh.

A few years later IBM added LOAD REVERSED and STORE REVERSED to z/Architecture and retroactively to S/390 mode on Z machines.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Wed Jun 3 00:21:03 2026

From Newsgroup: comp.arch

On Tue, 02 Jun 2026 22:00:06 +0000, John Levine wrote:

"We predict that Little Endian addressing will die out, just as
decimal addressing did."

Uh huh.

A few years later IBM added LOAD REVERSED and STORE REVERSED to z/Architecture and retroactively to S/390 mode on Z machines.

I certainly would not hazard such a bold prediction.

The prediction, though, is not hard to understand. If big-endian is more straightforward and easier to understand, but just costs an extra
transistor here and there, then in the age of billion-transistor chips,
why wouldn't it die out?

However, just because something is going to die out _eventually_ doesn't
mean it will do so any time soon. Interoperating and communicating with
that little-endian monster IBM created in 1981 is going to be important
for generating revenue for decades to come.

So the existence of load reversed and store reversed instructions doesn't prove they were wrong... even though I still would not dare to say they
are definitely right. I just think it's not unreasonable to think as they
did, provided you account for a sufficiently long timeframe.

Of course, given a sufficiently long timeframe, we might all be speaking Arabic, in which case little-endian would be the logical choice. Although
that would require fossil fuels being important for longer than the
climate could sustain it...

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Wed Jun 3 00:37:06 2026

From Newsgroup: comp.arch

On Tue, 02 Jun 2026 15:59:33 +0000, Anton Ertl wrote:

My take is, that in a world with different access widths (e.g.,
accessing a register for a 32-bit value or a 64-bit value),
bit-big-endian is a bad idea.

There is an argument for that.

But if a computer does have bit-field instructions, I tend to consider it insane for it to number bits in the opposite direction of its endianness.

Even though the problem isn't necessarily all that bad; as long as the bit fields are genuinely contiguous, then only the names of the bits in a byte
are encoded funny.

So if a 32 bit number is stored in bytes 5001, 5000, 4999, and 4998, from
most significant byte to least significant, and you specify a 9-bit field starting in bit 6 of byte 4999... and the bits are numbered in big-endian order... the same thing should happen as if you specified bit 1 of byte
4999 on a little-endian machine with little-endian bit numbering. You get
nine bits, the seven least significant of which are bits 0 through 6 of
byte 4999, and the remaining two of which are bits 6 and 7 of byte 5000.

In the more common case, where the machine is big-endian, and it is the
bit numbering that's little-endian, specifying a nine-bit field starting
in bit 6 of byte 4999 would give you bits 6 through 0 of byte 4999,
followed by bits 7 and 6 of byte 5000. Here, though, you're going from
most significant to least significant, but in both cases you're moving
forward to higher addresses, just as you do when accessing multi-byte
numbers with a byte address.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Wed Jun 3 00:41:43 2026

From Newsgroup: comp.arch

On Wed, 20 May 2026 05:38:07 +0000, Anton Ertl wrote:

* The last descendent of the PDP-11 was canceled long before the most
prominent big-endian architecture (SPARC) was canceled, and long
before Power switched its Linux support to little-endian, so the
PDP-11 had little, if any, influence on the outcome.

The long decline of big-endian happened later.

But there wouldn't have _been_ little-endian architectures to out-compete big-endian if it hadn't been for the PDP-11. That was where the idea of little-endian got started.

It wasn't the first machine to store two-word numbers least-significant-
word first. But it was the first machine to be little-endian in any other
way but that. Little-endian, as something more than an ad-hoc way to
handle one case of double-precision integers, wasn't a thing until the
PDP-11 came along.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Wed Jun 3 00:55:35 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Tue, 02 Jun 2026 17:50:38 +0200, Terje Mathisen wrote:

In the current environment where every language is expected to be compatible with a generic IDE like Visual Studio Code, via open source interface specifications, having a proprietary debug format seems like a good way to strongly limit your potential customer base.

You appear to have understood his post in a different way than I did.

I wasn't thinking of the kind of debug information provided by a compiler.

I was thinking of leaving debug information in when one was distributing software to customers.

Yes, you the vendor do not want random customer debugging the code,
however, you want the ability to debug the code that was distributed
on whatever medium on customer's system(s)--

AND you want to debug one copy of the running code while others are using
other processes running the code under normal use.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Wed Jun 3 01:03:26 2026

From Newsgroup: comp.arch

quadi <[email protected]d> posted:

On Tue, 02 Jun 2026 22:00:06 +0000, John Levine wrote:

"We predict that Little Endian addressing will die out, just as
decimal addressing did."

Uh huh.

A few years later IBM added LOAD REVERSED and STORE REVERSED to z/Architecture and retroactively to S/390 mode on Z machines.

I certainly would not hazard such a bold prediction.

The prediction, though, is not hard to understand. If big-endian is more straightforward and easier to understand, but just costs an extra
transistor here and there, then in the age of billion-transistor chips,
why wouldn't it die out?

Linux has gone all in on LE. So, if you want to start a HW company,
you are forced to either choose LE or develop your own Operating
System (with all the accoutrement involved.)

However, just because something is going to die out _eventually_ doesn't mean it will do so any time soon. Interoperating and communicating with
that little-endian monster IBM created in 1981 is going to be important
for generating revenue for decades to come.

The whole internet is Dual-endian !! With part LE and other parts BE.

So the existence of load reversed and store reversed instructions doesn't prove they were wrong... even though I still would not dare to say they
are definitely right. I just think it's not unreasonable to think as they did, provided you account for a sufficiently long timeframe.

A byte reverse instruction will also work.
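
For reference, a byte reverse on a 32-bit value can be written in portable C as below. The name byte_reverse32 is just a label for this example, not any particular instruction or library call; compilers commonly recognize this pattern, but that is not guaranteed:

```c
#include <stdint.h>

/* Reverse the order of the four bytes in a 32-bit value, converting
 * between big-endian and little-endian representations. */
uint32_t byte_reverse32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}
```

A normal load followed by this operation has the same effect as a load-reversed instruction, at the cost of one extra operation.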

Of course, given a sufficiently long timeframe, we might all be speaking Arabic, in which case little-endian would be the logical choice. Although that would require fossil fuels being important for longer than the
climate could sustain it...

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Wed Jun 3 01:52:13 2026

From Newsgroup: comp.arch

On Wed, 03 Jun 2026 01:03:26 +0000, MitchAlsup wrote:

Linux has gone all in on LE.

It's true that Linux doesn't support the big-endian version of RISC-V. But
it runs on other big-endian architectures.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From EricP@[email protected] to comp.arch on Wed Jun 3 08:29:27 2026

From Newsgroup: comp.arch

On 2026-Jun-02 13:13, MitchAlsup wrote:

Middle endian!! Start in the middle and then one step left followed by
one step right--more or less like PDP-11 FP.

We're doing the time warp... again!

--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@[email protected] to comp.arch on Wed Jun 3 13:54:01 2026

From Newsgroup: comp.arch

quadi <[email protected]d> schrieb:

On Tue, 02 Jun 2026 22:00:06 +0000, John Levine wrote:

"We predict that Little Endian addressing will die out, just as
decimal addressing did."

Uh huh.

A few years later IBM added LOAD REVERSED and STORE REVERSED to
z/Architecture and retroactively to S/390 mode on Z machines.

I certainly would not hazard such a bold prediction.

The prediction, though, is not hard to understand. If big-endian is more straightforward and easier to understand, but just costs an extra
transistor here and there, then in the age of billion-transistor chips,
why wouldn't it die out?

It causes problems with badly-written software.

Consider the following test program:

#include <stdio.h>

void printit(void *p)
{
    /* Looks only at the byte stored at the lowest address of the object. */
    char *c = p;
    printf ("Value is: %d\n", *c);
}

int main()
{
    int i = 42;
    printit (&i);
    return 0;
}

On a little-endian system, this prints
Value is: 42

On a big-endian system, this prints
Value is: 0

If software designers play games with this sort of thing
(knowingly or unknowingly), then software that will run
on a little-endian system will not run on a big-endian
system.
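
One way to avoid the trap, sketched here with made-up function names: ask for the low-order byte arithmetically when that is what is wanted, and copy from memory only when the byte at the lowest address is really what is meant.

```c
#include <string.h>

/* Least significant byte of the value. The result is the same on
 * little-endian and big-endian systems. */
unsigned low_byte(unsigned value)
{
    return value & 0xFFu;
}

/* Byte stored at the lowest address of the object. The result
 * depends on the byte order of the machine. */
unsigned first_byte_in_memory(unsigned value)
{
    unsigned char b;
    memcpy(&b, &value, 1);
    return b;
}

/* Run-time probe built on the function above. */
int is_little_endian(void)
{
    return first_byte_in_memory(1u) == 1u;
}
```

low_byte(42) is 42 everywhere, whereas first_byte_in_memory(42) reproduces the 42 versus 0 difference shown above, assuming unsigned is wider than one byte.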
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Wed Jun 3 15:33:53 2026

From Newsgroup: comp.arch

On Wed, 03 Jun 2026 13:54:01 +0000, Thomas Koenig wrote:

It causes problems with badly-written software.

I don't see that as a fault of big-endian.

One has to exert oneself to write a program equivalent to

      INTEGER*2 IP
      EQUIVALENCE (I, IP)
      I = 42
      WRITE(6,11) IP
      STOP
   11 FORMAT(' ', 'VALUE IS: ', I3)
      END

and so the fact that it will print

VALUE IS: 0

is not a bug, it's exactly what one should expect.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Wed Jun 3 17:36:45 2026

From Newsgroup: comp.arch

According to quadi <[email protected]d>:

On Wed, 03 Jun 2026 13:54:01 +0000, Thomas Koenig wrote:

It causes problems with badly-written software.

I don't see that as a fault of big-endian.

Agreed. There were plenty of bugs porting BSD software from
the little-endian Vax to big-endian 68000 series. Buggy software
is buggy software.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Wed Jun 3 17:13:08 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

But there wouldn't have _been_ little-endian architectures to out-compete big-endian if it hadn't been for the PDP-11. That was where the idea of little-endian got started.

The Datapoint 2200 would most likely have had the little-endian
encoding if the PDP-11 had been big-endian, because it was developed independently and in parallel. The 8080's call and ret instructions
were the first where little-endian extended beyond instruction
encoding, but I expect that, once you have jump instructions with
little-endian targets, you also want return addresses to be stored in little-endian byte order (simplifies implementation). And from there
it goes to the 8086 which has 16-bit data memory accesses, and where
you also stick with little-endian if you already have done so for jump
targets and return addresses. And following the
8086, IA-32 and AMD64 would have been little-endian, too.

The 6502 would have been little-endian if the PDP-11 had been
big-endian, for technical reasons. They ignored even the big-endian
byte order of its predecessor, the 6800. And following the 6502, the
ARM would have been little-endian even if the PDP-11 had been
big-endian.

So the architectures that dominate now would be little-endian even if
the PDP-11 had been big-endian. Would they have been less successful
if the PDP-11 had been big-endian? I doubt it. At a point around
1990, most of the Unix market was big-endian, based on the 68000 being big-endian, and it seemed that if any byte order would win, it would
be big-endian. IA-32 and VAX were expected to die because they were
CISCs, and ARM was just a minor player in the RISC market at the time.

And yet, IA-32/AMD64 and ARM's instruction sets outlived all the
highly successful RISCs of the time. This would also have happened if
the PDP-11 and the VAX had been big-endian.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Wed Jun 3 17:44:20 2026

From Newsgroup: comp.arch

quadi <[email protected]d> writes:

On Tue, 02 Jun 2026 15:59:33 +0000, Anton Ertl wrote:

My take is, that in a world with different access widths (e.g.,
accessing a register for a 32-bit value or a 64-bit value),
bit-big-endian is a bad idea.

There is an argument for that.

But if a computer does have bit-field instructions, I tend to consider it insane for it to number bits in the opposite direction of its endianness.

So if big-endian bit numbering is a bad idea (and it is), big endian
byte order is a bad idea, too.

In the more common case, where the machine is big-endian, and it is the
bit numbering that's little-endian, specifying a nine-bit field starting
in bit 6 of byte 4999 would give you bits 6 through 0 of byte 4999,
followed by bits 7 and 6 of byte 5000.

This does not make sense. You have the 32-bit word with address 4998.
If you access a 9-bit field at bit 6, it extends to bit 14, and these
bits will be in the bytes at addresses 5001 and 5000 on your
byte-big-endian machine. As long as you only access this bit field
through a 32-bit access at this address, the difference does not play
a role. But once you want to access it through a 32-bit access at
4999 (now it's bit 14 through 22), a 16-bit access at 5000 (bit 6
through 14 again), or a 32-bit access at 5000 (bit 22 through 30), the different orders become confusing.
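
The renumbering in that example can be checked with a small sketch. It assumes bits are numbered from the LSB of each access (LSB = 0) and that both accesses actually cover the bit in question; the function names are hypothetical.

```c
/* Byte-big-endian memory: the least significant byte of an access sits
 * at its highest address, so the bit number of a given memory bit
 * depends on where each access ENDS, and therefore on its width. */
int rebase_bit_be(int bit, long addr, int width_bits,
                  long new_addr, int new_width_bits)
{
    long old_end = addr + width_bits / 8;
    long new_end = new_addr + new_width_bits / 8;
    return bit + 8 * (int)(new_end - old_end);
}

/* Byte-little-endian memory: the least significant byte sits at the
 * start address, so only the start addresses matter. */
int rebase_bit_le(int bit, long addr, long new_addr)
{
    return bit + 8 * (int)(addr - new_addr);
}
```

With the word at 4998, rebase_bit_be maps bit 6 to 14 for a 32-bit access at 4999, to 6 for a 16-bit access at 5000, and to 22 for a 32-bit access at 5000, matching the numbers above. In the little-endian case the access width drops out of the formula entirely.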

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From Stefan Monnier@[email protected] to comp.arch on Wed Jun 3 10:42:09 2026

From Newsgroup: comp.arch

Yes, you the vendor do not want random customer debugging the code,

I also want a pony, but that doesn't make it right.

The customer will usually not want to debug your code, but sometimes
they will have to (e.g. because you the vendor don't exist any more or
don't find that product of commercial value any more, ...).

The customer deserves to be able to debug the code it's paid for.

=== Stefan
--- Synchronet 3.22a-Linux NewsLink 1.2

From scott@[email protected] (Scott Lurndal) to comp.arch on Wed Jun 3 18:36:51 2026

From Newsgroup: comp.arch

Stefan Monnier <[email protected]> writes:

<snip discussion of including debug data in distributed software>

Yes, you the vendor do not want random customer debugging the code,

I also want a pony, but that doesn't make it right.

The customer will usually not want to debug your code, but sometimes
they will have to (e.g. because you the vendor don't exist any more or
don't find that product of commercial value any more, ...).

The customer deserves to be able to debug the code it's paid for.

There are several reasons that a vendor may wish to refrain from
distributing the DWARF (or windows equiv) data with a software
package. For example, program identifiers may inadvertently
identify other customers, internal proprietary
information or internal codenames.

Being able to debug code without the source code doesn't seem
a particularly common use case, nor would it be a viable way
to continue to use orphaned software, other than to, perhaps,
get it working sufficiently to export any application
data in an interchange form (e.g. csv or xml if supported
by the application). I would certainly not recommend
that it be used for production.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@[email protected] to comp.arch on Wed Jun 3 19:22:49 2026

From Newsgroup: comp.arch

quadi <[email protected]d> schrieb:

On Wed, 03 Jun 2026 13:54:01 +0000, Thomas Koenig wrote:

[big-endian]

It causes problems with badly-written software.

I don't see that as a fault of big-endian.

Neither do I. I am glad I still have access to a few big-endian
machines. For example:

$ lscpu
Architecture: ppc64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Big Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Model name: POWER10 (architected), altivec supported
[...]

(which shows that big-endian Linux is still supported).

But if the software you are targeting is primarily written on,
and for, little-endian systems like x86, then the little-endian
assumption will tend to creep in - certain things like writing
an int to memory and reading back a char will "just work", and
programmers may not know or care that they are violating language
standards; they very rarely do.

So what to do? Submit bug reports and patches and hope they are
integrated, or just bite the bullet and offer a little-endian
version as well? IBM chose the latter.

And referring to the point above, the code

One has to exert oneself to write a program equivalent to

INTEGER*2 IP
EQUIVALENCE (I, IP)
I = 42
WRITE(6,11) IP
STOP
11 FORMAT(' ', 'VALUE IS: ', I3)
END

also violates the FORTRAN standard going back to Fortran 66.
After the "I = 42" statement, IP becomes undefined according to
the language definition.

and so the fact that it will print

VALUE IS: 0

is not a bug, it's exactly what one should expect.

It could also launch World War III, provided the right operational
hardware has been installed.
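
For comparison, the byte-order dependence behind that output can be reproduced in C without undefined behaviour. This is only a sketch: it assumes the Fortran I is a 32-bit integer, and the function names first_halfword and host_is_little_endian are made up for illustration. memcpy copies the first two bytes of the 32-bit value, which is what the EQUIVALENCE overlay exposes through IP.

```c
#include <stdint.h>
#include <string.h>

/* Copy the first two bytes (lowest addresses) of a 32-bit integer into
   a 16-bit integer. This mirrors what IP sees when it is overlaid on I,
   assuming I is 32 bits wide. memcpy keeps the C version well defined. */
static int16_t first_halfword(int32_t i)
{
    int16_t ip;
    memcpy(&ip, &i, sizeof ip);
    return ip;
}

/* Returns 1 on a little-endian host, 0 otherwise (hypothetical helper). */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, sizeof first);
    return first == 1;
}
```

On a little-endian host first_halfword(42) yields 42, because the low-order bytes come first; on a big-endian host it yields 0, because the first two bytes are the high-order half.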
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Wed Jun 3 21:24:09 2026

From Newsgroup: comp.arch

Stefan Monnier <[email protected]> posted:

Yes, you the vendor do not want random customer debugging the code,

I also want a pony, but that doesn't make it right.

The customer will usually not want to debug your code, but sometimes
they will have to (e.g. because you the vendor don't exist any more or
don't find that product of commercial value any more, ...).

The customer deserves to be able to debug the code it's paid for.

Does MS allow you to debug W11 or Office ...
Does Corel allow you to debug Draw ...
Does Adobe allow you to debug PDF reader ...

Which is why, sooner or later, open source should win.

=== Stefan

--- Synchronet 3.22a-Linux NewsLink 1.2

From George Neuner@[email protected] to comp.arch on Wed Jun 3 19:05:54 2026

From Newsgroup: comp.arch

On Tue, 02 Jun 2026 15:59:33 GMT, [email protected]
(Anton Ertl) wrote:

The 68020 is bit-little-endian and byte-big-endian, and it has
bitfield instructions, and from what I have read, this has led to
problems (e.g., consider what to do if you have an array of 17-bit
fields: how do you access the nth element of the array?

<array>[n] ?

In C the default would be an aligned array of 32-bit containers each
of which stored a 17-bit field.

If you mean a /packed/ array in which the 17-bit fields are stored bit contiguously ... well that could get interesting.
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu Jun 4 04:06:45 2026

From Newsgroup: comp.arch

On Mon, 01 Jun 2026 19:56:38 +0000, quadi wrote:

I wasn't happy. So I noticed that I actually had some unused space that
I could squeeze out. So now the 24-bit short instructions have 1/2 as
much space as they used to, which meant the only thing I had to give up
was the ability to change the condition codes.

I found that I had some unused space within the 80-bit instructions, and
that was enough to let me restore the 24-bit short instructions to their former glory.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Thu Jun 4 04:44:33 2026

From Newsgroup: comp.arch

On Thu, 04 Jun 2026 04:06:45 +0000, quadi wrote:

I found that I had some unused space within the 80-bit instructions, and
that was enough to let me restore the 24-bit short instructions to their former glory.

Then another crazy idea came into my head. The 16-bit short instructions
are limited to operating on the first eight registers. Some believe that
this limitation will make them essentially useless.

They have a lot more opcode space allocated to them than the 24-bit short instructions. If I took that space, and gave it to the 24-bit short instructions, I could perhaps add a 24-bit memory-reference instruction!

Well, I tried, and found out that I could indeed almost do that, with, of course, a restriction to 12-bit displacements... but, at best, I could
only use two registers as destination registers for those memory-reference instructions!

So that idea had to be discarded.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Thu Jun 4 12:45:39 2026

From Newsgroup: comp.arch

George Neuner <[email protected]> writes:

On Tue, 02 Jun 2026 15:59:33 GMT, [email protected]
(Anton Ertl) wrote:

The 68020 is bit-little-endian and byte-big-endian, and it has
bitfield instructions, and from what I have read, this has led to
problems (e.g., consider what to do if you have an array of 17-bit
fields: how do you access the nth element of the array?

[...]

If you mean a /packed/ array in which the 17-bit fields are stored bit contiguously ... well that could get interesting.

For a consistently little-endian architecture that has no alignment requirements, the access is relatively simple:

nbit = n*17
nbyte = nbit/8
bitoffset = nbit%8
tmp = load32b(array+nbyte)
element = ext(tmp,bitoffset,17)

(ext extracts the bitfield with length 17 at bitoffset from tmp; 88000
and MIPS64r2 have such instructions).

I leave the consistently big-endian version and the byte-big-endian bit-little-endian version as an exercise to those who think that these
are good ideas. I guess, for consistent big-endian, given an
appropriate definition of ext, it's pretty similar, if not the same as
above. The inconsistent variants (e.g., 68020, 88000, MIPS64r2) are
not so easy, however.
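
The little-endian sequence above can be sketched in portable C. This is an illustration under stated assumptions, not a definitive implementation: the names load32_le, store32_le, get17 and put17 are invented here, the shift-and-mask stands in for the ext instruction, and the buffer is assumed to be padded so that four bytes can always be read from the element's starting byte. Since bitoffset is at most 7, the field always fits in the 32 bits loaded (7 + 17 = 24).

```c
#include <stddef.h>
#include <stdint.h>

/* Load 32 bits as a little-endian value, independent of host byte
   order and alignment (stands in for load32b). */
static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Store 32 bits in little-endian byte order. */
static void store32_le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Read element n of a packed array of 17-bit fields. Bit 0 of
   element 0 is the least significant bit of byte 0. */
static uint32_t get17(const uint8_t *array, size_t n)
{
    size_t nbit = n * 17;
    size_t nbyte = nbit / 8;
    unsigned bitoffset = (unsigned)(nbit % 8);
    uint32_t tmp = load32_le(array + nbyte);
    return (tmp >> bitoffset) & 0x1FFFFu;   /* ext(tmp, bitoffset, 17) */
}

/* Write element n; the matching insert operation. */
static void put17(uint8_t *array, size_t n, uint32_t value)
{
    size_t nbit = n * 17;
    size_t nbyte = nbit / 8;
    unsigned bitoffset = (unsigned)(nbit % 8);
    uint32_t mask = 0x1FFFFu << bitoffset;
    uint32_t tmp = load32_le(array + nbyte);
    tmp = (tmp & ~mask) | ((value & 0x1FFFFu) << bitoffset);
    store32_le(array + nbyte, tmp);
}
```

Byte and bit numbering run in the same direction here, so the bit index n*17 splits cleanly into a byte address and a shift count; that is the property the inconsistent variants lack.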

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From Brian G. Lucas@[email protected] to comp.arch on Thu Jun 4 11:53:31 2026

From Newsgroup: comp.arch

On 6/3/26 12:36 PM, John Levine wrote:

According to quadi <[email protected]d>:

On Wed, 03 Jun 2026 13:54:01 +0000, Thomas Koenig wrote:

It causes problems with badly-written software.

I don't see that as a fault of big-endian.

Agreed. There were plenty of bugs porting BSD software from
the little-endian Vax to big-endian 68000 series. Buggy software
is buggy software.

That followed the port to the Interdata 7/32 which was big-endian,
so BSD must not have learned from that.

--- Synchronet 3.22a-Linux NewsLink 1.2

From Stefan Monnier@[email protected] to comp.arch on Thu Jun 4 23:46:11 2026

From Newsgroup: comp.arch

Scott Lurndal [2026-06-03 18:36:51] wrote:

Being able to debug code without the source code doesn't seem
a particularly common use case,

Indeed, the source code should also be available, of course.
I started this thread by mentioning Free Software. 🙂

=== Stefan
--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@[email protected] to comp.arch on Fri Jun 5 08:36:53 2026

From Newsgroup: comp.arch

Stefan Monnier <[email protected]> schrieb:

Scott Lurndal [2026-06-03 18:36:51] wrote:

Being able to debug code without the source code doesn't seem
a particularly common use case,

Indeed, the source code should also be available, of course.
I started this thread by mentioning Free Software. 🙂

I am a big proponent of free software, but it has a basic problem:
Getting developers paid is not easy.

An example is OpenFOAM. This is a very widely used CFD package,
both in academia (because it costs nothing, and ANSYS is very
expensive, also for universities) and also now in industry because
people who come in from university have learned this during their
PhDs (you need quite some time to learn).

Funding? They want 500 k€ in 2026, which is far from excessive,
see https://openfoam.org/news/funding-2026/ , both compared to
commercial CFD companies and the value that OpenFOAM provides.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From George Neuner@[email protected] to comp.arch on Fri Jun 5 15:07:25 2026

From Newsgroup: comp.arch

On Wed, 03 Jun 2026 00:55:35 GMT, MitchAlsup
<[email protected]d> wrote:

quadi <[email protected]d> posted:

On Tue, 02 Jun 2026 17:50:38 +0200, Terje Mathisen wrote:

In the current environment where every language is expected to be
compatible with a generic IDE like Visual Studio Code, via open source
interface specifications, having a proprietary debug format seems like a
good way to strongly limit your potential customer base.

You appear to have understood his post in a different way than I did.

I wasn't thinking of the kind of debug information provided by a compiler.
I was thinking of leaving debug information in when one was distributing
software to customers.

Yes, you the vendor do not want random customer debugging the code,
however, you want the ability to debug the code that was distributed
on whatever medium on customer's system(s)--

AND you want to debug one copy of the running code while others are using
other processes running the code under normal use.

That is true, but the issue at hand is how to achieve that. Leaving
debug information /in/ the executable, I think, is a bad idea.

However, many (most?) toolchains provide a way to separate debug
symbols from the executable - either by generating a separate symbol
database in the 1st place, or by allowing debug data to be stripped
from the executables. If you have to debug at the client site, you
simply take the symbol database with you.

Another useful method is to write out debug information as the program
executes and arrange that it either is suppressed or (alternatively)
goes to /dev/null unless some undocumented flag is given.
[Obviously where speed is paramount you can't be generating
unnecessary output, so the utility of this method is situation
dependent.]

I have used both of these methods in the past.
YMMV.
--- Synchronet 3.22a-Linux NewsLink 1.2

From antispam@[email protected] (Waldek Hebisch) to comp.arch on Sat Jun 6 01:37:13 2026

From Newsgroup: comp.arch

Anton Ertl <[email protected]> wrote:

MitchAlsup <[email protected]d> writes:

[email protected] (Anton Ertl) posted:

long bar(long x, long y)
{
return x/2+y/2;
}

...

Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:

gcc -O3 -ftrapv         gcc -O3
lui gp,0x0              srl v0,a0,0x1f
addiu gp,gp,0           srl v1,a1,0x1f
addu gp,gp,t9           addu v0,v0,a0
srl v1,a0,0x1f          addu a1,v1,a1
lw t9,__addvsi3(gp)     sra v0,v0,0x1
srl v0,a1,0x1f          sra a1,a1,0x1
addiu sp,sp,-32         jr ra
addu a0,v1,a0           addu v0,v0,a1
addu a1,v0,a1
sra a0,a0,0x1
sw ra,28(sp)
sw gp,16(sp)
jalr t9
sra a1,a1,0x1
lw ra,28(sp)
jr ra
addiu sp,sp,32

The call costs a lot of overhead.

Architectures without overflow traps are notorious for excess instruction
count when overflow detection is desired or mandated.

MIPS' add traps on overflow. gcc could have emitted almost the same
code for gcc -O3 -ftrapv as for gcc -O3, except that the last
instruction would be an add, not an addu. But apparently nobody gives
a damn about the efficiency of -ftrapv, possibly rightly so.

My guess is that GCC developers care more about -ftrapv than about
MIPS. AFAICS several architectures officially supported by GCC
struggle to work at all. I suspect that maintainers of MIPS
backend are happy that -ftrapv works and do not have resources
to make it efficient.
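
For readers who want overflow detection without depending on how well a back end implements -ftrapv, the check can be written explicitly. The following is a minimal sketch, assuming GCC or Clang, which provide __builtin_add_overflow; the wrapper name checked_add is invented for illustration. Unlike -ftrapv it reports the overflow to the caller instead of aborting.

```c
#include <limits.h>
#include <stdbool.h>

/* Add a and b. Returns true and stores the sum in *sum when the result
   fits in a long; returns false when the addition would overflow.
   __builtin_add_overflow itself returns true on overflow, hence the
   negation. */
static bool checked_add(long a, long b, long *sum)
{
    return !__builtin_add_overflow(a, b, sum);
}
```

With the halved operands of bar, the addition cannot actually overflow: LONG_MAX/2 + LONG_MAX/2 is LONG_MAX - 1, and LONG_MIN/2 + LONG_MIN/2 is exactly LONG_MIN.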
--
Waldek Hebisch
--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@[email protected] to comp.arch on Sat Jun 6 07:57:46 2026

From Newsgroup: comp.arch

Waldek Hebisch <[email protected]> schrieb:

Anton Ertl <[email protected]> wrote:

MIPS' add traps on overflow. gcc could have emitted almost the same
code for gcc -O3 -ftrapv as for gcc -O3, except that the last
instruction would be an add, not an addu. But apparently nobody gives
a damn about the efficiency of -ftrapv, possibly rightly so.

My guess is that GCC developers care more about -ftrapv than about
MIPS.

It is a common misconception to treat GCC developers as a
monolithic group. There are hobbyists (such as myself) but
I would guess only a small minority of work is done by them,
with the notable exception of some front ends such as Fortran or
(the most recent example) Algol 68. There are employees of
different companies: Linux distributors like RedHat or Suse,
large software companies like Google, hardware vendors like
IBM, Intel or Qualcomm, ...

For MIPS, there are not so many active people and commits.
mips64-linux-gnu is a secondary platform, so if it fails
bootstrap, a release would be held up, but a wrong-code
regression will not.

Counting changes since 2025-01-01 in the gcc/config directories
can give a good idea of the relative activity for different
subdirectories; I cut this off below 7, where the PDP-11 is (note
that architecture names are often historical, so i386 includes
x86_64, s390 includes Z, rs6000 includes POWER and so on).

539 ./riscv
435 ./i386
432 ./aarch64
177 ./loongarch
100 ./arm
85 ./s390
75 ./avr
72 ./xtensa
60 ./rs6000
51 ./gcn
39 ./nvptx
25 ./sparc
25 ./mips
20 ./arc
19 ./pa
19 ./bpf
19 ./alpha
18 ./pru
15 ./sh
13 ./rx
13 ./cris
12 ./or1k
12 ./microblaze
12 ./m68k
11 ./lm32
11 ./ia64
11 ./h8300
9 ./vax
9 ./nds32
8 ./mcore
8 ./epiphany
8 ./c6x
7 ./visium
7 ./rl78
7 ./pdp11
7 ./frv
7 ./csky

AFAICS several architectures officially supported by GCC
struggle to work at all. I suspect that maintainers of MIPS
backend are happy that -ftrapv works and do not have resources
to make it efficient.

First, they would need to know about this, which requires a PR,
but resources may well be lacking.

There are currently 28 open "missed-optimization" bugs with mips
in their target field. Looking at a few architectures above,
RISC-V has 118, x86 has 943, aarch64 has 305, power has 133.
(Some bugs affect more than one architecture, of course).

But it is worth submitting a PR nonetheless, if anybody cares enough :-)
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Sat Jun 6 08:57:09 2026

From Newsgroup: comp.arch

George Neuner <[email protected]> writes:

That is true, but the issue at hand is how to achieve that. Leaving
debug information /in/ the executable, I think, is a bad idea.

On the contrary, it's an excellent idea. It means that the debug
information goes with the code. No chance of confusing yourself by inadvertently associating the wrong debugging information with the
code, and much less chance of not finding the correct debug
information.

Best of all, of course, is to deliver the source code.

Another useful method is to write out debug information as the program
executes and arrange that it either is suppressed or (alternatively)
goes to /dev/null unless some undocumented flag is given.

Undocumented features are forgotten and reimplemented. There's the
story of Microsoft embedding some watermark into Microsoft BASIC
twice, the second time apparently because they had forgotten about the
first time.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From scott@[email protected] (Scott Lurndal) to comp.arch on Sun Jun 7 16:42:25 2026

From Newsgroup: comp.arch

George Neuner <[email protected]> writes:

On Wed, 03 Jun 2026 00:55:35 GMT, MitchAlsup
<[email protected]d> wrote:

quadi <[email protected]d> posted:

On Tue, 02 Jun 2026 17:50:38 +0200, Terje Mathisen wrote:

In the current environment where every language is expected to be
compatible with a generic IDE like Visual Studio Code, via open source
interface specifications, having a proprietary debug format seems like a
good way to strongly limit your potential customer base.

You appear to have understood his post in a different way than I did.

I wasn't thinking of the kind of debug information provided by a compiler.
I was thinking of leaving debug information in when one was distributing
software to customers.

Yes, you the vendor do not want random customer debugging the code,
however, you want the ability to debug the code that was distributed
on whatever medium on customer's system(s)--

AND you want to debug one copy of the running code while others are using
other processes running the code under normal use.

That is true, but the issue at hand is how to achieve that. Leaving
debug information /in/ the executable, I think, is a bad idea.

However, many (most?) toolchains provide a way to separate debug
symbols from the executable - either by generating a separate symbol
database in the 1st place, or by allowing debug data to be stripped
from the executables. If you have to debug at the client site, you
simply take the symbol database with you.

Indeed, and that's been the common paradigm at my employers.

I'll also note that many linux distributions include the debug
symbols for the distribution in optionally loaded packages.

Another useful method is to write out debug information as the program
executes and arrange that it either is suppressed or (alternatively)
goes to /dev/null unless some undocumented flag is given.

We arrange for the application to be able to be configured
(both statically before startup and dynamically during
runtime) to produce additional debug logging. Generally
arranged in the code to avoid significant impact to non-debug
performance (e.g. using __builtin_expect with GCC toolchains).
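
As a rough illustration of that pattern, and not the actual code in question, a runtime-switchable debug log guarded by __builtin_expect might look like the sketch below. All names (debug_level, debug_calls, debug_write, UNLIKELY, DEBUG_LOG) are hypothetical, and GCC or Clang is assumed for the builtin.

```c
#include <stdarg.h>
#include <stdio.h>

/* Current debug level; 0 disables debug logging. It can be set from a
   configuration file at startup or changed while the program runs. */
static int debug_level = 0;

/* Number of debug lines written, kept only for illustration. */
static int debug_calls = 0;

/* Tell the compiler the condition is expected to be false, so the
   non-debug path stays the straight-line path. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Format one debug message and write it to stderr. */
static void debug_write(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    debug_calls++;
}

/* Arguments are evaluated only when the given level is enabled, so a
   disabled log statement costs one compare and a not-taken branch. */
#define DEBUG_LOG(level, ...)                      \
    do {                                           \
        if (UNLIKELY(debug_level >= (level)))      \
            debug_write(__VA_ARGS__);              \
    } while (0)
```

A call such as DEBUG_LOG(1, "queue depth=%d\n", depth) then produces output only after debug_level has been raised to 1 or more.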

--- Synchronet 3.22a-Linux NewsLink 1.2

From Stephen Fuld@[email protected] to comp.arch on Sun Jun 7 15:05:24 2026

From Newsgroup: comp.arch

On 6/4/2026 8:46 PM, Stefan Monnier wrote:

Scott Lurndal [2026-06-03 18:36:51] wrote:

Being able to debug code without the source code doesn't seem
a particularly common use case,

Indeed, the source code should also be available, of course.
I started this thread by mentioning Free Software. 🙂

Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Mon Jun 8 01:19:17 2026

From Newsgroup: comp.arch

On Sun, 07 Jun 2026 15:05:24 -0700, Stephen Fuld wrote:

Note that free does not equal open source. There is a fair amount of software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.

Commonly, when this distinction is discussed in the open-source community,
the phrases "free as in beer" and "free as in freedom" are used to
distinguish between freeware that remains proprietary versus true open-
source software under the GPL.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Mon Jun 8 06:05:59 2026

From Newsgroup: comp.arch

Stephen Fuld <[email protected]d> writes:

On 6/4/2026 8:46 PM, Stefan Monnier wrote:

I started this thread by mentioning Free Software. 🙂

Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.

The Adobe PDF reader is chained software (aka proprietary software),
not free software.

In the appendix of "1984" George Orwell wrote:

|To give a single example, the word free still existed in Newspeak, but
|could only be used in such statements as "The dog is free from lice"
|or "This field is free from weeds." It could not be used in its old
|sense of "politically free" or "intellectually free," since political
|and intellectual freedom no longer existed even as concepts, and were
|therefore of necessity nameless.

Some of us obviously already write and think in Newspeak.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From Michael S@[email protected] to comp.arch on Mon Jun 8 09:25:32 2026

From Newsgroup: comp.arch

On Mon, 8 Jun 2026 01:19:17 -0000 (UTC)
quadi <[email protected]d> wrote:

On Sun, 07 Jun 2026 15:05:24 -0700, Stephen Fuld wrote:

Note that free does not equal open source. There is a fair amount
of software that is freely available for which the source is not.
Many of these are reduced functionality versions of paid for
software, e.g. Adobe PDF reader, but there are others.

Commonly, when this distinction is discussed in the open-source
community, the phrases "free as in beer" and "free as in freedom" are
used to distinguish between freeware that remains proprietary versus
true open-source software under the GPL.

John Savard

I strongly disagree with statement that true open source software is
equivalent of GPL.

--- Synchronet 3.22a-Linux NewsLink 1.2

From Terje Mathisen@[email protected] to comp.arch on Mon Jun 8 11:45:51 2026

From Newsgroup: comp.arch

Michael S wrote:

On Mon, 8 Jun 2026 01:19:17 -0000 (UTC)
quadi <[email protected]d> wrote:

On Sun, 07 Jun 2026 15:05:24 -0700, Stephen Fuld wrote:

Note that free does not equal open source. There is a fair amount
of software that is freely available for which the source is not.
Many of these are reduced functionality versions of paid for
software, e.g. Adobe PDF reader, but there are others.

Commonly, when this distinction is discussed in the open-source
community, the phrases "free as in beer" and "free as in freedom" are
used to distinguish between freeware that remains proprietary versus
true open-source software under the GPL.

John Savard

I strongly disagree with statement that true open source software is equivalent of GPL.

The obviously "most free" sw must be public domain, right?

Followed by free use but attribution required/requested?

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
--- Synchronet 3.22a-Linux NewsLink 1.2

From Stephen Fuld@[email protected] to comp.arch on Mon Jun 8 07:30:46 2026

From Newsgroup: comp.arch

On 6/7/2026 11:05 PM, Anton Ertl wrote:

Stephen Fuld <[email protected]d> writes:

On 6/4/2026 8:46 PM, Stefan Monnier wrote:

I started this thread by mentioning Free Software. 🙂

Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.

The Adobe PDF reader is chained software (aka proprietary software),
not free software.

I don't want to get into a semantic argument here. I don't know what
you mean by the term "chained software". I only meant that anyone
could use it without paying anything to anyone. In the sense that John
talked about, it is free beer.

If I misinterpreted Stefan's use of the word free, then I apologize.

In the appendix of "1984" George Orwell wrote:

|To give a single example, the word free still existed in Newspeak, but
|could only be used in such statements as "The dog is free from lice"
|or "This field is free from weeds." It could not be used in its old
|sense of "politically free" or "intellectually free," since political
|and intellectual freedom no longer existed even as concepts, and were
|therefore of necessity nameless.

Some of us obviously already write and think in Newspeak.

I hardly think that using the word free to mean "you don't have to pay
for it" is Newspeak.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.22a-Linux NewsLink 1.2

From scott@[email protected] (Scott Lurndal) to comp.arch on Mon Jun 8 15:19:57 2026

From Newsgroup: comp.arch

Stephen Fuld <[email protected]d> writes:

On 6/7/2026 11:05 PM, Anton Ertl wrote:

Stephen Fuld <[email protected]d> writes:

On 6/4/2026 8:46 PM, Stefan Monnier wrote:

I started this thread by mentioning Free Software. 🙂

Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.

The Adobe PDF reader is chained software (aka proprietary software),
not free software.

I don't want to get into a semantic argument here. I don't know what
you mean by the term "chained software". I only meant that anyone
could use it without paying anything to anyone. In the sense that John
talked about, it is free beer.

Acroread sends basic telemetry to Adobe every time you use it,
so in a sense, it's not exactly free.

xpdf on the other hand....

--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Mon Jun 8 16:18:37 2026

From Newsgroup: comp.arch

Stephen Fuld <[email protected]d> writes:

On 6/7/2026 11:05 PM, Anton Ertl wrote:

Stephen Fuld <[email protected]d> writes:

On 6/4/2026 8:46 PM, Stefan Monnier wrote:

I started this thread by mentioning Free Software. 🙂

Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.

The Adobe PDF reader is chained software (aka proprietary software),
not free software.

I don't want to get into a semantic argument here. I don't know what
you mean by the term "chained software".

Non-free software. I put the more commonly used term in parentheses.

I only meant that anyone
could use it without paying anything to anyone.

That's not what "free software" means. The four essential freedoms of
software are
<https://www.gnu.org/philosophy/free-sw.en.html#fs-definition>:

|* The freedom to run the program as you wish, for any purpose (freedom 0).
|
|* The freedom to study how the program works, and change it so it does
| your computing as you wish (freedom 1). Access to the source code is
| a precondition for this.
|
|* The freedom to redistribute copies so you can help others (freedom 2).
|
|* The freedom to distribute copies of your modified versions to others
| (freedom 3). By doing this you can give the whole community a chance
| to benefit from your changes. Access to the source code is a
| precondition for this.
|
|A program is free software if it gives users adequately all of these
|freedoms. Otherwise, it is nonfree.

In the appendix of "1984" George Orwell wrote:

|To give a single example, the word free still existed in Newspeak, but
|could only be used in such statements as "The dog is free from lice"
|or "This field is free from weeds." It could not be used in its old
|sense of "politically free" or "intellectually free," since political
|and intellectual freedom no longer existed even as concepts, and were
|therefore of necessity nameless.

Some of us obviously already write and think in Newspeak.

I hardly think that using the word free to mean "you don't have to pay
for it" is Newspeak.

Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@[email protected] to comp.arch on Mon Jun 8 17:11:35 2026

From Newsgroup: comp.arch

On Mon, 08 Jun 2026 09:25:32 +0300, Michael S wrote:

On Mon, 8 Jun 2026 01:19:17 -0000 (UTC)
quadi <[email protected]d> wrote:

On Sun, 07 Jun 2026 15:05:24 -0700, Stephen Fuld wrote:

Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many
of these are reduced functionality versions of paid for software,
e.g. Adobe PDF reader, but there are others.

Commonly, when this distinction is discussed in the open-source
community, the phrases "free as in beer" and "free as in freedom" are
used to distinguish between freeware that remains proprietary versus
true open-source software under the GPL.

I strongly disagree with statement that true open source software is equivalent of GPL.

I did not mean to imply that _only_ GPL-licensed software is truly open source. The GPL license is only the most common example. There is, as
another reply has already noted, also MIT-licensed software and public
domain software.

And the term "open source", of course, is broader than this, as well. It
isn't incorrect to use that term for any software the source code of which
is open to inspection, even if the software itself is proprietary.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From Stephen Fuld@[email protected] to comp.arch on Mon Jun 8 10:40:17 2026

From Newsgroup: comp.arch

On 6/8/2026 9:18 AM, Anton Ertl wrote:

Stephen Fuld <[email protected]d> writes:

On 6/7/2026 11:05 PM, Anton Ertl wrote:

Stephen Fuld <[email protected]d> writes:

On 6/4/2026 8:46 PM, Stefan Monnier wrote:

I started this thread by mentioning Free Software. 🙂

Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.

The Adobe PDF reader is chained software (aka proprietary software),
not free software.

I don't want to get into a semantic argument here. I don't know what
you mean by the term "chained software".

Non-free software. I put the more commonly used term in parentheses.

OK.

I only meant that anyone
could use it without paying anything to anyone.

That's not what "free software" means. The four essential freedoms of software are
<https://www.gnu.org/philosophy/free-sw.en.html#fs-definition>:

|* The freedom to run the program as you wish, for any purpose (freedom 0).
|
|* The freedom to study how the program works, and change it so it does
| your computing as you wish (freedom 1). Access to the source code is
| a precondition for this.
|
|* The freedom to redistribute copies so you can help others (freedom 2).
|
|* The freedom to distribute copies of your modified versions to others
| (freedom 3). By doing this you can give the whole community a chance
| to benefit from your changes. Access to the source code is a
| precondition for this.
|
|A program is free software if it gives users adequately all of these
|freedoms. Otherwise, it is nonfree.

That is certainly *a* definition. It is obviously your preferred
definition. But there are others.

snipped the Orwell quotation.

Can you accept that others might have a different definition without
insulting them? (I take the assertion that I am using Newspeak as an
insult.)
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.22a-Linux NewsLink 1.2

From Stefan Monnier@[email protected] to comp.arch on Mon Jun 8 09:18:02 2026

From Newsgroup: comp.arch

Indeed, the source code should also be available, of course.
I started this thread by mentioning Free Software. 🙂

Note that free does not equal open source. There is a fair amount of software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g. Adobe
PDF reader, but there are others.

You may want to check on Wikipedia what is [Free Software](https://en.wikipedia.org/wiki/Free_software) before jumping
to conclusions.

I capitalized "Free Software" for a reason.

=== Stefan
--- Synchronet 3.22a-Linux NewsLink 1.2

From Michael S@[email protected] to comp.arch on Mon Jun 8 22:43:40 2026

From Newsgroup: comp.arch

On Mon, 08 Jun 2026 16:18:37 GMT
[email protected] (Anton Ertl) wrote:

Stephen Fuld <[email protected]d> writes:

On 6/7/2026 11:05 PM, Anton Ertl wrote:

Stephen Fuld <[email protected]d> writes:

On 6/4/2026 8:46 PM, Stefan Monnier wrote:

I started this thread by mentioning Free Software. 🙂

Note that free does not equal open source. There is a fair
amount of software that is freely available for which the source
is not. Many of these are reduced functionality versions of paid
for software, e.g. Adobe PDF reader, but there are others.

The Adobe PDF reader is chained software (aka proprietary
software), not free software.

I don't want to get into a semantic argument here. I don't know
what you mean by the term "chained software".

Non-free software. I put the more commonly used term in parentheses.

I only meant that anyone
could use it without paying anything to anyone.

That's not what "free software" means. The four essential freedoms of software are
<https://www.gnu.org/philosophy/free-sw.en.html#fs-definition>:

|* The freedom to run the program as you wish, for any purpose (freedom 0).
|
|* The freedom to study how the program works, and change it so it does
| your computing as you wish (freedom 1). Access to the source code is
| a precondition for this.
|
|* The freedom to redistribute copies so you can help others (freedom 2).
|
|* The freedom to distribute copies of your modified versions to others
| (freedom 3). By doing this you can give the whole community a chance
| to benefit from your changes. Access to the source code is a
| precondition for this.
|
|A program is free software if it gives users adequately all of these
|freedoms. Otherwise, it is nonfree.

In the appendix of "1984" George Orwell wrote:

|To give a single example, the word free still existed in Newspeak, but
|could only be used in such statements as "The dog is free from lice"
|or "This field is free from weeds." It could not be used in its old
|sense of "politically free" or "intellectually free," since political
|and intellectual freedom no longer existed even as concepts, and were
|therefore of necessity nameless.

Some of us obviously already write and think in Newspeak.

I hardly think that using the word free to mean "you don't have to
pay for it" is Newspeak.

Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).

- anton

I tend to think that it was the other way around.
RMS invented a new meaning of the term "free software" and then he and
his devotees started to insist that it is the only correct meaning.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Michael S@[email protected] to comp.arch on Mon Jun 8 22:51:52 2026

From Newsgroup: comp.arch

On Mon, 08 Jun 2026 15:19:57 GMT
[email protected] (Scott Lurndal) wrote:

Stephen Fuld <[email protected]d> writes:

On 6/7/2026 11:05 PM, Anton Ertl wrote:

Stephen Fuld <[email protected]d> writes:

On 6/4/2026 8:46 PM, Stefan Monnier wrote:

I started this thread by mentioning Free Software. 🙂

Note that free does not equal open source. There is a fair
amount of software that is freely available for which the source
is not. Many of these are reduced functionality versions of paid
for software, e.g. Adobe PDF reader, but there are others.

The Adobe PDF reader is chained software (aka proprietary
software), not free software.

I don't want to get into a semantic argument here. I don't know
what you mean by the term "chained software". I only meant that
anyone could use it without paying anything to anyone. In the sense
that John talked about, it is free beer.

Acroread sends basic telemetry to Adobe every time you use it,
so in a sense, it's not exactly free.

xpdf on the other hand....

I prefer SumatraPDF. GPL3, but not available outside Windows.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Stefan Monnier@[email protected] to comp.arch on Mon Jun 8 18:03:43 2026

From Newsgroup: comp.arch

Anton Ertl [2026-06-08 16:18:37] wrote:

Stephen Fuld <[email protected]d> writes:

I hardly think that using the word free to mean "you don't have to pay
for it" is Newspeak.

Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).

Well, Stephen is hardly using a recent meaning of the word "free".
According to the OED, "free" as in "free of charge" traces back to the
13th century, so it clearly existed in Orwell's time.

But yes, I find it demoralizing that people within the computer world
are still making this mistake, after more than 40 years of FSF.

=== Stefan
--- Synchronet 3.22a-Linux NewsLink 1.2

From Stefan Monnier@[email protected] to comp.arch on Mon Jun 8 17:44:59 2026

From Newsgroup: comp.arch

Terje Mathisen [2026-06-08 11:45:51] wrote:

Michael S wrote:
The obviously "most free" sw must be public domain, right?

As with most things related to freedom ... it depends.

Public domain offers "more freedom" when you consider the point of view
of the developers, who can use that software any way they want with no restrictions at all.

But not when you consider the point of view of the end-users who may
receive code compiled/derived from that public domain source code with
no way to recover that public domain source code, or to change or fix
it. It may even be illegal to try to recover it (since the DMCA
disallows several forms of reverse engineering).

From that end-user point of view, the GPL arguably ensures "more
freedom" than public domain.

=== Stefan
--- Synchronet 3.22a-Linux NewsLink 1.2

From David Brown@[email protected] to comp.arch on Tue Jun 9 10:06:48 2026

From Newsgroup: comp.arch

On 09/06/2026 00:03, Stefan Monnier wrote:

Anton Ertl [2026-06-08 16:18:37] wrote:

Stephen Fuld <[email protected]d> writes:

I hardly think that using the word free to mean "you don't have to pay
for it" is Newspeak.

Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).

Well, Stephen is hardly using a recent meaning of the word "free".
According to the OED, "free" as in "free of charge" traces back to the
13th century, so it clearly existed in Orwell's time.

But yes, I find it demoralizing that people within the computer world
are still making this mistake, after more than 40 years of FSF.

It is not a mistake - it is merely a different but perfectly reasonable
use of the same word. The FSF has done (and continues to do) wonderful
things that are of huge benefit to the computing world, and I am a big
fan of what they term "Free Software". But they do not own the word
"free", nor do they have rights to determine the definition of the
phrase "free software". People can, and do, use the phrase meaning
"gratis software". In any discussion on the topic, it is good to be
entirely clear on the intended meanings - but that applies equally to
those who write "free software" meaning "libre software" and "free
software" meaning "gratis software". Neither are "mistaken", and both
can cause confusion. (In longer phrases, acronyms, or proper names of organisations, there should be no confusion - FOSS or FSF should be
clear to all.)

(As for the discussion about what is the most "free", or "libre",
licensing model - you can argue about it until you are blue in the face,
but no conclusion can be reached because it depends on the point of
view. Freedoms are always a balance and a tradeoff to some extent.)

--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Tue Jun 9 17:24:24 2026

From Newsgroup: comp.arch

Michael S <[email protected]> writes:

On Mon, 08 Jun 2026 16:18:37 GMT
[email protected] (Anton Ertl) wrote:

Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).
- anton

I tend to think that it was the other way around.
RMS invented a new meaning of the term "free software"

The story that I read was that all software originally was free in the
FSF sense (i.e., provided the four freedoms).[1] Then some people
removed some or all freedoms from some software, typically with the
goal of making money from the software. Removing the freedoms and yet
not asking for money is a later development; this has often been
called shareware or freeware, but Stephen Fuld is the first one I have
seen who has called it "free software", and actually misunderstood a
reference to "Free Software" (capitalized).

[1] As an example, <https://en.wikipedia.org/wiki/SHARE_(computing)>
states:

|Originally, IBM distributed what software it provided in source
|form[2][3][4] and systems programmers commonly made small local
|additions or modifications and exchanged them with other users.

All four freedoms were exercised here, more than two decades before
the Free Software Foundation.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From Terje Mathisen@[email protected] to comp.arch on Tue Jun 9 21:15:41 2026

From Newsgroup: comp.arch

Anton Ertl wrote:

Michael S <[email protected]> writes:

On Mon, 08 Jun 2026 16:18:37 GMT
[email protected] (Anton Ertl) wrote:

Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).
- anton

I tend to think that it was the other way around.
RMS invented a new meaning of the term "free software"

The story that I read was that all software originally was free in the
FSF sense (i.e., provided the four freedoms).[1] Then some people
removed some or all freedoms from some software, typically with the
goal of making money from the software. Removing the freedoms and yet
not asking for money is a later development; this has often been
called shareware or freeware, but Stephen Fuld is the first one I have
seen who has called it "free software", and actually misunderstood a reference to "Free Software" (capitalized).

[1] As an example, <https://en.wikipedia.org/wiki/SHARE_(computing)>
states:

|Originally, IBM distributed what software it provided in source
|form[2][3][4] and systems programmers commonly made small local
|additions or modifications and exchanged them with other users.

All four freedoms were exercised here, more than two decades before
the Free Software Foundation.

Yes, with one important restriction:

The software was free, but you could not use it except on IBM hardware,
which was quite expensive.

When clones started to appear (Amdahl?) I believe the free sw
disappeared, now it was explicitly licensed to only run on "real" IBM hardware?

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
--- Synchronet 3.22a-Linux NewsLink 1.2

From Stephen Fuld@[email protected] to comp.arch on Tue Jun 9 16:29:01 2026

From Newsgroup: comp.arch

On 6/9/2026 12:15 PM, Terje Mathisen wrote:

Anton Ertl wrote:

Michael S <[email protected]> writes:

On Mon, 08 Jun 2026 16:18:37 GMT
[email protected] (Anton Ertl) wrote:

Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).
- anton

I tend to think that it was the other way around.
RMS invented a new meaning of the term "free software"

The story that I read was that all software originally was free in the
FSF sense (i.e., provided the four freedoms).[1] Then some people
removed some or all freedoms from some software, typically with the
goal of making money from the software. Removing the freedoms and yet
not asking for money is a later development; this has often been
called shareware or freeware, but Stephen Fuld is the first one I have
seen who has called it "free software", and actually misunderstood a
reference to "Free Software" (capitalized).

[1] As an example, <https://en.wikipedia.org/wiki/SHARE_(computing)>
states:

|Originally, IBM distributed what software it provided in source
|form[2][3][4] and systems programmers commonly made small local
|additions or modifications and exchanged them with other users.

All four freedoms were exercised here, more than two decades before
the Free Software Foundation.

Yes, with one important restriction:

The software was free, but you could not use it except on IBM hardware, which was quite expensive.

When clones started to appear (Amdahl?) I believe the free sw
disappeared, now it was explicitly licensed to only run on "real" IBM hardware?

Sort of, but it was more complicated than that. In the 1960s (and
before), IBM "bundled" (i.e. it was freely included) all software with
the hardware. In 1969, the US government filed an anti-trust case
against IBM, claiming, among other things, monopolization of the
software market. One of the Government's goals was to support an
independent software market (which couldn't exist if IBM gave everything
away for free). The suit dragged on for years and was ultimately
withdrawn, but IBM was scared about what the suit could do. So it
initiated "unbundling" of software (and other things like education
classes), now charging separately for each software product the customer wanted. This was ultimately successful for the government, leading to
the success of companies like Syncsort (1971), and later several
competitive database systems (e.g., Total, IDMS, etc.). But it also
allowed Amdahl (in 1971), and later other PCM hardware companies, to
sell competitive CPUs with the assurance that they could license the OS,
etc. from IBM.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@[email protected] to comp.arch on Wed Jun 10 06:01:19 2026

From Newsgroup: comp.arch

Anton Ertl <[email protected]> schrieb:

The story that I read was that all software originally was free in the
FSF sense (i.e., provided the four freedoms).[1]

There is an anecdote in "Abstracting Away the Machine".
IBM supplied a customer with its Fortran compiler (Fortran I at
the time). The customer noted that tape use was inefficient,
leading to longer than necessary compile times, and asked for
the source to improve it. Somebody at IBM refused, quipping "IBM
does not supply source code". So the customer went ahead, reverse
engineered the compiler and added the improvements anyway (which
were huge). When IBM noticed that, they asked for the improvement,
and the customer quipped back "$COMPANY does not supply object code",
and refused.

I'd have to search for the anecdote in the book to get the details
exactly right.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Thu Jun 11 15:14:26 2026

From Newsgroup: comp.arch

According to Stefan Monnier <[email protected]>:

But not when you consider the point of view of the end-users who may
receive code compiled/derived from that public domain source code with
no way to recover that public domain source code, or to change or fix
it. It may even be illegal to try to recover it (since the DMCA
disallows several forms of reverse engineering).

If it's really public domain, there is no bar to reverse engineering
since there is nobody who can complain about it. I agree there are
other kinds of software where the executable is freely available but
the authors choose not to provide source and could use the DMCA
against people who reverse engineer.

From that end-user point of view, the GPL arguably ensures "more
freedom" than public domain.

I definitely agree with "arguably".

Speaking of the DMCA, the US Copyright Office is starting its tenth proceeding to update the list of exemptions to the DMCA for research, analysis and other non-infringing uses. Worth a look if you're interested in the topic.

https://www.copyright.gov/1201/2027/
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From George Neuner@[email protected] to comp.arch on Thu Jun 11 13:43:29 2026

From Newsgroup: comp.arch

On Mon, 08 Jun 2026 17:44:59 -0400, Stefan Monnier
<[email protected]> wrote:

Terje Mathisen [2026-06-08 11:45:51] wrote:

Michael S wrote:
The obviously "most free" sw must be public domain, right?

As with most things related to freedom ... it depends.

Public domain offers "more freedom" when you consider the point of view
of the developers, who can use that software any way they want with no
restrictions at all.

But not when you consider the point of view of the end-users who may
receive code compiled/derived from that public domain source code with
no way to recover that public domain source code, or to change or fix
it. It may even be illegal to try to recover it (since the DMCA
disallows several forms of reverse engineering).

From that end-user point of view, the GPL arguably ensures "more
freedom" than public domain.

=== Stefan

Not to mention that there are many countries that do not recognize
public domain. And even where it technically is recognized, some
countries have legal procedures that must be followed to relinquish
your rights and so complicate actually putting something into public
domain.

Putting <whatever> under some kind of license - regardless of how
permissive it is - actually is easier to do in many places, and is
recognized in more places.

YMMV.
--- Synchronet 3.22a-Linux NewsLink 1.2

From antispam@[email protected] (Waldek Hebisch) to comp.arch on Fri Jun 12 01:04:53 2026

From Newsgroup: comp.arch

Anton Ertl <[email protected]> wrote:

David Brown <[email protected]> writes:

On 25/05/2026 16:28, Anton Ertl wrote:

Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different
instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).

An awkward thing about using trap on overflow is determining how
precisely it is defined. Supposing you have the expression "a + b - a".
Perhaps "a + b" overflows. I would hope that when using debug-related
compiler flags such as "-fsanitize=signed-integer-overflow", a compiler
would check for overflow on "a + b", and report it at runtime.
(Unfortunately, gcc does not do that unless the partial expression is
assigned to a variable.) But in "normal" usage, I'd expect the
expression to be simplified, resulting in just "b" and no overflow.

OTOH, cases like a+b+c where the result is in range, while an
intermediate result is out of range are one of the reasons why I
prefer -fwrapv over -ftrapv. As for your preference of nasal demons,
given enough information, the compiler might "optimize" "a+b-a" into,
e.g., 0.
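
As a rough illustration of the intermediate-overflow case (a sketch, not code from the thread): the helper names below are hypothetical, wrapping is emulated with unsigned arithmetic rather than by relying on -fwrapv, and __builtin_add_overflow is assumed to be available (it is a GCC/Clang builtin).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: computes a + b + c with wrap-around semantics by
   doing the additions in unsigned arithmetic, which wraps modulo 2^N by
   definition. The conversion back to long is implementation-defined
   before C23; GCC and Clang define it as modular reduction. */
static long wrapping_sum3(long a, long b, long c)
{
    unsigned long s = (unsigned long)a + (unsigned long)b + (unsigned long)c;
    return (long)s;
}

/* Hypothetical helper: returns 1 if the intermediate sum a + b does not
   fit in a long, which is the point where a trapping add would fire. */
static int first_add_overflows(long a, long b)
{
    long tmp;
    return __builtin_add_overflow(a, b, &tmp) ? 1 : 0;
}

/* Small self-check: the intermediate result overflows, yet the final
   wrapped result is the mathematically correct one. */
static void check_wrapping_example(void)
{
    assert(first_add_overflows(LONG_MAX, 1) == 1);
    assert(wrapping_sum3(LONG_MAX, 1, -1) == LONG_MAX);
}
```

With a = LONG_MAX, b = 1, c = -1 the final sum is in range, so wrapping arithmetic returns the right answer while a step-by-step trapping evaluation would stop at a + b.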

Anyway, the definition of -ftrapv is not very precise; for gcc-12.2:

|'-ftrapv'
| This option generates traps for signed overflow on addition,
| subtraction, multiplication operations.

As for what gcc-12.2 does for your example on AMD64:

long foo(long a, long b)
{
    return a+b-a;
}

is compiled with gcc -O3 -ftrapv to:

0: 48 89 f0 mov %rsi,%rax
3: c3 ret

That is what I expect from '-ftrapv': running code should either deliver
the result that infinite precision arithmetic would give, or trap on
overflow. A tighter specification could be that optimized code should not
generate an overflow trap in cases where computing naively using C
semantics does not lead to overflow. Since the result above
agrees with the result obtained using infinite precision arithmetic,
the code is fine and there is no need for runtime checks.
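
One way to picture this reading of '-ftrapv' is to compare a step-by-step checked evaluation with the simplified form. This is only a sketch under assumptions: the function names are made up for illustration, a boolean return stands in for the trap, and __builtin_add_overflow / __builtin_sub_overflow are GCC/Clang builtins.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical: evaluates a + b - a one operation at a time, the way a
   naive trapping implementation would. Returns false where such an
   implementation would trap. */
static bool naive_checked(long a, long b, long *out)
{
    long t;
    if (__builtin_add_overflow(a, b, &t))
        return false;               /* a + b does not fit in a long */
    /* If a + b fit, then t - a equals b, so this check never fires;
       it is kept only to mirror the naive evaluation order. */
    if (__builtin_sub_overflow(t, a, out))
        return false;
    return true;
}

/* Hypothetical: the optimized form. In infinite precision a + b - a is
   simply b, which always fits, so no check is needed. */
static long simplified(long a, long b)
{
    (void)a;
    return b;
}
```

Whenever the naive version completes without "trapping", both functions agree; the simplified one additionally returns the infinite-precision result in cases where the naive one would have trapped, which is permitted under the looser specification described above.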

Of course, languages like C++ which turn traps into exceptions
and allow these to be used as part of computations may have a problem
here. More precisely, if they specify that an overflow exception
must happen at a given computational step, then the optimizer may be
forced to generate code which is there only to generate the trap
and serves no other purpose. But the original intent of the
overflow trap is to signal that a real machine using fixed-size
numbers cannot deliver the same result as an ideal machine
using infinite precision. If a language respects that intent,
then the compiler can do a lot of optimizations.
--
Waldek Hebisch
--- Synchronet 3.22a-Linux NewsLink 1.2

From antispam@[email protected] (Waldek Hebisch) to comp.arch on Fri Jun 12 01:57:06 2026

From Newsgroup: comp.arch

Thomas Koenig <[email protected]> wrote:

Waldek Hebisch <[email protected]> schrieb:

Anton Ertl <[email protected]> wrote:

MIPS' add traps on overflow. gcc could have emitted almost the same
code for gcc -O3 -ftrapv as for gcc -O3, except that the last
instruction would be an add, not an addu. But apparently nobody gives
a damn about the efficiency of -ftrapv, possibly rightly so.

My guess is that GCC developers care more about -ftrapv than about
MIPS.

It is a common misconception to treat GCC developers as a
monolithic group. There are hobbyists (such as myself) but
I would guess only a small minority of work is done by them,
with the notable exception of some front ends such as Fortran or
(the most recent example) Algol 68. There are employees of
different companies: Linux distributors like Red Hat or SUSE,
large software companies like Google, hardware vendors like
IBM, Intel or Qualcomm, ...

Well, normal employees care about things that their employer
tells them to do. Whatever the reason, GCC developers
(that is, people contributing to GCC) each have their own agenda,
and care more about some things and less about others.

For MIPS, there are not so many active people and commits.
mips64-linux-gnu is a secondary platform, so if it fails
bootstrap, a release would be held up, but a wrong-code
regression will not.

Counting changes since 2025-01-01 in the gcc/config directories
can give a good idea of the relative activity for different
subdirectories; I cut this off below 7, where the PDP-11 is (note
that architecture names are often historical, so i386 includes
x86_64, s390 includes Z, rs6000 includes POWER and so on).

539 ./riscv
435 ./i386
432 ./aarch64
177 ./loongarch
100 ./arm
85 ./s390
75 ./avr
72 ./xtensa
60 ./rs6000
51 ./gcn
39 ./nvptx
25 ./sparc
25 ./mips
20 ./arc
19 ./pa
19 ./bpf
19 ./alpha
18 ./pru
15 ./sh
13 ./rx
13 ./cris
12 ./or1k
12 ./microblaze
12 ./m68k
11 ./lm32
11 ./ia64
11 ./h8300
9 ./vax
9 ./nds32
8 ./mcore
8 ./epiphany
8 ./c6x
7 ./visium
7 ./rl78
7 ./pdp11
7 ./frv
7 ./csky

AFAICS several architectures officially supported by GCC
struggle to work at all. I suspect that the maintainers of the MIPS
backend are happy that -ftrapv works and do not have the resources
to make it efficient.

First, they would need to know about this, which requires a PR,
but resources may well be lacking.

There are currently 28 open "missed-optimization" bugs with mips
in their target field. Looking at a few architectures above,
RISC-V has 118, x86 has 943, aarch64 has 305, power has 133.
(Some bugs affect more than one architecture, of course).

But it is worth submitting a PR nonetheless, if anybody cares enough :-)

Frankly, I do not care enough. I mean, I like the fact that GCC
supports several architectures. But I have use for x86_64, ARM (both
32-bit and 64-bit), RISC-V and a few embedded processors; that is,
I have the processors and can run GCC output on them. I even have some
use for s390/z; that is, I have an emulator (Hercules) and have some
interest in software running inside the emulator. But I have essentially
no use for MIPS.

And in a slightly different spirit, IIRC there were cases where bug
reports caused a reaction of the sort "Yes, it is buggy. It would be
too much effort to fix it, so we will just remove support".
If support is removed, then the work needed to revive an architecture
is likely to be significantly larger than in the case of a bitrotten
but still in-tree architecture.
--
Waldek Hebisch
--- Synchronet 3.22a-Linux NewsLink 1.2

From Stefan Monnier@[email protected] to comp.arch on Thu Jun 11 15:21:14 2026

From Newsgroup: comp.arch

David Brown [2026-06-09 10:06:48] wrote:

On 09/06/2026 00:03, Stefan Monnier wrote:

Well, Stephen is hardly using a recent meaning of the word "free".
According to the OED, "free" as in "free of charge" traces back to the
13th century, so it clearly existed in Orwell's time.
But yes, I find it demoralizing that people within the computer world
are still making this mistake, after more than 40 years of FSF.

It is not a mistake - it is merely a different but perfectly reasonable use of the same word.

In an arbitrary context, I could agree, but here we're talking about
a subthread that started with:

On Wed, 27 May 2026 10:59:31 -0400, Stefan Monnier wrote:
> MitchAlsup [2026-05-26 20:54:30] wrote:
>> Encrypt the debug information (and put it in a
>> {1234-5678-9101-1121-...} folder) so that only the owner (not
>> licensee) of the code can debug it.
> I resent that. All code should be Free Software.

I think there is no ambiguity here.

Treating this "Free Software" to refer to price rather than to freedom
is an error that can be explained only by a lack of familiarity with the
idea of software freedom.

=== Stefan
--- Synchronet 3.22a-Linux NewsLink 1.2

From David Brown@[email protected] to comp.arch on Fri Jun 12 13:02:41 2026

From Newsgroup: comp.arch

On 11/06/2026 21:21, Stefan Monnier wrote:

David Brown [2026-06-09 10:06:48] wrote:

On 09/06/2026 00:03, Stefan Monnier wrote:

Well, Stephen is hardly using a recent meaning of the word "free".
According to the OED, "free" as in "free of charge" traces back to the
13th century, so it clearly existed in Orwell's time.
But yes, I find it demoralizing that people within the computer world
are still making this mistake, after more than 40 years of FSF.

It is not a mistake - it is merely a different but perfectly reasonable use
of the same word.

In an arbitrary context, I could agree, but here we're talking about
a subthread that started with:

On Wed, 27 May 2026 10:59:31 -0400, Stefan Monnier wrote:
> MitchAlsup [2026-05-26 20:54:30] wrote:
>> Encrypt the debug information (and put it in a
>> {1234-5678-9101-1121-...} folder) so that only the owner (not
>> licensee) of the code can debug it.
> I resent that. All code should be Free Software.

I think there is no ambiguity here.

Treating this "Free Software" to refer to price rather than to freedom
is an error that can be explained only by a lack of familiarity with the
idea of software freedom.

Fair enough - I agree that in that context, the term "Free Software" is unambiguous. The capitalisation is important, and that was lost by the
post to which I replied. (There are a /lot/ of posts in this thread,
and when switching between two computers I have undoubtedly skipped many
of them.)

--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Sat Jun 13 03:01:00 2026

From Newsgroup: comp.arch

According to George Neuner <[email protected]>:

As with most things related to freedom ... it depends.

Not to mention that there are many countries that do not recognize
public domain. And even where it technically is recognized, some
countries have legal procedures that must be followed to relinquish
your rights and so complicate actually putting something into public
domain.

I am not aware of any countries that do not have the public domain for
material whose copyright has expired, or for whatever reason was not
eligible for copyright in the first place. But you're right, in some
places it is impossible or at least impractical to relinquish your
rights and put something in the P.D. before it would get there anyway.

Putting <whatever> under some kind of license - regardless of how
permissive it is - actually is easier to do in many places, and is
recognized in more places.

Agreed. There are lots of licenses other than the GPL that are
used successfully for open source software.

R's,
John
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From George Neuner@[email protected] to comp.arch on Sat Jun 13 05:49:19 2026

From Newsgroup: comp.arch

On Sat, 13 Jun 2026 03:01:00 -0000 (UTC), John Levine
<[email protected]> wrote:

According to George Neuner <[email protected]>:

As with most things related to freedom ... it depends.

Not to mention that there are many countries that do not recognize
public domain. And even where it technically is recognized, some
countries have legal procedures that must be followed to relinquish
your rights and so complicate actually putting something into public
domain.

I am not aware of any countries that do not have the public domain for
material whose copyright has expired, or for whatever reason was not
eligible for copyright in the first place. But you're right, in some
places it is impossible or at least impractical to relinquish your
rights and put something in the P.D. before it would get there anyway.

The Berne convention defined an implicit copyright that exists by
virtue of authorship and persists until the author's death. Though
the US does not recognize or enforce these implicit copyrights, most signatories to either Berne (1886) or UCC (1952) conventions do
recognize and enforce Berne copyrights.

Explicit copyrights - filed with Copyright offices - can be
voluntarily surrendered at any time. It is giving up the implicit
copyright that is the problem with public domain.

Putting <whatever> under some kind of license - regardless of how
permissive it is - actually is easier to do in many places, and is
recognized in more places.

Agreed. There are lots of licenses other than the GPL that are
used successfully for open source software.

R's,
John

--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@[email protected] (Anton Ertl) to comp.arch on Sat Jun 13 10:52:08 2026

From Newsgroup: comp.arch

George Neuner <[email protected]> writes:

The Berne convention defined an implicit copyright that exists by
virtue of authorship and persists until the author's death. Though
the US does not recognize or enforce these implicit copyrights, most
signatories to either Berne (1886) or UCC (1952) conventions do
recognize and enforce Berne copyrights.

According to <https://en.wikipedia.org/wiki/Berne_convention>:

|The United States acceded to the convention on 16 November 1988, and
|the convention entered into force for the United States on 1 March
|1989.

How can the convention have entered into force in the US without the
US recognizing or enforcing implicit copyrights?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadibloc@[email protected] (John Savard) to comp.arch on Sat Jun 13 14:20:57 2026

From Newsgroup: comp.arch

On Sat, 13 Jun 2026 10:52:08 GMT, [email protected]
(Anton Ertl) wrote:

George Neuner <[email protected]> writes:

The Berne convention defined an implicit copyright that exists by
virtue of authorship and persists until the author's death. Though
the US does not recognize or enforce these implicit copyrights, most
signatories to either Berne (1886) or UCC (1952) conventions do
recognize and enforce Berne copyrights.

According to <https://en.wikipedia.org/wiki/Berne_convention>:

|The United States acceded to the convention on 16 November 1988, and
|the convention entered into force for the United States on 1 March
|1989.

How can the convention have entered into force in the US without the
US recognizing or enforcing implicit copyrights?

Implicit copyrights do exist now in the U.S. because of its
ratification of the Berne convention.

But there are some limitations.

Nothing that entered the public domain prior to this ratification
became copyrighted again; there was no retroactive effect.

Also, U.S. parties are still incentivized to register their
copyrights, because this is necessary to receive statutory damages and
attorney's fees from a copyright lawsuit.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From antispam@[email protected] (Waldek Hebisch) to comp.arch on Sat Jun 13 15:07:00 2026

From Newsgroup: comp.arch

John Levine <[email protected]> wrote:

According to George Neuner <[email protected]>:

As with most things related to freedom ... it depends.

Not to mention that there are many countries that do not recognize
public domain. And even where it technically is recognized, some
countries have legal procedures that must be followed to relinquish
your rights and so complicate actually putting something into public
domain.

I am not aware of any countries that do not have the public domain for material whose copyright has expired,

My country (Poland) has a rule that once copyright has expired
distributor of work should pay royalties to the state. I
am not sure how it works in "interesting" case, but clearly
this is quite different from US/UK meaning of public domain.

Also, law of my country declares some author right as
untransferable. Basically, author can sue if he/she/it
thinks that artistic integrity of the work is violated.
Theoretically, one could imagine some old, no longer commercially
viable program to be released as public domain, new people fixing
old bugs and original developers suing that bug fixes deprive
users of original experience and hence violate artistic integrity.
Probably not going to work in court, but we had case when
sensible improvements to buildings were blocked by architects.

BTW. Our copyright law has notion of "area of exploration" and
states that copyright transfer is effective only for explicitly
transferred rights. All rights in "areas of exploration" which
are not explicitly transferred stay with authors. I guess that
training LLM would count as new "area of exploration"...
--
Waldek Hebisch
--- Synchronet 3.22a-Linux NewsLink 1.2

From John Levine@[email protected] to comp.arch on Sat Jun 13 17:25:27 2026

From Newsgroup: comp.arch

According to Waldek Hebisch <[email protected]>:

My country (Poland) has a rule that once copyright has expired
distributor of work should pay royalties to the state. I
am not sure how it works in "interesting" case, but clearly
this is quite different from US/UK meaning of public domain.

Do distributors pay state royalties on works of Shakespeare?
The Bible? Wow.

Also, law of my country declares some author right as
untransferable. Basically, author can sue if he/she/it
thinks that artistic integrity of the work is violated.

Those are moral rights, introduced into Berne in the 1920s by
everyone's favorite copyright advocate, Benito Mussolini.

The US only recognizes moral rights for visual works like
paintings and sculpture. There was an interesting case in
2013 where the owner of a building containing artists' studios
who had allowed elaborate graffiti on the outside of the building
decided to tear it down, first whitewashing over the art. The
artists sued and won $6.7 million. https://en.wikipedia.org/wiki/5_Pointz

So don't let people doodle on your circuit boards, I guess.
--
Regards,
John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadibloc@[email protected] (John Savard) to comp.arch on Sun Jun 14 19:14:39 2026

From Newsgroup: comp.arch

On Sat, 13 Jun 2026 15:07:00 -0000 (UTC), [email protected] (Waldek
Hebisch) wrote:

My country (Poland) has a rule that once copyright has expired
distributor of work should pay royalties to the state. I
am not sure how it works in "interesting" case, but clearly
this is quite different from US/UK meaning of public domain.

I once read a science-fiction story in which, after Earth joined an interplanetary confederation, works that were in the public domain
became works for which the United Nations could charge royalties to
people from other planets.

The story was a detective story. It wasn't Shakespeare, but instead
Bollywood movies that a criminal was modifying and re-selling to a
planet to the culture of which those movies were well suited.

Also, law of my country declares some author right as
untransferable. Basically, author can sue if he/she/it
thinks that artistic integrity of the work is violated.

Most European countries recognize the moral rights of authors.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From antispam@[email protected] (Waldek Hebisch) to comp.arch on Mon Jun 15 16:32:47 2026

From Newsgroup: comp.arch

John Levine <[email protected]> wrote:

According to Waldek Hebisch <[email protected]>:

My country (Poland) has a rule that once copyright has expired
distributor of work should pay royalties to the state. I
am not sure how it works in "interesting" case, but clearly
this is quite different from US/UK meaning of public domain.

Do distributors pay state royalties on works of Shakespeare?
The Bible? Wow.

I am not sure how they handle what in the English-speaking world is
called "derived works". If they reproduce a first edition of a
Shakespeare work or, say, the Gutenberg Bible, distributors are
supposed to pay (and I think that they do pay). Copyright to currently
sold Bible is attributed to translators.
--
Waldek Hebisch
--- Synchronet 3.22a-Linux NewsLink 1.2
