In any case, I think I've come up with something that is a reasonable compromise I can live with after all.
I've made my first change to Concertina IV. I'm not happy with the way things were before the change or the way they are now, so I may change it again.
The 16-bit short instructions only have 12 free bits available. That's not much to work with when there are 32 registers in each register bank.
Initially, I settled on four bits of opcode, along with the basic register specification scheme used for the 15-bit paired short instructions in Concertina II.
But choosing single and double precision floating-point as the only two types supported didn't rest easily with me. Single precision isn't really precise enough to be useful, or so I've heard.
The alternative of supporting 48-bit intermediate precision and double precision, while it appeals to me personally... is clearly untenable.
Medium is a nonstandard data type, and so it would not be widely used.
So instead I decided to only support double precision, and use the extra bits to allow additional ways to specify registers.
The result, of course, is messy.
So I'm considering going back to the earlier format, but instead of supporting two floating-point data types, to support one integer type and one floating type. But which integer type? 32-bit integer, or 64-bit long?
I could get more bits by going to _paired_ instructions. But I have some free space between 32-bit instructions so that I could just add those
while keeping 16-bit short instructions.
And this also led me to thinking about something else.
I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So integer operations must sign-extend if they're on values shorter than 64 bits.
Propagating a bit takes time.
So should I design the ALU so that the sign extension takes place after
the rest of the instruction, and allow another 32-bit (or shorter) integer instruction to use results when they're ready, before sign extension? Is that just normal efficiency, or wasteful complexity?
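For reference, a minimal sketch of what sign extension of a right-aligned value does to the register contents. It assumes two's-complement integers and a 64-bit register; the function name `sign_extend` is made up for illustration and is not part of Concertina IV.

```python
def sign_extend(value: int, bits: int, width: int = 64) -> int:
    """Return the width-bit register pattern for a bits-wide
    two's-complement value held right-aligned (illustrative helper)."""
    value &= (1 << bits) - 1            # keep only the low `bits` bits
    sign = 1 << (bits - 1)              # position of the narrow sign bit
    extended = (value ^ sign) - sign    # signed interpretation
    return extended & ((1 << width) - 1)


# A 32-bit -1 fills the upper half with ones; a positive value is unchanged.
print(hex(sign_extend(0xFFFFFFFF, 32)))
print(hex(sign_extend(0x7FFFFFFF, 32)))
```

The low 32 bits are the same before and after extension, which is the property a dependent 32-bit operation would rely on if it started early.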
In any case, I think I've come up with something that is a reasonable compromise I can live with after all.
John Savard
quadi <[email protected]d> posted:
I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So
integer operations must sign-extend if they're on values shorter than
64 bits.
Go LE all the way. LE won; get over BE thinking.
As far as integers go: all calculations produce proper integer values in
the 64-bit destination register.
S8 has range [-128..127]
u8 has range [0..255]
...
Propagating a bit takes time.
A solved HW gate-level problem.
On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So
integer operations must sign-extend if they're on values shorter than
64 bits.
Go LE all the way. LE won; get over BE thinking.
a) I didn't think this really had anything to do with little-endian
versus big-endian.
b) Yes, little-endian is more popular, but that's just because the PDP-11,
8080, and 6502 happened to choose it. Little-endian doesn't work as well
*if* you also want to put packed decimal values in registers.
As far as integers go: all calculations produce proper integer values
in the 64-bit destination register.
S8 has range [-128..127]
u8 has range [0..255]
...
If you have 64-bit registers, then if you want to avoid a gap between
the sign in a 32-bit number and the sign of a 64-bit number by placing
the 32-bit number on the most significant side, a 32-bit 1 is equal to
a 64-bit 4,294,967,296.
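A quick way to check the arithmetic is to model left alignment as a shift by 32 bits. This is only a sketch; the helper name `left_align_32_in_64` is made up for illustration.

```python
def left_align_32_in_64(value32: int) -> int:
    """Place a 32-bit pattern in the upper half of a 64-bit register
    (illustrative helper)."""
    return (value32 & 0xFFFFFFFF) << 32


# A 32-bit 1, read as a 64-bit integer, is 2**32.
print(left_align_32_in_64(1))
# The 32-bit sign bit lands on the 64-bit sign bit.
print(left_align_32_in_64(0x80000000) >> 63)
```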
Everything you have heard is both true and false::
There are many applications where DP is de rigueur {galactic
simulations} smaller precision simply will not do. Many of these would
like to go FP128 but performance is not there yet.
There is a growing demand for FP16 and FP8 data types for memory-size
and BW reasons.
There is a growing background need for FP128, too.
You will find you have no <marketable> choice; you need to support::
Integer{S8, S16, S32, S64, U8, U16, U32, U64}
Float {FP8, FP16, FP32, FP64 and some way to get FP128}
b) Yes, little-endian is more popular, but that's just because the PDP-11, 8080, and 6502 happened to choose it.
Little-endian doesn't work as well
*if* you also want to put packed decimal values in registers.
* 8080: Yes, because AMD64 inherited its byte order from it. But if
we go to the origin here, it's not the 8080 and not the 8008, but
the Datapoint 2200, which is remarkable, because it was designed as
a terminal for mainframes, and S/360 is big-endian.
<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:
|The fact that most laptops and cloud computers today store numbers
|in little-endian format is carried forward from the original
|Datapoint 2200. Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries. Microprocessors descended from the
|Datapoint 2200 (the 8008, Z80, and the x86 chips used in most
|laptops and cloud computers today) kept the little-endian format
|used by that original Datapoint 2200.
b) Yes, little-endian is more popular, but that's just because the PDP-11,
8080, and 6502 happened to choose it. Little-endian doesn't work as well *if* you also want to put packed decimal values in registers.
Anton Ertl <[email protected]> wrote:
[...]
<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:
|[...] Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries.
For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.
Thomas Koenig <[email protected]> writes:
Anton Ertl <[email protected]> wrote:
[...]
<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:
|[...] Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries.
For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.
Looks plausible at first, but when I think about it some more, both
claims are wrong.
Yes, you start with the least significant bit, but given that the architecture is not bit-addressed, this is irrelevant.
quadi <[email protected]d> writes:
b) Yes, little-endian is more popular, but that's just because the PDP-11, 8080, and 6502 happened to choose it.
Thinking about it:
With the BCD support of instruction sets typically requiring piecing
together the complete operation of suboperations of less than full
length (e.g., bytes on the 6502 and the 80(2)86), little-endian is
actually easier. When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go
backwards from there. This is especially relevant if you do not want
to completely unroll the loop that handles these bytes.
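The point about ascending addresses can be sketched as follows. This is an illustrative model, not 6502 or x86 code: it assumes two equal-length packed-BCD operands (two digits per byte) stored least significant byte first, and the function name `bcd_add_le` is hypothetical.

```python
def bcd_add_le(a: bytes, b: bytes):
    """Add two equal-length packed-BCD numbers stored least significant
    byte first. Returns (sum_bytes, carry_out). Illustrative only."""
    assert len(a) == len(b)
    out = bytearray()
    carry = 0
    for x, y in zip(a, b):                  # walk ascending addresses
        lo = (x & 0x0F) + (y & 0x0F) + carry
        carry = 1 if lo > 9 else 0
        lo -= 10 * carry                    # decimal adjust, low digit
        hi = (x >> 4) + (y >> 4) + carry
        carry = 1 if hi > 9 else 0
        hi -= 10 * carry                    # decimal adjust, high digit
        out.append((hi << 4) | lo)
    return bytes(out), carry


# 1999 + 0001 = 2000, stored low byte first.
total, carry = bcd_add_le(bytes([0x99, 0x19]), bytes([0x01, 0x00]))
print(total.hex(), carry)
```

The loop starts at the first byte of each operand and never needs to know the operand's end address in advance, only its length.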
* The last descendant of the PDP-11 was canceled long before the most
prominent big-endian architecture (SPARC) was canceled, and long
before Power switched its Linux support to little-endian, so the
PDP-11 had little, if any, influence on the outcome.
When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go backwards from there. This is especially relevant if you do not want to completely unroll the loop that handles these bytes.
On 5/20/26 04:09, quadi wrote:
b) Yes, little-endian is more popular, but that's just because the PDP-11,
8080, and 6502 happened to choose it. Little-endian doesn't work as well *if* you also want to put packed decimal values in registers.
For packed decimals that are processed in memory, little endian is
superior to big endian, because you don't have to look for the LSB when
performing an addition, you can proceed bytewise on ascending addresses.
Anton Ertl <[email protected]> wrote:
Thomas Koenig <[email protected]> writes:
Anton Ertl <[email protected]> wrote:
[...]
<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:
|[...] Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries.
For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.
Looks plausible at first, but when I think about it some more, both
claims are wrong.
Unfortunately, you are mistaken.
Yes, you start with the least significant bit, but given that the
architecture is not bit-addressed, this is irrelevant.
JMP with a two-byte address was little-endian on the Datapoint 2200,
On Wed, 20 May 2026 05:38:07 +0000, Anton Ertl wrote:
* The last descendant of the PDP-11 was canceled long before the most
prominent big-endian architecture (SPARC) was canceled, and long
before Power switched its Linux support to little-endian, so the
PDP-11 had little, if any, influence on the outcome.
The reason I blame the PDP-11 for everything is that it was a hugely
influential machine. It was widely used in academic settings, and it was
also the machine for which UNIX was first widely distributed.
When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go
backwards from there. This is especially relevant if you do not want to
completely unroll the loop that handles these bytes.
This is the reason little-endian was popular for small processors. It is
no longer relevant if a processor has a 64-bit data bus. And, of course,
it applies equally to binary and BCD.
The reason I claim that BCD support strongly favors big-endian byte order
is this:
Character strings are, of course, in "big endian" order; that is,
normally, a character string is written in memory with successive
characters at increasing addresses - and, at least in languages that are
written from left to right, numerals appear in texts with the most
significant digit first.
So if one has a hardware instruction to convert from BCD to the string
representation of numbers, such as UNPK or EDIT, then those two
representations should have the same endian-ness.
And if one wants to use the same ALU for binary and BCD arithmetic, then
those have to have the same endianness.
Thomas Koenig <[email protected]> writes:
Anton Ertl <[email protected]> wrote:
Thomas Koenig <[email protected]> writes:
Anton Ertl <[email protected]> wrote:
[...]
<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:
|[...] Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries.
For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.
Looks plausible at first, but when I think about it some more, both
claims are wrong.
Unfortunately, you are mistaken.
A claim without any supporting argument.
quadi <[email protected]d> writes:
Character strings are, of course, in "big endian" order; that is,
normally, a character string is written in memory with successive
characters at increasing addresses - and, at least in languages that are
written from left to right, numerals appear in texts with the most
significant digit first.
So if one has a hardware instruction to convert from BCD to the string
representation of numbers, such as UNPK or EDIT, then those two
representations should have the same endian-ness.
Reality check: Modern architectures tend to have byte-swap and shuffle instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.
For packed decimals that are processed in memory, little endian is
superior to big endian, because you don't have to look for the LSB when
performing an addition, you can proceed bytewise on ascending addresses.
The B3500 had a clever algorithm for adding BCD numbers. The
addend and augend could each be from 1 to 100 digits in length.
The algorithm would start adding from the lowest (most significant
digit in the longest operand) address of each operand adding
each digit in turn.
"The processor uses an adder that accumulates two fields
from the most significant to the least significant digit
positions. Reverse addition, as incorporated in the
B2500 and B3500 systems has the advantage of detecting
an overflow condition prior to altering the receiving field"
The algorithm used a 9's counter to track the leading
digits.
The 2200 did not have byte-addressable memory; memory contents only
could be used when they bubbled up through the shift registers.
Otherwise, the CPU had to wait. (It was a silicon version of the
mercury delay lines of the UNIVAC I).
So, how do you add or subtract values in memory? From low to high
value, saving carries. You then have a choice of either loading
them in sequence, in a single go, or to load the high value,
wait for half a microsecond and then load the low value.
Would you build such a machine in big-endian or little-endian?
On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So
integer operations must sign-extend if they're on values shorter than
64 bits.
Go LE all the way. LE won; get over BE thinking.
a) I didn't think this really had anything to do with little-endian versus big-endian.
b) Yes, little-endian is more popular, but that's just because the PDP-11, 8080, and 6502 happened to choose it. Little-endian doesn't work as well *if* you also want to put packed decimal values in registers.
As far as integers go: all calculations produce proper integer values in the 64-bit destination register.
S8 has range [-128..127]
u8 has range [0..255]
...
If you have 64-bit registers, then if you want to avoid a gap between the sign in a 32-bit number and the sign of a 64-bit number by placing the 32-bit number on the most significant side, a 32-bit 1 is equal to a 64-bit 4,294,967,296.
Propagating a bit takes time.
A solved HW gate-level problem.
That's good news, then I don't have a problem. I figured the solution
would be to use slightly slower gates with larger current output.
John Savard
According to Scott Lurndal <[email protected]>:
The B3500 had a clever algorithm for adding BCD numbers. The
addend and augend could each be from 1 to 100 digits in length.
The algorithm would start adding from the lowest (most significant
digit in the longest operand) address of each operand adding
each digit in turn.
"The processor uses an adder that accumulates two fields
from the most significant to the least significant digit
positions. Reverse addition, as incorporated in the
B2500 and B3500 systems has the advantage of detecting
an overflow condition prior to altering the receiving field"
The algorithm used a 9's counter to track the leading
digits.
How did it handle carries? Let's say you're adding
099999999999999999999999999999999999999999999999999
000000000000000000000000000000000000000000000000001
If it starts at the high digit, it won't know until it gets to the end
that it has to propagate carries all the way back to the beginning.
quadi <[email protected]d> writes:
On Wed, 20 May 2026 05:38:07 +0000, Anton Ertl wrote:
* The last descendant of the PDP-11 was canceled long before the most
prominent big-endian architecture (SPARC) was canceled, and long
before Power switched its Linux support to little-endian, so the
PDP-11 had little, if any, influence on the outcome.
The reason I blame the PDP-11 for everything is that it was a hugely
influential machine. It was widely used in academic settings, and it was
also the machine for which UNIX was first widely distributed.
But its byte order was not influential into this century. Unix and
its applications are portable, including between byte orders (or at
least they were, when there were still enough machines of either byte
order around that one could test that). And somehow the PDP-11 and
its offspring did not capture the workstation market and the server
market that evolved from that, and which constituted the Unix
markets.
Instead, the big-endian 68000 and its offspring dominated that market
for a while, and were replaced with RISCs later, which had the same
byte order as the earlier machines from the same company (i.e.,
little-endian for DEC and big-endian for the others). And when the
market for workstations and servers on RISCs shrank down to almost
nothing, not only did these big-endian machines vanish, but the
offspring of the PDP-11 as well (and actually before some of the
big-endian RISCs). What remains of this world is AIX on Power, and I
have no idea how many installations there still are.
Linux on Power was switched to little-endian with the introduction of
OpenPower, not because of the PDP-11 descendants, but because of the
Datapoint 2200 descendants. And the Datapoint 2200 (announced in June
1970) was probably not influenced by the PDP-11 (announced in January
1970).
When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go
backwards from there. This is especially relevant if you do not want to
completely unroll the loop that handles these bytes.
This is the reason little-endian was popular for small processors. It is
no longer relevant if a processor has a 64-bit data bus. And, of course,
it applies equally to binary and BCD.
If the numbers fit in one granule, yes, that benefit does not matter.
But 64 bits are not enough for all binary numbers and probably not for
all BCD numbers, either: the decimal FP people were not satisfied with
the 15-digit mantissa that are easily possible with their
representations in 64 bits; they did not even define a decimal64
format last I checked. So will 16-digit BCD numbers be satisfactory?
The reason I claim that BCD support strongly favors big-endian byte order
is this:
Character strings are, of course, in "big endian" order; that is,
normally, a character string is written in memory with successive
characters at increasing addresses - and, at least in languages that are
written from left to right, numerals appear in texts with the most
significant digit first.
So if one has a hardware instruction to convert from BCD to the string
representation of numbers, such as UNPK or EDIT, then those two
representations should have the same endian-ness.
Reality check: Modern architectures tend to have byte-swap and shuffle instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.
John Levine <[email protected]> writes:
According to Scott Lurndal <[email protected]>:
The B3500 had a clever algorithm for adding BCD numbers. The
addend and augend could each be from 1 to 100 digits in length.
The algorithm would start adding from the lowest (most significant
digit in the longest operand) address of each operand adding
each digit in turn.
"The processor uses an adder that accumulates two fields
from the most significant to the least significant digit
positions. Reverse addition, as incorporated in the
B2500 and B3500 systems has the advantage of detecting
an overflow condition prior to altering the receiving field"
The algorithm used a 9's counter to track the leading
digits.
How did it handle carries? Let's say you're adding
099999999999999999999999999999999999999999999999999
000000000000000000000000000000000000000000000000001
A value that overflows the size of the receiving field
cannot be represented, so the overflow toggle is set and
the instruction terminates _without modifying the
receiving field_.
The size of the receiving field is the larger of the
two source fields. So
ADD 0508 000000 100000 200000
would add the 5 digit value at address 0 to the
8 digit value at address 100000 and store the
result at address 200000.
If it starts at the high digit, it won't know until it gets to the end
that it has to propagate carries all the way back to the beginning.
Actually, that's the clever part. They count 9s.
Example 1: 10 digit receiving field, 10 digit addend, 1 digit augend:
Memory contents before:
000000: 9999999999
000010: 1
ADD 1001 000000 000010 000020
The result of the instruction is that the overflow toggle
will be set and the destination field will remain unmodified.
The algorithm implicitly fills leading zeros into
the shorter operand.
The first digit of the addend operand is read. '9' in
this case. The first digit of the augend is added (in this
case, implicitly zero) and the result is 9. A special
register (the 9's counter) is incremented and the algorithm
proceeds to the next digit. Wash, rinse and repeat until
reaching the last digit, where the sum of 9 + 1 will overflow
a single digit, so the instruction terminates with overflow.
If in the case you showed above, there was a zero in the
first digit of both operands, there is no possibility of
overflow and the algorithm will simply process each
digit of the addend+augend sequentially from higher
magnitude to lower magnitude. It delays writing each
digit of the sum (other than the last) until it knows
the following digit doesn't overflow. If it does
overflow, it increments the delayed value before
writing. To the extent that there are multiple sequential
9s in the sum, when the next digit would overflow, the
processor uses the 9's counter and the saved digit to
store the correct digits to the receiving field.
There's a flow chart in 1025475_B2500_B3500_RefMan_Oct69.pdf
which is available on bitsavers.
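The description above can be modelled in a few lines. This is a sketch reconstructed from the prose in this post, not from the flow chart in the reference manual, so the details (digit lists instead of memory fields, the names `add_msd_first`, `pending` and `nines`) are assumptions made for illustration.

```python
def add_msd_first(addend, augend):
    """Add two decimal digit lists (most significant digit first).
    Returns (overflow, digits). On overflow nothing has been written,
    so digits is None. Illustrative model only."""
    n = max(len(addend), len(augend))
    a = [0] * (n - len(addend)) + list(addend)   # implicit leading zeros
    b = [0] * (n - len(augend)) + list(augend)
    out = []          # stands in for the receiving field
    pending = None    # delayed digit, not yet written
    nines = 0         # the 9's counter
    for x, y in zip(a, b):
        s = x + y
        if s == 9:
            nines += 1                 # a later carry would ripple through
        elif s < 9:
            if pending is not None:
                out.append(pending)    # no carry can reach it any more
            out.extend([9] * nines)
            pending, nines = s, 0
        else:                          # s > 9: carry into the delayed digit
            if pending is None:
                return True, None      # carry out of the field: overflow
            out.append(pending + 1)
            out.extend([0] * nines)    # the run of 9s becomes 0s
            pending, nines = s - 10, 0
    if pending is not None:
        out.append(pending)
    out.extend([9] * nines)
    return False, out


print(add_msd_first([9] * 10, [1]))          # overflow, field untouched
print(add_msd_first([0] + [9] * 50, [1]))    # 1 followed by 50 zeros
```

In this model an overflow can only be reported while `pending` is still empty, that is, before anything has been appended to `out`, which matches the claim that the receiving field is left unmodified.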
On S/370 and later machines with virtual memory it was more complicated
since it had to check and be sure that all of the pages where the
operands resided were available.
So in commenting on a different part of my design entirely, you've
pointed out an important flaw I will have to correct.
On Wed, 20 May 2026 18:07:14 +0000, John Levine wrote:
On S/370 and later machines with virtual memory it was more complicated
since it had to check and be sure that all of the pages where the
operands resided were available.
Yes, since while the System/360 gave you an error if you tried to use
unaligned operands in memory, this restriction was abolished with the
System/370. Only an unaligned operand can possibly cross a page boundary,
since pages have a power-of-two size greater than the size of any data
type.
To make this work S/370 and its successors first do a trial execution of
the instruction without storing anything to see if it causes a page
fault. If not, it then redoes the instruction for real, storing the
result. I suspect that if they had known how soon S/370 would add
paging to the 360 architecture, they might have designed these
instructions differently.
quadi <[email protected]d> posted:
So instead I decided to only support double precision, and use the
extra bits to allow additional ways to specify registers.
My 66000 started out that way and the compiler showed that this choice
sucks.
When I first read that, I thought that you meant they would have designed
it differently when they designed the 370, but, of course, the
instructions already existed. After I realized my mistake, of course, I
also knew that back in 1964 or before, there was really no way that they
could possibly have known that.
I will have to review this point, however, to be sure.
Anton Ertl wrote:
But 64 bits are not enough for all binary numbers and probably not for
all BCD numbers, either: the decimal FP people were not satisfied with
the 15-digit mantissa that are easily possible with their
representations in 64 bits; they did not even define a decimal64
format last I checked. So will 16-digit BCD numbers be satisfactory?
IEEE 754 does define decimal64, decimal128 and even decimal32, but the
first two have pretty much all the actual usage, probably (?) decimal128
as the majority, at least for all accumulators.
Reality check: Modern architectures tend to have byte-swap and shuffle
instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.
BCD-to-ASCII, with the input in an AVX 32-byte register, so up to 64
digits, would start with an exchange of the high and low 16-byte halves,
then a permute of each half to reverse the order. The final single-cycle
operation is the only overhead of the little- vs big-endian inputs.
Next we duplicate the input by unpacking the high and low 16 bytes,
each byte value going into one of 16 16-bit shorts with the leading
byte 0, then (in parallel) you copy and mask the low nybble while
shifting all shorts up by 4 bits, then use the same all-15 mask to save
the high nybbles.
OR these two back together, and do the same for the other half of the
original input. About 15-20 cycles in total with well under 10% being
the byte order swap.
On Wed, 20 May 2026 15:42:03 +0000, Anton Ertl wrote:
quadi <[email protected]d> writes:
Reality check: Modern architectures tend to have byte-swap and shuffle
instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.
An additional instruction is an additional instruction!
Terje Mathisen <[email protected]> writes:
BCD-to-ASCII, with the input in an AVX 32-byte register, so up to 64
digits, would start with an exchange of the high and low 16-byte halves,
then a permute of each half to reverse the order. The final single-cycle
operation is the only overhead of the little- vs big-endian inputs.
Next we duplicate the input by unpacking the high and low 16 bytes,
each byte value going into one of 16 16-bit shorts with the leading
byte 0, then (in parallel) you copy and mask the low nybble while
shifting all shorts up by 4 bits, then use the same all-15 mask to save
the high nybbles.
OR these two back together, and do the same for the other half of the
original input. About 15-20 cycles in total with well under 10% being
the byte order swap.
My thinking was along the lines of using VPERMB to do the
byte-swapping, the duplicating, and the unpacking in one step. E.g.,
if you have a 64-bit BCD number 1234567890123456 as the following
sequence of bytes
56 34 12 90 78 56 34 12
Then you have the index vector
7 7 6 6 5 5 4 4 3 3 2 2 1 1 0 0
and VPERMB xmm1, xmm2, xmm3
(where the BCD number is in xmm3 and the index vector is in xmm2) will
put the following in xmm1:
12 12 34 34 56 56 78 78 90 90 12 12 34 34 56 56
So no extra instruction for the byte swapping.
The problem is that I now would like a masked parallel byte shift to
shift the even-indexed bytes right by 4 bits, but I don't find
parallel byte shifts. I guess the answer is to let the VPERMB arrange
the result as follows
1234 1234 5678 5678 9012 9012 3456 3456
^^^^ ^^^^ ^^^^ ^^^^
then use a masked VPSRLW for shifting the marked 16-bit pieces to the
right by 4 bits, resulting in
0123 1234 0567 5678 0901 9012 0345 3456
Now use VPSHUFB or VPERMB to rearrange the bytes so that the low nibble
of each byte holds the next digit:
01 12 23 34 05 56 67 78 09 90 01 12 03 34 45 56
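The three steps can be checked with a plain-Python model of the byte lanes. This is a sketch, not intrinsics code: `permute_bytes` merely stands in for a byte permute such as VPERMB, the index vectors are written in the same most-significant-first display order used in this post, and the final mask with 0x0F and OR with 0x30 is an assumed last step that the post does not spell out.

```python
def permute_bytes(src, idx):
    """Model of a byte permute: out[i] = src[idx[i]]."""
    return [src[i] for i in idx]


def bcd64_to_ascii(le_bytes: bytes) -> str:
    """Convert 8 bytes of little-endian packed BCD to a 16-char string."""
    assert len(le_bytes) == 8
    # Step 1: byte-swap and duplicate each 16-bit group in one permute.
    v = permute_bytes(le_bytes, [7, 6, 7, 6, 5, 4, 5, 4,
                                 3, 2, 3, 2, 1, 0, 1, 0])
    # View as eight 16-bit words, most significant byte first.
    words = [(v[i] << 8) | v[i + 1] for i in range(0, 16, 2)]
    # Step 2: masked word shift, right by 4 bits on every other word.
    words = [w >> 4 if k % 2 == 0 else w for k, w in enumerate(words)]
    v = [b for w in words for b in (w >> 8, w & 0xFF)]
    # Step 3: interleave bytes of each shifted/unshifted word pair.
    idx = [g + j for g in range(0, 16, 4) for j in (0, 2, 1, 3)]
    v = permute_bytes(v, idx)
    # Step 4 (assumed): keep the low nibble and add ASCII '0'.
    return bytes(0x30 | (b & 0x0F) for b in v).decode("ascii")


print(bcd64_to_ascii(bytes.fromhex("5634129078563412")))
```

Only the low nibble of each byte survives the last step, so whatever is left in the high nibbles after the shuffle does not matter.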
Finally, I have achieved my dream, insane and useless though it may be!
Well, I have taken the opportunity to squeeze one more little thing into
the instruction set that Concertina III had, but this time I could not squeeze quite as many of them in... 16-bit prefixes for instructions,
which allow the instruction set to be extended.
it will be necessary to have a special
compare instruction for unsigned integers.
Scott Lurndal wrote:
overflow and the algorithm will simply process each
digit of the addend+augend sequentially from higher
magnitude to lower magnitude. It delays writing each
digit of the sum (other than the last) until it knows
the following digit doesn't overflow. If it does
overflow, it increments the delayed value before
writing. To the extent that there are multiple sequential
9s in the sum, when the next digit would overflow, the
processor uses the 9's counter and the saved digit to
store the correct digits to the receiving field.
There's a flow chart in 1025475_B2500_B3500_RefMan_Oct69.pdf
which is available on bitsavers.
So it did process them top-down, but delayed writing anything to the
output field until it was known that it would not overflow, and the same
happened for every subsequent partial sum of 9.
Yeah, that works but it probably caused some output hiccups when a long
chain of potential carries finally resolved. :-)
On Thu, 21 May 2026 00:37:39 +0000, John Levine wrote:
result. I suspect that if they had known how soon S/370 would add
paging to the 360 architecture, they might have designed these
instructions differently.
When I first read that, I thought that you meant they would have designed
it differently when they designed the 370, but, of course, the
instructions already existed. After I realized my mistake, of course, I
also knew that back in 1964 or before, there was really no way that they
could possibly have known that.
On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
So instead I decided to only support double precision, and use the
extra bits to allow additional ways to specify registers.
My 66000 started out that way and the compiler showed that this choice sucks.
The good news is that this only concerns the 16-bit short instructions. A compiler can choose to ignore them if it can't handle them.
Currently, the 16-bit instructions provide the following:
All the basic operate instructions for two integer types; they can only operate on the first eight integer registers.
The basic floating operate instructions for one floating-point type; the register specification is the one used with Concertina II's paired 15-bit operate instructions; choose one of four banks of eight registers, and
both operands must be in that bank.
The idea is that it can be used for efficient pipelined code where four sequences of instructions which are independent are interleaved.
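To see that such a scheme fits in the 12 free bits, here is a purely hypothetical encoding of the floating-point case: 4 bits of opcode, 2 bits of bank, and 3 bits each for destination and source within the bank. The field order, the function names, and the mapping of bank b to registers 8b through 8b+7 are all assumptions made for illustration, not the actual Concertina IV layout.

```python
def encode_short_fp(opcode: int, bank: int, dst: int, src: int) -> int:
    """Pack a hypothetical 12-bit payload: opcode(4) bank(2) dst(3) src(3)."""
    if not (0 <= opcode < 16 and 0 <= bank < 4
            and 0 <= dst < 8 and 0 <= src < 8):
        raise ValueError("field out of range")
    return (opcode << 8) | (bank << 6) | (dst << 3) | src


def decode_short_fp(bits12: int):
    """Unpack the payload; returns (opcode, dst_reg, src_reg) with
    register numbers in 0..31 (assumed bank mapping)."""
    opcode = (bits12 >> 8) & 0xF
    bank = (bits12 >> 6) & 0x3
    dst = (bits12 >> 3) & 0x7
    src = bits12 & 0x7
    return opcode, 8 * bank + dst, 8 * bank + src


word = encode_short_fp(opcode=5, bank=2, dst=3, src=7)
print(hex(word), decode_short_fp(word))
```

Because both operands share one bank field, an instruction in this model cannot name registers from two different banks, which is the restriction being discussed.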
Everything else is straightforward; the 24-bit short instructions and all the 32-bit and longer instructions that operate on registers allow the use of all 32 registers in a bank.
Of course, though, the other restrictions are still present - seven
choices for an index register, seven choices for a base register (for each of three displacement sizes, 20, 16, and 12 bits).
I think I have indeed achieved the goal which, when I started out, I
thought might prove to be an "impossible dream" - combining what a CISC instruction set offers with what a RISC instruction set offers, and yet doing so without making the instructions longer than they usually are in those instruction types.
Except for register-to-register operate instructions being 24 bits instead of 16 bits, this has been achieved - but for a very limited subset of the possible register-to-register operate instructions, chosen by me as the
ones I think are the most useful and popular - and I realize the choice is subjective and hence potentially controversial - the 16-bit instruction length is retained!
I think it's an ISA that, in this respect, has achieved more than anyone could have expected!
Now, of course, whether or not this is an achievement that anyone cares about, that anyone wants, that anyone is interested in... well, I don't know.
John Savard
On Thu, 21 May 2026 00:06:54 +0000, quadi wrote:
I will have to review this point, however, to be sure.
Although I have not yet completed that review, it has become apparent
that, since I want the compare instruction to produce a correct result for signed numbers even if one is comparing, say, a positive number and a negative number which are both over half of the maximum possible magnitude for their format... it will be necessary to have a special compare instruction for unsigned integers.
Since there is opcode space for that readily available, though, there is
no difficulty in adding that.
John Savard
Although I have not yet completed that review, it has become apparent
that, since I want the compare instruction to produce a correct result
for signed numbers even if one is comparing, say, a positive number and
a negative number which are both over half of the maximum possible
magnitude for their format... it will be necessary to have a special
compare instruction for unsigned integers.
quadi <[email protected]d> posted:
it will be necessary to have a special
compare instruction for unsigned integers.
Or a wider condition register !
quadi <[email protected]d> posted:
Currently, the 16-bit instructions provide the following:
All the basic operate instructions for two integer types; they can only
operate on the first eight integer registers.
I suspect you (and compiler) will end up not liking the restriction.
The basic floating operate instructions for one floating-point type;
the register specification is the one used with Concertina II's paired
15-bit operate instructions; choose one of four banks of eight
registers, and both operands must be in that bank.
I suspect you (and compiler) will end up not liking the restriction.
Amazingly enough, however, it turned out that in each case there was no difficulty in finding the additional opcode space that was needed.
I even managed to find enough opcode space to increase the size of the displacement field from 8 bits to 9 bits in all the branch instructions,
so that having 24-bit short instructions doesn't shorten their range.
Although I have not yet completed that review, it has become apparent
that, since I want the compare instruction to produce a correct result for signed numbers even if one is comparing, say, a positive number and a negative number which are both over half of the maximum possible magnitude for their format... it will be necessary to have a special compare instruction for unsigned integers.
The compare instruction in my ISA _does not_ return the same condition
codes as the subtract instruction. So if I compare bytes, the compare instruction will correctly indicate that -100 is less than 100. The fact that if you subtracted -100 from 100 as byte values, you wouldn't get 200, since that doesn't fit into a signed byte, but the negative value -56 is neither here nor there.
Because of this special handling of the MSB, I do need a different compare instruction - not just the modified branch instructions for unsigned
values - to yield correct behavior.
You only need that if your flags are insufficiently expressive (i.e.,
less powerful than NCZV).
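To make the NCZV point concrete, here is a minimal sketch of an 8-bit compare that sets all four flags from the subtraction, and of how signed and unsigned "less than" can both be read off the same flags afterwards. The type and function names are made up for the example, and C is taken to mean "borrow", which is only one of the conventions in use.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags8;

/* Flags from the 8-bit subtraction a - b, as a compare would set them.
   C here means "borrow" (set when a < b as unsigned values). */
static flags8 compare8(uint8_t a, uint8_t b)
{
    uint8_t diff = (uint8_t)(a - b);
    flags8 f;
    f.n = (diff & 0x80u) != 0;
    f.z = diff == 0;
    f.c = a < b;
    /* Signed overflow: the operands differ in sign and the result's sign
       differs from the first operand's. */
    f.v = (((a ^ b) & (a ^ diff)) & 0x80u) != 0;
    return f;
}

/* The same flags answer both questions; only the test differs. */
static bool signed_less(flags8 f)   { return f.n != f.v; }
static bool unsigned_less(flags8 f) { return f.c; }
```

Comparing the bytes -100 and 100 this way sets V, and N xor V still reports -100 as the smaller value, so the wrapped difference does not fool the signed test.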
However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to
fix that instead, so I likely will rework this part of the ISA into
something more conventional.
On Fri, 22 May 2026 07:35:36 +0000, Anton Ertl wrote:
You only need that if your flags are insufficiently expressive (i.e.,
less powerful than NCZV).
While the System/360 had only two condition code bits, I do plan to have full VZNC bits. However, unlike the System/360,
set of sixteen conditional branch instructions. I just have twelve: eight instructions for testing between negative, zero, and positive nonzero in...
any combination, and instructions for separately testing for carry and overflow.
I want a compare instruction which, for integers, isn't fooled by
overflows - and overflows happen at a different point in the two's complement number circle for signed and unsigned; for unsigned, basically carry takes the role of overflow. And I don't want to have to do two instructions for the conditional branch afterwards to handle that.
=). One question in such a design is if there are cases where you want to have the unsigned and signed conditions for the same operands,
On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:
However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to
fix that instead, so I likely will rework this part of the ISA into
something more conventional.
I have made the first set of changes, using five-bit condition code fields to nicely and fully handle both the signed and unsigned cases; I checked what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.
my assumed behavior that
everything should just fail if there's an overflow... is reasonable for floating-point numbers.
On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:
However, if I have enough opcode space to add a U bit to all the conditional branch instructions, then I also have enough opcode space to fix that instead, so I likely will rework this part of the ISA into something more conventional.
I have made the first set of changes, using five-bit condition code fields to nicely and fully handle both the signed and unsigned cases; I checked what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.
(Worse yet, it used separate condition codes for floating-point numbers, which makes sense, given that they were originally in a coprocessor, but that means an extra set of instructions is needed.)
So, while it used a four-bit condition code field, I needed a five-bit one.
I did notice it didn't just always fail the signed tests if overflow was present; instead, in that case it switched plus and minus. Given that, and treating carry the same way for unsigned tests, you likely are right that
an unsigned compare is not needed. Oh, wait; my assumed behavior that everything should just fail if there's an overflow... is reasonable for floating-point numbers.
John Savard
quadi <[email protected]d> posted:
On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:
However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to fix that instead, so I likely will rework this part of the ISA into
something more conventional.
I have made the first set of changes, using five-bit condition code fields to nicely and fully handle both the signed and unsigned cases; I checked
what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.
(Worse yet, it used separate condition codes for floating-point numbers,
which makes sense, given that they were originally in a coprocessor, but
that means an extra set of instructions is needed.)
So, while it used a four-bit condition code field, I needed a five-bit one.
x86 uses COZAP but this includes P=parity, which it is unlikely you do.
Thus, 4 bits are sufficient to define 16-states, of which you only need 10-states signless{EQ, NEQ}, signed{>=, >, <, <=}, unsigned{>=, >, <, <=}.
quadi <[email protected]d> writes:
On Fri, 22 May 2026 07:35:36 +0000, Anton Ertl wrote:
You only need that if your flags are insufficiently expressive (i.e.,
less powerful than NCZV).
While the System/360 had only two condition code bits, I do plan to have full VZNC bits. However, unlike the System/360,
The S/360 is a mess as far as dealing with conditions is concerned.
Or is there a great underlying principle involved, and I fail to see
it? I doubt it, for the following reasons: 1) I have not come across
any description that explained the underlying principle, and in fact I
have come across few descriptions at all. 2) In the 62 years that
S/360 has been available, it has not found any successors in its
particular approach to conditions.
quadi <[email protected]d> writes:
On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:
I have made the first set of changes, using five-bit condition code
fields to nicely and fully handle both the signed and unsigned cases; I checked what the Motorola 68000 did, and found that it only provided a complete set of tests for signed values, but only two tests for unsigned ones.
I see four tests for unsigned conditions on the 68000 <https://en.wikibooks.org/wiki/68000_Assembly/Conditional_Tests>:
HI >
LS <=
CC >=
CS <
For the signed ones there is
GT >
LE <=
GE >=
LT <
quadi <[email protected]d> writes:
On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:
However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to fix that instead, so I likely will rework this part of the ISA into
something more conventional.
I have made the first set of changes, using five-bit condition code fields to nicely and fully handle both the signed and unsigned cases; I checked
what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.
I see four tests for unsigned conditions on the 68000 <https://en.wikibooks.org/wiki/68000_Assembly/Conditional_Tests>:
HI >
LS <=
CC >=
CS <
For the signed ones there is
GT >
LE <=
GE >=
LT <
my assumed behavior that
everything should just fail if there's an overflow... is reasonable for
floating-point numbers.
The usual setup is that FP operations silently overflow to +INF and
underflow to -INF. They do set sticky flags (called "exceptions" in
the IEEE FP standard) on various conditions, including on overflows,
but also on rounding errors ("inexact").
- anton
On Sat, 23 May 2026 09:28:45 +0000, Anton Ertl wrote:
I see four tests for unsigned conditions on the 68000
<https://en.wikibooks.org/wiki/68000_Assembly/Conditional_Tests>:
HI >
LS <=
CC >=
CS <
For the signed ones there is
GT >
LE <=
GE >=
LT <
What I was going by was Table 3-19 on page 3-19 of the M68000 Family Programmer's Reference Manual on the Internet Archive from Bitsavers; it gives the available condition code tests on the architecture as:
0000  True
0001  False
0010  High              not C and not Z
0011  Low or Same       C or Z
0100  Carry Clear
0101  Carry Set
0110  Not Equal         not Z
0111  Equal             Z
1000  Overflow Clear    not V
1001  Overflow Set      V
1010  Plus              not N
1011  Minus             N
1100  Greater or Equal  (N and V) or (not N and not V)
1101  Less Than         (N and not V) or (not N and V)
1110  Greater Than      (N and V and not Z) or (not N and not V and not Z)
1111  Less or Equal     Z or (N and not V) or (not N and V)
I took Low or Same as unsigned, and Plus, Minus, Greater or Equal, Less Than, Greater Than, and Less or Equal as signed.
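For readers who want to experiment with the table, here is a minimal sketch that evaluates a four-bit condition field against the N, Z, V and C flags. The struct and function names are made up for the example; Carry Clear and Carry Set are taken to be "not C" and "C", which the table as quoted does not spell out.

```c
#include <stdbool.h>

typedef struct { bool n, z, v, c; } ccr;

/* Evaluate a four-bit condition field against the flags,
   following the table quoted above. */
static bool test_condition(unsigned cond, ccr f)
{
    switch (cond & 0xFu) {
    case 0x0: return true;                  /* True */
    case 0x1: return false;                 /* False */
    case 0x2: return !f.c && !f.z;          /* High */
    case 0x3: return f.c || f.z;            /* Low or Same */
    case 0x4: return !f.c;                  /* Carry Clear (assumed: not C) */
    case 0x5: return f.c;                   /* Carry Set (assumed: C) */
    case 0x6: return !f.z;                  /* Not Equal */
    case 0x7: return f.z;                   /* Equal */
    case 0x8: return !f.v;                  /* Overflow Clear */
    case 0x9: return f.v;                   /* Overflow Set */
    case 0xA: return !f.n;                  /* Plus */
    case 0xB: return f.n;                   /* Minus */
    case 0xC: return f.n == f.v;            /* Greater or Equal */
    case 0xD: return f.n != f.v;            /* Less Than */
    case 0xE: return f.n == f.v && !f.z;    /* Greater Than */
    default:  return f.z || f.n != f.v;     /* Less or Equal */
    }
}
```

Read this way, High, Low or Same, Carry Clear and Carry Set together cover the four unsigned orderings after a compare, while the last four rows cover the signed ones.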
The S/360 is a mess as far as dealing with conditions is concerned.
Or is there a great underlying principle involved, and I fail to see
it? I doubt it, for the following reasons: 1) I have not come across
any description that explained the underlying principle, and in fact I
have come across few descriptions at all. 2) In the 62 years that
S/360 has been available, it has not found any successors in its
particular approach to conditions.
On 2026-05-23 5:28 a.m., Anton Ertl wrote:
quadi <[email protected]d> writes:
On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:
However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to fix that instead, so I likely will rework this part of the ISA into
something more conventional.
I have made the first set of changes, using five-bit condition code fields to nicely and fully handle both the signed and unsigned cases; I checked what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.
I see four tests for unsigned conditions on the 68000 <https://en.wikibooks.org/wiki/68000_Assembly/Conditional_Tests>:
HI >
LS <=
CC >=
CS <
CS may also be called LO
CC may also be called HS
For the signed ones there is
GT >
LE <=
GE >=
LT <
my assumed behavior that
everything should just fail if there's an overflow... is reasonable for
floating-point numbers.
The usual setup is that FP operations silently overflow to +INF and underflow to -INF. They do set sticky flags (called "exceptions" in
Methinks overflow could be to +/- INF and underflow to zero or a denormal.
the IEEE FP standard) on various conditions, including on overflows,
but also on rounding errors ("inexact").
- anton
If one has CVNZ it is enough for both signed and unsigned integer conditional testing using only four bits.
The CVNZ could be repurposed for float comparisons. V = INF. C=inexact
for instance.
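On the floating-point side of this exchange (overflow going to +/- INF, underflow going to zero or a denormal), a minimal sketch can show the default results without any trap. It assumes IEEE 754 double precision with the default rounding mode and gradual underflow enabled; the function names are made up for the example.

```c
#include <float.h>
#include <stdbool.h>

/* volatile keeps the compiler from folding the arithmetic away. */
static double scale(double x, double factor)
{
    volatile double vx = x, vf = factor;
    volatile double r = vx * vf;
    return r;
}

/* Overflow: the result is larger than any finite double, i.e. +INF. */
static bool overflow_gives_inf(void)
{
    return scale(DBL_MAX, 2.0) > DBL_MAX;
}

/* Mild underflow: half the smallest normal number is a denormal. */
static bool underflow_gives_denormal(void)
{
    double r = scale(DBL_MIN, 0.5);
    return r > 0.0 && r < DBL_MIN;
}

/* Severe underflow: the product is too small even for a denormal. */
static bool underflow_gives_zero(void)
{
    return scale(DBL_MIN, DBL_MIN) == 0.0;
}
```

None of these operations stops the program; the sticky overflow, underflow and inexact flags are the only record that something happened.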
According to Anton Ertl <[email protected]>:
The S/360 is a mess as far as dealing with conditions is concerned.
Or is there a great underlying principle involved, and I fail to see
it? I doubt it, for the following reasons: 1) I have not come across
any description that explained the underlying principle, and in fact I
have come across few descriptions at all. 2) In the 62 years that
S/360 has been available, it has not found any successors in its
particular approach to conditions.
I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.
I agree that nobody else did that, and in retrospect it was an overoptimization.
I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.
S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running
out of PSW space.
According to MitchAlsup <[email protected]d>:
I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.
S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running
out of PSW space.
They'd also have been better off making the addresses 32 bits and not
putting junk in the high byte, which caused endless pain later, but
they were really really worried about making low end models with 8K
bytes usable.
Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat addressing.
Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat
addressing.
B+X+D addressing only got 12-bits
B+D addressing was for RS and SS instructions
I think they thought they were saving on complexity and HW logic, but
According to MitchAlsup <[email protected]d>:
Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat
addressing.
B+X+D addressing only got 12-bits
B+D addressing was for RS and SS instructions
four bits of B, 12 bits of D, 16 bit addresses
you're right that RX used another four bits.
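The field arithmetic in this exchange can be sketched as an effective-address calculation: a 4-bit base register field, a 4-bit index register field and a 12-bit displacement, summed and truncated to 24 bits. The function name and the register-file representation are made up for the example; a register field of 0 is treated as contributing zero, as on the S/360.

```c
#include <stdint.h>

/* Effective address for an RX-format operand: base + index + 12-bit
   displacement, truncated to 24 bits. A register field of 0 contributes
   zero rather than the contents of general register 0. */
static uint32_t rx_effective_address(const uint32_t gpr[16],
                                     unsigned x, unsigned b, unsigned d)
{
    uint32_t ea = d & 0xFFFu;
    if (b & 0xFu) ea += gpr[b & 0xFu];
    if (x & 0xFu) ea += gpr[x & 0xFu];
    return ea & 0x00FFFFFFu;
}
```

Passing x = 0 gives the plain B+D form used by the RS and SS formats, so the 16 bits of B and D reach any 24-bit address as long as a base register points within 4095 bytes below it.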
I think they thought they were saving on complexity and HW logic, but
We don't have to guess. "Architecture of the IBM System/360" by Amdahl, Blaauw, and Brooks in the IBM Systems Journal in April 1964 described a lot of the reasoning, and they wrote a whole book about it.
They had to make a lot of other design decisions like 6 vs 8 bit
bytes, ones- vs twos-complement, length fields vs word marks for
variable length data, stack vs registers, floating point format (they
blew that one).
They said that the combination of a full length base register and a
short displacement "gives consequent gains in instruction density. The base-register approach was adopted, and then augmented, for some instructions, with a second level of indexing."
In retrospect, B+X+D was probably a mistake since I believe that
double indexing is rarely used, and easy to do with an extra register
add.
On the other hand, it's not obvious what a better use of the X
field would have been. I suppose they could have made instructions
three operand, e.g.
A Rx,Ry,B(D)
would add the memory operand to Ry and put it in Rx but it was a long
time until compilers could make good use of that.
Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat addressing.
S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running out
of PSW space.
On Sat, 23 May 2026 20:03:34 +0000, MitchAlsup wrote:
S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running
out of PSW space.
Remember the System/370, and its Extended Control Mode? All they lost
was the ability to switch the computer into an ASCII mode nobody ever
used.
In retrospect, B+X+D was probably a mistake since I believe that double indexing is rarely used, and easy to do with an extra register add. On
the other hand, it's not obvious what a better use of the X field would
have been. I suppose they could have made instructions three operand,
e.g.
A Rx,Ry,B(D)
would add the memory operand to Ry and put it in Rx but it was a long
time until compilers could make good use of that.
Of course, though, people must have been able to get C compilers working
on z/Architecture, despite inefficiencies, or it wouldn't be possible to install Linux on those machines.
I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.
On Sun, 24 May 2026 01:43:29 +0000, John Levine wrote:
In retrospect, B+X+D was probably a mistake since I believe that double
indexing is rarely used, and easy to do with an extra register add. On
the other hand, it's not obvious what a better use of the X field would
have been. I suppose they could have made instructions three operand,
e.g.
A Rx,Ry,B(D)
would add the memory operand to Ry and put it in Rx but it was a long
time until compilers could make good use of that.
Since there were three-address machines back in the days before general registers, I am surprised to hear that they didn't know how to write compilers that made use of such a field.
But the "better use of the X field" is obvious - make the displacement
field 16 bits instead of 12 bits. Except, of course, that this would have killed the SS format of instructions.
But I don't agree that B+X+D is a bad thing. An extra register add is an extra instruction. And it's not rarely used; it's used every time an array is accessed, and arrays are often accessed in inner loops!
On Sat, 23 May 2026 20:09:54 +0000, John Levine wrote:
Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat
addressing.
12 bits, of course. And they felt that 12 bits were enough because memory was such an issue back then.
In hindsight, of course having a two-bit condition code was a "mistake".
But C hadn't been invented yet, so nobody knew there would be any real use for unsigned integers.
And the PSW really was full - when IBM went to System/370, they had to repurpose a bit in the PSW that was already assigned to an existing
feature, ASCII mode. Since nobody ever used it, however, using it instead for the System/370's "Extended Control Mode", wherein the PSW *did* get doubled in length was possible.
Sure they did. S/360 had separate unsigned versions of add and subtract instructions. The results were the same but the condition codes were different and the unsigned versions couldn't overflow.
According to quadi <[email protected]d>:
But the "better use of the X field" is obvious - make the displacement field 16 bits instead of 12 bits. Except, of course, that this would
have killed the SS format of instructions.
Or worse had some instructions with 12 bit displacement and some with 16 which would have been a programming nightmare.
In retrospect, B+X+D was probably a mistake since I believe that
double indexing is rarely used, and easy to do with an extra register
add.
That is the view of MIPS and RISC-V
That is not the view of x86 or ARM or My 66000 or Mc 88K
Most[1] architectures before the S/360 use ones-complement or
sign/magnitude representation for integers, and trap on overflow [2],
Concerning the question about why IBM chose big-endian for the S/360
On Sun, 24 May 2026 09:32:07 +0000, Anton Ertl wrote:
Most[1] architectures before the S/360 use ones-complement or
sign/magnitude representation for integers, and trap on overflow [2],
It makes sense to trap on a floating-point overflow, but trapping on an integer overflow is usually a terrible idea.
John Savard
quadi <[email protected]d> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior. Otherwise, programs like random number generators wouldn't work.
John Savard
quadi <[email protected]d> posted:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
It makes sense to trap on a floating-point overflow, but trapping on an integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.
They work just fine using unSigned integers.
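As an illustration of a random number generator that relies on wraparound but stays within unsigned arithmetic, here is a minimal sketch of one step of a 64-bit linear congruential generator. The function name is made up for the example; the constants are the ones Knuth uses for MMIX.

```c
#include <stdint.h>

/* One step of a 64-bit linear congruential generator. The multiply and
   add are meant to wrap modulo 2^64, which unsigned arithmetic in C
   guarantees, so no overflow trap could ever be involved. */
static uint64_t lcg_next(uint64_t state)
{
    return state * 6364136223846793005ULL + 1442695040888963407ULL;
}
```

Written with signed operands instead, the same recurrence would overflow on almost every step, which is the case the trap-by-default objection is about.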
You will find you have no <marketable> choice; you need to support::
Integer{S8, S16, S32, S64, U8, U16, U32, U64}
Float {FP8, FP16, FP32, FP64 and some way to get FP128}
The 16-bit and 24-bit short instructions could not be so modified. But
there were a few unused opcodes; so Divide Extensibly Unsigned could
still fit in, just out of place.
But that meant that this one operation would be missing from the
minimum-length immediate instructions, and would still be treated as
out of the basic instruction set, getting immediate instructions that
were 16 bits longer, for them.
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.
John Savard
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
It makes sense to trap on a floating-point overflow, but trapping on an integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).
David Brown <[email protected]> writes:
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
It makes sense to trap on a floating-point overflow, but trapping on an integer overflow is usually a terrible idea.
Most programming environments I have had contact with don't trap on floating-point overflow.
So, detecting something went wrong and you should inform the programmer is a bad idea ???
The question is if an integer overflow means that something went
wrong.
Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).
This supposedly helpful feature has been neglected by C compiler
developers, and you see in the progression from MIPS (1986) to Alpha
(1992) and then RISC-V (2011) that the hardware architects have
accepted that:
MIPS: add traps on signed overflow, you need to write addu if you
don't want that.
Alpha: add ignores signed overflow, you need to write addv if you want
the trapping.
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
- anton
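To show why detecting signed overflow is "pretty involved" when no trapping add and no flags are available, here is a minimal sketch of the check done by hand in C. The function name is made up for the example; the conversion from uint64_t back to int64_t is implementation-defined in ISO C, and GCC documents it as reduction modulo 2^64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Add two signed 64-bit values without invoking undefined behaviour and
   report whether the mathematical sum was out of range. Overflow happened
   exactly when both operands have the same sign and the wrapped sum has
   the other one. */
static bool add_overflows(int64_t a, int64_t b, int64_t *wrapped)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t sum = ua + ub;                 /* wraps modulo 2^64 */
    *wrapped = (int64_t)sum;                /* modular on GCC */
    return (((ua ^ sum) & (ub ^ sum)) >> 63) != 0;
}
```

That is an add, two exclusive-ors, an and, and a sign test for every checked addition, compared with a single instruction on an ISA that offers a trapping add.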
On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:
You will find you have no <marketable> choice; you need to support::
Integer{S8, S16, S32, S64, U8, U16, U32, U64}
Float {FP8, FP16, FP32, FP64 and some way to get FP128}
After realizing that I did need a second instruction for unsigned
_division_ I then learned, to my shock, that division was not one, but
two, instructions, at least in my architecture, for integers.
And there didn't seem to be enough opcode space left for Divide Extensibly Unsigned.
I was able to re-adjust the 32-bit operate instructions so that the two places where only 96 opcodes were provided for the basic operate instructions could now provide 128 opcodes.
The 16-bit and 24-bit short instructions could not be so modified. But
there were a few unused opcodes; so Divide Extensibly Unsigned could still fit in, just out of place.
But that meant that this one operation would be missing from the minimum-length immediate instructions, and would still be treated as out of the basic instruction set, getting immediate instructions that were 16 bits longer, for them.
The Pigeonhole Principle has finally bit me!
John Savard
David Brown <[email protected]> writes:
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
This supposedly helpful feature has been neglected by C compiler
developers, and you see in the progression from MIPS (1986) to Alpha
(1992) and then RISC-V (2011) that the hardware architects have
accepted that:
MIPS: add traps on signed overflow, you need to write addu if you
don't want that.
Alpha: add ignores signed overflow, you need to write addv if you want
the trapping.
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
- anton
On 25/05/2026 16:28, Anton Ertl wrote:
Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different
instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).
An awkward thing about using trap on overflow is determining how
precisely it is defined. Supposing you have the expression "a + b - a".
Perhaps "a + b" overflows. I would hope that when using debug-related
compiler flags such as "-fsanitize=signed-integer-overflow", a compiler would check for overflow on "a + b", and report it at runtime. (Unfortunately, gcc does not do that unless the partial expression is assigned to a variable.) But in "normal" usage, I'd expect the
expression to be simplified, resulting in just "b" and no overflow.
If "trap on overflow" has precise semantics in the code, then this
disables a range of useful optimisations and re-arrangements. If it is
just "use trapping arithmetic instructions", then it will miss many
possible cases of actual overflow in the code, which we might want to
catch.
And "trap on overflow" might either trigger when there is no
overflow in the original code, or hinder optimisations. (Consider the expression "x / 2 + y / 2" - the compiler could implement that as a
combined "(x + y) / 2", but that might introduce overflow.)
It is not easy to see how a tool can avoid false positives and false negatives and also conveniently optimise and re-arrange code.
Compilers have not always been good at taking advantage of all the
features provided by hardware
nor have languages been good at exposing
the possibilities in the language so that programmers can take advantage
of them.
David Brown <[email protected]> writes:
On 25/05/2026 16:28, Anton Ertl wrote:
Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different
instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).
An awkward thing about using trap on overflow is determining how
precisely it is defined. Supposing you have the expression "a + b - a".
Perhaps "a + b" overflows. I would hope that when using debug-related
compiler flags such as "-fsanitize=signed-integer-overflow", a compiler would check for overflow on "a + b", and report it at runtime. (Unfortunately, gcc does not do that unless the partial expression is assigned to a variable.) But in "normal" usage, I'd expect the
expression to be simplified, resulting in just "b" and no overflow.
OTOH, cases like a+b+c where the result is in range, while an
intermediate result is out of range are one of the reasons why I
prefer -fwrapv over -ftrapv. As for your preference of nasal demons,
given enough information, the compiler might "optimize" "a+b-a" into,
e.g., 0.
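The a+b+c case can be made concrete with a minimal sketch of wrapping semantics, written through unsigned arithmetic so that it does not depend on -fwrapv being in effect. The function name is made up for the example, and the final conversion back to a signed type is implementation-defined in ISO C (GCC documents it as modular).

```c
#include <stdint.h>

/* Sum three signed values with wraparound semantics. The intermediate
   result may leave the int64_t range; the final one is still correct
   whenever the mathematical sum fits. */
static int64_t wrapping_sum3(int64_t a, int64_t b, int64_t c)
{
    uint64_t s = (uint64_t)a + (uint64_t)b + (uint64_t)c;
    return (int64_t)s;   /* modular conversion, as GCC documents it */
}
```

With a = INT64_MAX, b = 1 and c = -1, the partial sum a+b is out of range, yet the result is INT64_MAX; trapping arithmetic would stop at the first addition instead.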
Anyway, the definition of -ftrapv is not very precise; for gcc-12.2:
|'-ftrapv'
| This option generates traps for signed overflow on addition,
| subtraction, multiplication operations.
As for what gcc-12.2 does for your example on AMD64:
long foo(long a, long b)
{
return a+b-a;
}
is compiled with gcc -O3 -ftrapv to:
0: 48 89 f0 mov %rsi,%rax
3: c3 ret
If "trap on overflow" has precise semantics in the code, then this
disables a range of useful optimisations and re-arrangements. If it is
just "use trapping arithmetic instructions", then it will miss many
possible cases of actual overflow in the code, which we might want to
catch.
Which would you prefer by default?
The gcc developers apparently took the latter approach, even when you
ask for -ftrapv explicitly. So what, IYO, speaks against doing that
by default on machines like MIPS and Alpha.
And "trap on overflow" might either trigger when there is no
overflow in the original code, or hinder optimisations. (Consider the
expression "x / 2 + y / 2" - the compiler could implement that as a
combined "(x + y) / 2", but that might introduce overflow.)
x/2+y/2 produces a different result from (x+y)/2 when both x and y are
odd integers.
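A quick way to see the difference, as a sketch with made-up function names: C integer division truncates toward zero, so each odd operand loses its half before the addition in the first form but not in the second.

```c
/* Divide each operand first, then add (the expression as written). */
static long half_then_add(long x, long y)
{
    return x / 2 + y / 2;
}

/* Add first, then divide (the "combined" form); x + y may overflow. */
static long add_then_half(long x, long y)
{
    return (x + y) / 2;
}
```

For x = 3 and y = 5 the first form gives 1 + 2 = 3 and the second gives 8 / 2 = 4.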
gcc-12.2 compiles
long bar(long x, long y)
{
return x/2+y/2;
}
on AMD64 to:
gcc -O3 -ftrapv gcc -O3
mov %rdi,%rax mov %rdi,%rax
sub $0x8,%rsp mov %rsi,%rdx
shr $0x3f,%rax shr $0x3f,%rax
add %rax,%rdi shr $0x3f,%rdx
mov %rsi,%rax add %rdi,%rax
shr $0x3f,%rax add %rsi,%rdx
sar %rdi sar %rax
add %rax,%rsi sar %rdx
sar %rsi add %rdx,%rax
call __addvdi3@PLT ret
add $0x8,%rsp
ret
so the -ftrapv introduces an additional mov and a call; I would have
expected that the + would be compiled to an ADD instruction followed
by a JO instruction.
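For comparison, a sketch of how one can ask for the check explicitly in source code. It relies on __builtin_add_overflow, which is a GCC/Clang extension rather than standard C, and the helper names add_checked and add_or_abort are invented here. On AMD64 such code is typically compiled to an add followed by a conditional jump on the overflow flag, though that depends on the compiler version and options.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns true if a + b overflows long; *result then holds the
   wrapped value.  Uses a GCC/Clang builtin, not ISO C. */
static bool add_checked(long a, long b, long *result)
{
    return __builtin_add_overflow(a, b, result);
}

/* Trap-like behaviour: abort the program on overflow. */
static long add_or_abort(long a, long b)
{
    long r;
    if (add_checked(a, b, &r))
        abort();
    return r;
}
```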
Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:
gcc -O3 -ftrapv gcc -O3
lui gp,0x0 srl v0,a0,0x1f
addiu gp,gp,0 srl v1,a1,0x1f
addu gp,gp,t9 addu v0,v0,a0
srl v1,a0,0x1f addu a1,v1,a1
lw t9,__addvsi3(gp) sra v0,v0,0x1
srl v0,a1,0x1f sra a1,a1,0x1
addiu sp,sp,-32 jr ra
addu a0,v1,a0 addu v0,v0,a1
addu a1,v0,a1
sra a0,a0,0x1
sw ra,28(sp)
sw gp,16(sp)
jalr t9
sra a1,a1,0x1
lw ra,28(sp)
jr ra
addiu sp,sp,32
The call costs a lot of overhead.
It is not easy to see how a tool can avoid false positives and false
negatives and also conveniently optimise and re-arrange code.
It can't. But it does not try to avoid false negatives even when
explicitly asked for trapping on overflow.
If some overflow trapping when it can be done without additional
instructions would be preferable over no overflow, gcc would compile
signed adds that survive after optimization into add on MIPS rather
than addu, by default. Given that it does not, the GCC developers
probably found out that it is not preferable. I guess they would get
too many customer complaints, including for "relevant" code, i.e.,
code where the usual "it's UB, so your code is broken" excuse does not
work.
The fact that they don't even try to make -ftrapv produce efficient
code indicates that there is no "relevant" interest in efficient
-ftrapv. It would be interesting to know who came up with the idea of
adding -ftrapv, and why they are still keeping it.
Compilers have not always been good at taking advantage of all the
features provided by hardware
GCC is pretty good at implementing -fwrapv. For the two examples
above, "gcc -O3 -fwrapv" produces the same code on AMD64 and MIPS as
"gcc -O3".
nor have languages been good at exposing
the possibilities in the language so that programmers can take advantage
of them.
Yes. But I leave that for another day.
- anton--- Synchronet 3.22a-Linux NewsLink 1.2
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages.
[email protected] (Anton Ertl) posted:
David Brown <[email protected]> writes:
On 25/05/2026 16:28, Anton Ertl wrote:
Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers
have avoided making -ftrapv the default, even on platforms like MIPS
and Alpha where the implementation of -ftrapv just means to use
different instructions (e.g., add instead of addu on MIPS, and addv
instead of add on Alpha).
Both architectures got this one wrong--IMO--and so does RISC-V.
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.
John Savard
That does not make sense. Code such as random number generators should
be written so that they are correct in the language they are written in.
If that is C, signed integer overflow is UB while unsigned integers
have wrapping behaviour - thus if your code depends on wrapping, and it
is written in C, it needs to use unsigned types or compiler-specific extensions, flags, etc. (Or C23 ckd_add and other checked arithmetic functions.)
If it is written in Zig, you need to use the specific modulo arithmetic functions even for unsigned arithmetic. If it is written in Java,
signed integer arithmetic is fine.
It all depends on the language and/or any options the language and tools might support - and code should be written to work correctly according
to the language rules.
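As an illustration of the C case, a minimal sketch of a linear congruential generator step that depends on wrapping and therefore uses an unsigned type. The function name lcg_next is made up; the multiplier and increment are the ones commonly attributed to Numerical Recipes, used here only as an example. The sketch assumes int is no wider than 32 bits, so that uint32_t operands are not promoted to signed int before the multiplication.

```c
#include <stdint.h>

/* One step of a 32-bit LCG.  uint32_t arithmetic wraps modulo 2^32 by
   definition, so no compiler flag such as -fwrapv is needed.
   Assumption: int is at most 32 bits wide (no promotion to signed int). */
static uint32_t lcg_next(uint32_t state)
{
    return state * UINT32_C(1664525) + UINT32_C(1013904223);
}
```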
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).
[email protected] (Anton Ertl) posted:
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
The worst of all possible semantic encodings
My 66000 has an instruction bit that denotes the signedness of integer calculations {Signed, unSigned}. This bit is available as another OpCode
bit for non-integer calculation instructions.
[email protected] (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.
...long bar(long x, long y)
{
return x/2+y/2;
}
Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:
gcc -O3 -ftrapv gcc -O3
lui gp,0x0 srl v0,a0,0x1f
addiu gp,gp,0 srl v1,a1,0x1f
addu gp,gp,t9 addu v0,v0,a0
srl v1,a0,0x1f addu a1,v1,a1
lw t9,__addvsi3(gp) sra v0,v0,0x1
srl v0,a1,0x1f sra a1,a1,0x1
addiu sp,sp,-32 jr ra
addu a0,v1,a0 addu v0,v0,a1
addu a1,v0,a1
sra a0,a0,0x1
sw ra,28(sp)
sw gp,16(sp)
jalr t9
sra a1,a1,0x1
lw ra,28(sp)
jr ra
addiu sp,sp,32
The call costs a lot of overhead.
Architectures without overflow traps are notorious for excess instruction
count when overflow detection is desired or mandated.
If some overflow trapping when it can be done without additional
instructions would be preferable over no overflow, gcc would compile
signed adds that survive after optimization into add on MIPS rather
than addu, by default. Given that it does not, the GCC developers
probably found out that it is not preferable. I guess they would get
too many customer complaints, including for "relevant" code, i.e.,
code where the usual "it's UB, so your code is broken" excuse does not
work.
It is much harder than that. For example: does a signed shift left
overflow when significant bits are shifted out ??
David Brown <[email protected]> writes:
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
Most programming environments I have had contact with don't trap on floating-point overflow.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
The question is if an integer overflow means that something went
wrong. Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).
This supposedly helpful feature has been neglected by C compiler
developers, and you see in the progression from MIPS (1986) to Alpha
(1992) and then RISC-V (2011) that the hardware architects have
accepted that:
MIPS: add traps on signed overflow, you need to write addu if you
don't want that.
Alpha: add ignores signed overflow, you need to write addv if you want
the trapping.
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
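To give an idea of what "pretty involved" means without a flags register or a trapping add, here is a sketch in C of the usual sign-comparison test. The name add_overflows is hypothetical, and this is one common formulation, not necessarily the sequence any particular compiler emits for RISC-V.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed 64-bit add with overflow detection using only wrapping
   arithmetic and bit operations.  Overflow happened exactly when both
   operands have the same sign and the sum has the opposite sign. */
static bool add_overflows(int64_t a, int64_t b, int64_t *sum)
{
    uint64_t ua = (uint64_t)a;
    uint64_t ub = (uint64_t)b;
    uint64_t us = ua + ub;            /* wraps modulo 2^64 */

    /* Conversion back is implementation-defined before C23 for values
       above INT64_MAX; GCC documents it as reduction modulo 2^64. */
    *sum = (int64_t)us;

    /* The top bit is set iff the sum's sign differs from both operands. */
    return (((ua ^ us) & (ub ^ us)) >> 63) != 0;
}
```

That is an add, two XORs, an AND and a shift or branch for every checked addition, compared with a single instruction on an architecture that has a trapping add.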
- anton
On Mon, 25 May 2026 10:23:00 +0200, David Brown wrote:
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages.
Yes. And I am used to FORTRAN, which did not trap on integer overflows.
John Savard
On Mon, 25 May 2026 19:20:01 +0000, MitchAlsup wrote:
[email protected] (Anton Ertl) posted:
David Brown <[email protected]> writes:
On 25/05/2026 16:28, Anton Ertl wrote:
Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers
have avoided making -ftrapv the default, even on platforms like MIPS
and Alpha where the implementation of -ftrapv just means to use
different instructions (e.g., add instead of addu on MIPS, and addv
instead of add on Alpha).
Both architectures got this one wrong--IMO--and so does RISC-V.
You may not have been replying to what Anton Ertl wrote above, since there was a lot in between that I snipped. But it does mention two architectures that took an approach to trapping on integer overflow... that I also tend
to disagree with.
What I'm used to is the System/360. While it made the mistake of having
two condition code bits instead of NZVC, the idea of having "trap on overflow" controlled by a bit in the PSW is... what I assumed to be normal and correct.
I could be wrong, as I haven't examined that approach critically and given full consideration to the alternatives.
John Savard
David Brown <[email protected]> schrieb:
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.
John Savard
That does not make sense. Code such as random number generators should
be written so that they are correct in the language they are written in.
In principle, yes.
In practice, people often used whatever "worked" on their systems.
Implementors have a certain right because they control what their
compiler does or does not do.
But users did so, as well, with
Numerical Recipes a(n in)famous example.
And yes, this bites people. You can see this at https://gcc.gnu.org/gcc-13/porting_to.html :
# GCC 13 includes new optimizations which may change behavior
# on integer overflow. Traditional code, like linear congruential
# pseudo-random number generators in old programs and relying on
# a specific, non-standard behavior may now generate unexpected
# results. The option -fsanitize=undefined can be used to detect
# such code at runtime.
# It is recommended to use the intrinsic subroutine RANDOM_NUMBER for
# random number generators or, if the old behavior is desired, to use
# the -fwrapv option. Note that this option can impact performance.
If that is C, signed integer overflow is UB while unsigned integers
have wrapping behaviour - thus if your code depends on wrapping, and it
is written in C, it needs to use unsigned types or compiler-specific extensions, flags, etc. (Or C23 ckd_add and other checked arithmetic functions.)
If it is written in Zig, you need to use the specific modulo arithmetic functions even for unsigned arithmetic. If it is written in Java,
signed integer arithmetic is fine.
It all depends on the language and/or any options the language and tools might support - and code should be written to work correctly according
to the language rules.
Fortran has no standard way of implementing this unless you
restrict yourself to sizes which do not overflow a signed integer.
Implementing LCGRNGs was one reason why I pushed for unsigned
arithmetic (modulo 2**n) in Fortran. The attempt failed (not
taken up by WG5 after being endorsed by J3), but I implemented it
for gfortran anyway.
The hardware, of course, cannot always enable trapping on overflow if it is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).
Sanitizers are also fairly good now, but of course cost performance.
MitchAlsup <[email protected]d> writes:
[email protected] (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.
Why do you consider that desirable?
...long bar(long x, long y)
{
return x/2+y/2;
}
Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:
gcc -O3 -ftrapv gcc -O3
lui gp,0x0 srl v0,a0,0x1f
addiu gp,gp,0 srl v1,a1,0x1f
addu gp,gp,t9 addu v0,v0,a0
srl v1,a0,0x1f addu a1,v1,a1
lw t9,__addvsi3(gp) sra v0,v0,0x1
srl v0,a1,0x1f sra a1,a1,0x1
addiu sp,sp,-32 jr ra
addu a0,v1,a0 addu v0,v0,a1
addu a1,v0,a1
sra a0,a0,0x1
sw ra,28(sp)
sw gp,16(sp)
jalr t9
sra a1,a1,0x1
lw ra,28(sp)
jr ra
addiu sp,sp,32
The call costs a lot of overhead.
Architectures without overflow traps are notorious for excess instruction
count when overflow detection is desired or mandated.
MIPS' add traps on overflow. gcc could have emitted almost the same
code for gcc -O3 -ftrapv as for gcc -O3, except that the last
instruction would be an add, not an addu. But apparently nobody gives
a damn about the efficiency of -ftrapv, possibly rightly so.
If some overflow trapping when it can be done without additional
instructions would be preferable over no overflow, gcc would compile
signed adds that survive after optimization into add on MIPS rather
than addu, by default. Given that it does not, the GCC developers
probably found out that it is not preferable. I guess they would get
too many customer complaints, including for "relevant" code, i.e.,
code where the usual "it's UB, so your code is broken" excuse does not
work.
It is much harder than that. For example: does a signed shift left
overflow when significant bits are shifted out ??
-ftrapv specifies trapping on overflow only for additions,
subtractions, and multiplications.
On Mon, 25 May 2026 16:45:07 +0000, MitchAlsup wrote:
My 66000 has an instruction bit that denotes the signedness of integer calculations {Signed, unSigned}. This bit is available as another OpCode bit for non-integer calculation instructions.
That's nice. It's not an option I can consider, as having lots of
orthogonal modifiers on instructions would tend to increase their length.
A major goal of the Concertina II, III, and IV architectures is for instructions not to be longer than similar instructions on the Motorola 68020 or the IBM System/360 if at all possible.
Basically, the selling point is... "Your programs only get 10% bigger, if that, and yet you have 32 registers, so they run faster!".
Or they _would_, if the design didn't have so many extra transistors for supporting both IBM-format and Intel-format Decimal Floating Point, old-style IBM floats, simple floating (You too can work with numbers that go around the world 2 1/2 times!), packed decimal, mixed-radix arithmetic...
But, hey, supporting these things in hardware is faster than doing them in software!
And are people even going to _read_ the part of the manual that
explains... as is noted in the description of the original Concertina architecture...
This chip has 8-way simultaneous multi-threading, but only for programs which do not make use of extensions to the register set.
Only two programs per core may use the extended register banks with 128 elements.
Only one program per core may use the vector registers for long vector instructions. The 256-bit short vector registers, on the other hand, like the integer and floating-point registers, are available to all
simultaneous threads.
John Savard
On 5/25/2026 9:28 AM, Anton Ertl wrote:
Integer overflow happens far too often for trapping to be a good solution.
On Mon, 25 May 2026 16:49:59 +0000, MitchAlsup wrote:
[email protected] (Anton Ertl) posted:
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
The worst of all possible semantic encodings
Although I thought that making trapping on fixed-point overflow the
default is a bad idea, I agree that making it impossible to do so, or even test for fixed-point overflow, is a much worse idea.
An awkward thing about using trap on overflow is determining how
precisely it is defined.
On Mon, 25 May 2026 10:23:00 +0200, David Brown wrote:
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages.
Yes. And I am used to FORTRAN, which did not trap on integer overflows.
David Brown <[email protected]> writes:
On 25/05/2026 16:28, Anton Ertl wrote:
Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different
instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).
An awkward thing about using trap on overflow is determining how
precisely it is defined. Supposing you have the expression "a + b - a".
Perhaps "a + b" overflows. I would hope that when using debug-related
compiler flags such as "-fsanitize=signed-integer-overflow", a compiler
would check for overflow on "a + b", and report it at runtime.
(Unfortunately, gcc does not do that unless the partial expression is
assigned to a variable.) But in "normal" usage, I'd expect the
expression to be simplified, resulting in just "b" and no overflow.
OTOH, cases like a+b+c where the result is in range, while an
intermediate result is out of range are one of the reasons why I
prefer -fwrapv over -ftrapv. As for your preference of nasal demons,
given enough information, the compiler might "optimize" "a+b-a" into,
e.g., 0.
Anyway, the definition of -ftrapv is not very precise; for gcc-12.2:
|'-ftrapv'
| This option generates traps for signed overflow on addition,
| subtraction, multiplication operations.
As for what gcc-12.2 does for your example on AMD64:
long foo(long a, long b)
{
return a+b-a;
}
is compiled with gcc -O3 -ftrapv to:
0: 48 89 f0 mov %rsi,%rax
3: c3 ret
If "trap on overflow" has precise semantics in the code, then this
disables a range of useful optimisations and re-arrangements. If it is
just "use trapping arithmetic instructions", then it will miss many
possible cases of actual overflow in the code, which we might want to
catch.
Which would you prefer by default?
The gcc developers apparently took the latter approach, even when you
ask for -ftrapv explicitly. So what, IYO, speaks against doing that
by default on machines like MIPS and Alpha.
And "trap on overflow" might either trigger when there is no
overflow in the original code, or hinder optimisations. (Consider the
expression "x / 2 + y / 2" - the compiler could implement that as a
combined "(x + y) / 2", but that might introduce overflow.)
x/2+y/2 produces a different result from (x+y)/2 when both x and y are
odd integers.
gcc-12.2 compiles
long bar(long x, long y)
{
return x/2+y/2;
}
on AMD64 to:
gcc -O3 -ftrapv gcc -O3
mov %rdi,%rax mov %rdi,%rax
sub $0x8,%rsp mov %rsi,%rdx
shr $0x3f,%rax shr $0x3f,%rax
add %rax,%rdi shr $0x3f,%rdx
mov %rsi,%rax add %rdi,%rax
shr $0x3f,%rax add %rsi,%rdx
sar %rdi sar %rax
add %rax,%rsi sar %rdx
sar %rsi add %rdx,%rax
call __addvdi3@PLT ret
add $0x8,%rsp
ret
so the -ftrapv introduces an additional mov and a call; I would have
expected that the + would be compiled to an ADD instruction followed
by a JO instruction.
Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:
gcc -O3 -ftrapv gcc -O3
lui gp,0x0 srl v0,a0,0x1f
addiu gp,gp,0 srl v1,a1,0x1f
addu gp,gp,t9 addu v0,v0,a0
srl v1,a0,0x1f addu a1,v1,a1
lw t9,__addvsi3(gp) sra v0,v0,0x1
srl v0,a1,0x1f sra a1,a1,0x1
addiu sp,sp,-32 jr ra
addu a0,v1,a0 addu v0,v0,a1
addu a1,v0,a1
sra a0,a0,0x1
sw ra,28(sp)
sw gp,16(sp)
jalr t9
sra a1,a1,0x1
lw ra,28(sp)
jr ra
addiu sp,sp,32
The call costs a lot of overhead.
It is not easy to see how a tool can avoid false positives and false
negatives and also conveniently optimise and re-arrange code.
It can't. But it does not try to avoid false negatives even when
explicitly asked for trapping on overflow.
If some overflow trapping when it can be done without additional
instructions would be preferable over no overflow, gcc would compile
signed adds that survive after optimization into add on MIPS rather
than addu, by default. Given that it does not, the GCC developers
probably found out that it is not preferable. I guess they would get
too many customer complaints, including for "relevant" code, i.e.,
code where the usual "it's UB, so your code is broken" excuse does not
work.
The fact that they don't even try to make -ftrapv produce efficient
code indicates that there is no "relevant" interest in efficient
-ftrapv. It would be interesting to know who came up with the idea of
adding -ftrapv, and why they are still keeping it.
Compilers have not always been good at taking advantage of all the
features provided by hardware
GCC is pretty good at implementing -fwrapv. For the two examples
above, "gcc -O3 -fwrapv" produces the same code on AMD64 and MIPS as
"gcc -O3".
nor have languages been good at exposing
the possibilities in the language so that programmers can take advantage
of them.
Yes. But I leave that for another day.
[email protected] (Anton Ertl) posted:
MitchAlsup <[email protected]d> writes:
[email protected] (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.
Why do you consider that desirable?
So you can debug production/released code to find subtle errors.
On Sun, 24 May 2026 15:24:22 +0000, John Levine wrote:
Sure they did. S/360 had separate unsigned versions of add and subtract
instructions. The results were the same but the condition codes were
different and the unsigned versions couldn't overflow.
Ah, I didn't remember that!
On 5/25/2026 3:34 PM, quadi wrote:
On Mon, 25 May 2026 16:49:59 +0000, MitchAlsup wrote:
[email protected] (Anton Ertl) posted:
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
The worst of all possible semantic encodings
Although I thought that making trapping on fixed-point overflow the
default is a bad idea, I agree that making it impossible to do so, or even test for fixed-point overflow, is a much worse idea.
Possibly true.
The lack of things like ADD-with-Carry or ADD-with-Overflow are
annoyance points on RISC-V.
Though, it is less obvious what a useful behavior is at the language level:
"signal()" ? ...
Something like try/catch (mostly N/A to C)?
Something similar to FENV_ACCESS?
...
Well, and that if trapping were applied globally:
Overhead due to trap detection/handling code causing excessive bloat;
Overflow traps from any code that naively assumes wrap-on-overflow
semantics;
...
In some codebases, it is already enough of a pain to hunt and fix all
the out-of-bounds and uninitialized variables mess.
Signed integer overflows would likely "turn it up to 11";
Then, how does one fix it? Ask that people start adding a bunch of casts
to make it work?...
One might say:
Add "if()" cases to deal with the overflows, but, ... this only makes
sense for cases where the overflows are not the expected behavior.
Then again, could maybe classify code, say:
1, signed, value doesn't (or shouldn't) go out-of-range;
2, unsigned, value doesn't (or shouldn't) go out-of-range;
3, signed, value is expected to be modulo;
4, unsigned, value is expected to be modulo.
"nasal demons" types assume 1 and 4 as dominant.
Or, 1 as exclusive vs 3.
For compilers, we often need to assume 3 and 4.
Because, failure to uphold 3 results in misbehaving programs.
And, if 3 were uncommon, RISC-V's "ADDW"/etc would be pure stupidity.
BGB <[email protected]> posted:
On 5/25/2026 3:34 PM, quadi wrote:
The important property is that overflow is detected precisely.
On Mon, 25 May 2026 16:49:59 +0000, MitchAlsup wrote:
[email protected] (Anton Ertl) posted:
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
The worst of all possible semantic encodings
Although I thought that making trapping on fixed-point overflow the
default is a bad idea, I agree that making it impossible to do so, or even
test for fixed-point overflow, is a much worse idea.
Possibly true.
The lack of things like ADD-with-Carry or ADD-with-Overflow are
annoyance points on RISC-V.
Though, it is less obvious what a useful behavior is at the language level:
"signal()" ? ...
Something like try/catch (mostly N/A to C)?
Something similar to FENV_ACCESS?
...
Whether {trap, signal, throw} is performed is an environmental choice
not an ISA choice.
Well, and that if trapping were applied globally:
Overhead due to trap detection/handling code causing excessive bloat;
Overflow traps from any code that naively assumes wrap-on-overflow
semantics;
...
In some codebases, it is already enough of a pain to hunt and fix all
the out-of-bounds and uninitialized variables mess.
Signed integer overflows would likely "turn it up to 11";
Then, how does one fix it? Ask that people start adding a bunch of casts
to make it work?...
One might say:
Add "if()" cases to deal with the overflows, but, ... this only makes
sense for cases where the overflows are not the expected behavior.
If(overflow(??)) requires some flag to carry overflow from point of
detection to if(()).
And what happens if there is more than 1 overflow ??
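One conventional answer to both questions is a "sticky" flag, in the spirit of the IEEE 754 floating-point exception flags: every checked operation ORs its overflow indication into a flag that is only inspected later. The C sketch below is an illustration under that assumption; sum_sticky is an invented name and __builtin_add_overflow is a GCC/Clang extension.

```c
#include <limits.h>
#include <stdbool.h>

/* Sum n values with wrapping semantics, recording in *overflowed
   whether at least one of the additions overflowed. */
static long sum_sticky(const long *v, int n, bool *overflowed)
{
    long acc = 0;
    bool sticky = false;

    for (int i = 0; i < n; i++) {
        /* The builtin stores the wrapped result and returns true on
           overflow; the flag stays set once it has been set. */
        if (__builtin_add_overflow(acc, v[i], &acc))
            sticky = true;
    }

    *overflowed = sticky;
    return acc;
}
```

The flag says that some overflow occurred, but not where or how many times, so a second overflow is indistinguishable from the first. That is the price of deferring the test to a later if().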
Then again, could maybe classify code, say:
1, signed, value doesn't (or shouldn't) go out-of-range;
2, unsigned, value doesn't (or shouldn't) go out-of-range;
3, signed, value is expected to be modulo;
4, unsigned, value is expected to be modulo.
5, a language hint about in-range, wrap, trap, signal, throw
"nasal demons" types assume 1 and 4 as dominant.
Or, 1 as exclusive vs 3.
For compilers, we often need to assume 3 and 4.
Because, failure to uphold 3 results in misbehaving programs.
And, if 3 were uncommon, RISC-V's "ADDW"/etc would be pure stupidity.
You would prefer::
AND R7,Rleft,#~(~0<<31)
AND R8,Rright,#~(~0<<31)
ADD Rd,R7,R8
AND Rd,Rd,#~(~0<<31)
That is ADDW range limits operands and performs a shorter ADD.
Matching C's int a,b; semantic. In general the integer instructions
ending with W apply C's int properties to the arithmetic. If compilers
were (WERE) really good at range determination those instructions would
be unnecessary--but they are not.
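In C terms, the effect of RISC-V's ADDW can be sketched as follows: add the low 32 bits of the two registers, discard any carry out of bit 31, and sign-extend the 32-bit result to 64 bits. The function name addw is used here only as a model of the instruction.

```c
#include <stdint.h>

/* Model of RV64 ADDW on two 64-bit register values. */
static int64_t addw(int64_t rs1, int64_t rs2)
{
    /* The 32-bit unsigned add wraps modulo 2^32, ignoring upper bits. */
    uint32_t low = (uint32_t)rs1 + (uint32_t)rs2;

    /* Reinterpret as signed 32-bit and sign-extend to 64 bits.
       The uint32_t -> int32_t conversion is implementation-defined
       before C23; GCC documents it as reduction modulo 2^32. */
    return (int64_t)(int32_t)low;
}
```

So addw(INT32_MAX, 1) yields INT32_MIN, which is the wrap-around result for a 32-bit int held in a 64-bit register.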
I (My 66000) had to put in sized integer calculation reasons, and by
doing so, gained 2%-4% in code density and a bit more in latency.
BGB <[email protected]> posted:
On 5/25/2026 9:28 AM, Anton Ertl wrote:
Integer overflow happens far too often for trapping to be a good solution.
Even on 64-bit variables/machines ??
On 26/05/2026 01:00, MitchAlsup wrote:
[email protected] (Anton Ertl) posted:
MitchAlsup <[email protected]d> writes:
[email protected] (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.
Why do you consider that desirable?
So you can debug production/released code to find subtle errors.
I think that when an unexpected error is detected (whether it is with
hardware acceleration, like trap on overflow, or via explicit generated
code), the way to handle it depends strongly on the situation. If a
debugger is present, then it is most helpful to lead to a debugger break
so that the developer can figure out what went wrong. When not
debugging, there is no sensible default handling that works for jet
engine controllers and video game frame generators.
But I do support the aim of having the same generated code when
debugging and when shipping - I am not a fan of "release" builds and
"debug" builds. (Of course you might temporarily do builds with
different flags while chasing down a particular bug.)
David Brown wrote:
On 26/05/2026 01:00, MitchAlsup wrote:
[email protected] (Anton Ertl) posted:
MitchAlsup <[email protected]d> writes:
[email protected] (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.
Why do you consider that desirable?
So you can debug production/released code to find subtle errors.
I think that when an unexpected error is detected (whether it is with
hardware acceleration, like trap on overflow, or via explicit generated
code), the way to handle it depends strongly on the situation. If a
debugger is present, then it is most helpful to lead to a debugger break
so that the developer can figure out what went wrong. When not
debugging, there is no sensible default handling that works for jet
engine controllers and video game frame generators.
But I do support the aim of having the same generated code when
debugging and when shipping - I am not a fan of "release" builds and "debug" builds. (Of course you might temporarily do builds with different flags while chasing down a particular bug.)
I tend to like "Release with sometimes hard-to-grok debug info",
typically resulting in a separate file with a best effort debug map of
the executable.
Then I can at least get some help when running the debugger and trying
to binary search my way into the spot where the bug resides.
Terje
On Mon, 25 May 2026 23:05:06 GMT, MitchAlsup <[email protected]d> wrote:
BGB <[email protected]> posted:
On 5/25/2026 9:28 AM, Anton Ertl wrote:
Integer overflow happens far too often for trapping to be a good solution.
Even on 64-bit variables/machines ??
Yes if there are options for 8/16/32 bit ops in 64 bit registers.
Encrypt the debug information (and put it in
a {1234-5678-9101-1121-...} folder) so that only the owner (not
licensee) of the code can debug it.
MitchAlsup [2026-05-26 20:54:30] wrote:
Encrypt the debug information (and put it in a
{1234-5678-9101-1121-...} folder) so that only the owner (not
licensee) of the code can debug it.
I resent that. All code should be Free Software.
Thomas Koenig <[email protected]> posted:
David Brown <[email protected]> schrieb:
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.
John Savard
That does not make sense. Code such as random number generators should
be written so that they are correct in the language they are written in.
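A minimal sketch of what "correct in the language" can look like for a random number generator: the wraparound is stated explicitly with a mask instead of being left to whatever the hardware does on overflow, so an overflow trap has nothing to catch. The function name is hypothetical; the multiplier and increment are the well-known 32-bit linear congruential constants from Numerical Recipes, used here purely as an illustration.

```python
MASK32 = 0xFFFFFFFF  # keep results in 32 bits explicitly

def lcg_next(state):
    """One step of a 32-bit linear congruential generator.

    The reduction modulo 2**32 is written out with a mask, so the
    result does not depend on how the machine treats overflow.
    """
    return (1664525 * state + 1013904223) & MASK32

state = 1
for _ in range(3):
    state = lcg_next(state)
    print(state)
```

In a language with fixed-width types, the equivalent is to use an unsigned type whose wraparound is defined by the language rather than a signed one whose overflow is not.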
In principle, yes.
Principle is better in theory than in practice.
In practice, people often used whatever "worked" on their systems.
Face it, the poor slug writing the code may not have the faintest
grasp of the system qualities we are discussing, and does not care
to learn as long as he can slug through the writing and his program
does not blow up catastrophically while it is under his purview.
That defines a lot of what is wrong with SW programming today.
Implementors have a certain right because they control what their
compiler does or does not do.
You would be surprised at how little influence implementors have
on compilers and other software.
Another One Bites the Dust.....
quadi <[email protected]d> posted:
A major goal of the Concertina II, III, and IV architectures is for
instructions not to be longer than similar instructions on the Motorola
68020 or the IBM System/360 if at all possible.
Basically, the selling point is... "Your programs only get 10% bigger,
if that, and yet you have 32 registers, so they run faster!".
Mine are getting 30% smaller and needing fewer instructions at the same
time
So even without block structure, I brought back VLIW features!
On Mon, 25 May 2026 23:03:03 +0000, MitchAlsup wrote:
quadi <[email protected]d> posted:
A major goal of the Concertina II, III, and IV architectures is for
instructions not to be longer than similar instructions on the Motorola
68020 or the IBM System/360 if at all possible.
Basically, the selling point is... "Your programs only get 10% bigger,
if that, and yet you have 32 registers, so they run faster!".
Mine are getting 30% smaller and needing fewer instructions at the same time
Well, then you're obviously doing something amazing with My 66000, and I
don't have the experience to know which modifier bits, if added, would
save instructions often enough to more than pay for the space they take up.
I have to be content with doing the best I can, despite not being capable
of doing much more than slavishly copying existing commercial
architectures.
John Savard
On Sat, 30 May 2026 04:02:45 +0000, quadi wrote:
So even without block structure, I brought back VLIW features!
I had a little opcode space remaining. So now I have made what is perhaps
my maddest addition to the Concertina IV architecture yet!
In the normal instruction set of the Concertina IV, it was necessary to extend the 32-bit instruction set to intrude, ever so slightly, into the portion of the opcode space where instructions begin with 11.
This was because in the 3/4 of the opcode space initially allocated to 32-bit instructions, there wasn't quite enough room for a Halfword Immediate instruction that was 32 bits long, but allowed all 32 registers to be used as destination registers.
Well, for the primary instruction set, this was no real problem. It may
have made decoding the lengths of instructions less simple and elegant,
but there was still enough space for instructions longer than 32 bits and for the short instructions, both 16-bit and 24-bit - which chopped that remaining space up into pieces anyways.
But in the 48-bit instructions with an instruction that can be predicated, and the 80-bit and 112-bit instructions with two or three instructions
which can be indicated explicitly as parallelizable... there's a field
that can _only_ be used for a 32-bit instruction.
So in there, the opcode space of 32-bit instructions starting with 11 is almost completely unused... but I can't use it for paired 15-bit short instructions because of that Halfword Immediate instruction.
Well, now the Halfword Immediate instruction for that case has been modified, so that paired short instructions including short instructions other than register-to-register operate instructions can be used.
John Savard
On Sat, 30 May 2026 04:02:45 +0000, quadi wrote:
So even without block structure, I brought back VLIW features!
I had a little opcode space remaining. So now I have made what is
perhaps my maddest addition to the Concertina IV architecture yet!
So in there, the opcode space of 32-bit instructions starting with 11 is almost completely unused... but I can't use it for paired 15-bit short instructions because of that Halfword Immediate instruction.
Well, now the Halfword Immediate instruction for that case has been
modified, so that paired short instructions including short instructions other than register-to-register operate instructions can be used.
At least this reminded me that embedding instructions inside long instructions is, in one very important respect, very different from
having a block structure for program code. So I have now added a warning about how branching to an embedded instruction will not work unless a
number of strict conditions are met.
And now I've added the Branch to Embedded instruction, which points to
the larger instruction, and then indicates which embedded instruction
within it to which control is to be transferred as a method of avoiding
these restrictions, should anyone ever need such an instruction.
quadi <[email protected]d> posted:
But in the 48-bit instructions with an instruction that can be predicated,
and the 80-bit and 112-bit instructions with two or three instructions
which can be indicated explicitly as parallelizable... there's a field
that can _only_ be used for a 32-bit instruction.
An architecture is just as much about what you leave out as what you
put in.
On 5/30/2026 12:15 PM, MitchAlsup wrote:
quadi <[email protected]d> posted:
snip
But in the 48-bit instructions with an instruction that can be predicated,
and the 80-bit and 112-bit instructions with two or three instructions
which can be indicated explicitly as parallelizable... there's a field
that can _only_ be used for a 32-bit instruction.
An architecture is just as much about what you leave out as what you
put in.
John's answer - leave out as little as possible, preferably nothing! :-)
Having looked into this in some detail, both when IBM used big-endian
order on S/360 and DEC used little-endian on the PDP-11, neither
documented the reasons for the byte order choice at all. Not even a
little bit.
On 5/30/2026 12:15 PM, MitchAlsup wrote:
An architecture is just as much about what you leave out as what you
put in.
John's answer - leave out as little as possible, preferably nothing!
On Wed, 20 May 2026 17:47:59 +0000, John Levine wrote:
Having looked into this in some detail, both when IBM used big-endian
order on S/360 and DEC used little-endian on the PDP-11, neither
documented the reasons for the byte order choice at all. Not even a
little bit.
I suppose that, at the time, it was something that nobody felt was
important enough to document.
But to people who were around back then, the reasons would have been
obvious.
IBM mainframes were designed to ooze quality! So here and there, an extra
transistor or two was added if something seemed better. That's why the IBM
7090 used sign-magnitude arithmetic for integers.
And that's why the IBM 360 jumped ahead to the end of an integer and
worked backwards to add, because putting things in reverse order would
have shouted cheap.
The original PDP-11 only came with a 16-bit bus. But its designers aspired
to the level of consistency that the 360 had, but they wanted to do it on
a rock-bottom minicomputer budget. DEC minis, in fact, were cheaper than
most other brands of minicomputer at the time.
The PDP-11 made little-endian a thing. It was so new that the people
designing the floating-point unit didn't get the memo.
So what I've learned is that the world of computer architectures seems to
be like _Highlander_... "There can be only one".
I presume you are aware that the 704 and successors did indexing by two's
complement subtraction, which is not sign-magnitude.
The PDP-11 made little-endian a thing. It was so new that the people
designing the floating-point unit didn't get the memo.
Nor did the people designing the extended multiplier, but they got it
mostly consistent in the Vax.
But they got the brilliant idea - that more pedestrian
designers would never even have considered for a second, or even thought of as
possible - of numbering the bytes in a word backwards too, so as to attain
consistency.
So why did the S/360 architects go for 2s-complement?
It costs me only 6 gates (2 gates of delay) to decode the length of an instruction--whereas it takes 4 gates to decode S/360 2-bit code for instruction length.
As I may have said once or twice before, we have plenty of guesses, but
since there is no documentation, the guesses are a waste of time.
I presume you are aware that the 704 and successors did indexing by
two's complement subtraction, which is not sign-magnitude. There is no documentation for that either, and I have looked quite hard. Pretty
please, do not guess unless you can cite sources.
This all indicates that byte-ordering decisions worked like in our
student group. The "right" choice seemed so obvious to everyone that we
did not communicate about it nor document it nor document the reasons
for it, and different contributors took different "right"
choices.
quadi <[email protected]d> schrieb:
So what I've learned is that the world of computer architectures seems
to be like _Highlander_... "There can be only one".
That is what people thought about the /360 until the Minis came along,
where companies were content to serve new markets and customers at lower margins, but higher volume.
And then RISC, and PCs... and the low end that PCs are being attacked
from right now is mobile devices, and ARM.
For this kind of cycle, I highly recommend reading https://en.wikipedia.org/wiki/The_Innovator%27s_Dilemma (the book, not
the Wikipedia article itself). It talks a lot about hard drives, but
parallels to computers are obvious.
Anton Ertl <[email protected]> schrieb:
So why did the S/360 architects go for 2s-complement?
Brooks (who was program manager for /360) writes about this in
"The Design of Design". Unique zero and unified hardware were
his main points, IIRC.
John Levine <[email protected]> writes:
I presume you are aware that the 704 and successors did indexing by two's
complement subtraction, which is not sign-magnitude.
Looking at the 704 manual
<https://ia802904.us.archive.org/12/items/bitsavers_ibm7042466_32932660/24-6661-2_704_Manual_1955_text.pdf>,
In any case, I don't think that the IBM 704 manual documents
2s-complement representation of negative numbers for any purpose.
So why did the S/360 architects go for 2s-complement?
One speculation ...
Nor did the people designing the extended multiplier, but they got it
mostly consistent in the Vax.
This all indicates that byte-ordering decisions worked like in our
student group. The "right" choice seemed so obvious to everyone that
we did not communicate about it nor document it nor document the
reasons for it, and different contributors took different "right"
choices.
|Sign representations. For the fixed-point arithmetic system, which is
|binary, the two's complement representation for negative numbers was
|selected. The well-known virtues of this system are the unique
|representation of zero and the absence of recomplementation.
What is "recomplementation"?
Which is why his architecture is converging so rapidly.
NOT.
I admit that the fact that one subtracts the index on an IBM 704 seems
very weird to me. Since the IBM 704 was made out of vacuum tubes, saving
them, instead of mere discrete transistors, let alone transistors on a
microchip with a billion of them, was probably more important.
My guess that sign-magnitude arithmetic was regarded as more prestigious,
until IBM outgrew that notion with the 360, does have a source, although
not an IBM source.
A 24-bit computer was advertised as having sign-magnitude integer
arithmetic, unlike cheaper machines which either used one's complement
integer arithmetic, or, even worse, two's complement integer arithmetic.
I think it was the DDP-24, but offhand I'm not completely sure.
To guess - or to attempt to derive intelligence from the available
information - one might think that IBM considered indexing to be less
important or less visible than ordinary integer arithmetic per se.
John Savard
My equally uninformed guess is that their tab machines and their
commercial computers were decimal sign magnitude, so binary sign
magnitude was a short step away. It evidently took a while to realize
that while the two's complement negative representation seemed less
intuitive, the logic was a lot simpler.
Well, after making the changes, I still had room - 1/4 as much as I had before - for 24-bit short instructions.
I wasn't happy. So I noticed that I actually had some unused space that
I could squeeze out. So now the 24-bit short instructions have 1/2 as
much space as they used to, which meant the only thing I had to give up
was the ability to change the condition codes.
According to Anton Ertl <[email protected]>:
|Sign representations. For the fixed-point arithmetic system, which is
|binary, the two's complement representation for negative numbers was
|selected. The well-known virtues of this system are the unique
|representation of zero and the absence of recomplementation.
What is "recomplementation"?
To do sign magnitude arithmetic, you basically do it in one's
complement: bit flip negative operands to make them one's complement,
do the arithmetic, then bit flip the result if it's negative. That
last bit flip is recomplementation.
Straight one's complement doesn't have the recomplementation but does
have end around carry if there's a carry out of the high bit, and
shares with sign-magnitude the question of how you handle +0 and -0
which are different bit patterns but mathematically equal.
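The description above can be sketched in a few lines. This is only an illustration under stated assumptions: an 8-bit word (one sign bit, seven magnitude bits), hypothetical function names, and no overflow detection. It shows the bit flip on the way in, the one's complement add with end-around carry, and the recomplementation on the way out.

```python
WIDTH = 8                    # assumed word size: 1 sign bit + 7 magnitude bits
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)

def sm_encode(value):
    """Python int -> sign-magnitude bit pattern."""
    mag = abs(value)
    assert mag < SIGN
    return (SIGN | mag) if value < 0 else mag

def sm_decode(bits):
    """Sign-magnitude bit pattern -> Python int (both zeros decode to 0)."""
    mag = bits & (SIGN - 1)
    return -mag if bits & SIGN else mag

def flip_if_negative(bits):
    """Flip the magnitude bits of a negative value.

    The same operation converts sign-magnitude to one's complement and
    back again, since it is its own inverse.
    """
    return bits ^ (SIGN - 1) if bits & SIGN else bits

def ones_complement_add(a, b):
    """Add two one's complement patterns with end-around carry."""
    total = a + b
    if total > MASK:                   # carry out of the high bit
        total = (total & MASK) + 1     # feed it back in at the low end
    return total & MASK

def sm_add(a_bits, b_bits):
    s = ones_complement_add(flip_if_negative(a_bits), flip_if_negative(b_bits))
    return flip_if_negative(s)         # this last flip is the recomplementation

print(sm_decode(sm_add(sm_encode(5), sm_encode(-3))))
print(hex(sm_add(sm_encode(5), sm_encode(-5))))
```

The second call is the interesting one: 5 + (-5) produces the pattern with only the sign bit set, a negative zero, which is the +0 / -0 question mentioned above.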
On 5/30/26 3:15 PM, MitchAlsup wrote:
[snip]
It costs me only 6 gates (2 gates of delay) to decode the length of an instruction--whereas it takes 4 gates to decode S/360 2-bit code for instruction length.
Does the current version of My 66000 have three instruction
lengths or four? You mentioned before dropping "large" constants
as store operands, but I am not certain what that means.
Earlier, if I understood correctly, the longest instruction was
a store of a 64-bit constant with a 64-bit displacement,
requiring five 32-bit words.
If My 66000 has the same variability in instruction length as
S/360 (three sizes), then presumably the extra length decode
effort provides some other advantage, perhaps more flexibility
in length allocation (with a 2-bit size indicator, major opcodes
can only be allocated at 25% granularity)?
There may be an advantage in having different lengths have
different detection speed.
Since My 66000 only uses the extra words for immediates, there
*may* even be an advantage to detecting some illegal opcodes and
speculating that such are from constant words.
(An illegal
opcode field can indicate an immediate, a faulting instruction,
or a skipped instruction.) Such could introduce variable timing
for parsing a given fetch chunk, but that might be handled by
reducing the number of parsed instructions emitted and inserting
the slowly parsed instructions into the start of the next group
of parsed instructions.
My guess is that such would just be silly complexity even at 16-wide
parsing, especially given the likely minuscule (typical)
timing benefit (if any!). Process variation probably would have
vastly more impact on frequency than trying to exploit a
statistical bias in encoding. (The concept just seemed
interesting.)
Given that register dependencies also "carry", there may be some
opportunity for "width pipelining" (like the staggered ALUs of
the Pentium 4) in parsing, extracting register names, renaming
(at least with RAT-based renaming), and even insertion into a
scheduler. If a dependency means it would not be useful to
insert the operation into a scheduler, this additional delay
might be exploited.
What is "recomplementation"?
To do sign magnitude arithmetic, you basically do it in one's
complement: bit flip negative operands to make them one's complement,
do the arithmetic, then bit flip the result if it's negative. That
last bit flip is recomplementation.
In microarchitecture, you can make the registers 2^(3+n)+1 bits long.
Then simply record that the mantissa is complemented (or not) when
used as an operand. We do this all the time in microarchitecture to
save gates/time/... depending on the implementation technology
constraints.
On Wed, 27 May 2026 10:59:31 -0400, Stefan Monnier wrote:
[...]
MitchAlsup [2026-05-26 20:54:30] wrote:
Encrypt the debug information (and put it in a
{1234-5678-9101-1121-...} folder) so that only the owner (not
licensee) of the code can debug it.
I resent that. All code should be Free Software.
However, people have the right to the fruit of their labors. To give them away for free is generous, but it should remain a personal choice.
quadi [2026-05-27 18:19:49] wrote:
However, people have the right to the fruit of their labors. To give
them away for free is generous, but it should remain a personal choice.
You don't need to encrypt the debug information of your programs in
order to earn a decent living.
I decided that one of my 32-bit instructions really needed to be
allocated twice as much opcode space as I had originally given it.
On Mon, 01 Jun 2026 14:51:14 -0400, Stefan Monnier wrote:
quadi [2026-05-27 18:19:49] wrote:
However, people have the right to the fruit of their labors. To give
them away for free is generous, but it should remain a personal
choice.
You don't need to encrypt the debug information of your programs in
order to earn a decent living.
Perhaps. But if someone can write a program that is so useful that it
could make him wealthy beyond the dreams of avarice, who am I to judge
him for seeking to maximize its revenue potential?
Having looked into this in some detail, both when IBM used
big-endian order on S/360 and DEC used little-endian on the
PDP-11, neither documented the reasons for the byte order
choice at all. Not even a little bit.
Brooks and Blaauw, two of the S/360 architects, consider the subject in
their much later book _Computer Architecture_, on p. 99:
"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are numbered
from left to right, following the convention of writing in Western
culture."
quadi [2026-05-27 18:19:49] wrote:
On Wed, 27 May 2026 10:59:31 -0400, Stefan Monnier wrote:
[...]
MitchAlsup [2026-05-26 20:54:30] wrote:
Encrypt the debug information (and put it in a
{1234-5678-9101-1121-...} folder) so that only the owner (not
licensee) of the code can debug it.
I resent that. All code should be Free Software.
However, people have the right to the fruit of their labors. To give them
away for free is generous, but it should remain a personal choice.
You don't need to encrypt the debug information of your programs in
order to earn a decent living.
The case for big-endian is...
It makes core dumps easier to read.
According to MitchAlsup <[email protected]d>:
What is "recomplementation"?
To do sign magnitude arithmetic, you basically do it in one's
complement: bit flip negative operands to make them one's complement,
do the arithmetic, then bit flip the result if it's negative. That
last bit flip is recomplementation.
In microarchitecture, you can make the registers 2^(3+n)+1 bits long.
Then simply record that the mantissa is complemented (or not) when
used as an operand. We do this all the time in microarchitecture to
save gates/time/... depending on the implementation technology
constraints.
You can do that now, not so much when building computers out of vacuum
tubes in the 1950s.
Also, that works OK for registers, but at some point you need to
store values in memory at which point I'd think you'd need to do
the recomplementing.
In article <10uks4f$1dqo$[email protected]>, [email protected] (John Levine) wrote:
Having looked into this in some detail, both when IBM used
big-endian order on S/360 and DEC used little-endian on the
PDP-11, neither documented the reasons for the byte order
choice at all. Not even a little bit.
Brooks and Blaauw, two of the S/360 architects, consider the subject in
their much later book _Computer Architecture_, on p. 99:
"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are
numbered from left to right, following the convention of writing
in Western culture."
That explains why IBM mainframes number the most significant bit as zero,
the opposite way around to all the platforms I've worked on, which number
the least significant bit as zero.
I've always found the latter convention helpful for doing hex arithmetic
in my head or on paper. I _think_ big-endian SPARC, MIPS and POWER all
regard the least significant bit as bit zero, but I can no longer easily check that,
John
On Tue, 02 Jun 2026 14:00:00 +0100, John Dallman wrote:
Brooks and Blaauw, two of the S/360 architects, consider the subject in their much later book _Computer Architecture_, on p. 99:
"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are numbered
from left to right, following the convention of writing in Western
culture."
On the other hand, Arabic is written from right to left, and yet the Arabs also write numbers with the most significant digit on the left. Hence, little-endian would seem more logical to them for the same reason.
Since this is, therefore, a cultural matter, and not something universal, like the laws of physics or mathematics, we can't tell what a man from
Mars would prefer.
So, while they can call it "the more logical convention", this isn't something everyone would agree with. The famous article on the subject,
"On Holy Wars and a Plea for Peace", by Danny Cohen from 1981 thus termed
it as being much less important which standard was chosen than for
everyone to choose the same one for compatibility, but he wasn't shy about expressing his personal preference for little-endian, referring to those
who practiced big-endian as "outlaws".
The case for big-endian is...
It makes computers easier to understand for most people in Western societies.
It makes core dumps easier to read.
Multi-precision compare is faster.
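One way to see the multi-precision compare point, as a rough sketch rather than a statement about any particular machine: with big-endian storage, a plain byte-by-byte comparison starting at the lowest address (what a memcmp-style loop does) already gives the numeric ordering of unsigned values, while with little-endian storage it does not. The helper name and the 8-byte width are assumptions made for the example.

```python
def compare_as_bytes(a, b, byteorder, width=8):
    """Compare two unsigned ints via their memory images.

    Python compares bytes objects lexicographically, lowest index
    first, which mirrors a byte-by-byte compare from the lowest address.
    Returns 1, 0 or -1.
    """
    xa = a.to_bytes(width, byteorder)
    xb = b.to_bytes(width, byteorder)
    return (xa > xb) - (xa < xb)

print(compare_as_bytes(256, 1, "big"))     # agrees with 256 > 1
print(compare_as_bytes(256, 1, "little"))  # disagrees: low-order byte is seen first
```

A little-endian machine can of course still compare correctly by starting at the high-address end; the sketch only shows why the low-address-first loop favors big-endian.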
The case for little-endian is...
John Savard
Brooks and Blaauw, two of the S/360 architects, consider the subject in
their much later book _Computer Architecture_, on p. 99:
"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are
numbered from left to right, following the convention of writing
in Western culture."
That explains why IBM mainframes number the most significant bit as zero,
the opposite way around to all the platforms I've worked on, which number
the least significant bit as zero.
I've always found the latter convention helpful for doing hex arithmetic
in my head or on paper. I _think_ big-endian SPARC, MIPS and POWER all
regard the least significant bit as bit zero, but I can no longer easily
check that,
quadi <[email protected]d> posted:
On Tue, 02 Jun 2026 14:00:00 +0100, John Dallman wrote:
Brooks and Blaauw, two of the S/360 architects, consider the subject in
their much later book _Computer Architecture_, on p. 99:
"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are numbered
from left to right, following the convention of writing in Western
culture."
On the other hand, Arabic is written from right to left, and yet the Arabs
also write numbers with the most significant digit on the left. Hence,
little-endian would seem more logical to them for the same reason.
Chinese and Japanese are written top to bottom ...
Since this is, therefore, a cultural matter, and not something universal,
like the laws of physics or mathematics, we can't tell what a man from
Mars would prefer.
Middle endian!! Start in the middle and then one step left followed by
one step right--more or less like PDP-11 FP.
Since this is, therefore, a cultural matter, and not something universal,
like the laws of physics or mathematics, we can't tell what a man from
Mars would prefer.
Middle endian!! Start in the middle and then one step left followed by
one step right--more or less like PDP-11 FP.
MitchAlsup <[email protected]d> schrieb:
Chinese and Japanese are written top to bottom ...
In classical times, yes, but modern texts are written left to right.
In the current environment where every language is expected to be
compatible with a generic IDE like Visual Studio Code, via open source interface specifications, having a proprietary debug format seems like a
good way to strongly limit your potential customer base.
On 6/2/2026 10:13 AM, MitchAlsup wrote:
Since this is, therefore, a cultural matter, and not something universal,
like the laws of physics or mathematics, we can't tell what a man from
Mars would prefer.
Middle endian!! Start in the middle and then one step left followed by
one step right--more or less like PDP-11 FP.
But then you have the "discussion" with those who want to start with a
step to the right, followed by one to the left! :-). And that doesn't
even address (pun intended), the issue of when you have an even number
of bits/bytes/words, do you start with the one to the right of the
"middle" or the left. :-)
In article <10uks4f$1dqo$[email protected]>, [email protected] (John Levine) wrote:
Having looked into this in some detail, both when IBM used
big-endian order on S/360 and DEC used little-endian on the
PDP-11, neither documented the reasons for the byte order
choice at all. Not even a little bit.
Brooks and Blaauw, two of the S/360 architects, consider the subject in
their much later book _Computer Architecture_, on p. 99:
"The more logical convention, the Big Endian, considers the whole
storage space as one stream of bits. Bits, bytes and words are
numbered from left to right, following the convention of writing
in Western culture."
"We predict that Little Endian addressing will die out, just as
decimal addressing did."
Uh huh.
A few years later IBM added LOAD REVERSED and STORE REVERSED to z/Architecture and retroactively to S/390 mode on Z machines.
My take is, that in a world with different access widths (e.g.,
accessing a register for a 32-bit value or a 64-bit value),
bit-big-endian is a bad idea.
* The last descendant of the PDP-11 was canceled long before the most
prominent big-endian architecture (SPARC) was canceled, and long
before Power switched its Linux support to little-endian, so the
PDP-11 had little, if any, influence on the outcome.
On Tue, 02 Jun 2026 17:50:38 +0200, Terje Mathisen wrote:
In the current environment where every language is expected to be compatible with a generic IDE like Visual Studio Code, via open source interface specifications, having a proprietary debug format seems like a good way to strongly limit your potential customer base.
You appear to have understood his post in a different way than I did.
I wasn't thinking of the kind of debug information provided by a compiler.
I was thinking of leaving debug information in when one was distributing software to customers.
John Savard
On Tue, 02 Jun 2026 22:00:06 +0000, John Levine wrote:
"We predict that Little Endian addressing will die out, just as
decimal addressing did."
Uh huh.
A few years later IBM added LOAD REVERSED and STORE REVERSED to z/Architecture and retroactively to S/390 mode on Z machines.
I certainly would not hazard such a bold prediction.
The prediction, though, is not hard to understand. If big-endian is more straightforward and easier to understand, but just costs an extra
transistor here and there, then in the age of billion-transistor chips,
why wouldn't it die out?
However, just because something is going to die out _eventually_ doesn't mean it will do so any time soon. Interoperating and communicating with
that little-endian monster IBM created in 1981 is going to be important
for generating revenue for decades to come.
So the existence of load reversed and store reversed instructions doesn't prove they were wrong... even though I still would not dare to say they
are definitely right. I just think it's not unreasonable to think as they did, provided you account for a sufficiently long timeframe.
Of course, given a sufficiently long timeframe, we might all be speaking Arabic, in which case little-endian would be the logical choice. Although that would require fossil fuels being important for longer than the
climate could sustain it...
John Savard
Linux has gone all in on LE.
Middle endian!! Start in the middle and then one step left followed by
one step right--more or less like PDP-11 FP.
On Tue, 02 Jun 2026 22:00:06 +0000, John Levine wrote:
"We predict that Little Endian addressing will die out, just as
decimal addressing did."
Uh huh.
A few years later IBM added LOAD REVERSED and STORE REVERSED to
z/Architecture and retroactively to S/390 mode on Z machines.
I certainly would not hazard such a bold prediction.
The prediction, though, is not hard to understand. If big-endian is more straightforward and easier to understand, but just costs an extra
transistor here and there, then in the age of billion-transistor chips,
why wouldn't it die out?
It causes problems with badly-written software.
On Wed, 03 Jun 2026 13:54:01 +0000, Thomas Koenig wrote:
It causes problems with badly-written software.
I don't see that as a fault of big-endian.
But there wouldn't have _been_ little-endian architectures to out-compete
big-endian if it hadn't been for the PDP-11. That was where the idea of
little-endian got started.
On Tue, 02 Jun 2026 15:59:33 +0000, Anton Ertl wrote:
My take is, that in a world with different access widths (e.g.,
accessing a register for a 32-bit value or a 64-bit value),
bit-big-endian is a bad idea.
There is an argument for that.
But if a computer does have bit-field instructions, I tend to consider it
insane for it to number bits in the opposite direction of its endianness.
In the more common case, where the machine is big-endian, and it is the
bit numbering that's little-endian, specifying a nine-bit field starting
in bit 6 of byte 4999 would give you bits 6 through 0 of byte 4999,
followed by bits 7 and 6 of byte 5000.
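To make that bit walk concrete, here is a small C sketch of a field extractor for the mixed convention: bytes are addressed big-endian, while bits inside a byte are numbered 7 (most significant) down to 0. The function name `extract_field` and its signature are my own invention for illustration, not any real machine's instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collect `width` bits, starting at bit `start_bit` of mem[byte] and moving
   toward less significant bits, spilling into the following byte(s).
   The first bit read becomes the most significant bit of the result. */
static uint32_t extract_field(const unsigned char *mem, size_t byte,
                              unsigned start_bit, unsigned width)
{
    uint32_t value = 0;
    unsigned bit = start_bit;

    assert(start_bit <= 7 && width <= 32);
    for (unsigned i = 0; i < width; i++) {
        value = (value << 1) | ((mem[byte] >> bit) & 1u);
        if (bit == 0) {      /* ran off the low end of this byte */
            bit = 7;         /* continue at the top of the next one */
            byte++;
        } else {
            bit--;
        }
    }
    return value;
}
```

With `mem[0]` standing in for byte 4999, `extract_field(mem, 0, 6, 9)` consumes bits 6 through 0 of the first byte and then bits 7 and 6 of the next, which is the sequence described above.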
Yes, you the vendor do not want random customer debugging the code,
I also want a pony, but that doesn't make it right.
The customer will usually not want to debug your code, but sometimes
they will have to (e.g. because you the vendor don't exist any more or
don't find that product of commercial value any more, ...).
The customer deserves to be able to debug the code it's paid for.
On Wed, 03 Jun 2026 13:54:01 +0000, Thomas Koenig wrote:
It causes problems with badly-written software.
I don't see that as a fault of big-endian.
One has to exert oneself to write a program equivalent to
      INTEGER*2 IP
      EQUIVALENCE (I, IP)
      I = 42
      WRITE(6,11) IP
      STOP
   11 FORMAT(' ', 'VALUE IS: ', I3)
      END
and so the fact that it will print
VALUE IS: 0
is not a bug, it's exactly what one should expect.
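For those who don't read FORTRAN, a rough C rendering of the same overlay is sketched below. It assumes a 32-bit integer with a 16-bit integer laid over its first two bytes; the function names are hypothetical, and `memcpy` is used in place of EQUIVALENCE to keep the C well defined.

```c
#include <stdint.h>
#include <string.h>

/* Returns the 16-bit value overlaying the first two bytes of a 32-bit
   integer, which is what EQUIVALENCE (I, IP) arranges in the FORTRAN. */
static int16_t first_halfword(int32_t i)
{
    int16_t ip;
    memcpy(&ip, &i, sizeof ip);   /* copies the two lowest-addressed bytes */
    return ip;
}

/* Returns 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a big-endian host `first_halfword(42)` sees the high-order half and yields 0, as in the FORTRAN above; on a little-endian host it sees the low-order half and yields 42.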
Yes, you the vendor do not want random customer debugging the code,
I also want a pony, but that doesn't make it right.
The customer will usually not want to debug your code, but sometimes
they will have to (e.g. because you the vendor don't exist any more or
don't find that product of commercial value any more, ...).
The customer deserves to be able to debug the code it's paid for.
=== Stefan
The 68020 is bit-little-endian and byte-big-endian, and it has
bitfield instructions, and from what I have read, this has led to
problems (e.g., consider what to do if you have an array of 17-bit
fields: how do you access the nth element of the array?)
I wasn't happy. So I noticed that I actually had some unused space that
I could squeeze out. So now the 24-bit short instructions have 1/2 as
much space as they used to, which meant the only thing I had to give up
was the ability to change the condition codes.
I found that I had some unused space within the 80-bit instructions, and
that was enough to let me restore the 24-bit short instructions to their former glory.
On Tue, 02 Jun 2026 15:59:33 GMT, [email protected][...]
(Anton Ertl) wrote:
The 68020 is bit-little-endian and byte-big-endian, and it has
bitfield instructions, and from what I have read, this has led to
problems (e.g., consider what to do if you have an array of 17-bit
fields: how do you access the nth element of the array?)
If you mean a /packed/ array in which the 17-bit fields are stored bit
contiguously ... well that could get interesting.
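In software, at least, the packed case is manageable when bit order and byte order agree. The sketch below assumes fields are stored most significant bit first, with field n starting at absolute bit 17*n; the names `get_field17` and `put_field17` are hypothetical. A 17-bit field always straddles exactly three bytes, so a 24-bit window is enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FIELD_BITS = 17 };
#define FIELD_MASK 0x1FFFFu

/* Read element n of a packed array of 17-bit fields (MSB-first order). */
static uint32_t get_field17(const unsigned char *buf, size_t buf_len, size_t n)
{
    size_t bit = n * FIELD_BITS;               /* absolute bit offset */
    size_t byte = bit / 8;                     /* first byte touched */
    unsigned down = 7u - (unsigned)(bit % 8);  /* right shift inside window */
    uint32_t window;

    assert(byte + 3 <= buf_len);
    window = ((uint32_t)buf[byte] << 16)
           | ((uint32_t)buf[byte + 1] << 8)
           | (uint32_t)buf[byte + 2];
    return (window >> down) & FIELD_MASK;
}

/* Write element n, leaving the neighbouring fields untouched. */
static void put_field17(unsigned char *buf, size_t buf_len, size_t n, uint32_t v)
{
    size_t bit = n * FIELD_BITS;
    size_t byte = bit / 8;
    unsigned down = 7u - (unsigned)(bit % 8);
    uint32_t mask = (uint32_t)FIELD_MASK << down;
    uint32_t window;

    assert(byte + 3 <= buf_len && v <= FIELD_MASK);
    window = ((uint32_t)buf[byte] << 16)
           | ((uint32_t)buf[byte + 1] << 8)
           | (uint32_t)buf[byte + 2];
    window = (window & ~mask) | (v << down);
    buf[byte]     = (unsigned char)(window >> 16);
    buf[byte + 1] = (unsigned char)((window >> 8) & 0xFFu);
    buf[byte + 2] = (unsigned char)(window & 0xFFu);
}
```

The multiply by 17 followed by a divide and remainder by 8 is the address arithmetic a bit-field instruction would have to hide; with mismatched bit and byte numbering the same loop needs an extra reversal step, which is where the trouble starts.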
According to quadi <[email protected]d>:
On Wed, 03 Jun 2026 13:54:01 +0000, Thomas Koenig wrote:
It causes problems with badly-written software.
I don't see that as a fault of big-endian.
Agreed. There were plenty of bugs porting BSD software from
the little-endian Vax to big-endian 68000 series. Buggy software
is buggy software.
Being able to debug code without the source code doesn't seem
a particularly common use case,
Scott Lurndal [2026-06-03 18:36:51] wrote:
Being able to debug code without the source code doesn't seem
a particularly common use case,
Indeed, the source code should also be available, of course.
I started this thread by mentioning Free Software. 🙂
quadi <[email protected]d> posted:
On Tue, 02 Jun 2026 17:50:38 +0200, Terje Mathisen wrote:
In the current environment where every language is expected to be
compatible with a generic IDE like Visual Studio Code, via open source
interface specifications, having a proprietary debug format seems like a
good way to strongly limit your potential customer base.
You appear to have understood his post in a different way than I did.
I wasn't thinking of the kind of debug information provided by a compiler.
I was thinking of leaving debug information in when one was distributing
software to customers.
Yes, you the vendor do not want random customer debugging the code,
however, you want the ability to debug the code that was distributed
on whatever medium on customer's system(s)--
AND you want to debug one copy of the running code while others are using
other processes running the code under normal use.
MitchAlsup <[email protected]d> writes:
[email protected] (Anton Ertl) posted:
...long bar(long x, long y)
{
return x/2+y/2;
}
Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:
gcc -O3 -ftrapv              gcc -O3
lui   gp,0x0                 srl   v0,a0,0x1f
addiu gp,gp,0                srl   v1,a1,0x1f
addu  gp,gp,t9               addu  v0,v0,a0
srl   v1,a0,0x1f             addu  a1,v1,a1
lw    t9,__addvsi3(gp)       sra   v0,v0,0x1
srl   v0,a1,0x1f             sra   a1,a1,0x1
addiu sp,sp,-32              jr    ra
addu  a0,v1,a0               addu  v0,v0,a1
addu  a1,v0,a1
sra   a0,a0,0x1
sw    ra,28(sp)
sw    gp,16(sp)
jalr  t9
sra   a1,a1,0x1
lw    ra,28(sp)
jr    ra
addiu sp,sp,32
The call costs a lot of overhead.
Architectures without overflow traps are notorious for excess instruction
count when overflow detection is desired or mandated.
MIPS' add traps on overflow. gcc could have emitted almost the same
code for gcc -O3 -ftrapv as for gcc -O3, except that the last
instruction would be an add, not an addu. But apparently nobody gives
a damn about the efficiency of -ftrapv, possibly rightly so.
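For anyone who wants the check without the library call, here is a hedged sketch of the same function with the overflow test written out in C. It relies on `__builtin_add_overflow`, which is a GCC and Clang builtin rather than ISO C; the names `add_overflows` and `bar_trapping` are mine, and `abort()` merely stands in for whatever trap the target would raise.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns true if a + b does not fit in a long; the wrapped result is
   stored in *sum either way (GCC/Clang builtin, not ISO C). */
static bool add_overflows(long a, long b, long *sum)
{
    return __builtin_add_overflow(a, b, sum);
}

/* bar() from the example above, with the final add checked explicitly. */
static long bar_trapping(long x, long y)
{
    long sum;
    if (add_overflows(x / 2, y / 2, &sum))
        abort();            /* stand-in for the hardware trap */
    return sum;
}
```

On a target with a flags register or a trapping add this usually costs an add plus a branch, or just the trapping add itself, which is the kind of code one would have hoped -ftrapv produces.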
Anton Ertl <[email protected]> wrote:
MIPS' add traps on overflow. gcc could have emitted almost the same
code for gcc -O3 -ftrapv as for gcc -O3, except that the last
instruction would be an add, not an addu. But apparently nobody gives
a damn about the efficiency of -ftrapv, possibly rightly so.
My guess is that GCC developers care more about -ftrapv than about
MIPS.
AFAICS several architectures officially supported by GCC
struggle to work at all. I suspect that maintainers of MIPS
backend are happy that -ftrapv works and do not have resources
to make it efficient.
That is true, but the issue at hand is how to achieve that. Leaving
debug information /in/ the executable, I think, is a bad idea.
Another useful method is to write out debug information as the program
executes and arrange that it either is suppressed or (alternatively)
goes to /dev/null unless some undocumented flag is given.
On Wed, 03 Jun 2026 00:55:35 GMT, MitchAlsup <[email protected]d> wrote:
quadi <[email protected]d> posted:
On Tue, 02 Jun 2026 17:50:38 +0200, Terje Mathisen wrote:
In the current environment where every language is expected to be
compatible with a generic IDE like Visual Studio Code, via open source
interface specifications, having a proprietary debug format seems like a
good way to strongly limit your potential customer base.
You appear to have understood his post in a different way than I did.
I wasn't thinking of the kind of debug information provided by a compiler.
I was thinking of leaving debug information in when one was distributing
software to customers.
Yes, you the vendor do not want random customer debugging the code,
however, you want the ability to debug the code that was distributed
on whatever medium on customer's system(s)--
AND you want to debug one copy of the running code while others are using
other processes running the code under normal use.
That is true, but the issue at hand is how to achieve that. Leaving
debug information /in/ the executable, I think, is a bad idea.
However, many (most?) toolchains provide a way to separate debug
symbols from the executable - either by generating a separate symbol
database in the 1st place, or by allowing debug data to be stripped
from the executables. If you have to debug at the client site, you
simply take the symbol database with you.
Another useful method is to write out debug information as the program
executes and arrange that it either is suppressed or (alternatively)
goes to /dev/null unless some undocumented flag is given.
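A minimal C sketch of that second method, under my own assumptions: the flag arrives as a string (for instance from an environment variable), and the sink is either stderr or /dev/null. The names `open_debug_stream` and `debug_printf` are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/* Picks the debug sink: stderr when the (hypothetical) flag is set to a
   non-empty value, otherwise /dev/null so the output is discarded. */
static FILE *open_debug_stream(const char *flag_value)
{
    if (flag_value != NULL && flag_value[0] != '\0')
        return stderr;
    return fopen("/dev/null", "w");
}

/* printf-style helper that writes to the chosen sink; a NULL sink
   (for example when /dev/null could not be opened) writes nothing. */
static int debug_printf(FILE *sink, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (sink == NULL)
        return 0;
    va_start(ap, fmt);
    n = vfprintf(sink, fmt, ap);
    va_end(ap);
    return n;
}
```

A program might call `open_debug_stream(getenv("MYAPP_DEBUG"))` once at startup, where `MYAPP_DEBUG` is a made-up variable name and `getenv` needs `<stdlib.h>`. The formatting work is still done when the sink is /dev/null, so this trades a little run time for always having the trace points in the shipped binary.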
Scott Lurndal [2026-06-03 18:36:51] wrote:
Being able to debug code without the source code doesn't seem
a particularly common use case,
Indeed, the source code should also be available, of course.
I started this thread by mentioning Free Software. 🙂
Note that free does not equal open source. There is a fair amount of software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.
On 6/4/2026 8:46 PM, Stefan Monnier wrote:
I started this thread by mentioning Free Software. 🙂
Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.
On Sun, 07 Jun 2026 15:05:24 -0700, Stephen Fuld wrote:
Note that free does not equal open source. There is a fair amount
of software that is freely available for which the source is not.
Many of these are reduced functionality versions of paid for
software, e.g. Adobe PDF reader, but there are others.
Commonly, when this distinction is discussed in the open-source
community, the phrases "free as in beer" and "free as in freedom" are
used to distinguish between freeware that remains proprietary versus
true open-source software under the GPL.
John Savard
On Mon, 8 Jun 2026 01:19:17 -0000 (UTC)
quadi <[email protected]d> wrote:
On Sun, 07 Jun 2026 15:05:24 -0700, Stephen Fuld wrote:
Note that free does not equal open source. There is a fair amount
of software that is freely available for which the source is not.
Many of these are reduced functionality versions of paid for
software, e.g. Adobe PDF reader, but there are others.
Commonly, when this distinction is discussed in the open-source
community, the phrases "free as in beer" and "free as in freedom" are
used to distinguish between freeware that remains proprietary versus
true open-source software under the GPL.
John Savard
I strongly disagree with the statement that true open source software is the equivalent of GPL.
Stephen Fuld <[email protected]d> writes:
On 6/4/2026 8:46 PM, Stefan Monnier wrote:
I started this thread by mentioning Free Software. 🙂
Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.
The Adobe PDF reader is chained software (aka proprietary software),
not free software.
In the appendix of "1984" George Orwell wrote:
|To give a single example, the word free still existed in Newspeak, but
|could only be used in such statements as "The dog is free from lice"
|or "This field is free from weeds." It could not be used in its old
|sense of "politically free" or "intellectually free," since political
|and intellectual freedom no longer existed even as concepts, and were
|therefore of necessity nameless.
Some of us obviously already write and think in Newspeak.
On 6/7/2026 11:05 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
On 6/4/2026 8:46 PM, Stefan Monnier wrote:
I started this thread by mentioning Free Software. 🙂
Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.
The Adobe PDF reader is chained software (aka proprietary software),
not free software.
I don't want to get into a semantic argument here. I don't know what
you mean by the term "chained software". I only meant that anyone
could use it without paying anything to anyone. In the sense that John
talked about, it is free beer.
On 6/7/2026 11:05 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
On 6/4/2026 8:46 PM, Stefan Monnier wrote:
I started this thread by mentioning Free Software. 🙂
Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.
The Adobe PDF reader is chained software (aka proprietary software),
not free software.
I don't want to get into a semantic argument here. I don't know what
you mean by the term "chained software".
I only meant that anyone
could use it without paying anything to anyone.
In the appendix of "1984" George Orwell wrote:
|To give a single example, the word free still existed in Newspeak, but
|could only be used in such statements as "The dog is free from lice"
|or "This field is free from weeds." It could not be used in its old
|sense of "politically free" or "intellectually free," since political
|and intellectual freedom no longer existed even as concepts, and were
|therefore of necessity nameless.
Some of us obviously already write and think in Newspeak.
I hardly think that using the word free to mean "you don't have to pay
for it" is Newspeak.
On Mon, 8 Jun 2026 01:19:17 -0000 (UTC)
quadi <[email protected]d> wrote:
On Sun, 07 Jun 2026 15:05:24 -0700, Stephen Fuld wrote:
Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many
of these are reduced functionality versions of paid for software,
e.g. Adobe PDF reader, but there are others.
Commonly, when this distinction is discussed in the open-source
community, the phrases "free as in beer" and "free as in freedom" are
used to distinguish between freeware that remains proprietary versus
true open-source software under the GPL.
I strongly disagree with the statement that true open source software is the equivalent of GPL.
Stephen Fuld <[email protected]d> writes:
On 6/7/2026 11:05 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
On 6/4/2026 8:46 PM, Stefan Monnier wrote:
I started this thread by mentioning Free Software. 🙂
Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g.
Adobe PDF reader, but there are others.
The Adobe PDF reader is chained software (aka proprietary software),
not free software.
I don't want to get into a semantic argument here. I don't know what
you mean by the term "chained software".
Non-free software. I put the more commonly used term in parentheses.
I only meant that anyone
could use it without paying anything to anyone.
That's not what "free software" means. The four essential freedoms of software are
<https://www.gnu.org/philosophy/free-sw.en.html#fs-definition>:
|* The freedom to run the program as you wish, for any purpose (freedom 0).
|
|* The freedom to study how the program works, and change it so it does
| your computing as you wish (freedom 1). Access to the source code is
| a precondition for this.
|
|* The freedom to redistribute copies so you can help others (freedom 2).
|
|* The freedom to distribute copies of your modified versions to others
| (freedom 3). By doing this you can give the whole community a chance
| to benefit from your changes. Access to the source code is a
| precondition for this.
|
|A program is free software if it gives users adequately all of these
|freedoms. Otherwise, it is nonfree.
Indeed, the source code should also be available, of course.
I started this thread by mentioning Free Software. 🙂
Note that free does not equal open source. There is a fair amount of
software that is freely available for which the source is not. Many of
these are reduced functionality versions of paid for software, e.g. Adobe
PDF reader, but there are others.
Stephen Fuld <[email protected]d> writes:
On 6/7/2026 11:05 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
On 6/4/2026 8:46 PM, Stefan Monnier wrote:
I started this thread by mentioning Free Software. 🙂
Note that free does not equal open source. There is a fair
amount of software that is freely available for which the source
is not. Many of these are reduced functionality versions of paid
for software, e.g. Adobe PDF reader, but there are others.
The Adobe PDF reader is chained software (aka proprietary
software), not free software.
I don't want to get into a semantic argument here. I don't know
what you mean by the term "chained software".
Non-free software. I put the more commonly used term in parentheses.
I only meant that anyone
could use it without paying anything to anyone.
That's not what "free software" means. The four essential freedoms of software are
<https://www.gnu.org/philosophy/free-sw.en.html#fs-definition>:
|* The freedom to run the program as you wish, for any purpose (freedom 0).
|
|* The freedom to study how the program works, and change it so it does
| your computing as you wish (freedom 1). Access to the source code is
| a precondition for this.
|
|* The freedom to redistribute copies so you can help others (freedom 2).
|
|* The freedom to distribute copies of your modified versions to others
| (freedom 3). By doing this you can give the whole community a chance
| to benefit from your changes. Access to the source code is a
| precondition for this.
|
|A program is free software if it gives users adequately all of these
|freedoms. Otherwise, it is nonfree.
In the appendix of "1984" George Orwell wrote:
|To give a single example, the word free still existed in Newspeak, but
|could only be used in such statements as "The dog is free from lice"
|or "This field is free from weeds." It could not be used in its old
|sense of "politically free" or "intellectually free," since political
|and intellectual freedom no longer existed even as concepts, and were
|therefore of necessity nameless.
Some of us obviously already write and think in Newspeak.
I hardly think that using the word free to mean "you don't have to
pay for it" is Newspeak.
Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).
- anton
Stephen Fuld <[email protected]d> writes:
On 6/7/2026 11:05 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
On 6/4/2026 8:46 PM, Stefan Monnier wrote:
I started this thread by mentioning Free Software. 🙂
Note that free does not equal open source. There is a fair
amount of software that is freely available for which the source
is not. Many of these are reduced functionality versions of paid
for software, e.g. Adobe PDF reader, but there are others.
The Adobe PDF reader is chained software (aka proprietary
software), not free software.
I don't want to get into a semantic argument here. I don't know
what you mean by the term "chained software". I only meant that
anyone could use it without paying anything to anyone. In the sense
that John talked about, it is free beer.
Acroread sends basic telemetry to Adobe every time you use it,
so in a sense, it's not exactly free.
xpdf on the other hand....
Stephen Fuld <[email protected]d> writes:
I hardly think that using the word free to mean "you don't have to pay
for it" is Newspeak.
Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).
Michael S wrote:
The obviously "most free" sw must be public domain, right?
Anton Ertl [2026-06-08 16:18:37] wrote:
Stephen Fuld <[email protected]d> writes:
I hardly think that using the word free to mean "you don't have to pay
for it" is Newspeak.
Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).
Well, Stephen is hardly using a recent meaning of the word "free".
According to the OED, "free" as in "free of charge" traces back to the
13th century, so it clearly existed in Orwell's time.
But yes, I find it demoralizing that people within the computer world
are still making this mistake, after more than 40 years of FSF.
On Mon, 08 Jun 2026 16:18:37 GMT
[email protected] (Anton Ertl) wrote:
Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).
- anton
I tend to think that it was the other way around.
RMS invented a new meaning of the term "free software"
Michael S <[email protected]> writes:
On Mon, 08 Jun 2026 16:18:37 GMT
[email protected] (Anton Ertl) wrote:
Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).
- anton
I tend to think that it was the other way around.
RMS invented a new meaning of the term "free software"
The story that I read was that all software originally was free in the
FSF sense (i.e., provided the four freedoms).[1] Then some people
removed some or all freedoms from some software, typically with the
goal of making money from the software. Removing the freedoms and yet
not asking for money is a later development; this has often been
called shareware or freeware, but Stephen Fuld is the first one I have
seen who has called it "free software", and actually misunderstood a reference to "Free Software" (capitalized).
[1] As an example, <https://en.wikipedia.org/wiki/SHARE_(computing)>
states:
|Originally, IBM distributed what software it provided in source
|form[2][3][4] and systems programmers commonly made small local
|additions or modifications and exchanged them with other users.
All four freedoms were exercised here, more than two decades before
the Free Software Foundation.
Anton Ertl wrote:
Michael S <[email protected]> writes:
On Mon, 08 Jun 2026 16:18:37 GMT
[email protected] (Anton Ertl) wrote:
Orwell did not think about that meaning when he gave an example of
Newspeak use of "free", so if the meaning "gratis" for "free" existed
when he wrote the book in 1949, it was not widely-enough used to make
it into the book. In any case, the meaning "free from lice" existed
when Orwell wrote the book and still exists in Newspeak. Newspeak
does not introduce new meanings, but eliminates the "freedom"
meaning. And in your case, Newspeak obviously has been successful
(not the Ingsoc variant ("free from lice"), but the surveillance
capitalism variant ("you don't pay [money] for it")).
- anton
I tend to think that it was the other way around.
RMS invented a new meaning of the term "free software"
The story that I read was that all software originally was free in the
FSF sense (i.e., provided the four freedoms).[1] Then some people
removed some or all freedoms from some software, typically with the
goal of making money from the software. Removing the freedoms and yet
not asking for money is a later development; this has often been
called shareware or freeware, but Stephen Fuld is the first one I have
seen who has called it "free software", and actually misunderstood a
reference to "Free Software" (capitalized).
[1] As an example, <https://en.wikipedia.org/wiki/SHARE_(computing)>
states:
|Originally, IBM distributed what software it provided in source
|form[2][3][4] and systems programmers commonly made small local
|additions or modifications and exchanged them with other users.
All four freedoms were exercised here, more than two decades before
the Free Software Foundation.
Yes, with one important restriction:
The software was free, but you could not use it except on IBM hardware, which was quite expensive.
When clones started to appear (Amdahl?) I believe the free sw
disappeared, now it was explicitly licensed to only run on "real" IBM hardware?
The story that I read was that all software originally was free in the
FSF sense (i.e., provided the four freedoms).[1]
But not when you consider the point of view of the end-users who may
receive code compiled/derived from that public domain source code with
no way to recover that public domain source code, or to change or fix
it. It may even be illegal to try to recover it (since the DMCA
disallows several forms of reverse engineering).
From that end-user point of view, the GPL arguably ensures "more
freedom" than public domain.
Terje Mathisen [2026-06-08 11:45:51] wrote:
Michael S wrote:
The obviously "most free" sw must be public domain, right?
As with most things related to freedom ... it depends.
Public domain offers "more freedom" when you consider the point of view
of the developers, who can use that software any way they want with no
restrictions at all.
But not when you consider the point of view of the end-users who may
receive code compiled/derived from that public domain source code with
no way to recover that public domain source code, or to change or fix
it. It may even be illegal to try to recover it (since the DMCA
disallows several forms of reverse engineering).
From that end-user point of view, the GPL arguably ensures "more
freedom" than public domain.
=== Stefan
David Brown <[email protected]> writes:
On 25/05/2026 16:28, Anton Ertl wrote:
Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different
instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).
An awkward thing about using trap on overflow is determining how
precisely it is defined. Supposing you have the expression "a + b - a".
Perhaps "a + b" overflows. I would hope that when using debug-related
compiler flags such as "-fsanitize=signed-integer-overflow", a compiler
would check for overflow on "a + b", and report it at runtime.
(Unfortunately, gcc does not do that unless the partial expression is
assigned to a variable.) But in "normal" usage, I'd expect the
expression to be simplified, resulting in just "b" and no overflow.
OTOH, cases like a+b+c where the result is in range, while an
intermediate result is out of range are one of the reasons why I
prefer -fwrapv over -ftrapv. As for your preference of nasal demons,
given enough information, the compiler might "optimize" "a+b-a" into,
e.g., 0.
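A small C sketch of the a+b+c point, emulating wrapping arithmetic by hand so it does not depend on compiler flags. The helper names are hypothetical. The unsigned addition is fully defined; the conversion back to a signed type is implementation-defined in ISO C, and GCC documents it as reduction modulo 2^32.

```c
#include <stdint.h>

/* Wrapping 32-bit add, emulating what -fwrapv promises for int.
   The unsigned addition is well defined modulo 2^32; converting the
   result back to int32_t is implementation-defined in ISO C (GCC
   documents it as modular). */
static int32_t wrap_add(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* a + b + c evaluated left to right with wrapping adds. */
static int32_t wrap_sum3(int32_t a, int32_t b, int32_t c)
{
    return wrap_add(wrap_add(a, b), c);
}
```

With a = INT32_MAX, b = 1, c = -2 the intermediate a+b wraps to INT32_MIN, yet the final sum is the mathematically correct INT32_MAX - 1. A trapping add would have stopped at the first step, even though the overall result is in range.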
Anyway, the definition of -ftrapv is not very precise; for gcc-12.2:
|'-ftrapv'
| This option generates traps for signed overflow on addition,
| subtraction, multiplication operations.
As for what gcc-12.2 does for your example on AMD64:
long foo(long a, long b)
{
return a+b-a;
}
is compiled with gcc -O3 -ftrapv to:
0: 48 89 f0 mov %rsi,%rax
3: c3 ret
Waldek Hebisch <[email protected]> schrieb:
Anton Ertl <[email protected]> wrote:
MIPS' add traps on overflow. gcc could have emitted almost the same
code for gcc -O3 -ftrapv as for gcc -O3, except that the last
instruction would be an add, not an addu. But apparently nobody gives
a damn about the efficiency of -ftrapv, possibly rightly so.
My guess is that GCC developers care more about -ftrapv than about
MIPS.
It is a common misconception to treat GCC developers as a
monolithic group. There are hobbyists (such as myself) but
I would guess only a small minority of work is done by them,
with the notable exception of some front ends such as Fortran or
(the most recent example) Algol 68. There are employees of
different companies: Linux distributors like Red Hat or SUSE,
large software companies like Google, hardware vendors like
IBM, Intel or Qualcomm, ...
For MIPS, there are not so many active people and commits.
mips64-linux-gnu is a secondary platform, so if it fails
bootstrap, a release would be held up, but a wrong-code
regression will not.
Counting changes since 2025-01-01 in the gcc/config directories
can give a good idea of the relative activity for different
subdirectories; I cut this off below 7, where the PDP-11 is (note
that architecture names are often historical, so i386 includes
x86_64, s390 includes Z, rs6000 includes POWER and so on).
539 ./riscv
435 ./i386
432 ./aarch64
177 ./loongarch
100 ./arm
85 ./s390
75 ./avr
72 ./xtensa
60 ./rs6000
51 ./gcn
39 ./nvptx
25 ./sparc
25 ./mips
20 ./arc
19 ./pa
19 ./bpf
19 ./alpha
18 ./pru
15 ./sh
13 ./rx
13 ./cris
12 ./or1k
12 ./microblaze
12 ./m68k
11 ./lm32
11 ./ia64
11 ./h8300
9 ./vax
9 ./nds32
8 ./mcore
8 ./epiphany
8 ./c6x
7 ./visium
7 ./rl78
7 ./pdp11
7 ./frv
7 ./csky
AFAICS several architectures officially supported by GCC
struggle to work at all. I suspect that maintainers of MIPS
backend are happy that -ftrapv works and do not have resources
to make it efficient.
First, they would need to know about this, which requires a PR,
but resources may well be lacking.
There are currently 28 open "missed-optimization" bugs with mips
in their target field. Looking at a few architectures above,
RISC-V has 118, x86 has 943, aarch64 has 305, power has 133.
(Some bugs affect more than one architecture, of course).
But it is worth submitting a PR nonetheless, if anybody cares enough :-)
On 09/06/2026 00:03, Stefan Monnier wrote:
Well, Stephen is hardly using a recent meaning of the word "free".
According to the OED, "free" as in "free of charge" traces back to the
13th century, so it clearly existed in Orwell's time.
But yes, I find it demoralizing that people within the computer world
are still making this mistake, after more than 40 years of FSF.
It is not a mistake - it is merely a different but perfectly reasonable
use of the same word.
David Brown [2026-06-09 10:06:48] wrote:
On 09/06/2026 00:03, Stefan Monnier wrote:
Well, Stephen is hardly using a recent meaning of the word "free".
According to the OED, "free" as in "free of charge" traces back to the
13th century, so it clearly existed in Orwell's time.
But yes, I find it demoralizing that people within the computer world
are still making this mistake, after more than 40 years of FSF.
It is not a mistake - it is merely a different but perfectly reasonable
use of the same word.
In an arbitrary context, I could agree, but here we're talking about
a subthread that started with:
On Wed, 27 May 2026 10:59:31 -0400, Stefan Monnier wrote:
> MitchAlsup [2026-05-26 20:54:30] wrote:
>> Encrypt the debug information (and put it in a
>> {1234-5678-9101-1121-...} folder) so that only the owner (not
>> licensee) of the code can debug it.
> I resent that. All code should be Free Software.
I think there is no ambiguity here.
Treating this "Free Software" as referring to price rather than to freedom
is an error that can be explained only by a lack of familiarity with the
idea of software freedom.
As with most things related to freedom ... it depends.
Not to mention that there are many countries that do not recognize
public domain. And even where it technically is recognized, some
countries have legal procedures that must be followed to relinquish
your rights and so complicate actually putting something into public
domain.
Putting <whatever> under some kind of license - regardless of how
permissive it is - actually is easier to do in many places, and is
recognized in more places.
According to George Neuner <[email protected]>:
As with most things related to freedom ... it depends.
Not to mention that there are many countries that do not recognize
public domain. And even where it technically is recognized, some
countries have legal procedures that must be followed to relinquish
your rights and so complicate actually putting something into public domain.
I am not aware of any countries that do not have the public domain for material whose copyright has expired, or for whatever reason was not
eligible for copyright in the first place. But you're right, in some
places it is impossible or at least impractical to relinquish your
rights and put something in the P.D. before it would get there anyway.
Putting <whatever> under some kind of license - regardless of how permissive it is - actually is easier to do in many places, and is recognized in more places.
Agreed. There are lots of licenses other than the GPL that are
used successfully for open source software.
R's,
John
The Berne convention defined an implicit copyright that exists by
virtue of authorship and persists until the author's death. Though
the US does not recognize or enforce these implicit copyrights, most signatories to either Berne (1886) or UCC (1952) conventions do
recognize and enforce Berne copyrights.
George Neuner <[email protected]> writes:
The Berne convention defined an implicit copyright that exists by
virtue of authorship and persists until the author's death. Though
the US does not recognize or enforce these implicit copyrights, most signatories to either Berne (1886) or UCC (1952) conventions do
recognize and enforce Berne copyrights.
According to <https://en.wikipedia.org/wiki/Berne_convention>:
|The United States acceded to the convention on 16 November 1988, and
|the convention entered into force for the United States on 1 March
|1989.
How can the convention have entered into force in the US without the
US recognizing or enforcing implicit copyrights?
According to George Neuner <[email protected]>:
As with most things related to freedom ... it depends.
Not to mention that there are many countries that do not recognize
public domain. And even where it technically is recognized, some
countries have legal procedures that must be followed to relinquish
your rights and so complicate actually putting something into public domain.
I am not aware of any countries that do not have the public domain for material whose copyright has expired,
My country (Poland) has a rule that once copyright has expired,
the distributor of a work should pay royalties to the state. I
am not sure how it works in "interesting" cases, but clearly
this is quite different from the US/UK meaning of public domain.
Also, the law of my country declares some author rights
untransferable. Basically, an author can sue if he/she/it
thinks that the artistic integrity of the work is violated.
According to Waldek Hebisch <[email protected]>:
My country (Poland) has a rule that once copyright has expired,
the distributor of a work should pay royalties to the state. I
am not sure how it works in "interesting" cases, but clearly
this is quite different from the US/UK meaning of public domain.
Do distributors pay state royalties on works of Shakespeare?
The Bible? Wow.