Forum: War Ensemble BBS

Saving and restoring FP state

From Thomas Koenig@[email protected] to comp.arch on Sat Sep 13 14:55:58 2025

From Newsgroup: comp.arch

Fortran has an optional IEEE module. One of its important features
is that flags (IEEE exceptions) are set to quiet on entry of a procedure
and restored to signalling if it was signalling on entry, or keep it
signalling if it was raised in the procedure. Similarly, rounding
modes are saved and restored for procedures. This is automatically
done if the right IEEE modules are used. A user can set rounding
modes or set and clear exceptions using the right modes.

Conceptually, this is the right thing to do. A library routine should
not produce different results depending on what a user did for his
own calculations.

Computationally, this can be quite expensive - having the meaning
of a calculation changed by changing FP state should (I hope so,
for correctness's sake) flush any calculations done with the wrong
FP mode. Just calling a small library routine which does so could
be enough.

An example of the high cost is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121570 where, depending
on the library implementation, gfortran may be over-cautious
for the ieee_next_after function by saving and restoring fp state,
but I'm not sure that the overhead is actually from the FP state or
from calling extra functions.

So... What is the best way to allow this to be more efficient?
The CPU could speculate on the FP mode (not sure if that is actually
done). Other suggestions? How do current CPUs do so?
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael S@[email protected] to comp.arch on Sat Sep 13 23:07:09 2025

From Newsgroup: comp.arch

On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Similarly, rounding modes are saved and restored for procedures.

I am not sure that I understand.
1. Is Fortran's equivalent of C's fesetround() considered a
language primitive rather than a procedure?
2. Does the above mean that the caller has no way of modifying the rounding
mode used by the callee? If true, it defeats one of the original reasons for
which Kahan invented rounding modes in the first place.

From MitchAlsup@[email protected] to comp.arch on Sat Sep 13 22:00:28 2025

From Newsgroup: comp.arch

Thomas Koenig <[email protected]> posted:

Fortran has an optional IEEE module. One of its important features
is that flags (IEEE exceptions) are set to quiet on entry of a procedure
and restored to signalling if it was signalling on entry, or keep it
signalling if it was raised in the procedure. Similarly, rounding
modes are saved and restored for procedures. This is automatically
done if the right IEEE modules are used. A user can set rounding
modes or set and clear exceptions using the right modes.

How does a user set up his environment such that if TAN()* overflows
to infinity it returns with the OVERFLOW flag set ?!?

(*) EXP(), POW(,)

Conversely, how does one write a TAN()* subroutine with the above property ?

Conceptually, this is the right thing to do. A library routine should
not produce different results depending on what a user did for his
own calculations.

I would think that a user setting RM=ToZero would WANT a different
result from SIN() than the same call with RM=RNE ?!?

Computationally, this can be quite expensive

And contrary to how IEEE 754 has been used for 40 years.

- having the meaning
of a calculation changed by changing FP state should (I hope so,
for correctness's sake) flush any calculations done with the wrong
FP mode. Just calling a small library routine which does so could
be enough.

Oh, and BTW, how does a user CALL a subroutine to set his RM when
the RETURN undoes the very nature of his request ?!?

An example of the high cost is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121570 where, depending
on the library implementation, gfortran may be over-cautious
for the ieee_next_after function by saving and restoring fp state,
but I'm not sure that the overhead is actually from the FP state or
from calling extra functions.

So... What is the best way to allow this to be more efficient?

UN DO this change to IEEE 754.

The CPU could speculate on the FP mode (not sure if that is actually
done). Other suggestions? How do current CPUs do so?

Multi-threaded cores already have to ship different RMs to FUs on
each FP instruction. The HW is all there--it's the ISAs that are screwed
up.

From Thomas Koenig@[email protected] to comp.arch on Sun Sep 14 07:18:38 2025

From Newsgroup: comp.arch

Michael S <[email protected]> schrieb:

On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Similarly, rounding modes are saved and restored for procedures.

I am not sure that I understand.
1. Is Fortran's equivalent of C's fesetround() considered a
language primitive rather than a procedure?

Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
procedures from intrinsic modules, like COMPILER_OPTIONS
from ISO_FORTRAN_ENV, and user-defined procedures.
IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
(but an optional one). It need not be an external function;
the compiler is free to do other things to implement it.

(Hope this answers your question)

2. Does the above mean that the caller has no way of modifying the rounding
mode used by the callee? If true, it defeats one of the original reasons for
which Kahan invented rounding modes in the first place.

The caller cannot change the callee's rounding mode without the
callee having been designed for this (by taking the rounding mode
as an extra argument and using IEEE_SET_ROUNDING_MODE itself).

I think that's a good idea. If a library routine is written
and debugged for a particular rounding mode, results should
not change because somebody up the call tree changed it.

From Michael S@[email protected] to comp.arch on Sun Sep 14 10:52:10 2025

From Newsgroup: comp.arch

On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Similarly, rounding modes are saved and restored for procedures.

I am not sure that I understand.
1. Is Fortran's equivalent of C's fesetround() considered a
language primitive rather than a procedure?

Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
procedures from intrinsic modules, like COMPILER_OPTIONS
from ISO_FORTRAN_ENV, and user-defined procedures.
IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
(but an optional one). It need not be an external function;
the compiler is free to do other things to implement it.

(Hope this answers your question)

It will answer the question if you also say that the rule stated above
(rounding mode saved on entry and restored on exit) does not apply
to procedures from intrinsic modules. Or maybe it does apply to the
rest of the procedures in intrinsic modules and IEEE_SET_ROUNDING_MODE is
an exception?

2. Does the above mean that the caller has no way of modifying the rounding
mode used by the callee? If true, it defeats one of the original reasons for
which Kahan invented rounding modes in the first place.

The caller cannot change the callee's rounding mode without the
callee having been designed for this (by taking the rounding mode
as an extra argument and using IEEE_SET_ROUNDING_MODE itself).

I think that's a good idea. If a library routine is written
and debugged for a particular rounding mode, results should
not change because somebody up the call tree changed it.

Personally, I never found the whole rounding modes business useful in
my practice. So I cannot say with a straight face whether Fortran's take
on it is a good idea or not. But I can say with a good level of certainty
that William Kahan meant something else.

From Michael S@[email protected] to comp.arch on Sun Sep 14 12:07:23 2025

From Newsgroup: comp.arch

On Sat, 13 Sep 2025 22:00:28 GMT
MitchAlsup <[email protected]d> wrote:

Thomas Koenig <[email protected]> posted:

Fortran has an optional IEEE module. One of its important features
is that flags (IEEE exceptions) are set to quiet on entry of a
procedure and restored to signalling if it was signalling on entry,
or keep it signalling if it was raised in the procedure.
Similarly, rounding modes are saved and restored for procedures.
This is automatically done if the right IEEE modules are used. A
user can set rounding modes or set and clear exceptions using the
right modes.

How does a user set up his environment such that if TAN()* overflows
to infinity it returns with the OVERFLOW flag set ?!?

(*) EXP(), POW(,)

Conversely, how does one write a TAN()* subroutine with the above
property ?

I think that in the above paragraph Thomas Koenig uses the word
'signalling' in a sense that differs from its use in the IEEE-754 standard.
He uses it to mean, in 754 language, "Immediate alternative exception
handling block associated with a block". Ugh.

In simpler terms, Thomas probably meant to say that Fortran behaves
as if [under Windows] at the entry of the procedure we did
unsigned int old = _controlfp(0, 0);   /* read the current control word */
_controlfp(_MCW_EM, _MCW_EM);          /* mask all FP exceptions */

and at the exit from the procedure we did
_controlfp(old, _MCW_EM);              /* restore the saved exception mask */

I'd guess that there exists POSIX equivalent for that, but I don't know
what it is.

Conceptually, this is the right thing to do. A library routine
should not produce different results depending on what a user did
for his own calculations.

I would think that a user setting RM=ToZero would WANT a different
result from SIN() than the same call with RM=RNE ?!?

I am not at all sure about the specific case of sin() or about other
standard functions from <math.h>.
But for a user's own or 3rd party procedures you are probably right.

Computationally, this can be quite expensive

And contrary to how IEEE 754 has been used for 40 years.

I would not be so categorical.

Contrary to intentions of IEEE-754 committee?
Sure.

Contrary to real-world use of IEEE-754 compliant hardware?
That assumes that actual use (of non-default RMs) is somewhat
wide-spread, which I do not believe.

- having the meaning
of a calculation changed by changing FP state should (I hope so,
for correctness's sake) flush any calculations done with the wrong
FP mode. Just calling a small library routine which does so could
be enough.

Oh, and BTW, how does a user CALL a subroutine to set his RM when
the RETURN undoes the very nature of his request ?!?

An example of the high cost is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121570 where, depending
on the library implementation, gfortran may be over-cautious
for the ieee_next_after function by saving and restoring fp state,
but I'm not sure that the overhead is actually from the FP state or
from calling extra functions.

So... What is the best way to allow this to be more efficient?

UN DO this change to IEEE 754.

The CPU could speculate on the FP mode (not sure if that is actually
done). Other suggestions? How do current CPUs do so?

Multi-threaded cores already have to ship different RMs to FUs on
each FP instruction. The HW is all there--it's the ISAs that are
screwed up.

From Thomas Koenig@[email protected] to comp.arch on Sun Sep 14 10:09:49 2025

From Newsgroup: comp.arch

MitchAlsup <[email protected]d> schrieb:

Thomas Koenig <[email protected]> posted:

Fortran has an optional IEEE module. One of its important features
is that flags (IEEE exceptions) are set to quiet on entry of a procedure
and restored to signalling if it was signalling on entry, or keep it
signalling if it was raised in the procedure. Similarly, rounding
modes are saved and restored for procedures. This is automatically
done if the right IEEE modules are used. A user can set rounding
modes or set and clear exceptions using the right modes.

How does a user set up his environment such that if TAN()* overflows
to infinity it returns with the OVERFLOW flag set ?!?

(*) EXP(), POW(,)

TAN and other numeric intrinsics are defined very loosely in the
standard: "The result has a value equal to a processor-dependent
approximation to tan(X)". Unfortunately, the language standard does
not define this, but allows IEEE_OVERFLOW to signal in that case.
In practice, all implementations I have tested do so.

Conversely, how does one write a TAN()* subroutine with the above property ?

IEEE_OVERFLOW on exit is IEEE_OVERFLOW on entry || IEEE_OVERFLOW
when it is raised in the procedure.

Conceptually, this is the right thing to do. A library routine should
not produce different results depending on what a user did for his
own calculations.

I would think that a user setting RM=ToZero would WANT a different
result from SIN() than the same call with RM=RNE ?!?

You have to explicitly use the IEEE modules for this behavior,
which the intrinsic procedures do not do.

An example for dot product:

module mymod
  implicit none
contains
  ! Enclose the dot product of a and b between a lower bound (r_down)
  ! and an upper bound (r_up), accumulating once per rounding direction.
  subroutine my_dot(a, b, r_down, r_up, error)
    use, intrinsic :: iso_fortran_env, only : real64
    use, intrinsic :: ieee_arithmetic
    integer :: i
    real(real64), dimension(:), intent(in) :: a, b
    real(real64), intent(out) :: r_down, r_up
    logical, intent(out) :: error
    if (size(a) /= size(b)) then
      error = .true.
      return
    end if
    ! Lower bound: every FMA rounds toward minus infinity.
    r_down = 0
    call ieee_set_rounding_mode(IEEE_DOWN)
    do i=1,size(a,1)
      r_down = ieee_fma(a(i),b(i),r_down)
    end do
    ! Upper bound: start from zero again, rounding toward plus infinity.
    r_up = 0
    call ieee_set_rounding_mode(IEEE_UP)
    do i=1,size(a,1)
      r_up = ieee_fma(a(i),b(i),r_up)
    end do
    ! Report overflow through the error argument, then clear the flag.
    call ieee_get_flag(IEEE_OVERFLOW, error)
    call ieee_set_flag(IEEE_OVERFLOW,.false.)
  end subroutine my_dot
end module mymod

program main
  use, intrinsic :: iso_fortran_env, only : real64
  use mymod
  implicit none
  integer, parameter :: n = 1000
  real(real64), dimension(n) :: a, b
  real(real64) :: r_down, r_up, r_mid
  logical :: error
  call random_number(a)
  call random_number(b)
  a = a - 0.5
  b = b - 0.5
  call my_dot (a, b, r_down, r_up, error)
  if (error) stop "Oh no!"
  r_mid = dot_product(a,b)
  print '(1P,E22.15)',r_down,r_mid,r_up
end program main

[...]

Oh, and BTW, how does a user CALL a subroutine to set his RM when
the RETURN undoes the very nature of his request ?!?

That's a no-op :-)

But the way to do it is just to call the IEEE routines directly.

An example of the high cost is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121570 where, depending
on the library implementation, gfortran may be over-cautious
for the ieee_next_after function by saving and restoring fp state,
but I'm not sure that the overhead is actually from the FP state or
from calling extra functions.

So... What is the best way to allow this to be more efficient?

UN DO this change to IEEE 754.

Does IEEE 754 concern itself with how it is implemented in
programming languages? Terje?

The CPU could speculate on the FP mode (not sure if that is actually
done). Other suggestions? How do current CPUs do so?

Multi-threaded cores already have to ship different RMs to FUs on
each FP instruction. The HW is all there--it's the ISAs that are screwed
up.

And that would be an interesting part - how to specify this
efficiently, and allow the microarchitecture not to flush when
rounding modes are changed.

From Thomas Koenig@[email protected] to comp.arch on Sun Sep 14 10:19:08 2025

From Newsgroup: comp.arch

Michael S <[email protected]> schrieb:

On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Similarly, rounding modes are saved and restored for procedures.

I am not sure that I understand.
1. Is Fortran's equivalent of C's fesetround() considered a
language primitive rather than a procedure?

Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
procedures from intrinsic modules, like COMPILER_OPTIONS
from ISO_FORTRAN_ENV, and user-defined procedures.
IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
(but an optional one). It need not be an external function;
the compiler is free to do other things to implement it.

(Hope this answers your question)

OOOPS.

Seems I misread the standard, missing out on clause 17.4 paragraph 5,
which states

In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS,
the processor shall not change the rounding modes on entry, and
on return shall ensure that the rounding modes are the same as
they were on entry.

So, that part of my premise was wrong. Sorry.

From Michael S@[email protected] to comp.arch on Sun Sep 14 17:06:56 2025

From Newsgroup: comp.arch

On Sun, 14 Sep 2025 10:19:08 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Similarly, rounding modes are saved and restored for
procedures.

I am not sure that I understand.
1. Is Fortran's equivalent of C's fesetround() considered a
language primitive rather than a procedure?

Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
procedures from intrinsic modules, like COMPILER_OPTIONS
from ISO_FORTRAN_ENV, and user-defined procedures.
IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
(but an optional one). It need not be an external function;
the compiler is free to do other things to implement it.

(Hope this answers your question)

OOOPS.

Seems I misread the standard, missing out on clause 17.4 paragraph 5,
which states

In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS,
the processor shall not change the rounding modes on entry, and
on return shall ensure that the rounding modes are the same as
they were on entry.

So, that part of my premise was wrong. Sorry.

Now it sounds like matching Kahan's intentions.

From EricP@[email protected] to comp.arch on Sun Sep 14 10:52:02 2025

From Newsgroup: comp.arch

Thomas Koenig wrote:

MitchAlsup <[email protected]d> schrieb:

Thomas Koenig <[email protected]> posted:

The CPU could speculate on the FP mode (not sure if that is actually
done). Other suggestions? How do current CPUs do so?

Multi-threaded cores already have to ship different RMs to FUs on
each FP instruction. The HW is all there--it's the ISAs that are screwed
up.

And that would be an interesting part - how to specify this
efficiently, and allow the microarchitecture not to flush when
rounding modes are changed.

It has to sync-wait for all older FP instructions to finish executing.

The x87 had separate control and status registers but
SSE merged this into a single Control Status Register MXCSR.
In order to read the current control bits it must also read the status
bits, and to read the status bits it must wait until all outstanding SSE
FP instructions have executed because the status register flags are defined
as the OR of all the older FP instruction flags. But the FP status bits
are not usually updated with control so reading them was unnecessary.

One way is to have separate control registers for FP control
(round mode RM, exception enables XE, etc) and FP status,
and separate instructions to read and write them.
Additionally one could have RM and other controls on each FP instruction
which would allow one to change the RM without having to save, set and
restore the control register.

Additionally the x64's LDMXCSR and STMXCSR instructions load and store
the current MXCSR value but *only with memory*, not to an integer register.
Not only should these be separate CR and SR registers,
they should allow the old CR to be saved/restored with an integer register.

What it looks like most users want is a combined "copy current FP CR to
integer register and set masked CR field to immediate or int register".
That allows users to save & set just the RM without a sync-wait or
touching memory with 1 instruction, and restore later.
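
To make the proposed operation concrete, here is a small software model of it. This is only a sketch of the semantics described above, not an existing instruction: cr_swap_field and the CR_RM_* names are invented for the illustration. The bit positions are borrowed from MXCSR (rounding control in bits 13-14, default value 0x1F80), and the model treats the word as control-only, as if the status flags lived in a separate register.

```c
#include <stdint.h>

/* Rounding-control field as laid out in MXCSR: bits 13-14. */
#define CR_RM_MASK     UINT32_C(0x6000)
#define CR_RM_NEAREST  UINT32_C(0x0000)
#define CR_RM_DOWN     UINT32_C(0x2000)
#define CR_RM_UP       UINT32_C(0x4000)
#define CR_RM_ZERO     UINT32_C(0x6000)

/* Model of a hypothetical "copy CR to an integer register and set the
   masked field" instruction: returns the old control word and updates
   only the bits selected by mask. */
uint32_t cr_swap_field(uint32_t *cr, uint32_t mask, uint32_t value)
{
    const uint32_t old = *cr;
    *cr = (old & ~mask) | (value & mask);
    return old;
}
```

Saving and setting the RM is then one call, old = cr_swap_field(&cr, CR_RM_MASK, CR_RM_UP), and restoring is a second call with the saved value as the new field contents. Because no status bits are read, nothing in this operation needs the older FP instructions to have finished.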

However it is not clear if this would be suitable for Fortran as you say:

Thomas Koenig wrote:

Fortran has an optional IEEE module. One of its important features
is that flags (IEEE exceptions) are set to quiet on entry of a procedure
and restored to signalling if it was signalling on entry, or keep it
signalling if it was raised in the procedure. Similarly, rounding
modes are saved and restored for procedures. This is automatically
done if the right IEEE modules are used. A user can set rounding
modes or set and clear exceptions using the right modes.

This sounds like Fortran's defined FP status algorithm requires it to
read both the current control and status,
then set the RM and mask exceptions and clear the status bits,
and later restore the old control flags, and optionally OR with old status.
Unfortunately reading the current status register forces the sync-wait.

From antispam@[email protected] (Waldek Hebisch) to comp.arch on Sun Sep 14 16:16:23 2025

From Newsgroup: comp.arch

Michael S <[email protected]> wrote:

On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)

2. Does the above mean that the caller has no way of modifying the rounding
mode used by the callee? If true, it defeats one of the original reasons for
which Kahan invented rounding modes in the first place.

The caller cannot change the callee's rounding mode without the
callee having been designed for this (by taking the rounding mode
as an extra argument and using IEEE_SET_ROUNDING_MODE itself).

I think that's a good idea. If a library routine is written
and debugged for a particular rounding mode, results should
not change because somebody up the call tree changed it.

Personally, I never found the whole rounding modes business useful in
my practice. So I cannot say with a straight face whether Fortran's take
on it is a good idea or not. But I can say with a good level of certainty
that William Kahan meant something else.

Kahan wrote a paper claiming that setting the rounding mode to a different
value is a great debugging help, especially when no source is available.
So probably he really meant this. But the first thing that authors of
numerical packages learn is to set the rounding mode to the desired value.
There are some codes which should work reasonably for all rounding
modes, but a lot of code critically depends on rounding and will
not work with a different rounding mode. So, clearly this idea in the
standard was wrong (as were several other things in the standard).
--
Waldek Hebisch

From antispam@[email protected] (Waldek Hebisch) to comp.arch on Sun Sep 14 16:22:33 2025

From Newsgroup: comp.arch

Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Similarly, rounding modes are saved and restored for procedures.

I am not sure that I understand.
1. Is Fortran's equivalent of C's fesetround() considered a
language primitive rather than a procedure?

Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
procedures from intrinsic modules, like COMPILER_OPTIONS
from ISO_FORTRAN_ENV, and user-defined procedures.
IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
(but an optional one). It need not be an external function;
the compiler is free to do other things to implement it.

(Hope this answers your question)

OOOPS.

Seems I misread the standard, missing out on clause 17.4 paragraph 5,
which states

In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS,
the processor shall not change the rounding modes on entry, and
on return shall ensure that the rounding modes are the same as
they were on entry.

IIUC this means that user can not define routine like
'MY_SET_ROUNDING_MODE' and get desired effect (this probably can be
worked around using foreign function interface).
--
Waldek Hebisch

From Lawrence D’Oliveiro@[email protected] to comp.arch on Mon Sep 15 02:31:04 2025

From Newsgroup: comp.arch

On Sun, 14 Sep 2025 10:52:10 +0300, Michael S wrote:

Personally, I never found the whole rounding modes business useful in my practice.

I remember one of Kahan’s writings suggesting it is useful for testing
numeric stability: if your code gives results that differ only slightly in
the four different rounding modes, then your calculations are *probably*
stable; if the results vary a lot, then your calculations are *probably*
unstable.

From Stefan Monnier@[email protected] to comp.arch on Sun Sep 14 23:48:33 2025

From Newsgroup: comp.arch

Lawrence D’Oliveiro [2025-09-15 02:31:04] wrote:

On Sun, 14 Sep 2025 10:52:10 +0300, Michael S wrote:

Personally, I never found the whole rounding modes business useful in my
practice.

I remember one of Kahan’s writings suggesting it is useful for testing
numeric stability: if your code gives results that differ only slightly in
the four different rounding modes, then your calculations are *probably*
stable; if the results vary a lot, then your calculations are *probably*
unstable.

IIUC you can get the same result by adding a bit of noise to your inputs
and comparing the output. Maybe it's easier to change rounding modes than
to add noise to your inputs?

Stefan

From aph@[email protected] to comp.arch on Mon Sep 15 07:34:41 2025

From Newsgroup: comp.arch

Stefan Monnier <[email protected]> wrote:

Lawrence D’Oliveiro [2025-09-15 02:31:04] wrote:

On Sun, 14 Sep 2025 10:52:10 +0300, Michael S wrote:

Personally, I never found the whole rounding modes business useful in my
practice.

I remember one of Kahan’s writings suggesting it is useful for testing
numeric stability: if your code gives results that differ only slightly in
the four different rounding modes, then your calculations are *probably*
stable; if the results vary a lot, then your calculations are *probably*
unstable.

IIUC you can get the same result by adding a bit of noise to your inputs
and comparing the output.

No, because the result of changing rounding mode is highly correlated
with the inputs, whereas noise is uncorrelated.

Andrew.

From Michael S@[email protected] to comp.arch on Mon Sep 15 15:42:12 2025

From Newsgroup: comp.arch

On Sun, 14 Sep 2025 16:22:33 -0000 (UTC)
[email protected] (Waldek Hebisch) wrote:

Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Similarly, rounding modes are saved and restored for
procedures.

I am not sure that I understand.
1. Is Fortran's equivalent of C's fesetround() considered a
language primitive rather than a procedure?

Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
procedures from intrinsic modules, like COMPILER_OPTIONS
from ISO_FORTRAN_ENV, and user-defined procedures.
IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
(but an optional one). It need not be an external function;
the compiler is free to do other things to implement it.

(Hope this answers your question)

OOOPS.

Seems I misread the standard, missing out on clause 17.4 paragraph
5, which states

In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS,
the processor shall not change the rounding modes on entry, and
on return shall ensure that the rounding modes are the same as
they were on entry.

IIUC this means that user can not define routine like
'MY_SET_ROUNDING_MODE' and get desired effect (this probably can be
worked around using foreign function interface).

I am not sure that I understood.
Do you want to say that user can not write routines that call
IEEE_SET_ROUNDING_MODE ? I don't see anything in citation above that
suggests that.
Or do you mean something else?

From antispam@[email protected] (Waldek Hebisch) to comp.arch on Tue Sep 16 01:59:34 2025

From Newsgroup: comp.arch

Michael S <[email protected]> wrote:

On Sun, 14 Sep 2025 16:22:33 -0000 (UTC)
[email protected] (Waldek Hebisch) wrote:

Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Similarly, rounding modes are saved and restored for
procedures.

I am not sure that I understand.
1. Is Fortran's equivalent of C's fesetround() considered a
language primitive rather than a procedure?

Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
procedures from intrinsic modules, like COMPILER_OPTIONS
from ISO_FORTRAN_ENV, and user-defined procedures.
IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
(but an optional one). It need not be an external function;
the compiler is free to do other things to implement it.

(Hope this answers your question)

OOOPS.

Seems I misread the standard, missing out on clause 17.4 paragraph
5, which states

In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS,
the processor shall not change the rounding modes on entry, and
on return shall ensure that the rounding modes are the same as
they were on entry.

IIUC this means that user can not define routine like
'MY_SET_ROUNDING_MODE' and get desired effect (this probably can be
worked around using foreign function interface).

I am not sure that I understood.
Do you want to say that user can not write routines that call
IEEE_SET_ROUNDING_MODE ? I don't see anything in citation above that
suggests that.

Of course they can.

Or do you mean something else?

The citation above clearly requires that all changes to the rounding
mode made during execution of a user routine are undone at exit
from that routine. Consequently a _user_ routine cannot change the
rounding mode of the caller. In other words, a user routine can
not offer an alternative implementation of IEEE_SET_ROUNDING_MODE.
--
Waldek Hebisch

From BGB@[email protected] to comp.arch on Tue Sep 16 12:36:30 2025

From Newsgroup: comp.arch

On 9/14/2025 9:06 AM, Michael S wrote:

On Sun, 14 Sep 2025 10:19:08 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Michael S <[email protected]> schrieb:

On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:

Similarly, rounding modes are saved and restored for
procedures.

I am not sure that I understand.
1. Is Fortran's equivalent of C's fesetround() considered a
language primitive rather than a procedure?

Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
procedures from intrinsic modules, like COMPILER_OPTIONS
from ISO_FORTRAN_ENV, and user-defined procedures.
IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
(but an optional one). It need not be an external function;
the compiler is free to do other things to implement it.

(Hope this answers your question)

OOOPS.

Seems I misread the standard, missing out on clause 17.4 paragraph 5,
which states

In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS,
the processor shall not change the rounding modes on entry, and
on return shall ensure that the rounding modes are the same as
they were on entry.

So, that part of my premise was wrong. Sorry.

Now it sounds like matching Kahan's intentions.

So, this implies the rounding mode and similar are expected to follow
dynamic scoping rather than global scoping?...

May need to work on this some more if so, but it appears that a lot of
existing C compilers treat it as global?...

I guess things also come up with ISA choices.

Can note, for example:
  RISC-V always has a 3-bit rounding mode in the instruction;
    GCC defaults to 7 (dynamic);
    BGBCC uses 0 (RNE) or 7 (DYN), depending on FENV_ACCESS.
  XG1/2/3 have several versions of some of the instructions:
    FADD/FSUB/FMUL   : Hard-wired RNE (no status flag updates)
    FADDG/FSUBG/FMULG: Dynamic Rounding Mode (updates flags)
    FADDA/FSUBA/FMULA: Reduced Precision
      Binary64 format, but mimic Binary32 precision.
      No flags updates.
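
To make the RISC-V part concrete: the rm field sits in bits 14:12 of the arithmetic OP-FP instructions (fadd, fsub, fmul and so on). A small sketch that reads and rewrites it; the function and enum names are invented for illustration, and the helpers should not be applied to OP-FP instructions that use those bits as a sub-opcode (sign-injection, min/max, compares):

```c
#include <assert.h>
#include <stdint.h>

/* Rounding-mode encodings of the RISC-V "rm" field (bits 14:12). */
enum rv_rm {
    RV_RM_RNE = 0,  /* round to nearest, ties to even */
    RV_RM_RTZ = 1,  /* round toward zero */
    RV_RM_RDN = 2,  /* round down (toward -inf) */
    RV_RM_RUP = 3,  /* round up (toward +inf) */
    RV_RM_RMM = 4,  /* round to nearest, ties to max magnitude */
    RV_RM_DYN = 7   /* use the dynamic mode held in fcsr.frm */
};

/* Extract rm from a 32-bit OP-FP instruction word (major opcode 0x53). */
static unsigned rv_rm_field(uint32_t insn)
{
    assert((insn & 0x7fu) == 0x53u);   /* OP-FP major opcode */
    return (insn >> 12) & 0x7u;
}

/* Replace the rm field, e.g. to turn a dynamic-mode op into static RNE. */
static uint32_t rv_with_rm(uint32_t insn, unsigned rm)
{
    return (insn & ~(UINT32_C(0x7) << 12)) | ((uint32_t)(rm & 0x7u) << 12);
}
```

For example, 0x0020F053 is fadd.s f0, f1, f2 with rm = 7 (dynamic); rewriting the field to 0 gives 0x00208053, the same operation with static RNE.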

It is possible to give a full static rounding mode (like in RISC-V), but
this requires a 64-bit encoding. Static rounding other than RNE is very
rare though.

But, yeah, I have now noticed that my attempt to add an IEEE exact mode,
due to pipeline timing issues in my Verilog code, was generating
spurious Emulation-Request traps, which in some cases was causing
problems (random data left in the pipeline by earlier instructions
occasionally managed to cause the FPU to raise an exception when an FPU
instruction was used, regardless of the state of the IEEE-754 emulation
flag).

Had to fix this and then re-upload the Verilog core as I noted that this
bug leaked into the version I had posted online.

Also there was another related bug where FPU instructions in interrupt
handlers could affect the FPU flags visible in userland. I ended up
adding a mechanism to partly disable both the flags behavior and traps
when inside an interrupt handler (so, interrupt handlers will always get
DAZ/FTZ and similar).

But, yeah...

Otherwise:

Some people in RISC-V land now jumped onboard with adding Load/Store
Indexed and similar (and auto-increment, blarg). Their encodings clashed
with my Load/Store Pair, but ended up resolving the issue partly by
relocating my definition of LDP/SDP entirely over to the FLQ/FSQ
instructions (my core not implementing the Q extension, so FLQ/FSQ could
also be used for LDP/SDP).

In the newest variant, even numbers will encode FPU pairs, as before,
whereas odd numbers encode GPR pairs. This is kinda backwards, but
minimizes code breakage (and allows a grace period for the proposed
encodings to "not break stuff"); though does still leave some "breaking
changes".

Have noted that currently code compiled with RV-C enabled in BGBCC is
crash prone with my VL core; whereas code compiled with GCC seems to
work. May require more investigation (did recently start trying to make
use of more of the RV-C encodings in BGBCC, so maybe stumbled on
something here).

Had also been working on getting some previously unimplemented parts of
XG3 working (such as support for predication). One part that was causing
problems was that BGBCC was using some stale encodings (from earlier in
the design of the encoding scheme). Seems also there was a bug where it
was trying to use the predication encoding rules on jumbo-prefixes.

Still not fully working yet though.

...

--- Synchronet 3.21a-Linux NewsLink 1.2

From scott@[email protected] (Scott Lurndal) to comp.arch on Tue Sep 16 17:50:09 2025

From Newsgroup: comp.arch

BGB <[email protected]> writes:

On 9/14/2025 9:06 AM, Michael S wrote:

Also there was another related bug where FPU instructions in interrupt
handlers could affect the FPU flags visible in userland.

Why on earth would you use floating point instructions
in an interrupt handler?

--- Synchronet 3.21a-Linux NewsLink 1.2

From BGB@[email protected] to comp.arch on Tue Sep 16 20:37:06 2025

From Newsgroup: comp.arch

On 9/16/2025 12:50 PM, Scott Lurndal wrote:

BGB <[email protected]> writes:

On 9/14/2025 9:06 AM, Michael S wrote:

Also there was another related bug where FPU instructions in interrupt
handlers could affect the FPU flags visible in userland.

Why on earth would you use floating point instructions
in an interrupt handler?

I didn't go and track down which code was using FPU instructions, but
seemingly something was, in any case. I didn't see any particular reason
to forbid using the FPU inside of interrupt handlers (they are mostly
still plain C, differing mostly in that they are limited to
operating in terms of the physical memory map).

But, yeah, in any case, the partial workaround was that interrupt
handlers don't update FPU flags. It was either this or give interrupt
handlers their own version of FPSR, but cheaper/easier to not bother and
have interrupt handlers behave as if FPSR were hard wired to all 0's.

Most likely possibilities:
  MIDI / FM update ticks;
    Tick causes FM instruments to update and similar;
  PCM / WAVE update ticks.
    Ticks for transferring audio from software to hardware loop buffers.

Both use some amount of floating point math internally.

Where, say:
  Software loop buffer is mostly Binary16;
  Audio hardware uses a small A-Law loop buffer;
  Programs mostly submit audio as 16-bit PCM or similar, rate not
  necessarily tied to hardware sample rate.

Didn't seem too unreasonable.

But, in any case, using FP instructions in an interrupt handler
shouldn't leave state changes that are visible in userland.

...

--- Synchronet 3.21a-Linux NewsLink 1.2

From Robert Finch@[email protected] to comp.arch on Tue Sep 16 21:49:13 2025

From Newsgroup: comp.arch

On 2025-09-16 1:50 p.m., Scott Lurndal wrote:

BGB <[email protected]> writes:

On 9/14/2025 9:06 AM, Michael S wrote:

Also there was another related bug where FPU instructions in interrupt
handlers could affect the FPU flags visible in userland.

Why on earth would you use floating point instructions
in an interrupt handler?

IIRC I was using FP in an interrupt handler at one point for video
processing where the co-ordinates were already floats. I believe this
has since been changed to fixed point arithmetic.

--- Synchronet 3.21a-Linux NewsLink 1.2

From scott@[email protected] (Scott Lurndal) to comp.arch on Wed Sep 17 13:57:06 2025

From Newsgroup: comp.arch

BGB <[email protected]> writes:

On 9/16/2025 12:50 PM, Scott Lurndal wrote:

BGB <[email protected]> writes:

On 9/14/2025 9:06 AM, Michael S wrote:

Also there was another related bug where FPU instructions in interrupt
handlers could affect the FPU flags visible in userland.

Why on earth would you use floating point instructions
in an interrupt handler?

I didn't go and track down which code was using FPU instructions, but
seemingly something was, in any case. I didn't see any particular reason
to forbid using the FPU inside of interrupt handlers (they are mostly
still plain C, differing mostly in that they are limited to
operating in terms of the physical memory map).

The standard reasoning for prohibiting floating point in the
kernel is to improve system call overhead by not saving floating
point registers until and unless there is a context switch (and
even then, x86 has features that allow the OS to forgo saving
the floating point registers if they weren't used in the last
scheduling quantum).

But, in any case, using FP instructions in an interrupt handler
shouldn't leave state changes that are visible in userland.

A well understood problem handled by all off the shelf operating
systems.
--- Synchronet 3.21a-Linux NewsLink 1.2

From BGB@[email protected] to comp.arch on Wed Sep 17 13:05:58 2025

From Newsgroup: comp.arch

On 9/17/2025 8:57 AM, Scott Lurndal wrote:

BGB <[email protected]> writes:

On 9/16/2025 12:50 PM, Scott Lurndal wrote:

BGB <[email protected]> writes:

On 9/14/2025 9:06 AM, Michael S wrote:

Also there was another related bug where FPU instructions in interrupt
handlers could affect the FPU flags visible in userland.

Why on earth would you use floating point instructions
in an interrupt handler?

I didn't go and track down which code was using FPU instructions, but
seemingly something was, in any case. I didn't see any particular reason
to forbid using the FPU inside of interrupt handlers (they are mostly
still plain C, differing mostly in that they are limited to
operating in terms of the physical memory map).

The standard reasoning for prohibiting floating point in the
kernel is to improve system call overhead by not saving floating
point registers until and unless there is a context switch (and
even then, x86 has features that allow the OS to forgo saving
the floating point registers if they weren't used in the last
scheduling quantum).

In my case, this wasn't x86, and on my ISA the FPU stuff is done in
GPRs, which typically need to be saved/restored either way. Well, except
when running RISC-V code, which effectively splits the register space in
half (32+32 rather than 64).

The issue was that the FPSR is (now) aliased to SP(63:48), but there was
only a single SP; and the CPU core handles interrupts by causing SP and
SSP to switch places in decode.

The likely more proper solution would have been to have another FPSR
aliased to SSP(63:48) which also re-routes; where as-is SSP is currently
only a 48 bit register internally.

But, for now, easier was to disable the updates if inside an ISR.

This issue wouldn't have existed if still using GBR/GP for this, but GBR
has the disadvantage that it gets stomped whenever a reload occurs; so
it was either tweak the GBR reload mechanism to not stomp FPSR, or move
FPSR somewhere where it doesn't get stomped (the high bits of SP being
the most obvious choice).

Ran into a problem as some of my interrupt handling code does a sanity
check to verify that SP was intact between interrupt entry and return,
and some of this handling saw that SP changed unexpectedly and triggered
a break-point (otherwise, it might have gone unnoticed).

Though, this may be a payoff from being "needlessly pedantic" in this
case. There was a previous check (now disabled) where it would have also
XOR'ed all the callee save registers together and then checked the known
state against the XOR (if a register had changed, it would likely have
changed the XOR). Disabled as XOR'ing them all together has a high
overhead.
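
A rough sketch of that kind of XOR check, assuming the callee-save registers have been spilled to an array on interrupt entry; the function names and the array layout are assumptions for illustration, not the actual handler code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a set of saved register values into one word with XOR. */
static uint64_t xor_fold(const uint64_t *regs, size_t n)
{
    uint64_t acc = 0;
    assert(regs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc ^= regs[i];
    return acc;
}

/* Returns nonzero if the fold taken at entry still matches at exit. */
static int regs_look_intact(uint64_t fold_at_entry,
                            const uint64_t *regs, size_t n)
{
    return xor_fold(regs, n) == fold_at_entry;
}
```

The check is only "likely" to notice a change: two registers that change in the same bit positions cancel out in the fold, and the cost grows with the number of registers folded.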

Note that the logic does account for things like context switches, and
the SYSCALL interrupt is using a different prolog/epilog sequence which
is more optimized for context switching (but is only valid once a task
state is configured).

But, in any case, using FP instructions in an interrupt handler
shouldn't leave state changes that are visible in userland.

A well understood problem handled by all off the shelf operating
systems.

Possible.

In this case, the leaked state was caught by some code being pedantic
and noticing the HOB's of SP changing unexpectedly.

Otherwise, instruction predication in XG3 now seems to mostly work (was
mostly issues in BGBCC). Last issue was some paths where it was trying
to use RISC-V encodings in cases where predication was being used, and
the RISC-V ops not supporting predication.

It wouldn't have been as simple as simply converting the RISC-V ops to
XG3, as things are not always 1:1. In one of the places it came up, it
is a tangled mess, as both RISC-V and XG3 instruction-generation are
sorta tangled together.

Had to detect predication was being used (for the current instructions),
and effectively disable the use of RISC-V encodings in this case.

Though something still isn't perfect, as the Doom demos desync in a
different way, where a change in demo desync is usually evidence of a
difference in program behavior (though can also be caused by memory
corruption, etc, *).

...

*: Though, in some cases, it is sensitive to memory contents for
out-of-bounds memory accesses, which tended to differ between Doom
versions. Some of the Doom source ports try to deal with this by
fingerprinting the IWAD and then simulating the contents of the
out-of-bounds memory and similar (along with various changes in game
behavior) for each engine version. My port doesn't really bother (so, I
live with the demo desync, but can still notice changes in demo desync).

Though, there is some possible debate as to whether to bring back some
of the 2RI ops into XG3 (the 2RI-Imm10 space had been effectively dropped
from XG3, or not carried over from XG1/XG2). When using predication, a few
of these become relevant again. In this mode, had been using the option of
simply using some of the 3R ops but directing the output to R0/ZR as a
special case to encode an intention to update the T bit, but this has
some drawbacks (such as the lack of a decent sized immediate field).

Though, in this case, the use of predication (and thus conditional
compare) is low enough to leave it as debatable as to whether or not it
would be a good idea to bring back these encodings (or just continue to
live with some more limited Imm6s encodings, and the occasional use of
jumbo-prefixes when the Imm6 fails).

Actually, predication is itself debatable as it does effectively use
half the encoding space, and is technically a minority of the
instructions. But, does help with performance in some cases (namely, a
lot of the cases that XG3 was meant to address).

Could almost reuse the encoding space for a different set of 16-bit ops,
but don't necessarily want yet another 16-bit decoder (and, one can
argue, if code density matters enough to want to use 16-bit ops, jumping
over to RV64GC mode and using RV-C ops may make more sense).

Though, presently BGBCC doesn't support mixed RV-C and XG3 binaries, and
this would kinda be a mess (though, the original ARM+Thumb scheme exists
as an example of the basic idea here).

So, for now, they will likely remain as predicated ops and similar.

...

--- Synchronet 3.21a-Linux NewsLink 1.2

From MitchAlsup@[email protected] to comp.arch on Wed Sep 17 18:53:09 2025

From Newsgroup: comp.arch

BGB <[email protected]> posted:

On 9/17/2025 8:57 AM, Scott Lurndal wrote:

BGB <[email protected]> writes:

On 9/16/2025 12:50 PM, Scott Lurndal wrote:

BGB <[email protected]> writes:

On 9/14/2025 9:06 AM, Michael S wrote:

Also there was another related bug where FPU instructions in interrupt
handlers could affect the FPU flags visible in userland.

Why on earth would you use floating point instructions
in an interrupt handler?

I didn't go and track down which code was using FPU instructions, but
seemingly something was, in any case. I didn't see any particular reason
to forbid using the FPU inside of interrupt handlers (they are mostly
still plain C, differing mostly in that they are limited to
operating in terms of the physical memory map).

The standard reasoning for prohibiting floating point in the
kernel is to improve system call overhead by not saving floating
point registers until and unless there is a context switch (and
even then, x86 has features that allow the OS to forgo saving
the floating point registers if they weren't used in the last
scheduling quantum).

In my case, this wasn't x86, and on my ISA the FPU stuff is done in
GPRs, which typically need to be saved/restored either way. Well, except
when running RISC-V code, which effectively splits the register space in
half (32+32 rather than 64).

The issue was that the FPSR is (now) aliased to SP(63:48), but there was
only a single SP; and the CPU core handles interrupts by causing SP and
SSP to switch places in decode.

Switching SP with SSP does not work (fundamentally) when there are
4 privilege levels--as each privilege level needs its own unique SP.

The likely more proper solution would have been to have another FPSR
aliased to SSP(63:48) which also re-routes; where as-is SSP is currently
only a 48 bit register internally.

So, when one does a::

ADD SP,SP,#big-number

do the HoBs of SP get changed (like any other GPR), or do you
special-case SP ?!?

But, for now, easier was to disable the updates if inside an ISR.

My 66000 code does not even know it is in an ISR--as ISRs do not
disable interrupts, disable exceptions, and are re-entrant from
the very first ISR instruction.

Thus, ISRs can do FP if they desire, Vectorize loops as needed,
and are not limited in their use of ISA.

This issue wouldn't have existed if still using GBR/GP for this, but GBR
has the disadvantage that it gets stomped whenever a reload occurs; so
it was either tweak the GBR reload mechanism to not stomp FPSR, or move
FPSR somewhere where it doesn't get stomped (the high bits of SP being
the most obvious choice).

My ISA does not have this problem as it does not need a GBR; universal
constants (this time as a displacement) eliminated the need.
--- Synchronet 3.21a-Linux NewsLink 1.2

From BGB@[email protected] to comp.arch on Thu Sep 18 12:41:48 2025

From Newsgroup: comp.arch

On 9/17/2025 1:53 PM, MitchAlsup wrote:

BGB <[email protected]> posted:

On 9/17/2025 8:57 AM, Scott Lurndal wrote:

BGB <[email protected]> writes:

On 9/16/2025 12:50 PM, Scott Lurndal wrote:

BGB <[email protected]> writes:

On 9/14/2025 9:06 AM, Michael S wrote:

Also there was another related bug where FPU instructions in interrupt
handlers could affect the FPU flags visible in userland.

Why on earth would you use floating point instructions
in an interrupt handler?

I didn't go and track down which code was using FPU instructions, but
seemingly something was, in any case. I didn't see any particular reason
to forbid using the FPU inside of interrupt handlers (they are mostly
still plain C, differing mostly in that they are limited to
operating in terms of the physical memory map).

The standard reasoning for prohibiting floating point in the
kernel is to improve system call overhead by not saving floating
point registers until and unless there is a context switch (and
even then, x86 has features that allow the OS to forgo saving
the floating point registers if they weren't used in the last
scheduling quantum).

In my case, this wasn't x86, and on my ISA the FPU stuff is done in
GPRs, which typically need to be saved/restored either way. Well, except
when running RISC-V code, which effectively splits the register space in
half (32+32 rather than 64).

The issue was that the FPSR is (now) aliased to SP(63:48), but there was
only a single SP; and the CPU core handles interrupts by causing SP and
SSP to switch places in decode.

Switching SP with SSP does not work (fundamentally) when there are
4 privilege levels--as each privilege level needs its own unique SP.

In this case, there are 3 major modes:
  User, Supervisor, and ISR

But only 2 levels:
  User|Supervisor, ISR

So User and Supervisor get their own SP, but in this case it is handled
by using a context switch.

The likely more proper solution would have been to have another FPSR
aliased to SSP(63:48) which also re-routes; where as-is SSP is currently
only a 48 bit register internally.

So, when one does a::

ADD SP,SP,#big-number

do the HoBs of SP get changed (like any other GPR), or do you
special-case SP ?!?

In premise, if you add a big enough value to SP, then it will stomp the
FPU state (at least as far as this sort of thing goes, it behaves like a
GPR).

In normal use, it won't matter, since SP only ever covers a small range
of values.

Granted, maybe this is a stupid way to do it...

Similarly, modifying the HOBs of SP can be used to modify the FPU state.

This does still pose a non-zero risk of breaking stuff in RISC-V mode,
where GCC is likely to be unaware of the wonk in the high order bits of
SP.

Most likely case is special-casing the ADD and ADDI instructions in the
RISC-V decoder when working with SP (the high bits always being read as 0
and ignored on write).

In this case though, "ORI" or similar could be used to access the full
register.
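
The difference between the plain GPR-style add and the special-cased add can be modelled in a few lines of C. This is a sketch of the behaviour as described in this post, with a 48-bit address field and FPSR in SP(63:48); all names are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define SP_ADDR_BITS 48
#define SP_ADDR_MASK ((UINT64_C(1) << SP_ADDR_BITS) - 1)

/* FPSR lives in SP(63:48). */
static uint16_t sp_fpsr(uint64_t sp)
{
    return (uint16_t)(sp >> SP_ADDR_BITS);
}

/* Plain 64-bit add, as for any other GPR: a carry out of bit 47
   lands in bits 63:48 and changes the FPSR field. */
static uint64_t sp_add_raw(uint64_t sp, int64_t imm)
{
    return sp + (uint64_t)imm;
}

/* Special-cased add: the high bits read as 0 and are ignored on write,
   so only the low 48 address bits can change. */
static uint64_t sp_add_masked(uint64_t sp, int64_t imm)
{
    uint64_t lo = ((sp & SP_ADDR_MASK) + (uint64_t)imm) & SP_ADDR_MASK;
    uint64_t result = (sp & ~SP_ADDR_MASK) | lo;
    assert(sp_fpsr(result) == sp_fpsr(sp));   /* FPSR field is preserved */
    return result;
}
```

With FPSR = 5 and the low bits at 0xFFFFFFFF0000, adding 0x20000 through sp_add_raw() carries into the FPSR field and turns it into 6, while sp_add_masked() wraps within the low 48 bits and leaves it at 5.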

But, for now, easier was to disable the updates if inside an ISR.

My 66000 code does not even know it is in an ISR--as ISRs do not
disable interrupts, disable exceptions, and are re-entrant from
the very first ISR instruction.

Thus, ISRs can do FP if they desire, Vectorize loops as needed,
and are not limited in their use of ISA.

In my case, as noted, they can still use the FPU, but it is now
hard-wired to DAZ/FTZ mode. They also operate in physical memory
addressing.

Likewise, no support for re-entrant interrupt handling.

For now, main way to do more than this is to perform a context-switch to
a supervisor mode task.

This issue wouldn't have existed if still using GBR/GP for this, but GBR
has the disadvantage that it gets stomped whenever a reload occurs; so
it was either tweak the GBR reload mechanism to not stomp FPSR, or move
FPSR somewhere where it doesn't get stomped (the high bits of SP being
the most obvious choice).

My ISA does not have this problem as it does not need a GBR; universal
constants (this time as a displacement) eliminated the need.

FWIW: I could have used PC-rel, but this doesn't work well with NoMMU.
And, a shared address space following NoMMU rules has less overhead than
multi-mapping pages or CoW.

Likewise, absolute addressing doesn't appeal to me (if anything, it is
worse than PC-rel).

The main other option (besides putting it in the HOB's of SP or similar)
would have been to create a new CR for it, but then would need to
save/restore this CR on context switch and similar (and adding a new CR
would have been more logic than shoving it into the HOB's of SP or
similar).

Another option would have been the HOBs of SR, but as-is SR(63:32) is
currently assumed to hold global system-level state; whereas FPSR is
clearly task-local.

--- Synchronet 3.21a-Linux NewsLink 1.2

From Andy Valencia@[email protected] to comp.arch on Mon Sep 22 07:33:53 2025

From Newsgroup: comp.arch

BGB <[email protected]> writes:

Why on earth would you use floating point instructions
in an interrupt handler?

I didn't go and track down which code was using FPU instructions, but
seemingly something was,

I believe I noted once before that I had to guide a driver developer for
HP/UX away from using floating point for his driver's event counters.
His reason for their use? They can count to bigger numbers.

Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html
--- Synchronet 3.21a-Linux NewsLink 1.2

From Terje Mathisen@[email protected] to comp.arch on Mon Sep 22 17:05:45 2025

From Newsgroup: comp.arch

Andy Valencia wrote:

BGB <[email protected]> writes:

Why on earth would you use floating point instructions
in an interrupt handler?

I didn't go and track down which code was using FPU instructions, but
seemingly something was,

I believe I noted once before that I had to guide a driver developer for
HP/UX away from using floating point for his driver's event counters.
His reason for their use? They can count to bigger numbers.

Did you ask him what happened after the counter passed 2^53 and he added
1.0 to the counter? With RNE that's a NOP...

Using 64-bit counters is far better, even on a 32-bit CPU, even if you
then need thread-safe/atomic code sequences.
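
The 2^53 point is easy to check in C. A minimal sketch, assuming IEEE binary64 doubles and the default round-to-nearest-even mode; the function names are made up for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Add 1.0 to an event counter held in a double. "volatile" forces the
   sum to be rounded to binary64 even on targets with wider FP registers. */
static double bump_double(double counter)
{
    assert(counter >= 0.0);            /* event counters never go negative */
    volatile double next = counter + 1.0;
    return next;
}

/* The integer alternative keeps counting past 2^53. */
static uint64_t bump_u64(uint64_t counter)
{
    return counter + 1u;
}
```

bump_double(9007199254740992.0) returns its argument unchanged: 2^53 + 1 lies exactly halfway between two representable values and the tie goes to the even one, which is 2^53 itself. bump_u64() on the same value gives 9007199254740993.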

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael S@[email protected] to comp.arch on Mon Sep 22 18:17:05 2025

From Newsgroup: comp.arch

On Mon, 22 Sep 2025 17:05:45 +0200
Terje Mathisen <[email protected]> wrote:

Andy Valencia wrote:

BGB <[email protected]> writes:

Why on earth would you use floating point instructions
in an interrupt handler?

I didn't go and track down which code was using FPU instructions,
but seemingly something was,

I believe I noted once before that I had to guide a driver
developer for HP/UX away from using floating point for his driver's
event counters. His reason for their use? They can count to
bigger numbers.

Did you ask him what happened after the counter passed 2^53 and he
added 1.0 to the counter? With RNE that's a NOP...

Using 64-bit counters is far better, even on a 32-bit CPU, even if
you then need thread-safe/atomic code sequences.

Terje

Since the counter is in memory, using FP does not help thread safety.
As to 2**53, that's enough for most things. At least for counting
events that do not happen more often than a few million times per
second.
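
As rough arithmetic for that claim, assuming a steady event rate and a 365.25-day year (the helper name is made up):

```c
#include <assert.h>

/* Years needed to count 2^53 events at a steady rate (events per second). */
static double years_to_2_53(double events_per_second)
{
    const double two53 = 9007199254740992.0;          /* 2^53 */
    const double seconds_per_year = 365.25 * 86400.0;
    assert(events_per_second > 0.0);
    return two53 / events_per_second / seconds_per_year;
}
```

At one million events per second this comes out at roughly 285 years, and at three million per second roughly 95 years.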

--- Synchronet 3.21a-Linux NewsLink 1.2
