• Saving and restoring FP state

    From Thomas Koenig@[email protected] to comp.arch on Sat Sep 13 14:55:58 2025
    From Newsgroup: comp.arch

    Fortran has an optional IEEE module. One of its important features
    is that flags (IEEE exceptions) are set to quiet on entry to a procedure;
    on return, a flag is restored to signalling if it was signalling on
    entry, and stays signalling if it was raised in the procedure. Similarly,
    rounding modes are saved and restored for procedures. This is done
    automatically if the right IEEE modules are used. A user can set rounding
    modes or set and clear exceptions using the procedures from those modules.

    Conceptually, this is the right thing to do. A library routine should
    not produce different results depending on what a user did for his
    own calculations.

    Computationally, this can be quite expensive: changing FP state changes
    the meaning of a calculation, so it should (I hope so, for
    correctness's sake) flush any calculations in flight under the wrong
    FP mode. Just calling a small library routine which does so could
    be enough to incur that cost.

    An example of the high cost is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121570 where, depending
    on the library implementation, gfortran may be over-cautious
    for the ieee_next_after function by saving and restoring fp state,
    but I'm not sure that the overhead is actually from the FP state or
    from calling extra functions.

    So... What is the best way to allow this to be more efficient?
    The CPU could speculate on the FP mode (not sure if that is actually
    done). Other suggestions? How do current CPUs handle this?
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.arch on Sat Sep 13 23:07:09 2025
    From Newsgroup: comp.arch

    On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Similarly, rounding modes are saved and restored for procedures.

    I am not sure that I understand.
    1. Is Fortran's equivalent of C's fesetround() considered a
    language primitive rather than a procedure?
    2. Does the above mean that the caller has no way of modifying the
    rounding mode used by the callee? If true, it defeats one of the original
    reasons for which Kahan invented rounding modes in the first place.



  • From MitchAlsup@[email protected] to comp.arch on Sat Sep 13 22:00:28 2025
    From Newsgroup: comp.arch


    Thomas Koenig <[email protected]> posted:

    Fortran has an optional IEEE module. One of its important features
    is that flags (IEEE exceptions) are set to quiet on entry of a procedure
    and restored to signalling if it was signalling on entry, or keep it
    signalling if it was raised on in the procedure. Similarly, rounding
    modes are saved and restored for procedures. This is automatically
    done if the right IEEE modules are used. A user can set rounding
    modes or set and clear exceptions using the right modes.

    How does a user set up his environment such that if TAN()* overflows
    to infinity it returns with the OVERFLOW flag set ?!?

    (*) EXP(), POW(,)

    Conversely, how does one write a TAN()* subroutine with the above property ?

    Conceptually, this is the right thing to do. A library routine should
    not produce different results depending on what a user did for his
    own calculations.

    I would think that a user setting RM=ToZero would WANT a different
    result from SIN() than the same call with RM=RNE ?!?

    Computationally, this can be quite expensive

    And contrary to how IEEE 754 has been used for 40 years.

    - having the meaning
    of a calculation changed by changing FP state should (I hope so,
    for correctness's sake) flush any calculations done with the wrong
    FP mode. Just calling a small library routine which does so could
    be enough.

    Oh, and BTW, how does a user CALL a subroutine to set his RM when
    the RETURN undoes the very nature of his request ?!?

    An example of the high cost is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121570 where, depending
    on the library implementation, gfortran may be over-cautious
    for the ieee_next_after function by saving and restoring fp state,
    but I'm not sure that the overhead is actually from the FP state or
    from calling extra functions.

    So... What is the best way to allow this to be more efficient?

    UNDO this change to IEEE 754.

    The CPU could speculate on the FP mode (not sure if that is actually
    done). Other suggestions? How do current CPUs do so?

    Multi-threaded cores already have to ship different RMs to FUs on
    each FP instruction. The HW is all there--it's the ISAs that are screwed
    up.
  • From Thomas Koenig@[email protected] to comp.arch on Sun Sep 14 07:18:38 2025
    From Newsgroup: comp.arch

    Michael S <[email protected]> schrieb:
    On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Similarly, rounding modes are saved and restored for procedures.

    I am not sure that I understand.
    1. Is Fortran's equivalent of C's fesetround() considered a
    language primitive rather than a procedure?

    Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
    procedures from intrinsic modules, like COMPILER_OPTIONS
    from ISO_FORTRAN_ENV, and user-defined procedures.
    IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
    (but an optional one). It need not be an external function;
    the compiler is free to do other things to implement it.

    (Hope this answers your question)

    2. Does above said mean that caller has no way of modifying rounding
    mode used by callee? If true, it defeats one of original reasons for
    which Kahan invented rounding modes in the first place.

    The caller cannot change the callee's rounding mode without the
    callee having been designed for this (by taking the rounding mode
    as an extra argument and using IEEE_SET_ROUNDING_MODE itself).

    I think that's a good idea. If a library routine is written
    and debugged for a particular rounding mode, results should
    not change because somebody up the call tree changed it.
  • From Michael S@[email protected] to comp.arch on Sun Sep 14 10:52:10 2025
    From Newsgroup: comp.arch

    On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Michael S <[email protected]> schrieb:
    On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Similarly, rounding modes are saved and restored for procedures.

    I am not sure that I understand.
    1. Is Fortran's equivalent of C's fesetround() considered a
    language primitive rather than a procedure?

    Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
    procedures from intrinsic modules, like COMPILER_OPTIONS
    from ISO_FORTRAN_ENV, and user-defined procedures.
    IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
    (but an optional one). It need not be an external function;
    the compiler is free to do other things to implement it.

    (Hope this answers your question)


    It will answer the question if you also say that the rule stated above
    (rounding mode saved on entry and restored on exit) does not apply
    to procedures from intrinsic modules. Or maybe it does apply to the
    rest of the procedures in intrinsic modules and IEEE_SET_ROUNDING_MODE is
    an exception?

    2. Does above said mean that caller has no way of modifying rounding
    mode used by callee? If true, it defeats one of original reasons for
    which Kahan invented rounding modes in the first place.

    The caller cannot change the callee's rounding mode without the
    callee having been designed for this (by taking the rounding mode
    as an extra argument and using IEEE_SET_ROUNDING_MODE itself).

    I think that's a good idea. If a library routine is written
    and debugged for a particular rounding mode, results should
    not change because somebody up the call tree changed it.

    Personally, I never found the whole rounding modes business useful in
    my practice. So I can not say with a straight face whether Fortran's take
    on it is a good idea or not. But I can say with a good level of certainty
    that William Kahan meant something else.


  • From Michael S@[email protected] to comp.arch on Sun Sep 14 12:07:23 2025
    From Newsgroup: comp.arch

    On Sat, 13 Sep 2025 22:00:28 GMT
    MitchAlsup <[email protected]d> wrote:

    Thomas Koenig <[email protected]> posted:

    Fortran has an optional IEEE module. One of its important features
    is that flags (IEEE exceptions) are set to quiet on entry of a
    procedure and restored to signalling if it was signalling on entry,
    or keep it signalling if it was raised on in the procedure.
    Similarly, rounding modes are saved and restored for procedures.
    This is automatically done if the right IEEE modules are used. A
    user can set rounding modes or set and clear exceptions using the
    right modes.

    How does a user set up his environment such that if TAN()* overflows
    to infinity it returns with the OVERFLOW flag set ?!?

    (*) EXP(), POW(,)

    Conversely, how does one write a TAN()* subroutine with the above
    property ?

    I think that in the above paragraph Thomas Koenig uses the word
    'signalling' in a sense that differs from its use in the IEEE-754 standard.
    He uses it to mean, in 754 language, "Immediate alternative exception
    handling block associated with a block". Ugh.

    In simpler terms, Thomas probably meant to say that Fortran behaves
    as if [under Windows] at the entry to the procedure we did
    unsigned int old = _controlfp(0, 0);  /* mask 0: just read the control word */
    _controlfp(_MCW_EM, _MCW_EM);         /* mask all FP exceptions */

    and at the exit from the procedure we did
    _controlfp(old, _MCW_EM);             /* restore the saved exception mask */

    I'd guess that there exists a POSIX equivalent for that, but I don't know
    what it is.

    Conceptually, this is the right thing to do. A library routine
    should not produce different results depending on what a user did
    for his own calculations.

    I would think that a user setting RM=ToZero would WANT a different
    result from SIN() than the same call with RM=RNE ?!?


    I am not sure at all about the specific case of sin() or of the other
    standard functions from <math.h>.
    But for the user's own or 3rd party procedures you are probably right.

    Computationally, this can be quite expensive

    And contrary to how IEEE 754 has been used for 40 years.


    I would not be so categorical.

    Contrary to intentions of IEEE-754 committee?
    Sure.

    Contrary to real-world use of IEEE-754 compliant hardware?
    That assumes that actual use (of non-default RMs) is somewhat
    wide-spread, which I do not believe.

    - having the meaning
    of a calculation changed by changing FP state should (I hope so,
    for correctness's sake) flush any calculations done with the wrong
    FP mode. Just calling a small library routine which does so could
    be enough.

    Oh, and BTW, how does a user CALL a subroutine to set his RM when
    the RETURN undoes the very nature of his request ?!?

    An example of the high cost is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121570 where, depending
    on the library implementation, gfortran may be over-cautious
    for the ieee_next_after function by saving and restoring fp state,
    but I'm not sure that the overhead is actually from the FP state or
    from calling extra functions.

    So... What is the best way to allow this to be more efficient?

    UNDO this change to IEEE 754.

    The CPU could speculate on the FP mode (not sure if that is actually
    done). Other suggestions? How do current CPUs do so?

    Multi-threaded cores already have to ship different RMs to FUs on
    each FP instruction. The HW is all there--it's the ISAs that are
    screwed up.


  • From Thomas Koenig@[email protected] to comp.arch on Sun Sep 14 10:09:49 2025
    From Newsgroup: comp.arch

    MitchAlsup <[email protected]d> schrieb:

    Thomas Koenig <[email protected]> posted:

    Fortran has an optional IEEE module. One of its important features
    is that flags (IEEE exceptions) are set to quiet on entry of a procedure
    and restored to signalling if it was signalling on entry, or keep it
    signalling if it was raised on in the procedure. Similarly, rounding
    modes are saved and restored for procedures. This is automatically
    done if the right IEEE modules are used. A user can set rounding
    modes or set and clear exceptions using the right modes.

    How does a user set up his environment such that if TAN()* overflows
    to infinity it returns with the OVERFLOW flag set ?!?

    (*) EXP(), POW(,)

    TAN and other numeric intrinsics are defined very loosely in the
    standard: "The result has a value equal to a processor-dependent
    approximation to tan(X)". Unfortunately, the language standard does
    not define this, but allows IEEE_OVERFLOW to signal in that case.
    In practice, all implementations I have tested do so.

    Conversely, how does one write a TAN()* subroutine with the above property ?

    IEEE_OVERFLOW on exit is (IEEE_OVERFLOW on entry) || (IEEE_OVERFLOW
    raised in the procedure).

    Conceptually, this is the right thing to do. A library routine should
    not produce different results depending on what a user did for his
    own calculations.

    I would think that a user setting RM=ToZero would WANT a different
    result from SIN() than the same call with RM=RNE ?!?

    You have to explicitly use the IEEE modules for this behavior,
    which the intrinsic procedures do not do.

    An example for dot product:

    module mymod
      implicit none
    contains
      subroutine my_dot(a, b, r_down, r_up, error)
        use, intrinsic :: iso_fortran_env, only : real64
        use, intrinsic :: ieee_arithmetic
        integer :: i
        real(real64), dimension(:), intent(in) :: a, b
        real(real64), intent(out) :: r_down, r_up
        logical, intent(out) :: error
        if (size(a) /= size(b)) then
          error = .true.
          return
        end if
        ! Lower bound: accumulate with rounding toward -infinity
        r_down = 0
        call ieee_set_rounding_mode(IEEE_DOWN)
        do i=1,size(a,1)
          r_down = ieee_fma(a(i),b(i),r_down)
        end do
        ! Upper bound: accumulate with rounding toward +infinity
        r_up = 0
        call ieee_set_rounding_mode(IEEE_UP)
        do i=1,size(a,1)
          r_up = ieee_fma(a(i),b(i),r_up)
        end do
        ! Report overflow to the caller, then clear the flag
        call ieee_get_flag(IEEE_OVERFLOW, error)
        call ieee_set_flag(IEEE_OVERFLOW,.false.)
      end subroutine my_dot
    end module mymod

    program main
      use, intrinsic :: iso_fortran_env, only : real64
      use mymod
      implicit none
      integer, parameter :: n = 1000
      real(real64), dimension(n) :: a, b
      real(real64) :: r_down, r_up, r_mid
      logical :: error
      call random_number(a)
      call random_number(b)
      a = a - 0.5
      b = b - 0.5
      call my_dot (a, b, r_down, r_up, error)
      if (error) stop "Oh no!"
      r_mid = dot_product(a,b)
      print '(1P,E22.15)',r_down,r_mid,r_up
    end program main

    [...]

    Oh, and BTW, how does a user CALL a subroutine to set his RM when
    the RETURN undoes the very nature of his request ?!?

    That's a no-op :-)

    But the way to do it is just to call the IEEE routines directly.


    An example of the high cost is
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121570 where, depending
    on the library implementation, gfortran may be over-cautious
    for the ieee_next_after function by saving and restoring fp state,
    but I'm not sure that the overhead is actually from the FP state or
    from calling extra functions.

    So... What is the best way to allow this to be more efficient?

    UNDO this change to IEEE 754.

    Does IEEE 754 concern itself with how it is implemented in
    programming languages? Terje?


    The CPU could speculate on the FP mode (not sure if that is actually
    done). Other suggestions? How do current CPUs do so?

    Multi-threaded cores already have to ship different RMs to FUs on
    each FP instruction. The HW is all there--it's the ISAs that are screwed
    up.

    And that would be an interesting part - how to specify this
    efficiently, and allow the microarchitecture not to flush when
    rounding modes are changed.
  • From Thomas Koenig@[email protected] to comp.arch on Sun Sep 14 10:19:08 2025
    From Newsgroup: comp.arch

    Michael S <[email protected]> schrieb:
    On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Michael S <[email protected]> schrieb:
    On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Similarly, rounding modes are saved and restored for procedures.

    I am not sure that I understand.
    1. Is Fortran's equivalent of C's fesetround() considered a
    language primitive rather than a procedure?

    Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
    procedures from intrinsic modules, like COMPILER_OPTIONS
    from ISO_FORTRAN_ENV, and user-defined procedures.
    IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
    (but an optional one). It need not be an external function;
    the compiler is free to do other things to implement it.

    (Hope this answers your question)



    OOOPS.

    Seems I misread the standard, missing out on clause 17.4 paragraph 5,
    which states

    In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS,
    the processor shall not change the rounding modes on entry, and
    on return shall ensure that the rounding modes are the same as
    they were on entry.

    So, that part of my premise was wrong. Sorry.
  • From Michael S@[email protected] to comp.arch on Sun Sep 14 17:06:56 2025
    From Newsgroup: comp.arch

    On Sun, 14 Sep 2025 10:19:08 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Michael S <[email protected]> schrieb:
    On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Michael S <[email protected]> schrieb:
    On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Similarly, rounding modes are saved and restored for
    procedures.

    I am not sure that I understand.
    1. Is Fortran's equivalent of C's fesetround() considered a
    language primitive rather than a procedure?

    Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
    procedures from intrinsic modules, like COMPILER_OPTIONS
    from ISO_FORTRAN_ENV, and user-defined procedures.
    IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
    (but an optional one). It need not be an external function;
    the compiler is free to do other things to implement it.

    (Hope this answers your question)



    OOOPS.

    Seems I misread the standard, missing out on clause 17.4 paragraph 5,
    which states

    In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS,
    the processor shall not change the rounding modes on entry, and
    on return shall ensure that the rounding modes are the same as
    they were on entry.

    So, that part of my premise was wrong. Sorry.

    Now that sounds like it matches Kahan's intentions.


  • From EricP@[email protected] to comp.arch on Sun Sep 14 10:52:02 2025
    From Newsgroup: comp.arch

    Thomas Koenig wrote:
    MitchAlsup <[email protected]d> schrieb:
    Thomas Koenig <[email protected]> posted:

    The CPU could speculate on the FP mode (not sure if that is actually
    done). Other suggestions? How do current CPUs do so?
    Multi-threaded cores already have to ship different RMs to FUs on
    each FP instruction. The HW is all there--its the ISAs that are screwed
    up.

    And that would be an interesting part - how to specify this
    efficiently, and allow the microarchiteture not to flush when
    rounding modes are changed.

    It has to sync-wait for all older FP instructions to finish executing.

    The x87 had separate control and status registers, but
    SSE merged them into a single control/status register, MXCSR.
    In order to read the current control bits one must also read the status
    bits, and to read the status bits the core must wait until all outstanding
    SSE FP instructions have executed, because the status register flags are
    defined as the OR of the flags of all older FP instructions. But the FP
    status bits are not usually updated together with the control bits, so
    reading them is unnecessary.

    One way is to have separate control registers for FP control
    (rounding mode RM, exception enables XE, etc.) and FP status,
    and separate instructions to read and write them.
    Additionally, one could have RM and other controls on each FP instruction,
    which would allow one to change the RM without having to save, set and
    restore the control register.

    Additionally, the x64 LDMXCSR and STMXCSR instructions load and store
    the current MXCSR value but *only with memory*, not with an integer register.
    Not only should these be separate CR and SR registers,
    they should allow the old CR to be saved/restored with an integer register.

    What it looks like most users want is a combined "copy current FP CR to
    integer register and set masked CR field to immediate or int register".
    That allows users to save & set just the RM without a sync-wait or
    touching memory with 1 instruction, and restore later.
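
    As a point of reference, the closest thing available in software today
    is a read-modify-write of the whole MXCSR. The sketch below assumes
    x86-64 with SSE and the <xmmintrin.h> intrinsics _mm_getcsr() and
    _mm_setcsr(); the helper name swap_mxcsr_rounding is invented here. By
    the reasoning above, the read still covers the status bits, which is
    exactly the cost the proposed instruction would avoid.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: replace only the rounding-control field of MXCSR and
   return the previous field so that the caller can restore it later. */
unsigned int swap_mxcsr_rounding(unsigned int new_rm)
{
    /* new_rm must be one of the _MM_ROUND_* values (bits 13-14 only). */
    assert((new_rm & ~(unsigned int)_MM_ROUND_MASK) == 0);

    unsigned int csr = _mm_getcsr();              /* reads control AND status */
    unsigned int old_rm = csr & _MM_ROUND_MASK;   /* previous rounding field */

    /* Write back with only the rounding field changed. */
    _mm_setcsr((csr & ~(unsigned int)_MM_ROUND_MASK) | new_rm);
    return old_rm;
}
```

    Typical use would be old = swap_mxcsr_rounding(_MM_ROUND_TOWARD_ZERO),
    followed by the work and then swap_mxcsr_rounding(old).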

    However it is not clear if this would be suitable for Fortran as you say:

    Thomas Koenig wrote:
    Fortran has an optional IEEE module. One of its important features
    is that flags (IEEE exceptions) are set to quiet on entry of a procedure
    and restored to signalling if it was signalling on entry, or keep it
    signalling if it was raised on in the procedure. Similarly, rounding
    modes are saved and restored for procedures. This is automatically
    done if the right IEEE modules are used. A user can set rounding
    modes or set and clear exceptions using the right modes.

    This sounds like Fortran's defined FP status algorithm requires it to
    read both the current control and status,
    then set the RM, mask exceptions and clear the status bits,
    and later restore the old control flags, and optionally OR with the old
    status. Unfortunately, reading the current status register forces the
    sync-wait.



  • From antispam@[email protected] (Waldek Hebisch) to comp.arch on Sun Sep 14 16:16:23 2025
    From Newsgroup: comp.arch

    Michael S <[email protected]> wrote:
    On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Michael S <[email protected]> schrieb:
    On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)

    2. Does above said mean that caller has no way of modifying rounding
    mode used by callee? If true, it defeats one of original reasons for
    which Kahan invented rounding modes in the first place.

    The caller cannot change the callee's rounding mode without the
    callee having been designed for this (by taking the rounding mode
    as an extra argument and using IEEE_SET_ROUNDING_MODE itself).

    I think that's a good idea. If a library routine is written
    and debugged for a particular rounding mode, results should
    not change because somebody up the call tree changed it.

    Personally, I never found the whole rounding modes business useful in
    my practice. So, can not say with straight face whether Fortran's take
    on it is good idea or not. But I can say with good level of certainty
    that William Kahan meant something else.

    Kahan wrote a paper claiming that setting the rounding mode to a different
    value is a great debugging help, especially when no source is available.
    So he probably really meant this. But the first thing that authors of
    numerical packages learn is to set the rounding mode to the desired value.
    There are some codes which should work reasonably for all rounding
    modes, but a lot of code critically depends on rounding and will
    not work with a different rounding mode. So, clearly this idea in the
    standard was wrong (as were several other things in the standard).
    --
    Waldek Hebisch
  • From antispam@[email protected] (Waldek Hebisch) to comp.arch on Sun Sep 14 16:22:33 2025
    From Newsgroup: comp.arch

    Thomas Koenig <[email protected]> wrote:
    Michael S <[email protected]> schrieb:
    On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Michael S <[email protected]> schrieb:
    On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Similarly, rounding modes are saved and restored for procedures.

    I am not sure that I understand.
    1. Is Fortran's equivalent of C's fesetround() considered a
    language primitive rather than a procedure?

    Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
    procedures from intrinsic modules, like COMPILER_OPTIONS
    from ISO_FORTRAN_ENV, and user-defined procedures.
    IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
    (but an optional one). It need not be an external function;
    the compiler is free to do other things to implement it.

    (Hope this answers your question)



    OOOPS.

    Seems I misread the standard, missing out on clause 17.4 paragraph 5,
    which states

    In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS,
    the processor shall not change the rounding modes on entry, and
    on return shall ensure that the rounding modes are the same as
    they were on entry.

    IIUC this means that a user can not define a routine like
    'MY_SET_ROUNDING_MODE' and get the desired effect (this can probably be
    worked around using the foreign function interface).
    --
    Waldek Hebisch
  • From Lawrence D’Oliveiro@[email protected] to comp.arch on Mon Sep 15 02:31:04 2025
    From Newsgroup: comp.arch

    On Sun, 14 Sep 2025 10:52:10 +0300, Michael S wrote:

    Personally, I never found the whole rounding modes business useful in my practice.

    I remember one of Kahan’s writings suggesting it is useful for testing
    numeric stability: if your code gives results that differ only slightly in
    the four different rounding modes, then your calculations are *probably*
    stable; if the results vary a lot, then your calculations are *probably*
    unstable.
  • From Stefan Monnier@[email protected] to comp.arch on Sun Sep 14 23:48:33 2025
    From Newsgroup: comp.arch

    Lawrence D’Oliveiro [2025-09-15 02:31:04] wrote:
    On Sun, 14 Sep 2025 10:52:10 +0300, Michael S wrote:
    Personally, I never found the whole rounding modes business useful in my
    practice.
    I remember one of Kahan’s writings suggesting it is useful for testing
    numeric stability: if your code gives results that differ only slightly in
    the four different rounding modes, then your calculations are *probably*
    stable; if the results vary a lot, then your calculations are *probably*
    unstable.

    IIUC you can get the same result by adding a bit of noise to your inputs
    and comparing the outputs. Maybe it's easier to change rounding modes than
    to add noise to your inputs?


    Stefan
  • From aph@[email protected] to comp.arch on Mon Sep 15 07:34:41 2025
    From Newsgroup: comp.arch

    Stefan Monnier <[email protected]> wrote:
    Lawrence D’Oliveiro [2025-09-15 02:31:04] wrote:
    On Sun, 14 Sep 2025 10:52:10 +0300, Michael S wrote:
    Personally, I never found the whole rounding modes business useful in my
    practice.
    I remember one of Kahan’s writings suggesting it is useful for testing
    numeric stability: if your code gives results that differ only slightly in
    the four different rounding modes, then your calculations are *probably*
    stable; if the results vary a lot, then your calculations are *probably*
    unstable.

    IIUC you can get the same result by adding a bit of noise to your inputs
    and comparing the outputs.

    No, because the result of changing rounding mode is highly correlated
    with the inputs, whereas noise is uncorrelated.

    Andrew.
  • From Michael S@[email protected] to comp.arch on Mon Sep 15 15:42:12 2025
    From Newsgroup: comp.arch

    On Sun, 14 Sep 2025 16:22:33 -0000 (UTC)
    [email protected] (Waldek Hebisch) wrote:

    Thomas Koenig <[email protected]> wrote:
    Michael S <[email protected]> schrieb:
    On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Michael S <[email protected]> schrieb:
    On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Similarly, rounding modes are saved and restored for
    procedures.

    I am not sure that I understand.
    1. Is Fortran's equivalent of C's fesetround() considered a
    language primitive rather than a procedure?

    Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
    procedures from intrinsic modules, like COMPILER_OPTIONS
    from ISO_FORTRAN_ENV, and user-defined procedures.
    IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
    (but an optional one). It need not be an external function;
    the compiler is free to do other things to implement it.

    (Hope this answers your question)



    OOOPS.

    Seems I misread the standard, missing out on clause 17.4 paragraph
    5, which states

    In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS,
    the processor shall not change the rounding modes on entry, and
    on return shall ensure that the rounding modes are the same as
    they were on entry.

    IIUC this means that user can not define routine like
    'MY_SET_ROUNDING_MODE' and get desired effect (this probably can be
    worked around using foreign function interface).


    I am not sure that I understood.
    Do you want to say that a user can not write routines that call
    IEEE_SET_ROUNDING_MODE? I don't see anything in the citation above that
    suggests that.
    Or do you mean something else?


  • From antispam@[email protected] (Waldek Hebisch) to comp.arch on Tue Sep 16 01:59:34 2025
    From Newsgroup: comp.arch

    Michael S <[email protected]> wrote:
    On Sun, 14 Sep 2025 16:22:33 -0000 (UTC)
    [email protected] (Waldek Hebisch) wrote:

    Thomas Koenig <[email protected]> wrote:
    Michael S <[email protected]> schrieb:
    On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Michael S <[email protected]> schrieb:
    On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Similarly, rounding modes are saved and restored for
    procedures.

    I am not sure that I understand.
    1. Is Fortran's equivalent of C's fesetround() considered a
    language primitive rather than a procedure?

    Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
    procedures from intrinsic modules, like COMPILER_OPTIONS
    from ISO_FORTRAN_ENV, and user-defined procedures.
    IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
    (but an optional one). It need not be an external function;
    the compiler is free to do other things to implement it.

    (Hope this answers your question)



    OOOPS.

    Seems I misread the standard, missing out on clause 17.4 paragraph
    5, which states

    In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS,
    the processor shall not change the rounding modes on entry, and
    on return shall ensure that the rounding modes are the same as
    they were on entry.

    IIUC this means that user can not define routine like
    'MY_SET_ROUNDING_MODE' and get desired effect (this probably can be
    worked around using foreign function interface).


    I am not sure that I understood.
    Do you mean that a user cannot write routines that call
    IEEE_SET_ROUNDING_MODE? I don't see anything in the citation above
    that suggests that.

    Of course they can.

    Or do you mean something else?

    The citation above clearly requires that all changes to the rounding
    mode made during execution of a user routine are undone at exit
    from that routine. Consequently a _user_ routine cannot change the
    rounding mode of the caller. In other words, a user routine cannot
    offer an alternative implementation of IEEE_SET_ROUNDING_MODE.
    --
    Waldek Hebisch
  • From BGB@[email protected] to comp.arch on Tue Sep 16 12:36:30 2025
    From Newsgroup: comp.arch

    On 9/14/2025 9:06 AM, Michael S wrote:
    On Sun, 14 Sep 2025 10:19:08 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Michael S <[email protected]> schrieb:
    On Sun, 14 Sep 2025 07:18:38 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Michael S <[email protected]> schrieb:
    On Sat, 13 Sep 2025 14:55:58 -0000 (UTC)
    Thomas Koenig <[email protected]> wrote:

    Similarly, rounding modes are saved and restored for
    procedures.

    I am not sure that I understand.
    1. Is Fortran's equivalent of C's fesetround() considered a
    language primitive rather than a procedure?

    Fortran has intrinsic procedures (like SIN, MATMUL or CPU_TIME),
    procedures from intrinsic modules, like COMPILER_OPTIONS
    from ISO_FORTRAN_ENV, and user-defined procedures.
    IEEE_SET_ROUNDING_MODE is a procedure from an intrinsic module
    (but an optional one). It need not be an external function;
    the compiler is free to do other things to implement it.

    (Hope this answers your question)



    OOOPS.

    Seems I misread the standard, missing out on clause 17.4 paragraph 5,
    which states

    In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS,
    the processor shall not change the rounding modes on entry, and
    on return shall ensure that the rounding modes are the same as
    they were on entry.

    So, that part of my premise was wrong. Sorry.

    Now it sounds like matching Kahan's intentions.



    So, this implies the rounding mode and similar are expected to follow
    dynamic scoping rather than global scoping?...

    May need to work on this some more if so, but it appears that a lot
    of existing C compilers treat it as global?...


    I guess things also come up with ISA choices.


    Can note, for example:
      RISC-V always has a 3-bit rounding mode in the instruction;
        GCC defaults to 7 (dynamic);
        BGBCC uses 0 (RNE) or 7 (DYN), depending on FENV_ACCESS.
      XG1/2/3 have several versions of some of the instructions:
        FADD/FSUB/FMUL   : Hard-wired RNE (no status flag updates)
        FADDG/FSUBG/FMULG: Dynamic Rounding Mode (updates flags)
        FADDA/FSUBA/FMULA: Reduced Precision
          Binary64 format, but mimics Binary32 precision.
          No flag updates.
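
    For reference, the RISC-V rm field sits in bits 14:12 of the
    instruction word. A minimal C sketch of reading and patching it
    (the helper names are illustrative, not taken from GCC or BGBCC):

```c
#include <stdint.h>

/* RISC-V rm encodings for the F/D extensions (5 and 6 are reserved). */
enum rv_rm { RV_RNE = 0, RV_RTZ = 1, RV_RDN = 2, RV_RUP = 3, RV_RMM = 4, RV_DYN = 7 };

/* Extract the 3-bit rm field, bits 14:12 of a 32-bit FP instruction word. */
static inline unsigned rv_rm_field(uint32_t insn)
{
    return (insn >> 12) & 0x7u;
}

/* Hypothetical helper: return insn with its rm field replaced. */
static inline uint32_t rv_set_rm(uint32_t insn, unsigned rm)
{
    return (insn & ~(UINT32_C(7) << 12)) | ((uint32_t)(rm & 7u) << 12);
}
```

    For example, FADD.S f0,f0,f0 with rm=DYN encodes as 0x00007053,
    and the same instruction with rm=RNE as 0x00000053.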

    It is possible to give a full static rounding mode (like in RISC-V), but
    this requires a 64-bit encoding. Static rounding other than RNE is very
    rare though.


    But, yeah, I have now noticed that my attempt to add an IEEE exact
    mode was, due to pipeline timing issues in my Verilog code,
    generating spurious Emulation-Request traps, which in some cases was
    causing problems. Occasionally, stale data left in the pipeline by
    whatever had run before managed to cause the FPU to raise an FPU
    exception when an FPU instruction was used, regardless of the state
    of the IEEE-754 emulation flag.

    Had to fix this and then re-upload the Verilog core as I noted that this
    bug leaked into the version I had posted online.

    Also there was another related bug where FPU instructions in
    interrupt handlers could affect the FPU flags visible in userland. I
    ended up adding a mechanism to partly disable both the flags behavior
    and traps when inside an interrupt handler (so, interrupt handlers
    will always get DAZ/FTZ and similar).

    But, yeah...



    Otherwise:

    Some people in RISC-V land now jumped onboard with adding Load/Store
    Indexed and similar (and auto-increment, blarg). Their encodings clashed
    with my Load/Store Pair, but ended up resolving the issue partly by
    relocating my definition of LDP/SDP entirely over to the FLQ/FSQ
    instructions (my core not implementing the Q extension, so FLQ/FSQ could
    also be used for LDP/SDP).

    In the newest variant, even numbers will encode FPU pairs, as before,
    whereas odd numbers encode GPR pairs. This is kinda backwards, but
    minimizes code breakage (and allows a grace period for the proposed
    encodings to "not break stuff"); though does still leave some
    "breaking changes".


    Have noted that currently code compiled with RV-C enabled in BGBCC is
    crash prone with my VL core; whereas code compiled with GCC seems to
    work. May require more investigation (did recently start trying to make
    use of more of the RV-C encodings in BGBCC, so maybe stumbled on
    something here).


    Had also been working on getting some previously unimplemented parts
    of XG3 working (such as support for predication). One part that was
    causing problems was that BGBCC was using some stale encodings (from
    earlier in the design of the encoding scheme). Seems also there was a
    bug where it was trying to use the predication encoding rules on
    jumbo-prefixes.

    Still not fully working yet though.

    ...

  • From scott@[email protected] (Scott Lurndal) to comp.arch on Tue Sep 16 17:50:09 2025
    From Newsgroup: comp.arch

    BGB <[email protected]> writes:
    On 9/14/2025 9:06 AM, Michael S wrote:


    Also there was another related bug where FPU instructions in interrupt
    handlers could affect the FPU flags visible in userland.

    Why on earth would you use floating point instructions
    in an interrupt handler?

  • From BGB@[email protected] to comp.arch on Tue Sep 16 20:37:06 2025
    From Newsgroup: comp.arch

    On 9/16/2025 12:50 PM, Scott Lurndal wrote:
    BGB <[email protected]> writes:
    On 9/14/2025 9:06 AM, Michael S wrote:


    Also there was another related bug where FPU instructions in interrupt
    handlers could affect the FPU flags visible in userland.

    Why on earth would you use floating point instructions
    in an interrupt handler?



    I didn't go and track down which code was using FPU instructions, but
    seemingly something was, in any case. I didn't see any particular
    reason to forbid using the FPU inside of interrupt handlers (they are
    mostly still plain C, differing mostly in that they are limited to
    operating in terms of the physical memory map).

    But, yeah, in any case, the partial workaround was that interrupt
    handlers don't update FPU flags. It was either this or give interrupt
    handlers their own version of FPSR, but cheaper/easier to not bother and
    have interrupt handlers behave as if FPSR were hard wired to all 0's.


    Most likely possibilities:
      MIDI / FM update ticks:
        a tick causes FM instruments to update and similar;
      PCM / WAVE update ticks:
        ticks for transferring audio from software to hardware loop
        buffers.

    Both use some amount of floating point math internally.

    Where, say:
    Software loop buffer is mostly Binary16;
    Audio hardware uses a small A-Law loop buffer;
    Programs mostly submit audio as 16-bit PCM or similar, rate not
    necessarily tied to hardware sample rate.

    Didn't seem too unreasonable.

    But, in any case, using FP instructions in an interrupt handler
    shouldn't leave state changes that are visible in userland.

    ...


  • From Robert Finch@[email protected] to comp.arch on Tue Sep 16 21:49:13 2025
    From Newsgroup: comp.arch

    On 2025-09-16 1:50 p.m., Scott Lurndal wrote:
    BGB <[email protected]> writes:
    On 9/14/2025 9:06 AM, Michael S wrote:


    Also there was another related bug where FPU instructions in interrupt
    handlers could affect the FPU flags visible in userland.

    Why on earth would you use floating point instructions
    in an interrupt handler?

    IIRC I was using FP in an interrupt handler at one point for video
    processing where the co-ordinates were already floats. I believe this
    has since been changed to fixed point arithmetic.

  • From scott@[email protected] (Scott Lurndal) to comp.arch on Wed Sep 17 13:57:06 2025
    From Newsgroup: comp.arch

    BGB <[email protected]> writes:
    On 9/16/2025 12:50 PM, Scott Lurndal wrote:
    BGB <[email protected]> writes:
    On 9/14/2025 9:06 AM, Michael S wrote:


    Also there was another related bug where FPU instructions in interrupt
    handlers could affect the FPU flags visible in userland.

    Why on earth would you use floating point instructions
    in an interrupt handler?



    I didn't go and track down which code was using FPU instructions, but
    seemingly something was, in any case. I didn't see any particular
    reason to forbid using the FPU inside of interrupt handlers (they are
    mostly still plain C, differing mostly in that they are limited to
    operating in terms of the physical memory map).

    The standard reasoning for prohibiting floating point in the
    kernel is to improve system call overhead by not saving floating
    point registers until and unless there is a context switch (and
    even then, x86 has features that allow the OS to forgo saving
    the floating point registers if they weren't used in the last
    scheduling quantum).


    But, in any case, using FP instructions in an interrupt handler
    shouldn't leave state changes that are visible in userland.

    A well understood problem handled by all off the shelf operating
    systems.
  • From BGB@[email protected] to comp.arch on Wed Sep 17 13:05:58 2025
    From Newsgroup: comp.arch

    On 9/17/2025 8:57 AM, Scott Lurndal wrote:
    BGB <[email protected]> writes:
    On 9/16/2025 12:50 PM, Scott Lurndal wrote:
    BGB <[email protected]> writes:
    On 9/14/2025 9:06 AM, Michael S wrote:


    Also there was another related bug where FPU instructions in interrupt
    handlers could affect the FPU flags visible in userland.

    Why on earth would you use floating point instructions
    in an interrupt handler?



    I didn't go and track down which code was using FPU instructions, but
    seemingly something was, in any case. I didn't see any particular
    reason to forbid using the FPU inside of interrupt handlers (they are
    mostly still plain C, differing mostly in that they are limited to
    operating in terms of the physical memory map).

    The standard reasoning for prohibiting floating point in the
    kernel is to improve system call overhead by not saving floating
    point registers until and unless there is a context switch (and
    even then, x86 has features that allow the OS to forgo saving
    the floating point registers if they weren't used in the last
    scheduling quantum).


    In my case, this wasn't x86, and on my ISA the FPU stuff is done in
    GPRs, which typically need to be saved/restored either way. Well, except
    when running RISC-V code, which effectively splits the register space in
    half (32+32 rather than 64).


    The issue was that the FPSR is (now) aliased to SP(63:48), but there was
    only a single SP; and the CPU core handles interrupts by causing SP and
    SSP to switch places in decode.

    The likely more proper solution would have been to have another FPSR
    aliased to SSP(63:48) which also re-routes; where as-is SSP is currently
    only a 48 bit register internally.

    But, for now, easier was to disable the updates if inside an ISR.

    This issue wouldn't have existed if still using GBR/GP for this, but GBR
    has the disadvantage that it gets stomped whenever a reload occurs; so
    it was either tweak the GBR reload mechanism to not stomp FPSR, or move
    FPSR somewhere where it doesn't get stomped (the high bits of SP being
    the most obvious choice).

    Ran into a problem as some of my interrupt handling code does a sanity
    check to verify that SP was intact between interrupt entry and return,
    and some of this handling saw that SP changed unexpectedly and triggered
    a break-point (otherwise, it might have gone unnoticed).

    Though, this may be a payoff from being "needlessly pedantic" in this
    case. There was a previous check (now disabled) that would also have
    XOR'ed all the callee-save registers together and then checked the
    known state against the XOR (if a register had changed, it would
    likely have changed the XOR). It was disabled because XOR'ing them
    all together has a high overhead.
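
    A minimal C sketch of that kind of XOR check, assuming the saved
    registers are available as an array of 64-bit words (the function
    name and layout are illustrative, not the actual handler code):

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a saved register block into one word. A change to any single
   register always changes the result; changes to several registers
   can cancel each other out. */
uint64_t xor_fingerprint(const uint64_t *regs, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc ^= regs[i];
    return acc;
}
```

    Taking the fingerprint at interrupt entry and comparing it at
    return catches most accidental clobbers, at the cost of touching
    every register in the block on each interrupt.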


    Note that the logic does account for things like context switches, and
    the SYSCALL interrupt is using a different prolog/epilog sequence which
    is more optimized for context switching (but is only valid once a task
    state is configured).



    But, in any case, using FP instructions in an interrupt handler
    shouldn't leave state changes that are visible in userland.

    A well understood problem handled by all off the shelf operating
    systems.

    Possible.

    In this case, the leaked state was caught by some code being pedantic
    and noticing the HOB's of SP changing unexpectedly.



    Otherwise, instruction predication in XG3 now seems to mostly work (was
    mostly issues in BGBCC). Last issue was some paths where it was trying
    to use RISC-V encodings in cases where predication was being used, and
    the RISC-V ops not supporting predication.

    It wouldn't have been as simple as simply converting the RISC-V ops to
    XG3, as things are not always 1:1. In one of the places it came up, it
    is a tangled mess, as both RISC-V and XG3 instruction-generation are
    sorta tangled together.

    Had to detect predication was being used (for the current instructions),
    and effectively disable the use of RISC-V encodings in this case.

    Though something still isn't perfect, as the Doom demos desync in a
    different way, where a change in demo desync is usually evidence of a
    difference in program behavior (though can also be caused by memory
    corruption, etc, *).

    ...


    *: Though, in some cases, it is sensitive to memory contents for
    out-of-bounds memory accesses, which tended to differ between Doom
    versions. Some of the Doom source ports try to deal with this by
    fingerprinting the IWAD and then simulating the contents of the
    out-of-bounds memory and similar (along with various changes in game
    behavior) for each engine version. My port doesn't really bother (so, I
    live with the demo desync, but can still notice changes in demo desync).


    Though, there is some possible debate as to whether to bring some of
    the 2RI ops back into XG3 (the 2RI-Imm10 space had been effectively
    dropped from XG3, or rather not carried over from XG1/XG2). When using
    predication, a few of these become relevant again. In this mode, had
    been using the option of simply using some of the 3R ops but directing
    the output to R0/ZR as a special case to encode an intention to update
    the T bit, but this has some drawbacks (such as the lack of a decent
    sized immediate field).

    Though, in this case, the use of predication (and thus conditional
    compare) is low enough to leave it as debatable as to whether or not it
    would be a good idea to bring back these encodings (or just continue to
    live with some more limited Imm6s encodings, and the occasional use of
    jumbo-prefixes when the Imm6 fails).

    Actually, predication is itself debatable as it does effectively use
    half the encoding space, and is technically used by a minority of the
    instructions. But, it does help with performance in some cases
    (namely, a lot of the cases that XG3 was meant to address).


    Could almost reuse the encoding space for a different set of 16-bit ops,
    but don't necessarily want yet another 16-bit decoder (and, one can
    argue, if code density matters enough to want to use 16-bit ops, jumping
    over to RV64GC mode and using RV-C ops may make more sense).

    Though, presently BGBCC doesn't support mixed RV-C and XG3 binaries, and
    this would kinda be a mess (though, the original ARM+Thumb scheme exists
    as an example of the basic idea here).

    So, for now, they will likely remain as predicated ops and similar.

    ...

  • From MitchAlsup@[email protected] to comp.arch on Wed Sep 17 18:53:09 2025
    From Newsgroup: comp.arch


    BGB <[email protected]> posted:

    On 9/17/2025 8:57 AM, Scott Lurndal wrote:
    BGB <[email protected]> writes:
    On 9/16/2025 12:50 PM, Scott Lurndal wrote:
    BGB <[email protected]> writes:
    On 9/14/2025 9:06 AM, Michael S wrote:


    Also there was another related bug where FPU instructions in interrupt
    handlers could affect the FPU flags visible in userland.

    Why on earth would you use floating point instructions
    in an interrupt handler?



    I didn't go and track down which code was using FPU instructions, but
    seemingly something was, in any case. I didn't see any particular
    reason to forbid using the FPU inside of interrupt handlers (they are
    mostly still plain C, differing mostly in that they are limited to
    operating in terms of the physical memory map).

    The standard reasoning for prohibiting floating point in the
    kernel is to improve system call overhead by not saving floating
    point registers until and unless there is a context switch (and
    even then, x86 has features that allow the OS to forgo saving
    the floating point registers if they weren't used in the last
    scheduling quantum).


    In my case, this wasn't x86, and on my ISA the FPU stuff is done in
    GPRs, which typically need to be saved/restored either way. Well, except
    when running RISC-V code, which effectively splits the register space in
    half (32+32 rather than 64).


    The issue was that the FPSR is (now) aliased to SP(63:48), but there was
    only a single SP; and the CPU core handles interrupts by causing SP and
    SSP to switch places in decode.

    Switching SP with SSP does not work (fundamentally) when there are
    4 privilege levels--as each privilege level needs its own unique SP.

    The likely more proper solution would have been to have another FPSR
    aliased to SSP(63:48) which also re-routes; where as-is SSP is currently
    only a 48 bit register internally.

    So, when one does a::

    ADD SP,SP,#big-number

    do the HoBs of SP get changed (like any other GPR) or do you special
    case SP ?!?

    But, for now, easier was to disable the updates if inside an ISR.

    My 66000 code does not even know it is in an ISR--as ISRs do not
    disable interrupts, disable exceptions, and are re-entrant from
    the very first ISR instruction.

    Thus, ISRs can do FP if they desire, Vectorize loops as needed,
    and are not limited in their use of ISA.

    This issue wouldn't have existed if still using GBR/GP for this, but GBR
    has the disadvantage that it gets stomped whenever a reload occurs; so
    it was either tweak the GBR reload mechanism to not stomp FPSR, or move
    FPSR somewhere where it doesn't get stomped (the high bits of SP being
    the most obvious choice).

    My ISA does not have this problem as it does not need a GBR; universal
    constants (this time as a displacement) eliminated the need.
  • From BGB@[email protected] to comp.arch on Thu Sep 18 12:41:48 2025
    From Newsgroup: comp.arch

    On 9/17/2025 1:53 PM, MitchAlsup wrote:

    BGB <[email protected]> posted:

    On 9/17/2025 8:57 AM, Scott Lurndal wrote:
    BGB <[email protected]> writes:
    On 9/16/2025 12:50 PM, Scott Lurndal wrote:
    BGB <[email protected]> writes:
    On 9/14/2025 9:06 AM, Michael S wrote:


    Also there was another related bug where FPU instructions in interrupt
    handlers could affect the FPU flags visible in userland.

    Why on earth would you use floating point instructions
    in an interrupt handler?



    I didn't go and track down which code was using FPU instructions, but
    seemingly something was, in any case. I didn't see any particular
    reason to forbid using the FPU inside of interrupt handlers (they are
    mostly still plain C, differing mostly in that they are limited to
    operating in terms of the physical memory map).

    The standard reasoning for prohibiting floating point in the
    kernel is to improve system call overhead by not saving floating
    point registers until and unless there is a context switch (and
    even then, x86 has features that allow the OS to forgo saving
    the floating point registers if they weren't used in the last
    scheduling quantum).


    In my case, this wasn't x86, and on my ISA the FPU stuff is done in
    GPRs, which typically need to be saved/restored either way. Well, except
    when running RISC-V code, which effectively splits the register space in
    half (32+32 rather than 64).


    The issue was that the FPSR is (now) aliased to SP(63:48), but there was
    only a single SP; and the CPU core handles interrupts by causing SP and
    SSP to switch places in decode.

    Switching SP with SSP does not work (fundamentally) when there are
    4 privilege levels--as each privilege level needs its own unique SP.


    In this case, there are 3 major modes:
      User, Supervisor, and ISR

    But only 2 levels:
      User|Supervisor, ISR

    So User and Supervisor get their own SP, but in this case it is handled
    by using a context switch.


    The likely more proper solution would have been to have another FPSR
    aliased to SSP(63:48) which also re-routes; where as-is SSP is currently
    only a 48 bit register internally.

    So, when one does a::

    ADD SP,SP,#big-number

    do the HoBs of SP get changed (like any other GPR) or do you special
    case SP ?!?


    In principle, if you add a big enough value to SP, then it will stomp
    the FPU state (at least as far as this sort of thing goes, it behaves
    like a GPR).

    In normal use, it won't matter, since SP only ever covers a small range
    of values.
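
    To make the aliasing concrete, here is a small C model of a 64-bit
    SP whose bits 63:48 double as FPSR. It is only a sketch of the
    behavior described above; the names and the exact field split are
    assumptions for illustration, not the actual Verilog:

```c
#include <stdint.h>

#define SP_ADDR_BITS 48
#define SP_ADDR_MASK ((UINT64_C(1) << SP_ADDR_BITS) - 1)

/* FPSR view: bits 63:48 of the register. */
static inline uint16_t sp_fpsr(uint64_t sp) { return (uint16_t)(sp >> SP_ADDR_BITS); }

/* Address view: bits 47:0. */
static inline uint64_t sp_addr(uint64_t sp) { return sp & SP_ADDR_MASK; }

/* Plain 64-bit add, as for any other GPR: large immediates (or a
   carry out of bit 47) reach the FPSR bits. */
static inline uint64_t sp_add_plain(uint64_t sp, uint64_t imm) { return sp + imm; }
```

    Ordinary stack adjustments stay within the low 48 bits and leave
    the FPSR field alone; adding 1<<48 or more changes it.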

    Granted, maybe this is a stupid way to do it...


    Similarly, modifying the HOBs of SP can be used to modify the FPU state.

    This does still pose a non-zero risk of breaking stuff in RISC-V mode,
    where GCC is likely to be unaware of the wonk in the high order bits
    of SP.

    Most likely case is special-casing the ADD and ADDI instructions in the
    RISC-V decoder when working with SP (the high bits always being read as
    0 and ignored on write).
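
    One way such a special case could behave, modelled in C. This is a
    guess at the semantics described above (high bits read as 0 and
    left untouched on write), with made-up names, not the decoder logic
    itself:

```c
#include <stdint.h>

#define SP_ADDR_BITS 48
#define SP_ADDR_MASK ((UINT64_C(1) << SP_ADDR_BITS) - 1)

/* ADD/ADDI targeting SP: the source's high bits read as 0 and the
   result's high bits are ignored on write, so the FPSR field in
   bits 63:48 is carried through unchanged. */
static inline uint64_t sp_add_masked(uint64_t sp, int64_t imm)
{
    uint64_t fpsr_bits = sp & ~SP_ADDR_MASK;
    uint64_t addr = ((sp & SP_ADDR_MASK) + (uint64_t)imm) & SP_ADDR_MASK;
    return fpsr_bits | addr;
}
```

    Under this model even an address wrap-around stays inside the low
    48 bits, so compiler-generated stack arithmetic cannot disturb the
    FPU state.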

    In this case though, "ORI" or similar could be used to access the full
    register.



    But, for now, easier was to disable the updates if inside an ISR.

    My 66000 code does not even know it is in an ISR--as ISRs do not
    disable interrupts, disable exceptions, and are re-entrant from
    the very first ISR instruction.

    Thus, ISRs can do FP if they desire, Vectorize loops as needed,
    and are not limited in their use of ISA.


    In my case, as noted, they can still use the FPU, but it is now
    hard-wired to DAZ/FTZ mode. They also operate in physical memory
    addressing.

    Likewise, no support for re-entrant interrupt handling.


    For now, the main way to do more than this is to perform a
    context-switch to a supervisor mode task.


    This issue wouldn't have existed if still using GBR/GP for this, but GBR
    has the disadvantage that it gets stomped whenever a reload occurs; so
    it was either tweak the GBR reload mechanism to not stomp FPSR, or move
    FPSR somewhere where it doesn't get stomped (the high bits of SP being
    the most obvious choice).

    My ISA does not have this problem as it does not need a GBR; universal
    constants (this time as a displacement) eliminated the need.


    FWIW: I could have used PC-rel, but this doesn't work well with NoMMU.
    And, a shared address space following NoMMU rules has less overhead than
    multi-mapping pages or CoW.

    Likewise, absolute addressing doesn't appeal to me (if anything, it is
    worse than PC-rel).


    The main other option (besides putting it in the HOB's of SP or similar)
    would have been to create a new CR for it, but then would need to
    save/restore this CR on context switch and similar (and adding a new CR
    would have been more logic than shoving it into the HOB's of SP or
    similar).

    Another option would have been the HOBs of SR, but as-is SR(63:32) is
    currently assumed to hold global system-level state; whereas FPSR is
    clearly task-local.


  • From Andy Valencia@[email protected] to comp.arch on Mon Sep 22 07:33:53 2025
    From Newsgroup: comp.arch

    BGB <[email protected]> writes:
    Why on earth would you use floating point instructions
    in an interrupt handler?
    I didn't go and track down which code was using FPU instructions, but
    seemingly something was,

    I believe I noted once before that I had to guide a driver developer for
    HP/UX away from using floating point for his driver's event counters.
    His reason for their use? They can count to bigger numbers.

    Andy Valencia
    Home page: https://www.vsta.org/andy/
    To contact me: https://www.vsta.org/contact/andy.html
  • From Terje Mathisen@[email protected] to comp.arch on Mon Sep 22 17:05:45 2025
    From Newsgroup: comp.arch

    Andy Valencia wrote:
    BGB <[email protected]> writes:
    Why on earth would you use floating point instructions
    in an interrupt handler?
    I didn't go and track down which code was using FPU instructions, but
    seemingly something was,

    I believe I noted once before that I had to guide a driver developer for
    HP/UX away from using floating point for his driver's event counters.
    His reason for their use? They can count to bigger numbers.

    Did you ask him what happened after the counter passed 2^53 and he added
    1.0 to the counter? With RNE that's a NOP...

    Using 64-bit counters is far better, even on a 32-bit CPU, even if you
    then need thread-safe/atomic code sequences.
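
    The 2^53 point is easy to check in C. A minimal sketch, assuming
    IEEE binary64 doubles and the default round-to-nearest-even mode
    (the function names are just for illustration):

```c
#include <stdint.h>

/* Floating-point event counter, as in the driver anecdote. */
double fp_count(double c)
{
    volatile double next = c + 1.0;  /* volatile: force rounding to binary64 */
    return next;
}

/* Integer counter for comparison. */
uint64_t int_count(uint64_t c)
{
    return c + 1;
}
```

    fp_count(9007199254740992.0), i.e. 2^53 + 1.0, returns 2^53 again
    because the exact sum is a tie that rounds to the even neighbor,
    whereas the integer counter keeps advancing.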

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
  • From Michael S@[email protected] to comp.arch on Mon Sep 22 18:17:05 2025
    From Newsgroup: comp.arch

    On Mon, 22 Sep 2025 17:05:45 +0200
    Terje Mathisen <[email protected]> wrote:

    Andy Valencia wrote:
    BGB <[email protected]> writes:
    Why on earth would you use floating point instructions
    in an interrupt handler?
    I didn't go and track down which code was using FPU instructions,
    but seemingly something was,

    I believe I noted once before that I had to guide a driver
    developer for HP/UX away from using floating point for his driver's
    event counters. His reason for their use? They can count to
    bigger numbers.

    Did you ask him what happened after the counter passed 2^53 and he
    added 1.0 to the counter? With RNE that's a NOP...

    Using 64-bit counters is far better, even on a 32-bit CPU, even if
    you then need thread-safe/atomic code sequences.

    Terje



    Since the counter is in memory, using FP does not help thread safety.
    As to 2**53, that's enough for most things. At least for counting
    events that do not happen more often than a few million times per
    second.
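
    As a back-of-the-envelope check of that estimate (a sketch; the
    function name is made up and a 365.25-day year is assumed):

```c
/* Years until a counter incremented rate_hz times per second reaches 2^53. */
double years_to_2_53(double rate_hz)
{
    const double limit = 9007199254740992.0;           /* 2^53 */
    const double seconds_per_year = 365.25 * 86400.0;  /* Julian year */
    return limit / rate_hz / seconds_per_year;
}
```

    At one million events per second this comes to roughly 285 years,
    and at five million per second to roughly 57 years.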

