• Re: branch splitting

    From Thomas Koenig@[email protected] to comp.arch on Sun Apr 5 06:49:00 2026
    From Newsgroup: comp.arch

    MitchAlsup <[email protected]d> wrote:

    Paul Clayton <[email protected]> posted:

    On 11/5/25 3:43 PM, MitchAlsup wrote:
    [snip]
    I am now working on predictors for a 6-wide My 66000 machine--which is a bit
    different.
    a) VEC-LOOP loops do not alter the branch prediction tables.
    b) Predication clauses do not alter the BPTs.

    Not recording the history of predicates may have a negative
    effect on global history predictors. (I do not know if anyone
    has studied this, but it has been mentioned — e.g.,
    "[predication] has a negative side-effect because the removal
    of branches eliminates useful correlation information
    necessary for conventional branch predictors" from "Improving
    Branch Prediction and Predicated Execution in Out-of-Order
    Processors", Eduardo Quiñones et al., 2007.)
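The correlation loss described in that quote can be illustrated with a toy model. This is only a sketch under stated assumptions, not the experiment from Quiñones et al. and not anything My 66000 does: branch B is assumed to be perfectly correlated with an earlier condition A that has been if-converted into a predicate, and the predictor is a hypothetical one-bit-history table of 2-bit counters. The name `branch_accuracy` and all parameters are made up for illustration.

```python
import random


def branch_accuracy(record_predicate, trials=10000, seed=1):
    """Accuracy of a 1-bit-history predictor on branch B, where B == A."""
    rng = random.Random(seed)
    history = 0          # one bit of global history
    counters = [1, 1]    # 2-bit saturating counters, indexed by the history bit
    correct = 0
    for _ in range(trials):
        a = rng.getrandbits(1)   # condition A; if-converted, so no branch is seen
        if record_predicate:
            history = a          # assumption: predicate outcome enters the history
        b = a                    # branch B is perfectly correlated with A
        predicted = 1 if counters[history] >= 2 else 0
        correct += int(predicted == b)
        # Train the counter that made the prediction.
        if b:
            counters[history] = min(3, counters[history] + 1)
        else:
            counters[history] = max(0, counters[history] - 1)
        history = b              # the real branch outcome enters the history
    return correct / trials


if __name__ == "__main__":
    print("predicate recorded:    ", branch_accuracy(True))
    print("predicate not recorded:", branch_accuracy(False))
```

With the predicate outcome in the history, B becomes almost perfectly predictable in this model; without it, the predictor only sees B's own previous outcome, which carries no information about a random A.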

    It depends on where you are looking! If you think branch prediction
    alters where FETCH is Fetching, then MY 66000 predication does not
    do predication prediction--predication is used when the join point
    will have already been fetched by the time the condition is known.
    Then, either the then clause or the else clause will be nullified
    without backup (i.e., branch prediction repair).

    DECODE is still able to predict then-clause versus else-clause
    and maintain the no-backup property, as long as both sides are
    issued into the execution window.

    Predicate prediction can also be useful when the availability
    of the predicate is delayed. Similarly, selective eager
    execution might be worthwhile when the predicate is delayed;
    the selection is likely to be predictive (resource use might
    be a basis for selection but even estimating that might be
    predictive).

    The difference is that predication prediction never needs branch
    prediction repair.

    What happens to the instructions after the predicate?

    Let's say we have

    [...]
    peq0 r1,tf
    mov r2,#24
    mov r2,#48
    ldd r3,[r4,r2,0]
    [...]

    Can the ldd be speculatively executed or not? And what happens if the prediction was wrong?
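For reference, the architectural (non-speculative) outcome of that sequence can be written down as a small model. This is a sketch that assumes the shadow mask "tf" means "the first following instruction executes when r1 == 0, the second when r1 != 0", and that ldd r3,[r4,r2,0] loads from r4 + r2. The function name and the dictionary used as memory are hypothetical.

```python
def run_sequence(r1, r4, memory):
    """Model: peq0 r1,tf ; mov r2,#24 ; mov r2,#48 ; ldd r3,[r4,r2,0]."""
    condition = (r1 == 0)
    r2 = None
    # Walk the shadow mask: 't' slots execute when the condition holds,
    # 'f' slots execute when it does not; the other slot is nullified.
    for selector, immediate in zip("tf", (24, 48)):
        executes = condition if selector == "t" else not condition
        if executes:
            r2 = immediate
    r3 = memory[r4 + r2]   # the load depends on whichever mov survived
    return r2, r3


if __name__ == "__main__":
    memory = {124: "then-data", 148: "else-data"}
    print(run_sequence(0, 100, memory))   # r1 == 0: then-clause
    print(run_sequence(7, 100, memory))   # r1 != 0: else-clause
```

The question in the post is then whether the load may start before `condition` is known, which the model above deliberately does not answer.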
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@[email protected] to comp.arch on Sun Apr 5 20:35:46 2026


    Thomas Koenig <[email protected]> posted:

    What happens to the instructions after the predicate?

    Let's say we have

    [...]
    peq0 r1,tf
    mov r2,#24
    mov r2,#48
    ldd r3,[r4,r2,0]
    [...]

    Can the ldd be speculatively executed or not?

    Like this:

    peq0 r1,tf
    ldd r3,[r4,#24]
    ldd r3,[r4,#48]

    And what happens if the prediction was wrong?

    Some form of backup and do it right, along with some form of
    not updating the cache on the one which was not supposed to
    be executed {or TLB or L2}.
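One possible reading of "backup and do it right" plus "not updating the cache" is sketched below. It is only an illustration under assumptions, not a description of the actual My 66000 pipeline: the speculative load's line is held in a pending buffer and is installed in the cache only once the predicate confirms the guess; on a wrong guess the pending fill is dropped and the load is replayed with the correct address. All names (`speculative_ldd`, `predicted_then`, the set used as a cache) are hypothetical.

```python
def speculative_ldd(r1, r4, memory, cache, predicted_then):
    """Return (loaded value, replayed?) while keeping wrong-path fills out of cache."""
    # Speculate on the predicate and start the load early.
    spec_addr = r4 + (24 if predicted_then else 48)
    spec_value = memory[spec_addr]
    pending_fill = spec_addr          # held aside, not yet visible in the cache

    # The predicate resolves later.
    actual_then = (r1 == 0)
    if actual_then == predicted_then:
        cache.add(pending_fill)       # guess confirmed: commit the fill
        return spec_value, False

    # Wrong guess: drop the pending fill (cache untouched) and replay.
    good_addr = r4 + (24 if actual_then else 48)
    cache.add(good_addr)
    return memory[good_addr], True


if __name__ == "__main__":
    memory = {124: "then-data", 148: "else-data"}
    cache = set()
    print(speculative_ldd(5, 100, memory, cache, predicted_then=True), cache)
```

The same idea would have to extend to the TLB and L2, as the reply notes; the sketch models only a single cache level.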


  • From Thomas Koenig@[email protected] to comp.arch on Mon Apr 6 05:11:21 2026

    MitchAlsup <[email protected]d> wrote:

    Thomas Koenig <[email protected]> posted:

    Let's say we have

    [...]
    peq0 r1,tf
    mov r2,#24
    mov r2,#48
    ldd r3,[r4,r2,0]
    [...]

    Can the ldd be speculatively executed or not?

    Like this:

    peq0 r1,tf
    ldd r3,[r4,#24]
    ldd r3,[r4,#48]

    Yes, that can be simplified.

    Could the load in the original be speculatively executed?


    And what happens if the prediction was wrong?

    Some form of backup and do it right, along with some form of
    not updating the cache on the one which was not supposed to
    be executed {or TLB or L2}.

    Does your answer apply to my original code or to the one
    that you posted? If it only applies to the latter, I can
    easily make up an example that cannot be simplified the
    way you did (let's take that as a given).
  • From MitchAlsup@[email protected] to comp.arch on Mon Apr 6 16:24:36 2026


    Thomas Koenig <[email protected]> posted:

    MitchAlsup <[email protected]d> wrote:

    Thomas Koenig <[email protected]> posted:

    Let's say we have

    [...]
    peq0 r1,tf
    mov r2,#24
    mov r2,#48
    ldd r3,[r4,r2,0]
    [...]

    Can the ldd be speculatively executed or not?

    Like this:

    peq0 r1,tf
    ldd r3,[r4,#24]
    ldd r3,[r4,#48]

    Yes, that can be simplified.

    Could the load in the original be speculatively executed?


    And what happens if the prediction was wrong?

    Some form of backup and do it right, along with some form of
    not updating the cache on the one which was not supposed to
    be executed {or TLB or L2}.

    Does your answer apply to my original code or to the one
    that you posted?

    Both.

    If it only applies to the latter, I can
    easily make up an example that cannot be simplified the
    way you did (let's take that as a given).

  • From Robert Finch@[email protected] to comp.arch on Tue Apr 7 22:53:59 2026

    I have dedicated one general-purpose register (out of 128 GPRs) to hold
    the rounding mode for the different data types. Each data type gets a
    nibble for its rounding mode.

    Bits      Field  Data type
    0 to 3    FLT    floating point
    4 to 7    DFLT   decimal float
    8 to 11   POS    posit (reserved)
    12 to 15  FIX    fixed point
    16 to 19  INT    integer (arithmetic shift right, average)

    If the dynamic rounding mode is changed, the new value is tracked by
    the same register renaming as any other GPR. The rounding mode is then
    easily updated with a bitfield insert (DEP).
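The nibble layout and the bitfield-insert update can be sketched in a few lines. This is a model of the register contents only, assuming the bit positions listed above; the rounding-mode encodings are not given in the post, so the values used here are arbitrary, and the helper names are hypothetical.

```python
# Least-significant bit of each 4-bit field, per the layout in the post.
FIELD_LSB = {"FLT": 0, "DFLT": 4, "POS": 8, "FIX": 12, "INT": 16}
NIBBLE = 0xF


def get_round_mode(reg, dtype):
    """Extract the 4-bit rounding mode for one data type."""
    return (reg >> FIELD_LSB[dtype]) & NIBBLE


def set_round_mode(reg, dtype, mode):
    """Bitfield insert: replace one nibble, leave the other fields alone."""
    if not 0 <= mode <= NIBBLE:
        raise ValueError("rounding mode must fit in 4 bits")
    shift = FIELD_LSB[dtype]
    return (reg & ~(NIBBLE << shift)) | (mode << shift)


if __name__ == "__main__":
    reg = set_round_mode(0, "FIX", 3)      # arbitrary mode value 3
    print(hex(reg), get_round_mode(reg, "FIX"))
```

Because the update is a read-modify-write of a renamed GPR, changing one data type's mode produces a new register value without disturbing the other nibbles.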

    It does mean the rounding mode occupies an operand slot in the
    reservation station (RS).

    I suppose there could be a separate rounding mode for each precision too.

    The rounding mode is separate from the FP status register, which is not
    a GPR. The FP status is stored in the ROB and eventually makes it back
    to the architectural FP status register. It is not readable without
    first using an FP FENCE instruction.

    I am thinking about merging the status registers for the different data
    types into a single status register.


