Paul Clayton <[email protected]> writes:
An even more common example (numbering in the 100M to 1B range?) is x86 processors with interruptible REP MOVS/STOS/LODS instructions.
On 11/13/25 5:13 PM, MitchAlsup wrote:
[snip]
[email protected] (Anton Ertl) posted:
What I wanted to write was "And assembly language is
architecture-specific".
I have worked on a single machine with several different ASM "compilers".
Believe me, one asm can be different than another asm.
But it is absolutely true that asm is architecture specific.
Is that really *absolutely* true? Architecture usually includes binary
encoding (and memory order model and perhaps other non-assembly details).
I do not know if being able to have an interrupt in the middle of an
assembly instruction is a violation of the assembly contract. (In
theory, a few special cases might be handled such that the assembly
instruction that breaks into more than one machine instruction is
handled similarly to breaking instructions into µops.) There might not
be any practical case where all the sub-instructions of an assembly
instruction are also assembly instructions (especially not if
retaining instruction size compatibility, which would be difficult
with such assembly instruction fission anyway).
The classic case is the VAX MOVC3/MOVC5 instructions. An interrupt
could occur during the move and simply restart the instruction
(the register operands having been updated as each byte was moved).
Paul Clayton <[email protected]> posted:
reasonable to have a fuzzier sense of assembly language to include at
least encoding changes. It seems reasonable to me for "assembly
language" to mean the preferred language for simple mapping to machine
instructions (which can include idioms — different spellings of the
same machine instruction — and macros).
The modern sense of ASM is that it is an ASCII version of binary.
The old sense where ASM was a language that could do anything and
everything (via Macros) has slipped into the past.
In article <10lbcg1$3uh8h$[email protected]>, [email protected] (Paul Clayton) wrote:
I _feel_ that if only the opcode encoding is changed (a very tiny
difference that would only affect using code as data) that one
could rightly state that the new architecture uses the same
assembly.
That would, however, raise questions and doubts among everyone who was
aware of the different instruction encodings. You would do far better to
say that the new architecture is compatible at the assembler source level, but not at the binary level.
I doubt there could be any economic justification for
only changing the opcode encoding, but theoretically such could
have multiple architectures with the same assembly.
There was a threatened case of this in the early years of this century.
Intel admitted to themselves that AMD64 was trouncing Itanium in the marketplace, and they needed to do 64-bit x86 or see their company shrink dramatically. However, they did not want to do an AMD-compatible x86-64.
They wanted to use a different instruction encoding and have deliberate binary incompatibility.
This was crazy from the network externalities point of view. It was an anti-competitive move, requiring software vendors to do separate builds
for Intel and AMD, hoping that they would not bother with AMD builds.
Microsoft killed this idea, by refusing to support any such
Intel-specific 64-bit x86. They could not prevent Intel doing it, but
there would not be Windows for it. Intel had to climb down.
I do not think assembly language considered the possible effects of
memory order model. (Have all x86 implementations been compatible?
I think the specification changed, but I do not know if
compatibility was broken.)
In general, the assembly programmer is responsible for considering the
memory model, not the language implementation.
In addition to the definition for "assembly language" one also
needs to define "architecture".
Actually, the world seems to get on OK without such clear definitions.
The obscurity of assembly language tends to limit its use to those who
really need to use it, and who are prepared to use a powerful but
unforgiving tool.
Intel has sold incompatible architectures within the same design
by fusing off functionality and has even had different application
cores in the same chip have different instruction support (though
that seems to have bitten Intel).
Well, different ISA support in different cores in the same processor
package is just dumb[1]. It reflects a delusion that Intel has suffered
since at least the late 1990s: that software is specific to particular generations of their chips, and there's a new release with significant changes for each new generation. Plenty of Intel people know that is true
for motherboard firmware, but not for operating systems or application software. But the company carries on behaving that way.
[1] See the Cell processor for an extreme example.
On 1/28/26 10:34 AM, John Dallman wrote:
In article <10lbcg1$3uh8h$[email protected]>, [email protected] (Paul Clayton) wrote:
Would the Intel-64 have been assembly compatible with AMD64? I
would have guessed that not just encodings would have been
different. If one wants to maintain market friction, supporting
the same assembly seems counterproductive.
This was crazy from the network externalities point of view. It was an anti-competitive move, requiring software vendors to do separate builds
for Intel and AMD, hoping that they would not bother with AMD builds.
Cooperating with AMD to develop a more sane encoding while
supporting low overhead for old binaries would have been better
for customers (I think). However, doing what is best generally
for customers is not necessarily the most profitable action.
Microsoft killed this idea, by refusing to support any such
Intel-specific 64-bit x86. They could not prevent Intel doing it, but
there would not be Windows for it. Intel had to climb down.
Which was actually a sane action not just from the hassle to
Microsoft of supporting yet another ISA but the confusion of
users (Intel64 and AMD64 both run x86-32 binaries but neither
Intel64 nor AMD64 run the other's binaries!) which would impact
Microsoft (and PC OEMs) more than Intel.
I do not think assembly language considered the possible effects of
memory order model. (Have all x86 implementations been compatible?
I think the specification changed, but I do not know if
compatibility was broken.)
In general, the assembly programmer is responsible for considering the memory model, not the language implementation.
Yes, but for a single-threaded application this is not a factor —
so such would be more compatible. It is not clear if assembly
programmers would use less efficient abstractions (like locks) to
handle concurrency in which case a different memory model might
not impact correctness. On the one hand, assembly is generally
chosen because C provides insufficient performance (or
expressiveness), which would imply that assembly programmers
would not want to leave any performance on the table and would
exploit the memory model. On the other hand, the assembly
programmer mindset may often be more serial and the performance
cost of using higher abstractions for concurrency may be lower
than the debugging costs of being clever relative to using
cleverness for other optimizations.
In addition to the definition for "assembly language" one also
needs to define "architecture".
Actually, the world seems to get on OK without such clear definitions.
The obscurity of assembly language tends to limit its use to those who really need to use it, and who are prepared to use a powerful but unforgiving tool.
Yes, the niche effect helps to avoid diversity of meaning across
users and across time. I suspect jargon also changes less rapidly
than common language both because there is less interaction and
there is more pressure to be formal in expression.
Intel has sold incompatible architectures within the same design
by fusing off functionality and has even had different application
cores in the same chip have different instruction support (though
that seems to have bitten Intel).
Well, different ISA support in different cores in the same processor package is just dumb[1]. It reflects a delusion that Intel has suffered since at least the late 1990s: that software is specific to particular generations of their chips, and there's a new release with significant changes for each new generation. Plenty of Intel people know that is true for motherboard firmware, but not for operating systems or application software. But the company carries on behaving that way.
Paul Clayton <[email protected]> posted:[...]
On 1/28/26 10:34 AM, John Dallman wrote:
In article <10lbcg1$3uh8h$[email protected]>, [email protected] (Paul Clayton) wrote:
Would the Intel-64 have been assembly compatible with AMD64? I
Andy Glew indicated similar but not exact enough.
Andy also stated that Microsoft forced Intel's hand towards x86-64.
Currently, assembly-level compatibility does not seem worthwhile.
Software is usually distributed as machine code binaries not as
assembly,
Would the Intel-64 have been assembly compatible with AMD64? I
would have guessed that not just encodings would have been
different. If one wants to maintain market friction, supporting
the same assembly seems counterproductive.
Cooperating with AMD to develop a more sane encoding while
supporting low overhead for old binaries would have been better
for customers (I think).
It is not clear if assembly programmers would use less efficient
abstractions (like locks) to handle concurrency in which case
a different memory model might not impact correctness.
I do not think ISA heterogeneity is necessarily problematic.
I suspect it might require more system-level organization (similar
to Apple).
Even without ISA heterogeneity, optimal scheduling
seems to be a hard problem. Energy/power and delay/performance
preferences are not typically expressed. The abstraction of each
program owning the machine seems to discourage nice behavior (pun
intended).
(I thought Intel marketed their initial 512-bit SIMD processors
as GPGPUs with x86 compatibility, so the idea of having a
general purpose ISA morphed into a GPU-like ISA had some
fascination after Cell.)
Robert Finch <[email protected]> posted:
On 2025-11-05 1:47 a.m., Robert Finch wrote:
I am now modifying Qupls2024 into Qupls2026 rather than starting a
completely new ISA. The big difference is Qupls2024 uses 64-bit
instructions and Qupls2026 uses 48-bit instructions making the code 25%
more compact with no real loss of operations.
Qupls2024 also used 8-bit register specs. This was a bit of overkill and
not really needed. Register specs are reduced to 6 bits. Right away, that
reduced most instructions by eight bits.
4 register specifiers: check.
I decided I liked the dual operations that some instructions supported,
which need a wide instruction format.
With 48-bits, if you can get 2 instructions 50% of the time, you are only
12% bigger than a 32-bit ISA.
One gotcha is that 64-bit constant overrides need to be modified. For
Qupls2024 a 64-bit constant override could be specified using only a
single additional instruction word. This is not possible with 48-bit
instruction words. Qupls2024 only allowed a single additional constant
word. I may maintain this for Qupls2026, but that means that a max
constant override of 48-bits would be supported. A 64-bit constant can
still be built up in a register using the add-immediate with shift
instruction. It is ugly and takes about three instructions.
It was that sticking problem of constants that drove most of My 66000
ISA style--variable length and how to encode access to these constants
and routing thereof.
Motto: never execute any instructions fetching or building constants.
I am now working on predictors for a 6-wide My 66000 machine--which is a bit different.
a) VEC-LOOP loops do not alter the branch prediction tables.
b) Predication clauses do not alter the BPTs.
Paul Clayton <[email protected]> posted:
Cooperating with AMD to develop a more sane encoding while
supporting low overhead for old binaries would have been better
for customers (I think). However, doing what is best generally
for customers is not necessarily the most profitable action.
Yes, imagine Custer (Intel) and AMD (Sioux) sitting down together
and making optimal battle plans for the Little Big Horn battle to come.
One can still buy a milling machine built in 1937 and run it in his shop.
Can one even do this for software from the previous decade ??
MS wants you to buy Office every time you buy a new PC.
MS, then moves all the menu items to different pull downs and
makes it difficult to adjust to the new SW--and then it has the
gall to chew up valuable screen space with ever larger
pull-down bars.
Is it any wonder users want the 1937 milling machine model ???
On 11/5/25 3:52 PM, MitchAlsup wrote:
Robert Finch <[email protected]> posted:
On 2025-11-05 1:47 a.m., Robert Finch wrote:
I am now modifying Qupls2024 into Qupls2026 rather than starting a
completely new ISA. The big difference is Qupls2024 uses 64-bit
instructions and Qupls2026 uses 48-bit instructions making the code 25%
more compact with no real loss of operations.
Qupls2024 also used 8-bit register specs. This was a bit of overkill and
not really needed. Register specs are reduced to 6 bits. Right away, that
reduced most instructions by eight bits.
4 register specifiers: check.
I decided I liked the dual operations that some instructions supported,
which need a wide instruction format.
With 48-bits, if you can get 2 instructions 50% of the time, you are only 12% bigger than a 32-bit ISA.
I must be misunderstanding your math; if half of the
6-byte instructions are two operations, I think that
means 12 bytes would have three operations which is
the same as for a 32-bit ISA.
Perhaps you meant for every two instructions, there
is a 50% chance neither can be "fused" and a 50%
chance they can be fused with each other; this would
get four operations in 18 bytes, which _is_ 12.5%
bigger. That seems an odd expression, as if the
ability to fuse was not quasi-independent.
It could just be that one of us has a "thought-O".
One gotcha is that 64-bit constant overrides need to be modified. For
Qupls2024 a 64-bit constant override could be specified using only a
single additional instruction word. This is not possible with 48-bit
instruction words. Qupls2024 only allowed a single additional constant
word. I may maintain this for Qupls2026, but that means that a max
constant override of 48-bits would be supported. A 64-bit constant can
still be built up in a register using the add-immediate with shift
instruction. It is ugly and takes about three instructions.
It was that sticking problem of constants that drove most of My 66000
ISA style--variable length and how to encode access to these constants
and routing thereof.
Motto: never execute any instructions fetching or building constants.
I am guessing that having had experience with x86
(and the benefit of predecode bits), you recognized
that VLE need not be horribly complex to parse.
My 66000 does not use "start bits", but the length
is quickly decoded from the first word and the
critical information is in mostly fixed locations
in the first word.
(One might argue that opcode
can be in two locations depending on if the
instruction uses a 16-bit immediate or not —
assuming I remember that correctly.)
Obviously, something like DOUBLE could provide
extra register operands to a complex instruction,
though there may not be any operation needing
five register inputs. Similarly, opcode refinement
(that does not affect operation routing) could be
placed into an "immediate". I think you do not
expect to need such tricks because reduced
number of instructions is a design principle and
there is lots of opcode space remaining, but I
feel these also allow the ISA to be extended in
unexpected directions.
I think that motto could be generalized to "do
not do at decode time what can be done at
compile time" (building immediates could be
"executed" in decode). There are obvious limits
to that principle; e.g., one would not encode
instructions as control bits, i.e., "predecoded",
in order to avoid decode work. For My 66000
immediates, reducing decode work also decreases
code size.
Discerning when to apply a transformation and if/
where to cache the result seems useful. E.g., a
compiler caches the source code to machine code
transformation inside an executable binary. My
66000's Virtual Vector Method implementations
are expected, from what I understand, to cache
fetch and decode work and simplify operand
routing.
Caching branch prediction information in an
instruction seems to be viewed generally as not
worth much since dynamic predictors are generally
more accurate.
Static prediction by branch
"type" (e.g., forward not-taken) can require no
additional information. (Branch prediction
_directives_ are somewhat different. Such might
be used to reduce the time for a critical path,
but average time is usually a greater concern.)
On 11/5/25 3:43 PM, MitchAlsup wrote:
[snip]
I am now working on predictors for a 6-wide My 66000 machine--which is a bit
different.
a) VEC-LOOP loops do not alter the branch prediction tables.
b) Predication clauses do not alter the BPTs.
Not recording the history of predicates may have a negative
effect on global history predictors. (I do not know if anyone
has studied this, but it has been mentioned — e.g.,
"[predication] has a negative side-effect because the removal
of branches eliminates useful correlation information
necessary for conventional branch predictors" from "Improving
Branch Prediction and Predicated Execution in Out-of-Order
Processors", Eduardo Quiñones et al., 2007.)
Predicate prediction can also be useful when the availability
of the predicate is delayed. Similarly, selective eager
execution might be worthwhile when the predicate is delayed;
the selection is likely to be predictive (resource use might
be a basis for selection but even estimating that might be
predictive).
On 2/5/26 2:02 PM, MitchAlsup wrote:
Paul Clayton <[email protected]> posted:
[snip]
Cooperating with AMD to develop a more sane encoding while
supporting low overhead for old binaries would have been better
for customers (I think). However, doing what is best generally
for customers is not necessarily the most profitable action.
Yes, imagine Custer (Intel) and AMD (Sioux) sitting down together
and making optimal battle plans for the Little Big Horn battle to come.
Rather than making battle plans for how to annihilate each
other, perhaps finding a better solution than the ratting each
other out in the prisoner's dilemma.
[snip]
One can still buy a milling machine built in 1937 and run it in his shop. Can one even do this for software from the previous decade ??
Yes, but dependency on (proprietary) servers for some games has
made them (unnecessarily) unplayable.
From what I understand, one can still run WordPerfect under a
DOS emulator on modern x86-64.
With the poor security of much software, even OSes, one might
want to contain any legacy software in a more secured
environment.
Preventing automatic update is perhaps more of a hassle. Some
people have placed software in a virtual machine that has no
networking to avoid software breaking.
MS wants you to buy Office every time you buy a new PC.
I thought MS wanted everyone to use Office365. It is harder to
force people to get a new computer, but a monthly fee will recur automatically.
MS, then moves all the menu items to different pull downs and
makes it difficult to adjust to the new SW--and then it has the
gall to chew up valuable screen space with ever larger
pull-down bars.
Ah, but they are just beginning to include advertising. Imagine
every time one uses the mouse (to indicate to the computer that
the user's eyes are focused on a particular place) an
advertisement appears and follows the cursor movement. Even just
having menu entries that are advertisements would be kind of
annoying, but one would be able to get rid of those by leasing
the premium edition (until one needs to lease the platinum
edition, then the "who wants to remain a millionaire" edition).
Is it any wonder users want the 1937 milling machine model ???
Have no fear; soon you may be merely leasing your computer.
Computers need to have the latest spyware so that advertisements
can be appropriately targeted and adblocking must be made
impossible.
Paul Clayton <[email protected]> posted:
On 2/5/26 2:02 PM, MitchAlsup wrote:
I thought MS wanted everyone to use Office365. It is harder to
force people to get a new computer, but a monthly fee will recur
automatically.
When I need a tool--I buy that tool--I never rent that tool.
Name one feature I would want from office365 that was not already
present in office from <say> 1998.
MS, then moves all the menu items to different pull downs and
makes it difficult to adjust to the new SW--and then it has the
gall to chew up valuable screen space with ever larger
pull-down bars.
Ah, but they are just beginning to include advertising. Imagine
every time one uses the mouse (to indicate to the computer that
the user's eyes are focused on a particular place) an
advertisement appears and follows the cursor movement. Even just
having menu entries that are advertisements would be kind of
annoying, but one would be able to get rid of those by leasing
the premium edition (until one needs to lease the platinum
edition, then the "who wants to remain a millionaire" edition).
Why would I or anyone want advertising in office ????????
Is it any wonder users want the 1937 milling machine model ???
Have no fear; soon you may be merely leasing your computer.
Computers need to have the latest spyware so that advertisements
can be appropriately targeted and adblocking must be made
impossible.
I am the kind of guy that turns off "telemetry" and places advertisers
in /hosts file.
Why would I or anyone want advertising in office ????????
Name one feature I would want from office365 that was not already
present in office from <say> 1998.
Why would I or anyone want advertising in office ????????
Paul Clayton <[email protected]> posted:
On 2/5/26 2:02 PM, MitchAlsup wrote:
[snip]
Paul Clayton <[email protected]> posted:
Cooperating with AMD to develop a more sane encoding while
supporting low overhead for old binaries would have been better
for customers (I think). However, doing what is best generally
for customers is not necessarily the most profitable action.
Yes, imagine Custer (Intel) and AMD (Sioux) sitting down together
and making optimal battle plans for the Little Big Horn battle to come.
Rather than making battle plans for how to annihilate each
other, perhaps finding a better solution than the ratting each
other out in the prisoner's dilemma.
[snip]
One can still buy a milling machine built in 1937 and run it in his shop. Can one even do this for software from the previous decade ??
Yes, but dependency on (proprietary) servers for some games has
made them (unnecessarily) unplayable.
From what I understand, one can still run WordPerfect under a
DOS emulator on modern x86-64.
With the poor security of much software, even OSes, one might
want to contain any legacy software in a more secured
environment.
Preventing automatic update is perhaps more of a hassle. Some
people have placed software in a virtual machine that has no
networking to avoid software breaking.
MS wants you to buy Office every time you buy a new PC.
I thought MS wanted everyone to use Office365. It is harder to
force people to get a new computer, but a monthly fee will recur
automatically.
When I need a tool--I buy that tool--I never rent that tool.
Name one feature I would want from office365 that was not already
present in office from <say> 1998.
MS, then moves all the menu items to different pull downs and
makes it difficult to adjust to the new SW--and then it has the
gall to chew up valuable screen space with ever larger
pull-down bars.
Ah, but they are just beginning to include advertising. Imagine
every time one uses the mouse (to indicate to the computer that
the user's eyes are focused on a particular place) an
advertisement appears and follows the cursor movement. Even just
having menu entries that are advertisements would be kind of
annoying, but one would be able to get rid of those by leasing
the premium edition (until one needs to lease the platinum
edition, then the "who wants to remain a millionaire" edition).
Why would I or anyone want advertising in office ????????
Is it any wonder users want the 1937 milling machine model ???
Have no fear; soon you may be merely leasing your computer.
Computers need to have the latest spyware so that advertisements
can be appropriately targeted and adblocking must be made
impossible.
I am the kind of guy that turns off "telemetry" and places advertisers
in /hosts file.
On 09/02/2026 20:33, MitchAlsup wrote:
Paul Clayton <[email protected]> posted:
From what I understand, one can still run WordPerfect under a
DOS emulator on modern x86-64.
With the poor security of much software, even OSes, one might
want to contain any legacy software in a more secured
environment.
Most old software did not have poor security. It was secure by not
having features that could be abused - and thus no need to worry about
extra layers to protect said features. MS practically invented the
concept of insecure applications like word processors - they put
unnecessary levels of automation and macros, integrated it with email
(especially their already hopelessly insecure programs), and so on. No
real user has any need for "send this document by email" in their word
processor - but spam robots loved it. (MS even managed to figure out a
way to let font files have executable malware in them.) If you go back
to older tools that did the job they were supposed to do, without trying
to do everything else, security is a non-issue for most software.
MS wants you to buy Office every time you buy a new PC.
I thought MS wanted everyone to use Office365. It is harder to
force people to get a new computer, but a monthly fee will recur
automatically.
When I need a tool--I buy that tool--I never rent that tool.
On 11/4/2025 3:44 PM, Terje Mathisen wrote:[snip]
MitchAlsup wrote:
[email protected] (Anton Ertl) posted:
Branch prediction is fun.
When I looked around online before, a lot of stuff about branch
prediction was talking about fairly large and convoluted schemes
for the branch predictors.
But, then always at the end of it using 2-bit saturating counters:
weakly taken, weakly not-taken, strongly taken, strongly not
taken.
But, in my fiddling, there was seemingly a simple but moderately
effective strategy:
Keep a local history of taken/not-taken;
XOR this with the low-order-bits of PC for the table index;
Use a 5/6-bit finite-state-machine or similar.
Can model repeating patterns up to ~ 4 bits.
Where, the idea was that the state-machine is updated with the
current state and branch direction, giving the next state and
next predicted branch direction (for this state).
Could model slightly more complex patterns than the 2-bit
saturating counters, but it is sort of a partial mystery why
(for mainstream processors) more complex lookup schemes and 2
bit state, was preferable to a simpler lookup scheme and 5-bit
state.
Well, apart from the relative "dark arts" needed to cram 4-bit
patterns into a 5 bit FSM (is a bit easier if limiting the
patterns to 3 bits).
Then again, had before noted that the LLMs are seemingly also
not really able to figure out how to make a 5 bit FSM to model a
full set of 4 bit patterns.
Then again, I wouldn't expect it to be all that difficult of a
problem for someone that is "actually smart"; so presumably chip
designers could have done similar.
Well, unless maybe the argument is that 5 or 6 bits of storage
would cost more than 2 bits, but then presumably needing to have significantly larger tables (to compensate for the relative
predictive weakness of 2-bit state) would have cost more than
smaller tables of 6-bit state?...
Say, for example, 2b:
00_0 => 10_0 //Weakly not-taken, dir=0, goes strong not-taken
00_1 => 01_0 //Weakly not-taken, dir=1, goes weakly taken
01_0 => 00_1 //Weakly taken, dir=0, goes weakly not-taken
01_1 => 11_1 //Weakly taken, dir=1, goes strongly taken
10_0 => 10_0 //strongly not taken, dir=0
10_1 => 00_0 //strongly not taken, dir=1 (goes weak)
11_0 => 01_1 //strongly taken, dir=0
11_1 => 11_1 //strongly taken, dir=1 (goes weak)
Can expand it to 3-bits, for 2-bit patterns
As above, and 4-more alternating states
And slightly different transition logic.
Say (abbreviated):
000 weak, not taken
001 weak, taken
010 strong, not taken
011 strong, taken
100 weak, alternating, not-taken
101 weak, alternating, taken
110 strong, alternating, not-taken
111 strong, alternating, taken
The alternating states just flip-flopping between taken and not
taken.
The weak states can move between any of the 4.
The strong states used if the pattern is reinforced.
Going up to 3 bit patterns is more of the same (add another bit,
doubling the number of states). Seemingly something goes nasty
when getting to 4 bit patterns though (and can't fit both weak
and strong states for longer patterns, so the 4b patterns
effectively only exist as weak states which partly overlap with
the weak states for the 3-bit patterns).
But, yeah, not going to type out state tables for these ones.
Not proven, but I suspect that an arbitrary 5 bit pattern within
a 6 bit state might be impossible. Although there would be
sufficient state-space for the looping 5-bit patterns, there may
not be sufficient state-space to distinguish whether to move
from a mismatched 4-bit pattern to a 3 or 5 bit pattern.
Whereas, at least with 4-bit, any mismatch of the 4-bit pattern
can always decay to a 3-bit pattern, etc. One needs to be able
to express decay both to shorter patterns and to longer
patterns, and I suspect at this point, the pattern breaks down
(but can't easily confirm; it is either this or the pattern
extends indefinitely, I don't know...).
Could almost have this sort of thing as a "brain teaser" puzzle
or something...
Then again, maybe other people would not find any particular
difficulty in these sorts of tasks.
Terje
How can register renaming be implemented on SPARC? As discussed
above, this can be done independently: Have 96 architectural registers
(plus the window pointer), and make 8 of them global registers, and
the rest 24 visible registers plus 4 windows of 16 registers, with the
usual switching. And then rename these 96 architectural registers.
A variant in the opposite direction would be to treat only the 32
visible registers as architectural registers, avoiding large RAT
entries. The save instruction would emit store microinstructions for
the local and in registers, and then the renamer would rename the out registers to the in registers, and would assign 0 to the local and out registers (which would not occupy a physical register at first). This approach makes the most sense with a separate renamer as is now
common. The restore instruction would rename the in registers to the
out registers, and emit load microinstructions for the local and the
in registers.
OoO tends to work fine with storing around calls and loading around
returns in architectures without register windows, because the storing
mostly consumes resources, but is not on the critical path, and
likewise for the loading (the loads tend to be ready earlier than the instructions on the critical path); and store-to-load forwarding
deals with the problem of a return shortly after a call.
Robert Finch <[email protected]> posted:
Semi-unaligned memory tradeoff. If unaligned access is required, the
memory logic just increments the physical address by 64 bytes to fetch
the next cache line. The issue with this is that it does not go back
to the TLB to get the incremented address translated again, meaning no
protection or translation check is made for that address.
You can determine if an access is misaligned "enough" to warrant two
trips down the pipe:
a) crosses cache width
b) crosses page boundary
Case b ALWAYS needs 2 trips; so the mechanism HAS to be there.
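The two cases can be sketched with simple address arithmetic (the line width and page size below are assumed values):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64   /* assumed line width */
#define PAGE_BYTES       4096 /* assumed page size  */

/* True if [addr, addr+size) spans two cache lines (case a). */
static bool crosses_cache_line(uint64_t addr, uint64_t size)
{
    return (addr / CACHE_LINE_BYTES) !=
           ((addr + size - 1) / CACHE_LINE_BYTES);
}

/* True if [addr, addr+size) spans two pages (case b): this always
   needs a second TLB lookup, since the second line may have a
   different translation and different protection. */
static bool crosses_page(uint64_t addr, uint64_t size)
{
    return (addr / PAGE_BYTES) != ((addr + size - 1) / PAGE_BYTES);
}
```

Note that every page crossing is also a line crossing, which is why the two-trip mechanism for case b covers case a as well.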
In article <10m12ue$2t2k5$[email protected]>, [email protected] (Paul Clayton) wrote:
Currently, assembly-level compatibility does not seem worthwhile.
Not now, no. There was one case where it was valuable: the assembler
source translator for 8080 to 8086. That plus the resemblance of early
MS-DOS to CP/M meant that CP/M software written in assembler could be got working on the early IBM PC and compatibles more rapidly than new
software could be developed in high-level languages. That was one of the factors in the runaway success of PC-compatible machines in the early
1980s.
Software is usually distributed as machine code binaries not as
assembly,
Or as source code...
Would the Intel-64 have been assembly compatible with AMD64? I
would have guessed that not just encodings would have been
different. If one wants to maintain market friction, supporting
the same assembly seems counterproductive.
It would hardly have mattered. Very little assembler is written for
64-bit architectures.
Cooperating with AMD to develop a more sane encoding while
supporting low overhead for old binaries would have been better
for customers (I think).
Intel didn't admit to themselves they needed to do 64-bit x86 until AMD64
was thrashing them in the market. Far too late for collaborative design
by then.
It is not clear if assembly programmers would use less efficient
abstractions (like locks) to handle concurrency in which case
a different memory model might not impact correctness.
You are thinking of doing application programming in assembler. That's
pretty much extinct these days. Use of assembler to implement locks or
other concurrency-control mechanisms in an OS or a language run-time
library is far more likely.
I've been doing low-level parts of application development for over 40
years. In 1983-86, I was working in assembler, or needed to have a very
close awareness of the assembler code being generated by a higher level language. In 1987-1990, I needed to be able to call assembler-level OS functions from C code. Since then, the only coding I've done in assembler
has been to generate hardware error conditions for testing error handlers. I've read and debugged lots of compiler-generated assembler to report compiler bugs, but that has become far less common over time.
I do not think ISA heterogeneity is necessarily problematic.
It requires the OS scheduler to be ISA-aware, and to never, /ever/ put a thread onto a core that can't run the relevant ISA. That will inevitably
make the scheduler more complicated and thus increase system overheads.
I suspect it might require more system-level organization (similar
to Apple).
Have you ever tried to optimise multi-threaded performance on a modern
Apple system with a mixture of Performance and Efficiency cores? I have,
and it's a lot harder than Apple give the impression it will be.
Apple make an assumption: that you will use their "Grand Central Dispatch" threading model. That requires multi-threaded code to be structured as a one-direction pipeline of work packets, with buffers between them, and
one thread/core per pipeline stage. That's a sensible model for some
kinds of work, but not all kinds. It also requires compiler extensions
which don't exist on other compilers. So you have to fall back to POSIX threads to get flexibility and portability.
If you're using POSIX threads, the scheduler seems to assign threads to
cores randomly. So your worker threads spend a lot of time on Efficiency cores. Those are in different clusters from the Performance cores, which means that communications between threads (via locks) are very slow.
Using Apple's performance category attributes for threads has no obvious effect on this.
The way to fix this is to find out how many Performance cores there are
in a Performance cluster (which wasn't possible until macOS 12) and use
that many threads. Then you need to reach below the POSIX threading layer
to the underlying BSD thread layer. There, you can set an association
number on your threads, which tells the scheduler to try to run them in
the same cluster. Then you get stable and near-optimal performance. But finding out how to do this is fairly hard, and few seem to have managed it.
Even without ISA heterogeneity, optimal scheduling
seems to be a hard problem. Energy/power and delay/performance
preferences are not typically expressed. The abstraction of each
program owning the machine seems to discourage nice behavior (pun
intended).
Allowing processes to find out the details of other processes' resource
usage makes life very complicated, and introduces new opportunities for security bugs.
(I thought Intel marketed their initial 512-bit SIMD processors
as GPGPUs with x86 compatibility, so the idea of having a
general purpose ISA morphed into a GPU-like ISA had some
fascination after Cell.)
Larrabee turned out to be a pretty bad GPU, and a pretty bad set of CPUs.
On 11/5/25 2:00 AM, BGB wrote:
On 11/4/2025 3:44 PM, Terje Mathisen wrote: [snip]
MitchAlsup wrote:
[email protected] (Anton Ertl) posted:
Branch prediction is fun.
When I looked around online before, a lot of stuff about branch
prediction was talking about fairly large and convoluted schemes for
the branch predictors.
You might be interested in looking at the 6th Championship
Branch Prediction (2025):
https://ieeetcca.org/2025/02/18/6th-championship-branch-prediction-cbp2025/
TAgged GEometric length predictors (TAGE) seem to be the current
"hotness" for branch predictors. These record very long global
histories and fold them into shorter indexes with the number of
history bits used varying for different tables.
(Because the correlation is less strong, 3-bit counters are
generally used as well as a useful bit.)
But, then always at the end of it using 2-bit saturating counters:
weakly taken, weakly not-taken, strongly taken, strongly not taken.
But, in my fiddling, there was seemingly a simple but moderately
effective strategy:
Keep a local history of taken/not-taken;
XOR this with the low-order-bits of PC for the table index;
Use a 5/6-bit finite-state-machine or similar.
Can model repeating patterns up to ~ 4 bits.
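That indexing scheme might look roughly like this (the table size, history length, and word-aligned-PC shift are my assumptions, not from the original description):

```c
#include <assert.h>
#include <stdint.h>

#define HIST_BITS  4
#define TABLE_BITS 10
#define TABLE_SIZE (1u << TABLE_BITS)

static uint8_t local_hist[TABLE_SIZE]; /* per-PC taken/not-taken bits */
static uint8_t state[TABLE_SIZE];      /* FSM state per table entry   */

/* XOR the branch's local history with low-order PC bits to form the
   predictor table index, as described above. */
static uint32_t pred_index(uint32_t pc)
{
    uint32_t slot = (pc >> 2) & (TABLE_SIZE - 1); /* word-aligned PCs */
    uint32_t hist = local_hist[slot] & ((1u << HIST_BITS) - 1);
    return (slot ^ hist) & (TABLE_SIZE - 1);
}

/* Shift the branch outcome into that PC's local history. */
static void update_history(uint32_t pc, int taken)
{
    uint32_t slot = (pc >> 2) & (TABLE_SIZE - 1);
    local_hist[slot] = (uint8_t)((local_hist[slot] << 1) | (taken & 1));
}
```

The FSM state stored at the indexed entry is then consulted for the prediction and updated with the outcome.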
Indexing a predictor by _local_ (i.e., per instruction address)
history adds a level of indirection; once one has the branch
(fetch) address one needs to index the local history and then
use that to index the predictor. The Alpha 21264 had a modest-
sized (by modern standards) local history predictor with a 1024-
entry table of ten history bits indexed by the Program Counter
and the ten bits were used to index a table of three-bit
counters. This was combined with a 12-bit global history
predictor with 2-bit counters (note: not gshare, i.e., xored
with the instruction address) and the same index was used for
the chooser.
I do not know if 5/6-bit state machines have been academically
examined for predictor entries. I suspect the extra storage is a
significant discouragement given one often wants to cover more
different correlations and branches.
TAGE has the advantage that the tags reduce branch aliases and
the variable history length (with history folding/compression)
allows using less storage (when a prediction only benefits from
a shorter history) and reduces training time.
(I am a little surprised that I have not read a suggestion to
use the alternate 2-bit encoding that retains the last branch
direction. This history might be useful for global history; the
next most recent direction (i.e., not the predicted direction)
of previous recent branches for a given global history might be
useful in indexing a global history predictor. This 2-bit
encoding seems to give slightly worse predictions than a
saturating counter but the benefit of "localized" global history
might compensate for this.)
Alloyed prediction ("Alloyed Branch History: Combining Global
and Local Branch History for Robust Performance", Zhijian Lu et
al., 2002) used a tiny amount of local history to index a
(mostly) global history predictor, hiding (much of) the latency
of looking up the local history by retrieving multiple entries
from the table and selecting the appropriate one with the local
history.
There was also a proposal ("Branch Transition Rate: A New Metric
for Improved Branch Classification Analysis", Michael Haungs et
al., 2000) to consider transition rate, noting that high
transition rate branches (which flip direction frequently) are
poorly predicted by averaging behavior. (Obviously, loop-like
branches have a high transition rate for one direction.) This is
a limited type of local history. If I understand correctly, your
state machine mechanism would capture the behavior of such
highly alternating branches.
Compressing the history into a pattern does mean losing
information (as does a counter), but I had thought such pattern
storage might be more efficient than storing local history. It
is also interesting that the Alpha 21264 local predictor used
dynamic pattern matching rather than a static transformation of
history to prediction (state machine).
I think longer local history prediction has become unpopular,
probably because nothing like TAGE was proposed to support
longer histories but also because the number of branches that
can be tracked with long histories is smaller.
Local history patterns may also be less common than statistical
correlation after one has extracted branches predicted well by
global history. (For small-bodied loops, a moderately long
global history provides substantial local history.)
Using a pattern/state machine may also make confidence
estimation less accurate. TAGE can use confidence of multiple
matches to form a prediction.
The use of confidence for making a prediction also makes it
impractical to store just a prediction nearby (to reduce
latency) and have the extra state more physically distant. For
per-address predictors, one could in theory use Icache
replacement to constrain predictor size, where an Icache miss
loads local predictions from an L2 local predictor. I think AMD
had limited local predictors associated with the Icache and had
previously stored some prediction information in L2 cache using
the fact that code is not modified (so parity could be used and
not ECC).
Daniel A. Jiménez did some research on using neural methods
(e.g., perceptrons) for branch prediction. The principle was
that traditional global history tables had exponential scaling
with history size (2 to the N table entries for N history bits)
while per-address perceptrons would scale linearly (for a fixed
number of branch addresses). TAGE (with its folding of long
histories and variable history) seems to have removed this as a
distinct benefit. General correlation with specific branches may
also be less predictive than correlation with path.
Nevertheless, the research was interesting and larger histories
did provide more accurate predictions.
Anyway, I agree that thinking about these things can be fun.
Where, the idea was that the state-machine is updated with the current
state and branch direction, giving the next state and next predicted
branch direction (for this state).
Could model slightly more complex patterns than the 2-bit saturating
counters, but it is sort of a partial mystery why (for mainstream
processors) more complex lookup schemes with 2-bit state were
preferable to a simpler lookup scheme with 5-bit state.
Well, apart from the relative "dark arts" needed to cram 4-bit
patterns into a 5 bit FSM (is a bit easier if limiting the patterns to
3 bits).
Then again, had before noted that the LLMs are seemingly also not
really able to figure out how to make a 5 bit FSM to model a full set
of 4 bit patterns.
Then again, I wouldn't expect it to be all that difficult of a problem
for someone that is "actually smart"; so presumably chip designers
could have done similar.
Well, unless maybe the argument is that 5 or 6 bits of storage would
cost more than 2 bits, but then presumably needing to have
significantly larger tables (to compensate for the relative predictive
weakness of 2-bit state) would have cost more than the cost of
smaller tables of 6-bit state?...
Say, for example, 2b:
00_0 => 10_0 //Weakly not-taken, dir=0, goes strong not-taken
00_1 => 01_0 //Weakly not-taken, dir=1, goes weakly taken
01_0 => 00_1 //Weakly taken, dir=0, goes weakly not-taken
01_1 => 11_1 //Weakly taken, dir=1, goes strongly taken
10_0 => 10_0 //strongly not taken, dir=0 (stays strong)
10_1 => 00_0 //strongly not taken, dir=1 (goes weak)
11_0 => 01_1 //strongly taken, dir=0 (goes weak)
11_1 => 11_1 //strongly taken, dir=1 (stays strong)
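Transcribing that table directly into code (the state names follow the listing above; note the predicted direction is simply the low state bit):

```c
#include <assert.h>
#include <stdint.h>

/* Transcription of the 2-bit table above:
   00 = weakly not-taken, 01 = weakly taken,
   10 = strongly not-taken, 11 = strongly taken. */
static const uint8_t next_state[4][2] = {
    /* 00 weak NT   */ { 2, 1 }, /* NT -> strong NT, T -> weak T   */
    /* 01 weak T    */ { 0, 3 }, /* NT -> weak NT,   T -> strong T */
    /* 10 strong NT */ { 2, 0 }, /* NT -> stays,     T -> weak NT  */
    /* 11 strong T  */ { 1, 3 }, /* NT -> weak T,    T -> stays    */
};

static int predict(uint8_t state)             { return state & 1; }
static uint8_t update(uint8_t state, int dir) { return next_state[state][dir]; }
```

The 3-bit alternating-state extension would add four more states and a slightly wider table, along the lines described next.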
Can expand it to 3 bits, for 2-bit patterns:
As above, and 4 more alternating states,
and slightly different transition logic.
Say (abbreviated):
000 weak, not taken
001 weak, taken
010 strong, not taken
011 strong, taken
100 weak, alternating, not-taken
101 weak, alternating, taken
110 strong, alternating, not-taken
111 strong, alternating, taken
The alternating states just flip-flop between taken and not taken.
The weak states can move between any of the 4.
The strong states are used if the pattern is reinforced.
Going up to 3 bit patterns is more of the same (add another bit,
doubling the number of states). Seemingly something goes nasty when
getting to 4 bit patterns though (and can't fit both weak and strong
states for longer patterns, so the 4b patterns effectively only exist
as weak states which partly overlap with the weak states for the 3-bit
patterns).
But, yeah, not going to type out state tables for these ones.
Not proven, but I suspect that an arbitrary 5 bit pattern within a 6
bit state might be impossible. Although there would be sufficient
state-space for the looping 5-bit patterns, there may not be
sufficient state-space to distinguish whether to move from a
mismatched 4-bit pattern to a 3 or 5 bit pattern. Whereas, at least
with 4-bit, any mismatch of the 4-bit pattern can always decay to a 3-
bit pattern, etc. One needs to be able to express decay both to
shorter patterns and to longer patterns, and I suspect at this point,
the pattern breaks down (but can't easily confirm; it is either this
or the pattern extends indefinitely, I don't know...).
Could almost have this sort of thing as a "brain teaser" puzzle or
something...
Then again, maybe other people would not find any particular
difficulty in these sorts of tasks.
Terje
On 2/9/26 2:33 PM, MitchAlsup wrote:
Paul Clayton <[email protected]> posted:
On 2/5/26 2:02 PM, MitchAlsup wrote:
[snip]
MS wants you to buy Office every time you buy a new PC.
I thought MS wanted everyone to use Office365. It is harder to
force people to get a new computer, but a monthly fee will recur
automatically.
When I need a tool--I buy that tool--I never rent that tool.
Name one feature I would want from office365 that was not already
present in office from <say> 1998.
I do not know if MS can legally cancel your MS Office license, and I
doubt the few "software pirates" who continue to use an unsupported ("invalid") version would be worth MS' time and effort to prevent such people from using such software.
However, there seems to be a strong trend toward "you shall own nothing."
MS then moves all the menu items to different pull-downs and
makes it difficult to adjust to the new SW--and then it has the
gall to chew up valuable screen space with ever larger
pull-down bars.
Ah, but they are just beginning to include advertising. Imagine
every time one uses the mouse (to indicate to the computer that
the user's eyes are focused on a particular place) an
advertisement appears and follows the cursor movement. Even just
having menu entries that are advertisements would be kind of
annoying, but one would be able to get rid of those by leasing
the premium edition (until one needs to lease the platinum
edition, then the "who wants to remain a millionaire" edition).
Why would I or anyone want advertising in office ????????
Why would anyone want advertising in a Windows Start Menu?
For Microsoft such provides a bit more revenue/profit as businesses seem willing to pay for such advertisements. Have you ever heard "You are not
the consumer; you are the product"?
I think I read that some streaming services have added
advertising to their (formerly) no-advertising subscriptions, so
the suggested lease term inflation is not completely
unthinkable.
Is it any wonder users want the 1937 milling machine model ???
Have no fear; soon you may be merely leasing your computer.
Computers need to have the latest spyware so that advertisements
can be appropriately targeted and adblocking must be made
impossible.
I am the kind of guy that turns off "telemetry" and places advertisers
in /hosts file.
If all new computers are "leased" (where tampering with the
device — or not connecting it to the Internet such that it can
phone home — revokes "ownership" and not merely warranty and one
agrees to a minimum use [to ensure that enough ads are viewed]),
ordinary users (who cannot assemble devices from commodity
parts) would not have a choice. If governments enforce the
rights of corporations to protect their businesses by outlawing
sale of computer components to anyone who would work around the
cartel, owning a computer could become illegal. Governments have
an interest in having all domestic computers be both secure and
to facilitate domestic surveillance, so mandating features that
remove freedom and require an upgrade cycle (which is also good
for the economy☺) has some attraction.
I doubt people like you are a sufficient threat to profits that
such extreme measures will be used, but the world (and
particularly the U.S.) seems to be becoming somewhat dystopian.
This is getting kind of off-topic and is certainly not something I want
to think about.
I remember reading about the 8080-to-8086 assembly translator. I
did not know that CP/M and MS-DOS were similar enough to
facilitate porting, so that note was interesting to me.
Intel presumably thought Itanium would be the only merchant
64-bit ISA that mattered (and this would exclude AMD) and
that the masses could use 32-bit until less expensive Itanium
processors were possible.
I agree that such would add complexity, but there is already
complexity for power saving with same ISA heterogeneity. NUMA-
awareness, cache sharing, and cache warmth also complicate
scheduling, so the question becomes how much extra complexity
does such introduce.
I still feel an attraction to a market-oriented resource
management such that threads could both minimize resource use
(that might be more beneficial to others) and get more than a
fair-share of resources that are important.
In article <10n2u02$270jc$[email protected]>, [email protected] (Paul Clayton) wrote:
I remember reading about the 8080-to-8086 assembly translator. I
did not know that CP/M and MS-DOS were similar enough to
facilitate porting, so that note was interesting to me.
/Early/ MS-DOS. That used CP/M-like File Control Blocks, and didn't have hierarchical directories. It didn't really support hard disks. The
CP/M-style APIs all carried on existing after MS-DOS 2.0 introduced a new
set of APIs that were more suitable for high-level languages, but they weren't much used in new software.
Intel presumably thought Itanium would be the only merchant
64-bit ISA that mattered (and this would exclude AMD) and
that the masses could use 32-bit until less expensive Itanium
processors were possible.
Pretty much. Then the struggle to make Itanium run fast became the overpowering concern, until they gave up and concentrated on x86-64,
claiming that Itanium would be back in a few years.
I don't think many people took that claim seriously. Some years later, an Intel marketing man was quite shocked to hear that, and that the world
had simply been humouring them.
I agree that such would add complexity, but there is already
complexity for power saving with same ISA heterogeneity. NUMA-
awareness, cache sharing, and cache warmth also complicate
scheduling, so the question becomes how much extra complexity
does such introduce.
If the behaviour of Apple's OSes are any guide, complexity is avoided as
far as possible.
I still feel an attraction to a market-oriented resource
management such that threads could both minimize resource use
(that might be more beneficial to others) and get more than a
fair-share of resources that are important.
The difficulty there is that developers will have a very hard time
creating /measurable/ speed-ups that apply across a wide range of
different configurations. Companies will therefore be reluctant to put developer hours into it that could go into features that customers are
asking for.
John
/Early/ MS-DOS. That used CPM-like File Control Blocks, and didn't have
hierarchical directories. ...
My own limited experience with MS-DOS programming mostly showed them
using integer file-handles and a vaguely Unix-like interface for file IO
at the "int 21h" level.
According to BGB <[email protected]>:
/Early/ MS-DOS. That used CPM-like File Control Blocks, and didn't have
hierarchical directories. ...
My own limited experience with MS-DOS programming mostly showed them
using integer file-handles and a vaguely Unix-like interface for file IO
at the "int 21h" level.
Yeah, Mark Zbikowski added them along with the tree-structured file system in DOS 2.0.
He was at Yale when I was, using a Unix 7th edition system I was supporting.
My own limited experience with MS-DOS programming mostly showed
them using integer file-handles and a vaguely Unix-like interface
for file IO at the "int 21h" level.
Which is, ironically, in conflict with the "FILE *" interface used
by C's stdio API.
Well, apart from some vague (unconfirmed) memories of being exposed
to Pascal via the "Mac Programmer's Workbench" thing at one point
and being totally lost (was very confused, used a CLI but the CLI
commands didn't make sense).
In a way, it showed that they screwed up the design pretty hard
that x86-64 ended up being the faster and more efficient option...
I guess one question is if they had any other particular drawbacks
other than, say:
Their code density was one of the worst around;
128 registers is a little excessive;
128 predicate register bits is a bit WTF;
I guess it is more of an open question of what would have happened,
say, if Intel had gone for an ISA design more like ARM64 or RISC-V
or something.
Well, or something like PowerPC, but then again, IBM still had
difficulty keeping PPC competitive, so dunno. Then again, I think
IBM's PPC issues were more related to trying to keep up in the chip
fab race that was still going strong at the time, rather than an
ISA design issue.
They did. They really did.
I guess one question is if they had any other particular drawbacks
other than, say:
Their code density was one of the worst around;
128 registers is a little excessive;
128 predicate register bits is a bit WTF;
Those huge register files had a lot to do with the low code density. They
had two much bigger problems, though.
They'd correctly understood that the low speed of affordable dynamic RAM
as compared to CPUs running at hundreds of MHz was the biggest barrier to making code run fast. Their solution was to have the compiler schedule loads well in advance. They assumed, without evidence, that a compiler with
plenty of time to think could schedule loads better than hardware doing
it dynamically. It's an appealing idea, but it's wrong.
It might be possible to do that effectively in a single-core,
single-thread, single-task system that isn't taking many (if any)
interrupts. In a multi-core system, running a complex operating system, several multi-threaded applications, and taking frequent interrupts and context switches, it is _not possible_. There is no knowledge of any of
the interrupts, context switches or other applications at compile time,
so the compiler has no idea what is in cache and what isn't. I don't understand why HP and Intel didn't realise this. It took me years, but I
am no CPU designer.
Speculative execution addresses that problem quite effectively. We don't
have a better way, almost thirty years after Itanium design decisions
were taken. They didn't want to do speculative execution, and they chose
an instruction format and register set that made adding it later hard. If
it was ever tried, nothing was released that had it AFAIK.
The other problem was that they had three (or six, or twelve) in-order pipelines running in parallel. That meant the compilers had to provide
enough ILP to keep those pipelines fed, or they'd just eat cache capacity
and memory bandwidth executing no-ops ... in a very bulky instruction set. They didn't have a general way to extract enough ILP. Nobody does, even now. They just assumed that with an army of developers they'd find enough heuristics to make it work well enough. They didn't.
There was also an architectural misfeature with floating-point advance
loads that could make them disappear entirely if there was a call
instruction between an advance-load instruction and the corresponding check-load instruction. That cost me a couple of weeks working out and reporting the bug, which was unfixable. The only work-around was to
re-issue all outstanding floating-point advance-load instructions
after each call returned. The effective code density went down further,
and there were lots of extra read instructions issued.
I guess it is more of an open question of what would have happened,
say, if Intel had gone for an ISA design more like ARM64 or RISC-V
or something.
ARM64 seems to me to be the product of a lot more experience with speculatively-executing processors than was available in 1998. RISC-V has
not demonstrated really high performance yet, and it's been around long enough that I'm starting to doubt it ever will.
Well, or something like PowerPC, but then again, IBM still had
difficulty keeping PPC competitive, so dunno. Then again, I think
IBM's PPC issues were more related to trying to keep up in the chip
fab race that was still going strong at the time, rather than an
ISA design issue.
I think that was fabs, rather than architecture.
While I was providing libraries for PowerPC (strictly, POWER4, POWER5 and POWER6, one after another) it always had rather decent performance for its clockspeed and process.
John
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.
Now, 30 years later the compilers are still in the position of having
made LITTLE progress.
I suspect a big part of the problem was tension between Intel and HP
where the only political solution was allowing the architects from both
sides to "dump in" their favorite ideas. A recipe for disaster.
On 2/19/2026 1:59 PM, John Levine wrote:
According to BGB <[email protected]>:
/Early/ MS-DOS. That used CPM-like File Control Blocks, and didn't have hierarchical directories. ...
My own limited experience with MS-DOS programming mostly showed them
using integer file-handles and a vaguely Unix-like interface for file IO
at the "int 21h" level.
Yeah, Mark Zbikowski added them along with the tree-structured file
system in DOS 2.0.
He was at Yale when I was, using a Unix 7th edition system I was
supporting.
Looks it up...
Yeah, my case, I didn't exist yet when the MS-DOS 2.x line came out...
Did exist for the 3.x line though.
I don't remember much from those years though.
Some fragmentary memories implied that (in that era) I had mostly been
watching shows like Care Bears and similar (but, looking at it at a
later age, found it mostly unwatchable). I think also shows like Smurfs
and Ninja Turtles and similar, etc.
Like, at some point, memory breaking down into sort of an amorphous mass
of things from TV shows all just sort of got mashed together. Not much
stable memory of things other than fragments of TV shows and such.
Not sure what the experience is like for most people though.
My memory from before the age of 4 is extremely spotty, just a couple of
situations that made a lasting impact.
My own limited experience with MS-DOS programming mostly showed
them using integer file-handles and a vaguely Unix-like interface
for file IO at the "int 21h" level.
Which is, ironically, in conflict with the "FILE *" interface used
by C's stdio API.
However, it's entirely concordant with Unix's lower-level file
descriptors, as used in the read() and write() calls.
<https://en.wikipedia.org/wiki/File_descriptor>
<https://en.wikipedia.org/wiki/Read_(system_call)>
The FILE* interface is normally implemented on top of the lower-level
calls, with a buffer in the process' address space, managed by the C
run-time library. The file descriptor is normally a member of the FILE structure.
MS-DOS is not a great design, but it isn't crazy either.
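That layering can be shown in a tiny POSIX example: fileno() exposes the integer descriptor inside a FILE (fileno is POSIX, not strict ISO C).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Sketch: a stdio FILE * normally wraps an integer file descriptor;
   fileno() retrieves the descriptor stored inside the FILE. */
static int underlying_fd(FILE *stream)
{
    return fileno(stream);
}
```

On Unix-like systems the standard streams wrap descriptors 0, 1, and 2, which is exactly the kind of handle the MS-DOS "int 21h" file functions imitated.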
Well, apart from some vague (unconfirmed) memories of being exposed
to Pascal via the "Mac Programmer's Workbench" thing at one point
and being totally lost (was very confused, used a CLI but the CLI
commands didn't make sense).
I used it very briefly. It was a very weird CLI, seemingly designed by someone opposed to the basic idea of a CLI.
In a way, it showed that they screwed up the design pretty hard
that x86-64 ended up being the faster and more efficient option...
They did. They really did.
I guess one question is if they had any other particular drawbacks
other than, say:
Their code density was one of the worst around;
128 registers is a little excessive;
128 predicate register bits is a bit WTF;
Those huge register files had a lot to do with the low code density. They
had two much bigger problems, though.
They'd correctly understood that the low speed of affordable dynamic RAM
as compared to CPUs running at hundreds of MHz was the biggest barrier to making code run fast. Their solution was have the compiler schedule loads well in advance. They assumed, without evidence, that a compiler with
plenty of time to think could schedule loads better than hardware doing
it dynamically. It's an appealing idea, but it's wrong.
It might be possible to do that effectively in a single-core,
single-thread, single-task system that isn't taking many (if any)
interrupts. In a multi-core system, running a complex operating system, several multi-threaded applications, and taking frequent interrupts and context switches, it is _not possible_. There is no knowledge of any of
the interrupts, context switches or other applications at compile time,
so the compiler has no idea what is in cache and what isn't. I don't understand why HP and Intel didn't realise this. It took me years, but I
am no CPU designer.
Speculative execution addresses that problem quite effectively. We don't
have a better way, almost thirty years after Itanium design decisions
were taken. They didn't want to do speculative execution, and they close
an instruction format and register set that made adding it later hard. If
it was ever tried, nothing was released that had it AFAIK.
The other problem was that they had three (or six, or twelve) in-order pipelines running in parallel. That meant the compilers had to provide
enough ILP to keep those pipelines fed, or they'd just eat cache capacity
and memory bandwidth executing no-ops ... in a very bulky instruction set. They didn't have a general way to extract enough ILP. Nobody does, even
now. They just assumed that with an army of developers they'd find enough heuristics to make it work well enough. They didn't.
There was also an architectural misfeature with floating-point advance
loads that could make them disappear entirely if there was a call
instruction between an advance-load instruction and the corresponding check-load instruction. That cost me a couple of weeks working out and reporting the bug, which was unfixable. The only work-around was to
re-issue all outstanding floating-point advance-load instructions
after each call returned. The effective code density went down further,
and there were lots of extra read instructions issued.
I guess it is more of an open question of what would have happened,
say, if Intel had gone for an ISA design more like ARM64 or RISC-V
or something.
ARM64 seems to me to be the product of a lot more experience with speculatively-executing processors than was available in 1998. RISC-V has
not demonstrated really high performance yet, and it's been around long enough that I'm starting to doubt it ever will.
Well, or something like PowerPC, but then again, IBM still had
difficulty keeping PPC competitive, so dunno. Then again, I think
IBM's PPC issues were more related to trying to keep up in the chip
fab race that was still going strong at the time, rather than an
ISA design issue.
I think that was fabs, rather than architecture. While I was providing libraries for PowerPC (strictly, POWER4, POWER5 and POWER6, one after another) it always had rather decent performance for its clock speed and process.
John
On 2/19/2026 5:10 PM, John Dallman wrote:
------------------------------------
This can be used to add resistance against stack-stomping via buffer overflows, but is potentially risky with RISC-V:
AUIPC X1, AddrHi
JALR X0, AddrLo(X1)
Can nuke the process, when officially it is allowed (vs forcing the use
of a different register to encode a long branch).
Like, how about one not try to bake in assumptions about 1-cycle ALU and 2-cycle Load being practical?...
Vs, say, 2-cycle ALU ops and 3-cycle Loads; with an ideal of putting 5 instructions between an instruction that generates a result and the instruction that consumes the result as this is more likely to work with in-order superscalar.
But, then one runs into the issue that if a basic operation then
requires a multi-op sequence, the implied latency goes up considerably
(say, could call this "soft latency", or SL).
So, for example, it means that, say:
2-instruction sign extension:
RV working assumption: 2 cycles
Hard latency (2c ALU): 4 cycles
Soft latency: 12 cycles.
For a 3-op sequence, the effective soft-latency goes up to 18, ...
And, in cases where the soft-latency significantly exceeds the total
length of the loop body, it is no longer viable to schedule the loop efficiently.
So, in this case, an indexed-load instruction has an effective 9c SL, whereas SLLI+ADD+LD has a 21 cycle SL.
where, in this case, the goal of something like the WEXifier is to
minimize this soft-latency cost (in cases where a dependency is seen,
any remaining soft-latency is counted as penalty).
But, then again, maybe the concept of this sort of "soft latency" seems
a bit alien.
Granted, not sure how this maps over to OoO, but had noted that even
with modern CPUs, there still seems to be benefit from assuming a sort
of implicit high latency for instructions over assuming a lower latency.
*1: Where people argue that if each vendor can do a CPU with their own custom ISA variants and without needing to license or get approval from
a central authority, that invariably everything would decay into an incoherent mess where there is no binary compatibility between
processors from different vendors (usual implication being that people
are then better off staying within the ARM ecosystem to avoid RV's lawlessness).
BGB <[email protected]> posted:
On 2/19/2026 5:10 PM, John Dallman wrote:
------------------------------------
This can be used to add resistance against stack-stomping via buffer
overflows, but is potentially risky with RISC-V:
AUIPC X1, AddrHi
JALR X0, AddrLo(X1)
Can nuke the process, when officially it is allowed (vs forcing the use
of a different register to encode a long branch).
That should be:
AUIPC x1,hi(offset)
JALR x0,lo(offset)
using:
SETHI x1,AddrHi
JALR x0,AddrLo
would work.
---------------------
Like, how about one not try to bake in assumptions about 1-cycle ALU and
2-cycle Load being practical?...
for the above to work::
ALU is < ½ cycle leaving ¼ cycle output drive and ¼ cycle input mux
SRAM is ½ cycle, AGEN to SRAM decode is ¼ cycle, SRAM output to shifter
is < ¼ cycle, and set-selection is ½ cycle; leaving ¼ cycle for output drive.
Vs, say, 2-cycle ALU ops and 3-cycle Loads; with an ideal of putting 5
instructions between an instruction that generates a result and the
instruction that consumes the result as this is more likely to work with
in-order superscalar.
1-cycle ALU with 3 cycle LD is not very hard at 16-gates per cycle.
2-cycle LD is absolutely impossible with 1-cycle addr-in to data-out
SRAM. So, we generally consider any design with 2-cycle LD to be
frequency limited.
But, then one runs into the issue that if a basic operation then
requires a multi-op sequence, the implied latency goes up considerably
(say, could call this "soft latency", or SL).
So, for example, it means that, say:
2-instruction sign extension:
RV working assumption: 2 cycles
Hard latency (2c ALU): 4 cycles
Soft latency: 12 cycles.
For a 3-op sequence, the effective soft-latency goes up to 18, ...
One of the reasons a 16-gate design works better in practice than
a 12-gate design. And why a 1-cycle ALU, 3-cycle LD runs at higher
frequency.
And, in cases where the soft-latency significantly exceeds the total
length of the loop body, it is no longer viable to schedule the loop
efficiently.
In software, there remains no significant problem running the loop
in HW.
So, in this case, an indexed-load instruction has an effective 9c SL,
whereas SLLI+ADD+LD has a 21 cycle SL.
3-cycle indexed LD with cache hit in many µarchitectures--with scaled indexing. This is one of the driving influences of "raising" the
semantic content of LD/ST instructions to [Rbase+Rindex<<sc+Disp]
where, in this case, the goal of something like the WEXifier is to
minimize this soft-latency cost (in cases where a dependency is seen,
any remaining soft-latency is counted as penalty).
But, then again, maybe the concept of this sort of "soft latency" seems
a bit alien.
Those ISAs without scaled indexing have longer effective latency through cache than those with; those without full-range Disp have similar problems; those without both are effectively adding 3-4 cycles to LD latency.
Which is why the size of the execution windows grew from 60-ish to 300-ish
to double performance--the ISA is adding latency and the size of execution window is the easiest way to absorb such latency.
{{60-ish ~= Athlon; 300-ish ~= M4}}
Granted, not sure how this maps over to OoO, but had noted that even
with modern CPUs, there still seems to be benefit from assuming a sort
of implicit high latency for instructions over assuming a lower latency.
Execution window size is how it maps.
*1: Where people argue that if each vendor can do a CPU with their own
custom ISA variants and without needing to license or get approval from
a central authority, that invariably everything would decay into an
incoherent mess where there is no binary compatibility between
processors from different vendors (usual implication being that people
are then better off staying within the ARM ecosystem to avoid RV's
lawlessness).
RISC-V seems to be "eating" a year (or a bit more) to bring this mess into
a coherent framework.
In a way, it showed that they screwed up the design pretty hard
that x86-64 ended up being the faster and more efficient option...
They did. They really did.
I guess one question is if they had any other particular drawbacks
other than, say:
Their code density was one of the worst around;
128 registers is a little excessive;
128 predicate register bits is a bit WTF;
They had two much bigger problems, though.
They'd correctly understood that the low speed of affordable dynamic RAM
as compared to CPUs running at hundreds of MHz was the biggest barrier to making code run fast.
Their solution was to have the compiler schedule loads
well in advance. They assumed, without evidence, that a compiler with
plenty of time to think could schedule loads better than hardware doing
it dynamically. It's an appealing idea, but it's wrong.
It might be possible to do that effectively in a single-core,
single-thread, single-task system that isn't taking many (if any)
interrupts. In a multi-core system, running a complex operating system, several multi-threaded applications, and taking frequent interrupts and context switches, it is _not possible_. There is no knowledge of any of
the interrupts, context switches or other applications at compile time,
so the compiler has no idea what is in cache and what isn't.
Speculative execution addresses that problem quite effectively.
We don't
have a better way, almost thirty years after Itanium design decisions
were taken. They didn't want to do speculative execution
and they chose
an instruction format and register set that made adding it later hard.
The other problem was that they had three (or six, or twelve) in-order pipelines running in parallel. That meant the compilers had to provide
enough ILP to keep those pipelines fed, or they'd just eat cache capacity
and memory bandwidth executing no-ops ...
They didn't have a general way to extract enough ILP.
I guess it is more of an open question of what would have happened,
say, if Intel had gone for an ISA design more like ARM64 or RISC-V
or something.
ARM64 seems to me to be the product of a lot more experience with speculatively-executing processors than was available in 1998.
RISC-V has
not demonstrated really high performance yet, and it's been around long enough that I'm starting to doubt it ever will.
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.
I can't remember seeing such arguments coming from compiler people, tho.
I suspect a big part of the problem was tension between Intel and HP
where the only political solution was allowing the architects from both
sides to "dump in" their favorite ideas. A recipe for disaster.
The odd thing is that these were hardware companies betting on "someone
else" solving their problem, yet if compiler people truly had managed to >solve those problems, then other hardware companies could have taken >advantage just as well.
To me the main question is whether they were truly confused and just got lucky (lucky because they still managed to sell their idea enough that
most RISC companies folded),
On 2/20/2026 5:49 PM, MitchAlsup wrote:
----------------------------
There is a non-zero risk though when one disallows uses that are theoretically allowed in the ISA, even if GCC doesn't use them.
Well, and in terms of typical ASM notation, there is this mess:
(Rb) / @Rb / @(Rb) //load/store register
(Rb, Disp) / Disp(Rb) //load/store disp
@(Rb, Disp) / @(Disp, Rb) //load/store disp (but with @)
Then:
(Rb, Ri) //indexed (element sized index)
Ri(Rb) //indexed (byte-scaled index)
(Rb, Ri, Sc) //indexed with scale
Disp(Rb, Ri) //indexed with displacement
Disp(Rb, Ri, Sc) //indexed with displacement and scale
Then:
@Rb+ / (Rb)+ //post-increment
@-Rb / -(Rb) //pre-decrement
@Rb- / (Rb)- //post-decrement
@+Rb / +(Rb) //pre-increment
And, in some variants, all the registers prefixed with '%'.
[email protected] (John Dallman) writes:----------------
[some unattributed source writes:]
Actually, other architectures also added prefetching instructions for
dealing with that problem. All I have read about that was that there
were many disappointments when using these instructions. I don't know
if there were any successes, and how frequent they were compared to
the disappointments. So I don't see that IA-64 was any different from
other architectures in that respect.
Otherwise what kind of common code do we have that is
memory-dominated? Tree searching and binary search in arrays come to
mind, but are they really common, apart from programming classes?
----------------------------
I have certainly read about interesting results for binary search (in
an array) where the branching version outperforms the branchless
version because the branching version speculatively executes several
followup loads, and even in unpredictable cases the speculation should
be right in 50% of the cases, resulting in an effective doubling of memory-level parallelism (but at a much higher cost in memory
subsystem load). But other than that, I don't see that speculative
execution helps with memory latency.
The instruction format makes no difference. Having so many registers
may have made it harder than otherwise, but SPARC also used many
registers, and we have seen OoO implementations (and discussed them a
few months ago). The issue is that speculative execution and OoO
make all the EPIC features of IA-64 unnecessary, so if they cannot do
a fast in-order implementation of IA-64 (and they could not), they
should just give up and switch to an architecture without these
features, such as AMD64. And Intel did, after a few years of denial.
The remainder of IA-64 was just to keep it alive for the customers who
had bought into it.
For a few cases, they have. But the problem is that these cases are
also vectorizable.
Their major problem was that they did not get enough ILP from
general-purpose code.
And even where they had enough ILP, OoO CPUs with SIMD extensions ate
their lunch.
- anton
Stefan Monnier <[email protected]> writes:
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.
I can't remember seeing such arguments coming from compiler people, tho.
Actually, the IA-64 people could point to the work on VLIW (in
particular, Multiflow (trace scheduling) and Cydrome (software
pipelining)), which in turn is based on the work on compilers for
microcode.
That did not solve memory latency, but that's a problem even for OoO
cores.
I suspect a big part of the problem was tension between Intel and HP
where the only political solution was allowing the architects from both
sides to "dump in" their favorite ideas. A recipe for disaster.
The HP side had people like Bob Rau (Cydrome) and Josh Fisher
(Multiflow), and given their premise, the architecture is ok; somewhat
on the complex side, but they wanted to cover all the good ideas from
earlier designs; after all, it was to be the one architecture to rule
them all (especially performancewise). You cannot leave out a feature
that a competitor could then add to outperform IA-64.
To me the main question is whether they were truly confused and just got lucky (lucky because they still managed to sell their idea enough that
most RISC companies folded),
I think most RISC companies had troubles scaling. They were used to
small design teams spinning out simple RISCs in a short time, and did
not have the organization to deal with the much larger projects that
OoO superscalars required.
And while everybody inventing their own architecture may have looked like a good idea when developing an
architecture and its implementations was cheap,
it looked like a bad
deal when development costs started to ramp up in the mid-90s. That's
why HP went to Intel, and other companies (in particular, SGI) took
this as an exit strategy from their own-RISC business.
DEC had increasing delays in their chips, and eventually could not
make enough money with them and had to sell themselves to Compaq (who
also could not sustain the effort and sold themselves to HP (who
canceled Alpha development)). I doubt that IA-64 played a big role in
that game.
Back to IA-64: At the time, when OoO was just starting, the premise of
IA-64 looked plausible. Why wouldn't they see a fast clock rate and
higher ILP from explicit parallelism than conventional architectures
would see from OoO (apparently complex, and initially without anything
like IA-64's ALAT)?
- anton
BGB <[email protected]> posted:
On 2/20/2026 5:49 PM, MitchAlsup wrote:
----------------------------
There is a non-zero risk though when one disallows uses that are
theoretically allowed in the ISA, even if GCC doesn't use them.
This is why one must decode all 32-bits of each instruction--so that
there is no hole in the decoder that would allow the core to do some-
thing not directly specified in ISA. {{And one of the things that make
an industrial quality ISA so hard to fully specify.}}
---------------------
Well, and in terms of typical ASM notation, there is this mess:
(Rb) / @Rb / @(Rb) //load/store register
(Rb, Disp) / Disp(Rb) //load/store disp
@(Rb, Disp) / @(Disp, Rb) //load/store disp (but with @)
Then:
(Rb, Ri) //indexed (element sized index)
Ri(Rb) //indexed (byte-scaled index)
(Rb, Ri, Sc) //indexed with scale
Disp(Rb, Ri) //indexed with displacement
Disp(Rb, Ri, Sc) //indexed with displacement and scale
Then:
@Rb+ / (Rb)+ //post-increment
@-Rb / -(Rb) //pre-decrement
@Rb- / (Rb)- //post-decrement
@+Rb / +(Rb) //pre-increment
And, in some variants, all the registers prefixed with '%'.
Leading to SERIAL DECODE--which is BAD.
-----------------------
On 2/21/2026 2:15 PM, MitchAlsup wrote:
Whether your ISA can be attacked with Spectré and/or Meltdown;
BGB <[email protected]> posted:
On 2/20/2026 5:49 PM, MitchAlsup wrote:
----------------------------
There is a non-zero risk though when one disallows uses that are
theoretically allowed in the ISA, even if GCC doesn't use them.
This is why one must decode all 32-bits of each instruction--so that
there is no hole in the decoder that would allow the core to do some-
thing not directly specified in ISA. {{And one of the things that make
an industrial quality ISA so hard to fully specify.}}
---------------------
Sometimes there is a tension:
What is theoretically allowed in the ISA;
What is the theoretically expected behavior in some abstract model;
What stuff is actually used by compilers;
What features or behaviors does one want;
...
Implementing RISC-V strictly as per an abstract model would both limit efficiency and hinder some use-cases.
Then it comes down to "what do compilers do" and "what behaviors could an ASM programmer stumble onto unintentionally".
Stefan Monnier <[email protected]> writes:
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.
I can't remember seeing such arguments coming from compiler people, tho.
Actually, the IA-64 people could point to the work on VLIW (in
particular, Multiflow (trace scheduling) and Cydrome (software
pipelining)), which in turn is based on the work on compilers for
microcode.
But as computer hardware got faster and denser, it became possible to
do the scheduling on the fly in hardware, so you could get comparable performance with conventional instruction sets in a microprocessor.
[email protected] (Anton Ertl) posted:[1) stride-based. 2) pointer-chasing]
Otherwise what kind of common code do we have that is
memory-dominated? Tree searching and binary search in arrays come to
mind, but are they really common, apart from programming classes?
Array and Matrix scientific codes with datasets bigger than cache.
I have certainly read about interesting results for binary search (in
an array) where the branching version outperforms the branchless
version because the branching version speculatively executes several
followup loads, and even in unpredictable cases the speculation should
be right in 50% of the cases, resulting in an effective doubling of
memory-level parallelism (but at a much higher cost in memory
subsystem load). But other than that, I don't see that speculative
execution helps with memory latency.
At the cost of opening the core up to Spectré-like attacks.
[email protected] (Anton Ertl) posted:
The HP side had people like Bob Rau (Cydrome) and Josh Fisher
(Multiflow), and given their premise, the architecture is ok; somewhat
on the complex side, but they wanted to cover all the good ideas from
earlier designs; after all, it was to be the one architecture to rule
them all (especially performancewise). You cannot leave out a feature
that a competitor could then add to outperform IA-64.
In this time period, performance was doubling every 14 months, so if a feature added x performance it MUST avoid adding more than x/14 months
to the schedule. If IA-64 was 2 years earlier, it would have been competitive--sadly it was not.
MitchAlsup <[email protected]d> writes:
[email protected] (Anton Ertl) posted:[1) stride-based. 2) pointer-chasing]
Otherwise what kind of common code do we have that is
memory-dominated? Tree searching and binary search in arrays come
to mind, but are they really common, apart from programming
classes?
Array and Matrix scientific codes with datasets bigger than cache.
The dense cases are covered by stride-based hardware predictors, so
they are not "otherwise". I am not familiar enough with sparse
scientific codes to comment on whether they are 1), 2), or
"otherwise".
I have certainly read about interesting results for binary search
(in an array) where the branching version outperforms the
branchless version because the branching version speculatively
executes several followup loads, and even in unpredictable cases
the speculation should be right in 50% of the cases, resulting in
an effective doubling of memory-level parallelism (but at a much
higher cost in memory subsystem load). But other than that, I
don't see that speculative execution helps with memory latency.
At the cost of opening the core up to Spectré-like attacks.
There may be a way to avoid the side channel while still supporting
this scenario. But I think that there are better ways to speed up
such a binary search:
Here software prefetching can really help: prefetch one level ahead (2 prefetches), or two levels ahead (4 prefetches), three levels (8
prefetches), or four levels (16 prefetches), etc., whatever gives the
best performance (which may be hardware-dependent). The result is a
speedup of the binary search by (in the limit) levels+1.
By contrast, the branch prediction "prefetching" provides a factor 1.5
at twice the number of loads when one branch is predicted, 1.75 at 3x
the number of loads when two branches are predicted, etc. up to a
speedup factor of 2 with an infinite number of loads and predicted branches;
that's for completely unpredictable lookups, with some predictability
the branch prediction approach performs better, and with good
predictability it should outdo the software-prefetching approach for
the same number of additional memory accesses.
- anton
Stefan Monnier <[email protected]> writes:
MitchAlsup <[email protected]d> wrote:Actually, the IA-64 people could point to the work on VLIW (in
At the time of conception, there were amny arguments that {sooner orI can't remember seeing such arguments comping from compiler people, tho.
later} compilers COULD figure stuff like this out.
particular, Multiflow (trace scheduling) and Cydrome (software
pipelining)), which in turn is based on the work on compilers for
microcode.
The major problem was that the premise was wrong. They assumed that
in-order would give them a clock rate edge, but that was not the case,
right from the start (The 1GHz Itanium II (released July 2002)
competed with 2.53GHz Pentium 4 (released May 2002) and 1800MHz Athlon
XP (released June 2002)). They also assumed that explicit parallelism
would provide at least as much ILP as hardware scheduling of OoO CPUs,
but that was not the case for general-purpose code, and in any case,
they needed a lot of additional ILP to make up for their clock speed disadvantage.
The odd thing is that these were hardware companies betting on "someone
else" solving their problem, yet if compiler people truly had managed to
solve those problems, then other hardware companies could have taken
advantage just as well.
I am sure they had patents on stuff like the advanced load and the
ALAT, so no, other hardware companies would have had a hard time.
According to Anton Ertl <[email protected]>:
Stefan Monnier <[email protected]> writes:
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.
I can't remember seeing such arguments coming from compiler people, tho.
Actually, the IA-64 people could point to the work on VLIW (in
particular, Multiflow (trace scheduling) and Cydrome (software pipelining)), which in turn is based on the work on compilers for microcode.
I knew the Multiflow people pretty well when I was at Yale. Trace
scheduling was inspired by the FPS AP-120B, which had wide
instructions issuing multiple operations and was extremely hard to
program efficiently.
Multiflow's compiler worked pretty well and did a good job of static scheduling memory operations when the access patterns weren't too data dependent. It was good enough that Intel and HP both licensed it and
used it in their VLIW projects. It was a good match for the hardware
in the 1980s.
But as computer hardware got faster and denser, it became possible to
do the scheduling on the fly in hardware, so you could get comparable performance with conventional instruction sets in a microprocessor.
particular, Multiflow (trace scheduling) and Cydrome (software
pipelining)), which in turn is based on the work on compilers for
microcode.
Of course, compiler people have worked on such problems and solved some cases. But what I wrote above is that "I can't remember seeing
... compiler people" claiming that "{sooner or later} compilers COULD
figure stuff like this out".
Does imply that my younger self was notable, and not seen as just
some otherwise worthless nerd.
For 128 predicate registers, this part doesn't make as much sense:
*1: Where people argue that if each vendor can do a CPU with their
own custom ISA variants and without needing to license or get
approval from a central authority, that invariably everything would
decay into an incoherent mess where there is no binary
compatibility between processors from different vendors (usual
implication being that people are then better off staying within
the ARM ecosystem to avoid RV's lawlessness).
Apropos another thread, I can believe that IA-64 was obsolete before it was shipped
for that reason; static scheduling will never keep up with dynamic except in applications where the access patterns are predictable.
Are there enough applications like that to make VLIWs worth it? Some kinds of DSP?
John Levine <[email protected]> writes:
Apropos another thread I can believe that IA-64 was obsolete before it was shipped
for that reason, static scheduling will never keep up with dynamic except in applications where the access patterns are predictable.
Concerning the scheduling, hardware scheduling looked pretty dumb at
the time (always schedule the oldest ready instruction(s)), ...
Another aspect where hardware is far superior is branch prediction.
According to Anton Ertl <[email protected]>:
John Levine <[email protected]> writes:
Apropos another thread I can believe that IA-64 was obsolete before it was shipped
for that reason, static scheduling will never keep up with dynamic except in applications where the access patterns are predictable.
Concerning the scheduling, hardware scheduling looked pretty dumb at
the time (always schedule the oldest ready instruction(s)), ...
I was thinking of memory scheduling. You have multiple banks of memory
each of which can only do one fetch or store at a time, and the goal
was to keep all of the banks as busy as possible. If you're accessing
an array in predictable order, trace scheduling works well, but if
you're fetching a[b[i]] where b varies at runtime, it doesn't.
Another aspect where hardware is far superior is branch prediction.
I gather speculative execution of both branch paths worked OK if the
branch tree wasn't too bushy.
There were certainly ugly details, e.g.,
if there's a trap on a path that turns out not to be taken.
On Sun, 22 Feb 2026 13:17:30 GMT
[email protected] (Anton Ertl) wrote:
MitchAlsup <[email protected]d> writes:
[1) stride-based. 2) pointer-chasing]
[email protected] (Anton Ertl) posted: =20
Otherwise what kind of common code do we have that is
memory-dominated? Tree searching and binary search in arrays come
to mind, but are they really common, apart from programming
classes?
Array and Matrix scientific codes with datasets bigger than cache.
The dense cases are covered by stride-based hardware predictors, so
they are not "otherwise". I am not familiar enough with sparse
scientific codes to comment on whether they are 1), 2), or
"otherwise".
BLAS Level 3 is not particularly external/LLC bandwidth intensive even without hardware predictors.
Overwhelming majority of data served from
L2 cache.
That's with classic SIMD. It's possible that with AMX units it's no
longer true.
The recent comparisons of branchy vs branchless binary search that we
carried on the RWT forum seem to suggest that on modern CPUs the branchless
variant is faster even when the table does not fit in LLC.
Branchy variant manages to pull ahead only when TLB misses can't be
served from L2$.
At least that's how I interpreted it.
Here is a result on a very modern CPU: https://www.realworldtech.com/forum/?threadid=223776&curpostid=223974
And here is older gear: https://www.realworldtech.com/forum/?threadid=223776&curpostid=223895
Michael S <[email protected]> writes:
On Sun, 22 Feb 2026 13:17:30 GMT
[email protected] (Anton Ertl) wrote:
MitchAlsup <[email protected]d> writes:
[1) stride-based. 2) pointer-chasing]
[email protected] (Anton Ertl) posted: =20
=20Otherwise what kind of common code do we have that is
memory-dominated? Tree searching and binary search in arrays
come to mind, but are they really common, apart from programming
classes?
Array and Matrix scientific codes with datasets bigger than
cache.
The dense cases are covered by stride-based hardware predictors, so
they are not "otherwise". I am not familiar enough with sparse
scientific codes to comment on whether they are 1), 2), or
"otherwise".
BLAS Level 3 is not particularly external/LLC bandwidth intensive
even without hardware predictors.
There are HPC applications that are bandwidth limited; that's why they
have the roofline performance model (e.g., <https://docs.nersc.gov/tools/performance/roofline/>).
Of course for
the memory-bound applications the uarch style is not particularly
important, as long as it allows exploiting the memory-level
parallelism in some way; prefetching (by hardware or software) is a
way that can be performed by any kind of uarch to exploit MLP.
IA-64 implementations have performed relatively well for SPECfp. I
don't know how memory-bound these applications were, though.
For those applications that benefit from caches (e.g., matrix multiplication), memory-level parallelism is less important.
Overwhelming majority of data served from
L2 cache.
That's with classic SIMD. It's possible that with AMX units it's no
longer true.
I very much doubt that. There is little point in adding an
instruction that slows down execution by turning it from compute-bound
to memory-bound.
The recent comparisons of branchy vs branchless binary search that we carried on the RWT forum seem to suggest that on modern CPUs the branchless variant is faster even when the table does not fit in LLC.
Only two explanations come to my mind:
1) The M3 has a hardware prefetcher that recognizes the pattern of a
binary array search and prefetches accordingly. The cache misses from
page table accesses might confuse the prefetcher, leading to worse performance eventually.
2) (doubtful) The compiler recognizes the algorithm and inserts
software prefetch instructions.
Branchy variant manages to pull ahead only when TLB misses can't be
served from L2$.
At least that's how I interpreted it.
Here is a result on a very modern CPU: https://www.realworldtech.com/forum/?threadid=223776&curpostid=223974
And here is older gear: https://www.realworldtech.com/forum/?threadid=223776&curpostid=223895
I tried to run your code <https://www.realworldtech.com/forum/?threadid=223776&curpostid=223955>
on Zen4, but clang-14 converts uut2.c into branchy code. I could not
get gcc-12 to produce branchless code from slightly adapted source
code, either. My own attempt of using extended asm did not pass your
sanity checks, so eventually I used the assembly code produced by
clang-19 through godbolt.
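For readers who have not followed the RWT thread: a "branchless" binary search is usually written so that the probe direction is a data dependency (CMOV-friendly) rather than a control dependency. The following is a generic sketch of the two variants under discussion, not Michael S's actual uut2.c:

```c
#include <stddef.h>
#include <stdint.h>

/* Branchless binary search (lower bound): the comparison result selects
   the new base as data, so compilers typically emit CMOV instead of a
   conditional branch for the inner step. */
size_t bsearch_branchless(const int32_t *a, size_t n, int32_t key)
{
    const int32_t *base = a;
    while (n > 1) {
        size_t half = n / 2;
        /* Data-dependent update: nothing here for the branch predictor
           to guess (assuming the compiler emits CMOV). */
        base = (base[half] < key) ? base + half : base;
        n -= half;
    }
    return (size_t)(base - a) + (*base < key);
}

/* Branchy variant: each probe direction is a conditional branch, which
   is essentially unpredictable on random queries. */
size_t bsearch_branchy(const int32_t *a, size_t n, int32_t key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

Whether a given compiler actually emits CMOV for the ternary varies (as seen above with clang-14 vs. clang-19), which is why checking the generated assembly matters.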
Here I see that branchy is faster for the
100M array size on Zen4 (i.e., where on the M3 branchless is faster):
[~/binary-search:165562] perf stat branchless 100000000 10000000 10
4041.093082 msec. 0.404109 usec/point

 Performance counter stats for 'branchless 100000000 10000000 10':

       45436.88 msec task-clock                 1.000 CPUs utilized
            276      context-switches           6.074 /sec
              0      cpu-migrations             0.000 /sec
           1307      page-faults               28.765 /sec
   226091936865      cycles                     4.976 GHz
     1171349822      stalled-cycles-frontend    0.52% frontend cycles idle
    34666912723      instructions               0.15  insn per cycle
                                                0.03  stalled cycles per insn
     5829461905      branches                 128.298 M/sec
       45963810      branch-misses              0.79% of all branches

    45.439541729 seconds time elapsed
    45.397207000 seconds user
     0.040008000 seconds sys

[~/binary-search:165563] perf stat branchy 100000000 10000000 10
3051.269998 msec. 0.305127 usec/point

 Performance counter stats for 'branchy 100000000 10000000 10':

       34308.48 msec task-clock                 1.000 CPUs utilized
            229      context-switches           6.675 /sec
              0      cpu-migrations             0.000 /sec
           1307      page-faults               38.096 /sec
   172472673652      cycles                     5.027 GHz
    49176146462      stalled-cycles-frontend   28.51% frontend cycles idle
    33346311766      instructions               0.19  insn per cycle
                                                1.47  stalled cycles per insn
    10322173655      branches                 300.864 M/sec
     1583842955      branch-misses             15.34% of all branches

    34.311432848 seconds time elapsed
    34.264437000 seconds user
     0.044005000 seconds sys
If you have recommendations what to use for the other parameters, I
can run other sizes as well.
- anton
Otherwise what kind of common code do we have that is
memory-dominated? Tree searching and binary search in arrays come to
mind, but are they really common, apart from programming classes?
I gather speculative execution of both branch paths worked OK if the
branch tree wasn't too bushy. There were certainly ugly details,
e.g., if there's a trap on a path that turns out not to be taken.
[email protected] (Anton Ertl) posted:
IA-64 was to prevent AMD (and others) from clones--that is all. Intel/HP would have had a patent wall 10 feet tall.
It did not fail by being insufficiently protected, it failed because it
did not perform.
-----------------
For a few cases, they have. But the problem is that these cases are
also vectorizable.
Their major problem was that they did not get enough ILP from
general-purpose code.
And even where they had enough ILP, OoO CPUs with SIMD extensions ate
their lunch.
And that should have been the end of the story. ... Sorry Ivan.
On 2/16/2026 3:14 PM, Paul Clayton wrote:
On 11/5/25 2:00 AM, BGB wrote:
On 11/4/2025 3:44 PM, Terje Mathisen wrote:
[snip]
MitchAlsup wrote:
[email protected] (Anton Ertl) posted:
Branch prediction is fun.
When I looked around online before, a lot of stuff about
branch prediction was talking about fairly large and
convoluted schemes for the branch predictors.
You might be interested in looking at the 6th Championship
Branch Prediction (2025): https://ieeetcca.org/2025/02/18/6th-
championship-branch-prediction-cbp2025/
Quick look, didn't see much information about who entered or won...
TAgged GEometric length predictors (TAGE) seem to be the current
"hotness" for branch predictors. These record very long global
histories and fold them into shorter indexes with the number of
history bits used varying for different tables.
(Because the correlation is less strong, 3-bit counters are
generally used as well as a useful bit.)
When I messed with it, increasing the strength of the saturating
counters was not effective.
But, increasing the ability of them to predict more complex
patterns did help.
[snip]
But, then always at the end of it using 2-bit saturating
counters:
weakly taken, weakly not-taken, strongly taken, strongly
not taken.
But, in my fiddling, there was seemingly a simple but
moderately effective strategy:
Keep a local history of taken/not-taken;
XOR this with the low-order-bits of PC for the table index;
Use a 5/6-bit finite-state-machine or similar.
Can model repeating patterns up to ~ 4 bits.
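The scheme described above can be sketched as a minimal gshare-style predictor. This uses plain 2-bit saturating counters rather than BGB's 5/6-bit FSMs, and the table size and history length are illustrative assumptions, not his actual parameters:

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal gshare-style predictor sketch: a global taken/not-taken
   history is XORed with low PC bits to index a table of 2-bit
   saturating counters.  Sizes are illustrative only. */
#define HIST_BITS  6
#define TABLE_SIZE (1u << HIST_BITS)

static uint8_t  counters[TABLE_SIZE]; /* 0..1 predict not-taken, 2..3 taken */
static uint32_t ghist;                /* global history, one bit per branch */

static unsigned predictor_index(uint32_t pc)
{
    return (pc ^ ghist) & (TABLE_SIZE - 1);
}

bool predict(uint32_t pc)
{
    return counters[predictor_index(pc)] >= 2;
}

void update(uint32_t pc, bool taken)
{
    uint8_t *c = &counters[predictor_index(pc)];
    if (taken  && *c < 3) (*c)++;   /* saturate at strongly taken */
    if (!taken && *c > 0) (*c)--;   /* saturate at strongly not-taken */
    ghist = ((ghist << 1) | taken) & (TABLE_SIZE - 1);
}
```

Because the history participates in the index, a strictly alternating branch trains two distinct counter entries (one per history value), so even the 2-bit counters predict the ...101010... pattern once warmed up.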
Indexing a predictor by _local_ (i.e., per instruction address)
history adds a level of indirection; once one has the branch
(fetch) address one needs to index the local history and then
use that to index the predictor.
OK, seems I wrote it wrong:
I was using a global branch history.
But, either way, the history is XOR'ed with the relevant bits
from PC to generate the index.
I do not know if 5/6-bit state machines have been academically
examined for predictor entries. I suspect the extra storage is a
significant discouragement given one often wants to cover more
different correlations and branches.
If the 5/6-bit FSM can fit more patterns than 3x 2-bit
saturating counters, it can be a win.
As noted, the 5/6 bit FSM can predict arbitrary 4 bit patterns.
With the PC XOR Hist lookup, there were still quite a few
patterns, and not just the contiguous 0s or 1s that a saturating
counter would predict, nor just all 0s, all 1s, or ...101010...
that the 3-bit FSM could deal with.
But, a bigger history could mean fewer patterns and more
contiguous bits in the state.
TAGE has the advantage that the tags reduce branch aliases and
the variable history length (with history folding/compression)
allows using less storage (when a prediction only benefits from
a shorter history) and reduces training time.
In my case, I was using a 6-bit lookup mostly to fit into
LUT6-based LUTRAM.
Going bigger than 6 bits here is a pain point for FPGAs, more so
as BRAMs don't support narrow lookups, so the next size up
would likely be 2048x, but then the BRAM is used inefficiently
(there isn't likely a particularly good way to make use of
512x 18-bits).
Local history patterns may also be less common than statistical
correlation after one has extracted branches predicted well by
global history. (For small-bodied loops, a moderately long
global history provides substantial local history.)
It seems what I wrote originally was inaccurate; I don't store a
history per-target, merely the recent taken/not-taken branches.
But, I no longer remember what I was thinking at the time, or
why I had written local history rather than global history
(unless I meant "local" in terms of recency or something, I
don't know).
A lot of this seems a lot more complex though than what would be
all that practical on a Spartan or Artix class FPGA.
I was mostly using 5/6 bit state machines as they gave better
results than 2-bit saturating counters, and fit nicely within
the constraints of a "history XOR PC" lookup pattern.
On 2/19/26 6:04 PM, BGB wrote:
In a way, it showed that they screwed up the design pretty hard
that x86-64 ended up being the faster and more efficient option...
They did. They really did.
I guess one question is if they had any other particular drawbacks
other than, say:
Their code density was one of the worst around;
128 registers is a little excessive;
128 predicate register bits is a bit WTF;
Those huge register files had a lot to do with the low code density. They
had two much bigger problems, though.
They'd correctly understood that the low speed of affordable dynamic RAM
as compared to CPUs running at hundreds of MHz was the biggest barrier to making code run fast. Their solution was to have the compiler schedule loads well in advance. They assumed, without evidence, that a compiler with
plenty of time to think could schedule loads better than hardware doing
it dynamically. It's an appealing idea, but it's wrong.
It might be possible to do that effectively in a single-core,
single-thread, single-task system that isn't taking many (if any)
interrupts. In a multi-core system, running a complex operating system, several multi-threaded applications, and taking frequent interrupts and context switches, it is _not possible_. There is no knowledge of any of
the interrupts, context switches or other applications at compile time,
so the compiler has no idea what is in cache and what isn't. I don't understand why HP and Intel didn't realise this. It took me years, but I
am no CPU designer.
Speculative execution addresses that problem quite effectively. We don't
have a better way, almost thirty years after Itanium design decisions
were taken. They didn't want to do speculative execution, and they chose
an instruction format and register set that made adding it later hard. If
it was ever tried, nothing was released that had it AFAIK.
The other problem was that they had three (or six, or twelve) in-order pipelines running in parallel. That meant the compilers had to provide
enough ILP to keep those pipelines fed, or they'd just eat cache capacity
and memory bandwidth executing no-ops ... in a very bulky instruction set. They didn't have a general way to extract enough ILP. Nobody does, even
now. They just assumed that with an army of developers they'd find enough heuristics to make it work well enough. They didn't.
There was also an architectural misfeature with floating-point advance
loads that could make them disappear entirely if there was a call
instruction between an advance-load instruction and the corresponding check-load instruction.
That cost me a couple of weeks working out and
reporting the bug, which was unfixable. The only work-around was to
re-issue all outstanding floating-point advance-load instructions
after each call returned. The effective code density went down further,
and there were lots of extra read instructions issued.
I guess it is more of an open question of what would have happened,
say, if Intel had gone for an ISA design more like ARM64 or RISC-V
or something.
ARM64 seems to me to be the product of a lot more experience with speculatively-executing processors than was available in 1998.
RISC-V has
not demonstrated really high performance yet, and it's been around long enough that I'm starting to doubt it ever will.
In article <10ngao4$d5o$[email protected]>, [email protected] (John Levine)
wrote:
I gather speculative execution of both branch paths worked OK if the
branch tree wasn't too bushy. There were certainly ugly details,
e.g., if there's a trap on a path that turns out not to be taken.
Found a good CPU bug like that on an old AMD chip, the K6-II.
It happened with a floating point divide by zero in the x87 registers, guarded by a test for division by zero, with floating-point traps enabled. The divide got speculatively executed, the trap was stored, the test
revealed the divide would be by zero, the CPU tried to clean up, hit its
bug, and just stopped. Power switch time.
This only happened with the reverse divide instruction, which took the operands off the x87 stack in the opposite order from the usual FDIV. It
was rarely used, so the bug didn't become widely known. But Microsoft's compiler used it occasionally.
On Mon, 23 Feb 2026 08:06:20 GMT
[email protected] (Anton Ertl) wrote:
The recent comparisons of branchy vs branchless binary search that we
carried on RWT forum seems to suggest that on modern CPUs branchless
variant is faster even when the table does not fit in LLC.
Only two explanations come to my mind:
1) The M3 has a hardware prefetcher that recognizes the pattern of a
binary array search and prefetches accordingly. The cache misses from
page table accesses might confuse the prefetcher, leading to worse
performance eventually.
Coffee Lake certainly has no such prefetcher and nevertheless exhibits
similar behavior.
I tried to run your code
<https://www.realworldtech.com/forum/?threadid=223776&curpostid=223955>
on Zen4, but clang-14 converts uut2.c into branchy code. I could not
get gcc-12 to produce branchless code from slightly adapted source
code, either. My own attempt of using extended asm did not pass your
sanity checks, so eventually I used the assembly code produced by
clang-19 through godbolt.
I suppose that you have good reason for avoiding installation of clang17
or later on one of your computers.
If you have recommendations what to use for the other parameters, I
can run other sizes as well.
- anton
Run for every size from 100K to 2G in increments of x sqrt(2).
BTW, I prefer odd number of iterations.
The point at which the majority of look-ups miss L3$ at least once
will hopefully be seen as a change of slope on the log(N) vs duration
graph for the branchless variant.
I would not bother with performance counters. At least for me they
bring more confusion than insight.
According to my understanding, on Zen4 the ratio of main DRAM
latency to L3 latency is much higher than on either Coffee Lake or M3,
both of which have unified LLC instead of split L3$.
So, if on Zen2 branchy starts to win at ~2x L3 size, I will not be
shocked. But I will be somewhat surprised.
I actually have access to a Zen3-based EPYC, where the above mentioned
ratio is supposedly much bigger than on any competent client CPU (Intel's
Lunar/Arrow Lake do not belong to this category), but this server is
currently powered down and it's a bit of a hassle to turn it on.
On Mon, 23 Feb 2026 08:06:20 GMT
[email protected] (Anton Ertl) wrote:
Michael S <[email protected]> writes:
On Sun, 22 Feb 2026 13:17:30 GMT
[email protected] (Anton Ertl) wrote:
MitchAlsup <[email protected]d> writes:
Array and Matrix scientific codes with datasets bigger than
cache.
The dense cases are covered by stride-based hardware predictors, so
they are not "otherwise". I am not familiar enough with sparse
scientific codes to comment on whether they are 1), 2), or
"otherwise".
BLAS Level 3 is not particularly external/LLC bandwidth intensive
even without hardware predictors.
There are HPC applications that are bandwidth limited; that's why they
have the roofline performance model (e.g.,
<https://docs.nersc.gov/tools/performance/roofline/>).
Sure. But that's not what I would call "dense".
In my vocabulary "dense" starts at matmul(200x200,x200) or at LU
decomposition of a matrix of similar dimensions.
Overwhelming majority of data served from
L2 cache.
That's with classic SIMD. It's possible that with AMX units it's no
longer true.
I very much doubt that. There is little point in adding an
instruction that slows down execution by turning it from compute-bound
to memory-bound.
It does not slow down the execution. To the contrary, it speeds it up
so much that speed of handling of L2 misses begins to matter.
Pay attention that it's just my speculation that can be wrong.
IIRC, right now binary64-capable AMX is available only on Apple Silicon
(via SME) and maybe on IBM Z. I didn't play with either.
On 2/21/2026 8:18 AM, Anton Ertl wrote:
big snip
Otherwise what kind of common code do we have that is
memory-dominated? Tree searching and binary search in arrays come to
mind, but are they really common, apart from programming classes?
It is probably useful to distinguish between latency bound and bandwidth
bound.
Many occur in commercial (i.e. non scientific) programs, such as
database systems. For example, imagine a company employee file (table),
with a (say 300 byte) record for each of its many thousands of employees
each containing typical employee stuff). Now suppose someone wants to
know "What is the total salary of all the employees in the "Sales"
department. With no index on "department", but it is at a fixed
displacement within each record, the code looks at each record, does a
trivial test on it, perhaps adds to a register, then goes to the next
record. This is almost certainly memory latency bound.
So adding two dense matrices tends to be memory bandwidth bound, but stride-based prefetchers help to avoid getting any extra latency
beyond that coming from the bandwidth limits (if any).
Likewise, John McCalpin's Stream benchmark uses dense vectors IIRC,
but is memory bandwidth limited.
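The Stream kernels can be sketched as the following "triad" loop: the access pattern is perfectly unit-stride (trivial for a stride prefetcher), yet there is so little arithmetic per byte moved that the loop is bandwidth bound on any core:

```c
#include <stddef.h>

/* STREAM-style "triad" kernel: three unit-stride streams, trivially
   prefetchable, but only one multiply-add per 24 bytes moved, hence
   bandwidth bound rather than latency bound. */
void triad(double *a, const double *b, const double *c,
           double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}
```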
Michael S <[email protected]> writes:
On Mon, 23 Feb 2026 08:06:20 GMT
[email protected] (Anton Ertl) wrote:
The recent comparisons of branchy vs branchless binary search
that we carried on RWT forum seems to suggest that on modern CPUs
branchless variant is faster even when the table does not fit in
LLC.
Only two explanations come to my mind:
1) The M3 has a hardware prefetcher that recognizes the pattern of
a binary array search and prefetches accordingly. The cache
misses from page table accesses might confuse the prefetcher,
leading to worse performance eventually.
Coffee Lake certainly has no such prefetcher and nevertheless
exhibits similar behavior.
I have now looked more closely, and the npoints parameter has a
significant influence. If it is small enough, branchless fits into
caches while branchy does not (see below), which might be one
explanation for the results. Given that I have not seen npoints
specified in any of the postings that mention branchless winning for
veclen sizes exceeding the L3 cache size, it could be this effect.
Or
maybe some other effect. Without proper parameters one cannot
reproduce the experiment to investigate it.
I tried to run your code
<https://www.realworldtech.com/forum/?threadid=223776&curpostid=223955>
on Zen4, but clang-14 converts uut2.c into branchy code. I could
not get gcc-12 to produce branchless code from slightly adapted
source code, either. My own attempt of using extended asm did not
pass your sanity checks, so eventually I used the assembly code
produced by clang-19 through godbolt.
I suppose that you have good reason for avoiding installation of
clang17 or later on one of your computers.
Sure. It costs time.
If you have recommendations what to use for the other parameters, I
can run other sizes as well.
- anton
Run for every size from 100K to 2G in increments of x sqrt(2).
There is no integer n where 2G=100k*sqrt(2)^n. So I used the numbers
shown below. You did not give any indication on npoints, so I
investigated myself, and found that branchless will miss the L3 cache
with npoints=100000, so I used that and used reps=200.
BTW, I prefer odd number of iterations.
You mean odd reps? Why?
Anyway, here are the usec/point numbers:
Zen4 8700G Tiger Lake 1135G7
veclen branchless branchy branchless branchy
100000 0.030945 0.063620 0.038220 0.080542
140000 0.031035 0.068244 0.034315 0.084896
200000 0.038302 0.073819 0.045602 0.089972
280000 0.037056 0.079651 0.042108 0.096271
400000 0.046685 0.081895 0.055457 0.104561
560000 0.043028 0.088356 0.055095 0.113646
800000 0.051180 0.092201 0.074570 0.123403
1120000 0.048806 0.096621 0.088121 0.142758
1600000 0.060206 0.101069 0.131099 0.172171
2240000 0.073547 0.115428 0.167353 0.205602
3200000 0.094561 0.139996 0.208903 0.234939
4500000 0.121049 0.162757 0.244457 0.268286
6400000 0.152417 0.178611 0.292024 0.295204
9000000 0.189134 0.192100 0.320426 0.327127
12800000 0.219408 0.208083 0.372084 0.353530
18000000 0.237684 0.222140 0.418645 0.389785
25000000 0.270798 0.236786 0.462689 0.415937
35000000 0.296994 0.254001 0.526235 0.451467
50000000 0.330582 0.268768 0.599331 0.478667
70000000 0.356788 0.288526 0.622659 0.522092
100000000 0.388326 0.305980 0.698470 0.562841
140000000 0.407774 0.321496 0.737884 0.609814
200000000 0.442434 0.336242 0.848403 0.654830
280000000 0.455125 0.356382 0.902886 0.729970
400000000 0.496894 0.372735 1.120986 0.777920
560000000 0.520664 0.393827 1.173606 0.855461
800000000 0.544343 0.412087 1.759271 0.901011
1100000000 0.584389 0.431854 1.862866 0.965724
1600000000 0.614764 0.455844 2.046371 1.027111
2000000000 0.622513 0.467445 2.149251 1.089775
So branchy surpasses branchless at veclen=12.8M on both machines, for npoints=100k.
Concerning the influence of npoints, I have worked with veclen=20M in
the following.
1) If <npoints> is small, for branchy the branch predictor will
learn the pattern on the first repetition, and predict correctly in
the following repetitions; on Zen4 and Tiger Lake I see the
following percentages of branch mispredictions (for
<npoints>*<rep>=20_000_000, <veclen>=20_000_000):
npoints Zen4 Tiger Lake
250 0.03% 0.51%
500 0.03% 10.40%
1000 0.03% 13.51%
2000 0.07% 13.84%
4000 13.45% 12.61%
8000 15.26% 12.31%
16000 15.56% 12.26%
32000 15.60% 12.23%
80000 15.59% 12.24%
Tiger Lake counts slightly more branches than Zen4 (for the same
binary): 1746M vs. 1703M, but there is also a real lower number of
mispredictions on Tiger Lake for high npoints; at npoints=80000:
214M on Tiger Lake vs. 266M on Zen4. My guess is that in Zen4 the
mispredicts of the deciding branch interfere with the prediction of
the loop branches, and that the anti-interference measures on Tiger
Lake result in the branch predictor being less effective at
npoints=500..2000.
2) If <npoints> is small all the actually accessed array elements
will fit into some cache. Also, even if npoints and veclen are
large enough that they do not all fit, with a smaller <npoints> a
larger part of the accesses will happen to a cache, and a larger
part to a lower-level cache. With the same parameters as above,
branchy sees on a Ryzen 8700G (Zen4 with 16MB L3 cache) the
following numbers of ls_any_fills_from_sys.all_dram_io
(LLC-load-misses and l3_lookup_state.l3_miss result in <not
supported> on this machine), and on a Core i5-1135G7 (Tiger Lake with
8MB L3) the following number of LLC-load-misses:
branchy branchless
8700G 1135G7 8700G 1135G7
npoints fills LLC-load-misses fills LLC-load-misses
250 1_156_206 227_274 1_133_672 39_212
500 1_189_372 19_820_264 1_125_836 47_994
1000 1_170_829 125_727_181 1_130_941 96_516
2000 1_310_015 279_063_572 1_173_501 299_297
4000 73_528_665 452_147_169 1_151_042 5_661_917
8000 195_883_759 501_433_404 1_248_638 58_877_208
16000 301_559_180 511_420_040 2_688_530 101_811_222
32000 389_512_147 511_713_759 29_799_206 116_312_019
80000 402_131_449 512_460_513 91_276_752 118_762_341
The 16MB L3 cache of the 8700G has 262_144 cache lines, the 8MB L3
of the 1135G7 has 131_072 cache lines. How come branchy has so many
cache misses already at relatively low npoints? Each search
performs ~25 architectural memory accesses; in addition, in branchy
we see a number of misspeculated memory accesses for each
architectural access, resulting in the additional memory accesses.
For branchless, the 8700G sees the ramp-up of memory accesses later
than the 1135G7 by more than the factor of 2 that the difference in
cache sizes would suggest. The 1135G7 cache is divided into 4 2MB
slices, and accesses are assigned by physical address (i.e., the L3
cache does not function like an 8MB cache with a higher
associativity), but given the random-access nature of this workload,
I would expect the accesses to be distributed evenly across the
slices, with little bad effects from this cache organization.
Given the memory access numbers of branchy and branchless above, I
expected to see a speedup of branchy in those cases where branchless
has a lot of L3 misses, so I decided to use npoints=100k and rep=200
in the experiments further up. And my expectations turned out to be
right, at least on these two machines.
- anton
Stephen Fuld <[email protected]d> writes:
On 2/21/2026 8:18 AM, Anton Ertl wrote:
big snip
Otherwise what kind of common code do we have that is
memory-dominated? Tree searching and binary search in arrays come to
mind, but are they really common, apart from programming classes?
It is probably useful to distinguish between latency bound and bandwidth
bound.
If a problem is bandwidth-bound, then differences between conventional architectures and EPIC play no role, and microarchitectural
differences in the core play no role, either; they all have to wait
for memory.
For latency various forms of prefetching (by hardware or software) can
help.
Many occur in commercial (i.e. non scientific) programs, such as
database systems. For example, imagine a company employee file (table),
with a (say 300 byte) record for each of its many thousands of employees
each containing typical employee stuff). Now suppose someone wants to
know "What is the total salary of all the employees in the "Sales"
department. With no index on "department", but it is at a fixed
displacement within each record, the code looks at each record, does a
trivial test on it, perhaps adds to a register, then goes to the next
record. This is almost certainly memory latency bound.
If the records are stored sequentially, either because the programming language supports that arrangement and the programmer made use of
that, or because the allocation happened in a way that resulted in
such an arrangement, stride-based prefetching will prefetch the
accessed fields and reduce the latency to the one due to bandwidth
limits.
If the records are stored randomly, but are pointed to by an array,
one can prefetch the relevant fields easily, again turning the problem
into a latency-bound problem. If, OTOH, the records are stored
randomly and are in a linked list, this problem is a case of
pointer-chasing and is indeed latency-bound.
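The pointer-array case above can be sketched as follows. The record layout and prefetch distance are made-up illustrations of the employee example, and `__builtin_prefetch` is the GCC/clang intrinsic:

```c
#include <stddef.h>

/* Hypothetical record layout for the employee-table example; only the
   fixed offsets of the tested and summed fields matter. */
struct record {
    char pad[32];
    int  department;
    int  salary;
    char rest[260];
};

#define PF_DIST 8   /* prefetch distance; a tuning assumption */

/* Scan records through a pointer array: the addresses of upcoming
   records are already known from recs[], so their cache misses can be
   started early and overlapped with the current iteration's work. */
long sum_department(struct record **recs, size_t n, int dept)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PF_DIST < n)
            __builtin_prefetch(&recs[i + PF_DIST]->department, 0, 1);
        if (recs[i]->department == dept)
            total += recs[i]->salary;
    }
    return total;
}
```

A linked list offers no such opportunity: the address of record i+1 is only known after record i has been fetched, which is exactly why that case stays latency bound.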
BTW, thousands of employee records, each with 300 bytes, fit in the L2
or L3 cache of modern processors.
Michael S <[email protected]> writes:
On Mon, 23 Feb 2026 08:06:20 GMT
[email protected] (Anton Ertl) wrote:
Michael S <[email protected]> writes:
On Sun, 22 Feb 2026 13:17:30 GMT
[email protected] (Anton Ertl) wrote:
MitchAlsup <[email protected]d> writes:
Array and Matrix scientific codes with datasets bigger than
cache.
The dense cases are covered by stride-based hardware
predictors, so they are not "otherwise". I am not familiar
enough with sparse scientific codes to comment on whether they
are 1), 2), or "otherwise".
BLAS Level 3 is not particularly external/LLC bandwidth intensive
even without hardware predictors.
There are HPC applications that are bandwidth limited; that's why
they have the roofline performance model (e.g.,
<https://docs.nersc.gov/tools/performance/roofline/>).
Sure. But that's not what I would call "dense".
In my vocabulary "dense" starts at matmul(200x200,x200) or at LU
decomposition of a matrix of similar dimensions.
"Dense matrices" vs. "sparse matrices", not in terms of FLOPS/memory
access.
So adding two dense matrices tends to be memory bandwidth bound, but stride-based prefetchers help to avoid getting any extra latency
beyond that coming from the bandwidth limits (if any).
Likewise, John McCalpin's Stream benchmark uses dense vectors IIRC,
but is memory bandwidth limited.
Overwhelming majority of data served from
L2 cache.
That's with classic SIMD. It's possible that with AMX units it's
no longer true.
I very much doubt that. There is little point in adding an
instruction that slows down execution by turning it from
compute-bound to memory-bound.
It does not slow down the execution. To the contrary, it speeds it up
so much that speed of handling of L2 misses begins to matter.
Pay attention that it's just my speculation that can be wrong.
IIRC, right now binary64-capable AMX is available only on Apple
Silicon (via SME) and may be on IBM Z. I didn't play with either.
AMX is an Intel extension of AMD64 (and IA-32); ARM's SME also has
"matrix" in its name, but is not AMX.
Looking at
<https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions>, it seems
to me that the point of AMX is to deal with small matrices (16x64
times 64x16 for Int8, 16x32 times 32x16 for 16-bit types) of small
elements (INT8, BF16, FP16 and complex FP16 numbers) in a special
unit. Apparently the AMX unit in Granite Rapids consumes 2048 bytes
in 16 cycles, i.e., 128 bytes per cycle and produces 256 or 512 bytes
in these 16 cycles. If each of these matrix multiplications happens
on its own, the result will certainly be bandwidth-bound to L2
and maybe already to L1. If, OTOH, these operations are part of a
larger matrix multiplication, then cache blocking can probably lower
the bandwidth to L2 enough, and reusing one of the operands in
registers can lower the bandwidth to L1 enough.
In any case, Intel will certainly not add hardware that exceeds the
bandwidth boundaries in all common use cases.
- anton
On Tue, 24 Feb 2026 09:50:43 GMT[...]
[email protected] (Anton Ertl) wrote:
Michael S <[email protected]> writes:
On Mon, 23 Feb 2026 08:06:20 GMT
[email protected] (Anton Ertl) wrote:
I think that it was said more than once throughout the thread that all
measurements were taken with npoints=1M and rep=11.
And for small rep
even-size median has non-negligible bias.
Intuitively, I don't like measurements with so small an npoints parameter,
but objectively they are unlikely to be much different from 1M points.
On Tiger Lake ratio is smaller than on others, probably because of
Intel's slow "mobile" memory controller
So far I don't see how your measurements disprove my theory of page
table not fitting in L2 cache.
If I find the time, I would like to see how branchless with software prefetching performs. And I would like to put all of that, including
your code, online. Do I have your permission to do so?
- anton
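One way software prefetching could be added to the branchless search: after each probe, only two array slots can be probed next, and both can be prefetched while the current comparison resolves. This is my own sketch (not the experiment Anton proposes above); `__builtin_prefetch` is the GCC/clang intrinsic:

```c
#include <stddef.h>
#include <stdint.h>

/* Branchless binary search (lower bound) with software prefetch of the
   two possible next probe addresses. */
size_t bsearch_branchless_pf(const int32_t *a, size_t n, int32_t key)
{
    const int32_t *base = a;
    while (n > 1) {
        size_t half = n / 2;
        size_t rest = n - half;
        /* Next iteration probes either base[rest/2] (base unchanged)
           or base[half + rest/2] (base advanced); start both loads
           now so one useful miss overlaps the current comparison. */
        __builtin_prefetch(&base[rest / 2], 0, 0);
        __builtin_prefetch(&base[half + rest / 2], 0, 0);
        base = (base[half] < key) ? base + half : base;
        n -= half;
    }
    return (size_t)(base - a) + (*base < key);
}
```

Half of the prefetched lines are wasted bandwidth, which is the usual trade-off of this technique: it mimics the misspeculated loads that make branchy fast at large sizes, but without the mispredict penalty.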
BGB <[email protected]> posted:
On 2/21/2026 2:15 PM, MitchAlsup wrote:
Whether your ISA can be attacked with Spectre and/or Meltdown;
BGB <[email protected]> posted:
On 2/20/2026 5:49 PM, MitchAlsup wrote:
----------------------------
There is a non-zero risk though when one disallows uses that are
theoretically allowed in the ISA, even if GCC doesn't use them.
This is why one must decode all 32-bits of each instruction--so that
there is no hole in the decoder that would allow the core to do some-
thing not directly specified in ISA. {And one of the things that make
an industrial quality ISA so hard to fully specify.}}
---------------------
Sometimes there is a tension:
What is theoretically allowed in the ISA;
What is the theoretically expected behavior in some abstract model;
What stuff is actually used by compilers;
What features or behaviors does one want;
...
Whether your DFAM can be attacked with RowHammer;
Whether your call/return interface can be attacked with:
{ Return-Oriented Programming, Buffer Overflows, ...}
That is, whether you care if your system provides a decently robust programming environment.
I happen to care. Apparently, most do not.
Implementing RISC-V strictly as per an abstract model would limit both
efficiency and hinder some use-cases.
One can make an argument that it is GOOD to limit attack vectors, and
provide a system that is robust in the face of attacks.
Then it comes down to "what do compilers do" and "what behaviors
could an ASM programmer stumble onto unintentionally".
Naïve at best.
Let me better explain what I was trying to set up, then you can tell me
where I went wrong. I did expect the records to be sequential, and
could be pre-fetched, but with the inner loop so short, just a few
instructions, I thought that it would quickly "get ahead" of the
prefetch. That is, that there was a small limit on the number of
prefetches that could be in process simultaneously, and with such a
small CPU loop, it would quickly hit that limit, and thus be latency
bound.
On Tue, 24 Feb 2026 10:46:32 GMT
[email protected] (Anton Ertl) wrote:
The question is not whether bandwidth from L2 to AMX is sufficient. It
is.
The interesting part is what bandwidth from LLC *into* L2 is needed in
a scenario of multiplication of big matrices.
Supposedly 1/3rd of L2 is used for matrix A that stays there for a very
long time, another 1/3rd holds matrix C that remains there for, say, 8
iterations, and the remaining 1/3rd is matrix B that should be reloaded on
every iteration. AMX increases the frequency at which we have to reload
B and C.
Assuming L2 = 2MB and square matrices, N = sqrt(2MB/8B/3) = 295,
rounded down to 288. Each step takes 288**3 = 24M FMA operations. If a
future AMX does 128 FMA per clock then the iteration takes 188K clocks.
288**2 * 8B / 188K = 3.5 B/clock. That's easy for 1 AMX/L2 combo, but
very hard when you have 64 units like those competing on the same
LLC.
Michael S <[email protected]> writes:
On Tue, 24 Feb 2026 09:50:43 GMT[...]
[email protected] (Anton Ertl) wrote:
Michael S <[email protected]> writes:
On Mon, 23 Feb 2026 08:06:20 GMT
[email protected] (Anton Ertl) wrote:
I think that it was said more than once throughout the thread that
all measurements were taken with npoints=1M and rep=11.
I obviously missed it, that's why I asked for the parameters earlier.
And for small rep
even-size median has non-negligible bias.
Even-size median is the arithmetic mean between the n/2-lowest and
the (n/2+1)-lowest result. What bias do you have in mind?
Here are the results with npoints=1M reps=11:
Zen4 8700G Tiger Lake 1135G7
veclen branchless branchy branchless branchy
100000 0.030187 0.063023 0.038614 0.081332
140000 0.029779 0.066112 0.034797 0.085259
200000 0.037532 0.072898 0.045945 0.090583
280000 0.036677 0.078795 0.042678 0.097386
400000 0.045777 0.084639 0.055256 0.104249
560000 0.043092 0.086424 0.055677 0.112252
800000 0.050325 0.091714 0.078172 0.123927
1120000 0.048855 0.095756 0.091508 0.141819
1600000 0.061101 0.099660 0.133560 0.168403
2240000 0.082970 0.113055 0.166363 0.205129
3200000 0.120834 0.137753 0.211253 0.235155
4500000 0.149028 0.160359 0.241716 0.268885
6400000 0.178675 0.177070 0.294336 0.294931
9000000 0.219919 0.195292 0.322921 0.326113
12800000 0.240604 0.209397 0.380076 0.352707
18000000 0.257400 0.224952 0.409813 0.388834
25000000 0.293160 0.239217 0.472999 0.413559
35000000 0.307939 0.257587 0.522223 0.450754
50000000 0.346399 0.271420 0.595265 0.478713
70000000 0.375525 0.286462 0.626350 0.516533
100000000 0.401462 0.305444 0.699097 0.555928
140000000 0.419590 0.322480 0.732551 0.611815
200000000 0.451003 0.337880 0.854034 0.647795
280000000 0.468193 0.359133 0.914159 0.729234
400000000 0.506300 0.372219 1.150605 0.773266
560000000 0.520346 0.390277 1.167567 0.851592
800000000 0.556292 0.412035 1.799293 0.899768
1100000000 0.572654 0.432598 1.861136 0.963649
1600000000 0.618735 0.450526 2.062255 1.026560
2000000000 0.629088 0.465219 2.132035 1.073728
Intuitively, I don't like measurements with so small an npoints
parameter, but objectively they are unlikely to be much different
from 1M points.
It's pretty similar, but the first point where branchy is faster is veclen=6.4M for the 8700G and still 12.8M for the 1135G7; on the
1135G7 the 6.4M and 9M veclens are pretty close for both npoints.
My current theory why the crossover is so late is twofold:
1) branchy performs a lot of misspeculated accesses, which reduce the
cache hit rate (and increase the cache miss rate) already at
relatively low npoints (as shown in <[email protected]>), and likely also for
low veclen with high npoints. E.g., already npoints=4000 has >2M
fills from memory on the 8700G, and npoints=500 has >2M
LLC-load-misses on the 1135G7, whereas the corresponding numbers for
branchless are npoints=16000 and npoints=4000. This means that
branchy does not just suffer from the misprediction penalty, but also
from fewer cache hits for the middle stages of the search. How strong
this effect is depends on the memory subsystem of the CPU core.
2) In the last levels of the larger searches the better memory-level parallelism of branchy leads to branchy catching up and eventually
overtaking branchless, but it has to first compensate for the slowness
in the earlier levels before it can reach crossover. That's why we
see the crossover point at significantly larger sizes than L3.
On Tiger Lake the ratio is smaller than on the others, probably because
of Intel's slow "mobile" memory controller.
This particular machine has 8GB DDR4 soldered in plus 32GB DDR4 on a
separate DIMM, so the memory controller may be faultless.
So far I don't see how your measurements disprove my theory of page
table not fitting in L2 cache.
I did not try to disprove that; if I did, I would try to use huge
pages (the 1G ones if possible) and see how that changes the results.
But somehow I fail to see why the page table walks should make much of
a difference.
If I find the time, I would like to see how branchless with software prefetching performs. And I would like to put all of that, including
your code, online. Do I have your permission to do so?
- anton
On Tue, 24 Feb 2026 17:30:31 GMT
[email protected] (Anton Ertl) wrote:
This particular machine has 8GB DDR4 soldered in plus 32GB DDR4 on a
separate DIMM, so the memory controller may be faultless.
Do you mean that dealing with two different types of memory is an
objectively hard job?
Or that the latency of the 1135G7 is not too bad?
It all looks like memory latency on this TGL is ~90ns.
If I find the time, I would like to see how branchless with software
prefetching performs. And I would like to put all of that, including
your code, online. Do I have your permission to do so?
- anton
Doesn't SW prefetch turn itself into a NOP on a TLB miss?
On 2/18/26 3:45 PM, BGB wrote:
I do not know if 5/6-bit state machines have been academically
examined for predictor entries. I suspect the extra storage is a
significant discouragement given one often wants to cover more
different correlations and branches.
If the 5/6-bit FSM can fit more patterns than 3x 2-bit saturating
counters, it can be a win.
I suspect it very much depends on whether bias or pattern is
dominant. This would depend on the workload (Doom?) and the
table size (and history length). I do not know that anyone in
academia has explored this, so I think you should be proud of
your discovery even if it has limited application.
A larger table (longer history) can mean longer training, but
such also discovers more patterns and longer patterns (e.g.,
predicting a fixed loop count). However, correlation strength
tends to decrease with increasing distance (having multiple
history lengths and hashings helps to find the right history).
As noted, the 5/6 bit FSM can predict arbitrary 4 bit patterns.
When the pattern is exactly repeated this is great, but if the
correlation with global history is fuzzy (but biased) a counter
might be better.
I get the impression that branch prediction is complicated
enough that even experts only have a gist of what is actually
happening, i.e., there is a lot of craft and experimentation and
less logical derivation (I sense).
Stephen Fuld <[email protected]d> writes:
Let me better explain what I was trying to set up, then you can tell me
where I went wrong. I did expect the records to be sequential, and
could be pre-fetched, but with the inner loop so short, just a few
instructions, I thought that it would quickly "get ahead" of the
prefetch. That is, that there was a small limit on the number of
prefetches that could be in process simultaneously, and with such a
small CPU loop, it would quickly hit that limit, and thus be latency bound.
I think that it's bandwidth-bound, because none of the memory (or
outer-level cache) accesses depend on the results of previous ones; so
the loads can be started right away, up to the limit of memory-level parallelism of the hardware. If the records are in RAM, the hardware prefetcher can help to avoid running into the scheduler and ROB limits
of the OoO engine.
In article <10nak0a$nrac$[email protected]>, [email protected] (BGB) wrote:
Does imply that my younger self was notable, and not seen as just
some otherwise worthless nerd.
Educators who are any good notice the weird kids who are actually smart.
For 128 predicate registers, this part doesn't make as much sense:
I suspect they wanted to re-use some logic.
The tricks Itanium could do with combinations of predicate registers were pretty weird. There was at least one instruction for manipulating them
which I was entirely unable to understand, with the manual in front of me
and pencil and paper to try examples. Fortunately, it never occurred in
code generated by any of the compilers I used.
*1: Where people argue that if each vendor can do a CPU with their
own custom ISA variants and without needing to license or get
approval from a central authority, that invariably everything would
decay into an incoherent mess where there is no binary
compatibility between processors from different vendors (usual
implication being that people are then better off staying within
the ARM ecosystem to avoid RV's lawlessness).
The importance of binary compatibility is very much dependent on the
market sector you're addressing. It's absolutely vital for consumer apps
and games. It's much less important for current "AI" where each vendor
has their own software stack anyway. RISC-V seems to be far more
interested in the latter at present.
Stephen Fuld <[email protected]d> writes:
On 2/21/2026 8:18 AM, Anton Ertl wrote:
big snip
Otherwise what kind of common code do we have that is
memory-dominated? Tree searching and binary search in arrays come to
mind, but are they really common, apart from programming classes?
It is probably useful to distinguish between latency bound and bandwidth
bound.
If a problem is bandwidth-bound, then differences between conventional architectures and EPIC play no role, and microarchitectural
differences in the core play no role, either; they all have to wait
for memory.
For latency various forms of prefetching (by hardware or software) can
help.
Many occur in commercial (i.e. non-scientific) programs, such as
database systems. For example, imagine a company employee file (table),
with a (say 300-byte) record for each of its many thousands of employees
(each containing typical employee stuff). Now suppose someone wants to
know "What is the total salary of all the employees in the 'Sales'
department?" There is no index on "department", but it is at a fixed
displacement within each record, so the code looks at each record, does a
trivial test on it, perhaps adds to a register, then goes to the next
record. This is almost certainly memory latency bound.
If the records are stored sequentially, either because the programming language supports that arrangement and the programmer made use of
that, or because the allocation happened in a way that resulted in
such an arrangement, stride-based prefetching will prefetch the
accessed fields and reduce the latency to the one due to bandwidth
limits.
If the records are stored randomly, but are pointed to by an array,
one can prefetch the relevant fields easily, again turning the problem
into a latency-bound problem. If, OTOH, the records are stored
randomly and are in a linked list, this problem is a case of
pointer-chasing and is indeed latency-bound.
BTW, thousands of employee records, each with 300 bytes, fit in the L2
or L3 cache of modern processors.
On 2/24/2026 11:33 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
Let me better explain what I was trying to set up, then you can tell me
where I went wrong. I did expect the records to be sequential, and
could be pre-fetched, but with the inner loop so short, just a few
instructions, I thought that it would quickly "get ahead" of the
prefetch. That is, that there was a small limit on the number of
prefetches that could be in process simultaneously, and with such a
small CPU loop, it would quickly hit that limit, and thus be latency bound.
I think that it's bandwidth-bound, because none of the memory (or outer-level cache) accesses depend on the results of previous ones; so
the loads can be started right away, up to the limit of memory-level parallelism of the hardware. If the records are in RAM, the hardware prefetcher can help to avoid running into the scheduler and ROB limits
of the OoO engine.
I think our difference may be just terminology rather than substance.
To me, it is precisely the limit you mentioned that makes it latency
rather than bandwidth limited. Think of it this way. In the current situation, increasing the memory system bandwidth, say by hypothetically increasing the number of memory banks, having a wider interface between
the memory and the core, etc., all traditional methods for increasing
memory bandwidth, would not improve the performance. On the other hand, doing things to reduce the memory latency (say hypothetically a faster
ram cell), would improve the performance. To me, that is the definition
of being latency bound, not bandwidth bound.
Perhaps this distinction is clearer to me due to my background in the
(hard) disk business. You want lower latency? Make the arm move faster
or spin the disk faster. You want higher bandwidth? Put more bits on a track or interleave the data across multiple disk heads. And in a
system, the number of active prefetches is naturally limited by the
number of disk arms you have.
On 2/24/2026 5:25 AM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
On 2/21/2026 8:18 AM, Anton Ertl wrote:
big snip
Otherwise what kind of common code do we have that is
memory-dominated? Tree searching and binary search in arrays come to
mind, but are they really common, apart from programming classes?
It is probably useful to distinguish between latency bound and bandwidth
bound.
If a problem is bandwidth-bound, then differences between conventional architectures and EPIC play no role, and microarchitectural
differences in the core play no role, either; they all have to wait
for memory.
For latency various forms of prefetching (by hardware or software) can help.
Many occur in commercial (i.e. non scientific) programs, such as
database systems. For example, imagine a company employee file (table),
with a (say 300 byte) record for each of its many thousands of employees
each containing typical employee stuff). Now suppose someone wants to
know "What is the total salary of all the employees in the "Sales"
department. With no index on "department", but it is at a fixed
displacement within each record, the code looks at each record, does a
trivial test on it, perhaps adds to a register, then goes to the next
record. This is almost certainly memory latency bound.
If the records are stored sequentially, either because the programming language supports that arrangement and the programmer made use of
that, or because the allocation happened in a way that resulted in
such an arrangement, stride-based prefetching will prefetch the
accessed fields and reduce the latency to the one due to bandwidth
limits.
If the records are stored randomly, but are pointed to by an array,
one can prefetch the relevant fields easily, again turning the problem
into a latency-bound problem. If, OTOH, the records are stored
randomly and are in a linked list, this problem is a case of pointer-chasing and is indeed latency-bound.
BTW, thousands of employee records, each with 300 bytes, fit in the L2
or L3 cache of modern processors.
FWIW:
IME, code with fairly random access patterns to memory, and lots of
cache misses, is inherently slow; even on big/fancy OoO chips.
Seemingly about the only real hope the CPU has is to have a large cache and just
hope that the data happens to be in the cache (and has been accessed previously or sufficiently recently) else it is just kinda SOL.
If there is some way that CPU's can guess what memory they need in
advance and fetch it beforehand, I have not seen much evidence of this personally.
Rather, as can be noted, memory access patterns can often make a fairly large impact on the performance of some algorithms.
Like, for example, decoding a PNG like format vs a JPG like format:
PNG decoding typically processes the image as several major phases:
Decompress the Deflate-compressed buffer into memory;
Walk over the image, running scanline filters,
copying scanlines into a new (output) buffer.
On 2/22/2026 3:52 PM, John Dallman wrote:
In article <10nak0a$nrac$[email protected]>, [email protected] (BGB) wrote:
Does imply that my younger self was notable, and not seen as just
some otherwise worthless nerd.
Educators who are any good notice the weird kids who are actually smart.
Sometimes I question if I really am though.
Like, some evidence says I am, but by most metrics of "life success" I
have done rather poorly.
And, in middle and high-school, they just sorta forced me to sit through normal classes (which sucked really hard).
Well, and I apparently missed
the point of school, thinking it was more of an endurance thing with
sort of a vague pretense of education (and I probably would have learned more if they just let me spend the time doing whatever else).
The tricks Itanium could do with combinations of predicate registers were pretty weird. There was at least one instruction for manipulating them which I was entirely unable to understand, with the manual in front of me and pencil and paper to try examples. Fortunately, it never occurred in code generated by any of the compilers I used.
Possibly.
I had also looked into a more limited set of predicate registers at one point, but this fizzled in favor of just using GPRs.
So, as noted:
I have 1 predicate bit (T bit);
Had looked into expanding it to 2 predicate bits (using an S bit as a
second predicate), but this went nowhere.
With matrices of just the right size, one can achieve a TLB miss on
every 8th access.
And, in middle and high-school, they just sorta forced me to sit through
normal classes (which sucked really hard)
In my case, I remember sitting in the back of advanced algebra class
(mostly senior HS people, me a sophomore) doing chemistry homework while
vaguely listening to the teacher fail to get various students to solve
a typical algebra problem. Then she called on me, I looked up at the board
and in less than a second I rattled off the answer skipping 5 steps along
the way. Moral, don't be bored in class, do something useful instead.
Well, and I apparently missed
the point of school, thinking it was more of an endurance thing with
sort of a vague pretense of education (and I probably would have learned
more if they just let me spend the time doing whatever else).
For most people, school attempts to give the students just enough knowledge
that they are not burdens on society.
MitchAlsup <[email protected]d> writes:
And, in middle and high-school, they just sorta forced me to sit through
normal classes (which sucked really hard)
In my case, I remember sitting in the back of advanced algebra class
(mostly senior HS people, me a sophomore) doing chemistry homework while
vaguely listening to the teacher fail to get various students to solve
a typical algebra problem. Then she called on me, I looked up at the board
and in less than a second I rattled off the answer skipping 5 steps along
the way. Moral, don't be bored in class, do something useful instead.
Well, and I apparently missed
the point of school, thinking it was more of an endurance thing with
sort of a vague pretense of education (and I probably would have learned
more if they just let me spend the time doing whatever else).
For most people, school attempts to give the students just enough knowledge
that they are not burdens on society.
My high school (1970s, when the split was K-7, 7-9, 10-12) had
four "communities".
Traditional
Career
Work Study
Flexible Individual Learning (FIL)
The college-bound were generally part of the
FIL community. Career included business classes,
traditional was more like the olden days and
Work Study included off-school apprenticeships,
shop classes, electronics training, etc.
Students mostly took classes with peers in their
community (there were over 400 in my graduating class).
Worked rather well, but ended up segregating students
by income level as well as IQ, so
the school district changed that in the
80s in the interest of equality treating the
entire high school as a single community. The
quality of the education received diminished
thereafter, IMO.
BGB <[email protected]> posted:
On 2/22/2026 3:52 PM, John Dallman wrote:
In article <10nak0a$nrac$[email protected]>, [email protected] (BGB) wrote:
Does imply that my younger self was notable, and not seen as just
some otherwise worthless nerd.
Educators who are any good notice the weird kids who are actually smart.
Sometimes I question if I really am though.
Like, some evidence says I am, but by most metrics of "life success" I
have done rather poorly.
And, in middle and high-school, they just sorta forced me to sit through
normal classes (which sucked really hard)
In my case, I remember sitting in the back of advanced algebra class
(mostly senior HS people, me a sophomore) doing chemistry homework while vaguely listening to the teacher fail to get various students to solve
a typical algebra problem. Then she called on me, I looked up at the board and in less than a second I rattled off the answer skipping 5 steps along
the way. Moral, don't be bored in class, do something useful instead.
Well, and I apparently missed
the point of school, thinking it was more of an endurance thing with
sort of a vague pretense of education (and I probably would have learned
more if they just let me spend the time doing whatever else).
For most people, school attempts to give the students just enough knowledge that they are not burdens on society.
-------------------------
The tricks Itanium could do with combinations of predicate registers were
pretty weird. There was at least one instruction for manipulating them
which I was entirely unable to understand, with the manual in front of me
and pencil and paper to try examples. Fortunately, it never occurred in
code generated by any of the compilers I used.
It could have been a case where the obvious logic decoding "that" field in the instruction allowed for "a certain pattern" to perform what they described
in the spec. I did some of this in Mc 88100, and this is what taught me never to do it again or allow anyone else to do it again.
Possibly.
I had also looked into a more limited set of predicate registers at one
point, but this fizzled in favor of just using GPRs.
So, as noted:
I have 1 predicate bit (T bit);
Had looked into expanding it to 2 predicate bits (using an S bit as a
second predicate), but this went nowhere.
I have tried several organizations over the last 40 years of practice:
In my Humble and Honest Opinion, the only constructs predicates should
support are singular comparisons and comparisons using && and || with
deMorganizing logic {~}--not because other forms are unuseful, but
because those are the constructs programmers use when writing code.
On 2/22/2026 3:52 PM, John Dallman wrote:
In article <10nak0a$nrac$[email protected]>, [email protected] (BGB) wrote:
Does imply that my younger self was notable, and not seen as just
some otherwise worthless nerd.
Educators who are any good notice the weird kids who are actually smart.
Sometimes I question if I really am though.
Like, some evidence says I am, but by most metrics of "life success" I
have done rather poorly.
And, in middle and high-school, they just sorta forced me to sit through normal classes (which sucked really hard). Well, and I apparently missed
the point of school, thinking it was more of an endurance thing with
sort of a vague pretense of education (and I probably would have learned more if they just let me spend the time doing whatever else).
...
But, it seems like a case of:
By implication, I am smart, because if I wasn't, even my own (sometimes pointless) hobby interests would have been out of reach.
Like, not a world of difficulty justifying them, or debating whether or
not something is worth doing, but likely not something someone could do
at all.
Or, maybe, like encountering things that seem confusing isn't such a
rare experience (or that people have learned how to deal more
productively with things they can see but don't understand?...).
But, there is a thing I have noted:
I had a few times mentioned to people that certain AIs had gotten smart
enough to start understanding how a 5/6-bit finite state machine to
predict repeating 1-4 bit patterns would be constructed.
Then, I try to describe it, and then realize that for the people I try
to mention it to, it isn't that they have difficulty imagining how one
would go about filling in the table and getting all of the 4 bit
patterns to fit into 32 possible states. Many seem to have difficulty understanding how such a finite state machine would operate in the first place.
Even though this part seems like something that pretty much anyone should be able to understand.
Initially, I had used this as a test case for the AIs because it posed "moderate difficulty" for problems which could be reasonably completely described in a chat prompt (and is not overly generic).
Nevermind if it is still a pain to generate tables by hand, and my
attempts at hand-generated tables have tended to have worse adaptation
rates than those generated using genetic algorithms (can be cleaner
looking, but tend to need more input bits to reach the target state if
the pattern changes).
Sometimes I feel like a poser.
Other things, it seems, I had taken for granted.
Seems sometimes that if I were "actually smart", I would have figured out
some way to make better and more efficient use of my span of existence.
On 2/24/2026 5:25 AM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
On 2/21/2026 8:18 AM, Anton Ertl wrote:
big snip
Otherwise what kind of common code do we have that is
memory-dominated? Tree searching and binary search in arrays come to
mind, but are they really common, apart from programming classes?
It is probably useful to distinguish between latency bound and bandwidth
bound.
If a problem is bandwidth-bound, then differences between conventional
architectures and EPIC play no role, and microarchitectural
differences in the core play no role, either; they all have to wait
for memory.
For latency various forms of prefetching (by hardware or software) can
help.
Many occur in commercial (i.e. non scientific) programs, such as
database systems. For example, imagine a company employee file (table), >>> with a (say 300 byte) record for each of its many thousands of employees >>> each containing typical employee stuff). Now suppose someone wants to
know "What is the total salary of all the employees in the "Sales"
department. With no index on "department", but it is at a fixed
displacement within each record, the code looks at each record, does a
trivial test on it, perhaps adds to a register, then goes to the next
record. This is almost certainly memory latency bound.
If the records are stored sequentially, either because the programming
language supports that arrangement and the programmer made use of
that, or because the allocation happened in a way that resulted in
such an arrangement, stride-based prefetching will prefetch the
accessed fields and reduce the latency to the one due to bandwidth
limits.
If the records are stored randomly, but are pointed to by an array,
one can prefetch the relevant fields easily, again turning the problem
into a latency-bound problem. If, OTOH, the records are stored
randomly and are in a linked list, this problem is a case of
pointer-chasing and is indeed latency-bound.
BTW, thousands of employee records, each with 300 bytes, fit in the L2
or L3 cache of modern processors.
FWIW:
IME, code with fairly random access patterns to memory, and lots of
cache misses, is inherently slow; even on big/fancy OoO chips. Seemingly about the only real hope the CPU has is to have a large cache and just hope that the data happens to be in the cache (and has been accessed
previously or sufficiently recently) else it is just kinda SOL.
If there is some way that CPU's can guess what memory they need in
advance and fetch it beforehand, I have not seen much evidence of this personally.
Rather, as can be noted, memory access patterns can often make a fairly large impact on the performance of some algorithms.
Like, for example, decoding a PNG like format vs a JPG like format:
Could you have a secondary thread that started as soon as one (or a
PNG decoding typically processes the image as several major phases:
Decompress the Deflate-compressed buffer into memory;
Walk over the image, running scanline filters,
copying scanlines into a new (output) buffer.
Even if the parts, taken in isolation, should be fast:
This reminds me of CABAC decoding in h264, where the output of the
The image buffers are frequently too large to fit in cache;
Cache misses tend to make PNG decoding painfully slow,
even when using faster filters.
If using the Paeth filter though, this adds extra slowness,
due to branch-predictor misses.
On targets like x86,
the filter is frequently implemented using branches;
The branch miss rate is very high.
So, a naive branching version, performs like dog crap.
So, net result: Despite its conceptual simplicity, PNG's decode-time performance typically sucks.
Contrast, a decoder for a JPEG like format can be made to process one
block at a time and go all the way to final output. So, JPEG is often
faster despite the more complex process (with transform stages and a colorspace transform).
The Paeth filter slowness does seem a little odd though:
I would look for a way to handle multiple pixels at once, with SIMD
Theoretically, a CPU could turn a short forward branch into predication;
But, this doesn't tend to be the case.
It then is faster to turn the filter into some convoluted mess of
arithmetic and masking in an attempt to reduce the branch mispredict costs.
BGB <[email protected]> posted:
On 2/22/2026 3:52 PM, John Dallman wrote:
In article <10nak0a$nrac$[email protected]>, [email protected] (BGB) wrote:
Does imply that my younger self was notable, and not seen as just
some otherwise worthless nerd.
Educators who are any good notice the weird kids who are actually smart.
Sometimes I question if I really am though.
Like, some evidence says I am, but by most metrics of "life success" I
have done rather poorly.
And, in middle and high-school, they just sorta forced me to sit through
normal classes (which sucked really hard)
In my case, I remember sitting in the back of advanced algebra class
(mostly senior HS people, me a sophomore) doing chemistry homework while vaguely listening to the teacher fail to get various students to solve
a typical algebra problem. Then she called on me, I looked up at the board and in less than a second I rattled off the answer skipping 5 steps along
the way. Moral, don't be bored in class, do something useful instead.
MitchAlsup wrote:
In my case, I remember sitting in the back of advanced algebra class
(mostly senior HS people, me a sophomore) doing chemistry homework while
vaguely listening to the teacher fail to get various students to solve
a typical algebra problem. Then she called on me, I looked up at the board and in less than a second I rattled off the answer skipping 5 steps along
the way. Moral, don't be bored in class, do something useful instead.
I used a double physics time slot (i.e two 50-min time slots with a
5-min break between them) in exactly the same way, except that I
calculated ~24 digits of pi using the Taylor series for atan(1/5) and atan(1/239). The latter part was much faster of course!
Doing long divisions by 25 and (n^2+n) took the majority of the time.
Terje
PS. I re-implemented the exact same algorithm, using base 1e10, on the
very first computer I got access to, a Univac 110x in University. This
was my first ever personal piece of programming.
Stephen Fuld <[email protected]d> writes:
On 2/24/2026 11:33 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
Let me better explain what I was trying to set up, then you can tell me where I went wrong. I did expect the records to be sequential, and
could be pre-fetched, but with the inner loop so short, just a few
instructions, I thought that it would quickly "get ahead" of the
prefetch. That is, that there was a small limit on the number of
prefetches that could be in process simultaneously, and with such a
small CPU loop, it would quickly hit that limit, and thus be latency bound.
I think that it's bandwidth-bound, because none of the memory (or
outer-level cache) accesses depend on the results of previous ones; so
the loads can be started right away, up to the limit of memory-level
parallelism of the hardware. If the records are in RAM, the hardware
prefetcher can help to avoid running into the scheduler and ROB limits
of the OoO engine.
I think our difference may be just terminology rather than substance.
To me, it is precisely the limit you mentioned that makes it latency
rather than bandwidth limited.
I mentioned several limits. Which one do you have in mind?
the limit of memory-level parallelism of the hardware.
Think of it this way. In the current
situation, increasing the memory system bandwidth, say by hypothetically
increasing the number of memory banks, having a wider interface between
the memory and the core, etc., all traditional methods for increasing
memory bandwidth, would not improve the performance. On the other hand,
doing things to reduce the memory latency (say hypothetically a faster
ram cell), would improve the performance.
If the CPU is designed to provide enough memory-level parallelism to
make use of the bandwidth (and that is likely, otherwise why provide
that much bandwidth), then once the designers spend money on
increasing the bandwidth, they will also spend the money necessary to increase the MLP.
Concerning a reduction in latency, that would not
increase performance, because this application is already working at
the bandwidth limit.
I feel the urge to write up a mock variant of your use case and
measure whether reality confirms my expectations, but currently have
no time for that.
But let's take a slightly simpler case for which I already have a
program:
On 2/24/2026 11:33 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
Let me better explain what I was trying to set up, then you can tell me
where I went wrong. I did expect the records to be sequential, and
could be pre-fetched, but with the inner loop so short, just a few
instructions, I thought that it would quickly "get ahead" of the
prefetch. That is, that there was a small limit on the number of
prefetches that could be in process simultaneously, and with such a
small CPU loop, it would quickly hit that limit, and thus be latency bound.
I think that it's bandwidth-bound, because none of the memory (or
outer-level cache) accesses depend on the results of previous ones; so
the loads can be started right away, up to the limit of memory-level
parallelism of the hardware. If the records are in RAM, the hardware
prefetcher can help to avoid running into the scheduler and ROB limits
of the OoO engine.
I think our difference may be just terminology rather than substance.
To me, it is precisely the limit you mentioned that makes it latency
rather than bandwidth limited. Think of it this way. In the current situation, increasing the memory system bandwidth, say by hypothetically increasing the number of memory banks, having a wider interface between
the memory and the core, etc., all traditional methods for increasing
memory bandwidth, would not improve the performance. On the other hand, doing things to reduce the memory latency (say hypothetically a faster
ram cell), would improve the performance. To me, that is the definition
of being latency bound, not bandwidth bound.
Perhaps this distinction is clearer to me due to my background in the
(hard) disk business. You want lower latency? Make the arm move faster
or spin the disk faster. You want higher bandwidth? Put more bits on a track or interleave the data across multiple disk heads. And in a
system, the number of active prefetches is naturally limited by the
number of disk arms you have.
BGB wrote:
On 2/24/2026 5:25 AM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
On 2/21/2026 8:18 AM, Anton Ertl wrote:
big snip
Otherwise what kind of common code do we have that is
memory-dominated? Tree searching and binary search in arrays come to mind, but are they really common, apart from programming classes?
It is probably useful to distinguish between latency bound and
bandwidth
bound.
If a problem is bandwidth-bound, then differences between conventional
architectures and EPIC play no role, and microarchitectural
differences in the core play no role, either; they all have to wait
for memory.
For latency various forms of prefetching (by hardware or software) can
help.
Many occur in commercial (i.e. non scientific) programs, such as
database systems. For example, imagine a company employee file
(table),
with a (say 300 byte) record for each of its many thousands of
employees
each containing typical employee stuff). Now suppose someone wants to know "What is the total salary of all the employees in the "Sales"
department. With no index on "department", but it is at a fixed
displacement within each record, the code looks at each record, does a trivial test on it, perhaps adds to a register, then goes to the next
record. This is almost certainly memory latency bound.
If the records are stored sequentially, either because the programming
language supports that arrangement and the programmer made use of
that, or because the allocation happened in a way that resulted in
such an arrangement, stride-based prefetching will prefetch the
accessed fields and reduce the latency to the one due to bandwidth
limits.
If the records are stored randomly, but are pointed to by an array,
one can prefetch the relevant fields easily, again turning the problem
into a latency-bound problem. If, OTOH, the records are stored
randomly and are in a linked list, this problem is a case of
pointer-chasing and is indeed latency-bound.
BTW, thousands of employee records, each with 300 bytes, fit in the L2
or L3 cache of modern processors.
FWIW:
IME, code with fairly random access patterns to memory, and lots of
cache misses, is inherently slow; even on big/fancy OoO chips.
Seemingly about the only real hope the CPU has is to have a large
cache and just hope that the data happens to be in the cache (and has
been accessed previously or sufficiently recently) else it is just
kinda SOL.
If there is some way that CPU's can guess what memory they need in
advance and fetch it beforehand, I have not seen much evidence of this
personally.
Rather, as can be noted, memory access patterns can often make a
fairly large impact on the performance of some algorithms.
Like, for example, decoding a PNG like format vs a JPG like format:
PNG decoding typically processes the image as several major phases:
Decompress the Deflate-compressed buffer into memory;
Walk over the image, running scanline filters,
copying scanlines into a new (output) buffer.
Could you have a secondary thread that started as soon as one (or a
small number of) scanline(s) were available, taking advantage of any
shared $L3 cache to grab the data before it is blown away?
Even if the parts, taken in isolation, should be fast:
The image buffers are frequently too large to fit in cache;
Cache misses tend to make PNG decoding painfully slow,
even when using faster filters.
If using the Paeth filter though, this adds extra slowness,
due to branch-predictor misses.
On targets like x86,
the filter is frequently implemented using branches;
The branch miss rate is very high.
So, a naive branching version performs like dog crap.
This reminds me of CABAC decoding in h264, where the output of the arithmetic decoder is single bits that by definition cannot be
predictable, but the codec typically uses that bit to branch.
So, net result: Despite its conceptual simplicity, PNG's decode-time
performance typically sucks.
Contrast, a decoder for a JPEG like format can be made to process one
block at a time and go all the way to final output. So, JPEG is often
faster despite the more complex process (with transform stages and a
colorspace transform).
The Paeth filter slowness does seem a little odd though:
Theoretically, a CPU could turn a short forward branch into predication;
But, this doesn't tend to be the case.
It then is faster to turn the filter into some convoluted mess of
arithmetic and masking in an attempt to reduce the branch mispredict
costs.
I would look for a way to handle multiple pixels at once, with SIMD
code: There the masking/combining is typically the easiest way to
implement short branches.
(I might take a look a png decoding at some point)
Terje
My first was a simple BASIC "hello world" program in 1974 on a
Burroughs B5500 (remotely, via again an ASR-33) which we had
for a week in 7th grade math class.
I was quite proud when I managed to factorize 123456789, which
took some time.
On 2/28/2026 9:48 AM, Terje Mathisen wrote:
This reminds me of CABAC decoding in h264, where the output of the
arithmetic decoder is single bits that by definition cannot be
predictable, but the codec typically uses that bit to branch.
Yeah.
Making arithmetic and range coders fast is also hard.
I don't often use them as much because I am not aware of a good way to make them fast.
This is part of why I had often ended up going for STF+AdRice or
similar, which, while not the best in terms of compression, can be one of the faster options in many cases.
Theoretically, table-driven Huffman could be faster, but likewise often suffers from cache misses (cycles lost to cache misses can outweigh the
cost of the more complex logic of an AdRice coder).
Huffman speed can be improved by reducing maximum symbol length and
table size, but then this can lose much of its compression advantage.
Say, max symbol length:
10/11: Too short, limits effectiveness.
12: OK, leans faster;
13: OK, leans better compression;
14: Intermediate
15: Slower still (Deflate is here)
16: Slower (T.81 JPEG is here)
Where, for 12/13 bits, the fastest strategy is typically to use a single
big lookup table for the entropy decoder.
For 15 or 16 bits, it is often faster to have a separate fast-path and slow path. Say, fast path matches on the first 9 or 10 bits, and then
the slow path falls back to a linear search (over the longer symbols).
In this case, the relative slowness of falling back to a linear search
being less than that of the cost of the L1 misses from a bigger lookup
table.
The relative loss of Huffman coding efficiency between a 13 bit limit
and 15 bit limit is fairly modest.
Yeah.
I would look for a way to handle multiple pixels at once, with SIMD
code: There the masking/combining is typically the easiest way to
implement short branches.
(I might take a look a png decoding at some point)
#if 0 //naive version, pays a lot for branch penalties
int BGBBTJ_BufPNG_Paeth(int a, int b, int c)
{
int p, pa, pb, pc;
p=a+b-c;
pa=(p>a)?(p-a):(a-p);
pb=(p>b)?(p-b):(b-p);
pc=(p>c)?(p-c):(c-p);
p=(pa<=pb)?((pa<=pc)?a:c):((pb<=pc)?b:c);
return(p);
}
#endif
#if 1 //avoid branch penalties
int BGBBTJ_BufPNG_Paeth(int a, int b, int c)
{
int p, pa, pb, pc;
int ma, mb, mc;
p=a+b-c;
pa=p-a; pb=p-b; pc=p-c;
ma=pa>>31; mb=pb>>31; mc=pc>>31;
pa=pa^ma; pb=pb^mb; pc=pc^mc;
ma=pb-pa; mb=pc-pb; mc=pc-pa;
ma=ma>>31; mb=mb>>31; mc=mc>>31;
p=(ma&((mb&c)|((~mb)&b))) | ((~ma)&((mc&c)|((~mc)&a)));
return(p);
}
#endif
Where, the Paeth filter is typically the most heavily used filter in PNG decoding (because it tends to be the most accurate), but also the slowest.
Could in theory be SIMD'ed to maybe work on RGB or RGBA in parallel.
Scott Lurndal <[email protected]> schrieb:
My first was a simple BASIC "hello world" program in 1974 on a
Burroughs B5500 (remotely, via again an ASR-33) which we had
for a week in 7th grade math class.
I started out on my father's first programmable pocket calculator,
a Casio model with 38 steps (I think).
I was quite proud when I managed to factorize 123456789, which
took some time.
On 01/03/2026 13:18, Thomas Koenig wrote:
Scott Lurndal <[email protected]> schrieb:
My first was a simple BASIC "hello world" program in 1974 on a
Burroughs B5500 (remotely, via again an ASR-33) which we had
for a week in 7th grade math class.
I started out on my father's first programmable pocket calculator,
a Casio model with 38 steps (I think).
Would that have been a Casio fx-3600P? I bought one of these as a teenager, and used it non-stop. 38 steps of program space was not a
lot, but I remember making a library for complex number calculations for it.
I was quite proud when I managed to factorize 123456789, which
took some time.
I used mine to find formulas for numerical integration (like Simpson's
rule, but higher order). Basically useless, but fun!
Stefan Monnier <[email protected]> writes:
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.
I can't remember seeing such arguments coming from compiler people, tho.
Actually, the IA-64 people could point to the work on VLIW (in
particular, Multiflow (trace scheduling) and Cydrome (software
pipelining)), which in turn is based on the work on compilers for
microcode.
That did not solve memory latency, but that's a problem even for OoO
cores.
I suspect a big part of the problem was tension between Intel and HP
where the only political solution was allowing the architects from both
sides to "dump in" their favorite ideas. A recipe for disaster.
The HP side had people like Bob Rau (Cydrome) and Josh Fisher
(Multiflow), and given their premise, the architecture is ok; somewhat
on the complex side, but they wanted to cover all the good ideas from
earlier designs; after all, it was to be the one architecture to rule
them all (especially performancewise). You cannot leave out a feature
that a competitor could then add to outperform IA-64.
The major problem was that the premise was wrong. They assumed that
in-order would give them a clock rate edge, but that was not the case,
right from the start (The 1GHz Itanium II (released July 2002)
competed with 2.53GHz Pentium 4 (released May 2002) and 1800MHz Athlon
XP (released June 2002)). They also assumed that explicit parallelism
would provide at least as much ILP as hardware scheduling of OoO CPUs,
but that was not the case for general-purpose code, and in any case,
they needed a lot of additional ILP to make up for their clock speed disadvantage.
On 2/27/2026 1:52 AM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
On 2/24/2026 11:33 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
Let me better explain what I was trying to set up, then you can tell me where I went wrong. I did expect the records to be sequential, and
could be pre-fetched, but with the inner loop so short, just a few
instructions, I thought that it would quickly "get ahead" of the
prefetch. That is, that there was a small limit on the number of
prefetches that could be in process simultaneously, and with such a
small CPU loop, it would quickly hit that limit, and thus be latency bound.
I think that it's bandwidth-bound, because none of the memory (or
outer-level cache) accesses depend on the results of previous ones; so the loads can be started right away, up to the limit of memory-level
parallelism of the hardware. If the records are in RAM, the hardware
prefetcher can help to avoid running into the scheduler and ROB limits of the OoO engine.
I think our difference may be just terminology rather than substance.
To me, it is precisely the limit you mentioned that makes it latency
rather than bandwidth limited.
I mentioned several limits. Which one do you have in mind?
The one you mentioned in your last paragraph, specifically,
the limit of memory-level parallelism of the hardware.
Think of it this way. In the current
situation, increasing the memory system bandwidth, say by hypothetically increasing the number of memory banks, having a wider interface between
the memory and the core, etc., all traditional methods for increasing
memory bandwidth, would not improve the performance. On the other hand, doing things to reduce the memory latency (say hypothetically a faster
ram cell), would improve the performance.
If the CPU is designed to provide enough memory-level parallelism to
make use of the bandwidth (and that is likely, otherwise why provide
that much bandwidth), then once the designers spend money on
increasing the bandwidth, they will also spend the money necessary to increase the MLP.
No. The memory system throughput depends upon the access pattern. It
is easier/lower cost to increase the throughput for sequential accesses
than random (think wider interfaces, cache blocks larger than the amount accessed, etc.)
But optimization for sequential workloads can actually
hurt performance for random workloads, e.g. larger block sizes reduce
the number of accesses for sequential workloads, but each access takes longer, thus hurting random workloads. So
designers aim to maximize the throughput, subject to cost and technology constraints, for some mix of sequential (bandwidth) versus random (latency) access.
BGB wrote:
On 2/28/2026 9:48 AM, Terje Mathisen wrote:
This reminds me of CABAC decoding in h264, where the output of the
arithmetic decoder is single bits that by definition cannot be
predictable, but the codec typically uses that bit to branch.
Yeah.
Making arithmetic and range coders fast is also hard.
I don't often use them as much because I am not aware of a good way to
make them fast.
This is part of why I had often ended up going for STF+AdRice or
similar, which, while not the best in terms of compression, can be one
of the faster options in many cases.
Theoretically, table-driven Huffman could be faster, but likewise
often suffers from cache misses (cycles lost to cache misses can
outweigh the cost of the more complex logic of an AdRice coder).
Huffman speed can be improved by reducing maximum symbol length and
table size, but then this can lose much of its compression advantage.
Say, max symbol length:
10/11: Too short, limits effectiveness.
12: OK, leans faster;
13: OK, leans better compression;
14: Intermediate
15: Slower still (Deflate is here)
16: Slower (T.81 JPEG is here)
Where, for 12/13 bits, the fastest strategy is typically to use a
single big lookup table for the entropy decoder.
For 15 or 16 bits, it is often faster to have a separate fast-path and
slow path. Say, fast path matches on the first 9 or 10 bits, and then
the slow path falls back to a linear search (over the longer symbols).
I have looked at multi-level table lookups, where the symbol either is
the one you want (short codes) or an index into a list of secondary
tables to be used on the remaining bits.
When you have many really short codes (think Morse!), then you can profitably decode multiple in a single iteration.
In this case, the relative slowness of falling back to a linear search
being less than that of the cost of the L1 misses from a bigger lookup
table.
The relative loss of Huffman coding efficiency between a 13 bit limit
and 15 bit limit is fairly modest.
Yeah.
I would look for a way to handle multiple pixels at once, with SIMD
code: There the masking/combining is typically the easiest way to
implement short branches.
(I might take a look a png decoding at some point)
#if 0 //naive version, pays a lot for branch penalties
int BGBBTJ_BufPNG_Paeth(int a, int b, int c)
{
int p, pa, pb, pc;
p=a+b-c;
pa=(p>a)?(p-a):(a-p);
pb=(p>b)?(p-b):(b-p);
pc=(p>c)?(p-c):(c-p);
p=(pa<=pb)?((pa<=pc)?a:c):((pb<=pc)?b:c);
return(p);
}
#endif
#if 1 //avoid branch penalties
int BGBBTJ_BufPNG_Paeth(int a, int b, int c)
{
int p, pa, pb, pc;
int ma, mb, mc;
p=a+b-c;
pa=p-a; pb=p-b; pc=p-c;
ma=pa>>31; mb=pb>>31; mc=pc>>31;
pa=pa^ma; pb=pb^mb; pc=pc^mc;
ma=pb-pa; mb=pc-pb; mc=pc-pa;
ma=ma>>31; mb=mb>>31; mc=mc>>31;
p=(ma&((mb&c)|((~mb)&b))) | ((~ma)&((mc&c)|((~mc)&a)));
return(p);
}
#endif
Where, the Paeth filter is typically the most heavily used filter in
PNG decoding (because it tends to be the most accurate), but also the
slowest.
Could in theory be SIMD'ed to maybe work on RGB or RGBA in parallel.
OK
On 3/1/2026 12:02 PM, Terje Mathisen wrote:
BGB wrote:
On 2/28/2026 9:48 AM, Terje Mathisen wrote:
This reminds me of CABAC decoding in h264, where the output of the
arithmetic decoder is single bits that by definition cannot be
predictable, but the codec typically uses that bit to branch.
Yeah.
Making arithmetic and range coders fast is also hard.
I don't often use them as much because I am not aware of a good way to
make them fast.
This is part of why I had often ended up going for STF+AdRice or
similar, which, while not the best in terms of compression, can be one
of the faster options in many cases.
Theoretically, table-driven Huffman could be faster, but likewise
often suffers from cache misses (cycles lost to cache misses can
outweigh the cost of the more complex logic of an AdRice coder).
Huffman speed can be improved by reducing maximum symbol length and
table size, but then this can lose much its compression advantage.
Say, max symbol length:
10/11: Too short, limits effectiveness.
12: OK, leans faster;
13: OK, leans better compression;
14: Intermediate
15: Slower still (Deflate is here)
16: Slower (T.81 JPEG is here)
Where, for 12/13 bits, the fastest strategy is typically to use a
single big lookup table for the entropy decoder.
For 15 or 16 bits, it is often faster to have a separate fast-path and
slow path. Say, fast path matches on the first 9 or 10 bits, and then
the slow path falls back to a linear search (over the longer symbols).
I have looked at multi-level table lookups, where the symbol either is
the one you want (short codes) or an index into a list of secondary
tables to be used on the remaining bits.
Can work OK if one assumes all of the longer codes are prefixed by a
longish series of 1s, which is usually but not necessarily true.
Stephen Fuld <[email protected]d> wrote:
On 2/24/2026 11:33 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
Let me better explain what I was trying to set up, then you can tell me where I went wrong. I did expect the records to be sequential, and
could be pre-fetched, but with the inner loop so short, just a few
instructions, I thought that it would quickly "get ahead" of the
prefetch. That is, that there was a small limit on the number of
prefetches that could be in process simultaneously, and with such a
small CPU loop, it would quickly hit that limit, and thus be latency bound.
I think that it's bandwidth-bound, because none of the memory (or
outer-level cache) accesses depend on the results of previous ones; so
the loads can be started right away, up to the limit of memory-level
parallelism of the hardware. If the records are in RAM, the hardware
prefetcher can help to avoid running into the scheduler and ROB limits
of the OoO engine.
I think our difference may be just terminology rather than substance.
To me, it is precisely the limit you mentioned that makes it latency
rather than bandwidth limited. Think of it this way. In the current
situation, increasing the memory system bandwidth, say by hypothetically
increasing the number of memory banks, having a wider interface between
the memory and the core, etc., all traditional methods for increasing
memory bandwidth, would not improve the performance. On the other hand,
doing things to reduce the memory latency (say hypothetically a faster
ram cell), would improve the performance. To me, that is the definition
of being latency bound, not bandwidth bound.
I agree with your definition, but my prediction is somewhat different.
First, consider a silly program that goes sequentially over a larger
array accessing all lines. AFAICS you should see a tiny effect when
the program uses only one byte from each line compared to using the whole
line. Now consider a variant that accesses every fifth line.
There are differences. One is that the prefetcher needs to realize that
there is no need to prefetch intermediate lines. The second difference
is that one can fetch lines quickly only when they are on a single
page. Having "step 5" on lines means 5 times as many page crossings.
I do not know how big pages are in modern DRAM, but at a large
enough step you will see significant delay due to page crossings. I
would tend to call this delay "latency", but it is somewhat
murky. Namely, with enough prefetch and enough memory banks
you can still saturate a single channel to the core (assuming
that there are many cores, many channels from memory controller
to memory banks but only a single channel between memory controller
and each core). Of course, modern systems tend to have a limited
number of memory banks, so the argument above is purely theoretical.
Somewhat different case is when there are independent loads from
random locations, something like
for(i = 0; i < N; i++) {
s += m[f(i)];
}
where 'f' is very cheap to compute, but hard to predict by the
hardware. In case above reorder buffer and multiple banks helps,
but even with unlimited CPU resources the maximal number of accesses
is the number of memory banks divided by the access time of a single bank
(that is essentially the latency of the memory array).
Then there is pointer chasing case, like
for(i = 0; i < N; i++) {
j = m[j];
}
when 'm' is filled with a semi-random cyclic pattern this behaves
quite badly; basically you can start the next access only when you
have the result of the previous access. In practice, a large 'm'
seems to produce a large number of cache misses for TLB entries.
Perhaps this distinction is clearer to me due to my background in the
(hard) disk business. You want lower latency? Make the arm move faster
or spin the disk faster. You want higher bandwidth? Put more bits on a
track or interleave the data across multiple disk heads. And in a
system, the number of active prefetches is naturally limited by the
number of disk arms you have.
That disk analogy is flawed. AFAIK there is no penalty for choosing
"far away" pages compared to "near" ones (if anything the opposite: Row
Hammer shows that accesses to "near" pages mean that a given page may
need more frequent refresh). In the case of memory, time spent in the
memory controller is non-negligible, at least for accesses within a
single page. AFAIK random access to lines within a single page
costs no more than sequential access; for disk you want sequential
access to a single track.
On 2/28/2026 1:49 PM, Waldek Hebisch wrote:
Stephen Fuld <[email protected]d> wrote:
On 2/24/2026 11:33 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
Somewhat different case is when there are independent loads from
random locations, something like
for(i = 0; i < N; i++) {
s += m[f(i)];
}
where 'f' is very cheap to compute, but hard to predict by the
hardware. In case above reorder buffer and multiple banks helps,
but even with unlimited CPU resurces maximal number of accesses
is number of memory banks divided by access time of single bank
(that is essentialy latency of memory array).
Then there is pointer chasing case, like
for(i = 0; i < N; i++) {
j = m[j];
}
when 'm' is filled with semi-random cyclic pattern this behaves
quite badly, basically you can start next access only when you
have result of the previous access. In practice, large 'm'
seem to produce large number of cache misses for TLB entries.
Well, the examples I gave got confusing (my fault) because, as Anton
pointed out, the table I used would fit into L3 cache on many modern systems. So this all got tied up in the difference between in cache
results and in DRAM results. I don't disagree with your points, but
they are tangential to the point I was trying to make.
Let me try again. Suppose you had a (totally silly) program with a 2 GB array, and you used a random number to generate an address within it, then added the value at that addressed byte to an accumulator. Repeat say
10,000 times. I would call this program latency bound, but I suspect
Anton would call it bandwidth bound. If that is true, then that
explains the original differences Anton and I had.
Perhaps this distinction is clearer to me due to my background in the
(hard) disk business. You want lower latency? Make the arm move faster or spin the disk faster. You want higher bandwidth? Put more bits on a
track or interleave the data across multiple disk heads. And in a
system, the number of active prefetches is naturally limited by the
number of disk arms you have.
That disk analogy is flawed. AFAIK there is no penalty for choosing
"far away" pages compared to "near" ones (if anything the opposite: Row
Hammer shows that accesses to "near" pages mean that a given page may
need more frequent refresh). In the case of memory, time spent in the
memory controller is non-negligible, at least for accesses within a
single page. AFAIK random access to lines within a single page
costs no more than sequential access; for disk you want sequential
access to a single track.
Yes, but these are second order compared to the difference between
latency and bandwidth on a disk.
This whole thing has spiraled into something far beyond what I expected,
and what, I think, is useful. So unless you want me to, I probably
won't respond further.
BGB <[email protected]> posted:
On 3/1/2026 12:02 PM, Terje Mathisen wrote:
BGB wrote:
On 2/28/2026 9:48 AM, Terje Mathisen wrote:
This reminds me of CABAC decoding in h264, where the output of the
arithmetic decoder is single bits that by definition cannot be
predictable, but the codec typically uses that bit to branch.
Yeah.
Making arithmetic and range coders fast is also hard.
I don't often use them as much because I am not aware of a good way to make them fast.
This is part of why I had often ended up going for STF+AdRice or
similar, which, while not the best in terms of compression, can be one of the faster options in many cases.
Theoretically, table-driven Huffman could be faster, but likewise
often suffers from cache misses (cycles lost to cache misses can
outweigh the cost of the more complex logic of an AdRice coder).
Huffman speed can be improved by reducing maximum symbol length and
table size, but then this can lose much of its compression advantage.
Say, max symbol length:
10/11: Too short, limits effectiveness.
12: OK, leans faster;
13: OK, leans better compression;
14: Intermediate
15: Slower still (Deflate is here)
16: Slower (T.81 JPEG is here)
Where, for 12/13 bits, the fastest strategy is typically to use a
single big lookup table for the entropy decoder.
For 15 or 16 bits, it is often faster to have a separate fast-path and slow path. Say, fast path matches on the first 9 or 10 bits, and then
the slow path falls back to a linear search (over the longer symbols).
I have looked at multi-level table lookups, where the symbol either is
the one you want (short codes) or an index into a list of secondary
tables to be used on the remaining bits.
Can work OK if one assumes all of the longer codes are prefixed by a
longish series of 1s, which is usually but not necessarily true.
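As a sketch of the single-big-table strategy described above: each table entry packs the decoded symbol and its code length, so decoding a symbol is one lookup on the low TBL_BITS bits of the bit buffer plus a shift. The 12-bit table size matches the "leans faster" point in the list; the toy LSB-first code and all names here are my own assumptions, not from the post.

```python
TBL_BITS = 12  # single big table, 2**12 entries

def build_toy_table():
    """Toy LSB-first prefix code: '0' -> A (len 1), '10' -> B, '11' -> C."""
    table = [None] * (1 << TBL_BITS)
    for i in range(1 << TBL_BITS):
        if i & 1 == 0:
            table[i] = ("A", 1)   # first bit 0
        elif i & 2 == 0:
            table[i] = ("B", 2)   # bits 1,0
        else:
            table[i] = ("C", 2)   # bits 1,1
    return table

def decode(table, data, nsyms):
    """Decode nsyms symbols: refill a bit buffer, then one table lookup
    per symbol -- no per-bit branching, which is what makes it fast."""
    bitbuf, bitcnt, pos, out = 0, 0, 0, []
    for _ in range(nsyms):
        while bitcnt < TBL_BITS and pos < len(data):  # refill
            bitbuf |= data[pos] << bitcnt
            bitcnt += 8
            pos += 1
        sym, length = table[bitbuf & ((1 << TBL_BITS) - 1)]
        bitbuf >>= length
        bitcnt -= length
        out.append(sym)
    return out
```

The cache-miss concern in the post is exactly the `table` access: at 12 bits the table is small enough to stay resident, at 15-16 bits it often is not, hence the fast-path/slow-path split.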
In HW any pattern can be used. In SW only patterns that are almost
satisfied by the current ISA can be considered. Big difference.
In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
Stefan Monnier <[email protected]> writes:
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.
I can't remember seeing such arguments coming from compiler people, tho.
Actually, the IA-64 people could point to the work on VLIW (in
particular, Multiflow (trace scheduling) and Cydrome (software
pipelining)), which in turn is based on the work on compilers for
microcode.
That did not solve memory latency, but that's a problem even for OoO
cores.
I suspect a big part of the problem was tension between Intel and HP
where the only political solution was allowing the architects from both
sides to "dump in" their favorite ideas. A recipe for disaster.
The HP side had people like Bob Rau (Cydrome) and Josh Fisher
(Multiflow), and given their premise, the architecture is ok; somewhat
on the complex side, but they wanted to cover all the good ideas from
earlier designs; after all, it was to be the one architecture to rule
them all (especially performancewise). You cannot leave out a feature
that a competitor could then add to outperform IA-64.
The major problem was that the premise was wrong. They assumed that
in-order would give them a clock rate edge, but that was not the case,
right from the start (The 1GHz Itanium II (released July 2002)
competed with 2.53GHz Pentium 4 (released May 2002) and 1800MHz Athlon
XP (released June 2002)). They also assumed that explicit parallelism
would provide at least as much ILP as hardware scheduling of OoO CPUs,
but that was not the case for general-purpose code, and in any case,
they needed a lot of additional ILP to make up for their clock speed
disadvantage.
As I've said before: I worked at HP during IA64, and it was not driven
by technical issues, but rather political/financial issues.
On HP's side, IA64 was driven by HP Labs, which was an independent group doing technical investigations without any clear line to products. They
had to "sell" their ideas to the HP development groups, who could ignore them.
They managed to get some upper level HP managers interested in IA64,
and took that directly to Intel. The HP internal development groups (the ones making CPUs and server/workstation chipsets) did almost nothing with IA64 until after Intel announced the IA64 agreement.
IA64 was called PrecisionArchitecture-WideWord (PA-WW) by HP Labs as a
follow on to PA-RISC. The initial version of PA-WW had no register
interlocks whatsoever; code had to be written to know the L1 and L2
cache latency, and not touch the result registers too soon. This was
laughed out of the room, and they came back with interlocks in the next iteration. This happened in 1993-1994, which was before the Out-of-Order RISCs came to market (but they were in development in HP and Intel), so the IA64 decisions were being made in the time window before folks really got to see what OoO could do.
Also on HP's side, we had our own fab, which was having trouble keeping up with the rest of the industry. Designers felt performance was not predictable, and the fab's costs were escalating. The fab was going to
have trouble getting to 180nm and beyond. So HP wanted access to Intel's fabs, and that was part of the IA64 deal--we could make PA-RISC chips on Intel's fabs for a long time.
On Intel's side, Intel was divided very strongly geographically. At the time, Hillsboro was "winning" in the x86 CPU area, and Santa Clara was
on the outs (I think they did 860 and other failures like that). So
when Santa Clara heard of IA64, they jumped on the opportunity--a way to
leap past Hillsboro. IA64 solved the AMD problem--with all new IA64
patents, AMD couldn't clone it like x86, so management was interested. Technically, IA64 just had to be "as good as" x86, to make it worth
while to jump to a new architecture which removes their competitor. I
can see how even smart folks could get sucked in to thinking
"architecture doesn't matter, and this new one prevents clones, so we
should do it to eventually make more money".
Both companies had selfish mid-level managers who saw a way to pad their resumes to leap to VP of engineering almost anywhere else. And they were right--on HP's side, I think every manager involved moved to a promotion
at another company just before Merced came out. So IA64 was not going to
get canceled--the managers didn't want to admit they were wrong.
Both companies also saw IA64 as a way to kill off the RISC competitors.
And on this point, they were right, IA64 did kill the RISC minicomputer market.
The technical merits of IA64 don't make the top 5 in the list of reasons to do IA64 for either company.
But HP using Intel's fabs didn't work out well. HP's first CPU on
Intel's fabs was the 360MHz PA-8500. This was a disappointing step up
from the 240MHz PA-8200 (which was partly speed limited by external L1
cache memory, running at 4ns, and the 8500 moved to on-chip L1 cache
memory, removing that limit). It turned out Intel's fab advantage was consistency and yield, not speed, and so it would take tuning to get the speed up. Intel did this tuning with large teams, and this was not easy
for HP to replicate. And by this time, IBM was marketing a 180nm copper
wire SOI process which WAS much faster (and yields weren't a concern for
HP), so after getting the PA-8500 up to 550MHz after a lot of work, HP
jumped to IBM as a fab, and the speeds went up to 750MHz and then 875MHz with some light tuning (and a lot less work).
Everyone technically minded knew IA64 was technically not that great, but both companies had their reasons to do it anyway.
Kent
Stephen Fuld <[email protected]d> posted:
On 2/27/2026 1:52 AM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
On 2/24/2026 11:33 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
Let me better explain what I was trying to set up, then you can tell me
where I went wrong. I did expect the records to be sequential, and
could be pre-fetched, but with the inner loop so short, just a few
instructions, I thought that it would quickly "get ahead" of the
prefetch. That is, that there was a small limit on the number of
prefetches that could be in process simultaneously, and with such a
small CPU loop, it would quickly hit that limit, and thus be latency bound.
I think that it's bandwidth-bound, because none of the memory (or
outer-level cache) accesses depend on the results of previous ones; so
the loads can be started right away, up to the limit of memory-level
parallelism of the hardware. If the records are in RAM, the hardware
prefetcher can help to avoid running into the scheduler and ROB limits
of the OoO engine.
I think our difference may be just terminology rather than substance.
To me, it is precisely the limit you mentioned that makes it latency
rather than bandwidth limited.
I mentioned several limits. Which one do you have in mind?
The one you mentioned in your last paragraph, specifically,
the limit of memory-level parallelism of the hardware.
Think of it this way. In the current
situation, increasing the memory system bandwidth, say by hypothetically
increasing the number of memory banks, having a wider interface between
the memory and the core, etc., all traditional methods for increasing
memory bandwidth, would not improve the performance. On the other hand,
doing things to reduce the memory latency (say hypothetically a faster
ram cell), would improve the performance.
If the CPU is designed to provide enough memory-level parallelism to
make use of the bandwidth (and that is likely, otherwise why provide
that much bandwidth), then once the designers spend money on
increasing the bandwidth, they will also spend the money necessary to
increase the MLP.
No. The memory system throughput depends upon the access pattern. It
is easier/lower cost to increase the throughput for sequential accesses
than random (think wider interfaces, cache blocks larger than the amount
accessed, etc.)
It depends on the number of accesses and the ability to absorb the latency.
For example, say we have a memory system with 256 banks, and a latency of 10µs, and each access is for a page of memory at 5GB/s.
A page (4096B) at 5GB/s needs 800ns±
So, we need 12.5 Banks to saturate the data channel
And we have 16 busses each with 16 banks,
we can sustain 80GB/s
So, we need in excess of 200 outstanding requests
Each able to absorb slightly more than 10µs.
But this is more like a disk/flash system than main memory.
Once each bank (of which there are 256) has more than a 3-deep queue
the BW can be delivered as long as the requests have almost ANY order
other than targeting 1 (or few) bank(s).
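The figures in the post above hang together under Little's law; the following back-of-envelope check is my own arithmetic on the post's assumed numbers (4096B pages, 5GB/s per bus, 10µs latency, 16 busses):

```python
page_bytes = 4096
bank_bw    = 5e9        # 5 GB/s while streaming a page
latency    = 10e-6      # 10 µs access latency

# Time to stream one page: ~819 ns (the post rounds to 800ns).
transfer = page_bytes / bank_bw

# Banks needed to keep one 5 GB/s channel busy while hiding the
# latency: ~12 (the post's 12.5 uses the rounded 800 ns figure).
banks_per_channel = latency / transfer

# 16 busses at 5 GB/s each gives the stated 80 GB/s.
total_bw = 16 * bank_bw

# Little's law: requests in flight = throughput x latency,
# about 195 pages, i.e. "in excess of 200" once queueing slack
# (the 3-deep per-bank queues) is added.
outstanding = total_bw * latency / page_bytes
```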
But what you say is TRUE when one limits the interpretation to modern
CPUs, but not when one limits oneself to applications running on
modern CPUs requesting pages from long term storage. {how do you think
Data Bases work??}
But optimization for sequential workloads can actually
hurt performance for random workloads, e.g. larger block sizes reduce
the number of accesses for sequential workloads, but each access takes
longer, thus hurting random workloads. So
cpu designers minimize latency at a given BW, while
Long term store designers maximize BW at acceptable latency.
Completely different design points.
On 3/1/2026 1:13 PM, MitchAlsup wrote:
cpu designers minimize latency at a given BW, while
Long term store designers maximize BW at acceptable latency.
Completely different design points.
Perhaps that is true now, but it certainly didn't use to be. In
1979-1980 I wrote the microcode to add caching to my employer's disk
controller, making it the industry's first true cache disk controller.
This was almost all about reducing latency (from tens of milliseconds on
a non cache controller to hundreds of microseconds on a cache hit).
There was a small improvement in transfer rate, but the latency
reduction dominated the improvement.
Stephen Fuld <[email protected]d> writes:
On 3/1/2026 1:13 PM, MitchAlsup wrote:
cpu designers minimize latency at a given BW, while
Long term store designers maximize BW at acceptable latency.
Completely different design points.
Perhaps that is true now, but it certainly didn't use to be. In
1979-1980 I wrote the microcode to add caching to my employer's disk
controller, making it the industry's first true cache disk controller.
Let me guess: Your employer was purchased about five years
later by a former Secretary of the Treasury :-).
FWIW, about that same time, there were third-party
RAM-based disk units available for the systems
that many of the big-B customers were using. Not inexpensive,
but performed well (if still limited by disk controller
and host I/O bus bandwidth (1MB/s and 8MB/s respectively
on the medium systems line).
On 3/1/2026 1:13 PM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
On 2/27/2026 1:52 AM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
On 2/24/2026 11:33 PM, Anton Ertl wrote:
Stephen Fuld <[email protected]d> writes:
Let me better explain what I was trying to set up, then you can tell me
where I went wrong. I did expect the records to be sequential, and
could be pre-fetched, but with the inner loop so short, just a few
instructions, I thought that it would quickly "get ahead" of the
prefetch. That is, that there was a small limit on the number of
prefetches that could be in process simultaneously, and with such a
small CPU loop, it would quickly hit that limit, and thus be latency bound.
I think that it's bandwidth-bound, because none of the memory (or
outer-level cache) accesses depend on the results of previous ones; so
the loads can be started right away, up to the limit of memory-level
parallelism of the hardware. If the records are in RAM, the hardware
prefetcher can help to avoid running into the scheduler and ROB limits
of the OoO engine.
I think our difference may be just terminology rather than substance.
To me, it is precisely the limit you mentioned that makes it latency
rather than bandwidth limited.
I mentioned several limits. Which one do you have in mind?
The one you mentioned in your last paragraph, specifically,
the limit of memory-level parallelism of the hardware.
Think of it this way. In the current
situation, increasing the memory system bandwidth, say by hypothetically
increasing the number of memory banks, having a wider interface between
the memory and the core, etc., all traditional methods for increasing
memory bandwidth, would not improve the performance. On the other hand,
doing things to reduce the memory latency (say hypothetically a faster
ram cell), would improve the performance.
If the CPU is designed to provide enough memory-level parallelism to
make use of the bandwidth (and that is likely, otherwise why provide
that much bandwidth), then once the designers spend money on
increasing the bandwidth, they will also spend the money necessary to
increase the MLP.
No. The memory system throughput depends upon the access pattern. It
is easier/lower cost to increase the throughput for sequential accesses
than random (think wider interfaces, cache blocks larger than the amount
accessed, etc.)
It depends on the number of accesses and the ability to absorb the latency.
For example, say we have a memory system with 256 banks, and a latency of 10µs, and each access is for a page of memory at 5GB/s.
A page (4096B) at 5GB/s needs 800ns±
So, we need 12.5 Banks to saturate the data channel
And we have 16 busses each with 16 banks,
we can sustain 80GB/s
So, we need in excess of 200 outstanding requests
Each able to absorb slightly more than 10µs.
But this is more like a disk/flash system than main memory.
Yes. In particular, requests to main memory are typically the size of a cache block, not a page. That changes the calculations above.
Once each bank (of which there are 256) has more than a 3-deep queue
the BW can be delivered as long as the requests have almost ANY order
other than targeting 1 (or few) bank(s).
So you are saying that the system is bandwidth limited as long as the
CPU can sustain 768 (3 * 256) simultaneous prefetches in progress. OK :-)
But what you say is TRUE when one limits the interpretation to modern
CPUs, but not when one limits oneself to applications running on
modern CPUs requesting pages from long term storage. {how do you think
Data Bases work??}
OK. BTW, I have done database "stuff" since CODASYL database systems in
the 1970s through relational systems in the 1980s. But page-sized
accesses to external storage weren't what we were talking about.
But optimization for sequential workloads can actually
hurt performance for random workloads, e.g. larger block sizes reduce
the number of accesses for sequential workloads, but each access takes
longer, thus hurting random workloads. So
cpu designers minimize latency at a given BW, while
Long term store designers maximize BW at acceptable latency.
Completely different design points.
Perhaps that is true now, but it certainly didn't use to be. In
1979-1980 I wrote the microcode to add caching to my employer's disk controller, making it the industry's first true cache disk controller.
This was almost all about reducing latency (from tens of milliseconds on
a non cache controller to hundreds of microseconds on a cache hit).
There was a small improvement in transfer rate, but the latency
reduction dominated the improvement.
Stephen Fuld <[email protected]d> posted:
On 3/1/2026 1:13 PM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
cpu designers minimize latency at a given BW, while
Long term store designers maximize BW at acceptable latency.
Completely different design points.
Perhaps that is true now, but it certainly didn't use to be. In
1979-1980 I wrote the microcode to add caching to my employer's disk
controller, making it the industry's first true cache disk controller.
This was almost all about reducing latency (from tens of milliseconds on
a non cache controller to hundreds of microseconds on a cache hit).
There was a small improvement in transfer rate, but the latency
reduction dominated the improvement.
There is an SSD that can perform 3,300,000×4096B random read transfers
per second on a PCIe 5.0-×4 connector. That is 13.2GB/s over the PCIe
link which is BW limited to 15.x GB/s. Each "RAS" has a 70µs access delay.
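Those SSD figures are mutually consistent; checking them with Little's law (my arithmetic, using the numbers as quoted):

```python
iops    = 3.3e6      # 4096 B random reads per second
latency = 70e-6      # 70 µs access delay per read

# Sustained payload bandwidth: ~13.5 GB/s, in the ballpark of the
# quoted 13.2 GB/s once link/protocol overhead is accounted for.
payload_bw = iops * 4096

# Little's law: to sustain that rate the drive must keep
# iops x latency = 231 reads in flight at once.
in_flight = iops * latency
```

The ~231 concurrent reads is the SSD analogue of the 3-deep-per-bank queues in the DRAM example: high random throughput is bought with deep queues, not low latency.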
Technically, IA64 just had to be "as good as" x86, to make it worth
while to jump to a new architecture which removes their competitor.
I can see how even smart folks could get sucked in to thinking
"architecture doesn't matter, and this new one prevents clones, so
we should do it to eventually make more money".
Everyone technically minded knew IA64 was technically not that
great, but both companies had their reasons to do it anyway.
On 3/3/2026 11:01 AM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
On 3/1/2026 1:13 PM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
snip
cpu designers minimize latency at a given BW, while
Long term store designers maximize BW at acceptable latency.
Completely different design points.
Perhaps that is true now, but it certainly didn't use to be. In
1979-1980 I wrote the microcode to add caching to my employer's disk
controller, making it the industry's first true cache disk controller.
This was almost all about reducing latency (from tens of milliseconds on
a non cache controller to hundreds of microseconds on a cache hit).
There was a small improvement in transfer rate, but the latency
reduction dominated the improvement.
There is an SSD that can perform 3,300,000×4096B random read transfers
per second on a PCIe 5.0-×4 connector. That is 13.2GB/s over the PCIe
link which is BW limited to 15.x GB/s. Each "RAS" has a 70µs access delay.
Wow! But I think you will agree that design is unlikely to be used for
"mass market" hundreds of terabyte systems used for commercial database
systems, etc.
Let me try again. Suppose you had a (totally silly) program with a 2 GB
array, and you used a random number generator to generate an address
within it, then added the value at that addressed byte to an
accumulator. Repeat say 10,000 times. I would call this program latency
bound, but I suspect Anton would call it bandwidth bound. If that is
true, then that explains the original differences Anton and I had.
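The thought experiment can be written down directly. The key property is that no access depends on the result of a previous one; this sketch shrinks the array so it actually runs, and all names are mine:

```python
import random

def random_byte_sum(arr, n_accesses, seed=42):
    """Sum n_accesses randomly addressed bytes from arr.  Each load is
    independent of the last, so an OoO core (or a memory system with
    enough parallelism) may in principle overlap all the misses."""
    rng = random.Random(seed)
    acc = 0
    for _ in range(n_accesses):
        acc += arr[rng.randrange(len(arr))]
    return acc

# With a constant-valued array the result is exact regardless of which
# addresses are drawn: 10,000 accesses of byte value 3 sum to 30,000.
demo = random_byte_sum(bytes([3]) * (1 << 20), 10_000)
```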
Let me try again. Suppose you had a (totally silly) program with a 2 GB
array, and you used a random number generator to generate an address
within it, then added the value at that addressed byte to an
accumulator. Repeat say 10,000 times. I would call this program latency
bound, but I suspect Anton would call it bandwidth bound. If that is
true, then that explains the original differences Anton and I had.
I think in theory, this is not latency bound: assuming enough CPU and
memory parallelism in the implementation, it can be arbitrarily fast.
But in practice it will probably be significantly slower than if you
were to do a sequential traversal.
Indeed, in practice you may sometimes see the performance be correlated
with your memory latency, but if so it's only because your hardware
doesn't offer enough parallelism (e.g. not enough memory banks).
AFAIK, when people say "latency-bound" they usually mean that adding parallelism and/or bandwidth to your memory hierarchy won't help speed
it up (typically because of pointer-chasing). This is important,
because it's a *lot* more difficult to reduce memory latency than it is
to add bandwidth or parallelism.
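The pointer-chasing case described above is structurally different from the random-sum program: the next address is the previous load's result, so the misses serialize no matter how much bandwidth or bank parallelism is available. A toy sketch (my own example):

```python
def pointer_chase(next_idx, start, steps):
    """Follow a linked structure: load i, then load next_idx[i], ...
    Each address depends on the previous load's *value*, so extra
    bandwidth or memory banks cannot hide the per-load latency --
    this is the pattern usually meant by 'latency-bound'."""
    i = start
    for _ in range(steps):
        i = next_idx[i]
    return i

# A 4-element ring: 0 -> 1 -> 2 -> 3 -> 0 -> ...
ring = [1, 2, 3, 0]
```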
Stephen Fuld <[email protected]d> writes:
On 3/3/2026 11:01 AM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
On 3/1/2026 1:13 PM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
snip
cpu designers minimize latency at a given BW, while
Long term store designers maximize BW at acceptable latency.
Completely different design points.
Perhaps that is true now, but it certainly didn't use to be. In
1979-1980 I wrote the microcode to add caching to my employer's disk
controller, making it the industry's first true cache disk controller.
This was almost all about reducing latency (from tens of milliseconds on
a non cache controller to hundreds of microseconds on a cache hit).
There was a small improvement in transfer rate, but the latency
reduction dominated the improvement.
There is an SSD that can perform 3,300,000×4096B random read transfers
per second on a PCIe 5.0-×4 connector. That is 13.2GB/s over the PCIe
link which is BW limited to 15.x GB/s. Each "RAS" has a 70µs access delay.
Wow! But I think you will agree that design is unlikely to be used for
"mass market" hundreds of terabyte systems used for commercial database
systems, etc.
I think you'll find that the commercial databases are dominated
by high-end NVMe (PCI based SSDs) for working storage, with
spinning rust as archive storage.
For example, a 61TB NVMe PCIe gen5 SSD for USD7,700.00.
https://techatlantix.com/mzwmo61thclf-00aw7.html
On 3/3/2026 1:55 PM, Scott Lurndal wrote:
Stephen Fuld <[email protected]d> writes:
On 3/3/2026 11:01 AM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
On 3/1/2026 1:13 PM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
snip
cpu designers minimize latency at a given BW, while
Long term store designers maximize BW at acceptable latency.
Completely different design points.
Perhaps that is true now, but it certainly didn't use to be. In
1979-1980 I wrote the microcode to add caching to my employer's disk
controller, making it the industry's first true cache disk controller.
This was almost all about reducing latency (from tens of milliseconds on
a non cache controller to hundreds of microseconds on a cache hit).
There was a small improvement in transfer rate, but the latency
reduction dominated the improvement.
There is an SSD that can perform 3,300,000×4096B random read transfers
per second on a PCIe 5.0-×4 connector. That is 13.2GB/s over the PCIe
link which is BW limited to 15.x GB/s. Each "RAS" has a 70µs access delay.
Wow! But I think you will agree that design is unlikely to be used for
"mass market" hundreds of terabyte systems used for commercial database
systems, etc.
I think you'll find that the commercial databases are dominated
by high-end NVMe (PCI based SSDs) for working storage, with
spinning rust as archive storage.
For example, a 61TB NVMe PCIe gen5 SSD for USD7,700.00.
https://techatlantix.com/mzwmo61thclf-00aw7.html
I freely admit that I am "out of the loop" for modern systems. However,
this system costs about $125/TB. A quick check of Amazon shows typical
hard disk prices at about $25/TB. Are you saying that typical current
systems are paying about 5 times the price, and way greater than the
cost of the CPU for such systems? While obviously there is a market for
such systems, it is hard for me to believe that the typical "enterprise"
customer would be that market. Amazing!
On 3/3/2026 1:55 PM, Scott Lurndal wrote:
Stephen Fuld <[email protected]d> writes:
On 3/3/2026 11:01 AM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
On 3/1/2026 1:13 PM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
snip
cpu designers minimize latency at a given BW, while
Long term store designers maximize BW at acceptable latency.
Completely different design points.
Perhaps that it true now, but it certainly didn't use to be. In
1979-1980 I wrote the microcode to add caching to my employer's
disk controller, making it the industry's first true cache disk
controller. This was almost all about reducing latency (from
tens of milliseconds on a non cache controller to hundreds of
microseconds on a cache hit). There was a small improvement in
transfer rate, but the latency reduction dominated the
improvement.
There is an SSD that can perform 3,300,000×4096B random read
transfers per second on a PCIe 5.0-×4 connector. That is 13.2GB/s
over the PCIe link which is BW limited to 15.x GB/s. Each "RAS"
has a 70µs access delay.
Wow! But I think you will agree that design is unlikely to be
used for "mass market" hundreds of terabyte systems used for
commercial database systems, etc.
I think you'll find that the commercial databases are dominated
by high-end NVMe (PCI based SSDs) for working storage, with
spinning rust as archive storage.
For example, a 61TB MVME PCIe gen5 SSD for USD7,700.00.
https://techatlantix.com/mzwmo61thclf-00aw7.html
I freely admit that I am "out of the loop" for modern systems.
However, this system costs about $125/TB. A quick check of Amazon
shows typical hard disk prices at about $25 /TB.
Are you saying that
typical current systems are paying about 5 times the price, and way
greater than the cost of the CPU for such systems? While obviously
there is a market for such systems, it is hard for me to believe that
the typical "enterprise" customer would be that market. Amazing!
That's not a fair comparison.
Stefan Monnier wrote:
Let me try again. Suppose you had a (totally silly) program with
a 2 GB array, and you used a random number generator to generate an address
within it, then added the value at that addressed byte to an
accumulator. Repeat say 10,000 times. I would call this program
latency bound, but I suspect Anton would call it bandwidth bound.
If that is true, then that explains the original differences Anton
and I had.
I think in theory, this is not latency bound: assuming enough CPU
and memory parallelism in the implementation, it can be arbitrarily
fast. But in practice it will probably be significantly slower than
if you were to do a sequential traversal.
10K selected from 2G means average distance of 200K, so you get
effectively very close to zero cache hits, and even TLB misses might
be very significant unless you've set up huge pages.
Assuming TLB+$L3+$L2+$L1 misses on every access the actual runtime
will be horrible!
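Terje's gap estimate checks out with simple arithmetic on the post's numbers:

```python
array_bytes = 2e9     # 2 GB array
accesses    = 10e3    # 10K random accesses

# Average spacing between touched bytes: 200 KB, far beyond a cache
# line or a 4 KB page, so essentially every access misses all cache
# levels and (without huge pages) the TLB as well.
avg_gap = array_bytes / accesses
```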
Indeed, in practice you may sometimes see the performance be
correlated with your memory latency, but if so it's only because
your hardware doesn't offer enough parallelism (e.g. not enough
memory banks).
AFAIK, when people say "latency-bound" they usually mean that adding parallelism and/or bandwidth to your memory hierarchy won't help
speed it up (typically because of pointer-chasing). This is
important, because it's a *lot* more difficult to reduce memory
latency than it is to add bandwidth or parallelism.
When the working set does not allow any cache re-use, then a classic
Cray could perform much better than a modern OoO cpu.
Terje
On Wed, 4 Mar 2026 19:03:54 +0100
Terje Mathisen <[email protected]> wrote:
Stefan Monnier wrote:
Let me try again. Suppose you had a (totally silly) program with
a 2 GB array, and you used a random number generator to generate an address
within it, then added the value at that addressed byte to an
accumulator. Repeat say 10,000 times. I would call this program
latency bound, but I suspect Anton would call it bandwidth bound.
If that is true, then that explains the original differences Anton
and I had.
I think in theory, this is not latency bound: assuming enough CPU
and memory parallelism in the implementation, it can be arbitrarily
fast. But in practice it will probably be significantly slower than
if you were to do a sequential traversal.
10K selected from 2G means average distance of 200K, so you get
effectively very close to zero cache hits, and even TLB misses might
be very significant unless you've set up huge pages.
Relatively horrible.
At a human time scale it would still be very fast.
Assuming TLB+$L3+$L2+$L1 misses on every access the actual runtime
will be horrible!
Indeed, in practice you may sometimes see the performance be
correlated with your memory latency, but if so it's only because
your hardware doesn't offer enough parallelism (e.g. not enough
memory banks).
AFAIK, when people say "latency-bound" they usually mean that adding
parallelism and/or bandwidth to your memory hierarchy won't help
speed it up (typically because of pointer-chasing). This is
important, because it's a *lot* more difficult to reduce memory
latency than it is to add bandwidth or parallelism.
When the working set does not allow any cache re-use, then a classic
Cray could perform much better than a modern OoO cpu.
Terje
When the working set does not allow any cache re-use then it does not fit
in a classic Cray's main memory.
Besides, it is nearly impossible to create code that does something
useful and has no cache hits at all. At the very least, there will be
reuse on the instruction side. But I think that in order to completely
avoid reuse on the data side you'll have to do something unrealistic.
On 3/3/2026 11:01 AM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
On 3/1/2026 1:13 PM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
snip
cpu designers minimize latency at a given BW, while
Long term store designers maximize BW at acceptable latency.
Completely different design points.
Perhaps that is true now, but it certainly didn't use to be. In
1979-1980 I wrote the microcode to add caching to my employer's disk
controller, making it the industry's first true cache disk controller.
This was almost all about reducing latency (from tens of milliseconds on
a non cache controller to hundreds of microseconds on a cache hit).
There was a small improvement in transfer rate, but the latency
reduction dominated the improvement.
There is an SSD that can perform 3,300,000×4096B random read transfers
per second on a PCIe 5.0-×4 connector. That is 13.2GB/s over the PCIe
link which is BW limited to 15.x GB/s. Each "RAS" has a 70µs access delay.
Wow! But I think you will agree that design is unlikely to be used for
"mass market" hundreds of terabyte systems used for commercial database
systems, etc. (at least for a while). Do you know what its cost per
GB is? SSDs certainly solve the multi-millisecond access time problem of
hard disks, but at a high cost. I think that hard disk sales are not
going away for at least a while. :-)
Michael S wrote:
On Wed, 4 Mar 2026 19:03:54 +0100
Terje Mathisen <[email protected]> wrote:
Stefan Monnier wrote:
Let me try again. Suppose you had a (totally silly) program with
a 2 GB array, and you used a random number generator to generate an address
within it, then added the value at that addressed byte to an
accumulator. Repeat say 10,000 times. I would call this program
latency bound, but I suspect Anton would call it bandwidth bound.
If that is true, then that explains the original differences
Anton and I had.
I think in theory, this is not latency bound: assuming enough CPU
and memory parallelism in the implementation, it can be
arbitrarily fast. But in practice it will probably be
significantly slower than if you were to do a sequential
traversal.
10K selected from 2G means average distance of 200K, so you get
effectively very close to zero cache hits, and even TLB misses
might be very significant unless you've setup huge pages.
Relatively horrible.
At a human time scale it would still be very fast.
Assuming TLB+$L3+$L2+$L1 misses on every access the actual runtime
will be horrible!
Indeed, in practice you may sometimes see the performance be
correlated with your memory latency, but if so it's only because
your hardware doesn't offer enough parallelism (e.g. not enough
memory banks).
AFAIK, when people say "latency-bound" they usually mean that
adding parallelism and/or bandwidth to your memory hierarchy
won't help speed it up (typically because of pointer-chasing).
This is important, because it's a *lot* more difficult to reduce
memory latency than it is to add bandwidth or parallelism.
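Stephen's thought experiment can be sketched as below. This is a
hypothetical microbenchmark; the function name, parameters, and the
index trick are mine, not from the thread. The post assumes a 2 GB
array and 10,000 probes, at which point nearly every access misses
every cache level and the TLB.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

/* Sum bytes at random offsets in a large array. len and probes are
   parameters so the idea scales down for testing; the accumulator is
   the only loop-carried dependency, so an OoO core can overlap the
   loads if the memory system supplies enough parallelism. */
uint64_t random_probe_sum(const uint8_t *buf, size_t len,
                          int probes, unsigned seed)
{
    uint64_t acc = 0;
    srand(seed);
    for (int i = 0; i < probes; i++) {
        /* combine two rand() calls so the index can cover 2 GB */
        size_t idx = (((size_t)rand() << 16) ^ (size_t)rand()) % len;
        acc += buf[idx];    /* almost surely a cache + TLB miss at 2 GB */
    }
    return acc;
}
```

Whether this counts as latency-bound or bandwidth-bound is exactly the
question in dispute: each load's latency is huge, but nothing forces
the loads to execute one after another.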
When the working set does not allow any cache re-use, then a
classic Cray could perform much better than a modern OoO cpu.
Terje
When working set does not allow any cache re-use then it does not
fit in classic Cray's main memory.
The 1985 Cray-2 allowed 2GB, so it is theoretically possible, fitting
the OS+program into the 73 MB gap between 2GiB and 2E9.
Easily done on a later Cray-Y-MP.
Besides, it is nearly impossible to create a code that does
something useful and has no cache hits at all. At very least, there
will be reuse on instruction side. But I think that in order to
completely avoid reuse on the data side you'll have to do something non-realistic.
I think the current "gedankenexperiment" is way beyond "something
useful". :-)
Terje
On Wed, 4 Mar 2026 19:38:08 +0100
Terje Mathisen <[email protected]> wrote:
Michael S wrote:
On Wed, 4 Mar 2026 19:03:54 +0100
Terje Mathisen <[email protected]> wrote:
Stefan Monnier wrote:
Let me try again. Suppose you had a (totally silly) program with
a 2 GB array, and you used a random number generator to generate an address
within it, then added the value at that addressed byte to an
accumulator. Repeat say 10,000 times. I would call this program
latency bound, but I suspect Anton would call it bandwidth bound.
If that is true, then that explains the original differences
Anton and I had.
I think in theory, this is not latency bound: assuming enough CPU
and memory parallelism in the implementation, it can be
arbitrarily fast. But in practice it will probably be
significantly slower than if you were to do a sequential
traversal.
10K selected from 2G means average distance of 200K, so you get
effectively very close to zero cache hits, and even TLB misses
might be very significant unless you've setup huge pages.
Relatively horrible.
At a human time scale it would still be very fast.
Assuming TLB+$L3+$L2+$L1 misses on every access the actual runtime
will be horrible!
Indeed, in practice you may sometimes see the performance be
correlated with your memory latency, but if so it's only because
your hardware doesn't offer enough parallelism (e.g. not enough
memory banks).
AFAIK, when people say "latency-bound" they usually mean that
adding parallelism and/or bandwidth to your memory hierarchy
won't help speed it up (typically because of pointer-chasing).
This is important, because it's a *lot* more difficult to reduce
memory latency than it is to add bandwidth or parallelism.
When the working set does not allow any cache re-use, then a
classic Cray could perform much better than a modern OoO cpu.
Terje
When working set does not allow any cache re-use then it does not
fit in classic Cray's main memory.
The 1985 Cray-2 allowed 2GB, so theoretically possible with the
OS+program into the 73 MB gap between 2GiB and 2E9.
Easily done on a later Cray-Y-MP.
I had the Cray-1 in mind.
Cray-2 memory was big enough, but was it fast enough latency-wise?
All the info I see about Cray-2 memory praises its great capacity and
bandwidth, but says nothing about latency.
It seems the latency was huge and the whole system was useful only due
to a mechanism that today we would call hardware prefetch. Or software
prefetch? I am not sure.
But it is possible that I misunderstood.
Besides, it is nearly impossible to create a code that does
something useful and has no cache hits at all. At very least, there
will be reuse on instruction side. But I think that in order to
completely avoid reuse on the data side you'll have to do something
non-realistic.
I think the current "gedankenexperiment" is way beyond "something
useful". :-)
Agreed.
Starting from the latest post of Stephen Fuld (2026-03-03 03:02) we
left the realm of "useful" for good.
On Wed, 4 Mar 2026 07:44:39 -0800
Stephen Fuld <[email protected]d> wrote:
On 3/3/2026 1:55 PM, Scott Lurndal wrote:
Stephen Fuld <[email protected]d> writes: =20=20
On 3/3/2026 11:01 AM, MitchAlsup wrote: =20=20
Stephen Fuld <[email protected]d> posted:
=20
On 3/1/2026 1:13 PM, MitchAlsup wrote: =20
Stephen Fuld <[email protected]d> posted: =20
snip
=20
cpu designers minimize latency at a given BW, while
Long term store designers maximize BW at acceptable latency.
Completely different design points. =20
Perhaps that it true now, but it certainly didn't use to be. In
1979-1980 I wrote the microcode to add caching to my employer's
disk controller, making it the industry's first true cache disk
controller. This was almost all about reducing latency (from
tens of milliseconds on a non cache controller to hundreds of
microseconds on a cache hit). There was a small improvement in
transfer rate, but the latency reduction dominated the
improvement. =20
There is an SSD that can perform 3,300,000=D74096B random read
transfers per second on a PCIe 5.0-=D74 connector. That is 13.2GB/s
over the PCIe link which is BW limited to 15.x GB/s. Each "RAS"
has a 70=B5s access delay. =20
Wow! But I think you will agree that design is unlikely to be
used for "mass market" hundreds of terabyte systems used for
commercial database systems, etc. =20
I think you'll find that the commercial databases are dominated
by high-end NVMe (PCI based SSDs) for working storage, with
spinning rust as archive storage.
=20
For example, a 61TB MVME PCIe gen5 SSD for USD7,700.00.
=20
https://techatlantix.com/mzwmo61thclf-00aw7.html =20
I freely admit that I am "out of the loop" for modern systems.
However, this system costs about $125/TB. A quick check of Amazon
shows typical hard disk prices at about $25/TB.
That's not a fair comparison.
The $25/TB you see on Amazon is likely a 5400 rpm 4TB disk.
So, at 61 TB you only have 15 spindles. That means that even in ideal
conditions, with accesses distributed evenly across all disks, your
sequential read bandwidth is no better than ~1,200 MB/s.
For comparison, the sequential read speed of the BM1743 SSD is 7,500 MB/s.
In order to get comparable bandwidth with HDs you would need
95 spindles. Maybe a little fewer with 7200 rpm. I don't know how many
spindles you would need with the 13000 rpm HDs that enterprises used to
use for databases 20+ years ago. It seems they had better latency than
7200, but about the same bandwidth. Anyway, I'm not even sure that
anybody still makes them.
95 disks alone almost certainly cost more than 8KUSD. And on top of
that you will need a dozen expensive RAID controllers.
So, essentially, when you paid 8KUSD for this SSD, you paid for
bandwidth alone. The 100x improvement in latency you got over the
HD-based solution is a free bonus. Density and power too.
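The spindle arithmetic above can be checked back-of-envelope. The post
implies roughly 80 MB/s sequential per 5400 rpm disk (1,200 MB/s across
15 spindles); the function name and the 80 MB/s figure are my
assumptions, not the poster's exact numbers.

```c
/* How many whole disks are needed to reach a target sequential
   bandwidth, given each disk's sequential rate in MB/s. */
int spindles_for_bandwidth(int target_mb_s, int per_disk_mb_s)
{
    /* ceiling division: you cannot buy a fraction of a disk */
    return (target_mb_s + per_disk_mb_s - 1) / per_disk_mb_s;
}
```

At 80 MB/s per disk this gives 94 spindles to match the SSD's
7,500 MB/s, in line with the post's ~95.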
On Wed, 4 Mar 2026 19:03:54 +0100
Terje Mathisen <[email protected]> wrote:
10K selected from 2G means average distance of 200K, so you get
effectively very close to zero cache hits, and even TLB misses might
be very significant unless you've setup huge pages.
Relatively horrible.
At a human time scale it would still be very fast.
When the working set does not allow any cache re-use, then a classic
Cray could perform much better than a modern OoO cpu.
Terje
When working set does not allow any cache re-use then it does not fit
in classic Cray's main memory.
Besides, it is nearly impossible to create a code that does something
useful and has no cache hits at all. At very least, there will be
reuse on instruction side. But I think that in order to completely avoid
reuse on the data side you'll have to do something non-realistic.
In article <[email protected]>,
Michael S <[email protected]> wrote:
On Wed, 4 Mar 2026 19:03:54 +0100
Terje Mathisen <[email protected]> wrote:
10K selected from 2G means average distance of 200K, so you get
effectively very close to zero cache hits, and even TLB misses
might be very significant unless you've setup huge pages.
Relatively horrible.
At a human time scale it would still be very fast.
When the working set does not allow any cache re-use, then a
classic Cray could perform much better than a modern OoO cpu.
Terje
When working set does not allow any cache re-use then it does not fit
in classic Cray's main memory.
Besides, it is nearly impossible to create a code that does something
useful and has no cache hits at all. At very least, there will be
reuse on instruction side. But I think that in order to completely
avoid reuse on the data side you'll have to do something
non-realistic.
There is one very reasonable use case: testing a random number
generator. A useful test is to ensure numbers are uncorrelated, so
you get 3 random numbers called A, B, C, and you look up A*N*N + B*N
+ C to count the number of times you see A followed by B followed by
C, where N is the range of the random value, say, 0 - 1023. This
would be an array of 1 billion 32-bit values. You get 1000 billion
random numbers, and then look through to make sure most buckets have
a value around 1000. Any buckets less than 500 or more than 1500
might be considered a random number generator failure. This is a
useful test since it intuitively makes sense--if some patterns are
too likely (or unlikely), then you know you have a problem with your
"random" numbers.
Another use case would be an algorithm which wants to shuffle a large
array (say, you want to create test cases for a sorting algorithm). I
think most shuffling algorithms which are fair will randomly index
into the array, and each of these will be a cache miss.
Kent
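The shuffle use case can be sketched with a Fisher-Yates shuffle, the
standard fair-shuffle algorithm Kent is most likely alluding to. Each
iteration swaps with a random earlier element, so on a large array
nearly every step touches a cold cache line; rand() here stands in for
whatever PRNG is being exercised.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fair in-place shuffle: element i is swapped with a uniformly random
   element j in [0, i], working down from the end of the array. */
void shuffle(int *a, size_t n)
{
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1); /* random index: likely a miss */
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

(For a truly unbiased shuffle the modulo reduction should also be
rejection-sampled, but that detail doesn't change the access pattern.)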
On Wed, 4 Mar 2026 21:07:00 -0000 (UTC)
[email protected] (Kent Dickey) wrote:
In article <[email protected]>,
Michael S <[email protected]> wrote:
On Wed, 4 Mar 2026 19:03:54 +0100
Terje Mathisen <[email protected]> wrote:
10K selected from 2G means average distance of 200K, so you get
effectively very close to zero cache hits, and even TLB misses
might be very significant unless you've setup huge pages.
Relatively horrible.
At a human time scale it would still be very fast.
When the working set does not allow any cache re-use, then a
classic Cray could perform much better than a modern OoO cpu.
Terje
When working set does not allow any cache re-use then it does not fit
in classic Cray's main memory.
Besides, it is nearly impossible to create a code that does something
useful and has no cache hits at all. At very least, there will be
reuse on instruction side. But I think that in order to completely
avoid reuse on the data side you'll have to do something
non-realistic.
There is one very reasonable use case: testing a random number
generator. A useful test is to ensure numbers are uncorrelated, so
you get 3 random numbers called A, B, C, and you look up A*N*N + B*N
+ C to count the number of times you see A followed by B followed by
C, where N is the range of the random value, say, 0 - 1023. This
would be an array of 1 billion 32-bit values. You get 1000 billion
random numbers, and then look through to make sure most buckets have
a value around 1000. Any buckets less than 500 or more than 1500
might be considered a random number generator failure. This is a
useful test since it intuitively makes sense--if some patterns are
too likely (or unlikely), then you know you have a problem with your "random" numbers.
Even if there are no cache hits in accessing the main histogram, there
are still cache hits in the PRNG that you are testing. Unless it is a
very simple PRNG implemented completely in registers.
And even in the case of a very simple PRNG, standard PRNG APIs keep
state in memory, so in order to avoid memory accesses (= cache hits)
one would have to use a non-standard API.
Besides, there are other reasons why a modern big OoO will run rings
around a Cray, either 1 or 2, in that sort of test. Most important are:
1. Compilers are not perfect.
2. Even if compilers were perfect, a Cray has far fewer physical
registers than a big OoO, which would inevitably lead to far fewer
memory accesses running simultaneously.
On Wed, 4 Mar 2026 19:03:54 +0100
Terje Mathisen <[email protected]> wrote:
Stefan Monnier wrote:
Let me try again. Suppose you had a (totally silly) program with
a 2 GB array, and you used a random number generator to generate an address
within it, then added the value at that addressed byte to an
accumulator. Repeat say 10,000 times. I would call this program
latency bound, but I suspect Anton would call it bandwidth bound.
If that is true, then that explains the original differences Anton
and I had.
I think in theory, this is not latency bound: assuming enough CPU
and memory parallelism in the implementation, it can be arbitrarily
fast. But in practice it will probably be significantly slower than
if you were to do a sequential traversal.
10K selected from 2G means average distance of 200K, so you get
effectively very close to zero cache hits, and even TLB misses might
be very significant unless you've setup huge pages.
Relatively horrible.
At a human time scale it would still be very fast.
Assuming TLB+$L3+$L2+$L1 misses on every access the actual runtime
will be horrible!
Indeed, in practice you may sometimes see the performance be
correlated with your memory latency, but if so it's only because
your hardware doesn't offer enough parallelism (e.g. not enough
memory banks).
AFAIK, when people say "latency-bound" they usually mean that adding
parallelism and/or bandwidth to your memory hierarchy won't help
speed it up (typically because of pointer-chasing). This is
important, because it's a *lot* more difficult to reduce memory
latency than it is to add bandwidth or parallelism.
When the working set does not allow any cache re-use, then a classic
Cray could perform much better than a modern OoO cpu.
Terje
When working set does not allow any cache re-use then it does not fit
in classic Cray's main memory.
Besides, it is nearly impossible to create a code that does something
useful and has no cache hits at all. At very least, there will be
reuse on instruction side. But I think that in order to completely avoid reuse on the data side you'll have to do something non-realistic.
Michael S <[email protected]> posted:But how many clocks would it take to build gather-scatter list that is
On Wed, 4 Mar 2026 21:07:00 -0000 (UTC)
[email protected] (Kent Dickey) wrote:
In article <[email protected]>,
Michael S <[email protected]> wrote:
On Wed, 4 Mar 2026 19:03:54 +0100
Terje Mathisen <[email protected]> wrote:
10K selected from 2G means average distance of 200K, so you
get effectively very close to zero cache hits, and even TLB
misses might be very significant unless you've setup huge
pages.
Relatively horrible.
A human time scale it would still be very fast.
When the working set does not allow any cache re-use, then a
classic Cray could perform much better than a modern OoO cpu.
Terje
When working set does not allow any cache re-use then it does
not fit in classic Cray's main memory.
Besides, it is nearly impossible to create a code that does
something useful and has no cache hits at all. At very least,
there will be reuse on instruction side. But I think that in
order to completely avoid reuse on the data side you'll have to
do something non-realistic.
There is one very reasonable use case: testing a random number
generator. A useful test is to ensure numbers are uncorrelated, so
you get 3 random numbers called A, B, C, and you look up A*N*N +
B*N
+ C to count the number of times you see A followed by B followed
by C, where N is the range of the random value, say, 0 - 1023.
This would be an array of 1 billion 32-bit values. You get 1000
billion random numbers, and then look through to make sure most
buckets have a value around 1000. Any buckets less than 500 or
more than 1500 might be considered a random number generator
failure. This is a useful test since it intuitively makes
sense--if some patterns are too likely (or unlikely), then you
know you have a problem with your "random" numbers.
Even if there are no cache hits in access of main histogram, there
are still cache hits in PRNG that you are testing. Unless that is
very simple PRNG completely implemented in registers.
And even in case of very simple PRNG, standard PRNG APIs keep state
in memory, so in order to avoid memory accesses=cache hits one
would have to use non-standard API.
There are simple PRNGs that create very 'white' RNG sequences.
However, a generated RN is used to index a table of previously
computed RNs, and then swap the accessed one with the generated one.
The table goes a long way in 'whitening' the RNG.
So, good PRNGs are not memory reference free on the data side. But,
on the other hand, the table does not have to be "that big".
Besides, there are other reasons why modern big OoO will run rounds
around Cray, either 1 or 2, in that sort of test. Most important are
1. Compilers are not perfect.
2. Even if compilers were perfect, Cray has far fewer physical
register than Big OoO, which would inevitably lead to far fewer
memory accesses running simultaneously.
CRAY-Y/MP could run 2LDs and 1ST where the LDs were of gather-type and
STs of the scatter type. So, 3 instructions would create 192 memory references. And all 192 references could be 'satisfied' in 64-clocks.
I know of no GBOoO machine with 192 outstanding memory references, and
fewer still that can satisfy all 192 MRs in 64 clocks.
The problem was the <ahem> anemic clock rate of ~6ns. Modern CPUs are
30× faster, and "just as wide".
There are simple PRNGs that create very 'white' RNG sequences.
However, a generated RN is used to index a table of previously
computed RNs, and then swap the accessed one with the generated one.
The table goes a long way in 'whitening' the RNG.
So, good PRNGs are not memory reference free on the data side. But,
on the other hand, the table does not have to be "that big".
Stephen Fuld <[email protected]d> posted:
Perhaps that is true now, but it certainly didn't use to be. In
1979-1980 I wrote the microcode to add caching to my employer's disk
controller, making it the industry's first true cache disk controller.
This was almost all about reducing latency (from tens of milliseconds on
a non cache controller to hundreds of microseconds on a cache hit).
There was a small improvement in transfer rate, but the latency
reduction dominated the improvement.
There is an SSD that can perform 3,300,000×4096B random read transfers
per second on a PCIe 5.0-×4 connector.
Michael S <[email protected]> posted:
Besides, there are other reasons why modern big OoO will run rounds
around Cray, either 1 or 2, in that sort of test. Most important are
1. Compilers are not perfect.
2. Even if compilers were perfect, Cray has far fewer physical
register than Big OoO, which would inevitably lead to far fewer memory
accesses running simultaneously.
CRAY-Y/MP could run 2LDs and 1ST where the LDs were of gather-type and
STs of the scatter type. So, 3 instructions would create 192 memory references. And all 192 references could be 'satisfied' in 64-clocks.
I know of no GBOoO machine with 192 outstanding memory references, and
fewer still that can satisfy all 192 MRs in 64 clocks.
There are simple PRNGs that create very 'white' RNG sequences. However,
a generated RN is used to index a table of previously computed RNs,
and then swap the accessed one with the generated one. The table goes
a long way in 'whitening' the RNG.
MitchAlsup <[email protected]d> writes:
There are simple PRNGs that create very 'white' RNG sequences. However,
a generated RN is used to index a table of previously computed RNs,
and then swap the accessed one with the generated one. The table goes
a long way in 'whitening' the RNG.
Are you aware of any example code on Github?
Andy Valencia [2026-03-05 08:36:24] wrote:
MitchAlsup <[email protected]d> writes:
There are simple PRNGs that create very 'white' RNG sequences. However,
a generated RN is used to index a table of previously computed RNs,
and then swap the accessed one with the generated one. The table goes
a long way in 'whitening' the RNG.
Are you aware of any example code on Github?
What does this have to do with Github?
MitchAlsup <[email protected]d> writes:
There are simple PRNGs that create very 'white' RNG sequences.
However, a generated RN is used to index a table of previously
computed RNs, and then swap the accessed one with the generated
one. The table goes a long way in 'whitening' the RNG.
Are you aware of any example code on Github? I'd be interested in
the implementation details of a decent realization of this technique.
Thank you,
Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html
On Wed, 04 Mar 2026 23:46:46 GMT
MitchAlsup <[email protected]d> wrote:
Michael S <[email protected]> posted:
On Wed, 4 Mar 2026 21:07:00 -0000 (UTC)
[email protected] (Kent Dickey) wrote:
In article <[email protected]>,
Michael S <[email protected]> wrote:
On Wed, 4 Mar 2026 19:03:54 +0100
Terje Mathisen <[email protected]> wrote:
10K selected from 2G means average distance of 200K, so you
get effectively very close to zero cache hits, and even TLB
misses might be very significant unless you've setup huge
pages.
Relatively horrible.
At a human time scale it would still be very fast.
When the working set does not allow any cache re-use, then a
classic Cray could perform much better than a modern OoO cpu.
Terje
When working set does not allow any cache re-use then it does
not fit in classic Cray's main memory.
Besides, it is nearly impossible to create a code that does
something useful and has no cache hits at all. At very least,
there will be reuse on instruction side. But I think that in
order to completely avoid reuse on the data side you'll have to
do something non-realistic.
There is one very reasonable use case: testing a random number generator. A useful test is to ensure numbers are uncorrelated, so
you get 3 random numbers called A, B, C, and you look up A*N*N +
B*N
+ C to count the number of times you see A followed by B followed
by C, where N is the range of the random value, say, 0 - 1023.
This would be an array of 1 billion 32-bit values. You get 1000 billion random numbers, and then look through to make sure most
buckets have a value around 1000. Any buckets less than 500 or
more than 1500 might be considered a random number generator
failure. This is a useful test since it intuitively makes
sense--if some patterns are too likely (or unlikely), then you
know you have a problem with your "random" numbers.
Even if there are no cache hits in access of main histogram, there
are still cache hits in PRNG that you are testing. Unless that is
very simple PRNG completely implemented in registers.
And even in case of very simple PRNG, standard PRNG APIs keep state
in memory, so in order to avoid memory accesses=cache hits one
would have to use non-standard API.
There are simple PRNGs that create very 'white' RNG sequences.
However, a generated RN is used to index a table of previously
computed RNs, and then swap the accessed one with the generated one.
The table goes a long way in 'whitening' the RNG.
So, good PRNGs are not memory reference free on the data side. But,
on the other hand, the table does not have to be "that big".
Besides, there are other reasons why modern big OoO will run rounds around Cray, either 1 or 2, in that sort of test. Most important are
1. Compilers are not perfect.
2. Even if compilers were perfect, Cray has far fewer physical
register than Big OoO, which would inevitably lead to far fewer
memory accesses running simultaneously.
CRAY-Y/MP could run 2LDs and 1ST where the LDs were of gather-type and
STs of the scatter type. So, 3 instructions would create 192 memory references. And all 192 references could be 'satisfied' in 64-clocks.
But how many clocks would it take to build a gather-scatter list that
is reused only twice?
I know of no GBOoO machine with 192 outstanding memory references, and fewer still that can satisfy all 192 MRs in 64 clocks.
The problem was the <ahem> anemic clock rate of ~6ns. Modern CPUs are
30× faster, and "just as wide".
I think that even if its clock rate were 5 GHz, the Cray-Y/MP would
still be beaten by [1 core/thread of] a big OoO because of the
above-mentioned bottleneck. Of course, here I assume that a gather
still has a latency of 400ns, which would be 2000 clocks on our
imaginary machine.
If the latency of a gather were cut to 200 ns then I am no longer sure
of the outcome. At 100ns the Cray likely wins vs. 1 big OoO core/thread.
In practice, if quickness is of major importance, the task could be
partially parallelized to take advantage of the additional cores
present in modern gear. But doing so requires a programmer's mind.
Math people that do this type of test rarely possess one.
MitchAlsup <[email protected]d> writes:
There are simple PRNGs that create very 'white' RNG sequences. However,
a generated RN is used to index a table of previously computed RNs,
and then swap the accessed one with the generated one. The table goes
a long way in 'whitening' the RNG.
Are you aware of any example code on Github? I'd be interested in
the implementation details of a decent realization of this technique.
Thank you,
Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html
Stefan Monnier <[email protected]> writes:
Andy Valencia [2026-03-05 08:36:24] wrote:
MitchAlsup <[email protected]d> writes:
There are simple PRNGs that create very 'white' RNG sequences. However,
a generated RN is used to index a table of previously computed RNs,
and then swap the accessed one with the generated one. The table goes
a long way in 'whitening' the RNG.
Are you aware of any example code on Github?
What does this have to do with Github?
Andy is looking for examples. The most likely place to find them
this year, would be github (rather than sourceforge et alia).
MitchAlsup wrote:
Michael S <[email protected]> posted:
Besides, there are other reasons why modern big OoO will run rounds
around Cray, either 1 or 2, in that sort of test. Most important are
1. Compilers are not perfect. 2. Even if compilers were perfect, Cray
has far fewer physical
register than Big OoO, which would inevitably lead to far fewer memory
accesses running simultaneously.
CRAY-Y/MP could run 2LDs and 1ST where the LDs were of gather-type and
STs of the scatter type. So, 3 instructions would create 192 memory
references. And all 192 references could be 'satisfied' in 64-clocks.
I know of no GBOoO machine with 192 outstanding memory references, and
fewer still that can satisfy all 192 MRs in 64 clocks.
There are quite a few recent papers exploring having huge numbers of
Miss Status Holding Registers (MSHRs) in FPGAs using BRAMs,
allowing numbers like 2048 concurrent misses. e.g.:
Stop crying over your cache miss rate:
Handling efficiently thousands of outstanding misses in FPGAs, 2019
https://dl.acm.org/doi/pdf/10.1145/3289602.3293901
Also some papers exploring optimizing scatter-gathers. e.g.:
Piccolo: Large-scale graph processing with
fine-grained in-memory scatter-gather, 2025
https://arxiv.org/abs/2503.05116
Andy Valencia <[email protected]> posted:
MitchAlsup <[email protected]d> writes:
There are simple PRNGs that create very 'white' RNG sequences. However,
a generated RN is used to index a table of previously computed RNs,
and then swap the accessed one with the generated one. The table goes
a long way in 'whitening' the RNG.
Are you aware of any example code on Github? I'd be interested in
the implementation details of a decent realization of this technique.
Roughly:
# define shift (8)
# define mask ((1u << shift) - 1)      /* low bits select a table slot */
static unsigned tblindx = 0;
static unsigned table[ 1 << shift ];   /* seed with PRNG() output before use */
extern unsigned PRNG( void );
unsigned WhiteRNG( void )
{
    unsigned index, RNG;
    if( tblindx == 0 ) tblindx = PRNG();  /* refill the pool of indices */
    index = tblindx & mask;
    tblindx >>= shift;
    RNG = table[ index ];       /* hand out a previously computed RN */
    table[ index ] = PRNG();    /* and swap in a fresh one */
    return RNG;
}
will whiten any reasonable PRNG. There are a variety of ways to index
the table that make little difference in the outcome. 20 years ago I
knew the name for this, but I could not find it on my shelf.
Thank you,
Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html
EricP wrote:
MitchAlsup wrote:
Michael S <[email protected]> posted:
Besides, there are other reasons why modern big OoO will run rounds
around Cray, either 1 or 2, in that sort of test. Most important are
1. Compilers are not perfect. 2. Even if compilers were perfect, Cray
has far fewer physical
register than Big OoO, which would inevitably lead to far fewer memory
accesses running simultaneously.
CRAY-Y/MP could run 2LDs and 1ST where the LDs were of gather-type and
STs of the scatter type. So, 3 instructions would create 192 memory
references. And all 192 references could be 'satisfied' in 64-clocks.
I know of no GBOoO machine with 192 outstanding memory references, and
fewer still that can satisfy all 192 MRs in 64 clocks.
There are quite a few recent papers exploring having huge numbers of
Miss Status Holding Registers (MSHR) in FPGA's using BRAM's,
allowing numbers like 2048 concurrent misses. e.g.:
Stop crying over your cache miss rate:
Handling efficiently thousands of outstanding misses in FPGAs, 2019
https://dl.acm.org/doi/pdf/10.1145/3289602.3293901
Also some papers exploring optimizing scatter-gathers. e.g.:
Piccolo: Large-scale graph processing with
fine-grained in-memory scatter-gather, 2025
https://arxiv.org/abs/2503.05116
Thinking about optimizing scatter-gathers...
The current design is optimized for moving 64B+ECC cache lines.
The DIMM has 9 DDR DRAM chips which are read at the same row
and columns in parallel, with 4 clocks to read a 64/72 ECC word,
and 8 times that = 32 clocks for a 64B-ECC line.
It moves to the cache (more clocks) and moves up through the cache
hierarchy to the core where we finally extract/insert 1 data item.
But for scatter-gather (SG) we typically only need one data item
of 4 or 8 bytes out of that 64B line so most of that was waste.
Furthermore the DRAM chips on a DIMM are all used in lockstep,
with no concurrency between them.
It could be possible to send an SG read or write packet containing up to
64 64-bit physical addresses + data for writes to the memory controller,
and have it perform the whole scatter gather optimally.
Internally each DRAM chip is composed of some number of banks,
each bank with some number of subarrays, and each subarray with its
own independent sense amps and latches for its rows and columns of bits.
It could be possible to open a separate row on each subarray in each bank.
It could be possible to read/write multiple subarrays at the same time
with each subarray IO routed to a separate pin.
The SG memory controller would be redesigned to read and write
multiple 72 bit data items concurrently by:
- instead of storing individual bits of 9-bit bytes (data+parity) spread
across separate DRAM chips on a DIMM, store those 9bytes as bits in
the same row of a single subarray of a single DRAM chip.
- change the DRAM row size to be a multiple of 9-bit bytes
- a 72-bit word would be split as two 36-bit half words across pairs of
subarrays both read at once, which using DDR would be 18 clocks.
(Compared to the 32 clocks for reading whole 64B cache lines.)
- multiple subarray pairs can be opened and read/written concurrently
- each DIMM has 8 DRAM chips, and each chip can have all subarray pairs
open at different rows, and can read/write multiple 72-bit words
to different subarray pairs at once.
If each DIMM has 8 DRAM chips, and each chip had 8 subarray IO pins,
this could perform a 64-way scatter-gather of 72-bit words in as little
as 18 clocks with data words in separate subarrays. Or a worst case where
all the physical addresses land in the same subarray, 64*18 clocks.
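The best-case and worst-case figures above follow from a simple draining model. A sketch, with the channel count and per-word clocks taken from the proposal; the only assumption added is that the packet finishes when its most heavily loaded channel does:

```python
# Back-of-envelope timing for the proposed scatter-gather (SG) memory
# controller. Clock counts come from the description above; the
# "requests per channel" distribution is the only added assumption.
CLOCKS_PER_WORD = 18   # one 72-bit word as two 36-bit halves over DDR
N_REQUESTS      = 64   # physical addresses per SG packet
N_CHIPS         = 8    # DRAM chips per DIMM
PINS_PER_CHIP   = 8    # independent subarray IO pins per chip

channels = N_CHIPS * PINS_PER_CHIP  # 64 independent word streams

def sg_clocks(max_requests_on_one_channel):
    # The packet drains when the most heavily loaded channel finishes.
    return max_requests_on_one_channel * CLOCKS_PER_WORD

best_case  = sg_clocks(1)           # all 64 words hit distinct subarrays
worst_case = sg_clocks(N_REQUESTS)  # all 64 words land in one subarray
```

A real controller would add row-activation and bank-conflict overheads that this model ignores, so the 18-clock best case is a lower bound.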
Michael S <[email protected]> writes:
On Wed, 4 Mar 2026 07:44:39 -0800
Stephen Fuld <[email protected]d> wrote:
On 3/3/2026 1:55 PM, Scott Lurndal wrote:
Stephen Fuld <[email protected]d> writes:
On 3/3/2026 11:01 AM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
On 3/1/2026 1:13 PM, MitchAlsup wrote:
Stephen Fuld <[email protected]d> posted:
snip
cpu designers minimize latency at a given BW, while
Long term store designers maximize BW at acceptable latency.
Completely different design points.
Perhaps that is true now, but it certainly didn't use to be. In
1979-1980 I wrote the microcode to add caching to my employer's
disk controller, making it the industry's first true cache disk
controller. This was almost all about reducing latency (from
tens of milliseconds on a non-cache controller to hundreds of
microseconds on a cache hit). There was a small improvement in
transfer rate, but the latency reduction dominated the
improvement.
There is an SSD that can perform 3,300,000 × 4096B random read
transfers per second over a PCIe 5.0 ×4 connector. That is 13.2GB/s
over the PCIe link, which is BW-limited to 15.x GB/s. Each "RAS"
has a 70µs access delay.
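The quoted link and transfer numbers can be sanity-checked. A quick sketch: PCIe 5.0 signals at 32 GT/s per lane with 128b/130b encoding, which reproduces the "15.x GB/s" link limit; the raw 4096-byte payload math comes out a bit above the quoted 13.2 GB/s, presumably because that figure accounts for protocol overhead:

```python
# Sanity check of the SSD figures quoted above.
GTS_PER_LANE = 32e9        # PCIe 5.0: 32 GT/s per lane
LANES        = 4
ENCODING     = 128 / 130   # 128b/130b line code

link_bw = GTS_PER_LANE * LANES / 8 * ENCODING  # bytes/second, ~15.75 GB/s

iops    = 3_300_000
payload = iops * 4096                          # bytes/second of user data
```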
Wow! But I think you will agree that design is unlikely to be
used for "mass market" hundreds-of-terabytes systems used for
commercial database systems, etc.
I think you'll find that the commercial databases are dominated
by high-end NVMe (PCI based SSDs) for working storage, with
spinning rust as archive storage.
For example, a 61TB NVMe PCIe gen5 SSD for USD 7,700.00:
https://techatlantix.com/mzwmo61thclf-00aw7.html
I freely admit that I am "out of the loop" for modern systems.
However, this system costs about $125/TB. A quick check of Amazon
shows typical hard disk prices at about $25/TB.
That's not a fare comparison.
Indeed, it's not even a fair comparison :-)
The $25/TB you see on Amazon is likely a 5400 rpm 4TB disk.
So, at 61 TB you only have 15 spindles. It means that even in ideal
conditions of accesses distributed evenly to all disks your sequential
read bandwidth is no better than ~1,200 MB/s.
For comparison, sequential read speed of BM1743 SSD is 7,500 MB/s.
In order to get comparable bandwidth with HDs you will need
95 spindles. Maybe a little less with 7200 rpm. I don't know how many
spindles you would need with the 15,000 rpm HDs that enterprises used
for databases 20+ years ago. It seems they had better latency than
7200 rpm, but about the same bandwidth. Anyway, I'm not even sure that
anybody still makes them.
95 disks alone almost certainly cost more than 8KUSD. And on top of
that you will need a dozen expensive RAID controllers.
So, essentially, when you paid 8KUSD for this SSD, you paid it for
bandwidth alone. 100x improvement in latency that you got over HD-based
solution is a free bonus. Density and power too.
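The spindle arithmetic above checks out. A sketch, assuming 4 TB drives at $25/TB (~$100 each) and the ~1,200 MB/s aggregate quoted for 15 spindles (i.e. ~80 MB/s sustained per 5400 rpm spindle):

```python
import math

# Arithmetic behind the spindle comparison above. Drive price and
# per-spindle rate are assumptions derived from the figures quoted.
SSD_TB, SSD_USD = 61, 7700
HDD_TB, HDD_USD = 4, 100
SSD_SEQ_MBPS    = 7500
HDD_SEQ_MBPS    = 1200 / 15  # ~80 MB/s sustained per spindle

spindles_for_capacity  = math.ceil(SSD_TB / HDD_TB)            # ~16
spindles_for_bandwidth = math.ceil(SSD_SEQ_MBPS / HDD_SEQ_MBPS)  # ~94
hdd_only_cost          = spindles_for_bandwidth * HDD_USD
```

So ~16 spindles match the capacity, but ~94 (call it 95 in round numbers) are needed to match the sequential bandwidth, and the drives alone already cost more than the $7,700 SSD before adding controllers, enclosures and power.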
Density and power are the most important criteria in modern
datacenters. Reliability is also a consideration. While
the backblaze drive reports show moderately reasonable results
for most spinning rust, the NVME SSD is far more reliable and
consumes far less power and rack space than the equivalent
hard disk would.
Another advantage of NVMe cards is the availability of
PCI SR-IOV, which allows the NVMe card to be partitioned
and made available to multiple independent guests without
sacrificing security. The downside of host-based NVMe is
the inability to share bandwidth with multiple hosts. That
downside is eliminated with external NVMe-based RAID
subsystems connected via 400GbE or FC.
https://www.truenas.com/r-series/r60/
7PB at 60GB/sec.
Idle:
Wonders if some of the approaches used in SSDs could be used to make higher-density DRAM.
Say, rather than 1-bit per DRAM cell, they can hold multiple bits.
Granted, this might require the RAM to implement its own refresh process
to retain stability; and possibly some changes to the RAM protocols
(events like selecting rows may need to be made to be able to handle variable latency).
Say, for example, rather than changing a row and waiting a fixed RAS
latency, one changes a row and waits for the RAM to signal back
that the active row has been changed. Likely CAS could remain fixed
latency though.
Say, for example, if RAS moves a row into an internal SRAM cache, with
CAS accessing this SRAM. Closing the row or changing rows then writing
these back to the internal DRAM cells, with a per-row autonomous refresh built into the RAM.
Like with SSDs, there could probably be invisible ECC in the RAM to deal with cases where the DRAM cells deteriorate between refresh cycles.
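As an aside on the "invisible ECC" idea: the 64B+ECC lines and 72-bit words elsewhere in this thread already carry SECDED ECC. A quick sketch of where the 8 extra bits per 64 data bits come from, assuming the conventional Hamming-plus-overall-parity construction:

```python
# SECDED (single-error-correct, double-error-detect) check-bit count.
# Single-error correction needs r check bits satisfying the Hamming
# bound 2**r >= data_bits + r + 1, plus one extra overall-parity bit
# for double-error detection.
def secded_check_bits(data_bits):
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # +1 overall parity bit for double-error detection

# 64 data bits -> 8 check bits -> the familiar 72-bit DRAM word.
word_bits = 64 + secded_check_bits(64)
```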
...
Cray-2 memory was big enough, but was it fast enough latency wise?
All info I see about Cray-2 memory praises its great capacity and
bandwidth, but tells nothing about latency.
It seems the latency was huge and the whole system was useful only due
to a mechanism that today we would call hardware prefetch. Or software
prefetch? I am not sure.
But it is possible that I misunderstood.
On Wed, 04 Mar 2026 23:46:46 GMT
MitchAlsup <[email protected]d> wrote:
There are simple PRNGs that create very 'white' RNG sequences.
However, a generated RN is used to index a table of previously
computed RNs, and the accessed one is then swapped with the generated
one. The table goes a long way in 'whitening' the RNG.
So, good PRNGs are not memory reference free on the data side. But,
on the other hand, the table does not have to be "that big".
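The table-swap scheme just described can be sketched in a few lines. This is in the style of the Bays-Durham/Knuth 3.2.2 shuffles; the raw generator here (an LCG with Knuth's MMIX constants) and the table size are purely illustrative choices, not anything from the post:

```python
# Sketch of the table-whitening scheme described above: a raw
# generator feeds a small table, and each fresh value is swapped
# with the table entry it indexes.
MASK64 = (1 << 64) - 1
TABLE_SIZE = 64

class WhitenedPRNG:
    def __init__(self, seed):
        self.state = (seed & MASK64) or 1
        # Pre-fill the table with raw generator output.
        self.table = [self._raw() for _ in range(TABLE_SIZE)]

    def _raw(self):
        # 64-bit LCG step (illustrative raw generator).
        self.state = (self.state * 6364136223846793005
                      + 1442695040888963407) & MASK64
        return self.state

    def next(self):
        fresh = self._raw()
        i = fresh % TABLE_SIZE                     # fresh RN indexes the table
        out, self.table[i] = self.table[i], fresh  # swap accessed <-> generated
        return out

rng = WhitenedPRNG(12345)
samples = [rng.next() for _ in range(1000)]
```

Note that this is exactly the memory-reference cost being discussed: every call does one table read and one table write, though a 64-entry table trivially stays cache-resident.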
I know how to build extremely good (but not crypto quality) and
reasonably fast PRNG completely in CPU registers (hint: all modern cores
can do a round of AES as a single reg-reg instruction). But such PRNG
can not have standard API.
Michael S <[email protected]> schrieb:
Cray-2 memory was big enough, but was it fast enough latency wise?
All info I see about Cray-2 memory praises its great capacity and bandwidth, but tells nothing about latency.
It seems, the latency was huge and the whole system was useful only
due to mechanism that today we will call hardware prefetch. Or
software prefetch? I am not sure.
But it is possible that I misunderstood.
The Cray architecture had vector registers, which was an advantage
over the memory-to-memory architectures like the Cyber 205.
And it could run calculations in parallel with memory operations.
I guess you can call loading a vector register from memory a
"software prefetch" if you want, but it would be a bit of a
stretch of the terminology.
Michael S <[email protected]> schrieb:
On Wed, 04 Mar 2026 23:46:46 GMT
MitchAlsup <[email protected]d> wrote:
There are simple PRNGs that create very 'white' RNG sequences.
However, a generated RN is used to index a table of previously
computed RNs, and then swap the accessed one with the generated
one. The table goes a long way in 'whitening' the RNG.
So, good PRNGs are not memory reference free on the data side. But,
on the other hand, the table does not have to be "that big".
I know how to build extremely good (but not crypto quality) and
reasonably fast PRNG completely in CPU registers (hint: all modern
cores can do a round of AES as a single reg-reg instruction). But
such PRNG can not have standard API.
You can use "darn" on POWER, of course :-)
But why no standard API?
On Sat, 7 Mar 2026 13:29:00 -0000 (UTC)
Thomas Koenig <[email protected]> wrote:
Michael S <[email protected]> schrieb:
On Wed, 04 Mar 2026 23:46:46 GMT
MitchAlsup <[email protected]d> wrote:
There are simple PRNGs that create very 'white' RNG sequences.
However, a generated RN is used to index a table of previously
computed RNs, and then swap the accessed one with the generated
one. The table goes a long way in 'whitening' the RNG.
So, good PRNGs are not memory reference free on the data side. But,
on the other hand, the table does not have to be "that big".
I know how to build extremely good (but not crypto quality) and reasonably fast PRNG completely in CPU registers (hint: all modern
cores can do a round of AES as a single reg-reg instruction). But
such PRNG can not have standard API.
You can use "darn" on POWER, of course :-)
But why no standard API?
Standard PRNG APIs assume that PRNG state is stored in some type of
object. In older, more primitive APIs, like the C RTL's rand/srand,
there is one global object. In more modern APIs, the user can have as
many objects as he wishes, but they are still objects, somewhere in
memory. One API call delivers one random number and updates an object
(or the object). So, there is inevitably at least one memory read and
one memory write going on per API call.
In the scenario described by Kent Dickey, accesses to the PRNG object
will have very good temporal locality, which means that on CPUs with
cache they will have a very good hit rate. Which means that CPUs with
cache will do that part of the test much faster than cacheless Cray
processors.
Michael S <[email protected]> schrieb:
On Wed, 04 Mar 2026 23:46:46 GMT
MitchAlsup <[email protected]d> wrote:
There are simple PRNGs that create very 'white' RNG sequences.
However, a generated RN is used to index a table of previously
computed RNs, and then swap the accessed one with the generated
one. The table goes a long way in 'whitening' the RNG.
So, good PRNGs are not memory reference free on the data side. But,
on the other hand, the table does not have to be "that big".
I know how to build extremely good (but not crypto quality) and
reasonably fast PRNG completely in CPU registers (hint: all modern
cores can do a round of AES as a single reg-reg instruction). But
such PRNG can not have standard API.
You can use "darn" on POWER, of course :-)
But why no standard API?
Stefan Monnier <[email protected]> writes:
...What does this have to do with Github?
It's the industry standard way to share code?
Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html
BGB <[email protected]> posted:
----see how easy it is to snip useless material-------
Idle:
Wonders if some of the approaches used in SSDs could be used to
make higher-density DRAM.
As far as putting 2-bits in a single DRAM cell, yes you could do it.
It comes with 2 consequences::
a) RAS time would at least double (30ns->60ns) but likely go up 4×
b) refresh rate would go up by factor of 4× minimum.
Say, rather than 1-bit per DRAM cell, they can hold multiple bits.
Granted, this might require the RAM to implement its own refresh process
to retain stability; and possibly some changes to the RAM protocols
(events like selecting rows may need to be made to be able to handle
variable latency).
Say, for example, rather than changing a row and waiting a fixed RAS
latency, it is changing a row and waiting for the RAM to signal back
that the active row has been changed. Likely CAS could remain fixed
latency though.
Say, for example, if RAS moves a row into an internal SRAM cache, with
CAS accessing this SRAM. Closing the row or changing rows then writing
these back to the internal DRAM cells, with a per-row autonomous refresh
built into the RAM.
Like with SSDs, there could probably be invisible ECC in the RAM to deal
with cases where the DRAM cells deteriorate between refresh cycles.
...
On 3/4/2026 11:17 AM, Michael S wrote:
On Wed, 4 Mar 2026 19:38:08 +0100
Terje Mathisen <[email protected]> wrote:
Michael S wrote:
On Wed, 4 Mar 2026 19:03:54 +0100
Terje Mathisen <[email protected]> wrote:
Stefan Monnier wrote:
Let me try again. Suppose you had a (totally silly) program
with a 2 GB array, and you used a random number to generate an
address within it, then added the value at that addressed byte
to an accumulator. Repeat say 10,000 times. I would call
this program latency bound, but I suspect Anton would call it
bandwidth bound. If that is true, then that explains the
original differences Anton and I had.
I think in theory, this is not latency bound: assuming enough
CPU and memory parallelism in the implementation, it can be
arbitrarily fast. But in practice it will probably be
significantly slower than if you were to do a sequential
traversal.
10K selected from 2G means average distance of 200K, so you get
effectively very close to zero cache hits, and even TLB misses
might be very significant unless you've set up huge pages.
Relatively horrible.
At a human time scale it would still be very fast.
Assuming TLB+$L3+$L2+$L1 misses on every access the actual
runtime will be horrible!
Indeed, in practice you may sometimes see the performance be
correlated with your memory latency, but if so it's only because
your hardware doesn't offer enough parallelism (e.g. not enough
memory banks).
AFAIK, when people say "latency-bound" they usually mean that
adding parallelism and/or bandwidth to your memory hierarchy
won't help speed it up (typically because of pointer-chasing).
This is important, because it's a *lot* more difficult to reduce
memory latency than it is to add bandwidth or parallelism.
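The random-accumulator program from the start of this exchange is worth pinning down, because its loads are independent rather than pointer-chasing. A scaled-down sketch (a 1 MiB array standing in for the 2 GB one, with a fixed seed; sizes are illustrative only):

```python
import random

# Scaled-down model of the random-access accumulator discussed above.
# The real experiment uses a 2 GB array so essentially every access
# misses the TLB and all cache levels.
ARRAY_SIZE = 1 << 20   # 1 MiB stand-in for the 2 GB array
N_ACCESSES = 10_000

data = bytes(i & 0xFF for i in range(ARRAY_SIZE))

rng = random.Random(42)   # fixed seed for reproducibility
acc = 0
for _ in range(N_ACCESSES):
    # Each address depends only on the PRNG, never on the value just
    # loaded - so the loads are independent and can overlap in an OoO
    # core, unlike true pointer-chasing.
    acc += data[rng.randrange(ARRAY_SIZE)]
```

That independence is exactly why one can argue the loop is bandwidth/parallelism-bound in principle, even though on real hardware its throughput often tracks memory latency.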
When the working set does not allow any cache re-use, then a
classic Cray could perform much better than a modern OoO cpu.
Terje
When working set does not allow any cache re-use then it does not
fit in classic Cray's main memory.
The 1985 Cray-2 allowed 2GB, so theoretically possible with the
OS+program in the ~147 MB gap between 2GiB and 2E9 bytes.
Easily done on a later Cray-Y-MP.
I had Cray-1 in mind.
Cray-2 memory was big enough, but was it fast enough latency wise?
All info I see about Cray-2 memory praises its great capacity and bandwidth, but tells nothing about latency.
It seems the latency was huge and the whole system was useful only
due to a mechanism that today we would call hardware prefetch. Or
software prefetch? I am not sure.
But it is possible that I misunderstood.
Well, Cray had that vector thing going for it. :-) And, as Mitch
has repeatedly pointed out, the memory bandwidth to support it. And "reasonable" memory latency for the non-vector operations.
BGB <[email protected]> posted:
----see how easy it is to snip useless material-------
Idle:
Wonders if some of the approaches used in SSDs could be used to
make higher-density DRAM.
As far as putting 2-bits in a single DRAM cell, yes you could do it.
Roughly: This is a modification of Knuth's Algorithm M in section
3.2.2 of The Art of Computer Programming, Vol. 2, Seminumerical
Algorithms, where there is a detailed discussion of it.
It does help to make simple LFSR PRNGs much better, but xorshift128+
(or others like it) is better for 64-bit CPUs (see
https://en.wikipedia.org/wiki/Xorshift).
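For reference, xorshift128+ is exactly the kind of register-resident PRNG being discussed: two 64-bit state words and a handful of shifts and xors. A sketch using Vigna's original parameter set (23/17/26); the seed values are arbitrary, and Python needs explicit masking to emulate 64-bit wraparound:

```python
# Minimal xorshift128+ sketch - two 64-bit state words, no memory
# traffic beyond the state itself.
MASK64 = (1 << 64) - 1

class Xorshift128Plus:
    def __init__(self, s0, s1):
        assert (s0 | s1) != 0, "state must not be all zero"
        self.s0, self.s1 = s0 & MASK64, s1 & MASK64

    def next(self):
        s1, s0 = self.s0, self.s1
        self.s0 = s0
        s1 ^= (s1 << 23) & MASK64           # a = 23
        self.s1 = s1 ^ s0 ^ (s1 >> 17) ^ (s0 >> 26)  # b = 17, c = 26
        return (self.s1 + s0) & MASK64

stream = Xorshift128Plus(0x0123456789ABCDEF, 0xFEDCBA9876543210)
first_hundred = [stream.next() for _ in range(100)]
```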
I guess you can call loading a vector register from memory a
"software prefetch" if you want, but it would be a bit of a
stretch of the terminology.
On Sat, 07 Mar 2026 01:49:31 GMT
MitchAlsup <[email protected]d> wrote:
BGB <[email protected]> posted:
----see how easy it is to snip useless material-------
As far as putting 2-bits in a single DRAM cell, yes you could do it.
Idle:
Wonders if some of the approaches used in SSDs could be used to
make higher-density DRAM.
How exactly?
With 1 bit per cell, you pre-charge a bit line to a mid-level voltage.
Then you connect your storage capacitor to the bit line and it pulls it
slightly up or pushes it slightly down. Then an open-loop sense amplifier
detects this slight change in the voltage of the bit line.
I don't see how any of that will work to distinguish between 4 levels of
voltage instead of 2 levels. Much less so, to distinguish between 16
levels, as in QLC flash.
In flash memory it works very differently.
Not that I understand how it works completely, but I understand enough
to know that a flash cell's charge is not discharged into the higher
capacitance of a bit line on every read operation.
Thomas Koenig <[email protected]> writes:
I guess you can call loading a vector register from memory a
"software prefetch" if you want, but it would be a bit of a
stretch of the terminology.
That's like saying that Humpty Dumpty stretched the terminology of
"glory".
Software prefetch instructions are architectural noops.
Microarchitecturally, they may load the accessed memory into a cache. Likewise, hardware prefetchers load some memory into a cache.
Loading a vector register from memory has an architectural effect.
And given that the classic Crays don't have caches, prefetching
instructions would always be noops.
- anton
On 07/03/2026 23:28, Michael S wrote:
On Sat, 07 Mar 2026 01:49:31 GMT
MitchAlsup <[email protected]d> wrote:
BGB <[email protected]> posted:
----see how easy it is to snip useless material-------
Idle:
Wonders if some of the approaches used in SSDs could be used to
make higher-density DRAM.
As far as putting 2-bits in a single DRAM cell, yes you could do it.
How exactly?
With 1 bit per cell, you pre-charge a bit line to a mid-level voltage.
Then you connect your storage capacitor to the bit line and it pulls it
slightly up or pushes it slightly down. Then an open-loop sense
amplifier detects this slight change in the voltage of the bit line.
I don't see how any of that will work to distinguish between 4
levels of voltage instead of 2 levels. Much less so, to distinguish
between 16 levels, as in QLC flash.
In flash memory it works very differently.
Not that I understand how it works completely, but I understand
enough to know that flash cell's charge is not discharged into
higher capacitance of bit line on every read operation.
In theory you could have your DRAM cell hold different voltage
levels. Your description of how DRAM works is mostly fine (AFAIUI - I
am not a chip designer), except that I think the read sense
amplifiers must have much lower input capacitance than the cells'
storage capacitors.
It would certainly be possible to put the write line at four
different levels rather than just high or low - just like you can do
with a flash cell. The trouble is that where the flash cell has
extremely low leakage of its stored charge, DRAM cells use a small
capacitor and have lots of leakage. As soon as you stop writing, the capacitor charge leaks back and forth to ground, to positive supply,
to control lines, to neighbour cells, and so on. Reducing the
voltage change from that leakage means using a bigger capacitor,
which is slower to write.
All this means that some time after you have written to a DRAM cell,
the voltage on the cell capacitor is different from what you wrote.
You have to read it, and re-write it, before the voltage changes
enough to be misinterpreted. And the very act of reading changes the
cell's charge too, depending on leakage to the nearby read lines, and
the state of the input capacitance on the read lines. Some of all
these leakages, changes and influences is relatively predictable from
the layout of the chips - other parts vary significantly depending on temperature, the values in neighbouring cells, and the read/write
patterns. (Remember "Rowhammer" ?)
Storing 2 bits per cells makes all this hugely worse. And that means
you need bigger storage capacitors to reduce the effect - making the
cells bigger, slower and requiring more power. Your read and write circuitry becomes significantly more complex (and big, slow, and
power hungry). And you have to have far shorter refresh cycles (you
guessed it - it makes things bigger, slower and more power-hungry).
As I say, I am not a chip designer. But I think that while you could
make DRAM cells with 2 bits per cell, the cells would be more than
twice the size as well as several times slower and demanding much
more power. Clearly it would not be a good idea. But you /could/ do
it.
Cray-2 had no cache, same as 1, X-MP and Y-MP. Nevertheless, unlike
1, and according to my understanding, unlike X-MP and Y-MP, memory
hierarchy of Cray-2 was not flat.
Here is a paragraph from Wikipedia:
"To avoid this problem the new design banked memory and two sets of
registers (the B- and T-registers) were replaced with a 16 KWord block
of the very fastest memory possible called a Local Memory, not a cache,
attaching the four background processors to it with separate high-speed
pipes. This Local Memory was fed data by a dedicated foreground
processor which was in turn attached to the main memory through a
Gbit/s channel per CPU; X-MPs by contrast had three, for two
simultaneous loads and a store and Y-MP/C-90s had five channels to
avoid the von Neumann bottleneck. It was the foreground processor's
task to "run" the computer, handling storage and making efficient use
of the multiple channels into main memory.
On Sun, 8 Mar 2026 12:13:24 +0100
David Brown <[email protected]> wrote:
On 07/03/2026 23:28, Michael S wrote:
On Sat, 07 Mar 2026 01:49:31 GMT
MitchAlsup <[email protected]d> wrote:
BGB <[email protected]> posted:
----see how easy it is to snip useless material-------
Idle:
Wonders if some of the approaches used in SSDs could be used to
make higher-density DRAM.
As far as putting 2-bits in a single DRAM cell, yes you could do it.
How exactly?
With 1 bit per cell, you pre-charge a bit line to a mid-level voltage.
Then you connect your storage capacitor to the bit line and it pulls it
slightly up or pushes it slightly down. Then an open-loop sense
amplifier detects this slight change in the voltage of the bit line.
I don't see how any of that will work to distinguish between 4
levels of voltage instead of 2 levels. Much less so, to distinguish
between 16 levels, as in QLC flash.
In flash memory it works very differently.
Not that I understand how it works completely, but I understand
enough to know that flash cell's charge is not discharged into
higher capacitance of bit line on every read operation.
In theory you could have your DRAM cell hold different voltage
levels. Your description of how DRAM works is mostly fine (AFAIUI - I
am not a chip designer), except that I think the read sense
amplifiers must have much lower input capacitance than the cells'
storage capacitors.
It's not the amplifier that has high capacitance, it's the bit line.
And it is inevitable*.
The rest of your post does not make a lot of sense, because you don't
take it into account. Most of the things you wrote there are correct,
but they make no practical difference, because of this high
capacitance.
* - inevitable for as long as you want to have 4-8K rows per bank.
But there are plenty of good reasons why you would want to have many
rows per bank, if not 4K then at the very least 1K. I can list some
reasons, but (1) it would be off topic, (2) I am not a specialist, so
there is a danger of me being wrong in details.
It would certainly be possible to put the write line at four
different levels rather than just high or low - just like you can do
with a flash cell. The trouble is that where the flash cell has
extremely low leakage of its stored charge, DRAM cells use a small
capacitor and have lots of leakage. As soon as you stop writing, the
capacitor charge leaks back and forth to ground, to positive supply,
to control lines, to neighbour cells, and so on. Reducing the
voltage change from that leakage means using a bigger capacitor,
which is slower to write.
All this means that some time after you have written to a DRAM cell,
the voltage on the cell capacitor is different from what you wrote.
You have to read it, and re-write it, before the voltage changes
enough to be misinterpreted. And the very act of reading changes the
cell's charge too, depending on leakage to the nearby read lines, and
the state of the input capacitance on the read lines. Some of all
these leakages, changes and influences is relatively predictable from
the layout of the chips - other parts vary significantly depending on
temperature, the values in neighbouring cells, and the read/write
patterns. (Remember "Rowhammer" ?)
Storing 2 bits per cells makes all this hugely worse. And that means
you need bigger storage capacitors to reduce the effect - making the
cells bigger, slower and requiring more power. Your read and write
circuitry becomes significantly more complex (and big, slow, and
power hungry). And you have to have far shorter refresh cycles (you
guessed it - it makes things bigger, slower and more power-hungry).
As I say, I am not a chip designer. But I think that while you could
make DRAM cells with 2 bits per cell, the cells would be more than
twice the size as well as several times slower and demanding much
more power. Clearly it would not be a good idea. But you /could/ do
it.
As I say, I am not a chip designer. But I think that while you could
make DRAM cells with 2 bits per cell, the cells would be more than
twice the size as well as several times slower and demanding much
more power.
On 08/03/2026 12:37, Michael S wrote:
On Sun, 8 Mar 2026 12:13:24 +0100
David Brown <[email protected]> wrote:
On 07/03/2026 23:28, Michael S wrote:
On Sat, 07 Mar 2026 01:49:31 GMT
MitchAlsup <[email protected]d> wrote:
BGB <[email protected]> posted:
----see how easy it is to snip useless material-------
Idle:
Wonders if some of the approaches used in SSDs could be used to
make higher-density DRAM.
As far as putting 2-bits in a single DRAM cell, yes you could do it.
How exactly?
With 1 bit per cell, you pre-charge a bit line to a mid-level
voltage. Then you connect your storage capacitor to the bit line and
it pulls it slightly up or pushes it slightly down. Then an
open-loop sense amplifier detects this slight change in the voltage
of the bit line.
I don't see how any of that will work to distinguish between 4
levels of voltage instead of 2 levels. Much less so, to
distinguish between 16 levels, as in QLC flash.
In flash memory it works very differently.
Not that I understand how it works completely, but I understand
enough to know that flash cell's charge is not discharged into
higher capacitance of bit line on every read operation.
In theory you could have your DRAM cell hold different voltage
levels. Your description of how DRAM works is mostly fine (AFAIUI
- I am not a chip designer), except that I think the read sense
amplifiers must have much lower input capacitance than the cells'
storage capacitors.
It's not the amplifier that has high capacitance, it's the bit line.
And it is inevitable*.
The rest of your post does not make a lot of sense, because you don't
take it into account. Most of the things you wrote there are correct,
but they make no practical difference, because of this high
capacitance.
* - inevitable for as long as you want to have 4-8K rows per bank.
But there are plenty of good reasons why you would want to have many
rows per bank, if not 4K then at the very least 1K. I can list some
reasons, but (1) it would be off topic, (2) I am not a specialist,
so there is a danger of me being wrong in details.
Ultimately, it doesn't make a big difference if the capacitance here
is in the sense amplifier, or the lines feeding it - though I can
fully appreciate that the majority of the capacitance is in the lines
here.
If the line capacitance here is so much higher than the cell
capacitance (and I'll happily bow to your knowledge here), then it
means the voltage seen by the sense amplifier will be a fraction of
the charge voltage on the cell capacitor. Fair enough - it just
means the voltage threshold between a 1 and a 0 is that much lower.
But the rest of the argument remains basically the same. If you want
to hold more than one bit in the cell, you need to distinguish
between four voltage levels instead of 2 voltage levels. The
relative differences in voltages are the same, the influences on cell capacitor leakage are the same. But as the absolute voltage levels
into the sense amplifier are now much smaller, other noise sources
(like thermal noise) are relatively speaking more important, which
reduces your error margins even more. So again we are left with the multi-level DRAM cell being theoretically possible, but now the
practicality is even worse - and you probably also need to reduce the
number of bits per sense amplifier line to reduce the capacitance.
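The charge-sharing argument above can be put in numbers. A sketch of the readout model being discussed; the capacitance and supply values below are purely illustrative assumptions, not figures from any real part:

```python
# Charge-sharing readout: dumping the cell capacitor C_cell onto a
# precharged bit line C_bl leaves the sense amp only
#   delta_v = (V_cell - V_pre) * C_cell / (C_cell + C_bl).
def bitline_swing(v_cell, v_pre, c_cell, c_bl):
    return (v_cell - v_pre) * c_cell / (c_cell + c_bl)

C_CELL = 20e-15    # ~20 fF storage cell (assumed)
C_BL   = 200e-15   # bit line an order of magnitude larger (assumed)
VDD    = 1.1
V_PRE  = VDD / 2

# 1 bit/cell: a stored '1' sits VDD/2 above the precharge level,
# giving ~50 mV of swing with these values.
swing_1bit = bitline_swing(VDD, V_PRE, C_CELL, C_BL)

# 2 bits/cell: 4 levels across 0..VDD put adjacent levels only VDD/3
# apart, so the worst-case distinguishable swing shrinks by 3x -
# before any noise, leakage or process variation is accounted for.
swing_2bit = bitline_swing(V_PRE + VDD / 6, V_PRE, C_CELL, C_BL)
```

With the already-attenuated signal cut by another factor of three, the shrinking error margins the paragraph above describes fall out directly.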
As far as I can very roughly estimate, I believe Mitch's comment that
you could put 2 bits in a single DRAM cell, but I think the
disadvantages would be more dramatic than he suggested.
It does not
surprise me that nobody makes multi-level DRAM cells (AFAIK).
As I say, I am not a chip designer. But I think that while you could
make DRAM cells with 2 bits per cell, the cells would be more than
twice the size as well as several times slower and demanding much
more power.
According to https://en.wikipedia.org/wiki/Multi-level_cell:
In 1997, NEC demonstrated a dynamic random-access memory (DRAM)
chip with quad-level cells, holding a capacity of 4 Gbit.
So apparently it's been done. I can't find any reference for that
claim, tho. Has anyone heard of it?
Comparing the situation of DRAM vs flash storage, I notice that MLC
cells in flash storage sometimes read the data multiple times to get
a more precise reading of the voltage level (IIUC). DRAM doesn't really
have that option.
=== Stefan
IA-64 certainly had significantly larger code sizes than others,
but I think that they expected it and found it acceptable.
And a software-pipelined loop on IA-64 is probably smaller than an auto-vectorized loop on AMD64+AVX.
Actually, other architectures also added prefetching instructions
for dealing with that problem. All I have read about that was that
there were many disappointments when using these instructions.
I don't know if there were any successes, and how frequent they
were compared to the disappointments.
So I don't see that IA-64 was any different from other architectures
in that respect.
OoO helps in several ways: it will do some work in the shadow of the
load (although the utilization will still be abysmal even with
present-day schedulers and ROBs [1]); but more importantly, it can
dispatch additional loads that may also miss the cache, resulting in
more memory-level parallelism.
They wanted to do it (and did it) in the compiler; the corresponding architectural feature is IIRC the advanced load.
Having so many registers may have made it harder than otherwise, but
SPARC also used many registers.
The issue is that speculative execution and OoO makes all the
EPIC features of IA-64 unnecessary, so if they cannot do
a fast in-order implementation of IA-64 (and they could not), they
should just give up and switch to an architecture without these
features, such as AMD64. And Intel did, after a few years of
denial.
In a world where we see convergence on fewer and fewer architecture
styles and on fewer and fewer architectures, you only see the
investment necessary for high-performance implementations of a new architecture if there is a very good reason not to use one of the
established architectures (for ARM T32 and ARM A64 the smartphone
market was that reason). It may be that politics will provide that
reason for another architecture, but even then it's hard. But
RISC-V seems to have the most mindshare among the alternatives,
so if any architecture will catch up, it looks like the best bet.
On Sun, 8 Mar 2026 15:10:52 +0100
David Brown <[email protected]> wrote:
On 08/03/2026 12:37, Michael S wrote:
On Sun, 8 Mar 2026 12:13:24 +0100
David Brown <[email protected]> wrote:
On 07/03/2026 23:28, Michael S wrote:
On Sat, 07 Mar 2026 01:49:31 GMT
MitchAlsup <[email protected]d> wrote:
BGB <[email protected]> posted:
----see how easy it is to snip useless material-------
Idle:
Wonders if some of the approaches used in SSDs could be used to
make higher-density DRAM.
As far as putting 2-bits in a single DRAM cell, yes you could do it.
How exactly?
With 1 bit per cell, you pre-charge a bit line to mid-level
voltage. Then you connect your storage capacitor to the bit line and
it pulls it slightly up or pushes it slightly down. Then an
open-loop sense amplifier detects this slight change in the voltage
of the bit line.
I don't see how any of that will work to distinguish between 4
levels of voltage instead of 2 levels. Much less so, to
distinguish between 16 levels, as in QLC flash.
In flash memory it works very differently.
Not that I understand how it works completely, but I understand
enough to know that a flash cell's charge is not discharged into
the higher capacitance of the bit line on every read operation.
In theory you could have your DRAM cell hold different voltage
levels. Your description of how DRAM works is mostly fine (AFAIUI
- I am not a chip designer), except that I think the read sense
amplifiers must have much lower input capacitance than the cells'
storage capacitors.
It's not the amplifier that has high capacitance; it's the bit line.
And that is inevitable*.
The rest of your post does not make a lot of sense, because
you don't take it into account. Most of the things you wrote there
are correct, but they make no practical difference, because of this
high capacitance.
* - inevitable for as long as you want to have 4-8K rows per bank.
But there are plenty of good reasons why you do want to have
many rows per bank, if not 4K then at the very least 1K. I could list
some reasons, but (1) it would be off topic, and (2) I am not a
specialist, so there is a danger of me being wrong in the details.
Ultimately, it doesn't make a big difference if the capacitance here
is in the sense amplifier, or the lines feeding it - though I can
fully appreciate that the majority of the capacitance is in the lines
here.
If the line capacitance here is so much higher than the cell
capacitance (and I'll happily bow to your knowledge here), then it
means the voltage seen by the sense amplifier will be a fraction of
the charge voltage on the cell capacitor. Fair enough - it just
means the voltage threshold between a 1 and a 0 is that much lower.
No, it does not mean that.
In the 1-bit scenario your sense amplifier works in open loop - it
amplifies the difference between the precharge voltage and the observed
voltage on the bit line as strongly as it can. All it cares about is the
sign of the difference. It does not care about the exact ratio between
the capacitance of the bit line and the capacitance of the storage
capacitor. It also does not care if the cell was freshly refreshed or is
near the end of its refresh interval. It does not care about its own
gain, as long as the gain is high enough.
All that makes distinguishing between 2 levels not just a little
simpler, but a whole lot simpler than distinguishing between multiple
levels, even when "multiple" is just 4 or even 3.
Once again, I am not a specialist, but I would imagine that the latter
would require a totally different level of precision, both in the value
of the storage capacitor and in the capacitance of the bit line.
The amplifier itself would have to be much more complicated, too. It
would likely need two stages - first a sample-and-hold and only then a
set of comparators. All that will likely increase the power of a row
access by a much bigger factor than 2x.
But I don't think that the whole idea could proceed far enough to start
to care about power. The above-mentioned requirements of geometric
precision will kill it much earlier.
But the rest of the argument remains basically the same. If you want
to hold more than one bit in the cell, you need to distinguish
between four voltage levels instead of 2 voltage levels. The
relative differences in voltages are the same, the influences on cell
capacitor leakage are the same. But as the absolute voltage levels
into the sense amplifier are now much smaller, other noise sources
(like thermal noise) are relatively speaking more important, which
reduces your error margins even more. So again we are left with the
multi-level DRAM cell being theoretically possible, but now the
practicality is even worse - and you probably also need to reduce the
number of bits per sense amplifier line to reduce the capacitance.
As far as I can very roughly estimate, I believe Mitch's comment that
you could put 2 bits in a single DRAM cell, but I think the
disadvantages would be more dramatic than he suggested.
That's my point.
At the end, not only would you have worse power and worse speed, but
worse density as well.
It does not
surprise me that nobody makes multi-level DRAM cells (AFAIK).
On the other hand, it is possible that the second idea that makes NAND
so dense could apply to DRAM with a better chance of achieving something
positive. I mean, 3D QLC flash is so dense not only due to QLC, but
also due to 3D.
Stefan Monnier wrote:
As I say, I am not a chip designer. But I think that while you could
make DRAM cells with 2 bits per cell, the cells would be more than
twice the size as well as several times slower and demanding much
more power.
According to https://en.wikipedia.org/wiki/Multi-level_cell:
In 1997, NEC demonstrated a dynamic random-access memory (DRAM)
chip with quad-level cells, holding a capacity of 4 Gbit.
So apparently it's been done. I can't find any reference for that
claim, tho. Has anyone heard of it?
Comparing the situation of DRAM vs flash storage, I notice that MLC
cells in flash storage sometimes read the data multiple times to get
a more precise reading of the voltage level (IIUC). DRAM doesn't really
have that option.
=== Stefan
A quick search for "multi-bit" "dram" finds a recent paper:
IGZO 2T0C DRAM With VTH Compensation Technique
for Multi-Bit Applications, 2025 https://ieeexplore.ieee.org/iel8/6245494/6423298/10979978.pdf
which demonstrates 3 bits per cell but in just 25 cells.
"we proposed and experimentally demonstrated the novel dual-gate (DG) indium-gallium-zinc oxide (IGZO) two-transistor-zero-capacitance (2T0C) dynamic random-access memory (DRAM) for array-level multi-bit storage.
...
the optimized transistors... enable long retention time (>1500 s)
and ultra-fast writing speed (< 10 ns).
...
non-overlap 3-bit storage operation among 25 cells is achieved
...
Recent research efforts have mostly focused on developing capacitor-less two-transistor-zero-capacitance (2T0C) DRAM bit-cells. This architectural innovation primarily aims to save the large space occupied by storage capacitor in conventional one-transistor-one-capacitor (1T1C) design
...
Compared with 1T1C DRAM, the read operation of 2T0C DRAM is
non-destructive, which enables multi-bit storage in bit-cell"
On 08/03/2026 17:30, Michael S wrote:
On Sun, 8 Mar 2026 15:10:52 +0100
On the other hand, it is possible that the second idea that makes
NAND so dense could apply to DRAM with better chance of achieving
something positive. I mean, 3D QLC flash is so dense not only due
to QLC, but also due to 3D.
Scaling in the third direction would surely be good, yes. The
challenge, I would think, would be heat dissipation. Flash does not
need power to hold its data, so for the same speed of reading and
writing you have basically the same power requirements regardless of
the number of layers. With DRAM, power and heat would scale with the
layers due to refreshes. You already see heat sinks for fast DRAM,
so for a multi-layer device you'd need to put a lot more effort into
cooling.
Michael S <[email protected]> schrieb:
[snip]
All that makes distinguishing between 2 levels not just a little
simpler, but a whole lot simpler than distinguishing between multiple
levels, even when "multiple" is just 4 or even 3.
I want a balanced ternary computer, and I want it NOW!
Tonight’s tradeoff was having the memory page size determined by the
root pointer. A few bits (5) in a root pointer could be used to set the
page size. All references through that root pointer would then use the specified page size. When the root pointer changes, the page size goes
along with it.
I think not flushing the TLB could be got away with, with ASID matching
on the entries. For a given ASID the page size would be consistent with
the root pointer.
Alternately the TLB entry could be tagged with the root pointer register number, so if a different root pointer register is used the entry would
not match.
I have been studying the 68851 MMU. Quite complex compared to some other MMUs. I will likely have a 68851 compatible MMU for my 68000 project though.
In article <[email protected]>, [email protected] (Anton Ertl) wrote:
---------------------
Actually, other architectures also added prefetching instructions
for dealing with that problem. All I have read about that was that
there were many disappointments when using these instructions.
I don't know if there were any successes, and how frequent they
were compared to the disappointments.
I have never encountered any successes, and given how keen Intel were on their x86 version of this, and my employers' relationship with them at
the time, I would expect to have heard about them. My own experience was disappointing, with minor speedups and slowdowns. My best hypothesis was
that the larger code size worsened cache effects enough to cancel out any gains from the prefetches.
So I don't see that IA-64 was any different from other architectures
in that respect.
Two points on that:
On 08/03/2026 17:53, EricP wrote:
Stefan Monnier wrote:
As I say, I am not a chip designer. But I think that while you could >>> make DRAM cells with 2 bits per cell, the cells would be more than
twice the size as well as several times slower and demanding much
more power.
According to https://en.wikipedia.org/wiki/Multi-level_cell:
In 1997, NEC demonstrated a dynamic random-access memory (DRAM) >> chip with quad-level cells, holding a capacity of 4 Gbit.
So apparently it's been done. I can't find any reference for that
claim, tho. Has anyone heard of it?
Comparing the situation of DRAM vs flash storage, I notice that MLC
cells in flash storage sometimes read the data multiple times to get
a more precise reading of the voltage level (IIUC). DRAM doesn't really >> have that option.
=== Stefan
A quick search for "multi-bit" "dram" finds a recent paper:
IGZO 2T0C DRAM With VTH Compensation Technique
for Multi-Bit Applications, 2025 https://ieeexplore.ieee.org/iel8/6245494/6423298/10979978.pdf
which demonstrates 3 bits per cell but in just 25 cells.
"we proposed and experimentally demonstrated the novel dual-gate (DG) indium-gallium-zinc oxide (IGZO) two-transistor-zero-capacitance (2T0C) dynamic random-access memory (DRAM) for array-level multi-bit storage.
...
the optimized transistors... enable long retention time (>1500 s)
and ultra-fast writing speed (< 10 ns).
...
non-overlap 3-bit storage operation among 25 cells is achieved
...
Recent research efforts have mostly focused on developing capacitor-less two-transistor-zero-capacitance (2T0C) DRAM bit-cells. This architectural innovation primarily aims to save the large space occupied by storage capacitor in conventional one-transistor-one-capacitor (1T1C) design
...
Compared with 1T1C DRAM, the read operation of 2T0C DRAM is non-destructive, which enables multi-bit storage in bit-cell"
If you eliminate the storage capacitor, does that not also eliminate the need for refresh?
And then you have SRAM rather than DRAM? Or does the transistor pair still leak charge?
Robert Finch <[email protected]> posted:
Tonight’s tradeoff was having the memory page size determined by the
root pointer. A few bits (5) in a root pointer could be used to set the
page size. All references through that root pointer would then use the
specified page size. When the root pointer changes, the page size goes
along with it.
My 66000 uses a 3-bit level indicator in every page-table pointer (PTP)
and PTE. The root pointer LVL determines the size of the VAS.
The PTP LVLs determine the size of pages at the level being accessed.
The PTE LVL = 001.
The number of address bits that come from the PTE and from the VA is
determined by the LVL of the PTP that pointed to this translating PTE,
that is, the previous PTP.
I use 8KB pages, so the list goes 8KB, 8MB, 8GB, 8TB, 8PB, 8EB,
really-big. And each page provides 1024 (freely mixed) entries.
This scheme allows for level skipping at the top, in the middle, and at
the bottom (super-pages). Unused levels are checked for canonicality.
I think not flushing the TLB could be got away with, with ASID matching
on the entries. For a given ASID the page size would be consistent with
the root pointer.
This is what ASIDs are for.
Alternately the TLB entry could be tagged with the root pointer register
number, so if a different root pointer register is used the entry would
not match.
ASIDs, tag everything with ASIDs, and provide an INVAL-ASID instruction.
I have been studying the 68851 MMU. Quite complex compared to some other
MMUs. I will likely have a 68851 compatible MMU for my 68000 project though.
Even Moto figured out the -851 was too far over the top; use the -030 MMU instead.
[email protected] (John Dallman) posted:
In article <[email protected]>,
[email protected] (Anton Ertl) wrote:
---------------------
Actually, other architectures also added prefetching instructions
for dealing with that problem. All I have read about that was that
there were many disappointments when using these instructions.
I don't know if there were any successes, and how frequent they
were compared to the disappointments.
I have never encountered any successes, and given how keen Intel were on
their x86 version of this, and my employers' relationship with them at
the time, I would expect to have heard about them. My own experience was
disappointing, with minor speedups and slowdowns. My best hypothesis was
that the larger code size worsened cache effects enough to cancel out any
gains from the prefetches.
So I don't see that IA-64 was any different from other architectures
in that respect.
Two points on that:
While I have, personally, added prefetch SW instructions and HW prefetchers, these tend to add performance rather sporadically, and seldom add "enough" performance to justify taking up 'that much' of ISA or designer time.
David Brown <[email protected]> posted:
On 08/03/2026 17:53, EricP wrote:
[snip IGZO 2T0C multi-bit DRAM paper excerpts]
If you eliminate the storage capacitor, does that not also eliminate the
need for refresh?
If you eliminate the storage capacitor, you eliminate the ability to store charge. Storing charge is the ONLY thing a DRAM cell does.
And then you have SRAM rather than DRAM? Or does the
transistor pair still leak charge?
Everything leaks; the capacitor is there so that the stored value can
be retained "for at least a while".
I *am* skeptical that supporting page-crossing (or even block- crossing) accesses is important enough to justify a lot of complexity and extra hardware,
Robert Finch <[email protected]> wrote:
Tonight’s tradeoff was having the memory page size determined by the
root pointer. A few bits (5) in a root pointer could be used to set the
page size. All references through that root pointer would then use the
specified page size. When the root pointer changes, the page size goes
along with it.
I think not flushing the TLB could be got away with, with ASID matching
on the entries. For a given ASID the page size would be consistent with
the root pointer.
Alternately the TLB entry could be tagged with the root pointer register
number, so if a different root pointer register is used the entry would
not match.
I have been studying the 68851 MMU. Quite complex compared to some other
MMUs. I will likely have a 68851 compatible MMU for my 68000 project though.
The 68851 MMU was such a disaster, never working right, that Motorola
only used a subset of the design inside the 68030. Or so I had heard.
I spent hours going over the design myself, never having looked at an MMU
before, and thought they were insane.
SUN hated the 68851 and never used it; the 68030 served their legacy
systems as Sparc came out.
A write up on the differences might be interesting.
On Mon, 16 Feb 2026 18:04:27 -0500, Paul Clayton wrote:
I *am* skeptical that supporting page-crossing (or even block-crossing)
accesses is important enough to justify a lot of complexity and extra
hardware,
I am not. If an architecture is defined so that programmers are not
expected to know how big pages are on any given implementation, then
locks have to just work always.
MitchAlsup wrote:
[snip]
While I have, personally, added prefetch SW instructions and HW prefetchers,
these tend to add performance rather sporadically, and seldom add "enough" performance to justify taking up 'that much' of ISA or designer time.
One area I think might be a benefit is to prefetch VA translations
for instructions and data. These can be prefetched just into cache,
or into I- and D- TLB's.
I had the idea in 2010 while looking at locking and hardware transactions.
If a memory section is guarded by a mutex, I don't want to prefetch
the data as that could yank ownership away from the current mutex holder.
What I might do is prefetch the translation PTE's for the data locations
so that when I am granted mutex ownership that I minimize the time
it is held by not waiting for cold memory table walks.
I also optionally might like to be able to trigger advance page faults
on data but without actually touching the data page such that it moves
cache line ownership. This could save me from taking a page fault on a
shared memory section while holding the guard mutex.
Prefetching the VA translates for instructions could be a good
tradeoff for alternate paths and just load the PTE's into cache,
as opposed to loading the alternate code into cache.
The VA translate prefetch instructions would need options to control
which cache and I- and D- TLB level the PTE's are prefetched into.
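A userspace approximation of the idea, for illustration only: touch one byte per page of the shared region so the table walks (and any soft page faults) happen before the mutex is taken. Unlike the proposed PTE-prefetch instruction, each load here also pulls the cache line across, which is exactly the ownership side effect the proposal wants to avoid. PAGE_SIZE and the function name are assumptions:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096 /* assumed; real code would query the system */

/* Issue one load per page of [buf, buf+len) to warm the D-TLB and
   page tables before entering the critical section.
   Returns the number of loads issued. */
size_t warm_translations(const void *buf, size_t len)
{
    const volatile uint8_t *p = buf;
    size_t loads = 0;
    for (size_t off = 0; off < len; off += PAGE_SIZE) {
        (void)p[off]; /* one touch per page */
        loads++;
    }
    return loads;
}
```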
There is one very reasonable use case: testing a random number generator.
A useful test is to ensure numbers are uncorrelated, so you get 3 random numbers called A, B, C, and you look up A*N*N + B*N + C to count the number of times you see A followed by B followed by C, where N is the range of
the random value, say, 0 - 1023. This would be an array of 1 billion 32-bit
values. You get 1000 billion random numbers, and then look through to make sure most buckets have a value around 1000. Any buckets less than 500 or more than 1500 might be considered a random number generator failure.
This is a useful test since it intuitively makes sense--if some patterns are too likely (or unlikely), then you know you have a problem with your
"random" numbers.
Another use case would be an algorithm which wants to shuffle a large
array (say, you want to create test cases for a sorting algorithm). I
think most shuffling algorithms which are fair will randomly index into
the array, and each of these will be a cache miss.
EricP <[email protected]> posted:
MitchAlsup wrote:
[email protected] (John Dallman) posted:One area I think might be a benefit is to prefetch VA translations
In article <[email protected]>,While I have, personally, added prefetch SW instructions and HW prefetchers,
[email protected] (Anton Ertl) wrote:
---------------------
Actually, other architectures also added prefetching instructionsI have never encountered any successes, and given how keen Intel were on >>>> their x86 version of this, and my employers' relationship with them at >>>> the time, I would expect to have heard about them. My own experience was >>>> disappointing, with minor speedups and slowdowns. My best hypothesis was >>>> that the larger code size worsened cache effects enough to cancel out any >>>> gains from the prefetches.
for dealing with that problem. All I have read about that was that >>>>> there were many disappointments when using these instructions.
I don't know if there were any successes, and how frequent they
were compared to the disappointments.
So I don't see that IA-64 was any different from other architectures >>>>> in that respect.Two points on that:
these tend to add performance rather sporadically, and seldom add "enough" >>> performance to justify taking up 'that much' of ISA or designer time.
for instructions and data. These can be prefetched just into cache,
or into I- and D- TLB's.
I had the idea in 2010 while looking at locking and hardware transactions.
If a memory section is guarded by a mutex, I don't want to prefetch
the data as that could yank ownership away from the current mutex holder.
Then you need a LD instruction that can fail and the status tested by
some other instruction. That is: code performs a LD; LD takes a miss
and leaves the CPU. Access finds cache line in modified or exclusive
state, and instead of returning the value and making line stale, it
fails. {with whatever definition you want for fail}. Since MOESI uses
3-bits, you can use an unused MOESI state to record that a failed access
has transpired--then use this to optimize downstream-cache behavior.
What I might do is prefetch the translation PTE's for the data locations
so that when I am granted mutex ownership that I minimize the time
it is held by not waiting for cold memory table walks.
This is what table-walk accelerators are for.
I also optionally might like to be able to trigger advance page faults
on data but without actually touching the data page such that it moves
cache line ownership. This could save me from taking a page fault on a
shared memory section while holding the guard mutex.
Prefetching the VA translates for instructions could be a good
tradeoff for alternate paths and just load the PTE's into cache,
as opposed to loading the alternate code into cache.
The VA translate prefetch instructions would need options to control
which cache and I- and D- TLB level the PTE's are prefetched into.
I use 5-bits for this (although in practice 3 would have been sufficient)
{PRE, PUSH}×{{RWX}+{Cc}}
where Cc tells which cache layer the data is fetched up into or pushed
back down into.
PUSH {{010}+{--}} is simple invalidate and throw away modifications.
I had the idea in 2010 while looking at locking and hardware transactions.
If a memory section is guarded by a mutex, I don't want to prefetch
the data as that could yank ownership away from the current mutex holder.
Then you need a LD instruction that can fail and the status tested by
some other instruction. That is: code performs a LD; LD takes a miss
and leaves the CPU. Access finds cache line in modified or exclusive
state, and instead of returning the value and making line stale, it
fails. {with whatever definition you want for fail}. Since MOESI uses
3-bits, you can use an unused MOESI state to record that a failed access
has transpired--then use this to optimize downstream-cache behavior.
Interesting - a load conditional on the cache line being either
- cached locally in an MOES state
- cached remotely in an S state
- uncached
Does your ESM use this approach?
What I might do is prefetch the translation PTE's for the data locations
so that when I am granted mutex ownership that I minimize the time
it is held by not waiting for cold memory table walks.
This is what table-walk accelerators are for.
If by table-walk accelerator you mean caching the interior-level PTE's
on the downward walk and, on a PTE miss, checking them in a
bottom-up table walk, then that mechanism is still there and
my PTE prefetch would make use of it.
I also optionally might like to be able to trigger advance page faults
on data but without actually touching the data page such that it moves
cache line ownership. This could save me from taking a page fault on a
shared memory section while holding the guard mutex.
Prefetching the VA translates for instructions could be a good
tradeoff for alternate paths and just load the PTE's into cache,
as opposed to loading the alternate code into cache.
The VA translate prefetch instructions would need options to control
which cache and I- and D- TLB level the PTE's are prefetched into.
I use 5-bits for this (although in practice 3 would have been sufficient)
{PRE, PUSH}×{{RWX}+{Cc}}
where Cc tells which cache layer the data is fetched up into or pushed
back down into.
PUSH {{010}+{--}} is simple invalidate and throw away modifications.
I would also have 2 or 3 cache control bits on all levels of PTE's
but I would have separate lookup tables for interior and leaf PTE's.
The tables map the cache control bits to the kind of caching used
for that table level.
quadi [2026-03-09 03:36:38] wrote:
On Mon, 16 Feb 2026 18:04:27 -0500, Paul Clayton wrote:
I *am* skeptical that supporting page-crossing (or even block-crossing)
accesses is important enough to justify a lot of complexity and extra
hardware,
I am not. If an architecture is defined so that programmers are not
expected to know how big pages are on any given implementation, then locks
have to just work always.
I thought "natural alignment" is the standard solution to this problem.
It avoids unaligned accesses at all levels, regardless of the sizes of
pages or cache lines (as long as they stick to powers of 2), at the cost of
adding padding, but that is usually considered negligible.
Unaligned accesses can still be important in some particular
circumstances, but usually the programmer knows about it, and AFAIK such
accesses never need to be atomic.
On 3/9/2026 10:05 AM, Stefan Monnier wrote:
quadi [2026-03-09 03:36:38] wrote:
On Mon, 16 Feb 2026 18:04:27 -0500, Paul Clayton wrote:
I *am* skeptical that supporting page-crossing (or even block-crossing)
accesses is important enough to justify a lot of complexity and extra
hardware,
I am not. If an architecture is defined so that programmers are not
expected to know how big pages are on any given implementation, then locks
have to just work always.
I thought "natural alignment" is the standard solution to this problem.
It avoids unaligned accesses at all levels, regardless of the sizes of
pages or cache lines (as long as they stick to powers of 2), at the cost of
adding padding, but that is usually considered negligible.
Unaligned accesses can still be important in some particular
circumstances, but usually the programmer knows about it, and AFAIK such
accesses never need to be atomic.
Yes, it makes sense: a misaligned access can never be assumed to be
atomic, even on an architecture that natively supports both misaligned
accesses and atomic memory accesses.
And, by extension, an aligned access can never cross a page boundary
(because it is not possible for it to do so).
According to Stefan Monnier <[email protected]>:
I *am* skeptical that supporting page-crossing (or even block-crossing)
accesses is important enough to justify a lot of complexity and extra
hardware, ...
I thought "natural alignment" is the standard solution to this problem.
It avoids unaligned accesses at all levels, regardless of the sizes of
pages or cache lines (as long as they stick to powers of 2), at the cost of
adding padding, but that is usually considered negligible.
That was the theory on S/360 in 1964. By 1968 they added the "byte oriented"
feature to the 360/85 to allow unaligned access. Many RISC chips went the
same way, starting with mandatory alignment and adding unaligned access
later. What's different now?
Unaligned accesses can still be important in some particular
circumstances, but usually the programmer knows about it, and AFAIK such
accesses never need to be atomic.
I think that's right, atomic access is a fairly special case.