• Proper programming for the SPARC...

    From Chris M. Thomasson@[email protected] to comp.arch on Fri Jun 5 16:23:33 2026
    From Newsgroup: comp.arch

    It does not matter which memory model you are using (RMO, PSO, TSO...); I
    was always taught to program as if it's always in RMO mode. So, if you
    want an atomic RMW with acquire semantics, you have to:

    atomic RMW
    MEMBAR #LoadStore | #LoadLoad
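For comparison, a rough C++ analogue of the acquire sequence above is a relaxed RMW followed by an acquire fence, with the fence playing the role of `MEMBAR #LoadStore | #LoadLoad`. This is only a sketch assuming C++11 `<atomic>`; `g_lock` and `try_lock_acquire` are hypothetical names, and compilers pick their own instruction sequences per target.

```cpp
#include <atomic>

// Hypothetical lock word, used only for illustration.
std::atomic<int> g_lock{0};

// Sketch of "atomic RMW; MEMBAR #LoadStore | #LoadLoad":
// the RMW itself is relaxed, and the acquire fence placed after it
// keeps later loads and stores from being performed before it.
bool try_lock_acquire() {
    int prev = g_lock.exchange(1, std::memory_order_relaxed); // atomic RMW
    std::atomic_thread_fence(std::memory_order_acquire);      // #LoadStore | #LoadLoad
    return prev == 0;                                         // true if we took the lock
}
```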


    For release:

    MEMBAR #LoadStore | #StoreStore
    atomic RMW
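The release sequence maps the same way, with the fence moved in front of the RMW. Again a sketch assuming C++11 `<atomic>`; `g_lock` and `unlock_release` are hypothetical names.

```cpp
#include <atomic>

// Hypothetical lock word, starting in the "held" state.
std::atomic<int> g_lock{1};

// Sketch of "MEMBAR #LoadStore | #StoreStore; atomic RMW":
// the release fence placed before the relaxed RMW keeps earlier loads
// and stores from being performed after the RMW becomes visible.
int unlock_release() {
    std::atomic_thread_fence(std::memory_order_release);  // #LoadStore | #StoreStore
    return g_lock.exchange(0, std::memory_order_relaxed); // atomic RMW, returns old value
}
```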


    Try to avoid #StoreLoad at all costs, unless absolutely necessary.
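The case that actually needs #StoreLoad is a store followed by a load to a different location, as in the store-buffering (Dekker-style) litmus test. A minimal sketch, assuming C++11 threads and hypothetical names: the `memory_order_seq_cst` fences stand in for #StoreLoad, and with them the outcome where both threads read 0 is forbidden. Without the fences the C++ model allows that outcome, and TSO hardware can produce it.

```cpp
#include <atomic>
#include <thread>

// Hypothetical flags for a store-buffering litmus test.
std::atomic<int> x{0}, y{0};

// Runs one trial; returns true if both threads read 0, which the
// seq_cst fences (the #StoreLoad analogue) rule out.
bool store_buffering_trial() {
    x.store(0, std::memory_order_relaxed);
    y.store(0, std::memory_order_relaxed);
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst); // order store before load
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}
```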
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From MitchAlsup@[email protected] to comp.arch on Sat Jun 6 01:47:49 2026


    "Chris M. Thomasson" <[email protected]> posted:

    > It does not matter which memory model you are using (RMO, PSO, TSO...); I
    > was always taught to program as if it's always in RMO mode. So, if you
    > want an atomic RMW with acquire semantics, you have to:
    >
    > atomic RMW
    > MEMBAR #LoadStore | #LoadLoad

    Make sure all out-of-order LDs have been made visible system-wide
    after the atomic so that Lamport's criterion has been met.

    > For release:
    >
    > MEMBAR #LoadStore | #StoreStore
    > atomic RMW

    Same as above but s/LDs/STs/

    Note: My 66000 performs these on behalf of the programmer without a
    MEMBAR instruction.

    > Try to avoid #StoreLoad at all costs, unless absolutely necessary.
  • From Chris M. Thomasson@[email protected] to comp.arch on Sat Jun 6 11:57:36 2026

    On 6/5/2026 6:47 PM, MitchAlsup wrote:
    [snip]
    > Note: My 66000 performs these on behalf of the programmer without a
    > MEMBAR instruction.

    Is your 66000 automatically seq_cst? Or is it TSO?

    I ask because the explicit MEMBAR combinations I posted for
    acquire/release are designed for a fully relaxed model (RMO). On a SPARC
    running in TSO mode, those specific barriers should safely degrade into
    hardware no-ops because the arch natively enforces those orderings.

    If your 66000 doesn't need them, does that mean your atomic RMWs
    automatically carry acquire/release tracking in the pipeline, or is the
    whole arch just running on a stronger global memory model by default?




  • From MitchAlsup@[email protected] to comp.arch on Sat Jun 6 19:23:34 2026


    "Chris M. Thomasson" <[email protected]> posted:

    > On 6/5/2026 6:47 PM, MitchAlsup wrote:
    --------------------------
    > Is your 66000 automatically seq_cst? Or is it TSO?

    Cacheable is more relaxed than TSO when just accessing memory.
    When the first LL is decoded, the LL address cannot leave the
    core until all older memory addresses have left the core.
    When SC is decoded, it will leave the core before any younger
    memory references leave the core. And when there are multiple
    participating cache lines, all the intermediate SCs become
    visible at the same instant.

    So, between a LL and the SC it is sequentially consistent, otherwise
    it is causal.

    Accesses to MMI/O are SC, accesses to control register headers (such
    as BARs) are strongly ordered.

    Accesses to ROM are unordered and incoherent.

    > If your 66000 doesn't need them, does that mean your atomic RMWs
    > automatically carry acquire/release tracking in the pipeline, or is the
    > whole arch just running on a stronger global memory model by default?

    It knows when it is running an ATOMIC event and switches at the boundaries.




  • From Chris M. Thomasson@[email protected] to comp.arch on Sun Jun 7 12:43:43 2026

    On 6/6/2026 12:23 PM, MitchAlsup wrote:
    [snip]
    > It knows when it is running an ATOMIC event and switches at the boundaries.


    How would this common publish/consume pattern work on your 66000?
    Note: all atomics in the pseudo-code below are naked (relaxed); there is
    no built-in memory order visibility on the atomics themselves.

    Pseudo-Code, sorry:
    _______________
    // say for fun... :^)

    // Visible to a specific core cluster
    // (e.g. cpuA cores 0-5, cpuB cores 3-7, etc.)
    g_p0 = nullptr;


    // Thread 1 (producer)
    // ... initialize p0 and do work ...

    // Publish
    release_barrier(); // or equivalent on 66000
    atomic_store(&g_p0, p0);


    //________________


    // Thread 2 (consumer)
    l0 = atomic_load(&g_p0);
    if (l0)
    {
    acquire_barrier(); // or equivalent
    l0->wizzfroboz();
    }
    ___________
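A runnable C++ rendering of the pseudo-code above, offered as a sketch: it assumes C++11 `<atomic>` and `<thread>`, maps `release_barrier()` and `acquire_barrier()` to `std::atomic_thread_fence`, and keeps the store and load themselves relaxed (naked). `Node`, `producer`, `consumer`, and the body of `wizzfroboz` are hypothetical. Unlike the original, the consumer spins until the pointer is published instead of testing it once, so the example always reaches the call.

```cpp
#include <atomic>
#include <thread>

// Hypothetical payload standing in for the object behind g_p0.
struct Node {
    int value = 0;
    int wizzfroboz() const { return value * 2; } // placeholder work
};

std::atomic<Node*> g_p0{nullptr};

// Thread 1 (producer): initialize p0, then publish it.
void producer(Node* p0) {
    p0->value = 21;                                      // ... initialize p0 and do work ...
    std::atomic_thread_fence(std::memory_order_release); // release_barrier()
    g_p0.store(p0, std::memory_order_relaxed);           // naked atomic_store
}

// Thread 2 (consumer): wait for the pointer, then use it.
int consumer() {
    Node* l0;
    while ((l0 = g_p0.load(std::memory_order_relaxed)) == nullptr) { // naked atomic_load
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire); // acquire_barrier()
    return l0->wizzfroboz();
}
```

Usage: run `producer(&n)` on one thread and `consumer()` on another. The release fence before the store pairs with the acquire fence after the load, so the consumer is guaranteed to see `value == 21`.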








  • From MitchAlsup@[email protected] to comp.arch on Mon Jun 8 18:59:36 2026


    "Chris M. Thomasson" <[email protected]> posted:

    > On 6/6/2026 12:23 PM, MitchAlsup wrote:
    --------------------------
    >> It knows when it is running an ATOMIC event and switches at the boundaries.
    >
    > How would this common publish/consume pattern work on your 66000?
    > Note: all atomics in the pseudo-code below are naked (relaxed); there is
    > no built-in memory order visibility on the atomics themselves.
    >
    > Pseudo-Code, sorry:
    > _______________
    > // say for fun... :^)
    >
    > // Visible to a specific core cluster
    > // (e.g. cpuA cores 0-5, cpuB cores 3-7, etc.)
    > g_p0 = nullptr;

    When used below, that pointer needs a different value/address.

    > // Thread 1 (producer)
    > // ... initialize p0 and do work ...
    >
    > // Publish
    > release_barrier(); // or equivalent on 66000

    What functionality is ascribed to release_barrier ?
    Why does it have no identifier () ?
    Can there be only 1 barrier ???

    > atomic_store(&g_p0, p0);

    PRE {RW,D$},[Rg_p0]
    STDL Rp0,[Rg_p0]

    > //________________
    >
    > // Thread 2 (consumer)
    > l0 = atomic_load(&g_p0);

    LDL Rl0,[Rg_p0]
    INVAL [Rg_p0] // Terminate event

    > if (l0)

    BEQ label0

    > {
    > acquire_barrier(); // or equivalent

    What functionality is ascribed to acquire_barrier ?
    Why does it have no identifier () ?
    Can there only be 1 barrier ???

    > l0->wizzfroboz();

    CALX [Rl0,#wizzfroboz]

    > }

    label0:
  • From Chris M. Thomasson@[email protected] to comp.arch on Tue Jun 9 11:38:18 2026

    On 6/8/2026 11:59 AM, MitchAlsup wrote:
    [snip]
    >> // Publish
    >> release_barrier(); // or equivalent on 66000
    >
    > What functionality is ascribed to release_barrier ?

    Fwiw, the exact same as on the SPARC.

    MEMBAR #LoadStore | #StoreStore
    atomic RMW

    The barrier doesn't need a variable identifier because it acts as a
    fence on the core's memory execution pipeline, forcing all prior loads
    and stores to complete before any subsequent store can execute.

    Please take careful note that the barrier must be placed before the
    atomic logic.


    For the acquire, it's:

    atomic RMW
    MEMBAR #LoadStore | #LoadLoad

    Please take careful note that the barrier must be placed after the
    atomic logic to prevent later (possibly speculative) loads and stores
    from leaking backward above the atomic in the pipeline.

    C++ adopted this exact decoupled fence paradigm with
    std::atomic_thread_fence:

    https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence

    Take note that no heavy #StoreLoad order has to be used for this classic publish/consume pattern. Just like a mutex. Well, Peterson's aside for a moment...

    acquire/release is NOT strong enough to order store followed by a load
    to another location.
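Peterson's lock is the textbook case: each thread stores its own flag and then loads the other thread's flag, which is exactly the store-then-load shape that acquire/release does not order. A sketch assuming C++11, with hypothetical names; every atomic operation uses the default `memory_order_seq_cst`, which is what supplies the #StoreLoad ordering. Weakening them to release stores and acquire loads would break mutual exclusion.

```cpp
#include <atomic>
#include <thread>

// Hypothetical two-thread Peterson lock (thread ids 0 and 1).
struct PetersonLock {
    std::atomic<bool> flag[2]{{false}, {false}};
    std::atomic<int> turn{0};

    void lock(int me) {
        int other = 1 - me;
        flag[me].store(true); // seq_cst store to my flag...
        turn.store(other);    // ...yield priority to the other thread...
        // ...then seq_cst loads of the other thread's state. This store
        // followed by a load to another location needs #StoreLoad.
        while (flag[other].load() && turn.load() == other) {
            std::this_thread::yield();
        }
    }

    void unlock(int me) { flag[me].store(false); }
};
```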

    [snip what I have to ponder on that wrt your arch]

    I will get back to you. Busy with some other work.

    The SPARC is free form. Now, tagging a variable identifier wrt the
    membars might be more efficient, but how does your system work with a
    mutex to do that? Say the locked region is comprised of several
    unrelated variables?
  • From MitchAlsup@[email protected] to comp.arch on Wed Jun 10 01:16:59 2026


    "Chris M. Thomasson" <[email protected]> posted:

    > On 6/8/2026 11:59 AM, MitchAlsup wrote:
    --------------------------
    >> What functionality is ascribed to release_barrier ?
    >
    > Fwiw, the exact same as on the SPARC.
    >
    > MEMBAR #LoadStore | #StoreStore
    > atomic RMW
    --------------------------
    > atomic RMW
    > MEMBAR #LoadStore | #LoadLoad

    In this case both *_barrier are NoOps.

    --------------------------
    > The SPARC is free form. Now, tagging a variable identifier wrt the
    > membars might be more efficient, but how does your system work with a
    > mutex to do that? Say the locked region is comprised of several
    > unrelated variables?

    The core runs nominally in causal order.

    When an ATOMIC event starts (with a LL) the core switches to sequentially
    consistent. All older memory references have to have left the core (L1
    and TLB) before the LL can leave the core.

    When an ATOMIC event ends (with a SC) the core reverts to causal. All
    participating lines become visible in the instant the SC is performed,
    while no references younger than the event can leave the core before
    the SC.

    In effect, the core inserts the MEMBARs on behalf of the program at
    changes to the ATOMIC-event status.
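In portable C++ terms, what the core is described as doing can be sketched as a full fence where the event starts, an LL/SC-style retry loop, and a full fence where the event ends. The `compare_exchange_weak` loop is a stand-in for LL/SC, not My 66000 code, and `g_counter` and `atomic_event_add` are hypothetical names.

```cpp
#include <atomic>

// Hypothetical shared counter updated inside an "ATOMIC event".
std::atomic<long> g_counter{0};

long atomic_event_add(long delta) {
    std::atomic_thread_fence(std::memory_order_seq_cst); // MEMBAR at event start (LL)
    long old = g_counter.load(std::memory_order_relaxed); // "LL": read current value
    while (!g_counter.compare_exchange_weak(old, old + delta,
                                            std::memory_order_relaxed,
                                            std::memory_order_relaxed)) {
        // "SC" failed: old now holds the freshly observed value; retry.
    }
    std::atomic_thread_fence(std::memory_order_seq_cst); // MEMBAR at event end (SC)
    return old + delta;                                   // value this event stored
}
```

The point of the sketch is only fence placement: on the described hardware both fences are implied by the event boundaries, so the program never writes them.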
    --------------------------------

    Note: the core runs in one of 3 defined modes {Optimistic, Careful,
    and Methodological}. At completion of an ATOMIC-event (or on a context
    switch in) the core reverts to Optimistic.

    In Optimistic mode, the core tries to barrel through the event, and if
    nobody saw it pass through, then all is good. However, if anybody
    interfered with the passage through the event, the first attempt fails,
    control is transferred to the Atomic-Control-Point, and code continues
    in Careful mode.

    {{The Atomic-Control-Point is the address of the first LL instruction
    unless a Branch-on-interference is performed which changes the ACP
    to the label of the branch.}}

    In Careful mode, the core enters the sequentially consistent state and
    carefully orders references, inserting a MEMBAR at the LL and another at
    the SC. If this fails, the core enters Methodological mode; on success,
    the core reverts to Optimistic.

    In Methodological mode, the core touches the participating inbound memory
    references, and when it finds the point-of-resolution, it bundles the
    participating addresses and ships them off to a system arbiter. The
    arbiter grants all (or none) and puts the core in a position to NaK
    interfering requests to its granted lines. A granted ATOMIC event will
    succeed. Once finished, the core reverts to Optimistic.
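The mode transitions above can be summarized as a small state machine. This sketches the described policy only, with hypothetical names; the text does not say what happens if the arbiter grants none of the lines, so this model simply stays in Methodological mode and retries, which is an assumption.

```cpp
#include <functional>

enum class Mode { Optimistic, Careful, Methodological };

// Mode for the next attempt, given the mode of the attempt that just ran
// and whether it succeeded.
Mode next_mode(Mode current, bool succeeded) {
    if (succeeded) return Mode::Optimistic; // success always reverts
    switch (current) {
        case Mode::Optimistic:     return Mode::Careful;
        case Mode::Careful:        return Mode::Methodological;
        case Mode::Methodological: return Mode::Methodological; // assumption
    }
    return Mode::Optimistic; // unreachable
}

// Drives one ATOMIC event, escalating modes until an attempt succeeds.
// Returns the number of attempts made.
int run_event(const std::function<bool(Mode)>& attempt) {
    Mode mode = Mode::Optimistic;
    for (int tries = 1;; ++tries) {
        bool ok = attempt(mode);
        mode = next_mode(mode, ok);
        if (ok) return tries;
    }
}
```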

    {{The arbiter is much like a TLB in size and circuit organization.
    The Arbiter processes requests in arrival order, and returns grants
    in arrival order. Processes that don't share memory use independent
    arbiters.}}
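A software model of the arbiter's grant-all-or-none behavior, processing requests and returning results in arrival order, might look like this. It is a hypothetical sketch for illustration only: plain C++ containers rather than the TLB-like hardware organization described, and no NaK handling. `Arbiter`, `Request`, and `Grant` are invented names.

```cpp
#include <cstdint>
#include <deque>
#include <set>
#include <utility>
#include <vector>

struct Request {
    int core;                         // requesting core id
    std::vector<std::uint64_t> lines; // participating cache-line addresses
};

struct Grant {
    int core;
    bool granted; // true: all lines granted; false: none
};

class Arbiter {
public:
    void submit(Request r) { queue_.push_back(std::move(r)); }

    // Handles pending requests strictly in arrival order and returns
    // the grants in that same order.
    std::vector<Grant> process() {
        std::vector<Grant> out;
        while (!queue_.empty()) {
            Request r = std::move(queue_.front());
            queue_.pop_front();
            bool all_free = true;
            for (std::uint64_t line : r.lines) {
                if (held_.count(line) != 0) { all_free = false; break; }
            }
            if (all_free) held_.insert(r.lines.begin(), r.lines.end()); // grant all
            out.push_back(Grant{r.core, all_free});                     // ...or none
        }
        return out;
    }

    // Called when a granted ATOMIC event finishes.
    void release(const std::vector<std::uint64_t>& lines) {
        for (std::uint64_t line : lines) held_.erase(line);
    }

private:
    std::deque<Request> queue_;
    std::set<std::uint64_t> held_;
};
```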

    {{The point-of-resolution follows SW-instructions touching each
    participating line and precedes the first ST to a participating line}}

  • From Chris M. Thomasson@[email protected] to comp.arch on Wed Jun 10 13:35:19 2026

    On 6/9/2026 6:16 PM, MitchAlsup wrote:
    [snip]
    > In Careful mode, the core enters the sequentially consistent state and
    > carefully orders references inserting MEMBAR at the LL and another at
    > the SC. If this fails, core enters Methodological mode, if success,
    > core reverts to Optimistic.

    Sorry for the quick question. Will get back to you on this. It's
    interesting. So, you say seq_cst. So, it will automatically handle the
    store followed by a load to another location that TSO cannot handle? On
    the SPARC that requires a damn #StoreLoad. So, we try to avoid that, no
    matter what arch. But if the arch is automatically seq_cst in that area,
    then well... How does it compare to a tight algo, say RCU, that can be
    used highly efficiently on a weakly ordered system? Its read side does
    not need seq_cst, or even acquire/release membars at all. It just needs
    dependent-load ordering.
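For reference, an RCU-style publish/read pair in C++ might look like the sketch below (hypothetical names; grace periods and reclamation are omitted entirely). The writer publishes with a release store; the reader's only real requirement is that the load of `c->timeout_ms` is ordered after the load of the pointer it depends on, which weakly ordered CPUs other than DEC Alpha provide for free. `memory_order_consume` was intended to express that dependency, but mainstream compilers strengthen it to acquire, so the sketch simply uses acquire.

```cpp
#include <atomic>

// Hypothetical RCU-protected configuration object.
struct Config {
    int timeout_ms;
};

std::atomic<Config*> g_config{nullptr};

// Writer side (rcu_assign_pointer analogue): the caller initializes *c,
// and the release store makes that initialization visible before the pointer.
void publish_config(Config* c) {
    g_config.store(c, std::memory_order_release);
}

// Reader side (rcu_dereference analogue): the read of c->timeout_ms depends
// on the loaded pointer value.
int read_timeout() {
    Config* c = g_config.load(std::memory_order_acquire);
    return c ? c->timeout_ms : -1; // -1 if nothing has been published yet
}
```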




  • From MitchAlsup@[email protected] to comp.arch on Thu Jun 11 17:56:39 2026


    "Chris M. Thomasson" <[email protected]> posted:

    > On 6/9/2026 6:16 PM, MitchAlsup wrote:
    --------------------------
    >> In Careful mode, the core enters the sequentially consistent state and
    >> carefully orders references inserting MEMBAR at the LL and another at
    >> the SC. If this fails, core enters Methodological mode, if success,
    >> core reverts to Optimistic.
    >
    > Sorry for the quick question. Will get back to you on this. It's
    > interesting. So, you say seq_cst. So, it will automatically handle the
    > store followed by a load to another location that TSO cannot handle?

    As long as some inbound memory ref (LD or PRE) is performed prior to the
    store, the core will either become sequentially consistent or watch the
    address for interference (or both), so the ST is ordered after the LD. In
    the case of Optimistic mode with interference, the core backs up to the
    LD and tries again.

    So, probably yes; this does what TSO cannot/does not.

    > On the SPARC that requires a damn #StoreLoad. So, we try to avoid that, no
    > matter what arch. But if the arch is automatically seq_cst in that area,
    > then well... How does it compare to a tight algo, say RCU, that can be
    > used highly efficiently on a weakly ordered system? Its read side does
    > not need seq_cst, or even acquire/release membars at all. It just needs
    > dependent-load ordering.

    I will have to look at RCU ...

