• Concertina II: Finding Happiness Through Coding

    From quadi@[email protected] to comp.arch on Wed Apr 22 03:31:03 2026
    From Newsgroup: comp.arch

    I was not happy that when I did not use a block prefix, I had to omit the
    Load Medium and Store Medium instructions from the basic load/store instructions.

    I searched for available opcode space.

    I found a little; enough for the _other_ block prefixes. But not a full
    1/16 of the opcode space which is what the Type I header needed. Where did
    I find it? In the opcodes for operate instructions which the 15-bit paired short instructions don't use.

So I thought that perhaps I could shrink the requirements of the Type I header. By making use of the fact that 10 (the start of a 32-bit or longer instruction) can only be followed by 11 (not the start of an instruction), maybe I could replace four consecutive two-bit prefixes with one seven-bit prefix.

    But alas, this fact only reduced the possibilities to 81 + 27 + 27 + 1,
    which is 136, which is greater than 128.

    However, if I made use of the fact that I would know if the preceding 16-
    bit zone began a 32-bit instruction, and added certain other restrictions
    on the allowed combinations - by insisting that all pseudo-immediates be tidily put at the end of the block - I thought I was able to squeeze it in.

    This may be a step too far, so I've saved everything if I need to go back.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@[email protected] to comp.arch on Wed Apr 22 18:15:17 2026
    From Newsgroup: comp.arch


    quadi <[email protected]d> posted:

    I was not happy that when I did not use a block prefix, I had to omit the Load Medium and Store Medium instructions from the basic load/store instructions.

    Is LD Medium obtaining a sooth sayer from memory?

    Is ST Medium putting a sooth sayer back in memory?

    How do you know a sooth sayer fits in 2^(3+n) bytes???

    I searched for available opcode space.

    I found a little; enough for the _other_ block prefixes. But not a full
    1/16 of the opcode space which is what the Type I header needed. Where did
    I find it? In the opcodes for operate instructions which the 15-bit paired short instructions don't use.

So I thought that perhaps I could shrink the requirements of the Type I header. By making use of the fact that 10 (the start of a 32-bit or longer instruction) can only be followed by 11 (not the start of an instruction), maybe I could replace four consecutive two-bit prefixes with one seven-bit prefix.

    But alas, this fact only reduced the possibilities to 81 + 27 + 27 + 1, which is 136, which is greater than 128.

    However, if I made use of the fact that I would know if the preceding 16-
    bit zone began a 32-bit instruction, and added certain other restrictions
    on the allowed combinations - by insisting that all pseudo-immediates be tidily put at the end of the block - I thought I was able to squeeze it in.

    This may be a step too far, so I've saved everything if I need to go back.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@[email protected] to comp.arch on Thu Apr 23 03:35:44 2026
    From Newsgroup: comp.arch

    On Wed, 22 Apr 2026 18:15:17 +0000, MitchAlsup wrote:
    quadi <[email protected]d> posted:

    I was not happy that when I did not use a block prefix, I had to omit
    the Load Medium and Store Medium instructions from the basic load/store
    instructions.

    Is LD Medium obtaining a sooth sayer from memory?

    Is ST Medium putting a sooth sayer back in memory?

    How do you know a sooth sayer fits in 2^(3+n) bytes???

    No, I am not referring to one who channels the spirits of the dead.

    Instead, the Medium data type refers to 48-bit floating-point values;
    although not part of the IEEE 754 standard, they follow the pattern of the types defined in it. They offer a precision just above 11 decimal digits,
    and an exponent range that exceeds 10 to plus or minus 99, thus
    approximating the numbers pocket calculators make available.
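As a rough check of those figures, here is a small sketch. The post does not give the field widths, so the 1/11/36 sign/exponent/mantissa split below is an assumption (it simply reuses the binary64 exponent width); the stated precision and range fall out of it:

```python
import math

# Hypothetical field split for a 48-bit "Medium" float following the
# IEEE 754 interchange pattern: 1 sign bit, 11 exponent bits (as in
# binary64), 36 stored mantissa bits.  These widths are an assumption.
SIGN, EXP, MANT = 1, 11, 36
assert SIGN + EXP + MANT == 48

precision_digits = (MANT + 1) * math.log10(2)   # +1 for the hidden bit
bias = 2**(EXP - 1) - 1
max_value_exp = (2**EXP - 2) - bias             # largest finite exponent

print(f"~{precision_digits:.2f} decimal digits")  # just above 11
print(f"max value ~ 2^{max_value_exp}, i.e. ~10^"
      f"{max_value_exp * math.log10(2):.0f}")     # well beyond 10^99
```

With these widths the precision comes out at about 11.1 decimal digits and the range at roughly 10^±308, both consistent with "just above 11 digits" and "exceeds 10 to plus or minus 99".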

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@[email protected] to comp.arch on Thu Apr 23 17:17:48 2026
    From Newsgroup: comp.arch

    On 4/22/2026 10:35 PM, quadi wrote:
    On Wed, 22 Apr 2026 18:15:17 +0000, MitchAlsup wrote:
    quadi <[email protected]d> posted:

    I was not happy that when I did not use a block prefix, I had to omit
    the Load Medium and Store Medium instructions from the basic load/store
    instructions.

    Is LD Medium obtaining a sooth sayer from memory?

    Is ST Medium putting a sooth sayer back in memory?

    How do you know a sooth sayer fits in 2^(3+n) bytes???

    No, I am not referring to one who channels the spirits of the dead.

    Instead, the Medium data type refers to 48-bit floating-point values; although not part of the IEEE 754 standard, they follow the pattern of the types defined in it. They offer a precision just above 11 decimal digits,
    and an exponent range that exceeds 10 to plus or minus 99, thus
    approximating the numbers pocket calculators make available.


    Ironically, I had considered an intermediate format a few times, mostly represented as the Binary64 format with the low-order bits cut off.

    Mostly hadn't amounted to much.



    I did end up experimenting with support for a very niche converter:
    (31:0) => (63:0)
    As:
    (31:4), (11:4), (11:4), (11:4), (11:4), (3:0)

    Currently only available in an Imm32 instruction.

    Seemingly, this pattern can deal with roughly 2/3 of the FPU constants
    that miss as Binary16:
    Multiples of 1/3, 1/5 and similar hit with this.

    It fails for patterns like 1/7, 1/9, ... or similar, which have a
    different bit pattern length (pattern doesn't repeat along an 8-bit
    spacing).

    Patterns like 1/7, 1/9, ... could be instead addressed with a pattern
    that repeats on a multiple of 12 bits. But, this sort of thing is
    getting a bit niche (would need different patterns to deal with
    different fractions).

But, it is a relatively affordable way to deal with this pattern; even if
it can't be crammed into as small a size as the simple BFP
patterns (and encoding an index into a table of possible patterns won't
save much over expressing the pattern directly).

Also, the 12-bit pattern case misses more often on patterns
that would hit with the 8-bit pattern or with Binary16 (the 8-bit pattern case
mostly overlaps as well with the area covered by Binary16). A 6-bit
pattern could still overlap with Binary16's range, but would be more
limited in the fractions it can deal with.
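A sketch of the (31:4), (11:4)x4, (3:0) expansion above (my own reconstruction, reading the fields Verilog-style as a concatenation; the function names are made up):

```python
import struct

def expand8(imm32: int) -> int:
    # {imm[31:4], imm[11:4], imm[11:4], imm[11:4], imm[11:4], imm[3:0]}:
    # keep the top 28 bits, repeat the 8-bit field imm[11:4] four more
    # times below them, keep the low nibble (28 + 8*4 + 4 = 64 bits).
    top28 = (imm32 >> 4) & 0xFFFFFFF
    rep8 = (imm32 >> 4) & 0xFF
    low4 = imm32 & 0xF
    return ((top28 << 36) | (rep8 << 28) | (rep8 << 20)
            | (rep8 << 12) | (rep8 << 4) | low4)

def pack8(value):
    # Try to compress a binary64 constant into 32 bits; None on a miss.
    bits = struct.unpack('<Q', struct.pack('<d', value))[0]
    imm = ((bits >> 32) & 0xFFFFFFF0) | (bits & 0xF)
    return imm if expand8(imm) == bits else None

print(hex(pack8(1/3)))  # 0x3fd55555: 1/3's mantissa repeats on an 8-bit spacing
print(pack8(1/7))       # None: 1/7 repeats on a 12-bit spacing instead
```

This reproduces the hit/miss behavior described above: 1/3 and 1/5 round-trip exactly, while 1/7 and 1/9 do not.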



    Only really relevant for constant values though (as a live FP format,
    would be worse than normal BFP).

    Though, can make use of the extra bit left over from the Imm32f
    encodings (which are actually stored as Imm33). More a debate though of
    if it is worth the non-zero additional LUT cost to do so.


    But, this combination would leave, statistically:
    Imm16f: 63%
    Imm6f 25% (S.E3.M2)
    Imm32fu: 71% (8% over 63%, simply Binary64 truncated to 32 bits)
    Imm32fn: 88% (25% hit rate over 63%, 8-bit pattern from above)

    ...


    While Imm32fn has a higher hit rate than Imm32fu, they have a
    non-overlap, so the combined Imm32fun in this case seems to have around
    a 96% hit-rate, with around 4% in the "miss" category (irrational
    constants, and stuff like 1/7 which has a 3 bit repeating pattern, vs
    2-bit for 1/3 and 1/5).

    If I added the 12-bit pattern (in addition to the existing two), could
    maybe push it up to around a 97% or 98% hit rate, but the 12-bit pattern
    by itself has a lower hit-rate than simply truncating the Binary64 value
    to 32 bits, or even Binary16. So, selecting between 8b+12b pattern would
    do worse than trunc32 + 8b pattern.


    But, dunno.



    However, the relative usage of floating point immediate values is low
    enough that this doesn't make a big impact on code density.


    Not much more "low hanging fruit" for improving code density ATM, but it
    seems like if I could squeeze out a few more percent on overall code
    density, it could put XG3 more solidly in the lead vs RV64GC+JX (where,
    right now it is pretty close and which one wins/loses depends a lot on
    the program being tested).

    ...



    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@[email protected] to comp.arch on Thu Apr 23 18:08:04 2026
    From Newsgroup: comp.arch

    On 4/23/2026 5:17 PM, BGB wrote:
    On 4/22/2026 10:35 PM, quadi wrote:
    On Wed, 22 Apr 2026 18:15:17 +0000, MitchAlsup wrote:
    quadi <[email protected]d> posted:

    I was not happy that when I did not use a block prefix, I had to omit
the Load Medium and Store Medium instructions from the basic load/store instructions.

    Is LD Medium obtaining a sooth sayer from memory?

    Is ST Medium putting a sooth sayer back in memory?

    How do you know a sooth sayer fits in 2^(3+n) bytes???

    No, I am not referring to one who channels the spirits of the dead.

    Instead, the Medium data type refers to 48-bit floating-point values;
    although not part of the IEEE 754 standard, they follow the pattern of
    the
    types defined in it. They offer a precision just above 11 decimal digits,
    and an exponent range that exceeds 10 to plus or minus 99, thus
    approximating the numbers pocket calculators make available.


    Ironically, I had considered an intermediate format a few times, mostly represented as the Binary64 format with the low-order bits cut off.

    Mostly hadn't amounted to much.



    I did end up experimenting with support for a very niche converter:
      (31:0) => (63:0)
    As:
      (31:4), (11:4), (11:4), (11:4), (11:4), (3:0)

    Currently only available in an Imm32 instruction.

    Seemingly, this pattern can deal with roughly 2/3 of the FPU constants
    that miss as Binary16:
    Multiples of 1/3, 1/5 and similar hit with this.

    It fails for patterns like 1/7, 1/9, ... or similar, which have a
    different bit pattern length (pattern doesn't repeat along an 8-bit spacing).

    Patterns like 1/7, 1/9, ... could be instead addressed with a pattern
    that repeats on a multiple of 12 bits. But, this sort of thing is
    getting a bit niche (would need different patterns to deal with
    different fractions).

But, it is a relatively affordable way to deal with this pattern; even if
it can't be crammed into as small a size as the simple BFP
patterns (and encoding an index into a table of possible patterns won't
save much over expressing the pattern directly).

    Also, the 12-bit pattern case can be noted to miss more with patterns
    that would hit with 8-bit or with binary16 (the 8-bit pattern case
    mostly overlaps as well with the area covered by Binary16). A 6-bit
    pattern could still overlap with Binary16's range, but would be more
    limited in the fractions it can deal with.



    Only really relevant for constant values though (as a live FP format,
    would be worse than normal BFP).

    Though, can make use of the extra bit left over from the Imm32f
    encodings (which are actually stored as Imm33). More a debate though of
    if it is worth the non-zero additional LUT cost to do so.


    But, this combination would leave, statistically:
      Imm16f: 63%
        Imm6f 25% (S.E3.M2)
      Imm32fu: 71% (8% over 63%, simply Binary64 truncated to 32 bits)
      Imm32fn: 88% (25% hit rate over 63%, 8-bit pattern from above)

    ...


    While Imm32fn has a higher hit rate than Imm32fu, they have a non-
    overlap, so the combined Imm32fun in this case seems to have around a
    96% hit-rate, with around 4% in the "miss" category (irrational
    constants, and stuff like 1/7 which has a 3 bit repeating pattern, vs 2-
    bit for 1/3 and 1/5).

    If I added the 12-bit pattern (in addition to the existing two), could
    maybe push it up to around a 97% or 98% hit rate, but the 12-bit pattern
    by itself has a lower hit-rate than simply truncating the Binary64 value
    to 32 bits, or even Binary16. So, selecting between 8b+12b pattern would
    do worse than trunc32 + 8b pattern.



Relevant, but I failed to mention it earlier, the 12-bit pattern:
    (31:4), (15:4), (15:4), (15:8), (3:0)

    Which is effectively S.E11.M4 apart from the bits forming the pattern.
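The 12-bit variant can be sketched the same way as the 8-bit one (again my own reconstruction from the concatenation notation; names are made up). Note that 1/7, which misses the 8-bit spacing, hits here, and 1/3 still hits since a 2-bit period divides 12:

```python
import struct

def expand12(imm32: int) -> int:
    # {imm[31:4], imm[15:4], imm[15:4], imm[15:8], imm[3:0]}:
    # top 28 bits kept, the 12-bit field imm[15:4] repeated twice more,
    # then its top 8 bits once (28 + 12 + 12 + 8 + 4 = 64 bits).
    top28 = (imm32 >> 4) & 0xFFFFFFF
    rep12 = (imm32 >> 4) & 0xFFF
    return ((top28 << 36) | (rep12 << 24) | (rep12 << 12)
            | ((rep12 >> 4) << 4) | (imm32 & 0xF))

def pack12(value):
    # Try to compress a binary64 constant into 32 bits; None on a miss.
    bits = struct.unpack('<Q', struct.pack('<d', value))[0]
    imm = ((bits >> 32) & 0xFFFFFFF0) | (bits & 0xF)
    return imm if expand12(imm) == bits else None

print(hex(pack12(1/7)))  # 0x3fc24922: hits, unlike with the 8-bit spacing
print(hex(pack12(1/3)))  # 0x3fd55555: 2-bit period divides 12, hits too
```

This is the plain 12-bit pattern, without the exponent trickery discussed below.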

Though, could maybe see if there are some other patterns that could do
better here in terms of average hit-rate.

    Or, if the seeming relative success of truncation and the 8-bit pattern
    is more of a "take it as good enough and leave it at that" thing.


    My last "big survey of floating point constants" had failed to take into account stats for repeating bit patterns (hadn't thought of trying to go
    this route at the time; had thought in terms of power-of-10 scaling, but
    this was much less feasible than trying to account more directly for the repeating bit patterns within the fractions).


As noted, a repeating pattern can in principle deal with all smaller
patterns that have a common factor:
    12 can handle 2, 3, 4, 6;
    8 can handle 2, 4, 8.

    Downfall of the 12-bit pattern is that it doesn't leave enough bits in
    the mantissa for the top-end value.

    Though, could squeeze a few bits out of the exponent:
    (31:30),
    (30)?4'h0:4'hF,
    (29:4),
    (15:4), (15:4), (15:12), (3:0)

    Effectively giving 8-bits of usable mantissa.

    Or, maybe sacrifice the sign bit (almost always positive for FPU
    immediate values):
    1'b0, (30),
    (30) ? 4'h0 : 4'hF,
    (29:4),
    (31) ?
    { (15:4), (15:4), (15:12) } :
    { (11:4), (11:4), (11:4), (11:8) },
    (3:0)


    Haven't evaluated these possibilities yet though to determine possible
    effects on hit-rate...



    But, dunno.



    However, the relative usage of floating point immediate values is low
    enough that this doesn't make a big impact on code density.


    Not much more "low hanging fruit" for improving code density ATM, but it seems like if I could squeeze out a few more percent on overall code density, it could put XG3 more solidly in the lead vs RV64GC+JX (where, right now it is pretty close and which one wins/loses depends a lot on
    the program being tested).

    ...




    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Thomas Koenig@[email protected] to comp.arch on Fri Apr 24 05:29:12 2026
    From Newsgroup: comp.arch

    MitchAlsup <[email protected]d> schrieb:

    quadi <[email protected]d> posted:

I was not happy that when I did not use a block prefix, I had to omit the Load Medium and Store Medium instructions from the basic load/store
    instructions.

    Is LD Medium obtaining a sooth sayer from memory?

    Is ST Medium putting a sooth sayer back in memory?

    Obviously, this refers to steaks.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@[email protected] to comp.arch on Fri Apr 24 12:01:34 2026
    From Newsgroup: comp.arch

    On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:

    Obviously, this refers to steaks.

    In a higher-level language, one has:

    Real
    Intermediate
    Double Precision
    Extended

    But in Assembler, one needs

    Floating
    Medium
    Double
    Extended

    because R for Real can be confused with R for Register, and I for
    Intermediate can be confused with I for Integer.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Robert Finch@[email protected] to comp.arch on Fri Apr 24 10:53:19 2026
    From Newsgroup: comp.arch

    On 2026-04-24 8:01 a.m., quadi wrote:
    On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:

    Obviously, this refers to steaks.

    In a higher-level language, one has:

    Real
    Intermediate
    Double Precision
    Extended

    But in Assembler, one needs

    Floating
    Medium
    Double
    Extended

    because R for Real can be confused with R for Register, and I for Intermediate can be confused with I for Integer.

    John Savard

    What about triple and quad precision? Or extended triple precision?

    For Arpl at one point the float precision could be specified a bit like bitfields are specified in ‘C’ as in:
    Float:8 myvar;

Changed it though to standard types, as it was undesirable to support arbitrary bit-lengths for floats, which would have to be done in software. Now it
is just:

    float byte myvar;
    float quad qvar;

    Can also use shorter form for some types like:
    double dvar;
    Instead of having to type ‘float double dvar;’

Some float approximations will supply around 7 bits, which works well to
fill in the significand for the progression of 16-, 32-, 64-, and 128-bit floats.

    Having a 48-bit float type likely does not save any processing time over
    a 64-bit type. It is more a matter of storage space.

    48-bit floats in arrays may slow down indexed addressing; scaled index
    address modes are usually a power of two.
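To illustrate the indexed-addressing point, a hypothetical sketch: a 6-byte element stride has to be synthesized (e.g. as a shift-and-add pair, or a multiply), while a power-of-two stride maps onto a single scaled-index addressing mode. The function names here are made up:

```python
def addr_f48(base: int, i: int) -> int:
    # 6-byte elements: i*6 is not a single shift, so it typically costs
    # an extra shift+add (or a multiply) before the memory access.
    return base + (i << 2) + (i << 1)   # i*4 + i*2 == i*6

def addr_f64(base: int, i: int) -> int:
    # 8-byte elements: one scaled-index mode, base + (i << 3).
    return base + (i << 3)

print(hex(addr_f48(0x1000, 5)))  # 0x101e (base + 30)
print(hex(addr_f64(0x1000, 5)))  # 0x1028 (base + 40)
```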

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@[email protected] to comp.arch on Fri Apr 24 10:22:45 2026
    From Newsgroup: comp.arch

    On 4/24/2026 7:01 AM, quadi wrote:
    On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:

    Obviously, this refers to steaks.

    In a higher-level language, one has:

    Real
    Intermediate
    Double Precision
    Extended

    But in Assembler, one needs

    Floating
    Medium
    Double
    Extended

    because R for Real can be confused with R for Register, and I for Intermediate can be confused with I for Integer.


    I went with:
    H: Half
    F/S: Float or Single
    D: Double
    X: 128-bit (beyond this depends on context)

    RV used Q for Binary128, but Q was more widely used for Int64 in my naming.

    Int naming:
    B/SB/UB: Byte
    W/SW/UW: Int16 ("word")
    L/SL/UL: Int32 ("long")
    T/ST/UT: Int48 ("tword" / triple word), short lived
    Q: Int64 ("qword")

    RV had used:
    B/H/{W|S}/D/Q

    ...

    Did look into 48b Load/Store ops, but didn't stick.
    Could have supported a 48-bit format mostly by using 48b Load/Store.
    Other option being to fake it by using 64b ops, and MUX'ing.
    Load, MUX, Store

    Less efficient, but TW was super niche and hard to justify keeping it
    around.




    But, yeah, otherwise disrupted by a PSU failure on main PC (yesterday).
    Waiting for a new PSU to show up, can't get back to "business as usual"
    until then. Failed PSU was a 750W Rosewill, ordered a 750W MSI,
    hopefully works... Was $30 more than another Rosewill, but hopefully
    worth it (there were also much more expensive PSUs, I just didn't go for
    the cheapest option in this case, but yeah).

    Lots of bad luck in general yesterday.



    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@[email protected] to comp.arch on Fri Apr 24 16:08:03 2026
    From Newsgroup: comp.arch

    On Fri, 24 Apr 2026 10:53:19 -0400, Robert Finch wrote:

    What about triple and quad precision? Or extended triple precision?

    There is quad precision, referred to as extended precision.
    Normally, there is no 96-bit triple precision. That may, however, make an appearance when the computer is working with storage divided into 48-bit
    units instead of 32/64-bit units.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@[email protected] to comp.arch on Fri Apr 24 16:12:26 2026
    From Newsgroup: comp.arch

    On Wed, 22 Apr 2026 03:31:03 +0000, quadi wrote:

    This may be a step too far, so I've saved everything if I need to go
    back.

    While I had tried to organize the coding scheme, I decided that it was too complex to be tolerable.

    The compromise of eliminating medium format floating-point loads and
    stores from the default basic instruction set was not tolerable.

    The compromise to the 15-bit paired instructions that preceded that was
    also not tolerable.

    So what to do? What I've been doing all along in this design process -
    move the compromise somewhere else, and see if I can put up with it. So
    now I've decided to take the 32-bit header for variable-length
    instructions, and put the compromise there. This required bringing back 16-
    bit short instructions.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@[email protected] to comp.arch on Fri Apr 24 18:52:07 2026
    From Newsgroup: comp.arch


    BGB <[email protected]> posted:

    On 4/24/2026 7:01 AM, quadi wrote:
    --------------------
    I went with:
    H: Half
    F/S: Float or Single
    D: Double
    X: 128-bit (beyond this depends on context)

    RV used Q for Binary128, but Q was more widely used for Int64 in my naming.

    Int naming:
    B/SB/UB: Byte
    W/SW/UW: Int16 ("word")
    L/SL/UL: Int32 ("long")
    T/ST/UT: Int48 ("tword" / triple word), short lived
    Q: Int64 ("qword")

    RV had used:
    B/H/{W|S}/D/Q

    This is what I use. Except I have signed and unsigned integer
    arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
    floats.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Stefan Monnier@[email protected] to comp.arch on Fri Apr 24 09:28:31 2026
    From Newsgroup: comp.arch

I was not happy that when I did not use a block prefix, I had to omit the Load Medium and Store Medium instructions from the basic load/store
    instructions.
    Is LD Medium obtaining a sooth sayer from memory?
    Is ST Medium putting a sooth sayer back in memory?
    Obviously, this refers to steaks.

    But these operations are too rare to include in usual ISAs,


    === Stefan
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Thomas Koenig@[email protected] to comp.arch on Fri Apr 24 20:05:45 2026
    From Newsgroup: comp.arch

    Stefan Monnier <[email protected]> schrieb:
I was not happy that when I did not use a block prefix, I had to omit the Load Medium and Store Medium instructions from the basic load/store
    instructions.
    Is LD Medium obtaining a sooth sayer from memory?
    Is ST Medium putting a sooth sayer back in memory?
    Obviously, this refers to steaks.

    But these operations are too rare to include in usual ISAs,

    Well done!
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@[email protected] to comp.arch on Sat Apr 25 01:48:04 2026
    From Newsgroup: comp.arch


    Thomas Koenig <[email protected]> posted:

    Stefan Monnier <[email protected]> schrieb:
    I was not happy that when I did not use a block prefix, I had to omit the
Load Medium and Store Medium instructions from the basic load/store instructions.
    Is LD Medium obtaining a sooth sayer from memory?
    Is ST Medium putting a sooth sayer back in memory?
    Obviously, this refers to steaks.

    But these operations are too rare to include in usual ISAs,

    Well done!

You can sous vide
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@[email protected] to comp.arch on Sat Apr 25 02:08:10 2026
    From Newsgroup: comp.arch

    On Fri, 24 Apr 2026 16:12:26 +0000, quadi wrote:

    So what to do? What I've been doing all along in this design process -
    move the compromise somewhere else, and see if I can put up with it. So
    now I've decided to take the 32-bit header for variable-length
    instructions, and put the compromise there.

    Something truly evil has occurred to me. But since it _is_ so evil, I
    don't think that I will do it.

    Instead of removing the LM (Load Medium) and STM (Store Medium) basic memory-reference instructions...

    there are another two that, under a certain circumstance, could be removed.

    Under that circumstance, there would still be IB (Insert Byte) and IH
    (Insert Halfword), and ULB (Unsigned Load Byte) and ULH (Unsigned Load Halfword).

    But _not_ I (Insert) and UL (Unsigned Load).

    In the case of a *32-bit* architecture.

    So the truly evil thing would be...

    1) To decide that a 32-bit version of the architecture needs to be defined;
    2) To decide that it should be the default;
    3) To decide that not switching modes to get at instruction set extensions should apply to the switch between 32 bits and 64 bits too, so that 64-bit code would consist _entirely_ of instruction blocks that begin with a
    block header, because instructions without a header could only be in one state, that being 32-bits.

    Of course, it's 3) that exposes the true evilness of this scheme. So I
    don't think it's a place I want to go.

    However, let's say I do want to define 32-bit operation, but _with_ a mode bit.

    Then in 32-bit mode, those two opcodes would get used for an uncompromised variable-length instruction header.

    Now what is 64-bit mode going to look like? That would almost force going
    back to the option of demoting LM and STM. Or does it mean I have to come
    up with something more devious, something truly perverse, that somehow provides a headerless header, sneaking in an invisible mode bit in the
    code itself? But there's no such thing as a free bit; they're like midday meals in this regard.

    The System/360 had it simple - extra instructions needed for 64-bit
    operation? Just shove them in the 64-bit opcode space. But in Concertina
    II, instructions longer than 32 bits are somewhat wasteful in overhead,
    though I've tried... and they're _only_ available with a particular
    category of headers. So I feel I need to have everything _important_ in
    the basic 32-bit instruction set.

    This direction of thinking suggests... that I use some of the opcode space
    I still do have free... for special 64-bit instructions that are available without a header. This has been done before in previous Concertina II iterations. Emergency long instructions - inefficient because _both_ 32-
    bit words of the instruction have to begin with 9 or so overhead bits to indicate they belong to such an instruction... but less inefficient than adding a whole 32-bit header to the block if you just need one of them in
    the block.

    That way, I can add lots of extra instructions to be part of the basic headerless instruction set.

    While such a capability may be a good thing in itself, though, using it as
    an excuse to uglify the basic instruction set is _still_ something I would want to avoid, so I don't think it solves the problem of restoring the 32-
    bit variable-length instruction header to its uncompromised glory.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@[email protected] to comp.arch on Sat Apr 25 03:43:29 2026
    From Newsgroup: comp.arch

    On Sat, 25 Apr 2026 02:08:10 +0000, quadi wrote:

    This direction of thinking suggests... that I use some of the opcode
    space I still do have free... for special 64-bit instructions that are available without a header. This has been done before in previous
    Concertina II iterations. Emergency long instructions - inefficient
    because _both_ 32- bit words of the instruction have to begin with 9 or
    so overhead bits to indicate they belong to such an instruction... but
    less inefficient than adding a whole 32-bit header to the block if you
    just need one of them in the block.

    That way, I can add lots of extra instructions to be part of the basic headerless instruction set.

    Thinking about this led me to do something completely different, which required me to use a bit more opcode space for headers - but I think I
    left enough where I grabbed it from to still do this as well.

    This was done to add some additional flexibility to one type of variable length code - now the _short_ instructions, instead of the memory-
    reference operate instructions, can be given the power to alter the
    condition codes.

    This comes at a cost, though. Now the memory-reference operate
    instructions can only use the first half of each register bank as
    destination registers, and the header no longer provides access to a
    secondary 32-bit instruction set as well, if this option is chosen.

    While this could be very useful, by allowing short instructions to be used
    in cases where they previously could not, it's still somewhat perverse,
    like much else I have done in this and previous iterations of Concertina
    II.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@[email protected] to comp.arch on Sat Apr 25 00:38:35 2026
    From Newsgroup: comp.arch

    On 4/24/2026 1:52 PM, MitchAlsup wrote:

    BGB <[email protected]> posted:

    On 4/24/2026 7:01 AM, quadi wrote:
    --------------------
    I went with:
    H: Half
    F/S: Float or Single
    D: Double
    X: 128-bit (beyond this depends on context)

RV used Q for Binary128, but Q was more widely used for Int64 in my naming.
    Int naming:
    B/SB/UB: Byte
    W/SW/UW: Int16 ("word")
    L/SL/UL: Int32 ("long")
    T/ST/UT: Int48 ("tword" / triple word), short lived
    Q: Int64 ("qword")

    RV had used:
    B/H/{W|S}/D/Q

    This is what I use. Except I have signed and unsigned integer
    arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
    floats.

    It likely depends on which "tradition" one is coming from.


    In my case, I was coming from SH-4 and x86.
    SH-4 was B/W/L (likewise for M68K and i386 syntax in GAS).
    Though, differs from M68K and "i386" in various ways
    (eg, no "%" on registers, ...).
    Well, and 0x1234 vs $1234 or similar.
    Eg: "mov 0x1234, r10" vs "mov #$1234, %d4"
    But, seems even within GAS usage, this was inconsistent.
    Q/X: from x86 (though x86 also used DQ instead of X for some ops).


    At present, it seems like 'X' may have been a mistake (well, along with
    trying to use both sets of mnemonics and then trying to auto-detect the
    ASM style).

Though, there is still the problem that there is no good or fully
reliable way to tell which ASM syntax is in use (and, neither
annotates it, and since both evolved from variants of GAS ASM syntax, it
makes it harder).

    ...

    Well, and I guess one could try to argue the merits of, say:
    0x1234
    $1234
    1234H
    &H1234
    #0x1234
    #$1234
    16'h1234
    ...
    And, say:
    (R10, 16)
    16(R10)
    [R10+16]
    [R10,16]
    ...



    Otherwise:
    New PSU showed up, and is installed, and main PC is working again.


    Decided to test the new decimal packing schemes against the "bulk
    scavenged FP constants" test, results currently for this test;
    Binary16 hit rate : 63.7%
    Truncated to 32 bits: 66.9%
    Packing, 8b-A: 73.9%
    Packing, 8b-B: 62.5%
    Packing, 12b : 61.3%
    T32 + 8b-B + 12b: 77.2%
    T32 + 8b-A: 76.9%

    This is lower than my earlier estimates based on my smaller scale tests.

Where, as noted, unpacking patterns:
    Fp16: (15:14), (14) ? 6'h00 : 6'h3F, (13:0), 42'h0
    T32: (31:0), 32'h0
    8b-A: (31:4), (11:4), (11:4), (11:4), (11:4), (3:0)
    8b-B: 1'b0, (30) ? 5'h10 : 5'h0F, (29:4),
    (11:4), (11:4), (11:4), (11:8), (3:0)
    12: 1'b0, (30) ? 5'h10 : 5'h0F, (29:4),
    (15:4), (15:4), (15:12), (3:0)
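The Fp16 unpacking above can be sketched as follows (my reconstruction; normal values only, since sign-extending the exponent from 5 to 11 bits this way ignores subnormal/Inf/NaN encodings):

```python
import struct

def expand_fp16(bits16: int) -> int:
    # (15:14), (14) ? 6'h00 : 6'h3F, (13:0), 42'h0 --
    # keep sign + exponent MSB, sign-extend the exponent to 11 bits by
    # filling 6 bits from the inverted MSB, pad the mantissa with zeros.
    top2 = (bits16 >> 14) & 0x3
    fill6 = 0x00 if (bits16 >> 14) & 1 else 0x3F
    low14 = bits16 & 0x3FFF
    return (top2 << 62) | (fill6 << 56) | (low14 << 42)

def fp16_to_double(value: float) -> float:
    # Round-trip: binary16 bits -> expanded binary64 bits -> float.
    bits16 = struct.unpack('<H', struct.pack('<e', value))[0]
    return struct.unpack('<d', struct.pack('<Q', expand_fp16(bits16)))[0]

print(fp16_to_double(1.5))     # 1.5
print(fp16_to_double(-0.375))  # -0.375
```

This is the standard rebiasing trick (bias 15 to bias 1023), which is why any normal Binary16 value expands exactly.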


    The T32 + 8b-A case has nearly the same hit rate, but is cheaper (and,
    is also what I had already implemented experimentally).

While T32 + 8b-A + 12b could potentially give the highest hit rate, this combination would also be the most expensive. And, without the exponent trickery, the hit-rate for 12b will suck.


    But, as-is, would be exclusive to XG3 (XG1/XG2/RV being limited to the
    Fp16 case for FPU immediate forms).


    Still debatable if worth the costs (while it is improvement in hit rate,
    it is also a bit of a corner case).


    ...





    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@[email protected] to comp.arch on Sat Apr 25 18:00:22 2026
    From Newsgroup: comp.arch


    BGB <[email protected]> posted:

    On 4/24/2026 1:52 PM, MitchAlsup wrote:
    ------------------
    This is what I use. Except I have signed and unsigned integer
    arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
    floats.

    It likely depends on which "tradition" one is coming from.

    IBM 360, 1963.
    ------------------
    Well, and I guess one could try to argue the merits of, say:
    0x1234
    $1234
    1234H
    &H1234
    #0x1234
    #$1234
    16'h1234

    Use C notation when possible.
    ...
    And, say:
    (R10, 16)
    16(R10)
    [R10+16]
    [R10,16]
    ...

    The [] notations tell ASM that the instruction has to be a
    memory reference, the () notations do not.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@[email protected] to comp.arch on Sat Apr 25 13:39:25 2026
    From Newsgroup: comp.arch

    On 4/25/2026 1:00 PM, MitchAlsup wrote:

    BGB <[email protected]> posted:

    On 4/24/2026 1:52 PM, MitchAlsup wrote:
    ------------------
    This is what I use. Except I have signed and unsigned integer
    arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
    floats.

    It likely depends on which "tradition" one is coming from.

    IBM 360, 1963.

    OK.

    ------------------
    Well, and I guess one could try to argue the merits of, say:
    0x1234
    $1234
    1234H
    &H1234
    #0x1234
    #$1234
    16'h1234

    Use C notation when possible.

That is my preference (I usually use 0x1234 without any extra
adornment)...

    Except that the 6502/65C816 and M68K fans seem to really like using
    $1234 instead...

    Stylistically, I think the 6502 ASM notation was influenced by Motorola (though differs somewhat from M68K notation).

    Likewise, GAS's i386 syntax was likely influenced by the M68K ASM syntax.


    ...
    And, say:
    (R10, 16)
    16(R10)
    [R10+16]
    [R10,16]
    ...

    The [] notations tell ASM that the instruction has to be a
    memory reference, the () notations do not.


    Could be.
    I think () comes mainly from the PDP/VAX/M68K style lineage...
    Whereas [] were used more with Intel and ARM and similar.



    As noted, I ended up preferring (Rb, Disp) over Disp(Rb), but RISC-V's standard ASM syntax went the other way.

    Neither ended up using @Rb syntax though, which was used by Hitachi and
    Texas Instruments, but in a way differing in the specifics from how DEC
    had used it in PDP and VAX (or was some of this more due to AT&T, hard
    to tell?...).

    ...




    --- Synchronet 3.21f-Linux NewsLink 1.2