• Concertina II Instead

    From quadi@[email protected] to comp.arch on Sat Mar 7 00:15:09 2026
    From Newsgroup: comp.arch

    I had considered proceeding on to a CISC Concertina III. However, after starting to look at that project, I found that there was a lack of opcode space in one spot.
    Just the other day, though, it occurred to me that there was a possibility
    of improving Concertina II.
    After a long period of changing it, because I was dissatisfied with the various options for shortening the memory-reference instructions by one
    bit, I decided to leave the memory-reference instructions at their full length, and claim opcode space from somewhere less important: I replaced 14-bit short instructions by 13-bit short instructions.
    What occurred to me was that I could instead keep the full-length memory-reference instructions and have 14-bit short instructions, and fit
    everything else into space left by unused opcodes for 14-bit short instructions.
    In order to make that work, I had to alter the 14-bit short instructions a bit. I re-ordered the fields in the shift instructions, so that I could
    put the supervisor call instruction in with them, and the 14-bit branch instructions now only came with a three-bit field to select the condition
    they could test.
    That let me fit all the instructions in.
    The opcode space for block headers, however, was reduced. Which is not
    really that much of a bad thing; it means that now the block headers will
    be pared down (to just one!) and thus greatly simplified.

    John Savard
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From MitchAlsup@[email protected] to comp.arch on Sat Mar 7 19:00:02 2026
    From Newsgroup: comp.arch


    quadi <[email protected]d> posted:

    I had considered proceeding on to a CISC Concertina III. However, after starting to look at that project, I found that there was a lack of opcode space in one spot.
    Just the other day, though, it occurred to me that there was a possibility of improving Concertina II.
    After a long period of changing it, because I was dissatisfied with the various options for shortening the memory-reference instructions by one
    bit, I decided to leave the memory-reference instructions at their full length, and claim opcode space from somewhere less important: I replaced 14-bit short instructions by 13-bit short instructions.
    What occurred to me was that I could instead keep the full-length memory-reference instructions and have 14-bit short instructions, and fit everything else into space left by unused opcodes for 14-bit short instructions.
    In order to make that work, I had to alter the 14-bit short instructions a bit. I re-ordered the fields in the shift instructions, so that I could
    put the supervisor call instruction in with them, and the 14-bit branch instructions now only came with a three-bit field to select the condition they could test.

    I admire your effort.

    That let me fit all the instructions in.

    I smell danger--running out of OpCode space early in the design.
    After the general conceptualization of the ISA you should have
    half of the OpCode space available for future additions !!.

    The opcode space for block headers, however, was reduced. Which is not really that much of a bad thing; it means that now the block headers will
    be pared down (to just one!) and thus greatly simplified.

    John Savard
  • From quadi@[email protected] to comp.arch on Sun Mar 8 01:14:42 2026
    From Newsgroup: comp.arch

    On Sat, 07 Mar 2026 19:00:02 +0000, MitchAlsup wrote:

    I smell danger--running out of OpCode space early in the design.
    After the general conceptualization of the ISA you should have half of
    the OpCode space available for future additions !!.

    Well, while it is true that only 1/64th of the opcode space for 32-bit instructions is left, my current plan is to use 1/128th for headers, and
    the other 1/128th for instructions longer than 32 bits. Which means that
    there is still space for 511 times as many instructions as are already defined, even if I never went beyond 48 bits.
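
    As a rough check on the arithmetic above, here is a minimal sketch in Python. It rests on two assumptions that are not stated in the post: that the 1/128th reserved for longer instructions is a fraction of the 2**32 possible 32-bit patterns, and that a 48-bit instruction simply appends 16 free bits to one of those patterns.

```python
from fractions import Fraction

WORD_BITS = 32
full_32bit_space = 2 ** WORD_BITS

# Fractions of the 32-bit opcode space quoted in the post.
remaining = Fraction(1, 64)
headers = Fraction(1, 128)
long_instr = Fraction(1, 128)
assert headers + long_instr == remaining

# Assumption: each reserved 32-bit pattern can be followed by 16 free bits.
long_prefixes = int(long_instr * full_32bit_space)  # 2**25 leading patterns
encodings_48 = long_prefixes * 2 ** 16              # 2**41 distinct 48-bit encodings

ratio = Fraction(encodings_48, full_32bit_space)
print(ratio)  # prints 512
```

    Under these assumptions the 48-bit space alone is about 512 times the whole 32-bit pattern space, which is the same order as the factor of 511 quoted above. The exact figure depends on counting details not given in the post.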

    John Savard
  • From quadi@[email protected] to comp.arch on Sun Mar 8 06:08:14 2026
    From Newsgroup: comp.arch

    On Sun, 08 Mar 2026 01:14:42 +0000, quadi wrote:

    On Sat, 07 Mar 2026 19:00:02 +0000, MitchAlsup wrote:

    I smell danger--running out of OpCode space early in the design. After
    the general conceptualization of the ISA you should have half of the
    OpCode space available for future additions !!.

    Well, while it is true that only 1/64th of the opcode space for 32-bit instructions is left, my current plan is to use 1/128th for headers, and
    the other 1/128th for instructions longer than 32 bits. Which means that there is still space for 511 times as many instructions as are already defined, even if I never went beyond 48 bits.

    However, the lack of opcode space did cause me one problem. Previously, I
    had a type of header which started with the four bits 1111. This was
    followed by fourteen two-bit prefixes, which applied to every 16 bits remaining in the 256-bit code block.

    They indicated:

    00 - 17-bit instruction, starting with 0
    01 - 17-bit instruction, starting with 1
    10 - start of a 32-bit or longer instruction
    11 - not the start of an instruction, don't start decoding here.

    Now, with only 1/64th of the opcode space left, the block header starts
    with a minimum of six fixed bits.

    A 10 prefix has to be followed by a 11 prefix. I thought of perhaps coming
    up with a very elaborate coding scheme to take advantage of this to save
    the bits I needed.

    But I decided to go with a much simpler option instead. A bit of
    compressive coding is still needed, but now the scheme is simple. I just
    switched from 17-bit short instructions to 16-bit instructions for code
    with mixed-length instructions. In some respects, the limitations of
    16-bit instructions are complementary to those of 15-bit instructions, the
    ones that can occur in pairs within 32-bit instruction code, and so the two
    types can be mixed in a block to somewhat mitigate their limitations.

    Three bits can encode two prefixes with only three possibilities each; since
    the start of a 32-bit or longer instruction can only be followed by a
    16-bit extent not decoded, I only need seven values, not nine.
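
    The counting argument above can be sketched as follows. This is only an illustration: the state names, the numbering of the 3-bit codes, and the helper `encode_prefixes` are hypothetical, and the figure of 14 sixteen-bit slots is carried over from the earlier header description on the assumption that it still applies.

```python
from itertools import product

# Per-halfword states: short instruction, start of a 32-bit or longer
# instruction, or "not the start of an instruction".
SHORT, LONG_START, CONTINUATION = "S", "L", "C"
STATES = (SHORT, LONG_START, CONTINUATION)

def pair_is_allowed(first, second):
    # The one restriction named in the post: a long-instruction start
    # can only be followed by a halfword that is not decoded.
    return first != LONG_START or second == CONTINUATION

allowed_pairs = [p for p in product(STATES, repeat=2) if pair_is_allowed(*p)]

# Hypothetical numbering of the 3-bit codes; the real assignment is not given.
pair_code = {pair: code for code, pair in enumerate(allowed_pairs)}

def encode_prefixes(states):
    """Pack an even-length list of per-halfword states into 3-bit codes."""
    if len(states) % 2:
        raise ValueError("need an even number of halfword states")
    return [pair_code[(a, b)] for a, b in zip(states[0::2], states[1::2])]

SLOTS = 14                          # halfwords after a 32-bit header in a 256-bit block
old_bits = 4 + 2 * SLOTS            # 1111 plus fourteen 2-bit prefixes
new_bits = 6 + 3 * (SLOTS // 2)     # six fixed bits plus seven 3-bit pair codes

print(len(allowed_pairs), old_bits, new_bits)
```

    Nine pairs minus the two that begin with a long start and do not continue leaves seven, which fits in three bits. Note that this per-pair code does not by itself check a long start in the second position of a pair against the first state of the next pair; that would need a separate check.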

    So now I can combine 15-bit short instructions (rather than the
    drastically limited 14-bit short instructions) in plain 32-bit instruction
    code, without having some kind of awkward restriction on the
    memory-reference instructions... and everything fits.

    Instead of having sixteen types of headers, I have retreated to just three types of headers: one for a zero-overhead header, one for variable-length instructions, and one for the VLIW features. There is, however, space for more, and I am notoriously bad at resisting temptation.

    John Savard
  • From BGB@[email protected] to comp.arch on Sun Mar 8 05:13:18 2026
    From Newsgroup: comp.arch

    On 3/7/2026 1:00 PM, MitchAlsup wrote:

    quadi <[email protected]d> posted:

    I had considered proceeding on to a CISC Concertina III. However, after
    starting to look at that project, I found that there was a lack of opcode
    space in one spot.
    Just the other day, though, it occurred to me that there was a possibility of improving Concertina II.
    After a long period of changing it, because I was dissatisfied with the
    various options for shortening the memory-reference instructions by one
    bit, I decided to leave the memory-reference instructions at their full
    length, and claim opcode space from somewhere less important: I replaced
    14-bit short instructions by 13-bit short instructions.
    What occurred to me was that I could instead keep the full-length memory-
    reference instructions and have 14-bit short instructions, and fit
    everything else into space left by unused opcodes for 14-bit short
    instructions.
    In order to make that work, I had to alter the 14-bit short instructions a bit. I re-ordered the fields in the shift instructions, so that I could
    put the supervisor call instruction in with them, and the 14-bit branch
    instructions now only came with a three-bit field to select the condition
    they could test.

    I admire your effort.

    That let me fit all the instructions in.

    I smell danger--running out of OpCode space early in the design.
    After the general conceptualization of the ISA you should have
    half of the OpCode space available for future additions !!.


    In my case, I still have the F3 and F9 blocks unused.
    * F0, F1, F2, F8: In Use
    * F3, F9: Still Unused
    * FE, FF: Jumbo Prefixes

    N/E in XG3:
    F4..F7, F8..FD
    FE/FF: Remapped to FA/FB (with FA/FB becoming N/E).

    Where, otherwise (XG1/XG2):
    F4..F7: Repeat F0..F3, but with the "WEX" flag set.
    FC/FD: Repeat F8/F9, with WEX flag set.


    The EA/EB, and EE/EF blocks are effectively unused in XG3.
    XG1/XG2: Had encoded some PrWEX encodings.

    Where:
    E0..E3, E8/E9: Same as F0..F3, F8/F9, but ?T.
    E4..E7, EC/ED: Same as F0..F3, F8/F9, but ?F.

    Non-Ex/Fx:
    XG1: 16-bit ops go here.
    XG2: N/E, bits used to extend register fields to 6 bits.


    Had looked into using EA/EB and EE/EF for pair-encoded instructions in
    XG3, but the gains from the pair-encoded instructions wouldn't really be enough to justify the costs of having them. The constraints imposed by a
    pair encoding make the gains far less than with a 16/32 encoding scheme.


    The opcode space for block headers, however, was reduced. Which is not
    really that much of a bad thing; it means that now the block headers will
    be pared down (to just one!) and thus greatly simplified.

    John Savard

  • From quadi@[email protected] to comp.arch on Sun Mar 8 18:43:38 2026
    From Newsgroup: comp.arch

    I have now begun uploading the description of the revised Concertina II instruction set to my web site. The block headers, the 32-bit instruction formats, and the 16-bit and 15-bit instruction formats are now all present
    at

    http://www.quadibloc.com/arch/ct25int.htm

    John Savard
  • From MitchAlsup@[email protected] to comp.arch on Sun Mar 8 20:44:55 2026
    From Newsgroup: comp.arch


    quadi <[email protected]d> posted:

    On Sun, 08 Mar 2026 01:14:42 +0000, quadi wrote:

    On Sat, 07 Mar 2026 19:00:02 +0000, MitchAlsup wrote:

    I smell danger--running out of OpCode space early in the design. After
    the general conceptualization of the ISA you should have half of the
    OpCode space available for future additions !!.

    Well, while it is true that only 1/64th of the opcode space for 32-bit instructions is left, my current plan is to use 1/128th for headers, and the other 1/128th for instructions longer than 32 bits. Which means that there is still space for 511 times as many instructions as are already defined, even if I never went beyond 48 bits.

    However, the lack of opcode space did cause me one problem. Previously, I had a type of header which started with the four bits 1111. This was followed by fourteen two-bit prefixes, which applied to every 16 bits remaining in the 256-bit code block.

    They indicated:

    00 - 17-bit instruction, starting with 0
    01 - 17-bit instruction, starting with 1
    10 - begin instruction with 32-bit parcel
    11 - append another 32-bit instruction parcel.

    11 simply adds another 32-bits to the current instruction parcel.
    This gives access to {16, 32, 48, 64, 80, 96, ...}-bit instructions.

    This can be treeified rather easily for wide decode.


    John Savard
  • From BGB@[email protected] to comp.arch on Sun Mar 8 17:20:33 2026
    From Newsgroup: comp.arch

    On 3/8/2026 3:44 PM, MitchAlsup wrote:

    quadi <[email protected]d> posted:

    On Sun, 08 Mar 2026 01:14:42 +0000, quadi wrote:

    On Sat, 07 Mar 2026 19:00:02 +0000, MitchAlsup wrote:

    I smell danger--running out of OpCode space early in the design. After the general conceptualization of the ISA you should have half of the
    OpCode space available for future additions !!.

    Well, while it is true that only 1/64th of the opcode space for 32-bit
    instructions is left, my current plan is to use 1/128th for headers, and
    the other 1/128th for instructions longer than 32 bits. Which means that
    there is still space for 511 times as many instructions as are already
    defined, even if I never went beyond 48 bits.

    However, the lack of opcode space did cause me one problem. Previously, I
    had a type of header which started with the four bits 1111. This was
    followed by fourteen two-bit prefixes, which applied to every 16 bits
    remaining in the 256-bit code block.

    They indicated:

    00 - 17-bit instruction, starting with 0
    01 - 17-bit instruction, starting with 1
    10 - begin instruction with 32-bit parcel
    11 - append another 32-bit instruction parcel.

    11 simply adds another 32-bits to the current instruction parcel.
    This gives access to {16, 32, 48, 64, 80, 96, ...}-bit instructions.

    This can be treeified rather easily for wide decode.


    Hmm:
    xxx0: 16 bits
    xx01: 32 bits (final)
    xx11: 32 bits (non-final)

    But, still basically the same idea:
    16/32/48/64/80/...

    Unlike the RV schemes, scales to larger sizes without consuming an ever
    larger percentage of the encoding bits.

    The 16-bit space would only be 2/3 the size of the RV encoding space,
    but the encoding space could go a little further if used more
    efficiently (namely, not burning it on needlessly large immediate and displacement fields).
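
    One possible reading of the tagging scheme above, sketched in Python. The assumptions here are not confirmed by the post: that the tag sits in the low bits of the first 16-bit halfword of each parcel, and that a non-final 32-bit parcel may be followed by either a 16-bit or a 32-bit parcel. The function names are hypothetical.

```python
from fractions import Fraction

def parcel_info(halfword):
    """Return (size_in_bits, is_final) from the low bits of a parcel's first halfword."""
    if halfword & 0b1 == 0:        # xxx0: 16 bits
        return 16, True
    if halfword & 0b11 == 0b01:    # xx01: 32 bits (final)
        return 32, True
    return 32, False               # xx11: 32 bits (non-final)

def instruction_bits(halfwords, start=0):
    """Total length of the instruction beginning at halfwords[start]."""
    total = 0
    index = start
    while True:
        size, final = parcel_info(halfwords[index])
        total += size
        index += size // 16
        if final:
            return total

# Size of the 16-bit encoding space relative to RISC-V's compressed space,
# where 16-bit instructions use three of the four low-bit patterns.
this_scheme_16 = 2 ** 15           # one tag bit fixed at 0
rv_16 = 3 * 2 ** 14                # low two bits 00, 01, or 10
print(Fraction(this_scheme_16, rv_16))  # prints 2/3
```

    Under this reading, one non-final parcel followed by a 16-bit parcel gives 48 bits, two non-final parcels followed by a 16-bit parcel give 80 bits, and so on, so each added parcel costs only its own tag bits.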

    ...



    John Savard

  • From quadi@[email protected] to comp.arch on Mon Mar 9 02:53:20 2026
    From Newsgroup: comp.arch

    On Sun, 08 Mar 2026 06:08:14 +0000, quadi wrote:

    But I decided to go with a much simpler option instead. A bit of
    compressive coding is still needed, but now the scheme is simple. I just
    switched from 17-bit short instructions to 16-bit instructions for code
    with mixed-length instructions. In some respects, the limitations of
    16-bit instructions are complementary to those of 15-bit instructions, the
    ones that can occur in pairs within 32-bit instruction code, and so the
    two types can be mixed in a block to somewhat mitigate their
    limitations.

    Mixing these two types of short instructions in a single block is... an awkward and complicated workaround.

    I've decided to drop that capability, because doing so makes more opcode
    space available for 48-bit (and longer) instructions in the variable-
    length instruction blocks. I found that certain highly desirable classes
    of 48-bit instructions are made impossible otherwise.

    Basically, designing the basic instruction set so that memory-reference
    instructions are not compromised, and yet 15-bit short instructions
    (instead of the very cramped 14-bit short instructions) are available,
    has meant that everything else beyond the basic instruction set is hit
    with severe constraints.

    The most painful one was the loss of uncompromised 17-bit short
    instructions. But 16-bit short instructions still avoid the big problem of 15-bit short instructions that many here found objectionable.

    Why is the Concertina II instruction set so cramped for opcode space? I
    think I've answered that before. I'm trying to do what hasn't been
    attempted before - have an instruction set with 16-bit displacements,
    since that's what both CISC and RISC micros have, with addressing
    options as found in CISC, and with banks of 32 registers, as RISC designs
    have, instead of just eight (or maybe 16).

    It wouldn't be surprising if there wasn't room to include both what RISC
    had extra space for, and what CISC had extra space for. But what pleases
    me is that if one makes a little extra effort... such an instruction set
    *is* achievable.

    John Savard
  • From quadi@[email protected] to comp.arch on Mon Mar 9 03:29:20 2026
    From Newsgroup: comp.arch

    On Mon, 09 Mar 2026 02:53:20 +0000, quadi wrote:

    Mixing these two types of short instructions in a single block is... an awkward and complicated workaround.

    I've decided to drop that capability, because doing so makes more opcode space available for 48-bit (and longer) instructions in the variable-
    length instruction blocks. I found that certain highly desirable classes
    of 48-bit instructions are made impossible otherwise.

    In recent previous versions of Concertina II, it was the opcode space used
    for paired short instructions, whether 15-bit or the more recent 14-bit
    ones, that was used for long instructions. I had thought of not doing it
    this time because that space would be somewhat more fragmented, as I was
    using the last part of it for a major chunk of the 32-bit instruction set.

    But on further examination, it was clear that this objection was not due
    to any real issue, and the space would be needed desperately. Another incidental consequence is doubling the opcode space available for block headers, but that doesn't increase it enough to allow a return to 17-bit
    short instructions in variable-length instruction blocks.

    John Savard
  • From quadi@[email protected] to comp.arch on Tue Mar 10 01:00:19 2026
    From Newsgroup: comp.arch

    On Sun, 08 Mar 2026 18:43:38 +0000, quadi wrote:

    I have now begun uploading the description of the revised Concertina II instruction set to my web site. The block headers, the 32-bit
    instruction formats, and the 16-bit and 15-bit instruction formats are
    now all present at

    http://www.quadibloc.com/arch/ct25int.htm

    The instructions longer than 32 bits have also been uploaded. They follow
    what was included with the previous iteration; 15-bit short instructions
    are not available when 16-bit instructions are available, even though the 16-bit instructions aren't a strict superset of the 15-bit ones.

    This has also let me add a small group of additional 32-bit instructions
    when in a block where different instruction lengths can be freely mixed,
    one with a Type II header.

    John Savard
  • From quadi@[email protected] to comp.arch on Tue Mar 10 13:21:15 2026
    From Newsgroup: comp.arch

    Given that I was able to reduce the prefix for paired short instructions
    from 1111 to 11, allowing the paired short instructions to return to being
    15 bits long...

    since 14-bit short instructions are possible, then 11 could be the prefix
    for a single short instruction.

    Thus, the squish of opcode space that made this iteration of Concertina II possible _also_ makes a CISC instruction set possible. However, the short instructions and the instructions longer than 32 bits are _both_
    *severely* constrained in opcode space in the CISC mode.

    John Savard
  • From quadi@[email protected] to comp.arch on Tue Mar 10 15:15:36 2026
    From Newsgroup: comp.arch

    On Tue, 10 Mar 2026 13:21:15 +0000, quadi wrote:

    Thus, the squish of opcode space that made this iteration of Concertina
    II possible _also_ makes a CISC instruction set possible. However, the
    short instructions and the instructions longer than 32 bits are _both_ *severely* constrained in opcode space in the CISC mode.

    And thus I had to re-think the longer instructions in CISC mode, making a tweak to their definitions so that important functionality was not lost.

    John Savard
  • From BGB@[email protected] to comp.arch on Tue Mar 10 16:57:47 2026
    From Newsgroup: comp.arch

    On 3/10/2026 8:21 AM, quadi wrote:
    Given that I was able to reduce the prefix for paired short instructions
    from 1111 to 11, allowing the paired short instructions to return to being
    15 bits long...

    since 14-bit short instructions are possible, then 11 could be the prefix
    for a single short instruction.

    Thus, the squish of opcode space that made this iteration of Concertina II possible _also_ makes a CISC instruction set possible. However, the short instructions and the instructions longer than 32 bits are _both_
    *severely* constrained in opcode space in the CISC mode.


    FWIW:
    IME, while a pair encoding scheme can result in space savings over a pure
    32/64/96 coding scheme, while avoiding the misalignment issues of a
    16/32 coding scheme, a downside of a pair encoding is that the potential
    space savings are significantly reduced relative to a 16/32 scheme.

    Say, for example:
    An effective 16/32 scheme can get around a 20% space savings;
    An effective pair-encoding implicitly drops to around 8%.

    Mostly because it can only save space in cases when both instructions
    can be pair encoded, versus when either instruction could be 16-bit encoded.
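
    A toy model makes the gap above plausible. This is a sketch only, not measured data: it assumes each instruction can be shortened independently with probability p, that a 16/32 scheme saves half the size of every such instruction, and that a pair encoding saves space only when both instructions in an aligned pair qualify. The value p = 0.4 is picked purely because it reproduces the 20% figure.

```python
def savings_16_32(p):
    """Expected fractional saving versus pure 32-bit code for a 16/32 scheme."""
    # Each compressible instruction shrinks from 32 to 16 bits.
    return p / 2

def savings_pair(p):
    """Expected fractional saving when two neighbours must both be compressible."""
    # A 64-bit pair shrinks to 32 bits only if both instructions qualify.
    return p * p / 2

p = 0.4  # illustrative value, not taken from any benchmark
print(f"16/32: {savings_16_32(p):.0%}, pairs: {savings_pair(p):.0%}")
# prints "16/32: 20%, pairs: 8%"
```

    In this model the pair scheme's saving scales with the square of p, so it falls off quickly as fewer instructions qualify. Real code is not independent from one instruction to the next, and a compiler can reorder to form pairs, so actual numbers will differ.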

    Though, that said, pair encoding is an attractive option when the main
    other option is 32-bit only, and one already has some mechanism in place
    to deal with cracking an instruction.


    As noted before, had considered this within the context of my XG3
    encoding scheme, but ended up deciding against it because the savings
    seemed like they were too small to make it worthwhile.

    ...

  • From quadi@[email protected] to comp.arch on Wed Mar 11 02:04:52 2026
    From Newsgroup: comp.arch

    On Tue, 10 Mar 2026 16:57:47 -0500, BGB wrote:

    FWIW:
    IME, while a pair encoding scheme can result in space savings over a pure 32/64/96 coding scheme, while avoiding the misalignment issues of a
    16/32 coding scheme, a downside of a pair encoding is that the potential space savings are significantly reduced relative to a 16/32 scheme.

    Say, for example:
    An effective 16/32 scheme can get around a 20% space savings;
    An effective pair-encoding implicitly drops to around 8%.

    Mostly because it can only save space in cases when both instructions
    can be pair encoded, versus when either instruction could be 16-bit
    encoded.

    Though, that said, pair encoding is an attractive option when the main
    other option is 32-bit only, and one already has some mechanism in place
    to deal with cracking an instruction.

    I am aware of the issue you are raising here, and I certainly am aware
    that forcing the programmer to choose shorter instructions in pairs limits
    the potential space savings.

    So why did I use this mechanism?

    For one thing, I used it in order to simplify fetching and decoding
    instructions. If every instruction is 32 bits long, neither longer nor
    shorter, then it's very easy to fetch a block of memory, and decode all
    the instructions in it in parallel - because you already know where they
    begin and end.

    For another, look at the way I squeezed a short instruction - which I
    would prefer to be 17 bits long - into 15 bits. The register banks are
    divided into four groups, and the two operands must both be registers from
    the same group in a 15-bit instruction.
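
    The two bits saved by that restriction can be illustrated with a small sketch. The field layout and the function name `pack_short_operands` are hypothetical, and the sketch assumes a bank of 32 registers split into four contiguous groups of eight; the actual grouping and bit positions are not given here.

```python
REGS_PER_BANK = 32
GROUPS = 4
REGS_PER_GROUP = REGS_PER_BANK // GROUPS  # 8

def pack_short_operands(dst, src):
    """Pack two register numbers into 8 bits: 2-bit group, 3-bit dst, 3-bit src."""
    for reg in (dst, src):
        if not 0 <= reg < REGS_PER_BANK:
            raise ValueError(f"register {reg} out of range")
    group_dst, low_dst = divmod(dst, REGS_PER_GROUP)
    group_src, low_src = divmod(src, REGS_PER_GROUP)
    if group_dst != group_src:
        raise ValueError("both operands must come from the same group")
    return (group_dst << 6) | (low_dst << 3) | low_src

full_bits = 2 * 5             # two unrestricted 5-bit register fields
restricted_bits = 2 + 3 + 3   # shared group field plus two 3-bit fields
print(full_bits - restricted_bits)  # prints 2
```

    Sharing one group field between the operands accounts for exactly the difference between a 17-bit and a 15-bit short instruction, at the cost of rejecting operand pairs that straddle groups, such as registers 7 and 8 in this layout.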

    That shows something about the type of code I expect to execute. Code
    where instructions belonging to multiple threads (of a sort, not real
    threads that execute asynchronously) are interleaved, so as to make it
    easier to execute the instructions simultaneously in a pipeline.

    That gives a bit of flexibility in instruction ordering, so it makes it
    easier to pair up short instructions.

    And, as further evidence that I'm aware that having to use short
    instructions in pairs is a disadvantage... this, along with the desire to
    use pseudo-immediates (because I do accept Mitch Alsup's reasoning that getting data almost for free from the instruction stream beats an
    additional memory access, with all the overhead that entails) led me to
    set up the block header mechanism (which Mitch Alsup rightly criticizes; I just felt it was the least bad way to achieve what I wanted) so that
    fetching instructions to decode _remained_ as naively straightforward _as
    if_ all the instructions were the same length... even when the
    instructions were allowed to vary in length.

    And so with what are currently the Type I, II, and IV headers, the
    instruction stream consists of variable length instructions; short instructions can be placed individually at any position in the instruction stream.

    There's even a CISC mode now, since I've squeezed things so much that this
    ISA is capable, with slight tweaking, of just having plain variable-length
    instructions without blocks. But it's just barely capable of that; in that
    form, the short instructions only have 14 bits to play with, so the
    repertoire of those instructions is limited, and therefore the potential
    space savings they provide are smaller.

    Of course when block headers allow 17-bit instructions at arbitrary positions... that _would_ maximize space savings, but there's the overhead
    of the space the block header takes up. So any choice that is made
    involves tradeoffs.

    I also have a goal of making the ISA simple to implement, so in the CISC
    mode, instead of just saying "leave the register field all zeroes, and put
    the immediate right after the instruction", I have said that the
    pseudo-immediates aren't available in CISC mode. That avoids having to
    decode anything but the leading bits of the instruction in order to
    determine where the next instruction starts.

    It isn't the greatest variable-length instruction architecture; that capability is basically an afterthought appended to an architecture
    intended to be used with block headers.

    John Savard