• Paul Tarau versus Mr. Taskmanager, who would win? [A PDP-11 Homunculus from 1979]

    From Mild Shock@[email protected] to comp.lang.prolog on Fri Apr 24 02:43:26 2026
    From Newsgroup: comp.lang.prolog

    Hi,

    Ok, I was looking at this learning challenge:
    producing a vector (y1,y2,y3,y4) from a vector
    (x1,x2,x3,x4). Can the R system do it via least squares?

    | 0 0 0 1 | | x1 |   | x4 |
    | 0 0 1 0 | | x2 | = | x3 |
    | 0 1 0 0 | | x3 |   | x2 |
    | 1 0 0 0 | | x4 |   | x1 |
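
    A minimal sketch of that least-squares route
    (numpy here rather than R; the random data,
    names and sample count are my own assumptions):

    import numpy as np

    n = 4
    rng = np.random.default_rng(0)

    # Stack m random input vectors as rows of X; the
    # targets Y are the same rows reversed, y_k = x_{n+1-k}.
    m = 20
    X = rng.normal(size=(m, n))
    Y = X[:, ::-1]

    # Least squares solves X @ W = Y; the learned map
    # y = W.T @ x should recover the anti-diagonal matrix.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print(np.round(W.T))   # the 4x4 reversal permutation

    In R the same fit would be something like
    lm(Y ~ 0 + X), though I have not run that variant.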

    How it started:

    "multiplicative RNNs arises naturally from a
    proof-theoretic interpretation of next-token
    prediction as nested intuitionistic implication"
    Paul Tarau - 2026
    https://arxiv.org/abs/2601.19915

    How it's going:

    "Dave uses a PDP-11 to train a real Neural
    Network complete with Transformers and
    Attention so you can see them at their most basic."
    Mr. Taskmanager - 2026
    https://www.youtube.com/watch?v=OUE3FSIk46g

    We see Doctor Frankenstein in action from
    the Bronze Age of Computing, producing
    a Homunculus, the progenitor of today's
    Bulgakov Shuriks in the Hyperscale Age!

    Bye

    P.S.: My impression is that neither cuts to
    the core: this incredible transformer most
    likely produced this deterministic attention:

    | -1 | * | k | + | 5 | = | k' |

    Or differently expressed, y_k = x_{5-k}.
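
    That claim as a quick sketch (hard one-hot
    attention, 1-based positions; numpy and the
    toy vector are my assumptions):

    import numpy as np

    n = 4
    x = np.array([10.0, 20.0, 30.0, 40.0])

    # Deterministic attention: position k attends to
    # k' = -1*k + 5, i.e. a one-hot matrix with
    # A[k][5-k] = 1 in 1-based indexing.
    A = np.zeros((n, n))
    for k in range(1, n + 1):
        A[k - 1, (-1 * k + 5) - 1] = 1.0

    print(A @ x)   # [40. 30. 20. 10.], so y_k = x_{5-k}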

    How did the transformer do it? It produced
    a neural network with 1216 parameters, but
    didn't use embeddings or polar encoding
    of positions. But if we strip the noise
    from the position encoding, with the
    denoising done via softmax, we somehow
    must get the above, right? I still need to
    verify my claim! BTW: The PDP-11 assembly
    from 1979 uses a wider example, not with
    n=4 but with n=8.
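
    To illustrate the still unverified claim, a
    sketch of softmax as the denoiser: attention
    scores that merely peak at position 5-k, once
    pushed through a sharp softmax, come out as the
    near one-hot reversal above (peak size and
    temperature are my guesses, not taken from
    the video):

    import numpy as np

    n = 4
    rng = np.random.default_rng(1)

    # Noisy scores: a peak at the target position
    # 5-k (1-based), plus Gaussian noise everywhere.
    scores = np.zeros((n, n))
    for k in range(1, n + 1):
        scores[k - 1, (5 - k) - 1] = 5.0
    scores += rng.normal(scale=0.5, size=(n, n))

    # Row-wise softmax "denoises" the scores into
    # near one-hot attention weights.
    def softmax(z, temp=0.5):
        e = np.exp((z - z.max(axis=-1, keepdims=True)) / temp)
        return e / e.sum(axis=-1, keepdims=True)

    print(np.round(softmax(scores), 2))  # ~ the reversal matrix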
    --- Synchronet 3.21f-Linux NewsLink 1.2