• Re: Cookies in boxes - algorithmic challenge

    From Tim Rentsch@[email protected] to comp.lang.c on Thu Apr 9 16:37:30 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:

    On Wed, 1 Apr 2026 16:34:47 +0300
    Michael S <[email protected]> wrote:

    <snip>

    My implementation #1.
    It is based on the proof provided by somebody with abstract math
    attitude.
    He said that his proof is something that the 5th grade schoolchildren
    could find with in 5 minutes.
    I am no 5th pupil any longer. Even understanding of his proof took
    me significantly longer than 5 minutes :(
    Here is my attempt of translation (original proof was not in English):

    "Solution would be done by induction by number of cookies of each sort
    (the same for all sorts).
    For K=2 the solution is simple. Please figure it out yourself.
    Now let's take K > 2 and postulate that for numbers of cookies per box
    that are smaller than K the assertion already proven.
    Let's take some disposition for which the assertion is correct and
    remove the corresponding "thread" (i.e. one cookie from each box, all
    of different sorts). Then supposition of induction is correct for
    remaining disposition, which means that we can remove yet another
    thread, then yet another etc.. for a total of K threads. That is,
    for K>2 there are more than 2 threads.
    Now let's exchange places between any 2 two cookies from different
    boxes. Since by doing so we are touching 1 or 2 threads then there
    still be at least one thread untouched (at least k-2 threads, actually [M.S.]). It means that the property "we can remove a thread" is
    not disturbed by our action (i.e. by exchange of two cookies).
    What is left it is to realize that by means of such exchanges any
    disposition can be mutated from any other disposition.
    That's all."

    I am confident that my understanding of this description, if indeed
    it ever occurs, would take a good deal longer than five minutes.
    Sadly I think it is representative of many of the explanations
    offered by mathematical types.

    And here is a code. It is not very fast. But the speed is almost the
    same on average and in the worst case.

    #include <stdint.h>
    #include <stdbool.h>

    #include "solver.h"

    peek_t solver(boxes_t boxes)
    {
    uint8_t na[N_BOXES] = {0};
    typedef struct { uint8_t s,d; } xch_t;
    xch_t xch[N_BOXES*BOX_SIZE];
    int i = 0;
    for (int src_bi = 0; src_bi < N_BOXES; ++src_bi) {
    for (int src_ci = na[src_bi]; src_ci < BOX_SIZE; ++src_ci) {
    uint8_t dst_bi = boxes.b[src_bi][src_ci];
    while (dst_bi != src_bi) {
    uint8_t dst_ci = na[dst_bi];
    uint8_t dst_v = boxes.b[dst_bi][dst_ci];
    na[dst_bi] = dst_ci+1;
    xch[i++] = (xch_t){ .s=src_bi, .d=dst_bi};
    dst_bi = dst_v;
    }
    xch[i++] = (xch_t){ .s=src_bi, .d=dst_bi};
    }
    na[src_bi] = BOX_SIZE;
    }

    for (int bi = 0; bi < N_BOXES; ++bi)
    for (int ci = 0; ci < BOX_SIZE; ++ci)
    boxes.b[bi][ci] = bi;

    uint8_t tx[N_BOXES][BOX_SIZE];
    for (int bi = 0; bi < N_BOXES; ++bi)
    for (int ci = 0; ci < BOX_SIZE; ++ci)
    tx[bi][ci] = ci;

    peek_t threads[BOX_SIZE];
    for (int ci = 0; ci < BOX_SIZE; ++ci)
    for (int bi = 0; bi < N_BOXES; ++bi)
    threads[ci].b[bi] = ci;

    for (int i = N_BOXES*BOX_SIZE-1; i >= 0; --i) {
    uint8_t src_bi = xch[i].s;
    uint8_t dst_bi = xch[i].d;
    uint8_t dst_ci = na[dst_bi] - 1;
    na[dst_bi] = dst_ci;
    if (src_bi != dst_bi) {
    uint8_t src_ci = na[src_bi];
    uint8_t src_v = boxes.b[src_bi][src_ci];
    if (src_v != dst_bi) {
    uint8_t src_t = tx[src_bi][src_ci];
    uint8_t dst_t = tx[dst_bi][dst_ci];
    if (src_t != dst_t) {
    // 2 threads broken. Fix
    uint8_t v2bi[N_BOXES];
    for (int bi = 0; bi < N_BOXES; ++bi)
    v2bi[boxes.b[bi][threads[src_t].b[bi]]] = bi;
    boxes.b[src_bi][src_ci] = dst_bi;
    boxes.b[dst_bi][dst_ci] = src_v;
    for (uint8_t v = dst_bi; v != src_v;) {
    uint8_t bi = v2bi[v];
    uint8_t c0 = threads[src_t].b[bi];
    uint8_t c1 = threads[dst_t].b[bi];
    threads[src_t].b[bi] = c1;
    threads[dst_t].b[bi] = c0;
    tx[bi][c1] = src_t;
    tx[bi][c0] = dst_t;
    v = boxes.b[bi][c1];
    }
    } else {
    boxes.b[src_bi][src_ci] = dst_bi;
    boxes.b[dst_bi][dst_ci] = src_v;
    }
    }
    }
    }

    return threads[0];
    }

    In most cases I'm not a big fan of interactive debugging, but
    trying to understand how this works might merit an exception.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@[email protected] to comp.lang.c on Thu Apr 9 17:07:25 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:

    On Wed, 1 Apr 2026 16:34:47 +0300
    Michael S <[email protected]> wrote:

    <snip>

    My solution #2.
    That is my own, with no proof preceding code. It went in other direction
    - from hacking the code to proving it later.
    It is pretty fast. Can be made faster yet by replication of some code
    and re-arrangement in the find() routine, but for purposes of
    illustration I wanted it short.

    Just fyi.. I did some speed comparisons between the code below and
    the compatible solver that I wrote. Your code is faster, scales
    better, and also has better worst-case behavior (which is to say
    that my code has worse worst-case behavior). So I think the timing
    numbers I was seeing before may have been just the result of being
    lucky in the draw of random numbers.

    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>

    #include "solver.h"

    static
    uint8_t solver_find(
    const uint8_t b_idx[N_BOXES][N_BOXES],
    uint8_t result[N_BOXES],
    const uint8_t free[N_BOXES],
    int n_free,
    uint8_t v,
    const uint8_t bx[N_BOXES][BOX_SIZE])
    {
    typedef struct {
    uint8_t parent, bi;
    } db_rec;
    db_rec db[N_BOXES];
    bool valid[N_BOXES] = {0};
    uint8_t que[N_BOXES], *wr = que, *rd = que;

    valid[v] = true;
    *wr++ = v;
    for (;;) {
    uint8_t vv = *rd++;
    for (int i = 0; i < n_free; ++i) {
    uint8_t bi = free[i];
    uint8_t ci = b_idx[bi][vv];
    if (ci != BOX_SIZE) {
    result[bi] = ci;
    while (vv != v) {
    bi = db[vv].bi;
    vv = db[vv].parent;
    result[bi] = b_idx[bi][vv];
    };
    return i;
    }
    }
    // add neighbors to database
    for (int bi = 0; bi < N_BOXES; ++bi) {
    if (b_idx[bi][vv] != BOX_SIZE) {
    uint8_t a = bx[bi][result[bi]];
    if (!valid[a]) {
    valid[a] = true;
    db[a] = (db_rec){ .parent = vv, .bi = bi };
    *wr++ = a;
    }
    }
    }
    }
    }

    peek_t solver(boxes_t boxes)
    {
    // build index
    uint8_t b_idx[N_BOXES][N_BOXES];
    memset(b_idx, BOX_SIZE, sizeof(b_idx));
    for (int bi = 0; bi < N_BOXES; ++bi) {
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    uint8_t v = boxes.b[bi][ci];
    if (b_idx[bi][v] == BOX_SIZE)
    b_idx[bi][v] = ci;
    }
    }

    uint8_t free_boxes[N_BOXES];
    for (int bi = 0; bi < N_BOXES; ++bi)
    free_boxes[bi] = bi;
    int n_free = N_BOXES;

    peek_t result = {0};
    bool used_sorts[N_BOXES]={0};
    while (n_free > 0) {
    int bi = free_boxes[--n_free]; // new set
    result.b[bi] = 0;
    uint8_t v = boxes.b[bi][0];
    used_sorts[v] = true;
    uint8_t pend_que[N_BOXES],
    *pend_wr = pend_que, *pend_rd = pend_que;
    *pend_wr++ = bi;
    while (pend_wr != pend_rd && n_free != 0) {
    int bk = *pend_rd++;
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    v = boxes.b[bk][ci];
    if (!used_sorts[v]) {
    // new value in set
    uint8_t v_fi = solver_find(
    b_idx, result.b, free_boxes,
    n_free, v, boxes.b);
    uint8_t v_bi = free_boxes[v_fi];
    used_sorts[v] = true;
    *pend_wr++ = v_bi;
    free_boxes[v_fi] = free_boxes[--n_free];
    // remove from free list
    }
    }
    }
    }
    return result;
    }

    I think I could eventually understand what this code is doing, and
    after that get a sense of how and why it works. In its current
    form I think that would take me a fair amount of time. It would
    help to have a roadmap, more evocative names, more functional
    decomposition, or ideally all three. Please understand, I'm not
    asking or suggesting you do anything; my intention is only to
    provide feedback that may be of some benefit to you.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@[email protected] to comp.lang.c on Thu Apr 9 21:22:51 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:

    On Wed, 08 Apr 2026 08:42:11 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    Here I am again, ready now to write a more substantive response.

    So, here is the challenge:
    There are 20 sorts of cookies, 10 of each sort. They are placed in
    20 boxes, again, 10 cookies in each boxes. You have to take exactly
    one cookie from each box, all 20 of different sorts.
    Smart math heads proved that for any distribution a solution exists.
    I don't ask you to repeat the proof. Just peek cookies!

    I think the problem is interesting. I thought I would try coding
    something up.

    It's obvious that one can find the solution by exhausting. Don't do
    it. Indeed, the number of possible peek orders is finite, but ii is
    large - 2.4e18.
    Be smarter.
    On modern PC I want to get a solution in less than 1 msec, bonus for
    less than 50usec.

    [.. C code to drive an exercise a proposed solution ..]

    You have to implement function solver() in module solver.c

    I ran into trouble when I tried to work within the posted framework
    to drive a solver. The difficulties were of several kinds. The
    code didn't compile in my standard compilation environment, which
    is roughly this (the c99 might be c11 but that made no difference):

    gcc -x c -std=c99 -pedantic ...

    What were the problems?
    My gcc 14.2.0 on msys2 produces no warnings or errors with above flags.

    These source lines

    struct timespec t0;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    produced these diagnostics

    tb.c:134:21: error: storage size of 't0' isn't known
    tb.c:135:5: warning: implicit declaration of function 'clock_gettime' [...]
    tb.c:135:19: error: 'CLOCK_MONOTONIC' undeclared (first use in this function)

    under -std=c99. Apparently the first diagnostic, about
    struct timespec, is fixed under -std=c11. It wasn't hard
    to fix these, but it was irksome.

    I was able to work through that problem but it was irksome. The
    second difficulty is the interface to solver() seems rather clunky
    to me, maybe not for the input but for the output, and it wasn't
    clear to me what the specification is for the output (I did manage
    to figure this out but only much later). I was having to put more
    effort into figuring out the interface than solving the problem.

    Yes, I should have explained it, at least briefly.
    I'd guess that you expected that values in result vector correspond to
    sorts of cookies, when in fact they are indices to position of the
    cookie in the box.
    I am sorry.

    Yes, that is in fact what I was thinking. Fortunately the difficulty
    was resolved when the checker reported a value out of range.

    Finally I abandoned the idea of working within the structure of the
    posted driver code and wrote a solver to my own specification.
    Doing that was fairly straightforward and took half an hour or so
    (disclaimer: after I had spent a good amount of time just thinking
    about the problem). After writing the solver I reimplemented a
    driver framework to drive it, conforming to the interface I was
    using. Some debugging was needed to get everything working. I did
    some rudimentary time measurements for the solver. Results were
    good (more about the specifics later).

    After doing the new solver I went back to the posted driver code,
    and grafted together the new solver with the orginal driver. Some
    glue code was needed to reconcile the differences in interface
    between the new solver and the original driver. Then I needed to
    figure out the semantics of the output, which were different than
    what I expected and originally understood. I figured all that out
    and got things working. Sadly the code was uglier than I would have
    liked.

    After that, I talked to a friend to try a different approach to
    producing a solution. After some light editing by myself -- mostly
    just formatting and some name changes -- the code below was put into
    the hopper. (Note: I claim no credit for code below. I did some
    editing to make it more readable but beyond that none of it is a
    result of my efforts.)

    #include <stdint.h>
    #include "solver.h"

    static uint32_t adjacent[ N_BOXES ];
    static int cookie_of_box[ N_BOXES ];
    static int box_of_cookie[ N_BOXES ];
    static uint32_t seen;

    static int
    search( int b ){
    uint32_t m = adjacent[b];
    while( m ){
    uint32_t bit = m & -m;
    m ^= bit;
    int c = __builtin_ctz( bit );
    if( seen & 1u<<c ) continue;
    seen |= 1u<<c;
    if( box_of_cookie[c] == -1 || search( box_of_cookie[c]
    ) ){ box_of_cookie[c] = b;
    cookie_of_box[b] = c;
    return 1;
    }
    }
    return 0;
    }

    peek_t
    solver( boxes_t boxes ){
    peek_t res;

    for( int i = 0; i < N_BOXES; i++ ){
    uint32_t m = 0;
    for( int k = 0; k < BOX_SIZE; k++ ){
    m |= 1u << boxes.b[i][k];
    }
    adjacent[i] = m;
    cookie_of_box[i] = -1;
    box_of_cookie[i] = -1;
    }

    for( int i = 0; i < N_BOXES; i++ ){
    seen = 0;
    search( i );
    }

    for( int i = 0; i < N_BOXES; i++ ){
    int c = cookie_of_box[i];
    for( int k = 0; k < BOX_SIZE; k++ ){
    if( boxes.b[i][k] == c ){
    res.b[i] = k;
    break;
    }
    }
    }
    return res;
    }

    Despite being derived independently, this code uses the same sort of
    approach that I had used, except my code was recursive rather than
    iterative.

    Running the above code with default settings (128 11) produced this
    output

    o.k.
    med=0.003542 msec, min=0.003487 msec, max=0.003911 msec

    The timing results for my second code effort were similar, except
    that the value for max was about 0.3 msec.

    My defaults were chosen for much slower solutions.
    For fast solutions like yours or mine with default parameters an
    overhead of clock_gettime() call is too significant.
    In such case it's better to use rep1 of at least 10000.

    It turns out that running with a larger number of trials didn't
    change the results much. I think my averages got a bit worse and my
    worst case got a bit better, but as I recall the difference wasn't
    anything dramatic.

    Informal timing measurements on my original solver -- in particular
    using a different interface, and done simply by using 'time' in the
    shell to measure a million calls to the solver -- gave per-call
    times that were better by about a factor of five. I need to be
    clear that this solver does not conform to the original interface,
    and probably most of speedup is due to choosing an interface that
    gives an easier fit to the solver.

    It's not something I'd expect.
    I used "by value" type of interface both for input and for output in
    order to reduce a chance of buggy solutions to mess with the test bench.
    I fully expect that "by reference" type of interface could be somewhat faster. May be, 1.5x faster for very fast solutions. But I certainly
    don't expect a factor of five.

    I will try to explain below.

    Sorry not to have something better for you. It was a fair amount of
    work to produce the results above.

    I read your code briefly, but didn't try to understand it yet.

    I need to say again that the code I posted was not code that I wrote.
    The approach used is similar to code I did write, but the actual code
    is quite a bit different.

    I plugged it into my test bench and measured speed on very old home PC
    (Intel i5-3450). With big values of rep1 it was ~3.3 times faster
    than my original solution and ~1.9 times slower than my 2nd solution.
    So, speed-wise your solution is good.
    One thing that I don't particularly like after taking a glance - it
    does not appear to be thread-safe. But it is probably very easy to make
    it thread safe.

    Do you mean because it is using (and changing) global objects? Yes
    that is a drawback.

    Another thing that I don't particularly like - if we want N_BOXES > 32
    then it requires modifications; if we want N_BOXES > 64 then it
    requires more serious modifications.

    As it turns out my code handles N_BOXES up to 52. You may laugh
    when you see why.

    Now few words about my own solutions.
    1. Original solution:
    This solution is based on the proof that was given to me by somebody
    with abstract math mind.

    Yes, I posted a comment about that

    2. It is solution that I found independently, for which I hopefully
    have a proof in the head, but I didn't write it out yet and didn't show
    yet to said mathematician. It would not surprise me if idea of these solution is the same as yours.

    I don't really understand that code yet, but it looks like your
    code is a bit fancier than mine (the code that wasn't posted
    because it uses a different interface).

    Later today (tonight) I plan to post both solutions here.

    Here is an outline of the first (non-interface-compatible) code that
    I wrote.

    Use a straightforward backtracking scheme. Make a choice at the
    current level, and do a recursive call to try the next level. If
    the next level returns, it failed, so go on to the next choice at
    this level and call again.

    The output value is an array (passed recursively via pointer) with
    either a cookie type for each box or a box number for each cookie
    type (I don't remember which). This array is filled in as the
    recursive backtrack progresses.

    The whole search is started by doing a setjmp() before the call to
    the top level. If the search succeeds, a longjmp() is done to
    deliver an answer. That means that if the call to the top level
    returns then the search was a failure. The central recursive
    function has four parameters:

    an Answer *'out', to hold the solution
    a jmp_buf 'found', for a longjmp() when a full answer is found
    a counter/index to reflect search depth (starting 0, 20 means done)
    a "solution state" to direct the search (conceptually by value)

    The "solution state" is at the heart of the algorithm. It's a
    structure holding an array of 20 "cookie values", where a cookie
    value is represented by an unsigned 64-bit int. The bits are
    used as follows:

    low six bits: the cookie type
    next 52 bits: a "box mask", indicating at least one cookie of
    this type in the corresponding box (so up to 52
    boxes)
    upper bits: count of boxes (ie, number of ones in the box mask)

    The cookie type is important when storing a partial result in the
    answer, not at any other time.

    The box mask is important to know which boxes to try during the
    backtracking.

    The count of boxes is important to help guide the backtracking, in a
    way that we hope will get an answer faster.

    At each level of recursion, do the following:

    If the search depth is the number of boxes, longjmp()

    Next find the cookie of cookies in the solution state at or
    above this level that has the smallest number of boxes. Because
    the box count is held in the uppermost bits the 64-bit values can
    simply be directly compared, without any masking. Move that
    cookie to the array location corresponding to the current depth
    (and put the cookie that was there in the hole left by doing the
    move).

    For each box in the box mask of that cookie, try selecting that
    box for this cookie, which means:

    copy the state of cookies above this level to a new solution
    state, so the changed solution state described above can be
    left alone

    for each of those cookies, if the cookie has the "selected box"
    set in its box mask, clear the bit and reduce the box count
    by one. (again since the box count is held in high order bits
    this count can be reduced by just subtracting a scaled one
    value.)

    recurse to continue searching at the next level

    If we run out of boxes to check, just return; continue trying
    alternatives at the next level up, if there is one.

    I think that's it. I hope it makes sense.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Fri Apr 10 12:23:14 2026
    From Newsgroup: comp.lang.c

    On Thu, 09 Apr 2026 21:22:51 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 08 Apr 2026 08:42:11 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...]

    Here I am again, ready now to write a more substantive response.

    So, here is the challenge:
    There are 20 sorts of cookies, 10 of each sort. They are placed
    in 20 boxes, again, 10 cookies in each boxes. You have to take
    exactly one cookie from each box, all 20 of different sorts.
    Smart math heads proved that for any distribution a solution
    exists. I don't ask you to repeat the proof. Just peek cookies!

    I think the problem is interesting. I thought I would try coding
    something up.

    It's obvious that one can find the solution by exhausting. Don't
    do it. Indeed, the number of possible peek orders is finite, but
    ii is large - 2.4e18.
    Be smarter.
    On modern PC I want to get a solution in less than 1 msec, bonus
    for less than 50usec.

    [.. C code to drive an exercise a proposed solution ..]

    You have to implement function solver() in module solver.c

    I ran into trouble when I tried to work within the posted framework
    to drive a solver. The difficulties were of several kinds. The
    code didn't compile in my standard compilation environment, which
    is roughly this (the c99 might be c11 but that made no difference):

    gcc -x c -std=c99 -pedantic ...

    What were the problems?
    My gcc 14.2.0 on msys2 produces no warnings or errors with above
    flags.

    These source lines

    struct timespec t0;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    produced these diagnostics

    tb.c:134:21: error: storage size of 't0' isn't known
    tb.c:135:5: warning: implicit declaration of function
    'clock_gettime' [...] tb.c:135:19: error: 'CLOCK_MONOTONIC'
    undeclared (first use in this function)

    under -std=c99. Apparently the first diagnostic, about
    struct timespec, is fixed under -std=c11. It wasn't hard
    to fix these, but it was irksome.


    Sounds like under Linux and with -std=c99 flag function clock_gettime()
    and related structures and constants are not declared/defined by default
    in time.h.
    Man page suggests magic pixie dust:

    #define _XOPEN_SOURCE 600
    #include <time.h>
    #include <unistd.h>


    If you ask why I used clock_gettime() instead of C standard
    timespec_get() then the answer is that on Windows, eps. on older
    versions like Win7 and WS2008 (which I prefer over newer stuff for
    reasons unrelated to quality of C RTL), precision of timespec_get() is
    poor. clock_gettime(CLOCK_MONOTONIC,) is much better.

    I was able to work through that problem but it was irksome. The
    second difficulty is the interface to solver() seems rather clunky
    to me, maybe not for the input but for the output, and it wasn't
    clear to me what the specification is for the output (I did manage
    to figure this out but only much later). I was having to put more
    effort into figuring out the interface than solving the problem.

    Yes, I should have explained it, at least briefly.
    I'd guess that you expected that values in result vector correspond
    to sorts of cookies, when in fact they are indices to position of
    the cookie in the box.
    I am sorry.

    Yes, that is in fact what I was thinking. Fortunately the difficulty
    was resolved when the checker reported a value out of range.

    Finally I abandoned the idea of working within the structure of the
    posted driver code and wrote a solver to my own specification.
    Doing that was fairly straightforward and took half an hour or so
    (disclaimer: after I had spent a good amount of time just thinking
    about the problem). After writing the solver I reimplemented a
    driver framework to drive it, conforming to the interface I was
    using. Some debugging was needed to get everything working. I did
    some rudimentary time measurements for the solver. Results were
    good (more about the specifics later).

    After doing the new solver I went back to the posted driver code,
    and grafted together the new solver with the orginal driver. Some
    glue code was needed to reconcile the differences in interface
    between the new solver and the original driver. Then I needed to
    figure out the semantics of the output, which were different than
    what I expected and originally understood. I figured all that out
    and got things working. Sadly the code was uglier than I would
    have liked.

    After that, I talked to a friend to try a different approach to
    producing a solution. After some light editing by myself -- mostly
    just formatting and some name changes -- the code below was put
    into the hopper. (Note: I claim no credit for code below. I did
    some editing to make it more readable but beyond that none of it
    is a result of my efforts.)

    #include <stdint.h>
    #include "solver.h"

    static uint32_t adjacent[ N_BOXES ];
    static int cookie_of_box[ N_BOXES ];
    static int box_of_cookie[ N_BOXES ];
    static uint32_t seen;

    static int
    search( int b ){
    uint32_t m = adjacent[b];
    while( m ){
    uint32_t bit = m & -m;
    m ^= bit;
    int c = __builtin_ctz( bit );
    if( seen & 1u<<c ) continue;
    seen |= 1u<<c;
    if( box_of_cookie[c] == -1 || search(
    box_of_cookie[c] ) ){ box_of_cookie[c] = b;
    cookie_of_box[b] = c;
    return 1;
    }
    }
    return 0;
    }

    peek_t
    solver( boxes_t boxes ){
    peek_t res;

    for( int i = 0; i < N_BOXES; i++ ){
    uint32_t m = 0;
    for( int k = 0; k < BOX_SIZE; k++ ){
    m |= 1u << boxes.b[i][k];
    }
    adjacent[i] = m;
    cookie_of_box[i] = -1;
    box_of_cookie[i] = -1;
    }

    for( int i = 0; i < N_BOXES; i++ ){
    seen = 0;
    search( i );
    }

    for( int i = 0; i < N_BOXES; i++ ){
    int c = cookie_of_box[i];
    for( int k = 0; k < BOX_SIZE; k++ ){
    if( boxes.b[i][k] == c ){
    res.b[i] = k;
    break;
    }
    }
    }
    return res;
    }

    Despite being derived independently, this code uses the same sort
    of approach that I had used, except my code was recursive rather
    than iterative.

    Running the above code with default settings (128 11) produced this
    output

    o.k.
    med=0.003542 msec, min=0.003487 msec, max=0.003911 msec

    The timing results for my second code effort were similar, except
    that the value for max was about 0.3 msec.

    My defaults were chosen for much slower solutions.
    For fast solutions like yours or mine with default parameters an
    overhead of clock_gettime() call is too significant.
    In such case it's better to use rep1 of at least 10000.

    It turns out that running with a larger number of trials didn't
    change the results much. I think my averages got a bit worse and my
    worst case got a bit better, but as I recall the difference wasn't
    anything dramatic.


    After testing on another (Windows) systems I have to admit that my
    objections about default value of rep1 applies only to Win7. Even on
    Ws2008, which is almost the same kernel, overhead of clock_gettime() is sufficiently low for rep1=128 to give fair measurements.

    Informal timing measurements on my original solver -- in particular
    using a different interface, and done simply by using 'time' in the
    shell to measure a million calls to the solver -- gave per-call
    times that were better by about a factor of five. I need to be
    clear that this solver does not conform to the original interface,
    and probably most of speedup is due to choosing an interface that
    gives an easier fit to the solver.

    It's not something I'd expect.
    I used "by value" type of interface both for input and for output in
    order to reduce a chance of buggy solutions to mess with the test
    bench. I fully expect that "by reference" type of interface could
    be somewhat faster. May be, 1.5x faster for very fast solutions.
    But I certainly don't expect a factor of five.

    I will try to explain below.

    Sorry not to have something better for you. It was a fair amount
    of work to produce the results above.

    I read your code briefly, but didn't try to understand it yet.

    I need to say again that the code I posted was not code that I wrote.
    The approach used is similar to code I did write, but the actual code
    is quite a bit different.

    I plugged it into my test bench and measured speed on very old home
    PC (Intel i5-3450). With big values of rep1 it was ~3.3 times
    faster than my original solution and ~1.9 times slower than my 2nd solution. So, speed-wise your solution is good.
    One thing that I don't particularly like after taking a glance - it
    does not appear to be thread-safe. But it is probably very easy to
    make it thread safe.

    Do you mean because it is using (and changing) global objects? Yes
    that is a drawback.


    Yes.

    Another thing that I don't particularly like - if we want N_BOXES >
    32 then it requires modifications; if we want N_BOXES > 64 then it requires more serious modifications.

    As it turns out my code handles N_BOXES up to 52. You may laugh
    when you see why.

    Now few words about my own solutions.
    1. Original solution:
    This solution is based on the proof that was given to me by somebody
    with abstract math mind.

    Yes, I posted a comment about that


    <snip the rest, will respond later >

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Fri Apr 10 16:16:38 2026
    From Newsgroup: comp.lang.c

    On Thu, 09 Apr 2026 21:22:51 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 08 Apr 2026 08:42:11 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:


    <snip, answered in the other post>

    As it turns out my code handles N_BOXES up to 52. You may laugh
    when you see why.


    The number of small+capital letters in English alphabet?
    Probably, not.
    Indeed, after reading the rest, it's not that.

    Now few words about my own solutions.
    1. Original solution:
    This solution is based on the proof that was given to me by somebody
    with abstract math mind.

    Yes, I posted a comment about that

    2. It is solution that I found independently, for which I hopefully
    have a proof in the head, but I didn't write it out yet and didn't
    show yet to said mathematician. It would not surprise me if idea
    of these solution is the same as yours.

    I don't really understand that code yet, but it looks like your
    code is a bit fancier than mine (the code that wasn't posted
    because it uses a different interface).

    Later today (tonight) I plan to post both solutions here.

    Here is an outline of the first (non-interface-compatible) code that
    I wrote.

    Use a straightforward backtracking scheme. Make a choice at the
    current level, and do a recursive call to try the next level. If
    the next level returns, it failed, so go on to the next choice at
    this level and call again.

    The output value is an array (passed recursively via pointer) with
    either a cookie type for each box or a box number for each cookie
    type (I don't remember which). This array is filled in as the
    recursive backtrack progresses.

    The whole search is started by doing a setjmp() before the call to
    the top level. If the search succeeds, a longjmp() is done to
    deliver an answer. That means that if the call to the top level
    returns then the search was a failure. The central recursive
    function has four parameters:

    an Answer *'out', to hold the solution
    a jmp_buf 'found', for a longjmp() when a full answer is found
    a counter/index to reflect search depth (starting 0, 20 means done)
    a "solution state" to direct the search (conceptually by value)

    The "solution state" is at the heart of the algorithm. It's a
    structure holding an array of 20 "cookie values", where a cookie
    value is represented by an unsigned 64-bit int. The bits are
    used as follows:

    low six bits: the cookie type
    next 52 bits: a "box mask", indicating at least one cookie of
    this type in the corresponding box (so up to 52
    boxes)
    upper bits: count of boxes (ie, number of ones in the box mask)

    The cookie type is important when storing a partial result in the
    answer, not at any other time.

    The box mask is important to know which boxes to try during the
    backtracking.

    The count of boxes is important to help guide the backtracking, in a
    way that we hope will get an answer faster.

    At each level of recursion, do the following:

    If the search depth is the number of boxes, longjmp()

    Next find the cookie of cookies in the solution state at or
    above this level that has the smallest number of boxes. Because
    the box count is held in the uppermost bits the 64-bit values can
    simply be directly compared, without any masking.

    Is it proven that this step helps?
    To me it sounds unnecessary.

    Move that
    cookie to the array location corresponding to the current depth
    (and put the cookie that was there in the hole left by doing the
    move).

    For each box in the box mask of that cookie, try selecting that
    box for this cookie, which means:

    copy the state of cookies above this level to a new solution
    state, so the changed solution state described above can be
    left alone

    for each of those cookies, if the cookie has the "selected box"
    set in its box mask, clear the bit and reduce the box count
    by one. (again since the box count is held in high order bits
    this count can be reduced by just subtracting a scaled one
    value.)

    recurse to continue searching at the next level

    If we run out of boxes to check, just return; continue trying
    alternatives at the next level up, if there is one.

    I think that's it. I hope it makes sense.

    It makes sense, but it looks to me as solution by exhausting, augmented
    by "smallest number of boxes" heuristic.
    It's not obvious [to me] without further thinking that the worst case is
    not very slow.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Fri Apr 10 17:02:58 2026
    From Newsgroup: comp.lang.c

    On Thu, 09 Apr 2026 16:37:30 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 1 Apr 2026 16:34:47 +0300
    Michael S <[email protected]> wrote:

    <snip>

    My implementation #1.
    It is based on the proof provided by somebody with abstract math
    attitude.
    He said that his proof is something that the 5th grade
    schoolchildren could find with in 5 minutes.
    I am no 5th pupil any longer. Even understanding of his proof took
    me significantly longer than 5 minutes :(
    Here is my attempt of translation (original proof was not in
    English):

    "Solution would be done by induction by number of cookies of each
    sort (the same for all sorts).
    For K=2 the solution is simple. Please figure it out yourself.
    Now let's take K > 2 and postulate that for numbers of cookies per
    box that are smaller than K the assertion already proven.
    Let's take some disposition for which the assertion is correct and
    remove the corresponding "thread" (i.e. one cookie from each box,
    all of different sorts). Then supposition of induction is correct
    for remaining disposition, which means that we can remove yet
    another thread, then yet another etc.. for a total of K threads.
    That is, for K>2 there are more than 2 threads.
    Now let's exchange places between any 2 two cookies from different
    boxes. Since by doing so we are touching 1 or 2 threads then there
    still be at least one thread untouched (at least k-2 threads,
    actually [M.S.]). It means that the property "we can remove a
    thread" is not disturbed by our action (i.e. by exchange of two
    cookies). What is left it is to realize that by means of such
    exchanges any disposition can be mutated from any other disposition.
    That's all."

    I am confident that my understanding of this description, if indeed
    it ever occurs, would take a good deal longer than five minutes.
    Sadly I think it is representative of many of the explanations
    offered by mathematical types.


    He was not fully sutisfied himself. 20 hours later he came with other formulation. Here is my attemp of translatioon:

    1. There exist a disposition of cookies in boxes, for which it is
    possible to select one cookie from each box, all of different sorts.
    That is obvious. [MS: For example, each sort of cookies in its own
    dedicated box. That is a disposition that I use in my code] Let's call
    the selection "thread".

    2. There exists an operation that changes the disposition, but
    preserves the property "it is possible to select a thread" (the
    description of the operation would be given separately, a.t.m. it does
    not matter).

    3. By means of multiple operations of that type it is possible to turn
    any arbitrary disposition to any other. Since after every step the
    property "it is possible to select a thread" is preserved and because
    there exist a disposition, for which this property is true, it means
    that the property is true for any possible disposition.

    4. The operations in question - exchange of any pair of cookies.
    Comment: Induction is used only in (2) for proof that an operation
    preserves the desired property.

    For me, the second variant was less helpful. May be, for you it would
    be more helpful.


    And here is a code. It is not very fast. But the speed is almost
    the same on average and in the worst case.

    #include <stdint.h>
    #include <stdbool.h>

    #include "solver.h"

    peek_t solver(boxes_t boxes)
    {
    uint8_t na[N_BOXES] = {0};
    typedef struct { uint8_t s,d; } xch_t;
    xch_t xch[N_BOXES*BOX_SIZE];
    int i = 0;
    for (int src_bi = 0; src_bi < N_BOXES; ++src_bi) {
    for (int src_ci = na[src_bi]; src_ci < BOX_SIZE; ++src_ci) {
    uint8_t dst_bi = boxes.b[src_bi][src_ci];
    while (dst_bi != src_bi) {
    uint8_t dst_ci = na[dst_bi];
    uint8_t dst_v = boxes.b[dst_bi][dst_ci];
    na[dst_bi] = dst_ci+1;
    xch[i++] = (xch_t){ .s=src_bi, .d=dst_bi};
    dst_bi = dst_v;
    }
    xch[i++] = (xch_t){ .s=src_bi, .d=dst_bi};
    }
    na[src_bi] = BOX_SIZE;
    }

    for (int bi = 0; bi < N_BOXES; ++bi)
    for (int ci = 0; ci < BOX_SIZE; ++ci)
    boxes.b[bi][ci] = bi;

    uint8_t tx[N_BOXES][BOX_SIZE];
    for (int bi = 0; bi < N_BOXES; ++bi)
    for (int ci = 0; ci < BOX_SIZE; ++ci)
    tx[bi][ci] = ci;

    peek_t threads[BOX_SIZE];
    for (int ci = 0; ci < BOX_SIZE; ++ci)
    for (int bi = 0; bi < N_BOXES; ++bi)
    threads[ci].b[bi] = ci;

    for (int i = N_BOXES*BOX_SIZE-1; i >= 0; --i) {
    uint8_t src_bi = xch[i].s;
    uint8_t dst_bi = xch[i].d;
    uint8_t dst_ci = na[dst_bi] - 1;
    na[dst_bi] = dst_ci;
    if (src_bi != dst_bi) {
    uint8_t src_ci = na[src_bi];
    uint8_t src_v = boxes.b[src_bi][src_ci];
    if (src_v != dst_bi) {
    uint8_t src_t = tx[src_bi][src_ci];
    uint8_t dst_t = tx[dst_bi][dst_ci];
    if (src_t != dst_t) {
    // 2 threads broken. Fix
    uint8_t v2bi[N_BOXES];
    for (int bi = 0; bi < N_BOXES; ++bi)
    v2bi[boxes.b[bi][threads[src_t].b[bi]]] = bi;
    boxes.b[src_bi][src_ci] = dst_bi;
    boxes.b[dst_bi][dst_ci] = src_v;
    for (uint8_t v = dst_bi; v != src_v;) {
    uint8_t bi = v2bi[v];
    uint8_t c0 = threads[src_t].b[bi];
    uint8_t c1 = threads[dst_t].b[bi];
    threads[src_t].b[bi] = c1;
    threads[dst_t].b[bi] = c0;
    tx[bi][c1] = src_t;
    tx[bi][c0] = dst_t;
    v = boxes.b[bi][c1];
    }
    } else {
    boxes.b[src_bi][src_ci] = dst_bi;
    boxes.b[dst_bi][dst_ci] = src_v;
    }
    }
    }
    }

    return threads[0];
    }

    In most cases I'm not a big fan of interactive debugging, but
    trying to understand how this works might merit an exception.

    Frankly, not all difficulties of following my code caused by the fact
    that algorithm is derived from abstract proof.
    Significant share of hurdles are rooted in my somewhat conflicting
    desires to avoid all searches, except for inevitable search in the
    innermost loop of the second pass, but at the same time to keep format
    of the governing database (xch array+na array) minimalistic.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Fri Apr 10 18:06:44 2026
    From Newsgroup: comp.lang.c

    On Thu, 09 Apr 2026 17:07:25 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 1 Apr 2026 16:34:47 +0300
    Michael S <[email protected]> wrote:

    <snip>

    My solution #2.
    That is my own, with no proof preceding code. It went in other
    direction
    - from hacking the code to proving it later.
    It is pretty fast. Can be made faster yet by replication of some
    code and re-arrangement in the find() routine, but for purposes of illustration I wanted it short.

    Just fyi.. I did some speed comparisons between the code below and
    the compatible solver that I wrote. Your code is faster, scales
    better, and also has better worst-case behavior (which is to say
    that my code has worse worst-case behavior). So I think the timing
    numbers I was seeing before may have been just the result of being
    lucky in the draw of random numbers.

    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>

    #include "solver.h"

    static
    uint8_t solver_find(
    const uint8_t b_idx[N_BOXES][N_BOXES],
    uint8_t result[N_BOXES],
    const uint8_t free[N_BOXES],
    int n_free,
    uint8_t v,
    const uint8_t bx[N_BOXES][BOX_SIZE])
    {
    typedef struct {
    uint8_t parent, bi;
    } db_rec;
    db_rec db[N_BOXES];
    bool valid[N_BOXES] = {0};
    uint8_t que[N_BOXES], *wr = que, *rd = que;

    valid[v] = true;
    *wr++ = v;
    for (;;) {
    uint8_t vv = *rd++;
    for (int i = 0; i < n_free; ++i) {
    uint8_t bi = free[i];
    uint8_t ci = b_idx[bi][vv];
    if (ci != BOX_SIZE) {
    result[bi] = ci;
    while (vv != v) {
    bi = db[vv].bi;
    vv = db[vv].parent;
    result[bi] = b_idx[bi][vv];
    };
    return i;
    }
    }
    // add neighbors to database
    for (int bi = 0; bi < N_BOXES; ++bi) {
    if (b_idx[bi][vv] != BOX_SIZE) {
    uint8_t a = bx[bi][result[bi]];
    if (!valid[a]) {
    valid[a] = true;
    db[a] = (db_rec){ .parent = vv, .bi = bi };
    *wr++ = a;
    }
    }
    }
    }
    }

    peek_t solver(boxes_t boxes)
    {
    // build index
    uint8_t b_idx[N_BOXES][N_BOXES];
    memset(b_idx, BOX_SIZE, sizeof(b_idx));
    for (int bi = 0; bi < N_BOXES; ++bi) {
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    uint8_t v = boxes.b[bi][ci];
    if (b_idx[bi][v] == BOX_SIZE)
    b_idx[bi][v] = ci;
    }
    }

    uint8_t free_boxes[N_BOXES];
    for (int bi = 0; bi < N_BOXES; ++bi)
    free_boxes[bi] = bi;
    int n_free = N_BOXES;

    peek_t result = {0};
    bool used_sorts[N_BOXES]={0};
    while (n_free > 0) {
    int bi = free_boxes[--n_free]; // new set
    result.b[bi] = 0;
    uint8_t v = boxes.b[bi][0];
    used_sorts[v] = true;
    uint8_t pend_que[N_BOXES],
    *pend_wr = pend_que, *pend_rd = pend_que;
    *pend_wr++ = bi;
    while (pend_wr != pend_rd && n_free != 0) {
    int bk = *pend_rd++;
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    v = boxes.b[bk][ci];
    if (!used_sorts[v]) {
    // new value in set
    uint8_t v_fi = solver_find(
    b_idx, result.b, free_boxes,
    n_free, v, boxes.b);
    uint8_t v_bi = free_boxes[v_fi];
    used_sorts[v] = true;
    *pend_wr++ = v_bi;
    free_boxes[v_fi] = free_boxes[--n_free];
    // remove from free list
    }
    }
    }
    }
    return result;
    }

    I think I could eventually understand what this code is doing, and
    after that get a sense of how and why it works. In its current
    form I think that would take me a fair amount of time. It would
    help to have a roadmap, more evocative names, more functional
    decomposition, or ideally all three. Please understand, I'm not
    asking or suggesting you do anything; my intention is only to
    provide feedback that may be of some benefit to you.

    I don't have a roadmap, but I have a variant of [approximately] the
    same algorithm implemented in the style that is closer to your
    preferences: implicit bookkeeping by recursion instead of explicit
    lists and back-trace pointers. I even borrowed the name "seen" from the
    code of your friend.
    Hope it helps:

    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>

    #include "solver.h"

    typedef struct {
    uint8_t result[N_BOXES];
    // result contains sorts of cookies per box
    uint8_t free_boxes[N_BOXES];
    int n_free;
    uint8_t used_boxes[N_BOXES];
    int n_used;
    bool seen[N_BOXES];
    uint8_t b_idx[N_BOXES][N_BOXES];
    } solver_state_t;

    static
    bool solver_find(solver_state_t* s, uint8_t v, bool first)
    {
    int n_free = s->n_free;
    for (int i = 0; i < n_free; ++i) {
    uint8_t bi = s->free_boxes[i];
    uint8_t ci = s->b_idx[bi][v];
    if (ci != BOX_SIZE) {
    s->result[bi] = v;
    s->free_boxes[i] = s->free_boxes[--n_free];
    // remove box from free list
    s->n_free = n_free;
    s->used_boxes[s->n_used++] = bi; // add box to used set
    return true;
    }
    }

    if (first) {
    memset(s->seen, 0, sizeof(s->seen));
    s->seen[v] = true;
    }

    int n_used = s->n_used;
    for (int i = 0; i < n_used; ++i) {
    int bi = s->used_boxes[i];
    if (s->b_idx[bi][v] != BOX_SIZE) {
    uint8_t a = s->result[bi];
    if (!s->seen[a]) {
    s->seen[a] = true;
    if (solver_find(s, a, false)) {
    s->result[bi] = v;
    return true;
    }
    }
    }
    }
    return false;
    }

    peek_t solver(boxes_t boxes)
    {
    solver_state_t s;
    // build index
    memset(s.b_idx, BOX_SIZE, sizeof(s.b_idx));
    for (int bi = 0; bi < N_BOXES; ++bi) {
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    uint8_t v = boxes.b[bi][ci];
    if (s.b_idx[bi][v] == BOX_SIZE)
    s.b_idx[bi][v] = ci;
    }
    }

    for (int bi = 0; bi < N_BOXES; ++bi)
    s.free_boxes[bi] = bi;
    s.n_free = N_BOXES;

    bool used_sorts[N_BOXES]={0};
    while (s.n_free > 0) {
    int bi = s.free_boxes[--s.n_free]; // new set
    uint8_t v = boxes.b[bi][0];
    s.result[bi] = v;
    used_sorts[v] = true;
    s.used_boxes[0] = bi;
    s.n_used = 1;
    int rd_i = 0; // used_boxes accessed as queue
    while (rd_i != s.n_used && s.n_free != 0) {
    int bk = s.used_boxes[rd_i++];
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    v = boxes.b[bk][ci];
    if (!used_sorts[v]) {
    // new value in set
    solver_find(&s, v, true);
    used_sorts[v] = true;
    }
    }
    }
    }

    // translate result from sorts to indices of selected cookies
    peek_t peek;
    for (int bi = 0; bi < N_BOXES; ++bi)
    peek.b[bi] = s.b_idx[bi][s.result[bi]];
    return peek;
    }

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@[email protected] to comp.lang.c on Fri Apr 10 22:41:08 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:
    Tim Rentsch <[email protected]> wrote:
    Michael S <[email protected]> writes:
    Tim Rentsch <[email protected]> wrote:
    [...]
    I ran into trouble when I tried to work within the posted framework
    to drive a solver. The difficulties were of several kinds. The
    code didn't compile in my standard compilation environment, which
    is roughly this (the c99 might be c11 but that made no difference):

    gcc -x c -std=c99 -pedantic ...

    What were the problems?
    My gcc 14.2.0 on msys2 produces no warnings or errors with above
    flags.

    These source lines

    struct timespec t0;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    produced these diagnostics

    tb.c:134:21: error: storage size of 't0' isn't known
    tb.c:135:5: warning: implicit declaration of function
    'clock_gettime' [...] tb.c:135:19: error: 'CLOCK_MONOTONIC'
    undeclared (first use in this function)

    under -std=c99. Apparently the first diagnostic, about
    struct timespec, is fixed under -std=c11. It wasn't hard
    to fix these, but it was irksome.

    Sounds like under Linux and with -std=c99 flag function clock_gettime()
    and related structures and constants are not declared/defined by default
    in time.h.

    Yes, gcc follows the ISO standard exactly, and does not define
    symbols like clock_gettime, because the C standard does not
    allow conforming impementations to do so.

    Man page suggests magic pixie dust:

    #define _XOPEN_SOURCE 600
    #include <time.h>
    #include <unistd.h>

    It is my practice not to mix ISO-conformant and non-ISO-conformant
    code in the same translation unit. It wasn't hard to find the
    necessary tweak for a separate source file used to get the
    current time:

    #define _POSIX_C_SOURCE 199309L
    #include <time.h>

    after which both the 'struct timespec' and 'clock_gettime()' were
    quite okay. I packaged the time in a way so it could get across
    the function call interface and back to the main program.

    If you ask why I used clock_gettime() instead of C standard
    timespec_get() then the answer is that on Windows, eps. on older
    versions like Win7 and WS2008 (which I prefer over newer stuff for
    reasons unrelated to quality of C RTL), precision of timespec_get() is
    poor. clock_gettime(CLOCK_MONOTONIC,) is much better.

    Makes sense.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@[email protected] to comp.lang.c on Sat Apr 11 03:45:21 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:

    On Thu, 09 Apr 2026 16:37:30 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 1 Apr 2026 16:34:47 +0300
    Michael S <[email protected]> wrote:

    <snip>

    My implementation #1.
    It is based on the proof provided by somebody with abstract math
    attitude.
    He said that his proof is something that the 5th grade
    schoolchildren could find with in 5 minutes.
    I am no 5th pupil any longer. Even understanding of his proof took
    me significantly longer than 5 minutes :(
    Here is my attempt of translation (original proof was not in
    English):

    "Solution would be done by induction by number of cookies of each
    sort (the same for all sorts).
    For K=2 the solution is simple. Please figure it out yourself.
    Now let's take K > 2 and postulate that for numbers of cookies per
    box that are smaller than K the assertion already proven.
    Let's take some disposition for which the assertion is correct and
    remove the corresponding "thread" (i.e. one cookie from each box,
    all of different sorts). Then supposition of induction is correct
    for remaining disposition, which means that we can remove yet
    another thread, then yet another etc.. for a total of K threads.
    That is, for K>2 there are more than 2 threads.
    Now let's exchange places between any 2 two cookies from different
    boxes. Since by doing so we are touching 1 or 2 threads then there
    still be at least one thread untouched (at least k-2 threads,
    actually [M.S.]). It means that the property "we can remove a
    thread" is not disturbed by our action (i.e. by exchange of two
    cookies). What is left it is to realize that by means of such
    exchanges any disposition can be mutated from any other disposition.
    That's all."

    I am confident that my understanding of this description, if indeed
    it ever occurs, would take a good deal longer than five minutes.
    Sadly I think it is representative of many of the explanations
    offered by mathematical types.

    He was not fully sutisfied himself. 20 hours later he came with other formulation. Here is my attemp of translatioon:

    1. There exist a disposition of cookies in boxes, for which it is
    possible to select one cookie from each box, all of different sorts.
    That is obvious. [MS: For example, each sort of cookies in its own dedicated box. That is a disposition that I use in my code] Let's call
    the selection "thread".

    2. There exists an operation that changes the disposition, but
    preserves the property "it is possible to select a thread" (the
    description of the operation would be given separately, a.t.m. it does
    not matter).

    3. By means of multiple operations of that type it is possible to turn
    any arbitrary disposition to any other. Since after every step the
    property "it is possible to select a thread" is preserved and because
    there exist a disposition, for which this property is true, it means
    that the property is true for any possible disposition.

    4. The operations in question - exchange of any pair of cookies.
    Comment: Induction is used only in (2) for proof that an operation
    preserves the desired property.

    For me, the second variant was less helpful. May be, for you it would
    be more helpful.

    I do at least understand it. It doesn't really help me construct
    a solution, because I don't see how to transition from one choice
    function (aka "thread") to another after exchanging a pair of
    cookies. (Nor do I want an explanation of how to do so.)

    [...code...]

    In most cases I'm not a big fan of interactive debugging, but
    trying to understand how this works might merit an exception.

    Frankly, not all difficulties of following my code caused by the fact
    that algorithm is derived from abstract proof.
    Significant share of hurdles are rooted in my somewhat conflicting
    desires to avoid all searches, except for inevitable search in the
    innermost loop of the second pass, but at the same time to keep format
    of the governing database (xch array+na array) minimalistic.

    For what it's worth, I think your code has a better chance of
    being understandable than the mathematical description of finding
    a solution. Not interested enough to pursue it though.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@[email protected] to comp.lang.c on Sat Apr 11 04:27:54 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:

    On Thu, 09 Apr 2026 21:22:51 -0700
    Tim Rentsch <[email protected]> wrote:

    (just a few short replies)

    [...]

    At each level of recursion, do the following:

    If the search depth is the number of boxes, longjmp()

    Next find the cookie of cookies in the solution state at or
    above this level that has the smallest number of boxes. Because
    the box count is held in the uppermost bits the 64-bit values can
    simply be directly compared, without any masking.

    Is it proven that this step helps?
    To me it sounds unnecessary.

    My experience with backtracking algorithms tells me that
    selecting a path where the number of possible choices is
    smallest usually does a better job of finding a solution
    quickly.

    [...]

    I think that's it. I hope it makes sense.

    It makes sense, but it looks to me as solution by exhausting, augmented
    by "smallest number of boxes" heuristic.
    It's not obvious [to me] without further thinking that the worst case is
    not very slow.

    Several short responses.

    I think "by exhaustion" is usually taken to mean an unguided
    exploration of all alternatives, even ones that aren't plausible.
    Backtracking isn't that.

    Backtracking is recognized as a viable search strategy in lots
    of different kinds of problems, including problems that are
    known to be NP complete. It has a good pedigree.

    My goals in writing the program were to write a program that is
    understandable, that does find solutions, and that has reasonable
    performance. Since the program did find answers and also met your
    stated performance goals, I have no reason to be dissatisfied.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@[email protected] to comp.lang.c on Sat Apr 11 04:31:32 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:

    [...]

    I don't have a roadmap, but I have a variant of [approximately] the
    same algorithm implemented in the style that is closer to your
    preferences: implicit bookkeeping by recursion instead of explicit
    lists and back-trace pointers. I even borrowed the name "seen" from the
    code of your friend.

    [..code..]

    I appreciate the response. It will take me some time to look
    at this, and so I'm not sure when I can get to it.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Sat Apr 11 22:56:37 2026
    From Newsgroup: comp.lang.c

    On Thu, 09 Apr 2026 21:22:51 -0700
    Tim Rentsch <[email protected]> wrote:


    Here is an outline of the first (non-interface-compatible) code that
    I wrote.

    Use a straightforward backtracking scheme. Make a choice at the
    current level, and do a recursive call to try the next level. If
    the next level returns, it failed, so go on to the next choice at
    this level and call again.

    The output value is an array (passed recursively via pointer) with
    either a cookie type for each box or a box number for each cookie
    type (I don't remember which). This array is filled in as the
    recursive backtrack progresses.

    The whole search is started by doing a setjmp() before the call to
    the top level. If the search succeeds, a longjmp() is done to
    deliver an answer. That means that if the call to the top level
    returns then the search was a failure. The central recursive
    function has four parameters:

    an Answer *'out', to hold the solution
    a jmp_buf 'found', for a longjmp() when a full answer is found
    a counter/index to reflect search depth (starting 0, 20 means done)
    a "solution state" to direct the search (conceptually by value)

    The "solution state" is at the heart of the algorithm. It's a
    structure holding an array of 20 "cookie values", where a cookie
    value is represented by an unsigned 64-bit int. The bits are
    used as follows:

    low six bits: the cookie type
    next 52 bits: a "box mask", indicating at least one cookie of
    this type in the corresponding box (so up to 52
    boxes)
    upper bits: count of boxes (ie, number of ones in the box mask)

    The cookie type is important when storing a partial result in the
    answer, not at any other time.

    The box mask is important to know which boxes to try during the
    backtracking.

    The count of boxes is important to help guide the backtracking, in a
    way that we hope will get an answer faster.

    At each level of recursion, do the following:

    If the search depth is the number of boxes, longjmp()

    Next find the cookie of cookies in the solution state at or
    above this level that has the smallest number of boxes. Because
    the box count is held in the uppermost bits the 64-bit values can
    simply be directly compared, without any masking. Move that
    cookie to the array location corresponding to the current depth
    (and put the cookie that was there in the hole left by doing the
    move).

    For each box in the box mask of that cookie, try selecting that
    box for this cookie, which means:

    copy the state of cookies above this level to a new solution
    state, so the changed solution state described above can be
    left alone

    for each of those cookies, if the cookie has the "selected box"
    set in its box mask, clear the bit and reduce the box count
    by one. (again since the box count is held in high order bits
    this count can be reduced by just subtracting a scaled one
    value.)

    recurse to continue searching at the next level

    If we run out of boxes to check, just return; continue trying
    alternatives at the next level up, if there is one.

    I think that's it. I hope it makes sense.

    I need clarification for the last step.
    What exactly do we do after our fist guess failed?
    Supposedly continue to the next sort of cookies that has the same # of
    boxes as the one we just tried.
    But what we do after that? Do we have to try cookies with higher number
    of boxes?
    If yes, does it apply in case when cookies that we tried before had
    just one box?

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@[email protected] to comp.lang.c on Sat Apr 11 15:57:23 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:

    On Thu, 09 Apr 2026 21:22:51 -0700
    Tim Rentsch <[email protected]> wrote:

    Here is an outline of the first (non-interface-compatible) code that
    I wrote.

    Use a straightforward backtracking scheme. Make a choice at the
    current level, and do a recursive call to try the next level. If
    the next level returns, it failed, so go on to the next choice at
    this level and call again.

    The output value is an array (passed recursively via pointer) with
    either a cookie type for each box or a box number for each cookie
    type (I don't remember which). This array is filled in as the
    recursive backtrack progresses.

    The whole search is started by doing a setjmp() before the call to
    the top level. If the search succeeds, a longjmp() is done to
    deliver an answer. That means that if the call to the top level
    returns then the search was a failure. The central recursive
    function has four parameters:

    an Answer *'out', to hold the solution
    a jmp_buf 'found', for a longjmp() when a full answer is found
    a counter/index to reflect search depth (starting 0, 20 means done)
    a "solution state" to direct the search (conceptually by value)

    The "solution state" is at the heart of the algorithm. It's a
    structure holding an array of 20 "cookie values", where a cookie
    value is represented by an unsigned 64-bit int. The bits are
    used as follows:

    low six bits: the cookie type
    next 52 bits: a "box mask", indicating at least one cookie of
    this type in the corresponding box (so up to 52
    boxes)
    upper bits: count of boxes (ie, number of ones in the box mask)

    The cookie type is important when storing a partial result in the
    answer, not at any other time.

    The box mask is important to know which boxes to try during the
    backtracking.

    The count of boxes is important to help guide the backtracking, in a
    way that we hope will get an answer faster.

    At each level of recursion, do the following:

    If the search depth is the number of boxes, longjmp()

    Next find the cookie of cookies in the solution state at or
    above this level that has the smallest number of boxes. Because
    the box count is held in the uppermost bits the 64-bit values can
    simply be directly compared, without any masking. Move that
    cookie to the array location corresponding to the current depth
    (and put the cookie that was there in the hole left by doing the
    move).

    For each box in the box mask of that cookie, try selecting that
    box for this cookie, which means:

    copy the state of cookies above this level to a new solution
    state, so the changed solution state described above can be
    left alone

    for each of those cookies, if the cookie has the "selected box"
    set in its box mask, clear the bit and reduce the box count
    by one. (again since the box count is held in high order bits
    this count can be reduced by just subtracting a scaled one
    value.)

    recurse to continue searching at the next level

    If we run out of boxes to check, just return; continue trying
    alternatives at the next level up, if there is one.

    I think that's it. I hope it makes sense.

    I need clarification for the last step.

    Here is a mixture of C code and pseudocode. A SolveState is an
    array, one element for each cookie type.


    Answer
    solve( SolveState *problem ){
    Answer answer;
    jmp_buf found;
    if( seek( &answer, found, problem ) ) return answer;

    printf( "No joy\n" );
    exit(0);
    }
    /* Note that seek() returns true if and only if a longjmp() was done. */


    _Bool
    seek( Answer *answer, jmp_buf found, SolveState *problem ){
    if( setjmp( found ) ) return true;
    return look( answer, found, 0, problem );
    }
    /* The call to look() after the if() starts the search going. */
    /* If an answer is found, a longjmp() will go back to the 'return true;' */


    _Bool
    look( Answer *answer, jmp_buf found, Index level, SolveState *in ){
    SolveState work;
    ...

    if( level == COOKIE_LIMIT ) longjmp( found, 1 );

    ... ignoring the "optimization" step of choosing the best cookie ...

    for each box B in in->cookies[level] {
    remember box B for the cookie in->cookies[level], in *answer
    copy in->cookies[ level+1 .. COOKIE_LEVEL ) to work
    remove B from each work.cookies[ level+1 .. COOKIE_LEVEL )
    look( out, found, level+1, &work );
    }

    return false;
    }
    /* Note that 'true' is never returned by this function. */
    /* If there are more boxes to try for the current cookie, recurse. */
    /* When there are no more boxes to try, return false. */
    /* If/when all of the cookies have been assigned, longjmp() */
    /* Doing a longjmp() will cause seek() to return true. */


    The notation ...cookies[ level+1 .. COOKIE_LEVEL ) is an implied
    for() loop.

    Now for your questions...

    What exactly do we do after our fist guess failed?

    If you mean the first box for the current cookie, go on to the
    next box for that cookie.

    If you mean all the boxes for the current cookie failed, simply
    return.

    If you mean all the boxes for the cookie on the first level, simply
    return, which will cause an outer call to fail irrevocably. Under
    the conditions of the problem, this never happens, if the code has
    been written correctly.

    Supposedly continue to the next sort of cookies that has the same # of
    boxes as the one we just tried.
    But what we do after that? Do we have to try cookies with higher number
    of boxes?

    No. At each level there is exactly one cookie to try, and we try
    each of the boxes that cookie might be taken from (at this point in
    the search); if none of those boxes work, backtrack -- which is to
    say, return to the next outer level.

    If yes, does it apply in case when cookies that we tried before had
    just one box?

    At each level, we are never concerned with the cookie choices that
    were made at previous levels (which means a smaller level index)
    as those choices are "frozen" for this part of the search.

    I hope these answers clear up what you're looking for.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Sun Apr 12 02:15:43 2026
    From Newsgroup: comp.lang.c

    On Sat, 11 Apr 2026 15:57:23 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Thu, 09 Apr 2026 21:22:51 -0700
    Tim Rentsch <[email protected]> wrote:

    Here is an outline of the first (non-interface-compatible) code
    that I wrote.

    Use a straightforward backtracking scheme. Make a choice at the
    current level, and do a recursive call to try the next level. If
    the next level returns, it failed, so go on to the next choice at
    this level and call again.

    The output value is an array (passed recursively via pointer) with
    either a cookie type for each box or a box number for each cookie
    type (I don't remember which). This array is filled in as the
    recursive backtrack progresses.

    The whole search is started by doing a setjmp() before the call to
    the top level. If the search succeeds, a longjmp() is done to
    deliver an answer. That means that if the call to the top level
    returns then the search was a failure. The central recursive
    function has four parameters:

    an Answer *'out', to hold the solution
    a jmp_buf 'found', for a longjmp() when a full answer is found
    a counter/index to reflect search depth (starting 0, 20 means
    done) a "solution state" to direct the search (conceptually by
    value)

    The "solution state" is at the heart of the algorithm. It's a
    structure holding an array of 20 "cookie values", where a cookie
    value is represented by an unsigned 64-bit int. The bits are
    used as follows:

    low six bits: the cookie type
    next 52 bits: a "box mask", indicating at least one cookie of
    this type in the corresponding box (so up to 52
    boxes)
    upper bits: count of boxes (ie, number of ones in the box mask)

    The cookie type is important when storing a partial result in the
    answer, not at any other time.

    The box mask is important to know which boxes to try during the
    backtracking.

    The count of boxes is important to help guide the backtracking, in
    a way that we hope will get an answer faster.

    At each level of recursion, do the following:

    If the search depth is the number of boxes, longjmp()

    Next find the cookie of cookies in the solution state at or
    above this level that has the smallest number of boxes. Because
    the box count is held in the uppermost bits the 64-bit values
    can simply be directly compared, without any masking. Move that
    cookie to the array location corresponding to the current depth
    (and put the cookie that was there in the hole left by doing the
    move).

    For each box in the box mask of that cookie, try selecting that
    box for this cookie, which means:

    copy the state of cookies above this level to a new solution
    state, so the changed solution state described above can be
    left alone

    for each of those cookies, if the cookie has the "selected
    box" set in its box mask, clear the bit and reduce the box count
    by one. (again since the box count is held in high order
    bits this count can be reduced by just subtracting a scaled one
    value.)

    recurse to continue searching at the next level

    If we run out of boxes to check, just return; continue trying
    alternatives at the next level up, if there is one.

    I think that's it. I hope it makes sense.

    I need clarification for the last step.

    Here is a mixture of C code and pseudocode. A SolveState is an
    array, one element for each cookie type.


    Answer
    solve( SolveState *problem ){
    Answer answer;
    jmp_buf found;
    if( seek( &answer, found, problem ) ) return answer;

    printf( "No joy\n" );
    exit(0);
    }
    /* Note that seek() returns true if and only if a longjmp() was done.
    */


    _Bool
    seek( Answer *answer, jmp_buf found, SolveState *problem ){
    if( setjmp( found ) ) return true;
    return look( answer, found, 0, problem );
    }
    /* The call to look() after the if() starts the search going. */
    /* If an answer is found, a longjmp() will go back to the 'return
    true;' */


    _Bool
    look( Answer *answer, jmp_buf found, Index level, SolveState *in ){
    SolveState work;
    ...

    if( level == COOKIE_LIMIT ) longjmp( found, 1 );

    ... ignoring the "optimization" step of choosing the best cookie
    ...

    for each box B in in->cookies[level] {
    remember box B for the cookie in->cookies[level], in *answer
    copy in->cookies[ level+1 .. COOKIE_LEVEL ) to work
    remove B from each work.cookies[ level+1 .. COOKIE_LEVEL )
    look( out, found, level+1, &work );
    }

    return false;
    }
    /* Note that 'true' is never returned by this function. */
    /* If there are more boxes to try for the current cookie, recurse. */
    /* When there are no more boxes to try, return false. */
    /* If/when all of the cookies have been assigned, longjmp() */
    /* Doing a longjmp() will cause seek() to return true. */


    The notation ...cookies[ level+1 .. COOKIE_LEVEL ) is an implied
    for() loop.

    Now for your questions...

    What exactly do we do after our fist guess failed?

    If you mean the first box for the current cookie, go on to the
    next box for that cookie.

    If you mean all the boxes for the current cookie failed, simply
    return.

    If you mean all the boxes for the cookie on the first level, simply
    return, which will cause an outer call to fail irrevocably. Under
    the conditions of the problem, this never happens, if the code has
    been written correctly.

    Supposedly continue to the next sort of cookies that has the same #
    of boxes as the one we just tried.
    But what we do after that? Do we have to try cookies with higher
    number of boxes?

    No. At each level there is exactly one cookie to try, and we try
    each of the boxes that cookie might be taken from (at this point in
    the search); if none of those boxes work, backtrack -- which is to
    say, return to the next outer level.

    If yes, does it apply in case when cookies that we tried before had
    just one box?

    At each level, we are never concerned with the cookie choices that
    were made at previous levels (which means a smaller level index)
    as those choices are "frozen" for this part of the search.

    I hope these answers clear up what you're looking for.

    It iterates differently from what I thought it does.
    I should digest a new information. Not tonight.




    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Mon Apr 13 02:03:28 2026
    From Newsgroup: comp.lang.c

    On Sat, 11 Apr 2026 15:57:23 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Thu, 09 Apr 2026 21:22:51 -0700
    Tim Rentsch <[email protected]> wrote:

    Here is an outline of the first (non-interface-compatible) code
    that I wrote.

    Use a straightforward backtracking scheme. Make a choice at the
    current level, and do a recursive call to try the next level. If
    the next level returns, it failed, so go on to the next choice at
    this level and call again.

    The output value is an array (passed recursively via pointer) with
    either a cookie type for each box or a box number for each cookie
    type (I don't remember which). This array is filled in as the
    recursive backtrack progresses.

    The whole search is started by doing a setjmp() before the call to
    the top level. If the search succeeds, a longjmp() is done to
    deliver an answer. That means that if the call to the top level
    returns then the search was a failure. The central recursive
    function has four parameters:

    an Answer *'out', to hold the solution
    a jmp_buf 'found', for a longjmp() when a full answer is found
    a counter/index to reflect search depth (starting 0, 20 means
    done) a "solution state" to direct the search (conceptually by
    value)

    The "solution state" is at the heart of the algorithm. It's a
    structure holding an array of 20 "cookie values", where a cookie
    value is represented by an unsigned 64-bit int. The bits are
    used as follows:

    low six bits: the cookie type
    next 52 bits: a "box mask", indicating at least one cookie of
    this type in the corresponding box (so up to 52
    boxes)
    upper bits: count of boxes (ie, number of ones in the box mask)

    The cookie type is important when storing a partial result in the
    answer, not at any other time.

    The box mask is important to know which boxes to try during the
    backtracking.

    The count of boxes is important to help guide the backtracking, in
    a way that we hope will get an answer faster.

    At each level of recursion, do the following:

    If the search depth is the number of boxes, longjmp()

    Next find the cookie of cookies in the solution state at or
    above this level that has the smallest number of boxes. Because
    the box count is held in the uppermost bits the 64-bit values
    can simply be directly compared, without any masking. Move that
    cookie to the array location corresponding to the current depth
    (and put the cookie that was there in the hole left by doing the
    move).

    For each box in the box mask of that cookie, try selecting that
    box for this cookie, which means:

    copy the state of cookies above this level to a new solution
    state, so the changed solution state described above can be
    left alone

    for each of those cookies, if the cookie has the "selected
    box" set in its box mask, clear the bit and reduce the box count
    by one. (again since the box count is held in high order
    bits this count can be reduced by just subtracting a scaled one
    value.)

    recurse to continue searching at the next level

    If we run out of boxes to check, just return; continue trying
    alternatives at the next level up, if there is one.

    I think that's it. I hope it makes sense.

    I need clarification for the last step.

    Here is a mixture of C code and pseudocode. A SolveState is an
    array, one element for each cookie type.


    Answer
    solve( SolveState *problem ){
    Answer answer;
    jmp_buf found;
    if( seek( &answer, found, problem ) ) return answer;

    printf( "No joy\n" );
    exit(0);
    }
    /* Note that seek() returns true if and only if a longjmp() was done.
    */


    _Bool
    seek( Answer *answer, jmp_buf found, SolveState *problem ){
    if( setjmp( found ) ) return true;
    return look( answer, found, 0, problem );
    }
    /* The call to look() after the if() starts the search going. */
    /* If an answer is found, a longjmp() will go back to the 'return
    true;' */


    _Bool
    look( Answer *answer, jmp_buf found, Index level, SolveState *in ){
    SolveState work;
    ...

    if( level == COOKIE_LIMIT ) longjmp( found, 1 );

    ... ignoring the "optimization" step of choosing the best cookie
    ...

    for each box B in in->cookies[level] {
    remember box B for the cookie in->cookies[level], in *answer
    copy in->cookies[ level+1 .. COOKIE_LEVEL ) to work
    remove B from each work.cookies[ level+1 .. COOKIE_LEVEL )
    look( out, found, level+1, &work );
    }

    return false;
    }
    /* Note that 'true' is never returned by this function. */
    /* If there are more boxes to try for the current cookie, recurse. */
    /* When there are no more boxes to try, return false. */
    /* If/when all of the cookies have been assigned, longjmp() */
    /* Doing a longjmp() will cause seek() to return true. */


    The notation ...cookies[ level+1 .. COOKIE_LEVEL ) is an implied
    for() loop.

    Now for your questions...

    What exactly do we do after our fist guess failed?

    If you mean the first box for the current cookie, go on to the
    next box for that cookie.

    If you mean all the boxes for the current cookie failed, simply
    return.

    If you mean all the boxes for the cookie on the first level, simply
    return, which will cause an outer call to fail irrevocably. Under
    the conditions of the problem, this never happens, if the code has
    been written correctly.

    Supposedly continue to the next sort of cookies that has the same #
    of boxes as the one we just tried.
    But what we do after that? Do we have to try cookies with higher
    number of boxes?

    No. At each level there is exactly one cookie to try, and we try
    each of the boxes that cookie might be taken from (at this point in
    the search); if none of those boxes work, backtrack -- which is to
    say, return to the next outer level.

    If yes, does it apply in case when cookies that we tried before had
    just one box?

    At each level, we are never concerned with the cookie choices that
    were made at previous levels (which means a smaller level index)
    as those choices are "frozen" for this part of the search.

    I hope these answers clear up what you're looking for.

    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>
    #include <limits.h>

    #include "solver.h"

    peek_t solver(boxes_t boxes)
    {
    uint64_t boxes_of_sorts[N_BOXES] = {0};
    uint64_t sorts_of_boxes[N_BOXES];
    uint8_t b_idx[N_BOXES][N_BOXES];
    // build indices
    for (int bi = 0; bi < N_BOXES; ++bi) {
    uint64_t sorts = 0;
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    uint8_t sort = boxes.b[bi][ci];
    sorts |= (uint64_t)1 << sort;
    b_idx[bi][sort] = ci;
    boxes_of_sorts[sort] |= (uint64_t)1 << bi;
    }
    sorts_of_boxes[bi] = sorts;
    }

    peek_t result={0};
    uint64_t free_boxes = (uint64_t)-1 >> (64-N_BOXES);
    uint64_t free_sorts = (uint64_t)-1 >> (64-N_BOXES);
    struct stack_node {
    uint64_t free_boxes, free_sorts, boxes;
    int sel_sort;
    } stack[N_BOXES];
    int level = 0;
    do {
    // find free sort that remains in the smallest # of free boxes
    uint64_t msk = free_sorts;
    int nb_min = INT_MAX;
    int sel_sort = N_BOXES;
    while (msk) {
    int sort = __builtin_ctzll(msk);
    uint64_t boxes = boxes_of_sorts[sort] & free_boxes;
    if (boxes) {
    int nb = __builtin_popcountll(boxes);
    if (nb_min > nb) {
    nb_min = nb;
    sel_sort = sort;
    if (nb == 1)
    break;
    }
    }
    msk ^= (uint64_t)1 << sort;
    }

    if (sel_sort == N_BOXES)
    goto pop;

    // candidate sort found
    uint64_t boxes = boxes_of_sorts[sel_sort] & free_boxes;
    back:;
    int bi = __builtin_ctzll(boxes);
    result.b[bi] = b_idx[bi][sel_sort];
    boxes ^= (uint64_t)1 << bi;
    // preserve state
    stack[level] = (struct stack_node){
    .free_boxes=free_boxes, .free_sorts=free_sorts,
    .boxes=boxes, .sel_sort = sel_sort};
    // go to the next level
    free_boxes ^= (uint64_t)1 << bi;
    free_sorts ^= (uint64_t)1 << sel_sort;
    ++level;
    continue;

    // return to previous level
    pop:
    while (level > 0) {
    --level;
    boxes = stack[level].boxes;
    if (boxes) {
    free_boxes = stack[level].free_boxes;
    free_sorts = stack[level].free_sorts;
    sel_sort = stack[level].sel_sort;
    goto back;
    }
    }
    // should never come here
    break;
    } while (level < N_BOXES);
    return result;
    }



    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Mon Apr 13 17:11:29 2026
    From Newsgroup: comp.lang.c

    On Sat, 11 Apr 2026 15:57:23 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Thu, 09 Apr 2026 21:22:51 -0700
    Tim Rentsch <[email protected]> wrote:

    Here is an outline of the first (non-interface-compatible) code
    that I wrote.

    Use a straightforward backtracking scheme. Make a choice at the
    current level, and do a recursive call to try the next level. If
    the next level returns, it failed, so go on to the next choice at
    this level and call again.

    The output value is an array (passed recursively via pointer) with
    either a cookie type for each box or a box number for each cookie
    type (I don't remember which). This array is filled in as the
    recursive backtrack progresses.

    The whole search is started by doing a setjmp() before the call to
    the top level. If the search succeeds, a longjmp() is done to
    deliver an answer. That means that if the call to the top level
    returns then the search was a failure. The central recursive
    function has four parameters:

    an Answer *'out', to hold the solution
    a jmp_buf 'found', for a longjmp() when a full answer is found
    a counter/index to reflect search depth (starting 0, 20 means
    done) a "solution state" to direct the search (conceptually by
    value)

    The "solution state" is at the heart of the algorithm. It's a
    structure holding an array of 20 "cookie values", where a cookie
    value is represented by an unsigned 64-bit int. The bits are
    used as follows:

    low six bits: the cookie type
    next 52 bits: a "box mask", indicating at least one cookie of
    this type in the corresponding box (so up to 52
    boxes)
    upper bits: count of boxes (ie, number of ones in the box mask)

    The cookie type is important when storing a partial result in the
    answer, not at any other time.

    The box mask is important to know which boxes to try during the
    backtracking.

    The count of boxes is important to help guide the backtracking, in
    a way that we hope will get an answer faster.

    At each level of recursion, do the following:

    If the search depth is the number of boxes, longjmp()

    Next find the cookie of cookies in the solution state at or
    above this level that has the smallest number of boxes. Because
    the box count is held in the uppermost bits the 64-bit values
    can simply be directly compared, without any masking. Move that
    cookie to the array location corresponding to the current depth
    (and put the cookie that was there in the hole left by doing the
    move).

    For each box in the box mask of that cookie, try selecting that
    box for this cookie, which means:

    copy the state of cookies above this level to a new solution
    state, so the changed solution state described above can be
    left alone

    for each of those cookies, if the cookie has the "selected
    box" set in its box mask, clear the bit and reduce the box count
    by one. (again since the box count is held in high order
    bits this count can be reduced by just subtracting a scaled one
    value.)

    recurse to continue searching at the next level

    If we run out of boxes to check, just return; continue trying
    alternatives at the next level up, if there is one.

    I think that's it. I hope it makes sense.

    I need clarification for the last step.

    Here is a mixture of C code and pseudocode. A SolveState is an
    array, one element for each cookie type.


    Answer
    solve( SolveState *problem ){
    Answer answer;
    jmp_buf found;
    if( seek( &answer, found, problem ) ) return answer;

    printf( "No joy\n" );
    exit(0);
    }
    /* Note that seek() returns true if and only if a longjmp() was done.
    */


    _Bool
    seek( Answer *answer, jmp_buf found, SolveState *problem ){
    if( setjmp( found ) ) return true;
    return look( answer, found, 0, problem );
    }
    /* The call to look() after the if() starts the search going. */
    /* If an answer is found, a longjmp() will go back to the 'return
    true;' */


    _Bool
    look( Answer *answer, jmp_buf found, Index level, SolveState *in ){
    SolveState work;
    ...

    if( level == COOKIE_LIMIT ) longjmp( found, 1 );

    ... ignoring the "optimization" step of choosing the best cookie
    ...

    for each box B in in->cookies[level] {
    remember box B for the cookie in->cookies[level], in *answer
    copy in->cookies[ level+1 .. COOKIE_LEVEL ) to work
    remove B from each work.cookies[ level+1 .. COOKIE_LEVEL )
    look( out, found, level+1, &work );
    }

    return false;
    }
    /* Note that 'true' is never returned by this function. */
    /* If there are more boxes to try for the current cookie, recurse. */
    /* When there are no more boxes to try, return false. */
    /* If/when all of the cookies have been assigned, longjmp() */
    /* Doing a longjmp() will cause seek() to return true. */


    The notation ...cookies[ level+1 .. COOKIE_LEVEL ) is an implied
    for() loop.

    Now for your questions...

    What exactly do we do after our fist guess failed?

    If you mean the first box for the current cookie, go on to the
    next box for that cookie.

    If you mean all the boxes for the current cookie failed, simply
    return.

    If you mean all the boxes for the cookie on the first level, simply
    return, which will cause an outer call to fail irrevocably. Under
    the conditions of the problem, this never happens, if the code has
    been written correctly.

    Supposedly continue to the next sort of cookies that has the same #
    of boxes as the one we just tried.
    But what we do after that? Do we have to try cookies with higher
    number of boxes?

    No. At each level there is exactly one cookie to try, and we try
    each of the boxes that cookie might be taken from (at this point in
    the search); if none of those boxes work, backtrack -- which is to
    say, return to the next outer level.

    If yes, does it apply in case when cookies that we tried before had
    just one box?

    At each level, we are never concerned with the cookie choices that
    were made at previous levels (which means a smaller level index)
    as those choices are "frozen" for this part of the search.

    I hope these answers clear up what you're looking for.


    Just now I paid attention that when posting last night I made a
    mistake in cut&past. It result in the post that contained the source
    code but missed text that explains what it's all about.
    So, second try.

    Could the code below be considered an implementation of your algorithm?
    It uses different programming technique:
    - iteration, goto and explicit stack instead of recursion
    - loop instead of longjmp
    - popcount instead of maintaining 'count of boxes' field
    - free_sorts bit mask instead of list of sorts of cookies indexed by
    levels (later on I realized that this modification was unnecessary)
    I coded it in such manner because as the next step I am planning to add instrumentation that would be easier in iterative code than in the
    recursive one.
    It seems to me that resulting search order is either exactly or at
    least approximately the same as in your description.


    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>
    #include <limits.h>

    #include "solver.h"

    peek_t solver(boxes_t boxes)
    {
    uint64_t boxes_of_sorts[N_BOXES] = {0};
    uint64_t sorts_of_boxes[N_BOXES];
    uint8_t b_idx[N_BOXES][N_BOXES];
    // build indices
    for (int bi = 0; bi < N_BOXES; ++bi) {
    uint64_t sorts = 0;
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    uint8_t sort = boxes.b[bi][ci];
    sorts |= (uint64_t)1 << sort;
    b_idx[bi][sort] = ci;
    boxes_of_sorts[sort] |= (uint64_t)1 << bi;
    }
    sorts_of_boxes[bi] = sorts;
    }

    peek_t result={0};
    uint64_t free_boxes = (uint64_t)-1 >> (64-N_BOXES);
    uint64_t free_sorts = (uint64_t)-1 >> (64-N_BOXES);
    struct stack_node {
    uint64_t free_boxes, free_sorts, boxes;
    int sel_sort;
    } stack[N_BOXES];
    int level = 0;
    do {
    // find free sort that remains in the smallest # of free boxes
    uint64_t msk = free_sorts;
    int nb_min = INT_MAX;
    int sel_sort = N_BOXES;
    while (msk) {
    int sort = __builtin_ctzll(msk);
    uint64_t boxes = boxes_of_sorts[sort] & free_boxes;
    if (boxes) {
    int nb = __builtin_popcountll(boxes);
    if (nb_min > nb) {
    nb_min = nb;
    sel_sort = sort;
    if (nb == 1)
    break;
    }
    }
    msk ^= (uint64_t)1 << sort;
    }

    if (sel_sort == N_BOXES)
    goto pop;

    // candidate sort found
    uint64_t boxes = boxes_of_sorts[sel_sort] & free_boxes;
    back:;
    int bi = __builtin_ctzll(boxes);
    result.b[bi] = b_idx[bi][sel_sort];
    boxes ^= (uint64_t)1 << bi;
    // preserve state
    stack[level] = (struct stack_node){
    .free_boxes=free_boxes, .free_sorts=free_sorts,
    .boxes=boxes, .sel_sort = sel_sort};
    // go to the next level
    free_boxes ^= (uint64_t)1 << bi;
    free_sorts ^= (uint64_t)1 << sel_sort;
    ++level;
    continue;

    // return to previous level
    pop:
    while (level > 0) {
    --level;
    boxes = stack[level].boxes;
    if (boxes) {
    free_boxes = stack[level].free_boxes;
    free_sorts = stack[level].free_sorts;
    sel_sort = stack[level].sel_sort;
    goto back;
    }
    }
    // should never come here
    break;
    } while (level < N_BOXES);
    return result;
    }













    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Wed Apr 15 02:15:26 2026
    From Newsgroup: comp.lang.c

    On Wed, 08 Apr 2026 08:42:11 -0700
    Tim Rentsch <[email protected]> wrote:


    After that, I talked to a friend to try a different approach to
    producing a solution. After some light editing by myself -- mostly
    just formatting and some name changes -- the code below was put into
    the hopper. (Note: I claim no credit for code below. I did some
    editing to make it more readable but beyond that none of it is a
    result of my efforts.)

    #include <stdint.h>
    #include "solver.h"

    static uint32_t adjacent[ N_BOXES ];
    static int cookie_of_box[ N_BOXES ];
    static int box_of_cookie[ N_BOXES ];
    static uint32_t seen;

    static int
    search( int b ){
    uint32_t m = adjacent[b];
    while( m ){
    uint32_t bit = m & -m;
    m ^= bit;
    int c = __builtin_ctz( bit );
    if( seen & 1u<<c ) continue;
    seen |= 1u<<c;
    if( box_of_cookie[c] == -1 || search( box_of_cookie[c]
    ) ){ box_of_cookie[c] = b;
    cookie_of_box[b] = c;
    return 1;
    }
    }
    return 0;
    }

    peek_t
    solver( boxes_t boxes ){
    peek_t res;

    for( int i = 0; i < N_BOXES; i++ ){
    uint32_t m = 0;
    for( int k = 0; k < BOX_SIZE; k++ ){
    m |= 1u << boxes.b[i][k];
    }
    adjacent[i] = m;
    cookie_of_box[i] = -1;
    box_of_cookie[i] = -1;
    }

    for( int i = 0; i < N_BOXES; i++ ){
    seen = 0;
    search( i );
    }

    for( int i = 0; i < N_BOXES; i++ ){
    int c = cookie_of_box[i];
    for( int k = 0; k < BOX_SIZE; k++ ){
    if( boxes.b[i][k] == c ){
    res.b[i] = k;
    break;
    }
    }
    }
    return res;
    }

    Despite being derived independently, this code uses the same sort of
    approach that I had used, except my code was recursive rather than
    iterative.



    I finally looked closer to the code of your friend.
    To me, it does not look at all similar to description of "your"
    algorithm in your other post.
    It appears more similar to my 2nd algorithm, but sort of orthogonal to
    it - my outermost loop iterate through sorts of cookies; in the code of
    your friend outermost loop iterate through boxes.

    Another difference, less important for robustness, more important for
    measured speed, esp. on avearge, is when search decides to revert
    previous selection. I am doing it only after all possibilities to
    select at the top level failed. Your friend makes no such effort and
    easily dives deep. In principle, both approaches are equivalent, but in physical world recursive call is significantly slower than iteration of
    simple loop and excessive updates of state variables are also slower
    then lookups without updates.

    And, of course, apart from difference in algorithm there is a
    significant difference in implementation technique. Your friend took
    advantage of the fact that the challenge was specified for relatively
    small number of boxes and intensely used bit masks. I [didn't took
    similar advantage [except for using byte variables instead of short or
    int] and used arrays. Of course, the technique applied by your friend
    is faster. His solution ended up ~2 times slower then mine not because
    of technique, but because of algorithmic difference mentioned in
    previous paragraph.

    Today I modified his code, applying variant of the same algorithmic
    change. At the same opportunity I removed use of static data and
    increased size of bit masks to 64 bits. As expected, resulting code was
    very fast, ~3 times faster than my own.

    Here is a code:
    // Modification of code of friend of Tim Rentsch
    #include <stdint.h>
    #include "solver.h"

    typedef struct {
    uint64_t cookies_in_box[ N_BOXES ]; // helper index, constant after
    init uint64_t selected;
    uint8_t box_of_cookie [ N_BOXES ]; // valid for selected sorts
    } state_t;

    // return 0 on success, seen mask on failure
    static uint64_t search(state_t* s, int b, uint64_t seen)
    {
    uint64_t m = s->cookies_in_box[b] & ~seen;
    if (m & ~s->selected) {
    int c = __builtin_ctzll(m & ~s->selected);
    s->box_of_cookie[c] = b;
    s->selected |= 1ull << c;
    return 0; // success
    }
    while( m ){
    int c = __builtin_ctz(m);
    seen = search(s, s->box_of_cookie[c], seen | (1ull << c));
    if (seen == 0) {
    s->box_of_cookie[c] = b;
    return 0; // success
    }
    m &= ~seen;
    }
    return seen;
    }

    peek_t
    solver( boxes_t boxes ){
    state_t s;
    for (int i = 0; i < N_BOXES; i++) {
    uint64_t m = 0;
    for (int k = 0; k < BOX_SIZE; k++)
    m |= 1ull << boxes.b[i][k];
    s.cookies_in_box[i] = m;
    }

    s.selected = 0;
    for( int i = 0; i < N_BOXES; i++ )
    search(&s, i, 0);

    // translate result from box_of_cookie[] to index of cookie in box
    peek_t res;
    for (int c = 0; c < N_BOXES; ++c) {
    int b = s.box_of_cookie[c];
    for( int k = 0; k < BOX_SIZE; k++ ){
    if (boxes.b[b][k] == c) {
    res.b[b] = k;
    break;
    }
    }
    }
    return res;
    }

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@[email protected] to comp.lang.c on Thu Apr 16 02:57:32 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:

    On Wed, 08 Apr 2026 08:42:11 -0700
    Tim Rentsch <[email protected]> wrote:

    After that, I talked to a friend to try a different approach to
    producing a solution. After some light editing by myself -- mostly
    just formatting and some name changes -- the code below was put into
    the hopper. (Note: I claim no credit for code below. I did some
    editing to make it more readable but beyond that none of it is a
    result of my efforts.)

    #include <stdint.h>
    #include "solver.h"

    static uint32_t adjacent[ N_BOXES ];
    static int cookie_of_box[ N_BOXES ];
    static int box_of_cookie[ N_BOXES ];
    static uint32_t seen;

    static int
    search( int b ){
    uint32_t m = adjacent[b];
    while( m ){
    uint32_t bit = m & -m;
    m ^= bit;
    int c = __builtin_ctz( bit );
    if( seen & 1u<<c ) continue;
    seen |= 1u<<c;
    if( box_of_cookie[c] == -1 || search( box_of_cookie[c]
    ) ){ box_of_cookie[c] = b;
    cookie_of_box[b] = c;
    return 1;
    }
    }
    return 0;
    }

    peek_t
    solver( boxes_t boxes ){
    peek_t res;

    for( int i = 0; i < N_BOXES; i++ ){
    uint32_t m = 0;
    for( int k = 0; k < BOX_SIZE; k++ ){
    m |= 1u << boxes.b[i][k];
    }
    adjacent[i] = m;
    cookie_of_box[i] = -1;
    box_of_cookie[i] = -1;
    }

    for( int i = 0; i < N_BOXES; i++ ){
    seen = 0;
    search( i );
    }

    for( int i = 0; i < N_BOXES; i++ ){
    int c = cookie_of_box[i];
    for( int k = 0; k < BOX_SIZE; k++ ){
    if( boxes.b[i][k] == c ){
    res.b[i] = k;
    break;
    }
    }
    }
    return res;
    }

    Despite being derived independently, this code uses the same sort of
    approach that I had used, except my code was recursive rather than
    iterative.

    I finally looked closer to the code of your friend.
    To me, it does not look at all similar to description of "your"
    algorithm in your other post.
    It appears more similar to my 2nd algorithm, but sort of orthogonal to
    it - my outermost loop iterate through sorts of cookies; in the code of
    your friend outermost loop iterate through boxes.

    I confess I didn't look closely at the algorithm but relied
    on comments about how the code worked. I tried to express
    this in my earlier posting when I said "the same sort of
    approach" rather than "the same approach". Sorry for any
    confusion.

    Another difference, less important for robustness, more important for measured speed, esp. on avearge, is when search decides to revert
    previous selection. I am doing it only after all possibilities to
    select at the top level failed. Your friend makes no such effort and
    easily dives deep. In principle, both approaches are equivalent, but
    in physical world recursive call is significantly slower than
    iteration of simple loop and excessive updates of state variables are
    also slower then lookups without updates.

    And, of course, apart from difference in algorithm there is a
    significant difference in implementation technique. Your friend took advantage of the fact that the challenge was specified for relatively
    small number of boxes and intensely used bit masks. I [didn't took
    similar advantage [except for using byte variables instead of short
    or int] and used arrays. Of course, the technique applied by your
    friend is faster. His solution ended up ~2 times slower then mine
    not because of technique, but because of algorithmic difference
    mentioned in previous paragraph.

    Today I modified his code, applying variant of the same algorithmic
    change. At the same opportunity I removed use of static data and
    increased size of bit masks to 64 bits. As expected, resulting code
    was very fast, ~3 times faster than my own. [...code...]

    I'm happy to hear the posting provided an opportunity to more deeply
    explore the problem. Also it's interesting to learn the results of
    your investigations; thank you for those.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@[email protected] to comp.lang.c on Thu Apr 16 03:22:00 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:

    [...]

    After seeing your more recent posting I decided to focus on
    that and let this one slide by.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@[email protected] to comp.lang.c on Thu Apr 16 23:52:14 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:

    On Sat, 11 Apr 2026 15:57:23 -0700
    Tim Rentsch <[email protected]> wrote:

    [..some earlier back and forth..]

    Just now I paid attention that when posting last night I made a
    mistake in cut&past. It result in the post that contained the
    source code but missed text that explains what it's all about.
    So, second try.

    Could the code below be considered an implementation of your
    algorithm?

    I'm not sure. I think it's close, but I'm having trouble being
    more definite than that.

    It uses different programming technique:
    - iteration, goto and explicit stack instead of recursion

    I got that.

    - loop instead of longjmp

    I got that too.

    - popcount instead of maintaining 'count of boxes' field

    I see that. Interesting choice.

    - free_sorts bit mask instead of list of sorts of cookies
    indexed by levels (later on I realized that this modification
    was unnecessary)

    I see that you did this but I was confused about how it worked.

    I coded it in such manner because as the next step I am planning
    to add instrumentation that would be easier in iterative code than
    in the recursive one.

    Yes, and surely there are other reasons to want an iterative version
    rather than a recursive one.

    It seems to me that resulting search order is either exactly or at
    least approximately the same as in your description.

    Exactly or at least approximately -- I'll go for that. :)

    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>
    #include <limits.h>

    #include "solver.h"

    peek_t solver(boxes_t boxes)
    {
    uint64_t boxes_of_sorts[N_BOXES] = {0};
    uint64_t sorts_of_boxes[N_BOXES];
    uint8_t b_idx[N_BOXES][N_BOXES];
    // build indices
    for (int bi = 0; bi < N_BOXES; ++bi) {
    uint64_t sorts = 0;
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    uint8_t sort = boxes.b[bi][ci];
    sorts |= (uint64_t)1 << sort;
    b_idx[bi][sort] = ci;
    boxes_of_sorts[sort] |= (uint64_t)1 << bi;
    }
    sorts_of_boxes[bi] = sorts;
    }

    peek_t result={0};
    uint64_t free_boxes = (uint64_t)-1 >> (64-N_BOXES);
    uint64_t free_sorts = (uint64_t)-1 >> (64-N_BOXES);
    struct stack_node {
    uint64_t free_boxes, free_sorts, boxes;
    int sel_sort;
    } stack[N_BOXES];
    int level = 0;
    do {
    // find free sort that remains in the smallest # of free boxes
    uint64_t msk = free_sorts;
    int nb_min = INT_MAX;
    int sel_sort = N_BOXES;
    while (msk) {
    int sort = __builtin_ctzll(msk);
    uint64_t boxes = boxes_of_sorts[sort] & free_boxes;
    if (boxes) {
    int nb = __builtin_popcountll(boxes);
    if (nb_min > nb) {
    nb_min = nb;
    sel_sort = sort;
    if (nb == 1)
    break;
    }
    }
    msk ^= (uint64_t)1 << sort;
    }

    if (sel_sort == N_BOXES)
    goto pop;

    // candidate sort found
    uint64_t boxes = boxes_of_sorts[sel_sort] & free_boxes;
    back:;
    int bi = __builtin_ctzll(boxes);
    result.b[bi] = b_idx[bi][sel_sort];
    boxes ^= (uint64_t)1 << bi;
    // preserve state
    stack[level] = (struct stack_node){
    .free_boxes=free_boxes, .free_sorts=free_sorts,
    .boxes=boxes, .sel_sort = sel_sort};
    // go to the next level
    free_boxes ^= (uint64_t)1 << bi;
    free_sorts ^= (uint64_t)1 << sel_sort;
    ++level;
    continue;

    // return to previous level
    pop:
    while (level > 0) {
    --level;
    boxes = stack[level].boxes;
    if (boxes) {
    free_boxes = stack[level].free_boxes;
    free_sorts = stack[level].free_sorts;
    sel_sort = stack[level].sel_sort;
    goto back;
    }
    }
    // should never come here
    break;
    } while (level < N_BOXES);
    return result;
    }

    I'm leaving the code in even though I don't have a lot to say
    about it. Trying to understand it I kept getting an internal
    "complexity threshold exceeded" exception. Finally I decided I
    would just try to rewrite it, preserving the spirit although not
    all of the details. My code ended up being more lines of source
    than yours although the core routine is shorter. There are three
    parts that I broke out into separate functions (not shown). Those
    parts are, one, a function 'bsets_from_boxes()' to turn a boxes_t
    into a collection of bitsets holding the boxes corresponding to
    each cookie type; two, a function 'best_next_ctype()' that
    chooses which cookie type to consider next; and three, a final
    function 'answer()' to turn the 'box_chosen[]' bitset array into
    the representation needed by the interface. Here is the central
    routine:


    peek_t
    solver( boxes_t box_contents ){
    Bset boxes_free[ COOKIE_LIMIT + 1 ];
    Ctype ctypes[ COOKIE_LIMIT ];
    Bset boxes_to_try[ COOKIE_LIMIT ];
    Bset box_chosen[ COOKIE_LIMIT ];

    Bsets bsets = bsets_from_boxes( &box_contents );
    Index depth = 0;
    Bset free = boxes_free[0] = ~((Bset){-1} << COOKIE_LIMIT-1 << 1);

    for( Index i = 0; i < LIMIT_OF( ctypes ); i++ ) ctypes[i] = i;

    do {
    Ctype ctype = best_next_ctype( depth, free, &bsets, ctypes );
    Bset chosen;

    boxes_to_try[depth] = bsets.bset_of_ctype[ ctype ] & free;

    while(
    chosen = boxes_to_try[depth],
    boxes_to_try[depth] ^= chosen ^= chosen & chosen-1,
    !chosen
    ){
    assert( depth > 0 ), --depth;
    free = boxes_free[ depth ];
    ctype = ctypes[ depth ];
    }

    box_chosen[ ctype ] = chosen;
    assert( free & chosen );
    boxes_free[ depth+1 ] = free = free ^ chosen;

    } while( ++depth < COOKIE_LIMIT );

    return answer( box_chosen, ctypes, &box_contents );
    }

    Two further comments.

    One, by keeping the box_chosen[] array as bit masks, the use of
    the __builtin bit function gets deferred and so is done only once
    for each output slot, at the end (in the 'answer()' function).

    Two, the function 'best_next_ctype()' manipulates the ctypes[]
    array as well as returning the cookie type at the appropriate
    depth in the array. I tried different policies for when to
    select the "best" cookie type, including "always", "never", and
    "only for the first N levels", with several choices of N.
    Different policies produced different timing values but there
    wasn't any obvious relation between them. Something that
    occurred to me only later is a policy "choose the best out of the
    next K cookie types", for different values of K.

    The main thing is I hope the above routine does a better job of
    conveying my thoughts and understandings.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Fri Apr 17 14:44:10 2026
    From Newsgroup: comp.lang.c

    On Thu, 16 Apr 2026 23:52:14 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Sat, 11 Apr 2026 15:57:23 -0700
    Tim Rentsch <[email protected]> wrote:

    [..some earlier back and forth..]

    Just now I paid attention that when posting last night I made a
    mistake in cut&past. It result in the post that contained the
    source code but missed text that explains what it's all about.
    So, second try.

    Could the code below be considered an implementation of your
    algorithm?

    I'm not sure. I think it's close, but I'm having trouble being
    more definite than that.

    It uses different programming technique:
    - iteration, goto and explicit stack instead of recursion

    I got that.

    - loop instead of longjmp

    I got that too.

    - popcount instead of maintaining 'count of boxes' field

    I see that. Interesting choice.

    - free_sorts bit mask instead of list of sorts of cookies
    indexed by levels (later on I realized that this modification
    was unnecessary)

    I see that you did this but I was confused about how it worked.

    I coded it in such manner because as the next step I am planning
    to add instrumentation that would be easier in iterative code than
    in the recursive one.

    Yes, and surely there are other reasons to want an iterative version
    rather than a recursive one.

    It seems to me that resulting search order is either exactly or at
    least approximately the same as in your description.

    Exactly or at least approximately -- I'll go for that. :)

    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>
    #include <limits.h>

    #include "solver.h"

    peek_t solver(boxes_t boxes)
    {
    uint64_t boxes_of_sorts[N_BOXES] = {0};
    uint64_t sorts_of_boxes[N_BOXES];
    uint8_t b_idx[N_BOXES][N_BOXES];
    // build indices
    for (int bi = 0; bi < N_BOXES; ++bi) {
    uint64_t sorts = 0;
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    uint8_t sort = boxes.b[bi][ci];
    sorts |= (uint64_t)1 << sort;
    b_idx[bi][sort] = ci;
    boxes_of_sorts[sort] |= (uint64_t)1 << bi;
    }
    sorts_of_boxes[bi] = sorts;
    }

    peek_t result={0};
    uint64_t free_boxes = (uint64_t)-1 >> (64-N_BOXES);
    uint64_t free_sorts = (uint64_t)-1 >> (64-N_BOXES);
    struct stack_node {
    uint64_t free_boxes, free_sorts, boxes;
    int sel_sort;
    } stack[N_BOXES];
    int level = 0;
    do {
    // find free sort that remains in the smallest # of free boxes
    uint64_t msk = free_sorts;
    int nb_min = INT_MAX;
    int sel_sort = N_BOXES;
    while (msk) {
    int sort = __builtin_ctzll(msk);
    uint64_t boxes = boxes_of_sorts[sort] & free_boxes;
    if (boxes) {
    int nb = __builtin_popcountll(boxes);
    if (nb_min > nb) {
    nb_min = nb;
    sel_sort = sort;
    if (nb == 1)
    break;
    }
    }
    msk ^= (uint64_t)1 << sort;
    }

    if (sel_sort == N_BOXES)
    goto pop;

    // candidate sort found
    uint64_t boxes = boxes_of_sorts[sel_sort] & free_boxes;
    back:;
    int bi = __builtin_ctzll(boxes);
    result.b[bi] = b_idx[bi][sel_sort];
    boxes ^= (uint64_t)1 << bi;
    // preserve state
    stack[level] = (struct stack_node){
    .free_boxes=free_boxes, .free_sorts=free_sorts,
    .boxes=boxes, .sel_sort = sel_sort};
    // go to the next level
    free_boxes ^= (uint64_t)1 << bi;
    free_sorts ^= (uint64_t)1 << sel_sort;
    ++level;
    continue;

    // return to previous level
    pop:
    while (level > 0) {
    --level;
    boxes = stack[level].boxes;
    if (boxes) {
    free_boxes = stack[level].free_boxes;
    free_sorts = stack[level].free_sorts;
    sel_sort = stack[level].sel_sort;
    goto back;
    }
    }
    // should never come here
    break;
    } while (level < N_BOXES);
    return result;
    }

    I'm leaving the code in even though I don't have a lot to say
    about it. Trying to understand it I kept getting an internal
    "complexity threshold exceeded" exception. Finally I decided I
    would just try to rewrite it, preserving the spirit although not
    all of the details. My code ended up being more lines of source
    than yours although the core routine is shorter. There are three
    parts that I broke out into separate functions (not shown). Those
    parts are, one, a function 'bsets_from_boxes()' to turn a boxes_t
    into a collection of bitsets holding the boxes corresponding to
    each cookie type; two, a function 'best_next_ctype()' that
    chooses which cookie type to consider next; and three, a final
    function 'answer()' to turn the 'box_chosen[]' bitset array into
    the representation needed by the interface. Here is the central
    routine:


    peek_t
    solver( boxes_t box_contents ){
    Bset boxes_free[ COOKIE_LIMIT + 1 ];
    Ctype ctypes[ COOKIE_LIMIT ];
    Bset boxes_to_try[ COOKIE_LIMIT ];
    Bset box_chosen[ COOKIE_LIMIT ];

    Bsets bsets = bsets_from_boxes( &box_contents );
    Index depth = 0;
    Bset free = boxes_free[0] = ~((Bset){-1} <<
    COOKIE_LIMIT-1 << 1);
    for( Index i = 0; i < LIMIT_OF( ctypes ); i++ ) ctypes[i] = i;

    do {
    Ctype ctype = best_next_ctype( depth, free, &bsets, ctypes );
    Bset chosen;

    boxes_to_try[depth] = bsets.bset_of_ctype[ ctype ] & free;

    while(
    chosen = boxes_to_try[depth],
    boxes_to_try[depth] ^= chosen ^= chosen & chosen-1,
    !chosen
    ){
    assert( depth > 0 ), --depth;
    free = boxes_free[ depth ];
    ctype = ctypes[ depth ];
    }

    box_chosen[ ctype ] = chosen;
    assert( free & chosen );
    boxes_free[ depth+1 ] = free = free ^ chosen;

    } while( ++depth < COOKIE_LIMIT );

    return answer( box_chosen, ctypes, &box_contents );
    }

    Two further comments.

    One, by keeping the box_chosen[] array as bit masks, the use of
    the __builtin bit function gets deferred and so is done only once
    for each output slot, at the end (in the 'answer()' function).

    Two, the function 'best_next_ctype()' manipulates the ctypes[]
    array as well as returning the cookie type at the appropriate
    depth in the array. I tried different policies for when to
    select the "best" cookie type, including "always", "never", and
    "only for the first N levels", with several choices of N.
    Different policies produced different timing values but there
    wasn't any obvious relation between them. Something that
    occurred to me only later is a policy "choose the best out of the
    next K cookie types", for different values of K.


    I'd guess that every regular c.l.c reader knows that you like guessing
    games. Even if I don't think that it is appropriate here, up to the
    certain limit, I am willing to participate.
    It's pretty easy to guess that the code above preceded by:

    #include <assert.h>
    #include "solver.h"

    typedef uint64_t Bset;
    typedef uint8_t Ctype;
    typedef size_t Index;
    enum { COOKIE_LIMIT = N_BOXES };
    typedef struct {
    Bset bset_of_ctype[COOKIE_LIMIT];
    // more fields? I can't tell
    } Bsets;
    #define LIMIT_OF(x) (sizeof(x)/sizeof(x[0]))

    static Bsets bsets_from_boxes(const boxes_t* );
    static Ctype best_next_ctype(Index, Bset, Bsets*, Ctype* );
    static peek_t answer(const Bset box_chosen[], const Ctype ctypes[],
    const boxes_t* boxes);

    It's also easy to guess that bsets_from_boxes contains following code:

    static Bsets bsets_from_boxes(const boxes_t* boxes)
    {
    Bsets ret={0};
    for (Index bi = 0; bi < N_BOXES; ++bi) {
    for (int ci = 0; ci < BOX_SIZE; ++ci)
    ret.bset_of_ctype[boxes->b[bi][ci]] |= (Bset)1 << bi;
    }
    return ret;
    }

    But quite possibly there is more code.

    I am far less certain about answer(). In particular, I have no idea why
    it needs its 2nd parameter. My guess is that it's something like that:

    static
    peek_t answer(const Bset box_chosen[], const Ctype unused[],
    const boxes_t* boxes)
    {
    peek_t ret;
    for (int i = 0; i < COOKIE_LIMIT; ++i) {
    int bi = __builtin_ctzll(box_chosen[i]);
    for (int ci = 0; ; ++ci)
    if (boxes->b[bi][ci] == i) {
    ret.b[bi] = ci;
    break;
    }
    }
    return ret;
    }

    However I both can not and don't want to guess the content of your
    cores routine best_next_ctype(). And without it I don't have much to
    say about you solution.

    The main thing is I hope the above routine does a better job of
    conveying my thoughts and understandings.

    It looks like unlike your friend you missed the key - equity of number
    of boxes and number of sorts of cookies is not accidental! It's the
    most important thing in the whole exercise. It's what make solution
    feasible for much bigger values of N_BOXES, probably up to several K
    in less than 1 sec.
    But, then again, without source code of best_next_ctype() I can not be
    sure about it.





    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Fri Apr 17 17:04:44 2026
    From Newsgroup: comp.lang.c

    On Thu, 16 Apr 2026 02:57:32 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 08 Apr 2026 08:42:11 -0700
    Tim Rentsch <[email protected]> wrote:

    After that, I talked to a friend to try a different approach to
    producing a solution. After some light editing by myself -- mostly
    just formatting and some name changes -- the code below was put
    into the hopper. (Note: I claim no credit for code below. I did
    some editing to make it more readable but beyond that none of it
    is a result of my efforts.)

    #include <stdint.h>
    #include "solver.h"

    static uint32_t adjacent[ N_BOXES ];
    static int cookie_of_box[ N_BOXES ];
    static int box_of_cookie[ N_BOXES ];
    static uint32_t seen;

    static int
    search( int b ){
    uint32_t m = adjacent[b];
    while( m ){
    uint32_t bit = m & -m;
    m ^= bit;
    int c = __builtin_ctz( bit );
    if( seen & 1u<<c ) continue;
    seen |= 1u<<c;
    if( box_of_cookie[c] == -1 || search(
    box_of_cookie[c] ) ){ box_of_cookie[c] = b;
    cookie_of_box[b] = c;
    return 1;
    }
    }
    return 0;
    }

    peek_t
    solver( boxes_t boxes ){
    peek_t res;

    for( int i = 0; i < N_BOXES; i++ ){
    uint32_t m = 0;
    for( int k = 0; k < BOX_SIZE; k++ ){
    m |= 1u << boxes.b[i][k];
    }
    adjacent[i] = m;
    cookie_of_box[i] = -1;
    box_of_cookie[i] = -1;
    }

    for( int i = 0; i < N_BOXES; i++ ){
    seen = 0;
    search( i );
    }

    for( int i = 0; i < N_BOXES; i++ ){
    int c = cookie_of_box[i];
    for( int k = 0; k < BOX_SIZE; k++ ){
    if( boxes.b[i][k] == c ){
    res.b[i] = k;
    break;
    }
    }
    }
    return res;
    }

    Despite being derived independently, this code uses the same sort
    of approach that I had used, except my code was recursive rather
    than iterative.

    I finally looked closer to the code of your friend.
    To me, it does not look at all similar to description of "your"
    algorithm in your other post.
    It appears more similar to my 2nd algorithm, but sort of orthogonal
    to it - my outermost loop iterate through sorts of cookies; in the
    code of your friend outermost loop iterate through boxes.

    I confess I didn't look closely at the algorithm but relied
    on comments about how the code worked. I tried to express
    this in my earlier posting when I said "the same sort of
    approach" rather than "the same approach". Sorry for any
    confusion.

    Another difference, less important for robustness, more important
    for measured speed, esp. on avearge, is when search decides to
    revert previous selection. I am doing it only after all
    possibilities to select at the top level failed. Your friend makes
    no such effort and easily dives deep. In principle, both
    approaches are equivalent, but in physical world recursive call is significantly slower than iteration of simple loop and excessive
    updates of state variables are also slower then lookups without
    updates.

    And, of course, apart from difference in algorithm there is a
    significant difference in implementation technique. Your friend
    took advantage of the fact that the challenge was specified for
    relatively small number of boxes and intensely used bit masks. I
    [didn't took similar advantage [except for using byte variables
    instead of short or int] and used arrays. Of course, the technique
    applied by your friend is faster. His solution ended up ~2 times
    slower then mine not because of technique, but because of
    algorithmic difference mentioned in previous paragraph.

    Today I modified his code, applying variant of the same algorithmic
    change. At the same opportunity I removed use of static data and
    increased size of bit masks to 64 bits. As expected, resulting code
    was very fast, ~3 times faster than my own. [...code...]

    I'm happy to hear the posting provided an opportunity to more deeply
    explore the problem. Also it's interesting to learn the results of
    your investigations; thank you for those.


    I was aware of possibility of taking advantage of the fact that value
    of N_BOXES specified in the challenge is <= than dominant width of
    machine word.
    I intentionally avoided it until this point, for two reasons, one
    abstract (A) and one practical (P).
    A. the challenge was meant to be algorithmic with lesser emphasis on implementation
    P. I planned (and still plan) experimental testing of BigO

    But I couldn't resist the appeal of fast :(

    The achieved 3x speed up was actually disappointing. I expected more.
    It seems, by now building helper indices takes more time than solver
    itself, which makes further progress in this part pointless.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@[email protected] to comp.lang.c on Fri Apr 17 07:40:13 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:

    On Thu, 16 Apr 2026 23:52:14 -0700
    Tim Rentsch <[email protected]> wrote:

    [...]

    I'd guess that every regular c.l.c reader knows that you like guessing
    games. Even if I don't think that it is appropriate here, up to the
    certain limit, I am willing to participate.

    I posted what I did because I wanted the focus to be on reading the
    code and not on running the code. I was hoping that understanding
    would be feasible without needing any guessing.

    It's pretty easy to guess that the code above preceded by:

    [...]

    Looks reasonable.

    However I both can not and don't want to guess the content of your
    cores routine best_next_ctype(). And without it I don't have much to
    say about you solution.

    Here is one implementation of best_next_ctype():

    Ctype
    best_next_ctype( Index depth, Bset free, Bsets *bsets, Ctype ctypes[] ){
    // permute?
    return ctypes[depth];
    }

    At the "permute?" comment an implementation could perform any
    permutation of the elements of the ctypes[] array, within the range
    of ctypes[depth] and ctypes[COOKIE_LIMIT-1], and the result should
    still be a working solution. Or just leave it as a comment. The
    correctness of the code doesn't depend on what permutation is done
    or how the choice is made.

    The main thing is I hope the above routine does a better job of
    conveying my thoughts and understandings.

    It looks like unlike your friend you missed the key - equity of number
    of boxes and number of sorts of cookies is not accidental!

    The way the code is written it allows the value of COOKIE_LIMIT to
    be chosen independently of the value of N_BOXES (or at least I hope
    it does). I didn't explore any such cases but it might be
    interesting to do that.

    It's the
    most important thing in the whole exercise. It's what make solution
    feasible for much bigger values of N_BOXES, probably up to several K
    in less than 1 sec.

    Of course for the code posted the value of N_BOXES must be no more
    than the number of bits in the bitset type Bset (which is meant to
    be short for "Box set", or a set of box values).

    But, then again, without source code of best_next_ctype() I can not be
    sure about it.

    My questions are these. One, could you understand what the code is
    doing and how it works? And two, if given an appropriate choice of
    what permutation is done in best_next_ctype(), does this code do the
    same thing as your code? I was trying to match the behavior of your
    code but I wasn't able to figure out how it decided where to go
    next.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Fri Apr 17 18:30:17 2026
    From Newsgroup: comp.lang.c

    On Fri, 17 Apr 2026 07:40:13 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Thu, 16 Apr 2026 23:52:14 -0700
    Tim Rentsch <[email protected]> wrote:

    [...]

    I'd guess that every regular c.l.c reader knows that you like
    guessing games. Even if I don't think that it is appropriate here,
    up to the certain limit, I am willing to participate.

    I posted what I did because I wanted the focus to be on reading the
    code and not on running the code. I was hoping that understanding
    would be feasible without needing any guessing.

    It's pretty easy to guess that the code above preceded by:

    [...]

    Looks reasonable.

    However I both can not and don't want to guess the content of your
    cores routine best_next_ctype(). And without it I don't have much
    to say about you solution.

    Here is one implementation of best_next_ctype():

    Ctype
    best_next_ctype( Index depth, Bset free, Bsets *bsets, Ctype ctypes[]
    ){ // permute?
    return ctypes[depth];
    }

    At the "permute?" comment an implementation could perform any
    permutation of the elements of the ctypes[] array, within the range
    of ctypes[depth] and ctypes[COOKIE_LIMIT-1], and the result should
    still be a working solution. Or just leave it as a comment. The
    correctness of the code doesn't depend on what permutation is done
    or how the choice is made.

    The main thing is I hope the above routine does a better job of
    conveying my thoughts and understandings.

    It looks like unlike your friend you missed the key - equity of
    number of boxes and number of sorts of cookies is not accidental!

    The way the code is written it allows the value of COOKIE_LIMIT to
    be chosen independently of the value of N_BOXES (or at least I hope
    it does). I didn't explore any such cases but it might be
    interesting to do that.

    It's the
    most important thing in the whole exercise. It's what make solution feasible for much bigger values of N_BOXES, probably up to several K
    in less than 1 sec.

    Of course for the code posted the value of N_BOXES must be no more
    than the number of bits in the bitset type Bset (which is meant to
    be short for "Box set", or a set of box values).

    But, then again, without source code of best_next_ctype() I can not
    be sure about it.

    My questions are these. One, could you understand what the code is
    doing and how it works? And two, if given an appropriate choice of
    what permutation is done in best_next_ctype(), does this code do the
    same thing as your code? I was trying to match the behavior of your
    code but I wasn't able to figure out how it decided where to go
    next.

    No, as it is, it does not match the behavior of my code or of code of
    your friend. One very simple, but very important element of search state
    is missing.
    I am reasonably sure that it can not be emulated by permutation of
    ctypes array within best_next_ctype() routine.

    As to observed timing, if your code is run for big number of cases, e.g. rep1=128, rep2=9999 (small modification of test bench required to enable
    bigger rep2), it sometimes ends up very slow. My test bench is not well
    suited to find exactly how slow, but my guess is above 0.1 sec. That's
    very different from mine or your friends, where difference between
    median and worst case is small, likeley less than 3x.


    More convenient test bench that reports PRNG and can take PRNG seed as
    input:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h>
    #include <string.h>
    #include <time.h>

    #include "solver.h"

    static int test(boxes_t* boxes, peek_t* results, int rep1, int rep2,
    uint64_t* prngs, uint64_t seed);
    int main(int argz, char** argv)
    {
    int rep1=128, rep2=11;
    unsigned long long seed = 1;
    if (argz > 1) {
    if (strcmp(argv[1], "?")== 0 ||
    strcmp(argv[1], "-?")== 0) {
    fprintf(stderr,
    "Usage:\n"
    "greg_solver_tb [rep1 [rep2]] -s=seed\n"
    "where\n"
    "rep1 - number of solver calls in one time measurement set.
    Default 128\n"
    "rep2 - number of repetitions of time measurement. Default 11\n"
    "seed - PRNG seed\n"
    );
    return 0;
    }

    static const int mx[2] = { 1e7, 10000 };
    for (int arg_i = 1, val_i = 0; arg_i < 4 && arg_i < argz; ++arg_i) {
    char* arg = argv[arg_i];
    if (strncmp(arg, "-s=", 3)== 0) {
    char* seedstr = arg+3;
    char* endp;
    seed = strtoull(seedstr, &endp, 0);
    if (endp == seedstr) {
    fprintf(stderr, "seed argument '%s' is not a number\n",
    seedstr);
    return 1;
    }
    } else {
    char* endp;
    long long n = strtoll(arg, &endp, 0);
    if (endp == arg) {
    fprintf(stderr, "argument '%s' is not a number\n", arg);
    return 1;
    }
    int n_mx = mx[val_i];
    if (n < 1 || n > n_mx) {
    fprintf(stderr, "argument '%s' out of range. Please specify
    value in range[1:%d]\n"
    , arg, n_mx);
    return 1;
    }
    if (val_i==0)
    rep1 = n;
    else
    rep2 = n;
    ++val_i;
    }
    }
    }

    void* mem =
    malloc((sizeof(boxes_t)+sizeof(peek_t)+sizeof(uint64_t))*rep1);
    if (!mem) {
    perror("malloc()");
    return 1;
    }

    uint64_t* prngs = mem;
    boxes_t* boxes = (boxes_t*)&prngs[rep1];
    peek_t* results = (peek_t*)&boxes[rep1];
    int ret = test(boxes, results, rep1, rep2, prngs, seed);
    free(mem);

    return ret;
    }

    static uint32_t my_prng(uint64_t* pState)
    {
    // we don't need very good PRNG,
    // but it has to repeatable cross-platform,
    // so C RTL rand() is not suitable
    const uint64_t PRNG_A = 2862933555777941757ull;
    const uint64_t PRNG_C = 20260329ull;
    uint64_t s = *pState*PRNG_A + PRNG_C;
    *pState = s;
    return s >> 32;
    }

    static void make_random_boxes(boxes_t* dst, uint64_t* prng)
    {
    uint8_t pool[N_BOXES*BOX_SIZE];
    for (int i = 0; i < N_BOXES; ++i)
    for (int k = 0; k < BOX_SIZE; ++k)
    pool[i*BOX_SIZE+k] = i;

    for (int i = 0; i < N_BOXES; ++i) {
    for (int k = 0; k < BOX_SIZE; ++k) {
    uint32_t rnd = my_prng(prng);
    int idx0 = i*BOX_SIZE+k;
    int idx1 = ((N_BOXES*BOX_SIZE-idx0)*(uint64_t)rnd >> 32)+idx0;
    uint8_t v0 = pool[idx0];
    uint8_t v1 = pool[idx1];
    dst->b[i][k] = v1;
    pool[idx1] = v0;
    }
    }
    }

    static bool validate(const boxes_t* bx, const peek_t* res)
    {
    bool cookies[N_BOXES]={0};
    for (int i = 0; i < N_BOXES; ++i) {
    unsigned v = res->b[i];
    if (v >= BOX_SIZE) {
    fprintf(stderr, "res[%d] = %d. Out of range!\n", i, v);
    return false;
    }
    unsigned c = bx->b[i][v];
    if (cookies[c]) {
    fprintf(stderr, "res[%d] = %d. bx[%d][%d]=%d repeated.\n"
    , i, v
    , i, v, c);
    return false;
    }
    cookies[c] = true;
    }
    return true;
    }

    static int cmp_ll(const void* pa, const void* pb) {
    long long a = *(const long long*)pa;
    long long b = *(const long long*)pb;
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
    }


    static int test(boxes_t* boxes, peek_t* results, int rep1, int rep2,
    uint64_t* prngs, uint64_t seed)
    {
    uint64_t prng = seed;
    long long dt[rep2];
    long long mn, mx;
    uint64_t mn_prng, mx_prng;
    for (int it = 0; it < rep2; ++it) {
    for (int i = 0; i < rep1; ++i) {
    prngs[i] = prng;
    make_random_boxes(&boxes[i], &prng);
    }
    struct timespec t0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rep1; ++i)
    results[i] = solver(boxes[i]);
    struct timespec t1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    long long delta_t = (t1.tv_sec-t0.tv_sec)*(long long)1e9+ t1.tv_nsec-t0.tv_nsec;
    dt[it] = delta_t;
    if (it == 0 || mn > delta_t) {
    mn = delta_t;
    mn_prng = prngs[0];
    }
    if (it == 0 || mx < delta_t) {
    mx = delta_t;
    mx_prng = prngs[0];
    }

    for (int i = 0; i < rep1; ++i) {
    if (!validate(&boxes[i], &results[i])) {
    fprintf(stderr,
    "Test fail at iteration %d,%d. prng %llu\n"
    ,it, i, prngs[i]);
    fprintf(stderr, "boxes:\n");
    for (int bi = 0; bi < N_BOXES; ++bi) {
    for (int k = 0; k < BOX_SIZE; ++k)
    fprintf(stderr, " %2d", boxes[i].b[bi][k]);
    fprintf(stderr, " : %d => %2d\n"
    , results[i].b[bi]
    , boxes[i].b[bi][results[i].b[bi]]);
    }
    return 1;
    }
    }
    }

    qsort(dt, rep2, sizeof(*dt), cmp_ll);
    long long med = dt[rep2/2];
    double scale = 1e-6/rep1;
    printf("o.k.\nmed=%.6f msec. min=%.6f msec, prng %llu. max=%.6f msec,
    prng %llu.\n"
    , med*scale
    , mn*scale, mn_prng
    , mx*scale, mx_prng
    );

    return 0;
    }






    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Bonita Montero@[email protected] to comp.lang.c on Fri Apr 17 17:58:56 2026
    From Newsgroup: comp.lang.c

    Is this a correct way to randomize the boxes so that there's at least
    one cookie of each kind per box ?

    #include <iostream>
    #include <array>
    #include <random>
    #include <cstdint>
    #include <algorithm>
    #include <span>

    using namespace std;

    int main()
    {
    using box = array<uint8_t, 10>;
    array<box, 20> boxes;
    for( size_t bx = 0; bx < boxes.size(); ++bx )
    for( uint8_t &cookie : boxes[bx] )
    cookie = (uint8_t)bx;
    mt19937_64 mt;
    uniform_int_distribution<size_t>
    rndBox( 0, 19 ),
    rndCookie( 1, 9 );
    for( box &bx : boxes )
    {
    for( uint8_t &ck : span( bx.begin() + 1, bx.end() ) )
    swap( ck, boxes[rndBox( mt )][rndCookie( mt )] );
    swap( bx[0], boxes[rndBox( mt )][0] );
    }
    for( box &bx : boxes )
    sort( bx.begin(), bx.end() );
    }
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Bonita Montero@[email protected] to comp.lang.c on Fri Apr 17 18:08:11 2026
    From Newsgroup: comp.lang.c

    Now the shuffling is complete:

    int main()
    {
    using box = array<uint8_t, 10>;
    array<box, 20> boxes;
    for( size_t bx = 0; bx < boxes.size(); ++bx )
    for( uint8_t &cookie : boxes[bx] )
    cookie = (uint8_t)bx;
    mt19937_64 mt;
    uniform_int_distribution<size_t>
    rndBox( 0, 19 ),
    rndCookie19( 1, 9 ),
    rndCookie09( 0, 9 );
    for( box &bx : boxes )
    {
    for( uint8_t &ck : span( bx.begin() + 1, bx.end() ) )
    swap( ck, boxes[rndBox( mt )][rndCookie19( mt )] );
    for( uint8_t &ck : bx )
    swap( ck, bx[rndCookie09( mt )] );
    }
    }
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Bart@[email protected] to comp.lang.c on Fri Apr 17 17:47:48 2026
    From Newsgroup: comp.lang.c

    On 17/04/2026 16:58, Bonita Montero wrote:
    Is this a correct way to randomize the boxes so that there's at least
    one cookie of each kind per box ?

    That's not possible. There are 20 types of cookie, but you can only have
    10 per box. Those might all be of the same kind.

    (I took an array of 200 cookies (10 times one instance of each) and
    randomly shuffled, then stored those in the boxes. Or you can populate
    the boxes with an ordered selection then shuffle in situ.)


    #include <iostream>
    #include <array>
    #include <random>
    #include <cstdint>
    #include <algorithm>
    #include <span>

    using namespace std;

    int main()
    {
        using box = array<uint8_t, 10>;
        array<box, 20> boxes;
        for( size_t bx = 0; bx < boxes.size(); ++bx )
            for( uint8_t &cookie : boxes[bx] )
                cookie = (uint8_t)bx;
        mt19937_64 mt;
        uniform_int_distribution<size_t>
            rndBox( 0, 19 ),
            rndCookie( 1, 9 );
        for( box &bx : boxes )
        {
            for( uint8_t &ck : span( bx.begin() + 1, bx.end() ) )
                swap( ck, boxes[rndBox( mt )][rndCookie( mt )] );
            swap( bx[0], boxes[rndBox( mt )][0] );
        }
        for( box &bx : boxes )
            sort( bx.begin(), bx.end() );
    }

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Bonita Montero@[email protected] to comp.lang.c on Fri Apr 17 19:24:01 2026
    From Newsgroup: comp.lang.c

    Am 17.04.2026 um 18:47 schrieb Bart:
    On 17/04/2026 16:58, Bonita Montero wrote:
    Is this a correct way to randomize the boxes so that there's at least
    one cookie of each kind per box ?

    That's not possible. There are 20 types of cookie, but you can only have
    10 per box. Those might all be of the same kind.

    _At least one_ is possible, but not all kinds.

    There was a subtle bug with my latest code.
    Here's the correct shuffling:

    #include <iostream>
    #include <array>
    #include <random>
    #include <cstdint>
    #include <algorithm>
    #include <span>

    using namespace std;

    int main()
    {
    constexpr size_t
    Boxes = 20,
    Cookies = 10;
    using box = array<uint8_t, Cookies>;
    array<box, Boxes> boxes;
    for( size_t flavour = 0; flavour < Boxes; ++flavour )
    fill( boxes[flavour].begin(), boxes[flavour].end(), (uint8_t)flavour );
    mt19937_64 mt;
    uniform_int_distribution<size_t>
    rndBox( 0, Boxes - 1 ),
    rndInBoxX1( 1, Cookies - 1 ),
    rndInBox( 0, Cookies - 1 );
    for( box &bx : boxes )
    for( uint8_t &cookie : span( bx.begin() + 1, bx.end() ) )
    swap( cookie, boxes[rndBox( mt )][rndInBoxX1( mt )] );
    for( box &bx : boxes )
    for( uint8_t &cookie : bx )
    swap( cookie, bx[rndInBox( mt )] );
    }
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Bonita Montero@[email protected] to comp.lang.c on Fri Apr 17 19:52:35 2026
    From Newsgroup: comp.lang.c

    I guess that works. Coding style is perfect for my taste.

    #include <iostream>
    #include <array>
    #include <random>
    #include <cstdint>
    #include <algorithm>
    #include <span>
    #include <iomanip>

    using namespace std;

    int main()
    {
    constexpr size_t
    Boxes = 20,
    Cookies = 10;
    using box = array<uint8_t, Cookies>;
    array<box, Boxes> boxes;
    for( size_t iFlavour = 0; iFlavour < Boxes; ++iFlavour )
    fill( boxes[iFlavour].begin(), boxes[iFlavour].end(), (uint8_t)iFlavour );
    mt19937_64 mt( (random_device())() );
    uniform_int_distribution<size_t>
    rndBox( 0, Boxes - 1 ),
    rndInBoxX1( 1, Cookies - 1 ),
    rndInBox( 0, Cookies - 1 );
    for( box &bx : boxes )
    for( uint8_t &cookie : span( bx.begin() + 1, bx.end() ) )
    swap( cookie, boxes[rndBox( mt )][rndInBoxX1( mt )] );
    for( box &bx : boxes )
    for( uint8_t &cookie : bx )
    swap( cookie, bx[rndInBox( mt )] );
    for( box &bx : boxes )
    swap( bx, boxes[rndBox( mt )] );
    for( size_t iBx = 0; iBx < Boxes; ++iBx )
    {
    auto attrs = []( auto &prnt ) -> auto & { return prnt << setw( 2 ) <<
    setfill( ' ' ) << right; };
    attrs( cout << "box " ) << (int)iBx << ": ";
    box &bx = boxes[iBx];
    for( uint8_t cookie : span( bx.begin(), bx.end() - 1 ) )
    attrs( cout ) << (int)cookie << ", ";
    attrs( cout ) << (int)bx[Cookies - 1] << endl;
    }
    cout << endl;
    for( size_t iFlavour = 0; iFlavour < Boxes; ++iFlavour )
    for( size_t iBox = 0; iBox < Boxes; ++iBox )
    for( size_t iCookie = 0; iCookie < Cookies; ++iCookie )
    if( boxes[iBox][iCookie] == iFlavour )
    {
    cout << "flavour " << iFlavour << " is at place " << iCookie << "
    in box " << iBox << endl;
    iCookie = Cookies;
    iBox = Boxes;
    }
    }
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Bonita Montero@[email protected] to comp.lang.c on Fri Apr 17 20:17:23 2026
    From Newsgroup: comp.lang.c

    There was still a bug in the before code: Cookies could be taken
    from a box more than once. I guess this code is perfect now:

    #include <iostream>
    #include <array>
    #include <random>
    #include <cstdint>
    #include <algorithm>
    #include <span>
    #include <iomanip>

    using namespace std;

    int main()
    {
    constexpr size_t
    Flavours = 20,
    Cookies = 10;
    using box = array<uint8_t, Cookies>;
    array<box, Flavours> boxes;
    for( size_t iFlavour = 0; iFlavour < Flavours; ++iFlavour )
    fill( boxes[iFlavour].begin(), boxes[iFlavour].end(), (uint8_t)iFlavour );
    mt19937_64 mt( (random_device())() );
    uniform_int_distribution<size_t>
    rndBox( 0, Flavours - 1 ),
    rndInBoxX1( 1, Cookies - 1 ),
    rndInBox( 0, Cookies - 1 );
    for( box &bx : boxes )
    for( uint8_t &cookie : span( bx.begin() + 1, bx.end() ) )
    swap( cookie, boxes[rndBox( mt )][rndInBoxX1( mt )] );
    for( box &bx : boxes )
    for( uint8_t &cookie : bx )
    swap( cookie, bx[rndInBox( mt )] );
    for( box &bx : boxes )
    swap( bx, boxes[rndBox( mt )] );
    for( size_t iBx = 0; iBx < Flavours; ++iBx )
    {
    auto attrs = []( auto &prnt ) -> auto & { return prnt << setw( 2 ) <<
    setfill( ' ' ) << right; };
    attrs( cout << "box " ) << (int)iBx << ": ";
    box &bx = boxes[iBx];
    for( uint8_t cookie : span( bx.begin(), bx.end() - 1 ) )
    attrs( cout ) << (int)cookie << ", ";
    attrs( cout ) << (int)bx[Cookies - 1] << endl;
    }
    cout << endl;
    array<bool, Flavours> used { false };
    for( size_t iFlavour = 0; iFlavour < Flavours; ++iFlavour )
    for( size_t iBox = 0; iBox < Flavours; ++iBox )
    if( !used[iBox] )
    for( size_t iCookie = 0; iCookie < Cookies; ++iCookie )
    if( boxes[iBox][iCookie] == iFlavour )
    {
    cout << "flavour " << iFlavour << " is at place " << iCookie << "
    in box " << iBox << endl;
    used[iBox] = true;
    iCookie = Cookies;
    iBox = Flavours;
    }
    }
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From scott@[email protected] (Scott Lurndal) to comp.lang.c on Fri Apr 17 19:58:28 2026
    From Newsgroup: comp.lang.c

    Bonita Montero <[email protected]> writes:
    Is this a correct way to randomize the boxes so that there's at least
    one cookie of each kind per box ?

    No, because this is comp.lang.c, not comp.lang.c++.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Bonita Montero@[email protected] to comp.lang.c on Sat Apr 18 05:52:46 2026
    From Newsgroup: comp.lang.c

    Am 17.04.2026 um 21:58 schrieb Scott Lurndal:

    Bonita Montero <[email protected]> writes:

    Is this a correct way to randomize the boxes so that there's at least
    one cookie of each kind per box ?

    No, because this is comp.lang.c, not comp.lang.c++.

    "The solution does not have to be in C, ..."

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Bart@[email protected] to comp.lang.c on Sat Apr 18 11:53:30 2026
    From Newsgroup: comp.lang.c

    On 18/04/2026 04:52, Bonita Montero wrote:
    Am 17.04.2026 um 21:58 schrieb Scott Lurndal:

    Bonita Montero <[email protected]> writes:

    Is this a correct way to randomize the boxes so that there's at least
    one cookie of each kind per box ?

    No, because this is comp.lang.c, not comp.lang.c++.

    "The solution does not have to be in C, ..."


    OK, here's one in my scripting language:

    proc main=
    names := splitstring("Martin Hohenberg, ... Andres Cordova", ", ")
    for i, x in names do
    names[i] := splitstring(x, " ")
    end

    isort(names, {a, b: convlc(a[$]) > convlc(b[$])})

    for i, x in names do
    println i, joinstrings(x, " ")
    end
    end

    Its output is shown below. The parsing is not thorough, for examples it expects exactly one comma and one space between names, but it works for
    the OP's test data.

    If the string data is excluded, my version is 245 characters, compared
    with 1589 for your C++. Compilation time is 0.0 seconds.

    1 Paul Abbott
    2 Neil Anuskiewicz
    3 Masa Bando
    4 Tim Bando
    5 George Brower
    6 Kyle Burkholder
    7 John Carmack
    8 Dale Carstensen
    9 Jonathan Cast
    10 Christopher Chang
    11 Gratiela Chergu
    12 William Christensen
    13 Michael Ciagala
    14 Andres Cordova
    15 James Cronin
    16 Nodarius Darjania
    17 darrin
    18 Rod Davenport
    19 Chip Davis
    20 Killer Delicious
    21 Sven Dowideit
    22 Steven Evans
    23 Cheryl Evry
    24 Daniel Garber
    25 Mark Gardner
    26 Donald Greer
    27 Hal Hildebrand
    28 Martin Hohenberg
    29 David L. Jessup
    30 Liudmilla Karukina
    31 Ken Kennedy
    32 Ken LaCrosse
    33 Clemens Ladisch
    34 Ron Lauzon
    35 Wenfeng Liang
    36 Brendan Long
    37 Jacob Lyles
    38 Jim McCloskey
    39 Mordant
    40 Mike Nichols
    41 Michael Nygard
    42 Malcolm Ocean
    43 Doug Phillips
    44 Mark Ping
    45 Matt Pollard
    46 ReallyBored
    47 Irving Rivas
    48 Jan Roudaut
    49 Dewey Sasser
    50 John Simpson
    51 Bill Soistman
    52 Hsueh Sung
    53 taishi28012
    54 Jane Tang
    55 Tom Taylor
    56 TheFred
    57 Paul Theodoropolous
    58 Jerod Tufte
    59 Les Vogel
    60 Xingyu Wang
    61 Arnold F. Williams
    62 Stan Witherspoon
    63 Dave Witten
    64 Connor Wood
    65 Wojciech Woytniak
    66 Jae Yang
    67 Άρης Δάσος


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Bart@[email protected] to comp.lang.c on Sat Apr 18 12:04:08 2026
    From Newsgroup: comp.lang.c

    On 18/04/2026 11:53, Bart wrote:
    On 18/04/2026 04:52, Bonita Montero wrote:
    Am 17.04.2026 um 21:58 schrieb Scott Lurndal:

    Bonita Montero <[email protected]> writes:

    Is this a correct way to randomize the boxes so that there's at least
    one cookie of each kind per box ?

    No, because this is comp.lang.c, not comp.lang.c++.

    "The solution does not have to be in C, ..."


    OK, here's one in my scripting language:

    Um, this is for DFS's sorting challenge, not Michael S's cookie puzzle.

    DFS I think specified a solution in C.

    So both my and your versions aren't allowed.

    (I'm not going to do a C version; it's far too much fiddly work.

    For a real application, I would look at creating a suite of functions to
    help out, but it's not worth doing that here.)

    If the string data is excluded, my version is 245 characters, compared
    with 1589 for your C++. Compilation time is 0.0 seconds.

    That is, the C++ for the name sorting program.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Sat Apr 18 20:35:12 2026
    From Newsgroup: comp.lang.c

    On Sat, 18 Apr 2026 05:52:46 +0200
    Bonita Montero <[email protected]> wrote:

    Am 17.04.2026 um 21:58 schrieb Scott Lurndal:

    Bonita Montero <[email protected]> writes:

    Is this a correct way to randomize the boxes so that there's at
    least one cookie of each kind per box ?

    No, because this is comp.lang.c, not comp.lang.c++.

    "The solution does not have to be in C, ..."


    The solution, i.e. implementation of the function solver(), does not
    have to be in C. But it has to be pluggable into specified test bench,
    which is coded in C.
    It does not sound as you're interested in finding solution of the
    challenge.
    On my part, I am not interested in your attempts to provide an
    alternative test bench.

    In unlikely case that I am wrong about your intentions, below provided
    variant of test bench that can be compiled both as C and as C++.


    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h>
    #include <string.h>
    #include <time.h>

    #include "solver.h"

    static int test(boxes_t* boxes, peek_t* results, int rep1, int rep2,
    uint64_t* prngs, uint64_t seed);
    int main(int argz, char** argv)
    {
    int rep1=128, rep2=11;
    unsigned long long seed = 1;
    if (argz > 1) {
    if (strcmp(argv[1], "?")== 0 ||
    strcmp(argv[1], "-?")== 0) {
    fprintf(stderr,
    "Usage:\n"
    "greg_solver_tb [rep1 [rep2]] -s=seed\n"
    "where\n"
    "rep1 - number of solver calls in one time measurement set.
    Default 128\n" "rep2 - number of repetitions of time measurement.
    Default 11\n" "seed - PRNG seed\n"
    );
    return 0;
    }

    static const int mx[2] = { (int)1e7, 10000 };
    for (int arg_i = 1, val_i = 0; arg_i < 4 && arg_i < argz; ++arg_i) {
    char* arg = argv[arg_i];
    if (strncmp(arg, "-s=", 3)== 0) {
    char* seedstr = arg+3;
    char* endp;
    seed = strtoull(seedstr, &endp, 0);
    if (endp == seedstr) {
    fprintf(stderr, "seed argument '%s' is not a number\n",
    seedstr); return 1;
    }
    } else {
    char* endp;
    long long n = strtoll(arg, &endp, 0);
    if (endp == arg) {
    fprintf(stderr, "argument '%s' is not a number\n", arg);
    return 1;
    }
    int n_mx = mx[val_i];
    if (n < 1 || n > n_mx) {
    fprintf(stderr, "argument '%s' out of range. Please specify
    value in range[1:%d]\n" , arg, n_mx);
    return 1;
    }
    if (val_i==0)
    rep1 = n;
    else
    rep2 = n;
    ++val_i;
    }
    }
    }

    void* mem =
    malloc((sizeof(boxes_t)+sizeof(peek_t)+sizeof(uint64_t))*rep1); if
    (!mem) { perror("malloc()");
    return 1;
    }

    uint64_t* prngs = (uint64_t*)mem;
    boxes_t* boxes = (boxes_t*)&prngs[rep1];
    peek_t* results = (peek_t*)&boxes[rep1];
    int ret = test(boxes, results, rep1, rep2, prngs, seed);
    free(mem);

    return ret;
    }

    static uint32_t my_prng(uint64_t* pState)
    {
    // we don't need very good PRNG,
    // but it has to repeatable cross-platform,
    // so C RTL rand() is not suitable
    const uint64_t PRNG_A = 2862933555777941757ull;
    const uint64_t PRNG_C = 20260329ull;
    uint64_t s = *pState*PRNG_A + PRNG_C;
    *pState = s;
    return s >> 32;
    }

    static void make_random_boxes(boxes_t* dst, uint64_t* prng)
    {
    uint8_t pool[N_BOXES*BOX_SIZE];
    for (int i = 0; i < N_BOXES; ++i)
    for (int k = 0; k < BOX_SIZE; ++k)
    pool[i*BOX_SIZE+k] = i;

    for (int i = 0; i < N_BOXES; ++i) {
    for (int k = 0; k < BOX_SIZE; ++k) {
    uint32_t rnd = my_prng(prng);
    int idx0 = i*BOX_SIZE+k;
    int idx1 = ((N_BOXES*BOX_SIZE-idx0)*(uint64_t)rnd >> 32)+idx0;
    uint8_t v0 = pool[idx0];
    uint8_t v1 = pool[idx1];
    dst->b[i][k] = v1;
    pool[idx1] = v0;
    }
    }
    }

    static bool validate(const boxes_t* bx, const peek_t* res)
    {
    bool cookies[N_BOXES]={0};
    for (int i = 0; i < N_BOXES; ++i) {
    unsigned v = res->b[i];
    if (v >= BOX_SIZE) {
    fprintf(stderr, "res[%d] = %d. Out of range!\n", i, v);
    return false;
    }
    unsigned c = bx->b[i][v];
    if (cookies[c]) {
    fprintf(stderr, "res[%d] = %d. bx[%d][%d]=%d repeated.\n"
    , i, v
    , i, v, c);
    return false;
    }
    cookies[c] = true;
    }
    return true;
    }

    static int cmp_ll(const void* pa, const void* pb) {
    long long a = *(const long long*)pa;
    long long b = *(const long long*)pb;
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
    }


    static int test(boxes_t* boxes, peek_t* results, int rep1, int rep2,
    uint64_t* prngs, uint64_t seed)
    {
    uint64_t prng = seed;
    long long dt[rep2];
    long long mn, mx;
    uint64_t mn_prng, mx_prng;
    for (int it = 0; it < rep2; ++it) {
    for (int i = 0; i < rep1; ++i) {
    prngs[i] = prng;
    make_random_boxes(&boxes[i], &prng);
    }
    struct timespec t0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rep1; ++i)
    results[i] = solver(boxes[i]);
    struct timespec t1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    long long delta_t = (t1.tv_sec-t0.tv_sec)*(long long)1e9+ t1.tv_nsec-t0.tv_nsec; dt[it] = delta_t;
    if (it == 0 || mn > delta_t) {
    mn = delta_t;
    mn_prng = prngs[0];
    }
    if (it == 0 || mx < delta_t) {
    mx = delta_t;
    mx_prng = prngs[0];
    }

    for (int i = 0; i < rep1; ++i) {
    if (!validate(&boxes[i], &results[i])) {
    fprintf(stderr,
    "Test fail at iteration %d,%d. prng %llu\n"
    ,it, i, prngs[i]);
    fprintf(stderr, "boxes:\n");
    for (int bi = 0; bi < N_BOXES; ++bi) {
    for (int k = 0; k < BOX_SIZE; ++k)
    fprintf(stderr, " %2d", boxes[i].b[bi][k]);
    fprintf(stderr, " : %d => %2d\n"
    , results[i].b[bi]
    , boxes[i].b[bi][results[i].b[bi]]);
    }
    return 1;
    }
    }
    }

    qsort(dt, rep2, sizeof(*dt), cmp_ll);
    long long med = dt[rep2/2];
    double scale = 1e-6/rep1;
    printf("o.k.\nmed=%.6f msec. min=%.6f msec, prng %llu. max=%.6f msec,
    prng %llu.\n" , med*scale
    , mn*scale, mn_prng
    , mx*scale, mx_prng
    );

    return 0;
    }

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Mon Apr 20 00:43:18 2026
    From Newsgroup: comp.lang.c

    On Fri, 17 Apr 2026 07:40:13 -0700
    Tim Rentsch <[email protected]> wrote:


    My questions are these. One, could you understand what the code is
    doing and how it works? And two, if given an appropriate choice of
    what permutation is done in best_next_ctype(), does this code do the
    same thing as your code? I was trying to match the behavior of your
    code but I wasn't able to figure out how it decided where to go
    next.

    Here is the simplest form of the algorithm that I can come with.
    Essentially, it is a serial non-recursive variant of the code of your
    friend with no algorithmic or technical enhancements.
    Even in this form it is not slow and hopefully makes the key idea more
    obvious.


    #include <stdint.h>
    #include <stdbool.h>
    #include "solver.h"

    peek_t
    solver( boxes_t boxes ){
    uint8_t box_of_cookie[N_BOXES]; // valid for selected sorts
    for (int i = 0; i < N_BOXES; i++)
    box_of_cookie[i] = N_BOXES; // value N_BOXES marks unselected sort

    // Selection proceeds in progressive steps.
    // At each step we select one new box and one new sort.
    // A box and a sort selected at step i are never unselected later
    // although association between selected boxes and sorts
    // can be changed
    for (int step = 0; step < N_BOXES; ++step) {
    typedef struct {
    uint8_t box, sort;
    } trace_t;
    trace_t levels[N_BOXES];
    bool seen[N_BOXES] = {0};
    for (int lvl = 0, box = step; ;) {
    // look for unseen sort in box
    int sort;
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    sort = boxes.b[box][ci];
    if (!seen[sort])
    break;
    }
    if (seen[sort]) { // no unseen sorts in box
    box = levels[--lvl].box; // return to previous level
    continue;
    }
    // Unseen sort found
    // It will become a new selection for box
    int selected_box = box_of_cookie[sort];
    if (selected_box == N_BOXES) {
    // sort unselected.
    // it means that search at step i succeed
    // commit preserved assignments
    for (int k = 0; k < lvl; ++k)
    box_of_cookie[levels[k].sort] = levels[k].box;
    // assign new sort
    box_of_cookie[sort] = box;
    break;
    }

    // sort already selected
    // Save new assignment for sort.
    // Saved box also serves us if we return
    // to the current level from above
    levels[lvl++] = (trace_t){ .sort=sort, .box=box};
    // Mark sort as seen.
    // That mark guarantees forward progress of the search
    seen[sort] = true;
    // Continue search.
    // At the next level we will look for new selection
    // for old box of sort
    box = selected_box;
    }
    }

    // translate result from box_of_cookie[] to index of cookie in box
    peek_t res;
    for (int c = 0; c < N_BOXES; ++c) {
    int b = box_of_cookie[c];
    for (int k = 0; k < BOX_SIZE; ++k) {
    if (boxes.b[b][k] == c) {
    res.b[b] = k;
    break;
    }
    }
    }
    return res;
    }



    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Tue Apr 21 20:37:28 2026
    From Newsgroup: comp.lang.c

    On Mon, 20 Apr 2026 00:43:18 +0300
    Michael S <[email protected]> wrote:

    On Fri, 17 Apr 2026 07:40:13 -0700
    Tim Rentsch <[email protected]> wrote:


    My questions are these. One, could you understand what the code is
    doing and how it works? And two, if given an appropriate choice of
    what permutation is done in best_next_ctype(), does this code do the
    same thing as your code? I was trying to match the behavior of your
    code but I wasn't able to figure out how it decided where to go
    next.

    Here is the simplest form of the algorithm that I can come with.
    Essentially, it is a serial non-recursive variant of the code of your
    friend with no algorithmic or technical enhancements.
    Even in this form it is not slow and hopefully makes the key idea more obvious.


    #include <stdint.h>
    #include <stdbool.h>
    #include "solver.h"

    peek_t
    solver( boxes_t boxes ){
    uint8_t box_of_cookie[N_BOXES]; // valid for selected sorts
    for (int i = 0; i < N_BOXES; i++)
    box_of_cookie[i] = N_BOXES; // value N_BOXES marks unselected sort

    // Selection proceeds in progressive steps.
    // At each step we select one new box and one new sort.
    // A box and a sort selected at step i are never unselected later
    // although association between selected boxes and sorts
    // can be changed
    for (int step = 0; step < N_BOXES; ++step) {
    typedef struct {
    uint8_t box, sort;
    } trace_t;
    trace_t levels[N_BOXES];
    bool seen[N_BOXES] = {0};
    for (int lvl = 0, box = step; ;) {
    // look for unseen sort in box
    int sort;
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    sort = boxes.b[box][ci];
    if (!seen[sort])
    break;
    }
    if (seen[sort]) { // no unseen sorts in box
    box = levels[--lvl].box; // return to previous level
    continue;
    }
    // Unseen sort found
    // It will become a new selection for box
    int selected_box = box_of_cookie[sort];
    if (selected_box == N_BOXES) {
    // sort unselected.
    // it means that search at step i succeed
    // commit preserved assignments
    for (int k = 0; k < lvl; ++k)
    box_of_cookie[levels[k].sort] = levels[k].box;
    // assign new sort
    box_of_cookie[sort] = box;
    break;
    }

    // sort already selected
    // Save new assignment for sort.
    // Saved box also serves us if we return
    // to the current level from above
    levels[lvl++] = (trace_t){ .sort=sort, .box=box};
    // Mark sort as seen.
    // That mark guarantees forward progress of the search
    seen[sort] = true;
    // Continue search.
    // At the next level we will look for new selection
    // for old box of sort
    box = selected_box;
    }
    }

    // translate result from box_of_cookie[] to index of cookie in box
    peek_t res;
    for (int c = 0; c < N_BOXES; ++c) {
    int b = box_of_cookie[c];
    for (int k = 0; k < BOX_SIZE; ++k) {
    if (boxes.b[b][k] == c) {
    res.b[b] = k;
    break;
    }
    }
    }
    return res;
    }




    And here is somewhat less simple variant. It is not necessarily easy to
    follow by itself, but if one studied the code above then enhancements
    made here are rather obvious.
    What is interesting about it is that despite absence of 'parallel'
    tricks this variant is the fastest that I measured so far!

    #include <stdint.h>
    #include <stdbool.h>
    #include "solver.h"

    peek_t
    solver( boxes_t boxes ){
    uint8_t box_of_cookie[N_BOXES]; // valid for selected sorts
    for (int i = 0; i < N_BOXES; i++)
    box_of_cookie[i]=N_BOXES;// value N_BOXES marks unselected sort

    uint8_t ci_min[N_BOXES];
    // all cookies [0:ci_min[i]-] in box i are selected
    peek_t res;
    for(int i = 0; i < N_BOXES; ++i) {
    typedef struct {
    uint8_t box, ci;
    } trace_t;
    trace_t levels[N_BOXES];
    bool sort_seen[N_BOXES] = {0};
    ci_min[i] = 0;
    for (int lvl = 0, box = i; ;) {
    if (ci_min[box] != BOX_SIZE) {
    // look for unselected sort in the box
    bool done = false;
    for (int ci = ci_min[box]; ci < BOX_SIZE; ++ci) {
    int sort = boxes.b[box][ci];
    if (box_of_cookie[sort] == N_BOXES) {
    // sort unselected.
    // It means that search at step i succeed
    // Assign new sort
    res.b[box] = ci;
    box_of_cookie[sort] = box;
    // set new high water mark
    ci_min[box] = ci + 1;
    // commit preserved assignments
    for (int k = 0; k < lvl; ++k) {
    int box = levels[k].box;
    int ci = levels[k].ci;
    res.b[box] = ci;
    box_of_cookie[boxes.b[box][ci]] = box;
    }
    done = true;
    break;
    }
    }
    if (done)
    break;
    ci_min[box] = BOX_SIZE;
    }
    // all cookies in the box belong to selected sorts

    // look for unseen sort in box
    bool back = true;
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    int sort = boxes.b[box][ci];
    if (!sort_seen[sort]) {
    // Unseen sort found
    // If current branch of search succeed then
    // it will become a new selection for box
    // Save new selection for sort.
    // Saved box also serves us
    // if we return to the current level from above
    levels[lvl++] = (trace_t){ .ci=ci, .box=box};
    // Mark sort as seen.
    // That mark guarantees forward progress of the search
    sort_seen[sort] = true;
    // Continue the search.
    // Look for new selection for the old box of sort
    box = box_of_cookie[sort];
    back = false;
    break;
    }
    }
    if (back) // no unseen sorts in box
    box = levels[--lvl].box; // return to previous level
    }
    }
    return res;
    }








    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@[email protected] to comp.lang.c on Tue Apr 21 18:22:34 2026
    From Newsgroup: comp.lang.c

    Michael S <[email protected]> writes:

    On Fri, 17 Apr 2026 07:40:13 -0700
    Tim Rentsch <[email protected]> wrote:

    My questions are these. One, could you understand what the code is
    doing and how it works? And two, if given an appropriate choice of
    what permutation is done in best_next_ctype(), does this code do the
    same thing as your code? I was trying to match the behavior of your
    code but I wasn't able to figure out how it decided where to go
    next.

    Here is the simplest form of the algorithm that I can come with.
    Essentially, it is a serial non-recursive variant of the code of your
    friend with no algorithmic or technical enhancements.
    Even in this form it is not slow and hopefully makes the key idea more obvious.

    I have difficulty reading code written in the style used below. I
    might describe it broadly as too many trees and not enough forest.
    That comment is not meant as criticism but given just for
    informational value.

    My hope has been that my explanations, given first in prose and then
    in code, would let you understand how I approached the problem, so
    you could see how your approach differs from mine and then explain
    what the differences are. As things stand I don't know what are the
    common points of reference, or even whether there are any important
    ones.

    Having said that, let me ask this. What is your goal? I find
    myself wondering if we are talking past each other.


    #include <stdint.h>
    #include <stdbool.h>
    #include "solver.h"

    peek_t
    solver( boxes_t boxes ){
    uint8_t box_of_cookie[N_BOXES]; // valid for selected sorts
    for (int i = 0; i < N_BOXES; i++)
    box_of_cookie[i] = N_BOXES; // value N_BOXES marks unselected sort

    // Selection proceeds in progressive steps.
    // At each step we select one new box and one new sort.
    // A box and a sort selected at step i are never unselected later
    // although association between selected boxes and sorts
    // can be changed
    for (int step = 0; step < N_BOXES; ++step) {
    typedef struct {
    uint8_t box, sort;
    } trace_t;
    trace_t levels[N_BOXES];
    bool seen[N_BOXES] = {0};
    for (int lvl = 0, box = step; ;) {
    // look for unseen sort in box
    int sort;
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    sort = boxes.b[box][ci];
    if (!seen[sort])
    break;
    }
    if (seen[sort]) { // no unseen sorts in box
    box = levels[--lvl].box; // return to previous level
    continue;
    }
    // Unseen sort found
    // It will become a new selection for box
    int selected_box = box_of_cookie[sort];
    if (selected_box == N_BOXES) {
    // sort unselected.
    // it means that search at step i succeed
    // commit preserved assignments
    for (int k = 0; k < lvl; ++k)
    box_of_cookie[levels[k].sort] = levels[k].box;
    // assign new sort
    box_of_cookie[sort] = box;
    break;
    }

    // sort already selected
    // Save new assignment for sort.
    // Saved box also serves us if we return
    // to the current level from above
    levels[lvl++] = (trace_t){ .sort=sort, .box=box};
    // Mark sort as seen.
    // That mark guarantees forward progress of the search
    seen[sort] = true;
    // Continue search.
    // At the next level we will look for new selection
    // for old box of sort
    box = selected_box;
    }
    }

    // translate result from box_of_cookie[] to index of cookie in box
    peek_t res;
    for (int c = 0; c < N_BOXES; ++c) {
    int b = box_of_cookie[c];
    for (int k = 0; k < BOX_SIZE; ++k) {
    if (boxes.b[b][k] == c) {
    res.b[b] = k;
    break;
    }
    }
    }
    return res;
    }

    Another question: is the above typical of how you usually write
    code, or are you writing it like that for my benefit?
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@[email protected] to comp.lang.c on Wed Apr 22 12:05:30 2026
    From Newsgroup: comp.lang.c

    On Tue, 21 Apr 2026 18:22:34 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Fri, 17 Apr 2026 07:40:13 -0700
    Tim Rentsch <[email protected]> wrote:

    My questions are these. One, could you understand what the code is
    doing and how it works? And two, if given an appropriate choice of
    what permutation is done in best_next_ctype(), does this code do
    the same thing as your code? I was trying to match the behavior
    of your code but I wasn't able to figure out how it decided where
    to go next.

    Here is the simplest form of the algorithm that I can come with. Essentially, it is a serial non-recursive variant of the code of
    your friend with no algorithmic or technical enhancements.
    Even in this form it is not slow and hopefully makes the key idea
    more obvious.

    I have difficulty reading code written in the style used below. I
    might describe it broadly as too many trees and not enough forest.
    That comment is not meant as criticism but given just for
    informational value.

    My hope has been that my explanations, given first in prose and then
    in code, would let you understand how I approached the problem, so
    you could see how your approach differs from mine and then explain
    what the differences are. As things stand I don't know what are the
    common points of reference, or even whether there are any important
    ones.

    Having said that, let me ask this. What is your goal? I find
    myself wondering if we are talking past each other.


    My general goal is to show that while this challenge may look like
    variant of travelling salesman problem, it actually is not.
    It is not NP complete and has many solutions with mean complexity of
    little above M*N and worst case complexity still below M*N*N.
    My 2nd goal is to outline common characteristics of various robust
    solutions ignoring as much as possible technical details:
    - N progressive steps
    - within each step, forward progress due to recording of failed
    attempts ('seen' bit array).


    #include <stdint.h>
    #include <stdbool.h>
    #include "solver.h"

    peek_t
    solver( boxes_t boxes ){
    uint8_t box_of_cookie[N_BOXES]; // valid for selected sorts
    for (int i = 0; i < N_BOXES; i++)
    box_of_cookie[i] = N_BOXES; // value N_BOXES marks unselected
    sort

    // Selection proceeds in progressive steps.
    // At each step we select one new box and one new sort.
    // A box and a sort selected at step i are never unselected later
    // although association between selected boxes and sorts
    // can be changed
    for (int step = 0; step < N_BOXES; ++step) {
    typedef struct {
    uint8_t box, sort;
    } trace_t;
    trace_t levels[N_BOXES];
    bool seen[N_BOXES] = {0};
    for (int lvl = 0, box = step; ;) {
    // look for unseen sort in box
    int sort;
    for (int ci = 0; ci < BOX_SIZE; ++ci) {
    sort = boxes.b[box][ci];
    if (!seen[sort])
    break;
    }
    if (seen[sort]) { // no unseen sorts in box
    box = levels[--lvl].box; // return to previous level
    continue;
    }
    // Unseen sort found
    // It will become a new selection for box
    int selected_box = box_of_cookie[sort];
    if (selected_box == N_BOXES) {
    // sort unselected.
    // it means that search at step i succeed
    // commit preserved assignments
    for (int k = 0; k < lvl; ++k)
    box_of_cookie[levels[k].sort] = levels[k].box;
    // assign new sort
    box_of_cookie[sort] = box;
    break;
    }

    // sort already selected
    // Save new assignment for sort.
    // Saved box also serves us if we return
    // to the current level from above
    levels[lvl++] = (trace_t){ .sort=sort, .box=box};
    // Mark sort as seen.
    // That mark guarantees forward progress of the search
    seen[sort] = true;
    // Continue search.
    // At the next level we will look for new selection
    // for old box of sort
    box = selected_box;
    }
    }

    // translate result from box_of_cookie[] to index of cookie in box
    peek_t res;
    for (int c = 0; c < N_BOXES; ++c) {
    int b = box_of_cookie[c];
    for (int k = 0; k < BOX_SIZE; ++k) {
    if (boxes.b[b][k] == c) {
    res.b[b] = k;
    break;
    }
    }
    }
    return res;
    }

    Another question: is the above typical of how you usually write
    code, or are you writing it like that for my benefit?

    Do you mean, relatively long comments distributed through the code vs concentrated at the beginning?
    I don't know, because I don't try to remember such things. I'd have to
    read a sizable corpus of my own production code in order to give an
    answer.


    --- Synchronet 3.21f-Linux NewsLink 1.2