Forum: War Ensemble BBS

Re: Cookies in boxes - algorithmic challenge

From Tim Rentsch@[email protected] to comp.lang.c on Thu Apr 9 16:37:30 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

On Wed, 1 Apr 2026 16:34:47 +0300
Michael S <[email protected]> wrote:

<snip>

My implementation #1.
It is based on the proof provided by somebody with abstract math
attitude.
He said that his proof is something that the 5th grade schoolchildren
could find with in 5 minutes.
I am no 5th pupil any longer. Even understanding of his proof took
me significantly longer than 5 minutes :(
Here is my attempt of translation (original proof was not in English):

"Solution would be done by induction by number of cookies of each sort
(the same for all sorts).
For K=2 the solution is simple. Please figure it out yourself.
Now let's take K > 2 and postulate that for numbers of cookies per box
that are smaller than K the assertion already proven.
Let's take some disposition for which the assertion is correct and
remove the corresponding "thread" (i.e. one cookie from each box, all
of different sorts). Then supposition of induction is correct for
remaining disposition, which means that we can remove yet another
thread, then yet another etc.. for a total of K threads. That is,
for K>2 there are more than 2 threads.
Now let's exchange places between any 2 two cookies from different
boxes. Since by doing so we are touching 1 or 2 threads then there
still be at least one thread untouched (at least k-2 threads, actually [M.S.]). It means that the property "we can remove a thread" is
not disturbed by our action (i.e. by exchange of two cookies).
What is left it is to realize that by means of such exchanges any
disposition can be mutated from any other disposition.
That's all."

I am confident that my understanding of this description, if indeed
it ever occurs, would take a good deal longer than five minutes.
Sadly I think it is representative of many of the explanations
offered by mathematical types.

And here is a code. It is not very fast. But the speed is almost the
same on average and in the worst case.

#include <stdint.h>
#include <stdbool.h>

#include "solver.h"

peek_t solver(boxes_t boxes)
{
uint8_t na[N_BOXES] = {0};
typedef struct { uint8_t s,d; } xch_t;
xch_t xch[N_BOXES*BOX_SIZE];
int i = 0;
for (int src_bi = 0; src_bi < N_BOXES; ++src_bi) {
for (int src_ci = na[src_bi]; src_ci < BOX_SIZE; ++src_ci) {
uint8_t dst_bi = boxes.b[src_bi][src_ci];
while (dst_bi != src_bi) {
uint8_t dst_ci = na[dst_bi];
uint8_t dst_v = boxes.b[dst_bi][dst_ci];
na[dst_bi] = dst_ci+1;
xch[i++] = (xch_t){ .s=src_bi, .d=dst_bi};
dst_bi = dst_v;
}
xch[i++] = (xch_t){ .s=src_bi, .d=dst_bi};
}
na[src_bi] = BOX_SIZE;
}

for (int bi = 0; bi < N_BOXES; ++bi)
for (int ci = 0; ci < BOX_SIZE; ++ci)
boxes.b[bi][ci] = bi;

uint8_t tx[N_BOXES][BOX_SIZE];
for (int bi = 0; bi < N_BOXES; ++bi)
for (int ci = 0; ci < BOX_SIZE; ++ci)
tx[bi][ci] = ci;

peek_t threads[BOX_SIZE];
for (int ci = 0; ci < BOX_SIZE; ++ci)
for (int bi = 0; bi < N_BOXES; ++bi)
threads[ci].b[bi] = ci;

for (int i = N_BOXES*BOX_SIZE-1; i >= 0; --i) {
uint8_t src_bi = xch[i].s;
uint8_t dst_bi = xch[i].d;
uint8_t dst_ci = na[dst_bi] - 1;
na[dst_bi] = dst_ci;
if (src_bi != dst_bi) {
uint8_t src_ci = na[src_bi];
uint8_t src_v = boxes.b[src_bi][src_ci];
if (src_v != dst_bi) {
uint8_t src_t = tx[src_bi][src_ci];
uint8_t dst_t = tx[dst_bi][dst_ci];
if (src_t != dst_t) {
// 2 threads broken. Fix
uint8_t v2bi[N_BOXES];
for (int bi = 0; bi < N_BOXES; ++bi)
v2bi[boxes.b[bi][threads[src_t].b[bi]]] = bi;
boxes.b[src_bi][src_ci] = dst_bi;
boxes.b[dst_bi][dst_ci] = src_v;
for (uint8_t v = dst_bi; v != src_v;) {
uint8_t bi = v2bi[v];
uint8_t c0 = threads[src_t].b[bi];
uint8_t c1 = threads[dst_t].b[bi];
threads[src_t].b[bi] = c1;
threads[dst_t].b[bi] = c0;
tx[bi][c1] = src_t;
tx[bi][c0] = dst_t;
v = boxes.b[bi][c1];
}
} else {
boxes.b[src_bi][src_ci] = dst_bi;
boxes.b[dst_bi][dst_ci] = src_v;
}
}
}
}

return threads[0];
}

In most cases I'm not a big fan of interactive debugging, but
trying to understand how this works might merit an exception.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Tim Rentsch@[email protected] to comp.lang.c on Thu Apr 9 17:07:25 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

On Wed, 1 Apr 2026 16:34:47 +0300
Michael S <[email protected]> wrote:

<snip>

My solution #2.
That is my own, with no proof preceding code. It went in other direction
- from hacking the code to proving it later.
It is pretty fast. Can be made faster yet by replication of some code
and re-arrangement in the find() routine, but for purposes of
illustration I wanted it short.

Just fyi.. I did some speed comparisons between the code below and
the compatible solver that I wrote. Your code is faster, scales
better, and also has better worst-case behavior (which is to say
that my code has worse worst-case behavior). So I think the timing
numbers I was seeing before may have been just the result of being
lucky in the draw of random numbers.

#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#include "solver.h"

static
uint8_t solver_find(
const uint8_t b_idx[N_BOXES][N_BOXES],
uint8_t result[N_BOXES],
const uint8_t free[N_BOXES],
int n_free,
uint8_t v,
const uint8_t bx[N_BOXES][BOX_SIZE])
{
typedef struct {
uint8_t parent, bi;
} db_rec;
db_rec db[N_BOXES];
bool valid[N_BOXES] = {0};
uint8_t que[N_BOXES], *wr = que, *rd = que;

valid[v] = true;
*wr++ = v;
for (;;) {
uint8_t vv = *rd++;
for (int i = 0; i < n_free; ++i) {
uint8_t bi = free[i];
uint8_t ci = b_idx[bi][vv];
if (ci != BOX_SIZE) {
result[bi] = ci;
while (vv != v) {
bi = db[vv].bi;
vv = db[vv].parent;
result[bi] = b_idx[bi][vv];
};
return i;
}
}
// add neighbors to database
for (int bi = 0; bi < N_BOXES; ++bi) {
if (b_idx[bi][vv] != BOX_SIZE) {
uint8_t a = bx[bi][result[bi]];
if (!valid[a]) {
valid[a] = true;
db[a] = (db_rec){ .parent = vv, .bi = bi };
*wr++ = a;
}
}
}
}
}

peek_t solver(boxes_t boxes)
{
// build index
uint8_t b_idx[N_BOXES][N_BOXES];
memset(b_idx, BOX_SIZE, sizeof(b_idx));
for (int bi = 0; bi < N_BOXES; ++bi) {
for (int ci = 0; ci < BOX_SIZE; ++ci) {
uint8_t v = boxes.b[bi][ci];
if (b_idx[bi][v] == BOX_SIZE)
b_idx[bi][v] = ci;
}
}

uint8_t free_boxes[N_BOXES];
for (int bi = 0; bi < N_BOXES; ++bi)
free_boxes[bi] = bi;
int n_free = N_BOXES;

peek_t result = {0};
bool used_sorts[N_BOXES]={0};
while (n_free > 0) {
int bi = free_boxes[--n_free]; // new set
result.b[bi] = 0;
uint8_t v = boxes.b[bi][0];
used_sorts[v] = true;
uint8_t pend_que[N_BOXES],
*pend_wr = pend_que, *pend_rd = pend_que;
*pend_wr++ = bi;
while (pend_wr != pend_rd && n_free != 0) {
int bk = *pend_rd++;
for (int ci = 0; ci < BOX_SIZE; ++ci) {
v = boxes.b[bk][ci];
if (!used_sorts[v]) {
// new value in set
uint8_t v_fi = solver_find(
b_idx, result.b, free_boxes,
n_free, v, boxes.b);
uint8_t v_bi = free_boxes[v_fi];
used_sorts[v] = true;
*pend_wr++ = v_bi;
free_boxes[v_fi] = free_boxes[--n_free];
// remove from free list
}
}
}
}
return result;
}

I think I could eventually understand what this code is doing, and
after that get a sense of how and why it works. In its current
form I think that would take me a fair amount of time. It would
help to have a roadmap, more evocative names, more functional
decomposition, or ideally all three. Please understand, I'm not
asking or suggesting you do anything; my intention is only to
provide feedback that may be of some benefit to you.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Tim Rentsch@[email protected] to comp.lang.c on Thu Apr 9 21:22:51 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

On Wed, 08 Apr 2026 08:42:11 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

Here I am again, ready now to write a more substantive response.

So, here is the challenge:
There are 20 sorts of cookies, 10 of each sort. They are placed in
20 boxes, again, 10 cookies in each boxes. You have to take exactly
one cookie from each box, all 20 of different sorts.
Smart math heads proved that for any distribution a solution exists.
I don't ask you to repeat the proof. Just peek cookies!

I think the problem is interesting. I thought I would try coding
something up.

It's obvious that one can find the solution by exhausting. Don't do
it. Indeed, the number of possible peek orders is finite, but ii is
large - 2.4e18.
Be smarter.
On modern PC I want to get a solution in less than 1 msec, bonus for
less than 50usec.

[.. C code to drive an exercise a proposed solution ..]

You have to implement function solver() in module solver.c

I ran into trouble when I tried to work within the posted framework
to drive a solver. The difficulties were of several kinds. The
code didn't compile in my standard compilation environment, which
is roughly this (the c99 might be c11 but that made no difference):

gcc -x c -std=c99 -pedantic ...

What were the problems?
My gcc 14.2.0 on msys2 produces no warnings or errors with above flags.

These source lines

struct timespec t0;
clock_gettime(CLOCK_MONOTONIC, &t0);

produced these diagnostics

tb.c:134:21: error: storage size of 't0' isn't known
tb.c:135:5: warning: implicit declaration of function 'clock_gettime' [...]
tb.c:135:19: error: 'CLOCK_MONOTONIC' undeclared (first use in this function)

under -std=c99. Apparently the first diagnostic, about
struct timespec, is fixed under -std=c11. It wasn't hard
to fix these, but it was irksome.

I was able to work through that problem but it was irksome. The
second difficulty is the interface to solver() seems rather clunky
to me, maybe not for the input but for the output, and it wasn't
clear to me what the specification is for the output (I did manage
to figure this out but only much later). I was having to put more
effort into figuring out the interface than solving the problem.

Yes, I should have explained it, at least briefly.
I'd guess that you expected that values in result vector correspond to
sorts of cookies, when in fact they are indices to position of the
cookie in the box.
I am sorry.

Yes, that is in fact what I was thinking. Fortunately the difficulty
was resolved when the checker reported a value out of range.

Finally I abandoned the idea of working within the structure of the
posted driver code and wrote a solver to my own specification.
Doing that was fairly straightforward and took half an hour or so
(disclaimer: after I had spent a good amount of time just thinking
about the problem). After writing the solver I reimplemented a
driver framework to drive it, conforming to the interface I was
using. Some debugging was needed to get everything working. I did
some rudimentary time measurements for the solver. Results were
good (more about the specifics later).

After doing the new solver I went back to the posted driver code,
and grafted together the new solver with the orginal driver. Some
glue code was needed to reconcile the differences in interface
between the new solver and the original driver. Then I needed to
figure out the semantics of the output, which were different than
what I expected and originally understood. I figured all that out
and got things working. Sadly the code was uglier than I would have
liked.

After that, I talked to a friend to try a different approach to
producing a solution. After some light editing by myself -- mostly
just formatting and some name changes -- the code below was put into
the hopper. (Note: I claim no credit for code below. I did some
editing to make it more readable but beyond that none of it is a
result of my efforts.)

#include <stdint.h>
#include "solver.h"

static uint32_t adjacent[ N_BOXES ];
static int cookie_of_box[ N_BOXES ];
static int box_of_cookie[ N_BOXES ];
static uint32_t seen;

static int
search( int b ){
uint32_t m = adjacent[b];
while( m ){
uint32_t bit = m & -m;
m ^= bit;
int c = __builtin_ctz( bit );
if( seen & 1u<<c ) continue;
seen |= 1u<<c;
if( box_of_cookie[c] == -1 || search( box_of_cookie[c]
) ){ box_of_cookie[c] = b;
cookie_of_box[b] = c;
return 1;
}
}
return 0;
}

peek_t
solver( boxes_t boxes ){
peek_t res;

for( int i = 0; i < N_BOXES; i++ ){
uint32_t m = 0;
for( int k = 0; k < BOX_SIZE; k++ ){
m |= 1u << boxes.b[i][k];
}
adjacent[i] = m;
cookie_of_box[i] = -1;
box_of_cookie[i] = -1;
}

for( int i = 0; i < N_BOXES; i++ ){
seen = 0;
search( i );
}

for( int i = 0; i < N_BOXES; i++ ){
int c = cookie_of_box[i];
for( int k = 0; k < BOX_SIZE; k++ ){
if( boxes.b[i][k] == c ){
res.b[i] = k;
break;
}
}
}
return res;
}

Despite being derived independently, this code uses the same sort of
approach that I had used, except my code was recursive rather than
iterative.

Running the above code with default settings (128 11) produced this
output

o.k.
med=0.003542 msec, min=0.003487 msec, max=0.003911 msec

The timing results for my second code effort were similar, except
that the value for max was about 0.3 msec.

My defaults were chosen for much slower solutions.
For fast solutions like yours or mine with default parameters an
overhead of clock_gettime() call is too significant.
In such case it's better to use rep1 of at least 10000.

It turns out that running with a larger number of trials didn't
change the results much. I think my averages got a bit worse and my
worst case got a bit better, but as I recall the difference wasn't
anything dramatic.

Informal timing measurements on my original solver -- in particular
using a different interface, and done simply by using 'time' in the
shell to measure a million calls to the solver -- gave per-call
times that were better by about a factor of five. I need to be
clear that this solver does not conform to the original interface,
and probably most of speedup is due to choosing an interface that
gives an easier fit to the solver.

It's not something I'd expect.
I used "by value" type of interface both for input and for output in
order to reduce a chance of buggy solutions to mess with the test bench.
I fully expect that "by reference" type of interface could be somewhat faster. May be, 1.5x faster for very fast solutions. But I certainly
don't expect a factor of five.

I will try to explain below.

Sorry not to have something better for you. It was a fair amount of
work to produce the results above.

I read your code briefly, but didn't try to understand it yet.

I need to say again that the code I posted was not code that I wrote.
The approach used is similar to code I did write, but the actual code
is quite a bit different.

I plugged it into my test bench and measured speed on very old home PC
(Intel i5-3450). With big values of rep1 it was ~3.3 times faster
than my original solution and ~1.9 times slower than my 2nd solution.
So, speed-wise your solution is good.
One thing that I don't particularly like after taking a glance - it
does not appear to be thread-safe. But it is probably very easy to make
it thread safe.

Do you mean because it is using (and changing) global objects? Yes
that is a drawback.

Another thing that I don't particularly like - if we want N_BOXES > 32
then it requires modifications; if we want N_BOXES > 64 then it
requires more serious modifications.

As it turns out my code handles N_BOXES up to 52. You may laugh
when you see why.

Now few words about my own solutions.
1. Original solution:
This solution is based on the proof that was given to me by somebody
with abstract math mind.

Yes, I posted a comment about that

2. It is solution that I found independently, for which I hopefully
have a proof in the head, but I didn't write it out yet and didn't show
yet to said mathematician. It would not surprise me if idea of these solution is the same as yours.

I don't really understand that code yet, but it looks like your
code is a bit fancier than mine (the code that wasn't posted
because it uses a different interface).

Later today (tonight) I plan to post both solutions here.

Here is an outline of the first (non-interface-compatible) code that
I wrote.

Use a straightforward backtracking scheme. Make a choice at the
current level, and do a recursive call to try the next level. If
the next level returns, it failed, so go on to the next choice at
this level and call again.

The output value is an array (passed recursively via pointer) with
either a cookie type for each box or a box number for each cookie
type (I don't remember which). This array is filled in as the
recursive backtrack progresses.

The whole search is started by doing a setjmp() before the call to
the top level. If the search succeeds, a longjmp() is done to
deliver an answer. That means that if the call to the top level
returns then the search was a failure. The central recursive
function has four parameters:

an Answer *'out', to hold the solution
a jmp_buf 'found', for a longjmp() when a full answer is found
a counter/index to reflect search depth (starting 0, 20 means done)
a "solution state" to direct the search (conceptually by value)

The "solution state" is at the heart of the algorithm. It's a
structure holding an array of 20 "cookie values", where a cookie
value is represented by an unsigned 64-bit int. The bits are
used as follows:

low six bits: the cookie type
next 52 bits: a "box mask", indicating at least one cookie of
this type in the corresponding box (so up to 52
boxes)
upper bits: count of boxes (ie, number of ones in the box mask)

The cookie type is important when storing a partial result in the
answer, not at any other time.

The box mask is important to know which boxes to try during the
backtracking.

The count of boxes is important to help guide the backtracking, in a
way that we hope will get an answer faster.

At each level of recursion, do the following:

If the search depth is the number of boxes, longjmp()

Next find the cookie of cookies in the solution state at or
above this level that has the smallest number of boxes. Because
the box count is held in the uppermost bits the 64-bit values can
simply be directly compared, without any masking. Move that
cookie to the array location corresponding to the current depth
(and put the cookie that was there in the hole left by doing the
move).

For each box in the box mask of that cookie, try selecting that
box for this cookie, which means:

copy the state of cookies above this level to a new solution
state, so the changed solution state described above can be
left alone

for each of those cookies, if the cookie has the "selected box"
set in its box mask, clear the bit and reduce the box count
by one. (again since the box count is held in high order bits
this count can be reduced by just subtracting a scaled one
value.)

recurse to continue searching at the next level

If we run out of boxes to check, just return; continue trying
alternatives at the next level up, if there is one.

I think that's it. I hope it makes sense.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Fri Apr 10 12:23:14 2026

From Newsgroup: comp.lang.c

On Thu, 09 Apr 2026 21:22:51 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 08 Apr 2026 08:42:11 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...]

Here I am again, ready now to write a more substantive response.

So, here is the challenge:
There are 20 sorts of cookies, 10 of each sort. They are placed
in 20 boxes, again, 10 cookies in each boxes. You have to take
exactly one cookie from each box, all 20 of different sorts.
Smart math heads proved that for any distribution a solution
exists. I don't ask you to repeat the proof. Just peek cookies!

I think the problem is interesting. I thought I would try coding
something up.

It's obvious that one can find the solution by exhausting. Don't
do it. Indeed, the number of possible peek orders is finite, but
ii is large - 2.4e18.
Be smarter.
On modern PC I want to get a solution in less than 1 msec, bonus
for less than 50usec.

[.. C code to drive an exercise a proposed solution ..]

You have to implement function solver() in module solver.c

I ran into trouble when I tried to work within the posted framework
to drive a solver. The difficulties were of several kinds. The
code didn't compile in my standard compilation environment, which
is roughly this (the c99 might be c11 but that made no difference):

gcc -x c -std=c99 -pedantic ...

What were the problems?
My gcc 14.2.0 on msys2 produces no warnings or errors with above
flags.

These source lines

struct timespec t0;
clock_gettime(CLOCK_MONOTONIC, &t0);

produced these diagnostics

tb.c:134:21: error: storage size of 't0' isn't known
tb.c:135:5: warning: implicit declaration of function
'clock_gettime' [...] tb.c:135:19: error: 'CLOCK_MONOTONIC'
undeclared (first use in this function)

under -std=c99. Apparently the first diagnostic, about
struct timespec, is fixed under -std=c11. It wasn't hard
to fix these, but it was irksome.

Sounds like under Linux and with -std=c99 flag function clock_gettime()
and related structures and constants are not declared/defined by default
in time.h.
Man page suggests magic pixie dust:

#define _XOPEN_SOURCE 600
#include <time.h>
#include <unistd.h>

If you ask why I used clock_gettime() instead of C standard
timespec_get() then the answer is that on Windows, eps. on older
versions like Win7 and WS2008 (which I prefer over newer stuff for
reasons unrelated to quality of C RTL), precision of timespec_get() is
poor. clock_gettime(CLOCK_MONOTONIC,) is much better.

I was able to work through that problem but it was irksome. The
second difficulty is the interface to solver() seems rather clunky
to me, maybe not for the input but for the output, and it wasn't
clear to me what the specification is for the output (I did manage
to figure this out but only much later). I was having to put more
effort into figuring out the interface than solving the problem.

Yes, I should have explained it, at least briefly.
I'd guess that you expected that values in result vector correspond
to sorts of cookies, when in fact they are indices to position of
the cookie in the box.
I am sorry.

Yes, that is in fact what I was thinking. Fortunately the difficulty
was resolved when the checker reported a value out of range.

Finally I abandoned the idea of working within the structure of the
posted driver code and wrote a solver to my own specification.
Doing that was fairly straightforward and took half an hour or so
(disclaimer: after I had spent a good amount of time just thinking
about the problem). After writing the solver I reimplemented a
driver framework to drive it, conforming to the interface I was
using. Some debugging was needed to get everything working. I did
some rudimentary time measurements for the solver. Results were
good (more about the specifics later).

After doing the new solver I went back to the posted driver code,
and grafted together the new solver with the orginal driver. Some
glue code was needed to reconcile the differences in interface
between the new solver and the original driver. Then I needed to
figure out the semantics of the output, which were different than
what I expected and originally understood. I figured all that out
and got things working. Sadly the code was uglier than I would
have liked.

After that, I talked to a friend to try a different approach to
producing a solution. After some light editing by myself -- mostly
just formatting and some name changes -- the code below was put
into the hopper. (Note: I claim no credit for code below. I did
some editing to make it more readable but beyond that none of it
is a result of my efforts.)

#include <stdint.h>
#include "solver.h"

static uint32_t adjacent[ N_BOXES ];
static int cookie_of_box[ N_BOXES ];
static int box_of_cookie[ N_BOXES ];
static uint32_t seen;

static int
search( int b ){
uint32_t m = adjacent[b];
while( m ){
uint32_t bit = m & -m;
m ^= bit;
int c = __builtin_ctz( bit );
if( seen & 1u<<c ) continue;
seen |= 1u<<c;
if( box_of_cookie[c] == -1 || search(
box_of_cookie[c] ) ){ box_of_cookie[c] = b;
cookie_of_box[b] = c;
return 1;
}
}
return 0;
}

peek_t
solver( boxes_t boxes ){
peek_t res;

for( int i = 0; i < N_BOXES; i++ ){
uint32_t m = 0;
for( int k = 0; k < BOX_SIZE; k++ ){
m |= 1u << boxes.b[i][k];
}
adjacent[i] = m;
cookie_of_box[i] = -1;
box_of_cookie[i] = -1;
}

for( int i = 0; i < N_BOXES; i++ ){
seen = 0;
search( i );
}

for( int i = 0; i < N_BOXES; i++ ){
int c = cookie_of_box[i];
for( int k = 0; k < BOX_SIZE; k++ ){
if( boxes.b[i][k] == c ){
res.b[i] = k;
break;
}
}
}
return res;
}

Despite being derived independently, this code uses the same sort
of approach that I had used, except my code was recursive rather
than iterative.

Running the above code with default settings (128 11) produced this
output

o.k.
med=0.003542 msec, min=0.003487 msec, max=0.003911 msec

The timing results for my second code effort were similar, except
that the value for max was about 0.3 msec.

My defaults were chosen for much slower solutions.
For fast solutions like yours or mine with default parameters an
overhead of clock_gettime() call is too significant.
In such case it's better to use rep1 of at least 10000.

It turns out that running with a larger number of trials didn't
change the results much. I think my averages got a bit worse and my
worst case got a bit better, but as I recall the difference wasn't
anything dramatic.

After testing on another (Windows) systems I have to admit that my
objections about default value of rep1 applies only to Win7. Even on
Ws2008, which is almost the same kernel, overhead of clock_gettime() is sufficiently low for rep1=128 to give fair measurements.

Informal timing measurements on my original solver -- in particular
using a different interface, and done simply by using 'time' in the
shell to measure a million calls to the solver -- gave per-call
times that were better by about a factor of five. I need to be
clear that this solver does not conform to the original interface,
and probably most of speedup is due to choosing an interface that
gives an easier fit to the solver.

It's not something I'd expect.
I used "by value" type of interface both for input and for output in
order to reduce a chance of buggy solutions to mess with the test
bench. I fully expect that "by reference" type of interface could
be somewhat faster. May be, 1.5x faster for very fast solutions.
But I certainly don't expect a factor of five.

I will try to explain below.

Sorry not to have something better for you. It was a fair amount
of work to produce the results above.

I read your code briefly, but didn't try to understand it yet.

I need to say again that the code I posted was not code that I wrote.
The approach used is similar to code I did write, but the actual code
is quite a bit different.

I plugged it into my test bench and measured speed on very old home
PC (Intel i5-3450). With big values of rep1 it was ~3.3 times
faster than my original solution and ~1.9 times slower than my 2nd solution. So, speed-wise your solution is good.
One thing that I don't particularly like after taking a glance - it
does not appear to be thread-safe. But it is probably very easy to
make it thread safe.

Do you mean because it is using (and changing) global objects? Yes
that is a drawback.

Yes.

Another thing that I don't particularly like - if we want N_BOXES >
32 then it requires modifications; if we want N_BOXES > 64 then it requires more serious modifications.

As it turns out my code handles N_BOXES up to 52. You may laugh
when you see why.

Now few words about my own solutions.
1. Original solution:
This solution is based on the proof that was given to me by somebody
with abstract math mind.

Yes, I posted a comment about that

<snip the rest, will respond later >

--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Fri Apr 10 16:16:38 2026

From Newsgroup: comp.lang.c

On Thu, 09 Apr 2026 21:22:51 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 08 Apr 2026 08:42:11 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

<snip, answered in the other post>

As it turns out my code handles N_BOXES up to 52. You may laugh
when you see why.

The number of small+capital letters in English alphabet?
Probably, not.
Indeed, after reading the rest, it's not that.

Now few words about my own solutions.
1. Original solution:
This solution is based on the proof that was given to me by somebody
with abstract math mind.

Yes, I posted a comment about that

2. It is solution that I found independently, for which I hopefully
have a proof in the head, but I didn't write it out yet and didn't
show yet to said mathematician. It would not surprise me if idea
of these solution is the same as yours.

I don't really understand that code yet, but it looks like your
code is a bit fancier than mine (the code that wasn't posted
because it uses a different interface).

Later today (tonight) I plan to post both solutions here.

Here is an outline of the first (non-interface-compatible) code that
I wrote.

Use a straightforward backtracking scheme. Make a choice at the
current level, and do a recursive call to try the next level. If
the next level returns, it failed, so go on to the next choice at
this level and call again.

The output value is an array (passed recursively via pointer) with
either a cookie type for each box or a box number for each cookie
type (I don't remember which). This array is filled in as the
recursive backtrack progresses.

The whole search is started by doing a setjmp() before the call to
the top level. If the search succeeds, a longjmp() is done to
deliver an answer. That means that if the call to the top level
returns then the search was a failure. The central recursive
function has four parameters:

an Answer *'out', to hold the solution
a jmp_buf 'found', for a longjmp() when a full answer is found
a counter/index to reflect search depth (starting 0, 20 means done)
a "solution state" to direct the search (conceptually by value)

The "solution state" is at the heart of the algorithm. It's a
structure holding an array of 20 "cookie values", where a cookie
value is represented by an unsigned 64-bit int. The bits are
used as follows:

low six bits: the cookie type
next 52 bits: a "box mask", indicating at least one cookie of
this type in the corresponding box (so up to 52
boxes)
upper bits: count of boxes (ie, number of ones in the box mask)

The cookie type is important when storing a partial result in the
answer, not at any other time.

The box mask is important to know which boxes to try during the
backtracking.

The count of boxes is important to help guide the backtracking, in a
way that we hope will get an answer faster.

At each level of recursion, do the following:

If the search depth is the number of boxes, longjmp()

Next find the cookie of cookies in the solution state at or
above this level that has the smallest number of boxes. Because
the box count is held in the uppermost bits the 64-bit values can
simply be directly compared, without any masking.

Is it proven that this step helps?
To me it sounds unnecessary.

Move that
cookie to the array location corresponding to the current depth
(and put the cookie that was there in the hole left by doing the
move).

For each box in the box mask of that cookie, try selecting that
box for this cookie, which means:

copy the state of cookies above this level to a new solution
state, so the changed solution state described above can be
left alone

for each of those cookies, if the cookie has the "selected box"
set in its box mask, clear the bit and reduce the box count
by one. (again since the box count is held in high order bits
this count can be reduced by just subtracting a scaled one
value.)

recurse to continue searching at the next level

If we run out of boxes to check, just return; continue trying
alternatives at the next level up, if there is one.

I think that's it. I hope it makes sense.

It makes sense, but it looks to me as solution by exhausting, augmented
by "smallest number of boxes" heuristic.
It's not obvious [to me] without further thinking that the worst case is
not very slow.

--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Fri Apr 10 17:02:58 2026

From Newsgroup: comp.lang.c

On Thu, 09 Apr 2026 16:37:30 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 1 Apr 2026 16:34:47 +0300
Michael S <[email protected]> wrote:

<snip>

My implementation #1.
It is based on the proof provided by somebody with abstract math
attitude.
He said that his proof is something that the 5th grade
schoolchildren could find with in 5 minutes.
I am no 5th pupil any longer. Even understanding of his proof took
me significantly longer than 5 minutes :(
Here is my attempt of translation (original proof was not in
English):

"Solution would be done by induction by number of cookies of each
sort (the same for all sorts).
For K=2 the solution is simple. Please figure it out yourself.
Now let's take K > 2 and postulate that for numbers of cookies per
box that are smaller than K the assertion already proven.
Let's take some disposition for which the assertion is correct and
remove the corresponding "thread" (i.e. one cookie from each box,
all of different sorts). Then supposition of induction is correct
for remaining disposition, which means that we can remove yet
another thread, then yet another etc.. for a total of K threads.
That is, for K>2 there are more than 2 threads.
Now let's exchange places between any 2 two cookies from different
boxes. Since by doing so we are touching 1 or 2 threads then there
still be at least one thread untouched (at least k-2 threads,
actually [M.S.]). It means that the property "we can remove a
thread" is not disturbed by our action (i.e. by exchange of two
cookies). What is left it is to realize that by means of such
exchanges any disposition can be mutated from any other disposition.
That's all."

I am confident that my understanding of this description, if indeed
it ever occurs, would take a good deal longer than five minutes.
Sadly I think it is representative of many of the explanations
offered by mathematical types.

He was not fully sutisfied himself. 20 hours later he came with other formulation. Here is my attemp of translatioon:

1. There exist a disposition of cookies in boxes, for which it is
possible to select one cookie from each box, all of different sorts.
That is obvious. [MS: For example, each sort of cookies in its own
dedicated box. That is a disposition that I use in my code] Let's call
the selection "thread".

2. There exists an operation that changes the disposition, but
preserves the property "it is possible to select a thread" (the
description of the operation would be given separately, a.t.m. it does
not matter).

3. By means of multiple operations of that type it is possible to turn
any arbitrary disposition to any other. Since after every step the
property "it is possible to select a thread" is preserved and because
there exist a disposition, for which this property is true, it means
that the property is true for any possible disposition.

4. The operations in question - exchange of any pair of cookies.
Comment: Induction is used only in (2) for proof that an operation
preserves the desired property.

For me, the second variant was less helpful. May be, for you it would
be more helpful.

And here is a code. It is not very fast. But the speed is almost
the same on average and in the worst case.

#include <stdint.h>
#include <stdbool.h>

#include "solver.h"

peek_t solver(boxes_t boxes)
{
uint8_t na[N_BOXES] = {0};
typedef struct { uint8_t s,d; } xch_t;
xch_t xch[N_BOXES*BOX_SIZE];
int i = 0;
for (int src_bi = 0; src_bi < N_BOXES; ++src_bi) {
for (int src_ci = na[src_bi]; src_ci < BOX_SIZE; ++src_ci) {
uint8_t dst_bi = boxes.b[src_bi][src_ci];
while (dst_bi != src_bi) {
uint8_t dst_ci = na[dst_bi];
uint8_t dst_v = boxes.b[dst_bi][dst_ci];
na[dst_bi] = dst_ci+1;
xch[i++] = (xch_t){ .s=src_bi, .d=dst_bi};
dst_bi = dst_v;
}
xch[i++] = (xch_t){ .s=src_bi, .d=dst_bi};
}
na[src_bi] = BOX_SIZE;
}

for (int bi = 0; bi < N_BOXES; ++bi)
for (int ci = 0; ci < BOX_SIZE; ++ci)
boxes.b[bi][ci] = bi;

uint8_t tx[N_BOXES][BOX_SIZE];
for (int bi = 0; bi < N_BOXES; ++bi)
for (int ci = 0; ci < BOX_SIZE; ++ci)
tx[bi][ci] = ci;

peek_t threads[BOX_SIZE];
for (int ci = 0; ci < BOX_SIZE; ++ci)
for (int bi = 0; bi < N_BOXES; ++bi)
threads[ci].b[bi] = ci;

for (int i = N_BOXES*BOX_SIZE-1; i >= 0; --i) {
uint8_t src_bi = xch[i].s;
uint8_t dst_bi = xch[i].d;
uint8_t dst_ci = na[dst_bi] - 1;
na[dst_bi] = dst_ci;
if (src_bi != dst_bi) {
uint8_t src_ci = na[src_bi];
uint8_t src_v = boxes.b[src_bi][src_ci];
if (src_v != dst_bi) {
uint8_t src_t = tx[src_bi][src_ci];
uint8_t dst_t = tx[dst_bi][dst_ci];
if (src_t != dst_t) {
// 2 threads broken. Fix
uint8_t v2bi[N_BOXES];
for (int bi = 0; bi < N_BOXES; ++bi)
v2bi[boxes.b[bi][threads[src_t].b[bi]]] = bi;
boxes.b[src_bi][src_ci] = dst_bi;
boxes.b[dst_bi][dst_ci] = src_v;
for (uint8_t v = dst_bi; v != src_v;) {
uint8_t bi = v2bi[v];
uint8_t c0 = threads[src_t].b[bi];
uint8_t c1 = threads[dst_t].b[bi];
threads[src_t].b[bi] = c1;
threads[dst_t].b[bi] = c0;
tx[bi][c1] = src_t;
tx[bi][c0] = dst_t;
v = boxes.b[bi][c1];
}
} else {
boxes.b[src_bi][src_ci] = dst_bi;
boxes.b[dst_bi][dst_ci] = src_v;
}
}
}
}

return threads[0];
}

In most cases I'm not a big fan of interactive debugging, but
trying to understand how this works might merit an exception.

Frankly, not all difficulties of following my code caused by the fact
that algorithm is derived from abstract proof.
Significant share of hurdles are rooted in my somewhat conflicting
desires to avoid all searches, except for inevitable search in the
innermost loop of the second pass, but at the same time to keep format
of the governing database (xch array+na array) minimalistic.

--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Fri Apr 10 18:06:44 2026

From Newsgroup: comp.lang.c

On Thu, 09 Apr 2026 17:07:25 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 1 Apr 2026 16:34:47 +0300
Michael S <[email protected]> wrote:

<snip>

My solution #2.
That is my own, with no proof preceding code. It went in other
direction
- from hacking the code to proving it later.
It is pretty fast. Can be made faster yet by replication of some
code and re-arrangement in the find() routine, but for purposes of illustration I wanted it short.

Just fyi.. I did some speed comparisons between the code below and
the compatible solver that I wrote. Your code is faster, scales
better, and also has better worst-case behavior (which is to say
that my code has worse worst-case behavior). So I think the timing
numbers I was seeing before may have been just the result of being
lucky in the draw of random numbers.

#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#include "solver.h"

static
uint8_t solver_find(
const uint8_t b_idx[N_BOXES][N_BOXES],
uint8_t result[N_BOXES],
const uint8_t free[N_BOXES],
int n_free,
uint8_t v,
const uint8_t bx[N_BOXES][BOX_SIZE])
{
typedef struct {
uint8_t parent, bi;
} db_rec;
db_rec db[N_BOXES];
bool valid[N_BOXES] = {0};
uint8_t que[N_BOXES], *wr = que, *rd = que;

valid[v] = true;
*wr++ = v;
for (;;) {
uint8_t vv = *rd++;
for (int i = 0; i < n_free; ++i) {
uint8_t bi = free[i];
uint8_t ci = b_idx[bi][vv];
if (ci != BOX_SIZE) {
result[bi] = ci;
while (vv != v) {
bi = db[vv].bi;
vv = db[vv].parent;
result[bi] = b_idx[bi][vv];
};
return i;
}
}
// add neighbors to database
for (int bi = 0; bi < N_BOXES; ++bi) {
if (b_idx[bi][vv] != BOX_SIZE) {
uint8_t a = bx[bi][result[bi]];
if (!valid[a]) {
valid[a] = true;
db[a] = (db_rec){ .parent = vv, .bi = bi };
*wr++ = a;
}
}
}
}
}

peek_t solver(boxes_t boxes)
{
// build index
uint8_t b_idx[N_BOXES][N_BOXES];
memset(b_idx, BOX_SIZE, sizeof(b_idx));
for (int bi = 0; bi < N_BOXES; ++bi) {
for (int ci = 0; ci < BOX_SIZE; ++ci) {
uint8_t v = boxes.b[bi][ci];
if (b_idx[bi][v] == BOX_SIZE)
b_idx[bi][v] = ci;
}
}

uint8_t free_boxes[N_BOXES];
for (int bi = 0; bi < N_BOXES; ++bi)
free_boxes[bi] = bi;
int n_free = N_BOXES;

peek_t result = {0};
bool used_sorts[N_BOXES]={0};
while (n_free > 0) {
int bi = free_boxes[--n_free]; // new set
result.b[bi] = 0;
uint8_t v = boxes.b[bi][0];
used_sorts[v] = true;
uint8_t pend_que[N_BOXES],
*pend_wr = pend_que, *pend_rd = pend_que;
*pend_wr++ = bi;
while (pend_wr != pend_rd && n_free != 0) {
int bk = *pend_rd++;
for (int ci = 0; ci < BOX_SIZE; ++ci) {
v = boxes.b[bk][ci];
if (!used_sorts[v]) {
// new value in set
uint8_t v_fi = solver_find(
b_idx, result.b, free_boxes,
n_free, v, boxes.b);
uint8_t v_bi = free_boxes[v_fi];
used_sorts[v] = true;
*pend_wr++ = v_bi;
free_boxes[v_fi] = free_boxes[--n_free];
// remove from free list
}
}
}
}
return result;
}

I think I could eventually understand what this code is doing, and
after that get a sense of how and why it works. In its current
form I think that would take me a fair amount of time. It would
help to have a roadmap, more evocative names, more functional
decomposition, or ideally all three. Please understand, I'm not
asking or suggesting you do anything; my intention is only to
provide feedback that may be of some benefit to you.

I don't have a roadmap, but I have a variant of [approximately] the
same algorithm implemented in the style that is closer to your
preferences: implicit bookkeeping by recursion instead of explicit
lists and back-trace pointers. I even borrowed the name "seen" from the
code of your friend.
Hope it helps:

#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#include "solver.h"

typedef struct {
uint8_t result[N_BOXES];
// result contains sorts of cookies per box
uint8_t free_boxes[N_BOXES];
int n_free;
uint8_t used_boxes[N_BOXES];
int n_used;
bool seen[N_BOXES];
uint8_t b_idx[N_BOXES][N_BOXES];
} solver_state_t;

static
bool solver_find(solver_state_t* s, uint8_t v, bool first)
{
int n_free = s->n_free;
for (int i = 0; i < n_free; ++i) {
uint8_t bi = s->free_boxes[i];
uint8_t ci = s->b_idx[bi][v];
if (ci != BOX_SIZE) {
s->result[bi] = v;
s->free_boxes[i] = s->free_boxes[--n_free];
// remove box from free list
s->n_free = n_free;
s->used_boxes[s->n_used++] = bi; // add box to used set
return true;
}
}

if (first) {
memset(s->seen, 0, sizeof(s->seen));
s->seen[v] = true;
}

int n_used = s->n_used;
for (int i = 0; i < n_used; ++i) {
int bi = s->used_boxes[i];
if (s->b_idx[bi][v] != BOX_SIZE) {
uint8_t a = s->result[bi];
if (!s->seen[a]) {
s->seen[a] = true;
if (solver_find(s, a, false)) {
s->result[bi] = v;
return true;
}
}
}
}
return false;
}

peek_t solver(boxes_t boxes)
{
solver_state_t s;
// build index
memset(s.b_idx, BOX_SIZE, sizeof(s.b_idx));
for (int bi = 0; bi < N_BOXES; ++bi) {
for (int ci = 0; ci < BOX_SIZE; ++ci) {
uint8_t v = boxes.b[bi][ci];
if (s.b_idx[bi][v] == BOX_SIZE)
s.b_idx[bi][v] = ci;
}
}

for (int bi = 0; bi < N_BOXES; ++bi)
s.free_boxes[bi] = bi;
s.n_free = N_BOXES;

bool used_sorts[N_BOXES]={0};
while (s.n_free > 0) {
int bi = s.free_boxes[--s.n_free]; // new set
uint8_t v = boxes.b[bi][0];
s.result[bi] = v;
used_sorts[v] = true;
s.used_boxes[0] = bi;
s.n_used = 1;
int rd_i = 0; // used_boxes accessed as queue
while (rd_i != s.n_used && s.n_free != 0) {
int bk = s.used_boxes[rd_i++];
for (int ci = 0; ci < BOX_SIZE; ++ci) {
v = boxes.b[bk][ci];
if (!used_sorts[v]) {
// new value in set
solver_find(&s, v, true);
used_sorts[v] = true;
}
}
}
}

// translate result from sorts to indices of selected cookies
peek_t peek;
for (int bi = 0; bi < N_BOXES; ++bi)
peek.b[bi] = s.b_idx[bi][s.result[bi]];
return peek;
}

--- Synchronet 3.21f-Linux NewsLink 1.2

From Tim Rentsch@[email protected] to comp.lang.c on Fri Apr 10 22:41:08 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

Tim Rentsch <[email protected]> wrote:

[...]

I ran into trouble when I tried to work within the posted framework
to drive a solver. The difficulties were of several kinds. The
code didn't compile in my standard compilation environment, which
is roughly this (the c99 might be c11 but that made no difference):

gcc -x c -std=c99 -pedantic ...

What were the problems?
My gcc 14.2.0 on msys2 produces no warnings or errors with above
flags.

These source lines

struct timespec t0;
clock_gettime(CLOCK_MONOTONIC, &t0);

produced these diagnostics

tb.c:134:21: error: storage size of 't0' isn't known
tb.c:135:5: warning: implicit declaration of function
'clock_gettime' [...] tb.c:135:19: error: 'CLOCK_MONOTONIC'
undeclared (first use in this function)

under -std=c99. Apparently the first diagnostic, about
struct timespec, is fixed under -std=c11. It wasn't hard
to fix these, but it was irksome.

Sounds like under Linux and with -std=c99 flag function clock_gettime()
and related structures and constants are not declared/defined by default
in time.h.

Yes, gcc follows the ISO standard exactly, and does not define
symbols like clock_gettime, because the C standard does not
allow conforming impementations to do so.

Man page suggests magic pixie dust:

#define _XOPEN_SOURCE 600
#include <time.h>
#include <unistd.h>

It is my practice not to mix ISO-conformant and non-ISO-conformant
code in the same translation unit. It wasn't hard to find the
necessary tweak for a separate source file used to get the
current time:

#define _POSIX_C_SOURCE 199309L
#include <time.h>

after which both the 'struct timespec' and 'clock_gettime()' were
quite okay. I packaged the time in a way so it could get across
the function call interface and back to the main program.

If you ask why I used clock_gettime() instead of C standard
timespec_get() then the answer is that on Windows, eps. on older
versions like Win7 and WS2008 (which I prefer over newer stuff for
reasons unrelated to quality of C RTL), precision of timespec_get() is
poor. clock_gettime(CLOCK_MONOTONIC,) is much better.

Makes sense.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Tim Rentsch@[email protected] to comp.lang.c on Sat Apr 11 03:45:21 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

On Thu, 09 Apr 2026 16:37:30 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 1 Apr 2026 16:34:47 +0300
Michael S <[email protected]> wrote:

<snip>

My implementation #1.
It is based on the proof provided by somebody with abstract math
attitude.
He said that his proof is something that the 5th grade
schoolchildren could find with in 5 minutes.
I am no 5th pupil any longer. Even understanding of his proof took
me significantly longer than 5 minutes :(
Here is my attempt of translation (original proof was not in
English):

"Solution would be done by induction by number of cookies of each
sort (the same for all sorts).
For K=2 the solution is simple. Please figure it out yourself.
Now let's take K > 2 and postulate that for numbers of cookies per
box that are smaller than K the assertion already proven.
Let's take some disposition for which the assertion is correct and
remove the corresponding "thread" (i.e. one cookie from each box,
all of different sorts). Then supposition of induction is correct
for remaining disposition, which means that we can remove yet
another thread, then yet another etc.. for a total of K threads.
That is, for K>2 there are more than 2 threads.
Now let's exchange places between any 2 two cookies from different
boxes. Since by doing so we are touching 1 or 2 threads then there
still be at least one thread untouched (at least k-2 threads,
actually [M.S.]). It means that the property "we can remove a
thread" is not disturbed by our action (i.e. by exchange of two
cookies). What is left it is to realize that by means of such
exchanges any disposition can be mutated from any other disposition.
That's all."

I am confident that my understanding of this description, if indeed
it ever occurs, would take a good deal longer than five minutes.
Sadly I think it is representative of many of the explanations
offered by mathematical types.

He was not fully sutisfied himself. 20 hours later he came with other formulation. Here is my attemp of translatioon:

1. There exist a disposition of cookies in boxes, for which it is
possible to select one cookie from each box, all of different sorts.
That is obvious. [MS: For example, each sort of cookies in its own dedicated box. That is a disposition that I use in my code] Let's call
the selection "thread".

2. There exists an operation that changes the disposition, but
preserves the property "it is possible to select a thread" (the
description of the operation would be given separately, a.t.m. it does
not matter).

3. By means of multiple operations of that type it is possible to turn
any arbitrary disposition to any other. Since after every step the
property "it is possible to select a thread" is preserved and because
there exist a disposition, for which this property is true, it means
that the property is true for any possible disposition.

4. The operations in question - exchange of any pair of cookies.
Comment: Induction is used only in (2) for proof that an operation
preserves the desired property.

For me, the second variant was less helpful. May be, for you it would
be more helpful.

I do at least understand it. It doesn't really help me construct
a solution, because I don't see how to transition from one choice
function (aka "thread") to another after exchanging a pair of
cookies. (Nor do I want an explanation of how to do so.)

[...code...]

In most cases I'm not a big fan of interactive debugging, but
trying to understand how this works might merit an exception.

Frankly, not all difficulties of following my code caused by the fact
that algorithm is derived from abstract proof.
Significant share of hurdles are rooted in my somewhat conflicting
desires to avoid all searches, except for inevitable search in the
innermost loop of the second pass, but at the same time to keep format
of the governing database (xch array+na array) minimalistic.

For what it's worth, I think your code has a better chance of
being understandable than the mathematical description of finding
a solution. Not interested enough to pursue it though.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Tim Rentsch@[email protected] to comp.lang.c on Sat Apr 11 04:27:54 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

On Thu, 09 Apr 2026 21:22:51 -0700
Tim Rentsch <[email protected]> wrote:

(just a few short replies)

[...]

At each level of recursion, do the following:

If the search depth is the number of boxes, longjmp()

Next find the cookie of cookies in the solution state at or
above this level that has the smallest number of boxes. Because
the box count is held in the uppermost bits the 64-bit values can
simply be directly compared, without any masking.

Is it proven that this step helps?
To me it sounds unnecessary.

My experience with backtracking algorithms tells me that
selecting a path where the number of possible choices is
smallest usually does a better job of finding a solution
quickly.

[...]

I think that's it. I hope it makes sense.

It makes sense, but it looks to me as solution by exhausting, augmented
by "smallest number of boxes" heuristic.
It's not obvious [to me] without further thinking that the worst case is
not very slow.

Several short responses.

I think "by exhaustion" is usually taken to mean an unguided
exploration of all alternatives, even ones that aren't plausible.
Backtracking isn't that.

Backtracking is recognized as a viable search strategy in lots
of different kinds of problems, including problems that are
known to be NP complete. It has a good pedigree.

My goals in writing the program were to write a program that is
understandable, that does find solutions, and that has reasonable
performance. Since the program did find answers and also met your
stated performance goals, I have no reason to be dissatisfied.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Tim Rentsch@[email protected] to comp.lang.c on Sat Apr 11 04:31:32 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

[...]

I don't have a roadmap, but I have a variant of [approximately] the
same algorithm implemented in the style that is closer to your
preferences: implicit bookkeeping by recursion instead of explicit
lists and back-trace pointers. I even borrowed the name "seen" from the
code of your friend.

[..code..]

I appreciate the response. It will take me some time to look
at this, and so I'm not sure when I can get to it.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Sat Apr 11 22:56:37 2026

From Newsgroup: comp.lang.c

On Thu, 09 Apr 2026 21:22:51 -0700
Tim Rentsch <[email protected]> wrote:

Here is an outline of the first (non-interface-compatible) code that
I wrote.

Use a straightforward backtracking scheme. Make a choice at the
current level, and do a recursive call to try the next level. If
the next level returns, it failed, so go on to the next choice at
this level and call again.

The output value is an array (passed recursively via pointer) with
either a cookie type for each box or a box number for each cookie
type (I don't remember which). This array is filled in as the
recursive backtrack progresses.

The whole search is started by doing a setjmp() before the call to
the top level. If the search succeeds, a longjmp() is done to
deliver an answer. That means that if the call to the top level
returns then the search was a failure. The central recursive
function has four parameters:

an Answer *'out', to hold the solution
a jmp_buf 'found', for a longjmp() when a full answer is found
a counter/index to reflect search depth (starting 0, 20 means done)
a "solution state" to direct the search (conceptually by value)

The "solution state" is at the heart of the algorithm. It's a
structure holding an array of 20 "cookie values", where a cookie
value is represented by an unsigned 64-bit int. The bits are
used as follows:

low six bits: the cookie type
next 52 bits: a "box mask", indicating at least one cookie of
this type in the corresponding box (so up to 52
boxes)
upper bits: count of boxes (ie, number of ones in the box mask)

The cookie type is important when storing a partial result in the
answer, not at any other time.

The box mask is important to know which boxes to try during the
backtracking.

The count of boxes is important to help guide the backtracking, in a
way that we hope will get an answer faster.

At each level of recursion, do the following:

If the search depth is the number of boxes, longjmp()

Next find the cookie of cookies in the solution state at or
above this level that has the smallest number of boxes. Because
the box count is held in the uppermost bits the 64-bit values can
simply be directly compared, without any masking. Move that
cookie to the array location corresponding to the current depth
(and put the cookie that was there in the hole left by doing the
move).

For each box in the box mask of that cookie, try selecting that
box for this cookie, which means:

copy the state of cookies above this level to a new solution
state, so the changed solution state described above can be
left alone

for each of those cookies, if the cookie has the "selected box"
set in its box mask, clear the bit and reduce the box count
by one. (again since the box count is held in high order bits
this count can be reduced by just subtracting a scaled one
value.)

recurse to continue searching at the next level

If we run out of boxes to check, just return; continue trying
alternatives at the next level up, if there is one.

I think that's it. I hope it makes sense.

I need clarification for the last step.
What exactly do we do after our fist guess failed?
Supposedly continue to the next sort of cookies that has the same # of
boxes as the one we just tried.
But what we do after that? Do we have to try cookies with higher number
of boxes?
If yes, does it apply in case when cookies that we tried before had
just one box?

--- Synchronet 3.21f-Linux NewsLink 1.2

From Tim Rentsch@[email protected] to comp.lang.c on Sat Apr 11 15:57:23 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

On Thu, 09 Apr 2026 21:22:51 -0700
Tim Rentsch <[email protected]> wrote:

Here is an outline of the first (non-interface-compatible) code that
I wrote.

Use a straightforward backtracking scheme. Make a choice at the
current level, and do a recursive call to try the next level. If
the next level returns, it failed, so go on to the next choice at
this level and call again.

The output value is an array (passed recursively via pointer) with
either a cookie type for each box or a box number for each cookie
type (I don't remember which). This array is filled in as the
recursive backtrack progresses.

The whole search is started by doing a setjmp() before the call to
the top level. If the search succeeds, a longjmp() is done to
deliver an answer. That means that if the call to the top level
returns then the search was a failure. The central recursive
function has four parameters:

an Answer *'out', to hold the solution
a jmp_buf 'found', for a longjmp() when a full answer is found
a counter/index to reflect search depth (starting 0, 20 means done)
a "solution state" to direct the search (conceptually by value)

The "solution state" is at the heart of the algorithm. It's a
structure holding an array of 20 "cookie values", where a cookie
value is represented by an unsigned 64-bit int. The bits are
used as follows:

low six bits: the cookie type
next 52 bits: a "box mask", indicating at least one cookie of
this type in the corresponding box (so up to 52
boxes)
upper bits: count of boxes (ie, number of ones in the box mask)

The cookie type is important when storing a partial result in the
answer, not at any other time.

The box mask is important to know which boxes to try during the
backtracking.

The count of boxes is important to help guide the backtracking, in a
way that we hope will get an answer faster.

At each level of recursion, do the following:

If the search depth is the number of boxes, longjmp()

Next find the cookie of cookies in the solution state at or
above this level that has the smallest number of boxes. Because
the box count is held in the uppermost bits the 64-bit values can
simply be directly compared, without any masking. Move that
cookie to the array location corresponding to the current depth
(and put the cookie that was there in the hole left by doing the
move).

For each box in the box mask of that cookie, try selecting that
box for this cookie, which means:

copy the state of cookies above this level to a new solution
state, so the changed solution state described above can be
left alone

for each of those cookies, if the cookie has the "selected box"
set in its box mask, clear the bit and reduce the box count
by one. (again since the box count is held in high order bits
this count can be reduced by just subtracting a scaled one
value.)

recurse to continue searching at the next level

If we run out of boxes to check, just return; continue trying
alternatives at the next level up, if there is one.

I think that's it. I hope it makes sense.

I need clarification for the last step.

Here is a mixture of C code and pseudocode. A SolveState is an
array, one element for each cookie type.

Answer
solve( SolveState *problem ){
Answer answer;
jmp_buf found;
if( seek( &answer, found, problem ) ) return answer;

printf( "No joy\n" );
exit(0);
}
/* Note that seek() returns true if and only if a longjmp() was done. */

_Bool
seek( Answer *answer, jmp_buf found, SolveState *problem ){
if( setjmp( found ) ) return true;
return look( answer, found, 0, problem );
}
/* The call to look() after the if() starts the search going. */
/* If an answer is found, a longjmp() will go back to the 'return true;' */

_Bool
look( Answer *answer, jmp_buf found, Index level, SolveState *in ){
SolveState work;
...

if( level == COOKIE_LIMIT ) longjmp( found, 1 );

... ignoring the "optimization" step of choosing the best cookie ...

for each box B in in->cookies[level] {
remember box B for the cookie in->cookies[level], in *answer
copy in->cookies[ level+1 .. COOKIE_LEVEL ) to work
remove B from each work.cookies[ level+1 .. COOKIE_LEVEL )
look( out, found, level+1, &work );
}

return false;
}
/* Note that 'true' is never returned by this function. */
/* If there are more boxes to try for the current cookie, recurse. */
/* When there are no more boxes to try, return false. */
/* If/when all of the cookies have been assigned, longjmp() */
/* Doing a longjmp() will cause seek() to return true. */

The notation ...cookies[ level+1 .. COOKIE_LEVEL ) is an implied
for() loop.

Now for your questions...

What exactly do we do after our fist guess failed?

If you mean the first box for the current cookie, go on to the
next box for that cookie.

If you mean all the boxes for the current cookie failed, simply
return.

If you mean all the boxes for the cookie on the first level, simply
return, which will cause an outer call to fail irrevocably. Under
the conditions of the problem, this never happens, if the code has
been written correctly.

Supposedly continue to the next sort of cookies that has the same # of
boxes as the one we just tried.
But what we do after that? Do we have to try cookies with higher number
of boxes?

No. At each level there is exactly one cookie to try, and we try
each of the boxes that cookie might be taken from (at this point in
the search); if none of those boxes work, backtrack -- which is to
say, return to the next outer level.

If yes, does it apply in case when cookies that we tried before had
just one box?

At each level, we are never concerned with the cookie choices that
were made at previous levels (which means a smaller level index)
as those choices are "frozen" for this part of the search.

I hope these answers clear up what you're looking for.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Sun Apr 12 02:15:43 2026

From Newsgroup: comp.lang.c

On Sat, 11 Apr 2026 15:57:23 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Thu, 09 Apr 2026 21:22:51 -0700
Tim Rentsch <[email protected]> wrote:

Here is an outline of the first (non-interface-compatible) code
that I wrote.

Use a straightforward backtracking scheme. Make a choice at the
current level, and do a recursive call to try the next level. If
the next level returns, it failed, so go on to the next choice at
this level and call again.

The output value is an array (passed recursively via pointer) with
either a cookie type for each box or a box number for each cookie
type (I don't remember which). This array is filled in as the
recursive backtrack progresses.

The whole search is started by doing a setjmp() before the call to
the top level. If the search succeeds, a longjmp() is done to
deliver an answer. That means that if the call to the top level
returns then the search was a failure. The central recursive
function has four parameters:

an Answer *'out', to hold the solution
a jmp_buf 'found', for a longjmp() when a full answer is found
a counter/index to reflect search depth (starting 0, 20 means
done) a "solution state" to direct the search (conceptually by
value)

The "solution state" is at the heart of the algorithm. It's a
structure holding an array of 20 "cookie values", where a cookie
value is represented by an unsigned 64-bit int. The bits are
used as follows:

low six bits: the cookie type
next 52 bits: a "box mask", indicating at least one cookie of
this type in the corresponding box (so up to 52
boxes)
upper bits: count of boxes (ie, number of ones in the box mask)

The cookie type is important when storing a partial result in the
answer, not at any other time.

The box mask is important to know which boxes to try during the
backtracking.

The count of boxes is important to help guide the backtracking, in
a way that we hope will get an answer faster.

At each level of recursion, do the following:

If the search depth is the number of boxes, longjmp()

Next find the cookie of cookies in the solution state at or
above this level that has the smallest number of boxes. Because
the box count is held in the uppermost bits the 64-bit values
can simply be directly compared, without any masking. Move that
cookie to the array location corresponding to the current depth
(and put the cookie that was there in the hole left by doing the
move).

For each box in the box mask of that cookie, try selecting that
box for this cookie, which means:

copy the state of cookies above this level to a new solution
state, so the changed solution state described above can be
left alone

for each of those cookies, if the cookie has the "selected
box" set in its box mask, clear the bit and reduce the box count
by one. (again since the box count is held in high order
bits this count can be reduced by just subtracting a scaled one
value.)

recurse to continue searching at the next level

If we run out of boxes to check, just return; continue trying
alternatives at the next level up, if there is one.

I think that's it. I hope it makes sense.

I need clarification for the last step.

Here is a mixture of C code and pseudocode. A SolveState is an
array, one element for each cookie type.

Answer
solve( SolveState *problem ){
Answer answer;
jmp_buf found;
if( seek( &answer, found, problem ) ) return answer;

printf( "No joy\n" );
exit(0);
}
/* Note that seek() returns true if and only if a longjmp() was done.
*/

_Bool
seek( Answer *answer, jmp_buf found, SolveState *problem ){
if( setjmp( found ) ) return true;
return look( answer, found, 0, problem );
}
/* The call to look() after the if() starts the search going. */
/* If an answer is found, a longjmp() will go back to the 'return
true;' */

_Bool
look( Answer *answer, jmp_buf found, Index level, SolveState *in ){
SolveState work;
...

if( level == COOKIE_LIMIT ) longjmp( found, 1 );

... ignoring the "optimization" step of choosing the best cookie
...

for each box B in in->cookies[level] {
remember box B for the cookie in->cookies[level], in *answer
copy in->cookies[ level+1 .. COOKIE_LEVEL ) to work
remove B from each work.cookies[ level+1 .. COOKIE_LEVEL )
look( out, found, level+1, &work );
}

return false;
}
/* Note that 'true' is never returned by this function. */
/* If there are more boxes to try for the current cookie, recurse. */
/* When there are no more boxes to try, return false. */
/* If/when all of the cookies have been assigned, longjmp() */
/* Doing a longjmp() will cause seek() to return true. */

The notation ...cookies[ level+1 .. COOKIE_LEVEL ) is an implied
for() loop.

Now for your questions...

What exactly do we do after our fist guess failed?

If you mean the first box for the current cookie, go on to the
next box for that cookie.

If you mean all the boxes for the current cookie failed, simply
return.

If you mean all the boxes for the cookie on the first level, simply
return, which will cause an outer call to fail irrevocably. Under
the conditions of the problem, this never happens, if the code has
been written correctly.

Supposedly continue to the next sort of cookies that has the same #
of boxes as the one we just tried.
But what we do after that? Do we have to try cookies with higher
number of boxes?

No. At each level there is exactly one cookie to try, and we try
each of the boxes that cookie might be taken from (at this point in
the search); if none of those boxes work, backtrack -- which is to
say, return to the next outer level.

If yes, does it apply in case when cookies that we tried before had
just one box?

At each level, we are never concerned with the cookie choices that
were made at previous levels (which means a smaller level index)
as those choices are "frozen" for this part of the search.

I hope these answers clear up what you're looking for.

It iterates differently from what I thought it does.
I should digest a new information. Not tonight.

--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Mon Apr 13 02:03:28 2026

From Newsgroup: comp.lang.c

On Sat, 11 Apr 2026 15:57:23 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Thu, 09 Apr 2026 21:22:51 -0700
Tim Rentsch <[email protected]> wrote:

Here is an outline of the first (non-interface-compatible) code
that I wrote.

Use a straightforward backtracking scheme. Make a choice at the
current level, and do a recursive call to try the next level. If
the next level returns, it failed, so go on to the next choice at
this level and call again.

The output value is an array (passed recursively via pointer) with
either a cookie type for each box or a box number for each cookie
type (I don't remember which). This array is filled in as the
recursive backtrack progresses.

The whole search is started by doing a setjmp() before the call to
the top level. If the search succeeds, a longjmp() is done to
deliver an answer. That means that if the call to the top level
returns then the search was a failure. The central recursive
function has four parameters:

an Answer *'out', to hold the solution
a jmp_buf 'found', for a longjmp() when a full answer is found
a counter/index to reflect search depth (starting 0, 20 means
done) a "solution state" to direct the search (conceptually by
value)

The "solution state" is at the heart of the algorithm. It's a
structure holding an array of 20 "cookie values", where a cookie
value is represented by an unsigned 64-bit int. The bits are
used as follows:

low six bits: the cookie type
next 52 bits: a "box mask", indicating at least one cookie of
this type in the corresponding box (so up to 52
boxes)
upper bits: count of boxes (ie, number of ones in the box mask)

The cookie type is important when storing a partial result in the
answer, not at any other time.

The box mask is important to know which boxes to try during the
backtracking.

The count of boxes is important to help guide the backtracking, in
a way that we hope will get an answer faster.

At each level of recursion, do the following:

If the search depth is the number of boxes, longjmp()

Next find the cookie of cookies in the solution state at or
above this level that has the smallest number of boxes. Because
the box count is held in the uppermost bits the 64-bit values
can simply be directly compared, without any masking. Move that
cookie to the array location corresponding to the current depth
(and put the cookie that was there in the hole left by doing the
move).

For each box in the box mask of that cookie, try selecting that
box for this cookie, which means:

copy the state of cookies above this level to a new solution
state, so the changed solution state described above can be
left alone

for each of those cookies, if the cookie has the "selected
box" set in its box mask, clear the bit and reduce the box count
by one. (again since the box count is held in high order
bits this count can be reduced by just subtracting a scaled one
value.)

recurse to continue searching at the next level

If we run out of boxes to check, just return; continue trying
alternatives at the next level up, if there is one.

I think that's it. I hope it makes sense.

I need clarification for the last step.

Here is a mixture of C code and pseudocode. A SolveState is an
array, one element for each cookie type.

Answer
solve( SolveState *problem ){
Answer answer;
jmp_buf found;
if( seek( &answer, found, problem ) ) return answer;

printf( "No joy\n" );
exit(0);
}
/* Note that seek() returns true if and only if a longjmp() was done.
*/

_Bool
seek( Answer *answer, jmp_buf found, SolveState *problem ){
if( setjmp( found ) ) return true;
return look( answer, found, 0, problem );
}
/* The call to look() after the if() starts the search going. */
/* If an answer is found, a longjmp() will go back to the 'return
true;' */

_Bool
look( Answer *answer, jmp_buf found, Index level, SolveState *in ){
SolveState work;
...

if( level == COOKIE_LIMIT ) longjmp( found, 1 );

... ignoring the "optimization" step of choosing the best cookie
...

for each box B in in->cookies[level] {
remember box B for the cookie in->cookies[level], in *answer
copy in->cookies[ level+1 .. COOKIE_LEVEL ) to work
remove B from each work.cookies[ level+1 .. COOKIE_LEVEL )
look( out, found, level+1, &work );
}

return false;
}
/* Note that 'true' is never returned by this function. */
/* If there are more boxes to try for the current cookie, recurse. */
/* When there are no more boxes to try, return false. */
/* If/when all of the cookies have been assigned, longjmp() */
/* Doing a longjmp() will cause seek() to return true. */

The notation ...cookies[ level+1 .. COOKIE_LEVEL ) is an implied
for() loop.

Now for your questions...

What exactly do we do after our fist guess failed?

If you mean the first box for the current cookie, go on to the
next box for that cookie.

If you mean all the boxes for the current cookie failed, simply
return.

If you mean all the boxes for the cookie on the first level, simply
return, which will cause an outer call to fail irrevocably. Under
the conditions of the problem, this never happens, if the code has
been written correctly.

Supposedly continue to the next sort of cookies that has the same #
of boxes as the one we just tried.
But what we do after that? Do we have to try cookies with higher
number of boxes?

No. At each level there is exactly one cookie to try, and we try
each of the boxes that cookie might be taken from (at this point in
the search); if none of those boxes work, backtrack -- which is to
say, return to the next outer level.

If yes, does it apply in case when cookies that we tried before had
just one box?

At each level, we are never concerned with the cookie choices that
were made at previous levels (which means a smaller level index)
as those choices are "frozen" for this part of the search.

I hope these answers clear up what you're looking for.

#include <stdint.h>
#include <string.h>
#include <stdbool.h>
#include <limits.h>

#include "solver.h"

peek_t solver(boxes_t boxes)
{
uint64_t boxes_of_sorts[N_BOXES] = {0};
uint64_t sorts_of_boxes[N_BOXES];
uint8_t b_idx[N_BOXES][N_BOXES];
// build indices
for (int bi = 0; bi < N_BOXES; ++bi) {
uint64_t sorts = 0;
for (int ci = 0; ci < BOX_SIZE; ++ci) {
uint8_t sort = boxes.b[bi][ci];
sorts |= (uint64_t)1 << sort;
b_idx[bi][sort] = ci;
boxes_of_sorts[sort] |= (uint64_t)1 << bi;
}
sorts_of_boxes[bi] = sorts;
}

peek_t result={0};
uint64_t free_boxes = (uint64_t)-1 >> (64-N_BOXES);
uint64_t free_sorts = (uint64_t)-1 >> (64-N_BOXES);
struct stack_node {
uint64_t free_boxes, free_sorts, boxes;
int sel_sort;
} stack[N_BOXES];
int level = 0;
do {
// find free sort that remains in the smallest # of free boxes
uint64_t msk = free_sorts;
int nb_min = INT_MAX;
int sel_sort = N_BOXES;
while (msk) {
int sort = __builtin_ctzll(msk);
uint64_t boxes = boxes_of_sorts[sort] & free_boxes;
if (boxes) {
int nb = __builtin_popcountll(boxes);
if (nb_min > nb) {
nb_min = nb;
sel_sort = sort;
if (nb == 1)
break;
}
}
msk ^= (uint64_t)1 << sort;
}

if (sel_sort == N_BOXES)
goto pop;

// candidate sort found
uint64_t boxes = boxes_of_sorts[sel_sort] & free_boxes;
back:;
int bi = __builtin_ctzll(boxes);
result.b[bi] = b_idx[bi][sel_sort];
boxes ^= (uint64_t)1 << bi;
// preserve state
stack[level] = (struct stack_node){
.free_boxes=free_boxes, .free_sorts=free_sorts,
.boxes=boxes, .sel_sort = sel_sort};
// go to the next level
free_boxes ^= (uint64_t)1 << bi;
free_sorts ^= (uint64_t)1 << sel_sort;
++level;
continue;

// return to previous level
pop:
while (level > 0) {
--level;
boxes = stack[level].boxes;
if (boxes) {
free_boxes = stack[level].free_boxes;
free_sorts = stack[level].free_sorts;
sel_sort = stack[level].sel_sort;
goto back;
}
}
// should never come here
break;
} while (level < N_BOXES);
return result;
}

--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Mon Apr 13 17:11:29 2026

From Newsgroup: comp.lang.c

On Sat, 11 Apr 2026 15:57:23 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Thu, 09 Apr 2026 21:22:51 -0700
Tim Rentsch <[email protected]> wrote:

Here is an outline of the first (non-interface-compatible) code
that I wrote.

Use a straightforward backtracking scheme. Make a choice at the
current level, and do a recursive call to try the next level. If
the next level returns, it failed, so go on to the next choice at
this level and call again.

The output value is an array (passed recursively via pointer) with
either a cookie type for each box or a box number for each cookie
type (I don't remember which). This array is filled in as the
recursive backtrack progresses.

The whole search is started by doing a setjmp() before the call to
the top level. If the search succeeds, a longjmp() is done to
deliver an answer. That means that if the call to the top level
returns then the search was a failure. The central recursive
function has four parameters:

an Answer *'out', to hold the solution
a jmp_buf 'found', for a longjmp() when a full answer is found
a counter/index to reflect search depth (starting 0, 20 means
done) a "solution state" to direct the search (conceptually by
value)

The "solution state" is at the heart of the algorithm. It's a
structure holding an array of 20 "cookie values", where a cookie
value is represented by an unsigned 64-bit int. The bits are
used as follows:

low six bits: the cookie type
next 52 bits: a "box mask", indicating at least one cookie of
this type in the corresponding box (so up to 52
boxes)
upper bits: count of boxes (ie, number of ones in the box mask)

The cookie type is important when storing a partial result in the
answer, not at any other time.

The box mask is important to know which boxes to try during the
backtracking.

The count of boxes is important to help guide the backtracking, in
a way that we hope will get an answer faster.

At each level of recursion, do the following:

If the search depth is the number of boxes, longjmp()

Next find the cookie of cookies in the solution state at or
above this level that has the smallest number of boxes. Because
the box count is held in the uppermost bits the 64-bit values
can simply be directly compared, without any masking. Move that
cookie to the array location corresponding to the current depth
(and put the cookie that was there in the hole left by doing the
move).

For each box in the box mask of that cookie, try selecting that
box for this cookie, which means:

copy the state of cookies above this level to a new solution
state, so the changed solution state described above can be
left alone

for each of those cookies, if the cookie has the "selected
box" set in its box mask, clear the bit and reduce the box count
by one. (again since the box count is held in high order
bits this count can be reduced by just subtracting a scaled one
value.)

recurse to continue searching at the next level

If we run out of boxes to check, just return; continue trying
alternatives at the next level up, if there is one.

I think that's it. I hope it makes sense.

I need clarification for the last step.

Here is a mixture of C code and pseudocode. A SolveState is an
array, one element for each cookie type.

Answer
solve( SolveState *problem ){
Answer answer;
jmp_buf found;
if( seek( &answer, found, problem ) ) return answer;

printf( "No joy\n" );
exit(0);
}
/* Note that seek() returns true if and only if a longjmp() was done.
*/

_Bool
seek( Answer *answer, jmp_buf found, SolveState *problem ){
if( setjmp( found ) ) return true;
return look( answer, found, 0, problem );
}
/* The call to look() after the if() starts the search going. */
/* If an answer is found, a longjmp() will go back to the 'return
true;' */

_Bool
look( Answer *answer, jmp_buf found, Index level, SolveState *in ){
SolveState work;
...

if( level == COOKIE_LIMIT ) longjmp( found, 1 );

... ignoring the "optimization" step of choosing the best cookie
...

for each box B in in->cookies[level] {
remember box B for the cookie in->cookies[level], in *answer
copy in->cookies[ level+1 .. COOKIE_LEVEL ) to work
remove B from each work.cookies[ level+1 .. COOKIE_LEVEL )
look( out, found, level+1, &work );
}

return false;
}
/* Note that 'true' is never returned by this function. */
/* If there are more boxes to try for the current cookie, recurse. */
/* When there are no more boxes to try, return false. */
/* If/when all of the cookies have been assigned, longjmp() */
/* Doing a longjmp() will cause seek() to return true. */

The notation ...cookies[ level+1 .. COOKIE_LEVEL ) is an implied
for() loop.

Now for your questions...

What exactly do we do after our fist guess failed?

If you mean the first box for the current cookie, go on to the
next box for that cookie.

If you mean all the boxes for the current cookie failed, simply
return.

If you mean all the boxes for the cookie on the first level, simply
return, which will cause an outer call to fail irrevocably. Under
the conditions of the problem, this never happens, if the code has
been written correctly.

Supposedly continue to the next sort of cookies that has the same #
of boxes as the one we just tried.
But what we do after that? Do we have to try cookies with higher
number of boxes?

No. At each level there is exactly one cookie to try, and we try
each of the boxes that cookie might be taken from (at this point in
the search); if none of those boxes work, backtrack -- which is to
say, return to the next outer level.

If yes, does it apply in case when cookies that we tried before had
just one box?

At each level, we are never concerned with the cookie choices that
were made at previous levels (which means a smaller level index)
as those choices are "frozen" for this part of the search.

I hope these answers clear up what you're looking for.

Just now I paid attention that when posting last night I made a
mistake in cut&past. It result in the post that contained the source
code but missed text that explains what it's all about.
So, second try.

Could the code below be considered an implementation of your algorithm?
It uses different programming technique:
- iteration, goto and explicit stack instead of recursion
- loop instead of longjmp
- popcount instead of maintaining 'count of boxes' field
- free_sorts bit mask instead of list of sorts of cookies indexed by
levels (later on I realized that this modification was unnecessary)
I coded it in such manner because as the next step I am planning to add instrumentation that would be easier in iterative code than in the
recursive one.
It seems to me that resulting search order is either exactly or at
least approximately the same as in your description.

#include <stdint.h>
#include <string.h>
#include <stdbool.h>
#include <limits.h>

#include "solver.h"

peek_t solver(boxes_t boxes)
{
uint64_t boxes_of_sorts[N_BOXES] = {0};
uint64_t sorts_of_boxes[N_BOXES];
uint8_t b_idx[N_BOXES][N_BOXES];
// build indices
for (int bi = 0; bi < N_BOXES; ++bi) {
uint64_t sorts = 0;
for (int ci = 0; ci < BOX_SIZE; ++ci) {
uint8_t sort = boxes.b[bi][ci];
sorts |= (uint64_t)1 << sort;
b_idx[bi][sort] = ci;
boxes_of_sorts[sort] |= (uint64_t)1 << bi;
}
sorts_of_boxes[bi] = sorts;
}

peek_t result={0};
uint64_t free_boxes = (uint64_t)-1 >> (64-N_BOXES);
uint64_t free_sorts = (uint64_t)-1 >> (64-N_BOXES);
struct stack_node {
uint64_t free_boxes, free_sorts, boxes;
int sel_sort;
} stack[N_BOXES];
int level = 0;
do {
// find free sort that remains in the smallest # of free boxes
uint64_t msk = free_sorts;
int nb_min = INT_MAX;
int sel_sort = N_BOXES;
while (msk) {
int sort = __builtin_ctzll(msk);
uint64_t boxes = boxes_of_sorts[sort] & free_boxes;
if (boxes) {
int nb = __builtin_popcountll(boxes);
if (nb_min > nb) {
nb_min = nb;
sel_sort = sort;
if (nb == 1)
break;
}
}
msk ^= (uint64_t)1 << sort;
}

if (sel_sort == N_BOXES)
goto pop;

// candidate sort found
uint64_t boxes = boxes_of_sorts[sel_sort] & free_boxes;
back:;
int bi = __builtin_ctzll(boxes);
result.b[bi] = b_idx[bi][sel_sort];
boxes ^= (uint64_t)1 << bi;
// preserve state
stack[level] = (struct stack_node){
.free_boxes=free_boxes, .free_sorts=free_sorts,
.boxes=boxes, .sel_sort = sel_sort};
// go to the next level
free_boxes ^= (uint64_t)1 << bi;
free_sorts ^= (uint64_t)1 << sel_sort;
++level;
continue;

// return to previous level
pop:
while (level > 0) {
--level;
boxes = stack[level].boxes;
if (boxes) {
free_boxes = stack[level].free_boxes;
free_sorts = stack[level].free_sorts;
sel_sort = stack[level].sel_sort;
goto back;
}
}
// should never come here
break;
} while (level < N_BOXES);
return result;
}

--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Wed Apr 15 02:15:26 2026

From Newsgroup: comp.lang.c

On Wed, 08 Apr 2026 08:42:11 -0700
Tim Rentsch <[email protected]> wrote:

After that, I talked to a friend to try a different approach to
producing a solution. After some light editing by myself -- mostly
just formatting and some name changes -- the code below was put into
the hopper. (Note: I claim no credit for code below. I did some
editing to make it more readable but beyond that none of it is a
result of my efforts.)

#include <stdint.h>
#include "solver.h"

static uint32_t adjacent[ N_BOXES ];
static int cookie_of_box[ N_BOXES ];
static int box_of_cookie[ N_BOXES ];
static uint32_t seen;

static int
search( int b ){
uint32_t m = adjacent[b];
while( m ){
uint32_t bit = m & -m;
m ^= bit;
int c = __builtin_ctz( bit );
if( seen & 1u<<c ) continue;
seen |= 1u<<c;
if( box_of_cookie[c] == -1 || search( box_of_cookie[c]
) ){ box_of_cookie[c] = b;
cookie_of_box[b] = c;
return 1;
}
}
return 0;
}

peek_t
solver( boxes_t boxes ){
peek_t res;

for( int i = 0; i < N_BOXES; i++ ){
uint32_t m = 0;
for( int k = 0; k < BOX_SIZE; k++ ){
m |= 1u << boxes.b[i][k];
}
adjacent[i] = m;
cookie_of_box[i] = -1;
box_of_cookie[i] = -1;
}

for( int i = 0; i < N_BOXES; i++ ){
seen = 0;
search( i );
}

for( int i = 0; i < N_BOXES; i++ ){
int c = cookie_of_box[i];
for( int k = 0; k < BOX_SIZE; k++ ){
if( boxes.b[i][k] == c ){
res.b[i] = k;
break;
}
}
}
return res;
}

Despite being derived independently, this code uses the same sort of
approach that I had used, except my code was recursive rather than
iterative.

I finally looked closer to the code of your friend.
To me, it does not look at all similar to description of "your"
algorithm in your other post.
It appears more similar to my 2nd algorithm, but sort of orthogonal to
it - my outermost loop iterate through sorts of cookies; in the code of
your friend outermost loop iterate through boxes.

Another difference, less important for robustness, more important for
measured speed, esp. on avearge, is when search decides to revert
previous selection. I am doing it only after all possibilities to
select at the top level failed. Your friend makes no such effort and
easily dives deep. In principle, both approaches are equivalent, but in physical world recursive call is significantly slower than iteration of
simple loop and excessive updates of state variables are also slower
then lookups without updates.

And, of course, apart from difference in algorithm there is a
significant difference in implementation technique. Your friend took
advantage of the fact that the challenge was specified for relatively
small number of boxes and intensely used bit masks. I [didn't took
similar advantage [except for using byte variables instead of short or
int] and used arrays. Of course, the technique applied by your friend
is faster. His solution ended up ~2 times slower then mine not because
of technique, but because of algorithmic difference mentioned in
previous paragraph.

Today I modified his code, applying variant of the same algorithmic
change. At the same opportunity I removed use of static data and
increased size of bit masks to 64 bits. As expected, resulting code was
very fast, ~3 times faster than my own.

Here is a code:
// Modification of code of friend of Tim Rentsch
#include <stdint.h>
#include "solver.h"

typedef struct {
uint64_t cookies_in_box[ N_BOXES ]; // helper index, constant after
init uint64_t selected;
uint8_t box_of_cookie [ N_BOXES ]; // valid for selected sorts
} state_t;

// return 0 on success, seen mask on failure
static uint64_t search(state_t* s, int b, uint64_t seen)
{
uint64_t m = s->cookies_in_box[b] & ~seen;
if (m & ~s->selected) {
int c = __builtin_ctzll(m & ~s->selected);
s->box_of_cookie[c] = b;
s->selected |= 1ull << c;
return 0; // success
}
while( m ){
int c = __builtin_ctz(m);
seen = search(s, s->box_of_cookie[c], seen | (1ull << c));
if (seen == 0) {
s->box_of_cookie[c] = b;
return 0; // success
}
m &= ~seen;
}
return seen;
}

peek_t
solver( boxes_t boxes ){
state_t s;
for (int i = 0; i < N_BOXES; i++) {
uint64_t m = 0;
for (int k = 0; k < BOX_SIZE; k++)
m |= 1ull << boxes.b[i][k];
s.cookies_in_box[i] = m;
}

s.selected = 0;
for( int i = 0; i < N_BOXES; i++ )
search(&s, i, 0);

// translate result from box_of_cookie[] to index of cookie in box
peek_t res;
for (int c = 0; c < N_BOXES; ++c) {
int b = s.box_of_cookie[c];
for( int k = 0; k < BOX_SIZE; k++ ){
if (boxes.b[b][k] == c) {
res.b[b] = k;
break;
}
}
}
return res;
}

--- Synchronet 3.21f-Linux NewsLink 1.2

From Tim Rentsch@[email protected] to comp.lang.c on Thu Apr 16 02:57:32 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

On Wed, 08 Apr 2026 08:42:11 -0700
Tim Rentsch <[email protected]> wrote:

After that, I talked to a friend to try a different approach to
producing a solution. After some light editing by myself -- mostly
just formatting and some name changes -- the code below was put into
the hopper. (Note: I claim no credit for code below. I did some
editing to make it more readable but beyond that none of it is a
result of my efforts.)

#include <stdint.h>
#include "solver.h"

static uint32_t adjacent[ N_BOXES ];
static int cookie_of_box[ N_BOXES ];
static int box_of_cookie[ N_BOXES ];
static uint32_t seen;

static int
search( int b ){
uint32_t m = adjacent[b];
while( m ){
uint32_t bit = m & -m;
m ^= bit;
int c = __builtin_ctz( bit );
if( seen & 1u<<c ) continue;
seen |= 1u<<c;
if( box_of_cookie[c] == -1 || search( box_of_cookie[c]
) ){ box_of_cookie[c] = b;
cookie_of_box[b] = c;
return 1;
}
}
return 0;
}

peek_t
solver( boxes_t boxes ){
peek_t res;

for( int i = 0; i < N_BOXES; i++ ){
uint32_t m = 0;
for( int k = 0; k < BOX_SIZE; k++ ){
m |= 1u << boxes.b[i][k];
}
adjacent[i] = m;
cookie_of_box[i] = -1;
box_of_cookie[i] = -1;
}

for( int i = 0; i < N_BOXES; i++ ){
seen = 0;
search( i );
}

for( int i = 0; i < N_BOXES; i++ ){
int c = cookie_of_box[i];
for( int k = 0; k < BOX_SIZE; k++ ){
if( boxes.b[i][k] == c ){
res.b[i] = k;
break;
}
}
}
return res;
}

Despite being derived independently, this code uses the same sort of
approach that I had used, except my code was recursive rather than
iterative.

I finally looked closer to the code of your friend.
To me, it does not look at all similar to description of "your"
algorithm in your other post.
It appears more similar to my 2nd algorithm, but sort of orthogonal to
it - my outermost loop iterate through sorts of cookies; in the code of
your friend outermost loop iterate through boxes.

I confess I didn't look closely at the algorithm but relied
on comments about how the code worked. I tried to express
this in my earlier posting when I said "the same sort of
approach" rather than "the same approach". Sorry for any
confusion.

Another difference, less important for robustness, more important for measured speed, esp. on avearge, is when search decides to revert
previous selection. I am doing it only after all possibilities to
select at the top level failed. Your friend makes no such effort and
easily dives deep. In principle, both approaches are equivalent, but
in physical world recursive call is significantly slower than
iteration of simple loop and excessive updates of state variables are
also slower then lookups without updates.

And, of course, apart from difference in algorithm there is a
significant difference in implementation technique. Your friend took advantage of the fact that the challenge was specified for relatively
small number of boxes and intensely used bit masks. I [didn't took
similar advantage [except for using byte variables instead of short
or int] and used arrays. Of course, the technique applied by your
friend is faster. His solution ended up ~2 times slower then mine
not because of technique, but because of algorithmic difference
mentioned in previous paragraph.

Today I modified his code, applying variant of the same algorithmic
change. At the same opportunity I removed use of static data and
increased size of bit masks to 64 bits. As expected, resulting code
was very fast, ~3 times faster than my own. [...code...]

I'm happy to hear the posting provided an opportunity to more deeply
explore the problem. Also it's interesting to learn the results of
your investigations; thank you for those.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Tim Rentsch@[email protected] to comp.lang.c on Thu Apr 16 03:22:00 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

[...]

After seeing your more recent posting I decided to focus on
that and let this one slide by.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Tim Rentsch@[email protected] to comp.lang.c on Thu Apr 16 23:52:14 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

On Sat, 11 Apr 2026 15:57:23 -0700
Tim Rentsch <[email protected]> wrote:

[..some earlier back and forth..]

Just now I paid attention that when posting last night I made a
mistake in cut&past. It result in the post that contained the
source code but missed text that explains what it's all about.
So, second try.

Could the code below be considered an implementation of your
algorithm?

I'm not sure. I think it's close, but I'm having trouble being
more definite than that.

It uses different programming technique:
- iteration, goto and explicit stack instead of recursion

I got that.

- loop instead of longjmp

I got that too.

- popcount instead of maintaining 'count of boxes' field

I see that. Interesting choice.

- free_sorts bit mask instead of list of sorts of cookies
indexed by levels (later on I realized that this modification
was unnecessary)

I see that you did this but I was confused about how it worked.

I coded it in such manner because as the next step I am planning
to add instrumentation that would be easier in iterative code than
in the recursive one.

Yes, and surely there are other reasons to want an iterative version
rather than a recursive one.

It seems to me that resulting search order is either exactly or at
least approximately the same as in your description.

Exactly or at least approximately -- I'll go for that. :)

#include <stdint.h>
#include <string.h>
#include <stdbool.h>
#include <limits.h>

#include "solver.h"

peek_t solver(boxes_t boxes)
{
uint64_t boxes_of_sorts[N_BOXES] = {0};
uint64_t sorts_of_boxes[N_BOXES];
uint8_t b_idx[N_BOXES][N_BOXES];
// build indices
for (int bi = 0; bi < N_BOXES; ++bi) {
uint64_t sorts = 0;
for (int ci = 0; ci < BOX_SIZE; ++ci) {
uint8_t sort = boxes.b[bi][ci];
sorts |= (uint64_t)1 << sort;
b_idx[bi][sort] = ci;
boxes_of_sorts[sort] |= (uint64_t)1 << bi;
}
sorts_of_boxes[bi] = sorts;
}

peek_t result={0};
uint64_t free_boxes = (uint64_t)-1 >> (64-N_BOXES);
uint64_t free_sorts = (uint64_t)-1 >> (64-N_BOXES);
struct stack_node {
uint64_t free_boxes, free_sorts, boxes;
int sel_sort;
} stack[N_BOXES];
int level = 0;
do {
// find free sort that remains in the smallest # of free boxes
uint64_t msk = free_sorts;
int nb_min = INT_MAX;
int sel_sort = N_BOXES;
while (msk) {
int sort = __builtin_ctzll(msk);
uint64_t boxes = boxes_of_sorts[sort] & free_boxes;
if (boxes) {
int nb = __builtin_popcountll(boxes);
if (nb_min > nb) {
nb_min = nb;
sel_sort = sort;
if (nb == 1)
break;
}
}
msk ^= (uint64_t)1 << sort;
}

if (sel_sort == N_BOXES)
goto pop;

// candidate sort found
uint64_t boxes = boxes_of_sorts[sel_sort] & free_boxes;
back:;
int bi = __builtin_ctzll(boxes);
result.b[bi] = b_idx[bi][sel_sort];
boxes ^= (uint64_t)1 << bi;
// preserve state
stack[level] = (struct stack_node){
.free_boxes=free_boxes, .free_sorts=free_sorts,
.boxes=boxes, .sel_sort = sel_sort};
// go to the next level
free_boxes ^= (uint64_t)1 << bi;
free_sorts ^= (uint64_t)1 << sel_sort;
++level;
continue;

// return to previous level
pop:
while (level > 0) {
--level;
boxes = stack[level].boxes;
if (boxes) {
free_boxes = stack[level].free_boxes;
free_sorts = stack[level].free_sorts;
sel_sort = stack[level].sel_sort;
goto back;
}
}
// should never come here
break;
} while (level < N_BOXES);
return result;
}

I'm leaving the code in even though I don't have a lot to say
about it. Trying to understand it I kept getting an internal
"complexity threshold exceeded" exception. Finally I decided I
would just try to rewrite it, preserving the spirit although not
all of the details. My code ended up being more lines of source
than yours although the core routine is shorter. There are three
parts that I broke out into separate functions (not shown). Those
parts are, one, a function 'bsets_from_boxes()' to turn a boxes_t
into a collection of bitsets holding the boxes corresponding to
each cookie type; two, a function 'best_next_ctype()' that
chooses which cookie type to consider next; and three, a final
function 'answer()' to turn the 'box_chosen[]' bitset array into
the representation needed by the interface. Here is the central
routine:

peek_t
solver( boxes_t box_contents ){
Bset boxes_free[ COOKIE_LIMIT + 1 ];
Ctype ctypes[ COOKIE_LIMIT ];
Bset boxes_to_try[ COOKIE_LIMIT ];
Bset box_chosen[ COOKIE_LIMIT ];

Bsets bsets = bsets_from_boxes( &box_contents );
Index depth = 0;
Bset free = boxes_free[0] = ~((Bset){-1} << COOKIE_LIMIT-1 << 1);

for( Index i = 0; i < LIMIT_OF( ctypes ); i++ ) ctypes[i] = i;

do {
Ctype ctype = best_next_ctype( depth, free, &bsets, ctypes );
Bset chosen;

boxes_to_try[depth] = bsets.bset_of_ctype[ ctype ] & free;

while(
chosen = boxes_to_try[depth],
boxes_to_try[depth] ^= chosen ^= chosen & chosen-1,
!chosen
){
assert( depth > 0 ), --depth;
free = boxes_free[ depth ];
ctype = ctypes[ depth ];
}

box_chosen[ ctype ] = chosen;
assert( free & chosen );
boxes_free[ depth+1 ] = free = free ^ chosen;

} while( ++depth < COOKIE_LIMIT );

return answer( box_chosen, ctypes, &box_contents );
}

Two further comments.

One, by keeping the box_chosen[] array as bit masks, the use of
the __builtin bit function gets deferred and so is done only once
for each output slot, at the end (in the 'answer()' function).

Two, the function 'best_next_ctype()' manipulates the ctypes[]
array as well as returning the cookie type at the appropriate
depth in the array. I tried different policies for when to
select the "best" cookie type, including "always", "never", and
"only for the first N levels", with several choices of N.
Different policies produced different timing values but there
wasn't any obvious relation between them. Something that
occurred to me only later is a policy "choose the best out of the
next K cookie types", for different values of K.

The main thing is I hope the above routine does a better job of
conveying my thoughts and understandings.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Fri Apr 17 14:44:10 2026

From Newsgroup: comp.lang.c

On Thu, 16 Apr 2026 23:52:14 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Sat, 11 Apr 2026 15:57:23 -0700
Tim Rentsch <[email protected]> wrote:

[..some earlier back and forth..]

Just now I paid attention that when posting last night I made a
mistake in cut&past. It result in the post that contained the
source code but missed text that explains what it's all about.
So, second try.

Could the code below be considered an implementation of your
algorithm?

I'm not sure. I think it's close, but I'm having trouble being
more definite than that.

It uses different programming technique:
- iteration, goto and explicit stack instead of recursion

I got that.

- loop instead of longjmp

I got that too.

- popcount instead of maintaining 'count of boxes' field

I see that. Interesting choice.

- free_sorts bit mask instead of list of sorts of cookies
indexed by levels (later on I realized that this modification
was unnecessary)

I see that you did this but I was confused about how it worked.

I coded it in such manner because as the next step I am planning
to add instrumentation that would be easier in iterative code than
in the recursive one.

Yes, and surely there are other reasons to want an iterative version
rather than a recursive one.

It seems to me that resulting search order is either exactly or at
least approximately the same as in your description.

Exactly or at least approximately -- I'll go for that. :)

#include <stdint.h>
#include <string.h>
#include <stdbool.h>
#include <limits.h>

#include "solver.h"

peek_t solver(boxes_t boxes)
{
uint64_t boxes_of_sorts[N_BOXES] = {0};
uint64_t sorts_of_boxes[N_BOXES];
uint8_t b_idx[N_BOXES][N_BOXES];
// build indices
for (int bi = 0; bi < N_BOXES; ++bi) {
uint64_t sorts = 0;
for (int ci = 0; ci < BOX_SIZE; ++ci) {
uint8_t sort = boxes.b[bi][ci];
sorts |= (uint64_t)1 << sort;
b_idx[bi][sort] = ci;
boxes_of_sorts[sort] |= (uint64_t)1 << bi;
}
sorts_of_boxes[bi] = sorts;
}

peek_t result={0};
uint64_t free_boxes = (uint64_t)-1 >> (64-N_BOXES);
uint64_t free_sorts = (uint64_t)-1 >> (64-N_BOXES);
struct stack_node {
uint64_t free_boxes, free_sorts, boxes;
int sel_sort;
} stack[N_BOXES];
int level = 0;
do {
// find free sort that remains in the smallest # of free boxes
uint64_t msk = free_sorts;
int nb_min = INT_MAX;
int sel_sort = N_BOXES;
while (msk) {
int sort = __builtin_ctzll(msk);
uint64_t boxes = boxes_of_sorts[sort] & free_boxes;
if (boxes) {
int nb = __builtin_popcountll(boxes);
if (nb_min > nb) {
nb_min = nb;
sel_sort = sort;
if (nb == 1)
break;
}
}
msk ^= (uint64_t)1 << sort;
}

if (sel_sort == N_BOXES)
goto pop;

// candidate sort found
uint64_t boxes = boxes_of_sorts[sel_sort] & free_boxes;
back:;
int bi = __builtin_ctzll(boxes);
result.b[bi] = b_idx[bi][sel_sort];
boxes ^= (uint64_t)1 << bi;
// preserve state
stack[level] = (struct stack_node){
.free_boxes=free_boxes, .free_sorts=free_sorts,
.boxes=boxes, .sel_sort = sel_sort};
// go to the next level
free_boxes ^= (uint64_t)1 << bi;
free_sorts ^= (uint64_t)1 << sel_sort;
++level;
continue;

// return to previous level
pop:
while (level > 0) {
--level;
boxes = stack[level].boxes;
if (boxes) {
free_boxes = stack[level].free_boxes;
free_sorts = stack[level].free_sorts;
sel_sort = stack[level].sel_sort;
goto back;
}
}
// should never come here
break;
} while (level < N_BOXES);
return result;
}

I'm leaving the code in even though I don't have a lot to say
about it. Trying to understand it I kept getting an internal
"complexity threshold exceeded" exception. Finally I decided I
would just try to rewrite it, preserving the spirit although not
all of the details. My code ended up being more lines of source
than yours although the core routine is shorter. There are three
parts that I broke out into separate functions (not shown). Those
parts are, one, a function 'bsets_from_boxes()' to turn a boxes_t
into a collection of bitsets holding the boxes corresponding to
each cookie type; two, a function 'best_next_ctype()' that
chooses which cookie type to consider next; and three, a final
function 'answer()' to turn the 'box_chosen[]' bitset array into
the representation needed by the interface. Here is the central
routine:

peek_t
solver( boxes_t box_contents ){
Bset boxes_free[ COOKIE_LIMIT + 1 ];
Ctype ctypes[ COOKIE_LIMIT ];
Bset boxes_to_try[ COOKIE_LIMIT ];
Bset box_chosen[ COOKIE_LIMIT ];

Bsets bsets = bsets_from_boxes( &box_contents );
Index depth = 0;
Bset free = boxes_free[0] = ~((Bset){-1} <<
COOKIE_LIMIT-1 << 1);
for( Index i = 0; i < LIMIT_OF( ctypes ); i++ ) ctypes[i] = i;

do {
Ctype ctype = best_next_ctype( depth, free, &bsets, ctypes );
Bset chosen;

boxes_to_try[depth] = bsets.bset_of_ctype[ ctype ] & free;

while(
chosen = boxes_to_try[depth],
boxes_to_try[depth] ^= chosen ^= chosen & chosen-1,
!chosen
){
assert( depth > 0 ), --depth;
free = boxes_free[ depth ];
ctype = ctypes[ depth ];
}

box_chosen[ ctype ] = chosen;
assert( free & chosen );
boxes_free[ depth+1 ] = free = free ^ chosen;

} while( ++depth < COOKIE_LIMIT );

return answer( box_chosen, ctypes, &box_contents );
}

Two further comments.

One, by keeping the box_chosen[] array as bit masks, the use of
the __builtin bit function gets deferred and so is done only once
for each output slot, at the end (in the 'answer()' function).

Two, the function 'best_next_ctype()' manipulates the ctypes[]
array as well as returning the cookie type at the appropriate
depth in the array. I tried different policies for when to
select the "best" cookie type, including "always", "never", and
"only for the first N levels", with several choices of N.
Different policies produced different timing values but there
wasn't any obvious relation between them. Something that
occurred to me only later is a policy "choose the best out of the
next K cookie types", for different values of K.

I'd guess that every regular c.l.c reader knows that you like guessing
games. Even if I don't think that it is appropriate here, up to the
certain limit, I am willing to participate.
It's pretty easy to guess that the code above preceded by:

#include <assert.h>
#include "solver.h"

typedef uint64_t Bset;
typedef uint8_t Ctype;
typedef size_t Index;
enum { COOKIE_LIMIT = N_BOXES };
typedef struct {
Bset bset_of_ctype[COOKIE_LIMIT];
// more fields? I can't tell
} Bsets;
#define LIMIT_OF(x) (sizeof(x)/sizeof(x[0]))

static Bsets bsets_from_boxes(const boxes_t* );
static Ctype best_next_ctype(Index, Bset, Bsets*, Ctype* );
static peek_t answer(const Bset box_chosen[], const Ctype ctypes[],
const boxes_t* boxes);

It's also easy to guess that bsets_from_boxes contains following code:

static Bsets bsets_from_boxes(const boxes_t* boxes)
{
Bsets ret={0};
for (Index bi = 0; bi < N_BOXES; ++bi) {
for (int ci = 0; ci < BOX_SIZE; ++ci)
ret.bset_of_ctype[boxes->b[bi][ci]] |= (Bset)1 << bi;
}
return ret;
}

But quite possibly there is more code.

I am far less certain about answer(). In particular, I have no idea why
it needs its 2nd parameter. My guess is that it's something like that:

static
peek_t answer(const Bset box_chosen[], const Ctype unused[],
const boxes_t* boxes)
{
peek_t ret;
for (int i = 0; i < COOKIE_LIMIT; ++i) {
int bi = __builtin_ctzll(box_chosen[i]);
for (int ci = 0; ; ++ci)
if (boxes->b[bi][ci] == i) {
ret.b[bi] = ci;
break;
}
}
return ret;
}

However I both can not and don't want to guess the content of your
cores routine best_next_ctype(). And without it I don't have much to
say about you solution.

The main thing is I hope the above routine does a better job of
conveying my thoughts and understandings.

It looks like unlike your friend you missed the key - equity of number
of boxes and number of sorts of cookies is not accidental! It's the
most important thing in the whole exercise. It's what make solution
feasible for much bigger values of N_BOXES, probably up to several K
in less than 1 sec.
But, then again, without source code of best_next_ctype() I can not be
sure about it.

--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Fri Apr 17 17:04:44 2026

From Newsgroup: comp.lang.c

On Thu, 16 Apr 2026 02:57:32 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 08 Apr 2026 08:42:11 -0700
Tim Rentsch <[email protected]> wrote:

After that, I talked to a friend to try a different approach to
producing a solution. After some light editing by myself -- mostly
just formatting and some name changes -- the code below was put
into the hopper. (Note: I claim no credit for code below. I did
some editing to make it more readable but beyond that none of it
is a result of my efforts.)

#include <stdint.h>
#include "solver.h"

static uint32_t adjacent[ N_BOXES ];
static int cookie_of_box[ N_BOXES ];
static int box_of_cookie[ N_BOXES ];
static uint32_t seen;

static int
search( int b ){
uint32_t m = adjacent[b];
while( m ){
uint32_t bit = m & -m;
m ^= bit;
int c = __builtin_ctz( bit );
if( seen & 1u<<c ) continue;
seen |= 1u<<c;
if( box_of_cookie[c] == -1 || search(
box_of_cookie[c] ) ){ box_of_cookie[c] = b;
cookie_of_box[b] = c;
return 1;
}
}
return 0;
}

peek_t
solver( boxes_t boxes ){
peek_t res;

for( int i = 0; i < N_BOXES; i++ ){
uint32_t m = 0;
for( int k = 0; k < BOX_SIZE; k++ ){
m |= 1u << boxes.b[i][k];
}
adjacent[i] = m;
cookie_of_box[i] = -1;
box_of_cookie[i] = -1;
}

for( int i = 0; i < N_BOXES; i++ ){
seen = 0;
search( i );
}

for( int i = 0; i < N_BOXES; i++ ){
int c = cookie_of_box[i];
for( int k = 0; k < BOX_SIZE; k++ ){
if( boxes.b[i][k] == c ){
res.b[i] = k;
break;
}
}
}
return res;
}

Despite being derived independently, this code uses the same sort
of approach that I had used, except my code was recursive rather
than iterative.

I finally looked closer to the code of your friend.
To me, it does not look at all similar to description of "your"
algorithm in your other post.
It appears more similar to my 2nd algorithm, but sort of orthogonal
to it - my outermost loop iterate through sorts of cookies; in the
code of your friend outermost loop iterate through boxes.

I confess I didn't look closely at the algorithm but relied
on comments about how the code worked. I tried to express
this in my earlier posting when I said "the same sort of
approach" rather than "the same approach". Sorry for any
confusion.

Another difference, less important for robustness, more important
for measured speed, esp. on avearge, is when search decides to
revert previous selection. I am doing it only after all
possibilities to select at the top level failed. Your friend makes
no such effort and easily dives deep. In principle, both
approaches are equivalent, but in physical world recursive call is significantly slower than iteration of simple loop and excessive
updates of state variables are also slower then lookups without
updates.

And, of course, apart from difference in algorithm there is a
significant difference in implementation technique. Your friend
took advantage of the fact that the challenge was specified for
relatively small number of boxes and intensely used bit masks. I
[didn't took similar advantage [except for using byte variables
instead of short or int] and used arrays. Of course, the technique
applied by your friend is faster. His solution ended up ~2 times
slower then mine not because of technique, but because of
algorithmic difference mentioned in previous paragraph.

Today I modified his code, applying variant of the same algorithmic
change. At the same opportunity I removed use of static data and
increased size of bit masks to 64 bits. As expected, resulting code
was very fast, ~3 times faster than my own. [...code...]

I'm happy to hear the posting provided an opportunity to more deeply
explore the problem. Also it's interesting to learn the results of
your investigations; thank you for those.

I was aware of possibility of taking advantage of the fact that value
of N_BOXES specified in the challenge is <= than dominant width of
machine word.
I intentionally avoided it until this point, for two reasons, one
abstract (A) and one practical (P).
A. the challenge was meant to be algorithmic with lesser emphasis on implementation
P. I planned (and still plan) experimental testing of BigO

But I couldn't resist the appeal of fast :(

The achieved 3x speed up was actually disappointing. I expected more.
It seems, by now building helper indices takes more time than solver
itself, which makes further progress in this part pointless.

--- Synchronet 3.21f-Linux NewsLink 1.2

From Tim Rentsch@[email protected] to comp.lang.c on Fri Apr 17 07:40:13 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

On Thu, 16 Apr 2026 23:52:14 -0700
Tim Rentsch <[email protected]> wrote:

[...]

I'd guess that every regular c.l.c reader knows that you like guessing
games. Even if I don't think that it is appropriate here, up to the
certain limit, I am willing to participate.

I posted what I did because I wanted the focus to be on reading the
code and not on running the code. I was hoping that understanding
would be feasible without needing any guessing.

It's pretty easy to guess that the code above preceded by:

[...]

Looks reasonable.

However I both can not and don't want to guess the content of your
cores routine best_next_ctype(). And without it I don't have much to
say about you solution.

Here is one implementation of best_next_ctype():

Ctype
best_next_ctype( Index depth, Bset free, Bsets *bsets, Ctype ctypes[] ){
// permute?
return ctypes[depth];
}

At the "permute?" comment an implementation could perform any
permutation of the elements of the ctypes[] array, within the range
of ctypes[depth] and ctypes[COOKIE_LIMIT-1], and the result should
still be a working solution. Or just leave it as a comment. The
correctness of the code doesn't depend on what permutation is done
or how the choice is made.

The main thing is I hope the above routine does a better job of
conveying my thoughts and understandings.

It looks like unlike your friend you missed the key - equity of number
of boxes and number of sorts of cookies is not accidental!

The way the code is written it allows the value of COOKIE_LIMIT to
be chosen independently of the value of N_BOXES (or at least I hope
it does). I didn't explore any such cases but it might be
interesting to do that.

It's the
most important thing in the whole exercise. It's what make solution
feasible for much bigger values of N_BOXES, probably up to several K
in less than 1 sec.

Of course for the code posted the value of N_BOXES must be no more
than the number of bits in the bitset type Bset (which is meant to
be short for "Box set", or a set of box values).

But, then again, without source code of best_next_ctype() I can not be
sure about it.

My questions are these. One, could you understand what the code is
doing and how it works? And two, if given an appropriate choice of
what permutation is done in best_next_ctype(), does this code do the
same thing as your code? I was trying to match the behavior of your
code but I wasn't able to figure out how it decided where to go
next.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Fri Apr 17 18:30:17 2026

From Newsgroup: comp.lang.c

On Fri, 17 Apr 2026 07:40:13 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Thu, 16 Apr 2026 23:52:14 -0700
Tim Rentsch <[email protected]> wrote:

[...]

I'd guess that every regular c.l.c reader knows that you like
guessing games. Even if I don't think that it is appropriate here,
up to the certain limit, I am willing to participate.

I posted what I did because I wanted the focus to be on reading the
code and not on running the code. I was hoping that understanding
would be feasible without needing any guessing.

It's pretty easy to guess that the code above preceded by:

[...]

Looks reasonable.

However I both can not and don't want to guess the content of your
cores routine best_next_ctype(). And without it I don't have much
to say about you solution.

Here is one implementation of best_next_ctype():

Ctype
best_next_ctype( Index depth, Bset free, Bsets *bsets, Ctype ctypes[]
){ // permute?
return ctypes[depth];
}

At the "permute?" comment an implementation could perform any
permutation of the elements of the ctypes[] array, within the range
of ctypes[depth] and ctypes[COOKIE_LIMIT-1], and the result should
still be a working solution. Or just leave it as a comment. The
correctness of the code doesn't depend on what permutation is done
or how the choice is made.

The main thing is I hope the above routine does a better job of
conveying my thoughts and understandings.

It looks like unlike your friend you missed the key - equity of
number of boxes and number of sorts of cookies is not accidental!

The way the code is written it allows the value of COOKIE_LIMIT to
be chosen independently of the value of N_BOXES (or at least I hope
it does). I didn't explore any such cases but it might be
interesting to do that.

It's the
most important thing in the whole exercise. It's what make solution feasible for much bigger values of N_BOXES, probably up to several K
in less than 1 sec.

Of course for the code posted the value of N_BOXES must be no more
than the number of bits in the bitset type Bset (which is meant to
be short for "Box set", or a set of box values).

But, then again, without source code of best_next_ctype() I can not
be sure about it.

My questions are these. One, could you understand what the code is
doing and how it works? And two, if given an appropriate choice of
what permutation is done in best_next_ctype(), does this code do the
same thing as your code? I was trying to match the behavior of your
code but I wasn't able to figure out how it decided where to go
next.

No, as it is, it does not match the behavior of my code or of code of
your friend. One very simple, but very important element of search state
is missing.
I am reasonably sure that it can not be emulated by permutation of
ctypes array within best_next_ctype() routine.

As to observed timing, if your code is run for big number of cases, e.g. rep1=128, rep2=9999 (small modification of test bench required to enable
bigger rep2), it sometimes ends up very slow. My test bench is not well
suited to find exactly how slow, but my guess is above 0.1 sec. That's
very different from mine or your friends, where difference between
median and worst case is small, likeley less than 3x.

More convenient test bench that reports PRNG and can take PRNG seed as
input:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

#include "solver.h"

static int test(boxes_t* boxes, peek_t* results, int rep1, int rep2,
uint64_t* prngs, uint64_t seed);
int main(int argz, char** argv)
{
int rep1=128, rep2=11;
unsigned long long seed = 1;
if (argz > 1) {
if (strcmp(argv[1], "?")== 0 ||
strcmp(argv[1], "-?")== 0) {
fprintf(stderr,
"Usage:\n"
"greg_solver_tb [rep1 [rep2]] -s=seed\n"
"where\n"
"rep1 - number of solver calls in one time measurement set.
Default 128\n"
"rep2 - number of repetitions of time measurement. Default 11\n"
"seed - PRNG seed\n"
);
return 0;
}

static const int mx[2] = { 1e7, 10000 };
for (int arg_i = 1, val_i = 0; arg_i < 4 && arg_i < argz; ++arg_i) {
char* arg = argv[arg_i];
if (strncmp(arg, "-s=", 3)== 0) {
char* seedstr = arg+3;
char* endp;
seed = strtoull(seedstr, &endp, 0);
if (endp == seedstr) {
fprintf(stderr, "seed argument '%s' is not a number\n",
seedstr);
return 1;
}
} else {
char* endp;
long long n = strtoll(arg, &endp, 0);
if (endp == arg) {
fprintf(stderr, "argument '%s' is not a number\n", arg);
return 1;
}
int n_mx = mx[val_i];
if (n < 1 || n > n_mx) {
fprintf(stderr, "argument '%s' out of range. Please specify
value in range[1:%d]\n"
, arg, n_mx);
return 1;
}
if (val_i==0)
rep1 = n;
else
rep2 = n;
++val_i;
}
}
}

void* mem =
malloc((sizeof(boxes_t)+sizeof(peek_t)+sizeof(uint64_t))*rep1);
if (!mem) {
perror("malloc()");
return 1;
}

uint64_t* prngs = mem;
boxes_t* boxes = (boxes_t*)&prngs[rep1];
peek_t* results = (peek_t*)&boxes[rep1];
int ret = test(boxes, results, rep1, rep2, prngs, seed);
free(mem);

return ret;
}

static uint32_t my_prng(uint64_t* pState)
{
// we don't need very good PRNG,
// but it has to repeatable cross-platform,
// so C RTL rand() is not suitable
const uint64_t PRNG_A = 2862933555777941757ull;
const uint64_t PRNG_C = 20260329ull;
uint64_t s = *pState*PRNG_A + PRNG_C;
*pState = s;
return s >> 32;
}

static void make_random_boxes(boxes_t* dst, uint64_t* prng)
{
uint8_t pool[N_BOXES*BOX_SIZE];
for (int i = 0; i < N_BOXES; ++i)
for (int k = 0; k < BOX_SIZE; ++k)
pool[i*BOX_SIZE+k] = i;

for (int i = 0; i < N_BOXES; ++i) {
for (int k = 0; k < BOX_SIZE; ++k) {
uint32_t rnd = my_prng(prng);
int idx0 = i*BOX_SIZE+k;
int idx1 = ((N_BOXES*BOX_SIZE-idx0)*(uint64_t)rnd >> 32)+idx0;
uint8_t v0 = pool[idx0];
uint8_t v1 = pool[idx1];
dst->b[i][k] = v1;
pool[idx1] = v0;
}
}
}

static bool validate(const boxes_t* bx, const peek_t* res)
{
bool cookies[N_BOXES]={0};
for (int i = 0; i < N_BOXES; ++i) {
unsigned v = res->b[i];
if (v >= BOX_SIZE) {
fprintf(stderr, "res[%d] = %d. Out of range!\n", i, v);
return false;
}
unsigned c = bx->b[i][v];
if (cookies[c]) {
fprintf(stderr, "res[%d] = %d. bx[%d][%d]=%d repeated.\n"
, i, v
, i, v, c);
return false;
}
cookies[c] = true;
}
return true;
}

static int cmp_ll(const void* pa, const void* pb) {
long long a = *(const long long*)pa;
long long b = *(const long long*)pb;
if (a < b) return -1;
if (a > b) return 1;
return 0;
}

static int test(boxes_t* boxes, peek_t* results, int rep1, int rep2,
uint64_t* prngs, uint64_t seed)
{
uint64_t prng = seed;
long long dt[rep2];
long long mn, mx;
uint64_t mn_prng, mx_prng;
for (int it = 0; it < rep2; ++it) {
for (int i = 0; i < rep1; ++i) {
prngs[i] = prng;
make_random_boxes(&boxes[i], &prng);
}
struct timespec t0;
clock_gettime(CLOCK_MONOTONIC, &t0);
for (int i = 0; i < rep1; ++i)
results[i] = solver(boxes[i]);
struct timespec t1;
clock_gettime(CLOCK_MONOTONIC, &t1);
long long delta_t = (t1.tv_sec-t0.tv_sec)*(long long)1e9+ t1.tv_nsec-t0.tv_nsec;
dt[it] = delta_t;
if (it == 0 || mn > delta_t) {
mn = delta_t;
mn_prng = prngs[0];
}
if (it == 0 || mx < delta_t) {
mx = delta_t;
mx_prng = prngs[0];
}

for (int i = 0; i < rep1; ++i) {
if (!validate(&boxes[i], &results[i])) {
fprintf(stderr,
"Test fail at iteration %d,%d. prng %llu\n"
,it, i, prngs[i]);
fprintf(stderr, "boxes:\n");
for (int bi = 0; bi < N_BOXES; ++bi) {
for (int k = 0; k < BOX_SIZE; ++k)
fprintf(stderr, " %2d", boxes[i].b[bi][k]);
fprintf(stderr, " : %d => %2d\n"
, results[i].b[bi]
, boxes[i].b[bi][results[i].b[bi]]);
}
return 1;
}
}
}

qsort(dt, rep2, sizeof(*dt), cmp_ll);
long long med = dt[rep2/2];
double scale = 1e-6/rep1;
printf("o.k.\nmed=%.6f msec. min=%.6f msec, prng %llu. max=%.6f msec,
prng %llu.\n"
, med*scale
, mn*scale, mn_prng
, mx*scale, mx_prng
);

return 0;
}

--- Synchronet 3.21f-Linux NewsLink 1.2

From Bonita Montero@[email protected] to comp.lang.c on Fri Apr 17 17:58:56 2026

From Newsgroup: comp.lang.c

Is this a correct way to randomize the boxes so that there's at least
one cookie of each kind per box ?

#include <iostream>
#include <array>
#include <random>
#include <cstdint>
#include <algorithm>
#include 

using namespace std;

int main()
{
using box = array<uint8_t, 10>;
array<box, 20> boxes;
for( size_t bx = 0; bx < boxes.size(); ++bx )
for( uint8_t &cookie : boxes[bx] )
cookie = (uint8_t)bx;
mt19937_64 mt;
uniform_int_distribution<size_t>
rndBox( 0, 19 ),
rndCookie( 1, 9 );
for( box &bx : boxes )
{
for( uint8_t &ck : span( bx.begin() + 1, bx.end() ) )
swap( ck, boxes[rndBox( mt )][rndCookie( mt )] );
swap( bx[0], boxes[rndBox( mt )][0] );
}
for( box &bx : boxes )
sort( bx.begin(), bx.end() );
}
--- Synchronet 3.21f-Linux NewsLink 1.2

From Bonita Montero@[email protected] to comp.lang.c on Fri Apr 17 18:08:11 2026

From Newsgroup: comp.lang.c

Now the shuffling is complete:

int main()
{
using box = array<uint8_t, 10>;
array<box, 20> boxes;
for( size_t bx = 0; bx < boxes.size(); ++bx )
for( uint8_t &cookie : boxes[bx] )
cookie = (uint8_t)bx;
mt19937_64 mt;
uniform_int_distribution<size_t>
rndBox( 0, 19 ),
rndCookie19( 1, 9 ),
rndCookie09( 0, 9 );
for( box &bx : boxes )
{
for( uint8_t &ck : span( bx.begin() + 1, bx.end() ) )
swap( ck, boxes[rndBox( mt )][rndCookie19( mt )] );
for( uint8_t &ck : bx )
swap( ck, bx[rndCookie09( mt )] );
}
}
--- Synchronet 3.21f-Linux NewsLink 1.2

From Bart@[email protected] to comp.lang.c on Fri Apr 17 17:47:48 2026

From Newsgroup: comp.lang.c

On 17/04/2026 16:58, Bonita Montero wrote:

Is this a correct way to randomize the boxes so that there's at least
one cookie of each kind per box ?

That's not possible. There are 20 types of cookie, but you can only have
10 per box. Those might all be of the same kind.

(I took an array of 200 cookies (10 times one instance of each) and
randomly shuffled, then stored those in the boxes. Or you can populate
the boxes with an ordered selection then shuffle in situ.)

#include <iostream>
#include <array>
#include <random>
#include <cstdint>
#include <algorithm>
#include 

using namespace std;

int main()
{
 using box = array<uint8_t, 10>;
 array<box, 20> boxes;
 for( size_t bx = 0; bx < boxes.size(); ++bx )
 for( uint8_t &cookie : boxes[bx] )
 cookie = (uint8_t)bx;
 mt19937_64 mt;
 uniform_int_distribution<size_t>
 rndBox( 0, 19 ),
 rndCookie( 1, 9 );
 for( box &bx : boxes )
 {
 for( uint8_t &ck : span( bx.begin() + 1, bx.end() ) )
 swap( ck, boxes[rndBox( mt )][rndCookie( mt )] );
 swap( bx[0], boxes[rndBox( mt )][0] );
 }
 for( box &bx : boxes )
 sort( bx.begin(), bx.end() );
}

--- Synchronet 3.21f-Linux NewsLink 1.2

From Bonita Montero@[email protected] to comp.lang.c on Fri Apr 17 19:24:01 2026

From Newsgroup: comp.lang.c

Am 17.04.2026 um 18:47 schrieb Bart:

On 17/04/2026 16:58, Bonita Montero wrote:

Is this a correct way to randomize the boxes so that there's at least
one cookie of each kind per box ?

That's not possible. There are 20 types of cookie, but you can only have
10 per box. Those might all be of the same kind.

_At least one_ is possible, but not all kinds.

There was a subtle bug with my latest code.
Here's the correct shuffling:

#include <iostream>
#include <array>
#include <random>
#include <cstdint>
#include <algorithm>
#include 

using namespace std;

int main()
{
constexpr size_t
Boxes = 20,
Cookies = 10;
using box = array<uint8_t, Cookies>;
array<box, Boxes> boxes;
for( size_t flavour = 0; flavour < Boxes; ++flavour )
fill( boxes[flavour].begin(), boxes[flavour].end(), (uint8_t)flavour );
mt19937_64 mt;
uniform_int_distribution<size_t>
rndBox( 0, Boxes - 1 ),
rndInBoxX1( 1, Cookies - 1 ),
rndInBox( 0, Cookies - 1 );
for( box &bx : boxes )
for( uint8_t &cookie : span( bx.begin() + 1, bx.end() ) )
swap( cookie, boxes[rndBox( mt )][rndInBoxX1( mt )] );
for( box &bx : boxes )
for( uint8_t &cookie : bx )
swap( cookie, bx[rndInBox( mt )] );
}
--- Synchronet 3.21f-Linux NewsLink 1.2

From Bonita Montero@[email protected] to comp.lang.c on Fri Apr 17 19:52:35 2026

From Newsgroup: comp.lang.c

I guess that works. Coding style is perfect for my taste.

#include <iostream>
#include <array>
#include <random>
#include <cstdint>
#include <algorithm>
#include 
#include <iomanip>

using namespace std;

int main()
{
constexpr size_t
Boxes = 20,
Cookies = 10;
using box = array<uint8_t, Cookies>;
array<box, Boxes> boxes;
for( size_t iFlavour = 0; iFlavour < Boxes; ++iFlavour )
fill( boxes[iFlavour].begin(), boxes[iFlavour].end(), (uint8_t)iFlavour );
mt19937_64 mt( (random_device())() );
uniform_int_distribution<size_t>
rndBox( 0, Boxes - 1 ),
rndInBoxX1( 1, Cookies - 1 ),
rndInBox( 0, Cookies - 1 );
for( box &bx : boxes )
for( uint8_t &cookie : span( bx.begin() + 1, bx.end() ) )
swap( cookie, boxes[rndBox( mt )][rndInBoxX1( mt )] );
for( box &bx : boxes )
for( uint8_t &cookie : bx )
swap( cookie, bx[rndInBox( mt )] );
for( box &bx : boxes )
swap( bx, boxes[rndBox( mt )] );
for( size_t iBx = 0; iBx < Boxes; ++iBx )
{
auto attrs = []( auto &prnt ) -> auto & { return prnt << setw( 2 ) <<
setfill( ' ' ) << right; };
attrs( cout << "box " ) << (int)iBx << ": ";
box &bx = boxes[iBx];
for( uint8_t cookie : span( bx.begin(), bx.end() - 1 ) )
attrs( cout ) << (int)cookie << ", ";
attrs( cout ) << (int)bx[Cookies - 1] << endl;
}
cout << endl;
for( size_t iFlavour = 0; iFlavour < Boxes; ++iFlavour )
for( size_t iBox = 0; iBox < Boxes; ++iBox )
for( size_t iCookie = 0; iCookie < Cookies; ++iCookie )
if( boxes[iBox][iCookie] == iFlavour )
{
cout << "flavour " << iFlavour << " is at place " << iCookie << "
in box " << iBox << endl;
iCookie = Cookies;
iBox = Boxes;
}
}
--- Synchronet 3.21f-Linux NewsLink 1.2

From Bonita Montero@[email protected] to comp.lang.c on Fri Apr 17 20:17:23 2026

From Newsgroup: comp.lang.c

There was still a bug in the before code: Cookies could be taken
from a box more than once. I guess this code is perfect now:

#include <iostream>
#include <array>
#include <random>
#include <cstdint>
#include <algorithm>
#include 
#include <iomanip>

using namespace std;

int main()
{
constexpr size_t
Flavours = 20,
Cookies = 10;
using box = array<uint8_t, Cookies>;
array<box, Flavours> boxes;
for( size_t iFlavour = 0; iFlavour < Flavours; ++iFlavour )
fill( boxes[iFlavour].begin(), boxes[iFlavour].end(), (uint8_t)iFlavour );
mt19937_64 mt( (random_device())() );
uniform_int_distribution<size_t>
rndBox( 0, Flavours - 1 ),
rndInBoxX1( 1, Cookies - 1 ),
rndInBox( 0, Cookies - 1 );
for( box &bx : boxes )
for( uint8_t &cookie : span( bx.begin() + 1, bx.end() ) )
swap( cookie, boxes[rndBox( mt )][rndInBoxX1( mt )] );
for( box &bx : boxes )
for( uint8_t &cookie : bx )
swap( cookie, bx[rndInBox( mt )] );
for( box &bx : boxes )
swap( bx, boxes[rndBox( mt )] );
for( size_t iBx = 0; iBx < Flavours; ++iBx )
{
auto attrs = []( auto &prnt ) -> auto & { return prnt << setw( 2 ) <<
setfill( ' ' ) << right; };
attrs( cout << "box " ) << (int)iBx << ": ";
box &bx = boxes[iBx];
for( uint8_t cookie : span( bx.begin(), bx.end() - 1 ) )
attrs( cout ) << (int)cookie << ", ";
attrs( cout ) << (int)bx[Cookies - 1] << endl;
}
cout << endl;
array<bool, Flavours> used { false };
for( size_t iFlavour = 0; iFlavour < Flavours; ++iFlavour )
for( size_t iBox = 0; iBox < Flavours; ++iBox )
if( !used[iBox] )
for( size_t iCookie = 0; iCookie < Cookies; ++iCookie )
if( boxes[iBox][iCookie] == iFlavour )
{
cout << "flavour " << iFlavour << " is at place " << iCookie << "
in box " << iBox << endl;
used[iBox] = true;
iCookie = Cookies;
iBox = Flavours;
}
}
--- Synchronet 3.21f-Linux NewsLink 1.2

From scott@[email protected] (Scott Lurndal) to comp.lang.c on Fri Apr 17 19:58:28 2026

From Newsgroup: comp.lang.c

Bonita Montero <[email protected]> writes:

Is this a correct way to randomize the boxes so that there's at least
one cookie of each kind per box ?

No, because this is comp.lang.c, not comp.lang.c++.

--- Synchronet 3.21f-Linux NewsLink 1.2

From Bonita Montero@[email protected] to comp.lang.c on Sat Apr 18 05:52:46 2026

From Newsgroup: comp.lang.c

Am 17.04.2026 um 21:58 schrieb Scott Lurndal:

Bonita Montero <[email protected]> writes:

Is this a correct way to randomize the boxes so that there's at least
one cookie of each kind per box ?

No, because this is comp.lang.c, not comp.lang.c++.

"The solution does not have to be in C, ..."

--- Synchronet 3.21f-Linux NewsLink 1.2

From Bart@[email protected] to comp.lang.c on Sat Apr 18 11:53:30 2026

From Newsgroup: comp.lang.c

On 18/04/2026 04:52, Bonita Montero wrote:

Am 17.04.2026 um 21:58 schrieb Scott Lurndal:

Bonita Montero <[email protected]> writes:

Is this a correct way to randomize the boxes so that there's at least
one cookie of each kind per box ?

No, because this is comp.lang.c, not comp.lang.c++.

"The solution does not have to be in C, ..."

OK, here's one in my scripting language:

proc main=
names := splitstring("Martin Hohenberg, ... Andres Cordova", ", ")
for i, x in names do
names[i] := splitstring(x, " ")
end

isort(names, {a, b: convlc(a[$]) > convlc(b[$])})

for i, x in names do
println i, joinstrings(x, " ")
end
end

Its output is shown below. The parsing is not thorough, for examples it expects exactly one comma and one space between names, but it works for
the OP's test data.

If the string data is excluded, my version is 245 characters, compared
with 1589 for your C++. Compilation time is 0.0 seconds.

1 Paul Abbott
2 Neil Anuskiewicz
3 Masa Bando
4 Tim Bando
5 George Brower
6 Kyle Burkholder
7 John Carmack
8 Dale Carstensen
9 Jonathan Cast
10 Christopher Chang
11 Gratiela Chergu
12 William Christensen
13 Michael Ciagala
14 Andres Cordova
15 James Cronin
16 Nodarius Darjania
17 darrin
18 Rod Davenport
19 Chip Davis
20 Killer Delicious
21 Sven Dowideit
22 Steven Evans
23 Cheryl Evry
24 Daniel Garber
25 Mark Gardner
26 Donald Greer
27 Hal Hildebrand
28 Martin Hohenberg
29 David L. Jessup
30 Liudmilla Karukina
31 Ken Kennedy
32 Ken LaCrosse
33 Clemens Ladisch
34 Ron Lauzon
35 Wenfeng Liang
36 Brendan Long
37 Jacob Lyles
38 Jim McCloskey
39 Mordant
40 Mike Nichols
41 Michael Nygard
42 Malcolm Ocean
43 Doug Phillips
44 Mark Ping
45 Matt Pollard
46 ReallyBored
47 Irving Rivas
48 Jan Roudaut
49 Dewey Sasser
50 John Simpson
51 Bill Soistman
52 Hsueh Sung
53 taishi28012
54 Jane Tang
55 Tom Taylor
56 TheFred
57 Paul Theodoropolous
58 Jerod Tufte
59 Les Vogel
60 Xingyu Wang
61 Arnold F. Williams
62 Stan Witherspoon
63 Dave Witten
64 Connor Wood
65 Wojciech Woytniak
66 Jae Yang
67 Άρης Δάσος

--- Synchronet 3.21f-Linux NewsLink 1.2

From Bart@[email protected] to comp.lang.c on Sat Apr 18 12:04:08 2026

From Newsgroup: comp.lang.c

On 18/04/2026 11:53, Bart wrote:

On 18/04/2026 04:52, Bonita Montero wrote:

Am 17.04.2026 um 21:58 schrieb Scott Lurndal:

Bonita Montero <[email protected]> writes:

Is this a correct way to randomize the boxes so that there's at least
one cookie of each kind per box ?

No, because this is comp.lang.c, not comp.lang.c++.

"The solution does not have to be in C, ..."

OK, here's one in my scripting language:

Um, this is for DFS's sorting challenge, not Michael S's cookie puzzle.

DFS I think specified a solution in C.

So both my and your versions aren't allowed.

(I'm not going to do a C version; it's far too much fiddly work.

For a real application, I would look at creating a suite of functions to
help out, but it's not worth doing that here.)

If the string data is excluded, my version is 245 characters, compared
with 1589 for your C++. Compilation time is 0.0 seconds.

That is, the C++ for the name sorting program.

--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Sat Apr 18 20:35:12 2026

From Newsgroup: comp.lang.c

On Sat, 18 Apr 2026 05:52:46 +0200
Bonita Montero <[email protected]> wrote:

Am 17.04.2026 um 21:58 schrieb Scott Lurndal:

Bonita Montero <[email protected]> writes:

Is this a correct way to randomize the boxes so that there's at
least one cookie of each kind per box ?

No, because this is comp.lang.c, not comp.lang.c++.

"The solution does not have to be in C, ..."

The solution, i.e. implementation of the function solver(), does not
have to be in C. But it has to be pluggable into specified test bench,
which is coded in C.
It does not sound as you're interested in finding solution of the
challenge.
On my part, I am not interested in your attempts to provide an
alternative test bench.

In unlikely case that I am wrong about your intentions, below provided
variant of test bench that can be compiled both as C and as C++.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

#include "solver.h"

static int test(boxes_t* boxes, peek_t* results, int rep1, int rep2,
uint64_t* prngs, uint64_t seed);
int main(int argz, char** argv)
{
int rep1=128, rep2=11;
unsigned long long seed = 1;
if (argz > 1) {
if (strcmp(argv[1], "?")== 0 ||
strcmp(argv[1], "-?")== 0) {
fprintf(stderr,
"Usage:\n"
"greg_solver_tb [rep1 [rep2]] -s=seed\n"
"where\n"
"rep1 - number of solver calls in one time measurement set.
Default 128\n" "rep2 - number of repetitions of time measurement.
Default 11\n" "seed - PRNG seed\n"
);
return 0;
}

static const int mx[2] = { (int)1e7, 10000 };
for (int arg_i = 1, val_i = 0; arg_i < 4 && arg_i < argz; ++arg_i) {
char* arg = argv[arg_i];
if (strncmp(arg, "-s=", 3)== 0) {
char* seedstr = arg+3;
char* endp;
seed = strtoull(seedstr, &endp, 0);
if (endp == seedstr) {
fprintf(stderr, "seed argument '%s' is not a number\n",
seedstr); return 1;
}
} else {
char* endp;
long long n = strtoll(arg, &endp, 0);
if (endp == arg) {
fprintf(stderr, "argument '%s' is not a number\n", arg);
return 1;
}
int n_mx = mx[val_i];
if (n < 1 || n > n_mx) {
fprintf(stderr, "argument '%s' out of range. Please specify
value in range[1:%d]\n" , arg, n_mx);
return 1;
}
if (val_i==0)
rep1 = n;
else
rep2 = n;
++val_i;
}
}
}

void* mem =
malloc((sizeof(boxes_t)+sizeof(peek_t)+sizeof(uint64_t))*rep1); if
(!mem) { perror("malloc()");
return 1;
}

uint64_t* prngs = (uint64_t*)mem;
boxes_t* boxes = (boxes_t*)&prngs[rep1];
peek_t* results = (peek_t*)&boxes[rep1];
int ret = test(boxes, results, rep1, rep2, prngs, seed);
free(mem);

return ret;
}

static uint32_t my_prng(uint64_t* pState)
{
// we don't need very good PRNG,
// but it has to repeatable cross-platform,
// so C RTL rand() is not suitable
const uint64_t PRNG_A = 2862933555777941757ull;
const uint64_t PRNG_C = 20260329ull;
uint64_t s = *pState*PRNG_A + PRNG_C;
*pState = s;
return s >> 32;
}

static void make_random_boxes(boxes_t* dst, uint64_t* prng)
{
uint8_t pool[N_BOXES*BOX_SIZE];
for (int i = 0; i < N_BOXES; ++i)
for (int k = 0; k < BOX_SIZE; ++k)
pool[i*BOX_SIZE+k] = i;

for (int i = 0; i < N_BOXES; ++i) {
for (int k = 0; k < BOX_SIZE; ++k) {
uint32_t rnd = my_prng(prng);
int idx0 = i*BOX_SIZE+k;
int idx1 = ((N_BOXES*BOX_SIZE-idx0)*(uint64_t)rnd >> 32)+idx0;
uint8_t v0 = pool[idx0];
uint8_t v1 = pool[idx1];
dst->b[i][k] = v1;
pool[idx1] = v0;
}
}
}

static bool validate(const boxes_t* bx, const peek_t* res)
{
bool cookies[N_BOXES]={0};
for (int i = 0; i < N_BOXES; ++i) {
unsigned v = res->b[i];
if (v >= BOX_SIZE) {
fprintf(stderr, "res[%d] = %d. Out of range!\n", i, v);
return false;
}
unsigned c = bx->b[i][v];
if (cookies[c]) {
fprintf(stderr, "res[%d] = %d. bx[%d][%d]=%d repeated.\n"
, i, v
, i, v, c);
return false;
}
cookies[c] = true;
}
return true;
}

static int cmp_ll(const void* pa, const void* pb) {
long long a = *(const long long*)pa;
long long b = *(const long long*)pb;
if (a < b) return -1;
if (a > b) return 1;
return 0;
}

static int test(boxes_t* boxes, peek_t* results, int rep1, int rep2,
uint64_t* prngs, uint64_t seed)
{
uint64_t prng = seed;
long long dt[rep2];
long long mn, mx;
uint64_t mn_prng, mx_prng;
for (int it = 0; it < rep2; ++it) {
for (int i = 0; i < rep1; ++i) {
prngs[i] = prng;
make_random_boxes(&boxes[i], &prng);
}
struct timespec t0;
clock_gettime(CLOCK_MONOTONIC, &t0);
for (int i = 0; i < rep1; ++i)
results[i] = solver(boxes[i]);
struct timespec t1;
clock_gettime(CLOCK_MONOTONIC, &t1);
long long delta_t = (t1.tv_sec-t0.tv_sec)*(long long)1e9+ t1.tv_nsec-t0.tv_nsec; dt[it] = delta_t;
if (it == 0 || mn > delta_t) {
mn = delta_t;
mn_prng = prngs[0];
}
if (it == 0 || mx < delta_t) {
mx = delta_t;
mx_prng = prngs[0];
}

for (int i = 0; i < rep1; ++i) {
if (!validate(&boxes[i], &results[i])) {
fprintf(stderr,
"Test fail at iteration %d,%d. prng %llu\n"
,it, i, prngs[i]);
fprintf(stderr, "boxes:\n");
for (int bi = 0; bi < N_BOXES; ++bi) {
for (int k = 0; k < BOX_SIZE; ++k)
fprintf(stderr, " %2d", boxes[i].b[bi][k]);
fprintf(stderr, " : %d => %2d\n"
, results[i].b[bi]
, boxes[i].b[bi][results[i].b[bi]]);
}
return 1;
}
}
}

qsort(dt, rep2, sizeof(*dt), cmp_ll);
long long med = dt[rep2/2];
double scale = 1e-6/rep1;
printf("o.k.\nmed=%.6f msec. min=%.6f msec, prng %llu. max=%.6f msec,
prng %llu.\n" , med*scale
, mn*scale, mn_prng
, mx*scale, mx_prng
);

return 0;
}

--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Mon Apr 20 00:43:18 2026

From Newsgroup: comp.lang.c

On Fri, 17 Apr 2026 07:40:13 -0700
Tim Rentsch <[email protected]> wrote:

My questions are these. One, could you understand what the code is
doing and how it works? And two, if given an appropriate choice of
what permutation is done in best_next_ctype(), does this code do the
same thing as your code? I was trying to match the behavior of your
code but I wasn't able to figure out how it decided where to go
next.

Here is the simplest form of the algorithm that I can come with.
Essentially, it is a serial non-recursive variant of the code of your
friend with no algorithmic or technical enhancements.
Even in this form it is not slow and hopefully makes the key idea more
obvious.

#include <stdint.h>
#include <stdbool.h>
#include "solver.h"

peek_t
solver( boxes_t boxes ){
uint8_t box_of_cookie[N_BOXES]; // valid for selected sorts
for (int i = 0; i < N_BOXES; i++)
box_of_cookie[i] = N_BOXES; // value N_BOXES marks unselected sort

// Selection proceeds in progressive steps.
// At each step we select one new box and one new sort.
// A box and a sort selected at step i are never unselected later
// although association between selected boxes and sorts
// can be changed
for (int step = 0; step < N_BOXES; ++step) {
typedef struct {
uint8_t box, sort;
} trace_t;
trace_t levels[N_BOXES];
bool seen[N_BOXES] = {0};
for (int lvl = 0, box = step; ;) {
// look for unseen sort in box
int sort;
for (int ci = 0; ci < BOX_SIZE; ++ci) {
sort = boxes.b[box][ci];
if (!seen[sort])
break;
}
if (seen[sort]) { // no unseen sorts in box
box = levels[--lvl].box; // return to previous level
continue;
}
// Unseen sort found
// It will become a new selection for box
int selected_box = box_of_cookie[sort];
if (selected_box == N_BOXES) {
// sort unselected.
// it means that search at step i succeed
// commit preserved assignments
for (int k = 0; k < lvl; ++k)
box_of_cookie[levels[k].sort] = levels[k].box;
// assign new sort
box_of_cookie[sort] = box;
break;
}

// sort already selected
// Save new assignment for sort.
// Saved box also serves us if we return
// to the current level from above
levels[lvl++] = (trace_t){ .sort=sort, .box=box};
// Mark sort as seen.
// That mark guarantees forward progress of the search
seen[sort] = true;
// Continue search.
// At the next level we will look for new selection
// for old box of sort
box = selected_box;
}
}

// translate result from box_of_cookie[] to index of cookie in box
peek_t res;
for (int c = 0; c < N_BOXES; ++c) {
int b = box_of_cookie[c];
for (int k = 0; k < BOX_SIZE; ++k) {
if (boxes.b[b][k] == c) {
res.b[b] = k;
break;
}
}
}
return res;
}

--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Tue Apr 21 20:37:28 2026

From Newsgroup: comp.lang.c

On Mon, 20 Apr 2026 00:43:18 +0300
Michael S <[email protected]> wrote:

On Fri, 17 Apr 2026 07:40:13 -0700
Tim Rentsch <[email protected]> wrote:

My questions are these. One, could you understand what the code is
doing and how it works? And two, if given an appropriate choice of
what permutation is done in best_next_ctype(), does this code do the
same thing as your code? I was trying to match the behavior of your
code but I wasn't able to figure out how it decided where to go
next.

Here is the simplest form of the algorithm that I can come with.
Essentially, it is a serial non-recursive variant of the code of your
friend with no algorithmic or technical enhancements.
Even in this form it is not slow and hopefully makes the key idea more obvious.

#include <stdint.h>
#include <stdbool.h>
#include "solver.h"

peek_t
solver( boxes_t boxes ){
uint8_t box_of_cookie[N_BOXES]; // valid for selected sorts
for (int i = 0; i < N_BOXES; i++)
box_of_cookie[i] = N_BOXES; // value N_BOXES marks unselected sort

// Selection proceeds in progressive steps.
// At each step we select one new box and one new sort.
// A box and a sort selected at step i are never unselected later
// although association between selected boxes and sorts
// can be changed
for (int step = 0; step < N_BOXES; ++step) {
typedef struct {
uint8_t box, sort;
} trace_t;
trace_t levels[N_BOXES];
bool seen[N_BOXES] = {0};
for (int lvl = 0, box = step; ;) {
// look for unseen sort in box
int sort;
for (int ci = 0; ci < BOX_SIZE; ++ci) {
sort = boxes.b[box][ci];
if (!seen[sort])
break;
}
if (seen[sort]) { // no unseen sorts in box
box = levels[--lvl].box; // return to previous level
continue;
}
// Unseen sort found
// It will become a new selection for box
int selected_box = box_of_cookie[sort];
if (selected_box == N_BOXES) {
// sort unselected.
// it means that search at step i succeed
// commit preserved assignments
for (int k = 0; k < lvl; ++k)
box_of_cookie[levels[k].sort] = levels[k].box;
// assign new sort
box_of_cookie[sort] = box;
break;
}

// sort already selected
// Save new assignment for sort.
// Saved box also serves us if we return
// to the current level from above
levels[lvl++] = (trace_t){ .sort=sort, .box=box};
// Mark sort as seen.
// That mark guarantees forward progress of the search
seen[sort] = true;
// Continue search.
// At the next level we will look for new selection
// for old box of sort
box = selected_box;
}
}

// translate result from box_of_cookie[] to index of cookie in box
peek_t res;
for (int c = 0; c < N_BOXES; ++c) {
int b = box_of_cookie[c];
for (int k = 0; k < BOX_SIZE; ++k) {
if (boxes.b[b][k] == c) {
res.b[b] = k;
break;
}
}
}
return res;
}

And here is somewhat less simple variant. It is not necessarily easy to
follow by itself, but if one studied the code above then enhancements
made here are rather obvious.
What is interesting about it is that despite absence of 'parallel'
tricks this variant is the fastest that I measured so far!

#include <stdint.h>
#include <stdbool.h>
#include "solver.h"

peek_t
solver( boxes_t boxes ){
uint8_t box_of_cookie[N_BOXES]; // valid for selected sorts
for (int i = 0; i < N_BOXES; i++)
box_of_cookie[i]=N_BOXES;// value N_BOXES marks unselected sort

uint8_t ci_min[N_BOXES];
// all cookies [0:ci_min[i]-] in box i are selected
peek_t res;
for(int i = 0; i < N_BOXES; ++i) {
typedef struct {
uint8_t box, ci;
} trace_t;
trace_t levels[N_BOXES];
bool sort_seen[N_BOXES] = {0};
ci_min[i] = 0;
for (int lvl = 0, box = i; ;) {
if (ci_min[box] != BOX_SIZE) {
// look for unselected sort in the box
bool done = false;
for (int ci = ci_min[box]; ci < BOX_SIZE; ++ci) {
int sort = boxes.b[box][ci];
if (box_of_cookie[sort] == N_BOXES) {
// sort unselected.
// It means that search at step i succeed
// Assign new sort
res.b[box] = ci;
box_of_cookie[sort] = box;
// set new high water mark
ci_min[box] = ci + 1;
// commit preserved assignments
for (int k = 0; k < lvl; ++k) {
int box = levels[k].box;
int ci = levels[k].ci;
res.b[box] = ci;
box_of_cookie[boxes.b[box][ci]] = box;
}
done = true;
break;
}
}
if (done)
break;
ci_min[box] = BOX_SIZE;
}
// all cookies in the box belong to selected sorts

// look for unseen sort in box
bool back = true;
for (int ci = 0; ci < BOX_SIZE; ++ci) {
int sort = boxes.b[box][ci];
if (!sort_seen[sort]) {
// Unseen sort found
// If current branch of search succeed then
// it will become a new selection for box
// Save new selection for sort.
// Saved box also serves us
// if we return to the current level from above
levels[lvl++] = (trace_t){ .ci=ci, .box=box};
// Mark sort as seen.
// That mark guarantees forward progress of the search
sort_seen[sort] = true;
// Continue the search.
// Look for new selection for the old box of sort
box = box_of_cookie[sort];
back = false;
break;
}
}
if (back) // no unseen sorts in box
box = levels[--lvl].box; // return to previous level
}
}
return res;
}

--- Synchronet 3.21f-Linux NewsLink 1.2

From Tim Rentsch@[email protected] to comp.lang.c on Tue Apr 21 18:22:34 2026

From Newsgroup: comp.lang.c

Michael S <[email protected]> writes:

On Fri, 17 Apr 2026 07:40:13 -0700
Tim Rentsch <[email protected]> wrote:

My questions are these. One, could you understand what the code is
doing and how it works? And two, if given an appropriate choice of
what permutation is done in best_next_ctype(), does this code do the
same thing as your code? I was trying to match the behavior of your
code but I wasn't able to figure out how it decided where to go
next.

Here is the simplest form of the algorithm that I can come with.
Essentially, it is a serial non-recursive variant of the code of your
friend with no algorithmic or technical enhancements.
Even in this form it is not slow and hopefully makes the key idea more obvious.

I have difficulty reading code written in the style used below. I
might describe it broadly as too many trees and not enough forest.
That comment is not meant as criticism but given just for
informational value.

My hope has been that my explanations, given first in prose and then
in code, would let you understand how I approached the problem, so
you could see how your approach differs from mine and then explain
what the differences are. As things stand I don't know what are the
common points of reference, or even whether there are any important
ones.

Having said that, let me ask this. What is your goal? I find
myself wondering if we are talking past each other.

#include <stdint.h>
#include <stdbool.h>
#include "solver.h"

peek_t
solver( boxes_t boxes ){
uint8_t box_of_cookie[N_BOXES]; // valid for selected sorts
for (int i = 0; i < N_BOXES; i++)
box_of_cookie[i] = N_BOXES; // value N_BOXES marks unselected sort

// Selection proceeds in progressive steps.
// At each step we select one new box and one new sort.
// A box and a sort selected at step i are never unselected later
// although association between selected boxes and sorts
// can be changed
for (int step = 0; step < N_BOXES; ++step) {
typedef struct {
uint8_t box, sort;
} trace_t;
trace_t levels[N_BOXES];
bool seen[N_BOXES] = {0};
for (int lvl = 0, box = step; ;) {
// look for unseen sort in box
int sort;
for (int ci = 0; ci < BOX_SIZE; ++ci) {
sort = boxes.b[box][ci];
if (!seen[sort])
break;
}
if (seen[sort]) { // no unseen sorts in box
box = levels[--lvl].box; // return to previous level
continue;
}
// Unseen sort found
// It will become a new selection for box
int selected_box = box_of_cookie[sort];
if (selected_box == N_BOXES) {
// sort unselected.
// it means that search at step i succeed
// commit preserved assignments
for (int k = 0; k < lvl; ++k)
box_of_cookie[levels[k].sort] = levels[k].box;
// assign new sort
box_of_cookie[sort] = box;
break;
}

// sort already selected
// Save new assignment for sort.
// Saved box also serves us if we return
// to the current level from above
levels[lvl++] = (trace_t){ .sort=sort, .box=box};
// Mark sort as seen.
// That mark guarantees forward progress of the search
seen[sort] = true;
// Continue search.
// At the next level we will look for new selection
// for old box of sort
box = selected_box;
}
}

// translate result from box_of_cookie[] to index of cookie in box
peek_t res;
for (int c = 0; c < N_BOXES; ++c) {
int b = box_of_cookie[c];
for (int k = 0; k < BOX_SIZE; ++k) {
if (boxes.b[b][k] == c) {
res.b[b] = k;
break;
}
}
}
return res;
}

Another question: is the above typical of how you usually write
code, or are you writing it like that for my benefit?
--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@[email protected] to comp.lang.c on Wed Apr 22 12:05:30 2026

From Newsgroup: comp.lang.c

On Tue, 21 Apr 2026 18:22:34 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Fri, 17 Apr 2026 07:40:13 -0700
Tim Rentsch <[email protected]> wrote:

My questions are these. One, could you understand what the code is
doing and how it works? And two, if given an appropriate choice of
what permutation is done in best_next_ctype(), does this code do
the same thing as your code? I was trying to match the behavior
of your code but I wasn't able to figure out how it decided where
to go next.

Here is the simplest form of the algorithm that I can come with. Essentially, it is a serial non-recursive variant of the code of
your friend with no algorithmic or technical enhancements.
Even in this form it is not slow and hopefully makes the key idea
more obvious.

I have difficulty reading code written in the style used below. I
might describe it broadly as too many trees and not enough forest.
That comment is not meant as criticism but given just for
informational value.

My hope has been that my explanations, given first in prose and then
in code, would let you understand how I approached the problem, so
you could see how your approach differs from mine and then explain
what the differences are. As things stand I don't know what are the
common points of reference, or even whether there are any important
ones.

Having said that, let me ask this. What is your goal? I find
myself wondering if we are talking past each other.

My general goal is to show that while this challenge may look like
variant of travelling salesman problem, it actually is not.
It is not NP complete and has many solutions with mean complexity of
little above M*N and worst case complexity still below M*N*N.
My 2nd goal is to outline common characteristics of various robust
solutions ignoring as much as possible technical details:
- N progressive steps
- within each step, forward progress due to recording of failed
attempts ('seen' bit array).

#include <stdint.h>
#include <stdbool.h>
#include "solver.h"

peek_t
solver( boxes_t boxes ){
uint8_t box_of_cookie[N_BOXES]; // valid for selected sorts
for (int i = 0; i < N_BOXES; i++)
box_of_cookie[i] = N_BOXES; // value N_BOXES marks unselected
sort

// Selection proceeds in progressive steps.
// At each step we select one new box and one new sort.
// A box and a sort selected at step i are never unselected later
// although association between selected boxes and sorts
// can be changed
for (int step = 0; step < N_BOXES; ++step) {
typedef struct {
uint8_t box, sort;
} trace_t;
trace_t levels[N_BOXES];
bool seen[N_BOXES] = {0};
for (int lvl = 0, box = step; ;) {
// look for unseen sort in box
int sort;
for (int ci = 0; ci < BOX_SIZE; ++ci) {
sort = boxes.b[box][ci];
if (!seen[sort])
break;
}
if (seen[sort]) { // no unseen sorts in box
box = levels[--lvl].box; // return to previous level
continue;
}
// Unseen sort found
// It will become a new selection for box
int selected_box = box_of_cookie[sort];
if (selected_box == N_BOXES) {
// sort unselected.
// it means that search at step i succeed
// commit preserved assignments
for (int k = 0; k < lvl; ++k)
box_of_cookie[levels[k].sort] = levels[k].box;
// assign new sort
box_of_cookie[sort] = box;
break;
}

// sort already selected
// Save new assignment for sort.
// Saved box also serves us if we return
// to the current level from above
levels[lvl++] = (trace_t){ .sort=sort, .box=box};
// Mark sort as seen.
// That mark guarantees forward progress of the search
seen[sort] = true;
// Continue search.
// At the next level we will look for new selection
// for old box of sort
box = selected_box;
}
}

// translate result from box_of_cookie[] to index of cookie in box
peek_t res;
for (int c = 0; c < N_BOXES; ++c) {
int b = box_of_cookie[c];
for (int k = 0; k < BOX_SIZE; ++k) {
if (boxes.b[b][k] == c) {
res.b[b] = k;
break;
}
}
}
return res;
}

Another question: is the above typical of how you usually write
code, or are you writing it like that for my benefit?

Do you mean, relatively long comments distributed through the code vs concentrated at the beginning?
I don't know, because I don't try to remember such things. I'd have to
read a sizable corpus of my own production code in order to give an
answer.

--- Synchronet 3.21f-Linux NewsLink 1.2

Who's Online
Recent Visitors
- D2sk
  Sat Apr 25 08:20:10 2026
  from Fort Smith, Ar via Raw
- Noozle
  Sat Apr 25 08:05:40 2026
  from Noozle City via Telnet
- Microbot
  Sat Apr 25 02:25:30 2026
  from Moore, Ok via Telnet
- Kaptain_Krawdad
  Fri Apr 24 21:39:52 2026
  from Southern Il via Telnet

System Info

Sysop:	DaiTengu
Location:	Appleton, WI
Users:	1,114
Nodes:	10 (0 / 10)
Uptime:	492515:55:16
Calls:	14,267
Calls today:	3
Files:	186,321
D/L today:	27,599 files (9,006M bytes)
Messages:	2,518,520

Re: Cookies in boxes - algorithmic challenge

Who's Online

Recent Visitors

System Info