On Mon, 12 Jan 2026 08:03:31 -0800...
Andrey Tarasevich <[email protected]> wrote:
On Mon 1/12/2026 6:28 AM, Michael S wrote:
According to C Standard, access to p->table[4] in foo1() is UB.
...
Now the question.
What The Standard says about foo2() ? Is there UB in foo2() as
well?
gcc code generator does not think so.
Do you have citation from the Standard?
But I was interested in the "opinion" of C Standard rather than of gcc compiler.
Is it full nasal UB or merely "implementation-defined behavior"?
Maybe. But it's not expressed by the gcc code generator or by any warnings.
So, how can we know?
I am reading N3220 draft https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf
Here section 6.5.6 has no paragraph 8 :(
There is amplification in Annex J.2, roughly three pages
after the start of J.2. You can search for "an array
subscript is out of range", where there is a clarifying
example.
I see the following text:
"An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.7)."
That's what you had in mind?
On Mon, 12 Jan 2026 21:09:25 -0500
"James Russell Kuyper Jr." <[email protected]> wrote:
That's the difference - I did understand. In your struct, other_table is
On 2026-01-12 15:02, Scott Lurndal wrote:
Michael S <[email protected]> writes:...
On Mon, 12 Jan 2026 15:58:15 +0000
bart <[email protected]> wrote:
On 12/01/2026 14:28, Michael S wrote:
...struct bar1 {
int table[4];
int other_table[4];
};
So you want to deliberately read one element past the end because
you know it will be the first element of other_table?
Yes. I primarily want it for multi-dimensional arrays.
So declare it as int table[4][4].
Note that this suggestion does not make the behavior defined. It is
undefined behavior to dereference table[0]+4, and it is
undefined behavior to make any use of table[0]+5.
Pay attention that Scott didn't suggest that dereferencing table[0][4]
in his example is defined.
Not that I understood what he wanted to suggest :(
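For the multi-dimensional use case, one way to stay inside defined behavior is to allocate a single flat array and compute the index by hand. This is only a minimal sketch: the names ROWS, COLS, grid, cell and sum_all are made up for illustration and do not come from anyone's post.

```c
#include <stddef.h>

enum { ROWS = 4, COLS = 4 };

/* One array object of ROWS*COLS ints, so every index used below stays inside it. */
static int grid[ROWS * COLS];

/* Row/column view of the flat array (hypothetical helper). */
static int *cell(size_t row, size_t col)
{
    return &grid[row * COLS + col];
}

/* Linear walk over all cells; crossing a "row boundary" is plain indexing here. */
static long sum_all(void)
{
    long sum = 0;
    for (size_t i = 0; i < (size_t)ROWS * COLS; i++)
        sum += grid[i];
    return sum;
}
```

The point of the design is that grid[4] is an ordinary in-bounds element, whereas table[0][4] in an int table[4][4] is an out-of-range subscript on the inner array.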
On 2026-01-10, Michael S <[email protected]> wrote:
On Fri, 9 Jan 2026 20:14:04 -0000 (UTC)
Kaz Kylheku <[email protected]> wrote:
On 2026-01-09, Michael S <[email protected]> wrote:
On Fri, 09 Jan 2026 01:42:53 -0800
Tim Rentsch <[email protected]> wrote:
The important thing to realize is that the fundamental issue here
is not a technical question but a social question. In effect what
you are asking is "why doesn't gcc (or clang, or whatever) do what
I want or expect?". The answer is different people want or expect
different things. For some people the behavior described is
egregiously wrong and must be corrected immediately. For other
people the compiler is acting just as they think it should,
nothing to see here, just fix the code and move on to the next
bug. Different people have different priorities.
I have a hard time imagining the sort of people who would
object if the compiler generated the same code as today but
issued a diagnostic.
If false positives occur for the diagnostic frequently, there
will be legitimate complaint.
If there is only a simple switch for it, it will get turned off
and then it no longer serves its purpose of catching errors.
There are all kinds of optimizations compilers commonly do that could
also be erroneous situations. For instance, eliminating dead code.
<snip>
I am not talking about some general abstraction, but about specific
case.
Your example is irrelevant.
-Warray-bounds has existed for a long time.
-Warray-bounds=1 is a part of -Wall set.
In your particular example, it is crystal clear that the "return 0"
statement is elided away due to being considered unreachable, and the
only reason for that can be undefined behavior, and the only undefined behavior is accessing the array out of bounds.
The compiler has decided to use the undefined behavior of the OOB array access as an unreachable() assertion, and at the same time neglected to
issue the -Warray-bounds diagnostic which is expected to be issued for
OOB access situations that the compiler can identify.
No one can claim that the OOB situation in the code has escaped identification, because a code-eliminating optimization was predicated
on it.
It looks as if the logic for identifying OOB accesses for diagnosis is
out of sync with the logic for identifying OOB accesses as assertions of undefined behavior.
On 2026-01-13 16:54, Tristan Wibberley wrote:[...]
[...]IIRC indexing a table follows the rules of pointers and doing so
outside of a table's bounds is generally U/B except for very peculiar
specific cases. You can do it in a struct across members /sometimes/
because a struct is a single object. ...
No, there is no such exception in the standard. It is still undefined behavior. One of the most annoying ways undefined behavior can
manifest is that you get exactly the same behavior that you
incorrectly thought you were guaranteed to get. That's a problem,
because it can leave you unaware of your error.
Combining these, and padding requirements, you can definedly reach
On 2026-01-12 13:08, Michael S wrote:
On Mon, 12 Jan 2026 15:58:15 +0000...
bart <[email protected]> wrote:
...struct bar1 {
union {
struct {
int table[4];
int other_table[4];
};
int xtable[8];
};
};
I'm not even sure about there being no padding between .table and
.other_table.
Considering that they both 'int' I don't think that it could happen,
even in standard C.
"Each non-bit-field member of a structure or union object is aligned in
an implementation-defined manner appropriate to its type." (6.7.3.2p16)
"... There can be unnamed padding within a structure object, but not
at its beginning." (6.7.3.2p17)
While I can't think of any good reason for an implementation to insert padding between those objects, it would not violate any requirement of
the standard if one did.
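If the layout assumption matters, it can at least be checked at compile time. The following is a sketch, assuming a C11 compiler (anonymous members and static_assert); bar1_get is a hypothetical helper, not something from the thread. As far as I can tell, reading through the 8-element union member is the route the standard's union rules permit, whereas p->table[4] remains undefined even when the assertion holds.

```c
#include <assert.h>
#include <stddef.h>

struct bar1 {
    union {
        struct {
            int table[4];
            int other_table[4];
        };
        int xtable[8];
    };
};

/* Refuse to compile if the implementation put padding after table. */
static_assert(offsetof(struct bar1, other_table) == 4 * sizeof(int),
              "unexpected padding between table and other_table");

/* Index 0..7 through the 8-element member, never through table[4]. */
static int bar1_get(const struct bar1 *p, size_t i)
{
    return p->xtable[i];
}
```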
On 2026-01-10, Michael S <[email protected]> wrote:
On Fri, 9 Jan 2026 20:14:04 -0000 (UTC)
Kaz Kylheku <[email protected]> wrote:
On 2026-01-09, Michael S <[email protected]> wrote:
On Fri, 09 Jan 2026 01:42:53 -0800
Tim Rentsch <[email protected]> wrote:
The important thing to realize is that the fundamental issue
here is not a technical question but a social question. In
effect what you are asking is "why doesn't gcc (or clang, or
whatever) do what I want or expect?". The answer is different
people want or expect different things. For some people the
behavior described is egregiously wrong and must be corrected
immediately. For other people the compiler is acting just as
they think it should, nothing to see here, just fix the code
and move on to the next bug. Different people have different
priorities.
I have a hard time imagining the sort of people who would
object if the compiler generated the same code as today but
issued a diagnostic.
If false positives occur for the diagnostic frequently, there
will be legitimate complaint.
If there is only a simple switch for it, it will get turned off
and then it no longer serves its purpose of catching errors.
There are all kinds of optimizations compilers commonly do that
could also be erroneous situations. For instance, eliminating dead
code.
<snip>
I am not talking about some general abstraction, but about specific
case.
Your example is irrelevant.
-Warray-bounds has existed for a long time.
-Warray-bounds=1 is a part of -Wall set.
In your particular example, it is crystal clear that the "return 0"
statement is elided away due to being considered unreachable, and the
only reason for that can be undefined behavior, and the only undefined behavior is accessing the array out of bounds.
The compiler has decided to use the undefined behavior of the OOB
array access as an unreachable() assertion, and at the same time
neglected to issue the -Warray-bounds diagnostic which is expected to
be issued for OOB access situations that the compiler can identify.
No one can claim that the OOB situation in the code has escaped identification, because a code-eliminating optimization was predicated
on it.
It looks as if the logic for identifying OOB accesses for diagnosis is
out of sync with the logic for identifying OOB accesses as assertions
of undefined behavior.
In some situations, a surprising optimization occurs not because of
undefined behavior, but because the compiler is assuming well-defined behavior (absence of UB).
That's not the case here; it is relying on the presence of UB.
Or rather, it is relying on the absence of UB in an asinine way:
it is assuming that the program does not reach the out-of-bounds
access, because the sought-after value is found in the array.
But that reasoning requires awareness of the existence of the
out-of-bounds access.
That's the crux of the issue there.
There is an unreachable() assertion in modern C. And it works by
invoking undefined behavior; it means "let's have undefined behavior
in this spot of the code". And then, since the compiler assumes
behavior is well-defined, assumes that that statement is not reached,
nor anything after it, and can eliminate it.
The problem is that an OOB array access should not be treated
as the same thing, as if it were unreachable(). Or, rather, no,
sure it's okay to treat an OOB array access as unreachable() --- IF
you generate the diagnostic about OOB array access that you
were asked to generate!!!
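For reference, here is a minimal sketch of the explicit form of that assertion. It assumes either a C23 library (unreachable() from <stddef.h>) or the GCC/Clang builtin __builtin_unreachable(); the wrapper macro ASSUME_UNREACHABLE and the function quadrant_mask are invented for this example.

```c
#include <stddef.h>
#include <stdlib.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#  define ASSUME_UNREACHABLE() unreachable()            /* C23, <stddef.h> */
#elif defined(__GNUC__)
#  define ASSUME_UNREACHABLE() __builtin_unreachable()  /* GCC/Clang builtin */
#else
#  define ASSUME_UNREACHABLE() abort()                  /* conservative fallback */
#endif

/* All four values of (q & 3u) are handled, so control never leaves the switch. */
static int quadrant_mask(unsigned q)
{
    switch (q & 3u) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 4;
    case 3: return 8;
    }
    ASSUME_UNREACHABLE();
}
```

Here the programmer has written the promise down, so nobody is surprised when the compiler uses it. The complaint above is about the compiler deriving the same promise silently from an out-of-bounds access.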
Perhaps the exception Tristan was referring to (though it doesn't apply
to indexing) is this, in N3220 6.5.10p7:
The idea, I think, is that without that paragraph, given something like
this:
#include <stdio.h>
int main(void) {
struct {
int a[10];
int b[10];
} obj;
printf("obj.a+10 %s obj.b\n",
obj.a+10 == obj.b ? "==" : "!=");
}
the compiler would have to go out of its way to treat obj.a+10 and obj.b
as unequal
On 14/01/2026 06:02, Keith Thompson wrote:
The idea, I think, is that without that paragraph, given something
like this:
#include <stdio.h>
int main(void) {
struct {
int a[10];
int b[10];
} obj;
printf("obj.a+10 %s obj.b\n",
obj.a+10 == obj.b ? "==" : "!=");
}
the compiler would have to go out of its way to treat obj.a+10 and
obj.b as unequal
No it wouldn't. The standard could have just made the comparison
undefined behaviour or unspecified, or implementation specified in all
those cases when dereferencing was undefined or unspecified.
No one can claim that the OOB situation in the code has escaped identification, because a code-eliminating optimization was predicated
on it.
On 14/01/2026 04:19, James Russell Kuyper Jr. wrote:
On 2026-01-12 13:08, Michael S wrote:
On Mon, 12 Jan 2026 15:58:15 +0000...
bart <[email protected]> wrote:
...struct bar1 {
union {
struct {
int table[4];
int other_table[4];
};
int xtable[8];
};
};
I'm not even sure about there being no padding between .table and
.other_table.
Considering that they both 'int' I don't think that it could happen,
even in standard C.
"Each non-bit-field member of a structure or union object is aligned in
an implementation-defined manner appropriate to its type." (6.7.3.2p16)
"... There can be unnamed padding within a structure object, but not
at its beginning." (6.7.3.2p17)
Does this allow different alignment rules for a type when it is
stand-alone, in an array, or in a struct? I don't think so - I have
always interpreted this to mean that the alignment is tied to the type,
not where the type is used.
Thus if "int" has 4-byte size and 4-byte alignment, and you have :
struct X {
char a;
int b;
int c;
int ds[4];
};
then there will be 3 bytes of padding between "a" and "b", but cannot be
any between "b" and "c" or between "c" and "ds".
Even if you have a weird system that has, say, 3-byte "int" with 4-byte alignment, where you would have a byte of padding between "b" and "c",
you would have the same padding there as between "ds[0]" and "ds[1]".
(None of this means you are allowed to access data with "p[i]" or "p +
i" outside of the range of the object that "p" points to or into.)
While I can't think of any good reason for an implementation to insert
padding between those objects, it would not violate any requirement of
the standard if one did.
David Brown <[email protected]> wrote:
On 14/01/2026 04:19, James Russell Kuyper Jr. wrote:
On 2026-01-12 13:08, Michael S wrote:
On Mon, 12 Jan 2026 15:58:15 +0000
bart <[email protected]> wrote:
...
struct bar1 {
union {
struct {
int table[4];
int other_table[4];
};
int xtable[8];
};
};
...
I'm not even sure about there being no padding between .table and
.other_table.
Considering that they both 'int' I don't think that it could happen,
even in standard C.
"Each non-bit-field member of a structure or union object is aligned in
an implementation-defined manner appropriate to its type." (6.7.3.2p16)
"... There can be unnamed padding within a structure object, but not
at its beginning." (6.7.3.2p17)
Does this allow different alignment rules for a type when it is
stand-alone, in an array, or in a struct? I don't think so - I have
always interpreted this to mean that the alignment is tied to the type,
not where the type is used.
Thus if "int" has 4-byte size and 4-byte alignment, and you have :
struct X {
char a;
int b;
int c;
int ds[4];
};
then there will be 3 bytes of padding between "a" and "b", but cannot be
any between "b" and "c" or between "c" and "ds".
Why not? Assuming 4 byte int with 4 byte alignment I see nothing
wrong with adding 4 byte padding between b and c.
More precisely,
implementation could say that after first int field in a struct
there is always 4 byte padding. AFAICS alignment constraints
and initial segment rule are satisfied, padding is not at start
of the struct. Are there any other restrictions?
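Rather than argue about what a given implementation does, one can measure it with offsetof. A small sketch; the pad_* helper names are hypothetical, and the numbers they return are whatever the compiler in use chose, not something the standard promises.

```c
#include <stddef.h>

struct X {
    char a;
    int b;
    int c;
    int ds[4];
};

/* Bytes of padding the implementation actually placed after each member. */
static size_t pad_after_a(void)
{
    return offsetof(struct X, b) - (offsetof(struct X, a) + sizeof(char));
}

static size_t pad_after_b(void)
{
    return offsetof(struct X, c) - (offsetof(struct X, b) + sizeof(int));
}

static size_t pad_after_c(void)
{
    return offsetof(struct X, ds) - (offsetof(struct X, c) + sizeof(int));
}

/* Trailing padding after the last member. */
static size_t pad_at_end(void)
{
    return sizeof(struct X) - (offsetof(struct X, ds) + sizeof(int[4]));
}
```

On the common ABIs with 4-byte int I would expect 3, 0, 0 and 0, but that is an observation about those ABIs, not a guarantee.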
On 14/01/2026 04:19, James Russell Kuyper Jr. wrote:
On 2026-01-12 13:08, Michael S wrote:
On Mon, 12 Jan 2026 15:58:15 +0000...
bart <[email protected]> wrote:
...struct bar1 {
union {
struct {
int table[4];
int other_table[4];
};
int xtable[8];
};
};
I'm not even sure about there being no padding between .table and
.other_table.
Considering that they both 'int' I don't think that it could happen,
even in standard C.
"Each non-bit-field member of a structure or union object is aligned
in an implementation-defined manner appropriate to its type."
(6.7.3.2p16)
"... There can be unnamed padding within a structure object, but not
at its beginning." (6.7.3.2p17)
Does this allow different alignment rules for a type when it is
stand-alone, in an array, or in a struct? I don't think so - I have
always interpreted this to mean that the alignment is tied to the
type, not where the type is used.
Thus if "int" has 4-byte size and 4-byte alignment, and you have :
struct X {
char a;
int b;
int c;
int ds[4];
};
then there will be 3 bytes of padding between "a" and "b", but cannot
be any between "b" and "c" or between "c" and "ds".
Even if you have a weird system that has, say, 3-byte "int" with
4-byte alignment, where you would have a byte of padding between "b"
and "c", you would have the same padding there as between "ds[0]" and "ds[1]".
David Brown <[email protected]> writes:
On 14/01/2026 04:19, James Russell Kuyper Jr. wrote:
On 2026-01-12 13:08, Michael S wrote:
On Mon, 12 Jan 2026 15:58:15 +0000...
bart <[email protected]> wrote:
...struct bar1 {
union {
struct {
int table[4];
int other_table[4];
};
int xtable[8];
};
};
I'm not even sure about there being no padding between .table and
.other_table.
Considering that they both 'int' I don't think that it could happen,
even in standard C.
"Each non-bit-field member of a structure or union object is aligned
in an implementation-defined manner appropriate to its type."
(6.7.3.2p16)
"... There can be unnamed padding within a structure object, but not
at its beginning." (6.7.3.2p17)
Does this allow different alignment rules for a type when it is
stand-alone, in an array, or in a struct? I don't think so - I have
always interpreted this to mean that the alignment is tied to the
type, not where the type is used.
Note that the alignof operator applies to a type, not to an expression
or object.
Thus if "int" has 4-byte size and 4-byte alignment, and you have :
struct X {
char a;
int b;
int c;
int ds[4];
};
then there will be 3 bytes of padding between "a" and "b", but cannot
be any between "b" and "c" or between "c" and "ds".
There can be arbitrary padding between struct members, or after the last member. Almost(?) all implementations add padding only to satisfy
alignment requirements, but the standard doesn't state any restrictions. There can be no padding before the first member, and offsets of members
must be increasing.
If alignof (int) is 4, a compiler must place an int object at an address that's a multiple of 4. It's free to place it at a multiple of 8, or
16, if it chooses.
Even if you have a weird system that has, say, 3-byte "int" with
4-byte alignment, where you would have a byte of padding between "b"
and "c", you would have the same padding there as between "ds[0]" and
"ds[1]".
sizeof (int) == 3 and alignof (int) == 4 is not possible. Each type's
size is a multiple of its alignment. There is no padding between array elements.
On 14/01/2026 23:43, Keith Thompson wrote:...
sizeof (int) == 3 and alignof (int) == 4 is not possible. Each type's
size is a multiple of its alignment. There is no padding between array
elements.
I have not, as yet, found a justification for those statements in the standards. But I'll keep looking!
They follow from a couple of facts:
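One way to reconstruct the argument: the standard describes an array as a contiguously allocated set of objects, sizeof applied to an array yields the total number of bytes, and every element has to be suitably aligned. So the distance from one element to the next is sizeof the element, and that distance has to be a multiple of the alignment. A sketch of the checks, assuming C11 or later; struct mixed and stride_bytes are illustrative names only.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* An array of N elements occupies exactly N element sizes: no room for padding. */
static_assert(sizeof(int[4]) == 4 * sizeof(int),
              "array elements are contiguous");

/* Every element must be aligned, so the stride is a multiple of the alignment. */
static_assert(sizeof(int) % alignof(int) == 0,
              "sizeof(int) is a multiple of alignof(int)");
static_assert(sizeof(double) % alignof(double) == 0,
              "sizeof(double) is a multiple of alignof(double)");

/* The same holds for structs, which is why they can carry trailing padding. */
struct mixed { char tag; double value; };
static_assert(sizeof(struct mixed) % alignof(struct mixed) == 0,
              "struct size is a multiple of its alignment");

/* Distance in bytes between two adjacent array elements. */
static size_t stride_bytes(void)
{
    int a[2] = {0, 0};
    return (size_t)((char *)&a[1] - (char *)&a[0]);
}
```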
On 14/01/2026 23:43, Keith Thompson wrote:[...]
There can be arbitrary padding between struct members, or after the
last member. Almost(?) all implementations add padding only to
satisfy alignment requirements, but the standard doesn't state any
restrictions. There can be no padding before the first member, and
offsets of members must be increasing.
On closer reading, I agree with you here. I find it a little
surprising that this is not implementation-defined. If an
implementation can arbitrarily add extra padding within a struct, it
severely limits the use of structs in contexts outside the current translation unit.
David Brown <[email protected]> writes:
On 14/01/2026 23:43, Keith Thompson wrote:[...]
There can be arbitrary padding between struct members, or after the
last member. Almost(?) all implementations add padding only to
satisfy alignment requirements, but the standard doesn't state any
restrictions. There can be no padding before the first member, and
offsets of members must be increasing.
On closer reading, I agree with you here. I find it a little
surprising that this is not implementation-defined. If an
implementation can arbitrarily add extra padding within a struct, it
severely limits the use of structs in contexts outside the current
translation unit.
In practice, struct layouts are (I think) typically specified by
a system's ABI, and ABIs generally permit/require only whatever
padding is necessary to meet alignment requirements.
And I think C has rules about type compatibility that are intended to
cover the same struct definition being used in different translation
units within a program, though I'm too lazy to look up the details.
[...]
On 14/01/2026 23:43, Keith Thompson wrote:
David Brown <[email protected]> writes:
On 14/01/2026 04:19, James Russell Kuyper Jr. wrote:
There can be arbitrary padding between struct members, or after the last
member. Almost(?) all implementations add padding only to satisfy
alignment requirements, but the standard doesn't state any restrictions.
There can be no padding before the first member, and offsets of members
must be increasing.
On closer reading, I agree with you here. I find it a little surprising
that this is not implementation-defined. If an implementation can
arbitrarily add extra padding within a struct, it severely limits the
use of structs in contexts outside the current translation unit.
David Brown <[email protected]> writes:
On 14/01/2026 23:43, Keith Thompson wrote:
David Brown <[email protected]> writes:
On 14/01/2026 04:19, James Russell Kuyper Jr. wrote:
There can be arbitrary padding between struct members, or after the last
member. Almost(?) all implementations add padding only to satisfy
alignment requirements, but the standard doesn't state any restrictions.
There can be no padding before the first member, and offsets of members
must be increasing.
On closer reading, I agree with you here. I find it a little surprising
that this is not implementation-defined. If an implementation can
arbitrarily add extra padding within a struct, it severely limits the
use of structs in contexts outside the current translation unit.
Including representing typical networking packet headers as structs.
Fortunately, most C compilers have some form of __attribute__((packed))
to inform the compiler that the structure layout should not be padded.
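A minimal sketch of what that looks like with the GCC/Clang spelling of the attribute. This is a compiler extension, not standard C, and the header layout and field names below are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 7-byte wire header; packed removes the padding after version. */
struct wire_header {
    uint8_t  version;
    uint32_t length;
    uint16_t checksum;
} __attribute__((packed));

/* Same members without the attribute, for comparison. */
struct natural_header {
    uint8_t  version;
    uint32_t length;
    uint16_t checksum;
};
```

Note that packing only settles the layout. Byte order still has to be handled separately, and taking the address of a misaligned member such as length can cause trouble on some targets.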
On Sun, 11 Jan 2026 11:48:08 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
On Fri, 09 Jan 2026 01:42:53 -0800
Tim Rentsch <[email protected]> wrote:
highcrew <[email protected]> writes:
Hello,
While I consider myself reasonably good as C programmer, I still
have difficulties in understanding undefined behavior.
I wonder if anyone in this NG could help me.
Let's take an example. There's plenty here:
https://en.cppreference.com/w/c/language/behavior.html
So let's focus on https://godbolt.org/z/48bn19Tsb
For the lazy, I report it here:
int table[4] = {0};
int exists_in_table(int v)
{
// return true in one of the first 4 iterations
// or UB due to out-of-bounds access
for (int i = 0; i <= 4; i++) {
if (table[i] == v) return 1;
}
return 0;
}
This is compiled (with no warning whatsoever) into:
exists_in_table:
mov eax, 1
ret
table:
.zero 16
Well, this is *obviously* wrong. And sure, so is the original
code, but I find it hard to think that the compiler isn't able to
notice it, given that it is even "exploiting" it to produce very
efficient code.
I understand the formalism: the resulting assembly is formally
"correct", in that UB implies that anything can happen.
Yet I can't think of any situation where the resulting assembly
could be considered sensible. The compiled function will
basically return 1 for any input, and the final program will be
buggy.
Wouldn't it be more sensible to have a compilation error, or
at least a warning? The compiler will be happy even with -Wall
-Wextra -Werror.
There's plenty of documentation, articles and presentations that
explain how this can make very efficient code... but nothing
will answer this question: do I really want to be efficiently
wrong?
I mean, yes I would find the problem, thanks to my 100% coverage
unit testing, but couldn't the compiler give me a hint?
Could someone drive me into this reasoning? I know there is a lot
of thinking behind it, yet everything seems to me very incorrect!
I'm in deep cognitive dissonance here! :) Help!
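For comparison, a sketch of the function with the off-by-one removed. The only changes from the posted code are the loop bound, which is now derived from the array itself, and the index type; the local name count is mine.

```c
#include <stddef.h>

int table[4] = {0};

int exists_in_table(int v)
{
    /* Element count taken from the array, so the bound cannot drift from its size. */
    const size_t count = sizeof table / sizeof table[0];

    for (size_t i = 0; i < count; i++) {
        if (table[i] == v) return 1;
    }
    return 0;
}
```

With the bound fixed there is no path that must end in undefined behavior, so the compiler has no basis for folding the function to "return 1".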
The important thing to realize is that the fundamental issue here
is not a technical question but a social question. In effect what
you are asking is "why doesn't gcc (or clang, or whatever) do what
I want or expect?". The answer is different people want or expect
different things. For some people the behavior described is
egregiously wrong and must be corrected immediately. For other
people the compiler is acting just as they think it should,
nothing to see here, just fix the code and move on to the next
bug. Different people have different priorities.
I have a hard time imagining the sort of people who would object
if the compiler generated the same code as today but issued a
diagnostic.
It depends on what the tradeoffs are. For example, given a
choice, I would rather have an option to prevent this particular
death-by-UB optimization than an option to issue a diagnostic.
Having both costs more effort than having just only one.
Me too.
But there are limits to what is considered negotiable by worshippers of
nasal demons and what is beyond that. Warning is negotiable, turning
off the transformation is most likely beyond.
On Mon, 12 Jan 2026 12:03:36 -0800
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
On Mon, 12 Jan 2026 08:03:31 -0800
Andrey Tarasevich <[email protected]> wrote:
On Mon 1/12/2026 6:28 AM, Michael S wrote:
According to C Standard, access to p->table[4] in foo1() is UB.
...
Now the question.
What The Standard says about foo2() ? Is there UB in foo2() as
well?
Yes, in the same sense as in `foo1`.
gcc code generator does not think so.
It definitely does.
Right.
Maybe. But it's not expressed by the gcc code generator or by any
warnings. So, how can we know?
Do you have citation from the Standard?
The short answer is section 6.5.6 paragraph 8.
I am reading N3220 draft https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf
Here section 6.5.6 has no paragraph 8 :(
There is amplification in Annex J.2, roughly three pages
after the start of J.2. You can search for "an array
subscript is out of range", where there is a clarifying
example.
I see the following text:
"An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.7)."
That's what you had in mind?
On Mon 1/12/2026 9:36 AM, Michael S wrote:
But I was interested in the "opinion" of C Standard rather than of gcc
compiler.
Is it full nasal UB or merely "implementation-defined behavior"?
It is full nasal UB per the standard. And, of course, it is as "implementation-defined" as any other UB in a sense that the standard
permits implementations to _extend_ the language in any way they
please, as long as they don't forget to issue diagnostics when
diagnostics are required (by the standard).