this post was submitted on 18 Aug 2023
24 points (92.9% liked)

Programming

From Russ Cox

Lumping both non-portable and buggy code into the same category was a mistake. As time has gone on, the way compilers treat undefined behavior has led to more and more unexpectedly broken programs, to the point where it is becoming difficult to tell whether any program will compile to the meaning in the original source. This post looks at a few examples and then tries to make some general observations. In particular, today's C and C++ prioritize performance to the clear detriment of correctness.

I am not claiming that anything should change about C and C++. I just want people to recognize that the current versions of these sacrifice correctness for performance. To some extent, all languages do this: there is almost always a tradeoff between performance and slower, safer implementations. Go has data races in part for performance reasons: we could have done everything by message copying or with a single global lock instead, but the performance wins of shared memory were too large to pass up. For C and C++, though, it seems no performance win is too small to trade against correctness.

[–] [email protected] 1 points 1 year ago (1 children)

I'm not sure about C, but C++ often describes certain things as "implementation-defined";
one such example is casting a pointer to a sufficiently large integral type.

For example, you can assign a float* p0 to a size_t i, then i to a float* p1 and expect that p0 == p1.
Here the compiler is free to choose how to calculate i, but other than that the compiler's behavior is predictable.

"Undefined behavior" is not "machine-dependent" code - it's "these are seemingly fine instructions that do not make sense when put together, so we're allowing the compiler to assume that you're not going to do this" code.

That said, UB is typically the result of "clever" programming that ignores best practices, aside from extreme cases prosecuted by the fiercest language lawyers (like empty while(true) loops that may or may not boot Skynet, or that one time that atan2(0,0) erased from this universe all traces of Half Life 3).

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

For example, you can assign a float* p0 to a size_t i, then i to a float* p1 and expect that p0 == p1. Here the compiler is free to choose how to calculate i, but other than that the compiler’s behavior is predictable.

I don't think this specific example is true, but I get the broader point. And yes, "implementation defined" is probably the better term for this class of "undefined in the language spec but still reliable" behavior.

“Undefined behavior” is not “machine-dependent” code

In C, that's exactly what it is (or rather, there is some undefined-in-the-spec behavior which is machine dependent). I feel like I keep just repeating myself -- dereferencing 0 is one of those things, overflowing an int is one of those things. It can't be in the C language spec because it's machine-dependent, but it's also not "undefined" in the sense you're talking about ("clever" programming by relying on something outside the spec that's not really official or formally reliable.) The behavior you get is defined, in the manual for your OS or processor, and perfectly consistent and reliable.

I'm taking the linked author at his word that these things are termed as "undefined" in the language spec. If what you're saying is that they should be called "implementation defined" and "undefined" should mean something else, that makes 100% sense to me and I can get behind it.

The linked author seems to think that because those things exist (whatever we call them), C is flawed. I'm not sure what solution he would propose other than doing away with the whole concept of code that compiles down close to the bare metal... in which case what kernel does he want to switch to for his personal machine?