this post was submitted on 24 Jan 2024
Re: "the guy has no clue what std::unique_ptr is", are you saying that because of his assertion that unique_ptr has a non-zero cost, whereas Rust's Box does not?
He's actually correct about that, although the difference is fairly minimal, and I believe the difference is outweighed by the unwinding (i.e. panic/exception handling) code that needs to be generated in both cases. But with unwinding disabled, you can see clearly that Rust generates exactly the same code for a Box as for a raw pointer, whereas C++ does not:
The reason I looked into this is a Chandler Carruth talk, primarily about unique_ptr, called "There Are No Zero-Cost Abstractions", which explains in detail why C++ fundamentally can't optimize unique_ptr to generate the same code as a raw pointer.
That's a bad apples-to-oranges comparison: unique_ptr frees memory upon destruction, which the raw pointer version doesn't do. The least you could do is use rvalue references. The class layout of unique_ptr is also hard to optimize away (unless via LTO) because consume isn't in the same translation unit and the compiler has to keep your binary ABI-compatible with the rest of your binaries. (Also, you're using Clang 9, by the way; we are at version 17 now.)
This is much fairer: https://godbolt.org/z/v4PYcd8hf
Then, if you additionally make the functions' bodies accessible to the compiler and add a free to the raw pointer version (for fairness, if you insist on having consume or foo destroy the resource), you should get almost identical assembly (still with an extra indirection, visible as an extra mov, because the C++ compiler still doesn't see how you use them, but IMO that should still be a textbook case for LTO). The non-zero difference should disappear altogether once you actually use those functions, and if it doesn't, you absolutely should file a bug report.
Carruth, while an excellent presenter, has been on a "C++ standard committee bad, why don't we do more ABI-breaking changes, y'all suck, Abseil and Carbon rule" rant spree, which basically materialized in Google stopping active participation in Clang (I haven't followed the drama since then, so I'm not sure whether Google backtracked on that decision). It's hard to consider him objective about this since he also has the Carbon project, and his recent Carbon talks are painful to watch, as it's hard to ignore how he's going from the "C++ optimization chad" he used to be to a Google marketing/sales person.
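To make the fairness point above concrete (unique_ptr frees on destruction, a bare raw pointer call does not), here is a minimal sketch rather than the code behind any of the linked Godbolt examples. The names Tracked, g_destroyed, g_seen, consume_raw, consume_raw_and_free and consume_unique are hypothetical and exist only for this illustration. It counts destructor calls for each call shape, with all function bodies visible in one translation unit.

```cpp
#include <memory>

// Hypothetical counter: how many Tracked objects have been destroyed so far.
static int g_destroyed = 0;
// Hypothetical sink so the callee bodies do observable work.
static int g_seen = 0;

struct Tracked {
    int value;
    explicit Tracked(int v) : value(v) {}
    ~Tracked() { ++g_destroyed; }
};

// Raw pointer as in the original comparison: reads, never frees.
void consume_raw(Tracked* p) { g_seen += p->value; }

// "Fair" raw pointer version: the callee also releases the resource.
void consume_raw_and_free(Tracked* p) {
    g_seen += p->value;
    delete p;
}

// unique_ptr by value: the parameter owns the object, so the object is
// destroyed when the parameter's lifetime ends.
void consume_unique(std::unique_ptr<Tracked> p) { g_seen += p->value; }

// Each helper returns how many objects were destroyed by the call itself.
int destroyed_by_raw() {
    int before = g_destroyed;
    Tracked* p = new Tracked(1);
    consume_raw(p);
    int n = g_destroyed - before;
    delete p;  // cleanup happens outside the measurement
    return n;
}

int destroyed_by_raw_and_free() {
    int before = g_destroyed;
    consume_raw_and_free(new Tracked(2));
    return g_destroyed - before;
}

int destroyed_by_unique() {
    int before = g_destroyed;
    consume_unique(std::make_unique<Tracked>(3));
    return g_destroyed - before;
}
```

Only the second and third shapes do the same amount of work, which is why the comparison is only like-for-like when the raw pointer version also frees.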
I intentionally crafted an example where the code is simply using unique_ptr (and Box) without freeing the memory, just as it uses the raw pointer without freeing it. The consumes function would of course free it, hence the name. Freeing the memory shouldn't be all that different between free, ~unique_ptr, and Box::drop.
Moreover, the Rust code is doing the same thing the C++ code is doing; Box frees memory just like unique_ptr does.
I was surprised to see how much lower-overhead that looks, and I couldn't remember why I originally wrote the example as passing by value until I reviewed Carruth's video. But he actually talks about using rvalue references around the 22 minute mark, and then goes back to passing by value, so I assume that's why I wrote it the way I did. I do think it's pretty counterintuitive that a type that's semantically a pointer needs to be passed by reference for efficiency.
The "class layout" of unique_ptr is just a pointer; are you talking about the struct needing to be on the stack in order to satisfy the ABI? That's true, but people do in fact need to pass data between multiple different translation units (and even into and out of dynamically-loaded libraries), so that should be possible to do in an efficient manner. And, again, both the raw-pointer version and the Rust version manage to make this work.
Oops, good catch; I crafted this example a long time ago and did try it with the most recent version, but I guess that must have been in a different tab. But it doesn't actually make much of a difference here.
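On the "class layout" point, a small sketch of what can be checked from inside the language. It assumes a mainstream standard library (libstdc++, libc++ or MSVC's), where unique_ptr with the default deleter is pointer-sized; the standard does not guarantee that. The helper trivially_passable is a hypothetical name, and it is only an approximation of the rule that ABIs such as the Itanium C++ ABI use to decide whether a class argument can travel in a register: roughly, a type with a non-trivial destructor or move constructor is passed by hidden reference instead.

```cpp
#include <memory>
#include <type_traits>

// Assumption: true on mainstream standard libraries, not mandated by the
// standard. With the default (stateless) deleter the object is one pointer.
constexpr bool unique_ptr_is_pointer_sized =
    sizeof(std::unique_ptr<int>) == sizeof(int*);

// Hypothetical helper, an approximation of "trivial for the purposes of
// calls": trivially copyable and trivially destructible types can be
// passed like plain data.
template <class T>
constexpr bool trivially_passable() {
    return std::is_trivially_copyable<T>::value &&
           std::is_trivially_destructible<T>::value;
}

// A raw pointer qualifies; unique_ptr does not, because its destructor and
// move constructor are user-provided.
constexpr bool raw_pointer_passable = trivially_passable<int*>();
constexpr bool unique_ptr_passable = trivially_passable<std::unique_ptr<int>>();
```

So the size is the same as a raw pointer, but the non-trivial special members are what push a by-value unique_ptr argument onto the stack under the usual calling conventions.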
Yes, sure, compiling in one translation unit helps, but as I mentioned above, passing an owning pointer between translation units shouldn't be inherently inefficient. But also, as far as I can tell, making those changes doesn't actually make the unique_ptr and raw-pointer assembly equivalent. The && in the signature for "consumes" is odd because the function doesn't actually take ownership of the pointer so it doesn't actually free it, and consequently the inlining of the function is a no-op and the destructor is called inside foo. But that doesn't hinder the raw-pointer comparison much, because the C version just inlines consumes. I don't read assembly well enough to understand whether the extra mov in the unique_ptr version is very significant or why it exists. (The print_global function is only here to prevent the other functions from being turned into no-ops.)
https://godbolt.org/z/83T8Gfszv
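The point about && not transferring ownership by itself can be sketched as follows. This is an illustration under assumed names (g_total, consumes_by_ref, consumes_and_takes and the two checker functions are hypothetical), not the code from the linked example: a function taking unique_ptr&& only takes ownership if its body actually moves from the parameter.

```cpp
#include <memory>
#include <utility>

// Hypothetical sink so the callees do observable work.
static int g_total = 0;

// Takes an rvalue reference but never moves from it: the caller's
// unique_ptr still owns the object afterwards and frees it itself.
void consumes_by_ref(std::unique_ptr<int>&& p) { g_total += *p; }

// Moves the parameter into a local, so ownership really is transferred
// and the object is freed when `owned` goes out of scope.
void consumes_and_takes(std::unique_ptr<int>&& p) {
    std::unique_ptr<int> owned = std::move(p);
    g_total += *owned;
}

bool caller_still_owns_after_ref() {
    auto p = std::make_unique<int>(1);
    consumes_by_ref(std::move(p));  // std::move is only a cast here
    return p != nullptr;
}

bool caller_empty_after_take() {
    auto p = std::make_unique<int>(2);
    consumes_and_takes(std::move(p));
    return p == nullptr;  // a moved-from unique_ptr holds nullptr
}
```

In the first case the destructor runs in the caller, which matches the observation above that the destructor ends up inside foo.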
Abseil is...a collection of C++ libraries? How does that make him biased against the C++ standards committee? Carbon was announced in 2022, and the talk I linked was given in 2019, so I don't know if Carruth was on his "rant spree" in your opinion at that point. But the point of linking to Carruth's talk was just to explain where that example originally came from and to let someone more knowledgeable than myself explain why it would require ABI breakage for C++ to optimize unique_ptr as well as Rust optimizes Box.
The reason I said to use rvalue references is because otherwise it is an apples-to-oranges comparison: in the C++ code you have implicit ABI decisions around the call convention and whose responsibility it is to destroy the temporary.
https://godbolt.org/z/9875qMM6Y (or alternatively: https://godbolt.org/z/9xehs3sYP)
The assembly is identical, the ownership is clearly transferred, and this doesn't need LTO or looking at the function bodies and is entirely done by the C++ compiler. It involves using (when available) a vendor attribute (see trivial_abi; this shouldn't be an issue given Rust devs are fine with having only one compiler anyway) and writing a UniquePtr class (which shouldn't be used in production code; what I've given there is only for illustration purposes) that assumes that the custom deleter cannot have an internal state.
This is a zero-runtime-cost abstraction. Whether zeroing that cost depends on which ABI assumptions you're ready to make, or on LTO, is another matter. We're literally discussing a "problem" that is not really a problem because Rust doesn't yet have the luxury of having that problem: you're easily forgetting that Rust has only one compiler.
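For readers who don't want to open the links, here is a rough sketch of the kind of class being described. It is not the class from the linked Godbolt examples: SKETCH_TRIVIAL_ABI, this UniquePtr and take_and_read are hypothetical names written for this illustration. It assumes Clang's [[clang::trivial_abi]] attribute when compiling with Clang and silently falls back to the ordinary ABI elsewhere, and it hard-codes delete as a stateless deleter.

```cpp
#include <utility>

// Assumption: only Clang understands this vendor attribute. On other
// compilers the macro expands to nothing and the class uses the normal ABI.
#if defined(__clang__)
#define SKETCH_TRIVIAL_ABI [[clang::trivial_abi]]
#else
#define SKETCH_TRIVIAL_ABI
#endif

// Illustrative only, not production code: a single-pointer owner whose
// deleter is always `delete`, so there is no deleter state to store.
template <class T>
class SKETCH_TRIVIAL_ABI UniquePtr {
public:
    explicit UniquePtr(T* p = nullptr) noexcept : ptr_(p) {}
    UniquePtr(UniquePtr&& other) noexcept : ptr_(other.release()) {}
    UniquePtr& operator=(UniquePtr&& other) noexcept {
        if (this != &other) reset(other.release());
        return *this;
    }
    UniquePtr(const UniquePtr&) = delete;
    UniquePtr& operator=(const UniquePtr&) = delete;
    ~UniquePtr() { delete ptr_; }

    T* get() const noexcept { return ptr_; }
    T& operator*() const noexcept { return *ptr_; }

    // Gives up ownership without freeing.
    T* release() noexcept {
        T* p = ptr_;
        ptr_ = nullptr;
        return p;
    }

    // Replaces the owned pointer and frees the previous one.
    void reset(T* p = nullptr) noexcept {
        T* old = ptr_;
        ptr_ = p;
        delete old;
    }

private:
    T* ptr_;
};

// By-value sink: with trivial_abi the argument can be passed like a raw
// pointer, and the callee becomes responsible for destroying it.
int take_and_read(UniquePtr<int> p) { return *p; }
```

The design trade-off is the one discussed above: the attribute changes which side of the call destroys the argument, so it is an ABI decision rather than something the optimizer can do on its own.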
A project like that usually takes years, so again, it's very likely that they began working on it years before that. For instance, Google designed Go in 2007 and announced it in November 2009.
So...you had to make your own version of unique_ptr to make it zero-cost? Doesn't that just confirm the original statement you were disagreeing with, that unique_ptr has a small runtime cost? Or was there some other reason you thought the creator of the video you shared has "no idea" what unique_ptr is?
I also don't understand why the standard library can't use the trivial-abi attribute. Different implementations of the standard library aren't required to be interoperable, are they?
I still don't understand what you think is "apples-to-oranges" here. If you change the Rust code to require the C ABI, there's no difference in the generated code: https://godbolt.org/z/1xf9qG3n8