Hacker News
A deep dive into SmallVector::push_back
someonebaggy
im3w1l
The gain is minimal when this optimization is done at one location, but doing it everywhere could matter. A push_back in a loop could perhaps be optimized into a single allocation and a memcpy.
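A minimal sketch of the difference, using std::vector<int> as a stand-in for SmallVector (an assumption; the article's type is not reproduced here). Both functions are hypothetical helpers: `count_reallocations` counts how often the buffer's capacity changes during a push_back loop, with or without an up-front reserve(), and `append_in_bulk` does one reservation followed by a single range insert, which for a trivially copyable type like int library implementations commonly perform as a bulk memmove/memcpy.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times the capacity changes (i.e. the buffer is
// reallocated) while pushing n ints one at a time. With reserve_first,
// a single allocation is made before the loop instead.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);
    }
    std::size_t reallocations = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != cap) {
            ++reallocations;
            cap = v.capacity();
        }
    }
    return reallocations;
}

// The bulk alternative: one allocation sized for the whole range, then a
// single range insert instead of element-by-element pushes.
std::vector<int> append_in_bulk(const std::vector<int>& src) {
    std::vector<int> out;
    out.reserve(src.size());                        // one allocation
    out.insert(out.end(), src.begin(), src.end());  // one bulk copy
    return out;
}
```

With geometric growth, `count_reallocations(1000, false)` reports several reallocations, while `count_reallocations(1000, true)` reports none.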
tialaramex
For example, Bjarne Stroustrup suggests that you should use reservation for "avoiding invalidation of iterators" instead.
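A small sketch of reserving for iterator stability with std::vector. The standard guarantees that push_back invalidates references and iterators only when the new size exceeds capacity(), so after a sufficient reserve() a pointer into the buffer survives the appends. `first_after_appends` is a hypothetical helper for illustration, not something taken from Stroustrup or the article.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends `extra` values to v while holding a pointer to its first element.
// The pointer stays valid only because reserve() made room for every new
// element up front, so none of the push_back calls reallocates.
int first_after_appends(std::vector<int>& v, std::size_t extra) {
    assert(!v.empty());                  // need a first element to point at
    v.reserve(v.size() + extra);
    const int* first = &v[0];            // taken after the reservation
    for (std::size_t i = 0; i < extra; ++i) {
        v.push_back(static_cast<int>(i));
    }
    assert(first == v.data());           // same buffer: no reallocation
    return *first;
}
```

Without the reserve() call, the same loop could reallocate and leave `first` dangling.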
oneshtein
tialaramex
Zig calls the analogous methods ensureTotalCapacity and ensureTotalCapacityPrecise.
It's worth mentioning that talk of "exact" and "precise" may be misleading, because it probably makes sense for the container to notice that the actual allocation was big enough for slightly more items and to increase its capacity accordingly (and despite the word "exact", Vec explicitly says it might do this).
Right now, many (most?) allocators, if asked for enough room to store 7 Doodads, each 7 bytes in size with one-byte alignment (49 bytes in total), may give you 64 bytes because that was easier, but can't or won't tell you so. Rust's GlobalAlloc can't do better, nor can C's malloc family. But Rust's (unstable) Allocator trait and many modern malloc descendants can tell you: "hey, here are 64 bytes instead." Once that's fixed, your growable array type should take it into account: 64 bytes is enough for 9 Doodads instead of 7, so the extra capacity is free.
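The Doodad arithmetic can be modelled with a toy size-class policy. `round_to_size_class` is hypothetical (power-of-two classes with a 16-byte minimum) and allocates nothing; it stands in for an allocator that reports the block size it actually handed out, so a growable array can derive its real capacity from it. (C++23's std::allocator::allocate_at_least exposes a comparable idea in the standard library.)

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical size-class policy: round a request up to the next power of
// two, with a 16-byte minimum. Real allocators use their own size classes;
// this function only models the arithmetic.
std::size_t round_to_size_class(std::size_t requested_bytes) {
    std::size_t block = 16;
    while (block < requested_bytes) {
        block *= 2;
    }
    return block;
}

// Capacity a growable array could claim if the allocator reported the real
// block size, assuming alignment-1 elements packed back to back.
std::size_t usable_capacity(std::size_t wanted_elems, std::size_t elem_size) {
    assert(elem_size > 0);
    std::size_t block = round_to_size_class(wanted_elems * elem_size);
    return block / elem_size;  // leftover bytes smaller than one element go unused
}
```

For 7 Doodads of 7 bytes each, the 49-byte request lands in the 64-byte class, and 64 / 7 rounds down to 9 elements.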
dzaima
A problem with directly exposing that extra capacity is that it makes precise sanitizing impossible, since you'd have to tolerate some reads/writes beyond the intended bounds. (And making the sanitizer always hand out exact-size allocations would also be bad, since that would leave untested the code paths that may break when allocations aren't exact.)
dzaima
On the push implementation in the article: for non-x86 targets (and perhaps even on x86 for performance, though not for size/instruction count), it would be better to let the size increment reuse the size value already read by the capacity check. Because C++ lacks suitable aliasing information, the interleaved memcpy/store prevents the compiler from making this choice itself.
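A sketch of the two shapes, using a hypothetical fixed-capacity `IntBuf` rather than the article's SmallVector. `push_back_reloading` reads the `size` field again after the memcpy, while `push_back_cached` loads it once into a local and reuses it for the capacity check, the store address and the increment. Whether a given compiler actually emits a reload for the first version depends on its alias analysis, so the generated assembly is the real test, not this sketch.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity buffer of ints; not the article's SmallVector.
struct IntBuf {
    int* data;
    std::size_t size;
    std::size_t capacity;

    // Reads `size` for the check, then again after the memcpy. Since `data`
    // could in principle point at this object's own fields, the compiler may
    // be unable to prove the copy left `size` untouched and may reload it.
    bool push_back_reloading(int value) {
        if (size >= capacity) return false;
        std::memcpy(data + size, &value, sizeof value);
        size = size + 1;  // second read of the field
        return true;
    }

    // Loads `size` once into a local, so the increment cannot depend on
    // whatever the memcpy might alias.
    bool push_back_cached(int value) {
        std::size_t n = size;
        if (n >= capacity) return false;
        std::memcpy(data + n, &value, sizeof value);
        size = n + 1;  // reuses the value read by the capacity check
        return true;
    }
};
```

Both functions have the same observable behaviour on well-formed input; they differ only in what the compiler is allowed to assume.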
RossBencina
Interesting. I understand why it does that, but it makes me realise that I usually think "the compiler will reuse the loaded value/perform CSE" without considering the cases where it won't. Are there tools that will detect and warn/indicate this situation? e.g. "warning: could not reuse previously loaded value of 'foo' due to aliasing hazard 'memcpy' at line 234."