Aliasing
Written by me, proof-read by an LLM.
Details at end.
we ended on a bit of a downer: aliasing stopped optimisations dead in their tracks. I know this is supposed to be the , not the ! Knowing why your compiler can't optimise is just as important as knowing all the clever tricks it can pull off.YesterdayAdvent of Compiler OptimisationsAdvent of Compiler Giving Up
Let's take a simple example of a counter class. It accumulates integers into a member variable . I've used C++ templates to show two versions of the code: one that accumulates in an and one that accumulates in an .1totalintlong
At first glance, the two loop bodies look almost identical, as you might expect. In one case we'll accumulate in , and the other in , right? The truth is more subtle. Let's first look at the case:eaxraxint
Looks pretty reasonable, right? Now let's look at the version:long
The first change from the 32-bit case you'll notice is the to turn the 32-bit signed integer into a 64-bit signed integer. That's all but free on modern CPUs, and so while it looks like the loop is doing more work than the 32-bit version, it's not as bad as it seems.movsx
The important difference here is the update of : In the first version, each loop iteration updates the member variable . In the second version everything remains in registers until the end of the loop, and then is only updated at the end. CPU caches are super fast, but it's still best to avoid redundant stores in hot loops!totaltotaltotal
So, why this difference? Of course it's aliasing: In the version the compiler can't be sure that the passed in to doesn't cover the 's member variable . They are the same type, and so that would be perfectly OK by the type-based aliasing rules of C++.intspancountCountertotal
In the version, the types differ ( vs ), and under C++'s strict aliasing rules, it would be undefined behaviour for them to overlap in memory. Since the compiler can assume the program doesn't invoke undefined behaviour, it knows they don't alias and can safely optimise. That lets it cache the in a register and only update the member variable at the end. As we'll see later in the series, being aliasing-free helps other optimisations too, like vectorisation.longintlongtotal
To fix this issue, we have a couple of choices. The easy way would be to rewrite to accumulate in a local variable, and then update the total at the end: Using would fix this be more intention-revealing.total += std::accumulate(span.begin(), span.end(), 0)and
The other, non-standard way to work around this issue is to use . This GNU extension (borrowed from C) lets us decorate pointers and essentially promises that this pointer uniquely refers to the object it points at. In our case, the thing we need to prove is unique is the 's "this pointer" itself. Adding after the parameter list (where you would add for a const member function) works. But again - this is very non-standard, so use at your peril.__restrictCounter__restrictconst2
Aliasing is one of C++'s sneakier gotchas, especially when you're working with base types like and - you can't avoid using them, and they're prime candidates for aliasing with each other. It's perfectly legal to have overlapping same-type pointers, so the compiler assumes the worst and peppers your hot loops with memory accesses. The fix may be as simple as only using local variables within your loop - but first you have to spot it. Fire up , look for those unexpected writes to memory when you'd expect everything to stay in registers, and you'll know when aliasing is holding you back!intfloat3Compiler Explorer
See that accompanies this post.the video
This post is day 15 of , a 25-day series exploring how compilers transform our code.Advent of Compiler Optimisations 2025
← | →When LICM fails usCalling all arguments
This post was written by a human () and reviewed and proof-read by LLMs and humans.Matt Godbolt
.Support Compiler Explorer on or , or by buying CE products in the PatreonGitHubCompiler Explorer Shop
moveax,DWORDPTR[rdi]; eax = total.L3:addeax,DWORDPTR[rsi]; add element to totaladdrsi,4; move to next elementmovDWORDPTR[rdi],eax; total = eaxcmprsi,rdx; are we finished?jne.L3; loop if notretmovrax,QWORDPTR[rdi]; rax = total.L9:movsxrdx,DWORDPTR[rsi]; read & sign extend next elementaddrsi,4; move to next elementaddrax,rdx; rax += elementcmprcx,rsi; are we finished?jne.L9; loop if notmovQWORDPTR[rdi],rax; total = raxret; returnThis example is taken from something similar that actually bit me in my job a few years ago. I was bewildered until I worked out that it was aliasing! ↩
Try making the change in the Compiler Explorer window above. You'll see the update move outside of the loop. If you're not scared to jump ahead a bit, make it and you'll see the vectorised code, but only with .
total-O3__restrict↩Aliasing is particularly bad in C and C++. In other languages like Rust and Fortran, aliasing can't happen at all, or is severely restricted. This can help their optimisers considerably. ↩