When compilers surprise you
Written by me, proof-read by an LLM.
Details at end.
Every now and then a compiler will surprise me with a really smart trick. When I first saw this optimisation I could hardly believe it. I was looking at loop optimisation, and wrote something like this simple function that sums all the numbers up to a given value:
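(A sketch of the sort of function in question - the name sumToValue and the local variable names below are illustrative guesses, chosen to line up with the assembly comments later, rather than the original listing:)

```c
// Illustrative sketch: sum every integer below the given value.
int sumToValue(int value) {
    int result = 0;
    for (int x = 0; x < value; ++x)
        result += x;   // accumulates 0 + 1 + ... + (value - 1)
    return result;
}
```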
So far so decent: GCC has done some preliminary checks, then fallen into a loop that efficiently sums numbers using lea (we've seen this before). But taking a closer look at the loop we see something unusual:
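```asm
.L3:
        lea     edx, [rdx+1+rax*2]   ; result = result + 1 + x*2
        add     eax, 2               ; x += 2
        cmp     edi, eax             ; x != value
        jne     .L3                  ; keep looping
```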
The compiler has cleverly realised it can do two numbers at a time, using the fact it can see we're going to add x and x + 1, which is the same as adding x*2 + 1. Very cunning, I think you'll agree!¹
If you turn the optimiser up to -O3 you'll see the compiler works even harder to vectorise the loop using parallel adds. All very clever.
This is all for GCC. Let's see what clang does with our code:
This is where I nearly fell off my chair: there is no loop! Clang checks for a positive value, and if so it does:
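```asm
        lea     eax, [rdi-1]         ; eax = value - 1
        lea     ecx, [rdi-2]         ; ecx = value - 2
        imul    rcx, rax             ; rcx = (value - 1) * (value - 2)
        shr     rcx                  ; rcx >>= 1
        lea     eax, [rdi+rcx]       ; eax = value + rcx
        dec     eax                  ; --eax
        ret
```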
It was not at all obvious to me what on earth was going on here. By backing out the maths a little, this is equivalent to:
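    v + ((v - 1)(v - 2) / 2) - 1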
Expanding the parentheses:
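    v + (v² - 2v - v + 2) / 2 - 1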
Rearranging a bit:
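    (v² - 3v + 2) / 2 + (v - 1)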
Multiplying the (v - 1) by 2 / 2:
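    (v² - 3v + 2) / 2 + (2v - 2) / 2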
Combining those and cancelling:
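    (v² - v) / 2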
Simplifying and factoring gives us v(v - 1) / 2, which is the closed-form solution to the "sum of integers"! Truly amazing - we've gone from an O(n) algorithm as written, to an O(1) one!²
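As a quick sanity check, taking value = 5: the loop as sketched above adds 0 + 1 + 2 + 3 + 4 = 10, and the closed form gives 5 × 4 / 2 = 10.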
I love that, despite working with compilers for more than twenty years, they can still surprise and delight me. The years of experience and work that have been poured into making compilers great are truly humbling, and inspiring.
We're nearly at the end of this series - there's so much more to say but that will have to wait for another time. Tomorrow will be a little different: see you then!
See the video that accompanies this post.
This post is day 24 of Advent of Compiler Optimisations 2025, a 25-day series exploring how compilers transform our code.
← Switching it up a bit | Thank you →
This post was written by a human (Matt Godbolt) and reviewed and proof-read by LLMs and humans.
Support Compiler Explorer on Patreon or GitHub, or by buying CE products in the Compiler Explorer Shop.
¹ Some of the initial code checks for odd/even and accounts accordingly.
² Why does the compiler emit this exact sequence and not a slightly more straightforward sequence? I think it's partly avoiding overflow in cases where it might otherwise overflow, and partly just a side effect of the way clang tracks and infers induction variables. I really don't know for sure, though.