well, O3 definitely helped for the aleph 4 years ago when we set it.
but i always figured we would eventually want to strike a different balance to free up more code space.
(of course profiling is good, but it’s also hard for something like bees (an object-oriented patching environment) running on something like avr32. my method for evaluating O2 vs O3 back then was just to make a large scene with long chains of operators fired by different events (metro, UI and USB) and measure idle time in the main event loop. it’s hard to get very granular about function calls without setting GPIO flips in particular functions, and that’s not super helpful for the bigger picture.)
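for reference, the idle-time approach might look something like this minimal sketch. it assumes a main loop that polls an event queue and a periodic timer interrupt that closes each measurement window. every name here (idle_count, handle_pending_event, idle_timer_tick, idle_percent) is hypothetical, not the actual aleph/bees code, and the event source is stubbed out so it compiles on a host.

```c
#include <stdint.h>

/* Hypothetical sketch: none of these names come from the aleph/bees source. */

static volatile uint32_t idle_count = 0;    /* loop passes that found no work */
static volatile uint32_t idle_snapshot = 0; /* result of the last completed window */

/* Stand-in for the real event queue: on the target this would service
   metro, UI and USB events; here it just consumes a simulated backlog. */
static uint32_t pending_events = 0;

static int handle_pending_event(void) {
    if (pending_events == 0) {
        return 0;           /* nothing to do: this pass counts as idle */
    }
    pending_events--;       /* pretend we ran one operator chain */
    return 1;
}

/* One pass of the main event loop. */
void main_loop_iteration(void) {
    if (!handle_pending_event()) {
        idle_count++;
    }
}

/* On hardware this would run from a periodic timer interrupt
   (e.g. once per second), closing one measurement window. */
void idle_timer_tick(void) {
    idle_snapshot = idle_count;
    idle_count = 0;
}

/* Idle passes under load as a percentage of an empty-scene baseline. */
uint32_t idle_percent(uint32_t loaded, uint32_t baseline) {
    if (baseline == 0) {
        return 0;
    }
    return (uint32_t)((uint64_t)loaded * 100u / baseline);
}
```

comparing idle_percent for the same heavy scene across two builds gives a single number per flag set, which is about as granular as this method gets.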
the question is which flags in O3 matter, and my suspicion is that -finline-functions is unfortunately a big one for both speed and size.
i don’t think the alignment flags in O2 (-falign-functions, -falign-jumps, -falign-loops, -falign-labels) matter much, because avr32 doesn’t allow unaligned accesses anyway.
gonna try some tests explicitly setting/unsetting some of the O3 and O2 flags that Os would disable…
and well yeah, we certainly talked about inlining and the inline keyword, flags, and attributes. but there are a lot of other flags in O2 and O3 that i frankly don’t understand at all; i’d be interested to hear any other specific ideas about them.
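for anyone following along, here’s a minimal sketch of the per-function attribute approach, assuming gcc (`always_inline` and `noinline` are standard gcc function attributes; clang accepts them too). the idea would be to build the whole project at a size-friendly level and pin only the hot helpers inline, instead of letting -finline-functions decide globally. the function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helpers, not taken from the bees source. */

/* Small and called from hot paths: ask gcc to inline it even when the
   global flags (e.g. -Os, or -fno-inline-small-functions) would not. */
static inline __attribute__((always_inline))
int32_t clamp_i32(int32_t x, int32_t lo, int32_t hi) {
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* Bigger and rarely called: keep one out-of-line copy so that
   -finline-functions can't duplicate its body into every caller. */
__attribute__((noinline))
int32_t scale_param(int32_t x, int32_t num, int32_t den) {
    if (den == 0) {
        return 0;                       /* avoid division by zero */
    }
    int64_t scaled = ((int64_t)x * num) / den;  /* widen to avoid overflow */
    return clamp_i32((int32_t)scaled, -32768, 32767);
}
```

the trade-off is that these annotations live in the source and have to be maintained by hand, whereas flags apply uniformly.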
ok, starting a more granular investigation of the gcc optimization flags on avr32. i immediately discovered that not all the flags are actually supported by avr32-gcc.
spinning this into a GH issue for further discussion:
https://github.com/monome/aleph/issues/292
update:
so i did a bunch of builds with O2 plus individual additional flags. as expected, -finline-functions and -finline-small-functions emit a lot more code, but so do -fpeel-loops (complete peeling, i.e. fully unrolling loops with a small constant trip count) and -fipa-cp-clone (cloning externally visible functions that are called with constant arguments).
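to make the size cost concrete, here’s a hand-written illustration of roughly what those two transforms do. gcc does this internally and you’d never write it out like this; the function names are hypothetical, and the "clone" and "peeled" versions are just approximations of the shape of the generated code.

```c
#include <stdint.h>

/* Generic version: sums `n` samples scaled by `gain`. */
int32_t sum_scaled(const int16_t *buf, int n, int32_t gain) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += buf[i] * gain;
    }
    return acc;
}

/* Roughly what -fipa-cp-clone may produce if call sites pass n == 4:
   a specialized copy with the constant propagated in. The generic body
   usually stays too (for other callers), so total size goes up. */
static int32_t sum_scaled_n4(const int16_t *buf, int32_t gain) {
    int32_t acc = 0;
    for (int i = 0; i < 4; i++) {
        acc += buf[i] * gain;
    }
    return acc;
}

/* Roughly what complete peeling (-fpeel-loops) then does with a loop whose
   trip count is a small constant: the loop disappears and its body is
   repeated, trading loop overhead for more instructions. */
static int32_t sum_scaled_n4_peeled(const int16_t *buf, int32_t gain) {
    return buf[0] * gain + buf[1] * gain + buf[2] * gain + buf[3] * gain;
}
```

all three return the same value; the difference is only in how many copies of the body end up in flash.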
for this particular codebase (aleph bees) and avr32-gcc, the following set produces almost no size increase compared to -O2:
```makefile
OPTIMIZATION = -O2 \
    -fpredictive-commoning \
    -ftree-loop-distribution \
    -fexpensive-optimizations \
    -funswitch-loops \
    -fgcse-after-reload
```
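for reference, the gcc docs describe -fpredictive-commoning as reusing computations (especially memory loads and stores) performed in previous iterations of loops. here’s a hand-written sketch of the effect on a made-up two-tap loop: the in[i + 1] loaded in one iteration is the in[i] of the next, so it can be carried in a register instead of reloaded. names are hypothetical, and the sketch assumes `out` and `in` don’t overlap (the compiler has to prove that before it can do this).

```c
#include <stdint.h>

/* Made-up two-tap filter: out[i] = in[i] + in[i + 1].
   `in` must hold at least n + 1 elements; `out` must not overlap `in`. */
void two_tap(int32_t *out, const int32_t *in, int n) {
    for (int i = 0; i < n; i++) {
        out[i] = in[i] + in[i + 1];   /* two loads per iteration */
    }
}

/* Same result with the reuse that predictive commoning looks for made
   explicit: each element of `in` is loaded once and carried forward. */
void two_tap_commoned(int32_t *out, const int32_t *in, int n) {
    if (n <= 0) {
        return;                       /* match the original: read nothing */
    }
    int32_t prev = in[0];
    for (int i = 0; i < n; i++) {
        int32_t next = in[i + 1];     /* one load per iteration */
        out[i] = prev + next;
        prev = next;
    }
}
```

in a simple case like this the transform saves loads without duplicating the loop body, which would fit with it adding little code.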
i’m gonna do a profiling session with a few of these flags, but i expect that predictive commoning and -fgcse-after-reload could both be pretty valuable for speed. -ftree-loop-distribution is probably useless, and -funswitch-loops doesn’t seem to be doing much for this particular codebase, because we don’t like to put conditionals in loops in the first place.
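per the gcc docs, -funswitch-loops moves branches with loop-invariant conditions out of the loop and duplicates the loop for each outcome. a minimal hand-written sketch with made-up names: the second function is roughly the shape the first gets turned into. it’s also more or less how we’d already write it by hand, which would explain why the flag finds little to do here.

```c
#include <stdint.h>

/* Hypothetical example: `invert` never changes inside the loop,
   but the branch is still evaluated on every iteration. */
void apply_gain(int32_t *buf, int n, int32_t gain, int invert) {
    for (int i = 0; i < n; i++) {
        if (invert) {
            buf[i] = -buf[i] * gain;
        } else {
            buf[i] = buf[i] * gain;
        }
    }
}

/* Unswitched form: test the invariant once, then run one of two
   specialized loops. Fewer branches at run time, but two loop bodies. */
void apply_gain_unswitched(int32_t *buf, int n, int32_t gain, int invert) {
    if (invert) {
        for (int i = 0; i < n; i++) {
            buf[i] = -buf[i] * gain;
        }
    } else {
        for (int i = 0; i < n; i++) {
            buf[i] = buf[i] * gain;
        }
    }
}
```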