(Modern) C Programming Tips and Tricks


#102

I blame JavaScript. One mention and blammo, chaos ensues. :wink:


#103

just a word of caution: while function pointers (FPs) are certainly a powerful tool, their indiscriminate use can indeed slow things down on embedded systems with limited registers/stack.

the dereference mightn’t seem like a big deal, but it’s at least one extra instruction to fetch the address before the jump, and likely a few more to save/restore existing register contents. so figure 2-5 instructions for your FP call, vs 1 to jump to a known address/offset. in a realtime inner loop this could be significant.

lots of avr32 and blackfin code around here, so this is a relevant point - skektek managed to squeeze out a critical few cycles in aleph-waves by factoring out some over-clever pointer usage!

specifically, this usage here smells off to me - saving 6 lines of code doesn’t justify it. i’d prefer a regular old switch statement with inline functions. each case is then just a conditional jump (probably a single instruction.)
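to illustrate, here's the kind of switch-based dispatch i mean - a minimal sketch, with invented op names and trivial handler bodies just so it does something observable:

```c
#include <stdint.h>

/* hypothetical event codes and handlers - names and behavior are
   invented purely for illustration */
enum op { OP_NOTE, OP_CC, OP_CLOCK };

static inline uint8_t handle_note(uint8_t v)  { return (uint8_t)(v + 1); }
static inline uint8_t handle_cc(uint8_t v)    { return (uint8_t)(v >> 1); }
static inline uint8_t handle_clock(uint8_t v) { return v; }

/* each case compiles to a compare + direct jump; no indirect call
   through a pointer, so no extra fetch before the jump and no extra
   register save/restore for the indirect call */
static uint8_t dispatch(enum op o, uint8_t v) {
    switch (o) {
    case OP_NOTE:  return handle_note(v);
    case OP_CC:    return handle_cc(v);
    case OP_CLOCK: return handle_clock(v);
    }
    return 0;
}
```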

in fact with hindsight, i think the usage of FPs for monome serial protocols, while it is totally fine for libmonome, was not the best choice for aleph / libavr32. (not super critical since these protocols only get used at >=1ms intervals, but still.)

(for similar reasons, embedded developers are very careful with inheritance / virtuals in c++. generally prefer templates unless code size is a huge issue.)

(as an aside, i think that almost any time FPs are called for in non-throwaway code, it is considerate to typedef them. otherwise your pointer-to-function-accepting-pointer-to-function-accepting-pointer will be totally unreadable to future maintainers.)
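for example (all names here are made up), compare the typedef'd version with the raw spelling:

```c
#include <stddef.h>

/* a handler taking a sample and returning a sample (hypothetical) */
typedef int (*sample_fn)(int);

/* a processor that applies a handler to a buffer */
typedef void (*process_fn)(int *buf, size_t n, sample_fn fn);

static int double_sample(int x) { return x * 2; }

static void process(int *buf, size_t n, sample_fn fn) {
    for (size_t i = 0; i < n; i++)
        buf[i] = fn(buf[i]);
}

/* vs the raw spelling of process_fn without typedefs:
   void (*p)(int *, size_t, int (*)(int));
   already hard to read, and it only gets worse with
   another level of indirection */
```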


#104

Agreed 100%. The code in question was test code that ran on a desktop, so no concerns there, but for embedded code I wouldn’t do such a thing to save code lines.

It’s a delicate balance on the embedded hardware. I’ve been resisting inlining the code that I write, even when in the moment it’s an obvious candidate, simply because I’m not yet familiar enough with the pressures on flash memory.


#105

that is a very fair point; my reference to inlining was unnecessary and a red herring. sorry!

but since we’re here, we may as well dig into inline a bit, since there is a ton of confusion about it. i have partaken in this confusion: i was first told to use inline liberally in the 90s, as an alternative to macros, because inlining was “faster” - this is not really true anymore, but it is a pervasive misconception.

here are the relevant references:
http://en.cppreference.com/w/c/language/inline
https://gcc.gnu.org/onlinedocs/gcc/Inline.html

in a nutshell:

  1. inline was first introduced in c++, where you often see inline functions defined in header files. this tells the compiler that the function may be defined in multiple compilation units (and those definitions must in fact be the same.) it behaves a bit like static for header files, and is a bit of a misnomer because it is really about linkage and not inlining per se.

  2. in C, you most often see inline int func(void) { ... } in headers and extern inline int func(void); in exactly one .c file (which is where the single external definition gets emitted).

  3. there are other subtle differences. in C, function-local statics from different definitions of an inline function are distinct; in C++ they are the same!

  4. in C you also often see static inline func() {} in .c files and this is probably the usage we’re both thinking of.
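to make case 2 concrete, here’s a minimal sketch of the C99 idiom (assuming C99 inline semantics, which is gcc’s default these days; in real code the two halves live in separate files, as the comments indicate):

```c
/* --- would live in util.h --- */
/* inline definition: visible to every .c file that includes the
   header; the compiler may inline calls to it */
inline int clamp(int x, int lo, int hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

/* --- would live in exactly one util.c --- */
/* extern inline declaration: forces this translation unit to emit
   the one external (non-inlined) definition, for any callers the
   compiler chose not to inline */
extern inline int clamp(int x, int lo, int hi);
```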

the gcc docs make things a little more specific (thankfully)

When a function is both inline and static, if all calls to the function are integrated into the caller, and the function’s address is never used, then the function’s own assembler code is never referenced. In this case, GCC does not actually output assembler code for the function, unless you specify the option -fkeep-inline-functions.

most of the uses of inline in my code (relevantly, aleph and libavr32) are in this context. they are always very short functions, usually wrapping a single function call for the purposes of readability. i would argue that this usage does not actually increase the code size - no ASM is emitted for the function, and each call to it in the code is simply replaced with a different, less readable function call.

that said, it’s not really a useful optimization either way since the compiler will decide on its own what to inline (thank you very much) and in addition will consider optimization flags at compile time.

so on balance i think it’s very rare that using inline is actually a good idea, except in specific and unusual idioms (like case 2 above). IIRC it is actually necessary in c++ for certain template specializations… but that’s “OT” :slight_smile:

TL/DR: you’re right - kids, don’t use inline unless you really need it, and you probably don’t.


BTW, i should say that i in no way intended to nitpick your code, but just wanted to weigh in (alongside others) on the benefit / drawbacks of function pointers in embedded contexts. not because you haven’t considered it, but because this thread is “tips and tricks” and has a pedagogical / dialogue function.


#106

inline can be quite useful on AVR (the 8-bit) as an alternative to macros, which I try to avoid because they are confusing to debug and not generally type safe. The function call overhead on those processors is actually quite high. I often pair it with __attribute__((always_inline)); for some reason avr-gcc isn’t very good at figuring out that it should inline functions - maybe it’s because I use -Os or -O1 most of the time. I always check with avr-objdump -S to make sure things were compiled as expected. It’s a great way to find inefficient assembly code, though heavily optimized bits can look super confusing.
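For the record, the pairing looks like this - a minimal sketch (the register-twiddling function is invented for illustration; the attribute forces inlining even when the optimizer’s heuristics would decline, and the code compiles as a plain function with host gcc/clang too):

```c
#include <stdint.h>

/* hypothetical bit-set helper; on avr-gcc the attribute guarantees
   the call overhead disappears even at -Os / -O1 */
static inline __attribute__((always_inline))
uint8_t set_bit(uint8_t reg, uint8_t bit) {
    return (uint8_t)(reg | (1u << bit));
}
```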


#107

interesting!
but yeah, i’d try and address that with gcc flags. in order of aggressiveness:

-finline-small-functions
-finline-functions
-fearly-inlining
-findirect-inlining

i think these are all disabled by default at -Os and -O1…

the general argument (take it or leave it) being that it’s the compiler’s job to estimate the overhead of a function call and factor that into optimization passes.

but yeah if you want to specify per-function behavior then that’s what __attribute__((always_inline)) and __attribute__((noinline)) are for… and the attributes exist because the inline keyword doesn’t generally do what it says on the tin (as you’ve in fact pointed out!)


#108

The flags can be good for general purpose use, and I’ve definitely experimented a fair bit with them. I still have cases where gcc won’t inline my 4-line function used in a tight loop (usually it’s something that does some delicate register dance that I want to abstract out) and I end up incurring a lot of function call overhead (mostly register saves on the stack). That’s when I get out the always_inline hammer…


#109

I, too, am having problems suggesting to avr32-gcc that something should obviously be inlined.

I’m taking the static inline in header file approach, as it seems to be less clunky than the always_inline attribute. :man_shrugging:


On a different subject, everyone should get to know cscope well. It will save you a lot of time parsing grep lines or the output of other search utilities.

I’m using it to identify candidates for inlining at the moment, searching the symbol table for small functions and using cscope to see how many places the function is called from.


#110

AFAIK static inline in a header only guarantees that each compilation unit gets its own copy of the function. It’s still up to the compiler to decide if it should inline it. Really it’s an alternative to LTO for those compilers that don’t support it (e.g. avr32-gcc).


#111

yes, somehow my attempts to clarify use of inline have totally backfired and we keep talking past each other.

main point: inline doesn’t force inlining. only compiler attributes and flags do that. doesn’t even appear to be effective as a “hint” - though specific compilers do make specific promises about specific use cases (e.g., the case with gcc referenced above.)

one reason i brought this up is that it ties into the earlier discussion of the “one definition” rule. inline can be a loophole; with extern it can create an edge case where it is unspecified which of two definitions will be used; scary! here’s the example from the link above:

inline const char *saddr(void) // the inline definition for use in this file
{
    static const char name[] = "saddr";
    return name;
}
int compare_name(void)
{
    return saddr() == saddr(); // unspecified behavior, one call could be external
}
extern const char *saddr(void); // an external definition is generated, too

this could mess you up especially bad if you rely on a single definition of the static variable “name”.

because i frankly find the definition rules to be beyond confusing, and because inline doesn’t mean what most of us think it should mean, i’m trying to train myself away from its reflexive use, even in the benign case of static inline on local functions in .c files. the only time i think it’s really needed is if you want to put definitions in headers, which at this point in time i don’t think is great practice for C (great for C++.)

i’ll stop belaboring this now.


seconded - cscope is almost mandatory, at least when editing large projects in emacs. in a pinch, the ack extensions are also handy.


#112

When it’s needed (say, because your embedded platform has an ancient compiler), the only alternative is to use macros. Granted, that’s kind of what they were invented for, but I find their sometimes-unexpected limitations and always-ugly syntax to be a worse alternative.

My last 2 cents on this matter :slight_smile:


#113

while we’re derailing the C programming tips thread, i guess i should reveal that the next monome thing heavily uses lua.

That’s exciting! (when i worked QA for video games, i tested some maps using mods scripted in Lua, and found Lua simple and concise to learn… such a smooth/sleek language…) Can’t wait to see the next monome thingy and how it works with Lua!

(Also, while we’re derailing the C programming tips thread, just want to mention that y’all are a bunch of geeks :stuck_out_tongue_winking_eye: )


#114

… neeeeeeeeeeeeeeeeerrrrd. (yes this is a complete sentence)


#115

[joke about Lua’s indices starting with 1 and the grid’s first column and row unlit here]


#116

pretty sure this thread is where this goes…


#117

This seems like the best place for this, a little bit of light Sunday reading for y’all:

I came across it via Hackaday; it’s a comprehensive overview of a bunch of $1 microcontrollers.

The only thing missing, from my point of view, was a discussion of using the respective toolchains without an IDE. One of the factors that makes the monome modules so accessible for hacking is that it’s relatively easy to get a toolchain set up (on all 3 OSs) - no complex IDE required. Mutable Instruments modules are the same; interestingly, the Expert Sleepers Disting uses the far more proprietary PIC chips and hasn’t seen anywhere near as much interest.


#118

Forgive me if this has been asked or answered in this thread (or elsewhere), I looked - but I can’t find anything. I went down the rabbit hole of trying to fix a bug on Teletype. I know microcontrollers as Arduinos, where you barely need to know it’s a microcontroller, it has all the features you’d expect from a modern programming language. I’m also a 15+ year lover of Python and other high level, mostly web-y programming languages.

Which leads me to the new and exciting puzzle I discovered this weekend. You can see it happen in this Github issue. How do I write C without floating point numbers? How do I program without floating point numbers? My mind is completely blown! I have a bit of formal computer science…I went to CS grad school for ~ a year or so before dropping out for job reasons. This kind of low level, restricted C for embedded systems is super interesting, but googling and stack-overflowing isn’t cutting it for me. Any recommendations for a place to start?

I guess what I’m asking for is (Not Modern) C Programming Tips and Tricks, specifically - how do I live without floating point data types? My mind-cup is empty. I really want to learn this stuff, just need someone to help me take a few steps in the right direction.


#119

in the avr32 codebase we (mostly) use the mighty libfixmath library, which has its own wikipedia entry. its source code rewards examination, as do the wikipedia links to such topics as Qx.x arithmetic.

the basics are really pretty easy though. main thing to keep in mind is that the processor doesn’t know or care about where you put the decimal place, so you have to keep track.

  • when you add or subtract two fixed-point numbers, the result requires at most 1 more bit.

    • if the operands have the same number of fractional bits, then the result should have that number of fractional bits as well.
    • if they are different, then take the one with fewer fract bits and “pad” the trailing places by shifting left.
  • when you multiply two fixed-point numbers (with the same radix), the result takes up to 2x the number of bits as the operands. [this is just your basic exponential arithmetic: (2^N) * (2^M) = 2^(N+M).] so basically:

    • perform regular 2s-complement integer multiplication
    • the number of fractional bits in the result is the sum of the fractional bits in the operands.
    • and you want to restrict the range of the inputs to prevent overflow (unless your processor has a saturating multiply or extra-wide MACC register.)
  • division is a tiny bit more complicated because you have to pre-scale the operands and then round the intermediate result. see the links.
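the add/multiply/divide rules above can be sketched in Q16.16 (the same format libfixmath’s fix16_t uses) - a minimal illustration, with a 64-bit intermediate standing in for an extra-wide MACC register, and no saturation or rounding:

```c
#include <stdint.h>

typedef int32_t q16_16;      /* 16 integer bits, 16 fractional bits */

#define Q16_ONE (1 << 16)    /* 1.0 in Q16.16 */

/* multiply: the 64-bit intermediate has 16+16 = 32 fractional bits;
   shift right by 16 to come back to Q16.16 */
static q16_16 q16_mul(q16_16 a, q16_16 b) {
    int64_t wide = (int64_t)a * (int64_t)b;   /* up to 2x the bits */
    return (q16_16)(wide >> 16);
}

/* divide: pre-scale the dividend left by 16 so the quotient
   lands back in Q16.16 */
static q16_16 q16_div(q16_16 a, q16_16 b) {
    int64_t wide = ((int64_t)a << 16) / b;
    return (q16_16)wide;
}
```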

(oh, here’s a good blog post with a good brief introduction by example. it addresses the general case where each operand might have a different radix: https://www.dsprelated.com/showarticle/139.php)

for complex functions, trig and log and so on, i would not try and re-invent the wheel. people have been refining techniques for approximating these things for many decades (usually with some kind of polynomial expansion) and libfixmath is an excellent contemporary collection of those techniques.

if anyone wants to get deep into optimizing any parts of that for the avr32 instruction set, i’m sure we would all be grateful! (“we” being not just the monome dev community but anyone working with avr32.)


i should also point out that you can absolutely do floating-point calculations on the avr32. it’s just slow. maybe 20 instructions for a multiply. but in many cases it is still the right solution. e.g.:

  • if flash space is more limited than RAM and you can spare a few hundred microseconds on startup, it might be better to compute a LUT on the fly than to include it in code.
  • sometimes it’s very difficult or impossible to get iterated algorithms to work correctly in fixed point at a small resolution. (for example the maps we discussed in the CHAOS thread; also applies to some kinds of iterated DSP filters.)

oh, for a stripped-down example of fract arithmetic, check out the blackfin intrinsic emulators in the aleph repo:
[ https://github.com/monome/aleph/blob/dev/utils/bfin_sim/fract_math.c ]

(blackfin is a DSP designed primarily for 1.15 and 1.31 representations; that is, there is a sign bit, and all other bits represent the fractional part; so the domain is [-1, 1).)
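a stripped-down 1.15 multiply along the lines of those emulators might look like this (truncating rather than rounding, with a saturation check for the one overflow case; treat it as a sketch, not the exact intrinsic semantics):

```c
#include <stdint.h>

/* 1.15 format: sign bit + 15 fractional bits, domain [-1, 1) */
typedef int16_t fract16;

static fract16 mult_fr1x16(fract16 a, fract16 b) {
    /* 32-bit product has 30 fractional bits; shift right by 15
       to come back to 1.15 */
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;
    if (p > 32767) p = 32767;   /* saturate -1 * -1 to just under 1 */
    return (fract16)p;
}
```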


and TIL that Doom used 16.16 for all its non-integer calculations!

if it’s good enough for DOOM… it’s good enough for you


#120

I wouldn’t have a problem with you using floating point calculations for the JI OP. It strikes me as an OP that will run at sequencer rates, so it’s alright if it’s a bit slow. Needing to use log does throw a spanner in the works somewhat.

I think this is the version of logf that you’d end up using:

If you’ve got access to a serial cable, see if you can benchmark it on hardware?

Otherwise think about using a LUT to remove the need for log, but as @zebra mentions you have to be considerate of how much space the table requires in ROM (or possibly keep it in RAM, and calculate it on boot up, there is a lot of unused RAM…)

Ultimately it’s a compromise: make it fast but use too much ROM/RAM (for something that isn’t used a lot), or make it slow.

If you’re trying to determine what counts as slow, consider how long it takes to run an i2c OP given that the i2c bus runs at ~100kbit/s, and the AVR32 we use runs at 60MHz.


#121

The original implementation uses a LUT. Might just fix that up. I’m definitely interested in understanding more about the constraints of the AVR32, so might try different things. I’m still getting used to the tooling and basic stuff like GCC and Makefiles though, so it’ll be a while before I actually contribute PRs.

Thanks for the advice, I think I’ve got good things to look at.