Aleph: Block Processing & Testing / Profiling code


#1

Calling @rick_monster & @zebra on this one.

I’ve been spending the last few weeks building vector math libraries & dsp libraries for some stm32 projects I’m working on and it occurs to me the code is largely reusable on the aleph directly. Was hoping for a little guidance on how to get set up for testing & profiling the code, along with some ideas on what needs to happen in the libraries. I figure the starting point would be rolling some vector math functions that utilize the bf-intrinsics. I’m not sure if bf has parallel processing, but even on stm32 without it, there are some subtle performance boosts from the memory access standpoint. Beyond that I’d like to start rebuilding the dsp libraries for block processing as there are likely many more gains to be had there.

So my questions:

  1. Is using the block-processing versions as simple as loading the .ldr like a normal module? I’ve got the ‘rawsc’ module compiling but no aleph handy rn.

  2. Is there any kind of instructions for how to use the bfin_sim to test code in the box? To clarify further, is the sim audio based, or just visual / debug-ready (sorry for the ignorance here).

  3. Along these same lines, are there recommendations for how to do profiling of the code to make sure the block-processing versions are outperforming their iterative versions? I know the code can be disassembled (a link on how to would be great!) but is there some other kind of automated profiler?

Those are the big ones, and from there I guess it’s just a matter of any guidance / caveats / work already done somewhere and unnecessary to revisit…


#2

Is using the block-processing versions as simple as loading the .ldr like a normal module? I’ve got the ‘rawsc’ module compiling but no aleph handy rn.

yes, i’m not sure what state it’s in and AFAIK ‘rawsc’ is the only one using the block processing core. but that’s how it works in theory - changes are all under the hood / in the makefile and the “API” is the same.
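To make the "same API, different internals" idea concrete, here is a rough sketch of a per-sample routine next to a block version of the same filter. Everything in it is an assumption for illustration: `onepole_t`, `onepole_next` and `onepole_block` are hypothetical names, not functions from the aleph source, and the filter is just a shift-based one-pole smoother. The block version keeps the filter state in a local variable for the whole loop, which is the kind of change that tends to show up as fewer loads and stores in the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-pole smoother: y += (x - y) >> shift. */
typedef struct {
    int32_t y;     /* filter state */
    int     shift; /* smoothing amount, 1..31 */
} onepole_t;

/* Per-sample version: state is read and written through the pointer
 * on every call. Relies on arithmetic right shift of negative values,
 * which is implementation-defined in C but is what GCC does. */
static int32_t onepole_next(onepole_t *f, int32_t x) {
    assert(f != NULL && f->shift >= 1 && f->shift <= 31);
    f->y += (int32_t)(((int64_t)x - f->y) >> f->shift);
    return f->y;
}

/* Block version: state is loaded once, kept in a local for the whole
 * block, and written back once at the end. */
static void onepole_block(onepole_t *f, const int32_t *in, int32_t *out, int n) {
    assert(f != NULL && in != NULL && out != NULL && n >= 0);
    assert(f->shift >= 1 && f->shift <= 31);
    int32_t y = f->y;
    const int shift = f->shift;
    for (int i = 0; i < n; i++) {
        y += (int32_t)(((int64_t)in[i] - y) >> shift);
        out[i] = y;
    }
    f->y = y;
}
```

Both versions should produce identical output for the same input, so the per-sample one can serve as a reference when checking a block rewrite.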

Is there any kind of instructions for how to use the bfin_sim

sorry but i’m not sure, rick is the one for this

are there recommendations for how to do profiling of the code to make sure the block-processing versions are outperforming their iterative versions? I know the code can be disassembled (a link on how to would be great!)

to disassemble, something like
bfin-elf-objdump -D foo.o > foo.asm

for profiling, i honestly have stuck to the stupid method of flipping GPIO pins and scoping. this is totally effective, especially with two scope channels or more, but it’s hard to do away from a test bench.

so that said, i haven’t done any real profiling of the block processing code but i’m confident that it will be much faster based on (e.g.) the disassembly.

it also should be possible to count cycles and report them back to the avr32. i made one quick attempt at this which did not work (there is a feature branch somewhere), but i’m sure it would be easy for a smart person.
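A minimal sketch of the cycle-counting idea, under some assumptions: the Blackfin core has a `CYCLES` register that GCC inline asm can read, and I believe the counter has to be enabled in `SYSCFG` first (check the hardware reference). The names `read_cycles`, `profile_once` and `halve_block` are hypothetical, and the part that reports the number back to the avr32 is not shown. On a non-Blackfin host the code falls back to `clock()` so the harness can still be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read a free-running counter. Assumes GCC defines __bfin__ for
 * Blackfin targets; on any other host, fall back to clock(). */
static inline uint32_t read_cycles(void) {
#if defined(__bfin__)
    uint32_t c;
    __asm__ __volatile__("%0 = CYCLES;" : "=d"(c));
    return c;
#else
    return (uint32_t)clock();
#endif
}

typedef void (*process_fn)(int32_t *buf, int n);

/* Run fn once over buf and return the elapsed counter ticks.
 * Unsigned subtraction gives the right answer across one wraparound. */
static uint32_t profile_once(process_fn fn, int32_t *buf, int n) {
    assert(fn != NULL && buf != NULL && n > 0);
    uint32_t start = read_cycles();
    fn(buf, n);
    return read_cycles() - start;
}

/* Hypothetical stand-in for a module's processing routine:
 * halves every (non-negative) sample in place. */
static void halve_block(int32_t *buf, int n) {
    for (int i = 0; i < n; i++) {
        buf[i] >>= 1;
    }
}
```

Calling `profile_once` on the iterative and the block version of the same routine, with the same buffer size, gives two numbers that can be compared directly. Timing many runs and keeping the minimum is usually more stable than a single measurement.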

I’m not sure if bf has parallel processing

it probably doesn’t map closely to vectorized DSP unless you happen to be working with 8 bit values. there are 8x 32b data registers, and 2 (parallel) arithmetic units, each of which operates on 2x 16b operands. so single 32b x 32b operations are efficient, as are dual 16b or 4x 8b. details in ch. 2 of the hardware reference.
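As a rough illustration of the "2x 16b operands" layout, here is a portable C sketch that packs two 16-bit samples into one 32-bit word and runs a dot product over the packed pairs. The helper names (`pack16`, `hi16`, `lo16`, `dot_packed`) are made up for this example, and it uses plain C rather than any Blackfin intrinsics, so whether the compiler actually maps it onto the two arithmetic units is something to confirm in the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two 16-bit samples into one 32-bit word, hi in the upper half. */
static inline uint32_t pack16(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Unpack the halves. The uint16_t -> int16_t conversion is
 * implementation-defined for values above INT16_MAX, but wraps
 * as expected on GCC. */
static inline int16_t hi16(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static inline int16_t lo16(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Dot product over packed pairs: each 32-bit load per operand feeds
 * two independent 16x16 multiply-accumulates. No saturation here,
 * so the caller has to keep the sums within int32_t range. */
static int32_t dot_packed(const uint32_t *a, const uint32_t *b, int n_words) {
    assert(a != NULL && b != NULL && n_words >= 0);
    int32_t acc_hi = 0;
    int32_t acc_lo = 0;
    for (int i = 0; i < n_words; i++) {
        acc_hi += (int32_t)hi16(a[i]) * (int32_t)hi16(b[i]);
        acc_lo += (int32_t)lo16(a[i]) * (int32_t)lo16(b[i]);
    }
    return acc_hi + acc_lo;
}
```

Keeping two separate accumulators means the two multiplies in each iteration do not depend on each other, which is what a pair of parallel units needs.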

from the memory access standpoint

the de-facto “default” template for bf533 sets up instruction cache but not data cache. data can be explicitly located in L1 A, L1 B, or L2 (which have different performance characteristics, see the reference above) - in addition to SDRAM which is substantially slower. each module has a custom linker script, so could set data cache in one of those. i haven’t actually explored doing this. linker script examples must be hunted for, i’m afraid.

PS it would be sick to have some more aleph DSP to play with! thanks for considering it. :sound:


#3

so to use bfin_sim you can ‘make sim’ in, e.g., the fmsynth module. It compiles a jack client (executable is fmsynth_jack) which receives OSC messages on port 7700 (iirc). Haven’t done anything with block processing though, but it’d be easy enough to modify my jack wrapper:

https://github.com/rick-monster/aleph/blob/dev/utils/bfin_sim/main.c


#4

oh and forgot to say the OSC format you need to send is like this:

oscsend localhost 7770 "/param" ii 8 2147483647

so you send two integers - first is the param you want to bang, second is the value to bang it with…
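For anyone who wants to send that message from their own code instead of `oscsend`, here is a small sketch of how the bytes are laid out under OSC 1.0: the address and the type-tag string are each NUL-terminated and zero-padded to a multiple of 4 bytes, followed by the two integers in big-endian order. The function names are hypothetical and the UDP send itself is left out. For reference, 2147483647 is 2^31 - 1, the largest positive 32-bit signed value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write s, NUL-terminated and zero-padded to a multiple of 4 bytes.
 * Returns the number of bytes written. */
static size_t osc_put_string(uint8_t *out, const char *s) {
    size_t len = strlen(s) + 1;                 /* include the NUL */
    size_t padded = (len + 3u) & ~(size_t)3u;   /* round up to 4 */
    memset(out, 0, padded);
    memcpy(out, s, len - 1);
    return padded;
}

/* Write a 32-bit integer in big-endian (network) byte order. */
static size_t osc_put_int32(uint8_t *out, int32_t v) {
    uint32_t u = (uint32_t)v;
    out[0] = (uint8_t)(u >> 24);
    out[1] = (uint8_t)(u >> 16);
    out[2] = (uint8_t)(u >> 8);
    out[3] = (uint8_t)u;
    return 4;
}

/* Build "/param" with type tags ",ii": param index, then value.
 * Needs 20 bytes: 8 (address) + 4 (tags) + 4 + 4. */
static size_t osc_param_message(uint8_t *out, size_t cap,
                                int32_t param, int32_t value) {
    assert(out != NULL && cap >= 20);
    size_t n = 0;
    n += osc_put_string(out + n, "/param");
    n += osc_put_string(out + n, ",ii");
    n += osc_put_int32(out + n, param);
    n += osc_put_int32(out + n, value);
    return n;
}
```

The resulting buffer can be handed to a plain UDP `sendto()` aimed at whatever port the sim client is listening on.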


#5

Thank you both so much! I’ll report back when I get in front of my dev setup at home, but all sounds pretty approachable. Probably long overdue for me to drill a couple holes in the aleph for scope probe points…