Is using the block-processing versions as simple as loading the .ldr like a normal module? I’ve got the ‘rawsc’ module compiling but no aleph handy rn.
yes, i’m not sure what state it’s in and AFAIK ‘rawsc’ is the only one using the block processing core. but that’s how it works in theory - changes are all under the hood / in the makefile and the “API” is the same.
Are there any instructions for how to use bfin_sim?
sorry but i’m not sure, rick is the one for this
are there recommendations for how to profile the code to make sure the block-processing versions are outperforming their iterative counterparts? I know the code can be disassembled (a link on how to would be great!)
to disassemble, something like
bfin-elf-objdump -D foo.o > foo.asm
for profiling, i honestly have stuck to the stupid method of flipping GPIO pins and scoping. this is totally effective, especially with two scope channels or more, but it’s hard to do away from a test bench.
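A rough sketch of the pin-flipping approach, in case it helps. Everything named here is hypothetical: the two "register" variables are plain stand-ins for the memory-mapped flag set/clear registers, the pin choice is arbitrary, and `process_frame` is a placeholder for whatever routine you actually want to time. It also assumes the pin has already been configured as an output.

```c
#include <stdint.h>

/* Stand-ins for the GPIO "set" and "clear" registers. On real hardware these
   would be the memory-mapped flag registers; plain variables are used here so
   the sketch builds anywhere. */
static volatile uint16_t flag_set_reg;
static volatile uint16_t flag_clear_reg;

/* Hypothetical: assumes flag pin 0 is free and already set up as an output. */
#define PROFILE_PIN ((uint16_t)(1u << 0))

static inline void profile_pin_high(void) { flag_set_reg = PROFILE_PIN; }
static inline void profile_pin_low(void)  { flag_clear_reg = PROFILE_PIN; }

/* Placeholder for the routine being timed: halves every sample in place. */
static void process_frame(int32_t *buf, int n) {
    for (int i = 0; i < n; i++) {
        buf[i] /= 2;
    }
}

/* Raise the pin, do the work, lower the pin. The pulse width seen on the
   scope is then roughly the time spent inside process_frame(). */
void timed_process_frame(int32_t *buf, int n) {
    profile_pin_high();
    process_frame(buf, n);
    profile_pin_low();
}
```

With a second pin wrapped around the iterative version, the two pulse widths can be compared side by side on two scope channels.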
so that said, i haven’t done any real profiling of the block processing code but i’m confident that it will be much faster based on (e.g.) the disassembly.
it also should be possible to count cycles and report them back to the avr32. i made one quick attempt at this which did not work (there is a feature branch somewhere), but i’m sure it would be easy for a smart person.
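For what it's worth, here is a minimal sketch of the cycle-counting idea. It is not the code from that feature branch. The assumptions: the `__bfin__` macro is predefined by the Blackfin GCC toolchain, the core cycle counter has already been enabled (the CCEN bit in SYSCFG, as far as I know), and `process_sample`, `process_block`, `BLOCK_SIZE` and `cycle_report` are made-up placeholders. Off-target it falls back to `clock()` so it still builds, though those numbers mean little. Sending the result back to the avr32 is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read a free-running counter. On Blackfin this reads the low word of the
   core cycle counter; elsewhere it falls back to clock(). */
static inline uint32_t read_cycles(void) {
#if defined(__bfin__)  /* assumption: predefined by the Blackfin GCC port */
    uint32_t c;
    __asm__ __volatile__("%0 = CYCLES;" : "=d"(c));
    return c;
#else
    return (uint32_t)clock();
#endif
}

#define BLOCK_SIZE 16  /* hypothetical block size */

/* Placeholder iterative version: one call per sample. */
static int32_t process_sample(int32_t in) { return in / 2; }

/* Placeholder block version: one call per block. */
static void process_block(const int32_t *in, int32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] / 2;
    }
}

typedef struct {
    uint32_t per_sample;  /* counter ticks spent in the iterative path */
    uint32_t per_block;   /* counter ticks spent in the block path */
} cycle_report;

/* Time both paths over one block and check that they agree. */
cycle_report profile_once(const int32_t *in, int32_t *out_a, int32_t *out_b) {
    cycle_report r;

    uint32_t t0 = read_cycles();
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        out_a[i] = process_sample(in[i]);
    }
    uint32_t t1 = read_cycles();
    process_block(in, out_b, BLOCK_SIZE);
    uint32_t t2 = read_cycles();

    /* Unsigned subtraction stays correct across a single counter wrap. */
    r.per_sample = t1 - t0;
    r.per_block = t2 - t1;

    /* A timing comparison is only meaningful if both paths compute the same thing. */
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        assert(out_a[i] == out_b[i]);
    }
    return r;
}
```

In a real module the two placeholders would be swapped for the actual per-sample and per-block processing calls, and the measurement repeated over many blocks, since a single run is noisy (cache state, interrupts).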
I’m not sure if bf has parallel processing
it probably doesn’t map closely to vectorized DSP unless you happen to be working with 8 bit values. there are 8x 32b data registers, and 2 (parallel) arithmetic units, each of which operates on 2x 16b operands. so single 32b x 32b operations are efficient, as are dual 16b or 4x 8b. details in ch. 2 of the hardware reference.
from the memory access standpoint
the de facto “default” template for bf533 sets up instruction cache but not data cache. data can be explicitly located in L1 A, L1 B, or L2 (which have different performance characteristics, see the reference above), in addition to SDRAM, which is substantially slower. each module has a custom linker script, so data cache or data placement could be set up in one of those. i haven’t actually explored doing this. linker script examples must be hunted for, i’m afraid.
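A possible starting point for explicit placement from the C side, as a sketch only. GCC's Blackfin port documents `l1_data_A` and `l1_data_B` variable attributes, which put a variable into sections named `.l1.data.A` and `.l1.data.B`. Whether a given module's linker script actually maps those sections onto the L1 banks is an assumption you would need to check. The macro names, buffer names and `BUF_LEN` are made up for illustration, and `__bfin__` is assumed to be predefined by the toolchain.

```c
#include <stdint.h>

#define BUF_LEN 64  /* hypothetical buffer length */

#if defined(__bfin__)  /* assumption: predefined by the Blackfin GCC port */
/* Emit into .l1.data.A / .l1.data.B; the linker script must place these. */
#define IN_L1_A __attribute__((l1_data_A))
#define IN_L1_B __attribute__((l1_data_B))
#else
/* Off-target the macros expand to nothing so the sketch still builds. */
#define IN_L1_A
#define IN_L1_B
#endif

/* Two buffers that are read together, placed in different banks. */
static int32_t delay_line[BUF_LEN] IN_L1_A;
static int32_t coeffs[BUF_LEN] IN_L1_B;

/* Fill the buffers with simple known values. */
void fill_buffers(void) {
    for (int i = 0; i < BUF_LEN; i++) {
        delay_line[i] = i;
        coeffs[i] = 2;
    }
}

/* Multiply-accumulate across both buffers, one operand from each bank. */
int64_t dot_product(void) {
    int64_t acc = 0;
    for (int i = 0; i < BUF_LEN; i++) {
        acc += (int64_t)delay_line[i] * coeffs[i];
    }
    return acc;
}
```

The usual motivation for splitting operands across banks, as I understand it, is that both can be fetched without contending for the same memory. The disassembly or a scope measurement would show whether it makes a difference in practice.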
PS it would be sick to have some more aleph DSP to play with! thanks for considering it. 