Comparison of fixed vs float DSP performance from ST app note AN4841


#1

Stumbled on an ST app note AN4841 today and noticed an interesting section:

Apologies for the weird small image, I'm struggling with this forum app; original pdf here. Obviously it's tough to draw conclusions from random benchmarks. What struck me was that F32 performance seems to be roughly on par with Q31 performance, if not slightly faster. This also gives a rough sense of how much additional headroom can be expected when moving from an F4 to an F7. As mentioned elsewhere, I think the H7 is where we want to be looking going forward: it bumps the clock rate up to 480 MHz but maintains pin compatibility with the F7. Not available until Q3 2019 according to ST.

As an aside: in one of my own projects I haven't been able to convince myself that the extra special ARM DSP F32 routines are actually faster in practice than the vanilla ARM GCC math library; I've looked at mainly sin and sqrt. Would be interested to hear others' experiences there.


#2

One strength of the FPU is the number of available float registers (32) compared to the number of available integer registers (considering that many integer registers are already used for "logistics": indexes, pointers...).


#3

I can't comment much on CPU efficiency of float vs integer but yeah this graph makes it look like a basic F7 would only be about a 50% improvement. The H7 x3 seems to be available already, which is a single core 400MHz, and two dual core models are announced for this month?
https://www.st.com/en/microcontrollers-microprocessors/stm32h7-series.html

also this says they are fully backwards compatible to the ARM M4 instruction set.
and while the bare MCU is available for about 15$ the official ST evaluation boards start at over 400$ ?
edit: i learned about ST's EVAL/DISCO difference, DISCO boards are 50$ ish
https://www.mouser.de/ProductDetail/STMicroelectronics/STM32H745I-DISCO

but yeah i agree this is the level we should aim for


#4

There's a version of the Nucleo board with an H7 for $23: https://www.mouser.com/ProductDetail/STMicroelectronics/NUCLEO-H743ZI?qs=5aG0NVq1C4zVqdFc0FeE%252Bw%3D%3D

I'm a fan of the Nucleo dev boards. This is probably where I would start.

No stock at the moment though unfortunately; looks like they're still ramping up production.


#5

The CPU clock cycle counts for 32-bit integer operations and 32-bit float operations are about the same on a Cortex-M4. When you throw in a few scaling shifts for fixed point math, the float version will often beat it. A few other points:

1) Divides are expensive (both float and integer), better to multiply by a constant.
2) Watch out for unintended doubles, e.g. write "2.0f", not "2.0".

If in doubt - use objdump to look at the generated assembly code.
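To make the scaling-shift and divide points concrete, here is a minimal sketch in plain C. The function names (q31_mul, f32_mul, scale_by_third) are made up for illustration and are not taken from the app note or from any ARM library. The Q31 version has to widen to 64 bits and shift back down, while the float version is a single multiply; the last function shows replacing a divide with a multiply by a constant reciprocal.

```c
#include <stdint.h>

/* Q31 multiply: widen to 64 bits, multiply, then shift back down by 31.
   The shift is the "scaling" step that the float version does not need.
   Assumes arithmetic right shift of negative values (true for GCC);
   no saturation, so INT32_MIN * INT32_MIN overflows the result. */
static int32_t q31_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 31);
}

/* Float equivalent: one multiply, no rescaling. */
static float f32_mul(float a, float b)
{
    return a * b;
}

/* Dividing by 3 rewritten as a multiply by a constant reciprocal.
   The compiler can fold 1.0f / 3.0f at compile time, so no divide
   instruction should be needed at run time. The result may differ
   from x / 3.0f in the last bit. */
static float scale_by_third(float x)
{
    const float inv3 = 1.0f / 3.0f;
    return x * inv3;
}
```

For example, q31_mul(0x40000000, 0x40000000) is 0.5 * 0.5 in Q31 and gives 0x20000000 (0.25). Compiling these for the target and checking the output with objdump is the easiest way to see how many instructions each one really costs.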


#6

I've gotten in the habit of writing "1.0f" as well, but I've also been using -fsingle-precision-constant in my projects, which is a handy compiler flag for avoiding unintended doubles. There's also -Wdouble-promotion, which warns when a float gets implicitly promoted to double.
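A small sketch of how an unintended double sneaks in, assuming a build without -fsingle-precision-constant. The helper names are hypothetical. sizeof does not evaluate its operand, it only reports the type of the expression, so it makes the promotion visible without looking at assembly.

```c
#include <stddef.h>

/* "2.0" is a double literal, so x is promoted and the whole
   expression has type double. On an FPU with single precision
   only, this ends up in software double routines. */
static size_t width_with_double_literal(float x)
{
    return sizeof(x * 2.0);
}

/* "2.0f" is a float literal, so the expression stays float. */
static size_t width_with_float_literal(float x)
{
    return sizeof(x * 2.0f);
}
```

With -fsingle-precision-constant the first function would report the size of a float as well, since the flag makes the compiler treat the literal as single precision. With -Wdouble-promotion GCC should warn on the first function.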

It looks like some of the higher end F7/H7 models actually have hardware double-precision support, which would be interesting to explore.


#7

Note that float divide and square root can run in parallel with integer operations.
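One way to take advantage of this is to start the divide early and put independent integer work after it. This is only a sketch: ratio_and_sum is a made-up function, and the compiler is free to reorder things, so whether the overlap actually happens has to be checked in the generated assembly.

```c
#include <stdint.h>

/* Issue the multi-cycle float divide first, then do integer work that
   does not depend on its result. On a core where the divide runs in
   parallel with integer instructions, the loop can hide part of the
   divide latency. The result q is only consumed at the return. */
static float ratio_and_sum(float num, float den,
                           const int32_t *v, int n, int32_t *sum_out)
{
    float q = num / den;            /* long-latency FPU operation */

    int32_t sum = 0;
    for (int i = 0; i < n; ++i) {   /* independent integer work */
        sum += v[i];
    }

    *sum_out = sum;
    return q;
}
```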