There have been a few interesting posts on this, and it perhaps deserves a post of its own
Using floats vs fixed point maths
@thetechnobear writes:
using floats and float functions (like powf) are expensive in terms of cpu cycles
Hmmm...
Cortex M4 clock cycles:
integer operations:
ADD - 1 clock cycle
SUB - 1 clock cycle
MUL - 1 clock cycle
SDIV - 2-12 clock cycles
Single precision float operations:
VADD.F32 - 1 clock cycle
VMUL.F32 - 1 clock cycle
VDIV.F32 - 14 clock cycles
Conclusion: single precision float operations on the cortex M4 are just as fast as integer operations. Division is costly for both ints and floats and should be avoided.
I got the impression that axoloti's use of fixed point was more about the re-use of legacy code from prior platforms which didn't have FPU support - which is a pity (IMO), because DSP code written with floats is a lot easier to read.
As for powf - yeah, it's slow. I've benchmarked some of my lookup-table-based code as being about 10x faster, while staying within 5e-5 of the standard library result - which is good enough for most audio dsp.
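For the record, the kind of thing I mean is a small table plus linear interpolation. This is only a minimal sketch - fast_exp2f, EXP2_TABLE_BITS etc. are illustrative names, not my actual benchmarked code - and if you need an arbitrary base, powf(a, b) can be built on top of it as fast_exp2f(b * log2f(a)):

#include <math.h>

#define EXP2_TABLE_BITS 8
#define EXP2_TABLE_SIZE (1 << EXP2_TABLE_BITS)

static float exp2_table[EXP2_TABLE_SIZE + 1];

/* fill the table once at startup */
static void fast_exp2_init(void) {
    for (int i = 0; i <= EXP2_TABLE_SIZE; i++)
        exp2_table[i] = powf(2.f, (float)i / EXP2_TABLE_SIZE);
}

/* 2^x via table lookup + linear interpolation on the fractional part */
static float fast_exp2f(float x) {
    int xi = (int)floorf(x);                /* integer part */
    float pos = (x - (float)xi) * EXP2_TABLE_SIZE;
    int idx = (int)pos;
    float frac = pos - (float)idx;
    float y = exp2_table[idx] + frac * (exp2_table[idx + 1] - exp2_table[idx]);
    return ldexpf(y, xi);                   /* scale by 2^(integer part) */
}

For audio use (pitch to frequency and the like) 2^x covers most cases, and the accuracy is set by the table size rather than anything clever.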
The only ‘justification’ and rebuttal I’ve seen is here
So it's not entirely clear, and certainly closer than I’d have thought - as long as you're careful.
And for sure, doing everything in float would make things much easier to read and write
Some nice features from the STM32F4:
- The intrinsic functions for conversion from float to fixed point (and vice versa) take 1 cycle. So it is not a problem to use both q27 integers and floats in an object.
- The FPU includes 32... yes, 32 float registers. These 32 registers are not used for anything other than your math/dsp, while many integer registers are already used for specific purposes.
- The float division - which takes 14 cycles - can be executed in parallel with integer instructions.
So, in some cases, it is a good idea to mix fixed point and floats!
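As a sketch of what that mixing can look like in practice - assuming the usual axoloti convention of 27 fractional bits for audio samples, and with helper names that are mine rather than any axoloti API:

#include <stdint.h>

/* q27 -> float: the divide by 2^27 is exact, so the compiler folds it into a
   multiply - this comes out as a VCVT plus a VMUL (or a single fixed-point
   VCVT, depending on compiler and flags) */
static inline float q27_to_float(int32_t x) {
    return (float)x * (1.0f / (float)(1 << 27));
}

/* float -> q27 (no saturation handling in this sketch) */
static inline int32_t float_to_q27(float x) {
    return (int32_t)(x * (float)(1 << 27));
}

So an object can take its q27 inlets, do the interesting maths in the float registers, and convert back on the way out.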
thanks @deadsy and @SmashedTransistors, some really interesting points.
this post was kind of related...
in particular:
yes this is a case in point...
so, as mentioned in the link above, Olivier from MI (pichenettes in the post above) stated that he uses floats because he saw little advantage in fixed point maths, given the FPU present on the chips (and his experiments with converting the elements resonator reinforced this for him). I think we can all agree MI modules are very efficient in their use of the CPU - so this shows that, with proper use, floats can be efficient. (*)
so when we moved the MI code to axoloti, we weren't going to 'convert' the code to fixed point (that would be a complete re-write), so we wrapped it with conversion calls in/out.
generally I'll say I've been happy with the performance of the MI objects... especially bearing in mind clouds/elements are run on the same chip as axoloti.
so it does seem a valid conclusion that using floats can yield good performance - I think so.
however, it seems they do need to be used with the same care as int32.
you will see MI also uses tables rather than floating point operations for things like exp, and I'm sure he has many other optimisations.
so perhaps the take-away is that floats are not intrinsically 'bad', but be careful what operations/functions you use... it's very easy with floats to start using std functions that are costly.
also, I think for clarity we need to remember to stay with float and not use doubles, as these (I'm assuming) are 64-bit and so very expensive... aren't there times when floats get automatically coerced into doubles? do we need to take care to avoid this?
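For reference, the usual suspects where a float silently becomes a double in C/C++ - a hypothetical snippet of my own, just to illustrate the question:

#include <math.h>
#include <stdio.h>

void promotions(float x) {
    double a = x * 0.1;       /* unsuffixed 0.1 is a double, so x gets promoted */
    float  b = sin(x);        /* sin() works in double; sinf() keeps it in float */
    printf("%f %f\n", a, b);  /* varargs promote float arguments to double anyway */
}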
also float constants... I think we use the compiler options to assume floats, but we should really be explicit, e.g. use 24.0f rather than 24.0
(*) as a complete aside, I think the MI code also shows that, if used 'intelligently', C++ can also be used for audio code - you just have to know what to use, and what not to.
I like to do an "objdump -d" on the produced binary, i.e. see what the compiler is actually doing rather than what I think it is doing. You can learn some interesting things:
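To get the dumps below I'd build and disassemble roughly like this - the flags are what I'd assume for a typical Cortex-M4F hard-float build, not necessarily axoloti's exact settings:

arm-none-eabi-gcc -O2 -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard -c test.c -o test.o
arm-none-eabi-objdump -d test.o

For example, this: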
float foo1(float x) {
return x / 2.f;
}
float foo2(float x) {
return x / 10.f;
}
is compiled to...
foo1:
320: eef6 7a00 vmov.f32 s15, #96 ; 0x3f000000 0.5
324: ee20 0a27 vmul.f32 s0, s0, s15
328: 4770 bx lr
32a: bf00 nop
foo2:
32c: eef2 7a04 vmov.f32 s15, #36 ; 0x41200000 10.0
330: ee80 0a27 vdiv.f32 s0, s0, s15
334: 4770 bx lr
336: bf00 nop
So in the first case /2 gets optimized to multiply by 0.5 - good.
In the second case /10 doesn't get optimized to multiply by 0.1 - why?
Answer: Because 0.1 does not have an exact FP representation, and so it has to be left as / 10 to properly represent the will of the programmer.
So if you thought that /k (constant) would always get optimized to * 1/k you'd be wrong- and you'd get undesired div operations in your code.
24.0f rather than 24.0
float foo3(float x) {
return x * 0.1;
}
foo3:
338: b508 push {r3, lr}
33a: ee10 0a10 vmov r0, s0
33e: f7ff fffe bl 0 <__aeabi_f2d>
342: a305 add r3, pc, #20 ; (adr r3, 358 <foo3+0x20>)
344: e9d3 2300 ldrd r2, r3, [r3]
348: f7ff fffe bl 0 <__aeabi_dmul>
34c: f7ff fffe bl 0 <__aeabi_d2f>
350: ee00 0a10 vmov s0, r0
354: bd08 pop {r3, pc}
356: bf00 nop
358: 9999999a .word 0x9999999a
35c: 3fb99999 .word 0x3fb99999
Whoah. Soft emulation... the unsuffixed 0.1 is a double, and the M4's FPU is single precision only, so the whole thing goes through the soft-float library calls (__aeabi_f2d / __aeabi_dmul / __aeabi_d2f).
float foo4(float x) {
return x * 0.1f;
}
foo4:
360: eddf 7a02 vldr s15, [pc, #8] ; 36c <foo4+0xc>
364: ee20 0a27 vmul.f32 s0, s0, s15
368: 4770 bx lr
36a: bf00 nop
36c: 3dcccccd .word 0x3dcccccd
That's better....
I see Johannes also did some testing on floats in his jt folder library>community>jt>devel.
I don't know what to make of it though..... :=)
Thus most of the time it's better to code
x = y * (1.0f / 3.0f);
than
x = y / 3.0f;
since (1.0f / 3.0f) is folded into a constant at compile time, so you get a 1-cycle vmul instead of a 14-cycle vdiv (at the cost of one extra rounding of the reciprocal).