Practical fixed point math


#1

I'd like to ask for some tips and guidance on how to convert this kind of floating-point math into fixed point.

Let's say we have the floating point expression:

float a = 1.0f/(1.0f+alpha)

The final result is obviously bound to be <=1 if alpha >= 0, but the intermediate value needed for the denominator is greater than 1 and potentially bigger than the highest value representable in q5.27, for example.

I thought about dividing by 8, doing the math, and then rescaling back. Is this an ideal approach? I'd like to know the best practices or most common approaches typically used in DSP programming. If you could provide code, that would be great.

Building a small repertoire of examples such as this could lower the barrier for new axoloti users interested in object coding.


#2

There are some fixed-point code examples on http://www.musicdsp.org/archive.php. They are not Axoloti-related, but you can still learn something from them.


#3

That link is awesome. Thank you for sharing.


#4

What's the expected range of alpha?

Using floating point division

float xf = q_to_float(x,31);
float yf = 1.0f/(1.0f+xf);
int y = float_to_q(yf,31);

Measured: 24 clock cycles; the floating-point divide is 14 cycles by itself.
But here is an interesting note:

Integer-only instructions following VDIVR or VSQRT instructions complete out-of-order. VDIV and VSQRT instructions take one cycle if no further floating-point instructions are executed.

Using integer division

int32_t q = 0x10000+(x>>15);
y = (0x7FFFFFFF/q)<<16;

Only the top 16 bits of the output are used, and the 15 LSBs of the input are ignored.
Measured: 4 clock cycles.
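
To try the integer-division variant off-target, it can be wrapped in a plain C function. This is only a sketch: the name recip_one_plus_q31 is made up for illustration, and it assumes x is a non-negative q31 value (0 <= alpha < 1).

```c
#include <stdint.h>

/* Hypothetical helper: y = 1/(1+x), x and y in q31, assuming x >= 0. */
int32_t recip_one_plus_q31(int32_t x)
{
    /* 1.0 + x in q16: 0x10000 is 1.0, and x >> 15 drops the 15 LSBs of x. */
    int32_t q = 0x10000 + (x >> 15);

    /* (almost 1.0 in q31) / q16 gives a q15 quotient, at most 0x7FFF. */
    int32_t quot = 0x7FFFFFFF / q;

    /* Scale back to q31; only the top 16 bits carry information. */
    return (int32_t)((uint32_t)quot << 16);
}
```

With alpha = 0.5 (x = 0x40000000) this returns 0x55550000, where the exact 2/3 in q31 would be 0x55555555, which shows the 16-bit resolution mentioned above.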

Using a Taylor series approximation

1/(1-x) = 1 + x + x^2 + x^3 + x^4 + ...
but this converges only for |x| < 1, and only slowly when |x| > 0.5. Here x = -alpha.

int32_t mx = -alpha;
y = mx + 0x7FFFFFFF;
int32_t mxx = ___SMMUL(mx,mx)<<1;
y += mxx;
int32_t mxxx = ___SMMUL(mxx,mx)<<1;
y += mxxx;
int32_t mxxxx = ___SMMUL(mxxx,mx)<<1;
y += mxxxx;
int32_t mxxxxx = ___SMMUL(mxxxx,mx)<<1;
y += mxxxxx;

measured 14 clock cycles

This can further be accelerated by replacing ___SMMUL(....,mx)<<1 with ___SMMUL(....,mx<<1) but that overflows when |x|>=0.5
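
For checking the series on a PC, here is a rough portable sketch of the same computation. The function names smmul and recip_series_q31 are invented for this example; smmul is only a stand-in for the SMMUL instruction (top 32 bits of the 64-bit product), and the code assumes 0 <= alpha in q31 and a compiler that shifts negative values arithmetically.

```c
#include <stdint.h>

/* Stand-in for SMMUL: upper 32 bits of the signed 64-bit product.
   Assumes >> on a negative int64_t is an arithmetic shift. */
int32_t smmul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Hypothetical helper: 1/(1+alpha) in q31 from 1 + x + ... + x^5, x = -alpha.
   Assumes alpha >= 0; accuracy degrades quickly above alpha = 0.5. */
int32_t recip_series_q31(int32_t alpha)
{
    int32_t mx = -alpha;          /* x in q31 */
    int32_t term = mx;            /* current power of x */
    int32_t y = 0x7FFFFFFF + mx;  /* (almost) 1.0 + x */

    for (int i = 2; i <= 5; i++) {
        /* q31 * q31 >> 32 is q30; doubling restores q31. */
        term = smmul(term, mx) * 2;
        y += term;
    }
    return y;
}
```

For alpha = 0.25 this gives 1717567487, against an exact 0.8 in q31 of about 1717986918, so the truncated series is off by roughly 2e-4 at that point.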

All these implementations assume q31 input and output. The 1st one saturates at conversion from float to integer, the 2nd and 3rd overflow.

And probably other approximations are possible too...


#5

forgot to add the definitions of q_to_float() and float_to_q()...

__attribute__ ( ( always_inline ) ) __STATIC_INLINE float q_to_float(int32_t op1, int q) {
  float fop1 = *(float*)(&op1);
  __ASM volatile ("VCVT.F32.S32 %0, %0, %1" : "+w" (fop1) : "i" (q));
  return(fop1);
}

__attribute__ ( ( always_inline ) ) __STATIC_INLINE int32_t float_to_q(float op1, int q) {
  __ASM volatile ("VCVT.S32.F32 %0, %0, %1" : "+w" (op1) : "i" (q));
  return(*(int32_t*)(&op1));
}
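
The two helpers above rely on the ARM VCVT instruction, so they only build for the target. For testing on a PC, something like the following could approximate them. This is an assumption-laden sketch, not firmware code: the _ref names are made up, q is assumed to be in 0..31, and the saturation, truncation and NaN handling are my reading of how the conversion behaves rather than something verified on hardware.

```c
#include <stdint.h>

/* Hypothetical reference version of q_to_float(): value / 2^q. */
float q_to_float_ref(int32_t v, int q)
{
    float scale = (float)((uint32_t)1 << q);  /* 2^q, exact for q in 0..31 */
    return (float)v / scale;
}

/* Hypothetical reference version of float_to_q(): f * 2^q,
   truncated toward zero and saturated to the int32_t range. */
int32_t float_to_q_ref(float f, int q)
{
    float scaled = f * (float)((uint32_t)1 << q);

    if (scaled != scaled)        return 0;          /* NaN, assumed to map to 0 */
    if (scaled >= 2147483648.0f) return INT32_MAX;  /* saturate high */
    if (scaled < -2147483648.0f) return INT32_MIN;  /* saturate low */
    return (int32_t)scaled;
}
```

Note that float_to_q_ref(1.0f, 31) saturates to 0x7FFFFFFF, which matches the remark that the floating-point version saturates at conversion.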

#6

necro-ing this thread because I feel like opening a new one would only clog the forum.

I'm trying to wrap my head around some of the fixed-point math. I know the basics, but I never did any fixed-point stuff and some things I just don't get.
Let's consider the code of the math/* object:
outlet_result= ___SMMUL(inlet_a<<3,inlet_b<<2);

  1. AFAIK, inlet_a and inlet_b are signed q10.21 numbers, right? So after the shift, they are signed q7.24 and signed q8.23 respectively. After multiplying they would be signed q15.47 but SMMUL only keeps the upper 32bit (including the sign) so it really is q15.16. This is then output as a q10.21 number?! I clearly have an error in my thinking here.
  2. Why is the left shift even allowed? If inlet_a is larger than q7.21, it would overflow. Shouldn't it have to be outlet_result= ___SMMUL(__SSAT(inlet_a,28)<<3,__SSAT(inlet_b, 29)<<2);? Is the possible overflow traded in for better performance? (If so: why? It doesn't bring much benefit for k-rate signals!)
  3. According to this post the result has to be shifted to the left by 5 bits. Why shift the operands by 3 and 2 bits respectively when you could simply shift the result by 5 bits and as a result make overflows much less likely?

Thanks to anyone for helping me understand this mess.


#7

The "normal" signal range is -2^27 to +2^27, and the code was written with that in mind, so except for pathological situations you don't risk overflows.
This way you can preserve five of the least significant bits, which would be lost if you shifted left at the end of the operation.
Saving 5 least significant bits might seem overkill (and it probably is), however I think (it's just a supposition!) that if you do a lot of operations you could lose more quality in the signal if you perform the left shift in one take than if you do it in two takes.
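
To make the lost bits visible, here is a small PC-side sketch comparing the two orderings. The names are hypothetical, smmul is a plain C stand-in for the SMMUL instruction, and the code assumes both inputs stay within the "normal" range, read as q27 so that 2^27 means 1.0.

```c
#include <stdint.h>

/* Stand-in for SMMUL: upper 32 bits of the signed 64-bit product. */
int32_t smmul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Shift the operands first, as math/* does: (a<<3)*(b<<2) >> 32. */
int32_t mul_q27_preshift(int32_t a, int32_t b)
{
    int32_t a3 = (int32_t)((uint32_t)a << 3);
    int32_t b2 = (int32_t)((uint32_t)b << 2);
    return smmul(a3, b2);
}

/* Shift the result instead: ((a*b) >> 32) << 5.
   The low 5 bits of the result are always zero. */
int32_t mul_q27_postshift(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)smmul(a, b) << 5);
}
```

Multiplying 12345 by 1.0 (1 << 27) returns 12345 with the pre-shift version but 12320 with the post-shift version, i.e. the five low bits are gone.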


#8

Trying to find my way into coding Axoloti objects, I found this document, covering the basics of fixed-point arithmetic. Not exhaustive, but certainly helpful.


#9

Thanks for the tip @otoskope

@Sputnki did a nice doc here:

On the Axoloti, the ___SMMUL and ___SMMLA functions, combined with additional shifting, allow 32-bit fixed-point multiplications.

Note that floating-point calculation on the Axoloti is quite efficient (1 cycle for basic mul, add and sub; 14 cycles for div) and that conversions (int to/from float) are fast too (1 cycle).


#10

Thanks - great post. I had searched around, but somehow missed @Sputnki's post. I think it contains enough info for me to get going! Great.