Working with 8bit integers

Sputnki · 2016-10-21 08:38:34 UTC

I wanted to try something fancy: converting an input signal to 8 bit (maybe applying some dither) and doing further processing on it (I want to build some sort of glitch-esque sample looper-mangler, hoping not to turn it into vapor).
As far as my coding understanding goes, conversion between two variable types is resource-expensive, but I can try to keep conversions to a minimum.

Now, would using 8 bit be an advantage (CPU-wise and memory-wise)?
I was thinking of using both SDRAM and SRAM, so I'll need to rework some objects to do the trick. Should I expect an increase in CPU load?

Edit: I also noticed that uint8_t and int8_t don't seem to work. Does this have something to do with the compiler or the firmware?

johannes · 2016-10-21 09:59:45 UTC

The CPU registers and SRAM memory buses are 32 bits wide. If a performance advantage can be found, it is by using a 32-bit value as a packed array of four 8-bit values. But there are only a few processor instructions that can deal with such packed arrays:

(signed/unsigned) (halving/non-halving) (add/subtract):
SADD8, SSUB8, SHADD8, SHSUB8, UADD8, USUB8, UHADD8, UHSUB8
Unsigned Sum of Absolute Differences, and Unsigned Sum of Absolute Differences and Accumulate:
USAD8, USADA8
Bitwise manipulations like AND also operate (implicitly) on a packed array of four 8-bit values.

But there is no 8 bit multiply instruction, which limits the practical use of 8-bit signal processing.

instruction set reference

The situation is quite different for two 16-bit "halfwords" packed into a 32-bit word. For instance, the SMLALD instruction effectively computes two multiplies and two additions in a single instruction cycle. The filter/convolution object takes advantage of this.
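To make the per-lane behaviour of the packed 8-bit instructions a bit more concrete, here is a rough portable C sketch that emulates what SHADD8 does to each of the four byte lanes. The function names (`sign_extend8`, `shadd8_emulated`) are made up for this illustration; on the hardware this is a single instruction, which CMSIS exposes as the `__SHADD8` intrinsic.

```c
#include <stdint.h>

/* Interpret the low 8 bits of v as a signed two's-complement byte. */
int32_t sign_extend8(uint32_t v)
{
    int32_t b = (int32_t)(v & 0xFFu);
    return (b >= 128) ? b - 256 : b;
}

/* Hypothetical helper: emulates SHADD8 lane by lane.
   Each signed 8-bit lane of x is added to the matching lane of y
   and the sum is halved, so the result always fits in 8 bits. */
uint32_t shadd8_emulated(uint32_t x, uint32_t y)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        int shift = 8 * lane;
        int32_t sum = sign_extend8(x >> shift) + sign_extend8(y >> shift);
        /* Halve, rounding toward minus infinity like an arithmetic shift. */
        int32_t half = (sum >= 0) ? sum / 2 : -((-sum + 1) / 2);
        result |= ((uint32_t)half & 0xFFu) << shift;
    }
    return result;
}
```

Calling it with the second argument set to 0 simply halves every lane, which is roughly a 6 dB attenuation of four 8-bit samples at once.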

toneburst · 2016-10-21 11:07:16 UTC

I've used int8_t variables in custom objects a lot without any problems.

a|x

Sputnki · 2016-10-22 09:20:45 UTC

@toneburst Did you use them just internally or did you find the way to use them in output?

@johannes those instructions might suffice, actually. I might try doing multiplication outside the 8-bit world and then use the resulting variables just to store and recall data without fancy editing

toneburst · 2016-10-22 09:54:20 UTC

Both, I think. I usually bit-shifted up for output.

a|x

Sputnki · 2016-10-22 10:37:01 UTC

That's odd: I tried outputting an int8_t to an integer outlet and the output was always zero.

toneburst · 2016-10-22 10:49:09 UTC

That is strange. I've definitely used int outlets with 8-bit ints before. I'll dig out some code later (can't get to my Axoloti at the moment).

a|x

Sputnki · 2016-10-26 16:58:13 UTC

I started working with the 32 bit / 4 x 8bit packed array.
I'm still not sure about the CPU advantage of this approach, but it definitely cuts down memory usage.
A 4-track looper is starting to take form; however, I still need to get to grips with the two's complement representation of numbers.

I'll explain:
the conversion from 32bit to 8 bit is done this way:

There are 4 s-rate inputs (32bit, signed, Q27 format)
Each one is bit-shifted to put the fractional part in the correct position and then bitwise AND-ed with a mask.

For example input1 must be fit into the leftmost 8 bits, so it's bitshifted left 5 bits and and-ed with the number 0b11111111000000000000000000000000 (it's a bit mask)

As far as my current understanding of two's complement notation goes, this operation should saturate and preserve sign (but stop me here if my understanding is wrong).
Experiments summing two signed 8-bit numbers obtained this way showed that the approach is more or less correct.

I repeat this procedure for all 4 inputs, so:

A = (input1 << 5 ) & 0b11111111000000000000000000000000
B = (input2 >> 3 ) & 0b00000000111111110000000000000000
C = (input3 >> 11) & 0b00000000000000001111111100000000
D = (input4 >> 19) & 0b00000000000000000000000011111111

The 4x8bit packet is finally obtained by OR-ing A, B, C and D together (actually it's all done in one line of code) and sent to an SDRAM array.
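For reference, the packing step described above could be sketched in plain C roughly like this. The function name `pack4x8` and its parameters are only placeholders for illustration, not the code of the actual object. The inputs are cast to unsigned first so that every shift is well defined in C; the masks throw away the bits where a logical and an arithmetic shift would differ, so the result is the same.

```c
#include <stdint.h>

/* Hypothetical helper: packs bits 19..26 of four Q27 samples into one
   32-bit word, input1 in the leftmost byte and input4 in the rightmost. */
uint32_t pack4x8(int32_t in1, int32_t in2, int32_t in3, int32_t in4)
{
    uint32_t a = ((uint32_t)in1 << 5)  & 0xFF000000u;
    uint32_t b = ((uint32_t)in2 >> 3)  & 0x00FF0000u;
    uint32_t c = ((uint32_t)in3 >> 11) & 0x0000FF00u;
    uint32_t d = ((uint32_t)in4 >> 19) & 0x000000FFu;
    /* Bits above bit 26 are simply discarded by the shift and mask, so
       samples outside [-2^26, 2^26) wrap around instead of saturating. */
    return a | b | c | d;
}
```

One caveat worth checking: as far as I can tell the shift-and-mask keeps the sign only while the input stays within that range, so louder inputs would need to be clipped before packing.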

As I mentioned before, operations done with the ARM instructions behave as expected.

The problem comes when it's time to convert the packed array back to 32bit format, in particular when the 8bit word has negative sign (which i'd be happy to use to take full advantage of SHADD8, since it's the only way i have to control volume)

what currently happens in the "unpacker" object is this:

   outlet_o1= (bitmask1&attr_table.array[pos])>>5;
   outlet_o2= (bitmask2&attr_table.array[pos])<<3;
   outlet_o3= (bitmask3&attr_table.array[pos])<<11;
   outlet_o4= (bitmask4&attr_table.array[pos])<<19;

attr_table.array[pos] is the 32bit 4x8bit packet created with the first procedure from the table record object.
Say I want to retrieve the second 8-bit word (the one that goes to outlet_o2):

attr_table.array[pos] = xxxxxxxxwordwordyyyyyyyyzzzzzzzz //(just to let you see where the 8-bit word is)
bitmask2 = 00000000111111110000000000000000
NUMBER = attr_table.array[pos] & bitmask2 = 00000000wordword0000000000000000
outlet_o2 = NUMBER << 3 = 00000wordword0000000000000000000 //take it back to Q27 format

You can see now, that if the word i'm retrieving had sign, that is forever gone thanks to those 5 zeros before it.

Is there some clever way to retain sign in this conversion?

Sorry for the very long post, you know, the 20 characters limitation

Sputnki · 2016-10-28 21:59:41 UTC

I found a relatively simple way to solve the problem: saving the 8bit words into temporary variables and then casting to int32_t and bitshifting.
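A minimal sketch of that idea, assuming the byte layout described earlier in the thread (first word in the leftmost byte, 8-bit value sitting at bits 19..26 of the Q27 sample). The function name `unpack_lane` is invented for this example and is not taken from the patch. Instead of relying on an int8_t cast, it sign-extends by hand, which gives the same result on a two's-complement target, and it scales with a multiply because left-shifting a negative value is not well defined in C.

```c
#include <stdint.h>

/* Hypothetical helper: extracts one signed 8-bit word from the packed
   value and returns it at its original Q27 position.
   lane 0 is the leftmost byte, lane 3 the rightmost. */
int32_t unpack_lane(uint32_t packed, int lane)
{
    uint32_t byte = (packed >> (24 - 8 * lane)) & 0xFFu;
    int32_t value = (int32_t)byte;
    if (value >= 128) {
        value -= 256;            /* restore the sign of the 8-bit word */
    }
    return value * (1 << 19);    /* same scaling as shifting left by 19 */
}
```

With this, a byte of 0xFF comes back as -(1 << 19) rather than as a large positive number, so negative samples keep their sign after unpacking.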

I made a patch; you can find it in library/community/sptnk/4x8bit looper.axp

The next thing i want to do is to acquire transient data (i have to figure some way to implement this, i guess i might edit the table/alloc object) and make a granular quantized looper (is this going vapor?)