I've started working with the packed array, where each 32-bit word holds four 8-bit samples.
I'm still not sure about the CPU advantage of this approach, but it definitely cuts down memory usage.
A 4-track looper is starting to take shape; however, I still need to get to grips with the two's complement representation of numbers.
I'll explain.
The conversion from 32-bit to 8-bit is done this way:
There are 4 s-rate inputs (32-bit, signed, Q27 format).
Each one is bit-shifted to bring the fractional part into the correct position and then bitwise AND-ed with a mask.
For example, input1 must fit into the leftmost 8 bits, so it's shifted left by 5 bits and AND-ed with the bit mask 0b11111111000000000000000000000000 (0xFF000000).
As far as I understand two's complement notation, this operation should saturate and preserve the sign (but stop me here if my understanding is wrong).
Experiments summing two signed 8-bit numbers obtained this way showed that the approach is roughly correct.
I repeat this procedure for all 4 inputs, so:
A = (input1<<5 ) & 0b11111111000000000000000000000000
B = (input2>>3 ) & 0b00000000111111110000000000000000
C = (input3>>11) & 0b00000000000000001111111100000000
D = (input4>>19) & 0b00000000000000000000000011111111
The 4x8-bit packet is finally obtained by OR-ing A, B, C and D (actually it's all done in one line of code) and sent to an SDRAM array.
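A minimal sketch of the packing step in plain C, equivalent to the four shift-and-mask lines above. The names q27_to_byte and pack4 are made up for illustration and are not part of any existing object. Each input contributes its bits 26..19. Note that a plain shift and mask truncates rather than saturates: the 8-bit word is only a faithful signed value while the input stays within -2^26 .. 2^26-1, and anything larger wraps around.

```c
#include <stdint.h>

/* Hypothetical helper: extract bits 26..19 of a Q27 sample as an 8-bit field. */
static uint32_t q27_to_byte(int32_t in)
{
    /* Cast to unsigned first so the shift is well defined for negative inputs. */
    return ((uint32_t)in >> 19) & 0xFFu;
}

/* Hypothetical helper: pack four Q27 samples into one 32-bit word,
   with in1 in the leftmost byte and in4 in the rightmost one. */
static uint32_t pack4(int32_t in1, int32_t in2, int32_t in3, int32_t in4)
{
    return (q27_to_byte(in1) << 24) |
           (q27_to_byte(in2) << 16) |
           (q27_to_byte(in3) << 8)  |
            q27_to_byte(in4);
}
```

With this layout, an input of -(1 << 25) (that is, -0.25 in Q27) ends up as the byte 0xC0, which is -64 when read as a signed 8-bit number.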
As I mentioned before, operations done with the ARM functions behave as expected.
The problem comes when it's time to convert the packed array back to 32-bit format, in particular when the 8-bit word is negative (which I'd like to use in order to take full advantage of SHADD8, since it's the only way I have to control volume).
What currently happens in the "unpacker" object is this:
outlet_o1= (bitmask1&attr_table.array[pos])>>5;
outlet_o2= (bitmask2&attr_table.array[pos])<<3;
outlet_o3= (bitmask3&attr_table.array[pos])<<11;
outlet_o4= (bitmask4&attr_table.array[pos])<<19;
attr_table.array[pos] is the 32-bit 4x8-bit packet created with the first procedure by the table record object.
Say I want to retrieve the second 8-bit word (the one which goes to outlet_o2):
attr_table.array[pos] = xxxxxxxxwordwordyyyyyyyyzzzzzzzz //(just to let you see where the 8-bit word is)
bitmask2 = 00000000111111110000000000000000
NUMBER = attr_table.array[pos] & bitmask2 = 00000000wordword0000000000000000
outlet_o2 = NUMBER << 3 = 00000wordword0000000000000000000 //take it back to Q27 format
You can see now that if the word I'm retrieving was negative, its sign is gone forever thanks to those 5 zeros before it.
Is there some clever way to retain sign in this conversion?
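One candidate approach, offered as a sketch rather than something verified on the target: pull the 8-bit word out as an unsigned value, sign-extend it with plain arithmetic, and only then scale it back to Q27. The helper name byte_to_q27 is hypothetical, and the code assumes the packed word is read as a uint32_t with the four words at bit offsets 24, 16, 8 and 0.

```c
#include <stdint.h>

/* Hypothetical helper: read the 8-bit field at bit offset `shift`
   (0, 8, 16 or 24) and return it as a sign-extended Q27 value. */
static int32_t byte_to_q27(uint32_t packed, unsigned shift)
{
    uint32_t byte = (packed >> shift) & 0xFFu;        /* 0..255 */

    /* Sign-extend: flip the sign bit, then subtract 128, giving -128..127. */
    int32_t value = (int32_t)(byte ^ 0x80u) - 0x80;

    /* Scale back to Q27 (bits 26..19). Multiplying instead of shifting
       avoids left-shifting a negative number, which is undefined in C. */
    return value * (1 << 19);
}
```

Under those assumptions, outlet_o2 would be byte_to_q27(attr_table.array[pos], 16), and a stored word of 0xC0 comes back as -(1 << 25) instead of a positive number. Another option with the same effect is to shift the wanted word up to the top 8 bits and then shift it right as a signed int32_t, which relies on the compiler implementing signed right shift as an arithmetic shift (GCC documents that it does).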
Sorry for the very long post, you know, the 20-character limit.