Help make object efficient: Trade offs between dsp and sram

Ke10g · 2019-12-20 09:38:09 UTC

Hello wonderful Axo community.

I'm new to coding. I've made a couple of variants of an object meant to encapsulate the functionality of a subpatch I was using that was too sram expensive.

The first version I made declared several variables and was not very efficient memory wise.

So I tried to put the whole thing into one equation without declaring any variables. This saves a lot of memory (it uses only about half), but it uses too much DSP and I end up overloading.

I would like to know the best way of approaching this.

Here is the original code with a bunch of variables, hungry for memory:

int te1 = attr_t1.array[__USAT((attr_target),attr_t1.LENGTHPOW)]<<attr_t1.GAIN;
int te2 = attr_t2.array[__USAT((attr_target),attr_t2.LENGTHPOW)]<<attr_t2.GAIN;
int tl1 = attr_t3.array[__USAT((attr_target),attr_t3.LENGTHPOW)]<<attr_t3.GAIN;
int tl2 = attr_t4.array[__USAT((attr_target),attr_t4.LENGTHPOW)]<<attr_t4.GAIN;
int tx = attr_t5.array[__USAT((attr_target),attr_t5.LENGTHPOW)]<<attr_t5.GAIN;
int ty = attr_t6.array[__USAT((attr_target),attr_t6.LENGTHPOW)]<<attr_t6.GAIN;
int tz = attr_t7.array[__USAT((attr_target),attr_t7.LENGTHPOW)]<<attr_t7.GAIN;

int bpe1 = (te1-(1<<26))<<1;
int bpe2 = (te2-(1<<26))<<1;
int bpl1 = (tl1-(1<<26))<<1;
int bpl2 = (tl2-(1<<26))<<1;
int bpx = (tx-(1<<26))<<1;
int bpy = (ty-(1<<26))<<1;
int bpz = (tz-(1<<26))<<1;

int se1 = attr_ms.array[__USAT((0),attr_ms.LENGTHPOW)]<<attr_ms.GAIN;
int se2 = attr_ms.array[__USAT((1),attr_ms.LENGTHPOW)]<<attr_ms.GAIN;
int sl1 = attr_ms.array[__USAT((2),attr_ms.LENGTHPOW)]<<attr_ms.GAIN;
int sl2 = attr_ms.array[__USAT((3),attr_ms.LENGTHPOW)]<<attr_ms.GAIN;
int sx = attr_ms.array[__USAT((4),attr_ms.LENGTHPOW)]<<attr_ms.GAIN;
int sy = attr_ms.array[__USAT((5),attr_ms.LENGTHPOW)]<<attr_ms.GAIN;
int sz = attr_ms.array[__USAT((6),attr_ms.LENGTHPOW)]<<attr_ms.GAIN;

int e1 = ___SMMUL(bpe1<<3,se1<<2);
int e2 = ___SMMUL(bpe2<<3,se2<<2);
int l1 = ___SMMUL(bpl1<<3,sl1<<2);
int l2 = ___SMMUL(bpl2<<3,sl2<<2);
int xt = ___SMMUL(bpx<<3,sx<<2);
int yt = ___SMMUL(bpy<<3,sy<<2);
int zt = ___SMMUL(bpz<<3,sz<<2);

outlet_total = e1 + e2 + l1 + l2 + xt + yt + zt + inlet_initial;

Here is the second one that is one equation, more memory efficient, but dsp hungry.

outlet_total =
(__SMMUL((((attrt1.array[USAT((attr_target),attr_t1.LENGTHPOW)]<<attr_t1.GAIN)-(1<<26))<<1)<<3,(attr_ms.array[USAT((+ 0),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2))+
(__SMMUL((((attrt2.array[USAT((attr_target),attr_t2.LENGTHPOW)]<<attr_t2.GAIN)-(1<<26))<<1)<<3,(attr_ms.array[USAT((+ 1),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2))+
(__SMMUL((((attrt3.array[USAT((attr_target),attr_t3.LENGTHPOW)]<<attr_t3.GAIN)-(1<<26))<<1)<<3,(attr_ms.array[USAT((+ 2),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2))+
(__SMMUL((((attrt4.array[USAT((attr_target),attr_t4.LENGTHPOW)]<<attr_t4.GAIN)-(1<<26))<<1)<<3,(attr_ms.array[USAT((+ 3),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2))+
(__SMMUL((((attrt5.array[USAT((attr_target),attr_t5.LENGTHPOW)]<<attr_t5.GAIN)-(1<<26))<<1)<<3,(attr_ms.array[USAT((+ 4),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2))+
(__SMMUL((((attrt6.array[USAT((attr_target),attr_t6.LENGTHPOW)]<<attr_t6.GAIN)-(1<<26))<<1)<<3,(attr_ms.array[USAT((+ 5),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2))+
(__SMMUL((((attrt7.array[USAT((attr_target),attr_t7.LENGTHPOW)]<<attr_t7.GAIN)-(1<<26))<<1)<<3,(attr_ms.array[USAT((+ 6),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2))+
inlet_initial;

How should I approach making this more efficient? They both work, but I need 192 of them in my patch, so they need to be as efficient as possible.

I still lack a lot of the coding concepts. Perhaps there is another assembler instruction I can use?
Any help welcome!

Ke10g · 2019-12-20 10:02:34 UTC

Ok I just discovered ___SMMLA and tried to implement the object in a third way, using accumulation.

int32_t accum = __SMMUL((((attrt1.array[USAT((attr_target),attr_t1.LENGTHPOW)]<<attr_t1.GAIN)-(1<<26))<<1)<<3, (attr_ms.array[USAT((+ 0),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2);
accum = __SMMLA((((attrt2.array[USAT((attr_target),attr_t2.LENGTHPOW)]<<attr_t2.GAIN)-(1<<26))<<1)<<3, (attr_ms.array[USAT((+ 1),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2, accum);
accum = __SMMLA((((attrt3.array[USAT((attr_target),attr_t3.LENGTHPOW)]<<attr_t3.GAIN)-(1<<26))<<1)<<3, (attr_ms.array[USAT((+ 2),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2, accum);
accum = __SMMLA((((attrt4.array[USAT((attr_target),attr_t4.LENGTHPOW)]<<attr_t4.GAIN)-(1<<26))<<1)<<3, (attr_ms.array[USAT((+ 3),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2, accum);
accum = __SMMLA((((attrt5.array[USAT((attr_target),attr_t5.LENGTHPOW)]<<attr_t5.GAIN)-(1<<26))<<1)<<3, (attr_ms.array[USAT((+ 4),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2, accum);
accum = __SMMLA((((attrt6.array[USAT((attr_target),attr_t6.LENGTHPOW)]<<attr_t6.GAIN)-(1<<26))<<1)<<3, (attr_ms.array[USAT((+ 5),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2, accum);
accum = __SMMLA((((attrt7.array[USAT((attr_target),attr_t7.LENGTHPOW)]<<attr_t7.GAIN)-(1<<26))<<1)<<3, (attr_ms.array[USAT((+ 6),attr_ms.LENGTHPOW)]<<attr_ms.GAIN)<<2, accum);
accum = inlet_initial+accum;

outlet_total = accum;

This one compiles but does not function properly (I would have to dig a bit). In any case, there is no performance increase compared to the second version of the object above.

Is there anything I can do to make this more efficient?

The basic idea is that I'm reading data from tables referenced via attributes, converting it to bipolar, multiplying these values by data from another table, then summing the results together with the initial value coming in at the inlet.
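For reference, the steps just described (read an amount, convert it to bipolar, multiply by the source value, accumulate onto the inlet value) can be sketched as one loop. This is only a desktop-testable sketch, not the object's actual code: `smmul`, `mod_sum`, `kNumSources` and the two plain arrays are hypothetical stand-ins for the `___SMMUL` intrinsic and the `attr_t1`..`attr_t7` / `attr_ms` table reads, and the inputs are assumed to be already shifted by `GAIN`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical constant: the seven modulation sources from the post.
constexpr std::size_t kNumSources = 7;

// Portable stand-in for the Cortex-M SMMUL instruction:
// returns the top 32 bits of the signed 64-bit product.
// (Assumes arithmetic right shift of negative values, as on mainstream compilers.)
static inline int32_t smmul(int32_t a, int32_t b) {
    return static_cast<int32_t>((static_cast<int64_t>(a) * static_cast<int64_t>(b)) >> 32);
}

// amounts[i]: unipolar amount for source i, centred on 1<<26 (assumed range 0 .. 1<<27)
// sources[i]: current value of source i (assumed range 0 .. 1<<27)
inline int32_t mod_sum(const int32_t (&amounts)[kNumSources],
                       const int32_t (&sources)[kNumSources],
                       int32_t initial) {
    int32_t acc = initial;
    for (std::size_t i = 0; i < kNumSources; ++i) {
        // Same scaling as ((x - (1<<26)) << 1) << 3 in the post, written as a
        // multiplication so negative values are not left-shifted.
        const int32_t bipolar16 = (amounts[i] - (1 << 26)) * 16;
        // Same scaling as s << 2 in the post.
        const int32_t source4 = sources[i] * 4;
        acc += smmul(bipolar16, source4);
    }
    return acc;
}
```

With every amount at the centre value `1<<26` the bipolar term is zero, so the function simply returns `initial`; that is a quick sanity check for any variant of the object.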

thanks for your help!

thetechnobear · 2019-12-20 11:32:06 UTC

You’d be better off posting the object and the patch.

For easier sharing/debugging,
use an embedded custom object, in a single patch that demonstrates the issue, in context, with as little extra unrelated stuff as possible.

The reason to share a single embedded patch is that Axoloti is able
to open a single patch via a URL.

The easier you make it for others to replicate your issue, the more likely they are to dive in and help.

Edit: make sure your ‘test’ patch does something when it works ... as the person helping will want to know if it’s working ... doing stuff in isolation is no fun

Ke10g · 2019-12-20 11:39:32 UTC

test modreceive explicit.axp (6.5 KB)

test modreceive compact.axp (6.2 KB)

sorry, i'm new to this... here are the two first attempts...

thank you so much for your patience with me. You are a wonderful community and I am learning a lot.

Ke10g · 2019-12-20 13:29:24 UTC

It is difficult to show you a stripped down version that does something... The whole patch is highly dependent on the hardware i'm using...

But perhaps I could explain the steps better, and someone will think of a way to improve it.

This is a mod matrix. It distributes 7 modulation sources to (ideally) 44 different destinations.

In my main patch, I have 7 tables filled with the amounts in which each mod source is applied to each mod destination.

Then, inside my voice patch, I have 44 mod receive objects (the ones I shared above) that receive the data from these tables, each destination using its own offset to access the relevant modulation amounts. Each object also receives another table, specific to each voice (I have four voices), that is filled with the current modulation source values for that voice (this is MPE, so they differ per voice), and multiplies the amounts from those first 7 tables by the current values of the modulation sources. Then the results are added up, including the "initial value" (which I specify directly with my interface knobs, etc.).

My first go at this was just to patch it:

...and inside that subpatch this is happening:

I hope this helps understand what I'm trying to achieve. This first attempt to just patch the functionality failed due to sram overflow. So I decided to code a custom object (in the two files shared above). But now it seems that I still can't get much more than Since I need so many of these, inside the voice, and there are four voices, and all this multiplication going on it eats up a lot of dsp.

thetechnobear · 2019-12-20 14:29:00 UTC

honestly I don't really know what you're trying to do... but it does look pretty inefficient

(also as i mentioned, the approach you are using with attributes will NOT work)

sorry, I don't have time to create a custom object to deal with what you are trying to do,
but I've put together a quick example of the 'singleton technique' I described above,
and how to do it in Axoloti.

you need to place the stuff in objects, into your objects directory... (*)
this creates two custom objects. single_set and single_get... (actually single_set is a bit misnamed it should be something like single_alloc)

then there is a test patch in patches/tests/singleton.axp

(*) you will need to then use reload objects, or restart axoloti editor to pick them up.

what does it do?
it defines a C++ object called singleton that can store 100 signed ints and allows setting and getting them.
It can be used anywhere in a patch, and will always refer to the same object.

I've then created two trivial objects, which demonstrate how to use it.
single_set actually creates the Single, and then sets one of its elements (3) (due to the create, you can only use it once in the patch)
single_get just returns the 3rd element's value.

THIS IS NOT INTENDED TO BE USEFUL - its a demo of the technique.

Ive tried to make the c++ as simple as possible, so you should be able to adapt to your needs.
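For readers without the attached files, here is a minimal sketch of what such a singleton could look like. It is not the code from the download: the class name `Singleton`, the `instance()` accessor and the bounds checks are assumptions made for illustration, and the function-local static shown here is the common desktop C++ idiom, which may need adapting for the Axoloti toolchain.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical shared store: 100 signed ints, one instance per program.
class Singleton {
public:
    static constexpr std::size_t kSize = 100;

    // Every caller gets a reference to the same object.
    static Singleton& instance() {
        static Singleton s;
        return s;
    }

    // Out-of-range writes are ignored.
    void set(std::size_t index, int32_t value) {
        if (index < kSize) data_[index] = value;
    }

    // Out-of-range reads return 0.
    int32_t get(std::size_t index) const {
        return index < kSize ? data_[index] : 0;
    }

    // Copying would defeat the point of having a single instance.
    Singleton(const Singleton&) = delete;
    Singleton& operator=(const Singleton&) = delete;

private:
    Singleton() = default;
    int32_t data_[kSize] = {};
};
```

One object's code could then call `Singleton::instance().set(3, 17)` while another calls `Singleton::instance().get(3)` and reads back the same value, which mirrors what the set and get demo objects are described as doing.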

how would you go forward with this?

theres a few approaches...
(listed from least efficient but easiest to implement, to most efficient)

a) you could modify this, to have single_alloc, single_set , single_get axoloti objects, with inlets and outlets - so basically like the table object.

b) create custom object(s) using singleton
so actually build your functionality into some more high level objects, and just use these in their implementation

c) create custom object(s) using the technique
really the idea of this is to demonstrate the technique of using singletons,
the approach could be used to handle much more complex data structures that are relevant to your problem space.

I'd personally go with (c), but you might find this a bit hard to do at this stage... so perhaps start with (a) and move on.

I hope this is helpful... sorry I don't really have the time to do more,
if you're new to C/C++ I recommend you go search on the internet for a tutorial,
I'm sure there are some excellent ones around that can do a much better job of teaching you the basics than I can over a forum

tip: whilst it's tempting to just start cut n' pasting code around, it's not a good idea.
unlike visual coding, C++ has plenty of potential to bite you in the a** if you do this,
and frankly it'll get pretty frustrating if you don't have a little grounding in C/C++.

note: if you use these objects thousands of times in the patch, you will probably want to add __attribute__((noinline)) after the get/set methods in singleton.h,
this reduces performance (so don't do it unless you need to), but means it generates less code.
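As a rough illustration of that note, the attribute can be attached to accessor methods like this. `SharedStore` is a hypothetical class, not the contents of singleton.h, and the syntax is GCC/Clang specific. The attribute is written before the return type here because the methods are defined inline in the class; GCC generally does not accept it after the declarator on a function definition, only on a separate declaration.

```cpp
#include <cstdint>

// Hypothetical store whose accessors are kept out of line, so that many
// call sites share one copy of each function body instead of inlining it.
class SharedStore {
public:
    __attribute__((noinline)) void set(unsigned index, int32_t value) {
        if (index < kSize) data_[index] = value;
    }

    __attribute__((noinline)) int32_t get(unsigned index) const {
        return index < kSize ? data_[index] : 0;
    }

private:
    static constexpr unsigned kSize = 100;
    int32_t data_[kSize] = {};
};
```

Behaviour is unchanged by the attribute; only the generated code differs, trading a function call per access for a smaller binary.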

tele_player · 2019-12-20 14:45:25 UTC

I think of this as the Arduino syndrome.

Ke10g · 2019-12-20 15:25:30 UTC

This is incredible! Thanks for taking the time to show me this...
I'll dive right in...

Also:

Actually it does work: since this way i didn't have to bury my table/reads two patcher levels deep. It works just fine accessing tables from within the mpe voice patch, just not two levels down. However, it is still inefficient. I've got 16 destinations working with this set up, but I would like to get more going.

I'll look into the singleton technique. I didn't know where to start, and this is an excellent start.

THANK YOU TECHNOBEAR!

Ke10g · 2019-12-20 15:36:07 UTC

damn... it was made with version 2.0.0.... i can't open it.

thetechnobear · 2019-12-20 16:02:04 UTC

yeah, as i said, im focusing on 2.0.0 now... its better for me to be testing, and helping improve that... than focusing on 1.0.12. (not saying that's true for you/new users - as its not 100% stable at the moment)

I'll see if i can back port this back to 1.0.12

Ke10g · 2019-12-20 16:17:25 UTC

oh that would be great, if you have time.

I've tried installing 2.0.0 just to take a look at it, but nothing is compiling. I'm getting this:

Generate code complete
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
Compiling patch failed

thetechnobear · 2019-12-20 16:26:02 UTC

don't use 2.0 yet... it's going to create too much confusion if users new to Axoloti start using it,
as it'll be hard to know if something they say is 'user error' or a new bug.

ok, I've done a 1.0.12 version

this is actually a little more logical anyway, as it has 3 objects
single/create, single/set, single/get

note: I repeat, the set/get are 'hardcoded', im not trying to create something useful here, rather demonstrate the technique.. I leave it to the 'reader' to make them useful

Ke10g · 2019-12-20 16:30:20 UTC

That's what I was thinking.

Thanks so much! I'll check it out.

Ke10g · 2019-12-20 18:21:53 UTC

Ok I have a problem: I can't reflash the firmware back to 1.0.12.

It says it is ready to connect when the LED stops blinking and is steady green. But it does not flash. It is just immediately green and nothing happens.

EDIT: downgrading worked from the 2.0.0 application. So i'm good to go...

Ke10g · 2019-12-20 19:00:41 UTC

This is really interesting...

I'm having trouble getting it working though. Maybe there is something else I need to understand.

I've tried just adding an inlet to the set object in the help patch you sent me. I then transferred the init code to the k-rate code and swapped the "17" with "inlet_i1"...

It fails to compile, but weirdly so. It just gives me the following:

Creating directory on sdcard : /singleton
Done creating directory
Changing working directory on sdcard : /singleton
file error: FR_NO_FILESYSTEM, filename:"/singleton"
Done changing working directory
file error: FR_NO_FILESYSTEM, filename:"/singleton"

EDIT: to be clear, on its own your help patch works and the dial moves to 17 when I go live. But with this minor modification it fails to compile.

EDIT2: OK actually, as soon as I "embed as patch/object", without any modification, it fails to compile... perhaps this is just a file directory issue?

thetechnobear · 2019-12-20 20:21:23 UTC

you will not be able to just embed a new copy of the object, as it won't find the header file.
you'll have to edit the example.

Ke10g · 2019-12-21 15:19:04 UTC

Hi @thetechnobear,

I'm having trouble reimagining my architecture using singleton patterns. I wonder if you could set my intuitions straight before I get too deep into a dead end.

My intuition is that I should have a singleton for each of my mod destinations, which themselves accumulate the values from singletons corresponding to each modulation source and singletons for the relative amounts in which these mod sources are applied to each destination.

So in my synth I would have
- 2 "global" (as in, not per voice) mod source singleton values: LFO1, LFO2
- 5 per-voice mod source singleton values: envelope 1, envelope 2, MPE x, MPE y, MPE z (for my synth with 4 voices, that would mean 20)
- 308! (44 mod destinations * 7 amounts of modulation applied to each) mod amount singleton values

It seems like I am just copying the tables paradigm into this one, but though I may save on some cycles because I will have less table parsing to do, I'm not sure I am getting much out of it with this approach. And because of the MPE, there will still be some parsing to do, since I will have to index which singletons are being read depending on the voice.

Ke10g · 2019-12-21 15:46:23 UTC

Or perhaps the best way is to just use the singleton pattern to distribute the 308 amounts?
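To make the counts in this question concrete, here is a sketch of keeping the 308 amounts plus the global and per-voice source values in one shared structure, instead of one singleton per value. All names (`ModData`, `mod_data`, `source_value`) are hypothetical, and the source ordering (two global LFOs first, then five per-voice sources) is an assumption. Whether this is actually cheaper on the Axoloti than the table approach is not something the sketch establishes.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kDestinations = 44;
constexpr std::size_t kGlobalSources = 2;   // LFO1, LFO2
constexpr std::size_t kVoiceSources = 5;    // envelope 1, envelope 2, MPE x, y, z
constexpr std::size_t kSources = kGlobalSources + kVoiceSources;  // 7
constexpr std::size_t kVoices = 4;

struct ModData {
    int32_t amounts[kDestinations][kSources];       // 44 * 7 = 308 amounts
    int32_t global_sources[kGlobalSources];         // shared by all voices
    int32_t voice_sources[kVoices][kVoiceSources];  // 4 * 5 = 20 values
};

// The amount matrix itself is 308 32-bit words.
static_assert(sizeof(ModData::amounts) == 308 * sizeof(int32_t),
              "amount matrix should hold exactly 308 values");

// Single shared instance, zero-initialised on first use.
inline ModData& mod_data() {
    static ModData d = {};
    return d;
}

// Reads source `src` (0..6) as seen by `voice` (0..3): indices 0 and 1 are
// the global LFOs, 2..6 select that voice's own sources. Returns 0 if
// either index is out of range.
inline int32_t source_value(const ModData& d, std::size_t voice, std::size_t src) {
    if (voice >= kVoices || src >= kSources) return 0;
    if (src < kGlobalSources) return d.global_sources[src];
    return d.voice_sources[voice][src - kGlobalSources];
}
```

The per-voice indexing then reduces to one array lookup, and a destination only needs its own row `amounts[dest]` plus the voice number.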

jaffasplaffa · 2019-12-21 16:00:49 UTC

A question that pops into my head is:

Do you really need a 308 destinations mod matrix?

When I started out building patches for Axo, I wanted to build patches that can do everything. But after a while I realised it is probably not feasible to do so, it makes more sense to make more specialised patches with only what is needed for a certain task and then make a patch for each task and switch between them.

And I think you probably need to downscale your expectations a little bit. Or buy the new board from Urklaing when it is released in the not-too-distant future. That board has around 4x the power of an original Axoloti.

I do hope you get what you want, but you should probably also be realistic about it.

Ke10g · 2019-12-21 17:34:01 UTC

ha!

I know you can't have everything... I currently have 119 destinations working smoothly with my somewhat clunky implementation of mod matrix using __SMMLA accum, etc... and a combination of sends and tables...
It works wonderfully and sounds great. 4 voices, 3 oscillators per voice, lowpass per voice, high pass per voice, 2 envelopes per voice, two global lfos, plus mpe x, y, z. Delay.

I love this thing!

But like technobear said, I'm just trying to squeeze out as much as possible... I'm about 85% dsp, but pretty close to maxing out sram. So that is why I was investigating other ways to do things. I would like to have reverb, and various wave folding... but because of my implementation, each time I add a feature, I need to add mod destinations so it quickly increases workload...