What is the maximum size for the binary image for a patch? I'm having a problem where I have a patch where the resulting binary is 7484 bytes in size which runs fine, but when I add some code and the size increases to 7612 bytes at least several of the objecs don't run (most likely none of them) - I have LogTextMessage() calls in the k-rate code and the printouts simply don't appear, and the objects don'tt seem to do anything either. I don't know if it's the size that makes the difference, but the added code is basically a duplicate of something that's already there, so that in itself should not cause any failures. The Axoloti doesn't crash, and if I shorten the code again and click Live again it runs as it did before.
I'm a bit deep in here as I'm running 1.0.9 on a different hardware than the Axoloti Core, so I realize I'm largely on my own, but I'm trying to collect some information so i can move forward.
EDIT: It definitely seems to be some sort of size issue. I have a couple of local functions in my code (defined as Local Data in the object editor), and if I add __attribute__(( noinline )) to them the patch binary shrinks to 7148 bytes and runs as it should. I'm still worried about the apparent size limitation though as was expecting it to be much larger.
yes, I hit this a while back... the issue, is as you have spotted, a lot of things are inlined (also see the -O flag that its compiled with), obviously for performance to minimise function calls, but it means patches get large pretty quickly.
I did as you did, in my own custom code, i started removing the inline, especially for large functions that I knew weren't being called frequently. this resolved it for me...
I do wonder though, if a sightly more flexible approach might be needed in the long run, so the balance between patch size and performance can be fine tuned depending upon application.
(theres also a comment in the ld file that this could be increased... I guess because there is still a small amount of memory unmapped)
Yes, I was hoping to see this defined in STM32F407xG.ld . However, this is what the memory setup looks like in that file:
MEMORY
{
flash (rx): org = 0x08000000, len = 896k /* last 128kB for patch storage*/
ram : org = 0x20000200, len = 0x0AE00 /* 44k - first 0x200 for vector table */
SRAM2 : org = 0x2001E000, len = 0x00002000 /* second half (8kB) of SRAM2 for DMA*/
CCMRAMEND : org = 0x1000C000, len = 10k /* for stacks */
sdram : org = 0xC0000000, len = 0x400000
}
There are a couple of things that strike me as odd here.
First of all, according to this specification, all RAM is not used by far. The STM32F427 has 192k of RAM starting at 0x2000 0000, but according to the spec above, only the area up to the end of SRAM2 is used, a total of 128k (which is fine for me in a way because I'm running Axoloti on an STM32F405 which only has 128k of RAM). (In addition to the system RAM starting at 0x2000 0000 there is the 64k CCM RAM starting at 0x1000 0000, of which 10k appears to be allocated in the .ld file for stacks). (Then of course there's the SDRAM which is external to the chip too, as well as the Flash memory).
Of the avilable on-chip RAM, only 44k is used at the beginning (the 'ram' segment) and 8k at the end (the 'SRAM2' segment). Or is the area in between implicitly reserved for patch usage? I have not delved into this. The .ld file just describes where stuff is located so that the linker building the firmware can figure out where to put things, the firmware itself could then in principle use the rest of the RAM as it sees fit, even though there's no definition for the corresponding area in the .ld file.
The thing I really find odd is that I don't get an error message in the Axoloti console saying, for instance "patch to big". In fact, it appears to start the patch - I can see a printout I've put in one of my objects in the Init section - but just doesn't run everything. I would have been less surprised if it had just crashed, after having put the patch in nonexistent memory (since I have less memory in my system than the Axoloti Core) and attempted to run it.
Ah well, all these questions could be answered by judicously studying the code of course...
EDIT: I now saw there's a ramlink.ld which is used for the actual patches. The 0xB000 bytes reserved for patch storage in that file seems much larger than the 7k barrier which I hit above though.
yeah, I saw the same behaviour... (iirc, when the patch gets very large, then it does have a compile error) what i did was start looking at the elf file, to see which sections were getting too big. again, from memory, i think it might have been the .text section in my case.
you'll also notice if you reduce down from -O3 it will compile. the 'issue' is, the xpatch.cpp is just one huge file, and because all the implementations are declared inline, everything is being compiled inline. further more, you will find quite a few functions in the firmware (e.g. axoloti_math.h) are #define'ing 'functions'... (they really should be inline, so that the behaviour can be fine tuned). its just a huge amount of 'bloat' ... but obviously the intention is to not have the function overhead.
personally, (and without trying), id have thought perhaps the midi handler and k-rate code, could be left as function calls... and just ensure the a-rate code sections were all inlined.... (perhaps even for advanced users, allowing more control e.g. saying some objects should be executed in-line and others not... i know, difficult to represent in the UI, and possibly creates confusion for some users...) anyway thoughts for the future.
The weird thing is if I take a patch like Library -> demos -> youtube -> tybett, it runs fine, and its binary is 12756 bytes in size, i.e. much longer than the 7612 bytes I have in the patch above that won't run. So it appears it's not the size of the patch itself, perhaps something inside it that gets to big, like a relative jump that's too far or something like that (but I would expect such a thing to elicit an error message from the compiler or linker).
try changing number of presets/modulations etc to 0 on the patch, and subpatches. sometimes, too much data also causes issue with the patch starting.. rather than the normal compile error. (though you mentioned the noinline , made it work... so it didnt seem like this was the issue)
does the patch disconnect the UI? also if you put an oscillator attached directly to the output, does it also sound? (sometimes, the patch can be running on the board, but just not connected to the UI... hence why i check with sound, so I can hear if its running or not)
I put a line the start of the PatchProcess() function to turn on the red LED (by adding code in src/main/java/axoloti/Patch.java so that it is inserted at the start of PatchProcess() when the Patcher creates xpatch.cpp . When the patch binary is short and start properly, the LED comes on, when it's long, it doesn't. Also tried to toggle the LED with 10 Hz, with the same result. Disassembling xpatch.elf verifies that the LED code is at the very start of PatchProcess(), so even if things go awry further on, the LED should still come on.
Hm, perhaps PatchProcess() is never called? Added a printout at the end of xpatch_init2() (at the end of xpatch.cpp), to print out the value of PatchProcess() (= patchMeta.fptr_dsp_process in patch.c in the firmware). It prints out the same value both when the patch is ok and when it is not. So the function pointer to call the patch dsp() code is the same in both cases.
Then took away the code to light / flash the LED from xpatch.cpp, and put in code to flash the LED in the ThreadDSP() function in patch.c, which in turn calls PatchProcess(). This LED flashes at all times, whether there is a patch loaded or not, but most importantly it shows that the thread runs also when the patch doesn't run properly, so it's not that the firmware dsp() thread dies or stops for some reason.
Finally verified by disassmembling xpatch.bin that the code in PatchProcess() is actually there (so the linker hasn't goofed up and for instance left a lot of the function empty).
Furthermore, PatchProcess() is rather short, and in turn calls the main root.dsp() function in xpatch.cpp which does the real work, which in turn contains all the (presumably inlined) dsp() functions of the various objects. I could imagine something going wrong in root.dsp() if it gets too large, but things seem to go wrong even before we get that far.
So I'm at a loss to explain this. When the patch fails to run, all the code seems to be there, but even code at the very start of PatchProcess() isn't executed, although the dsp thread is running as it should and the function pointer seems to be in place.
I'll have a look at this, although I have very little data in my patch.
No, the UI looks just as usual : the background goes dark and there's a tick in the Live box.
I should try this, although since I've verified that the PatchProcess() function in the patch doesn't seem to be running I'm not hopeful that this will do anything. I've also tried other things like connecting a push button to a LED, and that doesn't work either when the patch doesn't seem to run.
EDIT: Dumped the memory location where PatchProcess() is supposed to reside. When the patch is ok, it contains the expected data, which corresponds to the objdump of the .elf or .bin file. When the patch fails, it's all zeros. Not good.
hmm, zeros, that sound suspicious... is it zero even if a previous patch has already successfully run?
there is code run, before anything else, when the patch is loaded, that is responsible for zeroing out memory, to initialise, static variables etc... I wonder, if this perhaps tripping over itself, in some situations...
also as you say, perhaps its not so much the patch size as something else... it could be worth comparing the sizes of the different regions of memory, between your 7k patch and tybett (which you said was 12k), perhaps that will reveal that something is bigger in your patch.
perhaps if you post your patch here, I can also take a look, and it might also interest @johannes
EDIT: btw, if you don't want to public share the patch, you can pm it to myself and johannes.
Actually, in my latest debugging section, I discovered that the data is actually intact, but that the function pointer in patchMeta.fptr_dsp_process (which is what I was using as a starting point for my memory dump) seems to have a bogus value, and the data at that address was zero.
The symptoms seem to be slightly arbitary, during one of my sessions yesterday the symptom was that the Axoloti Core timed out when starting the patch, the log said Starting Patch, then nothing happened for a while, and the Patcher then concluded that 'taking too long to start patch'. It didn't seem to have lost contact with the Core though, as I could attempt it again which worked fine up to the same point. So perhaps something did in fact crash under those specific circumstances.
Like you're on to, I've been suspecting that the problem is not the patch size per se, but that something in my patch is larger than in most others. For instance, I have several local functions in my objects, which is rather unusual - although your Push object seems to have that too.
I don't mind sharing the patch, either publically or via PM but as it is it contains lots of custom objects so it's a bit cumbersome at this point.
exactly, thats why mine are all declared no-inline. the issue is, normally for axoloti objects, there functions are not called very often, so there is not a huge bloat. but in my push objects, some functions are called lots, of times, and in loop.... and these all get expanded by the compiler.
before I added this, I could only have my push object, plus about 3 or 4 other objects once i did this, I could comfortable run it, with even big patches like tybett..... it made a huge difference!
BTW, for the record, I experimented with changing -O3 to -O2 in Makefile.patch . I had a simple 6 voice polysynth patch which used 41% DSP with -O3, but used 51% with -O2. So the difference in optimization level definitely makes a difference.
Also for the record, my problem patch ran fine with -O2.
I agree I should add noinline to my local functions, but for now while I have the problem staring me in the face I want to try and figure out what is causing it, because I'm sure it will come back and bite me (or someone else) later anyway.
Ive moved this to developer... as its something Id like to explore and come to some resolution... then once we have the 'facts' then we can include something in the user guide section, to help other users understand where the limitations are, and what they can do about it.
yes, thats why I suggested looking at the -O flags, the reason is -O3 inlines everything -O2 does not. (can remember exact details, but its in the GCC manual .. this is how first identified that inlining was my issue) as you say , -02 though is too blunt a tool.... we need to be in more control.
for me the other issues is, xpatch.cpp is compiled as one unit, this is partly what causes the issues. as methods all in one compile unit are treated differently to those in different compile units.
I do agree, it would be good to document this a bit, and get to the root of it... as frankly, I now cant remember if I worked out exactly what was going on, or just knew how to solve it ,because i didnt write any notes, other than an email i set to johannes at the time...(hmm, I should dig that out!) ... i vaguely remember, it being the .text section that grows out of control...
oh one other thing I remember, which made an massive difference... at one stage I was rending my controls to the display using sprintf using a float ... this caused an explosion in code size, (sprintf is enormous) , so I changed this to chsnprintf (chibios) and used ints, and bang problem solved.
... actually, now I have a sneaking suspicion, the order of events was... I added (unknowingly) sprintf to format the display... a few days later, i noticed I couldn't load more complex patches, as i didn't know it was sprintf, i started inlining, which seemed to solve issues... and only later (can't remember why/how), I found sprintf to be the underlying cause, and then suddenly my object was pretty lightweight.
moral is, be careful which stdlib functions you use, some appear to be fine (I use things like strncmp), but some of the more complicated ones, might be 'off limits'. I guess, now, id probably steer clear of stdlib, and use chibios methods wherever I can.
Sure, no problem ... got a bit of a shock at first when I couldn't find the thread in its original location, for an instance I thought I'd been banned from the forum or something...
Anyway, I've done some more research by instrumenting the Axoloti Core code and also xpatch.cpp .
Now, without knowing the exact instrumentation, this might not be too meaningful, but here's an excerpt from the run when the patch is ok. Basically it just prints out where the code is at various times, together with some key variables:
Axoloti says: Init Start: dsp() is 0x00011851/0x00000000, patchMeta is 0x20009DFC
Axoloti says: To start root.Init()
Axoloti says: Here i am!
Axoloti says: # params at init time: 6
Axoloti says: Done root.Init()
Axoloti says: Init Done: dsp() is 0x00011851/0x00011851, patchMeta is 0x20009DFC
Axoloti says: Just got back from patch init
Axoloti says: fptr_dsp_process = 0x00011851, patchMeta 0x20009DFC
Axoloti says: fptr_dsp_process real address 0x00011850
Axoloti says: Start Patch done
Axoloti says: # params at run time: 7
Here's an occasion when it fails:
Axoloti says: Init Start: dsp() is 0x00011851/0x00000000, patchMeta is 0x20009DFC
Axoloti says: To start root.Init()
Axoloti says: Here i am!
Axoloti says: # params at init time: 6
Axoloti says: Done root.Init()
Axoloti says: Init Done: dsp() is 0x00011851/0x00011851, patchMeta is 0x20009DFC
Axoloti says: Just got back from patch init
Axoloti says: fptr_dsp_process = 0x11080003, patchMeta 0x00000041
Axoloti says: fptr_dsp_process real address 0x11080000
Axoloti says: Start Patch done
Axoloti says: # params at run time: 7
The interesting thing here is how the address for patchMeta is all wrong, and consequently patchMeta.fptr_dsp_process too. Now, patchMeta is not a pointer, it's a struct allocated in memory, and looking at the assembly code, it appears that the compiler keeps the address of patchMeta in r4, which is why the value can become wrong. It appears to happen across the call to (patchMeta.fptr_patch_init)(). So my guess would be that something is thrashing r4 in this function ... if it wasn't for the fact that r4 is saved on the stack at the start and restored at the end ... so, is something within the function thrashing the stack then?
Comparing the code between the ok and bad patches reveals that the only changes to the xpatch_init2() function (which is what patchMeta.fptr_patch_init points to) are a number of constants, mostly addresses to various bits of data that is accessed in the init code. I have not scrutinized the list, but there was nothing there that struck me as being completely wacko, most things just move on 0x70 bytes or so which would seem to correspond well to the amount of extra code that is included in the longer patch (as part of the dsp() function).
There are some things which are larger numbers which seem to skip quite a bit too, like from 0xf8200000 to 0xf8300000. I have not attempted to figure what these figures represent.
One thing that is nagging at me is that the stack might be too small. The xpatch_init2() is called from patch.c:StartPatch() in the Axoloti Core firmware, which in turn is called from pconnection.c:PExReceiveByte(), which in turn is called from the ui.c:ThreadUI() thread function. According to the corresponding WORKING_AREA definition, the function has a stack size of 1142 bytes. I haven't tried increasing the size, but it would seem strange that the stack were too small as I would expect it failing every time then. One thing though is that bss area for the patch itself (see ramlink.ld) is in the CCM RAM, in the first 48k, and the Core stacks are above that, so an initialization bug in the patch itself could conflict with the stacks. However, again, the only difference between the good and bad patches are additional dsp code, which shouldn't affect the amount of bss data.
Ah well, just speculation and mystery at this point.
(I'm including the actual patch and associated objects, most of which are custom at this point, so perhaps not too easy to follow. Basically, the patch is a lab bench for a UI using the P6 2x24 character LCD, with objects that read the P6 I/O expander (using 74HC165 chips), extract the encoder bits, do quadrature decoding, and feed an edit object (which is embedded) which basically adjusts 6 parameters specified in the ids list, using an LCD controller object. This is probably not the optimal way to implement it, that would probaly be using a single object for everything, but it does allow me to experiment and get everything working before creating a monster object, and also, the individual objects might be useful to others, I plan on adding them to the contributions area once they seem stable enough.)
The actual change in code size is accomplished by changing the #if 0's in the k-rate code in the embedded patch/object. The behaviour has been different depending on changes in the code; most often if one #if 0 is changed to #if 1, the code runs, but i get the second dump above, i.e. something messes up r4 or the stack during the call to xpatch_init2() as described above. If both #if 0's are changed to #if 1, depending on what other code I've got in there (i.e. adding and removing various debug printouts) i either get the same behavior as with one #1, or the Core crashes completely and reboots (the red LED starts flashing as it does when booted, and the Patcher times out, but after the Core has rebooted and the Patcher timed out, I can reconnect and continue from there.)
One more note: the lcdctrl.axo object has a pin allocated that is normally not a GPIO pin on the Axoloti Core, but is used for the LCD in the P6. So I don't know what would happen if the patch is run as it stands on the core (apart from the obvious lack functionality due to the missing hardware).
The real core of the problem seems to be somewhere in StartPatch(). I've been instrumenting the code with LogTextMessages() in various shapes and sizes, but a common problem has been that register r4 gets clobbered. At one point r4 contained the address to patchMeta, in the code I have running now, it is set to zero early in the routine, but when subsequently used to zero patchStatus() it happens to be 17 so of course the patch doesn't start properly.
r4 is pushed on the stack when StartPatch() calls patchMeta.fptr_patch_init() = xpatch:c:xpatch_init() (-> xpatch_init2()and I've added a printout at the start of xpatch_init2() printing out the value of the saved r4 on the stack. Already here it is 17, which is bewildering as if it were a stack overwrite there's not much time from the point where it is set to 0 in StartPatch() until I read it in xpatch_init2().
The good news though is that it doesn't seem to be something in xpatch.cpp, e.g. a buggy object, which causes the problem, the actual problem occurs before the patch is initialized. It still seems to be related to the size of the patch, currently I have to remove more of the patch to get it to run, but it's the same symptom apart from that - over a certain size, it doesn't run, in various ways, which depend on exactly how the firmware looks in the incarnation that's running at that time.
There is a consistency in that when a bogus value is found in r4, it is the same even if the patch is reloaded over and over again.
Also consistent with previous runs, Library -> demos -> youtube -> tybett (almost twice the size of my patch) still runs fine.
I think you're getting a stack overflow, possibly triggered by the LogTextMessage() calls. I'd first try increasing the stack size.
Do you have a in-circuit debug setup? Enabling some checks in chibios may also be helpful to get to the source of the issue, like CH_DBG_FILL_THREADS and CH_DBG_ENABLE_STACK_CHECK in firmware/chconf.h
It seems that calling xpatch_init() with ample stack space would be a good improvement. Now it is called from different contexts depending on how a patch is started - through GUI, sdcard or flash on startup, sdcard hotplug or load object...
Yeah, I've been thinking about the stack size, but the symptoms so far don't ring true with an ordinary stack overwrite, as my impression is that it is a couple of bytes that are overwritten, not something plowing into the stack from the top or bottom. My current impression is that there's a function which allocates a variable on the stack, and hands its address over to someone else, which subsequently updates the variable long after it has ceased to exist. I did a cursory look through pconnection.c as it allocates a lot of stuff on the stack for the data sent back via the USB transmit routines; if one of these for some reason made some change to the data it would explain what's happening. But that particular case seems highly unlikely, especially since the data areas are considered const by Chibios as evidenced by the casts in those calls.
The thing is, that I started having problems long before I started sprinkling the code with LogTextMessage()s. I did have some in the objects in my patch, but the problem seems to occur before any xpatch code is called. And removing those calls from the patch will make the code smaller again so it could start working for that reason.
Of course, it could be a complex chain of events, i.e. a stack overwrite which subsequently leads to a wild pointer, which is causing the problem.
Which stack size are you thinking of, the ThreadUI stack size (currently at 1142 bytes) (that would be the one in the call sequence that I'm using as I always load the patch via the Patcher)?
Then there's __main_stack__size__ and __process_stack_size__ in STM32F407xG.ld. Hm, those two are only specified as 0x400 each, but I suppose they are not the ones used for the dynamic thread stack allocation? Finally, there's the CCRAMEND area in STM32F407xG.ld (which only seems to allocate 10k of the potential 16k available above 0x1000C000).
No, sorry. If things get really hairy perhaps I could move this to an STM32F4DISCOVERY board which I have lying around, and use the ST-LINK on that one.
I'll definitely have a look at that, that seems like it might be worth while.
So it is a question of increasing the stack size for several threads then. Is the stack for main (where StartPatch is called from on startup) the __main_stack_size__ in STM32F407xG.ld ?
Ok, so after having spent some time debugging and thinking about this (my code looks like swiss cheese now with LogTextMessages() sprinkled all over the place, intermixed with lots of #if 0 and #if 1 to switch things off, and the DSP thread has been converted into a realtime polling stack monitor) I finally figured out what the problem is. As a side effect I know more about the Axoloti code, Chibios and ARM assembler than I was actually hoping for at this point.
To start with, with the setup I have right now, the problem I'm seeing is that when StartPatch() calls the patch loaded at PATCHMAINLOC in RAM, register r4 gets clobbered, and for this particular incarnation, it is loaded with 0 and used to zero things, incluing setting patchStatus to 0 after loading and initializing the patch. r4 is saved together with a lot of other processor registers when xpatch_init() is called, so it's saved on the stack for the duration of the init call. The problem is that upon return, r4 doesn't contain 0 anymore, it happens to contain 1, so the patch never gets started.
I added code to xpatch.cpp to print out the memory location where r4 is saved, and sure enough, that location contains 1. So I naturally assumed there was some sort of stack overwrite or wild pointer which was overwriting the stack. However, I was bewildered by the fact that there was actually very little going wrong in the code, a stack overwrite I would have thought would have had more disastrous consequences (especially if another stack hit the current one). Also, the bug has been fairly resilient to changing the firmware code, which moves things around in the flash, and also fairly consistent as to behaving differently when the size of the patch is changed.
(By the way, I tried enabling the stack checks in Chibios, including patching in some hooks so that i would get a printout or system hang if a stack overflow was detected, but it never triggered.)
As usual, the answer has been staring me in the face for some time. I've looked at numerous disassembly dumps of various parts of the code to see where things are put and what the processor actually does, but i hadn't noticed this:
xpatch_init actually starts at 0x11010, not at PATCHMAINLOC=0x11000 (disregarding the high bit as the RAM is doubly mapped to address 0), which is where StartPatch() calls it. If we now move to reality, which is the xpatch.bin file, we have:
So yes, xpatch_init() starts at an offset of 0x10, and furthermore, the first thing executed when calling offset 0x00 is movs r4, #1 ... which explains why I have observed that r4 changes to 1 (in this case). So it wasn't the stack getting overwritten, it was the register actually being clobbered!
No doubt the data at that address is not code, but something else, like a header, 0x00012401 looking like some form of length or highest address specifier, which explains why the behavior changes when the size of the patch changes.
So, got to figure out what the actual fix is here, just calling PATCHMAINLOC with a 16 byte offset will do it of course, but a question is if the header is in fact intentional and if not if it can be removed. Perhaps @johannes knows, otherwise I'll have a look at it next time I have a chance.
EDIT: It looks as if the 'header' is actually the constructor and destructor init arrays, with padding. I'm not sure if it has to be at the start of the binary like this, I'll experiment a bit with it when I get the chance.
I assume you found the code which calls the static constructors.
Also with this new knowledge are you able to create small patches that don't work? Seems to me the key to the problem is determining why some patches work and others not - since your hypothesis appears to be its not size related?
Here init_array and fini_array for the constructors and destructors precede the .text segment, thus xpatch_init() will not be at the start xpatch.bin, but a multiple of 16 bytes further on. Thus when calling PATCHMAINLOC the first code that is executed are the pointers in the init_array and fini_array. Since the actual values in those arrays are function pointers, their address and thus the resulting code depends on the size of the patch. So it's a question of creating a patch which results in values for the function pointers which correspond to ARM assembly codes which will do no harm ...
Right now I don't have access to the hardware for testing, but I'm going to try moving the constructors and destructors segments to after the text segment, as soon as I get the chance.
just had a look, yeah I think you are on to something.... as you say PATCHMAINLOC seems to expect the code is the first thing in SRAM. @johannes thoughts?
but, if this is the case, Im quite surprised, we are not seeing more failures... as there must be quite a few static constructors... and it appears if any of these create some kind of call op, then bang...
so... as for reproducing (important, so we can test the 'fix')
I guess, if we create a custom object, with LOTS of static constructor, there is a high chance it will crash... we don't need a large patch, just lots of opportunities, for the code to trip up ... no?
I can't see any harm in moving the ctor/dtor and will give it a go, but it would be nice to prove this is the case, somehow with a 'destructive test'....