SPI external SRAM implementation


#1

So I've been playing around trying to get a 23LC1024 SPI SRAM chip working with Axoloti running on the STM32F407 Discovery board.

I know I know, why try using SPI SRAM when the stock Axoloti board has fast and abundant SDRAM. I do own a stock Axoloti and use it live, and my playing with the Discovery board is mainly for learning purposes. Kindly see below what I've got so far...

The code is still quite inefficient, but the main problem is that the SPI read and write actions use up a lot of CPU. Does anyone have a hint on what I am missing (something about threads? Maybe use spiExchange instead of spiSend/spiReceive?)
I did try minimizing redundant function calls and doing the whole read/write action at k-rate, writing to the frac32[i] buffer in a 'for' loop, but it ended up with roughly the same CPU load.

I am using SPI1 (used SPI2 before, same)

Or is that the best SPI can do on ChibiOS / Axoloti / Discovery? I dare doubt that since the 23LC1024 SRAM works fast (?) and clean e.g. on Teensy 3.X with the Teensy Audio Library?

This is my custom SPI config object code in XML (or download file below):
https://pastebin.com/wNzuyYzV

And here's the 23LC(V)1024 delay test object code in XML (or download file below):
https://pastebin.com/WT9dbQdU

spi1config.axo (3.2 KB)
spitest2-buffer.axp (10.6 KB)

Thanks in advance, and please do let me know wherever my code sucks!


#2

Hey there, this looks very interesting and considering so many users here run out of sram (and sdram) I am surprised nobody commented here.

Have you been able to make any progress with your project in the meantime?


#3

Hi, actually I didn't follow up on this back then and let it be the way it was, however I now understand more about threads, interrupts, DMA etc. so I might have a look at this again to tune it up a bit. I2S, after all, is basically SPI in a specific configuration, and on the Axo it's stereo in stereo out with hardly any CPU load... this must be the power of DMA and interrupts.

That being said, I believe we're better off just using the Axo's stock SDRAM since it's fast and plentiful. Expanding the internally usable SRAM in a useful way would mean delving deep into the firmware and linker code (or maybe even lower). Besides, the SRAM chip I have only speaks SPI, so I doubt there'd even be a way to have the memory available during firmware and patch compilation.

Hope this makes sense, glad to know there are others out there interested in this though!


#4

I use almost all of the SDRAM for sampling, and the SRAM is full too...

Since you're busy with SPI on Axoloti, have you ever tested linking two boards via SPI? There is an experimental firmware branch, but not much has happened there since.


#5

Linking boards is definitely worth looking into. Wondering if CAN would be a viable protocol actually.


#6

What is CAN?

I think linking 2 boards would be so elegant: you just double your resources, and you would also have double the ins and outs in your system.


#7

Looks like CAN is another data transfer protocol and I believe it is used in automotive industry for inter-MCU communication? Anyway since some apps of the venerable Midibox platform use CAN to transfer data between multiple core MCUs I thought this would be a good candidate for Axo as well.

I agree with you that linking boards would be such a performance boost, it's a no-brainer really. Like so many other features, but so little time... I'm sure Johannes is busy enough already.

I'll definitely look into this though, I mean even a rudimentary data exchange (like transferring audio streams between boards directly into the other board's patch) would be so great


#8

The SPI functions (spiSend(), spiReceive() etc.) are from ChibiOS and are meant to be blocking, i.e. spiSend() returns only after all data has been sent. This means two things:
1. the CPU (or rather the thread) has to wait for each SPI transaction to finish. This seems wasteful: there is DMA and everything on the chip, and yet the CPU can't do anything else while SPI is active.
2. the source code is easily readable: spiSelect(), spiSend(), spiReceive() and spiUnselect() appear in the same order as things happen in realtime. ChibiOS also offers nonblocking SPI access with callbacks and stuff, which can quickly get confusing.
Notice that I mentioned the thread has to wait, not the whole CPU. ChibiOS can do multitasking after all. So if we put all SPI stuff in another thread, it won't delay the computation of all the other objects. There is already a factory object that puts code in a separate thread: script/script. Of course for the SPI RAM use some changes are necessary: red inputs and outputs instead of blue, and some mechanism (mutex or something) in the K-rate-code to sync the SPI thread to the main thread. I don't (yet) know enough about ChibiOS to figure that out.

As for linking boards via SPI, that seems to be the intention of the "Multiprocessor stream" and "Multiprocessor sync" connectors. The "stream" connector has SPI, and if we get the CPU not to wait for the hardware, SPI seems a practical interface: fast and no overhead (no start-bit, command-byte, frame checksum, address etc.). It should be possible to swap a block of data each k-rate cycle, maybe enough for stereo audio plus a bunch of control (blue) signals.
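A rough way to check the "block of data each k-rate cycle" idea is to count how many bytes an SPI link can clock per audio block. This is only a back-of-the-envelope sketch: the function names are made up for illustration, the 48 kHz sample rate, 16-sample block and 10.5 MHz SPI clock in the usage note are assumptions, and it ignores chip-select gaps and any setup time between transfers.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on bytes an SPI link can move during one k-rate block,
 * assuming the clock runs continuously (no gaps between bytes).
 * bytes per second  = spi_clock_hz / 8
 * blocks per second = sample_rate_hz / bufsize */
uint32_t spi_bytes_per_block(uint32_t spi_clock_hz,
                             uint32_t sample_rate_hz,
                             uint32_t bufsize)
{
    assert(sample_rate_hz > 0 && bufsize > 0);
    return (uint32_t)(((uint64_t)spi_clock_hz * bufsize) /
                      (8u * (uint64_t)sample_rate_hz));
}

/* Bytes needed for one block of stereo audio. */
uint32_t stereo_block_bytes(uint32_t bufsize, uint32_t bytes_per_sample)
{
    return 2u * bufsize * bytes_per_sample;
}
```

With the assumed numbers, spi_bytes_per_block(10500000, 48000, 16) gives 437 bytes per block, while stereo_block_bytes(16, 4) is 128 bytes, so on paper there would be room left over for control signals.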


#9

To answer myself, looks like CAN is too slow to get audio data (at least in axoloti quality format) across.
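The arithmetic behind that, as a small sketch: classic CAN tops out at 1 Mbit/s, and the raw audio payload alone is already above that. The 48 kHz sample rate and 32 bits per sample are assumptions about the format one would want to send, and the function names are just for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum bit rate of classic (high-speed) CAN. */
#define CAN_CLASSIC_MAX_BPS 1000000u

/* Raw audio payload in bits per second, without any protocol overhead. */
uint32_t audio_payload_bps(uint32_t sample_rate_hz,
                           uint32_t channels,
                           uint32_t bits_per_sample)
{
    return sample_rate_hz * channels * bits_per_sample;
}

/* Nonzero if the raw payload fits into the given bus bit rate. */
int fits_on_bus(uint32_t payload_bps, uint32_t bus_bps)
{
    return payload_bps <= bus_bps;
}
```

Stereo at 48 kHz and 32 bits comes to 3,072,000 bit/s, about three times what the bus can carry. Even 16-bit mono (768,000 bit/s) only fits before counting frame overhead, since each CAN frame carries at most 8 data bytes alongside its identifier, CRC and other fields.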


#10

Ok so I have the SPI RAM communication in a separate thread now and voila, it consumes very few CPU cycles.

(Quoting #8:) "and some mechanism (mutex or something) in the K-rate-code to sync the SPI thread to the main thread. I don't (yet) know enough about ChibiOS to figure that out."

I do get audio into the RAM and back out, however as you said I need to sync the thread to the main thread, i.e. make sure each audio block sent and received is in sync with the audio buffers. As of now I basically have a delay+samplerate reducer effect going on.

I am currently using a flag that is set each k-rate cycle when an audio block has been fully transferred into the SPI transmit buffer, then checked inside the thread and cleared when the SPI transfer is finished. It doesn't seem to work the way I expected.

Another option, chSysLock(), would probably block the CPU far too long since I am transferring audio samples mono in blocks of BUFSIZE*2 bytes (resulting in 16bit audio), in other words each spiSend and spiReceive transfers BUFSIZE*2 bytes (plus the instructions and address data making up another 4 bytes). This is because the 23LC1024 has this mode called Sequential Mode in which you can basically keep clocking in as many bytes as you want with the chip auto-incrementing the data address.
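For reference, here is a minimal sketch of how such a write frame could be laid out in the transmit buffer: one instruction byte, three address bytes, then the 16-bit samples. It is not the code from the attached objects. The function name is hypothetical, and the conversion from 32-bit samples (divide by 4096, then clamp to 16 bits) and the big-endian sample byte order are assumptions. The READ (0x03) and WRITE (0x02) opcodes and the 24-bit address field are from the 23LC1024 datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SRAM_CMD_READ   0x03u     /* 23LC1024 READ instruction  */
#define SRAM_CMD_WRITE  0x02u     /* 23LC1024 WRITE instruction */
#define SRAM_ADDR_MASK  0x1FFFFu  /* 128 KiB: 17 address bits are used */
#define SRAM_HEADER_LEN 4u        /* 1 instruction byte + 3 address bytes */

/* Fill `out` with a sequential-mode write frame for `n` samples.
 * `out` must hold at least SRAM_HEADER_LEN + 2*n bytes.
 * Returns the number of bytes to hand to the SPI driver. */
size_t sram_build_write_frame(uint32_t addr, const int32_t *in,
                              size_t n, uint8_t *out)
{
    assert(in != NULL && out != NULL);
    addr &= SRAM_ADDR_MASK;
    out[0] = SRAM_CMD_WRITE;
    out[1] = (uint8_t)(addr >> 16);
    out[2] = (uint8_t)(addr >> 8);
    out[3] = (uint8_t)addr;

    for (size_t i = 0; i < n; i++) {
        /* Assumed scaling: drop 12 bits (truncating toward zero). */
        int32_t s = in[i] / 4096;
        if (s > INT16_MAX) s = INT16_MAX;   /* clamp instead of wrapping */
        if (s < INT16_MIN) s = INT16_MIN;
        uint16_t u = (uint16_t)(int16_t)s;
        out[SRAM_HEADER_LEN + 2 * i]     = (uint8_t)(u >> 8);   /* high byte */
        out[SRAM_HEADER_LEN + 2 * i + 1] = (uint8_t)(u & 0xFFu); /* low byte */
    }
    return SRAM_HEADER_LEN + 2 * n;
}
```

For a block of BUFSIZE samples this returns BUFSIZE*2 + 4, matching the byte count described above. A read frame would use SRAM_CMD_READ with the same address bytes, followed by clocking in the data.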


#11

chSysLock() seems too heavy-handed to me. There should be some simple mechanism like a message or an event flag for this.

(after reading some OS documentation) Ugh, messages are too complicated.
The SPI thread could call chEvtWaitAny() to wait for the main thread to copy the data. The main thread calls chEvtSignal() with at least one bit set after it has copied the data. This should release the SPI thread to do its job and return to waiting for the next round.
The SPI thread needs to have a higher priority than the main thread, so that it can start the SPI hardware immediately instead of waiting for the main thread to finish its k-rate computations (which would mean that (again) CPU and SPI hardware are not working at the same time, wasting precious CPU cycles).
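To make the handshake concrete, here is a single-threaded model of it in plain C, so it can be run on a PC without ChibiOS. It is only a sketch of the logic: model_signal() and model_take() are made-up stand-ins for chEvtSignal() and chEvtWaitAny(), the struct and the EVT_BLOCK_READY bit are hypothetical, and BUFSIZE of 16 is an assumption. In the real patch the SPI thread would sleep inside chEvtWaitAny() instead of polling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUFSIZE         16    /* samples per k-rate block (assumed) */
#define EVT_BLOCK_READY 0x1u  /* hypothetical event bit */

typedef struct {
    uint32_t pending;      /* event bits signalled but not yet consumed */
    int16_t  tx[BUFSIZE];  /* block handed from the audio code to the SPI thread */
    unsigned blocks_sent;  /* how many blocks the SPI side has processed */
} spi_link_t;

/* Stand-in for chEvtSignal(): OR the bits into the pending mask. */
void model_signal(spi_link_t *l, uint32_t bits)
{
    l->pending |= bits;
}

/* Stand-in for chEvtWaitAny(): the real call blocks until one of the
 * bits in `mask` is pending; this model only reports and clears them. */
uint32_t model_take(spi_link_t *l, uint32_t mask)
{
    uint32_t hit = l->pending & mask;
    l->pending &= ~hit;
    return hit;
}

/* K-rate side: copy the block first, then signal the SPI thread. */
void krate_submit(spi_link_t *l, const int16_t *block)
{
    memcpy(l->tx, block, sizeof l->tx);
    model_signal(l, EVT_BLOCK_READY);
}

/* SPI thread side: one loop iteration. Returns 1 if a block was sent. */
int spi_thread_step(spi_link_t *l)
{
    if (!model_take(l, EVT_BLOCK_READY))
        return 0;  /* the real thread would still be asleep here */
    /* Real code: spiSelect(), spiSend(), spiReceive(), spiUnselect(). */
    l->blocks_sent++;
    return 1;
}
```

The order inside krate_submit() is the important part: the data is copied before the event is signalled, so the SPI thread never starts on a half-filled buffer. With the SPI thread at higher priority, the signal would wake it right away, which is what lets the SPI hardware run while the main thread continues its k-rate work.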


#12

Your suggestion worked!

Now I just need to shorten the wires or make a small breakout board for the SPI RAM because I am still getting pops and clicks likely due to wire noise.

Then I can focus on improving the objects (delay interpolation will be another hurdle).