For loops - cycle counter question


#1

Hello!

I have been working on an object that repeats the same code 8 times, so I thought I'd put a single copy in a for loop that runs 8 times, since it's much easier to work with.

Then I thought I'd test it with the cycle counter to see if the performance was better. But I was very surprised to see that the for loop version used about 8 times as many cycles to process the same code. A for loop running a single copy of the code 8 times uses 8 times as many cycles as 8 copies written out in the same object.

But what does that actually mean?

I thought I could easily use a for loop for this, and maybe even gain some performance, but I got the total opposite. So should I avoid for loops in the Axo world unless they're really necessary? I have attached the example from above.

Thanks!

For loop Test .axp (8.5 KB)


#2

Take a look at the produced C code.
When you unroll a loop it's often faster.

1) The code will have less loop overhead per operation.
2) The compiler will have more operations it can shuffle around to hide load/store latency in.

E.g.

// multiply a block by a scalar (2.31uS for n=128)
void block_mul_k(float *out, float k, size_t n) {
	for (size_t i = 0; i < n; i++) {
		out[i] *= k;
	}
}

// multiply a block by a scalar (0.6uS for n=128)
void block_mul_k(float *out, float k, size_t n) {
	// unroll x4 (n must be a multiple of 4, otherwise n wraps around and the loop runs past the end of the block)
	while (n > 0) {
		out[0] *= k;
		out[1] *= k;
		out[2] *= k;
		out[3] *= k;
		out += 4;
		n -= 4;
	}
}

It's instructive to look at the assembly output for the compilation to see what the compiler is doing. It's not generally true that unrolling will give a performance benefit. You have to compare cases and measure carefully.
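As a rough sketch of what "compare cases and measure" could look like on a desktop compiler, the two variants can be put side by side under a small timing helper. The names `mul_k_loop`, `mul_k_unroll4`, and `time_block_fn` are made up for this illustration, and `clock()` is only a coarse stand-in for the cycle counter you would use on the actual hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Plain loop version. */
void mul_k_loop(float *out, float k, size_t n) {
	for (size_t i = 0; i < n; i++) {
		out[i] *= k;
	}
}

/* Unrolled x4 version; n must be a multiple of 4. */
void mul_k_unroll4(float *out, float k, size_t n) {
	assert(n % 4 == 0);
	for (size_t i = 0; i < n; i += 4) {
		out[i + 0] *= k;
		out[i + 1] *= k;
		out[i + 2] *= k;
		out[i + 3] *= k;
	}
}

typedef void (*block_fn)(float *out, float k, size_t n);

/* Average seconds per call over `reps` calls on a 128-sample block.
 * Pass k = 1.0f so repeated calls do not overflow the buffer values.
 * An optimizer may still simplify the work, so check the assembly too. */
double time_block_fn(block_fn fn, float k, int reps) {
	enum { N = 128 };
	static float buf[N];
	volatile float sink;
	for (size_t i = 0; i < N; i++) {
		buf[i] = 1.0f;
	}
	clock_t start = clock();
	for (int r = 0; r < reps; r++) {
		fn(buf, k, N);
	}
	clock_t end = clock();
	sink = buf[0]; /* keep the result observable */
	(void)sink;
	return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Calling `time_block_fn(mul_k_loop, 1.0f, 1000000)` and `time_block_fn(mul_k_unroll4, 1.0f, 1000000)` at the same optimization level gives two numbers to compare. Which one wins depends on the compiler, the flags, and the target, so the result on a PC says little about the result on the board.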


#3

@deadsy Thank you for the comment, I'll take a look at it :slight_smile: I am still learning coding, so I really appreciate it.

About the performance benefits, yeah, I was only hoping that using a for loop would give better performance, not expecting it. The main reason I wanted to use a for loop was that it's just easier to work with, and if one wants to expand the object later on, it's really simple to do so.

Yes, I need to start getting into that side of it too, the analysis of objects. I recently started using the cycle counter. For now I have just been focusing on understanding how to write code... :slight_smile: I am at the point now where I can go back and edit a lot of my old code and make it MUCH simpler.

Anyway, thank you again :slight_smile:


#4

You wouldn't normally expect to get any performance benefit by putting operations in a loop. Loops have to alter and check a loop variable and that takes some time.

total_time = N * (t0 + t1)

Where:
t0 = loop overhead time
t1 = operation time

If t0 and t1 are comparable in magnitude then loop unrolling gives a benefit.
If t0 << t1 then loop unrolling won't reduce the total time much.
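To put numbers on those two cases, here is a small sketch. `loop_time` is the formula above; `unrolled_time` is an assumed simple model, not something measured, in which unrolling by a factor u pays the loop overhead once per u operations. Both function names and the cycle counts in the note below are hypothetical.

```c
#include <stddef.h>

/* total_time = N * (t0 + t1): overhead paid on every operation. */
double loop_time(double n, double t0, double t1) {
	return n * (t0 + t1);
}

/* Assumed model for unrolling by u: overhead paid once per u operations. */
double unrolled_time(double n, double u, double t0, double t1) {
	return (n / u) * t0 + n * t1;
}
```

With N = 128 and t0 = t1 = 1 (arbitrary units), the loop costs 256 and the x4 unrolled version 160, a clear saving. With t0 = 1 and t1 = 100, the loop costs 12928 and the unrolled version 12832, which is less than a 1% difference. This model ignores the scheduling effect, so real measurements can differ.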

The secondary effect of loop unrolling is that the compiler has more registers/instructions to work with and can use the latency of loads/stores to do other operations. That's often the dominant effect with DSP style block operations.

Having said that:

"Premature optimization is the root of all evil."

So go ahead and write for loops - and unroll them if and when you have to make things faster.
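If it does come to unrolling by hand, one detail to watch is a block size that is not a multiple of the unroll factor. A minimal sketch of one common way to handle it, with a short tail loop for the leftovers (the name `block_mul_k_any` is made up for this example):

```c
#include <stddef.h>

/* Multiply a block by a scalar, unrolled x4, for any n. */
void block_mul_k_any(float *out, float k, size_t n) {
	size_t i = 0;
	/* Main body: four elements per iteration. */
	for (; i + 4 <= n; i += 4) {
		out[i + 0] *= k;
		out[i + 1] *= k;
		out[i + 2] *= k;
		out[i + 3] *= k;
	}
	/* Tail: the 0 to 3 elements left over. */
	for (; i < n; i++) {
		out[i] *= k;
	}
}
```

When the block size is fixed and known to be a multiple of 4, the tail loop is unnecessary and can be dropped.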