April 11, 2007
-
Loop performance in C#
Lately I’ve been attempting to perform arithmetic-heavy computations with C#, and I encountered a somewhat surprising detail about loop performance. Computing the squared (Euclidean) distance between two vectors, I discovered that using a variant of loop unrolling actually improves performance.
Check it out. I ran the following two functions 25 million times each on random 64-dimensional arrays, under ‘Release’ mode in Visual Studio 2005 on a 2 GHz Core Duo.
Using this loop, the full operation takes about 10.0 seconds:
int Distance(int[] a, int[] b) {
    int d_sum = 0;
    int d = 0;
    for (int i = 0; i < a.Length; ++i)
    {
        d = a[i] - b[i];
        d_sum += (d * d);
    }
    return d_sum;
}
Whereas this loop takes about 7.9 seconds!?:
int Distance(int[] a, int[] b) {
    int d_sum = 0;
    int d1 = 0, d2 = 0, d3 = 0, d4 = 0;
    for (int i = 0; i < a.Length; ++i)
    {
        d1 = a[i] - b[i]; ++i;
        d2 = a[i] - b[i]; ++i;
        d3 = a[i] - b[i]; ++i;
        d4 = a[i] - b[i];
        d_sum += (d1 * d1) + (d2 * d2) + (d3 * d3) + (d4 * d4);
    }
    return d_sum;
}
This is quite puzzling to me. Okay, so the loop unrolling reduces the branching rate, but this branch should (almost) always be predicted correctly, and, if unrolling makes sense at a lower level, this is one of those cases where the compiler should definitely optimize things on its own. (I even tried explicitly specifying ‘i < 64’ for the loop bound to increase the odds of this… no improvement.) I’m not sure what’s going on here, and I guess the next step would be to look at the IL code generated by the C# compiler.
Other interesting points:
- Using a temporary value for ‘a.Length’ doesn’t speed anything up. Apparently the compiler is able to notice that the loop has no chance of affecting the array’s size.
- Explicitly inlining this function inside of the outer loop (5,000 runs) actually slows performance back down to the 10-second range!
- Switching this loop to use fixed unsafe pointers doesn’t improve performance either. Why? Either fixing the pointer takes too much time, or there are some fancy low-level optimizations that are helping loop #2 which aren’t being applied to the unsafe code.
- Software pipelining (moving the use of d1, d2, … until the next loop iteration) doesn’t help, either.
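For anyone who wants to play with the unrolling idea outside .NET, here is a sketch of the same transformation in plain C (illustrative only; the timings above are about the C# JIT, and the function names `dist_simple`/`dist_unrolled` are mine). Unlike the unrolled C# loop above, this version adds a remainder loop so lengths that aren’t a multiple of 4 are still handled correctly:

```c
/* Squared Euclidean distance: the straightforward loop. */
static int dist_simple(const int *a, const int *b, int n) {
    int d_sum = 0;
    for (int i = 0; i < n; ++i) {
        int d = a[i] - b[i];
        d_sum += d * d;
    }
    return d_sum;
}

/* Same computation with 4-way manual unrolling, plus a remainder
   loop for the last (n mod 4) elements. The unrolled C# loop in
   the post assumes the length is a multiple of 4. */
static int dist_unrolled(const int *a, const int *b, int n) {
    int d_sum = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        int d1 = a[i]     - b[i];
        int d2 = a[i + 1] - b[i + 1];
        int d3 = a[i + 2] - b[i + 2];
        int d4 = a[i + 3] - b[i + 3];
        d_sum += d1 * d1 + d2 * d2 + d3 * d3 + d4 * d4;
    }
    for (; i < n; ++i) {  /* remainder elements */
        int d = a[i] - b[i];
        d_sum += d * d;
    }
    return d_sum;
}
```

Both functions return identical results; only the per-iteration branch count differs.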
Comments (5)
That is fascinating. I’d be curious to see the IL code.
You know, if anyone has covered this, it’d have to be Rico Mariani. And sure enough, with a little help from Google:
http://blogs.msdn.com/ricom/archive/2003/12/02/40777.aspx
You should not run performance tests from Visual Studio, not even in Release mode; the debugger turns off some IL optimizations.
Run from command line.
Thanks for the code, it’s an excellent example.
But please, I need to ask:
since it computes the Euclidean distance, where is the square root for d_sum?
If we compute the square root, should the return type still be an integer? It could be double, but do I have to return it as an integer?
@mety2004 - you would want to return a double, since the square root won’t be an integer. I didn’t take the square root because you can often use squared distances directly in calculations. Square root can be quite slow if called often.