Accelerate By Sorting

A fundamental axiom of computer science is that you cannot make a computer go faster, you can only reduce the work it has to do.  One of the most common ways to do less work is to sort the things your code is processing.  While experimenting with different ways to populate a set of aggregate roots, I found that the simple act of sorting the results from the database could make a significant difference in performance.

Normally when I have a set of parent objects, each having a collection of child objects, I’ll use a foreach loop to iterate over the parent collection.  A LINQ where predicate provides an easy way to filter which children belong to with their parents.  The problem with this is that using the LINQ where predicate creates more allocations than are necessary, and it loops over all the child objects once for each parent object.

foreach (var customer in customers)
{
     _setOrders.SetValue(customer, orders.Where(_ => _.CustomerId == customer.CustomerId.Value).ToList());
}

If my child collection was sorted by the parent identifier, I would only have to loop over the child collection once.  Let’s look at what that looks like.


position = 0;
for (var i = 0; i < customers.Length; i++)
{
     while (position < orders.Length && orders[position].CustomerId == customers[i].CustomerId)
     {
          customers[i].Orders.Add(orders[position]);
          position++;
     }
}

The counter is defined outside the for each loop, and does not get reset to the beginning of the child collection with each iteration of the parent collection.  Doing this eliminates having to iterate over the part of the child collection that has already been covered.  Once the element of the child collection doesn’t match with the current parent element, we can advance to the next parent element.

Using the nested while statement with a counter may reduce the readability of the code, but it’s not overly complex, and the performance benefit speaks for itself. 

So how much of a performance boost do you get from using this technique?  Obviously it depends on how many objects are in the collections.  If there is only one parent and one child, it will be slower, but not by much.  It doesn’t take long to sort a collection of one. 

Using a collection of 100 parents, 1000 children, and 10,000 grandchildren.  There was a 50X improvement.  Iterating over 10,000 grandchildren 1,000 times makes a big difference.

The next question that should spring to mind is how relevant are the time savings when making out of process calls to a database. Again it varies, but with sample data, I’m seeing a 3X improvement.

My suggestion is if you identify a situation where you need a performance improvement that involves iterating over a collection to find set of items….sort it. The impact may surprise you.