One of the most powerful, and overlooked, feature of SQL Server is parallelism. This feature allows the database engine to take complex queries and split the processing across multiple CPU cores to complete queries in a fraction of the time it would take without this feature. The problem is understanding this feature, so we often assume it just works and we kind of ignore the details.
In this article by Paul White, we get to some of the details of this feature and why it is important:
Counting Jelly Beans
Imagine you are presented with a large glass jar full of assorted jelly beans, and asked to count how many there are. Assuming you are able to count beans at an average rate of five per second, it would take you a little over ten minutes to determine that this particular jar contains 3,027 jelly beans.
If four of your friends offer to help with the task, you could choose from a number of potential strategies, but let’s consider one that closely mirrors the sort of strategy that SQL Server would adopt. You seat your friends around a table with the jar at its centre, and a single scoop to remove beans from the jar. You ask them to help themselves to a scoop of beans whenever they need more to count. Each friend is also given a pen and a piece of paper, to keep a running total of the number of beans they have counted so far.
Once a person finishes counting and finds the jar empty, they pass their individual bean count total to you. As you collect each subtotal, you add it to a grand total. When you have received a subtotal from each of your friends, the task is complete. With four people counting beans simultaneously the whole task is completed in around two and a half minutes – a four-fold improvement over counting them all yourself. Of course, four people still worked for a total of ten minutes (plus the few seconds it took you to add the last subtotal to the grand total).
This particular task is well-suited to parallel working because each person is able to work concurrently and independently. The desired result is obtained much more quickly, without doing much more work overall.
Counting Beans with SQL Server
SQL Server cannot count jelly beans directly, so we ask it to count the number of rows in a table instead. If the table is small, SQL Server will likely use an execution plan like the one shown in Figure 1.
This query plan uses a single worker – equivalent to counting all the beans yourself. The plan itself is very simple: the Stream Aggregate operator counts the rows it receives from the Index Scan operator, and returns the result once all rows have been processed. You might have chosen a similar strategy if the jelly bean jar had been almost empty, since you would be unlikely to save much time by splitting such a small number of beans among your friends, and the extra workers might even slow the process down slightly, due to the extra step of adding partial counts together at the end.
On the other hand, if the table is large enough, the SQL Server optimizer may choose to enlist additional workers, producing a query plan like the one shown in Figure 2.
The small yellow arrow icons identify operations that involve multiple workers. Each worker is assigned a separate part of the problem, and the partial results are then combined to give a final result. As the manual bean-counting example demonstrated, the parallel plan has the potential to complete much faster than the serial plan, because multiple workers will be actively counting rows, simultaneously.