If you are ever asked to figure out what queries are consuming all the memory from your SQL Server instance, you can use the data from the DMV to tell you which queries have been granted what amount of RAM. The query is basic:
SELECT r.session_id ,mg.granted_memory_kb ,mg.requested_memory_kb ,mg.ideal_memory_kb ,mg.request_time ,mg.grant_time ,mg.query_cost ,mg.dop ,( SELECT SUBSTRING(TEXT, statement_start_offset / 2 + 1, (CASE WHEN statement_end_offset = - 1 THEN LEN( CONVERT(NVARCHAR(MAX), TEXT) ) * 2 ELSE statement_end_offset END - statement_start_offset ) / 2) FROM sys.dm_exec_sql_text(r.sql_handle) ) AS query_text ,qp.query_plan FROM sys.dm_exec_query_memory_grants AS mg INNER JOIN sys.dm_exec_requests r ON mg.session_id = r.session_id CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp ORDER BY mg.required_memory_kb DESC;
The real task is to understand the subject of Memory Grants, and that is made much easier to understand by this article by Jay Choe.
Query memory grant (a.k.a. query work buffer) is a part of server memory used to store temporary row data while sorting and joining rows. It is called “grant” because the server requires those queries to “reserve” before actually using memory. This reservation improves query reliability under server load, because a query with reserved memory is less likely to hit out-of-memory while running, and the server prevents one query from dominating entire server memory. In the following sections, I will describe how SQL server estimates query memory requirement, and how it throttles memory grants when multiple queries are competing.
When SQL server receives a user query, it follows well-defined steps to produce a result for user. First, it creates a compiled plan, which is a set of logical instructions such as how to join rows. Next, it creates an execution plan based on the compiled plan. This execution plan contains instructions with all logical references translated to actual objects, and tracking mechanisms for query execution. Finally, the server starts execution from the top of instruction tree. Creating a compiled plan is expensive because the server needs to find the optimal plan out of hundreds of potential candidates. Distinguishing between compile and execution helps overall server performance because each compiled plan can be cached and shared among multiple execution plans. The memory grant estimate follows the same overall sequence. It has parameters saved in compiled plan, and a mechanism to calculate actual grant size at execution time.