When your users report a performance problem when using SQL Server, there are some basic troubleshooting techniques you can use to determine the cause of performance issues in a database instance. You can look at several causes to determine the source of the issue. Most performance issues are limited to some basic causes:
- Wrong Technology
- Missing Indexes
- Index Design
- Database Schema
- Storage Subsystem
- Buffer Pool Size
- Slow network
Some database performance problems can be traced back to simply using the wrong technology to get the job done. Usually this revolves around running large reports against production OLTP databases or databases that have an OLTP schema. Many of these reports end up doing large, complex aggregations. Although SQL Server can do the aggregations, it isn’t the best technology to use for them. For large, complex aggregations, SQL Server Analysis Services (SSAS) is the appropriate tool because SSAS pre-aggregates the data when the data is loaded. As a result, when a query asks for aggregated data, it has already been aggregated and the results can simply be returned instead of being processed. If you store the data in the same format it will be used on your reports, the reports will run much faster.
Disk performance problems and an increased number of deadlocks are some of the potential indicators that you might have indexes missing. One way to easily determine if there are missing indexes is to use two dynamic management views (DMVs): sys.dm_db_missing_index_details and sys.dm_db_missing_index_columns. Using these two DMVs you can easily identify many of the indexes that need to be created to improve performance.
However, the system doesn’t understand that indexes already in existence might only need another column added to their included columns list. Your SQL Server instance might recommend that entirely new index be created when all that’s needed is a new column in the existing index. Because of this, you shouldn’t just take the information from the DMVs and create all the specified indexes. The suggested indexes should be compared against the indexes that already exist in the database to ensure that duplicate indexes aren’t being created. Duplicate indexes can lead to additional writes, updates, and deletes, causing performance problems as well as using too much precious disk drive space.
A better approach is to use a query like that in Listing 1 to identify the indexes that are missing from the database on which the query is being run. This query uses the db_id() system function to restrict the output to that from the current database. So, to run it in your database, simply specify that database in the Available Databases drop-down box in SQL Server Management Studio (SSMS) or add a USE statement before the query.
Listing 1: Query to Identify Missing Indexes
SELECT TOP 25
dm_mid.database_id AS DatabaseID,
dm_migs.last_user_seek AS Last_User_Seek,
OBJECT_NAME(dm_mid.OBJECT_ID,dm_mid.database_id) AS [TableName],
'CREATE INDEX [IX_' + OBJECT_NAME(dm_mid.OBJECT_ID,dm_mid.database_id) + '_'
+ REPLACE(REPLACE(REPLACE(ISNULL(dm_mid.equality_columns,''),', ','_'),'[',''),']','') +
WHEN dm_mid.equality_columns IS NOT NULL AND dm_mid.inequality_columns IS NOT NULL THEN '_'
+ REPLACE(REPLACE(REPLACE(ISNULL(dm_mid.inequality_columns,''),', ','_'),'[',''),']','')
+ ' ON ' + dm_mid.statement
+ ' (' + ISNULL (dm_mid.equality_columns,'')
+ CASE WHEN dm_mid.equality_columns IS NOT NULL AND dm_mid.inequality_columns IS NOT NULL THEN ',' ELSE
+ ISNULL (dm_mid.inequality_columns, '')
+ ISNULL (' INCLUDE (' + dm_mid.included_columns + ')', '') AS Create_Statement
FROM sys.dm_db_missing_index_groups dm_mig
INNER JOIN sys.dm_db_missing_index_group_stats dm_migs
ON dm_migs.group_handle = dm_mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details dm_mid
ON dm_mig.index_handle = dm_mid.index_handle
WHERE dm_mid.database_ID = DB_ID()
ORDER BY Avg_Estimated_Impact DESC
When reviewing the output from the query in Listing 1, look for queries that have a high value in the second column. The higher the number, the more savings that will be seen by adding an index.
Poor index designs for database tables will usually manifest as slow running queries and queries that have a high execution plan cost. The easiest way to identity these problems is to query the procedure cache because queries’ execution plans remain in the cache as long as there is enough memory to store them. Listing 2 shows a sample query that looks for queries with a high total cost. When reviewing the output from Listing 2, look at the StatementSubTreeCost column. The higher the number, the more expensive the execution plan.
Listing 2: Query to Identify Poorly Designed Indexes
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
query_plan AS CompleteQueryPlan,
FROM sys.dm_exec_cached_plans AS dm_ecp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS dm_eqp
CROSS APPLY query_plan.nodes
ORDER BY n.value('(@StatementSubTreeCost)',
In SQL Server 2012, you can set up an Extended Events session to capture the execution plans for queries. However, it isn’t recommended that Extended Events be used to capture execution plans in high-load systems as doing so could overload the system by 21% or more by capturing the showplan_xml column.
One of the biggest killers of database performance is a poorly designed database schema. Evidence of a poorly designed database schema can be seen when lots of I/O requests occur while queries are being run. The easiest way to look for this evidence is to use SQL Server Profiler to capture the queries that have the highest I/O load, then run those queries in SSMS with SET STATISTICS IO enabled. Between the table I/O outputs and the execution plans, you’ll get a good indication of which table is having a performance problem.
The most obvious way to determine whether a SQL Server database instance is having storage performance problems is to look in the SQL Server ERRORLOG file. A message about slow I/O in the ERRORLOG file is a good indicator that there’s something wrong with the storage I/O subsystem.
However, not seeing any errors about slow I/O in the ERRORLOG file doesn’t rule out that your storage I/O subsystem isn’t the cause of a performance bottleneck. Not seeing these errors just means that your storage isn’t ridiculously overtasked and not keeping up.
Another way to look for I/O problems is to query the sys.dm_os_wait_stats DMV with code such as
ORDER BY wait_time_ms desc
If the first few rows returned from this query include some I/O wait types, then a slow I/O subsystem might be the problem.
You might have noticed that I said that these wait types might indicate a slow I/O problem and not that they do indicate a slow I/O problem. That’s because a poorly indexed database server can also report I/O problems when the I/O subsystem is perfectly sized for the application, but you’re asking the I/O subsystem to do a lot more work than it should be doing. In other words, the I/O problem is actually the result of an index problem. To resolve it, you need to add indexes to the system. Because of this, whenever I look at a new system in which the client wants to upgrade the hard drives, I always start with the indexes because that might very well fix the problem.
After the problem has been identified as actually being a storage I/O subsystem problem, there are a few directions that you can go to fix it. For large reporting databases, table partitioning might help. There are two different techniques that you can consider: partitioning for partition elimination and partitioning for CAPEX savings. No matter which approach is taken, a solid understanding of the data and how it’s used is required.
The buffer pool is where SQL Server stores data pages that it has read or written to recently. A small buffer pool will generally cause performance problems, because it will put too much pressure on the disk subsystem. There are a couple of different performance monitor counters that you can look at to see if the buffer pool size is healthy. Both counters are located within the SQLServer:Buffer Manager performance counter object.
The first counter to look at is Database Pages. This counter tells you how much memory is currently allocated to the buffer pool. This counter is in pages, so to get it into gigabytes, multiply the value by 8, then divide by 1,048,576.
The second counter to look at is Page Life Expectancy. This counter tells you how long, in seconds, SQL Server expects that you’ll be keeping the data in memory before releasing the page so that additional data can be loaded into memory. There is no magic number that says that the buffer pool is healthy.
Fixing a small buffer pool requires adding more memory to the server and configuring SQL Server to use the additional memory. The only other way to increase the amount of data that SQL Server can load from one database into the buffer pool is to move databases from one server to another so that the remaining databases have access to a higher percentage of the database engine’s buffer pool.
Another potential server bottleneck is the network. The easiest way to identify this problem is to look at the wait statistics on the system. A wait type of ASYNC_NETWORK_IO indicates that the network between SQL Server and the client computers is having a performance problem. If the client computers are located at a remote site, this could be the cause of the ASYNC_NETWORK_IO wait types.
If all the client computers are local, ASYNC_NETWORK_IO wait types might be the result of an older version of SQL Server or an older network switch being used. A slow or misconfigured network port could also be the cause of ASYNC_NETWORK_IO wait types. For example, having a network port configured for 10/half duplex could cause performance problems. In addition, there could be performance problems if a network port is configured to auto-detect the network speed, but the network switch and server don’t successfully negotiate the fastest possible network speed.
Fixing a slow network isn’t something that a DBA will be able to handle alone. It will require a coordinated effort between the network administrator, the network provider if there’s a WAN involved, and the DBA.