SQL Server Troubleshooting Steps

As a Database Administrator you are given the responsibility for managing your database systems, 24 x 7 x 365. So when your cellphone goes off at four am and you realize you are getting unexpected SQL Server error logs messages, what should you do?

  1. DO NOT PANIC – if you have taken the time to properly design and maintain your database and have a thoroughly tested written incident recovery plan, you have nothing to worry about. Even if you don’t have a written disaster recovery plan, panic still will not help.
  2. Write down the error numbers and any other information to make sure you have collected the relevant information to begin troubleshooting. If possible, take screen images to make sure you have all the information possible. After this incident is over, you will want to document the original messages and how you solved the issue step-by-step. The postmortem process will also help you prepare new procedures that will prevent this type of problem from happening again.
  3. Read the Application Log within the Windows Event Viewer and SQL Server Agent log for additional clues. Also examine the SQL Server error log entries prior to the error that caused your cellphone to go off. There could be a chain of errors that caused the problem to occur.
  4. Stop the backup tapes from leaving the building. Often the backups from last night leave the next morning, and you want those tapes to stay until the issue is resolved. If you have to restore from the last backup, you don’t want an additional delay as the tapes are transported back to the office. Make sure you can connect to the server that notified you of the problem – you can’t do much if you cannot connect.
  5. What you do next depends on the type of error you received. If the error you got was a fatal error, check the database consistency immediately. Ensure that you can connect to the database, and that you can read and write records. If DBCC CHECKDB does not generate any errors, you might wish to run a backup immediately, just to be safe. Don’t mix this emergency backup with the normal backups.
  6. If connectivity is fine and the database is online, but you are getting the error repeatedly, examine the connection that is causing the error. Check the commands executed by the connection using DBCC INPUTBUFFER and DBCC OUTPUTBUFFER.
  7. Ensure that the user getting the errors has appropriate permissions to perform the activity she is undertaking.
  8. If the connection seems to use more resources than it should, attempt to determine which query is causing performance degradation. Attempt to rewrite the problematic query so that it is more efficient or the error is addressed.
  9. Ensure that you can reproduce the error or poor performance condition on a different server. If you cannot reproduce the error it might be specific to the environment where it is occurring. Check Windows and SQL Server configuration settings on the server generating the error. Compare current settings to the last stable configuration.
  10. Ensure that you have applied the latest service packs to Windows as well as SQL Server patches. Often times errors are corrected in the latest version.
  11. If you are receiving errors specific to advanced SQL Server options, such as replication, clustering, log shipping or support services such as SQL Mail or Full Text Search ensure that the appropriate services are running. In addition, ensure that the accounts you used for SQL Server, SQL Server Agent and any support services have sufficient Windows permissions. Many errors are due to insufficient permissions.
  12. If you cannot resolve an error on your own, don’t hesitate to get help. Your colleagues won’t think less of you if you ask for their opinion. A second pair of eyes can often find an obvious issue that you can’t see because you’re too tired.
  13. If you and your team cannot resolve the problem, consult one or more online resources. Attempt to find the same or a similar experience that has been documented.
  14. If you cannot find the resolution in any of the resources, call Microsoft’s technical support. If you are experiencing hardware issues, you might also wish to contact your hardware vendor.
  15. Once you find a solution be sure to document it. Note the exact error message, configuration settings, SQL statements or stored procedures that caused the issue. Document the resolution along with the explanation of why the problem occurred. Write down a list of things you can do proactively to avoid the same or similar issues in the future.
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s