It doesn’t matter if you are managing one database server or 100 database servers, as a database administrator (DBA) you have the responsibility of keeping those SQL Server databases fully operational. In some cases, the assigned DBA (maybe just an accidental DBA) ends up in pure problem-fighting mode, moving from one crisis to the next. This sort of environment is difficult, toxic, and unsustainable for any long-term maintenance. No one likes being under this kind of stress and facing constant interruption.
To do the responsibilities of a DBA better, and to reduce the stress of database maintenance, there are some things you could be doing that will make your life easier.
1. System Inventory
Once you perform a full system inventory, including a list of all database servers, what databases are on each server, and who uses each database, you will be in a better position to maintain those databases. You don’t want to be told there is a problem with a specific database and find you didn’t even know it existed. Once you have an inventory, you can track database backups, system patches, etc.
It’s critical to have an up-to-date inventory of what instances and databases you have at your enterprise and under your control. This is the only way you can properly manage them, consolidate where necessary, and correctly scope and plan projects and upgrades. It also helps you establish limits to your responsibilities by publishing a list of known instances for which you accept responsibility, with agreement from the various teams in your organization. You can define support policies for known instances and insist that new instances adhere to your configuration guidelines before you support them.
This can be as easy as a simple spreadsheet that you update every 90 days or so, where each row is a database; with columns for database name, server name, database owner, database size, backup schedule, file locations, etc.
2. Standard Configuration
Once you have a list of databases from a system inventory, you will start seeing differences between those different instances. The solution is to standardize your configuration to the greatest extent possible in terms of drive letters, server configuration options, database settings, database maintenance, security settings, etc. While you might not have a standard configuration that will work for all instances, you should understand what is the best configuration and why one or more servers needs to deviate from that standard. Include those unusual requirements on you System Inventory.
3. Understand The Drives
You need to understand the hardware that the servers are running on to understand some of the performance characteristics. There are several factors related to the drive subsystem that can affect your SQL Server instances. You need to be aware of these I/O issues and their potential impact:
- The overall capacity of the I/O subsystem in terms of read/write throughput and disk space. The disks must be able to cope with workload peak demands and still provide space for the data volume to grow before you have to acquire more capacity. By identifying I/O bottlenecks and moving data and/or log files to other portions of the I/O subsystem, you can more evenly balance the load and maybe improve performance.
- The redundancy capabilities of the I/O subsystem in terms of RAID level and whether it can do things like split-mirror backups and any form of mirroring/replication (at the I/O subsystem level, not the SQL Server level). It’s important to protect your data and log files from drive failures and other potential problems. This is often a compromise, as RAID-10 offers better redundancy than RAID-5 but RAID-10 is much more expensive. Read this article for more information on using RAID-5 or RAID-10.
- The I/O subsystem is configured correctly in terms of RAID stripe size, NTFS allocation unit/cluster size and partition alignment.
4. Maintenance Plan
You can’t just let the database run and wait until someone calls with a performance issue. Statistics become outdated, which leads to bad query plans and poor performance. Hard drives can get corrupted, drives run out of space, and databases need backups.
You can tackle all these problems by having a comprehensive maintenance plan tailored for your databases. The best starting point for building your own maintenance plan is the comprehensive (and free) script from Ola Hallengren. Investigate your options and get ahead of any issues before users start calling.
5. System Security
The first thing you need to do to address security is to patch your systems to make sure they are updated with the latest security updates. Then you need to review who has access to your databases. Check the server access, network security, server security, and Active Directory access. Once you have that resolved, start reducing who has access to your databases, one database at a time. Once you have security resolved, develop a formal procedure to manage any new security changes or user requests to make sure toy have a really good understanding of who has access to your databases and why they need that access.
6. Understand Your Developers
If you have an active development team in your organization, then you will eventually run into issues between the developers and the database team. Developers want resources to change existing databases or add new databases. This is working against you desire to keep the databases unchanged and running at 100% for the users. You will need to understand what they are doing, why they are doing it, and irk with the developers to make sure you are controlling them as much as possible to limit the potential negative impact on your production environment.
You can make your job a lot smoother by engaging proactively with the development team. Help design and implement a separate development and testing environment. Conduct design reviews with someone from the DBA team present, and test code adequately before being put into production—hopefully avoiding damaging errors that can further erode team relationships.
7. Disaster Recovery Planning
No matter how bullet-proof your infrastructure may be, you must have a contingency plan for when disaster strikes. You can’t predict corruption, power outages, fires, tornadoes, accidental data loss, or a long list of other potential problems. You need a written plan to address issues and recover from these problems that matches the companies expectations.
Work with management to define what is the acceptable amount of data that can be lost and an acceptable timeline for system recovery. Create a plan on how you would recover data from various kinds of data loss, determine how all your databases figure into your company’s business-continuity plan. Work out the relative importance of all the databases and instances so you can prioritize disaster recovery.
You’ll also need to implement technologies to help you know when problems occur, such as page checksums, consistency checks, SQL Agent alerts, Windows alerts. This disaster recovery infrastructure will help you protect data with backups, log shipping, replication and database mirroring; and potentially failover to a redundant system with database mirroring or failover clustering.
8. Database Backups
No matter how good your high availability and disaster recovery planning is, you cannot avoid taking regular backups of your databases. If your database is destroyed or fatally corrupted, your only recourse may be to restore from your last set of backups. If you don’t have any backups your company could suffer major consequences. Not only do you need to take backups, you also need to regularly practice restoring from them so you know they’ll work when needed.
You should make sure each database instance is backed up on a regular schedule, and that you have practiced restoring the database at least once, using the actual backups created. If you are doing a full database backup each day, then restore that backup o a test server to verify you understand the steps and what order they will be performed. It will also help you understand the drive space required, how long it will take, what user permissions and application logins will have to change, and what other issues you may see if you actually have to restore the database.
Backup are worthless if you can’t restore the databases. You should have a formal, written plan for restoring databases.
9. Monitor Performance
Performance tuning takes up the majority of a DBA’s time, but there are plenty of ways to streamline the process:
- Create a performance baseline. Measure performance changes against the established baseline, not user perceptions or a gut feeling that performance has changed.
- Be able to recreate specific performance characteristics. Being able to put you servers under a heavy load will be helpful in troubleshooting problems.
- Use the waits-and-queues methodology to quickly pinpoint performance problems.
- Use the performance Data Collector feature in SQL Server.
- Establish a Maintenance Plan to keep your databases at peak performance.
- Carefully plan and execute your indexing strategy with tools such as the Database Engine Tuning Advisor, missing index Dynamic Management Views (DMVs) and index usage DMV.
10. Continuous Learning
Production support task can get very complicated and you might get overwhelmed. It is vital that you know when to call it quits and ask for help. You have to know your limitations and accept that you can’t know everything about SQL Server. You should have a couple of good resources for reliable information online, like books online from Microsoft, popular community sites like SQL Server Central, and this blog.
Another fast way to find help is to tap into the local SQL Server user group, and social media SQL Server community groups on Twitter, Facebook, LinkedIn, etc.
You should also attend a SQL Server-specific conference such as the annual PASS Community Summit, the bi-annual SQL Server Connections or more frequent SQL Saturdays. Don’t forget there are some great books published on various SQL Server subjects.