There are several ways to make your queries faster. Drawing on blog posts and articles from around the internet, as well as my own experience, I have put together some things to look at when trying to speed up your queries.
1. Don’t use UPDATE instead of CASE – Make sure the way you are collecting and updating data is the fastest way possible.
This issue is very common, and though it’s not hard to spot, many developers overlook it because using UPDATE has a natural flow that seems logical.
Take this scenario, for instance: You’re inserting data into a temp table and need it to display a certain value if another value exists. Maybe you’re pulling from the Customer table and you want anyone with more than $100,000 in orders to be labeled as “Preferred.” Thus, you insert the data into the table and run an UPDATE statement to set the CustomerRank column to “Preferred” for anyone who has more than $100,000 in orders. The problem is that the UPDATE statement is logged, which means it has to write twice for every single write to the table. The way around this, of course, is to use an inline CASE statement in the SQL query itself. This tests every row for the order-amount condition and sets the “Preferred” label before it’s written to the table. The performance increase can be staggering.
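As a rough sketch of the two approaches (the table and column names here are hypothetical, not from a real schema):

```sql
-- Slower: insert first, then a second, fully logged UPDATE pass
SELECT CustomerID, TotalOrders, CAST(NULL AS VARCHAR(10)) AS CustomerRank
INTO   #CustomersSlow
FROM   Customer;

UPDATE #CustomersSlow
SET    CustomerRank = 'Preferred'
WHERE  TotalOrders > 100000;

-- Faster: decide the value inline with CASE as each row is written
SELECT CustomerID,
       TotalOrders,
       CASE WHEN TotalOrders > 100000
            THEN 'Preferred'
       END AS CustomerRank
INTO   #CustomersFast
FROM   Customer;
```

Both temp tables end up with the same data; the second version simply avoids writing every qualifying row twice.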
2. Don’t blindly reuse code – Just because it has been used in the past, or for someone else, doesn’t make it the best solution for your use today.
This issue is also very common. It’s very easy to copy someone else’s code because you know it pulls the data you need. The problem is that quite often it pulls much more data than you need, and developers rarely bother trimming it down, so they end up with a huge superset of data. This usually comes in the form of an extra outer join or an extra condition in the WHERE clause. You can get huge performance gains if you trim reused code to your exact needs.
3. Do pull only the columns you need – Make sure you query only the exact columns and rows you need.
This issue is similar to issue No. 2, but it’s specific to columns. It’s all too easy to code all your queries with SELECT * instead of listing the columns individually. The problem again is that it pulls more data than you need. I’ve seen this error dozens and dozens of times. A developer does a SELECT * query against a table with 120 columns and millions of rows, but winds up using only three to five of them. At that point, you’re processing so much more data than you need it’s a wonder the query returns at all. You’re not only processing more data than you need, but you’re also taking resources away from other processes.
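The fix is trivial once you know which columns the caller actually consumes (column names here are illustrative):

```sql
-- Pulls all 120 columns across the wire, even though only three are used
SELECT *
FROM   Customer;

-- Pulls exactly what the caller needs
SELECT CustomerID, CustomerName, State
FROM   Customer;
```

As a bonus, an explicit column list lets a narrow covering index satisfy the query, which SELECT * almost never can.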
4. Don’t double-dip – One simple query is almost always better than several queries to get the same answer.
Here’s another one I’ve seen more times than I should have: A stored procedure is written to pull data from a table with hundreds of millions of rows. The developer needs customers who live in California and have incomes of more than $40,000. So he queries for customers that live in California and puts the results into a temp table; then he queries for customers with incomes above $40,000 and puts those results into another temp table. Finally, he joins both tables to get the final product.
Are you kidding me? This should be done in a single query; instead, you’re double-dipping a superlarge table. Don’t be a moron: Query large tables only once whenever possible — you’ll find out how much better your procedures perform.
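The double-dip version and the single-query version look something like this (schema is hypothetical):

```sql
-- Double-dipping: two full passes over a huge table, plus a join
SELECT CustomerID, CustomerName
INTO   #InCalifornia
FROM   Customer
WHERE  State = 'CA';

SELECT CustomerID
INTO   #HighIncome
FROM   Customer
WHERE  Income > 40000;

SELECT c.CustomerID, c.CustomerName
FROM   #InCalifornia c
JOIN   #HighIncome h ON h.CustomerID = c.CustomerID;

-- One pass: both conditions in a single WHERE clause
SELECT CustomerID, CustomerName
FROM   Customer
WHERE  State = 'CA'
  AND  Income > 40000;
```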
A slightly different scenario is when a subset of a large table is needed by several steps in a process, which causes the large table to be queried each time. Avoid this by querying for the subset and persisting it elsewhere, then pointing the subsequent steps to your smaller data set.
5. Do know when to use temp tables – Know when to use temp tables and test alternatives
This issue is a bit harder to get a handle on, but it can yield impressive gains. You can use temp tables in a number of situations, such as keeping you from double-dipping into large tables. You can also use them to greatly decrease the processing power required to join large tables. If you must join a table to a large table and there’s a condition on that large table, you can improve performance by pulling out the subset of data you need from the large table into a temp table and joining with that instead. This is also helpful (again) if you have several queries in the procedure that have to make similar joins to the same table.
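A sketch of the pattern, assuming a hypothetical Orders table with a date condition on it:

```sql
-- Pull the qualifying subset out of the large table once
SELECT OrderID, CustomerID, OrderTotal
INTO   #RecentOrders
FROM   Orders
WHERE  OrderDate >= DATEADD(DAY, -30, GETDATE());

-- Subsequent joins hit the small temp table, not the huge base table
SELECT c.CustomerName, r.OrderTotal
FROM   Customer c
JOIN   #RecentOrders r ON r.CustomerID = c.CustomerID;
```

If several queries in the procedure need the same subset, each of them joins to #RecentOrders, and the big table is touched only once.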
6. Do pre-stage data – Load data you need quickly into smaller tables, like reporting data, for faster results.
This is one of my favorite topics because it’s an old technique that’s often overlooked. If you have a report or a procedure (or better yet, a set of them) that will do similar joins to large tables, it can be a benefit for you to pre-stage the data by joining the tables ahead of time and persisting them into a table. Now the reports can run against that pre-staged table and avoid the large join.
You’re not always able to use this technique, but when you can, you’ll find it is an excellent way to save server resources.
Note that many developers get around this join problem by concentrating on the query itself and creating a view around the join so that they don’t have to type the join conditions again and again. But the problem with this approach is that the query still runs for every report that needs it. By pre-staging the data, you run the join just once (say, 10 minutes before the reports) and everyone else avoids the big join. I can’t tell you how much I love this technique; in most environments, there are popular tables that get joined all the time, so there’s no reason why they can’t be pre-staged.
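Pre-staging can be as simple as a scheduled job that rebuilds a summary table before the reports fire (table and column names are made up for illustration):

```sql
-- Scheduled job, run shortly before the reporting window
IF OBJECT_ID('dbo.OrderSummary', 'U') IS NOT NULL
    DROP TABLE dbo.OrderSummary;

SELECT c.CustomerID,
       c.CustomerName,
       SUM(o.OrderTotal) AS TotalSpend
INTO   dbo.OrderSummary
FROM   Customer c
JOIN   Orders o ON o.CustomerID = c.CustomerID
GROUP  BY c.CustomerID, c.CustomerName;

-- Every report then reads dbo.OrderSummary and skips the big join
```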
7. Do delete and update in batches – Greatly improve performance and response times by running in batches
Here’s another easy technique that gets overlooked a lot. Deleting or updating large amounts of data from huge tables can be a nightmare if you don’t do it right. The problem is that both of these statements run as a single transaction, and if you need to kill them or if something happens to the system while they’re working, the system has to roll back the entire transaction. This can take a very long time. These operations can also block other transactions for their duration, essentially bottlenecking the system.
The solution is to do deletes or updates in smaller batches. This solves your problem in a couple ways. First, if the transaction gets killed for whatever reason, it only has a small number of rows to roll back, so the database returns online much quicker. Second, while the smaller batches are committing to disk, others can sneak in and do some work, so concurrency is greatly enhanced.
Along these lines, many developers have it stuck in their heads that these delete and update operations must be completed the same day. That’s not always true, especially if you’re archiving. You can stretch that operation out as long as you need to, and the smaller batches help accomplish that. If you can take longer to do these intensive operations, spend the extra time and don’t bring your system down.
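A minimal batching loop in T-SQL looks like this (the table, column, and retention period are hypothetical):

```sql
-- Delete old rows 10,000 at a time instead of in one giant transaction
WHILE 1 = 1
BEGIN
    DELETE TOP (10000)
    FROM   dbo.AuditLog
    WHERE  LogDate < DATEADD(YEAR, -1, GETDATE());

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to delete
END;
```

Each iteration commits on its own, so a kill only rolls back the current batch, and other transactions can get in between batches.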
8. Do size your data types correctly – Oversized columns waste storage, I/O, and memory.
Database design is sometimes a neglected aspect of building an application. The rush to build a database, store the data, and get coding often results in tables containing data types which are incorrectly sized.
So let’s look at a simple example of what I mean using MS SQL Server.
The system we are designing for allows advertisements to be placed in one county in England per advert. Whilst this does not sound like a terribly flexible system, it serves the purpose of demonstrating the issue with badly sized data types.
So here is the create table script. (I have omitted the Customer create table script which would theoretically exist here to ensure that the relationship on Advert.CustomerID could be created).
CREATE TABLE County (
    CountyID INT PRIMARY KEY IDENTITY(1,1),
    County   VARCHAR(20)
);

CREATE TABLE Advert (
    AdvertID   INT PRIMARY KEY IDENTITY(1,1),
    CustomerID INT NOT NULL REFERENCES Customer(CustomerID),
    DatePlaced DATETIME DEFAULT GETDATE() NOT NULL,
    CountyID   INT NOT NULL REFERENCES County(CountyID),
    AdvertText VARCHAR(MAX) NOT NULL
);
So we have two tables. Where in this design are the issues?
How many counties are there in England? Certainly not 2,147,483,647, which is the upper limit of the INT data type (it can also store negatives down to -2,147,483,648).
So what, they’re just numbers, right? What’s the problem? The problem is the storage needed. The INT data type requires 4 bytes of storage. The number of English counties is well under 255, and I can’t see it growing past that any time soon, so it is better to design the County table with CountyID as a TINYINT and make the related column in Advert a TINYINT as well. This data type uses 1 byte of storage. That’s a 75% reduction, and on an extremely large Advert table it all starts to add up.
The other column in question is DatePlaced. Always question whether you need the increased size of the types you are choosing. In my example, I don’t need the extra precision the DATETIME data type gives me, so I should have designed that column as a SMALLDATETIME (4 bytes instead of 8).
But storage is cheap, right? Yes, but that is not the point. It’s all about I/O and memory pressure. The smaller the data types are, the fewer pages have to be read from disk into the memory buffers. If the memory buffers are not constantly being emptied to make way for other data, your database server and application will be a lot happier, as there will be fewer read requests to disk.
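Putting both fixes together, the right-sized version of the earlier script would look like this (SQL Server allows IDENTITY on a TINYINT column):

```sql
CREATE TABLE County (
    CountyID TINYINT PRIMARY KEY IDENTITY(1,1),  -- 1 byte instead of 4
    County   VARCHAR(20)
);

CREATE TABLE Advert (
    AdvertID   INT PRIMARY KEY IDENTITY(1,1),
    CustomerID INT NOT NULL REFERENCES Customer(CustomerID),
    DatePlaced SMALLDATETIME DEFAULT GETDATE() NOT NULL,  -- 4 bytes instead of 8
    CountyID   TINYINT NOT NULL REFERENCES County(CountyID),
    AdvertText VARCHAR(MAX) NOT NULL
);
```

Note that the foreign-key column must use the same type as the key it references, which is why CountyID shrinks in both tables.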
9. Do add and remove indexes – Add indexes that speed up slow queries, and remove unused indexes that slow down every write.
Identify queries which are performing badly. Can you add indexes which would make them faster? SQL Server Management Studio will even recommend which indexes to add when you view the execution plan. Identify where you need to add clustered indexes and put them in.
Identify unused indexes and remove them. For each record inserted to a table, every index is updated. If you do not need an index, get rid of it. It will help speed up inserts.
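One way to spot candidates for removal is SQL Server’s index-usage DMV. This sketch lists nonclustered indexes in the current database that have never been read since the last restart (usage stats reset when the instance restarts, so treat the results with care):

```sql
-- Nonclustered indexes with no seeks, scans, or lookups recorded
SELECT OBJECT_NAME(i.object_id) AS TableName,
       i.name                   AS IndexName,
       s.user_updates           AS WritesPaidFor
FROM   sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s
       ON  s.object_id   = i.object_id
       AND s.index_id    = i.index_id
       AND s.database_id = DB_ID()
WHERE  i.type_desc = 'NONCLUSTERED'
  AND  (s.index_id IS NULL
        OR s.user_seeks + s.user_scans + s.user_lookups = 0);
```

An index with zero reads and a high user_updates count is paying write cost for no benefit.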
Try to build SQL queries which deal with sets of data. Leave any row-by-row processing to the application layer.
Databases use statistics to make basic query-optimization decisions and to develop effective query plans. You need to make sure the statistics are updated on a regular basis.
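In SQL Server, refreshing statistics can be done per table or database-wide (the table name below is an example):

```sql
-- Rebuild statistics for one table by scanning every row
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Or refresh out-of-date statistics for all tables in the database
EXEC sp_updatestats;
```

A scheduled job running one of these keeps the optimizer’s row estimates honest on tables that change heavily.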
Make sure you analyze your disk-drive performance and use simple optimization techniques, such as spreading your data files over multiple physical hard drives, to improve performance. You should also verify your server is using the best RAID configuration possible for your environment.