Optimizing SQL Server database queries


Posted on 2016-08-23 02:52:00


Application performance depends on many factors, and one of the most important is the time SQL Server takes to process T-SQL statements. Sometimes database design and overly complex queries get in the way of T-SQL performance.

The way each T-SQL statement is written can also force SQL Server to work harder to process the query. This article walks through good habits you should develop when writing T-SQL code. With them, you can write optimized queries, make better use of SQL Server resources, and improve performance.

Specify column names in the SELECT statement

SELECT * FROM MyTable;

How many times have you written a statement like the one above?

Using an asterisk (*) tells the database to return every column from the table (or tables) declared in the FROM clause. This is not a good habit, even when your application really does need every column. It is better to list each column by name, like this:

SELECT ID, Description, DateModified FROM MyTable;

Explicitly naming the columns in the SELECT statement brings several benefits. First, SQL Server returns only the data the application needs, rather than a pile of data much of which the application never uses. By requesting only the required columns, you reduce the work SQL Server has to do to gather the information you asked for. Also, by not using the asterisk (*), you minimize the network traffic (number of bytes) needed to send the SELECT statement's results to the application.

If you use an asterisk (*) and someone adds a new column to the table, your application starts receiving data for that column without any change to the application code. If the application expects a fixed number of columns to be returned, it will fail as soon as someone adds a new column to the referenced table. By explicitly naming each column in the SELECT statement, the application always receives the same number of columns, even when new columns are added to the referenced table. This shields the application from the risks associated with schema changes to any table referenced in the SELECT statement.

Specify column names in the INSERT statement

As above, you should specify the name of each column you want to insert data into in the INSERT statement. Do not write the INSERT statement like this:

INSERT INTO MyTable VALUES ('A', 'B', 'C');

When you write it this way, SQL Server requires exactly three columns to be defined in the MyTable table: the value 'A' is inserted into the first column, 'B' into the second, and 'C' into the last. If someone adds a new column to MyTable, your application will fail with:

Msg 213, Level 16, State 1, Line 1 
Column name or number of supplied values does not match table definition.

So instead of writing INSERT statements that way, write them like this:

INSERT INTO MyTable (SO1, SO2, SO3) VALUES ('A', 'B', 'C');

Written this way, when someone adds a new column named "SO4" to the MyTable table, the INSERT statement keeps working, provided the "SO4" column is created with a DEFAULT value or allows NULL.
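
For example, a minimal sketch of adding such a column (the column name SO4 comes from the example above; the varchar(10) type and the 'D' default value are illustrative assumptions):

-- SO4 allows NULL, so the existing three-column INSERT keeps working
ALTER TABLE MyTable ADD SO4 varchar(10) NULL;
-- or give it a default value instead (type and default assumed for illustration):
-- ALTER TABLE MyTable ADD SO4 varchar(10) NOT NULL DEFAULT 'D';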

Add a prefix before wildcards to speed up searches

Using wildcards appropriately can improve query performance. Suppose, for example, you want to find every LastName in the AdventureWorks Person.Contact table that ends in "sen", and assume an index has been built on the LastName column. You might write the search query like this:

SELECT DISTINCT LastName 
FROM Person.Contact 
WHERE LastName LIKE '%sen'

This statement uses the percent sign (%) as a substitute for zero or more characters preceding the string "sen" in the LastName column. It forces SQL Server to scan the index looking for every name ending in "sen" in order to resolve the query. That matters because until the entire index has been scanned, SQL Server cannot guarantee it has found every record whose LastName ends in "sen".

Similarly, if you are looking for records whose LastName is exactly six characters long and ends with "sen", you could write this search query:

SELECT DISTINCT LastName 
FROM Person.Contact 
WHERE LastName LIKE '___sen'

Here the statement uses the underscore character (_) to stand in for a single character. This example behaves like the previous one and uses an index scan operation to resolve the query. Again, SQL Server knows it must scan the entire index before it can guarantee it has found every six-character name ending in "sen" in the Person.Contact table.

SQL Server can return results more quickly when it does not have to read the entire index with a scan. SQL Server is smart enough to recognize that when you put a prefix in front of the wildcard characters (%, _, etc.), it can use an index seek operation to resolve the search criteria. Here is an example of a search query that returns every record whose LastName starts with the letter "A" and ends with "sen":

SELECT DISTINCT LastName 
FROM Person.Contact 
WHERE LastName LIKE 'A%sen'

By placing the letter "A" in front of the percent sign (%) in the search criteria, you let SQL Server know it can use an index seek operation to resolve the query. Once SQL Server has read the last record whose LastName starts with "A", it knows there are no more LastName values beginning with "A" and it can stop.

A literal wildcard prefix is not the only thing that lets SQL Server use an index seek operation to resolve a query. Here is an example in which the statement uses an expression that represents a set of characters and still allows SQL Server to resolve the query with an index seek:

SELECT DISTINCT LastName 
FROM Person.Contact 
WHERE LastName LIKE '[A-M]%sen'

This T-SQL statement searches for every LastName that starts with any single character in the range "A" through "M" and ends with "sen". Wildcard syntax that specifies a set of leading characters like this can also be resolved with an index seek operation.
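
If you want to check which operator SQL Server picks for these queries, one simple way (a sketch; you could also just look at the graphical execution plan in Management Studio) is to ask for the estimated plan text with SET SHOWPLAN_TEXT:

SET SHOWPLAN_TEXT ON;   -- the statements below are not executed; only their plans are returned
GO
SELECT DISTINCT LastName FROM Person.Contact WHERE LastName LIKE '%sen';   -- expect an index scan
GO
SELECT DISTINCT LastName FROM Person.Contact WHERE LastName LIKE 'A%sen';  -- expect an index seek
GO
SET SHOWPLAN_TEXT OFF;
GO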

Use DISTINCT only when needed

Adding the DISTINCT keyword to a SELECT statement removes duplicate rows from the result set. To do that, SQL Server has to perform an extra SORT operation so it can identify and remove the duplicates. So if you know in advance that the results will contain no duplicates, do not use the DISTINCT keyword in your T-SQL. When you include DISTINCT in a query, you are asking SQL Server to sort the result set to remove duplicates; that is extra work for SQL Server and is pointless if your result set already contains only unique records.
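
For example (assuming the AdventureWorks sample database used elsewhere in this article), ProductID is the primary key of Production.Product, so its rows are already unique and DISTINCT only adds work:

-- Redundant DISTINCT: ProductID is the primary key, so no duplicates are possible
SELECT DISTINCT ProductID, Name FROM AdventureWorks.Production.Product;
-- Same result set, without the extra sort/aggregate work
SELECT ProductID, Name FROM AdventureWorks.Production.Product;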

Only use UNION when needed

As with DISTINCT, the UNION operator requires an extra SORT operation so SQL Server can remove duplicate rows. If you know in advance that the result sets being combined contain no overlapping rows, that sort is unnecessary work for SQL Server. So when you need to combine two record sets that you know contain only unique, non-overlapping records, it is better to use UNION ALL. The UNION ALL operator does not remove duplicates, which spares SQL Server the sorting work. Less work for SQL Server means the operation completes faster.
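
A minimal sketch against the AdventureWorks sample database: the two ranges below cannot overlap, so UNION ALL returns the same rows as UNION would, without the duplicate-removal sort:

SELECT SalesOrderID FROM AdventureWorks.Sales.SalesOrderHeader WHERE SalesOrderID <  50000
UNION ALL
SELECT SalesOrderID FROM AdventureWorks.Sales.SalesOrderHeader WHERE SalesOrderID >= 50000;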

Apply good habits for faster code

There are plenty of reasons to train yourself to write optimized code. When you master the small tricks above and turn them into habits when writing T-SQL, you protect yourself against the risks that come with database changes, and you improve server performance while reducing the network traffic flowing between server and application. These simple tips also help you make better use of server resources when processing your statements.

Jack-of-all-trades stored procedures

Before getting into how to optimize a jack-of-all-trades stored procedure (SP), we need a working definition of this type of procedure. A jack-of-all-trades stored procedure is one that accepts many different parameters related to the procedure. Based on the parameters passed in, the procedure determines which records to return. Here is an example of a jack-of-all-trades stored procedure:
CREATE PROCEDURE JackOfAllTrades
(
  @SalesOrderID int = NULL
, @SalesOrderDetailID int = NULL
, @CarrierTrackingNumber nvarchar(25) = NULL
)
AS
SELECT * FROM AdventureWorks.Sales.SalesOrderDetail
WHERE
(SalesOrderID = @SalesOrderID OR @SalesOrderID IS NULL)
AND (SalesOrderDetailID = @SalesOrderDetailID OR @SalesOrderDetailID IS NULL)
AND (CarrierTrackingNumber = @CarrierTrackingNumber OR @CarrierTrackingNumber IS NULL)
GO
The JackOfAllTrades SP accepts three different parameters, all of which default to NULL. When a value is passed for a parameter, it is used in the WHERE clause to constrain the records returned. Each parameter in the SP is used to build up a complex WHERE clause that contains the following logic for each parameter passed:
(<TableColumn> = @PARM or @PARM IS NULL)
This logic says that if @PARM is passed a non-NULL value, the returned records are constrained to those where <TableColumn> equals @PARM. The second part of the condition is "@PARM IS NULL": it means that if no value is passed for @PARM (it stays NULL), the data is not constrained by this parameter at all.
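For instance, here is a sketch of what the WHERE clause effectively evaluates to when only @SalesOrderID is supplied and the other two parameters keep their NULL defaults:

-- (SalesOrderID = @SalesOrderID OR @SalesOrderID IS NULL)       -- filters on the supplied value
-- AND (SalesOrderDetailID = NULL OR NULL IS NULL)               -- always true: no filtering
-- AND (CarrierTrackingNumber = NULL OR NULL IS NULL)            -- always true: no filtering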
Let's look at how a typical execution of the JackOfAllTrades SP behaves. Suppose we execute the SP with the following command:
EXEC JackOfAllTrades @SalesOrderID = 43659
When this command runs, the execution plan (a screenshot in the original post) shows a Clustered Index Scan.
Here you can see that even though only a single parameter was passed in, SQL Server decided to use an index scan operation. The SP's SELECT statement constrains only the @SalesOrderID column, which is part of the clustered index key. You might think SQL Server would be smart enough to realize it could handle this jack-of-all-trades procedure with an index seek, which would be faster than scanning the clustered index. But as the execution plan shows, SQL Server is not that smart. Why?
When SQL Server sees the condition "@PARM IS NULL", it treats it like a constant. As a result, SQL Server considers no index useful for processing the condition "(<TableColumn> = @PARM OR @PARM IS NULL)", because of that constant in the WHERE clause. SQL Server therefore decides to use an index scan to resolve the query. The more parameters a jack-of-all-trades stored procedure accepts, the more its performance suffers, driven by the number of scan operations needed to handle each parameter passed in.
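
One common way around this problem (a sketch only; the procedure name below is hypothetical and not from the article) is to split the catch-all procedure into narrower procedures whose WHERE clauses reference only the parameters they actually need, so SQL Server can seek on the clustered index key:

CREATE PROCEDURE GetSalesOrderDetailByOrderID   -- hypothetical single-purpose procedure
(
  @SalesOrderID int
)
AS
SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber
FROM AdventureWorks.Sales.SalesOrderDetail
WHERE SalesOrderID = @SalesOrderID;   -- simple predicate: an index seek is now possible
GO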

Exploring cached execution plans

Are you taking advantage of the plans in the procedure cache? Are you fully exploiting the plan cache? Does your application use a cached plan once, or many times? Do you have many cached plans for the same query sitting in the procedure cache at the same time? How much space do those cached plans take up? These are some of the questions you need to answer to make sure you have optimized the procedure cache and minimized the number of cached plans your application creates. There are a few small issues in how T-SQL statements are written that can cause SQL Server to do extra work compiling and caching multiple execution plans for what is essentially the same code.

Before SQL Server can process a T-SQL statement, it must create an execution plan. To create that plan, SQL Server has to spend valuable resources, such as CPU, compiling the T-SQL code. Once the plan is created, it is cached so it can be reused whenever the application issues the same T-SQL statement again. You can improve SQL Server performance by writing your T-SQL statements so that cached plans are reused for the T-SQL you execute regularly. 
With SQL Server 2005, Microsoft introduced DMVs (dynamic management views) that let you explore the plan cache. Using these DMVs you can learn a great deal about cached plans. Here is a short list of the things you can find out:

  • The text associated with a cached plan
  • The number of times a cached plan has been executed
  • The size of a cached plan

In the paragraphs that follow, I will show you how to use these DMVs to look at cached plan information.

Comments or extra spaces create multiple plans

I'm sure we all support the idea of putting code into stored procedures (SPs). We do this to increase code reuse within a single application or across multiple applications. However, not all the code SQL Server executes lives in SPs. Some applications send T-SQL as in-line (ad hoc) statements. If you write in-line T-SQL, you need to be careful with comments and spaces, because they can cause SQL Server to create multiple cached plans for the same T-SQL code.

Here is an example of two T-SQL statements that differ slightly:

SELECT * FROM AdventureWorks.Production.Product 
GO 
SELECT * FROM AdventureWorks.Production.Product -- return records 
GO

As you can see, the two T-SQL statements are essentially the same: both return all records from the AdventureWorks.Production.Product table. So how many cached plans do you think SQL Server creates when this code runs? To answer that, I will look at the cached plan information using the DMVs available in SQL Server 2005 and SQL Server 2008. To view the plans created by the two T-SQL statements above, I run the following code:

DBCC FREEPROCCACHE 
GO 
SELECT * FROM AdventureWorks.Production.Product 
GO 
SELECT * FROM AdventureWorks.Production.Product -- return records 
GO 
SELECT stats.execution_count AS exec_count, 
       p.size_in_bytes AS [size], 
       [sql].[text] AS [plan_text] 
FROM sys.dm_exec_cached_plans p 
OUTER APPLY sys.dm_exec_sql_text(p.plan_handle) AS [sql] 
JOIN sys.dm_exec_query_stats stats ON stats.plan_handle = p.plan_handle 
GO

In the code above, I first flush the procedure cache by running DBCC FREEPROCCACHE. This command removes every execution plan from memory. Note that you should not run this command on a production system, because it removes all cached plans, and that can hurt performance significantly while commonly used plans are recompiled. After flushing the procedure cache, I run the two slightly different SELECT statements. Finally, I join information from the DMVs to return the cached plan details for the two SELECT statements. Here are the results of running the code above:

exec_count size  plan_text 
---------- ----- ------------------------------------------------------------------ 
1          40960 SELECT * FROM AdventureWorks.Production.Product -- return records 
1          40960 SELECT * FROM AdventureWorks.Production.Product

As you can see, the two SELECT statements produced two different cached plans, each of which was executed once (the exec_count column). This happened because the two SELECT statements are not exactly identical: the second SELECT statement differs slightly because of the added comment. Also note the plan size: 40960 bytes is quite a lot of memory for such a simple T-SQL statement. So be careful when adding comments to in-line code, to avoid making the server create redundant plans.

Another thing that causes multiple cached plans to be created for similar T-SQL statements is white space. Here are two statements that are identical except for spaces:

SELECT * FROM AdventureWorks.Production.Product 
GO 
SELECT * FROM    AdventureWorks.Production.Product 
GO

As you can see, the second statement contains a few extra spaces between FROM and the object name. That extra white space is enough to make SQL Server treat these as two different statements, leading it to create two different cached plans. In this case you can easily spot the difference, because the white space sits in the middle of the statement. But if you accidentally add a space before the SELECT keyword or at the end of the statement, you will not notice it and the statements will look identical; SQL Server, however, can see the difference, and it creates an extra cached plan because of that extra white space.

When SQL Server receives code to run, it compares it against the plans already in the procedure cache. If it finds a cached plan for identical code, SQL Server does not need to compile the code and store another plan in memory; it reuses the cached plan. To optimize your code, you should make sure cached plans are reused whenever possible.

When you build applications whose code issues T-SQL statements directly rather than through SPs, be careful to give the plans the best possible chance of being reused. We often copy and paste when we want to use the same code in different parts of an application, but as the examples above show, you need to be careful when doing so: a few extra spaces or a small comment is enough to make SQL Server create a different cached plan.
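
One related technique, not covered in this article but worth mentioning, is to send in-line statements through sp_executesql with parameters. Because the statement text then stays byte-for-byte identical between calls, the cached plan can be reused; only the parameter values change. A minimal sketch:

-- The statement text is constant across calls, so one cached plan can serve them all
EXEC sp_executesql
    N'SELECT Name FROM AdventureWorks.Production.Product WHERE ProductID = @ProductID',
    N'@ProductID int',
    @ProductID = 316;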

Maximize performance and minimize memory

To optimize your code, paying attention only to database design is not enough; you also need to pay attention to smaller details such as spaces and comments. If you ignore the small differences between otherwise identical T-SQL statements, you can cause SQL Server to create redundant cached plans. A few redundant plans in memory may not matter much, but as programmers we should do our best to improve server performance and reduce resource use, and one way to do that is to avoid creating multiple cached plans for the same T-SQL statement.

IN BRIEF:
- Every table should have a primary key. 
- Every table should have at least one clustered index. 
- Every table should have an appropriate number of non-clustered indexes. 
- Non-clustered indexes should be created on table columns based on the needs of your queries. 
- When creating an index, order the columns according to: 
a. WHERE clause, b. JOIN clause, c. ORDER BY clause, d. SELECT clause (see the sketch after this list) 
- Views should not be used in place of the original tables. 
- Triggers should not be used unless necessary; move trigger logic into stored procedures. 
- Remove ad hoc in-line query statements and replace them with stored procedures. 
- Keep at least 30% free space on the disk partition containing the database. 
- Where possible, convert UDFs (user-defined functions) into SPs (stored procedures). 
- SELECT only the columns you need; do not SELECT *. 
- Remove unnecessary joins between tables. 
- Limit the use of cursors. 
- Make sure the hardware meets the system's needs.
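
To illustrate the index-ordering guideline above (a sketch only; the index name and column choice are hypothetical, reusing the AdventureWorks table from earlier in the article): a query that filters on ProductID and sorts by ModifiedDate could be supported by a non-clustered index that lists the WHERE column first and the ORDER BY column second:

-- Hypothetical supporting index: WHERE column (ProductID) first, ORDER BY column (ModifiedDate) second
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ProductID_ModifiedDate
ON AdventureWorks.Sales.SalesOrderDetail (ProductID, ModifiedDate);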

Source: vadesign