Tuesday, June 30, 2020

Clustered and Non Clustered Indexes

What is the difference between Clustered and Non-Clustered Indexes in SQL Server?

August 28, 2017 by 
Indexes are used to speed-up query process in SQL Server, resulting in high performance. They are similar to textbook indexes. In textbooks, if you need to go to a particular chapter, you go to the index, find the page number of the chapter and go directly to that page. Without indexes, the process of finding your desired chapter would have been very slow.
The same applies to indexes in databases. Without indexes, a DBMS has to go through all the records in the table in order to retrieve the desired results. This process is called table-scanning and is extremely slow. On the other hand, if you create indexes, the database goes to that index first and then retrieves the corresponding table records directly.
There are two types of Indexes in SQL Server:
  1. Clustered Index
  2. Non-Clustered Index

Clustered Index

A clustered index defines the order in which data is physically stored in a table. Table data can be sorted in only way, therefore, there can be only one clustered index per table. In SQL Server, the primary key constraint automatically creates a clustered index on that particular column.

Non-Clustered Indexes

A non-clustered index doesn’t sort the physical data inside the table. In fact, a non-clustered index is stored at one place and table data is stored in another place. This is similar to a textbook where the book content is located in one place and the index is located in another. This allows for more than one non-clustered index per table.
It is important to mention here that inside the table the data will be sorted by a clustered index. However, inside the non-clustered index data is stored in the specified order. The index contains column values on which the index is created and the address of the record that the column value belongs to.
When a query is issued against a column on which the index is created, the database will first go to the index and look for the address of the corresponding row in the table. It will then go to that row address and fetch other column values. It is due to this additional step that non-clustered indexes are slower than clustered indexes.

Conclusion

From the discussion we find following differences between clustered and non-clustered indexes.
  1. There can be only one clustered index per table. However, you can create multiple non-clustered indexes on a single table.
  2. Clustered indexes only sort tables. Therefore, they do not consume extra storage. Non-clustered indexes are stored in a separate place from the actual table claiming more storage space.
  3. Clustered indexes are faster than non-clustered indexes since they don’t involve any extra lookup step.

SQL Server Windows function

SQL Server Window Functions calculate an aggregate value based on a group of rows and return multiple rows for each group.
NameDescription
CUME_DISTCalculate the cumulative distribution of a value in a set of values
DENSE_RANKAssign a rank value to each row within a partition of a result, with no gaps in rank values.
FIRST_VALUEGet the value of the first row in an ordered partition of a result set.
LAGProvide access to a row at a given physical offset that comes before the current row.
LAST_VALUEGet the value of the last row in an ordered partition of a result set.
LEADProvide access to a row at a given physical offset that follows the current row.
NTILEDistribute rows of an ordered partition into a number of groups or buckets
PERCENT_RANKCalculate the percent rank of a value in a set of values.

Trace Flags

Trace flags are used to set specific server characteristics or to alter a particular behavior. For example, trace flag 3226 is a commonly used startup trace flag which suppresses successful backup messages in the error log. Trace flags are frequently used to diagnose performance issues or to debug stored procedures or complex computer systems, but they may also be recommended by Microsoft Support to address behavior that is negatively impacting a specific workload. All documented trace flags and those recommended by Microsoft Support are fully supported in a production environment when used as directed. Note that trace flags in this list may have additional considerations regarding their particular usage, so it is advisable to carefully review all the recommendations given here and/or by your support engineer. Also, as with any configuration change in SQL Server, it is always best to thoroughly test the flag in a non-production environment before deploying.

Remarks

In SQL Server, there are three scopes at which trace flags can work: query, session, and global. Query trace flags are active for the context of a specific query. Session trace flags are active for a connection and are visible only to that connection. Global trace flags are set at the server level and are visible to every connection on the server. Some flags can only be enabled as global, and some can be enabled at either global or session scope.
The following rules apply:
  • A global trace flag must be enabled globally. Otherwise, the trace flag has no effect. We recommend that you enable global trace flags at startup, by using the -T command line option. This ensures the trace flag remains active after a server restart. Restart SQL Server for the trace flag to take effect.
  • If a trace flag has either global, session or query scope, it can be enabled with the appropriate scope. A trace flag that is enabled at the session level never affects another session, and the effect of the trace flag is lost when the SPID that opened the session logs out.
Trace flags are set ON or OFF by using either of the following methods:
  • Using the DBCC TRACEON and DBCC TRACEOFF commands.
    For example, to enable the 2528 trace flag globally, use DBCC TRACEON with the -1 argument: DBCC TRACEON (2528, -1). The effect of enabling a global trace flag with DBCC TRACEON is lost on server restart. To turn off a global trace flag, use DBCC TRACEOFF with the -1 argument.
  • Using the -T startup option to specify that the trace flag be set on during startup.
    The -T startup option enables a trace flag globally. You cannot enable a session-level trace flag by using a startup option. This ensures the trace flag remains active after a server restart. For more information about startup options, see Database Engine Service Startup Options.
  • At the query level, by using the QUERYTRACEON query hint. The QUERYTRACEON option is only supported for Query Optimizer trace flags documented in the table below.
Use the DBCC TRACESTATUS command to determine which trace flags are currently active.

Cross Apply and outer apply

SQL Server APPLY operator has two variants; CROSS APPLY and OUTER APPLY

  • The CROSS APPLY operator returns only those rows from the left table expression (in its final output) if it matches with the right table expression. In other words, the right table expression returns rows for the left table expression match only.
  • The OUTER APPLY operator returns all the rows from the left table expression irrespective of its match with the right table expression. For those rows for which there are no corresponding matches in the right table expression, it contains NULL values in columns of the right table expression.
  • So you might conclude, the CROSS APPLY is equivalent to an INNER JOIN (or to be more precise its like a CROSS JOIN with a correlated sub-query) with an implicit join condition of 1=1 whereas the OUTER APPLY is equivalent to a LEFT OUTER JOIN.
You might be wondering if the same can be achieved with a regular JOIN clause, so why and when do you use the APPLY operator? Although the same can be achieved with a normal JOIN, the need of APPLY arises if you have a table-valued expression on the right part and in some cases the use of the APPLY operator boosts performance of your query. Let me explain with some examples.

Create Sample Data for CROSS APPLY and OUTER APPLY examples

Script #1 creates a Department table to hold information about departments. Then it creates an Employee table which holds information about the employees. Please note, each employee belongs to a department, hence the Employee table has referential integrity with the Department table.
--Script #1 - Creating some temporary objects to work on...

USE [tempdb] 
GO
 
IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[Employee]') AND type IN (N'U')) 
BEGIN 
   DROP TABLE [Employee] 
END 
GO 

IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[Department]') AND type IN (N'U')) 
BEGIN 
   DROP TABLE [Department] 
END 

CREATE TABLE [Department]( 
   [DepartmentID] [int] NOT NULL PRIMARY KEY, 
   [Name] VARCHAR(250) NOT NULL, 
) ON [PRIMARY] 

INSERT [Department] ([DepartmentID], [Name])  
VALUES (1, N'Engineering') 
INSERT [Department] ([DepartmentID], [Name])  
VALUES (2, N'Administration') 
INSERT [Department] ([DepartmentID], [Name])  
VALUES (3, N'Sales') 
INSERT [Department] ([DepartmentID], [Name])  
VALUES (4, N'Marketing') 
INSERT [Department] ([DepartmentID], [Name])  
VALUES (5, N'Finance') 
GO 

CREATE TABLE [Employee]( 
   [EmployeeID] [int] NOT NULL PRIMARY KEY, 
   [FirstName] VARCHAR(250) NOT NULL, 
   [LastName] VARCHAR(250) NOT NULL, 
   [DepartmentID] [int] NOT NULL REFERENCES [Department](DepartmentID), 
) ON [PRIMARY] 
GO
 
INSERT [Employee] ([EmployeeID], [FirstName], [LastName], [DepartmentID]) 
VALUES (1, N'Orlando', N'Gee', 1 ) 
INSERT [Employee] ([EmployeeID], [FirstName], [LastName], [DepartmentID]) 
VALUES (2, N'Keith', N'Harris', 2 ) 
INSERT [Employee] ([EmployeeID], [FirstName], [LastName], [DepartmentID]) 
VALUES (3, N'Donna', N'Carreras', 3 ) 
INSERT [Employee] ([EmployeeID], [FirstName], [LastName], [DepartmentID]) 
VALUES (4, N'Janet', N'Gates', 3 ) 

SQL Server CROSS APPLY vs INNER JOIN example

The first query in Script #2 selects data from the Department table and uses a CROSS APPLY to evaluate the Employee table for each record of the Department table. The second query simply joins the Department table with the Employee table and all matching records are produced.
--Script #2 - CROSS APPLY and INNER JOIN

SELECT * FROM Department D 
CROSS APPLY 
   ( 
   SELECT * FROM Employee E 
   WHERE E.DepartmentID = D.DepartmentID 
   ) A 
GO
 
SELECT * FROM Department D 
INNER JOIN Employee E ON D.DepartmentID = E.DepartmentID 
GO 
If you look at the results, you can see see they are the same.
cross apply and inner join result set
Also, the execution plans for these queries are similar and they have an equal query cost, as you can see in the image below.
cross apply and inner join query plan
So what is the use of APPLY operator? How does it differ from a JOIN and how does it help in writing more efficient queries? I will discuss this later.

SQL Sever OUTER APPLY vs LEFT OUTER JOIN example

The first query in Script #3 selects data from Department table and uses an OUTER APPLY to evaluate the Employee table for each record of the Department table. For those rows for which there is not a match in the Employee table, those rows contain NULL values as you can see in case of row 5 and 6 below. The second query simply uses a LEFT OUTER JOIN between the Department table and the Employee table. As expected the query returns all rows from Department table, even for those rows for which there is no match in the Employee table.
--Script #3 - OUTER APPLY and LEFT OUTER JOIN

SELECT * FROM Department D 
OUTER APPLY 
   ( 
   SELECT * FROM Employee E 
   WHERE E.DepartmentID = D.DepartmentID 
   ) A 
GO
 
SELECT * FROM Department D 
LEFT OUTER JOIN Employee E ON D.DepartmentID = E.DepartmentID 
GO 
outer apply and left outer join result set
Even though the above two queries return the same information, the execution plan is a bit different. Although cost wise there is not much difference, the query with the OUTER APPLY uses a Compute Scalar operator (with estimated operator cost of 0.0000103 or around 0%) before the Nested Loops operator to evaluate and produce the columns of the Employee table.
outer apply and left outer join query plan

Joining table valued functions and tables using APPLY operators

 In Script #4, I am creating a table-valued function which accepts DepartmentID as its parameter and returns all the employees who belong to this department. The next query selects data from the Department table and uses a CROSS APPLY to join with the function we created. It passes the DepartmentID for each row from the outer table expression (in our case Department table) and evaluates the function for each row similar to a correlated subquery. The next query uses the OUTER APPLY in place of the CROSS APPLY and hence unlike the CROSS APPLY which returned only correlated data, the OUTER APPLY returns non-correlated data as well, placing NULLs into the missing columns.
--Script #4 - APPLY with table-valued function

IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[fn_GetAllEmployeeOfADepartment]') AND type IN (N'IF')) 
BEGIN 
   DROP FUNCTION dbo.fn_GetAllEmployeeOfADepartment 
END 
GO
 
CREATE FUNCTION dbo.fn_GetAllEmployeeOfADepartment(@DeptID AS INT)  
RETURNS TABLE 
AS 
RETURN 
   ( 
   SELECT * FROM Employee E 
   WHERE E.DepartmentID = @DeptID 
   ) 
GO
 
SELECT * FROM Department D 
CROSS APPLY dbo.fn_GetAllEmployeeOfADepartment(D.DepartmentID) 
GO
 
SELECT * FROM Department D 
OUTER APPLY dbo.fn_GetAllEmployeeOfADepartment(D.DepartmentID) 
GO 
apply with table valued function result set
You might be wondering if we can use a simple join in place of the above queries, the answer is NO. If you replace the CROSS/OUTER APPLY in the above queries with an INNER JOIN/LEFT OUTER JOIN, specifying the ON clause with 1=1 and run the query, you will get the error "The multi-part identifier "D.DepartmentID" could not be bound.". This is because with JOINs the execution context of the outer query is different from the execution context of the function (or a derived table), and you cannot bind a value/variable from the outer query to the function as a parameter. Hence the APPLY operator is required for such queries.
So in summary the APPLY operator is required when you have to use a table-valued function in the query, but it can also be used with inline SELECT statements.

Joining table valued system functions and tables using APPLY operators

Let me show you another query with a Dynamic Management Function (DMF). Script #5 returns all the currently executing user queries except for the queries being executed by the current session. As you can see in the script below, the sys.dm_exec_requests dynamic management view is being CROSS APPLY'ed with the sys.dm_exec_sql_text dynamic management function which accepts a "plan handle" for the query and the "plan handle" is being passed from the left/outer expression to the function to return the data.
--Script #5 - APPLY with Dynamic Management Function (DMF)

USE master 
GO
 
SELECT DB_NAME(r.database_id) AS [Database], st.[text] AS [Query]  
FROM sys.dm_exec_requests r 
CROSS APPLY sys.dm_exec_sql_text(r.plan_handle) st 
WHERE r.session_Id > 50           -- Consider spids for users only, no system spids. 
AND r.session_Id NOT IN (@@SPID)  -- Don't include request from current spid. 
Note, for the above query, the [text] column in the query returns all queries submitted in a batch. If you want to see only the active (currently executing) query you can use the statement_start_offset and statement_end_offset columns to trim the active part of the query. 


SQL Server Hints

Hints are options or strategies specified for enforcement by the SQL Server query processor on SELECT, INSERT, UPDATE, or DELETE statements. The hints override any execution plan the query optimizer might select for a query.
Caution: Because the SQL Server query optimizer typically selects the best execution plan for a query, we recommend that <join_hint>, <query_hint>, and <table_hint> be used only as a last resort by experienced developers and database administrators.

Join Hints
Join hints specify that the query optimizer enforce a join strategy between two tables in SQL Server 2019
Applies to:
Delete, Update, Select
Syntax:
<join_hint> ::= { LOOP | HASH | MERGE | REMOTE }

Arguments

LOOP | HASH | MERGE
Specifies that the join in the query should use looping, hashing, or merging. Using LOOP |HASH | MERGE JOIN enforces a particular join between two tables. LOOP cannot be specified together with RIGHT or FULL as a join type. For more information, see Joins.
REMOTE
Specifies that the join operation is performed on the site of the right table. This is useful when the left table is a local table and the right table is a remote table. REMOTE should be used only when the left table has fewer rows than the right table.
If the right table is local, the join is performed locally. If both tables are remote but from different data sources, REMOTE causes the join to be performed on the site of the right table. If both tables are remote tables from the same data source, REMOTE is not required.
REMOTE cannot be used when one of the values being compared in the join predicate is cast to a different collation using the COLLATE clause.
REMOTE can be used only for INNER JOIN operations.

Remarks

Join hints are specified in the FROM clause of a query. Join hints enforce a join strategy between two tables. If a join hint is specified for any two tables, the query optimizer automatically enforces the join order for all joined tables in the query, based on the position of the ON keywords. When a CROSS JOIN is used without the ON clause, parentheses can be used to indicate the join order.

Examples

A. Using HASH

The following example specifies that the JOIN operation in the query is performed by a HASH join. The example uses the AdventureWorks2012 database.
SELECT p.Name, pr.ProductReviewID FROM Production.Product AS p LEFT OUTER HASH JOIN Production.ProductReview AS pr ON p.ProductID = pr.ProductID ORDER BY ProductReviewID DESC;

B. Using LOOP

The following example specifies that the JOIN operation in the query is performed by a LOOP join. The example uses the AdventureWorks2012 database.
DELETE FROM Sales.SalesPersonQuotaHistory FROM Sales.SalesPersonQuotaHistory AS spqh INNER LOOP JOIN Sales.SalesPerson AS sp ON spqh.SalesPersonID = sp.SalesPersonID WHERE sp.SalesYTD > 2500000.00; GO

C. Using MERGE

The following example specifies that the JOIN operation in the query is performed by a MERGE join. The example uses the AdventureWorks2012 database.
SELECT poh.PurchaseOrderID, poh.OrderDate, pod.ProductID, pod.DueDate, poh.VendorID FROM Purchasing.PurchaseOrderHeader AS poh INNER MERGE JOIN Purchasing.PurchaseOrderDetail AS pod ON poh.PurchaseOrderID = pod.PurchaseOrderID; GO
Query Hints
Query hints specify that the indicated hints should be used throughout the query. They affect all operators in the statement. If UNION is involved in the main query, only the last query involving a UNION operation can have the OPTION clause. Query hints are specified as part of the OPTION clause. Error 8622 occurs if one or more query hints cause the Query Optimizer not to generate a valid plan.
Applies to: Delete, Update, Insert, Select, Merge

Arguments

{ HASH | ORDER } GROUP
Specifies that aggregations that the query's GROUP BY or DISTINCT clause describes should use hashing or ordering.
{ MERGE | HASH | CONCAT } UNION
Specifies that all UNION operations are run by merging, hashing, or concatenating UNION sets. If more than one UNION hint is specified, the Query Optimizer selects the least expensive strategy from those hints specified.
{ LOOP | MERGE | HASH } JOIN
Specifies all join operations are performed by LOOP JOIN, MERGE JOIN, or HASH JOIN in the whole query. If you specify more than one join hint, the optimizer selects the least expensive join strategy from the allowed ones.
If you specify a join hint in the same query's FROM clause for a specific table pair, this join hint takes precedence in the joining of the two tables. The query hints, though, must still be honored. The join hint for the pair of tables may only restrict the selection of allowed join methods in the query hint. For more information, see Join Hints (Transact-SQL).
EXPAND VIEWS
Specifies the indexed views are expanded. Also specifies the Query Optimizer won't consider any indexed view as a replacement for any query part. A view is expanded when the view definition replaces the view name in the query text.
This query hint virtually disallows direct use of indexed views and indexes on indexed views in the query plan.
The indexed view remains condensed if there's a direct reference to the view in the query's SELECT part. The view also remains condensed if you specify WITH (NOEXPAND) or WITH (NOEXPAND, INDEX(index_value_ [ ,...n ] ) ). For more information about the query hint NOEXPAND, see Using NOEXPAND.
The hint only affects the views in the statements' SELECT part, including those views in INSERT, UPDATE, MERGE, and DELETE statements.
FAST number_rows
Specifies that the query is optimized for fast retrieval of the first number_rows. This result is a nonnegative integer. After the first number_rows are returned, the query continues execution and produces its full result set.
FORCE ORDER
Specifies that the join order indicated by the query syntax is preserved during query optimization. Using FORCE ORDER doesn't affect possible role reversal behavior of the Query Optimizer.

Table Hints
Table hints override the default behavior of the Query Optimizer for the duration of the data manipulation language (DML) statement by specifying a locking method, one or more indexes, a query-processing operation such as a table scan or index seek, or other options. Table hints are specified in the FROM clause of the DML statement and affect only the table or view referenced in that clause.

KEEPDEFAULTS
Is applicable only in an INSERT statement when the BULK option is used with OPENROWSET.
Specifies insertion of a table column's default value, if any, instead of NULL when the data record lacks a value for the column.
For an example that uses this hint in an INSERT ... SELECT * FROM OPENROWSET(BULK...) statement, see Keep Nulls or Use Default Values During Bulk Import (SQL Server).
FORCESEEK [ (index_value(index_column_name [ ,... n ] )) ]
Specifies that the query optimizer use only an index seek operation as the access path to the data in the table or view.

Using a Table Hint as a Query Hint

Table hints can also be specified as a query hint by using the OPTION (TABLE HINT) clause. We recommend using a table hint as a query hint only in the context of a plan guide. For ad-hoc queries, specify these hints only as table hints. For more information, see Query Hints (Transact-SQL).

Permissions

The KEEPIDENTITY, IGNORE_CONSTRAINTS, and IGNORE_TRIGGERS hints require ALTER permissions on the table.

Examples

A. Using the TABLOCK hint to specify a locking method

The following example specifies that a shared lock is taken on the Production.Product table in the AdventureWorks2012 database and is held until the end of the UPDATE statement.
UPDATE Production.Product WITH (TABLOCK) SET ListPrice = ListPrice * 1.10 WHERE ProductNumber LIKE 'BK-%'; GO

B. Using the FORCESEEK hint to specify an index seek operation

The following example uses the FORCESEEK hint without specifying an index to force the query optimizer to perform an index seek operation on the Sales.SalesOrderDetail table in the AdventureWorks2012 database.
SELECT * FROM Sales.SalesOrderHeader AS h INNER JOIN Sales.SalesOrderDetail AS d WITH (FORCESEEK) ON h.SalesOrderID = d.SalesOrderID WHERE h.TotalDue > 100 AND (d.OrderQty > 5 OR d.LineTotal < 1000.00); GO