
Thursday, 7 July 2016

Tuning Performance of Bulk INSERT / Reduce transaction log file growth by Minimal Logging


  One of my friends asked me whether there is any way in SQL Server to minimise the transaction logging for a table so that INSERT operations run faster. He is an expert in Oracle, and Oracle has a NOLOGGING option that you can set on a table.

   The answer is no: SQL Server does not have a direct equivalent of NOLOGGING, but we do have alternatives that reduce the logging.

Requirement: Suppose you have a big table, such as a fact table, into which you load a huge amount of data (around 100 GB) every night. Full logging does not matter much here because it is not a transactional table, but when you load this much data the transaction log file grows very fast and the INSERT operation runs slowly.

      We can tune the performance of the INSERT operation in several ways; here we will discuss one of them: reducing the logging. To reduce the logging we need to change the database recovery model to SIMPLE or BULK_LOGGED. Even under the FULL recovery model, you can use the TABLOCK table hint to minimise the logging. In short, you can use one of the options below to boost INSERT performance:


  1. Set the database recovery model to SIMPLE and use TABLOCK.
  2. Set the database recovery model to BULK_LOGGED and use TABLOCK.
  3. Set the database recovery model to FULL and use TABLOCK.

          Option #1 is the best, but if your database needs point-in-time recovery and you take transaction log backups, it will not work for you, so use option #3 instead. When you load data from heterogeneous files you can use option #2. All of these also help to reduce transaction log file growth.
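Before changing anything, it helps to know which recovery model a database is currently using. A minimal sketch (the name YourDatabase is just a placeholder for your own database):

-- Check the current recovery model
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'YourDatabase';

-- Switch it before the load (SIMPLE / BULK_LOGGED / FULL)
ALTER DATABASE YourDatabase SET RECOVERY SIMPLE;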

   Below is an analysis showing the performance difference and the difference in transaction log file size:
   
  DEMO:

   I have created 6 databases, as below, each with a different recovery model:
     
create database bulktestsimple
alter database bulktestsimple  set recovery simple

create database bulktestsimpleLock
alter database bulktestsimpleLock  set recovery simple


create database bulktestFull
alter database bulktestFull  set recovery Full

create database bulktestFullLock
alter database bulktestFullLock  set recovery Full

create database bulktestbul
alter database bulktestbul set recovery bulk_logged

create database bulktestbulLock
alter database bulktestbulLock set recovery bulk_logged


Let's see the size of the log file of each of these databases; at this point every log file is 63 MB.
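A query along these lines (my own sketch, not part of the original demo) shows the log file size of each demo database:

-- Log file size in MB for the six demo databases
SELECT DB_NAME(database_id) AS database_name,
       name                 AS logical_file_name,
       size * 8 / 1024      AS size_mb
FROM sys.master_files
WHERE type_desc = 'LOG'
  AND DB_NAME(database_id) LIKE 'bulktest%';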





       Now create the table in every database and load the data.
    
  1. Database in SIMPLE recovery model, loading data without the TABLOCK hint:

    use bulktestsimple
    GO

    --Creating blank table
    select a.* into DumpTable  from AdventureWorks2008.sys.objects a
    cross join AdventureWorks2008.sys.objects b
    cross join sys.tables
    where 1=2


    --Loading Data

    insert into DumpTable
    select a.*   from AdventureWorks2008.sys.objects a
    cross join AdventureWorks2008.sys.objects b
    cross join master.sys.tables



  2. Database in SIMPLE recovery model, loading data with the TABLOCK hint:

use bulktestsimpleLock
GO

--Creating blank table
select a.* into DumpTable  from AdventureWorks2008.sys.objects a
cross join AdventureWorks2008.sys.objects b
cross join sys.tables
where 1=2


--Loading Data

insert into DumpTable with (tablock)
select a.*   from AdventureWorks2008.sys.objects a
cross join AdventureWorks2008.sys.objects b
cross join master.sys.tables

     I did the same for the other 4 databases, and the results are below:

    Log file size after data loading 
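The exact figures depend on the environment; a quick way to compare log space usage across all the databases after the loads (a sketch, not from the original demo) is:

-- Log size and percentage used for every database on the instance
DBCC SQLPERF(LOGSPACE);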


    Time taken by the different databases for the same data load:

1. SIMPLE recovery model, without table hint -- 59 seconds
2. SIMPLE recovery model, with table hint -- 16 seconds
3. FULL recovery model, without table hint -- 1.4 minutes
4. FULL recovery model, with table hint -- 25 seconds
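These timings were taken manually; a simple way to measure each load (a sketch using the same insert as above, with a variable name of my own choosing) is:

-- Measure the elapsed time of one load
DECLARE @start DATETIME2 = SYSDATETIME();

INSERT INTO DumpTable WITH (TABLOCK)
SELECT a.* FROM AdventureWorks2008.sys.objects a
CROSS JOIN AdventureWorks2008.sys.objects b
CROSS JOIN master.sys.tables;

SELECT DATEDIFF(SECOND, @start, SYSDATETIME()) AS elapsed_seconds;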



   Conclusion: With the help of TABLOCK we can reduce the logging and improve the performance. It is only helpful for non-transactional tables.


Monday, 24 August 2015

How MERGE statement works


Generating new ID without Identity

           Let's see how the MERGE statement works: does it read all the data from the source first and then start operating on the target, or does it take rows one by one from the source, join each with the target, and then update/insert the records in the target table?
  Here we are trying to address a small misconception about MERGE. Suppose you have a table test1 with a column called id, and you want to increase the value of the ID for each inserted row, but this column is not an IDENTITY column. Let's see this practically.
USE TestDatabase
GO
-- drop table test2
-- drop table test1

CREATE TABLE test1 (id INT, name VARCHAR(100))
GO
CREATE TABLE test2 (id INT identity, name VARCHAR(100))
GO
INSERT INTO test2(name) values ('amit'),('sumit'),('abhi'),('sachine'),('subhash')

INSERT INTO test1 VALUES(1, 'vikas');
INSERT INTO test1 VALUES(2, 'pnm');
SELECT * FROM test1;

-- drop table #t
CREATE TABLE #T (ID INT ,ACTION1 VARCHAR(100))

--SET SHOWPLAN_TEXT on
; MERGE test1 AS TARGET
using( SELECT id AS ID,NAME FROM test2 )
AS SOURCE
ON TARGET.ID=SOURCE.ID
WHEN MATCHED THEN
UPDATE
SET NAME=SOURCE.NAME
WHEN NOT MATCHED THEN
INSERT
VALUES((SELECT MAX(ID)+1 FROM test1),SOURCE.NAME)
OUTPUT INSERTED.ID,$ACTION INTO #T
;

--SET SHOWPLAN_TEXT OFF 
 
SELECT * FROM test1

Now we are getting the same value for the id: it should be 3, 4, 5, but it displays 3 for all three inserted records. Let's understand the execution process. In the execution plan below we can clearly see that it reads the data from the target table and calculates the MAX before the merge starts, so there is no chance of getting a new max after every insert. This is the default behaviour of SQL Server.

|--Stream Aggregate(DEFINE:([Expr1012]=MAX([TestDatabase].[dbo].[test1].[id])))
                                          |--Table Scan(OBJECT:([TestDatabase].[dbo].[test1]))


-------------------------


  |--Table Merge(OBJECT:([TestDatabase].[dbo].[test1] AS [TARGET]), SET:(Insert, [TestDatabase].[dbo].[test1].[name]
  as [TARGET].[name] = [Expr1008],[TestDatabase].[dbo].[test1].[id] as [TARGET].[id] = [Expr1014]), SET
 :(Update, [TestDatabase].[dbo].[test1].[name] as [TARGET].[name] = [Expr1008]) ACTION:([Action1007]))
       |--Assert(WHERE:(CASE WHEN [Expr1020]>(1) THEN (0) ELSE NULL END))
            |--Sequence Project(DEFINE:([Expr1020]=conditional_row_number))
                 |--Segment
                      |--Sort(ORDER BY:([Bmk1003] ASC))
                           |--Compute Scalar(DEFINE:([Expr1014]=CASE WHEN [Action1007]=(4) THEN [Expr1012]+(1)
    ELSE [TestDatabase].[dbo].[test1].[id] as [TARGET].[id] END))
                                |--Nested Loops(Inner Join, PASSTHRU:(IsFalseOrNull [Action1007]=(4)))
                                     |--Compute Scalar(DEFINE:([Expr1008]=[TestDatabase].[dbo].[test2].[name]))
                                     |    |--Compute Scalar(DEFINE:([Action1007]=ForceOrder(CASE WHEN [TrgPrb1005] IS NOT NULL
      THEN (1) ELSE (4) END)))
                                     |         |--Nested Loops(Left Outer Join, OUTER REFERENCES:([TestDatabase].[dbo].[test2].[id]))
                                     |              |--Table Scan(OBJECT:([TestDatabase].[dbo].[test2]))
                                     |              |--Compute Scalar(DEFINE:([TrgPrb1005]=(1)))
                                     |                   |--Table Scan(OBJECT:([TestDatabase].[dbo].[test1] AS [TARGET]),
         WHERE:([TestDatabase].[dbo].[test1].[id]
        as [TARGET].[id]=[TestDatabase].[dbo].[test2].[id]) ORDERED)
                                     |--Stream Aggregate(DEFINE:([Expr1012]=MAX([TestDatabase].[dbo].[test1].[id])))
                                          |--Table Scan(OBJECT:([TestDatabase].[dbo].[test1]))



  As a solution you can use a SEQUENCE:

CREATE SEQUENCE Test.CountBy1
    START WITH 1
    INCREMENT BY 1;
GO
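Note that SQL Server does not allow NEXT VALUE FOR to be referenced directly inside a MERGE statement, so one way to plug the sequence in is through a default constraint on the target column and the DEFAULT keyword in the INSERT branch. A sketch (assuming the Test schema exists; the constraint name DF_test1_id is my own, and the sequence would need to start above the current maximum id to avoid duplicate values):

-- Attach the sequence as the default for test1.id
ALTER TABLE test1
    ADD CONSTRAINT DF_test1_id DEFAULT (NEXT VALUE FOR Test.CountBy1) FOR id;

-- Let the INSERT branch pick up the default instead of MAX(ID)+1
MERGE test1 AS TARGET
USING (SELECT id, name FROM test2) AS SOURCE
    ON TARGET.id = SOURCE.id
WHEN MATCHED THEN
    UPDATE SET name = SOURCE.name
WHEN NOT MATCHED THEN
    INSERT (id, name)
    VALUES (DEFAULT, SOURCE.name);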