
Thursday, 7 July 2016

Tuning Performance of Bulk INSERT / Reduce transaction log file growth by Minimal Logging


  One of my friends asked me whether there is any way in SQL Server to minimise the transaction logging for a table so that INSERT operations run faster. He is an expert in Oracle, and Oracle has a NOLOGGING option that you can set on a table.

   The answer is no: SQL Server does not have a direct equivalent of NOLOGGING, but we do have alternatives that reduce the logging.

Requirement: Suppose you have a big table, such as a fact table, into which you load a huge amount of data (around 100 GB) every night. Full logging does not matter much here because it is not a transactional table, but when you load this much data the transaction log file grows very fast and the INSERT operation runs slowly.

      We can tune the performance of the INSERT operation in several ways; here we will discuss one of them: reducing the logging. To reduce the logging we need to change the database recovery model to SIMPLE or BULK_LOGGED. Even under the FULL recovery model, you can use the TABLOCK table hint to minimise the logging. In short, you can use one of the options below to boost INSERT performance:


  1. Set the database recovery model to SIMPLE and use TABLOCK.
  2. Set the database recovery model to BULK_LOGGED and use TABLOCK.
  3. Set the database recovery model to FULL and use TABLOCK.

          Option #1 is the best, but if your database needs point-in-time recovery and you take transaction log backups, it will not work for you, so use option #3 instead. When you load data from heterogeneous files you can use option #2. All of these also help to reduce transaction log file growth.
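Before changing anything, it helps to know which recovery model a database is currently using. A minimal sketch (the name YourDatabase is just a placeholder for your own database):

-- Check the current recovery model
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'YourDatabase';

-- Switch it before the load (SIMPLE / BULK_LOGGED / FULL)
ALTER DATABASE YourDatabase SET RECOVERY SIMPLE;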

   Below is an analysis showing the performance difference and the difference in transaction log file size:
   
  DEMO:

   I have created 6 databases, as below, each with a different recovery model:
     
create database bulktestsimple
alter database bulktestsimple  set recovery simple

create database bulktestsimpleLock
alter database bulktestsimpleLock  set recovery simple


create database bulktestFull
alter database bulktestFull  set recovery Full

create database bulktestFullLock
alter database bulktestFullLock  set recovery Full

create database bulktestbul
alter database bulktestbul set recovery bulk_logged

create database bulktestbulLock
alter database bulktestbulLock set recovery bulk_logged


Let's see the size of the log file of each of these databases; at this point every log file is 63 MB.
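A query along these lines (my own sketch, not part of the original demo) shows the log file size of each demo database:

-- Log file size in MB for the six demo databases
SELECT DB_NAME(database_id) AS database_name,
       name                 AS logical_file_name,
       size * 8 / 1024      AS size_mb
FROM sys.master_files
WHERE type_desc = 'LOG'
  AND DB_NAME(database_id) LIKE 'bulktest%';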





       Now create the table in every database and load the data.
    
  1. Database in SIMPLE recovery model, loading data without the TABLOCK hint:

    use bulktestsimple
    GO

    --Creating blank table
    select a.* into DumpTable  from AdventureWorks2008.sys.objects a
    cross join AdventureWorks2008.sys.objects b
    cross join sys.tables
    where 1=2


    --Loading Data

    insert into DumpTable
    select a.*   from AdventureWorks2008.sys.objects a
    cross join AdventureWorks2008.sys.objects b
    cross join master.sys.tables



  2. Database in SIMPLE recovery model, loading data with the TABLOCK hint:

use bulktestsimpleLock
GO

--Creating blank table
select a.* into DumpTable  from AdventureWorks2008.sys.objects a
cross join AdventureWorks2008.sys.objects b
cross join sys.tables
where 1=2


--Loading Data

insert into DumpTable with (tablock)
select a.*   from AdventureWorks2008.sys.objects a
cross join AdventureWorks2008.sys.objects b
cross join master.sys.tables

     I did the same for the other 4 databases, and the results are below:

    Log file size after data loading 
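The exact figures depend on the environment; a quick way to compare log space usage across all the databases after the loads (a sketch, not from the original demo) is:

-- Log size and percentage used for every database on the instance
DBCC SQLPERF(LOGSPACE);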


    Time taken by the different databases for the same data load:

1. SIMPLE recovery model, without table hint -- 59 seconds
2. SIMPLE recovery model, with table hint -- 16 seconds
3. FULL recovery model, without table hint -- 1.4 minutes
4. FULL recovery model, with table hint -- 25 seconds
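These timings were taken manually; a simple way to measure each load (a sketch using the same insert as above, with a variable name of my own choosing) is:

-- Measure the elapsed time of one load
DECLARE @start DATETIME2 = SYSDATETIME();

INSERT INTO DumpTable WITH (TABLOCK)
SELECT a.* FROM AdventureWorks2008.sys.objects a
CROSS JOIN AdventureWorks2008.sys.objects b
CROSS JOIN master.sys.tables;

SELECT DATEDIFF(SECOND, @start, SYSDATETIME()) AS elapsed_seconds;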



   Conclusion: With the help of TABLOCK we can reduce the logging and improve the performance. It is only helpful for non-transactional tables.


Monday, 24 August 2015

How MERGE statement works


Generating new ID without Identity

           Let's see how the MERGE statement works: does it read all the data from the source first and then start operating on the target, or does it take rows one by one from the source, join each with the target, and then update/insert the records in the target table?
  Here we are trying to address a small misconception about MERGE. Suppose you have a table test1 with a column called id, and you want to increase the value of the ID for each inserted row, but this column is not an IDENTITY column. Let's see this practically.
USE TestDatabase
GO
-- drop table test2
-- drop table test1

CREATE TABLE test1 (id INT, name VARCHAR(100))
GO
CREATE TABLE test2 (id INT identity, name VARCHAR(100))
GO
INSERT INTO test2(name) values ('amit'),('sumit'),('abhi'),('sachine'),('subhash')

INSERT INTO test1 VALUES(1, 'vikas');
INSERT INTO test1 VALUES(2, 'pnm');
SELECT * FROM test1;

-- drop table #t
CREATE TABLE #T (ID INT ,ACTION1 VARCHAR(100))

--SET SHOWPLAN_TEXT on
; MERGE test1 AS TARGET
using( SELECT id AS ID,NAME FROM test2 )
AS SOURCE
ON TARGET.ID=SOURCE.ID
WHEN MATCHED THEN
UPDATE
SET NAME=SOURCE.NAME
WHEN NOT MATCHED THEN
INSERT
VALUES((SELECT MAX(ID)+1 FROM test1),SOURCE.NAME)
OUTPUT INSERTED.ID,$ACTION INTO #T
;

--SET SHOWPLAN_TEXT OFF 
 
SELECT * FROM test1

Now we are getting the same value for the id: it should be 3, 4, 5, but it displays 3 for all three inserted records. Let's understand the execution process. In the execution plan below we can clearly see that it reads the data from the target table and calculates the MAX before the merge starts, so there is no chance of getting a new max after every insert. This is the default behaviour of SQL Server.

|--Stream Aggregate(DEFINE:([Expr1012]=MAX([TestDatabase].[dbo].[test1].[id])))
                                          |--Table Scan(OBJECT:([TestDatabase].[dbo].[test1]))


-------------------------


  |--Table Merge(OBJECT:([TestDatabase].[dbo].[test1] AS [TARGET]), SET:(Insert, [TestDatabase].[dbo].[test1].[name]
  as [TARGET].[name] = [Expr1008],[TestDatabase].[dbo].[test1].[id] as [TARGET].[id] = [Expr1014]), SET
 :(Update, [TestDatabase].[dbo].[test1].[name] as [TARGET].[name] = [Expr1008]) ACTION:([Action1007]))
       |--Assert(WHERE:(CASE WHEN [Expr1020]>(1) THEN (0) ELSE NULL END))
            |--Sequence Project(DEFINE:([Expr1020]=conditional_row_number))
                 |--Segment
                      |--Sort(ORDER BY:([Bmk1003] ASC))
                           |--Compute Scalar(DEFINE:([Expr1014]=CASE WHEN [Action1007]=(4) THEN [Expr1012]+(1)
    ELSE [TestDatabase].[dbo].[test1].[id] as [TARGET].[id] END))
                                |--Nested Loops(Inner Join, PASSTHRU:(IsFalseOrNull [Action1007]=(4)))
                                     |--Compute Scalar(DEFINE:([Expr1008]=[TestDatabase].[dbo].[test2].[name]))
                                     |    |--Compute Scalar(DEFINE:([Action1007]=ForceOrder(CASE WHEN [TrgPrb1005] IS NOT NULL
      THEN (1) ELSE (4) END)))
                                     |         |--Nested Loops(Left Outer Join, OUTER REFERENCES:([TestDatabase].[dbo].[test2].[id]))
                                     |              |--Table Scan(OBJECT:([TestDatabase].[dbo].[test2]))
                                     |              |--Compute Scalar(DEFINE:([TrgPrb1005]=(1)))
                                     |                   |--Table Scan(OBJECT:([TestDatabase].[dbo].[test1] AS [TARGET]),
         WHERE:([TestDatabase].[dbo].[test1].[id]
        as [TARGET].[id]=[TestDatabase].[dbo].[test2].[id]) ORDERED)
                                     |--Stream Aggregate(DEFINE:([Expr1012]=MAX([TestDatabase].[dbo].[test1].[id])))
                                          |--Table Scan(OBJECT:([TestDatabase].[dbo].[test1]))



  As a solution you can use a SEQUENCE:

CREATE SEQUENCE Test.CountBy1
    START WITH 1
    INCREMENT BY 1;
GO
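Note that SQL Server does not allow NEXT VALUE FOR to be referenced directly inside a MERGE statement, so one way to plug the sequence in is through a default constraint on the target column and the DEFAULT keyword in the INSERT branch. A sketch (assuming the Test schema exists; the constraint name DF_test1_id is my own, and the sequence would need to start above the current maximum id to avoid duplicate values):

-- Attach the sequence as the default for test1.id
ALTER TABLE test1
    ADD CONSTRAINT DF_test1_id DEFAULT (NEXT VALUE FOR Test.CountBy1) FOR id;

-- Let the INSERT branch pick up the default instead of MAX(ID)+1
MERGE test1 AS TARGET
USING (SELECT id, name FROM test2) AS SOURCE
    ON TARGET.id = SOURCE.id
WHEN MATCHED THEN
    UPDATE SET name = SOURCE.name
WHEN NOT MATCHED THEN
    INSERT (id, name)
    VALUES (DEFAULT, SOURCE.name);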