Thursday 7 July 2016

Tuning Bulk INSERT Performance / Reducing Transaction Log File Growth with Minimal Logging


  One of my friends asked me whether there is any way in SQL Server to minimise transaction logging for a table so that INSERT operations run faster. He is an expert in Oracle Database, and Oracle has a NOLOGGING option that you can set on a table.

   The answer is no. SQL Server doesn't have a direct equivalent of NOLOGGING, but we do have alternatives to reduce logging.

Requirement: Suppose you have a big table, such as a fact table, into which you load a huge amount of data (around 100 GB) every night. Transactional consistency matters less here because it is not a transactional table. When you load that much data, the transaction log file grows very fast and the INSERT operation runs slowly.

      We can tune the performance of the INSERT operation in several ways; here we will discuss one of them: reducing logging. To reduce logging we need to change the database recovery model to SIMPLE or BULK_LOGGED. Even in the FULL recovery model, you can use the TABLOCK table hint to minimise logging. You can use the steps below to boost INSERT performance:


  1. Set the database recovery model to SIMPLE and use TABLOCK.
  2. Set the database recovery model to BULK_LOGGED and use TABLOCK.
  3. Set the database recovery model to FULL and use TABLOCK.

          Here option #1 is the best, but if your database needs point-in-time recovery and you take transaction log backups, then this option will not work for you; use option #3 instead. When you load data from heterogeneous files, option #2 is a good choice. All of these also help reduce transaction log file growth.
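For options #2 and #3, a typical nightly-load pattern looks like the sketch below. The database name, table names, and backup path are placeholders, not from the original post. Note that if you temporarily switch to BULK_LOGGED, you should take a log backup after switching back to FULL, because point-in-time recovery is not possible within a log interval that contains a minimally logged operation.

```sql
-- Sketch only: MyFactDB, dbo.FactSales, dbo.StagingSales, and the
-- backup path are placeholder names.
USE master;
GO

-- Option #2: temporarily switch to BULK_LOGGED for the nightly load.
ALTER DATABASE MyFactDB SET RECOVERY BULK_LOGGED;
GO

USE MyFactDB;
GO

-- The TABLOCK hint allows the INSERT...SELECT to be minimally logged.
INSERT INTO dbo.FactSales WITH (TABLOCK)
SELECT * FROM dbo.StagingSales;
GO

-- Switch back to FULL and take a log backup to restart the
-- point-in-time recovery chain.
ALTER DATABASE MyFactDB SET RECOVERY FULL;
BACKUP LOG MyFactDB TO DISK = N'C:\Backup\MyFactDB_log.trn';
GO
```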

   Below is an analysis showing the performance differences and the transaction log file size differences:
   
  DEMO:

   I have created 6 databases, each with a different recovery model, as below:
     
create database bulktestsimple
alter database bulktestsimple  set recovery simple

create database bulktestsimpleLock
alter database bulktestsimpleLock  set recovery simple


create database bulktestFull
alter database bulktestFull  set recovery Full

create database bulktestFullLock
alter database bulktestFullLock  set recovery Full

create database bulktestbul
alter database bulktestbul set recovery bulk_logged

create database bulktestbulLock
alter database bulktestbulLock set recovery bulk_logged


Let's see the log file size of all the databases; here every log file is 63 MB.
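Rather than checking each file in SSMS, you can query the sizes in one go. This is a sketch using sys.master_files, which stores file size in 8 KB pages:

```sql
-- Log file size (MB) for the six demo databases.
-- sys.master_files stores size in 8 KB pages.
SELECT d.name        AS database_name,
       mf.name       AS log_file,
       mf.size * 8 / 1024 AS size_mb
FROM sys.master_files mf
JOIN sys.databases d ON d.database_id = mf.database_id
WHERE mf.type_desc = 'LOG'
  AND d.name LIKE 'bulktest%'
ORDER BY d.name;
```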





       Now create a table in every database and load the data.
    
  1. Database with SIMPLE recovery model:

    use bulktestsimple
    GO

    --Creating blank table
    select a.* into DumpTable  from AdventureWorks2008.sys.objects a
    cross join AdventureWorks2008.sys.objects b
    cross join sys.tables
    where 1=2


    --Loading Data

    insert into DumpTable
    select a.*   from AdventureWorks2008.sys.objects a
    cross join AdventureWorks2008.sys.objects b
    cross join master.sys.tables



          2. Database in SIMPLE recovery model, loading data with the TABLOCK hint:

use bulktestsimpleLock
GO

--Creating blank table
select a.* into DumpTable  from AdventureWorks2008.sys.objects a
cross join AdventureWorks2008.sys.objects b
cross join sys.tables
where 1=2


--Loading Data

insert into DumpTable with (tablock)
select a.*   from AdventureWorks2008.sys.objects a
cross join AdventureWorks2008.sys.objects b
cross join master.sys.tables
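One way to see the logging difference between the two loads is to count the log records generated after each INSERT. Note that sys.fn_dblog is an undocumented, unsupported function, so treat this as a test-system-only sketch:

```sql
-- Rough measure of how much was written to the transaction log.
-- sys.fn_dblog is undocumented; use it on test systems only.
SELECT COUNT(*) AS log_record_count,
       SUM([Log Record Length]) / 1024 / 1024 AS log_mb
FROM sys.fn_dblog(NULL, NULL);
```

Run this in each database right after its load (before any log truncation) and compare the results: the minimally logged TABLOCK load should produce far fewer log records.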

     I did the same for the other 4 databases and found the following:

    Log file size after data loading:


    Time taken by the different databases for the same data load:

1. SIMPLE recovery model, without the table hint -- 59 seconds
2. SIMPLE recovery model, with the table hint -- 16 seconds
3. FULL recovery model, without the table hint -- 1.4 minutes
4. FULL recovery model, with the table hint -- 25 seconds
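The elapsed times above can be captured with a simple wrapper like this sketch, which brackets the load with SYSDATETIME() calls:

```sql
-- Sketch: measure elapsed time of the load in seconds.
DECLARE @start datetime2 = SYSDATETIME();

INSERT INTO DumpTable WITH (TABLOCK)
SELECT a.* FROM AdventureWorks2008.sys.objects a
CROSS JOIN AdventureWorks2008.sys.objects b
CROSS JOIN master.sys.tables;

SELECT DATEDIFF(second, @start, SYSDATETIME()) AS elapsed_seconds;
```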



   Conclusion: With the help of TABLOCK we can reduce logging and improve INSERT performance. This approach is only suitable for non-transactional tables.

