Archive

Archive for the ‘SQL’ Category

Deleting millions of records from a table without blowing the transaction log

January 2nd, 2009 No comments

Over the weekend I realized that deleting 80% of the records on a table with 87 million rows is not quite as easy as issuing a delete statement with a where criteria.

The main reason is that the transaction log quickly grows as it needs to keep track of all the uncommitted rows that are being deleted. In my case the database file was 60GB and the log file grew quickly from a mere 40MB to 32GB before I ran out of disk space and the database goes into recovery mode.

Browsing through blogs I narrowed my scenario to two options.

  1. Move the required records over to a temporary table and issue a truncate on the table you want to delete.
  2. Issue the deletes in batches so that the log file doesn’t fill up.

The first option seems to take the shorter time to execute but I didn’t want to go through the effort of removing and adding foreign keys  and renaming tables. So I went with the second option. Here’s the script I used for SQL Server which deletes 10,000 rows at a time and commits them.

DECLARE @continue INT
DECLARE @rowcount INT
 
SET @continue = 1
WHILE @continue = 1
BEGIN
    PRINT GETDATE()
    SET ROWCOUNT 10000
    BEGIN TRANSACTION
    DELETE FROM Transactions WHERE  TradeDate IS NULL
    SET @rowcount = @@rowcount 
    COMMIT
    PRINT GETDATE()
    IF @rowcount = 0
    BEGIN
        SET @continue = 0
    END
END

The print statements help to show you how things are moving along.

(10000 row(s) affected)

Jan  1 2009 11:54PM

Jan  1 2009 11:54PM

(10000 row(s) affected)

Jan  1 2009 11:55PM

Jan  1 2009 11:55PM

Seven hours later the script is still executing with 39 million rows deleted so far and the log file currently at 700MB.

Categories: SQL Tags:

Viewing the disk space usage (size) of all the tables in a SQL Server 2005/2008 database

December 31st, 2008 2 comments
I spent a couple of hours searching for a script that returns the size of all the tables in a database. Most of the ones that you find use sp_msforeachtable but that is just for one table and you then need to use temporary tables to hold the values.
 
This article on the Database Journal by Ananth Kumar (AKA MAK) had this handy script that simply works by copy-pasting it as is. Read the linked article for details on viewing the size by schema.
 
BEGIN TRY 
SELECT 
    (row_number() over(order by a3.name, a2.name))%2 as l1,
    a3.name AS [schemaname],
    a2.name AS [tablename],
    a1.rows as row_count,
    (a1.reserved + ISNULL(a4.reserved,0))* 8 AS reserved, 
    a1.data * 8 AS data,
    (CASE WHEN (a1.used + ISNULL(a4.used,0)) > a1.data THEN 
    (a1.used + ISNULL(a4.used,0)) - a1.data ELSE 0 END) * 8 AS index_size,
    (CASE WHEN (a1.reserved + ISNULL(a4.reserved,0)) > a1.used THEN 
    (a1.reserved + ISNULL(a4.reserved,0)) - a1.used ELSE 0 END) * 8 AS unused
FROM
 (SELECT 
  ps.object_id,
  SUM (
   CASE
    WHEN (ps.index_id < 2) THEN row_count
    ELSE 0
   END
   ) AS [rows],
  SUM (ps.reserved_page_count) AS reserved,
  SUM (
   CASE
     WHEN (ps.index_id < 2) THEN 
   (ps.in_row_data_page_count + ps.lob_used_page_count + ps.row_overflow_used_page_count)
     ELSE (ps.lob_used_page_count + ps.row_overflow_used_page_count)
   END
   ) AS data,
  SUM (ps.used_page_count) AS used
 FROM sys.dm_db_partition_stats ps
 GROUP BY ps.object_id) AS a1
LEFT OUTER JOIN 
 (SELECT 
  it.parent_id,
  SUM(ps.reserved_page_count) AS reserved,
  SUM(ps.used_page_count) AS used
  FROM sys.dm_db_partition_stats ps
  INNER JOIN sys.internal_tables it ON (it.object_id = ps.object_id)
  WHERE it.internal_type IN (202,204)
  GROUP BY it.parent_id) AS a4 ON (a4.parent_id = a1.object_id)
INNER JOIN sys.all_objects a2  ON ( a1.object_id = a2.object_id ) 
INNER JOIN sys.schemas a3 ON (a2.schema_id = a3.schema_id)
WHERE a2.type <> 'S' and a2.type <> 'IT'
ORDER BY a3.name, a2.name
END TRY 
BEGIN CATCH 
SELECT 
 -100 as l1
, 1 as schemaname 
,       ERROR_NUMBER() as tablename
,       ERROR_SEVERITY() as row_count
,       ERROR_STATE() as reserved
,       ERROR_MESSAGE() as data
,       1 as index_size
,   1 as unused 
END CATCH

 

Quick extra tip, if you export the result-set to Excel you can do additional sorts to figure out more stuff. For example the first table on this list was taking up 47GB of disk space in a 63 GB database. With the index taking more space than the actual data! But this is more of an OLAP db with just one batch update daily and optimized for retrieving data faster so I guess that’s okay.

sql-disk-space-usage

 

Categories: SQL Tags:

SQL query optimization

November 3rd, 2006 No comments

Wow. Two SQL posts on the same day.

The SQL query processing team blog is quite a gem. I wonder how I missed this.

The query processing team – query optimization & execution – providing tips, tricks, advice and answers to frequently-asked questions in a continued effort to make your queries run faster and smoother in SQL Server.

BTW: I consider this short article a must read for everyone who writes a SELECT query. The reason is that we never learn about the query optimizer in school when learning about databases.

Let me prove this to you. My best friend Shasheendra wrote this code the other day and it stumped me. Imagine this, you want to insert 10,000 rows into a temporary table.  Let’s say we can take two approaches. The first one is to simply write 10,000 inserts the second could be to insert the entries into a table with a single field with the 10,000 being comma delimited. Then parse the string and insert each row at a time.

Which do you think runs faster? No hurries, just take a look at the code and then scroll down to see the time comparison.

Approach 1:

Write 10,000 insert statements. Like this

CREATE TABLE #filetab (filename varchar(256))

CREATE CLUSTERED INDEX IX_filetab ON #filetab (filename)

INSERT INTO #filetab VALUES (‘dummy’)

INSERT INTO #filetab VALUES (‘ed7d1eda-025b-41be-b8b4-523dc9d53ece’)

INSERT INTO #filetab VALUES (‘b61abf2e-3252-423c-9ad7-cc63cb5e51bc’)

INSERT INTO #filetab VALUES (’4224b7f8-120a-447b-a8ae-63cefce78255′)

….(10,000 entries)…

INSERT INTO #filetab VALUES (’7f38bed5-be32-42c8-986f-20b3c78950ad’)

SELECT COUNT(*) FROM #filetab

DROP TABLE #filetab

GO

Approach 2:

DECLARE @filelisttab TABLE (filelist text)

CREATE TABLE #filetab (filename varchar(256))

CREATE CLUSTERED INDEX IX_filetab ON #filetab (filename)

INSERT INTO @filelisttab (filelist) VALUES (‘dummy,ed7d1eda-025b-41be-b8b4-523dc9d53ece,b61abf2e ….(10,000 comma seperated entries)…,7f38bed5-be32-42c8-986f-20b3c78950ad’)

DECLARE @i int, @l int, @tl int

SET @i = 1

SELECT @tl = DATALENGTH(filelist) FROM @filelisttab

WHILE (@i < @tl)

BEGIN

    DECLARE @filename Varchar(256)

    DECLARE @nextcommat int

    SELECT @filename = SUBSTRING(filelist, @i, 128) FROM @filelisttab – Use 128 character read buffer!

    SET @nextcommat = CHARINDEX(‘,’, @filename)

    SET @l = CASE @nextcommat WHEN 0 THEN 128 ELSE (@nextcommat-1) END

    SET @filename = SUBSTRING(@filename, 1, @l)

    IF (LEN(@filename)>0) INSERT INTO #filetab (filename) VALUES (@filename)

    SET @i = @i + @l + 1

END

SELECT COUNT(*) FROM #filetab

DROP TABLE #filetab

Query1.sql (722 KB)

Query2.sql (401 KB)

 

If you don’t believe me, you can download the two sql statements and try running them on your machine. These scripts create a temporary table, insert data and then drop the table.

Query 1 : First run 2 min 48 seconds, second run 2 min 28 seconds

Query 2:  First run 9 seconds, second run 5 seconds

And here I was thinking all these days that I knew all the basics on SQL query optimization.

Categories: SQL Tags:

Publishing your SQL Server 2005 database?

November 3rd, 2006 No comments

Ever wanted to MySQL feature that let’s you script out a database (including the data) when working with SQL Server?

Fear no more the Database Publishing Wizard is here to help.

The Database Publishing Wizard enables the deployment of SQL Server 2005 databases into a hosted environment on either a SQL Server 2000 or 2005 server. It generates a SQL script file which can be used to recreate the database in shared hosting environments where the only connectivity to a server is through a web-based control panel with a scripting window.

The Database Publishing Wizard is currently a command line tool, though future releases will add a graphical user interface.

Get the download from here. Keep in mind that it’s still a CTP and is being developed actively.

Categories: SQL Tags:

Database diagram problems on a database migrated from SQL Server 2005

April 27th, 2006 No comments

I ran into a wierd error message when trying to view the Diagrams on a database that was migrated to SQL Server 2005. Something about the user not being valid and to change the database owner.

A quick search brought up this quick fix:

Try running this command to change the compatibility level to SQL Server 2005.

EXEC sp_dbcmptlevel database_name, 90

Works like a charm! [SQL BOL]

Categories: SQL Tags:

Use scope_identity() instead of @@identity

November 2nd, 2003 No comments

TravisL explains why
we should use scope_identity instead of @@identity when we want to get the ID of the
last record that was added.

The problem with @@identity is that it returns the ID of the last record that was
in the current connection. In which case if the table that you adding to has triggers
(which inserts records in other tables) then @@identity returns the ID of the last
record that was added by code inside the trigger. This becomes a potentially very
difficult bug to track down.

So what should you do? Use scope_identity() which returns the ID of the last record
that was inserted in the current scope. See MSDN:scope_identity() for
detailed examples. 

Categories: SQL Tags: