Inside Microsoft SQL Server 2005

Kalen Delaney

Mentioned 11

Provides information on the tuning and optimization features of SQL server 2005, covering such topics as query execution, plan caching, and concurrency problems.

More on Amazon.com

Mentioned in questions and answers.

What good resources exist for understanding database tuning on the major engines and advancing your knowledge in that area?

The idea of this question is to collect the shed load of resources that invariably exist, so that people can have a "one stop" knowledge shop of the good, peer approved resources.


General SQL

PostgreSQL (wiki) (PGsearch)

MySQL

Oracle

MS SQL Server

Sybase SQL Anywhere

JDBC

I'd start out by understanding how the database works at a fundamental level. How is data stored on disk, what does creating an index do, how does query plan optimization work, how are plans cached, when to cached plans expire.

If you can commit all that to memory, most of the advice about tuning seems obvious.

Here's a great book for MSSQL

SQL Server Internals

Xaprb is a must-read blog for MySQL DBAs. The author has written a book on high-performance MySQL

For the happy few working with Sybase SQL Anywhere I can only recommend Breck Carter's blog and his SQL Anywhere Studio 9 Developer's Guide

For Microsoft SQL, I'd recommend the books by Kalen Delaney (et al) called "Inside SQL Server". They offer a good insight into the internals of SQL Server, thus allowing readers to educate themselves on why particular statements might be faster than others.

Inside SQL Server 7.0
Inside SQL Server 2000
Inside Microsoft SQL Server 2005
Microsoft SQL Server 2008 Internals

There's also a book dedicated to performance tuning of SQL Server 2008 queries: SQL Server Performance Tuning Distilled

I also like the blogs by Paul Randal and Kimberly Tripp on SQLSkills.com. They are full of solid SQL advice:

Paul's blog
Kimberly's blog

As a ASP.NET developer with 5+ year experience. I like to measure my competency level in ASP.NET & SQL Server. Basically my goal is to raise my competency level and skill-set in ASP.NET; before that I need to know what is my level considering current ASP.NET and related technologies...

So, please provide some pointers...

  • Is there are any skill-set measuring Quiz or exam, which account experience and technology ?
  • How do you measure your or your junior developers skills or competency?

I guess I could rattle off some exams, like the MCP exams, or BrainBench, but you have to pay lots of money for those.

If you were really sold on taking an exam to gauge your competency, you could get a one of the MCP exam prep guides for ASP.NET, C#, and SQL Server and see how well you comprehend and take in that material. I'm not sure that it's the most accurate way of measuring competency though.

You can get a good qualitative evaluation of your SQL Server skills by simply reading Itzik's or Kalen's books and seeing how you comprehend them. For .NET, read Richter and critically evaluate yourself against the concepts you find in that book. Do those concepts make sense?

Probably the most valuable way to get feedback is to ask your senior developers for a frank evaluation of your skills.

If you're asking how I evaluate my junior developers, it's pretty easy once I see their code and they get a track record for a few months, but I don't believe quantitative analysis is the best way. Instead, I ask questions like:

  • Can they deliver?
  • Are they writing good code?
  • Are they taking the initiative to learn more?
  • What have they brought to the table?
  • Do they understand the software development lifecycle?
  • Do they break builds?
  • Are they good team players, or do they code in solitude?
  • Do they make suggestions?
  • Are they open to others' suggestions?
  • Do their design decisions make sense for the projects they've been on?

Ask yourself how your leaders would answer these questions about you. If you are seriously confident that they will respond positively, you will have an easier time "grading yourself".

I know how to write SQL queries and can get the results I need. But sometimes my queries are slow and I don't fully understand why.

What are some good resources for learning to write efficient queries and to optimize the queries I've already written?

I am good at database(sql) programming part but I want to move ahead into database optimization part like: where and when to indexes, how to decide which query is better than other, how to optimize database. Can you guide me some good resources or books which can lead me to this?

I know SQL but wanted to master it for use in complex SQL and try to create complex SQL statement in one query. Any book to recommend. I found this book: SQL Design Patterns by Vadim Tropashko. What do you folks suggest?

If you're talking SQL Server, the series of SQL books by Solid Quality Mentors are well worth a read.

I personally can recommend:

I think they're a great set to have on your bookcase.

weird question. i am inserting 10,000 records or so in a table and the primary key is not an Identity field. so when inserting all 10,000 if some are duplicate, is there a way to skip to next record in sql server insert and make sure the non-duplicates go in? i really dont care bout the duplicates not getting inserted.

A couple of ways spring to mind, as I don't know what mechanism you're using to do the inserting.

1) Bulk load all the records into a new empty table, then run an INSERT into the real table from that staging table where the record doesn't already exist in the main table. e.g.

INSERT MyRealTable (PKField, Field1)
SELECT x.PKField, x.Field1
FROM MyStagingTable x
    LEFT JOIN MyRealTable r ON x.PKField = r.PKField
WHERE r.PKField IS NULL

2) wrap each insert in a TRY...CATCH block to swallow the PK constraint error should one happen (if you're using SQL 2005 or later).

Assume I do not have text indexing on. Let say I have:

SELECT * FROM myTable WHERE myTable.columnA LIKE 'bobo'

Will the SQL engine go through every row and only return those matching conditions or is SQL smarter and does do some hidden native indexing?

I'm on MSSQL 2000, 2005, 2008 but if other versions of SQL have different approaches I'm all ears.

Is there a good online doc or even book that goes into how operations are carried out?

Section 3 of this book, "Query Execution" covers the main elements that you'll see when you view the execution plan, as Welbog suggested.

I have a Windows Server 2003 machine which I will be using as a Postgres database server, the machine is a Dual Core 3.0Ghz Xeon with 4 GB ECC Memory and 4 x 120GB 10K RPM SAS Drives, all stripped.

I have read that the default Postgres install is configured to run nicely on a 486 with 32MB RAM, and I have read several web pages about configuration optimizations - but was hoping for something more concrete from my Stackoverflow peeps.

Generally, its only going to serve 1 database (potentially one or two more) but the catch is that the database has 1 table in particular which is massive (hundreds of millions of records with only a few coloumn). Presently, with the default configuration, it's not slow, but I think it could potentially be even faster.

Can people please give me some guidance and recomendations for configuration settings which you would use for a server such as this.

My experience suggests that (within limits) the hardware is typically the least important factor in database performance.

Assuming that you have enough memory to keep commonly used data in cache, then your CPU speed may vary 10-50% between a top-of the line machine and a common or garden box.

However, a missing index in an important search, or a poorly written recursive trigger could easily make a difference of 1,000% or 10,000% or more in your response times.

Without knowing exactly your table structure and row counts, I think anybody would suggest that your your hardware looks amply sufficient. It is only your database structure which will kill you. :)

UPDATE:

Without knowing the specific queries and your index details, there's not much more we can do. And in general, even knowing the queries, it's often very difficult to optimize without actually installing and running the queries with realistic data sets.

Given the cost of the server, and the cost of your time, I think you need to invest thirty bucks in a book. Then install your database with test data, run the queries, and see what runs well and what runs badly. Fix, rinse, and repeat.

Both of these books are specific to SQL Server and both have high ratings:

http://www.amazon.com/Inside-Microsoft%C2%AE-SQL-Server-2005/dp/0735621969/ref=sr_1_1

http://www.amazon.com/Server-Performance-Tuning-Distilled-Second/dp/B001GAQ53E/ref=sr_1_5

Where can I find some in-depth information on tuning statistics in SQL Server 2005?

I need to really delve in to what statistics are being used in a number of different queries, how they are interacting with indexes, how/when/where to use custom statistics (over and above what the database tuning advisor recommends), when/how to update the statistics for the best performance etc. etc.

Does anyone know of any good articles/webcasts/books around this subject?

Thanks in advance,

Tom

I suggest you visit http://sqlserverpedia.com/ and look for blog/article/demo/screencast/podcast specific to your needs. SQLServerPedia is a free Premier Online SQL Server resource.

Or are you specifically asking these questions right now:

"Why is this query running slow? Is my index getting used? Why does this query run faster than this query?"

The book "Dissecting SQL Server Execution Plans" will help you on this regard. A free ebook version is available here.

;-) MarlonRibunal

This book gives fairly detailed information on performance tuning Inside Microsoft® SQL Server(TM) 2005: Query Tuning and Optimization

I have a problem with a stored procedure of a transactional table, the user have a web form to find transactions by several values.

The process is taking too long and I don't know how to set proper index.

here is my stored procedure:

CREATE PROCEDURE dbo.cg_searchTransactions
(
    @id_Ent tinyint,
    @transactionTypeID int = NULL,
    @transactionID numeric(18,0) = NULL,
    @channelID int = NULL,
    @transactionDateFrom datetime = NULL,
    @transactionDateTo datetime = NULL,
    @transactionStatusID INT = NULL,
    @documentType INT = NULL,
    @documentNumber varchar(50) = NULL,
    @userName varchar(50) = NULL,
    @accountFromNumber varchar(20) = NULL,
    @accountToNumber varchar(20) = NULL,
    @amountFrom money = NULL,
    @amountTo money = NULL,
    @correlationID varchar(30) = NULL,
    @externalReference varchar(20) = NULL,
    @externalReference2 varchar(20) = NULL,
    @PageIndex INT = 1, 
    @PageSize INT = 20
)
AS
BEGIN
    SET NOCOUNT ON


        DECLARE @QUERY VARCHAR(MAX)
        SET @QUERY = '
            WITH Trans AS (
            SELECT
                ROW_NUMBER() OVER (ORDER BY transactionID DESC) AS Row,
                T.id_Ent,
                T.transactionID,
                T.trnTypeCurrencyID,
                T.transactionDate,
                T.transactionStatusID,
                T.documentType,
                T.documentNumber,
                T.childDocumentType,
                T.childDocumentNumber,
                T.userName,
                T.accountFromNumber,
                T.accountFromType,
                T.accountFromCurrency,
                T.accountDescriptionFrom,
                T.costCenterFrom,
                T.subtotalFrom,
                T.taxamountFrom,
                T.taxamountFrom2,
                T.amountFrom,
                T.accountToNumber,
                T.accountToType,
                T.accountToCurrency,
                T.accountDescriptionTo,
                T.costCenterTo,
                T.subtotalTo,
                T.taxamountTo,
                T.taxamountTo2,
                T.amountTo,
                T.exchangeCurrency,
                T.traderAuthNumber,
                T.benefContractNumber,
                T.contractNumber,
                T.merchantID,
                T.creditCardAuthorizationNumber,
                T.comment,
                T.companyServiceCommision,
                T.usercommission,
                T.companyServiceAuthorizationNumber,
                T.customerBranchId,
                T.correlationID,
                T.transactionStartTime,
                T.transactionEndTime,
                T.enlapsedTime,
                T.serverName,
                T.externalReference,
                T.externalReference2,
                T.externalTrxType,
                T.beneficiaryName,

                C.shortName AS ChannelsShortName,
                TT.shortName AS TransactionTypesShortName,
                TS.shortName AS TransactionStatusDefShortName,
                DT.shortName AS DocumentTypesShortName,
                CDT.shortName AS ChildDocumentTypesShortName,
                AFT.shortName AS AccountTypesShortNameFrom,
                ATT.shortName AS AccountTypesShortNameTo,
                CURF.shortName AS CurrenciesShortNameFrom,
                CURT.shortName AS CurrenciesShortNameTo
            FROM
                Transactions T (NOLOCK) 

                    INNER JOIN TransactionTypesCurrencies TTC
                        ON  T.id_Ent = TTC.id_Ent
                            AND T.trnTypeCurrencyID = TTC.trnTypeCurrencyID

                        INNER JOIN Channels C
                            ON  TTC.id_Ent = C.id_Ent
                                AND TTC.channelID = C.ID

                        INNER JOIN TransactionTypes TT
                            ON  TTC.id_Ent = TT.id_Ent
                                AND TTC.transactionTypeID = TT.transactionTypeID

                    INNER JOIN TransactionStatusDef TS
                        ON  T.id_Ent = TS.ent_Ent
                            AND T.transactionStatusID = TS.ID

                    INNER JOIN DocumentTypes DT
                        ON  T.id_Ent = DT.id_Ent
                            AND T.documentType = DT.ID

                    INNER JOIN DocumentTypes CDT
                        ON  T.id_Ent = CDT.id_Ent
                            AND T.childDocumentType = CDT.ID

                    INNER JOIN AccountTypes AFT
                        ON  T.id_Ent = AFT.id_Ent
                            AND T.accountFromType = AFT.ID

                    INNER JOIN AccountTypes ATT
                        ON  T.id_Ent = ATT.id_Ent
                            AND T.accountToType = ATT.ID

                    INNER JOIN Currencies CURF
                        ON  T.id_Ent = CURF.id_Ent
                            AND T.accountFromCurrency = CURF.ID

                    INNER JOIN Currencies CURT
                        ON  T.id_Ent = CURT.id_Ent
                            AND T.accountToCurrency = CURT.ID
            WHERE 
                T.id_Ent = ' + CONVERT(VARCHAR,@id_Ent)
                IF NOT @transactionDateFrom IS NULL
                    SET @QUERY = @QUERY + ' AND T.transactionDate >= ''' + CONVERT(VARCHAR,@transactionDateFrom,121) + ''''

                IF NOT @transactionDateTo IS NULL
                    SET @QUERY = @QUERY + ' AND T.transactionDate <= ''' + CONVERT(VARCHAR,@transactionDateTo,121) + ''''

                IF NOT @transactionStatusID IS NULL
                    SET @QUERY = @QUERY + ' AND T.transactionStatusID = ' + CONVERT(VARCHAR,@transactionStatusID)

                IF NOT @documentType IS NULL
                    SET @QUERY = @QUERY + ' AND T.documentType = ' + CONVERT(VARCHAR,@documentType)

                IF NOT @userName IS NULL
                    SET @QUERY = @QUERY + ' AND T.userName = ''' + @userName + ''''

                IF NOT @documentNumber IS NULL
                    SET @QUERY = @QUERY + ' AND T.documentNumber = ''' + @documentNumber + ''''

                IF NOT @accountFromNumber IS NULL
                    SET @QUERY = @QUERY + ' AND T.accountFromNumber = ''' + @accountFromNumber + ''''

                IF NOT @accountToNumber IS NULL
                    SET @QUERY = @QUERY + ' AND T.accountToNumber = ''' + @accountToNumber + ''''

                IF NOT @amountFrom IS NULL
                    SET @QUERY = @QUERY + ' AND T.amountTo >= ' + CONVERT(VARCHAR,@amountFrom)

                IF NOT @amountTo IS NULL
                    SET @QUERY = @QUERY + ' AND T.amountTo <= ' + CONVERT(VARCHAR,@amountTo)

                IF NOT @correlationID IS NULL
                    SET @QUERY = @QUERY + ' AND T.correlationID = ''' + @correlationID + ''''

                IF NOT @externalReference IS NULL
                    SET @QUERY = @QUERY + ' AND T.externalReference = ''' + @externalReference + ''''

                IF NOT @externalReference2 IS NULL
                    SET @QUERY = @QUERY + ' AND T.externalReference2 = ''' + @externalReference2 + ''''

                IF NOT @channelID IS NULL
                    SET @QUERY = @QUERY + ' AND C.ID = ' + CONVERT(VARCHAR,@channelID)

                IF NOT @transactionTypeID IS NULL
                    SET @QUERY = @QUERY + ' AND TT.transactionTypeID = ' + CONVERT(VARCHAR,@transactionTypeID)

            SET @QUERY = @QUERY + ')'
            SET @QUERY = @QUERY + 'SELECT * FROM Trans WHERE Row BETWEEN (' + CONVERT(VARCHAR,@PageIndex) + ' - 1) * ' + CONVERT(VARCHAR,@PageSize) + ' + 1 AND ' + CONVERT(VARCHAR,@PageIndex) + '*' + CONVERT(VARCHAR,@PageSize)

            SET @QUERY = @QUERY + 'OPTION (FAST 1)'

            EXEC(@QUERY)

END

Questions which I ask myself before I create an index:

  1. Is this table(or tables) going to be read only or read write ?

ReadWrite - everytime there is an update/delete to the table, the index would be updated.it might be fast for "select" but slow for insert,update and delete. To quote MS "If you have a large number of indexes on a table, you increase the chance that the optimizer will choose a suboptimal index for a query plan."

  1. You are creating queries on the fly.What I would do is try to run trace on QA (or production DB for a specific time) and see what the users are trying to run. You can get a dump of the DB from production to your sandbox/ try tools like index tuning wizard (which can tell you what indexes are needed) , SQL DMVs etc to find where the bottleneck is. The problem need not be only with this SP, but there might be deadlocks, improper use of temp tables/ temp DB etc.

  2. If you are reasonably confident that this table is the culprit, you should also try partitioning the table horizontally

  3. When you execute any query, look at the execution plan and look for table scans- which usually means that some index is missing

  4. Read, Read and Read.

Once upon a time there was a View with many many Joins:

CREATE VIEW [dbo].[V_BIGGEST_VIEW_EVER]
AS
SELECT {many many columns}
FROM (SELECT * FROM dbo.T_CUS_TSK_TASK WHERE is_deleted=0) T
  INNER JOIN dbo.V_CUS_GRP_GROUP G ON (T.group_id = G.group_id)
  INNER JOIN dbo.T_BKK_DISCOUNT_TYPE DT ON (DT.discount_type_id=T.discount_type_id)
  INNER JOIN dbo.T_BKK_CURRENCY DC ON (T.debit_currency_id=DC.currency_id)
  INNER JOIN dbo.T_BKK_CURRENCY PC ON (T.payback_currency_id=PC.currency_id)
  INNER JOIN dbo.T_BKK_CURRENCY FC ON (T.final_debit_currency_id=FC.currency_id)
  INNER JOIN dbo.T_GLOBAL_COUNTER D1C ON (D1C.company_id=T.company_id AND
    D1C.counter_name='PROFORMA_INVOICE_COUNTER')
  INNER JOIN dbo.T_GLOBAL_COUNTER D2C ON (D2C.company_id=T.company_id AND
    D2C.counter_name='TAX_INVOICE_COUNTER')
  INNER JOIN dbo.T_GLOBAL_COUNTER D3C ON (D3C.company_id=T.company_id AND
    D3C.counter_name='INVOICE_RECEIPT_COUNTER')
  INNER JOIN dbo.T_GLOBAL_COUNTER D4C ON (D4C.company_id=T.company_id AND
    D4C.counter_name='DELIVERY_NOTE_COUNTER')
  INNER JOIN dbo.T_GLOBAL_COUNTER D5C ON (D5C.company_id=T.company_id AND
    D5C.counter_name='BILL_OF_LADING_COUNTER')
  INNER JOIN dbo.T_GLOBAL_COUNTER D6C ON (D6C.company_id=T.company_id AND
    D6C.counter_name='CREDIT_INVOICE_COUNTER')
  LEFT JOIN dbo.V_SYS_BRANCH BR ON (T.branch_id = BR.branch_id)
  LEFT JOIN dbo.T_CUS_TSK_TASKS_ARRAY AR ON (T.array_id = AR.array_id)
  LEFT JOIN dbo.T_DRIVER D ON (T.driver_id = D.driver_id)
  LEFT JOIN dbo.T_VEHICLE V ON (T.vehicle_id = V.vehicle_id)
  LEFT JOIN dbo.T_STF_INVITER I ON (T.inviter_id = I.inviter_id)
  LEFT JOIN dbo.T_STF_SUBCONTRACTOR SC1 ON (SC1.subcontractor_id = D.subcontractor_id)
  LEFT JOIN dbo.T_STF_SUBCONTRACTOR SC2 ON (SC2.subcontractor_id = T.subcontractor_id)
  LEFT JOIN dbo.T_CUS_TSK_TASK_STATUS S ON (S.task_status_id=T.task_status_id)
  LEFT JOIN dbo.V_STF_SUB_LOCATION SL1 ON (SL1.sub_location_id=T.start_sub_location_id)
  LEFT JOIN dbo.V_STF_SUB_LOCATION SL2 ON (SL2.sub_location_id=T.end_sub_location_id)
  LEFT JOIN dbo.T_STF_CUSTOMER CU ON (CU.customer_id=T.customer_id)
  LEFT JOIN dbo.T_STF_CUSTOMER_SPLITTING_CODE SP ON (SP.splitting_id=T.splitting_id)
  LEFT JOIN dbo.V_CUS_TSK_CREDIT_FOR_TASK CR ON CR.task_id=T.task_id
  LEFT JOIN dbo.T_BKK_PROFORMA_INVOICE D1 ON (T.proforma_invoice_id=D1.proforma_invoice_id)
  LEFT JOIN dbo.T_BKK_TAX_INVOICE D2 ON (T.tax_invoice_id=D2.tax_invoice_id)
  LEFT JOIN dbo.T_BKK_INVOICE_RECEIPT D3 ON (T.invoice_receipt_id=D3.invoice_receipt_id)
  LEFT JOIN dbo.T_BKK_DELIVERY_NOTE D4 ON (T.delivery_note_id=D4.delivery_note_id)
  LEFT JOIN dbo.T_BKK_BILL_OF_LADING D5 ON (T.bill_of_lading_id=D5.bill_of_lading_id)
  LEFT JOIN dbo.V_CUS_TSK_CONTAINER CONTAINER1 ON (CONTAINER1.container_id=T.container1_id)
  LEFT JOIN dbo.V_CUS_TSK_CONTAINER CONTAINER2 ON (CONTAINER2.container_id=T.container2_id)
  LEFT JOIN dbo.V_STF_TRAILER TRAILER1 ON (TRAILER1.trailer_id=T.trailer1_id)
  LEFT JOIN dbo.V_STF_TRAILER TRAILER2 ON (TRAILER2.trailer_id=T.trailer2_id)
  LEFT JOIN dbo.T_STF_LUGGAGE_TYPE LUGGAGE_TYPE ON (LUGGAGE_TYPE.luggage_type_id=T.luggage_type_id)

One day the user asked the view for a query:

SELECT {many many columns}
FROM V_BIGGEST_VIEW_EVER
WHERE {column1}=1 AND
      {column2}=2 AND
      .......
      {and so and so}
      .......
      {columnN}=N

And the lazy biggest-view-ever worked and worked and after 5 minutes (!!) and not less it returns results.

Those tables had primary keys and foreign keys.

How can I reduce the time the query executes? How can I reduce this view?

I searched on Google but I couldn't find anything that helped.

Lets consider for a minute that the query represents a valid business requirement.

Just because the view is big doesn't mean that it should perform badly. The performance of selecting from this view is driven primarily by the layout of the underlying tables. Even with a view that left joins over 20 lookup tables SQL Server should return the result in milliseconds, provided the table T_CUS_TSK_TASK is properly indexed for the query being performed.

You must approach this just as any other query optimization. Investigate what are the main IO factors (SET STATISTICS IO ON), investigate the query plan, look at the cardinality estimates, consider if statistics are correct, look at the query missing index hints and consider how can you change the table(s) schema accordingly. You starting point should be this: Designing Indexes. Even a cursory look at your (not provided in post) tables schema should reveal if, say, deleted is not the leftmost clustered index key then you certainly have a problem.

You current approach of blindly hacking at the query based on its text is completely unprofessional.

Now of course is hard to believe this query represents a valid business requirement. But, none the less, your view about query optimization and data model design ('Those tables had primary keys and foreign keys.') is primitive, to use a mild term. Read about index design, read about covering indexes, buy a book (like this: Inside Microsoft SQL Server 2005: Query Tuning and Optimization).