Microsoft SQL Server 2008 Internals


Delve inside the core SQL Server engine--and put that knowledge to work--with guidance from a team of well-known internals experts. Whether database developer, architect, or administrator, you'll gain the deep knowledge you need to exploit key architectural changes--and capture the product's full potential. Discover how SQL Server works behind the scenes, including:

  • What happens internally when SQL Server builds, expands, shrinks, and moves databases
  • How to use event tracking--from triggers to the Extended Events Engine
  • Why the right indexes can drastically reduce your query execution time
  • How to transcend normal row-size limits with new storage capabilities
  • How the Query Optimizer operates
  • Multiple techniques for troubleshooting problematic query plans
  • When to force SQL Server to reuse a cached query plan--or create a new one
  • What SQL Server checks internally when running DBCC
  • How to choose among five isolation levels and two concurrency models when working with multiple concurrent users


Mentioned in questions and answers.

What good resources exist for understanding database tuning on the major engines and advancing your knowledge in that area?

The idea of this question is to collect the wealth of resources that invariably exists, so that people can have a "one stop" knowledge shop of good, peer-approved resources.


General SQL

PostgreSQL (wiki) (PGsearch)

MySQL

Oracle

MS SQL Server

Sybase SQL Anywhere

JDBC

I'd start out by understanding how the database works at a fundamental level: how data is stored on disk, what creating an index does, how query plan optimization works, how plans are cached, and when cached plans expire.

If you can commit all that to memory, most of the advice about tuning seems obvious.
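The plan-caching part, at least, boils down to memoization. Here is a toy Python sketch; the names and cache policy are invented for illustration (SQL Server's real plan cache also handles eviction, recompilation, and memory pressure):

```python
# Toy sketch of plan caching as memoization: compile a query on a
# cache miss, reuse the cached plan on a hit.
plan_cache = {}
compile_count = 0

def compile_plan(query_text):
    """Stand-in for the (expensive) optimization step."""
    global compile_count
    compile_count += 1
    return f"PLAN({query_text})"

def get_plan(query_text):
    # A cached plan is reused until it is evicted or invalidated,
    # e.g. by schema changes or statistics updates.
    if query_text not in plan_cache:
        plan_cache[query_text] = compile_plan(query_text)
    return plan_cache[query_text]

get_plan("SELECT * FROM t WHERE id = 1")
get_plan("SELECT * FROM t WHERE id = 1")   # cache hit: no recompile
```

The second call never reaches `compile_plan`, which is the whole point: for frequently reissued queries, optimization cost is paid once.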

Here's a great book for MSSQL

SQL Server Internals

Xaprb is a must-read blog for MySQL DBAs. The author has also written a book, High Performance MySQL.

For the happy few working with Sybase SQL Anywhere I can only recommend Breck Carter's blog and his SQL Anywhere Studio 9 Developer's Guide

For Microsoft SQL, I'd recommend the books by Kalen Delaney (et al) called "Inside SQL Server". They offer a good insight into the internals of SQL Server, thus allowing readers to educate themselves on why particular statements might be faster than others.

Inside SQL Server 7.0
Inside SQL Server 2000
Inside Microsoft SQL Server 2005
Microsoft SQL Server 2008 Internals

There's also a book dedicated to performance tuning of SQL Server 2008 queries: SQL Server Performance Tuning Distilled

I also like the blogs by Paul Randal and Kimberly Tripp on SQLSkills.com. They are full of solid SQL advice:

Paul's blog
Kimberly's blog

So let's say you want to learn some stuff about database internals. What's the best source code to look at? The best books to buy?

I was talking about this with a buddy the other day and he recommended:
Art of Computer Programming, Volume 3: Sorting and Searching

What other books would help me learn about all the file I/O and memory issues, pages, locking, etc.?

A colleague and I got a great deal of information out of Database in Depth: Relational Theory for Practitioners. Very low-level stuff, but it sounds like that is the sort of thing you are looking for.

Take a look at Database Systems: The Complete Book by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. It is specifically about the internals of a DBMS.

The answer by SquareCog also contains sensible suggestions; I've not looked at the two books mentioned (though the Stonebraker "Architecture" book is only 136 pages according to Amazon, which seems a tad lightweight).

Textbook: Database Management Systems by Ramakrishnan and Gehrke.

Or: Architecture of a Database System by Hellerstein, Stonebraker, and Hamilton.

Production Code: PostgreSQL

(I like the PG code better than SQLite's; it's far more complete and, I think, better organized. SQLite is awesome for what it does, but there is a lot it doesn't take on.)

Extra Credit: Readings in Database Systems, 4th edition edited by Hellerstein.

Not everybody likes his style, but I find that Joe Celko does a fine job of explaining the set-based logic that drives SQL databases. If you already have a little SQL experience under your belt, you should read SQL for Smarties.

In depth information about internals is database specific, here's a source on SQL Server 2008: http://www.amazon.com/Microsoft%C2%AE-SQL-Server%C2%AE-2008-Internals/dp/0735626243

I've always wondered this but never had the chance to research it.

What sort of algorithms are used in SQL? Naturally you search and compare for basic statements, but which search algorithms? Which sorting algorithms? And what about other operations like joins?

I know there is no clear-cut answer to this question, but the idea is to get a general sense of the sorts of methods SQL uses to carry out its work.

I would suggest you get a copy of SQL Server 2008 Internals by Delaney, Randal, Tripp and more. Excellent book on the internal workings of SQL Server.

http://www.amazon.com/Microsoft-SQL-Server-2008-Internals/dp/0735626243/ref=sr_1_1?s=books&ie=UTF8&qid=1289565465&sr=1-1

An RDBMS is composed of several elements:

  • Transaction manager -- manages transactions, as is evident from its name :)
  • Physical storage manager -- manages how data is stored in the underlying file system(s)
  • Query parser/planner/executor -- the 'user front end' of the database

Each of these elements is essential to any RDBMS, and each uses a different set of algorithms to do its work.

If you are interested in the internals of an RDBMS, get yourself this book: http://www.amazon.com/Database-Systems-Complete-Book-2nd/dp/0131873253/

I was looking at this page on MSDN:

Maximum Capacity Specifications for SQL Server 2008

And it says the following:

Max Columns per 'nonwide' table: 1,024
Max Columns per 'wide' table: 30,000

However I cannot find any information on the difference between 'wide' and 'nonwide' tables in SQL 2008. If I wanted to define a 'wide' table, how would I do it?

It is important to note that your total fixed- and variable-length data is still limited to 8,060 bytes per row. This crazy extra-large number of columns is only supported with sparse columns, where MOST of the data is NULL. Otherwise you still end up with rows that exceed the 8,060-byte limit, with row data that won't fit or that overflows into row-overflow storage (which is much more expensive to maintain than normal data pages).

There is a really good book by Kalen Delaney that covers a ton of internal features and limits of SQL Server, entitled SQL Server 2008 Internals. If you are really into the low-level limits and how things are done in SQL Server, it is a fantastic read. It will deepen your knowledge of how SQL Server does what it does under the hood, in some cases down to the bytes on disk.

I worked on Oracle a long time back and have exposure to SQL. Now I need to brush up my SQL skills for a project on SQL Server 2008. Could you please recommend a good resource (online/book)?

We are telling our client to put a SQL Server database file (mdf) on a different physical drive than the transaction log file (ldf). The tech company (hired by our client) wanted to put the transaction log on a slower (i.e. cheaper) drive than the database drive, because with transaction logs you are just sequentially writing to the log file.

I told them that I thought the log drive (actually a RAID configuration) needed to be fast as well, because every data-changing call to the database needs to be saved there as well as to the database itself.

After saying that though, I realized I was not entirely sure about that. Does the speed of the transaction log drive make a significant difference in performance... if the drive with the database is fast?

In simplistic terms, if you are talking about an OLTP database, your throughput is determined by the speed of your writes to the transaction log. Once this performance ceiling is hit, all other dependent actions must wait for the commit to the log to complete.

This is a VERY simplistic take on the internals of the Transaction Log, to which entire books are dedicated, but the rudimentary point remains.

Now if the storage system you are working with can provide the IOPS that you require to support both your Transaction Log and Database data files together then a shared drive/LUN would provide adequately for your needs.

To provide you with a specific recommended course of action I would need to know more about your database workload and the performance you require your database server to deliver.

Get your hands on the title SQL Server 2008 Internals for a thorough look into the internals of the SQL Server transaction log. It's one of the best SQL Server titles out there, and it will pay for itself in minutes from the value you gain from reading it.

I have a column EntityName, and I want users to be able to search names by entering words separated by spaces. The space is implicitly treated as an 'AND' operator, meaning that the returned rows must contain all of the specified words, though not necessarily in the given order.

For example, if we have rows like these:

  1. abba nina pretty balerina
  2. acdc you shook me all night long
  3. sth you are me
  4. dream theater it's all about you

when the user enters me you, or you me (the results must be equivalent), the result contains rows 2 and 3.

I know I can go like:

WHERE Col1 LIKE '%' + word1 + '%'
  AND Col1 LIKE '%' + word2 + '%'

but I wanted to know if there's some more optimal solution.
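For reference, the matching rule I want is just AND-over-substrings; a quick Python sketch of the semantics (illustrative only, not how I intend to implement it):

```python
def matches_all_words(text, search):
    """True if every space-separated word in `search` occurs as a
    substring of `text`, in any order -- the AND-of-LIKE semantics."""
    return all(word in text for word in search.split())

rows = [
    "abba nina pretty balerina",
    "acdc you shook me all night long",
    "sth you are me",
    "dream theater it's all about you",
]

# 'me you' and 'you me' are equivalent; both match rows 2 and 3.
hits = [i + 1 for i, row in enumerate(rows) if matches_all_words(row, "me you")]
```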

CONTAINS would require a full-text index, which (for various reasons) is not an option.

Maybe SQL 2008 has some built-in, semi-hidden solution for these cases?

You're going to end up with a full table scan anyway.

The collation can make a big difference apparently. Kalen Delaney in the book "Microsoft SQL Server 2008 Internals" says:

Collation can make a huge difference when SQL Server has to look at almost all characters in the strings. For instance, look at the following:

SELECT COUNT(*) FROM tbl WHERE longcol LIKE '%abc%'

This may execute 10 times faster or more with a binary collation than a nonbinary Windows collation. And with varchar data, this executes up to seven or eight times faster with a SQL collation than with a Windows collation.
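A rough Python analogy of why that is (casefold here only approximates the linguistic rules of a Windows collation): a binary collation compares raw code points directly, while a nonbinary collation must apply casing/weighting rules to every character it examines before comparing.

```python
def contains_binary(haystack, needle):
    # Binary collation: raw ordinal comparison, no linguistic rules.
    return needle in haystack

def contains_ci(haystack, needle):
    # Crude stand-in for a case-insensitive Windows collation: every
    # character gets case-folded before comparison -- that per-character
    # linguistic work is the extra cost Delaney describes.
    return needle.casefold() in haystack.casefold()

# The two collations can also disagree on what matches:
binary_hit = contains_binary("xxABCxx", "abc")   # False: 'A' != 'a' bytewise
ci_hit = contains_ci("xxABCxx", "abc")           # True under folding
```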

Does anyone know of a resource that will tell me the sequence of locks that will be taken out on a table/page/row/index during a select/insert/update/delete in SQL Server 2005 AND how different table hints and isolation levels will impact the locks taken?

I know I am asking a lot here, but surely this information must be documented somewhere?

Thanks in advance,

Tom

SQL Server locking is based on the concepts in Transaction Processing: Concepts and Techniques. This book explains in great detail how locks are to be acquired, what locks are needed and why things must be the way they are.

The resources Marc linked are good coverage on the topic, but the details are scattered and you need to know where to look. Here is a primer to start you up:

The transaction isolation levels only affect read locks. Under the default read committed level, reading a row acquires an S lock that is released immediately after the read. If the isolation level is raised to repeatable read, the S locks are held until the transaction ends. At the highest level, serializable, range locks are placed instead of simple row locks, and they are held until the transaction commits. The snapshot modes are different in that they don't affect the type of lock so much as the source of the read: rows are retrieved from the version store instead.

Lock order/hierarchy is always the same:

  • An Sch-S lock is placed on the metadata at the start of any DML operation. DDL operations require Sch-M locks and thus conflict, so DML can be assured of the 'stability' of the schema it operates on (object schema, not database schema...).
  • The lock hierarchy path to a row is table-page-row. The actual granularity chosen by the engine is dynamic; typically it is row.
  • Whatever the granularity, the path to the actual locked resource is protected with intent locks. I.e., to S-lock a row, the reader must acquire an IS lock on the table and on the page; to S-lock a page, it needs an IS lock on the table.
  • Single-partition operations acquiring more than 5,000 locks in a scan may trigger lock escalation. Escalation is always just an attempt (i.e., it will never block if it fails). In practice, escalation always goes from row-level locking to table-level (partition-level in 2008) locking.
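The hierarchy and escalation bullets above can be sketched as follows; the function names and data structures are illustrative, not SQL Server's implementation, though the 5,000-lock threshold is the documented trigger point:

```python
# Sketch of the intent-lock path: to S-lock a row, a reader first takes
# IS on the table and on the page that contain it.
ESCALATION_THRESHOLD = 5000  # row locks on one scan that may trigger escalation

def lock_row(acquired, table, page, row):
    acquired.append((table, "IS"))
    acquired.append((page, "IS"))
    acquired.append((row, "S"))

def maybe_escalate(acquired, table):
    row_locks = [lock for lock in acquired if lock[1] == "S"]
    if len(row_locks) > ESCALATION_THRESHOLD:
        # Escalation replaces the fine-grained locks with a single
        # table-level S lock, releasing the row and intent locks.
        acquired[:] = [(table, "S")]
    return acquired

locks = []
for r in range(6000):                       # a scan touching 6,000 rows
    lock_row(locks, "tbl", f"page{r // 100}", f"row{r}")
escalated = maybe_escalate(locks, "tbl")    # collapses to one table lock
```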

The lock hints can never change the order of locks; they can only change:

  • the type of lock (a U or X lock where an S lock would be acquired)
  • the granularity (force table, page, or row locks)
  • the duration (hold S locks)
  • the blocking behavior (READPAST to skip incompatible rows)

I did not say much about inserts/updates/deletes since they are quite uninteresting: they require X locks, that's it. The only interesting bit is how an update works: it first acquires a U lock that is later converted to an X lock. This behavior leverages the U-lock asymmetry, which allows pending S locks to drain before the update proceeds.
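That U-lock asymmetry can be written down as a small compatibility table. This sketch follows the asymmetric convention from Transaction Processing: Concepts and Techniques (cited above): a U request is granted while S locks are held, but once U is held, new S requests block, so existing readers drain and no new ones pile up.

```python
# Simplified lock compatibility (subset: S, U, X), keyed as
# (held, requested) -> can the request be granted concurrently?
COMPATIBLE = {
    ("S", "S"): True,  ("S", "U"): True,  ("S", "X"): False,
    ("U", "S"): False, ("U", "U"): False, ("U", "X"): False,
    ("X", "S"): False, ("X", "U"): False, ("X", "X"): False,
}

def compatible(held, requested):
    return COMPATIBLE[(held, requested)]

# The asymmetry: U can be granted under S, but S cannot under U,
# and two would-be updaters (U, U) conflict -- only one waits in U.
```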

With this I hope you can go and find all the details left out from the articles and books linked.

How about these:

UPDATE: and here are some more, on transaction isolation levels and query hints:

If you're interested in these rather advanced topics, I'd strongly recommend you get the SQL Server 2008 Internals book by Kalen Delaney (and others), which has all these nitty-gritty details in it -- even in this book, the "locking" topic only begins on page 610 :-)


Marc

I'm looking for a high-level, algorithmic understanding so that I can get a Big-O sense of what SQL Server is doing to perform joins. Feel free to be concise; I'm not looking for the extremely nitty-gritty. The thing that prompted me to want to understand how joins are implemented is the scenario behind this question, which I also just posted. I felt they were ultimately two separate questions, though, which is why I didn't combine them.

Thanks!

Honestly if you are interested at that level of detail I would suggest you read: http://www.amazon.com/Microsoft-SQL-Server-2008-Internals/dp/0735626243/ref=sr_1_1?ie=UTF8&qid=1297976127&sr=8-1#_

And learn to read execution plans. SQL Server has a pretty good optimization engine, but it doesn't always do things the way we humans would expect, or even the same way for two queries that appear to us to be similar.
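For the Big-O flavor the question asks about, here is a minimal sketch of two common physical join operators: nested loops is O(n*m), while hash join is roughly O(n+m) when the build side's hash table fits in memory. (Merge join, the third operator, is O(n+m) over pre-sorted inputs.) The table shapes are made up for illustration.

```python
def nested_loops_join(outer, inner, key_o, key_i):
    """O(len(outer) * len(inner)): compare every pair of rows."""
    return [(o, i) for o in outer for i in inner if o[key_o] == i[key_i]]

def hash_join(build, probe, key_b, key_p):
    """Roughly O(len(build) + len(probe)): hash the smaller (build)
    side once, then probe it with each row of the other side."""
    table = {}
    for b in build:
        table.setdefault(b[key_b], []).append(b)
    return [(b, p) for p in probe for b in table.get(p[key_p], [])]

custs = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [{"cust": 1, "amt": 10}, {"cust": 2, "amt": 20}]
# Both operators compute the same logical join result.
nl = nested_loops_join(custs, orders, "id", "cust")
hj = hash_join(custs, orders, "id", "cust")
```

Which operator the optimizer picks depends on row counts, indexes, and sort order, which is exactly what reading the execution plan tells you.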

I've been developing and administering Oracle database apps for many years, and would like to learn SQL Server. Does anyone have any book recommendations (preferably electronic)? Naturally I already know SQL inside out, so I'd like to avoid any intro stuff and go straight to the meat, while not skipping any basic SQL Server things I'd learn from an intro book. Any suggestions?

tia

I would definitely recommend:

If you have more of a development focus:

There are also some very good, free electronic books from Redgate:

to list a few.

I am trying to understand how SQL Server allocates and reserves space.

Having run the examples from the article "How table design can impact your SQL Server performance?" [1], I received results [My 1.1] that differ from those in the article [1.1]. Why?

Why is the excess space reserved/allocated in one case (all cases in [1]) but not in another [My 1.1]?
(Note that in [1] the excess space is reserved in both cases, though on my computer only in one of them.)

How is space allocated, reserved by SQL Server?
And how can I control/manage it?

[1] ======
How table design can impact your SQL Server performance?
http://sqlserver-training.com/how-table-design-can-impact-your-sql-server-performance

[My 1.1]
My results below, which differ from [1]

name                               rows  reserved  data     index_size  unused
---------------------------------- ----- --------- -------- ----------- ---------
Fixed_Lenght_Row_Table_Optimised   10000 40008 KB  40000 KB 8 KB        0 KB

[1.1] Results in [1] that differ from mine above

name                               rows  reserved  data     index_size  unused
---------------------------------- ----- --------- -------- ----------- ---------
Fixed_Lenght_Row_Table_Optimised   10000 40072 KB  40000 KB 8 KB        64 KB

[1.2]
Results from [1] that coincide with mine

name                                 rows  reserved  data     index_size  unused
----------------------------------   ----- --------- -------- ----------- ---------
Fixed_Lenght_Row_Table_Non_Optimised 10000 80072 KB  80000 KB 8 KB        64 KB

The article in the link is dubious at best. It does not specify which SQL Server version to use, nor which options are enabled/disabled. Since SQL Server 2005 there is row-overflow storage for columns spilling over the 8K limit (see Table and Index Organization), there are in-row vs. out-of-row defaults (see sp_tableoption), and there are many compression options (row-level, page-level, Unicode).

For accurate information I would stick to the official product documentation, starting from Planning and Architecture (Database Engine). For a more digestible read, buy one of the well established books, like Microsoft SQL Server 2008 Internals or Inside Microsoft SQL Server 2005: The Storage Engine.

I'm doing some reading on SQL Server performance:

http://www.amazon.com/Inside-Microsoft-SQL-Server-2005/dp/0735623139/ref=sr_1_6?ie=UTF8&s=books&qid=1267032068&sr=8-6

One of the surprising things I came across was how it processes the FROM phase in its logical processing. From what I understand, SQL Server does the following:

1) For the first two tables, it will create a virtual table (VT1) consisting of a Cartesian join of the two tables

2) For every additional table, it will create a Cartesian join of VT1 and the additional table, with the result becoming VT1

I'm sure there is a lot more to it under the covers, but at face value this seems like it would involve a huge amount of processing/memory if you're dealing with big tables (and big queries).

I was just wondering whether anyone had a quick explanation of how SQL Server is able to do this in any sort of realistic time/space frame?

The Cartesian join is just a description of the result, not an actual result. After the full Cartesian join of tables A, B, C...X, the filter operators are applied (still as a definition): things like the ON clauses of the joins and the WHERE clauses of the query. In the end this definition is transformed into an execution plan, which contains physical operators like Nested Loops, Hash Join, or Merge Join, and these operators, when iterated, produce the results requested by the query definition.

So the big 100x100x100x100... Cartesian cube is never materialized; it is just a definition.
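The "definition, not materialization" point can be mimicked in Python with generators (a loose analogy, not SQL Server's actual machinery): the Cartesian product below is never built in memory, and the filter is applied as pairs stream through, much as a plan's physical operators pull rows one at a time.

```python
from itertools import product

def lazy_join(a, b, predicate):
    # Conceptually a Cartesian product followed by a filter, but
    # `product` yields one pair at a time -- the full A x B cube
    # is never materialized, just like the logical definition.
    return (pair for pair in product(a, b) if predicate(*pair))

a = range(1000)
b = range(1000)
# 1,000,000 logical pairs, but only the matching ones are ever produced.
matches = list(lazy_join(a, b, lambda x, y: x == y and x < 3))
```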

If you are really interested in how SQL Server does what it does, please read this book: http://www.amazon.com/Microsoft-SQL-Server-2008-Internals/dp/0735626243/ref=sr_1_1?ie=UTF8&s=books&qid=1267033666&sr=8-1

When we create a clustered index, say CIX_FirstNames on a column FirstNames, what actually happens internally in SQL Server?

I have read that clustered indexes create a copy of the data.

So, does SQL Server create a new index table, IndexTable, copy all the FirstNames from the table into IndexTable, and then, when a first name is searched, return it from the index table?

Is this how clustered indexes actually work?

This is way too big a topic to handle here in a quick answer - you need to get yourself:

Microsoft SQL Server 2008 Internals


which covers this all in great detail.

But just briefly: NO! Creating a clustered index does NOT duplicate the data! Where did you get that from?

What a clustered index does is reorder the data (according to the clustering key); the leaf-level nodes of the clustered index are the data pages themselves, and those exist only once.
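A rough Python analogy (the data layout is invented for illustration): a heap is an unordered list you must scan end to end, while a clustered index keeps the same rows physically sorted by the key, so a lookup can binary-search. Either way there is still only one copy of the rows.

```python
import bisect

rows = [("smith", 3), ("adams", 1), ("jones", 2)]   # heap: insertion order

# 'Creating a clustered index' on the name column: the rows themselves
# are reordered by the key -- no second copy of the data is made.
clustered = sorted(rows, key=lambda r: r[0])
keys = [r[0] for r in clustered]

def seek(name):
    """O(log n) seek through the sorted rows, vs. an O(n) heap scan."""
    i = bisect.bisect_left(keys, name)
    return clustered[i] if i < len(keys) and keys[i] == name else None
```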

Some more resources of interest:

I want to discuss the transaction log of SQL Server. I searched around and found some finished products, for example Lumigent Log Explorer, but I am still interested in how it works.

Methods I know of:

  1. Reading directly from the physical log file
  2. Using database commands/queries: DBCC LOG, or the fn_dblog function

Problems/difficulties:

  1. The log file structure is hard to reverse-engineer.
  2. When I do lots of INSERTs, fn_dblog doesn't show all of them. For example, when I INSERT 50,000 records, fn_dblog shows only 29,616 LOP_INSERT_ROWS records; does that mean 20,384 records were truncated? I don't know the internal logic of fn_dblog; can someone explain it? Does fn_dblog have limitations?

I'd be glad to hear about any research into the SQL Server transaction log.

The SQL Server transaction log isn't meant to be "human readable". It's meant to support SQL Server by enabling transactions, read consistency, etc.

SUGGESTION: If you really want to understand SQL Server internals (including how the transaction log works), I strongly encourage you to get a copy of this book:

SQL Server 2008 Internals, Kalen Delaney

It's an excellent book; you will learn a LOT of practical and important information. Satisfaction guaranteed!