Inside Microsoft SQL Server 2008

Lubor Kollar, Dejan Sarka

Mentioned 7

Provides information on the architecture of the T-SQL programming language to create scalable code.

More on Amazon.com

Mentioned in questions and answers.

I have a table with this structure.

UserID  | UserName  | AnswerToQuestion1 | AnswerToQuestion2 | AnswerToQuestion3
1       | John      | 1                 | 0                 | 1
2       | Mary      | 1                 | 1                 | 0

I can't figure out what SQL query I would use to get a result set like this:

UserID  | UserName  | QuestionName      | Response
1       | John      | AnswerToQuestion1 | 1
1       | John      | AnswerToQuestion2 | 0
1       | John      | AnswerToQuestion3 | 1
2       | Mary      | AnswerToQuestion1 | 1
2       | Mary      | AnswerToQuestion2 | 1
2       | Mary      | AnswerToQuestion3 | 0

I'm trying to split the three columns into three separate rows. Is this possible?

According to Itzik Ben-Gan in Inside Microsoft SQL Server 2008: T-SQL Querying, SQL Server goes through three steps when unpivoting a table:

  1. Generate copies
  2. Extract elements
  3. Remove rows with NULLs

Step 1: Generate copies

A virtual table is created that has a copy of each row in the orignal table for each column that is being unpivoted. Also, a character string of the column name is stored in a new column (call this the QuestionName column). *Note: I modified the value in one of your columns to NULL to show the full process.

UserID  UserName  AnswerTo1 AnswerToQ2 AnswerToQ3 QuestionName
1       John      1         0          1          AnswerToQuestion1
1       John      1         0          1          AnswerToQuestion2
1       John      1         0          1          AnswerToQuestion3
2       Mary      1         NULL       1          AnswerToQuestion1
2       Mary      1         NULL       1          AnswerToQuestion2
2       Mary      1         NULL       1          AnswerToQuestion3

Step 2: Extract elements

Then another table is created that creates a new row for each value from the source column which corresponds to the character string value in the QuestionName column. The value is stored in a new column (call this the Response column).

UserID  UserName  QuestionName        Response
1       John      AnswerToQuestion1   1
1       John      AnswerToQuestion2   0
1       John      AnswerToQuestion3   1
2       Mary      AnswerToQuestion1   1
2       Mary      AnswerToQuestion2   NULL
2       Mary      AnswerToQuestion3   1

Step 3: Remove rows with NULLS

This step filters out any rows that were created with null values in the Response column. In other words, if any of the AnswerToQuestion columns had a null value, it would not be represented as an unpivoted row.

UserID  UserName  QuestionName        Response
1       John      AnswerToQuestion1   1
1       John      AnswerToQuestion2   0
1       John      AnswerToQuestion3   1
2       Mary      AnswerToQuestion1   1
2       Mary      AnswerToQuestion3   1

If you follow those steps, you can

  1. CROSS JOIN all rows in the table against each AnswerToQuestion column name to get row copies
  2. Populate the Response column based on the matching the source column and QuestionName
  3. Remove the NULLs to get the same results without using UNPIVOT.

An example below:

DECLARE @t1 TABLE (UserID INT, UserName VARCHAR(10), AnswerToQuestion1 INT, 
  AnswertoQuestion2 INT, AnswerToQuestion3 INT
) 

INSERT @t1 SELECT 1, 'John', 1, 0, 1 UNION ALL SELECT 2, 'Mary', 1, NULL, 1 

SELECT
  UserID,
  UserName,
  QuestionName,
  Response
FROM (
  SELECT
    UserID,
    UserName,
    QuestionName,
    CASE QuestionName
      WHEN 'AnswerToQuestion1' THEN AnswerToQuestion1
      WHEN 'AnswerToQuestion2' THEN AnswertoQuestion2
      ELSE AnswerToQuestion3 
    END AS Response 
  FROM @t1 t1
      CROSS JOIN (
        SELECT 'AnswerToQuestion1' AS QuestionName
        UNION ALL SELECT 'AnswerToQuestion2'
        UNION ALL SELECT 'AnswerToQuestion3'
      ) t2
    ) t3
WHERE Response IS NOT NULL

I had been working on Oracle long time back and has exposure to SQL. Now I need to brush up my SQL skills for a project on SQL Server 2008. Could you please recommend a good resource (online/book)?

"T-SQL Querying" book (http://www.amazon.com/Inside-Microsoft-Querying-Developer-Reference/dp/0735626030) has an interesting example, where, querying a table under default transaction isolation level during clustered index key column update, you may miss a row or read a row twice. It looks to be acceptable, since updating table/entity key is not a good idea anyway. However, I've updated this example so that the same happens, when you update non-clustered index key column value.

Following is the table structure:

SET NOCOUNT ON;
USE master;
IF DB_ID('TestIndexColUpdate') IS NULL CREATE DATABASE TestIndexColUpdate;
GO
USE TestIndexColUpdate;
GO

IF OBJECT_ID('dbo.Employees', 'U') IS NOT NULL DROP TABLE dbo.Employees;
CREATE TABLE dbo.Employees
(
empid CHAR(900) NOT NULL, -- this column should be big enough, so that 9 rows fit on 2 index pages
salary MONEY NOT NULL,
filler CHAR(1) NOT NULL DEFAULT('a')
);
CREATE INDEX idx_salary ON dbo.Employees(salary) include (empid); -- include empid into index, so that test query reads from it
ALTER TABLE dbo.Employees ADD CONSTRAINT PK_Employees PRIMARY KEY NONCLUSTERED(empid);

INSERT INTO dbo.Employees(empid, salary) VALUES
('A', 1500.00),('B', 2000.00),('C', 3000.00),('D', 4000.00),
('E', 5000.00),('F', 6000.00),('G', 7000.00),('H', 8000.00),
('I', 9000.00);

This is what needs to be done in the first connection (on each update, the row will jump between 2 index pages):

SET NOCOUNT ON;
USE TestIndexColUpdate;

WHILE 1=1
BEGIN
    UPDATE dbo.Employees SET salary = 10800.00 - salary WHERE empid = 'I'; -- on each update, "I" employee jumps between 2 pages
END

This is what needs to be done in the second connection:

SET NOCOUNT ON;
USE TestIndexColUpdate;

DECLARE @c INT
WHILE 1 = 1
BEGIN
    SELECT salary, empid FROM dbo.Employees
    if @@ROWCOUNT <> 9 BREAK;
END

Normally, this query should return 9 records we inserted in the first code sample. However, very soon, I see 8 records being returned. This query reads all it's data from the "idx_salary" index, which is being updated by previous sample code. This seems to be quite lax attitude towards data consistency from SQL Server. I would expect some locking coordination, when data is being read from index, while its key column is being updated.

Do I interpret this behavior correctly? Does this mean, that even non-clustered index keys should not be updated?

UPDATE: To solve this problem, you only need to enable "snapshots" on the db (READ_COMMITTED_SNAPSHOT ON). No more deadlocking or missing rows. I've tried summarize all of this here: http://blog.konstantins.net/2015/01/missing-rows-after-updating-sql-server.html

UPDATE 2: This seems to be the very same problem, as in this good old article: http://blog.codinghorror.com/deadlocked/

Do I interpret this behavior correctly?

Yes.

Does this mean, that even non-clustered index keys should not be updated?

No. You should use a proper isolation level or make the application tolerate the inconsistencies that READ COMMITTED allows.

This issue of missing rows is not limited to clustered indexes. It is caused by moving a row in a b-tree. Clustered and nonclustered indexes are implemented as b-trees with only tiny physical differences between them.

So you are seeing the exact same physical phenomenon. It applies every time your query reads a range of rows from a b-tree. The contents of that range can move around.

Use an isolation level that provides you the guarantees that you need. For read-only transactions the snapshot isolation level is usually a very elegant and total solution to concurrency. It seems to apply to your case.

This seems to be quite lax attitude towards data consistency from SQL Server. I would expect some locking coordination, when data is being read from index, while its key column is being updated.

This is an understandable request. On the other hand you specifically requested a low level of isolation. You can dial all the way up to SERIALIZABLE of you want. SERIALIZABLE presents you as-if serial execution.

Missing rows are just one special case of the many effects that READ COMMITTED allows. It makes no sense to specifically prevent them while allowing all kinds of other inconsistencies.


SET NOCOUNT ON;
USE TestIndexColUpdate;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED

DECLARE @c INT
WHILE 1 = 1
BEGIN
    DECLARE @count INT
    SELECT @count = COUNT(*) FROM dbo.Employees WITH (INDEX (idx_salary))
    WHERE empid > '' AND CONVERT(NVARCHAR(MAX), empid) > '__'
        AND salary > 0
    if @count <> 9 BREAK;
END

I have a table that currently is a long list of devices and information about when they were sold. I need to take the table which would look something like this:

Item   |  Time Sold
--------------------
  A        05/2010
  B        04/2010
  C        03/2010
  A        04/2010
  A        05/2010

And then have a table with the item as the first column, with the count of the dates being the column headers, like below:

Item   |   Count_03/2010   |  Count_04/2010  |  Count_05/2010
-------------------------------------------------------------
  A    |         0         |         1       |        2
  B    |         0         |         1       |        0
  C    |         1         |         0       |        0

Is there a simple way of doing this? I know in other languages there is a single command to do it, was wondering if SQL had one.

EDIT

My problem is that there is more than one table, and in some tables the months may be different than in other tables. Is there anyway to write a script that will apply to them all, by getting the variables listed and then using those to create the columns in the code? I could write one if I knew the months would always be the same, but since they are variable, is there a way of doing this.

SQL Server 2005 and up has something called pivot, but it is still not one command and you need to know the values in the Time sold column. You can use a dynamic pivot approach as demonstrated by Itzik Ben-Gan in his Inside Microsoft SQL Server 2008: T-SQL Querying book

Example

create table #test (Item char(1),  TimeSold varchar(20))

  insert #test values('A','05/2010')
  insert #test values('B','04/2010')
  insert #test values('C','03/2010')
  insert #test values('A','04/2010')
  insert #test values('A','05/2010')

  SELECT *
FROM
(SELECT Item,TimeSold
FROM #test) AS pivTemp
PIVOT
(   COUNT(TimeSold)
    FOR TimeSold IN ([05/2010],[04/2010],[03/2010])
) AS pivTable

I'm a developer with over 10 years experience in vb5/6, C#, VB.NET, 3 years of WCF too, design and architecture. I consider myself to be pretty good and can usually solve anything given time (like most good developers). I worked for a company for 8 years and during that time used a database called IBM UniVerse, it's an old Pick style database (multivalue), but it's ok. My trouble is, I've used a multivalue database for such a long time of my career I haven't got a lot of major advanced skills in SQL Server. I know enough about ANSI SQL to get by, but usually consult the net for syntax here and there, whilst in the 8 year job we automated SQL commands so I didn't even write SQL for at least a few years. Now I've left and I'm looking for another job and most of them say not only do you need to be an absolute expert at C#/VB.NET (which is fine) but you also need to be an expert at SQL Server and have the ability to write complex stored procs in T-SQL and performance tune. Now, I could probably get by just fine and learn as I go as I generally do anyway, but when it comes to write T-SQL on a stupid piece of paper in front of 4 people in an interview I don't do too well. I do like SQL Server as it has a management GUI unlike IBM UniVerse with it's DOS like command prompt! So using the management studio is fairly easy, setting indexes, creating tables, creating relationships etc.

So my question is, how to I go about getting up to speed very quickly in all things SQL Server? Most jobs now seems to be "Software Developer / DBA". In my opinion they are separate jobs, but oh well, can't do anything about that.

Buy a good book or two, and create a sample project to test your new skills.

I'd strongly suggest, in order:

I work with SQL almost on a daily basis and want to improve my SQL skills. I am looking for online resources or books for advanced queries for real life problems. I liked Joe Celko's books and I am looking for similar resources. I am also interested in SQL puzzles but ones which solve real life business apps. I am not into brain teasers, chess, games, cards and the like of SQL problems which look overly complicated or too obscure to be useful.

I want to learn to use SQL with fewer temp tables and more sophisticated joins, subqueries.. etc.

I checked out the resources from a similar SO question but I didn't find them very useful and was hoping there are others. Celko's books mention different approaches to the same problem and I find this pretty informative and eye opening. I want more! http://stackoverflow.com/questions/550545/any-recommended-websites-for-sql-puzzles

I have found this book to be a life-saver. This is targeted not only for beginners, but also advanced database professionals. I hope this helps you as well!

http://www.amazon.com/Murachs-Server-2008-Developers-Murach/dp/1890774510

This question contains complete learning curve for Cpp from Beginner,*Intermediate* and advance which is very helpful.

Please suggest me same for Databases,to become a good Database Expert or a Good DBA. As there are may books and tutorials which to choose.

I found following books written by Itzik Ben-Gan very helpful:

Beginning:

  1. Microsoft SQL Server 2012 T-SQL Fundamentals http://www.amazon.com/Microsoft-Server-2012-T-SQL-Fundamentals/dp/0735658145

Advanced:

  1. Training Kit (Exam 70-461): Querying Microsoft SQL Server 2012 (MSPress, December 2012) http://www.amazon.com/Training-Kit-Exam-70-461-Microsoft/dp/0735666059

  2. Inside Microsoft SQL Server 2008: T-SQL Programming (MSPress, 2009) http://www.amazon.com/Inside-Microsoft%C2%AE-Server%C2%AE-2008-Pro-Developer/dp/0735626022/ref=pd_sim_b_1?ie=UTF8&refRID=1ZFP1YGEB2QPZBZCBFZS

I'd say: learn the internals

http://www.amazon.com/Microsoft-SQL-Server-2012-Internals/dp/0735658560

From there you have a solid knowledge of querytuning, how the optmizer works and solid knowledge of how SQl Server works. Something every dba should know.

And the books of Iztik Ben Gan: http://www.amazon.com/Inside-Microsoft%C2%AE-SQL-Server%C2%AE-2008/dp/0735626030/ref=sr_1_sc_2?s=books&ie=UTF8&qid=1393572731&sr=1-2-spell&keywords=iztik+ben+gan