Paul Nielsen, Kalen Delaney, Adam Machanic
This book presents the collective wisdom of more than 30 of today's leading SQL Server experts. This high-powered group came together for another passion it shares: all author royalties are donated to help child victims of war worldwide.
Assume that I have a main table which has 100 columns referencing (as foreign keys) to some 100 tables (containing primary keys).
Retrieving a complete row of information requires joining all 100 of those tables, and joining that many tables is clearly a performance problem. Fortunately, we can expect that any given user query will request values from no more than about 5-7 tables (out of the 100) and will put conditions (in the WHERE clause) on fields from about 3-4 tables (out of the 100). Different queries use different combinations of tables in the SELECT list and in the WHERE clause, but every SELECT needs some 5-7 tables and every WHERE needs some 3-4 tables (and, of course, the list of tables used in the SELECT may overlap with the list used in the WHERE).
I can write a VIEW whose underlying code joins all 100 tables, and then run the queries described above against that VIEW. But then the big issue is how to tell SQL Server that, despite the explicit instruction to join all 100 tables, only some 11 tables actually need to be joined (11 tables being enough to produce the SELECT output and evaluate the WHERE conditions).
Another approach may be to create a "feature" that converts the following "fake" code
SELECT field1, field2, field3 FROM TheFakeTable WHERE field1=12 and field4=5
into the following "real" code:
SELECT T1.field1, T2.field2, T3.field3 FROM TheRealMainTable join T1 on .... join T2 on .... join T3 on .... join T4 on .... WHERE T1.field1=12 and T4.field4=5
From a grammatical point of view, it is not a problem even to allow mixed combinations of this "TheFakeTable" mechanism with real tables and constructs. The real problem is how to implement this feature technically. I could write a function that takes the "fake" code as input and produces the "real" code, but that is inconvenient because it requires dynamic SQL everywhere the "TheFakeTable" mechanism appears. A fantasy-land solution would be to extend the grammar of the SQL language in Management Studio so that such fake code could be written and automatically converted into the real code before being sent to the server.
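As a rough sketch of the dynamic SQL route mentioned above (all table, column, and parameter names here are hypothetical, and a real implementation would need to validate the inputs against sys.columns to prevent SQL injection), a stored procedure could assemble only the joins a given request actually needs:

```sql
-- Hypothetical sketch: build and run a query that joins only the
-- reference tables actually mentioned in the select list or filter.
CREATE PROCEDURE dbo.QueryMain
    @cols   nvarchar(max),   -- e.g. N'T1.field1, T2.field2'
    @filter nvarchar(max)    -- e.g. N'T1.field1 = 12 AND T4.field4 = 5'
AS
BEGIN
    DECLARE @sql nvarchar(max) =
        N'SELECT ' + @cols + N' FROM dbo.TheRealMainTable AS m';

    -- Append a join only when the table is referenced somewhere.
    -- (Simplified detection; repeat for each reference table.)
    IF @cols LIKE N'%T1.%' OR @filter LIKE N'%T1.%'
        SET @sql += N' JOIN dbo.T1 ON T1.pk = m.fk1';
    IF @cols LIKE N'%T2.%' OR @filter LIKE N'%T2.%'
        SET @sql += N' JOIN dbo.T2 ON T2.pk = m.fk2';
    -- ...and so on for the remaining tables...

    SET @sql += N' WHERE ' + @filter;
    EXEC sys.sp_executesql @sql;
END;
```

This keeps the join count down to the tables each query really uses, at the cost of pushing all queries through dynamic SQL.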
My questions are:
Thanks to everyone for every comment!
PS The structure with 100 tables arises from the following question that I asked here: Normalizing an extremely big table
The SQL Server optimizer does contain logic to remove redundant joins, but there are restrictions, and the joins have to be provably redundant. To summarize, a join can have four effects:
It can add extra columns (from the joined table).
It can add extra rows (a source row may match more than one joined row).
It can remove rows (a source row may have no match at all).
It can introduce NULLs (for a RIGHT or FULL JOIN).
To successfully remove a redundant join, the query (or view) must account for all four possibilities. When this is done correctly, the effect can be astonishing. For example:
USE AdventureWorks2012;
GO
CREATE VIEW dbo.ComplexView
AS
SELECT
    pc.ProductCategoryID, pc.Name AS CatName,
    ps.ProductSubcategoryID, ps.Name AS SubCatName,
    p.ProductID, p.Name AS ProductName,
    p.Color, p.ListPrice, p.ReorderPoint,
    pm.Name AS ModelName, pm.ModifiedDate
FROM Production.ProductCategory AS pc
FULL JOIN Production.ProductSubcategory AS ps
    ON ps.ProductCategoryID = pc.ProductCategoryID
FULL JOIN Production.Product AS p
    ON p.ProductSubcategoryID = ps.ProductSubcategoryID
FULL JOIN Production.ProductModel AS pm
    ON pm.ProductModelID = p.ProductModelID;
The optimizer can successfully simplify the following query:
SELECT c.ProductID, c.ProductName
FROM dbo.ComplexView AS c
WHERE c.ProductName LIKE N'G%';
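Because the WHERE clause rejects the NULLs a FULL JOIN could introduce, the category, subcategory, and model joins are all eliminated from the plan. The query is roughly equivalent to running the following directly (a sketch of what the optimizer effectively executes, not a rewrite it literally produces as text):

```sql
-- Only the Product table appears in the execution plan; the other
-- three joins are provably redundant for this select list and filter.
SELECT p.ProductID, p.Name AS ProductName
FROM Production.Product AS p
WHERE p.Name LIKE N'G%';
```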
The main restrictions are that foreign key relationships must be based on a single key column to contribute to the simplification process, and that compilation time for queries against such a view may become quite long, particularly as the number of joins increases. It could be quite a challenge to write a 100-table view that gets all the semantics exactly correct. I would be inclined to find an alternative solution, perhaps using dynamic SQL.
That said, the particular qualities of your denormalized table may mean the view is quite simple to assemble, requiring only enforced FOREIGN KEYs, non-NULLable referenced columns, and appropriate UNIQUE constraints to make this solution work as you would hope, without the overhead of 100 physical join operators in the plan.
Using ten tables rather than a hundred:
-- Referenced tables
CREATE TABLE dbo.Ref01 (col01 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref02 (col02 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref03 (col03 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref04 (col04 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref05 (col05 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref06 (col06 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref07 (col07 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref08 (col08 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref09 (col09 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref10 (col10 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
The parent table definition (with page compression):
CREATE TABLE dbo.Normalized
(
    pk integer IDENTITY NOT NULL,
    col01 tinyint NOT NULL REFERENCES dbo.Ref01,
    col02 tinyint NOT NULL REFERENCES dbo.Ref02,
    col03 tinyint NOT NULL REFERENCES dbo.Ref03,
    col04 tinyint NOT NULL REFERENCES dbo.Ref04,
    col05 tinyint NOT NULL REFERENCES dbo.Ref05,
    col06 tinyint NOT NULL REFERENCES dbo.Ref06,
    col07 tinyint NOT NULL REFERENCES dbo.Ref07,
    col08 tinyint NOT NULL REFERENCES dbo.Ref08,
    col09 tinyint NOT NULL REFERENCES dbo.Ref09,
    col10 tinyint NOT NULL REFERENCES dbo.Ref10,
    CONSTRAINT PK_Normalized
        PRIMARY KEY CLUSTERED (pk)
        WITH (DATA_COMPRESSION = PAGE)
);
CREATE VIEW dbo.Denormalized
WITH SCHEMABINDING AS
SELECT
    item01 = r01.item,
    item02 = r02.item,
    item03 = r03.item,
    item04 = r04.item,
    item05 = r05.item,
    item06 = r06.item,
    item07 = r07.item,
    item08 = r08.item,
    item09 = r09.item,
    item10 = r10.item
FROM dbo.Normalized AS n
JOIN dbo.Ref01 AS r01 ON r01.col01 = n.col01
JOIN dbo.Ref02 AS r02 ON r02.col02 = n.col02
JOIN dbo.Ref03 AS r03 ON r03.col03 = n.col03
JOIN dbo.Ref04 AS r04 ON r04.col04 = n.col04
JOIN dbo.Ref05 AS r05 ON r05.col05 = n.col05
JOIN dbo.Ref06 AS r06 ON r06.col06 = n.col06
JOIN dbo.Ref07 AS r07 ON r07.col07 = n.col07
JOIN dbo.Ref08 AS r08 ON r08.col08 = n.col08
JOIN dbo.Ref09 AS r09 ON r09.col09 = n.col09
JOIN dbo.Ref10 AS r10 ON r10.col10 = n.col10;
Hack the statistics to make the optimizer think the table is very large:
UPDATE STATISTICS dbo.Normalized WITH ROWCOUNT = 100000000, PAGECOUNT = 5000000;
Example user query:
SELECT d.item06, d.item07
FROM dbo.Denormalized AS d
WHERE d.item08 = 'Banana'
  AND d.item01 = 'Green';
Gives us this execution plan:
The scan of the Normalized table looks bad, but both bitmap filters are applied during the scan by the storage engine, so rows that cannot match do not even surface to the query processor. This may be enough to give acceptable performance in your case, and it is certainly better than scanning the original table with its overflowing columns.
If you are able to upgrade to SQL Server 2012 Enterprise at some stage, you have another option: creating a column-store index on the Normalized table:
CREATE NONCLUSTERED COLUMNSTORE INDEX cs
ON dbo.Normalized
    (col01, col02, col03, col04, col05, col06, col07, col08, col09, col10);
The execution plan is:
That probably looks worse to you, but column storage provides exceptional compression, and the whole execution plan runs in Batch Mode with filters for all the contributing columns. If the server has adequate threads and memory available, this alternative could really fly.
Ultimately, I'm not sure this normalization is the correct approach considering the number of tables and the chances of getting a poor execution plan or requiring excessive compilation time. I would probably correct the schema of the denormalized table first (proper data types and so on), possibly apply data compression...the usual things.
If the data truly belongs in a star-schema, it probably needs more design work than just splitting off repeating data elements into separate tables.
Are there any tools that do Query Optimizations for SQL Server 2005 and above?
I've searched & didn't find anything.
What do you recommend?
If this question has been asked before, feel free to close it, but I didn't see anything similar.
As John Saunders commented, the best tool at your disposal is your own mind. Following bernd_k's comment, here are a few suggestions for sharpening that tool.
I have more than 7 million rows in a table, and
SELECT COUNT(*) FROM MyTable where MyColumn like '%some string%'
gives me 20,000 rows and takes more than 13 seconds.
The table has a NONCLUSTERED INDEX on MyColumn.
Is there any way to improve speed?
Try using a binary collation first; this replaces the complex Unicode comparison rules with a simple byte-by-byte comparison. (Note that a binary collation also makes the search case-sensitive.)
SELECT COUNT(*) FROM MyTable WHERE MyColumn COLLATE Latin1_General_BIN2 LIKE '%some string%'
Also, have a look at the chapter 'Build your own index', written by Erland Sommarskog, in SQL Server MVP Deep Dives.
The basic idea is to introduce a restriction for the user: the search string must be at least three contiguous characters long. You then extract every three-letter sequence from the MyColumn field and store these fragments in a table together with the MyTable.id they belong to. When searching for a string, you split it into three-letter fragments as well and look up which record ids contain them. This finds the matching rows a lot quicker. That is the strategy in a nutshell.
The book describes implementation details and ways to optimise this further.
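A minimal sketch of the fragment-table idea described above (table and column names here are hypothetical, and this omits the triggers a real implementation needs to keep the fragments in sync with the base table):

```sql
-- Fragment table: one row per three-character sequence per source row.
-- Clustering on (fragment, id) makes the fragment lookup a seek.
CREATE TABLE dbo.MyColumnFragments
(
    fragment char(3) NOT NULL,
    id       integer NOT NULL,   -- the MyTable.id the fragment came from
    CONSTRAINT PK_Fragments PRIMARY KEY CLUSTERED (fragment, id)
);

-- Populate it by slicing every value into overlapping trigrams,
-- using a numbers table to walk the character positions.
INSERT dbo.MyColumnFragments (fragment, id)
SELECT DISTINCT SUBSTRING(t.MyColumn, n.n, 3), t.id
FROM dbo.MyTable AS t
JOIN dbo.Numbers AS n          -- assumed helper table of integers 1..N
    ON n.n <= LEN(t.MyColumn) - 2;

-- Search: seek on one trigram of the pattern to get a small candidate
-- set, then re-check with LIKE to eliminate false positives.
SELECT t.id, t.MyColumn
FROM dbo.MyColumnFragments AS f
JOIN dbo.MyTable AS t ON t.id = f.id
WHERE f.fragment = 'som'                   -- a trigram of 'some string'
  AND t.MyColumn LIKE '%some string%';
```

The seek on the fragment table replaces the full scan, and the final LIKE only runs against the candidate rows.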
I've been developing and administering Oracle database apps for many years and would like to learn SQL Server. Does anyone have any book recommendations (preferably electronic)? Naturally, I already know SQL inside out, so I'd like to skip the intro material and go straight to the meat, without missing any SQL Server basics I would otherwise pick up from an intro book. Any suggestions?
I would definitely recommend:
Microsoft SQL Server 2008 Internals: goes deep but does not neglect the fundamentals. It is a must read.
Professional SQL Server 2008 Internals and Troubleshooting: similar to the first book, but with a slightly more performance based focus.
SQL Server 2008 Query Performance Tuning Distilled: great techniques for identifying and fixing poorly performing queries, from the metal to the application.
If you have more of a development focus:
Expert SQL Server 2005 Development (SQL Server 2005 but still relevant for SQL Server 2008)
There are also some very good, free electronic books from Redgate:
to list a few.
Best Book for a new Database Developer
I'm looking to become an intermediate, and eventually advanced, SQL programmer. I'm currently halfway between beginner and intermediate. I'd like the book to cover at least beginner to intermediate, or even the whole range.
Thank you all for your responses. I have chosen the MVP Deep Dives book from Amazon; it has 5 stars and only 5 stars! That sounds good enough for me. I look forward to reading it.
The 2005 and 2008 editions of the Inside SQL Server series by Itzik Ben-Gan, Kalen Delaney, et al.
And of course... how did I forget... after all, I wrote two chapters in this book (chapters 8 and 20): SQL Server MVP Deep Dives.
This one is not SQL Server specific, but if you are learning databases, it is good to understand how to refactor them, and most books don't cover this: