Practical Issues in Database Management

Fabian Pascal

Mentioned 7

--C. J. Date Three decades ago relational technology put the database field on a sound, scientific foundation for the first time. But the database industry--vendors, users, experts, and the trade press--has essentially flouted its principles, focusing instead on a cookbook, product-specific approach, devoid of conceptual understanding. The consequences have been costly: DBMS products, databases, development tools, and applications dont always perform up to expectation or potential, and they can encourage the wrong questions and provide the wrong answers. Practical Issues in Database Management is an attempt to remedy this intractable and costly situation. Written for database designers, programmers, managers, and users, it addresses the core, commonly recurring issues and problems that practitioners--even the most experienced database professionals--seem to systematically misunderstand, namely: *Unstructured data and complex data types *Business rules and integrity enforcement *Keys *Duplicates *Normalization and denormalization *Entity subtypes and supertypes *Data hierarchies and recursive queries *Redundancy *Quota queries *Missing information Fabian Pascal examines these crit

More on Amazon.com

Mentioned in questions and answers.

Is it good practice to give every database table a primary key? It seems to me that if the primary key is not explicitly needed, then it would just be extra clutter in my database.

The purpose of keys in relational database design is to prevent duplicate data and thereby help ensure data integrity. If you permit duplicate rows then you create ambiguity, potential update anomalies and very possibly incorrect results. Therefore in general every table ought to have at least one key (and maybe more than one if required). It's not usually the case that data integrity is "not explicitly needed"!

A primary key is just any one of the keys of a table. Designating one key as a primary key can be useful but is not particularly important - the important thing is that each table have at least one candidate key.

The reasons why duplicate rows ought to be avoided are covered pretty extensively in database literature. See:

http://www.dbdebunk.com/page/page/627052.htm

http://www.dbdebunk.com/page/page/638922.htm

http://dl.acm.org/citation.cfm?id=77708

http://www.amazon.com/Practical-Issues-Database-Management-Practitioner/dp/0201485559

For a current project I'm creating a data model. Are there any sources where I can find "best practices" for a good data model? Good means flexible, efficient, with good performance, style, ... Some example questions would be "naming of columns", "what data should be normalized", or "which attributes should be exported into an own table". The source should be a book :-)

Personally I think you should read a book on performance tuning before beginning to model a database. The right design can make a world of difference. If you are not expert in performance tuning, you aren't qualified to design a database.

These books are Database specific, here is one for SQl Server. http://www.amazon.com/Server-Performance-Tuning-Distilled-Experts/dp/1430219025/ref=sr_1_1?s=books&ie=UTF8&qid=1313603282&sr=1-1

Another book that you should read before starting to design is about antipatterns. Always good to know what you should avoid doing. http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=sr_1_1?s=books&ie=UTF8&qid=1313603622&sr=1-1

Do not get stuck in the trap of designing for flexibility. People use that as a way to get out of doing the work to design correctly and flexible databases almost always perform badly. If more than 5% of your database design depends on flexibility, you haven't modeled correctly in my opinion. All the worst COTS products I've had to work with were designed for flexibility first.

Any decent database book will discuss normalization. You can also find that information easily on the web. Be sure to actually create FK/PK relationships.

As far as naming columns, pick a standard and stick with it consistently. Consistency is more important than the actual standard. Don't name columns ID (see SQL antipatterns book). Use the same name and datatypes if columns are going to be in several different tables. What you are going for is to not have to use functions to do joins because of datatype mismatches.

Always remember that databases can (and will) be changed outside the application. Anything that is needed for data integrity must be in the database not the application code. The data will be there long after the application has been replaced.

The most important things for database design:

  • Thorough definition of the data needed (including correct datatypes) and the relationships between pieces of data (including correct normalization)
  • data integrity
  • performance
  • security
  • consistency (of datatypes, naming standards etc.)

I am trying to find a great resource to study for functional dependency and normalization.

Anyone have any idea where should I look to? I am having difficulty differentiating whether a FD is in 1NF, 2NF or 3NF?

I've been reading Wikipedia and used Google search to find good research, but can't find any that explains it in simple terms.

Maybe you all can share on how you learned FD's and normalization during your life as well.

A functional dependency isn't in any normal form. Normal Form is a property of a relation that satisfies some given set of FDs.

Don't rely on Wikipedia or Google. There is too much misinformation and rubbish from online sources. Try:

Introduction to Database Systems by Chris Date

Practical Issues in Database Management by Fabian Pascal

A functional dependency defines a functional relationship between attributes. For example: PersonId functionally determines BirthDate (normally written as PersonId -> BirthDate). Another way of saying this is: There is exactly one Birth Date for any given given any instance of a person. Note that the converse may or may not be true. Many people may have been born on the same day. Given a BirthDate we may find many PersonId sharing that date.

Sets of functional dependencies may be used to synthesize relations (tables). The definition of the first 3 normal forms, including Boyce Codd Normal Form (BCNF) is stated in terms of how a given set of relations represent functional dependencies. Fourth and fifth normal forms involve Multi-Valued dependencies (another kettle of fish).

Here are a few free resources about Functional Dependencies, Normalization and database design. Be prepared to exercise your brain and math skills when studying this material.

The following are "slide shows" from various academic sites...

The following are academic papers. Heavier reading but well worth the effort.

If you are seriously interested in this subject I suggest you put out the cash for a good book on the subject of Relational Database Design. For example: An Introduction to Database Systems by C.J. Date

So I've read a couple books on SQL and I've used SQL and created databases, but really nothing advanced, a couple tables, a couple joins, that sort of thing.

So now I am working on a project that is pretty big. The database is going to have multiple tables and we will be processing 40'000 to 80'000 rows of data daily.

Now I find myself lacking some basic SQL concepts, for instance:

I have a dataset with files and file details. The files are listed in one table and the file details are listed in another. We also have a build table and fourth table that joins the build table, file table and file details table together.

The problems are like, "How do I process the files and have the correct file details be associated with files and associated with a build?

Even my university didn't cover this and now I feel I am very out my depth.

Lets take one more example. Lets take the File Table. In my application I have 40'000 files, I need to add files that don't already exist to the file table. Do I have to search for each individual entry and add if it does not exist and skip if it does?

So my real question is, where is a good place to read up on general concepts?

  • SQL and Relational Theory - How to Write Accurate SQL Code, by Chris J. Date; from the Preface:

    SQL is ubiquitous. But SQL is hard to use: It's complicated, confusing, and error prone—much more so, I venture to suggest, than its apologists would have you believe. In order to have any hope of writing SQL code that you can be sure is accurate, therefore (meaning it does exactly what it's supposed to do, no more and no less), you must follow some appropriate discipline—and it's the thesis of this book that using SQL relationally is the discipline you need. But what does this mean? Isn't SQL relational anyway?

    Well, it's true that SQL is the standard language for use with relational databases—but that fact in itself doesn't make it relational. The sad truth is, SQL departs from relational theory in all too many ways; duplicate rows and nulls are two obvious examples, but they're not the only ones. As a consequence, it gives you rope to hang yourself with, as it were. So if you don't want to hang yourself, you need to understand relational theory (what it is and why); you need to know about SQL's departures from that theory; and you need to know how to avoid the problems they can cause. In a word, you need to use SQL relationally. Then you can behave as if SQL truly were relational, and you can enjoy the benefits of working with what is, in effect, a truly relational system.

    Now, a book like this wouldn't be needed if everyone was using SQL relationally already—but they aren't. On the contrary, I observe much bad practice in current SQL usage. I even observe such practice being recommended, in textbooks and similar publications, by writers who really ought to know better (no names, no pack drill); in fact, a review of the literature in this regard is a pretty dispiriting exercise. The relational model first saw the light of day in 1969, and yet here we are, almost 40 years on, and it still doesn't seem to be very well understood by the database community at large. Partly for such reasons, this book uses the relational model itself as an organizing principle; it explains various features of the model in depth, and shows in every case how best to use SQL to implement the feature in question. ...

  • From there, if you want to deepen the subject, Applied Mathematics for Database Professionals, by Lex de Haan (RIP) and Toon Koppelaars; from the Forward, by Hugh Darwen and Chris J. Date (again):

    We welcome this contribution to the database literature. It is another book on the theory and practice of relational databases, but this one is interestingly different. The bulk of the book is devoted to a treatment of the theory. The treatment is not only rigorous and mathematical, but also rather more approachable than some other texts of this kind. The authors clearly recognize, as we do, the importance of logic and mathematics if database study is to be taken seriously. They have done a good job of describing a certain formalism developed by their former teachers, Bert de Brock and Frans Remmen. This formalism includes some ideas that will be novel to many readers, even those who already have a degree of familiarity with the subject. A particularly interesting novel idea, to us, is the formalization of updating and transactions in Chapter 10. ...

  • Practical Issues in Database Management: A Reference for the Thinking Practitioner, by Fabian Pascal:

    Preface

    The computer industry —and its database sector in particular— resembles the fashion industry: it is driven by fads. And more often than not, vendors profit from accelerated obsolescence on which fads are predicated. It's the users, however, not the vendors, who pay trough the nose. The vendors, helped by the trade media, can profitably exploit ignorance and obscure serious product deficiencies and questionable practices they induce by simply luring users to the next fad—the Internet being just the latest one. ... [My note: The book was published in 2000, but there is no sign that the WWW in particular is advanced beyond its fad functioning mode.]

  • An Introduction to Relational Database Theory, by Hugh Darwen, this one is free, it can be downloaded directly from its site.

I was wondering if anybody had any recommendations for some good free resources where I could learn about designing complex database architecture?

Any books which shares knowledge regarding scalability,fault-tolerance,which DBMS to choose,availability and all essential factors.

All suggestions appreciated.

If you are new to the subject then Introduction to Database Systems by Chris Date is a good place to start before you get onto database design.

http://www.amazon.com/Introduction-Database-Systems-8th/dp/0321197844

For books specifically about design:

http://www.amazon.com/Information-Modeling-Relational-Databases-Management/dp/0123735688

http://www.amazon.com/Practical-Issues-Database-Management-Practitioner/dp/0201485559

want to learn details about how to do a better database design any suggested recommended for this topic? thx very much

There's a vendor-neutral intro course online:

And here are two book recommendations:

I am looking forward to learn database normalization with help of some examples.
Putting it simple - How can I make someone learn about Normal forms giving examples.

I looked for some articles online, but did not find them very useful