SQL and Relational Theory

C. Date

Mentioned 2

SQL is full of difficulties and traps for the unwary. You can avoid them if you understand relational theory, but only if you know how to put the theory into practice. In this insightful book, author C.J. Date explains relational theory in depth, and demonstrates through numerous examples and exercises how you can apply it directly to your use of SQL. This second edition includes new material on recursive queries, “missing information” without nulls, new update operators, and topics such as aggregate operators, grouping and ungrouping, and view updating. If you have a modest-to-advanced background in SQL, you’ll learn how to deal with a host of common SQL dilemmas. Why is proper column naming so important? Nulls in your database are causing you to get wrong answers. Why? What can you do about it? Is it possible to write an SQL query to find employees who have never been in the same department for more than six months at a time? SQL supports “quantified comparisons,” but they’re better avoided. Why? How do you avoid them? Constraints are crucially important, but most SQL products don’t support them properly. What can you do to resolve this situation? Database theory and practice have evolved since the relational model was developed more than 40 years ago. SQL and Relational Theory draws on decades of research to present the most up-to-date treatment of SQL available. C.J. Date has a stature that is unique within the database industry. A prolific writer well known for the bestselling textbook An Introduction to Database Systems (Addison-Wesley), he has an exceptionally clear style when writing about complex principles and theory.

More on Amazon.com

Mentioned in questions and answers.

I have a table called Users that has a growing list of preferences. These preferences could includes ReligionId (which would key off to another table that contains the list of religions).

The list of preferences is growing. I want to split it off the Users table into 2 tables. The strategy that I think would work well is to make a separate table called UserPreferences. I'm wondering if doing this is in line with the rules of normalization. Here is an illustration to make things a bit more clear.

enter image description here

Is this normalized? Are there better ways? All comments appreciated.

EDIT: How UserPreferences key off to other tables:

enter image description here

Some folks are recommending to store the preferences one per row. This is called an Entity-Attribute-Value table, and it is not normalized. Some people say EAV is "more normalized," but they're mistaken. There is no rule of normalization that encourages EAV as a design in a relational database.

One practical way you can tell it's not normalized is that you can no longer use a foreign key constraint for your religion lookup table, if all values of all preferences share a single column in this preferences table. You can't make a foreign key constraint restrict the values only on rows for a particular preference type -- the FK constraint always applies to all rows in the table.
Basically, Entity-Attribute-Value breaks SQL's support for constraints.

The only normalized design is to define a separate column for each distinct preference. Then you can define data types and constraints appropriate to that preference type.

If you really want to understand relations and normalization, read SQL and Relational Theory: How to Write Accurate SQL Code by C. J. Date.

Each column represents selections from a set. A set can be the set of integers, or the set of religions, or the set of email addresses. A row in a table is a pairing of sets that "go together," for example a given user has a name, and an birthdate, and a religion, and an email address, so those values are matched up together into one row, and together they describe something that exists in the world, i.e. a human being with those attributes.

What this means is that in each row, you have one value chosen for each of the columns, i.e. from each of the component sets that are references. And each column contains values only from one set. In the religion column, you can only choose religions, you can't also put favorite color and mother's maiden name and shoe size into the same column.

That's why EAV is bogus from a relational perspective, because it mashes values from any and all disparate attributes into the same column. It's more like a spreadsheet than a database. Not that I'm saying relational is the only way to store data. It's just that if you're asking if EAV is normalized, and normalization assumes as a prerequisite that the data is relational, then no, EAV is not relational and therefore cannot be normalized.

I've created a stock control database which contains two tables (actually more than two, but these are the two that are relevant to my question): Stock, and Receipts

I would like the link between the stock in the stock table,and the stock in the receipts table to be a little more clearer, this would be fine if a customer could only order one item of stock per receipt, as i'd simply have a StockID column and a Quantity column in the Recipts table, with the StockID column as an FK to the ID in the Stock table, however, the customer can make a receipt with any number of items of stock on it, which would mean i'd have to have a large number of columns in the Receipts table (i.e. StockID_1, Quantity_1, StockID_2, Quantity_2 etc.)

Is there a way around this (can you have like a dynamically expanding set of columns in MySQL) within MySQL, other than what i've done at the moment, which is to have an OrderContents column with the following structure (which isn't enforced by the database or anything) StockID1xQuantity,StockID2xQuantity and so on?

I would post an image of the DB structure, but I don't have enough repuation yet. My lecturer mentioned something about that it could be done, by normalising the database into 4th or 5th normal form?

No, relational databases do not allow dynamic columns. The definition of a relational table is that it has a header that name the columns, and every row has the same columns.

Your technique of repeating the groups of stock columns is a violation of First Normal Form, and it also has a lot of practical problems, for instance:

  • How do you know how many extra columns to create?
  • How do you search for a given value when you don't know which column it's in?
  • How do you enforce uniqueness?

The simplest solution is as @OGHaza described, store extra stock/quantity data on rows in another table. That way the problems above are solved.

  • You don't need to create extra columns, just extra rows, which is easy with INSERT.
  • You can search for a given value over one column to find it.
  • You can put constraints on the column.

If you really want to understand relational concepts, a nice book that is easy to read is: SQL and Relational Theory: How to Write Accurate SQL Code by C. J. Date.

There are also situations where you want to expand a table definition with dynamic columns that aren't repeating -- they're just new attributes. This is not relational, but it doesn't mean that we don't need some data modeling techniques to handle the scenario you describe.

For this type of problem, you might like to read my presentation Extensible Data Modeling with MySQL, for an overview of different solutions, and their pros and cons.

PS: Fourth and Fifth normal form have nothing to do with this scenario. Your lecturer obviously doesn't understand them.