An Introduction to Database Systems

C. J. Date

Mentioned 13

This book provides a comprehensive introduction to the large field of database systems through a solid grounding in the foundations of database technology.

More on Amazon.com

Mentioned in questions and answers.

What concepts in Computer Science do you think have made you a better programmer?

My degree was in Mechanical Engineering so having ended up as a programmer, I'm a bit lacking in the basics. There are a few standard CS concepts which I've learnt recently that have given me a much deeper understanding of what I'm doing, specifically:

Language Features

  • Pointers & Recursion (Thanks Joel!)

Data Structures

  • Linked Lists
  • Hashtables

Algorithms

  • Bubble Sorts

Obviously, the list is a little short at the moment so I was hoping for suggestions as to:

  1. What concepts I should understand,
  2. Any good resources for properly understanding them (as Wikipedia can be a bit dense and academic sometimes).

I find it a little funny that you're looking for computer science subjects, but find wikipedia too academic :D

Anyway, here goes, in no particular order:

As a recent graduate from a computer science degree I'd recommend the following:

Some of the OS concepts

 ( memory, IO, Scheduling, process\Threads, multithreading )

[a good book "Modern Operating Systems, 2nd Edition, Andrew S. Tanenbaum"]

Basic knowledge of Computer networks

[a good book by Tanenbaum

OOPS concepts

Finite autometa

A programming language ( I learnt C first then C++)

Algorithms ( Time\space complexity, sort, search, trees, linked list, stack, queue )

[a good book Introduction to Algorithms]

I am trying to find a great resource to study for functional dependency and normalization.

Anyone have any idea where should I look to? I am having difficulty differentiating whether a FD is in 1NF, 2NF or 3NF?

I've been reading Wikipedia and used Google search to find good research, but can't find any that explains it in simple terms.

Maybe you all can share on how you learned FD's and normalization during your life as well.

A functional dependency isn't in any normal form. Normal Form is a property of a relation that satisfies some given set of FDs.

Don't rely on Wikipedia or Google. There is too much misinformation and rubbish from online sources. Try:

Introduction to Database Systems by Chris Date

Practical Issues in Database Management by Fabian Pascal

A functional dependency defines a functional relationship between attributes. For example: PersonId functionally determines BirthDate (normally written as PersonId -> BirthDate). Another way of saying this is: There is exactly one Birth Date for any given given any instance of a person. Note that the converse may or may not be true. Many people may have been born on the same day. Given a BirthDate we may find many PersonId sharing that date.

Sets of functional dependencies may be used to synthesize relations (tables). The definition of the first 3 normal forms, including Boyce Codd Normal Form (BCNF) is stated in terms of how a given set of relations represent functional dependencies. Fourth and fifth normal forms involve Multi-Valued dependencies (another kettle of fish).

Here are a few free resources about Functional Dependencies, Normalization and database design. Be prepared to exercise your brain and math skills when studying this material.

The following are "slide shows" from various academic sites...

The following are academic papers. Heavier reading but well worth the effort.

If you are seriously interested in this subject I suggest you put out the cash for a good book on the subject of Relational Database Design. For example: An Introduction to Database Systems by C.J. Date

I'm developing an application that will require me to create my first large-scale MySQL database. I'm currently having a difficult time wrapping my mind around the best way to organize it. Can anyone recommend any reading materials that show different ways to organize a MySQL database for different purposes?

I don't want to try getting into the details of what I imagine the database's main components will be because I'm not confident that I can express it clearly enough to be helpful at this point. That's why I'm just looking for some general resources on MySQL database organization.

I can recomend you the first chapter of this book: An Introduction to Database Systems, it may help you organize your ideas, and I certainly recomend not using 5th normal form but using 4th, this is very important.

I'm looking at implementing an RDBMS. Are there any good resources out there about how a database works internally, and the kinds of things I'd need to know when starting out to build my own? (Please no comments about whether it's a practical idea or not - just imagine it's for a hobby project or something).

Again - interested in the RDBMS design, not the Database design. And efficiency is very important (it seems like it's reasonably easy to design some kind of relational database like structure if I don't care about speed).

I'd suggest starting with Introduction to Database Systems and Transactional Information Systems. They should both have bibliographies to take you further.

I am unable to figure out an efficient way to establish relationships between tables. I want to have a database of books, authors, publishers and the users that sign-up and have their bookshelves (Read, Currently Reading, Want to Read (or Plan to Read)). I want the users to be able to select which books they've read, want to read or are currently reading.

P.s. I am aware of PK and FK in database table relations.

Edit: maybe this is a better way of doing it:

enter image description here

Then I shall use "Status" = (Read, Plant to Read and Currently reading) - please tell me if this is good and efficient!

a database table is a mathematical relation, in other words a predicate and a set of tuples ("rows") for which that predicate is true. that means each "row" in a "table" is a (true) proposition.

this may all look scary but the basic principles are really simple and worth knowing and applying rigorously: you'll better know what you're doing.

relations are simple if you start small, with the binary relation. for example, there's a binary relation > (greater than) on the set of all integers which "contains" all ordered pairs of integers x, y for which the predicate x > y holds true. note: you would not want to materialize this specific relation as a database table. :)

you want Books, Authors, Publishers and Users with their bookshelfs (Read, Currently Reading, Want to Read). what are the predicates in that? "user U has read book B", "user U is reading book B", "user U wants to read book B" would be some of them; "book B has ISBN# I, title T, author A" would be another, but some books have multiple authors. in that case, you'll do well to split it out into a separate predicate: "book B was written by author A".

CREATE TABLE book (
  id INT NOT NULL PRIMARY KEY
);
CREATE TABLE author (
  id INT NOT NULL PRIMARY KEY
, name TEXT NOT NULL
);
CREATE TABLE written_by (
  book INT NOT NULL REFERENCES book (id)
, author INT NOT NULL REFERENCES author (id)
);
CREATE TABLE reader (
  id INT NOT NULL PRIMARY KEY
);
CREATE TABLE has_read (
  reader INT NOT NULL REFERENCES reader (id)
, book INT NOT NULL REFERENCES book (id)
);
CREATE TABLE is_reading (
  reader INT NOT NULL REFERENCES reader (id)
, book INT NOT NULL REFERENCES book (id)
);
CREATE TABLE plans_reading (
  reader INT NOT NULL REFERENCES reader (id)
, book INT NOT NULL REFERENCES book (id)
);

etc etc.

edit: C. J. Date's Introduction to Database Systems

So say you already know how to do basic things with databases e.g. create them...:) and you understand basic concepts like views and joins..and basic SQL syntax....but you just don't feel comfortable that your solutions follow the "best practices" or are even "correct".

Where could you read up something relatively rigorous but straight to the point for someone trying to build a database driven website? something that would say when/why joins are bad, when to cache, what kinds of queries are expensive, maybe have some exercises and hints/ solutions? something with a little bit of math as well?

Something like an e-book or some lecture notes may be the most appropriate but any suggestions are welcome!

A Developer's Guide to Data Modeling for SQL Server: Covering SQL Server 2005 and 2008

Now I know this isn't directly for MySQL but the book was very easy to follow and very informative in just overall design concerns. I would also assume there is some basic transference of the information from one server to another server. (You can get this as an eBook as well as a normal book)

Hope this helps.

  1. Work with people more experienced than you
  2. Make lots of mistakes... and make sure you learn from them!
  3. Start reading books on relational databases
  4. Learn the internal details of your specific dbms (the optimizer in particular)

The only real shortcut is #1.

Once you begin to feel confident in modelling, start utilizing the knowledge of the internals of your specific dbms. It can be a big payoff to tilt minor design choices in ways that are benificial for your particular dbms. (Fixed length records and clustering comes to mind in MySQL).

You also need a thick and boring book on the fundamentals of relational database design to impress your friends and fellow coworkers. The best book for this purpose I've found is C.J Date's An Introduction to Database Systems. Apart from being just as thick and boring as its author, it unfortunately has some of the best material on the topic of normalization and relational theory.

I second @Cade Roux advice about Celko's books, and especially recommend "SQL for Smarties" or "Data and Databases".

A book used by lots of universities is "Database System Concepts" by "Silberschatz":

http://www.amazon.com/Database-System-Concepts-Abraham-Silberschatz/dp/0073523321/

I can't believe nobody mentioned The manga guide to databases

I am new to programming (1 year of C#-winforms) and am from a non CS background. I have not worked with databases yet. As an application programmer, how much should I know? and where should I start from?

EDIT: Specific topics, books, tutorials, blogs would be appreciated.

I was wondering if anybody had any recommendations for some good free resources where I could learn about designing complex database architecture?

Any books which shares knowledge regarding scalability,fault-tolerance,which DBMS to choose,availability and all essential factors.

All suggestions appreciated.

If you are new to the subject then Introduction to Database Systems by Chris Date is a good place to start before you get onto database design.

http://www.amazon.com/Introduction-Database-Systems-8th/dp/0321197844

For books specifically about design:

http://www.amazon.com/Information-Modeling-Relational-Databases-Management/dp/0123735688

http://www.amazon.com/Practical-Issues-Database-Management-Practitioner/dp/0201485559

I want to cover the equivalent of a typical CS undergrad course in material, so I'm making a list of books to cover the typical topics. I've split the list into topics that, from the research I did, I think are compulsory and optional. I would like some help to confirm if the topics are split correctly, and if the books are of the correct level. Also, please let me know if I left out any important topics, or if any are beyond undergrad level.

Thank you for your time!

Edit regarding the on hold status: I do not believe this question is off-topic as I am not asking for a recommendation for books - I am asking if the topics I have listed are indicative of a typical CS course, and if any important topics are missing. The books links are only there in case the books I have chosen are not correct for the topic, and can be removed if necessary.


COMPULSORY

Operating Systems: Operating System Concepts

Networks: Computer Networking: A Top-Down Approach

Discrete Mathematics: Concrete Mathematics

Data Structures and Algorithms: Introduction to Algorithms

Computer Architecture: Computer Systems: A Programmer's Perspective

Automata Theory: Introduction to the Theory of Computation

Compilers: Engineering a Compiler was recommended to me over the dragon book.

Database Theory: An Introduction to Database Systems

Programming Language Concepts and Design: Programming Language Pragmatics

OPTIONAL

Cryptography: Cryptography Engineering: Design Principles and Practical Applications

Functional Programming: Learn You a Haskell for Great Good!

Artificial Intelligence: Artificial Intelligence: A Modern Approach

Computer Graphics: Real-Time Rendering

Your list is very good on the subjects directly related to computer science. However, it is light on math. In my own B.Sc. in Computer Science I also had a ton of calculus, linear algebra, algebra (groups, rings, etc), statistics, analytic geometry and numerical analysis. Some applications of computer science rely heavily on those:

  • Machine learning depends on lots of linear algebra, calculus, and statistics;
  • Computer graphics depends a lot on analytic geometry and linear algebra;
  • Scientific computation relies on calculus and numerical analysis.

I never used much of the ton of Algebra I had, but I hear it is important for cryptography. :-)

For a programmer developing more regular applications your list is very good, but for those interested in these more specialized areas (which are still pretty important), these subjects are vital.

What's the widest overview and where are the deepest analysis of different replication methods and problems?

I would start here: wikipedia's replication article, then read a couple of related papers on general replication techniques such as the replicated distributed state machine approach (Paxos (pdf)) and epidemic replication (Google 'Epidemic Algorithms for Replicated Database Maintenance').

For a practical overview, perhaps consider investigating the source code for Postgresql, which seems to have some replication technologies built in. This presentation purports to have some details.

However, given that you're talking about deep analysis, the best approach is to make sure that you have a very sound understanding of fundamental distributed database systems issues. My copy of Date's Introduction to Database Systems has a few pages on distributed databases and their attendant issues. I should think a textbook dedicated to distributed databases would have much more detail - this one, for example, looks promising.

You can go much deeper if you read Ken Birman's work on Virtual Synchrony, and most things that Leslie Lamport has ever written. These will attack the problem from the perspective of a general distributed systems approach.

Good luck!

While researching the topic, I came across this post: Should you enforce constraints at the database level as well as the application level?

The person who answered the question claimed that we should enforce Database constraint because it is "easier, integrity, flexible".

The reason I brought out this question is because of my recent maintenance work in one of a very robust systems. Due to a change in business rule, one of the data columns used to have CHAR(5) is now accepting 8 Characters. This table has many dependencies and will also affect many other tables not only in the database but also a few other systems, thus increasing the size to CHAR(8) is literally impossible.

So my question goes back to the database design - wouldn't it be so much easier if you reduce or even eliminate the need of database constraints? If the above mentioned scenario would have happened, all you have to do is to change the front-end or application level validation to make sure the user enter 8 characters for that field.

In my opinion, we should minimize the database constraint to anticipate any changes in the data structure in the future. What is your thought?

Database systems provide a number of benefits, one of the most important is (physical) data independence. Data independence can be defined as an immunity of application program to change in the way that the data is physically stored and accessed, this concept is tightly related to data-model design and normalization roles where data constraints are fundamental.

Database sharing is one of the application integration patterns, widely used between independent applications. Tradeoff will be trying to spread data integrity code in all applications or in a centric fashion inside database.

Minimizing the database constraint will minimize usage of wide range of well-known, proven technologies developed over many years by a wide variety of very smart people.

As a foot note:

This table has many dependencies and will also affect many other tables not only in the database but also a few other systems

Beside this smells redundancy, at least it shows the side effect of the change. Think about when you have to find the side effects with code review!
Application comes, applications go but data remains.

I am looking forward to learn database normalization with help of some examples.
Putting it simple - How can I make someone learn about Normal forms giving examples.

I looked for some articles online, but did not find them very useful