Are you an SQL programmer that, like many, came to SQL after learning and writing procedural or object-oriented code? Or have switched jobs to where a different brand of SQL is being used, or maybe even been told to learn SQL yourself? If even one answer is yes, then you need this book. A "Manual of Style" for the SQL programmer, this book is a collection of heuristics and rules, tips, and tricks that will help you improve SQL programming style and proficiency, and for formatting and writing portable, readable, maintainable SQL code. Based on many years of experience consulting in SQL shops, and gathering questions and resolving his students' SQL style issues, Joe Celko can help you become an even better SQL programmer. + Help you write Standard SQL without an accent or a dialect that is used in another programming language or a specific flavor of SQL, code that can be maintained and used by other people. + Enable you to give your group a coding standard for internal use, to enable programmers to use a consistent style. + Give you the mental tools to approach a new problem with SQL as your tool, rather than another programming language - one that someone else might not know!
Convention has it that table names should be the singular of the entity that they store attributes of.
I dislike any T-SQL that requires square brackets around names, but I have renamed a
Users table to the singular, forever sentencing those using the table to sometimes have to use brackets.
My gut feel is that it is more correct to stay with the singular, but my gut feel is also that brackets indicate undesirables like column names with spaces in them etc.
Should I stay, or should I go?
In his book "SQL Programming Style," Joe Celko suggests that a collection (e.g. a table) should be named in the plural, while a scalar data element (e.g. a column) should be named in the singular.
He cites ISO-11179-4 as a standard for metadata naming, which supports this guideline.
Let me argue why this makes sense.
argsin Java program.
argsin plural exactly because they can hold 0, 1, and millions of arguments.
for every student in students, not
for every s in student. You select a subset of students from a set of students. You do not select a subset of students from a student. That is conceptual reason for the proper naming.
Debunking the fallacies. The most pupular answer says
Reason 1 (Concept). You can think of bag containing apples like "AppleBag", it doesn't matter if contains 0, 1 or a million apples, it is always the same bag. Tables are just that, containers, the table name must describe what it contains, not how much data it contains. Additionally, the plural concept is more about a spoken language one (actually to determine whether there is one or more), a table is not intended to be read by a human.
Lets convert AppleBag => BagOfApples. Now, the same "conceptual" argument says somehting opposite to itself, we see Apples and therefore answer must be plural!
This is because there is nothing conceptual in this ****. It fails even to reason about English language, the simple logic. In English, a
BagOfApples != AnApple!
The argument "Bag contains 0,1, .. millions" of apples just proves that the collection must be called "Apples", likewise Java arguments or
stackoverflow.com/posts. Somehow, the enemies of civilization manage to conclude that singular must be used here. The posts is just a folder. Why should it be plural?
Let's teach mr. Artuwood some logic:
folder is a folder and must describe what it contains, not how much data it contains.
In fact, if you start thinking you realize that
apple contains apple describes nonsense. The true concept is that
Bag contains Apples. Either we name our container Bag(of specific kind) or think what it contains, the Apples (those dirty bastards tried to reduce that to the numbering problem. But we are after apples, not their number.). Yes, if we abstract the wrapper away, we realize that we are after
Apples, what is contained in it. We fetch and iterate either a Bag or Apples, not apple.
Reason 2. (Convenience). it is easier come out with singular names, than with plural ones. Objects can have irregular plurals or not plural at all, but will always have a singular one (with few exceptions like News).
Customer Order User Status News
Which convenience is he talking about? Let's just point out that that plural come out simpler than singulars.
Reason 3. (Aesthetic and Order).
Aestetics is on our side. It is simple, as he did. We just claim that and that is all what is needed to turn the table.
Reason 4 (Simplicity). Put all together, Table Names, Primary Keys, Relationships, Entity Classes... is better to be aware of only one name (singular) instead of two (singular class, plural table, singular field, singular-plural master-detail...)
Am I alone seeing that for simplicity, everything must become singular? Yes, forget the plural using English language. It will make things simpler.
In fact, in many languages, sex is attached to all things, not only boys and girls. This is stupid and makes language unnecessarily complex. But, I have noticed that it simplifies the referencing. When I say "fox, dog and wolf" in Russian and then say "he", it unambiguisly means that I am referencing the "wolf" because "fox" and "dog" are "she". Now you say that distinction that helps to reduce ambiguity creates it. How so?
Probably, the logic is "let's keep arbitrary crap in our language and enforce the disorder by removing the rules which make sense". Yes, the proposal to use singular where logic demands plural while keeping to attach inappropriate attributes like sex to inappropriate items is a pursuit of nonsense in our crazy world.
You probably need to
SELECT student.name FROM students as student
Ok, here probably is the argument: how is table's alias is singular if table is plural? Ok, this makes some sense. But saying that column (object's attribute) is singular, therefore, the object collection must be singular as well, is nonsense.
Reason 5. (Globalization). The world is getting smaller, you may have a team of different nationalities, not everybody has English as native language. Would be easier for a non-native English language programmer to think of "Repository" than of "Repositories", or avoid them type "Statuses" instead of "Status". Having singular names can lead to less errors caused by typos, save time by avoid spending extra seconds to think "is it Child or Children?", hence improving productivity.
This bigot will break his mind, no doubt. But why other non-native speakers, like me, should buy making nonsense "for simplicity and internationalization"? Why not improve our English and keep the logic unraped? It is much easier to understand the language in this case, after all.
Otherwise, we should also abandon the verbs, the articles, the adjectives and speak only in nouns, for simplicity. It would be simpler (and more productive) if we limit our speech to
moooo and nothing else, for the animals to be able to speak English as well. This way we achieve much wider internationalization.
Reason 6. (Why not?). It can even save you writing time, save you disk space, and even make your computer keyboard lasts more!
This argument supports why we should abandon the plurals in our human communaction as well. No, instead of
moooo, mo moo, moo moo, we will say
I, III, III, II. Much shorter than with singular proposal.
The bottom line is that all the "arguments" against plural is pure bullshit. If you think about it, it says against singular. When all you have to support your position is bullshit, it means that you need bullshit to support your view, you are on a wrong side.
The only meaningful argument for the singular is that table is a very special set of objects. Table is a Class! Yes, it contains all objects of specific type and we use singular for class names. Class
Dog contains all dogs. What is dog1? It is a Dog. On the other hand, users deal with tables as collections. They add/remove items into collection and we use plural for collections.
Often I will end up with very complex SQL statements and I wondered if there was style guideline out there that dictates a common way of laying various aspects of a query out.
Most of my queries are written for MySQL.
So do you know of a universal style guide? Perhaps you have written one yourself. Please share your guidelines.
Is there such a thing as Design Patterns in SQL ???
(Not even sure if this is a recommended book.. just illustrating that "proper design" exists).
SQL seems to be the most neglected language when it comes to formatting nicely and readably... And as SQL statements can be incredibly detailed and complex, it makes it extremely hard to work with. But I find that when I try to format my SQL code in the best way possible I'm sometimes unsure of how to do it. I know the standards for Java, C#, Python, etc.... but when it comes to SQL I haven't seen too many guidelines or accepted practices. What are tips/rules for formatting SQL so that it's clear, legible, and logical? Can you give example code to illustrate? What have you found to be the most standard, accepted way of formatting SQL?
You could try checking out Joe Celko's book SQL Programming Style. I'm sure that there are a lot of people who disagree with his style, but it's a good start.
Some of my own "rules"
SELECT column_1, column_2, CASE WHEN column_5 = 'Blah' THEN 1 WHEN column_6 = 'Blah' THEN 2 ELSE 3 END AS column_alias FROM My_Table MT INNER JOIN My_Other_Table MOT ON MOT.column_1 = MT.column_1 WHERE MT.column_2 = 'Some Value' AND ( MT.column_3 = 'Some other value' OR MT.column_4 = 'Some other value' )
We put common prefixes on related tables to assure they display next to each other in our DB management software (Toad, Enterprise Manager, etc).
So for example, all user tables start with the word User:
Ideally, in honor of the three great virtues of a programmer these tables should be named User, Event, Purchase respectively to save some typing, agreed?
Is this naming convention the best (only?) practice for grouping related tables together naturally?
I wouldn't use a naming convention for purposes of alphabetizing table names. It's nice when it works out that way, but this shouldn't be by design.
Read Joe Celko's book "SQL Programming Style." His first chapter in that book is about naming conventions, guided by the ISO 11179 standard for metadata naming. One of his recommendations is to avoid unnecessary prefixes in your naming convention.
I work at a company that has some very non-standardized SQL conventions (They were written by Delphi Developers years ago). Where is the best place that I can find SQL industry standard convention definitions that are most commonly used?
In his book "SQL Programming Style," Joe Celko suggests a number of conventions, for example that a collection (e.g. a table) should be named in the plural, while a scalar data element (e.g. a column) should be named in the singular.
He cites ISO-11179-4 as a standard for metadata naming, which supports this guideline.
i have many to many database and i'm there is a query that i just got stuck with and cant do it.
i have 4 tables Artists, Tracks, Albums, Clips
now i'm on the artist page so i need to get them by the artist page, i already got all of them, but not the way i want them.
because some tracks, albums, clips belong to other artists as well (duet) and i need to display their name.
but the problem is that i'm selecting using the artist id so my GROUPC_CONCAT function wont work here is the query that gets the artist albums.
SELECT al.album_name, GROUP_CONCAT(a.artist_name SEPARATOR ', ') AS 'artist_name' FROM Albums al LEFT JOIN ArtistAlbums art ON art.album_id = al.album_id LEFT JOIN Artists a on a.artist_id = art.artist_id WHERE a.artist_id = 10 GROUP BY al.album_id
one of the albums have two artists attached to it, but it does not get the other artist name.
when i select by the album_id i get the two artists.
please note that i'm new to mysql and i did not find any answers on this particular problem almost no resources on many-to-many querying.
how can i tackle this problem.?
any resources or books on many-to-many that show how to deal with the database on the application layer will be much appreciated, thanks in advance.
Think of table aliases as really being row aliases. That is, for purposes of expressions in the WHERE clause and the select-list, the alias refers to a single row at a time.
Since you've created a condition such that
a.artist_id = 10, then only rows for that artist match the condition. What you really want is to match all artists on an album given that one artist is artist_id = 10.
For that, you need another join, so that the row where artist_id = 10 is matched to all the rows for that respective album.
SELECT al.album_name, GROUP_CONCAT(a2.artist_name SEPARATOR ', ') AS `artist_name` FROM Albums al INNER JOIN ArtistAlbums art ON art.album_id = al.album_id INNER JOIN Artists a on a.artist_id = art.artist_id INNER JOIN ArtistAlbums art2 ON art2.album_id = al.album_id INNER JOIN Artists a2 on a2.artist_id = art2.artist_id WHERE a.artist_id = 10 GROUP BY al.album_id
P.S.: I've also replaced your use of LEFT JOIN with INNER JOIN. There's no reason you needed LEFT JOIN, since you're explicitly looking for albums that have a matching artist, not all albums whether or not it has artist 10. Review the meaning of different types of join.
Re your followup question:
I don't know of a book specifically about many-to-many queries. I'd recommend books like:
I have something that looks like this:
if ( $_SESSION['username'] ) $owner = $_SESSION['username']; $q = "SELECT * FROM projects where owners='$owner' ORDER BY created DESC";
The problem is that the 'owners' column may have multiple entries separated by commas. I need a way to cycle through all the entries and select any table that has $owner in the owners columns. Whats the best way to go about this?
The proper design for a relational database is to store only one value in a given "cell" (i.e. in a given column on a given row -- but please do not use spreadsheet terminology with relational databases, you'll make Joe Celko cry. :-).
Here's an example:
CREATE TABLE Projects ( project_id INT PRIMARY KEY, ... ); CREATE TABLE Users ( user_id INT PRIMARY KEY, ... ); CREATE TABLE Project_Owners ( project_id INT NOT NULL, user_id INT NOT NULL, PRIMARY KEY (project_id, user_id), FOREIGN KEY (project_id) REFERENCES Projects(project_id), FOREIGN KEY (user_id) REFERENCES Users(user_id) );
You insert one row to Project_Owners table for each owner of a given project. So you can easily have a project with multiple owners, and a user who works on multiple projects.
We say that Projects has a many-to-many relationship to Users.
So when you want to get projects associated with a given owner, you'd need to look up the owner in the users table, and then join that to projects via the many-to-many table:
<?php $owner = mysql_real_escape_string($_SESSION["username"]); $sql = "SELECT p.* FROM Projects INNER JOIN Project_Owners o USING (project_id) INNER JOIN Users u USING (user_id) WHERE u.name = '$owner'";
Or if you want to simplify it, you can use integers in your session data. Then you can skip a join:
<?php $owner = (int) $_SESSION["username"]; $sql = "SELECT p.* FROM Projects INNER JOIN Project_Owners o USING (project_id) WHERE o.user_id = $owner";
Should variable names in PL/SQL use underscores or capital letters to separate words?
I can't find a clear answer to this and I've seen both.
I would strongly recommend investing in Joe Celko's SQL Programming Style.
Basically echoing @user1598390, you should avoid Camel Case and jaming everything together.
Camel Case has numerous issues, primarily:
I don't know how to name this column because my english is bad. Now, I name it "sorting", is it correct?
| id | name | sorting | ----------------------- | 1 | foo | 0 | | 2 | bar | 1 |
I would take a page from Joe Celko's excellent SQL Programming Style. Section 1.2.4 takes you through developing standardized postfixes. In your case, the following excerpt applies:
“_seq” = sequence, ordinal numbering. This is not the same thing as a tag number, because it cannot have gaps.
So something like
sorting_seq could apply here.
I have a table with entries for Items as being 'lost' and 'found'. Each row has a date for the event. Im hoping to build a query with matching pairs of 'itemid', 'lost date', 'found date' by joining the table to itself.
This works to a point: unfortunately if there are multiple lost and found pairs for a given item each 'lost date' will be joined with all the 'found dates' that follow it.
Still with me?
The query goes something like:
select c0.ItemId, c0.ChangeDate, c1.ChangeDate from Changes c0 join Changes c1 on c0.ItemId = c1.ItemId and c1.ChangeDate >= c0.ChangeDate where c0.ChangeType = 9 (lost) and c1.ChangeType = 10 (found);
What Im hoping to achieve is some form of a given 'lost date' paired with only the next 'found date' in sequence (or NULL if no 'found date' exists). Im (pretty) sure this is possible but Im not seeing the path.
I was wondering about putting a sub-select in the first join and using a LIMIT 1 to get only one record but I don't see how to join this to the appropriate row in the main part of the select. MySQL tells me it doesn't exist. Fair enough.
Generally when dealing with pairs of dates (e.g. start/end for scheduling) the advice is don't put them on separate rows. Put them in two columns of the same row. See Joe Celko's SQL Programming Style.
But that said, you can solve it with your current schema by searching doing another self-join to search for a ChangeDate between the two. If none is found (that is, if c2.* is null because of the outer join), then c0 and c1 are "adjacent."
select c0.ItemId, c0.ChangeDate, c1.ChangeDate from Changes c0 inner join Changes c1 on c0.ItemId = c1.ItemId and c1.ChangeDate > c0.ChangeDate left outer join Changes c2 on c0.ItemId = c2.ItemId and c2.ChangeDate > c0.ChangeDate and c2.ChangeDate < c1.ChangeDate and c2.ChangeType IN (9,10) -- edit where c0.ChangeType = 9 (lost) and c1.ChangeType = 10 (found) and c2.ItemId IS NULL;
In the above example, I've assumed that ChangeDate is unique, and I changed the >= to >. If ChangeDate is not unique, you'll have to come up with some other expression to test for c2 "between" c0 and c1.
Say you have a table named Product and it has an auto-number/identity column. Do you name it simply Id or do you name it ProductId or Product_Id? Please explain why.
There is no clear answer because it depends on your other tools, your existing naming conventions, and even personal preference. For example, you might use an ORM that expects "ID" on every table. Personally, I would go with "ProductID" because it makes it easier to read queries (for me) and I don't like underscores.
One argument that is completely wrong is that "ID" saves typing time: you spend a lot more time reading code than writing it, so you should always prioritize readable code over saving keystrokes.