SQL Antipatterns

Bill Karwin

Mentioned 37

Illustrating some of the most common misconceptions and pitfalls software developers face using relational databases, this book helps readers use a database to produce the most efficient results, and turn sluggish, inflexible code into high-quality, reliable solutions.

More on Amazon.com

Mentioned in questions and answers.

How do you get the rows that contain the max value for each grouped set?

I've seen some overly-complicated variations on this question, and none with a good answer. I've tried to put together the simplest possible example:

Given a table like that below, with person, group, and age columns, how would you get the oldest person in each group? (A tie within a group should give the first alphabetical result)

Person | Group | Age
---
Bob  | 1     | 32  
Jill | 1     | 34  
Shawn| 1     | 42  
Jake | 2     | 29  
Paul | 2     | 36  
Laura| 2     | 39  

Desired result set:

Shawn | 1     | 42    
Laura | 2     | 39  

The correct solution is:

SELECT o.*
FROM `Persons` o                    # 'o' from 'oldest person in group'
  LEFT JOIN `Persons` b             # 'b' from 'bigger age'
      ON o.Group = b.Group AND o.Age < b.Age
WHERE b.Age is NULL                 # bigger age not found

How it works:

It matches each row from o with all the rows from b having the same value in column Group and a bigger value in column Age. Any row from o not having the maximum value of its group in column Age will match one or more rows from b.

The LEFT JOIN makes it match the oldest person in group (including the persons that are alone in their group) with a row full of NULLs from b ('no biggest age in the group').
Using INNER JOIN makes these rows not matching and they are ignored.

The WHERE clause keeps only the rows having NULLs in the fields extracted from b. They are the oldest persons from each group.

Further readings

This solution and many others are explained in the book SQL Antipatterns: Avoiding the Pitfalls of Database Programming

I have a bit of SQL that is almost doing what I want it to do. I'm working with three tables, a Users, UserPhoneNumbers and UserPhoneNumberTypes. I'm trying to get a list of users with their phone numbers for an export.

The database itself is old and has some integrity issues. My issue is that there should only ever be 1 type of each phone number in the database but thats not the case. When I run this I get multi-line results for each person if they contain, for example, two "Home" numbers.

How can I modify the SQL to take the first phone number listed and ignore the remaining numbers? I'm in SQL Server and I know about the TOP statement. But if I add 'TOP 1' to the LEFT JOIN select statement its just giving me the 1st entry in the database, not the 1st entry for each User.

This is for SQL Server 2000.

Thanks,

SELECT  Users.UserID, 
  Users.FirstName, Users.LastName,
  HomePhone, WorkPhone, FaxNumber

FROM Users

LEFT JOIN
 (SELECT UserID, PhoneNumber AS HomePhone
 FROM UserPhoneNumbers LEFT JOIN UserPhoneNumberTypes ON UserPhoneNumbers.UserPhoneNumberTypeID=UserPhoneNumberTypes.UserPhoneNumberTypeID
 WHERE UserPhoneNumberTypes.PhoneNumberType='Home') AS tmpHomePhone
 ON tmpHomePhone.UserID = Users.UserID
LEFT JOIN
 (SELECT UserID, PhoneNumber AS WorkPhone
 FROM UserPhoneNumbers LEFT JOIN UserPhoneNumberTypes ON UserPhoneNumbers.UserPhoneNumberTypeID=UserPhoneNumberTypes.UserPhoneNumberTypeID
 WHERE UserPhoneNumberTypes.PhoneNumberType='Work') AS tmpWorkPhone
 ON tmpWorkPhone.UserID = Users.UserID
LEFT JOIN
 (SELECT UserID, PhoneNumber AS FaxNumber
 FROM UserPhoneNumbers LEFT JOIN UserPhoneNumberTypes ON UserPhoneNumbers.UserPhoneNumberTypeID=UserPhoneNumberTypes.UserPhoneNumberTypeID
 WHERE UserPhoneNumberTypes.PhoneNumberType='Fax') AS tmpFaxNumber
 ON tmpFaxNumber.UserID = Users.UserID

I assume you have some primary key field on each joined table, since UserID is not unique. I'll assume your primary key is called ID. We'll take the records with the lowest ID. This meets your "first" criteria.

SELECT  Users.UserID, Users.FirstName, Users.LastName, hp.HomePhone,
        wp.WorkPhone, fn.FaxNumber
FROM Users
LEFT JOIN HomePhone hp ON hp.UserID = Users.UserID
LEFT JOIN HomePhone hp2 ON hp2.UserID = Users.UserID AND hp2.ID < hp.ID
LEFT JOIN WorkPhone wp ON wp.UserID = Users.UserID
LEFT JOIN WorkPhone wp2 ON wp2.UserID = Users.UserID AND wp2.ID < wp.ID
LEFT JOIN FaxNumber fn ON fn.UserID = Users.UserID
LEFT JOIN FaxNumber fn2 ON fn2.UserID = Users.UserID AND fn2.ID < fn.ID
WHERE hp2.ID IS NULL AND wp2.ID IS NULL AND fn2.ID IS NULL

There is a whole chapter on this type of issue, called "Ambiguous Gruops", in the book SQL Antipatterns.

Our masters thesis project is creating a database schema analyzer. As a foundation to this, we are working on quantifying bad database design.

Our supervisor has tasked us with analyzing a real world schema, of our choosing, such that we can identify some/several design issues. These issues are to be used as a starting point in the schema analyzer.

Finding a good schema is a bit difficult because we do not want a schema which is well designed in all aspects, but a schema that is more "rare to medium".

We have already scheduled the following schemas for analysis: wikimedia, moodle and drupal. Not sure in which category each fit. It is not necessary that the schema is open source.

The database engine used is not important, though we would like to focus on SQL server, Posgresql and Oracle.

For now literature will be deferred, as this task is supposed to give us real world examples which can be used in the thesis. i.e. "Design X is perceived by us as bad design, which our analyzer identifies and suggests improvements to", instead of coming up with contrived examples.

I will update this post when we have some kind of a tool ready.

Check the Dell-dvd-store, you can use it for free.

The Dell DVD Store is an open source simulation of an online ecommerce site with implementations in Microsoft SQL Server, Oracle and MySQL along with driver programs and web applications

Bill Karwin has written a great book about bad designs: SQL antipatterns

For a current project I'm creating a data model. Are there any sources where I can find "best practices" for a good data model? Good means flexible, efficient, with good performance, style, ... Some example questions would be "naming of columns", "what data should be normalized", or "which attributes should be exported into an own table". The source should be a book :-)

Personally I think you should read a book on performance tuning before beginning to model a database. The right design can make a world of difference. If you are not expert in performance tuning, you aren't qualified to design a database.

These books are Database specific, here is one for SQl Server. http://www.amazon.com/Server-Performance-Tuning-Distilled-Experts/dp/1430219025/ref=sr_1_1?s=books&ie=UTF8&qid=1313603282&sr=1-1

Another book that you should read before starting to design is about antipatterns. Always good to know what you should avoid doing. http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=sr_1_1?s=books&ie=UTF8&qid=1313603622&sr=1-1

Do not get stuck in the trap of designing for flexibility. People use that as a way to get out of doing the work to design correctly and flexible databases almost always perform badly. If more than 5% of your database design depends on flexibility, you haven't modeled correctly in my opinion. All the worst COTS products I've had to work with were designed for flexibility first.

Any decent database book will discuss normalization. You can also find that information easily on the web. Be sure to actually create FK/PK relationships.

As far as naming columns, pick a standard and stick with it consistently. Consistency is more important than the actual standard. Don't name columns ID (see SQL antipatterns book). Use the same name and datatypes if columns are going to be in several different tables. What you are going for is to not have to use functions to do joins because of datatype mismatches.

Always remember that databases can (and will) be changed outside the application. Anything that is needed for data integrity must be in the database not the application code. The data will be there long after the application has been replaced.

The most important things for database design:

  • Thorough definition of the data needed (including correct datatypes) and the relationships between pieces of data (including correct normalization)
  • data integrity
  • performance
  • security
  • consistency (of datatypes, naming standards etc.)

I have a problem selecting 6 random friends

This is the query I've got so far:

$result = num_rows("SELECT * FROM friends WHERE member_id = '".$_SESSION['userid']."'");
if($result >= 6) {
    $f_num = 6;
} else {
    $f_num = $result;
}
for($i = 1; $i <= $f_num; $i++) {
    $q_get_member_friends = mysql_query("SELECT * FROM friends WHERE member_id = '".$_SESSION['userid']."' ORDER BY rand() LIMIT 1");
    $r_get_member_friends = mysql_fetch_array($q_get_member_friends);
    echo $r_get_member_friends['friend_with'];
}

I want to select 6 random friends if the logged in user has more or equal to 6 friends

Stuck on this for a while now :/

Thanks for any help :)

Instead of SELECT * at the beginning, try SELECT COUNT(*) and use the actual return value instead of num_rows().

Your loop could generate duplicates. I would suggest trying OMG Ponies answer.

There is a whole chapter about random selection in the book SQL Antipatterns.

I am building a custom MVC framework using PHP. My problem is when I want to access any model class through the controller class. One way I have seen this done is through a registry design pattern using magic methods such as get and set, though PHP get and set are considered bad practise by some. I have read about dependency injection done through a container, but I can not see this working effectily as the container would have to call the models or it would have to contain the models which would defeat the purpose of MVC and create a massive super class. Singleton is seen as bad practise. Is there any solutions or improvements of the methods I have mentioned. It may just be my understand and knowledge of PHP needs improving.

Currently I have this: router.php (loads up controllor through a GET varible

 <?php

class router {



function __construct() {

    if (file_exists("controller/".$_GET['url']."Controller.php"))  {


       function __autoload($controller) {

           $controlinclude = "controller/".$controller.".php";
            include $controlinclude;

      }
       $control = $_GET['url']."Controller";
        new $control();


    }
    else    {

        // throw exception
    }

}

}
?>

Hope that makes sence

First of all ... Do no put autoloading script in routing mechanism. You are mixing the responsibilities. You will be better off creating a separate class for this based on spl_autoload_register.

Neeext .. do no put complicated operations on constructor. It is makes you code somewhat untestable. Maybe you should be something like:

// you might want to replace $_GET with $_SERVER['QUERY_STRING'] later
$router = new Router( $_GET['url'] );

// where 'default' is the name of fallback controller
$controller_class = $router->get_controller( 'default' );
$method_name = $router->get_action( 'index' );

$model_factory = new ModelFactory( new PDO( ... ) );
$controller = new {$controller_class}( $model_factory );
$controller->{$method_name}();

Additionally, you should look into php namespaces. There is no point in ending class with ...Controller just to know where the class will be located.

Ok ... back to the Model.

There is quite common misconception about models in web development community ( i blame RoR for this mess ). Model in MVC is not a class, but an application layer which contains multitude of instances. Most of the instances belong to one of two types of classes. With following responsibilities:

  • Domain Logic :

    Deals with all the computation, calculation and all the domain specific details. Objects in this group have no knowledge of where and how data is actually stored. They only manipulate the information.

  • Data Access

    Usually made of objects that fit DataMapper pattern (do not confuse with ORM of same name .. nothing in common). Responsible for storing data from Domain Objects and retrieving them. Might be in database.. might not. This is where your SQL queries would be.

In semi-real world situation () it might looks something like this (related to code abowe):

class SomeController
{
   // ... snip ...
   protected $model_factory = null;
   // ... snip ...

   public function __construct( ModelFactory $factory )
   {
       $this->model_factory = $factory;
   }
   // ... snip ...

   public function action_foobar()
   {
      $user = $this->model_factory->build_object( 'User' );
      $mapper = $this->model_factory->build_mapper( 'User' );

      $user->set_id(42);
      $mapper->fetch($user);

      if ( $user->hasWarning()  )
      {
          $user->set_status( 'locked' );
      }

      $mapper->store( $user );
   }

   // ... snip ...
}

As you see, there is no indication how the data was stored. It does not even matter if user account was new, or already existing.

Some materials you might find useful

Videos

Books:

I have two tables players and scores.

I want to generate a report that looks something like this:

player    first score             points
foo       2010-05-20              19
bar       2010-04-15              29
baz       2010-02-04              13

Right now, my query looks something like this:

select p.name        player,
       min(s.date)   first_score,
       s.points      points    
from  players p    
join  scores  s on  s.player_id = p.id    
group by p.name, s.points

I need the s.points that is associated with the row that min(s.date) returns. Is that happening with this query? That is, how can I be certain I'm getting the correct s.points value for the joined row?

Side note: I imagine this is somehow related to MySQL's lack of dense ranking. What's the best workaround here?

Most RDMBs won't even let you include non aggregate columns in your SELECT clause when using GROUP BY. In MySQL, you'll end up with values from random rows for your non-aggregate columns. This is useful if you actually have the same value in a particular column for all the rows. Therefore, it's nice that MySQL doesn't restrict us, though it's an important thing to understand.

A whole chapter is devoted to this in SQL Antipatterns.

I have a dataset of rows each with an 'odds' number between 1 and 100. I am looking to do it in the most efficient way possible. The odds do not necessarily add up to 100.

I have had a few ideas.

a) Select the whole dataset and then add all the odds up and generate a random number between 1 and that number. Then loop through the dataset deducting the odds from the number until it is 0.

I was hoping to minimize the impact on the database so I considered if I could only select the rows I needed.

b)

SELECT * FROM table WHERE (100*RAND()) < odds

I considered LIMIT 0,1

But then if items have the same probability only one of the will be returned

Alternatively take the whole dataset and pick a random one from there... but then the odds are affected as it becomes a random with odds and then a random without odds thus the odds become tilted in favour of the higher odds (even more so).

I guess I could order by odds ASC then take the whole dataset and then with PHP take a random out of the rows with the same odds as the first record (the lowest).

Seems like a clumsy solution.

Does anyone have a superior solution? If not which one of the above is best?

If you have an index on the odds column, and a primary key, this would be very efficient:

SELECT id, odds FROM table WHERE odds > 0

The database wouldn't even have to read from the table, it would get everything it needed from the odds index.

Then, you'll select a random value between 1 and the number of rows returned.

Then select that row from the array of rows returned.

Then, finally, select the whole target row:

SELECT * FROM table WHERE id = ?

This assures an even distribution between all rows with an odds value.


Alternatively, put the odds in a different table, with an autoincrement primary key.

Odds
ID     odds
1      4
2      9
3      56
4      12

Store the ID foreign key in the main table instead of the odds value, and index it.

First, get the max value. This never touches the database. It uses the index:

SELECT MAX(ID) FROM Odds

Get a random value between 1 and the max.

Then select the record.

SELECT * FROM table
JOIN Odds ON Odds.ID = table.ID
WHERE Odds.ID >= ?
LIMIT 1

This will require some maintenance if you tend to delete Odds value or roll back inserts to keep the distribution even.

There is a whole chapter on random selection in the book SQL Antipatterns.

I've been using Rails for a few years and I've grown used to the convention of naming the primary key column id. But I've run across lots of examples in SQL books that name the primary key column something like employee_id for an employees table or feed_id for a feeds table.

One advantage of the 2nd system seems to be that you can use USING() more to produce more concise SQL queries:

select feeds.title, items.title from items inner join feeds USING(feed_id);

As opposed to

select feeds.title, items.title from items inner join feeds on feeds.id = items.feed_id;

Which naming convention is better? Which is favored by experienced database administrators?

Also, is it better to pluralize the name of the table?

Tablename_Id is my strong preference. When you do joins to Fks you know exactly what to join to what and don't make mistakes where you join to ID in table a when you meant tableb below is an example of how easy this is to do especially if you copy the on clause from somewhere else

FROM tablea a
JOIN tableb b
  ON a.ID = b.tableaid
JOIN tablec c
  ON a.ID = c.tablebid

In the case above, you really wanted to join to B.Id but forgot to change it from a when you copied. It will work and give you a resultset that isn't correct. If you use table_id instead, the query would fail the syntax check.

Another problem with using Id is when you are doing complex reports. Since the repport queries have to have fields with individual names, you can end up wasting time writing a bunch of aliases you wouldn't need if you had named the id with the tablename.

Now people who use ORMs don't write a lot of SQl but what they do write and what report writers write are generally complex, complicated statements. You need to design you database to make it easier to do those things than simple queries.

The use of ID as the name for the identifying field is considered a SQl antipattern. http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=sr_1_1?s=books&ie=UTF8&qid=1308929815&sr=1-1

Structure:

Actor <=== ActorMovie ===> Movie

ActorMovie: ActorID (fk), MovieId (fk)... ===> pk: (ActorID, MovieID)

Should do I create a surrogate key for ActorMovie table like this?

ActorMovie: ActorMovieID (pk), ActorID (fk), MovieId (fk)...

Conventions are good if they are helpful

"SQL Antipatterns", Chapter 4, "ID Required"

Intention of primary key

Primary key - is something that you can use to identify your row with it's unique address in table. That means, not only some surrogate column can be primary key. In fact, primary key should be:

  • Unique. identifier for each row. If it's compound, that means, every combination of column's values must be unique
  • Minimal. That means, it can't be reduced (i.e. if it's compound, no column could be omitted without losing uniqueness)
  • Single. No other primary key may be defined, each table can have only one primary key

Compound versus surrogate

There are cases, when surrogate key has benefits. Most common problem - if you have table with people names. Can combination of first_name + last_name + taxpayer_id be unique? In most cases - yes. But in theory, there could be cases, when duplicated will occur. So, this is the case, when surrogate key will provide unique identifying of rows in any case.

However, if we're talking about many-to-many link between tables, it's obvious, that linking table will always contain each pair once. In fact, you'll even need to check if duplicate does not exist before operating with that table (otherwise - it's redundant row, because it holds no additional information unless your design has a special intention to store that). Therefore, your combination of ActorID + MovieID satisfies all conditions for primary key, and there's no need to create surrogate key. You can do that, but that will have little sense (if not at all), because it will have no meaning rather than numbering rows. In other hand, with compound key, you'll have:

  • Unique check by design. Your rows will be unique, no duplicates for linking table will be allowed. And that has sense: because there's no need to create a link if it already exists
  • No redundant (and, thus, less comprehensive) column in design. That makes your design easier and more readable.

As a conclusion - yes, there are cases, when surrogate key should (or even must) be applied, but in your particular case it will definitely be antipattern - use compound key.

References:

Am working on a project that will save many types of objects into the database. These include: articles, polls, writers .. etc. Some of those are unknown to me at design time so what i am trying to do is build a generic table schema that will enable me to save any type of item into this database.

My design

Table Object

objectID int
title varchar
body text
type int #will represent the ID of the type of object am inserting

Having that circumstances or requirements might need additional data, i decided that this will go to what i call "object metadata" as follows:

Table Object_MetaData

metaID int
metaKey varchar
metaIntKey int #representing an integer value of metaKey which is calculated using some algorithm to speed up queries
metaValue varchar(1000)

I am doing all the work in PHP and MySQL btw.

A couple of things came up to my mind when doing this design regarding the performance of the database on the long run:

1) Is saving int, boolean, small text, big text into the metaValue effecient?

2) on the long run the metadata table is going to grow quite quickly. Is this a maintainance nightmare? how is MySQL going to to handle this?

3) IN php when fetching objects, i will have to loop each object to fetch its metadata, or load all metadata once using lazy loading or load any single metakey once its requested via __get magic method. So when fetching a list of say 50 objects, at least 50 select statements will be executed when using any of the methods i suggested. Is there a better more efficient way of doing this?

Any thoughts or comments on this design are welcome.

I'd suggest re-architecting your solution. This pattern is called "Entity-Attribute-Value" (EAV), and is commonly seen as an anti-pattern. Instead, define the metadata attributes in columns (of the same table, or another if needed). Use the proper data types as well.

I'd suggest giving Bill Karwin's book SQL AntiPatterns a read. It has some great insights that will give you a hand here specifically. You can get a feel for some of it from this presentation...

I am trying to find a proper way of getting X random records. Also I want to check usage of a specific record so I won't use the same random record as often as others.

I am testing the set with these 3 tables, one table for questions, one table for users, and one table for the served question for a specific user. I want to make this perform with around 6000 questions.

CREATE TABLE `questions` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `question` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


CREATE TABLE `served` (
  `user` int(11) NOT NULL DEFAULT '0',
  `question` int(11) NOT NULL DEFAULT '0',
  `count` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`user`,`question`),
  KEY `count` (`count`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `nickname` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The query I found to work very well with getting random records from the question table is as following:

SELECT  id, question
        FROM    (
                SELECT  @cnt := COUNT(*) + 1,
                        @lim := 10
                FROM    questions
                ) vars
        STRAIGHT_JOIN
                (
                SELECT  q.*,
                        @lim := @lim - 1
                FROM    questions q 
                WHERE   (@cnt := @cnt - 1)
                        AND RAND() < @lim / @cnt
                ) i

But now I would like to incorporate the served table to make sure the random values being picked from the questions that were served the least. The query I thought of was as following:

SELECT  id, question, count
        FROM    (
                SELECT  @cnt := COUNT(*) + 1,
                        @lim := 10
                FROM    questions
                ) vars
        STRAIGHT_JOIN
                (
                SELECT  q.*,
                        s.count,
                        @lim := @lim - 1
                FROM    questions q
                LEFT JOIN served s
                ON s.question = q.id
                WHERE   (@cnt := @cnt - 1)
                        AND RAND() < @lim / @cnt
                ORDER BY count ASC) i

The problem with this query is that it never gives my limit of 10 results + it never gives the records I would want. Could anyone push me in the right direction?

As requested a SQL Fiddle with some data to test with: http://sqlfiddle.com/#!2/3e5ed/5. I would expect the results to be 10 questions where the "count" of served for user 1 is the least (or not existing offcourse).

I have ended up using a modified query, it had to be quick:

SELECT q.*, s1.count AS count_a, s2.count AS count_b
FROM questions q
LEFT JOIN served s1
ON (s1.question = q.id AND s1.user = 1)
LEFT JOIN served s2
ON (s2.question = q.id AND s2.user = 2)
WHERE q.categorie = 1
ORDER BY IFNULL(s1.count, 0) + IFNULL(s2.count, 0) + RAND()
LIMIT 10

A common way that people get random records in MySQL is like this:

To get 10 random records:

SELECT * FROM questions
ORDER BY RAND()
LIMIT 10

Of course, as may be obvious, this gets all the records in the database, then sorts them randomly in order to get 10 records. It doesn't actually just choose 10 random records from the database. However, this method does easily prevent duplicates.

Now, using the same technique, if you wanted to favor less served questions, you could do something like this:

SELECT questions.* FROM questions
LEFT JOIN served
ON served.question = questions.id
ORDER BY IFNULL(served.count, 0) + RAND()
LIMIT 10

Tweak the algorithm to alter the amount you favor the serve count.

There are more performant ways to fetch random records, such as getting the maximum primary key value (assuming auto_increment) then use RAND() on that, then pick just one record. You can use LIMIT 1 just in case RAND() returns a gap in your keys. However, then you could have duplicates if you repeat this process to return more than one record.

If you have contiguous auto_increment values, you could easily leverage PHP to choose a random set of keys then fetch each record individually. If they're not contiguous, you first fetch a list of keys.

These techniques are covered in more detail in Chapter 16 Random Selection, in the book SQL Antipatterns.

I have heard a lot about anti patterns and would like to read a book on this,which book would you suggest for Anti patterns.

SCHEMA

I have the following set-up in MySQL database:

CREATE TABLE items (
  id SERIAL,
  name VARCHAR(100),
  group_id INT,
  price DECIMAL(10,2),
  KEY items_group_id_idx (group_id),
  PRIMARY KEY (id)
);

INSERT INTO items VALUES 
(1, 'Item A', NULL, 10),
(2, 'Item B', NULL, 20),
(3, 'Item C', NULL, 30),
(4, 'Item D', 1,    40),
(5, 'Item E', 2,    50),
(6, 'Item F', 2,    60),
(7, 'Item G', 2,    70);

PROBLEM

I need to select:

  • All items with group_id that has NULL value, and
  • One item from each group identified by group_id having the lowest price.

EXPECTED RESULTS

+----+--------+----------+-------+
| id | name   | group_id | price |
+----+--------+----------+-------+
|  1 | Item A |     NULL | 10.00 | 
|  2 | Item B |     NULL | 20.00 | 
|  3 | Item C |     NULL | 30.00 | 
|  4 | Item D |        1 | 40.00 | 
|  5 | Item E |        2 | 50.00 | 
+----+--------+----------+-------+

POSSIBLE SOLUTION 1: Two queries with UNION ALL

SELECT id, name, group_id, price FROM items
WHERE group_id IS NULL
UNION ALL
SELECT id, name, MIN(price) FROM items
WHERE group_id IS NOT NULL
GROUP BY group_id;

/* EXPLAIN */
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| id | select_type  | table      | type | possible_keys      | key                | key_len | ref   | rows | Extra                                        |
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
|  1 | PRIMARY      | items      | ref  | items_group_id_idx | items_group_id_idx | 5       | const |    3 | Using where                                  | 
|  2 | UNION        | items      | ALL  | items_group_id_idx | NULL               | NULL    | NULL  |    7 | Using where; Using temporary; Using filesort | 
| NULL | UNION RESULT | <union1,2> | ALL  | NULL               | NULL               | NULL    | NULL  | NULL |                                              | 
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+

However it is undesirable to have two queries since there will be more complex condition in WHERE clause and I would need to sort the final results.

POSSIBLE SOLUTION 2: GROUP BY on expression (reference)

SELECT id, name, group_id, MIN(price) FROM items
GROUP BY CASE WHEN group_id IS NOT NULL THEN group_id ELSE RAND() END;

/* EXPLAIN */
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | items | ALL  | NULL          | NULL | NULL    | NULL |    7 | Using temporary; Using filesort | 
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+

Solution 2 seems to be faster and simple to use but I'm wondering whether there is a better approach in terms of performance.

UPDATE:

According to documentation referenced by @axiac, this query is illegal in SQL92 and earlier and may work in MySQL only.

According to this answer by @axiac, better solution in terms of compatibility and performance is shown below.

It is also explained in SQL Antipatterns book, Chapter 15: Ambiguous Groups.

To improve performance, combined index is also added for (group_id, price, id).

SOLUTION

SELECT a.id, a.name, a.group_id, a.price
FROM items a
LEFT JOIN items b 
ON a.group_id = b.group_id 
AND (a.price > b.price OR (a.price = b.price and a.id > b.id))
WHERE b.price is NULL;

See explanation on how it works for more details.

By accident as a side-effect this query works in my case where I needed to include ALL records with group_id equals to NULL AND one item from each group with the lowest price.

RESULT

+----+--------+----------+-------+
| id | name   | group_id | price |
+----+--------+----------+-------+
|  1 | Item A |     NULL | 10.00 | 
|  2 | Item B |     NULL | 20.00 | 
|  3 | Item C |     NULL | 30.00 | 
|  4 | Item D |        1 | 40.00 | 
|  5 | Item E |        2 | 50.00 | 
+----+--------+----------+-------+

EXPLAIN

+----+-------------+-------+------+-------------------------------+--------------------+---------+----------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys                 | key                | key_len | ref                        | rows | Extra                    |
+----+-------------+-------+------+-------------------------------+--------------------+---------+----------------------------+------+--------------------------+
|  1 | SIMPLE      | a     | ALL  | NULL                          | NULL               | NULL    | NULL                       |    7 |                          | 
|  1 | SIMPLE      | b     | ref  | PRIMARY,id,items_group_id_idx | items_group_id_idx | 5       | agi_development.a.group_id |    1 | Using where; Using index | 
+----+-------------+-------+------+-------------------------------+--------------------+---------+----------------------------+------+--------------------------+

All popular SQL databases, that I am aware of, implement foreign keys efficiently by indexing them.

Assuming a N:1 relationship Student -> School, the school id is stored in the student table with a (sometimes optional) index. For a given student you can find their school just looking up the school id in the row, and for a given school you can find its students by looking up the school id in the index over the foreign key in Students. Relational databases 101.

But is that the only sensible implementation? Imagine you are the database implementer, and instead of using a btree index on the foreign key column, you add an (invisible to the user) set on the row at the other (many) end of the relation. So instead of indexing the school id column in students, you had an invisible column that was a set of student ids on the school row itself. Then fetching the students for a given school is a simple as iterating the set. Is there a reason this implementation is uncommon? Are there some queries that can't be supported efficiently this way? The two approaches seem more or less equivalent, modulo particular implementation details. It seems to me you could emulate either solution with the other.

In my opinion it's conceptually the same as splitting of the btree, which contains sorted runs of (school_id, student_row_id), and storing each run on the school row itself. Looking up a school id in the school primary key gives you the run of student ids, the same as looking up a school id in the foreign key index would have.

edited for clarity

You seem to be suggesting storing "comma separated list of values" as a string in a character column of a table. And you say that it's "as simple as iterating the set".

But in a relational database, it turns out that "iterating the set" when its stored as list of values in a column is not at all simple. Nor is it efficient. Nor does it conform to the relational model.

Consider the operations required when a member needs to be added to a set, or removed from the set, or even just determining whether a member is in a set. Consider the operations that would be required to enforce integrity, to verify that every member in that "comma separated list" is valid. The relational database engine is not going to help us out with that, we'll have to code all of that ourselves.

At first blush, this idea may seem like a good approach. And it's entirely possible to do, and to get some code working. But once we move beyond the trivial demonstration, into the realm of real problems and real world data volumes, it turns out to be a really, really bad idea.

The storing comma separated lists is all-too-familiar SQL anti-pattern.

I strongly recommend Chapter 2 of Bill Karwin's excellent book: SQL Antipatterns: Avoiding the Pitfalls of Database Programming ISBN-13: 978-1934356555


(The discussion here relates to "relational database" and how it is designed to operate, following the relational model, the theory developed by Ted Codd and Chris Date.)

"All nonkey columns are dependent on the key, the whole key, and nothing but the key. So help me Codd."


Q: Is there a reason this implementation is uncommon?

Yes, it's uncommon because it flies in the face of relational theory. And it makes what would be a straightforward problem (for the relational model) into a confusing jumble that the relational database can't help us with. If what we're storing is just a string of characters, and the database never needs to do anything with that, other than store the string and retrieve the string, we'd be good. But we can't ask the database to decipher that as representing relationships between entities.

Q: Are there some queries that can't be supported efficiently this way?

Any query that would need to turn that "list of values" into a set of rows to be returned would be inefficient. Any query that would need to identify a "list of values" containing a particular value would be inefficient. And operations to insert or remove a value from the "list of values" would be inefficient.

Here is the simplified table:

id - company_id - report_year - code

1  - 123456     - 2013        - ASD  
2  - 123456     - 2013        - SDF  
3  - 123456     - 2012        - ASD  
4  - 123456     - 2012        - SDF 

I would like to get all codes for the highest report_year available for the specified company_id.

So I should get:

1 - 123456 - 2013 - ASD  
2 - 123456 - 2013 - SDF

But I can not hard code WHERE year = 2013, because for some company latest report year may be 2012 or 2009 for example. So I need to get data based on the latest year available.

So far I have query like this:

SELECT id, company_id, report_year, code,
FROM `my_table`
WHERE company_id= 123456

I have tried with some mixtures of group by and max() but I couldn't get what I need, this is the first time I am facing such a request, its confusing.

Any ideas ? I am using mysql.

Often, an anti-join yields better performance than using subqueries:

SELECT t1.id, t1.company_id, t1.report_year, t1.code
FROM `my_table` t1
LEFT JOIN `my_table` t2
ON t2.company_id = t1.company_id AND t2.report_year > t1.report_year
WHERE t1.company_id = 123456 AND t2.report_year IS NULL

For best performance, ensure you have a multi-column index on (company_id, report_year).

You can read more about this technique in the book SQL Antipatterns, which is where I learned it.

I am a beginner but know the basics and am peering into more advanced data mining and stored procedure routines. I have learned small concepts that mimic C# Design Patterns such as looping structures but have not seen much (on the web) about SQL Design Patterns.

I ask because I ran across this book http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 but have also always been told that you will find better information on sites like Stack than in a book.

I've been told that for programming professionals Design Patterns are a must. Is this also the case for SQL programmers?

*(Wasn't sure if this belonged on Meta or not. It's not a question about the site but is a general discussion question)

Design patterns for SQL are very useful as well. Is it any good to gather data if you do not store it properly and retrieve it back to make useful decisions?

I have found Joe Celco's books very useful on database design patterns. http://www.amazon.com/Joe-Celko/e/B000ARBFVQ

Of course you should read books. In the first place they tend to be written by people recognized for their expertise (not just anyone who happens to log into the site) so the information is likelier to start out as more accurate. Then they have editors who help make the presentation of the information better. Finally a good book should discuss in more depth than an SO post can do and thus you will get information at a deeper level than just the code to fix your current problem. This means your understanding will grow and you will know why you would do X vice Y and be able to expand that understanding to new problems.

DBAs who don't read books probably don't understand the internals of how the database works very well and are likely to be less effective than those who do read in depth about their profession.

A book I recommend is:

http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=pd_sim_b_1

Also books on performance tuning are critical to read if you want to use good patterns in your SQL. A huge number of performance problems are caused by badly designed SQL. YOu should know what works well and what doesn't. Those tend to be databse backend specific, so look for ones realting to the type of databases you support.

I am aware of the MySQL function IN(), which tries to match a single column value with an array of given values ( in this case we're using a PHP codebase and $ids = [1,2,3,4]; ):

SELECT * FROM galleries WHERE id IN ($ids)

If my column 'id' has multiple values in it, with each value being separated by a comma, e.g. 1,3,4,6,14 ... how can I write a SQL statement to compare all the values in my $ids array with all the values in this column?

In order to get this to work, you are going to need to split your array $ids into individual values.

If the match criteria is that every value in your array appears in the column, you'll need to check each value. You may be able to make use the MySQL FIND_IN_SET function.

Reference: https://dev.mysql.com/doc/refman/5.5/en/string-functions.html#function_find-in-set

For example, let's assume that your array $ids contains four elements: '2','3','5','7'.

To check whether each of those values is contained within another string, you'd need SQL that does something like this:

WHERE FIND_IN_SET( '2', '1,2,3,4,5,6,7,8')
  AND FIND_IN_SET( '3', '1,2,3,4,5,6,7,8')
  AND FIND_IN_SET( '5', '1,2,3,4,5,6,7,8')
  AND FIND_IN_SET( '7', '1,2,3,4,5,6,7,8')

I used a literal string as the second argument in the FIND_IN_SET there to show what was going on, that literal represents the "comma separated list of values" stored your column. Obviously, your query would replace that literal with a reference to your column:

WHERE FIND_IN_SET( '2', t.mycol)
  AND FIND_IN_SET( '3', t.mycol)
  AND FIND_IN_SET( '5', t.mycol)
  AND FIND_IN_SET( '7', t.mycol)

You could also use a LIKE function, if you add leading and trailing commas to the column, and the value

WHERE CONCAT(',','1,2,3,4,5,6,7,8',',') LIKE CONCAT('%,','2',',%')
  AND CONCAT(',','1,2,3,4,5,6,7,8',',') LIKE CONCAT('%,','3',',%')
  AND CONCAT(',','1,2,3,4,5,6,7,8',',') LIKE CONCAT('%,','5',',%')
  AND CONCAT(',','1,2,3,4,5,6,7,8',',') LIKE CONCAT('%,','7',',%')

Again, the literal of the "list of comma separated values" represents your column, the individual values '2', '3', '5', '7' represent the values from the elements of your array $ids.

(If a "match" requires finding just one of the values in the list, and not all values, then replace the AND with OR.)

If you are looking for an "exact match" of the elements; that is, there shouldn't be any values in the "comma separated list of values" that are not also a value in the array, you'd need to combine the elements from the array into a comma separated list, so the query would be something like this:

 WHERE '2,3,5,7' = '1,2,3,4,5,6,7,8'

If the order of the values stored in the column isn't canonical, then you'd need to check all the possible permutations...

 WHERE '2,3,5,7' = '1,2,3,4,5,6,7,8'
    OR '2,5,3,7' = '1,2,3,4,5,6,7,8'
    OR '2,5,7,3' = '1,2,3,4,5,6,7,8'
 ... 

For a lot of values, I'd probably go with making both "lists of values" canonical (ordering the values in each list.)


If this looks ugly, it's because it is ugly. And it's going to be horrible performance on large sets. SQL isn't designed to handle manipulating a "set" as a comma separated list. It's designed to handle a "set" of values as a set of rows.

Bill Karwin has an excellent book available. Chapter 2 covers issues with the "comma separated list" pattern.

http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557

I have a query where it takes my total sales and subtracts by my canceled sales, however if there are no canceled sales the value is null. then when the

so basically if my canceledsales where null it returns a null value for the subtraction column instead of the totalsales

totalsales-canceledsales(null) = null
1000-null=null

i want it to be like this

1000-null=1000

The issue is that any function against NULL is NULL. In this case, NULL means unknown.

If you subtract a number from an unknown number, the result is unknown.

If you subtract an unknown number from a known number, the result is unknown.

Both numbers must be known in order to return a known answer. In this case if one of the operands is NULL, NULL is the correct answer.

However, if you'd rather see 0 than NULL, then use ISNULL():

totalsales - isnull(canceledsales, 0)

The book SQL AntiPatterns has a whole chapter better explaining NULL.

I have the following data structure already in the system.

ItemDetails:

ID Name
--------
1  XXX
2  YYY
3  ZZZ
4  TTT
5  UUU
6  WWW

And the hierarchies are in separate table (with many to many relationships)

ItemHierarchy:

ParentCode ChildCode
--------------------
1            2
1            3
3            4
4            5
5            3
5            6

As you can see that 3 is child node for 1 and 3. I want to traverse records say for example that from the node 3.

I need to write a stored procedure and get all the ancestors of 3 and all the child nodes of 3.

Could you please let me know whether any possibilities to pull the data? If so, which data structure is OK for it.

Please note that my table is containing 1 million records and out of it 40% are having multiple hierarchies.

I did 'CTE' with level and incrementing it based upon the hierarchy but I'm getting max recursive error when we traverse from root to leaf level node. I have tried 'HierarchyID' but unable to get all the details when its having multiple parent for a node.

Update: I can set a recursion limit to max and run the query. Since it has millions of rows, I'm unable to get the output at all.

I want to create a data structure such that its capable to giving information from top to bottom or bottom to top (at any node level).

Could someone kindly please help me with that?

Using RDBMS for hierarchical data structure is not recommended, its why graph database have been created.

BTW following Closure Table pattern will help you.
The Closure Table solution is a simple and elegant way of storing hierarchies. It involves storing all paths through the tree, not just those with a direct parent-child relationship.

The key point to use the pattern is how you must fill ItemHierarchy table.
Store one row in this table for each pair of nodes in the tree that shares an ancestor/descendant relationship, even if they are separated by multiple levels in the tree. Also add a row for each node to reference itself.
Think we have a simple graph like bellow: enter image description here
The doted arrows shows the rows in ItemHierarchy table: enter image description here
To retrieve descendants of #3:

SELECT c.*
FROM ItemDetails AS ID
JOIN ItemHierarchy AS IH ON ID.ID = IH.ChildCode
WHERE IH.ParentCode = 3;

To retrieve ancestors of #3:

SELECT c.*
FROM ItemDetails AS ID
JOIN ItemHierarchy AS IH ON ID.ID = IH.ParentCode 
WHERE IH.ChildCode = 3;

To insert a new leaf node, for instance a new child of #5, first insert the self-referencing row. Then add a copy of the set of rows in TreePaths that reference comment #5 as a descendant (including the row in which #5 references itself), replacing the descendant with the number of the new item: INSERT INTO ItemHierarchy (parentCode, childCode)

SELECT IH.parentCode, 8
FROM ItemHierarchy AS IH
WHERE IH.childCode = 5
UNION ALL
SELECT 8, 8;

To delete a complete sub-tree, for instance #4 and its descendants, delete all rows in ItemHierarchy that reference #4 as a descendant, as well as all rows that reference any of #4’s descendants as descendants:

DELETE FROM ItemHierarchy 
WHERE chidCode IN (SELECT childCode
FROM ItemHierarchy 
WHERE parrentCode = 4);



UPDATE
Since the sample data you have shown us leads to recursive loops(not hierarchies) like:

1 -> 3 -> 4 -> 5 -> 3 -> 4 -> 5

Following Path Enumeration pattern will help you.
A UNIX path like /usr/local/lib/ is a path enumeration of the file system, where usr is the parent of local, which in turn is the parent of lib.
You can create a Table or View from ItemHierarchy table, calling it EnumPath:
Table EnumPath(NodeCode, Path)

For the sample data we will have:
enter image description here
To find ancestors of node #4:

select distinct E1.NodeCode from EnumPath E1
inner join EnumPath E2
On E2.path like E1.path || '%'
where E2.NodeCode = 4 and E1.NodeCode != 4;

To find descendants of node #4:

select distinct E1.NodeCode from EnumPath E1
inner join EnumPath E2
On E1.path like E2.path || '%'
where E2.NodeCode = 4 and E1.NodeCode != 4;

Sqlfiddle demo

Table 1

ID
1
2
3

Table 2

ID    date    opt
1    1/1/10    1
1    1/2/10    0
2    1/1/10    1

I Want

ID    date    opt
1    1/2/10    0
2    1/1/10    1

How do I join these 2 tables? Just match all the ID's in table 1 with their most recent opt in table 2. Without partitions, please. I'm in sql 2005. Thanks.

Here's the solution without subqueries:

SELECT t1.ID, t2.date, t2.opt
FROM Table1 t1
LEFT JOIN Table2 t2
  ON t2.ID = t1.ID
LEFT JOIN Table2 t3
  ON t3.ID = t1.ID AND t3.date > t2.date
WHERE t3.date IS NULL

You could get duplicates if you have two entries in table 2 with the same "latest" date and ID. However, you could add additional conditions to handle two entries with the same "latest" date. Also, you will get NULL values for date and opt if there is no corresponding record in table 2.

There is a whole chapter about this type of solution in the book SQL Antipatterns.

When developing stored procedure, the code is typically far from clean. It is often impractical to break down the stored procedure code into single responsibility (CTE's, long select lists, breaking down adds even more functions/procedures and complexity). PL/SQL provide packages, but there is not an equivalent in SQL Server and I minimal experience with those to weigh in on how well they help with the Clean Code problem.

One of my projects relies heavily on stored procedure, with a great many of these having more than 500 lines of code.

After looking at a presentation on Clean Code, I was wondering if there is any advice on applying those concepts to the database.

Succinctly, what are some means (tools, methods) used to make database code more readable and maintainable given its tendency to demand large masses of code and when the project is vested in a database heavy model?

Personally I would not worry about Clean Code in database terms. What you need to worry about is data integrity and performance and security.

I believe as you get more familiar with the database model, it will become easier to understand those long procs. Some things I do though to help troubleshoot later are: If the proc has dynamic sql always write it with the debug variable mode that you can use to grab the SQL code that is built. This will save hours of troubleshooting time.

If the proc does a lot of work make sure to put it in transactions with Try Catch blocks so that everything gets rolled back if there is a problem. You can also take advantage of the fact that rollbacks don't affect table variables and store debugging and error info you will need in an exception table after the rollback. That will make it much easier on prod to see what happens when it fails.

Try to be consistent in your spacing and structure. There are various tools you can use to format. And consistent aliasing helps too. Alias every field referenced because 6 months later you may not remember which one it came from easily. Personally I find it much easier to understand long procs if all the inner joins are before the left joins. This lets me easily see what tables must have data and if the results are wrong it is often one of these that is to blame because it should be a left join.

Be consistent in your parameter data types. Make them match the database datatype of the field you will reference with that parameter.

Store everything in source control and ruthlessly delete all commented out code. If a parameter is not being used, get rid of it. If something is hardcoded, you may want to convert it to a parameter particularly if it is being used more than once.

If you have some business logic which might not be easy to figure out later, put in a comment explaining it.

Try to build new procs in sections, so that you can easily test the results as you go. So for instance suppose you are doing a financial report of all returned orders. You might first write a cte or temp table or table variable to get all the returns. Then you might join to that to get the financial records for just those returns and aggregate them. Then finally you would join all that to other details that you need in the report. So you can check by running just the first part and seeing that there are 2098 returns in the period. After you have aggregated the financial information, you should have one record for each of those 2098 returns. Then at then end you can see of you still have all the needed records. This helps to make sure you didn't lose any along the way by joining unless the data is such that you should have and you understand why you did. You can also see if you are getting extra records from joins to one to many tables which may or may not be appropriate (which is why understanding the data model is crucial).

If you are going to write complex sql, you need to read some books on performance tuning (learning which constructs like correlated subqueries and cursors and what to use instead) to avoid using and how to make sure your where clauses are sargable. There are several books available for SQL Server performance tuning. You also need to read the book on SQL Antipatterns which will be of much more help than the clean code books: http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=sr_1_1?ie=UTF8&qid=1406570810&sr=8-1&keywords=sql+antipatterns

Codeigniter when i submit more than one option of form_multiselect(), Only just the last one that saved on database.

in my view :

<label>Trimestres :</label>
    <div class="controls" >

      <?php $options = array(
            'trim1'  => ' Premier trimestre (Janv,Fév,Mars)',
            'trim2'    => ' Deuxiéme trimestre (Avril,Mai,Juin)',
            'trim3'   => ' Troisiéme trimestre (Juill,Aout,Sept)',
            'trim4' => ' Quatriéme trimestre (Oct,Nov,Déc)',
             );
     echo form_multiselect('trimestres', $options , $this->input->post('trimestres') ? $this->input->post('trimestres') : $participant_sport->trimestres, 'id="trim"'); ?>

     </div>
</div>

in my controller :

public function inscriresport ($id = NULL)
{

// Fetch a participant or set a new one
if ($id) {
$this->data['participant_sport'] = $this->participantsport_m->get($id);
count($this->data['participant_sport']) || $this->data['errors'][] = 'participant non trouvé';
}
else {
$this->data['participant_sport'] = $this->participantsport_m->get_new();
    }


    // Process the form

$this->participantsport_m->array_from_post(array('matricule', 'nom', 'prenom', 'beneficiaire', 'sexe', 'telephone', 'date_naissance', 'date_inscription_sport', 'trimestres' ,'sport_montant_paye', 'sport_debut_periode', 'sport_fin_periode'));
$this->participantsport_m->save($data, $id);
redirect('admin/agent/profile/3608');
    }

    // Load the view
    $this->data['subview'] = 'admin/agent/inscriresport';
    $this->load->view('admin/_layout_main', $this->data);
}

The function array_from_post() is defined on application\core\MY_Model.php :

public function array_from_post($fields){
    $data = array();
    foreach ($fields as $field) {
        $data[$field] = $this->input->post($field);
    }
    return $data;
}

in my model :

public function get_new()

{

$participant_sport = new stdClass();

$participant_sport->matricule = '';
$participant_sport->nom = '';
$participant_sport->prenom = '';
$participant_sport->beneficiaire = '';
$participant_sport->sexe = '';
$participant_sport->telephone = '';
$participant_sport->date_naissance = '';
$participant_sport->date_inscription_sport = '';
$participant_sport->trimestres = '';
$participant_sport->sport_montant_paye = '';
$participant_sport->sport_debut_periode = '';
$participant_sport->sport_fin_periode = '';
  return $participant_sport;

}

Any help Please? i think that must be an array but i don't know how to do it?

i thing that i must do something like that :

foreach($_POST["strategylist[]"] as $s) {
    # do the insert here, but use $s instead of $_POST["strategylist[]"]
    $result=mysql_query("INSERT INTO sslink (study_id, strategyname) " .
       "VALUES ('$id','" . join(",",$s) . "')")
        or die("Insert Error: ".mysql_error());
}

to insert more than one option selected in one row but i don't know how to do it in codeigniter

the get() function :

public function get($id = NULL, $single = FALSE){

    if ($id != NULL) {
        $filter = $this->_primary_filter;
        $id = $filter($id);
        $this->db->where($this->_primary_key, $id);
        $method = 'row';
    }
    elseif($single == TRUE) {
        $method = 'row';
    }
    else {
        $method = 'result';
    }

    if (!count($this->db->ar_orderby)) {
        $this->db->order_by($this->_order_by);
    }
    return $this->db->get($this->_table_name)->$method();
}

If select name (in HTML tag) is trimestres it will always remember last selection. Use trimestres[] as a name to get array with all selected values`

<select name="trimestres[]" multiple …

By the way: I don't know how array_from_post() works but it has to change trimestres[] values to one string to save all of them in one column. It is hard to search/add/delete one value if all values are in one string. It is "SQL Antipattern". You could do another table in database for trimestres - one value in one row.

Edit:

It will change all arrays into string with elements connected by ,. Not tested.

public function array_from_post($fields){
    $data = array();
    foreach ($fields as $field) {

        // print_r($this->input->post($field));

        if( is_array( $this->input->post($field) ) ) {
            $data[$field] = join(",", $this->input->post($field));
        } else {
            $data[$field] = $this->input->post($field);
        }

        // print_r($data[$field]);

    }
    return $data;
}

Edit:

Not tested.

public function inscriresport ($id = NULL)
{

    // Fetch a participant or set a new one
    if ($id) {
        $this->data['participant_sport'] = $this->participantsport_m->get($id);
        count($this->data['participant_sport']) || $this->data['errors'][] = 'participant non trouvé';

        // explode to array
        // print_r($this->data['participant_sport']->trimestres); // test before explode

        // $this->data['participant_sport']['trimestres'] = explode(",", $this->data['participant_sport']['trimestres']);
        $this->data['participant_sport']->trimestres = explode(",", $this->data['participant_sport']->trimestres);

        // print_r($this->data['participant_sport']->trimestres); // test after explode

    } else {
        $this->data['participant_sport'] = $this->participantsport_m->get_new();
    }


    // rest of code 

}

I am currently trying to get values from tables I have in a webpage into a database using PHP. However, the order of the values inside each box is important, so I want the first box to be ranked 1 and the second box to be ranked 2 and so on... There is no limit to the number of items in each box. There may be as many as 100 or 0. Each item in each box is dragged from a bank of items into the table, which represents a topic. The tables are the output of the interface.

So example table:

  1. Rank1
  2. Rank2
  3. Rank3

I've currently tried dumping the entire page once the user fills it in into a text file and parsing it from there but i'm looking for a more functional and practical way of doing it.

If I understand your goal correctly, you have two sections on your webpage. A word bank and a <table>. A user can drag and drop items from the word bank to a cell in your <table>. Next you want the order of the contents of the cell to be preserved when the data is saved to a database-table.

Given the description you have provided, it seems you may be new to using databases. It is not a problem but you will need to do some study on how databases work and how to use them.

Again, if I understand correctly, your <table> looks like the following:

| topic-1 | topic-2 | topic-3 |
-------------------------------
| ---A--- | ---J--- | ---V--- |
| ---B--- | ---J--- | ---X--- |
| ---C--- | NULL | ---Y--- |
| NULL- | NULL | ---Z--- |

To move this into a database you need to create a database, and some table, for example MyDataTable. Next that table may have three columns: id, contents, category.

Here is some info on Mysql Insert.

INSERT INTO MyDataTable(`id`, `contents`, `category`)
VALUES (1, A, Topic-1)

You will need to format that call to work correctly via php, but that is the general idea. In your php code you will need to iterate over the elements in the order you want, and you can use the id to refer to your order... Alternatively you could add another column to your database-table and have that refer to your order, which would allow you to use some other value, perhaps auto-increment, as the primary key.

I would recommend becoming comfortable with MySQL if you aren't already, before tackling your project's specific needs. Namely, INSERT, DELETE, UPDATE, SELECT and WHERE will almost certainly be necessary MySQL functions.

One resource I found very helpful in grasping MySQL is this book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming .

Using the phpMyAdmin page can be very helpful in debugging SQL queries, once the query syntax is correct, you can translate it to php and incorporate your dynamic variables. Also if you store your sql-syntax in php as $sql, using var_dump($sql) can also be a great help when debugging sql syntax.

In programming, iterate over, typically refers to creating a for-loop to iterate, or step through 1-by-1, each of the items of an array or similar container.

$MyArray = array();
$MyArray = GetTableContents(); //You have to write this function.
for ($idx = 0; $idx < count($MyArray); $idx++)
{
 // do my stuff, like, Insert into my DB-table.
}

In order for the code above to work, you need to fill MyArray with the contents from your <table>. In the example code above, the $idx, could be used as your id.

In addition to MySQL, it seems it would also be worth your time to learn more about php programming. Read simple tutorials, how-to's, and books. There are numerous resources on the internet about php. One such example is here.

I hope this is helpful.

I have a database which has three tables

Messages - PK = MessageId
Drafts - PK = DraftId
History - FK = RelatedItemId

The History table has a single foreign Key [RelatedItemId] which maps to one of the two Primary keys in Messages and Drafts.

Is there a name for this relationship?

Is it just bad design?

Is there a better way to design this relationship?

Here are the CREATE TABLE statements for this question:

 CREATE TABLE [dbo].[History](
    [HistoryId] [uniqueidentifier] NOT NULL,
    [RelatedItemId] [uniqueidentifier] NULL,
    CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED ( [HistoryId] ASC )
 )

CREATE TABLE [dbo].[Messages](
    [MessageId] [uniqueidentifier] NOT NULL,
    CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED (    [MessageId] ASC )
 )


CREATE TABLE [dbo].[Drafts](
    [DraftId] [uniqueidentifier] NOT NULL,
    CONSTRAINT [PK_Drafts] PRIMARY KEY CLUSTERED (  [DraftId] ASC )
)

In a short description the solution you have used is called:
Polymorphic Association
Objective: Reference Multiple Parents
Resulting anti-pattern: Use dual-purpose foreign key, violating first normal form (atomic issue), loosing referential integrity
Solution: Simplify the Relationship

More information about the problem.

BTW createing a common super-table will help you:

enter image description here

I have a table , it has a column called id_numbers with comma separated numbers

ID | id_numbers 
_______________

1  | 1,2,3,4,5
2  | 2,3,4,5,6
3  | 3,4,5,6,7

I want to get unique id_numbers from every row of table. In my case it should be

1,2,3,4,5,6,7

I stated with

SELECT GROUP_CONCAT(id_numbers SEPARATOR ',') FROM my_table

but I am getting duplicate vales here, I searched but I found articles saying that I cannot do this only using MySQL. Is it possible to do this in MySQL? do I have to loop this with PHP using explode() and array_unique()

Please help, I need to do it only using MySQL. Thanks in Advance.

To generate a distinct list of "numbers" from all of your strings:

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.id_numbers,',',d.i+1),',',-1) AS n
  FROM mytable t
  JOIN ( SELECT 0 AS i
         UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
         UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
         UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
       ) d
    ON d.i <= ( CHAR_LENGTH(t.id_numbers) 
              - CHAR_LENGTH(REPLACE(t.id_numbers,',','')) )
 GROUP BY n
 HAVING n <> ''

To put that back into a comma separated list (ARRGGHH! WHY do we want to that?), we can use the query above as an inline view...

SELECT GROUP_CONCAT(q.n ORDER BY q.n+0) AS id_numbers
  FROM ( SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.id_numbers,',',d.i+1),',',-1) AS n
           FROM mytable t
           JOIN ( SELECT 0 AS i 
                  UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
                  UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
                  UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
                 ) d
              ON d.i <= ( CHAR_LENGTH(t.id_numbers) 
                        - CHAR_LENGTH(REPLACE(t.id_numbers,',','')) )
           GROUP BY n
           HAVING n <> ''
       ) q

This is essentially the same answer that was given to the question that Mark identified as a possible duplicate.

Rather than explain how all of that that works, I'm going to suggest that you AVOID storing comma separated lists of a values as a string.

I recommend you take a look at Chapter 2 of Bill Karwin's excellent book "SQL Antipatterns: Avoiding the Pitfalls of Database Programming".

http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557

Those articles you saw that said this can't be done in MySQL, those articles are essentially right... you can't do this in SQL because you do not want to do this in SQL. Doing it does violence to the relational database model.

I have been searching around for a bit about how I could select the rows inside my database with an array that matches some values with another array in another table.

I have a table called "users". Inside the table a column named "user_interests" contains an array with interests. An example could be "food, fashion, music" as a value.

In another table called "posts" contains a column named "post_keywords" with the value of another array like inside the "users" table.

Can anyone help me with find the right way of selecting the posts that matches with the users interests? I understand so far that I have to do this in the MySQL query.

Thanks in advance.

Before we answer the question that was asked, we should address a more important question: whether we should really be storing values in comma separated lists. That's a SQL antipattern.


<!-- That particular SQL Antipattern is explained pretty well in Chapter 2 of Bill Karwin's excellent tome: SQL Antipatterns: Avoiding the Pitfalls of Database Programming available from many fine booksellers, including Amazon

https://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557

-->

The chapter also addresses a more appropriate SQL Pattern, storing values as separate rows, each value on its own row. That's how SQL was designed to operate... on rows.


After considering the warnings, if you still have a need to search for values in a comma separated lists of values in a string...

MySQL provides a FIND_IN_SET function, which will return the position of a value within a list.

If a matching value is found, the function will return a positive integer.

If a matching value is not found, the function will return 0. If either of the arguments is NULL, the function will return NULL.

As a simple demonstration of how the function works:

  SELECT FIND_IN_SET('bowling' ,'food,fashion,music')    -- > 0
       , FIND_IN_SET('food'    ,'food,fashion,music')    -- > 1
       , FIND_IN_SET('fashion' ,'food,fashion,music')    -- > 2
       , FIND_IN_SET('music'   ,'food,fashion,music')    -- > 3

To make use of that in a query, MySQL allows us to use an expression in a boolean context...

  SELECT t.id
    FROM my_table t
   WHERE FIND_IN_SET('food'  ,t.my_comma_separated_list_col)
     AND FIND_IN_SET('music' ,t.my_comma_separated_list_col)

This would return rows from my_table that had values in my_comma_separated_list_col such as:

'food,music'
'bowling,dancing,food,music'
'music,food'

If the specified search value (e.g. 'food') is found in the list, FIND_IN_SET returns a positive integer, which evaluates to TRUE in a boolean context.

If the specified value is not found in the list, FIND_IN_SET returns a 0, which evaluates to FALSE.

A return of NULL is just a NULL, neither TRUE nor FALSE.


NOTES:

A NULL value will cause FIND_IN_SET to return NULL.

SELECT FIND_IN_SET(NULL  ,NULL)                        -- > NULL
     , FIND_IN_SET(NULL  ,'food,fashion,music')        -- > NULL 
     , FIND_IN_SET('food',NULL)                        -- > NULL

Leading and trailing spaces in the values in the list, or the value being searched for will not be found. Unless both the search value and the list value are an exact match including leading and trailing spaces. (So, just don't use any spaces between the values in the list):

Leading and/or trailing spaces in the list of values results in "not found"

SELECT FIND_IN_SET('bowling'  ,' food, fashion, music') -- > 0
     , FIND_IN_SET('food'     ,' food, fashion, music') -- > 0 !
     , FIND_IN_SET('fashion'  ,' food, fashion, music') -- > 0 !
     , FIND_IN_SET('music'    ,' food, fashion, music') -- > 0 !
;    
SELECT FIND_IN_SET('bowling'  ,'food ,fashion ,music ') -- > 0
     , FIND_IN_SET('food'     ,'food ,fashion ,music ') -- > 0 !
     , FIND_IN_SET('fashion'  ,'food ,fashion ,music ') -- > 0 !
     , FIND_IN_SET('music'    ,'food ,fashion ,music ') -- > 0 !
;

leading or trailing spaces in values cause "not found"

SELECT FIND_IN_SET('bowling'  ,'food,fashion,music')    -- > 0
     , FIND_IN_SET('food '    ,'food,fashion,music')    -- > 0 !
     , FIND_IN_SET(' fashion' ,'food,fashion,music')    -- > 0 !
     , FIND_IN_SET(' music '  ,'food,fashion,music')    -- > 0 !
;

I have the query:

SELECT * FROM
(
 SELECT
 a.*, 
 (CASE  concat(question_id, type)
 WHEN @curType 
 THEN @curRow := @curRow + 1
 ELSE @curROw := 1 AND @curType := concat(question_id, type) END) + 0 AS rank
 FROM ul_attempt_responses a
 INNER JOIN us_attempts b ON a.attempt_id = b.id
 WHERE b.user_id = 1  and response IS NOT NULL AND trim(response) != ''  and b.authenticated = 1 ORDER BY question_id DESC, type DESC, id DESC) aa
WHERE rank = 1
ORDER BY question_id ASC, type asc;

The above query should be give me the latest response, on the attempt_id and also just give me one, however, it is giving me 2 or more responses.

SQLFiddle: http://sqlfiddle.com/#!2/2cd26

Please help.

I need a query that will Get the last response of user for all attempts. If in attempt has a question answered, and the next attempt has no answer, get the response of the former attempt.

You can accomplish this much more efficiently with an anti-join:

SELECT ar.*
FROM us_users u
JOIN us_attempts a1
  ON a1.user_id = u.id
LEFT JOIN us_attempts a2
  ON a2.user_id = u.id
  AND a2.id < a1.id
JOIN ul_attempt_responses ar
  ON ar.attempt_id = a1.id
WHERE u.id = 1 AND a2.id IS NULL
ORDER BY question_id, type

If you want to learn more about this technique, check out the book SQL Antipatterns.

In a MySQL table i have a field, containing this value for a given record : "1908,2315,2316"

Here is my sql Query :

SELECT * FROM mytable WHERE 2316 IN (myfield) 

I got 0 results!

I tried this :

SELECT * FROM mytable WHERE 2315 IN (myfield) 

Still 0 results

And then i tried this :

SELECT * FROM mytable WHERE 1908 IN (myfield) 

Surprisingly i obtained the record when searching with 1908! What should i do to also obtain the record when searching with 2315 and 2316 ? What am i missing ?

Thanks

This question has been asked and answered before, but I don't want to hunt for it; this question should be closed as a duplicate. But, to answer your question:

The commas in the string, the column value, are just characters. Those are part of the string. They aren't seen as "separators" between values in the SQL text. The way SQL sees it, the column contains a single value, not a "list" of values.

So, in your query, the IN (field) is equivalent to an equals comparison. It's equivalent to comparing to a string. For example:

... WHERE 2316 = '1908,2315,2316'

And those aren't equal, so the row isn't returned. The "surprisingly" finding of a match, in the case of:

... WHERE 1908 IN ('1908,2315,2316')

that's explained because that string is being evaluated in a numeric context. That is, the comparison returns true, because all of these also true:

... WHERE 1908 = '1908,2315,2316' + 0
... WHERE 1908 = '1908xyz' + 0
... WHERE 1908 = '1907qrs' + 1

(When evaluated in a numeric context, a string gets converted to numeric. It just happens that the string evaluates to a numeric value that equals the integer value it's being comparing to.)


You may be able to make use of the MySQL FIND_IN_SET function. For example:

... WHERE FIND_IN_SET(2316,'1908,2315,2316')

But, please seriously reconsider the design of storing comma separated list. I recommend Bill Karwin's "SQL Antipatterns" book...

http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557

I need to select all records from assets_tbl (A). Included with these records I need to have any lightboxes (lightbox_name) that are linked to the asset_id from assets_tbl where author = "scott@co.com" OR authorized_viewers includes "scott@co.com"

I think this is close to what I need but it returns multiple rows of the same record:

SELECT
  A.*,
  C.lightbox_name,
  C.author,
  C.authorized_viewers
FROM
  media_tbl A
  LEFT JOIN lightbox_assets_tbl B ON A.asset_id = B.asset_id
  LEFT JOIN lightboxes_tbl C 
    ON B.lightbox_id = C.id
    AND C.author = "scott@co.com"
  LEFT JOIN lightboxes_tbl D ON D.authorized_viewers LIKE "scott@co.com"
ORDER BY A.id DESC

Here are the tables:

lightboxes_tbl
+-----+----------------+---------------+---------------------+
|id   |lightbox_name   |author         |authoried_viewers    |
+-----+----------------+---------------+---------------------+
|100  | aircraft-types |scott@co.com   |jon@co.com,aj@co.com |
|101  | maintenance    |nicole@co.com  |jon@co.com           |
|102  | ramp           |nicole@co.com  |scott@co.com         |
+-----+----------------+---------------+---------------------+



lightbox_assets_tbl
+-----+-------------+-------------+---------------+----------+
|id   |lightbox_id  |asset_name   |asset_path     | asset_id |
+-----+-------------+-------------+---------------+----------+
|1    |100          |a321.jpg     |project1/imgs/ | 3700     |
|2    |100          |b757.jpg     |project1/imgs/ | 3444     |
|3    |101          |FlyBy.swf    |project4/imgs/ | 1444     |
|4    |102          |Door_757.swf |project5/imgs/ | 3701     |
+-----+-------------+-------------+---------------+----------+


assets_tbl
+-----+---------------------+-------------------------------------+
|asset_id   |asset_name           | asset_location                |
+-----------+---------------------+-------------------------------+
|3700       |a321.jpg             |Libraries\Library_Media\Images |
|200        |757_Taxi.swf         |Libraries\Library_Media\Images |
|3444       |b757.jpg             |Libraries\Library_Media\Images |
|1444       |FlyBy.swf            |Libraries\Library_Media\Images |
|3701       |Door_757.swf         |Libraries\Library_Media\Images |
+----------+---------------------+--------------------------------+

Here are the expected RESULTS of the query:

+-----------+---------------------+-------------------------------+------------------+-------------+------------------------+
|asset_id   |asset_name           | asset_location                |lightbox_name     | author      | authorized_viewers     |
+-----------+---------------------+-------------------------------+------------------+-------------+------------------------+
|3700       |a321.jpg             |Libraries\Library_Media\Images |aircraft-types    |scott@co.com |jon@co.com,aj@co.com    |
+-----------+---------------------+-------------------------------+------------------+-------------+------------------------+
|200        |757_Taxi.swf         |Libraries\Library_Media\Images |NULL              |NULL         |NULL                    |
+-----------+---------------------+-------------------------------+------------------+-------------+------------------------+
|3444       |b757.jpg             |Libraries\Library_Media\Images |aircraft-types    |scott@co.com |jon@co.com,aj@co.com    |
+-----------+---------------------+-------------------------------+------------------+-------------+------------------------+
|1444       |FlyBy.swf            |Libraries\Library_Media\Images |NULL              |NULL         |NULL                    |
+-----------+---------------------+-------------------------------+------------------+-------------+------------------------+
|3701       |Door_757.swf         |Libraries\Library_Media\Images |ramp              |nicole@co.com|scott@co.com            |
+----------+---------------------+--------------------------------+------------------+-------------+------------------------+

SQL Fiddle

Thanks!

I'm wondering why you need two joins to the lightboxes_tbl table. It seems like the second reference to that table (alias D) is unnecessary. Seems like you could just use an OR.

As a demonstration, replicating the predicates in your query:

LEFT JOIN lightboxes_tbl C
       ON B.lightbox_id = C.id
      AND ( C.author = 'scott@co.com'
          OR C.authorized_viewers = 'scott@co.com'
          )

But given that authorized_user contains a comma separated list (ACCKKK!!!), I suspect you really want to look for an exact match to an item in the comma separated list. The LIKE comparison that currently have is equivalent to an equals comparison (to the entire contents of authorized_viewers column). You could add '%' wildcard characters to search for the value as part of the string...

But that approach is will also match strings containing e.g. ebscott@co.com, which may not be what you really want.

You could use the FIND_IN_SET function to find an exact match within the comma separated list...

LEFT JOIN lightboxes_tbl C
       ON B.lightbox_id = C.id
      AND ( C.author = 'scott@co.com'
          OR FIND_IN_SET('scott@co.com',C.authorized_viewers)
          )

Storing comma separated lists is a SQL anti-pattern. I recommend Bill Karwin's book: SQL Anti-Patterns: Avoiding the Pitfalls of Database Programming

http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557

In my web application i have a company which can be a buyer or a supplier or both.
So my database tables would be like this

Company( id_company, ..., is_buyer, is_supplier, ... )

Or :

Company( id_company, ... )
Type_company( id_type_company, type )
Extra_table(id_company, id_type_company )

Or :

Company( id_company, ... )
Type_company( id_company, id_type_company, type )

I want a explication (Pros and Cons) for every case if it's possible.

You can consider using the common supertype like this

CREATE TABLE companies
(
  id int not null primary key, 
  name varchar(128)
  -- other columns
);
CREATE TABLE buyers
(
  company_id int not null primary key,
  foreign key (company_id) references companies (id)  
);
CREATE TABLE suppliers
(
  company_id int not null primary key,
  foreign key (company_id) references companies (id)
);

Here are some sample queries:

-- Select all buyers
SELECT c.id, c.name
  FROM companies c JOIN buyers b 
    ON c.id = b.company_id;

-- Select all suppliers
SELECT c.id, c.name
  FROM companies c JOIN suppliers s 
    ON c.id = s.company_id;

-- Select companies that are both buers and suppliers
SELECT c.id, c.name
  FROM companies c JOIN buyers b 
    ON c.id = b.company_id JOIN suppliers s 
    ON c.id = s.company_id;

-- Select companies that are buers BUT NOT suppliers
SELECT c.id, c.name
  FROM companies c JOIN buyers b 
    ON c.id = b.company_id LEFT JOIN suppliers s 
    ON c.id = s.company_id
 WHERE s.company_id IS NULL; 

Here is SQLFiddle demo

Recommended reading:

Is there any resource that I can find a compiled list of data modeling anti-patterns.

For quite some time I have been discussing matters like OTLT (or MUCK Tables) without knowing there is a name for it! I think knowing a list of anti-patterns would shorten or eliminate many database design debates.

Thank you,

You might try the SQL Antipatterns book.

I've been wondering if normalization is needed. My primary concern is not the readability and ease of use of these tables but speed. Take for example a car table. Assume this car table has upwards of 10,000 records.

My question is: is the time penalty of doing many joins worth the better design of a database system? and how what is the difference between doing '%%' queries vs joins

Cars

-------------------------------------------------------------
|CarID  |Car_Name   |         Car_Features                   |
------------------------------------------------------------- 
|F31344 | Corolla   | Sedan - Sun Roof Power Lock Manual 4WD |
-------------------------------------------------------------

and doing queries like :

  1. Select Car_ID,Car_Name,Car_Features From Cars Where Car_Features like '%4WD'%;
  2. Select Car_Name from Cars where Car_Features like '%Manual Sun Roof%';

Duration/Fetch time is relatively fast.

Vs the normalized approach.

Cars

------------------------------------------
|Car_ID | Car_Name |Transmission|Features|
------------------------------------------
|F31344 | Corolla  | M4WD       | EX1    |
------------------------------------------

Transmission

---------------------------------
|ID           | Drive | Manual  |
--------------------------------
|M4WD         | 4WD   |   1     |
--------------------------------


Features
---------------------------------
|Features | Sun_Roof | Power_lock|
---------------------------------
|EX1      | 1         |     0    |
---------------------------------

Now I have to write joins to accomplish the same thing. As well as introducing two new tables, and two look-up columns in my original table.

> 1.Select Car_ID,Car_Name,Drive From Cars inner join Transmission
> ON Cars.Transmission = Transmission.ID
> Where Drive = '4WD'; 

Normalization is not about performance, although in most cases improved performance is a side effect of properly normalized database. It mostly about integrity and consistency of your data.

In denormalized approach you won't be able to normally maintain (e.g. consistently add new or update existing features to(of) all cars records) and query your data (when you start to mix and match car features). And your second query is a perfect illustration to that

Select Car_Name from Cars where Car_Features like '%Manual Sun Roof%';

This query won't return you anything.

Here is SQLFiddle demo

Recommended reading:

Lets say i have two tables EMPLOYEE and INCHARGE_ROOM , EMPLOYEE.INCHARGE_ROOMS has | delimted data of INCHARGE_ROOM.ID

EMPLOYEE


ID      NAME      INCHARGE_ROOMS
1        K           1|2|3|4
2        S           2|4|5

INCHARGE_ROOM

INCHARGE_ROOM_ID      INCHARGE_ROOM_NAME
1                                        RED
2                                       BLUE
3                                      YELLOW
4                                       ORANGE
5                                       BLACK

If i want to

SELECT ALL THE ROOMS FOR WHICH K IS INCHARGE.

The Expected result is

NAME                         INCHARGE_ROOM
K                                        RED
K                                        BLUE
K                                        YELLOW
K                                        ORANGE

This is a classic example of how not to structure the data. Bill Karwin's book 'SQL AntiPatterns' calls it JayWalking.

Your table design should be fixed. If you can't fix it, then you have to use DBMS-specific techniques to undo the damage done by the broken schema design.

You need 3 tables, not 2:

Employee (EmployeeID, Name)
InChargeRoom (InChargeRoomID, InChargeRoomName)
EmpInCharge (EmployeeID, InChargeRoomID)

Then you can do a simple 3-way join to get the required result trivially.

Suppose I have an application that renders a seating chart to the user.

There are three seating sections.

Each section contains five rows.

Each row contains a variable number of seats.

Each seat has its own attributes, among which is an association with the customer who has purchased the seat.

Would it make sense for me to model the data into the following tables?

  • floor_plan
  • seating_section
  • seating_row
  • seat
  • customer

Ultimately, this data will need to be aggregated so that it is meaningful and useful to my front end. How does one go about assembling data from the database into something that is useful for and specific to a given view?

Also, I have a gazillion more questions regarding similar database-design-related items. Are there any good books that could give me a solid foundation in this stuff?

From the relational point of view, data should be granular enough that

  • the dbms can enforce sensible constraints on it, and
  • client code never has to parse it.

Assuming there's only one venue (only one floorplan), the enterprise will typically identify a seat by its section, row, and number. Principles are the same for multiple floorplans and multiple venues.

Let's assume that section 1 has 3 rows, section 2 has 5 rows, and section 3 has 4 rows. (Tested in PostgreSQL.)

create table sections (
  section_num integer primary key
    check (section_num between 1 and 3)
);

insert into sections values (1), (2), (3);

create table section_rows (
  section_num integer not null
    references sections (section_num),
  row_num integer not null
    check (row_num between 1 and 5),
  primary key (section_num, row_num)
);

insert into section_rows values
(1,1), (1,2), (1,3),
(2,1), (2,2), (2,3), (2,4), (2,5),
(3,1), (3,2), (3,3), (3,4);

create table seats (
  section_num integer not null,
  row_num integer not null,
  seat_num integer not null,
  primary key (section_num, row_num, seat_num),
  foreign key (section_num, row_num) 
    references section_rows (section_num, row_num)
);

insert into seats values
(1, 1, 1), (1, 1, 2), (1, 1, 3),
(1, 2, 1), (1, 2, 2), (1, 2, 3),
(1, 3, 1), (1, 3, 2), (1, 3, 3), (1, 3, 4),
(2, 1, 1), (2, 1, 2), (2, 1, 3),
(2, 2, 1), (2, 2, 2), (2, 2, 3),
(2, 3, 1), (2, 3, 2), (2, 3, 3), (2, 3, 4),
(2, 4, 1), (2, 4, 2), (2, 4, 3), (2, 4, 4),
(2, 5, 1), (2, 5, 2), (2, 5, 3), (2, 5, 4), (2, 5, 5),
(3, 1, 1), (3, 1, 2), (3, 1, 3),
(3, 2, 1), (3, 2, 2), (3, 2, 3),
(3, 3, 1), (3, 3, 2), (3, 3, 3), (3, 3, 4),
(3, 4, 1), (3, 4, 2), (3, 4, 3), (3, 4, 4);

This last table, "seats" identifies every seat in the venue. Once these three tables are populated, you never have to change them unless you tear out seats or install new ones.

Now you can sell each one to a customer.

create table event_sales (
  -- Assumes an event identifier identifies the date and time as well
  -- as the event's name.
  event_id integer not null, -- references events (not shown)
  section_num integer not null,
  row_num integer not null,
  seat_num integer not null,
  customer_columns_go_here char(1) default 'x',
  primary key (event_id, section_num, row_num, seat_num),
  foreign key (section_num, row_num, seat_num) 
    references seats (section_num, row_num, seat_num)
);

insert into event_sales values
(1, 1, 1, 1, 'a'),
(1, 1, 1, 2, 'a'),
(1, 1, 1, 3, 'a'),
(1, 2, 2, 1, 'b'),
(2, 2, 1, 1, 'a'),
(2, 2, 1, 2, 'b'),
(2, 2, 1, 3, 'c'),
(2, 3, 2, 1, 'd');

All these tables are in at least 5NF.

What seats are available for event 1? (Probably the most common query for a seating application.)

select *
from seats
except
(select section_num, row_num, seat_num from event_sales where event_id = 1)
order by section_num, row_num, seat_num;

Database design is a bigger topic than most people think it is. You're not likely to get it right by skimming a couple of web sites. Avoid bad practices while you learn. I think you're probably best served by Bill Karwin's book SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

I have a temp table to store some data for calculating payroll. Data is related to many tables and is a long query logic.

This stored procedure is to get some data from other tables then insert to PayrollTemp table for later use. It needs 3 parameters: the start and end date, and the EmployeeID (EmpID) for the main query and the nested queries.

So whenever we run this stored procedure, we only get

(1 row(s) affected)

My question is how to use loop in SQL Server to pass every EmployeeID (EmpID) sequentially to store procedure in order to insert every record into the PayrollTemp table.

Create PROCEDURE [dbo].[sp_PayrollTemp_Init]
@startDate date, @endDate date, @EmpID int

AS

INSERT INTO dbo.PayrollTemp([EmpID],[BasicSalary],[Allowance],[W_Hrs],[W_days],[OT150Hrs],[OT200Hrs],[OT300Hrs],[Night130Hrs],
[NightOT150Hrs],[NightOT200Hrs],[NightOT300Hrs],[WeekdayDutyDays],[WeekendDutyDays],[OtherEarning],[SI],[HI],[UI],[UnionFee],[DependentDeduction],[OtherDeduction])
        select e.Id,e.BasicSalary,e.Allowance,
        (select SUM(ts.NormalHours) from TimeSheetA ts where EmpID=@EmpID)W_Hrs,
        (SELECT
           (DATEDIFF(dd, @startDate, @endDate) + 1)
          -(DATEDIFF(wk, @startDate, @endDate) * 2)
          -(CASE WHEN DATENAME(dw, @startDate) = 'Sunday' THEN 1 ELSE 0 END)
          -(CASE WHEN DATENAME(dw, @endDate) = 'Saturday' THEN 1 ELSE 0 END))W_days,
        (select SUM(ts.OT150Hrs) from TimeSheetA ts where EmpID=@EmpID and TimeSheetDate between @startDate and @endDate)OT150Hrs,
        (select SUM(ts.OT200Hrs) from TimeSheetA ts where EmpID=@EmpID and TimeSheetDate between @startDate and @endDate)OT200Hrs,
        (select SUM(ts.OT300Hrs) from TimeSheetA ts where EmpID=@EmpID and TimeSheetDate between @startDate and @endDate)OT300Hrs,
        (select SUM(ts.Night130Hrs) from TimeSheetA ts where EmpID=@EmpID and TimeSheetDate between @startDate and @endDate)Night130Hrs,
        (select SUM(ts.NightOT150Hrs) from TimeSheetA ts where EmpID=@EmpID and TimeSheetDate between @startDate and @endDate)NightOT150Hrs,
        (select SUM(ts.NightOT200Hrs) from TimeSheetA ts where EmpID=@EmpID and TimeSheetDate between @startDate and @endDate)NightOT200Hrs,
        (select SUM(ts.NightOT300Hrs) from TimeSheetA ts where EmpID=@EmpID and TimeSheetDate between @startDate and @endDate)NightOT300Hrs,

        (select count(EmpID)WeekdayDuty from NightDutySchedule where DATENAME(DW,DutyDate) not in('Saturday','Sunday') and DutyDate between @startDate and @endDate and EmpID=@EmpID)WeekdayDutyDays,
        (select COUNT(EmpID) from NightDutySchedule where DATENAME(DW,DutyDate)='Saturday' or DATENAME(DW,DutyDate)='Sunday' and DutyDate between @startDate and @endDate and EmpID=@EmpID)WeekendDutyDays,
        (select SUM(EarningAmount) from EarningDetails where EarningDate between @startDate and @endDate and EmpID=@EmpID)OtherEarning,
        (SELECT case e.Is_Probation
            WHEN  'False' THEN
            e.BasicSalary*0.08
            ELSE 0 end 
          )SI,
          (SELECT case e.Is_Probation
            WHEN  'False' THEN
            e.BasicSalary*0.015
            ELSE 0 end 
          )HI,
          (SELECT case e.Is_Probation
            WHEN  'False' THEN
            e.BasicSalary*0.01
            ELSE 0 end 
          )UI,
          (SELECT case e.Is_Probation
            WHEN  'False' THEN
            case  when e.BasicSalary > 4000000 or e.BasicSalary = 4000000 then  20000   else 15000 end
            ELSE 0 end 
          )UnionFee,
          (SELECT case e.Is_Probation
            WHEN  'False' THEN
                e.Dependants * 3600000
            ELSE 0 end 
          )DependentDeduction,
         (select SUM(DeductionAmount) from DeductionDetails where DeductionDate between @startDate and @endDate and EmpID=@EmpID)OtherDeduction
          from Employee e where e.Is_Active='True' and e.Foreigner='False' and e.Id=@EmpID

   declare @startDate date, @endDate date, @EmpID int
    set @startDate='2015-09-01'
    set @endDate='2015-09-30'
    set  @EmpID='3'
    select e.Id,e.BasicSalary,e.Allowance,SUM(ts.NormalHours)W_Hrs,
    (SELECT
               (DATEDIFF(dd, @startDate, @endDate) + 1)
              -(DATEDIFF(wk, @startDate, @endDate) * 2)
              -(CASE WHEN DATENAME(dw, @startDate) = 'Sunday' THEN 1 ELSE 0 END)
              -(CASE WHEN DATENAME(dw, @endDate) = 'Saturday' THEN 1 ELSE 0 END))W_days,
        SUM(ts.OT150Hrs)OT150Hrs,SUM(ts.OT200Hrs)OT200Hrs, SUM(ts.OT300Hrs)OT300Hrs,SUM(ts.Night130Hrs) Night130Hrs,
        SUM(ts.NightOT150Hrs)NightOT150Hrs,SUM(ts.NightOT200Hrs)NightOT200Hrs,SUM(ts.NightOT300Hrs)NightOT300Hrs,
        (SELECT COUNT(case 
                WHEN DATENAME(DW,sch.DutyDate) not in('Saturday','Sunday') THEN
                    1
                ELSE 0 end 
              )WeekdayDutyDays,

        (SELECT COUNT(case 
                WHEN DATENAME(DW,sch.DutyDate)  in('Saturday','Sunday') THEN
                    1
                ELSE 0 end 
              )WeekendDutyDays,
        SUM(ed.EarningAmount)OtherEarning,
        (SELECT case e.Is_Probation
                WHEN  'False' THEN
                e.BasicSalary*0.08
                ELSE 0 end 
              )SI,
              (SELECT case e.Is_Probation
                WHEN  'False' THEN
                e.BasicSalary*0.015
                ELSE 0 end 
              )HI,
              (SELECT case e.Is_Probation
                WHEN  'False' THEN
                e.BasicSalary*0.01
                ELSE 0 end 
              )UI,
              (SELECT case e.Is_Probation
                WHEN  'False' THEN
                case  when e.BasicSalary > 4000000 or e.BasicSalary = 4000000 then  20000   else 15000 end
                ELSE 0 end 
              )UnionFee,
              (SELECT case e.Is_Probation
                WHEN  'False' THEN
                    e.Dependants * 3600000
                ELSE 0 end 
              )DependentDeduction,
             SUM(ded.DeductionAmount) OtherDeduction

    from Employee e 
    inner join TimeSheetA ts
    on e.Id=ts.EmpID and e.Is_Active='True' and e.Foreigner='False' 
    inner join NightDutySchedule sch
    on e.Id=sch.EmpID and sch.DutyDate >= @startDate and sch.DutyDate <= @endDate
    inner join  EarningDetails ed
    on e.Id=ed.EmpID and ed.EarningDate >= @startDate and ed.EarningDate <= @endDate
    inner join DeductionDetails ded
    on e.Id= ded.EmpID and ded.DeductionDate >= @startDate and ded.DeductionDate <= @endDate

I modifed to use inner join statement but when I excute , it display the error like this:

    Msg 102, Level 15, State 1, Line 62
Incorrect syntax near '@endDate'.

Line 62 is: on e.Id= ded.EmpID and ded.DeductionDate >= @startDate and ded.DeductionDate <= @endDate (it is the last one, why?)

Why not do it all at once and join to the employees table rather than sending in the employee id?

Of course if you want it to be performant at all, you need to completely rework the proc to use joins instead of correlated subqueries.

You desperately need to read this book: http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557

Say I have a table and one of the columns is titled tags with data that is comma separated like this.

"tag1,tag2,new york,tag4"

As you can see, some of tags will have spaces.

Whats the best or most accurate way of querying the table for any tags that are equal to "new york"?

In the past I've used:

SELECT id WHERE find_in_set('new york',tags) <> 0 

But find_in_set does not work when the value has a space.

I'm currently using this:

SELECT id WHERE concat(',',tags,',') LIKE concat(',%new york%,') 

But I'm not sure if this is the best approach.

How would you do it?

If I needed to ignore the leading spaces before and after the commas in tags.

For example, if tags had a value of:

'atlanta,boston , chicago, los angeles  , new york '

and assuming spaces are the only character I want to ignore, and the tag I'm searching for doesn't have any leading or trailing spaces, then I'd likely use a regular expression. Something like this:

SELECT ...
  FROM t
 WHERE t.tags REGEXP CONCAT('^|, *', 'new york' ,' *,|$')

I recommend Bill Karwin's excellent book "SQL Antipatterns: Avoiding the Pitfalls of Database Programming"

https://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557

Chapter 2 Jaywalking covers the antipattern of comma separated lists.