Developing Time-oriented Database Applications in SQL

Richard T. Snodgrass

Mentioned 6

The accompanying CD-ROM includes all the SQL statements and embedded host code in the book, along with data from actual applications. The package provides real-life extended case studies which gradually introduce concepts and techniques.

More on Amazon.com

Mentioned in questions and answers.

I was reading about temporal databases and it seems they have built in time aspects. I wonder why would we need such a model?

How different is it from a normal RDBMS? Can't we have a normal database i.e. RDBMS and say have a trigger which associates a time stamp with each transaction that happens? May be there would be a performance hit. But I'm still skeptical on temporal databases having a strong case in the market.

Does any of the present databases support such a feature?

As I understand it (and over-simplifying enormously), a temporal database records facts about when the data was valid as well as the the data itself, and permits you to query on the temporal aspects. You end up dealing with 'valid time' and 'transaction time' tables, or 'bitemporal tables' involving both 'valid time' and 'transaction time' aspects. You should consider reading either of these two books:

I need a data structure that can store non-overlapping ranges within a single dimension. The entire range of the dimension need not be completely covered.

An example would be a conference room scheduler. The dimension is time. No two schedules may overlap. The conference room isn't always scheduled. In other words, for a given time there can be at most one schedule.

A quick solution is for a range to store the start and end times.

Range {
    Date start
    Date end
}

This is non-normalized and requires the container to enforce no overlapping. For two adjacent ranges, the previous' end will be redundant with the next's start.

Another scheme might involve storing one boundary value with each range. But for a contiguous sequence of ranges, there will always be one more boundary values than ranges. To get around this the sequence could be represented as alternating boundary values and ranges:

B = boundary value, r = range

B-r-B-r-B

The data structure might look like:

Boundary {
    Date value
    Range prev
    Range next
}

Range {
    Boundary start
    Boundary end
}

In essence it's a doubly linked list with alternating types.

Ultimately, whatever data structure I use will be represented in both memory (application code) and a relational database.

I'm curious what academic or industry tried solutions exists.

This is non-trivial because (in the database world) you have to compare multiple rows to determine non-overlapping ranges. Clearly, when the information is in memory, then other representations such as lists in time order are possible. I think, though, that you'd be best off with your 'start + end' notation, even in a list.

There are whole books on the subject - part of 'Temporal Database' handling. Two you could look at are Darwen, Date and Lorentzos "Temporal Data and the Relational Model" and (at a radically different extreme) "Developing Time-Oriented Database Applications in SQL", Richard T. Snodgrass, Morgan Kaufmann Publishers, Inc., San Francisco, July, 1999, 504+xxiii pages, ISBN 1-55860-436-7. That is out of print but available as PDF on his web site at cs.arizona.edu (so a Google search makes it pretty easy to find).

One of the relevant data structures is, I believe, an R-Tree. That is often used for 2-dimensional structures, but can also be effective for 1-dimensional structures.

You can also look for "Allen's Relations" for intervals - they may be helpful to you.

I am using asp mvc 3, jquery full calendar, ms sql sever 2008 and c#.

I am wondering if anyone knows how to make recurring events?

I am unsure how to make them.

For instance in google calendar you can make an appointment repeat yearly forever. I doubt that they generate that appointment X times in the database.

I am wondering how I could have one row in my db and somehow know to call that up when needed.

Also google calendar and outlook have lots of repeating options like repeat on the 1st month, last month and etc.

Is there any libraries build that have this? Or do I got to make it from scratch?

P.S

I am on a shared host so a solution has to work with limited rights.

Whenever someone asks a question like this the link to book "Developing Time-Orientated Database Applications in SQL" comes up. It's available legitimately as a free PDF and on Amazon.

My original question can be found here, for which I've gotten some great answers, idas and tips.

As part of a feasibility and performance study, I've started to convert my schemas in order to version my data using those ideas. In doing so, I've come up with some kind of other problem.

In my original question, my example was simple, with no real relational references. In an attempt to preserve the example of my previous question, I will now extend the 'Name' part to another table.

So now, my data becomes:

Person
------------------------------------------------
ID                UINT NOT NULL,
NameID            UINT NOT NULL,
DOB               DATE NOT NULL,
Email             VARCHAR(100) NOT NULL

PersonAudit
------------------------------------------------
ID                UINT NOT NULL,
NameID            UINT NOT NULL,
DOB               DATE NOT NULL,
Email             VARCHAR(100) NOT NULL,
UserID            UINT NOT NULL,         -- Who
PersonID          UINT NOT NULL,         -- What
AffectedOn        DATE NOT NULL,         -- When
Comment           VARCHAR(500) NOT NULL  -- Why

Name
------------------------------------------------
ID                UINT NOT NULL,
FirstName         VARCHAR(200) NOT NULL,
LastName          VARCHAR(200) NOT NULL,
NickName          VARCHAR(200) NOT NULL

NameAudit
------------------------------------------------
ID                UINT NOT NULL,
FirstName         VARCHAR(200) NOT NULL,
LastName          VARCHAR(200) NOT NULL,
NickName          VARCHAR(200) NOT NULL,
UserID            UINT NOT NULL,         -- Who
NameID            UINT NOT NULL,         -- What
AffectedOn        DATE NOT NULL,         -- When
Comment           VARCHAR(500) NOT NULL  -- Why

In a GUI, we could see the following form:

ID            :  89213483
First Name    :  Firsty
Last Name     :  Lasty
Nick Name     :  Nicky
Date of Birth :  January 20th, 2005
Email Address :  my.email@host.com

A change can be made to:

  1. Only to the 'name' part
  2. Only to the 'person' part
  3. To both the 'name' and person parts

If '1' occurs, we copy the original record to NameAudit and update our Name record with the changes. Since the person reference to the name is still the same, no changes to Person or PersonAudit are required.

If '2' occurs, we copy the original record to PersonAudit and update the Person record with the changes. Since the name part has not changed, no changes to Name or NameAudit are required.

If '3' occurs, we update our database according to the two methods above.

If we were to make 100 changes to both the person and name parts, one problem occurs when you later try to show a history of changes. All my changes show the person having the last version of the name. Which is wrong obviously.

In order to fix this, it would seem that the NameID field in Person should reference the NameAudit instead (but only if Name has changes).

And it is this conditional logic that starts complicating things.

I would be curious to find out if anyone has had this kind of problem before with their database and what kind of solution was applied?

You should probably try to read about 'Temporal Database' handling. Two books you could look at are Darwen, Date and Lorentzos "Temporal Data and the Relational Model" and (at a radically different extreme) "Developing Time-Oriented Database Applications in SQL", Richard T. Snodgrass, Morgan Kaufmann Publishers, Inc., San Francisco, July, 1999, 504+xxiii pages, ISBN 1-55860-436-7. That is out of print but available as PDF on his web site at cs.arizona.edu. You can also look for "Allen's Relations" for intervals - they may be helpful to you.


I assume that the DATE type in your database includes time (so you probably use Oracle). The SQL standard type would probably be TIMESTAMP with some number of fractional digits for sub-second resolution. If your DBMS does not include time with DATE, then you face a difficult problem deciding how to handle multiple changes in a single day.

What you need to show, presumably, is a history of the changes in either table, with the corresponding values from the other table that were in force at the time when the changes were made. You also need to decide whether what you are showing is the before or after image; presumably, again, the after image. That means that you will have a 'sequenced' query (Snodgrass's term), with columns like:

Start time        -- When this set of values became valid
End time          -- When this set of values became invalid
PersonID          -- Person.ID (or PersonAudit.ID) for Person data
NameID            -- Name.ID (or NameAudit.ID) for Name data
DOB               -- Date of Birth recorded while data was valid
Email             -- Email address recorded while data was valid
FirstName         -- FirstName recorded while data was valid
LastName          -- LastName recorded while data was valid
NickName          -- NickName recorded while data was valid

I assume that once a Person.ID is established, it does not change; ditto for Name.ID. That means that they remain valid while the records do.

One of the hard parts in this is establishing the correct set of 'start time' and 'end time' values, since transitions could occur in either table (or both). I'm not even sure, at the moment, that you have all the data you need. When a new record is inserted, you don't capture the time it becomes valid (there is nothing in the XYZAudit table when you insert a new record, is there?).


There's a lot more could be said. Before going further, though, I'd rather have some feedback about some of the issues raised so far.


Some other SO questions that might help:


Since this answer was first written, there's another book published about another set of methods called 'Asserted Versioning' for handling temporal data. The book is 'Managing Time in Relational Databases: How to Design, Update and Query Temporal Data' by Tom Johnston and Randall Weiss. You can find their company at AssertedVersioning.com. Beware: there may be patent issues around the mechanism.

Also, the SQL 2011 standard (ISO/IEC 9075:2011, in a number of parts) has been published. It includes some temporal data support. You can find out more about that and other issues related to temporal data at TemporalData.com, which is more of a general information site rather than one with a particular product axe to grind.

I am looking for a way to integrate as directly as possible a temporal awareness into my classes. I deal with data that change with time quite a lot, like share prices, so this would probably need some attention, and be dealt with in one place to ensure "separation of concerns".

Have you had similar experience with temporal data/databases ?

What could you recommend reading / knowing ?

(I am thinking of wrapping my computations around a TimeSlice(date) computation expression builder, in order to scope retrieval to the specified date which constitute the 'horizon' of my computation. like

let oldCEO = 
      historicaDate(Today() - 10 years)            {
          let! company = CompanyFinder("MSFT")
          company.CEO 
        }
let todayCEO = 
      historicaDate(Today() )            {
          let! company = CompanyFinder("MSFT")
          company.CEO 
        }

I believe that considering temporal aspect of reality captured in your data model should begin with very explicit distinction between valid time and transaction time dimensions; your example approach with CEO "at point in time" having "CEO" state accompanied by just one generic temporal dimension state "When" may be good enough as long as you do not consider multiple sources of your captured "facts" on fine grain time scales. Otherwise, if you turn to mentioned in your question matter of share prices and their relationships, like between value of S&P 500 index and prices of its constituents, you cannot get away from distinguishing these two dimensions and capturing them in your internal data state: the value of S&P 500 index have sense only with regard to some transaction time as you need to collect constituent prices at some valid time moment and perform (not instantaneous) weighting and calculations in order to come up with index value at some later transaction time moment, which, in fact, is value for the past moment of valid time.

Or, even staying within your example, imagine that you are required by some compliance regiment to audit anytime what was your response at some moment in the past. In other words, your time specificator

historicalDate(date: <time expression>)

is a particular variant of more generic case

historicalDate(date: <valid time expression>,
               asOf: <transaction time expression>)

so historicalDate(Today() - 1yr) is, in fact, historicalDate(Today() - 1yr, Now()), but, in principle, may be historicalDate(Today() - 1yr, Now() - 20days)

Dealing with data bitemporally is not a matter simple enough to give a short one-fits-all recipe on taming it through specific language mechanisms like computation expressions. It definitely would be a good idea to do some reading. If being asked what a single book I may recommend on the subject, my answer would be Managing Time in Relational Databases, otherwise if you want exhaustive research you may also consider classic Developing Time-Oriented Database Applications in SQL and Temporal Data & the Relational Model.

Maybe someone else can point you to approaches of handling temporal data state outside of relational-based persistence mechanism, but my own experience is limited only to the such.

I'm writing an app that contains the following tables: (1) employee_type, (2) employee and (3) employee_action.

Employee_action is foreign-keyed to employee, and contains a description of what happened and the date of the event, just as you might expect.

However, employees can change their type over time (promotion, demotion, relocation, etc). If my schema was just as simple as this, then you might generate a historical report that says that John was the CEO of the company when he was out delivering pizzas 10 years ago.

What would be the best way for me to save the fact that employees had a certain set of characteristics at the time that they performed an action, which are not necessarily their characteristics at the present time?

I'm stating my problem simply here. I have a lot more tables than 3, and the employees position is not the only characteristic that i'm worried about. It's not an option for me to just denormalize everything and make a history table with every possible employee field in it.

Thanks, I hope this is clear.

Representing time data in SQL is tricky. There is a very good book on the subject, and it's even available for free online from the author: http://www.cs.arizona.edu/people/rts/tdbbook.pdf.

The Amazon page is on http://www.amazon.com/Developing-Time-Oriented-Database-Applications-Management/dp/1558604367, but it's out of print.

If you are serious about modeling changes over time in SQL, this book is a must-read. I learned a lot from it and I only understand maybe 25% of it, having read it only once :)