Cost-Based Oracle Fundamentals

Jonathan Lewis


The question, "Why isn't Oracle using my index?" must be one of the most popular (or perhaps unpopular) questions ever asked on the Oracle help forums. You've picked exactly the right columns, you've got them in the ideal order, you've computed statistics, you've checked for null columns, and the optimizer flatly refuses to use your index unless you hint it. What could possibly be going wrong?

If you've suffered the frustration of watching the optimizer do something completely bizarre when the best execution plan is totally obvious, or spent hours or days trying to make the optimizer do what you want it to do, then this is the book you need. You'll come to know how the optimizer thinks, understand why it makes mistakes, and recognize the data patterns that make it go awry. With this information at your fingertips, you will save an enormous amount of time on designing and troubleshooting your SQL.

The cost-based optimizer is simply a piece of code that contains a model of how Oracle databases work. By applying this model to the statistics about your data, the optimizer tries to efficiently convert your query into an executable plan. Unfortunately, the model can't be perfect, your statistics can't be perfect, and the resulting execution plan may be far from perfect. In Cost-Based Oracle Fundamentals, the first book in a series of three, Jonathan Lewis, one of the foremost authorities in this field, describes the most commonly used parts of the model, what the optimizer does with your statistics, and why things go wrong. With this information, you'll be in a position to fix entire problem areas, not just single SQL statements, by adjusting the model or creating more truthful statistics.


Mentioned in questions and answers.

Let's say I have a poorly performing query. How do you usually go about SQL optimization? What are the first things I should look at in the query execution plan? Is there a good article or book about this?

The Performance Tuning Guide is a great place to start, but Jonathan Lewis's Cost-Based Oracle Fundamentals is the canonical reference on what the optimizer is doing and why. Depending on the complexity of the problem, CBO Fundamentals may be radical overkill, though.

As a first pass, if you look at the estimated cardinality of each step in the plan, it's useful to ask whether that cardinality is reasonable. If the cardinalities are reasonable, it's likely that Oracle is picking the most efficient plan and you need to look for other approaches to tuning the query. If the cardinalities are wildly incorrect, on the other hand, it's likely that Oracle has chosen a poorly performing query plan and that something about the statistics needs to be adjusted.
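
A minimal sketch of that first pass, using made-up table and column names: take the estimated row count from the plan for a single predicate and compare it with a quick count of the real data.

    -- Estimated cardinality for one predicate...
    explain plan for
      select * from orders where status = 'SHIPPED';

    select * from table(dbms_xplan.display);

    -- ...versus the actual number of matching rows. A mismatch of several
    -- orders of magnitude suggests the statistics need attention.
    select count(*) from orders where status = 'SHIPPED';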

In order to make use of the execution plan (which is a description of how the database will implement your request), one has to understand what options are available to it (the database) and make a judgement on whether the choice made by the optimizer was the correct one. This requires a lot of knowledge and experience.

For the work I'm doing (ETL processing), performance "problems" usually fall into one of two categories:

  1. The query takes a long time because reading lots of data takes lots of time :)
  2. The optimizer made a mistake and picked the wrong execution plan

For (1), I have to decide whether I can restructure the data so that I scan less of it. Indexes are seldom of use, since I'm interested in subsets large enough to make indexed access slower than a full table scan. For example, I might store a horizontal subset of the data (the last 30 days), a vertical subset of the data (10 columns instead of 80), or an aggregate of the data. Either way reduces the amount of data and therefore increases processing speed. Of course, if the data is used only once, I've just moved the problem elsewhere.

For (2), I usually start by checking the cardinality (num rows) at the top line of the xplan. If I know that my query returns 5,000,000 rows but it says 500, I can be pretty sure the optimizer messed up somewhere. If the total cardinality is in the right ballpark, I start from the other end instead and check each step of the plan until I find the first big estimation error. If the cardinality is wrong, the join method between that table and the next is probably wrong as well, and the error will cascade through the rest of the plan.
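
As a minimal sketch of this approach (table and column names are made up, and it assumes you can afford to re-run the query): add the gather_plan_statistics hint, then display the last plan with estimated and actual row counts side by side. The first step where E-Rows and A-Rows diverge by orders of magnitude is usually where things went wrong.

    select /*+ gather_plan_statistics */ count(*)
    from   orders o
    join   customers c on c.customer_id = o.customer_id
    where  o.order_date >= date '2015-01-01';

    -- Show estimated (E-Rows) vs. actual (A-Rows) cardinality per step.
    select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));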

Google for "Tuning by cardinality feedback" and you will find a paper by Wolfgang Breitling that describes (in a much better way) the above approach. It's a really good read!

Also, be sure to hang around Jonathan Lewis's blog. If there is something about the Oracle optimizer he doesn't know, it's not worth knowing. He has written the best book on the subject as well: check out Cost-Based Oracle Fundamentals. If I could send one book back in time to myself, this would be it.

Expert Oracle Database Architecture by Tom Kyte (the man behind "Ask Tom") is an awesome read as well. My first read of this book was a disappointment, because I was looking for "tuning tips" and found none. On my second read I realized that by knowing how the database works, you can eliminate entire classes of performance problems by "designing for performance" from the start instead of "adding" performance at the end :)

SQL Tuning by Dan Tow is another awesome read on the fundamentals of how exactly one can determine the optimal execution plan. That knowledge can be used as a way of troubleshooting an execution plan (or of forcing the optimizer into doing it your way).

If you made it this far, now would be a good time to fiddle around with hints. Not before.

Oracle desupported the rule-based optimizer in version 10g, leaving the cost-based one as the only choice.

I think that a rule-based optimizer has the invaluable advantage of always being predictable. I've seen Oracle 10g change execution plans from one day to the next, leading to turtle-like performance.

What could be the rationale behind this change?

(I am not a DBA.)

My understanding is that Oracle has been moving away from the RBO in favor of the CBO for a long time. It seems useful to me to stop supporting a feature that is no longer in active development (given a long enough deprecation period) so that everyone is using the most effective features.

It's interesting that you called predictability an "invaluable" benefit of the rule-based optimizer. It seems like when the data changes enough to make an execution plan sub-optimal, it would be best to switch to a new one. Only in the case you alluded to, where the optimizer flip-flops between two execution plans, would there be a problem with picking the best plan for the data you are actually querying. I'm not sure what advantage predictability offers in more normal situations.

Ending support for the outdated optimizer ought to free up resources for improving the newer one.

For someone new to SQL tuning with Oracle, can people please provide pointers and informative URLs on how to approach an SQL query and how to define indexes where necessary, when tuning SQL queries against tables holding large quantities of data, in excess of 50,000 rows?

What are the main things that I should be looking out for to speed up an SQL statement?

It's a big topic. Anything by Tom Kyte is good.

Jonathan Lewis's Cost-Based Oracle Fundamentals is the best book on the market if you want to learn how Oracle "thinks" when selecting access paths, join mechanisms, join order, and so on.

It takes a few readings to get the hang of it, but once it "clicks" you have a ridiculously powerful tool in your box, because not only can you troubleshoot much better, you also start to design tables and indexes in a way that plays to the strengths of the database.

As I understand it, most query optimizers are "cost-based". Others are "rule-based", or I believe they call it "Syntax Based". So, what's the best way to optimize the syntax of SQL statements to help an optimizer produce better results?

Some cost-based optimizers can be influenced by "hints" like FIRST_ROWS(). Others are tailored for OLAP. Is it possible to learn in more detail how the Informix IDS and SE optimizers decide on the best route for processing a query, other than through SET EXPLAIN? Is there any documentation that ranks the ways of accessing rows in a SELECT statement from fastest to slowest, assuming the table is indexed?

I would imagine that "SELECT col FROM table WHERE ROWID = n" is the fastest (rank 1).

If I'm not mistaken, Informix SE's ROWID is a SERIAL (INT), which allows for a maximum of about 2 billion rows, or maybe it uses INT8 for much larger row counts? SE's optimizer is cost-based when it has enough data, but it does not use distributions the way the IDS optimizer does.

IDS' ROWID isn't an INT; it is the logical address of the row's page, left-shifted 8 bits, plus the slot number on the page that contains the row's data.

IDS' optimizer is a cost-based optimizer that uses data about index depth and width, the number of rows, the number of pages, and the data distributions created by UPDATE STATISTICS MEDIUM and HIGH to decide which query path is the least expensive, but there's no ranking of statements?

I think Oracle uses hex values for ROWID. Too bad ROWID can't be used more often, since a row's ROWID can change. So maybe ROWID could be used by the optimizer as a counter to report a query's progress, an idea I mentioned in my "Begin viewing query results before query completes" question? I feel it wouldn't be that difficult to report a query's progress while it is being processed, perhaps at the expense of some slight overhead, and it would be nice to know things ahead of time: a "Google-like" estimate of how many rows meet a query's criteria, progress displayed every 100, 200, 500, or 1,000 rows, the ability for users to cancel at any time, and the qualifying rows displayed as they are added to the result list while the search continues. This is just one example; perhaps we could think of other neat/useful features, since the ingredients are more or less there.

Perhaps we could fine-tune each query with more granularity than is currently available? OLTP queries tend to be mostly static and pre-defined. The "what-ifs" are more OLAP, so let's try to add more control and intelligence there. What's needed is the ability to control the optimizer precisely, not just "hint" or influence it. We could then have more dynamic SELECT statements for specific situations, and maybe even tell IDS to read blocks of index nodes at a time instead of one by one, etc.

For Oracle, your best resource would be Cost-Based Oracle Fundamentals. It's about 500 pages (and billed as Volume 1, but there haven't been any follow-ups yet).

For a (very) simple full-table scan, progress can sometimes be monitored through v$session_longops. Oracle knows how many blocks it has to scan, how many blocks it has scanned, how many it has to go, and reports on progress.
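
A minimal sketch of that check: operations that run long enough are reported in v$session_longops, and the sofar/totalwork columns give a crude progress indicator.

    -- Progress of long-running operations (e.g. full table scans);
    -- sofar/totalwork approximates the fraction completed.
    select sid,
           opname,
           target,
           sofar,
           totalwork,
           round(100 * sofar / totalwork, 1) as pct_done,
           time_remaining
    from   v$session_longops
    where  totalwork > 0
    and    sofar < totalwork
    order by time_remaining desc;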

Indexes are a different matter. If I search for records for a client 'Frank' and use the index, the database will make a guess at how many 'Frank' entries are in the table, but that guess can be massively off. It may be that you have 1,000 entries for 'Frankenstein' and just one for 'Frank', or vice versa.
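
One common remedy for that kind of skew, sketched here with made-up table and column names, is to gather a histogram on the skewed column so the optimizer can tell popular values from rare ones instead of assuming a uniform spread.

    -- Gather a histogram on a skewed column (names are hypothetical).
    begin
      dbms_stats.gather_table_stats(
        ownname    => user,
        tabname    => 'CLIENTS',
        method_opt => 'FOR COLUMNS CLIENT_NAME SIZE 254');
    end;
    /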

It gets even more complicated as you add in other filter and access predicates (e.g. where multiple indexes could be chosen), and it makes another leap when you include table joins. And that's without getting into the complex stuff such as remote databases and domain indexes like Oracle Text and Locator.

In short, it is very complicated. It is stuff that can be useful to know if you are responsible for tuning a large application. Even for basic development you need some grounding in how the database can physically retrieve the data you are interested in.

But I'd say you are going the wrong way here. The point of an RDBMS is to abstract the details so that, for the most part, they just happen. Oracle employs smart people to write query transformation stuff into the optimizer so us developers can move away from 'syntax fiddling' to get the best plans (not totally, but it is getting better).

In Oracle, is there any way to determine how long a SQL query will take to fetch all of its records, and how large the result will be, without actually executing the query and waiting for the entire result?

I am repeatedly asked to download and provide data to users using a normal Oracle SQL SELECT (not Data Pump/import etc.). Sometimes the rows number in the millions.

This requires good statistics, explain plan for ..., adjusting sys.aux_stats$, and then adjusting your expectations.


  1. Good statistics. The explain plan estimates are based on optimizer statistics. Make sure that tables and indexes have up-to-date statistics. On 11g this usually means sticking with the default settings and tasks, and only manually gathering statistics after large data loads.
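
    For example, a minimal sketch (the table name is made up) of checking whether a table's statistics are stale and re-gathering them after a big load:

    -- When were statistics last gathered, and are they flagged as stale?
    select table_name, num_rows, last_analyzed, stale_stats
    from   user_tab_statistics
    where  table_name = 'BIG_LOAD_TABLE';

    -- Re-gather statistics after a large data load.
    begin
      dbms_stats.gather_table_stats(user, 'BIG_LOAD_TABLE');
    end;
    /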

  2. Explain plan for ... Use a statement like this to create and store the explain plan for any SQL statement. This even works for creating indexes and tables.

    explain plan set statement_id = 'SOME_UNIQUE_STRING' for
    select * from dba_tables cross join dba_tables;
    

    This is usually the best way to visualize an explain plan:

    select * from table(dbms_xplan.display);
    
    
    Plan hash value: 2788227900
    
    -------------------------------------------------------------------
    | Id  | Operation              | Name  | Rows  | Bytes | Time     |
    -------------------------------------------------------------------
    |   0 | SELECT STATEMENT       |       |    12M|  5452M| 00:00:19 |
    |*  1 |  HASH JOIN RIGHT OUTER |       |    12M|  5452M| 00:00:19 |
    |   2 |   TABLE ACCESS FULL    | SEG$  |  7116 |   319K| 00:00:01 |
    ...
    

    The raw data is stored in PLAN_TABLE. The first row of the plan usually sums up the estimates for the other steps:

    select cardinality, bytes, time
    from plan_table
    where statement_id = 'SOME_UNIQUE_STRING'
        and id = 0;
    
    CARDINALITY    BYTES         TIME
    12934699       5717136958    19
    
  3. Adjust sys.aux_stats$. The time estimate is based on system statistics stored in sys.aux_stats$. These are numbers for metrics like CPU speed, single-block I/O read time, etc. For example, on my system:

    select * from sys.aux_stats$ order by sname
    
    SNAME            PNAME         PVAL1              PVAL2
    SYSSTATS_INFO    DSTART                           09-11-2014 11:18
    SYSSTATS_INFO    DSTOP                            09-11-2014 11:18
    SYSSTATS_INFO    FLAGS                        1    
    SYSSTATS_INFO    STATUS                           COMPLETED
    SYSSTATS_MAIN    CPUSPEED
    SYSSTATS_MAIN    CPUSPEEDNW    3201.10192837466    
    SYSSTATS_MAIN    IOSEEKTIM                   10    
    SYSSTATS_MAIN    IOTFRSPEED                4096    
    SYSSTATS_MAIN    MAXTHR
    SYSSTATS_MAIN    MBRC
    SYSSTATS_MAIN    MREADTIM
    SYSSTATS_MAIN    SLAVETHR
    SYSSTATS_MAIN    SREADTIM
    

    The numbers can be gathered automatically by dbms_stats.gather_system_stats. They can also be modified manually. It's a SYS table, but it is relatively safe to modify. Create some sample queries, compare the estimated time with the actual time, and adjust the numbers until they match.
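
    A minimal sketch of such a manual adjustment (the values below are arbitrary examples, not recommendations):

    -- SREADTIM/MREADTIM are single- and multi-block read times in ms;
    -- MBRC is the multi-block read count. Use numbers derived from your
    -- own timing tests, then re-check estimates against actual run times.
    begin
      dbms_stats.set_system_stats('SREADTIM', 5);
      dbms_stats.set_system_stats('MREADTIM', 10);
      dbms_stats.set_system_stats('MBRC', 16);
    end;
    /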

  4. Discover you probably wasted a lot of time

    Predicting run time is theoretically impossible to get right in all cases, and in practice it is horribly difficult to forecast for non-trivial queries. Jonathan Lewis wrote a whole book about those predictions, and that book only covers the "basics".

    Complex explain plans are usually considered "good enough" if the estimates are off by only one or two orders of magnitude. But that kind of margin is not good enough to show to a user, or to use for making any important decisions.

Can anyone explain how the cost is evaluated in an Oracle explain plan? Is there any specific algorithm to determine the cost of the query?

For example: full table scans have higher cost, index scan lower... How does Oracle evaluate the cases for full table scan, index range scan, etc.?

This question is essentially the same as what I am asking: Question about Cost in Oracle Explain Plan

But can anyone explain with an example? We can find the cost by running explain plan, but how does it work internally?

There are many, many specific algorithms for computing the cost, far more than could realistically be discussed here. Jonathan Lewis has done an admirable job of walking through how the cost-based optimizer decides on the cost of a query in his book Cost-Based Oracle Fundamentals. If you're really interested, that's going to be the best place to start.

It is a fallacy to assume that full table scans will have a higher cost than, say, an index scan. The cost depends on the optimizer's estimate of the number of rows in the table, its estimate of the number of rows the query will return (which, in turn, depends on its estimates of the selectivity of the various predicates), the relative cost of a multiblock (scattered) read vs. a single-block (sequential) read, the speed of the processor, the speed of the disk, the probability that blocks will be available in the buffer cache, your database's optimizer settings, your session's optimizer settings, the PARALLEL attribute of your tables and indexes, and a whole bunch of other factors (this is why it takes a book to really start to dive into this sort of thing). In general, Oracle will prefer a full table scan if your query is going to return a large fraction of the rows in the table, and an index access if it is going to return a small fraction. And "small fraction" is generally much smaller than people initially estimate: if you're returning 20-25% of the rows in a table, for example, you're almost always better off using a full table scan.

If you are trying to use the COST column in a query plan to determine whether the plan is "good" or "bad", you're probably going down the wrong path. The COST is only valid if the optimizer's estimates are accurate. But the most common reason that query plans would be incorrect is that the optimizer's estimates are incorrect (statistics are incorrect, Oracle's estimates of selectivity are incorrect, etc.). That means that if you see one plan for a query that has a cost of 6 and a plan for a different version of that query that has a cost of 6 million, it is entirely possible that the plan that has a cost of 6 million is more efficient because the plan with the low cost is incorrectly assuming that some step is going to return 1 row rather than 1 million rows.

You are much better served ignoring the COST column and focusing on the CARDINALITY column. CARDINALITY is the optimizer's estimate of the number of rows that will be returned at each step of the plan, and it is something you can directly test and compare against reality. If, for example, you see a step in the plan that involves a full scan of table A with no predicates and you know that A has roughly 100,000 rows, it would be concerning if the optimizer's CARDINALITY estimate were way too high or way too low. If it estimated the cardinality to be 100 or 10,000,000, then the optimizer would almost certainly be either picking the table scan in error or feeding that data into a later step where its cost estimate would be grossly incorrect, leading it to pick a poor join order or a poor join method. It would also probably indicate that the statistics on table A were incorrect. On the other hand, if you see that the cardinality estimates at each step are reasonably close to reality, there is a very good chance that Oracle has picked a reasonably good plan for the query.
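
A minimal sketch of that kind of sanity check, using the hypothetical table A from above: compare the row count the optimizer has stored with the real row count.

    -- The optimizer's view of table A: stored row count and when the
    -- statistics were last gathered.
    select num_rows, last_analyzed
    from   user_tables
    where  table_name = 'A';

    -- Reality check: if this differs from NUM_ROWS by orders of magnitude,
    -- the statistics (and the CARDINALITY estimates built on them) are suspect.
    select count(*) from a;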