Jonathan Lewis

Subscribe to Jonathan Lewis feed Jonathan Lewis
Just another Oracle weblog
Updated: 2 weeks 1 day ago

Random Upgrade

Mon, 2018-10-08 07:36

Here’s a problem that (probably) won’t affect the day to day running of most systems – but it could be a pain in the backside for people who write programs to generate repeatable test data. I’m not going to say much about the problem, just leave you with a test script.


rem
rem	Script	random_upgrade.sql
rem	Author:	Jonathan Lewis
rem	Dated:	Oct 2018
rem
rem	Last tested
rem		18.3.0.0
rem		12.2.0.1
rem	Notes
rem	In the upgrade from 12.2.0.1 something
rem	changed that meant
rem		create as select dbms_random
rem	gets different data from
rem		select dbms_random
rem

drop table t4 purge;
drop table t3 purge;
drop table t2 purge;
drop table t1 purge;
drop table t0 purge;

set feedback off

create table t0 as
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
;


execute dbms_random.seed(0);

create table t1
as
select dbms_random.normal
from
	t0
;

execute dbms_random.seed(0);

create table t2
as
with g1 as (
	select rownum id
	from dual
	connect by
		level <= 1e4 -- > comment to avoid WordPress format issue
)
select
	dbms_random.normal
from
	g1
;

prompt	=================
prompt	Diff the two CTAS
prompt	=================

select count(*)
from (
select * from t1
minus
select * from t2
union all
select * from t2
minus
select * from t1
)
;


create table t3 
as 
select * from t2 
where rownum < 1 -- > comment to avoid WordPress format issue
;

create table t4 
as 
select * from t2 
where rownum < 1 -- > comment to avoid WordPress format issue
;

execute dbms_random.seed(0)

insert into t3
select dbms_random.normal
from
	t0
;

execute dbms_random.seed(0)

insert into t4
with g1 as (
	select rownum id
	from dual
	connect by
		level <= 1e4 -- > comment to avoid WordPress format issue
)
select
	dbms_random.normal
from
	g1
;


prompt	===================
prompt	Diff the two Insert
prompt	===================

select count(*)
from (
select * from t3
minus
select * from t4
union all
select * from t4
minus
select * from t3
)
;


prompt	===========
prompt	Sum of CTAS
prompt	===========

select sum(normal) from t1;

prompt	=============
prompt	Sum of Insert
prompt	=============

select sum(normal) from t3;


execute dbms_random.seed(0)

prompt	=============
prompt	Sum of select
prompt	=============

with g1 as (
	select rownum id
	from dual
	connect by
		level <= 1e4 -- > comment to avoid WordPress format issue
)
select sum(n) from (
select
	dbms_random.normal n
from
	g1
)
;


I’m repeatedly using dbms_random.seed(0) to reset the random number generator and trying to generate 10,000 normally distributed numbers. (I’ve chosen the normal distribution because that happened to be the function in a script I sent someone with the comment that “this will recreate the data for the demonstration” – and they wrote back to say that it didn’t.)

I’ve got two “create as select”, and two “insert as select”. One of each pair selects from a real existing table to get 10,000 rows, the other uses the “select dual connect by” trick to generate rows. I’ve written SQL that shows whether or not the two pairs of tables end up with the same data (they do, pairwise), then I’ve summed one table from each pair to see if the different mechanisms produce the same data – and that depends on the version of Oracle you’re using. Finally I’ve reset the random number generator and summed across a pure select to see what that produces.

If you run this code on 12.2.0.1 or earlier you’ll see that the “diffs” report zeros and the “sums” report -160.39249. If you upgrade to 18.3 the diffs will still report zeros and some of the sums will still report -160.39249 but the sum of the CTAS will report -91.352172.

Bottom Line

If you’ve got code that you wrote to create reproducible test cases and the code uses: “create table … as select … dbms_random …” then it won’t produce the same data when you upgrade to 18.3. You’ll have to modify the code to do “create table (); insert as select …”.

As of this afternoon I have 1,209 test scripts on my laptop that use the dbms_random package to model data distribution patterns. It is almost certain that I will end up modifying every single one of them eventually.

There are words to express how I feel about this – but not ones that I would consider publishing.

Join Cardinality – 2

Fri, 2018-10-05 09:37

In the previous note I posted about Join Cardinality I described a method for calculating the figure that the optimizer would give for the special case where you had a query that:

  • joined two tables
  • used a single-column to join on equality
  • had no nulls in the join columns
  • had a perfect frequency histogram on the columns at the two ends of the join
  • had no filter predicates associated with either table

The method simply said: “Match up rows from the two frequency histograms, multiply the corresponding frequencies” and I supplied a simple SQL statement that would read and report the two sets of histogram data, doing the arithmetic and reporting the final cardinality for you. In an update I also added an adjustment needed in 11g (or, you might say, removed in 12c) where gaps in the histograms were replaced by “ghost rows” with a frequency that was half the lowest frequency in the histogram.

This is a nice place to start as the idea is very simple, and it’s likely that extensions of the basic idea will be used in all the other cases we have to consider. There are 25 possibilities that could need separate testing – though only 16 of them ought to be relevant from 12c onwards. Oracle allows for four kinds of histograms – in order of how precisely they describe the data they are:

  • Frequency – with a perfect description of the data
  • Top-N (a.k.a. Top-Frequency) – which describes all but a tiny fraction (ca. one bucket’s worth) of data perfectly
  • Hybrid – which can (but doesn’t usually, by default) describe up to 2,048 popular values perfectly and gives an approximate distribution for the rest
  • Height-balanced – which can (but doesn’t usually, by default) describe at most 1,024 popular values with some scope for misinformation.

Finally, of course, we have the general case of no histogram, using only 4 numbers (low value, high value, number of rows, number of distinct values) to give a rough picture of the data – and the need for histograms appears, of course, when the data doesn’t look anything like an even distribution of values between the low and high with close to “number of rows”/”number of distinct values” for each value.

So there are 5 possible statistical descriptions for the data in a column – which means there are 5 * 5 = 25 possible options to consider when we join two columns, or 4 * 4 = 16 if we label height-balanced histograms as obsolete and ignore them (which would be a pity because Chinar has done some very nice work explaining them).

Of course, once we’ve worked out a single-column equijoin between two tables there are plenty more options to consider:  multi-column joins, joins involving range-based predicates, joins involving more than 2 tables, and queries which (as so often happens) have predicates which aren’t involved in the joins.

For the moment I’m going to stick to the simplest case – two tables, one column, equality – and comment on the effects of filter predicates. It seems to be very straightforward as I’ll demonstrate with a new model

rem
rem     Script:         freq_hist_join_03.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem

execute dbms_random.seed(0)

create table t1(
        id      number(8,0),
        n0040   number(4,0),
        n0090   number(4,0),
        n0190   number(4,0),
        n0990   number(4,0),
        n1      number(4,0)
)
;

create table t2(
        id      number(8,0),
        n0050   number(4,0),
        n0110   number(4,0),
        n0230   number(4,0),
        n1150   number(4,0),
        n1      number(4,0)
)
;

insert into t1
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   40) + 1                   n0040,
        mod(rownum,   90) + 1                   n0090,
        mod(rownum,  190) + 1                   n0190,
        mod(rownum,  990) + 1                   n0990,
        trunc(30 * abs(dbms_random.normal))     n1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   50) + 1                   n0050,
        mod(rownum,  110) + 1                   n0110,
        mod(rownum,  230) + 1                   n0230,
        mod(rownum, 1150) + 1                   n1150,
        trunc(30 * abs(dbms_random.normal))     n1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
end;
/

You’ll notice that in this script I’ve created empty tables and then populated them. This is because of an anomaly that appeared in 18.3 when I used “create as select”, and should allow the results from 18.3 be an exact match for 12c. You don’t need to pay much attention to the Nxxx columns, they were there so I could experiment with a few variations in the selectivity of filter predicates.

Given the purpose of the demonstration I’ve gathered histograms on the column I’m going to use to join the tables (called n1 in this case), and here are the summary results:


TABLE_NAME           COLUMN_NAME          HISTOGRAM       NUM_DISTINCT NUM_BUCKETS
-------------------- -------------------- --------------- ------------ -----------
T1                   N1                   FREQUENCY                119         119
T2                   N1                   FREQUENCY                124         124

     VALUE  FREQUENCY  FREQUENCY      PRODUCT
---------- ---------- ---------- ------------
         0       2488       2619    6,516,072
         1       2693       2599    6,999,107
         2       2635       2685    7,074,975
         3       2636       2654    6,995,944
...
       113          1          3            3
       115          1          2            2
       116          4          3           12
       117          1          1            1
       120          1          2            2
                                 ------------
sum                               188,114,543

We’ve got frequencyy histograms, and we can see that they don’t have a perfect overlap. I haven’t printed every single line from the cardinality query, just enough to show you the extreme skew, a few gaps, and the total. So here are three queries with execution plans:


set serveroutput off

alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
and     t1.n0990 = 20
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));


select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
and     t1.n0990 = 20
and     t2.n1150 = 25
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

I’ve queried the pure join – the count was exactly the 188,114,543 predicted by the cardinality query, of course – then I’ve applied a filter to one table, then to both tables. The first filter n0990 = 20 will (given the mod(,990)) definition identify one row in 990 from the original 100,000 in t1; the second filter n1150 = 25 will identify one row in 1150 from t2. That’s filtering down to 101 rows and 87 rows respectively from the two tables. So what do we see in the plans:


-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:23.47 |     748 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:23.47 |     748 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    188M|    188M|00:00:23.36 |     748 |  6556K|  3619K| 8839K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |     374 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.01 |     374 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")



-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.02 |     748 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.02 |     748 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    190K|    200K|00:00:00.02 |     748 |  2715K|  2715K| 1647K (0)|
|*  3 |    TABLE ACCESS FULL| T1   |      1 |    101 |    101 |00:00:00.01 |     374 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.01 |     374 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")
   3 - filter("T1"."N0990"=20)



-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |     748 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |     748 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    165 |    165 |00:00:00.01 |     748 |  2715K|  2715K| 1678K (0)|
|*  3 |    TABLE ACCESS FULL| T2   |      1 |     87 |     87 |00:00:00.01 |     374 |       |       |          |
|*  4 |    TABLE ACCESS FULL| T1   |      1 |    101 |    101 |00:00:00.01 |     374 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")
   3 - filter("T2"."N1150"=25)
   4 - filter("T1"."N0990"=20)


The first execution plan shows an estimate of 188M rows – but we’ll have to check the trace file to confirm whether that’s only an approximate match to our calculation, or whether it’s an exact match. So here’s the relevant pair of lines:


Join Card:  188114543.000000 = outer (100000.000000) * inner (100000.000000) * sel (0.018811)
Join Card - Rounded: 188114543 Computed: 188114543.000000

Yes, the cardinality calculation and the execution plan estimates match perfectly. But there are a couple of interesting things to note. First, Oracle seems to be deriving the cardinality by multiplying the individual cardinalities of the two tables with a figure it calls “sel” – the thing that Chinar Aliyev has labelled Jsel the “Join Selectivity”. Secondly, Oracle can’t do arithmetic (or, removing tongue from cheek) the value it’s reported for the join selectivity is reported at only 6 decimal places, but stored to far more. What is the Join Selectivity, though ? It’s the figure we derive from the cardinality SQL divided by the cardinality of the cartesian join of the two tables – i.e. 188,114,543 / (100,000 * 100,000).

With the clue from the first trace file, can we work out why the second and third plans show 190K and 165 rows respectively. How about this – multiply the filtered cardinalities of the two separate tables, then multiply the result by the join selectivity:

  • 1a)   n0990 = 20: gives us 1 row in every 990.    100,000 / 990 = 101.010101…    (echoing the rounded execution plan estimate).
  • 1b)   100,000 * (100,000/990) * 0.0188114543 = 190,014.69898989…    (which is in the ballpark of the plan and needs confirmation from the trace file).

 

  • 2a)   n1150 = 25: gives us 1 row in every 1,150.    100,000 / 1,150 = 86.9565217…    (echoing the rounded execution plan estimate)
  • 2b)   (100,000/990) * (100,000/1,150) * 0.0188114543 = 165.2301651..    (echoing the rounded execution plan estimate).

Cross-checking against extracts from the 10053 trace files:


Join Card:  190014.689899 = outer (101.010101) * inner (100000.000000) * sel (0.018811)
Join Card - Rounded: 190015 Computed: 190014.689899

Join Card:  165.230165 = outer (86.956522) * inner (101.010101) * sel (0.018811)
Join Card - Rounded: 165 Computed: 165.230165

Conclusion.

Remembering that we’re still looking at very simple examples with perfect frequency histograms: it looks as if we can work out a “Join Selectivity” (Jsel) – the selectivity of a “pure” unfiltered join of the two tables – by querying the histogram data then use the resulting value to calculate cardinalities for simple two-table equi-joins by multiplying together the individual (filtered) table cardinality estimates and scaling by the Join Selectivity.

Acknowledgements

Most of this work is based on a document written by Chinar Aliyev in 2016 and presented at the Hotsos Symposium the same year. I am most grateful to him for responding to a recent post of mine and getting me interested in spending some time to get re-acquainted with the topic. His original document is a 35 page pdf file, so there’s plenty more material to work through, experiment with, and write about.

 

Join Cardinality

Wed, 2018-10-03 06:01

Following up my “Hacking for Skew” article from a couple of days ago, Chinar Aliyev has written an article about a method for persuading the optimizer to calculate the correct cardinality estimate without using any undocumented, or otherwise dubious, mechanisms. His method essentially relies on the optimizer’s mechanism for estimating join cardinality when there are histograms at both ends of the join, so I thought I’d write a short note describing the simplest possible example of the calculation – an example where the query is a single column equi-join with no nulls in either column and a perfect frequency histograms at both ends of the join.  (For a detailed description of more general cases I always refer to the work done by Alberto Dell’Era a few years ago). We start with two data sets that exhibit a strong skew in their data distributions:


execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(3 * abs(dbms_random.normal))      n1
from
        generator       v1
;

create table t2
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(3 * abs(dbms_random.normal))      n1
from
        generator       v1
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 254'
        );
end;
/


I’ve generated two tables of 10,000 randomly generated values using the dbms_random.normal() function, but I’ve scaled the value up by a factor of three and taken the absolute value – which has given me a range of 12 distinct integer values with a nicely skewed distribution. Then I’ve gathered stats requesting histograms of up to 254 buckets. Since I’ve tested this only on versions from 11.2.0.4 onwards this means I’ll get a perfect histogram on the n1 columns on both tables.

Now I’m going run a query that reports the values and frequencies from the two tables by querying user_tab_histograms using a variant of an analytic query I published a long time ago to convert the cumulative frequencies recorded as the endpoint values into simple frequencies. If, for some reason, this query doesn’t run very efficiently in your tests you could always /*+ materialize */ the two factored subqueries (CTEs – common table expressions):


prompt  =======================================================================
prompt  Multiply and sum matching frequencies. An outer join is NOT needed
prompt  because rows that don't match won't contributed to the join cardinality
prompt  =======================================================================

break on report skip 1
compute sum of product on report
column product format 999,999,999

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'N1'
order by
        endpoint_value
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'N1'
order by
        endpoint_value
)
select
        f1.value, f1.frequency, f2.frequency, f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
;


     VALUE  FREQUENCY  FREQUENCY      PRODUCT
---------- ---------- ---------- ------------
         0       2658       2532    6,730,056
         1       2341       2428    5,683,948
         2       1828       1968    3,597,504
         3       1305       1270    1,657,350
         4        856        845      723,320
         5        513        513      263,169
         6        294        249       73,206
         7        133        117       15,561
         8         40         54        2,160
         9         23         17          391
        10          5          5           25
        11          4          2            8
                                 ------------
sum                                18,746,698

As you can see, the two columns do have a highly skewed data distribution. The pattern of the two data sets is similar though the frequencies aren’t identical, of course. The total I get from this calculation is (I claim) the cardinality (rows) estimate that the optimizer will produce for doing an equi-join on these two tables – so let’s see the test:


set serveroutput off
alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
alter session set statistics_level = typical;
alter session set events '10053 trace name context off';

And the resulting output:

Session altered.
Session altered.


  COUNT(*)
----------
  18746698


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0wxytnyqs4b5j, child number 0
-------------------------------------
select  count(*) from  t1, t2 where  t1.n1 = t2.n1

Plan hash value: 906334482
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:03.23 |      40 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:03.23 |      40 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |     18M|     18M|00:00:02.96 |      40 |  2616K|  2616K| 2098K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.01 |      20 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |  10000 |  10000 |00:00:00.01 |      20 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")

As we can see, the estimate for the hash join is “18M” which is in the right ballpark but, in its current format, isn’t entirely helpful which is why I’ve enabled the 10053 trace to get an exact figure from the trace file, and this is what we see:


***********************
Best so far:  Table#: 0  cost: 4.352468  card: 9487.000000  bytes: 28461.000000
              Table#: 1  cost: 378.482370  card: 18467968.000000  bytes: 110807808.000000
***********************

The optimizer’s estimate is exactly the sum of the products of the frequencies of matching values from the (frequency) histogram data. There is a simple rationale for this – it gets the right answer. For each row in t1 with value ‘X’ the (frequency) histogram on t2 tells Oracle how many rows will appear in the join, so multiplying the frequency of ‘X’ in t1 by the frequency of ‘X’ in t2 tells Oracle how many rows the ‘X’s will contribute to the join. Repeat for every distinct value that appears in both (frequency) histograms and sum the results.

As a refinement on this (very simple) example, let’s delete data from the two tables so that we have rows in t1 that won’t join to anything in t2, and vice versa – then re-gather stats, query the histograms, and check the new prediction. We want to check whether a value that appears in the t1 histogram contributes to the join cardinality estimate even if there are no matching values in the t2 histogram (and vice versa):


delete from t1 where n1 = 4;
delete from t2 where n1 = 6;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 254', no_invalidate=>false)
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for all columns size 254', no_invalidate=>false)

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'N1'
order by
        endpoint_value
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'N1'
order by
        endpoint_value
)
select
        f1.value, f1.frequency, f2.frequency, f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
;


set serveroutput off
alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
alter session set statistics_level = typical;
alter session set events '10053 trace name context off';

And the output – with a little cosmetic tidying:


856 rows deleted.
249 rows deleted.

PL/SQL procedure successfully completed.
PL/SQL procedure successfully completed.


     VALUE  FREQUENCY  FREQUENCY      PRODUCT
---------- ---------- ---------- ------------
         0       2658       2532    6,730,056
         1       2341       2428    5,683,948
         2       1828       1968    3,597,504
         3       1305       1270    1,657,350
         5        513        513      263,169
         7        133        117       15,561
         8         40         54        2,160
         9         23         17          391
        10          5          5           25
        11          4          2            8
                                 ------------
sum                                17,950,172


Session altered.
Session altered.


  COUNT(*)
----------
  17950172


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0wxytnyqs4b5j, child number 0
-------------------------------------
select  count(*) from  t1, t2 where  t1.n1 = t2.n1

Plan hash value: 906334482
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:02.89 |      40 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:02.89 |      40 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |     17M|     17M|00:00:02.61 |      40 |  2616K|  2616K| 2134K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |   9144 |   9144 |00:00:00.01 |      20 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |   9751 |   9751 |00:00:00.01 |      20 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")


From the 10053 trace file:
***********************
Best so far:  Table#: 0  cost: 4.340806  card: 9144.000000  bytes: 27432.000000
              Table#: 1  cost: 368.100010  card: 17950172.000000  bytes: 107701032.000000
***********************

You can see from the frequency histogram report that we “lost” values 4 and 6 from the report; then the total from the report matches the actual number of rows returned by the query, and the cardinality estimate in the plan is again in the right ballpark – with the trace file showing an exact match.

I’ve run this test on 11.2.0.4,  12.1.0.2,  12.2.0.1 and  18.3.0.0 – and there’s an anomaly that appears in 11.2.0.4 (though maybe that should be “disappeared from”): the optimizer’s estimate for the cardinality was 17,952,157 – too large by 1,985. If you read the original document by Alberto Dell’Era you’ll see that he does mention a component of the calculation that seems to be redundant and possibly shouldn’t be there. I haven’t checked the details yet, but maybe that little difference in 11.2.0.4 is exactly the component that shouldn’t be there and isn’t there any more.

Conclusion:

For an incredibly simple class of queries with perfect frequency histograms there’s a very simple way to calculate the cardinality estimate that the optimizer will predict. Match up rows from the two frequency histograms, multiply the corresponding frequencies (making sure you don’t multiply the cumulative frequencies), and sum.

This is, of course, only a tiny step in the direction of seeing how Oracle uses histograms and covers only a type of query that is probably too simple to appear in a production system, but it’s a basis on which I may build in future notes over the next few weeks.

Case Study

Sun, 2018-09-30 13:59

A question about reading execution plans and optimising queries arrived on the ODC database forum a little while ago; the owner says the following statement is taking 14 minutes to return 30,000 rows and wants some help understanding why.

If you look at the original posting you’ll see that we’ve been given the text of the query and the execution plan including rowsource execution stats. There’s an inconsistency between the supplied information and the question asked, and I’ll get back to that shortly, but to keep this note fairly short I’ve excluded the 2nd half of the query (which is a UNION ALL) because the plan says the first part of the query took 13 minutes and 20 second and the user is worried about a total of 14 minutes.

SELECT /*+ gather_plan_statistics*/ DISTINCT
                rct.org_id,
                hzp.party_name,
                hca.account_number,
                rct.interface_header_attribute1 order_number,
                rct.customer_trx_id,
                rct.trx_number,
                rct.trx_date,
                rctd.gl_date,
                rct.creation_date,
                rctl.line_number,
                rct.invoice_currency_code inv_currency,
                (
                       SELECT SUM (rct_1.extended_amount)
                       FROM   apps.ra_customer_trx_lines_all rct_1
                       WHERE  rct_1.customer_trx_id = rct.customer_trx_id
                       AND    rct_1.line_type = 'LINE') inv_net_amount,
                (
                       SELECT SUM (rct_2.extended_amount)
                       FROM   apps.ra_customer_trx_lines_all rct_2
                       WHERE  rct_2.customer_trx_id = rct.customer_trx_id
                       AND    rct_2.line_type = 'TAX') inv_tax_amount,
                (
                       SELECT SUM (rct_3.extended_amount)
                       FROM   apps.ra_customer_trx_lines_all rct_3
                       WHERE  rct_3.customer_trx_id = rct.customer_trx_id) inv_gross_amount,
                gll.currency_code                                    func_currency,
                Round((
                        (
                        SELECT SUM (rct_4.extended_amount)
                        FROM   apps.ra_customer_trx_lines_all rct_4
                        WHERE  rct_4.customer_trx_id = rct.customer_trx_id
                        AND    rct_4.line_type = 'LINE')*gdr.conversion_rate),2) func_net_amount,
                Round((
                        (
                        SELECT SUM (rct_5.extended_amount)
                        FROM   apps.ra_customer_trx_lines_all rct_5
                        WHERE  rct_5.customer_trx_id = rct.customer_trx_id
                        AND    rct_5.line_type = 'TAX')*gdr.conversion_rate),2) func_tax_amount,
                Round((
                        (
                        SELECT SUM (rct_6.extended_amount)
                        FROM   apps.ra_customer_trx_lines_all rct_6
                        WHERE  rct_6.customer_trx_id = rct.customer_trx_id)*gdr.conversion_rate),2) func_gross_amount,
                glcc.segment1                                                                 company,
                glcc.segment2                                                                 account,
                hg.geography_name                                                             billing_country,
                gdr.conversion_rate
FROM            apps.hz_parties hzp,
                apps.hz_cust_accounts hca,
                apps.ra_customer_trx_all rct,
                apps.ra_customer_trx_lines_all rctl,
                apps.ra_cust_trx_line_gl_dist_all rctd,
                apps.gl_code_combinations_kfv glcc,
                apps.hz_cust_site_uses_all hcsua,
                apps.hz_cust_acct_sites_all hcasa,
                apps.hz_party_sites hps,
                apps.hz_locations hl,
                apps.hz_geographies hg,
                apps.gl_ledgers gll,
                apps.gl_daily_rates gdr
WHERE           hzp.party_id = hca.party_id
AND             hca.cust_account_id = rct.bill_to_customer_id
AND             hca.cust_account_id = hcasa.cust_account_id
AND             rct.customer_trx_id = rctl.customer_trx_id
AND             rctl.customer_trx_line_id = rctd.customer_trx_line_id
AND             glcc.code_combination_id = rctd.code_combination_id
AND             rct.bill_to_site_use_id = hcsua.site_use_id
AND             hcsua.cust_acct_site_id = hcasa.cust_acct_site_id
AND             hcasa.party_site_id = hps.party_site_id
AND             hps.location_id = hl.location_id
AND             hl.country = hg.country_code
AND             hg.geography_type = 'COUNTRY'
AND             rctl.line_type = 'TAX'
AND             gll.ledger_id = rct.set_of_books_id
AND             gdr.from_currency = rct.invoice_currency_code
AND             gdr.to_currency = gll.currency_code
AND             to_date(gdr.conversion_date) = to_date(rctd.gl_date)
AND             gdr.conversion_type = 'Corporate'
AND             rctd.gl_date BETWEEN To_date ('01-JAN-2018', 'DD-MON-YYYY') AND  To_date ('31-JAN-2018', 'DD-MON-YYYY')
AND             glcc.segment1 = '2600'
AND             glcc.segment2 = '206911'
GROUP BY        hzp.party_name,
                hca.account_number,
                rct.interface_header_attribute1,
                rct.trx_number,
                rct.trx_date,
                rct.creation_date,
                rctl.line_number,
                rctl.unit_selling_price,
                rct.org_id,
                rctd.gl_date,
                rct.customer_trx_id,
                glcc.segment1,
                glcc.segment2,
                hg.geography_name,
                rct.invoice_currency_code,
                gll.currency_code,
                gdr.conversion_rate 

We note that there are six scalar subqueries in the text I’ve reported – and they form two groups of three, and the difference between the two groups is that one group is multiplied by a conversion rate while the other isn’t; moreover in each group the three subqueries are simply querying subsets of the same correlated data set. So it looks as if all 6 scalar subqueries could be eliminated and replaced by the inclusion of an aggregate view in the from clause and the projection of 6 columns from that view.

However, before pursuing that option, take a look at the plan with the rowsource execution stats – where is the time going ?


-----------------------------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                                  | Name                         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  
-----------------------------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                                           |                              |      1 |        |    501 |00:13:20.17 |    3579K|  
|   1 |  UNION-ALL                                                 |                              |      1 |        |    501 |00:13:20.17 |    3579K|  
|   2 |   HASH UNIQUE                                              |                              |      1 |      1 |    501 |00:13:20.17 |    3579K|  
|   3 |    HASH GROUP BY                                           |                              |      1 |      1 |  19827 |00:13:20.15 |    3579K|  
|   4 |     NESTED LOOPS                                           |                              |      1 |        |  21808 |00:13:10.26 |    3579K|  
|   5 |      NESTED LOOPS                                          |                              |      1 |      1 |  21808 |00:13:10.11 |    3578K|  
|   6 |       NESTED LOOPS OUTER                                   |                              |      1 |      1 |  21808 |00:13:09.90 |    3576K|  
|   7 |        NESTED LOOPS OUTER                                  |                              |      1 |      1 |  21808 |00:13:09.25 |    3501K|  
|   8 |         NESTED LOOPS OUTER                                 |                              |      1 |      1 |  21808 |00:13:08.48 |    3426K|  
|   9 |          NESTED LOOPS OUTER                                |                              |      1 |      1 |  21808 |00:13:07.66 |    3333K|  
|  10 |           NESTED LOOPS OUTER                               |                              |      1 |      1 |  21808 |00:13:06.92 |    3258K|  
|  11 |            NESTED LOOPS OUTER                              |                              |      1 |      1 |  21808 |00:13:06.08 |    3183K|  
|  12 |             NESTED LOOPS                                   |                              |      1 |      1 |  21808 |00:13:04.69 |    3090K|  
|  13 |              NESTED LOOPS                                  |                              |      1 |      1 |  21808 |00:13:05.75 |    3026K|  
|  14 |               NESTED LOOPS                                 |                              |      1 |      1 |  21808 |00:13:03.30 |    2961K|  
|  15 |                NESTED LOOPS                                |                              |      1 |      1 |  33459 |00:00:04.33 |    1123K|  
|  16 |                 NESTED LOOPS                               |                              |      1 |    351 |  33459 |00:00:03.67 |    1025K|  
|  17 |                  NESTED LOOPS                              |                              |      1 |    351 |  33459 |00:00:03.06 |     926K|  
|  18 |                   NESTED LOOPS                             |                              |      1 |    351 |  33459 |00:00:02.47 |     827K|  
|* 19 |                    HASH JOIN                               |                              |      1 |    351 |  33459 |00:00:01.90 |     730K|  
|  20 |                     TABLE ACCESS FULL                      | GL_LEDGERS                   |      1 |     38 |     39 |00:00:00.01 |      15 |  
|  21 |                     NESTED LOOPS                           |                              |      1 |        |  33459 |00:00:01.75 |     730K|  
|  22 |                      NESTED LOOPS                          |                              |      1 |    351 |  33459 |00:00:01.44 |     696K|  
|  23 |                       NESTED LOOPS                         |                              |      1 |    351 |  33459 |00:00:01.11 |     646K|  
|* 24 |                        HASH JOIN                           |                              |      1 |    385 |  33459 |00:00:00.40 |     526K|  
|* 25 |                         TABLE ACCESS BY INDEX ROWID BATCHED| GL_CODE_COMBINATIONS         |      1 |     35 |      1 |00:00:00.01 |     108 |  
|* 26 |                          INDEX RANGE SCAN                  | GL_CODE_COMBINATIONS_N2      |      1 |    499 |     77 |00:00:00.01 |       3 |  
|* 27 |                         TABLE ACCESS BY INDEX ROWID BATCHED| RA_CUST_TRX_LINE_GL_DIST_ALL |      1 |    651K|   1458K|00:00:02.22 |     526K|  
|* 28 |                          INDEX RANGE SCAN                  | RA_CUST_TRX_LINE_GL_DIST_N2  |      1 |    728K|   1820K|00:00:01.60 |   11147 |  
|* 29 |                        TABLE ACCESS BY INDEX ROWID         | RA_CUSTOMER_TRX_LINES_ALL    |  33459 |      1 |  33459 |00:00:00.53 |     119K|  
|* 30 |                         INDEX UNIQUE SCAN                  | RA_CUSTOMER_TRX_LINES_U1     |  33459 |      1 |  33459 |00:00:00.31 |   86364 |  
|* 31 |                       INDEX UNIQUE SCAN                    | RA_CUSTOMER_TRX_U1           |  33459 |      1 |  33459 |00:00:00.21 |   49850 |  
|  32 |                      TABLE ACCESS BY INDEX ROWID           | RA_CUSTOMER_TRX_ALL          |  33459 |      1 |  33459 |00:00:00.20 |   33459 |  
|  33 |                    TABLE ACCESS BY INDEX ROWID             | HZ_CUST_ACCOUNTS             |  33459 |      1 |  33459 |00:00:00.42 |   97887 |  
|* 34 |                     INDEX UNIQUE SCAN                      | HZ_CUST_ACCOUNTS_U1          |  33459 |      1 |  33459 |00:00:00.24 |   64428 |  
|  35 |                   TABLE ACCESS BY INDEX ROWID              | HZ_PARTIES                   |  33459 |      1 |  33459 |00:00:00.44 |   98783 |  
|* 36 |                    INDEX UNIQUE SCAN                       | HZ_PARTIES_U1                |  33459 |      1 |  33459 |00:00:00.26 |   65175 |  
|  37 |                  TABLE ACCESS BY INDEX ROWID               | HZ_CUST_SITE_USES_ALL        |  33459 |      1 |  33459 |00:00:00.46 |   98374 |  
|* 38 |                   INDEX UNIQUE SCAN                        | HZ_CUST_SITE_USES_U1         |  33459 |      1 |  33459 |00:00:00.28 |   64915 |  
|* 39 |                 TABLE ACCESS BY INDEX ROWID                | HZ_CUST_ACCT_SITES_ALL       |  33459 |      1 |  33459 |00:00:00.45 |   98195 |  
|* 40 |                  INDEX UNIQUE SCAN                         | HZ_CUST_ACCT_SITES_U1        |  33459 |      1 |  33459 |00:00:00.26 |   64736 |  
|  41 |                TABLE ACCESS BY INDEX ROWID BATCHED         | GL_DAILY_RATES               |  33459 |      1 |  21808 |00:12:44.59 |    1838K|  
|* 42 |                 INDEX RANGE SCAN                           | GL_DAILY_RATES_U1            |  33459 |      1 |  21808 |00:13:08.16 |    1837K|  
|  43 |               TABLE ACCESS BY INDEX ROWID                  | HZ_PARTY_SITES               |  21808 |      1 |  21808 |00:00:00.35 |   64339 |  
|* 44 |                INDEX UNIQUE SCAN                           | HZ_PARTY_SITES_U1            |  21808 |      1 |  21808 |00:00:00.23 |   42531 |  
|  45 |              TABLE ACCESS BY INDEX ROWID                   | HZ_LOCATIONS                 |  21808 |      1 |  21808 |00:00:00.33 |   64353 |  
|* 46 |               INDEX UNIQUE SCAN                            | HZ_LOCATIONS_U1              |  21808 |      1 |  21808 |00:00:00.18 |   42545 |  
|  47 |             VIEW PUSHED PREDICATE                          | VW_SSQ_1                     |  21808 |      1 |  21808 |00:00:01.17 |   93476 |  
|  48 |              SORT GROUP BY                                 |                              |  21808 |      1 |  21808 |00:00:01.06 |   93476 |  
|  49 |               TABLE ACCESS BY INDEX ROWID BATCHED          | RA_CUSTOMER_TRX_LINES_ALL    |  21808 |     16 |    145K|00:00:00.84 |   93476 |  
|* 50 |                INDEX RANGE SCAN                            | XXC_CUSTOMER_GETPAID         |  21808 |     16 |    145K|00:00:00.36 |   59938 |  
|  51 |            VIEW PUSHED PREDICATE                           | VW_SSQ_2                     |  21808 |      1 |  21808 |00:00:00.69 |   74433 |  
|  52 |             SORT GROUP BY                                  |                              |  21808 |      1 |  21808 |00:00:00.59 |   74433 |  
|  53 |              TABLE ACCESS BY INDEX ROWID BATCHED           | RA_CUSTOMER_TRX_LINES_ALL    |  21808 |      8 |  92201 |00:00:00.49 |   74433 |  
|* 54 |               INDEX RANGE SCAN                             | XXC_CUSTOMER_GETPAID         |  21808 |     12 |  92201 |00:00:00.24 |   59903 |  
|  55 |           VIEW PUSHED PREDICATE                            | VW_SSQ_3                     |  21808 |      1 |  21808 |00:00:00.61 |   74852 |  
|  56 |            SORT GROUP BY                                   |                              |  21808 |      1 |  21808 |00:00:00.51 |   74852 |  
|  57 |             TABLE ACCESS BY INDEX ROWID BATCHED            | RA_CUSTOMER_TRX_LINES_ALL    |  21808 |      8 |  53060 |00:00:00.38 |   74852 |  
|* 58 |              INDEX RANGE SCAN                              | XXC_CUSTOMER_GETPAID         |  21808 |     12 |  53060 |00:00:00.19 |   59148 |  
|  59 |          VIEW PUSHED PREDICATE                             | VW_SSQ_4                     |  21808 |      1 |  21808 |00:00:00.70 |   93490 |  
|  60 |           SORT GROUP BY                                    |                              |  21808 |      1 |  21808 |00:00:00.61 |   93490 |  
|  61 |            TABLE ACCESS BY INDEX ROWID BATCHED             | RA_CUSTOMER_TRX_LINES_ALL    |  21808 |     16 |    145K|00:00:00.63 |   93490 |  
|* 62 |             INDEX RANGE SCAN                               | XXC_CUSTOMER_GETPAID         |  21808 |     16 |    145K|00:00:00.25 |   59950 |  
|  63 |         VIEW PUSHED PREDICATE                              | VW_SSQ_5                     |  21808 |      1 |  21808 |00:00:00.63 |   74427 |  
|  64 |          SORT GROUP BY                                     |                              |  21808 |      1 |  21808 |00:00:00.54 |   74427 |  
|  65 |           TABLE ACCESS BY INDEX ROWID BATCHED              | RA_CUSTOMER_TRX_LINES_ALL    |  21808 |      8 |  92201 |00:00:00.44 |   74427 |  
|* 66 |            INDEX RANGE SCAN                                | XXC_CUSTOMER_GETPAID         |  21808 |     12 |  92201 |00:00:00.21 |   59900 |  
|  67 |        VIEW PUSHED PREDICATE                               | VW_SSQ_6                     |  21808 |      1 |  21808 |00:00:00.59 |   74846 |  
|  68 |         SORT GROUP BY                                      |                              |  21808 |      1 |  21808 |00:00:00.50 |   74846 |  
|  69 |          TABLE ACCESS BY INDEX ROWID BATCHED               | RA_CUSTOMER_TRX_LINES_ALL    |  21808 |      8 |  53060 |00:00:00.35 |   74846 |  
|* 70 |           INDEX RANGE SCAN                                 | XXC_CUSTOMER_GETPAID         |  21808 |     12 |  53060 |00:00:00.17 |   59144 |  
|* 71 |       INDEX RANGE SCAN                                     | HZ_GEOGRAPHIES_N11           |  21808 |   5812 |  21808 |00:00:00.13 |    2684 |  
|  72 |      TABLE ACCESS BY INDEX ROWID                           | HZ_GEOGRAPHIES               |  21808 |    168 |  21808 |00:00:00.07 |     620 |  
-----------------------------------------------------------------------------------------------------------------------------------------------------  

Let’s start by raising some concerns about the quality of information available.

First, the OP says it takes 14 minutes to return 30,000 rows: but the top line of the plan says it has taken 13 minutes and 20 seconds to return the first 501 rows, and if we look a little further down the plan operation 3 (Hash Group By) reports 00:13:20.15 to aggregate down to 19,827 rows. So this half of the plan cannot return more than 19,827 rows, and the half I have discarded (for the moment) must be returning the other 10,000+ rows. The information we have is incomplete.

Of course you may think that whatever the rest of the plan does is fairly irrelevant – it’s only going to be responsible for at most another 40 seconds of processing – except my previous experience of rowsource execution statistics tells me that when you do a large number of small operations the times reported can be subject to fairly large rounding errors and that enabling the measurement can increase the execution time by a factor of three or four. It’s perfectly feasible that this half of the query is actually the faster half under normal run-time circumstances but runs much more slowly (with a much higher level of CPU utilisation) when rowsource execution stats is in enabled. So let’s not get too confident.

With that warning in mind, what can we see in this half of the plan.

Big picture: the inline scalar subqueries have disappeared. In 12c the optimimzer can unnest scalar subqueries in the select list and turn them into outer joins, and we can see that there are 6 “Nested Loop Outer” operations, corresponding to 6 “View Pushed Predicate” operations against views labelled VW_SSQ1 through to VW_SSQ6 (SSQ = Scalar Sub Query ?). This goes back to my early comment – a person could probably rewrite the 6 scalar subqueries as a single aggregate view in the from clause: the optimizer isn’t quite clever enough to manage that in this case, but in simpler cases it might be able to do exactly that.

Big picture 2: most of the 13 minutes 20 seconds appears at operation 14 as it processes the 33,459 rows supplied to it from the 4.33 seconds of work done by operation 15 and its descendants. Reducing this part of the execution plan to the smallest relevant section we get the following:

-----------------------------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                                  | Name                         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  
-----------------------------------------------------------------------------------------------------------------------------------------------------  
|  14 |               NESTED LOOPS                                 |                              |      1 |      1 |  21808 |00:13:03.30 |    2961K|  
|  15 |                NESTED LOOPS                                |                              |      1 |      1 |  33459 |00:00:04.33 |    1123K|  
|  41 |                TABLE ACCESS BY INDEX ROWID BATCHED         | GL_DAILY_RATES               |  33459 |      1 |  21808 |00:12:44.59 |    1838K|  
|* 42 |                 INDEX RANGE SCAN                           | GL_DAILY_RATES_U1            |  33459 |      1 |  21808 |00:13:08.16 |    1837K|  
-----------------------------------------------------------------------------------------------------------------------------------------------------  

For each row supplied by operation 15 Oracle calls operation 41, which calls operation 42 to do an index range scan to supply a set of rowids so that operation 41 can access a table and return rows. Apparently the total time spent by operation 41 waiting for operation 42 to return rowids and then doing its own work is 12 minutes 44 seconds, while the range scans alone (all 33,459 of them) take 13 minutes and 8 seconds. Remember, though, that “lots of small operations = scope of rounding errors” when you look at these timings. Despite the inconsistency between the timings for operations 41 and 42 it’s reasonable to conclude that between them that’s where most of the execution time went.

Two questions – (a) can we refine our analysis of how the time is split between the two operations and (b) why do these lines take so much time.

Check the Starts and the A-rows: (reminder: for comparison, we expect A-rows to be approximately E-rows * Starts) for both operations we see 33,459 starts and 21,808 rows. The index range scans return (on average) a single rowid about two-thirds of the time, and every time a range scan returns a rowid the corresponding row is returned from the table (If you check the Id column there’s no asterisk on operation 41 so no extra predicate is applied as Oracle accesses the table row – but even if there were an extra predicate we’d still be happy to infer that if 21,808 rowids returned from operation 42 turned into 21,808 rows returned from the table then there are no wasted accesses to the table).

Now look at the Buffers for the index range scan – 1.837M: that’s roughly 56 buffers per range scan – that’s a lot of index to range through to find one rowid, which is a good clue that perhaps we do a lot of work with each Start and really do use up a lot of CPU on this operation. Let’s see what the Predicate Section of the plan tells us about this range scan:


Predicate Information (identified by operation id):  
---------------------------------------------------  
  42 - access("GDR"."FROM_CURRENCY"="RCT"."INVOICE_CURRENCY_CODE" AND "GDR"."TO_CURRENCY"="GLL"."CURRENCY_CODE" AND   
              "GDR"."CONVERSION_TYPE"='Corporate')  
       filter(("GDR"."CONVERSION_TYPE"='Corporate' AND TO_DATE(INTERNAL_FUNCTION("GDR"."CONVERSION_DATE"))=TO_DATE(INTERNAL_FUNCTION("RCTD"."  
              GL_DATE"))))  

We have access predicates (things which narrow down the number of leaf blocks that we walk through) and filter predicates (things we do to test every key entry we access). Notably the gdr.conversion type is a filter predciate as well as an access predicate – and that suggests that our set of predicates has “skipped over” a column in the index: from_currency and to_currency might be the first two columns in the index, but conversion_type is then NOT the third.

More significantly, though, there’s a column called conversion_date in the index (maybe that’s column 3 in the index – it feels like it ought to be); but for every index entry we’ve selected from the 56 blocks we walk through we do some sort of internal conversion (or un-translated transformation) to the column then convert the result to a date to compare it with another date (similarly processed from an earlier operation). What is that “internal function” – let’s check the query:


AND             gdr.from_currency = rct.invoice_currency_code
AND             gdr.to_currency = gll.currency_code
AND             gdr.conversion_type = 'Corporate'
AND             to_date(gdr.conversion_date) = to_date(rctd.gl_date)
AND             rctd.gl_date BETWEEN To_date ('01-JAN-2018', 'DD-MON-YYYY') AND  To_date ('31-JAN-2018', 'DD-MON-YYYY')

(I’ve swapped the order of a couple of lines to highlight a detail).

The filter predicate is comparing gdr.conversion_date with rctd.gl_date – and we can probably assume that both columns really are dates because (a) the word “date” is in their names and (b) the rctd.gl_date is being compared with genuine date values in the next predicate down (and – though I haven’t shown it – the way the plan reports the next predicate proves that the column really is a date datatype).

So the predicate in the SQL applies the to_date() function to two columns that are dates – which means the optimizer has to convert the date columns to some default character format and then convert them back to dates. The “internal function” is a to_char() call. Conversions between date and character formats are CPU-intensive, and we’re doing a double conversion (to_date(to_char(column_value)) to every data value in roughly 56 blocks of an index each time we call that line of the plan. It’s not surprising we spend a lot of time in that line.

Initial strategy:

Check the column types for those two columns, if they are both date types decide whether or not the predicate could be modified to a simple gdr.conversion_date = rctd.gl_date (though it’s possible that something slightly more sophisticated should be used) but whatever you do avoid the redundant conversion through character format.

Warning

Simply eliminating the to_date() calls may changes the results. Here’s a demonstration of how nasty things happen when you apply to_date() to a date:


SQL> desc t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 D1                                     DATE
 D2                                     DATE

SQL> insert into t1 values(sysdate, sysdate + 10/86400);

1 row created.

SQL> select * from t1 where d1 = d2;

no rows selected

SQL> select * from t1 where to_date(d1) = to_date(d2);

D1        D2
--------- ---------
30-SEP-18 30-SEP-18

1 row selected.

SQL> alter session set nls_date_format = 'yyyy-mm-dd hh24:mi:ss';

Session altered.

SQL> select * from d1 where to_date(d1) = to_date(d2);

no rows selected

Different users could get different results because they have different settings for their nls_date_format.

Reminder

I started my analysis with two comments about the quality of information – first, we don’t really know whether or not this half of the union all would be responsble for most of the time if rowsource execution statistics were not enabled; secondly large number of small operations can lead to a lot of rounding errors in timing. There are six occurrences of unnested scalar subqueries which are all called 21,808 times – and the recorded time for all 6 of them is remarkably small given the number of executions, even when allowing for the precision with which they operate; it’s possible that these subqueries take a larger fraction of the total time than the plan indicates, in which case it might become necessary (rather than just nice) to do a manual unnesting and reduce the number of inline views to 3 (one for each line_type), 2 (one with, one without, conversion_rate) or just one.

Footnote

Once again I’ve spent a couple of hours writing notes to explain the thoughts that went through my mind in roughly 10 minutes of reading the original posting. It’s a measure really of how many bits of information you can pull together, and possibly discard, very quickly once you understand how many things the optimizer is capable of doing and how the execution plan tries to show you how a statement was (or will be) handled.

 

Hacking for Skew

Fri, 2018-09-28 07:23

In my presentation to the UKOUG SIG yesterday “Struggling with Statistics – part 2” I described a problem that I wrote about a few months ago: when you join a fact table with a massively skewed distribution on one of the surrogate key columns to a dimension holding the unique list of keys and descriptions a query against a description “loses” the skew. Here’s an demo of the problem that’s a little simpler than the one in the previous article.


rem
rem     Script:         bitmap_join_histogram.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2016
rem     Updated:        Sep 2018
rem 

execute dbms_random.seed(0)

create table facts
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual 
        connect by 
                level <= 1e4 --> comment to avoid wordpress format issue
)
select
        rownum                                  id,
        trunc(3 * abs(dbms_random.normal))      id_status,
        lpad(rownum,10,'0')                     v1,
        lpad('x',100,'x')                       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 --> comment to avoid wordpress format issue
;

alter table facts add constraint fct_pk primary key(id);
alter table facts modify id_status not null;

create table statuses
as
select
        id,
        chr(65 + id)            status_code,
        rpad('x',100,'x')       description
from    (
        select
                distinct(id_status)             id
        from
                facts
        )
;

alter table statuses modify status_code not null;

alter table statuses add constraint sta_pk primary key (id);
alter table facts add constraint fct_fk_sta foreign key (id_status) references statuses(id);

create bitmap index fct_b1 on facts(id_status);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'facts',
                method_opt       => 'for all columns size skewonly'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'statuses',
                method_opt       => 'for all columns size 254'
        );
end;
/

The definition of the facts.id_status column means I get a nice skewing effect on the data and this is what my data looks like:


select id_status, count(*) from facts group by id_status order by id_status;

 ID_STATUS   COUNT(*)
---------- ----------
         0      26050
         1      23595
         2      18995
         3      13415
         4       8382
         5       4960
         6       2643
         7       1202
         8        490
         9        194
        10         55
        11         17
        12          2

13 rows selected.

The statuses table translates the numbers 0 – 12 into the letters ‘A’ – ‘M’.

A quick check will show you that there are 55 rows for id_status = 10, which means 55 rows for status_code = ‘K’. So what happens when we write the two queries that should show us these results. I don’t really care what the execution plans are at this point, I’m interested only in the optimizer’s estimate of cardinality – so here are two queries, each followed by its execution plan:


select
        sum(fct.id)
from
        facts   fct
where
        fct.id_status = 10
;


-----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |        |     1 |     8 |    12   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                      |        |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| FACTS  |    55 |   440 |    12   (0)| 00:00:01 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |        |       |       |            |          |
|*  4 |     BITMAP INDEX SINGLE VALUE        | FCT_B1 |       |       |            |          |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("FCT"."ID_STATUS"=10)


select
        sum(fct.id)
from
        facts           fct,
        statuses        sta
where
        fct.id_status = sta.id
and     sta.status_code = 'K'
;

--------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |     1 |    13 |   233   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE     |          |     1 |    13 |            |          |
|*  2 |   HASH JOIN         |          |  7692 | 99996 |   233   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| STATUSES |     1 |     5 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| FACTS    |   100K|   781K|   229   (3)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FCT"."ID_STATUS"="STA"."ID")
   3 - filter("STA"."STATUS_CODE"='K')

The estimated cardinality for the query against the base column reflects the value 55 from the histogram, but the estimated cardinality of the join is 7,692 – which is num_rows(facts) / num_distinct(id_status). Oracle has lost information about the skew. There is a way to get Oracle to produce a correct estimate (shown in the previous article) by rewriting the join as an IN subquery with the (undocumented) “precompute_subquery” hint, but there is an alternative which David Kurtz hypothesized in a conversation after the presentation was over (in fact someone else had described their use of exactly his suggested approach in a comment on a much older blog note about this problem): take the histogram from the id_status column on the facts table and “apply it” to the status_code column on the statuses table. In discussion with David I expressed the opinion that this probably shouldn’t work, and it wasn’t really a bit of fakery I’d want to apply to a production system – but we both tried it when we got home … with differing degrees of success.

Here’s a piece of code that I inserted into my script immediately after gathering stats on the statuses table. I’ll explain the details below as it makes a couple of assumptions that need to be pointed out:


declare

        srec                    dbms_stats.statrec;

        m_distcnt               number;
        m_density               number;
        m_nullcnt               number;
        m_avgclen               number;

        c_array                 dbms_stats.chararray;

begin

        dbms_stats.get_column_stats(
                ownname         => 'test_user',
                tabname         => 'facts',
                colname         => 'id_status',
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => srec,
                avgclen         => m_avgclen
        ); 

        srec.bkvals := dbms_stats.numarray();
        c_array     := dbms_stats.chararray();

        for r in (
                select  stt.status_code, count(*) ct
                from    facts fct, statuses stt
                where   stt.id = fct.id_status
                group by
                        stt.status_code
                order by
                        stt.status_code
        ) loop

                c_array.extend;
                c_array(c_array.count) := r.status_code;
                srec.bkvals.extend;
                srec.bkvals(srec.bkvals.count) := r.ct;

        end loop;

        dbms_stats.prepare_column_values(srec, c_array);

        dbms_stats.set_column_stats(
                ownname         => 'test_user',
                tabname         => 'statuses',
                colname         => 'status_code',
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => srec,
                avgclen         => m_avgclen
        ); 

end;
/

alter system flush shared_pool;

The code isn’t intended to be efficient, and I’ve been a bit lazy in setting up the content.

The first step gets the column stats from facts.id_status – and I know that I’ve got a frequency histogram that covers exactly the right number of distinct values on that column so almost everything is set up correctly to copy the stats across to statuses.status_code, except one column is numeric and the other is character and (although I know it’s true because of the way I defined the status_code values) I need to ensure that the bucket values I write to the status_code need to be arranged in alphabetic order of status_code.

So my second step is to run a query against the facts table to get the counts of status_code in alphabetical order and copy the results in order into a pair of arrays – one being a standalone array of the type defined in the dbms_stats package as an array of character types, the other being the array of bucket values that already exists in the stats record for the facts.id_status column that I’ve pulled into memory. (The bucket values array is stored as cumulative frequency values, so I do have to overwrite it with the simple frequency values at this point).

Finally I “prepare column values” and “set column stats” into the correct column, and the job is done. The flush of the shared pool is there to avoid any accidents of cursors surviving previous tests and causing confusion.

So what happens when I run a couple of queries with these faked stats in place ?

set autotrace traceonly explain

select  
        sum(fct.id)
from
        facts           fct,
        statuses        sta
where
        fct.id_status = sta.id
and     sta.status_code = 'K'
;


--------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |     1 |    14 |   233   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE     |          |     1 |    14 |            |          |
|*  2 |   HASH JOIN         |          |    55 |   770 |   233   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| STATUSES |     1 |     6 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| FACTS    |   100K|   781K|   229   (3)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FCT"."ID_STATUS"="STA"."ID")
   3 - filter("STA"."STATUS_CODE"='K')



select
        sum(fct.id)
from
        facts           fct,
        statuses        sta
where
        fct.id_status = sta.id
and     sta.status_code = 'D'
;


--------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |     1 |    14 |   233   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE     |          |     1 |    14 |            |          |
|*  2 |   HASH JOIN         |          | 13415 |   183K|   233   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| STATUSES |     2 |    12 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| FACTS    |   100K|   781K|   229   (3)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FCT"."ID_STATUS"="STA"."ID")
   3 - filter("STA"."STATUS_CODE"='D')

Querying for ‘K’ the prediction is 55 rows, querying for ‘D’ the prediction is for 13,415 rows – both estimates are exactly right. Wow !!!

Problem – that’s not what David Kurtz saw. In an email to me he said: “To my surprise, if I fake a histogram on the dimension table using the skew on the join column from the fact table I do get the correct number of rows calculated in the execution plan (provided it is less than the value if the histogram was not present)”. To make that concrete – when he queried for ‘K’ he got the correct prediction, when he queried for ‘D’ he was back to a prediction of 7,692. Looking at the report of the actual data, he’d get the right prediction for codes ‘F’ to ‘M’ and the wrong prediction for codes ‘A’ to ‘E’.

So what went wrong (and with whom) ?

When I run up new tests I tend to test Oracle versions in the order 12.1.0.2, then 11.2.0.4, then 12.2.0.1, then 18.3.0.0 – it’s the order of popularity that I currently see. So I was running my test on 12.1.0.2; David was running his test on 18.3.0.0. So I jumped a step and ran my test on 12.2.0.1: here are my results when querying for status_code = ‘D’:


--------------------------------------------------------------------------------
| Id  | Operation	    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |	       |     1 |    14 |   233	 (4)| 00:00:01 |
|   1 |  SORT AGGREGATE     |	       |     1 |    14 |	    |	       |
|*  2 |   HASH JOIN	    |	       |  7692 |   105K|   233	 (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| STATUSES |     1 |     6 |     2	 (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| FACTS    |   100K|   781K|   229	 (3)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FCT"."ID_STATUS"="STA"."ID")
   3 - filter("STA"."STATUS_CODE"='D')

As David has seen with 18.3, Oracle used the num_distinct to estimate the cardinality for  ‘D’. (It still used the value indicated by the histogram for ‘K’.) When I set the optimizer_features_enable parameter back to 12.1.0.2 the cardinality estimate for ‘D’ wentback to 13,415 – so it looks as if this is a deliberate piece of coding. 172 fix controls and 31 optimizer state parameters changed, but none of the more likely looking candidates had any effect when I tried testing them separately; possibly there’s a new sanity check when the number of rows recorded for the table is a long way off the total histogram bucket count.

I took a quick look at the 10053 trace in 12.2, with and without the change to optimizer_features_enable. The key difference was in the single table access path analysis – which didn’t give me any further clues.

With optimizer_features_enable = 12.1.0.2
=========================================
Access path analysis for STATUSES
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for STATUSES[STA]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE

 kkecdn: Single Table Predicate:"STA"."STATUS_CODE"='K'
  Estimated selectivity: 5.5000e-04 , endpoint value predicate, col: #2

Access path analysis for STATUSES
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for STATUSES[STA]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE

 kkecdn: Single Table Predicate:"STA"."STATUS_CODE"='D'
  Estimated selectivity: 0.134150 , endpoint value predicate, col: #2


With optimizer_features_enable defaulting to 12.2.0.1
=====================================================
Access path analysis for STATUSES
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for STATUSES[STA]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE

 kkecdn: Single Table Predicate:"STA"."STATUS_CODE"='K'
  Estimated selectivity: 5.5000e-04 , endpoint value predicate, col: #2


Access path analysis for STATUSES
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for STATUSES[STA]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE

 kkecdn: Single Table Predicate:"STA"."STATUS_CODE"='D'
  Estimated selectivity: 0.076923 , endpoint value predicate, col: #2


Bottom line on this – there’s at least one person who already uses this method to work around the optimizer limitation, they need to be careful when they upgrade to 12.2 (or above) as the method no longer works in all cases.

 

 

Column Group Catalog

Thu, 2018-09-27 11:16

I seem to have written a number of aricles about column groups – the rather special, and most useful, variant on extended stats. To make it as easy as possible to find the right article I’ve decided to produce a little catalogue (catalog) of all the relevant articles, with a little note about the topic each article covers. Some of the articles will link to others in the list, and there are a few items in the list from other blogs. There are also a few items which are the titles of drafts which have been hanging around for the last few years.

 

Column Stats

Wed, 2018-09-12 07:46

A little while ago I added a post-script to a note I’d written five years ago about gathering stats on a virtual column and had updated with a reference to a problem on the Oracle database forum that complained that stats collection had taken much longer after the addition of a function-based index. The problem related to the fact that the function-based index was supported by a virtual column that used an instr() function on a CLOB (XML) column – and gathering stats on the virtual column meant applying the function to every CLOB in the table.

So – my post-script, add about a month ago, suggested adding a preference (dbms_stats.set_table_preference) to avoid gathering stats on that column. There’s a problem with this suggestion – it doesn’t work because Oracle doesn’t play nicely when you try to limit the stats collection to a few columns – even in version 18.3. Here’s a demonstration of the effect; first we create a table that includes a column group (extended stats), a virtual column, and a function-based index – i.e. the three different ways of generating user-related virtual columns.


rem
rem     Script:         stats_struggle_06.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2018
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             v1,
        lpad(rownum,10,'0')             v2
from
        generator       v1
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

execute dbms_stats.delete_table_stats(user,'t1')

begin
        dbms_output.put_line(
                dbms_stats.create_extended_stats(
                        ownname         => user,
                        tabname         => 'T1',
                        extension       => '(v1, v2)'
                )
        );
end;
/

alter table t1 add id_12 
        generated always as (mod(id,12)) virtual
;

create index t1_id on t1(mod(id,10));


Since I’ve run this on 12c and 18c I’ve included a call to delete table stats after creating the table. So the next step is to enable SQL trace and see what Oracle does under the covers when we gather stats on just a couple of columns in the table:


alter session set events '10046 trace name context forever';

begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 't1',
                method_opt  => 'for columns size 1 id v1',
                cascade     => false
        );
end;
/

alter session set events '10046 trace name context off';

column column_name  format a32
column data_default format a32

select 
        column_name, data_default,
        num_nulls, num_distinct, to_char(last_analyzed,'hh24:mi:ss') gathered
from    user_tab_cols 
where   table_name = 'T1' 
order by 
        internal_column_id
;

COLUMN_NAME                      DATA_DEFAULT                      NUM_NULLS NUM_DISTINCT GATHERED
-------------------------------- -------------------------------- ---------- ------------ --------
ID                                                                         0        10000 16:13:12
V1                                                                         0        10000 16:13:12
V2
SYS_STUIBQVZ_50PU9_NIQ6_G6_2Y7   SYS_OP_COMBINED_HASH("V1","V2")
ID_12                            MOD("ID",12)
SYS_NC00006$                     MOD("ID",10)

According to the output of the last query we’ve gathered stats only on the two columns specified. But have we really avoided the work ? Here, with some cosmetic tidying, is the SQL executed by the package:

select 
        /*+
                full(t) no_parallel(t) no_parallel_index(t) dbms_stats
                cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
                xmlindex_sel_idx_tbl no_substrb_pad 
         */
        to_char(count(ID)),
        substrb(dump(min(ID),16,0,64),1,240),
        substrb(dump(max(ID),16,0,64),1,240),
        to_char(count(V1)),
        substrb(dump(min(V1),16,0,64),1,240),
        substrb(dump(max(V1),16,0,64),1,240),
        to_char(count(V2)),
        to_char(count(SYS_STUIBQVZ_50PU9_NIQ6_G6_2Y7)),
        to_char(count(ID_12)),
        to_char(count(SYS_NC00006$))
from
        TEST_USER.T1 t  /* NDV,NIL,NIL,NDV,NIL,NIL,ACL,ACL,ACL,ACL*/

We can see that Oracle has done a count(), min() and max() on ID and V1, and the “comment” at the end of the text tells us that it’s applied the approximate_ndv mechanism to the first two columns queried but not the rest. However it has count()ed all the other columns – which means it’s evaluated their underlying expressions. So if you were hoping that limiting the columns gathered would avoid a really expensive function call, bad luck.

Threat / Bug alert

One odd little detail that came up when I ran a test case that used a deterministic function to generate a virtual column: in 12.1.0.2 the function was called once per row (possibly because every row had a different value) whether or not it was in the list of columns for gathering stats; in 18.3 the function was called nearly twice per row when I didn’t specificy stats gathering for the column and nearly 4 times per row when I did. This looks like it might be a change (possibly accidental) to how deterministic functions can cache their inputs and outputs – possibly something as “minor” as the size of the cache.

 

Stats time

Mon, 2018-09-10 07:37

I wrote a note a couple of years ago explaining how I used to get a rough idea (with some errors) of how much time was spent in the overnight stats collection by each object. One of the nice little enhancements that appeared in 12c was the appearance of a couple of functions that can report information about this type of thing, and more. These are the dbms_stats function report_stats_operations() and report_single_stats_operation() with the following definitions:


function report_stats_operations(
        detail_level  varchar2                  default 'TYPICAL',
        format        varchar2                  default 'TEXT', 
        latestN       number                    default null,
        since         timestamp with time zone  default null,
        until         timestamp with time zone  default null,
        auto_only     boolean                   default false,
        container_ids dbms_utility.number_array default dbms_stats.NULL_NUMTAB
) return clob;

function report_single_stats_operation(
        opid         number,
        detail_level varchar2 default 'TYPICAL', 
        format       varchar2 default 'TEXT', 
        container_id number   default null
) return clob;

As you can see, there are lots of options to generating the report of stats operations, and you can check the manuals or $ORACLE_HOME/rdbms/admin/dbmsstat.sql for information about how you can use it. One of the simplest options would be to run from SQL*Plus:

set long 1000000

set pagesize    0
set linesize  255
set trimspool on

column text_line format a254

select
        dbms_stats.report_stats_operations(
                since => sysdate - 3
        ) text_line
from dual
;

Of course you wouldn’t be able to pick the option that limited the report to just the auto gather stats jobs (auto_only => true) as SQL doesn’t a boolean type, so you would have to write a little PL/SQL wrapper to capture just those details. Here’s a sample of the (rather wide) output:


select
        dbms_stats.report_single_stats_operation(25809) text_line
from dual
;

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Operation Id | Operation             | Target                             | Start Time          | End Time            | Status      | Total Tasks | Successful Tasks | Failed Tasks | Active Tasks |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 25811        | purge_stats           |                                    | 08-SEP-18           | 08-SEP-18           | COMPLETED   | 0           | 0                | 0            | 0            |
|              |                       |                                    | 01.47.37.764146 PM  | 01.47.38.405437 PM  |             |             |                  |              |              |
|              |                       |                                    | +01:00              | +01:00              |             |             |                  |              |              |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 25810        | purge_stats           |                                    | 08-SEP-18           | 08-SEP-18           | COMPLETED   | 0           | 0                | 0            | 0            |
|              |                       |                                    | 01.47.35.827284 PM  | 01.47.37.763926 PM  |             |             |                  |              |              |
|              |                       |                                    | +01:00              | +01:00              |             |             |                  |              |              |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 25809        | gather_database_stats | AUTO                               | 08-SEP-18           | 08-SEP-18           | COMPLETED   | 285         | 282              | 3            | 0            |
|              | (auto)                |                                    | 01.46.31.672033 PM  | 01.47.35.826873 PM  |             |             |                  |              |              |
|              |                       |                                    | +01:00              | +01:00              |             |             |                  |              |              |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 25807        | gather_table_stats    | TEST_USER.T1                       | 08-SEP-18           | 08-SEP-18           | COMPLETED   | 1           | 1                | 0            | 0            |
|              |                       |                                    | 12.59.57.704111 PM  | 12.59.57.822695 PM  |             |             |                  |              |              |
|              |                       |                                    | +01:00              | +01:00              |             |             |                  |              |              |

etc.

You’ll notice in this little sample that operation 25809 is an (auto) gather_database_stats operation which ran 285 tasks, failing on 3 and succeeding on 282 – so lets run the “single stats operation” report to find out more.


-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Operation Id | Operation                    | Target | Start Time                      | End Time                        | Status    | Total Tasks | Successful Tasks | Failed Tasks | Active Tasks |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 25809        | gather_database_stats (auto) | AUTO   | 08-SEP-18 01.46.31.672033 PM    | 08-SEP-18 01.47.35.826873 PM    | COMPLETED | 285         | 282              | 3            | 0            |
|              |                              |        | +01:00                          | +01:00                          |           |             |                  |              |              |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                                                                                                                                                                                                     |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|                                                                                              T A S K S                                                                                              |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | Target                                                         | Type            | Start Time                          | End Time                            | Status                     |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | SYS.RECYCLEBIN$                                                | TABLE           | 08-SEP-18 01.46.50.719791 PM +01:00 | 08-SEP-18 01.46.51.882418 PM +01:00 | COMPLETED                  |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | SYS.RECYCLEBIN$_OBJ                                            | INDEX           | 08-SEP-18 01.46.51.273134 PM +01:00 | 08-SEP-18 01.46.51.773297 PM +01:00 | COMPLETED                  |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | SYS.RECYCLEBIN$_TS                                             | INDEX           | 08-SEP-18 01.46.51.777032 PM +01:00 | 08-SEP-18 01.46.51.787730 PM +01:00 | COMPLETED                  |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
...
...
...
|    | SYS.WRH$_SEG_STAT_PK.WRH$_SEG_ST_3089296639_5150               | INDEX PARTITION | 08-SEP-18 01.47.35.409615 PM +01:00 | 08-SEP-18 01.47.35.483637 PM +01:00 | COMPLETED                  |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | SYS.X$LOGMNR_CONTENTS                                          | TABLE           | 08-SEP-18 01.47.35.520504 PM +01:00 | 08-SEP-18 01.47.35.696953 PM +01:00 | FAILED                     |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | SYS.X$LOGMNR_REGION                                            | TABLE           | 08-SEP-18 01.47.35.699253 PM +01:00 | 08-SEP-18 01.47.35.722545 PM +01:00 | FAILED                     |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | SYS.X$DRC                                                      | TABLE           | 08-SEP-18 01.47.35.725003 PM +01:00 | 08-SEP-18 01.47.35.801384 PM +01:00 | FAILED                     |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|                                                                                                                                                                                                     |
|                                                                                                                                                                                                     |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I’ve trimmed out most of the 285 entries, of course, showing that the last three in the list failed; but with no indication why they failed. Fortunately we could have called the report with “detail_level => ‘ALL'” – so let’s see what that gives us:

select
        dbms_stats.report_single_stats_operation(
                opid         => 25809,
                detail_level => 'ALL'
        ) text_line
from dual
;

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Operation | Operation             | Target | Start Time      | End Time        | Status    | Total    | Successful | Failed   | Active   | Job Name | Session  | Additional Info             |
| Id        |                       |        |                 |                 |           | Tasks    | Tasks      | Tasks    | Tasks    |          | Id       |                             |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 25809     | gather_database_stats | AUTO   | 08-SEP-18       | 08-SEP-18       | COMPLETED | 285      | 282        | 3        | 0        |          | 250      | Parameters: [block_sample:  |
|           | (auto)                |        | 01.46.31.672033 | 01.47.35.826873 |           |          |            |          |          |          |          | FALSE] [cascade: NULL]      |
|           |                       |        | PM +01:00       | PM +01:00       |           |          |            |          |          |          |          | [concurrent: FALSE]         |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [degree:                    |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | DEFAULT_DEGREE_VALUE]       |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [estimate_percent:          |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | DEFAULT_ESTIMATE_PERCENT]   |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [granularity:               |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | DEFAULT_GRANULARITY]        |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [method_opt:                |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | DEFAULT_METHOD_OPT]         |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [no_invalidate:             |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | DBMS_STATS.AUTO_INVALIDATE] |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [reporting_mode: FALSE]     |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [stattype: DATA]            |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                                                                                                                                                                                                     |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|                                                                                              T A S K S                                                                                              |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|       --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------        |
|       | Target       | Type         | Start Time   | End Time     | Status    | Rank  | Job Name | Estimated    | Batching     | Histogram    | Extended     | Reason Code  | Additional   |        |
|       |              |              |              |              |           |       |          | Cost         | Info         | Columns      | Stats        |              | Info         |        |
|       --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------        |
|       | SYS.RECYCLEB | TABLE        | 08-SEP-18 01 | 08-SEP-18 01 | COMPLETED | 1     |          | N/A          | N/A          |              |              | stale stats  |              |        |
|       | IN$          |              | .46.50.71979 | .46.51.88241 |           |       |          |              |              |              |              |              |              |        |
|       |              |              | 1 PM +01:00  | 8 PM +01:00  |           |       |          |              |              |              |              |              |              |        |
|       --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------        |
...
...
...
|       --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------        |
|       | SYS.X$DRC    | TABLE        | 08-SEP-18 01 | 08-SEP-18 01 | FAILED    | 151   |          | N/A          | N/A          |              |              | no stats     | ORA-20000:   |        |
|       |              |              | .47.35.72500 | .47.35.80138 |           |       |          |              |              |              |              |              | Unable to    |        |
|       |              |              | 3 PM +01:00  | 4 PM +01:00  |           |       |          |              |              |              |              |              | analyze      |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | TABLE "SYS". |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | "X$DRC", log |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | miner or     |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | data guard   |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | must be      |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | started      |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | before       |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | analyzing    |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | this fixed   |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | table"       |        |
|       --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------        |
|                                                                                                                                                                                                     |
|                                                                                                                                                                                                     |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------




So we can now see that stats collection failed on the one object I’ve left in the extract because it’s an X$ object that only exists when LogMiner is running. You’ll notice that the we also get some information about things like input parameters to calls and reasons why objects were selected (“stale stats” in the first item in this list).

It’s a great convenience – but it’s always possible to grumble: I’d rather like to see the elapsed time for each operation, or even a filter to limit the report to any operation that took more than X seconds. However, if I want to do a quick check on a client site I’d rather not have to type in the code to query the base tables by hand.

Subquery Order

Wed, 2018-09-05 07:09

From time to time I’ve wanted to optimize a query by forcing Oracle to execute existence (or non-existence) subqueries in the correct order because I know which subquery will eliminate most data most efficiently, and it’s always a good idea to look for ways to eliminate early. I’ve only just discovered (which doing some tests on 18c) that Oracle 12.2.0.1 introduced the /*+ order_subq() */ hint that seems to be engineered to do exactly that.

Here’s a very simple (and completely artificial) demonstration of use.


rem
rem     Script:         122_order_subq.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2018
rem

create table t1 as select * from all_objects;
create table t2 as select * from all_objects;
create table t3 as select * from all_objects;

create index t2_i1 on t2(object_id);
create index t3_i1 on t3(object_id);

prompt  =============================
prompt  order_subq(@main subq2 subq3)
prompt  =============================

explain plan for
select
        /*+
                qb_name(main)
                no_unnest(@subq2)
                no_unnest(@subq3)
                order_subq(@main subq2 subq3)
        */
        t1.object_name, t1.object_type
from
        t1
where
        exists (
                select
                        /*+ qb_name(subq2) */
                        null
                from    t2
                where   t2.object_id = t1.object_id * 5
        )
and     exists (
                select
                        /*+ qb_name(subq3) */
                        null
                from    t3
                where   t3.object_id = t1.object_id * 13
        )
;

select * from table(dbms_xplan.display(null,null,'outline'));

=============================
order_subq(@main subq2 subq3)
=============================

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2585036931

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |    53 | 51090   (1)| 00:00:02 |
|*  1 |  FILTER            |       |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1    | 61765 |  3196K|   163   (4)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | T2_I1 |     1 |     5 |     1   (0)| 00:00:01 |
|*  4 |   INDEX RANGE SCAN | T3_I1 |     1 |     5 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
...
      ORDER_SUBQ(@"MAIN" "SUBQ2" "SUBQ3")
...
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ NO_UNNEST QB_NAME ("SUBQ2") */ 0 FROM
              "T2" "T2" WHERE "T2"."OBJECT_ID"=:B1*5) AND  EXISTS (SELECT /*+
              NO_UNNEST QB_NAME ("SUBQ3") */ 0 FROM "T3" "T3" WHERE
              "T3"."OBJECT_ID"=:B2*13))
   3 - access("T2"."OBJECT_ID"=:B1*5)
   4 - access("T3"."OBJECT_ID"=:B1*13)

I’ve blocked subquery unnesting for the purposes of the demo and given a query block name to the two subqueries (using a name that identifies the associated table). As you can see, the execution plan uses the subqueries as filter subqueries, operating them in the order I’ve specified in my hint. You can also see that the hint is echoed down into the Outline section of the plan.

It’s possible that this is the plan that the optimizer would have chosen without the order_subq hint, so I ought to see if I can also use the hint to make the subqueries filter in the oppostie order. I happen to know that executing the subquery against t3 is likely to eliminate more rows that executing the subquery against t2. (The “* 13” compared to the “* 5” is significant) so I really want the subqueries to be used in the opposite order anyway – so here’s what happens when I reverse the order in the hint:


prompt  =============================
prompt  order_subq(@main subq3 subq2)
prompt  =============================

explain plan for
select
        /*+
                qb_name(main)
                no_unnest(@subq2)
                no_unnest(@subq3)
                order_subq(@main subq3 subq2)
        */
        t1.object_name, t1.object_type
from
        t1
where
        exists (
                select
                        /*+ qb_name(subq2) */
                        null
                from    t2
                where   t2.object_id = t1.object_id * 5
        )
and     exists (
                select
                        /*+ qb_name(subq3) */
                        null
                from    t3
                where   t3.object_id = t1.object_id * 13
        )
;

select * from table(dbms_xplan.display(null,null,'outline'));

=============================
order_subq(@main subq2 subq3)
=============================

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3585049451

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |    53 | 51090   (1)| 00:00:02 |
|*  1 |  FILTER            |       |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1    | 61765 |  3196K|   163   (4)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | T3_I1 |     1 |     5 |     1   (0)| 00:00:01 |
|*  4 |   INDEX RANGE SCAN | T2_I1 |     1 |     5 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
...
      ORDER_SUBQ(@"MAIN" "SUBQ3" "SUBQ2")
...
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ NO_UNNEST QB_NAME ("SUBQ3") */ 0 FROM
              "T3" "T3" WHERE "T3"."OBJECT_ID"=:B1*13) AND  EXISTS (SELECT /*+
              NO_UNNEST QB_NAME ("SUBQ2") */ 0 FROM "T2" "T2" WHERE
              "T2"."OBJECT_ID"=:B2*5))
   3 - access("T3"."OBJECT_ID"=:B1*13)
   4 - access("T2"."OBJECT_ID"=:B1*5)

With the modified hint in place the order of the filter subqueries is reversed. Notice how the Predicate section also echoes the ordering of the subqueries.

Footnote

It should be noted that the order_subq() hint doesn’t get mentioned in the 18c SQL Language Reference “Alphabetical List of Hints”. If it were then one of the little oddities that might get a mention is that the optimizer seems to ignore the hint if you disable CPU costing. (not that anyone should be doing that since 10g).

RTFM ?

Wed, 2018-08-29 05:04

My entrance at the Polish Oracle User Group conference 2017 has just resurfaced on Twitter. There is a back-story to this which contains an allegorical lesson in using Oracle. As I said in the opening remarks in the subsequent presentation: “The lesson began before I got to the stage”.

Like all the other speakers at POUG2017 I had received an email asking me to name a tune for a playlist. Having decided that Beethoven, Brahms and Tchaikowski were not the sort of composers the organisers had in mind I nominated (with a touch of irony) “Who wants to live forever?” by Queen (despite Richard Foote’s strenuous efforts to turn the Oracle world to David Bowie).

When the conference started I noticed two things: first, that the tunes for the “playlist” were actually being used to accompany speakers to the stage, secondly that the admin staff were all wearing monk-like gowns and hoods. So I asked for my tune to be changed to “The Imperial March” (Darth Vader’s theme) and borrowed the gown from the tallest admin person, with the results you see in the video clip.

So what’s the Oracle allegory ?

First you read the manuals, then you observe how it really works before you go live.

 

Error Logging

Fri, 2018-08-24 05:19

Error logging is a topic that I’ve mentioned a couple of times in the past, most recently as a follow-up in a discussion of the choices for copying a large volume of data from one table to another, but originally in an addendum about a little surprise you may get when you use extended strings (max_string_size = EXTENDED).

If you use the default call to dbms_errlog.create_error_log() to create an error logging table then Oracle will create a table with a few columns of its own plus every column (name) that you have in your original table – but it will create your columns as varchar2(4000), or nvarchar2(2000), or raw(2000) – unless you’ve set the max_string_size to extended.  Here’s a simple  demo script with results from two different systems, one with the default setting the other with the extended setting (note, there’s a little inconsistency in handling raw() columns.


rem
rem     Script:         log_errors_min.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem     Purpose:
rem

create table t1 (
        v1      varchar2(10),
        n1      number(2,0),
        d1      date,
        nv1     nvarchar2(10),
        r1      raw(10)
);


execute dbms_errlog.create_error_log('t1')

desc err$_t1


max_string_size = STANDARD
--------------------------
 Name			       Null?	Type
 ----------------------------- -------- --------------------
 ORA_ERR_NUMBER$			NUMBER
 ORA_ERR_MESG$				VARCHAR2(2000)
 ORA_ERR_ROWID$ 			ROWID
 ORA_ERR_OPTYP$ 			VARCHAR2(2)
 ORA_ERR_TAG$				VARCHAR2(2000)
 V1					VARCHAR2(4000)
 N1					VARCHAR2(4000)
 D1					VARCHAR2(4000)
 NV1					NVARCHAR2(2000)
 R1					RAW(2000)


max_string_size = EXTENDED
--------------------------
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ORA_ERR_NUMBER$                        NUMBER
 ORA_ERR_MESG$                          VARCHAR2(2000)
 ORA_ERR_ROWID$                         ROWID
 ORA_ERR_OPTYP$                         VARCHAR2(2)
 ORA_ERR_TAG$                           VARCHAR2(2000)
 V1                                     VARCHAR2(32767)
 N1                                     VARCHAR2(32767)
 D1                                     VARCHAR2(32767)
 NV1                                    NVARCHAR2(16383)
 R1                                     RAW(32767)

Every single “original” column that appears in this table will be a LOB, with an inline LOB locator of 30 or more bytes. (At least, that’s the 12.1.0.2 implementation, I haven’t checked for 12.2 or 18.3).

If this is going to be a problem (e.g. you have a table defined with 500 columns but only use 120 of them) you can create a minimalist error logging table. Provided you create it with the ora_err% columns suitably defined you can add only those columns you’re really interested in (or feel threatened by), and you don’t have to declare them at extreme lengths. e.g.


create table err$_special (
        ora_err_number$         number,
        ora_err_mesg$           varchar2(2000),
        ora_err_rowid$          rowid,
        ora_err_optyp$          varchar2(2),
        ora_err_tag$            varchar2(2000),
        n1                      varchar2(128)
)
;

insert into t1 values(1,'abc','02-jan-1984',sys_op_c2c('abc'),hextoraw('0xFF')) 
log errors into err$_special
reject limit unlimited
;

execute print_table('select * from err$_special')


ORA_ERR_NUMBER$               : 1722
ORA_ERR_MESG$                 : ORA-01722: invalid number

ORA_ERR_ROWID$                :
ORA_ERR_OPTYP$                : I
ORA_ERR_TAG$                  :
N1                            : abc


If you try to create an error logging table that doesn’t include the 5 critical columns you’ll see Oracle error ORA-38900: missing mandatory column “ORA_ERR_{something}” of error log table “{your logging table name}” when you try to log errors into it, and the 5 critical columns have to be the first 5 columns (in any order) in the table or you’ll get Oracle error ORA-38901: column “ORA_ERR_{something}$” of table “{your logging table name}” when you try to log errors into it.

Descending bug

Wed, 2018-08-22 07:20

Following on from Monday’s posting about reading execution plans and related information, I noticed a question on the ODC database forum asking about the difference between “in ({list of values})” and a list of “column = {constant}” predicates connected by OR. The answer to the question is that there’s essentially no difference as you would be able to see from the predicate section of an execution plan:


SELECT  c1, c2, c3, c4, c5, c6, c7, c8..  
FROM    TAB1  
WHERE   STS IN ( 'A', 'B')  
AND     cnt < '4'  
AND     dt < sysdate  
and     rownum <=1;  
  
---------------------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |                         |     1 |  1847 | 33399   (1)| 00:03:14 |  
|*  1 |  COUNT STOPKEY                |                         |       |       |            |          |  
|   2 |   INLIST ITERATOR             |                         |       |       |            |          |  
|*  3 |    TABLE ACCESS BY INDEX ROWID| TAB1                    |   114K|   201M| 33399   (1)| 00:03:14 |  
|*  4 |     INDEX RANGE SCAN          | TAB1_STS_IDX            |   114K|       |   723   (1)| 00:00:05 |  
---------------------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(ROWNUM<=1)  
   3 - filter("cnt"<'4' AND "dt"<SYSDATE@!)  
   4 - access("STS"='A' OR "STS"='B')  

Note how the predicate section tells you that the original “sts in ( ‘A’, ‘B’ )” has been transformed into “sts = ‘A’ or sts = ‘B'”.

A further point I made about IN-lists in Monday’s post was that as one step in the transformation Oracle would sort the list and eliminate duplicates, and it suddenly occurred to me to wonder whether Oracle would sort the list in descending order if the only relevant index were defined to start with a descending column. Naturally I had to try it so here’s a suitable script to prepare some data:

rem
rem     Script:         descending_bug_06.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0        Crashes
rem             12.2.0.1        Crashes
rem             12.1.0.2        Crashes
rem             11.2.0.4        Bad Plan
rem

create table t1
nologging
pctfree 95 pctused 5
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             v1,
        case mod(rownum,1000)
                when 0 then 'A'
                when 3 then 'B'
                when 6 then 'C'
                       else 'D'
        end                             sts,
        case mod(rownum,1000)
                when 0 then '1'
                when 3 then '2'
                when 6 then '3'
                       else '4'
        end                             cnt,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

create index t1_i1a on t1(sts) nologging;
create index t1_i1d on t1(sts desc) nologging;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size skewonly'
        );
end;
/

There is one oddity in this script – if you’ve got every column in an index declared as DESC you’ve made a mistake and none of the columns should be declared as DESC. The feature is relevant only if you want a mixture of ascending and descending column in a single index.

An important detail of the script is that I’ve gathered stats AFTER creating the objects – it’s important to do this, even in 18.3, because (a) creating the “descending” index will result in a hidden virtual column being created to represent the descending column and I want make sure I have stats on that column and (b) the “stats on creation” code doesn’t generate histograms and I want a (frequency) histogram on columns sts and the hidden, virtual, descending version of the column.

After generating the data and checking that I have the correct histograms for sts and sys_nc00006$ (the relevant hidden column) I can then run the following test:

set serveroutput off
alter session set statistics_level = all;

alter index t1_i1d invisible;

select  sts, count(*)
from    t1
where   sts in ('B','C')
group by
        sts
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter index t1_i1d   visible;
alter index t1_i1a invisible;

select  sts, count(*)
from    t1
where   sts in ('B','C')
group by
        sts
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
set serveroutput on


The code makes one index invisible then runs a query that should use an inlist iterator; then it switches indexes making the invisible one visible and vice versa and repeats the query. I’ve enabled rowsource execution statistics and pulled the execution plans from memory to make sure I don’t get fooled by any odd glitches that might exist within “explain plan”. Here are the results from 11.2.0.4 – normal index, then descending index – with a little cosmetic cleaning:


S   COUNT(*)
- ----------
B        100
C        100

SQL_ID  f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |      1 |        |      2 |00:00:00.01 |       5 |
|   1 |  SORT GROUP BY NOSORT|        |      1 |      2 |      2 |00:00:00.01 |       5 |
|   2 |   INLIST ITERATOR    |        |      1 |        |    200 |00:00:00.01 |       5 |
|*  3 |    INDEX RANGE SCAN  | T1_I1A |      2 |    178 |    200 |00:00:00.01 |       5 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("STS"='B' OR "STS"='C'))


Index altered.
Index altered.


S   COUNT(*)
- ----------
C        100
B        100


SQL_ID  f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |      1 |        |      2 |00:00:00.02 |     198 |
|   1 |  SORT GROUP BY NOSORT|        |      1 |      2 |      2 |00:00:00.02 |     198 |
|*  2 |   INDEX FULL SCAN    | T1_I1D |      1 |   1000 |    200 |00:00:00.02 |     198 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='B' OR
              SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='C'))


As expected we see counts of 100 for ‘B’s and ‘C’s, and we also see that the “sort group by nosort” operation with the descending index has produced the results in reverse order. The problem though is that the optimizer has decided to use an “index full scan” on the descending index, and the estimate of the rows returned is terribly wrong (and seems to be the common “5% guess”, used once for each target value), and the number of buffer visits is huge compared to the result from the normal index – Oracle really did walk every leaf block in the index to get this result. The predicate section also looks rather silly – why hasn’t the optimizer produced predicates more like: “sys_nc00006$ = sys_op_descend(‘B’)” ?

In passing you’ll notice that the estimated rows in the plan using the normal index is a little low. This is the result of Oracle using a small sample (ca. 5,500 rows) in 11g to gather histogram stats. 12c will do better for a frequency histogram with the fast algorithm it uses for a 100% (auto) sample size.

So 11g doesn’t do very well but we’ve got 12.1.0.2, 12.2.0.1, and (in the last couple of weeks) 18.3 to play with. Here’s the result from 12.1.0.2 and 12.2.0.1 for the query that should use the descending index:


select  sts, count(*)
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [qernsRowP], [1], [], [], [], [], [], [], [], [], [], []


SQL_ID  f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |      1 |        |      0 |00:00:00.01 |       0 |
|   1 |  SORT GROUP BY NOSORT|        |      1 |      2 |      0 |00:00:00.01 |       0 |
|   2 |   INLIST ITERATOR    |        |      1 |        |    101 |00:00:00.03 |       5 |
|*  3 |    INDEX RANGE SCAN  | T1_I1D |      2 |    200 |    101 |00:00:00.01 |       5 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("T1"."SYS_NC00006$"=SYS_OP_DESCEND('B') OR
              "T1"."SYS_NC00006$"=SYS_OP_DESCEND('C')))
       filter((SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='B' OR
              SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='C'))

The query crashed! The plan, however, did look appropriate – the optimizer picked an inlist iterator, picked an index range scan, got the correct estimate of rows (index entries), and did better with the predicate section (though having used a sensible predicate for the access predciate it then used the bizarre version as the filter predicate). Judging from the A-rows column the query seems to have crashed at roughly the point where the optimizer was switching from the range scan for the first iteration into the range scan for the second iteration.

And then there’s Oracle 18.3 – which does the same as the 12c versions :(

To make sure that my silly “single column so it shouldn’t be declared descending” index was the sole cause of the problem I repeated the tests using a two-column index on (sts, cnt).

Conclusion:

Descending indexes or (to be more accurate) indexes with descending columns can still produce problems even in the very latest version of Oracle.

Footnote

Oracle MoS has the wonderful “ORA-600/ORA-7445/ORA-700 Error Look-up Tool (Doc ID 153788.1)” (which doesn’t yet allow you to choose 18.3 as a version) so I used this to do a look up for ORA-00600 errors with first paremeter qernsRowP in 12.2.0.1 and got the following suggestion from doc ID 285913.1: “set event:10119 to disable no-sort fetch and then reparse the failing SQL.” The example suggested setting the event to level 12, and this solved the problem for all three failing versions – but the suggestion came with a warning: “Setting this event at system level may impact the performance of database.” The execution plan (taken, in this case, from 18.2) may explain the warning:

 

S   COUNT(*)
- ----------
B	 100
C	 100

SQL_ID	f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	    |	   1 |	      |      2 |00:00:00.01 |	    4 |       |       | 	 |
|   1 |  HASH GROUP BY	   |	    |	   1 |	    2 |      2 |00:00:00.01 |	    4 |  1558K|  1558K|  659K (0)|
|   2 |   INLIST ITERATOR  |	    |	   1 |	      |    200 |00:00:00.01 |	    4 |       |       | 	 |
|*  3 |    INDEX RANGE SCAN| T1_I1D |	   2 |	  200 |    200 |00:00:00.01 |	    4 |       |       | 	 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("T1"."SYS_NC00006$"=SYS_OP_DESCEND('B') OR "T1"."SYS_NC00006$"=SYS_OP_DESCEND('C')))
       filter((SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='B' OR SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='C'))


The plan has changed from using a “sort group by nosort” – which effectively means just keeping a running count as you go – to a real “hash group by” which means you have to do the hashing arithmetic for every value (though maybe there’s a deterministic trick that means Oracle won’t do the arithmetic if the next value to be hashed is the same as the previous value) and the actual memory used (659K) does seem a little extreme for counting two distinct values.

 

Masterclass – 1

Mon, 2018-08-20 04:42

A recent thread on the Oracle developer community database forum raised a fairly typical question with a little twist. The basic question is “why is this (very simple) query slow on one system when it’s much faster on another?” The little twist was that the original posting told use that “Streams Replication” was in place to replicate the data between the two systems.

To make life easy for remote trouble-shooters the poster had supplied (for each system) the output from SQL Monitor when running the query, the autotrace output (which shows the predicate section that SQL Monitor doesn’t report), and the session statistics for the query run, plus some statistics about the single table in the query, the index used in the plan, and the column on which that index was based.

Here, with a little cosmetic editing (and a query that has clearly been camouflaged by the OP), is the information supplied for the faster database, where the query took about 30 seconds to complete.


SELECT c1, c2, c3, c4, c5, c6, c7, c8..  
FROM TAB1  
WHERE STS IN ( 'A', 'B')  
AND cnt < '4'  
AND dt < sysdate  
and rownum <=1;  
  
Sql_monitor and stats from DB1  
******************************  
  
Global Information  
------------------------------  
 STS              :  DONE (ALL ROWS)             
 Instance ID         :  1                           
 Execution Started   :  08/17/2018 08:31:22         
 First Refresh Time  :  08/17/2018 08:31:22         
 Last Refresh Time   :  08/17/2018 08:31:53         
 Duration            :  31s                         
 Program             :  sqlplus.exe                 
 Fetch Calls         :  1                           
  
Global Stats  
===============================================================================  
| Elapsed |   Cpu   |    IO    | Concurrency | Fetch | Buffer | Read  | Read  |  
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Calls |  Gets  | Reqs  | Bytes |  
===============================================================================  
|      33 |    3.00 |       30 |        0.08 |     1 |   102K | 38571 | 301MB |  
===============================================================================  
  
SQL Plan Monitoring Details (Plan Hash Value=715774357)  
======================================================================================================================================================================================  
| Id |            Operation            |          Name           |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  | Activity |       Activity Detail        |  
|    |                                 |                         | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes |   (%)    |         (# samples)          |  
======================================================================================================================================================================================  
|  0 | SELECT STATEMENT                |                         |         |       |         1 |    +31 |     1 |        1 |       |       |          |                              |  
|  1 |   COUNT STOPKEY                 |                         |         |       |         1 |    +31 |     1 |        1 |       |       |          |                              |  
|  2 |    INLIST ITERATOR              |                         |         |       |         1 |    +31 |     1 |        1 |       |       |          |                              |  
|  3 |     TABLE ACCESS BY INDEX ROWID | TAB1                    |    114K | 33399 |        32 |     +0 |     2 |        1 | 38377 | 300MB |    96.77 | Cpu (1)                      |  
|    |                                 |                         |         |       |           |        |       |          |       |       |          | db file sequential read (16) |  
|    |                                 |                         |         |       |           |        |       |          |       |       |          | read by other session (13)   |  
|  4 |      INDEX RANGE SCAN           | TAB1_STS_IDX            |    115K |   723 |        30 |     +2 |     2 |     118K |   194 |   2MB |     3.23 | read by other session (1)    |  
======================================================================================================================================================================================  
  
---------------------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |                         |     1 |  1847 | 33399   (1)| 00:03:14 |  
|*  1 |  COUNT STOPKEY                |                         |       |       |            |          |  
|   2 |   INLIST ITERATOR             |                         |       |       |            |          |  
|*  3 |    TABLE ACCESS BY INDEX ROWID| TAB1                    |   114K|   201M| 33399   (1)| 00:03:14 |  
|*  4 |     INDEX RANGE SCAN          | TAB1_STS_IDX            |   114K|       |   723   (1)| 00:00:05 |  
---------------------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(ROWNUM<=1)  
   3 - filter("cnt"<'4' AND "dt"<SYSDATE@!)  
   4 - access("STS"='A' OR "STS"='B')  
  
  
Table stats 
-----------
table_name    num_rows        blocks  empty_blocks  chain_cnt  avg_row_len             
TAB1        79,654,925    22,416,917             0          0        1,847          
  
column_stats(STS)
------------------
table_name  column_name  num_distinct  num_nulls    density  avg_col_len   
TAB1        STS                     5          0  6.2049E-9            2         
  
Index_stats(on STS)
-------------------
index_name    leaf_blocks  distinct_keys  avg_leaf_blocks_per_key  avg_data_blocks_per_key  clustering_factor    num_rows
TAB1_STS_IDX      487,939              5                   97,587                4,458,874         22,294,372  78,308,939   
 
Session stats
-------------
process last non-idle time              1,534,508,966
session connect time                    1,534,508,966
logical read bytes from cache             839,663,616
cell physical IO interconnect bytes       316,055,552
physical read bytes                       316,055,552
physical read total bytes                 316,055,552
file io wait time                          17,044,083
session pga memory                          8,643,880
session pga memory max                      8,643,880
temp space allocated (bytes)                4,194,304
session uga memory                          1,755,696
session uga memory max                      1,755,696
buffer is pinned count                        135,743
table fetch by rowid                          117,519
non-idle wait count                           107,301
session logical reads                         102,500
consistent gets                               102,450
consistent gets from cache                    102,448
no work - consistent read gets                102,368
buffer is not pinned count                    101,741
free buffer inspected                          43,458
free buffer requested                          38,592
physical read total IO requests                38,581
physical read IO requests                      38,581
physical reads                                 38,581
physical reads cache                           38,579
hot buffers moved to head of LRU               37,258
bytes sent via SQL*Net to client                7,370
bytes received via SQL*Net from client          6,869
redo size                                       5,536
undo change vector size                         4,432
DB time                                         3,166
non-idle wait time                              2,962
user I/O wait time                              2,954
table fetch continued row                       2,423


And here’s the equivalent information from the slower database where the query took more than 9 times as long (4 minutes 42 seconds) to complete.


Global Information  
------------------------------  
 STS              :  DONE (ALL ROWS)           
 Instance ID         :  1                         
 Execution Started   :  08/17/2018 08:21:47       
 First Refresh Time  :  08/17/2018 08:21:47       
 Last Refresh Time   :  08/17/2018 08:26:29       
 Duration            :  282s                      
 Module/Action       :  SQL*Plus/-                
 Program             :  sqlplus.exe               
 Fetch Calls         :  1                         
  
Global Stats  
================================================================  
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |  
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |  
================================================================  
|     287 |    8.76 |      278 |     1 |   110K | 110K | 858MB |  
================================================================  
  
SQL Plan Monitoring Details (Plan Hash Value=715774357)  
======================================================================================================================================================================================  
| Id |            Operation            |          Name           |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity |        Activity Detail        |  
|    |                                 |                         | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |          (# samples)          |  
======================================================================================================================================================================================  
|  0 | SELECT STATEMENT                |                         |         |       |         1 |   +282 |     1 |        1 |      |       |          |                               |  
|  1 |   COUNT STOPKEY                 |                         |         |       |         1 |   +282 |     1 |        1 |      |       |          |                               |  
|  2 |    INLIST ITERATOR              |                         |         |       |         1 |   +282 |     1 |        1 |      |       |          |                               |  
|  3 |     TABLE ACCESS BY INDEX ROWID | TAB1                    |    142K | 40211 |       282 |     +1 |     2 |        1 | 109K | 854MB |   100.00 | db file sequential read (277) |  
|  4 |      INDEX RANGE SCAN           | TAB1_STS_IDX            |    142K |   892 |       280 |     +3 |     2 |     118K |  491 |   4MB |          |                               |  
======================================================================================================================================================================================  
  
Execution Plan (autotrace) 
---------------------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |                         |     1 |  1847 | 40211   (1)| 00:08:03 |  
|*  1 |  COUNT STOPKEY                |                         |       |       |            |          |  
|   2 |   INLIST ITERATOR             |                         |       |       |            |          |  
|*  3 |    TABLE ACCESS BY INDEX ROWID| TAB1                    |   141K|   249M| 40211   (1)| 00:08:03 |  
|*  4 |     INDEX RANGE SCAN          | TAB1_STS_IDX            |   141K|       |   892   (1)| 00:00:11 |  
---------------------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(ROWNUM<=1)  
   3 - filter("cnt"<'4' AND "dt"<SYSDATE@!)  
   4 - access("STS"='A' OR "STS"='B')  
            
Table stats
-----------
table_name    num_rows        blocks  empty_blocks  chain_cnt  avg_row_len             
TAB1        79,447,350   22,318,667            710    537,597        1,847  
  
column_stats(STS)
-----------------
table_name  column_name  num_distinct  num_nulls    density  avg_col_len  
TAB1        STS                     5          0  6.1789E-9            2  
  
Index_stats(on STS)
-------------------
index_name    leaf_blocks  distinct_keys  avg_leaf_blocks_per_key  avg_data_blocks_per_key  clustering_factor    num_rows
TAB1_STS_IDX      493,152              5                   98,630                4,382,625         21,913,127  79,106,263   


Session Stats
-------------
process last non-idle time              1,534,508,200
session connect time                    1,534,508,200
logical read bytes from cache             903,790,592
physical read total bytes                 899,629,056
cell physical IO interconnect bytes       899,629,056
physical read bytes                       899,629,056
file io wait time                         277,881,742
session pga memory                          8,586,744
session pga memory max                      8,586,744
temp space allocated (bytes)                4,194,304
session uga memory max                      1,690,184
session uga memory                          1,690,184
buffer is pinned count                        129,148
table fetch by rowid                          117,521
session logical reads                         110,326
consistent gets                               110,276
consistent gets from cache                    110,276
no work - consistent read gets                110,071
non-idle wait count                           109,879
free buffer requested                         109,830
physical read IO requests                     109,818
physical reads cache                          109,818
physical reads                                109,818
physical read total IO requests               109,818
buffer is not pinned count                    109,577
free buffer inspected                          70,740
hot buffers moved to head of LRU               31,910
DB time                                        28,203
non-idle wait time                             27,788
user I/O wait time                             27,788
dirty buffers inspected                        19,067
bytes sent via SQL*Net to client               14,927
bytes received via SQL*Net from client         10,607
redo size                                       5,440
undo change vector size                         4,432
table fetch continued row                       3,660

There are all sorts of noteworthy details in these two sets of information – some of the “how to see what’s in front of you” type, some of the “be careful, Oracle can deceive you” type. So I’m going to walk though the output picking up a number of background thoughts before commenting on the answer to the basic question.

We’ll start with the object statistics, then we’ll look at the SQL Monitor plan to see if we can determine where the extra time was spent, then we’ll try to work out what else the plan might be telling us about the code and data, then we’ll summarise my observations to make a claim about the difference in behaviour.

Object statistics

The table has 79M rows with average length of 1,847 bytes, using 22M blocks. With an 8KB block size and that average row size we would expect to see about 3 rows per block, and that’s fairly consistent with the value of rows / blocks.  We don’t know what the sample size was for this stats collection, but it might have been a “small” sample size rather than the the 100% you would get from using auto_sample_size, so that might also explain some discrepancy between the two different views on the figures.

We note that the secondary system reports a chain_cnt in excess of 500,000 rows. The only (unhacked) way that this figure could be set would be through a call to analyze statistics, and once the figure is there it won’t go away unless you use the analyze command again to delete statistics.  We don’t know the history of how and when the figure got there so it doesn’t guarantee that there are any chained or migrated rows, nor does the zero in the table stats on the primary system guarantee that it doesn’t have any chained or migrated rows – all it tells us is that at some time someone used the wrong command to gather stats and there were some (less than 1%) migrated or chained rows in the table at the time. (The optimizer will use this figure in its arithmetic if it is set, by the way, so it may affect some of the cost calculations – but not by a huge amount.)

The column sts reports 5 distinct values, no nulls, and a density of 6.2e-9 which is roughly half of 1/79M: so we have a frequency histogram on the column (in the absence of a histogram the density would be 1/5, and it’s reasonable to assume that the number of buckets was either the default or set to something larger than 5).  We were told that the system was running 11.2.0.4 – so we have to be a little suspicious about the accuracy of this histogram since it will have been sampled with a very small sample if the stats collection had used auto_sample_size. (12c will use a specially optimized 100% sample for frequency and top-N histograms when using auto_sample_size)

The index on sts has a clustering_factor of around 22M which is similar to the number of blocks in the table – and that’s not too surprising if there are are only a very small number of distinct values in the column – especially when the presence of the histogram suggest that there’s a skew in the data distribution. (There’s more to come on that point.) The number of leaf blocks is about 500,000 (being lazy about arithmetic) – just as a side note this suggests the index is running in a fairly inefficient state (and probably hasn’t been created with the compress keyword).

Doing a rough estimate of the index arithmetic :  the avg_col_len for sts is 2, so the space required for each index entry will be 13 bytes (2 for the column, 7 for the rowid content, 2 for the row header, 2 for the row directory entry).  Take off the block overhead, and assume the index is running at a “typical” 70% space usage per leaf block and you might expect 5,600 bytes used per leaf block for current index data and that works out to about 430 index entries per leaf block.  With 79M rows in the table that should lead to 79M/430 leaf blocks – i.e. roughly 184,000 leaf blocks, not 493,000 leaf blocks.  However it’s not unusual to see an index with extremely repetitive values operating at something like 50% utilisation, which would bring our estimate to about 310 rows per leaf block and 255,000 leaf blocks – which is still off by a factor of nearly 2 compared to what we’ve actually got. Again, of course, we have to be a little bit cautious about these statistics – we don’t know the sample size, and Oracle uses a surprisingly small number of blocks to sample the stats for an index.

Where’s the time.

The SQL Monitor gives us a very clear report of where most of the time went – almost all of it was spent in I/O waits, and almost all of the wait time was in the “table access by index rowid” opration in both cases; but the primary system did 38,377 read requests while the secondary did 109,000 read requests in that line of the plan. It is significant, though, that quite a lot (40%) of the ASH samples for that operation on the primary system were for “read by other session” rather than “db file sequential read”:  in other words some other session(s) were doing a lot of work to pull the data we wanted into the buffer cache at the same time. Apart from the fact that a wait for “read by other session” often means we spend less time waiting than if we’d had to do the read ourselves, the presence of this wait suggests that other sessions may be “pre-caching” data for us so that we end up having to read far fewer blocks than would otherwise be the case.

It’s important to note at the same time that the difference in Buffer Gets for the two systems was small – 102K vs. 110K – and the “Rows (actual)” was the same in both cases – 118K entries returned by the index range scan.  Both systems did similar amounts of “logical” work, to process similar amounts of data; the difference was the fraction of the work that required a buffer get to turn into a disc read or a “wait for other read”.

We might want to pick up a few more numbers to corroborate the view that the only significant difference was in the volume of data cached and not some more esoteric reason.  Some of the session statistics should help.


DB1:  table fetch by rowid                          117,519
DB2:  table fetch by rowid                          117,521

DB1:  undo change vector size                         4,432
DB2:  undo change vector size                         4,432

DB1:  redo size                                       5,536
DB2:  redo size                                       5,440

DB1:  session logical reads                         102,500
DB2:  session logical reads                         110,326

DB1:  no work - consistent read gets                102,368
DB2:  no work - consistent read gets                110,071

DB1:  table fetch continued row                       2,423
DB2:  table fetch continued row                       3,660

The number of rows fetched by rowid is virtually identical and we have done (virtually) no work that generates undo or redo – such as delayed block cleanout; there are no statistics shown for “%undo record applied” so we probably haven’t done very much work to get a read consistent view of the data though we can’t be sure that the OP simply failed to copy that stat into list supplied (but then the similarity of “session logical reads” to “no work – consistent read gets” confirms the hypothesis that we didn’t do any (significant) work on visiting undo blocks.

We do see a few percent increase in the number of buffer gets (“session logical reads”) – but this may reflect the fact that the actual pattern of data in one table is slightly different from the pattern in the other – thanks to ASSM the process id of the process that inserts a row into a table can affect (within a small range, usually) the block into which the row is inserted; but at 102,000 / 110,000 buffer gets to visit 117,500 rows in the table we can see that there must be some table blocks that hold two (or more) rows that are identified as consecutive in the index – leading to some row visits being achieved through a buffer pin and without a fresh buffer get. You’ll note that this argument is consistent with the small variation in clustering_factor (not that we entirely trust those figures) for the two indexes – the system with the lower clustering_factor for the index has done fewer buffer gets to acquire the same number of rows from the table – by definition that means (assuming default setup) that there are more cases where “the next table row” is in the same block as the current row.

The final figure I’ve shown is the “table fetch continued rows”: according to the table stats (which we don’t necessarily trust completely) 500K out of 79M rows are chained/migrated which is roughly 0.6%. We know that we’re visiting about 117K table rows so might expect (on average) roughly the same percentage migrated/chained viz: 0.6% of 117K = 743, so there’s a little anomaly there (or an error in our assumption about “average” behaviour.  It’s worth noting, though, that a “continued fetch” would have to do an extra buffer visit (and maybe an extra physical read).  You might wonder, of course, how there could be any chained or migrated rows when the average row length is 1,847 bytes but in a follow-up post the OP did say there were 3 BLOB columns in the table, which can cause havoc with interpreting stats for all sorts of reasons. We don’t have any information about the table structure – particularly whether the columns in the query appear before or after the BLOB columns in the table definition – and we don’t know what processing takes place (for example, maybe the 3rd BLOB is only updated after the sts column has been changed to a value other than A or B which would help to explain why we shouldn’t be using the 0.6% calculation above as a table-wide average), so we’re not in a position to say why any of the continued fetches appear but there are several guesses we could make and they’re all easy to check.

Plan observations

If we examine row estimates we see that it 114K for the faster plan and 141K for the slower plan (with a closely corresponding variation in cost). The difference in estimates simply tells us that the histogram gathering was probably a small sample size and subject to a lot of variation. The scale of the estimates tells us that the A and B rows are probably rare – call it 125K out of 79M rows, about 0.16% of the total rows in the table, so it would not be surprising to see consecutive samples for the histogram producing significant variations in estimates.

The other interesting thing we can note in the SQL Monitor plan is that the Starts column for the index range scan / table access operations in both plans shows the value 2: this means that there are no “A” rows that match the other predicates:  Oracle has run the “A” iteration to completion then started the “B” iteration and found a row on the second iteration. Is this a coincidence, or should it always happen, or is it only fairly likely to happen; is it possible to find times when there are no suitable “B” rows but plenty of suitable “A” rows. The final predicate in the query is “rownum <= 1” – so the query is picking one row with no explicit strategy for choosing a specific row when there are multiple appropriate rows, does this mean that we could optimizer the query by rewriting it as a “union all” that searched for B rows first and A rows second ? We just don’t know enough about the processing.

In passing, we can’t get Oracle to search the B rows first by changing the order of the in-list.  If you have a predicate like “where sts in ({list of literal values})” the optimizer will sort the list to eliminate duplicates before rewriting the predicate as a list of disjuncts, and then (if the path uses an iterator) iterate through the list in the resulting order.

In the absence of information about the way in which the data is processed we can only say that we need to avoid visiting the table so frequently. To do this we will need to add one or both of the columns from the other predicates to the index – this might double the size of the index, but eliminate 95% of the potential I/O.  For example if we discover that A and B rows are initially created “into the future” and this query is looking for a row whose “time has come” so that it can be processed and changed to an X row (say) then there may only ever be a tiny number of rows where the “sts = A and the dt < sysdate” and an index on (sts, dt) would be a perfect solution (especially if it were compressed on at least the first column).

The OP has declared a reluctance to add an index to the table – but there are two points to go with this indexing strategy. Since we know there’s a frequency histogram and the A and B rows appear to be rare values – what benefit is there in having an index that covers the other values (unless 2 of the remaining 3 are also rare).  How about creating a function-based index that represents only the rare values and modifying this code to use that index – e.g.

create index t1_id on t1 (
        case sts when 'A' then sts when 'B' then sts end,
        case sts when 'A' then dt  when 'B' then dt  end
) compress 1
;

select  *
from    t1
where   case sts when 'A' then sts when 'B' then sts end in ('A','B')
and     case sts when 'A' then dt  when 'B' then dt  end < sysdate
and     cnt < '4'
and     rownum <= 1
/


You might be able to replace a huge index (79M rows worth) with this small one (120K rows worth) unless there’s too much other code in the system that has to be adjusted or the sts column is actually the target of a referential integrity constraint; at worst you could add this index knowing that it’s generally not going to consume much in the way of extra space or processing resources and is going to save you a lot of work for this query.

Summary

The execution plan from SQL Monitor points very strongly to the fast system benefiting from having a lot of the relevant data cached and constantly being reloaded into the cache by other sessions while the slow system has to acquire almost all of its data by real phyiscal reads. Most of the reads address the table so engineering an index that is low-cost and (fairly) high precision is the most significant strategy for reducing the workload and time on the slow system.

The fact that all the potential A rows fail to match the full predicate set suggests that there MAY be some aspect of the processing that means it would be more efficient to check for B rows before checking for A rows.

Given the massive skew in the data distribution a function-based index that hides all the non-popular values (or even all the values that are not of interest to this query) may be the most cost-effective way of adding a very effective index to the table with minimal resource requirements.

And finally

It’s taken me more than 4 hours to write this note after spending about 10 minutes reading through the information supplied by the OP and identifying and cross-checking details. A small fraction of the 4+ hours was spent creating a little model to check something I had to say about in-lists, the rest of it was trying to write up a coherent description covering all the details.

That’s it for today, but I may have missed a couple of points that I noticed as I read the OP’s posting; and I will want to do a little cosmetic work on this article and check grammar and spelling over the next couple of days.

<h3<Update (already)

Shortly after I posted this blog note the owner of the question reported the following as the distribution of values for the sts column:

 STS   COUNT(*)
---- ----------
   A          6
   E        126
   D        866
   C   80212368
   B     117631

Two things stand out about these figures – first it’s an ideal example of a case where it would be nice avoid having index entries for the 80 million ‘C’ rows. Depending on the coding and testing costs, the supportability of the application and the possible side effects this could be done with a function-based index, or by introducing a virtual column that hides the ‘C’s behing a NULL, or by changing the code to use NULL instead of ‘C’.

Secondly – I made a comment about rewriting the code to query the B’s before the A’s. But we saw that Oracle worked through about 117,000 rows before returning a result: so the fitures above tell us that it must have worked through almost all the B’s and the handful of A’s was just a tiny little blip before it got to the B iteration – so there’s no point in making that change.

My suggestion for the function-based index above could be modified in two ways, of course – add two more “when”s to each “case” to capture the D and E rows, or take the opposite viewpoint and create an index on expressions like: “case sts when ‘C’ then to_char(null) else sts end”. The benefit of the latter approach is that you don’t have to modify the index definition (and rebuild the index) if a new legal value for sts appears.

Extended Histograms – 2

Thu, 2018-08-02 08:13

Following on from the previous posting which raised the idea of faking a frequency histogram for a column group (extended stats), this is just a brief demonstration of how you can do this. It’s really only a minor variation of something I’ve published before, but it shows how you can use a query to generate a set of values for the histogram and it pulls in a detail about how Oracle generates and stores column group values.

We’ll start with the same table as we had before – two columns which hold only the combinations (‘Y’, ‘N’) or (‘N’, ‘Y’) in a very skewed way, with a requirement to ensure that the optimizer provides an estimate of 1 if a user queries for (‘N’,’N’) … and I’m going to go the extra mile and create a histogram that does the same when the query is for the final possible combination of (‘Y’,’Y’).

Here’s the starting code that generates the data, and creates histograms on all the columns (I’ve run this against 12.1.0.2 and 12.2.0.1 so far):


rem
rem     Script:         histogram_hack_2a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2018
rem
rem     Last tested 
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

create table t1
as
select 'Y' c2, 'N' c3 from all_objects where rownum <= 71482 -- > comment to deal with wordpress format issue.
union all
select 'N' c2, 'Y' c3 from all_objects where rownum <= 1994 -- > comment to deal with wordpress format issue.
;

variable v1 varchar2(128)

begin
        :v1 := dbms_stats.create_extended_stats(null,'t1','(c2,c3)');
        dbms_output.put_line(:v1);
end;
/

execute dbms_stats.gather_table_stats(null, 't1', method_opt=>'for all columns size 10');

In a variation from the previous version of the code I’ve used the “create_extended_stats()” function so that I can return the resulting virtual column name (also known as an “extension” name) into a variable that I can use later in an anonymous PL/SQL block.

Let’s now compare the values stored in the histogram for that column with the values generated by a function call that I first referenced a couple of years ago:


select
        endpoint_value
from 
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = :v1
;

select 
        distinct c2, c3, 
        mod(sys_op_combined_hash(c2,c3),9999999999) endpoint_value
from t1
;

ENDPOINT_VALUE
--------------
    4794513072
    6030031083

2 rows selected.


C C ENDPOINT_VALUE
- - --------------
N Y     4794513072
Y N     6030031083

2 rows selected.

So we have a method of generating the values that Oracle should store in the histogram; now we need to generate 4 values and supply them to a call to dbms_stats.set_column_stats() in the right order with the frequencies we want to see:


declare
        l_distcnt number;
        l_density number;
        l_nullcnt number;
        l_avgclen number;

        l_srec  dbms_stats.statrec;
        n_array dbms_stats.numarray;

begin
        dbms_stats.get_column_stats (
                ownname =>null,
                tabname =>'t1',
                colname =>:v1,
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                avgclen =>l_avgclen,
                srec    =>l_srec
        );

        l_srec.novals := dbms_stats.numarray();
        l_srec.bkvals := dbms_stats.numarray();

        for r in (
                select
                        mod(sys_op_combined_hash(c2,c3),9999999999) hash_value, bucket_size
                from    (
                        select 'Y' c2, 'Y' c3, 1 bucket_size from dual
                        union all
                        select 'N' c2, 'N' c3, 1 from dual
                        union all
                        select 'Y' c2, 'N' c3, 71482 from dual
                        union all
                        select 'N' c2, 'Y' c3, 1994 from dual
                        )
                order by hash_value
        ) loop
                l_srec.novals.extend;
                l_srec.novals(l_srec.novals.count) := r.hash_value;

                l_srec.bkvals.extend;
                l_srec.bkvals(l_srec.bkvals.count) := r.bucket_size;
        end loop;

        n_array := l_srec.novals;

        l_distcnt  := 4;
        l_srec.epc := 4;

--
--      For 11g rpcnts must not be mentioned
--      For 12c is must be set to null or you
--      will (probably) raise error:
--              ORA-06533: Subscript beyond count
--

        l_srec.rpcnts := null;

        dbms_stats.prepare_column_values(l_srec, n_array);

        dbms_stats.set_column_stats(
                ownname =>null,
                tabname =>'t1',
                colname =>:v1,
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                avgclen =>l_avgclen,
                srec    =>l_srec
        );

end;

The outline of the code is simply: get_column_stats, set up a couple of arrays and simple variables, prepare_column_values, set_column_stats. The special detail that I’ve included here is that I’ve used a “union all” query to generate an ordered list of hash values (with the desired frequencies), then grown the arrays one element at a time to copy them in place. (That’s not the only option at this point, and it’s probably not the most efficient option – but it’s good enough). In the past I’ve used this type of approach but used an analytic query against the table data to produce the equivalent of 12c Top-frequency histogram in much older versions of Oracle.

A couple of important points – I’ve set the “end point count” (l_srec.epc) to match the size of the arrays, and I’ve also changed the number of distinct values to match. For 12c to tell the code that this is a frequency histogram (and not a hybrid) I’ve had to null out the “repeat counts” array (l_srec.rpcnts). If you run this on 11g the reference to rpcnts is illegal so has to be commented out.

After running this procedure, here’s what I get in user_tab_histograms for the column:


select
        endpoint_value                          column_value,
        endpoint_number                         endpoint_number,
        endpoint_number - nvl(prev_endpoint,0)  frequency
from    (
        select
                endpoint_number,
                lag(endpoint_number,1) over(
                        order by endpoint_number
                )                               prev_endpoint,
                endpoint_value
        from
                user_tab_histograms
        where
                table_name  = 'T1'
        and     column_name = :v1
        )
order by endpoint_number
;

COLUMN_VALUE ENDPOINT_NUMBER  FREQUENCY
------------ --------------- ----------
   167789251               1          1
  4794513072            1995       1994
  6030031083           73477      71482
  8288761534           73478          1

4 rows selected.


It’s left as an exercise to the reader to check that the estimated cardinality for the predicate “c2 = ‘N’ and c3 = ‘N'” is 1 with this histogram in place.

Extended Histograms

Tue, 2018-07-31 17:05

Today’s little puzzle comes courtesy of the Oracle-L mailing list. A table has two columns (c2 and c3), which contain only the values ‘Y’ and ‘N’, with the following distribution:


select   c2, c3, count(*)
from     t1
group by c2, c3
;

C C   COUNT(*)
- - ----------
N Y       1994
Y N      71482

2 rows selected.

The puzzle is this – how do you get the optimizer to predicat a cardinality of zero (or, using its best approximation, 1) if you execute a query where the predicate is:

where   c2 = 'N' and c3 = 'N'

Here are 4 tests you might try:

  • Create simple stats (no histograms) on c2 and c3.
  • Create frequency histograms on c2 and c3
  • Create a column group (extended stats) on (c2,c3) but no histograms
  • Create a column group (extended stats) on (c2,c3) with a histogram on (c2, c3)

If you do these tests you’ll find the estimated cardinalities are (from 12.1.0.2):

  • 18,369 – derived as 73,476 / 4  … total rows over total possible combinations
  • 1,940   – derived as 73,476 * (1,994/73,476) * (71,482/73,476) … total rows * fraction where c2 = ‘N’ * fraction where c3 = ‘N’
  • 36,738 – derived as 73,476 / 2 … total rows / number of distinct combinations of (c2, c3)
  • 997      – derived as 1,994 / 2 … half the frequency of the least frequently occurring value in the histogram

The last algorithm appeared in 10.2.0.4; prior to that a “value not in histogram” would have been given an estimated cardinality of 1 (which is what the person on Oracle-L wanted to see).

In fact the optimizer’s behaviour can be reverted to the 10.2.0.3 mechanism by setting fix-control 5483301 to zero (or off), either with an “alter session” call or inside the /*+ opt_param() */ hint. There is, however, another option – if you get the column stats, then immediately set them (dbms_stats.get_column_stats(), dbms_stats.set_column_stats()) the optimizer defines the stats as “user defined” and (for reasons I don’t know – perhaps it’s an oversight) reverts to the 10.2.0.3 behaviour. Here’s some code to demonstrate the point; as the srcipt header says, I’ve tested it on versions up to 18.1


rem
rem     Script:         histogram_hack_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2018
rem
rem     Last tested 
rem             18.1.0.0        via LiveSQL (with some edits)
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
as
select 'Y' c2, 'N' c3 from all_objects where rownum <= 71482 -- > comment to avoid format issue
union all
select 'N' c2, 'Y' c3 from all_objects where rownum <= 1994 -- > comment to avoid format issue
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 10 for columns (c2,c3) size 10');

column column_name format a128 new_value m_colname

select  column_name
from    user_tab_cols
where   table_name = 'T1'
and     column_name not in ('C2','C3')
;

set autotrace traceonly explain
select /* pre-hack */ * from t1 where c2 = 'N' and c3 = 'N';
set autotrace off

declare
        l_distcnt number default null;
        l_density number default null;
        l_nullcnt number default null;
        l_srec    dbms_stats.statrec;
        l_avgclen number default null;

begin

        dbms_stats.get_column_stats (
                ownname =>user,
                tabname =>'t1',
                colname =>'&m_colname',
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                srec    =>l_srec,
                avgclen =>l_avgclen
        );

        dbms_stats.set_column_stats(
                ownname =>user,
                tabname =>'t1',
                colname =>'&m_colname',
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                srec    =>l_srec,
                avgclen =>l_avgclen
        );

end;
/

set autotrace traceonly explain
select /* post-hack */  * from t1 where c2 = 'N' and c3 = 'N';
set autotrace off

I’ve created a simple table for the data and collected stats including histograms on the two columns and on the column group. I’ve taken a simple strategy to find the name of the column group (I could have used the function dbms_stats.create_extended_stats() to set an SQL variable to the name of the column group, of course), and then run a little bit of PL/SQL that literally does nothing more than copy the column group’s stats into memory then write them back to the data dictionary.

Here are the “before” and “after” execution plans that we get from autotrace:

BEFORE
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   997 |  3988 |    23  (27)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   997 |  3988 |    23  (27)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='N' AND "C3"='N')


AFTER
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     4 |    23  (27)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |     4 |    23  (27)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='N' AND "C3"='N')

As required – the estimate for the (‘N’,’N’) rows drops down to (the optimizer’s best approximation to ) zero.

Footnote:

An alternative strategy (and, I’d say, a better strategic approach) would have been to create a “fake” frequency histogram that included the value (‘N’,’N’) giving it a frequency of 1 – a method I’ve suggested in the past  but with the little problem that you need to be able to work out the value to use in the array passed to dbms_stats.set_column_stats() to represent the value for the (‘N’,’N’) combination – and I’ve written about that topic in the past as well.

 

Direct IOT

Mon, 2018-07-16 07:02

A recent (automatic ?) tweet from Connor McDonald highlighted an article he’d written a couple of years ago about an enhancement introduced in 12c that allowed for direct path inserts to index organized tables (IOTs). The article included a demonstration seemed to suggest that direct path loads to IOTs were of no benefit, and ended with the comment (which could be applied to any Oracle feature): “Direct mode insert is a very cool facility, but it doesn’t mean that it’s going to be the best option in every situation.”

Clearly it’s necessary to pose the question – “so when would direct mode insert be a good option for IOTs?” – because if it’s never a good option you have to wonder why it has been implemented. This naturally leads on to thinking about which tests have not yet been done – what aspects of IOTs did Connor not get round to examining in his article. (That’s a standard principle of trouble-shooting, or testing, or investigation: when someone shows you a test case (or when you think you’ve finished testing) one thing you should do before taking the results as gospel is to ask yourself what possible scenarios have not been covered by the test.)

So if you say IOT what are the obvious tests once you’ve got past the initial step of loading the IOT and seeing what happens. First, I think, would be “What if the IOT weren’t empty before the test started”; second would be “IOTs can have overflow segments, what impact might one have?”; third would be “Do secondary indexes have any effects?”; finally “What happens with bitmap indexes and the requirement for a mapping table?” (Then, of course, you can worry about mixing all the different possibilities together – but for the purposes of this note I’m just going to play with two simple examples: non-empty starting tables, and overflow segments.)

Here’s some code to define a suitable table:


create table t2 
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
	3 * rownum			id,
	lpad(rownum,10,'0')		v1,
	lpad('x',50,'x')		padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
order by
	dbms_random.value
;

begin
	dbms_stats.gather_table_stats(
		ownname     => null,
		tabname     => 'T2',
		method_opt  => 'for all columns size 1'
	);
end;
/

create table t1(
	id,
	v1,
	padding,
	constraint t1_pk primary key(id)
)
organization index
-- including v1
-- overflow
nologging
as
select * from t2
;

begin
	dbms_stats.gather_table_stats(
		ownname     => null,
		tabname     => 'T1',
		method_opt  => 'for all columns size 1'
	);
end;
/

I’ve created a heap table t2 that holds 100,000 rows with an id column that is randomly ordered; then I’ve used this ta1ble as a source to create an IOT, with the option to have an overflow segment that contains just the 100 character padding columns. I’ve used 3 * rownum to define the id column for t2 so that when I insert another copy of t2 into t1 I can add 1 (or 2) to the id and interleave the new data with the old data. (That’s another thought about IOT testing – are you loading your data in a pre-existing order that suits the IOTs or is it arriving in a way that’s badly out of order with respect to the IOT ordering; and does your data go in above the current high value, or spread across the whole range, or have a partial overlap with the top end of the range and then run on above it.)

Have created the starting data set, here’s the test:


execute snap_my_stats.start_snap
execute snap_events.start_snap

insert 
	/*  append */
into t1
select
	id + 1, v1, padding
from
	t2
;


execute snap_events.end_snap
execute snap_my_stats.end_snap

All I’m doing is using a couple of my snapshot packages to check the work done and time spent while insert 100,000 interleaved rows – which are supplied out of order – into the existing table. As shown the “append” is a comment, not a hint, so I’ll be running the test case a total of 4 times: no overflow, with and without the hint – then with the overflow, with and without the hint. (Then, of course, I could run the test without the overflow but an index on v1).

Here are some summary figures from the tests – first from the test without an overflow segment:

                                      Unhinted       With Append
                                  ============      ============
CPU used when call started                 153               102
CPU used by this session                   153               102
DB time                                    166               139

redo entries                           130,603            42,209
redo size                           78,315,064        65,055,376

sorts (rows)                                30           100,031

You’ll notice that with the /*+ append */ hint in place there’s a noticeable reduction in redo entries and CPU time, but this has been achieved at a cost of sorting the incoming data into order. The reduction in redo (entries and size) is due to an “array insert” effect that Oracle can take advantage of with the delayed index maintenance that takes place when the append hint is legal (See the section labelled Option 4 in this note). So even with an IOT with no overflow there’s a potential benefit to gain from direct path loading that depends on how much the new data overlaps the old data, and there’s a penalty that depends on the amount of sorting you’d have to do.

What happens in my case when I move the big padding column out to an overflow segment – here are the equivalent results:


Headline figures                      Unhinted       With Append
================                  ============      ============
CPU used when call started                 158                52
CPU used by this session                   158                52
DB time                                    163                94
redo entries                           116,669            16,690
redo size                           51,392,748        26,741,868
sorts (memory)                               4                 5
sorts (rows)                                33           100,032

Interestingly, comparing the unhinted results with the previous unhinted results, there’s little difference in the CPU usage between having the padding column in the “TOP” section of the IOT compared to having it in the overflow segment, though there is a significant reduction in redo (the index entries are still going all over the place one by one, but the overflow blocks are being pinned and packed much more efficiently). The difference between having the append hint or not, though, is damatic. One third of the CPU time (despited still having 100,000 rows to sort), and half the redo. One of the side effects of the overflow, of course, is that the things being sorted are much shorted (only the id and v1 columns that go into the TOP section, and not the whole IOT row.

So, if you already have an overflow segment that caters for a significant percentage of the row, it looks as if the benefit you could get from using the /*+ append */ hint would far outweigh the penalty of sorting you have to pay. Of course, an IOT with a large overflow doesn’t look much different from a heap table with index – so perhaps that result isn’t very surprising.

I’ll close by re-iterating Connor’s closing comment:

Direct mode insert is a very cool facility, but it doesn’t mean that it’s going to be the best option in every situation.

Before you dive in and embrace it, or ruthlessly push it to one side, make sure you do some testing that reflects the situations you have to handle.

Quiz Night

Sat, 2018-07-14 13:07

Because it’s been a long time since the last quiz night.  Here’s a question prompted by a recent thread on the ODevCom database forum – how many rows will Oracle sorts (assuming you have enough rows to start with in all_objects) for the final query, and how many sort operations will that take ?


drop table t1 purge;

create table t1 nologging as select * from all_objects where rownum < 50000;

select owner, count(distinct object_type), count(distinct object_name) from t1 group by owner;

Try to resist the temptation of doing a cut-n-paste and running the code until after you’ve thought about the answer.

pushing predicates

Fri, 2018-07-13 07:05

I came across this odd limitation (maybe defect) with pushing predicates (join predicate push down) a few years ago that made a dramatic difference to a client query when fixed but managed to hide itself rather cunningly until you looked closely at what was going on. Searching my library for something completely different I’ve just rediscovered the model I built to demonstrate the issue so I’ve tested it against a couple of newer versions  of Oracle (including 18.1) and found that the anomaly still exists. It’s an interesting little detail about checking execution plans properly so I’ve written up the details. The critical feature of the problem is a union all view:


rem
rem	Script:		push_pred_limitation.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Jan 2015
rem
rem	Last tested 
rem		18.1.0.0	via LiveSQL
rem		12.2.0.1
rem		12.1.0.2
rem		11.2.0.4
rem

create table t1
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

create table t2
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

create table t3
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1 for columns owner size 254'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T3',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t2_id on t2(object_id);
-- create index t2_id_ot on t2(object_id, object_type);

create index t3_name_type on t3(object_name, object_type);

create or replace view v1
as
select 
	/*+ qb_name(part1) */
	t2.object_id,
	t2.object_type	object_type_2,
	t3.object_type	object_type_3,
	t2.created	date_2,
	t3.created	date_3
from
	t2, t3
where
	t3.object_name = t2.object_name
union all
select
	/*+ qb_name(part2) */
	t2.object_id,
	t2.object_type	object_type_2,
	t3.object_type	object_type_3,
	t2.last_ddl_time	date_2,
	t3.last_ddl_time	date_3
from
	t2, t3
where
	t3.object_name = t2.object_name
;

Two points to note so far: first, the view is basically joining the same two tables in the same way twice but selecting different columns. It’s a close model of what the client was doing but so much simpler that it wouldn’t be hard to find a different way of getting the same result: the client’s version would have been much far harder to rewrite. Secondly, I’ve listed two possible indexes for table t2 but commented one of them out. The indexing will make a difference that I’ll describe later.

So here’s the query with execution plan (from explain plan – but pulling the plan from memory gives the same result):


select
	/*+ qb_name(main) */
	t1.object_name, t1.object_type,
	v1.object_id, v1.date_2, v1.date_3
from
	t1,
	v1
where
	v1.object_id = t1.object_id
and	v1.object_type_2 = t1.object_type
and	v1.object_type_3 = t1.object_type
and	t1.owner = 'OUTLN'
;

Plan hash value: 4123301926

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   588 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   588 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|*  3 |   VIEW                                   | V1           |     1 |    44 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='OUTLN')
   3 - filter("V1"."OBJECT_TYPE_2"="T1"."OBJECT_TYPE")
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

The execution plan appears to be fine – we can see at operation 4 that the union all view has been access with the pushed predicate option and that the subsequent sub-plan has
used index driven nested loop joins in both branches – until we look a little more closely and examine the Predicate section of the plan. What, exactly, has been pushed ?

Look at the predicate for operation 3: “V1″.”OBJECT_TYPE_2″=”T1″.”OBJECT_TYPE”. It’s a join predicate that hasn’t been pushed into the view. On the other hand the original, and similar, join predicate v1.object_type_3 = t1.object_type has been pushed into the view, appearing at operations 9 and 15. There is a difference, of course, the object_type_3 column appears as the second column of the index on table t3.

Two questions then: (a) will the object_type_2 predicate be pushed if we add it to the relevant index on table t2, (b) is there a way to get the predicate pushed without adding it to the index. The answer to both questions is yes. First the index – re-run the test but create the alternative index on t2 and the plan changes to:

Plan hash value: 497545587

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   553 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   553 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                   | V1           |     1 |    39 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID_OT     |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID_OT     |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='OUTLN')
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID" AND "T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID" AND "T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

Notice how the predicate at operation 3 has disappeared, and the access predicate at operation 8 now includes the predicate “T2″.”OBJECT_TYPE”=”T1″.”OBJECT_TYPE”.

Alternatively, don’t mess about with the indexes – just tell Oracle to push the predicate. Normally I would just try /*+ push_pred(v1) */ as the hint to do this, but the Outline section of the original execution plan already included a push_pred() hint that looked like this: PUSH_PRED(@”MAIN” “V1″@”MAIN” 3 1), so I first copied exactly that into the SQL to see if it would make any difference. It did – I got the following plan (and the hint in the outline changed to PUSH_PRED(@”MAIN” “V1″@”MAIN” 3 2 1) so this may be a case where the plan produced by a baseline will perform better than the plan that the produced the baseline!):

Plan hash value: 4123301926

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   553 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   553 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                   | V1           |     1 |    39 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|*  7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|* 13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='TEST_USER')
   7 - filter("T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  13 - filter("T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

In this case we see that the critical late-joining predicate has disappeared from operation 3 and re-appeared as a filter predicate at operation 7 In many cases you may find that the change in predicate use makes little difference to the performance – in my example the variation in run time over several executions of each query was larger than the average run time of the query; nevertheless it’s worth noting that the delayed use of the predicate could have increased the number of probes into table t3 for both branches of the union all and resulted in redundant data passing up through several layers of the call stack before being eliminated … and “eliminate early” is one of the major commandments of optimisation.

You might notice that the Plan Hash Value for the hinted execution plan is the same as for the original execution plan: the hashing algorithm doesn’t take the predicates into account (just one of many points that Randolf Geist raised in a blog post several years ago). This is one of the little details that makes it easy to miss the little changes in a plan that can make a big difference in performance.

Summary

If you have SQL that joins simple tables to set based (union all, etc.) views and you see the pushed predicate option appearing take a little time to examine the predicate section of the execution plan to see if the optimizer is pushing all the join predicates that it should and, if it isn’t, test the effects of pushing more predicates.

In many cases adding the hint /*+ push_pred(your_view_name) */ at the top of the query may be sufficient to get the predicate pushing you need, but you may need to look at the outline section of the execution plan and add a series of more complicated push_pred() and no_push_pred() hints because the push_pred hint has evolved over time to deal with increasingly complicated transformations.

 

Cardinality Puzzle

Thu, 2018-07-12 06:57

One of the difficulties of being a DBA and being required to solve performance problems is that you probably never have enough time to think about how you got to a solution and why the solution works; and if you don’t learn about the process itself , you just don’t get better at it. That’s why I try (at least some of the time) to write articles and books (as I did with CBO Fundamentals) that

  1. explain simple details that can be used as background facts
  2. walk through the steps of solving a problem

So here’s an example from a question on the ODC database forum asking about the cause and workaround for a bad cardinality estimate that is producing a poorly performing execution plan. It’s actually a type of problem that comes up quite frequently on large data sets and explains why a simple “gather stats” is almost guaranteed to leave you with a few headaches (regardless of whether or not you choose to include histograms as part of the process). I’m not going to offer “the answer” – I’m just going to talk about the inferences we can make from the facts supplied and where we have to go from there.

The DBA has a table holding 80,000,000,000 rows. It is list/hash partitioned with 2 partitions and 1,024 sub-partitions (per partition) but neither of the partitioning key columns appears in the query. The query runs parallel and the optimizer (presumably thanks to the specific settings of various parameters related to parallel execution uses dynamic sampling at level 3).

There is an inline view defined in the query and the DBA has isolated this as a key component of the problem and supplied a query and plan (from “explain plan”) against that view.


select * from TAB2 T
WHERE T.DT = to_date(:b1,'MM/DD/YYYY HH24:MI:SS');
 
------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                          |   479M|    76G|  1756K (14)| 05:51:14 |       |       |        |      |            |
|   1 |  PX COORDINATOR              |                          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000                 |   479M|    76G|  1756K (14)| 05:51:14 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION HASH ALL     |                          |   479M|    76G|  1756K (14)| 05:51:14 |     1 |  1024 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS STORAGE FULL| TAB1                     |   479M|    76G|  1756K (14)| 05:51:14 |     1 |  2048 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - storage(COALESCE("TB1"."DFG",'N')='N' AND TRUNC(INTERNAL_FUNCTION("TB1"."DT_TM"))=TO_DATE(:B1,'MM/DD/YYYY
              HH24:MI:SS'))
       filter(COALESCE("TB1"."DFG",'N')='N' AND TRUNC(INTERNAL_FUNCTION("TB1"."DT_TM"))=TO_DATE(:B1,'MM/DD/YYYY
              HH24:MI:SS'))

Note
-----
   - dynamic sampling used for this statement (level=3)

The DBA’s problem is that if the estimated cardinality of this extract goes over roughly 500M the optimizer chooses a bad plan for the overall query – and on occasion this extract has given an estimate of 5 billion rows. Moreover, the actual number of rows returned by this extract is typically in the order of 40M, so the estimate is a long way off even when it’s “good enough”.

So where do we start looking to work out what’s going wrong? You’ll note, of course, that after text expansion the user’s single predicate has changed, and an extra predicate (previously hidden inside the view) has appeared; instead of just T.DT = to_date(:b1,’MM/DD/YYYY HH24:MI:SS’) we now have (cosmetically adjusted):

        COALESCE(DFG,'N')='N' 
AND     TRUNC(DT_TM)=TO_DATE(:B1,'MM/DD/YYYY HH24:MI:SS')

There are two immediately obvious threats here – first that the combination of predicates means Oracle is likely to make a mistake because it will check the individual selectivities and multiply them together to get the combined selectivity, second that the appearance of predicates of the form “function(column) = constant” means that Oracle will guess 1% as the individual selectivities.

Without checking more details we might assume that a possible quick fix (that would require no changes to existing code) would be to create a couple of virtual columns (or extended stats) to represent the two expressions and gather stats on the resulting columns – though it is a restriction of extended stats that you can’t “double up” and create a column group on the two column expressions, so there’s still some scope for a cardinality estimate that is still sufficiently bad even with this approach. We also note that if we can change the coalesce(DFG,’N’) that must have been hidden in the view to nvl(DFG,’N’) then Oracle would be able to “or expand” the nvl() and use a more appropriate selectivity for that part of the predicate.

However, the points I’ve covered so far tend to produce estimates that are too small and often much too small. So maybe the key to the problem is in the Note section that tells us that Oracle has (successfully) used dynamic sampling for this statement. In other words, all the theory of how the optimizer calculates selectivity may be irrelevant – the estimate will be based on the luck of the sample.

So let’s take a look at the (slightly edited) table stats we’ve been given:

column_name data_type num_distinct low_value      high_value     density   num_null  histogram
DT_TM       DATE           6179571 78740B1E0A383C 7876020B01262B 1.6182E-7 0         NONE
DFG         VARCHAR2             1 4E             4E             1         0         NONE

Notice that the DFG (apparently) has the value ‘N’ for every row in the table (low_value = high_value = 0x4E, num_nulls = 0). The date range is 30-Nov-2016 to 11-Feb-2018, with no histogram but 6.18M distinct values for 80 Billion rows. Neither column has a histogram.

A little arithmetic tells us that (on average) there ought to be about 182M (= 80B / 438 days) rows for any one day – and that’s worth thinking about for three separate reasons.

First, an estimate of 479M against an average of 182M isn’t too surprising if it’s based on a fairly small sample, it’s only out by a factor of 2.6. On the other hand, getting an an estimate of 5 billion – which can happen on bad days – is extremely unlikely if the data is uniformly distributed across dates.

Secondly, the DBA supplied us with some data from the recent past with an aggregate query for “trunc(dt_tm)”, with the following results:

TRUNC(DT_TM)   COUNT(*)
------------ ----------
01-FEB-18    44,254,425
02-FEB-18    46,585,349
03-FEB-18    43,383,099
04-FEB-18    32,748,364
05-FEB-18    37,993,126
06-FEB-18    39,708,994
07-FEB-18    38,696,777
08-FEB-18    41,871,780
09-FEB-18    46,702,852
10-FEB-18    42,744,870
11-FEB-18    34,971,845
12-FEB-18    37,165,983

Recent data seems to follow an average of around 40M rows per day, so the estimate of 182M that we can derive from the stored statistics is a long way off: the present is behaving very differently from the past and that’s a relatively common problem with very large data sets – though it’s more usual for rolling averages to increase from the past to the present because the data is often representing the growth of a business over time. Can we create a hypothesis to explain the discrepancy, and could that hypothesis also account for the sample producing some very strange estimates ?

Finally, slightly more subtle and only included for completeness, if this column is supposed to hold date and time to the nearest second – which is what you might expect from an Oracle date type – there are 38 million possible values (438 x 86,400) it could be holding, and that’s more than the actual number of distinct values by a factor of 6. We can also work out that 80 billion rows over 438 days is 2,000 rows per second (on average). Averages are often misleading, of course, many systems have a pattern where a working day shows most of the data created in a 12 – 16 hour window with a couple of hours of more intense activity. For reference, though: average rows per second for the recent data is roughly 40M/86400 = 460; while the average we derive from the stored statistics is 80B / 6M = 13000 rows per second; this unlikely pattern needs a “non-uniform” explanation.

How do these three thoughts help us to understand or, to be more accurate, to make a sensible guess about why the optimizer can use dynamic sampling and get a wildly variable estimate which can be 1 or 2 orders of magnitude wrong. (varying between 479M and 5,000M compared to the recent actual 40M)?

Here’s one simple idea: extrapolate the 40M rows per day over 80B rows: that’s 2,000 days (possibly rather more since businesses tend to grow). What if the dt_tm is the timestamp for the moment the row was loaded into the database, and a couple of years ago (maybe around “30th Nov 2016”) the data was restructured and the existing five years of data was loaded over a very short period of time – let’s say one week. This would leave you with 17B rows of “new” data with a dt_tm spread at 40M rows per day for most of 438 days, and 63B rows of “historic” data packed into 7 days (at 9B rows per day).

I don’t know how Oracle would have randomly selected its sample from an extremely large table with 2,048 physical data segments but it’s totally believable that a small, widely scattered sample could end up with an extremely unrepresentative subset of the data. A completely random sample of the data would produce an estimate of around 500M rows for the predicate; but it would only take a fairly small variation in the sample (taking a few too many “historic” rows) to produce a large enough change in the estimate to change the execution plan, and a rare, but not extreme variation could easily take the estimate up to 5B.

Next Steps

It would be at this point in a performance assignment that I’d be asking around to find out if my guess about a massive data load operation was correct – if I couldn’t get the answer by talking to people I’d run a query against the whole data set to check the hypothesis, because there’s clearly some sort of skew in the data that’s causing a problem. I’d also run the critical part of the query a couple of times with events 10046/level 4 and 10053 set (but only fetching the first few rows) to find out from the trace file how large a sample Oracle was using, and then run the sampling query a few times to see what the sampled results looked like. Depending on the results I’d either find a way to stop Oracle from sampling for this query or I might create a virtual column (or just extended stats since it’s 11g) on just the trunc(dt_tm), possibly with a histogram in place (maybe coded by hand) if that could isolate the special dates and leave Oracle with a better estimate of the typical date. I might find I had to change the coalesce() to an nvl() as well – or create a virtual  column – to stop the sampling.

Finally, it’s worth noting that in 11g it’s possible to create pending (table preference “PUBLISH” = FALSE) stats for testing purposes; it’s also worth noting that the default histogram on trunc(dt_tm) would be a height-balanced histogram while we could create a frequency histogram in 12c since 12c allows us to specify up to 2,048 columns.

Footnote

If you check the ODC thread you’ll see that the OP has marked as correct a suggestion to change:

    TRUNC (TB1.DT_TM)  = to_date(:b1,'MM/DD/YYYY HH24:MI:SS');  

to

    dt_tm >= trunc(to_date(:b1,'MM/DD/YYYY HH24:MI:SS'))
and dt_tm <  trunc(to_date(:b1,'MM/DD/YYYY HH24:MI:SS'))+1

Note that that’s “greater than or equal to” at one end and “strictly less than” at the other when using “date + 1”.

This has the effect of giving the optimizer a chance of using the low/high values of the column to produce a better (though perhaps still overlarge) and consistent estimate of the rows in the date range; and it may also stop the optimizer from doing dynamic sampling at level 3 (the “I’m guessing, let’s check” level) though it’s possible that the sampling would be disabled only if the coalesce() were changed to an nvl() as well.

Of course, from the information supplied, this looks like the OP would have to change a view definition and the run-time code to achieve the result. But in an ideal world doing things that avoid confusing the optimizer is usually the sensible strategy provided it doesn’t take an extreme amount of coding and testing.

 

Validate FK

Tue, 2018-07-10 04:42

A comment arrived yesterday on an earlier posting about an enhancement to the truncate command in 12c that raised the topic of what Oracle might do to validate a foreign key constraint. Despite being sure I had the answer written down somewhere (maybe on a client site or in a report to a client) I couldn’t find anything I’d published about it, so I ran up a quick demo script to show that all Oracle does is construct a simple SQL statement that will do check the data – and then do whatever the optimizer does to produce the fastest possible plan.

Here’s the script – with a few variations to show what happens if you start tweaking features to change the plan.

rem
rem     Script:         validate_fk.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

create table parent 
as 
select  * 
from    all_Objects 
where   rownum <= 10000 -- > comment to avoid wordpress format issue
;

alter table parent add constraint par_pk primary key(object_id);

execute dbms_stats.gather_table_stats(null, 'parent', cascade=>true)


create table child 
as 
select par.* 
from    (select rownum from dual connect by level <= 10) v1, --> comment to avoid wordpress format issue
        parent par
; 

alter table child add constraint chi_fk_par foreign key(object_id) references parent enable novalidate; 
create index chi_fk_par on child(object_id); 
execute dbms_stats.gather_table_stats(null, 'child', cascade=>true)


-- alter table child modify object_id null;
-- alter table child parallel(degree 8);
-- alter session set "_fast_full_scan_enabled" = FALSE;
-- alter session set "_optimizer_outer_to_anti_enabled" = false;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 12';
alter table child modify constraint chi_fk_par validate;
alter session set events '10046 trace name context off;

All I’ve done is created parent table with a primary key, and a child table with 10 rows per parent. I’ve created a foreign key constraint on the child table, enabled it (so future data will be checked) but not validated it (so there’s no enforced guarantee that the existing data is correct). Then I’ve issued a command to validate the foreign key.

The flush of the buffer cache is to allow me to see the I/O that takes place and will also (usually) let me see some if there are any strange issues due to any recursive SQL Oracle. As you can see I’ve also got a couple of commented commands that might cause a couple of variations in behaviour.

Here’s the critical content from the output of the trace file summary from tkprof (in versions from 11.2.0.4 to 12.2.0.1):


select /*+ all_rows ordered dynamic_sampling(2) */ A.rowid, :1, :2, :3
from
 "TEST_USER"."CHILD" A , "TEST_USER"."PARENT" B where( "A"."OBJECT_ID" is not
  null) and( "B"."OBJECT_ID" (+)= "A"."OBJECT_ID") and( "B"."OBJECT_ID" is
  null)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.01       0.02        241        373          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.02       0.02        241        373          0           0

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  NESTED LOOPS ANTI (cr=373 pr=241 pw=0 time=21779 us starts=1 cost=70 size=22000 card=1000)
    100000     100000     100000   INDEX FAST FULL SCAN CHI_FK_PAR (cr=224 pr=219 pw=0 time=17753 us starts=1 cost=32 size=1700000 card=100000)(object id 104840)
     10000      10000      10000   INDEX UNIQUE SCAN PAR_PK (cr=149 pr=22 pw=0 time=4494 us starts=10000 cost=0 size=49995 card=9999)(object id 104838)

As you can see, Oracle writes SQL for an outer join with an “is null” predicate on the outer table – which the optimizer converts to an anti-join, running a nested loop in this case. It’s an interesting little oddity that the code includes the predicate “A”.”OBJECT_ID” is not null given that the column is declared as not null – but this is presumably a developer deciding to re-use code even if it then includes a redundant predicate (which is effectively zero cost – since the optimizer can use transitive closure to eliminate it).

Given that Oracle has converted an outer join to an anti join I obviously had to check what would happen if I disabled this conversion by altering the “_optimizer_outer_to_anti_enabled” parameter to false. The optimizer obeyed the session setting with the following plan in the trace:

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  FILTER  (cr=373 pr=241 pw=0 time=226926 us starts=1)
    100000     100000     100000   NESTED LOOPS OUTER (cr=373 pr=241 pw=0 time=177182 us starts=1 cost=70 size=22000 card=1000)
    100000     100000     100000    INDEX FAST FULL SCAN CHI_FK_PAR (cr=224 pr=219 pw=0 time=40811 us starts=1 cost=32 size=1700000 card=100000)(object id 104848)
    100000     100000     100000    INDEX UNIQUE SCAN PAR_PK (cr=149 pr=22 pw=0 time=119363 us starts=100000 cost=0 size=49995 card=9999)(object id 104846)

The significant difference is in the CPU usage, of course, and to a degree the magnitude of the change is dictated by the pattern and distribution of the data. The number of CR gets hasn’t changed as the number of index probes jumps from 10,000 to 100,000 because Oracle will have pinned index blocks (There’s a very old article on my old website if you want to read more about buffer pins).

The original question was about the effect of a local session setting that disabled index fast full scans, and followed up with a question on parallelism. After seeing the effect of changing one optimizer parameter at the session level you probably won’t be surprised by the following two results.  First, when the only change I make is the setting of the “_index_fast_full_scan_enabled” parameter, and then when the only change is the declared parallelism of the child table.

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  MERGE JOIN ANTI (cr=240 pr=240 pw=0 time=120163 us starts=1 cost=247 size=22000 card=1000)
    100000     100000     100000   INDEX FULL SCAN CHI_FK_PAR (cr=218 pr=218 pw=0 time=20314 us starts=1 cost=222 size=1700000 card=100000)(object id 104852)
    100000     100000     100000   SORT UNIQUE (cr=22 pr=22 pw=0 time=81402 us starts=100000 cost=25 size=50000 card=10000)
     10000      10000      10000    INDEX FULL SCAN PAR_PK (cr=22 pr=22 pw=0 time=1185 us starts=1 cost=22 size=50000 card=10000)(object id 104850)


Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  PX COORDINATOR  (cr=15 pr=2 pw=0 time=722483 us starts=1)
         0          0          0   PX SEND QC (RANDOM) :TQ10001 (cr=0 pr=0 pw=0 time=0 us starts=0 cost=37 size=22000 card=1000)
         0          0          0    HASH JOIN ANTI BUFFERED (cr=0 pr=0 pw=0 time=0 us starts=0 cost=37 size=22000 card=1000)
         0          0          0     PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us starts=0 cost=32 size=1700000 card=100000)
         0          0          0      TABLE ACCESS FULL CHILD (cr=0 pr=0 pw=0 time=0 us starts=0 cost=32 size=1700000 card=100000)
         0          0          0     PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us starts=0 cost=4 size=50000 card=10000)
         0          0          0      PX SEND BROADCAST :TQ10000 (cr=0 pr=0 pw=0 time=0 us starts=0 cost=4 size=50000 card=10000)
         0          0          0       PX SELECTOR  (cr=0 pr=0 pw=0 time=0 us starts=0)
         0          0          0        INDEX FAST FULL SCAN PAR_PK (cr=0 pr=0 pw=0 time=0 us starts=0 cost=4 size=50000 card=10000)(object id 104854)

In the first case my version of Oracle has switched to a merge anti-join with an index full scan (not FAST full scan). It’s interesting to note that the merge join anti hasn’t been as clever as the nested loop anti in avoiding probes of the second data source as it walks the foreign key index (note how starts=100000 in the SORT UNIQUE line).

In the second case all the work was done by the parallel query slaves – and the PX SELECTOR line tells you that this plan must have come from 12c. As you can see we’re still doing an anti-join but this time we do a parallel tablescan of the child table (as we haven’t enabled the index for parallel execution – if we had altered the index to parellel(degree 8) as well we would have seen a parallel index fast full scan instead of the parallel tablescan.)

Bottom line: the SQL executed to validate a foreign key constraint is essentially a join between the parent and child tables, Oracle will simply optimize that statement to the best of its abilities based on the current session settings. If you want to test on a clone (or accurate model) of the tables you may find that you can create an sql_patch that works (even though the necessary SQL will be optimised as SYS – though so far I’ve only tried this with a couple of variants of the parallel() hint on 12.2.0.1)

<h3>Footnote</h3>

If you were wondering what the three bind variables in the query were, this is the relevant extract from the 10046 trace file with bind variable tracing enabled:

 Bind#0
  oacdty=01 mxl=32(09) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3fbd359c38  bln=32  avl=09  flg=05
  value="TEST_USER"
 Bind#1
  oacdty=01 mxl=32(05) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3fbd359c00  bln=32  avl=05  flg=05
  value="CHILD"
 Bind#2
  oacdty=01 mxl=32(10) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3fbd359bc8  bln=32  avl=10  flg=05
  value="CHI_FK_PAR"

The values are the owner, table, and constraint names. (Though you have to modify the code a little to show that the last one is the constraint name and not the index name).

 

 

Pages