ONLINE STATISTICS GATHERING DOESN’T WORK ON INDEXES WITH DUPLICATE VALUES ON ORACLE 19C (eng version)
Oracle

From the Oracle 9i version when a new index is created, statistics are collected also on the new created object in a silent and completely automatic way. The same principle is used starting from the Oracle 12c version on the direct path operations as CTAS (Create Table AS Select) and IAS (Insert AS Select) on empty tables.

With Oracle 19c  this functionality has been extended to indexes that will benefit of the “on the fly” re-gathered statistics with CTAS and IAS operations.

A NEW ROW SOURCE HAS BEEN ADDED IN EXECUTION PLAN

Indications on the Online Statistics Gathering are also provided in execution plan, a new row source has been added (OPTIMIZER STATISTICS GATHERING ):

SQL> EXPLAIN PLAN FOR
     2  CREATE TABLE t (id1, id2, id3, str1, CONSTRAINT t_pk PRIMARY KEY (id1)) AS SELECT rownum id1, rownum id2, rownum id3, dbms_random.string('a', 10) str1
 FROM dual connect by level <= 100000;
 SQL> SELECT * FROM table(dbms_xplan.display(format=>'basic'));
  
 -------------------------------------------------
 | Id  | Operation                        | Name |
 -------------------------------------------------
 |   0 | CREATE TABLE STATEMENT           |      |
 |   1 |  LOAD AS SELECT                  | T    |
 |   2 |   OPTIMIZER STATISTICS GATHERING |      |
 |   3 |    COUNT                         |      |
 |   4 |     CONNECT BY WITHOUT FILTERING |      |
 |   5 |      FAST DUAL                   |      |
 -------------------------------------------------   

The presence of the new  row source  in the execution plan doesn’t give any garantee on the runtime online statistics executions.

ARE ONLINE STATISTICS COLLECTED ON ANY TYPE OF INDEX?

Why not, I thought. I imagine that Online Statistics Gathering in 19c does something like a call to dbms_stats.gather_table_stats with cascade=> true parameter so that the statistics can also be calculate on indexes. Running some test I found that Online Statistics Gathering doesn’t  run on all type of available indexes.

 

The following test run a CTAS with three column (id1,id2 e id3) : on field id1 will be built the PK index, on field id2 will be created one UNIQUE index UNIQUE and on field id3 will be created an index with duplicate values. The Oracle database version used is 19.5:

SQL> select banner_full from v$version;
  
 BANNER_FULL
 ------------------------------------------------------------------------
 Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
 Version 19.5.0.0.0 

I have created a table T:

SQL> CREATE TABLE t (id1, id2, id3, str1, CONSTRAINT t_pk PRIMARY KEY (id1)) AS  SELECT rownum id1, rownum id2, rownum id3, dbms_random.string('a', 10) str1
 FROM dual connect by level <= 100000;

And I look at the statistics recalculation time plus other reference values:

SQL> SELECT table_name, num_rows, blocks,
last_analyzed, stale_stats FROM USER_TAB_STATISTICS;
 TABLE_NAME    NUM_ROWS     BLOCKS LAST_ANALYZED          STALE_STATS
 ----------- ---------- ---------- ---------------------- -----------
 T               100000        441 13-nov-2019 15:45:32   NO  

Next I created a UNIQUE index and a normal B-Tree index; at the end, the table T will have three indexes:

SQL> CREATE UNIQUE INDEX t_unique on t(id2);
SQL> CREATE INDEX t_no_unique on t(id3);
SQL> SELECT index_name, last_analyzed, stale_stats FROM USER_IND_STATISTICS WHERE table_name='T';

INDEX_NAME     LAST_ANALYZED          STALE_STATS
 -------------- ---------------------- -----------
 T_PK           13-nov-2019 15:45:32   NO
 T_UNIQUE       13-nov-2019 15:45:32   NO
 T_NO_UNIQUE    13-nov-2019 15:45:32   NO 

One of the Online Statistics Gathering prerequisites on the direct path mode operations is to have any data in the table, the table must be empty as indicated in the Oracle documentation:

“Specifically, bulk loads do not gather statistics automatically when any of the following conditions applies to the target table, partition, or subpartition:

The object contains data. Bulk loads only gather online statistics automatically when the object is empty. “

If you want to see all the other conditions, refer to the following link:

https://docs.oracle.com/en/database/oracle/oracle-database/19/tgsql/optimizer-statistics-concepts.html#GUID-CDDB5A54-0991-4E68-A9D7-2305777B608B

I then execute a TRUNCATE TABLE on table T to put myself in the indicated conditions, the table must not contain data:

SQL> truncate table t;  
After executing the TRUNCATE statement  the statistics on the table and on the indexes become STALE:

SQL> SELECT table_name, num_rows, blocks, last_analyzed, stale_stats FROM USER_TAB_STATISTICS;
  
TABLE_NAME     NUM_ROWS     BLOCKS LAST_ANALYZED          STALE_STATS
------------ ---------- ---------- ---------------------- -----------
T                100000        441 13-nov-2019 15:45:32   YES
  
SQL> SELECT index_name, last_analyzed, stale_stats FROM USER_IND_STATISTICS WHERE table_name='T'; 
  
INDEX_NAME   LAST_ANALYZED          STALE_STATS
------------ ---------------------- -----------
T_PK         13-nov-2019 15:45:32   YES
T_UNIQUE     13-nov-2019 15:45:32   YES
T_NO_UNIQUE  13-nov-2019 15:45:32   YES  

I execute an insert in direct path:

SQL> INSERT /*+ append */ INTO t SELECT rownum id1, rownum id2, rownum id3, dbms_random.string('a', 10) str1 FROM dual connect by level <= 100000;
 
100000 righe create.
SQL> commit;
 
Commit completato 
 
SQL> SELECT table_name, num_rows, blocks, last_analyzed, stale_stats FROM USER_TAB_STATISTICS;
 
TABLE_NAME     NUM_ROWS     BLOCKS LAST_ANALYZED         STALE_STATS
------------ ---------- ---------- --------------------- -----------
T                100000        441 13-nov-2019 15:45:50  NO

SQL> SELECT index_name, last_analyzed, stale_stats FROM USER_IND_STATISTICS WHERE table_name='T';

INDEX_NAME     LAST_ANALYZED           STALE_STATS
-------------- ----------------------- -----------
T_PK           13-nov-2019 15:45:50    NO
T_UNIQUE       13-nov-2019 15:45:50    NO
T_NO_UNIQUE    13-nov-2019 15:45:32    YES  

after executing the insert in direct path, I found an “anomaly”: online statistics were performed as expected on the table, on the primary key and on the UNIQUE index, but they were not performed on the B-tree index, that admits duplicate values (T_NO_UNIQUE index) .

The statistics on the T_NO_UNIQUE index remain STALE, Online Statistics Gathering were not performed on this index, the time indicated in the last_analyzed column remained unchanged.

TRACING DBMS_STATS ON ONLINE STATISTICS GATHERING

I then verified the execution of online statistics by executing a trace dbms_stats:

 

modify the file name and open the trace

alter session set tracefile_identifier = 'online_stats_trace';
exec dbms_stats.set_global_prefs('TRACE',4+8+16+32);  

run a direct path insert

INSERT /*+ append */ INTO t SELECT rownum id1, rownum id2, dbms_random.string('a', 10) str1 FROM DBA_OBJECTS;
commit; 

close the trace

exec dbms_stats.set_global_prefs('TRACE',0); 
In the output of the trace file I find the indication that the Online Statistics Gathering for bulk load have been executed, the following sentence found in the trace file confirms it:

“DBMS_STATS: postprocess online optimizer stats gathering for SETTEMBRINO.T”

below is the extract of the trace file:

TRACING ONLINE STATISTICS WITH 10046 EVENT

In the trace file generated by activating 10046 event I then identified a call to two undocumented DBMS_STATS procedures which should then be the two procedures that deal with running Online Statistics Gathering:

dbms_stats.postprocess_stats(:owner, :tabname, :tobj, :fobjn, :flags, :rawstats, :selmap, :clist, null, null);
and
dbms_stats.postprocess_indstats(:ilist, :flags); 

I don’t think I’m wrong by saying that postprocess_stats is the DBMS_STATS procedure that performs online statistics on the table, while the postprocess_indstats is the procedure that deals with executing them on the indexes.

To confirm what I just said, I identified two postprocess_stats calls, one immediately after the CTAS and one after the IAS and a postprocess_indstats call immediately after the IAS.

SUMMARY

From the tests it would seem therefore that Online Statistics Gathering for Bulk Loads are not executed on B-tree indexes that allow duplicate values. I can’t find plausible reasons to justify the behavior found in the tests, so I have to assume that it is a bug. Moreover, Online Statistics Gathering are not performed even on bitmap and constraint deferrable indexes.


Any comments and/or corrections they are as always welcome:)

 italian version

condividi su
Donatello Settembrino
AUTORE Donatello Settembrino

Oracle Specialist at Performance Team @ ICTeam Spa