ACCESS[20170603]12c Top Frequency histogram.txt

—恢复内容开始—

[20170603]12c Top Frequency histogram.txt

–//个人对直方图了解非常少,以前2种植直方图类型对于目前之浩大运用来讲都够,或者说遇到的题材十分少.
–//抽一点点岁月,简单探究12c Top Frequency histogram.

–//以前的效率直方图Frequency
histogram,受限bucket(桶的分寸),如果产生255只不等值,oracle分析后无见面建立频率直方图,而是建立大
–//度直方图.这样的状会造成一些流行值的统计在亮执行计划时差别十分大.而12c引入了Top
Frequency histogram,注意这里的top,
–//我的理解就是是流行值(popular),也尽管是这般树立的直方图仅仅包括popular,其他non-popular不考虑,这样在sql语句之询问这些
–//popular时,显示的统计信息相对可靠,从而利于oracle选择对的履计划.

–//以下是本人之求学笔记,也许会在不少破绽百出,仅仅开一个记录.我哉看了累累旁人的blog.^_^.而且我当下的条件就生12.0.0.1(版本太
–//低).

1.环境:
SCOTT@test01p> @ ver1
PORT_STRING                    VERSION       
BANNER                                                                              
CON_ID




IBMPC/WIN_NT64-9.1.0           12.1.0.1.0     Oracle Database 12c
Enterprise Edition Release 12.1.0.1.0 – 64bit Production             
0

–//如果要确立Top Frequency histogram必须要满足几乎只尺码:
–//链接
raajeshwaran.blogspot.co.id/2016/06/top-frequency-histogram-in-12c.html

The database creates a Top frequency histogram, when the following
criteria are met.

NDV is greater than n, where n is the requested number of buckets
(default 254)
The percentage of rows occupied by Top-frequent values is greater than
or equal to the threshold p where p is (1-(1/n)*100).
The estimate_percent parameter in dbms_stats gathering procedure
should be auto_sample_size (set to default)

–//翻译过来NDV(也就是字段的免同值)大于N(指bucket的数).
–//流行值(popular)在Top-frequent中商数量/总计多少的比要逾(1-(1/n)*100).如果起10个桶,这样流行值的合计要于90%之上

2.第一验证(1-(1/n)*100)比值是否正确:
SCOTT@test01p> create table t as select * from dba_objects;
Table created.

select column_name,num_distinct,density,histogram,SAMPLE_SIZE
  from user_tab_col_statistics
  where table_name =’T’
  and column_name =’OWNER’;

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM      
SAMPLE_SIZE



OWNER                          32     .03125 NONE                 
91695

–//12c ctas 建立统计信息,但是非会见树立直方图.density 1/32=.03125.
SCOTT@test01p> select count(*) from t;

  COUNT(*)

     91695

–//随手写的sql语句:

with a as (select distinct owner,count(*) over(partition by owner) n1
,count(*) over () n2 from t order by 2 desc ),
b as (select owner,n1,n2,sum(n1) over (order by n1 desc) n3  from a
order by n1 desc)
select rownum,owner,n1,n2,n3,round(n3/n2,5) x1,round(1-1/rownum,5) x2
from b;

ROWNUM OWNER                N1         N2         N3         X1        
X2



     1 SYS               41942      91695      41942     .45741         
0
     2 PUBLIC            37142      91695      79084     .86247        
.5
     3 APEX_040200        3405      91695      82489      .8996    
.66667
     4 ORDSYS             3157      91695      85646     .93403       
.75
     5 MDSYS              1819      91695      87465     .95387        
.8
     6 XDB                 985      91695      88450     .96461    
.83333
     7 SYSTEM              641      91695      89091      .9716    
.85714
     8 CTXSYS              405      91695      89496     .97602      
.875
     9 WMSYS               387      91695      89883     .98024    
.88889
    10 DVSYS               352      91695      90235     .98408        
.9
    11 SH                  309      91695      90544     .98745    
.90909
    12 ORDDATA             292      91695      90836     .99063    
.91667
    13 LBACSYS             209      91695      91045     .99291    
.92308
    14 OE                  142      91695      91187     .99446    
.92857
    15 SCOTT                96      91695      91283     .99551    
.93333
    16 GSMADMIN_INTERNAL    77      91695      91360     .99635     
.9375
    17 IX                   58      91695      91418     .99698    
.94118
    18 DBSNMP               55      91695      91473     .99758    
.94444
    19 PM                   44      91695      91517     .99806    
.94737
    20 HR                   35      91695      91552     .99844       
.95
    21 OLAPSYS              25      91695      91577     .99871    
.95238
    22 OJVMSYS              23      91695      91600     .99896    
.95455
    23 DVF                  19      91695      91619     .99917    
.95652
    24 FLOWS_FILES          13      91695      91632     .99931    
.95833
    25 AUDSYS               12      91695      91644     .99944       
.96
    26 ORDPLUGINS           10      91695      91664     .99966    
.96154
    27 OUTLN                10      91695      91664     .99966    
.96296
    28 BI                    8      91695      91688     .99992    
.96429
    29 ORACLE_OCM            8      91695      91688     .99992    
.96552
    30 SI_INFORMTN_SCHEM     8      91695      91688     .99992    
.96667
    31 APPQOSSYS             5      91695      91693     .99998    
.96774
    32 TEST                  2      91695      91695          1    
.96875

–//如果加入条件where round(n3/n2,5)
>round(1-1/rownum,5),全部输出.也就算是如此只要桶小于32,大于1.树之都是Top
Frequency.

3.继往开来测试:
D:\temp>cat a1.sql
cat a1.sql
exec 
dbms_stats.gather_table_stats(ownname=>user,tabname=>’T’,method_opt=>’for
columns owner size &1′);
select column_name,num_distinct,density,histogram,SAMPLE_SIZE from
user_tab_col_statistics where table_name =’T’ and column_name
=’OWNER’;

SCOTT@test01p> @ a1.sql 2
PL/SQL procedure successfully completed.

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM      
SAMPLE_SIZE



OWNER                          32     .03125 HYBRID                
5500

SCOTT@test01p> @ a1.sql 3
PL/SQL procedure successfully completed.

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM      
SAMPLE_SIZE



OWNER                          32 5.4529E-06 TOP-FREQUENCY        
91695

SCOTT@test01p> @ a1.sql 4
PL/SQL procedure successfully completed.

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM      
SAMPLE_SIZE



OWNER                          32 5.4529E-06 TOP-FREQUENCY        
91695

SCOTT@test01p> @ a1.sql 31
PL/SQL procedure successfully completed.
COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM      
SAMPLE_SIZE



OWNER                          32 5.4529E-06 TOP-FREQUENCY        
91695

SCOTT@test01p> @ a1.sql 32
PL/SQL procedure successfully completed.

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM      
SAMPLE_SIZE



OWNER                          32 5.4529E-06 FREQUENCY            
91695

–//除了bucket=2,32成立之直方图HYBRID,FREQUENCY外,建立的且是TOP-FREQUENCY.
–//以10个bucket为例.解方程式(90235-x)/(91695-x)=0.9
,得到x=77095.啊即是只要抽77095.

–//delete t where owner=’SYS’ and rownum<=41000;
–//delete t where owner=’PUBLIC’ and rownum<=36095;

SCOTT@test01p> delete t where owner=’SYS’ and rownum<=41000;
41000 rows deleted.

SCOTT@test01p> delete t where owner=’PUBLIC’ and rownum<=36095;
36095 rows deleted.

SCOTT@test01p> commit ;
Commit complete.

with a as (select distinct owner,count(*) over(partition by owner) n1
,count(*) over () n2 from t order by 2 desc ),
b as (select owner,n1,n2,sum(n1) over (order by n1 desc) n3  from a
order by n1 desc)
select rownum,owner,n1,n2,n3,round(n3/n2,5) x1,round(1-1/rownum,5) x2
from b where rownum<=11;

ROWNUM OWNER         N1         N2         N3         X1         X2


     1 APEX_040200 3405      14600       3405     .23322          0
     2 ORDSYS      3157      14600       6562     .44945         .5
     3 MDSYS       1819      14600       8381     .57404     .66667
     4 PUBLIC      1047      14600       9428     .64575        .75
     5 XDB          985      14600      10413     .71322         .8
     6 SYS          942      14600      11355     .77774     .83333
     7 SYSTEM       641      14600      11996     .82164     .85714
     8 CTXSYS       405      14600      12401     .84938       .875
     9 WMSYS        387      14600      12788     .87589     .88889
    10 DVSYS        352      14600      13140         .9         .9
    11 SH           309      14600      13449     .92116     .90909
11 rows selected.
–//backet=10,前面10个值占90%.

SCOTT@test01p> @ a1 10
PL/SQL procedure successfully completed.
COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM      
SAMPLE_SIZE



OWNER                          32 .000034247 TOP-FREQUENCY        
14600

–//再减少1条记录.
SCOTT@test01p> delete t where owner=’SYS’ and rownum<=1;
1 row deleted.

SCOTT@test01p> commit ;
Commit complete.

ROWNUM OWNER         N1         N2         N3         X1         X2


     1 APEX_040200 3405      14599       3405     .23324          0
     2 ORDSYS      3157      14599       6562     .44948         .5
     3 MDSYS       1819      14599       8381     .57408     .66667
     4 PUBLIC      1047      14599       9428      .6458        .75
     5 XDB          985      14599      10413     .71327         .8
     6 SYS          941      14599      11354     .77772     .83333
     7 SYSTEM       641      14599      11995     .82163     .85714
     8 CTXSYS       405      14599      12400     .84937       .875
     9 WMSYS        387      14599      12787     .87588     .88889
    10 DVSYS        352      14599      13139     .89999         .9
    11 SH           309      14599      13448     .92116     .90909
11 rows selected.
–//现在前10占.89999.

SCOTT@test01p> @ a1 10
PL/SQL procedure successfully completed.

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM      
SAMPLE_SIZE



OWNER                          32    .018378 HYBRID               
14599

–//可以窥见建立的直方图无是TOP-FREQUENCY,而是HYBRID(混合型直方图).

4.转化成TOP-FREQUENCY.
SCOTT@test01p> insert into t  select * from dba_objects where
owner=’SYS’ and rownum=1;
1 row created.

SCOTT@test01p> commit ;
Commit complete.

SCOTT@test01p> @ a1 10
PL/SQL procedure successfully completed.

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM      
SAMPLE_SIZE



OWNER                          32 .000034247 TOP-FREQUENCY        
14600

–//DENSITY=1/SAMPLE_SIZE/2, 1/14600/2=.00003424657534246575刚刚符合.

5.今省执行计划:
SCOTT@test01p> select count(*) from t where owner=’DVSYS’;

  COUNT(*)

       352

SCOTT@test01p> @ dpc ” ”

PLAN_TABLE_OUTPUT

SQL_ID  2at34f0zaqhzj, child number 0

select count(*) from t where owner=’DVSYS’

Plan hash value: 2966233522

| Id  | Operation          | Name | Starts | E-Rows |E-Bytes| Cost

(%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |

|   0 | SELECT STATEMENT   |      |      1 |        |       |   428
(100)|          |      1 |00:00:00.01 |    1544 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |     8 |           
|          |      1 |00:00:00.01 |    1544 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |    352 |  2816 |   428  

(0)| 00:00:01 |    352 |00:00:00.01 |    1544 |

Query Block Name / Object Alias (identified by operation id):

   1 – SEL$1
   2 – SEL$1 / T@SEL$1

Predicate Information (identified by operation id):

   2 – filter(“OWNER”=’DVSYS’)

–//可以发现e_rows=A-Rows.可以窥见异常准确.
–//查看直方图信息.

SCOTT@test01p> select endpoint_number,endpoint_actual_value from
user_tab_histograms where table_name =’T’ and column_name =’OWNER’
order by 1;
ENDPOINT_NUMBER ENDPOINT_ACTUAL_VALU


           3405 APEX_040200
           3810 CTXSYS
           4162 DVSYS
           5981 MDSYS
           9138 ORDSYS
          10185 PUBLIC
          11127 SYS
          11768 SYSTEM
          12155 WMSYS
          13140 XDB
10 rows selected.

–//4162-3810=352,可以窥见正好符合.也就是popular值统计很正确.看看非popular值.

SCOTT@test01p> select count(*) from t where owner=’DVSYS1′;

  COUNT(*)

         0

Plan hash value: 2966233522        

| Id  | Operation          | Name | Starts | E-Rows |E-Bytes| Cost

(%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |

|   0 | SELECT STATEMENT   |      |      1 |        |       |   428
(100)|          |      1 |00:00:00.01 |    1544 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |     8 |           
|          |      1 |00:00:00.01 |    1544 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |     66 |   528 |   428  

(0)| 00:00:01 |      0 |00:00:00.01 |    1544 |

SCOTT@test01p> select count(*) from t where owner=’SCOTT’;

  COUNT(*)

        96

Plan hash value: 2966233522

| Id  | Operation          | Name | Starts | E-Rows |E-Bytes| Cost

(%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |

|   0 | SELECT STATEMENT   |      |      1 |        |       |   428
(100)|          |      1 |00:00:00.01 |    1544 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |     8 |           
|          |      1 |00:00:00.01 |    1544 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |     66 |   528 |   428  

(0)| 00:00:01 |     96 |00:00:00.01 |    1544 |

–//可以窥见估计E-Rows如何计算的为?

6.做10053跟踪:
D:\tools\sqllaji>cat 10053x.sql
execute
dbms_sqldiag.dump_trace(p_sql_id=>’&1′,p_child_number=>&2,p_component=>’Compiler’,p_file_id=>’&&1′);

SCOTT@test01p> @ 10053x f31kz63ksu1tc 0
PL/SQL procedure successfully completed.

***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T[T]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#1):
    NewDensity:0.004545, OldDensity:0.000034 BktCnt:13140.000000,
PopBktCnt:13140.000000, PopValCnt:10, NDV:32
  Column (#1): OWNER(VARCHAR2)
    AvgLen: 8 NDV: 32 Nulls: 0 Density: 0.000000
    Histogram: Top-Freq  #Bkts: 13140  UncompBkts: 13140  EndPtVals:
10  ActualVal: yes
  Table: T  Alias: T
    Card: Original: 14600.000000  Rounded: 66  Computed: 66.36  Non
Adjusted: 66.36
  Access Path: TableScan
    Cost:  428.36  Resp: 428.36  Degree: 0
      Cost_io: 428.00  Cost_cpu: 14122025
      Resp_io: 428.00  Resp_cpu: 14122025
  Best:: AccessPath: TableScan
         Cost: 428.36  Degree: 1  Resp: 428.36  Card: 66.36  Bytes: 0

    check parallelism for statement[<unnamed>]
kkfdtParallel: parallel is possible (no statement type restrictions)
    kkfdPaForcePrm: dop:1 ()
     use dictionary DOP(1) on table
kkfdPaPrm:- The table : 106380
kkfdPaPrm:DOP = 1 (computed from hint/dictionary/autodop)
kkfdiPaPrm: dop:1 serial(?)
***************************************

–//使用 NewDensity:0.004545.
 BktCnt:13140.000000, PopBktCnt:13140.000000 =>
对许就是前10单流行值的总和.

–//非流行值的多寡: 14600-13140=1460
–//非流行值的桶数量: 32-10=22
–//非流行值的多寡/非流行值的桶数量
1460/22=66.36363636363636363636,四放弃五适合66,正好契合执行计划的推断.
–//NewDensity的精打细算 =1460/14600/22=.00454545454545454545,非常接近.

—恢复内容了—

相关文章