联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
在一次ORA-7445导致oracle数据库down掉故障分析中,发现一条类似的sql非常大(通过复制到文档确定该sql大小是5M左右)
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("DCREDITMSG_00") FULL("DCREDITMSG_00") NO_PARALLEL_INDEX("DCREDITMSG_00") */ 1 AS C1, CASE WHEN "DCREDITMSG_00"."PHONE_NO"='具体电话号码' OR "DCREDITMSG_00"."PHONE_NO"='具体电话号码' OR ……………………N多OR "DCREDITMSG_00"."PHONE_NO"='具体电话号码' "DCREDITMSG_00"."PHONE_NO"='具体电话号码' THEN 1 ELSE 0 END AS C2 FROM "BSSADMIN"."DCREDITMSG_00" SAMPLE BLOCK (0.032410 , 1) SEED (1) "DCREDITMSG_00") SAMPLESUB
当时该sql因某种原因导致大量的sql area中很多内存泄露,最终导致数据库down掉.通过实验找出类此奇怪SQL.
创建模拟表
SQL> create table t_xifenfei 2 as 3 select * from dba_objects; Table created. SQL> select count(*) from t_xifenfei; COUNT(*) ---------- 74605 SQL> select NUM_ROWS,LAST_ANALYZED from dba_tables 2 where table_name='T_XIFENFEI' and owner='CHF'; NUM_ROWS LAST_ANALYZE ---------- ------------
得出信息:
1.该表一共有记录数74605条
2.该表未收集统计信息
查看执行计划
SQL> set autotrace trace exp SQL> select /*+ dynamic_sampling(t 0) */ * from t_xifenfei t; Execution Plan ---------------------------------------------------------- Plan hash value: 548923532 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 88868 | 17M| 299 (2)| 00:00:04 | | 1 | TABLE ACCESS FULL| T_XIFENFEI | 88868 | 17M| 299 (2)| 00:00:04 | -------------------------------------------------------------------------------- --通过hint指定动态采样sql相关对象统计信息, 可以看到我们实际的表记录是74605而数据库采样出来的记录为88868,原则上还是可以接受 SQL> select * from t_xifenfei; Execution Plan ---------------------------------------------------------- Plan hash value: 548923532 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 73449 | 14M| 298 (1)| 00:00:04 | | 1 | TABLE ACCESS FULL| T_XIFENFEI | 73449 | 14M| 298 (1)| 00:00:04 | -------------------------------------------------------------------------------- Note ----- - dynamic sampling used for this statement (level=2) --因为没有统计信息,数据库动态采样sql相关对象统计信息 可以看到我们实际的表记录是74605而数据库采样出来的记录为73449,比手工指定采样准确
对自动采样进行10046跟踪
SQL> conn / as sysdba Connected. SQL> oradebug setmypid Statement processed. SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12 Statement processed. SQL> select count(*) from CHF.t_xifenfei; COUNT(*) ---------- 74605 SQL> oradebug TRACEFILE_NAME /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_27967.trc
分析trace文件
*** 2012-07-12 15:42:34.991 WAIT #0:nam='SQL*Net message from client'ela= 56716427 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342078954991525 ===================== PARSING IN CURSOR #3063864268 len=404 dep=1 uid=0 oct=3 lid=0 tim=1342078955037387 hv=4184780033 ad='385d3708' sqlid='3gjvvxzwqxb81' SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ NO_PARALLEL("T_XIFENFEI") FULL("T_XIFENFEI") NO_PARALLEL_INDEX("T_XIFENFEI") */ 1 AS C1, 1 AS C2 FROM "CHF"."T_XIFENFEI" SAMPLE BLOCK (5.790441 , 1) SEED (1) "T_XIFENFEI") SAMPLESUB END OF STMT PARSE #3063864268:c=6000,e=5404,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=205916192,tim=1342078955037303 EXEC #3063864268:c=0,e=206,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=205916192,tim=1342078955037901 FETCH #3063864268:c=4998,e=4759,p=0,cr=65,cu=0,mis=0,r=1,dep=1,og=1,plh=205916192,tim=1342078955042730 STAT #3063864268 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=65 pr=0 pw=0 time=4795 us)' STAT #3063864268 id=2 cnt=4253 pid=1 pos=1 obj=76370 op='TABLE ACCESS SAMPLE T_XIFENFEI (cr=65 pr=0 pw=0 time=8247 us cost=19 size=61752 card=5146)' CLOSE #3063864268:c=0,e=7,dep=1,type=0,tim=1342078955043024 ===================== PARSING IN CURSOR #3063864784 len=35 dep=0 uid=0 oct=3 lid=0 tim=1342078955043465 hv=2174183953 ad='3ed2d700' sqlid='fadutqq0tfuhj' select count(*) from CHF.t_xifenfei END OF STMT PARSE #3063864784:c=51991,e=51648,p=0,cr=66,cu=0,mis=1,r=0,dep=0,og=1,plh=2715729601,tim=1342078955043464 EXEC #3063864784:c=0,e=75,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2715729601,tim=1342078955043645 WAIT #3063864784: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342078955043705
通过这个trace的分析,果然发现在执行我们需要的sql前,执行了SELECT /* OPT_DYN_SAMP */组成的一个复杂的采样sql语句.
收集统计信息查看执行计划
SQL> EXEC DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI'); PL/SQL procedure successfully completed. SQL> select NUM_ROWS,LAST_ANALYZED from dba_tables 2 where table_name='T_XIFENFEI' and owner='CHF'; NUM_ROWS LAST_ANALYZE ---------- ------------ 74605 12-JUL-12 SQL> set autotrace trace exp SQL> select * from t_xifenfei; Execution Plan ---------------------------------------------------------- Plan hash value: 548923532 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 74605 | 7139K| 298 (1)| 00:00:04 | | 1 | TABLE ACCESS FULL| T_XIFENFEI | 74605 | 7139K| 298 (1)| 00:00:04 | -------------------------------------------------------------------------------- --执行计划未提示dynamic sampling
继续做10046
SQL> conn / as sysdba Connected. SQL> oradebug setmypid Statement processed. SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12 Statement processed. SQL> select count(*) from CHF.t_xifenfei; COUNT(*) ---------- 74605 SQL> oradebug TRACEFILE_NAME /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_29780.trc
分析trace文件
*** 2012-07-12 16:14:53.914 Oradebug command 'EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12' console output: <none> WAIT #0: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342080893914307 *** 2012-07-12 16:14:59.376 WAIT #0: nam='SQL*Net message from client' ela= 5461608 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342080899376008 ===================== PARSING IN CURSOR #3063709248 len=35 dep=0 uid=0 oct=3 lid=0 tim=1342080899379562 hv=2174183953 ad='3ed2d700' sqlid='fadutqq0tfuhj' select count(*) from CHF.t_xifenfei END OF STMT
通过这里可以发现,当有了统计信息后,数据库不能再使用那条N多hint的sql去动态采样统计信息.
总结建议
动态采样(Dynamic Sampling)技术的最初提出是在Oracle 9i R2,在段(表,索引,分区)没有分析的情况下,为了使CBO 优化器得到足够的信息以保证做出正确的执行计划而发明的一种技术,可以把它看做分析手段的一种补充。当段对象没有统计信息时(即没有做分析),动态采样技术可以通过直接从需要分析的对象上收集数据块(采样)来获得CBO需要的统计信息。为了cbo,oracle引进了该功能,原则上说是一个很不错的东西,但是偶尔也是会出现一些意外,所以如果发现数据库中有表未做统计分析,建议手工处理下,ORACLE的自动收集统计信息程序也有不靠谱的时候(发现多次10g的库中有部分表未收集统计信息)