联系:手机/微信(+86 17813235971) QQ(107644445)
标题:CURSOR_SHARING=SIMILAR引起的悲剧
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
一个客户反馈说有一系统经常性负载比较高,让我帮忙分析原因
系统负载情况
[oracle@zwq-kfdialdb ~]$ top -c top - 17:11:06 up 78 days, 1:12, 5 users, load average: 124.83, 125.90, 112.13 Tasks: 836 total, 152 running, 684 sleeping, 0 stopped, 0 zombie Cpu(s): 98.1%us, 0.1%sy, 0.0%ni, 1.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 132070908k total, 90494280k used, 41576628k free, 1147384k buffers Swap: 67108856k total, 0k used, 67108856k free, 79109904k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 12179 oracle 15 0 32.5g 142m 139m R 39.0 0.1 6:09.20 oracleahdial (LOCAL=NO) 11530 oracle 16 0 32.5g 469m 466m R 36.1 0.4 11:14.23 oracleahdial (LOCAL=NO) 11816 oracle 15 0 32.5g 467m 463m R 36.1 0.4 6:33.86 oracleahdial (LOCAL=NO) 11577 oracle 15 0 32.5g 480m 477m R 34.7 0.4 7:15.98 oracleahdial (LOCAL=NO) 12136 oracle 16 0 32.5g 455m 452m R 31.9 0.4 9:07.88 oracleahdial (LOCAL=NO) 11237 oracle 16 0 32.5g 997m 992m R 31.2 0.8 20:53.50 oracleahdial (LOCAL=NO) 11427 oracle 16 0 32.5g 137m 135m R 31.2 0.1 11:50.16 oracleahdial (LOCAL=NO) 12051 oracle 16 0 32.5g 459m 456m R 31.2 0.4 6:12.67 oracleahdial (LOCAL=NO) [oracle@zwq-kfdialdb ~]$ vmstat 3 10 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 137 0 0 41566600 1147384 79109888 0 0 0 3 0 0 0 0 99 0 0 146 0 0 41567548 1147384 79109888 0 0 0 47 1058 32190 98 0 2 0 0 150 0 0 41568056 1147384 79109888 0 0 0 39 1081 31543 98 0 2 0 0 12 0 0 41568464 1147384 79109888 0 0 0 23 1056 32111 98 0 2 0 0 62 0 0 41568776 1147384 79109888 0 0 0 11 1067 31474 98 0 2 0 0 108 0 0 41568304 1147384 79109888 0 0 0 39 1059 31193 98 0 2 0 0 140 0 0 41569280 1147384 79109888 0 0 0 48 1063 31171 98 0 2 0 0 140 0 0 41569444 1147384 79109888 0 0 0 40 1075 30508 98 0 2 0 0
通过top和vmstat看出系统现在负载很高,主要都是用户进程导致.
查询等待事件
SQL> select event from v$session where wait_class#<>6; EVENT ---------------------------------------------------------------- cursor: mutex S SQL> / EVENT ---------------------------------------------------------------- cursor: mutex S SQL> / EVENT ---------------------------------------------------------------- cursor: mutex S SQL> / EVENT ---------------------------------------------------------------- cursor: mutex S SQL> SELECT a.*, s.sql_text 2 FROM v$sql s, 3 (SELECT sid, 4 event, 5 wait_class, 6 p1 cursor_hash_value, 7 p2raw Mutex_value, 8 TO_NUMBER (SUBSTR (p2raw, 1, 8), 'xxxxxxxx') hold_mutex_x_sid 9 FROM v$session_wait 10 WHERE event LIKE 'cursor%') a 11 WHERE s.HASH_VALUE = a.cursor_hash_value 12 / no rows selected SQL> select event from v$session where wait_class#<>6; EVENT ---------------------------------------------------------------- SQL*Net message to client
数据库开始的等待事件只有cursor: mutex S,等该等待事件消失后系统负载也恢复正常
再次查看系统负载
[oracle@zwq-kfdialdb ~]$ top -c -i10 top - 17:13:51 up 78 days, 1:15, 6 users, load average: 12.57, 78.21, 96.45 Tasks: 702 total, 2 running, 700 sleeping, 0 stopped, 0 zombie Cpu(s): 0.5%us, 0.2%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 132070908k total, 86477808k used, 45593100k free, 1147500k buffers Swap: 67108856k total, 0k used, 67108856k free, 79116036k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 15045 oracle 16 0 13136 1476 724 R 10.6 0.0 0:00.12 top -c -i10 10010 root 10 -5 0 0 0 D 0.0 0.0 0:00.68 [kondemand/4] 10019 root 10 -5 0 0 0 D 0.0 0.0 1:41.58 [kondemand/13] 10020 root 10 -5 0 0 0 D 0.0 0.0 1:52.28 [kondemand/14] 10021 root 10 -5 0 0 0 R 0.0 0.0 2:01.54 [kondemand/15] 12166 root 24 0 10084 300 216 D 0.0 0.0 0:00.00 /opt/VRTSgab/gablogd [oracle@zwq-kfdialdb ~]$ vmstat 3 10 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 0 0 0 45484264 1147528 79117360 0 0 0 3 0 0 0 0 99 0 0 2 0 0 45468452 1147528 79117456 0 0 4 161 1372 5369 2 2 97 0 0 3 0 0 45463712 1147528 79117584 0 0 4 187 1602 7253 6 0 93 0 0 1 0 0 45458220 1147528 79117648 0 0 1 99 1358 5821 2 0 98 0 0 0 0 0 45475168 1147528 79117712 0 0 0 41 1321 5321 2 0 98 0 0 0 0 0 45473624 1147528 79117744 0 0 3 104 1378 5455 2 0 98 0 0 2 0 0 45474656 1147528 79117776 0 0 0 55 1196 4872 1 0 99 0 0 0 0 0 45474376 1147532 79117824 0 0 8 113 1170 4990 2 0 98 0 0 1 0 0 45475440 1147532 79117872 0 0 1 56 1187 5301 3 0 97 0 0 1 0 0 45475824 1147532 79117888 0 0 3 99 1083 4643 3 0 97 0 0
结合上面的等待事件查询,我们可以大概评估出来,当cursor: mutex S等待消失后,系统负载也恢复正常,现在已经不存在环境,如果要找出问题只能够是借助AWR和ASH
分析ASH
Top User Events
Top SQL with Top Events
Activity Over Time
通过对ASH分析,可以大概确定,在这段时间内,引起系统负载高主要是cursor: mutex S导致
分析AWR
awr整体信息(从这里看数据库是相当的繁忙)
Load Profile(从这里看数据库业务比较小)
Top 5 Timed Foreground Events(主要等待事件cursor: mutex S,和前面分析相符)
OS LOAD(虽然和系统看到有一定出入,但是整体还是展示系统负载较高)
SQL ordered by Version Count(出现cursor: mutex S,因为load profile中解析不多,所以想到高版本问题,这里确实非常高)
通过这里的一些列分析,我们已经基本上可以确定,该数据库因为高版本问题导致cursor: mutex S以及library cache 相关等待严重,从而出现系统负载过高.
找出高版本原因
高版本相关信息和查询请见:关于High Versions Count总结
SQL> select * from table(version_rpt('f8b9tba7sfsb5')); COLUMN_VALUE -------------------------------------------------------------------------------- Version Count Report Version 3.2.1 -- Today's Date 19-jul-12 18:13 RDBMS Version :11.2.0.1.0 Host: zwq-kfdialdb Instance 1 : ahdial ================================================================== Addr: 000000080FA4CEA0 Hash_Value: 2408014181 SQL_ID f8b9tba7sfsb5 Sharable_Mem: 206315729 bytes Parses: 48689 Stmt: 0 select count(*) as col_0_0_ from TBL_SP_SALES_RECORDS tblspsales 1 0_ where tblspsales0_.SALES_RECORDS_STATUS=:"SYS_B_0" and tblsps 2 ales0_.MOBILE_TELE_NO=:"SYS_B_1" 3 COLUMN_VALUE -------------------------------------------------------------------------------- Versions Summary ---------------- AUTH_CHECK_MISMATCH :4 TRANSLATION_MISMATCH :4 ROLL_INVALID_MISMATCH :10219 PURGED_CURSOR :9 Total Versions:10219 Plan Hash Value Summary ----------------------- COLUMN_VALUE -------------------------------------------------------------------------------- Plan Hash Value Count =============== ===== 791727930 920 2820478500 9300 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Details for AUTH_CHECK_MISMATCH : # of Ver PARSING_USER_ID PARSING_SCHEMA_ID PARSING_SCHEMA_NAME ========== =============== ================= =================== 10218 75 75 HOLLYSP COLUMN_VALUE -------------------------------------------------------------------------------- 2 107 107 HOLLYSP_TEST ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Details for TRANSLATION_MISMATCH : Summary of objects probably causing TRANSLATION_MISMATCH Object# Owner.Object_Name ========= ================= 76737 HOLLYSP.TBL_SP_SALES_RECORDS HOLLYSP.TBL_SP_SALES_RECORDS HOLLYSP_TEST.TBL_SP_SALES_RECORDS COLUMN_VALUE -------------------------------------------------------------------------------- 107043 HOLLYSP_TEST.TBL_SP_SALES_RECORDS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Details for ROLL_INVALID_MISMATCH : No details available ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Details for PURGED_CURSOR : No details available #### To further debug Ask Oracle Support for the appropiate level LLL. COLUMN_VALUE -------------------------------------------------------------------------------- alter session set events 'immediate trace name cursortrace address 2408014181, level LLL'; To turn it off do use address 1, level 2147483648 ================================================================ 59 rows selected. SQL> select * from table(version_rpt('6zhjf3qh8gyp9')); COLUMN_VALUE -------------------------------------------------------------------------------- Version Count Report Version 3.2.1 -- Today's Date 19-jul-12 18:15 RDBMS Version :11.2.0.1.0 Host: zwq-kfdialdb Instance 1 : ahdial ================================================================== Addr: 00000007FD46D2A8 Hash_Value: 2693266089 SQL_ID 6zhjf3qh8gyp9 Sharable_Mem: 111904227 bytes Parses: 4880 Stmt: 0 select count(*) as col_0_0_ from TBL_SP_SALES_RECORDS tblspsales 1 0_ where tblspsales0_.SALES_RECORDS_STATUS=:"SYS_B_0" and tblsps 2 ales0_.CAMPAIGN_ID=:"SYS_B_1" and tblspsales0_.IS_CONN=:"SYS_B_2 3 " and tblspsales0_.SALES_TIME>=:"SYS_B_3" and tblspsales0_.SALES 4 _PERSON=:"SYS_B_4" COLUMN_VALUE -------------------------------------------------------------------------------- 5 Versions Summary ---------------- BIND_MISMATCH :4804 INCOMP_LTRL_MISMATCH :372 HASH_MATCH_FAILED :4936 Total Versions:4935 Plan Hash Value Summary COLUMN_VALUE -------------------------------------------------------------------------------- ----------------------- Plan Hash Value Count =============== ===== 1645985080 3 2040125427 4933 ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Details for BIND_MISMATCH : Consolidated details for : BIND_MISMATCH,USER_BIND_PEEK_MISMATCH,BIND_UACS_DIFF and COLUMN_VALUE -------------------------------------------------------------------------------- BIND_EQUIV_FAILURE (Mislabled as ROW_LEVEL_SEC_MISMATCH BY bug 6964441 in 11gR1) from v$sql_bind_capture COUNT(*) POSITION MIN(MAX_LENGTH) MAX(MAX_LENGTH) DATATYPE (PRECISION,SCALE) ======== ======== =============== =============== ======== ================ 4936 1 32 32 1 (,) 4936 2 32 32 1 (,) 4936 3 32 32 1 (,) 4936 4 32 32 1 (,) 4936 5 32 32 1 (,) COLUMN_VALUE -------------------------------------------------------------------------------- SUM(DECODE(column,Y, 1, 0) FROM V$SQL IS_OBSOLETE IS_BIND_SENSITIVE IS_BIND_AWARE IS_SHAREABLE =========== ================= ============= ============ 0 4309 0 4309 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Details for INCOMP_LTRL_MISMATCH : No details available ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Details for HASH_MATCH_FAILED : COLUMN_VALUE -------------------------------------------------------------------------------- No details available #### To further debug Ask Oracle Support for the appropiate level LLL. alter session set events 'immediate trace name cursortrace address 2693266089, level LLL'; To turn it off do use address 1, level 2147483648 ================================================================ 62 rows selected.
根据经验,出现这么多的高版本情况,很可能是cursor_sharing参数设置问题
--对应sql语句 select count(*) as col_0_0_ from TBL_SP_SALES_RECORDS tblspsales0_ where tblspsales0_.SALES_RECORDS_STATUS=:"SYS_B_0" and tblspsales0_.CAMPAIGN_ID=:"SYS_B_1" and tblspsales0_.IS_CONN=:"SYS_B_2" and tblspsales0_.SALES_TIME>=:"SYS_B_3" and tblspsales0_.SALES_PERSON=:"SYS_B_4" --cursor_sharing参数 SQL> show parameter cursor_sharing; NAME TYPE VALUE ------------------------------------ -------------------------------- ---------------- cursor_sharing string SIMILAR
根据oracle官方建议在11g中不推荐使用cursor_sharing=SIMILAR,其实在所有版本中都不推荐,设置为该值很容易导致高版本问题.而且该值会出现莫名其妙的,无法解释的高版本问题.而且根据oracle相关文档,在即将发布的12c版本中,将除掉SIMILAR值.对于客户库的该问题,因为很多sql未绑定参数,为了减少硬解析,建议在业务低谷时设置cursor_sharing=FORCE,并刷新shared pool.
Cursor Mutex S Wait Event And Too Many Child Cursors
关于高版本的一些mos文章
Troubleshooting: High Version Count Issues [ID 296377.1]
High SQL Version Counts – Script to determine reason(s) [ID 438755.1]
ANNOUNCEMENT: Deprecating the cursor_sharing = ‘SIMILAR’ setting [ID 1169017.1]
High Version Count with CURSOR_SHARING = SIMILAR or FORCE from pre 10g to 11.2.0.3 [ID 1401814.1]