联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
今天遇到诡异的事情,AIX 5.3 ORACLE 9I RAC的gv$视图只能查询到本地的记录,而不是所有节点.但是所有节点均运行正常,除gv$视图之外未发现其他异常.
异常时节点1信息
SQL> show parameter clu; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ cluster_database boolean TRUE cluster_database_instances integer 2 cluster_interconnects string 192.168.6.24 SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production PL/SQL Release 9.2.0.8.0 - Production CORE 9.2.0.8.0 Production TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production NLSRTL Version 9.2.0.8.0 - Production SQL> col host_name for a10 SQL> select inst_id,HOST_NAME,STATUS from gv$instance; INST_ID HOST_NAME STATUS ---------- ---------- ------------ 1 zwq_crm1 OPEN SQL> show parameter par; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ fast_start_parallel_rollback string LOW log_parallelism integer 1 parallel_adaptive_multi_user boolean FALSE parallel_automatic_tuning boolean FALSE parallel_execution_message_size integer 2152 parallel_instance_group string parallel_max_servers integer 5 parallel_min_percent integer 0 parallel_min_servers integer 0 parallel_server boolean TRUE parallel_server_instances integer 2 parallel_threads_per_cpu integer 2 partition_view_enabled boolean FALSE recovery_parallelism integer 0 SQL> !ps -ef|grep p0 oracrm 1929258 1 0 09:49:19 - 0:01 ora_p005_crm1 oracrm 745844 1 0 Jun 26 - 0:20 ora_p004_crm1 oraeye 2421272 3948648 0 21:53:49 pts/0 0:00 grep p0 oracrm 3060406 1 0 Jun 26 - 0:20 ora_p002_crm1 oracrm 3170868 1 0 Jun 20 - 2:13 ora_p000_crm1 oracrm 787414 1 0 Jun 26 - 0:20 ora_p001_crm1 oracrm 2552690 1 0 Jun 26 - 0:20 ora_p003_crm1
1.节点最大允许5个并发进程,现在已经启动并发进程到p005(6个)
2.gv$视图只能查询一个节点信息
异常时节点2信息
问题所有情况和1节点完全相似
SQL> col host_name for a10 SQL> select inst_id,HOST_NAME,STATUS from gv$instance; INST_ID HOST_NAME STATUS ---------- ---------- ------------ 2 zwq_crm2 OPEN SQL> show parameter par; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ fast_start_parallel_rollback string LOW log_parallelism integer 1 parallel_adaptive_multi_user boolean FALSE parallel_automatic_tuning boolean FALSE parallel_execution_message_size integer 2152 parallel_instance_group string parallel_max_servers integer 5 parallel_min_percent integer 0 parallel_min_servers integer 0 parallel_server boolean TRUE parallel_server_instances integer 2 parallel_threads_per_cpu integer 2 partition_view_enabled boolean FALSE recovery_parallelism integer 0 SQL> !ps -ef|grep p0 oracrm 1867938 1 0 15:17:25 - 0:00 ora_p004_crm2 oracrm 2633748 1 0 09:49:19 - 0:01 ora_p005_crm2 oraeye 3059876 1007714 0 21:54:01 pts/0 0:00 grep p0 oracrm 323884 1 120 Jun 20 - 10692:47 ora_p000_crm2 oracrm 1839818 1 0 06:16:32 - 0:00 ora_p003_crm2 oracrm 459660 1 107 Jun 26 - 1857:00 ora_p001_crm2 oracrm 2351894 1 0 16:52:52 - 0:00 ora_p002_crm2
在异常2节点上做10046
SQL> oradebug setmypid Statement processed. SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 4 Statement processed. SQL> select * from gv$version; INST_ID BANNER ---------- ---------------------------------------------------------------- 2 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production 2 PL/SQL Release 9.2.0.8.0 - Production 2 CORE 9.2.0.8.0 Production 2 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production 2 NLSRTL Version 9.2.0.8.0 - Production SQL> oradebug EVENT 10046 trace name context off Statement processed. SQL> oradebug TRACEFILE_NAME /oracle9/app/admin/crm/udump/crm2_ora_517066.trc
分析10046内容发现
PARSING IN CURSOR #1 len=24 dep=0 uid=0 oct=3 lid=0 tim=41759005850609 hv=775381991 ad='21195808' select * from gv$version END OF STMT PARSE #1:c=20000,e=33894,p=1,cr=59,cu=3,mis=1,r=0,dep=0,og=4,tim=41759005850607 BINDS #1: kxfpg1srv could not start P006, inst 1 kxfpg1srv could not start local P006 EXEC #1:c=0,e=3540,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=41759005854254 FETCH #1:c=0,e=28,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=41759005854347 FETCH #1:c=0,e=23,p=0,cr=0,cu=0,mis=0,r=4,dep=0,og=4,tim=41759005854678 STAT #1 id=1 cnt=5 pid=0 pos=1 obj=0 op='VIEW ' STAT #1 id=2 cnt=5 pid=1 pos=1 obj=304 op='FIXED TABLE FULL X$VERSION '
通过这里观察10046可以看到:在2节点上查询gv$instance,需要通过使用并发进程去访问1节点,但是因为1节点的parallel_max_servers为5,而当前的并发进程已经达到最大数目,从而使得想在节点1上启动并发失败,进入使得gv$视图只能查询出来本节点数据
gv$视图异常解决方法
--重启两个节点,查询正常 SQL> col host_name for a10 SQL> select inst_id,HOST_NAME,STATUS from gv$instance; INST_ID HOST_NAME STATUS ---------- ---------- ------------ 1 zwq_crm1 OPEN 2 zwq_crm2 OPEN
针对这个问题,很可能是oracle bug(因为是9i版本,我无法深究),或者是并发进程僵死所致,当时有一个想法,kill 掉数据库并发进程,因客户不同意(采用稳妥重启方案),未能通过尝试验证我的猜想.
对gv$视图正常做10046
SQL> oradebug setmypid Statement processed. SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12 Statement processed. SQL> select * from gv$version; INST_ID BANNER ---------- ---------------------------------------------------------------- 1 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production 1 PL/SQL Release 9.2.0.8.0 - Production 1 CORE 9.2.0.8.0 Production 1 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production 1 NLSRTL Version 9.2.0.8.0 - Production 2 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production 2 PL/SQL Release 9.2.0.8.0 - Production 2 CORE 9.2.0.8.0 Production 2 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production 2 NLSRTL Version 9.2.0.8.0 - Production 10 rows selected. SQL> oradebug EVENT 10046 trace name context off Statement processed. SQL> oradebug TRACEFILE_NAME /oracle9/app/admin/crm/udump/crm1_ora_1708916.trc
分析10046内容
PARSING IN CURSOR #1 len=24 dep=0 uid=0 oct=3 lid=0 tim=41752681925071 hv=775381991 ad='92ef64f0' select * from gv$version END OF STMT PARSE #1:c=0,e=6770,p=0,cr=12,cu=3,mis=1,r=0,dep=0,og=4,tim=41752681925070 BINDS #1: WAIT #1: nam='PX Deq: reap credit' ela= 21 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Join ACK' ela= 237 p1=268500992 p2=1 p3=504403208016510312 WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Join ACK' ela= 64 p1=268500992 p2=2 p3=504403208016510312 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Join ACK' ela= 46 p1=268566528 p2=1 p3=504403208016502096 WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Join ACK' ela= 855 p1=268566528 p2=2 p3=504403208016502096 WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq Credit: send blkd' ela= 188 p1=268566528 p2=1 p3=504403208016502096 WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Parse Reply' ela= 406 p1=200 p2=1 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Parse Reply' ela= 7442 p1=200 p2=2 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Parse Reply' ela= 1664 p1=200 p2=1 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 EXEC #1:c=0,e=11572,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=41752681936727 WAIT #1: nam='SQL*Net message to client' ela= 1 p1=1650815232 p2=1 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Execute Reply' ela= 166 p1=200 p2=1 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Execute Reply' ela= 310 p1=200 p2=2 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 FETCH #1:c=0,e=557,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=41752681937320 WAIT #1: nam='SQL*Net message from client' ela= 223 p1=1650815232 p2=1 p3=0 WAIT #1: nam='SQL*Net message to client' ela= 1 p1=1650815232 p2=1 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 18 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='DFS lock handle' ela= 263 p1=1128857605 p2=9 p3=5 WAIT #1: nam='DFS lock handle' ela= 254 p1=1128857605 p2=9 p3=1 WAIT #1: nam='DFS lock handle' ela= 263 p1=1128857605 p2=9 p3=3 WAIT #1: nam='DFS lock handle' ela= 73 p1=1128857605 p2=9 p3=2 WAIT #1: nam='DFS lock handle' ela= 363 p1=1128857605 p2=9 p3=2 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 24 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Signal ACK' ela= 3 p1=0 p2=1 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Signal ACK' ela= 155 p1=10 p2=2 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0 FETCH #1:c=0,e=1735,p=0,cr=0,cu=0,mis=0,r=9,dep=0,og=4,tim=41752681939329 WAIT #1: nam='DFS lock handle' ela= 249 p1=1128857605 p2=9 p3=1 WAIT #1: nam='DFS lock handle' ela= 258 p1=1128857605 p2=9 p3=3 WAIT #1: nam='DFS lock handle' ela= 66 p1=1128857605 p2=9 p3=2 WAIT #1: nam='DFS lock handle' ela= 369 p1=1128857605 p2=9 p3=2 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 19 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0 WAIT #1: nam='enqueue' ela= 201 p1=1347616774 p2=2 p3=0 WAIT #1: nam='enqueue' ela= 41 p1=1347616774 p2=2 p3=0 WAIT #1: nam='SQL*Net message from client' ela= 6981860 p1=1650815232 p2=1 p3=0 STAT #1 id=1 cnt=0 pid=0 pos=1 obj=0 op='VIEW ' STAT #1 id=2 cnt=0 pid=1 pos=1 obj=304 op='FIXED TABLE FULL X$VERSION '
通过这里可以看出,rac在正常情况下gv$视图默认的查询就是并发方式进行