标签云
asm恢复 bbed bootstrap$ dul In Memory kcbzib_kcrsds_1 kccpb_sanity_check_2 kfed MySQL恢复 ORA-00312 ORA-00607 ORA-00704 ORA-01110 ORA-01555 ORA-01578 ORA-08103 ORA-600 2131 ORA-600 2662 ORA-600 2663 ORA-600 3020 ORA-600 4000 ORA-600 4137 ORA-600 4193 ORA-600 4194 ORA-600 16703 ORA-600 kcbzib_kcrsds_1 ORA-600 KCLCHKBLK_4 ORA-15042 ORA-15196 ORACLE 12C oracle dul ORACLE PATCH Oracle Recovery Tools oracle加密恢复 oracle勒索 oracle勒索恢复 oracle异常恢复 Oracle 恢复 ORACLE恢复 ORACLE数据库恢复 oracle 比特币 OSD-04016 YOUR FILES ARE ENCRYPTED 勒索恢复 比特币加密文章分类
- Others (2)
- 中间件 (2)
- WebLogic (2)
- 操作系统 (102)
- 数据库 (1,671)
- DB2 (22)
- MySQL (73)
- Oracle (1,533)
- Data Guard (52)
- EXADATA (8)
- GoldenGate (21)
- ORA-xxxxx (159)
- ORACLE 12C (72)
- ORACLE 18C (6)
- ORACLE 19C (14)
- ORACLE 21C (3)
- Oracle 23ai (7)
- Oracle ASM (65)
- Oracle Bug (8)
- Oracle RAC (52)
- Oracle 安全 (6)
- Oracle 开发 (28)
- Oracle 监听 (28)
- Oracle备份恢复 (560)
- Oracle安装升级 (92)
- Oracle性能优化 (62)
- 专题索引 (5)
- 勒索恢复 (78)
- PostgreSQL (18)
- PostgreSQL恢复 (6)
- SQL Server (27)
- SQL Server恢复 (8)
- TimesTen (7)
- 达梦数据库 (2)
- 生活娱乐 (2)
- 至理名言 (11)
- 虚拟化 (2)
- VMware (2)
- 软件开发 (37)
- Asp.Net (9)
- JavaScript (12)
- PHP (2)
- 小工具 (20)
-
最近发表
- Kylin Linux 安装19c
- ORA-600 krse_arc_complete.4
- Oracle 19c 202410补丁(RUs+OJVM)
- ntfs MFT损坏(ntfs文件系统故障)导致oracle异常恢复
- .mkp扩展名oracle数据文件加密恢复
- 清空redo,导致ORA-27048: skgfifi: file header information is invalid
- A_H_README_TO_RECOVER勒索恢复
- 通过alert日志分析客户自行对一个数据库恢复的来龙去脉和点评
- ORA-12514: TNS: 监听进程不能解析在连接描述符中给出的SERVICE_NAME
- ORA-01092 ORA-00604 ORA-01558故障处理
- ORA-65088: database open should be retried
- Oracle 19c异常恢复—ORA-01209/ORA-65088
- ORA-600 16703故障再现
- 数据库启动报ORA-27102 OSD-00026 O/S-Error: (OS 1455)
- .[metro777@cock.li].Elbie勒索病毒加密数据库恢复
- 应用连接错误,初始化mysql数据库恢复
- RAC默认服务配置优先节点
- Oracle 19c RAC 替换私网操作
- 监听报TNS-12541 TNS-12560 TNS-00511错误
- drop tablespace xxx including contents恢复
分类目录归档:ORA-xxxxx
多cpu环境中运行root.sh失败,asm报ORA-04031
有朋友和我反馈,说他们在装linux 6.5上面装11.2.0.3的rac出现异常,root.sh在第一个节点执行就失败了,请求帮助
根据上面记录,查看asmca日志
[main] [ 2015-07-24 12:49:35.885 CST ] [SQLEngine.reInitialize:738] Reinitializing SQLEngine... [main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889] OracleHome.getVersion called. Current Version: 11.2.0.3.0 [main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:957] Current Version From Inventory: 11.2.0.3.0 [main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889] OracleHome.getVersion called. Current Version: 11.2.0.3.0 [main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957] Current Version From Inventory: 11.2.0.3.0 [main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:889] OracleHome.getVersion called. Current Version: 11.2.0.3.0 [main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957] Current Version From Inventory: 11.2.0.3.0 [main] [ 2015-07-24 12:49:35.886 CST ] [SQLPlusEngine.getCmmdParams:222] m_home 11.2.0.3.0 [main] [ 2015-07-24 12:49:35.887 CST ] [SQLPlusEngine.getCmmdParams:223] version > 112 true [main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:555] Default NLS_LANG: AMERICAN_AMERICA.AL32UTF8 [main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:565] NLS_LANG: AMERICAN_AMERICA.AL32UTF8 [main] [ 2015-07-24 12:49:35.888 CST ] [SQLEngine.initialize:325] Execing SQLPLUS/SVRMGR process... [main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:362] m_bReaderStarted: false [main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:366] Starting Reader Thread... [main] [ 2015-07-24 12:49:35.901 CST ] [SQLEngine.initialize:415] Waiting for m_bReaderStarted to be true [main] [ 2015-07-24 12:49:35.972 CST ] [SQLEngine.done:2189] Done called [main] [ 2015-07-24 12:49:35.972 CST ] [UsmcaLogger.logException:173] SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM [main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:174] ORA-01012: not logged on [main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:175] oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-01012: not logged on oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1658) oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeQuery(SQLEngine.java:831) oracle.sysman.assistants.usmca.backend.USMInstance.configureLocalASM(USMInstance.java:3036) oracle.sysman.assistants.usmca.service.UsmcaService.configureLocalASM(UsmcaService.java:1049) oracle.sysman.assistants.usmca.model.UsmcaModel.performConfigureLocalASM(UsmcaModel.java:944) oracle.sysman.assistants.usmca.model.UsmcaModel.performOperation(UsmcaModel.java:797) oracle.sysman.assistants.usmca.Usmca.execute(Usmca.java:174) oracle.sysman.assistants.usmca.Usmca.main(Usmca.java:369) [main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:173] SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM [main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:174] ORA-03113: end-of-file on communication channel [main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:175] oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-03113: end-of-file on communication channel
这里可以看出来,asm实例无法登陆(ORA-01012和ORA-03113),根据这样的错误,分析asm日志
Reconfiguration complete Fri Jul 24 12:49:29 2015 LCK0 started with pid=22, OS id=46913 Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmd0_46887.trc (incident=81): ORA-04031: unable to allocate 7072 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","ges resource ") Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_81/+ASM1_lmd0_46887_i81.trc Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc (incident=177): ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss") Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_177/+ASM1_lck0_46913_i177.trc Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_46885.trc (incident=73): ORA-04031: unable to allocate 632 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","name-service ") Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_73/+ASM1_lmon_46885_i73.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc: ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss") System state dump requested by (instance=1, osid=46913 (LCK0)), summary=[abnormal instance termination]. System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_diag_46879.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. LCK0 (ospid: 46913): terminating the instance due to error 4031 Fri Jul 24 12:49:35 2015 ORA-1092 : opitsk aborting process Instance terminated by LCK0, pid = 46913
进一步分析asm日志,发现是大家熟悉的asm的ORA-4031问题,那就是说明数据库在执行root.sh的时候使用默认参数文件启动asm的时候shared pool不够大(根据ORACLE最佳实践,建议memory_target=1536M及其以上值),从而出现该问题。类似Bug 14292825 ORA-4031 in ASM as default memory parameters values for 11.2 ASM instances low,根据官方描述该问题在11.2.0.4中修复
通过asm日志发现相关默认值配置
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options. ORACLE_HOME = /u01/app/11.2.0/grid System name: Linux Node name: RAC01 Release: 2.6.32-358.el6.x86_64 Version: #1 SMP Tue Jan 29 11:47:41 EST 2013 Machine: x86_64 Using parameter settings in client-side pfile /u01/app/11.2.0/grid/dbs/init+ASM1.ora on machine RAC01 System parameters with non-default values: large_pool_size = 16M instance_type = "asm" remote_login_passwordfile= "EXCLUSIVE" asm_power_limit = 1 diagnostic_dest = "/u01/app/grid" Cluster communication is configured to use the following interface(s) for this instance 10.10.10.31 cluster interconnect IPC version:Oracle UDP/IP (generic) IPC Vendor 1 proto 2 Fri Jul 24 12:49:27 2015
通过查询/proc/cpuinfo,检查cpu数量
processor : 191 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E7-8850 v2 @ 2.30GHz stepping : 7 cpu MHz : 1200.000 cache size : 24576 KB physical id : 7 siblings : 24 core id : 13 cpu cores : 12 apicid : 251 initial apicid : 251 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes
而根据How To Determine The Default Number Of Subpools Allocated During Startup (Doc ID 455179.1)中描述
最多7个subpool(这里一共有192个cpu,因此subpool数量为7)
每个suppool最少512m内存,因此shared pool最小需要3.5G(而默认值几百M,远远不够)
由于cpu多,导致shared pool的Subpools 更加多,使得shared pool的需求量更加大。至此本次故障原因可以总结:
由于cpu较多,需要更多的shared pool,而11.2.0.3中由于asm默认内存分配较少,导致在asm启动之时出现shared pool不足(本身默认值小,而且shared pool需求大,从而出现了ORA-04031就不奇怪了),因为运行root.sh过程中asm无法正常启动,从而使得root.sh运行失败。
处理办法:临时disable部分cpu,然后重新执行root.sh,修改asm内存分配,再enable cpu.
特别说明:此故障acs的兄弟遇到过,所以这次我能够快速反应,感谢acs兄弟们的帮忙,另外有权限的朋友可以看看:3-10479952701和3-7976215751等sr描述
误修改/u01权限/所有者的故障恢复
有朋友找到我,说他对生产库做了误操作,导致数据库异常,请我帮忙处理,对/u01目录修改了用户和权限,导致数据库无法登录,但是业务还在继续
误操作命令
mkdir -p /u01/app/grid mkdir -p /u01/app/11.2.0/grid chown -R grid:oinstall /u01 mkdir -p /u01/app/oracle chown oracle:oinstall /u01/app/oracle chmod -R 775 /u01
尝试sqlplus登录数据库报ORA-12547
[oracle@www.xifenfei.com admin]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.3.0 Production on Wed Jul 1 17:40:42 2015 Copyright (c) 1982, 2011, Oracle. All rights reserved. ERROR: ORA-12547: TNS:lost contact Enter user-name: ERROR: ORA-12547: TNS:lost contact Enter user-name: ERROR: ORA-12547: TNS:lost contact SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus
数据库alert日志报错
Wed Jul 01 18:03:22 2015 Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_28977.trc (incident=129553): ORA-00600: internal error code, arguments: [spstp: ORACLE_HOME uid does not match euid], [1100], [1101], [], [], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_129553/xifenfei_ora_28977_i129553.trc Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_129553/xifenfei_ora_28977_i129553.trc: ORA-00600: internal error code, arguments: [spstp: ORACLE_HOME uid does not match euid], [1100], [1101], [], [], [], [], [], [], [], [], [] Wed Jul 01 18:03:22 2015 Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_28979.trc (incident=129561): ORA-00600: internal error code, arguments: [spstp: ORACLE_HOME uid does not match euid], [1100], [1101], [], [], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_129561/xifenfei_ora_28979_i129561.trc Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_129561/xifenfei_ora_28979_i129561.trc: ORA-00600: internal error code, arguments: [spstp: ORACLE_HOME uid does not match euid], [1100], [1101], [], [], [], [], [], [], [], [], [] Wed Jul 01 18:03:22 2015 Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_28985.trc (incident=129569): ORA-00600: internal error code, arguments: [spstp: ORACLE_HOME uid does not match euid], [1100], [1101], [], [], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_129569/xifenfei_ora_28985_i129569.trc Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_129569/xifenfei_ora_28985_i129569.trc: ORA-00600: internal error code, arguments: [spstp: ORACLE_HOME uid does not match euid], [1100], [1101], [], [], [], [], [], [], [], [], [] Wed Jul 01 18:03:22 2015 Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_28983.trc (incident=129577): ORA-00600: internal error code, arguments: [spstp: ORACLE_HOME uid does not match euid], [1100], [1101], [], [], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_129577/xifenfei_ora_28983_i129577.trc Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_129577/xifenfei_ora_28983_i129577.trc: ORA-00600: internal error code, arguments: [spstp: ORACLE_HOME uid does not match euid], [1100], [1101], [], [], [], [], [], [], [], [], []
trace文件信息
Dump file /u01/app/oracle/diag/rdbms/hybris01/hybris01/incident/incdir_129577/hybris01_ora_28983_i129577.trc Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1 System name: Linux Node name: dpppdridbo00 Release: 2.6.32-431.17.1.el6.x86_64 Version: #1 SMP Wed May 7 23:32:49 UTC 2014 Machine: x86_64 Instance name: hybris01 Redo thread mounted by this instance: 0 <none> Oracle process number: 0 Unix process pid: 28983, image: oracle@dpppdridbo00 *** 2015-07-01 18:03:22.296 Dump continued from file: /u01/app/oracle/diag/rdbms/hybris01/hybris01/trace/hybris01_ora_28983.trc ORA-00600: internal error code, arguments: [spstp: ORACLE_HOME uid does not match euid], [1100], [1101], [], [], [], [], [], [], [], [], [] ========= Dump for incident 129577 (ORA 600 [spstp: ORACLE_HOME uid does not match euid]) ======== *** 2015-07-01 18:03:22.297 dbkedDefDump(): Starting incident default dumps (flags=0x0, level=3, mask=0x0) ----- Error Stack Dump ----- ORA-00600: internal error code, arguments: [spstp: ORACLE_HOME uid does not match euid], [1100], [1101], [], [], [], [], [], [], [], [], [] ----- SQL Statement (None) ----- Current SQL information unavailable - no SGA.
相关用户名相关信息
[oracle@dpppdridbo00 incdir_129577]$ id grid uid=1100(grid) gid=1000(oinstall) groups=1000(oinstall),1100(asmadmin),1300(asmdba),1301(asmoper) [oracle@dpppdridbo00 incdir_129577]$ id oracle uid=1101(oracle) gid=1000(oinstall) groups=1000(oinstall),1200(dba),1300(asmdba)
这里比较明显,数据库的oracle_home,的所有者id为1101,但是被修改为了1100
查看相关目录文件权限
[oracle@www.xifenfei.com ~]$ env|grep ORA ORACLE_SID=xifenfei ORACLE_BASE=/u01/app/oracle ORACLE_TERM=xterm ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1 [oracle@www.xifenfei.com ~]$ cd /u01/app/oracle/product/11.2.0/db_1 [oracle@www.xifenfei.com db_1]$ ls -ltr total 308 -rwxrwxr-x. 1 grid oinstall 63 Mar 1 16:39 oraInst.loc drwxrwxr-x. 8 grid oinstall 4096 Mar 1 16:40 assistants drwxrwxr-x. 6 grid oinstall 4096 Mar 1 16:40 crs drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:40 csmig drwxrwxr-x. 7 grid oinstall 4096 Mar 1 16:40 cv drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:40 demo drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:40 diagnostics drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:40 has drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:40 emcli drwxrwxr-x. 4 grid oinstall 4096 Mar 1 16:40 dv drwxrwxr-x. 8 grid oinstall 4096 Mar 1 16:40 ide drwxrwxr-x. 8 grid oinstall 4096 Mar 1 16:40 javavm drwxrwxr-x. 4 grid oinstall 4096 Mar 1 16:40 jdev drwxrwxr-x. 6 grid oinstall 4096 Mar 1 16:40 md drwxrwxr-x. 6 grid oinstall 4096 Mar 1 16:40 nls drwxrwxr-x. 6 grid oinstall 4096 Mar 1 16:40 odbc drwxrwxr-x. 5 grid oinstall 4096 Mar 1 16:40 olap drwxrwxr-x. 4 grid oinstall 4096 Mar 1 16:40 oracore drwxrwxr-x. 8 grid oinstall 4096 Mar 1 16:40 ord drwxrwxr-x. 4 grid oinstall 4096 Mar 1 16:40 owm drwxrwxr-x. 7 grid oinstall 4096 Mar 1 16:40 precomp drwxrwxr-x. 4 grid oinstall 4096 Mar 1 16:40 scheduler drwxrwxr-x. 4 grid oinstall 4096 Mar 1 16:40 relnotes drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:40 slax drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:40 sqlj drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:40 wwg drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:40 usm drwxrwxr-x. 4 grid oinstall 4096 Mar 1 16:40 uix drwxrwxr-x. 6 grid oinstall 4096 Mar 1 16:40 plsql drwxrwxr-x. 2 grid oinstall 4096 Mar 1 16:40 utl drwxrwxr-x. 7 grid oinstall 4096 Mar 1 16:40 xdk drwxrwxr-x. 2 grid oinstall 4096 Mar 1 16:40 instantclient drwxrwxr-x. 20 grid oinstall 4096 Mar 1 16:40 oc4j drwxrwxr-x. 2 grid oinstall 4096 Mar 1 16:40 timingframework drwxrwxr-x. 4 grid oinstall 4096 Mar 1 16:41 clone drwxrwxr-x. 4 grid oinstall 4096 Mar 1 16:41 j2ee drwxrwxr-x. 8 grid oinstall 4096 Mar 1 16:41 apex drwxrwxr-x. 13 grid oinstall 4096 Mar 1 16:41 sqldeveloper drwxrwxr-x. 2 grid oinstall 4096 Mar 1 16:41 jlib drwxrwxr-x. 2 grid oinstall 4096 Mar 1 16:41 dc_ocm drwxrwxr-x. 6 grid oinstall 4096 Mar 1 16:41 jdk drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:41 jdbc drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:41 ucp drwxrwxr-x. 7 grid oinstall 4096 Mar 1 16:41 OPatch drwxrwxr-x. 7 grid oinstall 4096 Mar 1 16:41 ccr drwxrwxr-x. 26 grid oinstall 4096 Mar 1 16:41 owb drwxrwxr-x. 6 grid oinstall 4096 Mar 1 16:41 mgw drwxrwxr-x. 7 grid oinstall 4096 Mar 1 16:41 opmn drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:41 cdata drwxrwxr-x. 6 grid oinstall 4096 Mar 1 16:41 css drwxrwxr-x. 2 grid oinstall 4096 Mar 1 16:41 mesg drwxrwxr-x. 2 grid oinstall 4096 Mar 1 16:41 config drwxrwxr-x. 3 grid oinstall 4096 Mar 1 16:41 EMStage drwxrwxr-x. 12 grid oinstall 4096 Mar 1 16:41 ldap drwxrwxr-x. 15 grid oinstall 4096 Mar 1 16:41 sysman drwxrwxr-x. 9 grid oinstall 4096 Mar 1 16:41 srvm drwxrwxr-x. 7 grid oinstall 4096 Mar 1 16:41 racg drwxrwxr-x. 10 grid oinstall 4096 Mar 1 16:41 ctx drwxrwxr-x. 7 grid oinstall 4096 Mar 1 16:42 sqlplus drwxrwxr-x. 8 grid oinstall 4096 Mar 1 16:42 oui drwxrwxr-x. 4 grid oinstall 12288 Mar 1 16:42 lib drwxrwxr-x. 5 grid oinstall 4096 Mar 1 16:42 perl drwxrwxr-x. 11 grid oinstall 4096 Mar 1 16:42 network drwxrwxr-x. 4 grid oinstall 4096 Mar 1 16:42 deinstall drwxrwxr-x. 5 grid oinstall 4096 Mar 1 16:42 hs -rwxrwxr-x. 1 grid oinstall 494 Mar 1 16:42 root.sh drwxrwxr-x. 13 grid oinstall 4096 Mar 1 16:42 rdbms drwxrwxr-x. 13 grid oinstall 4096 Mar 1 16:42 inventory drwxrwxr-x. 4 grid oinstall 4096 Mar 1 16:42 cfgtoollogs drwxrwxr-x. 7 grid oinstall 4096 Mar 1 16:44 install drwxrwxr-x. 2 grid oinstall 12288 Mar 1 16:44 bin drwxrwxr-x. 4 grid oinstall 4096 Mar 4 18:49 log drwxrwxr-x. 2 grid oinstall 4096 Jun 30 22:31 dbs
处理方法
root用户 chown -R oracle:oinstall /u01 chown oracle:oinstall /u01/app/oracle oracle用户 chmod 6751 $ORACLE_HOME/bin/oracle
这里的修改权限,为了保证业务运行正常,是尽量往大的方向修改的,如果条件允许,在后期有条件的情况下,建议重新安装oracle软件
ORA-00600[kjhn_post_ha_alert0-862]原因分析
数据库版本和平台信息
数据库版本为10.2.0.1版本,而且是32位的win 2003 sp2之上
ORACLE V10.2.0.1.0 - Production vsnsta=0 vsnsql=14 vsnxtr=3 Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production With the Partitioning, OLAP and Data Mining options Windows Server 2003 Version V5.2 Service Pack 2 CPU : 2 - type 586, 1 Physical Cores Process Affinity : 0x00000000 Memory (Avail/Total): Ph:2608M/3990M, Ph+PgF:4511M/5871M, VA:1242M/2047M Instance name: orcl
数据库报大量ORA-600[kjhn_post_ha_alert0-862]错误
数据库的mmon进程报大量ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []错误
Wed Jun 03 21:50:40 2015 Restarting dead background process MMON MMON started with pid=11, OS id=3804 Wed Jun 03 21:50:43 2015 Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc: ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], [] Wed Jun 03 21:50:49 2015 Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc: ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], [] Wed Jun 03 21:55:44 2015 Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc: ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], [] Wed Jun 03 21:55:49 2015 Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc: ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], [] Wed Jun 03 22:00:40 2015 Thread 1 advanced to log sequence 476 Current log# 1 seq# 476 mem# 0: E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO01.LOG Wed Jun 03 22:00:44 2015 Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc: ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
查询对应trace文件发现
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [] , [], [], [], [] Current SQL statement for this session: BEGIN :success := dbms_ha_alerts_prvt.check_ha_resources; END;
人工执行该过程
SQL> var success varchar2 SQL> begin 2 :success := sys.dbms_ha_alerts_prvt.check_ha_resources; 3 end; 4 / PL/SQL procedure successfully completed. SQL> print success SUCCESS -------------------------------- N
通过查询相关资料得到如下说明
@ This check is triggered with FAN enabled at this instance and it seems to be @ associated with a startup action. From the procedure itself which is called @ this is a run-once MMON (startup) action which supports instance down @ notification reliability. It does the folowing a) registers the current @ instance incarnation in recent_resource_incarnations$ if it's not already @ there b) deletes recent_resource_incarnations$ records that don't apply to @ this database. They may, e.g., have been copied from seed db or from a former @ DataGuard primary c) scans recent_resource_incarnations$ for instance @ incarnations that are no longer alive, and submits instance down alerts for @ them . If all is good then return 'Y' else 'N' (or error) if there is a @ failure. That failure is to get back to MMON, so that it may retry this @ action later. In the local instance I get a 'Y' but in the customer's system @ it fails with a 'N' which seems related to the ORA-600 assert. @ This function is kjhn_post_ha_alert0() which is internal and does the real work of @ posting HA alerts. It is used by both kjhn_post_ha_alert and @ kjn_post_ha_alert_plsql. Its parameters are basically the same as those of @ kjhn_post_ha_alert,other than the fact that it uses individual parameters @ rather than the more easily extensible structure. Also the parameters passed @ to it are the instance_name and the host_name which is the kernelized @ implementation for posting HA alerts. Without actually having the arguments @ the guess is that either the host_name or the instance_name raised in the @ assert is null which triggered it.
mmon进程尝试调用相关程序,然后无法得出正确值,返回N,然后会一直尝试,如果不能得到返回Y,就会一直报ORA-600,错误.通过上述的三种情况来说,都和recent_resource_incarnations$表有关系.
该故障原因是由于:mmon在调用kjhn_post_ha_alert0函数在执行的时候,如果发现参数host_name或者instance_name为null,就会报该错误出来.
处理方法
This problem has been documented as Bug 5173066 REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS.
The bug is fixed in 11.1.0.6. A workaround is available for the problem.
该bug在11.1.0.6中得以修复
To implement the workaround, please execute the following steps as the SYS user: 1. Collect the following information and spool it to a file for your records. a. output of select * from v$instance b. show parameter instance_name c. set pages 1000 d. select * from recent_resource_incarnations$ 2. Create a backup table of recent_resource_incarnations$. SQL> create table recent_resource_inc$bk as select * from recent_resource_incarnations$; 3. Truncate recent_resource_incarnations$. Be sure to do this while the instance is up and running. Do not issue this statement if a shutdown is pending. SQL> truncate table recent_resource_incarnations$; 4. Perform a clean shutdown, followed by a startup.
具体参考:
ORA-600 [kjhn_post_ha_alert0-862] Continuously Repeated in the Alert Log (Doc ID 401640.1)
Bug 5173066 : REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS