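Contact: mobile/WeChat (+86 17813235971) QQ (107644445)
Title: Handling an ORA-600 kfdpMetaBlk_pickle failure
Author: 惜分飞 © All rights reserved [No reproduction in any form without my consent; I reserve the right to pursue legal liability.]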
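A customer reported that the cluster's CRS could not start normally. Investigation showed that the GMON process was crashing the ASM instance, and testing confirmed that the problem is triggered when mounting the DATA disk group.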
SQL> alter diskgroup data mount;
alter diskgroup data mount
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 7517
Session ID: 918 Serial number: 5
The corresponding alert log reports ORA-600 [kfdpMetaBlk_pickle:01], [4294967295]:
SQL> alter diskgroup data mount
NOTE: cache registered group DATA number=2 incarn=0x3078f05f
NOTE: cache began mount (first) of group DATA number=2 incarn=0x3078f05f
NOTE: Assigning number (2,1) to disk (/dev/rdisk/disk93)
NOTE: Assigning number (2,3) to disk (/dev/rdisk/disk96)
NOTE: Assigning number (2,2) to disk (/dev/rdisk/disk94)
NOTE: Assigning number (2,0) to disk (/dev/rdisk/disk92)
Sat Jul 17 05:21:01 2021
Errors in file /u01/app/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_gmon_7457.trc (incident=255833):
ORA-00600: internal error code, arguments: [kfdpMetaBlk_pickle:01], [4294967295], [0], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/crs_base/diag/asm/+asm/+ASM2/incident/incdir_255833/+ASM2_gmon_7457_i255833.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_gmon_7457.trc:
ORA-00600: internal error code, arguments: [kfdpMetaBlk_pickle:01], [4294967295], [0], [], [], [], [], [], [], [], [], []
GMON (ospid: 7457): terminating the instance due to error 493
Sat Jul 17 05:21:03 2021
System state dump requested by (instance=2, osid=7457 (GMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_diag_7429.trc
Instance terminated by GMON, pid = 7457
A search of MOS (My Oracle Support) for ORA-600 [kfdpMetaBlk_pickle:01], [4294967295] returned no useful information.
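One small observation may help when reading the trace that follows: the second ORA-600 argument, 4294967295, is 0xFFFFFFFF, the unsigned 32-bit representation of -1. The trace prints the same quantity both ways (ver=4294967295 in one line, ver=-1 in another). The minimal sketch below only demonstrates the conversion; the helper name `as_signed_32` is made up for illustration, and reading -1 as an "unset/invalid version" marker is an assumption, not something Oracle documents.

```python
def as_signed_32(value):
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


ora600_arg = 4294967295  # second argument of the ORA-600 in the alert log

print(hex(ora600_arg))           # 0xffffffff
print(as_signed_32(ora600_arg))  # -1
```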
The corresponding trace file contains the following:
2021-07-17 03:51:16.277603*:800002A2:KGF:kgfdputl.c@1411:kgfdpMetaSet_getMaxClique(): inc=2 ver=4294967295
2021-07-17 03:51:16.277619 :800002A3:KFDP:kfdp.c@9314:kfdpMetaSet_filterOld(): filtered old meta on disk 2
2021-07-17 03:51:16.277620 :800002A4:KFDP:kfdp.c@9314:kfdpMetaSet_filterOld(): filtered old meta on disk 2
2021-07-17 03:51:16.277992 :800002A5:KFDP:kfdp.c@9417:kfdpMetaSet_readDta():kfdpMetaSet_readDta unpickle upto 6 metablks
2021-07-17 03:51:16.277993 :800002A6:KFDP:kfdp.c@9425:kfdpMetaSet_readDta():kfdpMetaSet_readDta unpickle metablk for disk 3
2021-07-17 03:51:16.278154 :800002A7:KFDP:kfdp.c@9425:kfdpMetaSet_readDta():kfdpMetaSet_readDta unpickle metablk for disk 1
2021-07-17 03:51:16.278268 :800002A8:KFDP:kfdp.c@5851:kfdp_read(): kfdp_read end ok=1
2021-07-17 03:51:16.278277 :800002A9:KFDP:kfdp.c@7073:kfdp_doQuery(): kfdp_doQuery rewrite_kfdp=1
2021-07-17 03:51:16.278282 :800002AA:KFDP:kfdp.c@12511:kfdpLckValue_pickle(): kfdpLckValue_pickle size=0 endian=0xff ndisks=0 lckvalid=0
2021-07-17 03:51:16.278293 :800002AB:db_trace:kfdp.c@12803:kfdpLck_convPriv(): [10499:19:396] kfdpLck_conv: grp=1, type=0, mode=5, line=7155
2021-07-17 03:51:16.278294 :800002AC:KFDP:kfdp.c@12663:kfdpLckValue_unpickle(): kfdpLckValue_unpickle size=28 res=0 ok=0 ver=-1 dcnt=0 lckvalid=0 flags=0x2 inst=0 (I am 2) version=0
2021-07-17 03:51:16.278499*:800002AD:KGF:kgfdputl.c@485:kgfdpDta_getAllDsks(): kgfdpDta_getAllDsks using saved iterator 0x9ffffffffd571220 with 4 disks
2021-07-17 03:51:16.278688 :800002AE:KFDP:kfdp.c@5566:kfdp_write(): kfdp_write: pstDskCnt=3 grow=0 degenerate=0
2021-07-17 03:51:16.278688*:800002AF:KGF:kgfdputl.c@2619:kgfdpTraceSet(): writing pst to disks (n=3): 0 1 3
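From the trace above, it is fairly clear that the PST information is abnormal: the PST records only three disks (0, 1, and 3) and treats disk 2 as an old disk, while the disk group actually contains four disks, and this mismatch causes the GMON process to fail. After the problem was resolved at the low level, the database was recovered successfully.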
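The comparison behind that conclusion can be sketched in a few lines of Python: collect the disk numbers the mount assigned in the alert log, collect the disk numbers in the trace's "writing pst to disks" line, and report the difference. This is only an illustrative sketch over the excerpts shown in this post. The function names are hypothetical, the regular expressions assume exactly the line formats seen here, and it is not an Oracle tool or the method actually used for the repair.

```python
import re

# Excerpts copied from the alert log and GMON trace shown in this post.
ALERT_LOG = """\
NOTE: Assigning number (2,1) to disk (/dev/rdisk/disk93)
NOTE: Assigning number (2,3) to disk (/dev/rdisk/disk96)
NOTE: Assigning number (2,2) to disk (/dev/rdisk/disk94)
NOTE: Assigning number (2,0) to disk (/dev/rdisk/disk92)
"""
TRACE = "kgfdpTraceSet(): writing pst to disks (n=3): 0 1 3"


def discovered_disks(alert_text, group_number):
    """Map disk number -> path for disks the mount assigned to the group."""
    pattern = re.compile(r"Assigning number \((\d+),(\d+)\) to disk \(([^)]+)\)")
    return {
        int(disk): path
        for grp, disk, path in pattern.findall(alert_text)
        if int(grp) == group_number
    }


def pst_disks(trace_text):
    """Return the disk numbers listed in the 'writing pst to disks' line."""
    match = re.search(r"writing pst to disks \(n=\d+\):((?: \d+)+)", trace_text)
    if match is None:
        return set()
    return {int(n) for n in match.group(1).split()}


def disks_missing_from_pst(alert_text, trace_text, group_number):
    """Disks that were discovered for the group but are absent from the PST set."""
    found = discovered_disks(alert_text, group_number)
    in_pst = pst_disks(trace_text)
    return {num: path for num, path in found.items() if num not in in_pst}


missing = disks_missing_from_pst(ALERT_LOG, TRACE, group_number=2)
print(missing)  # {2: '/dev/rdisk/disk94'}
```

For the excerpts in this case the only disk missing from the PST set is disk 2 (/dev/rdisk/disk94), which matches the "filtered old meta on disk 2" lines in the trace.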
SQL> recover database using backup controlfile;
ORA-00279: change 30075814973 generated at 07/17/2021 01:12:08 needed for thread 2
ORA-00289: suggestion : +FRA
ORA-00280: change 30075814973 for thread 2 is in sequence #120561
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_16
ORA-00279: change 30075814973 generated at 07/17/2021 01:11:54 needed for thread 1
ORA-00289: suggestion : +FRA/xff/archivelog/2021_07_17/thread_1_seq_79949.1543.1078103529
ORA-00280: change 30075814973 for thread 1 is in sequence #79949
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_13
ORA-00279: change 30075815013 generated at 07/17/2021 01:12:09 needed for thread 1
ORA-00289: suggestion : +FRA
ORA-00280: change 30075815013 for thread 1 is in sequence #79950
ORA-00278: log file '/tmp/asm/group_13' no longer needed for this recovery
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_11
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
Database altered.
Fortunately, this recovery was completed with zero data loss.
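GMON is the disk group monitor process. It is responsible for keeping the status of the disks in a disk group consistent. When the disk membership of a disk group changes (for example, a disk is added, dropped, or becomes damaged), this process takes the affected disks offline or brings them online.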
GMON monitors all the disk groups mounted in an ASM instance and is responsible for maintaining consistent disk membership and status information. Membership changes result from adding and dropping disks, whereas disk status changes result from taking disks offline or bringing them online.