ORA-600 kfdpMetaBlk_pickle Troubleshooting

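Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal liability.]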

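A customer reported that the cluster's CRS could not start. Investigation showed that the GMON process was crashing the ASM instance, and testing confirmed that the problem is triggered when mounting the DATA disk group: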

SQL> alter diskgroup data mount;
alter diskgroup data mount
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 7517
Session ID: 918 Serial number: 5

The corresponding alert log reports the error ORA-600 [kfdpMetaBlk_pickle:01], [4294967295]:

SQL> alter diskgroup data mount
NOTE: cache registered group DATA number=2 incarn=0x3078f05f
NOTE: cache began mount (first) of group DATA number=2 incarn=0x3078f05f
NOTE: Assigning number (2,1) to disk (/dev/rdisk/disk93)
NOTE: Assigning number (2,3) to disk (/dev/rdisk/disk96)
NOTE: Assigning number (2,2) to disk (/dev/rdisk/disk94)
NOTE: Assigning number (2,0) to disk (/dev/rdisk/disk92)
Sat Jul 17 05:21:01 2021
Errors in file /u01/app/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_gmon_7457.trc  (incident=255833):
ORA-00600: internal error code, arguments: [kfdpMetaBlk_pickle:01], [4294967295], [0], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/crs_base/diag/asm/+asm/+ASM2/incident/incdir_255833/+ASM2_gmon_7457_i255833.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_gmon_7457.trc:
ORA-00600: internal error code, arguments: [kfdpMetaBlk_pickle:01], [4294967295], [0], [], [], [], [], [], [], [], [], []
GMON (ospid: 7457): terminating the instance due to error 493
Sat Jul 17 05:21:03 2021
System state dump requested by (instance=2, osid=7457 (GMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_diag_7429.trc
Instance terminated by GMON, pid = 7457
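
When an alert log is long, it can help to pull out just the ORA-00600 arguments and the process that terminated the instance. The following is a minimal sketch in Python that does this for the two relevant lines of the excerpt above; the function names and regular expressions are illustrative assumptions, not part of any Oracle tool.

```python
import re

# Two lines copied from the alert log excerpt above.
ALERT_LOG = """\
ORA-00600: internal error code, arguments: [kfdpMetaBlk_pickle:01], [4294967295], [0], [], [], [], [], [], [], [], [], []
GMON (ospid: 7457): terminating the instance due to error 493
"""

ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: (.*)")
TERMINATE_RE = re.compile(
    r"^(\w+) \(ospid: (\d+)\): terminating the instance due to error (\d+)",
    re.MULTILINE,
)


def ora600_arguments(text):
    """Return the non-empty arguments of the first ORA-00600 line, or []."""
    match = ORA600_RE.search(text)
    if match is None:
        return []
    # Each argument is wrapped in square brackets; empty ones are dropped.
    return [arg for arg in re.findall(r"\[([^\]]*)\]", match.group(1)) if arg]


def terminating_process(text):
    """Return (process name, OS pid, error number), or None if absent."""
    match = TERMINATE_RE.search(text)
    if match is None:
        return None
    name, ospid, error = match.groups()
    return name, int(ospid), int(error)


print(ora600_arguments(ALERT_LOG))
print(terminating_process(ALERT_LOG))
```

The first argument names the failing internal routine, which is the string to search for in MOS.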

A search of MOS (My Oracle Support) for ORA-600 [kfdpMetaBlk_pickle:01], [4294967295] returned no useful information.

The corresponding trace file contains the following:

2021-07-17 03:51:16.277603*:800002A2:KGF:kgfdputl.c@1411:kgfdpMetaSet_getMaxClique():   inc=2 ver=4294967295
2021-07-17 03:51:16.277619 :800002A3:KFDP:kfdp.c@9314:kfdpMetaSet_filterOld(): filtered old meta on disk 2
2021-07-17 03:51:16.277620 :800002A4:KFDP:kfdp.c@9314:kfdpMetaSet_filterOld(): filtered old meta on disk 2
2021-07-17 03:51:16.277992 :800002A5:KFDP:kfdp.c@9417:kfdpMetaSet_readDta():kfdpMetaSet_readDta unpickle upto 6 metablks
2021-07-17 03:51:16.277993 :800002A6:KFDP:kfdp.c@9425:kfdpMetaSet_readDta():kfdpMetaSet_readDta unpickle metablk for disk 3
2021-07-17 03:51:16.278154 :800002A7:KFDP:kfdp.c@9425:kfdpMetaSet_readDta():kfdpMetaSet_readDta unpickle metablk for disk 1
2021-07-17 03:51:16.278268 :800002A8:KFDP:kfdp.c@5851:kfdp_read(): kfdp_read end ok=1
2021-07-17 03:51:16.278277 :800002A9:KFDP:kfdp.c@7073:kfdp_doQuery(): kfdp_doQuery   rewrite_kfdp=1
2021-07-17 03:51:16.278282 :800002AA:KFDP:kfdp.c@12511:kfdpLckValue_pickle(): kfdpLckValue_pickle size=0 
                            endian=0xff ndisks=0 lckvalid=0
2021-07-17 03:51:16.278293 :800002AB:db_trace:kfdp.c@12803:kfdpLck_convPriv(): [10499:19:396] 
                            kfdpLck_conv: grp=1, type=0, mode=5, line=7155
2021-07-17 03:51:16.278294 :800002AC:KFDP:kfdp.c@12663:kfdpLckValue_unpickle(): kfdpLckValue_unpickle
                            size=28 res=0 ok=0 ver=-1 dcnt=0 lckvalid=0 flags=0x2 inst=0 (I am 2) version=0
2021-07-17 03:51:16.278499*:800002AD:KGF:kgfdputl.c@485:kgfdpDta_getAllDsks(): kgfdpDta_getAllDsks using 
                            saved iterator 0x9ffffffffd571220 with 4 disks
2021-07-17 03:51:16.278688 :800002AE:KFDP:kfdp.c@5566:kfdp_write(): kfdp_write: pstDskCnt=3 grow=0 degenerate=0
2021-07-17 03:51:16.278688*:800002AF:KGF:kgfdputl.c@2619:kgfdpTraceSet(): writing pst to disks (n=3): 0 1 3
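
One detail worth noting in this trace: the value 4294967295 (seen as `ver=4294967295` and as the second ORA-600 argument) is 2^32 - 1, which is the same bit pattern as -1 in a signed 32-bit integer (compare `ver=-1` in the `kfdpLckValue_unpickle` line). The sketch below only demonstrates that arithmetic. Whether these fields hold the same internal value, and whether it acts as an "unset" marker, is an assumption; Oracle does not document these internals.

```python
import struct


def as_signed_32(value):
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


print(hex(4294967295))           # all 32 bits set
print(as_signed_32(4294967295))  # the same bits read as a signed value
```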

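From the information above, it is fairly clear that the PST information is abnormal: the PST records only three disks (0, 1 and 3) and treats disk 2 as an old disk, whereas the disk group actually has four disks. This is what caused the GMON process to fail. After the problem was fixed at the low level, the database was recovered successfully: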
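
The comparison described above can be sketched in Python: collect the disk numbers assigned in the alert log, the disks listed in the `writing pst to disks` trace line, and the disks reported as `filtered old meta`, then see which ones disagree. The function names and patterns are illustrative assumptions. This only reproduces the reading given here and is not a general diagnostic rule, since which disks carry PST copies also depends on the disk group's redundancy.

```python
import re

# Lines copied from the alert log and GMON trace shown earlier.
ALERT_LINES = """\
NOTE: Assigning number (2,1) to disk (/dev/rdisk/disk93)
NOTE: Assigning number (2,3) to disk (/dev/rdisk/disk96)
NOTE: Assigning number (2,2) to disk (/dev/rdisk/disk94)
NOTE: Assigning number (2,0) to disk (/dev/rdisk/disk92)
"""
TRACE_LINES = """\
kfdpMetaSet_filterOld(): filtered old meta on disk 2
kgfdpTraceSet(): writing pst to disks (n=3): 0 1 3
"""


def discovered_disks(alert_text):
    """Map disk number -> device path from 'Assigning number' lines."""
    pattern = re.compile(r"Assigning number \((\d+),(\d+)\) to disk \(([^)]+)\)")
    return {int(disk): path for _group, disk, path in pattern.findall(alert_text)}


def pst_disks(trace_text):
    """Return the set of disk numbers listed in 'writing pst to disks'."""
    match = re.search(r"writing pst to disks \(n=\d+\):((?: \d+)+)", trace_text)
    if match is None:
        return set()
    return {int(number) for number in match.group(1).split()}


def filtered_disks(trace_text):
    """Return the disk numbers whose metadata was filtered as old."""
    return {int(n) for n in re.findall(r"filtered old meta on disk (\d+)", trace_text)}


disks = discovered_disks(ALERT_LINES)
missing = sorted(set(disks) - pst_disks(TRACE_LINES))
for number in missing:
    flag = "filtered as old" if number in filtered_disks(TRACE_LINES) else "not filtered"
    print(number, disks[number], flag)
```

For the excerpts in this post, the only disk that is discovered but absent from the PST write list is disk 2 (/dev/rdisk/disk94), which is also the disk whose metadata was filtered as old.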

SQL> recover database using backup controlfile;
ORA-00279: change 30075814973 generated at 07/17/2021 01:12:08 needed for
thread 2
ORA-00289: suggestion : +FRA
ORA-00280: change 30075814973 for thread 2 is in sequence #120561


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_16
ORA-00279: change 30075814973 generated at 07/17/2021 01:11:54 needed for
thread 1
ORA-00289: suggestion :
+FRA/xff/archivelog/2021_07_17/thread_1_seq_79949.1543.1078103529
ORA-00280: change 30075814973 for thread 1 is in sequence #79949


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_13
ORA-00279: change 30075815013 generated at 07/17/2021 01:12:09 needed for
thread 1
ORA-00289: suggestion : +FRA
ORA-00280: change 30075815013 for thread 1 is in sequence #79950
ORA-00278: log file '/tmp/asm/group_13' no longer needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_11
Log applied.
Media recovery complete.

SQL> alter database open resetlogs;

Database altered.
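
In the recovery above, each prompt was answered by hand with a file under /tmp/asm. To keep track of what recovery is asking for, the ORA-00280 lines can be parsed into (thread, sequence, SCN) tuples, as in this minimal sketch. The function name is an illustrative assumption, and the sketch does not determine which file contains a given sequence; that mapping has to come from elsewhere and is not shown in this post.

```python
import re

# ORA-00279/ORA-00280 messages copied from the recovery session above.
RECOVERY_OUTPUT = """\
ORA-00279: change 30075814973 generated at 07/17/2021 01:12:08 needed for
thread 2
ORA-00280: change 30075814973 for thread 2 is in sequence #120561
ORA-00279: change 30075814973 generated at 07/17/2021 01:11:54 needed for
thread 1
ORA-00280: change 30075814973 for thread 1 is in sequence #79949
ORA-00279: change 30075815013 generated at 07/17/2021 01:12:09 needed for
thread 1
ORA-00280: change 30075815013 for thread 1 is in sequence #79950
"""

REQUEST_RE = re.compile(
    r"ORA-00280: change (\d+) for thread (\d+) is in sequence #(\d+)"
)


def requested_logs(text):
    """Return (thread, sequence, scn) for each log that recovery asked for."""
    return [
        (int(thread), int(sequence), int(scn))
        for scn, thread, sequence in REQUEST_RE.findall(text)
    ]


for thread, sequence, scn in requested_logs(RECOVERY_OUTPUT):
    print(f"thread {thread} sequence {sequence} (from SCN {scn})")
```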

Fortunately, this recovery was completed with zero data loss.

1 comment on "ORA-600 kfdpMetaBlk_pickle Troubleshooting"

  1. 惜分飞 said:

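    GMON is the disk group monitor process. It is responsible for keeping the status of the disks in each disk group consistent. When the disk membership of a disk group changes (for example, a disk is added, dropped, or damaged), this process takes disks offline or brings them online. The Oracle documentation describes it as follows: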

    GMON monitors all the disk groups mounted in an ASM instance and is responsible for maintaining consistent disk membership and status information. Membership changes result from adding and dropping disks, whereas disk status changes result from taking disks offline or bringing them online.