联系:手机/微信(+86 17813235971) QQ(107644445)
标题:乱用_allow_resetlogs_corruption参数导致悲剧
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
一个朋友11.2.0.1的数据库因为断电,出现不能正常open问题,自己尝试恢复,折腾了几天,最后让我帮忙的时候错误如下
SQL> startup ORACLE 例程已经启动。 Total System Global Area 778387456 bytes Fixed Size 1374808 bytes Variable Size 545260968 bytes Database Buffers 226492416 bytes Redo Buffers 5259264 bytes 数据库装载完毕。 ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 225) ORA-01110: data file 1: 'F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF' 进程 ID: 5964 会话 ID: 1144 序列号: 5
从启动的日志提示看初步判断就是悲剧了,因为根据经验值在11gr2版本中,该错误就是undo$(分析trace文件进步一确定是undo$),该block出现异常,数据库在启动的时候要扫描该表,把相关的回滚段给online起来,现在他异常了,数据库肯定无法正常启动
dbv检查数据库文件
F:\>dbv file='F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF' DBVERIFY: Release 11.2.0.1.0 - Production on 星期三 5月 22 11:06:00 2013 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. DBVERIFY - 开始验证: FILE = F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF 页 225 流入 - 很可能是介质损坏 Corrupt block relative dba: 0x004000e1 (file 1, block 225) Fractured block found during dbv: Data in bad block: type: 6 format: 2 rdba: 0x004000e1 last change scn: 0x0000.00d65120 seq: 0x1 flg: 0x06 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0xb98e0601 check value in block header: 0xb307 computed block checksum: 0xe8ae DBVERIFY - 验证完成 检查的页总数: 134400 处理的页总数 (数据): 98226 失败的页总数 (数据): 0 处理的页总数 (索引): 14189 失败的页总数 (索引): 0 处理的页总数 (其他): 4178 处理的总页数 (段) : 1 失败的总页数 (段) : 0 空的页总数: 17806 标记为损坏的总页数: 1 流入的页总数: 1 加密的总页数 : 0 最高块 SCN : 14045769 (0.14045769)
看到这里,可以确定坏块的存在,根据上面的提示,我们发现tailchk值不正确,应该是5120+06+01,而不该是b98e0601,通过bbed查看
BBED> p kcbh struct kcbh, 20 bytes @0 ub1 type_kcbh @0 0x06 ub1 frmt_kcbh @1 0xa2 ub1 spare1_kcbh @2 0x00 ub1 spare2_kcbh @3 0x00 ub4 rdba_kcbh @4 0x004000e1 ub4 bas_kcbh @8 0x00d65120 ub2 wrp_kcbh @12 0x0000 ub1 seq_kcbh @14 0x01 ub1 flg_kcbh @15 0x06 (KCBHFDLC, KCBHFCKV) ub2 chkval_kcbh @16 0x5ba9 ub2 spare3_kcbh @18 0x0000 BBED> p tailchk ub4 tailchk @8188 0xb98e0601
进一步证明是tailchk异常导致,分析alert日志,数据库异常断电,然后启动的时候发现如下错误
Recovery of Online Redo Log: Thread 1 Group 2 Seq 431 Reading mem 0 Mem# 0: F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\REDO02.LOG RECOVERY OF THREAD 1 STUCK AT BLOCK 451 OF FILE 3 Aborting crash recovery due to error 1172 Errors in file f:\app\administrator\diag\rdbms\yfcloud\yfcloud\trace\yfcloud_ora_772.trc: ORA-01172: 线程 1 的恢复停止在块 451 (在文件 3 中) ORA-01151: 如果需要, 请使用介质恢复以恢复块和还原备份 Errors in file f:\app\administrator\diag\rdbms\yfcloud\yfcloud\trace\yfcloud_ora_772.trc: ORA-01172: 线程 1 的恢复停止在块 451 (在文件 3 中) ORA-01151: 如果需要, 请使用介质恢复以恢复块和还原备份 ORA-1172 signalled during: alter database open... Tue May 21 14:27:29 2013 ALTER DATABASE RECOVER datafile 3 Media Recovery Start Serial Media Recovery started Recovery of Online Redo Log: Thread 1 Group 2 Seq 431 Reading mem 0 Mem# 0: F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\REDO02.LOG Errors in file f:\app\administrator\diag\rdbms\yfcloud\yfcloud\trace\yfcloud_ora_772.trc (incident=112164): ORA-00600: 内部错误代码, 参数: [3020], [3], [451], [12583363], [], [], [], [], [], [], [], [] ORA-10567: Redo is inconsistent with data block (file# 3, block# 451, file offset is 3694592 bytes) ORA-10564: tablespace UNDOTBS1 ORA-01110: 数据文件 3: 'F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\UNDOTBS01.DBF' ORA-10560: block type 'KTU UNDO BLOCK' Media Recovery failed with error 600 ORA-283 signalled during: ALTER DATABASE RECOVER datafile 3 ...
因为file# 3, block# 451和redo信息不一致,出现ora-600[3020]错误,而file# 3为undo文件,朋友从而设置undo_management=’manual’并设置了_allow_resetlogs_corruption=true,然后进行不完全恢复,从而出现了如下错误提示
Tue May 21 14:41:23 2013 SMON: enabling cache recovery Corrupt block relative dba: 0x004000e1 (file 1, block 225) Fractured block found during buffer read Data in bad block: type: 6 format: 2 rdba: 0x004000e1 last change scn: 0x0000.00d65120 seq: 0x1 flg: 0x06 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0xb98e0601 check value in block header: 0xb307 computed block checksum: 0xe8ae Reading datafile 'F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF' for corruption at rdba: 0x004000e1 (file 1, block 225) Reread (file 1, block 225) found same corrupt data Errors in file f:\app\administrator\diag\rdbms\yfcloud\yfcloud\trace\yfcloud_ora_4892.trc (incident=120165): ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 225) ORA-01110: 数据文件 1: 'F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF' Errors in file f:\app\administrator\diag\rdbms\yfcloud\yfcloud\trace\yfcloud_ora_4892.trc: ORA-00604: 递归 SQL 级别 1 出现错误 ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 225) ORA-01110: 数据文件 1: 'F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF' Errors in file f:\app\administrator\diag\rdbms\yfcloud\yfcloud\trace\yfcloud_ora_4892.trc: ORA-00604: 递归 SQL 级别 1 出现错误 ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 225) ORA-01110: 数据文件 1: 'F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF' Error 604 happened during db open, shutting down database USER (ospid: 4892): terminating the instance due to error 604
从而的原因基本上可以从操作过程中了解到:数据库是因为file# 3 block# 451和redo不一致导致问题,而恢复的操作人员冲动的使用了_allow_resetlogs_corruption参数,从而使得数据库出现了不一致性,也就是导致file# 1 block# 225坏块的根本原因,针对这样的情况,完全没有到使用_allow_resetlogs_corruption隐含参数地步
使用bbed修改tailchk
BBED> p tailchk ub4 tailchk @8188 0xb98e0601 BBED> verify DBVERIFY - Verification starting FILE = system01.dbf BLOCK = 225 Block 225 is corrupt *** Corrupt block relative dba: 0x004000e1 (file 0, block 225) Fractured block found during verification Data in bad block - type: 6 format: 2 rdba: 0x004000e1 last change scn: 0x0000.00d65120 seq: 0x1 flg: 0x06 consistency value in tail: 0xb98e0601 check value in block header: 0x5ba9, computed block checksum: 0x0 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** DBVERIFY - Verification complete Total Blocks Examined : 1 Total Blocks Processed (Data) : 0 Total Blocks Failing (Data) : 0 Total Blocks Processed (Index): 0 Total Blocks Failing (Index): 0 Total Blocks Empty : 0 Total Blocks Marked Corrupt : 1 Total Blocks Influx : 2 BBED> m /x 01062051 File: system01.dbf (0) Block: 226 Offsets: 8188 to 8191 Dba:0x00000000 ------------------------------------------------------------------------ 01062051 <32 bytes per line> BBED> p tailchk ub4 tailchk @8188 0x51200601 BBED> sum apply Check value for File 0, Block 226: current = 0xb307, required = 0xb307 BBED> verify DBVERIFY - Verification starting FILE = system01.dbf BLOCK = 225 DBVERIFY - Verification complete Total Blocks Examined : 1 Total Blocks Processed (Data) : 1 Total Blocks Failing (Data) : 0 Total Blocks Processed (Index): 0 Total Blocks Failing (Index): 0 Total Blocks Empty : 0 Total Blocks Marked Corrupt : 0 Total Blocks Influx : 0
bbed修改block之后,数据库直接正常打开,完成数据库恢复任务,在这里很明显是因为错误的使用了_allow_resetlogs_corruption参数,屏蔽了redo前滚导致了相关的坏块,所以大家在数据库异常恢复的时候,需要知道各个参数的意义,而不要乱使用,很可能导致不可控结果
dul通过设置相应参数可以加载坏块数据,并提示坏块原因