联系:手机/微信(+86 17813235971) QQ(107644445)
标题:又一起存储故障导致ORA-00333 ORA-00312恢复
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
数据库启动报ORA-00333 ORA-00312错误,无法正常open数据库
Thu Aug 07 10:42:03 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc: ORA-00333: redo log read error block 63489 count 2048 ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG' ORA-27091: unable to queue I/O ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 1) 函数不正确。 Thu Aug 07 10:42:03 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc: ORA-00333: redo log read error block 63489 count 2048 Thu Aug 07 10:42:03 2014 ARC0: All Archive destinations made inactive due to error 333 Thu Aug 07 10:42:03 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc: ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止 ORA-00340: 处理联机日志 (用于线程 ) 时出现 I/O 错误 Thu Aug 07 10:42:03 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc: ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止 ORA-00340: 处理联机日志 (用于线程 ) 时出现 I/O 错误 Thu Aug 07 10:42:03 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc: ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止 ORA-00340: 处理联机日志 (用于线程 ) 时出现 I/O 错误 Thu Aug 07 10:42:03 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_lgwr_884.trc: ORA-00340: IO error processing online log 3 of thread 1 ORA-00345: redo log write error block 65238 count 13 ORA-00312: online log 3 thread 1: 'F:\ORADATA\SZCG\REDO03.LOG' ORA-27070: async read/write failed OSD-04016: 异步 I/O 请求排队时出错。 O/S-Error: (OS 1) 函数不正确。 Thu Aug 07 10:42:03 2014 LGWR: terminating instance due to error 340 Thu Aug 07 10:42:05 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc: ORA-00603: ORACLE server session terminated by fatal error ORA-00449: background process 'LGWR' unexpectedly terminated with error 340 ORA-00340: IO error processing online log of thread Thu Aug 07 10:42:05 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc: ORA-00603: ORACLE server session terminated by fatal error ORA-00449: background process 'LGWR' unexpectedly terminated with error 340 ORA-00340: IO error processing online log of thread Thu Aug 07 10:42:05 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc: ORA-00603: ORACLE server session terminated by fatal error ORA-00449: background process 'LGWR' unexpectedly terminated with error 340 ORA-00340: IO error processing online log of thread Thu Aug 07 17:40:05 2014 ALTER DATABASE OPEN Thu Aug 07 17:40:05 2014 Beginning crash recovery of 1 threads parallel recovery started with 15 processes Thu Aug 07 17:40:06 2014 Started redo scan Thu Aug 07 17:40:06 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc: ORA-00333: 重做日志读取块 63016 计数 8192 出错 ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG' ORA-27070: 异步读取/写入失败 OSD-04016: 异步 I/O 请求排队时出错。 O/S-Error: (OS 1) 函数不正确。 Thu Aug 07 17:40:06 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc: ORA-00333: 重做日志读取块 63016 计数 8192 出错 ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG' ORA-27091: 无法将 I/O 排队 ORA-27070: 异步读取/写入失败 OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 1) 函数不正确。 Thu Aug 07 17:40:06 2014 Aborting crash recovery due to error 333 Thu Aug 07 17:40:06 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc: ORA-00333: 重做日志读取块 63016 计数 8192 出错 ORA-333 signalled during: ALTER DATABASE OPEN...
进一步检查发现在7月6日系统就已经报io异常
Sun Jul 06 10:05:23 2014 ARC0: All Archive destinations made inactive due to error 333 Sun Jul 06 10:06:07 2014 KCF: write/open error block=0xd03 online=1 file=3 F:\ORADATA\SZCG\SYSAUX01.DBF error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。 O/S-Error: (OS 1) 函数不正确。' Automatic datafile offline due to write error on file 3: F:\ORADATA\SZCG\SYSAUX01.DBF Sun Jul 06 10:06:23 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc1_2676.trc: ORA-00333: redo log read error block 63489 count 2048 ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG' ORA-27091: unable to queue I/O ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 1) 函数不正确。 Thu Aug 07 10:36:54 2014 ARC1: All Archive destinations made inactive due to error 333 Thu Aug 07 10:37:25 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_m000_5832.trc: ORA-01135: file 3 accessed for DML/query is offline ORA-01110: data file 3: 'F:\ORADATA\SZCG\SYSAUX01.DBF'
检查硬件发现raid一块盘完全损坏,另外一块盘也处于告警状态,保护现场拷贝文件过程中发现redo02,redo03,sysaux无法拷贝,使用rman检查发现
因为redo完全损坏,使用工具跳过坏块,拷贝相关有坏块文件到其他目录,重命名相关文件尝试启动数据库,依然报ORA-00333 ORA-00312
Started redo scan Thu Aug 07 17:40:06 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc: ORA-00333: 重做日志读取块 63016 计数 8192 出错 ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG' ORA-27070: 异步读取/写入失败 OSD-04016: 异步 I/O 请求排队时出错。 O/S-Error: (OS 1) 函数不正确。 Thu Aug 07 17:40:06 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc: ORA-00333: 重做日志读取块 63016 计数 8192 出错 ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG' ORA-27091: 无法将 I/O 排队 ORA-27070: 异步读取/写入失败 OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 1) 函数不正确。 Thu Aug 07 17:40:06 2014 Aborting crash recovery due to error 333 Thu Aug 07 17:40:06 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc: ORA-00333: 重做日志读取块 63016 计数 8192 出错 ORA-333 signalled during: ALTER DATABASE OPEN...
设置隐含参数_allow_resetlogs_corruption,尝试强制拉库
Started redo scan Fri Aug 08 12:13:25 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc: ORA-00333: 重做日志读取块 63016 计数 8192 出错 ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG' ORA-27070: 异步读取/写入失败 OSD-04016: 异步 I/O 请求排队时出错。 O/S-Error: (OS 1) 函数不正确。 Fri Aug 08 12:13:25 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc: ORA-00333: 重做日志读取块 63016 计数 8192 出错 ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG' ORA-27091: 无法将 I/O 排队 ORA-27070: 异步读取/写入失败 OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 1) 函数不正确。 Fri Aug 08 12:13:25 2014 Aborting crash recovery due to error 333 Fri Aug 08 12:13:25 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc: ORA-00333: 重做日志读取块 63016 计数 8192 出错 ORA-333 signalled during: ALTER DATABASE OPEN... Fri Aug 08 12:13:45 2014 ALTER DATABASE RECOVER database until cancel Fri Aug 08 12:13:45 2014 Media Recovery Start parallel recovery started with 15 processes ORA-279 signalled during: ALTER DATABASE RECOVER database until cancel ... Fri Aug 08 12:13:55 2014 ALTER DATABASE RECOVER CANCEL Fri Aug 08 12:13:59 2014 ORA-1547 signalled during: ALTER DATABASE RECOVER CANCEL ... Fri Aug 08 12:13:59 2014 ALTER DATABASE RECOVER CANCEL ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ... Fri Aug 08 12:14:12 2014 alter database open resetlogs Fri Aug 08 12:14:13 2014 RESETLOGS is being done without consistancy checks. This may result in a corrupted database. The database should be recreated. ORA-1245 signalled during: alter database open resetlogs... Fri Aug 08 12:54:11 2014 alter tablespace sysaux offline Fri Aug 08 12:54:11 2014 ORA-1109 signalled during: alter tablespace sysaux offline... Fri Aug 08 13:05:30 2014 alter database open Fri Aug 08 13:05:30 2014 ORA-1589 signalled during: alter database open...
在offline过程中,数据库检查到sysaux数据文件为offline状态,当表空间只有一个数据文件,而且该数据文件为offline,数据库将会尝试offline sysaux表空间,但是发现该表空间文件非正常scn,无法offline 表空间,导致resetlogs操作失败。这里是操作失误应该先online相关数据文件,然后再进行resetlogs操作
Sat Aug 09 11:56:03 2014 alter database datafile 3 online Sat Aug 09 11:56:04 2014 Completed: alter database datafile 3 online Sat Aug 09 11:56:08 2014 alter database open resetlogs RESETLOGS is being done without consistancy checks. This may result in a corrupted database. The database should be recreated. Sat Aug 09 11:56:18 2014 ARCH: Encountered disk I/O error 19502 Sat Aug 09 11:56:18 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc: ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512) ORA-27072: 文件 I/O 错误 OSD-04008: WriteFile() 失败, 无法写入文件 O/S-Error: (OS 1) 函数不正确。 ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512) Sat Aug 09 11:56:18 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc: ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512) ORA-27072: 文件 I/O 错误 OSD-04008: WriteFile() 失败, 无法写入文件 O/S-Error: (OS 1) 函数不正确。 ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512) ARCH: I/O error 19502 archiving log 3 to 'F:\ARCHIVE\ARC01745_0814618167.001' Sat Aug 09 11:56:18 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc: ORA-00265: 要求实例恢复, 无法设置 ARCHIVELOG 模式 Archive all online redo logfiles failed:265 RESETLOGS after incomplete recovery UNTIL CHANGE 77983856 Resetting resetlogs activation ID 3562192628 (0xd452bef4) Online log F:\ORADATA\SZCG\REDO01.LOG: Thread 1 Group 1 was previously cleared Online log F:\ORADATA\SZCG\REDO02.LOG: Thread 1 Group 2 was previously cleared Online log D:\REDO04.LOG: Thread 1 Group 4 was previously cleared Sat Aug 09 11:56:22 2014 Setting recovery target incarnation to 3 Sat Aug 09 11:56:23 2014 Assigning activation ID 3602586269 (0xd6bb1a9d) LGWR: STARTING ARCH PROCESSES ARC0 started with pid=33, OS id=5900 Sat Aug 09 11:56:23 2014 ARC0: Archival started ARC1: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE ARC1 started with pid=34, OS id=5776 Sat Aug 09 11:56:24 2014 Thread 1 opened at log sequence 1 Current log# 1 seq# 1 mem# 0: F:\ORADATA\SZCG\REDO01.LOG Successful open of redo thread 1 Sat Aug 09 11:56:24 2014 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Sat Aug 09 11:56:24 2014 ARC1: Becoming the 'no FAL' ARCH ARC1: Becoming the 'no SRL' ARCH Sat Aug 09 11:56:24 2014 ARC0: Becoming the heartbeat ARCH Sat Aug 09 11:56:24 2014 SMON: enabling cache recovery Sat Aug 09 11:56:25 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc: ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], [] Sat Aug 09 11:56:26 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc: ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], [] Sat Aug 09 11:56:26 2014 Error 600 happened during db open, shutting down database USER: terminating instance due to error 600 Instance terminated by USER, pid = 4516 ORA-1092 signalled during: alter database open resetlogs...
ORA-600 2662这个错误很熟悉,直接推SCN,数据库open,但是报ORA-600 4194
Sat Aug 09 12:01:28 2014 SMON: enabling cache recovery Dictionary check complete Sat Aug 09 12:01:32 2014 SMON: enabling tx recovery Sat Aug 09 12:01:32 2014 Database Characterset is ZHS16GBK Opening with internal Resource Manager plan replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC QMNC started with pid=34, OS id=6116 Sat Aug 09 12:01:34 2014 LOGSTDBY: Validating controlfile with logical metadata Sat Aug 09 12:01:34 2014 LOGSTDBY: Validation complete Sat Aug 09 12:01:34 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc: ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], [] Sat Aug 09 12:01:36 2014 Doing block recovery for file 2 block 319 Resuming block recovery (PMON) for file 2 block 319 Block recovery from logseq 2, block 56 to scn 1073742003 Sat Aug 09 12:01:36 2014 Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0 Mem# 0: F:\ORADATA\SZCG\REDO02.LOG Block recovery stopped at EOT rba 2.79.16 Block recovery completed at rba 2.79.16, scn 0.1073742002 Doing block recovery for file 2 block 153 Resuming block recovery (PMON) for file 2 block 153 Block recovery from logseq 2, block 56 to scn 1073741986 Sat Aug 09 12:01:36 2014 Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0 Mem# 0: F:\ORADATA\SZCG\REDO02.LOG Block recovery completed at rba 2.66.16, scn 0.1073741988 Sat Aug 09 12:01:36 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc: ORA-01595: error freeing extent (4) of rollback segment (10)) ORA-00607: Internal error occurred while making a change to a data block ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], [] Sat Aug 09 12:01:36 2014 Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5272.trc: ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], [] Sat Aug 09 12:01:36 2014 Completed: alter database open
尝试重建undo表空间并切换undo_tabspace到新undo表空间解决,因为数据库在恢复过程中使用了隐含参数强制拉库,不能保证数据一致性,强烈建议逻辑方式重建数据库
在本次故障中,所幸的是只有redo和sysaux文件损坏,如果是业务数据文件或者system数据文件损坏,恢复的后果可能更加麻烦,丢失数据可能更加多。再次说明:数据库备份非常重要,数据的安全性不能完全寄希望于硬件之上