标签云
asm恢复 bbed bootstrap$ dul In Memory kcbzib_kcrsds_1 kccpb_sanity_check_2 MySQL恢复 ORA-00312 ORA-00607 ORA-00704 ORA-00742 ORA-01110 ORA-01555 ORA-01578 ORA-08103 ORA-600 2131 ORA-600 2662 ORA-600 2663 ORA-600 3020 ORA-600 4000 ORA-600 4137 ORA-600 4193 ORA-600 4194 ORA-600 16703 ORA-600 kcbzib_kcrsds_1 ORA-600 KCLCHKBLK_4 ORA-15042 ORA-15196 ORACLE 12C oracle dul ORACLE PATCH Oracle Recovery Tools oracle加密恢复 oracle勒索 oracle勒索恢复 oracle异常恢复 ORACLE恢复 Oracle 恢复 ORACLE数据库恢复 oracle 比特币 OSD-04016 YOUR FILES ARE ENCRYPTED 勒索恢复 比特币加密文章分类
- Others (2)
- 中间件 (2)
- WebLogic (2)
- 操作系统 (102)
- 数据库 (1,698)
- DB2 (22)
- MySQL (74)
- Oracle (1,559)
- Data Guard (52)
- EXADATA (8)
- GoldenGate (24)
- ORA-xxxxx (159)
- ORACLE 12C (72)
- ORACLE 18C (6)
- ORACLE 19C (15)
- ORACLE 21C (3)
- Oracle 23ai (8)
- Oracle ASM (68)
- Oracle Bug (8)
- Oracle RAC (53)
- Oracle 安全 (6)
- Oracle 开发 (28)
- Oracle 监听 (28)
- Oracle备份恢复 (571)
- Oracle安装升级 (93)
- Oracle性能优化 (62)
- 专题索引 (5)
- 勒索恢复 (81)
- PostgreSQL (18)
- PostgreSQL恢复 (6)
- SQL Server (27)
- SQL Server恢复 (8)
- TimesTen (7)
- 达梦数据库 (2)
- 生活娱乐 (2)
- 至理名言 (11)
- 虚拟化 (2)
- VMware (2)
- 软件开发 (37)
- Asp.Net (9)
- JavaScript (12)
- PHP (2)
- 小工具 (20)
-
最近发表
- Bug 21915719 Database hang or may fail to OPEN in 12c IBM AIX or HPUX Itanium – ORA-742, DEADLOCK or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600 [krr_process_read_error_2]
- ORA-600 ktuPopDictI_1恢复
- impdp导入数据丢失sys授权问题分析
- impdp 创建index提示ORA-00942: table or view does not exist
- 数据泵导出 (expdp) 和导入 (impdp)工具性能降低分析参考
- 19c非归档数据库断电导致ORA-00742故障恢复
- Oracle 19c – 手动升级到 Non-CDB Oracle Database 19c 的完整核对清单
- sqlite数据库简单操作
- Oracle 暂定和恢复功能
- .pzpq扩展名勒索恢复
- Oracle read only用户—23ai新特性:只读用户
- 迁移awr快照数据到自定义表空间
- .hmallox加密mariadb/mysql数据库恢复
- 2025年首个故障恢复—ORA-600 kcbzib_kcrsds_1
- 第一例Oracle 21c恢复咨询
- ORA-15411: Failure groups in disk group DATA have different number of disks.
- 断电引起的ORA-08102: 未找到索引关键字, 对象号 39故障处理
- ORA-00227: corrupt block detected in control file
- 手工删除19c rac
- 解决oracle数据文件路径有回车故障
标签归档:ORA-607
记录一次系统回滚段坏块恢复
在上一篇中深入分析一次ORA-00314错误的数据库继续恢复,出现file 1 block 403和404等坏块,使得后面的恢复进一步写入了复杂境地.更加麻烦的是,这个库里面有一张核心表有40个字段,包括long,nvarchar2等一些字段,但是使用aul,dul,odu挖取都出现异常(表的行数正确,但是有些列的数据不能正常被挖出来),原因是该表中有特殊字段数据,被逼无赖只能继续拉库
数据库启动报错
SQL> startup mount pfile='c:\pfile.txt' ORACLE 例程已经启动。 Total System Global Area 452984832 bytes Fixed Size 1291120 bytes Variable Size 201329808 bytes Database Buffers 243269632 bytes Redo Buffers 7094272 bytes 数据库装载完毕。 SQL> recover database; 完成介质恢复。 SQL> alter database open; alter database open upgrade * 第 1 行出现错误: ORA-01092: ORACLE 实例终止。强制断开连接
查看alert日志
SMON: enabling cache recovery Fri May 16 22:49:53 2014 Hex dump of (file 1, block 404) in trace file c:\oracle\product\10.2.0\admin\interlib\udump\interlib_ora_2788.trc Corrupt block relative dba: 0x00400194 (file 1, block 404) Fractured block found during buffer read Data in bad block: type: 0 format: 0 rdba: 0x00000000 last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x00000000 check value in block header: 0x0 block checksum disabled Reread of rdba: 0x00400194 (file 1, block 404) found same corrupted data Fri May 16 22:49:55 2014 Errors in file c:\oracle\product\10.2.0\admin\interlib\udump\interlib_ora_2788.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 404) ORA-01110: data file 1: 'C:\ORADATA\INTERLIB\SYSTEM01.DBF' Error 604 happened during db open, shutting down database USER: terminating instance due to error 604
跟踪数据库启动过程
这里可以清晰的看到是因为block 404有坏块,数据库递归sql访问到该坏块报错,从而数据库无法继续open,对数据库启动过程做10046跟踪
PARSING IN CURSOR #2 len=148 dep=1 uid=0 oct=6 lid=0 tim=80533759 hv=3540833987 ad='2a6ce1ec' update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1 END OF STMT PARSE #2:c=62500,e=211766,p=14,cr=85,cu=0,mis=1,r=0,dep=1,og=4,tim=80533755 BINDS #2: kkscoacd Bind#0 oacdty=01 mxl=32(09) mxlc=00 mal=00 scl=00 pre=00 oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0 kxsbbbfp=2a6ceac2 bln=32 avl=09 flg=09 value="_SYSSMU1$" Bind#1 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=099b9b7c bln=24 avl=02 flg=05 value=2 Bind#2 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=099b9b58 bln=24 avl=02 flg=05 value=9 Bind#3 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=099b9b34 bln=24 avl=02 flg=05 value=2 Bind#4 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=099b9b10 bln=24 avl=02 flg=05 value=1 Bind#5 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=099b9aec bln=24 avl=04 flg=05 value=13332 Bind#6 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=099b9ac8 bln=24 avl=04 flg=05 value=24672 Bind#7 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=099b9aa4 bln=24 avl=05 flg=05 value=47727002 Bind#8 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=099b9a80 bln=24 avl=01 flg=05 value=0 Bind#9 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=099b9a5c bln=24 avl=01 flg=05 value=0 Bind#10 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=099b9a38 bln=24 avl=02 flg=05 value=1 Bind#11 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=099b9a14 bln=24 avl=02 flg=05 value=1 Bind#12 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=099b9ba0 bln=22 avl=02 flg=05 value=1 WAIT #2: nam='db file sequential read' ela= 10794 file#=1 block#=404 blocks=1 obj#=-1 tim=80546515 Hex dump of (file 1, block 404) Dump of memory from 0x1FB9C000 to 0x1FB9E000 1FB9C000 00000000 00000000 00000000 00000000 [................] Repeat 286 times 1FB9D1F0 00000000 00000000 00000000 00000001 [................] 1FB9D200 0000A200 0001D831 00000000 05010000 [....1...........] 1FB9D210 00007F30 00000000 00000000 00000000 [0...............] 1FB9D220 00000000 00000000 00000000 00000000 [................] Repeat 221 times Corrupt block relative dba: 0x00400194 (file 1, block 404) Fractured block found during buffer read Data in bad block: type: 0 format: 0 rdba: 0x00000000 last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x00000000 check value in block header: 0x0 block checksum disabled Reread of rdba: 0x00400194 (file 1, block 404) found same corrupted data EXEC #2:c=0,e=45015,p=1,cr=1,cu=3,mis=1,r=0,dep=1,og=4,tim=80578919 ERROR #2:err=1578 tim=811723 ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 404) ORA-01110: data file 1: 'C:\ORADATA\INTERLIB\SYSTEM01.DBF' EXEC #1:c=328125,e=3468857,p=52,cr=619,cu=10,mis=0,r=0,dep=0,og=1,tim=82331842 ERROR #1:err=1092 tim=811898
dbv检测坏块
C:\oracle\product\10.2.0\db_1\BIN>dbv FILE = C:\ORADATA\INTERLIB\SYSTEM01.DBF DBVERIFY: Release 10.2.0.3.0 - Production on 星期一 5月 19 15:22:41 2014 Copyright (c) 1982, 2005, Oracle. All rights reserved. DBVERIFY - 开始验证: FILE = C:\ORADATA\INTERLIB\SYSTEM01.DBF 页 403 标记为损坏 Corrupt block relative dba: 0x00400193 (file 1, block 403) Bad header found during dbv: Data in bad block: type: 70 format: 1 rdba: 0x00030030 last change scn: 0x0000.3a6c2e79 seq: 0x0 flg: 0x00 spare1: 0x4c spare2: 0x45 spare3: 0x2 consistency value in tail: 0x00070000 check value in block header: 0x1 block checksum disabled 页 404 流入 - 很可能是介质损坏 Corrupt block relative dba: 0x00400194 (file 1, block 404) Fractured block found during dbv: Data in bad block: type: 0 format: 0 rdba: 0x00000000 last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x00000000 check value in block header: 0x0 block checksum disabled 页 498 流入 - 很可能是介质损坏 Corrupt block relative dba: 0x004001f2 (file 1, block 498) Fractured block found during dbv: Data in bad block: type: 6 format: 2 rdba: 0x004001f2 last change scn: 0x0000.02d7eb9d seq: 0x1 flg: 0x06 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x37290601 check value in block header: 0x6c5e computed block checksum: 0xc2b5 页 61078 流入 - 很可能是介质损坏 Corrupt block relative dba: 0x0040ee96 (file 1, block 61078) Fractured block found during dbv: Data in bad block: type: 6 format: 2 rdba: 0x0040ee96 last change scn: 0x0000.02d5cf11 seq: 0x1 flg: 0x06 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0xa6e30601 check value in block header: 0x51d4 computed block checksum: 0xf572 页 61147 流入 - 很可能是介质损坏 Corrupt block relative dba: 0x0040eedb (file 1, block 61147) Fractured block found during dbv: Data in bad block: type: 6 format: 2 rdba: 0x0040eedb last change scn: 0x0000.02d7f7e6 seq: 0x1 flg: 0x06 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x6ed80601 check value in block header: 0x4ae8 computed block checksum: 0x893f 页 61502 流入 - 很可能是介质损坏 Corrupt block relative dba: 0x0040f03e (file 1, block 61502) Fractured block found during dbv: Data in bad block: type: 6 format: 2 rdba: 0x0040f03e last change scn: 0x0000.02d810dd seq: 0x1 flg: 0x06 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0xaf4c0601 check value in block header: 0xd99b computed block checksum: 0xbf91 页 61989 流入 - 很可能是介质损坏 Corrupt block relative dba: 0x0040f225 (file 1, block 61989) Fractured block found during dbv: Data in bad block: type: 6 format: 2 rdba: 0x0040f225 last change scn: 0x0000.02d80f65 seq: 0x1 flg: 0x06 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0xdff70601 check value in block header: 0x4e2a computed block checksum: 0xd092 DBVERIFY - 验证完成 检查的页总数: 62720 处理的页总数 (数据): 37740 失败的页总数 (数据): 0 处理的页总数 (索引): 7021 失败的页总数 (索引): 0 处理的页总数 (其它): 1784 处理的总页数 (段) : 0 失败的总页数 (段) : 0 空的页总数: 16169 标记为损坏的总页数: 7 流入的页总数: 5 最高块 SCN : 316431787 (0.316431787) C:\oracle\product\10.2.0\db_1\BIN>
根据我们的经验,对数据库启动影响很大的,主要是403,404,498三个坏块,在10201的库中查询得出结论(403,404是rollback,498是seq$).那现在我们可以理解在数据库启动过程中需要undo$表中相关列信息,但是由于rollback对应的block有坏块,使得数据库无法操作update操作,从而无法正常启动.
SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME 2 FROM DBA_EXTENTS A 3 WHERE FILE_ID = &FILE_ID 4 AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1; 输入 file_id 的值: 1 原值 3: WHERE FILE_ID = &FILE_ID 新值 3: WHERE FILE_ID = 1 输入 block_id 的值: 404 原值 4: AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1 新值 4: AND 404 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1 OWNER ------------------------------ SEGMENT_NAME -------------------------------------------------------------------------------- SEGMENT_TYPE TABLESPACE_NAME PARTITION_NAME ------------------ ------------------------------ ------------------------------ SYS SYSTEM ROLLBACK SYSTEM
常用方法
设置event跳过坏块,使用隐含参数屏蔽回滚段
_corrupted_rollback_segments= _SYSSMU1$, _SYSSMU2$, _SYSSMU3$, _SYSSMU4$, _SYSSMU5$, _SYSSMU6$, _SYSSMU7$, _SYSSMU8$, _SYSSMU9$, _SYSSMU10$, _SYSSMU11$, _SYSSMU12$, _SYSSMU13$, _SYSSMU14$, _SYSSMU15$, _SYSSMU16$, _SYSSMU17$, _SYSSMU18$, _SYSSMU19$, _SYSSMU20$ undo_management = MANUAL event = 10231 trace name context forever, level 10, 10232 trace name context forever, level 10, 10233 trace name context forever, level 10 SQL> startup mount pfile='c:\pfile.txt' ORACLE 例程已经启动。 Total System Global Area 452984832 bytes Fixed Size 1291120 bytes Variable Size 201329808 bytes Database Buffers 243269632 bytes Redo Buffers 7094272 bytes 数据库装载完毕。 SQL> recover database; 完成介质恢复。 SQL> alter database open; alter database open upgrade * 第 1 行出现错误: ORA-01092: ORACLE 实例终止。强制断开连接
这里可以看出来,数据库在update undo$的时候因为rollback异常无法通过屏蔽回滚段和跳过坏块的方法来解决。因为是system undo系统块,想法就是从别的相同版本库中拷贝一个相同位置块过来试试看。
拷贝数据块
C:\oracle\product\10.2.0\db_1\BIN>bbed listfile=c:\bbed.txt Password: BBED: Release 2.0.0.0.0 - Limited Production on Mon May 19 10:47:14 2014 (c) Copyright 2000 Oracle Corporation. All rights reserved. ************* !!! For Oracle Internal Use only !!! *************** BBED> set blocksize 8192 BLOCKSIZE 8192 BBED> set mode edit MODE Edit BBED> copy file 2 block 405 to file 1 block 405 File: C:\ORADATA\INTERLIB\SYSTEM01.DBF (1) Block: 405 Offsets: 0 to 511 Dba:0x00400195 ------------------------------------------------------------------------ 02a20000 94014000 d4c50800 00000104 60890000 00000500 36000000 66004545 0000e81f 401fc81e 201ea01d 481da01c f81b501b e81a801a d8197019 08196018 f8179017 e8168016 1816c815 68152815 d8145814 f8137813 18139812 3812f811 a8116811 18119810 3810b80f 580fd80e 780e380e e80da80d 580d180d c80c480c f00b480b e00a380a d0092809 c0087008 1008d007 80074007 f006b006 60062006 d0052805 a8046004 2004d003 d802b401 6001e000 00000000 0a001800 0c001400 1300ff1d 78100000 78100000 02000000 00000000 0a165f45 00002e00 0201e71e 94014000 4c004300 03020100 2b05c000 5e15c000 0c001800 0c002000 29000100 82230000 82230000 02000000 00000000 0a163347 00003200 02014500 94014000 58004600 03022100 f30ec000 f50ec000 00000000 00000000 03000000 0d000e00 0e000e00 06c52008 4b5a2705 c51e3337 5f06c520 084b5a27 06c51f4c 28393906 c520084b 5a2706c5 273e0355 0d000000 ff165f43 0a001800 0c001400 0d004000 82230000 82230000 02000000 00000000 0a163346 00003200 02014500 94014000 58004500 03020100 f30ec000 f80ec000 00000000 00000000 06c52008 4b5a2705 c4201c60 5f000000 0c001800 0c002000 2a000100 82230000 82230000 02000000 00000000 0a163345 00003200 02014500 94014000 58004400 03022100 f30ec000 <32 bytes per line> BBED> sum apply Check value for File 1, Block 405: current = 0x8960, required = 0x8960
重新启动数据库
SQL> startup mount pfile='c:\pfile.txt' ORACLE 例程已经启动。 Total System Global Area 452984832 bytes Fixed Size 1291120 bytes Variable Size 201329808 bytes Database Buffers 243269632 bytes Redo Buffers 7094272 bytes 数据库装载完毕。 SQL> recover database; 完成介质恢复。 SQL> alter database open; alter database open upgrade * 第 1 行出现错误: ORA-01092: ORACLE 实例终止。强制断开连接
分析alert日志
Mon May 19 10:54:05 2014 Recovery of Online Redo Log: Thread 1 Group 3 Seq 13 Reading mem 0 Mem# 0: C:\ORADATA\INTERLIB\REDO03.LOG Block recovery stopped at EOT rba 13.5.16 Block recovery completed at rba 13.5.16, scn 0.316366922 Doing block recovery for file 1 block 9 Block recovery from logseq 13, block 3 to scn 316366921 Mon May 19 10:54:05 2014 Recovery of Online Redo Log: Thread 1 Group 3 Seq 13 Reading mem 0 Mem# 0: C:\ORADATA\INTERLIB\REDO03.LOG Block recovery completed at rba 13.5.16, scn 0.316366922 Mon May 19 10:54:07 2014 Errors in file c:\oracle\product\10.2.0\admin\interlib\udump\interlib_ora_1208.trc: ORA-00604: 递归 SQL 级别 1 出现错误 ORA-00607: 当更改数据块时出现内部错误 ORA-00600: 内部错误代码, 参数: [4193], [102], [58], [], [], [], [], [] Error 604 happened during db open, shutting down database USER: terminating instance due to error 604
这里是因为我们拷贝了一个其他库的undo段过来,然后数据库启动的时候首先使用到该undo块和rollback segment header 不匹配,所以通过通过修改undo header 来修复,使用bbed修改段头信息,因为在以前的文章中描述过,在此不再重复,具体参考:使用bbed解决ORA-00607/ORA-00600[4194]故障启动数据库
SQL> startup mount pfile='c:\pfile.txt' ORACLE 例程已经启动。 Total System Global Area 452984832 bytes Fixed Size 1291120 bytes Variable Size 201329808 bytes Database Buffers 243269632 bytes Redo Buffers 7094272 bytes 数据库装载完毕。 SQL> recover database; 完成介质恢复。 SQL> alter database open upgrade; 数据库已更改。
但是因为seq$有坏块,数据库启动后,如果使用非系统认证登录数据库会报如下错误
C:\oracle\product\10.2.0\db_1\BIN>sqlplus interlib/oracle SQL*Plus: Release 10.2.0.3.0 - Production on 星期一 5月 19 16:29:29 2014 Copyright (c) 1982, 2006, Oracle. All Rights Reserved. ERROR: ORA-00600: 内部错误代码, 参数: [6807], [AUDSES$], [144], [], [], [], [], [] 请输入用户名: C:\oracle\product\10.2.0\db_1\BIN>sqlplus / as sysdba SQL*Plus: Release 10.2.0.3.0 - Production on 星期一 5月 19 16:30:08 2014 Copyright (c) 1982, 2006, Oracle. All Rights Reserved. 连接到: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production With the Partitioning, OLAP and Data Mining options SQL> select object_type from dba_objects where object_name='AUDSES$'; OBJECT_TYPE ------------------- SEQUENCE
因为seq$有坏块导致该问题,因为该数据库需要重建,使用exp导出来数据,然后重建完成相关工作
ORA-607/ORA-600[4194]不一定是重大灾难
以前解决过ORA-607/ORA-600[4194]和模拟过ORA-607/ORA-600[4194]错误,所以固定思维任务ORA-607/ORA-600[4194]可能就是重大灾难,通过这个案例来说明ORA-607/ORA-600[4194]可能也就是一个常规的不能再常规的错误:有一网友数据库因意外关闭电源导致启动过程出现ORA-00607/ORA-00600[4194]/ORA-00600[4097]的错误,使得数据库启动失败.
SMON: enabling tx recovery Fri Aug 31 23:14:08 2012 Database Characterset is ZHS16GBK replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC QMNC started with pid=19, OS id=15619 Fri Aug 31 23:14:10 2012 Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc: ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], [] Fri Aug 31 23:14:12 2012 Non-fatal internal error happenned while SMON was doing logging scn->time mapping. SMON encountered 1 out of maximum 100 non-fatal internal errors. Fri Aug 31 23:14:12 2012 Completed: alter database open Fri Aug 31 23:14:14 2012 Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc: ORA-01595: error freeing extent (2) of rollback segment (4)) ORA-00607: Internal error occurred while making a change to a data block ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], [] Fri Aug 31 23:29:41 2012 Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc: ORA-00607: Internal error occurred while making a change to a data block ORA-00600: internal error code, arguments: [4194], [17], [10], [], [], [], [], [] Fri Aug 31 23:29:43 2012 Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc: ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], [] Fri Aug 31 23:29:44 2012 Errors in file /u01/oradata/orcl/bdump/orcl_pmon_15577.trc: ORA-00474: SMON process terminated with error Fri Aug 31 23:29:44 2012 PMON: terminating instance due to error 474 Instance terminated by PMON, pid = 15577
通过alert日志可以定位到SMON_SCN_TIME表或者其回滚操作可能异常,结合alert和trace分析,发现这次错误的操作主要sql语句为:
ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], [] Current SQL statement for this session: UPDATE SYS.COL_USAGE$ SET EQUALITY_PREDS = EQUALITY_PREDS + DECODE(BITAND(:FLAG, 1), 0, 0, 1), EQUIJOIN_PREDS = EQUIJOIN_PREDS + DECODE(BITAND(:FLAG, 2), 0, 0, 1), NONEQUIJOIN_PREDS = NONEQUIJOIN_PREDS + DECODE(BITAND(:FLAG, 4), 0, 0, 1), RANGE_PREDS = RANGE_PREDS + DECODE(BITAND(:FLAG, 8), 0, 0, 1), LIKE_PREDS = LIKE_PREDS + DECODE(BITAND(:FLAG, 16), 0, 0, 1), NULL_PREDS = NULL_PREDS + DECODE(BITAND(:FLAG, 32), 0, 0, 1), TIMESTAMP = :TIME WHERE OBJ# = :OBJN AND INTCOL# = :COLN ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [4194], [17], [10], [], [], [], [], [] Current SQL statement for this session: UPDATE SYS.MON_MODS$ SET INSERTS = INSERTS + :INS, UPDATES = UPDATES + :UPD, DELETES = DELETES + :DEL, FLAGS = (DECODE(BITAND(FLAGS, :FLAG), :FLAG, FLAGS, FLAGS + :FLAG)), DROP_SEGMENTS = DROP_SEGMENTS + :DROPSEG, TIMESTAMP = :TIME WHERE OBJ# = :OBJN ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], [] Current SQL statement for this session: INSERT INTO SMON_SCN_TIME (THREAD, TIME_MP, TIME_DP, SCN, SCN_WRP, SCN_BAS, NUM_MAPPINGS, TIM_SCN_MAP) VALUES (0, :1, :2, :3, :4, :5, :6, :7)
这里主要涉及到对oracle的三张表的操作
COL_USAGE$:主要是在收集统计信息的时候作为是否需要收集列直方图信息参考
MON_MODS$:Oracle主要利用该表来记录那些表的数据发生改变,方便收集统计信息
SMON_SCN_TIME:记录SCN和TIME的对应关系
通过这里的分析可以确定这三张表中的数据对于数据库来说不是致命的基表信息,在数据库运行过程中可以清理掉这些信息,最多就是因为数据库性能的下降或者SCN和TIME互转功能不完善.
解决思路
完整的undo异常处理顺序
1.从alert中可以看出来数据库是在open之后由于SMON回滚到上述几条sql异常导致数据库down,所以可以尝试使用system回滚段启动数据库,看看是否可以屏蔽相关问题
2.如果方法1不可行,那使用event屏蔽smon对回滚段的相关操作,使得数据库正常启动
3.如果由于存在特殊事务,event无法屏蔽,尝试使用隐含参数处理该问题
4.如果隐含参数尚无法解决给问题考虑使用bbed
5.如果bbed不能解决,那只能选择dul或者其类似工具处理
这个案例中我们明确的看到是因为上面的三条sql回滚异常出现问题导致,对于这样的问题,经过测试使用方法1和2都能够顺利解决问题(open库之后需要重建undo,删除有问题undo表空间,修改参数[可能包括event],切换undo表空间).因为遇到几次ORA-607/ORA-600[4194]是因为system rollback损坏导致,所以这次开始也认为是一次比较复杂的恢复,最后证明这次是一种非常常规的恢复.对于ORACLE的数据库恢复有经验可能会比较快的定位问题,但是如果按照固定的思路去想可能会让自己走进死胡同.