Resolving ORA-600 kcratr_nab_less_than_odr

A friend's database server hit ORA-00600 [kcratr_nab_less_than_odr], and the database could not be opened.
1. Opening the database raises ORA-00600 [kcratr_nab_less_than_odr]

SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [99189],
[43531], [43569], [], [], [], [], [], [], []

2. Check the alert log

Wed Jan 11 13:56:16 2012
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Started redo scan
Completed redo scan
 read 54591 KB redo, 0 data blocks need recovery
Errors in file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc  (incident=818557):
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
Incident details in: d:\dbdms\diag\rdbms\dbdms\dbdms\incident\incdir_818557\dbdms_ora_3936_i818557.trc
Aborting crash recovery due to error 600
Errors in file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
Errors in file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...
Trace dumping is performing id=[cdmp_20120110214555]

3. Check the trace file

Trace file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1 
CPU                 : 2 - type 8664, 2 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:2250M/4060M, Ph+PgF:5868M/8119M 
Instance name: dbdms
Redo thread mounted by this instance: 1
Oracle process number: 17
Windows thread id: 3108, image: ORACLE.EXE (SHAD)
…………………………
WARNING! Crash recovery of thread 1 seq 99189 is
ending at redo block 43531 but should not have ended before
redo block 43569
Incident 826550 created, dump file: d:\dbdms\diag\rdbms\dbdms\dbdms\incident\incdir_826550\dbdms_ora_3108_i826550.trc
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []

ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []

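The alert log and trace file show that crash recovery needed to reach redo block 43569 of thread 1, sequence 99189, but for some reason instance recovery could only apply that log up to redo block 43531. The recovered position falls 38 redo blocks short of the required one, so the database cannot be opened normally.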
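
The four numeric arguments of the error can be read off mechanically. The following is a minimal sketch, not an Oracle tool: `parse_kcratr_args` is a hypothetical helper, and the field names are labels inferred from the trace warning above ("ending at redo block 43531 but should not have ended before redo block 43569"), not official Oracle terminology.

```python
import re
from typing import NamedTuple


class KcratrArgs(NamedTuple):
    # Field names are illustrative labels, inferred from the trace warning.
    thread: int           # redo thread number
    sequence: int         # log sequence number
    recovered_block: int  # redo block where crash recovery ended
    required_block: int   # redo block recovery should have reached


def parse_kcratr_args(message: str) -> KcratrArgs:
    """Extract the four numeric arguments from an
    ORA-00600 [kcratr_nab_less_than_odr] message (hypothetical helper)."""
    if "kcratr_nab_less_than_odr" not in message:
        raise ValueError("not a kcratr_nab_less_than_odr error")
    # Only bracketed integers match; the name and the empty [] are skipped.
    numbers = [int(n) for n in re.findall(r"\[(\d+)\]", message)]
    if len(numbers) < 4:
        raise ValueError("expected at least four numeric arguments")
    return KcratrArgs(*numbers[:4])


msg = (
    "ORA-00600: internal error code, arguments: "
    "[kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], "
    "[], [], [], [], [], [], []"
)
args = parse_kcratr_args(msg)
missing = args.required_block - args.recovered_block
print(f"thread {args.thread}, seq {args.sequence}: "
      f"{missing} redo blocks short of the required end point")
```

For the message from this incident the gap is 43569 - 43531 = 38 redo blocks.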

This Problem is caused by Storage Problem of the Database Files. 
The Subsystem (eg. SAN) crashed while the Database was open. 
The Database then crashed since the Database Files were not accessible anymore. 
This caused a lost Write into the Online RedoLogs and so Instance Recovery is not possible and raising the ORA-600.

4. Solution

SQL> SELECT STATUS FROM V$INSTANCE;

STATUS
------------
MOUNTED

-- Try RECOVER DATABASE directly
SQL> RECOVER DATABASE ;
ORA-00283: recovery session canceled due to errors
ORA-00264: no recovery required
-- Oracle reports that no recovery is required

-- Open the database again: still the kcratr_nab_less_than_odr error
SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [99189],
[43531], [43569], [], [], [], [], [], [], []

-- Try incomplete recovery
SQL> RECOVER DATABASE UNTIL CANCEL;
ORA-10879: error signaled in parallel recovery slave
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: 'D:\DBDMS\DATA\SYSTEM01.DBF'

-- Re-create the control file
SQL> ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS 'D:/1.TXT';

Database altered.

SQL> SHUTDOWN IMMEDIATE;
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.
SQL> STARTUP NOMOUNT;
ORACLE instance started.

Total System Global Area  417546240 bytes
Fixed Size                  2176328 bytes
Variable Size             268438200 bytes
Database Buffers          138412032 bytes
Redo Buffers                8519680 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "DBDMS" NORESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 18688
  7  LOGFILE
  8    GROUP 1 'D:\DBDMS\LOG\REDO01.LOG'  SIZE 50M BLOCKSIZE 512,
  9    GROUP 2 'D:\DBDMS\LOG\REDO02.LOG'  SIZE 50M BLOCKSIZE 512,
 10    GROUP 3 'D:\DBDMS\LOG\REDO03.LOG'  SIZE 50M BLOCKSIZE 512
 11  DATAFILE
 12    'D:\DBDMS\DATA\SYSTEM01.DBF',
 13    'D:\DBDMS\DATA\SYSAUX01.DBF',
 14    'D:\DBDMS\DATA\RBSG01.DBF',
 15    'D:\DBDMS\DATA\DATA01.DBF',
 16    'D:\DBDMS\DATA\INDX01.DBF',
 17    'D:\DBDMS\DATA\DATA02.DBF',
 18    'D:\DBDMS\DATA\DATA03.DBF',
 19    'D:\DBDMS\DATA\DATA04.DBF',
 20    'D:\DBDMS\DATA\INDX02.DBF',
 21    'D:\DBDMS\DATA\SYSTEM02.DBF'
 22  CHARACTER SET ZHS16GBK
 23  ;

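Control file created.

-- Retry recovery
SQL> RECOVER DATABASE ;
Media recovery complete.
SQL> ALTER DATABASE OPEN;

Database altered.
-- The database opens successfully

The key step in this recovery was re-creating the control file, after which a plain RECOVER DATABASE succeeded. If the redo logs had been damaged, it might have been necessary to perform an incomplete recovery and then open the database with RESETLOGS.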

Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.2 - Release: 11.2 to 11.2
Information in this document applies to any platform.

Symptoms
Trying to open a Database after a Crash caused by Storage Problems the Instance Recovery fails with :

ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [219], [25020], [25021], []

The Database can't open at this Point. In the corresponding Tracefile we can find this Error Callstack:

dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=1h50ks4ncswfn) -----
ALTER DATABASE OPEN
----- Call Stack Trace -----
ksedst1 <- ksedst <- dbkedDefDump <- ksedmp <- dbgexPhaseII <- dbgexProcessError <- dbgePostErrorKGE <- kgeasnmierr <- kcratr_odr_check <- kcratr <- kctrec <- kcvcrv <- kcfopd <- adbdrv <- opiexe <- opiosq0 <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <-ssthrdmain <- main <- start

Cause
This Problem is caused by Storage Problem of the Database Files. The Subsystem (eg. SAN) crashed while the Database was open. The Database then crashed since the Database Files were not accessible anymore. This caused a lost Write into the Online RedoLogs and so Instance Recovery is not possible and raising the ORA-600.

There are two possible Solutions:

1. If you could restore your Storage Environment and the Online RedoLogs from the Time of the crash you can try a manual Recovery followed by a RESETLOGS:

SQL> startup mount;
SQL> recover database until cancel using backup controlfile;

-> manually provide Online RedoLog containing the last (current) Sequence when asked, eg.

ORA-00279: change 100000 generated at xx/xx/xxxx xx:xx:xx needed for thread 1
ORA-00289: suggestion : /flash_recovery/archivelog/xxxx_xx_xx/o1_mf_1_100_%u_.arc
ORA-00280: change 100000 for thread 1 is in sequence #100
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/ora/oradata/dbtest/redo04_1.rdo
Log applied.
Media recovery complete.

SQL> alter database open resetlogs;

2. If step1. fails or you don't have the full Set of Files you have to restore and recover the Database from a recent Backup.

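7 comments on "Resolving ORA-600 kcratr_nab_less_than_odr"

惜分飞 says:

January 16, 2012, 16:03

VMCD,
1. At its core, ORA-00600 [kcratr_nab_less_than_odr] means that recovery cannot reach the on-disk RBA. A likely cause is a power loss that left the redo actually present in the log short of the on-disk RBA recorded in the control file. The usual fix is to recover using a backup control file. No backup control file was available here, so the only option was to re-create the control file.

2. Given the messages returned by the complete and incomplete recovery attempts above, re-creating the control file and then retrying recovery was also a reasonable thing to try.
VMCD says:

January 16, 2012, 15:57

What was your purpose in re-creating the control file?
惜分飞 says:

January 11, 2012, 21:42

low RBA: the RBA of the first change made to a data block in the buffer cache. Dirty blocks are ordered by low RBA on the checkpoint queue.
high RBA: the RBA of the most recent change made to a data block in the buffer cache.
checkpoint RBA: the RBA of the first change to the first dirty block on the checkpoint queue (as of the last time the queue was cleaned). All dirty blocks with changes before this RBA have already been written to disk.
on-disk RBA: the address of the last position LGWR has written to in the log file.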

Redo Byte Address (RBA)
the log file sequence number (4 bytes)
the log file block number (4 bytes)
the byte offset into the block at which the redo record starts (2 bytes)
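
The 4 + 4 + 2 byte layout above can be sketched as a 10-byte record. This is an illustration only: the big-endian byte order, the `RBA` class and the `pack_rba`/`unpack_rba` helpers are assumptions made for the example, and the byte offsets of 0 are placeholders, since the error message reports block numbers only.

```python
import struct
from typing import NamedTuple

# Assumption: big-endian, no padding. The text gives field widths only,
# not the byte order Oracle actually uses.
RBA_FORMAT = ">IIH"  # 4-byte sequence, 4-byte block, 2-byte offset


class RBA(NamedTuple):
    sequence: int  # log file sequence number
    block: int     # log file block number
    offset: int    # byte offset of the redo record within the block


def pack_rba(rba: RBA) -> bytes:
    """Serialize an RBA into the assumed 10-byte layout."""
    return struct.pack(RBA_FORMAT, *rba)


def unpack_rba(raw: bytes) -> RBA:
    """Inverse of pack_rba."""
    return RBA(*struct.unpack(RBA_FORMAT, raw))


# Block numbers from this incident; offsets are placeholders.
reached = RBA(99189, 43531, 0)
required = RBA(99189, 43569, 0)

size = struct.calcsize(RBA_FORMAT)
round_trip = unpack_rba(pack_rba(reached))
# Tuples compare field by field: sequence first, then block, then offset.
recovery_fell_short = reached < required
```

Because the fields are ordered from most to least significant, plain tuple comparison gives the natural ordering of positions in the redo stream.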

With respect to a dirty block in the buffer cache, the low RBA is the address of the redo for the first change that was applied to the block since it was last clean, and the high RBA is the address of the redo for the most recent change to have been applied to the block.
Dirty buffers are maintained on the buffer cache checkpoint queues in low RBA order. The checkpoint RBA is the point up to which DBWn has written buffers from the checkpoint queues if incremental checkpointing is enabled — otherwise it is the RBA of last full thread checkpoint. The checkpoint RBA is copied into the checkpoint progress record of the controlfile by the checkpoint heartbeat once every 3 seconds. Instance recovery, when needed, begins from the checkpoint RBA recorded in the controlfile. The target RBA is the point up to which DBWn should seek to advance the checkpoint RBA to satisfy instance recovery objectives.
The on-disk RBA is the point up to which LGWR has flushed the redo thread to the online log files. DBWn may not write a block for which the high RBA is beyond the on-disk RBA. Otherwise transaction recovery (rollback) would not be possible, because the redo needed to undo a change is always in the same redo record as the redo for the change itself.
The term sync RBA is sometimes used to refer to the point up to which LGWR is required to sync the thread. However, this is not a full RBA — only a redo block number is used at this point.
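
The write-ordering rule described above (DBWn may not write a block whose high RBA is beyond the on-disk RBA) can be expressed as a simple comparison. This is a conceptual sketch under stated assumptions, not Oracle's implementation: `RBA` and `dbwn_may_write` are hypothetical names, the sample values are invented, and treating a high RBA equal to the on-disk RBA as writable follows the literal wording "beyond" rather than any documented boundary rule.

```python
from typing import NamedTuple


class RBA(NamedTuple):
    sequence: int  # log file sequence number
    block: int     # log file block number
    offset: int    # byte offset within the block


def dbwn_may_write(high_rba: RBA, on_disk_rba: RBA) -> bool:
    """Return True if the buffer's latest change is not beyond the point
    up to which LGWR has flushed redo (hypothetical helper)."""
    return high_rba <= on_disk_rba


# Invented sample values for illustration.
on_disk = RBA(99189, 43569, 0)
flushed_buffer = RBA(99189, 43500, 16)    # redo already on disk
unflushed_buffer = RBA(99189, 43570, 16)  # redo still in the log buffer

can_write_flushed = dbwn_may_write(flushed_buffer, on_disk)
can_write_unflushed = dbwn_may_write(unflushed_buffer, on_disk)
```

The rule exists so that every change already written to a datafile has its redo on disk. When the redo actually found in the log ends before the recorded on-disk RBA, as in this incident, that guarantee can no longer be verified.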