标签归档:ORA-01555

在数据库open过程中常遇到ORA-01555汇总

在数据库open的过程中,select ctime, mtime, stime from obj$ where obj# = :1语句报ORA-01555错误,数据库无法正常open
一般情况下会报某个回滚段,但是这里ORA-01555: snapshot too old: rollback segment number 0 with name “SYSTEM” too small这里直接报了system(系统回滚段),属于少见情况

Fri Jun 26 11:47:31 2015
SMON: enabling cache recovery
Fri Jun 26 11:47:31 2015
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0b41.37629378):
Fri Jun 26 11:47:31 2015
select ctime, mtime, stime from obj$ where obj# = :1
Fri Jun 26 11:47:31 2015
Errors in file /orabin/admin/doocrm/udump/doocrm_ora_5046722.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 0 with name "SYSTEM" too small
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 5046722
ORA-1092 signalled during: alter database open resetlogs...

在数据库open的过程中,select increment$,minvalue,maxvalue,cycle#,order$,cache,highwater,audit$,flags from seq$ where obj#=:1语句上报ORA-01555,导致数据库open失败
ORA-01555



在数据库open过程中,select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1语句导致数据库open失败.

ARC0: Becoming the 'no SRL' ARCH
Sun Jun 28 16:08:22 2015
ARC1: Becoming the heartbeat ARCH
Sun Jun 28 16:08:22 2015
SMON: enabling cache recovery
Sun Jun 28 16:08:22 2015
ORA-01555 caused by SQL statement below (SQL ID: 7bd391hat42zk, Query Duration=0 sec, SCN: 0x0d27.0a1ce29d):
Sun Jun 28 16:08:22 2015
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,
   DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
Sun Jun 28 16:08:22 2015
Errors in file /oracle/app/oracle/admin/ibsscrm/udump/xxxx_ora_30212428.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 0 with name "SYSTEM" too small
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 30212428
ORA-1092 signalled during: alter database open...

出现这类问题,一般是由于obj$,seq$,undo$等基表上对象scn大于数据库当前scn或者是由于这些表上有事务未提交,出现上述两种情况,数据库需要找对应的undo的回滚段中记录,而此时对应的回滚段异常(或者是由于redo未进行正常前滚,导致上述对象或者回滚段记录不正常),从而出现类似情况,一般出现此类情况,可以通过10046定位到block,然后故障原因采用bbed修改scn或者bbed提交事务来解决此类问题.
最近两年的恢复中又遇到一些ora-00704 ora-00604 ora-01555的错误,导致数据库无法正常open,对其进行补充,具体参见:数据库open过程遭遇ORA-1555对应sql语句补充

发表在 Oracle | 标签为 , , , , , | 评论关闭

某集团ebs数据库redo undo丢失导致悲剧

某集团的ebs系统因磁盘空间不足把redo和undo存放到raid 0之上,而且该库无任何备份。最终悲剧发生了,raid 0异常导致redo undo全部丢失,数据库无法正常启动(我接手之时数据库已经resetlogs过,但是未成功)

Sun Jul 27 11:31:27 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sun Jul 27 11:31:27 2014
Database Characterset is ZHS16GBK
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_663670.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 41 cannot be read at this time
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 663670
ORA-1092 signalled during: ALTER DATABASE OPEN...

查询相关文件状态发现,undo表空间文件丢失,被offline处理
df_status
因为以前alert日志被清理,通过这里大概猜测是offline丢失的undo文件,然后resetlogs了数据库,现在处理方式为
使用_corrupted_rollback_segments屏蔽回滚段,然后尝试启动数据库

Tue Jul 29 11:40:39 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue Jul 29 11:40:39 2014
Database Characterset is ZHS16GBK
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_585786.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 585786
ORA-1092 signalled during: alter database open...

该错误是由于数据库启动需要找到对应的回滚段,但是由于undo异常导致该回滚段无法找到,因此出现该错误,解决方法是通过修改数据scn,让其不找回滚段,从而屏蔽该错误.数据库启动后,删除undo重新创建新undo

Tue Jul 29 15:59:22 2014
drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 41 failed verification check
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 42 failed verification check
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo1.dbf
Tue Jul 29 15:59:23 2014
Completed: drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:56 2014
create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 15:59:57 2014
Completed: create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 16:00:03 2014
alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G
Completed: alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G

业务运行过程中,数据库报大量ORA-600 4097,ORA-600 kdsgrp1,ORA-600 kcfrbd_3错误

Tue Jul 29 16:07:03 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:07:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:10:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:10:07 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:12:45 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_m000_880692.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:21:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:37 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:56 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:18 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:28 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1105950.trc:
ORA-00600: 内部错误代码, 参数: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:22:33 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1159232.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [42], [61235], [1], [12800], [12800], [], []

出现该错误有几个原因和解决方法:
ORA-600 kdsgrp1 是因为相关坏块引起(tab,index,memory,cr block等),结合日志分析对象异常原因,根据具体情况确定对象然后选择合适处理方案(具体参考NOTE:1332252.1)
ORA-600 4097 由于数据库异常关闭然后open,创建回滚段,可能触发bug导致该问题(虽然说在当前版本修复,但是实际处理我确实按照NOTE:1030620.6解决)
ORA-600 kcfrbd_3 有事务的block被访问之后,根据回滚槽信息定位到相关回滚段,而正好新建的回滚段信息又和以前的名字编号一致,从而反馈出来是数据文件大小不够,从而出现该错误(具体参考NOTE:601798.1)
最终该数据库虽然恢复了,抢救了大量数据,但是对于ebs系统来说,丢失redo和undo数据的损失还是巨大的.再次温馨提示:数据库的redo,undo也很重要,数据库的备份更加重要

发表在 Oracle备份恢复, 非常规恢复 | 标签为 , , , , , , , | 3 条评论

记录一次ORA-00316 ORA-00312 redo异常恢复

正常运行的数据库报突然报ORA-00316: log 1 of thread 1, type 286 in header is not log file,异常终止

Fri Feb 21 08:44:42 2014
Thread 1 advanced to log sequence 591 (LGWR switch)
  Current log# 1 seq# 591 mem# 0: J:\ORADATA\ORCL\REDO01.LOG
Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_lgwr_10312.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'

Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_lgwr_10312.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'

Fri Feb 21 15:31:20 2014
LGWR: terminating instance due to error 316
Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_j001_11328.trc:
ORA-00316: log  of thread , type  in header is not log file

Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_j000_14116.trc:
ORA-00316: 日志  (用于线程 ) 标头中的类型  不是日志文件

Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_dbw1_8964.trc:
ORA-00316: log  of thread , type  in header is not log file

Fri Feb 21 15:31:22 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_dbw0_10592.trc:
ORA-00316: log  of thread , type  in header is not log file
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) 系统找不到指定的文件。
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) 系统找不到指定的文件。

Fri Feb 21 15:31:41 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_11112.trc:
ORA-00316: log  of thread , type  in header is not log file

Fri Feb 21 15:31:41 2014
Instance terminated by LGWR, pid = 10312

数据库启动报ORA-00316错误

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'

alert日志信息,报ORA-00316 ORA-00312

Sun Feb 23 13:54:08 2014
Started redo scan
Sun Feb 23 13:54:08 2014
Errors in file e:\oracle\product\10.2.0\admin\ora10g\udump\orcl_ora_5544.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'

Sun Feb 23 13:54:08 2014
Aborting crash recovery due to error 316
Sun Feb 23 13:54:08 2014
Errors in file e:\oracle\product\10.2.0\admin\ora10g\udump\orcl_ora_5544.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'

ORA-316 signalled during: alter database open...

通过dump redo header可以看出来redo header完全混乱了,里面很多数据文件内容在里面,初步估计系统或者硬件有问题(不稳定)导致该问题

LOG FILE #1: 
  (name #3) J:\ORADATA\ORCL\REDO01.LOG
 Thread 1 redo log links: forward: 2 backward: 0
 siz: 0x7d000 seq: 0x0000024f hws: 0x1 bsz: 512 nab: 0xffffffff flg: 0xa dup: 1
 Archive links: fwrd: 0 back: 0 Prev scn: 0x0000.4f4a1951
 Low scn: 0x0000.4f4e5400 02/21/2014 08:44:42
 Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
 FILE HEADER:
	Software vsn=4280360459=0xff211e0b, Compatibility Vsn=103886850=0x6313002
	Db ID=842019892=0x32303434, Db Name='01a0001'
	Activation ID=1107558657=0x42040101
	Control Seq=842019892=0x32303434, File size=808464688=0x30303130
	File Number=2417, Blksiz=2013738803, File Type=286 UNKNOWN
 descrip:"00-440201|广东省xxxxxx研究院|00014402010037"
 thread: 767 nab: 0x534e4906 seq: 0xbcfebcbd hws: 0xffffffff eot: 55 dis: 66
 reset logs count: 0x7545245 scn: 0x0101.1e097178
 Low scn: 0x4537.2022002c 01/15/2022 20:59:37
 Next scn: 0x3734.46463135 04/30/2017 17:34:49
 Enabled scn: 0x4345.37333038 10/22/2015 02:04:16
 Thread closed scn: 0x3939.43303143 08/25/2024 00:11:31
 Log format vsn: 0x46343835 Disk cksum: 0x1d09 Calc cksum: 0x70ed
 Terminal Recovery Stop scn: 0x3734.35304141
 Terminal Recovery Stamp  03/17/2014 19:59:48
 Most recent redo scn: 0x3636.31303134
 Largest LWN: 758788710 blocks
 Miscellaneous flags: 0x41444534
 Thread internal enable indicator: thr: 263902399, seq: 808464496 scn: 0x3032.30343431

因为当前redo完全损坏,尝试不完全恢复并结合隐含参数(_allow_resetlogs_corruption)拉库,出现错误ORA-00704 ORA-00604 ORA-01555

Sun Feb 23 14:03:37 2014
SMON: enabling cache recovery
Sun Feb 23 14:03:39 2014
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0000.4f4e5405):
Sun Feb 23 14:03:39 2014
select ctime, mtime, stime from obj$ where obj# = :1
Sun Feb 23 14:03:39 2014
Errors in file e:\oracle\product\10.2.0\admin\ora10g\udump\orcl_ora_5504.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 11 with name "_SYSSMU11$" too small

Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704

通过修改scn,让数据库顺利open

发表在 Oracle备份恢复 | 标签为 , , , , , , | 评论关闭