标签归档:ORA-600 4137

记录一次200T的数据库恢复经历

有一个客户恢复请求,6个节点11.2.0.3 RAC,非归档模式,数据量近200T
df_size


由于存储掉电导致数据库6个节点全部宕机,恢复硬件之后,数据库无法正常启动,报错如下:

SQL> recover database;
ORA-00279: change 318472018583 generated at 05/04/2019 17:58:05 needed for
thread 4
ORA-00289: suggestion :
/u01/app/oracle/product/11.2.0/db_1/dbs/arch4_322810_870181839.dbf
ORA-00280: change 318472018583 for thread 4 is in sequence #322810

Wed Aug 28 11:19:55 2019
ALTER DATABASE RECOVER  DATABAE 
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 14 Seq 552 Reading mem 0
  Mem# 0: +REDO/xff/log2.ora
Recovery of Online Redo Log: Thread 2 Group 15 Seq 126 Reading mem 0
  Mem# 0: +REDO/xff/log3.ora
Recovery of Online Redo Log: Thread 3 Group 18 Seq 122 Reading mem 0
  Mem# 0: +REDO/xff/log6.ora
ORA-279 signalled during: ALTER DATABASE RECOVER  database  ...
Wed Aug 28 11:21:31 2019
ALTER DATABASE RECOVER CANCEL 
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL 

数据库恢复需要thread 4 sequence #322810,查询redo信息
redo


redo已经被覆盖,数据库无法通过正常途径恢复实现数据库open,尝试屏蔽一致性强制拉库操作后

Wed Aug 28 12:40:15 2019
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_smon_51338.trc  (incident=244209):
ORA-00600: internal error code, arguments: [4137], [44.47.613406], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_244209/xff1_smon_51338_i244209.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Wed Aug 28 12:40:16 2019
ORACLE Instance xff1 (pid = 26) - Error 600 encountered while recovering transaction (44, 47).
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_smon_51338.trc:
ORA-00600: internal error code, arguments: [4137], [44.47.613406], [0], [0], [], [], [], [], [], [], [], []
Wed Aug 28 12:40:20 2019
Exception[type: SIGSEGV,Address not mapped to object][ADDR:0x5122000000C8][PC:0xE1B4D3,ktugru()+87][flags:0x0,count:1]
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_p086_54066.trc  (incident=245017):
ORA-07445:exception encountered:core dump [ktugru()+87][SIGSEGV][ADDR:0x5122000000C8][Address not mapped to object]
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_245017/xff1_p086_54066_i245017.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Aug 28 12:40:20 2019
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_p000_53873.trc  (incident=244305):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_244305/xff1_p000_53873_i244305.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

提示undo异常,屏蔽回滚段之后,数据库正常打开没有任何报错信息

Wed Aug 28 12:57:15 2019
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
[57676] Successfully onlined Undo Tablespace 22.
Undo initialization finished serial:0 start:2386111306 end:2386112316 diff:1010 (10 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Wed Aug 28 12:57:17 2019
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:57624 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
No Resource Manager plan active
Starting background process GTX0
Wed Aug 28 12:57:18 2019
GTX0 started with pid=45, OS id=57777 
Starting background process RCBG
Wed Aug 28 12:57:18 2019
RCBG started with pid=46, OS id=57779 
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Aug 28 12:57:19 2019
QMNC started with pid=47, OS id=57788 
Completed: ALTER DATABASE OPEN

后续涉及创建新undo,删除老undo并处理一些类似,基本上恢复正常
OPEN


发表在 非常规恢复 | 标签为 , , | 2 条评论

非归档数据库异常恢复一例

由于存储故障,数据库为非归档模式,通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)收集信息确认数据库redo异常
2


Thu Mar 28 11:36:13 2019
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397869_%u_.arc
Thu Mar 28 11:36:13 2019
Errors with log /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397869_%u_.arc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr00_17611.trc:
ORA-00308:cannot open archived log
    '/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397869_%u_.arc'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397869_%u_.arc
Errors with log /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397869_%u_.arc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr00_17611.trc:
ORA-00308:cannot open archived log
   '/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397869_%u_.arc'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...

Thu Mar 28 11:38:44 2019
ALTER DATABASE RECOVER  datafile 5,6  
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 3 Seq 5397870 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/orcl/redo03.log
ORA-279 signalled during: ALTER DATABASE RECOVER  datafile 5,6  ...
Thu Mar 28 11:39:08 2019
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397870_%u_.arc
Errors with log /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_5397870_%u_.arc
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER CANCEL 
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_17605.trc  (incident=365041):
ORA-00600: internal error code, arguments: [3051], [82], [], [], [], [], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE RECOVER CANCEL ...

通过屏蔽一致性,强制打开库,报kgegpa错误,实例启动失败

Database Characterset is ZHS16GBK
No Resource Manager plan active
Exception[type:SIGSEGV, Address not mapped to object][ADDR:0x319C0CF3] [PC:0x2297740, kgegpa()+40] [flags: 0x0, count:1]
Exception[type:SIGSEGV, Address not mapped to object][ADDR:0x319C0CF3] [PC:0x229596B, kgebse()+279][flags: 0x2, count:2]
Exception[type:SIGSEGV, Address not mapped to object][ADDR:0x319C0CF3] [PC:0x229596B, kgebse()+279][flags: 0x2, count:2]
Thu Mar 28 11:43:15 2019
PMON (ospid: 17939): terminating the instance due to error 397
Instance terminated by PMON, pid = 17939

处理上述错误相关undo,启动数据库报ORA-00600 4193,ORA-00600 4137, ORA-00600 6006

Thu Mar 28 11:50:37 2019
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p001_18267.trc  (incident=373059):
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_373059/orcl_p001_18267_i373059.trc
Stopping background process MMON
Trace dumping is performing id=[cdmp_20190328115038]
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_18247.trc  (incident=372995):
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_372995/orcl_smon_18247_i372995.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_18247.trc  (incident=372996):
ORA-00600: internal error code, arguments: [4137], [34.22.4206895], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_372996/orcl_smon_18247_i372996.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_18263.trc  (incident=373044):
ORA-00600: internal error code, arguments: [4193], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_373044/orcl_ora_18263_i373044.trc
ORACLE Instance orcl (pid = 16) - Error 600 encountered while recovering transaction (34, 22).
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_18247.trc:
ORA-00600: internal error code, arguments: [4137], [34.22.4206895], [0], [0], [], [], [], [], [], [], [], []

通过重建undo,相关报错消失,安排数据导出重建库

发表在 Oracle备份恢复 | 标签为 , , , | 评论关闭

ORA-600 4194/ORA-600 4193/ORA-600 4137故障解决

对于常见的undo异常错误,ORA-600 4193,ORA-600 4194,ORA-600 4137等错误的处理一般步骤.
适用版本

Oracle Database - Enterprise Edition - Version 9.2.0.1 to 11.2.0.4 [Release 9.2 to 11.2]
Information in this document applies to any platform.

报错现象

The following error is occurring in the alert.log right before the database crashes.

ORA-00600: internal error code, arguments: [4194], [#], [#], [], [], [], [], []

This error indicates that a mismatch has been detected between redo records and rollback (undo) records.

ARGUMENTS:

Arg [a] - Maximum Undo record number in Undo block
Arg [b] - Undo record number from Redo block

Since we are adding a new undo record to our undo block, we would expect that the new record number
 is equal to the maximum record number in the undo block plus one. Before Oracle can add 
a new undo record to the undo block it validates that this is correct. If this validation fails,
 then an ORA-600 [4194] will be triggered.

报错原因

This also can be cause by the following defect

Bug 8240762 Abstract: Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] after SHRINK

Details: 
Undo corruption may be caused after a shrink and the same undo block may be used 
for two different transactions causing several internal errors like:
ORA-600 [4193] / ORA-600 [4194] for new transactions
ORA-600 [4137] for a transaction rollback

处理步骤

Best practice to create a new undo tablespace.
This method includes segment check.

Create pfile from spfile to edit
>create pfile from spfile;

1. Shutdown the instance

2. set the following parameters in the pfile
    undo_management = manual
    event = '10513 trace name context forever, level 2'

3. >startup restrict pfile=<initsid.ora>

4. >select tablespace_name, status, segment_name from dba_rollback_segs where status != 'OFFLINE';

This is critical - we are looking for all undo segments to be offline - System will always be online.

If any are 'PARTLY AVAILABLE' or 'NEEDS RECOVERY' - Please open an issue with Oracle Support or update the current SR.

If all offline then continue to the next step

5. Create new undo tablespace - example
>create undo tablespace <new undo tablespace> datafile <datafile> size 2000M;

6. Drop old undo tablespace
>drop tablespace <old undo tablespace> including contents and datafiles;

7. >shutdown immediate;

8 >startup nomount;  --> Using your Original spfile

9 modify the spfile with the new undo tablespace name

  Alter system set undo_tablespace = '<new tablespace created in step 5>' scope=spfile;

10. >shutdown immediate;

11. >startup;  --> Using spfile
 


The reason we create a new undo tablespace first is to use new undo segment numbers
 that are higher then the current segments being used.
This way when a transaction goes to do block clean-out 
the reference to that undo segment does not exist and continues with the block clean-out.

参考:tep by step to resolve ORA-600 4194 4193 4197 on database crash (Doc ID 1428786.1)

发表在 Oracle备份恢复 | 标签为 , , , , | 评论关闭