联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
案例背景:在我接手这个库之前,因为某种原因sysaux表空间的数据文件离线,该库非归档模式,无备份
一、sysaux数据文件离线原因
Mon Jun 7 03:03:22 2010 KCF: write/open error block=0x67009 online=1 file=3 /opt/app/oracle/oradata/BAS/sysaux01.dbf error=27072 txt: 'Linux-x86_64 Error: 5: Input/output error Additional information: 4 Additional information: 421897 Additional information: -1' Automatic datafile offline due to write error on file 3: /opt/app/oracle/oradata/BAS/sysaux01.dbf
因为该数据库是非归档模式,估计以前的dba也是一段时间后发现sysaux被离线,因为不是归档模式,无法恢复,就一直放置着,让库处于这样的状态中。
二、sysaux数据文件online
1、使用bbed修改sysaux数据文件的scn,见:
bbed 修改datafile header
Oracle Recovery Tools快速解决sysaux文件不能online问题
2、尝试online过程日志如下
Sat Dec 17 19:33:36 2011 ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964. Sat Dec 17 19:33:36 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc: ORA-00376: file 3 cannot be read at this time ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf' Sat Dec 17 19:33:37 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 4571) ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf' Sat Dec 17 19:38:38 2011 ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964. Sat Dec 17 19:38:38 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc: ORA-00376: file 3 cannot be read at this time ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf' Sat Dec 17 19:38:38 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 4571) ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf' Sat Dec 17 19:39:47 2011 ALTER DATABASE RECOVER datafile 3 Sat Dec 17 19:39:47 2011 Media Recovery Start parallel recovery started with 7 processes Sat Dec 17 19:39:47 2011 Recovery of Online Redo Log: Thread 1 Group 6 Seq 13545 Reading mem 0 Mem# 0 errs 0: /opt/app/oracle/oradata/BAS/redo0602.log Mem# 1 errs 0: /opt/app/oracle/oradata/BAS/redo0601.log Sat Dec 17 19:39:47 2011 Recovery of Online Redo Log: Thread 1 Group 7 Seq 13546 Reading mem 0 Mem# 0 errs 0: /opt/app/oracle/oradata/BAS/redo0702.log Mem# 1 errs 0: /opt/app/oracle/oradata/BAS/redo0701.log Sat Dec 17 19:39:47 2011 Media Recovery Complete (BAS) Completed: ALTER DATABASE RECOVER datafile 3 Sat Dec 17 19:39:58 2011 alter database datafile 3 online Sat Dec 17 19:39:58 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc: ORA-01157: cannot identify/lock data file 3 - see DBWR trace file ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf' Sat Dec 17 19:39:58 2011 Completed: alter database datafile 3 online
这个过程虽然在sqlplus中提示online成功,但是alert中的错误警告,以及smon进程占用100%的cup资源,最终导致数据库hang住。
2、分析alert日志和trace文件
alert日志中 Sat Dec 17 19:38:38 2011 ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964. bas_smon_27197.trc中 [oracle@bas bdump]$ more /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Data Mining options ORACLE_HOME = /opt/app/oracle/product/10.2.0/db_1 System name: Linux Node name: bas Release: 2.6.9-78.ELsmp Version: #1 SMP Wed Jul 9 15:46:26 EDT 2008 Machine: x86_64 Instance name: BAS Redo thread mounted by this instance: 1 Oracle process number: 17 Unix process pid: 27197, image: oracle@bas (SMON) *** SERVICE NAME:() 2011-12-17 19:23:33.179 *** SESSION ID:(5490.1) 2011-12-17 19:23:33.179 SMON: about to recover undo segment 70 ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964. *** 2011-12-17 19:23:33.188 ksedmp: internal or fatal error ORA-00376: file 3 cannot be read at this time ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'
通过这些证明smon在利用undo segment 70在回滚sysaux中的内容,但是因为某种原因该回滚段异常,不能进行回滚,是的smon一直尝试回滚,但是始终不成功,最后数据库hang住,需要解决sysaux的问题,首先需要解决这个回滚段问题(删除异常回滚段)
3、删除异常回滚段,online datafile 3
强制kill掉smon进程,重启数据库 [oracle@bas bdump]$ sqlplus / as sysdba SQL*Plus: Release 10.2.0.1.0 - Production on Sat Dec 17 19:52:14 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. Connected to an idle instance. SQL> startup mount ORACLE instance started. Total System Global Area 8589934592 bytes Fixed Size 2034520 bytes Variable Size 1124074664 bytes Database Buffers 7398752256 bytes Redo Buffers 65073152 bytes Database mounted. --为了数据库不hang掉,先offline datafile 3 SQL> alter database datafile 3 offline; Database altered. SQL> select segment_name,status from dba_rollback_segs; select segment_name,status from dba_rollback_segs * ERROR at line 1: ORA-01219: database not open: queries allowed on fixed tables/views only SQL> alter database open; Database altered. SQL> select segment_name,status from dba_rollback_segs where status='NEEDS RECOVERY'; SEGMENT_NAME STATUS ------------------------------ ---------------- _SYSSMU70$ NEEDS RECOVERY SQL> create pfile='/tmp/pfile' from spfile; File created. 关闭数据库,在pfile中增加 *._corrupted_rollback_segments=(_SYSSMU70$) SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down. SQL> startup pfile='/tmp/pfile' ORACLE instance started. Total System Global Area 8589934592 bytes Fixed Size 2034520 bytes Variable Size 1124074664 bytes Database Buffers 7398752256 bytes Redo Buffers 65073152 bytes Database mounted. Database opened. SQL> drop rollback segment "_SYSSMU70$"; Rollback segment dropped. SQL> alter database datafile 3 online; alter database datafile 3 online * ERROR at line 1: ORA-01113: file 3 needs media recovery ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf' SQL> recover datafile 3 ; Media recovery complete. SQL> alter database datafile 3 online; Database altered.
三、解决坏块问题
1、alert日志中坏块记录
Sat Dec 17 20:33:31 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27772.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 4571) ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf' Sat Dec 17 20:33:54 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28027.trc: Sat Dec 17 20:43:32 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27772.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 4571) ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'
2、查询坏块对象
SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME 2 FROM DBA_EXTENTS A WHERE FILE_ID = &FILE_ID AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1; 3 4 Enter value for file_id: 1 old 3: WHERE FILE_ID = &FILE_ID new 3: WHERE FILE_ID = 1 Enter value for block_id: 4571 old 4: AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1 new 4: AND 4571 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1 OWNER ------------------------------ SEGMENT_NAME -------------------------------------------------------------------------------- SEGMENT_TYPE TABLESPACE_NAME PARTITION_NAME ------------------ ------------------------------ ------------------------------ SYS SMON_SCN_TIME_SCN_IDX INDEX SYSTEM SQL> alter index SMON_SCN_TIME_SCN_IDX rebulid online; alter index SMON_SCN_TIME_SCN_IDX rebulid online * ERROR at line 1: ORA-02243: invalid ALTER INDEX or ALTER MATERIALIZED VIEW option SQL> select table_name from dba_indexes where index_name='SMON_SCN_TIME_SCN_IDX'; TABLE_NAME ------------------------------ SMON_SCN_TIME
3、解决坏块问题
SQL> truncate table SMON_SCN_TIME; truncate table SMON_SCN_TIME * ERROR at line 1: ORA-03292: Table to be truncated is part of a cluster SQL> truncate cluster SMON_SCN_TIME; truncate cluster SMON_SCN_TIME * ERROR at line 1: ORA-00943: cluster does not exist SQL> SELECT dbms_metadata.get_ddl('TABLE','SMON_SCN_TIME','SYS') FROM dual; DBMS_METADATA.GET_DDL('TABLE','SMON_SCN_TIME','SYS') -------------------------------------------------------------------------------- CREATE TABLE "SYS"."SMON_SCN_TIME" ( "THREAD" NUMBER, "TIME_MP" NUMBER, "TIME_DP" DATE, "SCN_WRP" NUMBER, "SCN_BAS" NUMBER, "NUM_MAPPINGS" NUMBER, "TIM_SCN_MAP" RAW(1200), "SCN" NUMBER DEFAULT 0, "ORIG_THREAD" NUMBER DEFAULT 0 /* for downgrade */ ) CLUSTER "SYS"."SMON_SCN_TO_TIME" ("THREAD") SQL> truncate cluster smon_scn_to_time; Cluster truncated. SQL> alter system flush buffer_cache; System altered.
四、解决AUTO_SPACE_ADVISOR_JOB引起bug
Sat Dec 17 21:00:38 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc: ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], [] Sat Dec 17 21:00:41 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc: ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], [] Sat Dec 17 21:00:44 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc: ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], [] Sat Dec 17 21:00:47 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc: ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], [] Sat Dec 17 21:00:50 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc: ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], [] Sat Dec 17 21:00:54 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc: ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], [] Sat Dec 17 21:00:57 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc: ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], [] Sat Dec 17 21:01:00 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc: ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], [] Sat Dec 17 21:01:03 2011 Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc: ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
查看MOS发现(430223.1,785899.1)Segment advisor带来的buffer坏块,可以禁用AUTO_SPACE_ADVISOR_JOB并清空buffer来解决,最终解决办法,升级数据库
SQL> exec dbms_scheduler.disable('AUTO_SPACE_ADVISOR_JOB'); PL/SQL procedure successfully completed. SQL> alter system flush buffer_cache; System altered.
至此这次数据库sysaux数据文件异常恢复完全结束。再次提醒各位,数据库一定要做好备份和归档工作。
11.2.0.3中的SMON_SCN_TIME表的cluster表为
truncate 命令改为
truncate CLUSTER “SYS”.”SMON_SCN_TO_TIME_AUX”;
SMON_SCN_TIME是Oracle数据库的系统表,用以进行辅助恢复等功能。
SMON_SCN_TIME由smon来维护每5分钟被更新一次,根据数据库的版本不同保留的条数不同(9i中1440条,10g中最多14400条),当这个表中数据出现异常时,直接truncate掉该表对应的cluster,如果是index出现异常可以删除index,然后重建