标签云
asm恢复 bbed bootstrap$ dul In Memory kcbzib_kcrsds_1 kccpb_sanity_check_2 kfed MySQL恢复 ORA-00312 ORA-00607 ORA-00704 ORA-01110 ORA-01555 ORA-01578 ORA-08103 ORA-600 2131 ORA-600 2662 ORA-600 2663 ORA-600 3020 ORA-600 4000 ORA-600 4137 ORA-600 4193 ORA-600 4194 ORA-600 16703 ORA-600 kcbzib_kcrsds_1 ORA-600 KCLCHKBLK_4 ORA-15042 ORA-15196 ORACLE 12C oracle dul ORACLE PATCH Oracle Recovery Tools oracle加密恢复 oracle勒索 oracle勒索恢复 oracle异常恢复 Oracle 恢复 ORACLE恢复 ORACLE数据库恢复 oracle 比特币 OSD-04016 YOUR FILES ARE ENCRYPTED 勒索恢复 比特币加密文章分类
- Others (2)
- 中间件 (2)
- WebLogic (2)
- 操作系统 (102)
- 数据库 (1,671)
- DB2 (22)
- MySQL (73)
- Oracle (1,533)
- Data Guard (52)
- EXADATA (8)
- GoldenGate (21)
- ORA-xxxxx (159)
- ORACLE 12C (72)
- ORACLE 18C (6)
- ORACLE 19C (14)
- ORACLE 21C (3)
- Oracle 23ai (7)
- Oracle ASM (65)
- Oracle Bug (8)
- Oracle RAC (52)
- Oracle 安全 (6)
- Oracle 开发 (28)
- Oracle 监听 (28)
- Oracle备份恢复 (560)
- Oracle安装升级 (92)
- Oracle性能优化 (62)
- 专题索引 (5)
- 勒索恢复 (78)
- PostgreSQL (18)
- PostgreSQL恢复 (6)
- SQL Server (27)
- SQL Server恢复 (8)
- TimesTen (7)
- 达梦数据库 (2)
- 生活娱乐 (2)
- 至理名言 (11)
- 虚拟化 (2)
- VMware (2)
- 软件开发 (37)
- Asp.Net (9)
- JavaScript (12)
- PHP (2)
- 小工具 (20)
-
最近发表
- Kylin Linux 安装19c
- ORA-600 krse_arc_complete.4
- Oracle 19c 202410补丁(RUs+OJVM)
- ntfs MFT损坏(ntfs文件系统故障)导致oracle异常恢复
- .mkp扩展名oracle数据文件加密恢复
- 清空redo,导致ORA-27048: skgfifi: file header information is invalid
- A_H_README_TO_RECOVER勒索恢复
- 通过alert日志分析客户自行对一个数据库恢复的来龙去脉和点评
- ORA-12514: TNS: 监听进程不能解析在连接描述符中给出的SERVICE_NAME
- ORA-01092 ORA-00604 ORA-01558故障处理
- ORA-65088: database open should be retried
- Oracle 19c异常恢复—ORA-01209/ORA-65088
- ORA-600 16703故障再现
- 数据库启动报ORA-27102 OSD-00026 O/S-Error: (OS 1455)
- .[metro777@cock.li].Elbie勒索病毒加密数据库恢复
- 应用连接错误,初始化mysql数据库恢复
- RAC默认服务配置优先节点
- Oracle 19c RAC 替换私网操作
- 监听报TNS-12541 TNS-12560 TNS-00511错误
- drop tablespace xxx including contents恢复
标签归档:ORA-01555
使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障
以前写过一篇乱用_allow_resetlogs_corruption参数导致悲剧的文章,昨天晚上又遇到一个朋友不谨慎使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障
环境描述
系统环境:solaris
数据库版本:10.2.0.5.7
数据存储方式:ASM
数据量:15T以上
补充事宜:数据库SCN距离headroom只有54天
报ORA-00020错误,实例crash
数据库因为超过了系统的进程数,出现dbwn进程写数据文件异常
Sun Aug 25 16:00:41 CST 2013 Errors in file /opt/oracle/admin/orcl/bdump/orcl_dbw0_7490.trc: ORA-01148: 无法刷新数据文件 22 的文件大小 ORA-01110: 数据文件 22: '+DATA/orcl/datafile/index_jh.dbf' ORA-00020: 超出最大进程数 () Sun Aug 25 16:00:41 CST 2013 Errors in file /opt/oracle/admin/orcl/bdump/orcl_dbw0_7490.trc: ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式 ORA-01110: 数据文件 22: '+DATA/orcl/datafile/index_jh.dbf' Sun Aug 25 16:00:41 CST 2013 DBW0: terminating instance due to error 1242 Termination issued to instance processes. Waiting for the processes to exit Sun Aug 25 16:00:51 CST 2013 Instance termination failed to kill one or more processes Instance terminated by DBW0, pid = 7490
ORA-00600[kcbtema_10]
实例恢复出现ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:19:23 CST 2013 ALTER DATABASE OPEN Sun Aug 25 19:19:38 CST 2013 Beginning crash recovery of 1 threads parallel recovery started with 16 processes Sun Aug 25 19:19:40 CST 2013 Started redo scan Sun Aug 25 19:20:07 CST 2013 Completed redo scan 12016413 redo blocks read, 93405 data blocks need recovery Sun Aug 25 19:20:19 CST 2013 Started redo application at Thread 1: logseq 53681, block 1091966 Sun Aug 25 19:20:19 CST 2013 Recovery of Online Redo Log: Thread 1 Group 1 Seq 53681 Reading mem 0 Mem# 0: +DATA/orcl/onlinelog/redo_1_1.log Mem# 1: +DATA/orcl/onlinelog/redo_1_2.log Sun Aug 25 19:20:21 CST 2013 Errors in file /opt/oracle/admin/orcl/bdump/orcl_p011_16944.trc: ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], [] Sun Aug 25 19:20:23 CST 2013 Errors in file /opt/oracle/admin/orcl/bdump/orcl_p011_16944.trc: ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], [] Sun Aug 25 19:20:23 CST 2013 Aborting crash recovery due to slave death, attempting serial crash recovery Sun Aug 25 19:20:23 CST 2013 Beginning crash recovery of 1 threads Sun Aug 25 19:20:23 CST 2013 Started redo scan Sun Aug 25 19:20:47 CST 2013 Completed redo scan 12016413 redo blocks read, 93405 data blocks need recovery Sun Aug 25 19:20:54 CST 2013 Started redo application at Thread 1: logseq 53681, block 1091966 Sun Aug 25 19:20:54 CST 2013 Recovery of Online Redo Log: Thread 1 Group 1 Seq 53681 Reading mem 0 Mem# 0: +DATA/orcl/onlinelog/redo_1_1.log Mem# 1: +DATA/orcl/onlinelog/redo_1_2.log Sun Aug 25 19:20:54 CST 2013 Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_16751.trc: ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], [] Sun Aug 25 19:20:56 CST 2013 Aborting crash recovery due to error 600 Sun Aug 25 19:20:56 CST 2013 Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_16751.trc: ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], [] ORA-600 signalled during: ALTER DATABASE OPEN...
使用隐含参数
ALTER SYSTEM SET _allow_resetlogs_corruption=TRUE SCOPE=SPFILE;
报ORA-00704/ORA-01555
因为在前面的恢复中进行了不完全恢复,因此这里加入隐含参数,然后尝试resetlogs,然后报如下错误
Sun Aug 25 20:11:54 CST 2013 alter database open resetlogs Sun Aug 25 20:12:10 CST 2013 RESETLOGS is being done without consistancy checks. This may result in a corrupted database. The database should be recreated. RESETLOGS after incomplete recovery UNTIL CHANGE 13429649847189 Resetting resetlogs activation ID 1312390734 (0x4e397e4e) Sun Aug 25 20:16:25 CST 2013 Setting recovery target incarnation to 2 Sun Aug 25 20:16:42 CST 2013 ************************************************************ Warning: The SCN headroom for this database is only 54 days! ************************************************************ Sun Aug 25 20:16:43 CST 2013 Assigning activation ID 1352200163 (0x5098efe3) Thread 1 opened at log sequence 1 Current log# 1 seq# 1 mem# 0: +DATA/orcl/onlinelog/redo_1_1.log Current log# 1 seq# 1 mem# 1: +DATA/orcl/onlinelog/redo_1_2.log Successful open of redo thread 1 Sun Aug 25 20:16:43 CST 2013 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Sun Aug 25 20:16:52 CST 2013 SMON: enabling cache recovery Sun Aug 25 20:16:52 CST 2013 ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0c36.d582339b): Sun Aug 25 20:16:52 CST 2013 select ctime, mtime, stime from obj$ where obj# = :1 Sun Aug 25 20:16:52 CST 2013 Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_2859.trc: ORA-00704: 引导程序进程失败 ORA-00704: 引导程序进程失败 ORA-00604: 递归 SQL 级别 1 出现错误 ORA-01555: 快照过旧: 回退段号 143 (名称为 "_SYSSMU143$") 过小 Error 704 happened during db open, shutting down database USER: terminating instance due to error 704 Termination issued to instance processes. Waiting for the processes to exit Sun Aug 25 20:17:02 CST 2013 Instance termination failed to kill one or more processes Instance terminated by USER, pid = 2859 ORA-1092 signalled during: alter database open resetlogs...
数据库当前SCN
SQL > select CHECKPOINT_CHANGE# from v$database; CHECKPOINT_CHANGE# ------------------ 13429649947222 SQL > select distinct CHECKPOINT_CHANGE# from v$datafile_header; CHECKPOINT_CHANGE# ------------------ 13429649947222
解决方法
因为该数据库版本为10.2.0.5.7,已经包含了scn patch,因此不能使用event或者隐含参数来修改scn,而且该库容量15T以上(asm),因此也无法使用bbed修改数据文件头,最后决定使用ordebug来解决该问题
使用oradebug DUMPvar SGA kcsgscn_
使用oradebug poke
sqlplus / as sysdba startup mount oradebug setmypid oradebug DUMPvar SGA kcsgscn_ oradebug poke recover database; alter database open;
事后总结
查询MOS,发现ORA-00600[kcbtema_10] Raised During Recovery Operations (Doc ID 472282.1)
--故障原因 The cause of this problem has been identified and verified in unpublished Bug 5184359 ORA-600 [KCBTEMA_10]. Due to this bug, during recovery, the class designation of a data block has changed. --处理方法 SQL>startup mount SQL>recover database; SQL>alter database open;
因为MOS上给的解决思路在该数据库中已经无法尝试,不能确定该方法一定可行,但是对于本次的恢复过程中,没有任何直接recover database操作(只有一次不完全恢复)确实让人有无限的遗憾和可惜。对于本次应该先查询MOS,尝试该种方法,慎重使用_allow_resetlogs_corruption参数
Query Duration=0与ORA-01555
1.ALERT日志错误
奇怪之处:Query Duration=0 sec,竟然出现了ORA-01555
Tue Feb 7 02:41:34 2012 ORA-01555 caused by SQL statement below (Query Duration=0 sec, SCN: 0x0b2e.efcd78a9): Tue Feb 7 02:41:34 2012 SELECT "ID_NO","CUST_ID" FROM "DBACCADM"."DCUSTMSG" "C" WHERE "ID_NO"=:1
2.ORA-01555解释
超过了undo_retention时间,undo被覆盖导致ORA-01555
[zwq_acc1:/home/oraeye/check]oerr ora 1555 01555, 00000, "snapshot too old: rollback segment number %s with name \"%s\" too small" // *Cause: rollback records needed by a reader for consistent read are // overwritten by other writers // *Action: If in Automatic Undo Management mode, increase undo_retention // setting. Otherwise, use larger rollback segments
3.数据库版本
SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production PL/SQL Release 9.2.0.8.0 - Production CORE 9.2.0.8.0 Production TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production NLSRTL Version 9.2.0.8.0 - Production
4.undo基本信息
从这里可以发现,两个节点的undo表空间还有很多剩余,缺发生了undo被覆盖从而出现了ORA-01555
SQL> col name for a20 SQL> col value for a15 SQL> SELECT INST_ID, NAME, VALUE 2 FROM GV$PARAMETER 3 WHERE UPPER (Name) LIKE '%UNDO%'; INST_ID NAME VALUE ---------- -------------------- --------------- 1 undo_management AUTO 1 undo_tablespace UNDOTBS1 1 undo_suppress_errors FALSE 1 undo_retention 1800 2 undo_management AUTO 2 undo_tablespace UNDOTBS2 2 undo_suppress_errors FALSE 2 undo_retention 1800 8 rows selected. TABLESPACE_NAME CURRENT_TOTAL(MB) USED(MB) FREE(MB) FREE% AUT MAX_TOTAL(MB) ------------------------------ ----------------- ---------- ---------- ---------- --- ------------- UNDOTBS1 40950 1587.94 39362.0625 96.12 NO 40950 UNDOTBS2 57330 1926.31 55403.6875 96.64 NO 57330 SQL> SELECT DISTINCT STATUS , 2 COUNT(*) "EXTENT_NUM", 3 SUM(BYTES) / 1024 / 1024 / 1024 "UNDO(G)" 4 FROM DBA_UNDO_EXTENTS 5 GROUP BY STATUS; STATUS EXTENT_NUM UNDO(G) --------- ---------- ---------- ACTIVE 208 .273658752 EXPIRED 7651 2.42865753 UNEXPIRED 941 .752548218
查询MOS[ID 761128.1],发现可能是Oracle bug导致(BUG:6799685 – ORA-1555 ERROR WITH QUERY DURATION=0 AND UNDO_RETENTION=1800和BUG:5475085 – V$UNDOSTAT.EXPBLKREUCNT IS NEVER INCREMENTED)
5.解决方法
Increase the size of the UNDO tablespace and increase the UNDO_RETENTION parameter value to try to prevent required undo expiring too quickly.
基于本库,因为undo空间还有很大剩余,直接设置UNDO_RETENTION=3600即可(可以从一定程度上缓解整个问题,但是要从根本上解决整个问题,需要升级到10.2.0.4及其以上版本)
ORA-01555 caused by SQL statement below
一.发现ORA-01555
Mon Dec 26 10:08:22 2011 ORA-01555 caused by SQL statement below (Query Duration=49146 sec, SCN: 0x0b4b.17f5ae42): Mon Dec 26 10:08:22 2011 SELECT COMPANY_ID, COMPANY_MOBILE, TO_CHAR(NVL(REG_DATE, SYSDATE - 100), 'yyyymmddhh24miss'), BAK_FIELD2 FROM TAB_XN_COMPANY WHERE (COMPANY_STATUS = 1 OR (COMPANY_STATUS = 3 AND NVL(UNREG_DATE, SYSDATE + 100) >= TO_DATE('20111226094500', 'yyyymmddhh24miss'))) AND NVL(REG_DATE, SYSDATE - 100) <= TO_DATE('20111226095959', 'yyyymmddhh24miss') AND PAY_TYPE > 0
二.数据库状态
[oracle@ora02 ~]$ sqlplus "/ as sysdba" SQL*Plus: Release 9.2.0.4.0 - Production on Wed Jan 4 10:48:17 2012 Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved. Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.4.0 - Production SQL> show parameter undo; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ undo_management string AUTO undo_retention integer 10800 undo_suppress_errors boolean FALSE undo_tablespace string UNDOTBS1 SQL> select sum(maxbytes)/1024/1024/1024, 2 SUM(USER_BYTES)/1024/1024/1024 FROM dba_data_files where tablespace_NAME='UNDOTBS1'; SUM(MAXBYTES)/1024/1024/1024 SUM(USER_BYTES)/1024/1024/1024 ---------------------------- ------------------------------ 61.9999847 32.6834106 SQL> SELECT DISTINCT STATUS "状态", 2 COUNT(*) "EXTENT数量", 3 SUM(BYTES) / 1024 / 1024 / 1024 "UNDO大小" 4 FROM DBA_UNDO_EXTENTS 5 GROUP BY STATUS; 状态 EXTENT数量 UNDO大小 --------- ---------- ---------- ACTIVE 1 .000976563 EXPIRED 2549 31.2333298 UNEXPIRED 3 .000175476
通过undo_retention保留时间为10800秒,而该sql执行了49146秒,在这49146秒钟,TAB_XN_COMPANY表中的数据被修改,而且被修改的undo数据在10800秒后被覆盖导致,导致原查询语句不能获取到scn小于或者等于查询时候的数据块内容(在undo中),所以出现ORA-01555。从这里也可以看出来,在undo空间还剩余的情况下,如果超过了undo_retention限制,undo内容还是有可能被覆盖,而不是使用未使用的undo
三.出现ORA-1555原因
The ORA-1555 errors can happen when a query is unable to access enough undo to build a copy of the data at the time the query started. Committed “versions” of blocks are maintained along with newer uncommitted “versions” of those blocks so that queries can access data as it existed in the database at the time of the query. These are referred to as “consistent read” blocks and are maintained using Oracle undo management.
就是一个查询要访问某个数据块,而这个数据块在这个查询执行过程中修改过,那么该查询需要查询undo中数据块,而undo中该数据块已经不存在,从而出现ORA-1555
四.ORA-1555解决方法
Case 1 – Rollback Overwritten
1.缩短sql运行时间
2.增加undo_retention,这个同时需要考虑undo空间大小
3.减少commit(rollback)次数
4.在一条sql中尽量使数据块访问一次
4.1)Using a full table scan rather than an index lookup
4.2)Introducing a dummy sort so that we retrieve all the data, sort it and then sequentially visit these data blocks.
Case 2 – Rollback Transaction Slot Overwritten
这种问题,主要是延迟块清理导致,一般建议在进行大批量的dml操作后,使用全表(全index)扫描执行一遍,或者收集全部统计信息