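Category archive: Database
A recovery case involving nearly 10,000 data files
A friend referred a recovery case to me. The database had suffered a hardware failure, and after the hardware-level recovery it could no longer start normally. What I worked on was no longer the original failure scene: the customer told me that three other teams had already attempted recovery and none had managed to open the database. The database is not large overall (about 1 TB), but it has nearly 10,000 data files (9,895). Looking through the alert log, the main errors were:
ORA-600 kcratr_scan_lastbwr: this error is fairly common and is usually caused by corrupt blocks or by redo that does not match the data files. In some cases a plain recover resolves it; in others it does not, and it comes down to luck.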
Mon Feb 17 15:51:15 2025
Started redo scan
Hex dump of (file 3, block 240) in trace file F:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_10508.trc
Reading datafile 'F:\ORACLEDATA\ORCL\UNDOTBS01.DBF' for corruption at rdba: 0x00c000f0 (file 3, block 240)
Reread (file 3, block 240) found same corrupt data (logically corrupt)
Write verification failed for File 3 Block 240 (rdba 0xc000f0)
Errors in file F:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_10508.trc (incident=293029):
ORA-00600: 内部错误代码, 参数: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
Incident details in: F:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_293029\orcl_ora_10508_i293029.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Aborting crash recovery due to error 600
Errors in file F:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_10508.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
Mon Feb 17 15:51:22 2025
Sweep [inc2][293029]: completed
Mon Feb 17 15:51:25 2025
Errors in file F:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_10508.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
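The rdba value in this alert log excerpt (0x00c000f0) encodes the same file and block numbers that Oracle prints next to it. A minimal Python sketch of that arithmetic, assuming the usual smallfile layout in which the upper 10 bits hold the relative file number and the lower 22 bits hold the block number; the function name `decode_rdba` is made up for illustration and is not part of any Oracle tool:

```python
def decode_rdba(rdba: int) -> tuple[int, int]:
    """Split a 32-bit relative data block address into (file#, block#).

    Assumption: smallfile tablespace layout, 10 bits of relative file
    number followed by 22 bits of block number.
    """
    file_no = rdba >> 22          # upper 10 bits: relative file number
    block_no = rdba & 0x3FFFFF    # lower 22 bits: block number in that file
    return file_no, block_no


if __name__ == "__main__":
    # rdba taken from the "Reading datafile ... for corruption" line
    print(decode_rdba(0x00C000F0))
```

For the value in the log this yields file 3, block 240, which matches the UNDOTBS01.DBF block reported as logically corrupt.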
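ORA-600 krr_parse_3: I could not find any official documentation for this error, but judging from where it is raised, it should be directly related to applying redo.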
Thu Feb 20 11:45:03 2025
ALTER DATABASE RECOVER datafile 116
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2084282 Reading mem 0
  Mem# 0: F:\ORACLEDATA\ORCL\REDO02.LOG
Errors in file F:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_10840.trc (incident=321616):
ORA-00600: 内部错误代码, 参数: [krr_parse_3], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER datafile 116 ...
ALTER DATABASE RECOVER datafile 1168
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2084282 Reading mem 0
  Mem# 0: F:\ORACLEDATA\ORCL\REDO02.LOG
Errors in file F:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_10840.trc (incident=321617):
ORA-00600: 内部错误代码, 参数: [krr_parse_3], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER datafile 1168 ...
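Both errors above had already been bypassed, because some data files had been taken offline and the consistency checks at open had been disabled, among other operations. The database was now stuck on ORA-00604 ORA-00376 ORA-01110, which prevented it from opening. This error occurs because startup has transactions to process that need the undo file that was taken offline.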
Fri Feb 21 07:42:37 2025
minact-scn: got error during useg scan e:376 usn:1
minact-scn: useg scan erroring out with error e:376
Fri Feb 21 07:44:02 2025
Errors in file F:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_m007_11464.trc:
ORA-51106: 由于出错, 检查无法完成。请查看下面的错误
ORA-48223: 已请求中断 - 提取已中止 - 返回代码 [12751] [HM_FINDING]
Fri Feb 21 07:45:12 2025
Errors in file F:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_14108.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00376: 此时无法读取文件 3
ORA-01110: 数据文件 3: 'F:\ORACLEDATA\ORCL\UNDOTBS01.DBF'
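Checking the data file status showed that 25 data files were offline, and that the resetlogs information of all of them was wrong (only part of the output is shown):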
SQL> set lines 150
SQL> set numw 16
SQL> col CHECKPOINT_TIME for a40
SQL> set lines 150
SQL> set pages 1000
SQL> SELECT status,
  2  to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,checkpoint_change#,
  3  count(*) ROW_NUM
  4  FROM v$datafile_header
  5  GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
  6  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS         CHECKPOINT_TIME                          FUZZY  CHECKPOINT_CHANGE#          ROW_NUM
-------------- ---------------------------------------- ------ ------------------ ----------------
OFFLINE        2025-02-11 15:27:00                      YES            1909526545               22
OFFLINE        2025-02-17 17:24:14                      YES            1909551234                2
OFFLINE        2025-02-17 17:27:35                      NO             1909551234                1
ONLINE         2025-02-22 17:29:25                      YES            2095190672             9869
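As a quick sanity check on the query result, the grouped rows can be tallied to confirm the count of offline files and to see how far their checkpoint SCNs lag behind the online files. This is only an illustrative Python sketch: the rows are copied by hand from the output, and the helper names `files_by_status` and `scn_gap_to_online` are hypothetical.

```python
from collections import Counter

# (status, fuzzy, checkpoint_change#, file count), copied from the
# v$datafile_header summary in this post
ROWS = [
    ("OFFLINE", "YES", 1909526545, 22),
    ("OFFLINE", "YES", 1909551234, 2),
    ("OFFLINE", "NO", 1909551234, 1),
    ("ONLINE", "YES", 2095190672, 9869),
]


def files_by_status(rows):
    """Sum the per-group file counts for each header status."""
    totals = Counter()
    for status, _fuzzy, _scn, count in rows:
        totals[status] += count
    return dict(totals)


def scn_gap_to_online(rows):
    """For each offline checkpoint SCN, return how far it lags the online SCN."""
    online_scn = max(scn for status, _f, scn, _c in rows if status == "ONLINE")
    return {
        scn: online_scn - scn
        for status, _f, scn, _c in rows
        if status == "OFFLINE"
    }


if __name__ == "__main__":
    print(files_by_status(ROWS))
    print(scn_gap_to_online(ROWS))
```

The offline groups add up to 25 files, and their checkpoint SCNs trail the online files by roughly 185 million changes, which is why a plain recover cannot bring them forward once the intervening redo is gone.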
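For this situation, the simplest fix is the small tool I developed, Oracle Recovery Tools (see the post "Oracle Recovery Tools: one-click fix for ORA-00376 ORA-01110 (file offline)"), which is used to modify the header information of the offline files.
For this kind of failure, where data files are offline and the required archived logs are missing, Oracle Recovery Tools allows a quick, nearly foolproof recovery.
Then try to open the database directly: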
SQL> STARTUP MOUNT PFILE='D:/PFILE.TXT'
ORACLE 例程已经启动。

Total System Global Area 82309009408 bytes
Fixed Size 2290160 bytes
Variable Size 12884905488 bytes
Database Buffers 69256347648 bytes
Redo Buffers 165466112 bytes
数据库装载完毕。
SQL> RECOVER DATAFILE 3;
完成介质恢复。
SQL> RECOVER datafile 6601,7043,7044,7045,7050, 7053,7054,7055,7056,7059,7060,7061,7062,7063,7064,7071,7072,7187 ,7188,7190,7191,7192,7244,9501 ;
完成介质恢复。
SQL> alter database datafile 3,6601,7043,7044,7045,7050, 7053,7054,7055,7056,7059,7060,7061,7062,7063,7064,7071,7072,7187 ,7188,7190,7191,7192,7244,9501 online;
SQL> ALTER DATABASE OPEN;
数据库已更改。
Sat Feb 22 18:38:26 2025 alter database mount exclusive Sat Feb 22 18:38:26 2025 MMNL started with pid=25, OS id=7524 Successful mount of redo thread 1, with mount id 3367723362 Database mounted in Exclusive Mode Lost write protection disabled Completed: alter database mount exclusive alter database open Sat Feb 22 18:42:34 2025 Thread 1 opened at log sequence 5 Current log# 2 seq# 5 mem# 0: F:\ORACLEDATA\ORCL\REDO02.LOG Successful open of redo thread 1 Sat Feb 22 18:42:34 2025 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Sat Feb 22 18:42:34 2025 SMON: enabling cache recovery [7960] Successfully onlined Undo Tablespace 12273. Undo initialization finished serial:0 start:98760972 end:98761612 diff:640 (6 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is AL32UTF8 No Resource Manager plan active replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC Sat Feb 22 18:42:41 2025 QMNC started with pid=29, OS id=8116 Sat Feb 22 18:42:45 2025 Completed: alter database open Sat Feb 22 18:42:47 2025 Starting background process CJQ0 Sat Feb 22 18:42:47 2025 CJQ0 started with pid=31, OS id=3264 Sat Feb 22 18:42:47 2025 db_recovery_file_dest_size of 4977 MB is 0.00% used. This is a user-specified limit on the amount of space that will be used by this database for recovery-related files, and does not reflect the amount of space available in the underlying filesystem or ASM diskgroup.
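The database is now open. The remaining cleanup work is straightforward and is not covered here.
For this kind of failure, where data files are offline and the required archived logs are missing, Oracle Recovery Tools allows a quick, nearly foolproof recovery, which is quite convenient.
Software download: OraRecovery download
Usage instructions: usage guide
ORA-600 2662 caused by improper use of the _allow_resetlogs_corruption parameter
A database could not start after a power outage in the machine room. An improper recovery attempt then led to ORA-600 2662 and other errors during open, which the customer could not resolve on their own, so they asked us for help. Fortunately, the customer had backed up the failure scene. After I took over, I had the customer restore that backup and then performed the recovery. The database opened directly and fairly smoothly, with zero data loss and the original database immediately usable. This avoided the risk of losing a small amount of data through improper operations and let the business recover quickly, without the logical migration that would otherwise be needed to deal with inconsistencies caused by forcing the database open. The logs below retrace the customer's first recovery attempt.
The initial failure: the database reported an error at mount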
SQL> startup mount pfile='d:/pfile.txt'
ORACLE 例程已经启动。

Total System Global Area 1071333376 bytes
Fixed Size 1375792 bytes
Variable Size 754975184 bytes
Database Buffers 310378496 bytes
Redo Buffers 4603904 bytes
ORA-03113: 通信通道的文件结尾
进程 ID: 964
会话 ID: 1145 序列号: 1
A failure to mount is, in most cases, caused by a corrupt control file, so the customer chose to rebuild the controlfile.
Thu Feb 20 10:20:45 2025 Successful mount of redo thread 1, with mount id 1721384698 Completed: CREATE CONTROLFILE REUSE DATABASE "ORCL" RESETLOGS ARCHIVELOG MAXLOGFILES 32 MAXLOGMEMBERS 2 MAXDATAFILES 254 MAXINSTANCES 1 MAXLOGHISTORY 226 LOGFILE GROUP 1 'D:\app\Administrator\oradata\orcl\redo01.log' size 50M, GROUP 2 'D:\app\Administrator\oradata\orcl\redo02.log' size 50M, GROUP 3 'D:\app\Administrator\oradata\orcl\redo03.log' size 50M DATAFILE 'D:\app\Administrator\oradata\orcl\XFF.DBF', 'D:\app\Administrator\oradata\orcl\XIFENFEI.DBF', 'D:\app\Administrator\oradata\orcl\SYSAUX01.DBF', 'D:\app\Administrator\oradata\orcl\SYSTEM01.DBF', 'D:\app\Administrator\oradata\orcl\UNDOTBS01.DBF', 'D:\app\Administrator\oradata\orcl\USERS01.DBF' CHARACTER SET ZHS16GBK
The control file was rebuilt in resetlogs mode. As the next step, the customer ran recovery with the command: ALTER DATABASE RECOVER DATABASE UNTIL CANCEL USING BACKUP CONTROLFILE
Thu Feb 20 10:22:05 2025 ALTER DATABASE RECOVER DATABASE UNTIL CANCEL USING BACKUP CONTROLFILE Media Recovery Start started logmerger process Thu Feb 20 10:22:05 2025 WARNING! Recovering data file 1 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 2 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 3 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 4 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 5 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 6 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. Parallel Media Recovery started with 4 slaves ORA-279 signalled during: ALTER DATABASE RECOVER DATABASE UNTIL CANCEL USING BACKUP CONTROLFILE ... ALTER DATABASE RECOVER CONTINUE DEFAULT Media Recovery Log D:\APP\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_02_20\O1_MF_1_9087_%U_.ARC Errors with log D:\APP\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_02_20\O1_MF_1_9087_%U_.ARC Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_pr00_3044.trc: ORA-00308: cannot open archived log 'D:\APP\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_02_20\O1_MF_1_9087_%U_.ARC' ORA-27041: unable to open file OSD-04002: 无法打开文件 O/S-Error: (OS 2) 系统找不到指定的文件。 ORA-308 signalled during: ALTER DATABASE RECOVER CONTINUE DEFAULT ... 
ALTER DATABASE RECOVER CONTINUE DEFAULT Media Recovery Log D:\APP\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_02_20\O1_MF_1_9087_%U_.ARC Errors with log D:\APP\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_02_20\O1_MF_1_9087_%U_.ARC Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_pr00_3044.trc: ORA-00308: cannot open archived log 'D:\APP\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_02_20\O1_MF_1_9087_%U_.ARC' ORA-27041: unable to open file OSD-04002: 无法打开文件 O/S-Error: (OS 2) 系统找不到指定的文件。 ORA-308 signalled during: ALTER DATABASE RECOVER CONTINUE DEFAULT ... ALTER DATABASE RECOVER CANCEL Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_pr00_3044.trc: ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF' ORA-10879 signalled during: ALTER DATABASE RECOVER CANCEL ...
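Recovery prompted for the archived log with sequence 9087. Because this database runs in noarchivelog mode and the customer simply entered auto, the corresponding log could not be found. The customer then tried to open the database directly with resetlogs.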
Thu Feb 20 10:22:58 2025
ALTER DATABASE OPEN RESETLOGS
ORA-1194 signalled during: ALTER DATABASE OPEN RESETLOGS...
ALTER DATABASE OPEN
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_4836.trc:
ORA-01589: ??????????? RESETLOGS ? NORESETLOGS ??
ORA-1589 signalled during: ALTER DATABASE OPEN...
ALTER DATABASE OPEN RESETLOGS
ORA-1194 signalled during: ALTER DATABASE OPEN RESETLOGS...
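Because the data files were inconsistent (the earlier recover had not succeeded), opening the database with resetlogs failed. The customer then set several hidden parameters: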
Thu Feb 20 10:27:24 2025
ALTER SYSTEM SET _allow_error_simulation=TRUE SCOPE=SPFILE;
ALTER SYSTEM SET _allow_terminal_recovery_corruption=TRUE SCOPE=SPFILE;
Thu Feb 20 10:27:38 2025
ALTER SYSTEM SET _allow_resetlogs_corruption=TRUE SCOPE=SPFILE;
After restarting the instance, the database was forced open:
Completed: alter database mount Thu Feb 20 10:29:08 2025 alter database open resetlogs RESETLOGS is being done without consistancy checks. This may result in a corrupted database. The database should be recreated. Thu Feb 20 10:29:19 2025 Archived Log entry 1 added for thread 1 sequence 9088 ID 0x5d904d0a dest 1: Archived Log entry 2 added for thread 1 sequence 9086 ID 0x5d904d0a dest 1: Thu Feb 20 10:29:31 2025 Archived Log entry 3 added for thread 1 sequence 9087 ID 0x5d904d0a dest 1: RESETLOGS after incomplete recovery UNTIL CHANGE 223770120 Thu Feb 20 10:29:38 2025 Setting recovery target incarnation to 2 Thu Feb 20 10:29:39 2025 Assigning activation ID 1721340650 (0x669992ea) LGWR: STARTING ARCH PROCESSES Thu Feb 20 10:29:39 2025 ARC0 started with pid=20, OS id=2924 ARC0: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES Thu Feb 20 10:29:40 2025 ARC1 started with pid=21, OS id=1832 Thu Feb 20 10:29:40 2025 ARC2 started with pid=22, OS id=3668 ARC1: Archival started Thu Feb 20 10:29:40 2025 ARC3 started with pid=23, OS id=5104 ARC2: Archival started ARC1: Becoming the 'no FAL' ARCH ARC1: Becoming the 'no SRL' ARCH ARC2: Becoming the heartbeat ARCH Thread 1 opened at log sequence 1 Current log# 1 seq# 1 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG Successful open of redo thread 1 Thu Feb 20 10:29:41 2025 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Thu Feb 20 10:29:41 2025 SMON: enabling cache recovery ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_1652.trc (incident=1006385): ORA-00600: ??????, ??: [2662], [0], [223770128], [0], [223811777], [12583040], [], [], [], [], [], [] Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_1006385\orcl_ora_1652_i1006385.trc Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_1652.trc: ORA-00600: ??????, ??: [2662], [0], 
[223770128], [0], [223811777], [12583040], [], [], [], [], [], [] Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_1652.trc: ORA-00600: ??????, ??: [2662], [0], [223770128], [0], [223811777], [12583040], [], [], [], [], [], [] Error 600 happened during db open, shutting down database USER (ospid: 1652): terminating the instance due to error 600 Instance terminated by USER, pid = 1652
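The database reported ORA-600 2662, so the forced open failed.
The customer then went through a series of further attempts, which produced other errors, for example: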
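The arguments of the ORA-600 [2662] line above can be unpacked to see how far the block's SCN runs ahead of the database. The Python sketch below assumes the commonly described argument layout (current SCN wrap and base, dependent SCN wrap and base, then the data block address) and the 10-bit/22-bit smallfile split of the address; the function name `parse_ora600_2662` is hypothetical.

```python
import re


def parse_ora600_2662(line: str) -> dict:
    """Extract SCNs and the block address from an ORA-600 [2662] message.

    Assumed argument order (not taken from this post):
    [2662], [cur wrap], [cur base], [dep wrap], [dep base], [dba]
    """
    args = re.findall(r"\[(\d*)\]", line)
    if len(args) < 6 or args[0] != "2662":
        raise ValueError("not an ORA-600 [2662] message")
    cur_wrap, cur_base, dep_wrap, dep_base, dba = (int(a) for a in args[1:6])
    current_scn = (cur_wrap << 32) | cur_base      # SCN the database is at
    dependent_scn = (dep_wrap << 32) | dep_base    # SCN found in the block
    return {
        "current_scn": current_scn,
        "dependent_scn": dependent_scn,
        "scn_gap": dependent_scn - current_scn,
        "file": dba >> 22,            # upper 10 bits of the address
        "block": dba & 0x3FFFFF,      # lower 22 bits of the address
    }


if __name__ == "__main__":
    msg = ("ORA-00600: ??????, ??: [2662], [0], [223770128], [0], "
           "[223811777], [12583040], [], [], [], [], [], []")
    print(parse_ora600_2662(msg))
```

Under these assumptions the block in file 3 (the undo data file in this database) carries an SCN 41,649 higher than the SCN the database was opened at, which is consistent with opening after incomplete recovery while the online redo was left unapplied.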
ORA-01595/ORA-600 4194
Fri Feb 21 04:52:19 2025 Trace dumping is performing id=[cdmp_20250221045219] Doing block recovery for file 3 block 472 Resuming block recovery (PMON) for file 3 block 472 Block recovery from logseq 2, block 51 to scn 223837038 Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0 Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG Block recovery stopped at EOT rba 2.54.16 Block recovery completed at rba 2.54.16, scn 0.223837035 Doing block recovery for file 3 block 144 Resuming block recovery (PMON) for file 3 block 144 Block recovery from logseq 2, block 51 to scn 223837033 Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0 Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG Block recovery completed at rba 2.52.16, scn 0.223837034 Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_2124.trc: ORA-01595: error freeing extent (3) of rollback segment (2)) ORA-00600: internal error code, arguments: [4194], [], [
ORA-600 2256/ORA-600 4194
Fri Feb 21 05:02:43 2025 SMON: enabling cache recovery Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2184.trc (incident=1134285): ORA-00600: 内部错误代码, 参数: [2256], [0], [1073741824], [0], [1073761870], [], [], [], [], [], [], [] Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_1134285\orcl_ora_2184_i1134285.trc Successfully onlined Undo Tablespace 2. Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery ********************************************************************* WARNING: The following temporary tablespaces contain no files. This condition can occur when a backup controlfile has been restored. It may be necessary to add files to these tablespaces. That can be done using the SQL statement: ALTER TABLESPACE <tablespace_name> ADD TEMPFILE Alternatively, if these temporary tablespaces are no longer needed, then they can be dropped. Empty temporary tablespace: TEMP ********************************************************************* Database Characterset is ZHS16GBK No Resource Manager plan active Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_648.trc (incident=1134237): ORA-00600: internal error code, arguments: [4194], [], [ Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_1134237\orcl_smon_648_i1134237.trc Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2184.trc (incident=1134286): ORA-00600: 内部错误代码, 参数: [4194], [0], [ Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_1134286\orcl_ora_2184_i1134286.trc
ORA-600 3712
Fri Feb 21 05:24:51 2025 Assigning activation ID 1721440698 (0x669b19ba) Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_lgwr_3620.trc (incident=1150206): ORA-00600: internal error code, arguments: [3712], [1], [1], [3584], [3], [3584], [1], [], [], [], [], [] Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_1150206\orcl_lgwr_3620_i1150206.trc Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_lgwr_3620.trc: ORA-00600: internal error code, arguments: [3712], [1], [1], [3584], [3], [3584], [1], [], [], [], [], [] LGWR (ospid: 3620): terminating the instance due to error 470
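Since the customer's database is not large and the failure scene had been backed up before any operation, the customer restored that backup and I took over the recovery.
Rebuild the control file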
Fri Feb 21 21:57:19 2025 Successful mount of redo thread 1, with mount id 1721495486 Completed: CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS NOARCHIVELOG MAXLOGFILES 50 MAXLOGMEMBERS 5 MAXDATAFILES 5000 MAXINSTANCES 8 MAXLOGHISTORY 2920 LOGFILE group 1 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG' size 50M, group 3 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG' size 50M, group 2 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG' size 50M DATAFILE 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF', 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSAUX01.DBF', 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTBS01.DBF', 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS01.DBF', 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\XIFENFEI.DBF', 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\XFF.DBF' CHARACTER SET ZHS16GBK
Here the control file was rebuilt in noresetlogs mode, followed by an attempt to recover database:
ALTER DATABASE RECOVER database
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 4 slaves
Fri Feb 21 21:57:25 2025
Recovery of Online Redo Log: Thread 1 Group 3 Seq 9087 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG
Recovery of Online Redo Log: Thread 1 Group 1 Seq 9088 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Fri Feb 21 21:57:30 2025
Completed: ALTER DATABASE RECOVER database
The recover succeeded directly. Next, try a normal open:
Fri Feb 21 21:58:17 2025 alter database open Beginning crash recovery of 1 threads parallel recovery started with 3 processes Started redo scan Completed redo scan read 12019 KB redo, 0 data blocks need recovery Started redo application at Thread 1: logseq 9088, block 2, scn 223805802 Recovery of Online Redo Log: Thread 1 Group 1 Seq 9088 Reading mem 0 Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG Completed redo application of 0.00MB Completed crash recovery at Thread 1: logseq 9088, block 24040, scn 223836931 0 data blocks read, 0 data blocks written, 12019 redo k-bytes read Fri Feb 21 21:58:17 2025 Thread 1 advanced to log sequence 9089 (thread open) Thread 1 opened at log sequence 9089 Current log# 2 seq# 9089 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Fri Feb 21 21:58:17 2025 SMON: enabling cache recovery Dictionary check beginning Tablespace 'TEMP' #3 found in data dictionary, but not in the controlfile. Adding to controlfile. Dictionary check complete Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery ********************************************************************* WARNING: The following temporary tablespaces contain no files. This condition can occur when a backup controlfile has been restored. It may be necessary to add files to these tablespaces. That can be done using the SQL statement: ALTER TABLESPACE <tablespace_name> ADD TEMPFILE Alternatively, if these temporary tablespaces are no longer needed, then they can be dropped. 
Empty temporary tablespace: TEMP ********************************************************************* Database Characterset is ZHS16GBK No Resource Manager plan active ********************************************************** WARNING: Files may exists in db_recovery_file_dest that are not known to the database. Use the RMAN command CATALOG RECOVERY AREA to re-catalog any such files. If files cannot be cataloged, then manually delete them using OS command. One of the following events caused this: 1. A backup controlfile was restored. 2. A standby controlfile was restored. 3. The controlfile was re-created. 4. db_recovery_file_dest had previously been enabled and then disabled. ********************************************************** replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC Fri Feb 21 21:58:20 2025 QMNC started with pid=22, OS id=524 LOGSTDBY: Validating controlfile with logical metadata LOGSTDBY: Validation complete Completed: alter database open
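The database opened cleanly. After adding a tempfile, checks and data exports showed no problems at all. The business can use this database directly, with no logical migration needed.
This was actually a fairly minor failure. A power loss corrupted the control file; the customer rebuilt it in resetlogs mode, then ran recover without specifying the correct redo to complete instance recovery, and forced the database open with the _allow_resetlogs_corruption parameter, which produced the familiar ORA-600 2662, ORA-600 4194, and ORA-600 2256 errors. This database is small and the failure scene had been preserved. If the same thing happened to a database of 10 TB or more, the consequences would be considerably more troublesome: the database might open, but the data in redo would certainly be lost, leaving the database inconsistent, with potentially many follow-on inconsistency problems and possibly a logical migration. That increases both the cost of business downtime and the risk of losing business data. Once again: treat Oracle hidden parameters with caution, and use them only after a proper professional assessment.
Handling the CSSD signal 11 in thread clssnmRcfgMgrThread failure
A customer's cluster could not start; it only came up as far as the following state
The cssd log shows the error CSSD signal 11 in thread clssnmRcfgMgrThread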
2025-02-21 18:21:25.500: [ CSSD][2788693760]clssnmDoSyncUpdate: node(2) is transitioning from joining state to active state 2025-02-21 18:21:25.500: [ CSSD][2788693760]clssnmDoSyncUpdate: Wait for 0 vote ack(s) 2025-02-21 18:21:25.500: [ CSSD][2788693760]clssnmDoSyncUpdate: waiting to update states on disk 2025-02-21 18:21:25.700: [ CSSD][2788693760]clssnmDoSyncUpdate: waiting to update states on disk 2025-02-21 18:21:25.901: [ CSSD][2788693760]clssnmDoSyncUpdate: waiting to update states on disk 2025-02-21 18:21:25.995: [ CSSD][2801538816]clssnmvDiskPing: Writing with status 0x2, timestamp 1740133285/5870104 2025-02-21 18:21:25.997: [ CSSD][2799818496]clssnmvDiskKillCheck: not evicted, file /dev/dm-4 flags 0x00000000, kill block unique 0, my unique 1740133265 2025-02-21 18:21:26.000: [ CSSD][2793424640]clssgmWaitOnEventValue: after CmInfo State val 3, eval 2 waited 500 2025-02-21 18:21:26.101: [ CSSD][2788693760]clssnmDoSyncUpdate: waiting to update states on disk 2025-02-21 18:21:26.302: [ CSSD][2788693760]clssnmDoSyncUpdate: waiting to update states on disk 2025-02-21 18:21:26.497: [ CSSD][2801538816]clssnmvDiskPing: Writing with status 0x2, timestamp 1740133286/5870604 2025-02-21 18:21:26.502: [ CSSD][2788693760]clssnmDoSyncUpdate: waiting to update states on disk 2025-02-21 18:21:26.702: [ CSSD][2788693760]clssnmDoSyncUpdate: waiting to update states on disk 2025-02-21 18:21:26.902: [ CSSD][2788693760]clssnmDoSyncUpdate: waiting to update states on disk 2025-02-21 18:21:26.997: [ CSSD][2799818496]clssnmvDiskKillCheck: not evicted, file /dev/dm-4 flags 0x00000000, kill block unique 0, my unique 1740133265 2025-02-21 18:21:26.997: [ CSSD][2801538816]clssnmvDiskPing: Writing with status 0x2, timestamp 1740133286/5871114 2025-02-21 18:21:27.000: [ CSSD][2793424640]clssgmWaitOnEventValue: after CmInfo State val 3, eval 2 waited 0 2025-02-21 18:21:27.102: [ CSSD][2788693760]clssnmCheckDskInfo: Checking disk info... 
2025-02-21 18:21:27.102: [ CSSD][2788693760]clssnmCheckDskInfo: diskTimeout set to (200000)ms 2025-02-21 18:21:27.103: [ CSSD][2788693760]################################### 2025-02-21 18:21:27.103: [ CSSD][2788693760]clssscExit: CSSD signal 11 in thread clssnmRcfgMgrThread 2025-02-21 18:21:27.103: [ CSSD][2788693760]################################### 2025-02-21 18:21:27.103: [ CSSD][2788693760](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally 2025-02-21 18:21:27.103: [ CSSD][2788693760] ----- Call Stack Trace ----- 2025-02-21 18:21:27.103: [ CSSD][2788693760]calling call entry argument values in hex 2025-02-21 18:21:27.103: [ CSSD][2788693760]location type point (? means dubious value) 2025-02-21 18:21:27.103: [ CSSD][2788693760]-------------------- -------- -------------------- ---------------------------- 2025-02-21 18:21:27.109: [ CSSD][2788693760]clssscExit()+745 call kgdsdst() 000000000 ? 000000000 ? 2025-02-21 18:21:27.109: [ CSSD][2788693760] 7F9EA637A650 ? 7F9EA637A728 ? 2025-02-21 18:21:27.109: [ CSSD][2788693760] 7F9EA637F1D0 ? 000000003 ? 2025-02-21 18:21:27.109: [ CSSD][2788693760]s0clsssc_sighandler call clssscExit() 001FB9FA0 ? 000000002 ? 2025-02-21 18:21:27.109: [ CSSD][2788693760]()+616 7F9EA637A650 ? 7F9EA637A728 ? 2025-02-21 18:21:27.109: [ CSSD][2788693760] 7F9EA637F1D0 ? 000000003 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760]__sighandler() call s0clsssc_sighandler 00000000B ? 000000002 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760] () 7F9EA637A650 ? 7F9EA637A728 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760] 7F9EA637F1D0 ? 000000003 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760]clssnmCheckSplit()+ signal __sighandler() 001BEE8A8 ? 000000000 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760]378 002039A80 ? 000000001 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760] 0004D2B40 ? 7F9EA63803C0 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760]clssnmCheckDskInfo( call clssnmCheckSplit() 001FB9FA0 ? 001DC83F0 ? 
2025-02-21 18:21:27.110: [ CSSD][2788693760])+387 000030D40 ? 000000001 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760] 0004D2B40 ? 7F9EA63803C0 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760]clssnmDoSyncUpdate( call clssnmCheckDskInfo( 001FB9FA0 ? 001DC83F0 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760])+4692 ) 000000001 ? 000000001 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760] 0004D2B40 ? 7F9EA63803C0 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760]clssnmLocalJoinEven call clssnmDoSyncUpdate( 001FB9FA0 ? 001DC83F0 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760]t()+3992 ) FFFFFFFFFFFFFFFF ? 2025-02-21 18:21:27.110: [ CSSD][2788693760] 000000001 ? 7F9EA6380D20 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760] 7F9EA63803C0 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760]clssnmRcfgMgrThread call clssnmLocalJoinEven 001FB9FA0 ? 001DC83F0 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760]()+2290 t() FFFFFFFFFFFFFFFF ? 2025-02-21 18:21:27.110: [ CSSD][2788693760] 000000001 ? 7F9EA6380D20 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760] 7F9EA63803C0 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760]clssscthrdmain()+25 call clssnmRcfgMgrThread 001FB9FA0 ? 001DC83F0 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760]3 () FFFFFFFFFFFFFFFF ? 2025-02-21 18:21:27.110: [ CSSD][2788693760] 000000001 ? 7F9EA6380D20 ? 2025-02-21 18:21:27.110: [ CSSD][2788693760] 7F9EA63803C0 ? 2025-02-21 18:21:27.111: [ CSSD][2788693760]start_thread()+209 call clssscthrdmain() 001FB9FA0 ? 001DC83F0 ? 2025-02-21 18:21:27.111: [ CSSD][2788693760] FFFFFFFFFFFFFFFF ? 2025-02-21 18:21:27.111: [ CSSD][2788693760] 000000001 ? 7F9EA6380D20 ? 2025-02-21 18:21:27.111: [ CSSD][2788693760] 7F9EA63803C0 ? 2025-02-21 18:21:27.111: [ CSSD][2788693760]clone()+109 call start_thread() 7F9EA6381700 ? 001DC83F0 ? 2025-02-21 18:21:27.111: [ CSSD][2788693760] FFFFFFFFFFFFFFFF ? 2025-02-21 18:21:27.111: [ CSSD][2788693760] 000000001 ? 7F9EA6380D20 ? 2025-02-21 18:21:27.111: [ CSSD][2788693760] 7F9EA63803C0 ? 
2025-02-21 18:21:27.111: [ CSSD][2788693760]0000000000000000 call clone() 7F9EA6381700 ? 001DC83F0 ? 2025-02-21 18:21:27.111: [ CSSD][2788693760] FFFFFFFFFFFFFFFF ? 2025-02-21 18:21:27.111: [ CSSD][2788693760] 000000001 ? 7F9EA6380D20 ? 2025-02-21 18:21:27.111: [ CSSD][2788693760] 7F9EA63803C0 ? 2025-02-21 18:21:27.111: [ CSSD][2788693760] 2025-02-21 18:21:27.111: [ CSSD][2788693760]--------------------- Binary Stack Dump ---------------------
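The log points to a voting disk timeout. Starting in nocrs mode was attempted, but with the voting disk present the startup still failed. After a workaround that stops the startup process from reading the voting disk, starting in nocrs mode succeeded, and the other business disk groups were mounted.
After confirming that the other disks were fine, rebuild the crs disk group: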
SQL> create diskgroup OCR external redundancy disk '/dev/dm-4' force attribute 'COMPATIBLE.ASM' = '11.2.0';
# ocrconfig -restore /u01/app/11.2.0.3/grid/cdata/scan/backup00.ocr
# crsctl replace votedisk +OCR
SQL> create spfile from pfile='/tmp/pfile.asm';
Restarting crs then brought everything back to normal.