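Category archive: Database
Hardware failure causing ORA-01242, ORA-01122 and related errors
A customer runs a multi-node RAC. In the morning they reported that the instances on two nodes were abnormal and asked for a root-cause analysis. The database alert log on one of those nodes shows that access to datafile 1399 failed with ORA-01242, ORA-01122 and related errors, which crashed the instance.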
Mon Aug 19 20:48:02 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 75582): terminating the instance due to error 1242
Mon Aug 19 20:48:02 2024
System state dump requested by (instance=6, osid=75582 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_diag_75520.trc
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 20:48:13 2024
ORA-1092 : opitsk aborting process
Further down, the log shows the cluster trying to restart the instance, which hit ORA-01186 and ORA-01122 and could not start.
ALTER DATABASE OPEN /* db agent *//* {0:6:39} */
Mon Aug 19 20:49:34 2024
SUCCESS: diskgroup DATA was mounted
Mon Aug 19 20:49:34 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Mon Aug 19 20:50:41 2024
Picked broadcast on commit scheme to generate SCNs
Mon Aug 19 20:50:42 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_dbw0_29208.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122
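When an alert log is long, a quick tally of the ORA- codes and the file numbers they mention helps to confirm that every failure points at the same datafile. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, the sample text is copied from the excerpt above, and the pattern only matches five-digit codes (so a line such as `ORA-1092` would be skipped).

```python
import re
from collections import Counter

# A few lines copied from the alert log excerpt above.
ALERT_EXCERPT = """\
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
"""


def count_ora_errors(text):
    """Count each five-digit ORA- code that appears in the text."""
    return Counter(re.findall(r"ORA-\d{5}", text))


def affected_file_numbers(text):
    """Collect the file numbers reported as '(fno N)' in read failures."""
    return sorted({int(n) for n in re.findall(r"\(fno (\d+)\)", text)})


errors = count_ora_errors(ALERT_EXCERPT)
files = affected_file_numbers(ALERT_EXCERPT)
print(errors.most_common())
print(files)
```

On this excerpt the only file number reported is 1399, which matches the conclusion that a single datafile header is involved.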
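This error comes from abnormal access to the datafile. In my experience, problems like this are usually caused by a fault in the underlying layer. The system messages log does show disk hardware errors.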
Aug 19 20:41:58 xff6 fcoemon: FC_HOST_EVENT 6894 at 1724071318 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:41:58 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:41:58 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:42:03 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:42:03 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 fcoemon: FC_HOST_EVENT 6895 at 1724071323 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 fcoemon: FC_HOST_EVENT 6896 at 1724071327 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]
Aug 19 20:42:07 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]
Aug 19 20:42:07 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:12 xff6 fcoemon: FC_HOST_EVENT 6897 at 1724071332 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]
Aug 19 20:42:12 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]
Aug 19 20:42:12 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:25 xff6 fcoemon: FC_HOST_EVENT 6898 at 1724071345 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]
Aug 19 20:42:25 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]
Aug 19 20:42:25 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6899 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6900 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6901 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 fcoemon: FC_HOST_EVENT 6902 at 1724071373 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]
Aug 19 20:42:53 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]
Aug 19 20:42:53 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6903 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6904 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6905 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:49:26 xff6 kernel: scsi_verify_blk_ioctl: 683 callbacks suppressed
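To see at a glance which block devices reported sense events, the kernel lines can be grouped per device. This is a rough sketch under the assumption that each event is logged the way it appears above (a `sd H:C:T:L: [sdXX]` line followed by a `Sense Key` line); the function name is hypothetical and the sample is an abridged copy of the messages excerpt.

```python
import re
from collections import Counter

# Abridged copy of the /var/log/messages excerpt above.
MESSAGES_EXCERPT = """\
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:41:58 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:41:58 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:42:03 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:42:03 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]
Aug 19 20:42:07 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]
Aug 19 20:42:07 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
"""

# Matches e.g. "kernel: sd 1:0:0:43: [sdas]" and captures the device name.
DEVICE_RE = re.compile(r"kernel: sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\]")


def sense_events_per_device(lines):
    """Count 'Sense Key' lines, attributing each to the last device seen."""
    counts = Counter()
    current = None
    for line in lines:
        match = DEVICE_RE.search(line)
        if match:
            current = match.group(2)
        elif "Sense Key" in line and current is not None:
            counts[current] += 1
    return counts


counts = sense_events_per_device(MESSAGES_EXCERPT.splitlines())
print(dict(counts))
```

Run against the full log, the same grouping would list sdap, sdaq, sdar, sdas and sdat, which shows that the events were spread over several LUNs on the same host adapter.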
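The customer's further analysis found that a disk in the storage array had failed the day before and the hot spare had taken over. For reasons that are still unclear, file access then became abnormal, possibly in connection with the rebuild that was running at the time. Because this is a RAC environment and some of the remaining nodes were still running normally, simply starting the database on the failed nodes succeeded.
Writing data on the nodes then raised ORA-01187: cannot read from file because it failed verification tests.
After ALTER SYSTEM CHECK DATAFILES was run on all nodes, operations on every node returned to normal.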
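Recovering a 200 TB NOARCHIVELOG database with no backup
This is a 6-node RAC of nearly 200 TB. An unstable storage fibre link made the servers drop disks frequently, so the ASM disk groups kept dismounting and mounting and the cluster nodes restarted over and over. After the link problem was fixed, database startup reported ORA-01113 and ORA-01110.
A check with the Oracle Database Recovery Check script showed that 10 datafiles were inconsistent and could not be recovered normally.
The database is large, nearly 200 TB, so the recovery calls for extra caution (no on-site backup can be taken, and the customer requires the recovery to be finished within 2 days).
Because the database runs in NOARCHIVELOG mode, these files cannot be recovered by applying archived logs. In this situation, dbms_diskgroup can be used to copy the datafile header directly to the file system, along these lines: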
SQL> @dbms_diskgroup_get_block.sql +DATA/xifenfei.dbf 1 1 /tmp/xff/xifenfei.dbf.header
Parameter 1: ASM_file_name (required)
Parameter 2: block_to_extract (required)
Parameter 3 number_of_blocks_to_extract (required)
Parameter 4: FileSystem_File_Name (required)
old 14: v_AsmFilename := '&ASM_File_Name';
new 14: v_AsmFilename := '+DATA/xifenfei.dbf';
old 15: v_offstart := '&block_to_extract';
new 15: v_offstart := '1';
old 16: v_numblks := '&number_of_blocks_to_extract';
new 16: v_numblks := '1';
old 17: v_FsFilename := '&FileSystem_File_Name';
new 17: v_FsFilename := '/tmp/xff/xifenfei.dbf.header';
File: +DATA/xifenfei.dbf
Type: 2 Data File
Size (in logical blocks): 3978880
Logical Block Size: 16384
Physical Block Size: 512
PL/SQL procedure successfully completed.
Then modify the relevant SCN with bbed
BBED> set filename 'xifenfei.dbf.header'
        FILENAME        xifenfei.dbf.header
BBED> set blocksize 16384
        BLOCKSIZE       16384
BBED> map
 File: xifenfei.dbf.header (0)
 Block: 1 Dba:0x00000000
------------------------------------------------------------
 Data File Header
 struct kcvfh, 860 bytes @0
 ub4 tailchk @16380
BBED> p kcvfh.kcvfhckp.kcvcpscn
struct kcvcpscn, 8 bytes @484
   ub4 kscnbas @484 0xa8061324
   ub2 kscnwrp @488 0x0081
BBED> assign file 295 block 1 kcvfh.kcvfhckp.kcvcpscn = file 1 block 1 kcvfh.kcvfhckp.kcvcpscn;
struct kcvcpscn, 8 bytes @484
   ub4 kscnbas @484 0xa8133e2b
   ub2 kscnwrp @488 0x0081
Then write the modified datafile header back to ASM
SQL> @dbms_diskgroup_cp_block_to_asm.sql /tmp/xff/xifenfei.dbf.header +DATA/xifenfei.dbf 1 1
Parameter 1: v_FsFileName (required)
Parameter 2: v_AsmFileName (required)
Parameter 3 v_offstart (required)
Parameter 4 v_numblks (required)
old 16: v_FsFileName := '&v_FsFileName';
new 16: v_FsFileName := '/tmp/xff/xifenfei.dbf.header';
old 17: v_AsmFileName := '&v_AsmFileName';
new 17: v_AsmFileName := '+DATA/xifenfei.dbf';
old 18: v_offstart := '&v_offstart';
new 18: v_offstart := '1';
old 19: v_numblks := '&v_numblks';
new 19: v_numblks := '1';
File: +DATA/xifenfei.dbf
Type: 2 Data File
Size (in logical blocks): 3978880
Logical Block Size: 16384
PL/SQL procedure successfully completed.
Check whether the file header was modified successfully
[oracle@xff1 xff]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sat Aug 10 16:45:02 2024
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> set numw 16
SQL> select CHECKPOINT_CHANGE# from v$datafile_header where file# in (1,295);
CHECKPOINT_CHANGE#
------------------
      556870614571
      556870614571
SQL> recover datafile 295;
Media recovery complete.
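The hexadecimal pair that bbed prints and the decimal CHECKPOINT_CHANGE# in v$datafile_header are the same number: the SCN is the wrap shifted left by 32 bits plus the base. The sketch below decodes it from a header block at the offsets shown in the bbed output (kscnbas at 484, kscnwrp at 488). It works on a synthetic in-memory block rather than the real extracted file, the function names are hypothetical, and little-endian byte order is an assumption that fits Linux x86-64 but should be checked for other platforms.

```python
import struct

# Offset taken from the bbed output above: ub4 kscnbas @484, ub2 kscnwrp @488.
KCVCPSCN_OFFSET = 484
BLOCK_SIZE = 16384


def scn_from_parts(wrap, base):
    """Combine the 16-bit wrap and 32-bit base into one SCN value."""
    return (wrap << 32) | base


def read_checkpoint_scn(block, little_endian=True):
    """Decode kcvfh.kcvfhckp.kcvcpscn from a datafile header block.

    The byte order is an assumption (little-endian by default).
    """
    fmt = "<IH" if little_endian else ">IH"
    base, wrap = struct.unpack_from(fmt, block, KCVCPSCN_OFFSET)
    return scn_from_parts(wrap, base)


# Synthetic block standing in for /tmp/xff/xifenfei.dbf.header after the
# bbed assign: base 0xa8133e2b, wrap 0x0081.
block = bytearray(BLOCK_SIZE)
struct.pack_into("<IH", block, KCVCPSCN_OFFSET, 0xA8133E2B, 0x0081)

scn_after = read_checkpoint_scn(block)
scn_before = scn_from_parts(0x0081, 0xA8061324)
print(scn_after)   # 556870614571, the value reported by v$datafile_header
print(scn_before)  # 556869751588, the header value before the change
```

The decoded value matches the 556870614571 returned by the query, which is a convenient cross-check before writing a modified header back to ASM.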
This confirms that the bbed change to the file header worked. The other 9 files were modified in the same way, and the database was then opened.
SQL> recover database;
Media recovery complete.
SQL> alter database open;
Database altered.
The alert log shows
Sat Aug 10 16:46:11 2024
ALTER DATABASE RECOVER datafile 295
Media Recovery Start
Serial Media Recovery started
WARNING! Recovering data file 295 from a fuzzy backup. It might be an online backup taken without entering the begin backup command.
Media Recovery Complete (xff1)
Completed: ALTER DATABASE RECOVER datafile 295
Sat Aug 10 16:46:39 2024
ALTER DATABASE RECOVER database
Media Recovery Start
started logmerger process
Sat Aug 10 16:46:51 2024
WARNING! Recovering data file 1139 from a fuzzy backup. It might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 1140 from a fuzzy backup. It might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 1601 from a fuzzy backup. It might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 1803 from a fuzzy backup. It might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 1827 from a fuzzy backup. It might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 1931 from a fuzzy backup. It might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2185 from a fuzzy backup. It might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2473 from a fuzzy backup. It might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2616 from a fuzzy backup. It might be an online backup taken without entering the begin backup command.
Sat Aug 10 16:46:54 2024
Parallel Media Recovery started with 64 slaves
Media Recovery Complete (xff1)
Completed: ALTER DATABASE RECOVER database
Sat Aug 10 17:19:58 2024
alter database open
This instance was first to open
Sat Aug 10 17:19:58 2024
SUCCESS: diskgroup DATA was mounted
Sat Aug 10 17:19:58 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Sat Aug 10 17:20:10 2024
Picked broadcast on commit scheme to generate SCNs
Sat Aug 10 17:20:10 2024
SUCCESS: diskgroup REDO was mounted
Sat Aug 10 17:20:10 2024
NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is established
Thread 1 opened at log sequence 124958
Current log# 14 seq# 124958 mem# 0: +REDO/xff/log2.ora
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Aug 10 17:20:14 2024
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
[33770] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:261099864 end:261100854 diff:990 (9 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Sat Aug 10 17:20:16 2024
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:33650 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Starting background process GTX0
Sat Aug 10 17:20:16 2024
GTX0 started with pid=45, OS id=34119
Starting background process RCBG
Sat Aug 10 17:20:16 2024
RCBG started with pid=46, OS id=34121
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Sat Aug 10 17:20:16 2024
QMNC started with pid=47, OS id=34134
Starting background process SMCO
Completed: alter database open
Check data dictionary consistency
SQL> @hcheck.sql
HCheck Version 07MAY18 on 10-AUG-2024 18:24:49
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: XFF
                                  Catalog       Fixed
Procedure Name                    Version    Vs Release    Timestamp      Result
------------------------------ ... ---------- -- ---------- -------------- ------
.- LobNotInObj                 ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- MissingOIDOnObjCol          ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- SourceNotInObj              ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- OversizedFiles              ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- PoorDefaultStorage          ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- PoorStorage                 ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- TabPartCountMismatch        ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- OrphanedTabComPart          ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- MissingSum$                 ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- MissingDir$                 ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- DuplicateDataobj            ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- ObjSynMissing               ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- ObjSeqMissing               ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedUndo                ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndex               ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndexPartition      ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndexSubPartition   ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedTable               ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedTablePartition      ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedTableSubPartition   ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- MissingPartCol              ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedSeg$                ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedIndPartObj#         ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- DuplicateBlockUse           ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- FetUet                      ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- Uet0Check                   ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- SeglessUET                  ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- BadInd$                     ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- BadTab$                     ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadIcolDepCnt               ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- ObjIndDobj                  ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- TrgAfterUpgrade             ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- ObjType0                    ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadOwner                    ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- StmtAuditOnCommit           ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadPublicObjects            ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadSegFreelist              ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadDepends                  ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- CheckDual                   ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- ObjectNames                 ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadCboHiLo                  ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- ChkIotTs                    ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- NoSegmentIndex              ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- BadNextObject               ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- DroppedROTS                 ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- FilBlkZero                  ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- DbmsSchemaCopy              ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- OrphanedObjError            ... 1102000300 >  1102000000 08/10 18:24:54 PASS
.- ObjNotLob                   ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- MaxControlfSeq              ... 1102000300 <=  *All Rel* 08/10 18:24:55 PASS
.- SegNotInDeferredStg         ... 1102000300 >  1102000000 08/10 18:25:18 PASS
.- SystemNotRfile1             ... 1102000300 >   902000000 08/10 18:25:18 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000300 <=  *All Rel* 08/10 18:25:18 PASS
.- OrphanTrigger               ... 1102000300 <=  *All Rel* 08/10 18:25:18 PASS
.- ObjNotTrigger               ... 1102000300 <=  *All Rel* 08/10 18:25:18 PASS
---------------------------------------
10-AUG-2024 18:25:18 Elapsed: 29 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)
PL/SQL procedure successfully completed.
Statement processed.
Complete output is in trace file:
/u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_71148_HCHECK.trc
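Luckily the data dictionary itself was not damaged, and the application went straight back into service with everything normal (mainly because the customer had already stopped writing to the database while the fibre link was unstable).
Using flashback to quickly reinstate a standby database after failover
The customer's architecture is a single instance plus Data Guard. The production database runs on a physical server and the standby runs in a virtualized environment (on spinning disks at the time, for cost reasons). Today the physical server suddenly went down, and the customer asked for an emergency switch to the standby.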
Thu Aug 08 09:52:13 2024
Media Recovery Waiting for thread 1 sequence 189448 (in transit)
Recovery of Online Redo Log: Thread 1 Group 12 Seq 189448 Reading mem 0
Mem# 0: /oradata/xff/std_redo12.log
Thu Aug 08 09:52:13 2024
Archived Log entry 187514 added for thread 1 sequence 189447 ID 0x2e6bc37f dest 1:
Thu Aug 08 10:54:40 2024
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH force
Terminal Recovery: Stopping real time apply
Thu Aug 08 10:54:40 2024
MRP0: Background Media Recovery cancelled with status 16037
Errors in file /u01/app/oracle/diag/rdbms/xffdg/xff/trace/xff_pr00_17876.trc:
ORA-16037: user requested cancel of managed recovery operation
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 34188310512
Thu Aug 08 10:54:43 2024
MRP0: Background Media Recovery process shutdown (xff)
Terminal Recovery: Stopped real time apply
Thu Aug 08 10:55:14 2024
Stopping background process MMNL
Stopping background process MMON
Thu Aug 08 10:55:46 2024
Background process MMON not dead after 30 seconds
Killing background process MMON
All dispatchers and shared servers shutdown
CLOSE: killing server sessions.
Active process 17691 user 'oracle' program 'oracle@xffDG (MMON)'
Active process 15077 user 'oracle' program 'oracle@xffDG'
Active process 17691 user 'oracle' program 'oracle@xffDG (MMON)'
Active process 11536 user 'oracle' program 'oracle@xffDG (M000)'
Active process 17691 user 'oracle' program 'oracle@xffDG (MMON)'
Active process 15077 user 'oracle' program 'oracle@xffDG'
Active process 11536 user 'oracle' program 'oracle@xffDG (M000)'
Active process 11536 user 'oracle' program 'oracle@xffDG (M000)'
Active process 11536 user 'oracle' program 'oracle@xffDG (M000)'
CLOSE: all sessions shutdown successfully.
Thu Aug 08 10:56:11 2024
SMON: disabling cache recovery
Attempt to do a Terminal Recovery (xff)
Media Recovery Start: Managed Standby Recovery (xff)
started logmerger process
Thu Aug 08 10:56:13 2024
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 4 slaves
Media Recovery Waiting for thread 1 sequence 189448 (in transit)
Killing 4 processes with pids 17733,17729,17731,32533 (all RFS, wait for I/O) in order to disallow current and future RFS connections. Requested by OS process 15184
Thu Aug 08 10:56:16 2024
idle dispatcher 'D000' terminated, pid = (16, 1)
Begin: Standby Redo Logfile archival
End: Standby Redo Logfile archival
Terminal Recovery timestamp is '08/08/2024 10:56:17'
Terminal Recovery: applying standby redo logs.
Terminal Recovery: thread 1 seq# 189448 redo required
Terminal Recovery:
Recovery of Online Redo Log: Thread 1 Group 12 Seq 189448 Reading mem 0
Mem# 0: /oradata/xff/std_redo12.log
Identified End-Of-Redo (failover) for thread 1 sequence 189448 at SCN 0xffff.ffffffff
Incomplete Recovery applied until change 34188310513 time 08/08/2024 11:32:41
Thu Aug 08 10:56:18 2024
Media Recovery Complete (xff)
Terminal Recovery: successful completion
Thu Aug 08 10:56:18 2024
ARCH: Archival stopped, error occurred. Will continue retrying
Forcing ARSCN to IRSCN for TR 7:4123539441
ORACLE Instance xff - Archival Error
Attempt to set limbo arscn 7:4123539441 irscn 7:4123539441
Resetting standby activation ID 778814335 (0x2e6bc37f)
ORA-16014: log 12 sequence# 189448 not archived, no available destinations
ORA-00312: online log 12 thread 1: '/oradata/xff/std_redo12.log'
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH force
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
ORA-16136 signalled during: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL...
Thu Aug 08 10:56:28 2024
ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE
ALTER DATABASE ACTIVATE [PHYSICAL] STANDBY DATABASE (xff)
Begin: Standby Redo Logfile archival
End: Standby Redo Logfile archival
Thu Aug 08 10:56:28 2024
Archiver process freed from errors. No longer stopped
Standby terminal recovery start SCN: 34188310512
RESETLOGS after incomplete recovery UNTIL CHANGE 34188310513
Online log /oradata/xff/redo01.log: Thread 1 Group 1 was previously cleared
Online log /oradata/xff/redo02.log: Thread 1 Group 2 was previously cleared
Online log /oradata/xff/redo03.log: Thread 1 Group 3 was previously cleared
Online log /oradata/xff/redo04.log: Thread 1 Group 4 was previously cleared
Standby became primary SCN: 34188310511
Thu Aug 08 10:56:29 2024
Setting recovery target incarnation to 3
ACTIVATE STANDBY: Complete - Database mounted as primary
Completed: ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE
ARC1: Becoming the 'no SRL' ARCH
alter database open
Thu Aug 08 10:56:34 2024
Assigning activation ID 832379854 (0x319d1bce)
Thread 1 advanced to log sequence 2 (thread open)
Thread 1 opened at log sequence 2
Current log# 2 seq# 2 mem# 0: /oradata/xff/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Aug 08 10:56:34 2024
SMON: enabling cache recovery
Thu Aug 08 10:56:34 2024
ARC0: LGWR is scheduled to archive destination LOG_ARCHIVE_DEST_2 after log switch
Thu Aug 08 10:56:34 2024
NSA2 started with pid=14, OS id=15198
[15133] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:1087824580 end:1087828220 diff:3640 (36 seconds)
Dictionary check beginning
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Thu Aug 08 10:56:38 2024
Database Characterset is ZHS16GBK
Starting background process SMCO
Thu Aug 08 10:56:39 2024
SMCO started with pid=15, OS id=15200
Thread 1 advanced to log sequence 3 (LGWR switch)
Current log# 3 seq# 3 mem# 0: /oradata/xff/redo03.log
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
Thu Aug 08 10:56:40 2024
Archived Log entry 187515 added for thread 1 sequence 2 ID 0x319d1bce dest 1:
Starting background process QMNC
Thu Aug 08 10:56:43 2024
QMNC started with pid=17, OS id=15204
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open
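Unfortunately the VM's I/O was too poor to take over the workload. The hardware engineer urgently repaired the physical server, the database on it started normally, and the customer switched the application straight back to the physical server. The Data Guard environment now has to be restored (the customer also migrated the VM to SSD storage). First restart the database on the VM to mount state.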
[oracle@xffDG ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Thu Aug 8 20:06:30 2024
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 2.5655E+10 bytes
Fixed Size                  2265224 bytes
Variable Size            3892318072 bytes
Database Buffers         2.1743E+10 bytes
Redo Buffers               16896000 bytes
Database mounted.
SQL> select open_mode,database_role from v$database;
OPEN_MODE            DATABASE_ROLE
-------------------- ----------------
MOUNTED              PRIMARY
Flash the database back to an SCN from before the standby failover
SQL> flashback database to scn 34188310500;
Flashback complete.
Thu Aug 08 20:09:40 2024
flashback database to scn 34188310500
Flashback Restore Start
Thu Aug 08 20:10:34 2024
Flashback Restore Complete
Flashback Media Recovery Start
Thu Aug 08 20:10:34 2024
Setting recovery target incarnation to 2
started logmerger process
Parallel Media Recovery started with 4 slaves
Flashback Media Recovery Log /oradata/fast_recovery_area/XFF/archivelog/2024_08_08/o1_mf_1_189448_mc8dzjxn_.arc
Thu Aug 08 20:10:35 2024
Identified End-Of-Redo (failover) for thread 1 sequence 189448 at SCN 0x7.f5c837f1
Incomplete Recovery applied until change 34188310501 time 08/08/2024 11:32:40
Flashback Media Recovery Complete
Setting recovery target incarnation to 3
Completed: flashback database to scn 34188310500
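The alert log prints the end-of-redo SCN in `wrap.base` hexadecimal form, while the flashback command takes a decimal SCN, so it is worth converting one into the other before picking a target. The sketch below does that conversion and checks that the target used here sits below the "Standby became primary SCN" value reported during the failover. The function names are hypothetical, and the "target no later than the became-primary SCN" check only encodes what this post did; it is not an Oracle API.

```python
def parse_alert_scn(text):
    """Convert an alert-log SCN such as '0x7.f5c837f1' to a decimal SCN."""
    wrap_hex, base_hex = text.split(".")
    # int() with base 16 accepts the optional '0x' prefix on the wrap part.
    return (int(wrap_hex, 16) << 32) | int(base_hex, 16)


def is_before_failover(target_scn, became_primary_scn):
    """True when the flashback target is not later than the failover SCN."""
    return target_scn <= became_primary_scn


# Values taken from the alert logs and the flashback command above.
end_of_redo_scn = parse_alert_scn("0x7.f5c837f1")
became_primary_scn = 34188310511
flashback_target = 34188310500

print(end_of_redo_scn)  # 34188310513, the RESETLOGS "UNTIL CHANGE" value
print(is_before_failover(flashback_target, became_primary_scn))  # True
```

The converted value equals the 34188310513 shown in the failover log ("RESETLOGS after incomplete recovery UNTIL CHANGE 34188310513"), and the chosen target 34188310500 lies 11 below the became-primary SCN.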
Convert the VM database to the standby role
SQL> alter database convert to physical standby;
Database altered.
SQL> select database_role from v$database;
select database_role from v$database
*
ERROR at line 1:
ORA-01507: database not mounted
SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-00750: database has been previously mounted and dismounted
SQL> shutdown immediate;
ORA-01507: database not mounted
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 2.5655E+10 bytes
Fixed Size                  2265224 bytes
Variable Size            3892318072 bytes
Database Buffers         2.1743E+10 bytes
Redo Buffers               16896000 bytes
Database mounted.
SQL> select open_mode,database_role from v$database;
OPEN_MODE            DATABASE_ROLE
-------------------- ----------------
MOUNTED              PHYSICAL STANDBY
Thu Aug 08 20:10:46 2024
alter database convert to physical standby
ALTER DATABASE CONVERT TO PHYSICAL STANDBY (xff)
Flush standby redo logfile failed:1649
Clearing standby activation ID 832379854 (0x319d1bce)
The primary database controlfile was created using the 'MAXLOGFILES 16' clause.
There is space for up to 12 standby redo logfiles
Use the following SQL commands on the standby database to create standby redo logfiles that match the primary database:
ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 209715200;
ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 209715200;
ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 209715200;
ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 209715200;
ALTER DATABASE ADD STANDBY LOGFILE 'srl5.f' SIZE 209715200;
Shutting down archive processes
Archiving is disabled
Completed: alter database convert to physical standby
Start the MRP process
SQL> alter database open read only;
Database altered.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
Database altered.