标签云
asm恢复 bbed bootstrap$ dul In Memory kcbzib_kcrsds_1 kccpb_sanity_check_2 kfed MySQL恢复 ORA-00312 ORA-00607 ORA-00704 ORA-01110 ORA-01555 ORA-01578 ORA-08103 ORA-600 2131 ORA-600 2662 ORA-600 2663 ORA-600 3020 ORA-600 4000 ORA-600 4137 ORA-600 4193 ORA-600 4194 ORA-600 16703 ORA-600 kcbzib_kcrsds_1 ORA-600 KCLCHKBLK_4 ORA-15042 ORA-15196 ORACLE 12C oracle dul ORACLE PATCH Oracle Recovery Tools oracle加密恢复 oracle勒索 oracle勒索恢复 oracle异常恢复 Oracle 恢复 ORACLE恢复 ORACLE数据库恢复 oracle 比特币 OSD-04016 YOUR FILES ARE ENCRYPTED 勒索恢复 比特币加密文章分类
- Others (2)
- 中间件 (2)
- WebLogic (2)
- 操作系统 (102)
- 数据库 (1,683)
- DB2 (22)
- MySQL (73)
- Oracle (1,545)
- Data Guard (52)
- EXADATA (8)
- GoldenGate (24)
- ORA-xxxxx (159)
- ORACLE 12C (72)
- ORACLE 18C (6)
- ORACLE 19C (15)
- ORACLE 21C (3)
- Oracle 23ai (7)
- Oracle ASM (68)
- Oracle Bug (8)
- Oracle RAC (53)
- Oracle 安全 (6)
- Oracle 开发 (28)
- Oracle 监听 (28)
- Oracle备份恢复 (565)
- Oracle安装升级 (92)
- Oracle性能优化 (62)
- 专题索引 (5)
- 勒索恢复 (79)
- PostgreSQL (18)
- PostgreSQL恢复 (6)
- SQL Server (27)
- SQL Server恢复 (8)
- TimesTen (7)
- 达梦数据库 (2)
- 生活娱乐 (2)
- 至理名言 (11)
- 虚拟化 (2)
- VMware (2)
- 软件开发 (37)
- Asp.Net (9)
- JavaScript (12)
- PHP (2)
- 小工具 (20)
-
最近发表
- ORA-15411: Failure groups in disk group DATA have different number of disks.
- 断电引起的ORA-08102: 未找到索引关键字, 对象号 39故障处理
- ORA-00227: corrupt block detected in control file
- 手工删除19c rac
- 解决oracle数据文件路径有回车故障
- .wstop扩展名勒索数据库恢复
- Oracle Recovery Tools工具一键解决ORA-00376 ORA-01110故障(文件offline)
- OGG-02771 Input trail file format RELEASE 19.1 is different from previous trail file form at RELEASE 11.2.
- OGG-02246 Source redo compatibility level 19.0.0 requires trail FORMAT 12.2 or higher
- GoldenGate 19安装和打patch
- dd破坏asm磁盘头恢复
- 删除asmlib磁盘导致磁盘组故障恢复
- Kylin Linux 安装19c
- ORA-600 krse_arc_complete.4
- Oracle 19c 202410补丁(RUs+OJVM)
- ntfs MFT损坏(ntfs文件系统故障)导致oracle异常恢复
- .mkp扩展名oracle数据文件加密恢复
- 清空redo,导致ORA-27048: skgfifi: file header information is invalid
- A_H_README_TO_RECOVER勒索恢复
- 通过alert日志分析客户自行对一个数据库恢复的来龙去脉和点评
标签归档:asm异常恢复
分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例
有客户反馈他们重启系统之后,发现asmlib创建的asmdisk丢失了,然后又使用oracleasm deletedisk和createdisk重新创建的asm disk,最后发现asm diskgroup无法mount。让客户通过dd 备份5m数据,然后使用kfed分析
kefd分析结果
E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0 kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0 kfbh.check: 3760689243 ; 0x00c: 0xe027905b kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=1 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0 kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=10 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0 kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=255 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0 kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=256|more kfbh.endian: 1 ; 0x000: 0x01 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 17 ; 0x002: KFBTYP_PST_META kfbh.datfmt: 2 ; 0x003: 0x02 kfbh.block.blk: 256 ; 0x004: T=0 NUMB=0x100 kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0 kfbh.check: 3925268785 ; 0x00c: 0xe9f6d931 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 kfdpHdrPairBv1.first.super.time.hi:32994098 ; 0x000: HOUR=0x12 DAYS=0x19 MNTH=0x c YEAR=0x7dd kfdpHdrPairBv1.first.super.time.lo:1614030848 ; 0x004: USEC=0x0 MSEC=0x10a SECS= 0x3 MINS=0x18 kfdpHdrPairBv1.first.super.last: 2 ; 0x008: 0x00000002 kfdpHdrPairBv1.first.super.next: 2 ; 0x00c: 0x00000002 kfdpHdrPairBv1.first.super.copyCnt: 1 ; 0x010: 0x01 kfdpHdrPairBv1.first.super.version: 1 ; 0x011: 0x01 kfdpHdrPairBv1.first.super.ub2spare: 0 ; 0x012: 0x0000 kfdpHdrPairBv1.first.super.incarn: 1 ; 0x014: 0x00000001 kfdpHdrPairBv1.first.super.copy[0]: 0 ; 0x018: 0x0000 kfdpHdrPairBv1.first.super.copy[1]: 0 ; 0x01a: 0x0000 kfdpHdrPairBv1.first.super.copy[2]: 0 ; 0x01c: 0x0000 ……
因为kfed默认每个block为4k,这里提示256是ok的,255是损坏的,从而推测出来,很可能oracleasm createdisk损坏了1M的数据。由于默认au是1m,而且数据库版本是11.2.0.3,而且第256个blkn开始没有损坏,因此初步判断可以考虑使用备份asm disk header来恢复磁盘头
检查还原磁盘头的asm disk
[grid@xifenfei1 disks]$ kfed read DATA1 kfbh.endian: 1 ; 0x000: 0x01 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD kfbh.datfmt: 1 ; 0x003: 0x01 kfbh.block.blk: 0 ; 0x004: blk=0 kfbh.block.obj: 2147483648 ; 0x008: disk=0 kfbh.check: 2776451033 ; 0x00c: 0xa57d47d9 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 kfdhdb.driver.provstr: ORCLDISKDATA1 ; 0x000: length=13 kfdhdb.driver.reserved[0]: 1096040772 ; 0x008: 0x41544144 kfdhdb.driver.reserved[1]: 49 ; 0x00c: 0x00000031 kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000 kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000 kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000 kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000 kfdhdb.compat: 186646528 ; 0x020: 0x0b200000 kfdhdb.dsknum: 0 ; 0x024: 0x0000 kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER kfdhdb.dskname: DATA_0000 ; 0x028: length=9 kfdhdb.grpname: DATA ; 0x048: length=4 kfdhdb.fgname: DATA_0000 ; 0x068: length=9 kfdhdb.capname: ; 0x088: length=0 kfdhdb.crestmp.hi: 32994099 ; 0x0a8: HOUR=0x13 DAYS=0x19 MNTH=0xc YEAR=0x7dd kfdhdb.crestmp.lo: 2797442048 ; 0x0ac: USEC=0x0 MSEC=0x365 SECS=0x2b MINS=0x29 kfdhdb.mntstmp.hi: 33022061 ; 0x0b0: HOUR=0xd DAYS=0x3 MNTH=0x8 YEAR=0x7df kfdhdb.mntstmp.lo: 816879616 ; 0x0b4: USEC=0x0 MSEC=0x26 SECS=0xb MINS=0xc kfdhdb.secsize: 512 ; 0x0b8: 0x0200 kfdhdb.blksize: 4096 ; 0x0ba: 0x1000 …………
证明磁盘头确实被比较完美的修复了,现在的任务是尝试mount磁盘组
mount磁盘组
[grid@xifenfei1 ~]$ sqlplus / as sysasm SQL*Plus: Release 11.2.0.3.0 Production on Thu Aug 6 20:54:53 2015 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options SQL> alter diskgroup data mount; Diskgroup altered. SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options
asm diskgroup已经正常mount,使用asmcmd命令检查文件是否正常
分析磁盘组数据是否正常
[grid@xifenfei1 ~]$ asmcmd ASMCMD> lsdg State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name MOUNTED EXTERN N 512 4096 1048576 1622060 636493 0 636493 0 N DATA/ ASMCMD> cd data ASMCMD> ls ORCL/ ASMCMD> cd orcl ASMCMD> ls CONTROLFILE/ DATAFILE/ ONLINELOG/ PARAMETERFILE/ TEMPFILE/ spfileorcl.ora ASMCMD> cd datafile ASMCMD> ls XIFENFEI20130801.314.835191517 XIFENFEI20140101.321.835191571 XIFENFEI20140201.322.835191573 XIFENFEI20140301.323.835191573 ………… SYSAUX.270.835182535 SYSAUX.838.874669369 SYSTEM.271.835182533 SYSTEM.823.873555791 SYSTEM.945.883146947 …………
这里看到磁盘组里面的数据文件都正常,使用同样的方法,继续mount其他磁盘组。
尝试启动数据库
SQL> startup ORACLE 例程已经启动。 Total System Global Area 5010685952 bytes Fixed Size 2236968 bytes Variable Size 2013269464 bytes Database Buffers 2986344448 bytes Redo Buffers 8835072 bytes 数据库装载完毕。 ORA-16038: 日志 14 sequence# 21145 无法归档 ORA-19504: 无法创建文件"" ORA-00312: 联机日志 14 线程 2: '+DATA/orcl/onlinelog/group_14.284.835184569' ORA-00312: 联机日志 14 线程 2: '+ARCH/orcl/onlinelog/group_14.287.835184569'
查看数据库alert日志
ARC1: Archival started ARC2: Archival started ARC2: Becoming the 'no FAL' ARCH ARC2: Becoming the 'no SRL' ARCH ARC1: Becoming the heartbeat ARCH ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE Thu Aug 06 21:04:06 2015 Thread 2 advanced to log sequence 21146 (thread recovery) Picked broadcast on commit scheme to generate SCNs Thread 2 advanced to log sequence 21147 (before internal thread enable) Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc: ORA-19816: 警告: 文件可能存在于数据库未知的 db_recovery_file_dest 中。 ORA-17502: ksfdcre: 4 未能创建文件 +ARCH ORA-15196: invalid ASM block header [kfc.c:19572] [check_kfbh] [1] [47962] [1344818371 != 630731762] ORA-15130: diskgroup "ARCH" is being dismounted ORA-15066: offlining disk "ARCH_0000" in group "ARCH" may result in a data loss ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] ARCH: Error 19504 Creating archive log file to '+ARCH' NOTE: Deferred communication with ASM instance Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc: ORA-15130: diskgroup "ARCH" is being dismounted NOTE: deferred map free for map id 754 Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc: ORA-16038: 日志 14 sequence# 21145 无法归档 ORA-19504: 无法创建文件"" ORA-00312: 联机日志 14 线程 2: '+DATA/orcl/onlinelog/group_14.284.835184569' ORA-00312: 联机日志 14 线程 2: '+ARCH/orcl/onlinelog/group_14.287.835184569' ORA-16038 signalled during: ALTER DATABASE OPEN... Thu Aug 06 21:04:10 2015 SUCCESS: diskgroup ARCH was dismounted SUCCESS: diskgroup ARCH was dismounted Thu Aug 06 21:04:10 2015 Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ckpt_27353.trc: ORA-00206: error in writing (block 3, # blocks 1) of control file ORA-00202: control file: '+ARCH/orcl/controlfile/current.256.835182531' ORA-15078: ASM diskgroup was forcibly dismounted ORA-15078: ASM diskgroup was forcibly dismounted Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ckpt_27353.trc: ORA-00221: error on write to control file ORA-00206: error in writing (block 3, # blocks 1) of control file ORA-00202: control file: '+ARCH/orcl/controlfile/current.256.835182531' ORA-15078: ASM diskgroup was forcibly dismounted ORA-15078: ASM diskgroup was forcibly dismounted Thu Aug 06 21:04:10 2015 System state dump requested by (instance=1, osid=27353 (CKPT)), summary=[abnormal instance termination]. System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_27318.trc CKPT (ospid: 27353): terminating the instance due to error 221 Instance terminated by CKPT, pid = 27353
查看asm alert日志
Thu Aug 06 21:04:07 2015 WARNING: cache read a corrupt block: group=2(ARCH) dsk=0 blk=1 disk=0 (ARCH_0000) incarn=3942486752 au=0 blk=1 count=1 Errors in file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc: ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] NOTE: a corrupted block from group ARCH was dumped to /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc WARNING: cache read (retry) a corrupt block: group=2(ARCH) dsk=0 blk=1 disk=0 (ARCH_0000) incarn=3942486752 au=0 blk=1 count=1 Errors in file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc: ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] ERROR: cache failed to read group=2(ARCH) dsk=0 blk=1 from disk(s): 0(ARCH_0000) ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] NOTE: cache initiating offline of disk 0 group ARCH NOTE: process _user27462_+asm1 (27462) initiating offline of disk 0.3942486752 (ARCH_0000) with mask 0x7e in group 2 WARNING: Disk 0 (ARCH_0000) in group 2 in mode 0x7f is now being taken offline on ASM inst 1 NOTE: initiating PST update: grp = 2, dsk = 0/0xeafd92e0, mask = 0x6a, op = clear Thu Aug 06 21:04:07 2015 GMON updating disk modes for group 2 at 17 for pid 35, osid 27462 ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy. ERROR: too many offline disks in PST (grp 2) Thu Aug 06 21:04:07 2015 NOTE: cache dismounting (not clean) group 2/0x723D6245 (ARCH) NOTE: messaging CKPT to quiesce pins Unix process pid: 27089, image: oracle@xifenfei1 (B000) WARNING: Offline of disk 0 (ARCH_0000) in group 2 and mode 0x7f failed on ASM inst 1 Thu Aug 06 21:04:07 2015 NOTE: halting all I/Os to diskgroup 2 (ARCH) System State dumped to trace file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc NOTE: AMDU dump of disk group ARCH created at /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace Thu Aug 06 21:04:09 2015 NOTE: LGWR doing non-clean dismount of group 2 (ARCH) NOTE: LGWR sync ABA=126.806 last written ABA 126.806
这里可以看出来,报错的block为arch磁盘组的第一个磁盘的第一个au的第二个block,而我们在开始的时候,已经分析了asm disk的第一个au完全损坏,并且我们使用了备份磁盘头进行来还原,勉强可以让磁盘组mount起来,但是由于数据库在启动的时候,需要对redo进行归档,而归档的过程需要写到arch磁盘组里面,这个时候需要访问到au=0 blk=1,而这个块本身是坏的,因此这个时候该块盘的disk就被offline掉了,而这个磁盘组是外部冗余的,因此磁盘组dismount了,所以数据库无法启动.
分析第一个au里面到底有哪些东西
SQL> select DISK_NUMBER,path from v$asm_disk; DISK_NUMBER PATH ----------- -------------------------------------------------- 0 /dev/raw/raw1 2 /dev/raw/raw3 1 /dev/raw/raw2 [oracle@xifenfei raw]$ kfed read raw1 blkn=1|grep kfbh.type kfbh.type: 2 ; 0x002: KFBTYP_FREESPC [oracle@xifenfei raw]$ kfed read raw1 blkn=2|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL [oracle@xifenfei raw]$ kfed read raw1 blkn=3|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL [oracle@xifenfei raw]$ kfed read raw1 blkn=255|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL [oracle@xifenfei raw]$ kfed read raw2 blkn=1|grep kfbh.type kfbh.type: 2 ; 0x002: KFBTYP_FREESPC [oracle@xifenfei raw]$ kfed read raw2 blkn=2|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL [oracle@xifenfei raw]$ kfed read raw2 blkn=255|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL [oracle@xifenfei raw]$ kfed read raw3 blkn=1|grep kfbh.type kfbh.type: 2 ; 0x002: KFBTYP_FREESPC [oracle@xifenfei raw]$ kfed read raw3 blkn=2|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL [oracle@xifenfei raw]$ kfed read raw3 blkn=255|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL
通过一个测试机器的一个磁盘组进行分析,我们可以基本上确定asm 第一个au除了asm disk header的KFBTYP_DISKHEAD之外,其他主要是KFBTYP_FREESPC(Free Space Table)和KFBTYP_ALLOCTBL(allocator table),主要就是记录asm中au的分配情况,也就是进一步说明,如果我不对asm里面的数据使用更多的au分配或者回收au,在缺少第一个au的1-255个block的信息情况下,asm的磁盘组也不会dismount。根据这个思路,让数据库归档到本地,然后继续测试
继续open数据库
SQL> startup ORACLE 例程已经启动。 Total System Global Area 5010685952 bytes Fixed Size 2236968 bytes Variable Size 2013269464 bytes Database Buffers 2986344448 bytes Redo Buffers 8835072 bytes 数据库装载完毕。 SQL> alter database open; 数据库已更改。 LGWR: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES Fri Aug 07 02:43:13 2015 ARC1 started with pid=34, OS id=22778 Fri Aug 07 02:43:13 2015 ARC2 started with pid=35, OS id=22780 Fri Aug 07 02:43:13 2015 ARC3 started with pid=36, OS id=22782 ARC1: Archival started ARC2: Archival started ARC2: Becoming the 'no FAL' ARCH ARC2: Becoming the 'no SRL' ARCH ARC1: Becoming the heartbeat ARCH ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE Fri Aug 07 02:43:24 2015 Thread 1 opened at log sequence 18604 Current log# 10 seq# 18604 mem# 0: /tmp/xifenfei/otherfile/group_10.273.835182533 Current log# 10 seq# 18604 mem# 1: /tmp/xifenfei/otherfile/group_10.263.835182533 Successful open of redo thread 1 Fri Aug 07 02:43:24 2015 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Fri Aug 07 02:43:25 2015 SMON: enabling cache recovery Instance recovery: looking for dead threads Instance recovery: lock domain invalid but no dead threads Fri Aug 07 02:43:26 2015 minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:21328 status:0x7 minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000 Fri Aug 07 02:43:26 2015 Redo thread 2 internally disabled at seq 21147 (CKPT) [21341] Successfully onlined Undo Tablespace 2. Undo initialization finished serial:0 start:96999124 end:97000624 diff:1500 (15 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is ZHS16GBK No Resource Manager plan active Starting background process GTX0 Fri Aug 07 02:43:31 2015 GTX0 started with pid=37, OS id=22803 Starting background process RCBG Fri Aug 07 02:43:31 2015 RCBG started with pid=38, OS id=22805 replication_dependency_tracking turned off (no async multimaster replication found) Fri Aug 07 02:43:34 2015 Archived Log entry 73876 added for thread 2 sequence 21145 ID 0x513c613f dest 1: <---果然有归档操作发生 Starting background process QMNC Fri Aug 07 02:43:34 2015 QMNC started with pid=39, OS id=22812 Fri Aug 07 02:43:35 2015 Archived Log entry 73877 added for thread 2 sequence 21146 ID 0x513c613f dest 1: Fri Aug 07 02:43:35 2015 ARC0: Archiving disabled thread 2 sequence 21147 Archived Log entry 73878 added for thread 2 sequence 21147 ID 0x513c613f dest 1: Fri Aug 07 02:43:37 2015 Completed: alter database open
现在到了这一步,基本上可以确定,数据库是零丢失恢复。由于asm 第一个au丢失数据严重,想要彻底修复比较难,考虑把数据库启动到mount/read only状态然后使用rman备份数据,然后进行重建asm 磁盘组,再迁移回来。至此完美恢复asmlib的磁盘被oracleasm重写的故障恢复,实现数据0丢失.当然在整个恢复过程没有于此的简单,涉及到在votedisk损坏的情况下,如何mount磁盘组,vote diskgroup的损坏修复问题,磁盘组在10g/11.1和11.2还原磁盘头备份的问题等问题.
虽然本次的恢复案例中,由于asmlib的asm disk不可见就轻易使用oracleasm createdisk命令对磁盘进行了重建,犯了一个很大错误,但是在重建之后,发现磁盘组依旧异常,未继续操作(比如重建磁盘组等),为最后的数据完全恢复创造了必要条件,使得客户的没有任何数据损失。如果再对除磁盘组继续复写操作,可能会导致数据永久性丢失。这个教训告诉我们:遇到自己不能把握的事情,及时终止,不要让错误越走越远
asm disk header 彻底损坏恢复
在asm 磁盘组不能mount的情况下,如果是磁盘头的少数部分损坏,或者是asm disk header存在,可以通过kfed修复,或者使用备份的磁盘头信息去恢复从而实现磁盘组mount来恢复数据库.如果没有备份也无法修复可以尝试使用amdu,dul来实现对不能mount的磁盘组进行恢复.在极端情况下(比如磁盘组完全丢失),amdu/dul都无论为力的情况下,可以考虑使用扫描磁盘找出来datafile 的方法求救数据的最后稻草.本实验大概的模拟了asm disk 前10M完全损坏的情况下数据库恢复
测试准备
创建新表空间,创建T_XIFENFEI测试表
SQL> create tablespace xifenfei datafile '+XIFENFEI' SIZE 50m; Tablespace created. SQL> CREATE TABLE T_XIFENFEI TABLESPACE XIFENFEI 2 AS SELECT * FROM DBA_OBJECTS; Table created. SQL> SELECT COUNT(*) FROM T_XIFENFEI; COUNT(*) ---------- 50031 SQL> select ts#,rfile#,bytes/1024/1024,blocks,name from v$datafile; TS# RFILE# BYTES/1024/1024 BLOCKS NAME ---------- ---------- --------------- ---------- -------------------------------------------------- 0 1 480 61440 +XIFENFEI/asm10g/datafile/system.256.845260203 1 2 25 3200 +XIFENFEI/asm10g/datafile/undotbs1.258.845260205 2 3 250 32000 +XIFENFEI/asm10g/datafile/sysaux.257.845260203 4 4 5 640 +XIFENFEI/asm10g/datafile/users.259.845260205 6 5 50 6400 +XIFENFEI/asm10g/datafile/xifenfei.266.845262139 SQL> select GROUP_NUMBER,DISK_NUMBER,STATE,TOTAL_MB,FREE_MB,NAME,path from v$asm_disk; GROUP_NUMBER DISK_NUMBER STATE TOTAL_MB FREE_MB NAME PATH ------------ ----------- -------- ---------- ---------- -------------------- ------------------ 1 0 NORMAL 2048 0 XIFENFEI_0000 /dev/raw/raw1 1 1 NORMAL 784 0 XIFENFEI_0001 /dev/raw/raw2 1 2 NORMAL 7059 0 XIFENFEI_0002 /dev/raw/raw3 --关闭数据库 SQL> shutdown immediate; Database closed. Database dismounted. ORACLE instance shut down. --关闭ASM SQL> shutdown immediate ASM diskgroups dismounted ASM instance shutdown
查看裸设备对应磁盘
[oracle@xifenfei dul]$ more /etc/sysconfig/rawdevices /dev/raw/raw1 /dev/sdc /dev/raw/raw2 /dev/sdd1 /dev/raw/raw3 /dev/sdd2
dd磁盘头
dd asm disk 前面10M,彻底破坏asm disk
[oracle@xifenfei ~]$ dd if=/dev/zero of=/dev/raw/raw1 bs=1M count=10 conv=notrunc 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 0.175424 seconds, 59.8 MB/s [oracle@xifenfei ~]$ dd if=/dev/zero of=/dev/raw/raw2 bs=1M count=10 conv=notrunc 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 0.11584 seconds, 90.5 MB/s [oracle@xifenfei ~]$ dd if=/dev/zero of=/dev/raw/raw3 bs=1M count=10 conv=notrunc 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 0.353435 seconds, 29.7 MB/s
kfed查看磁盘
确定所有asm disk header完全被破坏
[oracle@xifenfei dul]$ kfed read /dev/raw/raw1 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0 kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 [oracle@xifenfei dul]$ kfed read /dev/raw/raw2 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0 kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 [oracle@xifenfei dul]$ kfed read /dev/raw/raw3 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0 kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000
amdu查看asm 磁盘
[oracle@xifenfei ~]$ amdu -diskstring '/dev/raw/raw*' amdu_2014_04_18_23_17_17/ [oracle@xifenfei ~]$ cd amdu_2014_04_18_23_17_17 [oracle@xifenfei amdu_2014_04_18_23_17_17]$ ls report.txt [oracle@xifenfei amdu_2014_04_18_23_17_17]$ more report.txt -*-amdu-*- ………… --------------------------------- Operations --------------------------------- ------------------------------- Disk Selection ------------------------------- -diskstring '/dev/raw/raw*' ------------------------------ Reading Control ------------------------------- ------------------------------- Output Control ------------------------------- ********************************* DISCOVERY ********************************** ----------------------------- DISK REPORT N0001 ------------------------------ Disk Path: /dev/raw/raw1 Unique Disk ID: Disk Label: Physical Sector Size: 512 bytes Disk Size: 65536 megabytes ** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blksize_kfdhdb ** ----------------------------- DISK REPORT N0002 ------------------------------ Disk Path: /dev/raw/raw2 Unique Disk ID: Disk Label: Physical Sector Size: 512 bytes Disk Size: 65536 megabytes ** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blksize_kfdhdb ** ----------------------------- DISK REPORT N0003 ------------------------------ Disk Path: /dev/raw/raw3 Unique Disk ID: Disk Label: Physical Sector Size: 512 bytes Disk Size: 65536 megabytes ** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blksize_kfdhdb ** ******************************* END OF REPORT ********************************
通过这里证明,当asm disk header 损坏严重之时,amdu无法识别,更加无法恢复相关数据库
dul查看完全损坏asm disk header
测试在asm disk header完全损坏情况下,dul是否还能够实现asm磁盘组中抽取数据,同理amdu也无法正常工作.
[oracle@xifenfei dul]$ ./dul Data UnLoader: 10.2.0.5.28 - Internal Only - on Sat Apr 19 04:02:02 2014 with 64-bit io functions Copyright (c) 1994 2014 Bernard van Duijnen All rights reserved. Strictly Oracle Internal Use Only DUL: Warning: block 0 is not a disk header block DUL: Error: Block is not in use DUL: Error: Block type mismatch ( seen 0 expect 1) when parsing block 0 of disk /dev/raw/raw1 DUL: Warning: block 0 is not a disk header block DUL: Error: Block is not in use DUL: Error: Block type mismatch ( seen 0 expect 1) when parsing block 0 of disk /dev/raw/raw2 DUL: Warning: block 0 is not a disk header block DUL: Error: Block is not in use DUL: Error: Block type mismatch ( seen 0 expect 1) when parsing block 0 of disk /dev/raw/raw3
这里可以看出来,当asm disk header完全异常,dul也无法识别出来asm磁盘组(该情况下dul无法正常操作)
通过工具扫描磁盘抽取数据块
CPFL> scan disk /dev/raw/raw1 Scanning disk /dev/raw/raw1, at 2014-04-19 04:05:11 Completed disk /dev/raw/raw1, at 2014-04-19 04:05:56 CPFL> scan disk /dev/raw/raw2 Scanning disk /dev/raw/raw2, at 2014-04-19 04:05:56 Completed disk /dev/raw/raw2, at 2014-04-19 04:06:15 CPFL> scan disk /dev/raw/raw3 Scanning disk /dev/raw/raw3, at 2014-04-19 04:06:15 Completed disk /dev/raw/raw3, at 2014-04-19 04:07:44 CPFL> list datafiles Tablespace: SYSTEM File: 1 Blocks: 61440 Tablespace: UNDOTBS1 File: 2 Blocks: 3200 Tablespace: SYSAUX File: 3 Blocks: 32000 Tablespace: USERS File: 4 Blocks: 640 Tablespace: XIFENFEI File: 5 Blocks: 6400 CPFL> copy datafile 1 to /u01/oracle/oradata/datafile/1.dbf copy datafile start: 2014-04-19 04:10:35 copy datafile 1 have blocks 61440 copy datafile completed: 2014-04-19 04:11:18 CPFL> copy datafile 2 to /u01/oracle/oradata/datafile/2.dbf copy datafile start: 2014-04-19 04:11:52 copy datafile 2 have blocks 3200 copy datafile completed: 2014-04-19 04:11:54 CPFL> copy datafile 3 to /u01/oracle/oradata/datafile/3.dbf copy datafile start: 2014-04-19 04:12:03 copy datafile 3 have blocks 32000 copy datafile completed: 2014-04-19 04:12:27 CPFL> copy datafile 4 to /u01/oracle/oradata/datafile/4.dbf copy datafile start: 2014-04-19 04:13:07 copy datafile 4 have blocks 640 copy datafile completed: 2014-04-19 04:13:08 CPFL> copy datafile 5 to /u01/oracle/oradata/datafile/5.dbf copy datafile start: 2014-04-19 04:13:18 copy datafile 5 have blocks 6400 copy datafile completed: 2014-04-19 04:13:19
查看使用工具抽取数据文件
[oracle@xifenfei datafile]$ ls -l total 830320 -rw-r--r-- 1 oracle oinstall 503324672 Apr 19 04:34 1.dbf -rw-r--r-- 1 oracle oinstall 26222592 Apr 19 04:34 2.dbf -rw-r--r-- 1 oracle oinstall 262152192 Apr 19 04:34 3.dbf -rw-r--r-- 1 oracle oinstall 5251072 Apr 19 04:34 4.dbf -rw-r--r-- 1 oracle oinstall 52436992 Apr 19 04:34 5.dbf
dul验证抽取文件
[oracle@xifenfei dul]$ ./dul Data UnLoader: 10.2.0.5.28 - Internal Only - on Sat Apr 19 06:56:09 2014 with 64-bit io functions Copyright (c) 1994 2014 Bernard van Duijnen All rights reserved. Strictly Oracle Internal Use Only DUL: Warning: Recreating file "dul.log" Found db_id = 181793355 Found db_name = ASM10G DUL> show datafiles; ts# rf# start blocks offs open err file name 0 1 0 61440 0 1 0 /u01/oracle/oradata/datafile/1.dbf 1 2 0 3200 0 1 0 /u01/oracle/oradata/datafile/2.dbf 2 3 0 32000 0 1 0 /u01/oracle/oradata/datafile/3.dbf 4 4 0 640 0 1 0 /u01/oracle/oradata/datafile/4.dbf 6 5 0 6400 0 1 0 /u01/oracle/oradata/datafile/5.dbf DUL> bootstrap; Probing file = 1, block = 377 . unloading table BOOTSTRAP$ DUL: Warning: block number is non zero but marked deferred trying to process it anyhow 57 rows unloaded DUL: Warning: Dictionary cache DC_BOOTSTRAP is empty Reading BOOTSTRAP.dat 57 entries loaded Parsing Bootstrap$ contents DUL: Warning: Recreating file "dict.ddl" Generating dict.ddl for version 10 OBJ$: segobjno 18, file 1 block 121 TAB$: segobjno 2, tabno 1, file 1 block 25 COL$: segobjno 2, tabno 5, file 1 block 25 USER$: segobjno 10, tabno 1, file 1 block 89 Running generated file "@dict.ddl" to unload the dictionary tables . unloading table OBJ$ 51171 rows unloaded . unloading table TAB$ 1576 rows unloaded . unloading table COL$ 55264 rows unloaded . unloading table USER$ 59 rows unloaded Reading USER.dat 59 entries loaded Reading OBJ.dat 51171 entries loaded and sorted 51171 entries Reading TAB.dat 1576 entries loaded Reading COL.dat 55264 entries loaded and sorted 55264 entries Reading BOOTSTRAP.dat 57 entries loaded DUL: Warning: Recreating file "dict.ddl" Generating dict.ddl for version 10 OBJ$: segobjno 18, file 1 block 121 TAB$: segobjno 2, tabno 1, file 1 block 25 COL$: segobjno 2, tabno 5, file 1 block 25 USER$: segobjno 10, tabno 1, file 1 block 89 TABPART$: segobjno 266, file 1 block 2121 INDPART$: segobjno 271, file 1 block 2161 TABCOMPART$: segobjno 288, file 1 block 2297 INDCOMPART$: segobjno 293, file 1 block 2345 TABSUBPART$: segobjno 278, file 1 block 2217 INDSUBPART$: segobjno 283, file 1 block 2257 IND$: segobjno 2, tabno 3, file 1 block 25 ICOL$: segobjno 2, tabno 4, file 1 block 25 LOB$: segobjno 2, tabno 6, file 1 block 25 COLTYPE$: segobjno 2, tabno 7, file 1 block 25 TYPE$: segobjno 181, tabno 1, file 1 block 1297 COLLECTION$: segobjno 181, tabno 2, file 1 block 1297 ATTRIBUTE$: segobjno 181, tabno 3, file 1 block 1297 LOBFRAG$: segobjno 299, file 1 block 2393 LOBCOMPPART$: segobjno 302, file 1 block 2425 UNDO$: segobjno 15, file 1 block 105 TS$: segobjno 6, tabno 2, file 1 block 57 PROPS$: segobjno 96, file 1 block 721 Running generated file "@dict.ddl" to unload the dictionary tables . unloading table OBJ$ DUL: Warning: Recreating file "OBJ.ctl" 51171 rows unloaded . unloading table TAB$ DUL: Warning: Recreating file "TAB.ctl" 1576 rows unloaded . unloading table COL$ DUL: Warning: Recreating file "COL.ctl" 55264 rows unloaded . unloading table USER$ DUL: Warning: Recreating file "USER.ctl" 59 rows unloaded . unloading table TABPART$ 72 rows unloaded . unloading table INDPART$ 80 rows unloaded . unloading table TABCOMPART$ 0 rows unloaded . unloading table INDCOMPART$ 0 rows unloaded . unloading table TABSUBPART$ 0 rows unloaded . unloading table INDSUBPART$ 0 rows unloaded . unloading table IND$ 2231 rows unloaded . unloading table ICOL$ 3650 rows unloaded . unloading table LOB$ 530 rows unloaded . unloading table COLTYPE$ 1701 rows unloaded . unloading table TYPE$ 1945 rows unloaded . unloading table COLLECTION$ 555 rows unloaded . unloading table ATTRIBUTE$ 7275 rows unloaded . unloading table LOBFRAG$ 1 row unloaded . unloading table LOBCOMPPART$ 0 rows unloaded . unloading table UNDO$ 21 rows unloaded . unloading table TS$ 7 rows unloaded . unloading table PROPS$ 28 rows unloaded Reading USER.dat 59 entries loaded Reading OBJ.dat 51171 entries loaded and sorted 51171 entries Reading TAB.dat 1576 entries loaded Reading COL.dat 55264 entries loaded and sorted 55264 entries Reading TABPART.dat 72 entries loaded and sorted 72 entries Reading TABCOMPART.dat 0 entries loaded and sorted 0 entries Reading TABSUBPART.dat 0 entries loaded and sorted 0 entries Reading INDPART.dat 80 entries loaded and sorted 80 entries Reading INDCOMPART.dat 0 entries loaded and sorted 0 entries Reading INDSUBPART.dat 0 entries loaded and sorted 0 entries Reading IND.dat 2231 entries loaded Reading LOB.dat 530 entries loaded Reading ICOL.dat 3650 entries loaded Reading COLTYPE.dat 1701 entries loaded Reading TYPE.dat 1945 entries loaded Reading ATTRIBUTE.dat 7275 entries loaded Reading COLLECTION.dat 555 entries loaded Reading BOOTSTRAP.dat 57 entries loaded Reading LOBFRAG.dat 1 entries loaded and sorted 1 entries Reading LOBCOMPPART.dat 0 entries loaded and sorted 0 entries Reading UNDO.dat 21 entries loaded Reading TS.dat 7 entries loaded Reading PROPS.dat 28 entries loaded Database character set is ZHS16GBK Database national character set is AL16UTF16 DUL> unload table sys.t_xifenfei; . unloading table T_XIFENFEI 50031 rows unloaded
通过这里可以发现,我们创建测试数据为50031条,dul读取抽取出来数据文件中对应表数据条数也为50031条;证明:在asm disk header完全损坏情况下,amdu,dul无法直接恢复asm里面数据库,但是可以通过工具扫描数据文件,找出来磁盘中的datafile block实现完整恢复数据[只要你的asm中的数据没有覆盖,都可以通过该方法恢复]
如果你在使用这些思路进行恢复遇到突发情况不能自行解决,请联系我们(ORACLE数据库恢复技术支持),将为您提供专业数据库技术支持:
Phone:17813235971 Q Q:107644445 E-Mail:dba@xifenfei.com