标签云
asm恢复 bbed bootstrap$ dul In Memory kcbzib_kcrsds_1 kccpb_sanity_check_2 kfed MySQL恢复 ORA-00312 ORA-00607 ORA-00704 ORA-01110 ORA-01555 ORA-01578 ORA-08103 ORA-600 2131 ORA-600 2662 ORA-600 2663 ORA-600 3020 ORA-600 4000 ORA-600 4137 ORA-600 4193 ORA-600 4194 ORA-600 16703 ORA-600 kcbzib_kcrsds_1 ORA-600 KCLCHKBLK_4 ORA-15042 ORA-15196 ORACLE 12C oracle dul ORACLE PATCH Oracle Recovery Tools oracle加密恢复 oracle勒索 oracle勒索恢复 oracle异常恢复 Oracle 恢复 ORACLE恢复 ORACLE数据库恢复 oracle 比特币 OSD-04016 YOUR FILES ARE ENCRYPTED 勒索恢复 比特币加密文章分类
- Others (2)
- 中间件 (2)
- WebLogic (2)
- 操作系统 (102)
- 数据库 (1,670)
- DB2 (22)
- MySQL (73)
- Oracle (1,532)
- Data Guard (52)
- EXADATA (8)
- GoldenGate (21)
- ORA-xxxxx (159)
- ORACLE 12C (72)
- ORACLE 18C (6)
- ORACLE 19C (14)
- ORACLE 21C (3)
- Oracle 23ai (7)
- Oracle ASM (65)
- Oracle Bug (8)
- Oracle RAC (52)
- Oracle 安全 (6)
- Oracle 开发 (28)
- Oracle 监听 (28)
- Oracle备份恢复 (560)
- Oracle安装升级 (91)
- Oracle性能优化 (62)
- 专题索引 (5)
- 勒索恢复 (78)
- PostgreSQL (18)
- PostgreSQL恢复 (6)
- SQL Server (27)
- SQL Server恢复 (8)
- TimesTen (7)
- 达梦数据库 (2)
- 生活娱乐 (2)
- 至理名言 (11)
- 虚拟化 (2)
- VMware (2)
- 软件开发 (37)
- Asp.Net (9)
- JavaScript (12)
- PHP (2)
- 小工具 (20)
-
最近发表
- ORA-600 krse_arc_complete.4
- Oracle 19c 202410补丁(RUs+OJVM)
- ntfs MFT损坏(ntfs文件系统故障)导致oracle异常恢复
- .mkp扩展名oracle数据文件加密恢复
- 清空redo,导致ORA-27048: skgfifi: file header information is invalid
- A_H_README_TO_RECOVER勒索恢复
- 通过alert日志分析客户自行对一个数据库恢复的来龙去脉和点评
- ORA-12514: TNS: 监听进程不能解析在连接描述符中给出的SERVICE_NAME
- ORA-01092 ORA-00604 ORA-01558故障处理
- ORA-65088: database open should be retried
- Oracle 19c异常恢复—ORA-01209/ORA-65088
- ORA-600 16703故障再现
- 数据库启动报ORA-27102 OSD-00026 O/S-Error: (OS 1455)
- .[metro777@cock.li].Elbie勒索病毒加密数据库恢复
- 应用连接错误,初始化mysql数据库恢复
- RAC默认服务配置优先节点
- Oracle 19c RAC 替换私网操作
- 监听报TNS-12541 TNS-12560 TNS-00511错误
- drop tablespace xxx including contents恢复
- Linux 8 修改网卡名称
标签归档:kfed
ADHU(ASM Disk Header Utility)—asm disk header备份恢复工具
adhu(ASM Disk Header Utility)作为oracle asm中和kfed,amdu齐名的asm三大恢复神器之一,没有被oracle大力推广(属于内部工具),随着kfed功能增强和asm disk header自动备份功能的完善,adhu oracle基本上停止的开发支持,可以用来作为10.2.0.5之前asm版本的磁盘头保护工具
adhu预览
这里可以通过shell封装的utildhu调用adhu程序,实现更加人性化和自动化操作,它含有install,check,repair三个命令参数.
[root@xff1 tmp]# su - grid xff1:/home/grid> cd /tmp/adhu/ xff1:/tmp/adhu> ls -l total 68 -rwxr-xr-x 1 grid oinstall 18902 Nov 1 2008 adhu -rw-r--r-- 1 grid oinstall 1970 Nov 1 2008 README -rwxr-xr-x 1 grid oinstall 6964 Mar 21 16:20 utildhu -rw-r--r-- 1 root root 12634 Mar 21 16:05 utildhu.zip xff1:/tmp/adhu> ./utildhu Usage: utildhu install/check/repair [device name] $utildhu install Will gather a list of member ASM disks and create the backup directory ./HeaderBackup The ./HeaderBackup directory will contain the backup header of every asm disk in this database $utildhu check Will run /tmp/adhu/adhu for every disk discovered by $utildhu install and will email recipients configured in RECIPIENTS if there are errors in the disk header It is hoped that the user will enter valid RECIPIENT email addresses, and will place this utility in $ORA_ASM_HOME $utildhu repair <device name> Will repair the device provided using the backup header blocks that have been copied previously. This does assume that you have backup header blocks in ./HeaderBackup Sample crontab entry to run a check every 5 minutes #Minute (0-59) Hour (0-23) Day of Month (1-31) Month (1-12 or Jan-Dec) Day of Week (0-6 or Sun-Sat) Command 0,5,10,15,20,25,30,35,40,45,50,55 * * * * * utildhu check Please read the README for more information
adhu install
install主要是实现utildhu.config配置文件生成和第一次asm 磁盘头备份
xff1:/tmp/adhu> ./utildhu install xff1:/tmp/adhu> ls -l total 64 -rwxr-xr-x 1 grid oinstall 18902 Nov 1 2008 adhu drwxr-xr-x 2 grid oinstall 4096 Mar 21 16:23 HeaderBackup -rw-r--r-- 1 grid oinstall 1117 Mar 21 16:23 persistent-log.utildhu -rw-r--r-- 1 grid oinstall 1970 Nov 1 2008 README -rwxr-xr-x 1 grid oinstall 6964 Mar 21 16:20 utildhu -rw-r--r-- 1 grid oinstall 243 Mar 21 16:23 utildhu.config -rw-r--r-- 1 grid oinstall 710 Mar 21 16:23 utildhu.out -rw-r--r-- 1 root root 12634 Mar 21 16:05 utildhu.zip xff1:/tmp/adhu> cd HeaderBackup/ xff1:/tmp/adhu/HeaderBackup> ls -ltr total 12 -rw-r--r-- 1 grid oinstall 4096 Mar 21 16:23 oradata1p1 -rw-r--r-- 1 grid oinstall 4096 Mar 21 16:23 oradata2p1 -rw-r--r-- 1 grid oinstall 4096 Mar 21 16:23 ocrvotep1 xff1:/tmp/adhu/HeaderBackup> more ../utildhu.config /dev/mapper/oradata1p1 /dev/mapper/oradata2p1 /dev/mapper/ocrvotep1 xff1:/tmp/adhu> more ../persistent-log.utildhu Mon Mar 21 16:23:29 CST 2016 ASM Disk Header Check Utility Installed on Devices configured are: /dev/mapper/oradata1p1 /dev/mapper/oradata2p1 /dev/mapper/ocrvotep1 ADHU: /dev/mapper/oradata1p1: Status 0x01 Mon Mar 21 16:23:29 2016 ADHU: /dev/mapper/oradata1p1: Diskgroup:DATA Disk:DATA_0000 #0 ADHU: /dev/mapper/oradata1p1: valid disk header found ADHU: /dev/mapper/oradata1p1: backup block updated ###ADHU check run at Mon Mar 21 16:23:29 CST 2016 NO ERRORS FOUND xff1:/tmp/adhu> ls -l HeaderBackup/ total 16 -rw-r--r-- 1 grid oinstall 4096 Mar 21 23:04 ocrvotep1 -rw-r--r-- 1 grid oinstall 4096 Mar 21 23:08 oradata1p1 -rw-r--r-- 1 grid oinstall 4096 Mar 21 23:08 oradata1p1.corrupt -rw-r--r-- 1 grid oinstall 4096 Mar 21 23:04 oradata2p1
adhu check
对于正常的asm disk,主要是为了生成新的磁盘头备份
xff1:/tmp/adhu> ls -l HeaderBackup/ total 16 -rw-r--r-- 1 grid oinstall 4096 Mar 21 23:04 ocrvotep1 -rw-r--r-- 1 grid oinstall 4096 Mar 21 23:08 oradata1p1 -rw-r--r-- 1 grid oinstall 4096 Mar 21 23:08 oradata1p1.corrupt -rw-r--r-- 1 grid oinstall 4096 Mar 21 23:04 oradata2p1 xff1:/tmp/adhu> ./utildhu check xff1:/tmp/adhu> ls -l HeaderBackup/ total 16 -rw-r--r-- 1 grid oinstall 4096 Mar 21 23:11 ocrvotep1 -rw-r--r-- 1 grid oinstall 4096 Mar 21 23:11 oradata1p1 -rw-r--r-- 1 grid oinstall 4096 Mar 21 23:08 oradata1p1.corrupt -rw-r--r-- 1 grid oinstall 4096 Mar 21 23:11 oradata2p1
adhu repair
repair主要是修复磁盘头,当asm 磁盘头损坏之时,可以通过这个命令实现磁盘头修复
xff1:/tmp/adhu> dd if=/dev/zero of=/dev/mapper/oradata1p1 bs=4096 count=1 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000282838 s, 14.5 MB/s xff1:/tmp/adhu> ./utildhu check xff1:/tmp/adhu> tail -f persistent-log.utildhu ###ADHU check run at Mon Mar 21 23:04:04 CST 2016 ERRORS FOUND ADHU: /dev/mapper/oradata1p1: Status 0x08 Mon Mar 21 23:04:04 2016 ADHU: /dev/mapper/oradata1p1: Diskgroup:DATA Disk:DATA_0000 #0 ADHU: /dev/mapper/oradata1p1: corrupt disk header encountered ADHU: /dev/mapper/oradata1p1: valid backup block found ADHU: /dev/mapper/oradata1p1: CORRUPT HEADER NOT REPAIRED xff1:/tmp/adhu> kfed read /dev/mapper/oradata1p1 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: blk=0 kfbh.block.obj: 0 ; 0x008: file=0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 7F6BD9981400 00000000 00000000 00000000 00000000 [................] Repeat 255 times KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0] xff1:/tmp/adhu> sqlplus / as sysasm SQL*Plus: Release 11.2.0.4.0 Production on Mon Mar 21 23:07:25 2016 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options SQL> alter diskgroup data dismount; alter diskgroup data dismount * ERROR at line 1: ORA-15032: not all alterations performed ORA-15027: active use of diskgroup "DATA" precludes its dismount SQL> alter diskgroup data mount; alter diskgroup data mount * ERROR at line 1: ORA-15032: not all alterations performed ORA-15017: diskgroup "DATA" cannot be mounted ORA-15013: diskgroup "DATA" is already mounted SQL> alter diskgroup data dismount force; Diskgroup altered. SQL> alter diskgroup data mount; alter diskgroup data mount * ERROR at line 1: ORA-15032: not all alterations performed ORA-15017: diskgroup "DATA" cannot be mounted ORA-15040: diskgroup is incomplete xff1:/tmp/adhu> ./utildhu repair /dev/mapper/oradata1p1 DEVICE /dev/mapper/oradata1p1 REPAIRED AT Mon Mar 21 23:06:06 CST 2016 ADHU: /dev/mapper/oradata1p1: Status 0x04 Mon Mar 21 23:06:06 2016 ADHU: /dev/mapper/oradata1p1: Diskgroup:DATA Disk:DATA_0000 #0 ADHU: /dev/mapper/oradata1p1: corrupt disk header encountered ADHU: /dev/mapper/oradata1p1: valid backup block found ADHU: /dev/mapper/oradata1p1: disk header repaired xff1:/tmp/adhu> sqlplus / as sysasm SQL*Plus: Release 11.2.0.4.0 Production on Mon Mar 21 23:08:48 2016 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options SQL> alter diskgroup data mount; Diskgroup altered.
adhu直接使用
adhu [-dir dirname ] [-repair] [-quiet] [-readonly] [-syslog mask ] devname
默认情况下adhu将disk header备份为当前目录下的备份文件。 使用-dir选项可以指定存放的目录。
当需要使用adhu去修复一个损坏的asm disk header时使用-repair 选项。
-quiet 选项将过滤所有正常的输出信息,若执行成功则不打印任何输出。
-readonly选项 以只读方式来打开disk device,这样备份block将不被写入,而备份文件将在可能的情况下写入。
-syslog选项控制是否写出结果到系统日志和标准输出。
devname代表为asm disk的设备文件,asm头的备份文件将以该device name为基础,并存放在当前目录或者-dir指定的目录。
分区无法识别导致asm diskgroup无法mount
有客户咨询由于主机重启之后,导致四个磁盘组中的data2磁盘组无法mount(报ORA-15032,ORA-15017,ORA-15063),数据库无法open,让我们帮忙分析解决
Wed Mar 09 18:10:53 2016 NOTE: Assigning number (1,1) to disk (/dev/oracleasm/disks/VOL011) Wed Mar 09 18:10:53 2016 ERROR: no read quorum in group: required 1, found 0 disks NOTE: cache dismounting (clean) group 1/0xBD42B778 (DATA2) NOTE: messaging CKPT to quiesce pins Unix process pid: 45093, image: oracle@BA (TNS V1-V3) NOTE: dbwr not being msg'd to dismount NOTE: lgwr not being msg'd to dismount NOTE: cache dismounted group 1/0xBD42B778 (DATA2) NOTE: cache ending mount (fail) of group DATA2 number=1 incarn=0xbd42b778 NOTE: cache deleting context for group DATA2 1/0xbd42b778 GMON dismounting group 1 at 16 for pid 18, osid 45093 NOTE: Disk DATA2_0001 in mode 0x9 marked for de-assignment ERROR: diskgroup DATA2 was not mounted ORA-15032: not all alterations performed ORA-15017: diskgroup "DATA2" cannot be mounted ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA2" ERROR: ALTER DISKGROUP DATA2 MOUNT /* asm agent *//* {0:0:431} */
这里很明显由于缺少asm disk导致data2无法正常mount,进一步分析发现data2是有两块磁盘组成
Mon Sep 14 13:14:35 2015 SQL> create diskgroup data2 external redundancy disk '/dev/oracleasm/disks/VOL010','/dev/oracleasm/disks/VOL011' NOTE: Assigning number (4,0) to disk (/dev/oracleasm/disks/VOL010) NOTE: Assigning number (4,1) to disk (/dev/oracleasm/disks/VOL011) NOTE: initializing header on grp 4 disk DATA2_0000 NOTE: initializing header on grp 4 disk DATA2_0001 NOTE: initiating PST update: grp = 4 Mon Sep 14 13:14:35 2015 GMON updating group 4 at 29 for pid 26, osid 51535 NOTE: group DATA2: initial PST location: disk 0000 (PST copy 0) NOTE: PST update grp = 4 completed successfully NOTE: cache registered group DATA2 number=4 incarn=0xea085f62 NOTE: cache began mount (first) of group DATA2 number=4 incarn=0xea085f62 NOTE: cache opening disk 0 of grp 4: DATA2_0000 path:/dev/oracleasm/disks/VOL010 NOTE: cache opening disk 1 of grp 4: DATA2_0001 path:/dev/oracleasm/disks/VOL011 NOTE: cache creating group 4/0xEA085F62 (DATA2) NOTE: cache mounting group 4/0xEA085F62 (DATA2) succeeded NOTE: allocating F1X0 on grp 4 disk DATA2_0000 NOTE: diskgroup must now be re-mounted prior to first use NOTE: cache dismounting (clean) group 4/0xEA085F62 (DATA2) NOTE: messaging CKPT to quiesce pins Unix process pid: 51535, image: oracle@BA (TNS V1-V3) NOTE: lgwr not being msg'd to dismount NOTE: cache dismounted group 4/0xEA085F62 (DATA2) GMON dismounting group 4 at 30 for pid 26, osid 51535 GMON dismounting group 4 at 31 for pid 26, osid 51535 NOTE: Disk DATA2_0000 in mode 0x7e marked for de-assignment NOTE: Disk DATA2_0001 in mode 0x7e marked for de-assignment SUCCESS: diskgroup DATA2 was created
结合这部分信息,我们可以确定data2磁盘组是由两个磁盘组构成,分别为VOL010和VOL011,现在由于只发现了VOL011,因此data2磁盘组无法正常mount.观察发现该系统使用的是asmlib,通过oracleasm querydisk命令结合fdisk的盘符,
基本上可以确定VOL010丢失应该在mpathb盘(由于只有该盘和分区未被使用,其他盘和分区已经全部被现在可以查询到的asmlib使用作为asmdisk)之上
Disk /dev/mapper/mpathb: 3846.7 GB, 3846677987328 bytes 255 heads, 63 sectors/track, 467665 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Device Boot Start End Blocks Id System /dev/mapper/mpathbp1 1 267350 2147483647+ ee GPT Disk /dev/mapper/mpathbp1: 3846.7 GB, 3846675890176 bytes 255 heads, 63 sectors/track, 467665 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0xb84bb99a Device Boot Start End Blocks Id System /dev/mapper/mpathbp1p1 1 200513 1610620641 83 Linux /dev/mapper/mpathbp1p2 200514 267349 536860170 83 Linux /dev/mapper/mpathbp1p3 267350 467665 1609038270 83 Linux
这里我们发现奇怪现象:mpathb盘先使用parted分为一个mapthbp1分区,然后又使用fdisk分了三个p1p1,p1p2,p1p3三个子分区.然后我们查看/dev/mapper/中的设备情况
发现没有p1p1,p1p2,p1p3这三个本该属于mapthb上的子分区.现在基本上明确,是由于对mapthb先使用了parted分区,然后再使用fdisk分区,在操作系统重启之后,无法正常识别相关子分区导致该问题.到此解决该问题的思路有三种.
1. 因为磁盘分区表信息是正常的,就是分区表信息没有同步到操作系统之上,想办法同步过去即可,os部分内容,此处忽略
2. 使用数据文件重组的方式直接对data2这两个asm disk进行重组,这里因为三个子分区未发现,直接对mapthbp1分区进行扫描即可,参考:asm disk header 彻底损坏恢复
3. 因为分区对于asm disk来说主要就是设置了磁盘的偏移量和大小,如果找到磁盘的偏移量,然后确定asm disk大小,直接通过dd命令把该部分dd到新的磁盘设备之上,然后直接mount磁盘组即可,这里重点讲解第三种方法恢复处理
使用dd出来mapthp1的磁盘头,然后使用bbed找出来偏移量,主要依据是第一次出现01820101信息的部分
BBED> d File: bp1 (0) Block: 64 Offsets: 0 to 63 Dba:0x00000000 ------------------------------------------------------------------------ 01820101 00000000 00000080 bc60223c 00000000 00000000 00000000 00000000 4f52434c 4449534b 564f4c30 31300000 00000000 00000000 00000000 00000000 <32 bytes per line> BBED> show all FILE# 0 BLOCK# 64 OFFSET 0 DBA 0x00000000 (0 0,64) FILENAME bp1 BIFILE bifile.bbd LISTFILE BLOCKSIZE 512 MODE Browse EDIT Unrecoverable IBASE Dec OBASE Dec WIDTH 80 COUNT 64 LOGFILE log.bbd SPOOL No
这里基本上可以定位到asm disk header对于mapthbp1的偏移量为32256,dd出来asm disk header分析
使用kfed查看磁盘头信息
现在基本上可以确定,asm disk大小为1572871M,磁盘的偏移量为32256,然后使用dd命令把这部分dd到新的磁盘设备上,然后oracleasm scandisks后
data2 mount成功,数据库正常open,此数据库完美恢复
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971 Q Q:107644445 E-Mail:dba@xifenfei.com
分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例
有客户反馈他们重启系统之后,发现asmlib创建的asmdisk丢失了,然后又使用oracleasm deletedisk和createdisk重新创建的asm disk,最后发现asm diskgroup无法mount。让客户通过dd 备份5m数据,然后使用kfed分析
kefd分析结果
E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0 kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0 kfbh.check: 3760689243 ; 0x00c: 0xe027905b kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=1 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0 kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=10 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0 kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=255 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0 kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=256|more kfbh.endian: 1 ; 0x000: 0x01 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 17 ; 0x002: KFBTYP_PST_META kfbh.datfmt: 2 ; 0x003: 0x02 kfbh.block.blk: 256 ; 0x004: T=0 NUMB=0x100 kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0 kfbh.check: 3925268785 ; 0x00c: 0xe9f6d931 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 kfdpHdrPairBv1.first.super.time.hi:32994098 ; 0x000: HOUR=0x12 DAYS=0x19 MNTH=0x c YEAR=0x7dd kfdpHdrPairBv1.first.super.time.lo:1614030848 ; 0x004: USEC=0x0 MSEC=0x10a SECS= 0x3 MINS=0x18 kfdpHdrPairBv1.first.super.last: 2 ; 0x008: 0x00000002 kfdpHdrPairBv1.first.super.next: 2 ; 0x00c: 0x00000002 kfdpHdrPairBv1.first.super.copyCnt: 1 ; 0x010: 0x01 kfdpHdrPairBv1.first.super.version: 1 ; 0x011: 0x01 kfdpHdrPairBv1.first.super.ub2spare: 0 ; 0x012: 0x0000 kfdpHdrPairBv1.first.super.incarn: 1 ; 0x014: 0x00000001 kfdpHdrPairBv1.first.super.copy[0]: 0 ; 0x018: 0x0000 kfdpHdrPairBv1.first.super.copy[1]: 0 ; 0x01a: 0x0000 kfdpHdrPairBv1.first.super.copy[2]: 0 ; 0x01c: 0x0000 ……
因为kfed默认每个block为4k,这里提示256是ok的,255是损坏的,从而推测出来,很可能oracleasm createdisk损坏了1M的数据。由于默认au是1m,而且数据库版本是11.2.0.3,而且第256个blkn开始没有损坏,因此初步判断可以考虑使用备份asm disk header来恢复磁盘头
检查还原磁盘头的asm disk
[grid@xifenfei1 disks]$ kfed read DATA1 kfbh.endian: 1 ; 0x000: 0x01 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD kfbh.datfmt: 1 ; 0x003: 0x01 kfbh.block.blk: 0 ; 0x004: blk=0 kfbh.block.obj: 2147483648 ; 0x008: disk=0 kfbh.check: 2776451033 ; 0x00c: 0xa57d47d9 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 kfdhdb.driver.provstr: ORCLDISKDATA1 ; 0x000: length=13 kfdhdb.driver.reserved[0]: 1096040772 ; 0x008: 0x41544144 kfdhdb.driver.reserved[1]: 49 ; 0x00c: 0x00000031 kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000 kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000 kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000 kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000 kfdhdb.compat: 186646528 ; 0x020: 0x0b200000 kfdhdb.dsknum: 0 ; 0x024: 0x0000 kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER kfdhdb.dskname: DATA_0000 ; 0x028: length=9 kfdhdb.grpname: DATA ; 0x048: length=4 kfdhdb.fgname: DATA_0000 ; 0x068: length=9 kfdhdb.capname: ; 0x088: length=0 kfdhdb.crestmp.hi: 32994099 ; 0x0a8: HOUR=0x13 DAYS=0x19 MNTH=0xc YEAR=0x7dd kfdhdb.crestmp.lo: 2797442048 ; 0x0ac: USEC=0x0 MSEC=0x365 SECS=0x2b MINS=0x29 kfdhdb.mntstmp.hi: 33022061 ; 0x0b0: HOUR=0xd DAYS=0x3 MNTH=0x8 YEAR=0x7df kfdhdb.mntstmp.lo: 816879616 ; 0x0b4: USEC=0x0 MSEC=0x26 SECS=0xb MINS=0xc kfdhdb.secsize: 512 ; 0x0b8: 0x0200 kfdhdb.blksize: 4096 ; 0x0ba: 0x1000 …………
证明磁盘头确实被比较完美的修复了,现在的任务是尝试mount磁盘组
mount磁盘组
[grid@xifenfei1 ~]$ sqlplus / as sysasm SQL*Plus: Release 11.2.0.3.0 Production on Thu Aug 6 20:54:53 2015 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options SQL> alter diskgroup data mount; Diskgroup altered. SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options
asm diskgroup已经正常mount,使用asmcmd命令检查文件是否正常
分析磁盘组数据是否正常
[grid@xifenfei1 ~]$ asmcmd ASMCMD> lsdg State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name MOUNTED EXTERN N 512 4096 1048576 1622060 636493 0 636493 0 N DATA/ ASMCMD> cd data ASMCMD> ls ORCL/ ASMCMD> cd orcl ASMCMD> ls CONTROLFILE/ DATAFILE/ ONLINELOG/ PARAMETERFILE/ TEMPFILE/ spfileorcl.ora ASMCMD> cd datafile ASMCMD> ls XIFENFEI20130801.314.835191517 XIFENFEI20140101.321.835191571 XIFENFEI20140201.322.835191573 XIFENFEI20140301.323.835191573 ………… SYSAUX.270.835182535 SYSAUX.838.874669369 SYSTEM.271.835182533 SYSTEM.823.873555791 SYSTEM.945.883146947 …………
这里看到磁盘组里面的数据文件都正常,使用同样的方法,继续mount其他磁盘组。
尝试启动数据库
SQL> startup ORACLE 例程已经启动。 Total System Global Area 5010685952 bytes Fixed Size 2236968 bytes Variable Size 2013269464 bytes Database Buffers 2986344448 bytes Redo Buffers 8835072 bytes 数据库装载完毕。 ORA-16038: 日志 14 sequence# 21145 无法归档 ORA-19504: 无法创建文件"" ORA-00312: 联机日志 14 线程 2: '+DATA/orcl/onlinelog/group_14.284.835184569' ORA-00312: 联机日志 14 线程 2: '+ARCH/orcl/onlinelog/group_14.287.835184569'
查看数据库alert日志
ARC1: Archival started ARC2: Archival started ARC2: Becoming the 'no FAL' ARCH ARC2: Becoming the 'no SRL' ARCH ARC1: Becoming the heartbeat ARCH ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE Thu Aug 06 21:04:06 2015 Thread 2 advanced to log sequence 21146 (thread recovery) Picked broadcast on commit scheme to generate SCNs Thread 2 advanced to log sequence 21147 (before internal thread enable) Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc: ORA-19816: 警告: 文件可能存在于数据库未知的 db_recovery_file_dest 中。 ORA-17502: ksfdcre: 4 未能创建文件 +ARCH ORA-15196: invalid ASM block header [kfc.c:19572] [check_kfbh] [1] [47962] [1344818371 != 630731762] ORA-15130: diskgroup "ARCH" is being dismounted ORA-15066: offlining disk "ARCH_0000" in group "ARCH" may result in a data loss ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] ARCH: Error 19504 Creating archive log file to '+ARCH' NOTE: Deferred communication with ASM instance Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc: ORA-15130: diskgroup "ARCH" is being dismounted NOTE: deferred map free for map id 754 Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc: ORA-16038: 日志 14 sequence# 21145 无法归档 ORA-19504: 无法创建文件"" ORA-00312: 联机日志 14 线程 2: '+DATA/orcl/onlinelog/group_14.284.835184569' ORA-00312: 联机日志 14 线程 2: '+ARCH/orcl/onlinelog/group_14.287.835184569' ORA-16038 signalled during: ALTER DATABASE OPEN... Thu Aug 06 21:04:10 2015 SUCCESS: diskgroup ARCH was dismounted SUCCESS: diskgroup ARCH was dismounted Thu Aug 06 21:04:10 2015 Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ckpt_27353.trc: ORA-00206: error in writing (block 3, # blocks 1) of control file ORA-00202: control file: '+ARCH/orcl/controlfile/current.256.835182531' ORA-15078: ASM diskgroup was forcibly dismounted ORA-15078: ASM diskgroup was forcibly dismounted Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ckpt_27353.trc: ORA-00221: error on write to control file ORA-00206: error in writing (block 3, # blocks 1) of control file ORA-00202: control file: '+ARCH/orcl/controlfile/current.256.835182531' ORA-15078: ASM diskgroup was forcibly dismounted ORA-15078: ASM diskgroup was forcibly dismounted Thu Aug 06 21:04:10 2015 System state dump requested by (instance=1, osid=27353 (CKPT)), summary=[abnormal instance termination]. System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_27318.trc CKPT (ospid: 27353): terminating the instance due to error 221 Instance terminated by CKPT, pid = 27353
查看asm alert日志
Thu Aug 06 21:04:07 2015 WARNING: cache read a corrupt block: group=2(ARCH) dsk=0 blk=1 disk=0 (ARCH_0000) incarn=3942486752 au=0 blk=1 count=1 Errors in file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc: ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] NOTE: a corrupted block from group ARCH was dumped to /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc WARNING: cache read (retry) a corrupt block: group=2(ARCH) dsk=0 blk=1 disk=0 (ARCH_0000) incarn=3942486752 au=0 blk=1 count=1 Errors in file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc: ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] ERROR: cache failed to read group=2(ARCH) dsk=0 blk=1 from disk(s): 0(ARCH_0000) ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1] NOTE: cache initiating offline of disk 0 group ARCH NOTE: process _user27462_+asm1 (27462) initiating offline of disk 0.3942486752 (ARCH_0000) with mask 0x7e in group 2 WARNING: Disk 0 (ARCH_0000) in group 2 in mode 0x7f is now being taken offline on ASM inst 1 NOTE: initiating PST update: grp = 2, dsk = 0/0xeafd92e0, mask = 0x6a, op = clear Thu Aug 06 21:04:07 2015 GMON updating disk modes for group 2 at 17 for pid 35, osid 27462 ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy. ERROR: too many offline disks in PST (grp 2) Thu Aug 06 21:04:07 2015 NOTE: cache dismounting (not clean) group 2/0x723D6245 (ARCH) NOTE: messaging CKPT to quiesce pins Unix process pid: 27089, image: oracle@xifenfei1 (B000) WARNING: Offline of disk 0 (ARCH_0000) in group 2 and mode 0x7f failed on ASM inst 1 Thu Aug 06 21:04:07 2015 NOTE: halting all I/Os to diskgroup 2 (ARCH) System State dumped to trace file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc NOTE: AMDU dump of disk group ARCH created at /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace Thu Aug 06 21:04:09 2015 NOTE: LGWR doing non-clean dismount of group 2 (ARCH) NOTE: LGWR sync ABA=126.806 last written ABA 126.806
这里可以看出来,报错的block为arch磁盘组的第一个磁盘的第一个au的第二个block,而我们在开始的时候,已经分析了asm disk的第一个au完全损坏,并且我们使用了备份磁盘头进行来还原,勉强可以让磁盘组mount起来,但是由于数据库在启动的时候,需要对redo进行归档,而归档的过程需要写到arch磁盘组里面,这个时候需要访问到au=0 blk=1,而这个块本身是坏的,因此这个时候该块盘的disk就被offline掉了,而这个磁盘组是外部冗余的,因此磁盘组dismount了,所以数据库无法启动.
分析第一个au里面到底有哪些东西
SQL> select DISK_NUMBER,path from v$asm_disk; DISK_NUMBER PATH ----------- -------------------------------------------------- 0 /dev/raw/raw1 2 /dev/raw/raw3 1 /dev/raw/raw2 [oracle@xifenfei raw]$ kfed read raw1 blkn=1|grep kfbh.type kfbh.type: 2 ; 0x002: KFBTYP_FREESPC [oracle@xifenfei raw]$ kfed read raw1 blkn=2|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL [oracle@xifenfei raw]$ kfed read raw1 blkn=3|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL [oracle@xifenfei raw]$ kfed read raw1 blkn=255|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL [oracle@xifenfei raw]$ kfed read raw2 blkn=1|grep kfbh.type kfbh.type: 2 ; 0x002: KFBTYP_FREESPC [oracle@xifenfei raw]$ kfed read raw2 blkn=2|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL [oracle@xifenfei raw]$ kfed read raw2 blkn=255|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL [oracle@xifenfei raw]$ kfed read raw3 blkn=1|grep kfbh.type kfbh.type: 2 ; 0x002: KFBTYP_FREESPC [oracle@xifenfei raw]$ kfed read raw3 blkn=2|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL [oracle@xifenfei raw]$ kfed read raw3 blkn=255|grep kfbh.type kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL
通过一个测试机器的一个磁盘组进行分析,我们可以基本上确定asm 第一个au除了asm disk header的KFBTYP_DISKHEAD之外,其他主要是KFBTYP_FREESPC(Free Space Table)和KFBTYP_ALLOCTBL(allocator table),主要就是记录asm中au的分配情况,也就是进一步说明,如果我不对asm里面的数据使用更多的au分配或者回收au,在缺少第一个au的1-255个block的信息情况下,asm的磁盘组也不会dismount。根据这个思路,让数据库归档到本地,然后继续测试
继续open数据库
SQL> startup ORACLE 例程已经启动。 Total System Global Area 5010685952 bytes Fixed Size 2236968 bytes Variable Size 2013269464 bytes Database Buffers 2986344448 bytes Redo Buffers 8835072 bytes 数据库装载完毕。 SQL> alter database open; 数据库已更改。 LGWR: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES Fri Aug 07 02:43:13 2015 ARC1 started with pid=34, OS id=22778 Fri Aug 07 02:43:13 2015 ARC2 started with pid=35, OS id=22780 Fri Aug 07 02:43:13 2015 ARC3 started with pid=36, OS id=22782 ARC1: Archival started ARC2: Archival started ARC2: Becoming the 'no FAL' ARCH ARC2: Becoming the 'no SRL' ARCH ARC1: Becoming the heartbeat ARCH ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE Fri Aug 07 02:43:24 2015 Thread 1 opened at log sequence 18604 Current log# 10 seq# 18604 mem# 0: /tmp/xifenfei/otherfile/group_10.273.835182533 Current log# 10 seq# 18604 mem# 1: /tmp/xifenfei/otherfile/group_10.263.835182533 Successful open of redo thread 1 Fri Aug 07 02:43:24 2015 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Fri Aug 07 02:43:25 2015 SMON: enabling cache recovery Instance recovery: looking for dead threads Instance recovery: lock domain invalid but no dead threads Fri Aug 07 02:43:26 2015 minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:21328 status:0x7 minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000 Fri Aug 07 02:43:26 2015 Redo thread 2 internally disabled at seq 21147 (CKPT) [21341] Successfully onlined Undo Tablespace 2. Undo initialization finished serial:0 start:96999124 end:97000624 diff:1500 (15 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is ZHS16GBK No Resource Manager plan active Starting background process GTX0 Fri Aug 07 02:43:31 2015 GTX0 started with pid=37, OS id=22803 Starting background process RCBG Fri Aug 07 02:43:31 2015 RCBG started with pid=38, OS id=22805 replication_dependency_tracking turned off (no async multimaster replication found) Fri Aug 07 02:43:34 2015 Archived Log entry 73876 added for thread 2 sequence 21145 ID 0x513c613f dest 1: <---果然有归档操作发生 Starting background process QMNC Fri Aug 07 02:43:34 2015 QMNC started with pid=39, OS id=22812 Fri Aug 07 02:43:35 2015 Archived Log entry 73877 added for thread 2 sequence 21146 ID 0x513c613f dest 1: Fri Aug 07 02:43:35 2015 ARC0: Archiving disabled thread 2 sequence 21147 Archived Log entry 73878 added for thread 2 sequence 21147 ID 0x513c613f dest 1: Fri Aug 07 02:43:37 2015 Completed: alter database open
现在到了这一步,基本上可以确定,数据库是零丢失恢复。由于asm 第一个au丢失数据严重,想要彻底修复比较难,考虑把数据库启动到mount/read only状态然后使用rman备份数据,然后进行重建asm 磁盘组,再迁移回来。至此完美恢复asmlib的磁盘被oracleasm重写的故障恢复,实现数据0丢失.当然在整个恢复过程没有于此的简单,涉及到在votedisk损坏的情况下,如何mount磁盘组,vote diskgroup的损坏修复问题,磁盘组在10g/11.1和11.2还原磁盘头备份的问题等问题.
虽然本次的恢复案例中,由于asmlib的asm disk不可见就轻易使用oracleasm createdisk命令对磁盘进行了重建,犯了一个很大错误,但是在重建之后,发现磁盘组依旧异常,未继续操作(比如重建磁盘组等),为最后的数据完全恢复创造了必要条件,使得客户的没有任何数据损失。如果再对除磁盘组继续复写操作,可能会导致数据永久性丢失。这个教训告诉我们:遇到自己不能把握的事情,及时终止,不要让错误越走越远