Tag archive: ORA-15042
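Recovering an Oracle Exadata disk group that would not mount after a disk failure
A friend asked us for help: a customer's Oracle Exadata machine had an ASM disk group that would not mount, and they needed recovery support.
After analysis and discussion with the customer, the situation was roughly this: the disk group had been filled beyond its safe capacity (some data could not be fully mirrored by ASM), and then another disk failed, after which the ASM disk group would not mount. We rebuilt the Exadata celldisk data, mounted the ASM disk group, and opened all five databases. Because some data on the underlying disk was damaged, access to part of the data still raised errors, which had to be handled at the Oracle level.
The analysis and recovery steps were as follows.
The disk group holding the database files would not mount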
Wed Dec 12 21:29:04 2018
SQL> alter diskgroup DATA_XFF mount force
NOTE: cache registered group DATA_XFF number=1 incarn=0x5fe882cb
NOTE: cache began mount (first) of group DATA_XFF number=1 incarn=0x5fe882cb
NOTE: Assigning number (1,36) to disk (o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03)
NOTE: Assigning number (1,34) to disk (o/192.168.10.5/DATA_XFF_CD_10_XFFCEL03)
NOTE: Assigning number (1,37) to disk (o/192.168.10.5/DATA_XFF_CD_04_XFFCEL03)
NOTE: Assigning number (1,38) to disk (o/192.168.10.5/DATA_XFF_CD_00_XFFCEL03)
NOTE: Assigning number (1,39) to disk (o/192.168.10.5/DATA_XFF_CD_03_XFFCEL03)
NOTE: Assigning number (1,40) to disk (o/192.168.10.5/DATA_XFF_CD_05_XFFCEL03)
NOTE: Assigning number (1,41) to disk (o/192.168.10.5/DATA_XFF_CD_08_XFFCEL03)
NOTE: Assigning number (1,42) to disk (o/192.168.10.5/DATA_XFF_CD_01_XFFCEL03)
NOTE: Assigning number (1,43) to disk (o/192.168.10.5/DATA_XFF_CD_09_XFFCEL03)
NOTE: Assigning number (1,44) to disk (o/192.168.10.5/DATA_XFF_CD_06_XFFCEL03)
NOTE: Assigning number (1,45) to disk (o/192.168.10.5/DATA_XFF_CD_07_XFFCEL03)
NOTE: Assigning number (1,46) to disk (o/192.168.10.5/DATA_XFF_CD_02_XFFCEL03)
NOTE: Assigning number (1,22) to disk (o/192.168.10.4/DATA_XFF_CD_10_XFFCEL02)
NOTE: Assigning number (1,18) to disk (o/192.168.10.4/DATA_XFF_CD_06_XFFCEL02)
NOTE: Assigning number (1,19) to disk (o/192.168.10.4/DATA_XFF_CD_07_XFFCEL02)
NOTE: Assigning number (1,15) to disk (o/192.168.10.4/DATA_XFF_CD_03_XFFCEL02)
NOTE: Assigning number (1,20) to disk (o/192.168.10.4/DATA_XFF_CD_08_XFFCEL02)
NOTE: Assigning number (1,17) to disk (o/192.168.10.4/DATA_XFF_CD_05_XFFCEL02)
NOTE: Assigning number (1,16) to disk (o/192.168.10.4/DATA_XFF_CD_04_XFFCEL02)
NOTE: Assigning number (1,23) to disk (o/192.168.10.4/DATA_XFF_CD_11_XFFCEL02)
NOTE: Assigning number (1,12) to disk (o/192.168.10.4/DATA_XFF_CD_00_XFFCEL02)
NOTE: Assigning number (1,21) to disk (o/192.168.10.4/DATA_XFF_CD_09_XFFCEL02)
NOTE: Assigning number (1,13) to disk (o/192.168.10.4/DATA_XFF_CD_01_XFFCEL02)
NOTE: Assigning number (1,14) to disk (o/192.168.10.4/DATA_XFF_CD_02_XFFCEL02)
NOTE: Assigning number (1,1) to disk (o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01)
NOTE: Assigning number (1,2) to disk (o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01)
NOTE: Assigning number (1,3) to disk (o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01)
NOTE: Assigning number (1,4) to disk (o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01)
NOTE: Assigning number (1,5) to disk (o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01)
NOTE: Assigning number (1,6) to disk (o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01)
NOTE: Assigning number (1,7) to disk (o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01)
NOTE: Assigning number (1,8) to disk (o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01)
NOTE: Assigning number (1,9) to disk (o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01)
NOTE: Assigning number (1,10) to disk (o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01)
NOTE: Assigning number (1,11) to disk (o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01)
Wed Dec 12 21:29:10 2018
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 101 for pid 27, osid 62541
NOTE: Assigning number (1,0) to disk ()
GMON querying group 1 at 102 for pid 27, osid 62541
NOTE: process _user62541_+asm2 (62541) initiating offline of disk 0.3915937355 () with mask 0x7e[0x7f] in group 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xe968764b, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 103 for pid 27, osid 62541
ERROR: Disk 0 cannot be offlined, since all the disks [0, 25] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)
WARNING: Offline of disk 0 () in group 1 and mode 0x7f failed on ASM inst 2
NOTE: cache dismounting (not clean) group 1/0x5FE882CB (DATA_XFF)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x5FE882CB (DATA_XFF)
NOTE: cache ending mount (fail) of group DATA_XFF number=1 incarn=0x5fe882cb
NOTE: cache deleting context for group DATA_XFF 1/0x5fe882cb
GMON dismounting group 1 at 104 for pid 27, osid 62541
ERROR: diskgroup DATA_XFF was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "0" in group "DATA_XFF" may result in a data loss
ORA-15042: ASM disk "0" is missing from group number "1"
ERROR: alter diskgroup DATA_XFF mount force
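In a long alert log it helps to list the disks that ASM assigned a number to but could not find on any path (printed as `to disk ()`, like disk 0 above). A minimal Python sketch, assuming the log lines look exactly like the ones in this post; `disk_assignments` and `undiscovered_disks` are hypothetical helpers, not an Oracle tool:

```python
import re

# Matches alert-log lines such as:
#   NOTE: Assigning number (1,36) to disk (o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03)
#   NOTE: Assigning number (1,0) to disk ()
ASSIGN_RE = re.compile(r"Assigning number \((\d+),(\d+)\) to disk \(([^)]*)\)")


def disk_assignments(alert_text: str) -> dict:
    """Map (group_number, disk_number) -> path; '' means no path was discovered."""
    result = {}
    for m in ASSIGN_RE.finditer(alert_text):
        result[(int(m.group(1)), int(m.group(2)))] = m.group(3)
    return result


def undiscovered_disks(alert_text: str) -> list:
    """Sorted (group, disk) pairs that were numbered but had an empty path."""
    return sorted(k for k, path in disk_assignments(alert_text).items() if not path)


if __name__ == "__main__":
    sample = (
        "NOTE: Assigning number (1,11) to disk (o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01)\n"
        "NOTE: Assigning number (1,0) to disk ()\n"
    )
    print(undiscovered_disks(sample))  # [(1, 0)]
```

Run over the full mount log, this points straight at disk 0, the disk the later `ORA-15042` message names.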
Checking the damage at the storage layer
CellCLI> list physicaldisk
20:0 KN3VZL normal
20:1 KNAWLL normal
20:2 KN4E4L warning - predictive failure, poor performance
20:3 KNAN5L normal
20:4 KMJKYL normal
20:5 KN5DGL normal
20:6 KMDLWL normal
20:7 KMDKPL normal
20:8 KMDA7L normal
20:9 KN1YJL normal
20:10 KMH1YL normal
20:11 KMVHAL normal
CellCLI> list griddisk
DATA_XFF_CD_00_XFFCEL01 active
DATA_XFF_CD_01_XFFCEL01 active
DATA_XFF_CD_02_XFFCEL01 proactive failure
DATA_XFF_CD_03_XFFCEL01 active
DATA_XFF_CD_04_XFFCEL01 active
DATA_XFF_CD_05_XFFCEL01 active
DATA_XFF_CD_06_XFFCEL01 active
DATA_XFF_CD_07_XFFCEL01 active
DATA_XFF_CD_08_XFFCEL01 active
DATA_XFF_CD_09_XFFCEL01 active
DATA_XFF_CD_10_XFFCEL01 active
DATA_XFF_CD_11_XFFCEL01 active
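From the DB node, the ASM disk on the failing drive cannot be discovered (DATA_XFF_CD_02_XFFCEL01 is missing from the kfod list)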
[grid@ycdwdb01 grid]$ kfod disk=all
--------------------------------------------------------------------------------
 Disk Size Path User Group
============================================================
1: 433152 Mb o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01 <unknown> <unknown>
2: 433152 Mb o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01 <unknown> <unknown>
3: 433152 Mb o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01 <unknown> <unknown>
4: 433152 Mb o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01 <unknown> <unknown>
5: 433152 Mb o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01 <unknown> <unknown>
6: 433152 Mb o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01 <unknown> <unknown>
7: 433152 Mb o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01 <unknown> <unknown>
8: 433152 Mb o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01 <unknown> <unknown>
9: 433152 Mb o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01 <unknown> <unknown>
10: 433152 Mb o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01 <unknown> <unknown>
11: 433152 Mb o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01 <unknown> <unknown>
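Comparing the cell's view (CellCLI `list griddisk`) with what the DB node can discover (`kfod disk=all`) pinpoints the griddisk that has dropped out of ASM discovery. A Python sketch, assuming both outputs are saved as text in the formats shown above; treating the last path component of a kfod path as the griddisk name is an assumption based on these listings, and the function names are hypothetical:

```python
import re

# kfod rows look like: "1: 433152 Mb o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01 <unknown> <unknown>"
KFOD_LINE = re.compile(r"^\s*\d+:\s+\d+\s+Mb\s+(\S+)")


def parse_griddisks(cellcli_text: str) -> dict:
    """'list griddisk' output only -> {griddisk_name: status}."""
    disks = {}
    for line in cellcli_text.splitlines():
        line = line.strip()
        if not line or line.startswith("CellCLI>"):
            continue
        parts = line.split(None, 1)  # name, then the rest of the line as status
        disks[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return disks


def parse_kfod_names(kfod_text: str) -> set:
    """'kfod disk=all' output -> set of griddisk names (last path component)."""
    names = set()
    for line in kfod_text.splitlines():
        m = KFOD_LINE.match(line)
        if m:
            names.add(m.group(1).rsplit("/", 1)[-1])
    return names


def griddisks_not_visible(cellcli_text: str, kfod_text: str) -> dict:
    """Griddisks the cell lists but kfod on the DB node does not discover."""
    visible = parse_kfod_names(kfod_text)
    return {n: s for n, s in parse_griddisks(cellcli_text).items() if n not in visible}
```

Fed the two listings above, it returns only `DATA_XFF_CD_02_XFFCEL01` with status `proactive failure`, which matches the physical disk 20:2 flagged with a predictive failure.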
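According to the customer, the disk group was almost completely full, and asmcmd lsdg showed a negative Usable_file_MB. That means this NORMAL redundancy disk group no longer held two complete copies of all its data. In that state, any further disk failure can leave some data with only a single copy, and if that copy sits on the failed disk, ASM cannot mount the disk group. The alert log above shows exactly this: disk 0 could not be offlined because all the disks holding its mirrored data ([0, 25]) would then be offline.
After repairing the celldisk at the storage layer, the griddisk is active again and kfod on the DB node discovers all 12 disks: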
CellCLI> list griddisk
DATA_XFF_CD_00_XFFCEL01 active
DATA_XFF_CD_01_XFFCEL01 active
DATA_XFF_CD_02_XFFCEL01 active
DATA_XFF_CD_03_XFFCEL01 active
DATA_XFF_CD_04_XFFCEL01 active
DATA_XFF_CD_05_XFFCEL01 active
DATA_XFF_CD_06_XFFCEL01 active
DATA_XFF_CD_07_XFFCEL01 active
DATA_XFF_CD_08_XFFCEL01 active
DATA_XFF_CD_09_XFFCEL01 active
DATA_XFF_CD_10_XFFCEL01 active
DATA_XFF_CD_11_XFFCEL01 active
[grid@ycdwdb01 grid]$ kfod disk=all
--------------------------------------------------------------------------------
 Disk Size Path User Group
============================================================
1: 433152 Mb o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01 <unknown> <unknown>
2: 433152 Mb o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01 <unknown> <unknown>
3: 433152 Mb o/192.168.10.3/DATA_XFF_CD_02_XFFCEL01 <unknown> <unknown>
4: 433152 Mb o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01 <unknown> <unknown>
5: 433152 Mb o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01 <unknown> <unknown>
6: 433152 Mb o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01 <unknown> <unknown>
7: 433152 Mb o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01 <unknown> <unknown>
8: 433152 Mb o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01 <unknown> <unknown>
9: 433152 Mb o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01 <unknown> <unknown>
10: 433152 Mb o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01 <unknown> <unknown>
11: 433152 Mb o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01 <unknown> <unknown>
12: 433152 Mb o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01 <unknown> <unknown>
The DATA_XFF disk group then mounted directly, without FORCE:
Fri Dec 14 14:04:59 2018
SQL> alter diskgroup DATA_XFF mount
NOTE: cache registered group DATA_XFF number=1 incarn=0x78a886e7
NOTE: cache began mount (not first) of group DATA_XFF number=1 incarn=0x78a886e7
NOTE: Assigning number (1,36) to disk (o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03)
NOTE: Assigning number (1,34) to disk (o/192.168.10.5/DATA_XFF_CD_10_XFFCEL03)
NOTE: Assigning number (1,37) to disk (o/192.168.10.5/DATA_XFF_CD_04_XFFCEL03)
NOTE: Assigning number (1,38) to disk (o/192.168.10.5/DATA_XFF_CD_00_XFFCEL03)
NOTE: Assigning number (1,39) to disk (o/192.168.10.5/DATA_XFF_CD_03_XFFCEL03)
NOTE: Assigning number (1,40) to disk (o/192.168.10.5/DATA_XFF_CD_05_XFFCEL03)
NOTE: Assigning number (1,41) to disk (o/192.168.10.5/DATA_XFF_CD_08_XFFCEL03)
NOTE: Assigning number (1,42) to disk (o/192.168.10.5/DATA_XFF_CD_01_XFFCEL03)
NOTE: Assigning number (1,43) to disk (o/192.168.10.5/DATA_XFF_CD_09_XFFCEL03)
NOTE: Assigning number (1,44) to disk (o/192.168.10.5/DATA_XFF_CD_06_XFFCEL03)
NOTE: Assigning number (1,45) to disk (o/192.168.10.5/DATA_XFF_CD_07_XFFCEL03)
NOTE: Assigning number (1,46) to disk (o/192.168.10.5/DATA_XFF_CD_02_XFFCEL03)
NOTE: Assigning number (1,22) to disk (o/192.168.10.4/DATA_XFF_CD_10_XFFCEL02)
NOTE: Assigning number (1,18) to disk (o/192.168.10.4/DATA_XFF_CD_06_XFFCEL02)
NOTE: Assigning number (1,19) to disk (o/192.168.10.4/DATA_XFF_CD_07_XFFCEL02)
NOTE: Assigning number (1,15) to disk (o/192.168.10.4/DATA_XFF_CD_03_XFFCEL02)
NOTE: Assigning number (1,20) to disk (o/192.168.10.4/DATA_XFF_CD_08_XFFCEL02)
NOTE: Assigning number (1,17) to disk (o/192.168.10.4/DATA_XFF_CD_05_XFFCEL02)
NOTE: Assigning number (1,16) to disk (o/192.168.10.4/DATA_XFF_CD_04_XFFCEL02)
NOTE: Assigning number (1,23) to disk (o/192.168.10.4/DATA_XFF_CD_11_XFFCEL02)
NOTE: Assigning number (1,12) to disk (o/192.168.10.4/DATA_XFF_CD_00_XFFCEL02)
NOTE: Assigning number (1,21) to disk (o/192.168.10.4/DATA_XFF_CD_09_XFFCEL02)
NOTE: Assigning number (1,13) to disk (o/192.168.10.4/DATA_XFF_CD_01_XFFCEL02)
NOTE: Assigning number (1,14) to disk (o/192.168.10.4/DATA_XFF_CD_02_XFFCEL02)
NOTE: Assigning number (1,1) to disk (o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01)
NOTE: Assigning number (1,2) to disk (o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01)
NOTE: Assigning number (1,3) to disk (o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01)
NOTE: Assigning number (1,4) to disk (o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01)
NOTE: Assigning number (1,5) to disk (o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01)
NOTE: Assigning number (1,6) to disk (o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01)
NOTE: Assigning number (1,7) to disk (o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01)
NOTE: Assigning number (1,8) to disk (o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01)
NOTE: Assigning number (1,9) to disk (o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01)
NOTE: Assigning number (1,10) to disk (o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01)
NOTE: Assigning number (1,11) to disk (o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01)
NOTE: Assigning number (1,0) to disk (o/192.168.10.3/DATA_XFF_CD_02_XFFCEL01)
Fri Dec 14 14:04:59 2018
GMON querying group 1 at 78 for pid 28, osid 76016
NOTE: Assigning number (1,24) to disk ()
NOTE: Assigning number (1,25) to disk ()
NOTE: Assigning number (1,26) to disk ()
NOTE: Assigning number (1,27) to disk ()
NOTE: Assigning number (1,28) to disk ()
NOTE: Assigning number (1,29) to disk ()
NOTE: Assigning number (1,30) to disk ()
NOTE: Assigning number (1,31) to disk ()
NOTE: Assigning number (1,32) to disk ()
NOTE: Assigning number (1,33) to disk ()
NOTE: Assigning number (1,35) to disk ()
GMON querying group 1 at 79 for pid 28, osid 76016
NOTE: cache opening disk 0 of grp 1: DATA_XFF_CD_02_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_02_XFFCEL01
NOTE: cache opening disk 1 of grp 1: DATA_XFF_CD_05_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01
NOTE: cache opening disk 2 of grp 1: DATA_XFF_CD_03_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01
NOTE: F1X0 found on disk 2 au 5 fcn 0.15948262
NOTE: cache opening disk 3 of grp 1: DATA_XFF_CD_06_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01
NOTE: cache opening disk 4 of grp 1: DATA_XFF_CD_09_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01
NOTE: cache opening disk 5 of grp 1: DATA_XFF_CD_04_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01
NOTE: cache opening disk 6 of grp 1: DATA_XFF_CD_07_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01
NOTE: cache opening disk 7 of grp 1: DATA_XFF_CD_11_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01
NOTE: cache opening disk 8 of grp 1: DATA_XFF_CD_01_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01
NOTE: cache opening disk 9 of grp 1: DATA_XFF_CD_00_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01
NOTE: cache opening disk 10 of grp 1: DATA_XFF_CD_10_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01
NOTE: cache opening disk 11 of grp 1: DATA_XFF_CD_08_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01
NOTE: cache opening disk 12 of grp 1: DATA_XFF_CD_00_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_00_XFFCEL02
NOTE: cache opening disk 13 of grp 1: DATA_XFF_CD_01_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_01_XFFCEL02
NOTE: cache opening disk 14 of grp 1: DATA_XFF_CD_02_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_02_XFFCEL02
NOTE: cache opening disk 15 of grp 1: DATA_XFF_CD_03_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_03_XFFCEL02
NOTE: cache opening disk 16 of grp 1: DATA_XFF_CD_04_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_04_XFFCEL02
NOTE: cache opening disk 17 of grp 1: DATA_XFF_CD_05_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_05_XFFCEL02
NOTE: cache opening disk 18 of grp 1: DATA_XFF_CD_06_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_06_XFFCEL02
NOTE: cache opening disk 19 of grp 1: DATA_XFF_CD_07_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_07_XFFCEL02
NOTE: cache opening disk 20 of grp 1: DATA_XFF_CD_08_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_08_XFFCEL02
NOTE: cache opening disk 21 of grp 1: DATA_XFF_CD_09_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_09_XFFCEL02
NOTE: F1X0 found on disk 21 au 2 fcn 0.15948262
NOTE: cache opening disk 22 of grp 1: DATA_XFF_CD_10_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_10_XFFCEL02
NOTE: cache opening disk 23 of grp 1: DATA_XFF_CD_11_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_11_XFFCEL02
NOTE: cache opening disk 36 of grp 1: DATA_XFF_CD_11_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03
NOTE: cache opening disk 37 of grp 1: DATA_XFF_CD_04_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_04_XFFCEL03
NOTE: cache opening disk 38 of grp 1: DATA_XFF_CD_00_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_00_XFFCEL03
NOTE: cache opening disk 39 of grp 1: DATA_XFF_CD_03_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_03_XFFCEL03
NOTE: cache opening disk 40 of grp 1: DATA_XFF_CD_05_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_05_XFFCEL03
NOTE: cache opening disk 41 of grp 1: DATA_XFF_CD_08_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_08_XFFCEL03
NOTE: cache opening disk 42 of grp 1: DATA_XFF_CD_01_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_01_XFFCEL03
NOTE: cache opening disk 43 of grp 1: DATA_XFF_CD_09_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_09_XFFCEL03
NOTE: cache opening disk 44 of grp 1: DATA_XFF_CD_06_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_06_XFFCEL03
NOTE: F1X0 found on disk 44 au 2 fcn 0.15948262
NOTE: cache opening disk 45 of grp 1: DATA_XFF_CD_07_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_07_XFFCEL03
NOTE: cache opening disk 46 of grp 1: DATA_XFF_CD_02_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_02_XFFCEL03
NOTE: cache mounting (not first) normal redundancy group 1/0x78A886E7 (DATA_XFF)
Fri Dec 14 14:04:59 2018
kjbdomatt send to inst 2
Fri Dec 14 14:04:59 2018
NOTE: attached to recovery domain 1
NOTE: redo buffer size is 512 blocks (2101760 bytes)
Fri Dec 14 14:04:59 2018
NOTE: LGWR attempting to mount thread 2 for diskgroup 1 (DATA_XFF)
NOTE: LGWR found thread 2 closed at ABA 98.4672
NOTE: LGWR mounted thread 2 for diskgroup 1 (DATA_XFF)
NOTE: LGWR opening thread 2 at fcn 0.18931129 ABA 99.4673
NOTE: cache mounting group 1/0x78A886E7 (DATA_XFF) succeeded
NOTE: cache ending mount (success) of group DATA_XFF number=1 incarn=0x78a886e7
GMON querying group 1 at 80 for pid 19, osid 9805
Fri Dec 14 14:04:59 2018
NOTE: Instance updated compatible.asm to 11.2.0.3.0 for grp 1
SUCCESS: diskgroup DATA_XFF was mounted
SUCCESS: alter diskgroup DATA_XFF mount
ASM disk group status after recovery
ASMCMD> lsdg
State    Type    Rebal  Sector  Block  AU       Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  Y      512     4096   4194304  15160320  4776184  5197824          -210820         12             N             DATA_XFF/
MOUNTED  NORMAL  N      512     4096   4194304  864896    863400   298240           282580          0              Y             DBFS_DG/
MOUNTED  NORMAL  N      512     4096   4194304  3787840   2157232  1298688          429272          0              N             RECO_XFF/
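The negative Usable_file_MB follows from how ASM derives that column: for NORMAL redundancy it is (Free_MB - Req_mir_free_MB) / 2 (divide by 3 for HIGH, by 1 for EXTERN), so it turns negative once free space drops below what ASM reserves to restore redundancy after a failure. A small Python check against the three rows above; the truncation toward zero is an assumption about ASM's rounding, and `usable_file_mb` is a hypothetical helper:

```python
# Copies of each extent kept by ASM, keyed by the lsdg "Type" column.
FACTOR = {"EXTERN": 1, "NORMAL": 2, "HIGH": 3}


def usable_file_mb(free_mb: int, req_mir_free_mb: int, redundancy: str) -> int:
    """Recompute lsdg's Usable_file_MB from Free_MB and Req_mir_free_MB."""
    factor = FACTOR[redundancy.upper()]
    # int() truncates toward zero; ASM's exact rounding is assumed, not verified.
    return int((free_mb - req_mir_free_mb) / factor)


if __name__ == "__main__":
    print(usable_file_mb(4776184, 5197824, "NORMAL"))  # DATA_XFF -> -210820
    print(usable_file_mb(863400, 298240, "NORMAL"))    # DBFS_DG  -> 282580
    print(usable_file_mb(2157232, 1298688, "NORMAL"))  # RECO_XFF -> 429272
```

All three results match the lsdg output, and DATA_XFF is still negative after the recovery, so the disk group needs more space or less data before it can again survive a disk loss with full mirroring.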
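The databases were then opened, and the remaining corrupt blocks were handled separately with further techniques. That completed the recovery and salvaged the vast majority of the data in the customer's Oracle Exadata. If you hit a similar Exadata (XD) failure that you cannot resolve yourself and need recovery support, contact us:
Phone: 17813235971 QQ: 107644445 E-Mail: dba@xifenfei.com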
Posted in Unconventional Recovery
Tagged exadata mount, exadata bad-disk recovery, exadata recovery, exadata disk group recovery, ORA-15040, ORA-15042, ORA-15066, xd bad-disk recovery, xd recovery
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type]
While running Oracle ASM, mistaken operations at the operating-system level can damage ASM disks. Below are several examples of how kfed reports such damage (KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type]).
Mounting the disk group fails (ORA-15040, ORA-15042)
SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "2" is missing from group number "2"
Errors in the ASM alert log (ORA-15335, ORA-15066, ORA-15196, etc.)
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0002" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483651] [48] [0 != 1]
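ORA-15196 means ASM rejected a block header, and the kfed dumps in this post print that header's fields (kfbh.endian, kfbh.type, and so on at offsets 0x000 to 0x01c). A minimal Python sketch that decodes the same 32 bytes, assuming little-endian byte order as on x86 (a big-endian platform would need `>` instead of `<`); the helper names are hypothetical, and only kfbh.type 1 (KFBTYP_DISKHEAD) is treated as a disk header:

```python
import struct

# kfbh: 32-byte header at the start of every ASM metadata block, in the field
# order and offsets kfed prints (0x000 .. 0x01c).
KFBH = struct.Struct("<4B7I")  # assumes little-endian (x86) byte order
KFBH_FIELDS = ("endian", "hard", "type", "datfmt", "block_blk", "block_obj",
               "check", "fcn_base", "fcn_wrap", "spare1", "spare2")
KFBTYP_DISKHEAD = 1


def decode_kfbh(block: bytes) -> dict:
    """Decode the kfbh fields from the first 32 bytes of a block."""
    if len(block) < KFBH.size:
        raise ValueError(f"need {KFBH.size} bytes, got {len(block)}")
    return dict(zip(KFBH_FIELDS, KFBH.unpack_from(block, 0)))


def is_disk_header(block: bytes) -> bool:
    """True if the block claims to be an ASM disk header (kfbh.type == 1)."""
    h = decode_kfbh(block)
    return h["endian"] in (0, 1) and h["type"] == KFBTYP_DISKHEAD


def read_block(path: str, offset: int = 0, size: int = 4096) -> bytes:
    """Read one block read-only from a device or image file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Decoding the first bytes of the AIX example further down (`AIX LVCB..raw...`) gives endian 65, hard 73, type 88, blk 1111709260, the same values kfed prints, and `is_disk_header` returns False. Reading a raw device needs read permission on it; the sketch never writes.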
kfed errors when reading the disk header
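The start of the disk may be completely overwritten: not just the 4 KB disk header, but possibly several consecutive AUs or even more. In that case kfed read usually fails with an error such as KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type].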
[oracle@fcomtaep2 disks]$ kfed read ASMRECO03
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
7FC18D899400 00000000 00000000 00000000 00000000 [................]
  Repeat 27 times
7FC18D8995C0 FEEE0001 0001FFFF FFFF0000 00000FFF [................]
7FC18D8995D0 00000000 00000000 00000000 00000000 [................]
  Repeat 1 times
7FC18D8995F0 00000000 00000000 00000000 AA550000 [..............U.]
7FC18D899600 20494645 54524150 00010000 0000005C [EFI PART....\...] <==== **** Here ******
7FC18D899610 BD82BBB3 00000000 00000001 00000000 [................]
7FC18D899620 0FFFFFFF 00000000 00000022 00000000 [........".......]
7FC18D899630 0FFFFFDE 00000000 FD8857E5 42D7B49B [.........W.....B]
7FC18D899640 0901FA87 6B3DB5AA 00000002 00000000 [......=k........]
7FC18D899650 00000080 00000080 FE48EB77 00000000 [........w.H.....]
7FC18D899660 00000000 00000000 00000000 00000000 [................]
  Repeat 25 times
7FC18D899800 EBD0A0A2 4433B9E5 B668C087 C79926B7 [......3D..h..&..]
7FC18D899810 5381F6DF 4626F988 0E4F468D D78D3B28 [...S..&F.FO.(;..]
7FC18D899820 000007A1 00000000 0FFFF85F 00000000 [........_.......]
7FC18D899830 00000000 00000000 00720070 006D0069 [........p.r.i.m.]
7FC18D899840 00720061 00000079 00000000 00000000 [a.r.y...........]
7FC18D899850 00000000 00000000 00000000 00000000 [................]
  Repeat 186 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
“EFI PART” is the signature of a GPT partition table header, so this ASM disk was most likely damaged by being partitioned.
[ebernal@dbaasm new2]$ kfed read emcpowerl | head -25
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
2ABD671E9400 00000000 00000000 00000000 00000000 [................]
  Repeat 31 times
2ABD671E9600 4542414C 454E4F4C 00000001 00000000 [LABELONE........]
2ABD671E9610 E4E1DDB1 00000020 324D564C 31303020 [.... ...LVM2 001] <==== **** Here ******
2ABD671E9620 50365A77 71327874 34303156 4B4E6136 [wZ6Ptx2qV1046aNK]
2ABD671E9630 35395159 5147634C 487A5A38 63575A37 [YQ95LcGQ8ZzH7ZWc]
2ABD671E9640 00000000 00000019 00030000 00000000 [................]
2ABD671E9650 00000000 00000000 00000000 00000000 [................]
2ABD671E9660 00000000 00000000 00001000 00000000 [................]
2ABD671E9670 0002F000 00000000 00000000 00000000 [................]
2ABD671E9680 00000000 00000000 00000000 00000000 [................]
  Repeat 215 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
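“LABELONE” followed by “LVM2 001” is an LVM2 physical volume label, so this ASM disk was most likely initialized as an LVM physical volume and overwritten in the process.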
[ebernal@dbaasm tars]$ kfed read rhdisk16
kfbh.endian: 65 ; 0x000: 0x41
kfbh.hard: 73 ; 0x001: 0x49
kfbh.type: 88 ; 0x002: *** Unknown Enum ***
kfbh.datfmt: 32 ; 0x003: 0x20
kfbh.block.blk: 1111709260 ; 0x004: blk=1111709260
kfbh.block.obj: 1634861056 ; 0x008: file=131072
kfbh.check: 119 ; 0x00c: 0x00000077
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
2B6FE2AC1400 20584941 4243564C 61720000 00000077 [AIX LVCB..raw...] <==== **** Here ******
2B6FE2AC1410 00000000 00000000 00000000 00000000 [................]
2B6FE2AC1420 00000000 00000000 30300000 38306430 [..........000d08]
2B6FE2AC1430 30306131 34643030 30303030 31303030 [1a0000d400000001]
2B6FE2AC1440 61006533 766C6D73 7461645F 00003161 [3e.asmlv_data1..]
2B6FE2AC1450 00000000 00000000 00000000 00000000 [................]
  Repeat 2 times
2B6FE2AC1480 54000000 4D206575 20207961 31312037 [...Tue May 7 11]
2B6FE2AC1490 3A33343A 32203633 0A333130 00000000 [:43:36 2013.....]
2B6FE2AC14A0 65755400 79614D20 20372020 343A3131 [.Tue May 7 11:4]
2B6FE2AC14B0 34323A38 31303220 00000A33 44000000 [8:24 2013......D]
2B6FE2AC14C0 41313830 30303444 6D6D7900 02007900 [081AD400.ymm.y..]
2B6FE2AC14D0 0100E40C 656E6F4E 00000000 00000000 [....None........]
2B6FE2AC14E0 00000000 00000000 00000000 00000000 [................]
  Repeat 14 times
2B6FE2AC15D0 00000000 00000000 65310000 61653934 [..........1e49ea]
2B6FE2AC15E0 342E3862 00000000 00000000 00000000 [b8.4............]
2B6FE2AC15F0 00000000 00000000 00000000 00000000 [................]
  Repeat 224 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][88]
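Here “AIX LVCB..raw” is AIX logical volume control block (LVCB) metadata, and the dump also shows the logical volume name asmlv_data1. In other words, the ASM disk was overwritten at the AIX OS level when it was used as an AIX logical volume.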
[oracle@dbep2 disks]$ kfed read asm-disk3
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
06000000 00000000 00000000 00000000 00000000 [................]
  Repeat 25 times
0602100 51e2b7f6 00ed4e00 00000000 00000001 [...Q.N..........]
0602120 00000000 0000000b 00000100 0000003c [............<...]
0602140 00000242 0000007b 5d8468e7 6147782a [B...{....h.]*xGa]
0602160 d17851a2 327552e2 00000000 00000000 [.Qx..Ru2........]
0602200 00000000 00000000 3130752f 91a4f000 [......../u01....] <==== **** Here ******
0602220 ff8808e4 d5104cff 000000ac 00000100 [.....L..........]
0602240 00000000 00000000 00000000 09d18000 [................]
  Repeat 254 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][88]
Here the string /u01 suggests that the ASM disk was most likely overwritten by a file system.
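The four cases above were identified by spotting foreign signatures near the start of the disk. A minimal Python sketch that automates that search over a raw device or image file, assuming only the signatures seen in these dumps (the /u01 case is a plain path string rather than a fixed signature, so it is left out); the 1 MiB scan length and the helper names are assumptions:

```python
# Foreign on-disk signatures that appear in the kfed dumps above.
SIGNATURES = {
    b"EFI PART": "GPT partition table header: the disk was partitioned",
    b"LABELONE": "LVM physical volume label: the disk was initialized for LVM",
    b"LVM2 001": "LVM2 label type string inside the PV label",
    b"AIX LVCB": "AIX logical volume control block: used as an AIX LV",
}


def find_signatures(buf: bytes) -> list:
    """Return sorted (offset, signature, meaning) tuples for every match."""
    hits = []
    for sig, meaning in SIGNATURES.items():
        pos = buf.find(sig)
        while pos != -1:
            hits.append((pos, sig.decode("ascii"), meaning))
            pos = buf.find(sig, pos + 1)
    return sorted(hits)


def scan_device(path: str, nbytes: int = 1024 * 1024) -> list:
    """Read the first nbytes (1 MiB by default, an assumption) read-only."""
    with open(path, "rb") as f:
        return find_signatures(f.read(nbytes))
```

In the dumps above, “EFI PART” and “LABELONE” both sit 0x200 bytes (one 512-byte sector) into the disk and “LVM2 001” at 0x218, while “AIX LVCB” is at offset 0; a scan reporting any of these offsets on an ASM disk points to the same kind of overwrite.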
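How to handle these kinds of ASM disk damage: with NORMAL or HIGH redundancy, the disk group itself is unaffected, and you can drop the damaged disk and add it back. With EXTERNAL redundancy, a damaged ASM disk usually causes the disk group to dismount, after which it cannot be mounted normally. If you have a backup of the disk header, you can try restoring the header, mounting the disk group, and then migrating the data out in read-only mode. If there is no header backup, or the disk group still will not mount after the restore, other approaches may be needed, such as using a tool to extract the data files while the disk group is dismounted, or even recovering data by reassembling ASM block and Oracle block fragments. Related articles: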
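Oracle ASM series: article index
pvid=yes prevents ASM from mounting
Recovering a completely destroyed ASM disk header
Unrecognized partition prevents an ASM disk group from mounting
Oracle ASM disk format recovery: disk formatted as ext4
Oracle ASM disk format recovery: disk formatted as NTFS
Recovering an ASM disk group that would not mount after a PVID was mistakenly set on an ASM disk
Case study: zero-data-loss recovery after oracleasm createdisk re-created an ASM disk
ORA-15042: ASM disk “N” is missing from group number “M”: failure recovery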
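On the header-backup option mentioned above: ASM 11.1.0.7 and later is commonly documented to keep a second copy of the disk header in allocation unit 1, at block (AU size / 4096) - 2, for example blkn=254 with a 1 MB AU. The Python sketch below computes that location and reads the kfbh.type byte of both copies. The location is an assumption to verify against your version, and the helper names are hypothetical. If the overwrite spans several AUs, as described earlier, the backup copy may be gone as well.

```python
BLOCK_SIZE = 4096  # ASM metadata block size


def backup_header_location(au_size: int) -> tuple:
    """(aun, blkn, byte_offset) of the backup ASM disk header.

    Assumption: the copy lives in AU 1 at block au_size/4096 - 2,
    as commonly documented for ASM 11.1.0.7 and later.
    """
    if au_size <= 0 or au_size % BLOCK_SIZE:
        raise ValueError("AU size must be a positive multiple of 4096")
    blkn = au_size // BLOCK_SIZE - 2
    return 1, blkn, au_size + blkn * BLOCK_SIZE


def header_types(path: str, au_size: int) -> tuple:
    """kfbh.type byte (offset 0x002) of the primary and backup header blocks.

    1 (KFBTYP_DISKHEAD) means the block still looks like a disk header;
    -1 means the read came back too short (device smaller than expected).
    """
    _, _, backup_offset = backup_header_location(au_size)
    with open(path, "rb") as f:  # read-only
        primary = f.read(BLOCK_SIZE)
        f.seek(backup_offset)
        backup = f.read(BLOCK_SIZE)
    return tuple(b[2] if len(b) > 2 else -1 for b in (primary, backup))
```

For the 4 MB AU used by DATA_XFF in the Exadata case, this gives aun=1 blkn=1022; `kfed read <disk> aun=1 blkn=1022` displays that block for a manual check before any repair is attempted.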
If you run into a situation like this and cannot resolve it, contact us for professional Oracle database recovery support.
Posted in Oracle ASM, Oracle Backup and Recovery
Tagged endian_kfbh, Invalid OSM block type, kfbtTraverseBlock, KFED-00322, ORA-15042, ORA-15196
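Recovering from ORA-15042 at ASM startup caused by missing /dev/rdisk disks on HP-UX
An old friend contacted me: a customer's database had failed because ASM could not mount its disk group, reporting two missing disks, and he asked whether it could be recovered. Because the system was on an internal network, the analysis below is pieced together from scattered information he sent over.
Errors in the ASM alert log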
ERROR: diskgroup DGROUP1 was not mounted
Fri Aug 12 16:03:12 EAT 2016
SQL> alter diskgroup DGROUP1 mount
Fri Aug 12 16:03:12 EAT 2016
NOTE: cache registered group DGROUP1 number=1 incarn=0xf6781b5c
Fri Aug 12 16:03:12 EAT 2016
NOTE: Hbeat: instance first (grp 1)
Fri Aug 12 16:03:16 EAT 2016
NOTE: start heartbeating (grp 1)
Fri Aug 12 16:03:16 EAT 2016
NOTE: cache dismounting group 1/0xF6781B5C (DGROUP1)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGROUP1 was not mounted
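Mounting the ASM disk group from the foreground session fails with ORA-15042
The error makes it clear that the ASM disk group cannot mount because ASM disks 15 and 16 are missing. The best way to recover ASM is to find those two disks. Analyzing the surviving disks with kfed, we found that ASM disk 14 is on disk160 and ASM disk 17 is on disk163. Our first guess was therefore that disk161 and disk162 were the problem disks, but the data center found no hardware alerts.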
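The first guess above is an interpolation: if ASM disks 14 and 17 sit on disk160 and disk163, disks added in order would put 15 and 16 on disk161 and disk162. A tiny Python sketch of that heuristic, assuming disks were added in sequence so that one constant offset links ASM disk numbers to device numbers; it only fills gaps between known pairs, and the result is a hypothesis to confirm with kfed or ioscan, not proof. `infer_by_constant_offset` is a hypothetical helper:

```python
def infer_by_constant_offset(known: dict, missing: list) -> dict:
    """Guess device numbers for missing ASM disk numbers.

    known:   {asm_disk_number: device_number} pairs confirmed with kfed.
    missing: ASM disk numbers to place; each must lie between known pairs.
    """
    offsets = {dev - asm for asm, dev in known.items()}
    if len(offsets) != 1:
        raise ValueError(f"known pairs disagree on the offset: {sorted(offsets)}")
    offset = offsets.pop()
    lo, hi = min(known), max(known)
    guesses = {}
    for asm in missing:
        if not lo < asm < hi:
            raise ValueError(f"{asm} is outside the known range {lo}..{hi}")
        guesses[asm] = asm + offset
    return guesses


if __name__ == "__main__":
    print(infer_by_constant_offset({14: 160, 17: 163}, [15, 16]))  # {15: 161, 16: 162}
```

The same reasoning applies to the legacy paths later in this post: c29t12d4 maps to disk160 and c29t12d7 to disk163, so d5 and d6 are the likely paths behind disk161 and disk162.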
OS-level analysis
(Entries unrelated to the conclusion are omitted.)
ls -l /dev/rdisk
crw-rw---- 1 oracle dba 13 0x000070 Jan 1 2016 disk160
crw-rw---- 1 oracle dba 13 0x000073 Jan 1 2016 disk163
ls -l /dev/disk
brw-r----- 1 bin sys 1 0x000070 Jan 13 2015 disk160
brw-r----- 1 bin sys 1 0x000071 Jan 13 2015 disk161
brw-r----- 1 bin sys 1 0x000072 Jan 13 2015 disk162
brw-r----- 1 bin sys 1 0x000073 Jan 13 2015 disk163
On this HP-UX host, all four disks exist under /dev/disk, but disk161 and disk162 are missing under /dev/rdisk. We continued the analysis with ioscan:
ioscan -fNnkC disk
disk 160 64000/0xfa00/0x70 esdisk CLAIMED DEVICE HP OPEN-V
 /dev/disk/disk160 /dev/rdisk/disk160
disk 161 64000/0xfa00/0x71 esdisk CLAIMED DEVICE HP OPEN-V
 /dev/disk/disk161
disk 162 64000/0xfa00/0x72 esdisk CLAIMED DEVICE HP OPEN-V
 /dev/disk/disk162
disk 163 64000/0xfa00/0x73 esdisk CLAIMED DEVICE HP OPEN-V
 /dev/disk/disk163 /dev/rdisk/disk163
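This confirms that the device files under /dev/rdisk are missing, even though the kernel still claims all four disks. Because /dev/rdisk entries are the aggregated (multipath) persistent device files, we next checked whether the legacy per-path device files behind them were intact: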
ioscan -m dsf
/dev/rdisk/disk160 /dev/rdsk/c29t12d4
                   /dev/rdsk/c28t12d4
/dev/rdisk/disk163 /dev/rdsk/c29t12d7
                   /dev/rdsk/c28t12d7
ls -l /dev/rdsk
crw-r----- 1 bin sys 188 0x1dc000 Apr 22 2014 c29t12d0
crw-r----- 1 bin sys 188 0x1dc100 Apr 22 2014 c29t12d1
crw-r----- 1 bin sys 188 0x1dc300 Jan 13 2015 c29t12d3
crw-r----- 1 bin sys 188 0x1dc400 Jan 13 2015 c29t12d4
crw-r----- 1 bin sys 188 0x1dc500 Jan 13 2015 c29t12d5
crw-r----- 1 bin sys 188 0x1dc600 Jan 13 2015 c29t12d6
crw-r----- 1 bin sys 188 0x1dc700 Jan 13 2015 c29t12d7
crw-r----- 1 bin sys 188 0x1cc100 Apr 22 2014 c28t12d1
crw-r----- 1 bin sys 188 0x1cc300 Jan 13 2015 c28t12d3
crw-r----- 1 bin sys 188 0x1cc400 Jan 13 2015 c28t12d4
crw-r----- 1 bin sys 188 0x1cc500 Jan 13 2015 c28t12d5
crw-r----- 1 bin sys 188 0x1cc600 Jan 13 2015 c28t12d6
crw-r----- 1 bin sys 188 0x1cc700 Jan 13 2015 c28t12d7
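Since disk160 maps to c29t12d4/c28t12d4 and disk163 to c29t12d7/c28t12d7, we can infer that /dev/rdsk/c28t12d5, /dev/rdsk/c28t12d6, /dev/rdsk/c29t12d5 and /dev/rdsk/c29t12d6 are the legacy (pre-aggregation) device files behind /dev/rdisk/disk161 and disk162, and all of them are present. In other words, only the character device files under /dev/rdisk are broken; everything else is intact.
Repairing the device files with OS commands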
insf -e -H 64000/0xfa00/0x71
insf -e -H 64000/0xfa00/0x72
/dev/rdisk/disk161 and /dev/rdisk/disk162 are now visible again, so the OS-level device files appear to be restored. Next, fix the permissions and ownership of the two disks:
chmod 660 /dev/rdisk/disk161
chmod 660 /dev/rdisk/disk162
chown oracle:dba /dev/rdisk/disk161
chown oracle:dba /dev/rdisk/disk162
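The diagnosis and fix above can be scripted for hosts with many LUNs: parse `ioscan -fNnkC disk`, find the disk instances that have no /dev/rdisk/diskN entry, and print the same insf/chmod/chown commands for review. A Python sketch, assuming the ioscan output format shown in this post (it also accepts the flattened single-line form); the owner `oracle:dba` and mode 660 come from this case, and the helpers are hypothetical. It only prints commands and runs nothing:

```python
import re

# One ioscan entry: "disk <instance> <hw path> esdisk ..."
ENTRY_RE = re.compile(r"\bdisk\s+(\d+)\s+(\d+/0x[0-9a-f]+/0x[0-9a-f]+)\s+esdisk", re.I)


def disks_missing_rdisk(ioscan_text: str) -> list:
    """(instance, hw_path) for each disk whose /dev/rdisk/diskN is not listed."""
    entries = list(ENTRY_RE.finditer(ioscan_text))
    missing = []
    for i, m in enumerate(entries):
        end = entries[i + 1].start() if i + 1 < len(entries) else len(ioscan_text)
        instance = int(m.group(1))
        device_files = ioscan_text[m.end():end]  # text up to the next entry
        if not re.search(rf"/dev/rdisk/disk{instance}\b", device_files):
            missing.append((instance, m.group(2)))
    return missing


def repair_commands(missing: list, owner: str = "oracle:dba", mode: str = "660") -> list:
    """Commands mirroring the manual fix above, for review before running."""
    cmds = [f"insf -e -H {hw}" for _, hw in missing]
    cmds += [f"chmod {mode} /dev/rdisk/disk{n}" for n, _ in missing]
    cmds += [f"chown {owner} /dev/rdisk/disk{n}" for n, _ in missing]
    return cmds


if __name__ == "__main__":
    sample = """disk 160 64000/0xfa00/0x70 esdisk CLAIMED DEVICE HP OPEN-V
 /dev/disk/disk160 /dev/rdisk/disk160
disk 161 64000/0xfa00/0x71 esdisk CLAIMED DEVICE HP OPEN-V
 /dev/disk/disk161
"""
    for cmd in repair_commands(disks_missing_rdisk(sample)):
        print(cmd)
```

Applied to the ioscan output above, it selects disk161 and disk162 and reproduces the six commands used in this case, in the same order.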
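Then start ASM normally, mount the disk group, and open the database.
This recovery was diagnosed and solved entirely at the operating-system level, and the database was recovered completely with zero data loss. A similar case: Unrecognized partition prevents an ASM disk group from mounting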