分类目录归档:Oracle ASM

ASM中磁盘组权限设置

aix平台11gr2单库使用使用grid和oracle用户分别部署gi和db,在添加磁盘的时候,使用设置磁盘所属用户和组为grid与oinstall,设置权限为755.添加磁盘成功后,数据库直接crash.
asm添加磁盘操作

SQL>  alter diskgroup DATA add disk '/dev/rhdisk15' 
NOTE: Assigning number (2,7) to disk (/dev/rhdisk15)
NOTE: requesting all-instance membership refresh for group=2
NOTE: initializing header on grp 2 disk DATA_0007
NOTE: requesting all-instance disk validation for group=2
Wed Apr 03 22:09:03 2013
NOTE: skipping rediscovery for group 2/0xa026f7ec (DATA) on local instance.
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0xa026f7ec (DATA) on local instance.
NOTE: initiating PST update: grp = 2
Wed Apr 03 22:09:03 2013
GMON updating group 2 at 21 for pid 17, osid 22610284
NOTE: PST update grp = 2 completed successfully 
NOTE: membership refresh pending for group 2/0xa026f7ec (DATA)
GMON querying group 2 at 22 for pid 13, osid 20643916
NOTE: cache opening disk 7 of grp 2: DWDATAGRP_0007 path:/dev/rhdisk15
GMON querying group 2 at 23 for pid 13, osid 20643916
SUCCESS: refreshed membership for 2/0xa026f7ec (DATA)
NOTE: starting rebalance of group 2/0xa026f7ec (DATA) at power 1
SUCCESS:  alter diskgroup DATA add disk '/dev/rhdisk15'
Starting background process ARB0
Wed Apr 03 22:09:07 2013
ARB0 started with pid=22, OS id=14155890 
NOTE: assigning ARB0 to group 2/0xa026f7ec (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
Wed Apr 03 22:09:19 2013
SQL>  alter diskgroup DATA add disk '/dev/rhdisk11' 
Wed Apr 03 22:09:20 2013
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 2/0xa026f7ec (DATA)
NOTE: Assigning number (2,8) to disk (/dev/rhdisk11)
NOTE: requesting all-instance membership refresh for group=2
NOTE: initializing header on grp 2 disk DATA_0008
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0xa026f7ec (DATA) on local instance.
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0xa026f7ec (DATA) on local instance.
NOTE: initiating PST update: grp = 2
Wed Apr 03 22:09:23 2013
GMON updating group 2 at 24 for pid 17, osid 22610284
NOTE: PST update grp = 2 completed successfully 
NOTE: membership refresh pending for group 2/0xa026f7ec (DATA)
GMON querying group 2 at 25 for pid 13, osid 20643916
NOTE: cache opening disk 8 of grp 2: DATA_0008 path:/dev/rhdisk11
GMON querying group 2 at 26 for pid 13, osid 20643916
SUCCESS: refreshed membership for 2/0xa026f7ec (DATA)
NOTE: starting rebalance of group 2/0xa026f7ec (DATA) at power 1
SUCCESS:  alter diskgroup DATA add disk '/dev/rhdisk11'
Starting background process ARB0
Wed Apr 03 22:09:26 2013
ARB0 started with pid=22, OS id=22872116 
NOTE: assigning ARB0 to group 2/0xa026f7ec (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
Wed Apr 03 22:14:41 2013
NOTE: ASM client xifenfei:xifenfei disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/diag/asm/+asm/+ASM/trace/+ASM_ora_15073468.trc
Wed Apr 03 22:16:53 2013
NOTE: client xifenfei:xifenfei registered, osid 20709378, mbr 0x0
Wed Apr 03 22:20:33 2013
NOTE: client xifenfei:xifenfei deregistered

这里可看到增加磁盘操作正常并且开始做rebalance,但是也看到关于client xifenfei异常断开连接(本质就是数据库crash)

crash时的alert日志

Wed Apr 03 22:00:00 2013
Setting Resource Manager plan SCHEDULER[0x318B]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Wed Apr 03 22:00:00 2013
Starting background process VKRM
Wed Apr 03 22:00:00 2013
VKRM started with pid=31, OS id=22413426 
Wed Apr 03 22:09:06 2013
ORA-15025: could not open disk "/dev/rhdisk15"
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 11
Wed Apr 03 22:09:06 2013
SUCCESS: disk DATA_0007 (7.2092304189) added to diskgroup DATA
Wed Apr 03 22:09:26 2013
ORA-15025: could not open disk "/dev/rhdisk15"
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 11
Wed Apr 03 22:09:26 2013
SUCCESS: disk DATA_0008 (8.2092304190) added to diskgroup DATA
Wed Apr 03 22:14:40 2013
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_dbw0_17367438.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 1 logical extent 0 of file 261 in 
group 2 on disk 7 allocation unit 464 
KCF: read, write or open error, block=0x6a online=1
        file=1 '+DATA/xifenfei/datafile/system.261.788373447'
        error=15081 txt: ''
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_dbw0_17367438.trc:
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_dbw0_17367438.trc:
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 1 (block # 106)
ORA-01110: data file 1: '+DATA/xifenfei/datafile/system.261.788373447'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
DBW0 (ospid: 17367438): terminating the instance due to error 63999

这里可以看到数据库异常crash是因为/dev/rhdisk15没有权限去操作该文件,导致dbw0进程异常,从而出现该数据库crash

尝试重启数据库(asm重启正常)

SQL> startup
ORACLE instance started.

Total System Global Area 1.2827E+10 bytes
Fixed Size                  2233480 bytes
Variable Size            1711278968 bytes
Database Buffers         1.1073E+10 bytes
Redo Buffers               40894464 bytes
Database mounted.
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '+DATA/xifenfei/datafile/system.261.788373447'

这里提示file 1需要恢复,查看alert日志,出现以下错误

Completed: ALTER DATABASE   MOUNT
Wed Apr 03 22:17:02 2013
ALTER DATABASE OPEN
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
WARNING: Write Failed. group:2 disk:8 AU:462 offset:16384 size:16384
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 261 in 
group 2 on disk 8 allocation unit 462 
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
WARNING: Write Failed. group:2 disk:8 AU:690 offset:16384 size:16384
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
WARNING: Write Failed. group:2 disk:8 AU:918 offset:16384 size:16384
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 263 in 
group 2 on disk 8 allocation unit 918 
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 262 in 
group 2 on disk 8 allocation unit 690 
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-01110: data file 3: '+DATA/xifenfei/datafile/undotbs1.263.788373475'
ORA-01114: IO error writing block to file 3 (block # 1)
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.262.788373463'
ORA-01114: IO error writing block to file 2 (block # 1)
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk

recover database 操作

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-01201: file 1 header failed to write correctly
Wed Apr 03 22:18:49 2013
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Wed Apr 03 22:18:50 2013
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_pr00_12714126.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
WARNING: Write Failed. group:2 disk:8 AU:462 offset:16384 size:16384
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_pr00_12714126.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306

依然是这里的提示依然是因为磁盘无读写权限从而出现数据库无法写数据文件问题,修改刚刚加入的磁盘文件权限问为660(4读2写1执行),表明与oinstall相同组的oracle用户对该磁盘也有读写权限.
这个事故是一个很简单,而且随着11g中asm使用grid和oracle用户的客户越来越多,相关的事故也越来越多,因为大多数使用人习惯直接给某个文件授权为755,而在这样的grid和oracle分开安装的系统中,将出现增加磁盘后,数据库crash,而且不能起来(因为oracle用户对磁盘只有读权限,无写权限),一种比较好的规范:在11gr2的asm系统中(grid和oracle用户),建议设置磁盘为grid.oinstall,权限设置为660

发表在 Oracle ASM | 2 条评论

asmlib异常报ORA-00600[kfklLibFetchNext00]

一个朋友的历史库出现故障,在linux 4的平台上asm的10.2.0.1的单库,asm使用asmlib来处理。
asm不能正常mount磁盘组,可以看到asmdisk,alert日志报ORA-00600[kfklLibFetchNext00]
操作系统内核是:2.6.9-78
oracleasmlib是:2.0.2-1
asm磁盘组mount失败

--以前故障
SQL> ALTER DISKGROUP ALL MOUNT 
Thu Sep  6 14:23:16 2012
NOTE: cache registered group DGARC number=1 incarn=0x2bf96274
NOTE: cache registered group DGDATA number=2 incarn=0x2c196275
NOTE: cache registered group DGSYS number=3 incarn=0x2c196276
Thu Sep  6 14:23:16 2012
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm_rbal_10204.trc:
ORA-15183: ASMLIB initialization error [driver/agent not installed]
Thu Sep  6 14:23:16 2012
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm_rbal_10204.trc:
ORA-15183: ASMLIB initialization error [/opt/oracle/extapi/64/asm/orcl/1/libasm.so]
ORA-15183: ASMLIB initialization error [driver/agent not installed]
Thu Sep  6 14:23:16 2012
ERROR: no PST quorum in group 1: required 2, found 0
Thu Sep  6 14:23:16 2012
NOTE: cache dismounting group 1/0x2BF96274 (DGARC) 
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGARC was not mounted
Thu Sep  6 14:23:16 2012
ERROR: no PST quorum in group 2: required 2, found 0
Thu Sep  6 14:23:16 2012
NOTE: cache dismounting group 2/0x2C196275 (DGDATA) 
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGDATA was not mounted
Thu Sep  6 14:23:16 2012
ERROR: no PST quorum in group 3: required 2, found 0
Thu Sep  6 14:23:16 2012
NOTE: cache dismounting group 3/0x2C196276 (DGSYS) 
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGSYS was not mounted

--现在故障
Thu Jan 24 13:49:45 2013
SQL> ALTER DISKGROUP ALL MOUNT 
Thu Jan 24 13:49:45 2013
NOTE: cache registered group DGARC number=1 incarn=0xf388cee9
NOTE: cache registered group DGDATA number=2 incarn=0xf3a8ceea
NOTE: cache registered group DGSYS number=3 incarn=0xf3a8ceeb
Thu Jan 24 13:49:45 2013
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm_rbal_13449.trc:
ORA-00600: internal error code, arguments: [kfklLibFetchNext00], 
[18446744073709551614], [0], [], [], [], [], []
Thu Jan 24 13:49:46 2013
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm_rbal_13449.trc:
ORA-00600: internal error code, arguments: [kfklLibFetchNext00], 
[18446744073709551614], [0], [], [], [], [], []
Thu Jan 24 13:49:46 2013
ERROR: no PST quorum in group 1: required 2, found 0
Thu Jan 24 13:49:46 2013
NOTE: cache dismounting group 1/0xF388CEE9 (DGARC) 
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGARC was not mounted
Thu Jan 24 13:49:46 2013
ERROR: no PST quorum in group 2: required 2, found 0
Thu Jan 24 13:49:46 2013
NOTE: cache dismounting group 2/0xF3A8CEEA (DGDATA) 
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGDATA was not mounted
Thu Jan 24 13:49:46 2013
ERROR: no PST quorum in group 3: required 2, found 0
Thu Jan 24 13:49:46 2013
NOTE: cache dismounting group 3/0xF3A8CEEB (DGSYS) 
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGSYS was not mounted
Shutting down instance: further logons disabled

trace文件信息

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
kgesinv()+33         call     kgerinv()            006469D40 ? 0064E1C58 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ? 000000001 ?
kgesinw()+166        call     kgesinv()            006469D40 ? 0064E1C58 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ? 000000001 ?
kfklLibScanNext()+2  call     kgesinw()            006469D40 ? 000000000 ?
39                                                 000000001 ? 000000000 ?
                                                   FFFFFFFFFFFFFFFE ?
                                                   000000000 ?
kfkLibFetchNext()+3  call     kfklLibScanNext()    0064DDD70 ? 7FBFFFDCD0 ?
43                                                 000000001 ? 000000000 ?
                                                   FFFFFFFFFFFFFFFE ?
                                                   000000000 ?
kfuitrnInit()+524    call     kfkLibFetchNext()    006469D40 ? 2A971DFF90 ?
                                                   000000001 ? 000000000 ?
                                                   FFFFFFFFFFFFFFFE ?
                                                   000000000 ?
kfkLibIterInit()+18  call     kfuitrnInit()        006469D40 ? 2A971DFCB0 ?
0                                                  2A971DFF90 ? 000000009 ?
                                                   000000009 ? 000000000 ?
kfkLoadAllLibs()+36  call     kfkLibIterInit()     000000000 ? 00646C7E0 ?
3                                                  2A971DFF90 ? 000000009 ?
                                                   000000009 ? 000000000 ?
kfkDiscoverString()  call     kfkLoadAllLibs()     000000000 ? 00646C7E0 ?
+107                                               2A971DFF90 ? 000000009 ?
                                                   000000009 ? 000000000 ?
Cannot find symbol
Cannot find symbol
Cannot find symbol
kfdDiscoverString()  call     kfkDiscoverString()  067A53768 ? 00646C7E0 ?
+28                                                2A971DFF90 ? 000000009 ?
                                                   000000009 ? 000000000 ?
kfdDiscoverShallow(  call     kfdDiscoverString()  067A53768 ? 000000000 ?
)+315                                              2A971DFF90 ? 000000009 ?
                                                   000000009 ? 000000000 ?
kfgbDriver()+1174    call     kfdDiscoverShallow(  000000180 ? 000000000 ?
                              )                    2A971DFF90 ? 000000009 ?
                                                   000000009 ? 000000000 ?
ksbabs()+564         call     kfgbDriver()         7FBFFFE5C0 ? 000000048 ?
                                                   000000000 ? 000000009 ?
                                                   000000009 ? 000000000 ?
ksbrdp()+727         call     ksbabs()             7FBFFFE5C0 ? 000000048 ?
                                                   000000000 ? 000000009 ?
                                                   000000009 ? 000000000 ?
opirip()+616         call     ksbrdp()             7FBFFFE5C0 ? 000000048 ?
                                                   000000001 ? 06002C770 ?
                                                   000000009 ? 000000000 ?
opidrv()+582         call     opirip()             000000032 ? 000000004 ?
                                                   7FBFFFF6C8 ? 06002C770 ?
                                                   000000009 ? 000000000 ?
sou2o()+114          call     opidrv()             000000032 ? 000000004 ?
                                                   7FBFFFF6C8 ? 06002C770 ?
                                                   000000009 ? 000000000 ?
opimai_real()+317    call     sou2o()              7FBFFFF6A0 ? 000000032 ?
                                                   000000004 ? 7FBFFFF6C8 ?
                                                   000000009 ? 000000000 ?
main()+116           call     opimai_real()        000000003 ? 7FBFFFF730 ?
                                                   000000004 ? 7FBFFFF6C8 ?
                                                   000000009 ? 000000000 ?
<0x3c9fb1c40b>       call     main()               000000003 ? 7FBFFFF730 ?
                                                   000000004 ? 7FBFFFF6C8 ?
                                                   000000009 ? 000000000 ?
 
--------------------- Binary Stack Dump ---------------------

因为客户的库是一个历史库,基本上不怎么使用,在2012年启动asm就出现了ORA-15183错误,然后在2013年重启机器后,再次启动asm就出现了ORA-00600[kfklLibFetchNext00]错误,通过2012年的错误提示,我们大概可以判断出来该问题和ASMLIB有关系,查询mos发现429945.1,发现Call Stack Trace完全一致,可以定位是该问题(如果想深入分析,可以通过strace继续分析)

ORA-600: [kfklLibFetchNext00], [18446744073709551614], [0] when mounting diskgroup in ASM

Applies to: 
Linux OS - Version: 2.0.1-1 and later   [Release: RHEL4 and later ]
Information in this document applies to any platform.
Linux Kernel - Version: 2.0.1 

Symptoms
 3 RAC db.  
2 nodes are up and functioning except for 1 node - ASM did not come back up after 
the reboot eventhough all disks show available from asmlib's perspective:
 
Changes
 All that was done with resources were stopped on Node1 and an extra LUN added. 
 A reboot was then performed.

Cause
 The cause of the issue is libasm.o corruption

Ran the following to confirm that disks are ok:
/dev/oracleasm listdisks
/usr/sbin/asmtool -I -l /dev/oracleasm -n /dev/sdg1 -a label
/usr/sbin/oracleasm-discover 'ORCL:*'
dd if=/dev/sdg1 bs=8192 count=1 | od -c
==> output checked out fine
.
kfod asm_diskstring='ORCL:*'
==> this failed on Node1
KFOD-00600: file not found; argument [610][kfklLibFetchNext00] even though libasm.o exists

You might see the following call stack as well

----- Call Stack Trace -----

kfklLibScanNext 
kfkLibFetchNext 
kfuitrnInit 
kfkLibIterInit 
kfkLoadAllLibs 
kfkDiscoverString 
kfdDiscoverString 
kfdDiscoverShallow 
kfgbDriver 

strace showed
 
 Node1-failing
-------
stat("/opt/oracle/extapi/64/asm/orcl/1/libasm.so", {st_mode=S_IFREG|0777, st_size=19344, ...}) = 0
 getdents64(4, /* 0 entries */, 4096) = 0 <<<<
 close(4) = 0
 open("/opt/oracle/product/10.2.0/db_1/rdbms/mesg/kfodus.msb", O_RDONLY) = -1 
 ENOENT (No such file or directory)
 open("/opt/oracle/product/10.2.0/db_1/rdbms/mesg/kfodus.msb", O_RDONLY) = -1 
 ENOENT (No such file or directory)
 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a9750d000
write(1, "KFOD-00600: file not found; argu"..., 69) = 69

Node2-working
 -----
 stat("/opt/oracle/extapi/64/asm/orcl/1/libasm.so", {st_mode=S_IFREG|0755, st_size=19344, ...}) = 0
 open("/opt/oracle/extapi/64/asm/orcl/1/libasm.so", O_RDONLY) = 4
read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\23\0"..., 832) = 832
fstat(4, {st_mode=S_IFREG|0755, st_size=19344, ...}) = 0
mmap(NULL, 1066104, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) 0x2a9750d000

通过MOS的描述,可以明确定位到问题是:libasm.o异常导致

解决方案

To implement the solution, reinstall the ASMlib RPM
>rpm -Uvh oracleasmlib-2.0.0-1
 
This replaces the /opt/oracle/extapi/64/asm/orcl/1/libasm.so
发表在 Oracle ASM | 标签为 , , | 2 条评论

How to Get the Contents of an Spfile on ASM when ASM/GRID is down

在11g中asm的spfile文件是存放在asm中的,如果asm不能正常启动是否可以获得其spfile信息.这里通过gpnptool来获得spfile文件信息,给大家提供了在11gr2的rac是怎么利用asm 中的spfile启动asm的思路
asm spfile信息

[grid@rac1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Fri Dec 21 01:41:31 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> create pfile='/tmp/pfile' from spfile;

File created.

SQL> !more /tmp/pfile 
+ASM1.__oracle_base='/u01/app/gridbase'#ORACLE_BASE set from in memory value
+ASM2.asm_diskgroups='XIFENFEI'#Manual Mount
+ASM1.asm_diskgroups='XIFENFEI'#Manual Mount
*.asm_diskstring='/dev/oracleasm/disks/*'
*.asm_power_limit=1
*.diagnostic_dest='/u01/app/gridbase'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'

关闭集群(asm已关闭)

[root@rac1 ~]# crsctl stop crs 
[root@rac1 ~]# ps -ef|grep pmon
root      8768  6372  0 02:53 pts/1    00:00:00 grep pmon
[root@rac1 ~]# crsctl stat res
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

gpnptool命令获取asm disk信息

[root@rac1 ~]# gpnptool get -o-
<?xml version="1.0" encoding="UTF-8"?>
<gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" 
xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" 
xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" ProfileSequence="4" 
ClusterUId="885339054e904f1dbfa646b41d7a0edb" ClusterName="rac-cluster" PALocation="">
<gpnp:Network-Profile>
<gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth0" Use="public"/>
<gpnp:Network id="net2" IP="10.10.1.0" Adapter="eth1" Use="cluster_interconnect"/>
</gpnp:HostNetwork>
</gpnp:Network-Profile>
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
--重点关注信息(asm disk 信息)
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/oracleasm/disks/*" 
SPFile="+DATA/rac-cluster/asmparameterfile/registry.253.776955291"/>
<ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
<ds:SignedInfo>
<ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
<ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/>
<ds:Reference URI="">
<ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/>
<ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"> 
<InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/>
</ds:Transform></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
<ds:DigestValue>T2Q3r+5sER2Rp0VfeqzYh461f2s=</ds:DigestValue>
</ds:Reference>
</ds:SignedInfo>
<ds:SignatureValue>
LwcQEtlsPGfywzdYJrOqiTp4cZNFGB/S9Ts8OCvYOGf/Z8HDT2yN5p2nCuxArUfW+KzaPzPHHihpRVaTcAY31nJ2Rcf2vMqYp4e++shliQXC8mg
1oGxQGifkjZwA4pTTEK5MBmr4FTZnR3VArZjjVfJdsmOMfyH4YeSMU5HPjdA=
</ds:SignatureValue>
</ds:Signature>
</gpnp:GPnP-Profile>
Success.

Error CLSGPNP_NO_DAEMON getting profile.

获得asm spfile信息
通过kfed找磁盘中的kfdhdb.sp|ausize来获得asm spfile相关信息

[root@rac1 ~]# ls /dev/oracleasm/disks/
VOL1  VOL2  VOL3  VOL4
[root@rac1 ~]# kfed dev=/dev/oracleasm/disks/VOL1 op=READ | egrep "kfdhdb.sp|ausize"
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                       22 ; 0x0f4: 0x00000016
kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001
[root@rac1 ~]# kfed dev=/dev/oracleasm/disks/VOL2 op=READ | egrep "kfdhdb.sp|ausize"
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
[root@rac1 ~]# kfed dev=/dev/oracleasm/disks/VOL3 op=READ | egrep "kfdhdb.sp|ausize"
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
[root@rac1 ~]# kfed dev=/dev/oracleasm/disks/VOL4 op=READ | egrep "kfdhdb.sp|ausize"
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000

这里可以看出来asm spfile信息在磁盘VOL1中,spfile从第22个au开始,1个au(1M).

获得asm spfile 内容

[root@rac1 ~]# dd if=/dev/oracleasm/disks/VOL1 bs=1M skip=22 count=1 > /tmp/spfile
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 1.47474 seconds, 711 kB/s
[root@rac1 ~]# strings /tmp/spfile
+ASM1.__oracle_base='/u01/app/gridbase'#ORACLE_BASE set from in memory value
+ASM2.asm_diskgroups='XIFENFEI'#Manual Mount
+ASM1.asm_diskgroups='XIFENFEI'#Manual Mount
*.asm_diskstring='/dev/oracleasm/disks/*'
*.asm_power_limit=1
*.diagnostic_dest='/u01/app/gridbase'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'

通过对比发现,在asm实例未正常启动的情况下,也可以通过其他方面来获得asm spfile文件.本实验只是对于spfile在asm中位置的定位(大家去猜测11gr2的rac是怎么利用asm 中的spfile启动asm的思路),实际生产环境中请勿模仿,gpnptool命令有较大风险

发表在 Oracle ASM | 评论关闭