联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
接到同事请求,说客户的linux redhat 5.8平台部署的11.2.0.3 RAC 节点2挂掉了,报磁盘IO异常,数据库hang住
Fri Jun 14 12:01:22 2013 Thread 2 advanced to log sequence 369 (LGWR switch) Current log# 49 seq# 369 mem# 0: +DATA/q9db/onlinelog/group_49.861.817830099 Fri Jun 14 12:01:22 2013 Archived Log entry 89300 added for thread 2 sequence 368 ID 0x35324053 dest 1: Fri Jun 14 14:26:18 2013 Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_11788.trc: ORA-15025: could not open disk "/dev/mapper/q9datalun2" ORA-27041: unable to open file Linux-x86_64 Error: 13: Permission denied Additional information: 3 Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_11788.trc: ORA-15025: could not open disk "/dev/mapper/q9datalun2" ORA-27041: unable to open file Linux-x86_64 Error: 13: Permission denied Additional information: 3 WARNING: failed to read mirror side 1 of virtual extent 441 logical extent 0 of file 625 in group [2.3857217523] from disk DATA_0001 allocation unit 377890 reason error; if possible, will try another mirror side Fri Jun 14 14:31:17 2013 Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_13767.trc: ORA-15025: could not open disk "/dev/mapper/q9datalun2" ORA-27041: unable to open file Linux-x86_64 Error: 13: Permission denied Additional information: 3 Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_13767.trc: ORA-15025: could not open disk "/dev/mapper/q9datalun2" ORA-27041: unable to open file Linux-x86_64 Error: 13: Permission denied Additional information: 3 WARNING: failed to read mirror side 1 of virtual extent 441 logical extent 0 of file 625 in group [2.3857217523] from disk DATA_0001 allocation unit 377890 reason error; if possible, will try another mirror side
在12点钟数据库运行正常,无任何错误,突然到了14多出现ORA-15025/ORA-27041,并且重启ORACLE 数据库恢复正常。该错误很明显是数据库无权限访问ASM DISK,检查ASM实例日志
Thu Jun 13 19:01:21 2013 ASMB started with pid=25, OS id=25066 Thu Jun 13 19:01:22 2013 NOTE: client +ASM2:+ASM registered, osid 25068, mbr 0x0 WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD) Thu Jun 13 19:01:24 2013 WARNING: failed to online diskgroup resource ora.OCR_VOTE.dg (unable to communicate with CRSD/OHASD) Thu Jun 13 19:01:57 2013 NOTE: client q9db2:q9db registered, osid 25732, mbr 0x1 Thu Jun 13 19:02:31 2013 ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.8.8.33) (PORT=1521))))' SCOPE=MEMORY SID='+ASM2'; Fri Jun 14 14:53:09 2013 SQL> ALTER DISKGROUP OCR_VOTE DISMOUNT /* asm agent *//* {2:61929:97} */ Fri Jun 14 14:53:10 2013 SQL> ALTER DISKGROUP ARCH DISMOUNT /* asm agent *//* {2:61929:97} */ Fri Jun 14 14:53:10 2013 SQL> ALTER DISKGROUP DATA DISMOUNT /* asm agent *//* {2:61929:97} */
这里可以明显的看到,ASM实例在该时间点无任何错误,证明一切运行正常,查看系统日志,在该故障点,message中无任何记录,查看asm disk权限
[oracle@q9db02 trace]$ ll /dev/mapper/ total 0 crw------- 1 root root 10, 60 Jun 9 11:08 control brw-rw---- 1 grid asmdba 253, 15 Jun 14 16:20 q9datalun1 brw-rw---- 1 grid asmdba 253, 16 Jun 14 16:20 q9datalun2 brw-rw---- 1 grid asmdba 253, 17 Jun 14 16:20 q9datalun3 brw-rw---- 1 grid asmdba 253, 18 Jun 14 16:19 q9datalun4 brw-rw---- 1 grid asmdba 253, 19 Jun 14 16:20 q9datalun5 brw-rw---- 1 grid asmdba 253, 20 Jun 14 16:20 q9datalun6 brw-rw---- 1 grid asmdba 253, 21 Jun 14 16:19 q9datalun7 brw-rw---- 1 grid asmdba 253, 4 Jun 14 16:20 q9datalun8 brw-rw---- 1 grid asmdba 253, 5 Jun 14 16:20 q9votelun1
所有文件权限没有任何问题,和当初部署之时完全相同而且运行了一段时间都正常,部署之时权限
[oracle@q9db02 trace]$ more /etc/rc.local chown grid:asmdba /dev/mapper/q9votelun1 chmod 660 /dev/mapper/q9votelun1 chown grid:asmdba /dev/mapper/q9datalun1 chmod 660 /dev/mapper/q9datalun1 chown grid:asmdba /dev/mapper/q9datalun2 chmod 660 /dev/mapper/q9datalun2 chown grid:asmdba /dev/mapper/q9datalun3 chmod 660 /dev/mapper/q9datalun3 chown grid:asmdba /dev/mapper/q9datalun4 chmod 660 /dev/mapper/q9datalun4 chown grid:asmdba /dev/mapper/q9datalun5 chmod 660 /dev/mapper/q9datalun5 chown grid:asmdba /dev/mapper/q9datalun6 chmod 660 /dev/mapper/q9datalun6 chown grid:asmdba /dev/mapper/q9datalun7 chmod 660 /dev/mapper/q9datalun7 chown grid:asmdba /dev/mapper/q9datalun8 chmod 660 /dev/mapper/q9datalun8 chown grid:asmdba /dev/mapper/q9datalun8 chmod 660 /dev/mapper/q9datalun8
因为这里权限没有任何改变,而且asm disk权限正确,系统日志无任何日志,证明该问题不是因为ASM DISK权限改变导致,那我怀疑是人做了不该做的操作,比喻临时性修改了ASM DISK权限,然后有修改回来了,或者是不正常的用户操作了数据库,而这些操作更加可能是root用户操作,分析root用户操作记录
--history部分记录 803 su oracle 804 exit 805 cd /tmp 806 ls 807 cd sysbench/ 808 cd bin/ 809 ls 810 ORACLE_SID=q9db2 811 export ORACLE_BASE 812 export ORACLE_HOME 813 ./sysbench --test=oltp --oltp-table-name=sysbench --oltp-table-size=1 --oracle-db=Q9DB --oracle-user=sysbench --oracle-password=sysbench --db-driver=oracle prepare 814 syssql 815 sqlplus system/sysbench@q9db02 816 sqlplus system/q9db@q9db02 817 echo $ORACLE_HOME 818 cd $ORACLE_HOME/network/ 819 vi admin/tnsnames.ora 820 sqlplus system/NEWQ9DB 821 echo $ORACLE_HOME 822 vi ~/.bash_profile 823 echo $ORACLE_SID 824 ps -ef | grep smon 825 sqlplus system/NEWQ9DB 826 exit
这里很明显的看到,由于SA想使用sysbench做系统基线测试,使用了root用户登录数据库并进行了相关操作,从而出现了该问题,因为ASM DISK 所有者是grid:asmdba,权限是660,root用户无法对ASM DISK进行读写操作,从而出现了上述错误。让同事协助SA重现上述操作,果然出现完全相同的错误,而且退出root session,数据库恢复正常
Fri Jun 14 15:44:24 2013 Archived Log entry 89330 added for thread 2 sequence 389 ID 0x35324053 dest 1: Fri Jun 14 15:50:42 2013 Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc: ORA-15025: could not open disk "/dev/mapper/q9datalun2" ORA-27041: unable to open file Linux-x86_64 Error: 13: Permission denied Additional information: 3 Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc: ORA-15025: could not open disk "/dev/mapper/q9datalun2" ORA-27041: unable to open file Linux-x86_64 Error: 13: Permission denied Additional information: 3 WARNING: failed to read mirror side 1 of virtual extent 473 logical extent 0 of file 625 in group [2.3857045540] from disk DATA_0001 allocation unit 377894 reason error; if possible, will try another mirror side Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc: ORA-15025: could not open disk "/dev/mapper/q9datalun4" ORA-27041: unable to open file Linux-x86_64 Error: 13: Permission denied Additional information: 3 ORA-00604: error occurred at recursive SQL level 2 ORA-01115: IO error reading block from file (block # ) ORA-01110: data file 1: '+DATA/q9db/datafile/system.625.817825255' ORA-15081: failed to submit an I/O operation to a disk Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc: ORA-15025: could not open disk "/dev/mapper/q9datalun4" ORA-27041: unable to open file Linux-x86_64 Error: 13: Permission denied Additional information: 3 ORA-00604: error occurred at recursive SQL level 2 ORA-01115: IO error reading block from file (block # ) ORA-01110: data file 1: '+DATA/q9db/datafile/system.625.817825255' ORA-15081: failed to submit an I/O operation to a disk WARNING: failed to read mirror side 1 of virtual extent 652 logical extent 0 of file 625 in group [2.3857045540] from disk DATA_0003 allocation unit 377939 reason error; if possible, will try another mirror side Fri Jun 14 15:55:58 2013 Thread 2 advanced to log sequence 391 (LGWR switch) Current log# 41 seq# 391 mem# 0: +DATA/q9db/onlinelog/group_41.853.817830085 Fri Jun 14 15:55:58 2013 Archived Log entry 89331 added for thread 2 sequence 390 ID 0x35324053 dest 1: Thread 2 advanced to log sequence 392 (LGWR switch) Current log# 42 seq# 392 mem# 0: +DATA/q9db/onlinelog/group_42.854.817830087
在ASM ORACLE RAC环境中,使用root操作oracle 数据库导致该错误,强烈建议:操作oracle数据库,请使用oracle数据库安装用户(最少也是同一个所属组用户)运行,超级用户root对于oracle来说也不是万能的