root 用户操作 ORACLE 数据库导致悲剧

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:root 用户操作 ORACLE 数据库导致悲剧

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到同事请求,说客户的linux redhat 5.8平台部署的11.2.0.3 RAC 节点2挂掉了,报磁盘IO异常,数据库hang住

Fri Jun 14 12:01:22 2013
Thread 2 advanced to log sequence 369 (LGWR switch)
  Current log# 49 seq# 369 mem# 0: +DATA/q9db/onlinelog/group_49.861.817830099
Fri Jun 14 12:01:22 2013
Archived Log entry 89300 added for thread 2 sequence 368 ID 0x35324053 dest 1:
Fri Jun 14 14:26:18 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_11788.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_11788.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
WARNING: failed to read mirror side 1 of virtual extent 441 logical extent 0 of file 625 
  in group [2.3857217523] from disk DATA_0001
  allocation unit 377890 reason error; if possible, will try another mirror side
Fri Jun 14 14:31:17 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_13767.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_13767.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
WARNING: failed to read mirror side 1 of virtual extent 441 logical extent 0 of file 625 
  in group [2.3857217523] from disk DATA_0001
  allocation unit 377890 reason error; if possible, will try another mirror side

在12点钟数据库运行正常,无任何错误,突然到了14多出现ORA-15025/ORA-27041,并且重启ORACLE 数据库恢复正常。该错误很明显是数据库无权限访问ASM DISK,检查ASM实例日志

Thu Jun 13 19:01:21 2013
ASMB started with pid=25, OS id=25066 
Thu Jun 13 19:01:22 2013
NOTE: client +ASM2:+ASM registered, osid 25068, mbr 0x0
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
Thu Jun 13 19:01:24 2013
WARNING: failed to online diskgroup resource ora.OCR_VOTE.dg (unable to communicate with CRSD/OHASD)
Thu Jun 13 19:01:57 2013
NOTE: client q9db2:q9db registered, osid 25732, mbr 0x1
Thu Jun 13 19:02:31 2013
ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.8.8.33)
 (PORT=1521))))' SCOPE=MEMORY SID='+ASM2';
Fri Jun 14 14:53:09 2013
SQL> ALTER DISKGROUP OCR_VOTE DISMOUNT  /* asm agent *//* {2:61929:97} */ 
Fri Jun 14 14:53:10 2013
SQL> ALTER DISKGROUP ARCH DISMOUNT  /* asm agent *//* {2:61929:97} */ 
Fri Jun 14 14:53:10 2013
SQL> ALTER DISKGROUP DATA DISMOUNT  /* asm agent *//* {2:61929:97} */ 

这里可以明显的看到,ASM实例在该时间点无任何错误,证明一切运行正常,查看系统日志,在该故障点,message中无任何记录,查看asm disk权限

[oracle@q9db02 trace]$ ll /dev/mapper/
total 0
crw------- 1 root root    10, 60 Jun  9 11:08 control
brw-rw---- 1 grid asmdba 253, 15 Jun 14 16:20 q9datalun1
brw-rw---- 1 grid asmdba 253, 16 Jun 14 16:20 q9datalun2
brw-rw---- 1 grid asmdba 253, 17 Jun 14 16:20 q9datalun3
brw-rw---- 1 grid asmdba 253, 18 Jun 14 16:19 q9datalun4
brw-rw---- 1 grid asmdba 253, 19 Jun 14 16:20 q9datalun5
brw-rw---- 1 grid asmdba 253, 20 Jun 14 16:20 q9datalun6
brw-rw---- 1 grid asmdba 253, 21 Jun 14 16:19 q9datalun7
brw-rw---- 1 grid asmdba 253,  4 Jun 14 16:20 q9datalun8
brw-rw---- 1 grid asmdba 253,  5 Jun 14 16:20 q9votelun1

所有文件权限没有任何问题,和当初部署之时完全相同而且运行了一段时间都正常,部署之时权限

[oracle@q9db02 trace]$ more /etc/rc.local
chown grid:asmdba /dev/mapper/q9votelun1
chmod 660 /dev/mapper/q9votelun1
chown grid:asmdba /dev/mapper/q9datalun1
chmod 660 /dev/mapper/q9datalun1
chown grid:asmdba /dev/mapper/q9datalun2
chmod 660 /dev/mapper/q9datalun2
chown grid:asmdba /dev/mapper/q9datalun3
chmod 660 /dev/mapper/q9datalun3
chown grid:asmdba /dev/mapper/q9datalun4
chmod 660 /dev/mapper/q9datalun4
chown grid:asmdba /dev/mapper/q9datalun5
chmod 660 /dev/mapper/q9datalun5
chown grid:asmdba /dev/mapper/q9datalun6
chmod 660 /dev/mapper/q9datalun6
chown grid:asmdba /dev/mapper/q9datalun7
chmod 660 /dev/mapper/q9datalun7
chown grid:asmdba /dev/mapper/q9datalun8
chmod 660 /dev/mapper/q9datalun8
chown grid:asmdba /dev/mapper/q9datalun8
chmod 660 /dev/mapper/q9datalun8

因为这里权限没有任何改变,而且asm disk权限正确,系统日志无任何日志,证明该问题不是因为ASM DISK权限改变导致,那我怀疑是人做了不该做的操作,比喻临时性修改了ASM DISK权限,然后有修改回来了,或者是不正常的用户操作了数据库,而这些操作更加可能是root用户操作,分析root用户操作记录

--history部分记录
  803  su  oracle
  804  exit
  805  cd /tmp
  806  ls
  807  cd sysbench/
  808  cd bin/
  809  ls
  810  ORACLE_SID=q9db2
  811  export ORACLE_BASE
  812  export ORACLE_HOME
  813  ./sysbench --test=oltp --oltp-table-name=sysbench --oltp-table-size=1 --oracle-db=Q9DB 
       --oracle-user=sysbench --oracle-password=sysbench --db-driver=oracle  prepare
  814  syssql
  815  sqlplus system/sysbench@q9db02
  816  sqlplus system/q9db@q9db02
  817  echo $ORACLE_HOME
  818  cd $ORACLE_HOME/network/
  819  vi admin/tnsnames.ora 
  820  sqlplus system/NEWQ9DB
  821   echo $ORACLE_HOME
  822  vi ~/.bash_profile 
  823   echo $ORACLE_SID
  824  ps -ef | grep smon
  825  sqlplus system/NEWQ9DB
  826  exit

这里很明显的看到,由于SA想使用sysbench做系统基线测试,使用了root用户登录数据库并进行了相关操作,从而出现了该问题,因为ASM DISK 所有者是grid:asmdba,权限是660,root用户无法对ASM DISK进行读写操作,从而出现了上述错误。让同事协助SA重现上述操作,果然出现完全相同的错误,而且退出root session,数据库恢复正常

Fri Jun 14 15:44:24 2013
Archived Log entry 89330 added for thread 2 sequence 389 ID 0x35324053 dest 1:
Fri Jun 14 15:50:42 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
WARNING: failed to read mirror side 1 of virtual extent 473 logical extent 0 of file 625 
  in group [2.3857045540] from disk DATA_0001
  allocation unit 377894 reason error; if possible, will try another mirror side
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun4"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-00604: error occurred at recursive SQL level 2
ORA-01115: IO error reading block from file  (block # )
ORA-01110: data file 1: '+DATA/q9db/datafile/system.625.817825255'
ORA-15081: failed to submit an I/O operation to a disk
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun4"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-00604: error occurred at recursive SQL level 2
ORA-01115: IO error reading block from file  (block # )
ORA-01110: data file 1: '+DATA/q9db/datafile/system.625.817825255'
ORA-15081: failed to submit an I/O operation to a disk
WARNING: failed to read mirror side 1 of virtual extent 652 logical extent 0 of file 625 
  in group [2.3857045540] from disk DATA_0003
  allocation unit 377939 reason error; if possible, will try another mirror side
Fri Jun 14 15:55:58 2013
Thread 2 advanced to log sequence 391 (LGWR switch)
  Current log# 41 seq# 391 mem# 0: +DATA/q9db/onlinelog/group_41.853.817830085
Fri Jun 14 15:55:58 2013
Archived Log entry 89331 added for thread 2 sequence 390 ID 0x35324053 dest 1:
Thread 2 advanced to log sequence 392 (LGWR switch)
  Current log# 42 seq# 392 mem# 0: +DATA/q9db/onlinelog/group_42.854.817830087

在ASM ORACLE RAC环境中,使用root操作oracle 数据库导致该错误,强烈建议:操作oracle数据库,请使用oracle数据库安装用户(最少也是同一个所属组用户)运行,超级用户root对于oracle来说也不是万能的

此条目发表在 Oracle 分类目录。将固定链接加入收藏夹。

评论功能已关闭。