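init.cssd startcheck: CRS fails to start because HP Service Guard is not running
This morning I went to a customer site. The customer told me that after the OCR and VOTEDISK had been replaced in one environment, CRS would no longer start, and asked me to take a look. Environment: HP RAC (only one node in use) + Oracle Database 10.2.0.5.
crsctl start crs reports nothing wrong, but CRS does not actually start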
# /app/oracle/product/10.2.0/crs/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
# ps -ef|grep crs
root 6461 1 0 May 19 ? 0:00 /bin/sh /sbin/init.d/init.crsd run
root 29719 23678 0 10:04:51 pts/tc 0:00 grep crs
No new log entries are written either (the most recent files predate the current date)
[xifenfei01][orawj][/root/xifenfei]#ls -ltr
total 148
drwxr-x--- 2 oracle dba 96 May 15 2014 admin
drwxr-x--- 2 root dba 96 May 15 2014 crsd
drwxr-x--- 2 oracle dba 96 May 15 2014 evmd
drwxrwxr-t 5 oracle dba 1024 Jun 4 2014 racg
drwxr-x--- 5 oracle dba 1024 May 17 22:50 cssd
-rw-rw-r-- 1 root dba 61568 May 24 15:26 alertxifenfei01.log
drwxr-x--- 2 oracle dba 3072 May 24 15:43 client
[xifenfei01][orawj][/root/xifenfei]#date
Mon, May 25, 2015 11:30:09 AM
Voting disk and OCR information
[xifenfei01][orawj][/root/xifenfei]#ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 1441492
Used space (kbytes) : 5972
Available space (kbytes) : 1435520
ID : 1714667730
Device/File Name : /dev/vgc01/rCMPR_VGC01_OCR1
Device/File integrity check succeeded
Device/File Name : /dev/vgc02/rCMPR_VGC02_OCR2
Device/File integrity check succeeded
Cluster registry integrity check succeeded
[xifenfei01][orawj][/root/xifenfei]#crsctl query css votedisk
0. 0 /dev/vgc01/rCMPR_VGC01_VOTE1
1. 0 /dev/vgc02/rCMPR_VGC02_VOTE2
2. 0 /dev/vgc03/rCMPR_VGC03_VOTE3
located 3 votedisk(s).
Paths recorded in the ocr.loc file
# more /var/opt/oracle/ocr.loc
#Device/file /dev/vgc02/rCMPR_VGC02_OCR2 getting replaced by device /dev/vgc02/rCMPR_VGC02_OCR2
ocrconfig_loc=/dev/vgc01/rCMPR_VGC01_OCR1
ocrmirrorconfig_loc=/dev/vgc02/rCMPR_VGC02_OCR2
local_only=false
This shows that the voting disk and OCR configuration is normal.
The process list shows init.cssd startcheck processes
[xifenfei01][orawj][/root/xifenfei]#ps -ef|grep init
root 1 0 0 May 19 ? 0:03 init
root 119 0 0 May 19 ? 0:00 pagetable_init_daemon
root 115 0 0 May 19 ? 0:00 mdep_initiator_thread
root 26820 26792 0 10:49:53 ? 0:00 /bin/sh /sbin/init.d/init.cssd startcheck
root 26791 1 0 10:49:53 ? 0:00 /bin/sh /sbin/init.d/init.crsd run
root 27183 23698 0 10:50:23 ? 0:00 /bin/sh /sbin/init.d/init.cssd startcheck
root 26792 1 0 10:49:53 ? 0:00 /bin/sh /sbin/init.d/init.cssd fatal
root 23698 1 0 10:45:23 ? 0:00 /bin/sh /sbin/init.d/init.evmd run
root 26816 26791 0 10:49:53 ? 0:00 /bin/sh /sbin/init.d/init.cssd startcheck
oracle 20534 11033 0 11:30:35 pts/ta 0:00 grep init
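In most cases, init.cssd startcheck processes that linger like this mean that either the shared storage cannot be accessed or the third-party clusterware is unavailable.
Check the VG status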
VG Name /dev/vgc01
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 9
Open LV 9
Max PV 255
Cur PV 1
Act PV 1
Max PE per PV 3200
VGDA 2
PE Size (Mbytes) 32
Total PE 3199
Alloc PE 736
Free PE 2463
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0
VG Version 1.0
VG Max Size 25500g
VG Max Extents 816000

VG Name /dev/vgc02
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 9
Open LV 9
Max PV 255
Cur PV 1
Act PV 1
Max PE per PV 3200
VGDA 2
PE Size (Mbytes) 32
Total PE 3199
Alloc PE 736
Free PE 2463
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0
VG Version 1.0
VG Max Size 25500g
VG Max Extents 816000

VG Name /dev/vgc03
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 6
Open LV 6
Max PV 255
Cur PV 1
Act PV 1
Max PE per PV 3200
VGDA 2
PE Size (Mbytes) 32
Total PE 3199
Alloc PE 448
Free PE 2751
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0
VG Version 1.0
VG Max Size 25500g
VG Max Extents 816000
As shown here, all three VGs that hold the voting disks and OCR are in the available state.
Check the permissions on the votedisk and OCR devices
# ls -l /dev/vgc0*/rCMPR*|grep -v .dbf|grep -v .log|grep -v .ctl
crw-r----- 1 oracle dba 64 0x020008 May 24 14:40 /dev/vgc01/rCMPR_VGC01_OCR1
crw-r----- 1 oracle dba 64 0x020009 May 24 14:41 /dev/vgc01/rCMPR_VGC01_VOTE1
crw-r----- 1 oracle dba 64 0x030008 May 24 14:41 /dev/vgc02/rCMPR_VGC02_OCR2
crw-r----- 1 oracle dba 64 0x030009 May 24 14:41 /dev/vgc02/rCMPR_VGC02_VOTE2
crw-r----- 1 oracle dba 64 0x040006 May 24 14:41 /dev/vgc03/rCMPR_VGC03_VOTE3
Change the permissions to 777 and try again
# chmod 777 /dev/vgc0*/rCMPR*|grep -v .dbf|grep -v .log|grep -v .ctl
# ls -l /dev/vgc0*/rCMPR*|grep -v .dbf|grep -v .log|grep -v .ctl
crwxrwxrwx 1 oracle dba 64 0x020008 May 24 14:40 /dev/vgc01/rCMPR_VGC01_OCR1
crwxrwxrwx 1 oracle dba 64 0x020009 May 24 14:41 /dev/vgc01/rCMPR_VGC01_VOTE1
crwxrwxrwx 1 oracle dba 64 0x030008 May 24 14:41 /dev/vgc02/rCMPR_VGC02_OCR2
crwxrwxrwx 1 oracle dba 64 0x030009 May 24 14:41 /dev/vgc02/rCMPR_VGC02_VOTE2
crwxrwxrwx 1 oracle dba 64 0x040006 May 24 14:41 /dev/vgc03/rCMPR_VGC03_VOTE3
Kill the related processes and retry
# ps -ef|grep init
root 1 0 0 May 19 ? 0:03 init
root 119 0 0 May 19 ? 0:00 pagetable_init_daemon
root 115 0 0 May 19 ? 0:00 mdep_initiator_thread
root 6458 1 0 May 19 ? 0:00 /bin/sh /sbin/init.d/init.evmd run
root 20975 1 0 10:40:11 ? 0:00 /bin/sh /sbin/init.d/init.crsd run
root 20976 1 0 10:40:11 ? 0:00 /bin/sh /sbin/init.d/init.cssd fatal
root 21006 20976 0 10:40:11 ? 0:00 /bin/sh /sbin/init.d/init.cssd startcheck
root 20997 20975 0 10:40:11 ? 0:00 /bin/sh /sbin/init.d/init.cssd startcheck
root 21152 23678 0 10:40:18 pts/tc 0:00 grep init
vi /etc/inittab
#h1:3:respawn:/sbin/init.d/init.evmd run >/dev/null 2>&1 </dev/null
#h2:3:respawn:/sbin/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
#h3:3:respawn:/sbin/init.d/init.crsd run >/dev/null 2>&1 </dev/null
# /sbin/init q
# ps -ef|grep init.c | grep -v grep | awk '{print $2}' |xargs kill -9
# ps -ef|grep init
root 1 0 0 May 19 ? 0:03 init
root 119 0 0 May 19 ? 0:00 pagetable_init_daemon
root 115 0 0 May 19 ? 0:00 mdep_initiator_thread
root 21744 23678 1 10:42:31 pts/tc 0:00 grep init
Re-enable the inittab entries so that init respawns the processes
vi /etc/inittab
h1:3:respawn:/sbin/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:3:respawn:/sbin/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:3:respawn:/sbin/init.d/init.crsd run >/dev/null 2>&1 </dev/null
~
# /sbin/init q
# ps -ef|grep init
root 1 0 0 May 19 ? 0:03 init
root 119 0 0 May 19 ? 0:00 pagetable_init_daemon
root 115 0 0 May 19 ? 0:00 mdep_initiator_thread
root 23737 23706 0 10:45:23 ? 0:00 /bin/sh /sbin/init.d/init.cssd startcheck
root 23731 23698 0 10:45:23 ? 0:00 /bin/sh /sbin/init.d/init.cssd startcheck
root 23706 1 0 10:45:23 ? 0:00 /bin/sh /sbin/init.d/init.crsd run
root 23698 1 0 10:45:23 ? 0:00 /bin/sh /sbin/init.d/init.evmd run
root 23887 23678 1 10:45:28 pts/tc 0:00 grep init
root 23746 23700 0 10:45:23 ? 0:00 /bin/sh /sbin/init.d/init.cssd startcheck
root 23700 1 0 10:45:23 ? 0:00 /bin/sh /sbin/init.d/init.cssd fatal
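The problem persists after changing the LV permissions, so it is not caused by the permissions or ownership of the votedisk and OCR devices. Reading the devices with dd and strings also worked without errors.
Debug the /sbin/init.d/init.cssd startcheck process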
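The original session does not show the dd commands that were used. A minimal sketch of such a read test, assuming a hypothetical helper named can_read_device and an arbitrary block size and count, could look like this:

```shell
#!/bin/sh
# Hypothetical helper (not from the original session): read the first blocks
# of a device or file with dd and report whether the read succeeded.
# bs=8192 and count=16 are arbitrary assumptions.
can_read_device() {
    dev=$1
    if dd if="$dev" of=/dev/null bs=8192 count=16 2>/dev/null; then
        echo "readable: $dev"
    else
        echo "NOT readable: $dev"
        return 1
    fi
}

# Example against the devices listed earlier (run as root or oracle):
#   for d in /dev/vgc01/rCMPR_VGC01_OCR1 /dev/vgc01/rCMPR_VGC01_VOTE1; do
#       can_read_device "$d"
#   done
# To eyeball the contents, pipe a small read through strings:
#   dd if=/dev/vgc01/rCMPR_VGC01_OCR1 bs=8192 count=16 2>/dev/null | strings | head
```

A successful read only proves that the device can be opened and read by the current user. It says nothing about whether the clusterware prerequisites are met, which is why the investigation continues.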
[xifenfei01][orawj][/root/xifenfei]#sh -x /sbin/init.d/init.cssd startcheck
+ ORA_CRS_HOME=/app/oracle/product/10.2.0/crs
+ ORACLE_USER=oracle
+ ORACLE_HOME=/app/oracle/product/10.2.0/crs
+ export ORACLE_HOME
+ export ORA_CRS_HOME
+ export ORACLE_USER
+ DISABLE_OPROCD=false
+ OPROCD_DEFAULT_TIMEOUT=1000
+ OPROCD_DEFAULT_MARGIN=500
+ OPROCD_CHECK_TIMEOUT=2000
+ OPROCD_STOP_TIMEOUT=2000
+ OPROCD_DEFAULT_HISTORGRAM=
+ HOSTN=/bin/hostname
+ EXPRN=/usr/bin/expr
+ CUT=/usr/bin/cut
+ AWK=/bin/awk
+ ECHO=echo
+ TR=/bin/tr
+ /bin/uname
+ [ SunOS = HP-UX ]
+ /bin/uname
+ [ Linux = HP-UX ]
+ + /bin/hostname
HOST=xifenfei01
+ + /usr/bin/expr xifenfei01 : .*
len1=8
+ + /usr/bin/expr match xifenfei01 [0-9]*\.[0-9]*\.[0-9]*\.[0-9]*
len2=0
+ [ 8 != 0 ]
+ + echo xifenfei01
+ /usr/bin/cut -d. -f1
HOST=xifenfei01
+ + echo xifenfei01
+ /bin/tr [:upper:] [:lower:]
HOST=xifenfei01
+ PS=/bin/ps
+ PSE=/bin/ps -e
+ PSEF=/bin/ps -ef
+ HEAD=/bin/head
+ GREP=/bin/grep
+ KILL=/bin/kill
+ KILLTERM=/bin/kill -TERM
+ KILLDIE=/bin/kill -9
+ KILLCHECK=/bin/kill -0 5852
+ SLEEP=/bin/sleep
+ NULL=/dev/null
+ UNAME=/bin/uname
+ CAT=/bin/cat
………………
+ eval /bin/true
+ /bin/true
+ [ 0 != 0 ]
+ eval /bin/ps -ef | /bin/grep '/usr/lbin/cm[g]msd' 1>/dev/null 2>/dev/null
+ /bin/grep /usr/lbin/cm[g]msd
+ /bin/ps -ef
+ 1> /dev/null 2> /dev/null
+ RC=1
+ [ 1 -ne 0 ]
+ /bin/logger -puser.err Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
+ /bin/sleep 60
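Running the shell script with -x shows that CRS is waiting for HP-UX Service Guard to start: the script greps the process list for /usr/lbin/cm[g]msd, finds nothing (RC=1), logs the message, and sleeps for 60 seconds. This confirms that the cause is that HP-UX Service Guard is not running.
Check whether HP-UX Service Guard is running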
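The check in the trace can be reproduced by hand. The sketch below wraps the same grep pattern in a function that reads a process listing from stdin. The function name sg_daemon_running is hypothetical and is not part of init.cssd.

```shell
#!/bin/sh
# Hypothetical helper (not part of init.cssd): succeeds if the process listing
# on stdin contains the Service Guard daemon path seen in the trace.
sg_daemon_running() {
    # The [g] bracket keeps the pattern from matching the grep command itself
    # when the listing comes from a live "ps -ef".
    grep '/usr/lbin/cm[g]msd' >/dev/null 2>&1
}

# Live usage on the affected host would be:
#   ps -ef | sg_daemon_running || echo "Service Guard daemon not found"
```

Reading from stdin rather than calling ps inside the function is a design choice that lets the check be tried against saved ps output from another machine.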
[xifenfei01][orawj][/root/xifenfei]#cmviewcl
CLUSTER          STATUS
crmdb_b_cluster  down
NODE        STATUS  STATE
xifenfei01  down    unknown
crmdbb02    down    unknown
UNOWNED_PACKAGES
PACKAGE  STATUS  STATE   AUTO_RUN  NODE
pkg1     down    halted  enabled   unowned
pkg2     down    halted  enabled   unowned
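From this output, together with the customer's description (only one node was started, and the VGs on the other node were not activated), the cause is clear: because only one node was in use, the VGs were activated directly without starting Service Guard, and with Service Guard down CRS could not start.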
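For a quick scripted check of the same condition, the cluster status can be pulled out of cmviewcl-style text. This is a rough sketch under the assumption that the cluster line directly follows the "CLUSTER STATUS" header, as in the output shown in this post. The function name cluster_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read cmviewcl-style text on stdin and print the status
# column of the first non-empty line after the "CLUSTER STATUS" header.
cluster_status() {
    awk '
        $1 == "CLUSTER" && $2 == "STATUS" { want = 1; next }
        want && NF >= 2 { print $2; exit }
    '
}

# Live usage on a Service Guard node would be:
#   cmviewcl | cluster_status
```

Fed the cmviewcl output from this case, the function prints down, which matches the state that kept init.cssd startcheck waiting.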