联系:手机/微信(+86 17813235971) QQ(107644445)
标题:私网直连后遗症:一节点无法启动导致另外节点haip无法启动
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
该案例为两节点rac(11.2.0.4),private 网络使用直连方式,其中一个节点主机异常无法启动,另外一个节点集群启动发现haip无法正常启动
# crsctl stat res -t -init -------------------------------------------------------------------------------- NAME TARGET STATE SERVER STATE_DETAILS -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.asm 1 ONLINE ONLINE xifenfei1 Started ora.cluster_interconnect.haip >>>> OFFLINE 1 ONLINE OFFLINE ora.crf 1 ONLINE ONLINE xifenfei1 ora.crsd 1 ONLINE OFFLINE >>>> OFFLINE ora.cssd 1 ONLINE ONLINE xifenfei1 ora.cssdmonitor 1 ONLINE ONLINE xifenfei1 ora.ctssd 1 ONLINE ONLINE xifenfei1 OBSERVER ora.diskmon 1 OFFLINE OFFLINE ora.drivers.acfs 1 ONLINE ONLINE xifenfei1 ora.evmd 1 ONLINE INTERMEDIATE xifenfei1 ora.gipcd 1 ONLINE ONLINE xifenfei1 ora.gpnpd 1 ONLINE ONLINE xifenfei1 ora.mdnsd 1 ONLINE ONLINE xifenfei1
alerthostname日志
2018-09-02 10:38:56.767: [/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log. 2018-09-02 10:39:00.771: [ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log. 2018-09-02 10:40:00.802: [/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log. 2018-09-02 10:40:04.806: [ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.
orarootagent_root日志
2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} No HAIP info configured in GPNP, using defaults 2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} The final CIDR subnet 169.254/16 2018-09-02 10:37:56.805: [ default][3650455296]clsvactversion:4: Retrieving Active Version from local storage. 2018-09-02 10:37:56.809: [ USRTHRD][3650455296]{0:0:2} HAIP: mbr num is 0. [ CLWAL][3650455296]clsw_Initialize: OLR initlevel [70000] 2018-09-02 10:37:56.843: [ USRTHRD][3650455296]{0:0:2} HAIP: initializing to 1 interfaces 2018-09-02 10:37:56.844: [ USRTHRD][3650455296]{0:0:2} HAIP: configured to use 1 interfaces
gipcd.log日志
2018-09-02 10:38:56.787: [ CLSINET][2477147904] Returning NETDATA: 0 interfaces 2018-09-02 10:38:56.988: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface list back to client 2018-09-02 10:38:56.822: [GIPCHDEM][2468742912] gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x1369730 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'gipcd_ha_name', luid '184dd356-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd 2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceRequest, endp 00000000000002cb 2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: Received type(gipcdmsgtypeInterfaceRequest), endp(00000000000002cb), len(1032), buf(0x7fab858b7a78):[hostname(xifenfei1), retStatus(gipcretSuccess)] 2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceQueryToMonitor: enqueue local interface query (2) to worklist 2018-09-02 10:38:56.823: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface query 2018-09-02 10:38:56.823: [GIPCDMON][2472945408] gipcdMonitorCheckXfer: set new infQuery 2018-09-02 10:38:56.831: [ GIPCLIB][2477147904] gipclibSetTraceLevel: to set level to 0
ohasd.log日志
2018-09-02 10:38:52.494: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd 2018-09-02 10:38:57.255: [ AGFW][3305629440]{0:0:2} Received the reply to the message: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:502 from the agent /u01/app/11.2.0/grid/bin/orarootagent_root 2018-09-02 10:38:57.255: [ AGFW][3305629440]{0:0:2} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:500 2018-09-02 10:38:57.255: [ CRSPE][3295123200]{0:0:2} Received reply to action [Start] message ID: 500 2018-09-02 10:38:57.256: [ CRSPE][3295123200]{0:0:2} Got agent-specific msg: CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error: Start action for HAIP aborted. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log". 2018-09-02 10:38:57.500: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
检查私网状态,发现eth2网络链路状态为down,由于网络直连,而另外一台机器无法启动
[root@xifenfei1 rules.d]# ethtool eth1 Settings for eth1: Supported ports: [ TP ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Supported pause frame use: Symmetric Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Advertised pause frame use: Symmetric Advertised auto-negotiation: Yes Speed: Unknown! Duplex: Unknown! (255) Port: Twisted Pair PHYAD: 1 Transceiver: internal Auto-negotiation: on MDI-X: Unknown Supports Wake-on: d Wake-on: d Current message level: 0x00000007 (7) drv probe link Link detected: no ====>网卡链路状态异常 [root@xifenfei1 rules.d]# ifconfig eth0 Link encap:Ethernet HWaddr 6C:92:BF:2B:7B:36 inet addr:10.10.17.42 Bcast:172.17.17.255 Mask:255.255.255.0 inet6 addr: fe80::6e92:bfff:fe2b:7b36/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 --------->注意 RX packets:234424 errors:0 dropped:0 overruns:0 frame:0 TX packets:160916 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:16926236 (16.1 MiB) TX bytes:24269882 (23.1 MiB) Memory:91160000-91180000 eth1 Link encap:Ethernet HWaddr 6C:92:BF:2B:7B:37 inet addr:11.1.1.2 Bcast:11.1.1.255 Mask:255.255.255.0 UP BROADCAST MULTICAST MTU:1500 Metric:1 --------->注意少了RUNNING RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) Memory:91140000-91160000
关于网卡链路异常导致haip无法启动的mos描述请参考:CRSD & HAIP Resources Remain In OFFLINE as Private Network Interface is Partially Up (Doc ID 1529721.1).该案例是11.2集群私网使用直连引起的直接后遗症(非常不建议集群私网使用直连方式)