linux – mount.ocfs2: Transport endpoint is not connected while mounting...?
I have replaced a dead node in a DRBD dual-primary setup that runs OCFS2. Every step worked:

/proc/drbd

    version: 8.3.13 (api:88/proto:86-96)
    GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by mockbuild@builder10.centos.org, 2012-05-07 11:56:36
     1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
        ns:81 nr:407832 dw:106657970 dr:266340 al:179 bm:6551 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

until I tried to mount the volume:

    mount -t ocfs2 /dev/drbd1 /data/webroot/
    mount.ocfs2: Transport endpoint is not connected while mounting /dev/drbd1 on /data/webroot/. Check 'dmesg' for more information on this error.

/var/log/kern.log

    kernel: (o2net,11427,1):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
    kernel: (mount.ocfs2,12037,1):dlm_request_join:1036 ERROR: status = -107
    kernel: (mount.ocfs2,1):dlm_try_to_join_domain:1210 ERROR: status = -107
    kernel: (mount.ocfs2,1):dlm_join_domain:1488 ERROR: status = -107
    kernel: (mount.ocfs2,1):dlm_register_domain:1754 ERROR: status = -107
    kernel: (mount.ocfs2,1):ocfs2_dlm_init:2808 ERROR: status = -107
    kernel: (mount.ocfs2,1):ocfs2_mount_volume:1447 ERROR: status = -107
    kernel: ocfs2: Unmounting device (147,1) on (node 1)

Here is the kernel log on node 0 (192.168.3.145):

    kernel: : (swapper,7):o2net_listen_data_ready:1894 bytes: 0
    kernel: : (o2net,4024,3):o2net_accept_one:1800 attempt to connect from unknown node at 192.168.2.93 :43868
    kernel: : (o2net,3):o2net_connect_expired:1664 ERROR: no connection established with node 1 after 30.0 seconds, giving up and returning errors.
    kernel: : (o2net,3):o2net_set_nn_state:478 node 1 sc: 0000000000000000 -> 0000000000000000, valid 0 -> 0, err 0 -> -107

I'm sure that /etc/ocfs2/cluster.conf is identical on both nodes:

/etc/ocfs2/cluster.conf

    node:
        ip_port = 7777
        ip_address = 192.168.3.145
        number = 0
        name = SVR233NTC-3145.localdomain
        cluster = cpc

    node:
        ip_port = 7777
        ip_address = 192.168.2.93
        number = 1
        name = SVR022-293.localdomain
        cluster = cpc

    cluster:
        node_count = 2
        name = cpc

The nodes can reach each other on the cluster port:

    # nc -z 192.168.3.145 7777
    Connection to 192.168.3.145 7777 port [tcp/cbt] succeeded!

but the O2CB heartbeat is not active on the new node (192.168.2.93):

    /etc/init.d/o2cb status
    Driver for "configfs": Loaded
    Filesystem "configfs": Mounted
    Driver for "ocfs2_dlmfs": Loaded
    Filesystem "ocfs2_dlmfs": Mounted
    Checking O2CB cluster cpc: Online
    Heartbeat dead threshold = 31
      Network idle timeout: 30000
      Network keepalive delay: 2000
      Network reconnect delay: 2000
    Checking O2CB heartbeat: Not active

Here is the result of running tcpdump on node 1 while starting ocfs2 on node 1:

    1 0.000000 192.168.2.93 -> 192.168.3.145 TCP 70 55274 > cbt [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSval=690432180 TSecr=0
    2 0.000008 192.168.3.145 -> 192.168.2.93 TCP 70 cbt > 55274 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSval=707657223 TSecr=690432180
    3 0.000223 192.168.2.93 -> 192.168.3.145 TCP 66 55274 > cbt [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSval=690432181 TSecr=707657223
    4 0.000286 192.168.2.93 -> 192.168.3.145 TCP 98 55274 > cbt [PSH, ACK] Seq=1 Ack=1 Win=5840 Len=32 TSval=690432181 TSecr=707657223
    5 0.000292 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181
    6 0.000324 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [RST, ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181

The pattern repeats: node 0 completes the handshake, acknowledges the first 32 bytes of data, and then resets the connection, so every sixth packet is an RST.

What else can I do to debug this?

PS:

OCFS2 version on node 0:

    ocfs2-tools-1.4.4-1.el5

OCFS2 version on node 1:

    ocfs2-tools-1.4.4-1.el5

Update 1 – Sun Dec 23 18:15:07 ICT 2012
No, they are two VMware servers on different subnets.
Of course, I added the hostname and IP address of each node to /etc/hosts:

    192.168.2.93 SVR022-293.localdomain
    192.168.3.145 SVR233NTC-3145.localdomain

and the nodes can connect to each other by hostname:

    # nc -z SVR022-293.localdomain 7777
    Connection to SVR022-293.localdomain 7777 port [tcp/cbt] succeeded!
    # nc -z SVR233NTC-3145.localdomain 7777
    Connection to SVR233NTC-3145.localdomain 7777 port [tcp/cbt] succeeded!

Update 2 – Mon Dec 24 18:32:15 ICT 2012

I found a clue: my colleague edited /etc/ocfs2/cluster.conf by hand while the cluster was running. As a result, the running cluster still holds the dead node's entry (SVR150-4107.localdomain in this case) under /sys/kernel/config/cluster/, and the new node was never registered there:

    # ls -l /sys/kernel/config/cluster/cpc/node/
    total 0
    drwxr-xr-x 2 root root 0 Dec 24 18:21 SVR150-4107.localdomain
    drwxr-xr-x 2 root root 0 Dec 24 18:21 SVR233NTC-3145.localdomain

I tried to stop the cluster so that I could remove the dead node, but got the following error:

    # /etc/init.d/o2cb stop
    Stopping O2CB cluster cpc: Failed
    Unable to stop cluster as heartbeat region still active

I'm sure the ocfs2 service is already stopped:

    # mounted.ocfs2 -f
    Device      FS     Nodes
    /dev/sdb    ocfs2  Not mounted
    /dev/drbd1  ocfs2  Not mounted

and there are no references left on the heartbeat region:

    # ocfs2_hb_ctl -I -u 12963EAF4E16484DB81ECB0251177C26
    12963EAF4E16484DB81ECB0251177C26: 0 refs
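The mismatch described in Update 2 (nodes registered in configfs that no longer appear in cluster.conf) can be checked mechanically instead of by eye. The following is a minimal sketch, not part of ocfs2-tools: the function names `conf_nodes`, `live_nodes` and `stale_nodes` are hypothetical helpers introduced here, and the parsing assumes the `key = value` stanza layout shown in the cluster.conf above.

```shell
#!/bin/sh
# Sketch: find nodes that the running O2CB cluster knows about (configfs)
# but that are missing from the on-disk cluster.conf.
# All function names are illustrative, not part of ocfs2-tools.

# Print the node names declared in a cluster.conf-style file, sorted.
# Only "name = ..." lines inside a "node:" stanza are counted, so the
# cluster's own "name = cpc" line is ignored.
conf_nodes() {
    awk '
        /^[[:space:]]*[a-z_]+:[[:space:]]*$/ { section = $1; next }
        section == "node:" && $1 == "name" && $2 == "=" { print $3 }
    ' "$1" | LC_ALL=C sort
}

# Print the node names registered under a configfs node directory, sorted.
# Each registered node is a subdirectory, as in the "ls -l" listing above.
live_nodes() {
    for d in "$1"/*/; do
        [ -d "$d" ] && basename "$d"
    done | LC_ALL=C sort
}

# Print nodes that are registered in configfs but absent from the file.
stale_nodes() {
    conf_list=$(mktemp)
    live_list=$(mktemp)
    conf_nodes "$1" > "$conf_list"
    live_nodes "$2" > "$live_list"
    # comm -13 keeps only the lines unique to the second (live) list.
    LC_ALL=C comm -13 "$conf_list" "$live_list"
    rm -f "$conf_list" "$live_list"
}
```

For the cluster in this question the call would be `stale_nodes /etc/ocfs2/cluster.conf /sys/kernel/config/cluster/cpc/node`. Given the files and listing quoted above, it should report SVR150-4107.localdomain. Swapping the two inputs to `comm` (or using `comm -23`) would instead list nodes that are configured on disk but not yet registered, which is the situation of SVR022-293.localdomain here.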