CentOS Cluster-DRBD Setup

As part of a MySQL Cluster setup, I recently set up a 2-node web cluster using CentOS's native cluster software suite, with a twist: the web root is mounted on a DRBD partition instead of relying on periodic file syncing. This article focuses on the cluster setup and does not cover the DRBD setup/configuration. Let's go:

Install "Cluster Storage" group using yum:

[root@host]# yum groupinstall "Cluster Storage"
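
The group pulls in cman, clvmd and the GFS tools used below. A quick sanity check (package names assumed from a stock CentOS 5 install):

[root@host]# rpm -q cman lvm2-cluster gfs-utils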

Edit /etc/cluster/cluster.conf:

======================================================
<?xml version="1.0"?>
<cluster name="drbd_srv" config_version="1">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="WEB_Node1" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="human" ipaddr="10.255.255.225"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="WEB_Node2" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="human" ipaddr="10.255.255.226"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
======================================================
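
cman identifies the nodes by the clusternode names above, so both names must resolve on both machines, and both machines need an identical cluster.conf. A minimal sketch (the addresses are the same example IPs used in the fence entries; substitute your nodes' real cluster IPs):

[root@host]# cat >> /etc/hosts << EOF
10.255.255.225   WEB_Node1
10.255.255.226   WEB_Node2
EOF
[root@host]# scp /etc/cluster/cluster.conf WEB_Node2:/etc/cluster/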

Start the cluster:

[root@host]# service cman start (on both nodes)

To verify proper startup:

[root@host]# cman_tool nodes

Should show:

Node  Sts   Inc   Joined               Name
   1   M     16   2009-08-11 22:13:27  WEB_Node1
   2   M     24   2009-08-11 22:13:34  WEB_Node2

A status of 'M' means the node is a member and healthy; 'X' means there is a problem with that node.
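
For a broader view (quorum state, expected votes, cluster name), cman_tool status is also handy:

[root@host]# cman_tool status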

Edit /etc/lvm/lvm.conf

change:

locking_type = 1

to:

locking_type = 3

and change:

filter = ["a/.*/"]

to:

filter = [ "a|drbd.*|", "r|.*|" ]
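
For reference, a sketch of the relevant lvm.conf sections after both edits (the filter keeps LVM from scanning the DRBD backing disk and reporting it as a duplicate PV):

======================================================
devices {
    # only accept DRBD devices; reject everything else so the
    # backing disk is not seen as a second copy of the PV
    filter = [ "a|drbd.*|", "r|.*|" ]
}
global {
    # type 3 = built-in clustered locking via clvmd
    locking_type = 3
}
======================================================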

Start clvmd:

[root@host]# service clvmd start (on both nodes)

Set cman and clvmd to start on bootup on both nodes:

[root@host]# chkconfig --level 345 cman on
[root@host]# chkconfig --level 345 clvmd on
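
Double-check that the runlevels took:

[root@host]# chkconfig --list cman
[root@host]# chkconfig --list clvmd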

Run vgscan:

[root@host]# vgscan

Create a new PV (physical volume) using the drbd block device

[root@host]# pvcreate /dev/drbd1

Create a new VG (volume group) using the drbd block device

[root@host]# vgcreate VolGroup01 /dev/drbd1 (name it whatever you like; VolGroup01 is used throughout the rest of this article)

Now when you run vgdisplay, you should see:

  --- Volume group ---
  VG Name             VolGroup01
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  Clustered             yes
  Shared                no
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               114.81 GB
  PE Size               4.00 MB
  Total PE              29391
  Alloc PE / Size       0 / 0
  Free  PE / Size       29391 / 114.81 GB
  VG UUID               k9TBBF-xdg7-as4a-2F0c-XGTv-M2Wh-CabVXZ

Notice the line that reads "Clustered yes"
 
Create an LV (logical volume) in the VG that we just created.
**Make sure both drbd nodes are in the Primary role before doing this; one way to check is shown below.
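
To confirm the roles (r0 is the DRBD resource name assumed throughout this article; dual-primary mode must already be enabled in your DRBD configuration, which is outside the scope of this article):

[root@host]# cat /proc/drbd (look for st:Primary/Primary)
[root@host]# drbdadm primary r0 (promote a node that is still Secondary)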

If there is a "Split-Brain detected, dropping connection!" entry in /var/log/messages, then a manual split-brain recovery is necessary.

To manually recover from a split-brain scenario (on the split-brain machine):
 
[root@host]# drbdadm secondary r0
[root@host]# drbdadm -- --discard-my-data connect r0
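
The discarded node then resyncs from the survivor. Watch /proc/drbd until the sync completes, then promote the node back to Primary (if the surviving node also shows cs:StandAlone, run drbdadm connect r0 on it as well):

[root@host]# cat /proc/drbd
[root@host]# drbdadm primary r0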

 
Once you have verified that both nodes are in Primary mode:
 
On node2:
 
[root@host]# service clvmd restart
 
On node1:
 
[root@host]# lvcreate -l 100%FREE -n gfs VolGroup01
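
Verify the new LV is visible on both nodes:

[root@host]# lvs VolGroup01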
 
Format the LV with gfs:
 
[root@host]# mkfs.gfs -p lock_dlm -t drbd_srv:www -j 2 /dev/VolGroup01/gfs (drbd_srv is the cluster name from cluster.conf, www is the lock table name, and -j 2 creates one journal per node)
 
Start the gfs service (on both nodes):
 
[root@host]# service gfs start
 
Set the gfs service to start automatically (on both nodes):
 
[root@host]# chkconfig --level 345 gfs on
 
Mount the filesystem (on both nodes):
 
[root@host]# mount -t gfs /dev/VolGroup01/gfs /srv
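
The gfs init script mounts GFS filesystems it finds in /etc/fstab, so add an entry on both nodes so the mount survives a reboot (the mount options here are a minimal assumption; adjust as needed):

======================================================
/dev/VolGroup01/gfs    /srv    gfs    defaults    0 0
======================================================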
 
 
Modify the startup sequence as follows (one way to set these priorities is sketched after the shutdown list below):
 
1) network
...
2) drbd  (S15)
3) cman  (S21)
4) clvmd (S24)
5) gfs   (S26)
 
Remove the symlink that shuts down openais (K20openais in runlevel 3 on our install), because shutting down cman already stops openais.
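
For runlevel 3 that is:

[root@host]# rm /etc/rc3.d/K20openais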
 
Modify the shutdown sequence as follows (the reverse of the startup order):
 
1) gfs   (K21)
2) clvmd (K22)
3) cman  (K23)
4) drbd  (K24)
...
5) network
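
The start/kill priorities above come from the "# chkconfig:" header in each init script. One way to set them (a sketch; the numbers are the S/K values from the two lists) is to edit that header and let chkconfig rebuild the symlinks. For drbd, for example, set the header in /etc/init.d/drbd to:

# chkconfig: 345 15 24

then refresh the links:

[root@host]# chkconfig --del drbd
[root@host]# chkconfig --add drbd

Repeat with the matching numbers for cman (21/23), clvmd (24/22) and gfs (26/21).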
 
Reboot each server to verify that the gfs filesystem comes back automatically after a reboot.
 
 
Troubleshooting:
 
If one node's drbd status is showing:
 
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-10-03 11:30:17
 
 1: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown   r---
    ns:0 nr:0 dw:40 dr:845 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 oos:12288
 
and the other node is showing:
 
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-10-03 11:30:17
 
 1: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:56 dr:645 al:3 bm:2 lo:0 pe:0 ua:0 ap:0 oos:8200
 
then there is a drbd connection issue (the node showing cs:WFConnection should still have the gfs filesystem mounted).

To solve this, restart the gfs cluster services on the node showing "cs:StandAlone" in the status field:
 
1) service clvmd stop
2) service cman stop
3) service drbd restart
 
Verify you see Primary/Primary in the st: section of the drbd status (cat /proc/drbd)
 
4) service cman start
5) service clvmd start
6) service gfs start
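
Once the services are back up, confirm the node has rejoined cleanly:

[root@host]# cat /proc/drbd (should show cs:Connected st:Primary/Primary)
[root@host]# mount | grep gfs (the gfs filesystem should be mounted again)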