Installing a DRBD-backed NFS cluster on a pair of Debian servers

Kuko Armas <kuko@canarytek.com>

Installing a DRBD-backed NFS cluster on a pair of Debian 8 servers (the OS and version are a client’s requirement)

Description

We will install a highly available NFS service with a DRBD-replicated disk between two Debian servers. For the cluster services we will use Corosync/Pacemaker

Node        IP            Components
nfs_vip     10.10.90.30   Floating IP for the NFS service
nfs01       10.10.90.31   nfs01 on the NFS service network
nfs02       10.10.90.32   nfs02 on the NFS service network
nfs01-adm   10.10.70.31   nfs01 on the synchronization and cluster network
nfs02-adm   10.10.70.32   nfs02 on the synchronization and cluster network

Setup DRBD storage configuration

We will use a second 1TB disk on each node (/dev/sdb). Since we are using the whole disk for DRBD, we will not use LVM

  • Create a single partition using the whole disk (/dev/sdb1)
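
    For example, with parted (assuming /dev/sdb is new and empty; fdisk or any other partitioning tool works just as well):

      # assumes /dev/sdb holds no existing data
      parted -s /dev/sdb mklabel msdos
      parted -s /dev/sdb mkpart primary 0% 100%
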
  • Install DRBD

      apt-get install drbd-utils
    
  • Setup DRBD config on both nodes
cat > /etc/drbd.d/nfs01.res <<EOF
resource nfs-vol01 {
 protocol C;
 meta-disk internal;
 device /dev/drbd0;
 disk   /dev/sdb1;
 handlers {
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";
 }
 net {
  allow-two-primaries no;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
  rr-conflict disconnect;
 }
 disk {
  on-io-error detach;
 }
 syncer {
  verify-alg sha1;
 }
 on nfs01 {
  address  10.10.70.31:7790;
 }
 on nfs02 {
  address  10.10.70.32:7790;
 }
}
EOF
  • Initialize the DRBD metadata on both nodes

      drbdadm create-md nfs-vol01
    
  • Start DRBD resource on both nodes

      drbdadm up nfs-vol01
    
  • Since this is the first time we start the resource, both nodes will be Secondary and the state should be “Inconsistent”

root@nfs01:~# drbd-overview
 0:nfs-vol01/0  Connected Secondary/Secondary Inconsistent/Inconsistent
  • We need to force one node to become primary in order to trigger the initial synchronization and to be able to write to the DRBD device and create a filesystem. We will make nfs01 the primary, so run the following steps only on nfs01

      drbdadm primary --force nfs-vol01
    
  • Now the DRBD device will start synchronizing. You can check the progress with drbd-overview:

 0:nfs-vol01/0  SyncSource Primary/Secondary UpToDate/Inconsistent
	[>....................] sync'ed:  0.2% (1047244/1048540)M
	finish: 7:29:04 speed: 39,792 (23,328) K/sec
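
    To keep an eye on the progress you can simply poll this output, for example:

      # press Ctrl-C to stop watching
      watch -n 10 drbd-overview
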
  • Since this is a fairly big volume (1TB), the synchronization will take a long time to complete, but the device is already active read/write on nfs01, so we can continue our setup

  • We will use XFS as the filesystem for the exported volumes, so we create a filesystem (remember, only on nfs01)

      mkfs -t xfs /dev/drbd0
    
  • We will mount the DRBD volume on /nfs. Create the mount point and make sure we can mount it

      mkdir /nfs
      mount /dev/drbd0 /nfs
    
  • In the shared volume we will store the exported data and the internal NFS state information. Create both directories

      mkdir /nfs/exported
      mkdir /nfs/nfsinfo
    
  • Unmount the volume

      umount /nfs
    
  • When the synchronization is complete, set nfs01 as secondary again

      drbdadm secondary nfs-vol01
    

Install cluster software

  • We will use the pcs utilities, but Debian 8 does not include them, so we need to install them from the jessie-backports repo
  • Add the jessie-backports repo
cat > /etc/apt/sources.list.d/jessie-backports.list <<EOF
deb http://http.debian.net/debian jessie-backports main
EOF
  • Install pcs tools
    apt-get update
    apt-get install -t jessie-backports pcs corosync pacemaker
  • Set up a password for the hacluster user on both nodes. You can use a random password, since you will only need it once in the next section.
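
    A minimal example, to be run on both nodes (you will be prompted for the new password):

      passwd hacluster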

Setup cluster

Most of the steps in this section need to be done on only one node. We will do them on nfs01

  • Set up authentication against the pcsd service using the password from the previous section
root@nfs01:~# pcs cluster auth nfs01-adm nfs02-adm -u hacluster -p $PASSWORD
nfs01-adm: Authorized
nfs02-adm: Authorized
  • Create the corosync cluster
root@nfs01:~# pcs cluster setup --name nfs01_cluster nfs01-adm nfs02-adm
Destroying cluster on nodes: nfs01-adm, nfs02-adm...
nfs01-adm: Stopping Cluster (pacemaker)...
nfs02-adm: Stopping Cluster (pacemaker)...
nfs01-adm: Successfully destroyed cluster
nfs02-adm: Successfully destroyed cluster

Sending cluster config files to the nodes...
nfs01-adm: Succeeded
nfs02-adm: Succeeded

Synchronizing pcsd certificates on nodes nfs01-adm, nfs02-adm...
nfs01-adm: Success
nfs02-adm: Success

Restarting pcsd on the nodes in order to reload the certificates...
nfs01-adm: Success
nfs02-adm: Success
  • Start the cluster

      pcs cluster start --all
    
  • Now we have a basic cluster with no services configured

root@nfs01:~# pcs status
Cluster name: nfs01_cluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: nfs02-adm (version 1.1.16-94ff4df) - partition with quorum
Last updated: Wed Sep  6 20:33:18 2017
Last change: Wed Sep  6 20:31:53 2017 by hacluster via crmd on nfs02-adm

2 nodes configured
0 resources configured

Online: [ nfs01-adm nfs02-adm ]

No resources


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Setup cluster services

In this section we will set up the cluster services. When Pacemaker starts the service on a node, it has to perform the following steps (in this order):

  • Set the selected node as “primary” of the DRBD volume (to be able to mount it)
  • Mount the filesystem in the defined mountpoint
  • Set up the NFS service IP
  • Start the NFS service
  • Add the NFS export resources

So we need to set up all these resources with the required dependency information to make sure everything happens in the correct order

One interesting feature we are going to use is that we can stage all the changes in a file and then push that file to the cluster as a single transaction

  • First, create our base cluster config file from the existing cluster

      pcs cluster cib cluster_config
    
  • Since this is a two-node cluster, quorum cannot be maintained when a node fails, so we tell the cluster to ignore the loss of quorum. Also, disable STONITH, since we don’t have any fencing device available

      pcs -f cluster_config property set stonith-enabled=false
      pcs -f cluster_config property set no-quorum-policy=ignore
    
  • Since this is a “symmetric” cluster, prevent resources from moving back when a failed node recovers, because the extra move would only increase downtime

      pcs -f cluster_config resource defaults resource-stickiness=200
    
  • Create the cluster resource and its master/slave clone for the DRBD volume

pcs -f cluster_config resource create nfs01-vol ocf:linbit:drbd \
  drbd_resource=nfs-vol01 \
  op monitor interval=30s
pcs -f cluster_config resource master nfs01-clone nfs01-vol \
  master-max=1 master-node-max=1 \
  clone-max=2 clone-node-max=1 \
  notify=true
  • Create a cluster resource for the filesystem, and define the dependencies. We also define options to activate XFS quotas
pcs -f cluster_config resource create nfs01_fs Filesystem \
  device="/dev/drbd0" \
  directory="/nfs" \
  fstype="xfs" \
  options=uquota,gquota
pcs -f cluster_config constraint colocation add nfs01_fs with nfs01-clone \
  INFINITY with-rsc-role=Master
pcs -f cluster_config constraint order promote nfs01-clone then start nfs01_fs
  • Create the cluster resource for the floating service IP used for NFS
pcs -f cluster_config resource create nfs_vip01 ocf:heartbeat:IPaddr2 \
 ip=10.10.90.30 cidr_netmask=24 \
 op monitor interval=30s
pcs -f cluster_config constraint colocation add nfs_vip01 with nfs01_fs INFINITY
pcs -f cluster_config constraint order nfs01_fs then nfs_vip01
  • Create a resource for the NFS service
pcs -f cluster_config resource create nfs-service nfsserver nfs_shared_infodir=/nfs/nfsinfo nfs_ip=10.10.90.30
pcs -f cluster_config constraint colocation add nfs-service with nfs_vip01 INFINITY
pcs -f cluster_config constraint order nfs_vip01 then nfs-service
  • Create the NFS export resource
pcs -f cluster_config resource create nfs-export01 exportfs clientspec=10.10.80.0/24 options=rw,sync,no_subtree_check,acl directory=/nfs/exported fsid=0
pcs -f cluster_config constraint colocation add nfs-export01 with nfs-service INFINITY
pcs -f cluster_config constraint order nfs-service then nfs-export01
  • Verify that the defined resources and constraints are correct
root@nfs01:~# pcs -f cluster_config resource show
 Master/Slave Set: nfs01-clone [nfs01-vol]
     Stopped: [ nfs01-adm nfs02-adm ]
 nfs01_fs	(ocf::heartbeat:Filesystem):	Stopped
 nfs_vip01	(ocf::heartbeat:IPaddr2):	Stopped
 nfs-service	(ocf::heartbeat:nfsserver):	Stopped
 nfs-export01	(ocf::heartbeat:exportfs):	Stopped

root@nfs01:~# pcs -f cluster_config constraint
Location Constraints:
Ordering Constraints:
  promote nfs01-clone then start nfs01_fs (kind:Mandatory)
  start nfs01_fs then start nfs_vip01 (kind:Mandatory)
  start nfs_vip01 then start nfs-service (kind:Mandatory)
  start nfs-service then start nfs-export01 (kind:Mandatory)
Colocation Constraints:
  nfs01_fs with nfs01-clone (score:INFINITY) (with-rsc-role:Master)
  nfs_vip01 with nfs01_fs (score:INFINITY)
  nfs-service with nfs_vip01 (score:INFINITY)
  nfs-export01 with nfs-service (score:INFINITY)
Ticket Constraints:
  • If everything seems ok, push the configuration to the cluster

      pcs cluster cib-push cluster_config
    
  • Check the cluster status. Everything should be OK

root@nfs01:~# pcs status
Cluster name: nfs01_cluster
Stack: corosync
Current DC: nfs01-adm (version 1.1.16-94ff4df) - partition with quorum
Last updated: Fri Sep  8 00:45:47 2017
Last change: Thu Sep  7 16:14:39 2017 by root via cibadmin on nfs01-adm

2 nodes configured
6 resources configured

Online: [ nfs01-adm nfs02-adm ]

Full list of resources:

 Master/Slave Set: nfs01-clone [nfs01-vol]
     Masters: [ nfs01-adm ]
     Slaves: [ nfs02-adm ]
 nfs01_fs	(ocf::heartbeat:Filesystem):	Started nfs01-adm
 nfs_vip01	(ocf::heartbeat:IPaddr2):	Started nfs01-adm
 nfs-service	(ocf::heartbeat:nfsserver):	Started nfs01-adm
 nfs-export01	(ocf::heartbeat:exportfs):	Started nfs01-adm

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
  • Now you can mount the NFS export on the client nodes and check the failover process (shut down the active server and the clients shouldn’t notice)
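
    A non-destructive way to test the failover first is to put the active node in standby, check that all resources move to the other node, and then bring it back. For example, assuming nfs01-adm is currently the active node:

      # move all resources away from nfs01-adm
      pcs cluster standby nfs01-adm
      pcs status
      # bring the node back into the cluster
      pcs cluster unstandby nfs01-adm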

  • By the way, in order for the client nodes to recover from an NFS server failover, it is better to use NFSv3 over UDP, since UDP is stateless and recovers better from server interruptions. We use the following options to mount on the clients

      rsize=8192,wsize=8192,acl,udp,nfsvers=3,rw
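
    For example, a manual mount on a client against the floating IP could look like this (the /mnt/nfs mount point is just an example):

      # /mnt/nfs is an example mount point on the client
      mkdir -p /mnt/nfs
      mount -t nfs -o rsize=8192,wsize=8192,acl,udp,nfsvers=3,rw 10.10.90.30:/nfs/exported /mnt/nfs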
    
  • That’s all, go and test! And good luck!