This morning I successfully set up a clustered, high availability pair of Zimbra (VMware virtual) servers, synced with DRBD and using Heartbeat to fail over to the secondary standby server.
This howto tries to cover *all* the steps. There are plenty of good howtos on the subject, but in one way or another each leaves something out with an 'I am assuming you already have (insert service here) working and will not cover this' clause. In particular, I ran into a few small hurdles with DRBD and hostnames, so I have documented what I needed to do to make it work.
In this setup, I install Debian Etch 4.0 on a VMware VM using the netinstall iso image, and along the line (after installing Zimbra itself) I clone the machine by copying its vmdk disk image to save time and avoid having to duplicate too many steps.
In this howto, the one zimbra 'domain' that both servers believe themselves to be is 'zimbra.yourdomain.com'. You'll notice a bit of hostname fiddling from time to time: this is required to keep Zimbra happy at install time, and also later during DRBD configuration things change again. In the end, the two VMs are 'zimbra-1' and 'zimbra-2' with respective IPs of 192.168.1.11 and 192.168.1.12. The 'virtual' IP of 'zimbra.yourdomain.com' is 192.168.1.10. Heartbeat configures whichever server is to take over the running of Zimbra with this virtual IP as a virtual ethernet interface.
Please replace zimbra.yourdomain.com, zimbra-1, zimbra-2 and the IP addresses with whatever suits your environment.
Howto
1) First steps - DNS
I edited the DNS server authoritative for the domain 'yourdomain.com' (in my case, an internal DNS server on the same LAN) to add these entries:
zimbra IN A 192.168.1.10
zimbra MX 10 zimbra
zimbra-1 IN A 192.168.1.11
zimbra-1 MX 10 zimbra-1
zimbra-2 IN A 192.168.1.12
zimbra-2 MX 10 zimbra-2
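Before moving on, it may be worth confirming that the three names resolve as intended. The following is only a minimal sketch: it assumes dig (from Debian's dnsutils package) is installed, and the check_a_record helper and the RESOLVE override are names made up for this example, not part of any tool.

```shell
#!/bin/sh
# RESOLVE is the lookup command; it can be overridden (assumption: dig exists).
RESOLVE=${RESOLVE:-"dig +short A"}

# Hypothetical helper: compare the first A record of a name with an expected IP.
check_a_record() {
    name=$1
    expected=$2
    actual=$($RESOLVE "$name" | head -n 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK   $name -> $actual"
    else
        echo "FAIL $name -> '${actual}' (expected $expected)"
        return 1
    fi
}

# Example usage with the names and IPs from this howto:
# check_a_record zimbra.yourdomain.com   192.168.1.10
# check_a_record zimbra-1.yourdomain.com 192.168.1.11
# check_a_record zimbra-2.yourdomain.com 192.168.1.12
```

The example calls are commented out so that sourcing the snippet does not trigger any lookups by itself.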
2) Debian Install - Manual Partitioning
I did a standard netinstall of Debian Etch on the zimbra-1 VM, but manually set up the partitioning as follows. Note the low specs of these machines: this was only a test and not a production server :)
/boot        /dev/sda1  100MB  (primary) (bootable flag on)
/            /dev/sda5  3GB    (logical) (ext3)
swap         /dev/sda6  512MB  (logical)
(unmounted)  /dev/sda7  150MB  (logical) (ext3)  # this'll be the DRBD meta-disk
(unmounted)  /dev/sda8  7GB    (logical) (ext3)  # this'll be the /opt partition used by DRBD
3) Remove exim4
If you installed Debian with a network mirror and 'Standard System' checked in tasksel, Debian will install exim4 which we don't want since Zimbra will be using its Postfix installation.
apt-get remove --purge exim4 exim4-base exim4-config exim4-daemon-light
4) Install extra packages
These packages are required to install Zimbra. We also throw in DRBD for use later on.
apt-get install ntp ntpdate libc6-i686 sudo libidn11 curl fetchmail libgmp3c2 libexpat1 libgetopt-mixed-perl libxml2 libstdc++6 libpcre3 libltdl3 ssh drbd0.7-module-source drbd0.7-utils linux-headers-`uname -r`
5) Edit (fudge) the hostname to keep Zimbra happy
To install Zimbra successfully, we must trick the server into thinking it is the 'real' domain zimbra.yourdomain.com when in fact it is zimbra-1.
echo zimbra.yourdomain.com > /etc/hostname
6) Reboot the server
reboot
7) Mount /opt
We will now temporarily mount /dev/sda8 as /opt so that we can do a Zimbra installation.
mount -t ext3 /dev/sda8 /opt
8) Download, extract and install Zimbra Collaboration Suite (Open Source edition)
At the time of writing, ZCS was at version 5.0.9 and we are downloading the Open Source Edition Debian package.
cd /tmp/
wget "http://h.yimg.com/lo/downloads/5.0.9_GA/zcs-5.0.9_GA_2533.DEBIAN4.0.20080815215219.tgz"
tar zxfv zcs-5.0.9_GA_2533.DEBIAN4.0.20080815215219.tgz
cd zcs-5.0.9_GA_2533.DEBIAN4.0.20080815215219
./install.sh -l
This install should go ok if your hostname is set to zimbra.yourdomain.com. Zimbra will alert you to a DNS MX record error, because the MX record for zimbra.yourdomain.com points to the virtual IP (192.168.1.10) and not zimbra-1's IP (192.168.1.11). That's ok, we want it like that, so ignore the error and say 'No' to 'Change domain' or whatever the question is.
9) Remove Zimbra startup scripts
We want to remove the Zimbra startup scripts because Heartbeat will be handling the starting of Zimbra when it needs to.
This command will probably work:
update-rc.d -f zimbra remove
If any of the links are still there afterwards, remove them manually:
rm /etc/rc2.d/S99zimbra
rm /etc/rc3.d/S99zimbra
rm /etc/rc4.d/S99zimbra
rm /etc/rc5.d/S99zimbra
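To double-check that nothing will start Zimbra at boot behind Heartbeat's back, something like the following can list any start links that survived. This is a rough sketch; leftover_start_links is a hypothetical helper name, and the root directory is a parameter only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: list any Zimbra start links (S??zimbra) still present
# in the rc?.d runlevel directories below the given root (default /etc).
leftover_start_links() {
    root=${1:-/etc}
    find "$root" -path '*/rc?.d/S[0-9][0-9]zimbra' 2>/dev/null | sort
}

# Example usage:
# leftover_start_links          # no output means no start links remain
```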
10) Change hostname back for DRBD, modify /etc/hosts
Now that we have Zimbra installed, we need to change the hostname again to make DRBD work. Note that you can't just edit /etc/hosts and fudge the local hostname because DRBD is smarter and will report a mismatch if /etc/hostname and /etc/hosts don't agree.
echo zimbra-1 > /etc/hostname
/etc/hosts
127.0.0.1 zimbra.yourdomain.com localhost.localdomain localhost
192.168.1.11 zimbra-1 zimbra.yourdomain.com
192.168.1.12 zimbra-2
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
11) Shutdown and clone zimbra-1 to make a zimbra-2
At this point I shut down zimbra-1, cloned its vmdk image and created zimbra-2 from the copy. Edit per comment: I glossed over this, but being a clone, zimbra-2 comes up with the same IP and hostname as zimbra-1. To make zimbra-2 the equivalent of zimbra-1, set its IP to 192.168.1.12 instead of zimbra-1's 192.168.1.11 by editing /etc/network/interfaces from within the VMware Console (you may have trouble bringing up the eth interface at all until you do, since it is a cloned virtual machine) and change the hostname:
echo zimbra-2 > /etc/hostname
/etc/hosts
127.0.0.1 zimbra.yourdomain.com localhost.localdomain localhost
192.168.1.12 zimbra-2 zimbra.yourdomain.com
192.168.1.11 zimbra-1
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
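Since the hostname juggling is the easiest thing to get wrong, a quick sanity check on each node can save some head scratching later. The sketch below looks up which non-loopback address a hosts file assigns to a given name; hosts_address is a hypothetical helper written for this example, and the file path is a parameter so it can be tested on a copy.

```shell
#!/bin/sh
# Hypothetical helper: print the first non-loopback address that a hosts file
# assigns to the given name. Exit status is 1 if no such entry exists.
hosts_address() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        /^[ \t]*#/ { next }                      # skip comment lines
        $1 ~ /^127\./ || $1 == "::1" { next }    # skip loopback entries
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break             # ignore trailing comments
                if ($i == n) { print $1; found = 1; exit }
            }
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Example usage on either node:
# hosts_address "$(uname -n)" || echo "hostname has no LAN entry in /etc/hosts"
```

On zimbra-1 the example should print 192.168.1.11 and on zimbra-2 it should print 192.168.1.12, given the hosts files shown above.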
12) Reboot both servers, install DRBD and configure
On both zimbra-1 and zimbra-2:
cd /usr/src/
tar xvfz drbd0.7.tar.gz
cd modules/drbd/drbd
make
make install
mv /etc/drbd.conf /etc/drbd.conf.orig
Then create a new /etc/drbd.conf with the following contents:
resource r0 {
  protocol C;
  incon-degr-cmd "halt -f";
  startup {
    degr-wfc-timeout 120; # 2 minutes
  }
  disk {
    on-io-error detach;
  }
  net {
  }
  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }
  on zimbra-1 {
    device /dev/drbd0;
    disk /dev/sda8;
    address 192.168.1.11:7788;
    meta-disk /dev/sda7[0];
  }
  on zimbra-2 {
    device /dev/drbd0;
    disk /dev/sda8;
    address 192.168.1.12:7788;
    meta-disk /dev/sda7[0];
  }
}
13) Get the first DRBD sync going
On zimbra-1 and zimbra-2:
modprobe drbd
drbdadm up all
If that completes with no errors, on zimbra-1:
drbdadm -- --do-what-I-say primary all
drbdadm -- connect all
cat /proc/drbd
version: 0.7.20 (api:77/proto:74)
SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
0: cs:SyncSource st:Primary/Secondary ld:Consistent
ns:13441632 nr:0 dw:0 dr:13467108 al:0 bm:2369 lo:0 pe:23 ua:226 ap:0
[==>..............] sync'ed: 3.1% (7000/7168)M
finish: 1:14:16 speed: 2,644 (2,204) K/sec
1: cs:Unconfigured
We're almost there!!
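The initial sync takes a while, so rather than re-running cat by hand you could poll the cs: field until it leaves the sync states. This is a sketch based only on the /proc/drbd output shown above; drbd_cstate and wait_for_sync are hypothetical helper names, and it only treats SyncSource and SyncTarget as "still syncing", which may not cover every state your DRBD version reports.

```shell
#!/bin/sh
# Hypothetical helper: print the connection state (the cs: field) of device 0
# from /proc/drbd-style text.
drbd_cstate() {
    file=${1:-/proc/drbd}
    sed -n 's/^ *0: cs:\([A-Za-z]*\).*/\1/p' "$file"
}

# Hypothetical helper: poll until device 0 is no longer in a sync state.
# Arguments: file to read (default /proc/drbd), seconds between polls.
wait_for_sync() {
    file=${1:-/proc/drbd}
    interval=${2:-30}
    while :; do
        state=$(drbd_cstate "$file")
        case $state in
            SyncSource|SyncTarget)
                echo "still syncing ($state)"
                sleep "$interval"
                ;;
            *)
                echo "state is now: ${state:-unknown}"
                return 0
                ;;
        esac
    done
}

# Example usage:
# wait_for_sync /proc/drbd 60
```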
14) Install and configure Heartbeat
On zimbra-1 and zimbra-2:
apt-get install heartbeat
/etc/heartbeat/ha.cf
logfacility local0
keepalive 2
deadtime 20 # timeout before the other server takes over
bcast eth0
node zimbra-1 zimbra-2 # our two zimbra VMs
auto_failback on # very important or auto failover won't happen
/etc/heartbeat/haresources
zimbra-1 IPaddr::192.168.1.10/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 zimbra
Finally, create /etc/heartbeat/authkeys on both servers
This file needs an md5 string, which each heartbeat daemon uses to authenticate with the other. I ran a quick PHP 'echo md5("my password");' to get an md5 string.
auth 3
3 md5 yourrandommd5string
Protect the permissions of authkeys file on both servers:
chmod 600 /etc/heartbeat/authkeys
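If PHP is not at hand, the key can just as well be produced from the shell. The sketch below is one way to do it, assuming md5sum (coreutils) and /dev/urandom are available; write_authkeys is a hypothetical helper, and the target path is a parameter so it can be tried on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: write a Heartbeat authkeys file containing a random
# 32-character hex key, readable by its owner only.
write_authkeys() {
    target=${1:-/etc/heartbeat/authkeys}
    # Hash 64 random bytes to get a 32-character hex string.
    key=$(dd if=/dev/urandom bs=64 count=1 2>/dev/null | md5sum | cut -d' ' -f1)
    # Create the file in a subshell so the restrictive umask does not leak out.
    ( umask 077; printf 'auth 3\n3 md5 %s\n' "$key" > "$target" )
    chmod 600 "$target"
}

# Example usage (as root, on one node only):
# write_authkeys /etc/heartbeat/authkeys
```

Run it on one node only and copy the resulting file to the other, since both daemons need the same key.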
15) Reboot!
At this point Zimbra should fire up on zimbra-1 as normal. Do a 'df -h' on zimbra-1 and you'll see the /dev/drbd0 device has mounted /opt and if you run ifconfig, you'll see the eth0:0 entry that contains the virtual IP 192.168.1.10. You should be able to visit http://zimbra.yourdomain.com or http://192.168.1.10 and see a working Zimbra system that is running off of zimbra-1.
16) Test the failover
Shut down zimbra-1. If you tail -f /var/log/messages on zimbra-1 as it shuts down, you should see it release drbd and heartbeat, and running tail -f /var/log/messages on zimbra-2 will show it pick up the virtual IP, mount /dev/drbd0 and kick off the Zimbra startup scripts.
When the startup scripts have finished, visit http://zimbra.yourdomain.com just like you did before and everything should appear to still be running, except now we're running off zimbra-2!
Fire up zimbra-1 again and it will take back the control from zimbra-2.
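To get a rough feel for how long the service is unreachable during a failover, you can probe the virtual IP from a third machine while you shut zimbra-1 down. This is only a sketch: watch_service and the PROBE override are hypothetical names, and the default probe assumes curl is installed. Note that curl without extra options reports success as soon as it gets any HTTP response, so "up" here means the web server answered, not that Zimbra is fully healthy.

```shell
#!/bin/sh
# PROBE is the command used to test the service; it can be overridden.
# Assumption: curl is installed. It fetches the URL quietly, 3 second timeout.
PROBE=${PROBE:-"curl -s -o /dev/null --max-time 3"}

# Hypothetical helper: probe a URL a fixed number of times, logging up/DOWN.
# Arguments: URL, number of probes (default 60), seconds between probes (5).
watch_service() {
    url=$1
    count=${2:-60}
    interval=${3:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        if $PROBE "$url"; then
            echo "$(date +%H:%M:%S) up"
        else
            echo "$(date +%H:%M:%S) DOWN"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example: watch the virtual IP for five minutes during the failover test
# watch_service http://192.168.1.10/ 60 5
```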
Congratulations, you have automatic failover and high availability of your Zimbra service!
Feel free to leave comments, feedback, or corrections in the event that I've done something wrong, but this worked for me with no problems. I hope it works for you.
Hi,
Thank you, I found this article very good. I would like to know if it is possible to have an active-active cluster, where both Zimbra servers are up.
The problem is how to manage the processes in memory. There may be a way to deal with this, perhaps with a raw partition that acts as a quorum for the cluster nodes.
Best regards,
Hi,
I'm not sure if you can do an active-active relationship easily with the Open Source edition. I believe if you need active-active, it's easier to buy the Network Edition, which supports multi-node Zimbra clusters (Red Hat Cluster Suite) in an active-active relationship. I think the Network Edition also comes with various other perks (like proper backup solutions for clusters), but I haven't tried it.
Hi,
Thank you, I think this article is very good.
But I faced a lot of problems during installation. It would be very helpful if you gave a more detailed description of this configuration. Also, is there any entry we need to add to /etc/fstab? My drbd.conf file is:
resource drbd0 {
  protocol C;
  handlers { pri-on-incon-degr "halt -f"; }
  startup {
    degr-wfc-timeout 120; # 2 minutes
  }
  disk {
    on-io-error detach;
  }
  net {
  }
  syncer {
    rate 10M;
  }
  on zimbra01.domain.com {
    device /dev/drbd0;
    disk /dev/sda5;
    address 192.168.5.35:7788;
    meta-disk internal;
  }
  on zimbra02.domain.com {
    device /dev/drbd0;
    disk /dev/sda5;
    address 192.168.5.36:7788;
    meta-disk internal;
  }
}