Thursday, 14 November 2013

Virtualised Exalogic and External DNS Servers

Quite often when configuring Exalogic, issues arise with access to a DNS server, resulting in delays.  From a management perspective this generally reveals itself as a 20-30 second pause when using ssh to connect to a server.  During management via Exalogic Control, DNS issues sometimes cause jobs to time out and hence fail.  From an application perspective it often shows up when shares on the shared storage take a long time to become available, or when the creation or initial read of a file is slow.

Virtual servers deployed onto Exalogic can easily be set up to access DNS over the 10GbE network, either by configuring the Network Services on the EoIB network (select the network that gives access to the 10GbE on your rack and choose the "Edit Network Services" action) or by simply editing the /etc/resolv.conf file on your vServer to point it at the DNS servers in the environment.  (The latter could be put into a template if this approach is preferred.)
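
If editing /etc/resolv.conf directly, the file is simply a list of the DNS servers to query.  A minimal sketch, using a hypothetical search domain and the forwarder addresses used later in this post (substitute the values for your environment):

# cat /etc/resolv.conf
# hypothetical domain and server addresses, for illustration only
search example.com
nameserver 10.5.5.4
nameserver 10.5.5.5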

Editing network services in Exalogic Control
Note - Health Warning - Only attempt to change the network services if you are running Exalogic Elastic Cloud Software version 2.0.6.0.0 or higher!

The shared storage is a slightly different kettle of fish.  When set up it has direct access to the 1GbE management LAN, and it is normally through this network that it gains access to services such as LDAP/NIS or DNS.  However, the 1GbE network is not fault tolerant within Exalogic, so a fault-tolerant route through the 10GbE network should be created.  A DNS service that the shared storage can access can easily be set up on a vServer, following the same principles as discussed in an earlier blog posting about setting up LDAP for access via internal vServers.

To achieve a similar setup for DNS the following steps should be done:-

  1. Create your vServer with access to at least the 10GbE and the vserver-shared-storage networks.  (Ensure it is marked for HA, or alternatively plan for two vServers, both running DNS and part of a distribution group.)
  2. Configure the vServer to act as a DNS server.  This can be done using tools like dnsmasq or the bind package.  The example shown here uses bind to create the service.
    1. Set up a yum repository that your vServer can access.
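      (A minimal sketch of a repository definition file, with a hypothetical internal mirror URL - point it at wherever your packages are hosted:)
      # cat /etc/yum.repos.d/local.repo
      [ol5_local]
      name=Oracle Linux 5 local mirror
      baseurl=http://yum.example.com/ol5/latest/x86_64/
      gpgcheck=0
      enabled=1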
    2. Install the bind package.
      # yum install bind --skip-broken
      (Notes:-
      • We include the --skip-broken option so that yum does not upgrade the packages that bind depends on.  On the rack I tested on, other utilities depend on the bind-libs package and upgrading it caused issues with the InfiniBand network.  Simply ignore the mismatch; the named daemon is still installed and seems to operate successfully.
      • Not strictly necessary, but for testing purposes the Unix command nslookup is quite handy.  If it is not already installed then install the bind-utils package.)
    3. Create the /etc/named.conf file with content along the lines of that shown below.

      # cat /etc/named.conf
      options {
          directory "/var/named";

          # Hide version string for security
          version "not currently available";

          # Listen to the loopback device and internal networks only
          listen-on { 127.0.0.1; 172.16.0.14; 172.17.0.41; };
          listen-on-v6 { ::1; };

          # Do not query from the specified source port range
          # (Adjust depending on your firewall configuration)
          avoid-v4-udp-ports { range 1 32767; };
          avoid-v6-udp-ports { range 1 32767; };

          # Forward all DNS queries to your DNS Servers
          forwarders { 10.5.5.4; 10.5.5.5; };
          forward only;

          # Expire negative answer ASAP.
          # i.e. Do not cache DNS query failure.
          max-ncache-ttl 3; # 3 seconds

          # Disable non-relevant operations
          allow-transfer { none; };
          allow-update-forwarding { none; };
          allow-notify { none; };
      };
    4. Start up the DNS daemon (named) to ensure it is OK.
      # service named start
    5. Set it up to start automatically at boot.
      # chkconfig named on
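    6. Optionally, verify that the forwarder resolves names.  A quick sanity check using nslookup from the bind-utils package mentioned earlier, pointed at the listen addresses configured above (the hostname queried is just an illustration):
      # nslookup oracle.com 127.0.0.1
      # nslookup oracle.com 172.17.0.41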
  3. Configure the storage to include the vServer's shared-storage IP address in its list of DNS servers.  In our case it uses the internal vServer IP address of 172.17.0.41 first, and would fall back to other IP addresses via the 1GbE network should that fail.

Configuring DNS on the ZFS Storage Appliance

Wednesday, 5 December 2012

Backing up an Exalogic vServer via templating the vServer

Introduction

Following on from my earlier post about backing up a vServer using the rsync command, it is also possible to back up a vServer effectively by using the capability to template it. This is documented in Appendix F of the Cloud Administrator's Guide; however, an example process is documented here to create a template and re-create a vServer from that template.

A really useful little script has been created by the Exalogic A-Team that could save you some time and effort in templating a vServer.  It is available for download from here.  To do it manually, read on....

The vServer we will be performing the actions on is the same one we backed up with rsync: a vServer that has been configured to perform an rsync backup and has an additional partition, over and above the Exalogic base template, mounted on /u01 and containing a deployment of WebLogic.

The general steps to follow are:-
  1. Shutdown vServer
  2. Clone in OVMM
  3. Startup cloned image
  4. Log on and edit to remove configuration
  5. Shutdown
  6. Copy files to create a template
  7. Import template to Exalogic Control
  8. Delete previous vServer
  9. Create new vServer based on new template
  10. Check operation.

Shutdown/Clone Operations (Backup)

The first step is simply to shut down the vServer; this can be done from Exalogic Control. Then we switch context and log in to OVMM to perform the cloning activity. Below is a screenshot of the clone process in OVMM.



As you can see, we do not clone as a template but clone the machine as a vServer. This is because we will make changes to the new vServer so that it can become a template for Exalogic Control.  Once the clone job has completed we can go in and start the server up. The cloned vServer is automatically assigned to the target server pool that was selected, but it is stopped by default. By highlighting the pool and selecting the "Virtual Machines" tab we can select our newly created clone and start it.

Once the machine has started it is possible to log on to the cloned vServer using the IP address of the previous instance. Log on as root; we now want to make a number of changes to the configuration files so that the machine becomes an "unconfigured" vServer, ready to be imported into Exalogic Control as a template. The changes to perform are described below.

  • Edit the /etc/sysconfig/ovmd file and change the INITIAL_CONFIG=no parameter to INITIAL_CONFIG=yes. Save the file after making this change.
  • Remove DNS information by running the following commands:
    cd /etc
    sed -i '/.*/d' resolv.conf
  • Remove SSH information by running the following commands:
    rm -f /root/.ssh/*
    rm -f /etc/ssh/ssh_host*
  • Clean up the /etc/sysconfig/network file by running the following commands:
    cd /etc/sysconfig
    sed -i '/^GATEWAY/d' network
  • Clean up the hosts files by running the following commands:
    cd /etc
    sed -i '/localhost/!d' hosts
    cd /etc/sysconfig/networking/profiles/default
    sed -i '/localhost/!d' hosts
  • Remove network scripts by running the following commands:
    cd /etc/sysconfig/network-scripts
    rm -f ifcfg-*eth*
    rm -f ifcfg-ib*
    rm -f ifcfg-bond*
  • Remove log files, including the ones that contain information you do not want to propagate to new vServers: cd to /var/log and remove the following files:
    messages*, ovm-template-config.log, ovm-network.log, boot.log*, cron*, maillog*, rpmpkgs*, secure*, spooler*, yum.log*
  • Remove kernel messages by running the following commands:
    cd /var/log
    rm -f dmesg
    dmesg -c
  • Edit the /etc/modprobe.conf file and remove the following lines (and any other lines starting with "alias bond"):
    options bonding max_bonds=11
    alias bond0 bonding
    alias bond1 bonding
  • Edit the /etc/sysconfig/hwconf file and modify the "driver: mlx4_en" entry to read "driver: mlx4_core". Save the file after making changes.
  • Remove the Exalogic configuration file by running the following command:
    rm -f /etc/exalogic.conf
  • Remove bash history by running the following commands:
    rm -f /root/.bash_history
    history -c
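
If you template vServers regularly, most of these edits can be collected into a single cleanup script run as root on the clone. The sketch below simply replays the commands above, under the assumption that the Oracle Linux 5 paths used in this post apply to your image; the /etc/modprobe.conf and /etc/sysconfig/hwconf edits are left as manual steps, and history -c should still be run in your login shell before disconnecting.

#!/bin/sh
# unconfigure-clone.sh - sketch of the cleanup steps above; run as root on the clone
sed -i 's/^INITIAL_CONFIG=no/INITIAL_CONFIG=yes/' /etc/sysconfig/ovmd
sed -i '/.*/d' /etc/resolv.conf                      # remove DNS information
rm -f /root/.ssh/* /etc/ssh/ssh_host*                # remove SSH keys and host identity
sed -i '/^GATEWAY/d' /etc/sysconfig/network
sed -i '/localhost/!d' /etc/hosts
sed -i '/localhost/!d' /etc/sysconfig/networking/profiles/default/hosts
rm -f /etc/sysconfig/network-scripts/ifcfg-*eth*     # remove network scripts
rm -f /etc/sysconfig/network-scripts/ifcfg-ib*
rm -f /etc/sysconfig/network-scripts/ifcfg-bond*
cd /var/log && rm -f messages* ovm-template-config.log ovm-network.log \
    boot.log* cron* maillog* rpmpkgs* secure* spooler* yum.log* dmesg
dmesg -c > /dev/null                                 # clear the kernel ring buffer
rm -f /etc/exalogic.conf /root/.bash_history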

Once completed, stop the vServer from the command line, then log on to one of the hypervisor compute nodes. We need to copy the disk images and the vm.cfg file from the OVS repository into a scratch area where we will create the template.  The simplest mechanism on an Exalogic rack is to place them onto the handy ZFS appliance, which can then be made available via HTTP for Exalogic Control to upload the template. Thus the steps to follow are:-
  1. Mount a share on the compute node
    # mkdir /mnt/images
    # mount <ZFS Appliance IP>:/export/common/images /mnt/images
  2. Under the /OVS/Repositories directory will be a unique ID, then a directory called VirtualMachines. Under this directory are multiple directories named by their identifiers, each containing a vm.cfg file. This is one of the files we need to copy to the scratch area.
    # cd /OVS/Repositories/*/VirtualMachines
    # grep -i simple */vm.cfg
    Grepping for part of the cloned vServer's name ("simple" in this example) will enable you to spot the clone and hence identify the correct vm.cfg file.
  3. Copy the cloned vServer's vm.cfg to the scratch area.
    # cp <vServer ID>/vm.cfg /mnt/images
  4. Inside the vm.cfg file is a line that specifies the disks involved. Copy the disk image it names into the scratch area. (A representative example is shown after this list.)
  5. Create the template by simply creating a tar.gz file from the config file and the disk image.
    # cd /mnt/images
    # tar zvcf my_template.tar.gz vm.cfg <disk image ID.img>
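
The disk line in vm.cfg points at the image file under the repository. A representative entry is sketched below; the repository and disk IDs vary per rack, and the device name shown is illustrative. The copy is then just another cp to the scratch area:

# grep disk vm.cfg
disk = ['file:/OVS/Repositories/<repo ID>/VirtualDisks/<disk image ID>.img,xvda,w']
# cp /OVS/Repositories/<repo ID>/VirtualDisks/<disk image ID>.img /mnt/images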

Startup/Create Operations (Restore)

Now load the template into Exalogic Control and create a vServer from it. If the new vServer matches the old one perfectly and all your testing proves a successful duplicate, then all that remains is a tidy-up exercise:-
  • Delete the image file and config file from the location where we created the template. (You may want to delete the template as well, although it might be worth keeping as a historical archive; it will depend on how much free storage space you have.)
  • Delete the clone from OVMM. Make sure you mark all the volumes to be deleted.

For more complicated deployments, if you are moving your vServer to a new rack or recreating another instance, it is likely that changes will be required to configuration held on disk, to correct things such as IP addresses, mounts in /etc/fstab, the /etc/hosts file, etc.

Advantages/Disadvantages of this approach

Using the template capability has both advantages and disadvantages and it will depend on what you are aiming to achieve as to what backup approach you use.


Advantages:
  • Ability to make the backup portable to any Exalogic rack
  • A simple process

Disadvantages:
  • The existing vServer must be shut down, making its service unavailable for a period of time
  • Not able to recover individual files and directories without going through an entire process of creating another vServer and copying files back from this newly created vServer
  • Intensive work required to script it up for automated backup

Tuesday, 27 November 2012

Backup and Recovery of an Exalogic vServer via rsync

Introduction

On Exalogic a vServer will consist of a number of resources from the underlying machine. These resources include compute power, networking and storage. In order to recover a vServer from a failure in the underlying rack, all of these components have to be thought about. This article discusses only the backup and recovery strategies that apply to the storage of a vServer.

There are three general approaches that can be applied to the backup and restore process of a vServer. These being:-
  1. Use the ZFS storage capabilities to backup and restore the entire disk images.
  2. Use a backup mechanism, such as rsync, to copy data from the root disks of the vServer to a backup storage device.
  3. Template the existing vServer on a regular basis and use this template to create a new vServer to restore.

Backup using the ZFS appliance to back up full disks

This approach essentially uses the ZFS appliance to create a backup of the entire ExalogicRepo share, taking a copy of the full disk images. The restore is then done by recovering the root disks and any additional volumes for a vServer and replacing the existing images. As a process it is fairly simple to implement but has some limitations: for example, it does not enable migration from one rack to another, and even moving to a different physical host within a rack is involved. Similarly, restoring individual files or filesystems would mandate starting up the backup, copying the files off, shutting it down, reverting to the original and copying the files in.
To be certain of not having a corrupted backup it is also necessary to ensure that the vServer being backed up is not running at the time the backup/snapshot is taken.

Backup using backup technology from the vServer - rsync

Introduction

This approach makes use of a backup capability within the Linux environment of the vServer itself. It is very much the "standard" approach from the historical physical world, where a backup agent installed in the operating system backs up all the files to a media server; all the main backup vendors offer products providing these services. In this example we will use the Linux command rsync to back up to the ZFS appliance.

Backup using rsync & ZFS Appliance snapshot capability

The backup process incorporates configuring both the ZFS appliance and the vServer that is being backed up. The process to follow is:-
  1. Create backup share and configure it to regularly snapshot
  2. Mount backup share on vServer (Using NFS v3)
  3. Issue the rsync command to back up the full server on a regular basis (via cron)

Create Backup share

The first activity is to create a project/share to hold the backups of the vServers. Once the filesystem has been created, ensure that you set up the system to automatically create regular snapshots of the share. In the graphic below the share has been set up to snapshot daily at 1am and to keep one week's worth of snapshots on the storage appliance.



You should also set up replication to push the backups to a remote location for safekeeping. This is a simple activity of setting up a replication target under the Configuration/Services/Remote Replication tab, then defining the replication settings for the share (or at the project level).
Make sure the share has root squash enabled, with an NFS exception granting root access to the vServer performing the backup.

Mount the share on the vServer

It is now possible to mount the share on the vServer. This can be done dynamically at the point in time the backup is performed, or via a permanently mounted share.
It is necessary to mount the share using NFS v3. This is because a number of specialist users (e.g. the ldap user) will be set up on the vServer with ownership of certain files. Because NFS v4 performs a user-based security check, these files may fail to back up successfully, so NFS v3 is a better bet.

If using a permanent mount point defined in /etc/fstab then there should be a line similar to that shown below.
...
<IP/Host of storage appliance>:/export/backups/vservers /u02/backups nfs rw,bg,hard,nointr,rsize=131072,wsize=131072,tcp,vers=3 0 0
...

However, the general advice would be to mount the share specifically for the backup and then unmount it, so that under normal usage of the vServer the backup is not visible to users of the system. This is the mechanism that the linked script uses.
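
A minimal sketch of that mount-backup-unmount pattern, using the same share and mount point as the fstab example above:

# mkdir -p /u02/backups
# mount -o rw,vers=3 <IP/Host of storage appliance>:/export/backups/vservers /u02/backups
# (run the rsync backup shown below)
# umount /u02/backups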

On an Exalogic, the initial backup of a simple vServer containing nothing but a deployment of WebLogic took just over 6 minutes. Subsequent backups make use of the intelligence built into rsync to copy only the changes, so following runs completed in ~30 seconds. Obviously, if there had been a lot of changes to the files this number would increase back towards the original 6 minutes.

vServer configuration for backing up

rsync is a fairly simple command to use; however, the setup required to ensure it copies the correct files to an appropriate remote location is more complex. The basic command is shown below, with the restore being a reversal of the command.

# rsync -avr --delete --delete-excluded --exclude-from=<List of files to exclude> <Backup from> <backup to>
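
The exclusion list is a plain text file of patterns, one per line. As a hypothetical illustration (the real list for the base image is built by the script described below), a minimal list for a Linux root filesystem would leave out the virtual and transient filesystems:

# cat /root/rsync_exclude.list
/proc/*
/sys/*
/dev/*
/tmp/*
/mnt/*
/media/*
/var/tmp/*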

However, to simplify the setup I have created a short script that makes use of the Exalogic ZFS appliance and excludes files appropriate for the Oracle Linux base image. The script I used can be found here and its usage is shown below.
donald@esat-df-001:~/consulting/oracle/exalogic/bin/backup/rsync$ ./rsync_backup-v1.0.sh -help
rsync_otd_backup.sh 
-action=(backup|restore) : [backup]
-nfs_server=<IP of NFS storage device> : [nfs-server]
-nfs_share_dir=<Directory of NFS share> : [/export/backups/vservers]
-mount_point=<Directory of mount point on local machine> : [/mnt/backups]
-backup_dir=<root directory for backups under the mount point> : [esat-df-001]
-directory_to_backup=<Source directory for backing up.> : [/]
-automount
-script

If automount is not specified the system will assume that the mount point defined already exists
-script is used to indicate that the script is run automatically and should not prompt the user
for any input.

Each parameter can be defined from the command line to determine the configuration; however, if called automatically (from cron for example) you must include the -script option, otherwise it will prompt for confirmation that the configuration is correct.  The defaults are all set up within the script itself, inside the setup_default_values function at the top; these should be changed to suit your environment.  Similarly, the function create_exclusion_list contains a list of files/directories that will not be backed up/restored, primarily because these directories are specific to attached devices, or are temporary or cache files. The list here is what I have found works using Oracle Linux 5.6 but will need to be reviewed for your environment.
To perform the backup, the simplest approach is to set up cron to run the job. I ran the backup hourly, with the ZFS appliance keeping a copy on a daily basis, but the specific backup frequency needed will vary from environment to environment. An example of the crontab file used is shown below.
[root@esat-df-001 ~]# crontab -l
10 * * * * ./rsync_backup.sh -action=backup -script -nfs_server=172.17.0.17 -nfs_share_dir=/export/backups/vservers -mount_point=/mnt/backups -backup_dir=esat-df-001 -directory_to_backup=/
[root@esat-df-001 ~]#


Restore using rsync

The restore process is approximately a reverse of the backup process; however, there are various options that make this approach flexible. These being:-
  1. The ability to restore individual files or filesystems to the vServer
  2. A complete restore from backup of vServer
  3. The recreation of a vServer on another host/rack, restoring to the values defined in the backup.
These options can all be fulfilled by the use of rsync with varying degrees of manual intervention or different restore commands.

Recreating a vServer and restoring from backup

Should a vServer become corrupt or be deleted (deliberately or accidentally), it may be necessary to recreate the vServer from a backup. Assuming that the vServer is to have at least the same public IP address as the previous server, the first activity is to allocate that IP address to the new vServer that will be created. This is done by simply allocating the IP address and then, during the vServer creation process, defining the network to have a static IP address.




Ensure that the vServer you create has a similar disk partitioning structure to the original. It is perfectly OK for the partitioning to be done differently, but it will then be necessary to change the backed-up /etc/fstab file to match the new vServer layout, and to create the filesystems and the same mount points.
Thus the activities to perform/consider on creation are:-
  1. Ensure the disk size/additional volumes are created as needed.
  2. Allocate IP address for any IPs that are to be recreated in the new vServer. Statically assign them to the vServer during creation.
  3. After first boot
    1. Format and mount volumes/additional disk space as needed.
    2. For all the NFS mounts that were on the previous vServer re-create the mount points. (All defined in the backup copy of /etc/fstab)
    3. Ensure disk partitions/volumes are mounted such that the vServer has similar storage facilities to the original.
  4. Restore from backup.
  5. Edit files to correct for new environment
    1. Edit /etc/hosts to make any changes necessary to IP addresses appropriate to the new vServer/environment
    2. Check the /etc/fstab file and correct it according to the new partitioning/volumes attached, if changed from the original
  6. Reboot & test
Point 4 is where the rsync command is run to restore from the backup (a sketch is shown below). If you want to restore from one of the earlier snapshots, make sure that you use the ZFS appliance to create a new share from one of the snapshots, and then use that share to mount the backup and copy the files onto the new vServer.
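
Assuming the script's restore mode takes the same parameters as the backup (the values below are the ones used in the crontab example earlier), the restore run would look something like:

# ./rsync_backup.sh -action=restore -script -nfs_server=172.17.0.17 -nfs_share_dir=/export/backups/vservers -mount_point=/mnt/backups -backup_dir=esat-df-001 -directory_to_backup=/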

Backup by Templating an existing vServer (A later blog post....)