Thursday 14 November 2013

Virtualised Exalogic and External DNS Servers

When configuring Exalogic, issues often arise with accessing a DNS server, resulting in delays.  From a management perspective this generally reveals itself as a pause of 20-30 seconds when using ssh to connect to a server.  During management via Exalogic Control, DNS issues sometimes cause jobs to time out and hence fail.  From an application perspective it often shows up as shares on the shared storage taking a long time to become available, with slow creation or initial reads of files.

Virtual servers deployed onto Exalogic can easily be set up to access DNS over the 10GbE network in one of two ways.  The first is to configure the Network Services on the EoIB network.  (Select the network that gives access to the 10GbE on your rack and select the "Edit Network Services" action.)  The second is simply to edit the /etc/resolv.conf file on your vServer to point it to the DNS servers in the environment.  (This could be put into a template if this approach is preferred.)
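
For the /etc/resolv.conf approach, a minimal sketch of the edit is shown here.  The helper name write_resolv_conf is purely illustrative, and the addresses in the usage comment are assumptions (they reuse the forwarder addresses from the named.conf example later in this post); substitute the DNS servers for your own environment.

```shell
#!/bin/sh
# Hypothetical helper: write a resolv.conf-style file listing the given DNS servers.
# Usage: write_resolv_conf <target file> <dns server> [<dns server> ...]
write_resolv_conf() {
    target="$1"
    shift
    : > "$target"                      # truncate (or create) the target file
    for server in "$@"; do
        printf 'nameserver %s\n' "$server" >> "$target"
    done
}

# Example (assumed addresses; run as root on the vServer):
#   write_resolv_conf /etc/resolv.conf 10.5.5.4 10.5.5.5
```

Note that this overwrites the target file, so any existing search or domain lines would need to be added back.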

Editing network services in Exalogic Control
Note - Health Warning - Only attempt to change the network services if you are running Exalogic Elastic Cloud Software with a version of 2.0.6.0.0 or higher!

The shared storage is a slightly different kettle of fish.  When set up it has direct access to the 1GbE management LAN and it is normally through this network that it would gain access to services such as LDAP/NIS or DNS.  However the 1GbE network is not set up to be fault tolerant within Exalogic, so a fault-tolerant route through the 10GbE network should be created.  A DNS service that the shared storage can access is easily set up on a vServer, following the same principles as discussed in an earlier blog posting about setting up LDAP for access via internal vServers.

To achieve a similar setup for DNS the following steps should be done:-

  1. Create your vServer with access to at least the 10GbE and the vserver-shared-storage networks.  (Ensure it is marked for HA or alternatively plan for two vServers, both running DNS and part of a distribution group.)
  2. Configure the vServer to act as a DNS server.  This can be done using tools like dnsmasq or the named daemon from the bind package.  The example shown here uses bind to create the service.
    1. Set up a yum repository that your vServer can access.
    2. Install the bind package.
      # yum install bind --skip-broken
      (Notes:-
      • We include the option --skip-broken so that it does not upgrade the packages that bind relies on.  On the rack I tested there are other utilities that depend on the bind-libs package and upgrading it caused issues with the Infiniband network.  If this mismatch is simply ignored, the named daemon is installed and seems to operate successfully.
      • Not strictly necessary, but for testing purposes the unix command nslookup is quite handy.  If it is not already installed then install the bind-utils package.)
    3. Create the /etc/named.conf file with content along the lines of that shown below.

      # cat /etc/named.conf
      options {
          directory "/var/named";

          # Hide version string for security
          version "not currently available";

          # Listen to the loopback device and internal networks only
          listen-on { 127.0.0.1; 172.16.0.14; 172.17.0.41; };
          listen-on-v6 { ::1; };

          # Do not query from the specified source port range
          # (Adjust depending on your firewall configuration)
          avoid-v4-udp-ports { range 1 32767; };
          avoid-v6-udp-ports { range 1 32767; };

          # Forward all DNS queries to your DNS Servers
          forwarders { 10.5.5.4; 10.5.5.5; };
          forward only;

          # Expire negative answer ASAP.
          # i.e. Do not cache DNS query failure.
          max-ncache-ttl 3; # 3 seconds

          # Disable non-relevant operations
          allow-transfer { none; };
          allow-update-forwarding { none; };
          allow-notify { none; };
      };
    4. Start up the DNS daemon (named) to ensure it is OK.
      # service named start
    5. Set it up to start automatically at boot.
      # chkconfig named on
  3. Configure the storage to include the vServer's shared storage IP address in its list of DNS servers.  In our case it uses the internal vServer IP address of 172.17.0.41 first and falls back to the other DNS server addresses via the 1GbE network should that fail.

Configuring DNS on the ZFS Storage Appliance
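
Before pointing the storage at the new service it is worth checking it from the vServer itself.  The sketch below assumes the bind and bind-utils packages are installed (they provide named-checkconf and nslookup).  The function name check_dns_forwarder and the hostname in the usage comment are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: syntax-check named.conf, then resolve a name via a given DNS server.
# Usage: check_dns_forwarder <dns server ip> <hostname to resolve> [<named.conf path>]
check_dns_forwarder() {
    server="$1"
    name="$2"
    conf="${3:-/etc/named.conf}"
    # named-checkconf exits non-zero if the configuration cannot be parsed
    named-checkconf "$conf" || return 1
    # Passing the server as the second argument makes nslookup query it directly
    nslookup "$name" "$server"
}

# Example, using the internal listen address from the configuration above:
#   check_dns_forwarder 172.17.0.41 www.example.com
```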

Thursday 17 October 2013

Exalogic and CPU Oversubscription

Introduction

Ever since the EECS release 2.0.4.0.0 the facility to oversubscribe the physical CPU on an Exalogic has existed.  The documentation explains how the oversubscription is set and introduces the idea of a ratio and the CPU cap.  However I found it slightly light on detail so this posting will attempt to explain further just how the "vCPU to Physical CPU Threads ratio" and the "CPU cap" interact with each other.

The settings for this feature are editable as part of the virtual Data Center, so they impact all tenants and users of the rack.  Figure 1 shows a screenshot of the configurable parameters.  These values can be changed at any time during the lifecycle of the virtual data-centre but the impact of changes must be understood.

Figure 1 - Editing the vDC properties for CPU oversubscription

vCPU to Physical CPU Ratio

The vCPU to pCPU ratio is used by the Exalogic Control placement algorithm.  By changing the ratio from 1:1 to say 1:2 you are effectively doubling the number of vCPUs that can be allocated to vServers in the datacenter.  When oversubscribed there is the potential for vServers to compete with each other for access to the actual CPUs; at this point the Xen scheduler will commence allocating access to the physical CPUs.

For example, consider a single compute node with 32 hardware threads (2 sockets * 8 cores * 2 threads per core = 32) on which we are placing vServers with 1 vCPU each.  With the ratio of 1:1 we would be able to place 32 vServers on the physical compute node.  With the ratio set to 1:2 we would be able to place 64 vServers on the compute node.
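
The arithmetic is simple enough to sketch in a few lines of shell.  The function name max_vcpus is just an illustration; the last argument is the N from a ratio written as 1:N.

```shell
#!/bin/sh
# Hypothetical helper: number of vCPUs the placement algorithm could allocate on one node.
# Arguments: sockets, cores per socket, threads per core, N from a vCPU:pCPU ratio of 1:N
max_vcpus() {
    echo $(( $1 * $2 * $3 * $4 ))
}

max_vcpus 2 8 2 1    # ratio 1:1 -> 32
max_vcpus 2 8 2 2    # ratio 1:2 -> 64
```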

This value can be changed at any time and the change will impact all vServers in the datacenter.  Increasing the ratio is not a problem, but reducing it, say from 1:2 to 1:1, is only valid if all existing vServers can still fit into the resized virtual data-centre.


Xen Hypervisor and CPU Scheduling

Underpinning the Exalogic rack is the Xen hypervisor, which has a scheduler that controls access to the physical CPUs.  The scheduler is similar in principle to the Linux scheduler, however it referees between running guest OSes or domains (including the dom0 domain!), ensuring that the compute power is shared out appropriately to all.  There are a number of scheduling algorithms available with Xen but the Credit Scheduler is the default and has had the most development and testing.  You can check which scheduler is running using the xm dmesg command.



# xm dmesg
 __  __            _  _    _   _____  _____     ____  __
 \ \/ /___ _ __   | || |  / | |___ / / _ \ \   / /  \/  |
  \  // _ \ '_ \  | || |_ | |   |_ \| | | \ \ / /| |\/| |
  /  \  __/ | | | |__   _|| |_ ___) | |_| |\ V / | |  | |
 /_/\_\___|_| |_|    |_|(_)_(_)____/ \___/  \_/  |_|  |_|
                                                        
(XEN) Xen version 4.1.3OVM (mockbuild@us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) Wed Dec  5 09:11:29 PST 2012
(XEN) Latest ChangeSet: unavailable
(XEN) Bootloader: GNU GRUB 0.97
(XEN) Command line: console=com1,vga com1=9600,8n1 dom0_mem=2G

...
(XEN) ERST table is invalid
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 3059.044 MHz processor.
(XEN) Initing memory sharing.
(XEN) Intel VT-d supported pag

...

On Exalogic when we create a vServer we define the number of vCPUs that are allocated to the guest.  Each vCPU equates to a single hardware thread and on creation of the vServer Xen will allocate the CPUs for the guest to use.  The xm vcpu-list command will show just which cores are allocated to a vServer.


# xm vcpu-list 
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
0004fb000006000036b78d88370acd11     5     0    22   -b-  315931.3 12-23
0004fb000006000036b78d88370acd11     5     1    15   -b-  210225.2 12-23
0004fb000006000036b78d88370acd11     5     2    17   -b-   94768.4 12-23
0004fb000006000036b78d88370acd11     5     3    13   -b-   99020.7 12-23
0004fb000006000036b78d88370acd11     5     4    19   -b-   97240.0 12-23
0004fb000006000036b78d88370acd11     5     5    16   -b-   90450.2 12-23
0004fb000006000036b78d88370acd11     5     6    20   -b-   85511.0 12-23
0004fb000006000036b78d88370acd11     5     7    14   -b-   74973.3 12-23
0004fb000006000036b78d88370acd11     5     8    23   -b-   75114.4 12-23
0004fb000006000036b78d88370acd11     5     9    16   -b-   65374.4 12-23
0004fb000006000036b78d88370acd11     5    10    14   -b-   64786.6 12-23
0004fb000006000036b78d88370acd11     5    11    20   -b-   64758.8 12-23

0004fb0000060000ea7b5bf71f3a806c     4     0     9   -b-  139777.8 0-11
0004fb0000060000ea7b5bf71f3a806c     4     1     2   -b-  173475.2 0-11
Domain-0                             0     0     4   -b-   69594.9 any cpu
Domain-0                             0     1     5   -b-   55519.9 any cpu

In this example we can see that there is a vServer running as ID 5 (actually the Exalogic Control vServer) which has been allocated 12 vCPUs, and Xen has determined that it will run on CPUs 12-23.  Similarly the vServer with ID 4 (the Exalogic Control proxy) has been allocated 2 vCPUs which will run on CPUs 0-11.

The credit scheduler will attempt to make sure that all vServers get access to the CPUs they have been allocated, and provided there is no contention (over-subscription) there will be very little for the scheduler to do.  However if the compute node has guests demanding more compute than it physically has then the scheduler will kick in and do its best to share out the resource according to the scheduling rules.  The rules are a credit-based scoring system which is based on two factors, a weight and a cap.
  • The weight is a number that indicates to the scheduler how much "credit" a vServer will get.  Thus a vServer with a weight of 1000 will get twice as much CPU as a vServer with a weight of 500, once the system is under contention for the CPU.
  • The Cap is an absolute limit on the amount of CPU time that a domain can be allocated, defined as a percentage of a single CPU.  Thus if set to 50 a domain will only be allowed half the available cycles of a pCPU.
An example might help explain how this operates.  Consider two domains that are over-subscribed on a single pCPU.  Initially the scheduler allocates a credit score to each domain.  The credit given is worked out from the weight: the higher the weight, the larger the credit score.  While a domain is consuming CPU cycles the scheduler steadily reduces its credit score.  The scheduler runs an accounting thread independently of the domains, and if the credit for the running domain drops (significantly) below the credit score of a contending domain then the other domain gets access to the processor and its credit score starts diminishing.  Periodically the accounting thread runs to top up all the credit scores.
If the cap is being used then you are effectively reducing the compute resource available to a domain, as any one vServer will only ever be allocated compute up to the cap level.
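
The proportional-share idea can be illustrated with some simple arithmetic.  This is a deliberately simplified model and not the Xen algorithm itself: it assumes single-vCPU domains all contending for one pCPU, each receiving a share proportional to its weight and limited by its cap (0 meaning no cap).  The function name contended_share is hypothetical.

```shell
#!/bin/sh
# Simplified model (an assumption, not Xen's implementation): percentage of one
# contended pCPU that a domain receives.
# Arguments: domain weight, sum of all contending weights, cap in % (0 = uncapped)
contended_share() {
    share=$(( $1 * 100 / $2 ))          # integer share proportional to weight
    if [ "$3" -gt 0 ] && [ "$share" -gt "$3" ]; then
        share=$3                        # the cap is an absolute upper limit
    fi
    echo "$share"
}

contended_share 1000 1500 0     # weight 1000 against a weight-500 domain -> 66
contended_share 500 1500 0      # the weight-500 domain -> 33
contended_share 1000 1500 50    # same weights, capped at 50 -> 50
```

The real scheduler works with credits debited over time rather than a one-off division, but the long-run split under contention follows the same proportions.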

On an Exalogic the weight is automatically assigned to a vServer and all vServers are given the same weight.  The Cap is configurable in Exalogic Control.  It is set at the vDC level and the value becomes embedded into the vServer configuration file, so changing the value in the vDC will only impact vServers created after the change.  Example output of the xm sched-credit command on a couple of compute nodes is shown below.


[root@el01cn01 root]# xm sched-credit
Name                                ID Weight  Cap
0004fb000006000071d34e42e43bd82b    12    256    0
0004fb0000060000bb90063bf8efe7d7    13    256    0
Domain-0                             0    256    0


[root@el01cn01 root]# ssh root@el01cn07 xm sched-credit
Name                                ID Weight  Cap
0004fb000006000082dbcec62a976907    15  27500   90
Domain-0                             0    256    0

In this case we can see first the output on node 1, which shows two of the control stack vServers with a weight of 256, and then the output from compute node 07, which has a customer vServer with a weight of 27500.  Dom0 gets a weight of 256, as per the control stack.  For the customer vServer the vDC had been changed to make the Cap 90% so it will never use more than 90% of a vCPU.

The Weight is automatically set by Exalogic Control and the Cap is set to whatever the vDC value is at the time that the vServer is created.  These values are put into the vm.cfg file used to hold the configuration of the vServer.

[root@el01cn01 0004fb000006000082dbcec62a976907]# cat vm.cfg
kernel = '/usr/lib/xen/boot/hvmloader'
vif = []
OVM_simple_name = 'test-vserver'

...
name = '0004fb000006000082dbcec62a976907'
vncpasswd = ''
cpu_weight = 27500
pae = 1
memory = 4096
cpu_cap = 90
OVM_high_availability = True
...
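
To pick out just the scheduler settings from a vm.cfg file something like the following can be used.  The function name show_sched_settings is illustrative and the path to vm.cfg is left for you to supply, as it depends on where the vServer lives in the repository.

```shell
#!/bin/sh
# Hypothetical helper: print the credit scheduler settings held in a vm.cfg file.
# Usage: show_sched_settings <path to vm.cfg>
show_sched_settings() {
    grep -E '^cpu_(weight|cap)[[:space:]]*=' "$1"
}

# Example, run from the directory holding the vServer configuration:
#   show_sched_settings ./vm.cfg
```

The values printed can then be compared with the live values reported by xm sched-credit.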

Managing this on the Exalogic

So the key understanding required is:-
  1. The vCPU:pCPU ratio only impacts the vServer placement algorithm and enables Exalogic Control to place vServers onto the rack such that there can be more vServers demanding vCPU than there are available pCPUs.
    1. This impacts all vServers, those previously created and those still to be created.
    2. Changing this will not reduce the CPU available to a given vServer until the system comes under CPU contention!
    3. The recommendation is not to make this ratio any larger than about 1:4, as for a small vServer (with just 1 vCPU) under contention the lack of compute power can lead to instability and timeouts.
  2. The CPU Cap is useful for the situation where you wish to use CPU oversubscription but want to have deterministic access to CPU.  Effectively this will reduce the power of your vServers but delay the time at which the Xen scheduler is used to control vServer access to physical CPU.  (Arguably rather than using the Cap it would be possible to simply reduce the number of CPUs allocated to each vServer and gain the same density/performance.)
    1. The Cap is set at the vDC level but changing it will not affect previously created vServers.  Thus it is possible to have a virtual deployment with different vServers having different caps.

Thursday 29 August 2013

Limiting OTD to listen only on the VIP address


In most production deployments OTD is likely to be deployed in a highly available configuration with two instances working as an active/hot-standby load balancing pair.  (See my earlier posting on running OTD HA.)  In production the environment will almost certainly have a number of security constraints put on it, one of which will be to keep the number of listening ports to an absolute minimum.  In the case of OTD this means that it should only listen for incoming requests on the Virtual IP address, whereas by default the listener will listen on all interfaces for the given port.

Thus we want to set up the configuration to listen on just the VIP, as shown below.



Where the IP Address is the IP address for the VIP.

Having done this an attempt to start up the server instance fails with the error messages shown below.


./startserv
Oracle Traffic Director 11.1.1.7.0 B01/14/2013 04:13
[ERROR:32] startup failure: could not bind to <Virtual IP>:8080 (Cannot assign requested address)
[ERROR:32] [OTD-10380] http-listener-1: http://<Virtual IP>:8080: Error creating socket (Address not available)
[ERROR:32] [OTD-10376] 1 listen sockets could not be created
[ERROR:32] server initialization failed

Alternatively if you attempt to start the instance via the GUI then an error message similar to that shown below will appear.



The reason for this failure is that the VIP is only ever active on one node at a time.  When the instance attempts to start up, if the vServer it is on has not yet brought up the VIP, or the VIP is assigned to the other vServer in the HA group, then it is impossible to bind to that address.

Linux has the ability to allow binds to non-local IP addresses using the kernel parameter net.ipv4.ip_nonlocal_bind.  Setting this variable to 1 allows the OTD instance to start up even though the IP address is not currently local to the running process.  To set this up simply edit the /etc/sysctl.conf file and add the parameter with the value of 1.




# tail /etc/sysctl.conf

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 250000
vm.min_free_kbytes = 524288

# Additional entry to allow non-local binds so that we only listen to the VIP.

net.ipv4.ip_nonlocal_bind=1 
#
# sysctl -p
#

Once set, issue the sysctl -p command to refresh the configuration and we can start up the OTD instance.

You can check the currently running value in the /proc system files.

# cat /proc/sys/net/ipv4/ip_nonlocal_bind
1

#
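
If the setting is to be scripted (for example when building a template) it helps to make the edit repeatable.  The sketch below is one way to do that; the function name ensure_sysctl_setting is hypothetical and the file to edit is passed in, so it can be tried against a copy first.

```shell
#!/bin/sh
# Hypothetical helper: make sure "key=value" is present exactly once in a sysctl file.
# Usage: ensure_sysctl_setting <file> <key> <value>
ensure_sysctl_setting() {
    file="$1"; key="$2"; value="$3"
    touch "$file"
    # Drop any existing line for this key (with or without spaces around "=").
    # Dots in the key act as regex wildcards, which is loose but harmless here.
    grep -v -E "^[[:space:]]*${key}[[:space:]]*=" "$file" > "${file}.tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "${file}.tmp"
    cat "${file}.tmp" > "$file"      # rewrite in place so ownership and mode are kept
    rm -f "${file}.tmp"
}

# Example (run as root), then reload the settings:
#   ensure_sysctl_setting /etc/sysctl.conf net.ipv4.ip_nonlocal_bind 1
#   sysctl -p
```

Running it twice leaves a single entry in the file, so it is safe to include in a script that may be re-run.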



Tuesday 18 June 2013

Integrating Enterprise Manager 12c with Exalogic

Overview

This posting provides an overview of how Enterprise Manager 12c has been integrated with Exalogic.  It will then dive into the installation process, providing an overview of the activities that will complement the documentation.  (For integrating with Exalogic 2.0.6.0.n read this posting in conjunction with the details shown here.)

The versions used in this example are:-
  • Virtualised Exalogic - 2.0.4.0.2
  • Enterprise Manager (EM12c) - 12.1.0.2.0
Oracle Enterprise Manager is a powerful tool that can be used to manage a large enterprise compute facility, looking after both the hardware and the software running on it (apps to disk management).  The normal operating model is a central Oracle Management Service (OMS), which consists of an application hosted in WebLogic Server with an underlying database to hold configuration and state information.  It communicates with the services it manages via an agent that is deployed onto each operating system.  Via the use of plugins and extensions it has specific knowledge of each environment and hence can present an appropriate monitoring and management screen.

For Exalogic there are several plugins that enable it to create a powerful view onto the rack and monitor it from apps to disk.  These plugins include:
  1. ZFS Storage - A specific plugin allows EM12c to communicate with the ZFS appliance within the Exalogic rack to monitor the status of the storage.
  2. Virtualisation - A plugin allows communication with the Oracle Virtual Machine Manager system used in Exalogic to provide details of how the virtual infrastructure is deployed and a view onto each virtual machine (vServer) created.
  3. Exalogic Elastic Cloud/Fusion Middleware - This plugin links in with the Exalogic Control infrastructure and gives information on the state of the physical environment.  It also links into agents deployed onto the vServers and provides a central view on the middleware software that can be deployed onto Exalogic.  (Built in understanding of WebLogic domains, applications deployed, Oracle Traffic Director installations and Coherence clusters.)
  4. Engineered Systems Healthchecks - A plugin that integrates with the exachk scripts to highlight any configuration inconsistencies.
The diagram below depicts a deployment topology for EM12c to monitor Exalogic.   There are more complex options available to make EM12c highly available and to manage firewalls and proxying of communications.  This blog posting is only really considering a basic installation for managing Exalogic.

OMS Deployment to monitor and manage an Exalogic rack
There are plenty of alternate network configurations and deployment options that could be considered.  The key thing is that the OMS server should have a network path to both the Exalogic Control vServers (OVMM & EMOC) and to the client created vServers that will be running the applications.

For example, in a purely test setup we have in the lab we actually run the OMS and OMS repository in a vServer on the Exalogic rack and make use of the IPoIB-virt-admin to give the OMS server suitable access to all the vServers on the rack.  This is great for test and demonstration purposes but in a large enterprise it is likely that the Enterprise Manager configuration will sit externally to the Exalogic.

This posting assumes that you already have an instance of Enterprise Manager 12c operational in your environment.  Details on the installation process can be found in the documentation.   This posting will continue to consider all the steps involved in configuring EM12c to monitor the Exalogic rack.

The installation documentation can be found here :-

EM12c Exalogic Configuration

As an overview the process is:-
  1. Get the correct versions of the software (plugins & EM12c) installed
  2. Deploy agents onto the OVMM & EMOC vservers in the Exalogic Control stack
  3. Deploy the ZFS Storage appliance plugin to monitor the storage
  4. Deploy the Exalogic Elastic Cloud plugin to get the Exalogic monitored.
  5. Deploy the Oracle Virtualization plugin to monitor the OVMM environment
  6. If deploying hosts onto the vServers, set up your vServers as needed
  7. Optional - Deploy the Engineered System Healthchecks

Prerequisites

The process of installing/configuring the various components to allow the Exalogic to be monitored in EM12c involves a number of prerequisite activities.

Ensuring you get the correct plugins

EM12c makes heavy use of plugins. Plugins are managed from the Extensibility menus. (Setup --> Extensibility --> "Self Update" or "Plug-ins")
If you have set up your EM12c instance in a network location that has access to the internet then you can automatically pick up the Oracle plugins from a well-known location. Simply click where it says "Online" or "Offline" beside the Connection Mode under the Status in the Self Update page. If you are not able to access the internet then use the Offline mode.  The tab shows the location for the em_catalog.zip; download this, move it to the OMS server and then Browse/Upload the file or make use of the command line (# emcli import_update_catalog -file <path to zip> -omslocal).
Once uploaded, on the "Plug-ins" page ensure that the following plugins are downloaded and shown as "On Management Server":
  • Oracle Virtualisation (12.1.0.3.0)
    • Note - This is not the most recent version as there is an incompatibility between 12.1.0.4.0 and the OVMM instance that runs as part of Exalogic Control. If you have 12.1.0.4.0 already deployed then undeploy it from the OMS instance.
  • Exalogic Elastic Cloud Infrastructure (12.1.0.1.0) - Not required for Virtual monitoring as the fusion middleware monitoring incorporates Exalogic.  Necessary for monitoring of a physical Exalogic rack.
  • Oracle Engineered System Healthchecks (12.1.0.3.0)
    • Not necessary for the general system monitoring but allows visibility and control over running exachk, the health checking tool for Exalogic.
  • Sun ZFS Storage Appliance (12.1.0.2.0)

 

Deploying Agents to EMOC & OVMM

For full integration with EM12c it is necessary to have agents deployed to both the OVMM and EMOC vServers. The agent binaries have already been deployed to the control vServers but, as EM12c does all the deployment itself, it is actually simpler to use the facilities of EM12c to deploy into a new directory. The following instructions will deploy the agents onto the rack:-
  1. Ensure you have an oracle user and known password on the vServers. (The oracle user is already present; as root use passwd to change the password to a known value.)
  2. Create a directory to host the agent. eg.
    # mkdir -p /opt/oracle/em12c/agent
  3. Make the directory for the agent owned by the oracle user. (Check the group ownership on each vServer, on the OVMM the oracle user is in the dba group while on EMOC it is in the oracle group.)
    # chown -R oracle:oracle /opt/oracle
  4. If the vServers are not setup for DNS then ensure that the fully qualified hostname for the OMS server is included in the /etc/hosts file.
  5. Add the Exalogic info file to the template.
    On the hypervisor (OVS) nodes of the Exalogic rack is an identifier file that specifies the rack identifier.  The file is /var/exalogic/info/em-context.info. In the template create an equivalent directory structure and copy the em-context.info file into this directory.
  6. Make a symbolic link from the sshd file in /etc/pam.d to a file called emagent.  (Allows actions to be performed on the vServer using credentials managed in LDAP.  See MOS note How to Configure the Enterprise Management Agent Host Credentials for PAM and LDAP (Doc ID 422073.1) for more detail.)
    # cd /etc/pam.d
    # ln -s sshd emagent

  7. Make the necessary changes to the sudoers configuration file (/etc/sudoers)
    1. Change Defaults !visiblepw to Defaults visiblepw
    2. Change Defaults requiretty to Defaults !requiretty
    3. Add the sudo permissions for the oracle user as shown below
      oracle ALL=(root) /usr/bin/id,/*/ADATMP_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[AP]M/agentdeployroot.sh, /*/*/ADATMP_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[AP]M/agentdeployroot.sh,/*/*/*/ADATMP_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[AP]M/agentdeployroot.sh,/*/*/*/*/ADATMP_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[AP]M/agentdeployroot.sh
  8. Now deploy the agent onto the vServer from the Enterprise Manager console.
    Setup --> Add Targets --> Add Target Manually & select the Add Host.... option and follow the wizard.
     

Setup to monitor the ZFS appliance

The ZFS appliance is monitored via an agent deployed to a device that has network access to the appliance.  On an Exalogic the recommendation is to use the EM12c agent deployed to the Exalogic Control EMOC vServer.

Before setting up the monitoring in EM12c we have to run through a "workflow" on the ZFS storage appliance itself that will setup a user with appropriate permissions to monitor the appliance.   The agent hosting the ZFS Storage plugin will then communicate with the ZFS appliance as this user to gather details on the current operation.

To achieve this, log onto the ZFS BUI as the root user and navigate to "Maintenance" --> "Workflows".  Then run the workflow called "Configuring for Oracle Enterprise Manager", which will create the user and an appropriate worksheet to allow the monitoring of the device.

Enabling the ZFS Storage for EM12c monitoring

This activity to create the user must be repeated on the second/standby storage head although it is not necessary to recreate the worksheet on the second head.

Once complete, on the plugin management page (Setup --> Extensibility --> Plug-ins) deploy the ZFS Storage Appliance plugin to the OMS instance and then to the EMOC agent. You can then configure the ZFS target, which is done via the Setup menu.
  • Setup --> Add Target --> Add Target Manually
  • Select "Add Non-Host Targets by Specifying Target Monitoring Properties"
    • Select the Target Type of Sun ZFS Storage Appliance & select the EMOC monitoring agent.
    • In the wizard give the name you would like the appliance to appear as in the EM12c interface and supply the credentials and IP address for the device. (Use the IB storage network, not the public 1GbE network IP.)
Once this completes it is possible to select the target for the storage appliance and view details on the shares created and the current usage of the device.

Monitoring of ZFS Storage Appliance

Deploying the Exalogic Infrastructure Plugin

There are a couple of steps to getting the environment setup for the Exalogic infrastructure plugin to operate correctly.
  1. Sort out the certificates so that the agent can communicate with the Ops Centre infrastructure of Exalogic Control
  2. Deploy/configure the plugin.

Managing the EMOC certificates

The first step is to ensure that the EM12c agent can communicate with the Ops Centre instance which is only available over a secure communications protocol. Because it uses a self-signed certificate it is necessary to include this certificate in the trust store of the agent.
  1. Export the certificate from the Ops Centre keystore. This is the keystore that is in the OEM installation on the ec1-vm vServers. (/etc/opt/sun/cacao2/instances/oem-ec/security/jsse) It is possible to use the JDK tools to extract the certificate.

    # cd /etc/opt/sun/cacao2/instances/oem-ec/security/jsse
    # /opt/oracle/em12c/agent/core/12.1.0.2.0/jdk/bin/keytool -export -alias cacao_agent -file oc.crt -keystore truststore -storepass trustpass

    Note 1 - The default password for the EMOC truststore is "trustpass".  Others have mentioned that the password was "welcome".  If trustpass does not work try out welcome.
    Note 2 - We explicitly use the keytool version that is shipped with the Oracle EM12c Agent (Java 1.6). The default version of Java on the Exalogic Control vServer is Java 1.4 and running the 1.4 version of keytool against the truststore will result in the following error:-

    # keytool -list keystore truststore
    Enter key store password: trustpass
    keytool error: gnu.javax.javax.crypto.keyring.MalformedKeyringException: incorrect magic
  2. Import the certificate you just exported into the agent's trust store. Ensure you import into the correct AgentTrust.jks file, specifically the one for the agent instance you are using and not (as the docs currently state) the copy in the agent binaries.
    # cd /opt/oracle/em12c/agent/agent_inst/sysman/config/montrust
    # /opt/oracle/em12c/agent/core/12.1.0.2.0/jdk/bin/keytool -import -keystore ./AgentTrust.jks -alias wlscertgencab -file /etc/opt/sun/cacao2/instances/oem-ec/security/jsse/oc.crt
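
To confirm that the import worked, the entry can be listed back out of the agent's trust store with the same keytool binary.  The wrapper below is only a sketch: the function name show_trusted_cert is made up, and the paths in the usage comment are the ones used in this post, so adjust them to your agent installation.

```shell
#!/bin/sh
# Hypothetical helper: show one entry from a Java keystore, or return 2 if keytool is unusable.
# Usage: show_trusted_cert <path to keytool> <keystore> <alias>
show_trusted_cert() {
    [ -x "$1" ] || { echo "keytool not found at $1" >&2; return 2; }
    # keytool prompts for the keystore password before listing the entry
    "$1" -list -keystore "$2" -alias "$3"
}

# Example, using the paths and alias from the steps above:
#   show_trusted_cert /opt/oracle/em12c/agent/core/12.1.0.2.0/jdk/bin/keytool \
#       /opt/oracle/em12c/agent/agent_inst/sysman/config/montrust/AgentTrust.jks wlscertgencab
```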

Deploying the Exa Infrastructure Plugin

There are a number of steps to getting the Exalogic Infrastructure plugin to monitor the rack.
  1. Deploy the Exalogic Elastic Cloud Infrastructure to the OMS server. (Setup --> Extensibility --> Plug-ins, select the Exalogic Elastic Cloud Infrastructure and from the actions pick "Deploy on >" & "Management Servers" )
  2. Deploy the plugin to the Ops Center (EMOC) vServer. Once the plugin has been deployed successfully to the OMS instance, use the same options as above but select "Deploy on >" & "Management Agent..." and pick the EMOC host agent.
  3. Now we want to run the Exalogic wizard to add the targets for the Exalogic rack itself. This is done via the Setup --> Add Target --> Add Targets Manually options. Then select "Add Non-Host Targets Using Guided Process (Also Adds Related Targets)", pick the Exalogic Elastic Cloud and click on "Add Using Guided Discovery" which will show the wizard as pictured below.

Discovery of Exalogic Elastic Cloud


This wizard appears to finish quickly and it is then possible to select the Exalogic from the Targets menu, however the system will be initialising and synchronising in the background so it takes a few minutes to get the full rack discovered. Once present the screenshots below show the monitoring of the hardware with a general picture for the rack and a couple of shots to show the Infiniband Network monitoring.


Exalogic Monitoring - Hardware view


Monitoring the Infiniband Fabric



Monitoring an Infiniband Switch

Deploying the OVMM Monitoring

The first thing to ensure is that the plugin version that is installed is the 12.1.0.3.0 version of Oracle Virtualization. The steps are similar to the steps for the Exalogic Infrastructure Plugin.

However prior to doing the deployment to Enterprise Manager the OVMM server should be setup to be read-only for the EM12c monitoring agent to use.  Follow these steps on the OVMM server to setup a user as read only.

Login to Oracle VM Manager vServer as oracle user, and then perform the commands in the sequence below.
  1. cd /u01/app/oracle/ovm-manager-3/ovm_shell
  2. sh ovm_shell.sh --url=tcp://localhost:54321 --username=admin --password=<ovmm admin user password>
  3. ovm = OvmClient.getOvmManager ()
  4. f = ovm.getFoundryContext ()
  5. j = ovm.createJob ( 'Setting EXALOGIC_ID' );
    The EXALOGIC_ID can be found in the em-context.info on dom0 located in the following file path location:
    /var/exalogic/info/em-context.info
    You must log in to dom0 as a root user to obtain this file. For example, if the em-context.info file content is ExalogicID=Oracle Exalogic X2-2 AK00018758, then the EXALOGIC_ID will be AK00018758.
  6. j.begin ();
  7. f.setAsset ( "EXALOGIC_ID", "<Exalogic ID for the Rack>");
  8. j.commit ();
  9. Ctrl/d
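
The lookup of the EXALOGIC_ID described in step 5 can be scripted. The following is a minimal sketch, assuming the em-context.info file holds a single line of the form shown in the example and that the identifier is always the last word of the value. The function name exalogic_id is made up for illustration.

```shell
#!/bin/sh
# Print the rack identifier from an em-context.info style file.
# Assumption: the file contains a line such as
#   ExalogicID=Oracle Exalogic X2-2 AK00018758
# and the identifier is the last whitespace-separated word of the value.
exalogic_id() {
    awk -F= '$1 == "ExalogicID" {
        n = split($2, words, " ")   # split the value on whitespace
        print words[n]              # the last word is the identifier
        exit
    }' "$1"
}

# Example (on dom0, as root):
#   exalogic_id /var/exalogic/info/em-context.info
```

The printed value is what would be supplied as the second argument to f.setAsset in step 7.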

Now deploy the OVMM virtualisation plugins to the OMS server:-

  1. Deploy the Oracle Virtualization plugin to the OMS server
  2. Deploy the Oracle Virtualization plugin to the agent running on the OVMM server.
  3. Run the add target wizard for the Oracle VM Manager.
    1. Setup --> Add Target --> Add Target Manually
    2. Select "Add Non-Host Targets by Specifying Target Monitoring Properties", then choose the target type of "Oracle VM Manager" and the monitoring agent for the OVMM server host.
    3. Enter the details on the wizard page (example shown below)
    4. Submit the job, wait a few minutes to allow the discovery to progress and then you can view the Target under Systems or all targets.

    Running the discovery wizard for the Exalogic Virtualised Infrastructure

Deploying the Engineered System Healthchecks

Both Exalogic and Exadata have a healthcheck script that can be run - exachk. On Exalogic the script can be downloaded from My Oracle Support and when run against an Exalogic rack it will check the configuration of the rack. The running of exachk will create output files that detail any issues found with the rack.  To integrate with Enterprise Manager it is necessary to change the behaviour of exachk so that it outputs files in an XML format that can be parsed by the EM12c plugin and presented to the OMS server in a format that it can understand and present on screen.  To modify the behaviour simply set an environment variable prior to running the exachk script - export RAT_COPY_EM_XML_FILES=1. You can also set RAT_OUTPUT=<output directory> to direct the output to a specific location. (The default behaviour is to put the output into the same directory as the exachk script is run from.)

The recommendation for a virtual Exalogic is to run the exachk utility on the EMOC vServer.

To install the plugin simply ensure that the "Oracle Engineered System Healthchecks" plugin is downloaded and installed onto the OMS server and to the agent deployed to the EMOC server.  Then create the target as per the OVMM mechanism. The wizard for the healthcheck simply requests the directory on the server where the output will be read from and the frequency of checking for new versions of the exachk output. (Default is 31 days.)  Then setup the EMOC server to run the exachk on a regular basis.  The output becomes available via the EM12c console and hence can be made available to specific users who may not actually have access to the rack itself.
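
As a rough illustration of the regular run, the sketch below wraps exachk so that the two environment variables mentioned above are always set. The function name run_exachk_for_em and the paths in the example are assumptions, and exachk may ask questions interactively depending on how it is invoked, so treat this as a starting point rather than a finished job.

```shell
#!/bin/sh
# Run exachk with the EM12c XML output enabled.
#   $1 = path to the exachk script (hypothetical location, adjust to suit)
#   $2 = directory the healthcheck target has been told to read from
run_exachk_for_em() {
    exachk_path=$1
    output_dir=$2
    mkdir -p "$output_dir" || return 1
    # Set both variables for this one invocation only.
    RAT_COPY_EM_XML_FILES=1 RAT_OUTPUT="$output_dir" "$exachk_path"
}

# Example, e.g. called from a cron job on the EMOC vServer (paths are made up):
#   run_exachk_for_em /opt/exachk/exachk /opt/exachk/em_output
```

The output directory passed here should be the same directory that was entered in the healthcheck target wizard.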


Monday 10 June 2013

Running OTD HA with minimal root usage


Introduction

An as yet undocumented new feature of Oracle Traffic Director 11.1.1.7 (OTD) is the ability to operate a highly available load balancing group within the Infiniband fabric while keeping usage of the root account to a minimum. In previous releases, enabling the high availability features of the failover group required a couple of the running OTD processes to be run with root privileges, which security conscious customers found disconcerting.
This blog posting is courtesy of one of my colleagues, Mark Mundy who has been pulling together some instructions on how to setup OTD in an HA configuration and using minimal root user privileges. 

Why is root needed at all?

Before we can demonstrate how we can minimise the use of root in an HA OTD set up it is worth explaining a little of where OTD requires root permission. This will hopefully give the reader an appreciation of why it is used and even in a minimal use scenario how little ‘damage’ using it to execute part of the process stack can do.
Oracle Traffic Director provides support for failover between the instances in a failover group by using an implementation of the Virtual Router Redundancy Protocol (VRRP). On Linux this implementation is keepalived.
Keepalived v1.2.2 is included in the current Exalogic Guest Base Template and so you do not need to install or configure it. Oracle Traffic Director uses only the VRRP subsystem. If you wish to discover more about keepalived go to http://www.keepalived.org.  (If you are using Solaris then the implementation is done using the Solaris vrrpd service.)
VRRP specifies how routers can failover a VIP address from one node to another if the first node becomes unavailable for any reason. The IP failover is implemented by a router process running on each of the nodes. In a two-node failover group, the router process on the node to which the VIP is currently addressed is called the master. The master continuously advertises its presence to the router process on the second node. Only the root user can start or stop this keepalived process which controls the VIP and so without root permission having a highly available VIP would not be possible. With the 11.1.1.7 release of OTD it is possible to configure a highly available VIP utilising keepalived and root however all other processes associated with OTD such as the instance, the primordial and watchdog will be executed as a non root user. No user data is exposed to the keepalived process.

Example OTD configuration

There are a number of ways that Oracle Traffic Director can be deployed and utilised but for the purposes of this example the simplest and most common approach has been adopted. This design utilises the Exalogic Storage Array for all of the OTD collateral utilising a number of shares within the same project. The design consists of three identical vServers created all with the same vServer type and networking capability. The diagram below should give an idea of the layout of both the vServers and what they are hosting as well as how they are utilising various shares on the storage array. Notice that the Admin Server and Admin Node 1 are using OTD binaries from one share while the Admin Node 2 uses a different one. This will ensure if there is a need to patch OTD it can be done with a degree of availability while this is happening. There is one share for the entire OTD configuration in this example.


The 2 vServers hosting admin nodes work as local proxies for the OTD administration server and it is on these nodes that the highly available instances will perform the actions as designed within the loadbalancing configuration. In previous releases of OTD with this setup it was possible to avoid using root to run the administration server but the admin nodes that run an HA loadbalanced instance required root for a large part, if not all, of their administration and execution. In the latest release this has changed and it is this that is exploited in this example.

Configuring the Admin Server node

The first vServer that needs to be configured is the one hosting the administration server. There is no need to use the root account on this vServer after the specific shares that are needed by OTD have been permanently mounted. This note will not go into the details of how this is achieved as it is assumed the reader is familiar with this. The first share that is required to be temporarily mounted is the /export/common/general share available on the Exalogic storage array. This needs to be mounted and the OTD 11.1.1.7 installer placed in a directory within it (available from Oracle edelivery) and unzipped. An example being /mnt/common/general/OTD11117/Disk1. This can then be un-mounted when OTD has been installed. There is also a need to permanently mount two shares one for the binaries and another for the configurations.



/export/OTDExample/Primary_Install -> /u01/OTD
/export/OTDExample/OTDInstances -> /u01/OTDInstances
In this example the user chosen to install and run the Admin Server is the pre-configured oracle user and so ensure that the share mount points are owned by the oracle user on the ZFS appliance.  (See my earlier blog posting about creating shares on the ZFS appliance.)

There is now no additional need to utilise the root account on this vServer. Everything in this section should now be performed as the oracle user or an equivalent non privileged user of your choice.
For simplicity OTD will be installed using the silent installer approach. Below is an example of how this can be achieved.


 
[oracle@OTDAdminServer ~]$ /mnt/common/general/OTD11117/Disk1/runInstaller -silent -waitforcompletion -invPtrLoc /home/oracle/oraInst.loc ORACLE_HOME=/u01/OTD/ SKIP_SOFTWARE_UPDATES=TRUE
  The ORACLE_HOME location is where the OTD binaries will be laid down and this will populate the primary binary location on the share. The invPtrLoc will need to point to a previously created file called oraInst.loc which should contain the location of the Oracle Inventory. In this example this file contains the following:


 
$ cd /home/oracle
$ cat oraInst.loc
inventory_loc=/home/oracle/oraInventory
$
 For more information on silently installing Oracle Traffic Director refer to the documentation
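
If the inventory pointer file does not exist yet, it can be created before running the installer. This is a small sketch only: the function name write_orainst_loc is made up, and it writes just the inventory_loc entry used in this example (other entries that some installations add to this file are not covered here).

```shell
#!/bin/sh
# Create an Oracle inventory pointer file.
#   $1 = pointer file to write, e.g. /home/oracle/oraInst.loc
#   $2 = inventory directory, e.g. /home/oracle/oraInventory
write_orainst_loc() {
    pointer_file=$1
    inventory_dir=$2
    mkdir -p "$inventory_dir" || return 1
    printf 'inventory_loc=%s\n' "$inventory_dir" > "$pointer_file"
}

# Example, run as the oracle user:
#   write_orainst_loc /home/oracle/oraInst.loc /home/oracle/oraInventory
```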
Once the installation is complete the OTD admin server can be created and started. To create the admin server the following command can be used:


 
[oracle@OTDAdminServer ~]$ /u01/OTD/bin/tadm configure-server --user=admin --java-home=/u01/OTD/jdk/ --instance-home=/u01/OTDInstances/
 When this is executed you will be prompted to provide an OTD admin user password of your choice and then the admin server will be created with its home directory under the /u01/OTDInstances share called admin-server
The admin server can now be started with the following command


 
[oracle@OTDAdminServer ~]$ /u01/OTDInstances/admin-server/bin/startserv
 When the admin server has started it will be possible, as the output will note, to access the admin console from a browser.
Log in and create a new test configuration of your choosing, providing a fake origin server so as to complete the configuration. This is for demonstration only and the configuration can later be deleted and replaced with a production one. Do not deploy the configuration at this stage. Leave the admin console open and, for the test configuration, enable the plain text report in the virtual server monitoring section. This will later let us confirm that everything is working as expected in terms of the HA element of OTD.



This completes the work to get the admin server operational and, as you can see, in terms of OTD there has been no use of the root privilege to configure, start or stop it.

Configuring the Admin nodes

The admin nodes now need to be created and started, and this is where earlier releases of OTD had a heavier requirement for the root account. The new release needs far less: it is now possible to create, start and stop admin nodes without root. In this section we will do this.

Setup the first Instance node

After logging on to the first of the admin nodes (OTDNode1) as root there is an initial requirement to set up permanent shares to both the primary OTD install location and the configuration, as there was with the admin server.  These shares are the same as were mounted for the Administration Server.  The mounting of these shares is the only time you require root access for this section.
Using a non privileged account such as the pre-configured oracle account you can now create an admin node and register it with the admin server previously created and started. Prior to running the command ensure the following.
  1. Create a sub-directory under /u01/OTDInstances with the hostname of this vServer
  2. Ensure that the admin server hostname is resolvable on the private IPoIB vNet in the /etc/hosts file
  3. Ensure that the admin node host name is resolvable in the admin server /etc/hosts file on the IPoIB private vNet created
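
The host name checks in the list above can be scripted. The sketch below is one possible approach, assuming names are looked up in a hosts file exactly as described; the function name host_in_hosts_file is made up for illustration. It ignores comments and compares names without regard to case.

```shell
#!/bin/sh
# Succeed if host name $1 is listed as a name or alias in hosts file $2.
host_in_hosts_file() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                      # drop comments
        {
            # Field 1 is the address, remaining fields are names and aliases.
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(name)) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$2"
}

# Example, on the admin node:
#   host_in_hosts_file OTDAdminServer /etc/hosts || echo "add OTDAdminServer to /etc/hosts"
```

The same check can be run on the admin server vServer with the admin node host name as the first argument.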
The following example command will show the creation of the admin node.


 
$ /u01/OTD/bin/tadm configure-server --user=admin --java-home=/u01/OTD/jdk/ --instance-home=/u01/OTDInstances/OTDNode1 --server-user=oracle --admin-node --host=OTDAdminServer
The shared primary binary location is used to launch the command and you will notice that the admin node server-user is the non privileged oracle user. After providing the admin password and accepting the self signed certificate the admin node should be created.
It is now possible to start the admin node using the startserv script located in the OTDNode1 bin directory. Note previously in order to utilise an HA OTD configuration this would need to be executed as root or as a user in the sudoers list.

Setup the Second Instance Node

Now that the first admin node is running it is possible to do the same with the second admin node.
Before the admin node can be created and started, OTD needs to be installed into the secondary binary location using a variation of the silent install used for the admin server. This will mean temporarily mounting the /export/common/general share and then using the same kit to install OTD to the Secondary_Install share.
The second admin node should have the following shares mounted permanently


/export/OTDExample/Secondary_Install -> /u01/OTD
/export/OTDExample/OTDInstances -> /u01/OTDInstances
  
Thus the second instance will run binaries from a separate install from the other instances but the running configuration is located under the same share.

Prior to running the create command ensure the three pre-requisites as above are complete then you can run as an example


$ /u01/OTD/bin/tadm configure-server --user=admin --java-home=/u01/OTD/jdk/ --instance-home=/u01/OTDInstances/OTDNode2 --server-user=oracle --admin-node --host=OTDAdminServ
 
 Once created, the second admin node can be started using the startserv script under the OTDNode2 instance directory.  

Deploying a basic configuration

The next step is to utilise the admin console or indeed the command line to deploy the current basic configuration and create instances of it on the 2 admin nodes already running. This will not mean we have a highly available loadbalancing pattern but will at this stage mean we have 2 instances hosting the loadbalancing configuration that can be independently accessed on the public EoIB ip addresses assigned to each of the vServers when they were created. We will use the admin console to firstly ensure the 2 admin nodes are available and running and then to deploy the configuration to them both.

Here we can see we have 2 operational admin nodes with no instances deployed to them.
By hitting the deploy button to the top right and selecting the 2 otd admin nodes and NOT the admin server we can deploy the configuration and have 2 instances created one on each node.

The instances can now be started from the admin console. To verify the 2 instances are working it is possible in separate browser tabs to access the instance on the public EoIB ip address assigned to it and get performance metrics from it.  An example url is shown below and by using the IP address for each of the instances then metrics from both instances will be displayed.

http://<OTDNode1-EoIB-IP>:8080/.perf
At this stage it is also possible to verify that the entire OTD process tree for an admin node is all executing as the non privileged user as this example shows:


oracle 19808 0.0 0.0 25648 812 ? Ss May14 0:00 trafficd-wdog -d /u01/OTDInstances/OTDNode1/admin-server/confi
oracle 19809 0.0 0.3 149568 15156 ? S May14 0:17 \_ trafficd -d /u01/OTDInstances/OTDNode1/admin-server/config
oracle 19810 0.1 3.5 769996 141768 ? Sl May14 1:23 \_ trafficd -d /u01/OTDInstances/OTDNode1/admin-server/co
oracle 19897 0.0 0.0 11560 968 ? Ss May14 0:00 \_ /u01/OTD/lib/Cgistub -f /tmp/admin-server-4baa05d0
oracle 12508 0.0 0.0 11560 340 ? S 02:13 0:00 \_ /u01/OTD/lib/Cgistub -f /tmp/admin-server-4baa
oracle 12609 0.0 0.0 25648 812 ? Ss 02:16 0:00 trafficd-wdog -d /u01/OTDInstances/OTDNode1/net-test/config -r
oracle 12610 0.3 0.3 135016 12232 ? S 02:16 0:00 \_ trafficd -d /u01/OTDInstances/OTDNode1/net-test/config -r
oracle 12611 0.0 0.5 259904 23304 ? Sl 02:16 0:00 \_ trafficd -d /u01/OTDInstances/OTDNode1/net-test/config

Creating a failover group

Now we need to create a failover group to enable the configuration deployed to be made highly available. It is at this point that there is still a requirement to use the root privilege. The first stage in enabling an HA loadbalancing configuration is to create a failover group. This does not require root permission to create, however it will need root permission to activate. Creating a failover group can be done either via the command line or the admin console. For this example we will use the command line. From any of the three vServers log in as the oracle user and issue the following example command to create a new failover group within the configuration already created and active.

[oracle@OTDAdminServer ~]$ /u01/OTD/bin/tadm create-failover-group --config=test --virtual-ip=<ip_on_public_EoIB> --primary-node=otdnode1 --backup-node=otdnode2 --router-id=230 --verbose --port=8989 --user=admin --host=OTDAdminServer --network-prefix-length=21
Enter admin-user-password>
OTD-63008 The operation was successful on nodes [otdnode1, otdnode2].
Warning on node 'otdnode2':
OTD-67334 Could not start failover due to insufficient privileges. Execute the start-failover command on node 'otdnode2' as a privileged user.
Warning on node 'otdnode1':
OTD-67334 Could not start failover due to insufficient privileges. Execute the start-failover command on node 'otdnode1' as a privileged user.
  
The failover group is created however you will see that 2 warnings are given. This is because although the non privileged user can create a failover group, it does not have permission to start the keepalived process. In our scenario the 2 instances are already running and so this warning is generated. This is the first of the changes in 11.1.1.7 to be encountered. The new command start-failover allows only what is required to be started as root to be performed separately. In order to complete the ability to run OTD in an HA configuration the command needs to be run as root or as a non privileged user who is part of the sudoers list locally on each of the admin nodes. This warning would not be seen if the instances were not already running but if an attempt was made to start an instance as a non privileged user in a failover group a warning would be issued about the need to start the failover group separately.
In order to minimise the use of root the preferred way to issue the start-failover command is via the non privileged user after adding it to the sudoers list. The specific permission required to allow this is as follows for the non privileged user and in this case we use the oracle user.

# cat /etc/sudoers | grep oracle
oracle ALL=(root) /u01/OTD/bin/tadm
 To set this up as root on each of the 2 admin nodes execute visudo and add the line to give the oracle user the privilege to run the tadm command.
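
If you want to narrow the privilege further, the sudoers rule can in principle be restricted to the two failover sub-commands rather than the whole of tadm. The following is a sketch under that assumption and has not been taken from the OTD documentation; the function name failover_sudoers_rule and the drop-in file name are made up, and sudoers argument wildcards are fairly loose, so review the result against your own security policy.

```shell
#!/bin/sh
# Print a sudoers rule that allows only the failover sub-commands of tadm.
#   $1 = non privileged user, e.g. oracle
#   $2 = full path to tadm, e.g. /u01/OTD/bin/tadm
failover_sudoers_rule() {
    user=$1
    tadm=$2
    printf '%s ALL=(root) %s start-failover *, %s stop-failover *\n' \
        "$user" "$tadm" "$tadm"
}

# Example (as root), assuming /etc/sudoers includes the /etc/sudoers.d directory:
#   failover_sudoers_rule oracle /u01/OTD/bin/tadm > /etc/sudoers.d/otd-failover
#   chmod 0440 /etc/sudoers.d/otd-failover
#   visudo -cf /etc/sudoers.d/otd-failover    # syntax check before relying on it
```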

Once this has been set up the start-failover command can be issued locally on each of the admin node vServers as this example shows.


$ sudo /u01/OTD/bin/tadm start-failover --instance-home=/u01/OTDInstances/OTDNode1 --config=test
[sudo] password for oracle:
OTD-70198 Failover has been started successfully

It is now possible from a browser to access the text based performance statistics as before but on the public vip assigned to the HA OTD configuration. A quick look at the process tree for an OTD admin node will now show what this new command has done.

oracle 19808 0.0 0.0 25648 812 ? Ss May14 0:00 trafficd-wdog -d /u01/OTDInstances/OTDNode1/admin-server/config -r /u01/OTD -t /tmp/admin-server-4baa05d0 -u orac
oracle 19809 0.0 0.3 149568 15156 ? S May14 0:17 \_ trafficd -d /u01/OTDInstances/OTDNode1/admin-server/config -r /u01/OTD -t /tmp/admin-server-4baa05d0 -u oracl
oracle 19810 0.1 3.5 770244 142548 ? Sl May14 1:26 \_ trafficd -d /u01/OTDInstances/OTDNode1/admin-server/config -r /u01/OTD -t /tmp/admin-server-4baa05d0 -u o
oracle 19897 0.0 0.0 11560 968 ? Ss May14 0:00 \_ /u01/OTD/lib/Cgistub -f /tmp/admin-server-4baa05d0/.cgistub_19810
oracle 12508 0.0 0.0 11560 340 ? S 02:13 0:00 \_ /u01/OTD/lib/Cgistub -f /tmp/admin-server-4baa05d0/.cgistub_19810
oracle 12857 0.0 0.0 25648 808 ? Ss 02:25 0:00 trafficd-wdog -d /u01/OTDInstances/OTDNode1/net-test/config -r /u01/OTD -t /tmp/net-test-b92c33b1 -u oracle
oracle 12858 0.0 0.3 135016 12236 ? S 02:25 0:00 \_ trafficd -d /u01/OTDInstances/OTDNode1/net-test/config -r /u01/OTD -t /tmp/net-test-b92c33b1 -u oracle
oracle 12859 0.0 0.5 259908 23308 ? Sl 02:25 0:00 \_ trafficd -d /u01/OTDInstances/OTDNode1/net-test/config -r /u01/OTD -t /tmp/net-test-b92c33b1 -u oracle
root 12986 0.0 0.0 35852 504 ? Ss 02:29 0:00 /usr/sbin/keepalived --vrrp --use-file /u01/OTDInstances/OTDNode1/net-test/config/keepalived.conf --pid /tmp/net-
root 12987 0.0 0.0 37944 1012 ? S 02:29 0:00 \_ /usr/sbin/keepalived --vrrp --use-file /u01/OTDInstances/OTDNode1/net-test/config/keepalived.conf --pid /tmp/
As you can see, now only the keepalived processes are running as root with everything else run as oracle. In previous releases a lot more of the process tree would have been run as root in order to achieve the same thing.
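
This check can be automated so that a regression, such as someone starting an instance as root, gets noticed. The sketch below is an illustration only: the function name is made up, and it assumes the process names reported by ps are trafficd, trafficd-wdog and Cgistub as seen in the listings above.

```shell
#!/bin/sh
# Read "USER COMMAND" pairs on stdin and print the name of any OTD process
# that is owned by root. keepalived is expected to run as root and is
# therefore not reported.
otd_procs_running_as_root() {
    awk '$1 == "root" &&
         ($2 == "trafficd" || $2 == "trafficd-wdog" || $2 == "Cgistub") {
             print $2
         }'
}

# Example, on an admin node:
#   ps -e -o user= -o comm= | otd_procs_running_as_root
```

No output means that only keepalived, if anything, is running as root.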

Starting and stopping the instances hosting the failover group

There are some important points to note about how loadbalancing instances are stopped and started in a minimal root setup. As the previous section described, an instance in a failover group requires the failover element to be started as a privileged user. It is still possible to start an instance as a non privileged user, either through the admin console or via the command line. However, only when it is started through the admin console is a warning generated in the console messages, reminding the administrator to explicitly start the node's associated keepalived configuration as a privileged user. Until this is done the vip is not operational.

It is important to note that if starting via the CLI the instance will start but no warning about explicitly starting the failover group will be given and this means that an administrator could think the HA vip was working when it will not be. In any scripted startup it is important to ensure after starting the instance the start-failover command is issued locally to each of the instances in the group.
Stopping an instance is also a two stage process: an attempt to stop an instance as a non privileged user, either through the admin console or the CLI, will fail until the associated failover group on the node on which it is running has been stopped. This can be shown by the following examples.
From the admin console attempting to stop instances before stopping the failover group.



From the CLI attempting to stop an instance.

$ /u01/OTDInstances/OTDNode1/net-test/bin/stopserv
[ERROR:32] server is not stopped because failover is running. Before stopping the server, execute stop-failover command as a privileged user
For either approach, prior to stopping the instance, there is a need to run the stop of the failover group locally as a privileged user as seen below.

$ sudo /u01/OTD/bin/tadm stop-failover --instance-home=/u01/OTDInstances/OTDNode1 --config=test
It is worth pointing out however that the number of times an instance needs to be stopped and restarted is minimal due to the way most of the configuration changes made are dynamically applied to an instance. By utilising the administration console or including in CLI based configuration changes the ‘reconfig-instance’ command stops and starts can be minimised. 
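
For scripted use, the ordering described in this section can be captured in two small functions. This is a sketch rather than a tested production script: the function names and the DRY_RUN switch are made up for illustration, the paths are those of this example, and it assumes the instance directory is named net-<config> as seen earlier. When sudo is called from a script it must not prompt for a password, which is outside the scope of this sketch.

```shell
#!/bin/sh
OTD_HOME=${OTD_HOME:-/u01/OTD}
INSTANCE_HOME=${INSTANCE_HOME:-/u01/OTDInstances/OTDNode1}
CONFIG=${CONFIG:-test}

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start: the instance first (non privileged), then failover (privileged).
start_ha_instance() {
    run "$INSTANCE_HOME/net-$CONFIG/bin/startserv" &&
    run sudo "$OTD_HOME/bin/tadm" start-failover \
        --instance-home="$INSTANCE_HOME" --config="$CONFIG"
}

# Stop: failover first (privileged), then the instance (non privileged).
stop_ha_instance() {
    run sudo "$OTD_HOME/bin/tadm" stop-failover \
        --instance-home="$INSTANCE_HOME" --config="$CONFIG" &&
    run "$INSTANCE_HOME/net-$CONFIG/bin/stopserv"
}
```

Running with DRY_RUN=1 prints the commands in order without executing them, which is a convenient way to check the paths before using the functions for real.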
 

Changing the primary failover instance

The situation may arise where there is a need to ‘flip’ the current owner of the OTD HA vip to the backup node, for example during a maintenance window. Ordinarily this would be achieved by using the ‘Toggle Primary’ button in the admin console or the tadm set-failover-group-primary command. These are still applicable and can be initiated by a non privileged user; however, there is an additional step that needs to be performed from the CLI locally on the admin nodes.
If you use the admin console to toggle the primary you will see a warning generated however the console will acknowledge the new primary group member.

Testing to see if the backup is now primary by accessing it will show that the existing primary is still the primary, despite the console saying otherwise.
Similarly if the switch is made using the CLI the following warning is given but after the warning the administration ‘thinks’ the switch has been made.

$ /u01/OTD/bin/tadm set-failover-group-primary --config=test --virtual-ip=10.1.5.126 --primary-node=otdnode2 --user=admin --host=OTDAdminServer
Enter admin-user-password>
OTD-63008 The operation was successful on nodes [otdnode1, otdnode2].
Warning on node 'otdnode2':
OTD-67335 Could not restart failover due to insufficient privileges. Execute the start-failover command on node 'otdnode2' as a privileged user.
Warning on node 'otdnode1':
OTD-67335 Could not restart failover due to insufficient privileges. Execute the start-failover command on node 'otdnode1' as a privileged user.
$ /u01/OTD/bin/tadm list-failover-groups --config=test --user=admin --port=8989 --host=OTDAdminServer --all
Enter admin-user-password>
10.1.5.126 otdnode2 otdnode1

In order to actually make the toggle active the failover group needs to be restarted and this will force a re-read of the keepalived.conf which will ensure the vip is plumbed up on the new primary host. This command needs to be executed as the privileged user on both instances. It is therefore paramount to ensure that if there is a need to toggle the primary vip host that this two stage process is carried out.

[oracle@OTDNode2 ~]$ sudo /u01/OTD/bin/tadm stop-failover --instance-home=/u01/OTDInstances/OTDNode2 --config=test
[sudo] password for oracle:
OTD-70201 Failover has been stopped successfully.
[oracle@OTDNode2 ~]$ sudo /u01/OTD/bin/tadm start-failover --instance-home=/u01/OTDInstances/OTDNode2 --config=test
OTD-70198 Failover has been started successfully.
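
Because the console and the CLI can report a primary that does not yet hold the vip, it is worth confirming on each node where the address actually is. The sketch below is one way to do that, assuming the vip shows up as an additional address in the output of ip -o -4 addr show once keepalived has claimed it. The function name vip_is_plumbed is made up for illustration.

```shell
#!/bin/sh
# Succeed if address $1 appears in "ip -o -4 addr show" output read on stdin.
vip_is_plumbed() {
    awk -v vip="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), addr, "/")   # strip the prefix length
                    if (addr[1] == vip) found = 1
                }
        }
        END { exit (found ? 0 : 1) }'
}

# Example, run on each admin node after restarting failover:
#   ip -o -4 addr show | vip_is_plumbed 10.1.5.126 && echo "vip is on $(hostname)"
```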

Deleting a failover group

With minimal root usage, deleting a failover group is also now a two stage process that needs to be understood. If you decide to delete a failover group from the admin console or the CLI and the instances in the failover group are running then, although the console and the CLI will allow a non privileged user to delete the group, a warning will be generated to alert that the failover group has not been stopped. The keepalived.conf will be removed; however the vip will still be active. It will only be destroyed once the stop-failover command has been executed locally by the privileged user on all instances of the failover group. It is important to realise this and perform the 2 operations close together to ensure that when removing a failover group the vip associated is stopped as well.
Here is an example of the warning in the admin console.

Known Issues

There is a known issue currently outstanding with Oracle Traffic Director that will cause problems if, after initial configuration, the administration server is stopped and restarted and the node names used in the deployment contain any upper case letters. The issue manifests itself in the admin console, where messages like the following are seen: "Error in parsing configuration TechDemoMWConfig. OTD-63763 Configuration 'test' has not been deployed to node 'OTDNode1'". After this any attempts to modify the configuration will fail. This is a bug resolved in a later release, so to work around it for now there are 2 choices:
  1. Use only lowercase node names for the configuration.
  2. Log in as the non privileged user to the administration server vServer and edit the ../config-store/server-farm.xml for the administration server node, converting the node names to all lowercase, e.g. otdnode1. Save the file and then restart the administration server.

Setting up manually simple init.d scripts for vServer start/stop

Because none of the nodes has been configured or executed as root, the new feature available in 11.1.1.7.0 to automatically create init.d startup/shutdown services is not available. Therefore in order to have both nodes and instances start and stop cleanly when a vServer is shutdown or started, manually created and configured /etc/init.d scripts need to be put in place. This clearly requires the use of the root account but is a one off exercise to set up. Here we show an example that can be used to give at least a rudimentary start/stop script for your OTD minimal root privilege environment. These scripts are far less rich in terms of functionality than those provided by the product.
For the administration server node only one init.d script is required to be created and in this example this is called otd-admin-server. An example can be seen here that can be tailored to suit a specific environment:

#!/bin/sh
#
# Startup script for the Oracle Traffic Director 11.1.1.7
# chkconfig: 2345 85 15
# description: Oracle Traffic Director is a fast, \
# reliable, scalable, and secure solution for \
# load balancing HTTP/S traffic to servers in \
# the back end.
# processname: trafficd
#
ORACLE_HOME="/u01/OTD"
OTD_USER=oracle
INSTANCE_HOME="/u01/OTDInstances/AdminServer"
INSTANCE_NAME="admin-server"
PRODUCT_NAME="Oracle Traffic Director"
OTD_TADM_SCRIPT=/tmp/otd_script
case "$1" in
start)
    # Start the admin server as the non privileged user
    COMMAND="$INSTANCE_HOME/admin-server/bin/startserv"
    su - "$OTD_USER" -c "$COMMAND"
    echo "$ORACLE_HOME/bin/tadm start-snmp-subagent --instance-home $INSTANCE_HOME" > "${OTD_TADM_SCRIPT}"
    chmod 755 "${OTD_TADM_SCRIPT}"
    su - "$OTD_USER" -c "${OTD_TADM_SCRIPT}"
    rm -f "${OTD_TADM_SCRIPT}"
    ;;
stop)
    echo "$ORACLE_HOME/bin/tadm stop-snmp-subagent --instance-home $INSTANCE_HOME" > "${OTD_TADM_SCRIPT}"
    chmod 755 "${OTD_TADM_SCRIPT}"
    su - "$OTD_USER" -c "${OTD_TADM_SCRIPT}"
    rm -f "${OTD_TADM_SCRIPT}"
    COMMAND="$INSTANCE_HOME/admin-server/bin/stopserv"
    su - "$OTD_USER" -c "$COMMAND"
    ;;
status)
    ps -ef | grep "$INSTANCE_NAME"
    ;;
*)
    echo "Usage: $0 {start|stop|status}"
    exit 1
    ;;
esac
Once the script has been saved in the /etc/init.d directory and made executable with chmod +x, issuing the following as root will configure it to be started and stopped along with the vServer.

# chkconfig otd-admin-server on

On each of the admin nodes, 2 scripts need to be created so as to be able to stop the highly available instance and failover group independently of the admin node.
For the admin nodes create a script in the /etc/init.d directory called otd-admin-server and tailor the example below to reflect your environment:

#!/bin/sh
#
# Startup script for the Oracle Traffic Director 11.1.1.7
# chkconfig: 2345 85 15
# description: Oracle Traffic Director is a fast, \
# reliable, scalable, and secure solution for \
# load balancing HTTP/S traffic to servers in \
# the back end.
# processname: trafficd
#
ORACLE_HOME="/u01/OTD"
OTD_USER=oracle
INSTANCE_HOME="/u01/OTDInstances/OTDNode1"
INSTANCE_NAME="admin-server"
PRODUCT_NAME="Oracle Traffic Director"
OTD_TADM_SCRIPT=/tmp/otd_script
case "$1" in
start)
    COMMAND="$INSTANCE_HOME/admin-server/bin/startserv"
    su - $OTD_USER -c $COMMAND
    echo "$ORACLE_HOME/bin/tadm start-snmp-subagent --instance-home $INSTANCE_HOME" > ${OTD_TADM_SCRIPT}
    chmod 755 ${OTD_TADM_SCRIPT}
    su - $OTD_USER -c ${OTD_TADM_SCRIPT}
    rm -f ${OTD_TADM_SCRIPT}
    ;;
stop)
    echo "$ORACLE_HOME/bin/tadm stop-snmp-subagent --instance-home $INSTANCE_HOME" > ${OTD_TADM_SCRIPT}
    chmod 755 ${OTD_TADM_SCRIPT}
    su - $OTD_USER -c ${OTD_TADM_SCRIPT}
    rm -f ${OTD_TADM_SCRIPT}
    COMMAND="$INSTANCE_HOME/admin-server/bin/stopserv"
    su - $OTD_USER -c $COMMAND
    ;;
status)
    ps -ef | grep admin-server
    ;;
*)
    echo $"Usage: $0 {start|stop|status}"
    exit 1
esac
 
Now create a second script on each of the admin nodes, this time called otd-net-test, for the highly available instance. Again, this example can be tailored to suit your environment. The name is based on the configuration the instance is hosting:

#!/bin/sh
#
# Startup script for the Oracle Traffic Director 11.1.1.7
# chkconfig: 2345 85 15
# description: Oracle Traffic Director is a fast, \
# reliable, scalable, and secure solution for \
# load balancing HTTP/S traffic to servers in \
# the back end.
# processname: trafficd
#

ORACLE_HOME="/u01/OTD"
OTD_USER=oracle
INSTANCE_HOME="/u01/OTDInstances/OTDNode1"
INSTANCE_NAME="net-TechDemoMWConfig"
PRODUCT_NAME="Oracle Traffic Director"
OTD_TADM_SCRIPT=/tmp/otd_script

case "$1" in
start)
    COMMAND="$INSTANCE_HOME/$INSTANCE_NAME/bin/startserv"
    su - $OTD_USER -c $COMMAND
    echo "$ORACLE_HOME/bin/tadm start-failover --instance-home $INSTANCE_HOME --config=TechDemoMWConfig" > ${OTD_TADM_SCRIPT}
    chmod 755 ${OTD_TADM_SCRIPT}
    $OTD_TADM_SCRIPT
    rm -f ${OTD_TADM_SCRIPT}
    ;;
stop)
    echo "$ORACLE_HOME/bin/tadm stop-failover --instance-home $INSTANCE_HOME --config=TechDemoMWConfig" > ${OTD_TADM_SCRIPT}
    chmod 755 ${OTD_TADM_SCRIPT}
    $OTD_TADM_SCRIPT
    rm -f ${OTD_TADM_SCRIPT}
    COMMAND="$INSTANCE_HOME/$INSTANCE_NAME/bin/stopserv"
    su - $OTD_USER -c $COMMAND
    ;;
status)
    ps -ef | grep $INSTANCE_NAME
    ;;
*)
    echo $"Usage: $0 {start|stop|status}"
    exit 1
esac
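
The status branch in these scripts pipes ps -ef through grep, so the grep process itself usually appears in the output even when the instance is down. A minimal sketch of a filter that hides it is shown here. The function name filter_instance is hypothetical, and the second grep will also hide any genuine process whose command line happens to contain the word grep.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines of `ps -ef` style input (on stdin)
# that mention the given instance name, excluding the grep command itself.
filter_instance() {
    # $1 = instance name, matched as a fixed string rather than a pattern
    grep -F -- "$1" | grep -v grep
}
```

It could be used as ps -ef | filter_instance "$INSTANCE_NAME". The pipeline returns a non-zero status when no matching process remains, which a status branch could pass back to the caller.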
 
 
Note that these scripts not only stop and start the instance and admin nodes; the instance script also ensures that the failover group is started as the privileged root user. The scripts also contain an additional command to start and stop the SNMP sub-agent. This is optional and only required if you intend, at a later date, to have your OTD estate monitored via an Enterprise Manager 12c agent.
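
The scripts above write the tadm command to a fixed path, /tmp/otd_script, before running it. Because part of this runs as root, a uniquely named temporary file may be preferable. The following is a sketch of that alternative, not the method used in the scripts above. The function name run_generated_script is hypothetical, and it assumes the mktemp utility is available on the vServer.

```shell
#!/bin/sh
# Hypothetical helper: write a one-line command to a uniquely named script,
# run it (optionally through a runner such as `su - oracle -c`), then remove it.
run_generated_script() {
    # $1 = command line to place in the script
    # $2.. = optional runner prefix; if omitted the script is executed directly
    script=$(mktemp /tmp/otd_script.XXXXXX) || return 1
    printf '#!/bin/sh\n%s\n' "$1" > "$script"
    chmod 755 "$script"      # readable and executable by the OTD user as well
    shift
    "$@" "$script"
    rc=$?
    rm -f "$script"
    return $rc
}
```

Under these assumptions, the failover step would become run_generated_script "$ORACLE_HOME/bin/tadm start-failover --instance-home $INSTANCE_HOME --config=TechDemoMWConfig", and the SNMP step would pass su - "$OTD_USER" -c as the runner.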
Once the two scripts are complete, run the chkconfig <script_name> on command as root for each of them to enable them.
The new services can be tested by running the following as the root user on each of the vServers in turn, checking that the OTD estate stops and then restarts:


service otd-net-test stop
service otd-admin-server stop
service otd-admin-server start
service otd-net-test start
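
The same sequence can be wrapped in a small function that stops at the first failing step. This is only a sketch: SERVICE_CMD is a hypothetical variable introduced here so the sequence can be dry-run with echo, and the check is only useful if the init scripts return a non-zero status on failure, which the examples above do not guarantee.

```shell
#!/bin/sh
# SERVICE_CMD is a hypothetical override for dry runs; it defaults to `service`.
SERVICE_CMD=${SERVICE_CMD:-service}

restart_otd_estate() {
    # Stop the instance before the admin node, then start them in reverse order.
    for step in "otd-net-test stop" "otd-admin-server stop" \
                "otd-admin-server start" "otd-net-test start"; do
        # $step is intentionally unquoted so it splits into service name and action.
        $SERVICE_CMD $step || { echo "failed: service $step" >&2; return 1; }
    done
}
```

Setting SERVICE_CMD=echo before calling restart_otd_estate prints the four steps without touching the services.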
When complete, it will be possible to stop any one of the vServers and still maintain a load balancing capability; when the vServer is restarted, its OTD components will start automatically.
By following this note you will now have a highly available Oracle Traffic Director configuration that uses root privileges only where strictly required.