Showing posts with label Oracle Traffic Director. Show all posts
Showing posts with label Oracle Traffic Director. Show all posts

Tuesday, 7 July 2015

Oracle Traffic Director - Deployment options, Virtual Servers vs Configurations

Summary

Oracle Traffic Director (OTD) is a powerful software load balancing solution.  As with most good products there is a degree of flexibilty in how it can be deployed with different approaches allowing the solution to be formed.  This article discusses two options that could be used to determine different routing possiblilties.

The scenario that is being considered is a need to perform two separate load balancing activities in the same OTD environment.  For example, load balancing to an older SOA 11g deployment and to SOA 12c for recent integration deployments.  Another possible example would be two routes to the same back end service but one is designed for high priority traffic while the other route will throttle the service at a preset load.   The two options that are discussed are:
  1. Using two separate configurations, one for SOA 11g and one for SOA12c.
  2. Using one configuration that has two virtual servers.  The virtual servers handling the routing for each environment.
Needless to say either option can be appropriate and it will depend on the details of the overall solution and to some extent personal preference to determine the right answer for a particular customer environment.  Of course other options such as more complex routing rules within a single configuration or multiple OTD domains are also options to think about.

OTD Configuration Overview

Simple configuration

An OTD deployment, in its simplest form, consists of an administrative instance which manages the configuration and a deployed instance.  The deployed configuration specifies the HTTP(S)/TCP listening port, routing rules to one or more origin servers, logging setup etc.   In many situations there is a business need to use OTD to manage requests to different business applications or even just to different environments/versions of an application.  It is obviously possible to split these out by using independent deployments of OTD however to minimise the resources required and keep the number of deployed components to a minimum there are options to use one administration server.

The base configuration options


The minimum configuration that will appear for a configuration is a setup which defines things like the listening ports, SSL certificates, logging setup and critically at least one origin server pool and a virtual server.  The origin server pool is a simple enough concept in that it defines the back end services to actually fulfil the client requests. 

Using Virtual Servers

The virtual servers provide a mechanism to isolate traffic sent to the software load balancer.  Each virtual server contains its own set of routing rules which can determine the origin servers to send requests to, caching rules, traffic shaping and overrides for logging and the layer 7 firewall rules.  The virtual server to be used for subsequent processing is identified by either the listening port or the hostname used to send the request.

Virtual Server example - Routing based on otrade-host
Virtual Server example - Routing based on websession-host
So in the above example both hostnames otrade-host and websession-host resolve to the same IP address in DNS (or in the clients local /etc/hosts file).   In this case two virtual servers also use the same listener.  If the client makes a request to access otrade-host then the first virtual server is used and if they request websession-host then the second's rules are used.

There is always at least one virtual server.  By default this is created and the hosts field left blank such that it is used if any traffic hits the listening port.

Solution Variations

Multiple Configurations

Overview

In this setup two configurations can be defined and deployed.  It is quite possible to have both configurations deployed to the same OS instances. (Admin node in OTD talk.)  The result of deploying the configuration to the admin nodes is the creation of another running instance of OTD.
Running multiple configurations
Thus in the example shown above we have three OS instances, one to host the admin server which could be co-located with the actual instances.  There are two OS instances which host two OTD servers, one for each configuration.  I have shown two OS instances to run the configuration to indicate that they can be setup in a failover group to provide HA, each config can utilise a different VIP.

Advantages

  • Each configuration is managed independently of each other.  (within the one administration server) 
    • The settings are independent of each other.
    • The running instances for each configuration are independent of each other. i.e. Can be stopped and started without impacting the other configuration instances running.
  • Simple to understand

Disadvantages

  • Care must be taken to ensure that the configurations do not have clashes with each other.  (eg. Same listenting ports)
  • Results in more processes running on each OS instance.

Multiple Virtual Servers

Overview

In this situation there is one configuration with multiple virtual servers which result in different routing rules being applied to send requests on.   In the diagram below we have deployed a single configuration to two OS instances with the configuration containing two virtual servers.  As per the multi-config option I have shown two OS instances to indicate that the failover group can be used for HA.

OTD Using Two Virtual Servers

Advantages

  • One configuration that provides visibility of all configuration in the environment.
  • Minimal running processes
    • Simplifying the monitoring
    • Reducing resources required to run the system

Disadvantages

  • Introduces dependencies between the environments
    • eg. Can share listeners, origin server pools, logging config etc.  Thus one change can impact all instances
    • eg. Some changes mandate a restart of an instance.  A change for one config may have an impact on load balancing for the other environment.
  • Complexity of a single configuration
  • Dependencies on external factors.  (DNS resolution of hostnames/firewalls for port access.)

Conclusions

There are no hard and fast rules to figure out which approach is the best one for you.  It will ultimately depend on the requirements for the load balancing.  If a configuration is changing frequently and is functionally independent then I would tend to go for the multiple configuration route.  If on the other hand simplicity of monitoring and minimal resource footprint alongside a fairly static configuration was the situation I would tend to use the multiple virtual server approach.

Essentially the classic IT answer of "it depends" will apply.  Only a good understanding of the requirements will clarify which way to go.  (Although if you are using OTD 11.1.1.6 then you might be better with the virtual server approach as there are a few limitations to the VIPs using keepalive for the failover groups)

Friday, 13 June 2014

Example of Application Version Management Using OTD

When using Exalogic to host an application it is fairly common to see Oracle Traffic Director (OTD) being used as a load balancer in front of the application.  OTD has a rich functional set that can make it more than just a load balancer/proxy - this allows it to perform traffic management/shaping.  In a recent engagement I was asked if it could perform application version management, to which the answer is yes.

The solution presented here is just one possible answer to application versioning and there are often many other approaches that should be considered.

Problem Statement

The scenario we are looking at here is a situation where a new JEE application is being deployed to an application server and the new version is to be set live on a particular date/time.  Existing sessions with the old application are to carry on using the previous version of the application until their sessions expire.
There are a number of restrictions that apply:-
  • No changes to the underlying application.  (For example, WLS has the concept of versioning built into it but the application code - descriptors - must be altered to use this.  In our scenario we are using a third party application and do not have the ability to make any changes.)
  • No application down time is allowed.  (App is accessed 24*7)
  • Existing sessions remain on the previous version and new sessions use the new version

Solution Overview

The solution used makes use of the routing rules that are available in OTD.  Essentially inspecting the HTTP requests that hit OTD and determine if the request should remain on the old application version or be routed to the new version.

The first problem is to figure out how to know if a request is to be routed to the old or new version of the application.  Assuming the back end application is an application server it is very likely that a jsessionid cookie will be set for any existing client connection.  If not set then we could route what will be a new session onto the new application version.  The problem comes with the next request, it will have the jsessionid cookie set and we have no way of knowing if the session is to be routed to the old or the new application version.  A solution to this issue is to use OTD to inject an additional cookie into the browser, this cookie can contain the time that it was created.  Then we can simply inspect this cookie on each request and if the time it holds is before a cut-off date then route to the old version and if after the cut off then route to the new application.

Solution Detail

Injecting a Cookie

The first activity is to use OTD to inject a cookie into the HTTP response that goes back to the client.  This can be done using the Server Application Functions (SAFs) that OTD supports, these are documented in this link.  (Although check for newer documentation releases from here.)

In summary OTD makes use of a special file the obj.conf file that contains a list of directives to instruct OTD how to handle incoming HTTP requests.  Many of the changes done through the OTD administration console or command line end up as directives in this file.  In addition to the commands that are neatly wrapped up in the console interfaces there are additional directives and functions that can be added by editing the correct obj.conf file by hand.  Inserting a cookie is one such SAF.

The first activity is to understand which file to edit.  If you simply do a search from the command line under the OTD instances directory for all *.obj.conf files you will see that there are multiple copies of files held.  This is because each configuration/virtual server has its own copy of the file and these are held centrally in a repository and copied out to the running instances when changes are deployed.  In our scenario we are wanting to change the HTTP processing for a particular "Virtual Server" as part of an OTD configuration.  The path to the file we are interested in is:-



<OTD Instance Home>/admin-server/config-store/<configuration name>/config/<virtual 
server name>-obj.conf



This resides on the server that is hosting the OTD administration server which manages all the obj.conf files for all the configurations/virtual servers used.  The <virt-server>-obj.conf file is the copy held in the repository, it can be edited to add a section that will inject the cookie.  Having edited the file then either use the command line (tadm) or the browser to deploy the configuration file.  The action of deploying the config file will copy the configuration to the running instances and update them so that they use the new configuration.  This update is done dynamically and does not require any downtime of the OTD instance.

A sample directive to add the cookie is shown below:-



<Object name="default">
AuthTrans fn="match-browser" browser="*MSIE*" ssl-unclean-shutdown="true"
NameTrans fn="assign-name" id="default-route" name="default-route"
NameTrans fn="map" from="/" to="/"
<If not defined $cookie{"MyApp"}>
ObjectType fn="set-cookie" name="MyApp" value="$time"
</If>

Service fn="proxy-retrieve" method="*"
AddLog fn="flex-log"
</Object>

<Object name="default-route">
...



So in this case if the cookie has not been presented to OTD by the client browser then OTD will add a cookie called "MyApp" with a value of the current time.  (In seconds since the epoch).  The client will then present this cookie in all subsequent requests.  It is possible to set additional options such as the domain and path which can specify when the client will send the cookie or to set a max-age parameter which will make the cookie persistent with the browser and last for the duration specified, in seconds.  In our example the max-age is not set so the cookie will last for the duration of the session.  (until the browser is shutdown)

Handy Hint
With unix it is fairly easy to get the current time in seconds since the epoch using the # date +%s command which will simply format the current date.  Or to convert from a number of seconds to a meaningful date use # date -d @1402655677

Setting up a Routing Rule

Having added the cookie we now have something that we can easily use to route with.  This can be done either by editing the obj.conf file directly or using the browser interface which provides some help in ensuring that the syntax is correct.   To add/edit routing rules simple ensure that you have selected the configuration you want to manage and then open up the virtual server in the navigation pane and it is possible to then add the routing rules.

In our scenario we are going to add a routing rule that will inspect the "MyApp" cookie and if the value is greater than a set time then route the request to the next version of the application.  This could be as simple as routing to a different origin server pool if the new application has been setup in an entirely different set of WLS instances, or as in this case the rule will cause a re-write of the request to access a different URL where the two applications (new and old) are both held in the same application server but with different base URLs. 

Routing rules to direct requests to the new app version


These rules can then be deployed to the server instances which again is done dynamically with no down time.  Then when the clients access the service if not present they will have the "MyApp" cookie added to their browser and each request is then checked against the cookie value and if the value is greater than the time specified in the rule then the request will be routed to the second version of the application.  Sessions that started on before the cutoff time will continue to use the previous version of the application using the default rules.  (or other routing rules.)

The changes to the routing rules are reflected in the obj.conf file as shown below.


#
# Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved.
#

# You can edit this file, but comments and formatting changes
# might be lost when you use the administration GUI or CLI.

<Object name="default">
AuthTrans fn="match-browser" browser="*MSIE*" ssl-unclean-shutdown="true"
<If defined $cookie{"MyApp"} and $cookie{"MyApp"} gt "1402483929">
NameTrans fn="assign-name" id="myapp-v2" name="myapp-v2"
</If>

NameTrans fn="assign-name" id="default-route" name="default-route"
NameTrans fn="map" from="/" to="/"
<If not defined $cookie{"
MyApp"}>
ObjectType fn="set-cookie" name="
MyApp" value="$time"
</If>
Service fn="proxy-retrieve" method="*"
AddLog fn="flex-log"
</Object>

<Object name="default-route">
Route fn="set-origin-server" origin-server-pool="origin-server-pool-1"
</Object>

<Object name="myapp-v2">
NameTrans fn="map" to="/MyApp-v2" from="/MyApp"
Route fn="set-origin-server" origin-server-pool="origin-server-pool-1"
</Object>




Obviously once the previous application version has been fully retired then the default rule can be used to direct to the new version and this routing rule deleted.  Potentially the cookie injection can also be removed.

Conclusion

With a very simple configuration consisting of a single cookie injection and a routing rule it is possible to use Traffic Director to manage different application versions. 

Tuesday, 11 February 2014

Websocket support in Oracle Traffic Director

Introduction

In recent releases the Websockets function has been added as a capability to both WebLogic 12.1.2 and Oracle Traffic Director (11.1.1.7.0).  This blog posting is another courtesy of my colleague Mark Mundy who has been doing some investigation into what is involved to setup an architecture with a browser accessing a web application running websockets on WebLogic and proxied/load balanced through OTD.

The WebLogic documentation covers the setup of Websockets at the server side and the OTD documentation explains what is needed for the load balancer.  This blog posting is intended to show how the two can be used together on the latest Exalogic virtualised release to have OTD load balance a simple example websocket application with relative ease. This posting will not go through every aspect of the setup as the assumption is that the reader has some familiarity with Exalogic Control, WebLogic and OTD.  The example application being used is the one that ships as part of the example set for WebLogic 12.1.2 with a slight modification as will be highlighted.
The deployment topology that is used in this scenario is shown below.
WebSocket deployment Topology - Browser --> OTD Load Balancer --> Application(WLS Cluster)

Setting up the Exalogic Environment 

Depending on how complex you want to make the environment, will determine what needs to be set up. In order to demonstrate the basic load balancing while ensuring that there is no direct network connectivity between the client browser and the back end WLS managed server hosting the websocket application, a minimum of 2 vServers would be needed. In order to test out features such as load balancing during failure scenarios at both WLS and OTD ideally 5 would be created. For this example it will be assuming 5 vServers are created and the topology reflected in the following diagram shows the networking setup.
 
Exalogic vServer deployment topology

WebLogic Installation and Configuration

WebLogic 12.1.2 is the minimum version that can be used for Websocket support, select the 12c Generic Download from the Oracle download site.  (This will have a dependency on the latest 64bit java version so if not already installed on your vServers then install prior to installing WLS.)
Install the Complete Installation of WLS 12.1.2 using the installed 64 bit Java 7.  Installing the complete installation will ensure the example applications are available and it is the Websockets example that will be used.  Once WLS is installed complete the post installation configuration to ensure this happens. For more details on getting the examples available see this link
The specific websockets example application used in this document will be introduced after a specific WLS domain is created to host it.
Rather than use the WLS examples domain a new domain will be used so as to keep things as simple as possible but at the same time allowing the ability to have more flexibility in the testing. Further detail about the WLS implementation is available in the documentation.

A new WLS domain needs to be created using the domain wizard.  In the topology shown above the two instances in a cluster that listen on the private vNet and the administration server uses this network as the default but also has an additional HTTP channel setup to listen on the EoIB public network so that administration becomes simple from outside the Exalogic rack.  The websocket application can then be deployed to the cluster such that is is only accessible to the client population through the OTD tier, providing a level of security.

The example that ships with WebLogic will only work in the scenario where the browser and WLS managed server are both on the same host. To resolve this issue we make a small change to the code in the example before building the application and deploying to the cluster.  In the example source code identify exampleScript.js and change the line that opens up the websocket.
This Javascript file is normally located in:-
<WLS_Home>/user_projects/applications/wl_server/examples/src/examples/webapp/html5/webSocket/src/main/webapp/script
Use your editor of choice and search for "localhost:7001" and replace it with " + window.location.host +" as shown below.

...
if (ws == null) {
      ws = new WebSocket("ws://localhost:7001/webSocket/ws");
      ws.onopen = function() {
        addMsg("Server Msg: WebSocket has been started.");
      };

...
becomes
...
if (ws == null) {
      //ws = new WebSocket("ws://localhost:7001/webSocket/ws");
      ws = new WebSocket('ws://' + window.location.host + '/webSocket/ws');

      ws.onopen = function() {
        addMsg("Server Msg: WebSocket has been started.");
      };

...

The directory <WLS_Home>/user_projects/applications/wl_server/examples/src/examples/webapp/html5/webSocket contains full instructions on building and deploying the application.  In our case we only need to re-build the application (ant build) and then use the standard WLS admin console (or WLST or ant if you prefer) to deploy the war file created to the WebLogic Cluster.

Oracle Traffic Director Installation and Configuration

For Websocket support we need to ensure that we install OTD 11.1.1.7.0 or higher.  In our scenario we will install OTD onto the admin vServer and the two OTD instance vServers to end up with an HA cluster hosted on the two instance vServers that will route requests through to the backend WLS cluster.  It is useful to enable monitoring on OTD so that we can easily spot the sessions being created in OTD and through to the back end WLS instances.  To enable monitoring highlight the virtual server and enable the plain text report.  As shown below.

Enable plain text monitoring in OTD

Websocket support is enabled by default in OTD so there is nothing else other than the deployment of the configuration to the cluster before we test out our Websocket application.  However for information if there is a desire to disable websockets for a particular route then this can be achieved via the enable/disable of the "WebSocket Upgrade" for a given route.


Enable/Disable Websocket support in OTD.

With the monitoring in place it is then possible to access the plain text report by accessing the URI http://OTD IP address:port/.perf where the Origin Server statistics are of particular interest.  As highlighted below.


Testing the application

In another browser session/tab it is then possible to access the OTD instance with the URN /websocket.  You can then click the start button to open up the socket and enter text to send to the server.  Currently you should use either Chrome or Firefox but not Internet Explorer as it does not support websockets.


WLS WebSocket example application

By starting up additional browser sessions you can create multiple websockets and the .perf output will reflect the load balancing to the back end WLS instances.


In this example we can see that 3 websocket sessions have been created and they have been load balanced round the two available WLS server instances.

Behind the scenes what is actually happening is that there are two socket connections created for each WebSocket.  The first is between the client browser and the OTD instance and the second from OTD to the WLS instances.  OTD acts to bridge the gaps between the two.  Below is the output from netstat on the OTD instance to show these socket connections.

Browser to OTD instance.
# netstat -an
Active Connections
Proto  Local Address          Foreign Address        State
TCP    138.3.32.68:52732      138.3.49.9:8080        ESTABLISHED


& OTD to WLS
# netstat -an | grep 7003
tcp        0      0 192.168.16.74:39223         192.168.16.71:7003          ESTABLISHED


The nature of establishing multiple sockets to link the load balancer to the back end service means that in the event of a failure by either OTD or WLS then the socket connection will be broken and the client must re-establish the socket again.  Thus for HA connections there is a dependency on the client capturing any exceptions and re-establishing the connection.
Note:- If OTD is setup with a failover group (HA config) then one instance is primary and the second acts as a backup.  Should the primary fail then the websocket would have to be re-established using the backup OTD instance.  Once the primary is available again OTD will automatically fail back which will again break the websocket connection. 



Thursday, 29 August 2013

Limiting OTD to listen only on the VIP address


In most production deployments OTD is likely to be deployed in a highly available configuration with two instances working as an active/hot-standby  load balancing pair.  (See my earlier posting on running OTD HA.)  In production the environment will almost certainly have a number of security constraints put on it, one of which will be to keep the number of listening ports to an absolute minimum.  In the case of OTD this will mean that it should only listen for incoming requests on the Virtual IP address, by default the listener will listen on all interfaces for the given port.

Thus we want to setup the configuration to listen on just the VIP, as shown below.



Where the IP Address is the IP address for the VIP.

Having done this an attempt to start up the server instance fails with the error messages shown below.


./startserv
Oracle Traffic Director 11.1.1.7.0 B01/14/2013 04:13
[ERROR:32] startup failure: could not bind to <Virtual IP>:8080 (Cannot assign requested address)
[ERROR:32] [OTD-10380] http-listener-1: http://<Virtual IP>:8080: Error creating socket (Address not available)
[ERROR:32] [OTD-10376] 1 listen sockets could not be created
[ERROR:32] server initialization failed

Alternatively if you attempt to start the instance via the GUI then an error message similar to that shown below will appear.



The reason for this failure is because the VIP is only ever active on one node at a time meaning that when the instance attempts to startup if the vServer it is on has not yet started up the VIP or the VIP is assigned to the other vServer in the HA group then it is impossible to bind to that interface.

Linux has the ability to allow binds to non-local IP addresses using the system configuration net.ipv4.ip_nonlocal_bind.  By setting this variable to 1 it allows the OTD instance to startup even although the IP address is not currently local to the running process.  To set this up simply edit the /etc/sysctl.conf file and add this with the value of 1.




# tail /etc/sysctl.conf

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 250000
vm.min_free_kbytes = 524288

# Additional entry to allow non-local binds so that we only listen to the VIP.

net.ipv4.ip_nonlocal_bind=1 
#
# sysctl -p
#

Once set then issue the sysctl -p command to refresh the configuration and we can startup the OTD instance.

You can check the currently running value in the /proc system files.

# cat /proc/sys/net/ipv4/ip_nonlocal_bind
1

#



Monday, 10 June 2013

Running OTD HA with minimal root usage


Introduction

An as yet un-documented aspect of the Oracle Traffic Director 11.1.1.7 (OTD) new features, has introduced the ability to operate a highly available load balancing group within the Infiniband fabric but keeping to a minimum usage of the root account. In previous releases to enable the high availability features of the failover group it was necessary for a couple of the OTD running processes to be run with the root privileges, this was something that security conscious customers found disconcerting. 
This blog posting is courtesy of one of my colleagues, Mark Mundy who has been pulling together some instructions on how to setup OTD in an HA configuration and using minimal root user privileges. 

Why is root needed at all?

Before we can demonstrate how we can minimise the use of root in an HA OTD set up it is worth explaining a little of where OTD requires root permission. This will hopefully give the reader an appreciation of why it is used and even in a minimal use scenario how little ‘damage’ using it to execute part of the process stack can do.
Oracle Traffic Director provides support for failover between the instances in a failover group by using an implementation of the Virtual Routing Redundancy Protocol (VRRP), this being keepalived for Linux.
Keepalived v1.2.2 is included in the current Exalogic Guest Base Template and so you do not need to install or configure it. Oracle Traffic Director uses only the VRRP subsystem. If you wish to discover more about keepalived go to http://www.keepalived.org.  (If you are using Solaris then the implementation is done using the Solaris vrrpd service.)
VRRP specifies how routers can failover a VIP address from one node to another if the first node becomes unavailable for any reason. The IP failover is implemented by a router process running on each of the nodes. In a two-node failover group, the router process on the node to which the VIP is currently addressed is called the master. The master continuously advertises its presence to the router process on the second node. Only the root user can start or stop this keepalived process which controls the VIP and so without root permission having a highly available VIP would not be possible. With the 11.1.1.7 release of OTD it is possible to configure a highly available VIP utilising keepalived and root however all other processes associated with OTD such as the instance, the primordial and watchdog will be executed as a non root user. No user data is exposed to the keepalived process.

Example OTD configuration

There are a number of ways that Oracle Traffic Director can be deployed and utilised but for the purposes of this example the simplest and most common approach has been adopted. This design utilises the Exalogic Storage Array for all of the OTD collateral utilising a number of shares within the same project. The design consists of three identical vServers created all with the same vServer type and networking capability. The diagram below should give an idea of the layout of both the vServers and what they are hosting as well as how they are utilising various shares on the storage array. Notice that the Admin Server and Admin Node 1 are using OTD binaries from one share while the Admin Node 2 uses a different one. This will ensure if there is a need to patch OTD it can be done with a degree of availability while this is happening. There is one share for the entire OTD configuration in this example.


The 2 vServers hosting admin nodes work as local proxies for the OTD administration server and it is on these nodes that the highly available instances will perform the actions as designed within the loadbalancing configuration. In previous releases of OTD with this setup it was possible to avoid using root to run the administration server but the admin nodes that run an HA loadbalanced instance required root for a large part, if not all, of their administration and execution. In the latest release this has changed and it is this that is exploited in this example.

Configuring the Admin Server node

The first vServer that needs to be configured is the one hosting the administration server. There is no need to use the root account on this vServer after the specific shares that are needed by OTD have been permanently mounted. This note will not go into the details of how this is achieved as it is assumed the reader is familiar with this. The first share that is required to be temporarily mounted is the /export/common/general share available on the Exalogic storage array. This needs to be mounted and the OTD 11.1.1.7 installer placed in a directory within it (available from Oracle edelivery) and unzipped. An example being /mnt/common/general/OTD11117/Disk1. This can then be un-mounted when OTD has been installed. There is also a need to permanently mount two shares one for the binaries and another for the configurations.



/export/OTDExample/Primary_Install -> /u01/OTD
/export/OTDExample/OTDInstances -> /u01/OTDConfiguration
In this example the user chosen to install and run the Admin Server is the pre-configured oracle user and so ensure that the share mount points are owned by the oracle user on the ZFS appliance.  (See my earlier blog posting about creating shares on the ZFS appliance.)

There is now no additional need to utilise the root account on this vServer. Everything in this section should now be performed as the oracle user or an equivalent non privileged user of your choice.
For simplicity the OTD installer will be installed using a silent installer approach below is an example of how this can be achieved.


 
$ oracle@OTDAdminServer ~]$ /mnt/common/general/OTD11117/Disk1/runInstaller -silent -waitforcompletion -invPtrLoc /home/oracle/oraInst.loc ORACLE_HOME=/u01/OTD/ SKIP_SOFTWARE_UPDATES=TRUE
  The ORACLE_HOME location is where the OTD binaries will be laid down and this will populate the primary binary location on the share. The invPtrLoc will need to point to a previously created file called oraInst.loc which should contain the location of the Oracle Inventory. In this example this file contains the following:


 
$ cd /home/oracle
$ cat oraInst.log
inventory_log=/home/oracle/oraInventory
$
 For more information on silently installing Oracle Traffic Director refer to the documentation
Once the installation is complete the OTD admin server can be created and started. To create the admin server the following command can be used:


 
$ oracle@OTDAdminServer ~]$ /u01/OTD/bin/tadm configure-server --user=admin --java-home=/u01/OTD/jdk/ --instance-home=/u01/OTDInstances/
 When this is executed you will be prompted to provide an OTD admin user password of your choice and then the admin server will be created with its home directory under the /u01/OTDInstances share called admin-server
The admin server can now be started with the following command


 
$ oracle@OTDAdminServer ~]$ /u01/OTDInstances/admin-server/bin/startserv
 When the admin server has started it will be possible as the output will note, to access the admin console from a browser.
Log in and create a new test configuration of your choosing and provide a fake origin server so as to complete the configuration. This is only to demonstrate and the configuration can be later deleted to be replaced with a production one created. Do not deploy the configuration at this stage. Leave the admin console open and for the test configuration enable the plain text report in the virtual server monitoring section. This is done so as to later give us an idea that everything is working as expected in terms of the HA element of OTD.



This completes the work to get the admin server operational and as you can see in terms of OTD no use of the root privilege to configure, start or stop.
Configuring the Admin nodes
The admin nodes now need to be created and started and this is where with earlier releases of OTD there was a heavier requirement for the root account. The new release needs far less and this is demonstrated here by the fact that in order to minimise root use it is now possible to create and start and stop admin nodes without root. In this section we will do this.

Setup the first Instance node

After logging on to the first of the admin nodes (OTDNode1) as root there is an initial requirement to set up permanent shares to both the primary OTD install location and the configuration as there was with the admin server.  These shares are the same as were mounted for the Administration Server.  The mounting of these shares are the only time you require root access for this section.
Using a non privileged account such as the pre-configured oracle account you can now create an admin node and register it with the admin server previously created and started. Prior to running the command ensure the following.
  1. Create a sub-directory under /u01/OTDInstances with the hostname of this vServer
  2. Ensure that the admin server hostname is resolvable on the private IPoIB vNet in the /etc/hosts file
  3. Ensure that the admin node host name is resolvable in the admin server /etc/hosts file on the IPoIB private vNet created
The following example command will show the creation of the admin node.


 
$ /u01/OTD/bin/tadm configure-server --user=admin --java-home=/u01/OTD/jdk/ --instance-home=/u01/OTDInstances/OTDNode1 --server-user=oracle --admin-node --host=OTDAdminServer
The share primary binary location is used to launch the command and you will notice that the admin node server-user is the non privileged oracle user. After providing the admin password and accepting the self signed certificate the admin node should be created.
It is now possible to start the admin node using the startserv script located in the OTDNode1 bin directory. Note previously in order to utilise an HA OTD configuration this would need to be executed as root or as a user in the sudoers list.

Setup the Second Instance Node

Now that the first admin node is running it is possible to do the same with the second admin node.
Before the admin node can be created and started it needs to install OTD into the secondary binary location using a variation of the silent install for the admin server. This will mean mounting temporarily the /export/common/general share and then using the same kit to install OTD to the Secondary_Install share.
The second admin node should have the following shares mounted permanently


/export/OTDExample/Secondary_Install -> /u01/OTD
/export/OTDExample/OTDInstances -> /u01/OTDConfiguration
  
Thus the second instance will run binaries from a separate install from the other instances but the running configuration is located under the same share.

Prior to running the create command ensure the three pre-requisites as above are complete then you can run as an example


$ /u01/OTD/bin/tadm configure-server --user=admin --java-home=/u01/OTD/jdk/ --instance-home=/u01/OTDInstances/OTDNode2 --server-user=oracle --admin-node --host=OTDAdminServ
 
 One created the second admin node can be started using the startserv script under the OTDNode2 instance directory.  

Deploying a basic configuration

The next step is to utilise the admin console or indeed the command line to deploy the current basic configuration and create instances of it on the 2 admin nodes already running. This will not mean we have a highly available loadbalancing pattern but will at this stage mean we have 2 instances hosting the loadbalancing configuration that can be independently accessed on the public EoIB ip addresses assigned to each of the vServers when they were created. We will use the admin console to firstly ensure the 2 admin nodes are available and running and then to deploy the configuration to them both.

Here we can see we have 2 operational admin nodes with no instances deployed to them.
By hitting the deploy button to the top right and selecting the 2 otd admin nodes and NOT the admin server we can deploy the configuration and have 2 instances created one on each node.

The instances can now be started from the admin console. To verify the 2 instances are working it is possible in separate browser tabs to access the instance on the public EoIB ip address assigned to it and get performance metrics from it.  An example url is shown below and by using the IP address for each of the instances then metrics from both instances will be displayed.

http://<OTDNode1-EoIB-IP> :8080/.perf
At this stage it is also possible to verify that the entire OTD process tree for an admin node is all executing as the non privileged user as this example shows:


oracle 19808 0.0 0.0 25648 812 ? Ss May14 0:00 trafficd-wdog -d /u01/OTDInstances/OTDNode1/admin-server/confi
oracle 19809 0.0 0.3 149568 15156 ? S May14 0:17 \_ trafficd -d /u01/OTDInstances/OTDNode1/admin-server/config
oracle 19810 0.1 3.5 769996 141768 ? Sl May14 1:23 \_ trafficd -d /u01/OTDInstances/OTDNode1/admin-server/co
oracle 19897 0.0 0.0 11560 968 ? Ss May14 0:00 \_ /u01/OTD/lib/Cgistub -f /tmp/admin-server-4baa05d0
oracle 12508 0.0 0.0 11560 340 ? S 02:13 0:00 \_ /u01/OTD/lib/Cgistub -f /tmp/admin-server-4baa
oracle 12609 0.0 0.0 25648 812 ? Ss 02:16 0:00 trafficd-wdog -d /u01/OTDInstances/OTDNode1/net-test/config -r
oracle 12610 0.3 0.3 135016 12232 ? S 02:16 0:00 \_ trafficd -d /u01/OTDInstances/OTDNode1/net-test/config -r
oracle 12611 0.0 0.5 259904 23304 ? Sl 02:16 0:00 \_ trafficd -d /u01/OTDInstances/OTDNode1/net-test/config

Creating a failover group

Now we need to create a failover group to enable the configuration deployed to be made highly available. It is at this point that there is still a requirement to use the root privilege. The first stage in enabling an HA loadbalancing configuration is to create a failover group. This does not require root permission to create, however it will need root permission to activate. Creating a failover group can be done either via the command line or the admin console. For this example we will use the command line. From any of the three vServers log in as the oracle user and issue the following example command to create a new failover group within the configuration already created and active.

[oracle@OTDAdminServer ~]$/u01/OTD/bin/tadm create-failover-group --config=test --virtual-ip=<ip_on_public_EoIB --primary-node otdnode1 --backup-node=otdnode2 --router-id=230 --verbose --port=8989 --user=admin --host=OTDAdminServer --network-prefix-length=21
Enter admin-user-password>
OTD-63008 The operation was successful on nodes [otdnode1, otdnode2].
Warning on node 'otdnode2':
OTD-67334 Could not start failover due to insufficient privileges. Execute the start-failover command on node 'otdnode2' as a privileged user.
Warning on node 'otdnode1':
OTD-67334 Could not start failover due to insufficient privileges. Execute the start-failover command on node 'otdnode1' as a privileged user.
  
The failover group is created however you will see that 2 warnings are given. This is because although the non privileged user can create a failover group, it does not have permission to start the keepalived process. In our scenario the 2 instances are already running and so this warning is generated. This is the first of the changes in 11.1.1.7 to be encountered. The new command start-failover allows only what is required to be started as root to be performed separately. In order to complete the ability to run OTD in an HA configuration the command needs to be run as root or as a non privileged user who is part of the sudoers list locally on each of the admin nodes. This warning would not be seen if the instances were not already running but if an attempt was made to start an instance as a non privileged user in a failover group a warning would be issued about the need to start the failover group separately.
In order to minimise the use of root the preferred way to issue the start-failover command is via the non privileged user after adding it to the sudoers list. The specific permission required to allow this is as follows for the non privileged user and in this case we use the oracle user.

# cat /etc/sudoers | grep oracle
oracle ALL=(root) /u01/OTD/bin/tadm
 To set this up as root on each of the 2 admin nodes execute visudo and add the line to give the oracle user the privilege to run the tadm command.

Once this has been set up the start-failover command can be issued locally on each of the admin node vServers as this example shows.


$ sudo /u01/OTD/bin/tadm start-failover --instance-home=/u01/OTDInstances/OTDNode1 --config=test
[sudo] password for oracle:
OTD-70198 Failover has been started successfully

It is now possible from a browser to access the text based performance statistics as before but on the public vip assigned to the HA OTD configuration. A quick look at the process tree for an OTD admin node will now show what this new command has done.

oracle 19808 0.0 0.0 25648 812 ? Ss May14 0:00 trafficd-wdog -d /u01/OTDInstances/OTDNode1/admin-server/config -r /u01/OTD -t /tmp/admin-server-4baa05d0 -u orac
oracle 19809 0.0 0.3 149568 15156 ? S May14 0:17 \_ trafficd -d /u01/OTDInstances/OTDNode1/admin-server/config -r /u01/OTD -t /tmp/admin-server-4baa05d0 -u oracl
oracle 19810 0.1 3.5 770244 142548 ? Sl May14 1:26 \_ trafficd -d /u01/OTDInstances/OTDNode1/admin-server/config -r /u01/OTD -t /tmp/admin-server-4baa05d0 -u o
oracle 19897 0.0 0.0 11560 968 ? Ss May14 0:00 \_ /u01/OTD/lib/Cgistub -f /tmp/admin-server-4baa05d0/.cgistub_19810
oracle 12508 0.0 0.0 11560 340 ? S 02:13 0:00 \_ /u01/OTD/lib/Cgistub -f /tmp/admin-server-4baa05d0/.cgistub_19810
oracle 12857 0.0 0.0 25648 808 ? Ss 02:25 0:00 trafficd-wdog -d /u01/OTDInstances/OTDNode1/net-test/config -r /u01/OTD -t /tmp/net-test-b92c33b1 -u oracle
oracle 12858 0.0 0.3 135016 12236 ? S 02:25 0:00 \_ trafficd -d /u01/OTDInstances/OTDNode1/net-test/config -r /u01/OTD -t /tmp/net-test-b92c33b1 -u oracle
oracle 12859 0.0 0.5 259908 23308 ? Sl 02:25 0:00 \_ trafficd -d /u01/OTDInstances/OTDNode1/net-test/config -r /u01/OTD -t /tmp/net-test-b92c33b1 -u oracle
root 12986 0.0 0.0 35852 504 ? Ss 02:29 0:00 /usr/sbin/keepalived --vrrp --use-file /u01/OTDInstances/OTDNode1/net-test/config/keepalived.conf --pid /tmp/net-
root 12987 0.0 0.0 37944 1012 ? S 02:29 0:00 \_ /usr/sbin/keepalived --vrrp --use-file /u01/OTDInstances/OTDNode1/net-test/config/keepalived.conf --pid /tmp/root 12987 0.0 0.0 37944 1012 ? S 02:29 0:00 \_ /usr/sbin/keepalived --vrrp --use-file /u01/OTDInstances/O
As you can see, now only the keepalived processes are running as root with everything else run as oracle. In previous releases a lot more of the process tree would have been run as root in order to achieve the same thing.

Starting and stopping the instances hosting the failover group

There are some important points to note around how with a minimal root use set up loadbalancing instances are stopped and started. As the previous section described, with an instance in a failover group, there is a requirement to start the failover element as a privileged user. It is still possible either through the admin console or via the command line to start an instance non privileged, however a warning will be generated in the console messages to remind the administrator to explicitly start the nodes associated keepalived configuration as a privileged user only if started through the admin console. Until this is done the vip is not operational.

It is important to note that if starting via the CLI the instance will start but no warning about explicitly starting the failover group will be given and this means that an administrator could think the HA vip was working when it will not be. In any scripted startup it is important to ensure after starting the instance the start-failover command is issued locally to each of the instances in the group.
Stopping an instance is also a two stage process, as an instance attempted to be stopped via a non privileged user either through the admin console or the CLI will fail until the associated failover group on the node on which it is running is stopped. This can be shown by the following examples.
From the admin console attempting to stop instances before stopping the failover group.



From the CLI attempting to stop an instance.

$ /u01/OTDInstances/OTDNode1/net-test/bin/stopserv
[ERROR:32] server is not stopped because failover is running. Before stopping the server, execute stop-failover command as a privileged user
For either approach, prior to stopping the instance, there is a need to run the stop of the failover group locally as a privileged user as seen below.

$ sudo /u01/OTD/bin/tadm stop-failover --instance-home=/u01/OTDInstances/OTDNode1 --config=test
It is worth pointing out however that the number of times an instance needs to be stopped and restarted is minimal due to the way most of the configuration changes made are dynamically applied to an instance. By utilising the administration console or including in CLI based configuration changes the ‘reconfig-instance’ command stops and starts can be minimised. 
 

Changing the primary failover instance

The situation may arise where there is a need to ‘flip’ the current owner of the OTD HA vip to the backup node. One such occasion being a possible maintenance window and ordinarily this would be achieved by issuing the ‘Toggle Primary’ button in the admin console or the tadm set-failover-group-primary command. These are still applicable and can be initiated by a non privileged user however there is an additional step that needs to be performed from the CLI locally on the admin nodes.
If you use the admin console to toggle the primary you will see a warning generated however the console will acknowledge the new primary group member.

Testing to see if the backup is now primary by accessing it will show that the existing primary is still the primary, despite the console saying otherwise.
Similarly if the switch is made using the CLI the following warning is given but after the warning the administration ‘thinks’ the switch has been made.

$ /u01/OTD/bin/tadm set-failover-group-primary --config=test --virtual-ip=10.1.5.126 --primary-node=otdnode2 --user=admin --host=OTDAdminServer
Enter admin-user-password>
OTD-63008 The operation was successful on nodes [otdnode1, otdnode2].
Warning on node 'otdnode2':
OTD-67335 Could not restart failover due to insufficient privileges. Execute the start-failover command on node 'otdnode2' as a privileged user.
Warning on node 'otdnode1':
OTD-67335 Could not restart failover due to insufficient privileges. Execute the start-failover command on node 'otdnode1' as a privileged user.
$ /u01/OTD/bin/tadm list-failover-groups --config=test --user=admin --port=8989 --host=OTDAdminServer --all Enter admin-user-password>
10.1.5.126 otdnode2 otdnode1

In order to actually make the toggle active the failover group needs to be restarted and this will force a re-read of the keepalived.conf which will ensure the vip is plumbed up on the new primary host. This command needs to be executed as the privileged user on both instances. It is therefore paramount to ensure that if there is a need to toggle the primary vip host that this two stage process is carried out.

[oracle@OTDNode2 ~]$ sudo /u01/OTD/bin/tadm stop-failover --instance-home=/u01/OTDInstances/OTDNode2 --config=test
[sudo] password for oracle:
OTD-70201 Failover has been stopped successfully.
[oracle@OTDNode2 ~]$ sudo /u01/OTD/bin/tadm start-failover --instance-home=/u01/OTDInstances/OTDNode2 --config=test
OTD-70198 Failover has been started successfully.

Deleting a failover group

Deleting a failover group is also now with minimal root usage, a two stage process that needs to be understood. If you decide to delete a failover group from the admin console or the CLI and the instances in the failover group are running then although the console and the CLI will allow a non privileged user to delete the group, a warning will be generated to alert that the failover group has not been stopped. The keepalived.conf will be removed; however the vip will still be active. It will only be destroyed once the stop-failover command has been executed locally by the privileged user on all instances of the failover group. It is important to realise this and perform the 2 operations close together to ensure that when removing a failover group the vip associated is stopped as well.
Here is an example of the warning in the admin console.

Known Issues

There is a known issue currently outstanding with Oracle Traffic Director that will cause issues if after initial configuration, the administration server is stopped and restarted and the node names chosen to be used in the deployment contain any upper case letters. The issue manifests itself in the admin console where messages like the following are seen Error in parsing configuration TechDemoMWConfig. OTD-63763 Configuration 'test' has not been deployed to node 'OTDNode1'’. After this any attempts to subsequently modify the configuration will fail. This is a bug resolved in a later release and so to workaround this for now there are 2 choices.
Use only lowercase node names for the configuration
Log in as the non privileged user to the administration server vServer and edit the ../config-store/server-farm.xml’ for the administration server node and convert the node names to all lowercase – eg otdnode1. Save the file and then restart the administration server.

Setting up manually simple init.d scripts for vServer start/stop

Because none of the nodes has been configured or executed as root, the new feature available in 11.1.1.7.0 to automatically create init.d startup/shutdown services is not available. Therefore in order to have both nodes and instances start and stop cleanly when a vServer is shutdown or started, manually created and configured /etc/init.d scripts need to be put in place. This clearly requires the use of the root account but is a one off exercise to set up. Here we show an example that can be used to give at least a rudimentary start/stop script for your OTD minimal root privilege environment. These scripts are far less rich in terms of functionality than those provided by the product.
For the administration server node only one init.d script is required to be created and in this example this is called otd-admin-server. An example can be seen here that can be tailored to suit a specific environment:

#!/bin/sh
#
# Startup script for the Oracle Traffic Director 11.1.1.7
# chkconfig: 2345 85 15
# description: Oracle Traffic Director is a fast, \
# reliable, scalable, and secure solution for \
# load balancing HTTP/S traffic to servers in \
# the back end.
# processname: trafficd
#
ORACLE_HOME="/u01/OTD"
OTD_USER=oracle
INSTANCE_HOME="/u01/OTDInstances/AdminServer"
INSTANCE_NAME="admin-server"
PRODUCT_NAME="Oracle Traffic Director"
OTD_TADM_SCRIPT=/tmp/otd_script
case "$1" in
start)
COMMAND="$INSTANCE_HOME/admin-server/bin/startserv"
su - $OTD_USER -c $COMMAND
echo "$ORACLE_HOME/bin/tadm start-snmp-subagent --instance-home $INSTANCE_HOME" > ${OTD_TADM_SCRIPT}
chmod 755 ${OTD_TADM_SCRIPT}
su - $OTD_USER -c ${OTD_TADM_SCRIPT}
rm -f ${OTD_TADM_SCRIPT}
;;
stop)
echo "$ORACLE_HOME/bin/tadm stop-snmp-subagent --instance-home $INSTANCE_HOME" > ${OTD_TADM_SCRIPT}
chmod 755 ${OTD_TADM_SCRIPT}
su - $OTD_USER -c ${OTD_TADM_SCRIPT}
rm -f ${OTD_TADM_SCRIPT}
COMMAND="$INSTANCE_HOME/admin-server/bin/stopserv"
su - $OTD_USER -c $COMMAND
;;
status)
ps -ef | grep $INSTANCE_NAME
;;
*)
echo $"Usage: $0 {start|stop|status}"
exit 1

esac
Once saved and made executable with a chmod +x.
Issuing the following as root within the /etc/init.d directory will configure the script to be started and stopped appropriately as the vServer is.

# chkconfig otd-admin-server on

On each of the admin nodes, 2 scripts need to be created so as to be able to stop the highly available instance and failover group independently of the admin node.
For the admin nodes create a script in the /etc/init.d directory called otd-admin-server and tailor the example below to reflect your environment:

#!/bin/sh
#
# Startup script for the Oracle Traffic Director 11.1.1.7
# chkconfig: 2345 85 15
# description: Oracle Traffic Director is a fast, \
# reliable, scalable, and secure solution for \
# load balancing HTTP/S traffic to servers in \
# the back end.
# processname: trafficd
#
ORACLE_HOME="/u01/OTD"
OTD_USER=oracle
INSTANCE_HOME="/u01/OTDInstances/OTDNode1"
INSTANCE_NAME="admin-server"
PRODUCT_NAME="Oracle Traffic Director"
OTD_TADM_SCRIPT=/tmp/otd_script
case "$1" in
start)
COMMAND="$INSTANCE_HOME/admin-server/bin/startserv"
su - $OTD_USER -c $COMMAND
echo "$ORACLE_HOME/bin/tadm start-snmp-subagent --instance-home $INSTANCE_HOME" > ${OTD_TADM_SCRIPT}
chmod 755 ${OTD_TADM_SCRIPT}
su - $OTD_USER -c ${OTD_TADM_SCRIPT}
rm -f ${OTD_TADM_SCRIPT}
;;
stop)
echo "$ORACLE_HOME/bin/tadm stop-snmp-subagent --instance-home $INSTANCE_HOME" > ${OTD_TADM_SCRIPT}
chmod 755 ${OTD_TADM_SCRIPT}
su - $OTD_USER -c ${OTD_TADM_SCRIPT}
rm -f ${OTD_TADM_SCRIPT}
COMMAND="$INSTANCE_HOME/admin-server/bin/stopserv"
su - $OTD_USER -c $COMMAND
;;
status)
ps -ef | grep admin-server
;;
*)
echo $"Usage: $0 {start|stop|status}"
exit 1

esac
 
Now create a second script on each of the admin nodes this time called otd-net-test for the highly available instance and again this example can be tailored to suit your environment. The name being based is the configuration the instance is hosting:

#!/bin/sh
#
# Startup script for the Oracle Traffic Director 11.1.1.7
# chkconfig: 2345 85 15
# description: Oracle Traffic Director is a fast, \
# reliable, scalable, and secure solution for \
# load balancing HTTP/S traffic to servers in \
# the back end.
# processname: trafficd
#

ORACLE_HOME="/u01/OTD"
OTD_USER=oracle
INSTANCE_HOME="/u01/OTDInstances/OTDNode1"
INSTANCE_NAME="net- TechDemoMWConfig"
PRODUCT_NAME="Oracle Traffic Director"
OTD_TADM_SCRIPT=/tmp/otd_script

case "$1" in
start)
    COMMAND="$INSTANCE_HOME/$INSTANCE_NAME/bin/startserv"
    su - $OTD_USER -c $COMMAND
    echo "$ORACLE_HOME/bin/tadm start-failover --instance-home $INSTANCE_HOME --config=TechDemoMWConfig" > ${OTD_TADM_SCRIPT}
    chmod 755 ${OTD_TADM_SCRIPT}
    $OTD_TADM_SCRIPT
    rm -f ${OTD_TADM_SCRIPT}
    ;;
stop)
    echo "$ORACLE_HOME/bin/tadm stop-failover --instance-home $INSTANCE_HOME --config=TechDemoMWConfig" > ${OTD_TADM_SCRIPT}
    chmod 755 ${OTD_TADM_SCRIPT}
    $OTD_TADM_SCRIPT
    rm -f ${OTD_TADM_SCRIPT}
    COMMAND="$INSTANCE_HOME/$INSTANCE_NAME/bin/stopserv"
    su -$OTD_USER -c $COMMAND
    ;;
status)
    ps -ef | grep $INSTANCE_NAME
    ;;
*)
    echo $"Usage: $0 {start|stop|status}"
    exit 1
esac
 
 
Note these scripts not only stop and start the instance and admin nodes, the instance one also ensures that the failover group is started as the root privileged user. It is also worth pointing out that there is an additional command to start and stop the SNMP sub-agent, this is optional and only required if it is the intention to have, at a later date your OTD estate to be monitored via an Enterprise Manager 12C agent.
Once the 2 scripts are complete, as root, the chkconfig <script_name> on command can be executed to enable them.
The new services can be tested by running the following as the root user on each of the vServers in turn to see that the OTD estate stops and then restarts.


service otd-net-test stop
service otd-admin-server stop
service otd-admin-server start
service otd-net-test start
When complete it will be possible to stop any one of the vServers and maintain a load balancing capability and when restarted the OTD components on the vServer will automatically restart on startup.
Utilising this note you will now have a highly available Oracle Traffic Directory configuration that is only using the root privilege where it is strictly required to do so.