Monday, 10 March 2014

EM12c and Exalogic 2.0.6

Introduction

In an earlier posting I wrote about the process used to integrate a virtualised Exalogic rack with EM12c.  Since that article was written both EM12c and Exalogic have been upgraded.  This posting is a short one to highlight the changes and extra work needed to get the following combination of products working together:-
  • Exalogic - 2.0.6.n.n
  • EM12c - 12.1.0.3.0
In EM12c the plugins now used are:-
  • Oracle Virtualization - 12.1.0.5.0
  • Oracle Engineered System Healthchecks - 12.1.0.4.0
  • Sun ZFS Storage Appliances - 12.1.0.4.0

Agent Deploy

The steps to deploy the agent to EMOC and OVMM are the same as previously described.  The changes from the previous instructions are:-
  • Now EMOC and OVMM are both on the same vServer so there is only one agent that needs to be deployed.
  • No need to create the /var/exalogic/info/em-context.info file.  However, you do need to ensure the /var/log/cellos/SerialNumbers file is created and populated.  (See Note 3 below.)
  • The sudo permissions can be simplified down to the oracle user permissions of:-
    oracle ALL=(root) /usr/bin/id, /opt/oracle/em12c/agent/*/agentdeployroot.sh
Note 1 - I have done this agent deployment a number of times now, and because the labs I work in often do not have DNS fully set up, I end up using the local hosts files for name resolution.  It is critical that both the OMS server and the agent deployment target server have the target hostname fully qualified in their hosts files.  Otherwise the later "Secure Agent" step of the deployment will fail to start the agent.
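As a small aid for Note 1, the helper below prints a hosts-file line with the fully qualified name first and the short name as an alias, which is the usual /etc/hosts layout. It is only a sketch: the function name make_hosts_line is hypothetical, and the address and hostname in the usage comment are placeholders for your own OMS or target server.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/hosts line with the fully qualified
# name first and the short name (everything before the first dot) as an alias.
make_hosts_line() {
    ip="$1"
    fqdn="$2"
    short="${fqdn%%.*}"   # strip everything from the first dot onwards
    printf '%s %s %s\n' "$ip" "$fqdn" "$short"
}

# Example usage (placeholder address and name), run on both the OMS
# server and the agent target server:
# make_hosts_line 10.1.1.5 vs1.mycompany.com >> /etc/hosts
```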

Note 2 - The same process applies to deploying an agent to a guest vServer.  However, be warned: I did this on a vServer built from the base 2.0.6.0.0 guest template that had a couple of other small applications on it.  The agent deployment uses a reasonable amount of disk space (~1GB) and the deployment can fail if that space is not available.  The log on the OMS server was reporting an error "pty required false  with no inputs".  It turns out that this was because the first step in the "Remote Prerequisite Check Details" performs an unzip of the installation media, which was running out of disk space and hanging.  Killing the unzip process caused the step to fail and indicate that the likely cause was disk space.  To avoid this in the first place, ensure there is adequate disk space on the vServer.
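A quick check of free space on the target vServer before starting the deployment could catch the problem described in Note 2. This is a minimal sketch, assuming /opt is the filesystem that will hold the agent (matching the /opt/oracle/em12c/agent path in the sudo rule) and using the ~1GB figure from this note; the function name check_free_mb is hypothetical.

```shell
#!/bin/sh
# Return 0 if the filesystem holding DIR has at least MIN_MB megabytes free,
# 1 if it has less, and 2 if the free space could not be read.
check_free_mb() {
    dir="$1"
    min_mb="$2"
    # -P: one line per filesystem in POSIX format; -k: sizes in 1024-byte blocks.
    # Column 4 of the data line (line 2) is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ -n "$avail_kb" ] || return 2
    [ $((avail_kb / 1024)) -ge "$min_mb" ]
}

# Example usage on the vServer (assumed install location):
# check_free_mb /opt 1024 || echo "Less than 1GB free under /opt"
```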

Note 3 - In order to ensure that a vServer is discovered correctly as part of the Exalogic rack it is necessary to ensure that the file /var/log/cellos/SerialNumbers is generated from the dmidecode command output.  The script shown below can be used to generate this.  Simply cut and paste it into a file called generateSerialNo.sh, make it executable and run it as root on the vServer.


    #!/bin/sh
    # generateSerialNo.sh - create /var/log/cellos/SerialNumbers from dmidecode output.
    # Run as root: dmidecode needs access to the DMI tables.
    SERIAL_FILE=/var/log/cellos/SerialNumbers
    # Take the value after the colon on "Serial" lines, skipping lines that
    # contain "Not" (such as "Not Specified").
    serialCode=$(dmidecode | grep Serial | grep -v Not | cut -d ":" -f2 | cut -d " " -f2)
    if [ -f "$SERIAL_FILE" ]; then
        echo "File $SERIAL_FILE already exists."
    else
        mkdir -p /var/log/cellos
        {
            echo "====START SERIAL NUMBERS===="
            echo "==Motherboard, from dmidecode=="
            echo "--System serial--"
            echo "$serialCode"
            echo "--Chassis serial--"
            echo "$serialCode"
        } > "$SERIAL_FILE"
    fi
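Once the script has run, the result can be sanity-checked. The hypothetical function below looks for the start marker written by generateSerialNo.sh and makes sure the line after "--System serial--" is not empty; it is a sketch that only checks the markers this particular script writes, not any format EM12c may require beyond that.

```shell
#!/bin/sh
# Check that a SerialNumbers file has the start marker written by
# generateSerialNo.sh and a non-empty value after "--System serial--".
verify_serialnumbers() {
    file="${1:-/var/log/cellos/SerialNumbers}"
    [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
    grep -q '^====START SERIAL NUMBERS====$' "$file" || return 1
    # Print the line that follows the system serial marker, then stop.
    serial=$(awk 'found {print; exit} /^--System serial--$/ {found=1}' "$file")
    [ -n "$serial" ]
}

# Example usage on the vServer:
# verify_serialnumbers && echo "SerialNumbers looks OK"
```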



Note 4 - The configuration of the agent will fail if it cannot resolve its own hostname to a valid IP address.  In other words, make sure that /etc/hosts has an entry that specifies the hostname of the vServer you are adding.  (When vServers are created this does not happen by default, as the names put into the /etc/hosts file are of the form <server name>-<network IP, dash separated>.)
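To check Note 4 on a vServer before deploying, the sketch below reports whether a given name appears as a hostname or alias in a hosts-format file. The function name hosts_has_name is hypothetical; it ignores comments and only looks at the name columns, not the IP address column.

```shell
#!/bin/sh
# Return 0 if NAME appears as a hostname or alias (field 2 onwards)
# in a hosts-format file (default /etc/hosts), ignoring comments.
hosts_has_name() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }   # drop comments; fields are re-split afterwards
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Example usage on the vServer being added:
# hosts_has_name "$(hostname)" || echo "Add $(hostname) to /etc/hosts first"
```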

Setup ZFS Appliance target

The setup process for the ZFS appliance is the same as described previously, apart from a minor change that is required to the ZFS configuration.  During the process of setting up the appliance as a target it is necessary to run the "Configuring for Enterprise Manager monitoring" workflow.  This creates the user (oracle_agent) that the agent uses to log onto the appliance and gather the statistics.  However, this user is created as a kiosk user.  It is necessary to deselect the kiosk option for this user because EM12c requires access to some other data from the console.  If this option is not deselected then EM12c discovers the appliance but raises the alert "Cannot monitor target. Incorrect Credentials".

Having corrected the kiosk user, select the ZFS target in the EM12c interface and, under Monitoring --> All Metrics, enable the metrics you are interested in.  (By default they are disabled.)

Conclusion

The general process for deployment is essentially unchanged, just a few minor variations on a theme. 


Tuesday, 18 February 2014

Some Exalogic ZFS Appliance security tips and tricks

Introduction

The ZFS appliance that is internal to Exalogic has been configured specifically for the rack.  However, while it is "internal", there are still a number of configuration options that should be considered when setting up a machine for production usage.  This blog posting is not an exhaustive list of all the security settings available on a ZFS appliance, but it does pick out some configuration values that should be thought about whenever the appliance is being set up for use.

User Security

By default, once an Exalogic rack has been installed, there will be a single root user defined on the ZFS appliance.  It is likely that other roles will need to create and manage storage space for their specific accounts, and handing out root privileges to other users is not recommended.

The permissions are determined via a three-layered system.
  • Authorizations
    • Configuration items have CRUD-like (Create, Read, Update, Delete) actions that can be taken.
  • Roles
    • Each role defines a number of authorizations that can be performed by a user with that role
  • User
    • Defines either a local or remote directory based user that is allowed to authenticate to the ZFS appliance, the roles and hence authorizations will determine which activities the user is able to perform.
In most situations that I have come across the ZFS appliance is administered by the rack administrator so all system level configuration can be performed by one user.  However, there is often a need to be able to provide delegated administration to either an individual share or to all shares in a project.

Consider a scenario where the vDC is to be set up with an account that will host all vServers for Application A, and the application requires some shares to host its binaries and configuration files.  The machine administrator can initially create a project, say called application_a.  Then the role for administering the project can be created.  To do this click on Configuration --> Users and click on the + symbol beside Roles to create a new role.
Create role to administer shares for a specific project
For the authorizations, select the scope to be Projects and Shares, then choose the exalogic storage pool and the project that was created earlier.  In this scenario we select all authorizations for all shares so that the user can create multiple shares as needed, although all within the context of the project.  (Click on Add to add the selected authorizations and then click on Add to create the role.)  It is also possible to allow only specific actions on the project or to limit the administration to a single share.

Having created the role we now need to create a user and allocate the role to that user.

Creating a user with restricted permissions


In the example shown above we create a local user that can only administer the Application A project, as limited by the role associated with the user.

Should that user then attempt to make a change to anything other than their project/share the system will respond with the following message.

Error reported when the authorisation has not been granted.



Project/Share Security

Having defined a user with limited access to the ZFS device, we now turn our attention to the configuration that provides a level of security to help prevent malicious attacks on an NFS mounted share.  Most of the configuration settings for a share can also be set at the project level, so we will discuss these first; remember that, if necessary, the inheritance can be overridden to give an individual share a unique configuration.

  • General
    • Space Usage
      • The quota can be used to prevent any shares in this project from exceeding a set size.  Handy to set to ensure that this project does not use all the available disk space on the device.
    • Mountpoint
      • Not strictly a security feature, but it is good practice to always ensure that the project has a unique mountpoint defined.  By default a share appends its name onto the project's mountpoint to determine where in the ZFS appliance's directory structure the share's data is held.  A format that we use is to give all shares a mount point of /export/<project name>/<share name>
    • Read Only
      • Obviously not possible in many cases, but at the share level you may wish to set the share up as read/write initially and then change it to read only so that users cannot accidentally delete the data on it (for example, a binaries-only filesystem).  During upgrades it could be switched back to read/write for the duration of the patching.
    • Filesystems - LUNS
      • Not directly applicable for Exalogic today, but certification to use the iSCSI facility of the ZFS appliance is underway.  Once that is available, setting the user, group and permissions for any LUNs created will be required.
  • Protocols
    • NFS 
      • Share Mode
        • Set to None so that by default a client cannot mount the filesystem unless they have specifically been given permission as an exception
      • Disable setuid/setgid file creation
      • Prevent clients from mounting subdirectories
        • Obviously security related but it will be up to the individual usecase to determine appropriate usage.
      • NFS Exceptions
        • Having set the share mode to None, the use of NFS Exceptions to allow clients to mount the share is mandatory.  There are three mechanisms available to restrict access to a particular host or set of hosts: by host (using a fully qualified domain name), by DNS domain or by network.
          In general I have found restriction by network to be the most useful, but that is partly because DNS domains are often not used when setting up for short term tests.  When using the Network type, specify the "entity" as a network in CIDR notation.  So, for example, to restrict the share to only vServers in the address range 172.17.1.1 through to 172.17.1.14, the entity should be set to 172.17.1.0/28.  The netmask can be taken down to an individual IP address (/32) if only one vServer is allowed to mount the share.
          The access mode is set to read/write or read only as needed for the share usage.
          Root Access indicates whether the root user on a client machine has root access to files on the share.  Leaving it unselected gives the behaviour generally known in NFS terminology as root squash.
Example NFS setup

    • HTTP, FTP & SFTP
      • Leave with share mode of None unless there is a specific need to allow these protocols to access data held on the share.
  • Access
    • This tab holds information specific to a share (other than the ACL Behaviour), so it should be set independently for each share.  The Root Directory Access specifies the user/group and the file permissions that will be applied to the share when mounted on the client machine.  If using NFSv4, and hence some sort of shared user repository, the user and group are validated against this store; otherwise you can use values such as nobody:nobody to specify the user:group, or enter the UID/GID of the users.  These IDs must map onto a user:group ID on the client machine.  The directory permissions should be set according to the needs of the application.
    • ACL
      • Very fine grained access to files and directories is managed via Access Control Lists (ACLs), which describe the permissions granted to specific users or groups.  More detail is available from Wikipedia or in the NFSv4 specification (page 50), which is supported by the ZFS appliance.  In general I have found the default settings to be enough for my needs, where the world can read the ACLs but only the owner has permission to change/delete them.
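The CIDR arithmetic behind the NFS exception entity can be checked with the small sketch below. It prints the network address, first usable and last usable address for a block; for example 172.17.1.0/28 covers 2^(32-28) = 16 addresses, of which 172.17.1.1 to 172.17.1.14 are usable hosts. The function names are hypothetical and only IPv4 is handled.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Convert an integer back to a dotted-quad IPv4 address.
int_to_ip() {
    n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Print "network first-usable last-usable" for an address/prefix block.
cidr_range() {
    addr=${1%/*}
    bits=${1#*/}
    size=$(( 1 << (32 - bits) ))            # number of addresses in the block
    ip=$(ip_to_int "$addr")
    net=$(( ip & ~(size - 1) ))             # clear the host bits
    if [ "$bits" -ge 31 ]; then
        # /31 and /32 have no separate network/broadcast addresses
        first=$net
        last=$(( net + size - 1 ))
    else
        first=$(( net + 1 ))
        last=$(( net + size - 2 ))
    fi
    echo "$(int_to_ip "$net") $(int_to_ip "$first") $(int_to_ip "$last")"
}

# Example usage:
# cidr_range 172.17.1.0/28
```

For a /32 the single address is printed three times, which corresponds to the single-vServer case.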

Administration Security

The ZFS appliance has many configuration settings; however, to lock down the appliance it is possible to turn off a number of the services or re-configure them from the default to minimise the risk of intrusion.
  • Data Services
    • NFS
    • iSCSI - If not used then disable the service.  (As of Exalogic 2.0.6.1.0 iSCSI is only supported for the Solaris Operating System.  In future releases it will also be supported for Linux/virtualised racks.)
    • SMB, FTP, HTTP, NDMP, SFTP, TFTP can all be disabled unless specifically needed for some function.  (For example, I quite often use the HTTP service to allow easy access to some media files or to host a yum server.)
  • Directory Services
    • Generally use either NIS, LDAP or Active Directory for a shared identity store.  Turn off the services you are not using.
  • System Settings
    • Most of the system settings are useful to have enabled on the rack.  The default settings of having Phone home and Syslog disabled are the best bet.
  • Remote Access
    • SSH is almost certain to be required to administer the device via the CLI and for scripted configurations.  However, if you set up another user with all the necessary permissions then it is possible to deselect the "Permit root login" option.  This means that it will no longer be possible to use the root account to ssh onto the appliance.  NOTE - If using exaBR, exaPatch, exaChk etc. then these rely on ssh access as root, so the flag would need to be toggled back prior to running these tools.
By default the appliance can be administered on all networks.  This can be tightened up so that administration can only occur over the specific management networks.  To disable administration on a particular interface, select the Configuration --> Network --> Configuration tab, highlight the interface that you want to change, click the edit icon to change its properties and deselect the Allow Administration option.

Preventing administration on a particular interface
It is possible to prevent administration on all the networks, but the recommendation is simply to prevent it on the networks that a guest vServer can join, namely the IPoIB-vserver-shared-storage and the IPoIB-default networks.  These interfaces can be identified by the IP addresses or partition keys in the description shown in the browser interface.  The IPoIB-default network is the one shown as "via pffff_ibp1, pffff_ibp0", and the storage network will normally have an IP address in the 172.17.n.n network and be on partition 8005 (via p8005_ibp1, p8005_ibp0).  The partition for the shared storage may vary, as it is configurable in the Exalogic Configuration Utility during the initial installation.
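When working through the interface list, the partition key in the description is the easiest thing to match on. The sketch below (hypothetical function name) treats a description as guest-facing if it mentions the pffff (IPoIB-default) partition or the shared-storage partition, which defaults here to 8005 but should be set to whatever was chosen in the Exalogic Configuration Utility.

```shell
#!/bin/sh
# Return 0 if an interface description such as "via p8005_ibp1, p8005_ibp0"
# refers to a partition that guest vServers can join: IPoIB-default (pffff)
# or the shared-storage partition (second argument, default 8005).
guest_facing_link() {
    desc="$1"
    storage_pkey="${2:-8005}"
    case "$desc" in
        *pffff_ib*|*p${storage_pkey}_ib*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage with a description copied from the browser interface:
# guest_facing_link "via p8005_ibp1, p8005_ibp0" && echo "deselect Allow Administration"
```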

The effect of deselecting "Allow Administration" on the interface means that a browser will see an "Unable to connect" error and if the ssh interface is used then the following message is shown.

# ssh donald@172.17.0.9
Password:
Last login: Tue Feb 18 11:51:00 2014 from 138.3.48.238
You cannot administer the appliance via this IP address.
Connection to 172.17.0.9 closed.

Summary

In conclusion, there are actually relatively few actions to be taken from the default settings of an Exalogic ZFS appliance but the following should always be considered:-
  1. Setup users to administer the projects and shares that are limited to only have write access to the shares they need.
  2. For each share make certain that only the protocols that are needed are allowed access (normally NFS only, and potentially iSCSI in the future) and ensure that only specific hosts are allowed to mount the shares
  3. Prevent administration on the networks that are connected to guest vServers.


Tuesday, 11 February 2014

Websocket support in Oracle Traffic Director

Introduction

In recent releases the Websockets function has been added as a capability to both WebLogic 12.1.2 and Oracle Traffic Director (11.1.1.7.0).  This blog posting is another courtesy of my colleague Mark Mundy, who has been investigating what is involved in setting up an architecture with a browser accessing a web application running websockets on WebLogic, proxied/load balanced through OTD.

The WebLogic documentation covers the setup of Websockets at the server side and the OTD documentation explains what is needed for the load balancer.  This blog posting is intended to show how the two can be used together on the latest Exalogic virtualised release to have OTD load balance a simple example websocket application with relative ease. This posting will not go through every aspect of the setup as the assumption is that the reader has some familiarity with Exalogic Control, WebLogic and OTD.  The example application being used is the one that ships as part of the example set for WebLogic 12.1.2 with a slight modification as will be highlighted.
The deployment topology that is used in this scenario is shown below.
WebSocket deployment Topology - Browser --> OTD Load Balancer --> Application(WLS Cluster)

Setting up the Exalogic Environment 

How complex you want to make the environment will determine what needs to be set up.  To demonstrate basic load balancing while ensuring that there is no direct network connectivity between the client browser and the back end WLS managed server hosting the websocket application, a minimum of 2 vServers would be needed.  To test features such as load balancing during failure scenarios at both the WLS and OTD tiers, ideally 5 would be created.  This example assumes 5 vServers are created; the following diagram shows the networking setup.
 
Exalogic vServer deployment topology

WebLogic Installation and Configuration

WebLogic 12.1.2 is the minimum version that can be used for Websocket support; select the 12c Generic Download from the Oracle download site.  (This has a dependency on the latest 64-bit Java version, so if it is not already installed on your vServers then install it prior to installing WLS.)
Install the Complete Installation of WLS 12.1.2 using the installed 64-bit Java 7.  The complete installation ensures the example applications are available, and it is the Websockets example that will be used.  Once WLS is installed, complete the post-installation configuration so that the examples are set up.  For more details on making the examples available, refer to the WebLogic 12.1.2 documentation.
The specific websockets example application used in this document will be introduced after a specific WLS domain is created to host it.
Rather than use the WLS examples domain a new domain will be used so as to keep things as simple as possible but at the same time allowing the ability to have more flexibility in the testing. Further detail about the WLS implementation is available in the documentation.

A new WLS domain needs to be created using the domain wizard.  In the topology shown above, the two instances in the cluster listen on the private vNet, and the administration server uses this network as its default but also has an additional HTTP channel set up to listen on the EoIB public network so that administration from outside the Exalogic rack is simple.  The websocket application can then be deployed to the cluster such that it is only accessible to the client population through the OTD tier, providing a level of security.

The example that ships with WebLogic only works when the browser and the WLS managed server are on the same host, because the script opens the websocket to localhost:7001.  To resolve this we make a small change to the code in the example before building the application and deploying it to the cluster.  In the example source code, locate exampleScript.js and change the line that opens the websocket.
This Javascript file is normally located in:-
<WLS_Home>/user_projects/applications/wl_server/examples/src/examples/webapp/html5/webSocket/src/main/webapp/script
Use your editor of choice to search for "localhost:7001" and replace it with " + window.location.host + " as shown below.

...
if (ws == null) {
      ws = new WebSocket("ws://localhost:7001/webSocket/ws");
      ws.onopen = function() {
        addMsg("Server Msg: WebSocket has been started.");
      };

...
becomes
...
if (ws == null) {
      //ws = new WebSocket("ws://localhost:7001/webSocket/ws");
      ws = new WebSocket('ws://' + window.location.host + '/webSocket/ws');

      ws.onopen = function() {
        addMsg("Server Msg: WebSocket has been started.");
      };

...
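If you prefer to script the edit rather than use an editor, a sed substitution performs the same replacement of localhost:7001. This is a sketch: the function name patch_ws_url is hypothetical, it keeps a .orig copy of the file, and it assumes the script still contains the unmodified localhost:7001 string.

```shell
#!/bin/sh
# Replace localhost:7001 inside the double-quoted websocket URL with a
# concatenation of window.location.host, keeping a .orig backup.
patch_ws_url() {
    js="$1"
    cp "$js" "$js.orig" || return 1
    # "ws://localhost:7001/webSocket/ws" becomes
    # "ws://" + window.location.host + "/webSocket/ws"
    sed 's/localhost:7001/" + window.location.host + "/' "$js.orig" > "$js"
}

# Example usage, from the script directory given above:
# patch_ws_url exampleScript.js
```

After patching, rebuild the application with ant build as described below.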

The directory <WLS_Home>/user_projects/applications/wl_server/examples/src/examples/webapp/html5/webSocket contains full instructions on building and deploying the application.  In our case we only need to re-build the application (ant build) and then use the standard WLS admin console (or WLST or ant if you prefer) to deploy the war file created to the WebLogic Cluster.

Oracle Traffic Director Installation and Configuration

For Websocket support we need to ensure that we install OTD 11.1.1.7.0 or higher.  In our scenario we install OTD onto the admin vServer and the two OTD instance vServers, to end up with an HA cluster hosted on the two instance vServers that routes requests through to the back end WLS cluster.  It is useful to enable monitoring on OTD so that we can easily spot the sessions being created in OTD and through to the back end WLS instances.  To enable monitoring, highlight the virtual server and enable the plain text report, as shown below.

Enable plain text monitoring in OTD

Websocket support is enabled by default in OTD, so nothing else is needed other than deploying the configuration to the cluster before we test our Websocket application.  For reference, if there is a need to disable websockets for a particular route, this can be done by disabling the "WebSocket Upgrade" option for that route.


Enable/Disable Websocket support in OTD.

With monitoring in place it is then possible to view the plain text report by accessing the URI http://<OTD IP address>:<port>/.perf, where the Origin Server statistics are of particular interest.


Testing the application

In another browser session/tab it is then possible to access the OTD instance with the URI /websocket.  You can then click the start button to open the socket and enter text to send to the server.  Use a browser with WebSocket support such as Chrome or Firefox; Internet Explorer only supports WebSockets from version 10 onwards.


WLS WebSocket example application

By starting up additional browser sessions you can create multiple websockets and the .perf output will reflect the load balancing to the back end WLS instances.


In this example we can see that 3 websocket sessions have been created and they have been load balanced across the two available WLS server instances.

Behind the scenes there are actually two socket connections created for each WebSocket.  The first is between the client browser and the OTD instance and the second is from OTD to a WLS instance.  OTD acts as the bridge between the two.  Below is the output from netstat on the client machine and on the OTD instance showing these socket connections.

Browser to OTD instance (netstat on the client machine):
# netstat -an
Active Connections
Proto  Local Address          Foreign Address        State
TCP    138.3.32.68:52732      138.3.49.9:8080        ESTABLISHED


OTD to WLS (netstat on the OTD instance):
# netstat -an | grep 7003
tcp        0      0 192.168.16.74:39223         192.168.16.71:7003          ESTABLISHED
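To watch the OTD to WLS sockets as more browser sessions are opened, the hypothetical function below counts ESTABLISHED TCP connections whose remote port matches, reading Linux netstat -an output on the OTD instance. Port 7003 is the WLS listen port in this example; substitute your own. It does not handle the Windows netstat format seen on the browser side.

```shell
#!/bin/sh
# Count ESTABLISHED TCP connections whose foreign (remote) port is PORT,
# reading Linux "netstat -an" output on stdin.
count_established() {
    port="$1"
    awk -v p=":$port" '
        $1 ~ /^tcp/ && $NF == "ESTABLISHED" {
            # $5 is the foreign address; compare its trailing ":port"
            if (substr($5, length($5) - length(p) + 1) == p) n++
        }
        END { print n + 0 }
    '
}

# Example usage on the OTD instance:
# netstat -an | count_established 7003
```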


Because the load balancer is linked to the back end service through separate sockets, a failure of either OTD or WLS breaks the socket connection and the client must re-establish the socket.  Thus, for HA, there is a dependency on the client capturing any exceptions and re-establishing the connection.
Note:- If OTD is set up with a failover group (HA config) then one instance is primary and the second acts as a backup.  Should the primary fail then the websocket would have to be re-established using the backup OTD instance.  Once the primary is available again OTD will automatically fail back, which will again break the websocket connection.



Friday, 31 January 2014

Running DNS (bind) for a private DNS domain in Exalogic

In an earlier post I described a process to set up bind to provide a relay DNS service that can be accessed from internal vServers and the shared storage.  This provides an HA DNS service, particularly for the shared storage, which without such a setup relies on the non-HA 1GbE network for access to DNS.

The next obvious step in the process is to extend your bind configuration so that a local DNS service can be used for the vServers you create.  This would give name resolution for guests that you do not want included in the external DNS service.

The first step is to set up bind (the named daemon) as described in my earlier blog entry.  Ensure that the vServer you are using for the DNS service is connected to an EoIB network and the shared storage; this means it will be attached to three networks in total.
  1. the EoIB network which will give it access to the main DNS service in the datacenter, 
  2. the vServer-shared-storage which will allow the ZFS appliance to use this as a DNS server 
  3. the IPoIB-virt-admin network.  This is a network that is connected to all vServers, so if we make the vServer a full member of this network (as described in an earlier post about setting up LDAP on the rack) then all vServers created can utilise the DNS services.  All we need to do is configure the network to use the domain service.

Once bind is operational we can extend the named configuration to include details for a domain internal to the Exalogic rack.  In this example our datacenter DNS runs on the domain mycompany.com; for lookups internal to the Exalogic rack we want to use the domain el01.mycompany.com, where el01 represents the Exalogic rack name.  The first step is to edit the main configuration file and add another section specifying that the bind service will be the master for the el01.mycompany.com domain.



# cat /etc/named.conf
options {
    directory "/var/named";

    # Hide version string for security
    version "not currently available";

    # Listen to the loopback device and internal networks only
    listen-on { 127.0.0.1; 172.16.0.14; 172.17.0.41; };
    listen-on-v6 { ::1; };

    # Do not query from the specified source port range
    # (Adjust depending your firewall configuration)
    avoid-v4-udp-ports { range 1 32767; };
    avoid-v6-udp-ports { range 1 32767; };

    # Forward all DNS queries to your DNS Servers
    forwarders { 10.5.5.4; 10.5.5.5; };
    forward only;

    # Expire negative answer ASAP.
    # i.e. Do not cache DNS query failure.
    max-ncache-ttl 3; # 3 seconds

    # Disable non-relevant operations
    allow-transfer { none; };
    allow-update-forwarding { none; };
    allow-notify { none; };
};

zone "el01.mycompany.com" IN {
        type master;
        file "el01";
        allow-update { none; };
};
 

The extra section specifies that we will have a zone or DNS domain el01.mycompany.com. Within this zone this DNS server will be the master or authoritative source for all name resolution.  There is a file called el01 which will be the source of all the IP addresses that are served by this server.  Earlier in the configuration is the line

    directory "/var/named";

This specifies the directory that the named daemon will search in for the file called el01. The content of the file is as shown below.


# cat el01
; zone file for el01.mycompany.com
$TTL 2d    ; 172800 secs default TTL for zone
$ORIGIN el01.mycompany.com.
@             IN      SOA   proxy.el01.mycompany.com. hostmaster.el01.mycompany.com. (
                        2003080800 ; se = serial number
                        12h        ; ref = refresh
                        15m        ; ret = update retry
                        3w         ; ex = expiry
                        3h         ; min = minimum
                        )
              IN      NS      proxy.el01.mycompany.com.
              IN      MX      10      proxy.el01.mycompany.com.

; Server names for resolution in the el01.mycompany.com domain
el01sn-priv   IN      A         172.17.0.9
proxy         IN      A         172.16.0.12
ldap-proxy    IN      CNAME     proxy
 

The properties or directives in the zone file are:-

  1. TTL - Time to live.  If there are downstream name servers then this directive lets them know how long their cache can be valid for.
  2. ORIGIN - Defines the domain name that will be appended to any unqualified names in the zone file.
  3. SOA - Start of Authority details
    1. The @ symbol places the domain name specified in the ORIGIN as the namespace being defined by this SOA record.
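    2. The SOA directive is followed by the primary DNS server for the namespace and the e-mail address of the person responsible for the domain, written with a dot in place of the @ (so hostmaster.el01.mycompany.com stands for hostmaster@el01.mycompany.com).  (Not used in our case but it needs to be present.)
    3. The serial number must be incremented each time the zone file is updated.  This allows named and any slave servers to recognise that the content has changed.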
    4. The other values indicate time periods to wait for updates or to force refresh slave servers.
  4. NS - Name server - Determines the fully qualified names of the servers that are authoritative for this domain.
  5. MX - Mail eXchange, defines the mail server where mail sent to this domain is to be sent.
  6. A - Address record is used to specify the IP address for a particular name
  7. CNAME - The Canonical Name record, which can be used to create aliases for a particular server.
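Since the serial number has to increase on every edit, a common convention (and the one the 2003080800 serial in the example follows) is YYYYMMDDnn. The sketch below computes the next serial under that convention and always returns a value larger than the current one; the function name is hypothetical, and the date-based format is a habit rather than a bind requirement.

```shell
#!/bin/sh
# Next SOA serial in YYYYMMDDnn form: today's date followed by 00 if that is
# larger than the current serial, otherwise the current serial plus one.
next_serial() {
    current="$1"
    today="${2:-$(date +%Y%m%d)}"   # optional second argument for testing
    candidate="${today}00"
    if [ "$candidate" -gt "$current" ]; then
        echo "$candidate"
    else
        echo $(( current + 1 ))
    fi
}

# Example usage before editing the el01 zone file:
# next_serial 2003080800
```

Update the serial line in the zone file with the value printed, then restart named so the change is picked up.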
In the example above we have added a few addresses into the DNS domain,
  1. The storage head under the name el01sn-priv.  This means that all vServers will automatically be able to resolve by name the storage for use with NFS mounts.
  2. proxy (or ldap-proxy) is the name that we are using for a server where OTD is installed and configured to be a proxy for an external directory.  Thus enabling all vServers to access LDAP for authentication.  (Useful for NFSv4 mounts from the shared storage.)
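Further vServers can be added to the zone in the same way. The sketch below (hypothetical function name) turns "name ip" pairs into A records laid out like the entries in the el01 file, skipping blank lines and lines starting with #; the name and address in the usage comment are placeholders. Remember to increment the serial number after adding records.

```shell
#!/bin/sh
# Read "name ip" pairs on stdin and print zone-file A records.
a_records() {
    # "|| [ -n "$name" ]" also handles a final line without a newline
    while read -r name ip || [ -n "$name" ]; do
        [ -n "$name" ] || continue
        case "$name" in \#*) continue ;; esac
        printf '%-14s IN      A         %s\n' "$name" "$ip"
    done
}

# Example usage (placeholder name and address):
# printf 'vserver1 172.16.0.20\n' | a_records >> /var/named/el01
```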
Once this is all up and running, restart the named service and ensure that the DNS settings in the virt-admin network (in our case) include the search domain el01.mycompany.com and the IP address of the DNS vServer.  This way every vServer created will be able to use the DNS service.



Thursday, 14 November 2013

Virtualised Exalogic and External DNS Servers

Quite often when configuring Exalogic, issues arise with accessing a DNS server, resulting in delays.  From a management perspective this generally reveals itself as a pause of 20-30 seconds when using ssh to connect to a server.  During management via Exalogic Control, DNS issues sometimes cause timeouts in jobs and hence failures.  From an application perspective this often shows up as shares on the shared storage taking a long time to become available, and as slow file creation or initial reads.

Virtual servers deployed onto Exalogic can easily be set up to access DNS over the 10GbE network, either by configuring the Network Services on the EoIB network (select the network that gives access to the 10GbE on your rack and select the "Edit Network Services" action) or by simply editing the /etc/resolv.conf file on your vServer to point it to the DNS servers in the environment.  (This could be put into a template if this approach is preferred.)

Editing network services in Exalogic Control
Note - Health Warning - Only attempt to change the network services if you are running Exalogic Elastic Cloud Software with a version of 2.0.6.0.0 or higher!

The shared storage is a slightly different kettle of fish.  When set up, it has direct access to the 1GbE management LAN, and it is normally through this network that it would gain access to services such as LDAP/NIS or DNS.  However, the 1GbE network is not set up to be fault tolerant within Exalogic, so a fault-tolerant route through the 10GbE network should be created instead.  A DNS service that the shared storage can access is easily set up on a vServer.  This follows the same principles as an earlier blog posting about setting up LDAP for access via internal vServers.

To achieve a similar setup for DNS the following steps should be done:-

  1. Create your vServer with access to at least the 10GbE and the vserver-shared-storage networks.  (Ensure it is marked for HA or alternatively plan for two vservers both running DNS and part of a distribution group.)
  2. Configure the vServer to act as a DNS server.  This can be done using tools like dnsmasq or the bind package; the example shown here uses bind to create the service.
    1. Set up a yum repository that your vServer can access.
    2. Install the bind package.
      # yum install bind --skip-broken
      (Notes:-
      • We include the option --skip-broken so that it does not upgrade the packages that bind relies on.  On the rack I tested, other utilities depend on the bind-libs package, and upgrading it caused issues with the Infiniband network.  If this mismatch is simply ignored, the named daemon installs and appears to operate successfully.
      • Not strictly necessary but for testing purposes the unix command nslookup is quite handy.  If this is not already installed then install the bind-utils package.)
    3. Create the /etc/named.conf file with content along the lines of that shown below.

      # cat /etc/named.conf
      options {
          directory "/var/named";

          # Hide version string for security
          version "not currently available";

          # Listen to the loopback device and internal networks only
          listen-on { 127.0.0.1; 172.16.0.14; 172.17.0.41; };
          listen-on-v6 { ::1; };

          # Do not query from the specified source port range
          # (Adjust depending on your firewall configuration)
          avoid-v4-udp-ports { range 1 32767; };
          avoid-v6-udp-ports { range 1 32767; };

          # Forward all DNS queries to your DNS Servers
          forwarders { 10.5.5.4; 10.5.5.5; };
          forward only;

          # Expire negative answer ASAP.
          # i.e. Do not cache DNS query failure.
          max-ncache-ttl 3; # 3 seconds

          # Disable non-relevant operations
          allow-transfer { none; };
          allow-update-forwarding { none; };
          allow-notify { none; };
      };
    4. Start the DNS daemon (named) to check that it runs correctly.
      # service named start
    5. Configure it to start automatically at boot.
      # chkconfig named on
  3. Configure the storage to include the vServer shared storage IP address in its list of DNS servers.  In our case it uses the internal vServer IP address 172.17.0.41 first.  Should that fail, it falls back to other DNS servers reached over the 1GbE network.

Configuring DNS on the ZFS Storage Appliance

Thursday, 17 October 2013

Exalogic and CPU Oversubscription

Introduction

Ever since the EECS release 2.0.4.0.0 the facility to oversubscribe the physical CPU on an Exalogic has existed.  The documentation explains how the oversubscription is set and introduces the idea of a ratio and the CPU cap.  However I found it slightly light on detail so this posting will attempt to explain further just how the "vCPU to Physical CPU Threads ratio" and the "CPU cap" interact with each other.

The settings for this feature are editable as part of the virtual Data Center, so they impact all tenant users of the rack.  Figure 1 shows a screenshot of the configurable parameters.  These values can be changed at any time during the lifecycle of the virtual data-centre, but the impact of any change must be understood.

Figure 1 - Editing the vDC properties for CPU oversubscription

vCPU to Physical CPU Ratio

The vCPU to pCPU ratio is used by the Exalogic Control placement algorithm.  By changing the ratio from 1:1 to, say, 1:2 you are effectively doubling the number of vCPUs that can be allocated to vServers in the datacenter.  When oversubscribed, vServers may end up competing with each other for access to the physical CPUs.  At that point the Xen scheduler starts arbitrating access to the physical CPUs.

For example, consider a single compute node with 32 hardware threads (2 sockets * 8 cores * 2 threads per core = 32) on which we are placing vServers with 1 vCPU each.  With the ratio at 1:1 we would be able to place 32 vServers on the physical compute node.  With the ratio set to 1:2 we would be able to place 64.
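The placement arithmetic in this example can be written out as a small bash function.  The name max_vservers and its argument order are hypothetical.  It is only a sketch of the ratio calculation described above and deliberately ignores everything except vCPUs; memory, for instance, also limits how many vServers can be placed on a node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many vServers of a given size fit on one node.
# Usage: max_vservers SOCKETS CORES_PER_SOCKET THREADS_PER_CORE RATIO VCPUS_PER_VSERVER
# RATIO is the second number of the vCPU to pCPU ratio (1 for 1:1, 2 for 1:2).
max_vservers() {
    local sockets=$1 cores=$2 threads=$3 ratio=$4 vcpus=$5
    # Physical hardware threads on the compute node.
    local hw_threads=$(( sockets * cores * threads ))
    # vCPUs the placement algorithm is allowed to hand out on this node.
    local vcpu_budget=$(( hw_threads * ratio ))
    echo $(( vcpu_budget / vcpus ))
}
```

`max_vservers 2 8 2 1 1` prints 32 and `max_vservers 2 8 2 2 1` prints 64, matching the two cases above.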

This value can be changed at any time and the change will impact all vServers in the datacenter.  Increasing the ratio is not a problem.  Changing it back from 1:2 to 1:1, however, is only valid if all the existing vServers still fit into the reduced capacity of the virtual data-centre.
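Before lowering the ratio, a rough sanity check is to compare the vCPUs already allocated on a compute node with the budget the new ratio would give.  The helper below is a hypothetical sketch of that check for a single node.  Passing it is a necessary condition only, since Exalogic Control also has to be able to place each individual vServer.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: do the vCPUs already allocated on a node fit the
# budget that a new (lower) ratio would allow?
# Usage: ratio_change_fits HW_THREADS NEW_RATIO VCPUS_OF_EACH_VSERVER...
# Prints the totals; returns 0 if they fit and 1 if they do not.
ratio_change_fits() {
    local hw_threads=$1 new_ratio=$2
    shift 2
    local allocated=0 v
    for v in "$@"; do
        allocated=$(( allocated + v ))
    done
    local budget=$(( hw_threads * new_ratio ))
    echo "allocated=$allocated budget=$budget"
    (( allocated <= budget ))
}
```

For example, `ratio_change_fits 32 1 16 16 4` reports 36 vCPUs against a budget of 32 and returns failure.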


Xen Hypervisor and CPU Scheduling

Underpinning the Exalogic rack is the Xen hypervisor, which has a scheduler that controls access to the physical CPUs.  The scheduler is similar in principle to the Linux scheduler.  However, it referees between the running guest OSes or domains (including the dom0 domain!), ensuring that the compute power is shared out appropriately between them.  There are a number of scheduling algorithms available with Xen, but the Credit Scheduler is the default and has had the most development and testing.  You can check which scheduler is running with the xm dmesg command.



# xm dmesg
 __  __            _  _    _   _____  _____     ____  __
 \ \/ /___ _ __   | || |  / | |___ / / _ \ \   / /  \/  |
  \  // _ \ '_ \  | || |_ | |   |_ \| | | \ \ / /| |\/| |
  /  \  __/ | | | |__   _|| |_ ___) | |_| |\ V / | |  | |
 /_/\_\___|_| |_|    |_|(_)_(_)____/ \___/  \_/  |_|  |_|
                                                        
(XEN) Xen version 4.1.3OVM (mockbuild@us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) Wed Dec  5 09:11:29 PST 2012
(XEN) Latest ChangeSet: unavailable
(XEN) Bootloader: GNU GRUB 0.97
(XEN) Command line: console=com1,vga com1=9600,8n1 dom0_mem=2G

...
(XEN) ERST table is invalid
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 3059.044 MHz processor.
(XEN) Initing memory sharing.
(XEN) Intel VT-d supported pag

...
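To pull just the scheduler name out of this output, for instance in a health-check script, a one-line filter is enough.  The function name xen_scheduler is hypothetical.  It assumes the "Using scheduler:" line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `xm dmesg` output on stdin and print the short
# scheduler name shown in parentheses, e.g. "credit".
# Typical use on dom0: xm dmesg | xen_scheduler
xen_scheduler() {
    sed -n 's/.*Using scheduler: .*(\([^)]*\)).*/\1/p' | head -n 1
}
```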

On Exalogic, when we create a vServer we define the number of vCPUs allocated to the guest.  Each vCPU equates to a single hardware thread, and when the vServer is created Xen allocates the CPUs for the guest to use.  The xm vcpu-list command shows, for each vCPU of a vServer, the physical CPU (hardware thread) it is currently on and the CPUs it is allowed to run on (its CPU affinity).


# xm vcpu-list 
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
0004fb000006000036b78d88370acd11     5     0    22   -b-  315931.3 12-23
0004fb000006000036b78d88370acd11     5     1    15   -b-  210225.2 12-23
0004fb000006000036b78d88370acd11     5     2    17   -b-   94768.4 12-23
0004fb000006000036b78d88370acd11     5     3    13   -b-   99020.7 12-23
0004fb000006000036b78d88370acd11     5     4    19   -b-   97240.0 12-23
0004fb000006000036b78d88370acd11     5     5    16   -b-   90450.2 12-23
0004fb000006000036b78d88370acd11     5     6    20   -b-   85511.0 12-23
0004fb000006000036b78d88370acd11     5     7    14   -b-   74973.3 12-23
0004fb000006000036b78d88370acd11     5     8    23   -b-   75114.4 12-23
0004fb000006000036b78d88370acd11     5     9    16   -b-   65374.4 12-23
0004fb000006000036b78d88370acd11     5    10    14   -b-   64786.6 12-23
0004fb000006000036b78d88370acd11     5    11    20   -b-   64758.8 12-23

0004fb0000060000ea7b5bf71f3a806c     4     0     9   -b-  139777.8 0-11
0004fb0000060000ea7b5bf71f3a806c     4     1     2   -b-  173475.2 0-11
Domain-0                             0     0     4   -b-   69594.9 any cpu
Domain-0                             0     1     5   -b-   55519.9 any cpu

In this example we can see that there is a vServer running as ID 5 (actually the Exalogic Control vServer).  It has been allocated 12 vCPUs, which Xen will run on CPUs 12-23.  Similarly, the vServer with ID 4 (the Exalogic Control proxy) has been allocated 2 vCPUs, which will run on CPUs 0-11.
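On a node with many vServers the raw listing gets long.  A hypothetical awk-based helper like the one below condenses it to one line per domain.  It assumes the column layout shown above (Name, ID, VCPU, CPU, State, Time(s), CPU Affinity) and is only a sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: condense `xm vcpu-list` output (on stdin) to one line
# per domain: name, domain ID, number of vCPUs and the CPU affinity of the
# first vCPU listed. The header line is skipped.
summarise_vcpus() {
    awk '$1 != "Name" && NF >= 7 {
            id = $2
            if (!(id in count)) {
                order[++n] = id
                name[id] = $1
                # Affinity is everything from field 7 onwards ("any cpu" has a space).
                aff = $7
                for (i = 8; i <= NF; i++) aff = aff " " $i
                affinity[id] = aff
            }
            count[id]++
        }
        END {
            for (k = 1; k <= n; k++) {
                id = order[k]
                printf "%s id=%s vcpus=%d affinity=%s\n", name[id], id, count[id], affinity[id]
            }
        }'
}
```

On dom0 this would be used as `xm vcpu-list | summarise_vcpus`.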

The credit scheduler will attempt to make sure that all vServers get access to the CPUs they have been allocated.  Provided there is no contention (over-subscription), there will be very little for the scheduler to do.  However, if the guests on a compute node demand more compute than it physically has, the scheduler will kick in.  It then does its best to share out the resource according to the scheduling rules.  The rules are a credit-based scoring system built on two factors, a weight and a cap.
  • The weight is a number that indicates to the scheduler how much "credit" a vServer will get.  Thus, once the system is under contention for the CPU, a vServer with a weight of 1000 will get twice as much CPU as a vServer with a weight of 500.
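  • The cap is an absolute limit on the amount of CPU time that a domain can be allocated.  It is expressed as a percentage of one physical CPU (pCPU), so if it is set to 50 a domain will only be allowed half the available cycles of a pCPU.  In Xen the cap applies to the domain as a whole: 100 equates to one full pCPU, 400 to four, and 0 means no cap.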
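The two rules can be worked through for the simplest case: two busy domains sharing one pCPU.  contended_share is a hypothetical helper and deliberately simplified.  It returns the weight-proportional share clipped by the cap, where 0 means no cap.  It ignores the fact that time a capped domain leaves unused can be picked up by the other domain.

```shell
#!/usr/bin/env bash
# Hypothetical worked example: approximate % of one contended pCPU that a
# busy domain receives against one other busy domain.
# Usage: contended_share WEIGHT OTHER_WEIGHT CAP   (CAP 0 = no cap)
contended_share() {
    local weight=$1 other=$2 cap=$3
    # Weight-proportional share (integer percentage, rounded down).
    local share=$(( 100 * weight / (weight + other) ))
    # The cap is an absolute ceiling regardless of weight.
    if (( cap > 0 && share > cap )); then
        share=$cap
    fi
    echo "$share"
}
```

`contended_share 1000 500 0` prints 66, so the 1000-weight domain gets twice the CPU of the 500-weight one.  With a cap of 50 the same domain is held to 50.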
An example might help explain how this operates.  Consider two domains contending for the same physical CPU.  Initially the scheduler allocates a credit score to each domain, worked out from its weight: the higher the weight, the larger the credit score.  While a domain is consuming CPU cycles the scheduler steadily reduces its credit score.  The scheduler runs an accounting thread independently of the domains.  If the credit for one domain drops (significantly) below the credit score of the contending domain, the other domain gets access to the processor and its own credit score starts diminishing.  Periodically the accounting thread runs to top up all the credit scores.
If the cap is being used then effectively you are reducing the compute resource available to a domain as any one vServer will only ever be allocated compute up to the cap level.
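This credit mechanism can be illustrated with a toy model, which is NOT the actual Xen implementation.  The function simulate_credit, the tick and period units, and the rule that the domain with the most credit runs (ties going to the first domain) are all simplifying assumptions.  Each accounting period hands out credit in proportion to weight, and every tick of CPU time costs one credit.

```shell
#!/usr/bin/env bash
# Hypothetical toy model of credit scheduling for two domains sharing one pCPU.
# Usage: simulate_credit WEIGHT_A WEIGHT_B PERIOD_TICKS PERIODS
simulate_credit() {
    local weight_a=$1 weight_b=$2 period=$3 periods=$4
    local credit_a=0 credit_b=0 ran_a=0 ran_b=0 p t
    for (( p = 0; p < periods; p++ )); do
        # "Accounting thread": top up credit in proportion to weight.
        credit_a=$(( credit_a + period * weight_a / (weight_a + weight_b) ))
        credit_b=$(( credit_b + period * weight_b / (weight_a + weight_b) ))
        for (( t = 0; t < period; t++ )); do
            # The domain with more credit runs for one tick and pays one credit.
            if (( credit_a >= credit_b )); then
                credit_a=$(( credit_a - 1 )); ran_a=$(( ran_a + 1 ))
            else
                credit_b=$(( credit_b - 1 )); ran_b=$(( ran_b + 1 ))
            fi
        done
    done
    echo "A=$ran_a B=$ran_b"
}
```

`simulate_credit 1000 500 30 3` prints A=60 B=30, the 2:1 split expected from the weights.  With equal weights, `simulate_credit 256 256 30 2` shares the ticks evenly.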

On an Exalogic the weight is automatically assigned to a vServer and all vServers are given the same weight, while the cap is configurable in Exalogic Control.  It is set at the vDC level and the value becomes embedded in the vServer configuration file.  Changing the value in the vDC will therefore only impact vServers created after the cap is changed.  Example output of the xm sched-credit command on a couple of compute nodes is shown below.


[root@el01cn01 root]# xm sched-credit
Name                                ID Weight  Cap
0004fb000006000071d34e42e43bd82b    12    256    0
0004fb0000060000bb90063bf8efe7d7    13    256    0
Domain-0                             0    256    0


[root@el01cn01 root]# ssh root@el01cn07 xm sched-credit
Name                                ID Weight  Cap
0004fb000006000082dbcec62a976907    15  27500   90
Domain-0                             0    256    0

First we can see the output on node 1, which shows two of the control stack vServers with a weight of 256.  Then we see the output from compute node 07, with a customer vServer that has a weight of 27500.  Dom0 gets a weight of 256, as per the control stack.  For the customer vServer the vDC had been changed to set the cap to 90.  The vServer as a whole will therefore never use more than 90% of one physical CPU.
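On a busy rack it can be handy to spot which vServers carry a cap.  The hypothetical helper below filters xm sched-credit output down to domains with a non-zero cap.  It assumes the four-column Name/ID/Weight/Cap layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list domains with a non-zero cap from
# `xm sched-credit` output on stdin (columns: Name ID Weight Cap).
capped_domains() {
    awk '$1 != "Name" && NF == 4 && $4 + 0 > 0 { print $1, "weight=" $3, "cap=" $4 }'
}
```

For example, `ssh root@el01cn07 xm sched-credit | capped_domains`.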

The weight is automatically set by Exalogic Control, and the cap is set to whatever the vDC value is at the time the vServer is created.  These values are put into the vm.cfg file that holds the configuration of the vServer.

[root@el01cn01 0004fb000006000082dbcec62a976907]# cat vm.cfg
kernel = '/usr/lib/xen/boot/hvmloader'
vif = []
OVM_simple_name = 'test-vserver'

...
name = '0004fb000006000082dbcec62a976907'
vncpasswd = ''
cpu_weight = 27500
pae = 1
memory = 4096
cpu_cap = 90
OVM_high_availability = True
...
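Because the cap is frozen into vm.cfg when the vServer is created, checking the file is a quick way to see which cap a given vServer actually got.  vm_cpu_settings is a hypothetical sketch that assumes the simple "key = value" lines shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print cpu_weight and cpu_cap from a vServer's vm.cfg.
# Usage: vm_cpu_settings PATH_TO_VM_CFG   (missing keys are shown as "unset")
vm_cpu_settings() {
    local cfg=$1
    awk -F'=' '
        # Strip whitespace around the key and the value of each "key = value" line.
        { key = $1; gsub(/[ \t]/, "", key); val = $2; gsub(/[ \t]/, "", val) }
        key == "cpu_weight" { weight = val }
        key == "cpu_cap"    { cap = val }
        END {
            if (weight == "") weight = "unset"
            if (cap == "") cap = "unset"
            print "cpu_weight=" weight, "cpu_cap=" cap
        }' "$cfg"
}
```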

Managing this on the Exalogic

So the key understanding required is:-
  1. The vCPU:pCPU ratio only impacts the vServer placement algorithm.  It enables Exalogic Control to place vServers onto the rack such that there can be more vServers demanding vCPU than there are available pCPUs.
    1. This impacts all vServers, those previously created and those still to be created.
    2. Changing this will not reduce the CPU available to a given vServer until the system comes under CPU contention!
    3. The recommendation is not to make this ratio any larger than about 1:4.  Under contention a small vServer (with just 1 vCPU) can receive so little compute power that instability and timeouts result.
  2. The CPU Cap is useful for the situation where you wish to use CPU oversubscription but want to have deterministic access to CPU.  Effectively this will reduce the power of your vServers but delay the time at which the Xen scheduler is used to control vServer access to physical CPU.  (Arguably rather than using the Cap it would be possible to simply reduce the number of CPUs allocated to each vServer and gain the same density/performance.)
    1. The cap is set at the vDC level, but changing it will not affect previously created vServers.  Thus it is possible to have a virtual deployment in which different vServers have different caps.

Thursday, 29 August 2013

Limiting OTD to listen only on the VIP address


In most production deployments OTD is likely to be deployed in a highly available configuration, with two instances working as an active/hot-standby load balancing pair.  (See my earlier posting on running OTD HA.)  In production the environment will almost certainly have a number of security constraints put on it, one of which will be to keep the number of listening ports to an absolute minimum.  For OTD this means it should only listen for incoming requests on the Virtual IP address, whereas by default the listener listens on all interfaces for the given port.

Thus we want to set up the configuration to listen on just the VIP, as shown below.



Here the IP Address field is set to the IP address of the VIP.

Having done this, an attempt to start up the server instance fails with the error messages shown below.


./startserv
Oracle Traffic Director 11.1.1.7.0 B01/14/2013 04:13
[ERROR:32] startup failure: could not bind to <Virtual IP>:8080 (Cannot assign requested address)
[ERROR:32] [OTD-10380] http-listener-1: http://<Virtual IP>:8080: Error creating socket (Address not available)
[ERROR:32] [OTD-10376] 1 listen sockets could not be created
[ERROR:32] server initialization failed

Alternatively if you attempt to start the instance via the GUI then an error message similar to that shown below will appear.



This failure occurs because the VIP is only ever active on one node at a time.  The instance may start before its vServer has brought up the VIP, or while the VIP is assigned to the other vServer in the HA group.  In either case it is impossible to bind to that address.

Linux can allow binds to non-local IP addresses via the kernel parameter net.ipv4.ip_nonlocal_bind.  Setting this parameter to 1 allows the OTD instance to start up even though the IP address is not currently assigned to the host.  To set this up, simply edit the /etc/sysctl.conf file and add the parameter with a value of 1.




# tail /etc/sysctl.conf

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 250000
vm.min_free_kbytes = 524288

# Additional entry to allow non-local binds so that we only listen to the VIP.

net.ipv4.ip_nonlocal_bind=1 
#
# sysctl -p
#

Once this is set, issue the sysctl -p command to reload the configuration; the OTD instance can then be started.

You can check the currently running value in the /proc system files.

# cat /proc/sys/net/ipv4/ip_nonlocal_bind
1

#
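A start-up wrapper for the OTD instance could check this value first and fail early with a clearer message than the bind error above.  check_nonlocal_bind is a hypothetical helper.  The optional path argument only exists so that it can be exercised against a test file.

```shell
#!/usr/bin/env bash
# Hypothetical pre-start check: succeed only if non-local binds are enabled.
# Usage: check_nonlocal_bind [PROC_FILE]
# Returns 0 if enabled, 1 if disabled, 2 if the value cannot be read.
check_nonlocal_bind() {
    local path=${1:-/proc/sys/net/ipv4/ip_nonlocal_bind}
    local value
    if ! value=$(cat "$path" 2>/dev/null); then
        echo "cannot read $path" >&2
        return 2
    fi
    if [ "$value" = "1" ]; then
        echo "ip_nonlocal_bind enabled"
        return 0
    fi
    echo "ip_nonlocal_bind=$value: OTD cannot bind to the VIP unless it is local" >&2
    return 1
}
```

For example, `check_nonlocal_bind && ./startserv`.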



Tuesday, 18 June 2013

Integrating Enterprise Manager 12c with Exalogic

Overview

This posting provides an overview of how Enterprise Manager 12c has been integrated with Exalogic.  It then dives into the installation process, giving an overview of the activities that complements the documentation.  (For integrating with Exalogic 2.0.6.0.n read this posting in conjunction with the details shown here.)

The versions used in this example are:-
  • Virtualised Exalogic - 2.0.4.0.2
  • Enterprise Manager (EM12c) - 12.1.0.2.0
Oracle Enterprise Manager is a powerful tool that can be used to manage a large enterprise compute facility, looking after both the hardware and the software running on it (apps-to-disk management).  The normal operating model is a central management service, the Oracle Management Service (OMS).  The OMS is an application hosted in WebLogic Server, with an underlying database that holds configuration and state information.  It communicates with the services it manages via an agent that is deployed onto each operating system.  Through plugins and extensions it has specific knowledge of each environment and hence can present appropriate monitoring and management screens.

For Exalogic there are several plugins that enable it to create a powerful view onto the rack and monitor it from apps to disk.  These plugins include:
  1. ZFS Storage - A specific plugin allows EM12c to communicate with the ZFS appliance within the Exalogic rack to monitor the status of the storage.
  2. Virtualisation - A plugin allows communication with the Oracle Virtual Machine Manager system used in Exalogic to provide details of how the virtual infrastructure is deployed and a view onto each virtual machine (vServer) created.
  3. Exalogic Elastic Cloud/Fusion Middleware - This plugin links in with the Exalogic Control infrastructure and gives information on the state of the physical environment.  It also links into agents deployed onto the vServers and provides a central view of the middleware software that can be deployed onto Exalogic.  (Built-in understanding of WebLogic domains, deployed applications, Oracle Traffic Director installations and Coherence clusters.)
  4. Engineered Systems Healthchecks - A plugin that integrates with the exachk scripts to highlight any configuration inconsistencies.
The diagram below depicts a deployment topology for EM12c to monitor Exalogic.   There are more complex options available to make EM12c highly available and to manage firewalls and proxying of communications.  This blog posting is only really considering a basic installation for managing Exalogic.

OMS Deployment to monitor and manage an Exalogic rack
There are plenty of alternative network configurations and deployment options that could be considered.  The key thing is that the OMS server must have a network path to both the Exalogic Control vServers (OVMM & EMOC) and the client-created vServers that will be running the applications.

For example, in a purely test setup we have in the lab, we run the OMS and OMS repository in a vServer on the Exalogic rack.  We use the IPoIB-virt-admin network to give the OMS server suitable access to all the vServers on the rack.  This is great for test and demonstration purposes, but in a large enterprise the Enterprise Manager configuration is likely to sit outside the Exalogic.

This posting assumes that you already have an instance of Enterprise Manager 12c operational in your environment.  Details on the installation process can be found in the documentation.   This posting will continue to consider all the steps involved in configuring EM12c to monitor the Exalogic rack.

The installation documentation can be found here :-

EM12c Exalogic Configuration

As an overview the process is:-
  1. Get the correct versions of the software (plugins & EM12c) installed
  2. Deploy agents onto the OVMM & EMOC vservers in the Exalogic Control stack
  3. Deploy the ZFS Storage appliance plugin to monitor the storage
  4. Deploy the Exalogic Elastic Cloud plugin to get the Exalogic monitored.
  5. Deploy the Oracle Virtualization plugin to monitor the OVMM environment
  6. If deploying agents (host targets) onto your vServers, set up the vServers as needed
  7. Optional - Deploy the Engineered System Healthchecks

Prerequisites

The process of installing/configuring the various components to allow the Exalogic to be monitored in EM12c involves a number of prerequisite activities.

Ensuring you get the correct plugins

EM12c makes heavy use of plugins. Plugins are managed from the Extensibility menus. (Setup --> Extensibility --> "Self Update" or "Plug-ins")
If your EM12c instance is in a network location with internet access, you can automatically pick up the Oracle plugins from a well-known location.  To switch modes, simply click where it says "Online" or "Offline" beside the Connection Mode under the Status on the Self Update page.  If you are not able to access the internet, use the Offline mode; the tab then shows the location of em_catalog.zip.  Download the file, move it to the OMS server, and then either Browse/Upload it or use the command line (emcli import_update_catalog -file=<path to zip> -omslocal).
Once uploaded, on the "Plug-ins" page ensure that the following plugins are downloaded and "On Management Server":
  • Oracle Virtualisation (12.1.0.3.0)
    • Note - This is not the most recent version, as there is an incompatibility between 12.1.0.4.0 and the OVMM instance that runs as part of Exalogic Control.  If you have 12.1.0.4.0 already deployed then undeploy it from the OMS instance.
  • Exalogic Elastic Cloud Infrastructure (12.1.0.1.0) - Not required for Virtual monitoring as the fusion middleware monitoring incorporates Exalogic.  Necessary for monitoring of a physical Exalogic rack.
  • Oracle Engineered System Healthchecks (12.1.0.3.0)
    • Not necessary for the general system monitoring but allows visibility and control over running exachk, the health checking tool for Exalogic.
  • Sun ZFS Storage Appliance (12.1.0.2.0)

 

Deploying Agents to EMOC & OVMM

For full integration with EM12c it is necessary to have agents deployed to both the OVMM and EMOC vServers. The agent binaries have already been deployed to the control vServers but as EM12c does all the deployment itself it is actually simpler to use the facilities of em12c to deploy into a new directory. As such the following instructions will deploy the agents onto the rack:-
  1. Ensure you have an oracle user with a known password on the vServers.  (The oracle user is already present; as root, use passwd to change its password to a known value.)
  2. Create a directory to host the agent. eg.
    # mkdir -p /opt/oracle/em12c/agent
  3. Make the directory for the agent owned by the oracle user. (Check the group ownership on each vServer, on the OVMM the oracle user is in the dba group while on EMOC it is in the oracle group.)
    # chown -R oracle:oracle /opt/oracle
  4. If the vServers are not setup for DNS then ensure that the fully qualified hostname for the OMS server is included in the /etc/hosts file.
  5. Add the Exalogic info file to the template.
    On the hypervisor (OVS) nodes of the Exalogic rack is an identifier file that specifies the rack identifier.  The file is /var/exalogic/info/em-context.info. In the template create an equivalent directory structure and copy the em-context.info file into this directory.
  6. Make a symbolic link from the sshd file in /etc/pam.d to a file called emagent.  (This allows actions to be performed on the vServer using credentials managed in LDAP.  See the MOS note How to Configure the Enterprise Management Agent Host Credentials for PAM and LDAP (Doc ID 422073.1) for more detail.)
    # cd /etc/pam.d
    # ln -s sshd emagent
  7. Make the necessary changes to the sudoers configuration file (/etc/sudoers)
    1. Change Defaults !visiblepw to Defaults visiblepw
    2. Change Defaults requiretty to Defaults !requiretty
    3. Add the sudo permissions for the oracle user as shown below
      oracle ALL=(root) /usr/bin/id,/*/ADATMP_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[AP]M/agentdeployroot.sh, /*/*/ADATMP_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[AP]M/agentdeployroot.sh,/*/*/*/ADATMP_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[AP]M/agentdeployroot.sh,/*/*/*/*/ADATMP_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[AP]M/agentdeployroot.sh
  8. Now deploy the agent onto the vServer from the Enterprise Manager console.
    Setup --> Add Targets --> Add Target Manually & select the Add Host.... option and follow the wizard.

Setup to monitor the ZFS appliance

The ZFS appliance is monitored via an agent deployed to a device that has network access to the appliance, on an Exalogic the recommendation is to use the EM12c agent deployed to the Exalogic Control EMOC vServer.

Before setting up the monitoring in EM12c we have to run through a "workflow" on the ZFS storage appliance itself that will setup a user with appropriate permissions to monitor the appliance.   The agent hosting the ZFS Storage plugin will then communicate with the ZFS appliance as this user to gather details on the current operation.

To achieve this log onto the ZFS BUI as the root user and navigate to "Maintenance" --> "Workflows" Then run the workflow called "Configuring for Oracle Enterprise Manager" which will create the user and appropriate worksheet to allow the monitoring of the device.

Enabling the ZFS Storage for EM12c monitoring

This activity to create the user must be repeated on the second/standby storage head although it is not necessary to recreate the worksheet on the second head.

Once this is complete, go to the Plug-ins management page (Setup --> Extensibility --> Plug-ins) and deploy the ZFS Storage Appliance plugin to the OMS instance and then to the EMOC agent.  You can then configure the ZFS target via the Setup menu:
  • Setup --> Add Target --> Add Target Manually
  • Select "Add Non-Host Targets by Specifying Target Monitoring Properties"
    • Select the Target Type of Sun ZFS Storage Appliance & select the EMOC monitoring agent.
    • In the wizard give the name you would like the appliance to appear as in the EM12c interface and supply the credentials and IP address for the device. (Use the IB storage network, not the public 1GbE network IP.)
Once this completes it is possible to select the target for the storage appliance and view details on the shares created and the current usage of the device.

Monitoring of ZFS Storage Appliance

Deploying the Exalogic Infrastructure Plugin

There are a couple of steps to getting the environment setup for the Exalogic infrastructure plugin to operate correctly.
  1. Sort out the certificates so that the agent can communicate with the Ops Centre infrastructure of Exalogic Control
  2. Deploy/configure the plugin.

Managing the EMOC certificates

The first step is to ensure that the EM12c agent can communicate with the Ops Centre instance which is only available over a secure communications protocol. Because it uses a self-signed certificate it is necessary to include this certificate in the trust store of the agent.
  1. Export the certificate from the Ops Centre keystore. This is the keystore that is in the OEM installation on the ec1-vm vServers. (/etc/opt/sun/cacao2/instances/oem-ec/security/jsse) It is possible to use the JDK tools to extract the certificate.

    # cd /etc/opt/sun/cacao2/instances/oem-ec/security/jsse
    # /opt/oracle/em12c/agent/core/12.1.0.2.0/jdk/bin/keytool -export -alias cacao_agent -file oc.crt -keystore truststore -storepass trustpass

    Note 1 - The default password for the EMOC truststore is "trustpass".  Others have mentioned that the password was "welcome".  If trustpass does not work try out welcome.
    Note 2 - We explicitly use the keytool version that is shipped with the Oracle EM12c Agent (Java 1.6). The default version of java on the Exalogic Control vServer is java 1.4 and running the 1.4 version of keytool against the truststore will result in the following error:-

    # keytool -list -keystore truststore
    Enter key store password: trustpass
    keytool error: gnu.javax.javax.crypto.keyring.MalformedKeyringException: incorrect magic
  2. Import the certificate you just exported into the agent's trust store. Ensure you import into the correct AgentTrust.jks file, specifically the one for the agent instance you are using and not (as the docs currently state) the copy in the agent binaries.
    # cd /opt/oracle/em12c/agent/agent_inst/sysman/config/montrust
    # /opt/oracle/em12c/agent/core/12.1.0.2.0/jdk/bin/keytool -import -keystore ./AgentTrust.jks -alias wlscertgencab -file /etc/opt/sun/cacao2/instances/oem-ec/security/jsse/oc.crt

Deploying the Exa Infrastructure Plugin

There are a number of steps to getting the Exalogic Infrastructure plugin to monitor the rack.
  1. Deploy the Exalogic Elastic Cloud Infrastructure to the OMS server. (Setup --> Extensibility --> Plug-ins, select the Exalogic Elastic Cloud Infrastructure and from the actions pick "Deploy on >" & "Management Servers" )
  2. Deploy the plugin to the Ops Center (EMOC) vServer. Once the plugin has been deployed successfully to the OMS instance then the same options as above but select to "Deploy on >" & "Management Agent..." and select the EMOC host agent.
  3. Now we want to run the Exalogic wizard to add the targets for the Exalogic rack itself. This is done via the Setup --> Add Target --> Add Targets Manually options. Then select "Add Non-Host Targets Using Guided Process (Also Adds Related Targets)", pick the Exalogic Elastic Cloud and click on "Add Using Guided Discovery" which will show the wizard as pictured below.

Discovery of Exalogic Elastic Cloud


This wizard appears to finish quickly and it is then possible to select the Exalogic from the Targets menu, however the system will be initialising and synchronising in the background so it takes a few minutes to get the full rack discovered. Once present the screenshots below show the monitoring of the hardware with a general picture for the rack and a couple of shots to show the Infiniband Network monitoring.


Exalogic Monitoring - Hardware view


Monitoring the Infiniband Fabric



Monitoring an Infiniband Switch

Deploying the OVMM Monitoring

The first thing to ensure is that the plugin version that is installed is the 12.1.0.3.0 version of Oracle Virtualization. The steps are similar to the steps for the Exalogic Infrastructure Plugin.

However, before doing the deployment to Enterprise Manager, the OVMM server should be set up to be read-only for the EM12c monitoring agent to use.  Follow these steps on the OVMM server to set up a user as read-only.

Login to Oracle VM Manager vServer as oracle user, and then perform the commands in the sequence below.
  1. cd /u01/app/oracle/ovm-manager-3/ovm_shell
  2. sh ovm_shell.sh --url=tcp://localhost:54321 --username=admin --password=<ovmm admin user password>
  3. ovm = OvmClient.getOvmManager ()
  4. f = ovm.getFoundryContext ()
  5. j = ovm.createJob ( 'Setting EXALOGIC_ID' );
    The EXALOGIC_ID can be found in the em-context.info on dom0 located in the following file path location:
    /var/exalogic/info/em-context.info
    You must log in to dom0 as a root user to obtain this file. For example, if the em-context.info file content is ExalogicID=Oracle Exalogic X2-2 AK00018758, then the EXALOGIC_ID will be AK00018758.
  6. j.begin ();
  7. f.setAsset ( "EXALOGIC_ID", "<Exalogic ID for the Rack>");
  8. j.commit ();
  9. Press Ctrl+D to exit the shell.
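If you are configuring several racks it may help to script the fiddly part. The sketch below is a hypothetical Python helper (not part of Oracle VM Manager or EM12c) that pulls the rack ID out of em-context.info content and prints the ovm_shell statements from steps 3 to 8, ready to paste into the shell. It assumes the file holds a line of the form ExalogicID=<description> <serial>, as in the example above, and that the serial is the last word on that line; the function names are illustrative only.

```python
def exalogic_id_from_em_context(text: str) -> str:
    """Return the rack serial (EXALOGIC_ID) from em-context.info content.

    Assumption: a line of the form 'ExalogicID=<description> <serial>',
    e.g. 'ExalogicID=Oracle Exalogic X2-2 AK00018758'. The serial is taken
    to be the last whitespace-separated token on that line.
    """
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ExalogicID":
            tokens = value.split()
            if tokens:
                return tokens[-1]
    raise ValueError("no ExalogicID entry found")


def ovm_shell_commands(exalogic_id: str) -> list[str]:
    """Build the ovm_shell statements (steps 3 to 8) for the given rack ID."""
    return [
        "ovm = OvmClient.getOvmManager ()",
        "f = ovm.getFoundryContext ()",
        "j = ovm.createJob ( 'Setting EXALOGIC_ID' );",
        "j.begin ();",
        f'f.setAsset ( "EXALOGIC_ID", "{exalogic_id}");',
        "j.commit ();",
    ]


if __name__ == "__main__":
    # Sample content from the post; on a real rack you would read
    # /var/exalogic/info/em-context.info on dom0 (as root) instead.
    sample = "ExalogicID=Oracle Exalogic X2-2 AK00018758\n"
    rack_id = exalogic_id_from_em_context(sample)
    print("\n".join(ovm_shell_commands(rack_id)))
```

The printed lines are only text to paste; the helper does not connect to OVMM itself, so you still start ovm_shell.sh and exit with Ctrl+D as described.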

Now deploy the Oracle Virtualization plugin and add the OVMM target:-

  1. Deploy the Oracle Virtualization plugin to the OMS server
  2. Deploy the Oracle Virtualization plugin to the agent running on the OVMM server.
  3. Run the add target wizard for the Oracle VM Manager.
    1. Setup --> Add Target --> Add Targets Manually
    2. Select "Add Non-Host Targets by Specifying Target Monitoring Properties", choose the target type "Oracle VM Manager" and select the monitoring agent on the OVMM server host.
    3. Enter the details on the wizard page (example shown below)
    4. Submit the job and wait a few minutes for the discovery to progress. The target can then be viewed under Systems or All Targets.

    Running the discovery wizard for the Exalogic Virtualised Infrastructure

Deploying the Engineered System Healthchecks

Both Exalogic and Exadata have a healthcheck script that can be run, exachk. For Exalogic the script can be downloaded from My Oracle Support; when run against an Exalogic rack it checks the configuration of the rack and creates output files detailing any issues found. To integrate with Enterprise Manager, exachk must also write its output as XML files that the EM12c plugin can parse and pass to the OMS server for display on screen. To enable this, set an environment variable before running the exachk script: export RAT_COPY_EM_XML_FILES=1. You can also set RAT_OUTPUT=<output directory> to direct the output to a specific location. (By default the output is put into the directory from which exachk is run.)
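As a minimal sketch, assuming Python is available on the vServer where exachk runs, the two environment variables above can be set by a small wrapper rather than exported by hand. Only RAT_COPY_EM_XML_FILES and RAT_OUTPUT come from this post; the function names and the exachk path are hypothetical. No exachk command-line options are passed, so the script may still prompt interactively.

```python
import os
import subprocess
from typing import Optional


def exachk_env(output_dir: Optional[str] = None,
               base: Optional[dict] = None) -> dict:
    """Return a copy of the environment with EM XML output enabled for exachk."""
    env = dict(os.environ if base is None else base)
    env["RAT_COPY_EM_XML_FILES"] = "1"  # ask exachk for XML the EM12c plugin can parse
    if output_dir is not None:
        env["RAT_OUTPUT"] = output_dir  # otherwise output goes to the run directory
    return env


def run_exachk(exachk_path: str, output_dir: Optional[str] = None) -> int:
    """Run exachk (hypothetical path) with the EM-friendly environment.

    The working directory is set to the script's own directory, so without
    RAT_OUTPUT the output lands next to the script. Returns the exit code.
    """
    result = subprocess.run(
        [exachk_path],
        env=exachk_env(output_dir),
        cwd=os.path.dirname(exachk_path) or None,
    )
    return result.returncode
```

For example, run_exachk("/u01/exachk/exachk", "/u01/exachk/output") would write the XML to the directory you later give to the healthcheck target wizard (both paths are placeholders).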

The recommendation for a virtual Exalogic is to run the exachk utility on the EMOC vServer.

To install the plugin, ensure that the "Oracle Engineered System Healthchecks" plugin is downloaded and deployed to both the OMS server and the agent on the EMOC server. Then create the target using the same mechanism as for OVMM. The healthcheck wizard simply asks for the directory on the server from which the output will be read and how often to check for new exachk output (the default is 31 days). Then set up the EMOC server to run exachk on a regular basis. The output becomes available via the EM12c console, so it can be shared with specific users who may not have access to the rack itself.
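When scheduling exachk on the EMOC vServer, a simple guard is to rerun it only when the output directory has nothing newer than the wizard's check interval. The sketch below is a hypothetical helper, not part of EM12c: it assumes the EM-readable output is a set of .xml files somewhere under the directory given to the wizard and uses file modification times to judge their age.

```python
import os
import time
from typing import Optional

DEFAULT_MAX_AGE_DAYS = 31  # mirrors the wizard's default check frequency


def newest_xml_mtime(directory: str) -> Optional[float]:
    """Return the mtime of the newest .xml file under directory, or None."""
    newest = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.lower().endswith(".xml"):
                mtime = os.path.getmtime(os.path.join(root, name))
                if newest is None or mtime > newest:
                    newest = mtime
    return newest


def exachk_output_is_stale(directory: str,
                           max_age_days: float = DEFAULT_MAX_AGE_DAYS,
                           now: Optional[float] = None) -> bool:
    """True if no XML output exists or the newest file exceeds max_age_days."""
    newest = newest_xml_mtime(directory)
    if newest is None:
        return True
    now = time.time() if now is None else now
    return (now - newest) > max_age_days * 86400
```

A nightly cron job could call exachk_output_is_stale() and only launch exachk when it returns True, keeping the output no older than the interval the plugin checks.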