Introduction
Until recently, running Solaris Zones on the older Exalogic release (Solaris 11 Express) was quite possible, but with a significant limitation: for a supported configuration the zone had to be located on the local SSD drive of the Exalogic compute node. Because of the limited size of these disks there was effectively a limit on the number and size of the zones that could be created on each compute node. With the recent Exalogic release supporting Solaris 11, and some further development from the Engineering teams, it is now possible to run Solaris Zones on the ZFS appliance using the iSCSI protocol.

Prerequisites
To get this working on an Exalogic you should image the rack to Solaris (Exalogic 2.0.4.0.0), upgrade the rack to the latest patch set (April 2013 PSU - My Oracle Support ID=1545364.1) and apply a specific patch (My Oracle Support ID=16514816) for "Zones on Shared Storage (ZOSS) over iSCSI".

Creating the LUNs on the ZFS Appliance
The first activity is to create the various iSCSI groups and initiators on the ZFS appliance so that the LUNs that will host the zones can be created. This is a fairly simple process that involves setting up a SAN (Storage Area Network) with iSCSI targets and initiators, which can then be linked to the LUN storage made available to the compute nodes. We will start by explaining what the various components we need to set up actually are:
Term | Description |
---|---|
Logical Unit | A term used to describe a component in a storage system. Uniquely numbered, this creates what is referred to as a Logical Unit Number, or LUN. The ZFS Appliance may contain many LUNs. These LUNs, when associated with one or more SCSI targets, form a unique SCSI device, a device that can be accessed by one or more SCSI initiators. |
Target | A target is an end-point that provides a service of processing SCSI commands and I/O requests from an initiator. A target, once configured, consists of zero or more logical units. |
Target Group | A set of targets. LUNs are exported over all the targets in one specific target group. |
Initiator | An application or production system end-point that is capable of initiating a SCSI session, sending SCSI commands and I/O requests. Initiators are also identified by unique addressing methods. |
Initiator Group | A set of initiators. When an initiator group is associated with a LUN, only initiators from that group may access the LUN. |
1. Create iSCSI Targets
To set things up on the ZFS appliance navigate to Configuration --> SAN and select iSCSI Targets, then click on the + sign beside the iSCSI Targets title to add a target. Having added a target it is possible to drag and drop it to the right of the screen into an iSCSI Target Group, either adding it to an existing group or creating a new one. (To drag and drop, hover the mouse over the target until a crossed pair of arrows appears, then click to pick up the target and drag it over to the groups.)

Setting up iSCSI Targets on the ZFS Storage Appliance BUI
2. Setup iSCSI Initiators
The setup for the iSCSI initiators and groups is similar in nature to the setup of the targets: click on the + symbol for the iSCSI Initiators, fill in the details, then drag and drop the initiator over to the initiator group to either create a new group or add it to an existing one. The only significant complication is that creating an iSCSI Initiator involves specifying an Initiator IQN. This is a unique identifier that relates to a specific host (the compute node that will mount a LUN). To find this identifier it is necessary to log onto each compute node in the Exalogic rack and run the iscsiadm list initiator-node command.

```
# iscsiadm list initiator-node
Initiator node name: iqn.1986-03.com.sun:01:e00000000000.51891a8b
Initiator node alias: el01cn01
        Login Parameters (Default/Configured):
                Header Digest: NONE/-
                Data Digest: NONE/-
                Max Connections: 65535/-
        Authentication Type: NONE
        RADIUS Server: NONE
        RADIUS Access: disabled
        Tunable Parameters (Default/Configured):
                Session Login Response Time: 60/-
                Maximum Connection Retry Time: 180/-
                Login Retry Time Interval: 60/-
        Configured Sessions: 1
```
So in the example above the Initiator IQN is:-
iqn.1986-03.com.sun:01:e00000000000.51891a8b
This is reflected in the ZFS BUI as shown for the first compute node in the list on the left.
ZFS Appliance iSCSI Initiators added and included in a group.
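Collecting the IQN from every compute node by hand is tedious, so it can help to pull just the IQN out of the command output. The helper below is a sketch, not from the original post; the awk pattern assumes the "Initiator node name:" line format shown above.

```shell
# Extract the Initiator IQN from `iscsiadm list initiator-node` output.
# (Sketch: assumes the "Initiator node name:" line format shown above.)
get_iqn() {
  awk -F': ' '/^Initiator node name:/ {print $2}'
}

# Demonstrated against captured output; on a compute node you would pipe
# the live command instead:  iscsiadm list initiator-node | get_iqn
echo 'Initiator node name: iqn.1986-03.com.sun:01:e00000000000.51891a8b' | get_iqn
```

Run over ssh against each node in turn, this gives the list of IQNs to paste into the BUI when creating the initiators.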
3. Create Storage Project and LUNS
The final step on the storage server side is to create your project and LUNs. The process to create the project and shares (LUNs in this case) is similar to the process for creating filesystems for use via NFS, as described in an earlier blog posting. In this case, rather than creating a Filesystem, you create a LUN.

Creating a LUN on the ZFS Storage Appliance
The LUN will now be available to be mounted on any of the compute nodes that are part of the Initiator Group.
Creating the Solaris Zone on the Shared Storage
We now have the storage prepared so that it can be mounted on the compute nodes. Our intention is to store the zone on the shared storage and set up an additional bonded network on the 10GbE Exalogic client network to which the zone will have exclusive access.

1. Ensure the disk (LUN) is visible to the node and ready for use.
The first step is to ensure that the LUN that was created on the storage device is available to the compute node and that the disk is formatted ready for use. Prior to checking for the disk it may be necessary to run the iscsiadm commands to register the shared storage as a supplier of LUNs. This should only need to be run once on each compute node, although we have found that when all zones are removed from a node it is necessary to re-run the discovery-address command to make the LUNs visible again.

```
# iscsiadm add discovery-address <IP of ZFSSA>
# iscsiadm modify discovery -t enable
# devfsadm -c iscsi
# echo | format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t600144F09C96CCA90000518CDEB10005d0 <SUN-ZFS Storage 7320-1.0-64.00GB>
          /scsi_vhci/disk@g600144f09c96cca90000518cdeb10005
       1. c0t600144F09C96CCA90000518CDF100006d0 <SUN-ZFS Storage 7320-1.0-64.00GB>
          /scsi_vhci/disk@g600144f09c96cca90000518cdf100006
       2. c0t600144F09C96CCA90000518CDFB60007d0 <SUN-ZFS Storage 7320-1.0-64.00GB>
          /scsi_vhci/disk@g600144f09c96cca90000518cdfb60007
       3. c0t600144F09C96CCA900005190BFC4000Ad0 <SUN-ZFS Storage 7320-1.0 cyl 8352 alt 2 hd 255 sec 63>
          /scsi_vhci/disk@g600144f09c96cca900005190bfc4000a
       4. c7t0d0 <LSI-MR9261-8i-2.12-28.87GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@0,0
Specify disk (enter its number):
```
Identifying the LUN on the compute node
The format command can pick out the LUN, which is presented to the Compute Node as a local disk. The value after /scsi_vhci/disk@g maps onto the GUID of the LUN that was created. This identifies that it is the disk c0t600144F09C96CCA900005190BFC4000Ad0 that is to be formatted and labelled.
```
# format -e c0t600144F09C96CCA900005190BFC4000Ad0
selecting c0t600144F09C96CCA900005190BFC4000Ad0
[disk formatted]

FORMAT MENU:
        ...
format> fdisk
No fdisk table exists. The default partition for the disk is:

  a 100% "SOLARIS System" partition

Type "y" to accept the default partition, otherwise type "n" to edit the
partition table. n
SELECT ONE OF THE FOLLOWING:
        ...
Enter Selection: 1
Select the partition type to create:
   1=SOLARIS2   2=UNIX     3=PCIXOS    4=Other      5=DOS12
   6=DOS16      7=DOSEXT   8=DOSBIG    9=DOS16LBA   A=x86 Boot
   B=Diagnostic C=FAT32    D=FAT32LBA  E=DOSEXTLBA  F=EFI (Protective)
   G=EFI_SYS    0=Exit? f
SELECT ONE...
        ...
6
format> label
[0] SMI Label
[1] EFI Label
Specify Label type[1]: 1
Ready to label disk, continue? y

format> quit
```
Running format again shows the disk as available and now sized as per the LUN.
```
# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t600144F09C96CCA90000518CDEB10005d0 <SUN-ZFS Storage 7320-1.0-64.00GB>
          /scsi_vhci/disk@g600144f09c96cca90000518cdeb10005
       1. c0t600144F09C96CCA90000518CDF100006d0 <SUN-ZFS Storage 7320-1.0-64.00GB>
          /scsi_vhci/disk@g600144f09c96cca90000518cdf100006
       2. c0t600144F09C96CCA90000518CDFB60007d0 <SUN-ZFS Storage 7320-1.0-64.00GB>
          /scsi_vhci/disk@g600144f09c96cca90000518cdfb60007
       3. c0t600144F09C96CCA900005190BFC4000Ad0 <SUN-ZFS Storage 7320-1.0-64.00GB>
          /scsi_vhci/disk@g600144f09c96cca900005190bfc4000a
       4. c7t0d0 <LSI-MR9261-8i-2.12-28.87GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@0,0
Specify disk (enter its number):
```
2. Setup the Networking for Client access. (10GbE network.)
The zone that is being set up will be given access to an exclusive IP network. What this means is that we need to create the appropriate VNICs on the global zone and hand control of these VNICs over to the zone to manage. An earlier blog posting discusses setting up the 10GbE network for Solaris running on an Exalogic, and this builds on that knowledge. All we need to perform on the global zone is the creation of the VNICs. To do this, first identify the physical links that relate to the Ethernet over InfiniBand devices that the switches present to the InfiniBand Host Channel Adapter, and hence as devices to the OS. Then use the two links (one for each physical port) to create the VNICs.
```
# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net6              Infiniband           up         32000  unknown   ibp1
net0              Ethernet             up         1000   full      igb0
net1              Ethernet             unknown    0      unknown   igb1
net3              Ethernet             unknown    0      unknown   igb3
net4              Ethernet             up         10     full      usbecm0
net8              Ethernet             up         10000  full      eoib1
net2              Ethernet             unknown    0      unknown   igb2
net5              Infiniband           up         32000  unknown   ibp0
net9              Ethernet             up         10000  full      eoib0
```

We create one VNIC on each EoIB link, in this case net8 and net9 from the listing above:

```
# dladm create-vnic -l net8 -v 1706 vnic2_1706
# dladm create-vnic -l net9 -v 1706 vnic3_1706
```
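When more VLANs or links are involved, the VNIC creation can be scripted. A minimal sketch, using echo as a dry run (drop it to actually run dladm); VLAN 1706 and the net8/net9 links are the example values from above.

```shell
# Create one VNIC per 10GbE EoIB link for a given VLAN tag.
# echo makes this a dry run -- remove it to run dladm for real.
vlan=1706
i=2
for link in net8 net9; do
  echo dladm create-vnic -l "$link" -v "$vlan" "vnic${i}_${vlan}"
  i=$((i + 1))
done
```

The naming convention (vnic2_1706, vnic3_1706) simply encodes the VNIC index and VLAN, matching the names used in the zone configuration later on.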
3. Create the Zone
We now have the prerequisites necessary to create our zone (a fairly simple example): the storage available via iSCSI and the VNICs we will hand in to the zone to use.

```
# zonecfg -z zone04
Use 'create' to begin configuring a new zone.
zonecfg:zone04> create
create: Using system default template 'SYSdefault'
zonecfg:zone04> set zonepath=/zones/zone04
zonecfg:zone04> add rootzpool
zonecfg:zone04:rootzpool> add storage iscsi://192.168.14.133/luname.naa.600144f09c96cca900005190bfc4000a
zonecfg:zone04:rootzpool> end
zonecfg:zone04> remove anet
zonecfg:zone04> add net
zonecfg:zone04:net> set physical=vnic2_1706
zonecfg:zone04:net> end
zonecfg:zone04> add net
zonecfg:zone04:net> set physical=vnic3_1706
zonecfg:zone04:net> end
zonecfg:zone04> verify
zonecfg:zone04> commit
zonecfg:zone04> info
zonename: zone04
zonepath: /zones/zone04
brand: solaris
autoboot: false
bootargs:
file-mac-profile:
pool:
limitpriv:
scheduling-class:
ip-type: exclusive
hostid:
fs-allowed:
net:
        address not specified
        allowed-address not specified
        configure-allowed-address: true
        physical: vnic2_1706
        defrouter not specified
net:
        address not specified
        allowed-address not specified
        configure-allowed-address: true
        physical: vnic3_1706
        defrouter not specified
rootzpool:
        storage: iscsi://192.168.14.133/luname.naa.600144f09c96cca900005190bfc4000a
zonecfg:zone04>
```
During this configuration process we use the default zone creation template, which includes a network for the net0 link (1GbE management) that we do not need in our zone, so we remove it as part of the configuration. The storage is defined using the URL for the LUN; this is the LUN GUID prefixed by iscsi://<IP Address of the Shared Storage>/luname.naa.
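The storage URI can be derived mechanically from the disk name that format reported. A sketch, assuming the c0t<GUID>d0 disk naming seen above; the IP address is the example ZFSSA address from this post.

```shell
# Build the zonecfg storage URI from the ZFSSA IP and the disk name
# reported by format: strip the c0t prefix and d0 suffix, lower-case
# the GUID, and prepend iscsi://<IP>/luname.naa.
zfssa_ip=192.168.14.133
disk=c0t600144F09C96CCA900005190BFC4000Ad0
guid=$(echo "${disk#c0t}" | sed 's/d0$//' | tr '[:upper:]' '[:lower:]')
echo "iscsi://${zfssa_ip}/luname.naa.${guid}"
```

This reproduces the URI used in the zonecfg session above, which avoids transcription errors when the GUID is 32 hex characters long.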
The next step is to install the zone and boot it up. Before attempting to do this ensure that you have a valid Solaris package repository configured on the global zone. The zone creation will use this repository to lay down the OS files for the zone.
```
# zoneadm -z zone04 install
Configured zone storage resource(s) from:
        iscsi://192.168.14.133/luname.naa.600144f09c96cca900005190bfc4000a
Created zone zpool: zone04_rpool
Progress being logged to /var/log/zones/zoneadm.20130513T104657Z.zone04.install
       Image: Preparing at /zones/zone04/root.

 AI Manifest: /tmp/manifest.xml.lPaGVo
  SC Profile: /usr/share/auto_install/sc_profiles/enable_sci.xml
    Zonename: zone04
Installation: Starting ...

              Creating IPS image
Startup linked: 1/1 done
              Installing packages from:
                  exa-family
                      origin:  http://localhost:1008/exa-family/acbd22da328c302a86fb9f23d43f5d10f13cf5a6/
                  solaris
                      origin:  http://install1/release/solaris/
DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                            185/185   34345/34345  229.7/229.7  10.6M/s

PHASE                                          ITEMS
Installing new actions                   48269/48269
Updating package state database                 Done
Updating image state                            Done
Creating fast lookup database                   Done
Installation: Succeeded

        Note: Man pages can be obtained by installing pkg:/system/manual

 done.

        Done: Installation completed in 81.509 seconds.

  Next Steps: Boot the zone, then log into the zone console (zlogin -C)
              to complete the configuration process.

Log saved in non-global zone as /zones/zone04/root/var/log/zones/zoneadm.20130513T104657Z.zone04.install
# zoneadm -z zone04 boot
```
The zone should boot up very quickly, and you can then zlogin to the zone to set up the networking. This involves using the VNICs given to the zone for exclusive control to create interfaces, bonding them together using the Solaris IPMP functionality and allocating an IP address. We found that we also had to set up the routing table to give a default route.
```
# zlogin zone04
[Connected to zone 'zone04' pts/7]
Oracle Corporation      SunOS 5.11      11.1    December 2012
root@zone04:~# dladm show-vnic
LINK                OVER         SPEED  MACADDRESS        MACADDRTYPE       VID
vnic2_1706          ?            10000  2:8:20:f5:83:fa   random            1706
vnic3_1706          ?            10000  2:8:20:fa:ab:98   random            1706
root@zone04:~# ipadm create-ip vnic2_1706
root@zone04:~# ipadm create-ip vnic3_1706
root@zone04:~# ipadm create-ipmp bond1
root@zone04:~# ipadm add-ipmp -i vnic2_1706 -i vnic3_1706 bond1
root@zone04:~# ipadm set-ifprop -p standby=on -m ip vnic3_1706
root@zone04:~# ipadm show-if
IFNAME     CLASS    STATE    ACTIVE OVER
lo0        loopback ok       yes    --
vnic2_1706 ip       ok       yes    --
vnic3_1706 ip       ok       no     --
bond1      ipmp     down     no     vnic2_1706 vnic3_1706
root@zone04:~# ipadm create-addr -T static -a local=138.3.51.2/22 bond1/v4
root@zone04:~# ipadm show-if
IFNAME     CLASS    STATE    ACTIVE OVER
lo0        loopback ok       yes    --
vnic2_1706 ip       ok       yes    --
vnic3_1706 ip       ok       no     --
bond1      ipmp     ok       yes    vnic2_1706 vnic3_1706
root@zone04:~# ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
bond1/v4          static   ok           138.3.51.2/22
lo0/v6            static   ok           ::1/128
root@zone04:~# netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
127.0.0.1            127.0.0.1            UH        2          0 lo0
138.3.48.0           138.3.51.2           U         2          0 bond1

Routing Table: IPv6
  Destination/Mask            Gateway                   Flags Ref   Use    If
--------------------------- --------------------------- ----- --- ------- -----
::1                         ::1                         UH      2       0 lo0
root@zone04:~# route -p add default 138.3.48.1
add net default: gateway 138.3.48.1
add persistent net default: gateway 138.3.48.1
root@zone04:~# netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              138.3.48.1           UG        1          0
127.0.0.1            127.0.0.1            UH        2          0 lo0
138.3.48.0           138.3.51.2           U         2          0 bond1

Routing Table: IPv6
  Destination/Mask            Gateway                   Flags Ref   Use    If
--------------------------- --------------------------- ----- --- ------- -----
::1                         ::1                         UH      2       0 lo0
root@zone04:~#
```
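As an aside, the 138.3.48.0 entry in the routing table is simply the network address implied by the /22 prefix on 138.3.51.2. The mask arithmetic can be checked in the shell:

```shell
# Compute the network address for 138.3.51.2/22 -- masking the third
# octet (51 & 252 = 48) gives the 138.3.48.0 network seen in netstat -rn.
ip=138.3.51.2
prefix=22
set -- $(echo "$ip" | tr '.' ' ')
n=$(( ($1 << 24 | $2 << 16 | $3 << 8 | $4) & (0xffffffff << (32 - prefix)) ))
echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
```

This is also why 138.3.48.1 is a plausible default gateway: it is the first usable address on that /22 network.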
Migrating the Zone from one host to another.
As a final activity we tried going through the process to see how simple it is to move the zone from one physical host to another. This approach works smoothly and allowed the zone to be moved in a matter of minutes, although it did have to be shut down during the process. (i.e. if you need 100% service availability then make sure you use a clustered software solution that enables continuous availability.) Firstly, on the compute node that originally hosts the zone, shut down and detach the zone, then export the configuration. We exported it to a filesystem on the ZFS storage that was mounted on both the original and target hosts (/u01/common/general). Alternatively the export could simply be scp'd between the nodes.
```
# zoneadm -z zone04 shutdown
# zoneadm -z zone04 detach
zoneadm: zone 'zone04': warning(s) occured during processing
URI: 'iscsi://192.168.14.133/luname.naa.600144f09c96cca900005190bfc4000a'
        Could not remove one or more iSCSI discovery addresses
        because logical unit is in use
Exported zone zpool: zone04_rpool
Unconfigured zone storage resource(s) from:
        iscsi://192.168.14.133/luname.naa.600144f09c96cca900005190bfc4000a
#
# mkdir -p /u01/common/general/zone04
# zonecfg -z zone04 export > /u01/common/general/zone04/zone04.cfg
```
Then on the new host we import the zone from the export created on the original host, attach the zone and boot it up.
```
# zonecfg -z zone04 -f /u01/common/general/zone04/zone04.cfg
# zoneadm -z zone04 attach
Configured zone storage resource(s) from:
        iscsi://192.168.14.133/luname.naa.600144f09c96cca900005190bfc4000a
Imported zone zpool: zone04_rpool
Progress being logged to /var/log/zones/zoneadm.20130513T135704Z.zone04.attach
    Installing: Using existing zone boot environment
      Zone BE root dataset: zone04_rpool/rpool/ROOT/solaris
                     Cache: Using /var/pkg/publisher.
  Updating non-global zone: Linking to image /.
Processing linked: 1/1 done
  Updating non-global zone: Auditing packages.
No updates necessary for this image.

  Updating non-global zone: Zone updated.
                    Result: Attach Succeeded.
Log saved in non-global zone as /zones/zone04/root/var/log/zones/zoneadm.20130513T135704Z.zone04.attach
# zoneadm -z zone04 boot
```
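The whole migration can be wrapped into a small script. The sketch below keeps everything as a dry run via echo; the shared config path assumes the /u01/common/general mount described above, and in practice the last three commands would be run on the target node (e.g. over ssh).

```shell
# Dry-run sketch of the zone migration. Drop the echo prefixes (and run
# the final three commands on the target compute node) to do it for real.
zone=zone04
cfg=/u01/common/general/${zone}/${zone}.cfg

# On the source node: stop the zone, detach it, export its configuration.
echo "zoneadm -z ${zone} shutdown"
echo "zoneadm -z ${zone} detach"
echo "zonecfg -z ${zone} export > ${cfg}"

# On the target node: recreate the configuration, attach and boot.
echo "zonecfg -z ${zone} -f ${cfg}"
echo "zoneadm -z ${zone} attach"
echo "zoneadm -z ${zone} boot"
```

Because the zone's root zpool lives on the iSCSI LUN, no data is copied during the move; only the small configuration export travels between hosts.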
The only issue we identified was that the process of detaching and attaching caused the zone to boot up with the system configuration wizard running. (Log onto the console to complete the wizard: # zlogin -C zone04.) This needs to be completed to allow the zone to boot fully.