Friday 10 May 2013

10GbE connections with Exalogic running Solaris 11.1

Networking changed significantly between Solaris 10 and Solaris 11. Full details of how to configure Solaris 11 networking can be found in the documentation; this is just a short "how to" post on setting up a 10GbE client connection on an Exalogic rack running Solaris.

Infiniband Switch Configuration

Firstly, some changes are needed on the Infiniband switch.  It is necessary to run the allowhostconfig command because, with Solaris 11, some of the VNIC and VLAN configuration is done on the compute node and pushed out to the Infiniband switches; running allowhostconfig sets the switch to permit this.  Then create a VLAN with ID -1 on each of the connectors to the 10GbE network (showvlan lists it as VLAN 0 in the output below).  Repeat the process to create the VLAN you want to use; in our example this is VLAN 1706.  Finally, create the VNICs on the IB switch as described in the Exalogic documentation, ensuring that the VNICs created are for VLAN 0.

#
# allowhostconfig
# createvlan 1A-ETH-1 -vlan -1 -pkey ffff
# createvlan 1A-ETH-1 -vlan 1706 -pkey ffff
# showvlan
  Connector/LAG  VLN   PKEY
  -------------  ---   ----
  1A-ETH-1        0    ffff
  1A-ETH-1        1706 ffff
# createvnic ......
#

Follow the process for identifying the GUIDs on the host and defining a MAC address in order to create each VNIC.  (Use ibstat on the host to identify the Infiniband GUID.)

Once all the VNICs for each compute node are created on the switch it will look something like this:-


# showvnics
ID  STATE     FLG IOA_GUID                NODE                             IID  MAC               VLN PKEY   GW
--- --------  --- ----------------------- -------------------------------- ---- ----------------- --- ----   --------
  1 WAIT-IOA    N 0021280001A18A10         EL-C  192.168.14.125            0000 A0:8A:10:50:00:25 NO  ffff   1A-ETH-1
  7 WAIT-IOA    N 0021280001CED42F         EL-C  192.168.14.123            0000 A0:D4:2F:50:00:23 NO  ffff   1A-ETH-1
 10 WAIT-IOA    N 0021280001CEC533         EL-C  192.168.14.119            0000 A0:C5:33:50:00:19 NO  ffff   1A-ETH-1
  0 WAIT-IOA    N 0021280001CEC644         EL-C  192.168.14.124            0000 A0:C6:44:50:00:24 NO  ffff   1A-ETH-1
  2 WAIT-IOA    N 0021280001CED348         EL-C  192.168.14.128            0000 A0:D3:48:50:00:28 NO  ffff   1A-ETH-1
  3 WAIT-IOA    N 0021280001CED44C         EL-C  192.168.14.127            0000 A0:D4:4C:50:00:27 NO  ffff   1A-ETH-1
 11 WAIT-IOA    N 0021280001CED45B         EL-C  192.168.14.120            0000 A0:D4:5B:50:00:20 NO  ffff   1A-ETH-1
  6 WAIT-IOA    N 0021280001CED368         EL-C  192.168.14.126            0000 A0:D3:68:50:00:26 NO  ffff   1A-ETH-1
  9 WAIT-IOA    N 0021280001CED373         EL-C  192.168.14.122            0000 A0:D3:73:50:00:22 NO  ffff   1A-ETH-1
  5 UP          N 0021280001CED37C         EL-C  192.168.14.129            0040 A0:D3:7C:50:00:29 NO  ffff   1A-ETH-1
  4 UP          N 0021280001CED384         EL-C  192.168.14.130            0040 A0:D3:84:50:00:30 NO  ffff   1A-ETH-1
 12 WAIT-IOA    N 0021280001CEC99B         EL-C  192.168.14.117            0000 A0:C9:9B:50:00:17 NO  ffff   1A-ETH-1
  8 WAIT-IOA    N 0021280001CEC6A7         EL-C  192.168.14.121            0000 A0:C6:A7:50:00:21 NO  ffff   1A-ETH-1
 13 WAIT-IOA    N 0021280001CED3EF         EL-C  192.168.14.118            0000 A0:D3:EF:50:00:18 NO  ffff   1A-ETH-1

That is, expect to see them in the WAIT-IOA state; a VNIC will not change to UP until the corresponding ipadm create-ip is run against its link on the Solaris host.
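
With a dozen or more VNICs it can be easier to count the states than to read the table.  The following is a minimal sketch rather than anything from the Exalogic documentation: it assumes the showvnics output has been saved to a file (the file name in the comment is hypothetical) and that a POSIX awk is available wherever you run it.

```shell
# Summarise VNIC states from saved `showvnics` output read on stdin.
# Data rows start with a numeric ID; the state is the second column.
vnic_state_summary() {
    awk '$1 ~ /^[0-9]+$/ { count[$2]++ }
         END { for (state in count) print state, count[state] }' | sort
}

# Example (hypothetical file name):
#   vnic_state_summary < showvnics.txt
```

Each output line is a state followed by the number of VNICs in that state, so a node that has not yet had create-ip run shows up in the WAIT-IOA count.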

Exalogic Solaris Host Configuration


The Exalogic I used to discover this information was set up for manual network configuration, which is how I would expect it always to be, but you can check the setup using the netadm list command.  Assuming manual configuration, the first step is to remove any hostname files in /etc that relate to the bond you wish to create, then reboot the node.  (The bond1 hostname files were created when the ECU was run for the 2.0.4.0.0 Exalogic release.)


# rm /etc/hostname.bond1
# rm /etc/hostname.eoib0
# rm /etc/hostname.eoib1
# reboot

Now we need the dladm and ipadm commands, which manipulate Solaris networking: dladm is for data link administration and ipadm is for IP administration.

First identify the data links that relate to the VNICs you created on the Infiniband switch, using the dladm show-phys command.  These are the links whose devices are named eoibN.  In the output below they are net8 and net9.  (On most of the nodes they were net7 and net8.)  You can also use the -m option to display the MAC addresses, which will correspond to the MAC addresses used on each Infiniband switch for that VNIC.


# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net6              Infiniband           up         32000  unknown   ibp1
net0              Ethernet             up         1000   full      igb0
net1              Ethernet             unknown    0      unknown   igb1
net3              Ethernet             unknown    0      unknown   igb3
net4              Ethernet             up         10     full      usbecm0
net8              Ethernet             up         10000  full      eoib1
net2              Ethernet             unknown    0      unknown   igb2
net5              Infiniband           up         32000  unknown   ibp0
net9              Ethernet             up         10000  full      eoib0
root@el2bcn01:~# dladm show-phys -m
LINK                SLOT     ADDRESS            INUSE CLIENT
net6                primary  unknown            no   --
net0                primary  0:21:28:d7:e9:44   yes  net0
net1                primary  0:21:28:d7:e9:45   no   --
net3                primary  0:21:28:d7:e9:47   no   --
net4                primary  2:21:28:57:47:17   yes  usbecm0
net8                primary  a0:c5:80:50:0:1    no   --
net2                primary  0:21:28:d7:e9:46   no   --
net5                primary  unknown            no   --
net9                primary  a0:c5:7f:50:0:1    no   --
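
Because the link numbers differ from node to node, it may help to pick the EoIB links out of the listing mechanically.  This is a small sketch, assuming a POSIX awk; the function name eoib_links is my own and it simply filters the default dladm show-phys output shown above.

```shell
# Print "<device> <link>" for every EoIB device found in
# `dladm show-phys` output read from stdin, sorted by device name.
eoib_links() {
    awk '$NF ~ /^eoib[0-9]+$/ { print $NF, $1 }' | sort
}

# Example, on the compute node:
#   dladm show-phys | eoib_links
```

For the node shown above this pairs eoib0 with net9 and eoib1 with net8, which is the mapping needed when naming the VNICs later.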

Now use ipadm to create the IPMP group and the IP interfaces for the two links you have identified.  The link names are the ones identified from the dladm show-phys command.

# ipadm create-ipmp bond1
# ipadm create-ip net8
# ipadm create-ip net9

Now create the VNICs, which in our case use a tagged VLAN, on top of the links identified earlier.  The dladm create-vnic command ties together the physical link and a VLAN and gives the resulting interface a name.  The dladm show-vnic command displays the VNICs that have been created.

Now create the IP interfaces for these VNICs using the ipadm create-ip command again, then set the standby property on one of them to make it the standby interface for the bond.  (In Solaris speak, IPMP ~= Linux bond.)  Finally add the two interfaces to the IPMP group we created earlier.

# dladm create-vnic -l net9 -v 1706 eoib0_1706
# dladm create-vnic -l net8 -v 1706 eoib1_1706
# dladm show-vnic
LINK                OVER         SPEED  MACADDRESS        MACADDRTYPE       VID
eoib1_1706          net8         10000  2:8:20:3:df:5f    random            1706
eoib0_1706          net9         10000  2:8:20:c9:8b:b2   random            1706
# ipadm create-ip eoib0_1706
# ipadm create-ip eoib1_1706
# ipadm set-ifprop -p standby=on -m ip eoib1_1706
# ipadm add-ipmp -i eoib0_1706 -i eoib1_1706 bond1

Then give the bonded interface an IP address.


# ipadm create-addr -T static -a local=10.100.44.68/22 bond1/v4
# ipadm show-if
IFNAME     CLASS    STATE    ACTIVE OVER
lo0        loopback ok       yes    --
net0       ip       ok       yes    --
net4       ip       ok       yes    --
net8       ip       down     no     --
net9       ip       down     no     --
bond0_0    ip       ok       yes    --
bond0_1    ip       ok       no     --
bond0      ipmp     ok       yes    bond0_0 bond0_1
bond1      ipmp     ok       yes    eoib1_1706 eoib0_1706
eoib1_1706 ip       ok       no     --
eoib0_1706 ip       ok       yes    --


# ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
net0/v4           static   ok           138.3.2.87/21
net4/v4           static   ok           169.254.182.77/24
bond0/v4          static   ok           192.168.14.101/24
bond1/v4          static   ok           138.3.48.35/22
bond1/v4a         static   ok           138.3.51.1/22
lo0/v6            static   ok           ::1/128
net0/v6           addrconf ok           fe80::221:28ff:fed7:e944/10

Now your Solaris 10GbE network connection using a tagged VLAN should be up and running.  Looking at the VNICs on the Infiniband switch, we can see that an additional VNIC now appears, whose MAC address matches that of the underlying interface of bond1.  You can further check the Infiniband IOA_GUID against the host channel adapter in Solaris by using either the dladm show-ib command or ibstat to output the GUIDs.

# showvnics | grep -e STATE -e ----- -e 101
ID  STATE     FLG IOA_GUID                NODE                             IID  MAC               VLN PKEY   GW
--- --------  --- ----------------------- -------------------------------- ---- ----------------- --- ----   --------
 18 UP          N 0021280001CEC57F         EL-C  192.168.14.101            0000 A0:C5:7F:50:00:01 NO  ffff   1A-ETH-1
 35 UP          H 0021280001CEC57F         EL-C  192.168.14.101            8001 02:08:20:C9:8B:B2 1706 ffff   1A-ETH-1


And on the compute node:

# ifconfig eoib0_1706
eoib0_1706: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 12
        inet 0.0.0.0 netmask ff000000
        groupname bond1
        ether 2:8:20:c9:8b:b2
root@el2bcn01:~# ifconfig eoib1_1706
eoib1_1706: flags=261000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY,INACTIVE,CoS> mtu 1500 index 11
        inet 0.0.0.0 netmask ff000000
        groupname bond1
        ether 2:8:20:3:df:5f
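
Note that Solaris prints MAC addresses without leading zeros and in lower case (2:8:20:c9:8b:b2), whereas the switch prints them zero-padded and in upper case (02:08:20:C9:8B:B2), so a direct grep will not match.  The function below is a sketch of one way to convert between the two; normalize_mac is a hypothetical helper name, and it assumes a POSIX awk (on Solaris that may mean nawk or /usr/xpg4/bin/awk rather than the default awk).

```shell
# Convert a Solaris-style MAC address (e.g. 2:8:20:c9:8b:b2) to the
# zero-padded, upper-case form used in the switch's showvnics output.
normalize_mac() {
    printf '%s\n' "$1" | awk -F: '{
        out = ""
        for (i = 1; i <= NF; i++) {
            octet = toupper($i)
            if (length(octet) == 1) octet = "0" octet
            out = out (i > 1 ? ":" : "") octet
        }
        print out
    }'
}

# Example: print the form to search for in the showvnics listing.
#   normalize_mac 2:8:20:c9:8b:b2
```

The converted string can then be searched for in the showvnics output on the switch.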


Note - Under Solaris 11 the VNICs we create are dynamic and appear on the Infiniband switches when the host OS starts up the VNIC.  Because we configure things with an IPMP group, the VNIC is only reported on the actively used switch.  Unlike in a physical Linux environment, these VNICs do not appear in the /conf/bx.conf file on the switch.

So in summary the commands to use on the Solaris 11.1 compute node are:-

# ipadm create-ipmp <BOND NAME>
# ipadm create-ip <link name of eoib0>
# ipadm create-ip <link name of eoib1>
# dladm create-vnic -l <link name of eoib0> -v <VLAN ID> <IF 1 NAME>
# dladm create-vnic -l <link name of eoib1> -v <VLAN ID> <IF 2 NAME>
# ipadm create-ip <IF 1 NAME>
# ipadm create-ip <IF 2 NAME>
# ipadm set-ifprop -p standby=on -m ip <IF 2 NAME>
# ipadm add-ipmp -i <IF 1 NAME> -i <IF 2 NAME> <BOND NAME>
# ipadm create-addr -T static -a local=<IPv4 ADDRESS>/<Netmask in CIDR format> <BOND NAME>/v4
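
If you have to repeat this on every compute node, the template can be filled in by a small script.  The following is a sketch only: print_bond_commands is a hypothetical helper, the eoib0_<VLAN>/eoib1_<VLAN> interface names are just the convention used in this post, and it deliberately prints the commands rather than running them so they can be reviewed first.

```shell
# Print (do not run) the command sequence for one bonded, VLAN-tagged
# 10GbE interface.
# Arguments: bond name, link name of eoib0, link name of eoib1,
#            VLAN ID, IPv4 address in CIDR form.
print_bond_commands() {
    bond=$1 link0=$2 link1=$3 vlan=$4 addr=$5
    if0="eoib0_${vlan}"
    if1="eoib1_${vlan}"
    cat <<EOF
ipadm create-ipmp ${bond}
ipadm create-ip ${link0}
ipadm create-ip ${link1}
dladm create-vnic -l ${link0} -v ${vlan} ${if0}
dladm create-vnic -l ${link1} -v ${vlan} ${if1}
ipadm create-ip ${if0}
ipadm create-ip ${if1}
ipadm set-ifprop -p standby=on -m ip ${if1}
ipadm add-ipmp -i ${if0} -i ${if1} ${bond}
ipadm create-addr -T static -a local=${addr} ${bond}/v4
EOF
}

# Example, using the values from the node shown earlier:
#   print_bond_commands bond1 net9 net8 1706 10.100.44.68/22
```

Printing instead of executing keeps the sketch safe to try anywhere; once the output looks right for a node it can be pasted into a root shell there.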
 

Many thanks to Steve Hall for working all this out!

2 comments:

  1. you can't even disable ipv6 in solaris 11!
    silly Oracle engineers hard at work ;-)

  2. who signed off on the idea of ipadm , dladm instead of using ifconfig? that engineer should be fired.Oracle sun solaris 11.1 is so over-engineered,it's a completely useless operating system..it's comical.
    they've screwed the Oracle enterprise manager with 12c cloud control, and now this!
    good job Larry Ellison! keep doing the America's Cup races.you're completely out of touch.
    Viva la Sybase/SAP , AIX , HPUX, Linux.
