Reconfiguring OI/Solaris 11 for full reboot instead of Fast Reboot

Buried deep in the repository is an entry named ‘boot-config.’ It controls how your system reboots when you type ‘reboot.’ Sounds simple, right? Good.

svcs(1) reports the service as follows:

jason@heimdall:/home/jason/smf% svcs boot-config
STATE          STIME    FMRI
online         Mar_23   svc:/system/boot-config:default

The rub for me was that my Intel SR2625 (S5520UR) based servers will not reboot properly with the default setting, which of course is “fast reboot.” Fast reboot basically allows Solaris to restart in place without resetting the motherboard and starting from scratch. This is both fast and efficient, if it works. The problem for me was that it just didn’t work. The systems would start to boot OI but then spit out some messages about 32-bit address space, and that’s where the joy stopped.

NOTICE: unsupported 32-bit IO address on pci-pci bridge

32-bit IO address not supported

The workaround for me was to tell OpenIndiana how to reboot to avoid this problem. It simply involves flipping the boolean property fastreboot_default from true to false. Below is the diff.


jason@heimdall:/home/jason/smf% gdiff boot-config.dist.smf boot-config.smf
43c43
< <propval name='fastreboot_default' type='boolean' value='true'/>
---
> <propval name='fastreboot_default' type='boolean' value='false'/>

This is a one-line change. Simply feed that back into the repo and your system will restart completely, resetting the BIOS and coming all the way back up. If this makes sense to you, feel free to stop reading. The balance of the article goes into the mechanics of how to update the repo.

First, dump the boot-config entry to a file.

root@heimdall:/home/jason/smf# svccfg export boot-config > /tmp/boot-config.smf

Now edit the aforementioned line. Then push the update back into the repo as follows:


# svccfg validate /tmp/boot-config.smf
# svccfg import /tmp/boot-config.smf

Voila! You are now set for a full reboot instead of fast reboot.
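
If you would rather script the edit than open an editor, something like the following could do it. This is only a sketch: disable_fastreboot is a made-up helper name, and it assumes the exported manifest contains the propval line exactly as it appears in the diff above.

```shell
# Hypothetical helper (not part of SMF): read an exported manifest on stdin
# and write a copy to stdout with fastreboot_default flipped from true to false.
# Only the line that names fastreboot_default is touched.
disable_fastreboot() {
    sed "/name='fastreboot_default'/s/value='true'/value='false'/"
}

# Example using the single line from the diff above.
printf "%s\n" "<propval name='fastreboot_default' type='boolean' value='true'/>" |
    disable_fastreboot
```

In practice you would run it as disable_fastreboot < /tmp/boot-config.smf > /tmp/boot-config.new and then import the new file in the same way as above.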


Los Hicimos! We Did It! We broke the bonds of Amazon (AWS, EC2)

Moving out of Amazon is no small feat.

After a couple of months preparation we finally moved out of Amazon Web Services ([AWS], aka. the Roach Motel) on December 12th, 2011. And not a moment too soon. In the end, after a trial run, we successfully moved out of AWS in less than three hours. This was no small feat considering we had to transfer and maintain synchronization of terabytes of data from a source on the other side of the country.

Our new installation is fantastic. It is an excellent blend of carrier class technology and commodity hardware. This delicate balance gives us fabulous data handling capabilities while maintaining very low operating costs.

Why is the platform so special?

  • It is built on OpenIndiana of course.
  • It leverages ZFS to the max.
  • DDRdrive X1s for logzilla — blazing fast synchronous write performance.
  • Aggregated gigE links utilizing jumbo frames everywhere.
  • Each LACP member connects to a separate physical switch within a virtual chassis. If a physical switch chassis fails, the servers connected to that physical chassis continue to operate on the other chassis at gigE speeds.
  • Full tilt on DRAM. Every slot is used with the maximum size.
  • Gobs of 15K RPM SAS disks per data storage device leveraging multiple Sanmina-SCI/Newisys NDS-2241 storage chassis per server.
  • Since we are on ZFS, we can hot swap the disks to SSDs when suitable enterprise grade devices become available.
  • There is more SSD based cache (L2ARC) per server than the size of the existing data set, so there is plenty of room to grow in read ops.
  • Obviously remote out of band management, KVM, SMASH interface, with all the bells and whistles.
  • Fully redundant power, everywhere.
  • Should a server fail, the disks owned by that server can be imported on a partner system (thank you SAS!), and a zone booted to continue operating the services provided by the down partner. Genius!

How to PXE Boot Systems on LACP (802.3ad) using Juniper Switches

The real trick here is that Juniper supports an option called ‘force-up.’ Since PXE images are generally too small and dumb, most operating systems are unable to leverage LACP during the boot process. Historically this meant the switch had to be reconfigured for plain Ethernet switching, then configured back to LACP once the OS was installed. The worst bit is the time lost coordinating between the neteng and sysadmin teams in a large organization (such as AOL). This is no longer an issue. I will illustrate how to avoid this problem.

To get started, we will first add some interfaces to a LAG. Assume the first interface on the server happens to be connected to ge-2/0/1, the second interface is on ge-0/0/1, and both live on vlan100 with 9000 byte jumbo frames. Given this, we use the following instructions for the Juniper:


configure
edit interfaces
set ge-0/0/1 ether-options 802.3ad ae2
set ge-2/0/1 ether-options 802.3ad ae2
set ge-2/0/1 ether-options 802.3ad lacp force-up

set ae2 mtu 9014
set ae2 aggregated-ether-options lacp active
set ae2 aggregated-ether-options lacp periodic fast
set ae2 unit 0 family ethernet-switching vlan members vlan100

As I mentioned before, the real trick here is ‘force-up.’ Force-up tells the switch to ignore the absence of LACPDUs and keep the link up. This is quite handy for PXE boot, but it carries some risk: the switch will also hold the LAG member up when the host is misconfigured. Generally, the convenience outweighs the risk.

Now, your system will PXE boot normally. Below is how to further configure your IllumOS derived system for LACP and jumbo frames. Assuming you have igb interfaces, it looks something like this:


# ifconfig igb0 unplumb
# ifconfig igb1 unplumb
# dladm create-aggr -l igb0 -l igb1 -L passive aggr0
# dladm set-linkprop -p mtu=9000 aggr0
# echo `hostname` > /etc/hostname.aggr0
# svcadm restart network/physical:default

[Technically there is no restart argument for network/physical, but it yields the desired action anyway.]
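
You may have noticed that the switch is set to an MTU of 9014 while the host is set to 9000. A likely explanation, and it is an assumption on my part rather than something the configs state, is that the switch counts the 14-byte Ethernet header in its interface MTU while the dladm mtu property counts only the payload. The arithmetic, as a tiny sketch with a made-up helper name:

```shell
# switch_mtu PAYLOAD: interface MTU to set on the switch for a given host MTU.
# Assumption: the switch MTU includes the 14-byte Ethernet header
# (6 destination + 6 source + 2 EtherType) and excludes the FCS.
switch_mtu() {
    echo $(( $1 + 14 ))
}

switch_mtu 9000
```

If your frames carry a VLAN tag on the wire you may need a few more bytes, so check what your switch expects.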

Now, let’s verify our handiwork…

# dladm show-aggr -L
LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0       igb0         yes          yes  yes  yes  no        no
--          igb1         yes          yes  yes  yes  no        no

Success! Now you are off to the races…

PS – In my example here, I use Juniper EX4200 switches. My ethernet ports from the servers connect to two different physical chassis within a single virtual switch. This assures that even if a physical chassis fails, the server will continue to operate at gigE speeds on the surviving member link.

Give me a shout out if this was helpful to you….


How to configure Time-Slider/autosnap without using the GUI

This is a straightforward process with a twist. The twist is setting a non-obvious ZFS property. Here are the highlights:

  1. Configure Snapshot Properties in SMF repo
  2. Set super secret ZFS property to enable snapshots
  3. Enable related services
  4. Voila!

Okay, here we go. First, let’s dump the manifest and adjust how many one hour snapshots we are going to hold.

# svccfg export auto-snapshot > /tmp/auto-snapshot.smf
# vi /tmp/auto-snapshot.smf

... locate this stanza for the hourly ...

<instance name='hourly' enabled='true'>
<property_group name='zfs' type='application'>
<propval name='interval' type='astring' value='hours'/>
<propval name='keep' type='astring' value='23'/>
<propval name='period' type='astring' value='1'/>
</property_group>
<property_group name='general' type='framework'>
<property name='action_authorization' type='astring'/>
<property name='value_authorization' type='astring'/>
</property_group>
</instance>

I want to hold three days of hourly snapshots, so I changed the value of ‘keep’ from 23 to 71. My stanza now looks like this:

<instance name='hourly' enabled='true'>
<property_group name='zfs' type='application'>
<propval name='interval' type='astring' value='hours'/>
<propval name='keep' type='astring' value='71'/>
<propval name='period' type='astring' value='1'/>
</property_group>
<property_group name='general' type='framework'>
<property name='action_authorization' type='astring'/>
<property name='value_authorization' type='astring'/>
</property_group>
</instance>
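
The 71 comes from simple arithmetic. Here is a sketch, assuming the pattern implied by the default (keep=23 for one day of hourly snapshots) carries over to longer periods. keep_for_days is a made-up helper name.

```shell
# keep_for_days DAYS: value of 'keep' for DAYS days of hourly snapshots.
# Assumption: follows the default's pattern, where keep=23 covers one day.
keep_for_days() {
    echo $(( $1 * 24 - 1 ))
}

keep_for_days 3
```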

Okay, great, we have a custom rule set. Now, let’s import it into the SMF repo.

# svccfg import /tmp/auto-snapshot.smf

With the repo updated, we need to set the super secret zfs property and then enable the services. The name of one of my pools is ‘data’ so to set the auto-snapshot property on data I execute this command:

# zfs set com.sun:auto-snapshot=true data

This setting propagates down, so snapshots will be created for every file system in the pool. If you delegate file systems to users with ‘zfs allow -u joe_user userprop data/some/filesystem’, users can control which file systems get snapshots by maintaining their own com.sun:auto-snapshot properties.

Now, let’s review the services and the auto-snap property.

root@db012:~# svcs -a |egrep "auto-snap|slider"
disabled Nov_21 svc:/application/time-slider/plugin:rsync
disabled Nov_21 svc:/application/time-slider/plugin:zfs-send
disabled Nov_21 svc:/system/filesystem/zfs/auto-snapshot:daily
disabled Nov_21 svc:/system/filesystem/zfs/auto-snapshot:frequent
disabled Nov_21 svc:/system/filesystem/zfs/auto-snapshot:hourly
disabled Nov_21 svc:/system/filesystem/zfs/auto-snapshot:monthly
disabled Nov_21 svc:/system/filesystem/zfs/auto-snapshot:weekly
disabled 20:55:28 svc:/application/time-slider:default

root@db012:~# zfs get com.sun:auto-snapshot data
NAME PROPERTY VALUE SOURCE
data com.sun:auto-snapshot - -

So, all the services are off and the property isn’t set. Let’s fix that up now.

# zfs set com.sun:auto-snapshot=true data
# svcadm enable auto-snapshot:hourly
# svcadm enable auto-snapshot:frequent
# svcadm enable time-slider

Now, let’s go check on our handiwork…

root@db012:/tmp# zfs get -r creation data/zones |grep @zfs-auto-snap
data/zones@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard0012a.apsalar.com@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard0012a.apsalar.com/local@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard0012a.apsalar.com/mysql@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard009b.apsalar.com@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard009b.apsalar.com/ROOT@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard009b.apsalar.com/ROOT/zbe@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard009b.apsalar.com/apsalar@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard009b.apsalar.com/local@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard009b.apsalar.com/postgres@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard012a.apsalar.com@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard012a.apsalar.com/ROOT@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard012a.apsalar.com/ROOT/zbe@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard012a.apsalar.com/apsalar@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard012a.apsalar.com/local@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard012a.apsalar.com/mysql@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -
data/zones/shard012a.apsalar.com/postgres@zfs-auto-snap_daily-2011-12-19-21h05 creation Mon Dec 19 21:05 2011 -

Voila! Los Hicimos

If for some reason you didn’t get hourly snapshots immediately, try restarting time-slider.


# svcadm restart time-slider


How to check the size of the ZFS ARC and L2ARC

 

ZFS Fun Fact

The ZFS ARC is the Adaptive Replacement Cache. There is a simple way to check the size of the cache on any OpenIndiana, Solaris, OpenSolaris, or Nexenta system.

Simply run the following command as root or equivalent:

 

root@caprica:~# echo ::memstat |mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     696132              2719   33%
ZFS File Data              579365              2263   28%
Anon                       179254               700    9%
Exec and libs                2473                 9    0%
Page cache                  11316                44    1%
Free (cachelist)             7529                29    0%
Free (freelist)            618755              2417   30%

Total                     2094824              8182
Physical                  2094823              8182
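
The MB column is just the page count multiplied by the page size. A quick check, assuming the 4096-byte page size of x86 systems (the output itself does not state the page size), with a made-up helper name:

```shell
# pages_to_mb PAGES: convert a page count from ::memstat into whole megabytes.
# Assumption: 4096-byte pages, as on x86.
pages_to_mb() {
    echo $(( $1 * 4096 / 1048576 ))
}

# The "ZFS File Data" row from the output above.
pages_to_mb 579365
```

Under that assumption the result matches the 2263 MB shown for ZFS File Data.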

To see the utilization on your L2ARC devices simply use zpool iostat with the -v switch

root@caprica:/# zpool iostat -v backup
               capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
backup       9.33T  5.17T    539    885  5.35M  8.74M
  raidz1     4.67T  2.58T    269    442  2.68M  4.37M
    c8t0d0       -      -    197     31  1.20M  1.49M
    c8t1d0       -      -    195     29  1.20M  1.49M
    c8t2d0       -      -    196     31  1.20M  1.49M
    c8t3d0       -      -    195     29  1.20M  1.49M
  raidz1     4.67T  2.58T    269    442  2.68M  4.37M
    c8t16d0      -      -    196     30  1.20M  1.49M
    c8t17d0      -      -    195     29  1.20M  1.49M
    c8t18d0      -      -    196     31  1.20M  1.49M
    c8t19d0      -      -    195     29  1.20M  1.49M
cache            -      -      -      -      -      -
  c6t1d0     59.6G     8M      4      1  58.0K   117K
  c6t2d0     37.3G     8M      0      0  13.5K  54.8K
-----------  -----  -----  -----  -----  -----  -----


You can see that the two cache devices for my backup pool hold nearly 100GB of metadata. Amazing.
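
The “nearly 100GB” figure is the sum of the alloc column for the two cache devices. Here is a small sketch that adds it up; l2arc_alloc_total is a made-up helper, and it assumes every alloc value it is fed is expressed in G, as in the output above.

```shell
# l2arc_alloc_total: read cache-device lines from `zpool iostat -v` on stdin
# (device name in column 1, alloc in column 2) and print the summed alloc.
# Assumption: every alloc value carries a G suffix.
l2arc_alloc_total() {
    awk '{ sub(/G$/, "", $2); total += $2 } END { printf "%.1fG\n", total }'
}

# The two cache devices from the output above.
printf '%s\n' \
    'c6t1d0 59.6G 8M 4 1 58.0K 117K' \
    'c6t2d0 37.3G 8M 0 0 13.5K 54.8K' |
    l2arc_alloc_total
```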

As a follow up, to check the type of data held in your cache you can use the following zfs get command


root@caprica:/# zfs get secondarycache backup
NAME PROPERTY VALUE SOURCE
backup secondarycache metadata local

Here you see that the L2ARC for the backup pool only holds metadata, while the data pool outlined below caches whole blocks.


root@caprica:/# zfs get secondarycache data
NAME PROPERTY VALUE SOURCE
data secondarycache all default


Creating USB boot media on OpenIndiana or Solaris for OpenIndiana

Today I was rather irritated with the USB build instructions for OpenIndiana. Someone had taken the time to write excellent directions for the Mac but didn’t write anything for Solaris, OpenSolaris, or OpenIndiana.

I thought this might be useful for some who wanted to give OpenIndiana a whirl without having to be an expert at drive naming conventions. Makes sense, right?

First, you have to identify your USB device. This can easily be done with ‘iostat -En’

Here is an example from my home file server.

root@caprica:~# iostat -En
c3d0             Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ELITE PRO CF CA Revision:  Serial No:     5B9A102B0   Size: 15.27GB <15267151872 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c0t0d0           Soft Errors: 0 Hard Errors: 68 Transport Errors: 0
Vendor: PepperC  Product: Virtual Disc 1   Revision: 0.01 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 68 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
c4t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31500541AS     Revision: CC34 Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3026 Predictive Failure Analysis: 0
c4t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3024 Predictive Failure Analysis: 0
c4t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3024 Predictive Failure Analysis: 0
c4t3d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3024 Predictive Failure Analysis: 0
c4t4d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3024 Predictive Failure Analysis: 0
c4t5d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3024 Predictive Failure Analysis: 0
c1t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3024 Predictive Failure Analysis: 0
c1t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3024 Predictive Failure Analysis: 0
c1t3d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3024 Predictive Failure Analysis: 0
c1t4d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3024 Predictive Failure Analysis: 0
c1t5d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3024 Predictive Failure Analysis: 0
c6t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: INTEL SSDSA2M040 Revision: 02HD Serial No:
Size: 40.02GB <40020664320 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
c6t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: SAMSUNG SSD RBX  Revision: 5D15 Serial No:
Size: 64.02GB <64023257088 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 17 Predictive Failure Analysis: 0
c6t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: INTEL SSDSA2M040 Revision: 02HB Serial No:
Size: 40.02GB <40020664320 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c5t8d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: SAMSUNG HD501LJ  Revision: 0-13 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 18 Predictive Failure Analysis: 0
c5t9d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: SAMSUNG HD501LJ  Revision: 0-13 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 18 Predictive Failure Analysis: 0
c5t10d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: SAMSUNG HD501LJ  Revision: 0-13 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 18 Predictive Failure Analysis: 0
c5t11d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: SAMSUNG HD501LJ  Revision: 0-13 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 18 Predictive Failure Analysis: 0
c5t12d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: SAMSUNG HD501LJ  Revision: 0-13 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 18 Predictive Failure Analysis: 0
c5t13d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: SAMSUNG HD501LJ  Revision: 0-13 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 18 Predictive Failure Analysis: 0
c5t14d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: SAMSUNG HD501LJ  Revision: 0-13 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 18 Predictive Failure Analysis: 0
c1t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3024 Predictive Failure Analysis: 0
c5t15d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: SAMSUNG HD501LJ  Revision: 0-13 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 18 Predictive Failure Analysis: 0
c8t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST2000DL003-9VT1 Revision: CC32 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 30 Predictive Failure Analysis: 0
c8t16d0          Soft Errors: 0 Hard Errors: 1 Transport Errors: 6
Vendor: ATA      Product: ST2000DL003-9VT1 Revision: CC32 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 30 Predictive Failure Analysis: 0
c8t17d0          Soft Errors: 0 Hard Errors: 1 Transport Errors: 10
Vendor: ATA      Product: ST2000DL003-9VT1 Revision: CC32 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 30 Predictive Failure Analysis: 0
c8t18d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST2000DL003-9VT1 Revision: CC32 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
c8t19d0          Soft Errors: 0 Hard Errors: 1 Transport Errors: 10
Vendor: ATA      Product: ST2000DL003-9VT1 Revision: CC32 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 30 Predictive Failure Analysis: 0
c8t1d0           Soft Errors: 0 Hard Errors: 1 Transport Errors: 10
Vendor: ATA      Product: ST2000DL003-9VT1 Revision: CC32 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 30 Predictive Failure Analysis: 0
c8t2d0           Soft Errors: 0 Hard Errors: 1 Transport Errors: 10
Vendor: ATA      Product: ST2000DL003-9VT1 Revision: CC32 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 30 Predictive Failure Analysis: 0
c8t3d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST2000DL003-9VT1 Revision: CC32 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
c9t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: PNY      Product: USB 2.0 FD       Revision: PMAP Serial No:
Size: 2.02GB <2017460224 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

There we go, all the way at the bottom we find the device name as c9t0d0. We want to make sure we get this right because we don’t want to stomp all over another hard drive in the system.

Since this is a writeable dd image, we’ll use dd to write it. Take note of the device name used. It is not c9t0d0s0 like a regular disk, nor is it c9t0d0 like you would pass into ZFS. This is a USB device so it gets a funny little p0 at the end. The path to the device to use in this case is /dev/rdsk/c9t0d0p0 — note that you must use a raw device (rdsk).

root@caprica:/data/OS-images/oi# dd if=oi-dev-151a-x86.usb of=/dev/rdsk/c9t0d0p0 bs=1024k
0+9825 records in
0+9825 records out
1106312704 bytes (1.1 GB) copied, 293.49 s, 3.8 MB/s
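
If you don’t feel like eyeballing that long iostat listing, the lookup and the p0 path construction can be sketched as a filter. usb_raw_paths is a made-up helper, and matching on “USB” in the Product field is an assumption that happens to hold for my PNY stick. Always double check the result before handing it to dd.

```shell
# usb_raw_paths: read `iostat -En` output on stdin and print the raw
# whole-device path (/dev/rdsk/<name>p0) for each device whose Product
# field mentions USB.
usb_raw_paths() {
    awk '
        /Soft Errors:/   { dev = $1 }                       # first line of each device record
        /Product:.*USB/  { print "/dev/rdsk/" dev "p0" }    # USB device found
    '
}

# Example with two records trimmed from the listing above.
printf '%s\n' \
    'c8t3d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0' \
    'Vendor: ATA      Product: ST2000DL003-9VT1 Revision: CC32 Serial No:' \
    'c9t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0' \
    'Vendor: PNY      Product: USB 2.0 FD       Revision: PMAP Serial No:' |
    usb_raw_paths
```

On a live system you would run it as iostat -En | usb_raw_paths.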

Voila.


Creating automated installer for OpenIndiana 151

After a brief consultation with Joshua Clulow via the OI discussion mailing list, I was able to build an automated installer (AI) image using Clulow’s XML config file. Here is how I did it.

  1. pkg install install/distribution-constructor
  2. curl -k https://raw.github.com/gist/1263061/640f5dbb377bb3a5989a66e95f3b1ec04da88408/ai_x86_image_JMC_151a.xml > ai_x86_image_JMC_151a.xml
  3. distro_const build ai_x86_image_JMC_151a.xml


/usr/share/distro_const/DC-manifest.defval.xml validates
/tmp/ai_x86_image_JMC_151a_temp_2978.xml validates
Simple Log: /rpool/dc/logs/simple-log-2011-10-12-10-57-04
Detail Log: /rpool/dc/logs/detail-log-2011-10-12-10-57-04
Build started Wed Oct 12 10:57:04 2011
Distribution name: OpenIndiana_AI_X86_151a
Build Area dataset: rpool/dc
Build Area mount point: /rpool/dc
==== im-pop: Image area creation
Initializing the IPS package image area: /rpool/dc/build_data/pkg_image
Setting preferred publisher: openindiana.org
Origin repository: http://pkg.openindiana.org/dev
Verifying the contents of the IPS repository
Installing the designated packages

Uninstalling the designated packages
Setting post-install preferred publisher: openindiana.org
Origin repository: http://pkg.openindiana.org/dev
Setting post-install alternate publisher: opensolaris.org
Origin repository: http://pkg.openindiana.org/legacy
==== im-mod: Image area modifications
==== ai-im-mod: Auto Install Image area modifications
128 blocks
==== ba-init: Boot archive initialization
252736 blocks
2816 blocks
35552 blocks
0 blocks
86288 blocks
4752 blocks
0 blocks
0 blocks
32 blocks
176 blocks
15808 blocks
64 blocks
0 blocks
48 blocks
0 blocks
0 blocks
==== ba-config: Boot archive configuration
/usr/share/distro_const/boot_archive_configure[80]: -p: not found [No such file or directory]
/usr/share/distro_const/boot_archive_configure[80]: -p: not found [No such file or directory]
==== ai-ba-config: Auto Install boot archive configuration
==== ba-arch: Boot archive archiving (64-bit)
454816 blocks
331040 blocks
==== ba-arch-32: Boot archive archiving (32-bit)
454816 blocks
253952 blocks
==== post-mod-custom: Post boot archive image area custom modification
==== grub-setup: Grub menu setup
==== post-mod: Post boot archive image area modification
Warning: creating filesystem that does not conform to ISO-9660.
Warning: creating filesystem that does not conform to ISO-9660.
==== ai-publish-pkg: Publish Package
==== iso: ISO image creation
Warning: creating filesystem that does not conform to ISO-9660.
Setting input-charset to 'UTF-8' from locale.
Size of boot image is 4 sectors -> No emulation
2.90% done, estimate finish Wed Oct 12 12:02:17 2011
5.79% done, estimate finish Wed Oct 12 12:02:17 2011
8.69% done, estimate finish Wed Oct 12 12:02:17 2011
11.58% done, estimate finish Wed Oct 12 12:02:17 2011
14.48% done, estimate finish Wed Oct 12 12:02:17 2011
17.37% done, estimate finish Wed Oct 12 12:02:17 2011
20.27% done, estimate finish Wed Oct 12 12:02:17 2011
23.16% done, estimate finish Wed Oct 12 12:02:17 2011
26.06% done, estimate finish Wed Oct 12 12:02:17 2011
28.95% done, estimate finish Wed Oct 12 12:02:17 2011
31.85% done, estimate finish Wed Oct 12 12:02:17 2011
34.74% done, estimate finish Wed Oct 12 12:02:17 2011
37.64% done, estimate finish Wed Oct 12 12:02:17 2011
40.53% done, estimate finish Wed Oct 12 12:02:17 2011
43.43% done, estimate finish Wed Oct 12 12:02:17 2011
46.32% done, estimate finish Wed Oct 12 12:02:17 2011
49.22% done, estimate finish Wed Oct 12 12:02:17 2011
52.11% done, estimate finish Wed Oct 12 12:02:17 2011
55.00% done, estimate finish Wed Oct 12 12:02:17 2011
57.90% done, estimate finish Wed Oct 12 12:02:17 2011
60.79% done, estimate finish Wed Oct 12 12:02:17 2011
63.69% done, estimate finish Wed Oct 12 12:02:17 2011
66.58% done, estimate finish Wed Oct 12 12:02:17 2011
69.48% done, estimate finish Wed Oct 12 12:02:17 2011
72.37% done, estimate finish Wed Oct 12 12:02:17 2011
75.27% done, estimate finish Wed Oct 12 12:02:17 2011
78.16% done, estimate finish Wed Oct 12 12:02:17 2011
81.06% done, estimate finish Wed Oct 12 12:02:17 2011
83.95% done, estimate finish Wed Oct 12 12:02:17 2011
86.84% done, estimate finish Wed Oct 12 12:02:17 2011
89.74% done, estimate finish Wed Oct 12 12:02:18 2011
92.63% done, estimate finish Wed Oct 12 12:02:18 2011
95.53% done, estimate finish Wed Oct 12 12:02:18 2011
98.42% done, estimate finish Wed Oct 12 12:02:18 2011
Total translation table size: 2048
Total rockridge attributes bytes: 38135
Total directory bytes: 264192
Path table size(bytes): 1710
Max brk space used 86000
172733 extents written (337 MB)
==== usb: USB image creation
/dev/rlofi/2: 828600 sectors in 1381 cylinders of 1 tracks, 600 sectors
404.6MB in 87 cyl groups (16 c/g, 4.69MB/g, 2240 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 9632, 19232, 28832, 38432, 48032, 57632, 67232, 76832, 86432,
739232, 748832, 758432, 768032, 777632, 787232, 796832, 806432, 816032, 825632
690416 blocks
Build completed Wed Oct 12 12:03:30 2011
Build is successful.


OpenIndiana doesn’t recognize system_locale in sysidcfg when creating zones

Using OpenIndiana 151, I found that it blatantly ignores my sysidcfg config files, complaining that system_locale isn’t valid.

The workaround is simple. Remove system_locale from sysidcfg altogether. The system will happily create the zone and skip kdmconfig and the dreaded manual process, leaving you with a completely hands-off installation.

Here is what my sysidcfg looks like, for reference:

name_service=DNS
{domain_name=domain.com
name_server=10.0.0.28}
nfs4_domain=dynamic
timezone=US/Pacific
terminal=vt100
root_password=APnJT41dKR/.n
security_policy=NONE
network_interface=PRIMARY {hostname=host.domain.com}
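
If you have a pile of existing sysidcfg files that still carry the entry, a filter along these lines drops it. This is a sketch only: strip_system_locale is a made-up helper and the locale value in the example is just a placeholder.

```shell
# strip_system_locale: read a sysidcfg on stdin and write it to stdout
# without any system_locale line. All other lines pass through untouched.
strip_system_locale() {
    sed '/^[[:space:]]*system_locale=/d'
}

# Example with a placeholder locale value.
printf '%s\n' 'system_locale=C' 'timezone=US/Pacific' 'terminal=vt100' |
    strip_system_locale
```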

How to make Intel server boards shut the f* up

I recently took possession of some sample servers based on an Intel S5520UR motherboard as part of the SR2625URLX chassis SKU. I outfitted it with an RMM3 and when I powered the thing on it blew all of its fans at full speed. It wasn’t just noisy. It was loud. Too loud. Insanely loud.

There is no meaningful way to control the fan speed from within the BIOS. It may be possible to set the fan speed with IPMI, but what happens if you need the extra RPMs? You overheat, that’s what. Clearly that is not optimal.

In practical life servers will be in the data center so who cares how much noise they make, right? As true as that is, the engineering samples are in my ‘start up’ office. They literally drowned out everything else. It was frankly embarrassing.

Here are the lessons learned:

  1. The RMM3 and the M/B BIOS require matched firmware
  2. Download the latest firmware set from Intel. The firmware needs to be unzipped and copied to a USB thumb drive
  3. Read the README so you know what to do ;-)
  4. Boot the system. Select F6 for the boot menu
  5. Select EFI shell
  6. type: FS0:
  7. Order is important. Install the BMC firmware first
  8. Ignore any notifications that you need to reboot for changes to take effect until the end
  9. Install the RMM3 firmware next
  10. Install the SDR/FRU firmware update
  11. Answer all the questions.
  12. When it finishes and prompts you that you need to reboot the system for the changes to take effect, please reboot the system

Voila. You are done. You may want to visit the BIOS and set your altitude properly as that may make a minor difference.

Also, Intel nicely deletes all your BMC/RMM3 settings so if those were set you will need to reconfigure those devices in the BIOS or other tools (eg. IPMI).

I hope this will save you some time…

Cheers,
j.

UPDATE:

http://www.intel.com/support/motherboards/server/sb/CS-022809.htm

I also found a “platform confidence test” image for the S5520UR boards (and others). It is funny, I was searching high and low for “diagnostic” and could not find a single thing….


Using HPN OpenSSH to flee the roach motel (AWS)

I have a number of sharded databases in AWS (aka the roach motel). Since our experience has been one marked by instability, high variability amongst instances, and AWS related mishaps, we are eager to get out. One might think it is a simple matter of transferring your data to a real data center and switching DNS. Well, it is not quite that simple.

In a test to see how smoothly this might go I attempted to send snapshots of file systems from east-1b to a data center in “Silicon Valley.” The file system was running a live database and taking megabytes/second of writes. The zfs send operation on the other hand only managed to send about four megabits/second to California over OpenSSH using arcfour as a cipher. This was never going to work unless we shut down for several hours — an option that is simply not on the table.

The problem was clearly a death by distance problem. Some application and stack tuning was in order. So I started looking for folks who had looked at this problem before. Not surprisingly, the supercomputer centers had previously tackled this type of problem.

After giving a few solutions a whirl, I selected the HPN patches for OpenSSH. I built hpn-openssh, started the daemon on the target side, and used the HPN OpenSSH client on the sending side. I then jacked up my TCP receive buffer to an arbitrary figure of 49152000 with the command ndd -set /dev/tcp tcp_recv_hiwat 49152000. The result was a whopping 80 megabit/second transfer rate going across country. Truth be told, the performance was likely gated by the 10k RPM SAS disk receiving the data.
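
“Death by distance” is usually explained with the bandwidth-delay product: a sender can only keep bandwidth times round-trip time worth of bytes in flight, so the receive buffer has to be at least that large to fill the pipe. Here is a minimal sketch of the arithmetic. The 80 ms round-trip time is an assumed figure for a cross-country path, not something I measured on this transfer, and bdp_bytes is a made-up helper name.

```shell
# bdp_bytes MBIT_PER_SEC RTT_MS: bytes that must be in flight to fill the link.
bdp_bytes() {
    echo $(( $1 * 1000000 / 8 * $2 / 1000 ))
}

# 80 Mbit/s over an assumed 80 ms round trip.
bdp_bytes 80 80
```

Under those assumptions roughly 800000 bytes would be enough, so the 49152000 figure leaves plenty of headroom.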

Lesson #1 for escaping the roach motel: Use a tuned software stack to transfer your data out or you will never ever get out without disrupting your customer base.

After this experience, I feel the HPN patches should be integrated into the OpenSSH base as there is already a configuration option to disable the functionality within the patch set.

j.
