Adventures with ZFS, Part 1
RAID is scary nowadays. What used to be considered a "must-have" for fast storage with redundancy is now, with today's ever-larger drive densities, becoming a liability in the form of a ticking MTBF time bomb. Hardware RAID 5 in particular, which was an excellent solution in the days when 73GB enterprise hard drives were considered "large", now gives a very false impression of basic data safety with 1TB+ drives. The topology itself is the problem: after a disk failure, rebuilding the array means reading every surviving disk end to end, and that thrashing raises the likelihood of a second failure to statistically uncomfortable levels. Even a single unrecoverable read error during the rebuild can cost data, since a degraded RAID 5 array has no redundancy left to reconstruct that sector.
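To put a rough number on "statistically uncomfortable", here is a back-of-the-envelope sketch in Python. It assumes a spec-sheet unrecoverable read error (URE) rate of 1 per 10^14 bits, typical of consumer drives (enterprise drives are often rated at 1 per 10^15), treats errors as independent, and ignores whole-drive failures entirely, so it understates the real risk. The array sizes are illustrative, not taken from this build.

```python
import math


def rebuild_ure_probability(surviving_drives: int, drive_bytes: float,
                            ure_per_bit: float) -> float:
    """Probability of at least one unrecoverable read error while reading
    every surviving drive end to end during a RAID 5 rebuild.

    Simplifying assumption: independent errors at a constant per-bit rate.
    """
    bits_read = surviving_drives * drive_bytes * 8
    # P(no error) = (1 - p) ** bits; log1p keeps this accurate for tiny p.
    p_clean = math.exp(bits_read * math.log1p(-ure_per_bit))
    return 1.0 - p_clean


if __name__ == "__main__":
    # Illustrative 4-drive RAID 5: one drive has failed, three survive.
    for size_tb in (0.073, 1, 2):
        for rate in (1e-14, 1e-15):
            p = rebuild_ure_probability(3, size_tb * 1e12, rate)
            print(f"3 x {size_tb:>5} TB, URE 1 in {1 / rate:.0e} bits: {p:6.2%}")
```

Under those assumptions, three surviving 2TB drives rated at 1 in 10^14 come out at roughly a 38% chance of hitting a URE during the rebuild, versus under 2% for three 73GB drives.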
While searching for a properly state-of-the-art DIY storage solution for my company, I kept hearing about ZFS: the "magic silver bullet" copy-on-write file system that self-heals silent corruption with end-to-end checksumming, and manages disk devices directly to provide monolithic, controller-independent, high-speed software RAID. The one that eliminates the RAID 5/6 write hole. The one that supports snapshots and deduplication, mixes its RAID metaphors within the same volume, and writes in variable-length stripes. The one that brings them all, and in the darkness binds them… no, wait, that's a different movie.
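The self-healing idea is easier to see in miniature. The Python sketch below is a toy model, not ZFS's actual on-disk format: each block's checksum is stored apart from the data (in ZFS it lives in the parent block pointer), so a read can tell which copy of a mirrored block is bad, return the good copy, and rewrite the bad one. SHA-256 stands in for the checksum here (ZFS offers it as an option alongside its default fletcher checksums), and the class and method names are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class ToyMirror:
    """Hypothetical two-way mirror with checksums kept apart from the data.

    Illustrates ZFS-style self-healing reads; the real implementation
    differs substantially.
    """

    def __init__(self) -> None:
        self.disks = [{}, {}]   # one dict per "disk": block id -> bytes
        self.checksums = {}     # block id -> expected checksum (the "parent")

    def write(self, block_id: int, data: bytes) -> None:
        for disk in self.disks:
            disk[block_id] = data
        self.checksums[block_id] = checksum(data)

    def read(self, block_id: int) -> bytes:
        expected = self.checksums[block_id]
        good = None
        bad_disks = []
        for i, disk in enumerate(self.disks):
            data = disk.get(block_id)
            if data is not None and checksum(data) == expected:
                if good is None:
                    good = data
            else:
                bad_disks.append(i)   # missing or silently corrupted copy
        if good is None:
            raise OSError(f"block {block_id}: no copy matches its checksum")
        # Self-heal: overwrite every copy that failed verification.
        for i in bad_disks:
            self.disks[i][block_id] = good
        return good


if __name__ == "__main__":
    m = ToyMirror()
    m.write(7, b"important data")
    m.disks[0][7] = b"importent data"   # simulate silent corruption on disk 0
    print(m.read(7), m.disks[0][7])     # good data returned, disk 0 repaired
```

A plain RAID 1 controller reading the corrupted copy would have no way to know it was wrong; keeping the checksum outside the block itself is what lets the read path detect and repair it.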
Anyway, ZFS is supposed to be hot shit, and it has a large contingent of cheerleaders in various corners of the enterprise systems world. It was created at Sun by a team led by Jeff Bonwick, and the Sun storage engineers around it (including the Fishworks team, which built Sun's ZFS-based storage appliances) are generally considered legitimate badasses whose collective accomplishments are too lengthy to discuss here. ZFS was developed to be a key competitive advantage for Solaris in the enterprise storage space, and Sun was well on its way to critical mass in sysadmin mindshare, which would have put ZFS (and Solaris) into a great many environments where it might never have ventured otherwise. Unfortunately, after Oracle gobbled up Sun, it immediately went to work screwing with OpenSolaris and the community development model in ways it shouldn't have, which I won't get into here. To make a long story short, there's now a completely community-built version based on Illumos, called OpenIndiana. I decided to use OpenIndiana as the basis for a high-performance, low-cost NAS server, and to see how far I could push it in performance and stability on a budget.
The goal: build an enterprise-grade NAS server from up-to-date hardware that sustains at least 100 MB/s reads and writes across the network via NFS or SMB/CIFS (both, if possible).
The hardware:
- Norco RPC-2112 storage chassis, 12 SAS bays in 2U
- Three 4-drive SAS2 backplanes with SFF-8087 connectors
- Redundant Zippy 460W PSU
- Supermicro X9SCA-F Sandy Bridge motherboard, with IPMI & 6Gb/s SATA3 onboard
- Intel Xeon E3-1240, 3.3GHz, 4 cores / 8 threads
- 8GB ECC DDR3 (Kingston)
- LSI SAS 9211-8i, 8-port SAS2 6Gb/s HBA, two SFF-8087 ports
- (2) Crucial M4 (Micron C400) 64GB SSDs
- (7) Seagate Constellation ES 1TB (16MB cache) SAS2 drives
Altogether, it came to about $2800 including incidentals. It could have been built for a few hundred less by cutting a few corners, but for that price it's a very solidly spec'd entry-level server. It's been up and running for about three days now, and so far the hardware runs like a champ. I did have to move the LSI HBA to another system to cross-flash it from the shipped IR (integrated RAID) firmware to LSI's latest IT (initiator-target) firmware, which presents the disks to the OS as plain drives, the way ZFS wants them. The DOS-mode flash utility has a small incompatibility with the Sandy Bridge chipset, but the card flashed without a problem in the other chassis and then fired right up in the new server.
I mentioned OpenIndiana earlier. It's a community-maintained fork (more or less) of OpenSolaris, currently at release oi_148 with ZFS pool version 28.
I'm using napp-it to manage the box as a NAS appliance, and so far it works very well. Hooray again for the community, hard at work for the benefit of all of us. Napp-it provides a simple, robust web interface for managing the storage hardware, ZFS pools, and network shares directly, and it includes features I won't be using in this build that would interest other sysadmins, like snapshot management, COMSTAR iSCSI targets, and more.
One SSD will be the system disk, with plenty of extra space reserved for wear leveling. Six of the Seagate drives will be combined in a RAID-Z2 array, which is similar to RAID 6 (double parity), with the seventh drive kept online as a hot spare. The second SSD is reserved for future use as a cache device for the primary ZFS data pool (in ZFS terms, an L2ARC read cache and/or a separate ZIL log device for synchronous writes), to see what improvements are possible by adding a fast SSD to a RAID-Z2 array to create hybrid storage.
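For a sense of what that layout yields, here is the rough capacity arithmetic as a Python sketch. It assumes "1TB" means 10^12 bytes (as drive vendors count), leaves out the hot spare, and ignores ZFS metadata, allocation padding, and reserved space, so real usable space will be somewhat lower. The function name is hypothetical.

```python
TIB = 2 ** 40  # bytes in a tebibyte; ZFS tools report binary units


def raidz_capacity(drives: int, parity: int, drive_bytes: int) -> tuple[float, float]:
    """Return (raw_tib, usable_tib) for one RAID-Z vdev, ignoring all overhead."""
    if drives <= parity:
        raise ValueError("need more drives than parity devices")
    raw = drives * drive_bytes
    usable = (drives - parity) * drive_bytes   # parity takes `parity` drives' worth
    return raw / TIB, usable / TIB


if __name__ == "__main__":
    raw, usable = raidz_capacity(drives=6, parity=2, drive_bytes=10**12)
    print(f"raw {raw:.2f} TiB, usable ~{usable:.2f} TiB")
```

This prints `raw 5.46 TiB, usable ~3.64 TiB`. The raw figure is close to the 5.44T shown for datapool in the benchmark table, which suggests that column reports raw pool size including parity rather than usable space.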
Initial benchmark results
First benchmarks, run locally on the hardware from within napp-it with Bonnie++ 1.03c, returned the following:
| Pool | Size | Sequential writes | Sequential reads | Random seeks (per second) |
|---|---|---|---|---|
| rpool | 29.8G | 101 MB/s | 418 MB/s | 6970.9 |
| datapool | 5.44T | 247 MB/s | 535 MB/s | 719.0 |
247 MB/s writes and 535 MB/s reads on the platter drives? That's damn impressive for what is essentially a low-cost homebrew rack NAS running free software, and that's without an SSD read or write cache on the data pool. In the next installment, we'll see how that translates to real-world transfers across the network, and what improvements (if any) SSD caching brings. Stay tuned.
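Those local numbers will likely exceed what the network can carry. Assuming a single gigabit Ethernet link (the NIC setup isn't specified above), the wire itself sets a ceiling. This Python sketch estimates best-case TCP payload throughput, assuming a standard 1500-byte MTU, IPv4 and TCP headers without options, and no ACK, retransmit, or NFS/SMB protocol overhead, so real file transfers will land somewhat lower.

```python
LINK_BITS_PER_SEC = 1_000_000_000  # assumed: one gigabit Ethernet link
MTU = 1500                         # standard Ethernet payload size, bytes

# Per-frame bytes on the wire beyond the MTU payload:
ETH_OVERHEAD = 14 + 4 + 8 + 12     # header + FCS + preamble/SFD + inter-frame gap
IP_TCP_HEADERS = 20 + 20           # IPv4 + TCP headers, no options


def tcp_goodput_mb_per_sec(link_bps: int = LINK_BITS_PER_SEC, mtu: int = MTU) -> float:
    """Best-case TCP payload rate in MB/s (10**6 bytes per second)."""
    wire_bytes_per_frame = mtu + ETH_OVERHEAD
    payload_per_frame = mtu - IP_TCP_HEADERS
    frames_per_sec = link_bps / 8 / wire_bytes_per_frame
    return frames_per_sec * payload_per_frame / 1e6


if __name__ == "__main__":
    print(f"~{tcp_goodput_mb_per_sec():.1f} MB/s best case over gigabit")
```

That works out to about 118.7 MB/s, so the 100 MB/s goal is realistic over gigabit, while the pool's 247/535 MB/s local figures would need faster networking to show up on the wire.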
This entry was posted by Porter on June 28, 2011 at 6:45 pm, and is filed under Open Source.