Building a big file server on FreeBSD

by Garrett Wollman

I never thought I'd find myself building file servers. As a network and security guy, I didn't expect NFS to land in my bailiwick. But the conjunction of two outside forces put me in that business unexpectedly throughout much of 2012, and it's looking like I'll be doing plenty more in the years to come.

The first event was a generous donation to our Lab by Quanta Computer, the Taiwan-based contract manufacturer, of a system they called a "KT" rack. "KT" stands for "Korea Telecom", and the rack consisted of a bunch of Intel Xeon-based servers with lots of memory and four SAS disk shelves. When we got it, nobody had any idea what to do with the disk shelves, so they just sat there for six months.

The second event was the acquisition of BlueArc, the vendor of our high-end NFS storage appliance, by Hitachi. The Hitachi people didn't really seem to understand the market they had acquired—BlueArc was popular among sites like ours that thought competing products from EMC and NetApp were grossly overpriced, and Hitachi seemed to think they were acquiring a business with EMC-like margins. So we were looking at a bill for maintenance contracts and upgrades that was well into the six figures, and we naturally wanted to find something more cost-effective.

So of course I said, "Why don't we try building something from this Quanta hardware? They gave us some nice SSDs and lots of big rotating disks, and it can probably meet the requirements that caused us to buy the BlueArc in the first place." I did, and—after a somewhat shaky start—it did. We added even more memory to the file servers, plus redundant SAS controllers and higher-performance SSD, and it really screamed. Having proved that the architecture was workable, we went back to our friends at Quanta to flesh out a production-ready design for file servers, and I'm working on making the transition to the new systems as I write this.

Physically, the final hardware configuration of our test/development system is as follows:

Quanta QSSC-S99Q chassis:
    2 x Intel Xeon E5640 (2.67 GHz)
    96 GB DRAM
    LSI SAS1064ET Fusion-MPT SAS (for internal boot drives)
    2 x Intel 82599EB 10-gigabit network controllers
    2 x LSI SAS2116 Fusion-MPT SAS-2 (for external drives)
4 x Quanta DNS1700 (QSSC-JB7) disk shelves, each with:
    23 Seagate ST32000444SS 2-TB SAS-2 drives
    1 STec ZeusRAM 8-GB SAS-2 SSD or OCZ Talos 2 240-GB SAS-2 SSD
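All of this is visible to FreeBSD through the stock tools; a minimal sketch of how one might verify what the kernel actually found (the device name da0 is just an example):

    # Confirm that the LSI HBAs and Intel NICs attached
    pciconf -lv
    # List every disk and enclosure seen by the mps(4) driver
    camcontrol devlist
    # Query a single drive directly
    camcontrol inquiry da0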

The LSI SAS2116 chipset is found in a number of OEM products; the retail version of the PCI-Express card that we have is called the "SAS 9201-16e". This is a 16-port card with four SFF-8088 connectors; each SFF-8088 connector carries four SAS ports and is wired to a single disk shelf. Each shelf has eight 6:1 SAS port expanders, allowing for 24 dual-ported drives. With four shelves, we get a total of 88 active drives (24 per shelf, less one hot spare and one SSD, times four shelves). Each shelf has a cable to both SAS controllers, and we do not daisy-chain the shelves, to avoid introducing another point of failure.
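Because each drive is dual-ported and each shelf is cabled to both controllers, every disk initially shows up as two separate da(4) devices, one per path. We tie the paths together with geom_multipath (mentioned below). A sketch of labeling a single drive, where the device and label names are purely illustrative:

    # Write multipath metadata to the disk's last sector; the second
    # path to the same physical drive is bound automatically when the
    # kernel tastes the identical metadata through the other HBA.
    gmultipath label -v shelf1disk03 /dev/da10
    # Verify that both paths are now attached to one multipath device
    gmultipath status

From then on the drive is used as /dev/multipath/shelf1disk03, and I/O fails over transparently if one HBA or cable dies.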

These servers also have an internal SAS backplane. We use this only for boot drives, never for user data, to ensure that the complete storage pool can be moved from one machine to another simply by moving the SAS cables. Since we are not running active-active redundancy (e.g., HAST), our availability strategy requires that we be able to survive a failure of the file server itself merely by relocating the drive shelves to another server.
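Assuming the storage pool is ZFS (the pool name "tank" below is hypothetical), surviving a dead server amounts to recabling the shelves and importing the pool on its replacement:

    # On the standby server, after moving the SAS cables over:
    # scan the newly attached drives for importable pools
    zpool import
    # Import the pool; -f overrides the check that the pool is still
    # marked active, which it will be if the dead server never exported it
    zpool import -f tank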

On the software side, we have been running FreeBSD 9-STABLE with various fixes (primarily to geom_multipath and the mps driver) backported. The new production servers will run our private build of FreeBSD 9.1-RELEASE, and will be part of our Puppet system administration environment. Some other tidbits:


Garrett Wollman, 2013-01-10