Trouble in ZFS land
An interesting incident took place over the last month that took down my storage. Several lessons were learned, so a write-up is deserved. Around Independence Day, I decided to reorganize the storage server for the following reasons:
1. Mount the L2ARC SSD inside the R510 chassis.
2. Rebuild the pool with a new vdev layout.
To accomplish 1, I purchased an adapter that allows me to mount the L2ARC SSD in the R510's 12.7 mm optical bay. That went without a hitch. To accomplish 2, I was torn between 4x striped mirrors and 2x striped RAIDZ2. Striped mirrors are ZFS's take on RAID 10, which is known for the best IOPS performance on traditional RAID setups. Striping RAIDZ2s together lets me lose two disks in each RAIDZ2 set and still maintain redundancy. I decided to go with striped mirrors (a sketch of both candidate layouts follows below). Here's my plan on how to make the migration.
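For reference, a minimal sketch of the two candidate eight-disk layouts; the pool and device names here are placeholders, not my actual configuration:

# Striped 4x mirrors, ZFS's take on RAID 10: four two-way mirror vdevs striped together
zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5 mirror da6 da7

# Striped 2x RAIDZ2: two four-disk RAIDZ2 vdevs striped together,
# each vdev able to lose any two of its disks
zpool create tank raidz2 da0 da1 da2 da3 raidz2 da4 da5 da6 da7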
Sounds easy... At step 7, the zfs recv command returned "invalid stream (bad magic number)". What? Panic. And thus started my crash course on how zfs send/recv works.

The zstreamdump tool showed that one of the BEGIN records did not have the correct magic number, 0x00000002f5bacbac. x86-64 being little-endian, that 64-bit value is stored as the byte sequence ac cb ba f5 02 00 00 00 on disk. Opening the first part of the stream file in a hex editor showed the zfs send output interleaved with the "TIME SENT SNAPSHOT" header and the per-second progress callouts. Looking back at step 3, the command I used was zfs send -Rv livepool > livepool.zfssend. To prevent the job from aborting when I logged off the shell, I put nohup in front. Looks good, right? Wrong. Because stderr was still a terminal, nohup redirected it to the same place as stdout, so the -v progress callouts that zfs send writes to stderr landed in the output file alongside the binary stream. That spurious text is what was throwing off the ZFS stream reader.

I whipped up a program to strip out the progress callouts. The pattern was consistently "hh:mm:ss 12.3T livepool/dataset@snapshot\n", which made it easy to process. Then the error moved from the start of the file to a bad checksum at around the 2.6 TB mark. Since the checksum is verified at every record, that means 2.6 TB worth of data are intact. The checksum error sits right where the first snapshot ends and the second snapshot starts. zfs send orders the data from the oldest snapshot to the newest; each newer snapshot is a delta of changes against the previous one. If I could insert the correct end records to tell zfs recv that this is the end of the stream, maybe the directory structure would be created properly. Using the stream of an empty dataset snapshot as a template, I inserted two END records back-to-back: one carrying a checksum and one with zeros as its payload. Finding the correct checksum took repeated test runs and tweaks, and at three hours per run it became very tedious.

Finally, one month after the original incident, zfs recv reported success: the file storage dataset snapshot was restored. Navigating through the directories confirmed most files are present. What lies between 2.6 TB and 3.2 TB? More snapshots of the file system and the two logical volumes being used as iSCSI targets. Those only held test VMs with no important data; I can recreate them if needed.

Lessons learned:
When wrapping a long-running zfs send in nohup, redirect stderr to its own log so nohup cannot fold the verbose progress output into the stream file.
Verify a saved send stream, for example with zstreamdump, before destroying the source pool.
A send stream stored in a file is all-or-nothing; any corruption in the middle normally makes zfs recv reject the entire stream.
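To make that first lesson concrete, here is a minimal sketch of the failure mode and the safer form; the snapshot and file names are placeholders, not the exact commands from the migration:

# What I effectively ran: stdout goes to the file, but because stderr was a
# terminal, nohup pointed it at the same place, so the -v progress lines
# ended up inside the binary stream.
nohup zfs send -Rv livepool@migrate > livepool.zfssend

# Safer: give the progress callouts their own log, then make sure the stream
# at least parses (and the BEGIN record's magic number is right) before
# destroying the source pool.
nohup zfs send -Rv livepool@migrate > livepool.zfssend 2> send-progress.log
zstreamdump < livepool.zfssend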
Quite an adventure, and I can answer all sorts of ZFS send/receive stream questions until this knowledge gets pushed out by new knowledge.
VLANs and Subnets
Now that there are multiple servers and devices on the network, it's time to organize them. I will use the 172.16-31 range because 10 and 192.168 are overused.

Connected to the switch but disconnected from the Internet:
VLAN 2 - management (iDRAC, etc.)
VLAN 3 - iSCSI 0
VLAN 4 - iSCSI 1
VLAN 10 - surveillance with PoE IP cameras

Disconnected from the switch (VMware distributed port groups):
VLAN 11 - domain server
VLAN 12 - domain client
VLAN 99 - test
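On the VMware side, a VLAN ID is just a property of each port group. The distributed port groups get theirs through vCenter; as a rough sketch, the same assignment on a plain standard vSwitch would look like this from the ESXi shell (the port group names are made up for illustration):

esxcli network vswitch standard portgroup set --portgroup-name="Domain Server" --vlan-id=11
esxcli network vswitch standard portgroup set --portgroup-name="Domain Client" --vlan-id=12
esxcli network vswitch standard portgroup set --portgroup-name="Test" --vlan-id=99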
Hex Core
I wanted to max out the potential of these servers, so off to eBay to shop. These are the prices as of Monday, May 1:

L5640 2.26GHz $23
E5645 2.40GHz $20
E5649 2.53GHz $27
X5650 2.66GHz $24
X5660 2.80GHz $28
X5670 2.93GHz $44

The X series have 95W or 130W TDP. The R510 only supports up to 95W TDP, so the X5670 or X5675 is the max. The X series also have a higher QPI rate, but the higher rate comes with an increase in power consumption. Right now my use is not IO bound, so the increase does not seem worthwhile. Also, the uptick in price at the X5670 is hard to stomach... The E series have 80W and the L series 60W TDP. The L series runs at the same voltage, so the TDP reduction comes entirely from the lower clock speed. At near idle, the E series and L series should have similar power consumption. In the end, I went for two E5645s. The price was right too. Also picked up an extra R410 heatsink for the socket; R410 heatsinks work in the R510 and cost about a third of the price.

The parts arrived 3 days later on Thursday. Without further ado, they were installed in the secondary VM host and powered up. Error E1410, CPU1 IErr. Moved the two E5645s into the primary VM host. No problem there. Moved the two E5620s into the secondary VM host. Still E1410. Googling that error code turned up a post about bent pins. My R510 sockets do not have bent pins, so what can be wrong? On a whim, I unlatched the sockets and flexed the locking arm several times. I could hear the pins making contact with the pads at the bottom of the processor. Latched the sockets, mounted the heatsinks, and plugged in the power. Powered on. The system booted without error. My theory: the processors' pads developed dimples over many years of service. When a used processor is placed in a different socket, the pins and dimples do not line up, leaving some pins hanging. Flexing the locking arm allowed the pins to settle into the dimples and make good contact.

Primary VM Host
2x E5645 2.4GHz Hex-Core
64GB RAM
ESXi 6.5

Secondary VM Host
2x E5620 2.4GHz Quad-Core
32GB RAM
ESXi 6.5

Storage Server
1x E5630 2.53GHz Quad-Core
8GB RAM
6x 2TB SATA 7200rpm HDD
1x 128GB SSD L2ARC
1x 64GB SSD ZIL
FreeNAS 9.10

Spare parts
2x 2TB NL-SAS 7200rpm HDD
1x E5630 2.53GHz Quad-Core

Wish list: home surveillance with PoE net cams. Maybe I can get another heatsink and throw that E5630 in the storage server.
More Memory
Ordered 8x 8GB PC3-12800 ECC registered modules and installed them in the virtualization server. Sounds simple, but it was not without drama. I first ordered the memory from one vendor around mid-March. The tracking info never got updated. I notified the vendor after the delivery window and promptly received a refund. Too bad the courier dropped the ball, but the vendor made me whole. Ordered the memory from a different vendor on April 4, received it on April 7. Nice.

Why 8GB modules? DDR3 ECC registered memory comes in 4 densities: 2GB, 4GB, 8GB and 16GB. PowerEdge R510s have 8 memory slots (4 per processor). 2GB modules can't reach a meaningful total capacity, so they were not even under consideration. 4GB modules are dirt cheap, about $5 per module if you look around. That density is fine for systems with lots of memory slots like the R710, but an R510 could only reach a maximum of 32GB. 8GB modules cost about $20 per module, but I was able to find them at $15. The R510 can reach 64GB total memory, which is more than I need for now. Considering 8GB unbuffered non-ECC DDR3 or DDR4 desktop modules run about $50 to $60 each, this is still a bargain. 16GB modules are too expensive.

Now the lineup looks like:

Virtualization server
2x Xeon E5620 Quad Core 2.4GHz
8x 8GB DDR3 ECC Registered (64GB total)
ESXi 6.5

FreeNAS
1x Xeon E5630 Quad Core 2.53GHz
4x 2GB DDR3 ECC Registered (8GB total)
6x 2TB SATA HDD
1x 128GB SATA SSD for L2ARC
1x 64GB SATA SSD for ZIL

Spare R510
1x Xeon E5630 Quad Core 2.53GHz
4x 4GB DDR3 (16GB total)

Spare parts
4x 4GB DDR3 ECC Registered
2x 2TB NL-SAS HDD

To put the spare memory into service, I would need to get an additional processor for the spare R510. Maybe I will do that next...
New Storage Hardware, Part 2
As alluded to in New Storage Hardware, the MD1200 has fans that are almost too loud to bear, so I went searching for a solution again.

Option 1: Replace the Delta PFC0812DE fans with something that spins slower, like the 6000rpm model. Drawbacks: substitute fans are pricey, slower fans move less air, and there are anecdotes of the MD1200 throwing faults if the fans spin too slowly. The SC440 and the MD1200 also take up 4U and 2U respectively. The clincher is not the technical issues; rather, it is how much rack space the SC440 and the MD1200 are occupying. Good thing I have another R510 with 8 drive bays. Before we jump into it, there is another hurdle.

R510
1x Xeon E5630
4x 2GB RDIMM
PERC 6/iR

The PERC 6/iR's LSI 1078E controller does not support IT mode. Even if I flash it to LSI firmware, IT mode will not be available. I spent a week researching my options. Plenty of people have replaced the PERC 6/iR with a PERC H200 flashed to LSI IT mode firmware. I could do that too, but the PERC 6/iR uses SFF-8484 "wide" connectors and the H200 uses SFF-8087 connectors. I would need to replace the cables, and they don't come cheap. Looking for ways to reuse the SFF-8484 cables led me to the SAS 6/iR with the LSI 1068E controller. The LSI 1068E is compliant with the SAS 3Gbps standard, which has a 2.2TB per-drive limit. My array has 2TB drives. Perfect. The SAS 6/iR is cheap, like $6-on-eBay cheap. Googling for cross flashing the equivalent LSI 3081E-R IT mode firmware onto the SAS 6/iR turned up exactly one result. Given how inexpensive the part was, it was worth the risk. I won't bore you with the technical details, but the flashing was successful.

Moved the HDDs and SSDs from the MD1200 to the R510's hot swap bays and the USB flash drive from the SC440 to the R510's internal USB port. Turned on the R510, reconfigured the network, and FreeNAS is ready to serve again. The R510 is much quieter than the MD1200. iDRAC6 reports power usage between 120 and …

New Storage II
R510 8 LFF bay chassis
1x E5630
4x 2GB RDIMM
SAS 6/iR flashed to LSI 3081E-R IT mode firmware
6x 2TB HDD
2x SSD for L2ARC and ZIL

The MD1200 and LSI 9200-8e will stand by until I outgrow the R510's 8 drive bays.
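A quick sanity check from the FreeNAS shell confirms the flashed SAS 6/iR presents everything after the move; a small sketch, with the pool name as a placeholder:

camcontrol devlist   # the six 2TB HDDs and both SSDs should all be listed
zpool status tank    # the pool should be ONLINE with every member disk present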
New Storage Hardware
Once the virtualization host role was moved off the PowerEdge SC440, it was freed up for other tasks. Naturally, it seemed to make sense to move FreeNAS off the Core 2 Duo E4400 white box to the PowerEdge SC440.

First, a little background information. All of these servers boot off USB flash drives. I have been using the minuscule SanDisk Cruzer Fit 16GB for the last three years with great success. It makes moving roles around simple and allows me to dedicate the data disks entirely to ZFS.

To prepare the SC440 for the storage role, I added an LSI SAS 9200-8e HBA and a Broadcom 5709 dual port gigabit NIC. I updated the 9200-8e to the latest P21 IT mode firmware on account of FreeNAS and ZFS. The 6x 2TB SATA disks go into a PowerVault MD1200 direct-attached storage (DAS) enclosure. And the first start up: the SC440 boots into FreeNAS, the ZFS pool is mounted, and all services are available. Looks good, but holy shit the MD1200 is loud. The Delta PFC0812DE fans in each power supply just wail. I need to do something about that...

Salvaged some SSDs from desktops that were going to be decommissioned and assigned one each to the L2ARC and ZIL roles.

Storage
PowerEdge SC440
Core 2 Quad Q6700 2.66GHz
LSI 9200-8e
MD1200 with 6x 2TB SATA disks
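For reference, both the firmware update and the SSD assignment come down to a handful of commands. A rough sketch; the firmware image name, pool name, and device names are assumptions rather than what I actually typed, and FreeNAS can also attach cache and log devices through its volume manager UI:

sas2flash -listall                    # confirm the 9200-8e is detected and note its firmware version
sas2flash -o -f p21-it-firmware.bin   # flash the P21 IT mode image (file name is a placeholder)

zpool add tank cache da6              # one salvaged SSD as the L2ARC read cache
zpool add tank log da7                # the other as the ZIL/SLOG device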
New VM Host
Among the equipment I received are two Dell PowerEdge R510s, each with a single processor. I combined the processors and RAM into one system and ended up with a new virtualization host:
PowerEdge R510
2x Xeon E5620
24GB RDIMM
I put the free …
Starting Point
In January, I received some decommissioned hardware from my employer, and I have been making incremental changes since then. To best document the improvements, it's important to frame the work by defining the starting configuration.

Virtualization:
PowerEdge SC440
Core 2 Quad Q6700 2.66GHz
8GB DDR2 ECC RAM
XenServer 7

Storage:
Core 2 Duo E4400 2GHz
4GB DDR2 RAM
6x 2TB 7200 RPM SATA
FreeNAS 9.2

Networking:
TP-Link Archer C9

They are former desktops that got a second life as servers.
Reboot
Time for a fresh start with new content and new challenges. Older content has moved into the Archive area. It is not accessible through the navigation bar, so you will need to go to the Sitemap first.