Note: FreeNAS recently rebranded as TrueNAS. Since I still haven’t upgraded my system I’ll be referring to it as FreeNAS in this post.
Earlier this year, I tangentially followed the story on Western Digital shipping SMR (shingled magnetic recording) drives labelled as “Red” NAS drives. While these drives do provide satisfactory performance for non-NAS use, they are significantly slower in most NAS setups and virtually unusable with ZFS, which FreeNAS uses. However, since I don’t really deal much with storage these days, I didn’t actually follow the details. When articles started coming up noting that WD was changing their branding to make it clear which drives are SMR, I assumed the issue was resolved.
Fast forward to September 2020. I returned to Brooklyn from Croatia and found out that one of the 4TB drives in my home FreeNAS server failed. The drives are from 2016, so the warranty expired. I promptly reordered what I assumed was an identical drive. I should have heeded the advice from a few folks on Twitter regarding checking for SMR in model names. I could swear that I did. But it turns out I actually did not… This post documents my process of finding this out the hard way, in case anyone else stumbles upon the same experience.
I ordered a WD Red 4TB NAS drive on Newegg. I can now clearly see that it is WD40EFAX (the SMR version of the drive I had) and not WD40EFRX (the regular CMR drive that WD sold for years, now branded WD Red Plus). Since the drive looked virtually the same as the one I was replacing, I was sure I got WD40EFRX.
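A check along these lines would have caught the mix-up before the drive went into the pool. This is only a sketch: the function name is made up for illustration, and it knows nothing beyond the two model numbers mentioned in this post, so anything else comes back as unknown.

```shell
# Hypothetical helper: classify a model string using only the two
# model numbers discussed in this post (not a complete SMR list).
classify_wd_red_4tb() {
    case "$1" in
        *WD40EFAX*) echo "SMR" ;;      # the shingled drive that failed to resilver
        *WD40EFRX*) echo "CMR" ;;      # the conventional drive, now WD Red Plus
        *)          echo "unknown" ;;  # not covered here, look it up
    esac
}

# Example use (assumes smartctl is installed and the drive is da4):
#   classify_wd_red_4tb "$(smartctl -i /dev/da4 | grep 'Device Model')"
```

The model number is also printed on the drive label, so the same comparison can be done by eye before the drive is even plugged in.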
So I followed the instructions to replace a drive in FreeNAS and… Resilvering failed. When running zpool status -v, each time I tried I got a bunch of WRITE and CKSUM errors, with “too many errors” on the side.
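To pick the affected devices out of a long status listing, a small filter can help. The sketch below assumes the usual device table layout of zpool status (a NAME STATE READ WRITE CKSUM header followed by one row per device); the function name is hypothetical.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print
# every device row whose WRITE or CKSUM counter is not "0".
zpool_error_rows() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }  # table header
        in_table && NF == 0           { in_table = 0 }        # blank line ends the table
        in_table && NF >= 5 && ($4 != "0" || $5 != "0") {
            print $1, "write=" $4, "cksum=" $5
        }
    '
}

# Example use:
#   zpool status -v | zpool_error_rows
```

Counters are compared as strings rather than numbers, since large counts may be printed in abbreviated form.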
It looked exactly the same as the issue that took my original drive out of commission, so I made three wrong guesses:
- I didn’t “burn-in” the drive
- Mini SAS cable between LSI SAS 2008 and the drives was faulty
- I got a faulty drive
The third assumption was correct, but not in the way I thought, obviously. After I attempted a burn-in by running a long SMART test (smartctl -t long /dev/da4 in my case), I still experienced issues. I also tried replacing the cable. Resilver failed each time at ~7%. I should have immediately guessed the problem was something else when the SMART test didn’t come up with any errors, but alas. I RMA’d the drive through WD.
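For reference, the result of a long test can be read back from the drive's self-test log once it finishes. The sketch below assumes the usual ATA log layout printed by smartctl -l selftest, where the newest entry is numbered 1; the function name is made up. As the story above shows, a clean result only says the drive is not physically failing. It says nothing about whether the drive is SMR.

```shell
# Hypothetical helper: read `smartctl -l selftest` output on stdin and
# succeed only if the newest log entry (# 1) completed without error.
latest_selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Example use (after the long test has finished):
#   smartctl -l selftest /dev/da4 | latest_selftest_ok && echo "last self-test passed"
```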
The WD RMA experience wasn’t great. I couldn’t do it through their web form and instead had to contact support. Then they had issues with their payment system. I ended up paying $5 for shipping. All of this was unnecessary, of course, since they shipped me yet another SMR WD40EFAX drive.
I basically went through the same experience with the second drive, until I finally checked the model number. Of course, it was the SMR model. I ordered a replacement drive. This time I got WD40EFRX, the regular CMR version now branded WD Red Plus 4TB NAS.
I did come across a great article explaining the science behind shingled drives, which also suggested that they could in fact function with ZFS if write caching and lookahead were turned off. This of course would slow down the resilvering process so much that the drive would effectively be unusable. Since I did have some extra time before the new drive came in, I decided to test it. I couldn’t get FreeNAS to turn off these two functions. I tried adding the following lines to
But it had no effect. Instead, I turned them off using a Linux VM running on the same server. I run FreeNAS as a VM on ESXi, and it connects to the LSI SAS 2008 RAID controller through PCI passthrough. This way I could easily access the drive through another VM. I issued these two commands:
hdparm -W0 /dev/sda – turns off write caching
hdparm -A0 /dev/sda – turns off lookahead
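The two commands can be wrapped together with a read-back of both flags, since hdparm prints the current setting when -W or -A is given without a value. This is a sketch with a made-up function name; it needs root, and the device path is whatever the drive shows up as in the Linux VM.

```shell
# Hypothetical helper: disable write caching and read look-ahead on a
# drive, then query both flags so the new state is visible.
disable_drive_caches() {
    dev="$1"
    hdparm -W0 "$dev" || return 1   # turn off write caching
    hdparm -A0 "$dev" || return 1   # turn off read look-ahead
    hdparm -W "$dev"                # report the current write-caching flag
    hdparm -A "$dev"                # report the current look-ahead flag
}

# Example use:
#   disable_drive_caches /dev/sda
```

Whether these settings survive a power cycle depends on the drive, so it is worth checking them again after the disk is moved or the server is restarted.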
This time it worked. This was the output of camcontrol identify da4 -v in FreeNAS:
And then the long resilver started. It took 5 days to reach 15%, but there were no errors (previously the resilver would fail at ~7%). At this point my new drive arrived, so I stopped the resilver.
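To put that pace in perspective, here is a rough extrapolation. It assumes the resilver keeps a constant rate, which is not guaranteed, and the function name is made up for illustration.

```shell
# Hypothetical helper: estimate the total resilver time in days from the
# days elapsed ($1) and percent complete ($2), assuming a constant rate.
estimate_total_days() {
    awk -v d="$1" -v p="$2" 'BEGIN {
        if (p <= 0) exit 1              # no progress yet, nothing to extrapolate
        printf "%.1f\n", d * 100 / p
    }'
}

estimate_total_days 5 15   # prints 33.3
```

At that rate the full resilver would have needed about a month.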
The lesson of this long and unnecessary conundrum is to pay close attention to details and 👏 LISTEN👏 TO👏 PEOPLE👏 ON👏 TWITTER👏 (well, maybe sometimes). But it also shows what a mess WD made with their SMR naming mixup.