Just for fun, here are some short notes on setting up the old Backblaze pod. This pod uses the older chassis, but with the new SuperMicro X9SCL-F motherboard, an Intel i3-2120 and 16 GB of RAM. It is using the older SATA controllers (not the new RAID controller). The reason this is a franken-pod and not a new build is that the new Backblaze units (hardware RAID) use a backplane that requires a new chassis, a different power supply and the hardware RAID card, all of which were pushing the project over budget. Although there are online instructions for building a pod, I like having mine on one page.
In a nutshell, install Debian on the main disk. For this I used the netinstall image with non-free drivers. I had to remove the SATA controllers to do the install, as GRUB was getting confused and rescue mode was not working, so pulling them was simpler. Near the end, when it asks for packages, I just installed the SSH server and unchecked the rest; there is no need for desktop services here. Finish that up, shut down, reinstall the controllers and boot up. Now let's configure this box o' drives.
Install a few tools we need first:
apt-get install hdparm mdadm parted xfsprogs lvm2
You will need to partition all of the disks. I leave writing a small one-line script for parted as an exercise to the reader (a rough sketch follows below), but by hand it would look like:
parted /dev/sdn mkpart primary 2048s 100%
parted /dev/sdn set 1 raid on
That would be done for all 45 drives (not the boot disk, of course).
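Expanded into a rough sketch of that loop (not the author's exact script): the globs below assume the 45 data drives show up as /dev/sdb through /dev/sdat, as they do later in these notes, so double-check the device list before running it. The mklabel step is only needed if the disks have no partition table yet.

for d in /dev/sd[b-z] /dev/sda[a-t]; do
    # label, partition and flag each data drive for RAID use
    parted -s "$d" mklabel gpt
    parted -s "$d" mkpart primary 2048s 100%
    parted -s "$d" set 1 raid on
done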
Next up: the most important thing to know about the backblaze is where each drive is located in the chassis. The Backblaze/45 Drives people handle this by choosing which SATA cables go to which controllers, which works with the exact controllers and motherboard they use. If you use different hardware it may not match, and if someone else wires up the cables (as in my case) it will not match. The method I present here takes about 15 minutes with the machine open in front of you, but when you are done you know where the drives are. When you built the pod you should have noted each drive and its position: arbitrarily pick one end to start at, call that space 1 and the last space 45, and write down the serial number printed on each drive along with its position. After that is done, in Debian, get the serial number associated with each device name by querying the drive with hdparm:
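The exact invocation isn't preserved in these notes, but something along these lines (the label in the output may vary with hdparm version) pulls the serial out of the identify data:

hdparm -I /dev/sdb | grep 'Serial Number'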
Run that for each device; again, it is left as an exercise to script it out for all of your devices (a sketch follows below). Write the device name next to the matching drive in your list. This will come in very handy when a drive fails.
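A minimal sketch of such a loop, again assuming the data drives landed on /dev/sdb through /dev/sdat as they did on this box:

for d in /dev/sd[b-z] /dev/sda[a-t]; do
    # print the device name and its serial number on one line
    echo -n "$d: "
    hdparm -I "$d" | grep 'Serial Number'
done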
Now we just need to make the arrays. Since we do not have stacks of backblaze units all backing each other up, we use RAID 6 here rather than RAID 5. Every third disk goes into each array, which spreads the members of each array more evenly over the controllers and the backplanes, as shown in the whiteboard example below.
Create the RAID arrays with the mdadm tool you installed earlier. You will note in the commands below that the first drive is not /dev/sda; that is because /dev/sda was the USB stick used for the Debian install. /dev/sdau is the 500 GB boot drive; it comes in last because it is on the motherboard SATA controller, and Debian enumerates the drives on the PCIe SATA controllers first.
mdadm --create --verbose /dev/md0 --level=6 --raid-devices=14 /dev/sdb1 /dev/sde1 /dev/sdh1 /dev/sdk1 /dev/sdn1 /dev/sdq1 /dev/sdt1 /dev/sdw1 /dev/sdz1 /dev/sdac1 /dev/sdaf1 /dev/sdai1 /dev/sdal1 /dev/sdao1 --spare-devices=1 /dev/sdar1
mdadm --create --verbose /dev/md1 --level=6 --raid-devices=14 /dev/sdc1 /dev/sdf1 /dev/sdi1 /dev/sdl1 /dev/sdo1 /dev/sdr1 /dev/sdu1 /dev/sdx1 /dev/sdaa1 /dev/sdad1 /dev/sdag1 /dev/sdaj1 /dev/sdam1 /dev/sdap1 --spare-devices=1 /dev/sdas1
mdadm --create --verbose /dev/md2 --level=6 --raid-devices=14 /dev/sdd1 /dev/sdg1 /dev/sdj1 /dev/sdm1 /dev/sdp1 /dev/sds1 /dev/sdv1 /dev/sdy1 /dev/sdab1 /dev/sdae1 /dev/sdah1 /dev/sdak1 /dev/sdan1 /dev/sdaq1 --spare-devices=1 /dev/sdat1
cat /proc/mdstat
The output from cat /proc/mdstat should show all three arrays, but marked read-only with the resync pending. You will have to force them to get a move on:
mdadm --readwrite /dev/md0
mdadm --readwrite /dev/md1
mdadm --readwrite /dev/md2
Now you should see them all in a state of active and resyncing:
At this point we should not forget to record the arrays in /etc/mdadm/mdadm.conf so that they will come online at boot.
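The scan command itself is not preserved in these notes, but the usual way to get those ARRAY lines is:

mdadm --detail --scan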
When mdadm returns the output for the three arrays, simply add those ARRAY lines to the end of /etc/mdadm/mdadm.conf (right under the line that says: # definitions of existing MD arrays).
You can do anything you want with the drives now, but personally, I like to wait until they are done syncing before hammering them with data. Ready? Great, the game plan is to initialize the three RAID arrays for use with LVM, create a single volume group spanning them, then format it and put it to use.
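The commands are not reproduced in these notes, but a minimal sketch of those steps, assuming the volume group name backblaze2 used below, would be:

pvcreate /dev/md0 /dev/md1 /dev/md2
vgcreate backblaze2 /dev/md0 /dev/md1 /dev/md2
vgdisplay backblaze2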
OK, so we initialized the three RAID arrays with pvcreate, created a single volume group from all three arrays with vgcreate, and then displayed the volume group with vgdisplay. This volume group has the name backblaze2; you can name yours fred, or whatever you like. Let us now create the logical volume and then format it. Look carefully at the output of vgdisplay backblaze2 and note the line that says: "Free PE / Size".
You will want to take the number that corresponds to "Free PE" and use it in the next command. In my case it was 34337853. The name after -n is the name of the logical volume; you can use fred-backup or whatever you like.
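The lvcreate line itself is not preserved here; assuming a logical volume named data (pick your own name) and the Free PE count above, it would look roughly like:

lvcreate -l 34337853 -n data backblaze2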
So, if that all looks good, let us go ahead and format it with xfs and mount it on /mnt/data.
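Something like the following would do it, assuming the volume group and logical volume names from the sketches above:

mkfs.xfs /dev/backblaze2/data
mkdir -p /mnt/data
mount /dev/backblaze2/data /mnt/data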
At this point, if it all looks good, you can add an entry to /etc/fstab like the following to auto-mount the filesystem at boot:
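Assuming the names from the sketches above, the entry would look roughly like:

/dev/backblaze2/data /mnt/data xfs defaults 0 0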
You should now be good to go with the drive portion of the backblaze. Next up, how to install Amanda on the pod.
The biggest issue with RAID is unrecoverable read errors.
If you lose a drive, the RAID has to read 100% of the remaining drives, even the portions that hold no data. If you get an error during the rebuild, the entire array will die.
http://www.enterprisestorageforum.com/storage-management/making-raid-work-into-the-future-1.html
A UER on SATA of 1 in 10^14 bits read means a read failure every 12.5 terabytes. A 500 GB drive has 0.04E14 bits, so in the worst case rebuilding that drive in a five-drive RAID-5 group means transferring 0.20E14 bits. This means there is a 20% probability of an unrecoverable error during the rebuild.

Enterprise class disks are less prone to this problem:
http://www.lucidti.com/zfs-checksums-add-reliability-to-nas-storage
A few notes. First off, a single backblaze pod is not meant to be a complete backup solution on its own, which would require the level of redundancy you speak of. The way Backblaze gets around this is by using multiple pods and RAIDing across them.
I would also add a few other items. First, some file systems can work around read errors:
http://www.zdnet.com/article/btrfs-hands-on-exploring-the-error-recovery-features-of-the-new-linux-file-system/
Also, there used to be a huge difference in URE and BER between consumer and "enterprise" drives, but now I think the difference is mostly the length of the warranty. In fact, consumer drives have been shown to be more reliable than enterprise drives as recently as 2014:
http://www.computerworld.com/article/2687068/consumer-drives-shown-to-be-more-reliable-than-enterprise-drives.html
http://lifehacker.com/why-enterprise-hard-drives-might-not-be-worth-the-cost-1476333889