I recently got five hard drives to turn my home server into a NAS with TrueNAS Scale, and my idea is to have 4 usable and 1 for redundancy. My question is… how does RAID work? Like, what is RAID 0, RAID 5, software RAID etc., and does any of that even matter for my use case?
If you’re using TrueNAS it already has some types of RAID it wants to do. Assuming your 5 drives are the same size what you want is called RAIDz1 (1 standing for one drive worth of redundancy).
It is a type of RAID5, which means instead of having 5x usable storage you reserve 1x for redundancy information spread out across the 5, and get only 4x usable space.
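To put numbers on that, here is a rough back-of-the-envelope sketch. The 4 TB drive size is only an assumption for illustration (the original post doesn't say), and the function name is made up. It also ignores filesystem overhead, padding and the TB vs TiB difference, so the real figure will come out somewhat lower.

```python
def usable_capacity_tb(drive_count: int, drive_size_tb: float, parity_drives: int = 1) -> float:
    """Approximate usable space when `parity_drives` worth of space goes to redundancy."""
    if drive_count <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drive_count - parity_drives) * drive_size_tb


# Hypothetical example: five 4 TB drives with one drive worth of redundancy.
print(usable_capacity_tb(5, 4.0))  # 16.0
```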
Since you’re a beginner you get the usual lecture: RAID is not backup. RAID allows a certain number of your drives to fail without losing any data; it spreads the risk of hardware failure.
RAID won’t help if you delete a file, or accidentally format the wrong drive or even the whole array, and it won’t help if the PC is stolen, struck by lightning or burns in a fire.
The solution used by TrueNAS (ZFS) has something called snapshots that can help with modified or deleted files.
For anything else you have to consider which of your files are “my world has ended”-level important and back them up to a HDD in a drawer, or to Blu-ray discs, or online to the cloud.
Thanks, and yes, the disks are all the same: speed, capacity, brand etc… I’m confused about the difference between RAID 5 and RAIDz1. They seem to do the same thing on the surface, and looking at the other comments and online, one of them is probably what I’m gonna go for. The only thing I get about the 2 is that RAIDz uses ZFS and RAID 5 does not (?)…
My “NAS” is relatively powerful and read/write speeds aren’t really a big deal for me, as it’s gonna be bottlenecked by the 1 Gbps connection on my “NAS” (a PC I’ve scrambled together from handouts and cheap parts over the years)
RAID5 is the theoretical concept where you spread one drive worth of redundancy across multiple drives.
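If it helps to see the concept, here is a toy sketch of how one drive worth of redundancy can rebuild any single lost drive using XOR parity. This is only an illustration of the idea: real RAID5 rotates the parity across all the drives and works on much larger blocks, and RAIDz1 lays data out differently again. The helper name and the sample data are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data blocks, one per "drive"; the fifth drive's worth of space holds parity.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Pretend drive 2 died: rebuild its block from the survivors plus parity.
survivors = [block for i, block in enumerate(data) if i != 2]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data[2])  # True
```

The same trick works no matter which one of the five blocks is lost, but not if two are lost at once.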
Traditionally this concept was implemented with special hardware cards that you plugged into your server. You connected the HDDs to the card, and it had its own BIOS where you managed the drives.
Later Linux implemented this concept (and other RAID concepts like RAID1, RAID6, RAID10 etc.) without the need of a special card. The Linux driver is called MD (Multiple Device). Actually Linux took it much further and can take any storage devices and do RAID with them: it can work with a whole disk but also with partitions, or with regular files formatted to look like a partition etc.
The cool thing about Linux MD is that it allows you to do any RAID combination that’s logically possible, even if it’s dumb and you’d never use it in real life, like RAID5 with 2 drives (normally you’d use RAID1 for that) or RAID1 with only one drive (no redundancy). Why? Because sometimes you need those dumb things. For example I have two RAID1 arrays and I noticed that both drives in one array show SMART signs that they might fail in the future (10% chance). Linux MD allowed me to remove one drive from each array and reconnect them to the other; now each array has one 100% healthy drive and one 90% healthy drive.
The uncool thing about Linux MD is that this is where it stops. It doesn’t care what filesystem you use on the arrays and it has no other features. This makes its parity RAID implementations (RAID5, RAID6, RAID50, RAID60 etc.) vulnerable to sudden shutdowns (power failure or button off), because the drives may be left in disagreement about parity; this is known as the “write hole”. To work around it you need a UPS or a PCIe adapter card with a built-in battery, so that the parity is correctly written to the drives in case of power failure.
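Here is a toy sketch of that parity disagreement, again using plain XOR parity with made-up data. It is a simplified model, not how MD actually schedules its writes: a data block gets updated, the power goes out before the matching parity is written, and a later rebuild of a different drive quietly returns garbage.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(stripe)

# A write updates block 0, but power is lost before parity is updated.
stripe[0] = b"ZZZZ"
# (parity is now stale: it still describes the old contents of block 0)

# Later, drive 1 fails and is rebuilt from the survivors plus the stale parity.
survivors = [stripe[0], stripe[2], stripe[3]]
rebuilt_block_1 = xor_blocks(survivors + [parity])

print(rebuilt_block_1 == b"BBBB")  # False: the rebuilt data is silently wrong
```

Note that block 1 was never written to during the crash, yet it is the one that comes back corrupted.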
You can get some extra cool features (like snapshots) by using a filesystem like BTRFS on top of a Linux MD array.
RAIDz1 is a completely different implementation of the RAID5 idea. It is part of ZFS, which originated at Sun Microsystems for Solaris and was later ported to FreeBSD; it did not come from Linux. Linux gets to use it courtesy of the OpenZFS project as an external kernel module; it is not included directly in the Linux kernel because its license (CDDL) is widely considered incompatible with the kernel’s GPL. TrueNAS Scale is a Linux OS so it uses the module approach; there’s also TrueNAS Core which is a BSD OS so it has native ZFS support. If you’re only going to use your NAS as a NAS (for storage, not virtualization) I would recommend Core.
ZFS is both a RAID implementation and a filesystem; sort of like MD + BTRFS, but much more tightly integrated. ZFS has lots of extra features built-in: it has no write hole vulnerability; it has snapshots; it has compression; it can mark folders for special use cases and for example only activate compression on your documents folder but not on your movie folder.
The issue with ZFS is that it’s much more complex and opinionated than Linux MD, so if you were to manage it directly yourself it would be a lot to learn. Even more experienced people have to think very carefully before using it. But since TrueNAS (both Scale and Core) has a user-friendly GUI, that won’t matter to you.
I don’t wanna use TrueNAS Core, as I’m not planning to use it as “just a NAS”. I also plan to run a few other things on it, like pihole, searxng, wireguard, (maybe) nextcloud and so on. Other than that, I’m just not as familiar with BSD as I am with Linux, nor do I particularly care to familiarize myself with it. As for ZFS, I’m still not sure about it, but looking at all the other options, it does look like the most straightforward and secure way to go, to me anyways…
To add, unlike “traditional” RAID, ZFS is also a volume manager and can have an arbitrary number of dynamic “partitions” sharing the same storage pool (literally called a “pool” in zfs). It also uses checksumming to determine if data has been corrupted. On redundant setups it will then quietly repair the corrupted parts with the redundant information while reading.
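A rough sketch of that checksum-and-repair idea, in case it helps. Big assumptions here: it uses a simple two-copy mirror instead of RAIDz parity to keep it short, SHA-256 is just a stand-in for whatever checksum the pool is configured with, and the class and method names are made up. It is not how ZFS is implemented internally.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Stand-in checksum; chosen here only for illustration."""
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """Toy model: two copies of a block plus the checksum recorded at write time."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)

    def read(self) -> bytes:
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                # Quietly overwrite any copy that fails its checksum.
                for i, other in enumerate(self.copies):
                    if checksum(bytes(other)) != self.expected:
                        self.copies[i] = bytearray(good)
                return good
        raise IOError("all copies are corrupted")


block = MirroredBlock(b"family photos")
block.copies[0][0] ^= 0xFF  # simulate silent corruption on one drive

print(block.read())                                 # b'family photos'
print(bytes(block.copies[0]) == b"family photos")   # True: the bad copy was repaired
```

The point is that the checksum tells the system which copy is wrong, something a plain mirror or parity array cannot know on its own.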