[Solved] Is this hard drive failing?

Kalcifer@lemmy.world · edit-2 1 year ago


Kalcifer@lemmy.world · 1 year ago

That disk certainly isn’t healthy.

For my own future knowledge, what, exactly, in the logs, led you to that conclusion?

image the whole thing with ddrescue

Since you mention “image”, I’m assuming that I would need a drive at least equal to the size of the source drive to store the image? The issue is that the source drive is 2TB in size, so I would need to source another 2TB drive (at least) to store the image.

PriorProject@lemmy.world · edit-2 1 year ago

That disk certainly isn’t healthy.

For my own future knowledge, what, exactly, in the logs, led you to that conclusion?

GPT (GUID Partition Table) is the partitioning scheme that records where each partition lives on the disk. Very few pieces of software interact with that layer of your storage system. The first GPT error tells us that, unless we’ve been messing with low-level tools that might break the partition table… the physical disk has probably already lost data. So we’re already primed to suspect a busted disk.
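To make “the partition table is damaged” a bit more concrete: the primary GPT header sits at LBA 1 and carries a CRC32 of itself, so bit rot in that sector shows up as a checksum mismatch. The sketch below checks only that header. It assumes 512-byte logical sectors and skips the partition-entry array CRC and the backup header at the end of the disk. The function name and paths are hypothetical.

```python
import struct
import zlib

# Assumption: 512-byte logical sectors. Drives with 4096-byte logical sectors
# ("4Kn") keep the header at byte 4096 instead.
SECTOR_SIZE = 512


def gpt_header_ok(image_path: str, sector_size: int = SECTOR_SIZE) -> bool:
    """Check the primary GPT header (LBA 1) for a valid signature and CRC32."""
    with open(image_path, "rb") as f:
        f.seek(sector_size)  # LBA 0 is the protective MBR; the header is at LBA 1
        header = f.read(sector_size)

    if len(header) < 92 or header[:8] != b"EFI PART":
        return False  # too short, or no GPT signature where one is expected

    # Little-endian uint32s: header size at offset 12, stored header CRC32 at offset 16.
    header_size, stored_crc = struct.unpack_from("<II", header, 12)
    if not 92 <= header_size <= len(header):
        return False

    # The CRC covers the first header_size bytes with the CRC field itself zeroed.
    buf = bytearray(header[:header_size])
    buf[16:20] = bytes(4)
    return zlib.crc32(bytes(buf)) == stored_crc
```

Pointing it at a device node such as a hypothetical `/dev/sdX` needs read access (usually root) and reads just one sector. Running it on an image file is gentler on a dying disk.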

Then the kernel log snippets you pasted show tons of errors in the block device layer. I know noisy application logs sometimes train us to ignore error messages, but the kernel block device layer does not log error messages for fun. If you see any log like ERROR sdx where sdx is a block device that stores important data without a backup… you’re in for a rough ride.

image the whole thing with ddrescue

Since you mention “image”, I’m assuming that I would need a drive at least equal to the size of the source drive to store the image? The issue is that the source drive is 2TB in size, so I would need to source another 2TB drive (at least) to store the image.

Yes, though you can pipe ddrescue into gzip or another compressor, and if the drive isn’t full and you’re lucky enough to have some decent-sized zeroed-out regions, those will compress very well. In the best case, you might only need a disk big enough to hold the live data. In the worst case, yeah, you need a matched disk or bigger.
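To illustrate why a compressed image can be much smaller than the raw disk, here is a deliberately naive sketch. It is *not* ddrescue. ddrescue skips bad areas first, retries and splits them later, and keeps a mapfile so a run can resume. This sketch makes a single pass in hypothetical 1 MiB chunks, writes zeros for any chunk that fails to read (discarding good data near bad sectors), and streams the result through gzip. The function name and paths are made up for the example.

```python
import gzip
import os

CHUNK = 1024 * 1024  # 1 MiB per read; an arbitrary size for this sketch


def image_to_gzip(src_path: str, dst_path: str, chunk: int = CHUNK) -> list[tuple[int, int]]:
    """Stream src_path into a gzip file, writing zeros for chunks that fail to read.

    Returns the (offset, length) ranges that could not be read.
    """
    bad: list[tuple[int, int]] = []
    with open(src_path, "rb", buffering=0) as src, gzip.open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # works for regular files and Linux block devices
        offset = 0
        while offset < size:
            length = min(chunk, size - offset)
            src.seek(offset)
            try:
                data = src.read(length)
            except OSError:  # e.g. EIO from a bad sector
                data = b""
            if len(data) != length:
                # Unreadable or short chunk: record it and substitute zeros so
                # every later byte stays at its original offset in the image.
                bad.append((offset, length))
                data = bytes(length)
            dst.write(data)
            offset += length
    return bad
```

Runs of zeros (never-written space, or the zero-filled bad chunks) shrink to almost nothing in the `.gz`, while real data compresses only as well as its content allows. For an actual rescue, use ddrescue itself.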

Pro tip, buy drives in pairs and automate backups to one of them. If you have a disk you can’t copy to another disk, you might almost as well have no disk. This kind of thing happens, not a lot… but I lose a disk maybe every 3 to 5 years or so. I have a few disks around… maybe 6 online at any given time. But it’s not like I’m running hundreds of them. They just conk out every now and again and you’ve got to be ready for them.
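A minimal sketch of “automate backups to one of them”: copy a source directory into a new timestamped snapshot on the backup disk, then prune old snapshots. It assumes the backup disk is mounted somewhere like a hypothetical `/mnt/backup` and that you would schedule the script yourself, for example with cron or a systemd timer. Each snapshot is a full copy. Tools such as rsync, or filesystem-level snapshots, avoid re-copying unchanged files, so treat this as an illustration only.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def snapshot(src: Path, backup_root: Path) -> Path:
    """Copy src into a new timestamped directory under backup_root."""
    # Fixed-width UTC timestamp, so directory names sort chronologically.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    dest = backup_root / stamp
    shutil.copytree(src, dest)  # raises if dest exists, so no snapshot is overwritten
    return dest


def prune(backup_root: Path, keep: int) -> None:
    """Delete all but the `keep` newest snapshots."""
    if keep < 1:
        raise ValueError("keep at least one snapshot")
    snaps = sorted(p for p in backup_root.iterdir() if p.is_dir())
    for old in snaps[:-keep]:
        shutil.rmtree(old)
```

For example, `snapshot(Path("/srv/data"), Path("/mnt/backup"))` followed by `prune(Path("/mnt/backup"), keep=7)` (both paths hypothetical) keeps the last seven copies.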

nous@programming.dev · 1 year ago

Pro tip, buy drives in pairs and automate backups to one of them.

Honestly, I don’t think this is the best way. It’s better to buy them at different times, or to buy two from different manufacturers. Chances are that if you buy two identical ones together, once one starts to fail the other is not far behind. Or, if there is some defect in the batch, you could have both fail quickly and within a very short window of each other.

If you must buy identical drives, it’s best to have one be far less active than the other, e.g. an offline backup rather than a hot backup.

PriorProject@lemmy.world · 1 year ago

Yeah fair. This is sound advice.

I buy matched pairs to mirror, and then offset the purchase of my pair of backup drives. So I end up having 4 copies on two different models. And when my primary disks get full I “promote” my larger backup disks to primary and buy a new/larger pair of backup disks that are big enough to store many snapshots of my primaries. I knew this was too much for OP and tried to simplify… but your approach is equally simple and better.

Max-P@lemmy.max-p.me · 1 year ago

For my own future knowledge, what, exactly, in the logs, led you to that conclusion?

That kind of error message is never good news:

[ 300.090572] sd 6:0:0:0: [sde] tag#16 Sense Key : Hardware Error [current]
[ 326.010925] Buffer I/O error on dev sde, logical block 4, async page read

I mean, technically it could also be the SATA controller/interface that’s bad; the USB errors might point that way. But in all cases, it’s struggling to read the drive, and that’s never good, so you should always assume the worst. Best case, the drive is healthy and you extracted the data for nothing, but that’s a good problem to have.
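If you want to check a log for this kind of message rather than eyeballing it, a small sketch can tally block-device errors per disk. The patterns are modelled on the two lines quoted in this post. `Medium Error` is an added assumption: it is the other SCSI sense key that usually means the media itself is failing. The kernel words disk errors in other ways too, so this list is a starting point, not a complete set.

```python
import re
from collections import Counter

# Modelled on the quoted log lines; extend for other kernel wordings.
PATTERNS = [
    re.compile(r"\[(?P<dev>sd[a-z]+)\].*Sense Key : (?P<kind>Hardware Error|Medium Error)"),
    re.compile(r"(?P<kind>Buffer I/O error) on dev (?P<dev>sd[a-z]+)"),
]


def count_disk_errors(lines) -> Counter:
    """Tally (device, error kind) pairs found in kernel log lines (dmesg / journalctl -k)."""
    counts: Counter = Counter()
    for line in lines:
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                # Partition names such as sde1 are folded into their disk (sde).
                counts[(match["dev"], match["kind"])] += 1
                break
    return counts
```

For example, save the output of `journalctl -k` to a file and pass the open file object. Any non-zero count for a disk holding unbacked-up data is a reason to stop writing to it.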

Since you mention “image”, I’m assuming that I would need a drive at least equal to the size of the source drive to store the image? The issue is that the source drive is 2TB in size, so I would need to source another 2TB drive (at least) to store the image.

Yeah, that’s a bit of a problem. I mean, nothing’s stopping you from trying to mount it. Make sure you mount it read-only: that keeps the filesystem from writing to the drive and potentially corrupting more data, and a read-only mount is also more tolerant of errors, whereas errors on a read-write mount will cause the filesystem to bail.

It really depends on how much you care about the data. If it’s only nice to have but not critical to keep, you can afford more risky recovery operations.

You can use testdisk to try to locate the partitions on it, and depending on the filesystem you might be able to only copy the file data that’s still good.
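Once the filesystem is mounted read-only (at a hypothetical `/mnt/rescue`), a simple way to “only copy the file data that’s still good” is to walk the tree, copy each regular file, and record whatever fails instead of aborting. This is a generic sketch, not what testdisk does. A successful copy only means the reads did not return errors; it does not prove the contents are intact.

```python
import contextlib
import os
import shutil


def salvage(src_root: str, dst_root: str) -> list[str]:
    """Copy regular files from src_root to dst_root, skipping ones that fail.

    Returns the paths (files or unlistable directories) that could not be copied.
    """
    failed: list[str] = []

    def on_walk_error(err: OSError) -> None:
        failed.append(err.filename)  # a directory that cannot be listed is skipped

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=on_walk_error):
        target_dir = os.path.join(dst_root, os.path.relpath(dirpath, src_root))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src) or not os.path.isfile(src):
                continue  # skip symlinks, sockets, device nodes, etc.
            dst_file = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst_file)
            except OSError:
                failed.append(src)
                with contextlib.suppress(OSError):
                    os.remove(dst_file)  # drop any partially written copy
    return failed
```

Re-running it later on just the failed paths, or giving those to a dedicated recovery tool, is usually more productive than retrying everything.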

This might be good help as well: https://wiki.archlinux.org/title/file_recovery

Kalcifer@lemmy.world · 1 year ago

For your reference, please see the updated post. I ran a S.M.A.R.T. test, and the drive is indeed borked.
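For reading a S.M.A.R.T. report like this one, a rough sketch can pull out the attributes people usually check first on a drive that is losing sectors: reallocated sectors (ID 5), pending sectors (197), and offline-uncorrectable sectors (198). It assumes the ATA attribute table printed by smartmontools’ `smartctl -A /dev/sdX`, with RAW_VALUE as the tenth column. NVMe drives report health differently. Treat non-zero counts as a warning sign, not a definitive verdict.

```python
# Assumed watch-list: a common heuristic, not an official failure criterion.
WATCH = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector", 198: "Offline_Uncorrectable"}


def worrying_attributes(smartctl_output: str) -> dict[str, int]:
    """Return watched attributes whose raw value is non-zero."""
    found: dict[str, int] = {}
    for line in smartctl_output.splitlines():
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        fields = line.split(maxsplit=9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header or non-table line
        if int(fields[0]) not in WATCH:
            continue
        try:
            raw = int(fields[9].split()[0])  # raw value may carry trailing notes
        except ValueError:
            continue
        if raw > 0:
            found[fields[1]] = raw
    return found
```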

Thank you very much for all of the extra information!
