Hello fellow Proxmox enjoyers!
I have questions regarding the ZFS disk IO stats and hope you all may be able to help me understand.
Setup (hardware, software)
I have Proxmox VE installed on a ZFS mirror (2x 500 GB M.2 PCIe SSD), rpool. The data (VMs, disks) resides on a separate ZFS RAID-Z1 (3x 4 TB SATA SSD), data_raid.
I use ~2 TB of all that, 1.6 TB being data (movies, videos, music, old data + game setup files, …).
I have 6 VMs, all for my use alone, so there’s not much going on there.
Question 1 - constant disk writes going on?
I have a monitoring setup (CheckMK) to monitor my server and VMs. This monitoring reports constant write IO on the disks, ongoing without any interruption, at 20+ MB/s.

I think the monitoring gets the data from zpool iostat, so I watched it with watch -n 1 'sudo zpool iostat', but the numbers didn’t seem to change.
It has been showing the exact same read/write operations and bandwidth for the last minute or so (after I took a while writing this post, it now lists 543 read ops instead of 545).
Every 1.0s: sudo zpool iostat

              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data_raid   2.29T  8.61T    545    350  17.2M  21.5M
rpool       4.16G   456G      0     54  8.69K  2.21M
----------  -----  -----  -----  -----  -----  -----
The same happens if I use -lv or -w flags for zpool iostat.
So, are there really constantly 350 write operations going on? Or does it just not update the IO stats all too often?
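(Side note while writing this: my assumption is that zpool iostat with no interval only shows averages since the pool was imported, so to get live per-second numbers I’d have to pass an interval instead of wrapping it in watch, e.g.:

sudo zpool iostat -v data_raid 1

If that assumption is wrong, please correct me.)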
Question 2 - what about disk longevity?
This isn’t my first homelab setup, but it is my first ZFS and RAID setup of my own. If somebody has any SSD RAID or SSD ZFS experiences to share, I’d like to hear them.
The disks I’m using are:
- 3x Samsung SSD 870 EVO 4TB for data_raid
- 2x Samsung SSD 980 500GB M.2 for rpool
Best regards from a fellow rabbit-hole-enjoyer.
I’ll concur with mlfh: the constant Proxmox corosync writes and gawd knows what else have a reputation for ‘cutting through commercial SSDs like a torch through tissue paper’ (a line that’s frequently dropped on their forum).
Also, yes: enterprise SSDs. You get at least 10x the lifespan, depending on the type. I think some folks just use LVM for the OS on SSD. I’ve done it myself in some circumstances, although I am a ZFS fan.
My homelab runs a ZFS mirror as a secondary datastore (i.e. NOT the OS drive) on a pair of commercial-grade Lexar 790 NVMe drives. Both drives show 0% wear after most of a year in service, although the pool hosts several VMs that run 24/7.
Yeah, I guess I should’ve put like +50% more money into it and gotten some Enterprise SSDs instead. Well, what’s done is done now.
I’ll try replacing the disks with enterprise SSDs when they die, which will probably happen fast, seeing as the wearout is already at 1% after 1 month of low usage.
What do you think about the Samsung OEM Datacenter SSD PM893 3.84 TB?
Thanks for taking the time to answer!
It looks like that part is a Mixed Use drive. Particularly on this 6 Gb/s SATA interface, you’ll enjoy something with equal read/write performance, so that seems like a reasonable choice. If you are interested in comparing it to their other drives, they have a great configurator on their page.
https://semiconductor.samsung.com/ssd/datacenter-ssd/pm893/
I know it’s irritating to watch your SSDs burn up, but with 1% used in a month … your current drives will last at least a couple of years. You won’t have to make this decision for a while yet. I think the thing to do is check it occasionally, and plan ahead when it gets low. You may well decide that the cheaper drives are worth it in the end.
Thank you very much for your input, I’ll definitely have to go with business drives whenever the current ones die.
Thankfully, I do have monitoring for SMART data and drive health, so I’ll be warned before something bad happens.
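(For anyone without a monitoring stack: the same wear counters can be read directly with smartctl from smartmontools; the exact attribute names depend on the drive, so the greps below are just what I’d expect for my Samsung models:

sudo smartctl -A /dev/sda | grep -i wear                    # SATA 870 EVO: Wear_Leveling_Count
sudo smartctl -a /dev/nvme0 | grep -i 'percentage used'     # NVMe 980: Percentage Used

)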
I delved into exactly this when I was running proxmox on consumer ssds, since they were wearing out so fast.
Proxmox does a ton of logging, and a ton of small updates to places like /etc/pve and /var/lib/pve-cluster as part of cluster communications, and also to /var/lib/rrdcached for the web ui metrics dashboard, etc. All of these small writes go through huge amounts of write amplification via zfs, so a small write to the filesystem ends up being quite a large write to the backing disk itself.
I found that vms running on the same zfs pool didn’t have quite the degree of write amplification when their writes were cached - they would accumulate their small writes into one large one at intervals, and amplification on the larger dump would be smaller.
For a while I worked on identifying everywhere these small writes were happening and backing those directories with HDDs instead of SSDs, as well as moving /var/log inside each VM onto its own virtual disk and putting that disk on the same HDD-backed zpool, and my disk wearout issues mostly stopped.
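(If anyone wants to hunt down the writers on their own box, one way, not necessarily how I did it back then, is to watch cumulative per-process writes for a few minutes with iotop:

sudo iotop -ao    # -a accumulates totals, -o only shows processes actually doing I/O

On my understanding, pmxcfs, the process behind /etc/pve and /var/lib/pve-cluster, and rrdcached tend to be the usual suspects on a stock install.)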
Eventually, though, I found some super cheap retired enterprise ssds on ebay, and moved everything back to the much simpler stock configuration. Back to high sustained ssd writes, but I’m 3 years in and still at only around 2% wearout. They should last until the heat death of the universe.
Is “Discard” the write caching you refer to?
Or are you talking about the actual Write Cache?
The actual write cache there - writeback accumulates writes before flushing them in a larger chunk. It doesn’t make a huge difference, nor did tweaking zfs cache settings when I tried it a few years ago, but it can help if the guest is doing a constant stream of very small writes.
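(For reference, the cache mode is a per-disk option. In the web UI it lives under Hardware -> Hard Disk -> Cache; on the CLI it would be set with something like the line below, where the VM ID, storage name and disk slot are placeholders for whatever your setup uses:

qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=writeback

)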
So I just looked it up: according to the Proxmox VE "Disks" interface, my SATA SSD drives have 1% wearout after ~1 month of low usage. That seems pretty horrible.
I guess I’m going to wait until they die and buy enterprise SSDs as a replacement.
I’m definitely not going to use HDDs, as the server is in my living room and I’m not going to tolerate constant HDD sounds.
[EDIT] I don’t even have a cluster, it’s just a single Proxmox VE on a single server using ZFS and it’s still writing itself to death.
[EDIT2] What do you think about the Samsung OEM Datacenter SSD PM893 3.84 TB?
Thanks for your input!
The datasheet for the Samsung PM893 3.84 TB drive says it’s warrantied for 7 PBW and 2 million hours MTBF (i.e. rated to write 7 PB over its life, with a mean time between failures of 2 million hours). Quite pricey, but it looks like it’ll run forever in a home environment.
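(Back-of-envelope, taking the ~20 MB/s constant write rate from your first post at face value and ignoring write amplification: 20 MB/s is about 1.7 TB per day, and 7 PBW / 1.7 TB per day comes to roughly 4,000 days, so on the order of 11 years of rated endurance.)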
Good luck!
Thank you very much for your input. I’ll definitely have to go for the business models whenever the current ones die.
I knew I would make some mistake and learn something new, this being my first real server PC (instead of a mini PC or Raspberry Pi) and my first RAID. I just wish it weren’t such a pricey mistake :(
I wouldn’t say it’s a big mistake, you’ve likely still got a few years left on your current drives as-is. And you can replace them with same- or larger-capacity drives one at a time to spread the cost out.
Keep an eye out for retired enterprise ssds on ebay or the like - I got lucky and found mine there for $20 each, with 5 years of uptime but basically nothing written to them so no wearout at all - probably just sat in a server with static data for a full refresh cycle. They’ve been great.
Sadly, it seems I cannot replace the disks one by one. At least not unless I upgrade to SSDs larger than 4 TB at the same time.
The consumer 4 TB SSDs yield 3.64 TiB, whereas the 3.84 TB datacenter SSDs seem to yield only 3.49 TiB. As far as I know, one cannot replace a ZFS RAID-Z1 drive with a smaller one. I’ll have to watch the current consumer SSDs closely and be prepared for when I have to switch them.
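(For what it’s worth, my understanding is that the eventual per-disk swap is a plain zpool replace, and ZFS refuses it up front if the new device is smaller than the old one, so a too-small drive would show up as an immediate error rather than a subtle problem. The device paths below are placeholders:

sudo zpool replace data_raid /dev/disk/by-id/OLD-DISK /dev/disk/by-id/NEW-DISK

)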
I’m not all too sure about buying used IT / stuff in general from ebay, but I’ll have a look, thanks!
If you want enterprise gear on the cheap, yes. Ebay.
There are regular vendors on Ebay with thousands of verified sales. Go with those till you figure it all out.
You can definitely make bad choices, but even when I’ve gotten bad drives, the vendor just immediately refunded the money, like that day.



