Hey admin colleagues
1) Do you run Ceph?
2) If so, do you have an SSD-backed pool?
3) Do you use this pool for RBD images?
If you answered yes to all three, do you run fstrim inside these RBD devices? (If yes, please leave a comment so we can maybe reach out to you.) If you deliberately don't, please leave a comment as well.
@ops We are using trim in VMs running on Ceph RBDs in SSD pools, mostly because it seemed the right thing to do if Ceph is to be able to reclaim pool space from partially filled RBDs. I have not paid much attention to whether this decreases performance; is that something you experienced?
@INCO While fstrim is running, the fs is nearly unusably slow. For example, it takes 2 minutes to read the 40 KB that the textfile component of node_exporter needs to read.
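A possible mitigation, as a rough sketch only: instead of one fstrim pass over the whole filesystem, trim it in smaller windows with a pause in between, using fstrim's `--offset` and `--length` options. Whether this actually eases the slowdown on RBD is an assumption and untested here. The helper names, the 8 GiB chunk size, and the 30 second pause are hypothetical and only for illustration.

```python
import subprocess
import time

GiB = 1024 ** 3


def fstrim_chunk_commands(mountpoint, fs_size_bytes, chunk_bytes=8 * GiB):
    """Build one fstrim command per chunk_bytes-sized window of the filesystem."""
    commands = []
    offset = 0
    while offset < fs_size_bytes:
        # The last window may be shorter than chunk_bytes.
        length = min(chunk_bytes, fs_size_bytes - offset)
        commands.append(
            ["fstrim", "--offset", str(offset), "--length", str(length), mountpoint]
        )
        offset += length
    return commands


def trim_slowly(mountpoint, fs_size_bytes, pause_seconds=30, dry_run=True):
    """Run (or, with dry_run, just print) the chunked fstrim commands."""
    for cmd in fstrim_chunk_commands(mountpoint, fs_size_bytes):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # needs root inside the VM
            time.sleep(pause_seconds)        # give other IO some room


if __name__ == "__main__":
    # Dry run for a hypothetical 20 GiB root filesystem.
    trim_slowly("/", 20 * GiB)
```

With `dry_run=True` nothing is trimmed, so the generated commands can be checked before running them for real.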
@ops Our cluster is pretty small: 30% of 24 TB (raw size) is in use, mostly for lightly used VM OS disks.
Thank you for the information. I will keep that in mind should we also experience slowdown issues.
So far our biggest issue has been SSDs failing ungracefully: it apparently takes some time for the OSD to be marked as failed and for IO to be redirected to a redundant copy on another OSD, and until then we saw some VMs freeze in disk IO.
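For a rough feel of how long such a freeze could last, here is a back-of-the-envelope sketch based on two Ceph options, `osd_heartbeat_grace` and `mon_osd_down_out_interval`. The values used (20 s and 600 s) are what I believe the stock defaults to be, so treat them as assumptions and check your own cluster with `ceph config get osd osd_heartbeat_grace` and `ceph config get mon mon_osd_down_out_interval`. The function name is made up, and the estimate ignores monitor processing time and reporter thresholds. A disk that hangs rather than dies outright may also behave differently.

```python
def estimate_failure_windows(heartbeat_grace=20.0, down_out_interval=600.0):
    """Very rough timeline after an OSD stops answering heartbeats.

    heartbeat_grace:    seconds without heartbeat replies before peers
                        report the OSD as down (assumed default: 20)
    down_out_interval:  seconds a down OSD waits before being marked out,
                        which starts re-replication (assumed default: 600)
    """
    marked_down_after = heartbeat_grace
    marked_out_after = heartbeat_grace + down_out_interval
    return {
        "marked_down_after_s": marked_down_after,
        "marked_out_after_s": marked_out_after,
    }


if __name__ == "__main__":
    for name, seconds in estimate_failure_windows().items():
        print(f"{name}: {seconds:.0f}")
```

Under these assumptions, client IO to the affected placement groups could stall for roughly the grace period before it is redirected, which would be in line with VMs freezing for a while.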