The maintenance went fine. We experienced one initial slowdown of about 2 minutes and two more slowdowns of about 30 seconds each.
The cluster is still recovering with our new throttling parameters applied. We expect recovery to take another 2-3 hours and to need no intervention from us.
Posted 4 months ago. Sep 19, 2017 - 22:57 CEST
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted 4 months ago. Sep 19, 2017 - 22:00 CEST
Our scheduled maintenance was intended for tomorrow night, and we have corrected the schedule accordingly.
Unfortunately, the Status Page maintenance calendar starts its week on Sunday, so we accidentally picked today instead of tomorrow. We're sorry for the confusion.
Posted 4 months ago. Sep 18, 2017 - 21:04 CEST
We need to reboot our storage servers to adjust BIOS settings for improved stability and to perform preventative filesystem checks. To minimize impact, we will take down one storage server at a time and let it recover.
We have discussed the performance impact of the recovery traffic with Ceph developers and have determined new settings that look promising for dramatically reducing slow requests and hanging IO during recovery. These settings have proven stable in our lab setup, and we will use them on the cluster during this maintenance. We cannot promise that they are perfect yet and therefore expect multiple windows of 1-2 minutes of increased IO latency.