Storage maintenance (4/10)

Scheduled Maintenance Report for Flying Circus

Completed

The maintenance went fine. We experienced one initial slowdown of about 2 minutes and two more slowdowns of about 30 seconds each.

The cluster is still recovering with our new throttling parameters applied. We expect recovery to take another 2-3 hours and do not expect it to need our attention.

Posted Sep 19, 2017 - 22:57 CEST

In progress

Scheduled maintenance is currently in progress. We will provide updates as necessary.

Posted Sep 19, 2017 - 22:00 CEST

Update

Our scheduled maintenance was intended to be performed tomorrow night, and we have adjusted the schedule accordingly.

Unfortunately, the Status Page maintenance calendar starts its week on Sunday, so we accidentally picked today instead of tomorrow. We're sorry for the confusion.

Posted Sep 18, 2017 - 21:04 CEST

Scheduled

We need to reboot our storage servers to adjust BIOS settings for improved stability and perform preventative filesystem checks. We will take down one storage server and let it recover to minimize impact.

We have discussed the performance impact of the recovery traffic with Ceph developers and have determined new settings that look promising for dramatically reducing slow requests and hanging IO during recovery. Our lab setup has shown these settings to be stable, and we will use them on the cluster during this maintenance. We cannot promise that they are perfect yet, so we expect multiple windows of 1-2 minutes of increased IO latency.

Posted Sep 18, 2017 - 14:07 CEST