Our statistics show that performance has been back to normal since around 05:00 CEST.
A small amount of recovery traffic is still ongoing, without impact on production performance.
Oct 14, 08:07 CEST
We have restarted all Ceph daemons to counter potential issues resulting from applying the network changes "on the fly". We currently see a lot of recovery traffic in the cluster, which is causing slow requests, but the situation is improving. We'll keep an eye on it for a little longer.
Oct 14, 02:31 CEST
We're still seeing stuck requests and are analyzing the situation.
Oct 14, 00:48 CEST
We've finished configuring the jumbo frames and are currently cleaning up expected stuck requests in the Ceph cluster. We'll update here once done.
Oct 14, 00:31 CEST
Maintenance is progressing, albeit a bit more slowly than anticipated. We're entering the last phase, in which we'll activate jumbo frames within the next few minutes, and expect to be finished by 00:30 CEST.
Oct 13, 23:51 CEST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Oct 13, 22:00 CEST
We are going to further improve our network configuration for enhanced reliability and performance. This will affect all networks and, in particular, will increase the responsiveness of our storage cluster during periods of recovery.
The network changes include some settings (like "Ethernet Flow Control" and "Jumbo Frames") that will cause intermittent connectivity issues on all VLANs. We expect multiple short interruptions of around 10 seconds each as individual servers apply their settings, and one longer 15-minute interruption when switching the storage network to jumbo frames.
Oct 1, 10:13 CEST