Our fix has been rolled out over the last few hours and has been stable.
We have removed the workarounds put in place early this morning, and our telemetry shows no performance impact since their removal, which confirms the fix.
Please note: one of the workarounds was to disable processing of configuration management events. A number of events (such as changes to VM resources, maintenance mails, and some internal DNS records) backed up over the course of the day and have been processed in the last 15 minutes.
Posted Apr 22, 2025 - 17:44 CEST
Update
We have implemented a long-term fix for last night's issue and are preparing to roll it out in our production cluster over the next few hours.
This will happen in a staggered fashion, host by host, and will be transparent to our customers. We will disable the short-term workarounds that we implemented earlier today once the rollout shows that our long-term fix is holding up.
Posted Apr 22, 2025 - 12:15 CEST
Update
We are implementing and testing a structural fix for the storage management software component.
Posted Apr 22, 2025 - 06:15 CEST
Update
We are continuing to work on a fix for this issue.
Posted Apr 22, 2025 - 06:14 CEST
Update
All VMs are back online. We are verifying the individual services now.
Posted Apr 22, 2025 - 06:13 CEST
Update
There are still some VMs offline. We are working on it.
Posted Apr 22, 2025 - 05:53 CEST
Update
The storage server traffic is back to normal. We are now looking at the affected VMs and services.
Posted Apr 22, 2025 - 05:04 CEST
Update
We are still in the process of preventing the expensive metadata calls.
Posted Apr 22, 2025 - 04:53 CEST
Identified
An expensive metadata call seems to be the issue. We are about to prevent the specific call.
Posted Apr 22, 2025 - 04:28 CEST
Update
We are continuing to investigate this issue.
Posted Apr 22, 2025 - 04:10 CEST
Investigating
A single OSD ("Disk") shows unusual data transfer behaviour, causing VMs that use the OSD to slow down significantly.
Posted Apr 22, 2025 - 03:59 CEST
This incident affected: RZOB (production) (VM storage cluster).