Storage cluster issues in RZOB

Incident Report for Flying Circus

Resolved

The issue we experienced yesterday resurfaced today. We recovered the cluster within 14 minutes and things were operational again.

We've traced the trigger for the issue to a potential race condition in the boot process of our Ceph servers. Until we have solved the issue, we have disabled further automated maintenance that can trigger reboots.
Posted Aug 06, 2025 - 21:13 CEST