Storage performance degradation

Incident Report for Flying Circus

Resolved

We have not seen any further performance impact since removing the two suspicious daemons, so we are closing this incident. We will replace the defective disks as part of our regular cluster operations.

Posted May 14, 2019 - 19:19 CEST

Monitoring

We identified two storage daemons that appeared to be functioning but were not responding to requests as expected. Unfortunately, the daemons did not log any errors and otherwise behaved normally, as did the underlying disks. We have removed the affected daemons and disks from the cluster, and performance has returned to normal. We are still monitoring the situation for further anomalies.

Posted May 14, 2019 - 16:36 CEST

Investigating

We are currently experiencing a performance degradation in our storage cluster. Customer services are affected intermittently, with inconsistent performance and repeated timeouts.

Posted May 14, 2019 - 16:00 CEST

This incident affected: RZOB (production) (VM storage cluster).