Virtualisation cluster outage
Incident Report for Flying Circus
Resolved
The situation has been stable since we re-balanced the VMs.

We will investigate the root cause of the incident to prevent such failures in the future.
Posted Nov 18, 2019 - 13:06 CET
Monitoring
We have manually improved the VM balancing, and applications are available again.
Posted Nov 18, 2019 - 10:38 CET
Identified
The VM balancing algorithm allocated a *lot* of VMs to a single host, which became overloaded. The host is now being evacuated, and overall availability is improving.
Posted Nov 18, 2019 - 10:26 CET
Investigating
A lot of VMs are currently very slow, causing application outages.
Posted Nov 18, 2019 - 10:02 CET
This incident affected: RZOB (production) (VM servers).