Upgrading our monitoring infrastructure: false positives
Incident Report for Flying Circus
Resolved
Sorry for the terse initial message. We have resolved the situation.

During the planned software upgrade of our monitoring server (Sensu), we temporarily disabled the server configuration, but for longer than we expected. As a side effect, all monitoring clients lost their configuration and were unable to communicate with the monitoring server. This in turn caused timeouts of the server's client connectivity check (the "keepalive" check, which detects general availability and communication issues).
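
To illustrate why a server-side keepalive check produces a burst of alerts when every client goes silent at once, here is a minimal sketch in Python. It is not Sensu's actual implementation: the `Client` type, the function names, and the threshold values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Illustrative thresholds in seconds (assumed values, not taken from this incident).
WARNING_AFTER = 120
CRITICAL_AFTER = 180


@dataclass
class Client:
    name: str
    last_keepalive: float  # Unix timestamp of the last keepalive the server received


def keepalive_status(client: Client, now: float) -> str:
    """Classify a client by the age of its last keepalive."""
    age = now - client.last_keepalive
    if age >= CRITICAL_AFTER:
        return "critical"
    if age >= WARNING_AFTER:
        return "warning"
    return "ok"


def stale_clients(clients, now):
    """Return the names of all clients whose keepalive has timed out."""
    return [c.name for c in clients if keepalive_status(c, now) != "ok"]
```

In this sketch, a client that cannot reach the server stops refreshing `last_keepalive`, so once the threshold passes it is reported as stale even though the services it monitors may be healthy. If all clients lose connectivity together, all of them cross the threshold at roughly the same time.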

This caused several hundred false alerts and issues between 18:57 and 19:30.

According to our external monitoring, no customer services were actually affected.
Posted Jan 11, 2021 - 19:40 CET
Investigating
We are upgrading our monitoring infrastructure, which is currently causing unexpected false positives.
Posted Jan 11, 2021 - 19:32 CET
This incident affected: Central services.