RZOB was not available
Incident Report for Flying Circus
Resolved
We have analyzed the situation and found that the internal routing daemon on our active router crashed due to an overflow of its neighbour table. Automatic crash recovery did not succeed because the overflowing tables were still causing issues.

Because of the crash, the router lost all internal routes, so traffic was blackholed. During initial diagnosis we restarted the affected routing daemon. As the neighbour tables had been garbage collected by then, the daemon started properly and the router resumed routing traffic.

This may have been a neighbour discovery attack, but we cannot be sure with the data that is available.

We have identified a number of possible improvements: the routing daemon should not crash in these situations, we will increase our neighbour table sizes, we will ensure that router failover works in these situations, and we will introduce additional monitoring to alert us if a similar situation arises in the future.
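
To illustrate the kind of monitoring mentioned above, here is a minimal sketch of a neighbour table usage check. It is not our actual tooling. It assumes a Linux router, where `ip neigh show` lists one entry per line and `gc_thresh3` is the kernel's hard limit for the table; the function names and the 80% warning ratio are hypothetical and chosen for illustration only.

```python
import subprocess
from pathlib import Path

# Assumption: Linux router. These sysctl files hold the hard limit
# (gc_thresh3) of the neighbour table for each address family.
GC_THRESH3 = {
    "inet": Path("/proc/sys/net/ipv4/neigh/default/gc_thresh3"),
    "inet6": Path("/proc/sys/net/ipv6/neigh/default/gc_thresh3"),
}


def count_entries(ip_neigh_output: str) -> int:
    """Count neighbour entries in `ip neigh show` output (one per non-empty line)."""
    return sum(1 for line in ip_neigh_output.splitlines() if line.strip())


def should_alert(entries: int, hard_limit: int, warn_ratio: float = 0.8) -> bool:
    """Return True once the table is filled to warn_ratio of its hard limit."""
    if hard_limit <= 0:
        raise ValueError("hard_limit must be positive")
    return entries / hard_limit >= warn_ratio


def check_family(family: str) -> bool:
    """Read the live table and limit for 'inet' or 'inet6' and evaluate the alert."""
    flag = "-4" if family == "inet" else "-6"
    output = subprocess.run(
        ["ip", flag, "neigh", "show"],
        capture_output=True, text=True, check=True,
    ).stdout
    limit = int(GC_THRESH3[family].read_text())
    return should_alert(count_entries(output), limit)
```

A check like this could run periodically for both address families, so that a rapidly filling table raises a warning before the hard limit is reached.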

The outage lasted a bit less than 8 minutes in total, between 07:58 and 08:06.
Posted Aug 28, 2024 - 10:21 CEST
Update
We will provide more details on the incident later.
Posted Aug 28, 2024 - 08:12 CEST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Aug 28, 2024 - 08:07 CEST
Investigating
RZOB is not available. We are investigating.
Posted Aug 28, 2024 - 08:03 CEST
This incident affected: RZOB (production) (Network and Internet uplink).