Resolved
We have resolved the issue, which manifested sporadically in two ways:
1. DNS queries that required our border routers to retrieve data from the internet timed out.
2. Traffic from VMs to the internet that required NAT timed out.
The issue was caused by a new transfer network that was introduced during the maintenance earlier today. It carried traffic towards our data centre as expected, but traffic destined for addresses on the transfer network itself was lost.
Our routers dynamically picked transfer-net addresses as source addresses when sending traffic to the outside world (i.e. for DNS and NAT as described above). To resolve the issue, we have changed our router configurations to use source addresses from our own networks, which are properly routed.
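For illustration only, here is a minimal sketch of how a preferred source address for router-originated traffic can be pinned on a Linux-based router using iproute2. The platform, the use of Python, and all addresses (RFC 5737 documentation ranges) are assumptions for the example; this is not the exact change applied to our routers.

```python
#!/usr/bin/env python3
"""Sketch: pin the preferred source address for locally originated traffic
on a Linux-based router via iproute2. All values are hypothetical."""
import subprocess

GATEWAY = "203.0.113.1"    # hypothetical next hop on the new transfer network
PINNED_SRC = "192.0.2.10"  # hypothetical address from a properly routed production prefix


def pin_default_route_source() -> None:
    # The "src" attribute sets the preferred source address the kernel uses for
    # traffic the router itself originates (e.g. its own DNS queries), instead
    # of falling back to the interface address on the transfer network.
    subprocess.run(
        ["ip", "route", "replace", "default", "via", GATEWAY, "src", PINNED_SRC],
        check=True,
    )


if __name__ == "__main__":
    pin_default_route_source()
```

The same principle applies to the NAT path: the source address used for translated VM traffic would likewise be taken from a properly routed prefix rather than from the transfer network.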
We have tested the new configuration across our redundant router setup using fail-overs and have seen the issues on the affected machines subside.
We're sorry for the outage.
Posted Jul 09, 2025 - 00:07 CEST
Update
We have achieved a semi-stable routing state again, in which connectivity has largely been restored. However, DNS resolution within RZOB is currently not stable.
Posted Jul 08, 2025 - 23:16 CEST
Investigating
Unfortunately, our triaging attempt caused additional connectivity issues, now also affecting IPv6. These issues affect only some machines, but the underlying cause has not been identified yet.
Posted Jul 08, 2025 - 23:10 CEST
Identified
Due to the changes made during the announced RZOB network uplink maintenance, the RZOB data centre lost some of its IPv4 connectivity.
We are currently triaging this by reverting the routing changes to their pre-maintenance state in order to restore IPv4 connectivity.
Note: The revert itself will also briefly interrupt the data centre connectivity again.
Posted Jul 08, 2025 - 23:05 CEST
This incident affected: RZOB (production) (Network and Internet uplink).