We have finished our data center migration. Despite thorough preparation, we hit a couple of issues (which we'll likely cover in a later blog post) that caused intermittent downtime. Public-facing downtime totaled less than 20 minutes. Some backend services had trouble recovering autonomously, with status page downtimes of usually less than 45 minutes (two extreme cases reached 3 hours).
Almost all of our planned tasks have been completed, but we'll need to perform two additional network maintenance windows in the coming weeks, as some of the network components were not delivered in time by our vendor. Nevertheless, our network has already been upgraded with a 40G backbone infrastructure, and some configuration adjustments yielded a pleasant performance boost in our storage cluster.
Thanks for your patience and your business - and have a good night!
Posted Jun 01, 2018 - 00:42 CEST
We have removed the faulty switch from the cluster and relocated all affected servers to a new switch.
Posted May 31, 2018 - 10:02 CEST
One of our switches behaves erratically when cables are plugged in. We suspect an electrical grounding problem or a faulty capacitor. We are removing the switch from our cluster and moving the affected servers over to another switch, which will take around 10-15 minutes.
Posted May 31, 2018 - 09:46 CEST
Network issues resolved; storage is working again.
Posted May 29, 2018 - 12:05 CEST
We are currently seeing random failures in the storage network that lead to flaky I/O. We are working on the problem.
Posted May 29, 2018 - 11:50 CEST
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted May 29, 2018 - 09:00 CEST
We are consolidating our hardware and racks in the data center and will perform an extended maintenance over multiple days. We have prepared thoroughly for the migration to avoid any downtime and will use this opportunity to further improve our network.