Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Feb 29, 2016 - 21:00 CET
We are implementing a countermeasure against one of the effects we saw during the recent outage: during periods of high activity, applications became starved for IO and thus unresponsive. Our evaluation shows that adding a second, generously sized SSD-based read/write caching layer to all storage servers improves this. In our preparatory tests, it lowered overall latency and made behaviour under high load more predictable.
Implementing this requires both a reboot of each storage server (to upgrade the RAID controller firmware) and a partial redistribution of data between the existing disks. To reduce the risk and impact of interruptions, we will upgrade one server per day, after 21:00 CET, from Monday 2016-02-29 through Sunday 2016-03-06.
We have configured our environment to reduce the impact of these operations, but applications may experience temporarily reduced disk throughput and increased IO latency, resulting in higher response times.