Backups partially delayed due to high ratio of full backup and verification jobs
Incident Report for Flying Circus
Resolved
Over the last few days, backups have been running almost entirely on schedule. A few very large VMs (multiple TiB) and hourly backups are still lagging slightly but are slowly catching up. The consistency issue is fixed and verification is working properly.
Posted May 21, 2018 - 19:33 CEST
Update
Full backups and extensive verification are still ongoing. We are seeing properly verified full backups and successful verification of most existing backups, and, as we suspected, verification is also catching a few inconsistencies. Backups are still substantially delayed due to the enormous load on the task queue. We will keep you updated over the next 1-2 days.
Posted May 17, 2018 - 17:02 CEST
Monitoring
We noticed a number of VMs showing issues with their backup consistency. We found and fixed a bug in the backup software and have tasked our backup system with verifying questionable backups from the past few days. Additionally, we are now running precautionary full backups; the increased load is delaying the regular backup schedule.

We will keep monitoring the situation to ensure that backups are created properly and consistently.
Posted May 16, 2018 - 08:43 CEST
This incident affected: RZOB (production) (VM backup).