Object storage (S3) low-level migration (Part 2, includes DOWNTIME)

Scheduled Maintenance Report for Flying Circus

Completed

The S3 object gateways have been restarted.

Further analysis revealed that the issue was caused by a logic error in our migration script. The script was thoroughly tested over the past weeks, but testing did not catch a specific ordering issue that prevented the final round of flushes in our production environment.

Because we temporarily re-enabled the S3 gateways to limit downtime, your application may have seen incomplete data in the buckets during that period (most likely as errors for objects that could not be found). We have consolidated any data that was written during this period.

Posted May 22, 2023 - 22:51 CEST

Update

The copy is still in progress, as we had to work around a few inconsistencies that stopped the copy process. It is currently running smoothly and should finish within the next few minutes. Once it completes, we will double-check the results and re-enable the S3 gateways.

Posted May 22, 2023 - 22:34 CEST

Update

We have analyzed the situation and decided to proceed with copying the remaining objects manually, as we did not detect any data mismatches. We expect this to be done within the next 15 minutes and will post a further update after that.

Posted May 22, 2023 - 22:18 CEST

Update

Over the last week, almost all of the roughly 230 million objects in our pool have been migrated. Unfortunately, migrating the last 11k objects is currently producing unexpected errors, and diagnosis is taking longer than expected. We need to keep the S3 gateways offline during the diagnosis and will post an update here within 30 minutes at the latest, hopefully with our new course of action.

Posted May 22, 2023 - 21:47 CEST

In progress

Scheduled maintenance is currently in progress. We will provide updates as necessary.

Posted May 22, 2023 - 21:00 CEST

Scheduled

This is the second part of the previously announced storage migration for increased data durability.

To finish the migration, we need to take the object gateways offline. Ideally this will take only a few minutes, but copying the remaining data may take longer, so we are scheduling a 1-hour maintenance window.

Posted May 11, 2023 - 10:31 CEST

This scheduled maintenance affected: RZOB (production) (VM storage cluster).