Due to a change made to our authentication process, some customers experienced issues logging in to and using the SafetyCulture platform.
Impacted users would have:
Timestamps below are in Coordinated Universal Time (UTC)
2023-09-11 23:20 - We made a change to our authentication process in the European data centre and tested the change successfully.
2023-09-11 23:43 - We made the same change in the Australian data centre, but tests failed. Internal teams were notified immediately and began our Incident Management process.
2023-09-11 23:53 - We began reverting the change, which involved cycling some instances in our edge services, and we saw gradual improvements over the next 15 minutes.
2023-09-12 00:07 - We completed the instance cycling and platform functionalities were fully restored.
The issue was resolved by reverting the change we made in our Australian infrastructure.
In retrospect, we identified the cause as incorrect regional configuration, which was why the issue didn’t occur during the change in our European infrastructure.
To prevent issues like this from happening again, we’ll be undertaking a post-incident review to:
We’re currently in the planning process to migrate away from this edge architecture which will allow even faster recovery.
We want to apologise to all customers who were impacted by this incident.
We will be focused on reviewing our processes and practices in light of this incident to ensure we continue to provide a reliable experience.