SafetyCulture Platform outage in the Asia Pacific region
Incident Report for SafetyCulture
Postmortem

Summary

Due to a change made to our authentication process, some customers experienced issues logging in to and using the SafetyCulture platform.

Impacted users either:

  • had their accounts located in Australia; or
  • had traffic that transited through our Australian infrastructure

Timeline

Timestamps below are in Coordinated Universal Time (UTC)

2023-09-11 23:20 - We made a change to our authentication process in the European data centre and tested the change successfully.

2023-09-11 23:43 - We made the same change in the Australian data centre, but tests failed. Internal teams were notified immediately and began our Incident Management process.

2023-09-11 23:53 - We began reverting the change, which involved cycling some instances in our edge services, and we saw gradual improvements over the next 15 minutes.

2023-09-12 00:07 - We completed the instance cycling and platform functionality was fully restored.

Resolution

The issue was resolved by reverting the change we made in our Australian infrastructure.

In retrospect, we identified the cause as an incorrect regional configuration, which is why the issue didn’t occur during the change in our European infrastructure.

What are we changing going forward?

To prevent issues like this from happening again, we’ll be undertaking a post-incident review to:

  • understand how the incorrect configuration occurred
  • improve the configuration checks prior to releasing a change
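
As an illustration only, a pre-release configuration check of the kind described above could look something like the sketch below. This is not our actual tooling: the region names, the setting names in REQUIRED_KEYS, and both function names are hypothetical, chosen purely to show the idea of validating every region's configuration before a change is released to it.

```python
from typing import Mapping

# Hypothetical required settings; the real configuration keys are not
# listed in this report and are assumed here for illustration.
REQUIRED_KEYS = {"region", "auth_endpoint", "token_issuer"}


def check_regional_config(region: str, config: Mapping[str, str]) -> list[str]:
    """Return a list of problems found in one region's configuration."""
    problems = []

    # Report every required setting that is absent for this region.
    for key in sorted(REQUIRED_KEYS - set(config)):
        problems.append(f"{region}: missing '{key}'")

    # Catch a configuration that was copied from another region
    # without its region field being updated.
    declared = config.get("region")
    if declared is not None and declared != region:
        problems.append(f"{region}: config declares region '{declared}'")

    return problems


def check_all_regions(configs: Mapping[str, Mapping[str, str]]) -> list[str]:
    """Check every region and collect all problems in a stable order."""
    problems = []
    for region, config in sorted(configs.items()):
        problems.extend(check_regional_config(region, config))
    return problems
```

In this sketch, a release would only proceed to a region when check_all_regions returns an empty list, so a regional mismatch is caught before the change reaches production rather than by post-change tests.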

We’re currently planning a migration away from this edge architecture, which will allow even faster recovery.

We want to apologise to all customers who were impacted by this incident.

We will be focused on reviewing our processes and practices in light of this incident to ensure we continue to provide a reliable experience.

Posted Sep 12, 2023 - 05:37 UTC

Resolved
This incident is now resolved.
Posted Sep 12, 2023 - 00:18 UTC
Update
A fix has been implemented and we are currently monitoring results. You may need to refresh the page to see the issue resolved.
Posted Sep 12, 2023 - 00:13 UTC
Monitoring
A fix has been implemented and we are currently monitoring results.
Posted Sep 12, 2023 - 00:10 UTC
Investigating
We are aware of an issue where, upon logging in, users see an error message stating "Your account has been deactivated". We have a team currently investigating this issue. Sorry for the inconvenience caused as we work on a fix.
Posted Sep 11, 2023 - 23:57 UTC
This incident affected: SafetyCulture (Inspections, Issues, Sensors, Assets, Analytics, Public Library, Heads Up, Public API, Mobile App, Web Platform, Training, Actions, Integrations).