Some sensors are not operational due to global cellular data network outage
Incident Report for SafetyCulture
Resolved
Our upstream partner has addressed the outage. We will continue to monitor the situation.
Posted Sep 01, 2023 - 07:36 UTC
Update
The majority of sensors are now back online and operating normally. We are continuing to monitor and are working alongside our partners to ensure ongoing stability.
Posted Aug 31, 2023 - 06:52 UTC
Monitoring
Most sensors are now back online and operating normally. We are working to restore connectivity to the remaining sensors that are offline.

We will continue to monitor the situation and work with our upstream partners to ensure services remain stable.
Posted Aug 30, 2023 - 18:04 UTC
Update
Our upstream partner is continuing to implement changes to bring devices back online, and is currently working to mitigate congestion. We are continuing to monitor the situation.
Posted Aug 30, 2023 - 07:53 UTC
Update
Our upstream partner has shifted traffic to a new node and is beginning to see improvements in IoT device connectivity.

We are continuing to monitor the situation. We will continue to post updates here as we learn more.
Posted Aug 30, 2023 - 05:45 UTC
Update
The previously identified fix for the failures in our upstream partner's network interfaces was not ultimately viable and was not implemented.

Our partner believes they have now identified a common problem behind the interface failures and is working on a fix.
Posted Aug 30, 2023 - 02:26 UTC
Update
Our upstream partner has begun implementing a fix for the regression. Once the fix is in place, traffic will be increased slowly to ensure a stable recovery.

The expected recovery time remains estimated at 0700 to 0800 UTC. We will provide further updates if this estimate is revised.
Posted Aug 30, 2023 - 01:24 UTC
Update
The likely root cause of the regression has been identified on a node in our upstream partner's network interface. They engaged with their vendors and a solution has been identified, which will be executed in the next 30 minutes. The updated node will come back online and traffic will be gradually increased to it to ensure a stable recovery.

The expected recovery time depends on the depth of the backlog of connection requests. Traffic will be released slowly to avoid overloading the signaling services, and our partner is estimating recovery by 0700 to 0800 UTC.
Posted Aug 30, 2023 - 01:08 UTC
Update
Our upstream partner has identified the root cause of the regression. One of the provider's network interfaces was unable to support the amount of traffic being released and was compromised.

The solution currently being assessed is to migrate the signaling services to a different node that is operating normally and has the bandwidth to support the incremental traffic.

The latest estimate for full resolution is 0700 to 0800 UTC. This estimate assumes that the applied solutions work as intended.
Posted Aug 29, 2023 - 23:25 UTC
Update
Additional sensors in all regions are now offline. Our upstream partners have confirmed there has been a major regression.

We're monitoring the impact and will provide further updates when available.
Posted Aug 29, 2023 - 21:24 UTC
Update
There are still connectivity issues on our upstream partner's network.

The cause of this incident has now been rectified and stability has been confirmed.

They're now facing a signaling storm due to congestion. They're restricting traffic and slowly increasing throughput to resolve this.

We're still unable to provide an ETA, but we have started seeing improvements.
Posted Aug 29, 2023 - 13:01 UTC
Update
Our upstream partner continues to work on the remaining network instability issues adversely affecting subscriber attachments. There is no ETA yet on resolution. We will continue to post updates here as we learn more.
Posted Aug 29, 2023 - 08:14 UTC
Update
Our upstream partners have stabilized the replacement hardware and continue to work on the remaining network instability issues that are adversely affecting subscriber attachments. There is no ETA yet on resolution. We will continue to post updates here as we learn more.
Posted Aug 29, 2023 - 07:31 UTC
Update
Our upstream partners have replaced faulty hardware, have begun bringing interconnect links back online, and are continuing to work to resolve the issues that remain. Unfortunately, bringing back the interconnect links has not yet had the expected effect on connectivity. Subscriber attachments are still adversely affected. There is no ETA yet on resolution. We will continue to post updates here as we learn more.
Posted Aug 29, 2023 - 06:15 UTC
Update
Our upstream partners have replaced faulty hardware, have begun bringing interconnect links back online, and are continuing to work to resolve the issues that remain. Subscriber attachments are still adversely affected.

We will continue to post updates here as we learn more.
Posted Aug 29, 2023 - 04:32 UTC
Update
Our upstream partners have replaced faulty hardware and are continuing to work to resolve the issues that remain.

We will continue to post updates here as we learn more.
Posted Aug 29, 2023 - 03:33 UTC
Update
Unfortunately, the fix attempted by our upstream partners did not resolve the issue. They are continuing to investigate.

We will post updates here as we learn more.
Posted Aug 29, 2023 - 02:49 UTC
Update
Upstream partners have identified the issue and are implementing a fix.
Posted Aug 29, 2023 - 02:47 UTC
Identified
The issue has been identified. We will post updates here as we learn more.
Posted Aug 29, 2023 - 00:10 UTC
Update
Our third-party data provider has notified us of a data outage. This is currently affecting all customers using a cellular gateway on mobile data.

We are monitoring the situation closely and will provide updates as soon as possible.
Posted Aug 28, 2023 - 20:21 UTC
Investigating
We are currently investigating this issue.
Posted Aug 28, 2023 - 19:56 UTC
This incident affected: SafetyCulture (Sensors).