We are currently experiencing an issue within our application hosting infrastructure that is affecting a proportion of Xero customers. Xero’s operations team is working to identify and resolve the cause of the issue as soon as possible.
We will provide updates, including the expected time for service to be restored, as soon as we have more information.
11:48 NZT: We’ve identified the issue and are currently working to get everyone back up and running. Thank you for your continued patience.
11:59 NZT: Our provider is working to fix their hardware. We’re getting closer to having you back on Xero. We are sorry for the disruption this is causing you.
12:10 NZT: Our provider has identified and is replacing faulty hardware and we’re working on our end to get everyone back online.
12:18 NZT: We are sorry about the frustration this may be causing. We very rarely have extended outages like this, and we thank you for your support while we pull out all the stops to get this fixed.
12:33 NZT: The outage is now affecting all users. We cannot tell you how sorry we are about this interruption. We are working hard alongside our provider to get this resolved.
12:56 NZT: We can confirm that no data has been lost as a result of this outage.
13:11 NZT: We have resolved the issue. We will follow up with a full debrief on the outage. Again, we are very sorry for the disruption this has caused you.
Xero was hit earlier today by an outage that affected many of our customers for approximately two hours. We’re closely monitoring our systems to ensure everything is healthy again.
Our dedicated teams worked on this issue to get you up and running again, and it was resolved as quickly as possible. Know that we’re already working to ensure this doesn’t happen again.
The incident was initially triggered by a hardware outage, which caused instability in our core network.
We don’t have a confirmed root cause yet. The most likely explanation is a software issue in the switch, triggered by the hardware failure, that corrupted both the primary and redundant network infrastructure.
The way redundancy works in these situations is that the redundant switches monitor each other and the network, and decide when to fail over safely. The system was fully operational when it was last tested two weeks ago: both the A and B sides of our network were able to run independently, and the automated failover between them worked as expected. That didn’t happen this time.
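To illustrate the mechanism, here is a minimal, hypothetical sketch of heartbeat-based failover between a redundant pair of switches. The names, thresholds, and structure are illustrative assumptions, not our firmware or our vendor’s; the point is simply why a fault that corrupts both sides at once defeats this kind of redundancy.

```python
# Illustrative sketch only: a toy heartbeat-based failover monitor.
# Names and thresholds here are hypothetical, not vendor firmware.

MISSED_LIMIT = 3  # consecutive missed heartbeats before failing over


class Switch:
    """Toy model of one side (A or B) of a redundant switch pair."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.healthy = True

    def heartbeat(self) -> bool:
        # Real hardware exchanges keepalives over a dedicated link;
        # here, health is just a flag the demo can flip.
        return self.healthy


def run_failover_monitor(active: Switch, standby: Switch, ticks: int) -> None:
    """The standby watches the active side and takes over after
    MISSED_LIMIT consecutive missed heartbeats. A software fault that
    corrupts both sides at once leaves no safe failover target."""
    missed = 0
    for tick in range(ticks):
        if active.heartbeat():
            missed = 0
            continue
        missed += 1
        if missed < MISSED_LIMIT:
            continue
        if standby.heartbeat():
            print(f"tick {tick}: failing over {active.name} -> {standby.name}")
            active, standby = standby, active
            missed = 0
        else:
            print(f"tick {tick}: both sides unhealthy; failover impossible")
            return


if __name__ == "__main__":
    # Expected case: side A fails, side B is healthy, failover succeeds.
    a, b = Switch("A"), Switch("B")
    a.healthy = False
    run_failover_monitor(a, b, ticks=6)

    # Correlated fault (the suspected mode in this incident): both
    # sides corrupted, so the monitor has nowhere safe to fail over to.
    a, b = Switch("A"), Switch("B")
    a.healthy = b.healthy = False
    run_failover_monitor(a, b, ticks=6)
```

The design point worth noting is that pairwise monitoring only protects against independent failures; a shared software defect is a common cause that can take out both sides together.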
We’re working with the equipment manufacturer and our provider to determine why both sides were impacted, but it’s important to note that no data was lost and this is not a security issue.
We are migrating to the public cloud, which will help ensure this doesn’t happen in the future. The way we will operate in the public cloud is quite different from our current infrastructure, and while it reduces the likelihood of similar issues, every environment carries some outage risk.
We know what broke and how to fix it, but we don’t yet know why it broke. Our team is conducting a thorough post-mortem with the hardware manufacturer and our service provider to ensure we understand the underlying root cause and are better prepared for potential network outages in the future.
Our last major outage was more than five years ago, and despite this morning’s outage we have maintained uptime of 99.97% over the last 12 months. We will strive to deliver industry-leading 99.99% uptime in the future.
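For context, the arithmetic behind those figures is straightforward (assuming a 365-day year):

```python
# Convert an annual uptime percentage into a yearly downtime budget.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

for uptime_pct in (99.97, 99.99):
    downtime_hours = HOURS_PER_YEAR * (1 - uptime_pct / 100)
    print(f"{uptime_pct}% uptime allows {downtime_hours:.1f} h of downtime/year")

# 99.97% -> ~2.6 hours/year, so today's roughly two-hour outage consumes
# most of that budget; 99.99% allows under an hour (~53 minutes) per year.
```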
We apologise for letting you down.