
Unexpected outage for some users. (resolved)

Posted 5 years ago in Xero news by Paul Rushworth

We are currently experiencing an issue within our application hosting infrastructure that is affecting a proportion of Xero customers. Xero’s operations team is working to identify and resolve the cause of the issue as soon as possible.

We will provide any updates, including expected time for the service to be restored, as soon as we have more information.

UPDATE: (Wednesday 17th July, 8:51pm GMT) – Operations have identified the cause of the partial outage and are working to resolve it as soon as possible.

UPDATE: (Wednesday 17th July, 8:59pm GMT) – Operations team have resolved the cause of this partial outage. Our operations team are closely monitoring the situation for the potential of any re-occurrence.

UPDATE (Friday 19th July, 2:35am GMT):

The outage was related to a self-contained component of Xero’s database stack. Although this component is critical to the operation of Xero, the impact was limited to 20% of our customers. We have a resilient database environment with hot standby servers and automatic failover. In this particular instance, failover did not occur due to a highly unusual chain of events.

Our investigations have focused on three areas: why the database server became unresponsive, why our resilience measures did not provide seamless failover when one node became unresponsive, and the time to resolution.

  1. App server access to the database server became unresponsive due to a momentary interruption to the TCP stack of the database server, during which app server connections queued up and eventually overwhelmed the database server’s TCP connectivity. There was no impact on the stability of the database engine itself, and no customer data was lost. We have identified the processes at fault and are undertaking process changes to reduce the likelihood of this recurring.
  2. The database itself did not trip an automated failover to the standby database hosts because the availability monitoring determined that the database engine was still operating correctly, albeit inaccessible from our app servers. Xero Operations is continuing to investigate how we can better manage automated failover in a similar error state.
  3. Our monitoring systems failed to detect the failed state of app server connections on the active database node and did not alert the Operations team automatically. We have identified the reason this particular failure state was not detected as expected and are adapting our monitoring systems to promptly alert our Operations team.
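
To illustrate the gap described in points 2 and 3, here is a minimal Python sketch of a failover decision that considers reachability from the app servers as well as the health of the database engine. It is not Xero's implementation. The names `can_connect`, `HealthReport`, `should_fail_over` and the `min_reachable_fraction` threshold are hypothetical, and the probe is a plain TCP connect rather than a real database login.

```python
import socket
from dataclasses import dataclass, field


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Intended to run on an app server, so it tests the path clients actually use.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False


@dataclass
class HealthReport:
    # Result of a check run locally on the database host (hypothetical).
    engine_ok: bool
    # Probe results keyed by app server name (hypothetical), True if reachable.
    reachable_from: dict = field(default_factory=dict)


def should_fail_over(report: HealthReport, min_reachable_fraction: float = 0.5) -> bool:
    """Decide whether to fail over to the standby.

    Fails over if the engine is down, or if the engine looks healthy locally
    but too few app servers can reach it.
    """
    if not report.engine_ok:
        return True
    if not report.reachable_from:
        return False  # no probe data, so rely on the engine check alone
    reachable = sum(1 for ok in report.reachable_from.values() if ok)
    return reachable / len(report.reachable_from) < min_reachable_fraction
```

In this sketch, a report with `engine_ok=True` but every app server probe failing would trigger failover, which is the state described in point 2. A real system would also need to guard against false positives, such as a network fault on the probing side, before promoting a standby.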

Our operations team continuously analyses the platform and looks for ways to improve reliability and performance. We treat any issues seriously, and the team is very aware of the impact that system issues have on our customers. While we maintain very high uptime, we will continue to work to eliminate risk wherever possible.


July 18, 2013 at 9.58 am

When can we expect this to be resolved? I have a client who needs to be in their Xero to invoice this morning

Paul Rushworth in reply to Jeremy
July 18, 2013 at 10.48 am

@Jeremy – The issue has been resolved. If you are still experiencing issues logging in to Xero, please contact our support team.

@Robbie Dellow – Operations are still working on identifying the root cause of the partial outage and will update this post with further clarification once we have our findings.

Chris in reply to Paul Rushworth
February 24, 2016 at 11.41 am

The problem is still not resolved, can we please get an update…

Robbie Dellow
July 18, 2013 at 10.41 am

What was the infrastructure outage caused by? I come from an infra background so would be curious as to why no alerts were raised and no redundancy kicked in.

Kelvin Hartnall
July 20, 2013 at 4.03 pm

Thanks for the update and analysis. Any system will have some level of outage, but really appreciate the openness and transparency. And sounds like you have performed some good analysis and have found things to improve in the system.

Gerry Scullion
July 21, 2013 at 6.08 pm

Looks like the problem has reared its head again. Really disappointed as I’m logging in specifically to try and resolve a problem that was caused by your system duplicating entries!

This is the last FY that I’ll be using Xero.

Paul Rushworth in reply to Gerry Scullion
July 22, 2013 at 11.07 am

@Gerry Scullion – The issue has been resolved and has not recurred. If you are still experiencing issues logging in to Xero, please contact our support team.

Robbie Dellow
July 22, 2013 at 11.59 am

Hi Paul – assume you are still going to fill me in on the root cause of the ‘partial’ outage.

Paul Rushworth in reply to Robbie Dellow
July 22, 2013 at 1.15 pm

@Robbie Dellow. Hi, please see the body of the blog post; we added our analysis on Friday afternoon NZT.

February 24, 2016 at 11.37 am

Still not able to access Xero… Problem is not yet resolved???
