
What happened today…

We try really hard to do the right thing at Xero, but occasionally there are things we can’t control. Today was one of those days. In keeping with our goal of being transparent, we thought we’d share the challenges we faced during this morning’s outage and how we managed them.

Around 8:15am NZT, while doing some routine checks on our servers, both Craig, our Chief Technology Officer, and Paul, our Infrastructure Manager, lost their connection to our hosting environment. Almost immediately our monitoring alerts indicated that a number of our production systems were unavailable. The initial indication was that this was due to a network issue, but shortly after that we got our first update from Rackspace advising it was more widespread – a power outage at their Dallas data center, where Xero is hosted.

Fortunately our blog was still up, so at around 8:50am NZT we posted our first notification to customers explaining that Xero was unavailable due to this power outage. Shortly after the first blog post Rackspace started to restore power to the data center and the full system was back live around 9am NZT. The total outage was approximately 45 minutes.

Throughout the morning we continued to use both our blog and Twitter to keep everyone up to date. In fact, the amount of blog traffic caused a short outage on the blog itself! This was quickly rectified by extending the hosting capacity specifically for the blog.

Power is obviously critical to our service, as it is to any online service. What happened today is a very rare occurrence, especially for a provider such as Rackspace that prides itself on high availability. In more than 2 years our system availability or service level has been 99.99%. We are standing by for a full debrief from Rackspace and from this we will consider what further improvements we can jointly make to minimize the risk of a similar outage.

We are dedicated to providing a world-class service to our customers and we apologize to anyone who was affected by this downtime, but we stress that at no time was there any risk to your data. We trust that the open and frequent updates kept everyone abreast of what was happening.

 


 

18 comments

Paul Lattimore
30 June 2009 #

Show me an internet based entity that hasn’t had an outage. Appreciate the updates – certainly a refreshing change.

Paul Lattimore
30 June 2009 #

Question though… Are the backups being reviewed?

Campbell
30 June 2009 #

Interested to hear what Rackspace have to say. I am sure they plan for all sorts of failures and have so much redundancy it isn’t funny, yet things like this can still happen.

I have to also say having your tweets certainly saved me time in letting staff know when things were or were not online.

For your handling of a bad situation you still score top marks!

Craig Walker
30 June 2009 #

@Paul Yes – backups reviewed immediately as part of our consistency checks. Also took a full backup when we came back online just to make sure.

@Campbell Yes – we pride ourselves on our openness – Twitter is a great tool for that and it was also fantastic that Rackspace were blogging and tweeting through the whole thing – no point in hiding from the truth – just gets you into more trouble!

Nic Wise
1 July 2009 #

Nice transparent post, Ali. Good to see the transparency permeates thru all parts of the business :)

And yes – outages suck, but 45 mins in what? 2 years? Not exactly a bad record, esp if there is no data loss.

[...] Xero had an outage last evening UK time, 8.15am in New Zealand. The outage lasted 45 minutes and was attributed to data centre issues at Rackspace Dallas, its hosting provider. Xero isn’t alone. The Rackspace issue spread to others. As you can see from the sponsored feeds, the company kept users informed via its blog and Twitter. [...]

Keith Patton
1 July 2009 #

If you host within a single data centre you can expect outages.

Cecil
2 July 2009 #

I am not sure how the backup process works in Rackspace services. I think Rackspace should be expected to provide more reliable backup services, as it has multiple data centers around the world. I have two questions based on this.

1. Is Xero currently relying solely on the services provided by Rackspace, including backup?

2. Would it be realistic for Xero in the future to use another service provider as the backup service so that it can shorten the outage time as much as possible?

I guess users would be happier to bear with slower backup services when an emergency occurs than to wait through a long outage.

And I think it would also be good for Xero to be more transparent in providing information about its service provider in the future. This is the first time I got to know Xero is using the services from Rackspace (at least I didn’t see it in this year’s annual report, but please excuse me if I am wrong on this point). I personally think all customers and shareholders would love to know more about this.

Simon
2 July 2009 #

@Cecil – it does say they’re a Rackspace premium partner on the Xero homepage, but you’re right, it’s not very obvious.

Xero, have you considered offering a premium service involving site-to-site failover? I personally wouldn’t want to pay for it as I’m only a small business, but some of the bigger companies who can’t handle 45 minutes of downtime might be interested. I’ve set it up in the past and I know that it’s not an easy (or cheap) thing to do, but it could be something to look at.

Craig Walker
2 July 2009 #

@Cecil @Simon We have blogged about it before (http://blog.xero.com/2008/10/rackspace-saas-event/) and we make no secret of the fact we use Rackspace. In actual fact it’s something we’re very proud of – our partnership with Rackspace has been one of the best things we’ve ever done. I’m sorry it’s not more obvious.

As you say real-time site-to-site redundancy is actually very difficult to achieve and can lead to situations where the app is more available but data consistency is diminished. This is something we are definitely looking at though – we are continually trying to improve our processes to make Xero the best experience it can be.

Adrian Pearson
3 July 2009 #

I have just noticed, logging into Xero, that there is to be more server maintenance (and therefore a short downtime) this coming Sunday. The server maintenance events seem to be coming closer and more frequently. It would be nice to know more information. Knowing if it’s planned or remedial work for instance – just so we know how much or little to be worried!

Rod Drury
4 July 2009 #

Hi Adrian. Multi-currency was our biggest release ever. So like all releases we monitor and tweak. The few issues we noticed during the week we’ve fixed but have to wait until the quietest time of the week to apply. MC was only 5 days ago so we’re onto it.

We don’t anticipate another big engine change for a period of time so this is a bit more intense than usual.

With Xero you should be outside on Sunday.

We don’t think we’re doing too bad: http://www.digitalwpc.com/News/Permalink/WPC-Connect-Outage

Cheers

Rod

Sheryl
4 July 2009 #

My internet accounting system (not Xero and not by my choice) goes down for maintenance regularly during the day. Despite repeated requests to not have upgrades completed during working hours, it can be quite common for it to be down for more than 30 mins at a time, often at crucial times. On top of that we do not get any warning that the system is going to be down, we start to do something and all of a sudden we get the “down for maintenance” page.
One 45min unscheduled downtime in two years sounds like complete bliss to me!

Ben Kepes
4 July 2009 #

Sheryl – sounds interesting – I’d be keen to hear the backstory of the system that you use. Flick me an e on ben AT diversity DOT net DOT enzed

PJ
8 July 2009 #

Nice example of honesty and transparency in a corporate blog. Do you have set editorial guidelines for what you post on your blog or do you go with the flow?

Rod Drury
8 July 2009 #

Our team is small enough that those who blog know our tone of voice and to ‘blog safely’.

Joshua Milne
12 January 2011 #

I understand that downtime can happen, but mirrored hosting in a different city or country would be a better solution, given how important the data is to customers.

Are the backups happening to a different location via mirrored hosting or some other method?
If a city got blown up in a war, where would your services run from and where could you find the offsite/out-of-country backups?

Google and other SaaS cloud providers have these areas covered. Can we please get a clear breakdown of this?

QuickBooks Online has mirrored servers around the world to ensure uptime. If power goes out, it makes no difference to users stuck with no server availability. Can you please comment on this?

Xero is the best accounting software package I have seen for small business!! Keep up the great work.

Rod Drury
12 January 2011 #

Joshua, we believe we are best in class with our redundant technology infrastructure.

See http://www.xero.com/accounting-software/compare/

Some googling will show you that QuickBooks Online has a far from impressive uptime performance.

Rod
