What happened today…

We try really hard to do the right thing at Xero, but occasionally there are things we can't control. Today was one of those days. In keeping with our goal of being transparent, we thought we'd share the challenges we faced during this morning's outage and how we managed them.

Around 8:15am NZT, while doing some routine checks on our servers, both Craig, our Chief Technology Officer, and Paul, our Infrastructure Manager, lost their connection to our hosting environment. Almost immediately our monitoring alerts indicated that a number of our production systems were unavailable. The initial indication was a network issue, but shortly afterwards we got our first update from Rackspace advising that the problem was more widespread: a power outage at its Dallas data center, where Xero is hosted.

Fortunately, our blog was still up, so at around 8:50am NZT we posted our first notification to customers explaining that Xero was unavailable because of the power outage. Shortly after that first post, Rackspace began restoring power to the data center, and the full system was back online at around 9am NZT. The total outage was approximately 45 minutes.

Throughout the morning we continued to use both our blog and Twitter to keep everyone up to date. In fact, the volume of blog traffic caused a short outage on the blog itself! We quickly fixed this by adding hosting capacity specifically for the blog.

Power is obviously critical to our service, as it is to any online service. What happened today is very rare, especially for a provider such as Rackspace that prides itself on high availability. Over more than two years, our system availability has been 99.99%. We are awaiting a full debrief from Rackspace, and from that we will consider what further improvements we can make together to minimize the risk of a similar outage.
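To put the 99.99% figure in perspective, here is a rough back-of-the-envelope check. It assumes a two-year window of 730 days and counts only this morning's 45-minute outage. Both are simplifying assumptions for illustration, not figures taken from our monitoring.

```latex
\begin{aligned}
T &= 730 \times 24 \times 60 = 1{,}051{,}200 \text{ min} \quad \text{(assumed two-year window)} \\
\text{downtime allowed at } 99.99\% &= 0.0001 \times T \approx 105 \text{ min} \\
A &= 1 - \frac{45}{1{,}051{,}200} \approx 0.99996 = 99.996\%
\end{aligned}
```

Under these assumptions, a single 45-minute outage stays within the roughly 105 minutes of downtime that a 99.99% level allows over two years.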

We are dedicated to providing a world-class service to our customers. We apologize to anyone affected by this downtime, but we stress that at no time was your data at risk. We trust that our open and frequent updates kept everyone informed of what was happening.

18 Comments

Paul Lattimore
June 30, 2009 at 8:01 pm

Show me an internet-based entity that hasn't had an outage. Appreciate the updates – certainly a refreshing change.

Paul Lattimore
June 30, 2009 at 8:03 pm

Question though… Are the backups being reviewed?

Campbell
June 30, 2009 at 8:05 pm

Interested to hear what Rackspace have to say. I am sure they plan for all sorts of failures and have so much redundancy it isn’t funny, yet things like this can still happen.

I also have to say that your tweets certainly saved me time in letting staff know when things were or were not online.

For your handling of a bad situation you still score top marks!

Craig Walker
June 30, 2009 at 8:25 pm

@Paul Yes – backups reviewed immediately as part of our consistency checks. Also took a full backup when we came back online just to make sure.

@Campbell Yes, we pride ourselves on our openness. Twitter is a great tool for that, and it was also fantastic that Rackspace were blogging and tweeting through the whole thing. There's no point in hiding from the truth; it just gets you into more trouble!

Nic Wise
July 1, 2009 at 1:37 am

Nice transparent post, Ali. Good to see the transparency permeates through all parts of the business :)

And yes – outages suck, but 45 mins in what? 2 years? Not exactly a bad record, esp if there is no data loss.

Never fear transparency | AccMan
July 1, 2009 at 3:25 am

[…] Xero had an outage last evening UK time, 8.15am in New Zealand. The outage lasted 45 minutes and was attributed to data centre issues at Rackspace Dallas, its hosting provider. Xero isn’t alone. The Rackspace issue spread to others. As you can see from the sponsored feeds, the company kept users informed via its blog and Twitter. […]

Keith Patton
July 1, 2009 at 9:11 pm

If you host within a single data centre, you can expect outages.

Cecil
July 2, 2009 at 12:13 pm

I am not sure how the backup process works in Rackspace's services. I think Rackspace should provide more reliable backup services, given that it has multiple data centers around the world. I have two questions based on this.

1. Is Xero currently relying solely on services provided by Rackspace, including backup?

2. Would it be realistic for Xero in the future to use another service provider for backup, so that it can shorten outage time as much as possible?

I suspect users would rather put up with slower backup services in an emergency than wait through a long outage.

I also think it would be good for Xero to be more transparent about its service providers in the future. This is the first time I have learned that Xero uses Rackspace's services (at least I didn't see it in this year's annual report, but please excuse me if I am wrong on this point). I think all customers and shareholders would love to know more about this.

Simon
July 2, 2009 at 12:30 pm

@Cecil – it does say they're a Rackspace premium partner on the Xero homepage, but you're right, it's not very obvious.

Xero, have you considered offering a premium service involving site-to-site failover? I personally wouldn't want to pay for it as I'm only a small business, but some of the bigger companies that can't tolerate 45 minutes of downtime might be interested. I've set it up in the past and I know it's not an easy (or cheap) thing to do, but it could be something to look at.

Craig Walker
July 2, 2009 at 1:54 pm

@Cecil @Simon We have blogged about it before (http://blog.xero.com/2008/10/rackspace-saas-event/) and we make no secret of the fact we use Rackspace. In actual fact it’s something we’re very proud of – our partnership with Rackspace has been one of the best things we’ve ever done. I’m sorry it’s not more obvious.

As you say, real-time site-to-site redundancy is very difficult to achieve and can lead to situations where the app is more available but data consistency is diminished. This is something we are definitely looking at, though; we are continually trying to improve our processes to make Xero the best experience it can be.

Adrian Pearson
July 3, 2009 at 10:31 pm

I have just noticed, logging into Xero, that there is to be more server maintenance (and therefore a short downtime) this coming Sunday. The server maintenance events seem to be coming closer together and more frequently. It would be nice to have more information, for instance whether it's planned or remedial work, just so we know how much or how little to worry!

Rod Drury
July 4, 2009 at 2:08 pm

Hi Adrian. Multi-currency was our biggest release ever. So like all releases we monitor and tweak. The few issues we noticed during the week we’ve fixed but have to wait until the quietest time of the week to apply. MC was only 5 days ago so we’re onto it.

We don’t anticipate another big engine change for a period of time so this is a bit more intense than usual.

With Xero you should be outside on Sunday.

We don’t think we’re doing too bad: http://www.digitalwpc.com/News/Permalink/WPC-Connect-Outage

Cheers

Rod

Sheryl
July 4, 2009 at 3:32 pm

My internet accounting system (not Xero, and not by my choice) goes down for maintenance regularly during the day. Despite repeated requests not to run upgrades during working hours, it is quite common for it to be down for more than 30 minutes at a time, often at crucial moments. On top of that, we get no warning that the system is going to be down: we start to do something and suddenly get the "down for maintenance" page.
One 45-minute unscheduled downtime in two years sounds like complete bliss to me!

Ben Kepes
July 4, 2009 at 6:25 pm

Sheryl – sounds interesting – I’d be keen to hear the backstory of the system that you use. Flick me an e on ben AT diversity DOT net DOT enzed

PJ
July 8, 2009 at 11:51 am

Nice example of honesty and transparency in a corporate blog. Do you have set editorial guidelines for what you post on your blog or do you go with the flow?

Rod Drury
July 8, 2009 at 1:47 pm

Our team is small enough that those who blog know our tone of voice and to ‘blog safely’.

Joshua Milne
January 12, 2011 at 12:27 am

I understand that downtime can happen, but mirrored hosting in a different city or country would be a better solution, given how important this data is to customers.

Are the backups made to a different location via mirrored hosting or some other method?
If a city were destroyed in a war, where would your services run from, and where are the offsite or offshore backups kept?

Google and other SaaS cloud providers have these areas covered. Could you please give us a clear breakdown of how you handle this?

QuickBooks Online has mirrored servers around the world to ensure uptime. If the power goes out, the cause makes no difference to users stuck without server availability. Can you please comment on this?

Xero is the best accounting software package I have seen for small business! Keep up the great work.

Rod Drury
January 12, 2011 at 10:13 am

Joshua, we believe we are best in class with our redundant technology infrastructure.

See http://www.xero.com/accounting-software/compare/

Some googling will show you that QuickBooks Online has a far from impressive uptime performance.

Rod

