
Less is more

Let’s say, for the sake of illustrating a point, that BT has 4M business customers in the UK. If every one of those customers maintained a record of BT’s address details, whether in an accounting system like Xero or just in a simple contacts database, then there are theoretically at least 4M separate instances of BT’s supplier account details in use today.

Which is to say there are 3,999,999 duplicate records, which in the old unconnected world of business are an unavoidable consequence of necessity. Using some dubious expert guesswork, I roughly calculate that one instance of BT’s details would use around 4k of disk space, and therefore by theoretical extension all those duplicate BT supplier account records would soak up a combined 16 gigabytes of disk space. But that’s just one big company that has lots of customers, and clearly not every one of the UK’s 4.7 million companies trades with every other.

So, let’s ignore the big old edge cases and instead guess that the average small business – for they make up 98% of the 4.7M – maintains 500 other companies’ customer records in their accounting systems, not including prospect lists or other databases like services, warranty, memberships etc. Using my same dubious guesswork of 4k per record, that throws out a customer record database of about 2 megabytes per company. And if all those 4.7M theoretical 2 megabyte databases were dumped onto a single hard disk, that disk would need to be a not inconsiderable 9,000 gigabytes in size – single hard disks run to about 2,000 gigabytes today.

If all UK company data were stored only once in a centralised cloud database, and all systems of record stored a simple data pointer to each centralised record, the collective cloud data file would be a measly 18 gigabytes.

You can get USB memory sticks larger than that today for twenty quid.
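To make the "data pointer" idea concrete, here is a minimal sketch, assuming a central registry keyed by a unique company ID. The class names, the ID format and the address strings are all made up for illustration; none of this describes a real service or how any accounting product works.

```python
from dataclasses import dataclass


@dataclass
class CompanyRecord:
    """One shared copy of a company's details (fields are illustrative)."""
    name: str
    address: str


class LocalLedger:
    """A system of record that stores only pointers (IDs) into the registry."""

    def __init__(self, registry):
        self.registry = registry
        self.supplier_ids = []  # pointers only, no copied address data

    def add_supplier(self, company_id):
        if company_id not in self.registry:
            raise KeyError(f"unknown company id: {company_id}")
        self.supplier_ids.append(company_id)

    def suppliers(self):
        # Resolve each pointer against the single central copy at read time.
        return [self.registry[cid] for cid in self.supplier_ids]


# Hypothetical central registry keyed by a made-up company ID.
registry = {"UK-000001": CompanyRecord("BT", "placeholder address")}

ledger_a = LocalLedger(registry)
ledger_b = LocalLedger(registry)
ledger_a.add_supplier("UK-000001")
ledger_b.add_supplier("UK-000001")

# One update to the central copy is visible to every ledger that points at it.
registry["UK-000001"].address = "new placeholder address"
```

Each ledger holds a short ID rather than its own copy of the record, so a change of address is made once. The trade-off is that every read now depends on the registry being reachable.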

Finally, to get the absolute worst-case theoretical file size, where every UK company traded with every other UK company, we’d multiply 4.7M companies by 4k per company record to throw out a single company’s data file size of 18 gigabytes, which we’d then multiply by 4.7M to arrive at a whopping 85 million gigabytes of collective disk space.

So, a single instance of a universal cloud-based database would take up 18 gigabytes, and the theoretical worst-case offline figure is 85 million gigabytes, with the true figure lying goodness knows where in between. And that’s before factoring in other records and transactions.
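For anyone who wants to check the sums, here is a rough sketch of the arithmetic in Python. The inputs are the post's own guesses (4k per record, 4M BT customers, 4.7M companies, 500 records per small business); the variable names and the choice of decimal units (1 gigabyte = 1,000,000,000 bytes) are assumptions of this sketch. It works from unrounded inputs, so results may differ a little from the rounded figures quoted in the text.

```python
# Decimal units, an assumption of this sketch.
MB = 1_000_000
GB = 1_000_000_000

# Inputs taken from the post's guesswork.
record_bytes = 4_000          # "4k" per company record
bt_customers = 4_000_000
uk_companies = 4_700_000
records_per_small_business = 500

# Duplicate copies of BT's details held by its customers.
bt_duplicates_gb = (bt_customers - 1) * record_bytes / GB

# One small business's customer record database.
per_company_mb = records_per_small_business * record_bytes / MB

# Every company's local database dumped onto one disk.
all_local_gb = uk_companies * records_per_small_business * record_bytes / GB

# One central copy of every company's record.
central_gb = uk_companies * record_bytes / GB

# Worst case: every company stores a record for every company.
worst_case_gb = uk_companies * uk_companies * record_bytes / GB

for label, value in [
    ("BT duplicates (GB)", bt_duplicates_gb),
    ("Per company (MB)", per_company_mb),
    ("All local databases (GB)", all_local_gb),
    ("Central database (GB)", central_gb),
    ("Worst case (GB)", worst_case_gb),
]:
    print(f"{label}: {value:,.1f}")
```

Changing any one guess, such as the 4k record size, rescales every figure in proportion, which is a fair reminder of how soft these numbers are.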

But before this blog post gets totally out of control, my simple observation is this: as we shift ever more into an online digital world, whether it be systems of record in business or 10 million personal music libraries containing exact duplicate copies of a single MP3 file of a Lady Gaga track, you have to wonder if we will ever kick this thus-far inescapable appetite for epic levels of database redundancy that our legacy IT systems and old-world business processes impose.

I hope we do.

 


 

7 comments

John H
2 June 2010 #

I think this makes a compelling argument from some perspectives, especially things like energy efficiency and keeping data current but …
Does anyone own the one copy (or few copies) of data? If so, can they change access to it as and when they like? Can they charge for access? Cut me off if I don’t pay? If I link to it from my own organisation’s data can they change the schema or tag metadata whenever they want to? It’s great if I don’t have to set up and maintain an extensive directory of customer addresses, not so great if I wake up one morning and I don’t know who any of my sales orders are for. I’m genuinely interested in – but still struggling with – this aspect of the cloud and SaaS provision in general. I think Xero looks like a great product with some compelling cloud-supported features but I’m still troubled by the possibility that we’re just trading old problems for new.

Ed Molyneux
2 June 2010 #

Hmmm.

Which of BT’s many, many office addresses and contact numbers should each of those customers store, assuming they may be dealing with different areas of BT’s enormous business?

Gosh, if only there were some way of identifying those addresses uniquely, like actually storing the address, and that storage was cheap…

Sometimes lack of standardisation is not the problem, and a ‘universal cloud database’ is not the answer. Unless you’re selling universal cloud databases, that is :->

Martin Gatehouse
2 June 2010 #

I’ll be happy once I get rid of the ‘epic levels of data redundancy’ within my own organisation’s IT systems alone……..

Gary Turner
2 June 2010 #

OK, so this wasn’t a thinly veiled attempt at pitching another of the 101 reasons cloud computing will save the planet (even though it will, obviously) but more some idle observations that originated in thoughts about the current physical necessity of everyone storing their own data, specifically music libraries – versus the much less IT intensive alternative of streaming your music library from the cloud.

It’s all binary data, and interesting to think about the business data aspects. And interesting to speculate what the future role of the big scale database companies like Experian might be. Will an open source, free license database of all UK companies emerge?

Plus I wanted to show off my binary data file calculation skills and this seemed like a great vehicle.

Jon
2 June 2010 #

As you (indirectly) pointed out, storage is too cheap. A one terabyte hard drive is less than £100 now, and prices keep falling sharply. Hence, just about everyone doesn’t care if things ‘take up more space’ – just slap another hard drive into the array.

A much better case could be made for reducing the work and cost involved in setting up and maintaining the data, rather than the cost of the physical hardware itself.

Gary Turner
2 June 2010 #

Jon – totally agree – my post got too hung up on storage capacity but the process overhead is at the root of what I set out to say.

My iTunes library has a good few thousand tracks in it. If I replace my computer it’s the devil’s own job to move all that. Plus I need to maintain a local backup strategy for it. That’s a massive overhead and one which relies on me being sufficiently tech savvy to undertake.

Same applies to common or public business data, or mapping data for that matter.

Can’t help but think a centralised, free, open license database of all businesses will emerge on the web for that kind of stuff.

If we can’t efficiently manage common business data then what chance will we have of managing this coming data explosion – http://bit.ly/a4fiGM.

Stuart Bale
4 June 2010 #

To me, the obvious independent and centrally managed database of all businesses within a region is the tax authority. And they have already assigned a unique ‘key’ to every business.
Wishful thinking I know, but if this were accessible to systems providers to integrate with directly, then a bounty of valuable, time-saving products and services would appear.
